We Takes Our Chances

It started for me at 07:15 EDT, when I roused enough that the noise on the radio resolved itself into words. Something about a Microsoft update going wrong and several airlines having trouble? That initial report proved to be inaccurate: it wasn’t a Microsoft update, it was other software that had gone wrong.

The first report from the front is wrong.
– Rick Hillier (Gen, ret’d)

But the glitch was shutting down computers running Microsoft Windows, of which there are a few, here and there, and the bit about the airlines being affected was accurate. I was now fully awake, because my day included the task of getting a West Coast teenager onto a homeward-bound Air Canada airplane. Exactly which airlines were affected, did you say?

A cursory search showed that in Canada the problems were mainly being felt by (were, perhaps, limited to?) Porter Airlines. Whew. But I wondered how bad it was, really, so I checked their website. At 08:00 Porter was already saying that no flights would leave Ottawa, their hub, before noon.

The first estimate from the front is wrong.
– Rick Hillier (Gen, ret’d) (paraphrased)

By 12:00, that in-the-first-flush-of-optimism estimate had been pushed out to 15:00, like, you know, at least.

Further delays and cancellations are possible.

And what should we expect even if things went as smoothly as possible?

Passengers cannot be rebooked while systems are offline.
The rebooking process will take a period of time,
with new flights confirmed over a number of days
due to high passenger volume.
– Porter Airlines (emphasis added)

It’s that bolded bit that caught my attention: This is going to take days, folks. Days. That’s to recover from the travel disruption caused by an IT glitch for which the fix/patch/repair had already been identified and distributed to affected companies in the dawn’s early light.

It caught my attention not because it was new, but because it wasn’t. Building systems, operating them well, and recovering them after a failure: a textbook illustration of hard, harder, and hardest.

When a single flight is delayed, it disrupts travel plans for tens of people for hours at least. When a single flight is flat-out cancelled, it disrupts travel plans for hundreds of people for a day; two days, if it’s a fully booked big airplane in a busy travel season. When a half-day’s worth of flights (or more) is cancelled by a storm–physical or virtual–the impacts are felt for days. Some people wait a while for a new flight; some people wait a while longer; some people abandon their plans to travel. Eventually, things settle back into equilibrium, where each day’s travellers are accommodated by that day’s operational flights.

Our systems–not just airlines, but including them–function best under a steady, predictable load. When things go sideways–not just for airlines, but including them–the mechanisms designed for steady-state operations struggle to accommodate the extra demand.

Businesses can make their systems more resilient to sudden, sharp shocks by adding capacity. In this example, airlines could book airplanes to only about 60% capacity and operate extra flights. Would it limit the impact of disruptions? Sure. Will they do it? Nope. Most people won’t/can’t pay the added cost required for hyper-reliable air travel: a high surcharge on every flight to mitigate the risk of the infrequent incidents.

We’ve paid our money;
we’ll take our chances.

In my own life, I often have the same choice: Should I allow for things to go wrong? Or, rather, how much should I allow?

I leave an extra 10 minutes on a 20-minute drive across town for a mid-morning appointment at my doctor’s office: I don’t camp out in the parking lot the night before.

We arrive at the “rally point” one day early when we’re travelling to join a group tour: We don’t go out a week ahead.

The Big Guy checks weather forecasts carefully when we drive across all those rectangular states in early winter, adjusts our time of departure and route of travel to the extent possible, and knows enough about the route to be able to choose a get-off-the-road-safely spot if we have to: We don’t stay home and wait for a clear run of good weather.

Those are all travel examples, but the principle holds true in other areas of life, from stocking a pantry to scheduling activities. I have the option of running at full capacity (Plan for the most from every moment and hang the disruptions!) or of building in resilience with surplus capacity (Guarantee the outcome and hang the cost, opportunity and otherwise!). I don’t have the option of avoiding problems.

Further delays and cancellations are possible.

Yes, they are. In life looked at as a whole, they’re not just possible but certain. If I want to manage them within my own tolerance for risk, my own reaction to disruptions, then I have to stop to think:

  • What could go wrong?
  • What am I prepared to do to reduce the likelihood of that wrongness?
  • How can I access extra capacity if that wrong thing happens anyway?

The answers I gave at 20–or even at 50–aren’t likely to be the same ones I give in my 70s.

 

This entry was posted in Thinking Broadly, Wired.

8 Responses to We Takes Our Chances

  1. John Whitman says:

    Isabel – sums up things in life quite well I think. Wish more people thought the way you do.

    • Isabel Gibson says:

      John – Well, there’s thinking and then there’s executing. Sometimes I do both . . . 🙂

  2. Jim Robertson says:

    As John says, a good summary.

    I try not to think about a prolonged outage vis-a-vis bank accounts and investments…

    • Isabel Gibson says:

      Jim R – What do they call it – an electro-magnetic pulse? I guess that’s a real danger. Time to stash some cans of beans, maybe.

  3. Tom Watson says:

    Isabel
    The cyber security firm CrowdStrike caused huge disruptions when its software update went kerflooey. It even affected little old me. I use a Kindle when I’m conducting a wedding service, because I can hold it in my right hand, turn the page with a flick of my right thumb, and hold a microphone in my left hand. Quite convenient. But, on Friday, the wedding document wouldn’t download onto the Kindle. Under other circumstances I could have used my iPad—not nearly as convenient because it’s larger to hold, but a work-around when necessary—but iPads are useless in sunlight because of the glare on the screen. So, what to do?

    Horror of horrors, I was reduced to using a paper copy! How antiquated for a 21st-century geek!

    Think I’ll call the CEO of CrowdStrike and give him a piece of my mind!
    Tom

    • Isabel Gibson says:

      Tom – 🙂 I didn’t see anything in the news about this particular glitch, so thanks for sharing. Digital technology is woven through our lives now, and all kinds of things start to come undone when the technology fails.

      • Barbara Carlson says:

        We are one perfect solar storm away from humans going into a fetal position as the cyber world shuts off. There are now too many things I try not to think about. About as possible as not thinking about an elephant…
        Think I’ll go read a book.

        • Isabel Gibson says:

          Barbara – Or take up knitting! But get your patterns downloaded from your online archive and printed now… 🙂
