Birmingham plane & Swiss cheese: Understanding what really causes software system flaws

Tania Ostanina
Published in Bootcamp · Apr 10, 2021 · 4 min read


Plane take-off (image: priyanka | Noun Project)

In this Friday’s news, a Tui plane took off precariously from Birmingham airport because its take-off weight had been underestimated. The cause was a mistake in the software that estimated the average weight of the passengers: people with ‘Miss’ in their title had been classed as children, and therefore given an estimated weight of 35kg instead of the 63kg standard for adult women. Thankfully, the pilot’s skill and expertise averted a disaster.
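
To make the flaw concrete, here is a minimal, purely hypothetical sketch of the kind of title-to-weight lookup that could produce this behaviour. The names, and the assumption that the weight is derived from the title alone, are my own illustration rather than the airline’s actual code; the two figures come from the news story.

```python
# Purely illustrative sketch, not the airline's real code. It shows how a
# simple title-based lookup can silently class adult women as children.

CHILD_WEIGHT_KG = 35          # standard child weight reported in the news story
ADULT_FEMALE_WEIGHT_KG = 63   # standard adult female weight reported in the news story

def estimated_weight_kg(title: str) -> int:
    """Assign a standard passenger weight from the title alone (hypothetical)."""
    if title == "Miss":
        # The flawed rule: "Miss" is treated as a child's title.
        return CHILD_WEIGHT_KG
    return ADULT_FEMALE_WEIGHT_KG  # simplified: other titles are omitted here

# Every adult woman booked as "Miss" understates the load sheet by
# 63 - 35 = 28kg, and across a full cabin that adds up quickly.
```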

When this news story was first posted on my office’s Slack, it was flagged as a developer mistake. But, from where I am standing, it is not. Let me elaborate.

The Swiss cheese model (image: Alice Noir | Noun Project)

Even before I ventured into UX design, I had, like many of us, devoured Don Norman’s iconic book, The Design of Everyday Things. Throughout the book, Norman argues that human mistakes in using a system are never the user’s fault, but always the fault of the system’s design. Norman also discusses the importance of the so-called Swiss cheese model in understanding the causes of system accidents, an approach that has been known in the risk management world for some time. Broadly, this model can be summed up as follows:

When disasters happen, it is rarely just one event or one flaw that causes them. It is usually several flaws, many of them systemic rather than individual human actions. When those flaws happen to align on one particular day, like the proverbial holes in Swiss cheese, disaster strikes. For every system there is a critical number of holes that must align; otherwise, disaster does not strike. The more holes that need to align for disaster to strike, the rarer the disaster.

For simpler systems (like house fire safety measures), it could be just 3–4 flaws:

A faulty fire alarm system + flammable surface finishes + a tealight left unattended = house catches fire.

For a nuclear meltdown, it could be hundreds of flaws.
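
A rough back-of-the-envelope sketch shows why needing more holes to align makes disasters rarer. Assuming, purely for illustration, that each flaw is independently “open” on any given day with a 10% chance, the likelihood of all the critical holes lining up at once shrinks dramatically as their number grows:

```python
# Back-of-the-envelope illustration only: each hole is assumed to be
# independently "open" on a given day with a 10% probability, and a disaster
# needs every critical hole to be open at the same time.
p_hole_open = 0.10

for holes_needed in (3, 6, 10):
    p_disaster = p_hole_open ** holes_needed
    print(f"{holes_needed} holes must align -> daily risk {p_disaster:.10f}")

# 3 holes  -> 0.0010000000  (one day in a thousand)
# 6 holes  -> 0.0000010000  (one day in a million)
# 10 holes -> 0.0000000001  (effectively never)
```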

The infamous Hawaii missile alert text message (image: Wikimedia Commons)

Sadly, despite the industry’s awareness of the Swiss cheese model, our love of a good scapegoat means that more often than not a single human culprit gets the axe, publicly fired and handed notoriety in the media. Take, for example, the Hawaii false missile alert of 2018. There was a campaign within the UX community to expose the faulty design of the alert system and to exonerate the single person who pressed send on the now infamous text message. Unfortunately, the person in question still lost their job.

Now, let’s come back to our Tui airplane with all the “Misses” on board. From the information I have at this stage, the most obvious Swiss cheese holes are as follows (there will undoubtedly be others):

HOLE 1: Loss of context due to globalised outsourcing

The software for estimating passenger weight was outsourced to a country other than the flight’s origin or destination. Hardly surprising, as airplane building these days is an international affair.

HOLE 2: Miscommunication about cultural norms

The difference in the meaning of the title “Miss” between the country that produced the software and the countries of origin and destination was not taken into account.

HOLE 3: Wrong specification

The designers specified what needed to be built, including the rule that assigned an estimated weight to passengers with “Miss” in their title.

HOLE 4: Wrong implementation

The developer team went ahead and built what they were asked to build.

HOLE 5: Lack of adequate tests…

…in the consumer countries prior to release, to ensure that the software was functioning correctly.
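
Staying with the hypothetical lookup sketched earlier (my own illustration, not the real system), even a single release test written from the UK market’s point of view would have exposed this hole, because it fails against the flawed rule:

```python
# Uses the hypothetical estimated_weight_kg / ADULT_FEMALE_WEIGHT_KG sketch
# from earlier; an illustration of a missing check, not the airline's test suite.

def test_adult_woman_titled_miss_gets_adult_weight():
    # In the UK, "Miss" is an adult woman's title; she must not weigh in as a child.
    assert estimated_weight_kg("Miss") == ADULT_FEMALE_WEIGHT_KG
```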

HOLE 6: Thankfully, hole 6 never materialised

The pilot was experienced, and perhaps also lucky, and managed to navigate their way out of the tricky situation they had been thrown into. Because hole 6 did not align with the others, this event was a near, well, miss (excuse the bad pun).

And now, coming back to my original point — was this a developer mistake or not?

Out of the six Swiss cheese holes identified above, only one (no. 4) is developer-related. The rest are process- and design-related. A developer, regardless of the country they are in, should not be expected to know what titles are used by people in the UK. That is not their job. Their job is to follow specifications, which is what they did. While the implementation of the product contained a mistake, the real faults happened elsewhere along the line.

Conclusion

For every incident like this, instead of blaming individuals or single teams for creating flaws or causing accidents, let us consider how the Swiss cheese model could help us understand the complexity of the whole process of designing and building software systems. Hopefully a future detailed investigation into the Birmingham plane take-off will do just that.


A UX designer who has switched from architecture. I write about UX, design, architecture, art, and the social impact of technology.