Imagine we have a clock with internal gears, springs, and other mechanical parts. The clock works fine for a while, then one day it malfunctions: it does something unexpected, and we are surprised by the result. We want to fix the clock, so we take it apart and look for a broken part. Our intention is to find the broken part, remove it, and replace it with a new one. Once the replacement is in place, we expect the clock to function properly again.
Now imagine we have the same broken clock. We take it apart, carefully examine all the internal parts, and find that every part is in good working condition. There is no single broken part; the problem lies in the design of the system-as-a-whole. Even though every part worked as expected, the system-as-a-whole still failed.
It's possible for a system with no defective parts to fail. An unfavorable result does not mean there must be a broken part somewhere.
This is especially true of systems that include people. Take, for example, air transportation. When an airplane takes off, the hope is that it will land at its destination airport without incident. It is remarkable how often this is actually the case. However, once in a while the result is not what we hoped for: instead of the airplane ending up back on the ground safely in one piece, something unexpected happens and the airplane “crashes.”
Inevitably the first reaction is to “look for the broken part.” First the pilots are scrutinized: could this disaster have been averted if the pilots had done something differently? If the answer is YES, then the pilots are blamed; they are identified as “The Broken Part” which needs fixing or replacing.
However, this does nothing to focus attention on the system-as-a-whole. Pilots are only one part of a much bigger system. No one wanted the airplane to crash; everyone wanted and expected the airplane to end up at its destination airport safely in one piece. Yet that wasn't the end result. It is quite possible that none of the parts involved were defective, and that it was the system-as-a-whole itself that failed us.
Each part, each person involved, may have performed as well as anyone could hope. People make decisions based on limited knowledge, to the best of their ability, in real time, with no crystal ball to consult that would accurately predict the future consequences of a decision made now. Only after the end result is known can we look back in hindsight with a God-like view of what happened. And it is our philosophy that determines where we look and what we look for.
The problem isn’t that people fail to follow our ideal model of how the system should work; the problem is that our ideal model fails to capture how the real world actually works.
An excellent introduction to this new philosophy is the book:
This is more a book on philosophy than about aircraft accident investigations, though it uses transportation accident investigations as its framework. The basic observation is that people don’t want disasters to happen, and yet disasters still happen. The old philosophy was to “find the broken part,” i.e. someone or something to blame, and replace him, her, or it. Underlying this old philosophy is the unspoken assumption that the design of the system itself is safe, and therefore unwanted outcomes must be caused by defective parts.
The new philosophy is that the system itself is not entirely safe. It is possible to “drift into failure”: no one wants a disaster to happen, no one is to blame, and yet we still occasionally get an unfavorable outcome, because the system itself is not 100% safe. (After all, the system incorrectly predicted a favorable outcome, so in that sense the system itself must be defective.)
—David W. Deley