Worthy Read: All Complex Systems (transportation, healthcare, etc.) Are Inherently and Unavoidably Hazardous
Why Every Product Manager Should Read Richard Cook's 'How Complex Systems Fail'
Today’s cascade of disruptions worldwide from a faulty software update at one company has highlighted vulnerabilities in our interconnected systems, affecting airlines, banks, and hospitals. Commenting on the news, Ethan Mollick shared Richard Cook's paper "How Complex Systems Fail" on LinkedIn, in which Cook asserts that "All of the interesting systems (e.g., transportation, healthcare, power generation) are inherently and unavoidably hazardous."
After reading the paper, I found it extremely concise, wise, and insightful. I then discovered a talk Richard Cook gave at the Royal Institute of Technology in Stockholm that elaborates further on the themes of the article. Both the paper and the talk are well worth your time if you are interested in understanding the intricacies of complex systems, so I thought I'd share these resources and some key takeaways from them.
The Paper: How Complex Systems Fail by Richard Cook
Richard Cook’s article, “How Complex Systems Fail,” provides an analysis of the nature of complex systems and their inherent hazards. Cook, a renowned expert in cognitive technologies, outlines how systems like transportation, healthcare, and power generation are not only intricate but also intrinsically hazardous. I believe this understanding is important for product managers, innovation leaders, and technology consultants who navigate these complexities daily.
Cook's article emphasizes that these systems are heavily defended against failure through multiple layers of protection. These defenses include obvious technical components, such as backup systems and safety features, as well as human components like training and organizational policies. Yet despite these defenses, catastrophic failures still occur, and they typically result from a combination of smaller issues rather than a single point of failure.
I was impressed by the conciseness and clarity of Cook's insights. His framework is not only theoretical but also relevant to real-world applications, making it an important (short) read for professionals in our field.
Key Insights from Cook’s Analysis
Inherent Hazards of Complex Systems
One of the most interesting points Cook makes is that complex systems are intrinsically hazardous. This inherent risk comes from the very nature of these systems, which are designed to perform highly complex and critical tasks. The frequency of hazard exposure can sometimes be managed, but the processes involved in the system are themselves irreducibly hazardous.
Layers of Defense
Cook highlights the importance of constructing multiple layers of defense to protect against failures. These defenses include technical components like backup systems and safety features, as well as human elements such as training, knowledge, and organizational policies.
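To ground this idea in software product terms (my own illustration, not something from Cook's paper), here is a minimal Python sketch of defense in depth, where the callables fetch_from_primary, fetch_from_backup, and validate are hypothetical stand-ins for real components. Each layer catches a class of failure the previous one lets through, and the final layer is to fail loudly so that humans, the last line of defense, can step in.

```python
import time

class AllDefensesFailed(Exception):
    """Raised only when every layer of defense has been exhausted."""

def fetch_with_defenses(fetch_from_primary, fetch_from_backup, validate,
                        retries=3, backoff_seconds=0.5):
    """Hypothetical defense-in-depth wrapper: retries, validation, and a
    backup path each guard against a different class of failure."""
    # Layer 1: retry transient faults on the primary path.
    for attempt in range(retries):
        try:
            result = fetch_from_primary()
            # Layer 2: validate the result instead of trusting it blindly.
            if validate(result):
                return result
        except Exception:
            time.sleep(backoff_seconds * (attempt + 1))  # simple linear backoff
    # Layer 3: fall back to a degraded but safe backup source.
    try:
        result = fetch_from_backup()
        if validate(result):
            return result
    except Exception:
        pass
    # Layer 4: fail loudly so the human and organizational defenses kick in.
    raise AllDefensesFailed("primary, retries, and backup all failed")
```

The point is not this particular code but its shape: no single safeguard is trusted on its own, which is exactly why, as the next point explains, it usually takes several failures lining up to produce a catastrophe.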
Combination of Failures
Another key insight from Cook’s article is that catastrophic failures often result from a combination of smaller issues rather than a single point of failure. Because the system carries so many defenses, no single fault is normally enough to cause an accident; it is only when several small, apparently innocuous failures line up that the opportunity for a systemic accident emerges.
Additional Insights from Cook’s Talk at the Royal Institute of Technology, Stockholm
Richard Cook expands on his paper’s insights in his talk at the Royal Institute of Technology, offering practical examples and deeper explanations:
Systems as Imagined vs. Systems as Found
Cook introduces a critical distinction between "systems as imagined" (theoretical design) and "systems as found" (real-world operation). While theoretical designs are often static and deterministic, real-world systems are dynamic and stochastic. This disparity highlights the importance of understanding both perspectives for effective system management. For instance, the "systems as imagined" approach may fail to account for the myriad ways operators interact with the system in practice, leading to unexpected issues and failures.
Real-World Example: The Infusion Pump Incident
Cook shares a compelling anecdote about an infusion pump incident to illustrate how small, overlooked details can lead to significant failures. A hospital decided to standardize all its infusion pumps, bringing in a new, state-of-the-art model. However, a year later, a software configuration setting caused 20% of the pumps to malfunction at the same time. This incident underscores the need for thorough understanding and preparation for seemingly minor details that can escalate into major issues.
Resilience Engineering
Cook emphasizes the importance of resilience engineering, which focuses on designing systems that can withstand and recover from failures. Unlike traditional reliability-focused engineering, resilience engineering prioritizes the system's ability to adapt and continue operating under stress. This approach is vital for product managers aiming to create robust and flexible systems. Resilience engineering involves continuous monitoring, rapid adaptation to emerging risks, and ongoing operator training to handle unforeseen challenges.
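To make this concrete (my own sketch, not something Cook prescribes), one common engineering expression of resilience is the circuit-breaker pattern: the system notices repeated failures, stops hammering the failing dependency so it can degrade gracefully, and periodically probes to see whether normal operation can resume.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: trips after repeated failures, then
    periodically allows a trial call through to test for recovery."""

    def __init__(self, failure_threshold=5, recovery_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failure_count = 0
        self.opened_at = None  # None means the circuit is closed (healthy)

    def call(self, func, *args, **kwargs):
        # While the circuit is open, fail fast until the recovery timeout passes.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.recovery_timeout:
                raise RuntimeError("circuit open: failing fast")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            if self.failure_count >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip (or re-trip) the breaker
            raise
        # Success: reset state so the system adapts back to normal operation.
        self.failure_count = 0
        self.opened_at = None
        return result
```

The design choice worth noticing is that the breaker does not try to prevent failure; it assumes failure will happen and shapes how the system behaves while it is failing.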
Success in Operations
A fascinating point Cook makes is that the real surprise is not the number of accidents, but rather how few there are. His research focuses on understanding why systems succeed despite their inherent risks. For example, Cook references various high-stakes environments such as surgical operating rooms and military cockpits, where the high tempo and complexity make failures seem almost inevitable. Yet, these environments often perform exceptionally well due to the resilience and adaptability of their operators.
Practical Implications for Product Managers
Understanding Complexity: Recognize inherent complexities and risks in systems. Prepare for and mitigate potential failures. Foster a culture of continuous learning and adaptation.
Implementing Defenses:
Invest in Both Technical and Human Components: Ensure systems have technological safeguards (backup systems) and well-trained personnel.
Design Robust Systems: Develop systems that can withstand multiple simultaneous failures through thorough testing and validation.
Continuous Monitoring and Adaptation: Implement ongoing monitoring, regular audits, stress testing, and scenario planning (see the monitoring sketch after this list).
Prioritize Education and Development: Emphasize continuous training and skill development to handle new challenges and technologies.
Continuous Adaptation: Stay updated with the latest technological advancements. Understand emerging risks. Develop proactive strategies to address new challenges.
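As one small illustration of what ongoing monitoring can look like in practice (a sketch under my own assumptions, with hypothetical names, not anything from Cook), here is a rolling error-rate monitor that flags when the recent failure rate crosses a threshold, so a minor degradation gets noticed before it can combine with other issues:

```python
from collections import deque
import random

class ErrorRateMonitor:
    """Rolling error-rate monitor over the last N operations."""

    def __init__(self, window_size=100, alert_threshold=0.05):
        self.window = deque(maxlen=window_size)
        self.alert_threshold = alert_threshold

    def record(self, succeeded):
        """Record one operation's outcome; return True if the recent
        error rate has crossed the alert threshold."""
        self.window.append(0 if succeeded else 1)
        if len(self.window) < self.window.maxlen:
            return False  # wait for a full window before judging
        return sum(self.window) / len(self.window) >= self.alert_threshold

# Simulated usage: operations fail ~8% of the time, above the 5% threshold.
monitor = ErrorRateMonitor(window_size=100, alert_threshold=0.05)
for _ in range(500):
    succeeded = random.random() > 0.08
    if monitor.record(succeeded):
        print("error rate above threshold -- time to investigate")
        break
```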
Final words
Reflecting on Richard Cook’s article and his talk, I found them to be invaluable resources for understanding the intricacies of complex systems. His analysis provides a framework for thinking about system failures in a more holistic and nuanced way. For product managers, this perspective is crucial for designing systems that are both innovative and resilient.
Incorporating Cook’s insights into our work can help us better anticipate and mitigate the inherent risks in complex systems. It is essential to remember that managing these systems requires not only technical expertise but also a deep understanding of human and organizational factors.
I highly encourage you to read Cook’s article and watch his talk. While I don’t post often, I aim to share more insightful content in the future. Stay tuned and subscribe :)
Article: How Complex Systems Fail
Talk: Richard Cook's Talk at Royal Institute of Technology