08/27/2010 03:21 pm ET Updated May 25, 2011

When Complex Systems Fail

As efforts progress to permanently shut down the oil well damaged by the explosion on the Deepwater Horizon oil rig, engineers, policymakers and others must now take a renewed look at the far-reaching consequences that can occur when complex, human-engineered systems fail. Whether the complex system in question is an oil rig in deep water, a nuclear power plant, a spacecraft or even a long-haul jet, engineers and others involved in design, regulatory or maintenance processes have a moral imperative to fully examine not only the development and operation of these complex systems, but also their "fault tolerance": a design's ability to continue to operate at a reduced level or safely cease operation even in the event of a predictable failure.

Tragedies like Deepwater Horizon, which take a toll on human life and result in catastrophic social, economic and environmental consequences, remind us of both the robust nature of these systems and of their vulnerability. It's precisely because of the overwhelming success and vigor of such systems that we often take their reliability for granted. Most of us rarely consider the safety of an oil rig until something goes awry and it makes the front page.

Likewise, we don't consider the vast efficiencies of modern transportation unless our aircraft gets rerouted and we miss that all-important connecting flight. Most of us don't think much about the power grid unless there is a systems failure and our lights don't respond when we flip the switch. Yet, on January 28, 1986, it was a systems failure that caused the Space Shuttle Challenger to explode 73 seconds into its flight, an incident that nearly brought an end to America's space program.

Two years after the Challenger disaster, the Aeronautics and Space Engineering Board stated that NASA's processes for analyzing failure modes had to include three elements: a comprehensive method to identify potential failure modes and hazards; a specific, quantitative methodology for identifying and assessing the safety risks; and a risk management process "by which the safety risk can be brought to levels (or values) that are acceptable to the final approval authority."

When we look at the Deepwater Horizon tragedy we can see how these steps resonate today -- by recognizing the necessity of establishing appropriate risk levels for the functioning of complex systems and for assuring appropriate certification and stringent quality assurance.

To maintain the public's confidence and trust, engineers and other technologists must remain vigilant and mindful never to test the boundaries of prudent risk management. One untimely decision -- just one misstep -- can change the course of many lives. Our world today is haunted by the specter of risk -- ranging from the technical to the economic to the geopolitical. That's why the time is right for a cross-disciplinary approach to reviewing risk-management processes as they relate to complex systems and sound engineering ethics.

In October, the American Society of Mechanical Engineers (ASME) will convene a task force of experts and authorities in risk management processes to begin exploring experiences across industries, lessons learned, best practices, R&D needs and ethical responsibilities associated with mitigating the consequences of complex system failures.

Assessing risk is a rigorous technical process that involves sophisticated methodologies. Engineers must be vigilant in analyzing and applying protocols and scenarios to thwart events that can cause an undesired consequence. Engineers possess the core competencies both to assess the technical specifications of the components used in complex systems and to ensure that those systems live up to the necessary standards for the life of the project.

All of us -- engineers, policymakers, the media and the public -- have a role to play in ensuring that proper practices are in place to minimize the risks inherent in many critical complex systems.

Thomas G. Loughlin
Executive Director, ASME