Avoiding catastrophe: the lessons of Deepwater Horizon

The Deepwater Horizon fire, 21st April 2010. Photo: Deepwater Horizon Response via Flickr (CC BY-ND).
We must coldly examine how inherently dangerous systems work and how they fail, writes Earl Boebert, and then apply those insights to reducing the risk of failure through systems design, regulation, and education. That examination must apply the most modern and effective analytic tools. To do otherwise is to almost guarantee a repeat catastrophe.

A recent film casts the tragic fate of the Deepwater Horizon, her crew and the Gulf of Mexico as the result of a confrontation between two individuals that the wrong person won.

What actually occurred was a classic 'emergent event', arising from the interaction of multiple decisions and actions, the majority of which took place weeks or months before the events of the film began.

Taken out of context, each factor and its associated risk appeared innocuous; taken together they generated a catastrophe.

If we are to recognize and avert such events before they emerge, we need to learn from those that have occurred. Before we can intelligently devise technologies, procedures, and regulations, we must ensure that our analyses are sound. To do so requires that we cope with the inherent complexity of emergent events.

Abstract models and analysis

Analysis deals with complexity by constructing abstract models; the nature and limitations of such models were succinctly stated by Alan Turing in the last paper he published:

"This model will be a simplification and an idealization, and consequently a falsification. It is to be hoped that the features retained for discussion are those of greatest importance in the present state of knowledge."

It is now common knowledge that abstract models omit some things and tidy up others, but the phrase "it is to be hoped" shows a depth of understanding rare even today.

It recognizes the fact that there are no objective criteria for determining what is put into and what is left out of an abstract model, just the hope that the arbitrary set of "features retained for discussion" is one that provides useful insight into not only the individual factors of an emergent event but also how they interacted to cause the event to emerge.

Barrier models

The 'barrier' or 'Swiss Cheese' model is pervasive in accident analysis. It models a failed system as a set of preventative barriers with failures represented as holes in them; when the holes line up, a 'hazard', or potential accident, is permitted to pass through and becomes real.
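For readers who find code clearer than prose, the mechanism can be reduced to a few lines. This is only a sketch: the barrier names and their pass/fail values below are invented for illustration, not drawn from the official reports.

```python
# A minimal sketch of the 'Swiss Cheese' barrier model. Barrier names and
# their pass/fail values are illustrative only.

def hazard_becomes_accident(barriers):
    """The hazard turns into an accident only if every barrier has failed,
    i.e. the 'holes' in all the slices line up."""
    return all(failed for _name, failed in barriers)

barriers = [
    ("cement job",        True),   # True means this barrier failed (a 'hole')
    ("pressure test",     True),
    ("well monitoring",   True),
    ("blowout preventer", True),
]

if hazard_becomes_accident(barriers):
    print("Accident: every barrier failed and the holes lined up.")
else:
    print("Near miss: at least one barrier held.")
```

Flip any single value to False and the 'accident' vanishes. The sketch also makes plain that the model says nothing about why the holes lined up at once, a point that matters for the criticisms discussed below.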

Another form of barrier model is the so-called 'bow tie' model, which enumerates not only barriers which could prevent an accident but also those which could contain its effects. This model was adopted by the US Chemical Safety Board in their analysis of Deepwater Horizon.


Other reports on the event, official and unofficial, did not adopt models but were content to simply enumerate causal factors without formally treating their interaction with each other.

Barrier models have been criticized, most notably by Professor Nancy Leveson of MIT, for being simplistic, focused on blame or exoneration, and giving excessive importance to the actions of crews - the latter misdirection being reinforced by the film.

A barrier model was used in BP's internal report to argue that they were not involved in the accident: they listed eight failed barriers, all of which were the responsibility of other participants.

Control system model

Leveson proposes an alternative model of an emergent event as the failure of a feedback control system, one that receives information and initiates action. Leveson's model has been used to analyze the Fukushima nuclear powerplant disaster, a high-speed train accident in China, and the Korean ferry accident.

It has also been used to analyze the safety of nuclear powerplant controls and the dynamic positioning system of an offshore drilling support vessel. Most importantly, it has formed the basis of a process to 'design in' systems safety at the earliest stages of designing high-consequence systems.
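A toy sketch of the feedback-control view, with invented names, readings, and actions, may help make the idea concrete; in this view an accident corresponds to a controller acting on a picture of the process that no longer matches reality.

```python
# A toy sketch of safety viewed as a feedback control problem, in the spirit
# of Leveson's model. The class, the readings, and the action strings are
# invented for illustration; they are not drawn from the official record.

class Controller:
    """A controller acts on its *model* of the process, not on the process
    itself; trouble emerges when that model drifts from reality."""

    def __init__(self):
        self.believes_well_is_stable = True   # the controller's process model

    def receive_feedback(self, pressure_reading, expected):
        # Feedback channel: interpret the reading against expectations.
        # If an anomaly is explained away, the process model stays wrong.
        self.believes_well_is_stable = (pressure_reading <= expected)

    def control_action(self):
        # The action follows from the (possibly mistaken) process model.
        if self.believes_well_is_stable:
            return "carry on with normal operations"
        return "shut in the well"

rig = Controller()
rig.receive_feedback(pressure_reading=1400, expected=0)  # anomalous reading
print(rig.control_action())   # a sound model yields "shut in the well"
```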

We used the principles of this model in our book, but the limitations of evidence, such as the almost complete absence of event data recording and the refusal of key personnel to testify, prevented us from adopting it in full.

The illustration (above right) shows the overall model we used: a hierarchy of control systems with BP corporate at the top, BP's Exploration and Production office (known to participants as 'Town', and an entity omitted from the film) as intermediary between corporate and the Deepwater Horizon, and the Macondo well at the bottom.

The regulatory agency acted essentially as a bystander. The control mechanisms between BP corporate and Town, and between Town and the rig, were administrative in nature. The control of the well by the rig was technological.
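Expressed as code, again purely as an illustration of the structure just described rather than anything from the record, the hierarchy looks roughly like this:

```python
# A rough sketch of the hierarchy of control systems described above.
# The class is invented for illustration; the levels and the nature of
# each control mechanism are paraphrased from the text.

from dataclasses import dataclass
from typing import Optional

@dataclass
class ControlLevel:
    name: str
    controls: Optional["ControlLevel"]   # the level this one directs
    mechanism: str                        # how that control is exercised

well      = ControlLevel("Macondo well", None, "n/a (the controlled process)")
rig       = ControlLevel("Deepwater Horizon rig", well, "technological")
town      = ControlLevel("BP Exploration and Production ('Town')", rig, "administrative")
corporate = ControlLevel("BP corporate", town, "administrative")
regulator = ControlLevel("Regulatory agency", None, "acted essentially as a bystander")

level = corporate
while level.controls is not None:
    print(f"{level.name} --[{level.mechanism}]--> {level.controls.name}")
    level = level.controls
print(f"{regulator.name}: {regulator.mechanism}")
```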

Insights from the model

The disaster has been characterized as a 'normal accident', one that is the unavoidable result of the operation of a complex system. In the case of Deepwater Horizon, this observation is true but incomplete - that which is 'normal' for a corporation organized like BP may not be so for a corporation organized like Exxon Mobil.

BP was well-known in the industry as an outlier with regard to its tolerance for risk and its management system, which resembled a hedge fund more than an engineering company. By looking at its corporate management system as a control system we see how it emphasized financial efficiency, imposed severe cost constraints on units such as Town, and took a 'hands off' attitude toward how they operated.

Town was a severely understaffed organization that was unable to respond adequately to a difficult well and the pressure to move off Macondo to meet an important regulatory deadline. The shortcomings of the Houston operation made it unable to properly interpret information and issue sound instructions to the crew on the rig.

The resulting miscommunication, confusion and improvisation on the part of the crew during the last 48 hours then led them to misinterpret signals from the on-board control technology that connected the rig to the well. Whatever errors the crew made were facilitated by inadequacies in that control technology.

Those inadequacies included the misleadingly-named blowout preventer, which was intended to act both as a control mechanism during normal operations and a device of last resort in abnormal circumstances. Almost uniformly, previous analyses focused on the failure of the device to stop the blowout at the last moment without examining how other shortcomings contributed to the blowout happening in the first place.

None of the systems inadequacies described above are readily visible in a barrier model or an analysis which simply enumerates factors without examining how they interact.

Significance

Whether we like it or not, complex, high-risk / high-consequence systems like offshore drilling exist, and massive financial interests will ensure that they are not going away any time soon. Their safety results from the interaction of technology, human action, and organizational dynamics, and that interaction is far from simple.

If we as a society are to intelligently cope with them we must set aside glib generalizations, ideological preconceptions and Hollywood caricatures. We must coldly examine how such systems work and how they fail, and then apply those insights to reducing the risk of failure through systems design, regulation, and education.

That examination must apply the most modern and effective analytic tools. To do otherwise is to almost guarantee a repeat catastrophe.


Earl Boebert is a retired Senior Scientist of Sandia National Laboratories. He has participated in ten National Research Council studies and has been named a National Associate of that organization. He and James Blossom have spent the last five years studying the Deepwater Horizon event and the results of their work have been published by Harvard University Press as 'Deepwater Horizon: A Systems Study of the Macondo Disaster'.