Peter B. Ladkin
Research Report RVS-RR-96-12
Abstract
I analyse the 'probable cause' of the 1979 Chicago DC-10 accident
using a minimal formalism, and find an omission. The omitted causal factor
is described in the body of the report. McDonnell Douglas's statement
barely addresses this factor. I conclude
that formalism helps in accident reporting by enabling simple consistency
and omission checks.
Accident reports in aviation
present carefully reasoned conclusions about causes and causal factors
contributing to the accident, as well as providing pro forma
details which may be useful in other contexts, say for statistical
investigations of accident types.
The reasoning in accident reports is informal. Could it help to
formalise this reasoning? I'll consider one famous example in detail.
First, the statement of 'probable cause'
of the Chicago-O'Hare 1979 DC-10 accident
(1):
- The National Transportation Safety Board determines that the
probable cause of this accident was the asymmetrical stall and
the ensuing roll of the aircraft because of the uncommanded
retraction of the left wing leading edge slats [...] and the loss
of stall warning and slat disagreement systems resulting from
maintenance-induced damage leading to the separation of the
No. 1 engine and pylon assembly at a critical point during
takeoff. The separation resulted from damage by improper
maintenance procedures which led to failure of the pylon
structure.
Let's analyse this statement with the method of
(2).
First, we list the crucial events, and denote each by a simple phrase.
Second, we determine all relations between
the events given by true assertions of the form:
'why ..... because .....'.
A Simple Example
Let's take a simple hypothetical example to illustrate the method.
Stage 1.
The crucial events are
- [1] the aircraft hit the ground
- [2] the aircraft stalled
Stage 2.
'Why did the aircraft hit the ground? Because
it stalled.'
I write this 'why?....because' relation as [2]~>[1].
Well, actually no, maybe not. If the aircraft was at 100 feet, this would
be true. But the aircraft was at 3000 feet.
'Why did the aircraft hit the ground? Because
it stalled and did not recover in time.'
We need to return to Stage 1 and modify.
Stage 1'.
The crucial events are
- [1] the aircraft hit the ground
- [2] the aircraft stalled
- [3] Recovery was not effected in time
Stage 2'.
[2] /\ [3] ~> [1]
(here, '/\' is formal notation for 'and')
Event [1] fulfils the FAA definition of accident. We have three events,
one being the accident, and the other two causally related to it (which
we determined by asking 'why?....because').
We observe that there were two causally determining events for [1], and
both of these events had to pertain for [1] inevitably to occur. Events
[2] and [3] are thus jointly necessary but not individually sufficient.
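The joint character of [2] and [3] can be checked mechanically in a toy setting. The following Python sketch is an illustration only and not part of the method of (2): it assumes a two-valued model of the 3000-feet scenario in which the aircraft hits the ground exactly when it stalls and recovery is not effected in time, and the names `hits_ground` and `sufficient` are hypothetical. It asks whether fixing some factors forces the outcome whatever the remaining factors do.

```python
from itertools import product

FACTORS = ["stalled", "no_timely_recovery"]


def hits_ground(stalled: bool, no_timely_recovery: bool) -> bool:
    # Toy assumption for the 3000-feet scenario: event [1] occurs only if
    # the aircraft stalls ([2]) AND recovery is not effected in time ([3]).
    return stalled and no_timely_recovery


def sufficient(fixed: dict) -> bool:
    """True if [1] occurs for every assignment of the factors not in `fixed`."""
    free = [name for name in FACTORS if name not in fixed]
    return all(
        hits_ground(**{**fixed, **dict(zip(free, values))})
        for values in product([False, True], repeat=len(free))
    )


stall_alone = sufficient({"stalled": True})
no_recovery_alone = sufficient({"no_timely_recovery": True})
both = sufficient({"stalled": True, "no_timely_recovery": True})
```

Under these assumptions `stall_alone` and `no_recovery_alone` are both false, while `both` is true, which is the content of [2] /\ [3] ~> [1].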
We critiqued the initial result, and went back to modify
the derivation. This revisiting is usual in formal methods, and could be
named the 'inevitable intertwining' or 'spiral'
(Note 1).
The final causal relation that we obtained in Stage 2' holds as well
for the airplane at 100 feet as for the airplane at 3000 feet. It is simply
that at 100 feet, no one expects a stall recovery to be effected in time.
Since we're 'expecting' it, does it need to be said? In an accident
report, yes. A clear statement could lead to useful research into stall
recovery in less than 100 feet. If it's not said, no one will remark on it.
Analysing the Probable Cause of the Chicago DC-10 Accident
First, a list of events mentioned in the statement of probable cause:
- [1] The accident ( = aircraft impacted ground and people on board died);
- [2] the roll of the aircraft;
- [3] the asymmetrical stall;
- [4] the uncommanded retraction of the leading edge slats;
- [5] loss of stall warning system;
- [6] loss of slat disagreement system;
- [7] separation of No. 1 engine and pylon assembly at critical point;
- [8] improper maintenance procedures.
Second, the apparent relationship between events as asserted in the
'probable cause' appears to be a complex causal chain of the form
- [8] ~> [7] ~> [5] /\ [6] /\ [4] ~> [3] ~> [2] ~> [1]
So, the accident report considers a 'probable cause' to be a causal
chain. It singles out this causal chain as the most important
interconnection of events. However, the stall warning
system and the slat disagreement system only provide indications to the
pilots of what is happening. Their loss ([5] and [6])
affects at most the pilots' behavior, and not directly
the control systems of the aircraft. They certainly play no direct role in
[3], [2] or [1]. Specifically, although
'why [3]? because [4] /\ [5] /\ [6]' is true, so is
'why [3]? because [4]'.
Therefore one could conclude that [5] and [6] are superfluous in
the statement of this causal chain, since if it is a correct
causal assertion, the following is also a causal chain leading to
the accident:
- [8] ~> [7] ~> [4] ~> [3] ~> [2] ~> [1]
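The two chains can be compared mechanically. The following is a minimal sketch, not part of the original analysis: it stores each 'why?....because' assertion as a mapping from an effect to the set of its stated joint causes, and collects every event that lies causally upstream of a given event. The names `full_chain`, `reduced_chain` and `causal_factors` are illustrative assumptions.

```python
def causal_factors(why_because: dict[int, set[int]], event: int) -> set[int]:
    """Return every event upstream of `event` in the why-because graph."""
    found: set[int] = set()
    pending = [event]
    while pending:
        current = pending.pop()
        for cause in why_because.get(current, set()):
            if cause not in found:
                found.add(cause)
                pending.append(cause)
    return found


# effect -> joint causes, read off
# [8] ~> [7] ~> [5] /\ [6] /\ [4] ~> [3] ~> [2] ~> [1]
full_chain = {7: {8}, 4: {7}, 5: {7}, 6: {7}, 3: {4, 5, 6}, 2: {3}, 1: {2}}

# the same chain with [5] and [6] dropped
reduced_chain = {7: {8}, 4: {7}, 3: {4}, 2: {3}, 1: {2}}

factors_full = causal_factors(full_chain, 1)
factors_reduced = causal_factors(reduced_chain, 1)
dropped = factors_full - factors_reduced
```

Here `dropped` comes out as the set containing 5 and 6: the reduced chain still links [8] to [1], and only the two warning-system events disappear.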
However, during the discussion, the report says:
- The simulator tests showed that, even with the loss of the
number two and number four spoilers, sufficient lateral
control was available from the ailerons and other spoilers
to offset the asymmetric lift caused by left slat retraction
at airspeeds above that at which the wing would stall. However
the stall speed for the left wing increased to 159 KIAS.
(KIAS denotes 'Knots Indicated Air Speed', i.e., the figure
displayed on the Air Speed Indicators in the cockpit.)
The report is saying explicitly that, at sufficient airspeed, the airplane
remained controllable after the slat retraction.
That entails that [4] did not inevitably result in [3].
This statement is simply inconsistent with the assertion of
'probable cause' (Note 2).
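The inconsistency can be made concrete with a toy model. This is a sketch under stated assumptions, not the report's aerodynamics: the only figure used is the 159 KIAS left-wing stall speed quoted above, the function name `left_wing_stalls` and the two sample airspeeds are hypothetical, and stalls with the slats extended are deliberately not modelled.

```python
LEFT_WING_STALL_SPEED_KIAS = 159  # figure quoted from the report


def left_wing_stalls(slats_retracted: bool, airspeed_kias: float) -> bool:
    # Toy assumption: with the left slats retracted ([4]) the left wing
    # stalls ([3]) exactly when the airspeed is below 159 KIAS.
    return slats_retracted and airspeed_kias < LEFT_WING_STALL_SPEED_KIAS


# Hypothetical sample airspeeds, chosen only to lie on either side of 159 KIAS.
SAMPLE_AIRSPEEDS_KIAS = (155, 165)

stall_when_slow = left_wing_stalls(True, 155)
stall_when_fast = left_wing_stalls(True, 165)

# [4] alone would be sufficient for [3] only if the wing stalled at every airspeed.
slats_alone_sufficient = all(
    left_wing_stalls(True, speed) for speed in SAMPLE_AIRSPEEDS_KIAS
)
```

In this model slat retraction produces a stall at the lower sample airspeed but not at the higher one, so `slats_alone_sufficient` is false: some further factor concerning airspeed is needed before [3] follows from [4].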
The solution is that something is missing from the causal chain
expressed in the 'probable cause' statement.
This omission is, however, contained clearly in the body of the report.
- The evidence was conclusive that the aircraft was being flown
in accordance with the carrier's prescribed engine failure
procedures. [...] Since the wing and engine cannot be seen
from the cockpit and the slat position indicating system was
inoperative, there would have been no indication to the
flight crew of the slat retraction and its subsequent
performance penalty. Therefore, the first officer [the
pilot flying] continued to comply with carrier procedures
and maintained the commanded pitch attitude [...] which
decelerated the aircraft towards V2, and at V2 + 6, 159 KIAS,
the roll to the left began. [...] There would be little or
no [impending-stall-indicating] buffet. [...] Since the
roll to the left began at V2 + 6 and since the pilots
were aware that V2 was well above the aircraft's stall speed,
they probably did not suspect that the roll to the left
indicated a stall. In fact, the roll probably confused
them, especially since the stick-shaker [a stall warning]
had not activated.
This says clearly that because the flight crew were unaware of the slat
retraction, they didn't know that the stall speed had increased, and
they flew the airplane 'in accordance with' procedures which
dictated a speed slower than the new increased stall speed. It was
thus inevitable that the airplane's left wing would stall. There was
no indication to the pilots of this impending stall because the stall
warning system was also inoperative. Had there been, one imagines that they
would have reacted immediately (the indications are that they were
excellent pilots, who the report says were flying exactly 'by the book')
and the airplane could have been controlled (the report has stated, above,
that the airplane was controllable, derived from simulator tests).
Hence the report says that the pilots' ignorance of the asymmetrical slat
condition and impending stall allowed the stall of the left wing to
take place. Thus an essential causal factor is missing from the
'probable cause' statement:
- [5] /\ [6] ~> [9] ~> [3]
where
- [9] pilots continued to fly the airplane at a speed below
the new left-wing stall speed
and the causal chain should read
- [8] ~> [7] ~> [5] /\ [6] /\ [4] ~> [9] ~> [3] ~> [2] ~> [1]
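The difference between the stated and the revised chain is exactly the kind of thing a simple omission check can report. The sketch below is an illustration under my own encoding, not a tool used in the analysis: each chain is written as a mapping from an effect to its joint causes, and the helper names `links` and `events` are hypothetical.

```python
def links(chain: dict[int, set[int]]) -> set[tuple[frozenset[int], int]]:
    """Turn an effect -> joint-causes mapping into a set of (causes, effect) links."""
    return {(frozenset(causes), effect) for effect, causes in chain.items()}


def events(chain: dict[int, set[int]]) -> set[int]:
    """All events mentioned in a chain, whether as cause or as effect."""
    return set(chain) | set().union(*chain.values())


# Chain as stated in the 'probable cause'.
stated = {7: {8}, 4: {7}, 5: {7}, 6: {7}, 3: {4, 5, 6}, 2: {3}, 1: {2}}

# Chain as revised from the body of the report, with [9] inserted.
revised = {7: {8}, 4: {7}, 5: {7}, 6: {7}, 9: {4, 5, 6}, 3: {9}, 2: {3}, 1: {2}}

new_events = events(revised) - events(stated)
unsupported_links = links(stated) - links(revised)
added_links = links(revised) - links(stated)
```

`new_events` contains only 9, and the single link of the stated chain that the body of the report does not support is the step from [4] /\ [5] /\ [6] directly to [3].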
So the logic of the report is faulty. The 'probable cause' statement
includes an incomplete causal chain. A simple semi-formal analysis of
the report itself, namely just asking what does it say were the
critical events and what does it say are their causal relationships,
exposes this incompleteness, and demonstrates the inconsistency in the
report itself.
Well, OK, an engineer might reply, this doesn't satisfy the logical
nit-pickers, but we can all figure this out from the report
for ourselves, so why worry?
There was considerable public interest at the time
concerning the engineering of the DC-10 because of the accident.
McDonnell Douglas issued a report (3)
in an attempt 'To Set The Record Straight':
- There is no point, as a rule as old as Aristotle tells us, in
debating a question that can be settled simply by examining
the facts. [...]
[The circumstances of the accident] gave rise to important
- to urgent - questions. [Questions follow.]
Naturally, properly, discussion of the DC-10 continued as
long as such questions remained unanswered. And not all
of them were answered quickly. [...]
The answers, when they emerged, were clear and conclusive.
They proved that the DC-10 meets the toughest standards of
aerospace technology.
They proved, too, that the Chicago accident did not result
from any deficiencies of aircraft design, and that steps
taken shortly after the accident had eliminated any possibility
of recurrence.
In a section entitled
The Basic Questions, they asked and answered:
- Why did a DC-10's pylon and engine separate from the wing at
Chicago?
[Because of a very large crack in the horizontal flange of the
pylon's aft bulkhead.]
- What was the origin of this crack?
[Damaged by improper maintenance procedures, which were
thereafter immediately 'banned by law' as soon as discovered.]
- Have changes in the pylon's structural design been ordered?
[No.]
- Is the pylon supported from the wing by a single quarter-inch bolt?
[No.]
- Why were DC-10s grounded?
[Pre-emptive prophylactic action because of a failure to detect such
cracks on other airplanes at the time of the accident, and subsequent
detection of such cracks.]
- Are the DC-10's hydraulics systems effective and safe?
[Yes.]
- Is there a problem with the DC-10's wing slats?
[No.]
- But weren't changes to the slats required after the accident?
[No. But stall-warning system changes were. They
'provide additional backup in the system [...]. The DC-10
stall warning system's "redundancy" - duplication to provide
back-up security - exceeds industry standards for transport
aircraft.']
- [Some questions about 'two other fatal DC-10 accidents in
1979 after the Chicago crash'.]
McDonnell Douglas clearly felt the need to clarify public perceptions
of the accident by enumerating and commenting on the facts. It is a
laudable goal, one which I support and which is supported by all the
engineers working within democratic societies whom I have ever met.
First, we can imagine that a clear, consistent, complete explanation to the
public of what had gone wrong, a goal of the NTSB, McDonnell Douglas,
and the airlines, could have followed directly and unambiguously from
the NTSB report without the intervention of McDonnell Douglas, had the
NTSB report conclusion been complete and had the report itself not
been inconsistent.
Second, McDonnell Douglas's 'Basic Questions'
generally follow the 'probable cause' statement of the NTSB report.
As factor [9] was not included in the 'probable cause' statement, so
it does not appear in the 'Basic Questions'. An answer is given,
however, namely that the stall warning system's redundancy "exceeds
industry standards for transport aircraft". We can conclude
- that the stall warning system redundancy did not suffice, since
the airplane remained flyable, but the pilots flew it 'by the
book' into a stall;
- that if the redundancy 'exceeds industry standards',
the industry standards do not suffice.
The NTSB in fact drew both these conclusions, even though they
do not explicitly pertain to the 'probable cause' statement.
The report's 'Safety Recommendations' (Class II, Priority
Action A-79-99) recommended that
- [...] if certification is based upon demonstrated
controllability of the aircraft under condition of
asymmetry, insure that asymmetric warning systems,
stall warning systems, or other critical systems
needed to provide the pilot with information essential
to safe flight are completely redundant.
(This is the clause of A-79-99 pertaining to the DC-10. The McDonnell
Douglas report states that the DC-10 was the only wide-body cabin
airliner to have demonstrated the ability to fly with asymmetrical
slats, which it did during certification.)
Simple formalisation has shown infelicities in the NTSB report of
its conclusions concerning the Chicago crash. McDonnell Douglas felt
the need for public clarification, and a clear statement of the facts.
However, full information on one necessary causal factor was not
provided in their clarification. This is consistent with the omission of
this factor from the statement of probable cause in the NTSB report.
We can imagine that public and professional discussion of the
accident, an essential factor for safety progress in a democratic
society, could have been aided by simple formalisation, which
demonstrates this omission.
This is not the only example to demonstrate advantages of this simple
formalism. In (2), I demonstrated using the same
technique that two necessary causal factors, the position of an earth
bank and the state of the runway surface, were omitted from the
'Causes' statement of the report on the A320 accident in Warsaw in
September 1993. Both of these are under direct control of the Polish
Authorities, yet recommendations to the authorities were only that the
system of collecting and distributing meteorological information
should be adapted to conform to ICAO Convention Annex 3 standards, and
that the bank should be described in the AIP Poland (the official
description of airports). One can thus observe from the formalisation
that the recommendation prima facie does not conform precisely
to all the necessary causal factors, and imagine that it would have helped
the goals of accident analysis to have addressed this apparent
disparity in the report itself.
I conclude that formalisation helps. It enables us to check not only
the events, but also the reasoning concerning those events and the
derivation of the conclusions and recommendations in an accident report.
References
(1): National Transportation Safety Board,
Aircraft Accident Report, American Airlines, Inc.
DC-10-10, N110AA, Chicago-O'Hare International Airport, Chicago, Illinois,
May 25, 1979, Report NTSB-AAR-79-17, NTSB, Washington, DC, 1979.
Also in (4).
(2): Peter Ladkin,
The X-31 and A320 Warsaw Crashes: Whodunnit?.
(3): McDonnell Douglas Corporation, The DC-10: A Special
Report, McDonnell Douglas Corp., 1979.
Also in (4).
(4): John H. Fielder and Douglas Birch, eds.,
The DC-10 Case, State University of New York Press, 1992.
(5): W. Swartout and R. Balzer, The Inevitable Intertwining
of Specification and Implementation,
Communications of the ACM 25(7):438-440, July 1982.
(6): B. W. Boehm, A Spiral Model of Software Development
and Enhancement, ACM SIGSOFT Software Engineering Notes 11(4):14-24,
August 1986.
(7): P. B. Ladkin,
Time for Causes, to appear in
http://www.rvs.uni-bielefeld.de/~ladkin/ in September 1996.
Notes
(Note 1): After Swartout and Balzer (5) and Boehm (6), respectively.
(Note 2): This provides yet another reason why the 'causes' relation
cannot be identified with the temporal logic 'leads to' relation.
See (7).