Peter B. Ladkin

Article RVS-RR-96-16

: Abstract: The purpose of this note is threefold: to provide factual information on the AeroPeru 603 accident on October 2, 1996; to provide comparison and draw some conclusions on the basis of that information; and to provide a history of what was said about the crash, when, and by whom. The perceived safety of air travel depends in large part on accurate reporting in news of accurate statements by those with accurate knowledge of what happened. These three conditions are not always fulfilled, as shown in this note.

Synopsis of the Latest Information
Aviation Journal Correspondence
The Dual Purposes of This Note
Pertinent Factual Reports
The Most Reliable News Information
Technical Details and Assessment
A History of News Reports
Information from the CVR and DFDR
The Argument Against `Computer Failure' Being the Sole Cause
Information on the Aircraft and Other Comments
Comparison with other Accidents in which a Maintenance-Induced Common Failure Mode Was Implicated
The Bottom Line
References

Synopsis of the Latest Information

18 January 1997

Reuters reported that a `special commission' of the Peruvian transport ministry had issued a report which said

: According to the evidence that has been found, it has been concluded that staff cleaning the lower part of the aircraft did not remove protective adhesive tape when they finished their work and so the sensors remained obstructed.

According to Reuters, the report said that the obstruction of the static ports `explained' the erroneous and confused information about altitude and airspeed indicated by the instruments after takeoff.

Aeroperu pilots (union pilots?) apparently disputed this attribution of causality, claiming the aircraft would have crashed immediately on takeoff rather than half an hour later.

5 December 1996

Edward H. Phillips reported (8) in Aviation Week that the NTSB has issued a safety recommendation concerning static port covers. It recommends the FAA

: [...] mandate use of conspicuous covers over static ports during aircraft cleaning operations to help ensure their removal before flight and thus prevent erroneous airspeed and altitude indications.

The Boeing B757 maintenance manual

: [...] calls for the taping of moisture-resistant paper over the ports, according to the Safety Board. Both McDonnell Douglas and Airbus Industrie provide brightly colored covers that are installed over static ports during cleaning of an aircraft. These covers also have warning flags to remind personnel to remove them before flight, according to the NTSB.

Phillips reports that the DFDR and CVR data indicate that airspeed and altitude readings were normal during the takeoff roll, but that during initial climb airspeed readings were `too low' and altitude readings `increasing too slowly'. The wind shear warning triggered, even though winds were calm and there was no inclement weather. The aircraft climbed to 13,000 ft, then descended to return to Lima. Captain's airspeed and altitude were both reading high, and FO's airspeed low, apparently (I don't know how this was calculated). The stickshaker and overspeed warning were activated, while the captain's airspeed read more than 350kt. At impact, captain's IAS was about 450kt and ALT 9,500, according to an unnamed NTSB official.

12 November 1996:

James T. McKenna reports the sequence of events in Aviation Week and Space Technology on 11 November 1996 (7). According to "officials familiar with the investigation", about 5 minutes after takeoff, the FO declared an emergency, telling Lima Tower that they had no airspeed or altitude indications. The Captain circled the 757 to the west, the tower controller providing altitude and position readouts to the pilots from the tower's radar (recently returned to service after a lengthy outage). However, the tower's altitude readouts come from the aircraft's Mode C transponder, which gets its altitude information from the aircraft's altitude system (i.e. the static ports). At least some of these ports appear to have been taped over during maintenance and the tape not removed (see below).

Apparently, the pilots "did a good job, despite stickshaker, overspeed and other warnings", and eventually determined that the static system was yielding false readings. They used the radio altimeter (normally used only for landings) to return to Lima at 1,500 ft. Nearer to the airport, they were apparently distracted by a ground-proximity (GPWS) warning. The FO queried the controller, who responded that the aircraft was indicating 9,000 ft. Believing that information, the Captain started a descent. The aircraft skimmed the ocean surface and one engine failed (investigators presume from water ingestion). The Captain tried to continue flying, but a wingtip apparently hit the water and the aircraft cartwheeled and was lost.

If this account is veridical, they were desperately unlucky, due to one unfortunately mistaken call by a presumably very tired pilot, after a competent job of flying the aircraft on severely misleading air data. The accident was CFIT (Controlled Flight Into Terrain). That would make it B2 in my classification scheme (Section The Argument Against `Computer Failure' ...... below). The `pilot behavior' mentioned thereunder would be the action of initiating the descent taken on the basis of the false altitude report.

7 November 1996:

The following note, which appeared in RISKS-18.59, summarises what is reliably known as of November 7, 1996.

: Date: Fri, 8 Nov 1996 01:52:12 +0100
From: Peter Ladkin
Subject: Careful AeroPerusal (Ladkin RISKS-18.51, PGN RISKS-18.57)
[....] The facts are, from a source at the NTSB as well as information about the B757 static system [...] [see below. PBL]
a) *Masking tape*, not duct tape or `Remove Before Flight' Covers, was covering the *left-side static ports* on the aircraft [NTSB]; (there's no way to attach covers: the ports are flush with the fuselage [B757 P/S system diagram]);
b) Static ports to all three independent pitot-static systems are on both the left side and the right side of the fuselage, including those for the electro-mechanical backup: both static ports and pitot in each system are interconnected by an open tube [B757 pitot-static system diagram];
c) the right-side static ports have not been recovered; it is therefore not known whether masking tape was also covering these [NTSB];
d) blockage of all the left static ports would cause some degradation of *all* the air data in both EFIS-displayed P/S systems plus the backup; blockage of the right-side static ports as well would cause worse degradation [general aero and system knowledge]; this is thus a *common failure mode* of all three independent P/S systems: both primaries and backup.
e) the Peruvian Transport Ministry said that this obstruction of the sensors "could explain the erroneous and confusing altitude and speed information received by the pilots after takeoff" [NTSB source, quoting an official statement]. This contrasts with the Minister's reported statement on October 2 which seemed to the press to ascribe computer problems as the cause.
f) Putting masking tape on the ports when cleaning the aircraft is a normal maintenance procedure [NTSB]; however, leaving it on is certainly not! I don't know whether after such a procedure the aircraft has explicitly to be `signed off' after inspection by a qualified inspector, who would then make a `returned to service' entry in the maintenance logs. This is so for most procedures which render an aircraft temporarily unairworthy (as putting tape on the static ports does). This is a question still to be answered here, and I'm sure there are many readers who could do so;
g) A further question, posed by Jim Wolper, is why the air crew did not notice the tape on static ports on the pre-flight inspection. It was dark, but nonetheless on most airplanes visually checking the static ports is an explicit item on the pre-flight inspection check list. The B757 body is relatively high off the ground, but nevertheless I should have thought that tape on the ports would be clearly visible.
h) The CVR and DFDR have been recovered, examined in the NTSB Laboratories, and the data returned to Peruvian colleagues [NTSB];
In RISKS-18.51, I expressed extreme scepticism that computer failure could be the sole cause of any B757 accident (except for one possibility which has never happened to any aircraft). It should now be clear that the recently-discovered failure mode under discussion is (a) not computer-related, and (b) deemed sufficient by itself to cause the known effects and history of the flight. This does not of course rule out other simultaneous failure modes that are computer-related. We still await the CVR and DFDR data. [...]

Aviation Journal Correspondence

Professional pilots and aviation engineers often comment in professional journals and magazines about accidents. The views they profess are not always well-balanced or based on the most accurate information, but they can introduce what-if scenarios, and emphasise points, that official investigators would be unable to make because of the constraints of their role.

Robert M. Jenney suggested (9) that an angle-of-attack (AOA) indicator could have allowed the crew to avoid the accident. Since the crew descended under control into the ocean apparently believing they were at over 9,000 ft., it is difficult to see what relevance AOA might have had. One might surmise that flying with reference to AOA might have made the crew's initial 30 mins of flying a little easier, and therefore the effects of fatigue a little less. But one must balance this against the added cognitive burden of having yet another instrument on board. Jenney's suggestion that this might have helped in the Birgenair case seems more apposite - but given the basic crew mistakes, and their inability to use the alternative instruments they already had, one could question whether yet another problem indication would have brought home to them what was going on.

Gary T. Dye (10) made three suggestions: that, in the absence of accurate static pressure, maintaining a radio altimeter reading of (say) 1,500 ft. over water and flying with reference to attitude and power settings would have been possible; that dumping cabin pressure would make the cabin altimeter and VSI emergency [altimeter] substitutes; and, alternatively, the glass could be broken on a lesser static system [instrument] to provide an emergency static port.

W. E. Kelly (11) countered Dye's third suggestion with an anecdote, showing how countering one failure might induce another:

: When I tried to break the glass on the VSI, the ax [sic] bounced off and went through my only horizon on the airplane - and it was a dark night. My boss couldn't believe my apparent stupidity and tried to break the glass by holding a VSI in one hand and using a hammer with the other. He failed. Pressure instruments necessarily have very tough glasses.

Jack Karran (12) noted the contribution of secondary radar to the accident.

: I understand that, following a request for position and height, air-traffic control informed the flightcrew [sic] that it was at 19,000ft (5,800m) as the 757 hit the sea. There should be a requirement for being able to test static and dynamic air-pressure sensors before take-off. The aircraft's instrumentation and systems are compromised if there is a fault, as is secondary radar - with all that implies.

Two comments that don't affect Karran's argument: the term `height' is not used in aerospace - he means `indicated altitude'; his figure of 19,000 ft conflicts with the accounts of Phillips and McKenna (see above, (7), (8)).

C. H. Morshead (13) thinks that testing pitot-static sensors is inadequate unless there is a full system test, which

: requires disturbance of the system by the installation of adaptors over pitot tubes and static ports, a possible source of blocked ports if the adaptors are not removed before flight.
: A sensor test would only confirm the co-operation of individual instruments and, as such, would not have identified the problem which triggered the confusion leading to the [accident, and] would also probably require some form of disturbance to the system - for example, the isolation of the sensor from the system, to prevent back-leakage out of the ports. This, again, provides a potential source of errors.

He recommends checking ASI on the runway, and that the VSI operates in the correct sense and with appropriate readings on initial climb-out. Cross-checking the ASI, VSI and altimeters operating off different circuits will indicate whether a particular circuit is malfunctioning, he says, or if all are affected. Pitot system problems will then result in an aborted take-off, and static-system problems in a circuit-to-land.

Dye's suggestion to use the radio altimeter would certainly have avoided the decision leading to the descent into the ocean, which was based on the crew's (and Lima tower's) false indications of an approximately 9,000 ft altitude from the pressure instruments. A radio altimeter is a completely independent radar-like system. I do not know if the accident airplane was equipped with a radio altimeter.

Dye's second suggestion, to use the cabin altimeter, is possible provided one knows one has a static system problem, and at approximately which altitude one is flying. At night, with ground fog below, there is a lack of visual reference, and cognitive capabilities are demonstrably impaired at 13,000 ft, for example. However, even with cabin pressure, the cabin altimeter will show a positive pressure differential (`cabin altitude'), and so cross-checking against the cabin altitude will enable one to determine if one is flying high or low. I don't know what the cabin altitude at 9,000 ft in a B757 normally is, but it certainly isn't zero, and the crew could have cross-checked.

Karran's observation that the static ports are a single point of failure not just of the pilot's pressure instruments, but also of the secondary radar, is accurate. Dye shows how there exist work-arounds - that the aircraft systems provide the lost information, in a degraded form, redundantly. Karran recommends an extra system to check the failure point. Morshead points out that devising such testing systems could introduce new potential failure modes. This encapsulates a common type of engineering discussion about critical systems: engineering redundant systems so that if one fails, others may be used; introducing test systems or interlocks against certain failure modes almost inevitably introduces new failure modes of the test or interlock systems themselves, and these failure modes may themselves present a hazard. Finally, from the available data I cannot tell what influence performing Morshead's ASI/VSI cross-checks would have on the course of events.

The Dual Purposes of This Note

This was the third accident to a B757 aircraft within 10 months, after a nearly 13-year perfect safety record. Immediately, the Peruvian Transport Minister was reported by The Times and CNN to be focusing on the aircraft's computer systems as the cause (thanks, presumably, to some words the pilot had spoken to ATC). This was well before investigators knew exactly where the wreckage was (it was in 500 ft of water off the coast near Lima, Peru), or before the CVR and DFDR had been recovered and analysed.

The only way I could see that a computer failure could be the sole cause of a B757 accident is if the FADECs caused both engines to flame out at the same time, leaving the pilot no choice but to descend earthwards and fail to make an airport landing. According to reports, this hadn't happened: the pilot had reported a loss of either air data or attitude information or both, and had flown the aircraft for a while before disappearing from radar after 28 minutes of flight.

I am not comfortable with any attempt to `blame the computers' when the faults lie elsewhere or are shared between computer behavior and human, mechanical or environmental failures. I am also not comfortable with speculating about accident causes on the basis of too little information. The questions for me are thus: when is it appropriate to speculate?; when is it appropriate to join in a public discussion which involves speculation?; and how then to join in? These issues had been discussed in recent issues of RISKS, and I made a collection and summary available through this compendium.

I had been disturbed by some of the initial responses to the Birgenair accident in February 1996. The German Transport Ministry initially reported that the aircraft was not properly insured, and had not been cleared to land in Germany. The first claim was false: the aircraft was insured. As to the second, as I understand the details, the aircraft was a substitute for one whose landing had been cleared, the operator was working on obtaining landing clearance for this substitute at the time of takeoff, and the aircraft planned to land in Gander, in Newfoundland, to refuel and pick up its landing clearance, which it presumed it would by then have obtained. All completely in order. The Ministry's statement was thus doubly misleading. There is a further sense in which it was misleading: neither insurance nor landing permission has the slightest connection with why the aircraft would have crashed. In turn, the German pilots' union, Cockpit, claimed that the pilots were not type-rated in the aircraft. But the union knew that they had been initially scheduled to fly a B767 and that type-ratings apply to both B757/767 aircraft. In short, the union had claimed publicly something which it probably knew very well was false (if I knew it, why didn't they?). I hope readers understand why I was disturbed.

Now to AeroPeru. Authoritative sources had speculated: I felt they had given a demonstrably mistaken premature attribution of cause, and that this therefore must be refuted, from the armchair if need be. I produced an argument that computers could not be the sole cause of a B757 accident except for the one possibility, which was published in a RISKS-18.51 note. The argument is reproduced below in the Section The Argument Against `Computer Failure' Being the Sole Cause, along with commentary added from the further information that we now have.

My answers to the questions of public discussion were thus: it is appropriate to speculate when all parties to a discussion are devoid of much factual information but it is nevertheless important to join such a discussion; it is appropriate to join in a speculative discussion in order to reduce the plausibility of implausible scenarios which have been given the weight of some authority; and one joins in such a discussion by analysing the possible scenarios. In other words, I think it completely appropriate to try to build something like a `fault tree' (1), which can in principle be done purely from knowledge about the aircraft type and its environment, and to discuss this `fault tree'; and then to use the facts to traverse and refine the tree as they come in. The only reservation is that fault trees have so far been used in engineering essentially only for the nuts-and-bolts. But there's no reason in principle why they cannot involve required or appropriate operator actions as well.
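To make the idea of traversing and refining a tree "as the facts come in" concrete, here is a minimal sketch in Python. It is only an illustration: the shape of the tree and the leaf names (`fadec_dual_flameout`, `air_data_computer_fault`, `static_ports_blocked`, `crew_descends_on_false_altitude`) are assumptions chosen from events mentioned in this note, not the classification used in the RISKS-18.51 argument. Leaves whose truth is not yet established evaluate to `None`, so the sketch uses a three-valued extension of the usual Boolean AND/OR gates rather than plain Boolean logic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    gate: str = "LEAF"  # "AND", "OR" or "LEAF"
    children: List["Node"] = field(default_factory=list)


def evaluate(node: Node, facts: Dict[str, bool]) -> Optional[bool]:
    """Return True/False when the known facts decide the node, None when still open."""
    if node.gate == "LEAF":
        return facts.get(node.name)  # None if nothing is established yet
    results = [evaluate(child, facts) for child in node.children]
    if node.gate == "OR":
        if any(r is True for r in results):
            return True
        return False if all(r is False for r in results) else None
    if node.gate == "AND":
        if any(r is False for r in results):
            return False
        return True if all(r is True for r in results) else None
    raise ValueError(f"unknown gate {node.gate!r}")


# Hypothetical tree, for illustration only.
TOP = Node("loss_of_aircraft", "OR", [
    Node("fadec_dual_flameout"),
    Node("descent_on_false_air_data", "AND", [
        Node("unreliable_air_data", "OR", [
            Node("air_data_computer_fault"),
            Node("static_ports_blocked"),
        ]),
        Node("crew_descends_on_false_altitude"),
    ]),
])

# With no facts, nothing can be concluded.
print(evaluate(TOP, {}))

# Facts as reported by November 1996 (the last one only if McKenna's account holds).
facts_nov_1996 = {
    "fadec_dual_flameout": False,
    "static_ports_blocked": True,
    "crew_descends_on_false_altitude": True,
}
print(evaluate(TOP, facts_nov_1996))
```

In this toy tree the top event is already decided by the reported facts even though `air_data_computer_fault` is still unknown, which mirrors the point that a non-computer failure mode can be sufficient by itself without ruling out simultaneous computer-related ones.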

But a caveat: if one makes assessments on partial information, it is all but inevitable that one will sometimes be simply wrong. I thus hold it as incumbent on assessors to indicate as well as possible how and why they might be wrong. And I hold it as imperative to distinguish between incontrovertible veritude (facts and valid reasoning from facts) and those suggestions which are possibilities only.

The purpose of this note is then to track reliable information about the accident as it arrives; and to track also the press reports and other commentary as they are published. I hope thus to track explanations of the crash; dually to give readers insight into the social process of discussing accidents. My own bias is towards determining the facts and analysing the causality of the accident; towards making the most objective assessment possible when needed in states of partial information; and assessing the reliability of such an assessment as accurately as possible.

Pertinent Factual Reports

The BBC World Service on Thursday 3 October, 1996 reported an accident to Aeroperu 603, a B757, on Wednesday 2 October. The BBC report said that the aircraft had mechanical problems, that an oil slick had been found in the ocean near where the plane crashed, and that an official had claimed that the cause of the crash was `computer failure'. Other news organisations such as The Times and CNN reported similarly, but in more detail.

CNN reported on October 20, 1996 that Divers recover black box from Peru crash. Just one, they said, but did not say whether the DFDR or CVR. An NTSB source reported privately on 7 November, 1996 that the recorders (plural) had been examined in the NTSB laboratories and the data given to the Peruvian authorities.

On November 5, 1996, CNN reported Duct tape blamed in Peruvian plane crash: that `crucial sensors' had been covered over during cleaning and polishing of the aircraft, and that workers had apparently forgotten to remove the covering. (`Duct tape' is wrong. According to an NTSB source, it was masking tape. This mistake was also propagated the day before by AP, reported by CNN in Report: Duct tape caused Peru jet crash.)

According to UPI on November 6, 1996, a statement released by Aeroperu said that investigators had managed to recover pieces of the aircraft where the static ports are located, "which will help the investigating committee determine one of the possible causes of the crash." UPI also said that a Transport Ministry statement confirmed that part of the fuselage with three sensors had been found, covered by "adhesive tape of the kind used to block the ducts when cleaning the aircraft." The Transport Ministry statement also said that "The obstruction of these sensors could explain the erroneous and confusing altitude and speed information received by the pilots after takeoff", according to UPI.

(UPI also reported the pilot's words from before the crash as "The computers have gone crazy". This presumably links with the statement by the Peruvian Transport Minister, reported by CNN on October 3 in Computer Failure Puzzling in Peruvian Crash, that "We have to find out why the computers went crazy".)

The Most Reliable News Information

A reliable source reports privately also on 7 November 1996 that the Peruvian statement says that the left side static ports have been raised to the surface and were found "blocked" with masking tape, not duct tape, consistent with the procedure used when maintenance personnel polish the airplane. Investigators have been unable to find and recover the right side static ports. According to this source, the statement also says that "The obstruction of the static ports could explain the erroneous and confusing information received by the flightcrew regarding airspeed and altitude." (confirming UPI's wording above).

Technical Details and Assessment

The left side static ports of the accident aircraft were found blocked with masking tape (not `duct tape' as reported by CNN or `Remove Before Flight' covers as suggested by PGN in a RISKS-18.57 note). This would cause static system unreliability: in particular, an impairment of altitude and vertical-speed information. This in turn could cause difficulties in maintaining accurate pitch.

High-performance turbine aircraft are very sensitive to precise pitch, and they can quickly speed up to above `Never-exceed Speed', which overstresses the aircraft structurally and aerodynamically and can lead to structural failure or loss of control or both. For example, the phenomenon of `jet upset' first occurred on February 12, 1963 to a B720B, a 707 derivative, out of Miami. No one survived. Two months later, a United jet suffered a similar sequence of events but recovered. The phenomenon was investigated, and procedures to avoid it were developed, in which all pilots were trained. After a few years such upsets ceased happening with any frequency, except in business aviation :-(. For some history, see Chapter 9 of (2).

The reader may like to consult the schematic diagram (JPEG, GIF) of the B757 pitot-static system to picture the description which follows. The Boeing B757 has three independent pitot-static systems: two pass through the Left, respectively Right, Air Data Computers to the captain's, respectively first officer's, EFIS (CRT) displays, and the third is a traditional electro-mechanical backup. In case of air data problems with either EFIS, the EFIS displays may be switched to read from the other Air Data Computer (captain's from RADC, F/O's from LADC). The center autopilot, which normally gets data from the LADC, will in this `alternate' mode obtain air data instead from the RADC. (It was the failure to switch to `alternate' after the captain's air data was discovered to be faulty that led the center autopilot, which was switched on after the discovery of faulty captain's air data, to direct the Birgenair aircraft into a stall. Had `alternate' been used, the autopilot would have obtained correct air data from the RADC.)

A single pitot-static system consists of a pitot (pointing directly forwards, to measure air pressure in the direction of flight) and two static ports (small precision-machined openings, one mounted on each side of the fuselage flush with the fuselage). The static ports measure ambient air pressure. Airspeed data is obtained from the difference between pitot pressure and static pressure; altitude is obtained by measuring ambient air pressure direct from the static port (`static pressure') and correcting for known atmospheric pressure conditions; vertical speed (`rate of climb' or `rate of descent') is obtained by measuring the rate of change of `static pressure'. The textual description of the Air Data System from the B757 Operating Manual may be of interest.
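The three derivations just described can be sketched numerically. The following is a simplified illustration, not the B757 Air Data Computer's algorithm: it assumes the International Standard Atmosphere (troposphere only) for the pressure-to-altitude conversion and uses the low-speed incompressible relation for airspeed, whereas real air data computers apply compressibility and calibration corrections. All function names are hypothetical; units are SI (pascals, metres, metres per second).

```python
import math

# International Standard Atmosphere constants (troposphere); assumed model.
P0 = 101325.0       # sea-level pressure, Pa
T0 = 288.15         # sea-level temperature, K
LAPSE = 0.0065      # temperature lapse rate, K/m
RHO0 = 1.225        # sea-level air density, kg/m^3
EXPONENT = 5.25588  # g*M/(R*LAPSE) for the ISA troposphere


def isa_pressure(alt_m):
    """Ambient (static) pressure at a given altitude in the standard atmosphere."""
    return P0 * (1.0 - LAPSE * alt_m / T0) ** EXPONENT


def pressure_altitude(static_pa):
    """Altitude inferred from static pressure alone (the altimeter's job)."""
    return (T0 / LAPSE) * (1.0 - (static_pa / P0) ** (1.0 / EXPONENT))


def indicated_airspeed(pitot_pa, static_pa):
    """Airspeed from the difference between pitot and static pressure.

    Incompressible approximation; a negative difference is clamped to zero.
    """
    q = max(pitot_pa - static_pa, 0.0)
    return math.sqrt(2.0 * q / RHO0)


def vertical_speed(static_prev_pa, static_now_pa, dt_s):
    """Rate of climb from the rate of change of static pressure."""
    return (pressure_altitude(static_now_pa) - pressure_altitude(static_prev_pa)) / dt_s


# Example: ambient pressure at 3000 m, with dynamic pressure for about 100 m/s.
p_static = isa_pressure(3000.0)
p_pitot = p_static + 0.5 * RHO0 * 100.0 ** 2
print(pressure_altitude(p_static), indicated_airspeed(p_pitot, p_static))
```

Note that every one of the three outputs takes the static pressure as an input, which is why a fault at the static ports affects altitude, vertical speed and airspeed together.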

A technical description for non-aerodynamicists of how a pitot-static system functions and what happens when a static port is blocked has been contributed by Robert Dorsett.

Given the latest information on the static port blockage on AeroPeru 603, the possibilities are:

Both the left-side static ports and the right-side static ports were still covered. This would lead to serious difficulties in the measurement of altitude and vertical speed, and considerable errors in measurement of airspeed: but airspeed would still show trends;
The left-side ports were covered, and the right-side ones not covered. This would lead to phenomena similar to, but less acute than, those in the first case.
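A rough numerical sketch of the first possibility may help to picture it. The model below is an assumption made purely for illustration: the pressure in the static line is treated as following the ambient pressure with a first-order lag of time constant `tau_s`, so that an infinite time constant stands for a perfectly sealed line, a short one for open ports, and an intermediate one for tape that leaks slowly. Nothing here is taken from the DFDR data; the climb rate, true airspeed and time constants are invented, the atmosphere is the International Standard Atmosphere, and airspeed uses the low-speed incompressible relation. The sketch does not attempt to model the left-side-only case.

```python
import math

P0, T0, LAPSE, RHO0, EXPONENT = 101325.0, 288.15, 0.0065, 1.225, 5.25588


def isa_pressure(alt_m):
    """Ambient pressure (Pa) at altitude alt_m in the standard atmosphere."""
    return P0 * (1.0 - LAPSE * alt_m / T0) ** EXPONENT


def pressure_altitude(static_pa):
    """Altitude (m) an altimeter would derive from the static-line pressure."""
    return (T0 / LAPSE) * (1.0 - (static_pa / P0) ** (1.0 / EXPONENT))


def indicated_airspeed(pitot_pa, static_pa):
    """Airspeed (m/s) from pitot minus static pressure, clamped at zero."""
    return math.sqrt(2.0 * max(pitot_pa - static_pa, 0.0) / RHO0)


def simulate_climb(tau_s, climb_rate=10.0, duration_s=60.0, dt=1.0, true_ias=130.0):
    """Climb from sea level; return (true altitude, indicated altitude, indicated airspeed).

    tau_s is the hypothetical leak time constant of the static line in seconds
    (math.inf means perfectly sealed; tau_s <= dt means effectively open).
    """
    q = 0.5 * RHO0 * true_ias ** 2  # true dynamic pressure, held constant
    p_line = P0                     # static line starts at ground-level pressure
    alt = 0.0
    p_ambient = P0
    for _ in range(int(duration_s / dt)):
        alt += climb_rate * dt
        p_ambient = isa_pressure(alt)
        # First-order lag of the static-line pressure towards ambient pressure.
        p_line += (p_ambient - p_line) * min(dt / tau_s, 1.0)
    return alt, pressure_altitude(p_line), indicated_airspeed(p_ambient + q, p_line)


for label, tau in [("open", 1.0), ("slow leak", 60.0), ("sealed", math.inf)]:
    true_alt, ind_alt, ind_ias = simulate_climb(tau)
    print(f"{label:10s} true alt {true_alt:6.0f} m  indicated alt {ind_alt:6.0f} m  "
          f"indicated airspeed {ind_ias:5.1f} m/s")
```

Under these assumptions the sealed line leaves the altimeter at its ground reading while the indicated airspeed falls as the aircraft climbs, and the leaking line gives an altitude that increases too slowly. Because the same static-line pressure feeds every instrument connected to it, the error is common to all of them.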

It is clear that this represents a common failure mode of all three pitot-static systems: the backups suffer the same failure as the primary systems. It should also be clear that this failure mode is not computer-related: neither the design nor the operation of the LADC or RADC or any other computer contributed to this particular way of failing to deliver reliable air data.

The covering of either left-side-only or both-side static ports could be sufficient by itself to explain the difficulty the AeroPeru 603 pilot reported with his air data. It remains to be seen whether there were other contributory failure modes, and if so, whether these failure modes were computer-related.

One may now favorably compare my assertion in the RISKS-18.51 note of October 9, 1996, that a computer-related failure could not be the sole cause of the crash, with the Peruvian Transport Minister's reported statements of October 2 (presumably, the most recent Transport Ministry statement as reported by UPI on November 6 constitutes an implicit retraction of those earlier claims). But the full story is by no means in yet.

We are also awaiting the publication of the DFDR information and CVR transcript.

Some further questions lie behind the story of the masking tape. It is standard procedure, according to a reliable source, to cover the ports with masking tape while cleaning the aircraft. However, while those ports are covered, the airplane is technically unairworthy, as during all non-trivial maintenance procedures. Such procedures normally must be accompanied by an explicit `returned to service' entry in the maintenance logs after inspection by a qualified inspector mechanic. Is it in fact necessary for this case? If so, was the aircraft signed off? Also, although the aircraft fuselage is high off the ground, the static ports should be visible on the air crew's mandatory pre-flight walk-around inspection. On most aircraft (such as my former Piper Archer), inspecting the static ports is a mandatory item on the pre-flight inspection. Is it mandatory on the B757 pre-flight? In any case, even though it was dark, why did the crew member not inspect the static ports on the pre-flight and see the tape? (We will probably never know the answer to this last question.)

A History of News Reports

The news reports on October 3, 1996 by the BBC World Service and by CNN have already been mentioned.

The Electronic Telegraph reported on 3 October that the B757 crashed into the Pacific about 3 miles off the coast, north of Lima, on a flight to Santiago, Chile, with 70 people on board. The report said: "Gen Juan Piperes, fire chief of the Peruvian port of Callao, said: `The plane's whole system completely failed.'"

A fuller report by Quentin Letts appeared in The Times for 3 October, 1996 (3). The Times reported that Flight 603 was en route from Lima to Santiago, Chile, and disappeared from radio and radar contact at 1.10am local time. It had taken off from Lima at 12.42am with 61 passengers and 9 crew. Visibility was down to 30ft in fog as the search took place. The pilot had reported mechanical trouble before the accident, said he was turning back and declared an emergency. Elsa Carrera de Escalante, the Peruvian Transport Minister, said that computer failure appeared to have been the cause. She said that "It seems there was a blockage in the computer system". It appears from The Times' report that the pilot was confused as to his altitude, the meaning of some cockpit warnings, and the attitude of the aircraft at various points. [Interested readers may register for free on the Times WWW site, and search the archives for Aeroperu, to obtain Letts' article. I negotiated with Mike Murphy of The Times an agreement to link their archives directly for readers of this compendium, but have not yet implemented links.]

CNN reported in Computer Failure Puzzling in Peruvian Crash on October 3, 1996 that Senora Carrera said immediately after the accident that "It is not the first time that one of these planes has had this kind of fault. We have to find out why the computers went crazy." CNN also reported that a source close to the US investigators said it was "very premature" to speculate about the cause of the crash and about possible technical failures.

James T. McKenna reported in Aviation Week and Space Technology (6) that in November 1995, FAA officials `reported' that Peru does not fully comply with ICAO safety standards. Peru is an ICAO signatory. Earlier this year, US officials had threatened to charge Peru with failure to comply with ICAO standards unless the Peruvians took steps to improve their oversight of commercial aviation. Such a finding would have blocked Peruvian-operated aircraft from serving the USA. Peru has "struggled to address the FAA's concerns," said McKenna, but the US criticism had left them "defensive and even more sensitive to perceived interference from Americans in their activities". These are strong words.

An NTSB source said privately (7 November 1996) that cooperation with Peruvian colleagues was "excellent", and that the Peruvian Navy and AeroPeru are "fully supportive".

The most complete WWW reports I have found are from CNN at Searchers comb Pacific for more bodies after Peruvian Crash on October 2, Computer Failure Puzzling in Peruvian Crash on October 3, Divers recover black box from Peru crash on October 20, Report: Duct tape caused Peru jet crash from Associated Press, and Duct tape blamed in Peruvian plane crash. (AP and CNN are wrong about the duct tape. The Peruvian authorities said masking tape.)

[Many thanks to Thomas Netter, maintainer of the Bluecoat archive, for the information on CNN's WWW archive. Thomas informs me that CNN's URLs change monthly or so. I noticed yesterday that there's a typo in the link from their November 5 article to the October 3 article, but they're still all there as of November 7, 1996]

Information from the CVR and DFDR

An NTSB source confirmed privately on 7 November, 1996 that the recorders (plural) have been recovered and examined in the NTSB laboratories and the data provided to the Peruvian authorities.

In my note of 5 October, I had said the following:

: With due respect to the Minister, any attribution of cause of this crash is premature (my reasoning is given below). The digital flight data recorder (DFDR) and cockpit voice recorder (CVR) must first be recovered and analysed. Until this is done, very little can be determined about the sequence of events leading to the accident.
: The information available so far is entirely gleaned from the transcript of pilot/controller conversation, and radar plots. These, by themselves, are insufficient to determine the nature of the problems. For example, it is not yet known whether control of the aircraft was lost.

The Argument Against `Computer Failure' Being the Sole Cause

In my RISKS-18.51 note, I divided the possible sequences of events grossly into two, and then into four and two subcases, respectively. This is similar to what is done in a Fault Tree Analysis (1), but fault trees are usually used for fault analysis at a much lower level. However, they are based on simple Boolean logic, which applies to anything (a note for logical insiders: it applies unless one is an intuitionist, of course), and there is no reason in principle why one cannot attempt to construct one for total system failures. I'll annotate this attempt at analysis with the new information.

A: suppose normal control of the aircraft was lost. The B757 is conventionally controlled (not computer-controlled), and the air data systems have electromechanical backups. Therefore, in the event control was lost,
- 1: either these backup systems would have had to fail also (in which case there would be a physical contributing factor);
- 2: or the pilot would have to have made ineffective use of these backup systems (in which case either inappropriate pilot action or some other cognitive confusion would also be a contributing factor);
- 3: or the autopilot flew the aircraft into an out-of-control situation (as in the Birgenair accident), in which case the pilot's behavior in engaging and not disengaging the autopilot would be a factor;
- 4: or the pilot would somehow otherwise have allowed control to be lost.
No one has yet conclusively determined whether any of these situations occurred. The recovery of the taped-over left-side static ports suggests situation A1. This does not preclude also situations A2 or A3 (as consequences thereof), but it almost decisively rules out A4 in combination with the published ATC/pilot conversation.
B: if, on the other hand, normal control was not lost, then
- 1: either the aircraft must have suffered some form of structural failure in normal flight, which computers alone could not have been responsible for (structures can fail under normal control inputs if the aircraft is in an overspeed condition, but normally not otherwise); or
- 2: the aircraft flew under control into the water (i.e., a CFIT, Controlled Flight Into Terrain, accident), in which case pilot behavior or engine failure must also have played a role.
The ATC/pilot conversation already published virtually certainly rules out possibility B2. The CVR and DFDR transcripts will enable us to determine whether possibility B1 occurred.

These alternatives cover, grossly, all the possible scenarios. Alternative B2 is the only one which allows a computer failure alone to cause an accident: a simultaneous spontaneous failure mode in the FADECs (full-authority digital engine controllers) causes both engines to lose effective thrust, and the aircraft must then inevitably either land at an airport (if there is one within gliding distance) or suffer a CFIT, provided control is not lost.

Except, then, for B2, computers alone could not cause any of the other failure modes. Since CFIT due to a simultaneous double engine flame-out didn't even come into question in the Aeroperu crash, we may conclude from the fault tree that computer failure of any kind, singled out at any point in the investigation, could not have been the whole story.
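The case split above can be written down as a small table and queried mechanically. The following Python sketch is illustrative only: the names `SCENARIOS` and `computer_alone_candidates` are hypothetical, and the True/False flags simply transcribe the claims made in the text (only B2 admits a computer fault as the sole cause). They are not findings of any investigation.

```python
# Hypothetical encoding of the six subcases discussed in the text.
# Each entry maps a label to (summary, can_be_caused_by_computer_fault_alone).
# The boolean flags are assumptions transcribed from the argument above.
SCENARIOS = {
    "A1": ("control lost; backup instruments also failed", False),
    "A2": ("control lost; pilot made ineffective use of backups", False),
    "A3": ("control lost; autopilot flew aircraft out of control", False),
    "A4": ("control lost; pilot otherwise allowed it", False),
    "B1": ("control kept; structural failure in flight", False),
    "B2": ("control kept; flight into the water (CFIT)", True),
}


def computer_alone_candidates(scenarios, ruled_out):
    """Return labels of scenarios that a computer fault alone could explain
    and that the available evidence has not ruled out."""
    return [
        label
        for label, (_summary, computer_alone) in scenarios.items()
        if computer_alone and label not in ruled_out
    ]


if __name__ == "__main__":
    # Before any evidence is considered.
    print(computer_alone_candidates(SCENARIOS, set()))
    # After excluding B2 on the basis of the ATC/pilot conversation.
    print(computer_alone_candidates(SCENARIOS, {"B2"}))
```

With nothing ruled out, the only candidate is B2. Once B2 is excluded, as the published ATC/pilot conversation virtually certainly does, the list is empty, which is the conclusion drawn above.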

This conclusion was obtained from knowing a little about the design of the B757 - it should in principle not be surprising that recent discoveries are corroborating it. Unless the analysis above is incorrect, of course. I will gladly discuss this `fault tree', and correct it if there's a mistake.

Accidents have many causes and contributing factors. These are determined from the sequence of events, and it is not until all these factors are known, and thoroughly worked through by accident investigators, that anyone can tell which ones were decisive (these will be cited in the 'probable cause' of the final accident report).

Information on the Aircraft and Other Comments

The Boeing 757 entered service with Eastern Airlines in January 1983 (4), and by 25 September 1996 a total of 699 aircraft had been delivered (5). It is the fifth most common aircraft used by airlines, after the B727-200, B737, B747 and MD-80. The B737 family should be divided into two categories, as should the B747, since newer generations of these aircraft are significantly different from older versions, especially with regard to their avionics.

The B727-200 (992 flying with airlines, a total of 1,831 built) is an older aircraft, in service for 35 years, no longer manufactured. There are 953 early-generation B737-100/200 aircraft flying with airlines, out of a total of 1,144 built (these models are no longer manufactured); and 1,618 newer-generation B737-300/400/500 aircraft. There were 724 B747-100/SP/200/300 aircraft delivered, of which 613 are still flying with airlines, and many are used as freighters. They are no longer manufactured. There are 390 of the newer B747-400 flying with airlines. A total of 1,118 MD-80 aircraft are flying with airlines. (All data from op. cit.)
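As a rough worked example, the proportion of each out-of-production type still flying with airlines follows directly from the census figures quoted above. The Python sketch below uses only those figures; the names `FLEETS` and `in_service_fraction` are hypothetical and introduced purely for illustration.

```python
# (total built or delivered, still flying with airlines), as quoted in the text.
FLEETS = {
    "B727-200": (1831, 992),
    "B737-100/200": (1144, 953),
    "B747-100/SP/200/300": (724, 613),
}


def in_service_fraction(total, flying):
    """Fraction of the aircraft built or delivered that still fly with airlines."""
    return flying / total


if __name__ == "__main__":
    for model, (total, flying) in FLEETS.items():
        share = in_service_fraction(total, flying)
        print(f"{model}: {flying} of {total} ({share:.0%})")
```

On these figures roughly half of the B727-200s, and somewhat over four-fifths of the early B737s and B747s, remain in airline service.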

After a superlative and unprecedented accident-free service of nearly 13 years, this is the third B757 accident within 10 months. It can be concluded from these figures that the B757 is generally an extremely safe aircraft. The cluster of 3 recent accidents is not of statistical significance. The other two were

- to an American Airlines aircraft near Cali, Colombia on 20 December 1995. This accident was the result of controlled flight into terrain (CFIT). The final report attributes the crash to human factors;
- to a Birgenair aircraft off Puerto Plata, Dominican Republic on 6 February, 1996. This accident occurred when the pilots lost control of the aircraft in the climb after takeoff. The final report will likely attribute the crash to a minor mechanical failure (likely caused by improper storage of the aircraft for the two weeks before the accident) and human factors.

Information on both of these accidents is available in this Compendium. Given that human factors were involved in both of these accidents after 13 years of service, and that the mechanical failure in the Birgenair case, most probably a blocked pitot tube, likely occurred because one of the pitot tubes was left uncovered during storage, one concludes that the accidents had little to do with the design and nothing to do with the construction of the aircraft itself.

Although computer systems were involved in both the AA and Birgenair accidents, it is important to emphasise that in both of these cases, a flyable aircraft crashed, and the computer systems were not the cause of the crash. (How the pilots used the computer systems was, however, a decisive factor in both accidents.)

The B757 aircraft uses computer systems for displaying air data, for navigation, and for autopilot control and flight management. The flight controls are conventional hydromechanical systems. Furthermore, the air data computer systems are backed up by conventional electromechanical `standby' instruments of the sort used for over 60 years.

Because of the above analysis, I treated reports that the computer systems were the cause of the Aeroperu accident with utmost scepticism. It is difficult to see how any of the computer systems could be the sole cause of a B757 crash, except for situation B2, which has never occurred with any airplane computer system. Piperes' comment, above, that the plane's `whole system' failed, is meaningless. There are hundreds of distinct systems on this aircraft. For any given accident, one can be pretty well certain that not all of them had suffered failure modes contributing to the accident.

Comparison with other Accidents in which a Maintenance-Induced Common Failure Mode Was Implicated

A maintenance-induced common failure occurred to the three engines of an Eastern Airlines L-1011 flying out of Miami on May 5, 1983. The NTSB Report (175K + GIFs + JPEGs) explains that during scheduled maintenance an engineer replaced oil seals on all three engines with the wrong seals. These seals failed during the next flight, and all engines lost oil at various times. Two were shut down in-flight, and the airplane returned to Miami. One was restarted for touchdown and the aircraft landed safely. Remaining time on the running engines before they would have seized up was estimated to be on the order of a couple of minutes after touchdown.

There was reported to be some confusion between the mechanic and his supervisor as to the correct seals to be used. Furthermore, the mechanic had replaced the seals using the headlights of a fork-lift truck, which is certainly not appropriate lighting for this procedure.

This accident involved a common failure mode which incorporated one major checklist failure, shared between two mechanics, engendered by a confusion over the exact type of visually- and topologically-similar replacement parts.

The hypothesis I currently favor in the Aeroperu accident would show a greater depth of procedural failure. Use of masking tape is an appropriate procedure, and my current hypothesis holds it to be a sign-off item on return-to-service. There would thus be three consecutive checklist failures under this hypothesis: that of the workers who did not remove the tape, the inspector who signed off on return-to-service, and the aircrew on pre-flight inspection. Furthermore, unlike the Eastern Airlines incident, this common failure mode should have been visibly apparent to all parties involved, and would not have involved any confusion about appropriate procedure or parts.

In summary, the Eastern Airlines incident involved a common failure mode, induced by confusion over part-type and improper installation procedure, with only one relevant checkpoint failure. The Aeroperu accident (under the most-favored hypothesis) would involve three consecutive checkpoint failures, and no part-related confusion.

The Bottom Line

A common failure mode of the three B757 pitot-static systems has been discovered by investigators of Flight 603's accident: masking tape covering the LHS static ports. This failure mode is deemed sufficient by itself to explain the known phenomena associated with the aircraft's demise; and this failure mode is not at all computer-related. We await the CVR and DFDR transcripts and the possible discovery of other potentially contributing failure modes.

References

(1): W. E. Vesely, F. F. Goldberg, N. H. Roberts and D. F. Hassl, Fault Tree Handbook, NUREG-0492, U.S. Nuclear Regulatory Commission, Washington, D.C., January 1981.

(2): Robert Buck, The Pilot's Burden: Flight Safety and the Roots of Pilot Error, Iowa State University Press, 1994.

(3): Quentin Letts, Computer Blamed as 70 are killed in Peru crash, The Times, 3 October 1996.

(4): Airliners of the World, Flight International, 6 - 12 December, 1995, pp49-86.

(5): World Airliner Census, Flight International, 25 September - 1 October, 1996, pp31-51.

(6): James T. McKenna, Peru 757 Crash Probe Faces Technical, Political Hurdles, Aviation Week and Space Technology, October 7, 1996, pp21-22.

(7): James T. McKenna, Blocked Static Ports Eyed in Aeroperu 757 Crash, Aviation Week and Space Technology, November 11, 1996, p76.

(8): Edward H. Phillips, NTSB urges Change in Static Port Covers, Aviation Week and Space Technology, December 2, 1996, p33.

(9): Robert M. Jenney, AOA would Have Helped, Correspondence, Aviation Week and Space Technology, December 9, 1996, p6.

(10): Gary T. Dye, Don't Surrender Judgement, Correspondence, Aviation Week and Space Technology, December 16, 1996, p6.

(11): W. E. Kelly, Not Such a Great Idea, Correspondence, Aviation Week and Space Technology, January 27, 1997, p6.

(12): Jack Karran, Secondary Implications, Letters, Flight International, 5-11 February, 1997, p41.

(13): C. H. Morshead, Full system testing is necessary, Letters, Flight International, 19-25 February, 1997, p58.
