By Peter A Johnstone
The term ‘safety critical system’ is usually associated with today’s modern technologies, but there are many older applications that can also be classed as safety critical systems (safety critical systems have been around for many years, but the term has become more popular with the advent of computer technologies). Take, as an example, the network of mechanical signal boxes, with their associated mechanical interlocking, that once dominated the railway scene (although now largely redundant, there are still many examples in use today, some even serving main-line routes). The interlocking prevented the signalman from inadvertently ‘Pulling Off’ the wrong lever, thereby (in theory) allowing the train a safe passage through the controlled section. Had the interlocking not been in place, the signalman would have been free to pull off any lever, causing a degree of chaos at the least and, more seriously, the possibility of loss of life.
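The principle of the interlocking can be sketched in a few lines of Python. This is a toy model under stated assumptions: the three levers (one set of points and two signals) and the locking table are hypothetical, real locking tables are drawn up for each individual layout, and the rule that a lever may only move when the levers it is locked against are in a required state is a simplification of real mechanical locking.

```python
"""Toy model of signal-box lever interlocking (HYPOTHETICAL layout)."""

# HYPOTHETICAL layout: lever 1 = points (normal = main line, reversed = loop),
# lever 2 = main-line signal, lever 3 = loop signal.
# LOCKING[lever] maps each other lever to the state it must be in before
# `lever` may be moved in either direction.
LOCKING = {
    1: {2: "normal", 3: "normal"},    # points cannot move under a cleared signal
    2: {1: "normal", 3: "normal"},    # main signal needs points set for the main line
    3: {1: "reversed", 2: "normal"},  # loop signal needs points set for the loop
}


class LeverFrame:
    def __init__(self, locking):
        self.locking = locking
        # Every lever starts in the frame ('normal').
        self.state = {lever: "normal" for lever in locking}

    def can_move(self, lever):
        """True if every lever this one is locked against is in its required state."""
        return all(self.state[other] == required
                   for other, required in self.locking[lever].items())

    def move(self, lever):
        """Toggle a lever between 'normal' and 'reversed', or refuse if it is locked."""
        if not self.can_move(lever):
            raise PermissionError(f"lever {lever} is locked")
        self.state[lever] = "reversed" if self.state[lever] == "normal" else "normal"
```

For example, with `frame = LeverFrame(LOCKING)`, once lever 2 (the main-line signal) has been pulled off with `frame.move(2)`, both `frame.can_move(1)` and `frame.can_move(3)` return `False`: the signalman can neither move the points under a cleared signal nor clear a conflicting signal until lever 2 has been put back.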
Safety critical systems are taken for granted most of the time. When these systems are performing normally we do not give them a second thought, that is, if we are aware of their existence in the first place. It is only when a particular safety system fails, or, to be more precise, fails badly, that we become conscious of it. To illustrate this we have only to look at the world of aviation.
Today’s modern generation of aircraft, with their ‘Glass Cockpit’ and ‘Fly-by-Wire’ technologies, take off and land without incident the vast majority of the time. Millions of people every year fly to their destinations aboard these aircraft without problems; it is only when one falls from the sky, or is involved in some other incident, that the public become aware that a particular system, or systems, were in place or even existed.
If a vital system were to fail on an aircraft the results could be tragic. It is therefore of the utmost importance that all of an aircraft’s vital systems are designed to fail safely in the event of a failure, allowing the crew or the system to control the aircraft satisfactorily until either the fault can be remedied or the aircraft can reach safety.
Putting a system in place that aids the safety of the aircraft and passengers is of prime importance, but the system is as good as useless if the crew either do not fully understand how it works or, worse still, do not trust it. The 1972 crash of a British European Airways (BEA) Trident 1 (G-ARPI) shortly after take-off from Heathrow provides several such examples. Although the Trident is not one of today’s modern flight-management-controlled aircraft, it did contain several innovations that were state-of-the-art at the time, and many of the systems on the Trident 1 were of a safety critical nature.
The crash involved flight BE548, en route from Heathrow to Brussels, and like so many aviation accidents it was put down to errors committed by the flight crew. However, if we look more deeply at the events that led up to the fateful crash, it can be seen that the ingredients for disaster were perhaps already in place.
It was Sunday 18th June 1972 and the weather was deteriorating; as the events of the day unfolded it would worsen further. An approaching cold front was bringing overcast conditions with rain and a cloud base of 1,000 feet; in general, the flying conditions were going to be unpleasant and turbulent.
The normal cockpit arrangement for the Trident was for the captain to sit in the P1 seat (front left), a co-pilot in P2 (front right) and another co-pilot in P3 (centre, behind P1 and P2). Under normal conditions P1 and P2 would share the aircraft-handling roles, with P3 assisting in the operation and monitoring of the aircraft systems, monitoring the flight’s progress, and checking and completing any paperwork. It was usual practice for the co-pilots in the P2 and P3 positions to change seats and duties as the flight progressed.
The crew for this particular flight consisted of the captain, Stanley Key, a 51-year-old, very experienced senior officer with over 4,000 hours commanding Tridents to his credit, who occupied the P1 position. The P2 position was occupied by the relatively inexperienced 22-year-old Jeremy Keighley, who had only 40 hours’ experience as a co-pilot. In the final seat, P3, was Simon Ticehurst, a fairly, but not fully, experienced 24-year-old co-pilot. This anomaly, whereby an inexperienced second officer sat in the co-pilot seat (P2), and would be expected to take control of the aircraft should the captain become incapacitated, while a much more experienced and senior second officer sat in the far less demanding P3 position, was not an ideal arrangement. The situation was brought about by an ongoing dispute between BEA and BALPA (British Airline Pilots Association) over various matters, including pay and working conditions, which had already led one group of supervisory officers to withdraw their labour. (In normal circumstances a supervisory officer would occupy the P3 position when there was an inexperienced co-pilot in the P2 seat.)
(Interestingly, one of the disputed items was the fitting of CVRs (cockpit voice recorders) to all commercial aircraft. BALPA felt this was a device that management would use to spy on pilots and strongly resisted it. However, the accident that befell this flight was the turning point that led to it being made compulsory for airlines to fit CVRs to all their aircraft.) This group of supervisory officers were those responsible for training junior pilots (trainees). Jeremy Keighley was one of these junior pilots: he had qualified as a co-pilot for P2 duties but, because of the dispute, had not yet qualified for P3 duties. (Because he had not yet qualified to P3 level, he could not at any time change positions with the co-pilot in the P3 seat, which left the least trained and most junior officer in the role of second pilot.) However, BEA had decided to continue using these trainees for flight duties in the interim, until such time as the dispute was settled.
The dispute was also to cause another problem for flight BE548. Captain Stanley Key was vehemently opposed to the industrial action that many of his colleagues were due to take the next day, and during a discussion of this subject in the crew room before the flight he had a raging outburst and lost his temper with one of his colleagues (this was not one of his own flight crew for that day, although Jeremy Keighley witnessed the whole incident and was probably somewhat unnerved by his captain’s outburst). Unfortunately for Stanley Key, and for all the others who were to travel on the same aircraft, he was suffering from an undiagnosed heart disease, arteriosclerosis. The outburst would have pushed his blood pressure up very high, which in turn would have put tremendous strain on his heart.
The Trident, like the Boeing 727, had three engines grouped around the tail fin, with the tailplane mounted clear of them, high up on the fin. Although this engine arrangement had its advantages (one of which was that if an engine lost power the aircraft would not veer off in one direction, as an aircraft with wing-mounted engines would), it also had its disadvantages. The engine and tailplane arrangement of both the Trident and the 727 can lead to the possibility of a deep stall. As an aircraft slows down, the angle at which the wing meets the oncoming air has to be increased to maintain lift, and this is done by raising the nose. With the Trident and the 727, if the nose is pulled up too high, the turbulent air coming off the wing can enter the engine intakes and cause the engines to stall; if the nose is raised further still, this same turbulent air interferes not only with the engines but also with the tailplane, and can render the stabilisers useless, creating a deep stall condition. Once an aircraft has entered a deep stall it is almost impossible to recover.
To increase lift at slow speeds, and so avoid the need for a high nose angle, the Trident 1 was fitted with a combination of Krüger and droop systems along the wings’ leading edges (both commonly called droops by aircrew), together with conventional flaps on the trailing edges. The Krüger and droop systems quite literally drooped the leading edge of the wings, altering the wing profile and giving increased lift at slow speeds.
At this time the Trident was the only British aircraft with retractable leading-edge high-lift devices; the principle was simple and efficient, but not without complications. The droops and the flaps were operated independently by two separate levers positioned side by side (a single lever operating both the leading- and trailing-edge lift devices has since been fitted to all aircraft). At low speeds the droops had a substantially greater effect on lift than the flaps, so it was important to fit a device that prevented the droops from being retracted in mistake for the flaps (if the droops were retracted first, the aircraft would lose lift and instantly be close to a stall, possibly causing it to drop from the sky).
"To prevent inadvertent retraction of the droops instead of the flaps the droop lever was protected by a mechanical guard throughout most of the flap lever range. After selection of flaps up, the droop lever was unguarded. A climb to 3,000ft, and further acceleration to the minimum ‘droops up’ safety speed of 225kt, was then required before droop retraction. During this period of about 2min the droop lever was unprotected, but since there was no requirement to operate either lever a speed guard was not considered necessary. At a later date a speed guard was introduced. An amber warning light placed forward of the droop lever was arranged to illuminate if the droop lever was out of position: i.e. if the airspeed was too low when retracted or if excessive when extended." (Stewart 1994. p96)
In the past, incidents had occurred in which the first officer was flying the aircraft and the captain, without saying anything, had raised the flaps to gain speed immediately after the landing gear was raised. The first officer had then reached over to raise the flaps, grabbed the only lever that was left and retracted the droops instead. The aircraft immediately began to drop from the sky; fortunately, on those occasions the crews were alert and skilled enough to rescue the situation before disaster struck.
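The droop protection described in the quotation above can be expressed as simple Boolean logic, which also shows why the incidents just described were possible. The sketch below is an illustrative Python model, not the Trident’s actual circuitry: the 225kt minimum ‘droops up’ speed comes from the quoted text, but treating the mechanical guard as simply ‘in place until the flap lever is at up’ is a simplification, the later speed guard is modelled as a plain flag, and the maximum speed with droops extended is a HYPOTHETICAL placeholder because the source gives no figure.

```python
DROOPS_UP_MIN_SPEED_KT = 225   # minimum 'droops up' speed, from the quoted text
DROOPS_OUT_MAX_SPEED_KT = 250  # HYPOTHETICAL placeholder; no figure is given in the source


def droop_lever_guarded(flap_lever_up: bool) -> bool:
    """Mechanical guard: simplified as in place until flaps are selected fully up."""
    return not flap_lever_up


def droop_retraction_possible(flap_lever_up: bool, airspeed_kt: float,
                              speed_guard_fitted: bool) -> bool:
    """Can the droop lever physically be moved to 'up'?"""
    if droop_lever_guarded(flap_lever_up):
        return False
    if speed_guard_fitted and airspeed_kt < DROOPS_UP_MIN_SPEED_KT:
        return False
    # Without a speed guard nothing stops early retraction once the flaps are up.
    return True


def droop_out_of_position_lamp(droops_extended: bool, airspeed_kt: float) -> bool:
    """Amber lamp: droops retracted too slow, or extended too fast."""
    if droops_extended:
        return airspeed_kt > DROOPS_OUT_MAX_SPEED_KT
    return airspeed_kt < DROOPS_UP_MIN_SPEED_KT
```

With the flaps up at 162kt and no speed guard, `droop_retraction_possible(True, 162, False)` returns `True`, and once the droops are in, `droop_out_of_position_lamp(False, 162)` returns `True`: the only remaining protection is a warning lamp, which the crew still have to notice and identify correctly. With `speed_guard_fitted=True` the same retraction is refused.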
"With time the matter faded into the background. Little or no information was circulated on the effects of change of configuration on stall warning and stick push functions; few were aware that in such circumstances, with droops retracted early, stick shake and stick push were almost coincidental." (Stewart 1994.p99)
On a traditionally designed aircraft, the airframe would suffer buffeting as the airflow over the wing became turbulent, warning the crew of a possible stall; if this warning was ignored, the nose would pitch sharply down, speed would increase and a stall would be averted. However, with the droops extended the Trident gave no buffeting to warn of a possible stall, and the characteristic of a rear-engined, T-tailed aircraft is that the nose tends to pitch up in a stall, slowing the aircraft down even more and deepening the stall. This can lead to the deep stall described earlier, from which there is little chance of recovery; the aircraft then simply falls flat out of the sky, as happened on 3rd June 1966 when a Trident 1 (G-ARPY) deep-stalled during stall tests, killing the test crew on board.
The Trident was equipped with stall warning and recovery devices, including a stick shaker, a device that shook the control column to give advance warning of a stall, and a stick pusher that pushed the column forward if a stall developed, to counter the nose-up pitch the aircraft would otherwise enter. There was also an amber ‘stall recovery operate’ lamp and a red ‘stall recovery fail’ lamp beside the airspeed indicators; the first lit when the stick push operated and the second lit if the system failed. A further amber lamp, positioned next to the droop ‘out of position’ lamp, would illuminate if the air pressure in the stick push system dropped below a certain level.
The general feeling among pilots was that these systems were unreliable, because both stall warning devices had suffered teething troubles early on. Although those problems had since been rectified, and there was no evidence to support their suspicions, the pilots still were not convinced. As a result, crews would often override the stick push if they believed it to be a false alarm, even when the warning was genuine.
Unknown to the crew of flight BE548, the locking wire on the three-way valve in the stick push system was missing, so any sudden jarring or jerking might be enough to upset the integrity of the device. Even a small misalignment of the valve could cause a slight pressure drop, which could in turn illuminate the amber ‘low pressure’ warning light situated just in front of the droop control, next to the droop ‘out of position’ warning lamp.
The flight was ready for departure and the aircraft’s trim had been set. There was a full payload of passengers and baggage but only a small amount of fuel, because of the short duration of the flight; although there was room for more fuel, no more payload could be carried, as the aircraft had already reached its maximum payload weight. At the last moment the flight dispatcher re-entered the cockpit with news that a freighter crew were required in Brussels and that room must be found for them on flight BE548. The extra weight put the aircraft over its recommended payload capacity, so load readjustments had to be made. The addition of the three freighter crew brought the total number of people on board to 118, including the three pilots and the three cabin crew.
G-ARPI taxied to the runway and lined up, but a last-minute problem occurred and the tower was informed of it. One can only guess what the problem was, as the aircraft was not fitted with a cockpit voice recorder, but it could well have been the ‘low pressure’ warning light coming on. Half a minute later the crew informed the tower that all was OK and that they were ready for take-off, and they were cleared for take-off almost immediately, at 17:08:24 BST.
At 17:11:00 BST G-ARPI crashed less than four miles from Heathrow, killing all those on board. The flight data recorder recovered from the wreckage enabled the air accident investigators to establish that the crash was caused by the droops being retracted too soon: the minimum speed for droop retraction was 225kt, but they were retracted when the aircraft was travelling at only 162kt. Without the benefit of a voice recorder we can only theorise about what actually took place in the cockpit, but with the aid of the flight data recorder we can at least begin to piece together the last moments of flight BE548.
Captain Stanley Key was probably suffering a fair amount of chest discomfort, and possibly a lack of concentration as well. The speed lock was selected very soon after the autopilot was engaged, at a speed 7kt lower than required (the target speed for this flight should have been 177kt), which leads us to surmise that the captain’s mind was not fully on the job. Could it be that he was in pain, or was he distracted by the low pressure warning light? Stanley Key was a pilot who flew by the book; he knew that in such turbulent conditions it would be most inadvisable to fly below the target speed, yet he did. Something was distracting him.
The flaps were selected fully up at 17:10:03 and power was reduced to the noise abatement setting, at which point the autopilot reduced the climb angle to maintain speed. The autopilot was not coping well in the turbulent conditions and the speed began to drop, reaching 157kt, 20kt below the target speed. This seemed to go unnoticed by the crew, probably because the captain was distracted and the co-pilot had very little experience. At around this time the three-way valve in the stick push ducting shifted one-sixth out of position; this could have caused the system pressure to drop below its lower limit, which in turn would illuminate the ‘low pressure’ warning lamp. As noted earlier, the ‘low pressure’ lamp was situated next to the droop ‘out of position’ lamp and, in the heat of the moment, could easily have been mistaken for it. Whether or not that was the case we can never be sure, but at 17:10:24 (21sec after the flaps were selected up) the droops were selected in. The aircraft instantly entered the stall regime, and what happened next was rapid and tragic.
"Within one second of the droop lever movement, the flashing amber ‘alert’ lamp on each pilot’s station operated, indicating a problem. The ‘controls’ window of the central instrument warning system (CIWS) display panel also illuminated outlining the fault area, and the droop ‘out of position’ lamp lit up specifying the cause. One second later the stick shaker operated, followed half a second later by the stick push. The amber ‘stall recovery operate’ lamps by the airspeed indicators illuminated with stick push operation, and, under certain circumstances, there may have also been a fleeting illumination of the red ‘stall recovery fail’ light situated nearby. The ram force pushing the control forward immediately disconnected the autopilot, illuminating the flashing red ‘alert’ lights at each pilot’s station and also the red ‘autopilot’ window of the CIWS display panel. A loud audio autopilot disconnect warning was also transmitted to each pilot’s headset." (Stewart 1994.p108)
This chain of events would have taken less than three seconds and would have placed even the healthiest person under duress, but for Stanley Key it must have been overwhelming; it was at about this time that his heart began to give way. The crew appeared to be at a loss as to what had happened, what was happening and why. Twice more the stick push cut in, and each time the crew kept the aircraft level, but no attempt was made to reselect the droops. After the third stick push the crew overrode the system, possibly because they suspected its integrity. The confusion in the cockpit must have been extreme, as at no time was any attempt made to cancel the autopilot disconnect warning, which added even more to the situation, with red lights flashing and an audible warning ringing in their headsets.
Captain Key managed to level the aircraft and must have felt confident that he had it under control for, with the airspeed at 193kt, he pulled back on the control column to try to gain height. It was at this point that G-ARPI entered the ‘true aerodynamic stall’. Thirty-six seconds after the droop lever was moved, the aircraft hit the ground, killing all those on board.
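The recorder figures quoted in this account can be cross-checked with a little arithmetic. This short Python sketch uses only the times and speeds given in the text (no other data is assumed) and computes the intervals and speed margins between them.

```python
from datetime import datetime

# Times (BST, 18 June 1972) and speeds as given in the text.
EVENTS = {
    "cleared for take-off": "17:08:24",
    "flaps selected up": "17:10:03",
    "droops selected in": "17:10:24",
    "impact": "17:11:00",
}
TARGET_SPEED_KT = 177
SPEED_LOCK_SHORTFALL_KT = 7
LOWEST_SPEED_KT = 157
DROOP_RETRACTION_SPEED_KT = 162
DROOPS_UP_MIN_SPEED_KT = 225


def seconds_between(start: str, end: str) -> int:
    """Elapsed seconds between two named events on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(EVENTS[end], fmt) - datetime.strptime(EVENTS[start], fmt)
    return int(delta.total_seconds())


if __name__ == "__main__":
    print("flaps up -> droops in:", seconds_between("flaps selected up", "droops selected in"), "s")
    print("droops in -> impact:", seconds_between("droops selected in", "impact"), "s")
    print("clearance -> impact:", seconds_between("cleared for take-off", "impact"), "s")
    print("speed lock setting:", TARGET_SPEED_KT - SPEED_LOCK_SHORTFALL_KT, "kt")
    print("lowest speed below target:", TARGET_SPEED_KT - LOWEST_SPEED_KT, "kt")
    print("droop retraction below minimum:",
          DROOPS_UP_MIN_SPEED_KT - DROOP_RETRACTION_SPEED_KT, "kt")
```

The intervals come out at 21s, 36s and 156s (2min 36s from take-off clearance to impact), consistent with the ‘21sec’ and ‘thirty-six seconds’ in the text and broadly in line with the ‘two and a half minutes’ from the start of the take-off roll quoted from Jon Scott; the droops were retracted 63kt below their minimum speed.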
Many of the systems on the Trident 1 can be classified as safety critical, and yet when things went wrong they did not prevent the disaster. Although the systems themselves did not fail (with the exception of the missing locking wire on the three-way valve), the way in which they could be (mis)used, the poor layout of controls and warnings, the lack of confidence in the systems and the lack of full understanding and/or proper training must call into doubt the effectiveness of having these systems in place. (BEA’s use of inexperienced aircrew must also be called into question: no matter how many safety devices are built into an aircraft, or any other machine, they are of no use if the operators lack the experience or do not know how to use them, especially when the lives of many people are in their charge.) It is no use installing safety critical systems without first considering how the operators will use them; reviewing what happened to G-ARPI, it becomes obvious that the designers did not consider how these systems could or would be used.
There can be no doubt that Stanley Key’s heart problem was a contributing factor to the disaster; however, if we look at the events more closely, it is possible to see that had the design and performance of the systems been better, this disaster might well have been avoided.
The Trident’s design, with three rear-mounted engines, made it inherently more likely to stall than aircraft of conventional design with wing-mounted engines. This problem with rear-engined aircraft was known before the Trident was designed, yet the risk was felt to be minimal. Note was taken, however, and various safety features were added to try to alleviate the problem, including the stick shaker and the stick push devices. Unfortunately the pilots did not trust these systems, as from their introduction they had suffered more than a few teething problems and scares. Occasional false warnings had given crews genuine cause for worry; it is not hard to understand the concern a crew would feel when the stick push operated falsely, and even more so if this happened during take-off or landing. Crews’ mistrust of this particular device led them to disconnect the system often, even when the warning may have been genuine.
Placing the droop and flap control levers side by side, where they could be operated in an unsafe sequence, was a particularly poor piece of design and suggests that little or no thought was given to those who would have to fly the aircraft, or to how it could or would be operated. Later designs had a single lever that operated both flaps and droops, eliminating the problem. The fact that the same scenario had occurred more than once before, with different consequences, and yet this information was not made readily available to the crews who flew Tridents, shows how a breakdown in communication can have devastating effects. Had crews been made aware of these incidents, and had they been covered in training, the accident could well have been avoided.
Placing the droop ‘out of position’ and ‘low pressure’ warning lamps side by side just forward of the droop control lever, both in the same colour, was poor design. Their positioning is a likely reason why the crew never noticed, or took no notice of, the amber droop ‘out of position’ lamp. As suggested earlier, the amber ‘low pressure’ lamp may already have lit up and, as it was next to the droop lamp, the crew may have presumed that the amber glow they saw out of the corners of their eyes was the ‘low pressure’ lamp, not the ‘out of position’ lamp.
For safety critical systems to be of use they must be easy to understand and easy to use, and it is apparent that the systems on board the Trident 1 were neither. In fact, when the doomed flight was reflown on a simulator by a very experienced pilot, Jon Scott (now with British Airways), he had to admit that the situation the crew faced was startling, as his comments show:
“The more we do it, the more horrific it appears. The time scale is very, very short. It's two and a half minutes from starting the take-off roll to impact in a field in Staines, of which three quarters of a minute is spent on the take-off roll. There were only thirty-six seconds between when the droop lever was moved to impact. There was a whole cacophony of sound and light going on all at once, totally unexpectedly, as well as the aircraft buffeting around in the weather conditions at the time. It's quite understandable how the crew didn't recognise their predicament.” (Faith 1997.p174)
Some of the problems experienced in the past with the Trident 1, together with some of the factors that led to this crash, can be put down to the manufacturer’s failure to ensure that certain built-in safety features worked correctly, and that any problem areas were identified, before the aircraft went into commercial service. Modern aircraft designers face the same challenge: how to make sure that every safety system works efficiently and, if it does fail, fails safely.
The reliance placed today on modern technology to control an aeroplane brings a whole new set of problems, not only with respect to safety but also in the way the role of the pilot is changing. Today’s pilots who fly this new breed of aircraft have to learn a whole new set of skills, as Professor David Woods of Ohio State University puts it: “….the manual skills are not as critical. Instead, what we need to train is the judgement, because the pilot’s role has shifted and he has become more of a manager of these automated resources.” (Faith 1997.p86)
Or, as Peter Mellor of the Centre for Software Reliability sums it up: “The computer flies the plane and the pilot flies the computer.” (Faith 1997.p80)
The Airbus A320, the first of the modern generation of fly-by-wire aircraft, could be described as a flying computer. It contains approximately 150 different boxes, each of which can be described as an individual computer (although many computers on board the A320 control specific tasks, five main flight control computers form the hub of the system). Having computers control some of the functions on an aircraft is desirable, as they can take over some of the more mundane tasks and also be used to monitor systems. However, relying on computers to run all the systems and fly the aircraft as well can lead to a whole host of new problems.
Those of us who use computers are well aware of the problems that can arise with the software we use (even with well-tried, tested and updated packages): we ask it to do something and suddenly we get an error message, or the program simply crashes, or it does something else for no apparent reason. When this happens at home we can play around and try to remedy the problem, or switch off and come back to it later armed with the manual. If the worst comes to the worst we can take it back to the shop and have them look at it. None of these options is open to a crew trying to fly an aircraft: you cannot put a plane on hold while you remedy the fault, especially if it is heading downwards at the time. Even if the aircraft is in stable flight, there would not be enough fuel on board to allow much time to sort the problem out, and even if there were, pilots are trained in the art of flying an aircraft, not in computer repair. Bob Besco, a safety consultant and former American Airlines pilot, relates a story that is doing the rounds in aviation circles:
“There's an apocryphal story of a flight crew which put an automated airplane over Paris into a holding pattern and couldn't figure out how to change the program to get it back out again. It's still there, three years later, with the crew trying to figure it out. On the ground they're trying to find a way to refuel the airplane and get some food to the pilots and passengers so they get it out of the holding pattern.” (Faith 1997.p80)
This story, although tongue in cheek, has a familiar ring to it. On 26th June 1988 at Habsheim, France, an A320 crashed during a demonstration flight. The crew were conducting a low-speed fly-past and tried to pull up to clear some trees at the end of the runway, but the A320’s computer believed the aircraft was in landing mode, and by the time the crew had managed to throttle up the engines it was too late.
The computers on these modern aircraft need to be in different modes to deal with the different phases of flight, i.e. climbing, descending, landing, taking off, etc., and this is where confusion can, and often does, occur. A perfect example of the mode selection problem occurred in November 1991 when a Fokker F100 landed at Chicago’s O’Hare airport. Although the aircraft had landed safely, its computers believed it was still in the air and would not allow the crew to apply reverse thrust or use the main-gear brakes.
In this instance reverse thrust was not available to stop the aircraft on the ground, but what if the opposite were to happen and reverse thrust were applied in mid-flight? This is exactly what happened on 26th May 1991 to a Lauda Air 767-300ER when a thrust reverser deployed in flight, causing the aircraft to break up and killing all 223 on board. Applying reverse thrust in mid-flight could literally tear an aircraft apart, so safety critical systems should be in place to prevent it. A thrust reverser deploying of its own accord was supposedly impossible, but when other aircraft of the same type were investigated, similar defects were discovered and changes were instigated.
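Both incidents turn on an air/ground interlock: the aircraft has to decide whether it is on the ground before reverse thrust is allowed. The Python sketch below is a deliberately simplified, HYPOTHETICAL model (the sensor names and the rule that both main-gear switches must agree are assumptions, not the actual F100 or 767 logic) that shows the trade-off such logic has to strike.

```python
from dataclasses import dataclass


@dataclass
class GroundSensors:
    # HYPOTHETICAL inputs; real aircraft use their own sensor sets and voting.
    left_main_gear_compressed: bool   # weight-on-wheels switch, left main gear
    right_main_gear_compressed: bool  # weight-on-wheels switch, right main gear
    thrust_levers_at_idle: bool


def on_ground(s: GroundSensors) -> bool:
    """Conservative two-out-of-two vote: both main-gear switches must agree."""
    return s.left_main_gear_compressed and s.right_main_gear_compressed


def reverse_thrust_permitted(s: GroundSensors) -> bool:
    """Allow reverser deployment only on the ground with the thrust levers at idle."""
    return on_ground(s) and s.thrust_levers_at_idle
```

In flight, `reverse_thrust_permitted(GroundSensors(False, False, True))` is `False`, and after a normal touchdown with both switches made it is `True`. But if one switch fails to register a genuine touchdown, the conservative vote inhibits the reversers: safe in the air, but, as in the F100 case, it leaves the crew without reverse thrust on the runway. Relaxing the vote to ‘either switch’ would cure that, at the price of making a single faulty switch in flight more dangerous.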
One company that has led the world in the technological advancement of the cockpit and the way an aircraft is controlled is the Airbus consortium. Airbus were the first manufacturer to introduce a fully digital fly-by-wire control system on an airliner (the A320, certified in 1988). However, a series of crashes involving the new generation of Airbus aircraft, as well as many other incidents, has caused many people (including some pilots) to doubt the integrity of the systems on board these aircraft. Airbus themselves, while trying to protect their name, have fuelled this speculation by their constant insistence that there is nothing wrong with their aeroplanes, always trying to shift the blame onto others, usually the pilots and flight crew.
The usual reasons given for blaming the pilots were that they did not understand the information the computer was presenting to them, that they misread the computer display(s), or that they tried to get the aircraft to do something the computer did not agree with.
It is easy to say that the crew were to blame because they did this or did not do that, and that this is why ‘x’ happened; however, if we look at the radically different environment and control methods crews are now faced with, part of the problem soon becomes apparent. Most of today’s airline pilots have flown many thousands of hours on conventionally controlled aircraft, and now they are also required to learn to fly this new breed of aircraft, on which they are expected to be more of a systems manager and less of an airline pilot. The pilot no longer has ‘the feel’ of the aircraft: when the stick is moved, it moves the appropriate surface via a digital link, and the electrical interface gives the pilot no feedback on how the aircraft, or a particular piece of equipment, is responding to his commands. This is unlike the conventional aircraft they were trained on and are used to flying, where moving the stick actuated the appropriate surface by electrical, mechanical and/or hydraulic means, and the pilot could feel the reaction through the column.
The way a pilot judges the status and performance of the aircraft visually has also changed radically. A traditional aircraft had banks of switches, dials, knobs, lights, etc. that the flight crew could read at a glance to confirm that all was well, or to identify any problem easily. With the introduction of the glass cockpit most of these dials have been replaced by a couple of computer screens containing a vast amount of information, and although these screens hold most of the information the crew need, it is not always easily read. Information in traditionally designed cockpits was usually analogue: the crew could watch a needle move up or down a dial and pick out trends. In the glass cockpit, information is displayed in digital form, and crews have found it more difficult to spot trends when the information is given as just a few digits; a change in one or more digits for a particular system may go unseen, and trends can go unnoticed. At times it may become almost impossible to note certain trends, as each screen can display only a small amount of the required information at any one time (otherwise it would become too cluttered to read). The information has to be split across a number of pages on a single screen, so a single piece of information on a particular page could quite easily be missed.
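One way to recover the trend information that a moving needle gives for free is to compute it from successive digital readings. The Python sketch below fits a least-squares slope to a short window of samples and turns it into a rising, falling or steady cue; the one-second sampling interval, the 0.5kt-per-second ‘steady’ band and the sample airspeeds are all HYPOTHETICAL values chosen for illustration.

```python
def slope_per_sample(readings):
    """Least-squares slope of the readings against their sample index."""
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_x = (n - 1) / 2
    mean_y = sum(readings) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(readings))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return sxy / sxx


def trend(readings, steady_band=0.5):
    """Classify the trend; `steady_band` (units per sample) is a HYPOTHETICAL threshold."""
    s = slope_per_sample(readings)
    if s > steady_band:
        return "rising"
    if s < -steady_band:
        return "falling"
    return "steady"


# HYPOTHETICAL airspeed samples (kt), one per second.
samples = [170, 168, 165, 162, 160, 157]
print(f"{slope_per_sample(samples):.2f} kt/s ->", trend(samples))
```

Each individual reading changes by only a few knots, but the computed slope of about -2.6kt per second makes the steady decay obvious at a glance.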
As mentioned before, the Flight Management and Guidance Systems (FMGS) that control modern aircraft must be in specific modes before they can perform certain tasks. If the FMGS is in a different mode from the one required, problems can occur, as in the cases mentioned earlier of the Fokker F100 and the A320 at Habsheim. (The crash of an A320 near Strasbourg on 20th January 1992 was due to confusion between the Flight-Path Angle (FPA) and Vertical Speed (VS) descent modes selected on the FMGS. The crew had not noticed that they were in VS mode when they should have been in FPA mode. As a result the aircraft descended at a vertical speed of 1,100 metres per minute when it was only 1,500 metres above the ground (Risks 14.74).)
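To make the mode confusion concrete, here is a minimal Python sketch of how one descent selection can produce very different outcomes depending on the active mode. Only the 1,100 m/min rate and the 1,500 m height come from the account above. The 3.3° angle and the 160 kt ground speed are hypothetical values chosen for illustration, and the code is not a model of the real FMGS.

```python
import math
from enum import Enum


class DescentMode(Enum):
    FPA = "flight-path angle"  # selector value read as degrees below the horizon
    VS = "vertical speed"      # selector value read as metres per minute


# 1 knot = 1,852 metres per hour; divide by 60 for metres per minute.
METRES_PER_MIN_PER_KNOT = 1852 / 60


def descent_rate(mode: DescentMode, value: float, ground_speed_kt: float) -> float:
    """Return the commanded descent rate in metres per minute."""
    if mode is DescentMode.FPA:
        # A fixed angle gives a descent rate proportional to ground speed.
        horizontal_m_per_min = ground_speed_kt * METRES_PER_MIN_PER_KNOT
        return horizontal_m_per_min * math.tan(math.radians(value))
    # VS mode: the selected value is already the descent rate.
    return value


def seconds_to_ground(height_m: float, rate_m_per_min: float) -> float:
    """Time to descend height_m at a constant rate, ignoring any flare or recovery."""
    return 60 * height_m / rate_m_per_min


# Hypothetical intended selection: a 3.3 degree path at 160 kt ground speed.
intended = descent_rate(DescentMode.FPA, 3.3, 160)
# The rate quoted in the account above, flown because VS mode was active.
actual = descent_rate(DescentMode.VS, 1100, 160)

print(f"FPA: {intended:.0f} m/min, {seconds_to_ground(1500, intended):.0f} s to ground")
print(f"VS:  {actual:.0f} m/min, {seconds_to_ground(1500, actual):.0f} s to ground")
```

Under these assumptions the intended path descends at roughly 285 m/min, leaving a little over five minutes from 1,500 metres. The rate actually flown leaves about 82 seconds.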
Aircrews were used to flying aeroplanes over which they had complete control. Nowadays, even when they override the system, the FMGS has the final say over what they can or cannot do. The knowledge that pilots have accumulated over the years can never all be programmed into a computer. The way a pilot reacts to a situation is often a reflex action, and that action may not always follow a logical pattern. If the system will not allow the pilot to react instinctively in an emergency, valuable seconds could be lost in the conflict between the pilot and the FMGS as each attempts to resolve the situation.
Airbus’s continuing insistence that nothing was wrong with the computer hardware or software on those of its aircraft involved in incidents has been doubted but never disproved. Over the years accident reports have shied away from blaming software or hardware faults. Although there may have been suspicions, this type of fault was very difficult to confirm or reproduce.
However, a recent Air Accidents Investigation Branch (AAIB) report into a series of incidents that befell a Virgin Atlantic Airbus A340 has made history as the first to cite the reliability of the hardware and software specifically as a major factor. The incidents occurred on 19th September 1994, when the A340 (G-VAEL) was flying from Narita Airport, Japan, to Heathrow Airport, London. To be more accurate, the troubles started before the flight had even left the ground.
During preparations for the flight, one of the two Fuel Control Monitoring Computers (FCMC) indicated numerous faults. The aircraft therefore departed with just one FCMC operating (an accepted procedure), and the crew followed the correct procedure for calculating fuel in this situation. Early in the flight the map symbology on the commander’s Electronic Flight Instrument System (EFIS) disappeared, and all calculations on his Multipurpose Control and Display Unit (MCDU) ceased. The co-pilot’s EFIS and MCDU were working, so both sets of units were slaved to the co-pilot’s Display Management Computer (DMC). After about an hour the commander’s EFIS was restored, and a little later the FCMC was restored by resetting the computer.
As the aircraft neared Heathrow, the co-pilot manually tuned the Lambourne VOR (radio beacon) using his MCDU to check that their EFIS navigation displays were working correctly. A few miles east of the beacon “the commander’s EFIS map display symbology froze and lost all computed data for no apparent reason. His MCDU displayed the message ‘PLEASE WAIT’ together with the data entry page normally seen only when initialising the computer before flight; he was unable to obtain any other display.” (AAIB 3/95). At roughly the same time the co-pilot’s EFIS and MCDU behaved in an identical manner.
By this point there was not enough time to try to restore the EFIS, so after contacting Air Traffic Control (ATC) the crew decided to make an Instrument Landing System (ILS) approach. (Not all flight control information was lost from the EFIS, and what remained enabled the crew to continue to ‘fly’ the aircraft.) Whilst tuning the navaids the crew received an ECAM warning of low fuel state, followed by instructions to open the crossfeeds for engines 3 and 4 (the aircraft has several tanks, and fuel is pumped between them). After a short while the warning recurred, this time with instructions to open the crossfeeds for engines 1 and 2. The readings now indicated 4.5 tonnes of fuel remaining, some two tonnes less than the crew expected. After a discussion with ATC the commander decided to declare an emergency, which would give them a priority landing.
The autopilot was being used to capture the ILS, and the glideslope was intercepted at five miles. However, the autopilot had difficulty following the glide path and was temporarily disconnected. The tower was informed and a Surveillance Radar Approach (SRA) was requested.
"Initially the SRA proceeded normally and the co-pilot re-engaged the autopilot in heading and height modes. On the base leg heading of 180°, when a left turn onto 130° was demanded using the heading selector knob, both heading bugs went left to 130° and the commander's flight director bar went to the left. However, the co-pilot's flight director bar went to the right and the aircraft turned right. At this stage the co-pilot disconnected the autopilot and flight directors and flew the aircraft manually in accordance with the headings and advisory altitudes. However, because of the unwanted turn, the aircraft overshot the centreline and large heading corrections were required to regain it. When the aircraft was established on the centreline and descent profile, both sets of ILS indications appeared to give correct information. After informing the crew that the RVR was now 1,300 metres, the controller cleared the aircraft to land and continued with the 'talkdown' commentary. The crew saw the runway at about 500 feet altitude and the aircraft landed at 1503 hours. After taxiing in and shutting down, the fuel indications recovered to 4.5 tonnes." (AAIB 3/95)
The AAIB identified several problem areas with the software and hardware on the A340, some of which were already known to Airbus Industrie. A brief summary of the problem areas and findings, taken from the AAIB report (AAIB Bulletin No: 3/95 Ref: EW/C94/9/2 Category: 1.1), is given below.
AUTOPILOT AND FLIGHT DIRECTOR HEADING PERFORMANCE.
“The reason for the wrong response of the autopilot and one flight director to the left turn demand was a software error.” This error, along with several others, was known to Airbus Industrie, and the corrective measures are “contained in Flight Management Guidance Envelope Computer (FMGEC) standard L-5 and has been issued and incorporated in most A340’s on the UK register.”
FUEL QUANTITY INDICATIONS.
In July 1994 Airbus Industrie issued an Operations Engineering Bulletin on the subject of fuel quantity indication. I include this bulletin as written, to demonstrate how easy it is for aircrews to become confused about fuel quantity, both before and after reading it. The AAIB contacted several operators of the A340 and found that all had experienced problems of this nature. Some had overcome them with help from Airbus Industrie, and those still affected were hoping for an improvement from the installation of extra fuel probes in the inner tanks and FCMC standard 6. Meanwhile, one operator carried an extra 2 tonnes of fuel on every flight just to prevent nuisance low fuel level warnings.
BULLETIN:-
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
REASON FOR ISSUE
Several cases of abnormal fuel quantity indications have been reported by operators.
EXPLANATIONS
The following phenomenon’s have been observed;
1. On ground when FOB quantity is above 75t there is an under-read quantity (actual FOB above ECAM indication). The cockpit indication may be out of tolerance by 45 kg for each ton above 75 t fuel on board.
2. The inner tank quantity indication is affected by the significant pitch variations. This induces:
- an over-read in climb (up to 4t at the end of the climb)
- an under-read in descent (up to 1 t at the end of the descent).
3. When the fuel quantity in each inner tank is between 3 and 7 t, an under-read of up to 1,3t on the FOB may occur. This can be noticed especially on ground at pump ON selection. After pump OFF selection a stabilisation time of about 10 minutes is necessary to get an assessment of fuel remaining at the end of flight. In flight this phenomenon can induce some fluctuations of the indicated quantity linked to fuel shifting due to aircraft movements.
ACTIONS
1. A solution to cancel this anomaly is under investigation.
2. & 3. A solution to these anomalies will be provided with FCMC standard 6.
RECOMMENDATION
In flight consider that a stabilisation time of 15 minutes in cruise is necessary to get an accurate fuel indication.
In addition, as recommended in SOP 3.03.16 P.2:
- during descent preparation: check FUEL USED indication,
- in Go Around phase: Use FUEL USED indication to determine remaining fuel quantity.
On ground with inner tank quantity between 3 and 7t, if the fuel quantity decreases at pumps ON selection, consider that the right value is the value with pumps off.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
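The arithmetic behind item 1 of the bulletin, and the FUEL USED cross-check it recommends, can be sketched in a few lines of Python. This is a minimal illustration, not an Airbus procedure. The function names are hypothetical, and the 60 t starting figure and 53.5 t burn are invented so that the result matches the two-tonne gap between indicated and expected fuel on G-VAEL.

```python
THRESHOLD_T = 75.0          # item 1: the ground under-read applies above 75 t on board
UNDER_READ_KG_PER_T = 45.0  # item 1: up to 45 kg for each tonne above the threshold


def max_ground_under_read_kg(fob_t: float) -> float:
    """Worst-case ground under-read (kg) for a given fuel on board (t), per item 1."""
    return max(0.0, fob_t - THRESHOLD_T) * UNDER_READ_KG_PER_T


def remaining_from_fuel_used(fob_at_start_t: float, fuel_used_t: float) -> float:
    """Remaining fuel (t) derived from the FUEL USED indication, as the SOP recommends."""
    return fob_at_start_t - fuel_used_t


def gauge_shortfall_t(indicated_t: float, fob_at_start_t: float, fuel_used_t: float) -> float:
    """How far the gauge reads below the FUEL USED cross-check (positive = under-read)."""
    return remaining_from_fuel_used(fob_at_start_t, fuel_used_t) - indicated_t


print(max_ground_under_read_kg(100.0))  # 1125.0: 25 t above 75 t x 45 kg
print(max_ground_under_read_kg(70.0))   # 0.0: below the threshold

# Hypothetical flight: 60 t at the start, 53.5 t burned, gauge showing 4.5 t.
print(gauge_shortfall_t(4.5, 60.0, 53.5))  # 2.0
```

Item 1 applies on the ground only. The in-flight pitch effects in item 2 (up to 4 t in climb) are far larger than this ground error, which is why the bulletin tells crews to rely on FUEL USED during descent preparation and go-around.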
FMGS DOUBLE FAILURES.
“After landing the aircraft's Central Maintenance System had logged a fault in No 2 FMGEC. This was removed and sent to France for data extraction and fault analysis. No fault was found within the hardware and a comparable software fault could not be reproduced on the test bench. Nevertheless, the BITE data dump showed that at 1435 hours the No 2 FMGEC had detected a CLASS 1 HARD failure within itself and a simultaneous fault within FMGEC No 1. The investigation was complicated by the involvement of several sub-contractors in the manufacture of the FMGEC and its database.”
The AAIB contacted other A340 operators and found that this was not an isolated incident: half of those contacted had suffered double failures of the FMGS. Airbus Industrie were well aware of the problem, which had first emerged with the A320, and have succeeded in reducing the frequency of double FMGS failures, but have not as yet eradicated the problem.
All this led the AAIB to issue the following recommendation.
“It is recommended that the reliability of the Airbus A340 FMGS and the fuel management system should be reviewed to ensure that modified software and hardware required to achieve a significant improvement in reliability is introduced as quickly as possible and the subsequent system performance closely monitored.” [Safety Recommendation 95-1]
The two incidents dealt with in depth in this essay (the Trident 1 and the A340), although separated in time by over 22 years, both involved ‘state of the art’ aircraft. Both had the latest safety systems installed, yet one crashed and the other ran into numerous difficulties. The systems on board were designed as safety critical systems: they were designed not to fail, and if they did fail, to fail safely, without endangering the crew or the passengers. Nevertheless the systems did fail. In the Trident 1 disaster the ‘system’ failure was not so much a piece or pieces of equipment failing as the inadequacy of the system to perform to its required and designed level, and to provide the level of safety needed where a failure could (as in this case) lead to loss of life.
With the A340 incident systems did fail. Nobody was killed or injured, but the crew were forced to declare an emergency because the system led them to believe they were running low on fuel. As it turned out, the fuel levels were more than adequate. Nevertheless, had this happened over an ocean rather than near Heathrow airport, the crew would have been looking for a place to ditch, which would have started a full-scale emergency. The failure of both the commander’s and the co-pilot’s EFIS and MCDU, and the software error that made the aircraft turn right instead of left, could have had serious consequences. Was it a similar problem that recently caused an Airbus to crash in Indonesia? Apparently ATC instructed the pilot to turn left, but the aircraft turned right and flew into a mountain. Until the official report is released we can only surmise what happened, but the similarities are there.
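A left-versus-right error of this kind is easy to state numerically. On the Heathrow SRA described in the AAIB account, the demand was a turn from a heading of 180° onto 130°, which is a 50° turn to the left. The minimal Python sketch below computes the shortest turn between two headings and adds a hypothetical cross-check that flags two channels commanding opposite directions, as the two flight director bars did on G-VAEL. The function names are illustrative only and say nothing about how the A340’s own software is structured.

```python
def shortest_turn(current_deg: int, target_deg: int) -> tuple[str, int]:
    """Direction and size of the shortest turn from the current to the target heading."""
    clockwise = (target_deg - current_deg) % 360  # Python's % gives a result in 0..359
    if clockwise == 0:
        return ("none", 0)
    if clockwise <= 180:  # a 180-degree reversal is arbitrarily treated as a right turn
        return ("right", clockwise)
    return ("left", 360 - clockwise)


def channels_disagree(cmd_a: tuple[str, int], cmd_b: tuple[str, int]) -> bool:
    """Hypothetical cross-check: flag two channels commanding different turn directions."""
    return cmd_a[0] != cmd_b[0]


# The SRA turn from the AAIB account: heading 180 degrees, new heading 130 degrees.
expected = shortest_turn(180, 130)
print(expected)  # ('left', 50)

# A channel that turns right to reach 130 degrees has to go the long way round.
faulty = ("right", 310)
print(channels_disagree(expected, faulty))  # True
```

A comparison like this can only detect a disagreement. Deciding which channel is right still needs a third source of information, which on the day was the crew.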
Safety critical systems are systems that are, or should be, safety orientated: even if they fail, they should fail gracefully. They should be as error-free as possible and provide the protection they were designed for. However, as we have seen, safety critical systems do not always live up to their name, and things can go very wrong, sometimes with tragic results. With the growing trust being placed in technology today, we should be aware that the systems we rely on are only as good as the person(s) who designed and built them. Safety critical systems must also be reliable, because the people operating the equipment must be able to trust them. Yet there are many instances of systems giving false alarms, and a system that ‘cries wolf’ too often could eventually lead its operators to ignore or override it (as happened in the early days of the Trident).
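Why false alarms pile up can be shown with Bayes’ theorem. In the minimal Python sketch below all the figures are hypothetical. Suppose a real fault is present on only 1 occasion in 1,000, and a monitor catches 99% of faults but also alarms on 1% of healthy occasions. Only about 9% of its alarms will then be genuine. Rare faults make even a fairly accurate monitor ‘cry wolf’ most of the times it sounds.

```python
def share_of_genuine_alarms(fault_rate: float, detection_rate: float,
                            false_alarm_rate: float) -> float:
    """Fraction of alarms that reflect a real fault, by Bayes' theorem.

    fault_rate:       fraction of occasions on which a real fault is present
    detection_rate:   probability the monitor alarms when a fault is present
    false_alarm_rate: probability the monitor alarms when no fault is present
    """
    true_alarms = detection_rate * fault_rate
    false_alarms = false_alarm_rate * (1.0 - fault_rate)
    return true_alarms / (true_alarms + false_alarms)


# Hypothetical monitor: catches 99% of faults, false-alarms on 1% of healthy
# occasions, watching for a fault that is present on 1 occasion in 1,000.
share = share_of_genuine_alarms(0.001, 0.99, 0.01)
print(f"{share:.1%} of alarms are genuine")
```

The false alarm rate dominates the result. Cutting it improves trust in the alarm far more than raising the detection rate further.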
In their eagerness to capture markets, manufacturers are filling their aircraft with innovative ideas and technologies, and it has become a race to see who will have the most sophisticated aeroplane in the air first. There is nothing wrong with this, provided that safety is not compromised. Testing new systems before they go into service on an aircraft has always been a complicated procedure, especially when trying to foresee every permutation that could occur, and the use of computers in today’s aircraft makes it harder still. There can be many millions of lines of code in the computer systems of a modern airliner. Testing every permutation before the aeroplane entered service would take so many years that the design would be obsolete before it ever flew. Aircraft manufacturers therefore work on the principle of levels of risk. When a new piece of software (or hardware) is tested, many thousands of tests are conducted to try to make it fail, and only when testing has been completed satisfactorily is it installed. Some programming errors will not be picked up by these tests, but the manufacturers believe that the chance of these errors arising, or of their affecting the safety of the aircraft, is minimal, and that these are acceptable risks. Yet these ‘acceptable risks’ appear with alarming frequency, as in the case of the A340. Much greater effort is needed to eliminate these risks and make the systems more reliable, since it is not just the aircraft that are at risk but the lives of everyone on board.
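The scale of the testing problem can be shown with simple arithmetic. Suppose, purely hypothetically, that a piece of software’s behaviour depended on n independent on/off conditions, and that a test rig could run a million complete tests per second. Exhaustive testing then needs 2^n runs, and the minimal Python sketch below converts that into years. Real avionics software has far more state than a handful of switches, so the figures are only illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def exhaustive_test_years(conditions: int, tests_per_second: float) -> float:
    """Years needed to run every combination of n independent on/off conditions once."""
    combinations = 2 ** conditions  # each extra condition doubles the count
    return combinations / tests_per_second / SECONDS_PER_YEAR


for n in (20, 40, 64):
    print(f"{n} conditions: {exhaustive_test_years(n, 1_000_000):.3g} years")
```

With 64 conditions the run takes roughly 585,000 years, and each additional condition doubles it. This is why manufacturers fall back on targeted testing and acceptable levels of risk rather than exhaustive checking.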
It is said of modern aircraft that they can take off, navigate their own way across the skies and land without the aid of a pilot. Yet, as we have seen, when things go wrong it is the human element that has had to intervene and try to rectify the problem. No matter how many safety critical systems we put in place (the A340 has systems monitoring systems), there will always be a need for the human element to oversee operations. Surely, then, we should regard the human element as the most important safety critical system of all?
Bibliography
Faith, N. (1997) Black Box: Why Air Safety Is No Accident. London: Macmillan Publishers.
Hatton, L. (1996) A340 Shenanigans. The Risks Digest [Online], 16.92. (http://catless.ncl.ac.uk/Risks/16.92.htm#subj2)
Ladkin, P. (1996) A340 Accident at Heathrow. The Risks Digest [Online], 16.96. (http://catless.ncl.ac.uk/Risks/16.96.htm#subj7)
Mellor, P. (1993) Strasbourg A320 Crash. The Risks Digest [Online], 14.74. (http://catless.ncl.ac.uk/Risks/14.74.htm#subj4)
Neumann, P.G. (1995) Computer-Related Risks. New York: ACM Press.
Reason, J. (1990) Human Error. Cambridge, England: Cambridge University Press.
Stewart, S. (1994) Air Disasters: Dialogue from the Black Box. Leicester, England: PRC.
UK Air Accidents Investigation Branch (1995) AAIB Bulletin No: 3/95, Ref: EW/C94/9/2, Category: 1.1.
Chronicle of the 20th Century (1996) [CD-ROM]. DK Multimedia.