Getting it right – avoiding maintenance error

Aircraft Engineering and Aerospace Technology

ISSN: 0002-2667

Article publication date: 1 April 2005



Citation

(2005), "Getting it right – avoiding maintenance error", Aircraft Engineering and Aerospace Technology, Vol. 77 No. 2. https://doi.org/10.1108/aeat.2005.12777bac.001

Publisher: Emerald Group Publishing Limited

Copyright © 2005, Emerald Group Publishing Limited



Keywords: Conferences, Aircraft industry, Maintenance

A conference on this very large topic was held recently at the Royal Aeronautical Society, covering airframe, systems and component design, engineering, procedures, training, management, regulation and human factors. Despite recent initiatives, errors continue to occur, and speakers drawn from a wide variety of fields attempted to identify the critical areas. The keynote address was given by Bernard Newman, technical director of TUI Airline Management, who began by stating his belief that some progress has been made in reducing these errors but that we have reached a plateau. As controlled flight into terrain (CFIT) and other pilot-related accidents are reduced, maintenance-related accidents and incidents climb the ranking of causes. The benefits of getting it right first time must be re-emphasised, and Safety Management Systems and risk assessments should be used to pinpoint efforts rather than as a source of interesting statistics.

He referred to Human Factors, a generic heading for a very complex subject, and explained that he draws his knowledge from three sources. First, he has been a post holder in a JAR 145 organisation for 17 years. Second, he made mistakes in the earlier years of his career and remembers them. Third, with this experience he is determined to do all he can, along with others, to understand our vulnerabilities while developing learning and trust between staff and management.

The question must be asked: “Is progress in Human Factors consistent across the UK and indeed Europe? Advanced flight deck technologies have made, and will continue to make, significant reductions in avoidable accidents. However, the accident over the German/Swiss border is a timely reminder that nothing is foolproof. Modern systems are more complex, and it may be asked whether we understand all the other interfaces that may be necessary, for example, for a control output to operate as advertised. In recent accidents, have deficiencies in maintenance skills resulted from ineffective training, or from systems integration and complexity? In the 1990s “Ageing Aircraft” and “Human Factors” were major initiatives driven by major accidents. The former has resulted in satisfactory continued structural integrity. Human Factors has been the subject of many studies and conferences, and a typical list of the top headings that lead to mistakes is: Job Cards/Stage Sheets; Shift Manning and Control; Quality Assurance out of hours; Tool Control; and Airport Environment and Bylaws. The implication of the last of these may be that staff are at times put under undue pressure by over-zealous airport officials, leading to a lowering of airworthiness standards.”

Today more people are trained and experienced in Human Factors, and with modern software, Safety Management Systems and risk assessment can pinpoint where we should focus our effort. We are moving towards a “Just Culture”, but there is much to do, and the way middle management acts and behaves is, I think, key to reducing incidents in the future.

Safety management and changing practices

Jim McKenna, chief surveyor and head of Aircraft Maintenance Standards at the UK CAA, presented a paper noting that the public perceives the role of any CAA as ensuring that the aircraft they fly on, and the infrastructure that supports them, are safe. Regulations and requirements shape the way the industry works; the regulator, however, is only part of the safety chain. Modern aircraft reflect the culmination of an evolutionary process. Any accident demonstrates once again that something or someone in the safety system has failed, and the public wish to know that it will not happen to them. One article in 2001 emphasised the difficulty of analysing trends and patterns. Many airlines take the view that, because they do not operate the type or were not directly involved, they have nothing to learn from the event. There is a familiar ring to the failure to detect the spoiler lock-out on the Excalibur A320 and the reversed connection of the sidestick controller on the Lufthansa A320: both were detectable had the proper “sense” and movement checks been carried out. We all know of occasions where deviations or disturbances have occurred that were not covered by the manufacturer's instructions. Underlying causes and factors should be the basis for changing practices.

The emphasis is on organisations holding regulatory approval, without ignoring individuals. Training and education are at the centre of the new requirements. In the 1990s the emphasis on organisational development led to the establishment of the JAA. The elements of most concern when looking at organisational effectiveness are personal competence and the misperception that the quality system is there to fix the issues found. The BAC 1-11 windscreen, Boeing 737 double engine oil loss and Airbus A320 spoiler lock-out incidents are well-known examples in which trained and well-experienced individuals, working within the appropriate organisational system, failed. Of greater concern was the fact that the investigations revealed more widespread failures and weaknesses in organisational processes and practices. The key issues revealed included the repeated failure of internal quality systems and management controls, and the degree to which individuals were ignoring processes and procedures.

Two types of post-accident solution are offered: technological and organisational. A repair or a stronger material might seem to be the answer, but not necessarily. The cracking of the Trident wing structure demonstrates this: the design fix of external doubler plates at Rib 8 moved the in-flight stresses to the inboard wing centre section, resulting in further cracking at Rib 0, which ultimately accelerated the type's withdrawal from service. Assumptions that the design was flawed are not necessarily true. We must manage the safety aspects of the human interface. How do you measure the effectiveness of training? How much value does an aviation maintenance organisation place on quality assurance? The simple fact is that an organisation's management is failing if it cannot make a quality system work to its advantage from a business perspective.

Recent incidents suggest that all is not well. The Airbus total fuel loss, and the Boeing 777 panel loss during climb which punctured the pressure hull, are two such occurrences. Over the years, the incorporation of human factors awareness into the requirements for maintenance organisation approval and for maintenance engineers has established the need to address human performance, error and its consequences. Other measures include the extension of the UK CHIRP (Confidential Human Factors Incident Reporting Programme) into the engineering and cabin crew areas. Safety management needs a live, real-time, proactive and reactive system that focuses on risk. It must gather intelligence and react. It must change to survive, and changing practices is at the heart of that.

A just – not a no blame culture

A paper based on a service career spanning several decades was given by Major Jeff Gardner of REME. We should be aiming for zero accidents attributable to maintenance error. In army aviation, 8 per cent of accidents are attributable to maintenance personnel. Some of these have recurred within the same year or over several years, and all could have been prevented. Many factors are involved and these are generally not single events, so they are considered jointly under appropriate headings.

Under Ethos, part of the training covers the requirement to follow instructions and, where necessary, to challenge inappropriate, incorrect or confusing information or advice. This openness ensures that when a maintenance error occurs within a unit, it is recorded, action is taken to rectify it and a detailed lessons-learnt process is then instigated. The report is passed to an external, independent Flight Safety representative, who evaluates the findings and considers wider distribution if required.

Under Repetitive Tasks: repetition should ensure that the technician fully understands the task and is more than capable of completing it correctly. However, it also means that the task may be carried out from memory or, worse, may result in an inadequate independent inspection. Proper management controls and correct monitoring should prevent this.

Change has a major bearing when what was deemed standard practice is suddenly no longer standard. Change is no longer one-dimensional, and there is a greater onus on management. The army is under constant change, and the overall result of this and other changes is that training, or at least induction, is required but in many cases may not be forthcoming. Nowadays this can result in technology expertise being orchestrated from the bottom up rather than controlled by management from the top down. Relying too much on on-the-job training, or on inadequate training, creates uncertainty and contributes to maintenance errors.

Both senior and middle management play a key part in ensuring that maintenance errors are minimised. Management needs to be robust enough to set the right standards but flexible enough to encourage change and personal initiative. The correct level of supervision is essential to ensure that high standards are not compromised. Human Factors is without doubt a major contributor to maintenance errors, and due regard is now given to it at all levels. Army aviation now fully embraces HF and has introduced it as a subject in its own right at all stages of technical training.

Distractions can directly affect the performance of any maintenance task, and in many instances the distraction is a break in routine. The correct level of supervision, good management and an environment of minimised pressure all play a part. Thus, the route to reducing maintenance errors is through sound management, openness and the simplest possible method of work. This is best achieved by identifying and resolving HF issues and by creating a “Maintenance Bubble” that reduces perceived and actual pressures and encompasses timely supervision and independent inspections.

Preventing maintenance error – a regional airline view

This contribution came from Simon Witts, engineering director of British Airways CitiExpress, who explained that his role is to ensure that the airline's 523 engineering and maintenance staff, who deliver safety, have everything in place to carry out their roles safely. Regulations cover design, manufacture, testing and certification, and in-service activities (engineering and maintenance). The focus is on maintenance error because maintenance is the last link in the chain. A maintenance error chart was shown giving the number of investigations, which he presented as evidence that the organisation has a healthy culture.

A maintenance error is not something to fear but something to harness. If we detect an error in the workpack before it is issued for implementation, we simply correct it, because it has not yet become a safety issue. If we detect it on the shop floor through an event, it can cause alarm because it may be classed as a maintenance error. But if the error is harnessed through a maintenance error investigation (MEI), we learn from it.

In my organisation we have sought to stress the positive aspects of an MEI, and we work with the process to use an event as a learning experience. Great progress has been made in the industry with the adoption of a new approach to HF in training and continuation training. Several examples showed how a design may fail to prevent an incident, leaving reliance on the Maintenance Manual to prevent it. HF work was then built into a safety management approach (CAP 712, etc.).

The way forward is closer links between all the organisations mentioned earlier through a closed-loop Safety Management System, so that a proactive joint approach prevents incidents before they happen.

Design dictates the maintenance burden

Stephen James of the UK CAA explored opportunities for designers to minimise the potential for maintenance error at source by employing “human centred design” (HCD) principles. Design choices driven by weight, cost and similar factors dictate much of an aircraft's maintenance burden, and he suggested minimising maintenance error as an additional design goal, referred to as “HCD”.

The regulator achieves an acceptable level of safety by both proactive and reactive methods. The object of the paper is to show how the potential for human error can be addressed at source, i.e. by proactive means. This is against a background in which 25 per cent of all accidents and serious incidents with human error causal factors are maintenance related. Various EASA standards are reviewed.

The definition of HCD is “Human Centred Design – design that accommodates realistic human performance characteristics in achieving total system safety”. Aircraft maintenance is an important part of achieving total system safety and therefore maintenance error is a prime focus for our initiatives.

In CS 25 (Large Aeroplanes), a few HF-related requirements exist, but they follow no overall strategy or intent. There is no top-level HCD requirement that considers all elements of the aeroplane design and demands that the risk from human error be systematically addressed. The CAA has therefore developed new requirements and associated advisory material. It was recognised that the new requirement should be well defined and that predictive tools cannot be 100 per cent effective. The advisory material states that “the intent of this requirement is to address the risk of hazardous or catastrophic failure due to the effects of human error. No foreseeable human error acting singly or in combination should have the potential to result in a hazardous or catastrophic event.” After further guidance, the material details one way (but not the only way) of conducting a qualitative risk analysis, employing an inductive (bottom-up) analysis method and exercising sound and informed engineering judgement.
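The report names an inductive (bottom-up) analysis but does not show what one looks like. The sketch below is a hypothetical illustration, not the CAA's method: it assumes a worksheet in which each maintenance task lists foreseeable single human errors, each with a worst-case effect assigned by engineering judgement, and flags those that reach the hazardous or catastrophic class with no recorded mitigation. The class names, fields and example rows are invented for illustration, and the screen covers single errors only; errors "in combination" would need a further step.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Severity(IntEnum):
    """Failure condition classes, ordered from least to most severe."""
    NO_SAFETY_EFFECT = 0
    MINOR = 1
    MAJOR = 2
    HAZARDOUS = 3
    CATASTROPHIC = 4


@dataclass(frozen=True)
class ErrorMode:
    """One foreseeable human error for one task (hypothetical worksheet row)."""
    task: str
    error: str
    worst_effect: Severity            # assigned by engineering judgement, not computed
    mitigation: Optional[str] = None  # e.g. a design feature or functional check


def unmitigated_hazards(rows: List[ErrorMode]) -> List[ErrorMode]:
    """Bottom-up screen: keep every single error whose worst credible effect is
    hazardous or catastrophic and which has no recorded mitigation."""
    return [
        r for r in rows
        if r.worst_effect >= Severity.HAZARDOUS and r.mitigation is None
    ]


if __name__ == "__main__":
    # Illustrative rows and severities only; not taken from the advisory material.
    worksheet = [
        ErrorMode("Refit access panel", "Fastener omitted", Severity.MAJOR),
        ErrorMode("Reconnect flight control", "Connection reversed",
                  Severity.CATASTROPHIC, mitigation="Functional 'sense' check"),
        ErrorMode("Refit engine drive covers", "Covers not refitted",
                  Severity.HAZARDOUS),
    ]
    for row in unmitigated_hazards(worksheet):
        print(f"{row.task}: {row.error} -> {row.worst_effect.name}")
```

In this illustration the reversed connection is not flagged because a functional check is recorded against it, which mirrors the advisory material's option of controlling errors through BITE or functional checks rather than eliminating them.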

To further assist the applicant in identifying any potential for error, the advisory material highlights known problems, with the expectation that any analysis would include them but not be limited to them. The examples of maintenance error include those relating to fasteners. The choice of fasteners must consider the effects of human error, including omission, incorrect assembly, mixing of fasteners similar in fit but not in function, accessibility and incorrect application of torque; thread specifications that allow a female thread of one specification to engage with a male thread of another should be avoided. Wherever possible, removable blind fasteners should be avoided, and lengths of common fasteners should either be standardised or be clearly and visibly different. Non-structural fasteners should be clearly different from structural fasteners.

As regards accessibility provisions, safety-critical tasks must be reasonably achievable under the worst environmental and logistical conditions expected in operation, and there must be adequate clearances for any adjustments and servicing to be performed. This should be determined by zonal analysis and validation of tasks. Wiring and rigging of systems should be insensitive to human error during maintenance, or such errors should be controlled by the provision of BITE (built-in test equipment) or functional checks.

MOD – developing the joint approach, part 1

An explanation of convergence in the UK Services was given in this paper by Cdr. Brian Parsons of the UK DLO. Convergence means change. The task was to take the AP 100 series of single-Service regulations and bring them into a single, converged set of regulations, supported by common forms and procedures, to allow aviation engineers across the military air environment (MAE) to complete their mission efficiently while maintaining the airworthiness of the platforms. To meet the task, a team from all three Services with a wealth of experience was put in place at RAF Wyton. It was a lengthy process requiring a series of rolling amendments, one every three months, each introducing more regulations while single-Service legacy regulations were withdrawn in parallel. In a task of this size, as in any project that has an impact on safety, there must be checks and balances.

Of the initial 291 single-Service instructions identified for convergence, 118 were either deemed no longer required, incorporated into an existing chapter, or found to have their policy contained in another publication. Convergence is not easy: it requires resources, dedication, motivation, opportunism, pragmatism and good communication. Most of all, however, it requires commitment. The single-Service regulations had understandably evolved in different ways, and the convergence process highlighted many anomalies within single-Service regulations and between the three Services. It was readily apparent that many single-Service engineering activities were over-regulated and over-complicated. Some de-regulation was necessary, although this was not without risk. Where the risk of process failure was deemed unacceptable, more prescriptive regulation was required to provide clarity and safety. Some important lessons have been learnt during the preparation of the Joint Aviation Publication (JAP), and the concept of the publication has evolved and improved since its initial issue in January 2002, notably by minimising duplication between chapters.

Associated improvements of note include the creation of tri-Service authorisations. In addition, there have been several areas where the fundamental review of processes has led to the introduction of new procedures, which have improved efficiency and reduced the potential for human error. Although not immediately apparent, the benefits of creating tri-Service Military Aviation Engineering Policy and Regulation have been immense. For example, there have been significant benefits for the operation and maintenance of aircraft in terms of continuous charge, self-supervision, tool control and the administration of authorisations.

Outstanding issues for the JAP centre on the production of the last remaining chapters, which are approximately two-thirds complete. There still remains the perennial question of how closely the military and civilian regulatory systems should be aligned. Is a common database of definitions worth pursuing, or should we accept that there will always be two aligned but different systems?

MOD – developing the joint approach, part 2

This paper was given by Cdr. Ian Peck of the Defence Aviation Safety Centre (DASC), who included several illustrations. One, a pie chart, showed a breakdown of tri-Service Cat 4/5 accidents by cause for 1994-2003: 52.21 per cent were caused by human factors (aircrew), 5.31 per cent by human factors (non-aircrew), 23.01 per cent by technical fault, 4.42 per cent by natural operating risk and 11.5 per cent by other causes, while 3.54 per cent were not positively determined. Another chart showed RAF ground incidents for 1990-2003, from which it was apparent that the great majority, particularly very recently, involved no risk. The “event iceberg” was also shown, with accidents at the peak, incidents below them and unreported events “under the water”.
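The six percentages quoted from the pie chart sum to 99.99 rather than 100, which is consistent with each share having been rounded to two decimal places. A short check of that arithmetic, with the category labels paraphrased from the chart description above; it also combines the two human factors categories, which together account for well over half of these accidents.

```python
# Tri-Service Cat 4/5 accidents by cause, 1994-2003, as quoted in the talk (per cent).
breakdown = {
    "Human factors (aircrew)": 52.21,
    "Human factors (non-aircrew)": 5.31,
    "Technical fault": 23.01,
    "Not positively determined": 3.54,
    "Natural operating risk": 4.42,
    "Other causes": 11.5,
}

# Sum of the shares; a value just short of 100 is consistent with rounding.
total = sum(breakdown.values())
print(f"Total: {total:.2f} per cent")

# Human factors of either kind, the category of most interest to this conference.
hf_share = breakdown["Human factors (aircrew)"] + breakdown["Human factors (non-aircrew)"]
print(f"All human factors: {hf_share:.2f} per cent")
```

The script prints a total of 99.99 per cent and a combined human factors share of 57.52 per cent.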

Common features of maintenance error accidents include circadian lows (since most maintenance is carried out at night), supervisors tackling long hands-on tasks, interruptions, failure to follow manuals, confusing and misleading manuals, shift hand-overs, time pressures, shortages of equipment, spares or staff, and the “can-do” attitude. The most common feature of all such accidents is the cultural acceptance of all these factors.

The question is: what is the MOD doing about this? It has a long history of HF knowledge; HF training is given during the initial and trade training phases, together with some local initiatives. Information on both current and future activities is collected and analysed, but a pan-Defence HF policy is needed to bring these together. The policy will state that a “Just Culture” is to be encouraged and an effective reporting system established, and will set out the full training requirement (who, when, how often, and who pays?).

The flight safety information management system (FSIMS) was originally an RAF Strike Command project that provided web-based, formatted input for occurrence reporting. It is easier for aircrew and engineers to use and can collect and analyse much more information on HF causal and contributory factors. It also allows development in service. The MOD has a maintenance error management system (MEMS), although the system is not coherent. Local initiatives exist at RAF Lyneham and RAF St Mawgan (SAR Force).

Training requirements cover the three Services plus civilian contractors, with different cultures and stages of development. The route is policy, training needs analysis, and identification of objectives and needs, combining “top down” and “bottom up” approaches. A “train the trainers” programme will involve a two-day course at DASC covering the psychology of error and HF in the military aviation maintenance environment. The challenges include the need for some tailoring and refreshment to maintain relevance, and the successful development of FSIMS training will provide the information needed to show that we are preventing maintenance error. A final thought: will we be preaching the same lessons in seven years' time?

Corporate responsibility for airworthiness: from a lawyer's desk

Corporate responsibility can cover many areas, but in the context of this conference it should be understood as “the responsibility for airworthiness and safety which is imposed on the Management of those corporations involved in the Aviation industry” in the widest sense. Responsibility must be distinguished from liability, the latter having a louder ring to it. It has long been established that those causing death, injury or loss to others through acts or omissions that breach the duty of care owed to the injured party should be civilly liable to compensate the injured for their loss, and may in certain cases be punished on conviction of a criminal offence.

An example is the accident to the British Midland 737, the circumstances of which have been well publicised. The engine oil covers had not been refitted on the evening before the accident, and the engine ground run was not carried out. This was a breach of the airline's maintenance procedures; had the run been done, the problem would have been discovered. There were other shortcomings, and British Midland was fined in 1996, becoming the first airline to be convicted of negligently endangering human life. Thus, responsibility in the necessarily highly regulated aviation industry is not confined to the man on the shop floor but reaches into the boardroom.

One question is whether corporate responsibility is enhanced by punishment. Aside from this, the sources of corporate responsibility can be categorised into four principal groups. First, ICAO has established a specialised agency for safety oversight, and the European Aviation Safety Agency (EASA) is responsible for regulating the continued airworthiness of most of the aircraft operating in the EU. Second, there is legislation, subsidiary legislation and associated regulations. Third, judicial precedent arising from cases must be considered. Fourth, guidelines and advice can have the effect of imposing corporate responsibility in certain situations.

The CAA can be regarded as the principal enforcer of aviation and related regulations. The role of the Health and Safety Executive (HSE) tends to be restricted more to employee cases, at least as regards prosecutions. There are clearly many areas where the functions and powers of the CAA and the HSE overlap. The third body involved in enforcement is the police.

A high-profile case in the USA involving a ValueJet DC-9 aircraft led to the first criminal indictment in the US of an aviation organisation for manslaughter, in which SabreTech was found guilty on eight charges relating to the reckless transportation of hazardous materials, in addition to another charge relating to the training of employees.

According to CAP 712, there are three key elements of a Safety Management System (SMS): a comprehensive corporate approach to safety; an effective organisation for delivering safety; and systems sufficient to achieve safety oversight. With regard to the last of these, one of the greatest contributions comes from the sharing of safety-related information.

The concluding part of the paper looks at the potential consequences of a failure of corporate responsibility, and instances were quoted concerning civil liability and regulator liability. Human factors (HF) often contribute to aviation accidents. CAP 716 deals with Aviation Maintenance Human Factors, with the emphasis on practical guidance material for real-world situations. Further regulations are being considered. The paper considered some of the essential functions in exercising corporate responsibility, in the hope of increasing awareness of areas with particular impact on the activities of a company and its management. It is also hoped that the cautionary tales will better equip the reader to deal with the present and, more especially, the future.
