Why Do Airplanes Crash?
An Open Source Air Data Inertial Reference Unit Investigation
***
2012 PSU/Galois Capstone Project
Chris Andrews, Trang Nguyen, Mark Craig, Kayla Seliner
Air Data Inertial Reference Unit
•Our Project: building an open source ADIRU
•Overview: what is an ADIRU?
•Motivation: why are they important?
•Fault Tolerance: types of faults.
•Approach: voting methods.
•Design: hardware and software architecture.
•Results
•Conclusion
Project Goals
• Construct a small, low-power ADIRU system to
deploy on an RC aircraft
• Implement a Byzantine fault tolerant algorithm
on a system of multiple microprocessors (voters)
and sensors.
• Use input from multiple sensors including
gyroscopes, accelerometers, GPS, and airspeed.
• Use open source hardware and software when
possible.
Air Data Inertial Reference Units are an essential
component in modern avionics systems
•The ADIRU system collects and processes sensor values from
accelerometers, gyroscopes, altimeters and airspeed indicators
and functions as the single source of sensor data aboard the
aircraft.
•Many commercial aircraft including the Airbus A330 and Boeing
777 implement ADIRU units as part of their avionics suite.
•The Air Data Inertial Reference Unit may itself be triple
redundant.
•The ADIRU system replaces earlier fault tolerant triple modular
redundant systems.
•Autopilot and unstable flight regimes depend upon valid and
uninterrupted sensor data for safe flight.
TMR vs. ADIRU
Benefits of ADIRU Systems
•Redundancy: redundant sensors make the system
less vulnerable to a single sensor failure.
•Modularization and fault containment: the
ADIRU system is the single source of sensor data
for all the cockpit instruments and avionics
software on the aircraft.
•Deferred maintenance: some systems preserve a
sufficient margin of safety to operate with a
small number of faulty components, avoiding
expensive emergency repairs.
ADIRU Vulnerabilities
• System complexity
• Closed source, proprietary system
Triple Modular Redundant System
• Votes on outputs of three redundant sensors.
• System can tolerate single sensor fault.
• Relatively simple to implement and diagnose.
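A minimal sketch of three-way voting, assuming a median (mid-value select) voter over numeric readings. The function name `tmr_vote` and the sample values are illustrative and do not come from the project.

```python
from statistics import median


def tmr_vote(a, b, c):
    """Return the median of three redundant sensor readings.

    With three inputs, one arbitrarily wrong reading can never be the
    median unless it lies between the two healthy readings.
    """
    return median([a, b, c])


# Hypothetical airspeed readings: the third sensor has failed high.
readings = (101.2, 100.9, 250.0)
voted = tmr_vote(*readings)
print(voted)
```

The voter picks one of the two healthy readings, which is why a single sensor fault is tolerated. It offers no protection if the voter itself is faulty.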
Byzantine Fault Tolerant System
• At least 4 different voters each with a sensor.
• Tolerates fault in sensor or in voter.
• F faults require 3F+1 voters with sensors.
• Requires complex voting algorithm.
• Can survive class of faults not dealt with by
TMR.
ADIRU failures are critical events
with serious consequences if the
aircraft is not in a visual flight mode.
Retrieved May 24, 2012 from https://encryptedtbn2.google.com/images?q=tbn:ANd9GcSYkmoQuv2Uml0vQjTVrW6z0zXHMhr6MdlZkyQJhHD5D5h_2vwZA
[1]
Air France Flight 447
Air France Flight 447 departed Rio de Janeiro for Paris
on May 31st, 2009, and crashed into the Atlantic
Ocean, killing all passengers and crew.
Retrieved June 1, 2012 from: cdn.blogs.sheknows.com/
thewire.sheknows.com/2011/05/airfrance447.jpg
Sequence of Events Leading To Crash
Corrupted Sensor Data: pitot tubes blocked with ice
transmit Byzantine faults to ADIRU.
Loss of Control: Autopilot disengages. Flight crew
receive erratic and inconsistent airspeed data and stall the
aircraft.
No Recovery: Flight crew fails to recover from stall
because crew cannot determine actual airspeed. Flight
computer does not restart. Aircraft free falls into Atlantic.
Qantas Flight 72
On October 7th, 2008, Qantas Flight 72 en route from
Singapore to Perth suffered a malfunction in the ADIRU
and flight computer causing a series of rapid descents
that threw passengers and crew about the cabin.
Sequence of Events Leading To Incident
ADIRU failure: A “spiky” series of measurements
from the angle of attack sensors, which measure
aircraft pitch in relation to airflow, exploits a
vulnerability in the ADIRU software. One of the
ADIRU units outputs bad data to the flight
controller.
Flight computer software failure: The flight
computer under autopilot fails to filter the bad data
and executes an abrupt dive of -0.8 g.
The flight crew disengages autopilot and makes an
emergency landing at Learmonth, Western
Australia.
How ADIRU Systems Fail
•Failure of ADIRU may be intermittent and cause cockpit
instrumentation to send contradictory warnings (stall and high
speed).
•ADIRU is the root of all sensor data for flight avionics. Failure in
the ADIRU can instantly propagate throughout flight control
system.
•Failures of the ADIRU system affect both autopilot and manual
flight modes.
Multiple Sources of Failure
Human Causes:
Deferred maintenance can cause errors to accumulate
until the ADIRU system fails.
Environmental Causes:
ADIRU systems interface with physical sensors outside the
cabin that can be affected by ice and other environmental
conditions.
Software:
Software may hide bugs that appear under anomalous
conditions.
Most accidents have multiple causes.
Types of Faults
•Fail Silent: system fails to send data.
This fault is masked by a redundant system.
•Byzantine Failure: system sends arbitrary data
including different data to different controllers.
This fault cannot be masked by simple redundancy.
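The difference between the two fault types can be sketched with two controllers that each vote on three sources. This is an illustration under assumptions: the median voter, the controller names, and the readings are all hypothetical.

```python
from statistics import median


def redundant_vote(readings):
    """Median of the readings that actually arrived (None = silent source)."""
    present = [r for r in readings if r is not None]
    return median(present) if present else None


# Fail silent: source C sends nothing to either controller.
silent = {"ctrl1": [4.0, 4.0, None], "ctrl2": [4.0, 4.0, None]}
silent_out = {name: redundant_vote(r) for name, r in silent.items()}

# Byzantine: source C reports 3.0 to ctrl1 but 7.0 to ctrl2, while the
# healthy sources A and B differ slightly (4.0 and 6.0).
byzantine = {"ctrl1": [4.0, 6.0, 3.0], "ctrl2": [4.0, 6.0, 7.0]}
byzantine_out = {name: redundant_vote(r) for name, r in byzantine.items()}

print(silent_out)
print(byzantine_out)
```

In the fail silent case both controllers still agree. In the Byzantine case each controller's vote looks locally reasonable, yet the two controllers end up with different values, which is the disagreement that simple redundancy cannot mask.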
Project Requirements
• Exhibit Byzantine and fail silent fault
tolerance
• Include fault injection
• Must be able to mask faults
• System must be expandable
• Must follow open source guidelines
Build a four-way redundant network using Arduino
microcontrollers polling gyroscopes and
accelerometers. Network the modules with an I2C bus.
Features:
I2C and Power Bus
Environmental Enclosure
Separate board for power supply
Reasons For Choosing Arduino
• Open Source Hardware and Software
• Large community of developers
• Libraries for I2C communication already exist
• Lowest hardware entry cost to develop a
multi-module fault tolerant system
• Quickest start time (no hardware development
necessary)
Arduino ArduIMU+V3
Features:
Atmega 328 uP
3D Accelerometer and 3D Gyroscope
3D Magnetometer
Software Algorithm
Clock Synchronization
Multi-Master I2C bus
Byzantine Algorithm
Fault injection



Safety critical systems should be able to handle failures
of one or more of its components and continue to
operate correctly.
Byzantine faults consist of one or more components or
subsystems sending inconsistent data to other
components and subsystems.
Handling this type of failure is known as the
Byzantine Generals Problem.



A solution to the Byzantine Generals Problem guarantees
fault tolerant behavior under the following conditions.
• All loyal generals decide upon the same plan of
action.
• A small number of traitors cannot cause the loyal
generals to adopt a bad plan.
More than 2/3 of the generals must be loyal.
Must have at least 3*N + 1 generals to handle N traitors.
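The 3*N + 1 bound can be checked with a few lines of arithmetic. The helper names below are illustrative only.

```python
def generals_needed(traitors):
    """Minimum number of generals required to tolerate `traitors` traitors."""
    return 3 * traitors + 1


def max_traitors(generals):
    """Largest number of traitors that `generals` generals can tolerate."""
    return (generals - 1) // 3


for n in (3, 4, 7):
    print(n, "generals tolerate", max_traitors(n), "traitor(s)")
```

Three generals tolerate no traitors at all, which is why a four-module system is the smallest that can handle a single Byzantine fault.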



The general sends a command to N-1 lieutenants.
• All loyal lieutenants obey the same command.
• If the general is loyal, then every loyal lieutenant
obeys the command he sends.
Each lieutenant relays the command it received from
the general to every other lieutenant.
Each lieutenant reaches a decision based on a majority
vote of the commands received from the general and
all other lieutenants.
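The send, relay, and vote steps above can be sketched as a small simulation with one general (module 1) and three lieutenants (modules 2 to 4). This is a sketch under assumptions, not the project's firmware: the function names, the `link` hook used to model faulty links, and the convention that a faulty link delivers (0, 0, 0) are all hypothetical.

```python
from collections import Counter


def majority(values, default=(0, 0, 0)):
    """Most common value, or `default` when no value has a strict majority."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else default


def one_round(command, lieutenants, link=lambda src, dst, v: v, general=1):
    """Run one send/relay/vote exchange and return each lieutenant's decision."""
    # Step 1: the general sends its command to every lieutenant.
    received = {lt: link(general, lt, command) for lt in lieutenants}
    # Step 2: every lieutenant relays what it received to the others,
    # then votes on its own copy plus the relayed copies.
    decisions = {}
    for lt in lieutenants:
        relayed = [link(o, lt, received[o]) for o in lieutenants if o != lt]
        decisions[lt] = majority([received[lt]] + relayed)
    return decisions


def faulty_links(*bad):
    """Build a link model that delivers (0, 0, 0) on the listed (src, dst) links."""
    return lambda src, dst, v: (0, 0, 0) if (src, dst) in bad else v


healthy = one_round((1, 1, 1), [2, 3, 4])
one_error = one_round((1, 1, 1), [2, 3, 4], link=faulty_links((1, 4)))
two_errors = one_round((1, 1, 1), [2, 3, 4], link=faulty_links((1, 4), (1, 2)))
print(healthy, one_error, two_errors, sep="\n")
```

With a single faulty link every lieutenant still decides (1, 1, 1). With two faulty links the lieutenants still agree with each other, but on (0, 0, 0), because two faults exceed what four modules can tolerate.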
[Figure: message exchange among modules 1, 2, 3, and 4.]
[Figure: exchange with no faults. Module 1 sends [X, Y, Z] = [1, 1, 1]; modules 2, 3, and 4 relay the values they received, and the voted result is [1, 1, 1].]
[Figure: exchange with a link error between Module 1 and Module 4. Module 1 sends [X, Y, Z] = [1, 1, 1]; some relayed values appear as [0, 0, 0], and the voted result is [1, 1, 1].]
[Figure: exchange with link errors between Module 1 and Module 4 as well as Module 1 and Module 2. Module 1 sends [X, Y, Z] = [1, 1, 1]; most relayed values appear as [0, 0, 0], and the voted result is [0, 0, 0].]




Sensor reads are interrupt driven.
Must synchronize clocks for all modules to ensure
an “apples-to-apples” comparison of sensor
values.
Variable used to synchronize all modules is the
Timer/Compare interrupt counter.
By ensuring the counter is the same on all modules
we can ensure that the interrupt that drives sensor
reads occurs at the same time in all modules.



One module is dedicated to synchronizing the
clocks of all other modules.
Accuracy of clock synchronization is
determined by Timer Interrupt clock speed and
is approximately:
Timing of clock synchronization cycles is set so
that each device is synchronized to the master
every few data cycles.

This helps to ensure a tight synchronization as well
as lessen the interference of data processing.
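[Figure: timing diagram of the clock synchronization exchange between master and slave, with timestamps T1 through T6.]
Master: Request Slave Clock Value
Slave: Send Clock Value
Master: Send Back Slave Clock Value to Calculate Delay
Slave: Calculate Delay
Delay = (T4 - T2)/2
Master: Send Clock Value
Slave: Calculate Offset
Offset = T6 - Master Clock Value - Delay
New Clock Value = Old Clock Value - Offset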
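The delay and offset arithmetic of this exchange can be sketched as follows. The function names and sample timestamps are hypothetical, and the sketch reads the offset formula as T6 minus the master clock value minus the delay, which is an assumption about the original diagram.

```python
def path_delay(t2, t4):
    """One-way delay, taken as half the round trip measured on the slave."""
    return (t4 - t2) / 2


def clock_offset(t6, master_clock, delay):
    """How far the slave clock runs ahead of the master (assumed formula)."""
    return t6 - master_clock - delay


def corrected_clock(old_clock, offset):
    """New slave clock value after removing the offset."""
    return old_clock - offset


# Hypothetical counter values, in timer ticks.
delay = path_delay(t2=100, t4=102)                        # round trip of 2 ticks
in_sync = clock_offset(t6=201, master_clock=200, delay=delay)
ahead = clock_offset(t6=205, master_clock=200, delay=delay)
print(delay, in_sync, ahead, corrected_clock(205, ahead))
```

A slave that reads 201 when the master sent 200 one tick earlier is already in sync, so its offset is 0. A slave that reads 205 is four ticks ahead and is pulled back to 201.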


The output displays the
original clock value, the
clock value from the
master, the offset, and
the new clock value.
The offset is “0” because
a delay of “1” was
calculated.
Results
System exhibits Byzantine fault tolerance. A
system that is BFT requires 3F+1 voters.
System masks fail silent faults
(need a graphic to show this)
Budget
Conclusion
Our system exhibits basic fault tolerant
functionality. It demonstrates the feasibility of
an open source fault tolerant project.
Further Work
• Integrate GPS, magnetometer, altimeter and
other sensors into the system.
• Implement Kalman filters in the software to
smooth out sensor noise.
• Gather real data by launching aboard a
vehicle.
Lessons Learned
• Interrupt routines on microcontrollers
• Debugging methods
• Code development: algorithm > Python > C
• How to organize a large project involving
hardware and software
• Documentation
Acknowledgements
We would like to thank our sponsors:
Dr. Lee Pike and Galois Inc.
We also acknowledge the help of our advisor:
Dr. Christof Teuscher Portland State University
References
2.