The Near Earth Asteroid Rendezvous (NEAR)
Rendezvous Burn Anomaly
Susan C. Lee
The Johns Hopkins University Applied Physics Laboratory
SCL Sep-04
Disclaimer
The NEAR Mission ended in February 2001 and some
documentation has dissipated. Some of this presentation relies
on memory, but is basically accurate.
The Lessons Learned represent my own opinions, not
necessarily those of the JHUAPL Space Department, where I
have not worked since January 1998.
Overview
• NEAR Overview
• Anomaly Description
• Investigation Findings
• Lessons Learned
Mission Description
• Three-year cruise to the Near-Earth
Asteroid Eros
– Up to 12-day solar transit
– Round-trip light times up to 40 min.
– Numerous small Trajectory Correction
Maneuvers (TCMs)
– 2 Large TCMs using bi-propellant
large velocity adjust thruster
- Deep Space Maneuver
- Eros Rendezvous Burn
– No critical time windows for TCMs
• One-year science mission orbiting Eros
– TCMs planned for once a week
– Frequent momentum dumps
Spacecraft Description
• Mechanically simple
– Fixed solar panels (after one-time
deployment)
– Fixed antennas
- High Gain (1° BW)
- Fan Beam (40° x 8° BW)
- Dual hemispherical low gain
• Electrically simple
– Direct energy transfer power system
– 1553 bus/discrete line communications
• Computationally complex
– 3-axis active guidance and control using thrusters and momentum wheels
– Careful power management
Spacecraft Block Diagram
Safing Design
• Goal of safing: keep the S/C viable and make ground contact
1. Keep the solar panels pointed at the Sun and the load below the
solar panel output
2. Point the fan beam antenna at the Earth and swap redundant RF
systems
• Joint function of the C&T Processor and G&C system
C&T Processors
AIU
FC
Check power system health and
manage loads
Check solar panel normal
within Keep-in zone
Check for bad sensor data
Check RF system health
Check momentum
Check for burn anomalies
Check on AIU health
Check on FC health
Maintain viable G&C configuration
Prevent erroneous thruster use
• Coordinated via housekeeping telemetry and discrete lines
Spacecraft Mode Descriptions
• Operational
– Under ground command (either
real time or uploaded command
sequences)
– Solar panel normal near Sun line
• Earth Safe
– Solar panel normal on Sun line
– Earth in fan beam antenna
– Downlink at 10 bps
• Sun Safe - rotate
– Solar panel normal on Sun line
– Rotate about Sun line at 1 rev/3
hours
– Beacon on fan beam downlink
• Sun Safe - freeze
– Same as Earth Safe
Safing Implementation:
Command and Telemetry Processor
• Simple, rule-based autonomy system
– Rules checked flags, relay states, housekeeping telemetry, discrete lines
– Triggered rules point to associated command sequence
– No loops or jumps
– Priority dictated by order in the list of rules
• Single Software Mode
– Check for commands at the uplink interface/Execute
– Check autonomy commands needed/Execute
– Check for commands in uploaded command sequence/Execute
– Place telemetry on the downlink according to commanded format and rate
• S/C Safing Modes implemented by executing autonomy commands to set
desired power state, RF state, formats, etc.
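The rule scheme on this slide can be sketched as follows. This is a minimal illustration, not the NEAR flight code: the rule names, telemetry keys, thresholds and command strings are all hypothetical. It shows the listed properties only: rules are checked in list order, each triggered rule points to one fixed command sequence, and there are no loops or jumps.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Telemetry = Dict[str, float]


@dataclass(frozen=True)
class Rule:
    name: str                               # hypothetical rule name
    condition: Callable[[Telemetry], bool]  # check on flags / telemetry
    sequence: List[str]                     # commands run when triggered


# Priority is simply the position in this list (hypothetical rules and limits).
RULES = [
    Rule("low_bus_voltage", lambda t: t["bus_v"] < 26.0,
         ["SHED_LOADS", "ENTER_SUN_SAFE"]),
    Rule("aiu_heartbeat_lost", lambda t: t["aiu_heartbeat_age_s"] > 5.0,
         ["DISABLE_THRUSTERS", "SWITCH_AIU"]),
]


def check_rules(telemetry: Telemetry, rules: List[Rule] = RULES) -> List[str]:
    """Return the command sequences of all triggered rules, in priority order."""
    commands: List[str] = []
    for rule in rules:  # one straight pass: no loops back, no jumps
        if rule.condition(telemetry):
            commands.extend(rule.sequence)
    return commands
```

Because the order of the list is the only priority mechanism, reviewing the rule order is part of reviewing the safing behavior.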
Safing Implementation:
Guidance and Control (G&C) Processors
• Many, many possible combinations of functions performed on Flight
Computer (FC) and Attitude Interface Unit (AIU) using combinations
of sensors, actuators and guidance algorithms
– Implemented as in-line code triggered by IF-THEN-ELSE sequences
– Multiple flags, timers and parameters
• NOMINALLY, FC controls the S/C, using the AIU simply as an
interface unit
– Guidance algorithms based on stored orbit
– Attitude based on Star Camera and IMU input
– Nominal wheel control; thruster control during TCMs only
• NOMINALLY, AIU checks for FC problems (e.g., Sun-pointing keep-in violated)
– Can ask C&T Processor to switch to backup FC
– Can take over S/C control for Safe Modes
Thruster Use Safing - Clementine Prevention
• Thruster hardware enable/disable
• ‘Open’ commands must be sent every 40 ms, else thruster valves
close automatically
• Separation of thruster use function between C&T Processor and
G&C
– C&T processor controls enable/disable
- Faulty CTP can only enable thrusters, not open valves
– G&C (AIU) controls thruster valve open/close
- Faulty AIU can try to open valves, but won’t succeed unless CTP also
enables thruster hardware
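The two-party interlock and the 40 ms refresh can be sketched as a toy model. The class and method names are hypothetical and the real mechanism was in hardware; the sketch only shows the logic stated on the slide: a valve is open only if the C&T Processor has enabled the thrusters AND the AIU has sent an 'open' command within the last 40 ms.

```python
OPEN_TIMEOUT_S = 0.040  # 'open' must be refreshed every 40 ms


class ThrusterValve:
    """Toy model of one thruster valve with a two-party interlock."""

    def __init__(self) -> None:
        self.enabled = False       # set only by the C&T Processor
        self.last_open_cmd = None  # time of the last 'open' from the AIU

    def ctp_set_enable(self, enabled: bool) -> None:
        self.enabled = enabled

    def aiu_open(self, now_s: float) -> None:
        self.last_open_cmd = now_s

    def is_open(self, now_s: float) -> bool:
        # Neither processor alone can open the valve.
        if not self.enabled or self.last_open_cmd is None:
            return False
        # Without a fresh 'open' command the valve closes by itself.
        return (now_s - self.last_open_cmd) <= OPEN_TIMEOUT_S
```

In this model a faulty CTP that enables the hardware, or a faulty AIU that keeps sending 'open', leaves the valve closed unless the other side also acts.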
Thruster Use Safing - Trajectory Correction Maneuvers
• TCM thruster use under ground control only
• Parameters loaded on G&C in advance and verified prior to burn
• Timetagged commands on C&T Processor initiate and terminate burn
Burn Abort Conditions
• NEAR burn philosophy: Better safe than sorry
– No burns with critical timing
– Better to shut down, correct problem if any, and try again
• G&C Burn Shutdown Criteria (partial list)
– Attitude Keep-in violated
– Acceleration Keep-in violated
– Anomalous pressure reading on fuel or oxygen tanks
– C&T Processor signals a Safe Mode
• C&T Processor Burn Shutdown Criteria
– Loss of AIU Heartbeat for 5 seconds
– Occurrence of any condition normally causing a Safe Mode entry
– G&C signals a Safe Mode
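A minimal sketch of the G&C shutdown check, under stated assumptions: the field names, units and limit values are hypothetical, and how NEAR actually defined each keep-in is not given here. It illustrates one consequence of the "better safe than sorry" philosophy: a limit set below the value seen in a nominal burn aborts that burn.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BurnStatus:
    attitude_error_deg: float  # angle off the commanded burn attitude
    accel_m_s2: float          # measured acceleration magnitude
    tank_pressure_ok: bool
    ctp_safe_mode: bool        # C&T Processor has signalled a Safe Mode


def gc_abort_reasons(s: BurnStatus,
                     attitude_limit_deg: float,
                     accel_limit_m_s2: float) -> List[str]:
    """Return every shutdown criterion that is currently violated."""
    reasons = []
    if s.attitude_error_deg > attitude_limit_deg:
        reasons.append("attitude keep-in violated")
    if s.accel_m_s2 > accel_limit_m_s2:
        reasons.append("acceleration keep-in violated")
    if not s.tank_pressure_ok:
        reasons.append("anomalous tank pressure")
    if s.ctp_safe_mode:
        reasons.append("C&T Processor signalled Safe Mode")
    return reasons
```

For example, a status with `accel_m_s2=0.5` passes against a limit of 1.0 but aborts against a limit of 0.1, even though nothing is wrong with the burn itself.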
Thruster Use Safing - Autonomous Momentum Dumps
• Autonomous use of the thrusters for emergency momentum
dump only
Overview
• NEAR Overview
• Anomaly Description
• Investigation Findings
• Lessons Learned
Preparation - Burn Scripts
• Work began in November 1998, with DSM1 scripts as model
• Significant deviation from nominal safing practice
Recommended Procedure | NEAR Rendezvous Burn Procedure
RF Watchdog Timer set to expire after 2 missed DSN contacts. | RF Watchdog Timer set to nine days (rendezvous burn had continuous DSN coverage).
Short Command Loss Timer set for critical operations. | No Command Loss Timer
Fast (5 sec.) AIU Heartbeat rule to detect a failed AIU. | No Fast AIU Heartbeat rule
Autonomy macros must be run on both CTPs, so that they stay synchronized. | The burn abort macro was loaded only on CTP1. CTP1 ran Burn Abort while CTP2 ran Earth Safe simultaneously.
Backup systems must be maintained in a state of readiness for use. | EEPROM on both FCs contained out-of-date orbit information that could not be used for guidance.
Critical parameters must be verified via download and comparison with expected values. | Burn parameters uploaded and used without verification.
• Final review December 7, 1998
– System Engineer not present
• Brassboard testing of the nominal burn only
– Brassboard configuration deviated significantly from the S/C
– Burn abort not tested at all
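The recommended step "verify critical parameters via download and comparison with expected values" can be sketched as a simple comparison. The parameter names and the tolerance are hypothetical; this is an illustration of the check, not the NEAR ground software.

```python
from typing import Dict, List


def verify_parameters(expected: Dict[str, float],
                      downloaded: Dict[str, float],
                      rel_tol: float = 1e-6) -> List[str]:
    """Compare downloaded on-board values against the expected upload."""
    problems = []
    for name, want in expected.items():
        if name not in downloaded:
            problems.append(f"{name}: missing from download")
            continue
        got = downloaded[name]
        # Relative tolerance, with a floor so values near zero still compare.
        if abs(got - want) > rel_tol * max(abs(want), 1.0):
            problems.append(f"{name}: expected {want}, got {got}")
    return problems
```

An empty result would be the condition for proceeding with the burn; any entry is a reason to stop and investigate before the timetagged commands execute.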
The Anomaly: Just the Facts, Ma’am
• Burn Command Script
– Uploaded December 16, 1998
• Burn
– On-board initiation at specified time
– Normal execution of 200-sec settling burn
– Initiation of bi-propellant burn as expected
• Anomaly
– Burn abort within fraction of a second from bi-propellant
initiation
– S/C signal lost 37 seconds following abort
• DSN acquired Sun Safe beacon 27 hours after LoS
– Freeze command stopped rotation with Earth in the fan beam, and
telemetry downlink was commanded.
Recovery and Outcome
• When reacquired, S/C in stable Sun Safe mode controlled by the backup
AIU (AIU 2)
• Mission Operations recovered to Operational Mode
– Interrupted Command History/Autonomy History downlink
– Faulty procedure used in first attempt resulted in immediate demotion
back to Safe Mode and AIU switch
• S/C state assessed and immediate cause of burn abort ascertained in two
days
• New burn planned and executed on Jan. 3, 1999 allows completion of
NEAR mission
– Used up fuel margin
– Additional 13 months of cruise prior to mission start
– Some contamination of imager optics
– Degradation of some thrusters due to cold firings
Data Sources for Diagnosis
Location | Data type | Availability
SSR | Housekeeping & G&C Telemetry | Lost during LVS event
CTP1 memory | Data Summary (max/min values of housekeeping telemetry) | Have, but was not reset since T-6 hrs; minimum AIU values all = 0, since AIUs were power cycled
CTP1 memory | Snapshot Data (all housekeeping telemetry for first 3 triggered rules) | Not reset prior to burn; contained old data from operational use of autonomy rules
CTP1 memory | Autonomy Rule Enable States Pre-Event | Have
CTP1 memory | Autonomy Rule Enable States Post Event | Have
CTP1 memory | Command History (165 commands deep) | Partially overwritten unnecessarily during recovery
CTP1 memory | Autonomy History (32 rules deep) | Partially overwritten during incorrect recovery sequence
CTP2 memory | Data Summary (max/min values) | Have, but was not reset since T-6 hrs; minimum AIU values all = 0, since AIUs were power cycled
CTP2 memory | Snapshot Data (first 3 rules) | Have
CTP2 memory | Autonomy Rule Enable States Pre-Event | Unconfirmed; not downloaded prior to burn
CTP2 memory | Autonomy Rule Enable States Post Event | Not downloaded, then overwritten
CTP2 memory | Command History (165 commands deep) | Partially overwritten unnecessarily during recovery
CTP2 memory | Autonomy History (32 rules deep) | Partially overwritten during incorrect recovery sequence
AIU1 memory | Command History | Lost during AIU1 switches
AIU1 memory | Data Structure Values | Downloaded late, after some modifications
AIU2 memory | Command History | Have
AIU2 memory | Data Structure Values | Downloaded late, after some modifications
FC1 memory | Command History | Have
FC1 memory | Data Structure Values | Downloaded late, after some modifications
FC1 memory | IMU High Rate Data | FC not commanded to store
FC1 memory | Program Load | Never downloaded; lost during FC reboot 2 months following event
Comment column (row alignment not recoverable): "In safe mode telemetry" (8 entries); "One command to downlink." (2 entries)
Early Sequence of Events (1)
[Timeline chart. Time from Burn Abort: 00:00:10, 00:06:00, 00:37:54, 00:42:37, 00:42:47, 00:42:48, 00:47:48. Rows: S/C Mode (Operational, Earth Safe, Sun Safe); Commanded Actuators (Thrusters Only, Wheels Only); Actuators In Use (Thrusters Only, Wheels Only, Momentum Dump); System Momentum (OK, Too High); Thruster Request (ON, OFF); Thruster Enable State (Enabled, Disabled); AIU in Use (AIU 1, AIU 2); FC in Use (FC1, FC2, None (AIU Only)).]
Why did the burn abort?
Early Sequence of Events (2)
[Timeline chart, same rows and time axis as Early Sequence of Events (1), with annotations:]
• Transition to Earth Safe initiates burn shutdown command sequence and high-rate slew to Sun pointing using thrusters only.
• Command script error causes abrupt transition to wheels-only control.
• NEAR goes to Sun Safe before the wheels can overcome the high rate.
What was the script error?
• The script did not return control to the wheels at all, but did disable the
thrusters.
– Without enabled thrusters, the G&C autonomously forced wheel control,
but without the controlled transition.
– Because thruster-only control was commanded, the G&C used thrusters
each time they were enabled.
• Need for a carefully timed sequence for returning control to the wheels
was established prior to the first TCM one week after launch.
– Brassboard testing showed that the S/C was very likely to receive a “kick”
from the thrusters without this controlled procedure.
• Brassboard testing of the script reproduced the early anomaly events
perfectly
• The DSM1 burn script DID contain the wheels-only command (but not
the right sequence), and the missing safing rules.
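A script error of this kind is the sort of thing an automated pre-upload check could flag. The sketch below is an assumption-laden illustration: the command strings are hypothetical (the NEAR scripting language is not shown in this presentation), and the check only detects a thruster disable issued while thrusters-only control is still commanded. It does not verify the carefully timed hand-over sequence itself.

```python
from typing import List


def find_unsafe_disables(script: List[str]) -> List[int]:
    """Return indices of thruster-disable commands issued while
    thrusters-only control is still the commanded actuator mode."""
    thrusters_only = False
    unsafe = []
    for i, cmd in enumerate(script):
        if cmd == "SET_ACTUATORS THRUSTERS_ONLY":
            thrusters_only = True
        elif cmd == "SET_ACTUATORS WHEELS_ONLY":
            thrusters_only = False
        elif cmd == "THRUSTERS_DISABLE" and thrusters_only:
            unsafe.append(i)
    return unsafe
```

A script that ends with `THRUSTERS_DISABLE` without first commanding wheels-only control would be reported; one that returns control to the wheels first would pass this particular check.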
Early Sequence of Events (3)
[Timeline chart, same rows and time axis as Early Sequence of Events (1), with annotations:]
• Command script error causes new ‘kick’ to attitude and momentum.
• “Kick” leaves high system momentum and initiates a momentum dump.
Simulated Early Sequence of Events
Complete Timeline (00:00 - 03:00)
[Ref. 22:03:16Z 20 Dec 1998]
[Timeline chart, 0:00 to 3:00. Rows: Power System Status (D = Discharge; C = Charge), showing Charge, Discharge, Charge, Intermittent C/D, Charge, with Bus V < 26V (LVS Trip) and Bus V > 28V marked; State (Operational, Earth-Safe, Sun-Safe Rotate); Momentum (Body + Wheels) (Off-Scale, High/No warmup, High/warmup, O.K.); Momentum Dumps #1 - #7; Gyro Mode (Normal, Whole-Angle (noisy)); Guidance and Control Attitude Interface Unit (Five Switches); Control by FC or AIU (FC, Probably FC; Recovered in AIU-only Mode, timing unknown). Key: shading = Uncertain.]
Complete Timeline (03:00 - 06:30)
[Ref. 22:03:16Z 20 Dec 1998]
[Timeline chart, 3:00 to 6:30. Rows: Power System Status (D = Discharge; C = Charge), showing Intermittent C/D with Bus V = 23.4V marked; State (Operational, Earth-Safe, Sun-Safe Rotate); Momentum (Body + Wheels) (Off-Scale, High/No warmup, High/warmup, O.K.); Momentum Dumps #8 - #14; Gyro Mode; Guidance and Control Attitude Interface Unit; Control by FC or AIU. Key: shading = Uncertain.]
Complete Timeline (06:30 - 09:00)
[Ref. 22:03:16Z 20 Dec 1998]
[Timeline chart, 6:30 to 9:00. Rows: Power System Status (D = Discharge; C = Charge); State (Operational, Earth-Safe, Sun-Safe Rotate); Momentum (Body + Wheels) (Off-Scale, High/No warmup, High/warmup, O.K.); Momentum Dump #15; Gyro Mode; Guidance and Control Attitude Interface Unit; Control by FC or AIU. Key: shading = Uncertain.]
Overview
• NEAR Overview
• Anomaly Description
• Investigation Findings
• Lessons Learned
NEAR Anomaly Investigation
• NEAR Anomaly Review Board established on 6 January 1999
– Assess APL efforts to understand and correct causes of anomaly,
and recommend additional efforts
– Determine most probable cause of the anomaly
– Review NEAR program and recommend improvements for future
missions
• Timeline reconstruction from available data
• Determination of probable cause
– Fact of and reason for burn abort recorded in snapshot
data
– Script error obvious to knowledgeable engineers
- Impact of such an error known prior to the event
- Impact confirmed by brassboard simulation of the burn abort
event
– Fault tree developed for anomalous momentum dumps
– Analysis and 128 brassboard simulations of potential scenarios
Findings
“The investigation established a good understanding of the
events during approximately the first 47 min after the abort,
but no explanation for the failure of onboard autonomy to
quickly correct the problem. The Board found no evidence
that any hardware fault or single-event upset contributed to
the failure. Although software errors were found that could
prolong and exacerbate the recovery, they by no means fully
explain it.”
• No explanation for the long-term behavior of the S/C
• Only remaining branch in the fault tree is “two or more failures”
Hardware Faults
• All hardware functioned nominally before, after and, as far as can
be seen with limited data, during the anomaly
• Most hardware failure modes failed to reproduce known events
when simulated
• Only gyro noise gave results close to observed behavior
– Required noise levels 10x higher than measured on the ground or in flight,
before or after the anomaly
– With high gyro noise, simulations show NEAR never recovers, so
noise would have to go up and down and up and down (…)
– No credible mechanism for this phenomenon was ever suggested
by APL or the gyro manufacturer (Litton)
Software Failures
• An independent review team found 17 errors in AIU or FC
software
– 9 in compiled code
– 8 in data structures or other parameters (in addition to the
acceleration limit that precipitated the anomaly)
• One error caused a high momentum wheel rate to be set to zero
– Known to have occurred at least once during the anomaly
– Can cause high momentum to be calculated as low OR low
momentum to be calculated high
– In simulation, S/C always recovered in 20 minutes or less
• Software error eliminated as the total cause, because:
– Simulator running flight code did not exhibit anomaly
– No repeat of anomaly for remainder of the mission
- But there were parameter changes and software uploads
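Why a zeroed wheel rate can push the momentum estimate in either direction is easiest to see in a single-axis toy model. The function below is a sketch with hypothetical names and numbers; the flight code worked in three axes and its actual form is not shown in this presentation. System momentum is taken as body momentum plus wheel inertia times wheel rate, and the `zero_wheel_rate` flag models the reported error.

```python
def system_momentum(body_h: float,
                    wheel_inertia: float,
                    wheel_rate: float,
                    zero_wheel_rate: bool = False) -> float:
    """Single-axis system momentum (body + wheel), in N*m*s."""
    if zero_wheel_rate:  # models the error: high wheel rate read as zero
        wheel_rate = 0.0
    return body_h + wheel_inertia * wheel_rate


# Case A: the momentum is all in the wheel. True value is high,
# but with the wheel rate zeroed it is computed as zero (low).
true_a = system_momentum(0.0, 0.02, 250.0)
seen_a = system_momentum(0.0, 0.02, 250.0, zero_wheel_rate=True)

# Case B: the wheel cancels the body momentum. True value is near zero,
# but with the wheel rate zeroed only the body term remains (high).
true_b = system_momentum(5.0, 0.02, -250.0)
seen_b = system_momentum(5.0, 0.02, -250.0, zero_wheel_rate=True)
```

In case A a needed momentum dump would not be requested; in case B a dump would be requested when none is needed.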
Can Software/SEU’s be eliminated?
• Many G&C data structures not downloaded until after certain
parameters were changed
– Data structures were not verified prior to burn
– SEU, upload error, or configuration management error possible
• FC1 program memory not downloaded and verified after
anomaly
– February 24, 1999, FC1 spontaneously re-booted (first and only)
– Could be SEU or still unknown software error
– When the anomaly review began, it found two versions 1.11 of the
FC code. Which was on the S/C? Was either?
• 80,000 lines of highly convoluted code
• The brassboard simulates the physics; the S/C lives it
More about this later.
Overview
• NEAR Overview
• Anomaly Description
• Investigation Findings
• Lessons Learned
First Observation
• Apparently, it took four seemingly independent errors to cause
the NEAR rendezvous burn anomaly
– Burn abort caused by a threshold set too low
- Data was available to set it properly
– Serious errors in a script that should have been under
configuration management, reviewed and tested
– Two or more unknown errors that caused continued control
problems, even after the autonomy actions corrected the
configuration error
- NARB concluded that no single error could produce the known behavior
• Was NEAR just unbelievably unlucky, or is there something to be
learned here?
– Examine the patterns
Q: Why didn’t the G&C use data from DSM1 to
set the acceleration limit?
• Consider the following:
– Less than a week prior to spacecraft/rocket mating, the System
Engineer checked the alignment of the Star Camera and found it
to be 90° out.
– Immediately following launch, the Star Camera was unable to find
guide stars, because the on-board star map of the southern
hemisphere of the celestial sphere was incorrect.
– The first TCM was poorly controlled, because the control law used
an inaccurate model of the thruster valve action. (Manufacturer’s
data was located in the G&C engineer’s file cabinet.)
• The burn anomaly was not the only or first time the NEAR G&C
team failed to measure or use data to check their models
• S/C testing failed to uncover any of these errors, including the
faulty acceleration limit
Testing G&C Algorithms
• Lacking a zero-gravity environment, a wrap-around simulation
with a ‘truth model’ is the only way to test G&C
– Meticulous attention to modeling of physical phenomena
– Independence between the flight algorithms and the truth model
• The NEAR ‘truth model’ was written by the flight team and
mirrored all the incorrect physical models used to design the S/C
G&C algorithms
– Although NEAR had an Independent V&V team for G&C, the flight
team GAVE them all the models
• MSX (the program prior to NEAR) had an independent team build
the truth model
– NEAR flight G&C team opposed an independent team for NEAR
– “50% of the errors found on MSX were in the truth model, not the
S/C”
Lesson Number 1
• Always have an independent team build the
simulation that will be used to test the G&C
algorithms.
– Different approaches by different teams can
uncover biases on the part of either team
– Use measurements on real flight hardware as much
as possible
– Accept the time spent on errors in the truth model to
find the errors in flight algorithms
You can’t fool Mother Nature.
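The point about independence can be shown with a deliberately tiny sketch. All names and numbers are hypothetical. The flight algorithm predicts a delta-v from its own thruster model; the simulation's truth model produces the "actual" delta-v. If the truth model copies the flight team's assumption, the comparison can never fail, however wrong the assumption is.

```python
def flight_predicted_dv(assumed_accel_m_s2: float, burn_s: float) -> float:
    """Delta-v the flight algorithm expects from its own thruster model."""
    return assumed_accel_m_s2 * burn_s


def truth_model_dv(truth_accel_m_s2: float, burn_s: float) -> float:
    """Delta-v produced by the simulation's truth model."""
    return truth_accel_m_s2 * burn_s


def simulation_flags_error(assumed_accel_m_s2: float,
                           truth_accel_m_s2: float,
                           burn_s: float,
                           tol_m_s: float) -> bool:
    """True if the wrap-around simulation exposes a model mismatch."""
    predicted = flight_predicted_dv(assumed_accel_m_s2, burn_s)
    simulated = truth_model_dv(truth_accel_m_s2, burn_s)
    return abs(predicted - simulated) > tol_m_s


# Truth model handed over by the flight team: same number, nothing flagged.
shared = simulation_flags_error(0.10, 0.10, burn_s=100.0, tol_m_s=0.5)

# Truth model built independently from hardware measurements: mismatch flagged.
independent = simulation_flags_error(0.10, 0.12, burn_s=100.0, tol_m_s=0.5)
```

Here `shared` is False and `independent` is True: only the independently built truth model gives the test a chance to find the error.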
Q: How did such an obviously flawed script
escape notice?
• Consider the following:
– At launch, most of the scripts in use by Mission Operations were last-minute adaptations of S/C-level test scripts
- Dangerous test commands still in place in the rendezvous burn script
– In just the first 8 months of operations, there were 7 entries into Safe
Mode caused by Mission Operations errors
- Many script errors that could have been found by brassboard testing
- Lessons of the first TCM
– At the time of the rendezvous burn, Mission Operations still had no set
procedure for recovery from Safe Mode
- This was an Action Item from Critical Design Review
– 2 months after the burn anomaly, two new Safe Mode entries were
caused by operations errors in loading orbits
- Loading new orbits was a routine operation for three years of cruise
• Such events were almost accepted as an inevitable part of operations
Preparation for Operations
• Most operations on spacecraft can be planned, scripted and tested
on the ground before launch
– A Concept of Operations that reflects the actual S/C design
– Contingency planning, as well as nominal operations
– Scripts that can be used in flight
• Pre-launch NEAR Mission Operations concentrated on ground
system acquisition and DSN connectivity
– The NEAR CONOPs was essentially generic - no thinking about
how the operations would be conducted with NEAR
– No thinking about contingencies
– No practice of operations with significant round-trip light times
• A function of better, cheaper, faster?
– Nope; function of inexperience with professional operations
Mission Operations Professionalism
• Professional Mission Operations requires discipline
– Configuration management of scripts, code, parameters, etc.
– Following a process
- Review process
- Test requirements
- Script sign-off
– Use of proven procedures to perform routine tasks
– Using Problem Failure Reporting as an opportunity to learn
- Change or institute process to avoid repeat of errors
• NEAR approach: Conduct operations with a team of engineers who
would become experts on the spacecraft and mission
– Resulted in a ‘heroic’ mode of operations - CMM Level 1 of ops
– Configuration management, reviews, sign-off on scripts were not the
interesting part of operations for the NEAR team
– Didn’t acquire the degree of knowledge required for hero status
Specialized Technical Knowledge
• Very few people were truly capable of reviewing scripts
– G&C engineers didn’t understand the scripting language
– Mission Operations team didn’t understand the spacecraft
• Running the brassboard simulator took knowledge and patience
– Setting up the ‘truth model’ simulation
– Maintenance to keep the brassboard synchronized with the S/C
– Ran in real-time, so simulations took time
– “Half the time, errors are in the brassboard setup, not the script”
Lesson Number 2
• Mission Operations requires a team of experienced,
dedicated professionals with a unique set of skills.
– Planning, preparation, process control and configuration
management are as important as detailed technical
knowledge
– Practice pre-flight the way you plan to operate
in flight, and then don’t deviate unless absolutely
necessary
– Accept the time spent on errors in the ground
simulator to find the errors in scripts
Being a hero means never having to say “I’m sorry”.
Q: Why didn’t the S/C recover after autonomy
corrected the G&C mis-configuration?
• Consider the following:
– Pre-flight, G&C code had more SW PRF’s than the other 5 processors
combined
– New versions were loaded ~ 10 times during S/C-level testing, in
Maryland and the Cape
- The first three versions wouldn’t boot
- Three separate problems that caused FC commands to be ignored were
discovered pre-flight and a fourth after launch.
– Telemetry was a particular problem
- The G&C used the ground simulator, not telemetry, to test their software
– Prior to the anomaly, FC code was uploaded three times and the AIU
once to correct major problems in flight
– 17 additional errors were found during the rendezvous burn
investigation
• The existence of undiscovered G&C code errors is not unlikely, based
on the continued high rate of fault discovery
Q: Do we even need an undiscovered error
to explain the anomaly?
• G&C software error caused a high momentum wheel rate to be
set to zero
– Can cause high momentum to be calculated as low OR low
momentum to be calculated high
• In simulation, S/C always recovered in 20 minutes or less
– Limited brassboard simulation of this scenario
– Other failures of the simulation to catch errors in flight code have
been caused by mismatches between the truth model and reality
– How accurate is the wheel model?
– How would the behavior change if the wheel model is changed?
• An attractive hypothesis
– Requires only a known error in the G&C code, plus an unrealistic
wheel model
– Code containing the error known to be invoked at least once
during the anomaly
Lesson Number 3
• If the software error discovery rate is still high,
keep testing, even if the S/C has already been
launched.
– Use the ground simulator
– Use an independent team, like the NARB did
following the anomaly
– Remember Lesson 1: Take every opportunity to
adjust the truth model to match S/C performance in
flight
It ain’t over ‘til it’s over.
Q: Was NEAR just really unlucky?
• Consider the following:
– G&C code was known to be buggy pre-flight
– G&C code continued to be buggy during flight
– The fact that there had been no true independent look at the G&C
truth model was known within a week of launch
– Mission Operations preparation was known to be inadequate prior
to launch
– Mission Operations were fault-prone throughout cruise
– Mission Operations was never asked for an accounting of their
process prior to the burn anomaly
• Each event was treated individually, rather than as a pattern that
had a high probability of converging into disaster
– The NEAR burn anomaly represented a Management failure
Lesson Number 4
• Management must stay informed and involved
before there is a serious problem
– Look for trends and patterns
– When there is serious disagreement on the cause or
meaning of events, look closer
– Get an independent opinion
Ultimately, leadership is responsible.
Conclusion
• NEAR survived under the conditions described because it had:
– A very forgiving mission design
- No critical timing for TCMs
– A huge fuel margin
- Despite 96 m/s lost in the anomaly and the addition of a DSM2, NEAR
had sufficient fuel to conduct the mission
– A reliable and comprehensive safing system
- Took care of almost everything that was thrown at it during flight
- Still had one rule to go when the burn anomaly corrected itself
• The burn anomaly served as a wake-up call and, with another
reload of the flight computer, NEAR went on to perform a flawless, year-long mission around Eros, terminating in a controlled landing.
• By heeding the lessons of NEAR, other, less-blessed missions can
be just as successful without the initial trauma.