System Engineering Experiences
Harold Sasnowitz, IEEE Life Senior Member

Agenda
• Angle Rate Bombing System
• Space Shuttle
• Harpoon Missile
• Airborne Mine Countermeasures

What is Real Time Software?
• Time of arrival of the solution is part of the solution
• Usually have hardware timer(s) causing interrupts

US Navy Angle Rate Bombing System (ARBS)
• Circa 1975
• Purpose: precision bombing using horizontal flight path
• Computer interface: discretes, direct memory access (DMA), interrupts, serial and parallel channels
• Processor: SP-1, two page processor, 250 Kop/sec
• Programmed in assembler language
• Radar image projected onto HUD; pilot places crosshairs on target and computer determines release point
• Problem posed as three DMA channels, always in a particular order
• Initial flight testing resulted in system failure: HUD blanks when system engaged
• Finding: DMA channel order different than originally specified
• Be sure specification reflects real requirement
• Be sure test software reflects real requirement

Space Shuttle
• System description
• 300 Kop/sec
• 256 Kbytes ferrite core memory (later updated to semiconductor memory)
• Programmed in HAL/S (C-like)
• System requirement: survive two like failures
• Five general purpose computers, all executing identical software
• All sensors and flight effectors fault tolerant and redundant
• All 5 computers receive all data input
• All 5 computers receive output data from the other 4 computers and “fail votes”
• Input and output sum check on minor cycle basis
• First launch failure at T-30 seconds

First Launch Failure
• Insufficient processing power drives software design
• 50 msec major cycle
• ~10 minor cycles/major cycle
• Computers send/compare received/calculated data
• Non-compare sets “fail vote”
• Different set of data in different minor cycles
• At T-30 sec, backup brought into redundant set
• Prime sent wrong minor cycle number
• 7 “fixes” to create this condition

Shuttle Reliability Study
• Purpose: Recommend flight rules for mission abort due to computer system failures
• Method: Determine reliability of Shuttle computer system
• Reliability at time t is the probability the system will be working at time t
• For components: $R = e^{-\lambda t}$
• What is λ? Failure rate in failures per million hours
• For system: $R(t) = \int e^{-\lambda t}\,dt$
• What is λ for a system?

System State Space Diagram
[Figure: state space diagram. States labeled 4/1, 4/0, 3/1, 3/0 are connected by transitions at rate λ and end in “Vehicle loss”.]

What is system λ?
$$
\begin{bmatrix}
\lambda_{11} & \lambda_{12} & \lambda_{13} & \lambda_{14} & \cdots & \lambda_{1,23} \\
 & \lambda_{22} & & & & \\
 & & \lambda_{33} & & & \\
 & & & \ddots & &
\end{bmatrix}
$$
• Partial matrix shown
• Complete matrix is 23 x 23 elements
• For system: $R = \int e^{-\lambda t}\,dt$
• Solution is alternating infinite sum
• Programmed in APL
• A high-order language that has the feel of a spreadsheet but looks like a classical programming language

US Navy Harpoon Anti-ship Missile
• Circa 1971
• Mid-course Guidance Unit (MGU)
• Programmed in assembler language
• One page, 250 Kop/sec; 8K core memory
• Contains Attitude Reference Assembly (ARA) for tracking the state vector (v, a)
  • Contains 3 gyros and 3 accelerometers
  • ARA interface to counters in computer
• Present position provided by launch computer before launch
• Target location determined and provided by on-board radar
• 50 msec hardware timer interrupt causes software interrupt
  • Interrupt handler reads counters
• Test software showed g slightly off
• 50 msec counter off by one

US Navy Airborne Mine Countermeasures
• Initial deployment: now
• Purpose: find and destroy shallow and deepwater mines
• Sensors controlled from console on board MH-60S
• Numerous tests required to qualify equipment beyond performing basic function
  • Temperature, humidity, salt spray, electromagnetic compatibility, others
• Coordinating scheduling of production, factory flight testing, and Navy operational testing is complex
• Programmed in C
• Each sensor developed and programmed by a different company

Software Failures in History
• 22/7 = π?
• 24 hours = one day?
• How many times does the earth make a complete rotation on its axis in one 365 day year?
• Space probe missed Mars because a value that should have been in metric units was supplied in English units
• Errors in formula transcription (handwritten to code)
• The number of significant figures matters

Questions

First Flight Failure
• First flight not vertical launch
• Drop from Shuttle Carrier Aircraft
• Shuttle and Carrier held together with pyro bolts
• At separation: “GPC 2 light”
• Studied System Services software looking for “single point failures”
• “All leaves cancelled”
• Weekend lost for the engineering staff
• Bad solder joint on a computer card
• Proved system design
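
Worked example (not part of the original slides): the component reliability formula on the Shuttle Reliability Study slide, R = e^(-λt) with λ in failures per million hours, can be sketched in a few lines of Python. The function names, the failure rate, and the mission duration below are hypothetical and chosen only for illustration. The "at least 3 of 5 independent, identical computers" model is a simplifying assumption suggested by the "survive two like failures" requirement; it is not the 23 x 23 state model described in the study.

```python
import math


def component_reliability(failures_per_million_hours: float, hours: float) -> float:
    """Return R = exp(-lambda * t) for one component.

    lambda is given in failures per million hours, as on the slide,
    and converted here to failures per hour.
    """
    lam = failures_per_million_hours / 1e6
    return math.exp(-lam * hours)


def k_of_n_reliability(r: float, k: int, n: int) -> float:
    """Return the probability that at least k of n units are working.

    Assumes the units are identical and fail independently
    (an illustrative simplification, not the study's model).
    """
    return sum(
        math.comb(n, i) * r**i * (1.0 - r) ** (n - i)
        for i in range(k, n + 1)
    )


if __name__ == "__main__":
    # Hypothetical numbers: 100 failures per million hours, 200 hour mission.
    r = component_reliability(failures_per_million_hours=100.0, hours=200.0)
    print(f"single computer: {r:.6f}")
    print(f"3-of-5 set:      {k_of_n_reliability(r, 3, 5):.9f}")
```

The sketch only shows how redundancy raises reliability above that of a single unit under these assumptions; a state space model such as the one in the study also accounts for how the system moves between degraded states.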