Software Engineering Disasters


1

Software in our lives, then and now

“I think there is a world market for maybe five computers.” - IBM Chairman Thomas Watson, 1943

Medical (processing and analysis, Computer Aided Surgery, other various equipment)

Financial and business (banking, trading)

Transportation (trains, cars, planes, auto-pilot)

Home (security / fire)

Leisure

Military

2

Murphy’s law

“Anything that can go wrong, will go wrong.”

3

Previously in CS 577

Mars 2 lander crash-landing (1971)

A dust storm may have caused incorrect landing-angle computations (cause uncertain)

Ariane 5 self-destruct (1996)

Data conversion from a 64-bit floating-point value to a 16-bit signed integer overflowed

Cost: $370,000,000
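
The overflow above can be illustrated with a minimal sketch. The function names are hypothetical and this is not the flight code: as widely reported, the actual software raised an unhandled exception on the out-of-range conversion rather than silently wrapping. The sketch only shows why a 64-bit float does not always fit in 16 signed bits, and what a range check looks like.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def unchecked_to_int16(value: float) -> int:
    """Truncate and wrap to 16 bits, as an unguarded conversion might (hypothetical helper)."""
    raw = int(value) & 0xFFFF          # keep only the low 16 bits
    return raw - 0x10000 if raw & 0x8000 else raw  # reinterpret as signed


def checked_to_int16(value: float) -> int:
    """Refuse values that do not fit instead of producing garbage (hypothetical helper)."""
    if not INT16_MIN <= value <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return int(value)


if __name__ == "__main__":
    print(unchecked_to_int16(1234.5))    # fits: 1234
    print(unchecked_to_int16(40000.0))   # wraps to a negative number
    try:
        checked_to_int16(40000.0)
    except OverflowError as err:
        print("rejected:", err)
```

The checked variant makes the failure explicit at the conversion site, where the caller can decide how to handle it.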

Therac-25

Beta radiation overdose (10,000%)

Replacing hardware interlocks with software interlock mechanisms

Frequent overflow of a one-byte counter: operator input arriving during an overflow caused the software interlock to fail (a race condition)

3 deaths, 3 injured

Unrealistic risk assessment, inadequate testing
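
The one-byte counter problem can be sketched as follows. This is a simplified illustration, not the Therac-25 code: it assumes a flag that is incremented on every pass (instead of being set to a fixed nonzero value) and a safety check that runs only when the flag is nonzero. The function name is hypothetical.

```python
def passes_with_skipped_check(n_passes: int) -> list[int]:
    """Return the pass numbers on which the safety check would be skipped."""
    flag = 0
    skipped = []
    for pass_number in range(1, n_passes + 1):
        flag = (flag + 1) & 0xFF   # one-byte counter: 255 + 1 wraps to 0
        if flag == 0:              # zero is read as "no check needed"
            skipped.append(pass_number)
    return skipped


if __name__ == "__main__":
    print(passes_with_skipped_check(1024))
```

Under these assumptions the check is silently skipped on every 256th pass, so an operator action that lands on exactly that pass goes unverified.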

AMR / Budget Rent-A-Car / Hilton Hotels / Marriott International “Confirm”

Bank of America “MasterNet”

4

Disasters at the people (not company) level

Panama Radiation Therapy Overdose (2000)

18 deaths, 10 injured

Double counting, Overreliance on automation

Various military vehicle crashes

Chinook Helicopter Crash, 29 deaths (1994): uncommanded engine run-ups and run-downs (an analysis found 486 anomalies in the 18% of the code that was reviewed)

V-22 Osprey Crash, 4 deaths (2000): software caused the aircraft to decelerate when the pilot attempted to reset it

Failed Patriot missile interception, 28 deaths, 94 injured (1991): accumulated rounding error in the system clock
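
The system clock problem in the missile interception failure can be worked through numerically. The figures below are assumptions taken from widely cited analyses of the incident, not from these slides: time counted in tenths of a second, 0.1 stored truncated to 23 fractional bits, about 100 hours of continuous uptime, and a target speed of roughly 1,676 m/s.

```python
from fractions import Fraction

FRACTION_BITS = 23          # assumption: fractional bits used to store 0.1
TARGET_SPEED_M_S = 1676     # assumption: approximate incoming missile speed


def clock_drift_seconds(uptime_hours: int, bits: int = FRACTION_BITS) -> float:
    """Accumulated clock error from adding a truncated 0.1 once per tick."""
    scale = 2 ** bits
    stored_tenth = Fraction(scale // 10, scale)       # 0.1 chopped to `bits` bits
    error_per_tick = Fraction(1, 10) - stored_tenth   # exact truncation error
    ticks = uptime_hours * 3600 * 10                  # one tick every 0.1 s
    return float(ticks * error_per_tick)


if __name__ == "__main__":
    drift = clock_drift_seconds(100)
    print(f"drift after 100 h: {drift:.4f} s")
    print(f"tracking error: {drift * TARGET_SPEED_M_S:.0f} m")
```

A per-tick error of under a ten-millionth of a second looks harmless, but after days of uptime it adds up to about a third of a second, which at these speeds is hundreds of meters.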

Y2K (2000)

Abbreviating year with 2 digits

$300,000,000,000 cost
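
A minimal sketch of why two-digit years break date arithmetic, and of one common repair (a fixed pivot window). The function names and the pivot value of 50 are illustrative assumptions.

```python
def elapsed_two_digit(start_yy: int, end_yy: int) -> int:
    """Naive difference of two-digit years, as in pre-Y2K records."""
    return end_yy - start_yy


def expand_year(yy: int, pivot: int = 50) -> int:
    """Windowing fix: two-digit years below the pivot are read as 20xx, the rest as 19xx."""
    return 2000 + yy if yy < pivot else 1900 + yy


def elapsed_windowed(start_yy: int, end_yy: int, pivot: int = 50) -> int:
    """Difference computed after expanding both years to four digits."""
    return expand_year(end_yy, pivot) - expand_year(start_yy, pivot)


if __name__ == "__main__":
    # Someone born in 1965, evaluated in 2000 ("00"):
    print(elapsed_two_digit(65, 0))   # negative age
    print(elapsed_windowed(65, 0))    # 35
```

Windowing only postpones the ambiguity to the pivot year; storing four-digit years removes it.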

5

Toyota Anti-Lock Brake recalls (2010)

~150,000 vehicles recalled

Reason: 1 second lag

At 60 mph (96.5 km/h), the car travels ~90 feet (27.5 m) in that second

Enough to cause accidents
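
The distance figure above is simple arithmetic and can be checked with a short sketch (the function name is illustrative). The exact result is 88 feet, which the slide rounds to ~90.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600
METERS_PER_FOOT = 0.3048


def distance_during_lag(speed_mph: float, lag_seconds: float) -> tuple[float, float]:
    """Distance covered (feet, meters) while the brakes do not respond."""
    feet = speed_mph * FEET_PER_MILE / SECONDS_PER_HOUR * lag_seconds
    return feet, feet * METERS_PER_FOOT


if __name__ == "__main__":
    feet, meters = distance_during_lag(60, 1.0)
    print(f"{feet:.0f} ft, {meters:.1f} m")
```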

Bad PR

$1.1 billion in repairs

$770-880 million in lost sales

Toyota: "Moving forward"... even when you don't want to.

Endangering people’s lives

6

Stock Market Flash Crash (2010)

Dow Jones Industrial Average (a very closely watched U.S. benchmark index tracking stock market activity)

Biggest intraday point decline at the time: 998.5 points

Cost: $1,000,000,000,000

Shares such as Procter & Gamble and Accenture traded at wildly wrong prices: as low as a penny, or as high as $100,000.

The index recovered a large part of the point drop shortly afterwards

7

Cold War Nuclear Missile False Alarm (1983)

Very sensitive period

Strategy was an immediate nuclear counter-attack to guarantee “Mutually Assured Destruction”

How it was mitigated: the officer on duty judged the alarm to be a computer error

The bug: false alarm created by a rare alignment of sunlight on high-altitude clouds and the satellites’ orbits.

Potential cost: a nuclear World War III

8

What’s next?

Just as Thomas Watson couldn’t guess what was coming up in the next 40 years, it is pretty hard for us to estimate how computers and technology will evolve in the near future.

However, we know for sure that software systems will get much larger and more complex, more tasks will be automated, and reliance on software will greatly increase.

9

Do more testing?

Testing will only catch ~80% of the bugs.

“Program testing can be used to show the presence of bugs, but never to show their absence!” - Edsger Dijkstra

10

Conclusion: our role

Our responsibility increases as the need for reliability in our systems increases

Follow proper processes and practices in architecture, risk management, development, and testing.

As we were taught in various SE classes (577, 578…)

Good communication between stakeholders

To ensure all sides are talking about the same thing

11
