EC26_Classic_Disasters - Software Engineering II

University of Southern California
Center for Systems and Software Engineering
Software Classic Disasters
CS 577b Software Engineering II
Supannika Koolmanojwong
April 4, 2011
Outline
• IT Project Management: Infamous Failures,
Classic Mistakes, and Best Practices
• Recovering IT in a Disaster: Lessons from
Hurricane Katrina
• Top 10 Worst Practices
04/04/2011
© 2011 USC-CSSE
IT Project Management: Infamous Failures,
Classic Mistakes, and Best Practices
R. Ryan Nelson, MIS Quarterly Executive, Vol. 6, No. 2, June 2007
• Retrospectives via project postmortems or post-implementation reviews
• 99 retrospectives conducted in 74 organizations over the past 7 years
• “Insanity: doing the same thing over and over again and expecting different results.” (attributed to Albert Einstein)
10 of the most infamous IT project failures
• Large magnitude: over $100 million
• Half come from the public sector
– wasted taxpayer dollars
– lost services
• The other half come from the private sector
– billions of dollars in added costs
– lost revenues
– lost jobs
1. Internal Revenue Service (IRS), 1999
• PROJECT: Business Systems Modernization, launched in 1999 to upgrade the agency’s IT infrastructure and more than 100 business applications
• $8 billion modernization project, delivered by a team of vendors
• A complex project that overwhelmed the management capabilities of both vendor and client
• The most expensive systems development “fiasco” in history, with delays costing the U.S. Treasury tens of billions of dollars per year
• The agency’s ability to collect revenue, conduct audits, and go after tax evaders was severely compromised
2. Federal Aviation Administration, 1996
• PROJECT: Advanced Automation System (AAS); FAA’s effort
to modernize the nation’s air traffic control system.
• Estimated to cost $2.5 billion ($1.5 billion of which was wasted)
• Numerous delays and cost overruns, which were blamed on both the FAA and the primary contractor, IBM
• Technical complexity of the effort, bad resource estimation, and ineffective requirements control
• “For example, they wanted the system to have only 3 seconds of downtime a year. But to get the data to prove that requirement had been met would have taken about 10 years.” (The requirement was later changed to 5 minutes of downtime a year.)
• Instead of admitting the problem, IBM turned AAS into a research
project
• The project collapsed
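The two downtime targets in the FAA quote can be translated into availability percentages with simple arithmetic. The sketch below assumes a 365-day year (31,536,000 seconds) and is only an illustration of the conversion, not part of the original case study.

```python
# Allowed downtime per year for an availability target expressed in "nines".
# Assumes a 365-day year; leap years are ignored.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def downtime_per_year(nines: int) -> float:
    """Return the maximum seconds of downtime per year for `nines` nines
    of availability, e.g. nines=5 means 99.999% available."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10.0 ** (-nines)  # fraction of the year the system may be down
    return unavailability * SECONDS_PER_YEAR


if __name__ == "__main__":
    for n in (3, 5, 7):
        seconds = downtime_per_year(n)
        print(f"{n} nines: {seconds:10.2f} s/year = {seconds / 60:8.2f} min/year")
```

Seven nines (99.99999%) allows about 3.15 seconds a year and five nines about 5.26 minutes a year, which lines up with the original and the relaxed FAA requirement. Demonstrating a few seconds of downtime per year takes a very long observation period, which is the point of the quote.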
3. Federal Bureau of Investigation, 2004
• PROJECT: “Trilogy”; four-year, $500M overhaul of the FBI’s antiquated computer system
• Ill-defined requirements that changed dramatically after 9/11 (the agency’s mission switched from a criminal to an intelligence focus)
• The $170 million Virtual Case File component was abandoned altogether
• 400 problems were found in early versions of the troubled software, but the contractor was never told
• The bureau went ahead with a $17 million testing program even though the software would have to be scrapped
4. McDonald’s, 2001
• PROJECT: “Innovate”; a digital network for creating a real-time enterprise
• Planned to spend $1 billion over five years
• Objective: to better serve customers by using information and communications technologies to monitor the quality of products and services
• Executives at company headquarters would have been able to see, at any moment, how soda dispensers and frying machines in every store were performing
• Would have needed $1 billion for infrastructure, and $zillions to maintain and upgrade
• After two years and $170M, the fast-food giant threw in the towel
5. Denver International Airport, 1994
• PROJECT: Baggage-handling system.
• It took 10 years and at least $600 million to figure out that big muscles, not computers, can best move baggage
• The baggage system, designed and built by
BAE Automated Systems Inc., launched,
chewed up, and spit out bags so often that
it became known as the “baggage system
from hell.”
6. AMR Corp., Budget Rent A Car Corp., Hilton Hotels Corp., Marriott International Inc., 1992
• PROJECT: “Confirm”; reservation system for hotel and rental car bookings
• Canceled after four years and $125 million in development, when it became clear that Confirm would miss its deadline by as much as two years
• Was supposed to be a leading-edge, comprehensive travel industry reservation program combining airline, rental car, and hotel information
• Major problems surfaced when Hilton tested the system; an 18-month delay followed, and the problems could not be resolved
7. Bank of America, 1988
• PROJECT: “MasterNet”; trust accounting system
• Hardware problems caused Bank of America (BofA) to lose control of several billion dollars of trust accounts
• All the money was eventually found in the system, but all 255 people in the entire Trust Department were fired, as all the depositors withdrew their money
• A classic case study of the need for risk assessment, including people-, process-, and technology-related risks
• BofA spent $60M to fix the $20M project before deciding to abandon it altogether; BofA fell from being the largest bank in the world to No. 29
• Contributing problems: lack of CRACK (Collaborative, Representative, Authorized, Committed, Knowledgeable) stakeholders, bad modular design, and a focus on competing with competitors rather than on readiness for transition
8. Kmart, 2000
• PROJECT: IT systems modernization
• $1.4 billion IT modernization effort
• aimed at linking its sales, marketing, supply, and
logistics systems.
• 18 months later, cash-strapped Kmart cut back on
modernization, writing off the $130 million it had
already invested in IT.
• Four months later, it declared bankruptcy
• Causes ranged from failing to allocate enough money and manpower to not clearly establishing the IT project’s relationship to the organization’s business
9. London Stock Exchange, 1993
• PROJECT: “Taurus”; paperless share settlement system
• Cost £800 million; original budget £6 million
• Abandoned after 10 years of development
• Built on database management software by Vista Concepts (US). Although very good for online real-time processing, it could not handle distributed data processing or batch processing
• LSE tried to modify Vista by rewriting almost 60% of it, which led to hidden bugs and long delays
• Scope grew from a settlement-only system to a full “share registration and transfer system”
10. Nike, 2000
• PROJECT: Integrated enterprise software
• $400 million spent installing ERP, CRM, and SCM: the full complement of analyst-blessed integrated enterprise software
• Caused a major inventory glitch: some shoe models were overproduced and others underproduced
• Profits dropped by $100 million
Classic Mistakes
• Behind schedule? Add more people!
• Want to speed up development? Cut testing!
• A new version of the OS becomes available during the project? Time for an upgrade!
• Key contributor aggravating the rest of the team? Wait until the end of the project to fire him!
Classic Mistakes: People
• Undermined motivation
– hurts productivity and quality
• Weaknesses in team members’ individual capabilities or in their working relationships
• Failure to take action to deal with a problem
employee
• Adding people to a late project
– pouring gasoline on a fire
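One common explanation for why adding people to a late project backfires, usually credited to Fred Brooks’s The Mythical Man-Month (not cited on this slide), is communication overhead: every pair of team members is a potential communication channel, so channels grow as n(n-1)/2. A minimal sketch of that count:

```python
def communication_channels(team_size: int) -> int:
    """Number of distinct pairwise communication channels among `team_size` people."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


if __name__ == "__main__":
    # Growing a team from 5 to 10 people doubles headcount
    # but more than quadruples the number of channels.
    for n in (5, 10, 15):
        print(f"{n:2d} people -> {communication_channels(n):3d} channels")
```

New people also need ramp-up time from existing staff, which this simple count does not model.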
Classic Mistakes: Process
• BDUF – Big Design Up Front
• Underestimating, overly optimistic schedules, and underscoping, which undermine effective planning and shortchange requirements determination and/or quality assurance
– Poor estimation also puts excessive pressure on team members, leading to lower morale and productivity.
• Insufficient risk management
• Contractor failure (outsourcing and offshoring)
Classic Mistakes: Product
• Example: the FAA’s modernization effort, whose goal was 99.99999% reliability, referred to as “the seven nines”
• Requirements gold-plating
• Feature creep
– The average project experiences about a 25% change in requirements over its lifetime
• Developer gold-plating: developers’ fascination with new technology, whether or not it is required in the product
• Research-oriented development
• Silver-bullet syndrome
• Overestimated savings from new tools or methods
• Switching tools in the middle of a project
A Meta-Retrospective of 99 IT
Projects
• Mistakes were classified as process (45%), people (43%), product (8%), or technology (4%)
– Project managers should therefore be experts in managing processes and people
• Scope creep didn’t make the top ten mistakes
– It stays manageable as long as the project manager pays attention to it
• Contractor failure has been climbing in frequency in recent years
• If the project managers had focused their attention on better
estimation and scheduling, stakeholder management, and
risk management, they could have significantly improved the
success of the majority of the projects studied.
Avoid classic mistakes through best practices
1. Avoiding Poor Estimating and/or Scheduling
– Cost overrun: 180% (1994), 43% (2003)
– Schedule overrun: 63% (2000), 82% (2007)
– Cone of uncertainty
• Bound the estimate by multiplying the “most likely” single-point estimate by an optimistic factor and a pessimistic factor
• Lower bound: optimistic estimate
• Upper bound: pessimistic estimate
– Capital One
• 100% cushion at the beginning of the feasibility phase
• 75% cushion in the definition phase
• 50% cushion in design
• 25% cushion at the beginning of construction
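The cone-of-uncertainty bounds and the Capital One cushions can be combined in a small calculator. This is a sketch under stated assumptions: the optimistic and pessimistic multipliers in PHASE_FACTORS are illustrative placeholders, not values from the article, and each cushion is assumed to be a percentage added on top of the most-likely estimate.

```python
from dataclasses import dataclass

# Hypothetical (optimistic, pessimistic) multipliers per phase; illustrative only.
# The range narrows as the project moves through its phases.
PHASE_FACTORS = {
    "feasibility": (0.25, 4.0),
    "definition": (0.5, 2.0),
    "design": (0.67, 1.5),
    "construction": (0.8, 1.25),
}

# Capital One-style cushions from the slide, assumed to be added on top
# of the most-likely estimate.
CUSHIONS = {"feasibility": 1.00, "definition": 0.75, "design": 0.50, "construction": 0.25}


@dataclass(frozen=True)
class EstimateRange:
    lower: float        # optimistic estimate
    most_likely: float  # single-point estimate
    upper: float        # pessimistic estimate
    cushioned: float    # most likely plus the phase cushion


def estimate_range(most_likely: float, phase: str) -> EstimateRange:
    optimistic, pessimistic = PHASE_FACTORS[phase]
    return EstimateRange(
        lower=most_likely * optimistic,
        most_likely=most_likely,
        upper=most_likely * pessimistic,
        cushioned=most_likely * (1 + CUSHIONS[phase]),
    )


if __name__ == "__main__":
    for phase in PHASE_FACTORS:
        r = estimate_range(100.0, phase)  # e.g. 100 person-months, hypothetical
        print(f"{phase:13s} {r.lower:6.1f} .. {r.upper:6.1f}  cushioned: {r.cushioned:6.1f}")
```

Reporting a range rather than a single number makes it harder for an early, optimistic guess to be treated as a commitment.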
Avoiding Poor Estimating and/or Scheduling
• Valuable approaches to improving project estimation and scheduling:
– Timebox development
• Shorter, smaller projects are easier to estimate
– Creating a work breakdown structure
• To help size and scope projects
– Retrospectives
• To capture actual size, effort, and time data for use in making future project estimates
– A project management office
• To maintain a repository of project data over time
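A work breakdown structure is a tree whose leaf tasks carry estimates that roll up into their parent deliverables. The sketch below uses a hypothetical WorkItem class with made-up task names and effort figures to show the roll-up.

```python
from dataclasses import dataclass, field


@dataclass
class WorkItem:
    """One node of a work breakdown structure (hypothetical class for illustration)."""
    name: str
    effort_days: float = 0.0  # estimate for leaf tasks; ignored when the node has children
    children: list["WorkItem"] = field(default_factory=list)

    def total_effort(self) -> float:
        """Roll leaf estimates up to this node."""
        if not self.children:
            return self.effort_days
        return sum(child.total_effort() for child in self.children)

    def outline(self, depth: int = 0) -> list[str]:
        """Return an indented outline with the rolled-up effort of each node."""
        lines = [f"{'  ' * depth}{self.name}: {self.total_effort():g} days"]
        for child in self.children:
            lines.extend(child.outline(depth + 1))
        return lines


if __name__ == "__main__":
    project = WorkItem("Order-tracking system", children=[
        WorkItem("Requirements", 10),
        WorkItem("Build", children=[
            WorkItem("Database", 12),
            WorkItem("Back end", 30),
            WorkItem("User interface", 15),
        ]),
        WorkItem("Testing", 12),
    ])
    print("\n".join(project.outline()))
```

Breaking the work down until each leaf is small enough to estimate with some confidence is what makes the total more trustworthy than a top-down guess.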
Avoiding Ineffective Stakeholder Management
• Ineffective stakeholder management is the second biggest cause of project failure
• Have to know:
– who has influence over others
– who has direct control of resources
– each stakeholder’s level of interest
– each stakeholder’s degree of support/resistance
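The four questions above can be recorded per stakeholder and turned into an engagement strategy. The sketch below borrows the widely used power/interest grid, a technique swapped in here rather than named on the slide; the 1-5 scales, the threshold of 3, the strategy labels, and the example stakeholders are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Stance(Enum):
    SUPPORTIVE = "supportive"
    NEUTRAL = "neutral"
    RESISTANT = "resistant"


@dataclass(frozen=True)
class Stakeholder:
    name: str
    influence: int            # sway over others, 1 (low) to 5 (high)
    controls_resources: bool  # direct control of budget, staff, or equipment
    interest: int             # level of interest, 1 (low) to 5 (high)
    stance: Stance            # degree of support or resistance


def engagement_strategy(s: Stakeholder, threshold: int = 3) -> str:
    """Place a stakeholder on a power/interest grid and flag powerful resisters."""
    powerful = s.influence >= threshold or s.controls_resources
    interested = s.interest >= threshold
    if powerful and interested:
        strategy = "manage closely"
    elif powerful:
        strategy = "keep satisfied"
    elif interested:
        strategy = "keep informed"
    else:
        strategy = "monitor"
    if powerful and s.stance is Stance.RESISTANT:
        strategy += " (powerful resister: address early)"
    return strategy


if __name__ == "__main__":
    stakeholders = [
        Stakeholder("Project sponsor", 5, True, 5, Stance.SUPPORTIVE),
        Stakeholder("Finance director", 4, True, 2, Stance.RESISTANT),
        Stakeholder("Call-center staff", 2, False, 5, Stance.NEUTRAL),
    ]
    for s in stakeholders:
        print(f"{s.name}: {engagement_strategy(s)}")
```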
• Risk identification, analysis, prioritization, risk-management planning, resolution, and monitoring
• Methods/tools:
– a prioritized risk assessment table
– a top-10 risks list
– interim retrospectives
– appointing a risk officer
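A prioritized risk assessment table and a top-10 list can be produced by ranking risks on risk exposure, the probability of a loss times the size of that loss. That formulation is common in software risk management (for example, in Boehm’s work) but is not spelled out on this slide, and the risks and numbers in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    description: str
    probability: float  # chance the risk materializes, 0.0 to 1.0
    loss: float         # impact if it does, e.g. in $K or weeks of slip

    @property
    def exposure(self) -> float:
        """Risk exposure = probability x loss."""
        return self.probability * self.loss


def top_risks(risks: list[Risk], n: int = 10) -> list[Risk]:
    """Return the n risks with the highest exposure, highest first."""
    return sorted(risks, key=lambda r: r.exposure, reverse=True)[:n]


if __name__ == "__main__":
    register = [
        Risk("Key contractor fails to deliver", 0.3, 500),
        Risk("Requirements creep", 0.5, 200),
        Risk("Lead architect leaves", 0.2, 300),
        Risk("New tool chain slower than promised", 0.4, 50),
    ]
    print(f"{'Rank':<5}{'Risk':<40}{'P':>5}{'Loss':>7}{'RE':>8}")
    for rank, r in enumerate(top_risks(register), start=1):
        print(f"{rank:<5}{r.description:<40}{r.probability:>5.2f}{r.loss:>7.0f}{r.exposure:>8.1f}")
```

Re-ranking the same register at each interim retrospective is one way to keep the top-10 list current.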
Avoiding Insufficient Planning
• Ensure the following:
– Clear roles and responsibilities
– Resource allocation
– Schedule/timeline
– Adherence to project policies, plans, and procedures
Avoiding Shortchanging Quality Assurance
• When a project falls behind schedule, the
first two areas that often get cut are testing
and training.
• Teams cut corners by eliminating test planning, eliminating design and code reviews, and performing only minimal testing
• Suggestions:
– Agile development, joint application design sessions, automated testing tools, and daily build-and-smoke tests
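A daily build-and-smoke test needs only a fast, shallow set of checks that tell whether the new build is basically alive. The harness below is a minimal sketch; the check functions are hypothetical stand-ins for real checks such as starting the application or querying a test database.

```python
from typing import Callable


def run_smoke_tests(checks: dict[str, Callable[[], None]]) -> dict[str, str]:
    """Run each named check; a check passes if it returns without raising."""
    results: dict[str, str] = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "PASS"
        except Exception as exc:  # report every failure instead of stopping at the first
            results[name] = f"FAIL ({exc})"
    return results


def build_is_healthy(results: dict[str, str]) -> bool:
    """The build is healthy only if every smoke check passed."""
    return all(status == "PASS" for status in results.values())


# Hypothetical checks standing in for real ones.
def check_config_parses() -> None:
    config = {"db_url": "sqlite:///:memory:", "port": 8080}
    if not isinstance(config["port"], int):
        raise TypeError("port must be an integer")


def check_database_reachable() -> None:
    raise ConnectionError("test database unreachable")  # simulated failure


if __name__ == "__main__":
    results = run_smoke_tests({
        "config parses": check_config_parses,
        "database reachable": check_database_reachable,
    })
    for name, status in results.items():
        print(f"{name:20s} {status}")
    print("build healthy:", build_is_healthy(results))
```

A nightly job (for example, one scheduled with cron) could run the build, call this harness, and mark the build as broken whenever build_is_healthy returns False.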
Avoiding Weak Personnel and/or Team Issues
• Get the right people assigned to the project from the beginning
• Between 1999 and 2006, the retrospectives reported an increasing number of problems with distributed, inter-organizational, and multinational teams
– Fewer face-to-face team meetings, time-zone barriers, and language and cultural issues
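Time-zone barriers can be quantified by computing how many working hours distributed sites share. The sketch below assumes a 9:00-17:00 local working day, fixed UTC offsets (daylight saving time is ignored), and the same local calendar date at every site; the site names and offsets are illustrative.

```python
from datetime import date, datetime, time, timedelta, timezone


def shared_hours(sites: dict[str, float], day: date,
                 start: time = time(9), end: time = time(17)) -> float:
    """Hours on `day` (each site's local date) during which every site is
    inside its local working window. `sites` maps a site name to its UTC
    offset in hours."""
    windows = []
    for offset_hours in sites.values():
        tz = timezone(timedelta(hours=offset_hours))
        windows.append((datetime.combine(day, start, tzinfo=tz),
                        datetime.combine(day, end, tzinfo=tz)))
    # Aware datetimes with different offsets are compared via UTC.
    latest_start = max(w[0] for w in windows)
    earliest_end = min(w[1] for w in windows)
    overlap = (earliest_end - latest_start).total_seconds() / 3600
    return max(0.0, overlap)


if __name__ == "__main__":
    day = date(2011, 4, 4)
    print(shared_hours({"New York": -4, "London": 1}, day))
    print(shared_hours({"Los Angeles": -7, "Bangalore": 5.5}, day))
```

A pair of sites with little or no shared time has to rely on asynchronous handoffs, which is one concrete way the barrier shows up.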
Avoiding Insufficient Project Sponsorship
• Not only getting top management support,
but identifying the right sponsor
• From the beginning!
Outline
• IT Project Management: Infamous Failures,
Classic Mistakes, and Best Practices
• Recovering IT in a Disaster: Lessons from
Hurricane Katrina
• Top 10 Worst Practices
Hurricane Katrina
Recovering IT in a Disaster:
Lessons from Hurricane Katrina
Iris Junglas and Blake Ives, MIS Quarterly Executive, Vol. 6, No. 1, March 2007
• August 29, 2005: Hurricane Katrina destroyed a data center
and communications infrastructure at the Pascagoula and
Gulfport, Mississippi, operations of the Ship Systems sector
of Northrop Grumman Corporation
• Also put a second data center out of commission in a
shipyard near New Orleans
• 20,000 employees in Ship Construction
• Caused over US$1 billion in damage for the company
• Brought two of the nation’s largest shipyards to a standstill
Recovering IT in a Disaster
• How to adapt when the business continuity plan falls short and the public infrastructure is inadequate
• Re-examine our processes for preparing disaster plans
• Processes for assessing preparedness and response after a disaster or a near-disaster
Northrop Grumman Corporation
• Products: electronics, aerospace, and shipbuilding
• Customers: government and commercial customers worldwide
• Major business:
– Ship construction: large military vessels
– Revenue: US$5.7 billion in 2005
– Customers: DoD and Navy
– 12,900 employees in Mississippi
– 7,100 employees in New Orleans
Preparation for Hurricane
• Hurricanes are nothing new to the shipbuilding industry
– September 2004: Hurricane Ivan
– July 2005: Hurricane Dennis
• A bigger one was heading in
– August 2005
• 11 people dead and over US$1 billion in damage in Florida
Preparation for Hurricane
• Data
– Data backups were sent to Iron Mountain (an information management services company)
– A duplicate backup in Dallas
• Servers
– Powered off
– Wrapped in plastic
• New backup generator placed in a secure location
• Only one extranet kept alive (crucial for the Navy and DoD)
• People
– Evacuated the area
The storm smashed
• NGC facilities were in the storm’s path
• Communications failed
• Extensive damage to the shipyard and nearby communities
• Emergency command center set up at the Dallas office, staffed by a newly assembled emergency team
Damages
• Digital images of the damage were collected
• In Mississippi, NGC lost:
– 1,500 PCs, 200 servers, 300 printers, 600 data input devices, and hundreds of two-way radios
– communications closets, routers, switches, fiber and copper cables, and wires
– LAN/WAN/MAN connectivity: none of these networks worked any longer
• In New Orleans
– The infrastructure was still there
– Air-conditioning systems were not working, so servers shut down automatically
• A week after the storm, communication lines went down again because cars were driving over them
First things first
• The priority was not restoring computer systems but restoring human resources
• But most of the 20,000 employees were out of contact
• Tools
– Press releases
– Corporate web site (67,000 hits in the weeks after the storm)
– Toll-free call-in number
• Payroll delivered through Wal-Mart and Western Union
Restoring IT infrastructure
• Electronic communication was nonexistent because the public communication infrastructure was down
• BlackBerry devices could be used intermittently
• Two-way radios, walkie-talkies
• Key members used satellite phones
– These required line-of-sight access to satellites
• Later on, wireless communication was used
Building a new data center
• Hardware acquisition
• Incompatibilities between software and the new hardware environment
• Inaccessible or hard-to-find system documentation, e.g., license keys, server names, addressing schemes, login IDs
Restoring data and applications
• Some firms found that their backup data was partially unreadable
• NGC had two backups: Iron Mountain and Dallas
• Some data on desktops or other local machines was lost
• Two weeks after Katrina, NGC had a new data center with essential systems up and running
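Discovering only after a disaster that backups are unreadable is avoidable if copies are verified routinely. One common approach, sketched here under the assumption that backups are plain file trees, is to record a SHA-256 checksum for every file and re-check each off-site copy against that manifest; the directory layout and file names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large backups need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify_copy(manifest: dict[str, str], copy_root: Path) -> list[str]:
    """List files that are missing from, or differ in, a backup copy."""
    problems = []
    for rel_path, expected in manifest.items():
        candidate = copy_root / rel_path
        if not candidate.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(candidate) != expected:
            problems.append(f"corrupt: {rel_path}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        source = base / "source"
        (source / "hr").mkdir(parents=True)
        (source / "hr" / "payroll.csv").write_text("id,amount\n1,100\n")
        (source / "designs.dat").write_bytes(b"\x00\x01\x02")
        manifest = build_manifest(source)

        # Two off-site copies; damage one file in the second copy.
        shutil.copytree(source, base / "copy_a")
        shutil.copytree(source, base / "copy_b")
        (base / "copy_b" / "hr" / "payroll.csv").write_text("garbled")

        print("copy_a:", verify_copy(manifest, base / "copy_a") or "OK")
        print("copy_b:", verify_copy(manifest, base / "copy_b"))
```

Running a check like this on a schedule turns “we have two backups” into “we have two backups we know we can read.”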
Disaster preparedness
• Common mistake: preparing only for disasters specific to one’s own domain
– financial institutions prepare for IT failures
– hospitals for pandemics
– airlines for technical failures and sabotage
• Alternative approach: consider a broader spectrum of generic disaster types
– economic, information, physical, human resource, reputation, psychopathic, and natural disasters
• Identify the common characteristics of each disaster category, then construct the plan
IT disaster preparedness framework
• Provide generic objectives and measurements, and guidelines for establishing IT disaster preparedness
• Emphasize developing an IT continuity plan, identifying and allocating critical resources, executing a business impact analysis, and maintaining, testing, and training on the plan
• COBIT (Control Objectives for Information and Related Technology)
– For operational IT and business managers
– Focuses on three core elements of IT governance: IT as an asset, IT-related risks, and IT control structures
• ITIL (IT Infrastructure Library)
– Focus is on improving the efficiency and effectiveness of IT services delivered to customers within the enterprise
– De facto standard for IT service management
IT disaster preparedness framework
[Figures: COBIT (Control Objectives for Information and Related Technology); ITIL (IT Infrastructure Library)]
Lessons Learned
1. Keep Data and Data Centers Out of Harm’s
Way
2. Don’t Assume the Public Infrastructure
Will Be Available
3. Plan for Civil Unrest
4. Assume Some People Will Not Be
Available
5. Leverage Your Suppliers as Critical Team
Members
Lessons Learned
6. Expect the Unexpected
7. Get Prepared – Crisis portfolio
8. Establish a Strong Leadership Position
9. Empower Decision Makers on the Team
10. Exploit Fresh-Start Opportunities
Outline
• IT Project Management: Infamous Failures,
Classic Mistakes, and Best Practices
• Recovering IT in a Disaster: Lessons from
Hurricane Katrina
• Top 10 Worst Practices
Worst Practices
Capers Jones, "Our Worst Current Development Practices," IEEE Software, vol. 13,
no. 2, pp. 102-104, Mar. 1996
• Project failures: projects that
– were terminated because of cost or schedule overruns
– experienced schedule or cost overruns in excess of 50 percent of initial estimates
– resulted in client lawsuits for contractual noncompliance
Worst Practice #1
No historical software-measurement data
• Lack of historical data leaves stakeholders blind to the realities of software development
• Need to track schedule, cost, progress, and performance
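Even a small repository of completed-project data lets an estimator replace guesses with calibrated ranges. The sketch below assumes hypothetical history records of size (in function points) and actual effort, derives each project’s productivity, and uses the best, median, and worst observed rates to bracket a new estimate.

```python
from statistics import median

# Hypothetical historical data: (size in function points, actual effort in person-months)
HISTORY = [(120, 10.0), (300, 27.0), (80, 6.5), (450, 41.0), (200, 16.0)]


def productivity_rates(history: list[tuple[float, float]]) -> list[float]:
    """Function points delivered per person-month for each past project."""
    return [size / effort for size, effort in history]


def calibrated_estimate(size: float,
                        history: list[tuple[float, float]]) -> tuple[float, float, float]:
    """Return (optimistic, most likely, pessimistic) effort in person-months,
    using the best, median, and worst historical productivity."""
    rates = productivity_rates(history)
    return size / max(rates), size / median(rates), size / min(rates)


if __name__ == "__main__":
    low, likely, high = calibrated_estimate(250, HISTORY)
    print(f"250 FP: {low:.1f} / {likely:.1f} / {high:.1f} person-months")
```

The range is only as good as the recorded actuals, which is why capturing size, effort, and time data at each retrospective matters.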
Worst Practice #2
Rejection of accurate estimates
• The lack of accurate estimates is the root cause of the rest of the worst practices, including:
– inability to perform return-on-investment
calculations
– susceptibility to false claims by tool and method
vendors
– software contracts that are ambiguous and
difficult to monitor.
Worst Practice #3 & 4
Failure to use automated estimating tools
and automated planning tools.
• 50 commercial software cost-estimating tools
– e.g., Checkpoint, COCOMO, Estimacs, Price-S, or Slim
• 100 project-planning tools on the market
– e.g., Microsoft Project, Primavera, Project Manager’s Workbench, or Timeline
• Combining estimating and planning tools leads to accurate, realistic plans that are not easily overridden by clients or executives
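COCOMO, one of the tools listed above, began as a set of published equations. As an illustration, the basic COCOMO model from Boehm’s 1981 Software Engineering Economics estimates effort as a·KLOC^b person-months and schedule as c·Effort^d calendar months, with coefficients that depend on the project mode. This is the crude original model, not COCOMO II, and it is no substitute for a calibrated tool.

```python
# Basic COCOMO (Boehm, 1981): effort = a * KLOC**b, schedule = c * effort**d
BASIC_COCOMO = {
    #  mode:          (a,   b,    c,   d)
    "organic":       (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded":      (3.6, 1.20, 2.5, 0.32),
}


def basic_cocomo(kloc: float, mode: str = "organic") -> tuple[float, float]:
    """Return (effort in person-months, schedule in calendar months)."""
    if kloc <= 0:
        raise ValueError("kloc must be positive")
    a, b, c, d = BASIC_COCOMO[mode]
    effort = a * kloc ** b
    schedule = c * effort ** d
    return effort, schedule


if __name__ == "__main__":
    for mode in BASIC_COCOMO:
        effort, months = basic_cocomo(32, mode)
        print(f"{mode:13s} {effort:6.1f} PM over {months:4.1f} months "
              f"(~{effort / months:.1f} people)")
```

Because the schedule grows much more slowly than effort, the model makes it visible when a demanded deadline is far shorter than the estimated effort can support, which is the kind of outcome Jones argues is hard for clients or executives to override.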
Worst Practices
• 5 & 6 - Excessive, irrational schedule
pressure and creep in users’ requirements
• 7 & 8 - Failure to monitor progress and to
perform risk management
– e.g., the “90 percent complete” syndrome
• 9 & 10 - Failure to use design reviews and
code inspections.
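One way to counter a perpetual “90 percent complete” report is to credit a task only when it meets its completion criteria. This is the 0/100 rule from earned-value practice, swapped in here as an illustration rather than taken from Jones’s article; the tasks and hours are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    planned_hours: float
    reported_pct: float  # developer's self-reported completion, 0 to 100
    done: bool           # met its completion criteria (e.g. reviewed and tested)


def reported_progress(tasks: list[Task]) -> float:
    """Percent complete if self-reported percentages are taken at face value."""
    total = sum(t.planned_hours for t in tasks)
    return 100 * sum(t.planned_hours * t.reported_pct / 100 for t in tasks) / total


def earned_progress(tasks: list[Task]) -> float:
    """0/100 rule: a task earns its planned hours only when it is actually done."""
    total = sum(t.planned_hours for t in tasks)
    return 100 * sum(t.planned_hours for t in tasks if t.done) / total


if __name__ == "__main__":
    tasks = [
        Task("Design", 40, 100, True),
        Task("Coding", 40, 90, False),
        Task("Integration", 20, 90, False),
    ]
    print(f"reported: {reported_progress(tasks):.0f}%  earned: {earned_progress(tasks):.0f}%")
```

A large gap between the two numbers is an early warning sign worth feeding into risk management.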