Software Reliability Engineering

By Jackie Wadzinski

The Patriot Missile

Used to destroy incoming Iraqi Scud missiles

Hailed for effectiveness

Operated for 100 consecutive hours

28 American soldiers killed

Cause: Software Failure

The Patriot Missile

A Learning Experience

The software can be redesigned

A new Patriot Missile can be built

The fate of the 28 soldiers remains the same

THE MORAL: Software Engineers need to find a way to engineer reliability into software.

Objectives

 Definition of Software Reliability

 Importance of Reliability Engineering

 Why Reliability Engineering is Difficult

 Reliability Engineering Processes

 Weibull

 Musa

 Monte Carlo

 Conclusion

What is Software Reliability?

IEEE Definition:

“The ability of a system or component to perform its required functions under stated conditions for a specified period of time.”

The definition allows for a “just right” level of reliability for software

Software Reliability and Hardware Reliability have the same definition
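One common way to make the definition quantitative is sketched below. The symbols are assumptions introduced here for illustration and are not part of the IEEE wording: T is the time to first failure, F(t) is its cumulative distribution function, and λ is a constant failure rate.

```latex
% Reliability: probability of no failure during the specified period [0, t]
R(t) = P(T > t) = 1 - F(t)

% Special case of a constant failure rate \lambda (exponential model)
R(t) = e^{-\lambda t}
```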

Why is Software Reliability Important?

Manager View

Reliable software means satisfied customers

Reliable software means repeat customers

Reliable software is ethical

Legal liability

Customer View

Reliable software saves time

Reliable software increases efficiency

Why Software Reliability is Difficult to Calculate

Without considering program evolution, the failure rate is statistically nonexistent

Failures arise from design defects, and design defects have many possible causes

Why Software Reliability is Difficult to Calculate

Errors can occur without warning

Redundancy cannot improve software quality if the redundant components are identical copies of the same software

Periodic restarts can sometimes help fix problems

Errors are caused by incorrect logic, incorrect statements, or incorrect input data

Software may require infinite testing

Software reliability models do not always fit the data points well

Overview

There are many models to choose from when calculating software reliability

Focus on three

Weibull Failure Time Model

Musa’s Basic Execution Time Model

Monte Carlo Simulation

Each model has its own strengths and limitations

Weibull Failure Time

About Weibull Failure Model

Used to model failure processes of hardware

One of the first models to be applied to software reliability modeling

Flexible – accommodates increasing, decreasing or constant failure rates
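The flexibility comes from the shape parameter of the Weibull distribution. The sketch below uses the standard Weibull hazard function to show the three regimes; the function names and the parameter values are chosen here for illustration and do not come from the slides.

```python
import math


def weibull_hazard(t, shape, scale=1.0):
    """Instantaneous failure rate of a Weibull distribution at time t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_reliability(t, shape, scale=1.0):
    """Probability that no failure has occurred by time t."""
    return math.exp(-((t / scale) ** shape))


# shape < 1: decreasing failure rate
# shape = 1: constant failure rate (the exponential special case)
# shape > 1: increasing failure rate
for shape in (0.5, 1.0, 2.0):
    rates = [weibull_hazard(t, shape) for t in (0.5, 1.0, 2.0)]
    print(f"shape={shape}: hazard at t=0.5, 1, 2 -> {rates}")
```

A decreasing failure rate (shape below 1) is the case usually hoped for during testing, when faults are being found and removed.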

Weibull Failure Model

Weibull Failure Model Assumptions:

There are a fixed number of faults in the software being tested

Detected faults are counted per time interval: (0, t1), (t1, t2), ...

Limitations:

Flexibility allows for greater chance of making the wrong assumption

Weibull Failure Model Example

Notice how the model follows the actual data

Musa

About Musa’s Basic Execution Time Model

Developed by John Musa of AT&T Bell Laboratories

One of the first models to use actual execution time of software components rather than calendar time

Time between failures is expressed in terms of CPU time

Musa’s Basic Execution Time Model

Uses a Poisson Distribution

Model Assumptions:

The execution times between failures are exponentially distributed

The hazard rate for a single fault is constant

Limitations:

Assumes new faults are not introduced after correction

Assumes number of faults decreases over time
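The assumptions above lead to the usual closed-form expressions for the basic execution time model, sketched here in Python. The parameter names (`lambda0` for the initial failure intensity, `nu0` for the total number of failures expected over unlimited execution time) and the numeric values are illustrative assumptions, not figures from the presentation.

```python
import math


def musa_failure_intensity(tau, lambda0, nu0):
    """Failure intensity (failures per CPU hour) after tau CPU hours of execution."""
    return lambda0 * math.exp(-lambda0 * tau / nu0)


def musa_expected_failures(tau, lambda0, nu0):
    """Expected cumulative number of failures after tau CPU hours of execution."""
    return nu0 * (1.0 - math.exp(-lambda0 * tau / nu0))


# Hypothetical parameters: 10 failures per CPU hour at the start of testing,
# 100 failures expected in total.
lambda0, nu0 = 10.0, 100.0
for tau in (0.0, 5.0, 10.0, 20.0):
    intensity = musa_failure_intensity(tau, lambda0, nu0)
    failures = musa_expected_failures(tau, lambda0, nu0)
    print(f"tau={tau:5.1f} h  intensity={intensity:6.3f}  expected failures={failures:7.3f}")
```

Because each correction is assumed to remove a fault without introducing a new one, the intensity falls linearly with the number of failures already experienced: lambda = lambda0 * (1 - mu / nu0).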

Musa’s Basic Execution Time Model Example

Notice how the model follows the actual data

Monte Carlo Simulation

About Monte Carlo Simulation

Developed in the 1940s as part of the atomic bomb program

Named after Monte Carlo, Monaco, because the city’s casinos featured games of chance such as dice and roulette

Today Monte Carlo Simulations are used in many applications including physics, finance, and system reliability

Monte Carlo Simulation

Used for very complex problems that are difficult to solve analytically or for which no analytical solution exists

Uses statistics to mathematically model real life processes and then estimates the probability of possible outcomes

Involves fitting a curve to a process and then using the fitted curve to model a process over time

Dice Example

Monte Carlo Simulation Process

Determine a probability function

Weibull Distribution – Best for failure process

Lognormal Distribution – Best for repair process

Determine the random number generator, the source for selecting random numbers that are distributed uniformly on the proper unit interval

Determine a sampling rule for selecting samples for the model given a unit interval of random numbers

Record a count of successes and failures
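The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the failure process is Weibull, the sampling rule is the inverse-transform method, and the shape, scale, and mission time are hypothetical values chosen here.

```python
import math
import random


def sample_weibull(rng, shape, scale):
    """Sampling rule: map a uniform random number to a Weibull failure time."""
    u = rng.random()  # uniform on [0, 1)
    return scale * (-math.log(1.0 - u)) ** (1.0 / shape)


def estimate_reliability(mission_time, shape, scale, n_trials=50_000, seed=0):
    """Count trials that survive the mission (successes) and those that do not."""
    rng = random.Random(seed)  # the random number generator
    successes = failures = 0
    for _ in range(n_trials):
        if sample_weibull(rng, shape, scale) > mission_time:
            successes += 1
        else:
            failures += 1
    return successes / n_trials, successes, failures


# Hypothetical parameters: shape 1.5, scale 100 hours, mission of 50 hours.
estimate, successes, failures = estimate_reliability(50.0, shape=1.5, scale=100.0)
exact = math.exp(-((50.0 / 100.0) ** 1.5))
print(f"simulated reliability = {estimate:.4f}, closed form = {exact:.4f}")
```

For a single Weibull component the closed form is available, so the simulation is only a check here; the method becomes useful when components are combined and no closed form exists.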

Monte Carlo Example

Select a random location within the rectangle

If the selected location is blue, record a hit

Repeat 10,000 times

Blue Area = (Hits / 10,000) * Area of Rectangle

Note: The standard error in the result is inversely proportional to the square root of the sample size
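A minimal sketch of this hit-counting procedure follows. The original figure is not available, so the blue region is assumed here to be a disc of radius 1 centred in a 2 by 2 rectangle; the function names are likewise illustrative.

```python
import math
import random


def estimate_area(is_inside, x_range, y_range, n_samples=10_000, seed=0):
    """Estimate the area of a region by sampling points in a bounding rectangle."""
    rng = random.Random(seed)
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    rect_area = (x_max - x_min) * (y_max - y_min)
    hits = sum(
        is_inside(rng.uniform(x_min, x_max), rng.uniform(y_min, y_max))
        for _ in range(n_samples)
    )
    p = hits / n_samples
    # Binomial standard error of the hit fraction, scaled to an area.
    std_error = rect_area * math.sqrt(p * (1 - p) / n_samples)
    return p * rect_area, std_error


def in_unit_disc(x, y):
    """Assumed 'blue' region: the disc of radius 1 centred at the origin."""
    return x * x + y * y <= 1.0


area, err = estimate_area(in_unit_disc, (-1, 1), (-1, 1))
print(f"estimated area = {area:.3f} +/- {err:.3f} (exact: {math.pi:.3f})")
```

Since the standard error shrinks with the square root of the sample size, quadrupling `n_samples` roughly halves the error.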

Monte Carlo Software Example

Arbitrary 3 component subsystem

The failure probability of each component is given in the diagram

If the first component fails, then the second is checked

If the second component fails, then the third component is checked

If the third component fails, then the entire subsystem fails
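The checking order described above can be simulated directly. The diagram with the real component probabilities is not recoverable, so the values below are hypothetical, and the components are assumed to fail independently.

```python
import math
import random


def simulate_subsystem(fail_probs, n_trials=100_000, seed=0):
    """Fraction of trials in which every component in the chain fails."""
    rng = random.Random(seed)
    subsystem_failures = 0
    for _ in range(n_trials):
        # all() stops at the first component that works, so a later
        # component is checked only if every earlier one has failed.
        if all(rng.random() < p for p in fail_probs):
            subsystem_failures += 1
    return subsystem_failures / n_trials


fail_probs = [0.1, 0.2, 0.3]    # hypothetical per-component failure probabilities
exact = math.prod(fail_probs)   # product rule for independent failures
simulated = simulate_subsystem(fail_probs)
print(f"exact = {exact:.4f}, simulated = {simulated:.4f}")
```

With independent components the exact answer is simply the product of the three probabilities, which gives a reference value for judging the simulation.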

Monte Carlo Software Example

The actual failure probability of the subsystem is:

The results of the actual simulation are:

Conclusion

Engineering reliable software is important to both the engineer and the end user

Engineering reliable software is not an easy task to accomplish

There are methods available for measuring reliability

Each method has its strengths and weaknesses

At this time, no one method is superior

Questions
