CS 5150 Software Engineering Reliability 1 Lecture 19

advertisement
CS 5150
Software Engineering
Lecture 19
Reliability 1
CS 5150
1
Administration
Weekly progress reports
•
Weekly progress reports are not required
after this week's presentations and report.
CS 5150
2
Security Techniques: Barriers
Place barriers that separate parts of a complex system:
•
Isolate components, e.g., do not connect a computer to a
network
• Firewalls
• Require authentication to access certain systems or parts
of systems
Every barrier imposes restrictions on permitted uses of the
system
Barriers are most effective when the system can be divided
into subsystems with simple boundaries
CS 5150
3
Barriers: Firewall
Public
network
Private
network
Firewall
A firewall is a computer at the junction of two network
segments that:
• Inspects every packet that attempts to cross the boundary
• Rejects any packet that does not satisfy certain criteria, e.g.,
an incoming request to open a TCP connection
an unknown packet type
Firewalls provide security at a loss of flexibility and a cost of
system administration.
CS 5150
4
Security Techniques:
Authentication & Authorization
Authentication establishes the identity of an agent:
• What does the agent know (e.g., password)?
• What does the agent possess (e.g., smart card)?
• Where does the agent have physical access to (e.g., crt-alt-del)?
• What are the physical properties of the agent (e.g., fingerprint)?
Authorization establishes what an authenticated agent may do:
• Access control lists
• Group membership
CS 5150
5
Example: An Access Model for Digital Content
User
Roles
Actions
Digital material
Access
Attributes
CS 5150
Operations
Policies
6
Security Techniques: Encryption
Allows data to be stored and transmitted securely, even
when the bits are viewed by unauthorized agents
Encryption
Y
X
Decryption
Y
• Private key and public key
X
• Digital signatures
CS 5150
7
Security and People
People are intrinsically insecure:
• Careless (e.g., leave computers logged on, leave passwords
where others can read them)
• Dishonest (e.g., stealing from financial systems)
• Malicious (e.g., denial of service attack)
Many security problems come from inside the organization:
• In a large organization, there will be some disgruntled and
dishonest employees
• Security relies on trusted individuals. What if they are
dishonest?
CS 5150
8
Design for Security: People
• Make it easy for responsible people to use the system (e.g.,
make security procedures simple)
• Make it hard for dishonest or careless people (e.g.,
password management)
• Train people in responsible behavior
• Test the security of the system thoroughly and repeatedly,
particularly after changes
• Do not hide violations
CS 5150
9
Programming Secure Software
Programs that interface with the outside world (e.g., Web
sites) need to be written in a manner that resists intrusion.
For the top 25 programming errors, see: Common Weakness
Evaluation: A Community-Developed Dictionary of Software
Weakness Types. http://cwe.mitre.org/top25/
• Insecure Interaction Between Components
• Risky Resource Management
• Porous Defenses
Project management must ensure that programs avoid these
errors.
CS 5150
10
Programming Secure Software
The following list is from the SANS Security Institute,
Essential Skills for Secure Programmers Using
Java/JavaEE, http://www.sans.org/
• Input Handling
• Authentication & Session Management
• Access Control (Authorization)
• Java Types & JVM Management
• Application Faults & Logging
• Encryption Services
• Concurrency and Threading
• Connection Patterns
CS 5150
11
Suggested Reading
Trust in Cyberspace, Committee on Information Systems
Trustworthiness, National Research Council (1999)
http://www.nap.edu/readingroom/books/trust/
Fred Schneider, Cornell Computer Science, was the chair of
this study.
CS 5150
12
Dependable and Reliable Systems:
The Royal Majesty
From the report of the National Transportation Safety Board:
"On June 10, 1995, the Panamanian passenger ship Royal Majesty
grounded on Rose and Crown Shoal about 10 miles east of
Nantucket Island, Massachusetts, and about 17 miles from where
the watch officers thought the vessel was. The vessel, with 1,509
persons on board, was en route from St. George’s, Bermuda, to
Boston, Massachusetts."
"The Raytheon GPS unit installed on the Royal Majesty had been
designed as a standalone navigation device in the mid- to late
1980s, ...The Royal Majesty’s GPS was configured by Majesty
Cruise Line to automatically default to the Dead Reckoning mode
when satellite data were not available."
CS 5150
13
The Royal Majesty: Analysis
• The ship was steered by an autopilot that relied on position
information from the Global Positioning System (GPS).
• If the GPS could not obtain a position from satellites, it provided
an estimated position based on Dead Reckoning (distance and
direction traveled from a known point).
• The GPS failed one hour after leaving Bermuda.
• The crew failed to see the warning message on the display (or to
check the instruments).
• 34 hours and 600 miles later, the Dead Reckoning error was 17
miles.
CS 5150
14
The Royal Majesty: Software Lessons
All the software worked as specified (no bugs), but ...
• Since the GPS software had been specified, the requirements
had changed (stand alone system now part of integrated system).
• The manufacturers of the autopilot and GPS adopted different
design philosophies about the communication of mode changes.
• The autopilot was not programmed to recognize valid/invalid
status bits in message from the GPS (NMEA 0183).
• The warnings provided by the user interface were not
sufficiently conspicuous to alert the crew.
• The officers had not been properly trained on this equipment.
CS 5150
15
Key Factors for Reliable Software
• Organization culture that expects quality
• Approach to software design and implementation that hides
complexity (e.g., structured design, object-oriented
programming)
• Precise, unambiguous specification
• Use of software tools that restrict or detect errors (e.g.,
strongly typed languages, source control systems, debuggers)
•
Programming style that emphasizes simplicity, readability,
and avoidance of dangerous constructs
•
Incremental validation
CS 5150
16
Building Dependable Systems: Three Principles
For a software system to be dependable:
• Each stage of development must be done well.
• Changes should be incorporated into the structure as carefully
as the original system development.
• Testing and correction do not ensure quality, but dependable
systems are not possible without systematic testing.
CS 5150
17
Building Dependable Systems:
Organizational Culture
Good organizations create good systems:
•
Acceptance of the group's style of work (e.g., meetings,
preparation, support for juniors)
•
Visibility
•
Completion of a task before moving to the next (e.g.,
documentation, comments in code)
CS 5150
18
Building Dependable Systems:
Quality Management Processes
Assumption:
Good software is impossible without good processes
The importance of routine:
Standard terminology (requirements, specification,
design, etc.)
Software standards (coding standards, naming
conventions, etc.)
Regular builds of complete system
Internal and external documentation
Reporting procedures
CS 5150
19
Building Dependable Systems:
Quality Management Processes
When time is short...
Pay extra attention to the early stages of the process:
feasibility, requirements, design.
There will be little time to redo mistakes in the requirements.
Experience shows that taking extra time on the early stages
will usually reduce the total time to release.
CS 5150
20
Building Dependable Systems:
Specifications for the Client
Specifications are of no value if they do not meet the
client's needs
•
The client must understand and review the
requirements specification in detail
•
Appropriate members of the client's staff must review
relevant areas of the design (including operations,
training materials, system administration)
•
The acceptance tests must belong to the client
CS 5150
21
Building Dependable Systems: Changes
Feasibility study
Requirements
System design
Program design
Implementation (coding)
Changes
Testing
Acceptance & release
Operation & maintenance
CS 5150
22
Building Dependable Systems: Change
Change management:
Source code management and version control
Tracking of change requests and bug reports
Procedures for changing requirements
specifications, designs and other documentation
Regression testing
Release control
CS 5150
23
Building Dependable Systems: Complexity
The human mind can encompass only limited complexity:
• Comprehensibility
• Simplicity
• Partitioning of complexity
A simple component is easier to get right than a complex one.
CS 5150
24
Reliability Metrics
Reliability
Probability of a failure occurring in operational use.
Perceived reliability
Depends upon:
user behavior
set of inputs
pain of failure
CS 5150
25
Reliability Metrics
Traditional measures for online systems
• Mean time between failures
• Availability (up time)
• Mean time to repair
Market measures
• Complaints
• Customer retention
User perception is influenced by
• Distribution of failures
CS 5150
26
Metrics: User Perception of Reliability
1. A personal computer that crashes frequently v. a machine
that is out of service for two days.
2. A database system that crashes frequently but comes back
quickly with no loss of data v. a system that fails once in
three years but data has to be restored from backup.
3. A system that does not fail but has unpredictable periods
when it runs very slowly.
CS 5150
27
Reliability Metrics for Distributed Systems
Traditional metrics are hard to apply in multi-component
systems:
• A system that has excellent average reliability might give
terrible service to certain users.
• In a big network, at any given moment something will be giving
trouble, but very few users will see it.
• When there are many components, system administrators
rely on automatic reporting systems to identify problem areas.
CS 5150
28
Requirements Metrics for System Reliability
Example: ATM card reader
Failure class
Example
Metric (requirement)
Permanent
System fails to operate
non-corrupting with any card -- reboot
1 per 1,000 days
Transient
System can not read
non-corrupting an undamaged card
1 in 1,000 transactions
Corrupting
Never
CS 5150
A pattern of
transactions corrupts
database
29
Metrics: Cost of Improved Reliability
Time
and $
Reliability
metric
99%
100%
• Will you spend your money on new
functionality or improved reliability?
• When do you ship?
CS 5150
30
Example: Central Computing System
A central computer system (e.g., a server farm) is
vital to an entire organization. Any failure is
serious.
Step 1: Gather data on every failure
•
Many years of data in a data base
• Every failure analyzed:
hardware
software (default)
environment (e.g., power, air conditioning)
human (e.g., operator error)
CS 5150
31
Example: Central Computing System
Step 2: Analyze the data
• Weekly, monthly, and annual statistics
Number of failures and interruptions
Mean time to repair
• Graphs of trends by component, e.g.,
Failure rates of disk drives
Hardware failures after power failures
Crashes caused by software bugs in each
component
CS 5150
32
Example: Central Computing System
Step 3: Invest resources where benefit will be
maximum, e.g.,
• Priority order for software improvements
• Changed procedures for operators
• Replacement hardware
• Orderly shut down after power failure
Example. Supercomputers may average 10 hours
productive work per day.
CS 5150
33
Download