Software Life-Cycle Models

Lecture 10
Implementation
CSCI – 3350 Software Engineering II
Fall 2014
Bill Pine
Overview
• The Implementation Workflow
• Choosing a Programming Language
• Good Programming Practices
• Coding Standards
• Metrics for the Implementation Workflow
CSCI 3350
Lecture 10 - 2
Overview (cont)
• Secure Coding Background
• Buffer overflow attack
• Strategies to reduce vulnerability
• Guiding Principles for Software Security
Implementation Workflow
• Goal: Clearly and accurately represent the
detailed application design in the chosen
implementation language
– Define the unit tests
– Write the code
– Execute the unit test suite
• Resolve any discrepancies
– Submit to QA group for further evaluation
Choosing a Programming Language
• Specified directly as a requirement
• Specified indirectly as a requirement
– Platform specified
• If there is an opportunity for choice
– “Most appropriate language” requirement
– Or you may be driven by an object-oriented
design and implementation requirement
Taking a Decision
• Base upon
– Cost benefit analysis
– Risk analysis
• Use the language strength of the organization
– Procedural vs. object-oriented
• Pure object-oriented
• Hybrid
• Acquiring needed skills issue
– Hire new talent
– Retrain existing employees
– A mixture?
X- Generation Language
• First-generation languages
– Machine code
• Second-generation language
– Assembly
• Third-generation language
– High order language
• Multiple (5-10) machine instructions/source line
• Examples: FORTRAN, C, COBOL …
Fourth-Generation Language
• Application-specific language
• Original goal was 25-50 machine instructions per source line
– Database based
• PowerBuilder
• Oracle
• DB2
• Report generators
– Mathematics based
• Mathematica
• SPSS
Good Programming Practices
• Many best practices tend to be language specific
– Some authors have made a career of adapting for each
new language
– Example: Henry Ledgard - authored > 25 works
• Programming Proverbs
• Programming Proverbs and Principles
• Programming Proverbs for FORTRAN Programmers
• FORTRAN with Style: Programming proverbs
• Pascal with Style: Programming proverbs
• Pascal with Excellence: Programming proverbs
• Programming Language Landscape
• Some general practices cut across specific
languages
Best Coding Practices
• The following slides on best practices draw
heavily upon
– Clean Code – full citation in reference
• Agile Development Community is the origin of
the Best Coding Practices
• You must devote effort to writing and maintaining
quality code
– As the code deteriorates, team productivity decreases
– Decreasing productivity causes less effort to be
expended in maintaining code quality, leading to lower
productivity …
– A positive feedback loop that is inherently unstable
Best Coding Practices (cont)
Writing clean code is what you must do in
order to call yourself a professional. There
is no reasonable excuse for doing anything
less than your best.
- Robert Martin
Best Programming Practices
• Will examine guidelines relating to the
following areas
– Identifier names
– Functions
– Comments
– Formatting
Guidelines for Identifier Names
• Meaning must be obvious to the
maintenance programmer
– Maximize communications
• Use intention-revealing names
– Much harder than it seems
– Accept that the name will probably change as
you are developing
General Practices
• General Guidelines
– Variables should be nouns (noun phrases)
– Class and object names should be nouns (noun
phrases)
– Function and method names should be verbs (verb
phrases)
– Kernighan and Pike assert: “Long names for global
identifiers; short names for local identifiers”
Identifier Names (cont)
• Identifier name should answer the questions
– Why does the entity exist?
– What does the entity do?
– How is the entity used?
• If the identifier name requires a comment to
answer these questions
– The name does not reveal the identifier’s intent
and needs to be changed
Identifier Names (cont)
• Avoid Disinformation
– Use a difference in identifier only when you are making
a meaningful distinction
• Example: fetch, get, retrieve or controller, manager, driver
– Don’t use lower case L or upper case O as variable
names
– Don’t use noise words
• a, an, the - as prefixes without a convention
• info, data – as suffixes
• nameString instead of name?
Contrived (?) Example
int a = 1;
if( 1 == O1 )
a = Ol;
else
a = 01;
Identifier Names (cont)
• Use pronounceable names
• Use searchable names
– One or two letter variable names and literal
constants yield too many matches
• Avoid encoding the identifier type in the name
– In particular, eschew Hungarian notation
• No help with strongly typed languages
• Allow for misleading information if the type changes
– But the encoding doesn’t
Identifier Names (cont)
• Avoid mental mappings
– Short names / heavily abbreviated names require
the reader to translate
• Avoid cute names
• Prefer solution domain names over problem
domain names
– Who is the reading audience of your code?
• Prefer problem domain names over informal
names
Identifier Names (cont)
• Don’t add gratuitous context to identifier
names
– Add context only as necessary
• accountAddress and customerAddress may be
appropriate for instances of a class
• But not appropriate for a class name – Address would
be a better choice
Identifier Names (cont)
• Final comments
– Poor names
• Impede communications between the code author and
the code reader
• Have been shown to be an indicator of overall poor code
quality
– Indicate a less than complete understanding by the author
– Point to likely areas for code faults
Guidelines for Functions
• First rule of functions
– A function should be small
• Second rule of functions
– A function should be smaller than would be
produced by rule 1
– Try for an average of 20 lines / function
– Indent depth should be 1 or 2 levels
Functions (cont)
• Functions should do 1 thing
– They should do it well
– They should do that 1 thing only
– All steps in the function should be at the same
level of abstraction
• Principle of Least Surprise
– Based upon the function name, the code in the
function is what you would expect
Functions (cont)
• The ideal number of arguments is zero
– Niladic
• Followed by
– 1 argument – Monadic
– 2 arguments – Dyadic
– 3 arguments – Triadic
• Any more than 3 requires compelling
justification
Functions (cont)
• Why restrict the number of arguments?
– An increasing number of arguments requires
increasing conceptual power
– Harder to test and requires an increasing number of
tests
• Eschew flag arguments
– Indicate that a function is doing more than 1 thing
• Functions should have no side effects
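The advice against flag arguments can be sketched in C. This is a minimal illustration, not taken from Clean Code: the function names and the temperature formatting task are hypothetical. A single function with an `as_fahrenheit` flag would do two things, so it is split into two functions that each do one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Discouraged: the flag makes one function do two things.
 * int write_temperature(char *dst, size_t dst_size,
 *                       double celsius, int as_fahrenheit);
 */

/* Preferred: one small function per task, each with an
 * intention-revealing name. Both return what snprintf returns:
 * the number of characters the full text needs. */
int write_temperature_celsius(char *dst, size_t dst_size, double celsius)
{
    assert(dst != NULL && dst_size > 0);
    return snprintf(dst, dst_size, "%.1f C", celsius);
}

int write_temperature_fahrenheit(char *dst, size_t dst_size, double celsius)
{
    assert(dst != NULL && dst_size > 0);
    double fahrenheit = celsius * 9.0 / 5.0 + 32.0;
    return snprintf(dst, dst_size, "%.1f F", fahrenheit);
}
```

The caller now states its intent in the function name instead of passing a bare 0 or 1 that the reader has to look up.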
Functions (cont)
• Avoid output arguments
– The reader’s expectation is that an argument is an
input
– Prior to object oriented programming, one could
justify the use of output arguments
• With o-o, instead of having a function return a value
through an argument, the function should change the
state of the appropriate object
Functions (cont)
• Functions should change the state of an object
or return the state of an object – never both
• Prefer exceptions over returning error codes
• Never duplicate code (i.e. copy & paste)
– Code bloat
– Multiple places to change the code => multiple
places for faults to be injected
Guidelines for Comments
• Myth of “self-documenting” code
– Goal: The code should not need comments to
make clear the “how” of the code
– Always need internal documentation
• To meet the need of making clear the “why”
• Block comments at the beginning of each unit
• Comments interspersed (as needed) within the unit
Comments (cont)
• The previous slide notwithstanding, which
code would you rather read?
• Version 1
//
// *** Check if employee is eligible for benefits
//
if( (employee.flags & HOURLY_FLAG) && (employee.age > 55 ) )
• Version 2
if( employee.isEligibleForFullBenefits( ) )
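One way Version 2 might be backed in plain C is sketched below. The struct layout, the value of `HOURLY_FLAG`, and the name `FULL_BENEFITS_MIN_AGE` are assumptions made for illustration; the slide does not define them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for illustration; the slide does not define them. */
#define HOURLY_FLAG 0x01u
#define FULL_BENEFITS_MIN_AGE 55

struct employee {
    unsigned flags;  /* bit set of employment attributes */
    int age;         /* age in years */
};

/* The condition from Version 1, given an intention-revealing name
 * so that callers read like Version 2. */
bool is_eligible_for_full_benefits(const struct employee *e)
{
    assert(e != NULL);
    return (e->flags & HOURLY_FLAG) != 0 && e->age > FULL_BENEFITS_MIN_AGE;
}
```

The rule now lives in one place, so a change to the eligibility policy touches a single function rather than every `if` that repeats the bit test.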
Comments (cont)
• Additional thoughts on comments
– Don’t comment the obvious
– Don’t use end-of-line comments with high-order languages
– Format of the comments should reflect and
reinforce the structure of the code
– Comments must be accurate
• Agree with the code and support reading the code
Comments (cont)
– Don’t comment closing braces
– Don’t use comments as a substitute for source
code versioning systems
• Remove commented-out code from production code
• Don’t add bylines
Guidelines for Formatting
• Code formatting is important
– Format as you write the code, not as a cleanup
operation
• Remember the PARC Design Principle
• Indentation
– Source code is a hierarchy
• Use consistent indentation to reflect the hierarchy
• Don’t violate indentation – ever – not even for short
functions / methods
Formatting (cont)
• Intra-line white space
– Some freedom to improve readability if your
editor / IDE doesn’t insist upon removing
“extraneous” spaces
– Consider the following
root1 = (-b+sqrt(b*b-4*a*c))/(2*a)
– Versus
root2 = (-b - sqrt(b*b - 4*a*c))/(2*a)
Miscellaneous Practices
• Eschew literal constants for symbolic
constants
– Higher informational content
– Easier to read
– Easier to maintain
– More readily searchable
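A small C sketch of the same point follows. The policy values and names are hypothetical and chosen only to show the contrast with bare literals.

```c
#include <assert.h>

/* Hypothetical policy values, named once and reused everywhere. */
enum {
    MAX_LOGIN_ATTEMPTS = 3,
    LOCKOUT_SECONDS    = 300
};

/* Returns the lockout duration in seconds for a number of failed attempts. */
int lockout_seconds_for(int failed_attempts)
{
    assert(failed_attempts >= 0);
    /* Compare: "if (failed_attempts >= 3) return 300;" hides the intent
     * and makes a search for the policy value match every other 3. */
    return failed_attempts >= MAX_LOGIN_ATTEMPTS ? LOCKOUT_SECONDS : 0;
}
```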
Miscellaneous (cont)
• Layout
– Use the block separators consistently
• K&R
• Allman
• Whitesmith
– One statement per line
– Use parentheses to eliminate misunderstanding
• Order of precedence
– Break complex expressions into simpler ones
Miscellaneous (cont)
• Strive for clearness not cleverness
– Be concise, but not at the expense of readability
• Be aware of side effects
– Some languages have operators that
• Return a value
• Modify the internal state of an item
• Do not specify the exact order of execution
Miscellaneous (cont)
• Idioms
– Definition - an expression that has a meaning
not readily understood from the meaning of the
individual words
– A central issue in learning any language is to
absorb and use the idioms
– Example
• “Burf is a student after my own heart”
• Array idioms (code patterns)
• List walking
Coding Standards
• Purpose is to define the practices that make
the life of the development and maintenance
programmers easier
• Records, documents and clarifies the set of
best programming practices that will be
used by the
– Organization
– Team
– Project
Recall The Distinction
• Error - A discrepancy between an actual value
and an expected value
• Failure - Inability of the system to perform
according to specifications
• Fault - A condition that causes the system to fail
• If an error is observed, then a failure must have
occurred
• If a failure has occurred, then there must be a fault
in the system
Implementation Metrics
• Code complexity metrics
– Lines of code
• Assumes a constant probability that a line of code
contains a fault
• More lines of code => more faults
• A number of studies have shown a correlation
between the number of faults and the size of the
application
Implementation Metrics (cont)
– McCabe’s cyclomatic complexity M
• M = number of binary decisions + 1
• A measure of the number of branches in the code
• Recall white-box testing coverage criteria
– M can be used as a measure of the number of test cases for
branch coverage
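As a worked illustration of the formula, the hypothetical C function below contains three binary decisions, so M = 3 + 1 = 4. The function, its rates, and its parameter names are invented for this sketch.

```c
#include <assert.h>

/* Three binary decisions (marked D1..D3), so M = 3 + 1 = 4. */
int shipping_cost_cents(int weight_grams, int is_express, int is_member)
{
    assert(weight_grams >= 0);
    int cost = 500;               /* hypothetical base rate */
    if (weight_grams > 1000) {    /* D1 */
        cost += 300;
    }
    if (is_express) {             /* D2 */
        cost *= 2;
    }
    if (is_member) {              /* D3 */
        cost -= 100;
    }
    return cost;
}
```

Four test cases, a baseline with every decision false plus one case flipping each decision in turn, exercise both outcomes of every branch. Here fewer cases could also reach branch coverage, so M is best read as an upper bound on the number of tests needed.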
Implementation Metrics (cont)
• Advantages
– Almost as easy to calculate as lines of code
– Studies show a good correlation between M and number
of faults
• Disadvantages
– M correlates strongly with lines of code
– There may be little additional value over lines of code
Implementation Metrics (cont)
• Testing metrics
– Number of tests
• McCabe’s M is a good measure for number of tests for branch
coverage
– Total number of faults
• Exceeding a threshold triggers rewrite of a “chunk” of code
– Number of faults by fault type
• Use of the types of faults to generate checklists for non-execution-based testing
Origins of Bad Software
• Graff and van Wyk cite three factors
– Technical
– Psychological
– Real world
• Probably not due to
– Ignorance
– Stupidity
– Laziness
Technical Factors
• Secure software is intrinsically difficult to
write
– Complexity
• Composition
– System composed of multiple separate
components
– Each component standing alone is secure
– Combination introduces a vulnerability
Psychological Factors
• Software professionals make mistakes
• Even when examining software for vulnerabilities,
– Tend to discover only faults
• That we are looking for
• That we understand
• That we know how to fix
• Most people find it hard to
– Assume that the “other guy” is a “bad guy”
• We are too willing to trust others
– Adopt a different view of the software
Different Views of Software
• Software developers frequently employ
mental models
– When viewed only from the mental model,
potential vulnerabilities are not apparent
• The bad guy is successful in his attack by
adopting a different mental model
Mouse Attack
• An attacker was able to gain control of a
Unix system by abusing a mouse driver
• Purpose of the driver was to position the
cursor at a specified screen location
• Since the driver needed to interact with the
display hardware it was installed with high
privileges
• Driver worked error-free for years
• Until …
Mouse Attack (cont)
• An attacker directly called the driver with very large values
for the screen coordinates
– Internal memory of the driver was overwritten
– Allowing the attacker to gain control of the system
• The attacker was successful by
– Ignoring the mental model for the driver
– Concentrating on the code bytes
• Developer’s mental model did not admit the possibility for
the driver being directly called by an application program
• The ability to ignore the mental model is
– A hard skill for developers to cultivate
– An essential skill for locating vulnerabilities
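The lecture does not show the driver's code. The sketch below is a hypothetical C illustration of the missing defense: the entry point validates its coordinates itself instead of trusting that only well-behaved callers reach it. The screen size, the `framebuffer` array, and the function names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical screen size; a real driver would query the display. */
enum { SCREEN_WIDTH = 1024, SCREEN_HEIGHT = 768 };

static unsigned char framebuffer[SCREEN_HEIGHT][SCREEN_WIDTH];

/* Places a cursor marker at (x, y). Returns false, and writes nothing,
 * when the coordinates fall outside the screen. */
bool set_cursor(long x, long y)
{
    if (x < 0 || x >= SCREEN_WIDTH || y < 0 || y >= SCREEN_HEIGHT) {
        return false;   /* reject instead of indexing past the array */
    }
    framebuffer[y][x] = 1;
    return true;
}

/* Reads back the marker; callers must pass on-screen coordinates. */
unsigned char marker_at(long x, long y)
{
    assert(x >= 0 && x < SCREEN_WIDTH && y >= 0 && y < SCREEN_HEIGHT);
    return framebuffer[y][x];
}
```

The check costs a few comparisons per call and holds no matter which mental model the caller has of the driver.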
Some Different Views of the System
• An ordered set of algorithms
• Lines of text on the screen
• An ordered set of instructions for a specific
processor
• A series of bits ( 0 | 1 )
– In memory
– On magnetic disk
– On optical media
Some Different Views (cont)
• An ordered set of linked libraries, other
components
• A stream of bits along various pathways
• Executing on a host as a part of a network
• A set of vertical layers ( transport, protocol,
presentation, … )
• An ordered set of events, with critical timing
intervals
Real World Factors
• Source of essential source code
– Much was written by “amateurs”
• The architecture and design decisions for the TCP/IP network
subsystem
– Developed by Berkeley undergraduates
• Much of the code for Internet applications written
by people without any software engineering
training
– Web pages, scripts, …
• A phenomenon known as “democratization of
development”
Real World Factors (cont)
• But, the real software professionals are
responsible for most of the problems
– Even with
• Extensive training
• Awareness of the critical issues
• The best of intentions
Developing secure software is one of the
most challenging activities imaginable
Real World Factors (cont)
• Production pressures
• Just secure enough
– As little as possible, just enough to prevent loss of sales
and avoid bad publicity
– Resources spent on security mean fewer features in the
next release
– By not acknowledging security problems, don’t have to
deal with them
• Tragedy of the commons
– Garrett Hardin, 1968
• Pasture land commonly shared by many herdsmen
– Likewise the commonly shared Internet
Focus for the Rest of the Lecture
• As mentioned earlier, security issues in the
– Architecture
– Design
are of equal or perhaps greater importance
• These issues are the focus of software engineering
• For the remainder of the lecture, we will
concentrate on coding
– In particular, on buffer overflows
Buffer Overflow Background
• Buffer overflows are arguably the most common
form of attack
• First well-known buffer overflow attack
– The Internet Worm, written and released by Robert T.
Morris in 1988,
– Infected thousands of systems on the Internet
– Exploited a buffer overflow in the finger daemon
• In 1999, Brian Snow predicted that buffer
overflow attacks would still be a problem 20 years
hence
Process Memory Image
• Text = code
• Data = uninitialized and initialized data
• Heap - allocated by new
• Stack - local variables, frame
• Environment = PATH, HOME, …
[Figure: process memory layout, top to bottom: Text, Data, DLLs, Heap, Stack, Command Line Parms, Environment; arrow labeled “Increasing Address”]
Structure of the Stack Frame
• For each function call,
certain data is placed
on the stack
[Figure: stack frame, top to bottom: Function Local Variables, Return Address, Function Parameters, Caller Stack Frame; arrow labeled “Decreasing Addr”]
Local Variable Overflow
• Normally, if a local variable overflows
– The data on the stack is “clobbered”
– When the function attempts to return
• The process crashes
• If however, a “bad guy” carefully crafts the data
that overflows
– Replaces the return address with a valid address that
contains code that the “bad guy” wants to execute
• For excruciating details see
– Smashing the Stack for Fun and Profit
Preventing Overflow
• Many languages perform bounds checks on arrays
and strings to prevent overflow
• Not so C and C++
• Main offenders
– strcpy, strcat
– sprintf
– scanf
– gets
And all their sisters, and their cousins, and their aunts
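For one of the offenders listed above, `sprintf`, standard C (since C99) provides the bounded `snprintf`. The sketch below is a minimal illustration; the helper name `format_greeting` and its return convention are assumptions, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Formats "Hello, <name>!" into dst without writing past dst_size bytes.
 * Returns 0 on success. Returns -1 if the text did not fit (dst then
 * holds a truncated, terminated string) or if formatting failed. */
int format_greeting(char *dst, size_t dst_size, const char *name)
{
    assert(dst != NULL && name != NULL && dst_size > 0);
    int needed = snprintf(dst, dst_size, "Hello, %s!", name);
    if (needed < 0 || (size_t)needed >= dst_size) {
        return -1;   /* encoding error or truncated output */
    }
    return 0;
}
```

Checking the return value matters: `snprintf` prevents the overflow, but silent truncation can still be a fault if the caller assumes the full text was written.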
Preventing Overflow (cont)
• A minor improvement (for C programmers)
– Use strncpy, strncat
– But these are not without problems
• strncpy( destination, source, len );
• If source contains len or more characters,
– No terminating null is placed at the end of destination
– Better choice
• strlcpy, strlcat - available on Darwin, FreeBSD, OpenBSD
• Freeware source code versions available for download
– Heavy-duty libraries
• SafeStr
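Where `strlcpy` is not available, the termination problem with `strncpy` can be handled explicitly. A minimal sketch follows, assuming a hypothetical helper named `copy_string`; it is not a library function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies src into dst, always leaving dst null-terminated.
 * Returns 0 if the whole string fit, -1 if it was truncated. */
int copy_string(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    strncpy(dst, src, dst_size - 1);   /* copy at most dst_size - 1 characters */
    dst[dst_size - 1] = '\0';          /* strncpy does not add this when src is too long */
    return strlen(src) < dst_size ? 0 : -1;
}
```

Passing the destination size (for an array, `sizeof buf`) rather than the source length is the design choice that keeps the copy bounded by the buffer actually being written.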
Preventing Overflow (cont)
• With C++
– Whenever possible use string class
• Overflows still possible if you use [ ]
• If you need a c-style string for system call, recall a
member function exists for that purpose c_str( )
– Some better classes available e.g. Boost library
• rope class
Stack Protection by Compiler
• Some compilers use a “canary” to detect
stack overflows
– Place an unpredictable value on the stack, prior
to the return address
– Prior to using the return address, check to see if
the canary has been overwritten
• If so - abort, throw an exception, …
• StackGuard, propolice, Stack Shield, MS
/GS switch
Stack Protection by Compiler (cont)
• However, workarounds now exist
– http://www.coresecurity.com/files/files/11/StackguardPaper.pdf
– http://www.phrack.org/phrack/56/p56-0x05
Heap Smashing Attacks
• Possible in theory; difficult, but not
impossible in practice
– Attacker has to identify security critical
variables (akin to the criticality of the return
address on the stack)
• Difficult without source code
– Attacker has to find a buffer to overflow to
rewrite the critical variable
Guiding Principles for Software Security
• From Viega and McGraw
1. Secure the weakest link
2. Practice defense in depth
3. Fail securely
4. Follow the principle of least privilege
5. Compartmentalize
6. Keep it simple
7. Promote privacy
8. Remember hiding secrets is hard
9. Be reluctant to trust
10. Use your community resources
Secure the Weakest Link
• Example - physical security
– Attackers take the path of least resistance
• Approach
– List vulnerabilities by process area
– Assign weakness ranking
– Iteratively address the vulnerabilities, weakest
first
Practice Defense in Depth
• Example – perimeter defense
– Originated as military concept
• Approach (NSA)
– Identify potential adversaries, motivations and
classes of attack
– List common classes of attack
– Build the in-depth defense by the common
classes
Fail Securely
• Example – buffer overflow detected by
canary
• Approach
– Identify key checkpoint
– Explore what happens if checkpoint fails
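A minimal C sketch of a checkpoint that fails securely is given below. The permission lookup is a toy stand-in, and all names here are hypothetical. The point is only that an error at the checkpoint leads to denial rather than access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes for a permission lookup. */
enum lookup_result { LOOKUP_ALLOWED, LOOKUP_DENIED, LOOKUP_ERROR };

/* Toy stand-in for a real permission store: reports an error for a
 * missing user name and allows only names that start with 'a'. */
enum lookup_result lookup_permission(const char *user)
{
    if (user == NULL || user[0] == '\0') {
        return LOOKUP_ERROR;
    }
    return user[0] == 'a' ? LOOKUP_ALLOWED : LOOKUP_DENIED;
}

/* The checkpoint: access is granted only on an explicit LOOKUP_ALLOWED.
 * Errors and any unexpected value fall through to "deny". */
bool may_access(const char *user)
{
    switch (lookup_permission(user)) {
    case LOOKUP_ALLOWED:
        return true;
    case LOOKUP_DENIED:
    case LOOKUP_ERROR:
    default:
        return false;
    }
}
```

Exploring what happens if the checkpoint fails then reduces to one question: is there any path through `may_access` that returns true without an explicit allow?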
Follow Principle of Least Privilege
• Example – Personnel security clearance
• Approach
– Inventory privileges needed for operations
– Review and restrict to minimum privilege
necessary to carry out the assignment
Compartmentalize
• Example – Submarines are built with
sealable compartments
• Approach
– List security components
– Determine coupling between components
– Reduce couplings to the minimum needed to carry
out assignment
Keep It Simple
• Example – Only need to dial 3 digits for
emergency help
• Approach
– Reuse of code
– Introduce common chokepoints
Promote Privacy
• Example – Cookies used only with user
permission
• Approach
– Compile list of basic system components
– Identify information revealed
• User
• System / Server identification withheld
Hiding Secrets is Hard
• Example – How quickly have various
“protections” been broken, DeCSS → CSS
• Approach
– Identify “secrets” present in the system
– Identify adversaries
– Assess risk
– Address risk
Be Reluctant to Trust
• Example – Social engineering à la Kevin
Mitnick
• Approach
– Identify trust relations in system
• Individuals
• Other systems
– Log interactions with trustee
– Review log
Use Your Community Resources
• Example – Use encryption techniques that
are peer reviewed and widely used
• Approach
– Become aware of resources - NIST, SANS,
US-CERT, CERIAS, Schneier on Security, …
– Regularly monitor your resources
– Consult resource when your situation changes
CSCI 3350
Lecture 10 - 76
References
• Any of Henry Ledgard “Proverbs” series
• Robert Martin, Clean Code, Prentice Hall,
2009, ISBN 0-13-235088-2.
• Brian W. Kernighan and Rob Pike, The
Practice of Programming, Addison-Wesley,
1999, ISBN 0-201-61586-X.
• Brian Snow, Future of Security, Panel
presentation at IEEE Security and Privacy,
May 1999.
• John Viega and Gary McGraw, Building
Secure Software, Addison-Wesley, 2003.
• John Viega and Matt Messier, Secure
Programming Cookbook, O’Reilly, 2003.
• Mark Graff and Kenneth R. van Wyk,
Secure Coding, O’Reilly, 2003.
• Aleph One, Smashing the Stack for Fun and
Profit, Phrack 49,
http://phrack.org/show.php?p=49&a=14.