Lecture 10
Implementation
CSCI 3350 Software Engineering II
Fall 2014
Bill Pine

Overview
• The Implementation Workflow
• Choosing a Programming Language
• Good Programming Practices
• Coding Standards
• Metrics for the Implementation Workflow

CSCI 3350 Lecture 10 - 2

Overview (cont)
• Secure Coding Background
• Buffer overflow attack
• Strategies to reduce vulnerability
• Guiding Principles for Software Security

Implementation Workflow
• Goal: Clearly and accurately represent the detailed application design in the chosen implementation language
  – Define the unit tests
  – Write the code
  – Execute the unit test suite
    • Resolve any discrepancies
  – Submit to QA group for further evaluation

Choosing a Programming Language
• Specified directly as a requirement
• Specified indirectly as a requirement
  – Platform specified
• If there is an opportunity for choice
  – “Most appropriate language” requirement
  – Or you may be driven by an object-oriented design and implementation requirement

Taking a Decision
• Base the decision upon
  – Cost-benefit analysis
  – Risk analysis
• Use the language strength of the organization
  – Procedural vs. object-oriented
    • Pure object-oriented
    • Hybrid
• The issue of acquiring needed skills
  – Hire new talent
  – Retrain existing employees
  – A mixture?

X-Generation Language
• First-generation language
  – Machine code
• Second-generation language
  – Assembly
• Third-generation language
  – High-order language
    • Multiple (5-10) machine instructions per source line
    • Examples: FORTRAN, C, COBOL, …

Fourth-Generation Language
• Application-specific language
• Original goal was 25-50 machine instructions per source line
  – Database based
    • PowerBuilder
    • Oracle
    • DB2
    • Report generators
  – Mathematics based
    • Mathematica
    • SPSS

Good Programming Practices
• Many best practices tend to be language specific
  – Some authors have made a career of adapting them for each new language
  – Example: Henry Ledgard, who authored more than 25 works
    • Programming Proverbs
    • Programming Proverbs and Principles
    • Programming Proverbs for FORTRAN Programmers
    • FORTRAN with Style: Programming proverbs
    • Pascal with Style: Programming proverbs
    • Pascal with Excellence: Programming proverbs
    • Programming Language Landscape
• Some general practices cut across specific languages

Best Coding Practices
• The following slides on best practices draw heavily upon
  – Clean Code (full citation in the references)
• The Agile development community is the origin of the Best Coding Practices
• You must devote effort to writing and maintaining quality code
  – As the code deteriorates, team productivity decreases
  – Decreasing productivity causes less effort to be expended on maintaining code quality, leading to still lower productivity …
  – A positive feedback loop that is inherently unstable

Best Coding Practices (cont)
Writing clean code is what you must do in order to call yourself a professional. There is no reasonable excuse for doing anything less than your best.
- Robert Martin

Best Programming Practices
• Will examine guidelines relating to the following areas
  – Identifier names
  – Functions
  – Comments
  – Formatting

Guidelines for Identifier Names
• Meaning must be obvious to the maintenance programmer
  – Maximize communication
• Use intention-revealing names
  – Much harder than it seems
  – Accept that the name will probably change as you are developing

General Practices
• General Guidelines
  – Variables should be nouns (noun phrases)
  – Class and object names should be nouns (noun phrases)
  – Function and method names should be verbs (verb phrases)
  – Kernighan and Pike assert: “Long names for global identifiers; short names for local identifiers”

Identifier Names (cont)
• Identifier name should answer the questions
  – Why does the entity exist?
  – What does the entity do?
  – How is the entity used?
• If the identifier name requires a comment to answer these questions
  – The name does not reveal the identifier’s intent and needs to be changed

Identifier Names (cont)
• Avoid Disinformation
  – Use a difference in identifier only when you are making a meaningful distinction
    • Example: fetch, get, retrieve or controller, manager, driver
  – Don’t use lower case L or upper case O as a variable name
  – Don’t use noise words
    • a, an, the - as prefixes without a convention
    • info, data - as suffixes
    • nameString instead of name?

Contrived (?)
Example
    int a = 1;
    if( 1 == O1 )
        a = Ol;
    else
        a = 01;

Identifier Names (cont)
• Use pronounceable names
• Use searchable names
  – One- or two-letter variable names and literal constants yield too many matches
• Avoid encoding the identifier type in the name
  – In particular, eschew Hungarian notation
    • No help with strongly typed languages
    • Becomes misleading if the type changes but the encoding doesn’t

Identifier Names (cont)
• Avoid mental mappings
  – Short names / heavily abbreviated names require the reader to translate
• Avoid cute names
• Prefer solution domain names over problem domain names
  – Who is the reading audience of your code?
• Prefer problem domain names over informal names

Identifier Names (cont)
• Don’t add gratuitous context to identifier names
  – Add context only as necessary
    • accountAddress and customerAddress may be appropriate for instances of a class
    • But not appropriate for a class name
      – Address would be a better choice

Identifier Names (cont)
• Final comments
  – Poor names
    • Impede communication between the code author and the code reader
    • Have been shown to be an indicator of overall poor code quality
      – Indicate a less than complete understanding by the author
      – Point to likely areas for code faults

Guidelines for Functions
• First rule of functions
  – A function should be small
• Second rule of functions
  – A function should be smaller than would be produced by rule 1
  – Try for an average of 20 lines / function
  – Indent depth should be 1 or 2 levels

Functions (cont)
• Functions should do 1 thing
  – They should do it well
  – They should do that 1 thing only
  – All steps in the function should be at the same level of abstraction
• Principle of Least Surprise
  – Based upon the function name, the code in the function is what you would expect
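
A minimal sketch of the function guidelines above, not taken from the lecture or from Clean Code. The Employee record, its field names, and the eligibility rule are all hypothetical, chosen only for illustration. Each function does one thing, stays at one level of abstraction, keeps indentation to two levels, and does what its name says.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record; the fields are assumptions made for this sketch.
struct Employee {
    std::string name;
    bool isHourly;
    int age;
};

// Does one thing: answers the eligibility question, with no side effects.
// The rule itself (hourly and older than 55) is an illustrative assumption.
bool isEligibleForFullBenefits(const Employee& employee) {
    assert(employee.age >= 0);  // precondition assumed by this sketch
    return employee.isHourly && employee.age > 55;
}

// Does one thing: counts. The eligibility rule is delegated to the
// function above, so this body stays at a single level of abstraction.
int countEligibleForFullBenefits(const std::vector<Employee>& staff) {
    int eligibleCount = 0;
    for (const Employee& employee : staff) {
        if (isEligibleForFullBenefits(employee)) {
            ++eligibleCount;
        }
    }
    return eligibleCount;
}
```

A caller reading countEligibleForFullBenefits(staff) gets what the name promises: a count and nothing else. If the rule changes, only one function is edited.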

Functions (cont)
• The ideal number of arguments is zero
  – Niladic
• Followed by
  – 1 argument – Monadic
  – 2 arguments – Dyadic
  – 3 arguments – Triadic
• Any more than 3 requires compelling justification

Functions (cont)
• Why restrict the number of arguments?
  – An increasing number of arguments requires increasing conceptual power
  – Harder to test and requires an increasing number of tests
• Eschew flag arguments
  – They indicate that a function is doing more than 1 thing
• Functions should have no side effects

Functions (cont)
• Avoid output arguments
  – The reader’s expectation is that an argument is an input
  – Prior to object-oriented programming, one could justify the use of output arguments
    • With OO, instead of having a function return a value through an argument, the function should change the state of the appropriate object

Functions (cont)
• Functions should change the state of an object or return the state of an object, never both
• Prefer exceptions over returning error codes
• Never duplicate code (i.e. copy & paste)
  – Code bloat
  – Multiple places to change the code => multiple places for faults to be injected

Guidelines for Comments
• Myth of “self-documenting” code
  – Goal: The code should not need comments to make clear the “how” of the code
  – Always need internal documentation
    • To meet the need of making clear the “why”
    • Block comments at the beginning of each unit
    • Comments interspersed (as needed) within the unit

Comments (cont)
• The previous slide notwithstanding, which code would you rather read?
• Version 1
    // *** Check if employee is eligible for benefits
    if( (employee.flags & HOURLY_FLAG) && (employee.age > 55 ) )
• Version 2
    if( employee.isEligibleForFullBenefits( ) )

Comments (cont)
• Additional thoughts on comments
  – Don’t comment the obvious
  – Don’t use end-of-line comments with high-order languages
  – Format of the comments should reflect and reinforce the structure of the code
  – Comments must be accurate
    • Agree with the code and support reading the code

Comments (cont)
  – Don’t comment closing braces
  – Don’t use comments as a substitute for source code versioning systems
    • Remove commented-out code from production code
    • Don’t add bylines

Guidelines for Formatting
• Code formatting is important
  – Format as you write the code, not as a cleanup operation
    • Remember the PARC Design Principle
• Indentation
  – Source code is a hierarchy
    • Use consistent indentation to reflect the hierarchy
    • Don’t violate indentation, ever, not even for short functions / methods

Formatting (cont)
• Intra-line white space
  – Some freedom to improve readability if your editor / IDE doesn’t insist upon removing “extraneous” spaces
  – Consider the following
      root1 = (-b+sqrt(b*b-4*a*c))/(2*a)
  – Versus
      root2 = (-b - sqrt(b*b - 4*a*c))/(2*a)

Miscellaneous Practices
• Eschew literal constants for symbolic constants
  – Higher informational content
  – Easier to read
  – Easier to maintain
  – More readily searchable

Miscellaneous (cont)
• Layout
  – Use the block separators consistently
    • K&R
    • Allman
    • Whitesmith
  – One statement per line
  – Use parentheses to eliminate misunderstanding
    • Order of precedence
  – Break complex expressions into simpler ones

Miscellaneous (cont)
• Strive for clearness not cleverness
  – Be concise, but not at the expense of readability
• Be aware of side effects
  – Some languages have
    operators that
    • Return a value
    • Modify the internal state of an item
    • Do not specify the exact order of execution

Miscellaneous (cont)
• Idioms
  – Definition: an expression that has a meaning not readily understood from the meaning of the individual words
  – A central issue in learning any language is to absorb and use the idioms
  – Example
    • “Burf is a student after my own heart”
    • Array idioms (code patterns)
    • List walking

Coding Standards
• Purpose is to define the practices that make the life of the development and maintenance programmers easier
• Record, document and clarify the set of best programming practices that will be used by the
  – Organization
  – Team
  – Project

Recall The Distinction
• Error - A discrepancy between an actual value and an expected value
• Failure - Inability of the system to perform according to specifications
• Fault - A condition that causes the system to fail
• If an error is observed, then a failure must have occurred
• If a failure has occurred, then there must be a fault in the system

Implementation Metrics
• Code complexity metrics
  – Lines of code
    • Assumes a constant probability that a line of code contains a fault
    • More lines of code => more faults
    • A number of studies have shown a correlation between the number of faults and the size of the application

Implementation Metrics (cont)
  – McCabe’s cyclomatic complexity M
    • M = number of binary decisions + 1
    • A measure of the number of branches in the code
    • Recall white-box testing coverage criteria
      – M can be used as a measure of the number of test cases for branch coverage

Implementation Metrics (cont)
• Advantages
  – Almost as easy to calculate as lines of code
  – Studies show a good correlation between M and number of faults
• Disadvantages
  – M correlates strongly with lines of code
  – There may be little additional
    value over lines of code

Implementation Metrics (cont)
• Testing metrics
  – Number of tests
    • McCabe’s M is a good measure for the number of tests for branch coverage
  – Total number of faults
    • Exceeding a threshold triggers rewrite of a “chunk” of code
  – Number of faults by fault type
    • Use the types of faults to generate checklists for non-execution-based testing

Origins of Bad Software
• Graff and van Wyk cite three factors
  – Technical
  – Psychological
  – Real world
• Probably not due to
  – Ignorance
  – Stupidity
  – Laziness

Technical Factors
• Secure software is intrinsically difficult to write
  – Complexity
• Composition
  – System composed of multiple separate components
  – Each component standing alone is secure
  – Combination introduces a vulnerability

Psychological Factors
• Software professionals make mistakes
• Even when examining software for vulnerabilities,
  – We tend to discover only faults
    • That we are looking for
    • That we understand
    • That we know how to fix
• Most people find it hard to
  – Assume that the “other guy” is a “bad guy”
    • We are too willing to trust others
  – Adopt a different view of the software

Different Views of Software
• Software developers frequently employ mental models
  – When viewed only from the mental model, potential vulnerabilities are not apparent
• The bad guy is successful in his attack by adopting a different mental model

Mouse Attack
• An attacker was able to gain control of a Unix system by abusing a mouse driver
• Purpose of the driver was to position the cursor at a specified screen location
• Since the driver needed to interact with the display hardware it was installed with high privileges
• Driver worked error-free for years
• Until …

Mouse Attack (cont)
• An attacker directly called the driver with very large values for the screen coordinates
  – Internal memory of the driver was overwritten
  – Allowing the attacker to gain control of the system
• The attacker was successful by
  – Ignoring the mental model for the driver
  – Concentrating on the code bytes
• Developer’s mental model did not admit the possibility of the driver being directly called by an application program
• The ability to ignore the mental model is
  – A hard skill for developers to cultivate
  – An essential skill for locating vulnerabilities

Some Different Views of the System
• An ordered set of algorithms
• Lines of text on the screen
• An ordered set of instructions for a specific processor
• A series of bits ( 0 | 1 )
  – In memory
  – On magnetic disk
  – On optical media

Some Different Views (cont)
• An ordered set of linked libraries, other components
• A stream of bits along various pathways
• Executing on a host as a part of a network
• A set of vertical layers ( transport, protocol, presentation, … )
• An ordered set of events, with critical timing intervals

Real World Factors
• Source of essential source code
  – Much was written by “amateurs”
• The architecture and design decisions for the TCP/IP network subsystem
  – Developed by Berkeley undergraduates
• Much of the code for Internet applications written by people without any software engineering training
  – Web pages, scripts, …
• A phenomenon known as “democratization of development”

Real World Factors (cont)
• But, the real software professionals are responsible for most of the problems
  – Even with
    • Extensive training
    • Awareness of the critical issues
    • The best of intentions
Developing secure software is one of the most challenging activities imaginable

Real World Factors (cont)
• Production pressures
• Just secure enough
  – As little as possible, just enough to prevent loss of sales and avoid bad publicity
  – Resources spent on security mean fewer features in the
    next release
  – By not acknowledging security problems, don’t have to deal with them
• Tragedy of the commons
  – Garrett Hardin, 1968
    • Pasture land commonly shared by many herdsmen
  – Likewise the commonly shared Internet

Focus for the Rest of the Lecture
• As mentioned earlier, security issues in the
  – Architecture
  – Design
  are of equal or perhaps greater importance
• These issues are the focus of software engineering
• For the remainder of the lecture, we will concentrate on coding
  – In particular, on buffer overflows

Buffer Overflow Background
• Buffer overflows are arguably the most common form of attack
• First well-known buffer overflow attack
  – The Internet Worm, written and released by Robert T. Morris in 1988
  – Infected thousands of systems on the Internet
  – Exploited a buffer overflow in the finger daemon
• In 1999, Brian Snow predicted that buffer overflow attacks would still be a problem 20 years hence

Process Memory Image
• Text = code
• Data = uninitialized and initialized data
• Heap - allocated by new
• Stack - local variables, frame
• Environment = PATH, HOME, …
• Diagram, with an arrow labeled “Increasing Address”: Text, Data, DLLs, Heap, Stack, Command Line Parms, Environment

Structure of the Stack Frame
• For each function call, certain data is placed on the stack
• Diagram, with an arrow labeled “Decreasing Addr”: Function Local Variables, Return Address, Function Parameters, Caller Stack Frame

Local Variable Overflow
• Normally, if a local variable overflows
  – The data on the stack is “clobbered”
  – When the function attempts to return
    • The process crashes
• If, however, a “bad guy” carefully crafts the data that overflows
  – The return address is replaced with a valid address that contains code that the “bad guy” wants to execute
• For excruciating details see
  – Smashing the Stack for Fun and Profit

Preventing Overflow
• Many languages perform bounds checks on arrays
  and strings to prevent overflow
• Not so C, C++
• Main offenders
  – strcpy, strcat
  – sprintf
  – scanf
  – gets
And all their sisters, and their cousins, and their aunts

Preventing Overflow (cont)
• A minor improvement (for C programmers)
  – Use strncpy, strncat
  – But these are not without problems
    • strncpy( destination, source, len );
    • If source contains len or more characters,
      – No terminating null is placed at the end of destination
  – Better choice
    • strlcpy, strlcat - available on Darwin, FreeBSD, OpenBSD
    • Freeware source code versions available for download
  – Heavy-duty libraries
    • SafeStr

Preventing Overflow (cont)
• With C++
  – Whenever possible use the string class
    • Overflows still possible if you use [ ]
    • If you need a C-style string for a system call, recall that a member function exists for that purpose: c_str( )
  – Some better classes available, e.g. Boost library
    • rope class

Stack Protection by Compiler
• Some compilers use a “canary” to detect stack overflows
  – Place an unpredictable value on the stack, prior to the return address
  – Prior to using the return address, check to see if the canary has been overwritten
    • If so - abort, throw an exception, …
• StackGuard, ProPolice, Stack Shield, MS /GS switch

Stack Protection by Compiler (cont)
• However, workarounds now exist
  – http://www.coresecurity.com/files/files/11/StackguardPaper.pdf
  – http://www.phrack.org/phrack/56/p56-0x05

Heap Smashing Attacks
• Possible in theory; difficult, but not impossible in practice
  – Attacker has to identify security-critical variables (akin to the criticality of the return address on the stack)
    • Difficult without source code
  – Attacker has to find a buffer to overflow to rewrite the critical variable

Guiding Principles for Software Security
• From Viega and McGraw
  1. Secure the weakest link
  2. Practice defense in depth
  3. Fail securely
  4. Follow the principle of least privilege
  5. Compartmentalize
  6. Keep it simple
  7. Promote privacy
  8. Remember hiding secrets is hard
  9. Be reluctant to trust
  10. Use your community resources

Secure the Weakest Link
• Example - physical security
  – Attacker takes the path of least resistance
• Approach
  – List vulnerabilities by process area
  – Assign weakness ranking
  – Iteratively address the vulnerabilities, weakest first

Practice Defense in Depth
• Example – perimeter defense
  – Originated as a military concept
• Approach (NSA)
  – Identify potential adversaries, motivations and classes of attack
  – List common classes of attack
  – Build the in-depth defense by the common classes

Fail Securely
• Example – buffer overflow detected by canary
• Approach
  – Identify key checkpoint
  – Explore what happens if checkpoint fails

Follow Principle of Least Privilege
• Example – Personnel security clearance
• Approach
  – Inventory privileges needed for operations
  – Review and restrict to minimum privilege necessary to carry out the assignment

Compartmentalize
• Example – Submarines are built with sealable compartments
• Approach
  – List security components
  – Determine coupling between components
  – Reduce couplings to the minimum needed to carry out the assignment

Keep It Simple
• Example – Only need to dial 3 digits for emergency help
• Approach
  – Reuse of code
  – Introduce common chokepoints

Promote Privacy
• Example – Cookies used only with user permission
• Approach
  – Compile list of basic system components
  – Identify information revealed
    • User
    • System / Server identification withheld

Hiding Secrets is Hard
• Example – How quickly various “protections” have been broken, e.g. CSS broken by DeCSS
• Approach
  – Identify “secrets” present in the system
  – Identify adversaries
  – Assess risk
  – Address risk

Be Reluctant to Trust
• Example – Social engineering à la Kevin Mitnick
• Approach
  – Identify trust relations in system
    • Individuals
    • Other systems
  – Log interactions with trustee
  – Review log

Use Your Community Resources
• Example – Use encryption techniques that are peer reviewed and widely used
• Approach
  – Become aware of resources - NIST, SANS, US-CERT, CERIAS, Schneier on Security, …
  – Regularly monitor your resources
  – Consult resources when your situation changes

References
• Any of Henry Ledgard’s “Proverbs” series
• Robert Martin, Clean Code, Prentice Hall, 2009, ISBN 0-13-235088-2.
• Brian W. Kernighan and Rob Pike, The Practice of Programming, Addison-Wesley, 1999, ISBN 0-201-61586-X.
• Brian Snow, Future of Security, Panel presentation at IEEE Security and Privacy, May 1999.

References (cont)
• John Viega and Gary McGraw, Building Secure Software, Addison-Wesley, 2003.
• John Viega and Matt Messier, Secure Programming Cookbook, O’Reilly, 2003.
• Mark Graff and Kenneth R. van Wyk, Secure Coding, O’Reilly, 2003.
• Aleph One, Smashing the Stack for Fun and Profit, Phrack 49, http://phrack.org/show.php?p=49&a=14.