50.530: Software Engineering
Sun Jun
SUTD

Week 1: Introduction

ABOUT THIS COURSE

sunjun@sutd.edu.sg
level 3, room 9
Facebook: sunjunhqq
WeChat: sunjunProf

Course Communication
• All class materials are on the course website
  – Lecture slides
  – Course project
• Q&A
  – Email/WeChat/Facebook
    • sunjun@sutd.edu.sg
    • WeChat: sunjunprof
    • Facebook: sunjunhqq

Course Structure
• Cohort class
  – every Monday 10-12; every Tuesday 10-11
• Course Project: (50%)
• Problem Sets: (20%)
• Final Exam: Dec 15 (30%)

INTRODUCTION TO SOFTWARE ENGINEERING

Software Engineering
User Requirements → the magical programming machine → System Implementation
***The synthesis problem (i.e., synthesizing a program from a specification automatically) is undecidable.

Software Engineering
User Requirements → the species we call programmers → System Implementation

A Programmer’s Life

Staged Approach
• User Requirements: Are we getting the right requirements?
• System Specification: Is the specification equivalent to the requirements?
• System Design: Does the design satisfy the specification?
• System Implementation: Is the design correctly implemented?
***The verification problem (i.e., verifying whether a program satisfies a certain property) is undecidable too, but it is easier than the synthesis problem.

Requirements
• During the requirements workflow, the primary activities include
  – Listing candidate requirements
  – Understanding the system context through domain modelling and business modelling
  – Capturing functional as well as non-functional requirements
• Requirements should be captured in the language of the user.
  – Use cases help distil the essence of requirements as sets of action-response transactions between the user and the system.

Analysis
• A key theme of the analysis workflow is to understand how and where requirements interact and what that means for the system.
• Analysis also involves
  – Detecting and removing ambiguities and inconsistencies amongst requirements
  – Developing an internal view of the system
  – Identifying the analysis classes and their collaborations
• Analysis classes are preliminary placeholders of functionality.

Related Research
• Proposing formal specification languages
  – The Z language, VDM, the B language, etc.
  – CSP, CCS, etc.
• Providing facilities for programmers to write specifications
  – Java Modeling Language
So far nothing has been working.

Design
• Deciding on the collaboration between components lies at the heart of software design.
  – A component fulfils its own responsibility through the code it contains.
  – A component exchanges information by calling methods on other components, or when other components call its own methods.

Design
• The design workflow involves
  – Considering specific technologies
  – Decomposing the system into implementation units
  – Engaging in high-level and low-level designs

Implementation
• A large part of implementation is programming.
• Implementation also involves
  – Unit testing
  – Planning system integrations
  – Devising the deployment model

Testing
• The primary activities of the test workflow include
  – Creating test cases
  – Running test procedures
  – Analysing test results
• Due to its very nature, testing is never complete.

Real-World Bugs
http://en.wikipedia.org/wiki/List_of_software_bugs

Staged Approach
• User Requirements: Ad Hoc
• System Specification: Missing
• System Design: Missing
• System Implementation: BUGGY

Research Questions
• How do we help users write the specification?
• How do we help users formally document system designs?
People tried and people failed.

Research Questions
• How do we help programmers debug?
• How do we verify a given program?
The course is about debugging and verification, and many smaller related questions.

Exercise 1
• Debug the program here.
A Big View
[Figure: the space of all program behaviors, containing the region A, "the behaviors we wanted"]
The synthesis problem: How do we find a program to cover (part of) A?

A Big View
[Figure: the space of all program behaviors, with regions labelled A ("the behaviors we wanted"), B, and C ("the behaviors we have")]
The verification problem: Is C empty?

A Big View
[Figure: the same regions A, B and C]
The debugging problem: How do we find where the problem is and change the program so that C is empty?

COURSE PLANNING

Course Outline
Date           Topic
Sep 14/15      Introduction
Sep 21/22      Automatic Testing
Sep 28/29      Delta Debugging
Oct 5/6        Bug Localization
Oct 12/13      Specification Mining
Oct 19/20      Race Detection
Nov 2/3        Research Idea Presentation
Nov 9          Hoare Logic and Termination Checking
Nov 16/17      Invariant Generation
Nov 23/24      Symbolic Execution
Nov 30/Dec 1   Software Model Checking
Dec 8/9        Assume Guarantee Reasoning
Dec 19         Final Exam
Remarks: Debugging, Verification

Class Format
• I will introduce one or two approaches proposed in the literature for the topic of that week.
  – There will be in-class exercises
• We will discuss:
  – When the approaches work
  – When they do not work
  – How to make them better

Project
• Pick one of the topics covered in the following 10 classes;
• Conduct a survey of related work on that topic;
• Propose an improved approach;
• Write a research paper.

Research Paper
• Title/Abstract
  – catchy, to the point, neither too abstract nor too detailed
• Section 1: Introduction
  – Start with the motivation
  – Explain your approach intuitively at a high level
• Section 2: A Running Example
  – Use an interesting example to illustrate your approach step by step
• Section 3: Detailed Approach
  – Explain how each step of the approach is done; highlight the technical challenges and remedies
• Section 4: Evaluation
  – Show evidence of how the proposed approach would work on real-world programs
  – (Optional) Implementation of your approach
• Section 5: Related Work
  – Survey related work and make a fair comparison with the proposed approach

Real-world Examples
• For debugging,
  – http://sir.unl.edu/content/sir.php
• For verification,
  – http://sv-comp.sosy-lab.org/2015/
• For some other topics,
  – http://find-your-own.com

Project Due Dec 12

UNDERSTANDING PROGRAMMING

Programs
p(i) = o, where p is the program, i is the input and o is the output

Programs
Java Programs → Bytecode → JVM → Physical Machine

Motivational Example
The NSA has intercepted an RSA-encrypted secret message that reveals the location of a terrorist act. We believe the act is going to happen one week from now, and we need your help in decrypting the message.
Task: Write a Java program to factor a number as the product of two prime numbers.
Task Breakdown
• Requirements/Specification
  – given a semi-prime (pre-condition), your program outputs its prime factors (post-condition) within a certain time (non-functional requirement)
Correctness: pre-condition => post-condition

Task Breakdown
• Design
  – Use the trial division method
  – Read: http://en.wikipedia.org/wiki/Trial_division
  – More: http://en.wikipedia.org/wiki/Integer_factorization
• Implementation
  – “Enough talk, let’s fight” (Kung Fu Panda)

Exercise 2
Write a Java program that, given a semiprime, outputs its prime factors.
Hint: You need to use the BigInteger class.
FactorPrime.java

Task Breakdown
• Testing
  – 4294967297 (famous Fermat Number)
  – 1127451830576035879
  – 160731047637009729259688920385507056726966793490579598495689711866432421212774967029895340327
    197901756096014299132623454583177072050452755510701340673282385647899694083881316194642417451
    570483466327782135730575564856185546487053034404560063433614723836456790266457438831626375556
    854133866958349817172727462462516466898479574402841071703909138062456567624565784254101568378
    407242273207660892036869708190688033351601539401621576507964841597205952722487750670904522932
    328731530640706457382162644738538813247139315456213401586618820517823576427094125197001270350
    087878270889717445401145792231674098948416888868250143592026973853973785120217077951766546939
    577520897245392186547279572494177680291506578508962707934879124914880885500726439625033021936
    728949277390185399024276547035995915648938170415663757378637207011391538009596833354107737156
    273037494727858302028663366296943925008647348769272035532265048049709827275179381252898675965
    528510619258376779171030556482884535728812916216625430187039533668677528079544176897647303445
    153643525354817413650848544778690688201005274443717680593899
• Verification: how to show it always works?
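The trial division design named above can be sketched as follows. This is one possible illustration under stated assumptions, not a reference solution for FactorPrime.java: the class name TrialDivisionSketch and the method factor are hypothetical names, and the code relies on the pre-condition (the input really is a semi-prime) instead of checking that the cofactor is prime.

```java
import java.math.BigInteger;

// Hypothetical sketch of trial division; names are illustrative only.
final class TrialDivisionSketch {

    // Returns {p, q} with p <= q and p * q == n, where p is the smallest
    // divisor of n greater than 1. If n is a semi-prime (the pre-condition),
    // p and q are its two prime factors. The cofactor q is NOT re-checked.
    static BigInteger[] factor(BigInteger n) {
        BigInteger two = BigInteger.valueOf(2);
        if (n.compareTo(BigInteger.valueOf(4)) < 0) {
            throw new IllegalArgumentException("smallest semi-prime is 4: " + n);
        }
        // Handle the only even prime once, then try odd candidates only.
        if (!n.testBit(0)) {
            return new BigInteger[] {two, n.shiftRight(1)};
        }
        // A composite n has a divisor d with d * d <= n, so stop there.
        for (BigInteger d = BigInteger.valueOf(3);
             d.multiply(d).compareTo(n) <= 0;
             d = d.add(two)) {
            if (n.mod(d).signum() == 0) {
                return new BigInteger[] {d, n.divide(d)};
            }
        }
        // No divisor up to sqrt(n): n is prime, so the pre-condition failed.
        throw new IllegalArgumentException("not a semi-prime: " + n);
    }

    public static void main(String[] args) {
        // First test input from the slide: the Fermat number 2^32 + 1.
        BigInteger[] f = factor(new BigInteger("4294967297"));
        System.out.println(f[0] + " x " + f[1]); // prints "641 x 6700417"
    }
}
```

The loop performs roughly sqrt(n)/2 divisions in the worst case, so the first test input is factored immediately, while the largest one is far out of reach for this method. That gap is the non-functional requirement ("within a certain time") at work.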
Understanding Sequential Programs
“A program consisted of a sequence of instructions (and a memory), where each instruction executed one after the other (to modify the memory, etc.). It ran from start to finish on a single processor.”
“The sequential paradigm has the following two characteristics: the textual order of statements specifies their order of execution; successive statements must be executed without any overlap (in time) with one another.”

    int previousMax;

    public int max(int[] list) {
        int max = list[0];
        for (int i = 1; i < list.length; i++) {
            if (max < list[i]) {
                max = list[i];
            }
        }
        previousMax = max;
        return max;
    }

The Illusion

         int previousMax;
     0.  public int max (int[] list) {
     1.    int max = list[0];
     2.    for (int i = 1;
     3.         i < list.length;
     4.         i++) {
     5.      if (max < list[i]) {
     6.        max = list[i];
     7.      }
     8.    }
     9.    previousMax = max;
    10.    return max;
    11.  }

Control Flow Graph
[Figure: control flow graph of max with nodes 0 to 11, one per numbered line above, and edges labelled list = …, max = list[0], i = 1, i < list.length, i >= list.length, max < list[i], max >= list[i], max = list[i], i++, previousMax = max, return max]

System Execution
[Figure sequence: the control flow graph stepped through on input [2,4], shown next to the memory. The memory contents change as follows:]
• previousMax = 0
• input = [2,4]
• list = [2,4]
• max = 2
• i = 1
• max = 4
• i = 2

The Trace
• With input = [2,4]
  0 → 1 → 2 → 3 → 5 → 6 → 7 → 8 → 4 → 3 → 9 → 10 → 11
  where i stands for a configuration of the program with control at line i

The Trace
• With input = [4,2]
  0 → 1 → 2 → 3 → 5 → 7 → 8 → 4 → 3 → 9 → 10 → 11
  where i stands for a configuration of the program with control at line i

Sequential Programming is Easy
• It is deterministic: each input determines exactly one path through the control flow graph
[Figure: inputs input1 to input5, each with its own single path 0, 1, 2, 3, … through the control flow graph]
Testing is to find the ‘right’ input …

Concurrent Programs
p(i, sc) = o, where p is the program, i is the input, sc is the scheduling and o is the output

Concurrency: Benefit
• Better resource utilization
  – With k processors, ideally we can be k times faster, if the task can be broken into k independent pieces and if we ignore the cost of task decomposition and communication between the processors
[Figure: timeline with a single processor: Read file A; Process A; Read file B; Process B]
We can factorize the semi-prime faster with multiple computers or cores.

Concurrency: Benefit
[Figure: timeline with two processors. Processor 2: Read file A; Read file B. Processor 1: Process A; Process B]
• Can we get better performance with 1 processor only?
[Figure: timeline with a single processor: Read file A; Read file B alongside Process A; Process B]

Concurrency: Cost
• More complex design, implementation, testing and verification

    public class Holder {
        private int n;

        public Holder(int n) {
            this.n = n;
        }

        public void assertSanity() {
            if (n != n)
                throw new AssertionError("This statement is false.");
        }
    }

  Will the exception occur?
• Overhead in task decomposition, communication and context switching
• Increased resource consumption

Distributed Systems
[Figure: several nodes, each with its own CPU and memory, exchanging messages over a network]
• Each process has its own memory and processes communicate through messaging.

Multi-core Processors
[Figure: several CPUs, each with its own cache, all connected to one memory]
• Each CPU has its own cache and threads communicate through a shared memory.

Multi-core Computer: More Like This
[Figure]

Multi-Threaded Program
• Write a program such that N threads concurrently increment a static variable (initially 0) by 1.
Set N to 2 and see what the value of the variable is after all threads are done.
FirstBlood.java

Scheduling threads
[Figure: Thread1, Thread2, Thread3 and Thread4 feeding into a Scheduler]
The scheduler is ‘un-predictable’

Scheduling/Interleaving
[Figure: thread1 and thread2, each with control locations 0 to 3, and the grid of joint states 00 to 33 that an interleaving can pass through]
There are exponentially many sequences.

Is This Real?
[Figure: Thread1 and Thread2 each move from 0 to 1 by executing count++. Joint states: 00 (count = 0), 01 (count = 1), 10 (count = 1), 11 (count = 2)]
This is assuming that count++ is one step. Or is it?

Reality is Messy
Java Programs → Bytecode → JVM → Physical Machine
What are the atomic steps? What is the order of execution? What are the variable values, and where are they?

What Really Happened?
Thread1 and Thread2 each execute count++ as three steps:
• 0 → 1: read the value of count and assign it to a register
• 1 → 2: increment the register
• 2 → 3: write the register value back to count
For a non-volatile double (or long), the Java Language Specification does not even guarantee that a single read or write is atomic!

What Really Happened?
[Figure: grid of joint states 00 to 33, with edges labelled r1, i1, w1 and r2, i2, w2 for each thread's read, increment and write steps]

What Really Happened?
[Figure: the same grid with one interleaving highlighted, ending with count=1]
Is this correct?

Concurrency is Hard
• Heisenbug
  – a computer programming jargon term for a software bug that seems to disappear or alter its behavior when one attempts to study it.
• How do we find bugs in a multi-threaded program or show that there is no bug?

Course Outline
Date     Topic                        Remarks
Sep 15   Introduction
Sep 22   Automatic Testing
Sep 29   Delta Debugging
Oct 13   Bug Localization
Oct 20   Specification Mining
Nov 3    Race Detection
Nov 10   Hoare Logic and Proving
Nov 17   Invariant Generation
Nov 24   Symbolic Execution
Dec 1    Software Model Checking
Dec 8    Assume Guarantee Reasoning   Concurrency*
Dec 19   Final Exam
Project Due Dec 18
Concurrency*

Exercise 3
• Write a multi-threaded program to factor a semi-prime.
Argue that it is correct.
FactorThread.java

Reading Materials
• References:
  – “Checking a Large Routine” by Turing
  – “The Humble Programmer” by Dijkstra
  – “No Silver Bullet: Essence and Accidents of Software Engineering” by Brooks
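Returning to the "What Really Happened?" slides: the read / increment / write decomposition of count++ can be explored deterministically, without relying on a real scheduler. The sketch below is an illustration under stated assumptions, not the FirstBlood.java exercise itself. The class InterleavingSketch and its methods are hypothetical names, and the model assumes exactly two threads, three atomic steps per thread, and one private register per thread. It enumerates every interleaving and collects the possible final values of count.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of two threads each executing count++ as
// read (r), increment (i), write (w). Names are illustrative only.
final class InterleavingSketch {

    // Executes the step at program counter pc for one thread.
    // Returns {newRegister, newCount}.
    static int[] step(int pc, int reg, int count) {
        switch (pc) {
            case 0:  return new int[] {count, count};   // r: register = count
            case 1:  return new int[] {reg + 1, count}; // i: register++
            default: return new int[] {reg, reg};       // w: count = register
        }
    }

    // Depth-first enumeration of all schedules. pc1 and pc2 count the steps
    // each thread has completed (0..3). Returns the number of complete
    // schedules and records each final count value in 'finals'.
    static int explore(int count, int pc1, int r1, int pc2, int r2,
                       Set<Integer> finals) {
        if (pc1 == 3 && pc2 == 3) {
            finals.add(count);
            return 1;
        }
        int schedules = 0;
        if (pc1 < 3) { // the scheduler picks thread 1 next
            int[] next = step(pc1, r1, count);
            schedules += explore(next[1], pc1 + 1, next[0], pc2, r2, finals);
        }
        if (pc2 < 3) { // the scheduler picks thread 2 next
            int[] next = step(pc2, r2, count);
            schedules += explore(next[1], pc1, r1, pc2 + 1, next[0], finals);
        }
        return schedules;
    }

    public static void main(String[] args) {
        Set<Integer> finals = new TreeSet<>();
        int schedules = explore(0, 0, 0, 0, 0, finals);
        // prints "20 interleavings, final counts [1, 2]"
        System.out.println(schedules + " interleavings, final counts " + finals);
    }
}
```

Both 1 and 2 are reachable, which matches the highlighted path ending in count=1: when both threads read before either writes, one increment is lost. The 20 schedules are the C(6,3) ways of ordering three steps from each thread, and that number grows quickly with more threads or steps.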