Chapter 1: Introduction to Software Engineering

Overview
Learning objectives:
• What is software engineering?
• Why is software engineering important?

By the end of this chapter, you will:
• Understand what software engineering is.
• Understand why software engineering is important.
• Know the answers to key questions about the software engineering discipline.

Activity
Think about all the devices and systems you encounter in everyday life that have software controlling them. List as many as you can. Virtually all countries depend on complex computer-based systems.

Why is software engineering important?
Complex systems need a disciplined approach to designing, developing and managing them.

Software Development Crisis
Projects were:
• Late
• Over budget
• Unreliable
• Difficult to maintain
• Performing poorly

Software errors: the cost
Errors in computer software can have devastating effects.

Software Crisis Example 1: Computer glitch delays flights (CNN, London, Saturday 3 October 2009)
• Dozens of flights from the UK were delayed on Saturday after a glitch in an air traffic control system in Scotland; the problem was fixed a few hours later.
• The agency said it reverted to backup equipment while engineers worked on the system.
• The problem did not create a safety issue, but it did cause flight delays.
• Read more at: http://edition.cnn.com/2009/WORLD/europe/10/03/uk.flights.delayed

Software Crisis Example 2: Ariane 5 explosion
• The European Space Agency spent 10 years and $7 billion to produce Ariane 5.
• The rocket crashed 36.7 seconds after launch.
• The cause was an overflow error: trying to store a 64-bit number in a 16-bit space.
• Watch the video: http://www.youtube.com/watch?v=z-r9cYp3tTE

Software Crisis Example 3: London Ambulance Service, 1992
• Considered the largest ambulance service in the world.
• The system became overloaded: it was unable to keep track of the ambulances and their statuses.
• It sent multiple units to some locations and no units to other locations.
• It generated many exception messages.
• 46 deaths were attributed to the failure.

Therefore…
A well-disciplined approach to software development and management is necessary. This is called engineering.

Software Engineering
The term "software engineering" first appeared at the 1968 NATO Software Engineering Conference and was meant to provoke thought about what was then called the "software crisis".
"An engineering discipline that is concerned with all aspects of software production from the early stages of system specification to maintaining the system after it has gone into use." (Sommerville, p. 7)

What is Software?
Software comprises programs, documentation (both system documentation and user documentation) and data.

Types of Software
• Generic products.
  • Stand-alone systems that are marketed and sold to any customer who wishes to buy them.
  • Examples: PC software such as graphics programs and project management tools; CAD software; software for specific markets, such as appointment systems for dentists.
  • The specification of what the software should do is owned by the software developer, and decisions on software changes are made by the developer.
• Customized (bespoke) products.
  • Software that is commissioned by a specific customer to meet the customer's own needs.
  • Examples: embedded control systems, air traffic control software, traffic monitoring systems.
  • The specification of what the software should do is owned by the customer, who makes decisions on what software changes are required.

Software Engineering vs. Computer Science
"Computer science is no more about computers than astronomy is about telescopes." - Edsger Dijkstra
• Computer science: theory and fundamentals.
• Software engineering: the practicalities of software design, development and delivery.

Software Engineering vs. Systems Engineering
Systems engineering: an interdisciplinary engineering field (computer, software and process engineering).
It focuses on how complex engineering projects should be designed and managed.
• Systems engineering covers all aspects of computer-based systems development: hardware + software + process. It is older than software engineering.
• Software engineering deals with the design, development and delivery of software. It is part of systems engineering.

Frequently asked questions about software engineering

Q: What is software?
A: Computer programs and associated documentation. Software products may be developed for a particular customer or for a general market.

Q: What are the attributes of good software?
A: Good software should deliver the required functionality and performance to the user, and should be maintainable, dependable and usable.

Q: What is software engineering?
A: Software engineering is an engineering discipline that is concerned with all aspects of software production.

Q: What are the fundamental software engineering activities?
A: Software specification, software development, software validation and software evolution.

Q: What is the difference between software engineering and computer science?
A: Computer science focuses on theory and fundamentals; software engineering is concerned with the practicalities of developing and delivering useful software.

Q: What is the difference between software engineering and system engineering?
A: System engineering is concerned with all aspects of computer-based systems development, including hardware, software and process engineering. Software engineering is part of this more general process.

Q: What are the key challenges facing software engineering?
A: Coping with increasing diversity, demands for reduced delivery times, and developing trustworthy software.

Q: What are the costs of software engineering?
A: Roughly 60% of software costs are development costs and 40% are testing costs. For custom software, evolution costs often exceed development costs.

Q: What are the best software engineering techniques and methods?
A: While all software projects have to be professionally managed and developed, different techniques are appropriate for different types of system. For example, games should be developed using a series of prototypes, whereas safety-critical control systems require a complete and analyzable specification to be developed. You cannot, therefore, say that one method is better than another.

Q: What differences has the web made to software engineering?
A: The web has led to the availability of software services and the possibility of developing highly distributed service-based systems. Web-based systems development has led to important advances in programming languages and software reuse.

What is a Software Process?
The activities and results that produce a software product:
• Specification: What does the customer need? What are the constraints?
• Development: design and programming.
• Validation: checking whether the software meets its requirements.
• Evolution: modifications (e.g. for customer or market changes).

What is a Software Process Model?
A description of the software process that represents one view, such as the activities, data or roles of the people involved. Example views:
• Workflow: activities are human actions; shows inputs, outputs and dependencies.
• Dataflow: activities are transformations of information; shows how input is transformed into output.
• Role/Action: shows the role of the people involved in each step of the process.

Software Process Models
• The waterfall approach.
• Iterative development.
• Component-Based Software Engineering (CBSE): systems assembled from existing components.

The Cost of Software Engineering
It depends on the process used and on the type of software being developed; each generic approach has a different profile of cost distribution. Roughly 60% of costs are development costs, 40% are testing costs. For custom software, evolution costs often exceed development costs.
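The rough cost figures above can be turned into a back-of-the-envelope calculation. The sketch below is illustrative only; the 60/40 split comes from the text, while the $100K total is a made-up example budget:

```python
# Back-of-the-envelope split of a project budget using the rough figures
# quoted above: ~60% development, ~40% testing.
def cost_split(total_cost):
    """Return (development, testing) costs under the 60/40 rule of thumb."""
    return 0.60 * total_cost, 0.40 * total_cost

dev, test = cost_split(100_000)  # hypothetical $100K project
print(f"development: ${dev:,.0f}  testing: ${test:,.0f}")
# development: $60,000  testing: $40,000
```

For custom software the picture shifts again after delivery, since evolution costs often exceed development costs.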
Cost distribution: custom (bespoke) software development
[Chart: cost distribution, in cost units 0-100, across software development activities for each model.]
• Waterfall model: specification, design, development, integration and testing.
• Iterative development: specification, iterative development, system testing.
• Component-based software engineering: specification, development, integration and testing.
• Development and evolution costs for long-lifetime systems: system development (0-100 cost units) is followed by system evolution (100-400 cost units), i.e. evolution can cost several times the original development.

Cost distribution: generic software development
• Product development costs: specification, development, system testing.

What is CASE?
Computer-Aided Software Engineering: programs that support requirements analysis, system modeling, debugging and testing.

Attributes of good software
• Functional attributes (performance; what the system does).
• Non-functional attributes (quality; how the system does it).

Product characteristics:
• Maintainability: evolution qualities such as testability and extensibility.
• Dependability: reliability, security, safety.
• Efficiency: response time, processing time, memory utilization.
• Usability: easy for target users to learn; efficient for users to accomplish tasks; satisfying to use.

Activity
What are the key attributes for:
• An interactive game? Players, score, scenes, theme.
• A banking system? Client accounts, stocks and bonds, money transfers.
• A cardiac monitor in an ICU? Heart rate, temperature, blood pressure.

Challenges facing software engineering
• Heterogeneity: different computers, different platforms, different support systems. Software needs to cope with this variability.
• Delivery: businesses are more responsive, so supporting software needs to evolve just as rapidly. Software needs to be delivered in shorter times without compromising quality.
• Trust: software is now part of many aspects of our lives (work, study, leisure). It needs to:
Demonstrate that it can be trusted by users.

References
➢ PRESS & SUN-BULLETIN, The Binghamton Press Co., Binghamton, NY, October 1, 1999.
➢ "Software Hell: Is There a Way Out?", Business Week, December 6, 1999.
➢ IEEE Standards Collection: Software Engineering, IEEE Standard 610.12-1990, IEEE, 1993.
➢ Sommerville, Ian, Software Engineering, 9th edition, Addison-Wesley.

Introduction (SOFTWARE ENGINEERING)

Software
◼ Q: If you have to write a 10,000-line program in C to solve a problem, how long will it take?
◼ A: Answers generally range from 2-4 months.
◼ Let us analyze the productivity:
  ◼ Productivity = output / input resources.
  ◼ In software, output is measured in lines of code (LOC).
  ◼ The input resource is effort, in person-months; overhead cost is modeled in the rate per person-month.
  ◼ Though not perfect, some productivity measure is needed, as a project has to keep productivity high.
◼ This works out to a productivity of 2.5-5 KLOC per person-month.
◼ Q: What is the productivity in a typical commercial software organization?
◼ A: Between 100 and 1000 LOC per person-month.
◼ Q: Why is it so low, when your productivity is so high? (People like you work in the industry.)
◼ A: What the student builds and what the industry builds are two different things.

Software…
◼ In a university a student system is built, while a commercial organization builds industrial-strength software.
◼ What is the difference between a student program and industrial-strength software for the same problem?
◼ Software (IEEE): a collection of programs, procedures, rules, and associated documentation and data.

Student vs. industrial-strength software
◼ Student: the developer is the user. Industrial strength: others are the users.
◼ Student: bugs are tolerable. Industrial strength: bugs are not tolerated.
◼ Student: the UI is not important. Industrial strength: the UI is a very important issue.
◼ Student: no documentation. Industrial strength: documents are needed for the user as well as for the organization and the project.
◼ Student: the software is not in critical use. Industrial strength: it supports important functions or business.
◼ Student: reliability and robustness are not important. Industrial strength: reliability and robustness are very important.
◼ Student: no investment. Industrial strength: heavy investment.
◼ Student: no concern for portability. Industrial strength: portability is a key issue.

Industrial-strength software
◼ Student programs for a problem and industrial-strength software are two different things.
◼ The key difference is in quality (including usability, reliability, portability, etc.).
◼ Brooks's rule of thumb: industrial-strength software costs about 10 times more than student software.
◼ In this course, "software" means industrial-strength software. This software has some characteristics.

It is expensive
◼ Let us look at the costs involved:
  ◼ Productivity ≈ 1000 LOC per person-month.
  ◼ Cost = $3K to $15K per person-month.
  ◼ Cost per LOC = $3 to $15; i.e., each line of delivered code costs this many dollars.
◼ A simple application for a business may have 20 KLOC to 50 KLOC:
  ◼ Cost = $60K to $750K.
  ◼ It can easily run on $10K-$20K hardware.
  ◼ So hardware costs in an IT solution are small compared to software costs.

It requires a tight schedule
◼ Business requirements today demand short delivery times for software.
◼ In the past, software products have often failed to be completed in time.
◼ Along with cost, cycle time is a fundamental driving force.

Productivity: for cost and schedule
◼ An industrial-strength software project is driven by cost and schedule.
◼ Both can be modeled by productivity, measured in terms of output per unit effort (e.g. LOC per person-month).
◼ Higher productivity leads to lower cost.
◼ Higher productivity leads to lower cycle time.
◼ Hence, for projects (to deliver software), quality and productivity are the basic drivers.

Quality
◼ Along with productivity, quality is the other major driving factor.
◼ Developing high-quality software is a basic goal.
◼ Quality of software is harder to define.

Quality: the ISO standard
The ISO standard has six attributes:
◼ Functionality: the capability to provide functions which meet stated and implied needs when the software is used.
◼ Reliability: the capability to provide failure-free service.
◼ Usability: the capability to be understood, learned and used.
◼ Efficiency: the capability to provide appropriate performance relative to the amount of resources used.
◼ Maintainability: the capability to be modified for purposes of making corrections, improvements or adaptations.
◼ Portability: the capability to be adapted for different specified environments without applying actions or means other than those provided for this purpose in the product.

Quality…
◼ Multiple dimensions mean that it is not easy to reduce quality to a single number.
◼ The concept of quality is project-specific: for some projects reliability is most important; for others usability may be more important.
◼ Reliability is generally considered the main quality criterion.

Quality…
◼ Reliability is measured via the probability of failure. It is hard to measure directly, and is approximated by the number of defects in the software.
◼ To normalize: Quality = defect density = number of defects delivered / size.
◼ "Defects delivered" is approximated by the number of defects found in operation.
◼ Current practice: less than 1 defect/KLOC.
◼ What is a defect? That is project-specific!

Quality: maintainability
◼ Once software is delivered, it enters the maintenance phase, in which:
  ◼ Residual errors are fixed: corrective maintenance.
  ◼ Upgrades and environment changes are handled: adaptive maintenance.
◼ Maintenance can consume more effort than development over the life of the software (the ratio can even be 20:80!).
◼ Hence maintainability is another quality attribute of great interest.

Quality and productivity
◼ Hence, quality and productivity (Q&P) are the basic drivers in a software project.
◼ The aim of most methodologies is to deliver software with high Q&P.
◼ Besides the need to achieve high Q&P, there are some other needs.

Change
◼ The only constant in business is change!
◼ Requirements change, even while the project is in progress.
◼ In a project, up to 40% of development effort may go into implementing changes.
◼ Practices used for developing software must accommodate change.

Scale
◼ Most industrial-strength software tends to be large and complex.
◼ Methods for solving small problems often do not scale up for large problems.
◼ There are two clear dimensions in a project: engineering, and project management.
◼ For small projects, both can be done informally; for large projects, both have to be formalized.
◼ An illustration of the issue of scale: counting the number of people in a room vs. taking a census. Both are counting problems, but the methods used for the first are not useful for a census. For a large-scale counting problem, different techniques and models must be used, and management becomes critical.

Scale: examples
◼ gcc: 980 KLOC (C, C++, yacc)
◼ Perl: 320 KLOC (C, Perl, sh)
◼ Apache: 100 KLOC (C, sh)
◼ Linux: 30,000 KLOC (C, C++)
◼ Windows XP: 40,000 KLOC (C, C++)

Scale…
◼ As industrial-strength software tends to be large,
hence the methods used for building it must be able to scale up. For much of the discussion, we will take high Q&P as the basic objective.

Summary
◼ The problem domain for software engineering is industrial-strength software.
◼ Software engineering aims to provide methods for systematically developing (industrial-strength) software.
◼ Besides developing software, the goal is to achieve high quality and productivity (Q&P).
◼ The methods used must accommodate change and must be able to handle large problems.

Software Engineering: definitions
➢ "The establishment and use of sound engineering principles in order to obtain economically developed software that is reliable and works efficiently on real machines." - Fritz Bauer (1968)
➢ "A discipline whose aim is the production of quality software, software that is delivered on time, within budget, and that satisfies its requirements." - Stephen Schach

Software engineering is a systematic approach to the design, development, operation, and maintenance of a software system.

Software is more than just a program. It consists of programs, documentation of any facet of the program, and the procedures used to set up and operate the software system. Any program is a subset of software, and it becomes software only when documentation and operating-procedure manuals are prepared. A program is just a combination of source code and object code.

The software process is the way of producing software. It differs from organization to organization. It is not just about hiring smart, knowledgeable developers and buying the latest development tools; the use of effective software development processes is also needed, so that developers can systematically use the best technical and managerial practices to complete their projects successfully. There are many life cycle models and process improvement models; depending on the type of project, a suitable model is to be selected.

Software has a very special characteristic: it does not wear out. Its behavior and nature are quite different from other products of human effort. Consider a comparison between constructing a bridge and building software:
1. Bridge: the problem is well understood. Software: only some parts of the problem are understood, others are not.
2. Bridge: there are many existing bridges. Software: every software system is different and designed for special applications.
3. Bridge: the requirements typically do not change much during construction. Software: requirements typically change during all phases of development.
4. Bridge: the strength and stability can be calculated with precision. Software: it is not possible to calculate the correctness of software with existing methods.
5. Bridge: when a bridge collapses, there is a detailed investigation and report. Software: when a program fails, the reasons are often unavailable or even deliberately concealed.
6. Bridge: engineers have been constructing bridges for thousands of years. Software: developers have been writing programs for only 50 years or so.
7. Bridge: materials (wood, stone, iron, steel) and techniques (making joints in wood, carving stone, casting iron) change slowly. Software: hardware and software change rapidly.

Some other characteristics of software:
1. It does not wear out.
2. Software is not manufactured.
3. Components can be reused.
4. Software is flexible.

Software has become an integral part of most fields of human life. Name a field, and you will find software in use there:
• System software (e.g., operating systems)
• Real-time software (e.g., weather forecasting systems)
• Embedded software (e.g., control units of power plants)
• Business software (e.g., ERP)
• Personal computer software (e.g., word processors, computer graphics)
• Artificial intelligence software (e.g., expert systems, recommendation systems)
• Web-based software (e.g., CGI, HTML, Java, Perl)
• Engineering and scientific software (e.g., CAD, CAM, SPSS, MATLAB)

Some important terms:
1. Deliverables and milestones. Different deliverables are generated during software development; examples are source code, user manuals, and operating-procedure manuals. Milestones are events used to ascertain the status of the project: finalization of the specification is a milestone; completion of design documentation is another.
2. Product and process. What is delivered to the customer is called the product. The process is the way in which we produce software.
3. Productivity and effort.
4. Module and software components. A component is an independent deliverable piece of functionality providing access to its services through interfaces.
5. Generic and customized software products. Generic products are developed for anonymous customers; the target is generally the entire world. Customized products are developed for particular customers.

• Use of life cycle models: software is developed through several well-defined stages: requirements analysis and specification, design, coding, testing, etc.
• Emphasis has shifted from error correction to error prevention; modern practices emphasize detecting errors as close to their point of introduction as possible.
• In the exploratory style, errors are detected only during testing. Now the focus is on detecting as many errors as possible in each phase of development.
• In the exploratory style, coding is synonymous with program development. Now coding is considered only a small part of the program development effort.
• A lot of effort and attention is now paid to requirements specification. There is also a distinct design phase, in which standard design techniques are used.
• Periodic reviews are carried out during all stages of the development process.
• Software testing has become systematic; standard testing techniques are available.
• There is better visibility of design and code. Visibility means the production of good-quality, consistent and standard documents. In the past, very little attention was given to producing good-quality, consistent documents. We will see later that increased visibility makes software project management easier.
• Projects are properly planned: estimation, scheduling, and monitoring mechanisms are used.
• CASE tools are used.

What is a life cycle model?
• A descriptive and diagrammatic model of the software life cycle.
• It identifies all the activities undertaken during product development.
• It establishes a precedence ordering among the different activities.
• It divides the life cycle into phases.
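The precedence ordering that a life cycle model establishes can be sketched as a small data structure. This is an illustrative sketch only; the phase names follow the waterfall phases discussed later in these notes:

```python
# Illustrative sketch: a life cycle model as an ordered list of phases,
# with a check that a phase may begin only after every earlier phase
# has satisfied its exit criteria.
PHASES = ["requirements", "design", "implementation", "testing", "maintenance"]

def may_start(phase, completed):
    """A phase may start only when all preceding phases are complete."""
    predecessors = PHASES[:PHASES.index(phase)]
    return all(p in completed for p in predecessors)

print(may_start("design", {"requirements"}))   # True
print(may_start("testing", {"requirements"}))  # False: design, implementation missing
```

This is exactly the discipline the waterfall family of models imposes: each phase's entry criteria are the preceding phase's exit criteria.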
• A life cycle model helps build a common understanding of the activities among the software developers.
• It helps to identify inconsistencies, redundancies, and omissions in the development process.

When a program is developed by a single programmer:
• The problem is within the grasp of an individual.
• The programmer has the freedom to decide the exact steps and can still succeed. This is called the exploratory model, and one can use it in many ways:
  − Code → Test → Design
  − Code → Design → Test → change code
  − Specify → Code → Design → Test → etc.

When software is being designed by a team:
• There must be a precise understanding among team members as to when to do what.
• Otherwise, the result would be chaos and project failure.

A life cycle model:
• Defines entry and exit criteria for every phase.
• A phase is considered complete only when all its exit criteria are satisfied.

➢ The ultimate objective of software engineering is to produce good-quality, maintainable software within a reasonable time frame and at affordable cost.
➢ The life cycle of a software product starts at concept exploration and ends at the retirement of the software.
➢ The software life cycle typically includes a requirements phase, design phase, implementation phase, test phase, installation and checkout phase, operation and maintenance phase, and sometimes a retirement phase.
➢ A software life cycle model is a particular abstraction that represents a software life cycle. It is often called a software development life cycle (SDLC) model. A variety of life cycle models have been proposed, based on the tasks involved in developing and maintaining software.

Build-and-fix model
Sometimes a product is constructed without specifications or any attempt at design. Instead, the developer simply builds a product that is reworked as many times as necessary to satisfy the client. This is an ad hoc, not well-defined approach. Basically, it is a simple two-phase model: the first phase is to write code, and the next phase is to fix it. Fixing in this context may be error correction or the addition of further functionality. This may work well on small programming exercises 100 or 200 lines long, but it is totally unsatisfactory for software of any reasonable size: the code soon becomes unfixable and unenhanceable.

The waterfall model
The most familiar model. It has five phases, which always occur in this order and do not overlap. The developer must complete each phase before the next phase begins.

Requirements analysis and specification
➢ The main goal of this phase is to understand the exact requirements of the customer and to document them properly.
➢ The activity is executed together with the customer.
➢ The output of this phase is a large document, written in a natural language, describing what the system will do without describing how it will be done.
➢ The resulting document is known as the Software Requirements Specification (SRS) document.

Design
➢ The goal of this phase is to transform the requirements specification into a structure that is suitable for implementation in some programming language.
➢ Here the overall software architecture is defined, and the high-level and detailed design work is performed.
➢ This work is documented in the Software Design Description (SDD) document.

Implementation and testing
➢ During this phase, the design is implemented. If the SDD is complete, the implementation or coding phase proceeds smoothly.
➢ During testing, the major activities are centered on the examination and modification of the code. Initially, small modules are tested in isolation from the rest of the software product.
➢ Testing is a very important phase: effective testing contributes to the delivery of higher-quality software products, more satisfied users, lower maintenance costs, and more accurate and reliable results.
➢ It is also a very expensive activity, consuming one-third to one-half of the cost of a typical development project.
Operation and maintenance
➢ This phase begins when the software is delivered to the customer's site, installed, and made operational.
➢ It includes error correction, enhancement of capabilities, deletion of obsolete capabilities, and optimization.
➢ The purpose of this phase is to preserve the value of the software over time.

Limitations of the waterfall model
➢ It is difficult to define all requirements at the beginning of a project.
➢ The model is not suitable for accommodating any change.
➢ A working version of the system is not seen until late in the project's life.
➢ It does not scale up well to large projects.
➢ Real projects are rarely sequential.

Iterative waterfall model
The classical waterfall model is hard to use in practice. The iterative waterfall model can be thought of as incorporating the necessary changes to the classical waterfall model to make it usable in practical software development projects. It is almost the same as the classical waterfall model, except that some changes are made to increase the efficiency of software development. The iterative waterfall model provides feedback paths from every phase to its preceding phases, which is the main difference from the classical waterfall model.
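These feedback paths can be sketched as a mapping from each phase to the phases it may feed corrections back to. This is an illustrative sketch; the phase list is assumed from the phases used in these notes, and it encodes the rule that no phase feeds back into the feasibility study:

```python
# Illustrative sketch: feedback paths in the iterative waterfall model.
# Each phase may feed corrections back to any preceding phase, except
# that no phase feeds back into the feasibility study (once a project
# has been taken up, it is not given up easily).
PHASES = ["feasibility study", "requirements", "design",
          "implementation", "testing", "maintenance"]

def feedback_targets(phase):
    """Phases that `phase` may feed corrections back to."""
    preceding = PHASES[:PHASES.index(phase)]
    return [p for p in preceding if p != "feasibility study"]

print(feedback_targets("design"))   # ['requirements']
print(feedback_targets("testing"))  # ['requirements', 'design', 'implementation']
```

In the classical waterfall model, `feedback_targets` would return an empty list for every phase: there is no mechanism for error correction at all.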
But, there is no feedback path to the stage – feasibility study, because once a project has been taken, does not give up the project easily. It is good to detect errors in the same phase in which they are committed. It reduces the effort and time required to correct the errors. 5 Feedback Path: In the classical waterfall model, there are no feedback paths, so there is no mechanism for error correction. But in iterative waterfall model feedback path from one phase to its preceding phase allows correcting the errors that are committed and these changes are reflected in the later phases. Simple: Iterative waterfall model is very simple to understand and use. That’s why it is one of the most widely used software development models. 6 Difficult to incorporate change requests: The major drawback of the iterative waterfall model is that all the requirements must be clearly stated before starting of the development phase. Customer may change requirements after some time but the iterative waterfall model does not leave any scope to incorporate change requests that are made after development phase starts. Incremental delivery not supported: In the iterative waterfall model, the full software is completely developed and tested before delivery to the customer. There is no scope for any intermediate delivery. So, customers have to wait long for getting the software. 7 Overlapping of phases not supported: Iterative waterfall model assumes that one phase can start after completion of the previous phase, But in real projects, phases may overlap to reduce the effort and time needed to complete the project. Risk handling not supported: Projects may suffer from various types of risks. But, Iterative waterfall model has no mechanism for risk handling. 8 Limited customer interactions: Customer interaction occurs at the start of the project at the time of requirement gathering and at project completion at the time of software delivery. 
These fewer interactions with the customers may lead to many problems as the finally developed software may differ from the customers’ actual requirements. 9 Incremental process model is also known as Successive version model. First, a simple working system implementing only a few basic features is built and then that is delivered to the customer. Then thereafter many successive iterations/ versions are implemented and delivered to the customer until the desired system is released. 10 A, B, C are modules of Software Product that are incrementally developed and delivered. 11 Life cycle activities – Requirements of Software are first broken down into several modules that can be incrementally constructed and delivered. At any time, the plan is made just for the next increment and not for any kind of long term plans. Therefore, it is easier to modify the version as per the need of the customer. Development Team first undertakes to develop core features (these do not need services from other features) of the system. Once the core features are fully developed, then these are refined to increase levels of capabilities by adding new functions in Successive versions. Each incremental version is usually developed using an iterative waterfall model of development. As each successive version of the software is constructed and delivered, now the feedback of the Customer is to be taken and these were then incorporated in the next version. Each version of the software have more additional features over the previous ones. 12 13 Error Reduction: The core modules are used by the customer from the beginning and therefore these get tested thoroughly. This reduces chances of errors in the core modules of the final product, leading to greater reliability of the software. Incremental Resource Deployment: This model obviates the need for the customer to commit large resources at one go for development of the system. 
It also saves the developing organisation from deploying large resources and manpower for a project in one go.
Drawbacks: The model requires good planning and design; the total cost is not lower; and well-defined module interfaces are required.
The prototyping model is another popular life cycle model. It suggests building a working prototype of the system before development of the actual software. A prototype is a toy, crude implementation of a system: it has limited functional capabilities, low reliability or inefficient performance compared to the actual software. A prototype can be built very quickly by taking shortcuts, such as using inefficient, inaccurate or dummy functions.
A prototype is useful:
• To develop the graphical user interface (GUI) part of a software system. Through the prototype, users can experiment with a working user interface and suggest changes.
• When the exact technical solution is unclear to the development team. A prototype can help them to critically examine the technical issues associated with product development. Lack of familiarity with a required development technology is a technical risk; it can be resolved by developing a prototype to understand the issues and accommodate the changes in the next iteration.
The prototyping model should be used when the desired system needs a lot of interaction with its end users. Online systems and web interfaces, which involve a very high amount of end-user interaction, are best suited to the prototyping model. It might take a while to build a system that is easy to use and needs minimal training, but prototyping ensures that the end users constantly work with the system and provide feedback, which is incorporated into the prototype to produce a usable system. Prototypes are excellent for designing good human-computer interface systems.
The prototyping model of software development is shown graphically on the next slide.
The software is developed through two major activities: prototype construction and iterative waterfall based software development.
Prototype development: Prototype development starts with an initial requirements gathering phase. A quick design is carried out and a prototype is built. The developed prototype is submitted to the customer for evaluation. Based on the customer's feedback, the requirements are refined and the prototype is suitably modified. This cycle of obtaining customer feedback and modifying the prototype continues until the customer approves the prototype.
Iterative development: Once the customer approves the prototype, the actual software is developed using the iterative waterfall approach. In spite of the availability of a working prototype, the SRS document usually still needs to be developed, since it is invaluable for carrying out traceability analysis, verification and test case design during later phases.
The code for the prototype is usually thrown away. However, the experience gathered from developing the prototype helps a great deal in developing the actual software. By constructing the prototype and submitting it for user evaluation, many customer requirements get properly defined, and technical issues get resolved through experimentation with the prototype. This minimises later change requests from the customer and the associated redesign costs.
Strengths of the prototyping model:
a) It gains the customer's confidence, as developers and customers remain continuously in sync about each other's expectations.
b) It is ideal for online systems, where a high level of human-computer interaction is involved.
c) It is very flexible, as changes in requirements can be accommodated much more easily with every new review and refinement.
d) It helps both the developers and the users understand the system better.
e) Software built through prototyping needs minimal user training, as users train themselves on the prototypes from the very beginning of the project.
f) Integration requirements are very well understood, and deployment channels are decided at a very early stage.
Weaknesses of the prototyping model: The cost of development can increase in cases where the risks are very low. It may take more time to develop software using this model. The model is effective only for projects whose risks can be identified before development starts; since the prototype is developed at the start of the project, the prototyping model is ineffective against risks identified after the development phase starts.
➢ The spiral model is one of the most important SDLC models, and it provides support for risk handling.
➢ In its diagrammatic representation, it looks like a spiral with many loops.
➢ The exact number of loops of the spiral is unknown and can vary from project to project.
➢ Each loop of the spiral is called a phase of the software development process.
➢ The exact number of phases needed to develop the product can be varied by the project manager depending upon the project risks.
➢ Because the project manager dynamically determines the number of phases, the project manager plays an important role in developing a product using the spiral model.
"Risk is a future uncertain event with a probability of occurrence and a potential for loss."
➢ A risk is essentially any adverse circumstance that might hamper the successful completion of a software project.
➢ Types of risks include: schedule/time-related/delivery-related planning risks; budget/financial risks; operational/procedural risks; technical/functional/performance risks; and other unavoidable risks.
Each phase of the spiral model is divided into four quadrants, as shown in the figure above.
The functions of these four quadrants are discussed below.
Determine objectives and identify alternative solutions: Requirements are gathered from the customers, and the objectives are identified, elaborated and analysed at the start of every phase. The alternative solutions possible for the phase are then proposed in this quadrant.
Identify and resolve risks: During the second quadrant, all the possible solutions are evaluated to select the best one. The risks associated with that solution are then identified and resolved using the best possible strategy. At the end of this quadrant, a prototype is built for the best possible solution.
Develop the next version of the product: During the third quadrant, the identified features are developed and verified through testing. At the end of the third quadrant, the next version of the software is available.
Review and plan for the next phase: In the fourth quadrant, the customers evaluate the version of the software developed so far. At the end, planning for the next phase is started.
A risk is any adverse situation that might affect the successful completion of a software project. The most important feature of the spiral model is handling unknown risks after the project has started; such risk resolution is easier done by developing a prototype. The spiral model supports coping with risks by providing the scope to build a prototype at every phase of software development.
The prototyping model also supports risk handling, but there the risks must be identified completely before development work starts. In real projects, risks may surface after development starts, in which case the prototyping model cannot be used. In each phase of the spiral model, the features of the product are elaborated and analysed, and the risks at that point in time are identified and resolved through prototyping. Thus, this model is much more flexible than other SDLC models.
The spiral model is called a meta model because it subsumes all the other SDLC models. For example, a single-loop spiral actually represents the iterative waterfall model. The spiral model incorporates the stepwise approach of the classical waterfall model; it uses the approach of the prototyping model by building a prototype at the start of each phase as a risk handling technique; and it can be considered to support the evolutionary model, since the iterations along the spiral can be viewed as evolutionary levels through which the complete system is built.
Advantages:
Risk handling: For projects with many unknown risks that surface as development proceeds, the spiral model is the best development model to follow, because of the risk analysis and risk handling carried out in every phase.
Good for large projects: The spiral model is recommended for large and complex projects.
Flexibility in requirements: Change requests for the requirements at later phases can be incorporated accurately using this model.
Customer satisfaction: The customer can see the development of the product from the early phases and thus becomes habituated to the system by using it before the total product is complete.
Disadvantages:
Complex: The spiral model is much more complex than other SDLC models.
Expensive: The spiral model is not suitable for small projects, as it is expensive.
Too dependent on risk analysis: The successful completion of the project depends heavily on risk analysis. Without highly experienced expertise, a project developed using this model is likely to fail.
Difficulty in time management: As the number of phases is unknown at the start of the project, time estimation is very difficult.
The Rapid Application Development (RAD) model was first proposed by IBM in the 1980s. The critical feature of this model is the use of powerful development tools and techniques.
A software project can be implemented using this model if the project can be broken down into small modules, where each module can be assigned independently to a separate team. These modules can finally be combined to form the final product.
Development of each module involves the basic steps of the waterfall model, i.e. analysing, designing, coding and then testing, as shown in the figure. Another striking feature of this model is a short time span: the time frame for delivery (the time-box) is generally 60 to 90 days.
This model consists of four basic phases:
Requirements planning: This involves the use of various requirements elicitation techniques such as brainstorming, task analysis, form analysis, user scenarios and FAST (Facilitated Application Specification Technique). It also produces the entire structured plan describing the critical data, the methods to obtain it, and how it is processed to form the final refined model.
User description: This phase consists of taking user feedback and building the prototype using developer tools. In other words, it includes re-examination and validation of the data collected in the first phase. The dataset attributes are also identified and elucidated in this phase.
Construction: In this phase, refinement of the prototype and delivery takes place. It includes the actual use of powerful automated tools to transform the process and data models into the final working product. All required modifications and enhancements are also done in this phase.
Cutover: All the interfaces between the independent modules developed by separate teams have to be tested properly. The use of powerful automated tools and subparts makes testing easier. This is followed by acceptance testing by the user. The process involves building a rapid prototype, delivering it to the customer and taking feedback. After validation by the customer, the SRS document is developed and the design is finalised.
Advantages of RAD: The use of reusable components helps to reduce the cycle time of the project. Feedback from the customer is available at the initial stages. Costs are reduced, as fewer developers are required. The use of powerful development tools results in better-quality products in comparatively shorter time spans. The progress and development of the project can be measured through its various stages. It is easier to accommodate changing requirements due to the short iteration time spans.
Disadvantages of RAD: The use of powerful and efficient tools requires highly skilled professionals. The absence of reusable components can lead to failure of the project. The team leader must work closely with the developers and customers to close the project on time. Systems that cannot be suitably modularized cannot use this model. Customer involvement is required throughout the life cycle. It is not meant for small-scale projects, as in such cases the cost of using automated tools and techniques may exceed the entire budget of the project.
The V-model is a type of SDLC model in which the process executes sequentially, in a V-shape. It is also known as the Verification and Validation model. It is based on the association of a testing phase with each corresponding development stage: each development step is directly associated with a testing phase, and the next phase starts only after completion of the previous one, i.e. for each development activity there is a corresponding testing activity.
As shown in the figure, there are two main phases: the development phase and the validation phase. In each development phase, along with the development of a work product, the test case design and the plan for testing that work product are carried out, whereas the actual testing is carried out in the validation phase. The validation plan created during a development phase is executed in the corresponding validation phase.
In the validation phase, testing is carried out in three steps: unit, integration and system testing.
The purpose of these three steps of testing during the validation phase is to detect defects that arise in the corresponding phases of software development: requirements analysis and specification, design, and coding, respectively.
Design phases:
Requirement analysis: This phase involves detailed communication with the customer to understand their requirements and expectations. This stage is also known as requirement gathering.
System design: This phase covers the system design and the complete hardware and communication setup for developing the product.
Architectural design: The system design is broken down further into modules taking up different functionalities. The data transfer and communication between the internal modules and with the outside world (other systems) are clearly defined.
Module design: In this phase, the system is broken down into small modules. The detailed design of the modules is specified; this is also known as Low-Level Design (LLD).
Testing phases:
Unit testing: Unit test plans are developed during the module design phase. These unit test plans are executed to eliminate bugs at the code or unit level.
Integration testing: After completion of unit testing, integration testing is performed: the modules are integrated and the system is tested. Integration testing corresponds to the architectural design phase and verifies the communication of the modules among themselves.
System testing: System testing tests the complete application, including its functionality, interdependency and communication. It tests the functional and non-functional requirements of the developed application.
User acceptance testing (UAT): UAT is performed in a user environment that resembles the production environment. UAT verifies that the delivered system meets the user's requirements and is ready for use in the real world.
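As a concrete sketch of the unit-testing step above, here is a minimal example of a unit test plan executed at the code level, as the V-model prescribes. The gross_pay function and its test values are invented for this illustration, not taken from the slides.

```python
import unittest

# Hypothetical unit under test: gross pay is the sum of regular pay and
# overtime pay, and negative components are rejected.
def gross_pay(regular_pay: float, overtime_pay: float) -> float:
    if regular_pay < 0 or overtime_pay < 0:
        raise ValueError("pay components must be non-negative")
    return regular_pay + overtime_pay

# A unit test plan written during module design: each test method checks
# one behaviour of the single module under test.
class GrossPayUnitTest(unittest.TestCase):
    def test_typical_values(self):
        self.assertEqual(gross_pay(1000.0, 250.0), 1250.0)

    def test_zero_overtime(self):
        self.assertEqual(gross_pay(1000.0, 0.0), 1000.0)

    def test_rejects_negative_pay(self):
        with self.assertRaises(ValueError):
            gross_pay(-1.0, 0.0)
```

Such a file would typically be run with `python -m unittest`; each test method corresponds to one entry in the unit test plan.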
In the waterfall model, testing activities are confined to the testing phase only; in the V-model, testing activities are spread over the entire life cycle.
Advantages: Much of the testing activity is carried out in parallel with the development activities, so this model leads to a shorter testing phase and overall faster product development than the iterative model. Since test cases are designed before schedule pressure has built up, their quality is usually better. The test team is kept reasonably occupied throughout the development cycle, in contrast to the waterfall model, where the testers are active only during the testing phase. In the V-model, the test team is associated with the project from the beginning, so they build up a good understanding of the development artifacts, which in turn helps them carry out effective testing of the software.
Disadvantages: Being a derivative of the classical waterfall model, this model inherits most of the waterfall model's weaknesses: high risk and uncertainty; it is not good for complex and object-oriented projects; it is not suitable for projects where the requirements are unclear or carry a high risk of change; it does not support iteration of phases; and it does not easily handle concurrent events.
Agile is a time-bound, iterative approach to software delivery that builds software incrementally from the start of the project. The Agile model was primarily designed to help a project adapt to change requests quickly, so its main aim is to facilitate quick project completion. To accomplish this, agility is required. Agility is achieved by fitting the process to the project, removing activities that may not be essential for a specific project; anything that wastes time and effort is also avoided.
Actually, "Agile model" refers to a group of development processes.
These processes share some basic characteristics but have certain subtle differences among themselves. A few Agile SDLC models are: Crystal, Atern, feature-driven development, Scrum, extreme programming (XP), lean development, and the unified process.
To establish close contact with the customer during development and to gain a clear understanding of the various requirements, each Agile project usually includes a customer representative on the team. The Agile model relies on working software deployment rather than comprehensive documentation. Incremental versions of the software are delivered frequently to the customer representative, at intervals of a few weeks. Requirement change requests from the customer are encouraged and efficiently incorporated.
The Agile model emphasises having efficient team members, and enhancing communication among them is given high importance. It is recommended that the development team size be kept small (5 to 9 people) to help the team members engage meaningfully in face-to-face communication and maintain a collaborative work environment. Agile development processes usually deploy pair programming, in which two programmers work together at one workstation: one codes while the other reviews the code as it is typed in. The two programmers switch roles every hour or so.
As per IEEE standards, a requirement is: a condition or capability needed by a user to solve a problem or achieve an objective; or a condition or capability that must be met or possessed by a system…. to satisfy a contract, standard, specification, or other formally imposed document.
A software requirement can be of three types: functional requirements, non-functional requirements and domain requirements.
Functional requirements: These are the requirements that the end user specifically demands as basic facilities that the system should offer. All these functionalities need to be necessarily incorporated into the system as part of the contract.
Non-functional requirements: These are basically the quality constraints that the system must satisfy according to the project contract. The priority or extent to which these factors are implemented varies from one project to another. They are also called non-behavioural requirements. They deal with issues such as portability, security, maintainability, reliability, scalability, performance, reusability and flexibility. NFRs are classified into the following types: interface constraints; performance constraints (response time, security, storage space, etc.); operating constraints; life cycle constraints (maintainability, portability, etc.); and economic constraints.
Domain requirements: These are the requirements that are characteristic of a particular category or domain of projects. The basic functions that a system of a specific domain must necessarily exhibit come under this category.
A software requirements specification (SRS) is a description of a software system to be developed. It lays out the functional and non-functional requirements, and may include a set of use cases that describe the user interactions that the software must provide.
In order to fully understand a project, it is very important to come up with an SRS listing the requirements, how they are going to be met, and how the project will be completed. It helps the team save time, as they are able to comprehend how to go about the project. It also enables the team to find out about the limitations and risks early on.
An SRS establishes the basis for agreement between the client and the supplier on what the software product will do. It provides a reference for validation of the final product. A high-quality SRS is a prerequisite to high-quality software, and it reduces the development cost.
Requirements engineering involves three activities: requirement/problem analysis, requirement specification, and requirement validation.
The SRS document should describe the system (to be developed) as a black box, and should specify only the externally visible behaviour of the system. For this reason, the SRS document is also called the black-box specification of the software being developed.
Characteristics of a good SRS: 1. Correct 2. Complete 3. Unambiguous 4. Verifiable 5. Consistent 6. Ranked for importance and/or stability.
Common problems in an SRS: 1. Over-specification 2. Forward references 3. Wishful thinking 4. Noise.
An SRS specifies: 1. Functionality 2. Performance 3. Design constraints imposed on an implementation 4. External interfaces.
Use case terminology:
1. Actor: A person who uses the system being built to achieve some goal.
2. Primary actor: The main actor for whom a use case is initiated, and whose goal satisfaction is the main objective of the use case.
3. Scenario: A set of actions that are performed to achieve a goal.
4. Main success scenario: Describes the interaction if nothing fails and all steps in the scenario succeed.
5. Exceptional scenario: Describes the system behaviour if some of the steps in the main scenario do not complete successfully.
How are design ideas communicated in a team environment? If the software is large scale, employing perhaps dozens of developers over several years, it is important that all members of the development team communicate using a common language. This isn't meant to imply that they all need to be fluent in English or C++, but it does mean that they need to be able to describe their software's operation and design to another person. That is, the ideas in the head of, say, the analyst have to be conveyed to the designer in some way so that he or she can implement that idea in code. Just as mathematicians use algebra, and electronics engineers have evolved circuit notation and theory to describe their ideas, software engineers have evolved their own notation for describing the architecture and behaviour of software systems.
That notation is called UML, the Unified Modelling Language. Some might prefer the title Universal Modelling Language, since it can be used to model many things besides software.
What is UML? UML is not a language in the same way that we view programming languages such as C++, Java or Basic. It is, however, a language in the sense that it has syntax and semantics which convey meaning, understanding and constraints (i.e. what is right and wrong, and the limitations of those decisions) to the reader, and thereby allows two people fluent in that language to communicate and understand each other's intentions. UML represents a collection of 13 essentially graphical (i.e. drawing) notations, supplemented by textual descriptions, designed to capture requirements and design alternatives. You don't have to use them all; you just choose the ones that capture important information about the system you are working on.
UML is to software engineers what building plans are to an architect and electrical circuit diagrams are to an electrician.
Note: UML does not work well for small projects or projects where just a few developers are involved. If you attempt to use it in this environment, it will seem more of a burden than an aid, but then it was never intended for that. It works best for very complex systems involving dozens of developers over a long period of time, where it is impossible for one or two people to maintain the structure of the software in their heads as they develop it.
• During the software design phase, the design document (called the SDD) is produced, based on the customer requirements as documented in the SRS document.
• The design document produced at the end of the design phase should be implementable using a programming language in the subsequent (coding) phase.
The following items are designed and documented during the design phase:
• the different modules required;
• the control relationships among modules (call or invocation relationships);
• the interfaces among the different modules (the data items exchanged among them);
• the data structures of the individual modules;
• the algorithms required to implement the individual modules.
Good software designs are seldom arrived at through a single-step procedure, but through a series of steps and iterations. Depending on the order in which the various design activities are performed, they are classified into two important stages: preliminary (or high-level) design, and detailed design. The first part is the conceptual design, which tells the customer what the system will do; the second is the technical design, which allows the system builders to understand the actual hardware and software needed to solve the customer's problem.
High-level design identifies the modules, the control relationships among the modules, and the interfaces among the modules. Its outcome is the program structure, also called the software architecture.
Detailed design specifies, for each module, its data structures and algorithms. Its outcome is the module specification.
There is no unique way to design a software system. Even while using the same design methodology, different engineers can arrive at very different designs, so we need to determine which is the better design. A good design should implement all functionalities of the system correctly, be easily understandable, be efficient, and be easily amenable to change, i.e. easily maintainable.
Modularity is a fundamental attribute of any good design: the decomposition of a problem into a clean set of modules that are almost independent of each other, based on the divide and conquer principle. Modularization is the process of dividing a software system into multiple independent modules, where each module works independently. There are many advantages of modularization in software engineering.
Some of these are given below:
• The system is easy to understand.
• System maintenance is easy.
• A module can be reused many times, as required; there is no need to write it again and again.
• Coupling is the measure of the degree of interdependence between modules.
• Two modules with high coupling are strongly interconnected and thus dependent on each other.
• Two modules with low coupling are not dependent on one another.
Two modules are said to be highly coupled if either of the following two situations arises:
• the function calls between the two modules involve passing large chunks of shared data; or
• the interactions occur through some shared data.
Data coupling: If the dependency between the modules is based on the fact that they communicate by passing only data, the modules are said to be data coupled. In data coupling, the components are independent of each other and communicate through data. Module communications do not contain tramp data. Example: a customer billing system.
Stamp coupling: In stamp coupling, a complete data structure is passed from one module to another, so it involves tramp data. It may be necessary for efficiency reasons; this choice is made by the insightful designer, not the lazy programmer.
Control coupling: If the modules communicate by passing control information, they are said to be control coupled. It can be bad if the parameters indicate completely different behaviour, and good if the parameters allow factoring and reuse of functionality. Example: a sort function that takes a comparison function as an argument.
External coupling: In external coupling, the modules depend on other modules external to the software being developed, or on a particular type of hardware. Examples: a protocol, an external file, a device format, etc.
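The coupling levels above can be contrasted in a small sketch. All function and field names here are invented for illustration; only the coupling categories themselves come from the slides.

```python
# Data coupling: modules communicate by passing only the data each needs.
def compute_tax(amount: float, rate: float) -> float:
    return amount * rate

# Stamp coupling: a whole record is passed although only one field is used;
# the remaining fields travel along as "tramp data".
def billing_address(customer: dict) -> str:
    return customer["address"]  # name, phone, etc. ride along unused

# Control coupling: a flag tells the callee which behaviour to perform.
def format_amount(amount: float, as_cents: bool) -> str:
    return f"{int(amount * 100)}c" if as_cents else f"${amount:.2f}"

# Control coupling used well: passing a key function lets one sort routine
# be factored and reused, as in the slides' sort-function example.
customers = [{"name": "B", "address": "1 Low Rd"},
             {"name": "A", "address": "2 High St"}]
by_name = sorted(customers, key=lambda c: c["name"])
```

Note how the last example is the "good" case of control coupling: the parameter does not switch between unrelated behaviours, it factors one reusable behaviour.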
Common coupling: The modules share data, such as global data structures. A change to the global data means tracing back to all the modules that access that data to evaluate the effect of the change. Its disadvantages therefore include difficulty in reusing modules, reduced ability to control data accesses, and reduced maintainability.
Content coupling: In content coupling, one module can modify the data of another module, or control flow is passed from one module to the other. This is the worst form of coupling and should be avoided.
The coupling levels, from best (lowest coupling) to worst (highest coupling), are: data coupling, stamp coupling, control coupling, external coupling, common coupling, content coupling.
• Cohesion is a measure of the degree to which the elements of a module are functionally related.
• A strongly cohesive module implements functionality that is related to one feature of the solution and requires little or no interaction with other modules.
• Cohesion may be viewed as the glue that keeps the module together.
The cohesion levels, from best (highest cohesion) to worst (lowest cohesion), are: functional cohesion, sequential cohesion, communicational cohesion, procedural cohesion, temporal cohesion, logical cohesion, coincidental cohesion.
Functional cohesion: Every element essential for a single computation is contained in the component. A functionally cohesive module performs a single task or function. It is the ideal situation.
Sequential cohesion: One element outputs some data that becomes the input for another element, i.e. data flows between the parts. It occurs naturally in functional programming languages.
Communicational cohesion: Two elements operate on the same input data or contribute towards the same output data. Example: update a record in the database and send it to the printer.
Procedural cohesion: Elements of procedural cohesion ensure the order of execution; the actions are still weakly connected and unlikely to be reusable. Example: calculate student GPA, print student record, calculate cumulative GPA, print cumulative GPA.
Thus, procedural cohesion occurs in modules whose instructions, although they accomplish different tasks, have been combined because there is a specific order in which the tasks are to be completed.
Temporal cohesion: The elements are related by timing: in a module with temporal cohesion, all the tasks must be executed in the same time-span. This kind of cohesion is typical of code that initialises all the parts of a system, where lots of different activities occur, all at init time.
Logical cohesion: The elements are logically related, not functionally related. Suppose X and Y are two different operations carried out in the same module, and both perform logically similar operations. Example: more than one data item in an input transaction may be a date, and separate code would be written to check that each such date is valid. A better way is to construct a DATECHECK module and call it whenever a date check is necessary.
Coincidental cohesion: The elements are unrelated; they have no conceptual relationship other than their location in the source code. It is accidental and the worst form of cohesion. Example: "check validity and print" as a single component with two parts. Coincidental cohesion is to be avoided as far as possible.
Software design approaches:
• Function-oriented design: top-down decomposition; centralised system state.
• Object-oriented design: data abstraction; data structures; data types.
Function-oriented design techniques were proposed nearly four decades ago, yet they are still very popular and are currently used in many software development projects. These techniques view a system as a black box that provides a set of services to the users of the software. These services (e.g. issue book, search book, etc., for a library automation software) are also known as the high-level functions supported by the software. During the design process, these high-level functions are successively decomposed into more detailed functions.
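The DATECHECK idea above lends itself to a short sketch of functional cohesion. Only the module's single-purpose role comes from the slides; the implementation details are assumed.

```python
from datetime import date

# Functional cohesion: DATECHECK does exactly one job - validating a date.
# Every transaction field that needs a date check calls this one module
# instead of repeating its own ad hoc validation code.
def datecheck(day: int, month: int, year: int) -> bool:
    try:
        date(year, month, day)  # raises ValueError for an invalid date
        return True
    except ValueError:
        return False
```

For instance, datecheck(29, 2, 2024) accepts a leap-day date, while datecheck(31, 4, 2024) rejects it, since April has only 30 days.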
• The term top-down decomposition denotes this successive decomposition of a set of high-level functions into more detailed functions.
• After top-down decomposition has been carried out, the identified functions are mapped to modules and a module structure is created.
• This methodology is called the structured analysis/structured design (SA/SD) methodology. During structured analysis, the SRS document is transformed into a Data Flow Diagram (DFD) model; during structured design, the DFD model is transformed into a structure chart.
• During structured analysis, functional decomposition of the system is achieved: each function is analysed and hierarchically decomposed into more detailed functions.
• During structured design, all functions analysed during structured analysis are mapped to a module structure. This module structure is also called the high-level design, or the software architecture, for the given problem, and is represented using a structure chart.
The DFD is also known as a bubble chart. It is a graphical representation of a system in terms of the input data to the system, the various processing steps carried out on those data, and the output data generated by the system. Starting with the set of high-level functions that a system performs, a DFD model represents the sub-functions performed by those functions using a hierarchy of diagrams.
If two bubbles are directly connected by a data flow arrow, they are synchronous: they operate at the same speed. If two bubbles are connected through a data store, the speeds of operation of the bubbles are independent.
A data dictionary lists all data items that appear in a DFD model, including all data flows and the contents of all data stores appearing on all the DFDs. A single data dictionary should capture all the data appearing in all the DFDs constituting the DFD model of a system.
For example, a data dictionary entry may record that the data item grosspay consists of the components regularpay and overtimepay:
grosspay = regularpay + overtimepay
For the smallest units of data, the data dictionary simply lists their name and type. Composite data items can be defined in terms of primitive data items using the following data definition operators:
• + : composition of two data items; a+b represents data a and b together.
• [ , ] : selection; [a,b] represents either a or b.
• ( ) : optional data, which may or may not appear; a+(b) represents either a alone or a+b.
• { } : iterative data definition; {name}5 represents five instances of name, and {name}* represents zero or more instances of name.
• = : equivalence; a = b+c means that a is a composite data item comprising both b and c.
• /* */ : anything appearing between /* and */ is a comment.
A DFD model graphically represents how each input datum is transformed into its corresponding output data through a hierarchy of DFDs. The top-level DFD is called the level 0 DFD or context diagram. It is the most abstract representation of the system: it represents the entire system as a single bubble, with more and more detail gradually introduced at each successive lower level. The bubble in the context diagram is annotated with the name of the software system being developed (usually a noun) — the only bubble in a DFD model named with a noun. Bubbles at all other levels are annotated with verbs describing the main function performed by the bubble.
To construct the model, examine the SRS document to determine:
• the different high-level functions that the system needs to perform;
• the data input to every high-level function;
• the data output from every high-level function;
• the interactions (data flow) among the identified high-level functions.
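The composition operator above can be illustrated with a small sketch. Here a data dictionary is modelled as a mapping from each composite item to its components, so that an item such as grosspay can be expanded into its primitive constituents. This representation is a hypothetical illustration, not part of SA/SD notation itself.

```python
# Data dictionary: composite item -> list of component items.
data_dictionary = {
    "grosspay": ["regularpay", "overtimepay"],  # grosspay = regularpay + overtimepay
    "payrecord": ["employeeid", "grosspay"],    # composite items may nest
}

def expand(item):
    """Recursively expand a data item into its primitive components."""
    if item not in data_dictionary:
        return [item]  # primitive item: just its own name
    parts = []
    for component in data_dictionary[item]:
        parts.extend(expand(component))
    return parts

print(expand("payrecord"))  # ['employeeid', 'regularpay', 'overtimepay']
```

Tools built on SA/SD use exactly this kind of lookup to check that every data flow in a DFD is defined somewhere in the dictionary.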
A level 1 DFD usually contains three to seven bubbles; that is, the system is represented as performing three to seven important operations. If the system has more than seven high-level functional requirements, some related requirements have to be combined; if it has fewer than three, some requirements need to be split.
Examine the high-level functions described in the SRS document. If there are three to seven, represent each high-level function as a bubble; if there would be more than seven bubbles, combine some; if fewer than three, split some.
Decompose each high-level function into its constituent subfunctions through the following activities:
• Identify the different subfunctions of the high-level function.
• Identify the data input to each of these subfunctions.
• Identify the data output from each of these subfunctions.
• Identify the interactions (data flow) among these subfunctions.
Common mistakes when constructing a DFD model:
• Drawing more than one bubble in the context diagram.
• Drawing too many or too few bubbles in a DFD.
• Unbalanced DFDs.
• Attempting to represent control information in a DFD.
• Showing external entities at all levels of the DFDs. All external entities interacting with the system should be represented only in the context diagram; they should not appear in DFDs at any other level.
• A DFD should not represent the conditions under which different functions are invoked. If a bubble A invokes either bubble B or bubble C depending on some condition, we represent only the data that flows between A and B or between A and C, not the condition on which the invocation depends.
• A data flow arrow should not connect two data stores, or a data store with an external entity.
Checking a DFD model:
• All the functionalities of the system must be captured by the DFD model; no function specified in the SRS document should be overlooked.
• Only those functions specified in the SRS document should be represented. The designer should not assume functionality not specified by the SRS document and then try to represent it in the DFD.
• The data and function names must be intuitive. Some students, and even practising developers, use meaningless symbolic data names such as a, b, c; such names hinder understanding of the DFD model.
• Avoid too many data flow arrows: a DFD becomes difficult to understand if any bubble is associated with more than seven data flows.
Example problem: A supermarket needs software to automate a scheme it plans to introduce to encourage regular customers. In this scheme, a customer must first register by supplying his/her residence address, telephone number, and driving licence number. Each customer who registers is assigned a unique customer number (CN) by the computer. A customer can present the CN to the checkout staff when making a purchase, in which case the value of the purchase is credited against the CN. At the end of each year, the supermarket intends to award surprise gifts to the 10 customers who make the highest total purchases over the year, and a 22-carat gold coin to every customer whose total purchases exceed Rs. 10,000. The entries against each CN are reset on the last day of every year, after the prize-winners' lists are generated.
Shortcomings of the DFD model:
• DFDs are imprecise: the function performed by a bubble has to be judged from its name, which leaves ample scope for misinterpretation.
• Control aspects are not defined by a DFD.
• The decomposition into successive levels is subjective.
• The technique gives little guidance for detecting an improper data flow diagram.
Extending DFDs to real-time systems: one widely accepted technique is the Ward and Mellor notation, in which a type of process that handles only control flows is introduced.
• These processes representing control processing are denoted using dashed bubbles.
A structure chart represents the software architecture: the various modules making up the system, the module dependencies, and the parameters passed among the different modules. Its notation:
• Rectangular boxes: represent modules.
• Module invocation arrows: an arrow connecting two modules indicates that during program execution, control is passed from one module to the other in the direction of the arrow.
• Data flow arrows: small arrows appearing alongside the module invocation arrows, annotated with the data passed.
• Library modules: represented by a rectangle with double edges; libraries comprise the frequently called modules.
• Selection: a diamond symbol indicates that one of the several modules connected to it is invoked, depending on the outcome of the condition attached to the diamond.
• Repetition: a loop around the control flow arrows denotes that the respective modules are invoked repeatedly.
Properties of a structure chart:
• There is only one module at the top, called the root.
• There should be at most one control relationship between any two modules: if module A invokes module B, module B cannot invoke module A.
• Lower-level modules should be unaware of the existence of higher-level modules.
Structure chart vs. flow chart:
• It is usually difficult to identify the different modules of a program from its flow chart representation.
• Data interchange among different modules is not represented in a flow chart.
• The sequential ordering of tasks that is inherent to a flow chart is suppressed in a structure chart.
Two strategies for transforming a DFD into a structure chart:
• Transform analysis
• Transaction analysis
Normally, one starts with the level 1 DFD, transforms it into a module representation using either transform or transaction analysis, and then proceeds towards the lower-level DFDs.
• If all the data flowing into the diagram are processed in similar ways (i.e., all the input data flow arrows are incident on the same bubble in the DFD), transform analysis is applicable; otherwise, transaction analysis is applicable.
• Transform analysis is normally applicable at the lower levels of a DFD model.
• The first step in transform analysis is to divide the DFD into three parts:
1. Input
2. Processing
3. Output
The input portion of the DFD includes processes that transform input data from physical form (e.g., characters from a terminal) to logical form (e.g., internal tables, lists, etc.); each input portion is called an afferent branch. The output portion transforms output data from logical form to physical form; each output portion is called an efferent branch. The remaining portion of the DFD is called the central transform.
• In the next step of transform analysis, the structure chart is derived by drawing one functional component each for the central transform and for the afferent and efferent branches; these are drawn below a root module that invokes them.
• Identifying the input and output parts requires experience and skill. One possible approach is to trace the input data until a bubble is found whose output data cannot be deduced from its inputs alone.
• Processes that validate input are not central transforms; processes that sort the input or filter data from it are.
• The first level of the structure chart is produced by representing each input and output unit as a box and the central transform as a single box.
• In the third step of transform analysis, the structure chart is refined by adding the subfunctions required by each high-level functional component; many levels of functional components may be added.
• This process of breaking functional components into subcomponents is called factoring.
• Factoring includes adding read and write modules, error-handling modules, initialisation and termination processes, identifying consumer modules, etc.
• The factoring process is continued until all bubbles in the DFD are represented in the structure chart.
By observing the level 1 DFD of the RMS (root mean square) software, we can identify validate-input as the afferent branch, write-output as the efferent branch, and the remaining bubble (compute-rms) as the central transform.
• It is well known that mastering an object-oriented programming language such as Java or C++ rarely equips one with the skills necessary to develop good-quality object-oriented software; it is important to learn object-oriented design skills well.
• Once a good design has been arrived at, it is easy to code it using an object-oriented language.
• It has even become possible to automatically generate much of the code from the design by using a CASE tool.
• In order to arrive at a satisfactory object-oriented design (OOD) solution to a problem, it is necessary to create several types of models.
• What has modelling got to do with design? A model is constructed by focusing on only a few aspects of the problem and ignoring the rest. A model of a given problem is called its analysis model; a model of the solution (code) is called the design model. The design model is usually obtained by carrying out iterative refinements of the analysis model using a design methodology.
In the context of model construction, we need to carefully understand the distinction between a modelling language and a design process.
• Modelling language: a set of notations with which design and analysis models are documented.
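The RMS example above can be sketched as code. Under transform analysis, the root module invokes one module per branch: the afferent validate-input, the central transform compute-rms, and the efferent write-output. This is a minimal sketch that assumes the RMS program computes the root mean square of a set of numbers; the output format is invented for illustration.

```python
import math

def validate_input(raw_values):
    """Afferent branch: convert physical input (strings) to logical form (floats)."""
    return [float(v) for v in raw_values]

def compute_rms(values):
    """Central transform: the essential computation."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def write_output(rms):
    """Efferent branch: convert the logical result back to physical form."""
    return f"rms = {rms:.2f}"

def root(raw_values):
    """Root module of the structure chart: invokes the three branches in turn."""
    return write_output(compute_rms(validate_input(raw_values)))

print(root(["3", "4"]))  # sqrt((9 + 16) / 2) ≈ 3.54
```

Each function corresponds to one box in the first-level structure chart; factoring would then split these further into read, compute, and error-handling submodules.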
• Design process: a design process addresses the following issue: "Given a problem description, how do we systematically work out the design solution to the problem?"
Advantages of OOD:
• Code and design reuse.
• Increased productivity.
• Ease of testing and maintenance.
• Better code and design understandability, which is especially important in the development of large programs.
Of all these advantages, it is usually agreed that the chief advantage of OOD is improved productivity, which comes from the following factors:
• Code reuse through easy use of pre-developed class libraries.
• Code reuse through inheritance.
• Simpler and more intuitive abstraction, which supports better management of inherent problem and code complexity.
• Better problem decomposition.
Disadvantages of OOD:
• The principles of abstraction, data hiding, inheritance, etc. incur run-time overhead due to the additional code generated on account of these features. This causes an object-oriented program to run a little slower than an equivalent procedural program.
• Data that is centralised in a procedural implementation gets scattered across various objects in an object-oriented implementation. For example, in a procedural program a set of books may be implemented as an array, so the book data is stored at consecutive memory locations; in an object-oriented program, the data for a collection of book objects may not be stored consecutively. The spatial locality of data therefore becomes weak, leading to higher cache miss ratios and larger memory access times, which finally show up as increased program run time.
• UML is a language for documenting models.
• Just like any other language, UML has its own syntax (a set of basic symbols and sentence formation rules) and semantics (meanings of basic symbols and sentences).
• It also provides a set of basic graphical notations (e.g., rectangles, lines, ellipses) that can be combined in certain ways to document design and analysis results.
• UML is not a system design or development methodology by itself, nor is it tied to any specific methodology.
• Before the advent of UML, every design methodology not only prescribed its own unique set of design steps but was also tied to a specific design modelling language. For example, the OMT methodology had its own design steps and its own unique set of notations; so did Booch's methodology, and so on.
• This situation made it hard for someone familiar with one methodology to understand and reuse design solutions from a project that used a different methodology.
• One of the objectives of the developers of UML was to keep the notations of UML independent of any specific design methodology, so that it can be used along with any design methodology. In this respect, UML differs from its predecessors.
UML views:
User's view: defines the functionalities made available by the system to its users.
Structural view: defines the structure of the problem (or the solution) in terms of the kinds of objects (classes) important to understanding the working of the system and its implementation. It also captures the relationships among the classes (objects).
Behavioural view: captures how objects interact with each other over time to realise the system behaviour. The system behaviour captures the time-dependent (dynamic) behaviour of the system, and therefore constitutes the dynamic model of the system.
Implementation view: captures the important components of the system and their interdependencies.
For example, the implementation view might show the GUI part, the middleware, and the database part as different components and capture their interdependencies.
Environmental view: models how the different components are deployed on different pieces of hardware.
For any given problem, should one construct all the views using all the diagrams provided by UML? No. For a simple system, the use case model, the class diagram, and one of the interaction diagrams may be sufficient. For a system in which objects undergo many state changes, a state chart diagram may be necessary; for a system implemented on a large number of hardware components, a deployment diagram may be necessary. So the types of models to be constructed depend on the problem at hand. "Just as you do not use all the words listed in the dictionary while writing prose, you normally do not use all the UML diagrams and modelling elements while modelling a system."
➢ A use case is one way of representing system functionality. A use case refers to:
• a system's behaviour (functionality);
• a set of activities that produce some output.
Intuitively, the use cases represent the different ways in which a system can be used by its users.
➢ A simple way to find all the use cases of a system is to ask: "What can the different categories of users achieve by using the system?"
➢ When we pose this question for the library information system (LIS), the use cases can be identified as:
• issue-book
• query-book
• return-book
• create-member
• add-book, etc.
➢ A use case is a term in UML: it represents a high-level functional requirement.
➢ Use case representation is more well-defined and has agreed documentation, compared to a high-level functional requirement and its documentation; therefore many organisations document their functional requirements in terms of use cases.
Use case diagrams:
▪ A simple but very effective model used during the analysis phase for analysing requirements through the process of exploring user interactions with the system.
▪ The process involves documenting who initiates an interaction, what information goes into the system, what information comes out, and what the measurable benefit is to the user who initiates the interaction (i.e., what they get out of it). Requirements analysis attempts to uncover and document the services the system provides to the user.
➢ A use case diagram represents the system's behaviour: its use cases and actors.
➢ It provides a global view of a system — its basic functionality (use cases) and its environment (actors).
➢ Use case diagrams describe what a system does from the standpoint of an external observer; the emphasis is on what a system does rather than how.
➢ Useful for early structuring of requirements and for iterative revisions.
➢ Req 1: Once the user selects the "search" option, he is asked to enter the key words. The system should output details of all books whose title or author name matches any of the key words entered. Details include: title, author name, publisher name, year of publication, ISBN number, catalogue number, and location in the library.
➢ Req 2: When the "renew" option is selected, the user is asked to enter his membership number and password. After password validation, the list of books borrowed by him is displayed. The user can renew any of the books by clicking the corresponding renew box.
• A high-level function usually involves a series of interactions between the system and one or more users.
• Even for the same high-level function, there can be different interaction sequences (or scenarios), due to users selecting different options or entering different data items.
➢ Each use case in a use case diagram describes one and only one function in which users interact with the system.
➢ A use case may contain several "paths" that a user can take while interacting with the system; each path is referred to as a scenario.
➢ A use case is labelled using a descriptive verb-noun phrase (e.g., Make Appointment) and represented by an oval.
➢ An actor is labelled using a descriptive noun or noun phrase and represented by a stick figure.
➢ Relationships represent communication between an actor and a use case. They are depicted by a line or a double-headed arrow, and are also called association relationships.
Identifying use cases involves brainstorming and reviewing the SRS document. Typically, the high-level requirements specified in the SRS document correspond to the use cases. In the absence of a well-formulated SRS document, a popular method of identifying the use cases is actor-based.
Essential use cases are created during early requirements elicitation. They are early problem-analysis artifacts, independent of design decisions, and tend to remain correct over long periods of time. Real use cases describe the functionality of the system in terms of the actual design, targeted at specific input/output technologies; therefore, real use cases can be developed only after the design decisions have been made.
➢ Two common ways of representing decision logic: decision trees and decision tables.
➢ A decision tree gives a graphic view of the logic involved in decision making and the corresponding actions taken. Edges of a decision tree represent conditions; leaf nodes represent the actions to be performed.
Consider Library Membership Automation Software (LMS) that should support the following three options:
▪ new member
▪ renewal
▪ cancel membership
New member option — Decision: when the 'new member' option is selected, the software asks for details about the member, such as name, address, and phone number. Action: if proper information is entered, a membership record for the member is created and a bill is printed for the annual membership charge plus the security deposit payable.
Renewal option — Decision: if the 'renewal' option is chosen, the LMS asks for the member's name and membership number to check whether he is a valid member. Action: if the membership is valid, the membership expiry date is updated and the annual membership bill is printed; otherwise, an error message is displayed.
Cancel membership option — Decision: if the 'cancel membership' option is selected, the software asks for the member's name and membership number. Action: the membership is cancelled, a cheque for the balance amount due to the member is printed, and the membership record is deleted from the database.
The decision tree for this example shows that, after getting information from the user, the system makes a decision and then performs the corresponding actions. (Figure: decision tree for LMS.)
➢ Decision tables represent:
• which variables are to be tested;
• what actions are to be taken if the conditions are true;
• the order in which decision making is performed.
➢ A decision table shows the processing logic and the corresponding actions in tabular form. The upper rows of the table specify the variables or conditions to be evaluated; the lower rows specify the actions to be taken when the corresponding conditions are satisfied.
Consider the previously discussed LMS example. The following decision table shows how to represent the LMS problem in tabular form.
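The LMS decision logic above can also be sketched directly as code: each branch of the decision tree becomes one case, and each leaf becomes a list of actions. This is a hypothetical sketch — a real LMS would query a membership database rather than take a flag.

```python
def lms(option, valid_member=True):
    """Map each LMS option to its actions, mirroring the decision tree."""
    if option == "new member":
        return ["create membership record", "print bill"]
    elif option == "renewal":
        if valid_member:
            return ["update expiry date", "print bill"]
        return ["display error message"]
    elif option == "cancel membership":
        return ["print cheque", "delete membership record"]
    return ["display error message"]

print(lms("renewal", valid_member=False))  # ['display error message']
```

Note how each `if`/`elif` corresponds to a condition edge of the tree, and each returned list corresponds to an action leaf — the same logic a decision table would capture column by column.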
The table is divided into two parts: the upper part shows the conditions and the lower part shows the actions taken. Each column of the table is a rule.
➢ Exercise — develop a decision table for the following: if the flight is more than half-full and the ticket costs more than Rs. 3000, free meals are served, unless it is a domestic flight; meals are charged on all domestic flights.
• A class represents entities (objects) with common features, i.e., objects having similar attributes and operations.
• A class is represented by a solid outlined rectangle with compartments.
• The class name is usually written using the mixed-case convention and begins with an uppercase letter (e.g., LibraryMember). Object names are also written in mixed case, but start with a lowercase letter (e.g., studentMember). Class names are usually chosen to be singular nouns.
• A class diagram describes the static structure of a system: it shows how the system is structured rather than how it behaves. The static structure of a more complex system usually consists of a number of class diagrams, each representing classes and their inter-relationships.
Classes may be related to each other in four ways:
• Inheritance
• Association and link
• Aggregation and composition
• Dependency
The inheritance feature allows a new class to be defined by extending the features of an existing class. The original class is called the base class (also called the superclass or parent class), and the new class obtained through inheritance is called the derived class (also called the subclass or child class).
Association is a common relation among classes and frequently occurs in design solutions. When two classes are associated, they can take each other's help to serve users' requests. Association between two classes is represented by drawing a straight line between the concerned classes.
The name of the association is written alongside the association line. An arrowhead may be placed on the association line to indicate the reading direction of the annotated association name; the arrowhead should not be misunderstood as indicating the direction of a pointer implementing the association. On each side of the association, the multiplicity is noted as either a single number or a value range; the multiplicity indicates how many instances of one class are associated with one instance of the other class.
When two classes are associated with each other, it implies that in the implementation the object instances of each class store the ids of the objects of the other class to which they are linked. In other words, a bidirectional association between two classes is implemented by having each class maintain a reference to the linked object of the other class as an instance variable.
Exercise: identify the association relation among classes, and the corresponding association links among objects, from the following description: "A person works for a company. Ram works for Infosys. Hari works for TCS."
Composition and aggregation represent part/whole relationships among objects; objects that contain other objects are called composite objects.
Aggregation is a special type of association in which the involved classes are not only associated but also have a whole-part relationship between them. An aggregate object not only "knows" the ids of its parts (and can therefore invoke the methods of its parts), but also takes responsibility for adding and removing its parts. An example of an aggregation relation is a book register that aggregates book objects: book objects can be added to the register and deleted as and when required.
Composition is a stricter form of aggregation, in which the parts are existence-dependent on the whole.
This means that the lifelines of the whole and the part are identical: when the whole is created, the parts are created, and when the whole is destroyed, the parts are destroyed.
Both aggregation and composition represent part/whole relationships. If components can dynamically be added to and removed from the aggregate, the relationship is expressed as aggregation. If, on the other hand, the components are not dynamically added/deleted — i.e., the components have the same lifetime as the composite — the relationship should be represented as composition.
Reading the relationships:
Composition: B is a permanent part of A; A is made up of Bs; A is a permanent collection of Bs.
Aggregation: B is a part of A; A contains B; A is a collection of Bs.
Inheritance: A is a kind of B; A is a specialisation of B; A behaves like B.
Association: A delegates to B; A needs help from B; A collaborates with B.
Consider this situation in a business system: "A business partner might be a customer or a supplier, or both." Initially we might be tempted to model this with inheritance, as in the figure below. But in fact, during its lifetime, a business partner might become a customer as well as a supplier, or it might change from one to the other.
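The aggregation/composition distinction above can be sketched in code. In this hypothetical sketch, a BookRegister aggregates Book objects that exist independently and can be added or removed at any time, while a Library composes its register: the part is created by the whole and lives only as long as it.

```python
class Book:
    def __init__(self, title):
        self.title = title

class BookRegister:
    """Aggregation: books exist independently and may be added/removed dynamically."""
    def __init__(self):
        self.books = []
    def add(self, book):
        self.books.append(book)
    def remove(self, book):
        self.books.remove(book)

class Library:
    """Composition: the register is created with the whole and dies with it."""
    def __init__(self):
        self.register = BookRegister()  # part created when the whole is created

b = Book("Software Engineering")
reg = BookRegister()
reg.add(b)       # part added dynamically (aggregation)
reg.remove(b)    # part removed; the Book object still exists on its own
lib = Library()  # its register exists only as a part of the library
```

The key difference is who owns the part's lifetime: the register merely references books, whereas the library's register has no existence outside the library object.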
In such cases, where a partner's roles can change over its lifetime, we prefer aggregation instead, as in the figure below.
(Figure: the LibraryMember class, shown with its attributes — member name, membership number, address, phone number, e-mail address, membership admission date, membership expiry date, books issued — its operations — issueBook(), findPendingBooks(), findOverdueBooks(), returnBook(), findMembershipDetails() — and an object instance libraryMember with sample values: Amit Jain, b04025, C-108, R.K. Hall, 4211, amit@cse, 20-07-97, 1-05-98, NIL.)
• An attribute is a named property of a class; it represents the kind of data that an object might store.
• Attributes are listed by their names, and optionally their types (that is, their class, e.g., Int, Book, Employee), an initial value, and some constraints.
• Attribute names begin with a lowercase letter and are written left-justified using plain type letters.
• An attribute name may be followed by square brackets containing a multiplicity expression, e.g., sensorStatus[10]. The multiplicity expression indicates the number of attributes that would be present per instance of the class.
Exercises:
• Model: "A patient is a person and has several persons as relatives."
• An engineering college offers B.Tech degrees in three branches — Electronics, Electrical, and Computer Science — offered by the respective departments. Each branch can admit 30 students each year. For a student to complete the B.Tech degree, he/she has to clear all the 30 core courses and at least 10 elective courses.
When a user invokes any one of the use cases of a system, the required behaviour is realised through the interaction of several objects in the system.
An interaction diagram, as the name implies, is a model that describes how a group of objects interact among themselves through message passing to realise some behaviour (typically, the execution of a single use case). An interaction diagram shows a number of example objects and the messages passed between those objects within the use case.
• Sometimes, more than one interaction diagram may be necessary to capture the behaviour.
• There are two kinds: sequence diagrams and collaboration diagrams.
• The two diagrams are equivalent, in the sense that either one can be derived automatically from the other.
Sequence diagrams:
• Show the interactions among objects as a two-dimensional chart, read from top to bottom.
• The objects participating in the interaction are drawn at the top of the chart as boxes attached to vertical dashed lines. Inside the box, the name of the object is written with a colon separating it from the name of the class, and both names are underlined. When no object name is specified, the box refers to an arbitrary instance of the class.
• An object appearing at the top of a sequence diagram existed before the use case execution was initiated. If an object is created during the execution of the use case and participates in the interaction (e.g., through a method call), it is shown at the appropriate place in the diagram where it is created.
• An object's existence is shown as a dashed line (its lifeline); its activeness is shown as a rectangle on the lifeline.
• Messages are shown as arrows, each labelled with the corresponding message name; each message can also be labelled with some control information.
• There are two types of control information: condition ([ ]) and iteration (*).
• A condition is represented within square brackets. A condition prefixed to the message name, such as [invalid], indicates that the message is sent only if the condition is true.
• An iteration marker shows that the message is sent many times to multiple receiver objects, as would happen when you are iterating over a collection or the elements of an array. You can also indicate the basis of the iteration, e.g., [for every book object].

[Figure: Sequence diagram for order processing, with objects :Customer, :Order, :Payment, :Product and :Supplier exchanging the messages place an order, validate, [payment ok] deliver, [not in stock] back order, get address and mail to address, annotated to show the notions of object, control process, lifetime and message.]

[Figure: Sequence diagram for the renew book use case, with objects :Library Member, :Library Boundary, :Library Book Renewal Controller, :Library Book Register and :Book exchanging the messages renewBook, findMemberBorrowing, displayBorrowing, selectBooks, bookSelected, * find, [reserved] apology, update, confirm and updateMemberBorrowing.]

Collaboration diagrams:
• Show both structural and behavioural aspects.
• Objects are collaborators, shown as boxes.
• Links between objects are shown as solid lines.
• A message is shown as a labelled arrow placed near the link.
• Messages are prefixed with sequence numbers to show their relative sequencing.

[Figure: Collaboration diagram for the renew book use case, with the same objects and messages as the sequence diagram above, numbered 1: renewBook through 12: confirm.]

Activity diagrams:
• Not present in earlier modelling techniques.
• Possibly based on the event diagram of Odell [1992].
• An activity diagram can be used to represent the various activities (or chunks of processing) that occur during execution of the software, and their sequence of activation.
• An activity is a state with an internal action and one or many outgoing transitions.
• Somewhat related to flowcharts.
• Can represent parallel activity and synchronization aspects.
• Activity diagrams incorporate swim lanes to indicate which components of the software are responsible for which activities (for example, the academic department vs. the hostel office).
• Normally employed in business process modelling.
• Carried out during the requirements analysis and specification stage.
• Can be used to develop interaction diagrams.

[Figure: Activity diagram for the student admission procedure at SNU, with swim lanes for the Academic Section, Accounts Section, Hostel Office, Hospital and Department, covering the activities check student records, receive fees, allot hostel, create hospital record, register in course, allot room, issue identity card and conduct medical examination.]

[Figure: Activity diagram for order processing, with swim lanes for Finance, Order Processing and Stock Manager, covering Receive Order, *[for each item on order] Check Item, Authorize Payment, [failed] Cancel Order, [succeeded] [in stock] Assign to Order, Receive Supply, Choose Outstanding Order Items, * [for each chosen order item] Assign Goods to Order, [need to reorder] Reorder Item and, [stock assigned to all line items and payment authorized] [all outstanding order items filled], Dispatch Order.]

State chart diagrams:
• Based on the work of David Harel [1990].
• Model how the state of an object changes in its lifetime.
• Based on the finite state machine (FSM) formalism: an FSM model consists of a finite number of states corresponding to those that the object being modelled can take, and the object undergoes state changes when specific events occur.
• When used to model practical systems, the number of states in an FSM becomes too large and the model too complex (state explosion). State charts avoid this problem, which is why UML uses state charts rather than plain FSMs.
• Hierarchical model of a system.
• Represents composite and nested states.
• Elements of a state chart diagram:
  • Initial state: a filled circle.
  • Final state: a filled circle inside a larger circle.
  • State: a rectangle with rounded corners.
  • Transitions: arrows between states, optionally annotated with a boolean guard condition in square brackets.

[Figure: State chart diagram for an order object. From the initial state, order received leads to Unprocessed Order; checked [reject] leads to Rejected Order and checked [accept] to Accepted Order; processed [some items not available] leads to Pending Order, which returns to Accepted Order on newsupply [all items available]; processed [some items available] / deliver leads to Fulfilled Order.]

Software Project Management:
• Effective project management is crucial for the success of any software development project.
• In the past, several projects have failed not for want of competent technical professionals, nor for lack of resources, but due to the use of faulty project management practices.
• The main goal of software project management is to enable a group of developers to work effectively towards the successful completion of a project.

Characteristics of software projects:
• Invisibility
• Changeability
• Complexity
• Uniqueness
• Exactness of the solution
• Team-oriented and intellect-intensive work

Main project management activities:
• Project planning: Project planning is undertaken immediately after the feasibility study phase and before the start of the requirements analysis and specification phase. The initial project plans are revised from time to time as the project progresses and more project data become available.
• Project monitoring and control: These activities are undertaken once the development activities start. Their focus is to ensure that the software development proceeds as per plan.

During project planning, the project manager performs the following activities:
• Estimation: The following project attributes are estimated: cost, duration and effort.
• Scheduling: The schedules for manpower and other resources are developed.
• Staffing: Staff organisation and staffing plans are made.
• Risk management: This includes risk identification, analysis and rectification planning.
• Miscellaneous plans: This includes making several other plans, such as the quality assurance plan, the configuration management plan, etc.

[Figure: Precedence ordering among planning activities — size estimation feeds effort estimation and cost estimation; effort estimation feeds duration estimation; these feed scheduling and project staffing.]

Metrics for project size estimation:
• Lines of Code (LOC)
• Function Point (FP) metric

Lines of Code (LOC):
• Simplest among all available metrics, and extremely popular.
• Measures the size of a project by counting the number of source instructions in the developed program.
Shortcomings:
• LOC is a measure of coding activity alone.
• The LOC count depends on the choice of specific instructions.
• The LOC measure correlates poorly with the quality and efficiency of the code.
• The LOC metric penalises the use of higher-level programming languages and code reuse.
• The LOC metric measures the lexical complexity of a program and does not address the more important issues of logical and structural complexity.
• It is very difficult to accurately estimate the LOC of the final program from the problem specification.

Function Point (FP) metric:
• Proposed by Albrecht and Gaffney in 1983.
• Overcomes many of the shortcomings of the LOC metric.
• It can easily be computed from the problem specification itself.
• The underlying assumption is that the size of a software product is directly dependent on the number of different high-level functions or features it supports. This assumption is reasonable, since each feature would take additional effort to implement.
• The size of a software product is computed using different characteristics of the product identified in its requirements specification.
• It is computed in the following 3 steps:
1. Compute the UFP (unadjusted function point).
2. Refine the UFP to reflect the actual complexities of the different parameters used in the UFP computation.
3. Compute the FP by further refining the UFP.
The UFP is computed as the weighted sum of five characteristics of a product, as shown in the following expression.
The weights associated with the five characteristics were determined empirically by Albrecht from data gathered on many projects.

UFP = (Number of inputs) * 4 + (Number of outputs) * 5 + (Number of inquiries) * 4 + (Number of files) * 10 + (Number of interfaces) * 10

The meanings of the different parameters are as follows:
1. Number of inputs: Each data item input by the user is counted. However, note that data inputs are considered different from user inquiries. Inquiries are user commands, such as print-account-balance, that require no data values to be input by the user; inquiries are counted separately. Also, individual data items input by the user are not simply added up to compute the number of inputs: related inputs are grouped and considered as a single unit. For example, while entering the data concerning an employee into an employee payroll software, the data items name, age, sex, address, phone number, etc. are together considered as a single input.
2. Number of outputs: The outputs considered include reports printed, screen outputs, error messages produced, etc. While computing the number of outputs, the individual data items within a report are not counted; a set of related data items is counted as just a single output.
3. Number of inquiries: An inquiry is a user command (without any data input) that only requires some action to be performed by the system. Thus, the total number of inquiries is essentially the number of distinct interactive queries (without data input) which can be made by the users.
4. Number of files: The files referred to here are logical files. A logical file represents a group of logically related data. Logical files include data structures as well as physical files.
5. Number of interfaces: Here, the interfaces denote the different mechanisms that are used to exchange information with other systems. Examples of such interfaces are data files on tapes or disks, communication links with other systems, etc.
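As a small sketch, the UFP expression above can be evaluated directly. The function name and argument names below are illustrative, not from the slides:

```python
def unadjusted_function_points(inputs, outputs, inquiries, files, interfaces):
    """Weighted sum of the five product characteristics, using Albrecht's
    weights with every parameter assumed to be of average complexity."""
    return (inputs * 4 + outputs * 5 + inquiries * 4
            + files * 10 + interfaces * 10)

# e.g., a product with 5 inputs, 4 outputs, 2 inquiries, 3 files, 1 interface:
ufp = unadjusted_function_points(5, 4, 2, 3, 1)  # 20 + 20 + 8 + 30 + 10 = 88
```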
The UFP computed at the end of step 1 is a gross indicator of the problem size. In step 2, this UFP is refined by taking into account the peculiarities of the project: each parameter (input, output, etc.) was implicitly assumed to be of average complexity, whereas some may be very simple and others very complex. The refined weights are:

Type                 | Simple | Average | Complex
Input (I)            | 3      | 4       | 6
Output (O)           | 4      | 5       | 7
Inquiry (E)          | 3      | 4       | 6
Number of files (F)  | 7      | 10      | 15
Number of interfaces | 5      | 7       | 10

In the final step, several factors that can impact the overall project size are considered to refine the UFP computed in step 2. Examples of project parameters that can influence the project size include high transaction rates, response time requirements, scope for reuse, etc. Albrecht identified 14 parameters that can influence the development effort; these parameters are listed on the next slides. Each of the 14 parameters is assigned a value from 0 (not present or no influence) to 6 (strong influence). The resulting numbers are summed, yielding the total degree of influence (DI). A technical complexity factor (TCF) for the project is then computed and multiplied with the UFP to yield the FP; the TCF expresses the overall impact of these project parameters on the development effort. TCF is computed as (0.65 + 0.01 * DI). As DI can vary from 0 to 84, TCF can vary from 0.65 to 1.49. Finally, FP is given as the product of UFP and TCF.
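The step-2 refinement and step-3 adjustment can be sketched as follows. This is a minimal illustration: the function names and the choice of per-item complexity levels are assumptions, while the weights and the TCF formula come from the slides:

```python
# Albrecht's refined weights, indexed by parameter kind and complexity level.
WEIGHTS = {
    "input":     {"simple": 3, "average": 4,  "complex": 6},
    "output":    {"simple": 4, "average": 5,  "complex": 7},
    "inquiry":   {"simple": 3, "average": 4,  "complex": 6},
    "file":      {"simple": 7, "average": 10, "complex": 15},
    "interface": {"simple": 5, "average": 7,  "complex": 10},
}

def refined_ufp(counted_items):
    """Step 2: counted_items is a list of (kind, complexity) pairs,
    one per counted input, output, inquiry, file or interface."""
    return sum(WEIGHTS[kind][cx] for kind, cx in counted_items)

def function_points(ufp, di):
    """Step 3: FP = UFP * TCF, where TCF = 0.65 + 0.01 * DI and DI is the
    total degree of influence (0..84) of the 14 project parameters."""
    return ufp * (0.65 + 0.01 * di)

# Two average inputs and one complex file: UFP = 4 + 4 + 15 = 23.
# With DI = 35, TCF = 0.65 + 0.35 = 1.00, so FP = 23.
fp = function_points(refined_ufp([("input", "average"),
                                  ("input", "average"),
                                  ("file", "complex")]), 35)
```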
That is, FP = UFP * TCF.

The 14 project parameters are:
1. Requirement for reliable backup and recovery
2. Requirement for data communication
3. Extent of distributed processing
4. Performance requirements
5. Expected operational environment
6. Extent of online data entries
7. Extent of multi-screen or multi-operation online data input
8. Extent of online updating of master files
9. Extent of complex inputs, outputs, online queries and files
10. Extent of complex data processing
11. Extent to which currently developed code can be designed for reuse
12. Extent of conversion and installation included in the design
13. Extent of multiple installations in an organisation and variety of customer organisations
14. Extent of change and focus on ease of use

Shortcomings of the FP metric:
• It does not take into account the algorithmic complexity of a function. That is, the function point metric implicitly assumes that the effort required to design and develop any two different functionalities of the system is the same (example: create-member vs. loan-from-remote-library).
• FP only considers the number of functions that the system supports, without distinguishing the difficulty levels of developing the various functionalities.

Feature point metric: The feature point metric incorporates algorithmic complexity as an extra parameter. This parameter ensures that the size computed using the feature point metric reflects the fact that the higher the complexity of a function, the greater the effort required to develop it, and therefore the larger its size compared to a simpler function.

Exercise: A supermarket needs to develop software that would help it automate a scheme it plans to introduce to encourage regular customers. In this scheme, a customer would first have to register by supplying his/her residence address, telephone number, and driving license number. Each customer who registers for this scheme is assigned a unique customer number (CN) by the computer. A customer can present his CN to the checkout staff when he makes any purchase; in this case, the value of his purchase is credited against his CN.
At the end of each year, the supermarket intends to award surprise gifts to the 10 customers who make the highest total purchases over the year. It also intends to award a 22-carat gold coin to every customer whose purchases exceed Rs. 10,000/-. The entries against each CN are reset on the last day of every year, after the prize winners' lists are generated.

Project estimation techniques:
• Empirical estimation techniques
• Heuristic techniques
• Analytical estimation techniques

Empirical estimation techniques:
• Based on making an educated guess of the project parameters.
• Based on prior experience with the development of similar products.
• Two formalisations of the basic empirical estimation technique are:
  • Expert judgement
  • Delphi cost estimation

Expert judgement:
• A widely used size estimation technique.
• An expert makes an educated guess about the problem size after analysing the problem thoroughly.
• Shortcomings: it is subject to human error and individual bias, and the expert may inadvertently overlook some factors.

Delphi cost estimation:
• Tries to overcome some of the shortcomings of the expert judgement approach.
• Although it consumes more time and effort, it overcomes an important shortcoming of the expert judgement technique: the results cannot unjustly be influenced by overly assertive and senior members.

Heuristic techniques:
• Assume that the relationships that exist among the different project parameters can be satisfactorily modelled using suitable mathematical expressions.
• Once the basic (independent) parameters are known, the other (dependent) parameters can easily be determined by substituting the values of the independent parameters into the corresponding mathematical expressions.
• The different heuristic estimation models can be divided into two broad categories: single variable models and multi variable models.

Single variable models:
• Assume that various project characteristics can be predicted based on a single previously estimated basic (independent) characteristic of the software, such as its size.
Estimated Parameter = c1 * e^d1
where e is the previously estimated characteristic (e.g., size) and c1, d1 are constants.
• The COCOMO model is an example of a single variable cost estimation model.

Multi variable models:
Estimated Resource = c1 * p1^d1 + c2 * p2^d2 + ...
where p1, p2, ... are the independent project parameters.
• The intermediate COCOMO model is an example of a multi variable cost estimation model.
• Multi variable estimation models are expected to give more accurate estimates than single variable models, since a project parameter is typically influenced by several independent parameters.

COCOMO:
• Any software development project can be classified into one of 3 categories based on the development complexity: organic, semidetached and embedded.
• Based on these categories, Boehm gave different formulae to estimate the effort and duration from the size estimate.
• To classify a project, apart from considering the characteristics of the product, one also needs to consider the development team and the development environment.
• Roughly, the three product development classes correspond to the development of application, utility and system software:
  • Data processing programs are normally considered to be application programs.
  • Compilers, linkers, etc. are considered to be utility programs.
  • Operating systems, real-time programs, etc. are system programs.
• The relative levels of their complexities are roughly 1:3:9.
• Organic: A software project is said to be of the organic type if the required team size is adequately small, the problem is well understood and has been solved in the past, and the team members have nominal experience with the problem.
• Semi-detached: A software project is said to be of the semi-detached type if the vital characteristics, such as team size, experience and knowledge of the various programming environments, lie in between those of the organic and embedded types. Projects classified as semi-detached are comparatively less familiar and more difficult to develop than organic ones, and require more experience, better guidance and creativity.
For example, compilers or various embedded systems can be considered to be of the semi-detached type.
• Embedded: Software projects requiring the highest levels of complexity, creativity and experience fall under this category. Such software requires a larger team than the other two types, and the developers need to be sufficiently experienced and creative to develop such complex systems.

Person-Month:
• Person-Month (PM) is considered an appropriate unit for measuring effort, because developers are typically assigned to a project for a certain number of months.
• One person-month is the effort an individual can typically put in over a month. The person-month estimate implicitly takes into account the productivity losses that normally occur due to time lost on holidays, weekly offs, coffee breaks, etc.

Basic COCOMO:
The basic COCOMO model is a single variable heuristic model that gives an approximate estimate of the project parameters. The basic COCOMO estimation model is given by expressions of the following form:

Effort = a1 * (KLOC)^a2 PM
Tdev = b1 * (Effort)^b2 months

where:
• KLOC is the estimated size of the software product, expressed in kilo lines of code;
• a1, a2, b1, b2 are constants for each category of software product;
• Tdev is the estimated time to develop the software, expressed in months;
• Effort is the total effort required to develop the software product, expressed in person-months (PMs).

According to Boehm, every line of source text should be counted as one LOC, irrespective of the actual number of instructions on that line. Thus, if a single instruction spans several lines (say n lines), it is counted as n LOC. The values of a1, a2, b1, b2 for the different categories of products, as given by Boehm [1981], are summarised on the next slides; he derived these values by examining historical data collected from a large number of actual projects.
For the three classes of software products, the formulas for estimating the effort based on the code size are:
Organic: Effort = 2.4 * (KLOC)^1.05 PM
Semi-detached: Effort = 3.0 * (KLOC)^1.12 PM
Embedded: Effort = 3.6 * (KLOC)^1.20 PM

For the three classes of software products, the formulas for estimating the development time based on the effort are:
Organic: Tdev = 2.5 * (Effort)^0.38 months
Semi-detached: Tdev = 2.5 * (Effort)^0.35 months
Embedded: Tdev = 2.5 * (Effort)^0.32 months

Example: Assume that the size of an organic type software product has been estimated to be 32,000 lines of source code, and that the average salary of a software developer is Rs. 15,000/- per month. Determine the effort required to develop the software product, the nominal development time, and the cost to develop the product.

From the basic COCOMO estimation formulas for organic software:
Effort = 2.4 * (32)^1.05 ≈ 91 PM
Nominal development time = 2.5 * (91)^0.38 ≈ 14 months
Staff cost required to develop the product = 91 * 15,000 = Rs. 1,365,000

Intermediate COCOMO:
• The basic COCOMO model assumes that effort is only a function of the number of lines of code and some constants evaluated according to the type of software system.
• In reality, however, no system's effort and schedule can be computed solely on the basis of lines of code; various other factors, such as reliability, experience and capability, also need to be considered.
• These factors are known as cost drivers, and the intermediate model utilizes 15 such drivers for cost estimation.
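The worked example above can be reproduced with a short script. This is a sketch; the constant table simply transcribes the basic COCOMO formulas, and the function name is illustrative:

```python
# (a1, a2, b1, b2) for each basic COCOMO product class.
COCOMO_BASIC = {
    "organic":       (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded":      (3.6, 1.20, 2.5, 0.32),
}

def basic_cocomo(kloc, category):
    """Return (effort in person-months, development time in months)."""
    a1, a2, b1, b2 = COCOMO_BASIC[category]
    effort = a1 * kloc ** a2      # Effort = a1 * (KLOC)^a2
    tdev = b1 * effort ** b2      # Tdev = b1 * (Effort)^b2
    return effort, tdev

effort, tdev = basic_cocomo(32, "organic")
# effort ≈ 91 PM, tdev ≈ 14 months; at Rs. 15,000/PM, cost ≈ Rs. 1,365,000
```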
The 15 cost drivers fall into four groups:
(i) Product attributes: required software reliability, size of the application database, complexity of the product.
(ii) Hardware attributes: run-time performance constraints, memory constraints, volatility of the virtual machine environment, required turnaround time.
(iii) Personnel attributes: analyst capability, software engineering capability, applications experience, virtual machine experience, programming language experience.
(iv) Project attributes: use of software tools, application of software engineering methods, required development schedule.

Product attributes:
Cost Drivers                   | Very Low | Low  | Normal | High | Very High
Required software reliability  | 0.75     | 0.88 | 1.00   | 1.15 | 1.40
Size of application database   |          | 0.94 | 1.00   | 1.08 | 1.16
Complexity of the product      | 0.70     | 0.85 | 1.00   | 1.15 | 1.30

Hardware attributes:
Cost Drivers                               | Very Low | Low  | Normal | High | Very High
Runtime performance constraints            |          |      | 1.00   | 1.11 | 1.30
Memory constraints                         |          |      | 1.00   | 1.06 | 1.21
Volatility of the virtual machine environment |       | 0.87 | 1.00   | 1.15 | 1.30
Required turnaround time                   |          | 0.94 | 1.00   | 1.07 | 1.15

Personnel attributes:
Cost Drivers                    | Very Low | Low  | Normal | High | Very High
Analyst capability              | 1.46     | 1.19 | 1.00   | 0.86 | 0.71
Applications experience         | 1.29     | 1.13 | 1.00   | 0.91 | 0.82
Software engineer capability    | 1.42     | 1.17 | 1.00   | 0.86 | 0.70
Virtual machine experience      | 1.21     | 1.10 | 1.00   | 0.90 |
Programming language experience | 1.14     | 1.07 | 1.00   | 0.95 |

Project attributes:
Cost Drivers                                | Very Low | Low  | Normal | High | Very High
Application of software engineering methods | 1.24     | 1.10 | 1.00   | 0.91 | 0.82
Use of software tools                       | 1.24     | 1.10 | 1.00   | 0.91 | 0.83
Required development schedule               | 1.23     | 1.08 | 1.00   | 1.04 | 1.10

The project manager rates each of these 15 parameters for a particular project on a scale from very low to very high. Depending on these ratings, the appropriate cost driver values are taken from the tables above. These 15 values are then multiplied together to calculate the EAF (effort adjustment factor).
The intermediate COCOMO formula now takes the form:
Effort = a * (KLOC)^b * EAF

The values of a and b are as follows:
Software Project | a   | b
Organic          | 3.2 | 1.05
Semi-detached    | 3.0 | 1.12
Embedded         | 2.8 | 1.20

Detailed (complete) COCOMO:
• A major shortcoming of both the basic and the intermediate COCOMO models is that they consider a software product as a single homogeneous entity. However, most large systems are made up of several smaller sub-systems.
• In detailed COCOMO, the whole software is divided into different modules, COCOMO is applied to each module to estimate its effort, and the module efforts are then summed. In other words, the cost to develop each sub-system is estimated separately, and the complete system cost is determined as the sum of the sub-system costs. This approach reduces the margin of error in the final estimate.
• As an example application of the complete COCOMO model, consider a distributed management information system (MIS) product for an organisation having offices at several places across the country. It can have the following sub-components:
  • Database part
  • Graphical user interface (GUI) part
  • Communication part

Halstead's software science:
• An analytical technique to measure the size, development effort and development cost of software products.
• Halstead used a few primitive program parameters to develop expressions for the overall program length, potential minimum volume, actual volume, language level, effort, and development time.
• For a given program, let:
  η1 be the number of unique operators used in the program,
  η2 be the number of unique operands used in the program,
  N1 be the total number of operators used in the program,
  N2 be the total number of operands used in the program.
• For example, in a = &b; the operands are a, b and the operators are =, &.
• Another example: int func(int a, int b) { ... }
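The intermediate model can be sketched as follows. The a, b table transcribes the slide; the driver ratings passed in the example are illustrative choices, not values from the slides:

```python
# (a, b) for each intermediate COCOMO product class.
INTERMEDIATE_AB = {
    "organic":       (3.2, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded":      (2.8, 1.20),
}

def intermediate_cocomo(kloc, category, driver_values):
    """Effort = a * (KLOC)^b * EAF, where EAF is the product of the
    cost driver values read from the rating tables."""
    a, b = INTERMEDIATE_AB[category]
    eaf = 1.0
    for v in driver_values:
        eaf *= v
    return a * kloc ** b * eaf

# All drivers nominal (1.00) except high product complexity (1.15)
# and high analyst capability (0.86): EAF = 1.15 * 0.86 = 0.989.
effort = intermediate_cocomo(32, "organic", [1.15, 0.86])
```

With no off-nominal drivers the EAF is 1.0 and the formula reduces to the unadjusted a * (KLOC)^b estimate.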
Here, { } and ( ) are among the operators.
• A third example: in func(a, b); the operators are func, ',' and ';', and the operands are a and b.

• Program length (N) and vocabulary (η):
  Length: N = N1 + N2
  Vocabulary: η = η1 + η2
• Program volume (V):
  V = N log2 η
• Potential minimum volume (V*): the volume of the most succinct program for the problem, i.e., one in which the solution is invoked as a single call func(d1, d2, d3, ...dn) from main(). In that case η1 = 3 and η2 = n, giving
  V* = (2 + η2) log2 (2 + η2)
• Effort and time:
  E = V/L, and since the program level L = V*/V, E = V^2/V*
• Programmer's time:
  T = E/S, where S is the speed of mental discriminations (S = 18)

Length estimation:
Since operators and operands usually alternate in a program, the upper bound on the number of possible programs can be refined to N ≤ η * η1^η1 * η2^η2. Also, N must include not only the ordered set of N elements, but also all possible subsets of that ordered set, i.e., the power set of the N strings (this particular reasoning of Halstead is hard to justify!). Therefore,
  2^N = η * η1^η1 * η2^η2
Taking the logarithm of both sides,
  N = log2 η + log2(η1^η1 * η2^η2)
Ignoring the comparatively small first term, we get
  N ≈ log2(η1^η1 * η2^η2) = log2 η1^η1 + log2 η2^η2 = η1 log2 η1 + η2 log2 η2
Experimental evidence gathered from the analysis of a large number of programs suggests that the computed and actual lengths match very closely. However, the results may be inaccurate when small programs are considered individually.

Example: Consider the following C program and find its estimated length and program volume.

main()
{
    int a, b, c, avg;
    scanf("%d%d%d", &a, &b, &c);
    avg = (a + b + c) / 3;
    printf("avg = %d", avg);
}

Unique operators (12): main, ( ), { }, int, scanf, &, ",", ";", =, +, /, printf
Unique operands (11): a, b, c, avg, %d%d%d, &a, &b, &c, 3, avg = %d, a+b+c
η1 = 12, η2 = 11
Estimated length = 12 log2 12 + 11 log2 11 ≈ 81
Volume = N log2 η = 81 log2 23 ≈ 366

Software Project Management: Scheduling
• Scheduling the project tasks is an important project planning activity.
• The scheduling problem, in essence, consists of deciding which tasks will be taken up when and by whom.
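The example's numbers can be checked with a small calculator. This is a sketch of the length and volume formulas above, rounding the estimated length before computing the volume, as the worked example does:

```python
import math

def halstead(eta1, eta2):
    """Return (estimated length N, volume V) from the unique operator and
    operand counts: N = eta1*log2(eta1) + eta2*log2(eta2), V = N*log2(eta)."""
    n_hat = eta1 * math.log2(eta1) + eta2 * math.log2(eta2)
    n = round(n_hat)
    volume = n * math.log2(eta1 + eta2)   # vocabulary eta = eta1 + eta2
    return n, volume

n, volume = halstead(12, 11)
# For the C program above: n = 81, volume ≈ 366
```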
• Once a schedule has been worked out and the project gets underway, the project manager monitors the timely completion of the tasks and takes any corrective action necessary whenever there is a chance of schedule slippage.

In order to schedule the project activities, a software project manager needs to do the following:
1. Identify all the major activities that need to be carried out to complete the project.
2. Break down each activity into tasks.
3. Determine the dependencies among the different tasks.
4. Establish estimates for the time durations necessary to complete the tasks.
5. Represent the information in the form of an activity network.
6. Determine task starting and ending dates from the information represented in the activity network.
7. Determine the critical path. A critical path is a chain of tasks that determines the duration of the project.
8. Allocate resources to tasks.

Work breakdown structure:
• A work breakdown structure (WBS) is used to recursively decompose a given set of activities into smaller activities.
• Once the project activities have been decomposed into a set of tasks using a WBS, the time frame in which each activity is to be performed is determined.
• Tasks are the lowest-level work activities in a WBS hierarchy. They also form the basic units of work that are allocated to developers and scheduled.
• The end of each important activity is called a milestone.
• The project manager tracks the progress of a project by monitoring the timely completion of the milestones.
• If he observes that some milestones start getting delayed, he carefully monitors and controls the progress of the tasks, so that the overall deadline can still be met.
• A WBS provides a notation for representing the activities, subactivities and tasks that need to be carried out in order to solve a problem.
[Fig. 1: Work breakdown structure of the MIS problem.]

How long to decompose? The decomposition of the activities is carried out until any of the following is satisfied:
• A leaf-level subactivity (a task) requires approximately two weeks to develop.
• Hidden complexities are exposed, so that the job to be done is understood and can be assigned as a unit of work to one of the developers.
• Opportunities for reuse of existing software components are identified.

Activity networks:
• A quality management tool that charts the flow of activity between separate tasks.
• An activity network shows the different activities making up a project, their estimated durations, and their interdependencies.
• Two equivalent representations for activity networks are possible and in use:
  • Activity on Node (AoN): each activity is represented by a rectangular box, and the duration of the activity is shown alongside each task in the node.
  • Activity on Edge (AoE): tasks are associated with the edges, and the edges are annotated with the task durations. The nodes in the graph represent project milestones.
• A project network should have only one start node and only one end node.
• A node has a duration; links normally have no duration.
• "Precedents" are the immediately preceding activities.
• Time moves from left to right in the project network.
• A network should contain neither loops nor dangles.

Exercise: Determine the activity network representation for the MIS development project, for which the relevant data is given in the table below. Assume that the manager has determined the tasks to be represented from the work breakdown structure of Fig. 1, and has determined the durations and dependencies for each task as shown in the table.

Critical Path Method:
• The Critical Path Method (CPM) is a method used in project planning, generally for project scheduling, to achieve on-time completion of the project.
• It helps determine the earliest time by which the whole project can be completed.
• There are two main concepts in this method: the critical task and the critical path.
• A critical task is a task/activity that cannot be delayed without delaying the completion of the whole project. It must be completed on time before the dependent tasks can start.
• The critical path is a sequence of critical tasks/activities and is the longest path in the project network. It gives the minimum time required to complete the whole project. The activities on the critical path are known as critical activities; if these activities are delayed, the completion of the whole project is also delayed.

CPM steps:
1. Identify the activities.
2. Construct the project network.
3. Perform time estimation using a forward and a backward pass.
4. Identify the critical path.

For each activity, the following quantities are determined:
• Activity label: the name of the activity represented by that node.
• Earliest start (ES): the earliest date or time at which the activity can be started.
• Earliest finish (EF): the earliest date or time at which the activity can be completed.
• Latest start (LS): the latest date or time at which the activity can be started.
• Latest finish (LF): the latest date or time at which the activity can be finished.
• Float: the difference between the latest start and the earliest start (or, equivalently, between the latest finish and the earliest finish).

• The critical path gives, or helps estimate, the earliest time in which the whole project can be completed. Any delay to an activity on this critical path will delay the completion of the whole project.
• In order to identify the critical path, we need to calculate the activity float for each activity. If the float of an activity is zero, the activity is critical and must be added to the critical path of the project network.
• In this example, activities F and G have zero float and hence are critical activities.
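The forward and backward passes can be sketched in a few lines. This is a minimal illustration on a hypothetical four-task network; the task names and durations are made up for the example:

```python
def cpm(tasks):
    """tasks: dict name -> (duration, [predecessor names]), listed in a
    topologically sorted order. Returns name -> (ES, EF, LS, LF, float)."""
    es, ef = {}, {}
    for name, (dur, preds) in tasks.items():          # forward pass
        es[name] = max((ef[p] for p in preds), default=0)
        ef[name] = es[name] + dur
    end = max(ef.values())                            # project duration
    ls, lf = {}, {}
    for name in reversed(list(tasks)):                # backward pass
        succs = [s for s, (_, ps) in tasks.items() if name in ps]
        lf[name] = min((ls[s] for s in succs), default=end)
        ls[name] = lf[name] - tasks[name][0]
    return {n: (es[n], ef[n], ls[n], lf[n], ls[n] - es[n]) for n in tasks}

# Hypothetical network: A (3 days), then B (2) and C (4) in parallel, then D (2).
net = {"A": (3, []), "B": (2, ["A"]), "C": (4, ["A"]), "D": (2, ["B", "C"])}
result = cpm(net)
critical = [n for n, v in result.items() if v[4] == 0]  # zero-float activities
```

Here B has a float of 2 days, so the critical path is A, C, D and the project duration is 9 days.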
• The activity durations computed using an activity network are only estimates. Since actual durations may vary from the estimates, the utility of activity network diagrams is limited.
• CPM can be used to determine the duration of the project, but it provides no indication of the probability of meeting that schedule.
• Project Evaluation and Review Technique (PERT) charts are a more sophisticated form of activity chart. PERT allows for some randomness in task completion times, and therefore provides the capability to estimate the probability of achieving project milestones.
• Each task is annotated with three estimates:
1. Optimistic (O): the best possible task completion time.
2. Most Likely (M): the most likely task completion time.
3. Pessimistic (P): the worst possible task completion time.
The optimistic (O) and pessimistic (P) estimates represent the extremes of all possible task completion scenarios; the most likely estimate (M) is the completion time with the highest probability. The three estimates are used to compute the expected completion time and its standard deviation. Almost the entire distribution lies within the interval (ET - 3·ST, ET + 3·ST), where ST is the standard deviation. From this it is possible to show that:
The standard deviation for a task is ST = (P - O)/6
The mean (expected) task time is ET = (O + 4M + P)/6
Gantt Charts
• A Gantt chart is a special type of bar chart in which each bar represents an activity. The vertical axis lists all the tasks to be performed; the bars are drawn along a horizontal time line, one for each task, and the length of each bar is proportional to the duration of time planned for the corresponding activity.
• In the Gantt charts used for software project management, each bar consists of an unshaded part and a shaded part.
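The PERT formulas above, ET = (O + 4M + P)/6 and ST = (P - O)/6, can be sketched as follows; the three estimates used here are invented for illustration:

```python
# PERT estimates for one task: optimistic O, most likely M, pessimistic P.
# ET is the expected (mean) completion time, ST the standard deviation.
def pert(o, m, p):
    et = (o + 4 * m + p) / 6
    st = (p - o) / 6
    return et, st

et, st = pert(o=4, m=6, p=14)      # illustrative estimates, in days
print(et, st)                      # expected 7.0 days, st = 1.666...

# almost the entire distribution lies within (et - 3*st, et + 3*st)
low, high = et - 3 * st, et + 3 * st
```

Note how the pessimistic tail pulls the expected time (7.0) above the most likely estimate (6): PERT's weighted mean accounts for skewed risk in a way a single-point estimate cannot.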
The shaded part of the bar shows the length of time each task is estimated to take. The unshaded part shows the slack (lax) time: the leeway or flexibility available in meeting the latest time by which a task must be finished. A Gantt chart representation for the MIS problem is shown in the next slide.
Exercise: Suppose you are the project manager of a software project requiring the following activities:
(a) Draw the activity network representation of the project.
(b) Determine ES, EF and LS, LF for every task.
(c) Draw the Gantt chart representation of the project.
Software Testing
• The process of identifying the correctness, completeness and quality of the developed software.
• Helps in identifying errors, gaps or missing requirements.
• Can be done manually or using automation tools.
Why should we test?
• China Airlines Airbus crash, 1994
• Canada, 1985
• Windows 98 crashing live on CNN
• The real reason Boeing's new plane crashed twice
Who should do the testing?
• Testing requires the developers to find errors in their software, but it is difficult for a software developer to point out errors in their own creation.
• Many organisations have therefore made a distinction between the development and testing phases by making different people responsible for each phase.
Testing Principles
1. All tests should be traceable to customer requirements.
2. Tests should be planned long before testing begins.
3. The Pareto principle applies to software testing.
4. Testing should begin "in the small" and progress toward testing "in the large."
5. Exhaustive testing is not possible.
6. To be most effective, testing should be conducted by an independent third party.
Error, Mistake, Bug, Fault and Failure
Test, Test Case and Test Suite
• The terms "test" and "test case" are used interchangeably.
• A test case describes an input description and an expected output description.
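As a minimal illustration of these terms, the sketch below pairs inputs with expected outputs and records any failures; the absolute_value function is an invented stand-in for the module under test:

```python
# A test case = (input, expected output); a test suite = a set of test cases.
# absolute_value is an invented stand-in for the module under test.
def absolute_value(x):
    return x if x >= 0 else -x

test_suite = [
    # (input, expected output)
    (5, 5),
    (-3, 3),
    (0, 0),
]

failures = []
for given, expected in test_suite:
    observed = absolute_value(given)
    if observed != expected:          # observed differs => record the failure
        failures.append((given, expected, observed))

print("failures:", failures)          # failures: []
```

If the observed output of any case differed from its expected output, the mismatch would be recorded in `failures`, exactly as the slide requires.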
• If the expected output and the observed output differ, there is a failure, and it must be recorded properly.
• A set of test cases is called a test suite; hence any combination of test cases may form a test suite.
Verification and Validation
• Verification: "Does the product meet its specifications?"
• Validation: "Does the product perform as desired?"
V Model: STLC vs SDLC
Levels of Testing
• System Testing: all the system test cases, functional test cases and non-functional test cases are executed. In other words, the actual, full-fledged testing of the application takes place here. Defects are logged and tracked to closure.
• Integration Testing: a technique in which the unit-tested modules are integrated and tested to check whether the integrated modules render the expected results. In simpler words, it validates whether the components of the application work together as expected.
• Unit Testing: the process of testing one module at a time. One unit, that is, the smallest functional unit of the software, is tested at a time. Unit test cases are developed and executed by a developer or a tester with a good command of programming.
Types of Testing
• Static testing: code review.
• Dynamic testing: black box testing and white box testing.
Code Review
• Code review for a module is carried out after the module has been successfully compiled and all the syntax errors have been eliminated.
• Code reviews are extremely cost-effective strategies for reducing coding errors and producing high-quality code.
• Normally, two types of reviews are carried out on the code of a module: code inspection and code walk through.
Code Walk Throughs
• A code walk through is an informal code analysis technique.
• In this technique, after a module has been coded, successfully compiled and all syntax errors eliminated, a few members of the development team are given the code to read and understand a few days before the walk through meeting.
• Each member selects some test cases and simulates execution of the code by hand (i.e., traces execution through each statement and function call).
• The main objective of the walk through is to discover the algorithmic and logical errors in the code. The members note down their findings and discuss them in a walk through meeting at which the coder of the module is present.
Guidelines for walk throughs
• The team performing the code walk through should be neither too big nor too small: ideally, three to seven members.
• Discussion should focus on the discovery of errors, not on how to fix the discovered errors.
• In order to foster cooperation, and to avoid the feeling among engineers that they are being evaluated, managers should not attend the walk through meetings.
Code Inspection
• The aim of code inspection is to discover common types of errors caused by oversight and improper programming.
• In other words, during code inspection the code is examined for the presence of certain kinds of errors, in contrast to the hand simulation of code execution done in code walk throughs.
• In addition to the commonly made errors, adherence to coding standards is also checked during code inspection.
• Good software development companies collect statistics on the different types of errors commonly committed by their engineers and identify the types of errors most frequently committed. Such a list of commonly committed errors can be used during code inspection to look out for possible errors:
• Use of uninitialized variables.
• Jumps into loops.
• Non-terminating loops.
• Incompatible assignments.
• Array indices out of bounds.
• Improper storage allocation and deallocation.
• Mismatches between actual and formal parameters in procedure calls.
• Use of incorrect logical operators or incorrect precedence among operators.
• Improper modification of loop variables.
• Comparison of equality of floating-point variables, etc.
White Box Testing
• White box testing is a software testing technique in which the internal structure, design, coding and all the events are tested.
• As the name says, whatever is in the box, that is, all the internal modules of the software, is visible to the tester.
• This testing technique deals with verifying how the flow of inputs leads to the outputs.
• White box testing is also referred to as clear box testing, open box testing, or structural testing. It is usually performed by developers.
Types of White Box Testing
• Statement Coverage
• Branch Coverage
• Path Coverage
  o Condition Coverage (BCC)
  o Condition and Decision Coverage
  o Multiple Condition Coverage (MCC)
  o Multiple Condition/Decision Coverage (MC/DC)
Statement Coverage
• The statement coverage strategy aims to design test cases so that every statement in a program is executed at least once.
• The principal idea governing the statement coverage strategy is that unless a statement is executed, it is very hard to determine whether an error exists in it: very difficult to observe whether it causes a failure due to an illegal memory access, a wrong result computation, etc.
• However, executing a statement once and observing that it behaves properly for that input value is no guarantee that it will behave correctly for all input values.
Branch Coverage
• In the branch coverage-based testing strategy, test cases are designed to make each branch condition assume true and false values in turn.
• Branch testing is also known as edge testing, since in this testing scheme each edge of a program's control flow graph is traversed at least once.
• The difference from statement coverage is that statement coverage requires every statement to be covered at least once, whereas branch coverage requires every branch outcome, both true and false, to be exercised.
• It is obvious that branch testing guarantees statement coverage and is thus a stronger testing strategy than statement coverage-based testing.
Path Coverage
• In this technique, it is checked whether all paths through the program are covered at least once. This is mainly a detailed task that tests how a function performs as data flows through it, from the function's entry to its exit point.
• Note that both statement coverage and branch coverage are subsumed by this technique.
• The test cases for this technique are designed so that they traverse all loops and reach the final output, covering all the statements.
Basis Path Testing
• A white box method, proposed by McCabe (1976).
• A hybrid of the path testing and branch testing methods.
• Based on cyclomatic complexity, and uses the control flow graph to establish the path coverage criteria.
Cyclomatic Complexity
• Cyclomatic complexity is a software metric for measuring the complexity of a program. It is used to calculate the number of independent paths through which program control may flow.
• It measures the number of linearly independent paths through a program: the higher the number, the more complex the code.
• An independent path is a path that has at least one edge which has not been traversed by any other path.
Basis path testing approach
• Step 1: Draw a control flow graph.
• Step 2: Determine the cyclomatic complexity.
• Step 3: Find a basis set of paths.
• Step 4: Generate test cases for each path.
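Step 2 above can be checked mechanically once the control flow graph is written down. A small sketch, assuming an invented if-else control flow graph (node 1 branches to 2 or 3, which both join at 4):

```python
# Cyclomatic complexity V(G) = E - N + 2P for a control flow graph,
# where E = edges, N = nodes, P = connected components.
# The example graph is an invented if-else: 1 -> 2 or 1 -> 3, both -> 4.
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]
nodes = {n for e in edges for n in e}

E, N, P = len(edges), len(nodes), 1   # one connected graph
v_g = E - N + 2 * P
print(v_g)                            # 2 basis paths: 1-2-4 and 1-3-4
```

The answer agrees with the predicate-node rule: the graph has one predicate node (node 1, two outgoing edges), so V(G) = 1 + 1 = 2, and a basis set needs two paths.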
Basic Control Flow Graph Structures
• Arrows or edges represent flows of control.
• Circles or nodes represent actions.
• Areas bounded by edges and nodes are called regions.
• A predicate node is a node containing a condition, i.e., a node with more than one edge emanating from it.
Three equivalent ways to compute cyclomatic complexity:
• Cyclomatic complexity = edges - nodes + 2P, where P is the number of connected parts of the graph.
• Cyclomatic complexity = number of predicate nodes + 1.
• Cyclomatic complexity = number of regions in the control flow graph.
Worked examples: for a connected graph with 8 edges and 7 nodes, cyclomatic complexity = 8 - 7 + 2*1 = 3; for a graph with 7 edges, 8 nodes and two connected parts, cyclomatic complexity = 7 - 8 + 2*2 = 3.
Program flow graph vs. DD path graph (decision-to-decision path graph).
Multiple Condition/Decision Coverage (MC/DC)
• MC/DC ensures decision coverage as well as condition coverage.
• A test suite achieves MC/DC if, during execution of the test suite, each condition in a decision expression independently affects the outcome of the decision.
• Requirements:
1. Every decision expression in the program must take both true and false values.
2. Every condition in a decision must assume both true and false values.
3. Each condition in a decision must independently affect the decision's outcome.
• The number of test cases required is generally (number of conditions + 1).
Example: if ((A && B) || C)

Test case   A B C   Result   Independence pairs
1           T T T   T
2           T T F   T        A: [2,6]   B: [2,4]
3           T F T   T        C: [3,4]
4           T F F   F
5           F T T   T        C: [5,6]
6           F T F   F
7           F F T   T        C: [7,8]
8           F F F   F

A minimal MC/DC test suite must contain an independence pair for each condition, e.g. [2,3,4,6] or [2,4,5,6].
Exercises: for each of the following, draw the control flow graph, compute the cyclomatic complexity and find the basis paths:
• if ((A || B) && C)
• if ((A & B) & (C || D))
• The calculate method below (the numeric labels mark the control flow graph nodes):

public double calculate(int amount) {
    /*1*/  double rushCharge = 0;
    /*1*/  if (nextday.equals("yes")) {
    /*2*/      rushCharge = 14.50;
           }
    /*3*/  double tax = amount * .0725;
    /*3*/  if (amount >= 1000) {
    /*4*/      shipcharge = amount * .06 + rushCharge;
    /*5*/  } else if (amount >= 200) {
    /*6*/      shipcharge = amount * .08 + rushCharge;
    /*7*/  } else if (amount >= 100) {
    /*8*/      shipcharge = 13.25 + rushCharge;
    /*9*/  } else if (amount >= 50) {
    /*10*/     shipcharge = 9.95 + rushCharge;
    /*11*/ } else if (amount >= 25) {
    /*12*/     shipcharge = 7.25 + rushCharge;
           } else {
    /*13*/     shipcharge = 5.25 + rushCharge;
           }
    /*14*/ total = amount + tax + shipcharge;
    /*14*/ return total;
} // end calculate

• A program that checks whether a character entered by the user is a vowel or not.
Black Box Testing
• Black-box testing, also called behavioral testing, focuses on the functional requirements of the software. That is, black-box testing enables the software engineer to derive sets of input conditions that will fully exercise all functional requirements of a program.
• Black-box testing is not an alternative to white-box techniques. Rather, it is a complementary approach that is likely to uncover a different class of errors than white-box methods. Black-box testing attempts to find errors in the following categories:
1. incorrect or missing functions,
2. interface errors,
3.
errors in data structures or external database access,
4. behavior or performance errors, and
5. initialization and termination errors.
• Unlike white-box testing, which is performed early in the testing process, black-box testing tends to be applied during later stages of testing.
Types of Black Box Testing
1. Equivalence Partitioning
2. Boundary Value Analysis
3. Cause Effect Graphing
Equivalence Partitioning: Principles
• Exhaustive testing of the values in the input domain is impossible; one is limited to a small subset of all possible input values.
• One wants to select the subset with the highest probability of finding the most errors.
Equivalence Classes
• A well-selected set of input values should cover a large set of other input values.
• This property implies that one should partition the input domain into a finite number of equivalence classes.
• A test of a representative value of each class is equivalent to a test of any other value in that class.
Valid and Invalid Equivalence Classes
• The equivalence classes are identified by taking each input condition and partitioning its input domain into two or more groups.
• Two types of equivalence classes are identified: valid equivalence classes represent valid inputs to the program; invalid equivalence classes represent all other possible states of the condition.
• If an input condition specifies a range of values (e.g., the count can be from 1 to 999), it identifies one valid equivalence class (1 ≤ count ≤ 999) and two invalid equivalence classes (count < 1 and count > 999).
Partitioning Valid Equivalence Classes
• If elements in a valid equivalence class are not handled in an identical manner by the program, partition that equivalence class into smaller equivalence classes.
• Generate a test case for each valid and invalid equivalence class.
Example: inputs are integers greater than or equal to 0 and less than or equal to 100, and the valid input values are partitioned as 10 to 20 and 40 to 60.
The input domain can then be classified as:
• up to 9: invalid
• 10 to 20: valid
• 21 to 39: invalid
• 40 to 60: valid
• 61 and above: invalid
Now we may select one value from each class, say -7, 16, 31, 49 and 88, and see directly whether each is valid or invalid.
Example: An integer field shall contain values between 1 and 15 inclusive. Applying equivalence partitioning, what is the minimum number of test cases required for maximum coverage?
Options: 1 / 2 / 3 / 4
Partitions: < 1 (invalid), 1-15 (valid), ≥ 16 (invalid)
Example: In a system designed to work out the tax to be paid, an employee has Rs 4000 of salary tax free. The next Rs 1500 is taxed at 10%. The next Rs 28000 is taxed at 22%. Any further amount is taxed at 40%. Which of these groups of values falls into the same equivalence class?
1. 4800; 14000; 28000
2. 5200; 5500; 28000
3. 28001; 32000; 35000
4. 5800; 28000; 32000
Partitions: 1-4000 (valid), 4001-5500 (valid), 5501-33500 (valid), 33501 and above (valid)
Boundary Value Analysis
• When choosing values from an equivalence class to test, use the values that are most likely to cause the program to fail.
• For reasons that are not completely clear, a greater number of errors tends to occur at the boundaries of the input domain than in the "center." It is for this reason that boundary value analysis (BVA) has been developed as a testing technique: boundary value analysis leads to a selection of test cases that exercise bounding values.
• Boundary value analysis is a test case design technique that complements equivalence partitioning. Rather than selecting any element of an equivalence class, BVA leads to the selection of test cases at the "edges" of the class; and rather than focusing solely on input conditions, BVA derives test cases from the output domain as well.
• In addition to testing center values, we should also test boundary values:
1. right on a boundary;
2. very close to a boundary, on either side.
• The boundary value analysis test cases for a program with two input variables (x and y) that may take any value from 100 to 300 are: (200,100), (200,101), (200,200), (200,299), (200,300), (100,200), (101,200), (299,200) and (300,200). For a program of n variables, boundary value analysis yields 4n + 1 test cases.
Robustness Testing
• Robustness testing is an extension of boundary value analysis. Here, we would like to see what happens when the extreme values are exceeded, using a value slightly greater than the maximum and a value slightly less than the minimum.
• That is, we deliberately go outside the legitimate boundary of the input domain. Since there are additional test cases outside the legitimate input domain, the total number of test cases in robustness testing is 6n + 1, where n is the number of input variables.
Exercise: Consider the square root problem. Generate robust test cases for this problem.
Exercise: Consider a simple program to classify a triangle. Its input is a triple of positive integers (say x, y, z), and the data type of the input parameters ensures that these will be integers in the range [1,100]. The program output may be one of the following words: Scalene; Isosceles; Equilateral; Not a triangle.
Exercise: A field in an application accepts the age of the user as input. The values allowed by the field are between 18 and 30 years, both inclusive. Applying EP and BVA, which of the given options consists of valid boundary values and a valid equivalence value?
1. 17, 18, 20
2. 18, 30, 25
3. 18, 30, 31
4. 19, 20, 31
Partitions: ≤ 17 (invalid), 18-30 (valid), ≥ 31 (invalid)
Cause Effect Graphing
• One weakness of the equivalence class partitioning and boundary value analysis methods is that they consider each input separately: both concentrate on the conditions and classes of one input.
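The 4n + 1 and 6n + 1 test case counts discussed above for boundary value analysis and robustness testing can be generated mechanically. A sketch, using the two-variable 100-300 range from the BVA example (the generator itself is an illustration, not a standard library):

```python
# Generate boundary value analysis (4n + 1) and robustness (6n + 1)
# test cases for n input variables, holding the other variables at
# their nominal (mid-range) value.
def bva_cases(ranges, robust=False):
    nominal = tuple((lo + hi) // 2 for lo, hi in ranges)
    cases = {nominal}                      # the single "center" case
    for i, (lo, hi) in enumerate(ranges):
        points = [lo, lo + 1, hi - 1, hi]  # on and just inside the boundary
        if robust:
            points += [lo - 1, hi + 1]     # step just outside the domain
        for v in points:
            case = list(nominal)
            case[i] = v
            cases.add(tuple(case))
    return sorted(cases)

ranges = [(100, 300), (100, 300)]          # x and y, as in the example
print(len(bva_cases(ranges)))              # 9  = 4*2 + 1
print(len(bva_cases(ranges, robust=True))) # 13 = 6*2 + 1
```

For the two-variable example this reproduces the nine BVA cases listed in the text, including the nominal case (200, 200); the robust variant adds the out-of-range values 99 and 301 for each variable.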
• They do not consider combinations of input circumstances that may form interesting situations which should be tested.
• Cause effect graphing is a technique that aids in selecting combinations of input conditions in a systematic way, such that the number of test cases does not become unmanageably large.
• The technique starts with identifying the causes and effects of the system under test. A cause is a distinct input condition and an effect is a distinct output condition; each forms a node in the cause-effect graph.
• The conditions should be stated so that they can be set to either true or false.
• After identifying the causes and effects, for each effect we identify the causes that can produce that effect and how the conditions have to be combined to make the effect true. Conditions are combined using the Boolean operators "and (^)", "or (V)" and "not (~)".
• A test case is generated for each combination of conditions that makes some effect true.
Basic cause effect graph symbols
Decision table (X = don't care):

       Rule 1   Rule 2   Rule 3
C1     T        F        X
C2     T        F        X
C3     T        X        F
E1     T
E2              T
E3                       T

TRIANGLE PROBLEM
SOFTWARE TESTING STRATEGIES
• UNIT TESTING
• INTEGRATION TESTING
• SYSTEM TESTING
UNIT TESTING
• Unit testing focuses verification effort on the smallest unit of software design: the software component or module.
• Using the component-level design description as a guide, important control paths are tested to uncover errors within the boundary of the module.
• The relative complexity of the tests, and of the errors they uncover, is limited by the constrained scope established for unit testing. The unit test is white-box oriented, and the step can be conducted in parallel for multiple components.
Unit Test Considerations
• The module interface is tested to ensure that information properly flows into and out of the program unit under test.
• The local data structure is examined to ensure that data stored temporarily maintains its integrity during all steps in an algorithm's execution.
• Boundary conditions are tested to ensure that the module operates properly at boundaries established to limit or restrict processing.
• All independent paths (basis paths) through the control structure are exercised to ensure that all statements in the module have been executed at least once.
• Finally, all error-handling paths are tested.
In unit testing, test cases should uncover errors such as:
1. comparison of different data types,
2. incorrect logical operators or precedence,
3. expectation of equality when precision error makes equality unlikely,
4. incorrect comparison of variables,
5. improper or nonexistent loop termination,
6. failure to exit when divergent iteration is encountered,
7. improperly modified loop variables.
Drivers and Stubs
• In most applications a driver is nothing more than a "main program" that accepts test case data, passes such data to the component to be tested, and prints the relevant results.
• Stubs serve to replace modules that are subordinate to (called by) the component to be tested. A stub, or "dummy subprogram," uses the subordinate module's interface, may do minimal data manipulation, prints verification of entry, and returns control to the module undergoing testing.
INTEGRATION TESTING
• Data can be lost across an interface; one module can have an inadvertent, adverse effect on another; subfunctions, when combined, may not produce the desired major function; individually acceptable imprecision may be magnified to unacceptable levels; global data structures can present problems.
• Integration testing is a systematic technique for constructing the program structure while at the same time conducting tests to uncover errors associated with interfacing.
• The objective is to take unit-tested components and build a program structure that has been dictated by design.
Top-down Integration
• Top-down integration testing is an incremental approach to the construction of program structure.
• Modules are integrated by moving downward through the control hierarchy, beginning with the main control module (main program).
• Modules subordinate (and ultimately subordinate) to the main control module are incorporated into the structure in either a depth-first or a breadth-first manner.
• Depth-first integration integrates all components on a major control path of the structure. Selection of a major path is somewhat arbitrary and depends on application-specific characteristics. Selecting the left-hand path, components M1, M2 and M5 would be integrated first, then M8 or M6; then the central and right-hand control paths are built.
• Breadth-first integration incorporates all components directly subordinate at each level, moving across the structure horizontally. Components M2, M3, and M4 (a replacement for stub S4) would be integrated first; the next control level (M5, M6, and so on) follows.
Top-down Integration Procedure
• The main control module is used as a test driver, and stubs are substituted for all components directly subordinate to the main control module.
• Depending on the integration approach selected (i.e., depth-first or breadth-first), subordinate stubs are replaced one at a time with actual components.
• Tests are conducted as each component is integrated.
• On completion of each set of tests, another stub is replaced with the real component.
• Regression testing may be conducted to ensure that new errors have not been introduced.
• Drawback: problems occur when processing at low levels in the hierarchy is required to adequately test the upper levels. Stubs replace low-level modules at the beginning of top-down testing, so no significant data can flow upward in the program structure.
Bottom-up Integration
• Bottom-up integration testing, as its name implies, begins construction and testing with atomic modules (i.e., components at the lowest levels in the program structure).
• Because components are integrated from the bottom up, the processing required for components subordinate to a given level is always available, and the need for stubs is eliminated.
Bottom-up integration strategy
• Low-level components are combined into clusters (sometimes called builds) that perform a specific software subfunction.
• A driver (a control program for testing) is written to coordinate test case input and output.
• The cluster is tested.
• Drivers are removed and clusters are combined, moving upward in the program structure.
Example:
• Components are combined to form clusters 1, 2, and 3. Each of the clusters is tested using a driver (shown as a dashed block).
• Components in clusters 1 and 2 are subordinate to Ma. Drivers D1 and D2 are removed and the clusters are interfaced directly to Ma. Similarly, driver D3 for cluster 3 is removed prior to integration with module Mb.
• Both Ma and Mb will ultimately be integrated with component Mc, and so forth.
Regression Testing
• Each time a new module is added as part of integration testing, the software changes: new data flow paths are established, new I/O may occur, and new control logic is invoked. These changes may cause problems with functions that previously worked flawlessly.
• In the context of an integration test strategy, regression testing is the re-execution of some subset of tests that have already been conducted, to ensure that changes have not propagated unintended side effects.
• The regression test suite (the subset of tests to be executed) contains three different classes of test cases:
1. A representative sample of tests that will exercise all software functions.
2. Additional tests that focus on software functions that are likely to be affected by the change.
3. Tests that focus on the software components that have been changed.
VALIDATION TESTING
• Validation succeeds when the software functions in a manner that can be reasonably expected by the customer.
• Software validation is achieved through a series of black-box tests that demonstrate conformity with requirements.
• A test plan outlines the classes of tests to be conducted, and a test procedure defines the specific test cases that will be used to demonstrate conformity with requirements.
• Both the plan and the procedure are designed to ensure that all functional requirements are satisfied, all behavioral characteristics are achieved, all performance requirements are attained, documentation is correct, and human-engineering and other requirements are met.
Acceptance Testing
• When custom software is built for one customer, a series of acceptance tests is conducted to enable the customer to validate all requirements.
• Conducted by the end-user rather than by software engineers, an acceptance test can range from an informal "test drive" to a planned and systematically executed series of tests.
• In fact, acceptance testing can be conducted over a period of weeks or months, thereby uncovering cumulative errors that might degrade the system over time.
• If software is developed as a product to be used by many customers, it is impractical to perform formal acceptance tests with each one. Most software product builders use a process called alpha and beta testing to uncover errors that only the end-user seems able to find.
Alpha Testing
• The alpha test is conducted at the developer's site by a customer. The software is used in a natural setting with the developer "looking over the shoulder" of the user and recording errors and usage problems.
• Alpha tests are conducted in a controlled environment.
Beta Testing
• The beta test is conducted at one or more customer sites by the end-users of the software. Unlike alpha testing, the developer is generally not present.
• Therefore, the beta test is a "live" application of the software in an environment that cannot be controlled by the developer.
• The customer records all problems (real or imagined) encountered during beta testing and reports these to the developer at regular intervals.
• As a result of problems reported during beta tests, software engineers make modifications and then prepare for release of the software product to the entire customer base.
SYSTEM TESTING
• Of the three levels of testing, the system level is the closest to everyday experience.
• System testing is actually a series of different tests whose primary purpose is to fully exercise the computer-based system. Although each test has a different purpose, all work to verify that system elements have been properly integrated and perform their allocated functions.
• The goal is not so much to find faults as to demonstrate performance; because of this, we tend to approach system testing from a functional standpoint rather than a structural one.
• Since it is so intuitively familiar, system testing in practice tends to be less formal than it might be, a problem compounded by the reduced testing interval that usually remains before a delivery deadline.
• During system testing, we should evaluate a number of attributes of the software that are vital to the user, by using recovery testing, security testing, stress testing and performance testing.
Recovery Testing
• Many computer-based systems must recover from faults and resume processing within a prespecified time. Recovery testing is a system test that forces the software to fail in a variety of ways and verifies that recovery is properly performed.
• If recovery is automatic (performed by the system itself), reinitialization, checkpointing mechanisms, data recovery, and restart are evaluated for correctness.
• If recovery requires human intervention, the mean time to repair (MTTR) is evaluated to determine whether it is within acceptable limits.
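A recovery test along these lines can be sketched as follows: force a failure, then verify that restart restores the checkpointed state. FlakyService and its checkpoint/restart behavior are invented stand-ins for a system with automatic recovery:

```python
# Sketch of a recovery test: force the component to fail, then verify that
# reinitialization and checkpointing work. FlakyService is invented.
class FlakyService:
    def __init__(self):
        self.checkpoint = 0
        self.running = True

    def process(self, n):
        self.checkpoint = n          # checkpointing mechanism

    def crash(self):
        self.running = False         # simulated fault

    def restart(self):
        self.running = True          # automatic recovery path
        return self.checkpoint       # state restored from the checkpoint

svc = FlakyService()
svc.process(41)
svc.crash()                          # force the software to fail
restored = svc.restart()             # verify recovery is properly performed
print("recovered at checkpoint", restored)   # recovered at checkpoint 41
```

The test passes only if the service is running again after the forced failure and the checkpointed state survived the crash, which is exactly what the slide asks recovery testing to evaluate.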
Security Testing
• Any computer-based system that manages sensitive information, or causes actions that can improperly harm (or benefit) individuals, is a target for improper or illegal penetration.
• Security testing attempts to verify that the protection mechanisms built into a system will, in fact, protect it from improper penetration.
• During security testing, the tester plays the role(s) of the individual who desires to penetrate the system: the tester may attempt to acquire passwords through external clerical means; may attack the system with custom software designed to break down any defenses that have been constructed; may overwhelm the system, thereby denying service to others; may purposely cause system errors, hoping to penetrate during recovery; or may browse through insecure data, hoping to find the key to system entry.
Stress Testing
• During earlier software testing steps, white-box and black-box techniques result in a thorough evaluation of normal program functions and performance. Stress tests, in contrast, are designed to confront programs with abnormal situations.
• In essence, the tester who performs stress testing asks: "how high can we crank this up before it fails?"
• Stress testing executes a system in a manner that demands resources in abnormal quantity, frequency, or volume. Examples:
1. Special tests may be designed that generate ten interrupts per second when one or two is the average rate.
2. Input data rates may be increased by an order of magnitude to determine how input functions will respond.
3. Test cases that require maximum memory or other resources may be executed.
Performance Testing
• For real-time and embedded systems, software that provides the required functions but does not conform to performance requirements is unacceptable.
• Performance testing is designed to test the run-time performance of software within the context of an integrated system.
• Performance testing occurs throughout all the steps in the testing process.
• Even at the unit level, the performance of an individual module may be assessed as white-box tests are conducted. However, it is not until all system elements are fully integrated that the true performance of a system can be ascertained.
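A unit-level performance check of the kind described can be sketched as follows; the function under test and the 50 ms budget are invented for illustration:

```python
# Sketch of a unit-level performance check: time a function and compare
# against a performance budget. Function and budget are illustrative only.
import time

def under_test(n):
    return sum(i * i for i in range(n))

start = time.perf_counter()
result = under_test(100_000)
elapsed = time.perf_counter() - start

BUDGET_SECONDS = 0.05               # invented performance requirement
print(f"{elapsed:.4f}s", "PASS" if elapsed <= BUDGET_SECONDS else "FAIL")
```

As the text notes, a check like this at the unit level is only indicative; the true performance of the system can be ascertained only once all elements are integrated and measured together.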