ESE Module 3-3 Structured Analysis Part 2: Data Flow Modeling All software, whether it's used to produce paychecks or control a supersonic aircraft, is an information transformer. That is, all software accepts input, transforms it in some way, and produces output as a result. In essence, data flow into a system, get changed, and ultimately flow out. As we create the analysis model, our challenge is to understand this transformation. How are data changed as they flow through a system? What functions are required to transform input into output? These questions are answered during the second analysis modeling activity-the creation of a data flow model. The data flow model kills two birds with one stone. It indicates the flow of data and it also provides us with a mechanism for functional decomposition. Hence we address two important analysis principles: (1) understanding the flow of data through a system, and (2) partitioning (decomposing) function into progressively lower levels of abstraction. (See ESE Module 3-2 for details.) Flow Modeling Notation Data flow notation is surprisingly simple, and yet the models that it produces can be quite sophisticated. The basic data flow icons address only four things: producers and consumers of data; processes that transform data; the data flow itself; and places where data are stored. Armed with these icons, the software engineer (analyst) can produce a picture of the software (and its external environment) that establishes the basis for architectural design. Readings The following excerpt has been adapted from Software Engineering: A Practitioner's Approach and discusses data flow notation. Extensions for real-time software (not discussed in the video portion of this ESE Module) are presented briefly. Information is transformed as it flows through a computerbased system. The system accepts input in a variety of forms; applies hardware, software and human elements to transform input into output; and produces output in a variety of forms. Input may be a control signal transmitted by a transducer, a series of numbers typed by a human operator, a packet of information transmitted on a network link or a voluminous data file retrieved from secondary storage. The transform(s) may comprise a single logical comparison, a complex numerical algorithm or rule-inference approach of an expert system. Output may light a single LED or produce a 200-page report. In effect, we can create a flow model for any computer-based system, regardless of size and complexity. Structured analysis is an information flow and content modeling technique. A computer-based system is represented as an information transform. Overall function of the system is represented as a single information transform, represented as a circle or bubble. One or more inputs, shown as labeled arrows, originate from external entities, represented as a box. The input drives the transform to produce output information (also represented as labeled arrows) that is passed to the external entities. It should be noted that the model may be applied to the entire system or to the software element only. The key is to represent the information fed into and produced by the transform. Data Flow Diagrams As information moves through software, it is modified by a series of transformations. A data flow diagram (DFD) is a graphical technique that depicts information flow and the transforms that are applied as data move from input to output. The DFD is also known as a data flow graph or a bubble chart. The data flow diagram may be used to represent a system or software at any level of abstraction. In fact, DFDs may be partitioned into levels that represent increasing information flow and functional detail. A level 0 DFD, also called a fundamental system model or a context model, represents the entire software element as a single bubble with input and output data indicated by incoming and outgoing arrows, respectively. Additional processes (bubbles) and information flow paths are represented as the level 0 DFD is partitioned to reveal more detail. For example, a level 1 DFD might contain five or six bubbles with interconnecting arrows. Each of the processes represented at level 1 are subfunctions of the overall system depicted in the context model. A rectangle is used to represent an external entity, that is, a system element (e.g., hardware, a person, another program) or another system that produces information for transformation by the software or that receives information produced by the software. A circle represents a process or transform that is applied to data (or control) items and changes them in some way. An arrow represents one or more data items. All arrows on a data flow diagram should be labeled. The double line represents a data store--stored information that is used by the software. The simplicity of DFD notation is one reason why structured analysis techniques are the most widely used. 3-3p2.2 ·· Essential Software Engineering It is important to note that no explicit indication of the sequence of processing is supplied by the diagram. Procedure or sequence may be implicit in the diagram, but explicit procedural representation is generally delayed until software design. processing narrative described the input to the bubble, the algorithm that is applied to the input and the output that is produced. In addition, the narrative indicates restrictions and limitations imposed on the process, performance characteristics that are relevant to the process, and design constraints that may influence the way in which the process will be implemented. Extensions for Real-Time Systems Many software applications are time dependent, and process as much or more control-oriented information as data. [For a detailed discussion of these real-time systems see Chapter 15 of Software Engineering: A Practitioner's Approach.] For now, suffice it to say that a real-time system must interact with the real world in a timeframe dictated by the real world. Aircraft avionics, manufacturing process control, consumer products and industrial instrumentation are but a few of hundreds of real-time software applications. Ward and Mellor Extensions As we noted earlier, each of the bubbles may be refined or layered to depict more detail. Figure 1 illustrates this concept. A fundamental model for system F indicates the primary input is A and ultimate output is B. We refine the F model into transforms f1to f7. Note that information flow continuity must be maintained, that is, input and output to each refinement must remain the same. This concept, sometimes called balancing, is essential for the development of consistent models. Further refinement of f4 depicts detail in the form of transforms f41 to f45 Again, the input (X,Y) and output (Z) remain unchanged. The data flow diagram is a graphical tool that can be very valuable during software requirements analysis. However, the diagram can cause confusion if its function is confused with the flowchart. A data flow diagram depicts information flow without explicit representation of procedural logic (e.g., conditions or loops). It is not a flowchart with rounded edges! The basic notation used to develop a DFD is not in itself sufficient to describe requirements for software. For example, an arrow shown in a DFD represents a data item that is input to or output from a process. A data store represents some organized collection of data. But what is the content of the data implied by the arrow or depicted by the store? If the arrow (or the store) represents a collection of items, what are they? These questions are answered by applying another component of the basic notation for structured analysis--the data dictionary. [The format and use of the data dictionary are presented in the third part of this ESE Module.] The graphical notation for DFDs must be augmented with descriptive text. A processing narrative--a paragraph that describes a process bubble--can be used to specify the processing details implied by the bubble within a DFD. The Ward and Mellor [1] extend basic structured analysis notation to accommodate the following demands imposed by a real-time system: information flow that is gathered or produced on a time-continuous basis; control information passed throughout the system and associated control processing; multiple instances of the same transformation are sometimes encountered in multitasking situations; system states and the mechanism that causes transition between states. In a significant percentage of real-time applications, the system must monitor time-continuous information generated by some real world process. For example, a realtime test monitoring system for gas turbine engines might be required to monitor turbine speed, combustor temperature, and a variety of pressure probes on a continuous basis. Conventional data flow notation does not make a distinction between discrete data and time-continuous data. An extension to basic structured analysis notation, shown in Figure 2, provides a mechanism for representing time-continuous data flow. The double-headed arrow is used to represent time-continuous flow; a single-headed arrow is used to indicate discrete data flow. In the figure, monitored temperature is measured continuously and a single value for temperature set-point is also provided. The process shown in the figure produces a time-continuous output, corrected value. The distinction between discrete and time-continuous data flow has important implications for both the system engineer and the software designer. During the creation of the system model, a system engineer will be better able to isolate those processes that may be performance critical. (It is often likely that the input and output of time-continuous data will be performance sensitive.) As the physical or implementation model is created, the designer must establish a mechanism for collection of time-continuous data. Obviously, the digital system collects data in a quasi-continuous fashion using techniques such as high-speed polling. The notation indicates where analog to digital hardware will be required and which transforms are likely to demand high-performance software. In conventional data flow diagrams, control items or event flows are not represented explicitly. In fact, the analyst is cautioned specifically to exclude the representation of control flow from the data flow diagram. This exclusion is overly restrictive when real-time applications are considered; for this reason, a specialized notation for representing event flows and control processing has been developed. Continuing the convention established for data flow diagrams, data flow is represented using a solid arrow. Control flow, however, is represented using a dashed or shaded arrow. A process that handles only control flows, called a control process, is similarly represented using a dashed bubble. Structured Analysis: Data Flow Modeling .. 3-3p2.3 Control flow can be input directly to a conventional process or into a control process. Figure 3 illustrates control flow and processing as it would be represented using Ward and Mellor notation. The figure illustrates a toplevel view of a data and control flow for a manufacturing cell. [A manufacturing cell is used in factory automation applications. It contains computers and automated machines (e.g., robots, NC machines, specialized fixtures) and performs one discrete manufacturing operation under computer control.] As components to be assembled by a robot are placed on fixtures, a status bit is set within a parts status buffer (a control store) that indicates the presence or absence of each component. Event information contained within the parts status buffer is passed as a bit string to a process, monitor fixture and operator interface. The process will read operator commands only when the control information, bit string, indicates that all fixtures contain components. An event flag, start/stop flag, is sent to robot initiation control, a control process that enables further command processing. Other data flows occur as a consequence of the process activate event that is sent to process robot commands. In some situations multiple instances of the same control or data transformation process may occur in a realtime system. This can occur in a multitasking environment when tasks are spawned as a result of internal processing or external events. 1] Ward, P.T., and S. Mellor, Structured Development for Real-Time Systems (3 volumes), Yourdon Press, 1985. i~ Exercise 3-7 Data Flow Modeling Recall the electronic checkbook problem introduced in ESE Module 3-2. Assume that you work for a consumer products company that is about to build an electronic checkbook, called ElectroChex. The product, about the size and shape of a standard checkbook will print checks that you insert into a slot at the end. The product stores up to 256 payee names, categorizes payments, allows you to enter numeric and alpha information via a qwerty keyboard and has communication capabilities to PCs. 1. Develop Level 0, 1 and 2 data flow diagrams for ElectroChex. 2. Be sure to identify all external entities. 3. Be sure to label all bubbles (transforms) and 3-3p2.4 ·· Essential Software Engineering arrows in your DFDs. 4. Review your results with those developed by your colleagues. Hint: If you have trouble starting this exercise, think back to our discussion of the grammatical parse in ESE Module 3-3. part 1 . You'll recall that each of the active verbs in the statement of scope represents a potential function for the system. The functions translate into bubbles at level 1. Creating DFDs Although DFD notation is quite simple, the creation of a DFD is more difficult than you might think. The reason: most software people have been trained to think procedurally. Data flow modeling has a procedural component to it, but it is a process of refinement based on how data flows, not on how program logic progresses. For this reason, it may seem a bit odd to you if this is your first introduction. Readings The following excerpt has been adapted from Software Engineering: A Practitioner's Approach and discusses the mechanics for creating a data flow diagram. The data flow diagram (DFD) enables the software engineer to develop models of the information domain and functional domain at the same time. As the DFD is refined into greater levels of detail, the analyst performs an implicit functional decomposition of the system. At the same time, the DFD refinement results in a corresponding refinement of data as it moves through the processes that embody the application. A few simple guidelines can aid immeasurably during derivation of a data flow diagram: (1) the level 0 data flow diagram should depict the software/system as a single bubble; (2) primary input and output should be carefully noted; (3) refinement should begin by isolating candidate processes, data items and stores to be represented at the next level; (4) all arrows and bubbles should be labeled with meaningful names; (5) information flow continuity must be maintained from level to level; (6) one bubble at a time should be refined. There is a natural tendency to overcomplicate the data flow diagram. This occurs when the analyst attempts to show too much detail too early or represents procedural aspects of the software in lieu of infermation flow. To illustrate the use of these basic guidelines, the SafeHome security system example, introduced earlier, will be used. A processing narrative for SafeHome is reproduced below: SafeHome software enables the homeowner to configure the security system when it is installed, monitors all sensors connected to the security system, and interacts with the homeowner through a keypad and function keys contained in the SafeHome control panel. During installation, the SafeHome control panel is used to program and configure the system. Each sensor is assigned a number and type, a master password is programmed for arming and disarming the system, and telephone number(s) are input for dialing when a sensor event occurs. When software senses an event, it rings an audible alarm attached to the system. After a specified delay time set by the homeowner during system configuration activities, the software dials the telephone number of a monitoring service, provides information about the location and the nature of the event that it detected. The software will re-dial the number every 20 seconds until telephone connection is obtained. All interaction with SafeHome is managed by a userinteraction subsystem that reads input provided through the keypad and function keys and displays prompting messages on the LCD display, displays system status infermation on the LCD display. Keyboard interaction takes the following form ... A level 0 DFD for SafeHome is shown in Figure 1. The primary external entities (boxes) produce information for use by the system and consume information generated by the system. The labeled arrows represent composite data items, that is, a data item that is actually a collection of many additional data items. For example, user commands and data encompasses all configuration commands, all activation/deactivation commands, all miscellaneous interactions, and all data that are input to qualify or expand a command. The level 0 DFD is now expanded into a level 1 model. But how do we proceed? A simple yet effective approach is to perform a grammatical parse on the processing narrative that described the context-level bubble. That is, we isolate all nouns (and noun phrases) and verbs (and verb phrases) in the narrative presented above. To illustrate, we again reproduce the processing narrative, underlining the first occurrence of all nouns and italicizing the first occurrence of all verbs: Structured Analysis: Data Flow Modeling .. 3-3p2.5 SafeHome software enables the homeowner to configure the security system when it is installed, monitors all sensors connected to the security system, and interacts with the homeowner through a keypad and function keys contained in the SafeHome control panel. During installation, the SafeHome control panel is used to program and configure the system. Each sensor is assigned a number and type, a master password is programmed for arming and disarming the system, and telephone number(s) are input for dialing when a sensor event occurs. When software senses an event, it rings an audible alarm attached to the system. After a specified delay time set by the homeowner during system configuration activities, the software dials the telephone number of a monitorine service, provides information about the location and the nature of the event that it detected. The software will re-dialed the number every 2 seconds until telephone connection is obtained. All interaction with SafeHome is managed by a userinteraction subsystem that reads input provided through the keypad and function keys and displays prompting messages on the LCD display. displays system status information on the LCD display. Keyboard interaction takes the following form ... It should be noted that nouns and verbs that are synonyms or have no direct bearing on the modeling process are omitted. Referring to the grammatical parse, a pattern begins to emerge. All verbs are SafeHome processes; that is, they may ultimately be represented as bubbles in a subsequent DFD. All nouns are either external entities (boxes), data or control items (arrows), or data stores (double lines). Note further that nouns and verbs can be attached to one another (e.g., sensor is assigned number and ~. Therefore, by performing a grammatical parse on the processing narrative for a bubble at any DFD level, we can generate much useful information about how to proceed with the refinement to the next level. Using this information, a level 1 DFD is shown in Figure 2. The context-level process shown in Figure 1 has been expanded into seven processes derived from an examination of the grammatical parse. Similarly, the information flow between processes at level 1 has been derived from the parse. It should be noted that information flow continuity is maintained between levels 0 and 1. Elaboration of the content of inputs and output at DFD levels 0 and 1 is postponed until later. The processes represented at DFD level 1 can be further refined into lower levels. For example, the process monitor sensors can be refined into a level 2 DFD as shown in Figure 3. Note once again that information flow continuity has been maintained between levels. The refinement of DFDs continues until each bubble performs a simple function; that is, until the process represented by the bubble performs a function that would be easily implemented as a program component. In [ESE Module 3-4] we discuss a concept called cohesion that can be used to assess the simplicity of a given function. For now, we strive to refine DFDs until each bubble is singleminded. Creating a Control Flow Model For many types of data processing applications, data flow modeling is all that is necessary to obtain meaningful insight into software requirements. As we have already noted, however, there exists a large class of applications that are driven by events rather than data; that produce control information rather than reports or displays; that process information with heavy concern for time and performance. Such applications require the use of control flow modeling in addition to data flow modeling. To review the approach for creating a control flow diagram (CFD), a data flow model is stripped of all data flow arrows. Events and control items (dashed arrows) are then added to the diagram and a "window" (a vertical bar) into the control specification is shown. But how are events selected? We have already noted that an event or control item is implemented as a boolean value (e.g., true or false, on or 3-3p2.6 ·· Essential Software Engineering off, 1 or 0) or a discrete list of conditions (empty, jammed, full). To select potential candidate events, the following guidelines are suggested: list all sensors that are read by the software; list all interrupt conditions; list all switches that are actuated by the operator list all data conditions recalling the noun-verb parse that was applied to the processing narrative, review all control items as possible control specification (CSPEC) inputs/outputs: describe the behavior of a system by identifying its states; identify how each state is reached and define the transitions between states focus on possible omissions--a very common error in specifying control (e.g., ask: "Is there any other way I can get to this state or exit from it?") A level 1 CFD for SafeHome software is illustrated in Figure 4. Among the events and control items noted are sensor event (i.e., a sensor has been tripped), blink flag (a signal to blink the LCD display) and start/stop switch (a signal to turn the system on or off). When the event flows into the CSPEC window from the outside world, it implies that the CSPEC will activate one or more of the processes shown in the CFD. When a control item emanates from a process and flows into the CSPEC window, control and activation of some other process or an outside entity is implied. Exercise 3-8, Data Flow Diagrams Read the following problem description and then perform the tasks required to build a model using structured analysis. Problem Statement: The Department of Public Works for a large city has decided to develop a computerized pothole tracking and repair system (PHTRS). As potholes are reported, they are assigned an identifying number, stored by street address, size (on a scale of 1 to 10), location (middle, curb, etc.), district (determined from street address), and repair priority (determined from the size and location of the pothole). Work order data associated with each pothole includes pothole location and size, repair crew identifying number, number of people on crew, equipment assigned, hours applied to repair, hole status (work in progress, repaired, temporary repair, not repaired), amount of filler material used and cost of repair (computed from hours applied, number of people, material and equipment used). Finally, a damage file is created to hold information about reported damage due to the pothole and includes citizen's name, address, phone number type of damage; and dollar amount of damage. PHTRS is an on-line system; queries are to be made interactively. The user interacts with the system using a city map (with appropriate zoom capabilities) that enables pothole information to be entered and displayed. For example, all streets in a five-block section of the city can be displayed with each pothole located. 1. Using the grammatical parse, make lists of potential external entities, data objects and data stores (nouns) and system functions (verbs). 2. Build a paper prototype of the user interface for PHTRS. 3. Develop a level 0 DFD. 4. Develop level 1 and level 2 DFDs. 5. Develop an ERD (ESE Module 3-3, part 1) for PHTRS. 6. Review your results with those developed by a colleague Use separate sheets of paper for your solution to this exercise. Structured Analysis: Data Flow Modeling .. 3-3p2.7 Post-test, Module 3-3, part 2 This post-test has been designed to help you assess the degree to which you've absorbed the information presented in this ESE module. Data Flow Modeling 1. Every computer-based system is: a. a data transform b. a computational transform c. a logic transform d. a procedural transform 2. An external entity is represented in a DFD as a box. It represents: a. a producer or consumer of information b. a part of the software to be built c. the data dictionary d. none of the above 3. Using the basic rules that were described for creating DFDs, which of the following would represent a violation of the rules: a. data flowing between two bubbles in the DFD b. an labeled arrow c. data flowing directly between a box and a data store d. a labelled bubble e. all are valid according to the rules 4. Partitioning is accomplished in part by: a. considering composite data items b. representing data flow at a number of different levels c. using the data dictionary d. all of the above contribute to partitioning 5. Balancing means: a. that the data flow should not be visually lopsided b. that input and output flow at one level must be maintained at the next level c. that the same number of inputs and outputs must always be present d. none of the above 6. A level 1 data flow diagram (the first refinement of the context level) would typically have the following number of bubbles: a. 1 or 2 b. 4 to 7 c. 7 to 14 d. more than 14 7. A data store could be: a. a file b. a buffer c. a relational data base d. all of the above 8. A single data item flows into a process bubble, is transformed and then flows out of a process bubble, is stored in a buffer, and then is extracted from the buffer for use in another function, which produces a single output item. The total number of DFD symbols required to depict this situation (not counting labeling) is: a. 3 b. 5 c. 7 d. 9 9. Most systems require from m to n levels of data flow refinement, where m to n is: a. 1 to 2 b. 3 to 7 c. 5 to 9 d. 7 to 10 10. What structured analysis notation provides infermation about the internal working of a process bubble? a. PSPEC b. statement of scope c. data dictionary d. ERD 11. The data flow diagram maps into: a. a behavioral design b. a procedural design c. a data design d. an architectural design 12. An arrow in a flow model can represent: a. data b. data and control c. data and relationships d. data and logical flow Copyright 1995 R.S. Pressman & Associates, Inc. No part of this material may be electronically copied, transmitted, or posted without the express written permission of R.S. Pressman & Associates, Inc. These materials have been posted on the Local Web Server of the Department of Computer Science with permission of R.S. Pressman & Associates, Inc. and are for use only by students registered for DCS/235 at Queen Mary and Westfield College