Part III: The Analysis Process Lecture Note 9 Analyzing Systems Using Data Dictionaries Systems Analysis and Design Kendall & Kendall Sixth Edition Major Topics • • • • • • • Data dictionary concepts Defining data flow Defining data structures Defining elements Defining data stores Using the data dictionary Data dictionary analysis 2 Data Dictionary • Is a main method for analyzing the data flows and data stores of Data-Oriented Systems. • Is a reference work of data about data (that is , metadata), one that is compiled (編輯) by systems analysts to guide them through analysis and design. • As a document, It collects, coordinates (協調), and confirms (確定) what a specific data term means to different people in the organization. • Objective is to provide clear, comprehensive (全面的, 綜合的) information about the data and processes that make up the system. • Data dictionaries contain: Data Flow, Data Structures, Data Elements, Data Stores. 3 Reasons for Using a Data Dictionary A set of DFDs products a logical model of the system, but the details within those DFDs are documented separately in a data dictionary, which is the second component of structured analysis. • The data dictionary may be used for the following reasons: – – – – – – Provide documentation. Eliminate redundancy. (Clean data E.g. sex=“M”, / 1 / “Male” ) Validate the DFD for completeness and accuracy. Provide a starting point for developing screens and reports. Determine the contents of data stored in files. To develop the logic for DFD processes. 4 The Repository • Although the data dictionary contains information about data and procedure. A central storehouse of large collection of information about the system’s data is called a Data Repository. • It includes: – Information about system data. – Procedural logic. – Screen and report design. – Relationships between entries. – Project requirements and deliverables (陳述). – Project management information. 5 How Data Dictionary relate to DFD • A Data Element, also called a data item or field, is the smallest piece of data that has meaning within an information system. • Data element are combined into Record, also called Structures. • A record is meaningful combination of related data elements that is included in a data flow or retained in a data store. 6 An order form from World’s Trend Catalog Division • This company clothing and other items by mail order using a toll-free phone order system and via the Internet using customized Web forms. • This form gives some clues (綫 索) about what to enter into a data dictionary. • • • First, you need to capture and store the name, address and telephone number of the person placing the order. Then you need to address the details of the order: the item description, size, colour, price, quantity and so on. The customer’s method of payment must also be determined. 7 Defining the Data Flow • • You must document all data flows in the data dictionary. Data flow are usually the first components to be defined. Each data flow should be defined with descriptive information and its composite structure or elements. 8 Include the following information: 1. 2. 3. 4. 5. 6. 7. 8. 9. ID – an optional identification number. A unique descriptive name for this data flow. This name is the text that should appear on the diagram and be referenced in all descriptions using the data flow. A general description of the data flow. The source of the data flow. The source could be an external entity, a process, or a data flow coming from a data store. The destination (終點) of the data flow. The status of data flow, either: • A record entering or leaving a file. • Containing a report, form, or screen. • Internal - used between processes. The name of the data structure describing the elements found in this data flow, it could be one or several elements. The volume (容量) per unit of time. • The data could be records per day or any other unit of time. An area for further comments and notations about the data flow. 9 Data Flow Example1 Indicates that the flow represents an input screen. It could be any screen, such as a mainframe, GUI or Web page. Visible Analyst was used to create the form. 10 Defining Data Structures A data structure is a collection of symbols and elements describing relationship between different data. A more complete data structure is formed from the collection of these structures. • Data structures are a group of smaller structures and elements. • An algebraic notation is used to represent the data structure. • This method allows the analyst to produce a view of the elements that make up the data structure along with information about those elements. • The analyst define these structural records and elements once and use them in many different applications. 11 Algebraic Notation The symbols used are: – Equal sign “=”, meaning “consists of”. – Plus sign (+), meaning "and”. – Braces {} meaning repetitive (重複) elements, a repeating element, group of elements, or tables. • A repeating group may be: – A sub-form. – A screen or form table. – A program table, matrix, or array. • There may be one repeating element or several within the group. • The repeating group may have: – Conditions. – A fixed number of repetitions. – Upper and lower limits for the number of repetitions. – Brackets [ ] for an either/or situation. • Either one element may be present or another, but not both. The elements listed inside are mutually exclusive. – Parentheses () for an optional element. 12 Data Structures Example 1 Customer Order = This is a example of the data structure for Adding a Customer Order at World’s Trend Catalog Division. • Each NEW CUSTOMER screen consists of the entries found on the right side of the equal signs. Some of the entries are elements, but other, such as CUSTOMER NAME, ADDRESS and TELEPHONE, are groups of elements or structural records. E.g. customer name is made up of FIRST NAME, MIDDLE INITIAL and LAST NAME. • Each structural record must be defined until the entire set is broken down into its component elements. Customer Name = Address = Telephone = Available Order Items = Method of Payment = Credit Card Type = Customer Number + Customer Name + Address + Telephone + Catalog Number + Order Date + {Available Order Items} + Merchandise Total + (Tax) + Shipping and Handling + Order Total + Method of Payment + (Credit Card Type) + (Credit Card Number) + (Expiration Date) First Name + (Middle Initial) + Last Name Street + (Apartment) + City + State + Zip + (Zip Expansion) + (Country) Area Code + Local Number Quantity Ordered + Item Number + Item Description + Size + Color + Price + Item Total [Check | Charge | Money Order] [World's Trend | Amer. Express | Discover | MasterCard | Visa] Physical and Logical Data Structures • Data structures may be either logical or physical. – When data structures are first defined, only the data elements that the user would see. This stage is the logical design, showing what data the business needs for its dayto-day operations. – Then analyst designs the physical data structures. • Logical data structures indicate the composition of the data familiar to the user. • The physical data structures which include additional elements necessary for implementing the system. 14 • Additional physical elements include: – Key fields used to locate records in the database table. An example is an item number, which is not required for a business to function but is necessary for identifying and locating computer record. – Codes to indicate record status, such as if an employee is active (currently employed) or inactive. – Transaction codes are used to identify types of records when a file contains different record types. An example is a credit file containing records for returned items as well as records of payment. – Repeating group entries containing a count of how many items are in the group. – Limits on the number of items in a repeated group. – A password used by a customer accessing a secure Web site. 15 • A structure may consist of elements or smaller structural records. • These are a group of fields, such as: – Customer Name. – Address. – Telephone. • Each of these must be further defined until only elements remain. • Structural records and elements that are used within many different systems should be given a nonsystem-specific name, such as street, city, and zip. • The names do not reflect a functional area. • This allows the analyst to define them once and use in many different applications. Structural Records Customer Name =First Name + (Middle Initial) + Last Name Address = Street + (Apartment) + City + State + Zip + (Zip Expansion) + (Country) Telephone = Area code + Local number 16 Data Structure Example 2 This is a example of the data structure for a CUSTOMER BILLING STATEMENT, one showing that the ORDER LINE is both a repeating item and a structure record. The ORDER LINE limits are from 1 to 5, indicating that the customer may order from one to five items on this screen. Additional items would appear on subsequent orders. 17 Defining Elements • Data elements should be defined with descriptive information in the data dictionary, length and type of data information, validation criteria, and default values. • Each element should be defined once in the data dictionary. • The objective is to provide clear, comprehensive information about the data and processes that make up the system. 18 • Attributes of each element are: 1. Element ID. This is an optional entry that allows the analyst to build automated data dictionary entries. 2. The name of the element, descriptive and unique • It should be what the element is commonly called in most programs or by the major user of the element. 3. Aliases, which are synonyms (同義詞) or other names for the element 4. A short description of the element. These are names used by different users within different systems Example, a Customer Number may be called a: • • Receivable Account Number. Client Number. 5. Whether the element is “Base” or “Derived”: – – A base element is keyed into the system and it must be stored on files. A derived element is created by processes as the output of calculations or logic. 19 6. The length of an element. – Some elements have standard lengths, such as a state abbreviation, zip code, or telephone number. – For other elements, the length may vary and the analyst and user community must decide the final length. The final length base on the following consideration: 1. Numeric amount lengths should be determined by figuring the largest number the amount will probably contain and then allowing reasonable room for expansion (擴大). 2. Totals should be large enough to accommodate (容納) the numbers accumulated (積累) into them. 3. It is often useful to sample historical data to determine a suitable length. 20 7. The type of data: numeric, date, alphabetic or character, which is sometimes called alphanumeric or text data. 8. The input and output formats should be included, using special coding symbols to indicate how the data should be presented. For example, XXXXXXXX would be represented as X(8). These may translate into masks used to define database fields. 21 9. Validation (確認) criteria must be defined. Validation criteria for ensuring that accurate data are captured by the system. Elements are either: • A range of values is suitable for elements that contain continuous (連續的) data with a smooth range of values. Continuous elements are checked that the data is within limits or ranges. • A list of values is indicated if the data are discrete (不連接的). – Discrete, meaning they have fixed values. • Discrete elements are verified by checking the values within a program. • They may search a table of codes. • A table of codes is suitable if the list of values is extensive ( 大量的). E.g. telephone country code. • For key or index elements, a check digit is included. 22 10. Any default value the element may have. – The default value is displayed on entry screens and is used to reduce the amount of keying that the operator may have to do. – Default values on GUI screens • Initially display in drop-down lists. • Are selected when a group of radio buttons are used. 11. An additional comment / remarks area. – This might be used to indicate the format of the date, special validation that is required, the checkdigit method used, and so on. 23 Data Element Example 1 • Customer Number may be called Client Number. This variable can be as large as 999999, but cannot be less than zero. Another kind of data element is an alphabetic (字母表) element. BL: blue, WH: white, GR: green. When this element is implemented, a table will be need for 24 users to look up the meanings of these codes. Data Element Example 2 Shows a sample screen that illustrates how the Social Security Number data element might be recorded in the Visible Analysis data dictionary. 1 2 3 4 5 6 8 7 1. Online or manual documentation entries often indicate which system is involved. 2. The data element has a standard label that provides consistency throughout the data dictionary. 3. The data element can have an alternative name, or alias. 4. This entry indicates that the data element consists of nine numeric characters. 5. Depending on the data element, strict limits might be placed on acceptable values. 6. The data comes from the employee’s job application. 7. This entry indicates that only the payroll department has authority to update or change this data. 8. Indicates the individual or department responsible for entering and changing data. Defining Data Stores • Data stores contain a minimal of all base elements as well as many derived elements. • Data stores are created for each different data entity; that is, each different person, place, or thing being stored. • Data flow base elements are grouped together and a data store is created for each unique group. • Since a data flow may only show part of the collective data, called the user view, you may have to examine many different data flow structures to arrive at a complete data store description. 26 Data Store Definition 1. 2. 3. 4. 5. 6. 7. 8. 9. The Data Store ID The Data Store Name, descriptive and unique An Alias for the file A short description of the data store The file type, either manual or computerized If the file is computerized, the file format designates whether the file is a database file or the format of a traditional flat file. The maximum and average number of records on the file The growth per year. This helps the analyst to predict (預告) the amount of disk space required. The data set name specifies the table or file name, if known. In the initial design stages, this may be left blank. The data structure should use a name found in the data 27 dictionary. Data Store Example 1 • Primary and secondary keys must be elements found within the data structure. E.g: • Customer Master File Customer Number is the primary key, which should be unique. • The Customer Name, Telephone, and Zip Code are secondary keys. 28 Data Store Example 2 1. This data store has an alternative name or alias. 2. For consistency, data flow names are standardized throughout the data dictionary. 3. It is important document these estimates, because they will affect design decision in subsequent SDLC phases. 1 2 3 Visible Analyst screen that documents a data store named IN STOCK. Defining the Entities • By documenting all entities, the data dictionary can serve as a complete documentation package. • Typical characteristics of an entity include the following. – Entity name: The entity name as appears in the DFDs. – Description: Describe the entity and its purpose. – Alternate name: Any aliases for the entity name. – Input data flows: The standard DFD names for the input data flows to the entity. – Output data flows: The standard DFD names for the data flows leaving the entity. 30 1 2 Diagram 0 for Order System. 1. The external entity also can have an alternative name, or alias, if properly documented. 2. For consistency, these data flow names are standardized throughout the data dictionary. Visible Analyst screen that documents an external entity named WAREHOUSE 31 Defining the Processes • You must document every process. Following are typical characteristics of a process: – Process name or Label: The process name as it appears on the DFDs. – Description: A brief statement of the process’s purpose. – Process Number: A reference number that identifies the process and indicates relationships among various levels in the system. – Process description: This section includes the input and output data flows. For functional primitives, the process description also documents the processing steps and business logic. (You will learn how to write process descriptions in the next lecture) 32 1 2 Diagram 1 DFD shows details of the FILL ORDER process in the order system. Visible Analyst screen that describes a process named VERIFY ORDER 1. The process number identifies this process. Any sub-process are numbered 1.1, 1.2, 1.3 and so on. 2. Process description includes data flows, logical rules and structured English version of the policy. Data Dictionary Report • The data dictionary serves as the central storehouse of documentation for an information system. In addition to describing each data element, data flow, data store, data structure, entity and process, the data dictionary documents the relationships among these components. You can obtain many valuable reports from a data dictionary, including the following: – An alphabetized list of all data elements by name. – A report by user departments of data elements that must be updated by each department. – A report of all data flows and data stores that use a particular data element. – Detailed reports showing all characteristics of data elements, records, data flows, processes, or any other selected item stores in the data dictionary. 34 Data Dictionary and Data Flow Diagram Levels • Data dictionary entries vary according to the level of the corresponding data flow diagram. • Data dictionaries are created in a top-down manner. • Data dictionary entries may be used to validate parent and child data flow diagram level balancing. • Whole structures, such as the whole report or screen, are used on the top level of the data flow diagram. – Either the context level or diagram zero • Data structures are used on intermediate-level data flow diagram. • Elements are used on lower-level data flow diagrams. 35 Creating Data Dictionaries 1. Data Dictionary entries may be created after the DFD has been completed. – – – 2. The analysts may create a Diagram 0 data flow after the first interviews and At the same time, make the preliminary data dictionary entries. Typically, these entries consists of data flow name and their corresponding data structure. After several additional interviews have been conducted to learn the details of the system, the analyst will expand the DFD and create the child DFD. Each level of a DFD should use data appropriate (恰當) for the level. The data dictionary is then modified to include new structural records and elements . • • Diagram 0 should include only forms, screens, reports and records. As child diagrams are created, the data flow into and out of the processes becomes more and more detailed, including structural records and elements. 36 • It is important that the data flow names in the child DFD are contained as elements or structural records in the date flow on the parent process. • It illustrates a portion of two data flow diagram levels and corresponding data dictionary entries for producing an employee pay check. Process 5 found on Diagram 0, is an overview of the production of an EMPLOYEE PAY CHECK. The corresponding data dictionary entry for EMPLOYEE RECORD shows the EMPLOYEE NUMBER and four structural records. 37 Analyzing Input and Output An important step in creating the data dictionary is to identify and categorize system input and output data flow. Input and output analysis forms may be used to organize the information obtained from interviews and document analysis. This form contains the following commonly included fields: • A name for the input and output. • The user contact responsible (負責) for further details clarification (澄清), design feedback, and final approval (批准). • Whether the data is input and output. • The format of the data flow. • Elements indicating the sequence (次序) of the data on a report and screen (perhaps in columns). • A list of elements, including their names, lengths, base and derived, and their editing criteria. 38 An Example of an input/output analysis form for World’s Trend Catalog Division. • Once the form has been completed, each element should be analyzed to determine whether the element repeats, is optional or is mutually (相互的) exclusive (排斥) of another element. • Elements that fall into a group or that regularly combine with several other elements in many structures should be placed together in a structural record. 39 Developing Data Stores • Another activity in creating the data dictionary is developing data stores. Up to now, we have determined what data needs to flow from one process to another. This information is described in data structures. The information may be stored in numerous places and in each place the data store may be different. • Data stores may be determined by analyzing data flows. • Each data store should consist of elements on the data flows that are logically related, meaning they describe the same entity. Data stores derived from a pending order at World’s Trend Catalog Division. 40 Using and Maintaining the Data Dictionary • As the SA learns about the organization’s systems, data items are added to the data dictionary. • To have maximum power, the data dictionary should be tied ( 束縛) into other programs in the system. So that when an item is updated or deleted from the data dictionary, it is updated or deleted from the database. • Data dictionaries may be used to: – Create reports, screens, and forms. • Use the element definitions to create fields. • Arrange the fields in an aesthetically pleasing screen, form, or report, using design guidelines and common sense. • Repeating groups become columns. • Structural records are grouped together on the screen, report, or form. – Generate computer program source code. – Analyze the system design for completion and to detect design flaws. 41 42 Data Dictionary Analysis • The data dictionary may be used in conjunction with the data flow diagram to analyze the design, detecting flaws and areas that need clarification. • Some considerations for analysis are: – All base elements on an output data flow must be present on an input data flow to the process producing the output. – Base elements are keyed and should never be created by a process. – A derived element should be output from at least one process that it is not input into. – The elements that are present on a data flow into or coming from a data store must be contained within the data store. 43 Review Question • • • • Define the term data dictionary. What information is contained in the data repository? Describe the difference between base and derived elements. What are the main benefits of using a data dictionary? 44