Database Development Process CS263 Lecture 2 Enterprise data modelling • Is the first step in database development, in which the scope and general contents of organisational databases are specified • The ‘enterprise’ is the complete entity being modelled, be it a corner shop or a whole corporation. In practice, creating an enterprise-wide data model is a fairly rare occurrence – only likely to occur when there is a major change in direction for the organisation (such as complete systems re-design or business take-over) • However, it is vital that some form of enterprise data model is in place • Enterprise data modelling • Ideally, the model should encompass both current and historical data (often as two or more discrete models with links between them) • The model should reflect both the data and the processes of the organisation, plus the views of these components required by the various sections of the organisation. Thus sales data (invoices etc.) will be viewed from a monetary aspect by the accountant, from a quantitative aspect by the store staff and from both these aspects by the marketing people • These various views of the data will then become information meaningful to the recipient. Segment from enterprise data model (Pine Valley Furniture Company) Information Engineering • “A data-oriented methodology to create and maintain information systems.” • Top-down planning approach. • Four steps: – Planning (Results in an Information Systems Architecture) – Analysis – meet the various members of the organisation (from top to bottom) to discuss and agree on the data requirements and the processes involved – Design – Implementation Information systems architecture • A high level data model such as previous Fig. is only one part of an overall Information Systems Architecture (ISA) for an organisation. ISA has 6 components • 1. Data • 2. Processes that manipulate that data ( represented by DFDs, Object Models) • 3. Network – transports data around the organisation and between the organisation and its key business partners • 4. People – who perform processes and are the sources and receivers of data and information Information systems architecture • 5. Events and points in time when processes are performed • 6. Reasons for events and rules that govern the processing of data Information systems planning • Goal is to “Align IT with the business strategies of the organisation” Has three steps: • 1. Identify strategic planning factors • Organization goals (e.g. maintain 10% yearly growth rate) • Critical success factors (e.g. make high quality products) • Problem areas (e.g. increasing competition) Information systems planning 2. Identify corporate planning objects: • Organizational units (departments of the organisation), Organizational locations (places where business operations occur), Business functions (related groups of business processes supporting the mission of the organisation), and Entity types (major categories of data about the people, places and things managed by the organisation) and Information (application) systems (the application software and supporting procedures for handling data Information systems planning 3. Develop enterprise model, which consists of: • Decomposition of business functions (break down functions of an organisation into progressively greater levels of detail, generally in a hierarchical form) • Data decomposition (process of breaking down organisational data requirements into identifiable components, generally in hierarchical form) • Enterprise data model (see later) • Planning matrices (see later) Example of process decomposition of an order fulfillment function Enterprise data model • “Sets the range and general contents of organizational databases.” • “Results in a total picture or explanation of organizational data, not in the design for a particular database.” • Entity-relationship diagram. • Descriptions of entity types. • Relationships between entities. • Business rules. Planning matrices • Provide an explicit approach for describing business requirements because they provide an explicit approach for describing business requirements without requiring that the database be explicitly modelled. Function to data entity • Location to function (which business function is being performed at which location) • Unit to function (which business functions are the responsibility of which business units) • IS application to data entity (explains how each information system interacts with each data entity) Planning matrices • Supporting function to data entity (which data are captured, used, updated, deleted within each function) • IS application to business objective (shows which information systems support each business objective) Following Fig. illustrates a possible function-to-data entity matrix. Can be used for many purposes, including the following 3: • Identify orphans – indicate which data entities are not used by any function, or which function do not use any entities • Spot missing entities – Employees involved with each function who examined the matrix can identify any entities that may have been missed • Prioritise development – If a given function has a high priority for systems development (perhaps because it is related to important organisational objectives) then the entities used by that area also have a high priority in database development Data Entity Types Business Function Business Planning Product Development Materials Management Order Fulfillment Order Shipment Sales Summarization Production Operations Finance and Accounting Customer Product Raw Material Order Work Center Work Order Invoice Equipment Employee Example business function-todata entry matrix X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X Systems development life cycle (SDLC) • Is a complete set of steps that a team of IS professionals follow to specify, develop, maintain and replace information systems • Process is often viewed as a cascade of steps (see following figure) • Cascade or ‘waterfall’ approach as each step flows into the next – though steps can overlap in time and it is possible to backtrack when prior decisions need to be reconsidered Database development activities during the SDLC Project Identification and Selection Project Initiation and Planning Analysis Logical Design Physical Design Implementation Maintenance Alternative IS development approaches • The previous approach is methodical and highly structured, which includes many checks and balances at each step. Often criticised for the length of time needed until a working system produced • Increasingly, organisations use more Rapid Application Development (RAD) methods which follow an iterative process of rapidly repeating analysis, design and implementation steps until convergence on the system the user wants Alternative IS development approaches • One of the most popular RAD methods is Prototyping (Fig. 2-6) • An iterative process in which requirements are converted to a working system that is continually revised through close work between analysts and users • With increasing popularity of visual RAD tools (VB, Delphi, Visual C++, Java) prototyping is becoming the IS development methodology of choice. The prototyping methodology and database development process Managing the people • To determine whether a project is progressing on time and within budget, the project leader develops detailed schedules of project activities • These are often depicted in graphical form, such as the Gannt and Pert charts in the following Figs. Figure 2-7a Gantt Chart Figure 2-7b PERT chart Three schema architecture for database development • Conceptual Schema (during the Analysis phase) is a detailed specification of the overall structure of organisational data that is independent of any database management technology) – depicted in ER or objectmodelling notations (a data model). Specifications stored as metadata in repository or data dictionary • External Schema (or user view) is some portion of the database that is required for a user to perform some task. Also independent of database technology but typically contains a subset of the associated conceptual schema. Often the original description of a user view is is a computer screen display, business transaction or report. Three schema architecture • Logical version of a user view can be represented by ERDiagram, object diagram or as relations. • Internal Schema (physical schema) contains specifications for how data from a conceptual schema are stored in a computer’s secondary memory. Three-schema database architecture Three tiered architecture • Typically develop conceptual and external schemas iteratively (see Fig.). • Often, a first go at the conceptual schema is developed based on the organisations enterprise data model and a general understanding of the database requirements • Then external schemas for each transaction, report and screen display are developed • Often, further analysis of external schemas will yield new attributes and possibly entities and relationships not shown in the original conceptual schema • So then the conceptual schema is modified and augmented Process of developing three-schema architecture for a database project Three-tiered database location architecture • The data for a given information system may reside in multiple locations or tiers of computers, in order to balance various technical and organisational factors • Four tiers are possible for data to be on – on a client server, an application server, a Web server and a database server. However, three tiers are more commonly considered (Fig. 2-10): • Client tier – typically a desktop or laptop computer, which concentrates on managing the user-system interface and localised data (also called presentation tier). Web scripting tasks may be executed here. Three-tiered • Application/Web server tier – processes HTTP protocols, scripting tasks, performs calculations and provides access to data (also called process services tier). • Enterprise server (minicomputer or mainframe tier) – performs sophisticated calculations and manages the merging of data from multiple sources across the organisation (also called the data services tier) Three-tiered client/server database architecture Reasons for using a client/server architecture • It allows for simultaneous processing on multiple processors for the same application, thus improving application response time and data processing speed • It is possible to take advantage of the best data processing features of each computer platform (e.g. the advanced user interface capabilities of PCs versus the speed of minicomputers and mainframes) • Can mix client technologies (by different companies such as Intel, Sun, Motorola) and yet share common data • Can change technologies in any tier with limited impact on the other tiers Reasons for using a client/server architecture • Processing can be performed close to the source of processed data, improving response times and reducing network traffic • Allows for and encourages open system standards • Ease of separating the development of the database and the modules that contain the database from the information system modules that present the contents to end users. These are typically developed in Powerbuilder, VB or Delphi and interact through middleware to the routines that access and analyse the data Using and defining views • Views provide users controlled access to tables. Advantages of dynamic views: – Simplify query commands – Help provide data security and confidentiality – Enhance programming productivity – Contain most current base table data – Use little storage space – Provide a customised view for a user – Establish physical database independence • CREATE VIEW command Using and defining views • Disadvantages of views: • Use processing time re-creating view each time it is referenced • May or may not be directly updateable • Some disadvantages can be overcome by using materialised views which are stored physically on disk and refreshed at appropriate intervals The SELECT clause • Specifies (or projects) what data elements (columns) are to be included in the view table • The FROM clause lists the tables and views involved in the view development • The WHERE clause specifies the names of the common columns used to join the tables • Because a view is a table its rows may not be sorted Views • e.g., build a query to generate an invoice for order number 1004 - where rather than having to specify the joining of four tables, the query can include all relevant data elements from the view table INVOICE_V: • CREATE VIEW INVOICE_V AS • SELECT CUSTOMER_ID, CUSTOMER_ADDRESS, PRODUCT_ID, QUANTITY etc. • FROM INVOICE_V • WHERE ORDER_ID = 1004; Views • A view may join multiple tables or views together and may contain derived (or virtual) columns • e.g., if a user only wants to know the total value of the orders placed for each furniture product, a view can be created from the previous view (INVOICE_V) • We can assign a different name (alias) to a view column than the associated base table or expression column name • Here PRODUCT is a renaming of PRODUCT_ID, local to only this view Views • TOTAL is the column name given to the expression for total sales of each product • The expression can now be referenced via this view in subsequent queries as if it were a column (rather than a derived expression) • CREATE VIEW ORDER_TOTALS_V AS • SELECT PRODUCT_ID PRODUCT, SUM(UNIT_PRICE*QUANTITY) TOTAL • FROM INVOICE_V • GROUP BY PRODUCT_ID; Views • Views can establish security because tables and columns that are not included will not be obvious to users of the view • Restricting access to a view with GRANT and REVOKE statements adds another layer of security • Some people advocate the creation of a view for every single base table, even if that view is identical to the base table • This can create to greater programming productivity as databases evolve, through greater flexibility when base tables are changed Views • The following examples are based on the following views: • CREATE VIEW CUSTOMER_V AS SELECT * FROM CUSTOMER_T; • CREATE VIEW ORDER_V AS SELECT * FROM ORDER_T; • CREATE VIEW ORDER_LINE_V AS SELECT * FROM ORDER_LINE_T; • CREATE VIEW PRODUCT_V AS SELECT * FROM PRODUCT_T; Views • Updating data directly from a view rather than from base tables is possible under certain limitations • Usually permitted as long as the update is unambiguous in terms of data modification in the base table • When the CREATE VIEW statement contains any of the following five situations, that view may not be updated directly: • 1. The SELECT clause includes the keyword DISTINCT (see later) • 2. The SELECT clause contains expressions, including derived columns, aggregates, statistical functions etc. Views • 3. The FROM clause, or a subquery, or a UNION clause references more than one table • 4. The FROM clause or a subquery references another view that is not updateable • 5. The CREATE VIEW command contains a GROUP BY or HAVING clause • It could happen that an update to an instance would result in the instance disappearing from the view. • A view EXPENSIVE_STUFF_V lists all furniture products where the unit price is > $300. Views • That view will include PRODUCT_ID 5, a writers desk = $325 dollars. If we update this view and reduce writer’s desk to $295, the desk will no longer appear in the view because its unit price is now less than $300. • If it is desired to track everything with an original price over $300, we must include a WITH CHECK OPTION clause after the SELECT clause in the CREATE VIEW COMMAND. • This will cause UPDATE or INSERT statements to be rejected when they would cause rows to be removed from the view Sample CREATE VIEW • CREATE VIEW EXPENSIVE_STUFF_V AS • SELECT PRODUCT_ID, PRODUCT_NAME, UNIT_PRICE • FROM PRODUCT_T • WHERE UNIT_PRICE >300 • WITH CHECK_OPTION; •CHECK_OPTION works only for updateable views and prevents updates that would create rows not included in the view, or cause updated or inserted rows to be removed from the view