IS 342 Amaravadi
COURSE WRAP UP

The traditional IS environment consisted of applications developed haphazardly. These applications were mainly transaction processing (TP) systems that used a lot of data, and the data was often duplicated across multiple files. This approach, called file processing, resulted in problems such as uncontrolled redundancy, inconsistent data, inflexibility, limited data sharing, poor enforcement of standards, low programmer productivity and excessive program maintenance. The main problem was the lack of an integrated approach to data, i.e., organizations did not have an integrated view of their data needs. These problems led to the development of DBMS technologies.

The database approach is the result of a realization that information is a resource that needs to be managed carefully. Just as physical resources such as budgets and inventory need to be managed, so does information. Managing information means storing, updating and retrieving it. For example, a customer may need to know his/her account balance, or a car buyer may need information about a vehicle. Over the last three decades, DBMSs have become a popular way to manage information resources.

A database is a collection of logically related data designed to meet the needs of multiple users in an organization. A DBMS is a software program that helps organizations manage their data resources. A DBMS environment consists of a repository, the database, users, administrators, the application environment and CASE tools that supplement the system development process.

The database is organized into three levels according to the three schema architecture, which was introduced by ANSI/SPARC to standardize database architecture. DBMSs have facilities for each of the levels of the three schema architecture. The three levels are the external, conceptual and internal levels.
The external level is concerned with the way the data is viewed; for example, an invoice may be the way customers view the data about their order. The conceptual level is concerned with the way the data has been designed (this is in the form of several base tables such as customer, product and order), and the internal level with the manner in which the data is stored. The main purpose of the three schema architecture is to provide logical and physical data independence.

At the conceptual level, the database is organized into files, records and fields. In the real world these concepts are referred to as entity classes (eclasses), entities and attributes. Relationships among eclasses are captured by data models. This is one of the main differences between the database and file processing approaches: file processing did not employ data models, but these are essential to the database approach. When the database is designed, the eclasses in the data models become tables, and relationships among eclasses are captured by cross-reference keys.

The database development process starts with planning. Only with a good plan can a database be developed within budget and on schedule. To get the planning process underway, we need management commitment, a planning methodology and a good team of willing and enthusiastic analysts. Planning proceeds in a top-down fashion: we interview people starting at the top of the organization and working down. What sort of information is collected during the planning stage? We need information on functions, processes, activities, organizational objectives and the existing application environment. These functions, processes and activities create or use information. For example, an ordering process could create the order entity. Such process-data relationships are captured with Enterprise Analysis Matrices. One example is the process vs. data class matrix, which shows which process uses which data classes. The planning matrices give an overview of data requirements in the enterprise.
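To make the process vs. data class matrix concrete, here is a minimal sketch in Python. The processes, data classes and the "C" (creates) / "U" (uses) marking are assumptions chosen for illustration; they are not taken from the course material.

```python
# Hypothetical processes and data classes, for illustration only.
# "C" marks a process that creates a data class, "U" one that uses it.
matrix = {
    "Ordering":  {"Customer": "U", "Product": "U", "Order": "C"},
    "Billing":   {"Customer": "U", "Order": "U", "Invoice": "C"},
    "Marketing": {"Customer": "C", "Product": "U"},
}


def data_classes(matrix):
    """Return every data class that appears in any row, sorted by name."""
    return sorted({dc for row in matrix.values() for dc in row})


def processes_touching(matrix, data_class):
    """Return {process: 'C' or 'U'} for each process that creates or uses data_class."""
    return {p: row[data_class] for p, row in matrix.items() if data_class in row}


def render(matrix):
    """Format the matrix as fixed-width text, one row per process."""
    cols = data_classes(matrix)
    lines = ["Process".ljust(12) + "".join(c.ljust(10) for c in cols)]
    for process, row in matrix.items():
        cells = "".join(row.get(c, "-").ljust(10) for c in cols)
        lines.append(process.ljust(12) + cells)
    return "\n".join(lines)


print(render(matrix))
print(processes_touching(matrix, "Order"))
```

Reading down a column shows every process that depends on a data class, which is the overview of enterprise data requirements that the planning matrices are meant to give.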
The planning process is top-down, but the design process is bottom-up. Before doing design it is necessary to identify the requirements. During requirements definition, we collect information on eclasses, attributes and their allowable values (what is the term for this?). This information is gathered by analyzing views of data. What are examples of these? We also collect information on constraints. There are three main types of constraints: domain, business and database constraints. With this information a data model is developed. This is known as the conceptual data model. The next stage is logical design, which involves decomposing the data model into a set of well-structured tables, followed by physical database design.

If we use the ER model to do the design, three rules are used to decompose the data model into relations:
1:1 relationships – each entity class is placed in a separate relation, with a cross-reference key in either relation used to get information from the other.
1:M relationships – each entity class is placed in a separate relation, and the relation on the many side receives the primary key of the one side as a cross-reference key.
M:N relationships – each entity class is placed in a separate relation, and an intersection record consisting of the keys from the M side and the N side is created.

Database design is normally carried out using the functional dependency approach. A functional dependency is a relationship between two or more attributes such that if we know the value of one attribute we are able to determine the value of the other attributes. That is, if for each value of an attribute ‘a’ there is one and only one value of attribute ‘b’, then we can say that a → b. A functional dependency diagram lists the dependencies. We could also carry out database design using the normalization process. Normalization can be thought of as removing unwanted functional dependencies.
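The definition of a functional dependency can be tested directly against sample rows. The sketch below is an illustration only: the helper `holds` and the order-line data are hypothetical, and sample data can refute a dependency but can never prove that it holds in general.

```python
def holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs holds in rows (a list of dicts).

    lhs and rhs are tuples of attribute names. The dependency holds when each
    combination of lhs values is paired with exactly one combination of rhs
    values across all rows.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical order-line rows; the key is (order_no, product_no).
order_lines = [
    {"order_no": 1, "product_no": "P1", "product_name": "Widget", "qty": 5},
    {"order_no": 1, "product_no": "P2", "product_name": "Gadget", "qty": 2},
    {"order_no": 2, "product_no": "P1", "product_name": "Widget", "qty": 7},
]

# product_no -> product_name: depends on only part of the key
print(holds(order_lines, ("product_no",), ("product_name",)))
# order_no -> qty: fails, order 1 has two different quantities
print(holds(order_lines, ("order_no",), ("qty",)))
# (order_no, product_no) -> qty: the full key determines qty
print(holds(order_lines, ("order_no", "product_no"), ("qty",)))
```

In this sample, product_name is determined by product_no alone, which is only part of the key. That is the kind of unwanted dependency that normalization removes.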
During normalization we remove repeating groups, partial dependencies, transitive dependencies and multi-valued dependencies to arrive at the successive normal forms:
Repeating group – more than one set of values at a row-column intersection.
Partial dependency – an attribute is dependent on only part of the primary key.
Transitive dependency – an attribute is dependent on a non-key attribute.
1st Normal Form – no repeating groups, but other types of dependencies may be present.
2nd Normal Form – no repeating groups and no partial dependencies, but other types of dependencies may be present.
3rd Normal Form – no repeating groups, partial dependencies or transitive dependencies.
4th Normal Form – in addition, no multi-valued dependencies.

Physical database design is concerned with mapping the logical structure developed during logical design into the database. It includes the following issues:
Volume and usage analysis – identifying the data volumes and data accesses in a given time period.
Database distribution – where the databases are physically located.
File organization – how records are arranged on secondary storage.
Integrity constraints – ensuring that the database information is consistent, accurate and non-redundant.

Information from the database is obtained by means of SQL, of which there are five types: DDL, DML, SQL/AU, SQL/I and SQL/T. The two common types are Data Definition Language (DDL) and Data Manipulation Language (DML). Examples of the former include:
    create index <name of index> on <table name>(<attribute name>);
    create table <relation name> …;
Examples of the latter include:
    select … from … where …;
    update product set price = price * 1.1 where <condition on product#>;

Knowledge of the fundamentals of database systems can enable organizations to reap rich rewards from their information resources.
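The DDL and DML statements above can be tried end to end with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: the table and column names (customer, product, orders, order_line, product_no and so on) and the sample rows are hypothetical, and product_no stands in for product# because # is not valid in an unquoted SQLite identifier. The schema also illustrates the 1:M and M:N mapping rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")    # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
cur = conn.cursor()

# DDL: define the relations and an index (all names are illustrative).
cur.executescript("""
CREATE TABLE customer (
    customer_no INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE product (
    product_no INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    price      REAL NOT NULL CHECK (price >= 0)
);
-- 1:M customer to orders: the many side carries the key of the one side.
CREATE TABLE orders (
    order_no    INTEGER PRIMARY KEY,
    customer_no INTEGER NOT NULL REFERENCES customer(customer_no)
);
-- M:N orders to product: an intersection relation keyed by both sides.
CREATE TABLE order_line (
    order_no   INTEGER NOT NULL REFERENCES orders(order_no),
    product_no INTEGER NOT NULL REFERENCES product(product_no),
    qty        INTEGER NOT NULL,
    PRIMARY KEY (order_no, product_no)
);
CREATE INDEX idx_product_name ON product(name);
""")

# DML: insert sample rows.
cur.execute("INSERT INTO customer VALUES (1, 'Acme')")
cur.executemany("INSERT INTO product VALUES (?, ?, ?)",
                [(10, "Widget", 100.0), (20, "Gadget", 50.0)])
cur.execute("INSERT INTO orders VALUES (500, 1)")
cur.executemany("INSERT INTO order_line VALUES (?, ?, ?)",
                [(500, 10, 2), (500, 20, 1)])

# DML: raise one product's price by 10 percent.
cur.execute("UPDATE product SET price = price * 1.1 WHERE product_no = ?", (10,))

# DML: select ... from ... where, joining through the cross-reference keys.
rows = cur.execute("""
    SELECT c.name, p.name, ol.qty, ROUND(p.price, 2)
    FROM customer c
    JOIN orders o      ON o.customer_no = c.customer_no
    JOIN order_line ol ON ol.order_no = o.order_no
    JOIN product p     ON p.product_no = ol.product_no
    WHERE c.customer_no = 1
    ORDER BY p.product_no
""").fetchall()
conn.commit()
print(rows)
```

The join in the final query rebuilds an invoice-like view from the base tables, which is the external level described earlier sitting on top of the conceptual level.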