Pearcey Centre Course CO24
Database Design using SQL

My name is: Rod Simpson
My current office is C 4.46
My phone number is (03) 990 32352
My email is rod.simpson@csse.monash.edu.au

Pclec 01 / 1

Introduction to Database Technology, and Database Design using SQL
Objectives: to introduce you to
– Database Technology
– RDB Management Systems
– The Relational Database Model
– Relational Database Design concepts
– Structured Query Language
– Data Warehousing
You will also gain some insight into the components and structure of the Oracle DBMS (version 8i).

Database and Associated Topics
The objective of this lecture is to introduce you to a cross-section of the material which will be covered over the next 9 lectures. You will look at:
- the scope of database,
- why this form of data management is so deeply entrenched in the Information Technology world,
- the different ‘sizes’ of database, and the reasons for this,
- the aspects of security, recovery, accuracy and integrity,
- and some of the advantages and disadvantages of database technology.

SQL Development
Selected SQL commands (at user level) will be introduced, with examples included in the lecture material, and each laboratory session will include exercises based on SQL and its functions. There will also be some discussion and review material at the laboratory sessions.

Database Theory
Why database? Data is a valuable corporate resource which needs accuracy, consistency, and security controls.
The ‘centralised’ control of data means that for many applications the data will already exist, which facilitates quicker development. Data will no longer be related by application programs, but by the structure defined in the database.
Centralised control also means easier, faster and less costly maintenance of user systems.

Traditional File Systems
Consider some of the problems of traditional file systems. In the past, as new applications were written, they either used existing files or created a new file or files for their use. Frequently, several existing files needed to be sorted and merged to obtain the new file. Thus, it is probable that several files contained the same information stored in different ways. In other words, there would have been redundant and possibly inconsistent data.
Consider the files for an insurance company:
PREMIUMS data: POLICYNUMBER, POLICYHOLDER, ADDRESS, PREMIUM-PA, PREMIUM-TOTAL
AGENCY data: POLICYNUMBER, POLICYHOLDER, ADDRESS, AGENT-CODE, RENEWAL-DATE, RENEWAL-AMT
POLICYNUMBER, POLICYHOLDER and ADDRESS are stored in both files.

Information / Data
A general definition:
Data: raw (unprocessed or part-processed) facts which represent the state of entities (things) which have occurred.
Information: data which has been processed into a form useful to the user.
What is information to one user may be data to another user.

Basic Definitions
Database: a collection of related data.
Data: known facts that can be captured and recorded.
Mini-world: some part of the real world about which data is stored in the database.
Database Management System (DBMS): a software package to facilitate the creation and maintenance of a computerised database.

What is a Database?
A DATABASE is a shared collection of inter-related data which is designed to meet the needs of multiple types of users and applications; hence the concept of USER VIEWS.
• Data stored is INDEPENDENT of the programs which use it
• Data is structured to provide a foundation for future applications
• Data may be physically distributed

Database Management System
The primary objectives of a DBMS are to provide facilities for:
1. Definition of database logical structures
2. Definition of physical structures
3. Access to the database
4. Definition of storage structures to store user data
These components are known as the ‘database architecture’.
A DBMS is software which provides access to a database in an integrated and controlled manner. It must contain:
(1) definition/structure capabilities
(2) data manipulation capabilities

DBMS Components
1. Data Description Language (DDL): used to describe data at the database level, on 2 levels:
(1) Schema: complete description of a database
(2) Sub-schema: user view
2. Data Manipulation Language (DML): provides for Create, Insert, Delete, Drop, Retrieve, Report, Update, Modify and Calculate (derive). The common term for such a request is ‘QUERY’.

Three Schema Architecture
ANSI and ISO suggest that a DBMS should have three schemas:
Conceptual Schema: the global logical model of the data and processing of the enterprise, i.e. the community user view.
External Schema(s): the logical application views of the Conceptual Schema, i.e. individual user views.
Internal Schema: the internal-level storage view.
In terms of the database architecture:
1. User Views: External Schema
2. Complete Database: Conceptual Schema
3. Physical Database: Internal Schema
[Figure: External Schemas 1, 2, ..., n all map onto one Conceptual Schema, which maps onto the Internal Schema.]

Application Development
Applications and their data needs are not considered in isolation. Centralised control of one or several databases takes place, i.e. database administration. Data administration is seen as an important part of system development.
[Figure: the CLAIMS and PREMIUMS applications access the CLAIMS and PREMIUMS data through the DBMS.]

Data Integrity
Validation or integrity rules may be defined and automatically invoked at run time by the DBMS, regardless of the source of the update (application program, 4GL screen or query language). Significant variation exists among DBMSs in the level of support for semantic data integrity.
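A minimal sketch of DDL, DML, an external view and a DBMS-enforced integrity rule, using Python's built-in sqlite3 module as a stand-in for Oracle 8i (SQLite's SQL dialect differs from Oracle's in places). The EMP columns and salaries come from the transaction example later in this lecture; the view name emp_d1 and the rule that a salary must be positive are hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory database; SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")

# DDL: define the structure, including an integrity rule held in the schema
# (the CHECK rule is a hypothetical example, not taken from the lecture).
conn.execute("""
    CREATE TABLE emp (
        empno  TEXT PRIMARY KEY,
        salary INTEGER NOT NULL CHECK (salary > 0),
        dept   TEXT NOT NULL
    )
""")

# External schema: a view giving one group of users only department D1.
conn.execute("CREATE VIEW emp_d1 AS SELECT empno, salary FROM emp WHERE dept = 'D1'")

# DML: insert rows, then retrieve them through the view.
conn.executemany(
    "INSERT INTO emp (empno, salary, dept) VALUES (?, ?, ?)",
    [("E3", 30000, "D2"), ("E4", 60000, "D2"), ("E1", 50000, "D1"), ("E2", 18000, "D1")],
)
d1_rows = conn.execute("SELECT empno, salary FROM emp_d1 ORDER BY empno").fetchall()

# The DBMS rejects a bad update no matter which program or tool issues it.
try:
    conn.execute("INSERT INTO emp (empno, salary, dept) VALUES ('E9', -100, 'D1')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(d1_rows)   # [('E1', 50000), ('E2', 18000)]
print(rejected)  # True
```

Because the rule lives in the table definition rather than in any one program, a 4GL screen or an interactive query would be refused in exactly the same way.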
ISO suggests that 100% of all enterprise rules should be held in the conceptual schema, and specifically none in application programs. This was an area of significant development during the 1990s.
[Figure: application programs, 4GL screens and stored procedures, and the interactive query language all work through the DBMS, whose CATALOGUE holds the data definitions and integrity rules applied to the STORED DATA.]

Inter-Related Data
[Figure: CLAIMS, RENEWALS and AGENCY data held together under the DBMS.]
Because data is related by structure, flexible enquiry (QUERY) is easier.

Multiple Applications
[Figure: local views of AGENCY, CLAIMS and RENEWALS drawn from a single DATABASE.]

Important Database Functions
• Data Integrity
• Data Independence
• Referential Integrity
• Concurrency Control
• Database Consistency
  - multiple users
  - distributed databases
  - replicated databases
  - partitioned databases
• Recovery from Failure
  - transaction
  - media
• Determinacy
  - consistent results
  - respond to ALL events, and cater for unpredictable order
• Scalability

Database Environment
Databases can be:
• Transaction-intensive databases
• Decision support databases
• Mixed-load databases
• Small databases
• VLDB: Very Large Databases
• Non-traditional databases, e.g. weather forecasting

The Many Faces of Database
They can be data warehouses, or data marts (and data martlets).
How is a database's size measured? There are a number of ‘measurements’:
• raw data size
• total database size
• total usable disk space size (which includes media protection such as mirroring)

Hardware       Database   Raw Data   Total Disk
HP9000         Oracle     100GB      643GB
Digital 8400   Oracle     100GB      361GB
IBM SP2        DB2/6000   100GB      377GB
NCR5100        Teradata   100GB      880GB
NCR5100        Teradata   1,000GB    3,280GB

The first databases were stored on large centralised mainframe computers.
These mainframe databases were accessed from terminals which had no processing capability. As distributed computing and microcomputers became available during the early 1980s, 2 new kinds of databases emerged:
• personal databases
• client/server databases

Personal databases (Microsoft Access and FoxPro) are aimed at single-user database applications, which are stored on the single user's desktop computer, a client workstation. When a personal DBMS is used for a multiuser application, the database application files are stored on a file server and transmitted to the individual users across a network.
A server is any computer able to accept requests from other computers and to share some or all of its resources, such as printers, files and programs.
A network is an infrastructure of telecommunications hardware and software which enables computers to transmit messages to each other.
With a personal DBMS, each client workstation must load the entire application into memory, along with the client database application, in order to view, insert, update or print data. A client request for a small amount of data from a large database might require the server to transmit the entire database to the client's workstation. Newer personal databases use indexed files, which enable the server to send only part of the database. In either case there is a heavy demand on client workstations and on the network.

Client/server databases split the work into a DBMS ‘process’ running on the server and the applications running on the client. The client application sends data requests across the network. When the server receives a request, the server DBMS process retrieves the data from the database, performs the requested functions, and sends only the final query results back via the network to the client.
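The difference in network load can be sketched with sqlite3, under a loud simplification: SQLite is an embedded library, not a client/server DBMS, so "rows sent" below is only a proxy for network traffic. The policy table, its 10,000 generated rows and policy number 4242 are all hypothetical.

```python
import sqlite3

# Hypothetical policy table with 10,000 generated rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policy (policyno INTEGER PRIMARY KEY, holder TEXT, premium_pa INTEGER)")
conn.executemany(
    "INSERT INTO policy VALUES (?, ?, ?)",
    [(n, f"Holder {n}", 500 + (n % 7) * 100) for n in range(1, 10_001)],
)

WANTED = 4242  # the one policy the client actually needs

# Personal-database style (simplified): every row is sent and the client filters.
all_rows = conn.execute("SELECT policyno, holder, premium_pa FROM policy").fetchall()
client_side = [row for row in all_rows if row[0] == WANTED]

# Client/server style: the server evaluates the WHERE clause and returns only the result.
server_side = conn.execute(
    "SELECT policyno, holder, premium_pa FROM policy WHERE policyno = ?", (WANTED,)
).fetchall()

print("rows sent, personal style:     ", len(all_rows))     # 10000
print("rows sent, client/server style:", len(server_side))  # 1
```

Both approaches produce the same answer; only the amount of data moved differs. Real traffic also depends on row size and protocol overhead, which this sketch ignores.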
Sending only the final results generates less network traffic than a personal database does.

Another important difference between client/server and personal databases is in the handling of client failures. In a personal database system, when a client workstation fails, the database is likely to be damaged by interrupted updates, deletes and insertions. Records in use at the time of failure are locked, so they are unavailable to other users. The database may be able to be repaired, but all users must log off during the repair process. Often the processes active at the time of failure cannot be reconstructed. The database must be restored to the last regular backup, and transactions since that backup are not (normally) automatically available.

A client/server database is not affected when a client workstation fails. The failed client's in-process transactions are lost, but the failure of a single client should not affect other users. In the case of a server failure, a central synchronised transaction log, which contains a record of all current database changes, enables in-progress transactions from all clients to be either fully completed or rolled back.
Rolling back has the effect of the database never having processed the transactions; client transactions can then be resubmitted. It is a bit similar to the ‘undo’ which you have met in some of Microsoft's Office software (there is a small exercise with commit and rollback in a few weeks' time). Most client/server database servers also have additional features to minimise the risk of failure, and fast recovery mechanisms.

Client/server systems also differ in the way in which they handle competing transactions. A system of locking is normally applied, which forces transactions other than the current one to wait until the lock is released.
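Locking can be observed with two sqlite3 connections to the same database file, one per transaction, using the Part 2 stock figures from the concurrency control slide later in this lecture. This is a sketch assuming SQLite's locking model, which locks the whole database file rather than individual records as server DBMSs such as Oracle do; the table name part is hypothetical. isolation_level=None lets the code issue BEGIN IMMEDIATE and COMMIT itself, and timeout=0 makes TX 2 report the conflict at once instead of waiting (with a positive timeout it would wait for the lock, as described above).

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "stock.db")

# One connection per transaction; explicit transactions, no waiting on locks.
tx1 = sqlite3.connect(path, isolation_level=None, timeout=0)
tx2 = sqlite3.connect(path, isolation_level=None, timeout=0)

tx1.execute("CREATE TABLE part (partno INTEGER PRIMARY KEY, qoh INTEGER)")
tx1.execute("INSERT INTO part VALUES (2, 10)")

# TX 1 takes the write lock, then records the delivery of 10 items.
tx1.execute("BEGIN IMMEDIATE")
tx1.execute("UPDATE part SET qoh = qoh + 10 WHERE partno = 2")

# TX 2 cannot obtain the write lock while TX 1 holds it.
try:
    tx2.execute("BEGIN IMMEDIATE")
    tx2_blocked = False
except sqlite3.OperationalError:  # "database is locked"
    tx2_blocked = True

tx1.execute("COMMIT")  # releases the lock

# Now TX 2 proceeds, working from TX 1's committed value.
tx2.execute("BEGIN IMMEDIATE")
tx2.execute("UPDATE part SET qoh = qoh - 5 WHERE partno = 2")
tx2.execute("COMMIT")

qoh = tx1.execute("SELECT qoh FROM part WHERE partno = 2").fetchone()[0]
print("TX 2 blocked while TX 1 held the lock:", tx2_blocked)  # True
print("final QOH:", qoh)                                       # 15
```

Because TX 2 had to wait for TX 1, the two updates were applied one after the other and neither was lost.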
A personal database uses optimistic locking: it assumes that 2 or more competing transactions will not occur at the same time. User code can be written if this situation is not acceptable.
Transaction processing: this refers to the grouping of related database changes into batches which must either all succeed or all fail.

Database Environment
All databases require:
– querying capabilities
– data display facilities
– database navigation
– data entry (initial load, transactions)
– data validation
– data deletion
– committing capability

Database Transactions
• Sometimes several database operations need to be treated as one atomic unit, which must either succeed or fail as a whole.

EMP
EMPNO   SALARY   DEPT
E3      30,000   D2
E4      60,000   D2
E1      50,000   D1
E2      18,000   D1

BUDGET
DEPT   TOTAL SALARY
D1     68,000
D2     90,000

To keep the budget correct, any alteration to EMP would need to flow through to BUDGET.

Concurrency Control
• The DBMS should support multiple concurrent users of the same data and ensure that the data remains consistent at all times.
Part 2 starts with QOH (quantity on hand) 10. TX 1 records a delivery of 10 items (QOH = QOH + 10); TX 2 supplies 5 items (QOH = QOH - 5). If the two transactions interfere with each other, Part 2 may end up with QOH 20 or QOH 5. What is the correct result?

Security
Each user may require identification with a user-id and password. Users may be limited in the data they can see and in the actions they can perform on that data. The DBMS may encrypt and decrypt data as it is stored and retrieved. Many systems now provide data-value-sensitive security. There is an article on ‘security’ in about Week 5.
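The EMP/BUDGET transaction and the commit/rollback behaviour described in this lecture can be sketched with sqlite3, again standing in for Oracle. The function raise_salary and its fail_midway flag are hypothetical: the function updates EMP and BUDGET inside one transaction and rolls both back if anything fails between the two updates (here, a deliberately simulated failure).

```python
import sqlite3

# isolation_level=None: BEGIN, COMMIT and ROLLBACK are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE emp (empno TEXT PRIMARY KEY, salary INTEGER NOT NULL, dept TEXT NOT NULL);
    CREATE TABLE budget (dept TEXT PRIMARY KEY, total_salary INTEGER NOT NULL);
    INSERT INTO emp VALUES ('E3', 30000, 'D2'), ('E4', 60000, 'D2'),
                           ('E1', 50000, 'D1'), ('E2', 18000, 'D1');
    INSERT INTO budget VALUES ('D1', 68000), ('D2', 90000);
""")

def raise_salary(conn, empno, amount, fail_midway=False):
    """Change EMP and BUDGET as one atomic unit: both updates happen or neither does."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE emp SET salary = salary + ? WHERE empno = ?", (amount, empno))
        if fail_midway:
            raise RuntimeError("simulated failure before BUDGET is updated")
        conn.execute(
            "UPDATE budget SET total_salary = total_salary + ? "
            "WHERE dept = (SELECT dept FROM emp WHERE empno = ?)",
            (amount, empno),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # also undoes the EMP change already made
        raise

def budget_matches_emp(conn):
    """True if every BUDGET total equals the sum of EMP salaries in that department."""
    rows = conn.execute("""
        SELECT b.total_salary, SUM(e.salary)
        FROM budget b JOIN emp e ON e.dept = b.dept
        GROUP BY b.dept, b.total_salary
    """).fetchall()
    return all(total == summed for total, summed in rows)

raise_salary(conn, "E2", 2000)  # committed: E2 becomes 20,000, D1 total becomes 70,000
try:
    raise_salary(conn, "E3", 5000, fail_midway=True)  # rolled back
except RuntimeError:
    pass

budget_rows = conn.execute("SELECT dept, total_salary FROM budget ORDER BY dept").fetchall()
e3_salary = conn.execute("SELECT salary FROM emp WHERE empno = 'E3'").fetchone()[0]
consistent = budget_matches_emp(conn)
print(budget_rows)  # [('D1', 70000), ('D2', 90000)]
print(e3_salary)    # 30000
print(consistent)   # True
```

The failed raise for E3 leaves no trace: as the lecture puts it, rolling back has the effect of the database never having processed the transaction.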
Disadvantages of Database Processing
• Complexity
• Expense
• Vulnerability
• Size
• Training Costs
• Compatibility
• Technology Lock-In

Advantages of Database Processing
• Reduction in Data Redundancy
• Data Integrity
• Data Independence
• Data Security
• Data Consistency
• Easier Use of Data via DBMS Tools (Query Language, 4GLs)
• Less Disk Storage

Costs Associated with Database
• The initial purchase cost
• Planning and design
• Database education and training
• Application and data conversion
• System overheads (response)
• Management and administration
• Complexity of support

The Users
So, who are the users? There are 4 main groups.

1. Unsophisticated or ‘naïve’ users
They interact with the system by invoking one of the application programs which have been written as part of the design and implementation processes. E.g. a person wishing to find a bank account balance uses an ATM or a Web program which has a ‘form’ the person can complete and ‘send’; the balance details are then returned.

2. Application programmers
Normally these are computer professionals who write application programs. They can choose from many tools to develop the interfaces. RAD (rapid application development) tools, for instance, enable a programmer to construct forms and reports. There are also languages which combine imperative control structures (for loops, if-then-else statements) with statements of the data manipulation language; these are known as 4th generation languages.

3. ‘Sophisticated’ users
They interact with the system without writing programs. They develop their database requests using a database query language (a non-procedural language). The queries are submitted to a query processor, which interprets each query and converts it into instructions. Online analytical processing (OLAP) tools simplify analysts' tasks by letting them ‘view’ results in a variety of ways, e.g. sales by region, by region and product, or by city within a region. Another class of tools is found in data mining applications.

4. Specialised users
These are sophisticated users who write specialised database applications which don't fit into the ‘traditional’ or ‘normal’ data processing framework: computer-aided design, knowledge-based and expert systems, systems which store data with complex data types such as graphics and audio data, and environment modelling systems such as the Country Fire Authority and Ambulance systems. These are gaining in popularity and use.

And that's it for the first session!