INFORMATION TECHNOLOGY IN BUSINESS AND SOCIETY SESSION 15 – RELATIONAL DATABASES SEAN J. TAYLOR ADMINISTRATIVIA • Assignment 3: Due tonight at midnight (AdSense in a week) • Midterm back on Thursday • Database tutorial led by Varun • Assignment 4: Posted Thursday, due Friday 3/30 ADMINISTRATIVIA II • Groups: I will email a form for you to list up to five classmates and then choose the groups. You can list 0-5. • 2-way feedback: 1. I will send you an anonymous survey. 2. I will send you a brief summary of your current grade. LEARNING OBJECTIVES 1. Understand what relational databases are (or, why text files and Excel are not enough) 2. Identify and distinguish between the following parts of a relational database: tables, records, fields, field values 3. Understand three types of anomalies that arise from un-normalized data 4. Understand how primary keys and foreign keys are used to link tables. WHY ARE DATA VALUABLE? RELATIONAL DATABASES • Store data (insert) • Retrieve data (query) • Software applications • Operations • Analyze data (reporting capabilities) WHY NOT STORE DATA LIKE THIS? Order# Date Customer ID Last Name First Name Address ISBN Book Name Author #0465039138 Code and other laws of cyberspace Lessig, Lawrence Digital Copyright: Protecting #1573928895 Intellectual Property on the Internet Litman, Jessica Price 1 9/1/03 C1001 Bezos Jeff 2 9/2/03 C1004 Sproull Lee 1 Amazon Plaza Dean's Office, Stern School, New York 3 9/3/03 C1002 Student Pat Tisch LC-12, New York #0072952849 MIS in the Information Age Haag, Stephen $98.75 4 9/4/03 C1003 Gates Bill Microsoft Corporation, Redmond Barabasi, AlbertLaszlo $34.95 5 9/5/03 C1003 Gates Bill Microsoft Corporation, Redmond Rheingold, Howard $29.95 6 9/6/03 C1001 Bezos Jeff Rheingold, Howard $29.95 7 9/7/03 C1002 Student Pat Litman, Jessica $55.00 8 9/8/03 C1001 Bezos Jeff Rheingold, Howard $29.95 Linked: The New Science of #0738206679 Networks Smart Mobs: The Next Social #0738206083 Revolution Smart Mobs: The Next Social 1 Amazon Plaza #0738206083 Revolution Digital Copyright: Protecting Tisch LC-12, New York #1573928895 Intellectual Property on the Internet Smart Mobs: The Next Social 1 Amazon Plaza #0738206083 Revolution $25.00 $55.00 RELATIONAL DATABASES • Databases that use a series of logically related twodimensional tables to store their information • Tables are comprised of fields/records, which in turn contain field values Student Field Record Field value Table Last Name SS# DOB Major Smith 100201122 06/11/84 IS Kim 200202222 1/1/85 FIN Davis 300201232 12/31/81 MKT Pat 999132212 3/3/88 ACC RELATIONAL DATABASES Relational Database Tables Records Fields Field values Bytes, bits Field Record Field value Student Table Last Name SS# DOB Major Smith 100201122 06/11/84 IS Kim 200202222 1/1/85 FIN Davis 300201232 12/31/81 MKT Pat 999132212 3/3/88 ACC ADVANTAGES 1. Consistency • We can restrict the values of certain fields (e.g. dates, integers) • We can impose other kinds of constraints (all costs must be positive, last names must be included, orders must have addresses) • Data look the same to all users at the same time. 2. Centralization • Many different users can edit and view the data simultaneously. Efficient sharing of information. 3. Efficient Querying • SQL and other query languages can be used to create complex reports quickly PROBLEMS WITH EXCEL? When should you use a database instead of Excel? – Insertion anomalies – Deletion anomalies – Update anomalies } Data Quality Problems Should we just create multiple workbooks in Excel? – The real power of a database: Querying – How would you answer the following question in Excel? – Find customers that spend on average $50 per book order, that live on West Coast or on the East Coast (but not in Midwest) and whose annual income is at least $150K INSERTION ANOMALIES • Inability to insert a piece of information about an object without having to insert a (bogus) piece of information about something else • Example: Adding a new customer/book before it is ordered How can you add the book “Harry Potter” in the file below? Order# Date Customer ID Last Name First Name Address ISBN Book Name Author 1 9/1/03 C1001 Bezos Jeff 2 9/2/03 C1004 Sproull Lee 1 Amazon Plaza Dean's Office, Stern School, New York 3 9/3/03 C1002 Student Pat Tisch LC-12, New York #0072952849 MIS in the Information Age Haag, Stephen $98.75 Barabasi, AlbertLaszlo $34.95 Rheingold, Howard $29.95 Rheingold, Howard $29.95 Litman, Jessica $55.00 Rheingold, Howard $29.95 4 9/4/03 C1003 Gates Bill Microsoft Corporation, Redmond 5 9/5/03 C1003 Gates Bill Microsoft Corporation, Redmond 6 9/6/03 C1001 Bezos Jeff 7 9/7/03 C1002 Student Pat 8 9/8/03 C1001 Bezos Jeff #0465039138 Code and other laws of cyberspace Lessig, Lawrence Digital Copyright: Protecting #1573928895 Intellectual Property on the Internet Litman, Jessica Price Linked: The New Science of #0738206679 Networks Smart Mobs: The Next Social #0738206083 Revolution Smart Mobs: The Next Social 1 Amazon Plaza #0738206083 Revolution Digital Copyright: Protecting Tisch LC-12, New York #1573928895 Intellectual Property on the Internet Smart Mobs: The Next Social 1 Amazon Plaza #0738206083 Revolution $25.00 $55.00 DELETION ANOMALIES • The loss of a piece of information about one object when a piece of information about a different object is deleted • Example: Deleting order 2 => deleting customer Lee Sproull • Example: Deleting order 1 => deleting book “Code…” Order# Date Customer ID Last Name First Name Address ISBN Book Name Author 1 9/1/03 C1001 Bezos Jeff 2 9/2/03 C1004 Sproull Lee 1 Amazon Plaza Dean's Office, Stern School, New York 3 9/3/03 C1002 Student Pat Tisch LC-12, New York #0072952849 MIS in the Information Age Haag, Stephen $98.75 Barabasi, AlbertLaszlo $34.95 Rheingold, Howard $29.95 Rheingold, Howard $29.95 Litman, Jessica $55.00 Rheingold, Howard $29.95 4 9/4/03 C1003 Gates Bill Microsoft Corporation, Redmond 5 9/5/03 C1003 Gates Bill Microsoft Corporation, Redmond 6 9/6/03 C1001 Bezos Jeff 7 9/7/03 C1002 Student Pat 8 9/8/03 C1001 Bezos Jeff #0465039138 Code and other laws of cyberspace Lessig, Lawrence Digital Copyright: Protecting #1573928895 Intellectual Property on the Internet Litman, Jessica Price Linked: The New Science of #0738206679 Networks Smart Mobs: The Next Social #0738206083 Revolution Smart Mobs: The Next Social 1 Amazon Plaza #0738206083 Revolution Digital Copyright: Protecting Tisch LC-12, New York #1573928895 Intellectual Property on the Internet Smart Mobs: The Next Social 1 Amazon Plaza #0738206083 Revolution $25.00 $55.00 UPDATE ANOMALIES • A need to change the same piece of information about an object multiple times • Example: Changing Jeff Bezos address in order 1 leaves orders 6 and 8 unchanged… Order# Date Customer ID Last Name First Name Address ISBN Book Name Author 1 9/1/03 C1001 Bezos Jeff 2 9/2/03 C1004 Sproull Lee 1 Amazon Plaza Dean's Office, Stern School, New York 3 9/3/03 C1002 Student Pat Tisch LC-12, New York #0072952849 MIS in the Information Age Haag, Stephen $98.75 Barabasi, AlbertLaszlo $34.95 Rheingold, Howard $29.95 Rheingold, Howard $29.95 Litman, Jessica $55.00 Rheingold, Howard $29.95 4 9/4/03 C1003 Gates Bill Microsoft Corporation, Redmond 5 9/5/03 C1003 Gates Bill Microsoft Corporation, Redmond 6 9/6/03 C1001 Bezos Jeff 7 9/7/03 C1002 Student Pat 8 9/8/03 C1001 Bezos Jeff #0465039138 Code and other laws of cyberspace Lessig, Lawrence Digital Copyright: Protecting #1573928895 Intellectual Property on the Internet Litman, Jessica Price Linked: The New Science of #0738206679 Networks Smart Mobs: The Next Social #0738206083 Revolution Smart Mobs: The Next Social 1 Amazon Plaza #0738206083 Revolution Digital Copyright: Protecting Tisch LC-12, New York #1573928895 Intellectual Property on the Internet Smart Mobs: The Next Social 1 Amazon Plaza #0738206083 Revolution $25.00 $55.00 MODELING DATA WITH ENTITYRELATIONSHIP DIAGRAMS ENTITY RELATIONSHIP DIAGRAM • The aim of an ERD is to model the data within the Information System. • Provides a CONCEPTUAL DATA MODEL: a concept of the system, independent of implementation 1. What data should be stored? 2. What relationships exist between items of data? ENTITIES An actual, real thing or person about which data might be stored is referred to as an entity. An entity can be uniquely identified. Organizations collect and store data about entities: • if a bank stores data about you - you are an entity • if a business stores a piece of paper called an invoice - the invoice is an entity • a library stores data about a particular book - the book is an entity RELATIONSHIPS • Entities are associated with each other via relationships. • A relationship is a named association between two or more entity types: Player Plays for Team DEFINING RELATIONSHIPS Entity-relationship (E-R) diagram: a graphic method of representing entity classes and their relationships. • • • • • Rectangle – entity class Dotted line – relationship | – single relationship O – zero or optional relationship Crow’s foot () – multiple relationship The types of relationships reflect the business rules applicable to the entities SIMPLE HOSPITAL EXAMPLE In a hospital system, each ward has many patients who are cared for by nurses assigned to the specific ward. Patients may require treatment by more than one specialist doctor. A patient belongs to only one ward. SIMPLE HOSPITAL EXAMPLE In a hospital system, each ward has many patients who are cared for by nurses assigned to the specific ward. Patients may require treatment by more than one specialist doctor. A patient belongs to only one ward. SIMPLE HOSPITAL SYSTEM • Ward has many patients (1:N) • Patients are cared for by nurses (N:M) • Ward has assigned many nurses (1:N) • Patients require treatment by one or more doctor (N:M) SIMPLE HOSPITAL SYSTEM ERD WARD has assigned NURSE DOCTOR accommodates cares for treats PATIENT UNIVERSITY DATABASE EXAMPLE A department has many Professors. A Professor belongs to only one department. The department offers many different courses, and many Professors can teach a single course. A Professor can also teach more than one course. Students enroll for many courses and courses have many students. A course belongs to only one department. FITTING DATA INTO THE RELATIONAL MODEL NORMALIZATION NORMALIZING AMAZON’S DATA • The process of assuring that a database can be implemented effectively as a set of two-dimensional tables • Unlike Excel though, the tables are connected • Prevents insertion, deletion and update anomalies CONNECTING TABLES TOGETHER Primary keys – A field (or group of fields in some cases) that uniquely describes each record in a table – Examples: Customer ID, ISBN, Order# Foreign keys Each record should have a unique primary key – A field that is a primary key in one table and appears in a different table (though not as the primary key) – Examples: Customer ID in Orders Integrity constraints – Rules that help ensure data quality DATABASE SCHEMA Summary of the logical structure of your database – The tables in your database, along with each of their fields, keys – The relationships between the tables Primary key Primary key Primary key Foreign key Foreign key NEXT CLASS: SQL • Do the SQL tutorial at http://sqlzoo.net/ (at least sections 0-3) • Download the “Facebook” database from Blackboard and make sure you can open it in MS Access • Interest in OSS databases? • Bring a laptop if you want.