Designing MS-Access Tables: Relational Database Concepts
Paul A. Harris, Ph.D.
General Clinical Research Center

Introduction
Database design (data modeling) is crucial for the long-term management of information.
For many users, the first experience using MS-Access (or any RDBMS) is confusing.
A major cause of confusion is the design and use of tables.

Agenda
Discuss relational database concepts
- Keys and relationships
- Normalization
- Strategy
Fields
- Types
- Demonstration
Referential Integrity

Overview
MS-Access is a relational database engine plus a set of integrated development tools:
Tables = data
Queries = combine tables and ask questions
Forms/reports = user interface
Macros/code = add functionality
[Diagram: Tables, Queries, Forms, Reports, Macros, Code]

Relational Database Concepts - Keys
Keys are pieces of data that help identify a row of information in a table.
A primary key (PK) uniquely identifies an entire row of data. It 1) must have a value (cannot be null); 2) should never change; and 3) must have a unique value for each record in the table.
- Look for a logical field that meets these criteria.
- If no logical field exists, invent one (AutoNumber).
Foreign keys (FK) are fields in one table that refer back to another table's primary key.
- Make sure the foreign key's data type matches the related PK (for an AutoNumber PK, use a Number field with the Long Integer field size).

Relational Database Concepts - Relationships
In an RDBMS, tables are linked through relationships, which may be one-to-one, one-to-many, or many-to-many. One-to-many should be the most common.
One-to-One: one record in Table A relates to one record in Table B (demographics table – DNA table).
One-to-Many: one record in Table A relates to many records in Table B (gender table – demographics table).
Many-to-Many: many records in Table A relate to many records in Table B (avoid these).
Strive for one-to-many relationships, implemented as PK/FK pairs.

Relational Database Concepts - Normalization
A series of rules developed by E.F.
Codd (IBM) in the 1970s; integral to the relational database model.
First Normal Form (1NF): each column must contain only one value (atomic, discrete data storage).
Second Normal Form (2NF): 1NF + any column that is not a key must relate only to the whole primary key.
Third Normal Form (3NF): 2NF + every non-key column is independent of every other non-key column.

Relational Database Concepts - Normalization – First Normal Form
Each column (field) must contain only one value:
- Identify any field that contains multiple pieces of information (e.g., address).
- Break up problem fields into separate fields (address1, city, state, zip).

Relational Database Concepts - Normalization – Second Normal Form
1NF + every non-key column depends on the entire primary key (not on just part of it).
- Identify any fields that do not relate directly to the primary key.
- Create new tables accordingly.
- Assign or create new primary keys.
- Create the requisite foreign keys indicating relationships.

Relational Database Concepts - Normalization – Third Normal Form
2NF + every non-key column is independent of every other non-key column.
- Within a table, test whether any non-key field determines the value of another non-key field.

Relational Database Concepts - Table Design and Normalization Strategy
Eliminate redundancy.
Think about units; this will help with 1NF atomicity.
Strive for a one-field primary key; use AutoNumbers if needed.
Think first about the most important data table (the most important measurements), then work outward from there to normalize.
Think about the questions you'll be asking of your data, then think about how your tables can be combined to answer them.
Avoid many-to-many relationships; one-to-many relationships are cleaner and avoid problems in the long run.
Don't be afraid to break a normalization rule if it is silly for your application.
Work it out on paper first, then mock it up in MS-Access and test whether queries linking the tables can answer your business questions.

Fields – Common Types
Text - Text or combinations of text and numbers, as well as numbers that
don't require calculations, such as phone numbers. Up to 255 characters.
Memo - Lengthy text or combinations of text and numbers. Up to 65,535 characters.
Number - Numeric data used in mathematical calculations.
Date/Time - Date and time values for the years 100 through 9999.
AutoNumber - A unique sequential (incremented by 1) or random number assigned by Microsoft Access whenever a new record is added to a table. AutoNumber fields can't be updated.
Yes/No - Fields that contain only one of two values (Yes/No, True/False, or On/Off).
OLE Object - An object (such as a Microsoft Excel spreadsheet, a Microsoft Word document, graphics, sounds, or other binary data) linked to or embedded in a Microsoft Access table.

Demonstration

Referential Integrity
"Referential integrity is a system of rules that Microsoft Access uses to ensure that relationships between records in related tables are valid, and that you don't accidentally delete or change related data." (from MS-Help)
Ensures that data stays valid across related tables.
- Cascade Update: changing a primary key value updates the matching foreign keys in related tables.
- Cascade Delete: deleting a record also deletes its related records in other tables.

Summary – Paul's Laws
Think about the entire project and design tables (1st cut) before touching the keyboard.
Formulate data questions to determine the best table scheme (How many people took drug A and gender = F and …). Leave wiggle room.
Spend time normalizing, but don't turn a 2-day project into a 2-month project.
You're not eBay: you can get by with less than perfect performance as long as you can answer your questions and the application is flexible enough to grow.
Think about the central table and questions first, then work outward to define adjunct tables.
Design enough tables to make things work, but don't go overboard. I usually try to get by with as few as possible while remaining true to the spirit of normalization.
Strive to store data once; don't store calculations.
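The laws above (design around your questions, one-field keys, one-to-many links, referential integrity, store data once) can be tried out in a small mock-up before building the real thing. The sketch below uses Python's built-in sqlite3 module as a stand-in for MS-Access, so the SQL and types differ from Access: SQLite's INTEGER PRIMARY KEY plays the role of an AutoNumber, and foreign keys are only enforced after PRAGMA foreign_keys = ON. The tables (gender, demographics, dosing), columns, and rows are hypothetical, loosely following the slide examples. It shows a one-to-many lookup, a one-to-many child table with cascade update/delete, atomic address fields, a zip code stored as text, and a join that answers "How many people took drug A and gender = F?".

```python
import sqlite3

# Hypothetical schema (SQLite standing in for MS-Access):
# gender (lookup) 1-to-many demographics (central table) 1-to-many dosing.
SCHEMA = """
CREATE TABLE gender (
    gender_id   INTEGER PRIMARY KEY,   -- single-field PK
    gender_code TEXT NOT NULL UNIQUE
);
CREATE TABLE demographics (
    subject_id INTEGER PRIMARY KEY,    -- auto-assigned if omitted, like AutoNumber
    gender_id  INTEGER NOT NULL REFERENCES gender (gender_id),  -- FK, same type as PK
    city       TEXT,                   -- address split into atomic fields (1NF)
    state      TEXT,
    zip        TEXT                    -- digits, but never used in arithmetic
);
CREATE TABLE dosing (
    dose_id    INTEGER PRIMARY KEY,
    subject_id INTEGER NOT NULL
        REFERENCES demographics (subject_id)
        ON UPDATE CASCADE ON DELETE CASCADE,
    drug       TEXT NOT NULL
);
"""

# "How many people took drug X and gender = Y?" answered by joining the tables.
QUESTION = """
SELECT COUNT(DISTINCT d.subject_id)
FROM demographics AS d
JOIN gender AS g ON g.gender_id = d.gender_id
JOIN dosing AS x ON x.subject_id = d.subject_id
WHERE x.drug = ? AND g.gender_code = ?
"""


def build_demo_db() -> sqlite3.Connection:
    """Create and populate an in-memory database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is on for the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO gender VALUES (?, ?)", [(1, "F"), (2, "M")])
    conn.executemany(
        "INSERT INTO demographics VALUES (?, ?, ?, ?, ?)",
        [(1, 1, "Town1", "ST", "00001"),
         (2, 2, "Town2", "ST", "00002"),
         (3, 1, "Town3", "ST", "00003")],
    )
    # dose_id is omitted so SQLite assigns it automatically.
    conn.executemany(
        "INSERT INTO dosing (subject_id, drug) VALUES (?, ?)",
        [(1, "A"), (1, "B"), (2, "A"), (3, "A"), (3, "A")],
    )
    return conn


def count_subjects(conn: sqlite3.Connection, drug: str, gender_code: str) -> int:
    """Count distinct subjects who took `drug` and have `gender_code`."""
    return conn.execute(QUESTION, (drug, gender_code)).fetchone()[0]


def violates_integrity(conn: sqlite3.Connection, sql: str, params=()) -> bool:
    """Return True if SQLite rejects the statement with an IntegrityError."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False


if __name__ == "__main__":
    conn = build_demo_db()
    print(count_subjects(conn, "A", "F"))  # 2
    # An orphan dosing row (there is no subject 99) is rejected.
    print(violates_integrity(conn, "INSERT INTO dosing (subject_id, drug) VALUES (99, 'A')"))  # True
    # Cascade delete: removing subject 1 also removes that subject's 2 dosing rows.
    conn.execute("DELETE FROM demographics WHERE subject_id = 1")
    print(conn.execute("SELECT COUNT(*) FROM dosing").fetchone()[0])  # 3
    # Cascade update: renumbering subject 3 carries its dosing rows along.
    conn.execute("UPDATE demographics SET subject_id = 30 WHERE subject_id = 3")
    print(conn.execute("SELECT COUNT(*) FROM dosing WHERE subject_id = 30").fetchone()[0])  # 2
    # Gender 'F' is still referenced by subject 30, so deleting it is blocked.
    print(violates_integrity(conn, "DELETE FROM gender WHERE gender_code = 'F'"))  # True
```

In Access itself, the matching settings are the Enforce Referential Integrity, Cascade Update Related Fields, and Cascade Delete Related Records options in the Edit Relationships dialog. Cascade delete is left off for the gender lookup on purpose, so a lookup value in use cannot silently take subject records with it.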
Where to Get More Information
Most database books have one chapter on table design and normalization; I like the Visual QuickPro Guide series of technical help books.
Google search for "database normalization tutorial".