Databases CS 110 Fall 2005 Homework Read “Introduction to Access” from text For Tuesday: turn in index card with three questions about reading Two great issues in computing Programs • Instructions on how to manipulate given data Data • Representations of information Representations Data are a specific slice of reality Many representations can organize data Unstructured Text Steve Alan Mary Damon Ordered Text Alan Damon Mary Steve Graph I H G F Tom Raleigh NC Sally Easton MD Mary C D E Tables Hierarchy B A John Liz Laura Representations Same information representable in many ways • Representation affects ease with which certain operations can be done The ease with which programs/algorithms can execute Representational decisions are crucial Databases are a Representation Databases store relationships • Names are paired with phone numbers • Children are matched with parents Algorithms act on databases to extract information • Find all the Republican soccer moms • Is the sequence ATTATAA in the gene? Basic Operations on Databases Find/Query Display Add/Insert Delete Change/Modify Example: File system Stores “files” Uses hierarchical representation Operations • Display • Find • Add • Delete • Modify Example: File system Stores “files” Uses hierarchical representation Operations • Display • Find • Add • Delete • Modify Why this representation? What’s good/bad about this? Hierarchical File Systems Metaphor for file cabinet with folders • Hides complexity but makes it available List files in this directory Go up to parent directory Go down into child directory • Easy to describe where to find something Go up two directories… go to directory foo.. then bar… Hierarchical File Systems Problems with hierarchical file systems • Where’s my file?!? • File can’t be in two places at once • Reorganization not cheap Google Desktop Other Representations What do Google Desktop and Microsoft “Search” do? Superimpose another representation on top of hierarchical one Time to Find File “foo” You have to look at every file you have • If you have n files, it will require n “lookups” • Order: O(n) Time to Find “foo” Order all the files first • Sorting takes some time: nearly O(n2) • Finding file still takes O(n) • Why? Time to Find “foo” Use Indexing • Create a smaller list that “points” to special landmarks in data These are the tabbed pages in dictionaries • First find appropriate tab • Find “foo” in subsection of data • Order of search is function of number of tabs and size of subsections Indexed Databases Modern databases keep multiple indexes • “find” is faster • “add” and “delete” may be more expensive if reindexing is required Finding Database Records Remember the Relational Model? • Multiple tables with relationships between them Finding appropriate records requires operations • If they have compatible sizes Find records in one table but not the other Find records common to both tables Combine two tables • Find records in table that meet requirements Less than 3 Equal the word “simpleton” Operators on Relational DBs Join ( ) • Join table R with S subject to: A 1 4 7 Column B < Column D B 2 5 8 R C 3 6 9 D 3 6 E 1 2 S A 1 1 4 B 2 2 5 C 3 3 6 D 3 6 6 E 1 2 2 R joined with S Return Tests Test Section Mean 1 67 Median Standard Deviation 68 16 2 67 68 14 3 65 66 16 Histograms cs110-1 Midterm Scores 30 24 Number of Students 25 20 20 15 17 11 10 7 5 5 0 cs110-2 Midterm Scores 30 25 Poitns Scored 25 24 20 15 14 13 13 10 5 2 0 cs110-3 Midterm scores Number of Students 30 27 25 20 21 17 18 15 15 10 5 0 3