INFO100 and CSE100
Fluency with Information Technology
Katherine Deibel
2012-02-29 Katherine Deibel, Fluency in Information Technology 1

We are learning about data management
We discussed spreadsheets
We will get into databases now
Lab 9 will get you involved in using database software (Access)
Project 3 will have you use both spreadsheets and databases

Databases are collections of information given a structure
We have done this before:
  XHTML describes the layout of info on a page
  CSS describes the styling of information
  JavaScript describes the computation of info
  Spreadsheets describe data organization and flow of calculations
The repeated lesson: Give the computer structure so it can help!

Some of us want to compute, but all of us want information …
Most archived information is in tables
Databases enhance many applications
Databases introduce interesting ideas
Still, there is a lot of overlap with what spreadsheets can do

Before relational databases, there were only “flat files”
Structural information was difficult to describe
All processing of information was “special cased” and required custom programs
Information was repeated in multiple places and hard to keep consistent
A change in the format of one file meant all related programs had to be changed

Relational databases were invented in 1970 by Ted Codd
Motivation:
The adverse impact on development productivity of requiring programmers to navigate along access paths to reach target data [...] was enormous. In addition, it was not possible to make slight changes in the layout in storage without simultaneously having to revise all programs that relied on the previous structure. [...] As a result, far too much manpower was being invested in continual (and avoidable) maintenance of application programs.

Metadata
Focusing on the relationships between the data entries
Manipulating data tables through operations on the tables
Separating the physical and logical aspects of the database

Data about data about data about…

Metadata is
  Data about data
  The key to making computers more useful
A database is composed of data and its metadata
Metadata was not available to computers in the past

Bits and bytes encode the information, but that’s not all
Tags can encode format and structure
Example uses:
  word processors
  HTML
  Oxford English Dictionary

byte (baIt). Computers. [Arbitrary, prob. influenced by bit sb.4 and bite sb.] A group of eight consecutive bits operated on as a unit in a computer.
1964 Blaauw & Brooks in IBM Systems Jrnl. III. 122 An 8-bit unit of information is fundamental to most of the formats [of the System/360]. A consecutive group of n such units constitutes a field of length n. Fixed-length fields of length one, two, four, and eight are termed bytes, halfwords, words, and double words respectively.
1964 IBM Jrnl. Res. & Developm. VIII. 97/1 When a byte of data appears from an I/O device, the CPU is seized, dumped, used and restored.
1967 P. A. Stark Digital Computer Programming xix. 351 The normal operations in fixed point are done on four bytes at a time.
1968 Dataweek 24 Jan. 1/1 Tape reading and writing is at from 34,160 to 192,000 bytes per second.

<e><hg><hw>byte</hw> <pr><ph>baIt</ph></pr></hg>. <la>Computers</la>.
<etym>Arbitrary, prob. influenced by <xr><x>bit</x></xr> <ps>n.<hm>4</hm></ps>and <xr><x>bite</x> <ps>n.</ps> </xr></etym>
<s4>A group of eight consecutive bits operated on as a unit in a computer.</s4>
<qp><q><qd>1964</qd><a>Blaauw</a> &amp. <a>Brooks</a> <bib>in</bib> <w>IBM Systems Jrnl.</w> <lc>III. 122</lc> <qt>An 8-bit unit of information is fundamental to most of the formats <ed>of the System/360</ed>.&es.A consecutive group of <i>n</i> such units constitutes a field of length <i>n</i>.&es.Fixed-length fields of length one, two, four, and eight are termed bytes, halfwords, words, and double words respectively. </qt></q>
<q><qd>1964</qd> <w>IBM Jrnl. Res. &amp. Developm.</w> <lc>VIII. 97/1</lc> <qt>When a byte of data appears from an I/O device, the CPU is seized, dumped, used and restored.</qt></q>
<q><qd>1967</qd> <a>P. A. Stark</a> <w>Digital Computer Programming</w> <lc>xix. 351</lc> <qt>The normal operations in fixed point are done on four bytes at a time.</qt></q>
<q><qd>1968</qd> <w>Dataweek</w> <lc>24 Jan. 1/1</lc> <qt>Tape reading and writing is at from 34,160 to 192,000 bytes per second.</qt></q></qp></e>

The two most important kinds of metadata for us are tags and schemas
Tags: <population>305,471,002</population>
Schemas: descriptions of tables and the kinds of values they can store

The Extensible Markup Language has become the standard way to add metadata to data
Its success is largely driven by the Web
Example:
<demogData>
  <country>Canada</country>
  <population>32805041</population>
  <fertility>1.61</fertility>
  <infant>5</infant>
  <lifeExpct>80.1</lifeExpct>
</demogData>

The best part of XML is that YOU think up the tags
A “self-describing language”
There are no tags to learn!!!
That’s why it is called “extensible”
You are already an expert on XML

Tags are like XHTML
  <start> … </start>
  Must be properly nested
Allowed characters
  Alphanumeric and _
  No spaces!
Everything must be tagged

When we tag in XML, we use tags in different ways
Identity: Say what something is
Affinity: Say which properties go together
Collection: Group like things together
<archipelago>
  <island>
    <iName>Isabela</iName>
    <area>4588</area>
    <elev>1707</elev>
  </island>
  <island>
    <iName>Fernandina</iName>
    <area>642</area>
    <elev>1494</elev>
  </island>
  <island>
    <iName>Tower</iName>
    <area>14</area>
    <elev>76</elev>
  </island>
</archipelago>

Not really a fortress…
More a specialized furniture store

Databases are typically in XML
All relational databases use XML
Not all XML databases are relational
The difference: Relational databases place further restrictions on the XML

General XML approach
  Best when the data is not rigidly structured
  More of an ad hoc organization
Relational database approach
  Data comes with a rigid structure
  Happens very frequently
Humans (and the computers we make) really really really like structure

A relational database consists of
  Multiple tables of data
  Descriptions of the relationships between the various tables
Sounds simple… and it kind of is

Information is stored in tables
Each table consists of entities of one kind
Each entity has a set of characteristics known as attributes
Table rows are tuples of these attributes
Each tuple must have a unique primary key
Relationships among the data are stored
The table structure is called a schema
The table contents are an instance

Tables have names, attributes, tuples
[Figure: a table with its primary key and instance labeled]
Schema:
  ID     number   unique number (key)
  Last   text     person’s last name
  First  text     person’s first name
  Hire   date     first day on job
  Addr   text     street address

Databases are composed of multiple tables
BUT DATA SHOULD NOT BE REPEATED!!
Replicated data can differ in its different locations, e.g. multiple copies of an address can differ
Inconsistent data is worse than no data
Solution: Keep a single copy of any data. If it is needed in multiple places, associate it with a key and store the key rather than the data

When looking for information, a single item or a table of answers is possible
“Who is currently taking FIT100?”
  Result: Table of students
“Who won the 1940 Best Actor Oscar?”
  Result: A table containing only a single row
“In what years has the US won the men’s World Cup?”
  Result: Empty table
A query to a database produces a table

Scalpel… Sponge… Union… Join…

There are five primitive operations on tables to create new tables:
  Select: pick rows from a table
  Project: pick columns from a table
  Union: combine two tables w/like columns
  Difference: remove one table from another
  Product: create “all pairs” from two tables
Another fundamental operation is "Join":
  Join: Combine tables based on common fields

Select creates a table from the rows of another table meeting a criterion
Select from Example On Hire < 1993

Project creates a table from the columns of another table
Project Last, First From Example

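The Select and Project slides can be tried out in a few lines of Python. This is only a sketch: it assumes a table is a list of dictionaries (one per row), and the three rows below are made up for illustration. Only the column names (ID, Last, First, Hire, Addr) come from the schema slide, and the helper names `select` and `project` are our own.

```python
from datetime import date

# Hypothetical rows for the "Example" table; only the column names
# come from the schema slide.
example = [
    {"ID": 1, "Last": "Lee", "First": "Ana", "Hire": date(1990, 5, 1), "Addr": "12 Oak St"},
    {"ID": 2, "Last": "Cho", "First": "Ben", "Hire": date(1995, 9, 15), "Addr": "8 Elm Ave"},
    {"ID": 3, "Last": "Diaz", "First": "Cal", "Hire": date(1992, 1, 20), "Addr": "30 Pine Rd"},
]


def select(table, criterion):
    """Select: keep the rows for which criterion(row) is true."""
    return [row for row in table if criterion(row)]


def project(table, columns):
    """Project: keep only the named columns, dropping duplicate rows."""
    result = []
    for row in table:
        narrowed = {col: row[col] for col in columns}
        if narrowed not in result:
            result.append(narrowed)
    return result


# Select from Example On Hire < 1993
hired_before_1993 = select(example, lambda row: row["Hire"].year < 1993)

# Project Last, First From Example
names = project(example, ["Last", "First"])

print(hired_before_1993)
print(names)
```

Both operations return a new table and leave `example` unchanged, which matches the idea that every operation on tables produces a table.
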
Union (written like addition) combines two tables with the same attributes
PoliticalUnits = States + Provinces

Difference (written like subtraction) removes one table’s rows from another
Eastern = States - WestCoast

Product (written like multiplication) combines columns and pairs all rows
Colors = Blues x Reds
Column Rule: If A has x columns and B has y columns, then A x B has x+y columns
Row Rule: If A has m rows and B has n rows, then A x B has m∙n rows

To the right is a man who divides database tables. Do you want to be like him?
Seriously though
  Division operations do exist
  Advanced database topic
  Not used in regular practice

Join (written like a bow tie) combines rows if a common field matches
Homes = States ⨝ Students

The five DB operations can create any table from a given set of tables
Join is not primitive, but can be built from the five primitives
Join, select and project are used most often
All modern database systems are built on these relational operations
The operations are not usually used directly, but are used indirectly from other languages
The SQL database language is one such example

Databases are a big topic
  Physical versus logical databases
  Constructing and designing a database
  More on operations and queries
  More about XML

Like many aspects of computer fluency, understanding databases is about understanding structure
  Defining structure
  Manipulating structure
Databases are based around the simple notion of tables
New tables are built from existing tables using operations
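
To see how new tables are built from existing ones, here is a rough Python sketch of union, difference, and product, plus a join assembled from product, select, and project. It assumes a table is a list of dictionaries and that the two tables given to `product` share no column names. The table contents, column names, and function names are illustrative assumptions, not part of the lecture.

```python
def union(a, b):
    """Union: all rows from both tables, without duplicates."""
    result = []
    for row in a + b:
        if row not in result:
            result.append(row)
    return result


def difference(a, b):
    """Difference: rows of a that do not appear in b."""
    return [row for row in a if row not in b]


def product(a, b):
    """Product: pair every row of a with every row of b.
    Assumes the two tables share no column names."""
    return [{**row_a, **row_b} for row_a in a for row_b in b]


def select(table, criterion):
    """Select: keep the rows for which criterion(row) is true."""
    return [row for row in table if criterion(row)]


def project(table, columns):
    """Project: keep only the named columns, dropping duplicate rows."""
    result = []
    for row in table:
        narrowed = {col: row[col] for col in columns}
        if narrowed not in result:
            result.append(narrowed)
    return result


def join(a, b, col_a, col_b):
    """Join built from primitives: product, then select the pairs whose
    fields match, then project away the repeated column."""
    matched = select(product(a, b), lambda row: row[col_a] == row[col_b])
    if not matched:
        return []
    keep = [col for col in matched[0] if col != col_b]
    return project(matched, keep)


# Hypothetical sample tables.
states = [
    {"Name": "Washington", "Capital": "Olympia"},
    {"Name": "Oregon", "Capital": "Salem"},
    {"Name": "Maine", "Capital": "Augusta"},
]
provinces = [{"Name": "Ontario", "Capital": "Toronto"}]
west_coast = states[:2]
students = [
    {"Student": "Ana", "Home": "Washington"},
    {"Student": "Ben", "Home": "Maine"},
]

political_units = union(states, provinces)      # PoliticalUnits = States + Provinces
eastern = difference(states, west_coast)        # Eastern = States - WestCoast
pairs = product(states, students)               # 3 x 2 = 6 rows, 2 + 2 = 4 columns
homes = join(states, students, "Name", "Home")  # Homes = States ⨝ Students

print(homes)
```

The `pairs` table follows the row and column rules for product: 3 ∙ 2 = 6 rows and 2 + 2 = 4 columns. Real database systems compute joins far more efficiently than by forming the full product first; the point here is only that join needs nothing beyond the primitives.
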