INFO100 and CSE100
Fluency with Information Technology
Katherine Deibel
2012-02-29
Katherine Deibel, Fluency in Information Technology
1

We are learning about data management
  - We discussed spreadsheets
  - We will get into databases now

Lab 9 will get you involved in using database software (Access)
Project 3 will have you use both spreadsheets and databases

Databases are collections of information given a structure

We have done this before:
  - XHTML describes the layout of info on a page
  - CSS describes the styling of information
  - JavaScript describes the computation of info
  - Spreadsheets describe data organization and flow of calculations

The repeated lesson:
Give the computer structure so it can help!

Some of us want to compute, but all of us want information …
  - Most archived information is in tables
  - Databases enhance many applications
  - Databases introduce interesting ideas

Still, there is a lot of overlap with what spreadsheets can do

Before relational databases, there were only “flat files”
  - Structural information was difficult to describe
  - All processing of information was “special cased” and required custom programs
  - Information was repeated in multiple places and hard to keep consistent
  - A change in the format of one file meant all related programs had to be changed

Relational databases were invented in 1970 by Ted Codd

Motivation:
The adverse impact on development productivity of requiring programmers
to navigate along access paths to reach target data [...] was enormous.
In addition, it was not possible to make slight changes in the layout in
storage without simultaneously having to revise all programs that relied
on the previous structure. [...] As a result, far too much manpower was
being invested in continual (and avoidable) maintenance of application
programs.




- Metadata
- Focusing on the relationships between the data entries
- Manipulating data tables through operations on the tables
- Separating the physical and logical aspects of the database

Data about data about data about…

Metadata is
  - Data about data
  - The key to making computers more useful

A database is composed of data and its metadata
  - Metadata was not available to computers in the past



Bits and bytes encode the information, but that’s not all
Tags can encode format and structure
Example uses:
  - word processors
  - HTML
  - Oxford English Dictionary

byte (baIt). Computers. [Arbitrary, prob. influenced by bit sb.4 and bite sb.] A group of eight
consecutive bits operated on as a unit in a computer. 1964 Blaauw & Brooks in IBM Systems
Jrnl. III. 122 An 8-bit unit of information is fundamental to most of the formats [of the
System/360]. A consecutive group of n such units constitutes a field of length n. Fixed-length
fields of length one, two, four, and eight are termed bytes, halfwords, words, and double words
respectively. 1964 IBM Jrnl. Res. & Developm. VIII. 97/1 When a byte of data appears from an
I/O device, the CPU is seized, dumped, used and restored. 1967 P. A. Stark Digital Computer
Programming xix. 351 The normal operations in fixed point are done on four bytes at a time.
1968 Dataweek 24 Jan. 1/1 Tape reading and writing is at from 34,160 to 192,000 bytes per
second.
<e><hg><hw>byte</hw> <pr><ph>baIt</ph></pr></hg>. <la>Computers</la>.
<etym>Arbitrary, prob. influenced by <xr><x>bit</x></xr> <ps>n.<hm>4</hm></ps>and
<xr><x>bite</x> <ps>n.</ps> </xr></etym> <s4>A group of eight consecutive bits
operated on as a unit in a computer.</s4> <qp><q><qd>1964</qd><a>Blaauw</a> &amp.
<a>Brooks</a> <bib>in</bib> <w>IBM Systems Jrnl.</w> <lc>III. 122</lc> <qt>An 8-bit
unit of information is fundamental to most of the formats <ed>of the System/360</ed>.&es.A
consecutive group of <i>n</i> such units constitutes a field of length <i>n</i>.&es.Fixed-length fields of length one, two, four, and eight are termed bytes, halfwords, words, and double
words respectively. </qt></q><q><qd>1964</qd> <w>IBM Jrnl. Res. &amp. Developm.</w>
<lc>VIII. 97/1</lc> <qt>When a byte of data appears from an I/O device, the CPU is seized,
dumped, used and restored.</qt></q> <q><qd>1967</qd> <a>P. A. Stark</a> <w>Digital
Computer Programming</w> <lc>xix. 351</lc> <qt>The normal operations in fixed point are
done on four bytes at a time.</qt></q><q><qd>1968</qd> <w>Dataweek</w> <lc>24 Jan.
1/1</lc> <qt>Tape reading and writing is at from 34,160 to 192,000 bytes per
second.</qt></q></qp></e>


The two most important kinds of metadata for us are tags and schemas

Tags
  - Labels placed around a piece of data to say what it is, e.g.
    <population>305,471,002</population>

Schemas
  - Descriptions of tables and the kinds of values they can store

The Extensible Markup Language (XML) has become a standard way to add metadata to data
  - Its success is largely driven by the Web

Example:
<demogData>
<country>Canada</country>
<population>32805041</population>
<fertility>1.61</fertility>
<infant>5</infant>
<lifeExpct>80.1</lifeExpct>
</demogData>
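
As a rough sketch of why tagged data helps, the record above can be read by a program that looks values up by tag name rather than by position. This uses Python's standard xml.etree.ElementTree module; the variable names are illustrative and not part of the course material.

```python
import xml.etree.ElementTree as ET

# The demographic record from the slide, as an XML string.
record_xml = """
<demogData>
  <country>Canada</country>
  <population>32805041</population>
  <fertility>1.61</fertility>
  <infant>5</infant>
  <lifeExpct>80.1</lifeExpct>
</demogData>
""".strip()

root = ET.fromstring(record_xml)

# Tags act as metadata: each value is found by its name.
country = root.find("country").text
population = int(root.find("population").text)
life_expectancy = float(root.find("lifeExpct").text)

print(country, population, life_expectancy)
```

Reordering the child elements would not change the result, because nothing depends on where a value sits in the record.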

The best part of XML is that YOU think up the tags
  - A “self-describing language”
  - There are no tags to learn!!!

That’s why it is called “extensible”
  - You are already an expert on XML

Tags are like XHTML
  - <start> … </start>
  - Must be properly nested

Allowed characters in tag names
  - Alphanumeric and _
  - No spaces!

Everything must be tagged
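
One way to see these rules in action is to hand small snippets to an XML parser and check whether it accepts them. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

nested_ok = "<island><iName>Tower</iName></island>"
nested_bad = "<island><iName>Tower</island></iName>"  # closing tags overlap
space_in_name = "<island name>Tower</island name>"     # space inside a tag name
untagged = "Tower"                                     # text with no tag around it

for sample in (nested_ok, nested_bad, space_in_name, untagged):
    print(is_well_formed(sample), sample)
```

Only the first sample passes: the others break the nesting rule, the no-spaces rule, and the everything-is-tagged rule in turn.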

When we tag in XML, we use tags in different ways
  - Identity: Say what something is
  - Affinity: Say which properties go together
  - Collection: Group like things together

<archipelago>
  <island>
    <iName>Isabela</iName>
    <area>4588</area>
    <elev>1707</elev>
  </island>
  <island>
    <iName>Fernandina</iName>
    <area>642</area>
    <elev>1494</elev>
  </island>
  <island>
    <iName>Tower</iName>
    <area>14</area>
    <elev>76</elev>
  </island>
</archipelago>
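
The three uses of tags in the archipelago example map naturally onto a table: the collection tag is the table, each affinity tag is a row, and the identity tags are the columns. The following is a minimal sketch of that mapping using Python's standard xml.etree.ElementTree module; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

archipelago_xml = """<archipelago>
  <island><iName>Isabela</iName><area>4588</area><elev>1707</elev></island>
  <island><iName>Fernandina</iName><area>642</area><elev>1494</elev></island>
  <island><iName>Tower</iName><area>14</area><elev>76</elev></island>
</archipelago>"""

root = ET.fromstring(archipelago_xml)

# Collection tag -> table, affinity tag -> row, identity tags -> column names.
rows = [
    {field.tag: field.text for field in island}
    for island in root.findall("island")
]

for row in rows:
    print(row["iName"], row["area"], row["elev"])
```

Values come back as text; converting area and elev to numbers is left out to keep the sketch short.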

Not really a fortress…
More a specialized furniture store

Database contents can be described in XML
  - Any relational table can be written as XML
  - Not all XML data is relational

The difference:
  - Relational databases place further restrictions on the structure of the data

General XML approach
  - Best when the data is not rigidly structured
  - More of an ad hoc organization

Relational database approach
  - Data comes with a rigid structure
  - Happens very frequently
  - Humans (and the computers we make) really really really like structure

A relational database consists of
  - Multiple tables of data
  - Descriptions of the relationships between the various tables

Sounds simple… and it kind of is

Information is stored in tables

Each table consists of entities of one kind

Each entity has a set of characteristics
known as attributes

Tables are made up of tuples (rows) of these attributes

Each tuple must have a unique primary key

Relationships among the data are stored

The table structure is called a schema

The table contents are an instance

Tables have names, attributes, tuples

Schema (primary key: ID):
  Attribute  Type    Description
  ID         number  unique number (key)
  Last       text    person’s last name
  First      text    person’s first name
  Hire       date    first day on job
  Addr       text    street address
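
To make the schema versus instance distinction concrete, here is a small sketch that declares the schema above and then adds a row. It assumes SQLite through Python's standard sqlite3 module rather than Access (the course's tool), and the sample people and addresses are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: attribute names and types, with ID as the primary key.
conn.execute("""
    CREATE TABLE Example (
        ID    INTEGER PRIMARY KEY,
        Last  TEXT,
        First TEXT,
        Hire  TEXT,   -- SQLite has no date type; ISO date text is a common stand-in
        Addr  TEXT
    )
""")

# Instance: the rows currently in the table (made-up sample data).
conn.execute("INSERT INTO Example VALUES (1, 'Lee', 'Ana', '1991-06-03', '12 Elm St')")

# A second row that reuses the same key is rejected by the database.
try:
    conn.execute("INSERT INTO Example VALUES (1, 'Cho', 'Ben', '1995-01-09', '9 Oak Ave')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Example").fetchone()[0]
print(duplicate_rejected, row_count)
```

The schema stays fixed while the instance changes as rows are added or removed.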

Databases are composed of multiple tables

BUT DATA SHOULD NOT BE REPEATED!!
  - Replicated data can differ in its different locations, e.g. multiple copies of an address can differ
  - Inconsistent data is worse than no data

Solution:
  - Keep a single copy of any data, and
  - If it is needed in multiple places, associate it with a key, and store the key rather than the data
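
A minimal sketch of the "store the key, not the data" idea, using plain Python dictionaries and lists as stand-ins for tables. The table names (people, payroll, parking) and all values are hypothetical.

```python
# One copy of each person's address, stored under a key (made-up sample data).
people = {
    101: {"last": "Lee", "first": "Ana", "addr": "12 Elm St"},
    102: {"last": "Cho", "first": "Ben", "addr": "9 Oak Ave"},
}

# Other tables store only the key, never a second copy of the address.
payroll = [{"person_id": 101, "salary": 52000}]
parking = [{"person_id": 101, "space": "B7"}]

# A change is made in exactly one place...
people[101]["addr"] = "40 Pine Rd"

# ...and every table that looks the address up by key sees the same value.
payroll_addr = people[payroll[0]["person_id"]]["addr"]
parking_addr = people[parking[0]["person_id"]]["addr"]
print(payroll_addr, parking_addr)
```

Had payroll and parking each kept their own copy of the address, the update would have had to be repeated in both, and missing one would leave the data inconsistent.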

When looking for information, a single item or a table of answers is possible
  - “Who is currently taking FIT100?”
    Result: Table of students
  - “Who won the 1940 Best Actor Oscar?”
    Result: A table containing only a single row
  - “In what years has the US men’s team won the World Cup?”
    Result: Empty table

A query to a database produces a table

Scalpel… Sponge… Union… Join…

There are five primitive operations on tables to create new tables:
  - Select: pick rows from a table
  - Project: pick columns from a table
  - Union: combine two tables w/like columns
  - Difference: remove one table from another
  - Product: create “all pairs” from two tables

Another fundamental operation is "Join":
  - Join: Combine tables based on common fields

Select creates a table from the rows of another table meeting a criterion
  Select from Example On Hire < 1993
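
Select can be sketched in a few lines if a table is modelled as a list of rows, each row a dictionary. The rows below are invented, and Hire is simplified to a year so the comparison matches the slide.

```python
# Made-up rows for the Example table (Hire holds the year hired).
example = [
    {"ID": 1, "Last": "Lee",   "First": "Ana", "Hire": 1991},
    {"ID": 2, "Last": "Cho",   "First": "Ben", "Hire": 1995},
    {"ID": 3, "Last": "Silva", "First": "Eva", "Hire": 1989},
]

def select(table, criterion):
    """Return a new table holding only the rows for which criterion(row) is true."""
    return [row for row in table if criterion(row)]

# Select from Example On Hire < 1993
hired_before_1993 = select(example, lambda row: row["Hire"] < 1993)
print([row["Last"] for row in hired_before_1993])
```

The original table is left unchanged; select always produces a new table.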

Project creates a table from the columns of another table
  Project Last, First From Example
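
Project can be sketched the same way, with a table modelled as a list of dictionaries. The rows are invented, and this simplified version keeps duplicate rows, which a strict relational project would remove.

```python
# Made-up rows for the Example table.
example = [
    {"ID": 1, "Last": "Lee",   "First": "Ana", "Addr": "12 Elm St"},
    {"ID": 2, "Last": "Cho",   "First": "Ben", "Addr": "9 Oak Ave"},
    {"ID": 3, "Last": "Silva", "First": "Eva", "Addr": "3 Fir Ln"},
]

def project(table, columns):
    """Return a new table keeping only the named columns of every row."""
    return [{col: row[col] for col in columns} for row in table]

# Project Last, First From Example
names = project(example, ["Last", "First"])
print(names)
```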

Union (written like addition) combines two tables with the same attributes
  PoliticalUnits = States + Provinces
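
A sketch of union on two small tables with the same columns, again modelling tables as lists of dictionaries. Only a few illustrative rows are included, and the helper name union is an assumption.

```python
states = [
    {"Name": "Washington", "Capital": "Olympia"},
    {"Name": "Oregon", "Capital": "Salem"},
]
provinces = [
    {"Name": "British Columbia", "Capital": "Victoria"},
    {"Name": "Alberta", "Capital": "Edmonton"},
]

def union(a, b):
    """Combine two tables with the same columns, dropping duplicate rows."""
    result = []
    for row in a + b:
        if row not in result:
            result.append(row)
    return result

# PoliticalUnits = States + Provinces
political_units = union(states, provinces)
print([row["Name"] for row in political_units])
```

Because duplicates are dropped, taking the union of a table with itself gives back the same table.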

Difference (written like subtraction) removes one table’s rows from another
  Eastern = States - WestCoast
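
Difference follows the same pattern: keep the rows of the first table that do not appear in the second. The five states listed are a shortened, illustrative States table, and the helper name difference is an assumption.

```python
states = [
    {"Name": "Washington"},
    {"Name": "Oregon"},
    {"Name": "California"},
    {"Name": "Ohio"},
    {"Name": "Maine"},
]
west_coast = [
    {"Name": "Washington"},
    {"Name": "Oregon"},
    {"Name": "California"},
]

def difference(a, b):
    """Return the rows of table a that do not appear in table b."""
    return [row for row in a if row not in b]

# Eastern = States - WestCoast
eastern = difference(states, west_coast)
print([row["Name"] for row in eastern])
```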

Product (written like multiplication) combines columns and pairs all rows
  Colors = Blues × Reds
Column Rule: If A has x columns and B has y columns, then A × B has x+y columns
Row Rule: If A has m rows and B has n rows, then A × B has m∙n rows
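
The column and row rules can be checked on a tiny example. In this sketch the Blues table has 1 column and 3 rows and the Reds table has 1 column and 2 rows; the color names are made up, and the two tables are assumed to have distinct column names.

```python
blues = [{"Blue": "navy"}, {"Blue": "sky"}, {"Blue": "teal"}]
reds = [{"Red": "crimson"}, {"Red": "rose"}]

def product(a, b):
    """Pair every row of a with every row of b (column names assumed distinct)."""
    return [{**row_a, **row_b} for row_a in a for row_b in b]

# Colors = Blues x Reds
colors = product(blues, reds)

print(len(colors[0]), "columns")  # 1 + 1 columns
print(len(colors), "rows")        # 3 * 2 rows
```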

To the right is a man who divides database tables. Do you want to be like him?

Seriously though
  - Division operations do exist
  - Advanced database topic
  - Not used in regular practice

Join (written like a bow tie) combines rows if a common field matches
  Homes = States ⨝ Students
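
A sketch of join on a shared field, with tables modelled as lists of dictionaries. The column names (State, StateName, Student) and the rows are hypothetical; the slide does not show the actual tables.

```python
states = [
    {"State": "WA", "StateName": "Washington"},
    {"State": "OR", "StateName": "Oregon"},
]
students = [
    {"Student": "Ana", "State": "WA"},
    {"Student": "Ben", "State": "OR"},
    {"Student": "Eva", "State": "WA"},
]

def join(a, b, field):
    """Combine each row of a with each row of b that has the same value in field."""
    return [
        {**row_a, **row_b}
        for row_a in a
        for row_b in b
        if row_a[field] == row_b[field]
    ]

# Homes = States join Students, matching on the State field
homes = join(states, students, "State")
for row in homes:
    print(row["Student"], row["StateName"])
```

The nested loops pair every row with every other row, as a product would, and the if clause keeps only the matching pairs, as a select would.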

The five primitive operations can create any table from a given set of tables
  - Join is not primitive, but can be built from the five primitives
  - Join, select and project are used most often

Modern relational database systems are built on these relational operations
  - The operations are not usually used directly, but are used indirectly through other languages
  - The SQL database language is one such example
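
To illustrate how SQL expresses the operations indirectly, the sketch below runs one query that does the work of a select and a project together. It assumes SQLite through Python's standard sqlite3 module, and the table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Example (ID INTEGER PRIMARY KEY, Last TEXT, First TEXT, Hire INTEGER)"
)
conn.executemany(
    "INSERT INTO Example VALUES (?, ?, ?, ?)",
    [(1, "Lee", "Ana", 1991), (2, "Cho", "Ben", 1995), (3, "Silva", "Eva", 1989)],
)

# WHERE plays the role of Select; the column list plays the role of Project.
rows = conn.execute(
    "SELECT Last, First FROM Example WHERE Hire < 1993 ORDER BY ID"
).fetchall()
print(rows)
```

The result is itself a table (here, a list of rows), which is what lets queries be built on top of other queries.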

Databases are a big topic
  - Physical versus logical databases
  - Constructing and designing a database
  - More on operations and queries
  - More about XML

Like many aspects of computer fluency, understanding databases is about understanding structure
  - Defining structure
  - Manipulating structure

Databases are based around the simple notion of tables
  - New tables are built from existing tables using operations