Organizing Information Digitally

Norm Friesen
Overview
• General properties of digital
information
• Relational: tabular & linked
• Object-Oriented: inheritance &
modularity
• Markup: serial & hierarchical
General Properties
• Multiple Axes and access points
– Allow for different views
• Form & Content can (should) be
separate
• Formatting can be used for analysis &
organization of data
• Instructions and data can be combined;
– effects of instructions are difficult to control
• Database software for each type
Examples
• Relational: library catalogue,
Amazon.com, hotel reservation system
• Markup: Web pages & Google, Blogs &
RSS
• Object-Oriented: programs of all kinds:
Windows XP, Office, etc.; the Java
programming language
Relational
• Tables and links
• Table: “a systematic arrangement of
data usually in rows and columns for
ready reference”
• Entity: represents a category, rather
than a specific instance of that
category. Entities can be thought of
(roughly) as nouns.
Deriving tables from text
• Tabs, commas and hard returns
(paragraphs) are often used to indicate
rows and columns in a table
• Data in this format often called “flat
files.”
• Can be used as a way of getting data
“into” a database: make a list into a
database table
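A minimal sketch of turning a flat file into a database table, using Python's standard csv and sqlite3 modules. The sample data, the table name "people" and the function name load_flat_file are assumptions made up for illustration; they are not from the slides.

```python
import csv
import io
import sqlite3

# Hypothetical flat file: one record per line (hard return),
# with commas separating the columns.
FLAT_FILE = """name,city
Ada,London
Grace,New York
"""

def load_flat_file(text):
    """Read comma-separated text into an in-memory SQLite table."""
    rows = list(csv.reader(io.StringIO(text)))
    records = rows[1:]  # skip the header row
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT, city TEXT)")
    conn.executemany("INSERT INTO people VALUES (?, ?)", records)
    return conn

conn = load_flat_file(FLAT_FILE)
cities = conn.execute("SELECT city FROM people ORDER BY name").fetchall()
print(cities)
```

Each line of the list becomes one row, and each comma-separated value becomes one column of that row.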
Relational, cont’d
• An entity described in a table can be
related to other entities
– E.g. person and membership card(s)
• This relationship can be:
– One to one
– One to many
– Many to many
Primary Key
• Primary Key: a field that uniquely
identifies each record stored in a table.
This field is often automatically
numbered; it cannot contain any empty,
blank or null values.
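A small sketch of a primary key in practice, assuming SQLite through Python's sqlite3 module. The table and column names are hypothetical. In SQLite, a column declared INTEGER PRIMARY KEY is numbered automatically when no value is supplied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# No user_id is given, so the database assigns the next number itself.
first = conn.execute("INSERT INTO users (name) VALUES ('Ada')").lastrowid
second = conn.execute("INSERT INTO users (name) VALUES ('Grace')").lastrowid

# Reusing an existing key value is rejected: the key must stay unique.
try:
    conn.execute(
        "INSERT INTO users (user_id, name) VALUES (?, 'Duplicate')", (first,)
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(first, second, duplicate_rejected)
```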
Definition: Relation
Relation: A connection between two
tables, each describing an entity that
interacts with the other. In the example
above, users (described in the first table)
compose and send messages (described
in the second table). The values of the
primary key for one of these entities are
stored in two places: in its own table, and
as a foreign key in the related table.
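The users-and-messages relation described here could be sketched as follows, again assuming SQLite. The column names (user_id, sender_id, body) and the sample rows are illustrative assumptions. The primary key of users reappears as the foreign key sender_id in messages, giving a one-to-many relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only on request
conn.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE messages (
    message_id INTEGER PRIMARY KEY,
    sender_id  INTEGER NOT NULL REFERENCES users(user_id),  -- foreign key
    body       TEXT
);
INSERT INTO users VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO messages VALUES
    (1, 1, 'Hello'), (2, 1, 'Meeting at 3'), (3, 2, 'Thanks');
""")

# One user, many messages: join the tables on the shared key value.
counts = conn.execute("""
    SELECT u.name, COUNT(*)
    FROM users u JOIN messages m ON m.sender_id = u.user_id
    GROUP BY u.user_id
    ORDER BY u.user_id
""").fetchall()
print(counts)
```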
Many to Many: Junction Table
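A hedged sketch of a junction table, assuming SQLite and a made-up example of people and clubs. A person can join many clubs and a club can have many people, so a third table (here called memberships) holds one row per pairing, made of the two foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (person_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE clubs  (club_id   INTEGER PRIMARY KEY, title TEXT);
-- Junction table: each row links one person to one club.
CREATE TABLE memberships (
    person_id INTEGER REFERENCES people(person_id),
    club_id   INTEGER REFERENCES clubs(club_id),
    PRIMARY KEY (person_id, club_id)
);
INSERT INTO people VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO clubs  VALUES (1, 'Chess'), (2, 'Film');
INSERT INTO memberships VALUES (1, 1), (1, 2), (2, 2);
""")

def names(sql, value):
    """Run a one-column query and return its values as a list."""
    return [row[0] for row in conn.execute(sql, (value,))]

film_members = names(
    "SELECT p.name FROM people p "
    "JOIN memberships m ON m.person_id = p.person_id "
    "JOIN clubs c ON c.club_id = m.club_id "
    "WHERE c.title = ? ORDER BY p.name", "Film")
ada_clubs = names(
    "SELECT c.title FROM clubs c "
    "JOIN memberships m ON m.club_id = c.club_id "
    "JOIN people p ON p.person_id = m.person_id "
    "WHERE p.name = ? ORDER BY c.title", "Ada")
print(film_members, ada_clubs)
```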
Activity: a 2-Table database
• Think of examples
• Look at examples for the database
application project
• Include primary and foreign key
• Make sure that you use the correct
relation type
Relational Data:
Other Characteristics
• Particular means of querying: SQL, or
Structured Query Language
– ISO/IEC 9075: Information technology - Database
languages - SQL
• Not good at representing complex
relationships and some kinds of entities/data
– Complexity can sometimes be accommodated at
the price of performance
– Multimedia not easy to accommodate
Object-Orientation
• Way of organizing and conceptualizing
information largely for the purposes of
programming
• Programming: the creation of a step-by-step
list of instructions written for a
particular computer environment in a
particular language.
Object Orientation:
Characteristics
• Modular: Black boxes with a
standardized interface; encapsulation
• Classes and inheritance: part of
producing and modifying program
components
• Operation: what the object can do
Object Orientation: Modular
• Bugs tend to arise from unexpected
consequences of relations between
parts of a program
– Simplify relations by defining modular
program components that relate to one
another through clearly defined interfaces.
– Programmers and program components
only deal with the interface, not the module
or object contents.
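One way to picture a module as a black box, sketched in Python. The Counter class and the count_words function are hypothetical names chosen for illustration. The caller uses only the interface (increment and value) and never touches the internal _count.

```python
class Counter:
    """A 'black box': callers use increment() and value, not _count."""

    def __init__(self):
        self._count = 0  # internal state, hidden by convention

    def increment(self):
        self._count += 1

    @property
    def value(self):
        return self._count


def count_words(text, counter):
    """Relies only on the counter's interface, not on its contents."""
    for _ in text.split():
        counter.increment()
    return counter.value


total = count_words("tables objects markup", Counter())
print(total)
```

Because count_words depends only on the interface, the inside of Counter could be rewritten without changing any code that uses it.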
Object Orientation: Classes
• A class is a pattern, template, or blueprint
for a category of structurally identical
items. The items created using the class
are called instances. This is often
referred to as the "class as a `cookie
cutter'" view. As you might guess, the
instances are the "cookies.”
(http://www.toa.com/pub/oobasics/oobasics.htm)
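The "cookie cutter" view can be sketched in a few lines of Python. The class name Cookie and its flavour attribute are illustrative assumptions. The class is the cutter; each call to it produces a separate instance.

```python
class Cookie:
    """The 'cookie cutter': a template for structurally identical items."""

    def __init__(self, flavour):
        self.flavour = flavour

    def describe(self):
        return f"{self.flavour} cookie"


# Two instances ("cookies") cut from the same class.
ginger = Cookie("ginger")
oat = Cookie("oat")
print(ginger.describe(), "/", oat.describe())
```

Both objects have the same structure (a flavour and a describe operation) but hold different data.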
Object Orientation: Inheritance
• “In an object-oriented context, we speak
of specializations as "inheriting"
characteristics from their corresponding
generalizations. Inheritance can be
defined as the process whereby one
object acquires (gets, receives)
characteristics from one or more other
objects.”
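A minimal sketch of inheritance in Python, with hypothetical classes Document (the generalization) and WebPage (the specialization). WebPage acquires the title attribute and the summary operation from Document, then adds a url and extends summary.

```python
class Document:
    """Generalization: anything with a title."""

    def __init__(self, title):
        self.title = title

    def summary(self):
        return f"Document: {self.title}"


class WebPage(Document):
    """Specialization: inherits from Document and adds a URL."""

    def __init__(self, title, url):
        super().__init__(title)  # reuse the inherited initialization
        self.url = url

    def summary(self):
        # Extend, rather than replace, the inherited behaviour.
        return f"{super().summary()} <{self.url}>"


page = WebPage("Stephen's Web", "http://www.downes.ca")
print(page.summary())
```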
Object-oriented Databases
• Data is stored as objects and can be
interpreted only using the methods
specified, usually, by its class. The
relationship between similar objects is
preserved (inheritance), as are
references between objects.
Object-oriented Databases
• Doesn’t translate well into relational (SQL) data:
the object-relational impedance mismatch
• “As an industry, ODBMS were long
considered to be a lost opportunity to
revolutionize software development.
Since 2004, object databases have
seen a renaissance when open source
object databases appeared…”
Markup Languages
• Markup refers to the use of a markup
language to describe the structure and
appearance of a particular document.
– HTML: describes the appearance of
documents
– XML: geared to the description of the
structure of documents
– There are many types of documents, so
many derivatives from XML exist
Markup, cont’d
• Used for both documents and records
• Both XML and HTML derived from
SGML, “Standard Generalized Markup
Language” (ISO standard, 1986; roots in IBM’s
GML of the 1960s). A language for
formulating languages
– XML (1996): a simplified subset of SGML
– HTML (1992): very simplified subset;
XHTML conforms to XML
Markup, cont’d
<title>A Tale of Two Cities</title>
SERIAL & HIERARCHICAL:
<image>
<title>Stephen's Web</title>
<url>http://www.d.ca/r.gif</url>
<link>http://www.downes.ca</link>
<width>90</width> <height>36</height>
</image> (validation)
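The image fragment above can be read with Python's standard xml.etree.ElementTree module, as a rough sketch of what "serial and hierarchical" means in practice: the child elements arrive in document order (serial) and all sit inside the image element (hierarchical). The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

IMAGE_XML = """<image>
  <title>Stephen's Web</title>
  <url>http://www.d.ca/r.gif</url>
  <link>http://www.downes.ca</link>
  <width>90</width> <height>36</height>
</image>"""

root = ET.fromstring(IMAGE_XML)  # fails if the markup is not well-formed

# Serial: children come back in the order they appear in the text.
child_tags = [child.tag for child in root]

# Hierarchical: values are looked up through their parent element.
width = int(root.findtext("width"))
height = int(root.findtext("height"))
print(root.tag, child_tags, width, height)
```

Note that this only checks well-formedness; validating against a DTD or schema would need a separate tool.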
XML
• OpenDocument: for office documents
• DocBook: for manuals
• XrML: for enforceable copyright statements
• RSS: for news/posting syndication
• MathML: for formatting mathematical
formulas
• RuleML: expressing formal rules for
processing information, etc.
DTD/Schema, Document,
XSLT
XML, cont’d
• Elements can repeat, and repetitions can
be nested within other repetitions.
• XML databases
– Relational/hybrid
– “Native”
– XQuery
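A hedged sketch of repetition within repetition, using a made-up RSS-like fragment: item repeats inside channel, and category repeats inside each item. The query uses the limited XPath subset built into Python's ElementTree; it is not XQuery, and serves only to suggest the kind of path-based querying that XML databases offer.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: repeated <item> elements, each holding
# its own repeated <category> elements.
FEED_XML = """<channel>
  <item>
    <title>First post</title>
    <category>xml</category>
    <category>rss</category>
  </item>
  <item>
    <title>Second post</title>
    <category>sql</category>
  </item>
</channel>"""

root = ET.fromstring(FEED_XML)

# Walk the outer repetition, then the inner one.
categories_by_title = {
    item.findtext("title"): [c.text for c in item.findall("category")]
    for item in root.findall("item")
}

# A single path expression reaches every inner element at once.
all_categories = [c.text for c in root.findall("./item/category")]
print(categories_by_title, all_categories)
```

Storing the same data relationally would take two tables (items and categories) joined by a key, which is one reason markup does not map directly onto tables.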
Summary
• Three forms of organizing information
• Each is flexible and powerful, but only within
specific domains/purposes
• Most widespread database technologies are
relational
• But the other two forms (markup and
object-oriented) do not translate easily into this
format.