Organizing Information Digitally
Norm Friesen

Overview
• General properties of digital information
• Relational: tabular & linked
• Object-Oriented: inheritance & modularity
• Markup: serial & hierarchical

General Properties
• Multiple axes and access points
  – Allow for different views
• Form & content can (and should) be separate
• Formatting can be used for analysis & organization of data
• Instructions and data can be combined
  – Effects of instructions are difficult to control
• Database software exists for each type

Examples
• Relational: library catalogue, Amazon.com, hotel reservation system
• Markup: Web pages & Google, blogs & RSS
• Object-Oriented: programs of all kinds (Windows XP, Office, etc.); the Java programming language

Relational
• Tables and links
• Table: “a systematic arrangement of data usually in rows and columns for ready reference”
• A table represents an entity: a category, rather than a specific instance of that category. Entities can be thought of (roughly) as nouns.

Deriving tables from text
• Tabs, commas and hard returns (paragraphs) are often used to indicate rows and columns in a table
• Data in this format are often called “flat files.”
• Can be used as a way of getting data “into” a database: make a list into a database table

Relational, cont’d
• An entity described in a table can be related to other entities
  – E.g. person and membership card(s)
• This relationship can be:
  – One to one
  – One to many
  – Many to many

Primary Key
• Primary Key: a field that uniquely identifies each record stored in a table. This field is often automatically numbered; it cannot contain any empty, blank or null values.

Definition: Relation
Relation: A connection between two tables, each describing an entity that interacts with the other. In the example above, users (described in the first table) compose and send messages (described in the second table).
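A relation between users and messages can be sketched with two small tables. This is a minimal illustration using Python's built-in sqlite3 module; the table names, column names (user_id, sender_id, and so on) and sample rows are assumptions made for the example, not taken from the slides.

```python
import sqlite3

# In-memory database; every table, column and row here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request

conn.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,   -- primary key, numbered automatically
    name    TEXT NOT NULL
);
CREATE TABLE messages (
    message_id INTEGER PRIMARY KEY,
    sender_id  INTEGER NOT NULL REFERENCES users(user_id),  -- foreign key
    body       TEXT NOT NULL
);
""")

conn.executemany("INSERT INTO users (name) VALUES (?)", [("Ana",), ("Ben",)])
conn.executemany(
    "INSERT INTO messages (sender_id, body) VALUES (?, ?)",
    [(1, "Hello"), (1, "Meeting at 3"), (2, "Thanks")],
)

# One-to-many: each user row is linked to its message rows by matching keys.
rows = conn.execute("""
    SELECT users.name, COUNT(messages.message_id)
    FROM users JOIN messages ON messages.sender_id = users.user_id
    GROUP BY users.user_id
    ORDER BY users.user_id
""").fetchall()
print(rows)  # [('Ana', 2), ('Ben', 1)]
```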
The value of the primary key for one of these entities is stored in two places: in its own table, and as a foreign key in the related table.

Many to Many: Junction Table

Activity: a 2-Table database
• Think of examples
• Look at examples for the database application project
• Include primary and foreign keys
• Make sure that you use the correct relation type

Relational Data: Other Characteristics
• Particular means of querying: SQL, or Structured Query Language
  – ISO/IEC 9075; Information Technology - Database Languages
• Not good at representing complex relationships and some kinds of entities/data
  – Complexity can sometimes be accommodated at the price of performance
  – Multimedia not easy to accommodate

Object-Orientation
• Way of organizing and conceptualizing information largely for the purposes of programming
• Programming: the creation of a step-by-step list of instructions written for a particular computer environment in a particular language.

Object Orientation: Characteristics
• Modular: black boxes with a standardized interface; encapsulation
• Classes and inheritance: part of producing and modifying program components
• Operation: what the object can do

Object Orientation: Modular
• Bugs tend to arise from unexpected consequences of relations between parts of a program
  – Simplify relations by defining modular program components that relate to one another through clearly defined interfaces.
  – Programmers and program components only deal with the interface, not the module or object contents.

Object Orientation: Classes
• A class is a pattern, template, or blueprint for a category of structurally identical items. The items created using the class are called instances. This is often referred to as the "class as a `cookie cutter'" view.
  As you might guess, the instances are the "cookies." (http://www.toa.com/pub/oobasics/oobasics.htm)

Object Orientation: Inheritance
• “In an object-oriented context, we speak of specializations as "inheriting" characteristics from their corresponding generalizations. Inheritance can be defined as the process whereby one object acquires (gets, receives) characteristics from one or more other objects.”

Object-oriented Databases
• Data is stored as objects and can be interpreted only using methods, usually those specified by its class. The relationship between similar objects is preserved (inheritance), as are references between objects.

Object-oriented Databases
• Doesn’t translate well into SQL data: Object-SQL Impedance Mismatch
• “As an industry, ODBMS were long considered to be a lost opportunity to revolutionize software development. Since 2004, object databases have seen a renaissance when open source object databases appeared…”

Markup Languages
• Markup refers to the use of a markup language to describe the structure and appearance of a particular document.
  – HTML: describes the appearance of documents
  – XML: geared to the description of the structure of documents
  – There are many types of documents, so many derivatives of XML exist

Markup, cont’d
• Used for both documents and records
• Both XML and HTML are derived from SGML, “Standard Generalized Markup Language” (roots in the 1960s; ISO standard since 1986).
  A language for formulating languages
  – XML (first draft 1996; W3C Recommendation 1998): a simplified subset of SGML
  – HTML (1992): a much simpler application of SGML with a fixed set of tags; XHTML conforms to XML

Markup, cont’d
<title>A Tale of Two Cities</title>

SERIAL & HIERARCHICAL:
<image>
  <title>Stephen's Web</title>
  <url>http://www.d.ca/r.gif</url>
  <link>http://www.downes.ca</link>
  <width>90</width>
  <height>36</height>
</image>
(validation)

XML
• OpenDocument: for office documents
• DocBook: for manuals
• XrML: for enforceable copyright statements
• RSS: for news/posting syndication
• MathML: for formatting mathematical formulations
• RuleML: expressing formal rules for processing information, etc.
DTD/Schema, Document, XSLT

XML, cont’d
• Repetition of elements within repetitions.
• XML databases
  – Relational/hybrid
  – “Native”
  – XQuery

Summary
• Three forms of organizing information
• Each is flexible and powerful, but only within specific domains/purposes
• Most widespread database technologies are relational
• But the other two forms (markup and object-oriented) do not translate easily into this format.
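To make the three forms concrete, the sketch below takes the <image> record from the markup slide and represents it as markup, as an object, and as a relational row. It is only an illustration: the Resource and Image classes, the area() method and the images table are hypothetical names introduced here, and the code uses only Python's standard library.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass, astuple

# Markup: the <image> record from the slides, serial text with a hierarchy.
XML_TEXT = """
<image>
  <title>Stephen's Web</title>
  <url>http://www.d.ca/r.gif</url>
  <link>http://www.downes.ca</link>
  <width>90</width>
  <height>36</height>
</image>
"""

# Object-oriented: a hypothetical generalization and its specialization.
@dataclass
class Resource:
    title: str
    link: str

@dataclass
class Image(Resource):  # inherits title and link from Resource
    url: str
    width: int
    height: int

    def area(self) -> int:  # an operation: what the object can do
        return self.width * self.height

# Read the child elements of <image> into an Image instance.
root = ET.fromstring(XML_TEXT.strip())
image = Image(
    title=root.findtext("title"),
    link=root.findtext("link"),
    url=root.findtext("url"),
    width=int(root.findtext("width")),
    height=int(root.findtext("height")),
)

# Relational: the same record stored as one row in a (hypothetical) table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images "
    "(title TEXT, link TEXT, url TEXT, width INTEGER, height INTEGER)"
)
conn.execute("INSERT INTO images VALUES (?, ?, ?, ?, ?)", astuple(image))
row = conn.execute("SELECT title, width, height FROM images").fetchone()

print(image.area())  # 3240
print(row)           # ("Stephen's Web", 90, 36)
```

The row keeps the data values but not the area() method or the inheritance from Resource, and elements that repeat inside other repeating elements would need additional linked tables. That is one way to see why markup and objects do not always map easily onto relational tables.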