Storing and Organizing Data

Informatics I101
February 18, 2004
John C. Paolillo
Storing Data
• Encoding: fixed or variable width
• Memory
• Storage medium:
– Magnetic: tape, disk, hard disk
– Optical: CD, DVD, etc.
– Silicon: Programmable Read-Only Memory
(PROM), Erasable PROM, etc.
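The difference between fixed-width and variable-width encoding can be seen in a minimal Python sketch. The choice of UTF-32 as the fixed-width example and UTF-8 as the variable-width example is an assumption made for illustration; the slides do not name specific encodings.

```python
# Fixed width vs. variable width, using two Unicode encodings as examples.
# UTF-32 and UTF-8 are illustrative choices, not named in the slides.
text = "a\u00e9\u20ac"  # 'a', 'é', '€': three characters with growing code points

# Fixed width: every character takes the same number of bytes.
fixed_widths = [len(ch.encode("utf-32-be")) for ch in text]

# Variable width: the number of bytes depends on the character.
variable_widths = [len(ch.encode("utf-8")) for ch in text]

for ch, fixed, variable in zip(text, fixed_widths, variable_widths):
    print(repr(ch), "fixed:", fixed, "bytes", "variable:", variable, "bytes")
```

Fixed width makes it easy to jump to the n-th character, while variable width usually saves space for common characters.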
Compact Disk Recording
[Diagram labels: LED, light beam, lens, lens]
Data groove, etched in
surface of plastic, has
a slight “wobble” that
helps locate the data
The Recording Process
Light beam — pulses to
record on and off states,
steady for reading
Crystalline metal alloy
recording surface
Pits of amorphous solid
left when metal re-cools
[Figure labels: 1.6 µm, 0.74 µm, 0.32 µm]
CD Media States
• Crystalline: bright, reflects light well
– “off” state
• Amorphous: dark, scatters light
– “on” state
• Micro-crystalline: reflects light, but not
brightly
– “erased” state (= “off”)
How Erasing Takes Place
Writing isn’t perfect
The center pits (dots) are partly erased by heat from
writing the nearby longer pits (dashes), which were
written later.
Reference
van Houten, Henk, and Wouter Leibbrandt.
2000. “Phase change recording.”
Communications of the ACM 43(11): 64-71.
http://www.acm.org/dl
Storing Data
• Encoding: we may need to change from one
encoding to another
– Task of the device driver
– Gives us a stream of bits
• Medium: different media require different
treatment of the data for storage
– Task of the device hardware itself
– Gives us a stream of bits read/write-able by the device
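As a rough illustration of changing from one encoding to another, the sketch below re-encodes one character and prints the resulting bits. It runs at the application level rather than inside a device driver, and the helper name to_bits and the choice of Latin-1 and UTF-8 are assumptions made for this example.

```python
def to_bits(data: bytes) -> str:
    """Render bytes as a string of 0s and 1s, eight bits per byte."""
    return "".join(f"{byte:08b}" for byte in data)

# The same character stored in two different encodings.
latin1_bytes = "\u00e9".encode("latin-1")   # 'é' as one byte (0xE9)
text = latin1_bytes.decode("latin-1")       # decode back to the character
utf8_bytes = text.encode("utf-8")           # re-encode as two bytes (0xC3 0xA9)

latin1_bits = to_bits(latin1_bytes)
utf8_bits = to_bits(utf8_bytes)
print("Latin-1:", latin1_bits)
print("UTF-8:  ", utf8_bits)
```

Either way the result is just a stream of bits; only the rule for interpreting them changes.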
But how do we find the data later?
Data Organization
• Index for the data
– File names, extensions
– Metadata (date, program that uses it, etc.)
– Directory structures
• All data storage systems use some kind of
data organization
– The principles of data organization are the same
no matter what the data or where it is organized
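A file system is one familiar case of such an index. The sketch below, which assumes a small temporary directory created only for the example, walks a directory structure and records each file's name, extension, and some metadata. The function name build_index and the sample file names are hypothetical.

```python
import tempfile
from pathlib import Path

def build_index(root):
    """Map each file's path (relative to root) to a small metadata record."""
    index = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            info = path.stat()
            index[path.relative_to(root).as_posix()] = {
                "extension": path.suffix,      # e.g. ".txt"
                "size_bytes": info.st_size,    # metadata kept by the file system
                "modified": info.st_mtime,     # last-modified time (seconds since epoch)
            }
    return index

# Build a tiny directory structure so the example is self-contained.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "notes").mkdir()
    (Path(root) / "notes" / "lecture.txt").write_text("storing data")
    (Path(root) / "image.png").write_bytes(b"\x89PNG")
    index = build_index(root)

for name, record in index.items():
    print(name, record["extension"], record["size_bytes"])
```

The index holds no file contents at all, only the information needed to find and describe them.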
When Organization is Critical
• National Center for Biotechnology Information
(NCBI)
GenBank:
– 28 billion DNA base pairs (A, C, G, T)
– 22 million sequences (possible genes)
This is a lot of data to manage. At NCBI it has
been indexed with many kinds of metadata and
integrated with information from scientific
publications, so the overall enterprise is larger still.
Other Similar Applications
• NASA Mars and other missions
– http://photojournal.jpl.nasa.gov/index.html
• The National Virtual Observatory
– http://www.us-vo.org/
• Centers for Disease Control
– http://www.cdc.gov/
• Homeland Security
Data and Metadata
Data:
any object of interest which can be
characterized and encoded in digital form
Metadata:
data about data — data used to help index
and locate data of interest in some
application
Data Organization Schemes
• Hierarchical
– Data organized into object hierarchies for easy access
– Metadata is in the tree structure of the hierarchies
– XML Databases
• Network
– Objects link to some selected other objects
– Metadata is embedded in the data
– The World-Wide Web
• Relational
– Data organized into relations
– Metadata is in the structure of the relations
– Most Database Management Systems (DBMSs)
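The hierarchical and network schemes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the XML element names and the page names (index.html and so on) are hypothetical, and the function reachable is a helper written for this example.

```python
import xml.etree.ElementTree as ET

# Hierarchical: the metadata lives in the tree structure itself
# (element names and nesting), as in an XML document.
doc = ET.fromstring(
    "<movies>"
    "<movie><title>The Hours</title><actor>Meryl Streep</actor></movie>"
    "<movie><title>Dead Man</title><actor>Johnny Depp</actor></movie>"
    "</movies>"
)
titles = [movie.findtext("title") for movie in doc.findall("movie")]

# Network: each object links to some selected other objects,
# as pages on the World-Wide Web do. Page names are hypothetical.
links = {
    "index.html": ["movies.html", "actors.html"],
    "movies.html": ["actors.html"],
    "actors.html": [],
}

def reachable(start, links):
    """Follow links from start and return every page that can be reached."""
    seen, stack = set(), [start]
    while stack:
        page = stack.pop()
        if page not in seen:
            seen.add(page)
            stack.extend(links[page])
    return seen

pages = reachable("index.html", links)
print(titles)
print(sorted(pages))
```

In the tree, data is found by walking down from the root; in the network, it is found by following links from a known starting point.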
Relations
Metadata (column headings): Actor, Movie, Date
Data (rows):
Actor          Movie               Date
Meryl Streep   The Hours           Summer 2003
Johnny Depp    Dead Man            Summer 1994
Meg Ryan       Against the Ropes   Winter 2004
...            ...                 ...
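The relation above could be stored in a relational DBMS roughly as follows. This is a minimal sketch using Python's built-in sqlite3 module; the table name appearances is hypothetical, and the pairing of actors, movies, and dates into rows assumes they line up in the order listed on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary in-memory database

# Metadata: the structure of the relation (its column names and types).
conn.execute("CREATE TABLE appearances (actor TEXT, movie TEXT, date TEXT)")

# Data: the rows of the relation.
conn.executemany(
    "INSERT INTO appearances VALUES (?, ?, ?)",
    [
        ("Meryl Streep", "The Hours", "Summer 2003"),
        ("Johnny Depp", "Dead Man", "Summer 1994"),
        ("Meg Ryan", "Against the Ropes", "Winter 2004"),
    ],
)

# The column names can be read back from the database itself.
cursor = conn.execute("SELECT * FROM appearances")
column_names = [description[0] for description in cursor.description]

# The metadata is what lets us locate data of interest.
movies = [
    row[0]
    for row in conn.execute(
        "SELECT movie FROM appearances WHERE actor = ?", ("Johnny Depp",)
    )
]
conn.close()

print(column_names)
print(movies)
```

The query names a column rather than a position in a file, which is the sense in which the metadata is in the structure of the relation.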