Data Management for Frontiers at the Interface Between Computing and Biology

Jim Gray
Microsoft Research
Cosmic Questions
• Where are we today?
• Where in 5 years?
• What are the key questions?
• What am I doing next?
• What are the barriers?
• What hinders collaboration?
• What changes needed in education?
How much information is there?
• Soon everything can be recorded and indexed
• Most bytes will never be seen by humans.
• Human attention is the precious resource.
• Automatic: Capture, store, organize, analyze, summarize
• Manual visualize/iterate
[Figure: byte scale running Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta, with markers: A Book; A Photo; A Movie; All LoC books (words); All Books MultiMedia; Everything Recorded!]
[Figure footnote, small prefixes by power of ten: 24 yocto, 21 zepto, 18 atto, 15 femto, 12 pico, 9 nano, 6 micro, 3 milli]
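To make the scale concrete, here is a minimal sketch that names a byte count using the prefixes on the slide. It assumes decimal steps of 1000 between prefixes (the slide does not say whether decimal or binary units are meant), and the function name describe_bytes is hypothetical.

```python
# Prefixes from the slide's scale, smallest to largest.
# Assumption: each step is a factor of 1000 (decimal SI), not 1024.
PREFIXES = ["", "Kilo", "Mega", "Giga", "Tera", "Peta", "Exa", "Zetta", "Yotta"]


def describe_bytes(n: float) -> str:
    """Express n bytes using the largest prefix that keeps the value >= 1."""
    value = float(n)
    index = 0
    # Divide by 1000 until the value is below 1000 or we run out of prefixes.
    while value >= 1000 and index < len(PREFIXES) - 1:
        value /= 1000
        index += 1
    unit = PREFIXES[index] + "bytes" if PREFIXES[index] else "bytes"
    return f"{value:g} {unit}"


print(describe_bytes(5e12))       # 5 Terabytes
print(describe_bytes(2_000_000))  # 2 Megabytes
```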
Plumbing
• Everything can be online
– Storage is nearing 1 K$/TeraByte,
– Networking is 1$ / delivered GB
– Software is cheap or free
– Systems are becoming self-managing
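The two prices quoted above can be compared with a little arithmetic. The sketch below takes the slide's figures at face value as order-of-magnitude numbers and assumes 1000 GB per TB; the constant and function names are hypothetical.

```python
# Prices as quoted on the slide; order-of-magnitude figures only.
STORAGE_USD_PER_TB = 1000.0  # "Storage is nearing 1 K$/TeraByte"
NETWORK_USD_PER_GB = 1.0     # "Networking is 1$ / delivered GB"
GB_PER_TB = 1000             # assumption: decimal units


def storage_cost(terabytes: float) -> float:
    """Cost in dollars to store the given number of terabytes."""
    return terabytes * STORAGE_USD_PER_TB


def delivery_cost(terabytes: float) -> float:
    """Cost in dollars to deliver the given number of terabytes over the network."""
    return terabytes * GB_PER_TB * NETWORK_USD_PER_GB


print(storage_cost(2.0))   # 2000.0
print(delivery_cost(2.0))  # 2000.0
```

At these quoted prices, delivering a terabyte over the network costs about as much as storing it.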
Data Management Systems
• Can ingest/store/search/analyze terabytes
– Numbers
– Text
• Some progress on “objects”
– But semantics have to come from the domain
– Good science and engineering, but… flopped in the marketplace.
Basic Problems
• Data Acquisition:
– I do not have much to say here
• Data Ingest:
– This is a huge problem
• Data Organization & Access
– This is what databases are good at for text & numbers
– For “semantic” data it requires domain-specific tools.
• Data Publication/ Discovery/ Interchange
– Requires good standards
– We have syntactic standards; semantic standards are needed.
My #1 Problem: Data Interchange
(includes publication and discovery)
• What does the data mean?
– The answer is: 42.
• Units?
• Precision? Accuracy?
• How was the number derived?
• How can you tell me what it means (without us talking on the phone or you visiting my laboratory)?
• Need standard terminology, and standard formats.
• Hard to do for “new” stuff.
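One way to picture the questions above is to compare a bare number with a record that carries its own units, uncertainty, and derivation. This is only an illustrative sketch: the Measurement class, its field names, and the sample values are all hypothetical and do not come from any standard.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Measurement:
    """Hypothetical self-describing value; field names are illustrative only."""
    value: float
    unit: str           # answers "Units?"
    uncertainty: float  # answers "Precision? Accuracy?" (same unit as value)
    derivation: str     # answers "How was the number derived?"

    def to_json(self) -> str:
        """Serialize so the description travels with the number."""
        return json.dumps(asdict(self), sort_keys=True)


# A bare answer tells the recipient nothing about what it means.
bare = 42

# The same number with made-up descriptive metadata attached.
described = Measurement(
    value=42.0,
    unit="kelvin",
    uncertainty=0.5,
    derivation="mean of 10 readings (made-up example)",
)

print(described.to_json())
```

Even this only helps if both sides agree on what "kelvin" or "uncertainty" mean, which is where standard terminology and formats come in.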
Great Hope & Promise
• XML is the answer
• Reality: XML is one layer up from Unicode.
• Can describe structured information
• But not process, not meaning, not…
• Answer #2: Objects
– SOAP, Web Services,…
– Probably a better answer
– But… still needs tools to make it workable.
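The point that XML describes structure but not meaning can be seen in a small sketch using Python's standard xml.etree.ElementTree parser. The document and its element names (result, answer) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up, well-formed document: syntactically fine, semantically silent.
doc = "<result><answer>42</answer></result>"

root = ET.fromstring(doc)
answer_element = root.find("answer")

# The parser recovers the structure: an <answer> element containing "42".
answer = answer_element.text

# Nothing in the document says what the number means. Looking for a
# "unit" attribute (a hypothetical convention) finds nothing.
unit = answer_element.get("unit")

print(answer)  # 42
print(unit)    # None
```

Adding a unit attribute would help only if producer and consumer had already agreed on its name and vocabulary, which is the semantic standard the slides say is missing.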
Discussion
Gifford’s List
• Data Interchange
• Scale: what's big?
• Quality: how do you keep it up?
• DBs need more semantics.