Data Management for Frontiers at the Interface Between Computing and Biology
Jim Gray
Microsoft Research

Cosmic Questions
• Where are we today?
• Where in 5 years?
• What are the key questions?
• What am I doing next?
• What are the barriers?
• What hinders collaboration?
• What changes are needed in education?

How much information is there?
• Soon everything can be recorded and indexed
• Most bytes will never be seen by humans.
• Human attention is the precious resource.
• Automatic: capture, store, organize, analyze, summarize
• Manual: visualize/iterate
[Figure: scale of information sizes, Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta. Labeled points, smallest to largest: A Book; A Photo; Movie; All LoC books (words); All Books MultiMedia; Everything! Recorded.]
24 yocto, 21 zepto, 18 atto, 15 femto, 12 pico, 9 nano, 6 micro, 3 milli

Plumbing
• Everything can be online
– Storage is nearing 1 K$/TeraByte
– Networking is 1 $/delivered GB
– Software is cheap or free
– Systems are becoming self-managing

Data Management Systems
• Can ingest/store/search/analyze TeraBytes
– Numbers
– Text
• Some progress on “objects”
– But semantics have to come from the domain
– Good science and engineering, but… flopped in the marketplace.

Basic Problems
• Data Acquisition:
– I do not have much to say here
• Data Ingest:
– This is a huge problem
• Data Organization & Access
– This is what databases are good at for text & numbers
– For “semantic” data it requires domain-specific tools.
• Data Publication/Discovery/Interchange
– Requires good standards
– We have syntactic standards; semantic standards are needed.

My #1 Problem
Data Interchange (includes publication and discovery)
• What does the data mean?
– The answer is: 42.
• Units?
• Precision? Accuracy?
• How was the number derived?
• How can you tell me what it means (without us talking on the phone or you visiting my laboratory)?
• Need standard terminology and standard formats.
• Hard to do for “new” stuff.

Great Hope & Promise
• XML is the answer
• Reality: XML is one layer up from Unicode.
• Can describe structured information
• But not process, not meaning, not…
• Answer #2: Objects
– SOAP, Web Services,…
– Probably a better answer
– But… still needs tools to make it workable.

Discussion

Gifford’s List
• Data Interchange
• Scale: what’s big
• Quality: how do you keep it up
• DBs need more semantics.
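
Illustration (not from the original slides): the data interchange problem above, "The answer is: 42", can be sketched in a few lines. This is a minimal sketch, assuming a hypothetical `Measurement` record; the field names, the "mg/L" unit, and the derivation note are all invented for illustration. It shows a bare number travelling alongside a number that carries its units, accuracy, and derivation.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Measurement:
    """Hypothetical record: a number plus the context a remote reader needs."""
    value: float        # the number itself, e.g. 42
    unit: str           # what the number measures (illustrative: "mg/L")
    uncertainty: float  # stated accuracy, in the same unit
    derivation: str     # free-text note on how the number was obtained


def publish(m: Measurement) -> str:
    """Serialize the measurement together with its context."""
    return json.dumps(asdict(m), sort_keys=True)


def receive(payload: str) -> Measurement:
    """Rebuild the measurement; raises TypeError if the context is missing."""
    fields = json.loads(payload)
    return Measurement(**fields)


# Syntactically valid, but the receiver cannot tell what it means.
bare = json.dumps(42)

# The same number with units, accuracy, and derivation attached.
described = publish(Measurement(
    value=42.0,
    unit="mg/L",
    uncertainty=0.5,
    derivation="mean of 3 assays (illustrative only)",
))
round_trip = receive(described)
```

Calling `receive(bare)` fails, because a bare 42 carries none of the required context. Note that this only agrees on syntax and field names; what "mg/L" refers to in a given laboratory still has to come from a domain standard, which is the point of the slides.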