View from Experiment/Observation driven Applications
Richard P. Mount
May 24, 2004
DOE Office of Science Data Management Workshop

Richard P Mount View from Experiment/Observation Driven Science

Experiment/Observation Common Characteristics (Mildly Provocative)

Experiment/Observation Common Characteristics
• Dominated by large, expensive devices and projects
• Correct project planning includes data-management hardware and software development
  – Not acceptable to build a $1 billion device and then face a data-management crisis
  – Development might be much more valuable if performed in a wider context
• Often hundreds or thousands of users
• Geographically distributed users

Consequences of Common Characteristics
• Less worry about workflow management: it is part of the project from the start
• Multi-user concerns:
  – Keeping track of millions of data products (files?) created by people you barely know
  – Performance issues due to many concurrent queries
  – Data movement, grids and networks really matter to international collaborations
• Visualization can be a useful tool but is rarely a major issue
• Responsiveness is a key issue
  – Taking months or years to answer a simple question is almost deadly

Final Comments and Pet Project Peddling

Characterizing Scientific Data
"My petabyte is harder to analyze than your petabyte"
  – Images (or meshes) are bulky but simply structured and usually have simple access patterns
  – Features are perhaps 1000 times less bulky, but often have complex structures and hard-to-predict access patterns

[Figure: Hydrogen Bubble Chamber Photograph, 1970. CERN Photo.]

Storage Issues
• Disks:
  – Random-access performance is lousy unless objects are megabytes or more
    • independent of cost
    • deteriorating with time at the rate at which disk capacity increases
  (Define random-access performance as the time taken to randomly access the entire contents of a disk)

Latency and Speed – Random Access
[Figure: Random-Access Storage Performance. Retrieval rate (Mbytes/s, log scale from 1000 down to 0.000000001) versus log10(Object Size, Bytes) from 0 to 10. Series: PC2100, WD200GB, STK9940B.]

Latency and Speed – Random Access
[Figure: Historical Trends in Storage Performance. Retrieval rate (MBytes/s, log scale from 1000 down to 0.000000001) versus log10(Object Size, Bytes) from 0 to 10. Series: PC2100, WD200GB, STK9940B, RAM 10 years ago, Disk 10 years ago, Tape 10 years ago.]

The End
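
The "Storage Issues" definition and the two "Latency and Speed" plots can be illustrated with a simple latency-plus-bandwidth model. This is a minimal sketch, not the model behind the slides: the function names and the device numbers (10 ms access latency, 50 MB/s streaming rate, 200 GB capacity) are hypothetical values chosen for illustration. It assumes each random access pays a fixed latency and then streams the object at full bandwidth.

```python
def retrieval_rate(object_bytes, latency_s, bandwidth_bytes_per_s):
    """Effective bytes/s when every object costs one random access.

    Assumed model: time per object = fixed latency + transfer time.
    """
    time_per_object = latency_s + object_bytes / bandwidth_bytes_per_s
    return object_bytes / time_per_object


def full_random_read_time(capacity_bytes, object_bytes, latency_s,
                          bandwidth_bytes_per_s):
    """Seconds to read the whole device in randomly placed objects.

    This mirrors the slide's definition of random-access performance:
    the time taken to randomly access the entire contents of a disk.
    """
    rate = retrieval_rate(object_bytes, latency_s, bandwidth_bytes_per_s)
    return capacity_bytes / rate


# Hypothetical device parameters (illustrative only, not from the slides).
DISK_LATENCY_S = 0.01      # 10 ms per random access
DISK_BANDWIDTH = 50e6      # 50 MB/s streaming rate
DISK_CAPACITY = 200e9      # 200 GB

if __name__ == "__main__":
    # Sweep object sizes on a log10 scale, as on the plots' x axis.
    for exponent in range(0, 11, 2):
        size = 10.0 ** exponent
        rate_mb_s = retrieval_rate(size, DISK_LATENCY_S, DISK_BANDWIDTH) / 1e6
        hours = full_random_read_time(
            DISK_CAPACITY, size, DISK_LATENCY_S, DISK_BANDWIDTH) / 3600
        print(f"log10(size)={exponent:2d}  rate={rate_mb_s:.3g} MB/s  "
              f"full read={hours:.3g} h")
```

Under this model the rate for small objects is dominated by latency, so it scales with object size until transfer time takes over at roughly megabyte sizes. If capacity grows while latency stays fixed, the full random read time grows in proportion to capacity, which is one way to read the "deteriorating with time" bullet.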