Building Multi-Petabyte Online Databases
Jim Gray, Microsoft Research
Gray@Microsoft.com
http://research.Microsoft.com/~Gray

Outline
• Technology:
  – 1M$/PB: store everything online (twice!)
• End-to-end high-speed networks
  – Gigabit to the desktop
• Research driven by apps:
  – EOS/DIS
  – TerraServer
  – National Virtual Astronomy Observatory

Reality Check
• Good news
  – In the limit, processing, storage, and network are free
  – Processing and network are infinitely fast
• Bad news
  – Most of us live in the present.
  – People are getting more expensive: management and programming cost exceeds hardware cost.
  – The speed of light is not improving.
  – WAN prices have not changed much in the last 8 years.

How Much Information Is There?
• Soon everything can be recorded and indexed
• Most data will never be seen by humans
• Precious resource: human attention
• Auto-summarization and auto-search are the key technologies
[Chart: the information pyramid — roughly a book at mega, a photo at giga, a movie at tera, all Library of Congress books (as words) at peta, all books plus multimedia at exa, and "everything recorded" up toward zetta/yotta. Source: www.lesk.com/mlesk/ksg97/ksg.html]
(Prefixes: 10^3 kilo up to 10^24 yotta; downward: 10^-3 milli, 10^-6 micro, 10^-9 nano, 10^-12 pico, 10^-15 femto, 10^-18 atto, 10^-21 zepto, 10^-24 yocto)

Trends: ops/s/$ Had Three Growth Phases
• 1890-1945  Mechanical, relay      7-year doubling
• 1945-1985  Tube, transistor, …    2.3-year doubling
• 1985-2000  Microprocessor         1.0-year doubling
[Chart: ops per second per dollar, 1880-2000, log scale; the doubling time falls from 7.5 years to 2.3 years to 1.0 year]

Sort Speedup
• Performance has doubled every year for the last 15 years.
• But now it takes 100s or 1,000s of processors and disks.
• Got 40%/y (70x) from technology and 60%/y (1,000x) from parallelism (partition and pipeline).
• See http://research.microsoft.com/barc/SortBenchmark/
[Charts: records sorted per second and GB sorted per dollar, 1985-2000, each doubling every year; the progression runs from the Kitsuregawa hardware sorter, Bitton M68000, and Tandem through the Intel HyperCube, Cray YMP, Sequent, IBM 3090, IBM RS6000, NOW, Alpha, Ordinal+SGI, Compaq/NT, NT/PennySort, and Sandia/Compaq/NT up to SPsort/IBM]

Terabyte (Petabyte) Processing Requires Parallelism
• Parallelism: use many little devices in parallel.
• One disk at 10 MB/s takes 1.2 days to scan a terabyte.
• 1,000-way parallel (100 processors and 1,000 disks): a 100-second scan.

Parallelism Must Be Automatic
• There are thousands of MPI programmers.
• There are hundreds of millions of people using parallel database search.
• Parallel programming is HARD!
• Find design patterns and automate them.
• Data search/mining has parallel design patterns.
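The scan arithmetic behind the parallelism slides is worth making concrete. A minimal back-of-envelope sketch in Python (not from the talk; the numbers are the slide's, the helper names are mine):

TB = 10**12          # bytes
MB = 10**6           # bytes

def scan_seconds(volume_bytes, disks, mb_per_sec_per_disk=10):
    """Time to scan volume_bytes spread evenly over `disks` drives."""
    aggregate_bw = disks * mb_per_sec_per_disk * MB   # bytes/second
    return volume_bytes / aggregate_bw

one_disk  = scan_seconds(1 * TB, disks=1)
one_k_way = scan_seconds(1 * TB, disks=1000)

print(f"1 disk:      {one_disk / 86400:.1f} days")     # ~1.2 days
print(f"1,000 disks: {one_k_way:.0f} seconds")         # ~100 seconds

The point of the slide is exactly this ratio: the only way to make the scan time tolerable is to spread the bytes over many devices, which is why the parallelism then has to be automatic.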
Storage capacity beating Moore's law
• 4 k$/TB today (raw disk)
• Growth rates: Moore's Law 58.7%/year; disk TB shipped 112.3%/year (since 1993); revenue 7.47%/year; price decline 50.7%/year (since 1993)
[Chart: disk TB shipped per year, 1988-2000, heading toward an exabyte per year. Source: 1998 Disk Trend (Jim Porter), http://www.disktrend.com/pdf/portrpkg.pdf]

Cheap Storage
• 4 k$/TB disks (16 x 60 GB disks @ 210$ each)
[Charts: price and price-per-capacity vs. raw disk unit size (GB) for SCSI and IDE drives; the fitted trend lines run roughly 13-16 $/GB for SCSI and 3-6 $/GB for IDE]

Cheap Storage: 240 GB for 2 k$ (now), 320 GB by year end
• 4 x 60 GB IDE drives (2 hot-pluggable): ~1,100$
• SCSI-IDE bridge: ~200$
• Box (500 MHz CPU, 256 MB RAM, fan, power, Ethernet): ~700$
• Or 8 disks/box: 480 GB for ~3 k$ (or 300 GB RAID)

Hot-Swap Drives for Archive or Data Interchange
• 35 MBps write, so N x 60 GB drives can be written in about 40 minutes
• Shipping 60 GB overnight = ~N x 2 MB/second at 19.95$/night

Cheap Storage or Balanced System
• Low-cost storage (2 x 1.5 k$ servers): 7 k$/TB
  – 2 x (1 k$ system + 8 x 60 GB disks + 100 Mb Ethernet)
• Balanced server (7 k$ / 0.64 TB):
  – 2 x 800 MHz CPUs (2 k$)
  – 256 MB RAM (200$)
  – 8 x 80 GB drives (2.8 k$)
  – Gbps Ethernet + switch (1.5 k$)
  – 11 k$/TB, 22 k$/RAIDed TB

The "Absurd" Disk
• 2.5-hour scan time (poor sequential access)
• 1 access per second per 5 GB (VERY cold data)
• It's a tape!
[Figure: a 1 TB drive with 100 MB/s bandwidth and 200 KAPS (KB-object accesses per second)]

It's Hard to Archive a Petabyte — It Takes a LONG Time to Restore It
• At 1 GBps it takes 12 days!
• Store it in two (or more) places online (on disk?) — a geo-plex
• Scrub it continuously (look for errors)
• On failure:
  – use the other copy until the failure is repaired,
  – refresh the lost copy from the safe copy.
• The two copies can be organized differently (e.g., one by time, one by space)

Disk vs Tape
• Disk: 80 GB, 35 MBps, 5 ms seek, 3 ms rotational latency, 4$/GB for the drive plus 3$/GB for controllers/cabinet, 4 TB/rack, 1-hour scan
• Tape: 40 GB, 10 MBps, 10-second pick time, 30-120-second seek, 2$/GB for media plus 8$/GB for drive+library, 10 TB/rack, 1-week scan
• Guesstimates: CERN has 200 TB on 3480 tapes; 2 columns = 50 GB; a rack = 1 TB = 12 drives
• The price advantage of tape is narrowing, and the performance advantage of disk is growing.
• At 10 k$/TB, disk is competitive with nearline tape.

Gilder's Law: 3x Bandwidth/Year for 25 More Years
• Today: 10 Gbps per channel; 4 channels per fiber = 40 Gbps; 32 fibers/bundle = 1.2 Tbps/bundle
• In the lab: 3 Tbps/fiber (400 x WDM); in theory, 25 Tbps per fiber
• 1 Tbps = USA 1996 WAN bisection bandwidth
• Aggregate bandwidth doubles every 8 months!

Sense of Scale
• How fat is your pipe? The fattest pipe on the Microsoft campus is the WAN!
  – 20 MBps: disk / ATM / OC3
  – 90 MBps: PCI
  – 94 MBps: coast to coast
  – 300 MBps: OC48 = G2, or memcpy()
[Map: the coast-to-coast path for the 94 MBps demo — Redmond/Seattle, WA to Arlington, VA (5,626 km, 10 hops) over the Pacific Northwest Gigapop and Qwest, with Microsoft, the University of Washington, the Information Sciences Institute, and DARPA's HSCC (High Speed Connectivity Consortium), passing through San Francisco, CA and New York]
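To put the "how fat is your pipe" numbers in perspective, here is a small sketch (not from the talk) that computes how long moving a terabyte takes through each pipe, assuming the slide's raw bandwidths and ignoring protocol overhead:

TB = 10**12
MB = 10**6

# Bandwidths quoted on the Sense-of-Scale slide; real throughput would be lower.
pipes_mbps = {
    "disk / ATM / OC3":        20,
    "PCI bus":                 90,
    "coast-to-coast WAN path": 94,
    "OC48 / memcpy()":        300,
}

for name, mbps in pipes_mbps.items():
    hours = (1 * TB) / (mbps * MB) / 3600
    print(f"{name:25s} {mbps:4d} MBps -> {hours:5.1f} hours per terabyte")

Even the fattest of these pipes takes an hour or more per terabyte, which is why the next slides worry about software overhead in the networking stack.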
Networking
• WANs are getting faster than LANs: G8 = OC192 = 10 Gbps is "standard"
• Link bandwidth improves 4x per 3 years
• Speed of light (60 ms round trip in the US)
• Software stacks have always been the problem:
  Time = SenderCPU + ReceiverCPU + bytes/bandwidth
  This has been the problem.

The Promise of SAN/VIA/Infiniband: 10x in 2 Years (http://www.ViArch.org/)
• Yesterday:
  – 10 MBps (100 Mbps Ethernet)
  – ~20 MBps TCP/IP saturates 2 CPUs
  – round-trip latency ~250 µs
• Now:
  – Wires are 10x faster: Myrinet, Gbps Ethernet, ServerNet, …
  – Fast user-level communication:
    TCP/IP ~100 MBps at 10% CPU; round-trip latency is 15 µs; 1.6 Gbps demoed on a WAN
[Chart: time to send 1 KB, broken into transmit, sender CPU, and receiver CPU — about 250 µs on 100 Mbps Ethernet, less on Gbps Ethernet, least on a SAN]

What's a Balanced System?
[Diagram: a system bus feeding two PCI buses]

Rules of Thumb in Data Engineering
• Moore's law -> an address bit every 18 months.
• Storage grows 100x/decade (except 1,000x last decade!)
• Disk data of 10 years ago now fits in RAM (iso-price).
• Device bandwidth grows 10x/decade — so we need parallelism.
• RAM:disk:tape price is 1:10:30, going to 1:10:10.
• Amdahl's speedup law: maximum speedup is (S+P)/S.
• Amdahl's IO law: 1 bit of IO per instruction per second (about a TBps per 10 teraOPS: 50,000 disks per 10 teraOPS ≈ 100 M$).
• Amdahl's memory law: 1 byte of RAM per instruction per second, going to 10 (1 TB RAM per teraOP: 1 TeraDollars). PetaOps anyone?
• Gilder's law: aggregate bandwidth doubles every 8 months.
• 5-minute rule: cache disk data that is reused within 5 minutes.
• Web rule: cache everything!
(The Amdahl ratios are worked through in a sketch a couple of slides below.)
http://research.Microsoft.com/~gray/papers/MS_TR_99_100_Rules_of_Thumb_in_Data_Engineering.doc

Scalability: Up and Out
• "Scale up"
  – Use "big iron" (SMP)
  – Cluster into packs for availability
• "Scale out" with clones and partitions
  – Use commodity servers
  – Add clones and partitions as needed

An Architecture for Internet Services?
• Need to be able to add capacity
  – New processing
  – New storage
  – New networking
• Need continuous service
  – Online change of all components (hardware and software)
  – Multiple service sites
  – Multiple network providers
• Need great development tools
  – Change the application several times per year.
  – Add new services several times per year.
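Referring back to the Amdahl rules of thumb above, the sizing of a 10-teraOP machine falls straight out of the ratios. A minimal sketch, assuming 25 MB/s of usable sequential bandwidth per disk and ~2 k$ per drive (my assumptions, not figures stated in the talk):

# Amdahl's IO and memory rules of thumb applied to a 10-teraOP machine.
# Rules (from the slide): ~1 bit of IO and ~1 byte of RAM per instruction/second.

ops_per_sec      = 10e12                     # 10 teraOPS
io_bytes_per_sec = ops_per_sec * 1 / 8       # ~1.25 TB/s of IO
ram_bytes        = ops_per_sec * 1           # ~10 TB of RAM

disk_mb_per_sec  = 25                        # assumed usable bandwidth per drive
disk_price       = 2_000                     # assumed $/drive
disks_needed     = io_bytes_per_sec / (disk_mb_per_sec * 1e6)

print(f"IO bandwidth needed: {io_bytes_per_sec / 1e12:.2f} TB/s")
print(f"Disks needed:        {disks_needed:,.0f}")                       # ~50,000 drives
print(f"Disk cost:           ~{disks_needed * disk_price / 1e6:,.0f} M$")  # ~100 M$
print(f"RAM needed:          {ram_bytes / 1e12:.0f} TB")                 # 1 TB per teraOP

Under those assumptions the output reproduces the slide's own figures: roughly 50,000 disks (about 100 M$) and 10 TB of RAM for a 10-teraOP machine.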
Premise: Each Site Is a Farm
• Buy computing by the slice (brick): a rack of servers + disks.
• Grow by adding slices
  – Spread data and computation to new slices
• Two styles:
  – Clones: anonymous servers
  – Parts + Packs: partitions fail over within a pack
• In both cases, a remote farm for disaster recovery
[Diagram: the microsoft.com site, circa FY98 — Building 11 staging, internal-WWW, and log-processing servers (all accessible from corpnet), the MOSWest admin LAN with live SQL Servers, SQL consolidators and the SQLNet feeder LAN, European and Japanese data centers, and the main farm: www.microsoft.com, home.microsoft.com, register.microsoft.com / register.msn.com, search.microsoft.com, support.microsoft.com, premium.microsoft.com, activex.microsoft.com, cdm.microsoft.com, msid.msn.com, FTP/HTTP download servers, and SQL Servers, wired together by FDDI rings (MIS1-MIS4), switched Ethernet, primary and secondary Gigaswitches, and routers, with 13 DS3s (45 Mb/s each), 2 OC3s, and 2 Ethernets (100 Mb/s each) to the Internet. Each server class is annotated with its average configuration (typically 4 x P5/P6, 256-512 MB RAM, 20-180 GB disk), average cost ($24K-$128K), and FY98 forecast count.]

Everyone scales out.

What's the Brick?
• 1 M$/slice: IBM S390? Sun E10000?
• 100 K$/slice: HP-UX / AIX / Solaris / IRIX / EMC
• 10 K$/slice: Utel / Wintel 4x
• 1 K$/slice: Beowulf / Wintel 1x
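The clone / partition / pack vocabulary above maps onto a very small amount of routing and failover logic. Here is a sketch of the idea; the class names and structure are illustrative only, not taken from any Microsoft farm software: clones are interchangeable, so requests round-robin across whichever are alive; partitioned data lives in a pack, and when a pack's primary fails, another member takes over.

import itertools

class CloneSet:
    """Stateless, interchangeable servers: spray requests across the live ones."""
    def __init__(self, servers):
        self.servers = list(servers)
        self._rr = itertools.cycle(range(len(self.servers)))

    def route(self, request, alive):
        for _ in self.servers:
            s = self.servers[next(self._rr)]
            if s in alive:
                return s            # any clone can serve any request
        raise RuntimeError("no clones available")

class Pack:
    """A partition's servers: one primary, the rest ready to take over."""
    def __init__(self, members):
        self.members = list(members)

    def route(self, key, alive):
        for m in self.members:      # first live member acts as primary
            if m in alive:
                return m
        raise RuntimeError("whole pack down -- fail over to the remote farm")

class PartitionedService:
    """Hash each key to a pack; packs hold disjoint slices of the data."""
    def __init__(self, packs):
        self.packs = packs

    def route(self, key, alive):
        return self.packs[hash(key) % len(self.packs)].route(key, alive)

A web tier would sit behind something like CloneSet, while a stateful tier (the SQL back ends) would sit behind PartitionedService; the "remote farm for disaster recovery" is the fallback when an entire pack is unreachable.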
Outline (repeat): Technology — 1M$/PB: store everything online (twice!) · End-to-end high-speed networks — gigabit to the desktop · Research driven by apps: EOS/DIS, TerraServer, National Virtual Astronomy Observatory

Interesting Apps
• EOS/DIS
• TerraServer
• Sloan Digital Sky Survey
(Scale: kilo 10^3 · mega 10^6 · giga 10^9 · tera 10^12 · peta 10^15 · exa 10^18 — today, we are here)

The Challenge -- EOS/DIS
• Antarctica is melting: 77% of the fresh water is liberated
  – sea level rises 70 meters
  – Chico and Memphis become beach-front property
  – New York, Washington, SF, LA, London, Paris …
• Let's study it! Mission to Planet Earth
• EOS: Earth Observing System (17 B$ => 10 B$)
  – 50 instruments on 10 satellites, 1999-2003
  – Landsat (added later)
• EOS DIS: Data Information System
  – 3-5 MB/s raw, 30-50 MB/s processed
  – 4 TB/day
  – 15 PB by year 2007

Designing EOS/DIS
• Expect that millions will use the system (online). Three user categories:
  – NASA: 500 — funded by NASA to do science
  – Global change: 10 K — other earth scientists
  – Internet: 500 M — everyone else: high-school students, grain speculators, environmental impact reports, …
  New applications => discovery and access must be automatic
• Allow anyone to set up a peer node (DAAC & SCF)
• Design for ad hoc queries, not standard data products
  – design for pull vs. push; the computation demand is enormous (pull:push is 100:1)

Key Architecture Features
• 2+N data center design
• Scaleable OR-DBMS
• Emphasize pull vs. push processing
• Storage hierarchy
• Data pump
• Just-in-time acquisition

Obvious Point: EOS/DIS Will Be a Cluster of SMPs
• It needs 16 PB of storage
  – = 200 K disks in current technology
  – = 400 K tapes in current technology
• It needs 100 TeraOps of processing
  – = 100 K processors (current technology) and ~100 terabytes of DRAM
• Startup requirements are 10x smaller
  – smaller data rate
  – almost no reprocessing work

2+N Data Center Design
• Duplex the archive (for fault tolerance)
• Let anyone build an extract (the +N)
• Partition data by time and by space (store it 2 or 4 ways).
• Each partition is a free-standing OR-DBMS (similar to Tandem, Teradata designs).
• Clients and partitions interact via standard protocols
  – HTTP+XML, Data Pump
• Some queries require reading ALL the data (for reprocessing)
• Each data center scans ALL the data every 2 days
  – Data rate 10 PB/day = 10 TB/node/day = 120 MB/s
• Compute small jobs on demand
  – less than 100 M disk accesses
  – less than 100 TeraOps
  – (less than 30-minute response time)
• For BIG JOBS, scan the entire 15 PB database
• Queries (and extracts) "snoop" this data pump.

Problems
• Management (and HSM)
• Design and meta-data
• Ingest
• Data discovery, search, and analysis
• Auto-parallelism
• Reorg / reprocess

What This System Taught Me
• Traditional storage metrics
  – KAPS: KB objects accessed per second
  – $/GB: storage cost
• New metrics:
  – MAPS: megabyte objects accessed per second
  – SCANS: time to scan the archive
  – Admin cost dominates (!!)
  – Auto-parallelism is essential.

Outline (repeat): Technology — 1M$/PB: store everything online (twice!) · End-to-end high-speed networks — gigabit to the desktop · Research driven by apps: EOS/DIS, TerraServer, National Virtual Astronomy Observatory

Microsoft TerraServer: http://TerraServer.Microsoft.com/
• Build a multi-TB SQL Server database
• The data must be
  – 1 TB
  – Unencumbered
  – Interesting to everyone everywhere
  – And not offensive to anyone anywhere
• Loaded
  – 1.5 M place names from the Encarta World Atlas
  – 7 M sq km of USGS DOQs (1-meter resolution)
  – 10 M sq km of USGS topo maps (2 m)
  – 1 M sq km from the Russian Space Agency (2 m)
• On the web (the world's largest atlas)
• Sell images with Commerce Server.
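A quick sanity check on the multi-terabyte claim, using only the coverage and resolution figures on this slide and assuming one byte per pixel before compression (an assumption on my part; the source image formats are not spelled out here):

# Rough raw-image volume for the TerraServer source data sets (1 byte/pixel assumed).

KM2 = 10**6          # square meters per square kilometer

datasets = [
    ("USGS DOQ,  1 m",   7_000_000, 1.0),   # (name, area in km^2, meters per pixel)
    ("USGS topo, 2 m",  10_000_000, 2.0),
    ("SPIN-2,    2 m",   1_000_000, 2.0),
]

total = 0
for name, km2, res in datasets:
    pixels = km2 * KM2 / (res * res)
    raw_bytes = pixels * 1              # 1 byte per pixel (assumed)
    total += raw_bytes
    print(f"{name:18s} {raw_bytes / 1e12:5.2f} TB raw")

print(f"{'total':18s} {total / 1e12:5.2f} TB raw")

That lands in the single-digit terabytes raw; with the roughly 5:1 JPEG compression described on the next slide, it comes down to the 1-2 TB range actually served.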
Microsoft TerraServer Background
• The Earth's surface is about 500 tera-square-meters (tm²)
  – the USA is 10 tm²
  – 100 tm² of land lies between 70ºN and 70ºS
• We have pictures of about 6% of it
  – 3 tm² from the USGS
  – 2 tm² from the Russian Space Agency
• Someday: a multi-spectral image of everywhere, once a day / hour
• Compress 5:1 (JPEG) to 1.5 TB
• Slice into 10 KB chunks; store the chunks in the DB
• Navigate with
  – Encarta™ Atlas (globe, gazetteer)
  – StreetsPlus™ in the USA
[Figure: the image pyramid — 0.2 x 0.2 km² tiles composed into 0.4 x 0.4, 0.8 x 0.8, and 1.6 x 1.6 km² images]

USGS Digital Ortho Quads (DOQ)
• U.S. Geological Survey
• 6 terabytes (14 TB raw, but there is redundancy)
• Most of the data is not yet published
• Based on a CRADA: TerraServer makes the data available
• 1 x 1 meter, 6 TB, continental US; new data coming

SPIN-2
• Russian Space Agency (SovInformSputnik); Aerial Images is the worldwide distributor
• 1.5-meter geo-rectified imagery of (almost) anywhere
• Almost equal-area projection
• De-classified satellite photos (from 200 km); more data coming (1 m)
• Selling imagery on the Internet
• Putting 2 tm² onto Microsoft TerraServer

Hardware
• Web servers and Map Site Servers behind a 100 Mbps Ethernet switch, a DS3 to the Internet, and the SPIN-2 image feed
• 6 TB database server: AlphaServer 8400, 4 x 400 MHz, 10 GB RAM, 324 StorageWorks disks, 10-drive tape library (STC TimberWolf DLT7000)

Software
[Diagram: the TerraServer web site runs Internet Information Server 4.0 with Active Server Pages and MTS calling TerraServer stored procedures in SQL Server 7; HTML-browser and Java-viewer clients reach it over the Internet; the Microsoft Automap ActiveX server and Microsoft Site Server EE handle maps and commerce; image delivery pulls from the TerraServer DB, which is fed by the image provider sites]

The BAD OLD Load Process
[Diagram: DLT tapes were "tar"-ed onto NT staging shares (\Drop'N'), cut by the ImgCutter on AlphaServer 4100s (60 x 4.3 GB drives), scheduled by DoJob / LoadMgr against a LoadMgr DB, then loaded and backed up into the AlphaServer 8400's enterprise storage array (3 x 108 x 9.1 GB drives) and the STC DLT tape library]

New Image Load and Update
[Diagram: DLT tape ("tar") feeds the Image Cutter and a cut-and-load scheduling system (Active Server Pages, metadata, load DB); TerraLoader dithers the image pyramid from the base imagery, merges, and writes to the TerraServer SQL DBMS over ODBC transactions. Pipeline steps: 10 ImgCutter, 20 Partition, 30 ThumbImg, 40 BrowseImg, 45 JumpImg, 50 TileImg, 55 Meta Data, 60 Tile Meta, 70 Img Meta, 80 Update Place, …]

After a Year
• 2 TB of data, 750 M records
• 2.3 billion hits
• 2.0 billion DB queries
• 1.7 billion images sent (2 TB of download)
• 368 million page views
• 99.93% DB availability
• 3rd design now online
• Built and operated by a team of 4 people
[Chart: TerraServer daily traffic, Jun 22, 1998 through June 22, 1999 — sessions, hits, page views, DB queries, and images, peaking in the tens of millions per day]
[Chart: cumulative up time vs. down time over the year (~8,640 hours total); a few hours of down time, attributed to operations, scheduled maintenance, and hardware+software]

TerraServer Current Effort
• Added USGS topographic maps (4 TB)
• The other 25% of the US DOQs (photos)
• Adding digital elevation maps
• Integrated with Encarta Online
• Open architecture: publish XML and C# interfaces
• Adding multi-layer maps (with UC Berkeley)
• High availability (4-node cluster with failover)
• Geo-spatial extension to SQL Server
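The talk does not show the TerraServer schema, but the image pyramid above suggests the flavor of the lookup: each level doubles the tile edge (0.2, 0.4, 0.8, 1.6 km, …), and a tile is addressed by something like (theme, level, row, column). A hypothetical sketch of that key computation; the projected x/y coordinates, the base tile size, and the key layout are all my assumptions, not the actual TerraServer design:

# Hypothetical tile-key computation for a TerraServer-style image pyramid.
# Level 0 tiles cover 0.2 x 0.2 km; each level up doubles the tile edge.

BASE_TILE_METERS = 200          # 0.2 km at the finest level

def tile_key(theme, x_meters, y_meters, level):
    """Map a projected point to the (theme, level, row, col) of the tile containing it."""
    edge = BASE_TILE_METERS * (2 ** level)
    col = int(x_meters // edge)
    row = int(y_meters // edge)
    return (theme, level, row, col)

# A web request for an image at some point and zoom level then turns into a single
# keyed lookup of a ~10 KB blob in the tile table:
key = tile_key(theme="DOQ", x_meters=1_234_567.0, y_meters=7_654_321.0, level=2)
print(key)     # ('DOQ', 2, 9567, 1543)

The design point the slides do make explicit is that every page view decomposes into a handful of such small keyed blob fetches, which is what lets a single SQL database serve billions of images.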
Outline (repeat): Technology — 1M$/PB: store everything online (twice!) · End-to-end high-speed networks — gigabit to the desktop · Research driven by apps: EOS/DIS, TerraServer, National Virtual Astronomy Observatory

(inter)National Virtual Observatory
• Almost all astronomy datasets will be online
• Some are big (>>10 TB); the total is a few petabytes; bigger datasets are coming
• The data is "public"
• Scientists can mine these datasets
• Computer science challenge: organize these datasets and provide easy access to them.

The Sloan Digital Sky Survey
• A project run by the Astrophysical Research Consortium (ARC):
  The University of Chicago, Princeton University, The Johns Hopkins University, The University of Washington, Fermi National Accelerator Laboratory, US Naval Observatory, The Japanese Participation Group, The Institute for Advanced Study
  Funded by the SLOAN Foundation, NSF, DOE, NASA
• Goal: create a detailed multicolor map of the Northern Sky over 5 years, with a budget of approximately $80M
• Data size: 40 TB raw, 1 TB processed

Scientific Motivation
• Create the ultimate map of the Universe: the Cosmic Genome Project!
• Study the distribution of galaxies:
  – What is the origin of fluctuations?
  – What is the topology of the distribution?
• Measure the global properties of the Universe: how much dark matter is there?
• Local census of the galaxy population: how did galaxies form?
• Find the most distant objects in the Universe: what are the highest quasar redshifts?

The 'Naught' Problem
• What are the global parameters of the Universe?
  – H₀, the Hubble constant: 55-75 km/s/Mpc
  – Ω₀, the density parameter: 0.25-1
  – Λ₀, the cosmological constant: 0-0.7
• Their values are still quite uncertain today…
• Goal: measure these parameters with an accuracy of a few percent — high-precision cosmology!

The Cosmic Genome Project
• The SDSS will create the ultimate map of the Universe, with much more detail than any other measurement before.
[Figure: redshift-survey slices of increasing depth — Gregory and Thompson 1978; de Lapparent, Geller and Huchra 1986; da Costa et al. 1995; SDSS Collaboration 2002]

The Spectroscopic Survey
• Measure redshifts of objects -> distance
• SDSS Redshift Survey: 1 million galaxies, 100,000 quasars, 100,000 stars
• Two high-throughput spectrographs
  – spectral range 3900-9200 Å
  – 640 spectra simultaneously
  – R = 2000 resolution
• Automated reduction of spectra
• Very high sampling density and completeness
• Objects in other catalogs are also targeted

The First Quasars
• Three of the four highest-redshift quasars have been found in the first SDSS test data!

SDSS Data Products
• Object catalog: parameters of >10^8 objects — 400 GB
• Redshift catalog: parameters of 10^6 objects — 2 GB
• Atlas images: 5-color cutouts of >10^9 objects — 1.5 TB
• Spectra: 10^6 spectra in one-dimensional form — 60 GB
• Derived catalogs (clusters, QSO absorption lines) — 60 GB
• 4x4-pixel all-sky map (5 x 10^5), heavily compressed — 1 TB
• All raw data is saved in a tape vault at Fermilab

Concept of the SDSS Archive
[Diagram: the Operational Archive (raw + processed data) feeds the Science Archive (products accessible to users), which in turn serves other archives]

Parallel Query Implementation
• Getting 200 MBps/node through SQL today
• = 4 GB/s on a 20-node cluster
[Diagram: a user interface and analysis engine drive a master SX engine over a DBMS federation of slave DBMS nodes, each with its own RAID storage]

What We Have Been Doing with SDSS
• Helping move the data to SQL
  – Database design
  – Data loading
  – Spatial data access method
[Chart: color magnitude diff/ratio distribution — object counts vs. magnitude difference/ratio for the u-g, g-r, r-i, and i-z colors]
• Experimenting with queries on a 4 M object DB
  – 20 questions like "find gravitational lens candidates"
  – Queries use parallelism; most run in a few seconds (auto-parallel)
  – Some run in hours (neighbors within 1 arcsec)
  – EASY to ask questions
• Helping with an "outreach" website: SkyServer
• Personal goal: try data-mining techniques to "re-discover" astronomy
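The "neighbors within 1 arcsec" query is, at bottom, a spatial predicate. The talk does not show SDSS's actual spatial access method, so this is only the core test — the angular separation between two (RA, Dec) positions — which any index (zones, hierarchical triangular mesh, etc.) would ultimately apply to candidate pairs. A minimal sketch:

import math

ARCSEC = 1.0 / 3600.0            # one arcsecond, in degrees

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Angular distance between two (RA, Dec) positions given in degrees.
    Uses the haversine form, which stays accurate at arcsecond scales."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sd = math.sin((dec2 - dec1) / 2.0)
    sr = math.sin((ra2 - ra1) / 2.0)
    a = sd * sd + math.cos(dec1) * math.cos(dec2) * sr * sr
    return math.degrees(2.0 * math.asin(math.sqrt(a)))

def are_neighbors(obj1, obj2, radius_arcsec=1.0):
    """True if two objects (each an (ra, dec) tuple in degrees) lie within the radius."""
    return angular_separation_deg(*obj1, *obj2) <= radius_arcsec * ARCSEC

print(are_neighbors((185.0, 2.0), (185.0 + 0.5 * ARCSEC, 2.0)))   # True  (~0.5 arcsec apart)
print(are_neighbors((185.0, 2.0), (185.0, 2.0 + 2.0 * ARCSEC)))   # False (2 arcsec apart)

In the archive itself this test runs inside the database, after the "spatial data access method" mentioned above has pruned the candidate pairs — which is why the neighbors query is the one that takes hours rather than seconds on the 4 M object test database.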
Outline (repeat): Technology — 1M$/PB: store everything online (twice!) · End-to-end high-speed networks — gigabit to the desktop · Research driven by apps: EOS/DIS, TerraServer, National Virtual Astronomy Observatory