SPIN-2 Microsoft BackOffice
Tom Barclay, Jim Gray, Don Slutz, many others
Microsoft Research

Terra-Server Application Requirements
• BIG: 1 TB of data.
• PUBLIC: available on the World Wide Web.
• INTERESTING: to a wide audience
• ACCESSIBLE: using standard browsers (IE, Netscape)
• REAL: a real application (users can buy imagery)
• FREE: cannot require NDA or money to access
• FAST: impress customers for BackOffice, StorageWorks
• EASY: inexpensive to develop, deploy, and maintain

Project Mission Statement
Demonstrate the unique capabilities of each Terra-Server partner.
• SPIN-2
  – Demo scope & quality of Spin-2 imagery
  – Open new markets for imagery sales
• DEC
  – Demo DEC Alpha & StorageWorks™ scalability
  – Recognized as superior h/w vendor
• USGS
  – Distribute DOQs to a wider audience
  – Lower cost of distribution
• Microsoft BackOffice
  – Demo scalability of NT & SQL Server

What’s a Terabyte?
1 Terabyte
  1,000,000,000 business letters      150 miles of book shelf
  100,000,000 book pages              15 miles of book shelf
  50,000,000 FAX images               7 miles of book shelf
  10,000,000 TV pictures (mpeg)       10 days of video
  4,000 LandSat images                16 earth images (100m)
Library of Congress (in ASCII) is 25 TB
  1980: 200 M$ of disc (10,000 discs); 5 M$ of tape silo (10,000 tapes)
  1996: 200 k$ of magnetic disc (120 discs); 50 k$ nearline tape (50 tapes)
Terror Byte !!

Background (no pun intended)
• Earth is 500 tera-meters square
  – USA is 10 tm²
• 100 tm² land in 70°N to 70°S
• We have pictures of 6% of it
  – 3 tsm from USGS
  – 2 tsm from Russian Space Agency
• Compress 5:1 (JPEG) to 1.5 TB.
• Slice into 10 KB chunks
• Store chunks in DB
• Navigate with
  – Encarta™ Atlas globe gazetteer
  – StreetsPlus™ in the USA
• Someday
  – multi-spectral image
  – of everywhere
  – once a day / hour
• Image sizes: 1.8x1.2 km² tile, 10x15 km² thumbnail, 20x30 km² browse image, 40x60 km² jump image

Image Data
• USGS “DOQ”: 1x1 meter, 4 TB, continental US, new data coming
• Spin-2: 1.5x1.5 m, 500 GB, worldwide, new data coming
• DRG topo maps: may add during 1998
• Spot Image: poor resolution

USGS Digital Ortho Quads (DOQ)
• 1 meter aerial photos of many places
• Most of the data is not yet published
• Based on a CRADA
  – TerraServer will make data available.
• UTM (universal transverse mercator) coordinates
  – 120 zones (60 north and 60 south)
  – Each zone is a 6° slice
  – Zone is flattened: coordinates are Zone #, xx N, yy E/W
  – North may be 3° off
  – Zone crossings are bizarre.
• 90° = 10⁷ m; 6° ~ 6.666 × 10⁵ m
• The earth is not flat – and it is not round either

Russian Space Agency (SovInformSputnik) SPIN-2
(Aerial Images is Worldwide Distributor)
• 1.5 meter geo-rectified imagery of (almost) anywhere
• Almost equal-area projection
• De-classified satellite photos (from 200 km)
• More data coming (1 m)
• Want to sell imagery on Internet.
• Putting 2 tm² onto TerraServer.

What Microsoft & DEC Contribute
• Microsoft’s contribution:
  – Build an “internet UI”
  – Design the app and the database
  – Slice & dice & load the data.
  – Build “electronic stores” for USGS and for Aerial Images to operate, to sell & distribute images
  – Run a “robust” web site for 18 months
• Digital contribution:
  – Provide high-performance processors
  – Provide high-capacity, reliable storage.
  – Provide technical advice

Demo
http://msrlab/terraserver

Hardware
Diagram labels:
• Map Server
• STC 9710 DLT7k Tape Library
• 10x256 = 5TB
• Enterprise Storage Array: 3 x 108 9.1 GB Drives
• Alpha Server 8400
• 100mbit EtherSwitch
• 1TB Database Server: AlphaServer 8400 4x400.
  – 8GB RAM
  – 324 StorageWorks disks
  – 10 drive tape library (STC Timber Wolf DLT7000)
• USGS Site Server
• DS3 Internet
• Spin2 Site Server

Software
[Architecture diagram. Tiers: Web Client, The Internet, Terra-Server Web Site, Image Provider Site(s).]
• Web Client: Java Viewer, IE or Netscape, HTML
• Other component labels: Image Server, Active Server Pages, Internet Information Server 3.0, MTS, ODBC, Terra-Server Stored Procedures, Sphinx (SQL Server), Terra-Server DB, Microsoft Automap ActiveX Server, Automap Server, Microsoft Site Server EE, Image Delivery Application, SQL Server 7

System Management & Maintenance
• Backup and Recovery
  – Cheyenne ArcServe
  – Legato Networker
  – Seagate Backup Exec
  – Sphinx Backup/Restore Utility
• SQL Server Enterprise Manager
  – DBA Maintenance
  – SQL Performance Monitor

How We Did It
• “Chopped” big images into small “tiles”
  – Sub-sampled tiles to create zoom levels
  – Tile sizes map to Lat/Lon system
  – Unique ID assigned to each Tile location (Z-transform of lat/long or UTM)
  – Unique ID clusters adjacent tiles onto the same database & index pages
• Wrote Load Management program
  – Runs image cutting job
  – Loads meta and image data into SQL
  – Multiple Loaders can run in parallel
  – Web Active Server Page controls load process

USGS Editing Process
[Diagram: a grid spanning 1 degree latitude by 1 degree longitude; 1 Quadrangle (7.5’ x 7.5’); 1 “QUAD” DOQ Photo (3.75’ x 3.75’); DOQQ Origin Point; Quad Cut 3x6 (cells numbered 1 to 18) for Jump, Thumbnail & Browse images; 64 DOQ Tiles.]

Spin-2 Image Editing Process
• 48 x 96 cells per sq degree
• Image aligned to left corner of grid system
• Non-image squares (all white) are discarded
• Cut images are extracted
• SubSample: Jump 32m, Thumb 16m, Browse 8m
• Tiles are cut 5x5, scrambled
• Output JPEG

Spin-2 Meta Data
• Semi-colon delimited fields, ASCII encoding
• 1 record per line
• Fields, in order:
  – File name (of image)
  – City¹
  – State¹
  – Country
  – Number of Rows
  – Number of Columns
  – Shooting Height
  – Height of Sun
  – Date of survey (mm/dd/yyyy)
  – Time of survey (GMT) (hr:mn:ss)
  – Upper Left Latitude
  – Upper Left Longitude
  – Lower Right Latitude
  – Lower Right Longitude
  – Camera System¹
  – Pixel size¹
  – Copyright¹
¹ Field is not required; if not present, then a blank field is present.

Logical Schema
[Diagram groups: Gazetteer, Meta Information, Image Data & Meta Data. Tables shown: Country, PlaceType, State, Place, FeatureType, Theme, TileLog, ImgMeta, TileMeta, JumpImg, BrowseImg, ThumbImg, TileImg.]
• Gazetteer: star schema
  – Index on
    • image, place, type
    • image, state, type
    • image, state, country, type
    • image, place, state, type
    • image, place, country, type
  – All lookups are fast
• Images
  – Lookup by UGrid or ZGrid ID plus resolution
  – Lookups are fast. Indices are in DRAM (auto-magically by SQL)
  – SQL manages all the tiles and indices
  – Images are brought in on demand

Meta & Image Load Process
• Pre-Process Data (*.IMD & *.JPG, NT Backup)
  – Read *.IMD files
  – Generate Ids
  – Generate ZLatLong
  – Sort by ZLatLong
  – Output: Image Meta, Tile Meta
• Load Thumb Img: Read Image Meta, Read Image Data, BCP into ImgTbl
• Load Browse Img: Read Image Meta, Read Image Data, BCP into ImgTbl
• Load Tile Img: Read Tile Meta, Read Tile Data, BCP into TileTbl
• Load Tile Meta: Read Image Meta, BCP into TileMeta
• Load Img Meta: Read Image Meta, BCP into TileMeta

“SRC”ThumbImg
  ThumbImgId    int
  ImgMetaId     int
  ZLatLong      int
  SrcId         int
  ImgTypeId     int
  PixWidth      int
  PixHeight     int
  ImgData       Blob

“SRC”BrowseImg
  BrowseImgId   int
  ImgMetaId     int
  ZLatLong      int
  SrcId         int
  ImgTypeId     int
  PixWidth      int
  PixHeight     int
  ImgData       Blob

“SRC”TileImg
  TileImgId     int
  TileMetaId    int
  ZLatLong      int
  SrcId         int
  ImgTypeId     int
  PixWidth      int
  PixHeight     int
  ImgData       Blob

TileMeta
  TileMetaId    int
  ImgMetaId     int
  OrigMetaId    int
  SrcId         int
  ImgTypeId     int
  XGridId       int
  YGridId       int
  Hemisphere    smallint
  Continent     smallint
  xxLat         smallint
  xxLong        smallint
  ZLatLong      int

ImgMeta
  ImgMetaId     int
  OrigMetaId    int
  SrcId         int
  ImgTypeId     int
  XGridId       int
  YGridId       int
  ImgDate       Date
  Hemisphere    smallint
  Continent     smallint
  xxLat         smallint
  xxLong        smallint
  ZLatLong      int
  MetaStr       vchar(255)

Image Delivery and Load
Diagram labels:
• DLT Tape (“tar”), NT DoJob, \Drop’N’, LoadMgr, DB, Wait 4 Load, Backup, ESA
• Alpha Server 4100 (two), 100mbit EtherSwitch, 60 4.3 GB Drives
• ImgCutter, \Drop’N’, \Images
• Enterprise Storage Array: 3 x 108 9.1 GB Drives
• STC DLT Tape Library
• Alpha Server 8400
• Load job steps: 10: ImgCutter, 20: Partition, 30: ThumbImg, 40: BrowseImg, 45: JumpImg, 50: TileImg, 55: Meta Data, 60: Tile Meta, 70: Img Meta, 80: Update Place, ...

Physical DB Design
• 324 disks ~ 3 TB of disk space
• Configured as RAID5 => ~2.4 TB
• Configured as 20 NT volumes
  – Each volume ~ 120 GB
  – Big files!
• SQL data spread across all volumes.
  – Combines the 20 files.
  – One BIG table for the tiles
  – Images stored as blobs (JPEG compressed)
• 2 GB RAM holds
  – all indices and
  – gazetteer.

DEC Alpha Server T2B2
[Diagram: Alpha 8400, 4x400 MHz, 2 GB; PCI U-SCSI; 9 HSZ50 Controllers, each with 36 9GB disks; HSZ50 Controller (2) with 10x tapes.]

Terra-Server Tables
• USGS DOQ Data
  – 48,000 DOQQ images (45-55 MB / image)
  – Creates 864,000 Jump, Thumb, & Browse images (3.5 M rows)
  – Creates 55.3 M Tile images (110.6 M rows)
• SPIN-2 Data
  – 3200 278 MB images (approximate size)
  – Creates 620,800 Jump, Thumb, & Browse images (2.5 M rows)
  – Creates 15.5 M Tile images (31 M rows)
• Gazetteer Data
  – 1.1 M named places (Encarta World Atlas)
  – 45 M cell names
• Total Rows = 193.7 M

Other Details
• Active Server Pages
  – faster and easier than DB stored procedures.
• Commerce Server is interesting
  – Images are the inventory: no SKU, millions of them
  – USGS built their own: they are very smart, but it is easy to masquerade as a credit-card reader.
• The earth is a geoid, and every geographer has a coordinate system (or two).
• Tapes are still a nightmare.
• Everyone is a UI expert.

Thank You!
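
Backup note on the tile IDs from “How We Did It”: the slides say only that each tile location gets a unique ID from a “Z-transform of lat/long or UTM” and that the ID clusters adjacent tiles onto the same pages. One common way to get that behavior is Z-order (Morton) bit interleaving, and the sketch below assumes that reading; it is not taken from the Terra-Server code. The names `interleave_bits` and `tile_id`, the 16-bit width, and the uniform 48 cells per degree are all hypothetical choices made for illustration.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one integer.

    Bits of x land in even positions and bits of y in odd positions,
    so cells that are close in (x, y) tend to get close IDs.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def tile_id(lat: float, lon: float, cells_per_degree: int = 48) -> int:
    """Map a latitude/longitude to a hypothetical Z-order tile ID.

    Assumption: a uniform grid with `cells_per_degree` cells per degree
    on both axes, with the origin at (-90, -180).
    """
    # Shift to non-negative coordinates, then quantize to grid cells.
    row = int((lat + 90.0) * cells_per_degree)
    col = int((lon + 180.0) * cells_per_degree)
    return interleave_bits(col, row)


if __name__ == "__main__":
    # The four cells of a 2x2 block share one contiguous ID range.
    block = sorted(interleave_bits(c, r) for r in range(2) for c in range(2))
    print(block)
```

Under this scheme a clustered index on the ID keeps most neighboring tiles on nearby pages, which is the property the slide describes. The exact bit layout and grid used by Terra-Server may differ.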