Terra-Server
Tom Barclay, Jim Gray, Don Slutz, and many others
Microsoft Research
Logos: SPIN-2, Microsoft BackOffice
Application Requirements

BIG: 1 TB of data.
PUBLIC: available on the World Wide Web.
INTERESTING: to a wide audience.
ACCESSIBLE: using standard browsers (IE, Netscape).
REAL: a real application (users can buy imagery).
FREE: cannot require an NDA or money to access.
FAST: impress customers for BackOffice, StorageWorks.
EASY: inexpensive to develop, deploy, and maintain.
Project Mission Statement
Demonstrate the unique capabilities of each Terra-Server partner.
SPIN-2
– Demo scope & quality of Spin-2 imagery
– Open new markets for imagery sales
DEC
– Demo DEC Alpha & StorageWorks™ scalability
– Recognized as superior h/w vendor
USGS
– Distribute DOQs to a wider audience
– Lower cost of distribution
Microsoft BackOffice
– Demo scalability of NT & SQL Server
What’s a Terabyte?
1 Terabyte
1,000,000,000 business letters     150 miles of book shelf
100,000,000 book pages             15 miles of book shelf
50,000,000 FAX images              7 miles of book shelf
10,000,000 TV pictures (mpeg)      10 days of video
4,000 LandSat images               16 earth images (100m)
Library of Congress (in ASCII) is 25 TB
1980: 200 M$ of disc               10,000 discs
      5 M$ of tape silo            10,000 tapes
1996: 200 k$ of magnetic disc      120 discs
      50 K$ nearline tape          50 tapes
Terror Byte !!
Background (no pun intended)
Earth is 500 tera square meters (tm²)
– USA is 10 tm²
100 tm² of land in 70ºN to 70ºS
We have pictures of 6% of it
– 3 tm² from USGS
– 2 tm² from Russian Space Agency
Compress 5:1 (JPEG) to 1.5 TB.
Slice into 10 KB chunks
Store chunks in DB
Navigate with
– Encarta™ Atlas (globe, gazetteer)
– StreetsPlus™ in the USA
Someday
– multi-spectral image
– of everywhere
– once a day / hour
Image sizes
– 1.8 x 1.2 km tile
– 10 x 15 km thumbnail
– 20 x 30 km browse image
– 40 x 60 km jump image
Image Data
USGS “DOQ”: 1 x 1 meter, 4 TB, Continental US. New data coming.
Spin-2: 1.5 x 1.5 m, 500 GB, World Wide. New data coming.
DRG Topo Maps: may add during 1998.
Spot Image: poor resolution.
USGS Digital Ortho Quads (DOQ)
1 Meter aerial photos of many places
Most of the data is not yet published
Based on a CRADA
– TerraServer will make data available.

UTM coordinates
(universal transverse mercator)
– 120 zones (60 north and 60 south)
– Each zone is a 6º slice
– Zone is flattened: coordinates are Zone #, xx N yy E/W
– North may be 3º off
– Zone crossings are bizarre.
The earth is not flat
– and it is not round either
90º = 10^7 m
6º ~ 6.666 x 10^5 m
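The zone layout above (6º slices, with 90º of arc being about 10^7 m) can be sketched in a few lines. This is only an illustration of the standard zone-numbering rule, not code from the Terra-Server project; the function names are invented here, and the sketch ignores the special-case zones around Norway and Svalbard.

```python
import math


def utm_zone(lon_deg: float) -> int:
    """Return the UTM zone number (1..60) for a longitude in degrees."""
    # Normalize the longitude into [-180, 180) so that 180 wraps to zone 1.
    lon = ((lon_deg + 180.0) % 360.0) - 180.0
    # Zones are 6-degree slices counted eastward from 180 W.
    return int(math.floor((lon + 180.0) / 6.0)) + 1


def utm_zone_label(lat_deg: float, lon_deg: float) -> str:
    """Zone number plus hemisphere, giving the 60 N + 60 S = 120 zones."""
    hemisphere = "N" if lat_deg >= 0 else "S"
    return f"{utm_zone(lon_deg)}{hemisphere}"


def zone_width_m(lat_deg: float) -> float:
    """Approximate east-west width of one zone, using 90 degrees = 10^7 m.

    Assumes a spherical earth, which the slide itself warns is not exact.
    """
    meters_per_degree = 1e7 / 90.0
    return 6.0 * meters_per_degree * math.cos(math.radians(lat_deg))


if __name__ == "__main__":
    print(utm_zone_label(47.6, -122.33))   # a point near Seattle
    print(round(zone_width_m(0.0)))        # zone width at the equator
```

At the equator the computed width reproduces the slide's figure of roughly 6.666 x 10^5 m, and it shrinks with the cosine of latitude toward the poles.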
Russian Space Agency (SovInformSputnik)
SPIN-2 (Aerial Images is Worldwide Distributor)
1.5 Meter Geo Rectified imagery of (almost) anywhere
Almost equal-area projection
De-classified satellite photos (from 200 km)
More data coming (1 m)
Want to sell imagery on Internet.
Putting 2 tm² onto TerraServer.
What Microsoft & DEC Contribute
Microsoft’s contribution:
– Build an “internet UI”
– Design the app and the database
– Slice & Dice & Load the data.
– Build “electronic stores” for USGS and for Aerial Images to operate, to sell & distribute images
– Run a “robust” web site for 18 months
Digital contribution:
– Provide high-performance processors
– Provide high capacity, reliable storage.
– Provide technical advice
Demo
http://msrlab/terraserver
Hardware
1TB Database Server
– AlphaServer 8400 4x400, 8 GB RAM
– 324 StorageWorks disks
– 10 drive tape library (STC Timber Wolf DLT7000)
[Diagram: Internet, DS3, 100mbit EtherSwitch, Map Server, USGS Site Server, Spin2 Site Server, AlphaServer 8400, Enterprise Storage Array (3 x 108 9.1 GB drives), STC 9710 DLT7k tape library (10x256 = 5TB)]
Software
[Architecture diagram; components shown by tier]
Web Client
– IE or Netscape
– Java Viewer
– HTML
The Internet
Terra-Server Web Site
– Internet Information Server 3.0
– Active Server Pages
– MTS
– Image Server
– ODBC
– Terra-Server Stored Procedures
– Sphinx (SQL Server)
– Terra-Server DB
– Microsoft Automap ActiveX Server (Automap Server)
Image Provider Site(s)
– Internet Information Server 3.0
– Microsoft Site Server EE
– Image Delivery Application
– SQL Server
System Management & Maintenance
Backup and Recovery
– Cheyenne ArcServe
– Legato Networker
– Seagate Backup Exec
– Sphinx Backup/Restore Utility
SQL Server Enterprise Manager
– DBA Maintenance
– SQL Performance Monitor
How We Did It
“Chopped” big images into small “tiles”
– Sub-sampled tiles to create zoom levels
– Tile sizes map to Lat/Lon system
– Unique ID assigned to each Tile location (Z-transform of lat/long or UTM)
– Unique ID clusters adjacent tiles onto the same database & index pages
Wrote Load Management program
– Runs image cutting job
– Loads meta and image data into SQL
– Multiple Loaders can run in parallel
– Web Active Server Page controls load process
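The "Z-transform" of a tile location mentioned above is commonly done by interleaving the bits of the two grid coordinates (a Z-order or Morton code), so that tiles close together on the ground get numerically close IDs. The sketch below shows that general technique only; the grid resolution (`cells_per_degree`), the 16-bit width, and the function names are assumptions made for illustration and are not taken from the slides.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order key.

    Bit i of x lands at position 2*i, bit i of y at position 2*i + 1.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def tile_id(lat: float, lon: float, cells_per_degree: int = 48) -> int:
    """Map a lat/long to a grid cell, then to a Z-order ID.

    cells_per_degree is a hypothetical grid density chosen for the example.
    """
    col = int((lon + 180.0) * cells_per_degree)  # 0 at 180 W
    row = int((lat + 90.0) * cells_per_degree)   # 0 at the south pole
    return interleave_bits(col, row)


if __name__ == "__main__":
    # The four cells of a 2x2 block share a contiguous ID range,
    # which is what lets adjacent tiles cluster on the same pages.
    block = sorted(interleave_bits(x, y) for x in (0, 1) for y in (0, 1))
    print(block)
    print(tile_id(47.6, -122.33))
```

Because a clustered index stores rows in key order, IDs built this way tend to place neighboring tiles on the same or nearby database pages, which is the effect the slide describes.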
USGS Editing Process
1 Quadrangle (7.5’ x 7.5’), within a grid of 1 degree latitude by 1 degree longitude
1 “QUAD” DOQ photo (3.75’ x 3.75’), located by its DOQQ origin point
Quad cut 3x6 into cells 1 to 18: Jump, Thumb-nails & Browse images
Each cell cut into 64 DOQ tiles (numbered 1 to 64)
Spin-2 Image Editing Process
48 x 96 cells per sq degree
Image aligned to left corner of grid system
Non-image squares (all white) are discarded
Cut images are extracted
SubSample into Jump, Thumb, and Browse images (resolutions shown: 32m, 16m, 8m)
Tiles are cut 5x5, scrambled, output Jpeg
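The cutting and sub-sampling steps can be illustrated on a tiny grayscale image held as nested lists. This is a sketch under stated assumptions: the slides do not say how sub-sampling was done, so 2x2 averaging is used here as one plausible choice, the "scrambled" step is not modeled, and all names are invented for the example.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # rows of 8-bit grayscale pixels
WHITE = 255


def subsample_2x(img: Image) -> Image:
    """Halve the resolution by averaging each 2x2 pixel block (assumed method)."""
    h = len(img) // 2 * 2
    w = len(img[0]) // 2 * 2
    return [
        [(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) // 4
         for c in range(0, w, 2)]
        for r in range(0, h, 2)
    ]


def cut_grid(img: Image, rows: int = 5, cols: int = 5) -> Dict[Tuple[int, int], Image]:
    """Cut an image into a rows x cols grid, dropping all-white squares."""
    th = len(img) // rows
    tw = len(img[0]) // cols
    tiles: Dict[Tuple[int, int], Image] = {}
    for r in range(rows):
        for c in range(cols):
            tile = [line[c * tw:(c + 1) * tw] for line in img[r * th:(r + 1) * th]]
            if all(p == WHITE for line in tile for p in line):
                continue  # non-image square: discard
            tiles[(r, c)] = tile
    return tiles


if __name__ == "__main__":
    img = [[WHITE] * 10 for _ in range(10)]
    img[0][0] = 0                      # one dark pixel in the top-left square
    print(sorted(cut_grid(img)))       # only the square holding image data survives
    print(subsample_2x(img)[0][0])     # averaged value of the top-left 2x2 block
```

Applying `subsample_2x` repeatedly gives a pyramid in which each level doubles the ground distance per pixel, matching the 8m, 16m, 32m progression on the slide.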
Spin-2 Meta Data
Semi-colon delimited fields, ASCII encoding, 1 record per line
Field
– File name (of image)
– City [1]
– State [1]
– Country
– Number of Rows
– Number of Columns
– Shooting Height
– Height of Sun
– Date of survey (mm/dd/yyyy)
– Time of survey (GMT) (hr:mn:ss)
– Upper Left Latitude
– Upper Left Longitude
– Lower Right Latitude
– Lower Right Longitude
– Camera System [1]
– Pixel size [1]
– Copyright [1]
[1] Not required; if not present, a blank field is present.
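A reader for this record layout might look like the sketch below. It assumes the 17 fields appear in the order listed and that there is no trailing semicolon; the slides do not give the format of heights, coordinates, or pixel size, so those are kept as strings. The class name, field names, and the sample line are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

FIELD_COUNT = 17


@dataclass
class SpinMeta:
    file_name: str
    city: Optional[str]           # optional: blank when absent
    state: Optional[str]          # optional
    country: str
    rows: int
    cols: int
    shooting_height: str          # unit not stated on the slide
    sun_height: str               # unit not stated on the slide
    surveyed_at: datetime         # date (mm/dd/yyyy) + time (hr:mn:ss, GMT)
    ul_lat: str
    ul_lon: str
    lr_lat: str
    lr_lon: str
    camera_system: Optional[str]  # optional
    pixel_size: Optional[str]     # optional
    copyright: Optional[str]      # optional


def parse_record(line: str) -> SpinMeta:
    """Parse one semicolon-delimited metadata line into a SpinMeta."""
    parts = [p.strip() for p in line.rstrip("\r\n").split(";")]
    if len(parts) != FIELD_COUNT:
        raise ValueError(f"expected {FIELD_COUNT} fields, got {len(parts)}")

    def opt(s: str) -> Optional[str]:
        return s or None  # blank field means "not present"

    when = datetime.strptime(f"{parts[8]} {parts[9]}", "%m/%d/%Y %H:%M:%S")
    return SpinMeta(
        parts[0], opt(parts[1]), opt(parts[2]), parts[3],
        int(parts[4]), int(parts[5]), parts[6], parts[7], when,
        parts[10], parts[11], parts[12], parts[13],
        opt(parts[14]), opt(parts[15]), opt(parts[16]),
    )


if __name__ == "__main__":
    # Made-up sample line; the last three optional fields are blank.
    sample = ("img001.jpg;Seattle;WA;USA;1000;2000;200;45;"
              "07/04/1997;18:30:00;47.7;-122.4;47.5;-122.2;;;")
    print(parse_record(sample))
```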
Logical Schema
Gazetteer (star schema): Country, State, Place, PlaceType, FeatureType
Index on
• image, place, type
• image, state, type
• image, state, country, type
• image, place, state, type
• image, place, country, type
All lookups are fast.
Image Data & Meta Data: Theme Meta Information, ImgMeta, TileMeta, TileLog, Jump Img, BrowseImg, Thumb Img, TileImg
Lookup by UGrid or ZGrid ID plus resolution
Lookups are fast.
Indices are in DRAM (auto-magically by SQL)
SQL manages all the tiles and indices
Images are brought in on demand
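The "lookup by grid ID plus resolution" pattern can be tried out with Python's built-in `sqlite3` module standing in for SQL Server. This is a minimal sketch, not the Terra-Server schema: the column names are borrowed from the table listings in this deck, `ImgTypeId` is assumed here to play the role of the resolution level, and the index name and helper functions are invented.

```python
import sqlite3
from typing import Optional


def make_db() -> sqlite3.Connection:
    """Create an in-memory tile table with a composite lookup index."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE TileImg (
               TileImgId INTEGER PRIMARY KEY,
               ZLatLong  INTEGER NOT NULL,   -- Z-order grid ID
               ImgTypeId INTEGER NOT NULL,   -- assumed to encode resolution
               ImgData   BLOB    NOT NULL    -- JPEG bytes
           )"""
    )
    # One small index entry per tile; the image blobs stay out of the index.
    db.execute("CREATE INDEX TileByZ ON TileImg (ZLatLong, ImgTypeId)")
    return db


def fetch_tile(db: sqlite3.Connection, z: int, img_type: int) -> Optional[bytes]:
    """Return the image bytes for one grid ID and resolution, or None."""
    row = db.execute(
        "SELECT ImgData FROM TileImg WHERE ZLatLong = ? AND ImgTypeId = ?",
        (z, img_type),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    db = make_db()
    db.execute(
        "INSERT INTO TileImg (ZLatLong, ImgTypeId, ImgData) VALUES (?, ?, ?)",
        (14, 1, b"\xff\xd8jpeg"),
    )
    print(fetch_tile(db, 14, 1))
    print(fetch_tile(db, 15, 1))
```

The index holds only the two integer keys per tile, which is why it can plausibly fit in memory while the much larger image data is read from disk only when a tile is requested.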
Meta & Image Load Process
Input: *.IMD & *.JPG (NT Backup)
Pre-Process Data
– Read *.IMD files
– Generate Ids
– Generate ZLatLong
– Sort by ZLatLong
– Output: Image Meta, Tile Meta
Load Thumb Img
– Read Image Meta
– Read Image Data
– BCP into ImgTbl
Load Browse Img
– Read Image Meta
– Read Image Data
– BCP into ImgTbl
Load Tile Img
– Read Tile Meta
– Read Tile Data
– BCP into TileTbl
Load Tile Meta
– Read Image Meta
– BCP into TileMeta
Load Img Meta
– Read Image Meta
– BCP into TileMeta

“SRC”ThumbImg
ThumbImgId    int
ImgMetaId     int
ZLatLong      int
SrcId         int
ImgTypeId     int
PixWidth      int
PixHeight     int
ImgData       Blob

“SRC”BrowseImg
BrowseImgId   int
ImgMetaId     int
ZLatLong      int
SrcId         int
ImgTypeId     int
PixWidth      int
PixHeight     int
ImgData       Blob

“SRC”TileImg
TileImgId     int
TileMetaId    int
ZLatLong      int
SrcId         int
ImgTypeId     int
PixWidth      int
PixHeight     int
ImgData       Blob

TileMeta
TileMetaId    int
ImgMetaId     int
OrigMetaId    int
SrcId         int
ImgTypeId     int
XGridId       int
YGridId       int
Hemisphere    smallint
Continent     smallint
xxLat         smallint
xxLong        smallint
ZLatLong      int

ImgMeta
ImgMetaId     int
OrigMetaId    int
SrcId         int
ImgTypeId     int
XGridId       int
YGridId       int
ImgDate       Date
Hemisphere    smallint
Continent     smallint
xxLat         smallint
xxLong        smallint
ZLatLong      int
MetaStr       vchar(255)
Image Delivery and Load
[Diagram, software: DLT Tape, “tar”, \Drop’N’, NT DoJob, LoadMgr, ImgCutter, \Images, DB, Wait 4 Load, Backup]
[Diagram, hardware: two AlphaServer 4100s, ESA, 60 4.3 GB drives, 100mbit EtherSwitch, AlphaServer 8400, Enterprise Storage Array (3 x 108 9.1 GB drives), STC DLT Tape Library]
Job steps
10: ImgCutter
20: Partition
30: ThumbImg
40: BrowseImg
45: JumpImg
50: TileImg
55: Meta Data
60: Tile Meta
70: Img Meta
80: Update Place
...
Physical DB Design
324 disks ~ 3 TB of disk space
Configured as RAID5 => ~2.4 TB
Configured as 20 NT volumes
– Each volume ~ 120 GB
– Big files!
SQL data spread across all volumes.
– Combines the 20 files.
– One BIG table for the tiles
– Images stored as blobs (JPEG compressed)
2 GB RAM holds
– all indices and
– gazetteer.
DEC Alpha Server
[Diagram: T2B2 Alpha 8400, 4x400 MHz, 2 GB, PCI, U-SCSI; 9 HSZ50 controllers, each with 36 9GB disks; HSZ50 Controller(2) with 10x tapes]
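The storage figures quoted in the physical design can be cross-checked with simple arithmetic. The sketch below uses only numbers that appear in this deck (9 controllers of 36 disks, 9.1 GB drives, 20 volumes of about 120 GB); the function name and the returned keys are made up for the example.

```python
CONTROLLERS = 9            # HSZ50 controllers with disks attached
DISKS_PER_CONTROLLER = 36
DISK_GB = 9.1              # drive size quoted on the hardware slide
VOLUMES = 20               # NT volumes
VOLUME_GB = 120            # approximate size of each volume


def capacity_summary() -> dict:
    """Derive raw and usable capacity from the quoted figures."""
    disks = CONTROLLERS * DISKS_PER_CONTROLLER
    raw_gb = disks * DISK_GB
    usable_gb = VOLUMES * VOLUME_GB
    return {
        "disks": disks,
        "raw_tb": raw_gb / 1000,
        "usable_tb": usable_gb / 1000,
        "usable_fraction": usable_gb / raw_gb,
    }


if __name__ == "__main__":
    for key, value in capacity_summary().items():
        print(key, value)
```

The result is 324 disks, just under 3 TB raw, and 2.4 TB usable, so about 81% of raw capacity survives RAID5. That is in the range a layout with roughly six disks per RAID5 set would give (5/6 is about 83%), though the slides do not state the set size.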
Terra-Server Tables
USGS DOQ Data
– 48,000 DOQQ images (45-55 MB / image)
– Creates 864,000 Jump, Thumb, & Browse images (3.5 M rows)
– Creates 55.3 M Tile images (110.6 M rows)
SPIN-2 Data
– 3,200 images of approximately 278 MB each
– Creates 620,800 Jump, Thumb, & Browse images (2.5 M rows)
– Creates 15.5 M Tile images (31 M rows)
Gazetteer Data
– 1.1 M named places (Encarta World Atlas)
– 45 M cell names
Total Rows = 193.7 M
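Several of these counts can be reproduced from cut factors given elsewhere in the deck: the 3x6 quad cut, the 64 DOQ tiles, and the 5x5 Spin-2 tile cut. The sketch below is a consistency check only; treating those factors as the source of the counts is an inference from the matching arithmetic, and the function and key names are invented.

```python
DOQQ_IMAGES = 48_000
CELLS_PER_DOQQ = 3 * 6           # "Quad Cut 3x6"
TILES_PER_DOQ_CELL = 64          # "64 DOQ Tiles"
SPIN2_CUT_IMAGES = 620_800
TILES_PER_SPIN2_IMAGE = 5 * 5    # "Tiles are cut 5x5"

# Row counts in millions, as quoted on the slide.
ROWS_M = {
    "doq_jump_thumb_browse": 3.5,
    "doq_tiles": 110.6,
    "spin2_jump_thumb_browse": 2.5,
    "spin2_tiles": 31.0,
    "gazetteer_places": 1.1,
    "gazetteer_cell_names": 45.0,
}


def derived_counts() -> dict:
    """Recompute image counts and the grand total of rows."""
    doq_cut_images = DOQQ_IMAGES * CELLS_PER_DOQQ
    return {
        "doq_cut_images": doq_cut_images,
        "doq_tiles": doq_cut_images * TILES_PER_DOQ_CELL,
        "spin2_tiles": SPIN2_CUT_IMAGES * TILES_PER_SPIN2_IMAGE,
        "total_rows_m": sum(ROWS_M.values()),
    }


if __name__ == "__main__":
    for key, value in derived_counts().items():
        print(key, value)
```

The derived values (864,000 cut images, about 55.3 M DOQ tiles, about 15.5 M Spin-2 tiles, and a total near 193.7 M rows) agree with the slide's figures.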
Other Details
Active Server pages
– faster and easier than DB stored procedures.
Commerce Server is interesting
– Images are the inventory
  • no SKU,
  • millions of them
– USGS built their own
  • they are very smart, but it is easy
  • masquerade as a credit-card reader.
The earth is a geoid, and every geographer has a coordinate system (or two).
Tapes are still a nightmare.
Everyone is a UI expert.

Thank You!