A Personal View of Microsoft Research and its Impact Jim Gray

A Personal View of
Microsoft Research
and its Impact
Jim Gray
Microsoft Research
Gray@Microsoft.com
http://Research.Microsoft.com/~Gray/Talks
1
Outline
• What I am doing
• What my group is doing
• What Microsoft Research is doing
• How it affects Microsoft Products
and others
• Q&A
2
What I Am Doing
• Came to Microsoft to work on scalable computing:
build computers by the slice
– TerraServer
– Billion transactions per day
– Fault tolerant clusters
– Transactions in the OS (DTC, MTS, COM+,..)
– Cyberbricks
• Social service
– PITAC: expand government funding of IT research and
training.
– Library of Congress study
– Putting all technical literature online
– Turing lecture
3
PITAC Report
Presidential IT Advisory Committee
http://www.ccic.gov/ac/report/
• Findings:
– Software construction is a mess: needs breakthroughs.
– We do not know how to scale the Internet 100x
• Security, manageability, services, terabit per second issues.
– USG needs high-performance computing (Simulation)
but market is not providing vector-supers, just providing processor arrays.
– Trained people are in very short supply.
• Recommendations:
– Lewis & Clark expeditions to 21st century.
– Increase long-term research funding by 1.4B$/y.
– Re-invigorate university research & teaching.
– Facilitate immigration of technical experts.
4
Why Can’t Industry Fund IT Research?
• It does: IBM (5.8%), Intel (13%), Lucent (12%), Microsoft (14%), Sun (12%), ...
– R&D is ~5%-15% (50 B$ of 500 B$)
• AD is 10% of that (5 B$)
– Long-Range Research is 10% of that: 500 M$
2,500 researchers and university support
– Compaq: 4.8% R&D (1.3 B$ of 27.3 B$). AOL: 3.7% D, ?R (96 M$ of 2.6 B$)
– Dell: 1.6% R&D (204 M$ of 12.6 B$), EDS, MCI-WorldCom, ….
• To be competitive, some companies
cannot make large long-term research investments.
The Xerox/PARC story: created Mac, Adobe, 3Com…
5
Cyberspace is a New World
• We have discovered a “new continent”.
• It is changing how we learn, work, and play.
– 1 T$/y industry
– 1 T$ new wealth since 1993
– 30% of US economic growth since 1993
• There is a gold rush to stake out territory.
• But we also need explorers:
Lewis & Clark expeditions
Universities to teach the next generation(s)
[Image: THE LONG BOOM]
• Governments, industry, and philanthropists
should fund long-term research.
6
ACM 1998 Turing Lecture
http://research.microsoft.com/~gray/
• Organized around three seminal visionaries
– Babbage: Computers
– Bush: Automatic Information storage & access
– Turing: Intelligent Machines
• A dozen LONG RANGE research goals
– Vision, speech, …, Intelligent machines
– Understand information
– Automatic programming, dependable computing
– Most are AI complete.
7
How Much Information Is there?
• Soon everything can be recorded and indexed
• Most data will never be seen by humans
• Precious Resource: Human attention
Auto-Summarization and Auto-Search are key technologies.
[Chart: scale of stored information, from Kilo through Mega, Giga, Tera, Peta, Exa, and Zetta to Yotta, with markers for A Book, A Photo, A Movie, All LoC books (words), All Books MultiMedia, and Everything Recorded]
http://www.lesk.com/mlesk/ksg97/ksg.html
Small prefixes: 24 yocto, 21 zepto, 18 atto, 15 femto, 12 pico, 9 nano, 6 micro, 3 milli
8
Putting All Information Online
• Library of Congress study
– NRC group looking at IT future of LC.
– World-library is evolving
• CoRR
– Put all scientific literature online (and free)
– http://xxx.lanl.gov/archive/cs
• TerraServer
– An example of the new digital library
• Sloan Digital Sky Survey (and an object catalog)
http://www.sdss.org/
9
1.2 Billion Transactions Per Day
• 1 B tpd ran for 24 hrs.
• Out-of-the-box software
• Off-the-shelf hardware
• AMAZING!
• Sized for 30 days
• Linear growth
• 5 micro-dollars per transaction
10
How Much Is 1 Billion Tpd?
• 1 billion tpd = 11,574 tps
~ 700,000 tpm (transactions/minute)
• ATT
– 185 million calls per peak day (worldwide)
• Visa ~20 million tpd
– 400 million customers
– 250K ATMs worldwide
– 7 billion transactions (card+cheque) in 1994
• New York Stock Exchange
– 600,000 tpd
• Bank of America
– 20 million tpd checks cleared (more than any other bank)
– 1.4 million tpd ATM transactions
• Worldwide Airlines Reservations: 250 Mtpd
[Charts: Millions of Transactions Per Day (Mtpd) on linear and log scales for 1 Btpd, Visa, ATT, BofA, NYSE]
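The rate conversions on this slide can be checked with a few lines of arithmetic. The sketch below is only illustrative: the function names and the dictionary are made up for the example, and the workload figures are the ones quoted on the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60


def tpd_to_tps(tpd):
    """Convert transactions per day to average transactions per second."""
    return tpd / SECONDS_PER_DAY


def tpd_to_tpm(tpd):
    """Convert transactions per day to average transactions per minute."""
    return tpd / MINUTES_PER_DAY


# Daily volumes as quoted on the slide (hypothetical dictionary name).
WORKLOADS_TPD = {
    "1 Btpd benchmark": 1_000_000_000,
    "Airline reservations (worldwide)": 250_000_000,
    "ATT (peak day calls)": 185_000_000,
    "Visa": 20_000_000,
    "BofA (checks cleared)": 20_000_000,
    "NYSE": 600_000,
}

if __name__ == "__main__":
    for name, tpd in WORKLOADS_TPD.items():
        print(f"{name}: {tpd_to_tps(tpd):,.0f} tps, {tpd_to_tpm(tpd):,.0f} tpm")
```

These are daily averages; peak rates would be higher, which the slide does not quantify.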
11
NCSA Super Cluster
http://access.ncsa.uiuc.edu/CoverStories/SuperCluster/super.html
• National Center for Supercomputing Applications
University of Illinois @ Urbana
• 512 Pentium II cpus, 2,096 disks, SAN
• Compaq + HP +Myricom + WindowsNT
• A Super Computer for 3M$
• Classic Fortran/MPI programming
• COM+ programming model
12
Transactions in the OS
• Transaction Processing Monitors (TP)
• Web Servers
• Database Systems (stored procedures)
• Object Request Brokers (ORB)
• Remote Procedure Call (RPC)
All moving to core OS.
All converging on a common architecture
Programming model: requests invoke objects: COM+ or EJB
Quickly Dispatch many requests
Authorize on the fly
Load balance across many servers
Provide transactional services (mgmt, restart/recovery, log, …)
Protocol translation (HTTP<->RPC or RPC<->LU6 or..)
Web Servers Embrace&Extend TPmonitor techniques
[Chart: SPS, servlets per second (ASPs served per second by IIS, 1,000 statement VBscript), scale 0 to 450, for 1P, 2P, 4P, and 8P under NT4 InProc, W2K RC2 InProc, NT4 OOP, and W2K RC2 OOP. Note on chart: shift from 4x200 MHz to 8x450 MHz]
13
CyberBricks: Data Gravity
Processing Moves to Transducers
• Move Processing to data sources
• Move to where the power (and sheet metal) is
• Processor in
– Modem
– Display
– Microphones (speech recognition)
& cameras (vision)
– Storage: Data storage and analysis
14
It’s Already True of Printers
Peripheral = CyberBrick
• You buy a printer
• You get a
– several network interfaces
– A Postscript engine
• cpu,
• memory,
• software,
• a spooler (soon)
– and… a print engine.
15
Remember Your Roots
16
Year 2002 Disks
• Big disk (5 $/GB)
– 3”
– 150 GB
– 150 kaps (k accesses per second)
– 30 MBps sequential
• Small disk (50 $/GB)
– 1”
– 1 GB
– 100 kaps
– 10 MBps sequential
• Both running Windows NT™ 7.0?
(see below for why)
17
How Do They Talk to Each Other?
• Each node has an OS
• Each node has local resources: A federation.
• Each node does not completely trust the others.
• Nodes use RPC to talk to each other
– CORBA? DCOM? IIOP? RMI?
– One or all of the above.
• Huge leverage in high-level interfaces.
• Same old distributed system story.
[Diagram: on each of two nodes, Applications sit on RPC, streams, datagrams, or "?", which sit on VIAL/VIPL; the nodes are joined by Wire(s)]
18
Technology Drivers:
The Promise of SAN/VIA:10x in 2 years
http://www.ViArch.org/
• Today:
– wires are 10 MBps (100 Mbps Ethernet)
– ~20 MBps tcp/ip saturates 2 cpus
– round-trip latency is ~300 us
• In the lab
– Wires are 10x faster Myrinet, Gbps Ethernet, ServerNet,…
– Fast user-level communication
• tcp/ip ~ 100 MBps 10% of each processor
• round-trip latency is 15 us
19
SAN: System Area Networks
Standard Interconnect
Gbps Ethernet: 110 MBps
PCI: 70 MBps
UW Scsi: 40 MBps
FW scsi: 20 MBps
scsi: 5 MBps
• LAN faster than memory bus?
• 1 GBps links in lab.
• 100$ port cost soon
• Port is computer
20
Plug & Play Software
• RPC is standardizing: (DCOM, IIOP, HTTP)
– Gives huge TOOL LEVERAGE
– Solves the hard problems for you:
• naming,
• security,
• directory service,
• operations,...
• Commoditized programming environments
– FreeBSD, Linux, Solaris,…+ tools
– NetWare + tools
– WinCE, Win2K,…+ tools
– JavaOS + tools
• Apps gravitate to data.
• General purpose OS on controller runs apps.
21
Outline
• What I am doing
• What my group is doing
• What Microsoft Research is doing
• How it affects Microsoft Products
and others
• Q&A
22
Scale Up and Scale Out
Grow Up with SMP
4xP6 is now standard
SMP
Super Server
Grow Out with Cluster
Cluster has inexpensive parts
Departmental
Server
Personal
System
Cluster
of PCs
Microsoft TerraServer:
Scaleup to Big Databases
http://terraserver.Microsoft.com/
• Build a multi-TB SQL Server database
• Data must be
– 1 TB
– Unencumbered
– Interesting to everyone everywhere
– And not offensive to anyone anywhere
• Loaded
– 1.5 M place names from Encarta World Atlas
– 4 M Sq Km from USGS (1 meter resolution)
– 1 M Sq Km from Russian Space agency (2 m)
• On the web (world’s largest atlas)
• Sell images with commerce server.
24
Microsoft TerraServer Background
• Earth is 500 Tera-meters square
– USA is 10 tm2
• 100 TM2 land in 70ºN to 70ºS
• We have pictures of 6% of it
– 3 tsm from USGS
– 2 tsm from Russian Space Agency
• Someday
– multi-spectral image
– of everywhere
– once a day / hour
• Compress 5:1 (JPEG) to 1.5 TB.
• Slice into 10 KB chunks
• Store chunks in DB
• Navigate with
– Encarta™ Atlas
• globe
• gazetteer
– StreetsPlus™ in the USA
[Image pyramid: .2x.2 km2 tile, .4x.4 km2 image, .8x.8 km2 image, 1.6x1.6 km2 image]
25
USGS Digital Ortho Quads (DOQ)
• US Geological Survey
• 4 Tera Bytes
• Most data not yet published
• Based on a CRADA
– Microsoft TerraServer makes data available.
[Image: USGS “DOQ” coverage of the Continental US, 1x1 meter, 4 TB; New Data Coming]
26
Russian Space Agency (SovInformSputnik)
SPIN-2 (Aerial Images is Worldwide Distributor)
• 1.5 Meter Geo Rectified imagery of (almost) anywhere
• Almost equal-area projection
• De-classified satellite photos (from 200 KM),
• More data coming (1 m)
• Selling imagery on Internet.
• Putting 2 tm2 onto Microsoft TerraServer.
27
Demo
http://www.TerraServer.Microsoft.com/
Microsoft
BackOffice
SPIN-2
28
Hardware
[Diagram labels: Internet, DS3, 100 Mbps Ethernet Switch, Map Server, Site Servers, SPIN-2, Web Servers]
1TB Database Server: AlphaServer 8400 4x400, 10 GB RAM,
324 StorageWorks disks,
10 drive tape library (STC Timber Wolf DLT7000)
29
Software
[Diagram: a Web Client (browser, Java Viewer, HTML) reaches the TerraServer Web Site over the Internet. Components shown: Internet Information Server 4.0 with Active Server Pages and MTS, the Image Server, Terra-Server Stored Procedures, SQL Server 7 with the TerraServer DB, the Automap Server (Microsoft Automap ActiveX Server), and Image Provider Site(s) with Internet Information Server 4.0, Microsoft Site Server EE, and an Image Delivery Application on SQL Server 7]
30
System Management & Maintenance
• Backup and Recovery
– STK 9710 Tape robot
– Legato NetWorker™
– SQL Server 7 Backup & Restore
– Clocked at 80 MBps (peak) (~ 200 GB/hr)
• SQL Server Enterprise Mgr
– DBA Maintenance
– SQL Performance Monitor
31
After a Year:
• 1 TB of data, 750 M records
• 2.3 billion Hits
• 2.0 billion DB Queries
• 1.7 billion Images sent
• 368 million Page Views
• 99.93% DB Availability
• 3rd design now Online
• Built and operated by team of 4 people
• In late July 99 Operations missed 32 hr outage (!)
[Chart: TerraServer Daily Traffic, Jun 22, 1998 thru June 22, 1999; series: Sessions, Hit, Page View, DB Query, Image; Count axis 0 to 30M]
[Chart: Total Time (Hours), 0 to 8640, Up; Down Time (Hours:minutes), 0:00 to 6:00, by cause: Operations, Scheduled, HW+Software]
32
TerraServer What Next
• Integrated with Encarta Online
(a classic technology transfer story)
• Adding USGS Topographic maps (4 TB more)
• Potential European coverage (?)
• Adding multi-layer maps (with UC Berkeley)
• Thinking about Geo-Spatial extension to SQL Server
33
Automatic Testing
• 60% of Microsoft R&D is testing.
• What can research do to help?
– beyond joining the 500,000 Win2K beta testers
• Test generation robot:
– Make up SQL queries
– Send them to SQL Server, Oracle, DB2, Informix,…
– If answer is the same, great, if not there is a problem
• Also good for stress tests
• Found MANY bugs in our products (all fixed).
• Found MANY bugs in other’s products.
• Very valuable tool.
• MSR-TR-98-21 Massive Stochastic Testing of SQL, Slutz, Don
http://research.microsoft.com/scripts/pubDB/pubsasp.asp?RecordID=175
[Table: result counts by case for four systems W, X, Y, and Z. All four agree 84%; W, X, and Y agree 95%. Error noted: problem with intermediate table.]
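The idea of the test robot (make up queries, send them to several engines, flag any disagreement) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the tool described in MSR-TR-98-21: it compares only one real engine (SQLite, via Python's standard `sqlite3` module) against a pure-Python reference evaluator, and the table `t`, the query shape, and every function name are hypothetical.

```python
import random
import sqlite3

# Hypothetical test data: rows of (id, a).
ROWS = [(i, (i * 7) % 13) for i in range(100)]

# Comparison operators the generator may pick, with reference semantics.
OPS = {
    "<": lambda x, k: x < k,
    "=": lambda x, k: x == k,
    ">": lambda x, k: x > k,
}


def make_query(rng):
    """Make up one random query; return (sql_text, operator, constant)."""
    op = rng.choice(sorted(OPS))
    k = rng.randrange(13)
    return f"SELECT COUNT(*) FROM t WHERE a {op} {k}", op, k


def run_sqlite(conn, sql):
    """Engine under test: execute the SQL text in SQLite."""
    return conn.execute(sql).fetchone()[0]


def run_reference(op, k):
    """Second 'engine': evaluate the same predicate directly in Python."""
    return sum(1 for _, a in ROWS if OPS[op](a, k))


def differential_test(n_queries, seed=0):
    """Return the SQL text of every query on which the two engines disagree."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, a INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)
    disagreements = []
    for _ in range(n_queries):
        sql, op, k = make_query(rng)
        if run_sqlite(conn, sql) != run_reference(op, k):
            disagreements.append(sql)
    conn.close()
    return disagreements


if __name__ == "__main__":
    print(len(differential_test(200)), "disagreements")
```

With several real database engines the comparison becomes a vote: a query on which one engine differs from the rest points at a likely bug in that engine.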
34
Gordon Bell on
Tele Presentations
http://research.microsoft.com/barc/GBell/
35
Motivation:
Telepresentations
• Presenter and/or
audience telepresent
NOT: meeting or collaboration settings
Avoids the nasty social issues!
Mostly one-way
36
Telepresentation
Elements
• Slides
• Audio
• Video
• Script, text comments, hyperlinks, etc.
37
Telepresentations:
The Essentials
• Slide and audio a must
• Add some video
(low quality)
to make us feel good
• Storage and transmission costs
low
38
Telepresentations:
The Killer App
• Increased attendance & lower travel costs
• Practical and low-cost NOW
• e.g. ACM97 - 2,000 visitors in real space,
20,000 visitors on Internet
http://research.microsoft.com/acm97
39
Today’s
Experiment
• Would you like to pause, rewind, browse?
• Do you wish you could have seen this
– At home?
– At another time?
• How much does a present speaker add? How much
would you pay for real presence?
40
University Lectures Online
• Research lectures on-line & on-demand
http://murl.microsoft.com/
• Will get UVC content
• Available to anyone anywhere
– T1 good, 28.8 OK
• Generated by CMU, MIT, MSR, Stanford, UW, Xerox
• Hosted by MSR
41
Outline
• What I am doing
• What my group is doing
• What Microsoft Research is doing
• How it affects Microsoft Products
and others
• Q&A
42
Microsoft Research
• 450 Scientists
• University research model:
– Open publication
– Collaboration
– Many visitors
• Many research areas
• Major focus on break-throughs in
human-computer interfaces
• Mostly Redmond, Washington, USA
• Also Labs in
Beijing, China,
Cambridge UK,
San Francisco, CA, USA
43
Flows
Making “Flows” a Reality
• Computer Graphics
– Creating realistic looking environments, people
• Computer Vision
– Analyzing posture, gaze, gestures
• Speech input/output
• Natural Language
– Analysis, IR
• Implicit requests for information
45
Building life-like human characters
Generating life-like speech
from textual data
• Data-driven stochastic speech
– Natural sounding
– Rapid, automatic customizability
• Examples
– Synthetic voice w/ transplanted speech contours
47
Artificial singing
• AT&T Voder, 1962, by Homer Dudley
– Daisy (Inspiration for HAL’s voice in 2001)
• Microsoft Research Whistler, 1997
– Scarborough Fair
48
Understanding language:
MindNet
• Ten year investment
• A huge language knowledge base
• Automatically created from
dictionaries
• Words (nodes) linked by
relationships
• Millions of links
• Recently added (Encarta)
encyclopedia knowledge
working on web knowledge
49
Changing balance between user
& software systems
• Yesterday:
– Applications were single programs running in isolation
– Users used to (more or less) understand systems that they
used
• Today:
– Componentized applications operate in concert
– Sophisticated users understand only small percentage of
systems they use
50
Examples of user agents &
implicit actions
• Lumiere (Office 97)
– Monitoring user and program events to provide user
help and assistance
• Implicit queries
– Inferring information needs from browsing
• Lookout SpamKiller
– Monitoring mail activity to auto-categorize it
51
Tomorrow’s Systems and
Applications
• Users will not be able to predict
– where computations will be performed,
– when they will be performed or
– by what software components
• Gap between system capabilities and user
understanding will grow to the point that the only
way user will be able to use system is through
assisting agents
52
Millennium
• Long-term research project to eliminate distinction
between distributed and local computing.
• Raise the Level of Abstraction
[Diagram: several Apps form one Application on top of Millennium, which spans COM+ and NT on each of three machines]
• Maintain single system image.
• Transparent invocation,
migration, and recovery.
• Individual computers, file
systems, and networks become
unimportant to application
developers.
• Systems auto-configure, auto-monitor, auto-tune
53
Outline
• What I am doing
• What my group is doing
• What Microsoft Research is doing
• How Research affects Products
• Q&A
54
Analyzing language
• Language recognition shipped in Word 97
• General purpose text-critiquing, summarization,
Japanese word-breaking
55
Inside The Office
Grammar Checker
56
Microsoft ClearType
• 200% - 300% increase in resolution
• S/W solution that works on existing color LCD displays
57
SQL 7 Tuning Wizard
• Automate physical database design
• Analyzes actual server usage history
• Makes recommendations to improve performance
58
SQL 7 Index Wizard is Good
but will get better
• On a complex query set wizard is 90% of best expert.
• Extending to other aspects of DB design
59
Data Mining
• Find interesting structure (patterns, relationships) in data
– Prediction
– Segmentation (clustering)
– Dependency modeling (find distribution)
– Summarization
– Trend and change detection and modeling
• Allow user to state the query in terms of the business logic
– User does not speak statistics or SQL
• Use data to build predictors
– regression, classification, segmentation etc.
• Generate summaries and reports for insight
– find “easy to describe” segments in data automatically
– find segments not known to analyst
60
Example Embedded Feature:
Microsoft SiteServer Commerce 3.0
• Intelligent Cross-sell
• Based on:
– Historical sales baskets in stores
– Contents of current shopper
basket
– Browsing behavior of shopper
• Predict: ranking of products in store
likely to be most interesting to
shopper.
Http://www.holtoutlet.com/outlet4
61
[Chart: % Captured of true targets (0% to 100%) versus % mailed]
Mail to 25% and capture 40%
400% improved response!
Real data drawn from a Microsoft marketing example
62
How do people use www.microsoft.com?
70M hits per day
10M users/week
User
browsing
data
X segments
Data Mining
(Clustering)
Engine
Cluster
Visualizer
Wizard
63
64
Windows 2000
IntelliMirror™
• Extends CMU Coda File System ideas
• Files and settings mirrored on
client and server
• Great for disconnected users
• Facilitates roaming
• Easy to replace PCs
• Optimizes network performance
• Grew out of a research prototype
65
WinSock Direct Path
• Sockets is the “standard” interface for programs.
• Internet apps all use sockets.
• So, can we make sockets fast?
• In Windows2000.
– 70 us latency
– 86 MBps bandwidth
– USER LEVEL (application to application)!
– On workstation-class PCs
COM+ over SAN:
High-Performance Distributed Objects over a System Area Network
Li, Li ; Forin, Alessandro ; Hunt, Galen ; Wang, Yi-Min, December 1998 MSR-TR-98-68
http://research.microsoft.com/scripts/pubDB/pubsasp.asp?RecordID=214
[Chart: latency in microseconds (0 to 1600) versus data size in bytes (0 to 8192) for TCP, LPC, VIA-copy, and VIA-direct]
[Chart: bandwidth in MBps (0 to 100) versus data size in bytes (0 to 65536) for VIA-direct, VIA-copy, and TCP]
66
Outline
• What I am doing
• What my group is doing
• What Microsoft Research is doing
• How it affects Microsoft Products
and others
• Q&A
67
68
Scaleable Servers
Jim Gray
Microsoft Research
Gray@Microsoft.com
http://Research.Microsoft.com/~Gray/Talks
69
Exponential Growth
• Some enterprises growing well
15%/year
business as usual
• Some enterprises growing fast
50%/year
can ride Moore’s law
• Some enterprises are exploding
500%/year
huge demand for services
Example Hotmail: 300%/year
• New demand
– As customers go online
– As online allows global consolidation
70
Unpredictable Growth
• The TerraServer Story:
– We expected 5 M hits per day
– We got 50 M hits on day 1
– We peak at 15-20 M hpd on a “hot” day
– Average 5 M hpd after 1 year
• Most of us cannot predict demand
– Must be able to deal with NO demand
– Must be able to deal with HUGE demand
71
An Architecture for X?
• Need to be able to add capacity
– New processing
– New storage
– New networking
• Need continuous service
– Online change of all components (hardware and software)
– Multiple service sites
– Multiple network providers
• Need great development tools
– Change the application several times per year.
– Add new services several times per year.
72
Premise: Each Site is a Farm
• Buy computing by the slice:
– Rack of servers + disks.
• Grow by adding slices
– Spread data and computation to new slices
• Two styles:
– Clones: anonymous servers
– Mobs+Packs: Partitions fail over within a pack
• In both cases, remote site for disaster recovery
73
Clones: Availability+Scalability
• Some applications are
– read-mostly
– low consistency requirements
– Modest storage requirement (less than 1TB)
• Examples:
– HTML web servers (IP sprayer/sieve + replication)
– LDAP servers (replication via gossip)
• Replicate whole app at all nodes (clones)
• Spray requests across nodes.
• Grow by adding clones
• Fault tolerance: stop sending to that clone.
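The clone style above (replicate everywhere, spray requests, stop sending to a failed clone, grow by adding clones) can be sketched as a tiny round-robin router. This is an illustrative sketch only: the class `CloneSprayer` and its method names are hypothetical, and a real IP sprayer would also need health checks and connection handling.

```python
class CloneSprayer:
    """Round-robin request router over identical, anonymous clones."""

    def __init__(self, clones):
        self.clones = list(clones)   # every known clone, up or down
        self.up = set(self.clones)   # clones currently receiving traffic
        self._next = 0               # round-robin cursor

    def add_clone(self, clone):
        """Grow by adding a clone; it starts receiving requests at once."""
        self.clones.append(clone)
        self.up.add(clone)

    def mark_down(self, clone):
        """Fault tolerance: stop sending requests to this clone."""
        self.up.discard(clone)

    def mark_up(self, clone):
        """Resume sending requests to a repaired clone."""
        if clone in self.clones:
            self.up.add(clone)

    def route(self):
        """Pick the next live clone in round-robin order."""
        if not self.up:
            raise RuntimeError("no clone is up")
        for _ in range(len(self.clones)):
            clone = self.clones[self._next % len(self.clones)]
            self._next += 1
            if clone in self.up:
                return clone
        raise RuntimeError("no clone is up")


if __name__ == "__main__":
    sprayer = CloneSprayer(["web1", "web2", "web3"])
    sprayer.mark_down("web2")
    print([sprayer.route() for _ in range(4)])
```

Because every clone holds the whole application and its data, the router needs no knowledge of the request itself. That is what separates this style from partitioned servers.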
74
What Clones Need
• Automatic replication
– Applications (and system software)
– Data
• Automatic request routing
– Spray or sieve
• Management:
– Who is up?
– Update management & propagation
– Application monitoring.
75
Mobs for Scalability
• Clones do not always work.
• Some applications are
– stateful
– with fairly high update rate
• Examples
– Email
– Databases
• Partition state among servers (mob)
• Scalability:
– Partition split/merge
– Partitioning must be transparent to client.
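One way to picture a mob is a range-partitioned key space whose partition map is hidden behind a lookup call, so a split changes the map but not the client code. The sketch below assumes string keys and range partitioning; the class `Mob` and its methods are hypothetical names, and a real system would also have to move the data when a partition splits.

```python
import bisect


class Mob:
    """Range-partitioned state; clients call lookup() and never see the map."""

    def __init__(self, first_server):
        self.bounds = [""]             # sorted lower bound of each partition
        self.servers = [first_server]  # servers[i] owns keys >= bounds[i]

    def lookup(self, key):
        """Return the server that owns this key."""
        i = bisect.bisect_right(self.bounds, key) - 1
        return self.servers[i]

    def split(self, at_key, new_server):
        """Split the partition containing at_key; keys >= at_key move."""
        i = bisect.bisect_left(self.bounds, at_key)
        if i < len(self.bounds) and self.bounds[i] == at_key:
            raise ValueError("a partition already starts at this key")
        self.bounds.insert(i, at_key)
        self.servers.insert(i, new_server)


if __name__ == "__main__":
    mob = Mob("mail1")
    mob.split("m", "mail2")  # mailboxes from "m" onward move to mail2
    print(mob.lookup("alice"), mob.lookup("zoe"))
```

A merge would be the inverse operation: remove one bound and its server entry after copying the data back.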
76
Packs for Availability
• Each partition may fail (independent of others)
• Partitions migrate to new node via fail-over
– Failover in seconds
• Pack: the nodes supporting a partition
• Mobs typically grow in packs.
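The pack idea can be sketched as a small object that tracks which node currently serves a partition and moves it on failure. The names here (`Pack`, `owner`, `fail`, `repair`) are hypothetical, and the choice to fail back to the first live node in list order is an assumption of the sketch, not something the slide specifies.

```python
class Pack:
    """The nodes supporting one partition; the partition fails over among them."""

    def __init__(self, partition, nodes):
        self.partition = partition
        self.nodes = list(nodes)   # preference order for hosting the partition
        self.failed = set()

    def owner(self):
        """Return the node currently serving the partition."""
        for node in self.nodes:
            if node not in self.failed:
                return node
        raise RuntimeError(f"no live node for partition {self.partition!r}")

    def fail(self, node):
        """Record a node failure; the partition migrates to the next live node."""
        self.failed.add(node)

    def repair(self, node):
        """Record that a node is back in service."""
        self.failed.discard(node)


if __name__ == "__main__":
    pack = Pack("mailboxes a-l", ["node1", "node2"])
    pack.fail("node1")
    print(pack.owner())
```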
77
What Mobs+Packs Need
• Automatic partitioning (in dbms, mail, files,…)
– Location transparent
– Partition split/merge
• Simple failover model
– Partition migration is transparent
– MSCS-like model for services
• Application-centric request routing
• Management:
– Who is up?
– Automatic partition management (split/merge)
– Application monitoring.
78
AlwaysUP: site pairs
• Tape-based backup/restore too slow for online
• Keep online copy of data at second site
– Clone or transaction log
• Failover to second site in case of disaster
• Masks many
– Environmental faults
– Operations faults
– Some software faults
• Also eases many operations problems
79
MAS
• Manageability
• Availability
• Scalability
80
Manageability
• Manage growth & change of
– Applications
– Nodes/servers
– Data
– Sites
• Automate standard tasks
• Operations deals with exceptions
– A few per hour?
– If load grows 10x then a few per minute?
–  A few events per year is the safe zone!
81
System Management Must be Automatic
• Self operating
– propagate changes to all members of cluster
• Self tuning
– load balance, design,…
• Self repair
– Failover
– Software upgrades
– Call-home for replacement parts.
82
Availability
• We monitor most large web sites.
• They deliver worse than 99% availability.
• There is a lot of “hype” about 6-9’s (99.9999% = ½ minute/year)
• Those people are not counting scheduled downtime!
• MCI scheduled 36 hours of downtime a few weekends ago!
• Let’s talk end-to-end availability.
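The downtime budget behind each "number of nines" is simple arithmetic. The sketch below uses a 365-day year, and the function names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def downtime_seconds_per_year(nines):
    """Allowed downtime per year for an availability of `nines` nines."""
    unavailability = 10 ** (-nines)
    return SECONDS_PER_YEAR * unavailability


def availability_percent(nines):
    """Availability as a percentage, e.g. 2 nines -> 99.0."""
    return 100.0 * (1 - 10 ** (-nines))


if __name__ == "__main__":
    for n in (2, 3, 4, 5, 6):
        secs = downtime_seconds_per_year(n)
        print(f"{n} nines ({availability_percent(n):.4f}%): "
              f"{secs / 3600:.2f} hours = {secs:.1f} seconds per year")
```

Two nines (99%) allows about 87.6 hours of downtime per year, while six nines allows about half a minute, so a single 36-hour scheduled outage uses up most of a 99% budget on its own.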
83
SMP+NUMA
84
The Largest TPC-C Benchmark
Sun+Oracle
115,395.73 tpmC @ 105.63 $/tpmC == 12.2 M$
available 8/22/99
27 Sun StorEdge A3500 with 1778 disks (15.6 TB)
Sun E10000 + Solaris+ Oracle8i
64x400 Mhz cpus + 64 GB RAM
(8,430k$ hardware + 1,890 k$ software)
Tuxedo® on 32 Sun Ultra 10 333Mhz workstations, Wyse terminals
429k$ hardware, 168k$ software
85
Commodity TPC-C Benchmark
Compaq + Microsoft
40,266.4 tpmC @ 18.70 $/tpmC == 0.8 M$
available 12/31/99.
Compaq SmartArray RAID5 477 disks, 5TB, 418k$
Compaq Proliant 8000, Microsoft NT4, SQL 7
8x550MHz Intel Xeon, 4 GB RAM
101k$ hardware, 49k$ software
IIS web server on 5 Compaq Proliant 2x500 MHz workstations,
35k$ hardware, 4k$ software
86
Unix Per Unit Costs Much Higher
• Disk $/MB: 3.3x more
• CPU/trans: 11x more
• Software/tran: 27x more
• Network/tran: ~ same
• Compaq:
– 3 vs 30 cabinets
– 5 vs 33 nodes
– 18 vs 96 cpus
– Pure COM+
[Chart: $/tpmC by component (cpu+mem, disks, Network, software), Sun+Oracle vs Compaq+Microsoft, scale $0 to $50]

Oracle 8i on Sun Starfire 64x: 115396 tpmC @ 105$/tpmC
Component     k$       %      $ per tpmC
cpu+memory    3871     31%    $34
disks         5171     41%    $45
Network       438      3%     $4
software      3058     24%    $27
SUM           12538    100%   $109
TPMC          115396
Ram $/MB: $7.12    Disk $/MB: $0.33

Compaq: 40,013 tpmC @ 18.86$/tpmC
Component     k$       %      $ per tpmC
cpu+memory    127      17%    $3
disks         439      58%    $11
Network       141      19%    $4
software      49       6%     $1
SUM           756      100%   $19
TPMC          40013
Ram $/MB: $3.87    Disk $/MB: $0.09
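The per-tpmC column can be recomputed from the component costs. The sketch below assumes the cost figures are in thousands of dollars, which matches the system totals quoted on the benchmark slides; the dictionary layout and function names are made up for the example.

```python
# Component costs in k$ (assumed unit) and throughput, as listed on the slide.
SUN = {
    "tpmC": 115_396,
    "cost_k": {"cpu+memory": 3871, "disks": 5171,
               "network": 438, "software": 3058},
}
COMPAQ = {
    "tpmC": 40_013,
    "cost_k": {"cpu+memory": 127, "disks": 439,
               "network": 141, "software": 49},
}


def dollars_per_tpmc(system):
    """Dollars per tpmC for each component of one benchmarked system."""
    return {part: 1000 * k / system["tpmC"]
            for part, k in system["cost_k"].items()}


def cost_ratios(a, b):
    """How many times more system a pays per tpmC than b, by component."""
    pa, pb = dollars_per_tpmc(a), dollars_per_tpmc(b)
    return {part: pa[part] / pb[part] for part in pa}


if __name__ == "__main__":
    for part, ratio in cost_ratios(SUN, COMPAQ).items():
        print(f"{part}: {ratio:.1f}x")
```

The slide's multipliers appear to come from the rounded per-tpmC values, so ratios computed from the unrounded figures differ somewhat from the quoted 11x and 27x.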
87
What Conclusions?
• Sun+Oracle has impressive performance but…
• 8x more cpus give 2.6x more throughput
• The first 40 ktpmC cost 0.8 M$
• The next 80 ktpmC cost 11.4 M$
• Why not buy 3 “small” systems and spread the load?
– Save 9.8 M$
– Get fault tolerance (failover to other server)
– Would require partitioning the database and app.
• Scaleout with commodity components.
• Clone whole system at remote site for disaster protection
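The cost comparison behind these conclusions follows from the published throughput and price/performance figures on the two benchmark slides. A minimal sketch of that arithmetic (function and variable names are illustrative):

```python
def system_cost_musd(tpmc, usd_per_tpmc):
    """Total system price in millions of dollars."""
    return tpmc * usd_per_tpmc / 1e6


# Figures from the two TPC-C benchmark slides.
big = system_cost_musd(115_395.73, 105.63)   # Sun + Oracle
small = system_cost_musd(40_266.4, 18.70)    # Compaq + Microsoft

# Work with the one-decimal values the slides quote.
big_rounded = round(big, 1)
small_rounded = round(small, 1)
premium_for_big = round(big_rounded - small_rounded, 1)
saving_with_three_small = round(big_rounded - 3 * small_rounded, 1)

if __name__ == "__main__":
    print(f"big system:   {big:.2f} M$")
    print(f"small system: {small:.2f} M$")
    print(f"premium for the big system:   {premium_for_big} M$")
    print(f"saving with 3 small systems: {saving_with_three_small} M$")
```

The saving assumes the load really can be spread across three machines, which is the partitioning work the slide calls out.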
88
How much is 100 kTpmC?
• 100,000 users
• Each submitting 2.3 transactions/minute
• About 300 M transactions/day
• A huge number!
• About 1/3 of Yahoo! load
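The "about 300 M transactions/day" figure is the user count times the per-user rate times the minutes in a day. A short check, with an illustrative function name:

```python
MINUTES_PER_DAY = 24 * 60


def transactions_per_day(users, tx_per_user_per_minute):
    """Daily transaction volume if every user submits at a steady rate."""
    return users * tx_per_user_per_minute * MINUTES_PER_DAY


if __name__ == "__main__":
    tpd = transactions_per_day(100_000, 2.3)
    print(f"{tpd / 1e6:.0f} M transactions/day")
```

This assumes all 100,000 users stay active around the clock, so it is an upper bound for a workload with daily peaks and troughs.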
89
[Charts: IIS Performance, Pages/Sec serving 1,000 statement ASPs with the new COM+, for UP, 2P, 4P, and 8P configurations, In-Proc and OOP, on Win NT4 and Win2000 builds]
90
NCSA
• Super-computing performance
at mail-order prices.
91
Cornell
92
Beowulf
93
ASCII
94
www.Microsoft.com
a typical web cluster server
[Diagram: The Microsoft.Com Site. Labeled areas: Building 11 (Internal WWW, Staging Servers, Log Processing, Live SQL Servers, SQL Consolidators, SQL Reporting, DMZ Staging Servers, IDC Staging Servers, SQLNet, Feeder LAN, Admin LAN), MOSWest, Japan Data Center, and European Data Center. Server groups shown, with counts per group on the slide: www.microsoft.com, home.microsoft.com, premium.microsoft.com, register.microsoft.com, support.microsoft.com, search.microsoft.com, activex.microsoft.com, cdm.microsoft.com, FTP.microsoft.com, msid.msn.com, register.msn.com, FTP and HTTP Download Servers, Download Replication, and SQL SERVERS. Each group is annotated with an average configuration (4xP6 or 4xP5; RAM listed as 256, 512, or 1 GB; 12 to 180 GB HD), an average cost ($24K to $128K), and an FY98 forecast count. Networking: FDDI Rings (MIS1 to MIS4), Primary and Secondary Gigaswitch, Switched Ethernet (100 Mb/Sec Each), Routers, 13 DS3 (45 Mb/Sec Each) and 2 OC3 links to the Internet. All servers in Building 11 are accessible from corpnet.]
95
TerraServer as an Example
96
TerraServer Manageability
97
TerraServer Availability
98
TerraServer Scalability
99