Windows NT Scalability Jim Gray Microsoft Research

Gray@Microsoft.com
http://www.research.Microsoft.com/~Gray/talks/
Outline
• Scalability: What & Why?
• Scale UP: NT SMP scalability
• Scale OUT: NT Cluster scalability
• Key Message:
– NT can do the most demanding apps today.
– Tomorrow will be even better.
What is Scalability?
[Figure: the device spectrum, from TV, Handheld, NetPC, Win Term, Portable, Workstation, and PC up to Server, Server Cluster, and Super Server]
• Grow without limits
– Capacity
– Throughput
• Do not add complexity
– design
– administer
– Operate
– Use
Scale UP & OUT: Focus Here
• Grow without limits (Super Server)
– SMP: 4, 8, 16, 32 CPUs
– 64-bit addressing
– Huge storage
• Cluster Requirements (Server Cluster)
– Auto manage
– High availability
– Transparency
– Programming tools & apps
Scalability is Important
• Automation benefits growing
– ROI of 1 month....
• Slice price going to zero
– Cyberbrick costs 5k$
• Design, Implement & Manage cost going down
– DCOM & Viper make it easy!
– NT Clusters are easy!
• Billions of clients imply millions of HUGE servers.
• Thin clients imply huge servers.
Q: Why Does Microsoft Care?
A: Billions of clients need millions of servers
[Chart: Servers Shipped per year, 1994 to 2001, for Windows NT Server, NetWare, and Unix; y-axis 0 to 2,700 (97-01 are MS estimates)]
Expect Microsoft to work hard on Scaleable Windows NT and Scaleable BackOffice.
Key technique: INTEGRATION.
Outline
• Scalability: What & Why?
• Scale UP: NT SMP scalability
• Scale OUT: NT Cluster scalability
• Key Message:
– NT can do the most demanding apps today.
– Tomorrow will be even better.
How Scaleable is NT??
The Single Node Story
• 64 bit file system in NT 1, 2, 3, 4, 5
• 8 node SMP in NT 4.E, 32 node OEM
• 64 bit addressing in NT 5
• 1 Terabyte SQL Databases (PetaByte capable)
• 10,000 users (TPC-C benchmark)
• 100 Million web hits per day (IIS)
• 50 GB Exchange mail store (next release designed for 16 TB)
• 50,000 POP3 users on Exchange (1.8 M messages/day)
• And, more coming…..
Windows NT Server Enterprise Edition
• Scalability
– 8x SMP support (32x in OEM kit)
– Larger process memory (3GB Intel)
– Unlimited Virtual Roots in IIS (web)
• Transactions
– DCOM transactions (Viper TP mon)
– Message Queuing (Falcon)
• Availability
– Clustering (WolfPack)
– Web, File, Print, DB … servers fail over.
What Happens in 10 Years?
1987: 256 tps
$14 million computer
A dozen people
Two rooms of machines
1997: 1,250 tps
$50k computer
One person
1 micro-dollar per transaction
(1,000x cheaper)
Ready for the next 10 years?
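The "1,000x cheaper" figure can be sanity-checked from the prices and throughputs on this slide. The sketch below compares capital cost per unit of throughput only. It does not reproduce the micro-dollar-per-transaction number, which depends on an amortization period the slide does not state. The function and variable names are illustrative.

```python
# Figures taken from the slide: system price and throughput for each year.
systems = {
    1987: {"price_usd": 14_000_000, "tps": 256},
    1997: {"price_usd": 50_000, "tps": 1_250},
}


def dollars_per_tps(price_usd, tps):
    """Capital cost of one transaction-per-second of capacity."""
    return price_usd / tps


cost_1987 = dollars_per_tps(**systems[1987])
cost_1997 = dollars_per_tps(**systems[1997])
improvement = cost_1987 / cost_1997

print(f"1987: ${cost_1987:,.0f} per tps")
print(f"1997: ${cost_1997:,.0f} per tps")
print(f"improvement: {improvement:,.0f}x")
```

On these inputs the ratio comes out somewhat above 1,000, consistent with the slide's rounded claim.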
NT vs UNIX SMPs
• NT traditionally ran on 1 to 4 cpus
– Scales near-linear on them
• UNIX boxes: 32-64 way SMPs
– They do 3x more tpmC
– They cost 10x more.
• 10 way NT machines are available
– They cost more
– They are faster
• My view (shared by many)
– Need clusters for availability
– Cluster commodity servers to make huge systems
– a la Tandem, Teradata, VMScluster, IBM Sysplex, IBM SP2
– Clusters reduce need for giant SMPs
[Chart: tpmC vs Time, Jan-95 to Jan-97, for Unix and NT; y-axis 0 to 35,000 tpmC]
Transaction Throughput TPC-C
• On comparable hardware: NT scales better!
• SQL Server & NT Improving 250% per year
• NT has best Price Performance (2x cheaper)
[Charts: tpmC on Intel CPUs and tpmC vs Intel CPUs; x-axis 0 to 10 CPUs, y-axis 0 to 14,000 tpmC; series: NT (all results), NT best, Unix best]
NT Scales Better Than Solaris
• Microsoft SQL, NT, Intel: scales to 6x
• Beats Sybase, Solaris, UltraSPARC up to 11-way
[Chart: tpmC vs cpus; x-axis 0 to 20 cpus, y-axis 0 to 20,000 tpmC]
New News: SUN is Waking Up
• Sybase on 4x Sun UltraSPARC
– 4x250MHz 57$/tpmC @ 11.6 ktpmC
– 6x300MHz 69$/tpmC @ 14.6 ktpmC
• Microsoft & Unisys
– 4x200MHz 43$/tpmC @ 10.7 ktpmC
– 6x200MHz 40$/tpmC @ 12.2 ktpmC
• SUN:
– 10% better performance
– 20% higher unit price
New News: HPUX is New Leader
• Sybase on HP 8x SMP scales to 40 ktpmC!
• Price/Performance is flat (no diseconomy)
[Chart: Sybase & HP tpmC vs CPUs; x-axis 0 to 20 cpus, y-axis 0 to 45,000 tpmC]
[Chart: HP + Sybase $/tpmC vs tpmC; x-axis 0 to 40,000 tpmC, y-axis $0 to $140]
Sun/Solaris More Competitive
• Competitive prices
• No premium except on CPUs
[Chart: TPC Price/tpmC broken down by processor, disk, software, net, and total/10; y-axis 0 to 50; series: Oracle on UltraSPARC (31 ktpmC), HP-Sybase (39 ktpmC), SUN & Sybase (11.6 ktpmC), Microsoft, HP (9.1 ktpmC)]
Only NT Has Economy of Scale
• NT is 2x less expensive: 40$/tpmC vs 110$/tpmC
• Only NT has economy of scale
• Unix has dis-economy of scale
[Chart: Transactions/k$ by vendor; x-axis 0 to 40,000 tpmC, y-axis 0.0 to 25.0 tpmC/k$; series: Microsoft/NT, Oracle/Unix, Sybase/Unix, Informix/Unix, DB2/Unix]
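The bullets on this slide quote price/performance in $/tpmC, while the chart axis uses the reciprocal unit, tpmC/k$. A minimal sketch of the conversion, using only the two $/tpmC figures from the slide (the function name is illustrative):

```python
def tpmc_per_kusd(dollars_per_tpmc):
    """Convert $/tpmC into tpmC per thousand dollars (the chart's y-axis unit)."""
    return 1000.0 / dollars_per_tpmc


nt = tpmc_per_kusd(40)     # NT figure quoted on the slide
unix = tpmc_per_kusd(110)  # Unix figure quoted on the slide

print(f"NT:   {nt:.1f} tpmC/k$")
print(f"Unix: {unix:.1f} tpmC/k$")
```

The NT value lands at the top of the chart's 0 to 25 range, which is where the Microsoft/NT series is labeled.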
TPC-D Decision Support Benchmark
• NT has good performance and price/performance.
[Chart: TPC D 100 GB results; x-axis Performance, 200 to 1,600; y-axis Price/Perf ($/QthD), 0 to 3,000; NT results labeled; annotations "More Throughput" and "Lower price"]
Scaleup To Big Databases?
• NT 4 and SQL Server 6.5
– DBs up to 1 Billion records,
– 100 GB
– Covers most (80%) data warehouses
• SQL Server 7.0
– Designed for Terabytes
• Hundreds of disks per server.
• SMP parallel search
– Data Mining and Multi-Media
• TerraServer is good MM example
[Figure: database size scale: Excel spreadsheet; Manhattan phone book (15MB); Human Genome (3GB); Dayton-Hudson Sales records (300GB); Satellite photos of Earth (1 TB)]
Database Scaleup: TerraServer™
• Demo NT and SQL Server scalability
• Stress test SQL Server 7.0
• Requirements
– 1 TB
– Unencumbered (put on www)
– Interesting to everyone everywhere
– And not offensive to anyone anywhere
• Loaded
– 1.1 M place names from Encarta World Atlas
– 1 M Sq Km from USGS (1 meter resolution)
– 2 M Sq Km from Russian Space agency (2 m)
• Will be on web (world’s largest atlas)
• Sell images with commerce server.
• USGS CRDA: 3 TB more coming.
TerraServer System
• DEC Alpha 4100 (4x smp) + 324 StorageWorks Drives (1.4 TB)
• RAID 5 Protected
• SQL Server 7.0
• USGS 1-meter data (30% of US)
• Russian Space data: two meter resolution SPIN-2 images (2 M km2, 2% of earth)
• Demo: http://msrlab/terraserver
Manageability
Windows NT 5.0 and Windows 98
• Active Directory tracks all objects in net
• Integration with IE 4.
–Web-centric user interface
• Management Console
–Component architecture
• Zero Admin Kit and Systems Management Server
• PlugNPlay, Instant On, Remote Boot,..
• Hydra and Intelli-Mirroring
Thin Client Support
TSO comes to NT
lower per-client costs
[Diagram: Windows NT Server with “Hydra” Server serving Net PCs, existing desktop PCs, MS-DOS, UNIX, and Mac clients, and dedicated Windows terminals]
Windows NT 5.0
IntelliMirror™
• Extends CMU Coda File System ideas
• Files and settings mirrored on client and server
• Great for disconnected users
• Facilitates roaming
• Easy to replace PCs
• Optimizes network performance
Best of PC and centralized computing advantages
Outline
• Scalability: What & Why?
• Scale UP: NT SMP scalability
• Scale OUT: NT Cluster scalability
• Key Message:
– NT can do the most demanding apps today.
– Tomorrow will be even better.
Scale OUT
Clusters Have Advantages
• Fault tolerance:
– Spare modules mask failures
• Modular growth without limits
– Grow by adding small modules
• Parallel data search
– Use multiple processors and disks
• Clients and servers made from the same stuff
– Inexpensive: built with commodity CyberBricks
How scaleable is NT??
The Cluster Story
• 16-node Tandem Cluster
– 64 cpus
– 2 TB of disk
– Decision support
• 45-node Compaq Cluster
– 140 cpus
– 14 GB DRAM
– 4 TB RAID disk
– OLTP (Debit Credit)
• 1 B tpd (14 k tps)
microsoft.com
• 90m hits/day
– 17m page views
– #4 site on Internet
• 900k visitors per day
• Not cheap
– Data Centers
– Bandwidth
– 27 people on content
– 22 people on systems
• Production
– Windows NT.4 and IIS.3
• 20 HTTP,
• 3 download,
• 3 FTP
• 5 SQL 6.5
• Index Server + 3 search
• Stagers
– Site Server for content
– DCOM Publishing wizard
• Network
– 6 DS3
– 4 TB/day download capacity
• Replicas in UK and Japan
Tandem 2 Ton
• 2 TB SQL database
• 1.2 TB user data
• 16 node cluster
• 64 cpus, 480 disks
• Decision support
parallel data-mining
• Will be Wolf Pack aware
• Demoed at DB Expo in
• ServerNet™ interconnect
Billion Transactions per Day Project
• Built a 45-node Windows NT Cluster (with help from Intel & Compaq), > 900 disks
• All off-the-shelf parts
• Using SQL Server & DTC distributed transactions
• DCOM & ODBC clients on 20 front-end nodes
• DebitCredit Transaction
• Each server node has 1/20th of the DB
• Each server node does 1/20th of the work
• 15% of the transactions are “distributed”
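The partitioning described on this slide (20 server nodes, each holding 1/20th of the database and doing 1/20th of the work, with 15% of transactions touching a second node) can be illustrated with a small simulation. This is only a sketch under stated assumptions. The modulo placement rule, the account count, and all names below are hypothetical, and the slides do not say how the project actually assigned data to nodes.

```python
import random
from collections import Counter

NUM_SERVER_NODES = 20        # from the slide: each node holds 1/20th of the DB
DISTRIBUTED_FRACTION = 0.15  # from the slide: 15% of transactions are distributed


def node_for_account(account_id, num_nodes=NUM_SERVER_NODES):
    """Hypothetical placement rule: account id modulo the node count."""
    return account_id % num_nodes


def simulate(num_transactions, num_accounts=1_000_000, seed=0):
    """Return (transactions received per node, fraction that were distributed)."""
    rng = random.Random(seed)
    accounts_per_node = num_accounts // NUM_SERVER_NODES
    work = Counter()
    distributed = 0
    for _ in range(num_transactions):
        home = rng.randrange(NUM_SERVER_NODES)  # node that receives the request
        if rng.random() < DISTRIBUTED_FRACTION:
            # Pick an account owned by some other node.
            target = (home + rng.randrange(1, NUM_SERVER_NODES)) % NUM_SERVER_NODES
        else:
            target = home
        account = target + NUM_SERVER_NODES * rng.randrange(accounts_per_node)
        work[home] += 1
        if node_for_account(account) != home:
            distributed += 1  # would need a distributed (DTC-coordinated) commit
    return work, distributed / num_transactions


work, fraction = simulate(100_000)
print(f"distributed fraction: {fraction:.3f}")
print(f"busiest node share:   {max(work.values()) / 100_000:.3f}")
```

With uniform request arrival, every node's share of the work stays close to 1/20, which is the property the slide relies on for linear growth.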
Billion Transactions Per Day Hardware
• 45 nodes (Compaq Proliant)
• Clustered with 100 Mbps Switched Ethernet
• 140 cpu, 13 GB, 3 TB (RAID 1, 5).
Type                                 nodes                     CPUs   DRAM        ctlrs  disks                       RAID space
Workflow MTS                         20 Compaq Proliant 2500   20x 2  20x 128 MB  20x 1  20x 1                       20x 2 GB
SQL Server                           20 Compaq Proliant 5000   20x 4  20x 512 MB  20x 4  20x (36x4.2GB + 7x9.1GB)    20x 130 GB
Distributed Transaction Coordinator  5 Compaq Proliant 5000    5x 4   5x 256 MB   5x 1   5x 3                        5x 8 GB
TOTAL                                45                        140    13 GB       105    895                         3 TB
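The TOTAL row of the hardware table can be cross-checked from the per-node figures. The sketch below sums each column. Reading the DRAM figures as megabytes per node is an assumption (the table gives bare numbers), chosen because it reproduces the stated total to within rounding. The tuple layout and names are illustrative.

```python
# (role, nodes, cpus, dram_mb, controllers, disks, raid_gb), all per node.
# DRAM in MB is an assumption; the other figures are read from the table.
node_groups = [
    ("Workflow MTS", 20, 2, 128, 1, 1, 2),
    ("SQL Server", 20, 4, 512, 4, 36 + 7, 130),
    ("Distributed Transaction Coordinator", 5, 4, 256, 1, 3, 8),
]

columns = ["cpus", "dram_mb", "controllers", "disks", "raid_gb"]
totals = {"nodes": sum(group[1] for group in node_groups)}
for index, name in enumerate(columns, start=2):
    # Multiply each per-node figure by the node count, then sum over groups.
    totals[name] = sum(group[1] * group[index] for group in node_groups)

for name, value in totals.items():
    print(f"{name:12s} {value:,}")
```

The node, CPU, controller, and disk totals match the table exactly. The DRAM and RAID totals match the rounded figures of 13 GB and 3 TB.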
Local Debit Credit
[Diagram: numbered call sequence (steps 1 to 14) among Driver Thread, DebitCredit Driver (Init, Run, Loop), DebitCredit Component (DebitCredit), and Database]
Distributed Debit Credit Same DTC
[Diagram: numbered call sequence (steps 11 to 29) among DebitCredit, UpdateAcct, DTC, Database1, and Database2]
Distributed Debit Credit Different DTC
[Diagram: numbered call sequence (steps 11 to 35) among DebitCredit, UpdateAcct, DTC1, DTC2, Database1, and Database2]
1.2 B tpd
• 1 B tpd ran for 24 hrs.
• Out-of-the-box software
• Off-the-shelf hardware
• AMAZING!
• Sized for 30 days
• Linear growth
• 5 micro-dollars per transaction
How Much Is 1 Billion Tpd?
• 1 billion tpd = 11,574 tps ~ 700,000 tpm (transactions/minute)
• ATT
– 185 million calls per peak day (worldwide)
• Visa ~20 million tpd
– 400 million customers
– 250K ATMs worldwide
– 7 billion transactions (card+cheque) in 1994
• New York Stock Exchange
– 600,000 tpd
• Bank of America
– 20 million tpd checks cleared (more than any other bank)
– 1.4 million tpd ATM transactions
• Worldwide Airlines Reservations: 250 Mtpd
[Chart: Millions of Transactions Per Day (Mtpd) for 1 Btpd, Visa, ATT, BofA, NYSE; plotted on a linear scale (0 to 1,000) and a log scale (0.1 to 1,000)]
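The unit conversions quoted for 1 billion transactions per day are straightforward to reproduce. The sketch below also expresses each real-world workload from the slide as a fraction of 1 billion tpd. The workload figures are taken from the slide, and the helper names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60


def tpd_to_tps(tpd):
    """Average transactions per second for a given daily volume."""
    return tpd / SECONDS_PER_DAY


def tpd_to_tpm(tpd):
    """Average transactions per minute for a given daily volume."""
    return tpd / MINUTES_PER_DAY


one_billion = 1_000_000_000
tps = tpd_to_tps(one_billion)
tpm = tpd_to_tpm(one_billion)
print(f"1 billion tpd = {tps:,.0f} tps = {tpm:,.0f} tpm")

# Daily volumes in millions of transactions per day, as listed on the slide.
workloads_mtpd = {
    "ATT calls (peak day)": 185,
    "Visa": 20,
    "Bank of America checks": 20,
    "Bank of America ATM": 1.4,
    "New York Stock Exchange": 0.6,
    "Worldwide airline reservations": 250,
}
for name, mtpd in workloads_mtpd.items():
    share = mtpd * 1_000_000 / one_billion
    print(f"{name:32s} {share:7.2%} of 1 B tpd")
```

These are daily averages. Peak rates for any of these workloads would be higher than the average tps implied here.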
1 B tpd: So What?
• Shows what is possible, easy to build
– Grows without limits
• Shows scaleup of DTC, MTS, SQL…
• Shows (again) that shared-nothing clusters scale
• Next task: make it easy.
– auto partition data
– auto partition application
– auto manage & operate
Cluster Server: High Availability
• Multiple servers form one system
• Industry standard APIs and hardware
• Server application and tools support
– IIS web server
– File and Print servers
– IP and NetName failover
– Transaction and Queue Server failover
– SQL Server, Enterprise edition
• Tight integration with Windows NT -- it's easy!
• Two-Node clusters now (2 to 20 cpus)
• 16 node soon (2 to 192 cpus).
WolfPack Cluster
IIS & SQL Failover Demo
[Diagram: a Browser connects to a two-node cluster, Alice and Betty; each node can host the Web site and the Database, backed by shared Web site files and Database files]
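The failover behavior in this demo, where the Web site and the Database move to the surviving node when one node fails, can be modeled with a toy ownership table. This is a conceptual sketch only. It does not use or imitate the actual Cluster Server API, and the class, its methods, and the initial placement of services on Alice and Betty are all assumptions made for illustration.

```python
class ToyCluster:
    """Tracks which node currently owns each resource group."""

    def __init__(self, nodes, placement):
        self.nodes_up = set(nodes)
        self.owner = dict(placement)  # resource group -> owning node

    def fail(self, node):
        """Mark a node as down and move its resource groups to a survivor."""
        self.nodes_up.discard(node)
        if not self.nodes_up:
            raise RuntimeError("no surviving node to fail over to")
        survivor = sorted(self.nodes_up)[0]
        for group, owner in list(self.owner.items()):
            if owner == node:
                self.owner[group] = survivor

    def restore(self, node):
        """Bring a node back; groups stay put until moved explicitly."""
        self.nodes_up.add(node)


# Assumed starting point: Alice serves the Web site, Betty serves the Database.
cluster = ToyCluster(
    nodes=["Alice", "Betty"],
    placement={"Web site": "Alice", "Database": "Betty"},
)
cluster.fail("Alice")
print(cluster.owner)
```

Because both nodes can reach the shared Web site files and Database files, the surviving node can take over a service without copying data, which is what makes the move quick in the demo.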
Summary
• SMP Scale UP: OK but limited
• Cluster Scale OUT: OK and unlimited
• Manageability:
– fault tolerance OK & easy!
– more needed
• CyberBricks work
• Manual Federation now
• Automatic in future
Scalability Research Problems
• Automatic everything
• Scaleable applications
– Parallel programming with clusters
– Harvesting cluster resources
• Data and process placement
– auto load balance
– dealing with scale (thousands of nodes)
• High-performance DCOM
– active messages meet ORBs?
• Process pairs, other FT concepts?
• Real time: instant failover
• Geographic (WAN) failover