How High is High Performance Transaction Processing? Jim Gray, Microsoft Research HPTS 99

How High is High Performance
Transaction Processing?
Jim Gray, Microsoft Research
HPTS 99
Asilomar, CA 1 Oct 1999
http://research.Microsoft.com/~Gray/Talks/
1
Outline
• Sizing the business: 0B$ or 1T$?
• TP is dead: long live HTTP-XML
– i.e. TP monitors morph to web servers.
• Transactions are C2C (b2b not enough)
• Scaleability terminology if there is time.
2
Where are we?
Where are we headed
(will we run out of transactions)?
What’s Ultimate tpd demand?
• We started out to do 1,000 transactions per second
• 1985 Datamation article
What’s Current tpd demand?
1Ktps = 80 Mtpd
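The tps-to-tpd conversion used throughout the talk can be sketched as follows. This is a minimal illustration that assumes the system runs flat out for a full 24-hour day; the function name is hypothetical. The exact figure is 86.4 Mtpd, which the slide rounds to 80 Mtpd.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def tps_to_tpd(tps: float) -> float:
    """Transactions per day for a system sustaining `tps` around the clock."""
    return tps * SECONDS_PER_DAY


one_ktps = tps_to_tpd(1_000)
print(f"1 Ktps = {one_ktps / 1e6:.1f} Mtpd")  # the slide rounds this to 80 Mtpd
```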
3
TpcA
• 1998:
– 208 DebitCredit @ 45 k$/tps (Tandem 32x T16)
• 1991 Sept
– 212 tpsA @ 16,331 $/tpsA (Tandem 64xCLX)
– 62 tpsA @ 11,945 $/tpsA (Rdb 6xVAX)
• 1993 Sept
– 1,002 tpsA @ 9,313 $/tpsA (Oracle 32x Sequent)
– 529 tpsA @ 6,341 $/tpsA (Rdb 4xAlpha)
• 1995 Peak
– 3,692 tpsA @ 4,873$/tpsA (Rdb 20x Alpha)
– 662 tpsA @ 4,401 $/tpsA (Rdb 4xAlpha)
• 1999 guess
– 4,000 tpsA @ 200 $/tpsA (= 320 M tpd) PER NODE!!
4
TPC C Progress
• Jan 1993:
– 23 tpmC @ 2.3k $/tpmC
– 269 tpmC @ 3.0k $/tpmC
• Nov 1995:
– 2,455 tpmC @ 242 $/tpmC (SS 4x P5 133)
– 11,456 tpmC @ 286 $/tpmC (Oracle 8xAlpha 350)
• Sept 1997
– 12,026 tpmC @ 40 $/tpmC (SS 6x P6 200)
– 39,469 tpmC @ 95 $/tpmC (Sybase 16xHPPA 200)
• Sept 1999
– 40,368 tpmC @ 19 $/tpmC (SS 8xIntel 550)
– 135,461 tpmC @ 97 $/tpmC (Oracle 4x24 Sparc 400)
5
Prediction
• 1 MtpmC @ 10$/tpmC in 3 years (or tpcC disappears)
– 25 M$ of stuff today
– 8 M$ of stuff in 3 years
• That’s ~ 3 Btpd at much less than 0.01$/tpd (a PENNY PER tpd).
6
How Many tpds Are There?
• 10 second think time
• 12 hour days
• 5,000 tpd/person
• 6 billion people
• 30 Ttpd
– actual guess is 100x less than that
– People think slower, work/play less
– Not everyone is wired
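The back-of-envelope demand estimate on this slide can be reproduced with a few lines of arithmetic. This is a sketch using only the slide's inputs; the variable names are illustrative. Note that 12 hours at one transaction per 10 seconds gives 4,320 tpd per person, which the slide rounds up to 5,000.

```python
THINK_TIME_SECONDS = 10      # one transaction every 10 seconds
ACTIVE_HOURS_PER_DAY = 12    # a 12 hour day
PEOPLE = 6e9                 # 6 billion people
ROUNDED_TPD_PER_PERSON = 5_000  # the slide's rounded per-person figure

# Exact per-person rate before rounding.
tpd_per_person = ACTIVE_HOURS_PER_DAY * 3600 / THINK_TIME_SECONDS

# Upper bound on human-generated demand, using the rounded figure.
ceiling_tpd = ROUNDED_TPD_PER_PERSON * PEOPLE

# The slide's "actual guess" is 100x lower.
guess_tpd = ceiling_tpd / 100

print(f"per person: {tpd_per_person:,.0f} tpd")
print(f"ceiling:    {ceiling_tpd / 1e12:.0f} Ttpd")
print(f"guess:      {guess_tpd / 1e9:.0f} Gtpd")
```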
7
Where Are We?
• Market is not saturated
1998 IBM annual report:
20 Btpd on IBM systems (0.1% of demand).
• It’s a big market
40k tpmC @20$/tpmC
~ 100 Mtpd @ 1M$ (a penny per tpd)
So: 30 Ttpd ~ 300B$ industry for hw/sw
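The market-size arithmetic can be checked the same way. This sketch takes the slide's rounded figures as given (about 100 Mtpd from a roughly 1 M$ system); the variable names are illustrative, and the rounding from 40k tpmC x 20 $/tpmC = 0.8 M$ up to 1 M$ is the slide's, not a computed value.

```python
TPD_PER_SYSTEM = 100e6      # ~100 Mtpd from a 40k tpmC system (slide's figure)
SYSTEM_COST_USD = 1e6       # ~1 M$ (40k tpmC x 20 $/tpmC = 0.8 M$, rounded)
HUMAN_DEMAND_TPD = 30e12    # 30 Ttpd ceiling from the previous slide
IBM_TPD = 20e9              # 20 Btpd on IBM systems (1998 annual report)

cost_per_tpd = SYSTEM_COST_USD / TPD_PER_SYSTEM   # about a penny per tpd
market_usd = HUMAN_DEMAND_TPD * cost_per_tpd      # about 300 B$
ibm_share = IBM_TPD / HUMAN_DEMAND_TPD            # about 0.07%, "0.1%" on the slide

print(f"cost per tpd: ${cost_per_tpd:.2f}")
print(f"market size:  {market_usd / 1e9:.0f} B$")
print(f"IBM share:    {ibm_share:.2%}")
```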
8
Wow!! 300 B$ Business, GREAT!!
• But….
– What about my 100x over-estimate?
• A 3B$ business?
– What about Moore’s law: 2x decline/year?
• A 0B$ business?
• Time to find a second career?
• Go into services/consulting/operations?
• Count on shadow transactions?
– every human transaction = 100 shadow transactions (B2B)
9
Conclusion: Sizing traditional TP
• It’s a big business now and for a while
• But people will be limited to 30 Ttpd
– Ultimately a 0B$ industry
• A penny per tpd today
(a microdollar per transaction)
• Web nearing commercial TP rates
10
Outline
• Sizing the business: 0B$ or 1T$? (30 Ttpd)
• TP is dead: long live HTTP-XML
– i.e. TP monitors morph to web servers.
• Transactions are C2C (b2b not enough)
• Scaleability terminology if there is time.
11
A Brief History of Computing
• In the beginning there was batch.
– Automated back office
• Then came Timesharing/OLTP
– Automated front office
• Then came the web
– Automated the customer
We are here
• Then came ?
– Automated the process
– Computers talk to computers
12
Some busy web servers
• AOL: ~ 3 B hits per day (~3B tpd)
• Yahoo: ~ 1 B hits per day
• Top 10: ~ 6 B hits per day
13
So, What happened to the TP monitor? ???
• AOL: NaviServer/IRIX
• Yahoo: Apache/FreeBSD, IIS/NT
• MSN: IIS/NT
• Hotmail: Apache/FreeBSD
• IBM: Domino/AIX
• Compaq: IIS/NT
• Dell: IIS/NT
• ATT: Netscape/Solaris
• Lucent: Netscape/IRIX
• Cisco: Netscape/Solaris
• Oracle: Oracle/Solaris
• NASDAQ: IIS/NT
• NYSE: Netscape/AIX
• FedEx: Netscape/Solaris
• LL Bean: Netscape/AIX
• Schwab: Netscape/Solaris (!!!)
• Etrade: Netscape/Solaris
• Ebay: IIS/NT
• Amazon: Netscape/DEC Unix
– CICS/NT, CICS/AIX, CICS/OS390, Tuxedo? !!!!
– Yes, there are HTTP-SNA front ends, but… why??
14
Body Count
• 7 M servers (IP addresses)
– Minus squatters
– Plus servers behind the firewall
– Plus Intranets
15
Courtesy of Netcraft http://www.netcraft.com/survey/
How many TP monitors are there?
• Guess: 100,000 nodes
– (CICS, CICS, CICS, Tuxedo, IMS, Encina, ACMS, Pathway, …)
• IBM estimate of 20 B tpd on IBM gear is impressive
– equals current Internet traffic (ignoring intranets).
– At mainframe prices (200$/tpmC) ~ 0.25$/tpdC ~ 4 B$ seems very conservative
– Installed base ~ 150B$, so suggests ~ 1 T tpd
• Or transactions MUCH bigger than tpmC
• Or low utilization (peak:average is 10:1 ?)
• Or, systems not doing TP.
• Or, more expensive
• Probably, most of these statements are true
16
Claim: Climbing the value chain
TP = ORB = HTTP = TPlite = RPC = IPspray
• TP monitors multiplex clients to servers & manage servers
• ORBs multiplex invocations to methods and…
• HTTP servers multiplex GET/POST to pages and…
• TPlite multiplexes clients to stored procs and…
• RPC multiplexes callers to callees and…
– They are stealing our tp tricks.
• The revenge of TPlite:
– It’s 3 tier, but your/my stuff
is not in the middle tier.
17
Web servers are behind but…
• They are learning about manageability.
• They are learning about functionality
• Take a page from the OO vs OR war:
– Easier to add Objects to a DB
– Than add DB to Objects.
• Guess:
– Easier to add HTTP to TP
– Than to add TP to HTTP
18
Web Servers are Learning TP Tricks
• Multiplexing server pools:
– Fast CGI
– COM+ connection pooling
• Client context
– Cookies
– Sessions
• Load balancing (various)
• Security (SSL, certificates, CR,…)
• High availability
– Failover, IP mobility, Redirection,…
• Queuing
[Chart: 1 year’s progress, 1-ball vs 2-ball at 1P, 2P, 4P, 8P]
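The first trick on this slide, multiplexing many clients onto a small pool of expensive server connections, can be sketched as below. This is a minimal illustration, not how Fast CGI or COM+ implement pooling; FakeConnection and ConnectionPool are hypothetical names, and the "connection" is a stand-in object rather than a real database or server link.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for an expensive back-end connection."""
    opened = 0  # counts how many connections were ever created

    def __init__(self):
        FakeConnection.opened += 1
        self.id = FakeConnection.opened

    def execute(self, request):
        return f"conn{self.id} handled {request}"


class ConnectionPool:
    """Creates `size` connections once and lends them out to callers."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(FakeConnection())

    @contextmanager
    def connection(self):
        conn = self._idle.get()      # blocks while every connection is busy
        try:
            yield conn
        finally:
            self._idle.put(conn)     # always hand the connection back


# Six client requests share two connections.
pool = ConnectionPool(size=2)
results = []
for client in range(6):
    with pool.connection() as conn:
        results.append(conn.execute(f"request-{client}"))

print(FakeConnection.opened, "connections served", len(results), "requests")
```

The point of the design is that connection setup cost is paid twice rather than six times; a real server would call `pool.connection()` from many threads at once.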
19
Outline
• Sizing the business: 0B$ or 1T$? (30 Ttpd)
• TP is dead: long live HTTP-XML
– i.e. TP monitors morph to web servers.
• Transactions are C2C (b2b not enough)
• Scaleability terminology if there is time.
20
The birth of C2C
• So, if the Person2Computer business is 0B$ (30 Ttpd)
• Then the B2B business (shadow transactions) is 100B$
– Great, but what about Moore’s law
• Solution:
– C2C transactions: computers to computers
– No people involved!
– Ultimate automation.
– Smart dust (Avogadro’s number of users)
21
C2C TP (actually making it work)
• Interop is key (no people to do format translation).
• Use a STANDARD protocol
– No IBM/Microsoft/Intel “standard”
• Standard protocols:
– HTTP++ (get/post/queue/dequeue)
– XML for data format
– Transactions and queues
– Authentication/Authorization (certificates/signatures)
– Pico-pricing
• Does this sound like B2B?
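To make "XML for data format" concrete, here is a small sketch of one computer building an order message and another parsing it, with no person doing format translation. The element names (order, sku, quantity) and both function names are invented for illustration; the talk does not define a message schema, and in practice the message would travel in an HTTP POST body.

```python
import xml.etree.ElementTree as ET


def build_order(order_id, sku, quantity):
    """Serialize an order as XML text (hypothetical element names)."""
    order = ET.Element("order", id=order_id)
    ET.SubElement(order, "sku").text = sku
    ET.SubElement(order, "quantity").text = str(quantity)
    return ET.tostring(order, encoding="unicode")


def parse_order(xml_text):
    """Recover the order fields on the receiving computer."""
    order = ET.fromstring(xml_text)
    return {
        "id": order.get("id"),
        "sku": order.findtext("sku"),
        "quantity": int(order.findtext("quantity")),
    }


message = build_order("42", "widget", 3)
received = parse_order(message)
print(message)
print(received)
```

Both sides must already agree on what "order" and "sku" mean, which is the schema problem the later XML slides raise.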
22
User Productivity: one person generates 1K C2C* transactions
* computer to computer
• No obvious limit to the number of tps.
• Obvious need for transaction properties
– That’s how social systems work (transactions).
• So, it’s not a 0B$ industry (thank goodness!)
• But, it’s not your father’s TP monitor.
23
Some sample XML
(outer frame of this talk in XML)
<xml xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:p="urn:schemas-microsoft-com:office:powerpoint">
Go here for DTDs
• Anyone can define anything.
• “When I use a word it means exactly what I want it to mean, nothing more and nothing less.”
24
<p:presentation>
<p:... slots="title,body,dateTime,footer,slideNumber">
...
</p:master>
<p:slide id="1" href="slide0001.htm" layout="title_subtitle"
slots="centerTitle,subTitle"/>
...
<p:viewstate type="slideView" slidehref="slide0030.htm"/>
<p:font name="Times New Roman" charset="0" type="4" family="18"/>
<p:headersfooters noheader="t"/>
<p:pptdocumentsettings framecolors="WhiteTextOnBlack" hideslideanimation="t"/>
</p:presentation>
<o:shapedefaults v:ext="edit" spidmax="34820">
<o:colormru v:ext="edit" colors="#3cc,#09f,fuchsia,#6f3,#ff6"/>
<o:colormenu v:ext="edit" fillcolor="#ff6"/>
</o:shapedefaults></xml>
Yes, but…..
• XML is WONDERFUL!
• But.. XML is no panacea.
• XML uses DTDs
Syntax & presentation, not schema, not semantics!!!
• DTDs as a competitive advantage
– SAP will publish its DTDs
– Defines customer, employee, … (all business objects)
– Other vendors will “compete” for these definitions
– Lassettre’s Dog: (www.ibm.com/dog, www.jim.com/dog, ….) no ‘www.top.org/dog’
• World needs a global schema
– Data + Methods (= semantics)
25
Oh!! And by the way….
• B2B and C2C need workflow
– Scripts
– Execution
– Status
• Good luck….
26
Summary
• People are a 30 Ttpd business
• B2B is a 3 Ptpd business
– (shadow transactions)
• C2C is an infinite business (smart dust)
• The web servers are coming! The web servers are coming!
• XML needs schema definitions
– Syntax + Semantics.
– Formats + Protocols + workflows
27
Outline
• Sizing the business: 0B$ or 1T$? (30 Ttpd)
• TP is dead: long live HTTP-XML
– i.e. TP monitors morph to web servers.
• Transactions are C2C (b2b not enough)
• Scaleability terminology if there is time.
28
Terminology for scaleability
• Farms of servers:
– Clones: identical (scaleability + availability)
– Partitions: scaleability
– Packs: partition availability via fail-over
[Diagram labels: Farm, Clone, Partition, Pack]
29
Unpredictable Growth
• The TerraServer Story:
– We expected 5 M hits per day
– We got 50 M hits on day 1
– We peak at 15-20 M hpd on a “hot” day
– Average 5 M hpd after 1 year
• Most of us cannot predict demand
– Must be able to deal with NO demand
– Must be able to deal with HUGE demand
30
An Architecture for Internet Services?
• Need to be able to add capacity
– New processing
– New storage
– New networking
• Need continuous service
– Online change of all components (hardware and software)
– Multiple service sites
– Multiple network providers
• Need great development tools
– Change the application several times per year.
– Add new services several times per year.
31
Premise: Each Site is a Farm
• Buy computing by the slice (brick):
– Rack of servers + disks.
• Grow by adding slices
– Spread data and computation to new slices
• Two styles:
– Clones: anonymous servers
– Parts+Packs: Partitions fail over within a pack
• In both cases, remote farm for disaster recovery
[Diagram: The Microsoft.Com Site, 12/15/97. Groups of servers for www, home, search, register, support, premium, activex, cdm and FTP.microsoft.com, plus msid.msn.com and register.msn.com, SQL servers, staging servers and log processing in Building 11, connected through FDDI rings (MIS1 to MIS4), switched Ethernet, primary and secondary Gigaswitches and routers to the Internet (13 DS3 at 45 Mb/Sec each, 2 OC3, 2 Ethernet at 100 Mb/Sec each), with Japan and European data centers. Typical server: 4xP6, 512 RAM, 30 GB HD, Ave Cost $35K; larger SQL servers up to 1 GB RAM, 180 GB HD, $128K.]
32
Scaleable Systems
Scale UP and Scale OUT
• Everyone does both.
• Choice is
– Size of a brick
– Clones or partitions
– Size of a pack
33
Everyone scales out
What’s the Brick?
• 1M$/slice
– IBM S390?
– Sun E 10,000?
• 100 K$/slice
– Wintel 8X
• 10 K$/slice
– Wintel 4x
• 1 K$/slice
– Wintel 1x
34
Clones: Availability+Scalability
• Some applications are
– Read-mostly
– Low consistency requirements
– Modest storage requirement (less than 1TB)
• Examples:
– HTML web servers (IP sprayer/sieve + replication)
– LDAP servers (replication via gossip)
• Replicate app at all nodes (clones)
• Spray requests across nodes.
• Grow by adding clones
• Fault tolerance: stop sending to that clone.
• Growth: add a clone.
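The clone model on this slide (spray requests across identical nodes, stop sending to a failed clone, grow by adding one) can be sketched as a round-robin router. This is an illustrative sketch only; the Sprayer class and the server names are hypothetical, and a real IP sprayer works at the network level rather than returning names.

```python
class Sprayer:
    """Round-robin request router over identical clones."""

    def __init__(self, clones):
        self.clones = list(clones)
        self.up = set(self.clones)   # clones currently receiving traffic
        self._next = 0

    def add_clone(self, name):
        """Growth: add a clone and start sending to it."""
        self.clones.append(name)
        self.up.add(name)

    def mark_down(self, name):
        """Fault tolerance: stop sending to that clone."""
        self.up.discard(name)

    def route(self):
        """Return the next healthy clone, skipping any that are down."""
        for _ in range(len(self.clones)):
            clone = self.clones[self._next % len(self.clones)]
            self._next += 1
            if clone in self.up:
                return clone
        raise RuntimeError("no clone is up")


sprayer = Sprayer(["web1", "web2", "web3"])
first = [sprayer.route() for _ in range(3)]

sprayer.mark_down("web2")
after_failure = [sprayer.route() for _ in range(4)]

sprayer.add_clone("web4")
after_growth = {sprayer.route() for _ in range(8)}

print(first, after_failure, sorted(after_growth))
```

Because every clone holds the same application and data, the router needs no knowledge of the request contents, which is why clones are easy to manage.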
35
Facilities Clones Need
• Automatic replication
– Applications (and system software)
– Data
• Automatic request routing
– Spray or sieve
• Management:
– Who is up?
– Update management & propagation
– Application monitoring.
• Clones are very easy to manage:
– Rule of thumb: 100’s of clones per admin
36
Partitions for Scalability
• Clones are not appropriate for some apps.
– Stateful apps do not replicate well
– High update rates do not replicate well
• Examples
– Email / chat / …
– Databases
• Partition state among servers
• Scalability (online):
– Partition split/merge
– Partitioning must be transparent to client.
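Partitioning that stays transparent to the client through an online split can be sketched with a small range-partition map. This is an assumption-laden illustration: the PartitionMap class, the mailbox keys, and the server names are all hypothetical, and data movement during the split is left out.

```python
import bisect


class PartitionMap:
    """Range-partitions keys; clients only ever call server_for(key)."""

    def __init__(self, first_server):
        self._bounds = []                 # sorted split points
        self._servers = [first_server]    # always len(_bounds) + 1 entries

    def server_for(self, key):
        """Route a key to the server that owns its range."""
        return self._servers[bisect.bisect_right(self._bounds, key)]

    def split(self, at_key, new_server):
        """Split the range containing at_key; keys >= at_key in that
        range move to new_server. Assumes at_key is not already a bound."""
        i = bisect.bisect_right(self._bounds, at_key)
        self._bounds.insert(i, at_key)
        self._servers.insert(i + 1, new_server)


mail = PartitionMap("mail1")
before = mail.server_for("zoe")      # everything lives on mail1

mail.split("n", "mail2")             # online split: n..z moves to mail2
mail.split("g", "mail3")             # split again: g..m moves to mail3

placement = {user: mail.server_for(user) for user in ("alice", "harry", "zoe")}
print(before, placement)
```

The client code (`server_for`) is identical before and after each split, which is the transparency requirement on the slide; a merge would be the inverse operation on the same two lists.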
37
Partitioned/Clustered Apps
• Mail servers
– Perfectly partitionable
• Business Object Servers
– Partition by set of objects.
• Parallel Databases
– Transparent access to partitioned tables
– Parallel Query
38
Packs for Availability
• Each partition may fail (independent of others)
• Partitions migrate to new node via fail-over
– Fail-over in seconds
• Pack: the nodes supporting a partition
– VMS Cluster
– Tandem Process Pair
– SP2 HACMP
– Sysplex™
– WinNT MSCS (wolfpack)
• Cluster In A Box now commodity
• Partitions typically grow in packs.
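A pack, as defined on this slide, is the set of nodes that can host one partition, with the partition migrating to a surviving node on failure. The sketch below models only the ownership decision; the Pack class and node names are hypothetical, and it does not represent how VMS Cluster, HACMP, Sysplex, or MSCS actually detect failures or move state.

```python
class Pack:
    """The nodes supporting one partition, in fail-over preference order."""

    def __init__(self, partition, nodes):
        self.partition = partition
        self.nodes = list(nodes)
        self.failed = set()

    @property
    def owner(self):
        """The first node in preference order that has not failed."""
        for node in self.nodes:
            if node not in self.failed:
                return node
        raise RuntimeError(f"partition {self.partition} is unavailable")

    def fail(self, node):
        self.failed.add(node)

    def repair(self, node):
        self.failed.discard(node)


pack = Pack("mail A-M", ["node1", "node2"])
normal_owner = pack.owner

pack.fail("node1")            # partition migrates to the surviving node
failover_owner = pack.owner

pack.repair("node1")          # in this sketch, ownership fails back
restored_owner = pack.owner

print(normal_owner, failover_owner, restored_owner)
```

Each partition has its own pack, so a failure in one pack leaves the other partitions untouched.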
39
What Parts+Packs Need
• Automatic partitioning (in dbms, mail, files,…)
– Location transparent
– Partition split/merge
– Grow without limits (100x10TB)
• Simple failover model
– Partition migration is transparent
– MSCS-like model for services
• Application-centric request routing
• Management:
– Who is up?
– Automatic partition management (split/merge)
– Application monitoring.
40
Always UP: Farm pairs
• Two farms
• Changes from one sent to other
• When one farm fails other provides service
• Masks
– Hardware/Software faults
– Operations tasks (reorganize, upgrade, move)
– Environmental faults (power fail)
41
Services on Clones & Partitions
• Application provides a set of services
• If cloned:
– Services are on subset of clones
• If partitioned:
– Services run at each partition
• System load balancing routes request to
– Any clone
– Correct partition.
– Routes around failures.
42
Cluster Scenarios: 3-tier systems
A simple web site
[Diagram labels: Web Clients; Load Balance; Front End; Web File Store; SQL Temp State; SQL Database; Clones for availability; Packs for availability]
43
Cluster Scale Out Scenarios
The FARM: Clones and Packs of Partitions
[Diagram labels: Web Clients; Load Balance; Cloned Front Ends (firewall, sprayer, web server); Cloned Packed file servers (Web File StoreA, Web File StoreB, replication); SQL Temp State; Packed Partitions: Database Transparency (SQL Database: SQL Partition1, SQL Partition 2, SQL Partition 3)]
44
Talk 2 (if there is time)
• Terminology for scaleability
• Farms of servers:
– Clones: identical (scaleability + availability)
– Partitions: scaleability
– Packs: partition availability via fail-over
[Diagram labels: Farm, Clone, Partition, Pack]
45