An Overview of Cloud Computing @ Yahoo! Raghu Ramakrishnan Chief Scientist, Audience and Cloud Computing Research Fellow, Yahoo! Research Reflects many discussions with: Eric Baldeschwieler, Jay Kistler, Chuck Neerdaels, Shelton Shugar, and Raymie Stata and joint work with the Sherpa team, in particular: Brian Cooper, Utkarsh Srivastava, Adam Silberstein, Rodrigo Fonseca and Nick Puz in Y! Research Chuck Neerdaels, P.P. Suryanarayanan and many others in CCDI 1 Questions • What is cloud computing? – Horizontal and functional services • What’s it going to change? – Software business models, science, life • How many clouds will there be? – 1, 2, 3, infinity • What’s new in cloud computing? – HPC grids, ASPs, hosted services, Multics (!) – Emerging “cloud stack” to support a broad class of programs, including data intensive applications 2 Pie-in-the-sky SCENARIOS 3 Living in the Clouds • We want to start a new website, FredsList.com • Our site will provide listings of items for sale, jobs, etc. • As time goes on, we’ll add more features – And illustrate how more cloud capabilities (and corresponding infrastructure components) are used as needed • List of capabilities/components is illustrative, not exhaustive • Our cloud provides a “dataset” abstraction – FredsList doesn’t worry about the underlying components 4 Step 1: Listings Scenario FredsList wants to store listings as (key, category, description) FredsList.com application 1234323, transportation, For sale: one bicycle, barely used 5523442, childcare, Nanny available in San Jose 215534, wanted, Looking for issue 1 of Superman comic book DECLARE DATASET Listings AS ( ID String PRIMARY KEY, Category String, Description Text ) Simple Web Service API’s PNUTS Database 5 Step 2: System Evolution Fred belatedly realizes prices are useful information! FredsList.com application 1234323, transportation, For sale: one bicycle, barely used 5523442, childcare, Nanny available in San Jose ALTER DATASET Listings ADD (Price Float) 215534, wanted, Looking for issue 1 of Superman comic book 32138, camera, Nikon D40, USD 300 Simple Web Service API’s vs. PNUTS Database Not every record in a dataset has values defined for all fields declared for the dataset Schemas are flexible, and evolve 6 Federation of systems offering different capabilities Step 3: Search FredsList’s customers quickly ask for keyword search FredsList.com application “dvd’s” “bicycle” “nanny” ALTER Listings SET Description SEARCHABLE Simple Web Service API’s PNUTS Vespa Database Search Tribble Messaging 7 Step 4: Photos FredsList decides to add photos/videos to listings Federation of systems offering different performance points FredsList.com application ALTER Listings ADD Photo BLOB Simple Web Service API’s PNUTS Foreign key MObStor photo → listing Vespa Storage Database Search Tribble Messaging 8 Step 5: Data Analysis FredsList wants to analyze its listings to get statistics about category, do geocoding, etc. FredsList.com application Pig query to analyze categories Hadoop program to generate fancy pages for listings Hadoop program to geocode data ALTER Listings MAKE ANALYZABLE Simple Web Service API’s Grid PNUTS Foreign key MObStor Vespa photo → listing Compute Database Batch export Storage Search Tribble Messaging 9 Step 6: Performance And by now, Fred is global, and wants georeplication! FredsList wants to reduce its data access latency FredsList.com application ALTER Listings MAKE CACHEABLE Simple Web Service API’s Grid PNUTS Foreign key MObStor Vespa memcached photo → listing Compute Database Batch export Storage Search Caching Tribble Messaging 10 Data Serving vs. Analysis • Very different workloads, requirements • Data from serving system is one of many kinds of data (click streams are another common kind, as are syndicated feeds) to be analyzed and integrated • The result of analysis often goes right back into serving system 11 Motherhood-and-Apple-Pie EYES TO THE SKIES 12 Why Clouds? • On-demand infrastructure to create a fundamental shift in the OE curve: – Do things we can’t do – Build more robustly, more efficiently, more globally, more completely, more quickly, for a given budget • Cloud services should do heavy lifting of heavy-lifting of scaling & high-availability – Today, this is done at the applevel, which is not productive 13 Requirements for Cloud Services • Multitenant. A cloud service must support multiple, organizationally distant customers. • Elasticity. Tenants should be able to negotiate and receive resources/QoS on-demand. • Resource Sharing. Ideally, spare cloud resources should be transparently applied when a tenant’s negotiated QoS is insufficient, e.g., due to spikes. • Horizontal scaling. It should be possible to add cloud capacity in small increments; this should be transparent to the tenants of the service. • Metering. A cloud service must support accounting that reasonably ascribes operational and capital expenditures to each of the tenants of the service. • Security. A cloud service should be secure in that tenants are not made vulnerable because of loopholes in the cloud. • Availability. A cloud service should be highly available. • Operability. A cloud service should be easy to operate, with few operators. Operating costs should scale linearly or better with the capacity of the service. 14 Types of Cloud Services • Two kinds of cloud services: – Horizontal (“Platform”) Cloud Services • Functionality enabling tenants to build applications or new services on top of the cloud – Functional Cloud Services • Functionality that is useful in and of itself to tenants. E.g., various SaaS instances, such as Saleforce.com; Google Analytics and Yahoo!’s IndexTools; Yahoo! properties aimed at end-users and small businesses, e.g., flickr, Groups, Mail, News, Shopping • Could be built on top of horizontal cloud services or from scratch • Yahoo! has been offering these for a long while (e.g., Mail for SMB, Groups, Flickr, BOSS, Ad exchanges) 15 Opening Up Yahoo! Search Phase 1 Giving site owners and developers control over the appearance of Yahoo! Search results. Phase 2 BOSS takes Yahoo!’s open strategy to the next level by providing Yahoo! Search infrastructure and technology to developers and companies to help them build their own search experiences. 16 BOSS Offerings BOSS offers two options for companies and developers and has partnered with top technology universities to drive search experimentation, innovation and research into next generation search. API A self-service, web services model for developers and start-ups to quickly build and deploy new search experiences. CUSTOM Working with 3rd parties to build a more relevant, brand/site specific web search experience. This option is jointly built by Yahoo! and select partners. ACADEMIC Working with the following universities to allow for wide-scale research in the search field: • University of Illinois Urbana Champaign • Carnegie Mellon University • Stanford University • Purdue University • MIT • Indian Institute of Technology Bombay • University of Massachusetts (Slide courtesy Prabhakar Raghavan) 18 Partner Examples 19 Horizontal Cloud Services • Horizontal cloud services are foundations on which tenants build applications or new services. They should be: – Semantics-free. Must be "generic infrastructure,” and not tied to specific app-logic. • May provide the ability to inject application logic through well-defined APIs – Broadly applicable. Must be broadly applicable (i.e., it can't be intended for just one or two properties). – Fault-tolerant over commodity hardware. Must be built using inexpensive commodity hardware, and should mask component failures. • While each cloud service provides value, the power of the cloud paradigm will depend on a collection of well-chosen, loosely coupled services that collectively make it easy to quickly develop and operate innovative web applications. 20 Yahoo! Cloud Stack EDGE Brooklyn Horizontal Cloud Services YCPI … WEB VM/OS Horizontal Cloud ServicesPHP yApache APP VM/OS Horizontal Cloud Serving Grid Services … STORAGE Sherpa Horizontal Cloud Services… MOBStor App Engine Data Highway Monitoring/Metering/Security Provisioning (Self-serve) YCS BATCH Hadoop Horizontal…Cloud Services 22 Yahoo! CCDI Thrust Areas • Fast Provisioning and Machine Virtualization: On demand, deliver a set of hosts imaged with desired software and configured against standard services – Multiple hosts may be multiplexed onto the same physical machine. • Batch Storage and Processing: Scalable data storage optimized for batch processing, together with computational capabilities • Operational Storage: Persistent storage that supports low-latency updates and flexible retrieval • Edge Content Services: Support for dealing with network topology, communication protocols, caching, and BCP Rest of today’s talk 23 Web Data Management • Scan oriented workloads • Focus on sequential disk I/O • $ per cpu cycle Large data analysis (Hadoop) Structured record storage (PNUTS/Sherpa) Blob storage (SAN/NAS) • CRUD • Point lookups and short scans • Index organized table and random I/Os • $ per latency • Object retrieval and streaming • Scalable file storage • $ per GB 24 Hadoop: Batch Storage/Analysis Why is batch processing important? [Workflow] High-level query layer (Pig) Map-Reduce HDFS • Whether it’s – – – – response-prediction for advertising machine-learned relevance for Search, or content optimization for audience, data-intensive computing is increasingly central to everything Yahoo! does – Hadoop is central to addressing this need • Hadoop is a case-study in our cloud vision – Processes enormous amounts of data – Provides horizontal scaling and faulttolerance for our users – Allows those users to focus on their app logic 25 The World Has Changed • Web serving applications need: – Scalability! • Preferably elastic – – – – Flexible schemas Geographic distribution High availability Reliable storage • Web serving applications can do without: – Complicated queries – Strong transactions 26 MObStor • Yahoo!’s next-generation globally replicated, virtualized media object storage service • Better provisioning, easy migration, replication, better BCP, and performance • New features (Evergreen URLs, CDN integration, REST API, …) • The object metadata problem addressed using Sherpa, though MObStor is focused on blob storage. 2727 Storage & Delivery Stack 28 PNUTS / SHERPA To Help You Scale Your Mountains of Data 29 CCDI—Research Collaboration Yahoo! Research CCDI • • • • • • • • • • Raghu Ramakrishnan Brian Cooper Utkarsh Srivastava Adam Silberstein Rodrigo Fonseca Chuck Neerdaels P.P.S. Narayan Kevin Athey Toby Negrin Plus Dev/QA teams 30 Yahoo! Serving Storage Problem – Small records – 100KB or less – Structured records – lots of fields, evolving – Extreme data scale - Tens of TB – Extreme request scale - Tens of thousands of requests/sec – Low latency globally - 20+ datacenters worldwide – High Availability - outages cost $millions – Variable usage patterns - as applications and users change 31 31 What is PNUTS/Sherpa? A B C D E F 42342 42521 66354 12352 75656 15677 E W W E C E Parallel database CREATE TABLE Parts ( ID VARCHAR, StockNumber INT, Status VARCHAR … ) Structured, flexible schema A B C D E F 42342 42521 66354 12352 75656 15677 E W W E C E A B C D E F 42342 42521 66354 12352 75656 15677 E W W E C E Geographic replication Hosted, managed infrastructure 33 33 What Will It Become? A B C D E F 42342 42521 66354 12352 75656 15677 E W W E C E A B C D E F 42342 42521 66354 12352 75656 15677 E W W E C E Indexes and views A B C D E F 42342 42521 66354 12352 75656 15677 E W W E C E 35 Design Goals Scalability Consistency • • • • • • Thousands of machines Easy to add capacity Restrict query language to avoid costly queries Per-record guarantees Timeline model Option to relax if needed Geographic replication Multiple access paths • • • • Asynchronous replication around the globe Low-latency local access Hash table, ordered table Primary, secondary access High availability and fault tolerance Hosted service • • • • Automatically recover from failures Serve reads and writes despite failures Applications plug and play Share operational cost 36 36 Technology Elements Applications Tabular API PNUTS API YCA: Authorization PNUTS • Query planning and execution • Index maintenance Distributed infrastructure for tabular data • Data partitioning • Update consistency • Replication YDOT FS • Ordered tables YDHT FS • Hash tables Tribble • Pub/sub messaging Zookeeper • Consistency service 37 37 Data Manipulation • Per-record operations – Get – Set – Delete • Multi-record operations – Multiget – Scan – Getrange • Web service (RESTful) API 38 38 Tablets—Hash Table 0x0000 Name Description Grape Grapes are good to eat $12 Lime Limes are green $9 Apple Apple is wisdom $1 Strawberry 0x2AF3 0x911F 0xFFFF Strawberry shortcake Price $900 Orange Arrgh! Don’t get scurvy! $2 Avocado But at what price? $3 Lemon How much did you pay for this lemon? $1 Tomato Is this a vegetable? $14 Banana The perfect fruit $2 New Zealand $8 Kiwi 39 39 Tablets—Ordered Table A Name Description Price Apple Apple is wisdom $1 Avocado But at what price? $3 Banana The perfect fruit $2 Grape Grapes are good to eat $12 New Zealand $8 How much did you pay for this lemon? $1 Limes are green $9 H Kiwi Lemon Lime Q Orange Strawberry Tomato Arrgh! Don’t get scurvy! $2 Strawberry shortcake $900 Is this a vegetable? $14 Z 40 40 Flexible Schema Posted date Listing id Item Price 6/1/07 424252 Couch $570 6/1/07 763245 Bike $86 6/3/07 211242 Car $1123 6/5/07 421133 Lamp $15 Color Condition Good Red Fair 41 Detailed Architecture Remote regions Local region Clients REST API Routers Tribble Tablet Controller Storage units 42 42 Tablet Splitting and Balancing Each storage unit has many tablets (horizontal partitions of the table) Storage unit may become a hotspot Storage unit Tablet Overfull tablets split Tablets may grow over time Shed load by moving tablets to other servers 43 43 QUERY PROCESSING 44 44 Accessing Data 4 Record for key k 1 Get key k 3 Record for key k SU SU 2 Get key k SU 45 45 Bulk Read 1 {k1, k2, … kn} 2 Get k1 Get k2 SU SU Get k3 Scatter/ gather server SU 46 46 Range Queries in YDOT • Clustered, ordered retrieval of records Apple Avocado Grapefruit…Pear? Banana Blueberry Canteloupe Grape Kiwi Lemon Grapefruit…Lime? Lime…Pear? Router Lime Mango Orange Strawberry Apple Tomato Avocado Watermelon Banana Blueberry Storage unit 1 Canteloupe Storage unit 3 Lime Storage unit 2 Strawberry Storage unit 1 Strawberry Tomato Watermelon Storage unit 1 Lime Mango Orange Canteloupe Grape Kiwi Lemon Storage unit 2 Storage unit 3 47 Updates 1 8 Write key k Sequence # for key k Routers Message brokers 3 Write key k 2 7 Sequence # for key k 4 Write key k 5 SU SU SU 6 SUCCESS Write key k 48 48 ASYNCHRONOUS REPLICATION AND CONSISTENCY 49 49 Asynchronous Replication 50 50 Consistency Model • Goal: Make it easier for applications to reason about updates and cope with asynchrony • What happens to a record with primary key “Alice”? Record inserted Update v. 1 Update Update Update v. 2 v. 3 v. 4 Update Update v. 5 v. 6 Generation 1 v. 7 Delete Update v. 8 Time Time As the record is updated, copies may get out of sync. 51 51 Example: Social Alice East West User Status Alice ___ User Status Alice Busy User Status User Status Alice Busy Alice Free User Status User Status Alice ??? Alice ??? Record Timeline ___ Busy Free Free 52 Consistency Model Read Stale version v. 1 v. 2 v. 3 v. 4 Stale version v. 5 v. 6 Generation 1 v. 7 Current version v. 8 Time In general, reads are served using a local copy 53 53 Consistency Model Read up-to-date Stale version v. 1 v. 2 v. 3 v. 4 Stale version v. 5 v. 6 Generation 1 v. 7 Current version v. 8 Time But application can request and get current version 54 54 Consistency Model Read ≥ v.6 Stale version v. 1 v. 2 v. 3 v. 4 Stale version v. 5 v. 6 Generation 1 v. 7 Current version v. 8 Time Or variations such as “read forward”—while copies may lag the master record, every copy goes through the same sequence of changes 55 55 Consistency Model Write Stale version v. 1 v. 2 v. 3 v. 4 Stale version v. 5 v. 6 Generation 1 v. 7 Current version v. 8 Time Achieved via per-record primary copy protocol (To maximize availability, record masterships automaticlly transferred if site fails) Can be selectively weakened to eventual consistency (local writes that are reconciled using version vectors) 56 56 Consistency Model Write if = v.7 ERROR Stale version v. 1 v. 2 v. 3 v. 4 Stale version v. 5 v. 6 Generation 1 v. 7 Current version v. 8 Time Test-and-set writes facilitate per-record transactions 57 57 Consistency Techniques • Per-record mastering – Each record is assigned a “master region” • May differ between records – Updates to the record forwarded to the master region – Ensures consistent ordering of updates • Tablet-level mastering – Each tablet is assigned a “master region” – Inserts and deletes of records forwarded to the master region – Master region decides tablet splits • These details are hidden from the application – Except for the latency impact! 58 Mastering A B C D E F 42342 42521 66354 12352 75656 15677 E W W E C E A B C D E F Tablet master A B C D E F 42342 42521 66354 12352 75656 15677 42342 42521 66354 12352 75656 15677 E W W E C E E W W E C E 59 59 Bulk Insert/Update/Replace Client Source Data Bulk manager 1. Client feeds records to bulk manager 2. Bulk loader transfers records to SU’s in batches • Bypass routers and message brokers • Efficient import into storage unit 60 Bulk Load in YDOT • YDOT bulk inserts can cause performance hotspots • Solution: preallocate tablets 61 Index Maintenance • How to have lots of interesting indexes and views, without killing performance? • Solution: Asynchrony! – Indexes/views updated asynchronously when base table updated 62 SHERPA IN CONTEXT 63 63 Types of Record Stores • Query expressiveness S3 PNUTS Oracle Simple Feature rich Object retrieval Retrieval from single table of objects/records SQL 64 Types of Record Stores • Consistency model S3 PNUTS Oracle Best effort Eventual consistency Timeline consistency Object-centric consistency ACID Strong guarantees Program centric consistency 65 Types of Record Stores • Data model PNUTS CouchDB Oracle Flexibility, Schema evolution Object-centric consistency Optimized for Fixed schemas Consistency spans objects 66 Types of Record Stores • Elasticity (ability to add resources on demand) Oracle PNUTS S3 Inelastic Elastic Limited (via data distribution) VLSD (Very Large Scale Distribution /Replication) 67 Data Stores Comparison Versus PNUTS • User-partitioned SQL stores – Microsoft Azure SDS – Amazon SimpleDB • Multi-tenant application databases – Salesforce.com – Oracle on Demand • • • More expressive queries Users must control partitioning Limited elasticity • Highly optimized for complex workloads Limited flexibility to evolving applications Inherit limitations of underlying data management system • • • Mutable object stores – Amazon S3 • Object storage versus record management 68 Application Design Space Get a few things Sherpa MySQL Oracle BigTable Scan everything Everest Records MObStor YMDB Filer Hadoop Files 69 69 SQL/ACID Consistency model Updates Structured access Global low latency Availability Operability Elastic Alternatives Matrix Sherpa Y! UDB MySQL Oracle HDFS BigTable Dynamo Cassandra 70 70 Further Reading Efficient Bulk Insertion into a Distributed Ordered Table (SIGMOD 2008) Adam Silberstein, Brian Cooper, Utkarsh Srivastava, Erik Vee, Ramana Yerneni, Raghu Ramakrishnan PNUTS: Yahoo!'s Hosted Data Serving Platform (VLDB 2008) Brian Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Phil Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, Ramana Yerneni Asynchronous View Maintenance for VLSD Databases, Parag Agrawal, Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava and Raghu Ramakrishnan SIGMOD 2009 (to appear) Cloud Storage Design in a PNUTShell Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava Beautiful Data, O’Reilly Media, 2009 (to appear) 71 QUESTIONS? 72 72