Presenter: Daniel J. Schaak

History
AWS Services
Amazon EC2
Elastic Load Balancing
Amazon EBS
Amazon S3
Amazon EMR
Amazon RDS
Amazon SimpleDB
Amazon DynamoDB
AWS SDK
AWS Competitors
AWS Future

White paper definition: “Amazon Web Services is a collection of remote computing services that together make up a cloud computing platform, offered over the Internet by Amazon.com.”
  Can be used for almost anything imaginable.

Founded by Jeff Bezos
Incorporated (as Cadabra) in 1994
Amazon.com debuts on the web in 1995
  Sold only books
  Growth into new genres in coming years
2001 Q4 is first profitable quarter

Headquarters: Seattle, WA
117,300 employees
Current Forbes rankings
  #6 Innovative Companies
  #33 World's Most Valuable Brand

AWS originally developed for internal use
  Chris Pinkham was lead designer
  Began building it in 2003
2005: offered technology to limited customers under NDA
EC2 & S3 launched in 2006
Many new services and regions since then
Direct correlation between launch of AWS and Amazon's growth

Image credit: Charles McLellan/ZDNet; Data: Amazon

Advantages to AWS
  Flexible
  Cost-Effective
  Secure
  Distinguished

Cost of operating
  No minimum fee
  Pay for what you use

As of white paper issued in January 2014
  34 services in 6 primary service areas.
Compute & Networking
Storage & Content Delivery Network
Database
Analytics
Application Services
Deployment & Management

Compute & Networking
  Amazon Elastic Compute Cloud (EC2)
  Auto Scaling
  Elastic Load Balancing
  Amazon WorkSpaces
  Amazon Virtual Private Cloud (Amazon VPC)
  Amazon Route 53
  AWS Direct Connect

Storage & Content Delivery Network
  Amazon Simple Storage Service (Amazon S3)
  Amazon Glacier
  Amazon Elastic Block Store (Amazon EBS)
  AWS Storage Gateway
  AWS Import/Export
  Amazon CloudFront

Database
  Amazon Relational Database Service (Amazon RDS)
  Amazon DynamoDB
  Amazon ElastiCache
  Amazon Redshift
  Amazon SimpleDB

Analytics
  Amazon Elastic MapReduce (Amazon EMR)
  Amazon Kinesis
  AWS Data Pipeline

Application Services
  Amazon AppStream
  Amazon Simple Queue Service (Amazon SQS)
  Amazon Simple Notification Service (Amazon SNS)
  Amazon Simple Workflow Service (Amazon SWF)
  Amazon Simple Email Service (Amazon SES)
  Amazon CloudSearch
  Amazon Elastic Transcoder

Deployment & Management
  AWS Identity and Access Management (IAM)
  AWS CloudTrail
  Amazon CloudWatch
  AWS Elastic Beanstalk
  AWS CloudFormation
  AWS OpsWorks
  AWS CloudHSM

Elastic Compute Cloud
  Simple to bring up new virtual machines
  Many base images to choose from
  Create custom images

Technology behind it
  Amazon guards these details very closely
  Article by Steven J. Vaughan-Nichols in March 2012
    Huang Liu, Ph.D. in EE (Research Manager with Accenture)
    454,400 servers
    Believed that each runs a custom version of Red Hat Enterprise Linux
    Xen hypervisor for VM hosting

Who Is Using It?
Elastic Load Balancing
  Centralized management of encryption/decryption
  Sticky sessions
  Single point of contact for domain names

Amazon Elastic Block Store
  Block-level storage volumes
  Linked to a single EC2 instance at a time
    Shows as a storage device on the VM
  Persists independently of the EC2 instance
  Provides reliable storage
    Built-in redundancy (within a single availability zone)
    Snapshots stored in S3 provide long-term data backups
    Snapshots are incremental
  Large public data sets provided for free
    1000 Genomes Project
    Enron Email Data
    Marvel Universe Social Graph
    NASA NEX
    Daily Global Weather Measurements from 1929 - 2009

Amazon Simple Storage Service
  Stores data as "objects" in "buckets"
    Object sizes can range from 1 byte to 5 terabytes
    Buckets are containers for data objects
    A single bucket can store an unlimited number of objects
    Access permissions can be granted on a per-bucket basis
  Redundant backups
    Multiple devices in multiple facilities
    Regular data checks
    99.999999999% durability and 99.99% availability
  Highly integrated
  Data can be made publicly viewable
  Versioning
  Common use cases:
    Backup and storage
    Application or media hosting
    Software delivery
    Static website hosting

Who Is Using It?

Elastic MapReduce
  Relies on EC2 and S3
    Data and processing code loaded into S3
    Spins up a cluster of EC2 instances
    Results available in S3
  Hadoop ecosystem
    Hive and Pig available

Who Is Using It?

Amazon RDS
  Provides access to traditional RDBMS systems
    Oracle
    Microsoft SQL Server
    MySQL
    PostgreSQL
  Existing applications & tools work as is

Features
  Automatic patching of database software
  Automatic backups
  Support for multi-zone deployments with failover

Storage Type Options
  General Purpose (SSD)
    Consistent 3 IOPS per GB
    Supports bursts up to 3000 IOPS
  Provisioned IOPS (SSD)
    High-performance storage for I/O-intensive workloads
    1000 - 30,000 IOPS per instance
  Magnetic Storage
    No guaranteed IOPS listed
    Best suited for small workloads with minimal reads

Who Is Using It?
Amazon SimpleDB
  Written in Erlang
  Introduced in 2007

Data model and architecture
  Document-based NoSQL database
    Each document consists of one or more named fields
  Supports multiple domains of documents
    Each domain can be queried independently
    Each domain may be stored on a different Amazon node

Configurable consistency
  Switches CAP properties between
    AP (eventually consistent)
      Recommended mode
      DB is typically consistent in less than 1 sec
    CP (consistent)
  Geographically distributed replicas of data

Best suited for smaller applications requiring flexibility in queries
  Logging is a good example of a use case

Limitations
  Domains limited to 1 billion attributes (10 GB)
  Each document in a domain is limited to 256 attributes
  Each attribute for a document is limited to 1024 bytes

Drawbacks
  Manual partitioning of data
  Typical max request capacity is under 25 writes/sec
  Automatically indexes all item attributes
    Provides query flexibility
    Costs performance and scalability

Amazon DynamoDB
  Fully managed NoSQL database service
  Debuted in 2012
  Combines best parts of Dynamo and SimpleDB
    Dynamo was the first NoSQL solution built by Amazon
      Provided reliability, performance and scalability
      Required management
        Turned people away to simpler options

Advantages
  Managed
  Scalable
  Fast
  Durable & Highly Available
  Flexible
  Low Cost

Document-based NoSQL database
  Provides two types of keys for PK indexing
    Simple Hash Key
      Single attribute (PK)
    Composite Hash Key w/ Range Key
      Key contains two attributes
        Hash attribute
        Range attribute: used to return multiple data records within specified criteria
  CAP theorem properties
    AP by default
      Consistency usually reached within a second
    Can be CP on a per-read basis

Focus on scalability and performance
  Runs on solid state disks
  No limits on request capacity or table size
    400 KB limit on item size
  Automatic partitioning
  Does not index all attributes
    Keeps read/write cost low
    Updates only require updating the PK index
    Secondary indexes can be defined
      Impacts performance
  Latencies remain stable even as datasets grow
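The composite hash-and-range key described above can be illustrated with a toy in-memory model. This is only a sketch of the idea, not the DynamoDB API: the class ToyTable, its method names, and the sample keys are all hypothetical, and a real table distributes hash keys across partitions rather than holding them in one dictionary.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory stand-in for a table with a hash + range key."""

    def __init__(self):
        # hash key -> sorted list of range keys stored under that hash key
        self._range_keys = defaultdict(list)
        # (hash key, range key) -> item attributes
        self._items = {}

    def put_item(self, hash_key, range_key, attributes):
        # Like PutItem in the slides: creates a new item or overwrites an existing one.
        if (hash_key, range_key) not in self._items:
            insort(self._range_keys[hash_key], range_key)
        self._items[(hash_key, range_key)] = dict(attributes)

    def get_item(self, hash_key, range_key):
        # A lookup by the full primary key returns at most one item.
        return self._items.get((hash_key, range_key))

    def query(self, hash_key, low=None, high=None):
        # The hash key is required; the range bounds are optional and inclusive.
        keys = self._range_keys.get(hash_key, [])
        start = 0 if low is None else bisect_left(keys, low)
        end = len(keys) if high is None else bisect_right(keys, high)
        # Results come back sorted by range key, ascending.
        return [(rk, self._items[(hash_key, rk)]) for rk in keys[start:end]]


table = ToyTable()
table.put_item("user-1", "2014-10-01", {"event": "login"})
table.put_item("user-1", "2014-10-05", {"event": "upload"})
table.put_item("user-1", "2014-11-02", {"event": "logout"})
table.put_item("user-2", "2014-10-03", {"event": "login"})

# All of user-1's items whose range key falls in October 2014.
october = table.query("user-1", low="2014-10-01", high="2014-10-31")
```

Here the hash attribute selects one user's items and the range attribute narrows the result to a window, which is the "multiple data records within specified criteria" case from the slide.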
Provisioned throughput
  Provides predictable (specifiable) performance
  Allows a per-table specification of throughput capacity
    DynamoDB allocates resources sufficient to guarantee it
  Reservations are elastic
    Scaled up or down at any time
      Management console
      API
  Measured in capacity units

Capacity Units
  Measure of strongly consistent operations per second
    Eventually consistent reads are twice as efficient
  Read = 4 KB per unit
  Write = 1 KB per unit
  Rounds up to nearest unit
  Local secondary indexes impact throughput
  Each item has 100 bytes of additional overhead for indexing

Capacity Units - Read

  Expected Item Size   Consistency             Desired Reads Per Second   Provisioned Throughput Required
  4 KB                 Strongly consistent     50                         50
  8 KB                 Strongly consistent     50                         100
  4 KB                 Eventually consistent   50                         25
  8 KB                 Eventually consistent   50                         50

  Image courtesy of: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/WorkingWithTables.html

Capacity Units - Write

  Expected Item Size   Desired Writes Per Second   Provisioned Throughput Required
  1 KB                 50                          50
  2 KB                 50                          100

  Image courtesy of: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/WorkingWithTables.html

Automatically replicates data
  Replicates to at least 3 different data centers
  Replicates to multiple AWS availability zones
  Replicates within a single AWS region
  Ensures availability and durability

Who Is Using It?
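The capacity-unit arithmetic in the read and write tables earlier in this section can be sketched as a pair of small functions. The function names are hypothetical and the sketch covers only the rules stated on the slides (4 KB per read unit, 1 KB per write unit, round up, eventually consistent reads cost half); it ignores the index overhead the slides also mention.

```python
import math

READ_UNIT_KB = 4   # one read capacity unit covers an item of up to 4 KB
WRITE_UNIT_KB = 1  # one write capacity unit covers an item of up to 1 KB


def read_capacity_units(item_size_kb, reads_per_second, strongly_consistent=True):
    """Estimate the read capacity units to provision (hypothetical helper)."""
    # Item size is rounded up to the next whole 4 KB unit.
    units_per_read = math.ceil(item_size_kb / READ_UNIT_KB)
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        # Eventually consistent reads are twice as efficient.
        total = math.ceil(total / 2)
    return total


def write_capacity_units(item_size_kb, writes_per_second):
    """Estimate the write capacity units to provision (hypothetical helper)."""
    # Item size is rounded up to the next whole 1 KB unit.
    return math.ceil(item_size_kb / WRITE_UNIT_KB) * writes_per_second


# Reproduce the rows of the two tables.
read_rows = [
    read_capacity_units(4, 50),
    read_capacity_units(8, 50),
    read_capacity_units(4, 50, strongly_consistent=False),
    read_capacity_units(8, 50, strongly_consistent=False),
]
write_rows = [write_capacity_units(1, 50), write_capacity_units(2, 50)]
```

Because sizes round up, an item slightly over a unit boundary (for example 4.5 KB) is charged as two read units, so keeping items under the boundary can halve the required throughput.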
Management Console
Command line utility
HTTP API
AWS SDK
  Available for a wide variety of languages, including:
    Java, .NET, Ruby, Python, PHP, Node.js, Browser (JavaScript)
    Android, iOS
  Extensive library of developer documentation available
    API documents
    Developer guides
    Reference videos
    Case studies
    Sample code

DynamoDB API
  HTTP requests
  Data passed in JSON

AWS SDK
  Low-level API methods
    Correspond closely to DynamoDB operations
    Common across languages
  Java and .NET provide object persistence
    Map client-side classes to DynamoDB tables
    Ability to call object methods rather than low-level API
  .NET provides document model
    High-level object model
    Abstracts low-level operations into table and document objects

GetItem
  Eventually consistent by default
  Returns ALL attributes of an item
PutItem
  Creates new item
  Overwrites existing items
UpdateItem
  Modifies existing items
  Creates new item if necessary
  Only need to specify attributes to be updated
DeleteItem

Batch Operations
  BatchGetItem
    Retrieve up to 1 MB or 100 items
    Can retrieve from multiple tables
  BatchWriteItem
    Put or Delete multiple items
    Up to 16 MB or 25 items
    Can Put/Delete in multiple tables
    Can NOT Update items
  Invokes corresponding request for each item
  Individual failed requests do not fail entire batch
    Key and data returned for failed requests
    ALL requests must fail for batch request to fail

Conditional Writes
  Image courtesy of: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/WorkingWithItems.html

Conditional Writes
  Specify expected conditions
    Must be met PRIOR to operation taking effect
  Applicable to PUT, UPDATE, DELETE
  Idempotent
  Image courtesy of: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/WorkingWithItems.html

Advanced Topics
  Atomic Counters
  Projection expressions
    Specify attributes
  Substitution in expressions
  Specify return data
    UPDATE or DELETE
    None by default

Query
  Hash key required
  Range key optional
  Secondary Indexes
  Eventually consistent by default
  Returns all attributes by default
  Always returns a result set
  Results always sorted by range key
    Ascending by default
  Max return of 1 MB

Scan
  Examines EVERY item in table
  Always eventually consistent
  Returns all data attributes by default
  Always returns a result set
  Max return of 1 MB

Advanced topics
  Filters
    Apply to Query or Scan
    Conditional expression to limit result set
    Applied AFTER Query or Scan completes
  Paging
  Limit
  Parallel scans
    Scans multiple segments simultaneously
    Application managed
    Dependent on throughput settings
    Must be finely tuned

Advanced topics
  Performance
    Query more efficient than Scan
      Scan always scans entire table
      Query uses indexes to find range of keys
    Filters can degrade performance
      Impact on both Query and Scan performance
      Applied after initial search operation completes
      Use with caution

Image courtesy of: http://www.fool.com/investing/general/2014/06/05/heres-why-microsoft-corporation-is-the-biggest-aws.aspx

More Infrastructure
  Increased server capacity
    Building a ‘private’ cloud for CIA usage
  Additional data centers
    Launched Frankfurt, Germany on 10-24-14
    11th region in the world

More Data
  Consumers
    Photos, music, other files
  Companies
    Adopting “cloud first” strategies

More Security
  Providing new tools
  “I see most of this as an opportunity, not as something that is really bad. It's an opportunity to give customers tools to protect themselves.”
    Werner Vogels (AWS CTO)

More Competition
  “We've always said this is too good a business. It's not a winner-take-all environment.”
    Werner Vogels (AWS CTO)

“There’s also plenty of room for growth in Amazon Web Services. The server market is a $50 billion industry, and that represents just one piece of the current hardware/software ecosystem that Amazon Web Services aims to replace.
By contrast, Amazon Web Services generates about $4 billion of annual revenue today.”
Courtesy of: http://www.fool.com/investing/general/2014/09/30/3-reasons-amazoncom-incs-stock-could-rise.aspx

http://www.fundinguniverse.com/company-histories/amazon-com-inc-history/
http://www.zdnet.com/in-pictures-the-rise-of-aws_p3-3040155324/#photo
http://www.zdnet.com/in-pictures-the-rise-of-aws-3040155324/#photo
http://www.forbes.com/companies/amazon/
http://media.amazonwebservices.com/AWS_Overview.pdf
http://www.zdnet.com/blog/open-source/amazon-ec2-cloud-is-made-up-of-almost-half-a-million-linux-servers/10620
http://www.rightscale.com/blog/cloud-industry-insights/amazons-elastic-block-store-explained
https://www.youtube.com/playlist?list=PLhr1KZpdzukcMmx04RbtWuQ0yYOp1vQi4
http://www.academia.edu/1254017/Data_consistency_properties_in_Amazon_SimpleDB_and_Amazon_S3
http://www.browniethoughts.com/2013/02/nosql-databases-key-value-and-document.html
http://www.allthingsdistributed.com/2012/01/amazon-dynamodb.html
http://aws.amazon.com/tools/
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/
http://smashingboxes.com/ideas/heroku-vs-amazon-web-services
http://www.stackdriver.com/cassandra-aws-gce-rackspace/
http://www.rightscale.com/blog/cloud-cost-analysis/google-slashes-cloud-prices-google-vs-aws-price-comparison
http://www.theregister.co.uk/2014/07/26/amazon_aws_margin_decline/
http://www.geekwire.com/2014/amazon-web-services-expands-Europe-new-german-data-centers/
http://www.zdnet.com/aws-guru-werner-vogels-predicts-future-for-next-decade-in-the-cloud-7000030683/
http://www.fool.com/investing/general/2014/06/05/heres-why-microsoft-corporation-is-the-biggest-aws.aspx
http://www.fool.com/investing/general/2014/09/30/3-reasons-amazoncom-incs-stock-could-rise.aspx