AWS Storage

AWS Cloud Storage

● Amazon Elastic File System (EFS): A simple, scalable, elastic file system for Linux-based workloads, offered as Network Attached Storage (NAS). It is built to scale on demand to petabytes without disrupting applications.
● Amazon Elastic Block Store (EBS): Persistent local storage for Amazon EC2, for relational and NoSQL databases, data warehousing, enterprise applications, Big Data processing, or backup and recovery.
● Amazon Simple Storage Service (S3): A scalable, durable serverless platform that makes data accessible from any Internet location, for user-generated content, active archive, serverless computing, Big Data storage, or backup.
● Amazon S3 Glacier: Highly affordable long-term storage classes that can replace tape for archive and regulatory compliance.
● AWS Storage Gateway: A hybrid storage cloud augmenting your on-premises environment with Amazon cloud storage, for bursting, tiering, or migration.
● AWS Snowball: A petabyte-scale physical data transport solution that uses devices designed to be secure to transfer large amounts of data into and out of the AWS Cloud.
● Cloud Data Migration Services: A portfolio of services to help simplify and accelerate moving data of all types and sizes into and out of the AWS Cloud.
● AWS Backup: A fully managed backup service that makes it easy to centralize and automate the backup of data across AWS services in the cloud, as well as on premises using the AWS Storage Gateway.
● Amazon FSx for Lustre: A fully managed file system that is optimized for compute-intensive workloads, such as high performance computing, machine learning, and media data processing workflows, and is seamlessly integrated with Amazon S3.

EFS vs EBS vs S3

● Amazon Elastic File System (EFS): File Storage
Scalable storage for users of Amazon EC2 that scales automatically. When the workload decreases, the storage scales down, so you do not pay for storage you don't use.
  ● Can be mounted on several EC2 instances at the same time.
  ● Throughput of more than 10 GB/s (500,000 IOPS).
  ● Fully managed service: no need to patch or manage the file system.
  ● Suitable for Big Data analytics.
  ● Multi-Availability Zone replication.
  ● Must be mounted on an EC2 instance before it is visible for use.

● Amazon Elastic Block Store (EBS): Block Storage
Amazon EBS provides storage for the drives of your virtual machines. A volume is normally attached to a single EC2 instance; it can be detached and attached to another instance, or kept in standby. EBS stores data as blocks of the same size and organizes them through a hierarchy similar to a traditional file system. Once configured, a volume does not scale on its own: if you need more storage space, you must provision and configure a larger volume.
  ● Throughput up to 2 GB/s.
  ● General Purpose (SSD) Volumes: A baseline performance of 3 IOPS/GB and the ability to burst up to 10,000 IOPS make them a good fit for databases on AWS that need many read and write operations, such as PostgreSQL, MS SQL, or Oracle.
  ● Provisioned IOPS (SSD) Volumes: Backed by the same SSD but designed for heavy workloads, from 30 IOPS/GB up to 20,000 IOPS. Multiple Provisioned IOPS volumes can be striped to reach up to 48,000 IOPS or 800 MBps of throughput.

● Amazon Simple Storage Service (S3): Object Storage
Scalable, durable, highly available data storage that is accessible over the Internet. As object storage, it is suitable for storing user files and backups in massive numbers. Amazon S3 stores data as objects in a flat environment (without a hierarchy). Each object (file) in the storage contains a header. Each object is associated with a unique identifier (key), so it can be accessed through web requests from anywhere. S3 can host static website content. The total volume of data and the number of objects are unlimited.
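The flat, key-addressed model described above can be sketched with a toy in-memory store. This is only an illustration under stated assumptions: `FlatObjectStore`, its methods, and the sample keys are hypothetical names invented here, not part of any AWS SDK, and the sketch ignores everything S3 does beyond mapping a key to an object.

```python
from typing import Dict, List


class FlatObjectStore:
    """Hypothetical stand-in for a bucket: a flat map from key to bytes."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, body: bytes) -> None:
        # The whole key, slashes included, is one opaque identifier.
        self._objects[key] = body

    def get(self, key: str) -> bytes:
        # Lookup is by exact key; no directory tree is walked.
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> List[str]:
        # "Folders" are only a filter over key prefixes.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = FlatObjectStore()
store.put("photos/2019/cat.jpg", b"cat")
store.put("photos/2019/dog.jpg", b"dog")
store.put("backups/db.dump", b"dump")

print(store.list_keys("photos/"))
# → ['photos/2019/cat.jpg', 'photos/2019/dog.jpg']
```

The point of the sketch is that a path-like key such as `photos/2019/cat.jpg` does not imply nested directories; listing by prefix gives a folder-like view over a flat namespace.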
  ● Bucket names must be globally unique, 3 to 63 characters long, lowercase (no uppercase letters or underscores), and must not be formatted as an IP address.
  ● Maximum object size is 5 TB, while the largest single upload is 5 GB. Use multipart upload for larger objects.
  ● S3 is a web store, not a file system.
  ● Eventually consistent:
    New objects (read after write): Upload new object → Synchronized → S3 index updated → Success returned
    Overwrite PUTs and DELETEs (not read after write/delete, eventually consistent): Update / delete existing object → Success returned → Synchronized → S3 index updated
  ● Standard: 99.999999999% durability, 99.99% availability, SSL encryption, lifecycle management.
  ● Standard - Infrequent Access (IA): 99.999999999% durability, 99.9% availability, 3 AZs, lower cost.
  ● Standard - One Zone IA: 99.999999999% durability, 99.5% availability, even lower cost.
  ● S3 Intelligent-Tiering: Automatically moves objects between Standard and IA. Objects not accessed for 30 days are moved to the IA tier. A small monitoring fee applies.
  ● Object-level configuration: a single bucket can hold objects of all the classes above.
  ● Use unique object name prefixes for maximum (27,500) read requests per second.

EFS Architecture Example

[Diagram: in AWS Cloud Region A, a Virtual Private Cloud contains an EFS share (NAS) with one EFS mount target per VPC subnet in Availability Zones A, B, C, and D. Each mount target allows inbound NFS, and an EC2 instance reaches the share through the FS DNS name. Steps A to D of the example are shown as screenshots; step B includes "Copy Security Group ID".]

$ sudo mkdir efs
$ sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-901a1c10.efs.us-east-1.amazonaws.com:/ efs
$ cd efs
$ sudo touch test.txt
$ dir test.txt

Storage Example

[Diagram: in AWS Cloud Region A, Availability Zone A, a Virtual Private Cloud holds two EC2 instances, each with its own EBS (HDD/SSD) volume and each reaching the EFS share (NAS) through an EFS mount target and the FS DNS name, alongside S3 buckets. Relative cost markers: EFS share $$$$, EBS $$$, S3 buckets $$.]

● Note: EBS data is tied to the EC2 instance it is mounted on.
● Note: EFS mount targets enable multiple servers to access one data source.
● Note: Use lifecycle management to transfer unused data to a more cost-effective storage solution.

Storage Example

[Diagram: AWS Cloud with Region X / Availability Zone X and Region A / Availability Zone A. EC2 instances in the Virtual Private Cloud, each with EBS (HDD/SSD), reach S3 buckets ($$) through an Endpoint; the buckets serve https://bobic.be.s3-website-us-east-1.amazonaws.com and tier down to S3 Glacier ($).]

● Note: Region X and Availability Zone X can be any Region or Availability Zone, including Region A and Availability Zone A.
● Note: S3 buckets offer centralized storage at about 1/7 the price of a General Purpose EBS volume.
● Note: S3 Glacier offers archive storage at about 1/10 the price of EBS Cold HDD.
● Note: Use lifecycle management to transfer unused data to a more cost-effective storage solution.

Storage Example

[Diagram: S3 buckets in AWS Cloud Region A, Availability Zone A, connected to the On-Prem data center through Storage Gateway and Snowball.]

● Note: Snowball is a physical device that can store petabytes of data for transport to the AWS Cloud.
● Note: Frequently accessed data is kept in the on-premises data center for high-speed access and delivery to employees.

Cost Comparison

Parameters
● Provisioned volume: 50 TB
● Days (in month) provisioned: 15
● Maximum days (in the month): 31

Cost (estimates):
● Amazon Elastic File System (EFS)
  Standard Storage: $15,000.00
  Infrequent Access Storage: $1,250.00
  EFS is elastic storage. There is no need to provision; it scales automatically, and you pay only for what you use.
● Amazon Elastic Block Store (EBS)
  General Purpose SSD: $5,000.00
  Cold HDD: $1,250.00
  EBS is just like the HDD/SSD of a computer. It does not scale. A new drive must be mounted on the EC2 instance if the existing EBS drive is insufficient.
● Amazon Simple Storage Service (S3)
  Standard: $687.50
  Standard - Infrequent Access: $390.63
  One Zone - Infrequent Access: $312.50
● Amazon S3 Glacier (up to 12 hours to access)
  Glacier: $125.00
  Glacier Deep Archive: $30.94
  Archive storage similar to tape backup. Recovering data can take up to 12 hours.

AWS EC2 Storage Matrix

a1.metal, m5.metal, m5d.metal, c5.metal, c5d.metal, and c5n.metal are bare metal instances for:
● Specialized workloads that require direct access to bare metal infrastructure
● Legacy workloads not supported in virtual environments
● License-restricted Tier 1 software

t2 and t3 are burstable instances. The ability to burst depends on CPU credits.
● An instance running below baseline performance pays a lower fee and accumulates excess credits for bursting when needed.
● Unlimited Mode: high CPU performance for as long as needed, at a flat additional 5 cents per vCPU-hour.

Thank you