AWS - Storage

1. Object Storage

Amazon S3 (Simple Storage Service) is the cornerstone of AWS's storage portfolio. It is a highly scalable, durable, and secure object storage service designed to store and retrieve any amount of data from anywhere.

  • Primary Use: Ideal for a vast range of use cases, including cloud-native applications, data lakes, backup and restore, archival, and content distribution for websites and mobile applications.
  • Key Features: It offers a simple web service interface, is designed for 99.999999999% (11 nines) of durability, and provides various storage classes (like S3 Standard, S3 Intelligent-Tiering, and S3 Glacier) to optimize costs based on access frequency.
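To put the 11-nines durability figure in perspective, the arithmetic can be worked through in a few lines. This is only a rough sketch: it assumes the figure is read as an annual, per-object value and that object losses are independent, and the function names are illustrative rather than part of any AWS SDK.

```python
from fractions import Fraction

# Design target quoted above: 99.999999999% durability (11 nines).
# Assumption: treat this as an annual, per-object figure with independent losses.
ANNUAL_LOSS_PROBABILITY = Fraction(1, 10**11)  # 1 - 0.99999999999, kept exact


def expected_objects_lost_per_year(object_count: int) -> Fraction:
    """Expected number of objects lost in one year for a given object count."""
    return object_count * ANNUAL_LOSS_PROBABILITY


def years_per_expected_loss(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_objects_lost_per_year(object_count)


if __name__ == "__main__":
    n = 10_000_000
    print(expected_objects_lost_per_year(n))  # 1/10000
    print(years_per_expected_loss(n))         # 10000
```

Under these assumptions, storing ten million objects would lose on average one object every 10,000 years. Exact fractions are used because subtracting 0.99999999999 from 1 in floating point introduces rounding noise at this scale.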

S3 has been so widely adopted that many other products and services expose S3-compatible APIs, for example NetApp StorageGRID, Backblaze B2, and MinIO.
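In practice, S3 compatibility usually means an ordinary S3 client can be pointed at another service by overriding its endpoint. The sketch below assumes the boto3 SDK is installed and that credentials are configured separately. The helper names and the endpoint URL are hypothetical placeholders, not values from any particular product.

```python
from typing import Optional


def s3_client_kwargs(endpoint_url: Optional[str] = None,
                     region_name: str = "us-east-1") -> dict:
    """Build keyword arguments for boto3.client().

    With endpoint_url=None the client talks to Amazon S3 itself; with a URL
    it talks to whatever S3-compatible service listens at that address.
    """
    kwargs = {"service_name": "s3", "region_name": region_name}
    if endpoint_url is not None:
        kwargs["endpoint_url"] = endpoint_url
    return kwargs


def make_s3_client(endpoint_url: Optional[str] = None):
    """Create the client. boto3 is imported lazily so the helper above
    can be used and tested without the SDK installed."""
    import boto3  # third-party: pip install boto3
    return boto3.client(**s3_client_kwargs(endpoint_url))


if __name__ == "__main__":
    # Placeholder endpoint; substitute the address of your own deployment.
    print(s3_client_kwargs("https://s3.example.internal"))
```

The same application code (listing buckets, uploading objects, and so on) can then run against either backend. How completely a given product implements the S3 API varies, so its own documentation is the authority on which operations are supported.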

2. Block Storage

Amazon EBS (Elastic Block Store) provides high-performance, persistent block-level storage volumes for use with Amazon EC2 instances. It is analogous to a virtual hard drive in the cloud.

  • Primary Use: Best suited for data that requires frequent and granular updates, such as boot volumes for EC2 instances, transactional and NoSQL databases, and throughput-intensive applications.
  • Key Features: Offers different volume types optimized for performance (SSD-backed) or cost (HDD-backed). An EBS volume lives in a single Availability Zone and can only be attached to instances in that zone, but it can be backed up with "snapshots," which are stored in S3 and can be used to create new volumes in other zones.

3. File Storage

Amazon EFS (Elastic File System) provides a simple, scalable, and fully managed elastic file system. It can be mounted by multiple EC2 instances simultaneously and automatically grows and shrinks as you add and remove files.

  • Primary Use: Designed for workloads that require shared access to file-based data, such as content management systems, web serving, and shared code repositories. It functions like a network-attached storage (NAS) system.
  • Key Features: It is built to be highly available and durable across multiple Availability Zones. It offers different performance modes and storage tiers to balance cost and performance.

Amazon FSx is a family of fully managed third-party file systems. It allows you to run popular, high-performance file systems in the cloud without having to manage the underlying hardware or software. The main offerings include:

  • Amazon FSx for Windows File Server: For lift-and-shift Windows applications that need a native Windows file system.
  • Amazon FSx for Lustre: For high-performance computing (HPC) workloads that require extremely fast, parallel file access.

4. Backup and Archive

Amazon S3 Glacier is a secure, durable, and extremely low-cost storage service for data archiving and long-term backup. It is integrated with S3, allowing you to move cold data to Glacier to save costs.

  • Primary Use: Archiving data that is infrequently accessed and can tolerate retrieval times of minutes to hours.
  • Key Features: It offers different retrieval options (Expedited, Standard, and Bulk) that balance cost and access time. S3 Glacier Deep Archive is an even lower-cost tier for data that is rarely accessed but must be retained for compliance.
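Moving cold data from S3 into the Glacier tiers is typically automated with a bucket lifecycle configuration. The sketch below builds one as a plain dictionary in the shape the S3 lifecycle API expects. The function name, the prefix, and the 90-day and 365-day thresholds are illustrative assumptions, not recommendations.

```python
import json


def glacier_lifecycle_rule(prefix: str,
                           glacier_after_days: int = 90,
                           deep_archive_after_days: int = 365) -> dict:
    """Return a lifecycle configuration that tiers objects under `prefix`
    to S3 Glacier first and to S3 Glacier Deep Archive later."""
    if not 0 <= glacier_after_days < deep_archive_after_days:
        raise ValueError("transitions must move to colder storage over time")
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/') or 'all'}",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                    {"Days": deep_archive_after_days,
                     "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


if __name__ == "__main__":
    # "logs/" is a hypothetical prefix used only for illustration.
    print(json.dumps(glacier_lifecycle_rule("logs/"), indent=2))
```

With boto3, a dictionary like this would be passed as the LifecycleConfiguration argument of put_bucket_lifecycle_configuration. Minimum storage durations and early-deletion charges apply to the archive tiers, so check the thresholds against current pricing before using them.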

AWS Backup is a centralized, fully managed service that makes it easy to manage backups across multiple AWS services, including EBS, EFS, and databases.

  • Key Features: It automates the backup process, allows you to create and enforce backup policies, and provides a central console to monitor backup and restore activities.

5. Hybrid and Data Transfer

AWS Storage Gateway is a hybrid cloud storage service that gives your on-premises applications access to virtually unlimited cloud storage. It connects your on-premises environment to AWS storage services like S3, EBS, and Glacier.

  • Key Features: It comes in different types, including File Gateway (for file-based access to S3), Volume Gateway (for block storage), and Tape Gateway (for virtual tape backups).

AWS Snow Family (including Snowcone, Snowball, and Snowmobile) provides physical devices to help you migrate petabyte-scale data into and out of AWS, especially in environments with limited network bandwidth.
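A quick calculation shows why shipping physical devices can beat the network at this scale. The sketch below estimates how long an online transfer would take. It assumes decimal units (1 PB = 10^15 bytes) and a link that is dedicated to the transfer unless a lower utilization is given; the function name is illustrative.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_bytes: float,
                  link_bits_per_second: float,
                  utilization: float = 1.0) -> float:
    """Days needed to push `data_bytes` over a link of the given speed.

    `utilization` is the fraction of the link actually available
    to the transfer (1.0 means the whole link).
    """
    if link_bits_per_second <= 0:
        raise ValueError("link speed must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    seconds = (data_bytes * 8) / (link_bits_per_second * utilization)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    one_petabyte = 10**15  # bytes, decimal units
    print(f"{transfer_days(one_petabyte, 10**9):.1f} days at 1 Gbit/s")
    print(f"{transfer_days(one_petabyte, 10**10):.1f} days at 10 Gbit/s")
```

One petabyte over a fully used 1 Gbit/s link takes roughly 93 days, and still more than 9 days at 10 Gbit/s, before accounting for protocol overhead or competing traffic.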

AWS DataSync is an online data transfer service that simplifies, automates, and accelerates moving data between on-premises storage systems and AWS storage services.