S3

Billing

  • Billing Alarms can be created from CloudWatch.
  • Billing Notifications should be enabled from the Billing Preferences section.

S3 (Simple Storage Service)

  • Provides developers and IT teams with secure, durable, highly scalable object storage
  • Provides simple web services interface to store and retrieve data
  • S3 is a safe place to store the files
  • S3 and Glacier are object storage, not block storage
  • S3 is Object-Based - allows you to upload files
  • Files can be 0 bytes to 5TB
  • Successful uploads will generate an HTTP 200 code
  • Unlimited storage
  • Files are stored in Buckets
  • Objects consist of:
    • Key (name of the object)
    • Value (the data)
    • Version ID (important for versioning)
    • Metadata (data about data you are storing)
    • Subresources
      • Access Control List
      • Torrent
  • S3 is a universal namespace. Names must be unique globally.
  • Data Consistency
    • Read-after-write consistency for PUTs of new objects
      If you write a new file and read it immediately afterwards, you will be able to view that data.
    • Overwrite PUTs and DELETEs were originally eventually consistent (changes could take some time to propagate)
      If you updated or deleted an existing file and read it immediately, you could get either the older or the newer version. Since December 2020, S3 provides strong read-after-write consistency for overwrite PUTs and DELETEs as well.
  • Tiered Storage Available
  • Lifecycle Management
  • Versioning
  • Encryption
  • MFA Delete
  • Secure your data using Access Control Lists and Bucket Policies
  • Support BitTorrent peer-to-peer protocol
    • Allows cost saving when distributing content at high scale
  • Amazon S3 can be paired with Amazon CloudSearch / DynamoDB or RDS for ease of querying metadata and locating the object reference.

Usage Patterns

  1. Store and distribute static web content and media
  2. Host entire static website
  3. Data store for computation and large-scale analytics, allowing concurrent access to multiple computing nodes
  4. Highly durable, scalable and secure solution for backup and archiving of critical data

Storage Classes

  • S3 Standard
    • 99.99% availability
    • 99.999999999% durability (11 nines) for stored objects
    • Stored redundantly across multiple devices in multiple facilities
    • Designed to sustain a loss of 2 facilities concurrently
  • S3 - IA (Infrequently Accessed)
    • For data that is accessed less frequently but requires rapid access when needed
    • Lower storage fee than S3 Standard, but you are charged a retrieval fee
  • S3 One Zone - IA (Infrequently Accessed; a newer alternative to the legacy RRS - Reduced Redundancy Storage class, not a rename of it)
    • Lower-cost option for infrequently accessed data that does not require multiple Availability Zone resilience.
  • S3 - Intelligent Tiering
    • Uses machine learning
    • Optimizes costs automatically by moving data to the most cost-effective access tier, without performance impact or operational overhead
  • S3 Glacier
    • secure, durable and low-cost storage class for backup and data archiving
    • retrieval times are configurable from minutes to hours
    • retrieval puts a copy of retrieved object in S3 Reduced Redundancy Storage (RRS) for a specified retention period (original object remains in Glacier)
    • expedited, standard and bulk retrievals
    • data is encrypted by default
  • S3 Glacier Deep Archive
    • lowest-cost storage class where a retrieval time of 12 hours is acceptable
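
The durability and availability percentages above are easier to reason about as concrete numbers. The following is a rough sketch, assuming both figures are read as annual design targets; the function names are illustrative and not part of any AWS tooling.

```python
from fractions import Fraction


def expected_annual_object_loss(num_objects: int, durability_percent: str) -> Fraction:
    """Average number of objects lost per year at the given durability."""
    # Fraction parses the decimal string exactly, avoiding float rounding
    # on a value as close to 100 as 99.999999999.
    loss_probability = 1 - Fraction(durability_percent) / 100
    return num_objects * loss_probability


def max_annual_downtime_minutes(availability_percent: str) -> float:
    """Minutes per (365-day) year a service may be unavailable at this availability."""
    unavailable_share = 1 - Fraction(availability_percent) / 100
    return float(unavailable_share * 365 * 24 * 60)


if __name__ == "__main__":
    loss = expected_annual_object_loss(10_000_000, "99.999999999")
    print(f"Expected objects lost per year: {float(loss)}")
    print(f"Years per single lost object:   {float(1 / loss)}")
    print(f"Downtime budget at 99.99%:      {max_annual_downtime_minutes('99.99')} minutes/year")
```

With 10,000,000 stored objects, 11 nines works out to an average of one lost object every 10,000 years, while 99.99% availability leaves a budget of roughly 52.56 minutes of unavailability per year.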

S3 Glacier

  • Single Archive Limited to 40TB in size
  • There’s no limit on total amount of data you can store in S3 Glacier
  • Vaults can be locked by using lockable policies
    • You can specify “undeletable records” or “time-based data retention” in “Glacier Vault Lock” policy
    • After policy is locked it becomes immutable and Amazon Glacier enforces the controls to help achieve compliance objectives
  • Can be integrated with CloudTrail to log and audit access
  • Can be accessed through REST web services, or as a storage class in S3
  • Objects archived to Glacier using S3 Lifecycle policies can be accessed only from S3 API and not from Glacier API
  • Amazon Glacier performs regular systematic data integrity checks and is built to be automatically self-healing

S3 Billing

  • Storage
  • Number of requests
  • Storage Management Pricing
  • Data Transfer Pricing
  • Transfer Acceleration
  • Cross-Region Replication

Access & Encryption

  • By default, all newly created buckets are PRIVATE
  • Control access to the buckets using:
    • Bucket Policies
    • Access Control Lists
  • Encryption in Transit
    • SSL/TLS (HTTPS)
  • Encryption At Rest (Server Side)
    (SSE = Server Side Encryption)
    • SSE-S3, S3 Managed Keys - AES-256
    • SSE-KMS, AWS Key Management Service
    • SSE-C, Customer Provided Keys
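
At the request level, the three server-side options differ mainly in the headers that accompany an upload. The sketch below builds those headers, assuming the standard `x-amz-server-side-encryption*` REST header names; the helper `sse_headers` and the sample key are hypothetical and not part of any SDK.

```python
import base64
import hashlib


def sse_headers(mode, kms_key_id=None, customer_key=None):
    """Return the HTTP headers a PUT request would carry for each SSE mode."""
    if mode == "SSE-S3":
        # S3 manages the keys; AES-256 is the only algorithm.
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id:
            # Optional: without it, the account's default KMS key is used.
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 256-bit (32-byte) key")
        key_b64 = base64.b64encode(customer_key).decode("ascii")
        # The MD5 digest lets S3 verify the key arrived intact.
        md5_b64 = base64.b64encode(hashlib.md5(customer_key).digest()).decode("ascii")
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key": key_b64,
            "x-amz-server-side-encryption-customer-key-MD5": md5_b64,
        }
    raise ValueError(f"unknown SSE mode: {mode}")


if __name__ == "__main__":
    print(sse_headers("SSE-S3"))
    print(sse_headers("SSE-KMS", kms_key_id="alias/example-key"))  # hypothetical key alias
    print(sorted(sse_headers("SSE-C", customer_key=bytes(32))))    # all-zero demo key only
```

With SSE-C the key travels with every request, so it should only ever be sent over HTTPS.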

Versioning

  • Stores all versions of an object (including all writes and even if you delete an object)
  • Great backup tool
  • Once enabled, Versioning cannot be disabled, only suspended.
  • Integrates with Lifecycle rules
  • Versioning’s MFA Delete capability, which uses multi-factor authentication, can be used to provide an additional layer of security.
  • The size of the bucket is the sum of all versions of the files stored in it
  • A specific version of a file can be deleted
  • Deleting a file without specifying a version places a delete marker
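
The delete-marker behaviour is easier to follow with a small model. This is a toy in-memory sketch of a versioning-enabled bucket, not the S3 API; the class `VersionedBucket` and its version IDs (`v1`, `v2`, ...) are invented for illustration.

```python
import itertools


class VersionedBucket:
    """Toy model: every write or delete appends to a per-key version list."""

    def __init__(self):
        self._versions = {}            # key -> list of (version_id, data or None)
        self._ids = itertools.count(1)

    def put(self, key, data: bytes) -> str:
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def delete(self, key, version_id=None) -> str:
        if version_id is None:
            # Plain delete: nothing is removed, a delete marker (None) is added.
            marker_id = f"v{next(self._ids)}"
            self._versions.setdefault(key, []).append((marker_id, None))
            return marker_id
        # Versioned delete: permanently removes that version (or marker).
        self._versions[key] = [v for v in self._versions.get(key, []) if v[0] != version_id]
        return version_id

    def get(self, key, version_id=None) -> bytes:
        versions = self._versions.get(key, [])
        if version_id is None:
            # The newest entry is the current version; a marker hides the object.
            if not versions or versions[-1][1] is None:
                raise KeyError(key)
            return versions[-1][1]
        for vid, data in versions:
            if vid == version_id and data is not None:
                return data
        raise KeyError((key, version_id))

    def total_size(self) -> int:
        """Bucket size counts every stored version, not just the current one."""
        return sum(len(d) for vs in self._versions.values() for _, d in vs if d is not None)


if __name__ == "__main__":
    bucket = VersionedBucket()
    first = bucket.put("report.txt", b"one")
    bucket.put("report.txt", b"three")
    marker = bucket.delete("report.txt")       # object now appears deleted
    print(bucket.get("report.txt", first))     # older versions stay readable
    bucket.delete("report.txt", marker)        # removing the marker restores the object
    print(bucket.get("report.txt"), bucket.total_size())
```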

Lifecycle Management Tools & Glacier

  • Allows you to automate moving your objects between the different storage tiers
  • Can be used in conjunction with versioning
  • Can be applied to current versions and previous versions
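
A lifecycle rule is ultimately a small configuration document. The sketch below shows one possible rule covering both current and previous versions, using the field names accepted by boto3's `put_bucket_lifecycle_configuration`; the rule ID, prefix, and day counts are made-up examples, and the boto3 call itself is left as a comment because it assumes boto3 is installed and credentials are configured.

```python
import json

lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-logs",              # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},     # hypothetical key prefix
            # Current versions: Standard -> Standard-IA -> Glacier.
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            # Previous (noncurrent) versions: archive, then expire.
            "NoncurrentVersionTransitions": [
                {"NoncurrentDays": 30, "StorageClass": "GLACIER"},
            ],
            "NoncurrentVersionExpiration": {"NoncurrentDays": 365},
        }
    ]
}

if __name__ == "__main__":
    print(json.dumps(lifecycle_configuration, indent=2))
    # Assumed usage with boto3 (not executed here):
    #   boto3.client("s3").put_bucket_lifecycle_configuration(
    #       Bucket="example-bucket",
    #       LifecycleConfiguration=lifecycle_configuration,
    #   )
```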

Cross Region Replication

  • Versioning must be enabled on both the source and destination buckets for CRR to work
  • The source and destination buckets must be in different Regions
  • CRR will not replicate the objects created before the CRR Rule was added
  • Delete markers are not replicated
  • Deleting individual versions or delete markers will not be replicated
  • All subsequently updated files will be replicated automatically

Amazon S3 Transfer Acceleration

Using S3 Transfer Acceleration:

  1. Enable Transfer Acceleration on S3 Bucket
  2. Modify Amazon S3 PUT and GET requests to use the S3 Transfer Acceleration endpoint domain name (<bucket-name>.s3-accelerate.amazonaws.com). The regular endpoint remains accessible.
  3. Some customers have measured performance improvements exceeding 500%
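
Switching to the accelerate endpoint only changes the host name in the request URL. A minimal sketch, assuming virtual-hosted-style URLs; the helper `object_url` and the bucket and key names are illustrative.

```python
from urllib.parse import quote


def object_url(bucket: str, key: str, accelerate: bool = False) -> str:
    """Build a virtual-hosted-style URL for an object."""
    if accelerate and "." in bucket:
        # Transfer Acceleration requires a DNS-compliant bucket name without periods.
        raise ValueError("bucket names containing '.' cannot use Transfer Acceleration")
    host = "s3-accelerate.amazonaws.com" if accelerate else "s3.amazonaws.com"
    # quote() percent-encodes spaces and similar characters but keeps '/'.
    return f"https://{bucket}.{host}/{quote(key)}"


if __name__ == "__main__":
    print(object_url("my-bucket", "photos/cat 1.jpg"))
    print(object_url("my-bucket", "photos/cat 1.jpg", accelerate=True))
```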

Amazon S3 Notifications

  • Can be issued when certain events happen in your bucket
  • Notifications can be issued to Amazon SQS, SNS Topics and Lambda functions
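
A notification setup pairs event types with destinations. The sketch below uses the section and field names of boto3's `put_bucket_notification_configuration`; all ARNs are placeholders, and `targets_for` is a local helper written for this example to show how wildcard event names select destinations. It is not how S3 itself is implemented.

```python
from fnmatch import fnmatchcase

notification_configuration = {
    "QueueConfigurations": [
        {"QueueArn": "arn:aws:sqs:us-east-1:123456789012:uploads-queue",
         "Events": ["s3:ObjectCreated:*"]},
    ],
    "TopicConfigurations": [
        {"TopicArn": "arn:aws:sns:us-east-1:123456789012:deletions-topic",
         "Events": ["s3:ObjectRemoved:*"]},
    ],
    "LambdaFunctionConfigurations": [
        {"LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:make-thumbnail",
         "Events": ["s3:ObjectCreated:Put"]},
    ],
}

ARN_FIELDS = {
    "QueueConfigurations": "QueueArn",
    "TopicConfigurations": "TopicArn",
    "LambdaFunctionConfigurations": "LambdaFunctionArn",
}


def targets_for(event_name: str, config: dict) -> list:
    """Return the destination ARNs whose event patterns match event_name."""
    matches = []
    for section, arn_field in ARN_FIELDS.items():
        for entry in config.get(section, []):
            if any(fnmatchcase(event_name, pattern) for pattern in entry["Events"]):
                matches.append(entry[arn_field])
    return matches


if __name__ == "__main__":
    for event in ("s3:ObjectCreated:Put", "s3:ObjectRemoved:Delete"):
        print(event, "->", targets_for(event, notification_configuration))
```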

CloudFront

  • Edge Location - the location where the content will be cached
  • Origin - the origin of all the files that the CDN will distribute. It can be an S3 Bucket, an EC2 Instance, an Elastic Load Balancer, or Route 53
  • Distribution - The name that is given to the CDN which consists of a collection of Edge Locations
  • If an Edge Location does not have a file in the cache, it will download it from the Origin using optimized networks
  • Objects are cached for the life of the TTL (Time to Live)
  • Edge locations are not just read-only, you can write to them too
  • Types of Distribution supported:
    • Web Distribution
    • RTMP - Used for Media Streaming
  • Invalidation
    • Clears the cache from the Edge Locations
  • You can invalidate cached objects, but you will be charged
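
The interplay of origin fetches, TTL, and invalidation can be illustrated with a toy cache. This is a simplified model of a single edge location, not the CloudFront API; the `EdgeCache` class is invented for this example, and time is passed in explicitly so the behaviour is deterministic.

```python
class EdgeCache:
    """Toy edge location: serves from cache until the TTL expires."""

    def __init__(self, origin: dict, ttl_seconds: int):
        self.origin = origin          # path -> content, stands in for S3/EC2/ELB
        self.ttl = ttl_seconds
        self._cache = {}              # path -> (content, expires_at)
        self.origin_fetches = 0

    def get(self, path: str, now: float) -> str:
        entry = self._cache.get(path)
        if entry is not None and now < entry[1]:
            return entry[0]           # cache hit: origin is not contacted
        # Cache miss or expired entry: fetch from the origin and cache it.
        content = self.origin[path]
        self.origin_fetches += 1
        self._cache[path] = (content, now + self.ttl)
        return content

    def invalidate(self, path: str) -> None:
        """Drop a cached object before its TTL runs out."""
        self._cache.pop(path, None)


if __name__ == "__main__":
    origin = {"/index.html": "v1"}
    edge = EdgeCache(origin, ttl_seconds=60)
    print(edge.get("/index.html", now=0))    # first request goes to the origin
    origin["/index.html"] = "v2"
    print(edge.get("/index.html", now=30))   # still the cached copy
    edge.invalidate("/index.html")
    print(edge.get("/index.html", now=31))   # refetched after invalidation
```

Without the invalidation, the stale copy would keep being served until the TTL expired, which is why invalidations exist despite the charge.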

Snowball

Petabyte-scale data transporter solution that uses secure appliances to transfer large amounts of data into and out of AWS.

  • Snowball
    • Import to S3
    • Export from S3
    • Types
      • 50TB
      • 80TB
    • Using it can be cheaper than using high-speed internet
  • Snowball Edge
    • A 100TB data transfer device with on-board storage and compute capabilities. Can be used to move large amounts of data into and out of AWS.
    • Applications will continue to run even when they are not able to access the cloud
  • Snowmobile
    • Exabyte-scale data transfer service used to move extremely large amounts of data to AWS.
    • Can transfer up to 100 PB per Snowmobile, a 45-foot-long ruggedized shipping container pulled by a semi-trailer truck.
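
Whether a physical appliance beats the network depends mostly on how long the upload would take. A back-of-the-envelope sketch, assuming decimal units (1 TB = 10^12 bytes) and a hypothetical `utilization` factor for the share of the link actually available to the transfer:

```python
def transfer_days(terabytes: float, megabits_per_second: float, utilization: float = 0.8) -> float:
    """Days needed to push the given volume through a network link."""
    bits = terabytes * 10**12 * 8
    effective_bits_per_second = megabits_per_second * 10**6 * utilization
    return bits / effective_bits_per_second / 86_400   # 86,400 seconds per day


if __name__ == "__main__":
    for link_mbps in (100, 1_000, 10_000):
        days = transfer_days(50, link_mbps)
        print(f"50 TB over {link_mbps} Mbps at 80% utilization: {days:.1f} days")
```

At 100 Mbps, filling a single 50TB Snowball's worth of data over the wire takes well over a month even on a fully dedicated link, which is the kind of case the appliance is meant for.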

Cost Model

  • Service Fee (per job)
  • Extra day charges as required (first 10 days of onsite usage are free)
  • Data Transfer

Storage Gateway

Connects an on-premises software appliance with cloud-based storage to provide seamless and secure integration between an organization’s on-premises IT environment and AWS’s storage infrastructure.

The service enables you to securely store data to the AWS cloud for scalable and cost-effective storage.

Can be installed as a VM image on a host in a data center. Supports either VMware ESXi or Microsoft Hyper-V hypervisors.

Physical appliances are available as well.

Types of Storage Gateways:

  • File Gateway (NFS)
    For files: files are stored as objects in your S3 buckets and accessed through an NFS mount point. Ownership, permissions, and timestamps are stored in S3 user-metadata.
  • Volume Gateway (iSCSI)
    An application can use the disk volumes using iSCSI block protocol.
    Data written to the volumes can be asynchronously backed up as point-in-time snapshots of your volumes, and stored in the cloud as Amazon EBS snapshots.
    Snapshots are incremental backups and only changed blocks will be charged.
    • Stored Volumes
      Data is stored locally and asynchronously backed up to S3 in the form of EBS snapshots. (1GB - 16TB volume size)
    • Cached Volumes
      Data is stored in Amazon S3, while frequently accessed data is retained locally in your storage gateway. This minimizes the need to scale on-premises storage infrastructure, while still providing your applications with low-latency access to their frequently accessed data. (1GB - 32TB volume size)
  • Tape Gateway (VTL - Virtual Tape Library)
    Data archiving to AWS Cloud. Lets you leverage your existing tape-based backup application infrastructure to store data on virtual tape cartridges that you create on your tape gateway.
    Tape gateway is preconfigured with a media changer and tape drives, which are available to your existing client backup applications as iSCSI devices.

FAQs

  • The total volume of data is unlimited
  • Individual objects can have a max size of 5 TB
  • The largest object that can be uploaded in a single PUT is 5 GB
  • For objects larger than 100 MB, users should consider using the multipart upload functionality
  • Amazon uses S3 for its developers and a wide variety of projects
  • Amazon S3 is a simple key-based object store. Tags can be added to objects to organize the data.
  • Pricing components include: storage used, data transfer and data requests
  • Amazon Macie - AI-powered security service that helps you prevent data loss by discovering, classifying, and protecting sensitive data stored in Amazon S3
  • Amazon S3 uses a combination of Content-MD5 checksums and cyclic redundancy checks (CRCs) to detect data corruption
  • AZs are automatically assigned in Amazon S3 based on the storage class used
  • If the source object is uploaded using the multipart upload feature, then it is replicated using the same number of parts and part size. For example, a 100 GB object uploaded using the multipart upload feature (800 parts of 128 MB each) will incur request cost associated with 802 requests (800 Upload Part requests + 1 Initiate Multipart Upload request + 1 Complete Multipart Upload request) when replicated. You will incur a request charge of $0.00401 (802 requests x $0.005 per 1,000 requests) and a charge of $2.00 ($0.020 per GB transferred x 100 GB) for inter-region data transfer. After replication, the 100 GB will incur storage charges based on the destination region.
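
The replication cost example in the last item can be reproduced step by step. The sketch below uses the prices quoted in that example purely as inputs; they are not current AWS pricing, and the function name is illustrative.

```python
import math
from decimal import Decimal


def replication_cost(object_gb: int, part_mb: int,
                     price_per_1000_requests: str, transfer_price_per_gb: str):
    """Return (parts, requests, request_cost, transfer_cost) for one replicated object."""
    parts = math.ceil(object_gb * 1024 / part_mb)
    # One Upload Part request per part, plus Initiate and Complete Multipart Upload.
    requests = parts + 2
    # Decimal keeps the small dollar amounts exact.
    request_cost = Decimal(requests) * Decimal(price_per_1000_requests) / 1000
    transfer_cost = Decimal(object_gb) * Decimal(transfer_price_per_gb)
    return parts, requests, request_cost, transfer_cost


if __name__ == "__main__":
    parts, requests, request_cost, transfer_cost = replication_cost(100, 128, "0.005", "0.020")
    print(f"{parts} parts, {requests} requests")
    print(f"request charge:  ${request_cost}")
    print(f"transfer charge: ${transfer_cost}")
```

For the 100 GB object this gives 800 parts and 802 requests, a request charge of $0.00401, and a transfer charge of $2.00, matching the figures above. The transfer charge dominates, so part size has little effect on the total.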