S3

Billing

  • Billing Alarms can be created from CloudWatch.
  • Billing Notifications should be enabled from the Billing Preferences section.
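
As a rough sketch of what such a billing alarm looks like, the function below builds the parameters for a CloudWatch alarm on the `EstimatedCharges` metric in the `AWS/Billing` namespace. The alarm name, threshold, and SNS topic ARN are made-up placeholders, and the sketch only builds a dictionary; it does not call AWS.

```python
def billing_alarm_params(threshold_usd, sns_topic_arn):
    """Build keyword arguments for a CloudWatch billing alarm (sketch only)."""
    return {
        "AlarmName": f"billing-over-{threshold_usd}-usd",  # hypothetical name
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # 6 hours, in seconds
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_usd),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # SNS topic that receives the alert
    }


# Placeholder account ID and topic name, for illustration only.
params = billing_alarm_params(
    10, "arn:aws:sns:us-east-1:123456789012:billing-alerts"
)
```

With boto3 installed and credentials configured, these parameters could be passed as `boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params)`; billing metrics are published in us-east-1.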

S3 (Simple Storage Service)

  • Provides developers and IT teams with secure, durable, highly scalable object storage
  • Provides simple web services interface to store and retrieve data
  • S3 is a safe place to store files
  • S3 and Glacier are object storage, not block storage
  • S3 is Object-Based - allows you to upload files
  • Files can be 0 bytes to 5TB
  • Successful uploads will generate an HTTP 200 code
  • Unlimited storage
  • Files are stored in Buckets
  • Objects consist of:
    • Key (name of the object)
    • Value (the data)
    • Version ID (important for versioning)
    • Metadata (data about data you are storing)
    • Subresources
      • Access Control List
      • Torrent
  • S3 is a universal namespace. Names must be unique globally.
  • Data Consistency
    • Read after Write consistency for PUTS of new Objects
      If you write a new file and read it immediately after, you will be able to view that data
    • Eventual Consistency for overwrite PUTS and DELETEs was the original model (changes could take some time to propagate)
      If you updated an existing file or deleted a file and read it immediately, you could get the older version. Since December 2020, S3 provides strong read-after-write consistency for overwrite PUTS and DELETEs as well, so a read returns the latest version.
  • Tiered Storage Available
  • Lifecycle Management
  • Versioning
  • Encryption
  • MFA Delete
  • Secure your data using Access Control Lists and Bucket Policies
  • Storage Classes
    • S3 Standard
      • 99.99% availability
      • 99.999999999% durability for S3 information. (11x9s)
      • Stored redundantly across multiple devices in multiple facilities
      • Designed to sustain a loss of 2 facilities concurrently
    • S3 - IA (Infrequently Accessed)
      • For data that is accessed less frequently but requires rapid access when needed
      • Lower fee than S3 Standard, but you are charged a retrieval fee
    • S3 One Zone - IA (Infrequently Accessed; a separate class from the older RRS - Reduced Redundancy Storage)
      • Lower-cost option for infrequently accessed data that does not require multiple Availability Zone data resilience.
    • S3 - Intelligent Tiering
      • Uses machine learning
      • Optimizes costs automatically by moving data to the most cost-effective access tier, without performance impact or operational overhead
    • S3 Glacier
      • secure, durable and low-cost storage class for backup and data archiving
      • retrieval times are configurable from minutes to hours
      • expedited, standard and bulk retrievals
      • data is encrypted by default
    • S3 Glacier Deep Archive
      • lowest-cost storage class where a retrieval time of 12 hours is acceptable

S3 Billing

  • Storage
  • Number of requests
  • Storage Management Pricing
  • Data Transfer Pricing
  • Transfer Acceleration
  • Cross-Region Replication

Access & Encryption

  • By default, all newly created buckets are PRIVATE
  • Control access to the buckets using:
    • Bucket Policies
    • Access Control Lists
  • Encryption in Transit
    • SSL/TLS (HTTPS)
  • Encryption At Rest (Server Side)
    (SSE = Server Side Encryption)

    • SSE-S3, S3 Managed Keys - AES-256
    • SSE-KMS, AWS Key Management Service
    • SSE-C, Customer Provided Keys
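
To make the three server-side encryption options concrete, here is a minimal sketch of the HTTP request headers a PUT would carry for each mode. The function name `sse_headers` is hypothetical; the header names are the standard `x-amz-server-side-encryption*` headers. For SSE-C the key is sent base64-encoded along with its MD5 digest, which is why that mode must only be used over HTTPS.

```python
import base64
import hashlib


def sse_headers(mode, customer_key=None, kms_key_id=None):
    """Return the request headers for one server-side encryption mode (sketch)."""
    if mode == "SSE-S3":
        # S3-managed keys, AES-256
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id:
            # Optional: name a specific KMS key instead of the default one
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 32-byte (256-bit) key")
        key_b64 = base64.b64encode(customer_key).decode("ascii")
        md5_b64 = base64.b64encode(hashlib.md5(customer_key).digest()).decode("ascii")
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key": key_b64,
            "x-amz-server-side-encryption-customer-key-MD5": md5_b64,
        }
    raise ValueError(f"unknown mode: {mode}")


# An all-zero key, used here only so the example is reproducible.
example = sse_headers("SSE-C", customer_key=bytes(32))
```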

Versioning

  • Stores all versions of an object (including all writes and even if you delete an object)
  • Great backup tool
  • Once enabled, Versioning cannot be disabled, only suspended.
  • Integrates with Lifecycle rules
  • Versioning's MFA Delete capability, which uses multi-factor authentication, can be used to provide an additional layer of security.
  • Size of the bucket is a sum of all versions of the files stored in the bucket
  • A specific version of the file can be deleted
  • Deletion of a file will place a delete marker
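
The behaviour of versions and delete markers can be illustrated with a small in-memory toy model. This is not how S3 is implemented; the class `VersionedBucket` and its version IDs (`v1`, `v2`, ...) are invented for the example. It shows that a plain delete only adds a marker, that older versions stay readable by ID, and that every stored version counts toward the bucket size.

```python
import itertools


class VersionedBucket:
    """Toy model of a versioning-enabled bucket (illustration only)."""

    def __init__(self):
        self._versions = {}  # key -> list of (version_id, data); data None = delete marker
        self._ids = itertools.count(1)

    def put(self, key, data):
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def delete(self, key, version_id=None):
        if version_id is None:
            # Plain delete: nothing is removed, a delete marker is stacked on top.
            return self.put(key, None)
        # Deleting a specific version (or a marker) removes it permanently.
        self._versions[key] = [v for v in self._versions.get(key, []) if v[0] != version_id]
        return version_id

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is None:
            # Latest version wins; a delete marker on top hides the object.
            if not versions or versions[-1][1] is None:
                raise KeyError(key)
            return versions[-1][1]
        for vid, data in versions:
            if vid == version_id and data is not None:
                return data
        raise KeyError(key)

    def size(self):
        # Bucket size is the sum of all stored versions.
        return sum(len(d) for vs in self._versions.values() for _, d in vs if d is not None)


bucket = VersionedBucket()
v1 = bucket.put("a.txt", b"one")
v2 = bucket.put("a.txt", b"three")
marker = bucket.delete("a.txt")  # object now looks deleted, but v1 and v2 remain
```

Deleting the marker itself with `bucket.delete("a.txt", marker)` makes the latest real version visible again, which mirrors how an accidental delete is undone in a versioned bucket.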

Lifecycle Management Tools & Glacier

  • Allows you to automate moving your objects between the different storage tiers
  • Can be used in conjunction with versioning
  • Can be applied to current versions and previous versions
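
A lifecycle rule covering both current and previous versions might look like the sketch below. The dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts; the rule ID, prefix, and day counts are arbitrary example values, and the helper `lifecycle_rule` is hypothetical.

```python
def lifecycle_rule(rule_id, prefix, ia_after_days, glacier_after_days,
                   noncurrent_glacier_after_days):
    """Build one lifecycle rule as a plain dictionary (sketch only)."""
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        # Applies to the current version of each object
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
        # Applies to previous (noncurrent) versions
        "NoncurrentVersionTransitions": [
            {"NoncurrentDays": noncurrent_glacier_after_days,
             "StorageClass": "GLACIER"},
        ],
    }


# Example values: logs move to IA after 30 days and to Glacier after 90.
config = {"Rules": [lifecycle_rule("archive-logs", "logs/", 30, 90, 30)]}
```

With boto3 and credentials available, this could be applied with `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="my-bucket", LifecycleConfiguration=config)`, where `my-bucket` is a placeholder.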

Cross Region Replication

  • Versioning must be enabled on both the source and destination buckets for CRR to work
  • Regions must be unique
  • CRR will not replicate the objects created before the CRR Rule was added
  • Delete markers are not replicated
  • Deleting individual versions or delete markers will not be replicated
  • All subsequently updated files will be replicated automatically

Transfer Acceleration

  • takes advantage of CloudFront's globally distributed edge locations
  • can improve upload and access times
  • can be tested using speed comparison tool:
    (http://s3-accelerate-speedtest.s3-accelerate.amazonaws.com)
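
Once acceleration is enabled on a bucket, requests go to a dedicated `s3-accelerate` endpoint instead of the regional one. The helper below is a small sketch of how that URL is formed; `accelerate_url` is a hypothetical name and the bucket and key are placeholders.

```python
from urllib.parse import quote


def accelerate_url(bucket, key):
    """Build the Transfer Acceleration URL for an object (sketch only)."""
    if "." in bucket:
        # Acceleration requires a DNS-compliant bucket name without periods.
        raise ValueError("bucket names containing dots cannot use acceleration")
    return f"https://{bucket}.s3-accelerate.amazonaws.com/{quote(key)}"


url = accelerate_url("my-bucket", "photos/cat 1.jpg")
```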

CloudFront

  • Edge Location - the location where the content will be cached
  • Origin - the origin of all the files that the CDN will distribute. It can be an S3 Bucket, an EC2 Instance, an Elastic Load Balancer, or Route 53
  • Distribution - The name that is given to the CDN which consists of a collection of Edge Locations
  • If Edge Location does not have a file in the cache, it will download it from the Origin using optimized networks
  • Objects are cached for the life of the TTL (Time to Live)
  • Edge locations are not just read-only; you can write to them too
  • Types of Distribution supported:
    • Web Distribution
    • RTMP - Used for Media Streaming
  • Invalidation
    • Clears the cache from the Edge Locations
  • You can invalidate cached objects, but you will be charged

Snowball

Petabyte-scale data transporter solution that uses secure appliances to transfer large amounts of data into and out of AWS.

  • Snowball
    • Import to S3
    • Export from S3
    • Types
      • 50TB
      • 80TB
    • Using it can be cheaper than using high-speed internet
  • Snowball Edge
    • A 100TB data transfer device with on-board storage and compute capabilities. Can be used to move large amounts of data into and out of AWS.
    • Applications will continue to run even when they are not able to access the cloud
  • Snowmobile
    • Exabyte-scale data transfer service used to move extremely large amounts of data to AWS.
    • Each Snowmobile can transfer up to 100 PB. It is a 45-foot long ruggedized shipping container, pulled by a semi-trailer truck.
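
To get a feel for why shipping an appliance can beat a network link, the sketch below estimates how long an upload would take. It covers transfer time only, not cost. The 80% link utilization is an assumption, and decimal units (1 TB = 10^12 bytes) are used.

```python
def transfer_days(terabytes, link_mbps, utilization=0.8):
    """Estimate days needed to push data over a network link (rough sketch)."""
    bits = terabytes * 10**12 * 8                      # decimal terabytes to bits
    seconds = bits / (link_mbps * 10**6 * utilization)  # usable bits per second
    return seconds / 86400


# One 80TB Snowball's worth of data over a 100 Mbps connection.
days = transfer_days(80, 100)
```

Under these assumptions the upload takes roughly three months, which is the kind of case where a Snowball is worth considering.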

Storage Gateway

Connects an on-premises software appliance with cloud-based storage to provide seamless and secure integration between an organization's on-premises IT environment and AWS's storage infrastructure.

The service enables you to securely store data to the AWS cloud for scalable and cost-effective storage.

Can be installed as a VM image on a host in a data center. Supports either VMware ESXi or Microsoft Hyper-V hypervisors.

Physical appliances are available as well.

Types of Storage Gateways:

  • File Gateway (NFS)
    For files: files are stored as objects in your S3 buckets and accessed through an NFS mount point. Ownership, permissions, and timestamps are stored in S3 user-metadata.
  • Volume Gateway (iSCSI)
    An application can use the disk volumes using iSCSI block protocol.
    Data written to the volumes can be asynchronously backed up as point-in-time snapshots of your volumes, and stored in the cloud as Amazon EBS snapshots.
    Snapshots are incremental backups and only changed blocks will be charged.

    • Stored Volumes
      Data is stored locally and asynchronously backed up to S3 in the form of EBS snapshots. (1GB - 16TB volume size)
    • Cached Volumes
      Data is stored on AWS S3, while retaining frequently accessed data locally in your storage gateway. This minimizes the need to scale on-premises storage infrastructure, while still providing your applications with low-latency access to their frequently accessed data. (1GB - 32TB volume size)
  • Tape Gateway (VTL)
    Data archiving to AWS Cloud. Lets you leverage your existing tape-based backup application infrastructure to store data on virtual tape cartridges that you create on your tape gateway.
    Tape gateway is preconfigured with a media changer and tape drives, which are available to your existing client backup applications as iSCSI devices.

FAQs

  • The total volume of data is unlimited
  • Individual objects can have a max size of 5 TB
  • Largest object uploaded in a single PUT is 5 GB
  • For objects larger than 100 MB users should consider using multipart upload functionality
  • Amazon's own developers use S3 for a wide variety of projects
  • Amazon S3 is a simple key-based object store. Tags can be added to the objects to organize the data.
  • Pricing components include: storage used, data transfer and data requests
  • Amazon Macie - AI-powered security service that helps you prevent data loss by discovering, classifying, and protecting sensitive data stored in Amazon S3
  • Amazon S3 uses a combination of Content-MD5 checksums and cyclic redundancy checks (CRCs) to detect data corruption
  • AZs are automatically assigned in Amazon S3 based on the storage class used
  • If the source object is uploaded using the multipart upload feature, then it is replicated using the same number of parts and part size. For example, a 100 GB object uploaded using the multipart upload feature (800 parts of 128 MB each) will incur request cost associated with 802 requests (800 Upload Part requests + 1 Initiate Multipart Upload request + 1 Complete Multipart Upload request) when replicated. You will incur a request charge of $0.00401 (802 requests x $0.005 per 1,000 requests) and a charge of $2.00 ($0.020 per GB transferred x 100 GB) for inter-region data transfer. After replication, the 100 GB will incur storage charges based on the destination region.
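
The arithmetic in the replication example above can be checked with a few lines. This is just a sketch that reproduces the numbers quoted in the text; the prices are the ones given there and are not current AWS pricing, and the function name `replication_cost` is hypothetical.

```python
import math


def replication_cost(object_gb, part_mb, request_price_per_1000, transfer_price_per_gb):
    """Reproduce the request and transfer charges for replicating a multipart object."""
    parts = math.ceil(object_gb * 1024 / part_mb)  # number of Upload Part requests
    requests = parts + 2                            # plus Initiate and Complete
    request_cost = requests * request_price_per_1000 / 1000
    transfer_cost = object_gb * transfer_price_per_gb
    return parts, requests, request_cost, transfer_cost


# 100 GB object, 128 MB parts, $0.005 per 1,000 requests, $0.020 per GB transferred.
parts, requests, request_cost, transfer_cost = replication_cost(100, 128, 0.005, 0.020)
```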