- Billing Alarms can be created from CloudWatch.
- Billing Notifications should be enabled from the Billing Preferences section.
S3 (Simple Storage Service)
- Provides developers and IT teams with secure, durable, highly scalable object storage
- Provides simple web services interface to store and retrieve data
- S3 is a safe place to store your files
- S3 and Glacier are object storage, not block storage
- S3 is Object-Based - allows you to upload files
- Files can be 0 bytes to 5TB
- Successful uploads will generate an HTTP 200 code
- Unlimited storage
- Files are stored in Buckets
- Objects consist of:
- Key (name of the object)
- Value (the data)
- Version ID (important for versioning)
- Metadata (data about data you are storing)
- Access Control List
- S3 is a universal namespace. Names must be unique globally.
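A minimal boto3 sketch of the object model above; the bucket name, key, and metadata are placeholder assumptions:

```python
import boto3

s3 = boto3.client("s3")

# Upload an object: the key is the object's name, the body is its value,
# and arbitrary metadata (data about the data) can be attached.
response = s3.put_object(
    Bucket="my-example-bucket",         # bucket names are globally unique
    Key="reports/2023/summary.txt",     # object key (name)
    Body=b"hello from S3",              # object value (the data)
    Metadata={"department": "finance"}  # user-defined metadata
)

# A successful upload returns HTTP 200.
assert response["ResponseMetadata"]["HTTPStatusCode"] == 200

# Read the object back.
obj = s3.get_object(Bucket="my-example-bucket", Key="reports/2023/summary.txt")
print(obj["Body"].read())
```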
- Data Consistency
- Read after Write consistency for PUTS of new Objects
If you write a new file and read it immediately after, you will be able to view that data
- Eventual Consistency for overwrite PUTS and DELETEs (can take some time to propagate)
If you update an existing file or delete a file and read it immediately, you may get the older version, or you may not. Changes to objects can take a little bit of time to propagate.
- Tiered Storage Available
- Lifecycle Management
- MFA Delete
- Secure your data using Access Control Lists and Bucket Policies
- Supports the BitTorrent peer-to-peer protocol
- Allows cost savings when distributing content at large scale
- Amazon S3 can be paired with Amazon CloudSearch / DynamoDB or RDS for ease of querying metadata and locating the object reference.
- Store and distribute static web content and media
- Host entire static website
- Data store for computation and large-scale analytics, allowing concurrent access to multiple computing nodes
- Highly durable, scalable and secure solution for backup and archiving of critical data
- S3 Standard
- 99.99% availability
- 99.999999999% durability of objects (eleven 9s)
- Stored redundantly across multiple devices in multiple facilities
- Designed to sustain a loss of 2 facilities concurrently
- S3 - IA (Infrequently Accessed)
- For data that is accessed less frequently but requires rapid access when needed
- Lower fee than S3 Standard, but you are charged a retrieval fee
- S3 One Zone - IA (Infrequently Accessed; successor to the legacy RRS - Reduced Redundancy Storage)
- Lower-cost option for infrequently accessed data that does not require multi-Availability-Zone resilience (data is stored in a single AZ)
- S3 - Intelligent Tiering
- Uses machine learning
- Optimizes costs automatically by moving data to the most cost-effective access tier, without performance impact or operational overhead
- S3 Glacier
- secure, durable and low-cost storage class for backup and data archiving
- retrieval times are configurable from minutes to hours
- retrieval puts a copy of retrieved object in S3 Reduced Redundancy Storage (RRS) for a specified retention period (original object remains in Glacier)
- expedited, standard and bulk retrievals
- data is encrypted by default
- S3 Glacier Deep Archive
- lowest-cost storage class where a retrieval time of 12 hours is acceptable
- A single archive is limited to 40TB in size
- There is no limit on the total amount of data you can store in S3 Glacier
- Vaults can be locked using Vault Lock policies
- You can specify “undeletable records” or “time-based data retention” in “Glacier Vault Lock” policy
- After policy is locked it becomes immutable and Amazon Glacier enforces the controls to help achieve compliance objectives
- Can be integrated with CloudTrail to log and audit access
- Can be accessed through its REST API, or as a storage class in S3
- Objects archived to Glacier using S3 Lifecycle policies can be accessed only from S3 API and not from Glacier API
- Amazon Glacier performs regular systematic data integrity checks and is built to be automatically self-healing
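The storage class is chosen per object at upload time. A minimal boto3 sketch, assuming a placeholder bucket and key, that uploads straight into Glacier through the S3 API and then requests an expedited restore (one of the retrieval options listed above):

```python
import boto3

s3 = boto3.client("s3")

# Upload directly into a colder storage class (the default is STANDARD).
s3.put_object(
    Bucket="my-example-bucket",
    Key="archive/2020-logs.tar.gz",
    Body=open("2020-logs.tar.gz", "rb"),
    StorageClass="GLACIER",  # also: STANDARD_IA, ONEZONE_IA, INTELLIGENT_TIERING, DEEP_ARCHIVE
)

# Objects in Glacier must be restored before they can be read; the restored
# copy is temporary and the original stays in Glacier.
s3.restore_object(
    Bucket="my-example-bucket",
    Key="archive/2020-logs.tar.gz",
    RestoreRequest={
        "Days": 7,                                     # how long to keep the restored copy
        "GlacierJobParameters": {"Tier": "Expedited"}  # Expedited | Standard | Bulk
    },
)
```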
S3 Pricing - you are charged for:
- Storage used
- Number of requests
- Storage Management Pricing
- Data Transfer Pricing
- Transfer Acceleration
- Cross-Region Replication
Access & Encryption
- By default, all newly created buckets are PRIVATE
- Control access to the buckets using:
- Bucket Policies
- Access Control Lists
- Encryption in Transit
- Encryption At Rest (Server Side)
(SSE = Server Side Encryption)
- SSE-S3, S3 Managed Keys - AES-256
- SSE-KMS, AWS Key Management Service
- SSE-C, Customer Provided Keys
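A hedged boto3 sketch of the access and encryption controls above; the bucket name, policy, and KMS key alias are placeholder assumptions:

```python
import json
import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket"  # placeholder

# Block all public access at the bucket level (new buckets are private by default).
s3.put_public_access_block(
    Bucket=bucket,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)

# Bucket policy that denies unencrypted transport (encryption in transit).
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))

# SSE-S3: S3-managed keys (AES-256).
s3.put_object(Bucket=bucket, Key="sse-s3.txt", Body=b"data",
              ServerSideEncryption="AES256")

# SSE-KMS: keys managed by AWS KMS (the key alias is a placeholder).
s3.put_object(Bucket=bucket, Key="sse-kms.txt", Body=b"data",
              ServerSideEncryption="aws:kms",
              SSEKMSKeyId="alias/my-example-key")
```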
S3 Versioning
- Stores all versions of an object (including all writes, and even if you delete an object)
- Great backup tool
- Once enabled, Versioning cannot be disabled, only suspended.
- Integrates with Lifecycle rules
- Versioning’s MFA Delete capability, which uses multi-factor authentication, can be used to provide an additional layer of security.
- Size of the bucket is a sum of all versions of the files stored in the bucket
- A specific version of the file can be deleted
- Deletion of a file will place a delete marker
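A minimal boto3 sketch of versioning in practice, assuming a placeholder bucket:

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket"  # placeholder

# Enable versioning (it can later only be suspended, never disabled).
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},  # or "Suspended"
)

# Every subsequent write creates a new version of the same key.
v1 = s3.put_object(Bucket=bucket, Key="notes.txt", Body=b"first draft")
v2 = s3.put_object(Bucket=bucket, Key="notes.txt", Body=b"second draft")

# List all versions; a plain DELETE would only add a delete marker on top.
versions = s3.list_object_versions(Bucket=bucket, Prefix="notes.txt")
for v in versions.get("Versions", []):
    print(v["VersionId"], v["IsLatest"])

# Deleting a specific version removes that version permanently.
s3.delete_object(Bucket=bucket, Key="notes.txt", VersionId=v1["VersionId"])
```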
Lifecycle Management
- Allows you to automate moving your objects between the different storage tiers
- Can be used in conjunction with versioning
- Can be applied to current versions and previous versions
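A hedged boto3 sketch of a lifecycle rule; the bucket, prefix, and transition days are placeholder assumptions:

```python
import boto3

s3 = boto3.client("s3")

# Move current versions to IA after 30 days and to Glacier after 90 days,
# and expire old (non-current) versions after a year.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",  # placeholder
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-down-and-expire",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "NoncurrentVersionTransitions": [
                {"NoncurrentDays": 30, "StorageClass": "GLACIER"},
            ],
            "NoncurrentVersionExpiration": {"NoncurrentDays": 365},
        }]
    },
)
```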
Cross Region Replication
- Versioning must be enabled on both the source and destination buckets for CRR to work
- Source and destination buckets must be in different Regions
- CRR will not replicate the objects created before the CRR Rule was added
- Delete markers are not replicated
- Deleting individual versions or delete markers will not be replicated
- All subsequently updated files will be replicated automatically
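A minimal boto3 sketch of a CRR rule; the bucket names and IAM role ARN are placeholders, and both buckets must already have versioning enabled:

```python
import boto3

s3 = boto3.client("s3")

# The role must allow S3 to replicate objects on your behalf (placeholder ARN).
s3.put_bucket_replication(
    Bucket="my-source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/my-replication-role",
        "Rules": [{
            "ID": "replicate-everything",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {},  # empty filter = the whole bucket
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {
                "Bucket": "arn:aws:s3:::my-destination-bucket",
                "StorageClass": "STANDARD_IA",  # optionally replicate into a cheaper class
            },
        }],
    },
)
```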
Amazon S3 Transfer Acceleration
Using S3 Transfer Acceleration:
- Enable Transfer Acceleration on S3 Bucket
- Modify Amazon S3 PUT and GET requests to use the s3-accelerate endpoint domain name (bucketname.s3-accelerate.amazonaws.com) - the regular endpoint will still be accessible
- Some customers have measured performance improvements exceeding 500%
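A minimal boto3 sketch, assuming a placeholder bucket, that enables Transfer Acceleration and uploads through the accelerate endpoint:

```python
import boto3
from botocore.config import Config

s3 = boto3.client("s3")

# Turn on Transfer Acceleration for the bucket.
s3.put_bucket_accelerate_configuration(
    Bucket="my-example-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# A client that sends requests to the s3-accelerate endpoint;
# the regular endpoint keeps working alongside it.
s3_accel = boto3.client(
    "s3",
    config=Config(s3={"use_accelerate_endpoint": True}),
)
s3_accel.upload_file("big-file.bin", "my-example-bucket", "big-file.bin")
```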
Amazon S3 Notifications
- Can be issued when certain events happen in your bucket
- Notifications can be issued to Amazon SQS, SNS Topics and Lambda functions
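A hedged boto3 sketch wiring object-created events to an SNS topic; the bucket, topic ARN, and prefix are placeholder assumptions:

```python
import boto3

s3 = boto3.client("s3")

# Publish an event to an SNS topic whenever a new object lands under uploads/.
# The topic must already allow s3.amazonaws.com to publish to it.
s3.put_bucket_notification_configuration(
    Bucket="my-example-bucket",
    NotificationConfiguration={
        "TopicConfigurations": [{
            "TopicArn": "arn:aws:sns:us-east-1:123456789012:my-upload-topic",
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {"Key": {"FilterRules": [
                {"Name": "prefix", "Value": "uploads/"},
            ]}},
        }],
        # QueueConfigurations (SQS) and LambdaFunctionConfigurations work the same way.
    },
)
```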
CloudFront (CDN)
- Edge Location - the location where the content will be cached
- Origin - the origin of all the files that the CDN will distribute. It can be an S3 Bucket, an EC2 Instance, an Elastic Load Balancer, or Route 53
- Distribution - The name that is given to the CDN which consists of a collection of Edge Locations
- If Edge Location does not have a file in the cache, it will download it from the Origin using optimized networks
- Objects are cached for the life of the TTL (Time to Live)
- Edge Locations are not just read-only; you can write to them too
- Types of Distribution supported:
- Web Distribution
- RTMP - Used for Media Streaming
- Invalidation clears the cache at the Edge Locations
- You can invalidate cached objects, but you will be charged
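A minimal boto3 sketch of an invalidation; the distribution ID and path are placeholders:

```python
import time
import boto3

cloudfront = boto3.client("cloudfront")

# Invalidate cached copies of everything under /images/ at all Edge Locations.
# Paths beyond the free monthly allowance are charged per path.
cloudfront.create_invalidation(
    DistributionId="EDFDVBD6EXAMPLE",  # placeholder
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/images/*"]},
        "CallerReference": str(time.time()),  # any unique string
    },
)
```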
Snowball
Petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of AWS.
- Import to S3
- Export from S3
- Using it can be cheaper than using high-speed internet
- Snowball Edge
- A 100TB data transfer device with on-board storage and compute capabilities; can be used to move large amounts of data into and out of AWS
- Applications will continue to run even when they are not able to access the cloud
Snowmobile
- Exabyte-scale data transfer service used to move extremely large amounts of data to AWS.
- Can transfer up to 100 PB per SnowMobile, 45-foot long ruggedized shipping container, pulled by a semi-trailer truck.
Snowball Pricing - you are charged for:
- Service Fee (per job)
- Extra day charges as required (first 10 days of onsite usage are free)
- Data Transfer
Storage Gateway
Connects an on-premises software appliance with cloud-based storage to provide seamless and secure integration between an organization’s on-premises IT environment and AWS’s storage infrastructure.
The service enables you to securely store data to the AWS cloud for scalable and cost-effective storage.
Can be installed as a VM image on a host in a data center. Supports either VMware ESXi or Microsoft Hyper-V hypervisors.
Physical appliances are available as well.
Types of Storage Gateways:
- File Gateway (NFS)
For files: files are stored as objects in your S3 buckets and accessed through an NFS mount point. Ownership, permissions, and timestamps are stored in S3 user-metadata.
- Volume Gateway (iSCSI)
Applications can access the disk volumes via the iSCSI block protocol.
Data written to the volumes can be asynchronously backed up as point-in-time snapshots of your volumes, and stored in the cloud as Amazon EBS snapshots.
Snapshots are incremental backups and only changed blocks will be charged.
- Stored Volumes
Data is stored locally and asynchronously backed up to S3 in the form of EBS snapshots. (1GB - 16TB volume size)
- Cached Volumes
Data is stored in Amazon S3, while retaining frequently accessed data locally in your storage gateway. This minimizes the need to scale on-premises storage infrastructure, while still providing your applications with low-latency access to their frequently accessed data. (1GB - 32TB volume size)
- Tape Gateway (VTL)
Data archiving to AWS Cloud. Lets you leverage your existing tape-based backup application infrastructure to store data on virtual tape cartridges that you create on your tape gateway.
Tape gateway is preconfigured with a media changer and tape drives, which are available to your existing client backup applications as iSCSI devices.
S3 Limits & Additional Facts
- The total volume of data is unlimited
- Individual objects can have a max size of 5TB
- The largest object that can be uploaded in a single PUT is 5GB
- For objects larger than 100MB, users should consider using multipart upload (see the sketch at the end of this section)
- Amazon itself uses S3 for a wide variety of its own projects
- Amazon S3 is a simple key-based object store. Tags can be added to objects to organize the data.
- Pricing components include: storage used, data transfer and data requests
- Amazon Macie - AI-powered security service that helps you prevent data loss by discovering, classifying, and protecting sensitive data stored in Amazon S3
- Amazon S3 uses a combination of Content-MD5 checksums and cyclic redundancy checks (CRCs) to detect data corruption
- S3 automatically stores data across multiple Availability Zones; the number of AZs used depends on the storage class
- If the source object is uploaded using the multipart upload feature, then it is replicated using the same number of parts and part size. For example, a 100 GB object uploaded using the multipart upload feature (800 parts of 128 MB each) will incur request cost associated with 802 requests (800 Upload Part requests + 1 Initiate Multipart Upload request + 1 Complete Multipart Upload request) when replicated. You will incur a request charge of $0.00401 (802 requests x $0.005 per 1,000 requests) and a charge of $2.00 ($0.020 per GB transferred x 100 GB) for inter-region data transfer. After replication, the 100 GB will incur storage charges based on the destination region.
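A minimal boto3 sketch of the multipart upload guidance above (objects over 100MB); the file and bucket names are placeholders:

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Switch to multipart upload for anything over 100 MB and upload parts in
# parallel; boto3 issues the Initiate / UploadPart / Complete requests for you.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,  # start multipart above 100 MB
    multipart_chunksize=128 * 1024 * 1024,  # 128 MB parts, as in the example above
    max_concurrency=8,
)

s3.upload_file(
    "large-dataset.bin",          # local file (placeholder)
    "my-example-bucket",          # destination bucket (placeholder)
    "datasets/large-dataset.bin",
    Config=config,
)
```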