S3
Billing
- Billing Alarms can be created from CloudWatch.
- Billing Notifications should be enabled from the Billing Preferences section.
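A minimal sketch of creating a billing alarm programmatically with boto3, assuming billing alerts are already enabled in Billing Preferences; the alarm name, $10 threshold, and SNS topic ARN are placeholders (billing metrics are only published in us-east-1):

```python
import boto3

# Billing metrics live in CloudWatch in us-east-1 only,
# and only appear after billing alerts are enabled in Billing Preferences.
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="monthly-billing-alarm",            # placeholder name
    Namespace="AWS/Billing",
    MetricName="EstimatedCharges",
    Dimensions=[{"Name": "Currency", "Value": "USD"}],
    Statistic="Maximum",
    Period=21600,                                 # evaluate every 6 hours
    EvaluationPeriods=1,
    Threshold=10.0,                               # alarm once estimated charges exceed $10
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:billing-alerts"],  # placeholder SNS topic
)
```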
S3 (Simple Storage Service)
- Provides developers and IT teams with secure, durable, highly scalable object storage
- Provides simple web services interface to store and retrieve data
- S3 is a safe place to store your files
- S3 and Glacier are object/archive storage, not block storage
- S3 is Object-Based - allows you to upload files
- Files can be 0 bytes to 5TB
- Successful uploads return an HTTP 200 status code (see the upload sketch at the end of this list)
- Unlimited storage
- Files are stored in Buckets
- Objects consist of:
- Key (name of the object)
- Value (the data)
- Version ID (important for versioning)
- Metadata (data about data you are storing)
- Subresources
- Access Control List
- Torrent
- S3 uses a universal namespace: bucket names must be unique globally.
- Data Consistency
- Read after Write consistency for PUTS of new Objects
If you write a new file and read it immediately after, you will be able to view that data
- Eventual Consistency for overwrite PUTS and DELETEs (can take some time to propagate)
If you update an existing file or delete a file and then read it immediately, you may see either the old or the new state; changes to objects can take some time to propagate.
- Tiered Storage Available
- Lifecycle Management
- Versioning
- Encryption
- MFA Delete
- Secure your data using Access Control Lists and Bucket Policies
- Storage Classes
- S3 Standard
- 99.99% availability
- 99.999999999% (11 9s) durability for objects
- Stored redundantly across multiple devices in multiple facilities
- Designed to sustain a loss of 2 facilities concurrently
- S3 - IA (Infrequently Accessed)
- For data that is accessed less frequently but requires rapid access when needed
- Lower fee than S3 Standard, but you are charged a retrieval fee
- S3 One Zone - IA (Infrequently Accessed; supersedes the older RRS - Reduced Redundancy Storage)
- Lower-cost option for infrequently accessed data that does not require multi-Availability Zone resilience
- S3 - Intelligent Tiering
- Uses machine learning
- Optimizes costs automatically by moving data to the most cost-effective access tier, without performance impact or operational overhead
- S3 Glacier
- secure, durable and low-cost storage class for backup and data archiving
- retrieval times are configurable from minutes to hours
- expedited, standard and bulk retrievals
- data is encrypted by default
- S3 Glacier Deep Archive
- lowest-cost storage class where a retrieval time of 12 hours is acceptable
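A minimal upload sketch using boto3, with the bucket name and key chosen purely for illustration; it shows the HTTP 200 returned on success and how a storage class can be set per object:

```python
import boto3

s3 = boto3.client("s3")

# Upload a small object into an (assumed) existing bucket,
# placing it directly into the S3 Standard-IA storage class.
response = s3.put_object(
    Bucket="my-example-bucket",          # placeholder bucket name (must be globally unique)
    Key="notes/s3-overview.txt",         # the object key (name)
    Body=b"S3 stores objects, not blocks.",
    StorageClass="STANDARD_IA",          # e.g. STANDARD, STANDARD_IA, ONEZONE_IA, INTELLIGENT_TIERING, GLACIER, DEEP_ARCHIVE
)

# A successful upload returns an HTTP 200 status code.
print(response["ResponseMetadata"]["HTTPStatusCode"])   # 200
```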
S3 Billing
- Storage
- Number of requests
- Storage Management Pricing
- Data Transfer Pricing
- Transfer Acceleration
- Cross-Region Replication
Access & Encryption
- By default, all newly created buckets are PRIVATE
- Control access to the buckets using:
- Bucket Policies
- Access Control Lists
- Encryption in Transit
- Encryption At Rest (Server Side)
(SSE = Server Side Encryption)
- SSE-S3, S3 Managed Keys - AES-256
- SSE-KMS, AWS Key Management Service
- SSE-C, Customer Provided Keys
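A short sketch of the server-side encryption options above, with placeholder bucket and KMS key identifiers; it sets SSE-S3 (AES-256) as the bucket default and uses SSE-KMS for a single object:

```python
import boto3

s3 = boto3.client("s3")

# Default encryption at rest for the whole bucket (SSE-S3, AES-256).
s3.put_bucket_encryption(
    Bucket="my-example-bucket",   # placeholder
    ServerSideEncryptionConfiguration={
        "Rules": [
            {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
        ]
    },
)

# Encrypt a single object with SSE-KMS instead, using a customer-managed key.
s3.put_object(
    Bucket="my-example-bucket",
    Key="secrets/config.json",
    Body=b"{}",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/my-example-key",   # placeholder KMS key alias
)
```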
Versioning
- Stores all versions of an object (including all writes and even if you delete an object)
- Great backup tool
- Once enabled, Versioning cannot be disabled, only suspended.
- Integrates with Lifecycle rules
- Versioning's MFA Delete capability, which uses multi-factor authentication, can be used to provide an additional layer of security.
- The size of the bucket is the sum of all versions of the files stored in it
- A specific version of a file can be deleted
- Deleting a file places a delete marker; earlier versions remain and can be restored
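A sketch of enabling versioning on a placeholder bucket and listing the versions it keeps; note that MFA Delete additionally requires the root account's MFA device and is typically enabled via the CLI:

```python
import boto3

s3 = boto3.client("s3")

# Turn versioning on; once enabled it can only be suspended, never disabled.
s3.put_bucket_versioning(
    Bucket="my-example-bucket",                     # placeholder
    VersioningConfiguration={"Status": "Enabled"},  # or "Suspended"
)

# Listing object versions shows every write, plus any delete markers.
versions = s3.list_object_versions(Bucket="my-example-bucket", Prefix="notes/")
for v in versions.get("Versions", []):
    print(v["Key"], v["VersionId"], v["IsLatest"])
```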
Lifecycle Management Tools & Glacier
- Allows you to automate moving your objects between the different storage tiers
- Can be used in conjunction with versioning
- Can be applied to current versions and previous versions
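A minimal sketch of a lifecycle rule; the bucket name and day thresholds are placeholders chosen for illustration. It tiers current versions down to Standard-IA and then Glacier, and moves previous (non-current) versions straight to Glacier:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",   # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-down-old-objects",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},   # apply to the whole bucket
                # Current versions: Standard -> Standard-IA -> Glacier
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                # Previous versions go to Glacier sooner
                "NoncurrentVersionTransitions": [
                    {"NoncurrentDays": 30, "StorageClass": "GLACIER"}
                ],
            }
        ]
    },
)
```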
Cross Region Replication
- Versioning must be enabled on both the source and destination buckets for CRR to work
- Source and destination buckets must be in different Regions
- CRR does not replicate objects that existed before the rule was added
- Delete markers are not replicated
- Deletions of individual versions or delete markers are not replicated
- All subsequently updated files will be replicated automatically
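A sketch of a CRR rule, assuming both buckets already exist in different Regions with versioning enabled; the bucket names and the IAM role ARN that S3 assumes to replicate on your behalf are placeholders:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="my-source-bucket",   # placeholder; versioning must already be enabled
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",  # placeholder role
        "Rules": [
            {
                "ID": "replicate-everything",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},   # replicate all newly written objects
                "Priority": 1,
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    # placeholder destination bucket in another Region, versioning enabled
                    "Bucket": "arn:aws:s3:::my-destination-bucket"
                },
            }
        ],
    },
)
```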
Transfer Acceleration
- takes advantage of CloudFront's globally distributed edge locations
- can improve upload and access times
- can be tested using the speed comparison tool:
(http://s3-accelerate-speedtest.s3-accelerate.amazonaws.com)
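A sketch of enabling transfer acceleration on a placeholder bucket and then uploading through the accelerated endpoint:

```python
import boto3
from botocore.config import Config

s3 = boto3.client("s3")

# Switch the bucket to the accelerated endpoint (backed by CloudFront edge locations).
s3.put_bucket_accelerate_configuration(
    Bucket="my-example-bucket",                    # placeholder
    AccelerateConfiguration={"Status": "Enabled"},
)

# A client configured to use the accelerate endpoint for transfers.
s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accel.upload_file("big-archive.zip", "my-example-bucket", "backups/big-archive.zip")
```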
CloudFront
- Edge Location - the location where the content will be cached
- Origin - the origin of all the files that the CDN will distribute. It can be an S3 Bucket, an EC2 Instance, an Elastic Load Balancer, or any custom HTTP origin (e.g. a DNS name managed in Route 53)
- Distribution - The name that is given to the CDN which consists of a collection of Edge Locations
- If Edge Location does not have a file in the cache, it will download it from the Origin using optimized networks
- Objects are cached for the life of the TTL (Time to Live)
- Edge Locations are not just read-only; you can write to them too
- Types of Distribution supported:
- Web Distribution
- RTMP - Used for Media Streaming
- Invalidation
- Clears the cache from the Edge Locations
- You can invalidate cached objects, but you will be charged
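A sketch of invalidating a cached object; the distribution ID and path are placeholders, and invalidation paths beyond the monthly free allotment are charged:

```python
import boto3
import time

cloudfront = boto3.client("cloudfront")

cloudfront.create_invalidation(
    DistributionId="E1ABCDEFGHIJKL",   # placeholder distribution ID
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/index.html"]},  # or "/*" to flush everything
        # CallerReference must be unique per invalidation request
        "CallerReference": str(time.time()),
    },
)
```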
Snowball
Petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of AWS.
Types:
- Snowball
- Import to S3
- Export from S3
- Using it can be cheaper than transferring over high-speed internet
- Snowball Edge
- is a 100TB data transfer device with on-board storage and compute capabilities. Can be used to move large amounts of data into and out of AWS.
- Applications will continue to run even when they are not able to access the cloud
- Snowmobile
- Exabyte-scale data transfer service used to move extremely large amounts of data to AWS.
- Can transfer up to 100 PB per Snowmobile, a 45-foot long ruggedized shipping container pulled by a semi-trailer truck
Storage Gateway
Connects an on-premises software appliance with cloud-based storage to provide seamless and secure integration between an organization's on-premises IT environment and AWS's storage infrastructure.
The service enables you to securely store data to the AWS cloud for scalable and cost-effective storage.
Can be installed as a VM image on a host in a data center. Supports either VMware ESXi or Microsoft Hyper-V hypervisors.
Physical appliances are available as well.
Types of Storage Gateways:
- File Gateway (NFS)
For files: files are stored as objects in your S3 buckets and accessed through an NFS mount point. Ownership, permissions, and timestamps are stored in S3 user-metadata.
- Volume Gateway (iSCSI)
An application can use the disk volumes using iSCSI block protocol.
Data written to the volumes can be asynchronously backed up as point-in-time snapshots of your volumes, and stored in the cloud as Amazon EBS snapshots.
Snapshots are incremental backups and only changed blocks will be charged.
- Stored Volumes
Data is stored locally and asynchronously backed up to S3 in the form of EBS snapshots. (1 GB - 16 TB volume size)
- Cached Volumes
Data is stored in Amazon S3, while retaining frequently accessed data locally in your storage gateway. This minimizes the need to scale your on-premises storage infrastructure, while still providing your applications with low-latency access to their frequently accessed data. (1 GB - 32 TB volume size)
- Tape Gateway (VTL - Virtual Tape Library)
Data archiving to AWS Cloud. Lets you leverage your existing tape-based backup application infrastructure to store data on virtual tape cartridges that you create on your tape gateway.
Tape gateway is preconfigured with a media changer and tape drives, which are available to your existing client backup applications as iSCSI devices.
FAQs
- The total volume of data is unlimited
- Individual objects can have a maximum size of 5 TB
- The largest object that can be uploaded in a single PUT is 5 GB
- For objects larger than 100 MB, users should consider the multipart upload functionality (see the sketch after this list)
- Amazon itself uses S3 for its own developers and a wide variety of projects
- Amazon S3 is a simple key-based object store. Tags can be added to objects to organize the data.
- Pricing components include: storage used, data transfer and data requests
- Amazon Macie - AI-powered security service that helps you prevent data loss by discovering, classifying, and protecting sensitive data stored in Amazon S3
- Amazon S3 uses a combination of Content-MD5 checksums and cyclic redundancy checks (CRCs) to detect data corruption
- Availability Zones are selected automatically by Amazon S3 based on the storage class used
- If the source object is uploaded using the multipart upload feature, then it is replicated using the same number of parts and part size. For example, a 100 GB object uploaded using the multipart upload feature (800 parts of 128 MB each) will incur request cost associated with 802 requests (800 Upload Part requests + 1 Initiate Multipart Upload request + 1 Complete Multipart Upload request) when replicated. You will incur a request charge of $0.00401 (802 requests x $0.005 per 1,000 requests) and a charge of $2.00 ($0.020 per GB transferred x 100 GB) for inter-region data transfer. After replication, the 100 GB will incur storage charges based on the destination region.
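A sketch of the multipart upload recommendation above, assuming a local file and a placeholder bucket; TransferConfig makes boto3 split anything over the threshold into parts automatically:

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Files above 100 MB are split into 16 MB parts and uploaded in parallel.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,  # start multipart above 100 MB
    multipart_chunksize=16 * 1024 * 1024,   # 16 MB per part
    max_concurrency=8,
)

s3.upload_file(
    "large-dataset.tar.gz",        # placeholder local file
    "my-example-bucket",           # placeholder bucket
    "archives/large-dataset.tar.gz",
    Config=config,
)
```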