Deep Dive on Amazon S3 & Amazon Glacier Storage Management (AWS re:Invent 2017)
Storage Management on S3
- Organize
- Monitor and Analyze
  - S3 Inventory
  - Amazon CloudWatch
  - Storage Class Analysis
  - AWS CloudTrail
- Act
  - Cross-Region Replication
  - Event Notification
  - Lifecycle Policy
- Security Management
  - AWS KMS
  - AWS IAM
  - Bucket Permissions Check
  - Encryption Status in S3 Inventory
  - Default Encryption
  - AWS Trusted Advisor
  - Amazon Macie
User Permission Management By Tagging
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::Project-bucket/*",
      "Condition": {"StringEquals": {"s3:ExistingObjectTag/Project": "x"}}
    }
  ]
}
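The same policy can be generated per project rather than written by hand. The following is a minimal Python sketch, not part of the talk: the helper name `tag_scoped_read_policy` is hypothetical, and it assumes the intent is to match tags already stored on the object, for which S3 provides the `s3:ExistingObjectTag/<key>` condition key.

```python
import json


def tag_scoped_read_policy(bucket: str, tag_key: str, tag_value: str) -> dict:
    """Build an IAM policy that allows s3:GetObject only on objects
    in `bucket` that carry the tag tag_key=tag_value.

    Hypothetical helper for illustration; it only builds the document,
    it does not call AWS.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {
                    # Matches the tag on the stored object, not on the request.
                    "StringEquals": {f"s3:ExistingObjectTag/{tag_key}": tag_value}
                },
            }
        ],
    }


policy = tag_scoped_read_policy("Project-bucket", "Project", "x")
policy_json = json.dumps(policy, indent=2)  # ready to attach to a user or role
```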
S3 Inventory
- Generates a CSV or ORC file listing all objects in an S3 bucket that match the filter criteria.
- Feeds business workflows and applications such as secondary indexes, garbage collection, data auditing, and offline analytics.
Features:
- Save time
- Daily or Weekly delivery
- Delivery notification
- Delivery to S3 bucket
- Same set of metadata as the LIST API
- Can add size, last modified date, storage class, etag or replication status
- Object-level Encryption Status
- Encrypt Inventory with SSE-S3 or SSE-KMS
- CSV or ORC output format
- Query with Athena, Redshift Spectrum or any Hive tools
S3 Inventory can be queried with Amazon Athena:
-- Columns must match the fields selected in the inventory configuration.
CREATE EXTERNAL TABLE my_inventory_table(
  `bucket` string,
  `key` string,
  `version_id` string,
  `is_latest` boolean,
  `is_delete_marker` boolean,
  `size` bigint,
  `last_modified_date` timestamp,
  `e_tag` string,
  `storage_class` string,
  `is_multipart_uploaded` boolean,
  `replication_status` string,
  `encryption_status` string
)
-- One partition per inventory delivery date.
PARTITIONED BY (dt string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
-- The hive/ folder holds symlink files that point at the ORC data files.
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION 's3://bucketname/inventory/output_destination/hive';
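For small inventories, the CSV output can also be inspected without a query engine. The sketch below is an illustration under assumptions: the column order in `COLUMNS` is hypothetical and must match the fields chosen in the inventory configuration (inventory CSV files carry no header row), and the sample rows are made up. It lists objects whose encryption status is `NOT-SSE`.

```python
import csv
import io

# Assumed column order; adjust to match the inventory configuration.
COLUMNS = [
    "bucket",
    "key",
    "size",
    "last_modified_date",
    "storage_class",
    "encryption_status",
]


def unencrypted_objects(csv_text: str) -> list:
    """Return (key, size) for every row whose encryption_status is NOT-SSE."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=COLUMNS)
    return [
        (row["key"], int(row["size"]))
        for row in reader
        if row["encryption_status"] == "NOT-SSE"
    ]


# Made-up sample rows in the assumed column order.
sample = (
    '"example-bucket","logs/a.gz","1024","2017-11-01T00:00:00.000Z","STANDARD","SSE-S3"\n'
    '"example-bucket","logs/b.gz","2048","2017-11-02T00:00:00.000Z","STANDARD","NOT-SSE"\n'
)
found = unencrypted_objects(sample)
print(found)  # [('logs/b.gz', 2048)]
```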
Storage Class Analysis
- Data-driven storage management for S3
- Daily Storage Class Analysis
- Export Analysis data to your S3 Bucket
- Filter by Bucket, Prefix, or Object Tags
Process:
- Monitors access patterns to understand your storage usage
- After 30 days, recommends when to move objects to Standard - Infrequent Access
- Export file includes a daily report of storage, retrieved bytes, and GETs by object age
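The exported report can drive your own checks as well. The sketch below is a hypothetical illustration, not the service's recommendation logic (which the talk does not describe): the `AgeGroupStats` type, its field names, the age-group labels, and the threshold are all assumptions. It flags age groups where little of the stored data is retrieved.

```python
from dataclasses import dataclass


@dataclass
class AgeGroupStats:
    """One row of a per-age-group summary (hypothetical field names)."""
    age_group: str        # label such as "030-059" days; format is assumed
    storage_bytes: int    # bytes stored in this age group
    retrieved_bytes: int  # bytes retrieved from this age group


def infrequently_accessed(groups, max_ratio):
    """Return labels of age groups whose retrieved/stored ratio is <= max_ratio."""
    result = []
    for g in groups:
        if g.storage_bytes == 0:
            continue  # nothing stored, so no ratio to compute
        if g.retrieved_bytes / g.storage_bytes <= max_ratio:
            result.append(g.age_group)
    return result


stats = [
    AgeGroupStats("000-029", 1000, 900),
    AgeGroupStats("030-059", 1000, 50),
    AgeGroupStats("060+", 1000, 5),
]
# The threshold is chosen by the caller; 0.1 here is arbitrary.
candidates = infrequently_accessed(stats, max_ratio=0.1)
print(candidates)  # ['030-059', '060+']
```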
Object-Level Logging
- Lets AWS CloudTrail log read and write events on individual objects
Cross-Region Replication (CRR)
Use cases:
- Compliance
- Lower latency
- Security
Features:
- Ownership overwrite for cross-account CRR
- Support SSE-KMS Encrypted objects
- Choose any S3 Storage Class as target
- Choose any AWS region as target
- Bi-directional replication
- Lifecycle Policy
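Several of the features above map onto fields of a replication configuration. The sketch below builds such a configuration as a plain Python dict in the shape that boto3's `put_bucket_replication` accepts as `ReplicationConfiguration`. The helper name, the rule ID, and all ARNs and account IDs are placeholders, and the sketch only constructs the document without calling AWS.

```python
def replication_config(role_arn, dest_bucket_arn, dest_account_id, replica_kms_key_arn):
    """Build a cross-account CRR configuration (hypothetical helper).

    Covers ownership overwrite, SSE-KMS encrypted objects, and a
    non-default storage class on the destination.
    """
    return {
        "Role": role_arn,  # IAM role S3 assumes to replicate objects
        "Rules": [
            {
                "ID": "crr-all-objects",  # placeholder rule name
                "Status": "Enabled",
                "Prefix": "",  # empty prefix: replicate every object
                # Opt in to replicating objects encrypted with SSE-KMS.
                "SourceSelectionCriteria": {
                    "SseKmsEncryptedObjects": {"Status": "Enabled"}
                },
                "Destination": {
                    "Bucket": dest_bucket_arn,
                    "Account": dest_account_id,
                    "StorageClass": "STANDARD_IA",
                    # Ownership overwrite: replicas belong to the destination account.
                    "AccessControlTranslation": {"Owner": "Destination"},
                    # KMS key used to encrypt replicas in the destination region.
                    "EncryptionConfiguration": {"ReplicaKmsKeyID": replica_kms_key_arn},
                },
            }
        ],
    }


config = replication_config(
    "arn:aws:iam::111111111111:role/example-crr-role",
    "arn:aws:s3:::example-destination-bucket",
    "222222222222",
    "arn:aws:kms:us-west-2:222222222222:key/example-key-id",
)
```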
Automate with Trigger-Based Workflows: Amazon S3 Event Notifications
- Notifications when objects are created (via Put, Post, Copy, or Multipart Upload) or deleted
- Filter on prefixes and suffixes
- Trigger workflows with Amazon SNS, Amazon SQS, and AWS Lambda functions
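A notification with prefix and suffix filters can be expressed as a small configuration document. The sketch below builds one as a Python dict in the shape that boto3's `put_bucket_notification_configuration` accepts as `NotificationConfiguration`. The helper name, the `Id`, and the Lambda ARN are placeholders, and nothing here calls AWS.

```python
def notification_config(lambda_arn, prefix, suffix):
    """Build a notification configuration that invokes a Lambda function
    for new objects whose key matches the given prefix and suffix.
    Hypothetical helper for illustration.
    """
    return {
        "LambdaFunctionConfigurations": [
            {
                "Id": "on-new-object",  # placeholder identifier
                "LambdaFunctionArn": lambda_arn,
                # Fires for Put, Post, Copy, and completed multipart uploads.
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": prefix},
                            {"Name": "suffix", "Value": suffix},
                        ]
                    }
                },
            }
        ]
    }


cfg = notification_config(
    "arn:aws:lambda:us-east-1:111111111111:function:example-handler",
    prefix="uploads/",
    suffix=".jpg",
)
```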
Default Encryption
- Automatically encrypts all objects written to your Amazon S3 bucket
- Choose SSE-S3 or SSE-KMS
- Makes it easy to satisfy compliance needs
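The choice between SSE-S3 and SSE-KMS comes down to one field in the bucket's encryption configuration. The sketch below builds that configuration as a Python dict in the shape that boto3's `put_bucket_encryption` accepts as `ServerSideEncryptionConfiguration`. The helper name and the key ID are placeholders.

```python
def default_encryption_config(kms_key_id=None):
    """Build a default-encryption configuration (hypothetical helper).

    Without a key ID the bucket defaults to SSE-S3 (AES256);
    with a key ID it defaults to SSE-KMS under that key.
    """
    if kms_key_id is None:
        rule = {"SSEAlgorithm": "AES256"}
    else:
        rule = {"SSEAlgorithm": "aws:kms", "KMSMasterKeyID": kms_key_id}
    return {"Rules": [{"ApplyServerSideEncryptionByDefault": rule}]}


sse_s3 = default_encryption_config()
sse_kms = default_encryption_config("example-key-id")  # placeholder key ID
```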
Amazon Macie
- Security service that uses machine learning to automatically discover, classify and protect sensitive data in AWS
- Recognizes sensitive data
- Continuously monitors data access
- Provides dashboards and alerts
Alert Logic Use Case on Amazon S3
S3 Object Management
- S3 Object Keys use hash prefix for performance:
  logmsgs-001:/X-OGA/11543.2016-03/...
- S3 Objects written with two Tags
  - Customer identifier (cid=1234567890)
  - Date (date=2017-06)
- AWS KMS used to generate data encryption keys
  - Customer Master Key (CMK) for each data type with automatic rotation enabled
  - Data Keys generated per-customer/per-month
- Per Customer Expiration Rule
  - Uses cid and date tags as filter
  - Independent of object creation time
<Rule>
  <ID>expiration-12345</ID>
  <Status>Enabled</Status>
  <Filter>
    <And>
      <Tag>
        <Key>cid</Key>
        <Value>12345</Value>
      </Tag>
      <Tag>
        <Key>date</Key>
        <Value>2015-09</Value>
      </Tag>
    </And>
  </Filter>
  <Expiration>
    <!-- Depends entirely on the tag values; S3 requires a positive integer -->
    <Days>1</Days>
  </Expiration>
</Rule>
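Because each rule is keyed on a (cid, date) pair, the rules lend themselves to being generated. The sketch below builds one as a Python dict in the shape boto3's `put_bucket_lifecycle_configuration` uses for entries in `Rules`. The helper name and the ID format are hypothetical. S3 requires `Days` to be a positive integer, so the sketch defaults to 1.

```python
def expiration_rule(cid, month, days=1):
    """Build a lifecycle rule that expires objects tagged with the given
    customer ID and month. Hypothetical helper for illustration.
    """
    if days < 1:
        raise ValueError("S3 requires Days to be a positive integer")
    return {
        "ID": f"expiration-{cid}-{month}",  # assumed naming scheme
        "Status": "Enabled",
        "Filter": {
            # Both tags must match for the rule to apply.
            "And": {
                "Tags": [
                    {"Key": "cid", "Value": cid},
                    {"Key": "date", "Value": month},
                ]
            }
        },
        "Expiration": {"Days": days},
    }


rule = expiration_rule("12345", "2015-09")
```

Rules built this way would be collected into a single `{"Rules": [...]}` document, since a lifecycle configuration is replaced as a whole rather than appended to.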
- One Transition Rule per month
  - Uses date tag as filter
<Rule>
  <ID>transition-ia-3months</ID>
  <Status>Enabled</Status>
  <Filter>
    <And>
      <Tag>
        <Key>date</Key>
        <Value>2016-07</Value>
      </Tag>
    </And>
  </Filter>
  <Transition>
    <!-- S3 also requires <Days> or <Date> here (at least 30 days for STANDARD_IA) -->
    <StorageClass>STANDARD_IA</StorageClass>
  </Transition>
</Rule>
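The object key and tag scheme listed at the start of this use case can be sketched in a few lines. This is an illustration under assumptions: the talk does not say how the hash prefix is derived, so the MD5-based prefix, its length, and the sample key suffix `msg-0001` are all hypothetical. The tag string uses the URL-encoded `key=value&key=value` form that S3 accepts for object tagging on upload.

```python
import hashlib
from urllib.parse import urlencode


def hashed_key(natural_key, prefix_len=4):
    """Prepend a short hash-derived prefix so keys spread across the keyspace.

    MD5 is used here only as a cheap, stable hash (an assumption, not the
    scheme from the talk).
    """
    digest = hashlib.md5(natural_key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{natural_key}"


def object_tags(cid, month):
    """Return the two object tags as a URL-encoded tagging string."""
    return urlencode({"cid": cid, "date": month})


key = hashed_key("11543.2016-03/msg-0001")  # sample natural key is made up
tags = object_tags("1234567890", "2017-06")
print(tags)  # cid=1234567890&date=2017-06
```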
Demonstrate Scale of Storage Solution (AWS re:Invent 2017)
- Scaled workload 100x successfully
- 140PB/month of customer data
- 30k writes/second sustained
- Write latency 200ms at 95th percentile
- Read latency 125ms at 95th percentile
- Limited only by resources driving traffic
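To put these figures in proportion, a rough back-of-the-envelope calculation helps. It assumes (this is not stated in the talk) that the monthly volume and the sustained write rate describe the same workload, that a month has 30 days, and that PB means 10^15 bytes.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumed 30-day month

bytes_per_month = 140 * 10**15         # 140 PB, decimal units assumed
writes_per_second = 30_000             # sustained write rate from the list above

# Average ingest throughput implied by the monthly volume.
throughput_bytes_per_s = bytes_per_month / SECONDS_PER_MONTH

# Average payload per write if every write carried an equal share.
avg_write_bytes = throughput_bytes_per_s / writes_per_second

print(f"{throughput_bytes_per_s / 1e9:.1f} GB/s")   # 54.0 GB/s
print(f"{avg_write_bytes / 1e6:.1f} MB per write")  # 1.8 MB per write
```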