Deep Dive on Amazon S3 & Amazon Glacier Storage Management (re:Invent 2017)

Storage Management on S3
AlertLogic Use Case on AWS S3

Storage Management on S3

Organize
- Object Tagging
Monitor and Analyze
- S3 Inventory
- Amazon CloudWatch
- Storage Class Analysis
- AWS CloudTrail
Act
- Cross-Region Replication
- Event Notification
- Lifecycle Policy
Security Management
- AWS KMS
- AWS IAM
- Bucket Permissions Check
- Encryption Status in S3 Inventory
- Default Encryption
- AWS Trusted Advisor
- Amazon Macie

User Permission Management By Tagging

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": "arn:aws:s3:::Project-bucket/*",
            "Condition": {"StringEquals": {"s3:ExistingObjectTag/Project": "x"}}
        }
    ]
}
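
A policy like this can be generated per project rather than written by hand. The following is a minimal sketch, not an official AWS helper: the function name `tag_scoped_read_policy` and the lowercase bucket name are assumptions for illustration. It uses the `s3:ExistingObjectTag/<key>` condition key, which S3 evaluates against the tags already stored on the object being read.

```python
import json


def tag_scoped_read_policy(bucket: str, tag_key: str, tag_value: str) -> dict:
    """Build an IAM policy allowing s3:GetObject only on objects carrying a given tag."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # Matches tags that already exist on the object being requested.
                "Condition": {
                    "StringEquals": {f"s3:ExistingObjectTag/{tag_key}": tag_value}
                },
            }
        ],
    }


# Hypothetical bucket name and tag values.
policy = tag_scoped_read_policy("project-bucket", "Project", "x")
print(json.dumps(policy, indent=4))
```

The resulting document can be attached to an IAM user, group, or role in the usual way.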

S3 Inventory

Generates a CSV or ORC file listing all objects in an S3 bucket that match the filter criteria.
Feeds business workflows and applications such as secondary indexes, garbage collection, data auditing, and offline analytics.

Features:

Save time
Daily or Weekly delivery
Delivery notification
Delivery to S3 bucket
Same set of metadata as the LIST API
Can add size, last modified date, storage class, ETag, or replication status
Object-level Encryption Status
Encrypt Inventory with SSE-S3 or SSE-KMS
CSV or ORC output format
Query with Athena, Redshift Spectrum or any Hive tools

S3 Inventory can be queried with Amazon Athena:

-- Table over an ORC-format inventory; dt partitions correspond to delivery dates.
CREATE EXTERNAL TABLE my_inventory_table(
    `bucket` string,
    `key` string,
    `version_id` string,
    `is_latest` boolean,
    `is_delete_marker` boolean,
    `size` bigint,
    `last_modified_date` timestamp,
    `e_tag` string,
    `storage_class` string,
    `is_multipart_uploaded` boolean,
    `replication_status` string,
    `encryption_status` string
)
PARTITIONED BY (dt string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION 's3://bucketname/inventory/output_destination/hive';
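
For a small CSV inventory, a quick summary can also be computed without a query engine. The sketch below is illustrative only: the sample rows, the bucket name, and the helper `bytes_by_storage_class` are made up, and it assumes the column order has been read from the `fileSchema` field of the inventory manifest (CSV inventory files carry no header row and are delivered gzip-compressed, so a real file would need decompressing first).

```python
import csv
import io
from collections import defaultdict

# Column order as it would appear in the manifest's fileSchema (assumed here).
FILE_SCHEMA = "Bucket, Key, Size, LastModifiedDate, StorageClass"

# Hypothetical sample rows standing in for a decompressed inventory file.
SAMPLE_CSV = '''"my-bucket","logs/a.gz","1024","2017-11-01T00:00:00.000Z","STANDARD"
"my-bucket","logs/b.gz","2048","2017-06-01T00:00:00.000Z","STANDARD_IA"
"my-bucket","logs/c.gz","512","2017-11-02T00:00:00.000Z","STANDARD"
'''


def bytes_by_storage_class(csv_text: str, file_schema: str) -> dict:
    """Sum object sizes per storage class from header-less inventory CSV text."""
    columns = [name.strip() for name in file_schema.split(",")]
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=columns)
    totals = defaultdict(int)
    for row in reader:
        # Size can be empty (for example on delete markers), so default to 0.
        totals[row["StorageClass"]] += int(row["Size"] or 0)
    return dict(totals)


totals = bytes_by_storage_class(SAMPLE_CSV, FILE_SCHEMA)
print(totals)
```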

Storage Class Analysis

Data-driven storage management for S3
Daily Storage Class Analysis
Export Analysis data to your S3 Bucket
Filter by Bucket, Prefix, or Object Tags

Process:

Monitors access patterns to understand your storage usage
After 30 days, recommends when to move objects to Standard - Infrequent Access
Export file includes a daily report of storage, retrieved bytes, and GETs by object age
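
The idea behind the recommendation can be illustrated by comparing retrieved bytes with stored bytes per object-age group. This is a rough sketch only, not the algorithm AWS uses: the record layout, the age-group labels, and the `max_ratio` threshold are all assumptions, and a real export file has more columns.

```python
# Hypothetical, simplified records: one per object-age group (ages in days).
records = [
    {"age_group": "000-014", "storage_mb": 500.0, "retrieved_mb": 900.0},
    {"age_group": "015-029", "storage_mb": 800.0, "retrieved_mb": 400.0},
    {"age_group": "030-044", "storage_mb": 1200.0, "retrieved_mb": 30.0},
    {"age_group": "045-059", "storage_mb": 1500.0, "retrieved_mb": 5.0},
]


def infrequently_accessed(records, max_ratio=0.1):
    """Return age groups whose retrieved/stored ratio is at or below max_ratio."""
    result = []
    for rec in records:
        if rec["storage_mb"] <= 0:
            continue  # nothing stored in this group, so no ratio to compute
        ratio = rec["retrieved_mb"] / rec["storage_mb"]
        if ratio <= max_ratio:
            result.append(rec["age_group"])
    return result


candidates = infrequently_accessed(records)
print(candidates)
```

With this sample data, only the groups aged 30 days or more fall under the threshold, which is the kind of signal that would suggest a transition to Standard - Infrequent Access at that age.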

Object-Level Logging

Enables AWS CloudTrail logging of read and write events on individual objects

Cross-Region Replication (CRR)

Use cases:

Compliance
Lower latency
Security

Features:

Ownership overwrite for cross-account CRR
Support SSE-KMS Encrypted objects
Choose any S3 Storage Class as target
Choose any AWS region as target
Bi-directional replication
Lifecycle Policy

Automate with Trigger-Based Workflows: Amazon S3 Event Notifications

Notifications when objects are created (via PUT, POST, COPY, or multipart upload) or deleted
Filter on prefixes and suffixes
Trigger workflows with Amazon SNS, Amazon SQS, and AWS Lambda functions
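
On the receiving side, a Lambda function gets the notification as a JSON event with one entry per object under `Records`. The following is a minimal sketch of reading it; the handler name, the sample event, and the bucket and key values are made up, and the event is trimmed to the fields used here.

```python
from urllib.parse import unquote_plus


def handler(event, context=None):
    """Return (event name, bucket, key) for each record in an S3 notification."""
    results = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in the notification payload.
        key = unquote_plus(s3["object"]["key"])
        results.append((record["eventName"], s3["bucket"]["name"], key))
    return results


# Hypothetical event, reduced to the fields the handler reads.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "my-bucket"},
                "object": {"key": "logs/2017-11/app+server.log", "size": 1024},
            },
        }
    ]
}

processed = handler(sample_event)
print(processed)
```

Decoding the key matters because spaces and special characters are encoded in the event, so using the raw value in a follow-up `GetObject` call can fail.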

Default Encryption

Automatically encrypts all objects written to your Amazon S3 bucket
Choose SSE-S3 or SSE-KMS
Makes it easy to satisfy compliance needs
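
The choice between SSE-S3 and SSE-KMS comes down to a small configuration document. The sketch below builds it in the shape that boto3's `put_bucket_encryption` accepts as `ServerSideEncryptionConfiguration`; the helper name and the key alias are assumptions for illustration, and no AWS call is made here.

```python
def default_encryption_config(kms_key_id=None):
    """Build a default-encryption configuration: SSE-S3 unless a KMS key is given."""
    if kms_key_id is None:
        default = {"SSEAlgorithm": "AES256"}  # SSE-S3
    else:
        default = {"SSEAlgorithm": "aws:kms", "KMSMasterKeyID": kms_key_id}  # SSE-KMS
    return {"Rules": [{"ApplyServerSideEncryptionByDefault": default}]}


sse_s3 = default_encryption_config()
sse_kms = default_encryption_config("alias/my-data-key")  # hypothetical key alias
print(sse_s3)
print(sse_kms)

# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("s3").put_bucket_encryption(
#       Bucket="my-bucket", ServerSideEncryptionConfiguration=sse_s3)
```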

Amazon Macie

Security service that uses machine learning to automatically discover, classify and protect sensitive data in AWS
Recognizes sensitive data
Continuously monitors data access
Provides dashboards and alerts

AlertLogic Use Case on AWS S3

S3 Object Management

S3 Object Keys use hash prefix for performance: logmsgs-001:/X-OGA/11543.2016-03/...
S3 Objects written with two Tags
- Customer identifier (cid=1234567890)
- Date (date=2017-06)
AWS KMS used to generate data encryption keys
- Customer Master Key (CMK) for each data type with automatic rotation enabled
- Data Keys generated per-customer/per-month
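
The notes do not say how the hash prefix is derived, so the following is only a generic sketch of the two ideas: spreading keys with a short hash prefix and attaching the `cid` and `date` tags at write time. The function names, the prefix length, the choice of SHA-256, and the sample key are all assumptions, not Alert Logic's actual scheme.

```python
import hashlib
from urllib.parse import urlencode


def hashed_key(natural_key: str, prefix_len: int = 4) -> str:
    """Prepend a short, deterministic hash prefix to spread keys across the keyspace."""
    digest = hashlib.sha256(natural_key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{natural_key}"


def tagging_string(cid: str, month: str) -> str:
    """Encode the two object tags as a URL query string."""
    return urlencode({"cid": cid, "date": month})


key = hashed_key("logmsgs/1234567890/2017-06/batch-0001")  # hypothetical natural key
tags = tagging_string("1234567890", "2017-06")
print(key)
print(tags)
```

The tag string is in the URL-query form that boto3's `put_object` accepts in its `Tagging` parameter, so the tags can be set in the same request that writes the object.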

Tags with Lifecycle Expiration Policies

Per Customer Expiration Rule
Uses cid and date tags as filter
Independent of object creation time

<Rule>
    <ID>expiration-12345</ID>
    <Status>Enabled</Status>
    <Filter>
        <And>
            <Tag>
                <Key>cid</Key>
                <Value>12345</Value>
            </Tag>
            <Tag>
                <Key>date</Key>
                <Value>2015-09</Value>
            </Tag>
        </And>
    </Filter>
    <Expiration>
        <!-- Depends entirely on the tag values; S3 requires Days to be a positive integer -->
        <Days>1</Days>
    </Expiration>
</Rule>
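
Because one such rule exists per customer and month, the rules lend themselves to being generated. Below is a minimal sketch that builds a rule in the dictionary shape boto3's `put_bucket_lifecycle_configuration` expects inside `LifecycleConfiguration["Rules"]`. The helper name, the rule ID format, and the one-day default are assumptions for illustration; no AWS call is made.

```python
def expiration_rule(cid: str, month: str, days: int = 1) -> dict:
    """Build a lifecycle rule that expires objects tagged with this cid and date."""
    if days < 1:
        raise ValueError("Expiration Days must be a positive integer")
    return {
        "ID": f"expiration-{cid}-{month}",  # hypothetical ID format
        "Status": "Enabled",
        "Filter": {
            "And": {
                "Tags": [
                    {"Key": "cid", "Value": cid},
                    {"Key": "date", "Value": month},
                ]
            }
        },
        "Expiration": {"Days": days},
    }


rules = [expiration_rule("12345", month) for month in ("2015-08", "2015-09")]
print(rules)
```

Note that `put_bucket_lifecycle_configuration` replaces the bucket's entire lifecycle configuration, so new rules need to be submitted together with the existing ones that should stay in effect.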

Tags with Lifecycle Transition Policies

One Transition Rule per month
Uses date tag as filter

<Rule>
    <ID>transition-ia-3months</ID>
    <Status>Enabled</Status>
    <Filter>
        <And>
            <Tag>
                <Key>date</Key>
                <Value>2016-07</Value>
            </Tag>
        </And>
    </Filter>
    <Transition>
        <StorageClass>STANDARD_IA</StorageClass>
    </Transition>
</Rule>

Demonstrate Scale of Storage Solution (AWS re:Invent 2017)

Scaled workload 100x successfully
- 140PB/month of customer data
- 30k writes/second sustained
- Write latency 200ms at 95th percentile
- Read latency 125ms at 95th percentile
Limited only by resources driving traffic