Architecting to Scale
Architectural Patterns
Loosely Coupled Architecture
Components can stand independently and require little or no knowledge of the inner workings of the other components.
- Layers of Abstraction
- Permits more flexibility
- Interchangeable components
- More atomic functional units
Horizontal Scaling vs. Vertical Scaling
- Vertical scaling requires downtime
- Horizontal scaling is theoretically unlimited
- In horizontal scaling instances can be added on demand which may be a more cost effective solution
- Horizontal scaling can be automated while vertical scaling would require scripting
- Operations
- Scale Out (horizontal)
- Scale In (horizontal)
- Scale Up (vertical)
- Scale Down (vertical)
Types of Auto-Scaling
- Amazon EC2 Auto-Scaling
- Application Auto-Scaling
- API used to control scaling for resources other than EC2, like DynamoDB, ECS, EMR
- Provides a common way to interact with the scalability of resources
- AWS Auto Scaling
- Provides centralized way to manage scalability for whole stacks; Predictive scaling feature
- Console that can manage both of the above from a unified standpoint
Amazon EC2 Auto-Scaling Options
- Maintain - Specific minimum number of instances running
- Manual - Use maximum, minimum or specific number of instances
- Schedule - Scale in/out based on schedule
- Dynamic - based on real-time metrics of the system
Auto-Scaling Policy
- Target Tracking Policy
- Simple Scaling Policy
- Step Scaling Policy (More Sophisticated Logic)
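The difference between target tracking and step scaling can be sketched in a few lines. This is a simplified model, not the service's actual algorithm: the proportional formula, the `STEP_ADJUSTMENTS` table and both function names are assumptions made for illustration.

```python
import math

def target_tracking_capacity(current_capacity: int, metric_value: float,
                             target_value: float) -> int:
    """Capacity that would bring the average metric back to the target.

    Assumes the metric scales inversely with the number of instances.
    """
    return max(1, math.ceil(current_capacity * metric_value / target_value))

# Hypothetical step table: (breach above the alarm threshold, instances to add).
STEP_ADJUSTMENTS = [(0, 1), (10, 2), (20, 4)]

def step_scaling_adjustment(metric_value: float, threshold: float) -> int:
    """Pick the adjustment of the highest step whose lower bound is reached."""
    breach = metric_value - threshold
    if breach < 0:
        return 0  # alarm not breached, nothing to do
    adjustment = 0
    for lower_bound, change in STEP_ADJUSTMENTS:
        if breach >= lower_bound:
            adjustment = change
    return adjustment
```

With 4 instances at 75% CPU and a 50% target, the target tracking sketch asks for 6 instances. The step sketch adds more instances the further the metric is above the threshold, which is the "more sophisticated logic" compared to a simple scaling policy with one fixed adjustment.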
Scaling Cooldown Period
- Gives resources time to stabilize before automatically triggering another scaling event
- Different from health check period
- 300 seconds by default
- Automatically applies to dynamic scaling and optionally to manual scaling, but is not supported for scheduled scaling
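The cooldown behaviour can be illustrated with a small gate that rejects scaling requests arriving too soon after the previous one. The `CooldownGate` class is a hypothetical illustration of the idea, not part of any AWS SDK; only the 300 second default comes from the notes above.

```python
from typing import Optional

DEFAULT_COOLDOWN_SECONDS = 300  # default stated in the notes

class CooldownGate:
    """Allows a scaling activity only if the cooldown has elapsed."""

    def __init__(self, cooldown_seconds: int = DEFAULT_COOLDOWN_SECONDS) -> None:
        self.cooldown_seconds = cooldown_seconds
        self.last_scaling_time: Optional[float] = None

    def try_scale(self, now: float) -> bool:
        """Return True and record the time if scaling is allowed at `now`."""
        if (self.last_scaling_time is not None
                and now - self.last_scaling_time < self.cooldown_seconds):
            return False  # still stabilizing after the previous activity
        self.last_scaling_time = now
        return True
```

A request at t=0 is accepted, a second one at t=120 is suppressed, and one at t=300 goes through again.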
Amazon Kinesis
Collection of services for processing streams of various data
Data is processed in “shards” - each shard can ingest 1000 records per second
Default limit of 500 shards
Record consists of Partition Key (128 bit MD5 hash), Sequence Number and Data Blob (up to 1MB)
Sequence numbers can be duplicated across Shards
Transient Data Store - default retention period of 24 hours, can be extended (7 days originally, now up to 365 days)
Kinesis Data Streams - Ingest and stores data streams for processing
Kinesis Firehose - Prepares and loads the data continuously to the destinations you choose
Kinesis Data Analytics - Run standard SQL queries against data streams
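How a partition key ends up on a shard can be sketched as follows: the key is hashed with MD5 into a 128-bit integer, and each shard owns a range of that hash space. The sketch assumes the shards split the hash space evenly (which need not hold after resharding); the function names are hypothetical.

```python
import hashlib
import math

HASH_SPACE = 2 ** 128            # MD5 produces a 128-bit value
RECORDS_PER_SHARD_PER_SEC = 1000 # ingest limit per shard from the notes

def hash_key(partition_key: str) -> int:
    """128-bit integer derived from the MD5 hash of the partition key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")

def shard_for(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose (evenly sized) hash range contains the key."""
    range_size = HASH_SPACE // shard_count
    return min(hash_key(partition_key) // range_size, shard_count - 1)

def shards_needed(records_per_second: int) -> int:
    """Minimum shard count for a given ingest rate (record count only)."""
    return max(1, math.ceil(records_per_second / RECORDS_PER_SHARD_PER_SEC))
```

The same partition key always lands on the same shard, so a key with few distinct values concentrates traffic on a few shards.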
DynamoDB Scaling
- Throughput: Read/Write capacity units
- Max item size is 400KB
- There’s no limit on number of items
DDB Terminology
- Partition - physical space where DDB data is stored
- Partition Key - Unique identifier for each record, also called Hash Key
- Sort Key - Optional second part of a composite key that defines storage order - sometimes called a Range Key
DDB Partitions and Scaling
- Partitions have limitation of Capacity Units and Storage
- The number of Partitions required is determined by both factors
- Capacity - RCU / 3000 + WCU / 1000
- Storage - Total Size / 10GB
- Total Partitions = Round Up Max(Capacity, Storage)
- RCU and WCU will be equally allocated across partitions
- Partition Key should be designed to have high variability (many distinct values) to distribute the WCU and RCU load evenly across the partitions
- DynamoDB allows Auto-Scaling based on Target Utilization and Limits
- Supports Global Secondary Indexes
- Uses Target Tracking method
- Doesn’t scale down if consumption drops to zero
- Workaround1: send requests to table at minimal level
- Workaround2: manually reduce max capacity to be the same as minimum
- DynamoDB supports On-Demand scaling
- Costs more than traditional provisioning and auto-scaling
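The partition estimate from the DDB Partitions and Scaling list (capacity = RCU / 3000 + WCU / 1000, storage = size / 10GB, take the larger and round up) can be worked through in code. The function names are illustrative; the formula is the rule of thumb from the notes, not an API.

```python
import math
from typing import Tuple

def estimate_partitions(rcu: int, wcu: int, table_size_gb: float) -> int:
    """Round up the larger of the capacity-based and storage-based estimates."""
    by_capacity = rcu / 3000 + wcu / 1000
    by_storage = table_size_gb / 10
    return max(1, math.ceil(max(by_capacity, by_storage)))

def per_partition_throughput(rcu: int, wcu: int,
                             partitions: int) -> Tuple[float, float]:
    """RCU and WCU available to each partition when split equally."""
    return rcu / partitions, wcu / partitions
```

For 7500 RCU, 3000 WCU and 45 GB of data, capacity gives 2.5 + 3 = 5.5 and storage gives 4.5, so the estimate is 6 partitions with 1250 RCU and 500 WCU each. A hot partition key can therefore be throttled well below the table-level throughput.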
DynamoDB Accelerator - DAX
- Sits in front of DDB and provides in-memory caching
- Micro-second level reads
- Good for read-intensive applications
CloudFront
- Supports static / dynamic content at edge locations
- Supports Adobe Flash Media Server’s RTMP protocol
- Web Distributions support streaming through HTTP / HTTPS
- Origins can be S3, EC2, ELB and another Web Server
- Cache invalidation requests can delete the file from the edge location or you have to wait for TTL to expire
- Supports Zone Apex (domain without a subdomain in front of it)
- Supports Geo-Restriction
SNS (Simple Notification Service)
- Enables Publish/Subscribe design pattern
- Topics - Channels for publishing notifications
- Subscriptions - configuring an endpoint to receive messages published to a topic
- Endpoint options: HTTP/HTTPS, Email, SMS, SQS, Amazon Device Messaging (push notifications), Lambda
- Supports Fan-out Architecture - helps achieve a loosely coupled architecture
SQS (Simple Queue Service)
- Highly scalable hosted messaging queue
- Available integration with KMS for encrypting messages
- Transient Storage - default 4 days, max 14 days
- Supports first-in / first-out queueing
- Maximum message size of 256KB - Java SDK allows up to 2GB by utilizing S3
- Allows Loosely Coupled Architecture
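The fan-out pattern (one SNS topic delivering to several SQS queues) can be modelled in memory to show why it decouples producers from consumers. This is a toy model with hypothetical names, not SDK code: `Topic` stands in for an SNS topic and each `deque` for a subscribed SQS queue.

```python
from collections import deque
from typing import Deque, List

class Topic:
    """In-memory stand-in for an SNS topic."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.subscribers: List[Deque[str]] = []

    def subscribe(self, queue: Deque[str]) -> None:
        """Register a queue that receives a copy of every message."""
        self.subscribers.append(queue)

    def publish(self, message: str) -> int:
        """Deliver the message to every subscribed queue; return the count."""
        for queue in self.subscribers:
            queue.append(message)
        return len(self.subscribers)

# Two independent consumers, each with its own queue.
billing_queue: Deque[str] = deque()
shipping_queue: Deque[str] = deque()

orders = Topic("orders")
orders.subscribe(billing_queue)
orders.subscribe(shipping_queue)
delivered = orders.publish("order-42")
```

The publisher only knows the topic. Each consumer drains its own queue at its own pace, so a slow or failed consumer does not affect the others.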
Queue Types
- Standard Queue - no guarantee about the order of the messages
- FIFO Queue - maintains receiving order - holds all messages until a message is processed
Amazon MQ
- Managed, HA Implementation of Apache ActiveMQ
- Similar to SQS, but a different implementation
- Supports different protocols
- Designed as a drop-in replacement for on-premises message brokers (Lift and Shift to the Cloud)
- Recommended to use SQS if you are building a new application from scratch
AWS Lambda, Serverless Application Model and EventBridge
- Run code on-demand without the need for infrastructure
- Supports Node.js, Python, Java, Go and C#
- Code is stateless - executed on an event basis (SNS, SQS, S3, DynamoDB Streams, etc.)
- Very useful for event driven architectures
- Scales automatically since AWS dynamically allocates capacity in relation to events (subject to account concurrency limits)
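A stateless, event-driven function might look like the minimal sketch below, here for an SQS-triggered invocation. The `Records` / `body` layout follows the SQS event shape; the `order_id` field and the return value are assumptions for illustration.

```python
import json

def handler(event, context):
    """Process each queued message; no state is kept between invocations."""
    processed = []
    for record in event.get("Records", []):
        body = json.loads(record["body"])    # message payload as JSON text
        processed.append(body["order_id"])   # hypothetical payload field
    return {"processed": processed}

# Local invocation with a hand-built event (context is unused here).
sample_event = {
    "Records": [
        {"body": json.dumps({"order_id": "A-1"})},
        {"body": json.dumps({"order_id": "A-2"})},
    ]
}
result = handler(sample_event, None)
```

Because everything the function needs arrives in the event, any number of copies can run in parallel.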
AWS Serverless Application Model (AWS SAM)
- Open source framework for building serverless apps on AWS
- Uses YAML as configuration language
- Includes CLI functionality to create, deploy and update serverless apps using AWS services such as Lambda, DynamoDB and API Gateway
- Enables local testing and debugging of apps using a Lambda-like emulator via Docker
- Extension of CloudFormation so you can use everything CloudFormation can provide by way of resources and functions
- AWS Serverless Application Repository - contains sample apps
- Serverless Framework is different from AWS SAM - supports other providers besides AWS
Amazon EventBridge
- Ingest events from your own apps, SaaS and AWS Services
- Set up rules to filter and send events to targets
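Rule-based filtering can be sketched with a much-simplified pattern matcher. It only handles exact-value lists and nested objects, which is a small subset of what EventBridge event patterns support; the `matches` and `route` helpers, the event source and the target names are all hypothetical.

```python
from typing import Any, Dict, List, Tuple

def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """True if every field in the pattern is satisfied by the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: a list of allowed values.
            return False
    return True

def route(event: Dict[str, Any],
          rules: List[Tuple[Dict[str, Any], str]]) -> List[str]:
    """Names of all targets whose rule pattern matches the event."""
    return [target for pattern, target in rules if matches(pattern, event)]

rules = [
    ({"source": ["my.orders"], "detail": {"status": ["FAILED"]}}, "alert-lambda"),
    ({"source": ["my.orders"]}, "audit-queue"),
]
event = {"source": "my.orders", "detail-type": "OrderUpdate",
         "detail": {"status": "FAILED", "order_id": "A-1"}}
targets = route(event, rules)
```

One event can match several rules, so the same event reaches both the alerting target and the audit target.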
Simple Workflow Service (SWF)
- Create distributed asynchronous systems as workflows
- Supports both sequential and parallel processing
- Best suited for human-enabled workflows, e.g. order fulfillment or procedural requests
- AWS recommends Step Functions over SWF for new applications
- Main Components: Activity Worker, Decider (Activity Workers are doing long-polling)
- AWS Simple Workflow is used when we need to support external processes or specialized execution logic (maybe beyond the scope of AWS)
AWS Step Functions
- Managed Workflow and Orchestration platform
- Scalable and Highly Available
- Define your app as a state machine
- Create tasks, sequential steps, parallel steps, branching paths or timers
- Amazon States Language (declarative JSON)
- Apps can interact and update the stream via Step Functions API
- Visual Interface describes flow and realtime status
- Detailed logs for all the steps
- Out-of-the-box coordination of AWS components (e.g. Order processing flow)
- Recommended by AWS over Simple Workflow Service for new applications
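A state machine definition with a task, a branching path and a timer might look roughly like this, built as a Python dict and serialized to Amazon States Language JSON. The order-processing flow, the state names and the placeholder ARNs are hypothetical; `undefined_targets` is a small local sanity check, not an AWS API.

```python
import json
from typing import Any, Dict, Set

state_machine: Dict[str, Any] = {
    "Comment": "Hypothetical order-processing flow",
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": {
            "Type": "Task",
            # Placeholder ARN, not a real function.
            "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:validate-order",
            "Next": "IsOrderValid",
        },
        "IsOrderValid": {  # branching path
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.valid", "BooleanEquals": True,
                 "Next": "WaitForPayment"}
            ],
            "Default": "RejectOrder",
        },
        "WaitForPayment": {"Type": "Wait", "Seconds": 30, "Next": "ShipOrder"},
        "ShipOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:ship-order",
            "End": True,
        },
        "RejectOrder": {"Type": "Fail", "Error": "InvalidOrder",
                        "Cause": "Validation failed"},
    },
}

def undefined_targets(definition: Dict[str, Any]) -> Set[str]:
    """State names referenced by StartAt/Next/Default but never defined."""
    states = definition["States"]
    targets = {definition["StartAt"]}
    for state in states.values():
        for key in ("Next", "Default"):
            if key in state:
                targets.add(state[key])
        for choice in state.get("Choices", []):
            targets.add(choice["Next"])
    return {name for name in targets if name not in states}

definition_json = json.dumps(state_machine, indent=2)
```

The resulting JSON string is what would be supplied as the state machine definition. Checking for dangling transitions locally catches typos in state names before deployment.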
AWS Batch
- Management tool for creating, managing and executing batch-oriented tasks using EC2 Instances
- Create a Compute Environment: Managed/Unmanaged, Spot, On-Demand, vCPUs
- Create a Job Queue with priority and assigned to a Compute Environment
- Create Job Definition: Script/JSON, ENV vars, mount points, IAM role, container image, etc.
- Schedule the Job
Elastic MapReduce
- Managed Hadoop framework for processing huge amounts of data
- Also supports Apache Spark, HBase, Presto and Flink
- Most commonly used for log analysis, financial analysis, or ETL (extract, transform and load) activities
- A Step is a programming task for performing some process on the data (e.g. count words)
- A Cluster is a collection of EC2 instances provisioned by EMR to run your steps
- Master Node, Core Node (HDFS), Task Node
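The word count step mentioned above is the classic MapReduce example. A single-process sketch of the map, shuffle and reduce phases follows; on a real cluster these phases are distributed across nodes by Hadoop, and the function names here are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def map_phase(line: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key (word)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped

def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle(pairs))

counts = word_count(["to be or not", "to be"])
```

Each mapper works on its own slice of the input and each reducer on its own set of keys, which is why the job spreads well over many nodes.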
Components of Elastic MapReduce
- Hadoop HDFS - Distributed File System
- Hadoop MapReduce - Distributed Processing
- Flume - Log Collection
- ZooKeeper - Resource Coordination
- Sqoop - Data Transfer
- Oozie - Workflow
- Apache Pig - Scripting
- Hive - SQL
- Mahout - Machine Learning
- HBase - Columnar Datastore
- Ambari - Management and Monitoring