Architecting to Scale
Architectural Patterns
Loosely Coupled Architecture
Components can stand independently and require little or no knowledge of the inner workings of the other components.
Benefits:
- Layers of Abstraction
- Permits more flexibility
- Interchangeable components
- More atomic functional units
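A minimal sketch of loose coupling, assuming a hypothetical order service that only knows a small `Notifier` interface and nothing about how a notification is actually delivered. All class and method names here are illustrative, not taken from any AWS SDK.

```python
from typing import List, Protocol


class Notifier(Protocol):
    """The only contract the order service depends on (hypothetical interface)."""

    def send(self, message: str) -> None: ...


class EmailNotifier:
    def __init__(self) -> None:
        self.sent: List[str] = []

    def send(self, message: str) -> None:
        # A real implementation would call a mail service; here we just record it.
        self.sent.append(f"email: {message}")


class SmsNotifier:
    def __init__(self) -> None:
        self.sent: List[str] = []

    def send(self, message: str) -> None:
        self.sent.append(f"sms: {message}")


class OrderService:
    """Knows nothing about the notifier's inner workings, only its interface."""

    def __init__(self, notifier: Notifier) -> None:
        self._notifier = notifier

    def place_order(self, order_id: str) -> None:
        self._notifier.send(f"order {order_id} placed")


# Either notifier can be swapped in without touching OrderService.
email = EmailNotifier()
OrderService(email).place_order("42")
sms = SmsNotifier()
OrderService(sms).place_order("43")
```

Because `OrderService` depends only on the interface, the components stay interchangeable, which is the benefit listed above.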
Horizontal Scaling vs. Vertical Scaling
- Vertical scaling typically requires downtime
- Horizontal scaling is theoretically unlimited
- In horizontal scaling, instances can be added on demand, which may be a more cost-effective solution
- Horizontal scaling can be automated while vertical scaling would require scripting
- Operations
- Scale Out (horizontal)
- Scale In (horizontal)
- Scale Up (vertical)
- Scale Down (vertical)
Auto-Scaling
Types of Auto-Scaling
- Amazon EC2 Auto-Scaling
- Application Auto-Scaling
- API used to control scaling for resources other than EC2, like DynamoDB, ECS, EMR
- Provides a common way to interact with the scalability of resources
- AWS Auto Scaling
- Provides centralized way to manage scalability for whole stacks; Predictive scaling feature
- Console that can manage both of the above from a unified standpoint
Amazon EC2 Auto-Scaling Options
- Maintain - Specific minimum number of instances running
- Manual - Use maximum, minimum or specific number of instances
- Schedule - Scale in/out based on schedule
- Dynamic - based on real-time metrics of the system
Auto-Scaling Policy
- Target Tracking Policy
- Simple Scaling Policy
- Step Scaling Policy (More Sophisticated Logic)
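To illustrate how a target tracking policy differs from a step scaling policy, here is a rough sketch. Target tracking is modeled as a proportional rule (capacity scaled by metric / target), and step scaling as a lookup of how far the metric breaches a threshold. The function names, the threshold and the step table are assumptions made for illustration; the real service applies additional rules that are not modeled here.

```python
import math

def target_tracking_desired(current_capacity: int, metric_value: float,
                            target_value: float) -> int:
    """Capacity that would bring the average metric back to the target."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    return max(1, math.ceil(current_capacity * metric_value / target_value))


# Hypothetical step table: (minimum breach above threshold, instances to add).
STEPS = [(0, 1), (10, 2), (25, 4)]

def step_scaling_adjustment(metric_value: float, threshold: float) -> int:
    """Pick the adjustment for the largest step whose lower bound is reached."""
    breach = metric_value - threshold
    if breach < 0:
        return 0  # alarm not breached, no scaling
    adjustment = 0
    for lower_bound, change in STEPS:
        if breach >= lower_bound:
            adjustment = change
    return adjustment


# 4 instances at 75% average CPU with a 50% target -> 6 instances
desired = target_tracking_desired(4, 75.0, 50.0)
# 85% CPU against a 70% threshold is a breach of 15 -> add 2 instances
added = step_scaling_adjustment(85.0, 70.0)
```

A simple scaling policy would correspond to a step table with a single entry.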
Scaling Cooldown Period
- Gives resources time to stabilize before automatically triggering another scaling event
- Different from health check period
- 300 seconds by default
- Automatically applies to dynamic scaling and optionally to manual scaling, but is not supported for scheduled scaling
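The effect of a cooldown can be sketched as a simple gate that rejects a new scaling event until enough time has passed since the last one. The `CooldownGate` class below is hypothetical and only models the timing rule, using the 300-second default from the notes above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CooldownGate:
    cooldown_seconds: float = 300.0          # default from the notes above
    last_scaled_at: Optional[float] = None   # timestamp of the last scaling event

    def try_scale(self, now: float) -> bool:
        """Return True and record the event if the cooldown has elapsed."""
        if (self.last_scaled_at is not None
                and now - self.last_scaled_at < self.cooldown_seconds):
            return False  # still stabilizing, ignore this trigger
        self.last_scaled_at = now
        return True


gate = CooldownGate()
first = gate.try_scale(now=0.0)      # allowed, nothing has scaled yet
second = gate.try_scale(now=120.0)   # rejected, only 120 s since the last event
third = gate.try_scale(now=300.0)    # allowed, the full cooldown has elapsed
```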
AWS Kinesis
- Collection of services for processing streams of various data
- Data is processed in “shards” - each shard can ingest up to 1000 records per second
- Default limit of 500 shards
- Record consists of Partition Key (hashed with MD5 to a 128-bit value), Sequence Number and Data Blob (up to 1MB)
- Sequence numbers can be duplicated across Shards
- Transient Data Store - default retention period of 24 hours, configurable up to 7 days
- Kinesis Data Streams - Ingests and stores data streams for processing
- Kinesis Firehose - Prepares and loads the data continuously to the destinations you choose
- Kinesis Data Analytics - Run standard SQL queries against data streams
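The role of the partition key can be sketched as follows: the key is hashed with MD5 into a 128-bit integer, and that integer decides which shard receives the record. The sketch assumes the shards split the hash space into equal contiguous ranges, which is a simplification (after resharding, real shard ranges need not be equal); `hash_key` and `shard_for` are illustrative names.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit value


def hash_key(partition_key: str) -> int:
    """Map a partition key to an integer in [0, 2**128)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key: str, shard_count: int) -> int:
    """Index of the shard owning this key, assuming equal hash ranges."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    range_size = HASH_SPACE // shard_count
    # min() guards the top of the hash space when 2**128 is not evenly divisible.
    return min(hash_key(partition_key) // range_size, shard_count - 1)


# Records with the same partition key always land on the same shard.
same_shard = shard_for("user-42", 4) == shard_for("user-42", 4)
```

Since ordering is only tracked per shard, choosing a partition key with many distinct values spreads records across shards more evenly.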
DynamoDB Scaling
- Throughput: Read/Write capacity units
- Max item size is 400KB
- There’s no limit on number of items
DDB Terminology
- Partition - physical space where DDB data is stored
- Partition Key - Unique identifier for each record, also called Hash Key
- Sort Key - Optional second part of a composite key that defines storage order - sometimes called a Range Key
DDB Partitions and Scaling
- Partitions have limitation of Capacity Units and Storage
- Number of Partitions required is determined by both factors
- Capacity - RCU / 3000 + WCU / 1000
- Storage - Total Size / 10GB
- Total Partitions = Round Up(Max(Capacity, Storage))
- RCU and WCU will be equally allocated across partitions
- Partition Key should be designed to have high variability so that the RCU and WCU load is distributed evenly across the partitions
- DynamoDB allows Auto-Scaling based on Target Utilization and Limits
- Supports Global Secondary Indexes
- Uses Target Tracking method
- Doesn’t scale down if consumption drops to zero
- Workaround1: send requests to table at minimal level
- Workaround2: manually reduce max capacity to be the same as minimum
- DynamoDB supports On-Demand scaling
- Costs more than traditional provisioning and auto-scaling
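The partition estimate from the formulas above (capacity, storage, and the rounded-up maximum of the two) can be worked through in a few lines. This is only a sketch of the arithmetic in these notes; the function names are hypothetical and the result is an estimate, not something DynamoDB reports.

```python
import math
from typing import Tuple


def estimate_partitions(rcu: int, wcu: int, size_gb: float) -> int:
    """Round Up(Max(Capacity, Storage)) using the constants from the notes."""
    by_capacity = rcu / 3000 + wcu / 1000
    by_storage = size_gb / 10
    return max(1, math.ceil(max(by_capacity, by_storage)))


def per_partition_throughput(rcu: int, wcu: int, partitions: int) -> Tuple[float, float]:
    """RCU and WCU are split equally across partitions."""
    return rcu / partitions, wcu / partitions


# 7500 RCU, 3000 WCU, 45 GB:
#   capacity = 7500/3000 + 3000/1000 = 5.5, storage = 45/10 = 4.5 -> 6 partitions
partitions = estimate_partitions(rcu=7500, wcu=3000, size_gb=45)
rcu_each, wcu_each = per_partition_throughput(7500, 3000, partitions)
```

In this example each partition gets 1250 RCU and 500 WCU, so a single hot partition key can be throttled well before the table-level capacity is reached.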
DynamoDB Accelerator - DAX
- Sits in front of DDB and provides in-memory caching
- Microsecond-level reads
- Good for read-intensive applications
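The caching idea behind DAX can be sketched as a generic read-through cache: reads are served from memory when possible and fall through to the backing table on a miss. This is not the DAX client API; `ReadThroughCache`, the `load` callback and the TTL value are assumptions chosen for illustration.

```python
from typing import Callable, Dict, Optional, Tuple


class ReadThroughCache:
    def __init__(self, load: Callable[[str], Optional[dict]],
                 ttl_seconds: float = 300.0) -> None:
        self._load = load              # hypothetical function that reads the table
        self._ttl = ttl_seconds        # assumed TTL, not an official default
        self._entries: Dict[str, Tuple[float, Optional[dict]]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str, now: float) -> Optional[dict]:
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            self.hits += 1
            return entry[1]            # served from memory
        self.misses += 1
        value = self._load(key)        # fall through to the backing store
        self._entries[key] = (now, value)
        return value


table = {"user#1": {"name": "Ada"}}    # stand-in for a DynamoDB table
cache = ReadThroughCache(table.get, ttl_seconds=10.0)
cache.get("user#1", now=0.0)           # miss, loads from the table
cache.get("user#1", now=5.0)           # hit, within the TTL
cache.get("user#1", now=10.0)          # miss, the entry has expired
```

The hit/miss counters make the trade-off visible: the more reads repeat the same keys, the more traffic the cache absorbs.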
CloudFront
- Supports static / dynamic content at edge locations
- Supports Adobe Flash Media Server’s RTMP protocol
- Web Distributions support streaming through HTTP / HTTPS
- Origins can be S3, EC2, ELB or another web server
- A cache invalidation request removes the file from the edge locations; otherwise you have to wait for the TTL to expire
- Supports Zone Apex (a domain without a subdomain in front of it)
- Supports Geo-Restriction
SNS (Simple Notification Service)
- Enables Publish/Subscribe design pattern
- Topics - Channels for publishing notifications
- Subscriptions - configuring an endpoint to receive messages published to a topic
- Endpoint options: HTTP/HTTPS, Email, SMS, SQS, Amazon Device Messaging (push notifications), Lambda
- Supports Fan-out Architecture - helps achieve a loosely coupled architecture
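The fan-out pattern can be sketched with an in-memory stand-in for a topic: one published message is copied to every subscribed endpoint, and the publisher never needs to know who the subscribers are. The `Broker` class and the queue names are hypothetical and do not use the SNS API.

```python
from collections import defaultdict, deque
from typing import Callable, DefaultDict, Deque, List


class Broker:
    """In-memory stand-in for topics and subscriptions."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, endpoint: Callable[[str], None]) -> None:
        self._subscribers[topic].append(endpoint)

    def publish(self, topic: str, message: str) -> int:
        """Deliver the message to every endpoint; return the delivery count."""
        for endpoint in self._subscribers[topic]:
            endpoint(message)
        return len(self._subscribers[topic])


# Two queues (stand-ins for SQS queues) subscribed to the same topic.
billing_queue: Deque[str] = deque()
shipping_queue: Deque[str] = deque()

broker = Broker()
broker.subscribe("orders", billing_queue.append)
broker.subscribe("orders", shipping_queue.append)
delivered = broker.publish("orders", "order 42 placed")
```

Adding a third consumer is just another `subscribe` call, which is why fan-out keeps producers and consumers loosely coupled.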
SQS
- Highly scalable hosted messaging queue
- Available integration with KMS for encrypting messages
- Transient Storage - default 4 days, max 14 days
- Supports first-in / first-out queueing
- Maximum message size of 256KB - Java SDK allows up to 2GB by utilizing S3
- Allows Loosely Coupled Architecture
Queue Types
- Standard Queue - no guarantee about the order of the messages
- FIFO Queue - maintains receiving order - holds later messages until the preceding message is processed
Amazon MQ
- Managed, HA Implementation of Apache ActiveMQ
- Similar to SQS, but a different implementation
- Supports different protocols
- Designed as a drop-in replacement for on-premises message brokers (Lift and Shift to the Cloud)
- Recommended to use SQS if you are building a new application from scratch
AWS Lambda, Serverless Application Model and EventBridge
- Run code on-demand without the need for infrastructure
- Supports Node.js, Python, Java, Go and C#
- Code is stateless - executed on an event basis (SNS, SQS, S3, DynamoDB Streams, etc.)
- Very useful for event driven architectures
- Scales automatically since AWS dynamically allocates capacity in relation to events (subject to account concurrency limits)
AWS Serverless Application Model (AWS SAM)
- Open source framework for building serverless apps on AWS
- Uses YAML as configuration language
- Includes CLI functionality to create, deploy and update serverless apps using AWS services such as Lambda, DynamoDB and API Gateway
- Enables local testing and debugging of apps using a Lambda-like emulator via Docker
- Extension of CloudFormation so you can use everything CloudFormation can provide by way of resources and functions
- AWS Serverless Application Repository - contains sample apps
- Serverless Framework is different from AWS SAM - supports other providers besides AWS
Amazon EventBridge
- Ingest events from your own apps, SaaS and AWS Services
- Set up rules to filter and send events to targets
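Rule-based filtering can be sketched with a simplified matcher: a pattern lists the accepted values for each field, nested objects are matched recursively, and an event is routed to every target whose pattern matches. This covers only exact-value matching, a small subset of what EventBridge patterns support; the `matches` and `route` helpers and the target names are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """True if every field in the pattern matches the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: a list of accepted values.
            return False
    return True


def route(event: Dict[str, Any], rules: List[Tuple[Dict[str, Any], str]]) -> List[str]:
    """Return the targets of all rules whose pattern matches the event."""
    return [target for pattern, target in rules if matches(pattern, event)]


rules = [
    ({"source": ["aws.ec2"], "detail": {"state": ["terminated"]}}, "cleanup-function"),
    ({"source": ["my.app"]}, "audit-queue"),
]
event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "terminated"},
}
targets = route(event, rules)
```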
Simple Workflow Service (SWF)
- Create distributed asynchronous systems as workflows
- Supports both sequential and parallel processing
- Best suited for human-enabled workflows, e.g. order fulfillment or procedural requests
- AWS recommends Step Functions over SWF for new applications
- Main Components: Activity Worker, Decider (Activity Workers are doing long-polling)
- AWS Simple Workflow is used when we need to support external processes or specialized execution logic (maybe beyond the scope of AWS)
AWS Step Functions
- Managed Workflow and Orchestration platform
- Scalable and Highly Available
- Define your app as a state machine
- Create tasks, sequential steps, parallel steps, branching paths or timers
- Amazon States Language - declarative JSON
- Apps can interact and update the stream via Step Functions API
- Visual Interface describes flow and real-time status
- Detailed logs for all the steps
- Out-of-the-box coordination of AWS components (e.g. Order processing flow)
- Recommended by AWS over Simple Workflow Service for new applications
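As an illustration of a state machine expressed in declarative JSON, the sketch below builds a small order-processing definition as a Python dictionary and serializes it. The state names, the Lambda ARNs and the `$.valid` field are hypothetical placeholders; the small `undefined_targets` helper is also an assumption, added only to check that every transition points at a defined state.

```python
import json

state_machine = {
    "Comment": "Hypothetical order processing flow",
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": {  # a task step
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:validate-order",
            "Next": "IsValid",
        },
        "IsValid": {  # a branching path
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.valid", "BooleanEquals": True, "Next": "WaitForStock"}
            ],
            "Default": "Reject",
        },
        "WaitForStock": {"Type": "Wait", "Seconds": 30, "Next": "ShipOrder"},  # a timer
        "ShipOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ship-order",
            "End": True,
        },
        "Reject": {"Type": "Fail", "Error": "InvalidOrder", "Cause": "Validation failed"},
    },
}


def undefined_targets(definition: dict) -> set:
    """Names referenced by StartAt/Next/Default/Choices that are not defined."""
    states = definition["States"]
    referenced = {definition["StartAt"]}
    for state in states.values():
        if "Next" in state:
            referenced.add(state["Next"])
        if "Default" in state:
            referenced.add(state["Default"])
        for choice in state.get("Choices", []):
            referenced.add(choice["Next"])
    return referenced - set(states)


asl_json = json.dumps(state_machine, indent=2)  # the JSON document to deploy
dangling = undefined_targets(state_machine)     # empty when the flow is consistent
```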
AWS Batch
- Management tool for creating, managing and executing batch-oriented tasks using EC2 Instances
- Create a Compute Environment: Managed/Unmanaged, Spot, On-Demand, vCPUs
- Create a Job Queue with priority and assigned to a Compute Environment
- Create Job Definition: Script/JSON, ENV vars, mount points, IAM role, container image, etc.
- Schedule the Job
Elastic MapReduce
- Managed Hadoop framework for processing huge amounts of data
- Also supports Apache Spark, HBase, Presto and Flink
- Most commonly used for log analysis, financial analysis, or ETL (extract, transform and load) activities
- A Step is a programming task for performing some process on the data (e.g. count words)
- A Cluster is a collection of EC2 instances provisioned by EMR to run your steps
- Master Node, Core Node (HDFS), Task Node
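The word-count example mentioned above is the classic way to show the MapReduce model that a step would run. The sketch below runs the three phases (map, shuffle, reduce) in a single Python process; on a real cluster these phases are distributed across nodes, and the function names here are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


counts = word_count(["the quick brown fox", "the lazy dog", "The end"])
```

Because each map call and each reduce call is independent, the work can be split across many nodes, which is what makes the model suitable for very large inputs.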
Components of Elastic MapReduce
- Hadoop HDFS - Distributed File System
- Hadoop MapReduce - Distributed Processing
- Flume - Log Collection
- ZooKeeper - Resource Coordination
- Sqoop - Data Transfer
- Oozie - Workflow
- Apache Pig - Scripting
- Hive - SQL
- Mahout - Machine Learning
- HBase - Columnar Datastore
- Ambari - Management and Monitoring