Architecting to Scale

Architectural Patterns

Loosely Coupled Architecture

Components can stand independently and require little or no knowledge of the inner workings of the other components.


  • Layers of Abstraction
  • Permits more flexibility
  • Interchangeable components
  • More atomic functional units

Horizontal Scaling vs. Vertical Scaling

  • Vertical scaling typically requires downtime
  • Horizontal scaling is theoretically unlimited
  • With horizontal scaling, instances can be added on demand, which may be a more cost-effective solution
  • Horizontal scaling can be automated while vertical scaling would require scripting
  • Operations
    • Scale Out (horizontal)
    • Scale In (horizontal)
    • Scale Up (vertical)
    • Scale Down (vertical)


Types of Auto-Scaling

  • Amazon EC2 Auto-Scaling
  • Application Auto-Scaling
    • API used to control scaling for resources other than EC2, like DynamoDB, ECS, EMR
    • Provides a common way to interact with the scalability of resources
  • AWS Auto Scaling
    • Provides a centralized way to manage scalability for whole stacks; includes a Predictive Scaling feature
    • Console that can manage both of the above from a unified standpoint

Amazon EC2 Auto-Scaling Options

  • Maintain - Specific minimum number of instances running
  • Manual - Use maximum, minimum or specific number of instances
  • Schedule - Scale in/out based on schedule
  • Dynamic - based on real-time metrics of the system

Auto-Scaling Policy

  • Target Tracking Policy
  • Simple Scaling Policy
  • Step Scaling Policy (More Sophisticated Logic)
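
The difference between the policy types above can be sketched roughly as follows. This is a simplified illustration, not the algorithm AWS uses: the proportional formula for target tracking is a common approximation, and the function names and the STEPS table are hypothetical.

```python
import math


def target_tracking_capacity(current_instances, metric_value, target_value):
    """Proportional estimate: size the group so the metric lands near the target."""
    return max(1, math.ceil(current_instances * metric_value / target_value))


# Hypothetical step adjustments: (lower bound, upper bound, instances to add).
# Bounds are metric values (e.g. CPU %); None means "no upper bound".
STEPS = [(60, 75, 1), (75, 90, 2), (90, None, 4)]


def step_scaling_adjustment(metric_value, steps=STEPS):
    """Return the capacity change for the step whose range contains the metric."""
    for lower, upper, change in steps:
        if metric_value >= lower and (upper is None or metric_value < upper):
            return change
    return 0  # metric is below every step, so no scaling action
```

Target tracking needs only a target value, while step scaling lets the size of the response grow with the size of the breach.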

Scaling Cooldown Period

  • Gives resources time to stabilize before automatically triggering another scaling event
  • Different from health check period
  • 300 seconds by default
  • Applies automatically to dynamic scaling and optionally to manual scaling, but is not supported for scheduled scaling
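
The cooldown behaviour can be pictured as a simple time gate. This is a minimal sketch with a hypothetical CooldownGate class, using plain numbers as timestamps (in seconds); it is not how the service is implemented.

```python
class CooldownGate:
    """Blocks new scaling actions until the cooldown period has elapsed."""

    def __init__(self, cooldown_seconds=300):
        self.cooldown_seconds = cooldown_seconds
        self.last_scaling_time = None  # no scaling action has happened yet

    def try_scale(self, now):
        """Return True and record the action if scaling is allowed at time `now`."""
        if (self.last_scaling_time is not None
                and now - self.last_scaling_time < self.cooldown_seconds):
            return False  # still cooling down, ignore the trigger
        self.last_scaling_time = now
        return True
```

With the default of 300 seconds, a trigger at t=0 is accepted, a trigger at t=299 is ignored, and a trigger at t=300 is accepted again.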

AWS Kinesis

  • Collection of services for processing streams of various data

  • Data is processed in “shards” - each shard can ingest up to 1000 records per second (or 1MB per second)

  • Default limit of 500 shards

  • Record consists of Partition Key (a string that is MD5-hashed to a 128-bit value to select the shard), Sequence Number and Data Blob (up to 1MB)

  • Sequence numbers can be duplicated across Shards

  • Transient Data Store - default retention period of 24 hours, can be extended (originally up to 7 days; newer limits allow up to 365 days)

  • Kinesis Data Streams - Ingests and stores data streams for processing

  • Kinesis Data Firehose - Prepares and loads the data continuously to the destinations you choose

  • Kinesis Data Analytics - Run standard SQL queries against data streams
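
How a partition key selects a shard can be sketched as below. The MD5 hash into a 128-bit space comes from the notes above; the assumption that shards own equal-sized hash ranges, and the function name shard_for_key, are illustrative only.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_key(partition_key, shard_count):
    """Map a partition key to a shard index, assuming equal hash-key ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")  # integer in [0, 2**128)
    return hash_value * shard_count // HASH_SPACE
```

Records with the same partition key always land on the same shard, which is why a low-variety key can overload a single shard.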

DynamoDB Scaling

  • Throughput: Read/Write capacity units
  • Max item size is 400KB
  • There’s no limit on number of items

DDB Terminology

  • Partition - physical space where DDB data is stored
  • Partition Key - Attribute that determines which partition an item is stored in, also called Hash Key; uniquely identifies the item when no Sort Key is used
  • Sort Key - Optional second part of a composite key that defines storage order - sometimes called a Range Key

DDB Partitions and Scaling

  • Partitions have limitation of Capacity Units and Storage
  • Number of Partitions required is determined by both factors
    • Capacity - RCU / 3000 + WCU / 1000
    • Storage - Total Size / 10GB
    • Total Partitions = Round Up Max(Capacity, Storage)
  • RCU and WCU will be equally allocated across partitions
  • Partition Key should be designed to have high variability so that the RCU and WCU load is distributed evenly across the partitions
  • DynamoDB allows Auto-Scaling based on Target Utilization and Limits
    • Supports Global Secondary Indexes
    • Uses Target Tracking method
    • Doesn’t scale down if consumption drops to zero
    • Workaround 1: send requests to table at minimal level
    • Workaround 2: manually reduce max capacity to be the same as minimum
  • DynamoDB supports On-Demand scaling
    • Costs more than traditional provisioning and auto-scaling
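
The partition formula above can be worked through in a few lines. This is a sketch of the arithmetic as stated in these notes; the function names are hypothetical and the actual partitioning is managed internally by DynamoDB.

```python
import math


def estimate_partitions(rcu, wcu, table_size_gb):
    """Total Partitions = Round Up(Max(Capacity, Storage))."""
    by_capacity = rcu / 3000 + wcu / 1000
    by_storage = table_size_gb / 10
    return max(1, math.ceil(max(by_capacity, by_storage)))


def per_partition_throughput(rcu, wcu, partitions):
    """RCU and WCU are split equally across partitions."""
    return rcu / partitions, wcu / partitions
```

For example, 7500 RCU, 3000 WCU and 45GB of data give a capacity figure of 5.5 and a storage figure of 4.5, so 6 partitions, each receiving 1250 RCU and 500 WCU. A hot partition key is limited to that per-partition share.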

DynamoDB Accelerator - DAX

  • Sits in front of DDB and provides in-memory caching
  • Microsecond-level reads
  • Good for read-intensive applications
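
The idea of a cache sitting in front of the table can be sketched as a read-through cache. This is a toy illustration with a hypothetical ReadThroughCache class and a dict standing in for the table; it omits TTLs, eviction and write handling.

```python
class ReadThroughCache:
    """Serves repeated reads from memory and falls back to the backing store."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # stand-in for the DynamoDB table
        self.cache = {}
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            return self.cache[key]  # cache hit: no call to the table
        self.misses += 1
        value = self.backing_store.get(key)
        if value is not None:
            self.cache[key] = value  # populate the cache for later reads
        return value
```

Only the first read of a key reaches the table, which is why this pattern suits read-intensive workloads.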


Amazon CloudFront

  • Supports static / dynamic content at edge locations
  • Supports Adobe Flash Media Server’s RTMP protocol
  • Web Distributions support streaming through HTTP / HTTPS
  • Origins can be S3, EC2, ELB or another web server
  • A cache invalidation request removes the file from the edge locations; otherwise you have to wait for the TTL to expire
  • Supports Zone Apex (domain without a subdomain in front of it)
  • Supports Geo-Restriction

SNS (Simple Notification Service)

  • Enables Publish/Subscribe design pattern
  • Topics - Channels for publishing notifications
  • Subscriptions - configuring an endpoint to receive messages published to a topic
    • Endpoint options: HTTP/HTTPS, Email, SMS, SQS, Amazon Device Messaging (push notifications), Lambda
  • Supports Fan-out Architecture - helps achieve a loosely coupled architecture
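
The publish/subscribe and fan-out ideas above can be sketched in memory. The PubSub class and the use of plain callables as endpoints are hypothetical stand-ins, not the SNS API.

```python
from collections import defaultdict


class PubSub:
    """Minimal topic registry: publishers and subscribers only share a topic name."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, endpoint):
        """Register a callable endpoint to receive messages for a topic."""
        self.subscribers[topic].append(endpoint)

    def publish(self, topic, message):
        """Deliver the message to every subscriber (fan-out); return the count."""
        for endpoint in self.subscribers[topic]:
            endpoint(message)
        return len(self.subscribers[topic])
```

The publisher never references its consumers directly, so new consumers can be added without changing the publishing code.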


SQS (Simple Queue Service)

  • Highly scalable hosted message queue
  • Available integration with KMS for encrypting messages
  • Transient Storage - default 4 days, max 14 days
  • Supports first-in / first-out queueing
  • Maximum message size of 256KB - Java SDK allows up to 2GB by utilizing S3
  • Allows Loosely Coupled Architecture
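
The large-message approach (payload stored in S3, pointer sent through the queue) can be sketched as follows. A list stands in for the queue and a dict for the S3 bucket; send and receive are hypothetical helpers and do not reflect the Java SDK's actual interface.

```python
import uuid

MAX_MESSAGE_BYTES = 256 * 1024  # queue message size limit from the notes above


def send(queue, object_store, payload: bytes):
    """Send small payloads inline; store large ones and send a pointer instead."""
    if len(payload) <= MAX_MESSAGE_BYTES:
        queue.append({"body": payload})
    else:
        key = str(uuid.uuid4())
        object_store[key] = payload  # stand-in for an S3 upload
        queue.append({"pointer": key})


def receive(queue, object_store):
    """Return the next payload, following the pointer if one was sent."""
    message = queue.pop(0)
    if "pointer" in message:
        return object_store[message["pointer"]]
    return message["body"]
```

The consumer sees the same payload either way; only the transport of large bodies differs.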

Queue Types

  • Standard Queue - no guarantee about the order of the messages
  • FIFO Queue - preserves the order in which messages are received; later messages are held back until the preceding message is processed

Amazon MQ

  • Managed, HA Implementation of Apache ActiveMQ
  • Similar to SQS, but a different implementation
  • Supports different protocols
  • Designed as a drop-in replacement for on-premise message brokers (Lift and Shift to the Cloud)
  • Recommended to use SQS if you are building a new application from scratch

AWS Lambda, Serverless Application Model and EventBridge

  • Run code on-demand without the need for infrastructure
  • Supports Node.js, Python, Java, Go and C#, among other runtimes
  • Code is stateless - executed on an event basis (SNS, SQS, S3, DynamoDB Streams, etc.)
  • Very useful for event driven architectures
  • Scales automatically since AWS dynamically allocates capacity in relation to events, subject to account-level concurrency quotas
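
A stateless, event-driven function might look like the sketch below, here for an SQS-triggered invocation. The handler name, the assumption that message bodies are JSON, and the order_id field are all hypothetical.

```python
import json


def handler(event, context):
    """Process each SQS record; nothing is kept between invocations."""
    processed = []
    for record in event.get("Records", []):
        order = json.loads(record["body"])  # assumes JSON message bodies
        processed.append(order["order_id"])  # hypothetical field name
    return {"processed": processed}
```

Because everything the function needs arrives in the event, any number of copies can run in parallel.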

AWS Serverless Application Model (AWS SAM)

  • Open source framework for building serverless apps on AWS
  • Uses YAML as configuration language
  • Includes CLI functionality to create, deploy and update serverless apps using AWS services such as Lambda, DynamoDB and API Gateway
  • Enables local testing and debugging of apps using a Lambda-like emulator via Docker
  • Extension of CloudFormation so you can use everything CloudFormation can provide by way of resources and functions
  • AWS Serverless Application Repository - contains sample apps
  • Serverless Framework is different from AWS SAM - supports other providers besides AWS

Amazon EventBridge

  • Ingest events from your own apps, SaaS and AWS Services
  • Set up rules to filter and send events to targets
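
Rule-based filtering can be sketched with a simplified matcher. It handles only exact-value matching, where each pattern field lists its allowed values and nested objects are matched recursively; the matches and route helpers and the target names are hypothetical.

```python
def matches(pattern, event):
    """True if every field in the pattern has an allowed value in the event."""
    for field, expected in pattern.items():
        if field not in event:
            return False
        actual = event[field]
        if isinstance(expected, dict):
            # Nested pattern: descend into the matching part of the event.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:  # expected is a list of allowed values
            return False
    return True


def route(event, rules):
    """Return the targets of every rule whose pattern matches the event."""
    return [target for pattern, target in rules if matches(pattern, event)]
```

Fields that the pattern does not mention are ignored, so producers can add data to events without breaking existing rules.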

Simple Workflow Service (SWF)

  • Create distributed asynchronous systems as workflows
  • Supports both sequential and parallel processing
  • Best suited for human-enabled workflows, e.g. order fulfillment or procedural requests
  • AWS recommends Step Functions over SWF for new applications
  • Main Components: Activity Worker, Decider (Activity Workers are doing long-polling)
  • AWS Simple Workflow is used when we need to support external processes or specialized execution logic (maybe beyond the scope of AWS)

AWS Step Functions

  • Managed Workflow and Orchestration platform
  • Scalable and Highly Available
  • Define your app as a state machine
  • Create tasks, sequential steps, parallel steps, branching paths or timers
  • Uses Amazon States Language, a declarative JSON-based language
  • Apps can interact with and update the workflow via the Step Functions API
  • Visual Interface describes flow and real-time status
  • Detailed logs for all the steps
  • Out-of-the-box coordination of AWS components (e.g. Order processing flow)
  • Recommended by AWS over Simple Workflow Service for new applications
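
A state machine definition for an order-processing flow might look like the sketch below, built as a Python dict and serialized to JSON. The state names, the Lambda ARNs (with a placeholder account ID) and the undefined_targets helper are hypothetical; the field names follow Amazon States Language.

```python
import json

definition = {
    "Comment": "Hypothetical order-processing flow",
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:validate-order",
            "Next": "IsValid",
        },
        "IsValid": {  # branching path
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.valid", "BooleanEquals": True, "Next": "WaitForStock"}
            ],
            "Default": "RejectOrder",
        },
        "WaitForStock": {"Type": "Wait", "Seconds": 30, "Next": "ShipOrder"},  # timer
        "ShipOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ship-order",
            "End": True,
        },
        "RejectOrder": {"Type": "Fail", "Error": "InvalidOrder", "Cause": "Validation failed"},
    },
}


def undefined_targets(defn):
    """Return transition targets that are not defined as states."""
    states = defn["States"]
    targets = {defn["StartAt"]}
    for state in states.values():
        targets.update(state[k] for k in ("Next", "Default") if k in state)
        targets.update(choice["Next"] for choice in state.get("Choices", []))
    return targets - set(states)


asl_json = json.dumps(definition, indent=2)  # the JSON document you would deploy
```

The small check is only a local sanity test for dangling transitions; the service performs its own validation when the state machine is created.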

AWS Batch

  • Management tool for creating, managing and executing batch-oriented tasks using EC2 Instances
  1. Create a Compute Environment: Managed/Unmanaged, Spot, On-Demand, vCPUs
  2. Create a Job Queue with a priority and assign it to a Compute Environment
  3. Create Job Definition: Script/JSON, ENV vars, mount points, IAM role, container image, etc.
  4. Schedule the Job

Elastic MapReduce

  • Managed Hadoop framework for processing huge amounts of data
  • Also supports Apache Spark, HBase, Presto and Flink
  • Most commonly used for log analysis, financial analysis, or ETL (extract, transform and load) activities
  • A Step is a programming task for performing some process on the data (e.g. count words)
  • A Cluster is a collection of EC2 instances provisioned by EMR to run your steps
  • Master Node, Core Node (HDFS), Task Node
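
The word-count example of a step can be sketched as a single-process imitation of the map, shuffle and reduce phases. The function names are hypothetical, and a real cluster would distribute each phase across nodes rather than run them in one process.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Sum the counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle(pairs))
```

Each line can be mapped independently and each word reduced independently, which is what allows the work to be spread across a cluster.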

Components of Elastic MapReduce

  • Hadoop HDFS - Distributed File System
  • Hadoop MapReduce - Distributed Processing
  • Flume - Log Collection
  • ZooKeeper - Resource Coordination
  • Sqoop - Data Transfer
  • Oozie - Workflow
  • Apache Pig - Scripting
  • Hive - SQL
  • Mahout - Machine Learning
  • HBase - Columnar Datastore
  • Ambari - Management and Monitoring