Getting Started with Amazon Aurora Whitepaper (2016)

Amazon Aurora

  • Fully managed, cloud-native database service
  • Scalability, reliability, performance
  • Cost-effective
  • Highly durable
    • Database volumes are divided into 10GB segments
    • Each segment is replicated six ways across 3 AZs
  • Fault-tolerant
    • Transparently handles the loss of two out of six copies without losing write availability, or three out of six copies without losing read availability
  • Self-healing
    • Automatically replaces or repairs failed disks and nodes
  • Storage autoscaling
    • Volume grows in increments of 10GB up to a maximum of 64TB
  • Continuous backup
    • No impact on performance, 99.999999999% (11 nines) durability
  • High performance
    • Delivers up to five times the throughput of standard MySQL
  • Read replicas
    • Each cluster may have up to 15 read replicas across AZs
    • Scale read operations
    • Act as failover targets
    • Replication lag is very low (typically in tens of milliseconds)
  • Instant crash recovery
    • Log-structured storage that doesn’t require crash recovery replay of database redo logs
  • Survivable buffer cache
    • Database buffer cache is isolated from the database process
    • Cache survives a database restart
  • Highly secure
    • Runs in a VPC by default, uses SSL to secure data in transit
    • Supports encryption at rest
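
The fault-tolerance figures above can be checked with a little arithmetic. The following is a minimal sketch, not Aurora's implementation: the thresholds (four healthy copies for writes, three for reads) are inferred from the "two out of six" and "three out of six" figures in the list, and the function name is hypothetical.

```python
COPIES = 6        # each segment is replicated six ways
WRITE_QUORUM = 4  # inferred: losing 2 of 6 still allows writes
READ_QUORUM = 3   # inferred: losing 3 of 6 still allows reads


def availability(failed_copies: int) -> dict:
    """Return which operations remain available after some copies are lost."""
    if not 0 <= failed_copies <= COPIES:
        raise ValueError("failed_copies must be between 0 and 6")
    healthy = COPIES - failed_copies
    return {
        "write": healthy >= WRITE_QUORUM,
        "read": healthy >= READ_QUORUM,
    }


if __name__ == "__main__":
    for lost in range(COPIES + 1):
        print(lost, availability(lost))
```

Since each AZ holds two of the six copies, losing a whole AZ corresponds to two failed copies, which under these assumptions keeps both reads and writes available.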

Amazon Aurora Architecture

Cluster consists of:

  • Primary instance
    • Supports read-write workloads
  • Cluster volume
    • SSD virtual database storage volume that spans across multiple AZs
    • Each AZ has two copies of the cluster data
    • The primary instance and Amazon Aurora Replicas share the same cluster volume
  • Aurora Replica
    • Up to 15 replicas can be added to a cluster
    • Distributes read workload
    • When located in a separate AZ, can increase DB availability
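
To make the cluster volume layout concrete, here is a rough sketch of the arithmetic using the figures from these notes (10GB segments, six copies per segment, two copies in each of three AZs, 64TB maximum). The function name is hypothetical, and treating 64TB as 64 * 1024 GB is an assumption.

```python
import math

SEGMENT_GB = 10                # volume is divided into 10GB segments
COPIES_PER_SEGMENT = 6         # each segment is replicated six ways
AZ_COUNT = 3                   # copies are spread across three AZs
MAX_VOLUME_GB = 64 * 1024      # assumption: 64TB counted in binary units


def segment_layout(volume_gb: float) -> dict:
    """Estimate segment and copy counts for a cluster volume of a given size."""
    if not 0 < volume_gb <= MAX_VOLUME_GB:
        raise ValueError("volume size must be between 0 and 64TB")
    segments = math.ceil(volume_gb / SEGMENT_GB)
    total_copies = segments * COPIES_PER_SEGMENT
    return {
        "segments": segments,
        "total_copies": total_copies,
        "copies_per_az": total_copies // AZ_COUNT,
    }


if __name__ == "__main__":
    print(segment_layout(25))
```

For a 25GB volume this gives three segments and 18 segment copies, six in each AZ, which matches the "two copies per AZ" rule applied per segment.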

Features:

  • Self-Healing, fault-tolerant design
    • Uses a storage volume that spans multiple AZs
    • DB volume is split into 10GB segments, spread widely across the cluster (isolates the blast radius of disk failures)
  • Automatic, continuous backups
    • Continuously backs up data to Amazon S3
    • Automatic backup retention up to 35 days
    • Point-in-time restore to any moment in the retention period, up to the last 5 minutes
  • High performance
    • Modified database engine
    • Log-structured storage
    • SSD-based virtualized layer purpose-built for database workloads
    • Tests on an r3.8xlarge instance deliver over 500,000 SELECTs/second and 100,000 updates/second
    • Data write operations are acknowledged as soon as they are committed by four out of six storage nodes
    • Storage nodes acknowledge the write operations as soon as the log records are persisted to disk
  • Autoscaling storage
    • No need to provision storage space in advance
    • Storage grows automatically
    • Delivers consistent low-latency I/O
    • Hotspot management moves data around to ensure consistent performance of the storage layer
  • Low-latency read replicas
    • Lower costs, due to sharing the same underlying storage as the primary instance
    • No need to replay logs at the replica nodes
    • Less processing power to serve read requests
    • Reduced replica lag time
  • Failure testing
    • Fault injection queries enable you to schedule a simulated occurrence of failure events
      • Crash of the master instance or an Aurora Replica
      • Failure of an Aurora Replica
      • Disk failure
      • Disk congestion
    • Example: ALTER SYSTEM CRASH [INSTANCE | DISPATCHER | NODE]
  • Multiple Failover Targets
    • On instance failure, Aurora automatically fails over to any of up to 15 Replicas
    • Recommended to place at least one replica in an alternate AZ
    • Failover happens with no data loss, and log replay is not required
      • Replicas and the primary instance share the same storage
  • Instance Crash Recovery
    • Immediate recovery from a crash
    • No need to replay the redo log from the last database checkpoint
    • Restart time is reduced to less than 60 seconds
  • Survivable Caches
    • Cache is isolated from the database
    • Cache remains warm after database restart
  • Security
    • Based on Amazon VPC
    • SGs and NACLs can be leveraged to control access to your instances in each subnet
    • Supports SSL (AES-256) connections from applications
    • Supports encryption of data at rest, using AES-256 with hardware acceleration support
    • Encryption keys managed using AWS KMS
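
The fault injection example in the failure testing notes can be wrapped in a small helper so that test scripts only ever issue one of the three documented variants. This is a sketch based solely on the syntax quoted above; the helper name is hypothetical, defaulting to INSTANCE is an assumption, and sending the statement to the database is left to whatever MySQL client you already use.

```python
# Targets taken from the syntax quoted in the notes:
# ALTER SYSTEM CRASH [INSTANCE | DISPATCHER | NODE]
CRASH_TARGETS = ("INSTANCE", "DISPATCHER", "NODE")


def crash_statement(target: str = "INSTANCE") -> str:
    """Build a fault injection statement for one of the known crash targets."""
    normalized = target.strip().upper()
    if normalized not in CRASH_TARGETS:
        raise ValueError(
            f"unknown crash target {target!r}; expected one of {CRASH_TARGETS}"
        )
    return f"ALTER SYSTEM CRASH {normalized};"


if __name__ == "__main__":
    # Print the statements rather than executing them; run them only
    # against a test cluster, since they deliberately cause a failure.
    for name in CRASH_TARGETS:
        print(crash_statement(name))
```

Validating the target up front keeps a typo from reaching the database as an arbitrary ALTER SYSTEM statement.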

Monitoring

  • Default CloudWatch metrics are collected from the hypervisor
  • Enhanced monitoring with granularity down to 1 second
    • Metrics are collected from a lightweight agent installed on the instance
    • Supports up to 50 metrics related to CPU usage, network, storage, and memory

Migrating to Amazon Aurora

  • A MySQL on RDS database snapshot can be migrated to an Aurora DB cluster
  • MySQL on EC2 can be migrated by piping mysqldump directly to Amazon Aurora
  • AWS Database Migration Service (AWS DMS)
    • Offers minimal downtime or service interruption
    • The source database remains fully operational during the migration
    • AWS DMS continuously captures the changes from the source database and applies them to the target
    • Manages all the complexities of the migration process, e.g. compression, parallel transfer for faster data transfer
    • Low-cost and simple to use: pay only for the compute resources used during the migration process
    • Supports Oracle to Oracle, Oracle to Aurora, SQL Server to MySQL
      • For heterogeneous migrations, use the AWS Schema Conversion Tool
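
The mysqldump route for MySQL on EC2 can be sketched as a two-process pipeline. This is an illustrative sketch under several assumptions: the function names are hypothetical, the host names in the usage example are placeholders, and passwords are expected to come from a MySQL option file rather than the command line. Only standard mysqldump and mysql client options are used.

```python
import subprocess


def build_dump_pipeline(source_host, aurora_endpoint, database, user):
    """Return the argv lists for 'mysqldump ... | mysql ...'."""
    dump_cmd = [
        "mysqldump",
        f"--host={source_host}",
        f"--user={user}",
        "--single-transaction",  # consistent snapshot for InnoDB tables
        "--databases", database,  # emits CREATE DATABASE and USE statements
    ]
    load_cmd = [
        "mysql",
        f"--host={aurora_endpoint}",
        f"--user={user}",
    ]
    return dump_cmd, load_cmd


def run_pipeline(dump_cmd, load_cmd):
    """Pipe the dump straight into the target; True if both processes succeed."""
    dump = subprocess.Popen(dump_cmd, stdout=subprocess.PIPE)
    load = subprocess.Popen(load_cmd, stdin=dump.stdout)
    dump.stdout.close()  # lets mysqldump see a broken pipe if mysql exits early
    load_rc = load.wait()
    dump_rc = dump.wait()
    return dump_rc == 0 and load_rc == 0


if __name__ == "__main__":
    # Placeholder endpoints; replace with real values before calling run_pipeline.
    dump_cmd, load_cmd = build_dump_pipeline(
        "source-mysql.example.internal",
        "target-cluster.example.internal",
        "appdb",
        "admin",
    )
    print(" ".join(dump_cmd), "|", " ".join(load_cmd))
```

Piping avoids writing an intermediate dump file, at the cost of having to restart the whole transfer if either side fails.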