Databases

Relational Database on AWS: (RDS) used for OLTP (Online Transaction Processing)

  • Supported Types
    • SQL Server
    • Oracle
    • MySQL Server
    • PostgreSQL
    • Aurora
    • MariaDB
  • Run on virtual machines
  • You cannot log in to these operating systems however
  • Patching of the RDS Operating System and DB is Amazon’s responsibility
  • RDS is NOT Serverless
  • Aurora Serverless is Serverless

Read-Replicas vs Muti-AZ

  • Multi-AZ
    • For Disaster Recovery
    • AWS handles cross-region replication
    • AWS does an automatic failover of the instance
    • Available for SQL Server, Oracle, MySQL Server, PostgreSQL, MariaDB
    • You can force a failover from one AZ to another by rebooting the RDS instance.
  • Read Replicas
    • Improves Performance
    • Up to 5 copies of Read-Replicas
    • All read traffic can be going to read replica
    • Available for MySQL Server, PostgreSQL, MariaDB, Aurora
    • You can create read replicas of read replica as well, however, the latency will be greater then.
    • Read replicas can be promoted to be their own databases - this will break the replication.
    • Read replica can be in a second region.
    • Automated backups should be turned on in order to be able to create read replicas.

RDS Backups

  • Automated Backups
    • Allow you to recover your database to any point in time within a “retention period”. Retention period can be between 1 and 35 days.
    • Enabled by default
    • Backups stored in S3
    • Free storage space equal to the size of your database
    • On recovery, RDS will choose the most recent daily backup (full snapshot) and then apply transaction logs relevant to the restore “point in time”.
    • Backups are taken within the defined window.
    • During the backup window, storage I/O may be suspended and latency may be elevated.
    • Deleted when RDS instance is deleted
  • Snapshots
    • Done manually
    • Stored even after deletion of RDS instance
  • Restoring snapshots or automated backups will create a new RDS instance with a new DNS endpoint.

Encryption:

  • Encryption at rest is supported.
  • Encryption is done using AWS Key Management Service (KMS).
  • Automated backups, snapshots and read replicas of encrypted storage will be encrypted as well.
  • Is available for all RDS supported types

DynamoDB (NoSQL database, Non-Relational Databases)

  • Terms:
    • Collection
    • Document
    • Key Value Pairs
  • Fast and flexible NoSQL database service
  • Single-digit millisecond latency at any scale
  • Fully managed database
  • Great fit for mobile, web, gaming, ad-tech, IoT and many other applications
  • Stored on SSD storage
  • Spread across 3 geographically distinct data centers
  • Eventual Consistent Reads (Default)
  • Strongly Consistent Reads (1-second rule)

Allocated resources:

  • Read Capacity Unit (for an item up to 4 KB)
    • One strongly consistent read per second
    • Two eventual consistent reads per second
  • Write Capacity Unit (for an item up to 1KB)
    • One write per second

AWS Redshift

Data Warehousing

  • Business Intelligence Tools
  • Pulling very large and complex data sets. Usually used by management to do queries on data
  • OLTP vs OLAP
    (Online Transaction Processing vs Online Analytics Processing)
  • Data Warehousing databases use a different type of architecture both from a database perspective and infrastructure layer.

Amazon Redshift (OLAP)

  • Amazon’s Data Warehouse Solution
  • Fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud
  • Configuration
    • Single Node (160GB)
    • Multi-Node
      • Leader Node (manages client connections and receives queries)
      • Compute Node (store data and perform queries and computations). Up to 128 Compute Nodes.
  • Advanced Compression
    • Columnar data stores can be compressed much more than row-based data stores because similar data is stored sequentially on disk.
    • Does not require indexes or materialized views, so uses less space than traditional relational database systems. When loading data into an empty table, Amazon Redshift automatically samples your data and selects the most appropriate compression scheme.
  • Massive Parallel Processing (MPP)
    • Distributes data and query load across all nodes.
    • Makes it easy to add odes to your data warehouse and enables you to maintain fast query performance as your data warehouse grows.
  • Backups
    • By default 1 day retention period, up to 35 days.
    • Always attempts to maintain at least three copies of your data (the original and replica on the compute nodes and a backup in Amazon S3)
    • Can also asynchronously replicate your snapshots to S3 in another region for disaster recovery.
  • Pricing
    • Compute Node Hours
    • You will not be charged for leader node hours.
    • Backup charges
    • Data transfer (only within a VPC, not outside it)
  • Security Considerations
    • Encrypted in transit using SSL
    • Encrypted at rest using AES-256 encryption
    • Redshift takes care of key management
      • Manage your key through HSM (Hardware Security Module)
      • AWS Key Management Service
  • Availability
    • Available in 1AZ
    • Can restore snapshots to the new availability zone

Aurora

  • Amazon’s own proprietary database that is compatible with MySQL and PostgreSQL
  • 2 copies of your data are contained in each AZ with a minimum of 3 AZs. 6 copies of your data.
  • You can share Aurora Snapshots with other AWS accounts.
  • 2 types of replicas available
    • Aurora Replicas
    • MySQL replicas
  • Automated failover is only available with Aurora Replicas.
  • You can do migrations from MySQL to Aurora only by creating an Aurora read replica and promoting it.
  • Provides up to 5 times better performance than MySQL.
  • Start with 10GB, Scales in 10GB increments to 64TB (Storage Autoscaling)
  • Compute resources can scale up to 32vCPU and 244GB of Memory
  • Designed to transparently handle the loss of up to two copies of data without affecting database write availability and up to three copies without affecting read availability.
  • Automated backups are always enabled and they do not impact database performance.
  • Taking snapshots does not impact performance.

ElasticCache

Web service that makes it easy to deploy, operate and scale an in-memory cache in the cloud. Improves the performance of web applications. Retrieves information from fast, managed, in-memory caches.

  • Memcached
    • Can be scaled horizontally
  • Redis
    • Multi-AZ
    • You can do backups and restores of Redis

FAQs

  • By default customers are allowed to have up to a total of 40 Amazon RDS DB instances.
  • Database limits per RDS instance are imposed based on software limitation.
  • Maintenance windows - defines the time range when DB instance modifications, database engine version upgrades, and software patching occurs, in the event they are requested or required.
  • Auto Minor Version Upgrade setting will automatically schedule the minor version upgrade in the next maintenance window.
  • When a major version is deprecated in RDS a minimum 6 month period will be granted to upgrade to a supporter major version.
  • Billing components include: DB instance hours, Storage, I/O requests per month, Provisioned IOPS per month, Backup Storage, Data transfer.
  • Amazon RDS supported storage types are: Provisioned IOPS (SSD) Storage (OLTP workloads) and General Purpose (SSD) Storage (for moderate I/O requirements)
  • RDS Automated Backups - allows point in time recovery using transaction logs.
  • Snapshots - are user-initiated backups - backing up your DB instance.
  • RDS master user account is a native database user account which you can use to connect to your DB Instance.
  • You can encrypt connections between your application and the DB instance using SSL/TLS. Amazon RDS generates an SSL/TLS certificate for each DB Instance. Once an encrypted connection is established, data transferred between the DB Instance and your application will be encrypted during transfer.
  • Amazon RDS supports encryption at rest for all database engines, using keys you manage using AWS Key Management Service (KMS).
  • By default, Amazon RDS chooses the optimal configuration parameters for your DB Instance taking into account the instance class and storage capacity.
  • When you create or modify your DB instance to run as a Multi-AZ deployment, Amazon RDS automatically provisions and maintains a synchronous “standby” replica in a different Availability Zone.
  • When you run a DB instance as a Multi-AZ deployment, the “primary” serves database writes and reads. In addition, Amazon RDS provisions and maintains a “standby” behind the scenes, which is an up-to-date replica of the primary. The standby is “promoted” in failover scenarios. After failover, the standby becomes the primary and accepts your database operations.
  • You may observe elevated latencies relative to a standard DB instance deployment in a single Availability Zone as a result of the synchronous data replication performed on your behalf.
  • Multi-AZ standby cannot serve read requests. Multi-AZ deployments are designed to provide enhanced database availability and durability, rather than read scaling benefits.
  • RDS initiates failover in these events: Loss of availability in primary Availability Zone, Loss of network connectivity to primary, Compute unit failure on primary, Storage failure on primary.
  • AWS will emit a DB instance event in case of failover. You can alsouse RDS Event Notifications to be notified of specific events.
  • You can initiate a forced failover when rebooting your instance.
  • Automatic Backups must be enabled on instance to be able to create read replicas.
  • Amazon RDS for MySQL, MariaDB, PostgreSQL, and Oracle allow you to create up to 5 read replicas for a given source DB instance.
  • Amazon RDS for MySQL, MariaDB, PostgreSQL, and Oracle allow you to enable Multi-AZ configuration on read replicas to support disaster recovery and minimize downtime from engine upgrades.
  • Amazon Aurora, Amazon RDS for MySQL and MariaDB: You can create a second-tier read replica from an existing first-tier read replica. By creating a second-tier read replica, you may be able to move some of the replication load from the master database instance to a first-tier Read Replica. Please note that a second-tier Read Replica may lag further behind the master because of additional replication latency introduced as transactions are replicated from the master to the first tier replica and then to the second-tier replica.