Infrastructure as Code Whitepaper (2017)

Infrastructure Resource Lifecycle

IaaC Features:

  • Both administrators and developers can instantiate infrastructure using configuration files
  • Code can be used to produce compute, storage, network and application services
  • Eliminates configuration drift through automation
  • Increases the speed and agility of infrastructure deployments

Lifecycle:

  1. Resource provisioning
  2. Configuration management
  3. Monitoring and performance
  4. Compliance and governance
  5. Resource optimization

Resource Provisioning

AWS CloudFormation

  • Uses JSON/YAML to describe a collection of AWS resources (known as a stack), their dependencies and runtime parameters
  • Templates can be used repeatedly to create copies of the same stack across AWS regions
  • Code can be versioned
  • Change Sets enable you to preview proposed changes to a stack without performing the associated updates
    1. Create a change set
    2. View the change set
    3. Execute the change set
  • Reusable Templates
    • Nested Stacks: associating parent / child stacks
    • Cross-stack referencing: referencing resources from one stack in another stack
  • Template Linting
    • Static analysis of AWS CloudFormation templates
    • ValidateTemplate API: aws cloudformation validate-template --template-url [url]
    • cfn-nag performs additional evaluations on templates to look for potential security concerns
    • cfn-check performs deeper checks on resource specifications to identify potential errors before they emerge during stack creation
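
The three change set steps listed above (create, view, execute) might be scripted roughly as follows. This is a minimal sketch, assuming `cfn` is a boto3 CloudFormation client such as `boto3.client("cloudformation")`; the function name `preview_and_apply` and the `approve` callback are hypothetical and not part of the whitepaper.

```python
def preview_and_apply(cfn, stack_name, change_set_name, template_body, approve):
    """Create, view, and (if approved) execute a change set.

    `cfn` is assumed to be a boto3 CloudFormation client.
    `approve` is a hypothetical callback that receives the proposed
    changes and returns True to proceed with the update.
    """
    # 1. Create the change set against an existing stack.
    cfn.create_change_set(
        StackName=stack_name,
        ChangeSetName=change_set_name,
        TemplateBody=template_body,
        ChangeSetType="UPDATE",
    )
    # Block until CloudFormation has finished computing the changes.
    cfn.get_waiter("change_set_create_complete").wait(
        StackName=stack_name, ChangeSetName=change_set_name
    )

    # 2. View the change set as (action, logical ID, resource type) tuples.
    description = cfn.describe_change_set(
        StackName=stack_name, ChangeSetName=change_set_name
    )
    changes = [
        (
            change["ResourceChange"]["Action"],
            change["ResourceChange"]["LogicalResourceId"],
            change["ResourceChange"]["ResourceType"],
        )
        for change in description.get("Changes", [])
    ]

    # 3. Execute the change set only after the preview has been approved.
    if approve(changes):
        cfn.execute_change_set(
            StackName=stack_name, ChangeSetName=change_set_name
        )
        return changes, True
    return changes, False
```

A caller could pass `approve=lambda changes: all(action != "Remove" for action, _, _ in changes)` to refuse any update that deletes resources. The sketch reads only the first page of `describe_change_set` results and does not handle a change set that fails because it contains no changes.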

Template Anatomy

---
AWSTemplateFormatVersion: "version date"

Description:
    String

Parameters:
    set of parameters

Mappings:
    set of mappings

Conditions:
    set of conditions

Transform:
    set of transforms

Resources:
    set of resources

Outputs:
    set of outputs
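
To make the skeleton above concrete, a small template can be assembled as a Python dictionary and serialized to JSON. This is only an illustrative sketch: the `Environment` parameter, the `LogBucket` logical ID and the `build_template` helper are made-up names, and only a few of the optional sections are filled in.

```python
import json


def build_template(bucket_logical_id="LogBucket"):
    """Return a minimal CloudFormation template as a Python dict."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative template with one versioned S3 bucket",
        # Parameters: runtime inputs, here restricted by a constraint.
        "Parameters": {
            "Environment": {
                "Type": "String",
                "AllowedValues": ["dev", "test", "prod"],
                "Default": "dev",
            }
        },
        # Resources: the only section that is strictly required.
        "Resources": {
            bucket_logical_id: {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "VersioningConfiguration": {"Status": "Enabled"},
                    "Tags": [
                        {"Key": "Environment", "Value": {"Ref": "Environment"}}
                    ],
                },
            }
        },
        # Outputs: values exposed to users or to other stacks.
        "Outputs": {
            "BucketName": {"Value": {"Ref": bucket_logical_id}}
        },
    }


# JSON text suitable for passing as a template body.
template_body = json.dumps(build_template(), indent=2)
```

Generating the template from code is one possible approach; the same structure can equally be written by hand in YAML or JSON.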

Best Practices for designing and implementing AWS CloudFormation templates:

  • Planning and Organizing
    • Organize your stacks by lifecycle and ownership
    • Use IAM to control access
    • Reuse templates to replicate stacks in multiple environments
    • Use nested stacks to reuse common template patterns
    • Use cross-stack references to export shared resources
  • Creating templates
    • Do not embed credentials in your templates
    • Use AWS-specific parameter types
    • Use parameter constraints
    • Use AWS::CloudFormation::Init to deploy software applications to Amazon EC2 instances
    • Use the latest helper scripts
    • Validate templates before using them
    • Use Parameter Store to centrally manage parameters in your templates
  • Managing stacks
    • Manage all stack resources through AWS CloudFormation
    • Create Change Sets before updating your stacks
    • Use Stack Policies
    • Use AWS CloudTrail to log AWS CloudFormation calls
    • Use code reviews and revision control to manage your templates
    • Update your Amazon EC2 Linux instances regularly

Configuration Management

Amazon EC2 Systems Manager

Task List

  • Run Command
    • Manages the configuration of managed instances at scale by distributing commands across a fleet
  • Inventory
    • Automate the collection of the software inventory from managed instances
  • State Manager
    • Keep managed instances in a defined and consistent state
  • Maintenance Windows
    • Define a maintenance window for running administrative tasks
  • Patch Manager
    • Deploy software patches automatically across groups of instances
  • Automation
    • Perform common maintenance and deployment tasks, such as updating Amazon Machine Images (AMIs)
  • Parameter Store
    • Store, control, access, and retrieve configuration data, whether plain-text data such as database strings or secrets such as passwords, encrypted through AWS Key Management Service (KMS)
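
As an illustration of the Parameter Store capability, the following sketch writes a secret as a KMS-encrypted `SecureString` and reads it back decrypted. It assumes `ssm` is a boto3 Systems Manager client such as `boto3.client("ssm")`; the helper names and the parameter name in the usage note are hypothetical.

```python
def write_secret(ssm, name, value, key_id=None):
    """Store `value` as an encrypted SecureString parameter.

    `ssm` is assumed to be a boto3 SSM client. If `key_id` is omitted,
    the account's default KMS key for Parameter Store is used.
    """
    kwargs = {
        "Name": name,
        "Value": value,
        "Type": "SecureString",
        "Overwrite": True,
    }
    if key_id is not None:
        kwargs["KeyId"] = key_id
    ssm.put_parameter(**kwargs)
    return kwargs


def read_parameter(ssm, name, decrypt=True):
    """Return the value of one parameter, decrypting it if requested."""
    response = ssm.get_parameter(Name=name, WithDecryption=decrypt)
    return response["Parameter"]["Value"]
```

For example, `write_secret(ssm, "/myapp/db/password", "s3cr3t")` followed by `read_parameter(ssm, "/myapp/db/password")` would round-trip the value, provided the caller's IAM policy allows both the SSM calls and use of the KMS key.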

Document Structure

  • A document defines the actions that Systems Manager performs on your instances
  • Includes pre-configured documents to support the capabilities
  • Supports creation of custom version-controlled documents to augment the capabilities of Systems Manager
  • Steps in the document define execution order
  • Written in JSON

Example:

{
    "schemaVersion": "2.0",
    "description": "Sample version 2.0 document v2",
    "parameters": {},
    "mainSteps": [
        {
            "action": "aws:runPowerShellScript",
            "name": "runShellScript",
            "inputs": {
                "runCommand": ["ipconfig"]
            }
        },
        {
            "action": "aws:applications",
            "name": "installapp",
            "inputs": {
                "action": "Install",
                "source": "http://dev.mysql.com/get/Downloads/MySQLInstaller/mysql-installer-community-5.6.22.0.msi"
            }
        }
    ]
}

Best Practices

  • Run Command
    • Improve your security posture by leveraging Run Command to access your EC2 instances, instead of SSH/RDP
    • Audit all API calls made by or on behalf of Run Command using AWS CloudTrail
    • Use the rate control feature in Run Command to perform staged command execution
    • Use fine-grained access permissions for Run Command (and all Systems Manager capabilities) by using IAM policies
  • Inventory
    • Use Inventory in combination with AWS Config to audit your application configuration over time
  • State Manager
    • Update the SSM agent periodically (at least once a month) using the pre-configured AWS-UpdateSSMAgent document
    • Bootstrap EC2 instances on launch using EC2Config for Windows
    • (Specific to Windows) Upload the PowerShell or Desired State Configuration (DSC) module to Amazon S3, and use AWS-InstallPowerShellModule
    • Use tags to create application groups. Then target instances using the Targets parameter, instead of specifying individual instance IDs
    • Automatically remediate findings generated by Amazon Inspector using Systems Manager
    • Use a centralized configuration repository for all of your Systems Manager documents, and share documents across your organization
  • Maintenance Windows
    • Define a schedule for performing disruptive actions on your instances
  • Patch Manager
    • Use Patch Manager to roll out patches at scale and to increase fleet compliance visibility across your EC2 instances
  • Automation
    • Create self-serviceable runbooks for infrastructure as Automation documents
    • Use Automation to simplify creating AMIs from the AWS Marketplace or custom AMIs, using public documents or authoring your own workflows
    • Use the documents AWS-UpdateLinuxAmi or AWS-UpdateWindowsAmi or create a custom Automation document to build and maintain images
  • Parameter Store
    • Use Parameter Store to manage global configuration settings in a centralized manner
    • Use Parameter Store for secrets management, encrypted through AWS KMS
    • Use Parameter Store with Amazon EC2 Container Service (ECS) task definitions to store secrets
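
Two of the Run Command practices above, targeting instances by tag and using rate control, can be combined in a single call. The sketch below assumes `ssm` is a boto3 Systems Manager client and that the fleet runs Linux, so it uses the `AWS-RunShellScript` document; the function name and the default limits are illustrative choices.

```python
def run_on_tagged_fleet(ssm, tag_key, tag_value, commands,
                        max_concurrency="10%", max_errors="1"):
    """Run shell commands on every managed instance carrying a tag.

    `ssm` is assumed to be a boto3 SSM client. Returns the command ID,
    which can later be used to look up per-instance results.
    """
    response = ssm.send_command(
        DocumentName="AWS-RunShellScript",
        # Target by tag instead of listing individual instance IDs.
        Targets=[{"Key": "tag:" + tag_key, "Values": [tag_value]}],
        Parameters={"commands": list(commands)},
        # Rate control: limit how many instances run at once, and stop
        # sending the command to further instances once too many fail.
        MaxConcurrency=max_concurrency,
        MaxErrors=max_errors,
    )
    return response["Command"]["CommandId"]
```

A call such as `run_on_tagged_fleet(ssm, "Application", "web", ["uptime"])` would run the command on about a tenth of the tagged fleet at a time. Because every invocation is an API call, it is also recorded by AWS CloudTrail when a trail is enabled.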

AWS OpsWorks for Chef Automate

  • Brings capabilities of Chef Automate to support DevOps capabilities at scale
  • Based on concepts of recipes
  • Configuration scripts written in Ruby
  • Supports DevOps practices: workflow, compliance, visibility

Supported resource definitions

  • Bash
  • Directory
  • Execute
  • File
  • Git
  • Group
  • Package
  • Route
  • Service
  • User

Example:

package 'apache2' do
    case node[:platform]
    when 'centos', 'redhat', 'fedora', 'amazon'
        package_name 'httpd'
    when 'debian', 'ubuntu'
        package_name 'apache2'
    end
    action :install
end

Recipe Linting and Testing

  • Linting with Rubocop
    • Static analysis based on Ruby style guide
  • Linting with Foodcritic
    • Checks Chef recipes based on a set of built-in rules
  • Unit Testing with ChefSpec
  • Integration Testing with Test Kitchen
    • Creates test environments and validates the creation of resources specified in Chef recipes

Best Practices

  • Consider storing Chef recipes in an Amazon S3 archive, with Amazon S3 versioning
  • Establish a backup schedule that meets your organizational governance requirements
  • Use IAM to limit access to the OpsWorks for Chef Automate API calls

Monitoring and Performance

  • Amazon CloudWatch Metrics
  • Create alarms in Amazon CloudWatch
  • Respond to metric-based alarms using built-in notifications, Amazon SNS, or custom Lambda functions
  • Make use of Amazon CloudWatch Logs
    • Install CloudWatch Logs Agent on EC2 instances
    • Logstash, Graylog, Fluentd can also ship logs
    • Logs stored in S3 can also be shipped to CloudWatch Logs, e.g. by a Lambda function triggered on an S3 event
    • CloudWatch Logs can be used to derive metrics that can trigger alarms
    • Log processing and correlation allow deeper analysis
  • CloudWatch Events
    • Events from changes to AWS environments
    • Targets can include built-in actions, SNS notifications, Lambda functions

Best Practices

  • Ensure that all AWS resources are emitting metrics
  • Create CloudWatch alarms for metrics that provide the appropriate responses as metric-related events arise
  • Send logs from AWS resources, including Amazon S3 and Amazon EC2, to CloudWatch Logs for analysis using log stream triggers and Lambda functions
  • Schedule ongoing maintenance tasks with CloudWatch and Lambda
  • Use CloudWatch custom events to respond to application-level issues

Governance and Compliance

  • AWS Config
    • Assess, audit, and evaluate the configurations of AWS resources
    • Automatically builds an inventory of your resources and tracks changes made to them
    • Provides a clear view of resource change timeline
  • AWS Config Rules
    • Every change triggers an evaluation by the rules associated with the resources
    • Provides managed rules for common requirements
    • Easily identifies noncompliant resources and helps with reporting and remediation
    • Supports custom rules using AWS Lambda functions

AWS Config Rule Structure Example: Lambda to evaluate if flow logs are enabled on a given VPC:

import boto3
import json

def evaluate_compliance(config_item, vpc_id):
    # Only VPCs are in scope for this rule.
    if config_item['resourceType'] != 'AWS::EC2::VPC':
        return 'NOT_APPLICABLE'
    elif is_flow_logs_enabled(vpc_id):
        return 'COMPLIANT'
    else:
        return 'NON_COMPLIANT'

def is_flow_logs_enabled(vpc_id):
    ec2 = boto3.client('ec2')
    response = ec2.describe_flow_logs(
        Filter=[{'Name': 'resource-id', 'Values': [vpc_id]}],
    )
    # Return an explicit boolean instead of falling through to None.
    return len(response['FlowLogs']) != 0

def lambda_handler(event, context):
    # AWS Config passes the changed resource as a JSON string.
    invoking_event = json.loads(event['invokingEvent'])
    compliance_value = 'NOT_APPLICABLE'
    vpc_id = invoking_event['configurationItem']['resourceId']
    compliance_value = evaluate_compliance(
        invoking_event['configurationItem'], vpc_id
    )

    # Report the evaluation result back to AWS Config.
    config = boto3.client('config')
    response = config.put_evaluations(
        Evaluations=[
            {
                'ComplianceResourceType': invoking_event['configurationItem']['resourceType'],
                'ComplianceResourceId': vpc_id,
                'ComplianceType': compliance_value,
                'OrderingTimestamp': invoking_event['configurationItem']['configurationItemCaptureTime']
            }
        ],
        ResultToken=event['resultToken']
    )
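
The decision logic of such a rule can be checked locally before it is deployed, by injecting the flow-log lookup instead of calling AWS. The following sketch restates the three-way decision with a hypothetical `has_flow_logs` callback; `evaluate_vpc` and the sample configuration items are made up for illustration.

```python
def evaluate_vpc(config_item, has_flow_logs):
    """Return the compliance value for one configuration item.

    `has_flow_logs` is a hypothetical callable that takes a VPC ID and
    returns True when flow logs are enabled; in the Lambda function it
    would wrap the EC2 API call.
    """
    if config_item["resourceType"] != "AWS::EC2::VPC":
        return "NOT_APPLICABLE"
    if has_flow_logs(config_item["resourceId"]):
        return "COMPLIANT"
    return "NON_COMPLIANT"


# Made-up configuration items covering all three outcomes.
vpc = {"resourceType": "AWS::EC2::VPC", "resourceId": "vpc-0example"}
subnet = {"resourceType": "AWS::EC2::Subnet", "resourceId": "subnet-0example"}

results = [
    evaluate_vpc(vpc, lambda vpc_id: True),      # flow logs present
    evaluate_vpc(vpc, lambda vpc_id: False),     # flow logs missing
    evaluate_vpc(subnet, lambda vpc_id: True),   # wrong resource type
]
print(results)  # ['COMPLIANT', 'NON_COMPLIANT', 'NOT_APPLICABLE']
```

Separating the decision from the API call in this way is one option for unit testing custom rules; it is not how the sample Lambda function is structured.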

Best Practices

  • Enable AWS Config for all regions to record the configuration item history, to facilitate auditing and compliance tracking
  • Implement a process to respond to changes detected by AWS Config. This could include email notifications and the use of AWS Config rules to respond to changes programmatically.

Resource Optimization

  • AWS Trusted Advisor
    • Scans your AWS resources and compares their usage against AWS best practices in four categories
      • Cost optimization
      • Performance
      • Security
      • Fault Tolerance
    • Trusted Advisor integrates with CloudWatch Events
      • You can design a Lambda function to respond to a change in the status of Trusted Advisor checks, e.g. send a notification
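
A Lambda function that sends a notification when a Trusted Advisor check changes status might be sketched as below. The event field names (`detail`, `status`, `check-name`, `check-item-detail`) are assumptions about the CloudWatch Events payload and should be verified against a real event; `sns` is assumed to be a boto3 SNS client, and `make_handler` is a hypothetical factory used to keep the client injectable.

```python
import json


def build_notification(event):
    """Turn a Trusted Advisor event into (subject, message), or None.

    The field names read here are assumed, not confirmed by the source.
    """
    detail = event.get("detail", {})
    status = detail.get("status", "UNKNOWN")
    if status == "OK":
        return None  # nothing to report for a passing check
    subject = "Trusted Advisor {}: {}".format(
        status, detail.get("check-name", "unknown check")
    )
    message = json.dumps(
        detail.get("check-item-detail", {}), indent=2, sort_keys=True
    )
    # SNS subjects are limited to 100 characters.
    return subject[:100], message


def make_handler(sns, topic_arn):
    """Build a Lambda handler that publishes to the given SNS topic."""
    def lambda_handler(event, context):
        notification = build_notification(event)
        if notification is None:
            return False
        subject, message = notification
        sns.publish(TopicArn=topic_arn, Subject=subject, Message=message)
        return True
    return lambda_handler
```

In a deployed function the handler would typically be created once at module load, for example `lambda_handler = make_handler(boto3.client("sns"), TOPIC_ARN)`, where `TOPIC_ARN` is a placeholder for the real topic.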

Best Practices

  • Subscribe to Trusted Advisor notifications through email or another system
  • Use distribution lists and ensure that the appropriate recipients are included
  • With AWS Business or Enterprise Support, use the AWS Support API in conjunction with Trusted Advisor to create cases to perform remediation

Key Actions to Implement IaaC

  • Start by using source control services, e.g. AWS CodeCommit
  • Incorporate a quality control process via unit tests and static code analysis before deployments
  • Remove the human element and automate infrastructure provisioning, including infrastructure permission policies
  • Create idempotent infrastructure code that you can easily redeploy
  • Roll out every new update via code by updating idempotent stacks. Avoid making one-off changes manually.
  • Embrace end-to-end automation
  • Include infrastructure automation work as part of regular product sprints
  • Make your changes auditable, and make logging mandatory
  • Define common standards across your organization and continuously optimize