
Developing Machine Learning Applications

AWS Certified MLS | 11 Nov 2019

AWS SageMaker

AWS SageMaker is a fully managed service that allows users to build, train, and deploy machine learning models.

The underlying infrastructure is fully managed by AWS through each of these stages:

Notebook Instance

AWS SageMaker Neo

Challenges Faced

AWS SageMaker Neo

AWS SageMaker Neo Components

Machine Learning Algorithms

Machine Intelligence

Predicting Numbers

Linear Supervised Algorithms

Non-Linear Supervised Algorithms

Unsupervised Learning Algorithms

Deep Learning

Factors in Advent of Deep Learning

Deep Neural Networks

Networks with over a thousand layers have been experimented with.

AWS SageMaker

Convolutional Neural Networks (CNNs)

Recurrent Neural Networks (RNNs)

Network Topologies

Automatic Model Tuning


Neural Networks



Tuning Hyperparameters


Brute Force

Meta model

SageMaker’s method

SageMaker Integration

Flat Hyperparameters

Advanced Analytics with Amazon SageMaker

Building and Training Machine Learning Models with Amazon SageMaker and Apache Spark

Apache Spark

Spark and SageMaker Integration

Spark / SageMaker Integration Components

Building Machine Learning Pipelines using Spark and SageMaker

Problem: Recognizing handwritten numbers 0-9 using MNIST data set.

In this example Apache Spark will pre-process the data to perform feature reduction using PCA (Principal Component Analysis). We instantiate a PCA object, providing the input set of features and a target number of features k. The PCA algorithm will choose the most significant features and return “Projected Features”. Those features will act as the input to the second stage in the pipeline.
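A minimal NumPy sketch of what the PCA step does conceptually (center the data, then project onto the top-k principal components); the Spark/SageMaker implementation is distributed, but the result is the same kind of “Projected Features” matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))        # 100 samples, 50 original features
k = 10                                # target number of features

# Center the data, then take the top-k right singular vectors.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
components = Vt[:k]                   # principal axes

projected = X_centered @ components.T  # "Projected Features", shape (100, 10)
print(projected.shape)
```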

Defining the pipeline in a Jupyter Notebook using both the Spark SDK and the SageMaker SDK allows us to automate the process of pre-processing, training, and deploying the model.

As a result we expect to have two Endpoints running on the AWS SageMaker infrastructure:

This fully decouples the pre-processing task from prediction. When the transform() function is called, the Pipeline first contacts the PCA Endpoint to reduce the features of the provided input, then calls the K-Means Endpoint to get a prediction based on the Projected Features.

Our pipeline will consist of the following steps:

  1. Performing feature analysis/reduction on the input data set using the PCA algorithm running on an Apache Spark cluster
    • A SageMaker Job will be created for running PCA feature reduction
  2. Training on the reduced-feature data using the K-Means algorithm on AWS SageMaker
    • A SageMaker Job will be created and will run automatically on completion of Step 1
  3. Running test data through the created AWS SageMaker Endpoint
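The steps above can be sketched locally with scikit-learn as a stand-in for the SageMaker-hosted algorithms: a two-stage PCA → K-Means pipeline over an MNIST-like digits data set. In the real integration, sagemaker_pyspark estimators replace these stages with SageMaker training jobs and endpoints, but the data flow is the same:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline

# MNIST-like digits data set: 8x8 images of digits 0-9, 64 features each.
X = load_digits().data

# Stage 1: PCA feature reduction; Stage 2: K-Means clustering on the
# Projected Features -- mirroring the two chained SageMaker endpoints.
pipeline = Pipeline([
    ("pca", PCA(n_components=20)),
    ("kmeans", KMeans(n_clusters=10, n_init=10, random_state=0)),
])

pipeline.fit(X)
clusters = pipeline.predict(X)  # PCA projection first, then cluster assignment
print(clusters.shape)           # one cluster id per image
```

Calling predict() here plays the role of transform() in the Spark/SageMaker pipeline: the input is first projected by the PCA stage, and only the projected features reach K-Means.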

Anomaly Detection on AWS

Random Cut Forest Algorithm

How it works

Dealing with Stream of Data



Kinesis Streams


Building Recommendation Systems with MXNet and Gluon

Collaborative Filtering

Matrix Factorization

Factorizes a matrix into separate matrices that, when multiplied together, approximate the completed matrix.
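An illustrative NumPy sketch of the idea (not the MXNet/Gluon implementation): factor a small ratings matrix R into U and V by gradient descent on the observed entries, so that U @ V approximates R and fills in the missing ratings:

```python
import numpy as np

# Toy user-item ratings matrix; 0 marks a missing rating.
R = np.array([[5., 3., 0., 1.],
              [4., 0., 0., 1.],
              [1., 1., 0., 5.],
              [0., 1., 5., 4.]])
mask = R > 0                     # fit only the observed entries

rng = np.random.default_rng(0)
n_factors = 2                    # size of the latent factor space
U = rng.normal(scale=0.1, size=(R.shape[0], n_factors))
V = rng.normal(scale=0.1, size=(n_factors, R.shape[1]))

lr, reg = 0.01, 0.01
for _ in range(5000):
    E = (R - U @ V) * mask       # error on observed entries only
    U += lr * (E @ V.T - reg * U)
    V += lr * (U.T @ E - reg * V)

# U @ V now approximates R on the observed entries,
# and the previously-missing cells hold predicted ratings.
print(np.round(U @ V, 1))
```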

O’Reilly Matrix Factorization

Points to Consider

Cold-Start Problem

Hybrid Models

Semantic Models

How to Choose a Model

How to choose factorization model