Learn to Build Powerful Machine Learning Models with Amazon Service

Tavish Srivastava Last Updated : 21 Dec, 2015

Introduction

After I used Azure ML last week, I received multiple emails asking me to publish a tutorial on Amazon’s ML. Thankfully, some of my meetings got postponed and I found time to write this.

Here is some more good news: I present a tool that makes model building even simpler. It removes the guesswork you had to do with Azure ML in choosing a model and data splits. Obviously, I am talking about the Amazon ML tool. Unfortunately, this time you won’t get a trial pack; you have to create an account and provide your credit card information. However, the tool is free to use, and your credit card is charged only if you exceed the free tier.

In this article, I walk through a step-by-step tutorial for building a machine learning model with Amazon ML. I’ve also shared a video tutorial at the end of this article. Let’s build our first machine learning model with the Amazon ML tool.


 

What’s New in Amazon Machine Learning?

Amazon is known for enhanced user experience, timely innovation and developments.

Just 4 days back, Amazon added a feature for random data splitting and cross-validation. You can now train and evaluate machine learning models on a random split of the input data. This helps you guard against overfitting and produces more accurate evaluations.
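To make the idea of a random split concrete, here is a minimal Python sketch. It is not Amazon ML’s implementation; the function name `random_split`, the 70/30 ratio and the seed are my own assumptions, chosen only for illustration.

```python
import random

def random_split(rows, train_fraction=0.7, seed=42):
    """Shuffle rows reproducibly, then split into training and evaluation sets.

    The 70/30 default and the seed are illustrative assumptions,
    not values taken from Amazon ML.
    """
    shuffled = list(rows)                      # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)      # seeded shuffle gives a repeatable split
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))                        # stand-in for 100 data records
train, evaluation = random_split(rows)
print(len(train), len(evaluation))
```

Shuffling before cutting is what matters here: if the file happens to be sorted (say, by date or by target value), taking the first 70% as-is would give the model a skewed view of the data.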

Last month, Amazon enabled a real-time predictions feature, which lets users preview real-time predictions before building an application. This feature requires no code; it’s a ‘push a button to get started’ feature.

 

Also Read: Amazon re:Invent 2015 (Machine Learning Reinvented)

 

Price Breakdown

Basically, Amazon charges you for 2 services:

Data Analysis and Model Building Fees: The cost depends on the size of the input data, the number of variables, the types of transformations and the number of compute hours. You’ll be charged $0.42 per hour.

Prediction Fees: These are divided into batch predictions and real-time predictions. Batch predictions are for when your application obtains many predictions at once. With real-time predictions, you request predictions for immediate use from web, mobile or desktop applications. Batch predictions cost $0.10 per 1,000 predictions. Real-time predictions cost $0.0001 per prediction.
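If you want a rough idea of your bill before you start, the rates above can be turned into a quick back-of-the-envelope calculation. The sketch below uses only the three rates quoted in this section; the function name `estimate_cost` and the example workload are hypothetical, and it ignores any other charges that may apply to your account.

```python
# Rates quoted in this article (USD); check current pricing before relying on them.
COMPUTE_RATE_PER_HOUR = 0.42
BATCH_RATE_PER_1000 = 0.10
REALTIME_RATE_PER_PREDICTION = 0.0001

def estimate_cost(compute_hours, batch_predictions=0, realtime_predictions=0):
    """Rough cost estimate from compute hours and prediction counts."""
    compute = compute_hours * COMPUTE_RATE_PER_HOUR
    batch = batch_predictions / 1000 * BATCH_RATE_PER_1000
    realtime = realtime_predictions * REALTIME_RATE_PER_PREDICTION
    return round(compute + batch + realtime, 4)

# Hypothetical workload: 2 compute hours, 50,000 batch and 10,000 real-time predictions.
cost = estimate_cost(compute_hours=2, batch_predictions=50000, realtime_predictions=10000)
print(cost)  # 0.84 + 5.00 + 1.00 = 6.84
```

Note that at these rates a batch prediction and a real-time prediction work out to the same price per prediction ($0.10 / 1,000 = $0.0001).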

 

Machine Learning Model using Amazon Service

Let’s get to work now!

1. Once you sign in, you’ll land on the main page (shown below). Now, select Machine Learning models to move to the first page of the ML tool.

ML1
2. The next step is to input a data set. If you do not have a data set ready, you can use the one suggested in the dialogue box, “banking.csv”. Select S3 as the data source (aml-sample-data/banking.csv). Once the data set is loaded successfully, you’ll see a ‘validation successful’ dialogue box.

ML2
Screen Shot 2015-12-05 at 8.36.52 pm
4. Press “Continue” to move to the next screen. You’ll now see all the variables along with sample data. Make sure the target is tagged correctly; this is your dependent variable. In our current example, the target is “y”, so you see a mark in the Target column.

Screen Shot 2015-12-05 at 11.54.26 pm

 

5. Now press “Continue” and click “Review”. In the final tab, you’ll find a summary of all inputs. Below is a sample:

Screen Shot 2015-12-05 at 11.56.45 pm

6. Finally, you press “Finish” and you are done.
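Before uploading your own file, it can help to confirm locally that the target column exists and holds the values you expect. Here is a small sketch using Python’s standard `csv` module. The rows below are made up for illustration and are not the real contents of banking.csv; only the target name “y” comes from the example above, and the helper `target_distribution` is a hypothetical name.

```python
import csv
import io
from collections import Counter

# Made-up rows for illustration only; this is NOT the real banking.csv content.
SAMPLE_CSV = """age,job,y
34,admin,0
51,technician,1
29,services,0
45,management,0
"""

def target_distribution(csv_text, target="y"):
    """Count how often each value appears in the target column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if target not in reader.fieldnames:
        raise ValueError(f"target column {target!r} not found")
    return Counter(row[target] for row in reader)

counts = target_distribution(SAMPLE_CSV)
print(counts)
```

For a binary classification problem like this one, you would expect exactly two distinct values in the target column.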

 

Checking Model Results

To check the results, go to Dashboard.

Screen Shot 2015-12-05 at 11.58.46 pm

 

In the dashboard, you can find all types of objects created. Here are some key checks you can do:

1. Check the data type: On clicking the ID of Banking.csv, you will find a dashboard to browse through the data.

Screen Shot 2015-12-06 at 12.01.35 am

2. Now, click Target Visualization. You’ll find the distribution of each column. For instance, the following is the distribution of the target variable (y):

Screen Shot 2015-12-06 at 12.03.02 am

3. Check Performance Metrics: To check performance metrics, click the ID of the Evaluation type. Below is the dashboard you get:

Screen Shot 2015-12-06 at 12.04.57 am

4. As you can see, our model has an AUC of 0.94. The tool also gives you an option to adjust the score threshold. This is a very interesting simulation for seeing the trade-off between false positives and true positives. Here is an instance:

Screen Shot 2015-12-06 at 12.07.12 am

In this chart, you can move the score threshold, which shows you the % correct and % error. The grey line is the distribution of 0s and the black line is the distribution of 1s. The shaded portions represent type 1 and type 2 errors (false positives and false negatives), depending on which side of the cut-off line the area falls. You also have a tool kit called advanced metrics. These are other levers that can be adjusted to simulate the same graph. Here is a snapshot of this tool kit:

 Screen Shot 2015-12-06 at 12.08.02 am
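To see what moving the threshold actually does, here is a minimal Python sketch of the same trade-off. The scores and labels are made-up toy values, not output from the banking model, and the function names are my own. The AUC is computed with the standard pairwise definition (the share of positive/negative pairs in which the positive gets the higher score), which may not be how Amazon ML computes it internally.

```python
def confusion_counts(scores, labels, threshold):
    """Return (tp, fp, tn, fn) when scores >= threshold are predicted as 1."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1          # type 1 error (false positive)
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1          # type 2 error (false negative)
    return tp, fp, tn, fn

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Toy scores and true labels, invented for illustration.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

for threshold in (0.2, 0.5, 0.75):
    print(threshold, confusion_counts(scores, labels, threshold))
print(round(auc(scores, labels), 3))
```

Raising the threshold from 0.2 to 0.75 removes the false positives but introduces a false negative, which is exactly the trade-off the slider lets you explore. The AUC stays the same whichever threshold you pick, because it depends only on how the scores rank the records.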

Additional Resource: You may also be interested in this 53-minute tutorial delivered at AWS re:Invent 2015:

End Notes

The Amazon ML tool is really good for visualising data and results. It takes slightly longer to run than H2O or other similar tool kits. However, the entire process is exceptionally simple to execute.

In this article, I’ve demonstrated a step-by-step process for building a machine learning model using the Amazon ML service. As you have seen, it’s quite a simple and ‘codeless’ process. So, people who find coding intimidating should consider using such services.

Did you find this article helpful? Share your experience with the Amazon Machine Learning tool with us.


Tavish Srivastava, co-founder and Chief Strategy Officer of Analytics Vidhya, is an IIT Madras graduate and a passionate data-science professional with 8+ years of diverse experience in markets including the US, India and Singapore, domains including Digital Acquisitions, Customer Servicing and Customer Management, and industry including Retail Banking, Credit Cards and Insurance. He is fascinated by the idea of artificial intelligence inspired by human intelligence and enjoys every discussion, theory or even movie related to this idea.

Responses From Readers


Rick

Nice article and very easy to follow. The whole tutorial takes a max of 10 mins of ML processing time. Unfortunately the ML module kicks off other ongoing stats processes which keep billing you in the background with no obvious visibility.

shaw38

Hi Tavish, Could you publish a tutorial on how to use H2O? I'm an absolute beginner when it comes to H2O but have tried Azure ML before, if this info helps. Thanks!
