How to choose the right data science / analytics / big data training?

Kunal Jain Last Updated : 16 Sep, 2017

Over the last 2 years, this has been the most common query I have received from our readers:

Which data science / analytics training should I go for?

The query comes in varied shapes and sizes, but the underlying question is always the same.

I can empathize with people facing these questions: the number of tools, analytical techniques in use and training providers have all increased many-fold in the last few years. If the trends and projections are to be believed, this is probably just the start of a growth phase.

Let’s take an example. As a person switching from the software industry, do you learn SAS or do you learn R? Or should you learn Big Data tools and techniques? How about machine learning? Data visualization tools? Even if you zero in on one of these, the next question is where and how to undergo the training.

I am sure most people in this situation feel like the person in the image above. This is where a framework can help you.

Framework to choose the right analytics training:

I aim to give you a framework to decide:

Which tool to learn?
Which techniques to focus on?
How to learn?
Where to learn?

You can apply it at various stages of your analytics career to find out what you should be learning next.

Overview of the framework:

The answers to the first 2 questions in this framework come in the form of levels, or steps. You start from Level 0 and move one step at a time. So if you are a complete fresher, start from Level 0 of tools and Level 0 of techniques. But if you are a fresher with a statistics background, start with Level 1 of tools (assuming you know Excel) and Level 1 of techniques (move to Level 2 if you already know predictive modeling).

Once you have finalized the tools and techniques to learn, move on to step 3 and step 4 of the process.

Step 1: Which tool to learn?

Level 0: Excel.

If you don’t know Excel, you should learn it first. You should be able to work with Pivot tables, do simple data manipulations and apply lookups in Excel.

Level 1: SAS / R / Python

This is going to be your workhorse. You can choose any of these languages. For a more detailed comparison, have a look at this article.

Level 2: QlikView / Tableau / D3.js

You should add one of the visualization tools to your repertoire.

Level 3: Big Data tools

This can itself span multiple levels. Start with the Hadoop stack: HDFS, HBase, Pig, Hive, Spark.

Level 4: NoSQL Databases

Again, you can read an overview of NoSQL databases here and start by learning the most popular one – MongoDB.

Exception 1: If you come from an MIS / reporting background, you can start by learning visualization tools like QlikView and Tableau (Level 2) and then go to Level 1.

Exception 2: If you come from software engineering / web development and know one of the 2 languages, Java or Python, you can also start from Big Data tools (Level 3).
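The levels and the two exceptions above can be read as a small decision rule. The sketch below is only one way to write it down: the function name, the parameter names and the background labels ("mis_reporting", "software") are assumptions made for illustration and are not part of the framework itself.

```python
# Tool levels from Step 1 of the framework.
TOOL_LEVELS = {
    0: "Excel",
    1: "SAS / R / Python",
    2: "QlikView / Tableau / D3.js",
    3: "Big Data tools (Hadoop stack)",
    4: "NoSQL databases",
}


def starting_tool_level(knows_excel, background="other", knows_java_or_python=False):
    """Return the tool level to start from.

    The background labels ("mis_reporting", "software") are hypothetical
    names chosen for this sketch, not terms defined by the framework.
    """
    # Exception 1: an MIS / reporting background starts with visualization tools.
    if background == "mis_reporting":
        return 2
    # Exception 2: software / web developers who know Java or Python
    # can start from Big Data tools.
    if background == "software" and knows_java_or_python:
        return 3
    # Default path: Excel first, then a workhorse language.
    return 1 if knows_excel else 0


if __name__ == "__main__":
    level = starting_tool_level(knows_excel=True, background="mis_reporting")
    print(level, TOOL_LEVELS[level])
```

Note that a person starting under Exception 1 would still go back to Level 1 afterwards, which this sketch does not model.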

Step 2: Which techniques should you be learning?

Now that you know which tool you want to learn, let us look at the techniques to learn. Again, the structure is similar.

Level 0: Basics of statistics – Descriptive and Inferential statistics

Level 1: Basic predictive modeling – ANOVA, Regression, Decision trees, Time Series

Level 2: All remaining machine learning techniques except neural nets

Level 3: Neural nets and deep learning
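Since you move through these levels one step at a time, picking the next technique to learn amounts to finding the first level you have not yet covered. A minimal sketch of that rule follows; the function name and the convention of passing -1 for a complete fresher are assumptions made for this example.

```python
from typing import Optional

# Technique levels from Step 2 of the framework, indexed by level number.
TECHNIQUE_LEVELS = [
    "Basics of statistics (descriptive and inferential)",
    "Basic predictive modeling (ANOVA, regression, decision trees, time series)",
    "Remaining machine learning techniques, except neural nets",
    "Neural nets and deep learning",
]


def next_technique_level(highest_known: int = -1) -> Optional[int]:
    """Return the next level to learn, or None if every level is covered.

    highest_known is the highest level already mastered; -1 (an assumed
    convention for this sketch) means a complete fresher.
    """
    nxt = highest_known + 1
    return nxt if nxt < len(TECHNIQUE_LEVELS) else None


if __name__ == "__main__":
    # A fresher with a statistics background has covered Level 0.
    level = next_technique_level(highest_known=0)
    print(level, TECHNIQUE_LEVELS[level])
```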

Step 3: How should you learn?

How you should learn depends on 2 factors:

Resources you can spend on learning; and
Your motivation for self-learning.

This image explains the selection:

At one extreme, you have the option to join open courses, where you spend low (almost zero) resources but need high motivation for self-learning. At the other extreme, you have courses run by big universities like Stanford / MIT / Northwestern, where you will need to spend money and will get help and mentorship from experts over a longer duration. You can choose your style of learning depending on where you fit in.

Please note that irrespective of which method and blend you choose, you will need to supplement the training with hands-on projects and practice. No resources or trainings can cover that for you. Here are a few examples of these projects.

For people relying completely on self-learning, our learning paths can be of great help. There is one each for Python, SAS, Weka and QlikView, with several more under development.

Step 4: Where to learn?

Now that you know what to learn and how to learn, you can shortlist the various options available. You should talk to people who have undergone that training / course and gather some reviews. You can also use our training listing page and apply filters to shortlist the trainings available for various tools and techniques. We have more than 300 trainings listed here and are in the process of adding more trainings and courses.

End Notes:

So, there you go! You should now have a way to find your way through this data science course jungle. I hope you find this framework immensely useful. I have tried to put a framework around the most common query I get from our audience. The idea is to enable you to make the right decision to the extent possible. If you think you are in a situation that is not addressed by the framework above, please feel free to ask your questions through the comments / discussion portal.

P.S. These are my views. A lot of these recommendations are based on my experience and what I think is the right choice. As you can expect, some of these questions don’t have a right or wrong answer. They are subjective in nature. So, if you have a different opinion about something I have mentioned, please feel free to let me know.

If you like what you just read and want to continue your analytics learning, subscribe to our emails, follow us on Twitter or like our Facebook page.

Kunal Jain

Kunal Jain is the Founder and CEO of Analytics Vidhya, one of the world's leading communities of AI professionals. With over 17 years of experience in the field, Kunal has been instrumental in shaping the global AI landscape. His expertise spans diverse markets, from developed economies like the UK to emerging ones like India, where he has successfully led and delivered complex data-driven solutions. As a recognized thought leader, Kunal has empowered countless individuals to realize their AI ambitions through his visionary approach to AI education and community building. Before founding Analytics Vidhya, Kunal earned both his undergraduate and postgraduate degrees from IIT Bombay and held key roles at Capital One and Aviva Life Insurance across multiple geographies. His passion lies at the intersection of analytics, AI, and fostering a thriving community of data science professionals.

Responses From Readers

Suravi Kalita

Nicely written article.

Darshana

Dear Kunal, Thanks a lot for sharing very interesting insights about choosing the right program for analytics and big data. :) Regards, Darshana

Ruthger

Hi Kunal, Very nice and clear article! What I actually missed were basic Unix Shell programming skills. It can be extremely useful to know how to use commands like grep, awk and sed etc to perform essential data cleaning and pre-processing of the data before bringing these data as for ex. a .csv file into Excel or R. Could you expand a bit on what you feel are the advantages of QlikView / Tableau / D3.js beyond for example making the graphics in R? All the best! Ruthger

Kunal Jain

There are 2 situations where I think a data visualization tool can come in very handy:

1. Understanding and exploring huge data. For example, while working on the Avazu CTR Kaggle problem, we were working with 7GB of data with anonymized columns. It was becoming time consuming to load this data in R and perform exploratory analysis. With QlikView, we could load the entire data in less than 5 minutes and then perform exploratory analysis very quickly. What helps is the quick slice and dice and the drill-throughs available. So you can quickly identify high and low value populations and segregate them in your modeling in R.

2. Delivering your insights to customers. Once your analysis is complete, you can use the story-telling feature of these visualization tools to present your findings. You can bookmark the graphs and access them quickly on the go. If people want to explore additional information, it is typically far easier to do so than opening RStudio and then writing / running the code.

Hope this helps you answer the question. Regards, Kunal
