Understand Machine Learning and Its End-to-End Process

Shanthababu Pandian, Last Updated: 18 Dec, 2020


This article was published as a part of the Data Science Blogathon.

What is Machine Learning?

Machine Learning: Machine Learning (ML) is a highly iterative process in which models learn from past experience by analyzing historical data. Once trained, ML models can identify patterns in that data and use them to make predictions on new, unseen data.
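To make "learning from historical data" concrete, here is a minimal sketch in plain Python. The data (advertising spend vs. units sold) and the choice of a one-variable least-squares line are illustrative assumptions, not part of the original article; in real projects you would normally reach for a library such as scikit-learn.

```python
# Minimal sketch: learn a pattern from historical data, then predict an unseen input.
# The dataset and the simple least-squares line are hypothetical choices for illustration.

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply the learned line to a new input."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical historical data: advertising spend vs. units sold
history_x = [1.0, 2.0, 3.0, 4.0, 5.0]
history_y = [3.0, 5.0, 7.0, 9.0, 11.0]

model = fit_line(history_x, history_y)
print(model)                # (2.0, 1.0)
print(predict(model, 6.0))  # 13.0
```

The "experience" here is just five past observations; the learned slope and intercept are the pattern, and `predict` applies it to a value the model has never seen.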

Why is Machine Learning Important?

Because the 5 V’s (Volume, Variety, Variation, Visibility, and Value) dominate today’s digital world, most industries are building models to analyze their presence and opportunities in the market. Based on the outcomes, they deliver better products and services to their customers at scale.

What are the major Machine Learning applications?

Machine learning (ML) is widely applicable across many industries, both for implementing their processes and for improving them. Today, ML is used in many fields and industries without boundaries. The figure below shows the areas where ML plays a vital role.

Where is Machine Learning in the AI space?

Take a look at the Venn diagram to see where ML sits in the AI space and how it relates to other AI components.

With so much jargon flying around, let’s quickly look at what each component covers.

How are Data Science and ML related?

Machine Learning Process: The first step in the ML process is to collect data from multiple sources, followed by a fine-tuning process on that data. This data then feeds the ML algorithms chosen for the problem statement, such as predictive, classification, and other models available in the ML world. Let us discuss each process one by one.

Machine Learning – Stages: We can split the ML process into 5 stages, as shown in the flow diagram below.

Collection of Data
Data Wrangling
Model Building
Model Evaluation
Model Deployment

Identifying the Business Problems: Before we go through the above stages, we must be clear about the objective of the ML implementation. To find a solution to the identified problem, we must collect the data and follow the stages below appropriately.

Collection of Data

Data can be collected from different sources, internal and/or external, to satisfy the business requirements or problems. Data can be in any format, such as CSV, XML, or JSON. Here, Big Data plays a vital role in making sure the right data is available in the expected format and structure.
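As a small illustration of bringing differently formatted sources into one structure, the sketch below parses CSV and JSON with Python's standard `csv` and `json` modules and normalizes the records to consistent types. The field names (`customer_id`, `age`, `city`) and the inline sample data are hypothetical; in practice the inputs would come from files, databases, or APIs, and XML could be handled similarly with `xml.etree.ElementTree`.

```python
import csv
import io
import json

# Hypothetical raw inputs from two different sources
csv_text = "customer_id,age,city\n1,34,London\n2,,Chennai\n"
json_text = '[{"customer_id": 3, "age": 41, "city": "Pune"}]'

def normalize(record):
    """Coerce fields to consistent types; missing ages become None."""
    age = record.get("age")
    return {
        "customer_id": int(record["customer_id"]),
        "age": int(age) if age not in (None, "") else None,
        "city": record.get("city"),
    }

# CSV values arrive as strings, JSON values keep their native types;
# normalizing both gives one uniform list of records.
records = [normalize(r) for r in csv.DictReader(io.StringIO(csv_text))]
records += [normalize(r) for r in json.loads(json_text)]
print(records)
```

Note that the empty `age` in the CSV is kept as `None` rather than silently dropped, so the missing value can be handled explicitly in the wrangling stage.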

Data Wrangling and Data Processing: The main objectives and focus of this stage are as follows.

Data Processing (Exploratory Data Analysis, EDA):

Understanding the dataset and helping to clean it up.
Gaining a better understanding of the features and the relationships between them.
Extracting essential variables and removing non-essential ones.
Handling missing values and human errors.
Identifying outliers.
Maximizing the insights gained from the dataset.
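A few of these EDA checks (counting missing values, basic summary statistics, and flagging outliers) can be sketched with Python's `statistics` module. The `ages` column is hypothetical, and the outlier rule used here, Tukey's 1.5 × IQR fences, is one common choice among several rather than something the article prescribes.

```python
import statistics

# Hypothetical "age" column: one missing value (None) and one suspicious entry (230)
ages = [34, 29, None, 41, 38, 35, 230, 31]

def eda_summary(values):
    """Basic profile of a numeric column: size, missing count, mean, median."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
    }

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)  # requires Python 3.8+
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < low or v > high]

print(eda_summary(ages)["missing"])  # 1
print(iqr_outliers(ages))            # [230]
```

Notice how the single outlier drags the mean well above the median; comparing the two is a quick EDA signal that a column is skewed or contains bad entries.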

Feature engineering:

Handling missing values in the variables.
Converting categorical variables into numerical ones, since most algorithms need numerical features.
Correcting non-Gaussian (non-normal) variables, since many linear models work best when the variables (more precisely, the residuals) are approximately Gaussian.
Handling outliers in the data: either truncate the values above a threshold or transform the data, for example with a log transformation.
Scaling the features, so that all features get equal importance and the ones with larger values do not dominate.
Feature engineering is an expensive and time-consuming process.
Feature engineering can be a manual process, or it can be automated.
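The feature engineering steps above can be sketched as small standard-library functions. The specific choices here (median imputation, one-hot encoding, capping at a threshold, `log(1 + x)`, and min-max scaling to [0, 1]) are common options picked for illustration; libraries such as pandas and scikit-learn provide more complete equivalents.

```python
import math
import statistics

def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]

def one_hot(categories):
    """Encode a categorical column as 0/1 indicator columns, one per level."""
    levels = sorted(set(categories))
    rows = [[1 if c == level else 0 for level in levels] for c in categories]
    return levels, rows

def cap(values, threshold):
    """Truncate values above a threshold (one way to tame outliers)."""
    return [min(v, threshold) for v in values]

def log_transform(values):
    """log(1 + x) compresses large values and reduces right skew; assumes x >= 0."""
    return [math.log1p(v) for v in values]

def min_max_scale(values):
    """Rescale to [0, 1] so features with large ranges do not dominate."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]  # assumes hi != lo

incomes = impute_median([20000, None, 40000, 60000])
print(incomes)                                # [20000, 40000, 40000, 60000]
print(min_max_scale(incomes))                 # [0.0, 0.5, 0.5, 1.0]
print(one_hot(["London", "Pune", "London"]))  # (['London', 'Pune'], [[1, 0], [0, 1], [1, 0]])
```

In a real pipeline, statistics such as the median, minimum, and maximum should be computed on the training data only and then reused on the test data, so no information leaks from the test set.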

Training and Testing:

The training data is used to make sure the machine recognizes patterns in the data, and cross-validation is used to ensure better accuracy and efficiency of the algorithm used to train the machine.
Test data is used to see how well the machine can predict new answers based on its training.
The train-test split procedure is used to estimate the performance of ML algorithms when they make predictions on data that was not used to train the model.
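Cross-validation is mentioned above without detail; one common form is k-fold cross-validation, sketched below. The data is split into k folds, each fold serves once as the validation set, and the scores are averaged. The baseline "model" (predict the training mean) and the MAE score are hypothetical stand-ins chosen so the example stays self-contained.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once.

    Samples are taken in order here; shuffling first with a fixed seed is common.
    """
    # Spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

def cross_validate(ys, k):
    """Average validation MAE of a baseline that predicts the training mean."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(ys), k):
        train_mean = sum(ys[i] for i in train_idx) / len(train_idx)
        mae = sum(abs(ys[i] - train_mean) for i in val_idx) / len(val_idx)
        scores.append(mae)
    return sum(scores) / len(scores)

for train_idx, val_idx in k_fold_indices(10, 3):
    print(len(train_idx), len(val_idx))  # 6 4, then 7 3, then 7 3
```

Averaging over several folds gives a more stable performance estimate than a single split, at the cost of training the model k times.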

Training

Training data is the dataset on which you train the model.
Training data is the data from which the model learns its experience.
Training sets are used to fit and tune your models.

Testing

Test data is used to check whether the model has learned well enough from the experience it gained on the training dataset.
Test sets are “unseen” data used to evaluate your models.

Train data: used to train the machine learning algorithm.
Test data: after training the model, the test data is used to check the model’s efficiency and performance.

The purpose of the random state in a train-test split: the random state ensures that the splits you generate are reproducible. The value you provide is used as the seed for the random number generator, which ensures that the random numbers are generated in the same order every time.
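A quick way to see what seeding buys you: with Python's `random.Random`, the same seed always produces the same shuffle. The function name `shuffled_indices` is hypothetical; libraries such as scikit-learn expose the same idea through a `random_state` parameter.

```python
import random

def shuffled_indices(n, random_state):
    """Return a shuffled list of 0..n-1 using a generator seeded with random_state."""
    rng = random.Random(random_state)  # independent generator; global state untouched
    idx = list(range(n))
    rng.shuffle(idx)
    return idx

# Same seed -> same order, so the resulting split is reproducible across runs
print(shuffled_indices(10, 42) == shuffled_indices(10, 42))  # True
```

Using a dedicated `random.Random` instance, rather than seeding the global generator, keeps the split reproducible even if other code also draws random numbers.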

Data Split into Training/Testing Set

In machine learning, we split a dataset into training data and test data.
A common split is 80% of the data for training and 20% for testing.
The larger portion of the data is used to train your model.
The rest is used to evaluate the trained model.
You must not mix or reuse the same data for both training and testing.
If you evaluate your model on the same data you used to train it, the score hides overfitting: the model may look excellent yet fail to predict new data.
Therefore, you should have separate training and test subsets of your dataset.
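The split described above can be sketched in a few lines: shuffle with a seeded generator, then cut off the test fraction. The function `split_train_test` is a hypothetical helper written for illustration; scikit-learn's `sklearn.model_selection.train_test_split` implements the same idea with its `test_size` and `random_state` parameters.

```python
import random

def split_train_test(rows, test_size=0.2, random_state=None):
    """Shuffle rows and split them into disjoint (train, test) lists."""
    rng = random.Random(random_state)
    shuffled = rows[:]            # copy so the caller's data is not reordered
    rng.shuffle(shuffled)
    n_test = round(len(rows) * test_size)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(100))
train, test = split_train_test(data, test_size=0.2, random_state=42)
print(len(train), len(test))       # 80 20
print(set(train).isdisjoint(test))  # True
```

Because the two lists come from non-overlapping slices of one shuffled copy, no row can appear in both the training and the test set.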

MODEL EVALUATION: Each type of model has its own evaluation methodology; some of the most common metrics are listed below.

Evaluating the Regression Model.
1. Sum of Squared Error (SSE)
2. Mean Squared Error (MSE)
3. Root Mean Squared Error (RMSE)
4. Mean Absolute Error (MAE)
5. Coefficient of Determination (R²)
6. Adjusted R²
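The regression metrics listed above follow directly from the residuals (actual minus predicted). Below is a minimal sketch using the standard definitions; `n_features` (the number of predictors, needed for adjusted R²) and the sample values are assumptions for illustration.

```python
import math

def regression_metrics(y_true, y_pred, n_features):
    """SSE, MSE, RMSE, MAE, R² and adjusted R² from standard definitions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(r ** 2 for r in residuals)           # Sum of Squared Errors
    mse = sse / n                                  # Mean Squared Error
    rmse = math.sqrt(mse)                          # Root Mean Squared Error
    mae = sum(abs(r) for r in residuals) / n       # Mean Absolute Error
    mean_y = sum(y_true) / n
    sst = sum((t - mean_y) ** 2 for t in y_true)   # total sum of squares
    r2 = 1 - sse / sst
    # Adjusted R² penalizes extra predictors
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    return {"SSE": sse, "MSE": mse, "RMSE": rmse, "MAE": mae,
            "R2": r2, "AdjR2": adj_r2}

m = regression_metrics([3, 5, 7, 9], [2, 5, 8, 9], n_features=1)
print(m["SSE"], m["MSE"], m["MAE"])  # 2 0.5 0.5
```

Here SSE = 2 and the total sum of squares is 20, so R² = 1 − 2/20 = 0.9; with one predictor and four samples, adjusted R² drops to 0.85.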
Evaluating the Classification Model.
1. Confusion Matrix.
2. Accuracy Score.
3. AUC and ROC.
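For a binary classifier, the three evaluation tools above can be sketched as follows. The labels, scores, and the 0.5 threshold are hypothetical. AUC is computed here through its probabilistic interpretation (the chance that a randomly chosen positive is scored higher than a randomly chosen negative, with ties counting half), which gives the same value as the area under the ROC curve.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels 0/1."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]                   # hypothetical model scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(confusion_matrix(y_true, y_pred))  # (2, 0, 1, 1)
print(accuracy(y_true, y_pred))          # 0.75
print(roc_auc(y_true, scores))           # 0.75
```

The confusion matrix and accuracy depend on the chosen threshold, whereas AUC is computed from the raw scores and summarizes ranking quality across all thresholds.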

Deployment of an ML model simply means integrating the finalized model into a production environment and using its results to make business decisions.
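One small piece of deployment is persisting the finalized model so a separate production process can load it and serve predictions. The sketch below serializes the parameters of a simple linear model to JSON; the parameter names and the JSON format are assumptions for illustration (real projects often use pickle or joblib files, a model registry, or an API service instead).

```python
import json

def save_model(params):
    """Serialize the finalized model's parameters (JSON chosen for portability)."""
    return json.dumps(params)

def load_model(artifact):
    """Rebuild a prediction function from a serialized artifact."""
    params = json.loads(artifact)

    def predict(x):
        return params["slope"] * x + params["intercept"]

    return predict

# Training side: persist the model (in practice, write to a file or registry)
artifact = save_model({"slope": 2.0, "intercept": 1.0})

# Production side: load the artifact and score incoming requests
predict = load_model(artifact)
print([predict(x) for x in [6.0, 10.0]])  # [13.0, 21.0]
```

Separating the save and load steps mirrors the split between the training environment and the production environment, which only needs the artifact, not the training data or code.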

Shanthababu Pandian

Shanthababu Pandian has over 23 years of IT experience, specializing in data architecting, engineering, analytics, DQ&G, data science, ML, and Gen AI. He holds a BE in electronics and communication engineering and three Master’s degrees (M.Tech, MBA, M.S.) from a prestigious Indian university. He has completed postgraduate programs in AIML from the University of Texas and data science from IIT Guwahati. He is a director of data and AI in London, UK, leading data-driven transformation programs focusing on team building and nurturing AIML and Gen AI. He helps global clients achieve business value through scalable data engineering and AI technologies. He is also a national and international speaker, author, technical reviewer, and blogger.
