How much Mathematics do you need to know for Machine Learning?

KAVITA Last Updated : 31 Mar, 2023

This article was published as a part of the Data Science Blogathon

No matter how long-running a love-hate relationship you have with maths, understanding its core concepts is essential for designing Machine Learning Models and making strategic decisions. Mathematics for Machine Learning is a prerequisite for building a career in Data Science and AI, so embracing its concepts and implementing them in your future work is crucial.

Machine learning is built on mathematics, which in turn helps in creating an ML algorithm that can learn from the data provided to make accurate predictions. The prediction might be as simple as classifying cats or dogs from a given set of images, or deciding what kind of products to recommend to a customer based on past purchases. Having a proper understanding of the mathematics behind ML algorithms will help you choose the right algorithms for your data science and machine learning projects.

Once you understand why maths is used, you’ll find it more interesting. You’ll also understand why we pick one machine learning algorithm over another and how that choice affects the performance of the machine learning model.

We will try to cover the following points in this blog post:

Vectors and Vector Spaces
Linear Transformations and Matrices

Vectors and Vector Spaces

The ability to visualize data is one of the most useful skills to possess as a data science professional, and a solid foundation in linear algebra enables one to do that. Some concepts and algorithms are quite easy to understand if one can visualize them as vectors and matrices, rather than looking at the data as lists and arrays of numbers.

Linear Algebra is the workhorse of Data Science and ML. While training a machine learning model using a library (such as in R or Python), much of what happens behind the scenes is a series of matrix operations. One of the most popular deep learning libraries today, TensorFlow, is essentially an optimized (i.e. fast and reliable) matrix manipulation library. scikit-learn, the Python library for machine learning, likewise relies heavily on matrix operations.

Vectors

A vector is an object that has both magnitude and direction. Vectors are usually represented in two ways: as ordered lists, such as $x = [x_1, x_2, \ldots, x_n]$, or using the ‘hat’ notation, such as $x = x_1\hat{i} + x_2\hat{j} + x_3\hat{k}$, where $\hat{i}$, $\hat{j}$, $\hat{k}$ represent the three perpendicular directions (or axes).

The number of elements in a vector is the dimensionality of the vector. For example, $x = [x_1, x_2]$ is a two-dimensional (2-D) vector, $x = [x_1, x_2, x_3]$ is a 3-D vector, and so on.

The magnitude of a vector is the distance of its tip from the origin. For an n-dimensional vector $x = [x_1, x_2, \ldots, x_n]$, the magnitude is given by

$\|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$

A unit vector is one whose distance from the origin is exactly 1 unit. E.g. the vectors $\hat{i}$, $\hat{j}$ and $\frac{\hat{i}}{\sqrt{2}} + \frac{\hat{j}}{\sqrt{2}}$ are unit vectors.
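
The definitions above can be tried out in a few lines of NumPy. This is only a minimal sketch: the helper names `magnitude` and `to_unit_vector` are made up for illustration and are not part of any library.

```python
import numpy as np

def magnitude(x):
    """Euclidean length: square root of the sum of squared elements."""
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(np.sum(x ** 2)))

def to_unit_vector(x):
    """Scale x so that its magnitude becomes 1 (undefined for the zero vector)."""
    x = np.asarray(x, dtype=float)
    length = magnitude(x)
    if length == 0.0:
        raise ValueError("the zero vector has no direction")
    return x / length

x = np.array([3.0, 4.0])      # a 2-D vector
print(x.size)                 # dimensionality: 2
print(magnitude(x))           # 5.0
print(to_unit_vector(x))      # points the same way as x, with length 1

# i/sqrt(2) + j/sqrt(2) from the text is a unit vector
diag = np.array([1.0, 1.0]) / np.sqrt(2)
print(np.isclose(magnitude(diag), 1.0))   # True
```

NumPy also ships `np.linalg.norm`, which computes the same magnitude; the explicit version is shown only to mirror the formula.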

Vector Operations

1. Vector Addition/Subtraction: It is the element-wise sum/difference of two vectors.

Mathematically,

$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \pm \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1 \pm y_1 \\ x_2 \pm y_2 \\ \vdots \\ x_n \pm y_n \end{bmatrix}$

2. Scalar Multiplication/Division: It is the multiplication/division of every element of the vector by a scalar value.

Mathematically,

$a \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} a \, x_1 \\ a \, x_2 \\ \vdots \\ a \, x_n \end{bmatrix}$

3. Vector Multiplication or Dot Product: It is the sum of the element-wise products of the two vectors. The dot product of two vectors returns a scalar quantity. Mathematically,

$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \cdot \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$

Geometrically,

$\vec{x} \cdot \vec{y} = \|x\| \, \|y\| \cos\theta$

where $\theta$ is the angle between the two vectors.

The dot product of two perpendicular vectors (also called orthogonal vectors) is 0. The dot product can be used to compute the angle between two vectors using the formula,

$\cos\theta = \frac{\vec{x} \cdot \vec{y}}{\|x\| \, \|y\|}$

This simple property of the dot product is extensively used in data science applications.
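
The vector operations above map directly onto NumPy arrays. The following is a small sketch; the example vectors and the helper `angle_between` are made up for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

print(x + y)          # element-wise sum
print(x - y)          # element-wise difference
print(2.0 * x)        # scalar multiplication
print(x / 2.0)        # scalar division
print(np.dot(x, y))   # dot product: 1*4 + 2*5 + 3*6 = 32.0

def angle_between(u, v):
    """Angle in radians, from cos(theta) = u.v / (||u|| ||v||)."""
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clipping guards against rounding slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

i_hat = np.array([1.0, 0.0])
j_hat = np.array([0.0, 1.0])
print(np.dot(i_hat, j_hat))                      # 0.0, so the vectors are perpendicular
print(np.degrees(angle_between(i_hat, j_hat)))   # a right angle, 90 degrees
```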

Vector Spaces

Basis: A basis of a vector space V is a set of vectors (v₁, v₂, . . ., vₙ) in V that are linearly independent and span V. Consequently, if (v₁, v₂, . . ., vₙ) is a list of vectors in V, then these vectors form a basis if and only if every v in V can be uniquely written as $v = a_1 v_1 + a_2 v_2 + \cdots + a_n v_n$, where $a_1, \ldots, a_n$ are scalars. The individual vectors in a basis are called basis vectors.
Span: The span of two or more vectors is the set of all possible vectors that one can get by scaling the vectors by arbitrary scalars and adding the results.
Linear Combination: A linear combination of two vectors is the sum of the scaled vectors, e.g. $a v_1 + b v_2$ for scalars a and b.
Linearly Dependent: A set of vectors is called linearly dependent if any one or more of the vectors can be expressed as a linear combination of the other vectors.
Linearly Independent: If none of the vectors in a set can be expressed as a linear combination of the other vectors, the vectors are called linearly independent.
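
One way to check these definitions numerically is sketched below. It relies on the matrix rank, a concept this post does not cover: the rank of a matrix counts how many of its columns are linearly independent. The helper name `is_linearly_independent` and the example vectors are made up for illustration.

```python
import numpy as np

def is_linearly_independent(vectors):
    """True if no vector is a linear combination of the others.

    Stacks the vectors as columns and compares the matrix rank
    with the number of vectors.
    """
    matrix = np.column_stack(vectors)
    return bool(np.linalg.matrix_rank(matrix) == len(vectors))

v1 = np.array([1.0, 0.0])
v2 = np.array([0.0, 1.0])
v3 = np.array([2.0, 3.0])     # equals 2*v1 + 3*v2

print(is_linearly_independent([v1, v2]))       # True
print(is_linearly_independent([v1, v2, v3]))   # False

# (v1, v2) is a basis of the plane, so the scalars that express
# v3 as a linear combination of v1 and v2 are unique.
coeffs = np.linalg.solve(np.column_stack([v1, v2]), v3)
print(coeffs)   # the scalars 2 and 3
```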

Linear Transformations and Matrices:

Matrices are a time-tested and powerful data structure used to perform numerical computations. Briefly, a matrix is a collection of values stored as rows and columns, i.e.

$A = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix}$

Matrices:

Rows: Rows are horizontal. The matrix A has m rows. Each row itself is a vector, so they are also called row vectors.
Columns: Columns are vertical. The matrix A has n columns. Each column itself is a vector, so they are also called column vectors.
Entries: Entries are the individual values in a matrix. For a given matrix A, the value in row i and column j is written as $A_{ij}$.
Dimensions: The number of rows and columns. For m rows and n columns, the dimensions are (m × n).
Square Matrices: These are matrices where the number of rows is equal to the number of columns, i.e m = n.
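Diagonal Matrices: These are square matrices where all the off-diagonal elements are zero, i.e. $A_{ij} = 0$ whenever $i \neq j$, for example $D = \begin{bmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{bmatrix}$.
Identity Matrices: These are diagonal matrices where all the diagonal elements are 1, for example $I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.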
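
The terms in this list correspond to simple NumPy operations. The sketch below uses made-up values, and the helper `is_square` is hypothetical. Note that NumPy counts rows and columns from 0, whereas the $A_{ij}$ notation counts from 1.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])     # m = 2 rows, n = 3 columns

print(A.shape)    # dimensions (m, n): (2, 3)
print(A[0, :])    # first row vector
print(A[:, 1])    # second column vector
print(A[1, 2])    # entry in row 2, column 3: 6

D = np.diag([2, 5, 7])    # diagonal matrix: every off-diagonal entry is 0
I = np.eye(3)             # 3 x 3 identity matrix

def is_square(M):
    """A matrix is square when it has as many rows as columns."""
    return M.shape[0] == M.shape[1]

print(is_square(A), is_square(D))   # False True
```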

Matrix Operations

Matrix Addition/Subtraction: It is the element-wise sum/difference of two matrices of the same dimensions. Mathematically, $(A \pm B)_{ij} = A_{ij} \pm B_{ij}$.
Scalar Multiplication/Division: It is the multiplication/division of every entry of the matrix by a scalar value. Mathematically, $(cA)_{ij} = c \, A_{ij}$.
Matrix Multiplication or Dot Product: The (i, j) element of the output matrix is the dot product of the i-th row of the first matrix and the j-th column of the second matrix. Mathematically, $(AB)_{ij} = \sum_{k} A_{ik} B_{kj}$. Not all matrices can be multiplied with each other. For the matrix multiplication AB to be valid, the number of columns in A should be equal to the number of rows in B, i.e. for two matrices A and B with dimensions (m × n) and (o × p), AB exists if and only if n = o, and BA exists if and only if p = m. Matrix multiplication is not commutative, i.e. in general $AB \neq BA$.
Matrix Inverse: The inverse of a square matrix A is the matrix $A^{-1}$ such that $A A^{-1} = A^{-1} A = I$ (the identity matrix). Not every square matrix has an inverse.
Matrix Transpose: The transpose of a matrix produces a matrix in which the rows and columns are interchanged. Mathematically, $(A^T)_{ij} = A_{ji}$.
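
The matrix operations listed above can be sketched in NumPy as follows. The matrices are made-up examples; `@` is NumPy's matrix-multiplication operator.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

print(A + B)      # element-wise sum
print(3.0 * A)    # every entry multiplied by the scalar 3
print(A @ B)      # matrix product: rows of A dotted with columns of B
print(A.T)        # transpose: rows and columns interchanged

# Matrix multiplication is not commutative in general.
print(np.array_equal(A @ B, B @ A))        # False

# Shapes must be compatible: (2 x 3) times (3 x 4) gives (2 x 4).
C = np.ones((2, 3))
E = np.ones((3, 4))
print((C @ E).shape)                       # (2, 4)

# This particular A is invertible, so A times its inverse is the identity.
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))   # True
```

Trying `E @ C` instead raises a `ValueError`, because E has 4 columns while C has only 2 rows.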

Linear Transformations:

Any transformation can be geometrically visualized as the distortion of the n-dimensional space (it can be squishing, stretching, rotating, etc.). The distortion of space can be visualized as a distortion of the grid lines that make up the coordinate system. Space can be distorted in several different ways. A linear transformation, however, is a special distortion with two distinct properties,

Straight lines remain straight, and parallel lines remain parallel to each other
The origin remains fixed

Consider a linear transformation where the original basis vectors $\hat{i}$ and $\hat{j}$ move to the new points $\hat{i} = [1, -2]$ and $\hat{j} = [3, 0]$

(where $\hat{i}$ and $\hat{j}$ are the unit vectors along the x-direction and y-direction of the coordinate system, respectively). This means that $\hat{i}$ moves from (1, 0) to (1, −2) and $\hat{j}$ moves from (0, 1) to (3, 0) in the linear transformation. The transformation both stretches and rotates the space: $\hat{i}$ is stretched to length $\sqrt{5}$ and turned by roughly 63 degrees in the clockwise direction, while $\hat{j}$ is stretched to length 3 and turned by 90 degrees in the clockwise direction. One can combine the two vectors where $\hat{i}$ and $\hat{j}$ land and write them as a single matrix, i.e.

$L = \begin{bmatrix} 1 & 3 \\ -2 & 0 \end{bmatrix}$

As can be seen, each of these vectors forms one column of the matrix (and hence are often called column vectors). This matrix fully represents the linear transformation. Now, if one wants to find where any given vector v would land after this transformation, one simply needs to multiply the vector v with the matrix L, i.e v_new = L.v. It is convenient to think of this matrix as a function that describes the transformation, i.e it takes the original vector v as the input and returns the new vector v_new. The following figures represent the linear transformation.
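
As a rough sketch of this idea, the matrix L from the example can be applied to vectors with NumPy. The test vector `v` and the function name `transform` are made up for illustration.

```python
import numpy as np

# The columns of L are the points where i-hat and j-hat land.
L = np.array([[1.0, 3.0],
              [-2.0, 0.0]])

i_hat = np.array([1.0, 0.0])
j_hat = np.array([0.0, 1.0])
print(L @ i_hat)    # lands at (1, -2), the first column of L
print(L @ j_hat)    # lands at (3, 0), the second column of L

def transform(v):
    """The matrix viewed as a function: takes v, returns v_new = L.v."""
    return L @ v

v = np.array([2.0, 1.0])    # v = 2 * i_hat + 1 * j_hat
v_new = transform(v)
print(v_new)                # 2 * (1, -2) + 1 * (3, 0) = (5, -4)
```

The last line shows why the two columns are enough to describe the whole transformation: any vector is a linear combination of the basis vectors, and it lands on the same combination of the columns.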

Formally, a transformation is linear if it satisfies the following two properties,

Additivity or Distributivity, i.e L(v + w) = L(v) + L(w) .
Associativity or Homogeneity, i.e L(cv) = cL(v) where c is a scalar.
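
Both properties can be spot-checked numerically. The sketch below tests them on random vectors; passing such a check is evidence rather than a proof, and the names `apply_L`, `shift` and `looks_linear` are hypothetical helpers.

```python
import numpy as np

L = np.array([[1.0, 3.0],
              [-2.0, 0.0]])

def apply_L(v):
    """Multiplication by a matrix, which is a linear transformation."""
    return L @ v

def shift(v):
    """Not linear: it moves the origin to (1, 1)."""
    return v + np.array([1.0, 1.0])

def looks_linear(f, trials=100, seed=0):
    """Test additivity and homogeneity of f on random 2-D vectors."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        v, w = rng.normal(size=2), rng.normal(size=2)
        c = rng.normal()
        additive = np.allclose(f(v + w), f(v) + f(w))
        homogeneous = np.allclose(f(c * v), c * f(v))
        if not (additive and homogeneous):
            return False
    return True

print(looks_linear(apply_L))   # True
print(looks_linear(shift))     # False
```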

End!!!!

I hope you enjoyed the article! There are plenty of valuable resources available on the internet that explain concepts like matrix decompositions, vector calculus, linear algebra, geometry, matrices, the mathematics behind principal component analysis and support vector machines, and many more. The following links may help you to understand the mathematical concepts:

Khan Academy’s courses – Comprehensive free courses for complex mathematical concepts.
3Blue1Brown – Here you will understand most of the mathematical concepts in depth.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

KAVITA

A Mathematics student turned Data Scientist. I am an aspiring data scientist who aims at learning all the necessary concepts in Data Science in detail. I am passionate about Data Science knowing data manipulation, data visualization, data analysis, EDA, Machine Learning, etc which will help to find valuable insights from the data.
