Deep learning can be a complex and daunting field for newcomers. Concepts like hidden layers, convolutional neural networks, and backpropagation keep coming up as you try to grasp deep learning topics.
It’s not easy – especially if you take an unstructured learning path and don’t cover the fundamental concepts first. You’ll be stumbling around a foreign city like a tourist without a map!
Here’s the good news – you don’t need an advanced degree or a Ph.D. to learn and master deep learning. But there are certain key concepts you should know (and be well versed in) before you plunge into the deep learning world.
I’ll be covering five such essential concepts in this article. I also recommend going through the below resources to augment your deep learning experience:
Let’s go over them one by one.
For learning a new skill, say cooking, you would first need to have all the equipment. You would need tools like a knife, a cooking pan, and of course, a gas stove! You would also need to know how to use the tools given to you.
Similarly, it is important to set up your system for deep learning, have some knowledge of the tools you would need, and how to use them.
Regardless of your operating system, Windows, Linux or Mac, it is important to know the basic commands. Here is a handy table for your reference:
Here is a great tutorial to get started with Git and the basic Git commands: Git – Tutorial.
The Deep learning boom has not only brought path-breaking research in the field of AI but has also broken new barriers in computer hardware.
You would need a GPU to work with image and video data for most deep learning projects. You can build a deep learning model on your laptop/PC without the GPU as well, but then it would be extremely time-consuming to do. The main advantages a GPU has to offer are:
Here is a great video explaining the difference between a GPU and CPU:
The best part? You don’t need to buy a GPU or get one installed on your machine. There are multiple cloud computing platforms that provide GPUs either for free or for an extremely low cost. Additionally, a few of these platforms come with practice datasets and their own tutorials preloaded. Some of them are Paperspace Gradient, Google Colab, and Kaggle Kernels.
On the other hand, there are full-fledged servers as well which require some installation steps and some customization like Amazon Web Services EC2.
Here is a table illustrating the options you have:
Deep Learning has also led to Google developing its own type of Processing Units exclusively catering towards building neural networks and deep learning tasks – TPUs.
TPU, or Tensor Processing Unit, is essentially a co-processor which you use with the CPU. For many neural network workloads, a TPU can be faster and more cost-effective than a GPU, which helps make building deep learning models affordable.
Google Colab also provides free usage of the TPU (not its full-fledged enterprise version, but a cloud version). Here is Google’s own Colab tutorial on working with TPUs and building models on them: Colab notebooks | Cloud TPU.
To summarize, here are the basic minimum hardware requirements to start cooking your deep learning model:
Continuing the same analogy of learning to cook, you have now got the hang of operating a knife and a gas stove. But what about the skills and the recipes needed to actually cook food?
This is where we encounter the software required for deep learning. Python is a programming language that is used across industries for deep learning.
However, we can’t use only Python for the level of computations and operations that deep learning needs. Additional functionalities are provided by what are known as libraries in Python. A library can have hundreds of small tools, called functions, that we can use for programming.
Anaconda is a Python distribution that helps you keep track of your Python versions and libraries. It is a handy all-in-one tool that is quite popular, easy to work with, and has simple documentation as well. Here is how you can install Anaconda.
So what do I mean by the basics of Python? Let’s discuss this in a bit more detail.
Note: You can start learning Python in our free and popular course – Python for Data Science.
The main data types in Python are:
There are 5 main types of operators in Python:
Python offers a variety of data structures that we can use for different purposes. Each data structure has its unique properties that we can leverage to store different kinds of data. These properties are:
In data science, the most frequently used data structures are:
Example: We have a list like this:
my_list = [1, 3, 7, 9]
This order will remain the same everywhere we use this list. Also, we can change this list, like removing 7, adding 11, etc.
Example: A tuple can be declared as:
my_tuple = ("apple", "banana", "cherry")
Now, again, this order will remain the same, but unlike a list, we cannot remove ‘cherry’, or add ‘orange’ to the tuple.
Example: A set uses curly braces like this:
my_set = {'apple', 'banana', 'cherry'}
The order is not defined for a set.
Example: A dictionary also uses curly braces with a key-value format:
my_dict = { "brand": "Ford", "model": "Mustang", "year": 1964}
Here, ‘brand’, ‘model’, and ‘year’ are the keys that have the values ‘Ford’, ‘Mustang’, and 1964 respectively. Since Python 3.7, a dictionary preserves the order in which its keys were inserted, but we look up values by key rather than by position.
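To see these properties in action, here is a minimal sketch that reuses the example values from above. The specific changes made to each structure (removing 7, adding 11, trying to replace ‘cherry’, updating the year) are just illustrations.

```python
# List: ordered and mutable
my_list = [1, 3, 7, 9]
my_list.remove(7)       # remove the value 7
my_list.append(11)      # add 11 at the end
print(my_list)          # [1, 3, 9, 11]

# Tuple: ordered but immutable
my_tuple = ("apple", "banana", "cherry")
try:
    my_tuple[2] = "orange"   # not allowed for a tuple
except TypeError as err:
    print("Tuples cannot be changed:", err)

# Set: unordered, duplicates are dropped automatically
my_set = {"apple", "banana", "cherry", "apple"}
print(len(my_set))      # 3

# Dictionary: values are looked up and updated by key
my_dict = {"brand": "Ford", "model": "Mustang", "year": 1964}
my_dict["year"] = 1965
print(my_dict["model"], my_dict["year"])   # Mustang 1965
```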
Control flow means controlling the flow of the execution of your code. We execute the code line by line, and what we execute on a line affects how we write the next line of code:
These are used to set a condition with the conditional operators we saw earlier.
Example: You need to check if a student has passed or failed. If he has obtained marks >= 40, he has passed, otherwise, he has failed.
In that case, our conditional statement would be:
if marks >= 40:
    print("Pass")
else:
    print("Fail")
Example: We have a list having values from 1 to 5 and we need to multiply each value in this list with 3:
numbers_list = [1, 2, 3, 4, 5]
for each_number in numbers_list:
    print(each_number * 3)
Try out the above snippets and you can see how easy Python is!
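The two ideas combine naturally. Here is a small sketch that loops over a list of marks and applies the pass/fail condition to each one. The marks themselves are made-up values for illustration.

```python
# Hypothetical marks of five students
marks_list = [35, 40, 78, 12, 91]

results = []
for marks in marks_list:
    # same condition as before: 40 or more is a pass
    if marks >= 40:
        results.append("Pass")
    else:
        results.append("Fail")

print(results)   # ['Fail', 'Pass', 'Pass', 'Fail', 'Pass']
```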
Fun note: Unlike in many statically typed programming languages, we don’t need to store values of the same type in a data structure. We can totally have a list like this ["John", 153, 78.5, "A+"] or even a list of lists like [["A", 56], ["B", 36.5]]. It is this variety and flexibility of Python that has made it so popular among data scientists!
You can also avail the below free courses that cover Python and Pandas essentials:
This is one of the first libraries you would come across when you start Machine Learning and Deep Learning. An extremely popular library, Pandas is just as required for deep learning as for machine learning.
We store data in a variety of formats, such as CSV (Comma Separated Values) file, Excel sheets, etc. In order to work with the data in these files, Pandas provides a data structure called a Pandas dataframe (you can think of it as a table).
Dataframes and the sheer number of manipulation operations Pandas provides on dataframes make it the workhorse library for machine and deep learning.
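Here is a minimal sketch of what working with a dataframe looks like. The student names and marks are made up for illustration. In practice you would usually load the table from a file, for example with pd.read_csv.

```python
import pandas as pd

# Hypothetical data: names and marks of four students
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Dina"],
    "marks": [72, 35, 88, 40],
})

# Add a derived column: did the student score at least 40?
df["passed"] = df["marks"] >= 40

# Filter rows and compute a summary statistic
passed_names = df.loc[df["passed"], "name"].tolist()
average_marks = df["marks"].mean()

print(df)
print(passed_names)    # ['Asha', 'Chen', 'Dina']
print(average_marks)   # 58.75
```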
You can take this free and easy course to get started with Pandas if you haven’t already: Pandas for Data Analysis in Python.
Now, if you read the list of 5 things we started out with, you might have a question: What do we do with all the mathematics in deep learning?
Well, let’s find out!
There is a common myth that Deep Learning requires advanced knowledge of linear algebra and calculus. Well, let me dispel that myth right here.
You only need to recollect your high school-level math to start your Deep Learning journey!
Let us take a simple example. We have images of cats and dogs and we want the machine to tell us which animal is present in any given image:
Now, we can easily identify the cat and the dog here. But how will the machine distinguish the two? The only way is to give this data to the model in the form of numbers, and that is where we need linear algebra. We basically convert the images of a cat and a dog into numbers. These numbers can be either expressed as vectors or as matrices.
We will cover some key terms and some great resources you can learn from.
1. Scalars and vectors: While scalars only have magnitude, vectors have both direction and magnitude.
Example: If we have 2 vectors a = [1, -3, 5] and b = [4, -2, -1], then:
a) Dot product:
a . b = (a1 * b1) + (a2 * b2) + (a3 * b3) = (1 * 4) + (-3 * -2) + (5 * -1) = 4 + 6 - 5 = 5
b) Cross product:
a X b = [c1, c2, c3] = [13, 21, 10]
where,
c1 = (a2*b3) - (a3*b2)
c2 = (a3*b1) - (a1*b3)
c3 = (a1*b2) - (a2*b1)
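You can check both products yourself with a few lines of plain Python. This is only a sketch: the helper names dot and cross are my own, and in real projects you would typically use a numerical library instead.

```python
a = [1, -3, 5]
b = [4, -2, -1]

def dot(u, v):
    # multiply matching components and add them up
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    # component formulas for 3-dimensional vectors
    return [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]

print(dot(a, b))     # 5
print(cross(a, b))   # [13, 21, 10]
```

A handy sanity check: the cross product is perpendicular to both input vectors, so dot(a, cross(a, b)) and dot(b, cross(a, b)) are both 0.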
2. Matrices and Matrix Operations: A matrix is an array of numbers in the form of rows and columns. Now, for example, the above image of a cat can be written as a matrix of pixels:
Just like numbers, we can perform operations like adding and subtracting two matrices. However, operations like multiplication and division are performed slightly differently from the regular way:
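As a rough sketch, here is matrix addition and multiplication written out in plain Python for two small 2 x 2 matrices. The matrices and the helper names mat_add and mat_mul are made up for illustration.

```python
A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]

def mat_add(X, Y):
    # addition works element by element
    return [[x + y for x, y in zip(row_x, row_y)]
            for row_x, row_y in zip(X, Y)]

def mat_mul(X, Y):
    # each entry is the dot product of a row of X with a column of Y
    columns = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns]
            for row in X]

print(mat_add(A, B))   # [[6, 8], [10, 12]]
print(mat_mul(A, B))   # [[19, 22], [43, 50]]
print(mat_mul(B, A))   # [[23, 34], [31, 46]]
```

Notice that mat_mul(A, B) and mat_mul(B, A) give different results: unlike with regular numbers, the order matters in matrix multiplication.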
You can refer to this excellent Khan Academy course on Linear Algebra to learn the above concepts in detail. You can also check out 10 powerful applications of linear algebra here.
The value we are trying to predict, say, ‘y’, is whether the image is a cat or a dog. This value can be expressed as a function of the input variables/input vectors. Our main aim is to make this predicted value as close as possible to the actual value.
Now, imagine dealing with thousands of cat images and dog images. These are surely cute to look at, but you can imagine that working on these images and numbers is not easy at all!
Since deep learning essentially involves large amounts of data and complex machine learning models, working with both is often time and resource expensive. That is why it is important to optimize our deep learning model in such a way that it is able to predict as accurately as possible without using too many resources and time.
This is where the crux of the calculus used in deep learning lies: Optimization.
In any deep learning or machine learning model, we can express the output as a mathematical function of the input variables. Thus, we need to see how our output changes with changes in each of the input variables. We need derivatives to do this since derivatives express the rate of change.
If y = f(x), then the derivative of y with respect to x is given as dy/dx = change in y / change in x, measured over a vanishingly small change in x.
Geometrically, if we express f(x) as a graph, the derivative at a point is also the slope of the tangent to the graph at that point.
Here is a figure to help you understand it:
The derivative we have seen above talks only of one variable, x. However, in deep learning, there can be hundreds of variables on which our final output, y, depends. In such cases, we need to calculate the rate of change in y with respect to each of these input variables. Here is where partial derivatives come into the picture.
Partial derivatives: Basically, we consider only one variable and keep all the other variables constant. Then, we calculate the derivative of y with respect to that one variable. We repeat this for each variable in turn.
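Here is a small sketch of that idea in code. It estimates each partial derivative numerically by nudging one variable while holding the other fixed. The function f and the helper partial are hypothetical examples, not part of any library.

```python
def f(x1, x2):
    # hypothetical function: y = x1^2 + 3 * x1 * x2
    return x1 ** 2 + 3 * x1 * x2

def partial(func, args, index, h=1e-6):
    # central difference: change only the variable at `index`,
    # keep every other variable constant
    plus = list(args)
    minus = list(args)
    plus[index] += h
    minus[index] -= h
    return (func(*plus) - func(*minus)) / (2 * h)

point = (2.0, 1.0)
dy_dx1 = partial(f, point, 0)   # by hand: 2*x1 + 3*x2 = 7
dy_dx2 = partial(f, point, 1)   # by hand: 3*x1 = 6

print(round(dy_dx1, 4), round(dy_dx2, 4))
```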
Chain Rule: Oftentimes, the function of y in terms of the input variables can be much more complicated. How do we calculate the derivative then? The chain rule helps us compute this:
If y = f(g(x)), where g(x) is a function of x, and f is a function of g(x), then dy/dx = df/dg * dg/dx
Let us consider a relatively simple example:
y = sin(x^2)
Thus, using the Chain Rule:
dy/dx = d(sin(x^2))/d(x^2) * d(x^2)/dx = cos(x^2) * 2x
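If you want to convince yourself that the chain rule result for y = sin(x^2) is right, you can compare it against a numerical estimate of the slope. This is just a sanity-check sketch: the point x0 = 1.5 and the helper names are arbitrary choices of mine.

```python
import math

def y(x):
    return math.sin(x ** 2)

def dy_dx_chain_rule(x):
    # chain rule result: cos(x^2) * 2x
    return math.cos(x ** 2) * 2 * x

def numerical_derivative(func, x, h=1e-6):
    # slope estimated from two nearby points (central difference)
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 1.5
analytic = dy_dx_chain_rule(x0)
numeric = numerical_derivative(y, x0)

print(analytic, numeric)   # the two values agree to several decimal places
```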
Learning Resources for Calculus in Deep Learning:
Just like Linear Algebra, ‘Statistics and Probability’ is its own new world of mathematics. It can be quite intimidating for beginners and even seasoned data scientists sometimes find it challenging to recall advanced statistical concepts.
However, it cannot be denied that statistics forms the backbone of Machine Learning and Deep Learning. Concepts like descriptive statistics and hypothesis testing are crucial in industry, where the interpretability of your deep learning model is often a top priority.
Let us start with the basic definitions:
Let me give you a simple example. Suppose you have the marks scored by 1000 students on an entrance exam (the marks are out of 100). Someone asks you – how did the students perform in this exam? Would you present that person with a detailed study of the scores of the students? In the future, you might, but initially, you can start off by saying that the average score was 68. This is the mean of the data.
Similarly, we can figure out more simple statements based on the data:
There you go – just with these few lines, we can say that a majority of the students performed well, but not many were able to score really high marks in the test. This is what descriptive statistics is. We represented the data of 1000 students using just 5 values.
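Python's built-in statistics module can compute such summary values for you. Here is a minimal sketch on a made-up list of ten marks (not the 1000-student data from the example).

```python
import statistics

# Hypothetical marks of ten students (out of 100)
marks = [35, 42, 55, 68, 68, 71, 74, 80, 85, 92]

mean_marks = statistics.mean(marks)       # average score
median_marks = statistics.median(marks)   # middle value
mode_marks = statistics.mode(marks)       # most frequent score
lowest, highest = min(marks), max(marks)

print(mean_marks)        # 67
print(median_marks)      # 69.5
print(mode_marks)        # 68
print(lowest, highest)   # 35 92
```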
There are other key terms used in descriptive statistics as well, such as:
Based on the same example, let’s say that you are asked a question: if I pick a student randomly from these 1000 students, what are the chances that he/she has passed the test? The concept of probability will help you answer this question. If you get a probability of 0.6, it implies that there is a 60% chance that he/she passed it (assuming the passing criteria is 40 marks).
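In code, this probability is simply the fraction of students who met the passing criterion. A quick sketch on a made-up list of ten marks, assuming a passing mark of 40:

```python
# Hypothetical marks of ten students (out of 100)
marks = [35, 42, 55, 68, 68, 71, 74, 80, 85, 92]
passing_mark = 40

passed = [m for m in marks if m >= passing_mark]

# chance that a randomly picked student has passed
probability_pass = len(passed) / len(marks)

print(probability_pass)   # 0.9
```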
Other questions on the same data (as shown below) can be answered using Hypothesis Testing and Inferential Statistics:
You can learn all about statistics and probability from the below resources:
Here’s the good news – you don’t need to know the entire gamut of the Machine Learning algorithms that exist today. Not to say that they are insignificant, but just from the point of view of starting deep learning, there are not many you need to be acquainted with.
There are, however, a few concepts that are crucial to build your foundation and acquaint yourself with. Let us go over these concepts.
Building a predictive model is not the only step required in deep learning. You need to check how good the model is and keep improving it until you reach the best model you can.
So how do we judge the performance of a deep learning model? We use some evaluation metrics. Depending on the task, we have different evaluation metrics for regression and classification.
Evaluation metrics are extremely crucial in deep learning. Be it in the research domain or in the industry, your deep learning model will be judged on the value of the evaluation metric.
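To make this concrete, here is a sketch of two commonly used metrics written in plain Python: mean squared error for regression and accuracy for classification. The actual and predicted values are made up, and these are only two of many possible metrics.

```python
def mean_squared_error(y_true, y_pred):
    # regression: average of the squared differences
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def accuracy(y_true, y_pred):
    # classification: fraction of predictions that match the actual labels
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical regression example: (1 + 0 + 4) / 3
mse = mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])

# Hypothetical classification example: 3 out of 4 correct
acc = accuracy(["cat", "dog", "dog", "cat"], ["cat", "dog", "cat", "cat"])

print(mse)
print(acc)   # 0.75
```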
A deep learning model trains itself on the data provided to it. However, as I mentioned above, we need to improve this model and we need to check its performance. The true mettle of the model can only be observed when we give it totally new (although cleaned) data.
But then, how do we improve on the model? Do we give it new data every time we want to change even a single parameter? You can imagine how time-consuming and costly such a task would be!
This is why we use validation. We divide our entire data into 3 parts: training, validation, and testing. Here is a single sentence to help you remember:
We train the model on the training set, improve it on the validation set, and finally predict on the so-far unseen test set.
Some common strategies for Cross-validation are: k-fold Cross-Validation and Leave-One-Out Cross-Validation (LOOCV).
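Here is a rough sketch of both ideas in plain Python: a three-way split and the index bookkeeping behind k-fold cross-validation. The function names, the 60/20/20 proportions, and the seed are my own assumptions for illustration. In practice, libraries such as sklearn provide ready-made utilities for this.

```python
import random

def train_valid_test_split(items, valid_frac=0.2, test_frac=0.2, seed=0):
    # shuffle a copy so that the three parts are random samples
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_frac)
    n_valid = int(len(items) * valid_frac)
    test = items[:n_test]
    valid = items[n_test:n_test + n_valid]
    train = items[n_test + n_valid:]
    return train, valid, test

def k_fold_indices(n_samples, k):
    # every sample lands in exactly one fold; each fold takes
    # one turn as the validation set
    indices = list(range(n_samples))
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        valid_idx = folds[i]
        train_idx = [j for n, fold in enumerate(folds) if n != i for j in fold]
        yield train_idx, valid_idx

train, valid, test = train_valid_test_split(range(10))
print(len(train), len(valid), len(test))   # 6 2 2

splits = list(k_fold_indices(6, 3))
for train_idx, valid_idx in splits:
    print(train_idx, valid_idx)
```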
Here’s a comprehensive article covering validation techniques and how to implement them in Python: Improve Your Model Performance using Cross-Validation (in Python / R).
Let us go back to the calculus we saw earlier and the need for optimization. How do we know that we have achieved the best model there can be? We can make small changes in the equation and at each change, we check if we are closer to the actual value.
It is this act of taking small steps in the direction that reduces the error which is the basic intuition behind gradient descent. Gradient descent is one of the most important concepts you will come across and revisit often in deep learning.
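Here is a minimal sketch of that intuition on a toy problem. The loss function (w - 3)^2, the starting point, and the learning rate are all made-up choices. A real model has many parameters, but the update step looks the same.

```python
def loss(w):
    # toy error function with its minimum at w = 3
    return (w - 3.0) ** 2

def gradient(w):
    # derivative of the loss with respect to w
    return 2.0 * (w - 3.0)

w = 0.0               # initial guess
learning_rate = 0.1   # size of each step

for step in range(100):
    # move a small step against the gradient, i.e. downhill
    w -= learning_rate * gradient(w)

print(w, loss(w))     # w ends up very close to 3, the loss close to 0
```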
Explanation and implementation of Gradient Descent in Python: Introduction to Gradient Descent Algorithm (along with variants) in Machine Learning.
What is the simplest equation you can think of? Let me list a few:
Did you notice the one thing that was common in all the 3 functions? Yes, they are all linear functions. What if we could predict the value of y using these functions?
These would then be called linear models. You would be surprised to know how popular linear models are in the industry. They are not too complicated, are interpretable, and with the right gradient descent, we can get high evaluation metrics too! Not only this, linear models form the basis of deep learning. For instance, do you know that you can create a logistic regression model using a simple neural network?
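To illustrate that last point, here is a sketch of logistic regression written as a single "neuron": a linear function followed by a sigmoid, trained with gradient descent. The data (hours studied vs. pass/fail), the learning rate, and the number of epochs are assumptions I made up for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical data: hours studied -> passed (1) or failed (0)
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = 0.0, 0.0   # the neuron's weight and bias
learning_rate = 0.1

for epoch in range(20000):
    grad_w = 0.0
    grad_b = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)    # linear function, then sigmoid
        grad_w += (p - y) * x     # gradient of the log loss w.r.t. w
        grad_b += (p - y)         # gradient of the log loss w.r.t. b
    w -= learning_rate * grad_w / len(xs)
    b -= learning_rate * grad_b / len(xs)

# predict 1 when the estimated probability is at least 0.5
predictions = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
print(predictions)   # [0, 0, 0, 0, 1, 1, 1, 1]
```

The linear part, w * x + b, is exactly the kind of linear function we listed above. The sigmoid simply squashes its output into a probability between 0 and 1.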
Here’s a detailed guide covering not only linear and logistic regression but other linear models as well: 7 Regression Types and Techniques in Data Science.
You will often come across situations where your deep learning model performs very well on the training set but gives poor accuracy on the validation set. This happens because the model has learned the training set too closely, including its noise and quirks, so the patterns it has picked up do not generalize to the validation set. This is called overfitting, and it usually indicates that the model is too complex for the data.
On the other hand, if your deep learning model is performing poorly on both the training set as well as the validation set, it is most likely underfitting. Think of it as applying a linear equation (a too simple model) on our data when it is, in fact, non-linear (complex):
A simple analogy for overfitting and underfitting is a student’s example in a math class:
Check out this intuitive explanation of overfitting and underfitting, along with the comparison between them: Underfitting vs. Overfitting in Machine Learning.
In simplest terms, bias is the difference between the model's average prediction and the actual value. Variance measures how much the model's output changes when we change the training data.
Let’s quickly summarize what we can interpret from the above image:
Both high bias and high variance lead to an increase in the error. Typically, a high bias implies underfitting, and a high variance implies overfitting. It is very difficult to achieve both low bias and low variance – one usually comes at the cost of the other.
In terms of model complexity, we can use the below diagram to decide on the optimal complexity of our model:
I encourage you to go through this awesome essay by Scott Fortmann-Roe on Bias Variance using examples: Understanding the Bias-Variance Tradeoff.
Just like the Pandas library, there is another library that forms the foundation of machine learning. The sklearn (scikit-learn) library is one of the most popular libraries in machine learning. It contains a large number of machine learning algorithms which you can apply to your data with just a few function calls.
What’s more, sklearn even has the functionalities for all the evaluation metrics, cross-validation, and scaling/normalizing your data as well.
Here’s a quick example of sklearn in action:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# X_train, y_train, X_valid, y_valid are assumed to be prepared beforehand
regr = LinearRegression()

# train your data - remember how we train the model on our train set?
regr.fit(X_train, y_train)

# predict on our validation set to improve it
y_pred = regr.predict(X_valid)

# evaluation metrics: MSE
print('Mean Squared Error:', mean_squared_error(y_valid, y_pred))

...  # further improvement of our model

There you go! We could build a simple linear regression model in fewer than 10 lines of code!
Here are a couple of excellent resources to learn more about sklearn:
In this article, we covered 5 essential things you need to know before building your first deep learning model. Once you start building that model, you will encounter the popular deep learning frameworks like PyTorch and TensorFlow. They have been built with Python in mind, and you will find them much easier to work with now that you have a good grasp of Python.
Here are a couple of great articles to get started on these frameworks:
Once you have built your foundations on these 5 pillars, you can always explore more advanced concepts like Hyperparameter Tuning, Backpropagation, etc. These are the concepts I built my knowledge of deep learning on.
How would you go about starting your deep learning journey? Please reply in the comments below!
not much in neural network article, topic was diverted to machine learning
Very nice article and quite informative
Nice article. Please check a small confusion on the bias-variance graphs (in the top right it says high bias/low variance, but it is the oposite)