Every person has their own way of learning. What helped me break into data science was books. There is nothing like opening your mind to a world of knowledge condensed into a few hundred pages. There is a magic and allure to books that I have never found in any other medium of learning.
“If you only read the books that everyone else is reading, you can only think what everyone else is thinking.” – Haruki Murakami
Learning data science on your own can be a very daunting task! There are numerous ways to learn today – MOOCs, workshops, degrees, diplomas, articles, and so on. But organizing them into a structured path toward becoming a data scientist is of paramount importance.
But there are hundreds of books out there about data science. How do you choose where to start? Which books are ideal for learning a certain technique or domain? While there’s no one-size-fits-all answer to this, I have done my best to cut the list down to the 27 books we’ll see shortly.
I have divided the books into different domains to make things easier for you:
At the bottom of the article, you will find a superbly illustrated infographic mentioning each book. You can use that as a ‘to-read’ shelf and strike them off as you go down the list! You can also download a High Resolution copy of this infographic. It’s perfect for printing as it’s in a PDF format.
Without any further ado, let’s dive right in.
Author: Timothy C. Urdan
I started my journey into the world of statistics with this beauty of a book. It’s written for absolute beginners and in a way that makes you come back for more. The writing style and explanations provided do justice to the title – Statistics in Plain English. You could recommend it to any non-technical person and they would get the hang of these topics. It’s that good!
Author: Allen B. Downey
You’ll find this book at the top of most data science book lists. The book comes with plenty of resources. Use the above link to go to the book home page and you’ll see resources like data files, code, solutions, etc. It will be especially useful for folks who know the basics of Python, since the language is used to demonstrate real-world examples.
Authors: Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
An all-time classic. This book is recommended or referenced in most machine learning courses I’ve come across; it’s just that well written. It covers basic statistics as well as machine learning techniques. The awesome thing about this book is that each concept is explained with case studies in R. So once you have a handle on programming, you can always come back and try out each concept again. What better way to ingrain a concept than by practicing it multiple times?
Author: David Morin
An ideal book for beginners. It is written for college students, so all of you looking to learn probability from scratch will appreciate its style. All the basics are covered – combinatorics, the rules of probability, Bayes’ theorem, expectation value, variance, probability density, common distributions, the law of large numbers, the central limit theorem, correlation, and regression.
Authors: J. Laurie Snell and Charles Miller Grinstead
Another introductory book covering basic probability concepts. Like the book above, this one is a comprehensive text written with college students in mind. Why do I keep repeating that, you might be wondering. It’s because I want to emphasize that if there’s a place to start learning from scratch, it’s a book that’s written for students who haven’t ever ventured into this field before.
Author: William Feller
As the book’s description states, it’s a complete guide to the theory and practical applications of probability theory. I recommend reading this if you really want to deep dive into the world of probability. It’s a VERY comprehensive text and might not be to a beginner’s taste. If you’re learning probability just to get into data science, you can get away with reading either of the two probability books mentioned above.
Author: Andriy Burkov
I love this book. Having read a ton of books trying to teach machine learning from various angles and perspectives, I struggled to find one that could succinctly summarize difficult topics and equations. Until Andriy Burkov managed to do it in some 100-odd pages. It is beautifully written, is easy to understand and has been endorsed by thought leaders like Peter Norvig. Need I say more? Beginner or established, every data scientist should get their hands on this book.
Author: Tom Mitchell
Before all the hype came about, Tom Mitchell’s book on machine learning was the go-to text to understand the math behind various techniques and algorithms. I would suggest brushing up on your math before taking this up. But you don’t need any background in AI or statistics to understand these concepts. It was the first-ever book I read on ML! It’s modestly priced so it’s definitely worth adding to your collection.
Authors: Trevor Hastie, Robert Tibshirani and Jerome Friedman
And we’re back with another classic by Hastie and Tibshirani! It’s the natural successor to the ‘Introduction to Statistical Learning’ book we covered earlier. While there are a few overlaps with that book, this one takes a more advanced look at what we call machine learning algorithms. Topics like neural networks, matrix factorization, and spectral clustering are covered apart from the common ML techniques.
Authors: Ian Goodfellow, Yoshua Bengio and Aaron Courville
What a list of rockstar authors! The ‘Deep Learning’ book is widely regarded as the best resource for beginners. It’s divided into three sections: Applied Math and Machine Learning Basics, Modern Practical Deep Networks, and Deep Learning Research. It is, to date, the most cited book in the deep learning community. Keep it by your bedside, worship it and reference it often – this will be your companion whenever you start your deep learning journey.
Author: Francois Chollet
A really cool way of learning deep learning (or machine learning for that matter) is by programming side-by-side with the theory. And that’s the approach Francois Chollet follows in the ‘Deep Learning with Python’ book. Concepts are taught using the popular Keras library. Francois is the creator of Keras so who better to teach you this topic? I also recommend following Francois on Twitter – there is a lot we can learn from him.
Author: Michael Nielsen
This is a free online book to learn about the core component that powers deep learning – neural networks. I quite like the way this book has been written. It takes a practical approach to teaching and looks at deep learning topics through the lens of a beginner. You will not learn any programming language in this book – it’s a good old-fashioned textbook on the underlying insights behind neural networks.
Authors: Steven Bird, Ewan Klein and Edward Loper
Another book in this collection that sticks to the learn-by-doing approach. You’ll pick up Python concepts you otherwise wouldn’t have and will navigate the world of NLP using the NLTK library (Natural Language Toolkit). While this shouldn’t be the only resource you refer to for learning NLP (it’s far too complex a field for that), it offers a pretty decent introduction to the topic.
Authors: Christopher Manning and Hinrich Schutze
Published almost two decades ago, this text still serves as an excellent introduction to natural language processing. It’s a very comprehensive guide to the broader sub-topics in NLP, like Text Categorization, Part-of-Speech Tagging, and Probabilistic Parsing, among various other things. The authors have provided rigorous coverage of the mathematical and linguistic foundations. Again, the book is quite detailed so keep that in mind.
Authors: Daniel Jurafsky and James H. Martin
The emphasis of this book is on practical applications and scientific evaluation in the scope of natural language and speech. I included this book to expand our horizons beyond text – to look at speech recognition as well. And why not? It’s an area of research that is thriving nowadays with a plethora of applications coming out every day. Jurafsky and Martin have written an in-depth book on NLP and computational linguistics. This one is from the masters themselves.
Author: Richard Szeliski
Explore a variety of common computer vision techniques in this book, especially ones used for analyzing and interpreting images. While this was published almost 9 years ago, the examples and methodology illustrated by Richard Szeliski are applicable today as well. It’s a comprehensive text that takes a scientific approach to solving basic vision challenges. The website I have linked to above contains a free PDF copy of the book.
Author: Jan Erik Solem
Before you dive into this awesome book, go to the website I’ve linked above, download the datasets and the code notebooks, and clone the GitHub repository mentioned there. They are excellent companions in this REALLY hands-on introduction to the world of computer vision. As the author states, “You’ll learn techniques for object recognition, 3D reconstruction, stereo imaging, augmented reality, and other computer vision applications as you follow clear examples written in Python.”
Author: Dr. Simon J.D. Prince
The book starts off from scratch by introducing us to the concepts of probability and quickly picks up pace from there. While some of the frameworks introduced here have seen more advanced versions come out, this book is nonetheless relevant in the current context. More than 70 algorithms have been introduced and the text is beautifully complemented by over 350 illustrations. The website also contains PowerPoint slides, if that’s the kind of learning you prefer.
Authors: Stuart Russell and Peter Norvig
A book written by Stuart Russell and Peter Norvig? I am sold. It is the leading book in Artificial Intelligence. More than 1300 universities in over 100 countries reference/cite this book in their curriculum. Given who the authors are, it isn’t surprising to see the book length – 1100 pages. Covering the length and breadth of AI components – speech recognition, autonomous vehicles, machine translation, and computer vision among other things, this can be considered the Bible of AI.
Author: Jeff Heaton
What are the foundational algorithms underneath artificial intelligence? This book packs a lot of technical know-how into just 222 pages. This is volume 1 of a series of books on the techniques behind AI, covering dimensionality, distance metrics, clustering, error calculation, hill climbing, Nelder-Mead, and linear regression. There is an accompanying site as well, which contains the examples cited in the book and a GitHub repository containing the code.
Author: Pedro Domingos
If you’re looking for a technical book on AI, this isn’t it. What it is, however, is a masterful text on how machine learning is remaking business, politics, science and war. It is a thoughtful and thought-provoking book on where AI is right now, and where it might end up taking the human race. Will we ever find a single algorithm (or ‘The Master Algorithm’) that is capable of deriving all knowledge from data? Join Pedro Domingos in his quest to find out.
Author: Luciano Ramalho
There are way too many resources out there to learn Python but nothing teaches you programming like a good old-fashioned book. As you might expect from a coding book, it’s a hands-on guide to help you understand how Python works and how to write awesome and effective Python code. Luciano Ramalho also covers a few popular libraries you’ll find yourself regularly using in data science projects. With a length of 794 pages, this book is worth the spend.
Author: Mark Lutz
Wait, another Python book?! If you thought the above book taught you everything you need to know about Python, think again. This is a vast programming language with a lot more left to cover. Once you’ve mastered the fundamentals from the above book by Luciano Ramalho, take a gander at this one by Mark Lutz. There are in-depth tutorials on a wide variety of topics: databases, networking, text processing, GUIs, etc. Tons and tons of examples are included. A must-read for programming geeks.
Author: Samir Madhavan
The two books we have covered so far for learning Python looked at the language from a programming perspective. Now it’s time to learn it from the data science angle. Which data science libraries are commonly used and how? How can you create data visualizations and mine for patterns in Python? And how can you code advanced data science/machine learning techniques to build models? These questions and more are answered by Samir Madhavan in this excellent write-up.
Authors: Garrett Grolemund and Hadley Wickham
Anyone who has even remotely heard of R programming will have come across Hadley Wickham’s work. His work in this language is unparalleled – I could go on and on about him. I can’t recommend this book highly enough. You’ll learn how to import different kinds of data into R, the different data structures, and how to transform, visualize and model your data. The perfect book to learn data science through coding in R.
Author: Jared P. Lander
I learned R way before I even heard about Python. I have a special place for it in my heart and Jared Lander’s R for Everyone played a big part in that. I got this book through one of my acquaintances and was immediately taken by how well it was written. It claims to be for ‘everyone’ and lives up to its name. This is a great book if you’re from a non-technical and non-statistical background.
Author: Paul Teetor
The R Cookbook is an excellent addition to your budding data science reading list. It contains more than 200 practical recipes to help you get started with analyzing and manipulating data in R. Each recipe looks at a different problem. It’s meant for beginners, intermediate users and advanced practitioners alike. Whether it’s learning new programming skills or brushing up your concepts, this cookbook is for everyone.
And as promised, here is the full infographic covering all the books we saw in this article:
Hi Pranav, Thanks for a good article. Could you also share the sequence in which one has to read the above-mentioned books for the data science journey? Thanks in advance.
Hi Krishna, Appreciate you taking the time out to go through the list! The books should be read initially in the intended sequence. Start with statistics and probability (the absolute base of most things you'll learn in data science). Once done, move on to machine learning. After that comes the fork in the path. You could study deep learning if that's where you see yourself down the line. Otherwise I would recommend picking a domain (banking, finance, marketing, etc.), understanding what kind of problems are there in those fields, and then branching out to study certain topics. For example, NLP is a big thing in marketing to understand reviews. Computer Vision is big in surveillance applications, manufacturing products, etc. I recommend checking out the below two learning paths our team has put together. They are REALLY comprehensive and free:
Machine Learning - https://trainings.analyticsvidhya.com/courses/course-v1:AnalyticsVidhya+LPDS2019+LPDS2019_T1/about
Deep Learning - https://trainings.analyticsvidhya.com/courses/course-v1:AnalyticsVidhya+LP_DL_2019+2019_T1/about
Excellent guidance for serious aspirants.
Thanks for sharing this list Pranav. Very helpful!
Glad you found it useful, Gunashree. :)