“It’s called reading. It’s how people install new software into their brains.”
Personally, I haven’t learnt as much from videos and online tutorials as I have from books. Even now, my tiny wooden shelf holds enough books to keep me busy this winter.
Getting started with machine learning and data science is easy. There are numerous open courses which you can take up right now. But acquiring in-depth knowledge of a subject requires extra effort. For example, you might quickly understand how a random forest works, but understanding the logic behind why it works requires extra effort.
The confidence to question that logic comes from reading books. Some people easily accept the status quo. The curious ones, on the other hand, challenge it and ask, “Why can’t it be done the other way?” That’s how such people discover new ways of executing a task. Almost every data scientist I’ve come across, whether in person, in AMAs, or in published interviews, has emphasized the indispensable role of books in their lives.
Here is a list of books on doing machine learning / data science in R and Python which I’ve come across in the last year. Since reading is a good habit, I want to pass this habit on to you with this post. For each book, I’ve written a summary to help you judge its relevance. Happy reading!
Disclosure: The Amazon links in this article are affiliate links. If you buy a book through one of these links, we get paid through Amazon. This is one of the ways we cover our costs while we continue to create these awesome articles. Further, the list reflects our recommendations based on the content of each book and is in no way influenced by the commission.
This book is written by Garrett Grolemund. It is best suited for people new to R. Learning to write functions and loops empowers you to do much more in R than just juggling packages. People think R packages let them avoid writing functions and loops, but that isn’t a sustainable approach. This book introduces you to the details of the R programming environment using interesting projects such as weighted dice, playing cards, and a slot machine. The language is simple to understand and the examples can be reproduced easily.
Available: Buy Now
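To give a feel for why writing your own functions and loops matters, here is a minimal sketch of a weighted dice simulation. It is not taken from Garrett Grolemund’s book (which works in R); it is a Python illustration, and the loading in `WEIGHTS` as well as the names `roll` and `simulate` are assumptions made up for this example.

```python
import random

FACES = [1, 2, 3, 4, 5, 6]
# Hypothetical loading: a six is three times as likely as any other face.
WEIGHTS = [1, 1, 1, 1, 1, 3]


def roll(n_dice=2, weights=WEIGHTS, rng=random):
    """Return the total of n_dice rolls of a die loaded by `weights`."""
    return sum(rng.choices(FACES, weights=weights, k=n_dice))


def simulate(n_rolls=10_000, seed=0):
    """Count how often each total appears across n_rolls rolls of two dice."""
    rng = random.Random(seed)  # seeded so repeated runs give the same counts
    counts = {}
    for _ in range(n_rolls):
        total = roll(rng=rng)
        counts[total] = counts.get(total, 0) + 1
    return counts


if __name__ == "__main__":
    for total, count in sorted(simulate().items()):
        print(total, count)
```

With this loading, high totals such as 12 should show up far more often than they would with fair dice. No ready-made package call is involved, only one small function and one loop.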
This book is written by Jared P. Lander. It’s a decent book covering all aspects of data science, such as data visualization, data manipulation, and predictive modeling, but not in much depth. In other words, it covers a wide breadth of topics and leaves out the details of each. Specifically, it emphasizes when to use each algorithm and gives one example of its implementation in R. This book should be bought by people who are more inclined towards understanding the practical side of algorithms.
Available: Buy Now
This book is written by Paul Teetor. It comprises several tips and recipes to help people overcome daily struggles in data pre-processing and manipulation. Many times, we are stuck in a situation where we know very well what needs to be done, but how to do it becomes a mammoth challenge. This book solves that problem. It doesn’t give theoretical explanations of concepts, but focuses on how to use them in R. It covers a wide range of topics such as probability, statistics, time series analysis, and data pre-processing.
Available: Buy Now
This book is written by
Available: Buy Now
This book is written by Max Kuhn and Kjell Johnson. Max Kuhn is also the creator of the caret package. It’s one of the best books, blending theoretical and practical knowledge. It discusses several crucial machine learning topics such as over-fitting, feature selection, linear and non-linear models, and tree methods. Needless to say, it demonstrates all these algorithms using the caret package, one of the most powerful ML packages available on CRAN.
Available: Buy Now
This book is written by a team of authors including Trevor Hastie and Robert Tibshirani. It is one of the most detailed books on statistical modeling. It’s also available for free. It comprises in-depth explanations of topics such as linear regression, logistic regression, trees, SVM, and unsupervised learning. Since it’s an introduction, the explanations are quite easy and any newbie can follow them. Thus, I recommend this book to everyone who is new to machine learning in R. In addition, the several practice exercises in this book are the cherry on top.
Available: Buy Now
This book is written by Trevor Hastie, Robert Tibshirani and Jerome Friedman. It is the more advanced companion to ‘Introduction to Statistical Learning’. It covers more advanced topics, so I would suggest you not jump to it directly. This book is best suited for people familiar with the basics of machine learning. It talks about shrinkage methods, different linear methods for regression and classification, kernel smoothing, model selection, etc. It’s a must-read for people who want to understand ML in depth.
Available: Buy Now
This book is written by Brett Lantz. I am impressed by the simplicity of this author’s way of explaining concepts. It’s a book on machine learning which is easy to understand and gives you a lot of knowledge about the practical aspects of the algorithms too. Algorithms such as bagging, boosting, SVM, neural networks, and clustering are discussed through case studies. These case studies will help you understand the real-world usage of these algorithms. In addition, the parameters of these ML algorithms are also discussed.
Available: Buy Now
This book is written by Cory Lesmeister. It is best suited for everyone who wants to master R for machine learning purposes. It covers almost all algorithms and their execution in R. Alongside, this book will introduce you to several R packages used for ML, including the recently launched h2o package. It features the latest advancements in the ML field, hence I’d suggest every R user read it. However, you can’t expect to learn advanced ML techniques like stacking from this book.
Available: Buy Now
This book is written by Drew Conway and John Myles White. It’s a relatively shorter book than the others, but it aptly brings out the importance of every topic discussed. After reading this book, I realized that the authors’ approach is not to go deep into a topic while still making sure to cover the important details. For better understanding, the authors also work through several use cases, explaining the underlying methods as they solve them. It’s a good read for everyone who’d like to learn something new about ML.
Available: Buy Now
This book is written by Nina Zumel and John Mount. As the name suggests, this book focuses on using data science methods in the real world. It stands apart: none of the books listed above talks about real-world challenges in model building and model deployment, but this one does. The authors never lose focus on connecting the theoretical world of ML with its impact on real-world activities. It’s a must-read for freshers who are yet to enter the analytics industry.
Available: Buy Now
This book is written by Samir Madhavan. It starts with an introduction to data structures in NumPy and pandas and provides a useful description of importing data from various sources into these structures. You will learn to perform linear algebra in Python and carry out analysis using inferential statistics. Later, the book moves on to advanced concepts like building a recommendation engine, high-end visualization using Python, ensemble modeling, etc.
Available: Buy Now
Want to get started with data analysis in Python? Get your hands on this data analysis guide by Wes McKinney, the main author of the pandas library. There isn’t any online course as comprehensive as this book. It covers all aspects of data analysis in Python: manipulating, processing, cleaning, visualizing, and crunching data. If you are new to data science in Python, it’s a must-read for you. It’s power-packed with case studies from various domains.
Available: Buy Now
This book is written by Andreas Müller and Sarah Guido. It’s meant to help beginners get started with machine learning. It teaches you to build ML models from scratch in Python with scikit-learn. It assumes no prior knowledge, so it’s best suited for people with no prior Python or ML experience. In addition, it covers advanced methods for model evaluation and parameter tuning, methods for working with text data, text-specific processing techniques, etc.
Available: Buy Now
This book is written by Sebastian Raschka. It’s one of the most comprehensive books I’ve found on ML in Python. The author explains every crucial detail we need to know about machine learning. He takes a stepwise approach to explaining the concepts, supported by various examples. This book covers topics such as neural networks, clustering, regression, classification, and ensembles. It’s a must-read for everyone keen to master ML in Python.
Available: Buy Now
This book is written by John Hearty. It’s a definite read for every machine learning enthusiast. It lets you rise above the basics of ML techniques and dive into unsupervised methods, deep belief networks, autoencoders, feature engineering techniques, ensembles, etc. It’s definitely a book you would want to read to improve your rank in machine learning competitions. The author lays equal emphasis on the theoretical as well as the practical aspects of machine learning.
Available: Buy Now
This book is written by Toby Segaran. With an interesting title, this book is meant to introduce you to several ML algorithms such as SVM, trees, clustering, and optimization using interesting examples and use cases. This book is best suited for people new to ML in Python. Python, known for its incredible ML libraries and support, should make it easy for you to learn these concepts faster. Also, the chapters include practice exercises to help you develop a better understanding.
Available: Buy Now
The motive of this article is to introduce you to a huge reservoir of knowledge which you may not have noticed yet. These books will not only provide you with boundless knowledge but also enrich you with various perspectives on using ML algorithms. You might feel puzzled at seeing so many books explaining similar concepts. What differentiates these books is the case studies and examples discussed.
Trust me, theoretical explanations are sometimes much harder to decipher than practical cases. That’s how I feel. Learning from these authors’ knowledge is the fastest way to learn from so many people.
I hope this article helps you select your next book on R or Python. Do keep me posted about your reading experience, suggestions, or advice.
Hi Manish, thank you for sharing these books. I want to get a suggestion from you, if I may. I am a database developer with 7 years of experience. I just started learning R, stats and machine learning with the help of a technical institute located in Bangalore. Of all the books you have suggested above, which 2 would you recommend first? It would be nice if you could give me some insights into how you approached learning when you first started your journey to becoming a data scientist. Thanks, Lokesh
Hi Lokesh, of all the books, the best options for you, and the books which helped me initially, were: 1. Introduction to Statistical Learning 2. Hands-On Programming with R. These 2 books would introduce you to the programming + machine learning spectrum of R, and will put your basics in place. However, just reading these books wouldn't be enough. Make sure you undertake every practice exercise given in the chapters. Trust me, it gives a lot of confidence.
Book: Transition to Higher Mathematics: Structure and Proof by Bob A. Dumas and John E. McCarthy (~275 pages) tops my list.
Thanks for this summary. For high-end probabilistic graphical models in R, I often use Søren Højsgaard’s Graphical Models with R, https://www.amazon.com/Graphical-Models-R-Use/dp/1461422981