What are the best books for learning data science?
First things first: if you want to learn to do data science, the most important thing you can do is get your hands on some real-world data and start coding. Our learning platform is designed to do that, getting you hands-on and writing real code from day one. Even if you’re not using Dataquest, your primary approach to learning data skills should be hands-on.
But what can you do to keep learning in those moments when you’re not sitting in front of a computer? Read some data science books!
As a student we recently spoke with pointed out, ebooks are a great way to immerse yourself in data science at times when you can't actually get hands-on with code, such as on a bus ride or while waiting in line.
You can also listen to books the way you'd listen to podcasts, either by using an ebook app with a "read aloud" feature or by buying the audiobook.
So what books should you read? Below, we’ve listed some of the best. And the even better news? Many of these books are totally free!
Note: Some of the links below are PDF links. We’ve tried to link to the free versions of books where possible.
Non-Technical Data Science Books
(These are books that might help get you motivated to start or continue your data science journey, or help you better understand important issues in the data science field. You won’t learn many practical skills from them, but they’re good reads that help show how data and statistics are used in the real world).
Weapons of Math Destruction – One of the most popular nonfiction works about how “big data” and machine learning are not as unbiased as they might appear. Written by a former Wall Street quantitative analyst.
Big Data: A Revolution That Will Transform How We Live, Work, and Think – A good “big picture” read on how data and machine learning are changing lives in the real world — and on what else is likely to change in the future. If you’ve heard about the hype but aren’t really sure how data science can affect things, this is a good place to start.
Naked Statistics: Stripping the Dread from Data – A good read on statistics and data for the layperson. If you're interested in learning data science but it's been a while since your last math course, this is a great book to help you build confidence and intuition about how statistics are useful in the real world.
Invisible Women: Data Bias in a World Designed for Men – Anyone working with data needs to understand how biases in that data can create inequalities in the real world. This book details how aspects of gender inequality can be traced to data that treats men as the "default."
Numsense: Data Science for the Layman – A self-described “gentle” introduction to data science and algorithms, with minimal math. This is used as a textbook in some university courses, and it’s a good place to start if you’re interested in data but a little bit afraid of the math. (By the way, you don’t have to be good at math to learn coding — in fact, it doesn’t even really help).
Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are – This book is essentially Freakonomics for data science. It’s an interesting read that will also help you get an idea about how to approach answering different kinds of questions using data.
Algorithms of Oppression: How Search Engines Reinforce Racism – Another book on how algorithms contribute to inequality, this one focused on search engines. Anyone who wants to work with data needs to understand algorithmic bias: how it is created, and how it can be avoided.
General Data Science Books
The Elements of Data Analytic Style – This book by Johns Hopkins professor Jeff Leek is a useful guide for anyone involved with data analysis, and covers a lot of the little details you might miss in statistics lessons and textbooks. It’s a pay-what-you-want book, so while you can technically get this one for free, we recommend making a contribution if you can.
The Art of Data Science – Another pay-what-you-want book, this one taking a big-picture view of how to do data science rather than focusing on the technical nitty-gritty of statistical or programming techniques.
An Introduction to Data Science – This introductory textbook was written by Syracuse professor Jeffrey Stanton, and it covers many of the fundamentals of data science and statistics. It also covers some R programming, but sections of it are well worth reading even if you're learning Python.
Social Media Mining – This textbook from Cambridge University Press won't be relevant for every data science project, but if you need to scrape data from social media platforms, it's a well-rated guide. The book's site also links to some free slide presentations on related topics.
The Data Science Handbook – This book is a collection of interviews with prominent data scientists. It doesn’t offer any technical or mathematical insight, but it’s a great read for anyone who’s thinking about data science as a career and wondering what it entails, what roles are out there, and whether it might be right for them.
Doing Data Science: Straight Talk from the Frontline – A collection of talks from data scientists working at a variety of different companies that’s meant to cut through the hype and help you understand how data science works in the real world.
Data Jujitsu: The Art of Turning Data into Product – A good read on general data science processes and the data science problem solving approach from DJ Patil, arguably the most famous data scientist in the United States.
Mining of Massive Datasets – A free textbook on data mining with, as you’d expect from the title, a specific focus on working with huge datasets. Be aware, though, that it’s focused on the math and big-picture theory; it’s not really a programming tutorial.
Designing Data-Intensive Applications – This book is more about data engineering than data science, but it’s a good read for any aspiring data scientist who’s going to be creating production-ready models or who may have to do some data engineering work (which is not uncommon in data science roles, particularly at smaller companies).
Data Science Job: How to Become a Data Scientist – A book on the non-technical side of learning data science: how to build your data science career. The world of data science changes fast, but this book was self-published in 2020, so it's relatively up-to-date, and several reviewers say it's a good read for beginners. (Dataquest also has a data science job application and career guide if you're interested in something that's shorter and free.)
Python for Data Science Books
Python Data Science Handbook – An O'Reilly text by Jake VanderPlas that is also available as a series of Jupyter Notebooks on GitHub. It's not for total beginners; it assumes some knowledge of Python programming basics (but don't worry, we've got an interactive Python course you can take for that).
Automate the Boring Stuff with Python – This total beginner’s Python book isn’t focused on data science specifically, but the introductory concepts it teaches are all relevant in data science, and some of the specific skills later in the book (like web scraping and working with Excel files and CSVs) will be of use to data scientists, too.
A Byte of Python (PDF link) – Like Automate the Boring Stuff, this is another well-liked Python-from-scratch ebook that teaches the basics of the language to total beginners. It’s not data-science-specific, but most of the concepts it covers are relevant to data scientists, and it has also been translated into a wide variety of languages, so it’s easily accessible to learners all over the globe.
Learn Python, Break Python – Yet another well-liked Python-for-beginners tome that encourages readers to learn Python by “breaking” it and watching how it handles errors and mistakes.
Data Science from Scratch – A book that’s focused on teaching data science in Python by walking you through how to implement algorithms from scratch. It covers a variety of areas including deep learning, statistics, NLP, and much more.
R for Data Science Books
R Programming for Data Science – Roger D. Peng’s free text will teach you R for data science from scratch, covering the basics of R programming. This is a pay-what-you-want text, but if you do choose to chip in a bit of money, note that for $20 you can get it together with all of the mentioned datasets and code files.
An Introduction to Data Science (PDF link) – This introductory text was already listed above, but we’re listing it again in the R section as well, because it does cover quite a bit of R programming for data science.
Advanced R – This is precisely what it sounds like: a free online text that covers more advanced R topics. Written by Hadley Wickham, one of the most influential voices in the R community.
R Cookbook – Precisely what it sounds like: a collection of R “recipes” for data analysis and data science work.
R Graphics Cookbook – Similar to the above, a cookbook that’s focused specifically on getting higher-quality graphs and charts out of R.
R for Everyone – An R programming textbook that’s focused on teaching R from scratch, without the assumption that the reader already has a deep knowledge of statistics (which is an assumption that some other R textbooks do make).
Machine Learning Books
Neural Networks and Deep Learning – This free online book aims to teach the core principles behind neural networks and deep learning. It's not the place to learn the technical intricacies of any particular library, and its code is written in the now-outdated Python 2.7 rather than Python 3, but there's still a lot of valuable wisdom here.
Bayesian Reasoning and Machine Learning (PDF link) – A massive 680-page PDF that covers many important machine learning topics, and which was written to serve students who don’t necessarily have any formal background in computer science or advanced mathematics.
Understanding Machine Learning: From Theory to Algorithms – Looking for a thorough look at machine learning that runs from the fundamentals all the way through advanced machine learning theory? Look no further.
Deep Learning – The free online version of this MIT Press textbook is available only in HTML format, but it covers everything from the basics up through active research areas in deep learning.
Machine Learning Yearning – This upcoming book from Andrew Ng isn’t technically available, or even finished, but signing up for a mailing list will get you emailed copies of draft chapters. Ng says that where courses teaching technical skills can give you a “hammer”, this book’s aim is to teach you how to use that hammer correctly.
Natural Language Processing with Python – A great text for anyone interested in NLP. The online version has been updated for Python 3 (the printed version of this book uses Python 2).
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow – A Python-focused machine learning textbook that uses the scikit-learn, Keras, and TensorFlow frameworks to explore modeling and build different types of neural nets.
Grokking Deep Learning – To "grok" something means to understand it deeply, and that's exactly what this book is focused on. Its aim is to help you understand deep learning well enough to build neural networks from scratch!
Deep Learning with Python – Another Python-focused deep learning and machine learning text, this one focused primarily on using the Keras library.
Statistics Books
Introduction to Probability (PDF link) – Precisely what it sounds like: an introductory textbook that teaches probability and statistics.
Bayesian Methods for Hackers – Another free read on Bayesian statistics and programming. The cool thing about this one is that the chapters are in Jupyter Notebook form, so it’s easy to run, edit, and tinker with all of the code you come across.
Statistical Inference for Data Science – A rigorous look at statistical inference for readers who are already somewhat comfortable with basic statistics topics and programming with R.
An Introduction to Statistical Learning (PDF link) – A great introduction to data-science-relevant statistical concepts and R programming.
The Elements of Statistical Learning – Another valuable statistics text that covers just about everything you might want to know, and then some (it's over 750 pages long). Make sure you get the most up-to-date version of the book from the official site (as of this writing, that's the 2017 edition).
Data Mining and Analysis – This Cambridge University Press text will take you deep into the statistics and algorithms used for various types of data analysis.
Think Stats: Exploratory Data Analysis – Another statistics text, this one focused on applying statistics to data analysis work in Python.
…But Don’t Just Read Books!
Books can be a great way to augment your data science learning. But the best way to learn anything, including data science, is to get hands-on and actually do it. Write the code you’re reading about. Collect your own data. Build your own models. Learn by doing.
Dataquest’s online classes teach you everything that you need to become a data scientist in a hands-on, project-based format. From the moment you sign up (it’s free) you’ll be writing real code and working with real datasets.
Give it a try — what have you got to lose?