How To Learn Data Science
There’s no doubt about it: data scientists are in high demand. As of 2020, the average data
scientist in the US makes over $113,000 a year, and data scientists in San Francisco make over
$140,000. Learn data science and you could find yourself working in this promising, well-
compensated field.
But even if you’re not interested in becoming a data scientist, learning data skills and improving
your data literacy can pay big dividends in your current career. Employees who have data skills
and can help their companies become more data-driven are in demand in almost every
industry.
Nobody ever talks about motivation in learning. Data science is a broad and fuzzy field, which
makes it hard to learn. Really hard. Without motivation, you’ll end up stopping halfway through
and believing you can’t do it. When this happens, the fault isn’t with you — it’s with the
teaching.
You need something that will motivate you to keep learning, even when it’s midnight, the
formulas are starting to look blurry, and you’re wondering if neural networks will ever make
sense.
You need something that will help you find the linkages between statistics, linear algebra, and
neural networks. Something that will prevent you from struggling with the “what do I learn
next?” question. You need motivation. Not in the form of an inspiring quote, but in the form of a
passion project you can use to drive your learning.
My entry point to data science was predicting the stock market, although I didn’t know it at the
time. Some of the first programs I coded to predict the stock market involved almost no statistics.
But I knew they weren’t performing well, so I worked day and night to make them better.
I was obsessed with improving the performance of my programs. I was obsessed with the stock
market. That was my motivation.
And as I worked, I was learning to love data. Because I was learning to love data, I was
motivated to learn anything I needed to make my programs better.
Not everyone is obsessed with predicting the stock market, I know. But it’s important to find that
thing that makes you want to learn.
It can be figuring out new and interesting things about your city, mapping all the devices on the
internet, finding the real positions NBA players play, mapping refugees by year, or anything else.
The great thing about data science is that there is a practically endless supply of interesting things to work
on. It’s all about asking questions and finding a way to get answers — and you can ask any
question you want.
Take control of your learning by tailoring it to what you want to do, not the other way around.
Learning about machine learning, neural networks, image recognition, and other cutting-edge
techniques is important. But most data science doesn’t involve any of it. As a working data
scientist:
What all of this means is that the best way to learn is to work on projects. By working on
projects, you gain skills that are immediately applicable and useful, because real-world data
scientists have to see data science projects through from start to finish, and most of that work is
in fundamentals like cleaning and managing the data.
(Working on projects as you study also gives you a nice way to build a portfolio. This will be
tremendously valuable when you’re ready to start applying for jobs.)
So how can you find a good project? One technique to start projects is to find a data set you like.
Try to answer an interesting question about it. Rinse and repeat.
Here are some good places to find free data sets to get you started:
Another technique (and this was my technique) is to find a deep problem, like predicting the stock
market, that can be broken down into small steps. I first connected to the Yahoo Finance API
and pulled down daily price data. I then created some indicators, like average price over the past
few days, and used them to predict the future (no real algorithms here, just technical analysis).
This didn’t work so well, so I learned some statistics and then used linear regression. Then I
connected to another API, scraped minute-by-minute data, and stored it in a SQL database. And
so on, until the algorithm worked well.
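To make the first of those small steps concrete, here is a minimal sketch of a trailing-average indicator fed into a one-variable linear regression. It is not the code I actually wrote back then. The prices are made up, the function names `moving_average` and `fit_line` are hypothetical, and it uses only the Python standard library so it runs anywhere.

```python
from statistics import mean


def moving_average(prices, window):
    """Trailing average of the last `window` prices.

    Returns one value per day, starting once `window` days of history exist.
    """
    return [mean(prices[i - window:i]) for i in range(window, len(prices) + 1)]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up daily closing prices, for illustration only.
prices = [10.0, 11.0, 12.0, 11.0, 13.0, 14.0, 13.0, 15.0]
window = 3

indicator = moving_average(prices, window)

# Pair each day's indicator with the NEXT day's closing price.
features = indicator[:-1]
targets = prices[window:]

slope, intercept = fit_line(features, targets)

# Use the most recent indicator value to guess tomorrow's price.
prediction = slope * indicator[-1] + intercept
print(f"slope={slope:.3f} intercept={intercept:.3f} prediction={prediction:.2f}")
```

A model this simple is unlikely to predict real markets well, and that is the point: seeing it fall short is what gives you a reason to go and learn the next piece of statistics.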
The great thing about this is that I had context for my learning. I didn’t just learn SQL syntax in
the abstract. I used it to store price data, and thus learned 10x as much as I would have by just
studying syntax. Learning without application is easy to forget. More important, if you’re not
actively applying what you learn, your studies won’t prepare you to do actual data science work.
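As a rough illustration of what learning SQL in context can look like, the sketch below stores a few minute-by-minute prices in a database and asks a question of them. Everything specific here is an assumption made for the example: it uses SQLite through Python's built-in `sqlite3` module rather than whatever database you might choose, and the table layout, the `EXMP` ticker, and the prices are invented.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per symbol per minute.
conn.execute(
    """
    CREATE TABLE prices (
        symbol TEXT NOT NULL,
        ts     TEXT NOT NULL,
        close  REAL NOT NULL,
        PRIMARY KEY (symbol, ts)
    )
    """
)

# Made-up minute-by-minute prices for a fictional ticker.
rows = [
    ("EXMP", "2020-01-02T09:30", 10.0),
    ("EXMP", "2020-01-02T09:31", 10.5),
    ("EXMP", "2020-01-02T09:32", 11.0),
]

# Parameterized inserts avoid building SQL strings by hand.
conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", rows)
conn.commit()

# Ask a simple question of the stored data: the average close and row count.
avg_close, n = conn.execute(
    "SELECT AVG(close), COUNT(*) FROM prices WHERE symbol = ?", ("EXMP",)
).fetchone()
print(f"{n} rows, average close {avg_close:.2f}")
```

Writing even a tiny table like this forces decisions about keys, types, and queries that reading a syntax reference never does.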
Data scientists constantly need to present the results of their analysis to others. Doing this
well can be the difference between being an okay data scientist and a great one. Data analysis
is typically only valuable in a business context if you can convince other people at your company
to act on what you found, and that means learning to communicate data.
Part of communicating insights is understanding the topic and theory — you’ll never be able to
explain to others something that you don’t understand yourself. Another part is understanding
how to clearly organize your results. The final piece is being able to explain your analysis
clearly.
It’s hard to get good at communicating complex concepts effectively, but here are some things
you should try:
Start a blog. Post the results of your data analysis. Or submit a pitch and write for Dataquest’s
blog!
Try to teach your less tech-savvy friends and family about data science concepts. It’s amazing
how much teaching can help you understand concepts.
Try to speak at meetups.
Use GitHub to host and share all your analysis.
Get active in communities like Quora, the Dataquest learning community, and the machine
learning subreddit.
It’s amazing how much you can learn from working with others. In data science, teamwork can
also be very important in a job setting. Data scientists often work as part of a team, and lone data
scientists at smaller companies will typically work together with other teams at their company to
solve specific problems. It’s not unusual for a data scientist to move from team to team as they
work on answering data questions for different arms of the company, so being able to collaborate
may be more important for data scientists than almost anyone else!
Are you completely comfortable with the project you’re working on? Was the last time you used
a new concept a week ago? It’s time to work on something more difficult. Data science is a steep
mountain to climb, and it’s easy to stop climbing. But of course, if you stop climbing, you’ll
never make it to the top!
If you find yourself getting too comfortable, here are some ideas that can add some complexity
and challenge to almost any data science project. Try adding one or more of these into your plans
to get yourself out of your comfort zone:
That last one is a really underrated challenge, and if you give it a try, you’ll quickly see how
valuable teaching can be to someone who’s trying to learn. You’ll likely come out of the
experience with a much deeper understanding of the topic than you had before, and you’ll have
improved your communication and explanation skills, too.
Having a certification on your resume is not likely to help you get a job. What’s important to
employers is the skills you have. A certificate, by itself, doesn’t tell an employer anything about
your skills. It just tells them that you studied a topic.
However, certificate programs and the platforms that offer them can still be a great investment
if they teach you the skills you need effectively. Just keep in mind that their value lies in
those skills, not in the certificate itself.
When employers look at your resume, they’re going to be looking at your skills, your project
portfolio, and your relevant experience. A certificate is very unlikely to sway their decision, so
focus on acquiring the right skills and building cool projects.
Here’s some more information about data science certificates and whether or not you need one.
Do you need a degree in data science?
Having a data science degree on your resume might help you get a job. However, getting one
typically takes years and costs tens if not hundreds of thousands of dollars.
Universities can also be subject to institutional inertia and slow to adapt, so you can end up
wasting time studying older technologies that aren’t as relevant in the current business
environment.
Thankfully, there are many, many examples of people who’ve successfully learned data science
on their own, and reached a high level in the industry without needing a specialized degree.
For example, I myself worked as a machine learning engineer at edX before starting Dataquest.
But I don’t have a degree in data science or machine learning. I taught myself those skills.
Our Dataquest learner stories are also full of examples of learners who’ve gotten industry jobs
with zero background in programming and no data science degree. Our 2020 survey covered
hundreds of respondents who’ve met their data science learning goals with no need to get a
degree.
If you have the time and money to get a university degree in data science, adding it to your
resume can definitely help you. But it is very possible to learn all of the necessary skills faster
and much more affordably. Not having a data science degree will not hurt you in the job market
as long as you do have the relevant skills.
The list of skills that fall under “data science” is huge! You might have seen this intimidating
image somewhere on the web:
But don’t worry, you don’t need to learn all of that!
Based on job postings and what data scientists report doing at work, the most fundamental data
science skills are:
From there, you can dig deeper into specializations like Natural Language Processing, Image
Classification, Deep Learning, and a wide variety of other options depending on your interests.
This article isn’t meant to be a road map of exactly what to do. Rather, consider it as a rough set
of guidelines to follow as you learn data science on your own path. If you do all of these things
well, you’ll find that you’re naturally developing data science expertise. But don’t feel
constricted by them! If you find a different approach that’s keeping you motivated and keeping
you learning, don’t hesitate to incorporate it into your long-term plans.
I generally dislike the “here’s a big list of stuff” approach, because it makes it extremely hard to
figure out what to do next. I’ve seen a lot of people give up learning when confronted with a
giant list of textbooks and MOOCs.
I personally believe that anyone can learn data science if they approach it with the right frame of
mind.
I’m also the founder of Dataquest, a site that helps you learn data science in your browser. It
encapsulates a lot of the ideas discussed in this post to create a better learning experience. You
learn by analyzing interesting data sets like CIA documents and NBA player stats. You also
complete projects and build a portfolio as you work through our courses.
Don’t worry if you don’t know how to code — we teach both Python and R from scratch, no
experience required! We teach Python and R because they’re beginner-friendly languages and
because they’re the most popular languages used in real-world data science.
As I worked on projects, I found these resources helpful. Remember, resources on their own
aren’t necessarily useful — find a context for them:
Dataquest: learn data science in your browser, complete projects, and build a portfolio.
Khan Academy: good basic statistics and linear algebra content.
Introduction to Linear Algebra, 4th Edition: great linear algebra book by Gilbert Strang.
Calculus Online Textbook: great calculus book, also by Gilbert Strang.
The Elements of Statistical Learning: good machine learning book.
Andrew Ng’s Machine Learning Class: the original Coursera machine learning class. Mostly video-based.
OpenIntro Statistics: good basic stats book.
Google Scholar: a paper can be a great way to learn about a topic. For example, see Breiman’s original random forest paper.
StatSoft statistics textbook: good for looking up statistics concepts.
If you’re ready to tackle the topic of data science and data analytics, Dataquest can help. Start
your journey today.