7 Steps To Mastering Machine Learning With Python
Where to begin?
This post aims to take a newcomer from minimal knowledge of machine learning in
Python all the way to knowledgeable practitioner in 7 steps, all while using freely
available materials and resources along the way. The prime objective of this outline
is to help you wade through the numerous free options that are available; there are
many, to be sure, but which are the best? Which complement one another? What is
the best order in which to use selected resources?
Moving forward, I make the assumption that you are not an expert in:
Machine learning
Python
Any of Python's machine learning, scientific computing, or data analysis
libraries
It would probably be helpful to have some basic understanding of one or both of
the first 2 topics, but even that won't be necessary; some extra time spent on the
earlier steps should help compensate.
KDnuggets' own Zachary Lipton has pointed out that there is a lot of variation in
what people consider a "data scientist." This actually is a reflection of the field of
machine learning, since much of what data scientists do involves using machine
learning algorithms to varying degrees. Is it necessary to intimately
understand kernel methods in order to efficiently create and gain insight from a
support vector machine model? Of course not. Like almost anything in life, required
depth of theoretical understanding is relative to practical application. Gaining an
intimate understanding of machine learning algorithms is beyond the scope of this
article, and generally requires a substantial investment of time in a more
academic setting, or via intense self-study at the very least.
The good news is that you don't need to possess a PhD-level understanding of the
theoretical aspects of machine learning in order to practice, in the same manner
that not all programmers require a theoretical computer science education in order
to be effective coders.
Andrew Ng's Coursera course often gets rave reviews for its content; my
suggestion, however, is to browse the course notes compiled by a former student
of the online course's previous incarnation. Skip over the notes specific to
Octave (a MATLAB-like language unrelated to our Python pursuits). Be warned that these are
not "official" notes, but do seem to capture the relevant content from Andrew's
course material. Of course, if you have the time and interest, now would be the
time to take Andrew Ng's Machine Learning course on Coursera.
Unofficial Andrew Ng course notes
There are all sorts of video lectures out there if you prefer, alongside Ng's course
mentioned above. I'm a fan of Tom Mitchell, so here's a link to his recent lecture
videos (along with Maria-Florina Balcan), which I find particularly approachable:
Tom Mitchell Lecture Videos
10 Minutes to Pandas
You will see some other packages in the tutorials below, including, for example,
Seaborn, which is a data visualization library based on matplotlib. The
aforementioned packages form (again, subjectively) the core of a wide array of
machine learning tasks in Python; understanding them should let you
adapt to additional and related packages without confusion when they are
referenced in the following tutorials.
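For a quick taste of what pandas (linked above) provides, here is a minimal sketch; the column names and values are invented purely for illustration:

import pandas as pd

# Build a small DataFrame from a dictionary (invented example data)
df = pd.DataFrame({
    'height_cm': [150, 160, 170, 180],
    'weight_kg': [50, 60, 70, 80]
})

print(df.head())      # first rows of the table
print(df.describe())  # summary statistics per column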
With a foundation laid in scikit-learn, we can move on to some more
in-depth explorations of common and useful algorithms. We start with
k-means clustering, one of the most well-known machine learning algorithms. It is a
simple and often effective method for solving unsupervised learning problems:
k-means Clustering, by Jake VanderPlas
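As a taste of what the tutorial covers, here is a minimal k-means sketch using scikit-learn; the six sample points are invented for illustration:

import numpy as np
from sklearn.cluster import KMeans

# Two loose groups of invented 2-D points
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
              [8.0, 8.2], [7.9, 8.1], [8.3, 7.7]])

# Ask k-means to find 2 clusters; random_state fixes the seed for repeatability
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)

print(kmeans.labels_)           # cluster assignment for each point
print(kmeans.cluster_centers_)  # learned cluster centroids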
Next, we move back toward classification, and take a look at one of the most
historically popular classification methods:
We've gotten our feet wet with scikit-learn, and now we turn our attention to some
more advanced topics. First up are support vector machines, a not-necessarily-linear
classifier that relies on kernel transformations of the data into a
higher-dimensional space.
Support Vector Machines, by Jake VanderPlas
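Before diving in, a minimal scikit-learn sketch may help frame the idea; it fits an SVM with an RBF kernel to the bundled iris dataset (the dataset choice and parameters here are mine, not the tutorial's):

from sklearn import datasets
from sklearn.svm import SVC

# Load the iris dataset bundled with scikit-learn
iris = datasets.load_iris()
X, y = iris.data, iris.target

# The RBF kernel implicitly maps the data into a higher-dimensional
# space, allowing a nonlinear decision boundary
clf = SVC(kernel='rbf', C=1.0)
clf.fit(X, y)

print(clf.predict(X[:5]))  # predicted classes for the first five samples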
Next, we examine random forests, an ensemble classifier, via a Kaggle Titanic
competition walk-through:
Kaggle Titanic Competition (with Random Forests), by Donne Martin
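The walk-through itself uses the Titanic data; as a self-contained stand-in, here is a minimal random forest sketch on the iris dataset (dataset and parameters are mine, chosen only so the code runs as-is):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Iris stands in for the Titanic data so the sketch is self-contained
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An ensemble of 100 decision trees, each fit on a bootstrap sample
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

print(forest.score(X_test, y_test))  # accuracy on the held-out split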
Dimensionality reduction is a method for reducing the number of variables being
considered in a problem. Principal Component Analysis is a particular form of
unsupervised dimensionality reduction:
Dimensionality Reduction, by Jake VanderPlas
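As a minimal illustration of PCA, here is a sketch that projects the four iris features down to two principal components (again, the dataset choice is mine):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)  # 150 samples, 4 features each

# Project onto the 2 directions of greatest variance
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (150, 2)
print(pca.explained_variance_ratio_)  # variance captured by each component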
Using Python and its machine learning libraries, we have covered some of the most
common and well-known machine learning algorithms (k-nearest neighbors, k-
means clustering, support vector machines), investigated a powerful ensemble
technique (random forests), and examined some additional machine learning
support tasks (dimensionality reduction, model validation techniques). Along with
some foundational machine learning skills, we have started filling a useful toolkit
for ourselves.
We will add one more in-demand tool to that kit before wrapping up.
Theano is a Python library that allows you to define, optimize, and evaluate
mathematical expressions involving multi-dimensional arrays efficiently.
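To give a flavor of that define-then-evaluate workflow, here is a minimal Theano sketch; it builds a symbolic expression and compiles it into a callable function:

import theano
import theano.tensor as T

# Declare two symbolic matrices of doubles
x = T.dmatrix('x')
y = T.dmatrix('y')

# Define an expression over them, then compile it to a callable function
z = x + y
f = theano.function([x, y], z)

print(f([[1.0, 2.0]], [[3.0, 4.0]]))  # [[4. 6.]]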
The following introductory tutorial on deep learning in Theano is lengthy, but it is
quite good, very descriptive, and heavily commented:
Caffe is a deep learning framework made with expression, speed, and modularity in
mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and by
community contributors.
This tutorial is the cherry on top of this article. While we have undertaken a few
interesting examples above, none likely compete with the following, which
implements Google's #DeepDream using Caffe. Enjoy this one! After
understanding the tutorial, play around with it to get your processors dreaming on
their own.
Dreaming Deep with Caffe via Google's GitHub
I didn't promise it would be quick or easy, but if you put the time in and follow the
above 7 steps, there is no reason that you won't be able to claim reasonable
proficiency and understanding in a number of machine learning algorithms and
their implementation in Python using its popular libraries, including some of those
on the cutting edge of current deep learning research.