Machine Learning Lecture 2
March 9, 2023
Academic City University College, Agbogba Haatso, Ghana.
OUTLINE
• Python
• NumPy
• Pandas
• SciPy
• matplotlib/seaborn
• Data Exploration
Python
PYTHON
IMPORTANT PYTHON CONCEPTS FOR A DATA
SCIENTIST:
IMPORTANT PYTHON CONCEPTS FOR A DATA
SCIENTIST CONT’D
• Virtual Environments:
Data scientists need to be familiar with virtual
environments, which isolate each project's dependencies
so that projects requiring different package versions do
not conflict.
• Collaboration:
Data scientists should be familiar with version control
systems like Git and collaboration platforms like GitHub
to collaborate with other team members and manage the
codebase.
NumPy
NUMPY
NUMPY CONT’D
• Broadcasting
Broadcasting is a powerful feature of NumPy that allows
element-wise operations between ndarrays of different but
compatible shapes, by virtually stretching the smaller
array to match the larger one.
• Vectorization
Vectorization replaces explicit Python loops with
whole-array operations, which NumPy executes in compiled
code and therefore much more efficiently.
• File Input/Output
NumPy provides functions for reading and writing
ndarrays to and from files. You should be familiar with
np.save() and np.load() for the binary .npy format, and
np.savetxt() and np.loadtxt() for plain-text files.
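The three NumPy ideas above can be sketched together as follows. This is a minimal illustration, not a definitive recipe: the array contents and the file names grid.npy and grid.txt are assumptions made up for the example.

```python
import os
import tempfile

import numpy as np

# Broadcasting: a (3, 1) column and a (4,) row combine into a (3, 4) grid.
col = np.arange(3).reshape(3, 1)      # shape (3, 1)
row = np.arange(4)                    # shape (4,)
grid = col * 10 + row                 # shape (3, 4)

# Vectorization: replace an explicit Python loop with one array expression.
values = np.arange(1_000, dtype=np.float64)
loop_total = 0.0
for v in values:                      # iterative version
    loop_total += v * v
vector_total = np.sum(values ** 2)    # vectorized version, same result

# File I/O: round-trip the grid through the binary .npy format and a text file.
# The file names are hypothetical and live in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "grid.npy")
    txt_path = os.path.join(tmp, "grid.txt")
    np.save(npy_path, grid)
    np.savetxt(txt_path, grid, fmt="%d")
    from_npy = np.load(npy_path)
    from_txt = np.loadtxt(txt_path, dtype=int)
```

The .npy format preserves dtype and shape exactly, while the text format is human-readable but needs the dtype restated on load.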
pandas
PANDAS
PANDAS CONT’D
• Data Transformation
Data transformation involves converting data from one
form to another. Pandas provides a wide range of
functions for data transformation, including functions for
sorting, ranking, merging, and pivoting data. You should
be familiar with functions like sort_values(), rank(),
merge(), and pivot_table() for transforming data.
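A minimal sketch of the four transformations named above. The sales and regions tables, their column names, and their values are hypothetical and invented purely for illustration.

```python
import pandas as pd

# Hypothetical sales records, invented for this illustration.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "revenue": [100, 150, 120, 90],
})
regions = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})

# Sorting: highest revenue first.
by_revenue = sales.sort_values("revenue", ascending=False)

# Ranking: 1 = highest revenue.
sales["rank"] = sales["revenue"].rank(ascending=False).astype(int)

# Merging: attach each store's region (a SQL-style left join on "store").
merged = sales.merge(regions, on="store", how="left")

# Pivoting: one row per store, one column per month, revenue summed per cell.
pivot = merged.pivot_table(index="store", columns="month",
                           values="revenue", aggfunc="sum")
```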
PANDAS CONT’D
• Time-Series Analysis
Pandas provides powerful support for time-series analysis.
You should be familiar with functions for working with
time-series data, like resample(), rolling(), and shift().
You should also be familiar with functions for handling
time zones and date ranges, such as tz_localize(),
tz_convert(), and date_range().
• Input/Output
Pandas provides functions for reading and writing data to
and from various file formats, including CSV, Excel,
SQL databases, and more. You should be familiar with
functions like read_csv(), read_excel(), read_sql(),
to_csv(), and to_excel() for working with data in
Pandas.
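The time-series and input/output functions above might be combined as in the sketch below. The dates, values, and column names are assumptions chosen for the example, and the CSV is written to an in-memory buffer rather than a real file so the sketch leaves nothing behind.

```python
import io

import pandas as pd

# Six daily observations; dates and values are invented for this illustration.
index = pd.date_range("2023-03-01", periods=6, freq="D")
series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=index)

# resample(): aggregate into 2-day bins.
two_day_sum = series.resample("2D").sum()

# rolling(): 3-day moving average (the first two entries are NaN).
moving_avg = series.rolling(window=3).mean()

# shift(): lag the series by one step to compute the day-over-day change.
daily_change = series - series.shift(1)

# Time zones: mark the index as UTC, then convert to the Africa/Accra zone.
localized = series.tz_localize("UTC").tz_convert("Africa/Accra")

# Input/Output: round-trip through CSV using an in-memory buffer
# (passing a file path instead of the buffer works the same way).
buffer = io.StringIO()
series.to_frame("value").to_csv(buffer, index_label="date")
buffer.seek(0)
restored = pd.read_csv(buffer, index_col="date", parse_dates=True)
```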
SciPy
SCIPY
SCIPY
• Integration
SciPy provides functions for numerical integration in
scipy.integrate, including quad(), dblquad(), and
tplquad(). These functions calculate integrals of
functions in one, two, or three dimensions, respectively.
• Optimization
SciPy provides functions for numerical optimization in
scipy.optimize, including minimize(), curve_fit(), and
root(). These functions can be used to find the minimum
of a function (or its maximum, by minimizing the negated
function), fit a curve to data, or solve nonlinear
equations, respectively.
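A minimal sketch of the integration and optimization routines above, using toy functions whose answers are known in closed form. The functions, starting points, and the helper name line are assumptions chosen for the example.

```python
import numpy as np
from scipy import integrate, optimize

# Integration: the integral of x^2 over [0, 3] is exactly 9.
area, abs_error = integrate.quad(lambda x: x ** 2, 0.0, 3.0)

# Double integral of x*y over the unit square is exactly 0.25.
# dblquad expects func(y, x), with x as the outer variable.
volume, _ = integrate.dblquad(lambda y, x: x * y, 0.0, 1.0, 0.0, 1.0)

# Optimization: minimize (x - 2)^2 + 1, whose minimum is at x = 2.
result = optimize.minimize(lambda x: (x[0] - 2.0) ** 2 + 1.0, x0=[0.0])

# Curve fitting: recover slope and intercept from noise-free data y = 3x + 1.
def line(x, slope, intercept):
    return slope * x + intercept

xs = np.linspace(0.0, 5.0, 20)
params, _ = optimize.curve_fit(line, xs, 3.0 * xs + 1.0)

# Root finding: solve x^3 - 8 = 0 starting from x = 1.
solution = optimize.root(lambda x: x ** 3 - 8.0, x0=[1.0])
```

Each routine returns an error estimate or a result object alongside the answer, so it is worth checking result.success and solution.success on real problems.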
SCIPY
• Interpolation
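SciPy provides functions for numerical interpolation in
scipy.interpolate, including interp1d() and griddata().
These functions can be used to interpolate data onto a
grid, or to create a smooth curve that passes through a
set of points. Note that interp2d() is deprecated and has
been removed from recent SciPy releases;
RegularGridInterpolator is the suggested replacement for
gridded data.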
• Signal Processing
SciPy provides functions for signal processing, including
convolve() and spectrogram() in scipy.signal and fft() in
scipy.fft. These functions can be used to filter,
transform, and analyze signals, such as audio, image, or
time-series data.
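The interpolation and signal-processing routines can be sketched on small synthetic inputs as follows. All sample points and signals are invented for the example, and the convolution uses scipy.signal.convolve(), which is the name SciPy gives this operation.

```python
import numpy as np
from scipy import fft, interpolate, signal

# Interpolation: linear interpolation between samples of y = 2x.
x_known = np.array([0.0, 1.0, 2.0, 3.0])
y_known = 2.0 * x_known
linear = interpolate.interp1d(x_known, y_known)
y_mid = float(linear(1.5))

# griddata(): interpolate scattered 2-D samples of z = x + y at a new point.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z_values = points[:, 0] + points[:, 1]
z_new = interpolate.griddata(points, z_values, [(0.25, 0.5)], method="linear")

# Convolution: smooth a step with a 3-point moving-average kernel.
step = np.array([0.0, 0.0, 0.0, 3.0, 3.0, 3.0])
smoothed = signal.convolve(step, np.ones(3) / 3.0, mode="same")

# FFT: a 5 Hz sine sampled at 100 Hz for one second peaks at 5 Hz.
sample_rate = 100
t = np.arange(sample_rate) / sample_rate
spectrum = np.abs(fft.rfft(np.sin(2 * np.pi * 5 * t)))
freqs = fft.rfftfreq(t.size, d=1 / sample_rate)
peak_freq = freqs[np.argmax(spectrum)]
```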
SCIPY
• Linear Algebra
SciPy provides functions for linear algebra, including
solve(), eig(), and svd(). These functions can be used
to solve linear systems of equations, compute eigenvalues
and eigenvectors, and perform singular value
decomposition.
• Statistics
SciPy provides functions for statistical analysis in
scipy.stats, including ttest_1samp(), ttest_ind(), and
pearsonr(). These functions can be used to perform
hypothesis testing, calculate confidence intervals, and
compute correlation coefficients.
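A minimal sketch of the linear algebra and statistics routines above. The matrix, the samples, and the two groups are invented for the example and were chosen so that the expected answers are easy to verify by hand.

```python
import numpy as np
from scipy import linalg, stats

# Linear system: 2x + y = 5 and x + 3y = 10 has the solution x = 1, y = 3.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
x = linalg.solve(A, b)

# Eigen-decomposition: a diagonal matrix has its diagonal as eigenvalues.
eigenvalues, eigenvectors = linalg.eig(np.diag([2.0, 5.0]))

# SVD: the three factors multiply back to the original matrix.
U, s, Vt = linalg.svd(A)
reconstructed = U @ np.diag(s) @ Vt

# One-sample t-test: does the sample mean differ from 5? (It does not.)
sample = np.array([5.1, 4.9, 5.0, 5.2, 4.8])
t_one, p_one = stats.ttest_1samp(sample, popmean=5.0)

# Two-sample t-test: two clearly separated groups give a small p-value.
group_a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
group_b = np.array([6.0, 7.0, 8.0, 9.0, 10.0])
t_two, p_two = stats.ttest_ind(group_a, group_b)

# Pearson correlation of perfectly linear data is 1.
r, _ = stats.pearsonr([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```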
SCIPY
• Sparse Matrices
SciPy provides support for sparse matrices, which are
useful for representing large datasets with many zeros.
SciPy provides functions for creating and manipulating
sparse matrices and for solving linear systems with them,
including csr_matrix(), coo_matrix(), and spsolve().
• Image Processing
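SciPy provides functions for image processing in its
scipy.ndimage submodule, including gaussian_filter(),
label(), and binary_erosion(). These functions perform
operations like filtering, segmentation, and
morphological operations on images. The older imread()
and imsave() helpers have been removed from SciPy; image
files are now typically read and written with a separate
library such as imageio.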
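A minimal sketch of sparse matrices and image operations in SciPy. The matrix entries and the synthetic 9x9 "image" are assumptions made up for the example, and the image functions come from scipy.ndimage (gaussian_filter(), label(), binary_erosion()), so no image file needs to be read.

```python
import numpy as np
from scipy import ndimage, sparse
from scipy.sparse.linalg import spsolve

# Sparse matrix: store only the non-zero entries of a 3x3 diagonal system.
rows = np.array([0, 1, 2])
cols = np.array([0, 1, 2])
data = np.array([2.0, 4.0, 5.0])
coo = sparse.coo_matrix((data, (rows, cols)), shape=(3, 3))
csr = coo.tocsr()                    # CSR suits arithmetic and solves

# Solve csr @ x = b without building the dense matrix.
b = np.array([2.0, 8.0, 15.0])
x = spsolve(csr, b)

# Image processing on a synthetic image with two separate 3x3 bright blobs.
image = np.zeros((9, 9))
image[1:4, 1:4] = 1.0
image[5:8, 5:8] = 1.0

blurred = ndimage.gaussian_filter(image, sigma=1.0)   # filtering
labels, num_blobs = ndimage.label(image > 0.5)         # segmentation
eroded = ndimage.binary_erosion(image > 0.5)           # morphology
```

With the default cross-shaped structuring element, erosion keeps only the centre pixel of each 3x3 blob.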
matplotlib/seaborn
MATPLOTLIB
MATPLOTLIB CONT’D
• Subplots
Matplotlib allows you to create multiple plots within a
single figure using subplots. Understanding how to create
and customize subplots can be useful for comparing
multiple datasets or visualizing different aspects of a
single dataset.
• Saving and Exporting Plots
Matplotlib allows you to save your plots in various
formats, such as PNG, PDF, or SVG, using savefig().
Saving and exporting plots is useful for sharing your
visualizations with others or incorporating them into
reports or presentations.
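Subplots and exporting might look like the sketch below. The curves and titles are arbitrary choices for the example, the Agg backend is selected only so the code runs without a display, and the figure is saved to in-memory buffers rather than to named files.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 100)

# One figure with two side-by-side axes that share the x-axis.
fig, (ax_sin, ax_cos) = plt.subplots(1, 2, figsize=(8, 3), sharex=True)
ax_sin.plot(x, np.sin(x))
ax_sin.set_title("sin(x)")
ax_cos.plot(x, np.cos(x))
ax_cos.set_title("cos(x)")
fig.tight_layout()

# Save in two formats; a file name such as "waves.png" (hypothetical)
# can be passed in place of the buffer.
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png", dpi=150)
svg_buffer = io.BytesIO()
fig.savefig(svg_buffer, format="svg")
plt.close(fig)
```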
MATPLOTLIB CONT’D
• Plot Customization
Matplotlib provides many options for customizing plots,
such as changing the color, size, and style of lines or
markers, adding labels and titles, adjusting axis limits and
ticks, and more. Understanding how to use these options
can help improve the clarity and effectiveness of your
visualizations.
• Integration with Pandas
Matplotlib can be easily integrated with the Pandas
library, which is commonly
used for data manipulation and analysis. Understanding
how to use Matplotlib to create visualizations from
Pandas dataframes can be useful for quickly exploring and
analyzing datasets.
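Plot customization and the Pandas integration can be sketched together, since DataFrame.plot() draws onto a Matplotlib Axes. The visitor counts, labels, and styling choices below are hypothetical values chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly figures, invented for this illustration.
df = pd.DataFrame({"visitors": [120, 135, 160, 150]},
                  index=["Jan", "Feb", "Mar", "Apr"])

# DataFrame.plot() draws onto the given Axes; styling keywords are
# passed through to Matplotlib.
fig, ax = plt.subplots(figsize=(6, 4))
df.plot(y="visitors", ax=ax, color="tab:blue", linestyle="--",
        marker="o", linewidth=2, label="Visitors")

# Customization: title, axis labels, limits, ticks, grid, and legend.
ax.set_title("Monthly visitors")
ax.set_xlabel("Month")
ax.set_ylabel("Count")
ax.set_ylim(100, 180)
ax.set_yticks([100, 120, 140, 160, 180])
ax.grid(True, alpha=0.3)
ax.legend(loc="upper left")
plt.close(fig)
```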
Data Exploration
DATA EXPLORATION
• Feature Engineering
Feature engineering involves creating new features or
variables from existing data to improve the performance
of machine learning models. Understanding how to select
and create appropriate features is important for
developing effective models.
• Exploratory Data Analysis (EDA)
EDA involves examining the data in depth to generate
hypotheses and
insights about the data. Techniques such as clustering
and dimensionality reduction can help identify patterns
and relationships in the data.
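A minimal sketch of feature engineering followed by a first exploratory pass. The housing table, its column names, and the reference year 2023 are assumptions invented for the example. The dimensionality reduction step is a plain principal component analysis computed with NumPy's SVD, used here as one simple stand-in for the techniques mentioned above.

```python
import numpy as np
import pandas as pd

# Hypothetical housing records, invented for this illustration.
homes = pd.DataFrame({
    "price": [200_000, 300_000, 150_000, 450_000],
    "area_sqm": [100.0, 120.0, 75.0, 150.0],
    "built": [1990, 2005, 1980, 2015],
})

# Feature engineering: derive new columns from existing ones.
homes["price_per_sqm"] = homes["price"] / homes["area_sqm"]
homes["age_in_2023"] = 2023 - homes["built"]   # reference year is an assumption

# EDA: summary statistics and pairwise correlations.
summary = homes.describe()
correlations = homes.corr()

# Dimensionality reduction: PCA via SVD of the standardized features.
features = homes[["price", "area_sqm", "age_in_2023"]].to_numpy(dtype=float)
standardized = (features - features.mean(axis=0)) / features.std(axis=0)
_, singular_values, components = np.linalg.svd(standardized,
                                               full_matrices=False)
explained_ratio = singular_values ** 2 / np.sum(singular_values ** 2)
projected = standardized @ components[:2].T    # each home as a 2-D point
```

Inspecting explained_ratio shows how much of the variance each component captures, which helps decide how many dimensions to keep.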
DATA EXPLORATION
END OF PRESENTATION
THANK YOU