Machine Learning Lecture2

The document provides an overview and outline of important concepts for data scientists to learn related to Python libraries like NumPy, Pandas, Scipy, and Matplotlib. It discusses key ideas around Python itself like data structures, functions, control flow, modules and libraries. It then summarizes important concepts for working with NumPy like ndarrays, array operations, broadcasting and file input/output. Key Pandas topics covered include Series, DataFrames, indexing/selection, data cleaning/preprocessing, exploration/analysis, and input/output. Scipy concepts discussed include integration, optimization and linear algebra functions.

Uploaded by

xavieranosike

NOTATIONS AND DEFINITIONS

Ishaya, Jeremiah Ayock

Lecture 1

March 9, 2023
Academic City University College, Agbogba Haatso, Ghana.
OUTLINE

Notations and Definitions

• Python
• Numpy
• Pandas
• Scipy
• matplotlib/seaborn
• Data Exploration

Python
PYTHON

Python is a widely used programming language for data
science, machine learning, and artificial intelligence. It has
become the go-to language for data scientists because of its
simplicity, ease of use, and availability of various libraries and
tools for data manipulation, visualization, and modelling.
In this lecture, we will cover some important Python libraries
for data science, including NumPy, pandas, SciPy, and
Matplotlib/Seaborn, and then turn to data exploration.
IMPORTANT PYTHON CONCEPTS FOR A DATA
SCIENTIST:

• Data Structures:
Data scientists need to be familiar with data structures
such as lists, tuples, dictionaries, and sets to store,
manipulate, and organize data in Python.
• Functions:
Functions are reusable blocks of code that perform a
specific task. Data scientists need to be proficient in
defining and calling functions so that the same block of
code can be reused in different parts of a program.
• Control Flow:
Data scientists need to be familiar with control flow
constructs like loops and conditional statements to
control the order in which code is executed.
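As a minimal sketch of the three ideas on this slide, the snippet below uses the built-in data structures, defines a small function, and runs a loop with a conditional. The variable names and values are made up for illustration.

```python
# Illustrative data: the names and values below are invented for this sketch.
scores = [72, 88, 95, 88]            # list: ordered, mutable
point = (3.5, 1.2)                   # tuple: ordered, immutable
ages = {"Ama": 23, "Kofi": 31}       # dictionary: key -> value lookup
unique_scores = set(scores)          # set: duplicates removed


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


# Control flow: a loop with a conditional inside it.
labels = []
for s in scores:
    if s >= 80:
        labels.append("high")
    else:
        labels.append("low")

print(mean(scores))      # 85.75
print(labels)            # ['low', 'high', 'high', 'high']
```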
IMPORTANT PYTHON CONCEPTS FOR A DATA
SCIENTIST CON’T

• Modules and Libraries:
Python provides many libraries and modules to perform
complex tasks. Data scientists should be familiar with
libraries like NumPy, pandas, Matplotlib/Seaborn, SciPy,
Scikit-Learn, and others.
• File Input and Output:
Data scientists need to be familiar with file input/output
(I/O) operations to read and write data to and from files.
IMPORTANT PYTHON CONCEPTS FOR A DATA
SCIENTIST CON’T

• Object-Oriented Programming (OOP):
Data scientists should have a basic understanding of OOP
concepts like classes, objects, inheritance, and
polymorphism. OOP is useful in creating custom data
structures and functions for specific use cases.
• Regular Expressions:
Regular expressions are useful in pattern matching and
data cleaning operations. Data scientists should be
familiar with the basic syntax and use cases of regular
expressions.
• Error Handling:
Data scientists should be proficient in error-handling
techniques to debug their code and avoid …
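A small sketch of regular expressions and error handling in a data-cleaning setting follows. The helper names parse_price and safe_divide are hypothetical and exist only for this example.

```python
import re


def parse_price(text):
    """Extract the first number from a messy string, or return None."""
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    if match is None:
        return None
    return float(match.group())


def safe_divide(a, b):
    """Divide a by b, returning None instead of raising on division by zero."""
    try:
        return a / b
    except ZeroDivisionError:
        return None


print(parse_price("GHS 1,250.50"))   # 1250.5
print(parse_price("n/a"))            # None
print(safe_divide(10, 4))            # 2.5
print(safe_divide(1, 0))             # None
```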
IMPORTANT PYTHON CONCEPTS FOR A DATA
SCIENTIST CON’T

• Virtual Environments:
Data scientists need to be familiar with virtual
environments to create isolated environments for different
projects with different dependencies.
• Collaboration:
Data scientists should be familiar with version control
systems like Git and collaboration platforms like GitHub
to collaborate with other team members and manage the
codebase.

Numpy
NUMPY

NumPy is a fundamental library for scientific computing
with Python.
It provides support for multidimensional arrays, as well as a
wide range of mathematical functions to operate on them.
Here are some of the important NumPy concepts for a data
scientist:
NUMPY CON’T

• ndarrays
NumPy’s ndarrays are the core data structure for
storing and manipulating numerical data. They provide
efficient storage and access to homogeneous numerical
data, with support for various mathematical operations.
You should be familiar with creating ndarrays, indexing
and slicing them, and performing basic element-wise
operations like addition and multiplication, including
operations that rely on broadcasting.
• Array Shape and Dimensions
Understanding the shape and dimensions of an ndarray is
essential for working with NumPy. The shape of an ndarray
specifies the size of each dimension, while the number of
dimensions (the ndim attribute) is sometimes called the
rank, which should not be confused with the rank of a
matrix in linear algebra. You should be familiar with
functions for manipulating the shape and dimensions of
ndarrays, like reshape(), resize(), and transpose().
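The following sketch illustrates shape, number of dimensions, and reshaping on a tiny array. The array contents are arbitrary.

```python
import numpy as np

a = np.arange(6)            # array([0, 1, 2, 3, 4, 5])
print(a.shape, a.ndim)      # (6,) 1

b = a.reshape(2, 3)         # same data arranged as 2 rows x 3 columns
print(b.shape, b.ndim)      # (2, 3) 2

c = b.T                     # transpose: axes reversed
print(c.shape)              # (3, 2)

d = b + 10                  # element-wise addition with a scalar
e = b * b                   # element-wise multiplication
```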
NUMPY CON’T

• Array Indexing and Slicing
Indexing and slicing are essential operations for accessing
and modifying elements of an ndarray. You should be
familiar with basic indexing and slicing, as well as
advanced indexing techniques like Boolean indexing and
fancy (integer-array) indexing, in which the index arrays
are broadcast against one another.
• Array Operations
NumPy provides a wide range of mathematical and logical
operations to work with ndarrays. You should be familiar
with functions like np.sum(), np.mean(), np.max(),
np.min(), np.std(), np.dot(), np.transpose(),
np.concatenate(), and many more.
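A short sketch of the indexing styles and a few of the array functions named on this slide, using a small made-up array:

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])

row = x[1]                  # basic indexing: second row -> [4, 5, 6]
col = x[:, 0]               # slicing: first column -> [1, 4]
big = x[x > 3]              # Boolean indexing -> [4, 5, 6]
picked = x[[0, 1], [2, 0]]  # fancy indexing: elements (0, 2) and (1, 0) -> [3, 4]

total = np.sum(x)                          # 21
col_means = np.mean(x, axis=0)             # [2.5, 3.5, 4.5]
product = np.dot(x, x.T)                   # (2, 3) times (3, 2) -> shape (2, 2)
stacked = np.concatenate([x, x], axis=0)   # shape (4, 3)
```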
NUMPY CON’T

• Broadcasting
Broadcasting is a powerful feature of NumPy that allows
element-wise operations between ndarrays of different but
compatible shapes. Shapes are compared dimension by
dimension, starting from the last one, and two dimensions
are compatible when they are equal or one of them is 1.
• Vectorization
Vectorization is the process of replacing explicit Python
loops with operations on whole arrays, which NumPy
executes in compiled code and which are therefore
usually much faster.
• File Input/Output
NumPy provides functions for reading and writing
ndarrays to and from files. You should be familiar with
functions like np.load(), np.save(), np.savetxt(), and
np.loadtxt() for working with files in NumPy.
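The sketch below shows broadcasting, a vectorized expression next to its loop equivalent, and a save/load round trip. An in-memory buffer is used in place of a file on disk purely to keep the example self-contained; that choice is an assumption of this sketch.

```python
import io

import numpy as np

m = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])            # shape (2, 3)
offsets = np.array([10.0, 20.0, 30.0])     # shape (3,)

# Broadcasting: offsets is stretched across both rows of m.
shifted = m + offsets

# Vectorization: one array expression instead of an explicit Python loop.
loop_squares = np.array([v * v for v in m.ravel()])
vec_squares = m.ravel() ** 2

# File I/O: np.save and np.load accept file-like objects, so an in-memory
# buffer stands in for a file on disk here.
buffer = io.BytesIO()
np.save(buffer, m)
buffer.seek(0)
restored = np.load(buffer)
```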
pandas
PANDAS

Pandas is a popular Python library for data manipulation and
analysis. It provides high-performance data structures for
efficiently working with large datasets, as well as a wide range
of functions for data cleaning, exploration, and transformation.
Here are some of the important pandas concepts for a data
scientist:
PANDAS CON’T

• Series and DataFrame
Pandas provides two main data structures for working with
labeled data: Series and DataFrame. A Series is a
one-dimensional labeled array that can hold any data
type, while a DataFrame is a two-dimensional labeled data
structure with columns of potentially different data types.
• Indexing and Selection
Pandas provides various indexing and selection methods
for accessing and modifying elements of a Series or
DataFrame. You should be familiar with basic indexing
and selection methods like loc (label-based) and iloc
(position-based), as well as advanced techniques like
Boolean indexing, fancy indexing, and hierarchical
indexing. The older ix indexer was deprecated and has
been removed from pandas, so it should not be used in
new code.
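As a minimal illustration of Series, DataFrame, and the main selection styles, consider the sketch below. The records are made up.

```python
import pandas as pd

# Made-up records for illustration only.
df = pd.DataFrame(
    {"city": ["Accra", "Kumasi", "Tamale"], "temp_c": [31.0, 29.5, 35.2]},
    index=["a", "b", "c"],
)

temps = df["temp_c"]                 # a single column is a Series
by_label = df.loc["b", "city"]       # label-based -> 'Kumasi'
by_position = df.iloc[0, 1]          # position-based -> 31.0
hot = df[df["temp_c"] > 30]          # Boolean indexing keeps rows 'a' and 'c'
```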
PANDAS CON’T

• Data Cleaning and Preprocessing
Data cleaning and preprocessing are essential steps in
data analysis. Pandas provides a wide range of functions
for handling missing data, removing duplicates, renaming
columns, converting data types, and more. You should be
familiar with functions like dropna(), fillna(),
drop_duplicates(), rename(), astype(), and apply() for
cleaning and preprocessing data.
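The cleaning functions listed above can be chained as in the following sketch. The column names and values are invented, and the order of the steps is one reasonable choice rather than a fixed recipe.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "Name": ["Ama", "Kofi", "Kofi", "Esi"],
    "Age": ["23", "31", "31", None],
    "Score": [88.0, np.nan, np.nan, 70.0],
})

clean = (
    raw.drop_duplicates()                       # drop the repeated "Kofi" row
       .rename(columns={"Name": "name", "Age": "age", "Score": "score"})
       .dropna(subset=["age"])                  # drop rows with no age
)
clean["age"] = clean["age"].astype(int)         # strings -> integers
clean["score"] = clean["score"].fillna(clean["score"].mean())
clean["initial"] = clean["name"].apply(lambda s: s[0])
```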
PANDAS CON’T

• Data Exploration and Analysis
Pandas provides a wide range of functions for exploring
and analyzing data. You should be familiar with functions
for calculating summary statistics like mean, median, and
standard deviation, as well as functions for aggregating,
grouping, and pivoting data, such as groupby() and
pivot_table(). You should also be familiar with functions
for working with time-series data, like resample() and
rolling().
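A brief sketch of summary statistics, grouping, and pivoting on a made-up sales table:

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100.0, 120.0, 80.0, 90.0],
})

summary = sales["revenue"].describe()        # count, mean, std, min, quartiles, max
mean_revenue = sales["revenue"].mean()       # 97.5
median_revenue = sales["revenue"].median()   # 95.0
by_region = sales.groupby("region")["revenue"].sum()   # north 220.0, south 170.0
wide = sales.pivot_table(index="region", columns="quarter", values="revenue")
```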
PANDAS CON’T

• Data Transformation
Data transformation involves converting data from one
form to another. Pandas provides a wide range of
functions for data transformation, including functions for
sorting, ranking, merging, and pivoting data. You should
be familiar with functions like sort_values(), rank(),
merge(), and pivot_table() for transforming data.
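Sorting, ranking, and merging can be sketched as follows. The two small tables are hypothetical.

```python
import pandas as pd

students = pd.DataFrame({"id": [1, 2, 3], "score": [70, 95, 82]})
clubs = pd.DataFrame({"id": [1, 3], "club": ["chess", "robotics"]})

ordered = students.sort_values("score", ascending=False)      # 95, 82, 70
students["rank"] = students["score"].rank(ascending=False)    # 3.0, 1.0, 2.0
joined = students.merge(clubs, on="id", how="left")           # id 2 has no club
```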
PANDAS CON’T

• Time-Series Analysis
Pandas provides powerful support for time-series analysis.
You should be familiar with functions for working with
time-series data, like resample(), rolling(), and shift().
You should also be familiar with functions for handling
time zones and date ranges.
• Input/Output
Pandas provides functions for reading and writing data to
and from various file formats, including CSV, Excel,
SQL databases, and more. You should be familiar with
functions like read_csv(), read_excel(), read_sql(),
to_csv(), and to_excel() for working with data in
Pandas.
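The sketch below applies resample(), rolling(), and shift() to a short daily series and then round-trips it through CSV. An in-memory buffer replaces a real file only to keep the example self-contained.

```python
import io

import pandas as pd

idx = pd.date_range("2023-03-01", periods=6, freq="D")
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

two_day_sum = ts.resample("2D").sum()        # 3.0, 7.0, 11.0
rolling_mean = ts.rolling(window=3).mean()   # first two values are NaN
previous = ts.shift(1)                       # value from the previous day

# Round-trip through CSV using an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
ts.to_frame("value").to_csv(buffer)
buffer.seek(0)
loaded = pd.read_csv(buffer, index_col=0, parse_dates=True)
```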
Scipy
SCIPY

SciPy is a powerful Python library for scientific computing
that provides many functions for numerical optimization,
integration, interpolation, signal processing, linear
algebra, and more. Here are some of the important SciPy
concepts for a data scientist:
SCIPY

• Integration
SciPy provides functions for numerical integration in
scipy.integrate, including quad(), dblquad(), and
tplquad(). These functions compute definite integrals of
functions of one, two, or three variables, respectively.
• Optimization
SciPy provides functions for numerical optimization in
scipy.optimize, including minimize(), curve_fit(), and
root(). These functions can be used to find a minimum of
a function (or a maximum, by minimizing its negative),
fit a curve to data, or solve nonlinear equations,
respectively.
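A minimal sketch of the integration and optimization functions named on this slide, applied to toy problems whose answers are known in advance. The test functions and starting points are chosen for illustration only.

```python
import numpy as np
from scipy import integrate, optimize

# Integrate x**2 from 0 to 1; the exact answer is 1/3.
area, abs_error = integrate.quad(lambda x: x ** 2, 0.0, 1.0)

# Minimize (x - 2)**2; the minimum is at x = 2.
result = optimize.minimize(lambda x: (x[0] - 2.0) ** 2, x0=[0.0])


# Fit a straight line to noise-free synthetic data generated from y = 3x + 1.
def line(x, slope, intercept):
    return slope * x + intercept


xdata = np.linspace(0.0, 5.0, 20)
ydata = line(xdata, 3.0, 1.0)
params, covariance = optimize.curve_fit(line, xdata, ydata)

# Solve x**3 - 8 = 0 starting from an initial guess of 1.
sol = optimize.root(lambda x: x ** 3 - 8.0, x0=[1.0])
```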
SCIPY

• Interpolation
SciPy provides functions for numerical interpolation in
scipy.interpolate, including interp1d(), interp2d(), and
griddata(). These functions can be used to interpolate
data onto a grid, or to create a smooth curve that passes
through a set of points. Note that interp2d() has been
removed from recent SciPy releases; the SciPy
documentation points to alternatives such as
RegularGridInterpolator.
• Signal Processing
SciPy provides functions for signal processing, including
convolve() and spectrogram() in scipy.signal and fft() in
scipy.fft. These functions can be used to filter,
transform, and analyze signals, such as audio, image, or
time-series data.
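The following sketch shows one-dimensional interpolation, a convolution, and a Fourier transform on small synthetic inputs. It uses rfft(), the real-input variant of fft(), so that only the non-negative frequencies are returned; that substitution is a choice made for this example.

```python
import numpy as np
from scipy import fft, interpolate, signal

# Linear interpolation between known points of y = 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x
f = interpolate.interp1d(x, y)       # linear by default
mid = float(f(1.5))                  # 3.0

# Convolve a short signal with a 2-point moving-sum kernel.
smoothed = signal.convolve([1.0, 2.0, 3.0], [1.0, 1.0])   # [1, 3, 5, 3]

# Spectrum of a pure 5 Hz sine sampled at 100 Hz for one second.
fs = 100
t = np.arange(fs) / fs
spectrum = np.abs(fft.rfft(np.sin(2 * np.pi * 5 * t)))
peak_hz = int(np.argmax(spectrum))   # bin spacing is 1 Hz here, so bin 5
```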
SCIPY

• Linear Algebra
SciPy provides functions for linear algebra in
scipy.linalg, including solve(), eig(), and svd(). These
functions can be used to solve linear systems of
equations, compute eigenvalues and eigenvectors, and
perform singular value decomposition.
• Statistics
SciPy provides functions for statistical analysis in
scipy.stats, including ttest_1samp(), ttest_ind(), and
pearsonr(). These functions can be used to perform
hypothesis testing, calculate confidence intervals, and
compute correlation coefficients.
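As a small sketch of the linear algebra and statistics functions above, the example below solves a 2x2 system, decomposes the matrix, and runs two statistical routines on made-up samples.

```python
import numpy as np
from scipy import linalg, stats

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = linalg.solve(A, b)                 # solves A @ x = b -> [2, 3]
eigenvalues, eigenvectors = linalg.eig(A)
U, s, Vh = linalg.svd(A)               # A equals U @ diag(s) @ Vh

# Pearson correlation of two perfectly linearly related samples.
u = np.array([1.0, 2.0, 3.0, 4.0])
r, p_value = stats.pearsonr(u, 2.0 * u + 1.0)   # r is 1 up to rounding

# One-sample t-test: is the mean of `sample` different from 5?
sample = np.array([4.8, 5.1, 5.3, 4.9, 5.2])
t_stat, t_p = stats.ttest_1samp(sample, popmean=5.0)
```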
SCIPY

• Sparse Matrices
SciPy provides support for sparse matrices, which are
useful for representing large matrices in which most
entries are zero. SciPy provides functions for creating
and manipulating sparse matrices and for solving linear
systems with them, including csr_matrix() and
coo_matrix() in scipy.sparse and spsolve() in
scipy.sparse.linalg.
• Image Processing
SciPy provides functions for image processing in the
scipy.ndimage module, which can be used to perform
operations like filtering, segmentation, and morphological
operations on images. The imread() and imsave() functions
that older SciPy versions offered for reading and writing
image files have been removed; a separate library such as
imageio is now used for that.
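A brief sketch of a sparse linear solve and a simple image filter. The diagonal matrix and the single-pixel "image" are synthetic inputs chosen so the results are easy to check.

```python
import numpy as np
from scipy import ndimage
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import spsolve

# A 3x3 diagonal system stored in compressed sparse row (CSR) format.
A = csr_matrix(np.array([[2.0, 0.0, 0.0],
                         [0.0, 4.0, 0.0],
                         [0.0, 0.0, 5.0]]))
b = np.array([2.0, 8.0, 15.0])
x = spsolve(A, b)                      # [1, 2, 3]
stored = A.nnz                         # 3 stored non-zero entries

# Smooth a tiny synthetic image (a single bright pixel) with a Gaussian filter.
image = np.zeros((5, 5))
image[2, 2] = 1.0
blurred = ndimage.gaussian_filter(image, sigma=1.0)
```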
matplotlib/seaborn
MATPLOTLIB

Matplotlib is a popular Python library for creating
visualizations and plots. Here are some of the important
Matplotlib concepts for a data scientist:
MATPLOTLIB CON’T

• Figure and Axes Objects
Matplotlib operates on two main types of objects: Figure
and Axes. The Figure object represents the entire figure
or window, while the Axes object represents an individual
plot within the figure. Understanding how to create,
customize, and manipulate these objects is key to
creating effective visualizations.
• Types of Plots
Matplotlib supports a wide range of plot types, including
line plots, scatter plots, bar plots, histogram plots, and
more. Understanding the syntax and options for each plot
type is important for creating the desired visualizations.
MATPLOTLIB CON’T

• Subplots
Matplotlib allows you to create multiple plots within a
single figure using subplots. Understanding how to create
and customize subplots can be useful for comparing
multiple datasets or visualizing different aspects of a
single dataset.
• Saving and Exporting Plots
Matplotlib allows you to save your plots in various
formats, such as PNG, PDF, or SVG. Understanding
how to save and export your plots can be useful for
sharing your visualizations with others or incorporating
them into reports or presentations.
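The sketch below creates one Figure with two Axes, draws a line plot and a histogram, and saves the result as PNG. The non-interactive "Agg" backend and the in-memory buffer are assumptions made so the example runs without a display or a file on disk.

```python
import io

import matplotlib
matplotlib.use("Agg")                  # non-interactive backend; no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 50)

# One Figure containing two Axes side by side.
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))
ax_left.plot(x, np.sin(x), color="tab:blue", linestyle="--", label="sin(x)")
ax_left.set_title("Line plot")
ax_left.set_xlabel("x")
ax_left.legend()
ax_right.hist(np.sin(x), bins=10)
ax_right.set_title("Histogram")

# Save as PNG; a file path such as "figure.png" would work the same way.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```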
MATPLOTLIB CON’T

• Plot Customization
Matplotlib provides many options for customizing plots,
such as changing the color, size, and style of lines or
markers, adding labels and titles, adjusting axis limits and
ticks, and more. Understanding how to use these options
can help improve the clarity and effectiveness of your
visualizations.
• Integration with Pandas
Matplotlib can be easily integrated with the Pandas
library, which is commonly used for data manipulation and
analysis. Understanding how to use Matplotlib to create
visualizations from Pandas DataFrames can be useful for
quickly exploring and analyzing datasets.
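As a minimal sketch of plotting directly from a DataFrame and then customizing the resulting Axes, consider the example below. The data and labels are invented, and the "Agg" backend is assumed so no display is required.

```python
import matplotlib
matplotlib.use("Agg")                  # non-interactive backend; no window needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"month": [1, 2, 3, 4], "sales": [10, 14, 9, 17]})

fig, ax = plt.subplots()
df.plot(x="month", y="sales", kind="line", marker="o", ax=ax)  # pandas draws on ax
ax.set_ylabel("sales")                 # customization uses the usual Axes methods
ax.set_ylim(0, 20)
ax.set_title("Monthly sales (illustrative data)")
plt.close(fig)
```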
Data Exploration
DATA EXPLORATION

Data exploration is a critical step in the data science process,
as it involves understanding the structure, quality, and
patterns in the data. Here are some of the important data
exploration concepts for a data scientist:
DATA EXPLORATION

• Data Types and Formats
Understanding the types and formats of data is important
for determining appropriate analysis methods and
identifying potential data quality issues. Common data
types include numerical, categorical, text, and datetime
data.
• Descriptive Statistics
Descriptive statistics provide summary information about
the data, such as measures of central tendency (e.g.,
mean, median) and variability (e.g., standard deviation,
range). Understanding how to calculate and interpret
descriptive statistics is important for identifying patterns
and outliers in the data.
• Data Visualization
DATA EXPLORATION

• Data Cleaning and Preprocessing
Data cleaning involves identifying and correcting errors,
missing values, and outliers in the data, while data
preprocessing involves transforming and normalizing the
data to prepare it for analysis. Understanding how to use
tools like pandas and numpy for data cleaning and
preprocessing is critical for ensuring the quality and
reliability of analysis results.
• Correlation Analysis
Correlation analysis involves identifying the strength and
direction of the relationships between variables in the
dataset. Understanding how to calculate and interpret
correlation coefficients can help identify important
variables and relationships within the data.
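Inspecting data types, computing descriptive statistics, and measuring correlation can be sketched with pandas as follows. The table is made up, and Pearson correlation is the default used by corr().

```python
import pandas as pd

df = pd.DataFrame({
    "hours": [1.0, 2.0, 3.0, 4.0, 5.0],
    "score": [52.0, 58.0, 65.0, 70.0, 79.0],
    "group": ["a", "b", "a", "b", "a"],
})

print(df.dtypes)                       # numerical vs. categorical columns
print(df.describe())                   # count, mean, std, min, quartiles, max
value_range = df["score"].max() - df["score"].min()   # 27.0

corr = df[["hours", "score"]].corr()   # Pearson correlation matrix
r = corr.loc["hours", "score"]         # close to +1: strong positive relationship
```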
DATA EXPLORATION

• Feature Engineering
Feature engineering involves creating new features or
variables from existing data to improve the performance
of machine learning models. Understanding how to select
and create appropriate features is important for
developing effective models.
• Exploratory Data Analysis (EDA)
EDA involves examining the data in depth to generate
hypotheses and insights about the data. Techniques such
as clustering and dimensionality reduction can help
identify patterns and relationships in the data.
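Feature engineering, as described on this slide, can be sketched with a few derived columns. The column names and the particular features (a ratio, a date part, and one-hot encoding) are illustrative choices, not a prescribed set.

```python
import pandas as pd

df = pd.DataFrame({
    "height_m": [1.60, 1.75, 1.82],
    "weight_kg": [55.0, 70.0, 90.0],
    "signup": pd.to_datetime(["2023-01-15", "2023-02-20", "2023-03-09"]),
    "plan": ["free", "pro", "free"],
})

df["bmi"] = df["weight_kg"] / df["height_m"] ** 2      # ratio of two columns
df["signup_month"] = df["signup"].dt.month             # part of a datetime
features = pd.get_dummies(df, columns=["plan"])        # one-hot encode a category
```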
DATA EXPLORATION

• Data Quality Assessment
Assessing the quality of the data is critical for ensuring
the reliability and validity of analysis results. Common
methods for assessing data quality include evaluating data
completeness, accuracy, and consistency.

Overall, effective data exploration is critical for understanding
the characteristics of the data and identifying potential issues
or patterns that can inform subsequent analysis steps.
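The completeness, accuracy, and consistency checks mentioned under Data Quality Assessment might look like the sketch below. The table and the rule that ages must be non-negative are assumptions made for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 2, 4],
    "age": [25.0, np.nan, 31.0, -3.0],
})

missing_per_column = df.isna().sum()                 # completeness
duplicate_ids = int(df["id"].duplicated().sum())     # consistency: repeated keys
invalid_ages = int((df["age"] < 0).sum())            # accuracy: impossible values
```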
END OF PRESENTATION

THANK YOU
