
1 Week 6. Pandas and Numpy Cheat Sheet

This document provides an overview of the Pandas and NumPy Python libraries for data science. Pandas lets users store and manipulate data in DataFrames and Series, including extracting and adding data. NumPy handles N-dimensional arrays and supports numeric operations such as linear algebra and Fourier transforms. Both are essential for tasks such as data wrangling, analysis, and machine learning in Python.

Uploaded by William Oliss
Copyright © All Rights Reserved

pandas&numpy

August 4, 2020

1 Week 6. Pandas and NumPy cheat sheet


Pandas (https://pandas.pydata.org) and NumPy (https://numpy.org) are two essential Python
libraries for Data Science. Here we give you the basics you need to know for the final
project.
You do not need to know a lot about these libraries for this course. However, they are worth
learning in more depth if you plan to become a Data Scientist: do not hesitate
to search for extra materials and tutorials.

In [2]: import numpy as np
import pandas as pd

1.1 Pandas
Pandas provides easy-to-use data structures and data analysis tools. The basic classes in Pandas are
DataFrames, which you can think of as matrices with named columns and indexed rows, and
Series, which are the columns of such matrices. For example:

In [3]: species = ['falcon', 'dog', 'spider', 'fish']
feature_1 = [2, 4, 8, 0]
feature_2 = [2, 0, 0, 0]

animals = pd.DataFrame({"num_legs": feature_1, "num_specimen_seen": feature_2}, index=species)

print('Type of variable animals is ', type(animals))
print('Type of a column from DataFrame animals is ', type(animals["num_specimen_seen"]))
animals

Type of variable animals is <class 'pandas.core.frame.DataFrame'>
Type of a column from DataFrame animals is <class 'pandas.core.series.Series'>

Out[3]:
        num_legs  num_specimen_seen
falcon         2                  2
dog            4                  0
spider         8                  0
fish           0                  0
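A Series is one column of a DataFrame, so it keeps the DataFrame's row labels as its own index. That means you can look up single values by label. The following minimal sketch rebuilds the animals DataFrame from the cell above. The variable name legs is only illustrative.

```python
import pandas as pd

# Rebuild the animals DataFrame from the example above
species = ['falcon', 'dog', 'spider', 'fish']
animals = pd.DataFrame(
    {"num_legs": [2, 4, 8, 0], "num_specimen_seen": [2, 0, 0, 0]},
    index=species,
)

legs = animals["num_legs"]  # a single column is a Series

# The Series shares the DataFrame's row labels as its index
print(list(legs.index))          # ['falcon', 'dog', 'spider', 'fish']
print(legs["spider"])            # 8, looked up by row label
print(animals.columns.tolist())  # ['num_legs', 'num_specimen_seen']
```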

In [4]: feature_3 = [2, 0, 0, 0]
animals["num_wings"] = feature_3
animals

Out[4]:
        num_legs  num_specimen_seen  num_wings
falcon         2                  2          2
dog            4                  0          0
spider         8                  0          0
fish           0                  0          0
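Instead of assigning a Python list, you can also compute a new column from existing ones. The arithmetic is applied element-wise and matched row by row on the index. In this minimal sketch, the column name num_limbs is a hypothetical example, and only two of the columns above are rebuilt.

```python
import pandas as pd

species = ['falcon', 'dog', 'spider', 'fish']
animals = pd.DataFrame(
    {"num_legs": [2, 4, 8, 0], "num_wings": [2, 0, 0, 0]},
    index=species,
)

# Column arithmetic is element-wise and aligned on the row index,
# so no explicit loop is needed ("num_limbs" is a hypothetical name)
animals["num_limbs"] = animals["num_legs"] + animals["num_wings"]
print(animals)
```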

Once you have a DataFrame, you can extract data from it in many ways. For example, you can select
only some of its columns:

In [5]: X = animals[["num_legs", "num_wings"]]
X

Out[5]:
        num_legs  num_wings
falcon         2          2
dog            4          0
spider         8          0
fish           0          0
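The number of brackets matters when you select columns. A single column name gives a Series, while a list of names gives a DataFrame. You can select rows by label with .loc or filter them with a boolean condition. This short sketch works on a rebuilt copy of the data, and the variable names are only illustrative.

```python
import pandas as pd

species = ['falcon', 'dog', 'spider', 'fish']
animals = pd.DataFrame(
    {"num_legs": [2, 4, 8, 0], "num_wings": [2, 0, 0, 0]},
    index=species,
)

# One column name in single brackets -> Series
legs = animals["num_legs"]
# A list of column names (double brackets) -> DataFrame
legs_df = animals[["num_legs"]]

# .loc selects by row label and column label
birds = animals.loc[["falcon"], ["num_legs", "num_wings"]]

# A boolean mask keeps only the rows where the condition holds
many_legs = animals[animals["num_legs"] > 2]

print(type(legs), type(legs_df))
print(many_legs)
```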

A DataFrame is a fairly complex structure that carries a lot of extra information (e.g., the index, column
names, etc.). When you only need the numbers, you can convert the DataFrame into another object,
a NumPy ndarray:

In [6]: X_np = X.values
print(type(X_np))

<class 'numpy.ndarray'>
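Since pandas 0.24, DataFrame.to_numpy() is also available, and the pandas documentation recommends it over .values. This minimal sketch assumes a pandas version that provides it.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame(
    {"num_legs": [2, 4, 8, 0], "num_wings": [2, 0, 0, 0]},
    index=['falcon', 'dog', 'spider', 'fish'],
)

# to_numpy() returns the same numbers as .values;
# the row labels and column names are dropped in both cases
X_np = X.to_numpy()
print(type(X_np), X_np.shape)

# The labels are still available on the DataFrame if needed later
print(list(X.columns))
```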

1.2 NumPy
NumPy is the fundamental package for scientific computing with Python. ndarray is one of the
most important classes in NumPy. It is a powerful N-dimensional array object. For instance, you
can use it to store a matrix:

In [7]: X_np

Out[7]: array([[2, 2],
               [4, 0],
               [8, 0],
               [0, 0]])

The class has plenty of methods and attributes. One of the simplest is the shape attribute, which gives
the size of the ndarray along each of its dimensions:

In [8]: X_np.shape

Out[8]: (4, 2)
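Note that shape is an attribute, not a method, so you write it without parentheses. A few related attributes describe an array in the same way. This short sketch uses a copy of X_np. The exact integer dtype depends on the platform, so no output is given for it.

```python
import numpy as np

X_np = np.array([[2, 2], [4, 0], [8, 0], [0, 0]])

print(X_np.shape)                    # (4, 2): size along each dimension
print(X_np.ndim)                     # 2: number of dimensions
print(X_np.size)                     # 8: total number of elements
print(X_np.shape[0], X_np.shape[1])  # 4 2: rows, columns
print(X_np.dtype)                    # element type (platform-dependent integer)
```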

You can create new ndarrays by converting Python objects (such as lists) into them, or by using functions
provided by NumPy:

In [9]: a = np.array([2, 4, 8, 0])
print(a, type(a))

[2 4 8 0] <class 'numpy.ndarray'>

In [10]: b = np.ones(shape=(1, 4))
print(b, type(b))

[[1. 1. 1. 1.]] <class 'numpy.ndarray'>

In [11]: c = np.zeros(shape=(5, 4))
print(c, type(c))

[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]] <class 'numpy.ndarray'>

In [12]: d = np.zeros_like(X_np)
print(d, type(d))

[[0 0]
[0 0]
[0 0]
[0 0]] <class 'numpy.ndarray'>
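NumPy has other common creation functions besides ones, zeros and zeros_like, such as arange, full and eye. Most of them accept a dtype argument that fixes the element type. This short sketch uses arbitrary values.

```python
import numpy as np

# Evenly spaced integers from 0 up to (but not including) 6
r = np.arange(6)
print(r)  # [0 1 2 3 4 5]

# An array of a given shape filled with one value
f = np.full((2, 3), 7)
print(f)

# A 3x3 identity matrix (floats by default)
I = np.eye(3)
print(I)

# Creation functions accept a dtype to control the element type
z = np.zeros((2, 2), dtype=int)
print(z)
```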

You can also transform an ndarray in many ways. For example, reshape returns the same data with a
different shape (the sizes along each dimension):

In [13]: print(a.reshape(2, 2))

[[2 4]
[8 0]]

In [14]: print(a.reshape(2, -1)) # -1 means "count it for me"

[[2 4]
[8 0]]
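The -1 can stand for any one dimension. Note that reshape returns a new array rather than changing a in place. The total number of elements must also stay the same, otherwise NumPy raises a ValueError. A short sketch:

```python
import numpy as np

a = np.array([2, 4, 8, 0])

col = a.reshape(-1, 1)  # one column; NumPy infers 4 rows
row = a.reshape(1, -1)  # one row; NumPy infers 4 columns
print(col.shape, row.shape)  # (4, 1) (1, 4)

# reshape does not modify a itself
print(a.shape)  # (4,)

# 4 elements cannot be arranged into 3 equal rows
try:
    a.reshape(3, -1)
except ValueError as err:
    print("ValueError:", err)
```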

In [15]: e = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(e.T) # .T means transpose

[[1 5]
[2 6]
[3 7]
[4 8]]

In [16]: print("Transposing converts an object of shape", e.shape, "into an object of shape", e.T.shape, ".")

Transposing converts an object of shape (2, 4) into an object of shape (4, 2) .
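Transposing swaps the axes, so element [i, j] of e.T is element [j, i] of e. This matters for the vector examples later in this document: a 1-D array has only a single axis, so .T returns it unchanged. A short sketch:

```python
import numpy as np

e = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# Transposing twice gives back the original array
print((e.T.T == e).all())  # True

# Element [i, j] of e.T is element [j, i] of e
print(e.T[3, 0], e[0, 3])  # 4 4

# A 1-D array has only one axis, so .T leaves it unchanged
v = np.array([1, 2, 3])
print(v.T.shape)  # (3,)

# To get a column vector, reshape explicitly instead
print(v.reshape(-1, 1).shape)  # (3, 1)
```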

In [17]: print("X_np\n", X_np)
print("d\n", d)
print("result\n", np.concatenate([X_np, d], axis=1))

X_np
[[2 2]
[4 0]
[8 0]
[0 0]]
d
[[0 0]
[0 0]
[0 0]
[0 0]]
result
[[2 2 0 0]
[4 0 0 0]
[8 0 0 0]
[0 0 0 0]]

In [18]: print("result\n", np.concatenate([X_np, d], axis=0)) # axis=0 appends rows; axis=1 (above) appends columns

result
[[2 2]
[4 0]
[8 0]
[0 0]
[0 0]
[0 0]
[0 0]
[0 0]]
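Concatenation only works if the arrays agree in every dimension except the one you join along. For 2-D arrays, np.hstack and np.vstack are shorthands for axis=1 and axis=0. This short sketch rebuilds X_np and d from above. The extra array is an arbitrary example.

```python
import numpy as np

X_np = np.array([[2, 2], [4, 0], [8, 0], [0, 0]])
d = np.zeros_like(X_np)

side_by_side = np.concatenate([X_np, d], axis=1)  # columns appended
stacked = np.concatenate([X_np, d], axis=0)       # rows appended
print(side_by_side.shape, stacked.shape)  # (4, 4) (8, 2)

# np.hstack / np.vstack give the same results for 2-D arrays
print((np.hstack([X_np, d]) == side_by_side).all())  # True
print((np.vstack([X_np, d]) == stacked).all())       # True

# Only the concatenation axis may differ: (4, 2) and (3, 2)
# can be joined along axis 0, but not along axis 1
extra = np.ones((3, 2))
print(np.concatenate([X_np, extra], axis=0).shape)  # (7, 2)
```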

You can also perform matrix and vector operations on ndarray objects:

In [19]: v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])
print(v1, "+", v2, "=", v1 + v2)
print(v1, "-", v2, "=", v1 - v2)

[1 2 3] + [4 5 6] = [5 7 9]
[1 2 3] - [4 5 6] = [-3 -3 -3]

In [20]: print("<", v1, ", ", v2.T, ".T> =", v1.dot(v2.T)) # dot (inner) product: 1*4 + 2*5 + 3*6 = 32

< [1 2 3] , [4 5 6] .T> = 32

In [21]: print(v1, "*", v2, "=", v1 * v2)

[1 2 3] * [4 5 6] = [ 4 10 18]

In [22]: print(v1, "^2 =", v1**2)

[1 2 3] ^2 = [1 4 9]

In [23]: print(v1, "* 2 =", v1 * 2)

[1 2 3] * 2 = [2 4 6]
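The examples above use only 1-D vectors. For 2-D arrays (matrices), * is still element-wise, while the @ operator (Python 3.5+) or .dot computes the matrix product. This small sketch uses arbitrary 2x2 matrices.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Element-wise product: multiplies matching entries
print(A * B)   # values [[5, 12], [21, 32]]

# Matrix product: rows of A times columns of B
print(A @ B)   # values [[19, 22], [43, 50]]
print(np.array_equal(A @ B, A.dot(B)))  # True

# Matrix-vector product
x = np.array([1, 2])
print(A @ x)   # values [5, 11]
```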

Here are cheat sheets for each library. Feel free to use them while completing your final project:
https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf
