1 Week 6. Pandas and Numpy Cheat Sheet
August 4, 2020
1.1 Pandas
Pandas provides easy-to-use data structures and data analysis tools. The basic classes in Pandas are
DataFrame, which you can think of as a matrix with named columns and indexed rows, and
Series, which is a single column of such a matrix. For example:
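The cell that originally created the DataFrame is not preserved here. As a minimal sketch, one way to build such a table is shown below; the row labels and the `num_legs` column are illustrative assumptions, not the original data.

```python
import pandas as pd

# Hypothetical data: the row labels and the "num_legs" column are assumptions.
animals = pd.DataFrame(
    {"num_legs": [2, 4, 8, 0]},
    index=["falcon", "dog", "spider", "fish"],
)

# Selecting a single column of a DataFrame gives a Series.
legs = animals["num_legs"]

print(type(animals))
print(type(legs))
```

A new column can then be attached by assigning a list of the same length to a new column name, as the next cell does.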
In [4]: feature_3 = [2, 0, 0, 0]
animals["num_wings"] = feature_3
animals
Now, when you have a DataFrame, you can extract data from it as you like. For example, take
only some of the columns:
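A short sketch of column selection, assuming a table with hypothetical columns `num_legs`, `num_wings`, and `num_eyes` (only `num_wings` appears in the original cells; the rest are made up for illustration):

```python
import pandas as pd

# Hypothetical table; values and the "num_legs"/"num_eyes" columns are assumptions.
animals = pd.DataFrame(
    {
        "num_legs": [2, 4, 8, 0],
        "num_wings": [2, 0, 0, 0],
        "num_eyes": [2, 2, 8, 2],
    }
)

# A list of column names returns a smaller DataFrame.
X = animals[["num_legs", "num_wings"]]

# A single column name returns a Series.
wings = animals["num_wings"]

print(X)
print(wings)
```

Note the double brackets in the first selection: the outer pair indexes the DataFrame, and the inner pair is an ordinary Python list of column names.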
A DataFrame is a fairly complex structure that carries a lot of extra information (e.g. indexes, column
names). When you only need the numbers, you can convert the DataFrame into another object,
a NumPy ndarray:
<class 'numpy.ndarray'>
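The conversion cell itself is missing from this copy. One way to get the result above is `DataFrame.to_numpy()` (the older `.values` attribute returns the same array); the column names below are assumptions chosen to match the `X_np` values printed later.

```python
import pandas as pd

# Hypothetical two-column table; column names are assumptions.
X = pd.DataFrame({"num_legs": [2, 4, 8, 0], "num_wings": [2, 0, 0, 0]})

# Drop the index and column labels, keeping only the numbers.
X_np = X.to_numpy()

print(type(X_np))
```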
1.2 NumPy
NumPy is the fundamental package for scientific computing with Python. ndarray is one of the
most important classes in NumPy. It is a powerful N-dimensional array object. For instance, you
can use it to store a matrix:
In [7]: X_np
The class has many methods and attributes. The simplest is shape, which gives the size of the
ndarray along each of its dimensions:
In [8]: X_np.shape
Out[8]: (4, 2)
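Alongside `shape`, two related attributes are often useful: `ndim` (the number of dimensions) and `size` (the total number of elements). A small sketch, with `X_np` rebuilt here from the values shown elsewhere in this sheet:

```python
import numpy as np

# X_np rebuilt by hand so the example is self-contained.
X_np = np.array([[2, 2], [4, 0], [8, 0], [0, 0]])

rows, cols = X_np.shape  # shape is a tuple, so it can be unpacked
print(X_np.shape)        # size along each dimension
print(X_np.ndim)         # number of dimensions
print(X_np.size)         # total number of elements (rows * cols)
```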
You can create new ndarrays by converting Python objects into them or by using functions
provided in NumPy:
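The cells that produced the first two outputs that follow are not preserved. A sketch consistent with those outputs, assuming the variable names `a` and `c`:

```python
import numpy as np

# Convert a Python list into a 1-D ndarray.
a = np.array([2, 4, 8, 0])
print(a, type(a))

# Create a 5x4 array filled with zeros (floating point by default).
c = np.zeros((5, 4))
print(c, type(c))
```

`np.zeros` takes the desired shape as a tuple, whereas `np.zeros_like` copies both the shape and the dtype of an existing array.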
[2 4 8 0] <class 'numpy.ndarray'>
[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]] <class 'numpy.ndarray'>
In [12]: d = np.zeros_like(X_np)
print(d, type(d))
[[0 0]
[0 0]
[0 0]
[0 0]] <class 'numpy.ndarray'>
You can also modify an ndarray in many ways, for example by changing its shape (the sizes of its
dimensions):
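The reshaping cell is missing from this copy. The two identical outputs that follow are consistent with reshaping a four-element array in two equivalent ways, sketched here under that assumption:

```python
import numpy as np

a = np.array([2, 4, 8, 0])

# Give the new shape explicitly: 2 rows, 2 columns.
b1 = a.reshape(2, 2)
print(b1)

# Use -1 to let NumPy infer that dimension from the number of elements.
b2 = a.reshape(-1, 2)
print(b2)
```

The total number of elements must stay the same; `a.reshape(3, 2)` would raise a `ValueError` here.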
[[2 4]
[8 0]]
[[2 4]
[8 0]]
In [15]: e = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(e.T) # .T means transpose
[[1 5]
[2 6]
[3 7]
[4 8]]
X_np
[[2 2]
[4 0]
[8 0]
[0 0]]
d
[[0 0]
[0 0]
[0 0]
[0 0]]
result
[[2 2 0 0]
[4 0 0 0]
[8 0 0 0]
[0 0 0 0]]
result
[[2 2]
[4 0]
[8 0]
[0 0]
[0 0]
[0 0]
[0 0]
[0 0]]
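The cell that produced the four matrices above is not preserved. The two `result` matrices look like `X_np` and `d` joined side by side and then on top of each other, so the sketch below assumes `np.hstack` and `np.vstack` were used:

```python
import numpy as np

X_np = np.array([[2, 2], [4, 0], [8, 0], [0, 0]])
d = np.zeros_like(X_np)

# Horizontal stacking: same number of rows, columns are appended.
h_result = np.hstack([X_np, d])
print(h_result)

# Vertical stacking: same number of columns, rows are appended.
v_result = np.vstack([X_np, d])
print(v_result)
```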
You can also perform matrix and vector operations over ndarray objects:
In [19]: v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])
print(v1, "+", v2, "=", v1 + v2)
print(v1, "-", v2, "=", v1 - v2)
[1 2 3] + [4 5 6] = [5 7 9]
[1 2 3] - [4 5 6] = [-3 -3 -3]
< [1 2 3] , [4 5 6] .T> = 32
[1 2 3] * [4 5 6] = [ 4 10 18]
[1 2 3] ^2 = [1 4 9]
[1 2 3] * 2 = [2 4 6]
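Only the addition and subtraction lines of the cell survive above. The last four outputs are consistent with the operations below, given as a sketch rather than the original code:

```python
import numpy as np

v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])

dot = np.dot(v1, v2)    # dot (inner) product: 1*4 + 2*5 + 3*6
elementwise = v1 * v2   # element-by-element product
squared = v1 ** 2       # element-by-element power
doubled = v1 * 2        # multiplication by a scalar

print("<", v1, ",", v2, ".T> =", dot)
print(v1, "*", v2, "=", elementwise)
print(v1, "^2 =", squared)
print(v1, "* 2 =", doubled)
```

Note that `*` between two ndarrays is element-wise; for the dot product or matrix multiplication use `np.dot` or the `@` operator.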
Here are cheat sheets for each library. Feel free to use them while completing your final project:
https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf