0% found this document useful (0 votes)
10 views

Tutorial 2

The document is a tutorial on using the NumPy and Pandas Python library modules. It provides examples of creating and manipulating NumPy ndarrays, including one-dimensional and two-dimensional arrays. It also demonstrates various NumPy functions for arithmetic operations, statistical analysis, linear algebra and more. Additionally, it shows how to create Pandas Series objects from lists, NumPy arrays and dictionaries and access/slice Series elements.

Uploaded by

POEASO
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
10 views

Tutorial 2

The document is a tutorial on using the NumPy and Pandas Python library modules. It provides examples of creating and manipulating NumPy ndarrays, including one-dimensional and two-dimensional arrays. It also demonstrates various NumPy functions for arithmetic operations, statistical analysis, linear algebra and more. Additionally, it shows how to create Pandas Series objects from lists, NumPy arrays and dictionaries and access/slice Series elements.

Uploaded by

POEASO
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 9

tutorial2

November 12, 2020

1 Module 2: Introduction to Numpy and Pandas


The following tutorial contains examples of using the numpy and pandas library modules. Read
the step-by-step instructions below carefully. To execute the code, click on the cell and press the
SHIFT-ENTER keys simultaneously.

1.1 2.1 Introduction to Numpy


Numpy, which stands for numerical Python, is a Python library package to support numerical
computations. The basic data structure in numpy is a multi-dimensional array object called ndarray.
Numpy provides a suite of functions that can efficiently manipulate elements of the ndarray.

1.1.1 2.1.1 Creating ndarray


An ndarray can be created from a list or tuple object.

[ ]: import numpy as np

oneDim = np.array([1.0,2,3,4,5]) # a 1-dimensional array (vector)


print(oneDim)
print("#Dimensions =", oneDim.ndim)
print("Dimension =", oneDim.shape)
print("Size =", oneDim.size)
print("Array type =", oneDim.dtype)

twoDim = np.array([[1,2],[3,4],[5,6],[7,8]]) # a two-dimensional array (matrix)


print(twoDim)
print("#Dimensions =", twoDim.ndim)
print("Dimension =", twoDim.shape)
print("Size =", twoDim.size)
print("Array type =", twoDim.dtype)

arrFromTuple = np.array([(1,'a',3.0),(2,'b',3.5)]) # create ndarray from tuple


print(arrFromTuple)
print("#Dimensions =", arrFromTuple.ndim)
print("Dimension =", arrFromTuple.shape)
print("Size =", arrFromTuple.size)

There are several built-in functions in numpy that can be used to create ndarrays

1
[ ]: print(np.random.rand(5)) # random numbers from a uniform distribution␣
,→between [0,1]

print(np.random.randn(5)) # random numbers from a normal distribution


print(np.arange(-10,10,2)) # similar to range, but returns ndarray instead␣
,→of list

print(np.arange(12).reshape(3,4)) # reshape to a matrix


print(np.linspace(0,1,10)) # split interval [0,1] into 10 equally separated␣
,→values

print(np.logspace(-3,3,7)) # create ndarray with values from 10^-3 to 10^3

[ ]: print(np.zeros((2,3))) # a matrix of zeros


print(np.ones((3,2))) # a matrix of ones
print(np.eye(3)) # a 3 x 3 identity matrix

1.2 2.1.2 Element-wise Operations


You can apply standard operators such as addition and multiplication on each element of the
ndarray.

[ ]: x = np.array([1,2,3,4,5])

print(x + 1) # addition
print(x - 1) # subtraction
print(x * 2) # multiplication
print(x // 2) # integer division
print(x ** 2) # square
print(x % 2) # modulo
print(1 / x) # division

[ ]: x = np.array([2,4,6,8,10])
y = np.array([1,2,3,4,5])

print(x + y)
print(x - y)
print(x * y)
print(x / y)
print(x // y)
print(x ** y)

1.3 2.1.3 Indexing and Slicing


There are various ways to select certain elements with an ndarray.

[ ]: x = np.arange(-5,5)
print(x)

y = x[3:5] # y is a slice, i.e., pointer to a subarray in x


print(y)

2
y[:] = 1000 # modifying the value of y will change x
print(y)
print(x)

z = x[3:5].copy() # makes a copy of the subarray


print(z)
z[:] = 500 # modifying the value of z will not affect x
print(z)
print(x)

[ ]: my2dlist = [[1,2,3,4],[5,6,7,8],[9,10,11,12]] # a 2-dim list


print(my2dlist)
print(my2dlist[2]) # access the third sublist
print(my2dlist[:][2]) # can't access third element of each sublist
# print(my2dlist[:,2]) # this will cause syntax error

my2darr = np.array(my2dlist)
print(my2darr)
print(my2darr[2][:]) # access the third row
print(my2darr[2,:]) # access the third row
print(my2darr[:][2]) # access the third row (similar to 2d list)
print(my2darr[:,2]) # access the third column
print(my2darr[:2,2:]) # access the first two rows & last two columns

ndarray also supports boolean indexing.

[ ]: my2darr = np.arange(1,13,1).reshape(3,4)
print(my2darr)

divBy3 = my2darr[my2darr % 3 == 0]
print(divBy3, type(divBy3))

divBy3LastRow = my2darr[2:, my2darr[2,:] % 3 == 0]


print(divBy3LastRow)

More indexing examples.

[ ]: my2darr = np.arange(1,13,1).reshape(4,3)
print(my2darr)

indices = [2,1,0,3] # selected row indices


print(my2darr[indices,:])

rowIndex = [0,0,1,2,3] # row index into my2darr


columnIndex = [0,2,0,1,2] # column index into my2darr
print(my2darr[rowIndex,columnIndex])

3
1.4 2.1.4 Numpy Arithmetic and Statistical Functions
There are many built-in mathematical functions available for manipulating elements of nd-array.

[ ]: y = np.array([-1.4, 0.4, -3.2, 2.5, 3.4]) # generate a random vector


print(y)

print(np.abs(y)) # convert to absolute values


print(np.sqrt(abs(y))) # apply square root to each element
print(np.sign(y)) # get the sign of each element
print(np.exp(y)) # apply exponentiation
print(np.sort(y)) # sort array

[ ]: x = np.arange(-2,3)
y = np.random.randn(5)
print(x)
print(y)

print(np.add(x,y)) # element-wise addition x + y


print(np.subtract(x,y)) # element-wise subtraction x - y
print(np.multiply(x,y)) # element-wise multiplication x * y
print(np.divide(x,y)) # element-wise division x / y
print(np.maximum(x,y)) # element-wise maximum max(x,y)

[ ]: y = np.array([-3.2, -1.4, 0.4, 2.5, 3.4]) # generate a random vector


print(y)

print("Min =", np.min(y)) # min


print("Max =", np.max(y)) # max
print("Average =", np.mean(y)) # mean/average
print("Std deviation =", np.std(y)) # standard deviation
print("Sum =", np.sum(y)) # sum

1.5 2.1.5 Numpy linear algebra


Numpy provides many functions to support linear algebra operations.

[ ]: X = np.random.randn(2,3) # create a 2 x 3 random matrix


print(X)
print(X.T) # matrix transpose operation X^T

y = np.random.randn(3) # random vector


print(y)
print(X.dot(y)) # matrix-vector multiplication X * y
print(X.dot(X.T)) # matrix-matrix multiplication X * X^T
print(X.T.dot(X)) # matrix-matrix multiplication X^T * X

4
[ ]: X = np.random.randn(5,3)
print(X)

C = X.T.dot(X) # C = X^T * X is a square matrix

invC = np.linalg.inv(C) # inverse of a square matrix


print(invC)
detC = np.linalg.det(C) # determinant of a square matrix
print(detC)
S, U = np.linalg.eig(C) # eigenvalue S and eigenvector U of a square matrix
print(S)
print(U)

1.6 2.2 Introduction to Pandas


Pandas provide two convenient data structures for storing and manipulating data–Series and
DataFrame. A Series is similar to a one-dimensional array whereas a DataFrame is more simi-
lar to representing a matrix or a spreadsheet table.

1.6.1 2.2.1 Series


A Series object consists of a one-dimensional array of values, whose elements can be referenced
using an index array. A Series object can be created from a list, a numpy array, or a Python
dictionary. You can apply most of the numpy functions on the Series object.

[ ]: from pandas import Series

s = Series([3.1, 2.4, -1.7, 0.2, -2.9, 4.5]) # creating a series from a list
print(s)
print('Values=', s.values) # display values of the Series
print('Index=', s.index) # display indices of the Series

[ ]: import numpy as np

s2 = Series(np.random.randn(6)) # creating a series from a numpy ndarray


print(s2)
print('Values=', s2.values) # display values of the Series
print('Index=', s2.index) # display indices of the Series

[ ]: s3 = Series([1.2,2.5,-2.2,3.1,-0.8,-3.2],
index = ['Jan 1','Jan 2','Jan 3','Jan 4','Jan 5','Jan 6',])
print(s3)
print('Values=', s3.values) # display values of the Series
print('Index=', s3.index) # display indices of the Series

[ ]: capitals = {'MI': 'Lansing', 'CA': 'Sacramento', 'TX': 'Austin', 'MN': 'St␣


,→Paul'}

5
s4 = Series(capitals) # creating a series from dictionary object
print(s4)
print('Values=', s4.values) # display values of the Series
print('Index=', s4.index) # display indices of the Series

[ ]: s3 = Series([1.2,2.5,-2.2,3.1,-0.8,-3.2],
index = ['Jan 1','Jan 2','Jan 3','Jan 4','Jan 5','Jan 6',])
print(s3)

# Accessing elements of a Series

print('\ns3[2]=', s3[2]) # display third element of the Series


print('s3[\'Jan 3\']=', s3['Jan 3']) # indexing element of a Series

print('\ns3[1:3]=') # display a slice of the Series


print(s3[1:3])
print('s3.iloc([1:3])=') # display a slice of the Series
print(s3.iloc[1:3])

[ ]: print('shape =', s3.shape) # get the dimension of the Series


print('size =', s3.size) # get the # of elements of the Series

[ ]: print(s3[s3 > 0]) # applying filter to select elements of the Series

[ ]: print(s3 + 4) # applying scalar operation on a numeric Series


print(s3 / 4)

[ ]: print(np.log(s3 + 4)) # applying numpy math functions to a numeric Series

1.6.2 2.2.2 DataFrame


A DataFrame object is a tabular, spreadsheet-like data structure containing a collection of columns,
each of which can be of different types (numeric, string, boolean, etc). Unlike Series, a DataFrame
has distinct row and column indices. There are many ways to create a DataFrame object (e.g.,
from a dictionary, list of tuples, or even numpy’s ndarrays).

[ ]: from pandas import DataFrame

cars = {'make': ['Ford', 'Honda', 'Toyota', 'Tesla'],


'model': ['Taurus', 'Accord', 'Camry', 'Model S'],
'MSRP': [27595, 23570, 23495, 68000]}
carData = DataFrame(cars) # creating DataFrame from dictionary
carData # display the table

[ ]: print(carData.index) # print the row indices


print(carData.columns) # print the column indices

6
[ ]: carData2 = DataFrame(cars, index = [1,2,3,4]) # change the row index
carData2['year'] = 2018 # add column with same value
carData2['dealership'] = ['Courtesy Ford','Capital Honda','Spartan Toyota','N/
,→A']

carData2 # display table

Creating DataFrame from a list of tuples.

[ ]: tuplelist = [(2011,45.1,32.4),(2012,42.4,34.5),(2013,47.2,39.2),
(2014,44.2,31.4),(2015,39.9,29.8),(2016,41.5,36.7)]
columnNames = ['year','temp','precip']
weatherData = DataFrame(tuplelist, columns=columnNames)
weatherData

Creating DataFrame from numpy ndarray

[ ]: import numpy as np

npdata = np.random.randn(5,3) # create a 5 by 3 random matrix


columnNames = ['x1','x2','x3']
data = DataFrame(npdata, columns=columnNames)
data

The elements of a DataFrame can be accessed in many ways.

[ ]: # accessing an entire column will return a Series object

print(data['x2'])
print(type(data['x2']))

[ ]: # accessing an entire row will return a Series object

print('Row 3 of data table:')


print(data.iloc[2]) # returns the 3rd row of DataFrame
print(type(data.iloc[2]))
print('\nRow 3 of car data table:')
print(carData2.iloc[2]) # row contains objects of different types

[ ]: # accessing a specific element of the DataFrame

print(carData2.iloc[1,2]) # retrieving second row, third column


print(carData2.loc[1,'model']) # retrieving second row, column named 'model'

# accessing a slice of the DataFrame

print('carData2.iloc[1:3,1:3]=')
print(carData2.iloc[1:3,1:3])

7
[ ]: print('carData2.shape =', carData2.shape)
print('carData2.size =', carData2.size)

[ ]: # selection and filtering

print('carData2[carData2.MSRP > 25000]')


print(carData2[carData2.MSRP > 25000])

1.6.3 2.2.3 Arithmetic Operations

[ ]: print(data)

print('Data transpose operation:')


print(data.T) # transpose operation

print('Addition:')
print(data + 4) # addition operation

print('Multiplication:')
print(data * 10) # multiplication operation

[ ]: print('data =')
print(data)

columnNames = ['x1','x2','x3']
data2 = DataFrame(np.random.randn(5,3), columns=columnNames)
print('\ndata2 =')
print(data2)

print('\ndata + data2 = ')


print(data.add(data2))

print('\ndata * data2 = ')


print(data.mul(data2))

[ ]: print(data.abs()) # get the absolute value for each element

print('\nMaximum value per column:')


print(data.max()) # get maximum value for each column

print('\nMinimum value per row:')


print(data.min(axis=1)) # get minimum value for each row

print('\nSum of values per column:')


print(data.sum()) # get sum of values for each column

print('\nAverage value per row:')

8
print(data.mean(axis=1)) # get average value for each row

print('\nCalculate max - min per column')


f = lambda x: x.max() - x.min()
print(data.apply(f))

print('\nCalculate max - min per row')


f = lambda x: x.max() - x.min()
print(data.apply(f, axis=1))

1.6.4 2.2.4 Plotting Series and DataFrame


There are built-in functions you can use to plot the data stored in a Series or a DataFrame.

[ ]: %matplotlib inline

s3 = Series([1.2,2.5,-2.2,3.1,-0.8,-3.2,1.4],
index = ['Jan 1','Jan 2','Jan 3','Jan 4','Jan 5','Jan 6','Jan 7'])
s3.plot(kind='line', title='Line plot')

[ ]: s3.plot(kind='bar', title='Bar plot')

[ ]: s3.plot(kind='hist', title = 'Histogram')

[ ]: tuplelist = [(2011,45.1,32.4),(2012,42.4,34.5),(2013,47.2,39.2),
(2014,44.2,31.4),(2015,39.9,29.8),(2016,41.5,36.7)]
columnNames = ['year','temp','precip']
weatherData = DataFrame(tuplelist, columns=columnNames)
weatherData[['temp','precip']].plot(kind='box', title='Box plot')

You might also like