Python Material Part1
1.1 Introduction
The word Python sounds scary, doesn’t it? Doesn’t it bring to mind the image of a big snake
from the Amazon forest? Well, it is time to change this image. From now on, you will remember
Python as a free-to-use, open-source programming language that is great for performing Data
Science tasks. Python, the programming language, was named after the famous BBC comedy show
Monty Python’s Flying Circus. It is an easy-to-learn yet powerful object-oriented
programming language that is now widely used for analyzing data.
1.2 First Program in Python
Let us now try our first Program in Python. A Statement is an instruction that the
computer will execute. Perhaps the simplest Python Program that you can write is the one
that contains a single print statement, as shown below:
print("Hello, Python!")
When the computer executes the above print statement, it simply displays the value written
within the parentheses (i.e. “Hello, Python!”). The value written in the parentheses is
called the argument. If you are using a Jupyter Notebook, you will see a small rectangle
containing the above print statement. This is called a cell. If you select this cell with your
mouse and then click the Run button (or press Shift+Enter), the computer will execute the
print statement and display the output (i.e. Hello, Python!) beneath the cell, as shown below:
print("Hello, Python!")
Hello, Python!
Using the type( ) function, we can find the data type of a value. For example, type(29)
will return int, while type(3.14) will return float. If a string contains an integer value, we
can convert it to int. This is known as type casting. For example, int("38") will return 38,
while int(True) will return 1. On the other hand, bool(1) will return True. Strings, Lists and
Tuples are known as sequence data types, because their elements are stored in order and are
accessed by position. A Dictionary is not a sequence; it is a mapping type, whose values are
accessed by key.
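The type checks and conversions described above can be tried out directly. This is a small sketch using only built-in functions; the float( ) conversion and the int(float(...)) line are additions beyond the text, included to show how a decimal string can be turned into an int.

```python
# type() reports the data type of a value
print(type(29))      # <class 'int'>
print(type(3.14))    # <class 'float'>
print(type("38"))    # <class 'str'>

# Type casting with the built-in int(), float() and bool() functions
print(int("38"))     # 38
print(int(True))     # 1
print(bool(1))       # True
print(float("2.5"))  # 2.5

# int() cannot parse a decimal string such as "3.9" directly,
# so convert it with float() first; int() then truncates toward zero
print(int(float("3.9")))  # 3
```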
1.4 Expressions
A simple example of an expression is 33 + 60 - 10. Here, we call the numbers 33, 60 and
10 operands, and the mathematical symbols + and - operators. The value of this expression is
83. We can perform multiplication by using the asterisk (*) symbol, and division by using the
forward slash (/) symbol, as shown below:
5 * 5
25 / 5
The values of the above expressions will be 25 and 5.0, respectively. Note that / always
returns a float, even when the division is exact. We can use the double slash (//) for integer
(floor) division, in which case the result is rounded down to the nearest integer; for example,
7 // 2 gives 3.
Python follows the usual mathematical conventions (operator precedence) when evaluating an
expression. The arithmetic operators in the following two expressions appear in a different
order, but in both cases Python performs the multiplication first (2 * 60 = 120) and then the
addition to obtain the final result:
2 * 60 + 30
30 + 2 * 60
Both expressions evaluate to 150.
The expression in the parentheses is evaluated first in the following example (30 + 2 = 32).
We then multiply the result by 60. The final result is 1920.
(30 + 2) * 60
1920
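The expressions discussed in this section can be checked in one place. The sketch below uses only the operators from the text, plus the remainder operator %, which is added here for completeness; the -7 // 2 line shows that floor division rounds down rather than toward zero.

```python
# Arithmetic operators and operator precedence
print(33 + 60 - 10)   # 83
print(5 * 5)          # 25
print(25 / 5)         # 5.0  (/ always returns a float)
print(7 // 2)         # 3    (floor division)
print(-7 // 2)        # -4   (floor division rounds down, not toward zero)
print(7 % 2)          # 1    (remainder after division)
print(2 * 60 + 30)    # 150  (multiplication before addition)
print(30 + 2 * 60)    # 150
print((30 + 2) * 60)  # 1920 (parentheses first)
```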
1.5 Variable
We can use variables to store values. In the following example, we assign the value
10000 to the variable principal by using the assignment operator (i.e. the equals sign). We
can then use the value somewhere else in the code by typing the exact name of the variable.
principal = 10000
years = 3
interest_rate = 10
simple_interest = (principal * years * interest_rate) / 100
print(simple_interest)
3000.0
Recall that we can find the data type of a value by using the type( ) function. For
example, type(29) will return int. We can apply the type( ) function on a variable also. For
example, type(principal) will return int.
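Applying type( ) to the variables of the simple-interest program explains why the printed result is 3000.0 rather than 3000: the / operator produces a float. This sketch reuses the variable names from the program above; the floor-division line is an illustrative alternative, not part of the original program.

```python
principal = 10000
years = 3
interest_rate = 10
simple_interest = (principal * years * interest_rate) / 100

print(type(principal))        # <class 'int'>
print(type(simple_interest))  # <class 'float'>, because / returns a float
print(simple_interest)        # 3000.0

# With int operands, floor division (//) returns an int
print((principal * years * interest_rate) // 100)  # 3000
```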
1.6 Method of providing the input through the keyboard
In the above Program, the input values are written as part of the Program. Often, however,
the input values are received from the user through the keyboard. The input( ) function is
used for this purpose.
name = input("What is your name? ")
When the above statement is executed, the prompt is displayed, followed by a cursor (shown
here as a vertical line):
What is your name? |
We type the name at the cursor, as shown below:
What is your name? Ganesh
Now, the string “Ganesh” is assigned as the value of the variable name.
We have just seen how to read in a string. Even if we type a number, the input( ) function
returns the entered value as a string. So, what should we do if we have to read in an integer
or a fractional value? We have to convert the string by using the int( ) and float( ) functions,
as shown below:
age = int(input("What is your age? "))
percentage = float(input("Enter your percentage of marks: "))
The following Program will read in the marks obtained by a student in three different
subjects and then calculate the average mark.
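The program itself does not survive in this copy of the material. A minimal sketch of such a program follows, assuming the marks are typed as numbers; the function names read_marks and average are hypothetical, and the read parameter (defaulting to the built-in input( )) exists only so the function can also be called without a keyboard.

```python
def read_marks(count=3, read=input):
    """Read `count` marks and return them as floats.

    `read` defaults to the built-in input(); passing another function
    makes it possible to supply the answers without a keyboard.
    """
    marks = []
    for subject in range(1, count + 1):
        # input() always returns a string, so convert it with float()
        marks.append(float(read(f"Enter the marks in subject {subject}: ")))
    return marks


def average(marks):
    """Return the arithmetic mean of a list of marks."""
    return sum(marks) / len(marks)


# To run interactively, uncomment the next two lines:
# marks = read_marks()
# print("Average mark:", average(marks))
```

For example, entering 70, 80 and 90 gives an average mark of 80.0.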
The while statement is used to repeatedly execute a set of statements as long as a condition
remains true. Here is an example:
n = 1
while n <= 3:
    print(n * n)
    n = n + 1
Here, n will take the values 1, 2 and 3. So, the output will be as follows:
1
4
9
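To make the flow of the while loop concrete, the sketch below collects the squares in a list instead of printing them, and compares the result with an equivalent for loop over range(1, 4). The variable names squares_while and squares_for are illustrative only.

```python
# Collect the squares with a while loop, as in the example above
squares_while = []
n = 1
while n <= 3:                  # the condition is checked before every iteration
    squares_while.append(n * n)
    n = n + 1                  # without this update, the loop would never end

# The same result with a for loop; range(1, 4) produces 1, 2, 3
squares_for = []
for n in range(1, 4):
    squares_for.append(n * n)

print(squares_while)  # [1, 4, 9]
print(squares_for)    # [1, 4, 9]
```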
Note:
Many programming languages use curly braces to delimit blocks of code. But Python
uses indentation, as shown below:
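for i in [1, 2, 3, 4, 5]:
    print(i)                  # first line in the "for i" block
    for j in [1, 2, 3, 4, 5]:
        print(j)              # first line in the "for j" block
        print(i + j)          # last line in the "for j" block
    print(i)                  # last line in the "for i" block
print("looping is over")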
mult (20, 10) # Invoking the mult( ) function. Output will be 200.
The syntax of a user-defined function is as follows:
def user_defined_function_name(list_of_arguments):
    statements    # the indented body of the function
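The definition of mult( ) invoked above is not included in this copy. A minimal sketch that follows the syntax just shown is given below, assuming that mult( ) simply returns the product of its two arguments (which matches the stated output of 200).

```python
def mult(x, y):
    """Hypothetical definition: return the product of x and y."""
    return x * y


print(mult(20, 10))  # 200
```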
1.13 Modules
A set of functions and constants that were written for a specific purpose and grouped
together constitutes a Module. For example, the module named re contains the functions and
constants required for working with regular expressions. If we wish to make use of the
functions defined in the re module, such as the compile( ) function, we have to import the re
module, as shown below:
import re
my_regular_expression = re.compile("[0-9]+", re.I)
Here, we prefix the compile( ) function with the name of the module (i.e. re). If we already
use the name re in our Program for some other purpose, we can use an alias, as shown below.
Note that the constants of the module must then also be accessed through the alias (regex.I
rather than re.I):
import re as regex
my_regular_expression = regex.compile("[0-9]+", regex.I)
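Once compiled, the pattern object can be used through its own methods. This short sketch uses the standard findall( ) method of a compiled pattern on an illustrative sample string; the aliased import works exactly as in the example above.

```python
import re as regex

# Compile a pattern that matches one or more consecutive digits
my_regular_expression = regex.compile("[0-9]+", regex.I)

# findall() returns every non-overlapping match as a list of strings
print(my_regular_expression.findall("Room 101 is on floor 3"))  # ['101', '3']
```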
In Python, we use the functions in the module named matplotlib.pyplot for drawing a
variety of figures, such as the Bar Chart and the Pie Chart.
import matplotlib.pyplot
matplotlib.pyplot.plot( ... )
Here, the name of the module is a lengthy one. Instead of writing this lengthy name again
and again, we can use the alias option, as shown below:
import matplotlib.pyplot as plt
plt.plot( ... )
If we need only a few specific functions and constants defined in a Module, we can
import them explicitly and use them without qualification, as shown below:
from collections import defaultdict, Counter
lookup = defaultdict(int)
my_counter = Counter()
Note that we are not invoking the defaultdict( ) function in the format
collections.defaultdict( ).
A module is a single Python file, saved with the .py extension. Assume that a module named
sum has the following functions, and that the following code is saved as sum.py.
def add(x, y):
    return x + y
def mul(a, b):
    return a * b
def sub(x, y):
    return x - y
def div(a, b):
    return a / b
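The above module sum is imported in the following program, and its functions are then used.
(Because import sum binds the name sum to this module, the built-in sum( ) function cannot be
used by that name in this program; a module name such as my_math would avoid the clash.)
import sum
n1 = int(input("Enter the first number: "))
n2 = int(input("Enter the second number: "))
print("The result of add( ):", sum.add(n1, n2))
print("The result of mul( ):", sum.mul(n1, n2))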
Output
Enter the first number: 30
Enter the second number: 10
The result of add( ): 40
The result of mul( ): 300
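The steps of saving a module and importing it can be reproduced in a single script. This sketch writes the same four functions to a temporary directory under the hypothetical name my_math.py (avoiding the clash with the built-in sum), adds that directory to sys.path, which is where import looks for modules, and then imports it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source of a hypothetical module, my_math.py, with the same functions as sum.py
MODULE_SOURCE = """\
def add(x, y):
    return x + y


def mul(a, b):
    return a * b


def sub(x, y):
    return x - y


def div(a, b):
    return a / b
"""

with tempfile.TemporaryDirectory() as folder:
    # Save the module as a .py file inside a temporary directory
    Path(folder, "my_math.py").write_text(MODULE_SOURCE)

    # import searches the directories listed in sys.path
    sys.path.insert(0, folder)
    importlib.invalidate_caches()
    import my_math
    sys.path.remove(folder)

# The module stays loaded after the temporary directory is deleted
print("The result of add():", my_math.add(30, 10))  # The result of add(): 40
print("The result of mul():", my_math.mul(30, 10))  # The result of mul(): 300
```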
A package contains a group of module files. All the components of a package are placed in a
single directory. To distinguish a package directory from an ordinary directory, a regular
package directory contains a file named __init__.py (which may be empty). The Package
Installer for Python (pip) is used for installing third-party packages, for example
pip install matplotlib.
Python allows the handling of the exceptions raised in the invoked functions.
1.14 Whitespace Formatting
Inside parentheses and brackets, line breaks and extra whitespace are ignored, so an expression
can continue onto the next line. This can be helpful for long-winded computations:
long_winded_computation = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 +
                           16 + 17 + 18 + 19 + 20)
Whitespace can also be used to make code easier to read:
list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
easier_to_read_list_of_lists = [[1, 2, 3],
                                [4, 5, 6],
                                [7, 8, 9]]
We can also use a backslash to indicate that a statement continues onto the next line:
two_plus_three = 2 + \
                 3
1.15 Object-oriented Programming in Python
In object-oriented programming, the focus is on the data. The data and the code that works
with the data are grouped into a single entity, called a class. A class contains data members
and methods. Some of the principles of object-oriented programming are listed below:
a) Abstraction: Abstraction allows the Programmer to use an object without knowing the
internal details of the object.
b) Inheritance: Inheritance allows one class to derive the functionality and characteristics
of another class.
c) Encapsulation: Encapsulation binds the data members and methods together into a single
entity, called a class.
d) Polymorphism: Polymorphism allows the same entity (for example, a method name) to be used
in different forms. Python supports run-time polymorphism, such as method overriding and duck
typing. It does not support compile-time method overloading in the way that C++ and Java do:
if a class defines two methods with the same name, the later definition replaces the earlier one.
1.16 Classes and Objects in Python
A class is a collection of data and the methods that interact with those data. An instance
of a class is known as an object. In Python, every value is an object, including integers,
floats, strings, booleans, lists, sets and dictionaries. We will now develop a class to
represent a circle.
All we need to describe a circle is its radius. Let us also store a colour, to make it easier
later to distinguish between the different instances of the class Circle. Here, the data
attributes of the class Circle are radius and color. The class Circle shall now be defined
with a constructor (the __init__( ) method) and a utility method for calculating the area, as
shown below:
class Circle(object):
    def __init__(self, radius, color):
        self.radius = radius
        self.color = color
    def calculate_area(self):
        area = 3.14 * self.radius * self.radius
        return area
We shall now create an object named playground whose radius is 400 meters and the
preferred color is green. We shall also calculate its area.
playground = Circle(400, 'green')
area1 = playground.calculate_area()
print(area1)
502400.0
We will now develop a class to represent a rectangle.
All we need to describe a rectangle is its length and breadth. Here, the data attributes of
the class Rectangle are length and breadth. The class Rectangle shall now be defined with a
constructor and a utility method, as shown below:
class Rectangle(object):
    def __init__(self, length, breadth):
        self.length = length
        self.breadth = breadth
    def calculate_perimeter(self):
        perimeter = 2 * (self.length + self.breadth)
        return perimeter
We shall now create an object named classroom whose length is 40 feet and whose breadth is
20 feet. We will then calculate its perimeter.
classroom = Rectangle(40, 20)
perimeter = classroom.calculate_perimeter()
perimeter
120
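We will now define a base class named person, with the data attributes name and age and the
methods getName( ) and getAge( ), and a derived class named student, which has two additional
data attributes, namely rollNumber and marks, and two additional methods, namely
getRollNumber( ) and getMarks( ).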
class person(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age
    def getName(self):
        return self.name
    def getAge(self):
        return self.age
class student(person):
    def __init__(self, name, age, rollNumber, marks):
        super().__init__(name, age)    # let person set name and age
        self.rollNumber = rollNumber
        self.marks = marks
    def getRollNumber(self):
        return self.rollNumber
    def getMarks(self):
        return self.marks
The above classes shall be instantiated and used as shown below:
person1 = person("Ganesh", 20)
person1.getName( )
Ganesh
person1.getAge( )
20
student1 = student("Ganesh", 20, 101, 83)
student1.getName( )
Ganesh
student1.getAge( )
20
student1.getRollNumber( )
101
student1.getMarks( )
83
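The inheritance and run-time polymorphism principles listed in Section 1.15 can be seen together in the following sketch. The class names Shape, Square and Rect are hypothetical: each subclass overrides area( ), and the inherited describe( ) method calls whichever area( ) the object actually provides.

```python
class Shape:
    """Base class: subclasses are expected to override area()."""

    def __init__(self, name):
        self.name = name

    def area(self):
        raise NotImplementedError("subclasses must implement area()")

    def describe(self):
        # Inherited by every subclass; calls the area() of the actual object
        return f"{self.name} with area {self.area()}"


class Square(Shape):
    def __init__(self, side):
        super().__init__("square")
        self.side = side

    def area(self):  # overrides Shape.area()
        return self.side * self.side


class Rect(Shape):
    def __init__(self, length, breadth):
        super().__init__("rectangle")
        self.length = length
        self.breadth = breadth

    def area(self):  # overrides Shape.area()
        return self.length * self.breadth


# Run-time polymorphism: the same call, shape.describe(), runs different area() code
for shape in [Square(5), Rect(40, 20)]:
    print(shape.describe())
# square with area 25
# rectangle with area 800
```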
Chapter-2: Data Structures in Python
2.1 Introduction
Lists, Tuples, Dictionaries and Data Frames are important data structures used in Python for
analyzing data. The first three are built into the language, while Data Frames are provided by
the pandas library. We will briefly discuss them one by one.
2.2 Lists
A list is an ordered sequence of comma-separated values of any data type. The values
in a list are written between square brackets. A value in a list can be accessed by specifying
its position in the list. The position is known as index. Here is an example.
list1 = [1, 2, 3, 4, 5]
list1 [0]
1
The lists are mutable. This means that the elements of a list can be changed at a later stage, if
necessary.
list2 = [10, 12, 14, 18, 30, 47]
list2[0] = 20
list2
[20, 12, 14, 18, 30, 47]
Each element of a List can be accessed via an index, as we have seen above. The elements in
a List are indexed from 0. Backward (negative) indexing is also valid: index -1 refers to the
last element, -2 to the second-last element, and so on.
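The relationship between forward and backward indices can be shown with list2 from the example above. The loop prints, for each position, the forward index, the matching backward index and the element, in the spirit of an index table.

```python
list2 = [20, 12, 14, 18, 30, 47]

print(list2[0])    # 20, the first element
print(list2[-1])   # 47, the last element
print(list2[-2])   # 30, the second-last element

# For every position, index i and index i - len(list2) refer to the same element
for i in range(len(list2)):
    print(i, i - len(list2), list2[i])
# 0 -6 20
# 1 -5 12
# 2 -4 14
# 3 -3 18
# 4 -2 30
# 5 -1 47
```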
2.10 Sets
A Set is a collection of distinct elements. As in the case of Lists and Tuples, the
elements of a Set can be of different data types (i.e. values of different data types can be
placed within a Set), although each element must be of an immutable (hashable) type, so a List
cannot be an element of a Set. Unlike Lists and Tuples, Sets are unordered. This means that a
Set does not record the position of its elements, so they cannot be accessed by index. Because
the elements are unique, a particular value can occur only once in a Set. To define a Set, we
place its elements within curly brackets, as shown below. (Empty curly brackets, {}, create an
empty Dictionary; an empty Set is created with set( ).)
set1 = {"rock", "R&B", "disco", "hard rock", "pop", "soul"}
You can convert a List to a Set by using the set( ) function. This is called type casting. You
simply pass the List as the input to the set( ) function, and the result is a Set containing
the elements of the List.
Let us go over an example. We start off with a List and pass it to the function set( ), which
returns a Set. Notice that there are no duplicate elements in the result, and that the order
in which a Set's elements are displayed may differ from their order in the List.
album_list = ["Michael Jackson", "Thriller", "Thriller", 1982]
album_set = set(album_list)
album_set
{'Michael Jackson', 'Thriller', 1982}
Let us go over the Set operations. These can be used to change the Set. Consider the
Set, A, given below:
A = {"Thriller", "Back in Black", "AC/DC"}
We can add an item to a Set by using the add( ) method.
A.add("NSYNC")
A
{'AC/DC', 'Back in Black', 'NSYNC', 'Thriller'}
We can remove an item from a Set by using the remove( ) method.
A.remove("NSYNC")
A
{'AC/DC', 'Back in Black', 'Thriller'}
We can verify whether an element is in a Set by using the in operator, as follows:
A = {"AC/DC", "Back in Black", "Thriller"}
"AC/DC" in A
True
These operations change a Set. Python also supports the mathematical Set operations; for
example, we can find the intersection and the union of two Sets.
album_set_1 = {"AC/DC", "Back in Black", "Thriller"}
album_set_2 = {"AC/DC", "Back in Black", "The Dark Side of the Moon"}
album_set_3 = album_set_1 & album_set_2        # intersection
album_set_4 = album_set_1.union(album_set_2)    # union
album_set_3
{'AC/DC', 'Back in Black'}
album_set_4
{'AC/DC', 'Back in Black', 'Thriller', 'The Dark Side of the Moon'}
Here, all the elements of album_set_3 are in album_set_1. We can check whether a Set is a
subset of another Set by using the issubset( ) method. Here is an example:
album_set_3.issubset(album_set_1)
True
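The Set operations above have operator forms, and there are a few related operations worth knowing. This sketch reuses the album sets from the text; difference, symmetric difference and discard( ) go beyond the examples given, and sorted( ) is used only so the printed order is predictable.

```python
album_set_1 = {"AC/DC", "Back in Black", "Thriller"}
album_set_2 = {"AC/DC", "Back in Black", "The Dark Side of the Moon"}

# Operator forms of the set operations
intersection = album_set_1 & album_set_2   # same as album_set_1.intersection(album_set_2)
union = album_set_1 | album_set_2          # same as album_set_1.union(album_set_2)
difference = album_set_1 - album_set_2     # in album_set_1 but not in album_set_2
symmetric = album_set_1 ^ album_set_2      # in exactly one of the two sets

print(sorted(intersection))  # ['AC/DC', 'Back in Black']
print(sorted(difference))    # ['Thriller']
print(sorted(symmetric))     # ['The Dark Side of the Moon', 'Thriller']
print(intersection.issubset(album_set_1))  # True

# discard() is like remove(), but does nothing if the element is missing
album_set_1.discard("NSYNC")

# Duplicates disappear when a List is cast to a Set
print(len(set(["Michael Jackson", "Thriller", "Thriller", 1982])))  # 3

# {} is an empty Dictionary; set() is an empty Set
print(type({}), type(set()))  # <class 'dict'> <class 'set'>
```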
2.11 Dictionaries
A Dictionary is a collection of key:value pairs. The elements of a Dictionary are
written between curly brackets. In the case of Lists and Tuples, an index is associated with
each element, and the index is used to access the elements. In the case of a Dictionary, a key
is associated with each element, and the key is used to access the elements. The keys must be
unique and must be of an immutable type, such as a string, a number or a Tuple.
Dictionaries are mutable. So, we can change some of the elements of a Dictionary and
store the changed elements in the same Dictionary object. No positional index is associated
with the elements of a Dictionary; however, since Python 3.7, a Dictionary remembers the order
in which its keys were inserted. A Dictionary can be created by using a statement of the
following form:
<Dictionary-name> = {<key> : <value>, <key> : <value>, ...}
Example
teachers = {"Benedict": "Maths", "Albert": "CS", "Andrew": "Commerce"}
An element in a Dictionary can be accessed by using the key, as illustrated below:
teachers["Andrew"]
Commerce
Traversing a Dictionary
Traversal of a collection of values means accessing and processing each of its elements.
A Dictionary can be traversed by using the for loop, as shown below:
d1 = {5: "Number", "a": "String", (1, 2): "Tuple"}
for key in d1:
    print(key, ":", d1[key])
5 : Number
a : String
(1, 2) : Tuple
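A common alternative is to loop over the items( ) method, which yields each key together with its value, so no second lookup d1[key] is needed. This sketch uses d1 from the example above; the lines list exists only so the printed text can be checked.

```python
d1 = {5: "Number", "a": "String", (1, 2): "Tuple"}

lines = []
for key, value in d1.items():   # each item is a (key, value) Tuple
    print(key, ":", value)
    lines.append(f"{key} : {value}")
# 5 : Number
# a : String
# (1, 2) : Tuple
```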
Programming Example 2.3
We will now write a Program to create a Phone Dictionary for our friends and then
print it.
PhoneDirectory = {"Jagdish": "94437 55625", "Bala": "96297 09185",
                  "Saravanan": "99947 49333"}
for name in PhoneDirectory:
    print(name, ":", PhoneDirectory[name])
The output of the above program will be as shown below:
Jagdish : 94437 55625
Bala : 96297 09185
Saravanan : 99947 49333
Adding an element to a Dictionary
We can add a new key:value pair to a Dictionary by using a simple assignment
statement, as shown below:
Employee = {"name": "John", "salary": 10000, "age": 24}
Employee["dept"] = "Sales"
Employee
{'name': 'John', 'salary': 10000, 'age': 24, 'dept': 'Sales'}
Dictionary Methods
Let us now briefly discuss the various built-in functions and methods that are provided
by Python to manipulate the elements of Dictionaries.
1. len( ) function
This built-in function returns the number of key:value pairs in a Dictionary.
Employee5 = {"name": "John", "salary": 10000, "age": 24}
len(Employee5)
3
2. clear( ) method
This method removes all the elements of a Dictionary and leaves it as an empty Dictionary.
In contrast, when we use the del statement (for example, del Employee6), the Dictionary no
longer exists at all, not even as an empty Dictionary.
Employee6 = {"name": "John", "salary": 10000, "age": 24}
Employee6.clear()
Employee6
{}
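3. get( ) method
This method returns the value that is associated with a key. If the key is not present, get( )
returns None (or the default value supplied as its second argument) instead of raising an error.
Employee7 = {"salary": 10000, "department": "Sales", "age": 24, "name": "John"}
Employee7.get("department")
'Sales'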
4. items( ) method
This method returns all the key:value pairs in a Dictionary, as (key, value) Tuples.
Employee8 = {"name": "John", "salary": 10000, "age": 24}
myList = Employee8.items()
for x in myList:
    print(x)
('name', 'John')
('salary', 10000)
('age', 24)
5. keys( ) method
This method returns all the keys in a Dictionary, as a view object of type dict_keys.
Employee9 = {"salary": 10000, "department": "sales", "age": 24, "name": "John"}
Employee9.keys()
dict_keys(['salary', 'department', 'age', 'name'])
6. values( ) method
This method returns all the values in a Dictionary, as a view object of type dict_values.
Employee10 = {"salary": 10000, "department": "Sales", "age": 24, "name": "John"}
Employee10.values()
dict_values([10000, 'Sales', 24, 'John'])
7. update( ) method
This method merges the key:value pairs of another Dictionary into the original Dictionary,
adding new keys or replacing the values of existing keys, as needed.
Employee11 = {"name": "John", "salary": 10000, "age": 24}
Employee12 = {"name": "David", "salary": 54000, "department": "Sales"}
Employee11.update(Employee12)
Employee11
{'name': 'David', 'salary': 54000, 'age': 24, 'department': 'Sales'}
Note:
The elements of the Employee12 Dictionary have overridden the elements of the
Employee11 Dictionary that have the same keys. So, the values associated with the keys
"name" and "salary" have been changed.
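The methods above can be combined in one short session. This sketch, on an illustrative employee Dictionary, also shows two details not covered in the text: keys( ) returns a live view that reflects later changes, and pop( ) removes a key while returning its value.

```python
employee = {"name": "John", "salary": 10000, "age": 24}

print(employee.get("department"))             # None: missing key, no KeyError
print(employee.get("department", "Unknown"))  # Unknown: the supplied default

# keys(), values() and items() return views that follow later changes
keys = employee.keys()
employee["dept"] = "Sales"
print(list(keys))  # ['name', 'salary', 'age', 'dept']

# update() overwrites existing keys in place and adds any new ones
employee.update({"name": "David", "salary": 54000})
print(employee)  # {'name': 'David', 'salary': 54000, 'age': 24, 'dept': 'Sales'}

# pop() removes a key and returns its value; del removes a key without returning it
age = employee.pop("age")
del employee["dept"]
print(age, employee)  # 24 {'name': 'David', 'salary': 54000}
```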
2.12 Default Dictionary
Imagine that you are trying to count the number of occurrences of the words in a
document. An obvious approach is to create a Dictionary in which the keys are words and the
values are counts. As you check each word, you can increment its count if it is already in the
Dictionary and add it to the Dictionary if it is not.
word_counts = {}
for word in document:
    if word in word_counts:
        word_counts[word] = word_counts[word] + 1
    else:
        word_counts[word] = 1
Alternatively, we can simply handle the exception that is raised when we try to look up a
missing key:
word_counts = {}
for word in document:
    try:
        word_counts[word] = word_counts[word] + 1
    except KeyError:
        word_counts[word] = 1
A third approach is to use the get( ) method, which behaves gracefully in the case of missing
keys:
word_counts = {}
for word in document:
    previous_count = word_counts.get(word, 0)
    word_counts[word] = previous_count + 1
Every one of these approaches is slightly unwieldy. In such cases, a defaultdict is helpful.
A defaultdict is like a regular Dictionary, except that when you try to look up a key it does
not contain, it first adds a value for that key by calling the zero-argument function that you
provided when you created it. In order to use defaultdict, you have to import it from the
collections module.
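A minimal sketch of the word-count problem solved with a defaultdict follows. The document list is hypothetical sample data standing in for a real document. Passing int works because int( ) called with no arguments returns 0; passing list instead gives an empty list for each new key, which is handy for grouping.

```python
from collections import defaultdict

# Hypothetical sample data: a document represented as a list of words
document = ["the", "cat", "sat", "on", "the", "mat"]

word_counts = defaultdict(int)   # int() returns 0, so new keys start at 0
for word in document:
    word_counts[word] += 1

print(word_counts["the"])   # 2
print(word_counts["dog"])   # 0, and "dog" is now added as a key

# Other zero-argument functions give other defaults, e.g. list for grouping
words_by_length = defaultdict(list)
for word in document:
    words_by_length[len(word)].append(word)
print(dict(words_by_length))  # {3: ['the', 'cat', 'sat', 'the', 'mat'], 2: ['on']}
```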
2.13 Exception Handling
When something goes wrong, Python raises an exception. If they are not handled,
exceptions will cause our Program to crash. We can handle the exceptions by using try and
except, as shown below:
try:
    print(0 / 0)
except ZeroDivisionError:
    print("cannot divide by zero")
Suppose that a Person is not sure whether a Tuple is immutable or not. So, when he writes
code to alter the value of an element, he will make use of exception handling, as shown
below:
try:
    my_tuple[1] = 3
except TypeError:
    print("cannot modify a Tuple")
Suppose that a Person is not sure whether a key named “Kate” is present in a Dictionary or
not. So, when she writes code to access the value associated with the key “Kate”, she will
make use of exception handling so that her Program does not crash in case the key “Kate” is
not present in the Dictionary.
try:
    kates_grade = grades["Kate"]
except KeyError:
    print("no value for Kate!")
So far, we have discussed handling exceptions that Python raises automatically. Exceptions
can also be raised manually by using the raise statement, as shown in the following example:
try:
    a = int(input("Enter the value of a: "))
    b = int(input("Enter the value of b: "))
    print("The value of a =", a)
    print("The value of b =", b)
    if (a - b) < 0:
        raise Exception("value of a-b is < 0")
except Exception as e:
    print("Received exception:", e)
Output
Enter the value of a: 15
Enter the value of b: 20
The value of a = 15
The value of b = 20
Received exception: value of a-b is < 0
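As noted in Chapter 1, an exception raised inside a called function can be handled by the code that called it. This sketch moves the check into a function and raises the more specific built-in ValueError instead of the generic Exception; the function names difference and safe_difference are hypothetical, and the inputs are passed as arguments rather than read from the keyboard.

```python
def difference(a, b):
    """Return a - b, raising ValueError when the result would be negative."""
    if a - b < 0:
        raise ValueError("value of a-b is < 0")
    return a - b


def safe_difference(a, b):
    # The exception raised inside difference() is handled here, in the caller
    try:
        return difference(a, b)
    except ValueError as e:
        print("Received exception:", e)
        return None


print(safe_difference(20, 15))  # 5
print(safe_difference(15, 20))  # prints "Received exception: value of a-b is < 0", then None
```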
2.14 Counter
A Counter turns a sequence of values into a defaultdict(int)-like object that maps each value
(key) to its count.
from collections import Counter
c = Counter([0, 1, 2, 0])  # c is Counter({0: 2, 1: 1, 2: 1})
Counter gives us a very simple way to solve our word-counts problem:
word_counts = Counter(document)  # Here, document is a list of words
The most_common( ) method of a Counter instance is often used in Natural Language
Processing.
# print the 10 most common words and their counts
for word, count in word_counts.most_common(10):
    print(word, count)
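A small runnable version of the word-count example follows, using a hypothetical one-sentence document. Two behaviours worth noting: looking up a missing word in a Counter returns 0 without adding the key (unlike a defaultdict), and most_common( ) lists words with equal counts in the order they were first encountered.

```python
from collections import Counter

# Hypothetical sample document: a list of words
document = "the cat sat on the mat with the hat".split()

word_counts = Counter(document)
print(word_counts["the"])          # 3
print(word_counts["dog"])          # 0, and "dog" is NOT added as a key
print(word_counts.most_common(2))  # [('the', 3), ('cat', 1)]
```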
2.15 List Comprehensions
Frequently, you will want to transform a list into another list by choosing only certain
elements, by transforming elements, or both. The Pythonic way to do this is with list
comprehensions.
even_numbers = [x for x in range(5) if x % 2 == 0]  # [0, 2, 4]
squares = [x * x for x in range(5)]                  # [0, 1, 4, 9, 16]
even_squares = [x * x for x in even_numbers]         # [0, 4, 16]
We can similarly turn lists into dictionaries or sets, as shown below:
square_dict = {x: x * x for x in range(5)}  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
square_set = {x * x for x in [1, -1]}       # {1}
If you don’t need the value from the list, it is common to use an underscore as the variable:
zeroes = [0 for _ in even_numbers]  # has the same length as even_numbers
A list comprehension can include multiple for loops:
pairs = [(x, y)
         for x in range(10)
         for y in range(10)]  # 100 pairs (0, 0), (0, 1), ..., (9, 8), (9, 9)
The later for loops can use the results of earlier for loops:
increasing_pairs = [(x, y)                      # only pairs with x < y,
                    for x in range(10)          # range(lo, hi) equals
                    for y in range(x + 1, 10)]  # [lo, lo + 1, ..., hi - 1]
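One way to read a comprehension with several for clauses is as nested loops written in the same order. This sketch rebuilds increasing_pairs with explicit loops (manual_pairs is an illustrative name) and checks that both give the same 45 pairs.

```python
even_numbers = [x for x in range(5) if x % 2 == 0]
increasing_pairs = [(x, y) for x in range(10) for y in range(x + 1, 10)]

# The comprehension above is shorthand for these nested loops
manual_pairs = []
for x in range(10):
    for y in range(x + 1, 10):
        manual_pairs.append((x, y))

print(even_numbers)                      # [0, 2, 4]
print(len(increasing_pairs))             # 45
print(increasing_pairs[:3])              # [(0, 1), (0, 2), (0, 3)]
print(increasing_pairs == manual_pairs)  # True
```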