Elitedatascience Python Crash Course
1.1 - Lists
Lists are very versatile and you will use them all the time.
Mutable: You can update elements of a list, add to the list, delete elements, etc.
Sequence: Each element is ordered and indexed, starting at 0.
Of Objects: You can put any other Python object in lists, including functions, other
lists, and custom objects.
Square Brackets: ['element 0', 'element 1', 'element 2']
For example:
In [2]:
integer_list = [0, 1, 2, 3, 4]
print( integer_list )
print( type(integer_list) )
[0, 1, 2, 3, 4]
< type 'list' >
Because list indices start at 0, you access the first element using index 0, the second element using
index 1, and so on...
Don't forget this fact. Index 1 is not the first element in a Python list.
For example:
In [4]:
print( my_list[0] ) # Print the first element
print( my_list[2] ) # Print the third element
hello
world
You can also select elements from the list using a range of indices.
They are denoted like start_index:end_index.
In [5]:
# Selects all starting from the 2nd element, but BEFORE the 4th element
print( my_list[1:3] )
[1, 'world']
In addition:
In [6]:
# Selects all BEFORE the 4th element
print( my_list[:3] )
For example:
In [7]:
# Selects the last element
print( my_list[-1] )
# Selects all starting from the 2nd element, but before the last element
print( my_list[1:-1] )
3.0
['hello', 1, 'world', 2]
[1, 'world', 2]
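As a recap of indexing and slicing, here is a minimal sketch. The list contents are an assumption, chosen to be consistent with the outputs shown above, and example_list is a hypothetical name.

```python
# Hypothetical list, chosen to match the outputs shown above
example_list = ['hello', 1, 'world', 2, 3.0]

first = example_list[0]       # index 0 is the first element
middle = example_list[1:3]    # 2nd element up to, but not including, the 4th
head = example_list[:3]       # everything before the 4th element
last = example_list[-1]       # negative indices count from the end
inner = example_list[1:-1]    # 2nd element up to, but not including, the last

print( first, middle, head, last, inner )
```

Note that the end index of a slice is always excluded, which is why [1:3] returns two elements rather than three.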
1.3 - Mutable
Because lists are mutable, you can change individual elements of the list.
Appending to and removing elements from lists are both easy to do.
In [9]:
# Add to end of the list
my_list.append(99)
print(my_list)
Python offers an entire suite of list operations. Remember, operations for each type of object behave
differently.
In [10]:
a = [1, 2, 3]
b = [4, 5, 6]
Or repeat a list.
In [12]:
print( a * 3 ) # Repetition
[1, 2, 3, 1, 2, 3, 1, 2, 3]
And finally, you can check the length of a list, which is just the number of elements in the list.
In [15]:
print( len(a) ) # Length
3
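The sketch below gathers the common list operations in one place. The variable names are illustrative, and remove() is assumed here as one way to delete an element by value.

```python
a = [1, 2, 3]
b = [4, 5, 6]

combined = a + b        # concatenation creates a new list
repeated = a * 3        # repetition creates a new list
has_three = 3 in a      # membership test returns a boolean

# Lists are mutable, so these statements change the list in place
numbers = [10, 20, 30]
numbers[0] = 11         # update an element
numbers.append(99)      # add to the end
numbers.remove(20)      # delete the first element equal to 20

print( combined, repeated, has_three, numbers, len(numbers) )
```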
2.1 - Tuples
We won't use tuples as often as lists, but they are still fairly common.
Once again, let's break that statement down, and highlight the parts different from lists:
Immutable: You cannot update elements of a tuple, add to the tuple, delete
elements, etc.
Sequence: Each element is ordered and indexed, starting at 0.
Of Objects: You can put any other Python object in tuples, including functions,
other lists, and custom objects.
Parentheses: ('element 0', 'element 1', 'element 2')
For example:
In [22]:
integer_tuple = (0, 1, 2, 3, 4)
print( integer_tuple )
print( type(integer_tuple) )
(0, 1, 2, 3, 4)
< type 'tuple' >
Running the code cell below will give you this error:
In [26]:
# Tuples cannot be updated
my_tuple[0] = 'goodbye' # Will throw an error
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in ()
1 # Tuples cannot be updated
----> 2 my_tuple[0] = 'goodbye' # Will throw an error
TypeError: 'tuple' object does not support item assignment
2.4 - Unpacking
One convenient tip about tuples is that you can unpack their individual elements.
In [27]:
# Unpack tuple
a, b, c, d, e = my_tuple
# Print a and c, which were the first and third elements from the tuple
print( a, c )
hello world
In case you're wondering, you can unpack lists too (it's just not as common).
In [28]:
my_list = ['hello', 1, 'world', 2, 99]
# Unpack list
a, b, c, d, e = my_list
# Print a and c, which were the first and third elements from the list
print( a, c )
hello world
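Unpacking works when the number of names matches the number of elements. Here is a minimal sketch with a hypothetical point tuple, which also shows the common trick of swapping two variables:

```python
point = (3, 4)      # hypothetical two-element tuple

x, y = point        # one name per element
print( x, y )

x, y = y, x         # the right-hand side is packed into a tuple, then unpacked
print( x, y )
```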
2.5 - Tuple operations
For example:
In [29]:
a = (1, 2, 3)
b = (4, 5, 6)
print( a + b ) # Concatenation
print( a * 3 ) # Repetition
print( 3 in a ) # Membership
print( len(a) ) # Length
print( min(b), max(b) ) # Min, Max
(1, 2, 3, 4, 5, 6)
(1, 2, 3, 1, 2, 3, 1, 2, 3)
True
3
4 6
For this course, we won't worry too much about the differences between lists and tuples.
We will mostly use lists, but it's helpful to be able to spot tuples when they appear.
3.1 - Sets
Sets are unordered collections of unique objects, enclosed by curly braces: {}.
For example:
In [30]:
integer_set = {0, 1, 2, 3, 4}
print( integer_set )
print( type(integer_set) )
set([0, 1, 2, 3, 4])
< type 'set' >
You can also count the number of elements in the set with the same len() function you used for lists
and tuples.
In [31]:
# Print length of integer set
print( integer_set, 'has', len(integer_set), 'elements.' )
set([0, 1, 2, 3, 4]) has 5 elements.
3.2 - Removing duplicates
Because each element in a set must be unique, sets are a great tool for removing duplicates.
For example:
In [32]:
fibonacci_list = [ 1, 1, 2, 3, 5, 8, 13 ] # Both 1's will remain
fibonacci_set = { 1, 1, 2, 3, 5, 8, 13 } # Only one 1 will remain
print( fibonacci_list )
print( fibonacci_set )
[1, 1, 2, 3, 5, 8, 13]
set([1, 2, 3, 5, 8, 13])
3.3 - No indexing
In [33]:
# Throws an error
fibonacci_set[0]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in ()
1 # Throws an error
----> 2 fibonacci_set[0]
You can convert a list to a set with the set() function, which you use like the int() and float() functions from the previous lesson.
In [34]:
# Create a list
fibonacci_list = [ 1, 1, 2, 3, 5, 8, 13 ]
# Convert it to a set
fibonacci_set = set(fibonacci_list)
print( fibonacci_set )
set([1, 2, 3, 5, 8, 13])
3.5 - Set operations
You can easily take the union, intersection, and difference of sets.
In [35]:
powers_of_two = { 1, 2, 4, 8, 16 }
fibonacci_set = { 1, 1, 2, 3, 5, 8, 13 }
Finally, the difference... By the way... watch out for the order of the sets!
In [38]:
# Difference
print( powers_of_two - fibonacci_set )
print( fibonacci_set - powers_of_two)
set([16, 4])
set([3, 5, 13])
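The three operations can be sketched side by side using the same two sets. The | and & operators are standard Python set operators for union and intersection; sorted() is used here only to make the printed order predictable.

```python
powers_of_two = { 1, 2, 4, 8, 16 }
fibonacci_set = { 1, 1, 2, 3, 5, 8, 13 }

union = powers_of_two | fibonacci_set            # elements in either set
intersection = powers_of_two & fibonacci_set     # elements in both sets
only_powers = powers_of_two - fibonacci_set      # in the first set but not the second
only_fibonacci = fibonacci_set - powers_of_two   # swapping the order changes the result

print( sorted(union) )
print( sorted(intersection) )
print( sorted(only_powers), sorted(only_fibonacci) )
```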
Dictionaries are unordered collections of key-value pairs, enclosed by curly braces, like {}.
For example:
In [41]:
integer_dict = {
    'zero' : 0,
    'one' : 1,
    'two' : 2,
    'three' : 3,
    'four' : 4
}
print( integer_dict )
print( type(integer_dict) )
{'four': 4, 'zero': 0, 'three': 3, 'two': 2, 'one': 1}
< type 'dict' >
Don't get dictionaries confused with sets. They both use curly braces, but that's about it.
A dictionary, also called a "dict", is like a miniature database for storing and organizing data.
Each element in a dict is actually a key-value pair, and it's called an item.
A key is like a name for the value. You should make them descriptive.
A value is some other Python object to store. They can be floats, lists, functions,
and so on.
For the purposes of this course, we'll use strings for most of our keys, but they can be other data types as
well.
The cool thing about dictionaries is that you can access values by their keys.
In [43]:
# Print the value for the 'title' key
print( my_dict['title'] )
But wait, the author field is wrong! His name should be "Douglas Adams," not "Dougie Adams."
4.3 - Updating values
In [44]:
# Updating existing key-value pair
my_dict['author'] = 'Douglas Adams'
Values can be any other Python object, even other lists or dictionaries.
For example, those who have read Hitchhiker's Guide to the Galaxy will recognize 42 as not just any
old mundane number, but rather
The Answer to the Ultimate Question of Life, the Universe, and Everything.
(If you haven't read the book, you're seriously missing out!)
The Meaning of Life
Conveniently, we can .append() that to our list, just as we would with any other list.
In [46]:
# Append element to list
my_dict[42].append('Answer to the Ultimate Question of Life, the Universe, and Everything')
You can also add a new item by simply setting a value to an unused key.
In [47]:
# Creating a new key-value pair
my_dict['year'] = 1979
Now that we have the year, we can print a summary of the book:
In [48]:
# Print summary of the book.
print('{} was written by {} in {}.'.format(my_dict['title'], my_dict['author'], my_dict['year']) )
Hitchhiker's Guide to the Galaxy was written by Douglas Adams in 1979.
4.5 - Convenience functions
Finally, you can also access a list of all the keys in a dictionary using the .keys() function.
And you can get a list of all the values using the .values() function.
In [49]:
# Keys
print( my_dict.keys() )
# Values
print( my_dict.values() )
[42, 'author', 'year', 'title']
[['A number.', 'The answer to 40 + 2 = ?', 'Answer to the Ultimate Question of Life,
the Universe, and Everything'],
'Douglas Adams',
1979,
"Hitchhiker's Guide to the Galaxy"]
You can access a list of all key-value pairs using the .items() function.
This is very useful for iterating through a dictionary, which we'll see in Lesson 3:
Flow and Functions.
It will return a list of tuples.
In [50]:
# All items (list of tuples)
print( my_dict.items() )
[(42, ['A number.', 'The answer to 40 + 2 = ?', 'Answer to the Ultimate Question of Life, the Universe, and Everything']),
('author', 'Douglas Adams'),
('year', 1979),
('title', "Hitchhiker's Guide to the Galaxy")]
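As a preview of iterating with .items(), here is a minimal sketch. book is a hypothetical dictionary built from values shown in this section; it is not the notebook's my_dict.

```python
# Hypothetical dictionary using values from the examples above
book = {
    'title' : "Hitchhiker's Guide to the Galaxy",
    'author' : 'Dougie Adams',
}

book['author'] = 'Douglas Adams'    # update an existing key
book['year'] = 1979                 # create a new key-value pair

# Each item is a (key, value) tuple, so it can be unpacked in the loop
for key, value in book.items():
    print( key, '->', value )

summary = '{} was written by {} in {}.'.format(book['title'], book['author'], book['year'])
print( summary )
```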
When you indent a line of code, it becomes a child of the previous line.
The parent has a colon following it.
By convention, each indent is 4 spaces.
To end a block of code, you would simply outdent.
In Jupyter Notebook, you can just press tab to indent 4 spaces.
For example:
You begin them with the if keyword. Then the statement has two parts:
1. The condition, which must evaluate to a boolean. (Technically, it's fine as long as
its "truthiness" can be evaluated, but let's not worry about that for now.)
2. The code block to run if the condition is met (indented with 4 spaces).
For example:
In [2]:
current_fuel = 85
# Condition
if current_fuel >= 80:
    # Code block to run if condition is met
    print( 'We have enough fuel to last the zombie apocalypse. ')
We have enough fuel to last the zombie apocalypse.
That seems OK, but what if we want to print another message, such as a warning to restock on fuel?
For example:
In [4]:
current_fuel = 50
# Condition
if current_fuel >= 80:
    # Do this when condition is met
    print( 'We have enough fuel to last the zombie apocalypse. ')
else:
    # Do this when condition is not met
    print( 'Restock! We need at least {} gallons.'.format(80 - current_fuel) )
Restock! We need at least 30 gallons.
1.3 - If... Elif... Else...
The elif (short for else if) statement checks another condition if the first one is
not met.
For example:
In [5]:
current_fuel = 50
# First condition
if current_fuel >= 80:
    print( 'We have enough fuel to last the zombie apocalypse. ')
# If first condition is not met, check this condition
elif current_fuel < 60:
    print( 'ALERT: WE ARE WAY TOO LOW ON FUEL!' )
# If no conditions were met, perform this
else:
    print( 'Restock! We need at least {} gallons.'.format(80 - current_fuel) )
ALERT: WE ARE WAY TOO LOW ON FUEL!
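With a value between the two thresholds, neither condition is met and the else block runs. Here is a minimal sketch, assuming a fuel level of 70 and storing the text in a hypothetical message variable:

```python
current_fuel = 70

if current_fuel >= 80:
    message = 'We have enough fuel to last the zombie apocalypse.'
elif current_fuel < 60:
    message = 'ALERT: WE ARE WAY TOO LOW ON FUEL!'
else:
    # Reached only when both conditions above are False
    message = 'Restock! We need at least {} gallons.'.format(80 - current_fuel)

print( message )
```

Conditions are checked from top to bottom, and only the first matching block runs.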
You begin them with the for keyword. Then the loop has three parts:
1. An iterable (list, tuple, etc.) object that contains the elements to loop through.
2. A named variable that represents each element in the list.
3. A code block to run for each element (indented with 4 spaces).
For example:
In [9]:
for number in [0, 1, 2, 3, 4]:
    print( number )
0
1
2
3
4
Next, we set each single element that we loop through to a named variable. In
this case, we named it number.
print( number )
2.2 - Range
range() is a built-in Python function for generating lists of sequential integers.
e.g. range(10) creates the list [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Note: In Python 3, it creates a lazy range object instead of a list. We won't
worry too much about the differences because we'll use them identically.
For example, this produces the same output as our for loop from earlier:
In [10]:
for number in range(5):
    print( number )
0
1
2
3
4
You can also iterate in reversed() order.
In [11]:
for number in reversed(range(5)):
    print( number )
4
3
2
1
0
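The Python 3 behavior mentioned in the note above can be seen by converting the result of range() to a list. This is a short sketch, assuming Python 3; the start, stop, and step arguments to range() are standard but are not covered in this lesson.

```python
numbers = range(5)

as_list = list(numbers)                # materialize the values as a list
backwards = list(reversed(numbers))    # same values in reverse order
evens = list(range(0, 10, 2))          # start at 0, stop before 10, step by 2

print( as_list, backwards, evens )
```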
2.3 - Nested control flow
You can nest if statements within for loops to write more complex logic.
Here's an example:
The % (modulo) operator appears again. It's used to check if a number is divisible
by another.
In [12]:
range_list = range(10)
# list_a and list_b are inferred from the output below
list_a = [4, 3, 2]
list_b = [6, 3]
for a in list_a:
    for b in list_b:
        print( a, 'x', b, '=', a * b )
4 x 6 = 24
4 x 3 = 12
3 x 6 = 18
3 x 3 = 9
2 x 6 = 12
2 x 3 = 6
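To illustrate an if statement nested inside a for loop, here is a minimal sketch that uses % to test divisibility. The divisor (3) and the name divisible_by_three are assumptions chosen for illustration.

```python
divisible_by_three = []

for number in range(10):
    # number % 3 is the remainder after dividing by 3
    if number % 3 == 0:
        divisible_by_three.append(number)
        print( number, 'is divisible by 3' )

print( divisible_by_three )
```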
2.4 - Building new lists
for loops can be used to build new lists from scratch. Here's how:
For example, let's say we want to separate our range_list into an evens_list and
an odds_list. We can do so like this:
In [14]:
range_list = range(10)
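One way the split described above could be written is sketched below. It assumes the usual pattern of starting with empty lists and appending inside the loop.

```python
range_list = range(10)

# Start with empty lists
evens_list = []
odds_list = []

# Append each number to the matching list
for number in range_list:
    if number % 2 == 0:
        evens_list.append(number)
    else:
        odds_list.append(number)

print( evens_list )
print( odds_list )
```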
They are somewhat advanced, and you don't technically need to use them, but they help keep your code
clean and concise.
List comprehensions construct new lists out of existing ones after applying transformations
or conditions to them.
These are one of the trickier concepts in Python, so don't worry if it doesn't make sense right away. We'll
get plenty of practice with them in the projects.
Here's an example:
In [17]:
# Construct list of the squares in range(10) using list comprehension
squares_list = [number**2 for number in range(10)]
print( squares_list )
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
They also have an output. In this case, it's the squared number in the list.
print( evens_list )
[0, 2, 4, 6, 8]
3.3 - Conditional outputs
You can also use if... else... in the output for conditional outputs.
print( even_odd_labels )
['Even', 'Odd', 'Even', 'Odd', 'Even', 'Odd', 'Even', 'Odd', 'Even', 'Odd']
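The two outputs above are consistent with comprehensions like the following. This is a sketch of what such cells might contain, not the original code:

```python
# A condition at the end filters which elements are kept
evens_list = [number for number in range(10) if number % 2 == 0]
print( evens_list )

# if... else... in the output chooses a value for every element
even_odd_labels = ['Even' if number % 2 == 0 else 'Odd' for number in range(10)]
print( even_odd_labels )
```

The position of the if matters: at the end it filters elements out, while in the output it keeps every element and only changes the value produced.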
3.4 - Other comprehensions
In [20]:
# Construct set of doubles using set comprehension
doubles_set = { number * 2 for number in range(10) }
print( doubles_set )
set([0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
Functions allow you to reuse and quickly tailor code to different situations.
Here's an example:
In [23]:
def make_message_exciting(message='hello, world'):
    text = message + '!'
    return text
print( type(make_message_exciting) )
< type 'function' >
Functions begin with the def keyword, followed by the function name (and a
colon).
def make_message_exciting(message='hello, world'):
    text = message + '!'
    return text
This is the default value for the argument (more on this later).
return text
To call a function, simply type its name followed by parentheses.
In [24]:
# Call make_message_exciting() function
make_message_exciting()
Out[24]:
'hello, world!'
As you can see, if you don't pass the function any argument values, it will use the default ones.
4.2 - In practice
In practice, functions are ideal for isolating functionality.
In [25]:
def square(x):
    output = x*x
    return output
def cube(x):
    output = x*x*x
    return output
print( square(3) )
print( cube(2) )
print( square(3) + cube(2) )
9
8
17
4.3 - Optional parts
It's worth noting that the code block is actually optional, as long as you have a return statement.
In [26]:
# Example of function without a code block
def hello_world():
    return 'hello world'
In [27]:
# Example of function without a return statement
def print_hello_world():
    print( 'hello world' )
Basically, as long as you have either a code block or a return statement, you're good to go.
4.4 - Arguments
Finally, functions can have arguments, which are variables that you pass into the function.
In [28]:
def print_message( message='Hello, world', punctuation='.' ):
    output = message + punctuation
    print( output )
To pass a new value for the argument, simply set it again when calling the function.
In [29]:
# Print new message with new punctuation
print_message( message='Nice to meet you', punctuation='...' )
Nice to meet you...
When passing a value to an argument, you don't have to write the argument's name if the values are in
order.
The first value is for the first argument, second value for the second argument, etc.
In [30]:
# Print new message without explicitly setting the argument
print_message( 'Where is everybody', '?' )
Where is everybody?
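Positional and named values can also be mixed, as long as the positional ones come first. Here is a minimal sketch using a hypothetical build_message() function that returns the text instead of printing it:

```python
def build_message( message='Hello, world', punctuation='.' ):
    # Return the combined text so the caller can reuse it
    return message + punctuation

default_text = build_message()                    # both defaults
named_text = build_message( punctuation='!' )     # override one argument by name
mixed_text = build_message( 'Where is everybody', punctuation='?' )  # positional, then named

print( default_text, named_text, mixed_text )
```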
Well, one of the main reasons NumPy is so popular for scientific computing is because it provides a new
data structure that's optimized for calculations with arrays of data.
1.1 - NumPy Arrays
For example:
In [3]:
# Array of ints
array_a = np.array([0, 1, 2, 3])
print( array_a )
print( type(array_a) )
[0 1 2 3]
< type 'numpy.ndarray' >
You can see the data type of the contained elements using the .dtype attribute.
In [4]:
# Print data type of contained elements
print( array_a.dtype )
int64
In NumPy, integers typically have the dtype int64, although the default can vary by platform.
Note: don't get it confused!
1.2 - Homogeneous
NumPy arrays are homogeneous, which means all of their elements must have the same data type.
In [5]:
# Mixed array with 1 string and 2 integers
array_b = np.array(['four', 5, 6])
We can use the .shape attribute to see the axes for a NumPy array.
In [6]:
print( array_a.shape )
print( array_b.shape )
(4,)
(3,)
As you can see, .shape returns a tuple.
The number of elements in the tuple is the number of axes.
And each element's value is the length of that axis.
Together, these two pieces of information make up the shape, or dimensions, of the array.
If this seems confusing right now, don't worry. This will become clearer once we see more examples.
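A two-dimensional array makes the idea of axes more concrete. Here is a minimal sketch, assuming NumPy is imported as np as in the cells above; matrix is a hypothetical name.

```python
import numpy as np

# Hypothetical 2-D array with 2 rows and 3 columns
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print( matrix.shape )         # one entry per axis: (rows, columns)
print( len(matrix.shape) )    # number of axes
print( matrix.ndim )          # same number, via the ndim attribute
```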
1.4 - Indexing
Similar to the lists we saw in Lesson 2: Data Structures, we can access elements in NumPy arrays by
their indices.
In [7]:
# First element of array_a
print( array_a[0] )
Or by slicing.
In [8]:
# From the third element of array_a up to the 4th
print( array_a[2:4] )
[2 3]
1.5 - Missing data
Finally, another reason NumPy is popular in the data science community is that it can indicate missing
data.
As you'll see in the next 3 projects, most real world datasets are plagued by
missing data.
Fortunately, NumPy has a special np.nan object for denoting missing values.
In [9]:
# Array with missing values
array_with_missing_value = np.array([1.2, 8.8, 4.0, np.nan, 6.1])
# Print array
print( array_with_missing_value )
[ 1.2 8.8 4. nan 6.1]
NaN allows you to indicate a value is missing while preserving the array's numeric dtype:
In [10]:
# Print array's dtype
print( array_with_missing_value.dtype )
float64
See how NumPy keeps a nan to indicate a missing value, but the .dtype is still float64?
This turns out to be a very useful property for data analysis, as you'll see soon!
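One practical consequence is that ordinary aggregates return nan if any value is missing. Here is a minimal sketch using np.isnan() and np.nanmean(), two standard NumPy functions that this lesson does not cover; values is a hypothetical name.

```python
import numpy as np

values = np.array([1.2, 8.8, 4.0, np.nan, 6.1])

missing_mask = np.isnan(values)    # True where a value is missing
plain_mean = np.mean(values)       # nan, because one value is missing
clean_mean = np.nanmean(values)    # mean of the non-missing values only

print( missing_mask )
print( plain_mean, clean_mean )
```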