Learning IPython For Interactive Computing and Data Visualization - Second Edition - Sample Chapter
Cyrille Rossant
Preface
Data analysis skills are now essential in scientific research, engineering, finance,
economics, journalism, and many other domains. With its high accessibility and
vibrant ecosystem, Python is one of the most appreciated open source languages for
data science.
This book is a beginner-friendly introduction to the Python data analysis platform,
focusing on IPython (Interactive Python) and its Notebook. While IPython is an
enhanced interactive Python terminal specifically designed for scientific computing
and data analysis, the Notebook is a graphical interface that combines code, text,
equations, and plots in a unified interactive environment.
The first edition of Learning IPython for Interactive Computing and Data Visualization
was published in April 2013, several months before the release of IPython 1.0. This
new edition targets IPython 4.0, released in August 2015. In addition to reflecting the
novelties of this new version of IPython, the present book is also more accessible to
non-programmer beginners. The first chapter contains a brand new crash course on
Python programming, as well as detailed installation instructions.
Since the first edition of this book, IPython's popularity has grown significantly,
with an estimated user base of several million people and ongoing collaborations
with large companies like Microsoft, Google, IBM, and others. The project itself has
undergone important changes, with a refactoring into a language-independent
interface called the Jupyter Notebook, and a set of backend kernels in various
languages. The Notebook is no longer reserved for Python; it can now also be used
with R, Julia, Ruby, Haskell, and many more languages (50 at the time of this
writing!).
The Jupyter project received significant funding in 2015 from the Leona M. and
Harry B. Helmsley Charitable Trust, the Gordon and Betty Moore Foundation, and
the Alfred P. Sloan Foundation, which will allow the developers to focus on the
growth and maturity of the project in the years to come.
Consequently, the past 15 years have seen the development of widely-used libraries
such as NumPy (providing a practical array data structure), SciPy (scientific
computing), matplotlib (graphical plotting), pandas (data analysis and statistics),
scikit-learn (machine learning), SymPy (symbolic computing), and Jupyter/IPython
(efficient interfaces for interactive computing). Python, along with this set of
libraries, is sometimes referred to as the SciPy stack or PyData platform.
Competing platforms
Python has several competitors. For example, MATLAB (by MathWorks)
is a commercial software package focusing on numerical computing that is
widely used in scientific research and engineering. SPSS (by IBM) is a
commercial software package for statistical analysis. Python, however, is
free and open source, and that's one of its greatest strengths. Alternative
open source platforms include R (specialized in statistics) and Julia (a
young language for high-performance numerical computing).
Chapter 1
Example of a notebook
It quickly became clear that this interface could be used with languages other than
Python such as R, Julia, Lua, Ruby, and many others. Further, the Notebook is not
restricted to scientific computing: it can be used for academic courses, software
documentation, or book writing thanks to conversion tools targeting Markdown,
HTML, PDF, ODT, and many other formats. Therefore, the IPython developers
decided in 2014 to acknowledge the general-purpose nature of the Notebook by
giving a new name to the project: Jupyter.
Jupyter features a language-independent Notebook platform that can work with
a variety of kernels. Implemented in any language, a kernel is the backend of the
Notebook interface. It manages the interactive session, the variables, the data, and so
on. By contrast, the Notebook interface is the frontend of the system. It manages the
user interface, the text editor, the plots, and so on. IPython is henceforth the name
of the Python kernel for the Jupyter Notebook. Other kernels include IR, IJulia,
ILua, IRuby, and many others (50 at the time of this writing).
In August 2015, the IPython/Jupyter developers achieved the "Big Split" by splitting
the previous monolithic IPython codebase into a set of smaller projects, including
the language-independent Jupyter Notebook (see
https://blog.jupyter.org/2015/08/12/first-release-of-jupyter/). For example, the parallel
computing features of IPython are now implemented in a standalone Python
package named ipyparallel, the IPython widgets are implemented in ipywidgets,
and so on. This separation makes the code of the project more modular and facilitates
third-party contributions. IPython itself is now a much smaller project than before
since it only features the interactive Python terminal and the Python kernel for the
Jupyter Notebook.
You will find the list of changes in IPython 4.0 at
http://ipython.readthedocs.org/en/latest/whatsnew/version4.html.
Many internal IPython imports have been deprecated due to the
code reorganization. Warnings are raised if you attempt to perform
a deprecated import. Also, the profiles have been removed and
replaced with a unique default profile. However, you can simulate
this functionality with environment variables. You will find more
information at http://jupyter.readthedocs.org.
Anaconda comes with a package manager named conda, which lets you manage
your Python distribution and install new packages.
Miniconda
Miniconda (http://conda.pydata.org/miniconda.html) is
a light version of Anaconda that gives you the ability to only install
the packages you need.
Downloading Anaconda
The first step is to download Anaconda from Continuum Analytics' website
(http://continuum.io/downloads). This is actually not the easiest part since
several versions are available. Three properties define a particular version:
The operating system (OS): Linux, Mac OS X, or Windows. This will depend
on the computer you want to install Python on.
32-bit or 64-bit: You want the 64-bit version, unless you're on an old or
low-end computer. The 64-bit version will allow you to manipulate large datasets.
The version of Python: 2.7, or 3.4 (or later). In this book, we will use
Python 3.4. You can also use Python 3.5 (released in September 2015)
which introduces many features, including a new @ operator for matrix
multiplication. However, it is easy to temporarily switch to a Python 2.7
environment with Anaconda if necessary (see the next section).
Python 3 brought a few backward-incompatible changes over Python 2 (also
known as Legacy Python). This is why many people are still using Python
2.7 at this time, even though Python 3 was released in 2008. We will use
Python 3 in this book, and we recommend that newcomers learn Python
3. If you need to use legacy Python code that hasn't yet been updated to
Python 3, you can use conda to temporarily switch to a Python 2 interpreter.
Once you have found the right link for your OS and Python 3 64-bit, you can
download the package. You should then find it in your downloads directory
(depending on your OS and your browser's settings).
Installing Anaconda
The Anaconda installer comes in different flavors depending on your OS, as follows:
Linux: The Linux installer is a bash .sh script. Run it with a command
like bash Anaconda3-2.3.0-Linux-x86_64.sh (if necessary, replace the
filename by the one you downloaded).
Mac: The Mac graphical installer is a .pkg file that you can run with a
double-click.
Windows: The Windows graphical installer is an .exe file that you can run
with a double-click.
Then, follow the instructions to install Anaconda on your computer. Here are a few
remarks:
You don't need administrator rights to install Anaconda. In most cases, you
can choose to install it in your personal user account.
Opening a terminal
You can skip this section if you already know how to do these things.
A terminal is a command-line application that lets you interact with your computer
by typing commands with the keyboard, instead of clicking on windows with the
mouse. While most computer users only know Graphical User Interfaces, developers
and scientists generally need to know how to use the command-line interface for
advanced usage. To use the command-line interface, follow the instructions that are
specific to your OS:
On Windows, you can use PowerShell. Press the Windows + R keys, type
powershell in the Run box, and press Enter. You will find more information
about PowerShell at https://blog.udemy.com/powershell-tutorial/.
Alternatively, you can use the older Windows terminal by typing cmd in the
Run box.
On Linux, you can open the Terminal from your application manager.
On Linux, edit or create the file ~/.bashrc and add
export PATH="$PATH:/path/to/directory" at the end of the file.
Managing environments
Anaconda lets you create different isolated Python environments. For example, you
can have a Python 2 distribution for the rare cases where you need to temporarily
switch to Python 2.
This will create a new isolated environment named py2 based on the original
Anaconda distribution, but with Python 2.7. You could also use the command conda
env: type conda env -h to see the details.
You can now activate your py2 environment by typing the following command in a
terminal:
Windows: activate py2 (note that you might have problems with
PowerShell, see https://github.com/conda/conda/issues/626, or use the
old cmd terminal)
Now, you should see a (py2) prefix in front of your terminal prompt. Typing
python in your terminal with the py2 environment activated will open a Python 2
interpreter.
Type deactivate on Windows or source deactivate on Linux/OS X to deactivate
the environment in the terminal.
conda env list: Displays the list of environments installed. The currently
active one is marked by a star *.
available version.
conda clean -t: Removes the old tarballs that are left over after installation
and updates.
Some commands ask for confirmation (you need to press y to confirm). You can also
use the -y option to avoid the confirmation prompt.
If conda install somepackage fails, you can try pip install somepackage
instead. This will use the Python Package Index (PyPI) instead of Anaconda. Many
scientific Anaconda packages are easier to install than the corresponding PyPI
packages because they are precompiled for your platform. However, many packages
are available on PyPI but not on Anaconda.
Check your git installation: Open a new OS terminal and type git version.
You should see the version of git and not an error message.
This will download the very latest version of the code into a minibook subdirectory
in your home directory. You can also choose another directory.
From this directory, you can update to the latest version at any time by typing git
pull.
Notebooks on GitHub
Notebook documents stored on GitHub (with the file extension .ipynb)
are automatically rendered on the GitHub website.
IPython console
The Notebook is most convenient when you start a complex analysis project that
will involve a substantial amount of interactive experimentation with your code.
Other common use-cases include keeping track of your interactive session (like a lab
notebook), or writing technical documents that involve code, equations, and figures.
In the rest of this section, we will focus on the Notebook interface.
Closing the Notebook server
To close the Notebook server, go to the OS terminal where you launched
the server from, and press Ctrl + C. You may need to confirm with y.
A new notebook
Here are the main components of the interface, from top to bottom:
The notebook name, which you can change by clicking on it. This is also the
name of the .ipynb file.
The Menu bar gives you access to several actions pertaining to either the
notebook or the kernel.
To the right of the menu bar is the Kernel name. You can change the kernel
language of your notebook from the Kernel menu. We will see in Chapter 6,
Customizing IPython, how to manage different kernel languages.
The Toolbar contains icons for common actions. In particular, the dropdown
menu showing Code lets you change the type of a cell.
Following is the main component of the UI: the actual Notebook. It consists
of a linear list of cells. We will detail the structure of a cell in the following
sections.
You can change the type of a cell by first clicking on a cell to select it, and then
choosing the cell's type in the toolbar's dropdown menu showing Markdown
or Code.
Markdown cells
Here is a screenshot of a Markdown cell:
A Markdown cell
The top panel shows the cell in edit mode, while the bottom one shows it in render
mode. The edit mode lets you edit the text, while the render mode lets you display
the rendered cell. We will explain the differences between these modes in greater
detail in the following section.
Code cells
Here is a screenshot of a complex code cell:
The Prompt number shows the cell's number. This number increases every
time you run the cell. Since you can run cells of a notebook out of order,
nothing guarantees that code numbers are linearly increasing in a given
notebook.
The Input area contains a multiline text editor that lets you write one or
several lines of code with syntax highlighting.
The Widget area may contain graphical controls; here, it displays a slider.
Use the edit mode to write code (the selected cell has a green border,
and a pen icon appears at the top right of the interface). Click inside
a cell to enable the edit mode for this cell (you need to double-click with
Markdown cells).
Use the command mode to operate on cells (the selected cell has a gray
border, and there is no pen icon). Click outside the text area of a cell to
enable the command mode (you can also press the Esc key).
Keyboard shortcuts are available in the Notebook interface. Type h to show them.
We review here the most common ones (for Windows and Linux; shortcuts for
OS X may be slightly different).
Shift + Enter: run the cell and select the cell below
Alt + Enter: run the cell and insert a new cell below
Hello world
Open a new notebook and type the following in the first cell:
In [1]: print("Hello world!")
Out[1]: Hello world!
Here is a screenshot:
Prompt string
Note that the convention chosen in this book is to show Python code
(also called the input) prefixed with In [x]: (which shouldn't be
typed). This is the standard IPython prompt. Here, you should just type
print("Hello world!") and then press Shift + Enter.
Variables
Let's use Python as a calculator.
In [2]: 2 * 2
Out[2]: 4
Division
In Python 3, 3 / 2 returns 1.5 (floating-point division), whereas it returns
1 in Python 2 (integer division). This can be a source of errors when
porting Python 2 code to Python 3. It is recommended to always use
the explicit 3.0 / 2.0 for floating-point division (by using floating-point
numbers) and 3 // 2 for integer division. Both syntaxes work in Python
2 and Python 3. See
http://python3porting.com/differences.html#integer-division for more details.
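As a quick illustration of the two division operators, here is a minimal sketch of how they behave in Python 3. The variable names are only illustrative.

```python
# Behavior shown is for Python 3.
true_division = 3 / 2        # floating-point division
floor_division = 3 // 2      # integer (floor) division
negative_floor = -3 // 2     # floor division rounds toward negative infinity
float_division = 3.0 / 2.0   # floating-point division in Python 2 and 3

print(true_division, floor_division, negative_floor, float_division)
```

Note that // rounds down rather than toward zero, which matters for negative numbers.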
There are different types of variables. Here, we have used a number (more precisely,
an integer). Other important types include floating-point numbers to represent real
numbers, strings to represent text, and booleans to represent True/False values.
Here are a few examples:
In [6]: somefloat = 3.1415  # a floating-point number
        sometext = 'pi is about'  # a string
        print(sometext, somefloat)  # display several values
Note how we used the # character to write comments. Whereas Python discards the
comments completely, adding comments in the code is important when the code is
to be read by other humans (including yourself in the future).
String escaping
String escaping refers to the ability to insert special characters in a string. For
example, how can you insert ' and ", given that these characters are used to delimit
a string in Python code? The backslash \ is the go-to escape character in Python (and
in many other languages too). Here are a few examples:
In [7]: print("Hello \"world\"")
        print("A list:\n* item 1\n* item 2")
        print("C:\\path\\on\\windows")
        print(r"C:\path\on\windows")
Out[7]: Hello "world"
        A list:
        * item 1
        * item 2
        C:\path\on\windows
        C:\path\on\windows
The special character \n is the new line (or line feed) character. To insert a backslash,
you need to escape it, which explains why it needs to be doubled as \\.
You can also disable escaping by using raw literals with an r prefix before the string,
like in the last example above. In this case, backslashes are considered as normal
characters.
This is convenient when writing Windows paths, since Windows uses backslash
separators instead of forward slashes like on Unix systems. A very common error on
Windows is forgetting to escape backslashes in paths: writing "C:\path" may lead
to subtle errors.
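To make this pitfall concrete, here is a small sketch. The path C:\new is a made-up example, chosen because \n happens to be a valid escape sequence, so Python raises no error.

```python
# Hypothetical Windows path, used only for illustration.
unescaped = "C:\new"      # \n is silently interpreted as a newline character
escaped = "C:\\new"       # the backslash is escaped explicitly
raw = r"C:\new"           # raw literal: the backslash is a normal character

print(len(unescaped), len(escaped), len(raw))
print(escaped == raw)
```

The unescaped string is one character shorter than the other two, because the backslash and the n have been merged into a single newline character.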
You will find the list of special characters in Python at
https://docs.python.org/3.4/reference/lexical_analysis.html#string-and-bytes-literals.
Lists
A list contains a sequence of items. You can concisely instruct Python to perform
repeated actions on the elements of a list. Let's first create a list of numbers as
follows:
In [8]: items = [1, 3, 0, 4, 1]
Note the syntax we used to create the list: square brackets [], and commas , to
separate the items.
The built-in function len() returns the number of elements in a list:
In [9]: len(items)
Out[9]: 5
Now, let's compute the sum of all elements in the list. Python provides a built-in
function for this:
In [10]: sum(items)
Out[10]: 9
We can also access individual elements in the list, using the following syntax:
In [11]: items[0]
Out[11]: 1
In [12]: items[-1]
Out[12]: 1
Note that indexing starts at 0 in Python: the first element of the list is indexed by 0,
the second by 1, and so on. Also, -1 refers to the last element, -2 to the penultimate
element, and so on.
The same syntax can be used to alter elements in the list:
In [13]: items[1] = 9
         items
Out[13]: [1, 9, 0, 4, 1]
Here, 1:3 represents a slice going from element 1 included (this is the second element
of the list) to element 3 excluded. Thus, we get a sublist with the second and third
elements of the original list. The first-included/last-excluded asymmetry leads to an
intuitive treatment of consecutive slices: they share a boundary index without
overlapping. Also, note that slicing a list creates a new list (a shallow copy), not a
dynamic view of the original; changing elements in the sublist does not change them
in the original list. NumPy arrays behave differently: slicing an array returns a view.
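The following sketch, which assumes the list has the value shown earlier in this section, illustrates how consecutive slices fit together and checks what happens to the original list when a sublist is modified. The names head, tail, and sub are illustrative.

```python
items = [1, 9, 0, 4, 1]

# Consecutive slices share their boundary index without overlapping.
head = items[:2]
tail = items[2:]
print(head + tail == items)

# A slice of a built-in list is a new list:
# modifying it leaves the original list untouched.
sub = items[1:3]
sub[0] = 100
print(sub, items)
```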
Python provides several other types of containers:
Dictionaries contain key-value pairs. They are extremely useful and common:
In [16]: my_dict = {'a': 1, 'b': 2, 'c': 3}
         print('a:', my_dict['a'])
Out[16]: a: 1
In [17]: print(my_dict.keys())
Out[17]: dict_keys(['c', 'a', 'b'])
A Python object is mutable if its value can change after it has been
created. Otherwise, it is immutable. For example, a string is immutable;
to change it, a new string needs to be created. A list, a dictionary, or a
set is mutable; elements can be added or removed. By contrast, a tuple
is immutable, and it is not possible to change the elements it contains
without recreating the tuple. See
https://docs.python.org/3.4/reference/datamodel.html for more details.
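As a minimal sketch of the difference between mutable and immutable objects, the following code modifies a list in place, fails to modify a tuple, and creates a new string from an existing one. All names are illustrative.

```python
numbers = [1, 2]
numbers.append(3)          # lists are mutable: modified in place

point = (1, 2)
try:
    point[0] = 5           # tuples are immutable: this raises TypeError
except TypeError as error:
    tuple_error = str(error)

text = "hello"
shouted = text.upper()     # a new string is created; text is unchanged

print(numbers, point, text, shouted)
```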
Loops
We can run through all elements of a list using a for loop:
In [19]: for item in items:
             print(item)
Out[19]: 1
         9
         0
         4
         1
The for item in items syntax means that a temporary variable named
item is created at every iteration. This variable contains the value of each
item in the list, one at a time.
Note the colon : at the end of the for statement. Forgetting it will lead to a
syntax error!
The statement print(item) will be executed for all items in the list.
Note the four spaces before print: this is called the indentation. You will
find more details about indentation in the next subsection.
This is called a list comprehension. A new list is created here; it contains the squares
of all numbers in the list. This concise syntax leads to highly readable and Pythonic
code.
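For reference, a comprehension of this kind can be sketched as follows. The names squares and squares_loop are illustrative, the list is assumed to have the value used earlier in this section, and the explicit loop is shown only for comparison.

```python
items = [1, 9, 0, 4, 1]

# List comprehension: one expression builds the new list.
squares = [item * item for item in items]

# Equivalent explicit loop, shown for comparison.
squares_loop = []
for item in items:
    squares_loop.append(item * item)

print(squares)
```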
Indentation
Indentation refers to the spaces that may appear at the beginning of some lines of
code. This is a particular aspect of Python's syntax.
In most programming languages, indentation is optional and is generally used
to make the code visually clearer. But in Python, indentation also has a syntactic
meaning. Particular indentation rules need to be followed for Python code to be
correct.
In general, there are two ways to indent some text: by inserting a tab character
(also referred to as \t), or by inserting a number of spaces (typically, four). It is
recommended to use spaces instead of tab characters. Your text editor should be
configured such that the Tab key on the keyboard inserts four spaces instead of a tab
character.
In the Notebook, indentation is automatically configured properly; so you shouldn't
worry about this issue. The question only arises if you use another text editor for
your Python code.
Finally, what is the meaning of indentation? In Python, indentation delimits coherent
blocks of code, for example, the contents of a loop, a conditional branch, a function,
and other objects. Where other languages such as C or JavaScript use curly braces to
delimit such blocks, Python uses indentation.
Conditional branches
Sometimes, you need to perform different operations on your data depending on
some condition. For example, let's display all even numbers in our list:
In [21]: for item in items:
             if item % 2 == 0:
                 print(item)
Out[21]: 0
         4
If a and b are two integers, the modulo operator a % b returns the remainder
from the division of a by b. Here, item % 2 is 0 for even numbers, and 1 for
odd numbers.
Like with the for loop, the if statement ends with a colon :.
The part of the code that is executed when the condition is satisfied follows
the if statement. It is indented. Indentation is cumulative: since this if is
inside a for loop, there are eight spaces before the print(item) statement.
Python supports a concise syntax to select all elements in a list that satisfy certain
properties. Here is how to create a sublist with only even numbers:
In [22]: even = [item for item in items if item % 2 == 0]
         even
Out[22]: [0, 4]
Functions
Code is typically organized into functions. A function encapsulates part of your
code. Functions allow you to reuse bits of functionality without copy-pasting the
code. Here is a function that tells whether an integer number is even or not:
In [23]: def is_even(number):
             """Return whether an integer is even or not."""
             return number % 2 == 0
The body of the function is indented (and note the colon : at the end of the
def statement).
The return keyword in the body of the function specifies the output of the
function. Here, the output is a Boolean, obtained from the expression number
% 2 == 0. It is possible to return several values; just use a comma to separate
them (in this case, a tuple of Booleans would be returned).
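To illustrate the last point, here is a minimal sketch of a function returning two values. The function even_and_odd() and the variable names are hypothetical and serve only as an illustration.

```python
def even_and_odd(number):
    """Return two Booleans: whether number is even, and whether it is odd."""
    return number % 2 == 0, number % 2 == 1

# The two values come back as a single tuple...
result = even_and_odd(4)

# ...which can also be unpacked into separate variables.
flag_even, flag_odd = even_and_odd(3)

print(result, flag_even, flag_odd)
```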
There are two equivalent ways of specifying a keyword argument when calling a
function. They are as follows:
In [28]: remainder(5, 3)
Out[28]: 2
In [29]: remainder(5, divisor=3)
Out[29]: 2
In the first case, 3 is understood as the second argument, divisor. In the second
case, the name of the argument is given explicitly by the caller. This second syntax is
clearer and less error-prone than the first one.
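A minimal remainder() function consistent with these two calls could be sketched as follows. The default value divisor=2 is an assumption made for this illustration.

```python
def remainder(number, divisor=2):
    """Return the remainder of number divided by divisor (assumed default: 2)."""
    return number % divisor

print(remainder(5))                    # the default divisor is used
print(remainder(5, 3))                 # divisor passed by position
print(remainder(5, divisor=3))         # divisor passed by name
print(remainder(divisor=3, number=5))  # named arguments can come in any order
```

Passing arguments by name also makes the call independent of the order in which the parameters were declared.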
Functions can also accept arbitrary sets of positional and keyword arguments, using
the following syntax:
In [30]: def f(*args, **kwargs):
             print("Positional arguments:", args)
             print("Keyword arguments:", kwargs)
In [31]: f(1, 2, c=3, d=4)
Out[31]: Positional arguments: (1, 2)
         Keyword arguments: {'c': 3, 'd': 4}
Inside the function, args is a tuple containing positional arguments, and kwargs is a
dictionary containing keyword arguments.
Passage by assignment
When passing a parameter to a Python function, a reference to the object is actually
passed (passage by assignment). Here is an example:
In [32]: my_list = [1, 2]
         def add(some_list, value):
             some_list.append(value)
         add(my_list, 3)
         my_list
Out[32]: [1, 2, 3]
The add() function modifies an object defined outside it (in this case, the object
my_list); we say this function has side-effects. A function with no side-effects
is called a pure function: it doesn't modify anything in the outer context, and it
deterministically returns the same result for any given set of inputs. Pure functions
are to be preferred over functions with side-effects.
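For comparison, a pure counterpart of add() might look like the following sketch. The name add_pure() is hypothetical; the function builds and returns a new list instead of modifying its argument.

```python
def add_pure(some_list, value):
    """Return a new list with value appended; some_list is left unchanged."""
    return some_list + [value]

my_list = [1, 2]
new_list = add_pure(my_list, 3)

print(my_list, new_list)
```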
Knowing this can help you spot subtle bugs. There are further related concepts
that are useful to know, including function scopes, naming, binding, and more.
Errors
Let's talk about errors in Python. As you learn, you will inevitably come across
errors and exceptions. The Python interpreter will most of the time tell you what the
problem is, and where it occurred. It is important to understand the vocabulary used
by Python so that you can more quickly find and correct your errors.
Let's see the following example:
In [33]: def divide(a, b):
             return a / b
In [34]: divide(1, 0)
Out[34]: --------------------------------------------------------
         ZeroDivisionError
         <ipython-input-2-b77ebb6ac6f6> in <module>()
         ----> 1 divide(1, 0)
         <ipython-input-1-5c74f9fd7706> in divide(a, b)
               1 def divide(a, b):
         ----> 2     return a / b
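When an error like this one is expected, it can be caught rather than left to interrupt the program. The sketch below is an illustration using Python's standard try/except statement with the same divide() function; the fallback value None is an arbitrary choice.

```python
def divide(a, b):
    return a / b

try:
    result = divide(1, 0)
except ZeroDivisionError as error:
    # The exception object carries the message shown at the end of a traceback.
    result = None
    message = str(error)
    print("Could not divide:", message)
```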
Object-oriented programming
Object-oriented programming (OOP) is a relatively advanced topic. Although we
won't use it much in this book, it is useful to know the basics. Also, mastering OOP is
often essential when you start to have a large code base.
In Python, everything is an object. A number, a string, or a function is an object. An
object is an instance of a type (also known as class). An object has attributes and
methods, as specified by its type. An attribute is a variable bound to an object, giving
some information about it. A method is a function that applies to the object.
For example, the object 'hello' is an instance of the built-in str type (string). The
type() function returns the type of an object, as shown here:
In [35]: type('hello')
Out[35]: str
There are native types, like str or int (integer), and custom types, also called
classes, that can be created by the user.
In IPython, you can discover the attributes and methods of any object with the
dot syntax and tab completion. For example, typing 'hello'.u and pressing Tab
automatically shows us the existence of the upper() method:
In [36]: 'hello'.upper()
Out[36]: 'HELLO'
Here, upper() is a method available to all str objects; it returns an uppercase copy
of a string.
A useful string method is format(). This simple and convenient templating system
lets you generate strings dynamically, as shown in the following example:
In [37]: 'Hello {0:s}!'.format('Python')
Out[37]: 'Hello Python!'
The {0:s} syntax means "replace this with the first argument of format(), which
should be a string". The variable type after the colon is especially useful for numbers,
where you can specify how to display the number (for example, .3f to display
three decimals). The 0 makes it possible to replace a given value several times in a
given string. You can also use a name instead of a position, for example 'Hello
{name}!'.format(name='Python').
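The options described above can be sketched as follows. The values being formatted are arbitrary and the variable names are illustrative.

```python
greeting = 'Hello {0:s}!'.format('Python')        # string argument
rounded = '{0:.3f}'.format(3.14159)               # three decimals
repeated = '{0}, {0} and {1}'.format('a', 'b')    # position 0 used twice
named = 'Hello {name}!'.format(name='Python')     # name instead of position

print(greeting, rounded, repeated, named)
```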
Some methods are prefixed with an underscore _; they are private and are generally
not meant to be used directly. IPython's tab completion won't show you these private
attributes and methods unless you explicitly type _ before pressing Tab.
In practice, the most important thing to remember is that appending a dot . to
any Python object and pressing Tab in IPython will show you a lot of functionality
pertaining to that object.
Functional programming
Python is a multi-paradigm language; it notably supports imperative,
object-oriented, and functional programming models. Python functions are objects and
can be handled like other objects. In particular, they can be passed as arguments to
other functions (also called higher-order functions). This is the essence of functional
programming.
Decorators provide a convenient syntax construct to define higher-order functions.
Here is an example using the is_even() function from the previous Functions
section:
In [38]: def show_output(func):
             def wrapped(*args, **kwargs):
                 output = func(*args, **kwargs)
                 print("The result is:", output)
             return wrapped
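Here is a sketch of two ways show_output() might be applied: by calling it directly on an existing function, or with the @ decorator syntax. The names verbose_is_even and is_odd() are hypothetical. Note that wrapped() prints the result but does not return it, so the decorated functions return None.

```python
def show_output(func):
    def wrapped(*args, **kwargs):
        output = func(*args, **kwargs)
        print("The result is:", output)
    return wrapped

def is_even(number):
    """Return whether an integer is even or not."""
    return number % 2 == 0

# Calling the higher-order function directly returns a new function.
verbose_is_even = show_output(is_even)
verbose_is_even(3)

# The decorator syntax applies show_output() at definition time.
@show_output
def is_odd(number):
    return number % 2 == 1

is_odd(3)
```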
Python 2 and 3
Let's finish this section with a few notes about Python 2 and Python 3 compatibility
issues.
Some Python 2 code and libraries are still not compatible with Python
3. Therefore, it is sometimes useful to be aware of the differences between the
two versions. One of the most obvious differences is that print is a statement in
Python 2, whereas it is a function in Python 3. Therefore, print "Hello" (without
parentheses) works in Python 2 but not in Python 3, while print("Hello") works in
both Python 2 and Python 3.
There are several non-mutually exclusive options to write portable code that works
with both versions:
2to3 at https://docs.python.org/3.4/library/2to3.html
six at https://pythonhosted.org/six/
futures at https://docs.python.org/3.4/library/__future__.html
Here are some slightly more advanced concepts that you might find useful if you
want to strengthen your Python skills:
The pickle module for persisting Python objects on disk and exchanging
them across a network
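As a minimal sketch of what the pickle module does, the following code serializes a dictionary to bytes and restores it. The dictionary contents are made up for this illustration. Only unpickle data that comes from a source you trust, since unpickling can execute arbitrary code.

```python
import pickle

# Hypothetical data, used only for illustration.
data = {'name': 'example', 'values': [1, 2, 3]}

payload = pickle.dumps(data)       # serialize to a bytes object
restored = pickle.loads(payload)   # rebuild an equal, independent object

print(restored)
```

The functions pickle.dump() and pickle.load() work the same way with a file opened in binary mode.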
Python Cookbook, by David Beazley and Brian K. Jones, O'Reilly Media (advanced
level, highly recommended if you want to become a Python expert)
Open a terminal and type the following commands to go to the minibook's chapter1
directory and launch the Notebook server:
$ cd ~/minibook/chapter1/
$ jupyter notebook
In the Notebook dashboard, open the 15-ten.ipynb notebook. You can also create a
new notebook if you prefer not to use the book's code.
Let's illustrate how to use IPython as an extended shell. We will download an
example dataset, navigate through the filesystem, and open text files, all from
the Notebook. The dataset contains social network data of hundreds of volunteer
Facebook users. This BSD-licensed dataset is provided freely by Stanford's SNAP
project (http://snap.stanford.edu/data/).
IPython provides several magic commands that let you interact with your filesystem.
These commands are prefixed with a %. For example, here is how to display the
current working directory:
In [1]: %pwd
Out[1]: '/home/cyrille/minibook/chapter1'
Like most other magic commands, this magic command works on all
operating systems, including Windows. IPython implements several
cross-platform Python equivalents of common Unix commands like
pwd. For other commands not implemented by IPython, we need
to call shell commands directly with the ! prefix (as shown in the
following examples). This doesn't work well on Windows since many
of these commands are Unix-specific. In brief, %-prefixed commands
should work on all operating systems while !-prefixed commands will
generally only work on Linux and OS X, not Windows.
Let's download the dataset from the book's data repository
(https://github.com/ipython-books/minibook-2nd-data). IPython doesn't yet provide a magic
command for downloading data, but we can use another IPython trick: we can run
any system or terminal command from IPython by prefixing it with an exclamation
mark (!). For example, here is how to use the wget download utility only available
on Unix systems:
In [2]: !wget https://raw.githubusercontent.com/ipython-books/minibook-2nd-data/master/facebook.zip
This wget command downloads a file from a URL and saves it to a file in the local
filesystem. Let's display the list of files in the current directory using the %ls magic
command (available on all systems, even on Windows, since it is a magic command
provided by IPython), as follows:
In [3]: %ls
Out[3]: facebook.zip
[...]
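Since wget is usually not available on Windows, the download step can also be sketched in pure Python with the standard library. This is only one possible approach, assuming Python 3; the download() helper is a hypothetical name introduced here for illustration.

```python
import os
import urllib.request


def download(url, filename):
    """Download url to filename unless the file already exists."""
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)
    return filename


# Example call (requires network access), using the URL from the text:
# download('https://raw.githubusercontent.com/ipython-books/'
#          'minibook-2nd-data/master/facebook.zip', 'facebook.zip')
```

Skipping the download when the file is already present makes it safe to re-run the notebook cell several times.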
The next step is to unzip this file in the current directory. The first way of doing it
is to use your operating system, generally with a right-click on the icon. On Linux
and OS X, we can also use the unzip command-line tool (you may need to install it
first, for example with a command like sudo apt-get install unzip on Ubuntu).
Finally, it is also possible to do it in pure Python with the zipfile module (see
https://docs.python.org/3.4/library/zipfile.html).
Here, we'll call the unzip tool, which will only work on Linux and OS X, not
Windows:
In [4]: !unzip facebook.zip
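For readers on Windows, the pure Python route mentioned above could look like the following sketch based on the zipfile module. The unzip() function name is an illustrative choice, not something provided by IPython or by the book.

```python
import zipfile


def unzip(archive, destination='.'):
    """Extract all members of a ZIP archive into destination.

    Returns the list of member names found in the archive.
    """
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(destination)
        return zf.namelist()


# Example call, assuming facebook.zip is in the current directory:
# unzip('facebook.zip')
```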
Once the archive has been extracted, a new subdirectory named facebook appears,
as shown here:
In [5]: %ls
Out[5]: facebook
facebook.zip
[...]
Let's enter this subdirectory with the %cd magic command (all operating
systems), as follows:
In [6]: %cd facebook
Out[6]: /home/cyrille/minibook/chapter1/facebook
IPython provides a %bookmark magic to create an alias to the current directory. Let's
type the following:
In [7]: %bookmark fbdata
Now, in any future session, we'll be able to just type %cd fbdata to enter this
directory. Type %bookmark? to see all options. This magic command is helpful when
dealing with many directories.
In [8]: %ls
Out[8]: 0.circles    1684.circles  3437.circles  3980.circles  686.circles
        0.edges      1684.edges    3437.edges    3980.edges    686.edges
        107.circles  1912.circles  348.circles   414.circles   698.circles
        107.edges    1912.edges    348.edges     414.edges     698.edges
Here, every number identifies a Facebook user (called the ego user). The .edges file
contains its social graph. In this graph, nodes represent other Facebook users, and
edges represent friendship links between them. The .circles file contains lists of
friends.
Let's retrieve the list of .edges files with the following command (which won't work
on Windows):
In [9]: files = !ls -1 -S | grep .edges
The Unix command ls -1 -S lists all files in the current directory, sorted by
decreasing size. The pipe | grep .edges keeps only the files whose names contain .edges.
The resulting list is assigned to a new Python variable named files, as follows:
In [10]: files
Out[10]: ['1912.edges',
'107.edges',
'1684.edges',
'3437.edges',
'348.edges',
'0.edges',
'414.edges',
'686.edges',
'698.edges',
'3980.edges']
On Windows, you can use the following Python code to obtain the same list (if
you're not on Windows, you can skip this code listing):
In [11]: import os
from operator import itemgetter
# Get the name and file size of all .edges files.
files = [(file, os.stat(file).st_size)
for file in os.listdir('.')
if file.endswith('.edges')]
# Sort the list with the second item (file size),
# in decreasing order.
files = sorted(files,
key=itemgetter(1),
reverse=True)
# Only keep the first item (file name), in the same order.
files = [file for (file, size) in files]
Let's display the first few lines of the first file in the list (Unix-specific command):
In [12]: !head -n5 {files[0]}
Out[12]: 2290 2363
2346 2025
2140 2428
2201 2506
2425 2557
The curly braces {} let us insert a Python variable within a system command (here,
the head Unix command which displays the first lines of a text file).
In an .edges file, each line contains the two nodes that form one edge. The
.circles file contains lists of friends: each line contains a space-separated list of
the users that form one circle.
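To make the .edges format concrete, here is a small sketch that turns such lines into pairs of integers. The parse_edges() helper is hypothetical (it is not part of the book's code), and the sample lines are the first three lines displayed by the head command above.

```python
def parse_edges(lines):
    """Return a list of (node, node) integer pairs, one per valid line."""
    edges = []
    for line in lines:
        parts = line.split()
        # Skip blank or malformed lines rather than failing.
        if len(parts) == 2:
            edges.append((int(parts[0]), int(parts[1])))
    return edges


# First lines of the file shown above.
sample = ["2290 2363", "2346 2025", "2140 2428"]
edges = parse_edges(sample)
```

Because an open file object yields its lines when iterated, the same function could be applied to a real file with something like parse_edges(open(files[0])).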
Alias commands
If you use a complex command regularly, you can create an alias with
the %alias magic command. Type %alias? for more information. See
also the related %store magic command.
To obtain information about a magic command, append a question mark (?) after the
command, as shown in the following example:
In [14]: %history?
The %history magic command lets you display and manipulate your command
history in IPython. For example, the following command shows your last five
commands:
In [15]: %history -l 5
Out[15]: files = !ls -1 -S | grep .edges
files
!head -n5 {files[0]}
%lsmagic
%history?
Let's also mention the %dhist magic command that shows you a history of all visited
directories.
Another useful magic command is %paste, which lets you copy-paste Python code
from anywhere into the IPython console (it is not available in the Notebook, where
you can copy-paste as usual).
In IPython, the underscore (_) character always contains the last output. This is
useful if you ran some command and forgot to assign the output to a variable.
In [16]: # how many minutes in a day?
24 * 60
Out[16]: 1440
In [17]: # and in a year?
_ * 365
Out[17]: 525600
We will now see several cell magics, which are magic commands that apply to
a whole code cell rather than just a line of code. They are prefixed by two percent
signs (%%).
The %%capture cell magic lets you capture the standard output and error output of
some code into a Python variable. Here is an example (the outputs are captured in
the output Python variable):
In [18]: %%capture output
%ls
In [19]: output.stdout
Out[19]: 0.circles    1684.circles  3437.circles  3980.circles  686.circles
         0.edges      1684.edges    3437.edges    3980.edges    686.edges
         107.circles  1912.circles  348.circles   414.circles   698.circles
         107.edges    1912.edges    348.edges     414.edges     698.edges
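Outside IPython, a comparable effect can be sketched in plain Python with contextlib.redirect_stdout (available since Python 3.4). This is only a rough analogy to %%capture: it captures what Python code prints to standard output, not the error output or the output of external processes.

```python
import io
from contextlib import redirect_stdout

# Everything printed inside the with block goes to the buffer
# instead of the screen.
buffer = io.StringIO()
with redirect_stdout(buffer):
    print("Hello world!")

# The captured text, including the trailing newline added by print().
captured = buffer.getvalue()
```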
The %%bash cell magic is an extension of the ! shell prefix. It lets you run multiline
bash code in the Notebook, as shown here:
In [20]: %%bash
cd ..
touch _HEY
ls
rm _HEY
cd facebook
Out[20]: _HEY
facebook
facebook.zip
[...]
More generally, the %%script cell magic lets you execute code with any program
installed on your system. For example, assuming Haskell is installed (see
https://www.haskell.org/downloads), you can easily execute Haskell code from the
Notebook, as follows:
In [21]: %%script ghci
putStrLn "Hello world!"
Out[21]: GHCi, version 7.6.3: http://www.haskell.org/ghc/
:? for help
The ghci executable runs in a separate process, and the contents of the cell are
passed to the executable's input. You can also put a full path after %%script, for
example, on Linux: %%script /usr/bin/ghci.
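The mechanism described above (a separate process that receives the cell contents on its input) can be approximated with the subprocess module. The following sketch is an analogy, not IPython's actual implementation; run_script() is a hypothetical helper, and the Python interpreter itself stands in for ghci so that the example does not depend on Haskell being installed.

```python
import subprocess
import sys


def run_script(program, source):
    """Run program with source on its standard input; return its output."""
    process = subprocess.Popen(
        program,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # exchange text rather than bytes
    )
    stdout, _ = process.communicate(source)
    return stdout


# "python -" reads the program to execute from standard input.
output = run_script([sys.executable, '-'], 'print("Hello world!")\n')
```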
IHaskell kernel
This way of calling external scripts is only useful for quick interactive
experiments. If you want to run Haskell notebooks, you can use the
IHaskell kernel for Jupyter, available at
https://github.com/gibiansky/IHaskell.
Finally, the %%writefile cell magic lets you write some text in a new file, as shown
here:
In [22]: %%writefile myfile.txt
Hello world!
Out[22]: Writing myfile.txt
In [23]: !more myfile.txt
Out[23]: Hello world!
There are many other magic commands available. We will see several of them later
in this book. Also, in Chapter 6, Customizing IPython, we will see how to create new
magic commands. This is much easier than it sounds!
Refer to the following page for up-to-date documentation about all magic commands:
http://www.ipython.org/ipython-doc/dev/interactive/magics.html.
0.circles    1684.circles  3437.circles  3980.circles  686.circles
0.edges      1684.edges    3437.edges    3980.edges    686.edges
107.circles  1912.circles  348.circles   414.circles   698.circles
107.edges    1912.edges    348.edges     414.edges     698.edges
Now, start typing a command and press Tab before finishing it (here, press the Tab
key on your keyboard right after typing e), as follows:
!head -n5 107.e<TAB>
IPython automatically completes the command and adds the four remaining
characters (dges). IPython recognized the beginning of a file name and completed
the command. If there are several completion possibilities, IPython doesn't complete
anything, but instead shows a list of all options. You can then choose the appropriate
solution by pressing the Up or Down keys on the keyboard, and pressing Tab again.
The following screenshot shows an example:
Tab completion is extremely useful when you're getting acquainted with a new
Python package. For example, to quickly see all functions provided by the NetworkX
package, you can type import networkx; networkx.<TAB>.
Customizing tab completion
If you're writing a Python library, you probably want to write
tab-completion-aware code. Your users who work with IPython
will thank you! In most cases, you have nothing to do, and tab
completion will just work. In the rare cases where you use advanced
dynamic techniques in a class, you can customize tab completion
by implementing a __dir__(self) method that returns all
attributes available in the current class instance. See this reference
for more details:
https://docs.python.org/3.4/library/functions.html#dir.
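As a hedged illustration of the __dir__ technique, the hypothetical Config class below exposes dictionary keys as attributes through __getattr__ and advertises them through __dir__, so that tools relying on dir() can list them. The class and its attribute names are invented for this sketch.

```python
class Config(object):
    """Expose the keys of a dictionary as attributes (illustrative)."""

    def __init__(self, **options):
        self._options = dict(options)

    def __getattr__(self, name):
        # Called only when the normal attribute lookup fails.
        try:
            return self._options[name]
        except KeyError:
            raise AttributeError(name)

    def __dir__(self):
        # Advertise the dynamic attributes alongside the regular ones.
        return sorted(set(dir(type(self))) | set(self._options))


config = Config(color='red', width=3)
```

With this in place, 'color' and 'width' appear in dir(config), which is the information tab completion builds on.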
```python
print("Hello world!")
```
and images:

If you write this in a Markdown cell, and "play" the cell (for example, by pressing
Ctrl + Enter), you will see the rendered text. The following screenshot shows the two
modes of the cell:
By using both Markdown cells and code cells in a notebook, you can write an
interactive document about any technical topic. Hence, the Notebook is not only an
interface to code, it is also a platform to write documents or even books. In fact, this
very book is entirely written in the Notebook!
Here are a few references about Markdown and LaTeX:
Here is a screenshot:
The square(x) function just prints a sentence like The square of 7 is 49. By
adding the @interact decorator above the function's definition, we tell IPython
to create a widget to control the function's input x. The argument x=(0, 10) is a
convention to indicate that we want a slider to control an integer between 0 and 10.
This method supports other common controls like checkboxes, dropdown menus,
radio buttons, push buttons, and others.
Finally, entirely customizable widgets can be created, but this requires some
knowledge of web technologies such as HTML, CSS, and JavaScript. The IPython
Cookbook (http://ipython-books.github.io/cookbook/) contains many
examples. You can also refer to the following links for more information:
myscript.py. Such a script can be called from the system terminal like this: python
myscript.py. Python will execute the script and quit at the end. If you use the -i
option, Python will start the interactive prompt when the script ends.
IPython also supports this technique; just replace python with ipython. For example,
ipython -i script.py runs script.py interactively with IPython.
You can also run a script from within IPython by using the %run magic command.
The script runs in an empty namespace, meaning that any variable defined in the
interactive namespace is not available within the executed script. However, at the
end of the execution, the control returns to IPython, and the variables defined in
the script are imported into the interactive namespace. This lets you inspect the
intermediate variables used in the script. If you use the -i option, the script will run
in the interactive namespace. Any variable defined in the interactive session will be
available in the script.
Let's also mention the similar %load magic command.
A namespace is a dictionary mapping variable names to Python objects.
The global namespace contains global variables, whereas the local
namespace of a function contains the local variables defined in the
function. In IPython, the interactive namespace contains all objects
defined and imported within the current interactive session. The %who,
%whos, and %who_ls magic commands give you some information
about the interactive variables.
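The difference between an empty namespace and the interactive namespace can be sketched with plain dictionaries and the built-in exec() function. This is a simplified analogy under stated assumptions, not how %run is implemented; the variable names script, interactive, and empty are chosen here for illustration.

```python
# A one-line "script" that relies on a variable defined elsewhere.
script = "ids = sorted(set(identifiers))"

# Stand-in for the interactive namespace: a plain dictionary.
interactive = {"identifiers": [3, 1, 3]}

# Comparable to %run: the script starts from an empty namespace,
# so it cannot see the interactive variables.
empty = {}
try:
    exec(script, empty)
    error = None
except NameError as exc:
    error = str(exc)

# Comparable to %run -i: the script runs in the interactive namespace,
# and the variables it defines end up there too.
exec(script, interactive)
```

After this code runs, error holds the NameError message, while interactive contains a new ids entry computed by the script.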
For example, let's write a script egos.py that lists all ego identifiers in the Facebook
data folder. Since each filename is of the form <egoid>.<extension>, we list all files,
remove the extensions, and take the sorted list of all unique identifiers. We can create
this file from the Notebook, using the %%writefile cell magic as follows:
In [28]: %cd fbdata
%cd ..
Out[28]: (bookmark:fbdata) -> /home/cyrille/minibook/chapter1/facebook
/home/cyrille/minibook/chapter1/facebook
In [29]: %%writefile egos.py
import sys
import os
# We retrieve the folder as the first positional argument
# to the command-line call
if len(sys.argv) > 1:
    folder = sys.argv[1]
# We list all files in the specified folder
files = os.listdir(folder)
# identifiers contains the ego identifier of every file
# (with duplicates)
identifiers = [int(file.split('.')[0]) for file in files]
# Finally, we remove duplicates with set(), and sort the list
# with sorted().
ids = sorted(set(identifiers))
Out[29]: Overwriting egos.py
This script accepts an argument folder as an input. It is retrieved from the Python
script via the sys.argv list, which contains the list of arguments passed to the script
via the command-line interface.
Let's execute this script in IPython using the %run magic command, as follows:
In [30]: %run egos.py facebook
If you get an error when running this script, make sure that the
facebook directory only contains <number>.xxx files (like
0.circles or 1684.edges).
In [31]: ids
Out[31]: [0, 107, 348, 414, 686, 698, 1684, 1912, 3437, 3980]
The ids variable created in the script is now available in the interactive namespace.
Let's see what happens if we do not specify the folder name to the script, as follows:
In [32]: folder = 'facebook'
In [33]: %run egos.py
We get an error: NameError: name 'folder' is not defined. This is because the
variable folder is defined in the interactive namespace, but is not available within
the script by default. We can change this behavior with the -i option, as follows:
In [34]: %run -i egos.py
In [35]: ids
Out[35]: [0, 107, 348, 414, 686, 698, 1684, 1912, 3437, 3980]
This shows the docstring and other information in the Notebook pager, as shown in
the following screenshot:
Typing ?? instead of ? shows even more information, including the whole source
code of the Python object when it is available.
There are also several magic commands for inspecting Python objects:
%pfile: Displays the source code of the Python script where an object is
defined
The entire list of commands can be found in the documentation of the pdb module in
Python at https://docs.python.org/3.4/library/pdb.html.
Let's also mention the IPython.embed() function that you can call anywhere in
a Python script. This stops the script execution and starts IPython for debugging
purposes. Leaving the embedded IPython terminal resumes the normal execution of
the script.
Multiple calls are done in order to get more reliable time estimates. The number of
calls is determined automatically, but you can use the -r and -n options to specify
them directly. Type %timeit? to get more information.
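The same idea of repeated calls exists in Python's standard timeit module, on which a plain-Python sketch can be based. Here, the repeat argument plays a role similar to the -r option and number a role similar to -n; the timed statement is an arbitrary example.

```python
import timeit

# 3 independent runs (like -r 3), each executing the statement
# 1000 times (like -n 1000).
times = timeit.repeat("sum(range(100))", repeat=3, number=1000)

# Best time per call, in seconds: taking the minimum reduces the
# influence of other processes running on the machine.
best = min(times) / 1000
```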
Now we write a function that returns the number of connected components in all
graphs defined in the directory, as follows:
In [46]: import glob
def ncomponents_files():
    return [(file, ncomponents(file))
            for file in sorted(glob.glob('*.edges'))]
Out[47]: 0.edges
5 component(s)
107.edges
1 component(s)
1684.edges
4 component(s)
1912.edges
2 component(s)
3437.edges
2 component(s)
348.edges
1 component(s)
3980.edges
4 component(s)
414.edges
2 component(s)
686.edges
1 component(s)
698.edges
3 component(s)
Now, to run the profiler, we use the %prun magic function, as follows:
In [49]: %prun -s cumtime ncomponents_files()
Out[49]: 2391070 function calls in 1.038 seconds
Ordered by: cumulative time
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     1    0.000    0.000    1.038    1.038  {built-in method exec}
     1    0.000    0.000    1.038    1.038  <string>:1(<module>)
    10    0.000    0.000    0.995    0.100  <string>:1(read_edgelist)
    10    0.000    0.000    0.995    0.100  decorators.py:155(_open_file)
    10    0.376    0.038    0.995    0.099  edgelist.py:174(parse_edgelist)
170174    0.279    0.000    0.350    0.000  graph.py:648(add_edge)
170184    0.059    0.000    0.095    0.000  edgelist.py:366(<genexpr>)
    10    0.000    0.000    0.021    0.002  connected.py:98(number_connected_components)
    35    0.001    0.000    0.021    0.001  connected.py:22(connected_components)
Let's explain what happened here. The profiler kept track of all function calls
(including functions internal to NetworkX and Python) performed while our
ncomponents_files() function was running. There were 2,391,070 function calls.
That's a lot! Opening a file, reading and parsing every line, creating the graphs,
finding the number of connected components, and so on, are operations that involve
many function calls.
The profiler shows the list of all function calls (we just showed a subset here). There
are many ways to sort the functions. Here, we chose to sort them by cumulative time,
which is the total time spent within every function (-s cumtime option).
For every function, the profiler shows the total number of calls, and several time
statistics, described here (copied verbatim from the profiler documentation):
tottime: the total time spent in the given function (and excluding time made
in calls to sub-functions)
You will find more information by typing %prun? or by looking here:
https://docs.python.org/3.4/library/profile.html
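For reference, a similar report can be produced outside IPython with the standard cProfile and pstats modules. The sketch below is an assumption-laden stand-in: the work() function is a toy workload invented here in place of ncomponents_files(), and the report is limited to five entries.

```python
import cProfile
import io
import pstats


def work():
    """Toy workload standing in for ncomponents_files()."""
    return sum(i * i for i in range(10000))


# Record all function calls made while work() is running.
profiler = cProfile.Profile()
profiler.enable()
result = work()
profiler.disable()

# Sort by cumulative time (like -s cumtime) and keep the top 5 entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative').print_stats(5)
report = stream.getvalue()
```

Printing report displays a table with the same columns as the %prun output (ncalls, tottime, percall, cumtime, and so on).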
Here, we see that computing the number of connected components took considerably
less time than loading the graphs from the text files. Depending on the use-case, this
might suggest using a more efficient file format.
There is of course much more to say about profiling and optimization. For example,
it is possible to profile a function line by line, which provides an even more
fine-grained profiling report. The IPython Cookbook contains many more details.
Summary
In this chapter, we covered everything you need to get started with Python, IPython,
and the Jupyter Notebook. We detailed how to install the software, we reviewed
the basics of the Python language, and we demonstrated ten of the most essential
features of IPython and the Jupyter Notebook.
In the next chapter, we will use these tools to analyze real-world datasets.