Scientific Computing With Python
Robert Johansson
April 16, 2016
Contents

1 Introduction to scientific computing with Python
  1.1 The role of computing in science
      1.1.1 References
  1.2 Requirements on scientific computing
      1.2.1 Tools for managing source code
  1.3 What is Python?
  1.4 What makes python suitable for scientific computing?
      1.4.1 The scientific python software stack
      1.4.2 Python environments
      1.4.3 Python interpreter
      1.4.4 IPython
      1.4.5 IPython notebook
      1.4.6 Spyder
  1.5 Versions of Python
  1.6 Installation
      1.6.1 Conda
      1.6.2 Linux
      1.6.3 MacOS X
      1.6.4 Windows
  1.7 Further reading
  1.8 Python and module versions

2 Introduction to Python programming
  2.1 Python program files
      2.1.1 Example
      2.1.2 Character encoding
  2.2 IPython notebooks
  2.3 Modules
      2.3.1 References
      2.3.2 Looking at what a module contains, and its documentation
  2.4 Variables and types
      2.4.1 Symbol names
      2.4.2 Assignment
      2.4.3 Fundamental types
      2.4.4 Type utility functions
      2.4.5 Type casting
  2.5 Operators and comparisons
  2.6 Compound types: Strings, List and dictionaries
      2.6.1 Strings
      2.6.2 List
      2.6.3 Tuples
      2.6.4 Dictionaries
  2.7 Control Flow
      2.7.1 Conditional statements: if, elif, else
  2.8 Loops
      2.8.1 for loops
      2.8.2 List comprehensions: Creating lists using for loops
      2.8.3 while loops
  2.9 Functions
      2.9.1 Default argument and keyword arguments
      2.9.2 Unnamed functions (lambda function)
  2.10 Classes
  2.11 Modules
  2.12 Exceptions
  2.13 Further reading
  2.14 Versions
Chapter 1 Introduction to scientific computing with Python
1.1 The role of computing in science
Science has traditionally been divided into experimental and theoretical disciplines, but during the last
several decades computing has emerged as a very important part of science. Scientific computing is often
closely related to theory, but it also has many characteristics in common with experimental work. It is
therefore often viewed as a new third branch of science. In most fields of science, computational work is an
important complement to both experiments and theory, and nowadays a vast majority of both experimental
and theoretical papers involve some numerical calculations, simulations or computer modeling.
In experimental and theoretical sciences there are well established codes of conduct for how results
and methods are published and made available to other scientists. For example, in theoretical sciences,
derivations, proofs and other results are published in full detail, or made available upon request. Likewise,
in experimental sciences, the methods used and the results are published, and all experimental data should
be available upon request. It is considered unscientific to withhold crucial details in a theoretical proof or
experimental method that would hinder other scientists from replicating and reproducing the results.
In computational sciences there are not yet any well established guidelines for how source code and
generated data should be handled. For example, it is relatively rare that source code used in simulations for
published papers is provided to readers, in contrast to the open nature of experimental and theoretical work.
And it is not uncommon that source code for simulation software is withheld and considered a competitive
advantage (or unnecessary to publish).
However, this issue has recently started to attract increasing attention, and a number of editorials in
high-profile journals have called for increased openness in computational sciences. Some prestigious journals,
including Science, have even started to demand that authors provide the source code for simulation software
used in publications to readers upon request.
Discussions are also ongoing on how to facilitate distribution of scientific software, for example as supplementary materials to scientific papers.
1.1.1
References
Reproducible Research in Computational Science, Roger D. Peng, Science 334, 1226 (2011).
Shining Light into Black Boxes, A. Morin et al., Science 336, 159-160 (2012).
The case for open computer programs, D.C. Ince, Nature 482, 485 (2012).
1.2 Requirements on scientific computing
Replication and reproducibility are two of the cornerstones in the scientific method. With respect to
numerical work, complying with these concepts has the following practical implications:
Replication: An author of a scientific paper that involves numerical calculations should be able to
rerun the simulations and replicate the results upon request. Other scientists should also be able to
perform the same calculations and obtain the same results, given the information about the methods
used in a publication.
Reproducibility: The results obtained from numerical simulations should be reproducible with an
independent implementation of the method, or using a different method altogether.
In summary: A sound scientific result should be reproducible, and a sound scientific study should be
replicable.
To achieve these goals, we need to:
Keep and take note of exactly which source code and version was used to produce the data and figures
in published papers.
Record information about which versions of external software were used, and keep access to the environment
that was used.
Make sure that old codes and notes are backed up and kept for future reference.
Be ready to give additional information about the methods used, and perhaps also the simulation
codes, to an interested reader who requests it (even years after the paper was published!).
Ideally, codes should be published online, to make it easier for other scientists interested in the codes
to access them.
1.2.1 Tools for managing source code
Ensuring replicability and reproducibility of scientific simulations is a complicated problem, but there are
good tools to help with this:
Revision Control System (RCS) software.
Good choices include:
git - http://git-scm.com
mercurial - http://mercurial.selenic.com. Also known as hg.
subversion - http://subversion.apache.org. Also known as svn.
Online repositories for source code. Available as both private and public repositories.
Some good alternatives are
Github - http://www.github.com
Bitbucket - http://www.bitbucket.com
Privately hosted repositories on the university's or department's servers.
Note
Repositories are also excellent for version controlling manuscripts, figures, thesis files, data files, lab logs,
etc. Basically for any digital content that must be preserved and is frequently updated. Again, both public
and private repositories are readily available. They are also excellent collaboration tools!
1.3
What is Python?
1.4 What makes python suitable for scientific computing?
1.4.1 The scientific python software stack
1.4.2
Python environments
Python is not only a programming language, but often also refers to the standard implementation of the
interpreter (technically referred to as CPython) that actually runs the python code on a computer.
There are also many different environments through which the python interpreter can be used. Each
environment has different advantages and is suitable for different workflows. One strength of python is that
it is versatile and can be used in complementary ways, but it can be confusing for beginners so we will start
with a brief survey of python environments that are useful for scientific computing.
1.4.3
Python interpreter
The standard way to use the Python programming language is to use the Python interpreter to run python
code. The python interpreter is a program that reads and executes the python code in files passed to it as
arguments. At the command prompt, the command python is used to invoke the Python interpreter.
For example, to run a file my-program.py that contains python code from the command prompt, use:
$ python my-program.py
We can also start the interpreter by simply typing python at the command line, and interactively type
python code into the interpreter.
This is often how we want to work when developing scientific applications, or when doing small calculations. But the standard python interpreter is not very convenient for this kind of work, due to a number of
limitations.
1.4.4
IPython
IPython is an interactive shell that addresses the limitations of the standard python interpreter, and it is a
work-horse for scientific use of python. It provides an interactive prompt to the python interpreter with a
greatly improved user-friendliness.
Some of the many useful features of IPython include:
Command history, which can be browsed with the up and down arrows on the keyboard.
Tab auto-completion.
In-line editing of code.
Object introspection, and automatic extract of documentation strings from python objects like classes
and functions.
Good interaction with operating system shell.
Support for multiple parallel back-end processes, that can run on computing clusters or cloud services
like Amazon EC2.
1.4.5
IPython notebook
IPython notebook is an HTML-based notebook environment for Python, similar to Mathematica or Maple.
It is based on the IPython shell, but provides a cell-based environment with great interactivity, where
calculations can be organized and documented in a structured way.
Although it uses a web browser as its graphical interface, IPython notebooks are usually run locally, from the
same computer that runs the browser. To start a new IPython notebook session, run the following command:
$ ipython notebook
from a directory where you want the notebooks to be stored. This will open a new browser window (or
a new tab in an existing window) with an index page where existing notebooks are shown and from which
new notebooks can be created.
1.4.6
Spyder
Spyder is a MATLAB-like IDE for scientific computing with python. It has the many advantages of a
traditional IDE environment, for example that everything from code editing, execution and debugging is
carried out in a single environment, and work on different calculations can be organized as projects in the
IDE environment.
Some advantages of Spyder:
Powerful code editor, with syntax high-lighting, dynamic code introspection and integration with the
python debugger.
Variable explorer, IPython command prompt.
Integrated documentation and help.
1.5
Versions of Python
There are currently two versions of python: Python 2 and Python 3. Python 3 will eventually supersede
Python 2, but it is not backward-compatible with Python 2. A lot of existing python code and packages
have been written for Python 2, and it is still the most widespread version. For these lectures either version
will be fine, but it is probably easier to stick with Python 2 for now, because it is more readily available via
prebuilt packages and binary installers.
To see which version of Python you have, run
$ python --version
Python 2.7.3
$ python3.2 --version
Python 3.2.3
Several versions of Python can be installed in parallel, as shown above.
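Inside a program, the interpreter version can be inspected with the sys module, which is one way to write scripts that behave sensibly under both Python 2 and Python 3. The following snippet is an illustrative sketch rather than part of these notes:

from __future__ import print_function  # make print() behave the same under Python 2 and 3
import sys

# sys.version_info holds the version of the interpreter running this code
if sys.version_info[0] >= 3:
    print("running under Python 3:", sys.version.split()[0])
else:
    print("running under Python 2:", sys.version.split()[0])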
1.6 Installation
1.6.1 Conda
The best way to set up a scientific Python environment is to use the cross-platform package manager conda
from Continuum Analytics. First download and install miniconda http://conda.pydata.org/miniconda.html
or Anaconda (see below). Next, to install the required libraries for these notebooks, simply run:
$ conda install ipython ipython-notebook spyder numpy scipy sympy matplotlib cython
This should be sufficient to get a working environment on any platform supported by conda.
1.6.2
Linux
1.6.3
MacOS X
Macports
Python is included by default in Mac OS X, but for our purposes it will be useful to install a new python
environment using Macports, because it makes it much easier to install all the required additional packages.
Using Macports, we can install what we need with:
$ sudo port install py27-ipython +pyside+notebook+parallel+scientific
$ sudo port install py27-scipy py27-matplotlib py27-sympy
$ sudo port install py27-spyder
To associate the commands python and ipython with the versions installed via Macports (instead
of the ones shipped with Mac OS X), run the following commands:
$ sudo port select python python27
$ sudo port select ipython ipython27
Fink
Or, alternatively, you can use the Fink package manager. After installing Fink, use the following command
to install python and the packages that we need:
$ sudo fink install python27 ipython-py27 numpy-py27 matplotlib-py27 scipy-py27 sympy-py27
$ sudo fink install spyder-mac-py27
1.6.4
Windows
Windows lacks a good packaging system, so the easiest way to setup a Python environment is to install a
pre-packaged distribution. Some good alternatives are:
Enthought Python Distribution. EPD is a commercial product but is available free for academic use.
Anaconda. The Anaconda Python distribution comes with many scientific computing and data science
packages and is free, including for commercial use and redistribution. It also has add-on products such
as Accelerate, IOPro, and MKL Optimizations, which have free trials and are free for academic use.
Python(x,y). Fully open source.
Note
EPD and Anaconda are also available for Linux and Mac OS X.
1.7
Further reading
1.8 Python and module versions
Since there are several different versions of Python and each Python package has its own release cycle and
version number (for example scipy, numpy, matplotlib, etc., which we installed above and will discuss in
detail in the following lectures), it is important for the reproducibility of an IPython notebook to record
the versions of all these different software packages. If this is done properly it will be easy to reproduce the
environment that was used to run a notebook, but if not it can be hard to know what was used to produce
the results in a notebook.
To encourage the practice of recording Python and module versions in notebooks, I've created a simple
IPython extension that produces a table with version numbers of selected software components. I believe
that it is a good practice to include this kind of table in every notebook you create.
To install this IPython extension, use pip install version_information:
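Assuming the extension is published under the name version_information (the exact name is an assumption based on the description above), it is loaded and used from within a notebook roughly as follows:

%load_ext version_information
%version_information numpy, scipy, matplotlib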
Chapter 2 Introduction to Python programming
2.1 Python program files
Python code is usually stored in text files with the file ending .py:
myprogram.py
Every line in a Python program file is assumed to be a Python statement, or part thereof.
The only exception is comment lines, which start with the character # (optionally preceded by
an arbitrary number of white-space characters, i.e., tabs or spaces). Comment lines are usually
ignored by the Python interpreter.
To run our Python program from the command line we use:
$ python myprogram.py
On UNIX systems it is common to define the path to the interpreter on the first line of the program
(note that this is a comment line as far as the Python interpreter is concerned):
#!/usr/bin/env python
If we do, and if we additionally set the file script to be executable, we can run the program like this:
$ myprogram.py
2.1.1
Example:
In [1]: ls scripts/hello-world*.py
scripts/hello-world-in-swedish.py
scripts/hello-world.py
#!/usr/bin/env python
print("Hello world!")
2.1.2
Character encoding
The standard character encoding is ASCII, but we can use any other encoding, for example UTF-8. To
specify that UTF-8 is used we include the special line

# -*- coding: UTF-8 -*-

at the top of the file.
In [4]: cat scripts/hello-world-in-swedish.py
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
print("Hej världen!")
2.2
IPython notebooks
This file - an IPython notebook - does not follow the standard pattern with Python code in a text file.
Instead, an IPython notebook is stored as a file in the JSON format. The advantage is that we can mix
formatted text, Python code and code output. It requires the IPython notebook server to run it though,
and therefore isn't a stand-alone Python program as described above. Other than that, there is no difference
between the Python code that goes into a program file or an IPython notebook.
2.3
Modules
Most of the functionality in Python is provided by modules. The Python Standard Library is a large collection
of modules that provides cross-platform implementations of common facilities such as access to the operating
system, file I/O, string management, network communication, and much more.
2.3.1
References
To use a module in a Python program it first has to be imported. A module can be imported using the
import statement. For example, to import the module math, which contains many standard mathematical
functions, we can do:
In [6]: import math
This includes the whole module and makes it available for use later in the program. For example, we can
do:
In [7]: import math
x = math.cos(2 * math.pi)
print(x)
1.0
Alternatively, we can choose to import all symbols (functions and variables) in a module into the current
namespace (so that we don't need to use the prefix math. every time we use something from the math
module):
In [8]: from math import *
x = cos(2 * pi)
print(x)
1.0
This pattern can be very convenient, but in large programs that include many modules it is often a good
idea to keep the symbols from each module in their own namespaces, by using the import math pattern.
This eliminates potentially confusing problems with namespace collisions.
As a third alternative, we can choose to import only a few selected symbols from a module by explicitly
listing which ones we want to import, instead of using the wildcard character *:
In [9]: from math import cos, pi
x = cos(2 * pi)
print(x)
1.0
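A fourth pattern, not used in this chapter but common in scientific code, is to import a module under a short alias, which keeps the namespace separation while reducing typing:

import math as m

x = m.cos(2 * m.pi)
print(x)

This prints 1.0, exactly as in the examples above.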
2.3.2 Looking at what a module contains, and its documentation
Once a module is imported, we can list the symbols it provides using the dir function:
In [10]: import math
print(dir(math))
['__doc__', '__file__', '__name__', '__package__', 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', ...]
And using the function help we can get a description of each function (almost .. not all functions have
docstrings, as they are technically called, but the vast majority of functions are documented this way).
In [11]: help(math.log)
Help on built-in function log in module math:
log(...)
log(x[, base])
Return the logarithm of x to the given base.
If the base not specified, returns the natural logarithm (base e) of x.
In [12]: log(10)
Out[12]: 2.302585092994046
In [13]: log(10, 2)
Out[13]: 3.3219280948873626
We can also use the help function directly on modules: Try
help(math)
Some very useful modules from the Python standard library are os, sys, math, shutil, re, subprocess,
multiprocessing, threading.
A complete lists of standard modules for Python 2 and Python 3 are available at
http://docs.python.org/2/library/ and http://docs.python.org/3/library/, respectively.
2.4 Variables and types
2.4.1 Symbol names
Variable names in Python can contain alphanumerical characters a-z, A-Z, 0-9 and some special characters
such as _. Normal variable names must start with a letter.
By convention, variable names start with a lower-case letter, and Class names start with a capital letter.
In addition, there are a number of Python keywords that cannot be used as variable names. These
keywords are:
and, as, assert, break, class, continue, def, del, elif, else, except,
exec, finally, for, from, global, if, import, in, is, lambda, not, or,
pass, print, raise, return, try, while, with, yield
Note: Be aware of the keyword lambda, which could easily be a natural variable name in a scientific
program. But being a keyword, it cannot be used as a variable name.
2.4.2
Assignment
The assignment operator in Python is =. Python is a dynamically typed language, so we do not need to
specify the type of a variable when we create one.
Assigning a value to a new variable creates the variable:
In [14]: # variable assignments
x = 1.0
my_variable = 12.2
Although not explicitly specified, a variable does have a type associated with it. The type is derived from
the value that was assigned to it.
In [15]: type(x)
Out[15]: float
If we assign a new value to a variable, its type can change.
In [16]: x = 1
In [17]: type(x)
Out[17]: int
If we try to use a variable that has not yet been defined we get an NameError:
In [18]: print(y)
--------------------------------------------------------------------------NameError
<ipython-input-18-36b2093251cd> in <module>()
----> 1 print(y)
2.4.3
Fundamental types
In [19]: # integers
x = 1
type(x)
Out[19]: int
In [20]: # float
x = 1.0
type(x)
Out[20]: float
In [21]: # boolean
b1 = True
b2 = False
type(b1)
Out[21]: bool
In [22]: # complex numbers: note the use of `j` to specify the imaginary part
x = 1.0 - 1.0j
type(x)
Out[22]: complex
In [23]: print(x)
(1-1j)
2.4.4 Type utility functions
The module types contains a number of type name definitions that can be used to test if variables are of
certain types:
In [25]: import types
# print all types defined in the `types` module
print(dir(types))
In [26]: x = 1.0
# check if the variable x is a float
type(x) is float
Out[26]: True
In [27]: # check if the variable x is an int
type(x) is int
Out[27]: False
We can also use the isinstance method for testing types of variables:
In [28]: isinstance(x, float)
Out[28]: True
2.4.5
Type casting
In [29]: x = 1.5
print(x, type(x))
(1.5, <type 'float'>)
In [30]: x = int(x)
print(x, type(x))
(1, <type 'int'>)
In [31]: z = complex(x)
print(z, type(z))
((1+0j), <type 'complex'>)
In [32]: x = float(z)
--------------------------------------------------------------------------TypeError
<ipython-input-32-e719cc7b3e96> in <module>()
----> 1 x = float(z)
2.5 Operators and comparisons
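Most operators and comparisons in Python work as one would expect. As a brief illustrative sketch (not the original notebook cells), the basic arithmetic operators are:

1 + 2, 1 - 2, 1 * 2, 1 / 2    # addition, subtraction, multiplication, division
3 // 2                        # integer division
2 ** 2                        # exponentiation (note: ** rather than ^)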
Note: The / operator always performs a floating point division in Python 3.x. This is not true in Python
2.x, where the result of / is always an integer if the operands are integers. to be more specific, 1/2 = 0.5
(float) in Python 3.x, and 1/2 = 0 (int) in Python 2.x (but 1.0/2 = 0.5 in Python 2.x).
The boolean operators are spelled out as the words and, not, or.
In [38]: True and False
Out[38]: False
In [39]: not False
Out[39]: True
In [40]: True or False
Out[40]: True
Comparison operators >, <, >= (greater or equal), <= (less or equal), == equality, is identical.
In [41]: 2 > 1, 2 < 1
Out[41]: (True, False)
In [42]: 2 > 2, 2 < 2
Out[42]: (False, False)
In [43]: 2 >= 2, 2 <= 2
Out[43]: (True, True)
In [44]: # equality
[1,2] == [1,2]
Out[44]: True
In [45]: # objects identical?
l1 = l2 = [1,2]
l1 is l2
Out[45]: True
2.6 Compound types: Strings, List and dictionaries
2.6.1 Strings
Strings are the variable type that is used for storing text messages.
In [46]: s = "Hello world"
type(s)
Out[46]: str
In [47]: # length of the string: the number of characters
len(s)
Out[47]: 11
In [48]: # replace a substring in a string with something else
s2 = s.replace("world", "test")
print(s2)
Hello test
We can index a character in a string using []:
In [49]: s[0]
Out[49]: 'H'
Heads up MATLAB users: Indexing starts at 0!
We can extract a part of a string using the syntax [start:stop], which extracts characters between
index start and stop -1 (the character at index stop is not included):
In [50]: s[0:5]
Out[50]: 'Hello'
In [51]: s[4:5]
Out[51]: 'o'
If we omit either (or both) of start or stop from [start:stop], the default is the beginning and the
end of the string, respectively:
In [52]: s[:5]
Out[52]: 'Hello'
In [53]: s[6:]
Out[53]: 'world'
In [54]: s[:]
Out[54]: 'Hello world'
We can also define the step size using the syntax [start:end:step] (the default value for step is 1, as
we saw above):
In [55]: s[::1]
Out[55]: 'Hello world'
In [56]: s[::2]
Out[56]: 'Hlowrd'
This technique is called slicing. Read more about the syntax here:
http://docs.python.org/release/2.7.3/library/functions.html?highlight=slice#slice
Python has a very rich set of functions for text processing. See for example
http://docs.python.org/2/library/string.html for more information.

String formatting examples:

In [60]: # we can use C-style string formatting
print("value = %f" % 1.0)

value = 1.000000
In [61]: # this formatting creates a string
s2 = "value1 = %.2f. value2 = %d" % (3.1415, 1.5)
print(s2)
value1 = 3.14. value2 = 1
In [62]: # alternative, more intuitive way of formatting a string
s3 = 'value1 = {0}, value2 = {1}'.format(3.1415, 1.5)
print(s3)
value1 = 3.1415, value2 = 1.5
2.6.2
List
Lists are very similar to strings, except that each element can be of any type.
The syntax for creating lists in Python is [...]:
In [63]: l = [1,2,3,4]
print(type(l))
print(l)
<type 'list'>
[1, 2, 3, 4]
We can use the same slicing techniques to manipulate lists as we could use on strings:
In [64]: print(l)
print(l[1:3])
print(l[::2])
[1, 2, 3, 4]
[2, 3]
[1, 3]
Heads up MATLAB users: Indexing starts at 0!
In [65]: l[0]
Out[65]: 1
Elements in a list do not all have to be of the same type:
In [66]: l = [1, 'a', 1.0, 1-1j]
print(l)
[1, 'a', 1.0, (1-1j)]
Python lists can be inhomogeneous and arbitrarily nested:
In [67]: nested_list = [1, [2, [3, [4, [5]]]]]
nested_list
Out[67]: [1, [2, [3, [4, [5]]]]]
Lists play a very important role in Python. For example they are used in loops and other flow control
structures (discussed below). There are a number of convenient functions for generating lists of various
types, for example the range function:
In [68]: start = 10
stop = 30
step = 2
range(start, stop, step)
Out[68]: [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
In [69]: # in python 3 range generates an iterator, which can be converted to a list using 'list(...)'.
# It has no effect in python 2
list(range(start, stop, step))
Out[69]: [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
In [70]: list(range(-10, 10))
Out[70]: [-10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [71]: s
Out[71]: 'Hello world'
In [72]: # convert a string to a list by type casting:
s2 = list(s)
s2
Out[72]: ['H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd']
In [73]: # sorting lists
s2.sort()
print(s2)
[' ', 'H', 'd', 'e', 'l', 'l', 'l', 'o', 'o', 'r', 'w']
"i")
"n")
"s")
"e")
"r")
"t")
print(l)
['i', 'n', 's', 'e', 'r', 't', 'A', 'd', 'd']
Remove first element with specific value using remove
In [78]: l.remove("A")
print(l)
['i', 'n', 's', 'e', 'r', 't', 'd', 'd']
Remove an element at a specific location using del:
In [79]: del l[7]
del l[6]
print(l)
['i', 'n', 's', 'e', 'r', 't']
See help(list) for more details, or read the online documentation
2.6.3
Tuples
Tuples are like lists, except that they cannot be modified once created, that is they are immutable.
In Python, tuples are created using the syntax (..., ..., ...), or even ..., ...:
In [80]: point = (10, 20)
print(point, type(point))
((10, 20), <type 'tuple'>)
In [81]: point = 10, 20
print(point, type(point))
((10, 20), <type 'tuple'>)
We can unpack a tuple by assigning it to a comma-separated list of variables:
In [82]: x, y = point
print("x =", x)
print("y =", y)
('x =', 10)
('y =', 20)
If we try to assign a new value to an element in a tuple we get an error:
In [83]: point[0] = 20
--------------------------------------------------------------------------TypeError
<ipython-input-83-ac1c641a5dca> in <module>()
----> 1 point[0] = 20
2.6.4
Dictionaries
Dictionaries are also like lists, except that each element is a key-value pair. The syntax for dictionaries is
{key1 : value1, ...}:
In [84]: params = {"parameter1" : 1.0,
"parameter2" : 2.0,
"parameter3" : 3.0,}
print(type(params))
print(params)
<type 'dict'>
{'parameter1': 1.0, 'parameter3': 3.0, 'parameter2': 2.0}
In [85]: params["parameter1"] = "A"
params["parameter2"] = "B"
# add a new entry
params["parameter4"] = "D"

print("parameter1 = " + str(params["parameter1"]))
print("parameter2 = " + str(params["parameter2"]))
print("parameter3 = " + str(params["parameter3"]))
print("parameter4 = " + str(params["parameter4"]))

parameter1 = A
parameter2 = B
parameter3 = 3.0
parameter4 = D

2.7 Control Flow
2.7.1 Conditional statements: if, elif, else
The Python syntax for conditional execution of code uses the keywords if, elif (else if), else:
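A minimal example of this syntax (an illustrative sketch rather than the original example cell):

statement1 = False
statement2 = False

if statement1:
    print("statement1 is True")
elif statement2:
    print("statement2 is True")
else:
    print("statement1 and statement2 are False")

Running this prints "statement1 and statement2 are False". Note that the extent of each block is defined by its indentation level, not by braces or keywords.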
2.8
Loops
In Python, loops can be programmed in a number of different ways. The most common is the for loop,
which is used together with iterable objects, such as lists. The basic syntax is:
2.8.1 for loops:

In [94]: for x in range(3):
    print(x)

0
1
2
In [95]: for word in ["scientific", "computing", "with", "python"]:
print(word)
scientific
computing
with
python
To iterate over key-value pairs of a dictionary:
In [96]: for key, value in params.items():
print(key + " = " + str(value))
parameter4 = D
parameter1 = A
parameter3 = 3.0
parameter2 = B
Sometimes it is useful to have access to the indices of the values when iterating over a list. We can use
the enumerate function for this:
In [97]: for idx, x in enumerate(range(-3,3)):
print(idx, x)
(0, -3)
(1, -2)
(2, -1)
(3, 0)
(4, 1)
(5, 2)
2.8.2 List comprehensions: Creating lists using for loops:
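The body of this subsection is not included above; the idea is that an expression followed by a for clause inside square brackets builds a new list in a single line. A small sketch:

l1 = [x**2 for x in range(0, 5)]
print(l1)

which produces [0, 1, 4, 9, 16].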
2.8.3
while loops:
In [99]: i = 0
while i < 5:
print(i)
i = i + 1
print("done")
0
1
2
3
4
done
Note that the print("done") statement is not part of the while loop body because of the difference in
indentation.
2.9
Functions
A function in Python is defined using the keyword def, followed by a function name, a signature within
parentheses (), and a colon :. The following code, with one additional level of indentation, is the function
body.
In [100]: def func0():
print("test")
In [101]: func0()
test
Optionally, but highly recommended, we can define a so-called docstring, which is a description of the
function's purpose and behavior. The docstring should follow directly after the function definition, before
the code in the function body.
In [102]: def func1(s):
"""
Print a string 's' and tell how many characters it has
"""
print(s + " has " + str(len(s)) + " characters")
In [103]: help(func1)
Help on function func1 in module
main :
func1(s)
Print a string 's' and tell how many characters it has
In [104]: func1("test")
test has 4 characters
Functions that return a value use the return keyword:
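For example (a sketch in place of the cell that is not shown here):

def square(x):
    """
    Return the square of x.
    """
    return x ** 2

square(4)   # evaluates to 16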
2.9.1 Default argument and keyword arguments
In a definition of a function, we can give default values to the arguments the function takes:
In [110]: def myfunc(x, p=2, debug=False):
if debug:
print("evaluating myfunc for x = " + str(x) + " using exponent p = " + str(p))
return x**p
If we don't provide a value for the debug argument when calling the function myfunc, it defaults to
the value provided in the function definition:
In [111]: myfunc(5)
Out[111]: 25
In [112]: myfunc(5, debug=True)
evaluating myfunc for x = 5 using exponent p = 2
Out[112]: 25
If we explicitly list the name of the arguments in the function calls, they do not need to come in the same
order as in the function definition. This is called keyword arguments, and is often very useful in functions
that take a lot of optional arguments.
In [113]: myfunc(p=3, debug=True, x=7)
evaluating myfunc for x = 7 using exponent p = 3
Out[113]: 343
2.9.2 Unnamed functions (lambda function)
In Python we can also create unnamed functions, using the lambda keyword:
In [114]: f1 = lambda x: x**2
# is equivalent to
def f2(x):
return x**2
In [115]: f1(2), f2(2)
Out[115]: (4, 4)
This technique is useful for example when we want to pass a simple function as an argument to another
function, like this:
In [116]: # map is a built-in python function
map(lambda x: x**2, range(-3,4))
Out[116]: [9, 4, 1, 0, 1, 4, 9]
In [117]: # in python 3 we can use `list(...)` to convert the iterator to an explicit list
list(map(lambda x: x**2, range(-3,4)))
Out[117]: [9, 4, 1, 0, 1, 4, 9]
2.10
Classes
Classes are the key features of object-oriented programming. A class is a structure for representing an object
and the operations that can be performed on the object.
In Python a class can contain attributes (variables) and methods (functions).
A class is defined almost like a function, but using the class keyword, and the class definition usually
contains a number of class method definitions (a function in a class).
Each class method should have an argument self as its first argument. This object is a self-reference.
Some class method names have special meaning, for example:
__init__: The name of the method that is invoked when the object is first created.
__str__: A method that is invoked when a simple string representation of the class is needed, for
example when it is printed.
There are many more, see http://docs.python.org/2/reference/datamodel.html#special-method-names
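As a sketch of what such a class definition can look like (the Point class here is an illustration, not necessarily the example used elsewhere in these notes):

class Point:
    """
    Simple class for representing a point in a Cartesian coordinate system.
    """
    def __init__(self, x, y):
        """
        Create a new Point at x, y.
        """
        self.x = x
        self.y = y

    def __str__(self):
        return "Point at [%f, %f]" % (self.x, self.y)

p = Point(0.0, 0.0)   # __init__ is invoked here
print(p)              # __str__ is invoked here, printing "Point at [0.000000, 0.000000]"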
2.11
Modules
One of the most important concepts in good programming is to reuse code and avoid repetitions.
The idea is to write functions and classes with a well-defined purpose and scope, and reuse these instead
of repeating similar code in different part of a program (modular programming). The result is usually that
readability and maintainability of a program is greatly improved. What this means in practice is that our
programs have fewer bugs, are easier to extend and debug/troubleshoot.
Python supports modular programming at different levels. Functions and classes are examples of tools
for low-level modular programming. Python modules are a higher-level modular programming construct,
where we can collect related variables, functions and classes in a module. A python module is defined in
a python file (with file-ending .py), and it can be made accessible to other Python modules and programs
using the import statement.
Consider the following example: the file mymodule.py contains simple example implementations of a
variable, function and a class:
In [121]: %%file mymodule.py
"""
33
34
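The body of the %%file cell is not reproduced above. A minimal module consistent with the help(mymodule) output below could look like the following sketch (the docstrings are illustrative):

"""
Example of a python module. Contains a variable called my_variable,
a function called my_function and a class called MyClass.
"""

my_variable = 0

def my_function():
    """
    Example function
    """
    return my_variable

class MyClass:
    """
    Example class.
    """
    def __init__(self):
        self.variable = my_variable

    def set_variable(self, new_value):
        """
        Set self.variable to a new value
        """
        self.variable = new_value

    def get_variable(self):
        return self.variable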
In [122]: import mymodule

In [123]: help(mymodule)
Help on module mymodule:

CLASSES
    class MyClass
     |  Example class.
     |
     |  Methods defined here:
     |
     |  __init__(self)
     |
     |  get_variable(self)
     |
     |  set_variable(self, new_value)
     |      Set self.variable to a new value

FUNCTIONS
    my_function()
        Example function

DATA
    my_variable = 0
In [124]: mymodule.my_variable
Out[124]: 0
In [125]: mymodule.my_function()
Out[125]: 0
In [126]: my_class = mymodule.MyClass()
my_class.set_variable(10)
my_class.get_variable()
Out[126]: 10
If we make changes to the code in mymodule.py, we need to reload it using reload:
In [127]: reload(mymodule)
2.12
Exceptions
In Python errors are managed with a special language construct called Exceptions. When errors occur
exceptions can be raised, which interrupts the normal program flow and falls back to somewhere else in the
code where the closest try-except statement is defined.
To generate an exception we can use the raise statement, which takes an argument that must be an
instance of the class BaseException or a class derived from it.
In [128]: raise Exception("description of the error")
--------------------------------------------------------------------------Exception
<ipython-input-128-8f47ba831d5a> in <module>()
----> 1 raise Exception("description of the error")
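To catch errors that are raised inside a block of code, the try and except keywords are used. A minimal sketch of the pattern (illustrative, not the original cell):

try:
    print("test")
    # generate an error: the variable test is not defined
    print(test)
except Exception as e:
    print("Caught an exception: " + str(e))

Here the first print succeeds, the second raises a NameError, and execution jumps to the except block instead of aborting the program.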
2.13
Further reading
2.14
Versions
Chapter 3
3.1
Introduction
The numpy package (module) is used in almost all numerical computation using Python. It is a package
that provides high-performance vector, matrix and higher-dimensional data structures for Python. It is
implemented in C and Fortran so when calculations are vectorized (formulated with vectors and matrices),
performance is very good.
To use numpy you need to import the module, using for example:
In [2]: from numpy import *
In the numpy package the terminology used for vectors, matrices and higher-dimensional data sets is
array.
3.2
There are a number of ways to initialize new numpy arrays, for example from
a Python list or tuples
using functions that are dedicated to generating numpy arrays, such as arange, linspace, etc.
reading data from files
3.2.1
From lists
For example, to create new vector and matrix arrays from Python lists we can use the numpy.array function.
In [3]: # a vector: the argument to the array function is a Python list
v = array([1,2,3,4])
v
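The cells that follow also use a small matrix M. Its defining cell is not shown above, but a matrix is created from a nested Python list in the same way (a sketch, assuming a 2x2 integer matrix):

# a matrix: the argument to the array function is a nested Python list
M = array([[1, 2],
           [3, 4]])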
In [11]: M.dtype
Out[11]: dtype('int64')
We get an error if we try to assign a value of the wrong type to an element in a numpy array:
In [12]: M[0,0] = "hello"
--------------------------------------------------------------------------ValueError
<ipython-input-12-a09d72434238> in <module>()
----> 1 M[0,0] = "hello"
If we want, we can explicitly define the type of the array data when we create it, using the dtype keyword argument:

In [13]: M = array([[1, 2], [3, 4]], dtype=complex)
M
Out[13]: array([[ 1.+0.j,  2.+0.j],
       [ 3.+0.j,  4.+0.j]])
Common data types that can be used with dtype are: int, float, complex, bool, object, etc.
We can also explicitly define the bit size of the data types, for example: int64, int16, float128,
complex128.
3.2.2
For larger arrays it is impractical to initialize the data manually, using explicit python lists. Instead we can
use one of the many functions in numpy that generate arrays of different forms. Some of the more common
are:
arange
In [14]: # create a range
x = arange(0, 10, 1) # arguments: start, stop, step
x
Out[14]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [15]: x = arange(-1, 1, 0.1)
x
Out[15]: array([ -1.00000000e+00,  -9.00000000e-01,  -8.00000000e-01,
        -7.00000000e-01,  -6.00000000e-01,  -5.00000000e-01,
        -4.00000000e-01,  -3.00000000e-01,  -2.00000000e-01,
        -1.00000000e-01,  -2.22044605e-16,   1.00000000e-01,
         2.00000000e-01,   3.00000000e-01,   4.00000000e-01,
         5.00000000e-01,   6.00000000e-01,   7.00000000e-01,
         8.00000000e-01,   9.00000000e-01])
linspace and logspace

In [16]: # using linspace, both end points are included
linspace(0, 10, 25)
Out[16]: array([  0.        ,   0.41666667,   0.83333333,   1.25      ,   1.66666667,
         2.08333333,   2.5       ,   2.91666667,   3.33333333,   3.75      ,
         4.16666667,   4.58333333,   5.        ,   5.41666667,   5.83333333,
         6.25      ,   6.66666667,   7.08333333,   7.5       ,   7.91666667,
         8.33333333,   8.75      ,   9.16666667,   9.58333333,  10.        ])

In [17]: logspace(0, 10, 10, base=e)
Out[17]: array([  1.00000000e+00,   3.03773178e+00,   9.22781435e+00,
         2.80316249e+01,   8.51525577e+01,   2.58670631e+02,
         7.85771994e+02,   2.38696456e+03,   7.25095809e+03,
         2.20264658e+04])
mgrid
In [18]: x, y = mgrid[0:5, 0:5] # similar to meshgrid in MATLAB
In [19]: x
Out[19]: array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4]])

In [20]: y
Out[20]: array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]])
random data
In [21]: from numpy import random
In [22]: # uniform random numbers in [0,1]
random.rand(5,5)
Out[22]: array([[ 0.92932506,  0.19684255,  0.736434  ,  0.18125714,  0.70905038],
       [ 0.18803573,  0.9312815 ,  0.1284532 ,  0.38138008,  0.36646481],
       [ 0.53700462,  0.02361381,  0.97760688,  0.73296701,  0.23042324],
       [ 0.9024635 ,  0.20860922,  0.67729644,  0.68386687,  0.49385729],
       [ 0.95876515,  0.29341553,  0.37520629,  0.29194432,  0.64102804]])
diag
In [24]: # a diagonal matrix
diag([1,2,3])
Out[24]: array([[1, 0, 0],
[0, 2, 0],
[0, 0, 3]])
In [25]: # diagonal with offset from the main diagonal
diag([1,2,3], k=1)
Out[25]: array([[0, 1, 0, 0],
       [0, 0, 2, 0],
       [0, 0, 0, 3],
       [0, 0, 0, 0]])
zeros and ones

In [26]: zeros((3,3))
Out[26]: array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

In [27]: ones((3,3))
Out[27]: array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])
3.3 File I/O
3.3.1 Comma-separated values (CSV)
A very common file format for data files is comma-separated values (CSV), or related formats such as TSV
(tab-separated values). To read data from such files into Numpy arrays we can use the numpy.genfromtxt
function. For example,
In [28]: !head stockholm_td_adj.dat
1800  1  1    -6.1    -6.1    -6.1 1
1800  1  2   -15.4   -15.4   -15.4 1
1800  1  3   -15.0   -15.0   -15.0 1
1800  1  4   -19.3   -19.3   -19.3 1
1800  1  5   -16.8   -16.8   -16.8 1
1800  1  6   -11.4   -11.4   -11.4 1
1800  1  7    -7.6    -7.6    -7.6 1
1800  1  8    -7.1    -7.1    -7.1 1
1800  1  9   -10.1   -10.1   -10.1 1
1800  1 10    -9.5    -9.5    -9.5 1
In [29]: data = genfromtxt('stockholm_td_adj.dat')

In [30]: shape(data)
Out[30]: (77431, 7)
In [31]: fig, ax = plt.subplots(figsize=(14,4))
ax.plot(data[:,0]+data[:,1]/12.0+data[:,2]/365, data[:,5])
ax.axis('tight')
ax.set_title('temperatures in Stockholm')
ax.set_xlabel('year')
ax.set_ylabel('temperature (C)');
Using numpy.savetxt we can store a Numpy array to a file in CSV format. For example, with a small random matrix M:

In [32]: M = random.rand(3,3)
M
Out[32]: array([[ 0.77872576,  0.40043577,  0.66254019],
       [ 0.60410063,  0.4791374 ,  0.8237106 ],
       [ 0.96856318,  0.15459644,  0.96082399]])
In [33]: savetxt("random-matrix.csv", M)
In [34]: !cat random-matrix.csv
7.787257639287014088e-01 4.004357670697732408e-01 6.625401863466899854e-01
6.041006328761111543e-01 4.791373994963619154e-01 8.237105968088237473e-01
9.685631757740569281e-01 1.545964379103705877e-01 9.608239852111523094e-01
In [35]: savetxt("random-matrix.csv", M, fmt='%.5f') # fmt specifies the format
!cat random-matrix.csv
0.77873 0.40044 0.66254
0.60410 0.47914 0.82371
0.96856 0.15460 0.96082
3.3.2
Useful when storing and reading back numpy array data. Use the functions numpy.save and numpy.load:
In [36]: save("random-matrix.npy", M)
!file random-matrix.npy
random-matrix.npy: data
In [37]: load("random-matrix.npy")
Out[37]: array([[ 0.77872576,  0.40043577,  0.66254019],
       [ 0.60410063,  0.4791374 ,  0.8237106 ],
       [ 0.96856318,  0.15459644,  0.96082399]])

3.4
3.5 Manipulating arrays
3.5.1 Indexing
We can index elements in an array using square brackets and indices:

In [43]: M
Out[43]: array([[ 0.77872576,  0.40043577,  0.66254019],
       [ 0.60410063,  0.4791374 ,  0.8237106 ],
       [ 0.96856318,  0.15459644,  0.96082399]])

In [44]: M[1]
Out[44]: array([ 0.60410063,  0.4791374 ,  0.8237106 ])

In [45]: M[1,:] # row 1
Out[45]: array([ 0.60410063,  0.4791374 ,  0.8237106 ])

In [46]: M[:,1] # column 1
Out[46]: array([ 0.40043577,  0.4791374 ,  0.15459644])
We can also assign new values to elements or to entire rows and columns in an array using indexing:

In [48]: M[1,:] = 0
M[:,2] = -1
M
Out[48]: array([[ 0.77872576,  0.40043577, -1.        ],
       [ 0.        ,  0.        , -1.        ],
       [ 0.96856318,  0.15459644, -1.        ]])

3.5.2 Index slicing
Index slicing is the technical name for the syntax M[lower:upper:step] to extract part of an array:
In [51]: A = array([1,2,3,4,5])
A
Out[51]: array([1, 2, 3, 4, 5])
In [52]: A[1:3]
Out[52]: array([2, 3])
Array slices are mutable: if they are assigned a new value the original array from which the slice was
extracted is modified:
In [53]: A[1:3] = [-2,-3]
A
Out[53]: array([ 1, -2, -3,  4,  5])

In [55]: A[::2] # step is 2, lower and upper defaults to the beginning and end of the array
Out[55]: array([ 1, -3,  5])
In [63]: A = array([[n+m*10 for n in range(5)] for m in range(5)])
A
Out[63]: array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])
3.5.3
Fancy indexing
Fancy indexing is the name for when an array or list is used in-place of an index:
In [64]: row_indices = [1, 2, 3]
A[row_indices]
Out[64]: array([[10, 11, 12, 13, 14],
[20, 21, 22, 23, 24],
[30, 31, 32, 33, 34]])
In [65]: col_indices = [1, 2, -1] # remember, index -1 means the last element
A[row_indices, col_indices]
Out[65]: array([11, 22, 34])
We can also use index masks: If the index mask is an Numpy array of data type bool, then an element
is selected (True) or not (False) depending on the value of the index mask at the position of each element:
In [66]: B = array([n for n in range(5)])
B
Out[66]: array([0, 1, 2, 3, 4])
In [67]: row_mask = array([True, False, True, False, False])
B[row_mask]
Out[67]: array([0, 2])
In [68]: # same thing
row_mask = array([1,0,1,0,0], dtype=bool)
B[row_mask]
Out[68]: array([0, 2])
This feature is very useful to conditionally select elements from an array, using for example comparison
operators:
In [69]: x = arange(0, 10, 0.5)
x
Out[69]: array([ 0. ,  0.5,  1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5,  5. ,
        5.5,  6. ,  6.5,  7. ,  7.5,  8. ,  8.5,  9. ,  9.5])
In [70]: mask = (5 < x) * (x < 7.5)

In [71]: x[mask]
Out[71]: array([ 5.5,  6. ,  6.5,  7. ])

3.6
3.6.1 where
The index mask can be converted to position index using the where function
In [72]: indices = where(mask)
indices
Out[72]: (array([11, 12, 13, 14]),)
In [73]: x[indices] # this indexing is equivalent to the fancy indexing x[mask]
Out[73]: array([ 5.5,
6. ,
6.5,
7. ])
3.6.2
diag
With the diag function we can also extract the diagonal and subdiagonals of an array:
In [74]: diag(A)
Out[74]: array([ 0, 11, 22, 33, 44])
In [75]: diag(A, -1)
Out[75]: array([10, 21, 32, 43])
3.6.3 take
The take function is similar to fancy indexing described above:

In [76]: v2 = arange(-3, 3)
v2
Out[76]: array([-3, -2, -1,  0,  1,  2])

In [77]: row_indices = [1, 3, 5]
v2[row_indices] # fancy indexing
Out[77]: array([-2,  0,  2])

In [78]: v2.take(row_indices)
Out[78]: array([-2,  0,  2])

But take also works on lists and other objects:

In [79]: take([-3, -2, -1, 0, 1, 2], row_indices)
Out[79]: array([-2,  0,  2])

3.6.4 choose
Constructs an array by picking elements from several arrays:

In [80]: which = [1, 0, 1, 0]
choices = [[-2,-2,-2,-2], [5,5,5,5]]
choose(which, choices)
Out[80]: array([ 5, -2,  5, -2])

3.7 Linear algebra
Vectorizing code is the key to writing efficient numerical calculation with Python/Numpy. That means
that as much as possible of a program should be formulated in terms of matrix and vector operations, like
matrix-matrix multiplication.
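As a rough illustration of what this means in practice (a sketch, not part of the original notes), the same element-wise operation can be written as an explicit Python loop or as a single vectorized numpy expression; the vectorized form runs the loop in compiled code inside numpy and is typically much faster for large arrays:

x = arange(100000)

# explicit Python loop: one interpreted iteration per element
y_loop = zeros(len(x))
for i in range(len(x)):
    y_loop[i] = x[i] ** 2

# vectorized: the whole operation is dispatched to numpy's compiled routines
y_vec = x ** 2

print((y_loop == y_vec).all())   # True: both give the same result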
3.7.1
Scalar-array operations
We can use the usual arithmetic operators to multiply, add, subtract, and divide arrays with scalar numbers.
In [81]: v1 = arange(0, 5)
In [82]: v1 * 2
Out[82]: array([0, 2, 4, 6, 8])
In [83]: v1 + 2
Out[83]: array([2, 3, 4, 5, 6])
In [84]: A * 2, A + 2
Out[84]: (array([[ 0,  2,  4,  6,  8],
        [20, 22, 24, 26, 28],
        [40, 42, 44, 46, 48],
        [60, 62, 64, 66, 68],
        [80, 82, 84, 86, 88]]),
 array([[ 2,  3,  4,  5,  6],
        [12, 13, 14, 15, 16],
        [22, 23, 24, 25, 26],
        [32, 33, 34, 35, 36],
        [42, 43, 44, 45, 46]]))

3.7.2
When we add, subtract, multiply and divide arrays with each other, the default behaviour is element-wise
operations:
In [85]: A * A # element-wise multiplication
Out[85]: array([[
0,
1,
4,
9,
16],
[ 100, 121, 144, 169, 196],
[ 400, 441, 484, 529, 576],
[ 900, 961, 1024, 1089, 1156],
[1600, 1681, 1764, 1849, 1936]])
In [86]: v1 * v1
Out[86]: array([ 0,
1,
4,
9, 16])
If we multiply arrays with compatible shapes, we get an element-wise multiplication of each row:
In [87]: A.shape, v1.shape
Out[87]: ((5, 5), (5,))
In [88]: A * v1
Out[88]: array([[  0,   1,   4,   9,  16],
       [  0,  11,  24,  39,  56],
       [  0,  21,  44,  69,  96],
       [  0,  31,  64,  99, 136],
       [  0,  41,  84, 129, 176]])
3.7.3
Matrix algebra
What about matrix mutiplication? There are two ways. We can either use the dot function, which applies
a matrix-matrix, matrix-vector, or inner vector multiplication to its two arguments:
In [89]: dot(A, A)
Out[89]: array([[ 300,  310,  320,  330,  340],
       [1300, 1360, 1420, 1480, 1540],
       [2300, 2410, 2520, 2630, 2740],
       [3300, 3460, 3620, 3780, 3940],
       [4300, 4510, 4720, 4930, 5140]])
Alternatively, we can cast the array objects to the type matrix. This changes the behavior of the standard arithmetic operators +, -, * to use matrix algebra:

In [92]: M = matrix(A)
v = matrix(v1).T # make v1 a column vector

In [94]: M * M
Out[94]: matrix([[ 300,  310,  320,  330,  340],
        [1300, 1360, 1420, 1480, 1540],
        [2300, 2410, 2520, 2630, 2740],
        [3300, 3460, 3620, 3780, 3940],
        [4300, 4510, 4720, 4930, 5140]])
In [95]: M * v
Out[95]: matrix([[ 30],
[130],
[230],
[330],
[430]])
In [96]: # inner product
v.T * v
Out[96]: matrix([[30]])
In [97]: # with matrix objects, standard matrix algebra applies
v + M*v
--------------------------------------------------------------------------ValueError
<ipython-input-100-995fb48ad0cc> in <module>()
----> 1 M * v
/Users/rob/miniconda/envs/py27-spl/lib/python2.7/site-packages/numpy/matrixlib/defmatrix.pyc in
339
if isinstance(other, (N.ndarray, list, tuple)) :
340
# This promotes 1-D vectors to row vectors
--> 341
return N.dot(self, asmatrix(other))
342
if isscalar(other) or not hasattr(other, ' rmul ') :
343
return N.dot(self, other)
3.7.4
Array/Matrix transformations
Above we have used the .T to transpose the matrix object v. We could also have used the transpose
function to accomplish the same thing.
Other mathematical functions that transform matrix objects are:
In [101]: C = matrix([[1j, 2j], [3j, 4j]])
C
Out[101]: matrix([[ 0.+1.j,  0.+2.j],
        [ 0.+3.j,  0.+4.j]])
In [102]: conjugate(C)
Out[102]: matrix([[ 0.-1.j,  0.-2.j],
        [ 0.-3.j,  0.-4.j]])
Hermitian conjugate: transpose + conjugate
In [103]: C.H
Out[103]: matrix([[ 0.-1.j,  0.-3.j],
        [ 0.-2.j,  0.-4.j]])
We can extract the real and imaginary parts of complex-valued arrays using real and imag:
In [104]: real(C) # same as: C.real
Out[104]: matrix([[ 0.,  0.],
        [ 0.,  0.]])
In [105]: imag(C) # same as: C.imag
Out[105]: matrix([[ 1.,  2.],
        [ 3.,  4.]])
The complex argument and absolute value can be obtained with angle and abs:
In [106]: angle(C+1) # heads up, this is a function from the numpy module
Out[106]: matrix([[ 0.78539816,  1.10714872],
        [ 1.24904577,  1.32581766]])
In [107]: abs(C)
Out[107]: matrix([[ 1.,  2.],
        [ 3.,  4.]])
3.7.5 Matrix computations
Inverse
In [108]: linalg.inv(C) # equivalent to C.I
Out[108]: matrix([[ 0.+2.j ,  0.-1.j ],
        [ 0.-1.5j,  0.+0.5j]])
In [109]: C.I * C
Out[109]: matrix([[  1.00000000e+00+0.j,   4.44089210e-16+0.j],
        [  0.00000000e+00+0.j,   1.00000000e+00+0.j]])
Determinant
In [110]: linalg.det(C)
Out[110]: (2.0000000000000004+0j)
In [111]: linalg.det(C.I)
Out[111]: (0.50000000000000011+0j)
3.7.6
Data processing
Often it is useful to store datasets in Numpy arrays. Numpy provides a number of functions to calculate
statistics of datasets in arrays.
For example, let's calculate some properties from the Stockholm temperature dataset used above.
In [112]: # reminder, the temperature dataset is stored in the data variable:
shape(data)
Out[112]: (77431, 7)
mean
In [113]: # the temperature data is in column 3
mean(data[:,3])
Out[113]: 6.1971096847515854
The daily mean temperature in Stockholm over the last 200 years has been about 6.2 C.
standard deviations and variance
In [114]: std(data[:,3]), var(data[:,3])
Out[114]: (8.2822716213405734, 68.596023209663414)
min and max
In [115]: # lowest daily average temperature
data[:,3].min()
Out[115]: -25.800000000000001
In [116]: # highest daily average temperature
data[:,3].max()
Out[116]: 28.300000000000001
sum, prod, and trace
In [117]: d = arange(0, 10)
d
Out[117]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [118]: # sum up all elements
sum(d)
Out[118]: 45
In [119]: # product of all elements
prod(d+1)
Out[119]: 3628800
In [120]: # cumulative sum
cumsum(d)
Out[120]: array([ 0,  1,  3,  6, 10, 15, 21, 28, 36, 45])
In [121]: # cumulative product
cumprod(d+1)
Out[121]: array([      1,       2,       6,      24,     120,     720,    5040,
         40320,  362880, 3628800])
In [122]: # trace: the sum of the diagonal elements
trace(A)
Out[122]: 110
3.7.7 Computations on subsets of arrays
We can compute with subsets of the data in an array using indexing, fancy indexing, and the other methods
of extracting data from an array (described above).
For example, let's go back to the temperature dataset:
In [123]: !head -n 3 stockholm_td_adj.dat
1800  1  1    -6.1    -6.1    -6.1 1
1800  1  2   -15.4   -15.4   -15.4 1
1800  1  3   -15.0   -15.0   -15.0 1
The data format is: year, month, day, daily average temperature, low, high, location.
If we are interested in the average temperature only in a particular month, say February, then we can
create an index mask and use it to select only the data for that month:
In [124]: unique(data[:,1]) # the month column takes values from 1 to 12
Out[124]: array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,  12.])
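The selection itself is a one-line boolean mask; a minimal sketch (the mask name is an assumption, the column layout is as described above):
# column 1 holds the month; select all rows belonging to February
mask_feb = data[:, 1] == 2
# mean daily-average temperature (column 3) over all February days
mean(data[mask_feb, 3])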
3.7.8 Calculations with higher-dimensional data
When functions such as min, max, etc. are applied to multidimensional arrays, it is sometimes useful to
apply the calculation to the entire array, and sometimes only on a row or column basis. Using the axis
argument we can specify how these functions should behave:
In [128]: m = random.rand(3,3)
m
Out[128]: array([[ 0.2850926 ,  0.17302017,  0.17748378],
       [ 0.80070487,  0.45527067,  0.61277451],
       [ 0.11372793,  0.43608703,  0.87010206]])
In [130]: # max in each column
m.max(axis=0)
Out[130]: array([ 0.80070487,  0.45527067,  0.87010206])
In [131]: # max in each row
m.max(axis=1)
Out[131]: array([ 0.2850926 ,  0.80070487,  0.87010206])
Many other functions and methods in the array and matrix classes accept the same (optional) axis
keyword argument.
3.8 Reshaping, resizing and stacking arrays
The shape of a Numpy array can be modified without copying the underlying data, which makes it a fast
operation even for large arrays.
In [132]: A
Out[132]: array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])
In [133]: n, m = A.shape
In [134]: B = A.reshape((1,n*m))
B
Out[134]: array([[ 0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30, 31,
32, 33, 34, 40, 41, 42, 43, 44]])
In [135]: B[0,0:5] = 5 # modify the array
B
Out[135]: array([[ 5, 5, 5, 5, 5, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30, 31,
32, 33, 34, 40, 41, 42, 43, 44]])
In [136]: A # and the original variable is also changed. B is only a different view of the same data
Out[136]: array([[ 5,  5,  5,  5,  5],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])
We can also use the function flatten to make a higher-dimensional array into a vector. But this function
creates a copy of the data.
In [137]: B = A.flatten()
B
Out[137]: array([ 5, 5, 5, 5, 5, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30, 31,
32, 33, 34, 40, 41, 42, 43, 44])
In [138]: B[0:5] = 10
B
Out[138]: array([10, 10, 10, 10, 10, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30, 31,
32, 33, 34, 40, 41, 42, 43, 44])
In [139]: A # now A has not changed, because B's data is a copy of A's, not referring to the same data
Out[139]: array([[ 5,  5,  5,  5,  5],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])
3.9 Adding a new dimension: newaxis
With newaxis, we can insert new dimensions in an array, for example converting a vector to a column or
row matrix:
In [140]: v = array([1,2,3])
In [141]: shape(v)
Out[141]: (3,)
In [142]: # make a column matrix of the vector v
v[:, newaxis]
Out[142]: array([[1],
[2],
[3]])
In [143]: # column matrix
v[:,newaxis].shape
Out[143]: (3, 1)
In [144]: # row matrix
v[newaxis,:].shape
Out[144]: (1, 3)
3.10 Stacking and repeating arrays
Using the functions repeat, tile, vstack, hstack, and concatenate we can create larger vectors and matrices
from smaller ones, as sketched below:
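A brief sketch of repeat, tile and concatenate (the small arrays a and b are assumed here, and are also the arrays used by the vstack/hstack cells below):
a = array([[1, 2], [3, 4]])
b = array([[5, 6]])
repeat(a, 3)                  # repeat each element 3 times: array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4])
tile(a, 3)                    # tile the whole matrix 3 times along its columns
concatenate((a, b), axis=0)   # append b as a new row
concatenate((a, b.T), axis=1) # append b.T as a new column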
3.10.1 tile and repeat
3.10.2 concatenate
3.10.3 hstack and vstack
In [151]: vstack((a,b))
Out[151]: array([[1, 2],
[3, 4],
[5, 6]])
In [152]: hstack((a,b.T))
Out[152]: array([[1, 2, 5],
[3, 4, 6]])
3.11 Copy and "deep copy"
To achieve high performance, assignments in Python usually do not copy the underlying objects. This is
important, for example, when objects are passed between functions, to avoid an excessive amount of memory
copying when it is not necessary (technical term: pass by reference).
In [153]: A = array([[1, 2], [3, 4]])
A
Out[153]: array([[1, 2],
[3, 4]])
In [154]: # now B is referring to the same array data as A
B = A
In [155]: # changing B affects A
B[0,0] = 10
B
Out[155]: array([[10,  2],
       [ 3,  4]])
In [156]: A
Out[156]: array([[10,  2],
       [ 3,  4]])
If we want to avoid this behavior, so that we get a new, completely independent object B copied
from A, we need to do a so-called deep copy using the function copy:
In [157]: B = copy(A)
In [158]: # now, if we modify B, A is not affected
B[0,0] = -5
B
Out[158]: array([[-5,  2],
       [ 3,  4]])
In [159]: A
Out[159]: array([[10,  2],
       [ 3,  4]])
3.12 Iterating over array elements
Generally, we want to avoid iterating over the elements of arrays whenever we can (at all costs). The reason
is that in an interpreted language like Python (or MATLAB), iterations are really slow compared to vectorized
operations.
However, sometimes iterations are unavoidable. For such cases, the Python for loop is the most convenient way to iterate over an array:
In [160]: v = array([1,2,3,4])
for element in v:
print(element)
1
2
3
4
In [161]: M = array([[1,2], [3,4]])
for row in M:
print("row", row)
for element in row:
print(element)
('row', array([1, 2]))
1
2
('row', array([3, 4]))
3
4
When we need to iterate over each element of an array and modify its elements, it is convenient to use
the enumerate function to obtain both the element and its index in the for loop:
In [162]: for row_idx, row in enumerate(M):
print("row_idx", row_idx, "row", row)
for col_idx, element in enumerate(row):
print("col_idx", col_idx, "element", element)
# update the matrix M: square each element
M[row_idx, col_idx] = element ** 2
('row_idx', 0, 'row', array([1, 2]))
('col_idx', 0, 'element', 1)
('col_idx', 1, 'element', 2)
('row_idx', 1, 'row', array([3, 4]))
('col_idx', 0, 'element', 3)
('col_idx', 1, 'element', 4)
3.13
Vectorizing functions
As mentioned several times by now, to get good performance we should try to avoid looping over elements
in our vectors and matrices, and instead use vectorized algorithms. The first step in converting a scalar
algorithm to a vectorized algorithm is to make sure that the functions we write work with vector inputs.
In [164]: def Theta(x):
    """
    Scalar implementation of the Heaviside step function.
    """
    if x >= 0:
        return 1
    else:
        return 0
In [165]: Theta(array([-3,-2,-1,0,1,2,3]))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-165-6658efdd2f22> in <module>()
----> 1 Theta(array([-3,-2,-1,0,1,2,3]))
<ipython-input-164-9a0cb13d93d4> in Theta(x)
      3     Scalar implementation of the Heaviside step function.
      4     """
----> 5     if x >= 0:
      6         return 1
      7     else:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
OK, that didn't work because we didn't write the Theta function so that it can handle a vector input...
To get a vectorized version of Theta we can use the Numpy function vectorize. In many cases it can
automatically vectorize a function:
In [166]: Theta_vec = vectorize(Theta)
In [167]: Theta_vec(array([-3,-2,-1,0,1,2,3]))
Out[167]: array([0, 0, 0, 1, 1, 1, 1])
We can also implement the function to accept a vector input from the beginning (requires more effort
but might give better performance):
In [168]: def Theta(x):
    """
    Vector-aware implementation of the Heaviside step function.
    """
    return 1 * (x >= 0)
In [169]: Theta(array([-3,-2,-1,0,1,2,3]))
Out[169]: array([0, 0, 0, 1, 1, 1, 1])
In [170]: # still works for scalars as well
Theta(-1.2), Theta(2.6)
Out[170]: (0, 1)
3.14 Using arrays in conditions
When using arrays in conditions, for example in if statements and other boolean expressions, one needs to use
any or all, which check whether any or all of the elements in the array evaluate to True:
In [171]: M
Out[171]: array([[ 1, 4],
[ 9, 16]])
In [172]: if (M > 5).any():
    print("at least one element in M is larger than 5")
else:
    print("no element in M is larger than 5")
at least one element in M is larger than 5
In [173]: if (M > 5).all():
    print("all elements in M are larger than 5")
else:
    print("all elements in M are not larger than 5")
all elements in M are not larger than 5
3.15
Type casting
Since Numpy arrays are statically typed, the type of an array does not change once created. But we can
explicitly cast an array of some type to another using the astype function (see also the similar asarray
function). This always creates a new array of the new type:
In [174]: M.dtype
Out[174]: dtype('int64')
In [175]: M2 = M.astype(float)
M2
Out[175]: array([[  1.,   4.],
       [  9.,  16.]])
In [176]: M2.dtype
Out[176]: dtype('float64')
In [177]: M3 = M.astype(bool)
M3
Out[177]: array([[ True,  True],
       [ True,  True]], dtype=bool)
3.16
Further reading
http://numpy.scipy.org
http://scipy.org/Tentative NumPy Tutorial
http://scipy.org/NumPy for Matlab Users - A Numpy guide for MATLAB users.
3.17
Versions
Chapter 4
SciPy - Library of scientific algorithms for Python
4.1
Introduction
The SciPy framework builds on top of the low-level NumPy framework for multidimensional arrays, and
provides a large number of higher-level scientific algorithms. Some of the topics that SciPy covers are:
Special functions (scipy.special)
Integration (scipy.integrate)
Optimization (scipy.optimize)
Interpolation (scipy.interpolate)
Fourier Transforms (scipy.fftpack)
Signal Processing (scipy.signal)
Linear Algebra (scipy.linalg)
Sparse Eigenvalue Problems (scipy.sparse)
Statistics (scipy.stats)
Multi-dimensional image processing (scipy.ndimage)
File IO (scipy.io)
Each of these submodules provides a number of functions and classes that can be used to solve problems
in their respective topics.
In this lecture we will look at how to use some of these subpackages.
To access the SciPy package in a Python program, we start by importing everything from the scipy
module.
In [2]: from scipy import *
If we only need to use part of the SciPy framework we can selectively include only those modules we are
interested in. For example, to include the linear algebra package under the name la, we can do:
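One possible spelling of that import (shown here as a sketch; only the alias name la is taken from the text above):
from scipy import linalg as la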
4.2
Special functions
A large number of mathematical special functions are important for many computational physics problems.
SciPy provides implementations of a very extensive set of special functions. For details, see the list of
functions in the reference documention at http://docs.scipy.org/doc/scipy/reference/special.html#modulescipy.special.
To demonstrate the typical usage of special functions we will look in more detail at the Bessel functions:
In [4]: #
# The scipy.special module includes a large number of Bessel-functions
# Here we will use the functions jn and yn, which are the Bessel functions
# of the first and second kind and real-valued order. We also include the
# function jn_zeros and yn_zeros that gives the zeroes of the functions jn
# and yn.
#
from scipy.special import jn, yn, jn_zeros, yn_zeros
In [5]: n = 0    # order
x = 0.0
The zeros of the Bessel functions can be obtained with the jn_zeros function. For example, the first four zeros of the zeroth-order Bessel function of the first kind, jn_zeros(0, 4), are:
array([  2.40482556,   5.52007811,   8.65372791,  11.79153444])
4.3 Integration
4.3.1 Numerical integration: quadrature
Numerical evaluation of a function of the type $\int_a^b f(x)\,dx$ is called numerical quadrature, or simply quadrature. SciPy provides a series of functions for different
kinds of quadrature, for example the quad, dblquad and tplquad for single, double and triple integrals,
respectively.
In [8]: from scipy.integrate import quad, dblquad, tplquad
The quad function takes a large number of optional arguments, which can be used to fine-tune the
behaviour of the function (try help(quad) for details).
The basic usage is as follows:
In [9]: # define a simple function for the integrand
def f(x):
return x
In [10]: x_lower = 0 # the lower limit of x
x_upper = 1 # the upper limit of x
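The call itself is a one-liner; a minimal sketch with the names just defined (quad returns the value of the integral and an estimate of the absolute error):
val, abserr = quad(f, x_lower, x_upper)
print "integral value =", val, ", absolute error =", abserr
# for f(x) = x on [0, 1] this gives 0.5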
As a second example, consider the Gaussian integral of exp(-x**2) over the whole real line, whose exact value is sqrt(pi):
In [12]: val, abserr = quad(lambda x: exp(-x**2), -Inf, Inf)
print "numerical  =", val, abserr
analytical = sqrt(pi)
print "analytical =", analytical
numerical  = 1.77245385091 1.42026367809e-08
analytical = 1.77245385091
As shown in the example above, we can also use Inf or -Inf as integral limits.
Higher-dimensional integration works in the same way:
In [13]: def integrand(x, y):
return exp(-x**2-y**2)
x_lower = 0
x_upper = 10
y_lower = 0
y_upper = 10
val, abserr = dblquad(integrand, x_lower, x_upper, lambda x: y_lower, lambda x: y_upper)
print val, abserr
0.785398163397 1.63822994214e-13
Note how we had to pass lambda functions for the limits for the y integration, since these in general can
be functions of x.
4.4 Ordinary differential equations (ODEs)
SciPy provides two different ways to solve ODEs: an API based on the function odeint, and an object-oriented
API based on the class ode. Usually odeint is easier to get started with, but the ode class offers some finer
level of control.
Here we will use the odeint functions. For more information about the class ode, try help(ode). It
does pretty much the same thing as odeint, but in an object-oriented fashion.
To use odeint, first import it from the scipy.integrate module
In [14]: from scipy.integrate import odeint, ode
A system of ODEs is usually formulated in standard form before it is attacked numerically. The
standard form is:
$y' = f(y, t)$
where
$y = [y_1(t), y_2(t), \ldots, y_n(t)]$
and f is some function that gives the derivatives of the functions $y_i(t)$. To solve an ODE we need to know
the function f and an initial condition, y(0).
Note that higher-order ODEs can always be written in this form by introducing new variables for the
intermediate derivatives.
Once we have defined the Python function f and the array y_0 (that is, f and y(0) in the mathematical
formulation), we can use the odeint function as:
y_t = odeint(f, y_0, t)
where t is an array with time coordinates for which to solve the ODE problem. y_t is an array with
one row for each point in time in t, where each column corresponds to a solution $y_i(t)$ at that point in
time.
We will see how we can implement f and y_0 in Python code in the examples below.
Example: double pendulum
Let's consider a physical example: the double compound pendulum, described in some detail here:
http://en.wikipedia.org/wiki/Double pendulum
In [15]: Image(url='http://upload.wikimedia.org/wikipedia/commons/c/c9/Double-compound-pendulum-dimensio
Out[15]: <IPython.core.display.Image object>
The equations of motion of the pendulum are given on the wiki page:
$\dot{\theta}_1 = \frac{6}{m\ell^2} \, \frac{2 p_{\theta_1} - 3 \cos(\theta_1 - \theta_2)\, p_{\theta_2}}{16 - 9 \cos^2(\theta_1 - \theta_2)}$
$\dot{\theta}_2 = \frac{6}{m\ell^2} \, \frac{8 p_{\theta_2} - 3 \cos(\theta_1 - \theta_2)\, p_{\theta_1}}{16 - 9 \cos^2(\theta_1 - \theta_2)}$
$\dot{p}_{\theta_1} = -\tfrac{1}{2} m\ell^2 \left[ \dot{\theta}_1 \dot{\theta}_2 \sin(\theta_1 - \theta_2) + 3 \frac{g}{\ell} \sin\theta_1 \right]$
$\dot{p}_{\theta_2} = -\tfrac{1}{2} m\ell^2 \left[ -\dot{\theta}_1 \dot{\theta}_2 \sin(\theta_1 - \theta_2) + \frac{g}{\ell} \sin\theta_2 \right]$
To make the Python code simpler to follow, let's introduce new variable names and the vector notation: $x = [\theta_1, \theta_2, p_{\theta_1}, p_{\theta_2}]$
$\dot{x}_1 = \frac{6}{m\ell^2} \, \frac{2 x_3 - 3 \cos(x_1 - x_2)\, x_4}{16 - 9 \cos^2(x_1 - x_2)}$
In [16]: g = 9.82
L = 0.5
m = 0.1
def dx(x, t):
    """
    The right-hand side of the pendulum ODE
    """
    x1, x2, x3, x4 = x[0], x[1], x[2], x[3]

    dx1 = 6.0/(m*L**2) * (2 * x3 - 3 * cos(x1-x2) * x4)/(16 - 9 * cos(x1-x2)**2)
    dx2 = 6.0/(m*L**2) * (8 * x4 - 3 * cos(x1-x2) * x3)/(16 - 9 * cos(x1-x2)**2)
    dx3 = -0.5 * m * L**2 * ( dx1 * dx2 * sin(x1-x2) + 3 * (g/L) * sin(x1))
    dx4 = -0.5 * m * L**2 * (-dx1 * dx2 * sin(x1-x2) + (g/L) * sin(x2))

    return [dx1, dx2, dx3, dx4]
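The cells that choose an initial state and solve the ODE are not reproduced here; a minimal sketch consistent with the plotting code below (the initial angles and the time grid are assumptions):
x0 = [pi/4, pi/2, 0, 0]     # initial state: the two angles and the two momenta
t = linspace(0, 10, 250)    # time coordinates to solve the ODE for
x = odeint(dx, x0, t)       # x has one row per time point, one column per variable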
x1 = + L * sin(x[:, 0])
y1 = - L * cos(x[:, 0])
x2 = x1 + L * sin(x[:, 1])
y2 = y1 - L * cos(x[:, 1])
axes[1].plot(x1, y1, 'r', label="pendulum1")
axes[1].plot(x2, y2, 'b', label="pendulum2")
axes[1].set_ylim([-1, 0])
axes[1].set_xlim([1, -1]);
Simple animation of the pendulum motion. We will see how to make better animations in Lecture 4.
In [21]: from IPython.display import display, clear_output
import time
In [22]: fig, ax = plt.subplots(figsize=(4,4))
for t_idx, tt in enumerate(t[:200]):
x1 = + L * sin(x[t_idx, 0])
y1 = - L * cos(x[t_idx, 0])
x2 = x1 + L * sin(x[t_idx, 1])
y2 = y1 - L * cos(x[t_idx, 1])
ax.cla()
ax.plot([0, x1], [0, y1], 'r.-')
ax.plot([x1, x2], [y1, y2], 'b.-')
ax.set_ylim([-1.5, 0.5])
ax.set_xlim([1, -1])
clear_output()
display(fig)
time.sleep(0.1)
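For context, the cells below solve a damped harmonic oscillator for several damping ratios. A sketch of the right-hand side and the parameters, using the names assumed by the calls that follow (dy, y0, t, w0):
def dy(y, t, zeta, w0):
    # Right-hand side of the damped oscillator x'' + 2*zeta*w0*x' + w0**2*x = 0,
    # rewritten as a first-order system with y = [x, p] and p = x'.
    x, p = y[0], y[1]
    dx = p
    dp = -2 * zeta * w0 * p - w0**2 * x
    return [dx, dp]

y0 = [1.0, 0.0]             # initial state: x(0) = 1, x'(0) = 0
t = linspace(0, 10, 1000)   # time grid
w0 = 2 * pi * 1.0           # natural (angular) frequency corresponding to 1 Hz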
y1 = odeint(dy, y0, t, args=(0.0, w0)) # undamped
y2 = odeint(dy, y0, t, args=(0.2, w0)) # under damped
y3 = odeint(dy, y0, t, args=(1.0, w0)) # critical damping
y4 = odeint(dy, y0, t, args=(5.0, w0)) # over damped
ax.plot(t, y1[:,0], label="undamped", linewidth=0.25)
ax.plot(t, y2[:,0], label="under damped")
ax.plot(t, y3[:,0], label=r"critical damping")
ax.plot(t, y4[:,0], label="over damped")
4.5
Fourier transform
Fourier transforms are one of the universal tools in computational physics, which appear over and over again
in different contexts. SciPy provides functions for accessing the classic FFTPACK library from NetLib,
which is an efficient and well tested FFT library written in FORTRAN. The SciPy API has a few additional
convenience functions, but overall the API is closely related to the original FORTRAN library.
To use the fftpack module in a python program, include it using:
In [28]: from numpy.fft import fftfreq
from scipy.fftpack import *
To demonstrate how to do a fast Fourier transform with SciPy, let's look at the FFT of the solution to
the damped oscillator from the previous section:
In [29]: N = len(t)
dt = t[1]-t[0]
# calculate the fast fourier transform
# y2 is the solution to the under-damped oscillator from the previous section
F = fft(y2[:,0])
# calculate the frequencies for the components in F
w = fftfreq(N, dt)
In [30]: fig, ax = plt.subplots(figsize=(9,3))
ax.plot(w, abs(F));
Since the signal is real, the spectrum is symmetric. We therefore only need to plot the part that corresponds
to the positive frequencies. To extract that part of w and F we can use some of the indexing tricks for
NumPy arrays that we saw in Lecture 2:
In [31]: indices = where(w > 0) # select only indices for elements that correspond to positive frequencies
w_pos = w[indices]
F_pos = F[indices]
In [32]: fig, ax = plt.subplots(figsize=(9,3))
ax.plot(w_pos, abs(F_pos))
ax.set_xlim(0, 5);
As expected, we now see a peak in the spectrum that is centered around 1, which is the frequency we used
in the damped oscillator example.
4.6
Linear algebra
The linear algebra module contains a lot of matrix related functions, including linear equation solving, eigenvalue solvers, matrix functions (for example matrix-exponentiation), a number of different decompositions
(SVD, LU, cholesky), etc.
Detailed documentation is available at: http://docs.scipy.org/doc/scipy/reference/linalg.html
Here we will look at how to use some of these functions:
4.6.1 Linear equation systems
Linear equation systems on the matrix form A x = b, where A is a matrix and x, b are vectors, can be solved with the solve function:
In [33]: A = array([[1,2,3], [4,5,6], [7,8,9]])
b = array([1,2,3])
In [34]: x = solve(A, b)
In [35]: x
Out[35]: array([-0.33333333,  0.66666667,  0.        ])
In [36]: # check
dot(A, x) - b
Out[36]: array([ -1.11022302e-16,   0.00000000e+00,   0.00000000e+00])
We can also do the same with A X = B, where A, B, X are now matrices. Here A and B are random 3x3 matrices generated with rand(3,3), and X = solve(A, B); checking the solution:
In [40]: # check
norm(dot(A, X) - B)
Out[40]: 2.0014830212433605e-16
4.6.2
0.33612878+0.j, -0.28229973+0.j])
0.33612878+0.j, -0.28229973+0.j])
In [45]: evecs
Out[45]: array([[-0.20946865, -0.48428024, -0.14392087],
[-0.79978578, 0.8616452 , -0.79527482],
[-0.56255275, 0.15178997, 0.58891829]])
The eigenvector corresponding to the nth eigenvalue (stored in evals[n]) is the nth column in evecs,
i.e., evecs[:,n]. To verify this, let's try multiplying the eigenvectors with the matrix and compare to the product
of the eigenvector and the eigenvalue:
In [46]: n = 1
norm(dot(A, evecs[:,n]) - evals[n] * evecs[:,n])
Out[46]: 3.243515426387745e-16
There are also more specialized eigensolvers, like the eigh for Hermitian matrices.
4.6.3
Matrix operations
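The example cells for this subsection are not reproduced here; a brief sketch (standard scipy.linalg calls on a square array A) of the kind of operations it covers:
from scipy import linalg
A = rand(3, 3)
linalg.inv(A)              # matrix inverse
linalg.det(A)              # determinant
linalg.norm(A, ord=2)      # spectral norm
linalg.norm(A, ord='fro')  # Frobenius norm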
4.6.4
Sparse matrices
Sparse matrices are often useful in numerical simulations dealing with large systems, if the problem can be
described in matrix form where the matrices or vectors mostly contain zeros. Scipy has good support for
sparse matrices, with basic linear algebra operations (such as equation solving, eigenvalue calculations, etc).
There are many possible strategies for storing sparse matrices in an efficient way. Some of the most
common are the so-called coordinate form (COO), list of list (LIL) form, and compressed-sparse column CSC
(and row, CSR). Each format has some advantages and disadvantages. Most computational algorithms
(equation solving, matrix-matrix multiplication, etc) can be efficiently implemented using CSR or CSC
formats, but they are not so intuitive and not so easy to initialize. So often a sparse matrix is initially
created in COO or LIL format (where we can efficiently add elements to the sparse matrix data), and then
converted to CSC or CSR before being used in real calculations.
For more information about these sparse formats, see e.g. http://en.wikipedia.org/wiki/Sparse matrix
When we create a sparse matrix we have to choose which format it should be stored in. For example,
In [50]: from scipy.sparse import *
In [51]: # dense matrix
M = array([[1,0,0,0], [0,3,0,0], [0,1,1,0], [1,0,0,1]]); M
Out[51]: array([[1, 0, 0, 0],
       [0, 3, 0, 0],
       [0, 1, 1, 0],
       [1, 0, 0, 1]])
In [52]: # convert from dense to sparse
A = csr_matrix(M)
In [53]: # convert from sparse to dense
A.todense()
Out[53]: matrix([[1, 0, 0, 0],
        [0, 3, 0, 0],
        [0, 1, 1, 0],
        [1, 0, 0, 1]])
A more efficient way to create sparse matrices is to create an empty matrix and populate it using matrix
indexing (this avoids creating a potentially large dense matrix first):
In [54]: A = lil_matrix((4,4)) # empty 4x4 sparse matrix
A[0,0] = 1
A[1,1] = 3
A[2,2] = A[2,1] = 1
A[3,3] = A[3,0] = 1
A
Out[54]: <4x4 sparse matrix of type '<type 'numpy.float64'>'
with 6 stored elements in LInked List format>
In [55]: A.todense()
Out[55]: matrix([[ 1.,  0.,  0.,  0.],
        [ 0.,  3.,  0.,  0.],
        [ 0.,  1.,  1.,  0.],
        [ 1.,  0.,  0.,  1.]])
In [59]: A.todense()
Out[59]: matrix([[ 1.,  0.,  0.,  0.],
        [ 0.,  3.,  0.,  0.],
        [ 0.,  1.,  1.,  0.],
        [ 1.,  0.,  0.,  1.]])
We can compute with sparse matrices much like with dense ones:
In [60]: (A * A).todense()
Out[60]: matrix([[ 1.,  0.,  0.,  0.],
        [ 0.,  9.,  0.,  0.],
        [ 0.,  4.,  1.,  0.],
        [ 2.,  0.,  0.,  1.]])
In [61]: A.todense()
Out[61]: matrix([[ 1.,  0.,  0.,  0.],
        [ 0.,  3.,  0.,  0.],
        [ 0.,  1.,  1.,  0.],
        [ 1.,  0.,  0.,  1.]])
In [62]: A.dot(A).todense()
Out[62]: matrix([[ 1.,  0.,  0.,  0.],
        [ 0.,  9.,  0.,  0.],
        [ 0.,  4.,  1.,  0.],
        [ 2.,  0.,  0.,  1.]])
In [63]: v = array([1,2,3,4])[:,newaxis]; v
Out[63]: array([[1],
[2],
[3],
[4]])
In [64]: # sparse matrix - dense vector multiplication
A * v
Out[64]: array([[ 1.],
       [ 6.],
       [ 5.],
       [ 5.]])
In [65]: # same result with dense matrix - dense vector multiplication
A.todense() * v
Out[65]: matrix([[ 1.],
        [ 6.],
        [ 5.],
        [ 5.]])
4.7 Optimization
Optimization (finding minima or maxima of a function) is a large field in mathematics, and optimization of complicated functions or in many variables can be rather involved. Here we will only look at a
few very simple cases. For a more detailed introduction to optimization with SciPy see: http://scipylectures.github.com/advanced/mathematical optimization/index.html
To use the optimization module in scipy first include the optimize module:
In [66]: from scipy import optimize
4.7.1
Finding a minima
Let's first look at how to find the minimum of a simple function of a single variable:
In [67]: def f(x):
return 4*x**3 + (x-2)**2 + x**4
In [68]: fig, ax = plt.subplots()
x = linspace(-5, 3, 100)
ax.plot(x, f(x));
We can use the fmin_bfgs function to find the minimum of a function:
In [69]: x_min = optimize.fmin_bfgs(f, -2)
x_min
Optimization terminated successfully.
Current function value: -3.506641
Iterations: 6
Function evaluations: 30
Gradient evaluations: 10
Out[69]: array([-2.67298164])
In [70]: optimize.fmin_bfgs(f, 0.5)
Optimization terminated successfully.
Current function value: 2.804988
Iterations: 3
Function evaluations: 15
Gradient evaluations: 5
4.7.2 Finding a solution to a function
To find the root of a function of the form f(x) = 0 we can use the fsolve function. It requires an initial
guess:
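The example cells themselves are not reproduced here; a small sketch with an assumed equation, tan(x) = 1, whose root near the initial guess is x = pi/4:
def g(x):
    return tan(x) - 1.0              # we look for g(x) = 0

x_root = optimize.fsolve(g, 0.5)     # 0.5 is the initial guess
print x_root                         # close to pi/4 = 0.785398...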
4.8
Interpolation
Interpolation is simple and convenient in scipy: the interp1d function, when given arrays describing X and
Y data, returns an object that behaves like a function that can be called for an arbitrary value of x (in the
range covered by X), and it returns the corresponding interpolated y value:
In [78]: from scipy.interpolate import *
In [79]: def f(x):
return sin(x)
In [80]: n = arange(0, 10)
x = linspace(0, 9, 100)
y_meas = f(n) + 0.1 * randn(len(n)) # simulate measurement with noise
y_real = f(x)
linear_interpolation = interp1d(n, y_meas)
y_interp1 = linear_interpolation(x)
cubic_interpolation = interp1d(n, y_meas, kind='cubic')
y_interp2 = cubic_interpolation(x)
In [81]: fig, ax = plt.subplots(figsize=(10,4))
ax.plot(n, y_meas, 'bs', label='noisy data')
ax.plot(x, y_real, 'k', lw=2, label='true function')
ax.plot(x, y_interp1, 'r', label='linear interp')
ax.plot(x, y_interp2, 'g', label='cubic interp')
ax.legend(loc=3);
4.9
Statistics
The scipy.stats module contains a large number of statistical distributions, statistical functions and tests.
For a complete documentation of its features, see http://docs.scipy.org/doc/scipy/reference/stats.html.
There is also a very powerful python package for statistical modelling called statsmodels. See
http://statsmodels.sourceforge.net for more details.
In [82]: from scipy import stats
In [83]: # create a (discrete) random variable with Poissonian distribution
X = stats.poisson(3.5) # photon distribution for a coherent state with n=3.5 photons
In [84]: n = arange(0,15)
fig, axes = plt.subplots(3,1, sharex=True)
# plot the probability mass function (PMF)
axes[0].step(n, X.pmf(n))
# plot the cumulative distribution function (CDF)
axes[1].step(n, X.cdf(n))
# plot histogram of 1000 random realizations of the stochastic variable X
axes[2].hist(X.rvs(size=1000));
Statistics:
In [87]: X.mean(), X.std(), X.var() # Poisson distribution
Out[87]: (3.5, 1.8708286933869707, 3.5)
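The next cell uses a second random variable Y with a standard normal distribution; a definition consistent with the statistics printed below (an assumption):
Y = stats.norm()   # create a (continuous) random variable with normal distribution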
In [88]: Y.mean(), Y.std(), Y.var() # normal distribution
Out[88]: (0.0, 1.0, 1.0)
4.9.1
Statistical tests
Test if two sets of (independent) random data come from the same distribution:
In [89]: t_statistic, p_value = stats.ttest_ind(X.rvs(size=1000), X.rvs(size=1000))
print "t-statistic =", t_statistic
print "p-value =", p_value
t-statistic = -0.901953297251
p-value = 0.367190391714
Since the p-value is very large, we cannot reject the null hypothesis that the two sets of random data have
the same mean.
To test if the mean of a single sample of data has mean 0.1 (the true mean is 0.0):
In [90]: stats.ttest_1samp(Y.rvs(size=1000), 0.1)
Out[90]: Ttest 1sampResult(statistic=-3.1644288210071765, pvalue=0.0016008455559249511)
Low p-value means that we can reject the hypothesis that the mean of Y is 0.1.
In [91]: Y.mean()
Out[91]: 0.0
In [92]: stats.ttest_1samp(Y.rvs(size=1000), Y.mean())
Out[92]: Ttest 1sampResult(statistic=2.2098772438652992, pvalue=0.027339807364469011)
4.10
Further reading
4.11
Versions
Chapter 5
matplotlib - 2D and 3D plotting in Python
5.1
Introduction
Matplotlib is an excellent 2D and 3D graphics library for generating scientific figures. Some of the many
advantages of this library include:
Easy to get started
Support for LaTeX formatted labels and texts
Great control of every element in a figure, including figure size and DPI
High-quality output in many formats, including PNG, PDF, SVG, EPS, and PGF
GUI for interactively exploring figures, and support for headless generation of figure files (useful for batch jobs)
One of the key features of matplotlib that I would like to emphasize, and that I think makes matplotlib
highly suitable for generating figures for scientific publications is that all aspects of the figure can be controlled
programmatically. This is important for reproducibility and convenient when one needs to regenerate the
figure with updated data or change its appearance.
More information at the Matplotlib web page: http://matplotlib.org/
To get started using Matplotlib in a Python program, either include the symbols from the pylab module
(the easy way):
In [2]: from pylab import *
or import the matplotlib.pyplot module under the name plt (the tidy way):
In [3]: import matplotlib
import matplotlib.pyplot as plt
In [4]: import numpy as np
5.2
MATLAB-like API
The easiest way to get started with plotting using matplotlib is often to use the MATLAB-like API provided
by matplotlib.
It is designed to be compatible with MATLAB's plotting functions, so it is easy to get started with if
you are familiar with MATLAB.
To use this API from matplotlib, we need to include the symbols in the pylab module:
In [5]: from pylab import *
5.2.1
Example
Most of the plotting related functions in MATLAB are covered by the pylab module. For example, subplot
and color/symbol selection:
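The plotting cells below use data arrays x and y defined earlier in the notebook; a definition consistent with the figures (an assumption):
x = linspace(0, 5, 10)
y = x ** 2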
In [8]: subplot(1,2,1)
plot(x, y, 'r--')
subplot(1,2,2)
plot(y, x, 'g*-');
The good thing about the pylab MATLAB-style API is that it is easy to get started with if you are familiar
with MATLAB, and it has a minimum of coding overhead for simple plots.
However, I'd encourage not using the MATLAB compatible API for anything but the simplest figures.
Instead, I recommend learning and using matplotlib's object-oriented plotting API. It is remarkably
powerful. For advanced figures with subplots, insets and other components it is very nice to work with.
5.3 The matplotlib object-oriented API
The main idea with object-oriented programming is to have objects that one can apply functions and actions
on, and no object or program state should be global (as it is in the MATLAB-like API). The real advantage
of this approach becomes apparent when more than one figure is created, or when a figure contains more
than one subplot.
To use the object-oriented API we start out very much like in the previous example, but instead of
creating a new global figure instance we store a reference to the newly created figure instance in the fig
variable, and from it we create a new axis instance axes using the add_axes method in the Figure class
instance fig:
In [9]: fig = plt.figure()
axes = fig.add_axes([0.1, 0.1, 0.8, 0.8]) # left, bottom, width, height (range 0 to 1)
axes.plot(x, y, 'r')
axes.set_xlabel('x')
axes.set_ylabel('y')
axes.set_title('title');
Although a little bit more code is involved, the advantage is that we now have full control of where the plot
axes are placed, and we can easily add more than one axis to the figure:
In [10]: fig = plt.figure()
axes1 = fig.add_axes([0.1, 0.1, 0.8, 0.8]) # main axes
axes2 = fig.add_axes([0.2, 0.5, 0.4, 0.3]) # inset axes
# main figure
axes1.plot(x, y, 'r')
axes1.set_xlabel('x')
axes1.set_ylabel('y')
axes1.set_title('title')
# insert
axes2.plot(y, x, 'g')
axes2.set_xlabel('y')
axes2.set_ylabel('x')
axes2.set_title('insert title');
If we don't care about being explicit about where our plot axes are placed in the figure canvas, then we can
use one of the many axis layout managers in matplotlib. My favorite is subplots, which can be used like
this:
In [11]: fig, axes = plt.subplots()
axes.plot(x, y, 'r')
axes.set_xlabel('x')
axes.set_ylabel('y')
axes.set_title('title');
That was easy, but it isn't so pretty with overlapping figure axes and labels, right?
We can deal with that by using the fig.tight_layout method, which automatically adjusts the positions
of the axes on the figure canvas so that there is no overlapping content:
In [13]: fig, axes = plt.subplots(nrows=1, ncols=2)
for ax in axes:
ax.plot(x, y, 'r')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('title')
fig.tight_layout()
5.3.1 Figure size, aspect ratio and DPI
Matplotlib allows the aspect ratio, DPI and figure size to be specified when the Figure object is created,
using the figsize and dpi keyword arguments. figsize is a tuple of the width and height of the figure in
inches, and dpi is the dots-per-inch (pixels per inch). To create an 800x400 pixel, 100 dots-per-inch figure,
we can do:
In [14]: fig = plt.figure(figsize=(8,4), dpi=100)
<matplotlib.figure.Figure at 0x8065320>
The same arguments can also be passed to layout managers, such as the subplots function:
In [15]: fig, axes = plt.subplots(figsize=(12,3))
axes.plot(x, y, 'r')
axes.set_xlabel('x')
axes.set_ylabel('y')
axes.set_title('title');
5.3.2
Saving figures
To save a figure to a file we can use the savefig method in the Figure class:
In [16]: fig.savefig("filename.png")
Here we can also optionally specify the DPI and choose between different output formats:
In [17]: fig.savefig("filename.png", dpi=200)
What formats are available and which ones should be used for best quality?
Matplotlib can generate high-quality output in a number of formats, including PNG, JPG, EPS, SVG, PGF
and PDF. For scientific papers, I recommend using PDF whenever possible. (LaTeX documents compiled
with pdflatex can include PDFs using the includegraphics command). In some cases, PGF can also be a
good alternative.
5.3.3 Legends, labels and titles
Now that we have covered the basics of how to create a figure canvas and add axes instances to the canvas,
let's look at how to decorate a figure with titles, axis labels, and legends.
Figure titles
A title can be added to each axis instance in a figure. To set the title, use the set_title method in the
axes instance:
In [18]: ax.set_title("title");
Axis labels
Similarly, with the methods set_xlabel and set_ylabel, we can set the labels of the X and Y axes:
In [19]: ax.set_xlabel("x")
ax.set_ylabel("y");
Legends
Legends for curves in a figure can be added in two ways. One method is to use the legend method of
the axis object and pass a list/tuple of legend texts for the previously defined curves:
In [20]: ax.legend(["curve1", "curve2", "curve3"]);
The method described above follows the MATLAB API. It is somewhat prone to errors and inflexible if
curves are added to or removed from the figure (resulting in a wrongly labelled curve).
A better method is to use the label="label text" keyword argument when plots or other objects are
added to the figure, and then using the legend method without arguments to add the legend to the figure:
5.3.4 Formatting text: LaTeX, fontsize, font family
The figure above is functional, but it does not (yet) satisfy the criteria for a figure used in a publication.
First and foremost, we need to have LaTeX formatted text, and second, we need to be able to adjust the
font size to appear right in a publication.
Matplotlib has great support for LaTeX. All we need to do is to use dollar signs to encapsulate LaTeX in
any text (legend, title, label, etc.). For example, "$y=x^3$".
But here we can run into a slightly subtle problem with LaTeX code and Python text strings. In LaTeX,
we frequently use the backslash in commands, for example \alpha to produce the symbol alpha. But the
backslash already has a meaning in Python strings (the escape code character). To avoid Python messing
up our latex code, we need to use raw text strings. Raw text strings are prepended with an r, like
r"\alpha" or r'\alpha' instead of "\alpha" or '\alpha':
In [24]: fig, ax = plt.subplots()
ax.plot(x, x**2, label=r"$y = \alpha^2$")
ax.plot(x, x**3, label=r"$y = \alpha^3$")
ax.legend(loc=2) # upper left corner
ax.set_xlabel(r'$\alpha$', fontsize=18)
ax.set_ylabel(r'$y$', fontsize=18)
ax.set_title('title');
We can also change the global font size and font family, which applies to all text elements in a figure (tick
labels, axis labels and titles, legends, etc.):
In [25]: # Update the matplotlib configuration parameters:
matplotlib.rcParams.update({'font.size': 18, 'font.family': 'serif'})
In [26]: fig, ax = plt.subplots()
Or, alternatively, we can request that matplotlib uses LaTeX to render the text elements in the figure:
In [29]: matplotlib.rcParams.update({'font.size': 18, 'text.usetex': True})
In [30]: fig, ax = plt.subplots()
ax.plot(x, x**2, label=r"$y = \alpha^2$")
ax.plot(x, x**3, label=r"$y = \alpha^3$")
ax.legend(loc=2) # upper left corner
ax.set_xlabel(r'$\alpha$')
ax.set_ylabel(r'$y$')
ax.set_title('title');
In [31]: # restore
matplotlib.rcParams.update({'font.size': 12, 'font.family': 'sans', 'text.usetex': False})
5.3.5
Colors
With matplotlib, we can define the colors of lines and other graphical elements in a number of ways. First
of all, we can use the MATLAB-like syntax where 'b' means blue, 'g' means green, etc. The MATLAB
API for selecting line styles is also supported: for example, 'b.-' means a blue line with dots:
In [32]: # MATLAB style line color and style
ax.plot(x, x**2, 'b.-') # blue line with dots
ax.plot(x, x**3, 'g--') # green dashed line
Out[32]: [<matplotlib.lines.Line2D at 0x96df0b8>]
We can also define colors by their names or RGB hex codes and optionally provide an alpha value using
the color and alpha keyword arguments:
In [33]: fig, ax = plt.subplots()
ax.plot(x, x+1, color="red", alpha=0.5) # half-transparent red
ax.plot(x, x+2, color="#1155dd")
# RGB hex code for a bluish color
ax.plot(x, x+3, color="#15cc55")
# RGB hex code for a greenish color
Out[33]: [<matplotlib.lines.Line2D at 0x6fbc048>]
ax.plot(x, x+1, color="blue", linewidth=0.25)
ax.plot(x, x+2, color="blue", linewidth=0.50)
ax.plot(x, x+3, color="blue", linewidth=1.00)
ax.plot(x, x+4, color="blue", linewidth=2.00)
# possible linestyle options: '-', '--', '-.', ':'
ax.plot(x, x+5, color="red", lw=2, ls='-')
ax.plot(x, x+6, color="red", lw=2, ls='-.')
ax.plot(x, x+7, color="red", lw=2, ls=':')
# custom dash
line, = ax.plot(x, x+8, color="black", lw=1.50)
line.set_dashes([5, 10, 15, 10]) # format: line length, space length, ...
# possible marker symbols: marker = '+', 'o', '*', 's', ',', '.', '1', '2', '3', '4', ...
ax.plot(x, x+ 9, color="green", lw=2, ls='--', marker='+')
ax.plot(x, x+10, color="green", lw=2, ls='--', marker='o')
ax.plot(x, x+11, color="green", lw=2, ls='--', marker='s')
ax.plot(x, x+12, color="green", lw=2, ls='--', marker='1')
5.3.6 Control over axis appearance
The appearance of the axes is an important aspect of a figure that we often need to modify to make
publication-quality graphics. We need to be able to control where the ticks and labels are placed, modify the
font size and possibly the labels used on the axes. In this section we will look at controlling those properties
in a matplotlib figure.
Plot range
The first thing we might want to configure is the ranges of the axes. We can do this using the set_ylim
and set_xlim methods in the axis object, or axis('tight') for automatically getting tightly fitted axes
ranges:
In [35]: fig, axes = plt.subplots(1, 3, figsize=(12, 4))
axes[0].plot(x, x**2, x, x**3)
axes[0].set_title("default axes ranges")
axes[1].plot(x, x**2, x, x**3)
axes[1].axis('tight')
axes[1].set_title("tight axes")
axes[2].plot(x, x**2, x, x**3)
axes[2].set_ylim([0, 60])
axes[2].set_xlim([2, 5])
axes[2].set_title("custom axes range");
Logarithmic scale
It is also possible to set a logarithmic scale for one or both axes. This functionality is in fact only one
application of a more general transformation system in Matplotlib. Each of the axes' scales is set separately
using the set_xscale and set_yscale methods, which accept one parameter (with the value "log" in this case):
In [36]: fig, axes = plt.subplots(1, 2, figsize=(10,4))
axes[0].plot(x, x**2, x, np.exp(x))
axes[0].set_title("Normal scale")
axes[1].plot(x, x**2, x, np.exp(x))
axes[1].set_yscale("log")
axes[1].set_title("Logarithmic scale (y)");
5.3.7 Placement of ticks and custom tick labels
We can explicitly determine where we want the axis ticks with set_xticks and set_yticks, which both
take a list of values for where on the axis the ticks are to be placed. We can also use the set_xticklabels
and set_yticklabels methods to provide a list of custom text labels for each tick location:
In [37]: fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(x, x**2, x, x**3, lw=2)
ax.set_xticks([1, 2, 3, 4, 5])
ax.set_xticklabels([r'$\alpha$', r'$\beta$', r'$\gamma$', r'$\delta$', r'$\epsilon$'], fontsize=18)
yticks = [0, 50, 100, 150]
ax.set_yticks(yticks)
ax.set_yticklabels(["$%.1f$" % y for y in yticks], fontsize=18); # use LaTeX formatted labels
Out[37]: [<matplotlib.text.Text at 0x10a3ae610>,
 <matplotlib.text.Text at 0x10a3aedd0>,
 <matplotlib.text.Text at 0x10a3fe110>,
 <matplotlib.text.Text at 0x10a3fe750>]
There are a number of more advanced methods for controlling major and minor tick placement in matplotlib figures, such as automatic placement according to different policies.
See
http://matplotlib.org/api/ticker api.html for details.
Scientific notation
With large numbers on the axes, it is often better to use scientific notation:
In [38]: fig, ax = plt.subplots(1, 1)
ax.plot(x, x**2, x, np.exp(x))
ax.set_title("scientific notation")
ax.set_yticks([0, 50, 100, 150])
from matplotlib import ticker
formatter = ticker.ScalarFormatter(useMathText=True)
formatter.set_scientific(True)
formatter.set_powerlimits((-1,1))
ax.yaxis.set_major_formatter(formatter)
5.3.8 Axis number and axis label spacing
In [39]: # distance between x and y axis and the numbers on the axes
matplotlib.rcParams['xtick.major.pad'] = 5
matplotlib.rcParams['ytick.major.pad'] = 5
fig, ax = plt.subplots(1, 1)
ax.plot(x, x**2, x, np.exp(x))
ax.set_yticks([0, 50, 100, 150])
ax.set_title("label and axis spacing")
# padding between axis label and axis numbers
ax.xaxis.labelpad = 5
ax.yaxis.labelpad = 5
ax.set_xlabel("x")
ax.set_ylabel("y");
5.3.9
Axis grid
With the grid method in the axis object, we can turn on and off grid lines. We can also customize the
appearance of the grid lines using the same keyword arguments as the plot function:
In [42]: fig, axes = plt.subplots(1, 2, figsize=(10,3))
# default grid appearance
axes[0].plot(x, x**2, x, x**3, lw=2)
axes[0].grid(True)
# custom grid appearance
axes[1].plot(x, x**2, x, x**3, lw=2)
axes[1].grid(color='b', alpha=0.5, linestyle='dashed', linewidth=0.5)
5.3.10
Axis spines
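The original example for this subsection is not reproduced here; a minimal sketch of what spine customization typically looks like (colors and which spines are modified are assumptions):
fig, ax = plt.subplots(figsize=(6,2))

ax.spines['bottom'].set_color('blue')   # color the bottom spine
ax.spines['top'].set_color('blue')
ax.spines['left'].set_color('red')
ax.spines['left'].set_linewidth(2)

# turn off the right spine and keep ticks only on the left
ax.spines['right'].set_color("none")
ax.yaxis.tick_left()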
5.3.11
Twin axes
Sometimes it is useful to have dual x or y axes in a figure; for example, when plotting curves with different
units together. Matplotlib supports this with the twinx and twiny functions:
In [44]: fig, ax1 = plt.subplots()
ax1.plot(x, x**2, lw=2, color="blue")
ax1.set_ylabel(r"area $(m^2)$", fontsize=18, color="blue")
for label in ax1.get_yticklabels():
label.set_color("blue")
ax2 = ax1.twinx()
ax2.plot(x, x**3, lw=2, color="red")
ax2.set_ylabel(r"volume $(m^3)$", fontsize=18, color="red")
for label in ax2.get_yticklabels():
label.set_color("red")
5.3.12 Axes where x and y is zero
5.3.13 Other 2D plot styles
In addition to the regular plot method, there are a number of other functions for generating different kinds of plots. See the matplotlib plot gallery for a complete list of available plot types:
http://matplotlib.org/gallery.html. Some of the more useful ones are shown below:
In [46]: n = np.array([0,1,2,3,4,5])
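The scatter example below uses an array xx that was defined earlier in the notebook; one definition consistent with the later plots (an assumption):
xx = np.linspace(-0.75, 1., 100)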
In [47]: fig, axes = plt.subplots(1, 4, figsize=(12,3))
axes[0].scatter(xx, xx + 0.25*np.random.randn(len(xx)))
axes[0].set_title("scatter")
axes[1].step(n, n**2, lw=2)
axes[1].set_title("step")
axes[2].bar(n, n**2, align="center", width=0.5, alpha=0.5)
axes[2].set_title("bar")
axes[3].fill_between(x, x**2, x**3, color="green", alpha=0.5);
axes[3].set_title("fill_between");
In [49]: # A histogram
n = np.random.randn(100000)
fig, axes = plt.subplots(1, 2, figsize=(12,4))
axes[0].hist(n)
axes[0].set_title("Default histogram")
axes[0].set_xlim((min(n), max(n)))
axes[1].hist(n, cumulative=True, bins=50)
axes[1].set_title("Cumulative detailed histogram")
axes[1].set_xlim((min(n), max(n)));
5.3.14
Text annotation
Annotating text in matplotlib figures can be done using the text function. It supports LaTeX formatting
just like axis label texts and titles:
In [50]: fig, ax = plt.subplots()
ax.plot(xx, xx**2, xx, xx**3)
ax.text(0.15, 0.2, r"$y=x^2$", fontsize=20, color="blue")
ax.text(0.65, 0.1, r"$y=x^3$", fontsize=20, color="green");
5.3.15 Figures with multiple subplots and insets
Axes can be added to a matplotlib Figure canvas manually using fig.add_axes or using a sub-figure layout
manager such as subplots, subplot2grid, or gridspec:
subplots
In [51]: fig, ax = plt.subplots(2, 3)
fig.tight_layout()
subplot2grid
In [52]: fig = plt.figure()
ax1 = plt.subplot2grid((3,3), (0,0), colspan=3)
ax2 = plt.subplot2grid((3,3), (1,0), colspan=2)
ax3 = plt.subplot2grid((3,3), (1,2), rowspan=2)
ax4 = plt.subplot2grid((3,3), (2,0))
ax5 = plt.subplot2grid((3,3), (2,1))
fig.tight_layout()
gridspec
In [53]: import matplotlib.gridspec as gridspec
In [54]: fig = plt.figure()
gs = gridspec.GridSpec(2, 3, height_ratios=[2,1], width_ratios=[1,2,1])
for g in gs:
ax = fig.add_subplot(g)
fig.tight_layout()
add_axes
Manually adding axes with add_axes is useful for adding insets to figures:
In [55]: fig, ax = plt.subplots()
ax.plot(xx, xx**2, xx, xx**3)
fig.tight_layout()
# inset
inset_ax = fig.add_axes([0.2, 0.55, 0.35, 0.35]) # X, Y, width, height
inset_ax.plot(xx, xx**2, xx, xx**3)
inset_ax.set_title('zoom near origin')
# set axis range
inset_ax.set_xlim(-.2, .2)
inset_ax.set_ylim(-.005, .01)
# set axis tick locations
inset_ax.set_yticks([0, 0.005, 0.01])
inset_ax.set_xticks([-0.1,0,.1]);
5.3.16 Colormap and contour figures
Colormaps and contour figures are useful for plotting functions of two variables. In most of these functions we will use a colormap to encode one dimension of the data. There are a number of predefined
colormaps, and it is relatively straightforward to define custom colormaps. For a list of pre-defined colormaps,
see: http://www.scipy.org/Cookbook/Matplotlib/Show colormaps
In [56]: alpha = 0.7
phi_ext = 2 * np.pi * 0.5
def flux_qubit_potential(phi_m, phi_p):
return 2 + alpha - 2 * np.cos(phi_p) * np.cos(phi_m) - alpha * np.cos(phi_ext - 2*phi_p)
In [57]: phi_m = np.linspace(0, 2*np.pi, 100)
phi_p = np.linspace(0, 2*np.pi, 100)
X,Y = np.meshgrid(phi_p, phi_m)
Z = flux_qubit_potential(X, Y).T
pcolor
In [58]: fig, ax = plt.subplots()
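The pcolor call itself is not shown above; a sketch of a typical use with the X, Y, Z arrays defined earlier (the axis normalization and colormap are assumptions):
p = ax.pcolor(X/(2*np.pi), Y/(2*np.pi), Z, cmap=matplotlib.cm.RdBu, vmin=abs(Z).min(), vmax=abs(Z).max())
cb = fig.colorbar(p, ax=ax)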
imshow
In [59]: fig, ax = plt.subplots()
im = ax.imshow(Z, cmap=matplotlib.cm.RdBu, vmin=abs(Z).min(), vmax=abs(Z).max(), extent=[0, 1,
im.set_interpolation('bilinear')
cb = fig.colorbar(im, ax=ax)
contour
In [60]: fig, ax = plt.subplots()
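Likewise, a sketch of a contour plot of the same Z data (the level count, colormap and extent are assumptions):
cnt = ax.contour(Z, cmap=matplotlib.cm.RdBu, vmin=abs(Z).min(), vmax=abs(Z).max(), extent=[0, 1, 0, 1])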
5.4
3D figures
To use 3D graphics in matplotlib, we first need to create an instance of the Axes3D class. 3D axes can be
added to a matplotlib figure canvas in exactly the same way as 2D axes; or, more conveniently, by passing
a projection='3d' keyword argument to the add_axes or add_subplot methods.
In [61]: from mpl_toolkits.mplot3d.axes3d import Axes3D
Surface plots
In [62]: fig = plt.figure(figsize=(14,6))
# `ax` is a 3D-aware axis instance because of the projection='3d' keyword argument to add_subpl
ax = fig.add_subplot(1, 2, 1, projection='3d')
p = ax.plot_surface(X, Y, Z, rstride=4, cstride=4, linewidth=0)
Wire-frame plot
In [63]: fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(1, 1, 1, projection='3d')
p = ax.plot_wireframe(X, Y, Z, rstride=4, cstride=4)
5.4.1
Animations
Matplotlib also includes a simple API for generating animations for sequences of figures. With the
FuncAnimation function we can generate a movie file from sequences of figures. The function takes the
following arguments: fig, a figure canvas, func, a function that we provide which updates the figure,
init_func, a function we provide to set up the figure, frames, the number of frames to generate, and blit,
which tells the animation function to only update parts of the frame which have changed (for smoother
animations):
def init():
    # setup figure

def update(frame_counter):
    # update figure for new frame

anim = animation.FuncAnimation(fig, update, init_func=init, frames=200, blit=True)
anim.save('animation.mp4', fps=30) # fps = frames per second
To use the animation features in matplotlib we first need to import the module matplotlib.animation:
In [66]: from matplotlib import animation
In [67]: # solve the ode problem of the double compound pendulum again
from scipy.integrate import odeint
from numpy import cos, sin
g = 9.82; L = 0.5; m = 0.1
def dx(x, t):
x1, x2, x3, x4 = x[0], x[1], x[2], x[3]
dx1 = 6.0/(m*L**2) * (2 * x3 - 3 * cos(x1-x2) * x4)/(16 - 9 * cos(x1-x2)**2)
# anim.save can be called in a few different ways, some which might or might not work
# on different platforms and with different versions of matplotlib and video encoders
#anim.save('animation.mp4', fps=20, extra_args=['-vcodec', 'libx264'], writer=animation.FFMpegW
#anim.save('animation.mp4', fps=20, extra_args=['-vcodec', 'libx264'])
#anim.save('animation.mp4', fps=20, writer="ffmpeg", codec="libx264")
anim.save('animation.mp4', fps=20, writer="avconv", codec="libx264")
plt.close(fig)
Note: To generate the movie file we need to have either ffmpeg or avconv installed. Install it on Ubuntu
using:
$ sudo apt-get install ffmpeg
or (newer versions)
$ sudo apt-get install libav-tools
On MacOSX, try:
5.4.2
Backends
Matplotlib has a number of backends which are responsible for rendering graphs. The different backends
are able to generate graphics with different formats and display/event loops. There is a distinction between
noninteractive backends (such as agg, svg, pdf, etc.) that are only used to generate image files (e.g. with
the savefig function), and interactive backends (such as Qt4Agg, GTK, MaxOSX) that can display a GUI
window for interactively exploring figures.
A list of available backends are:
In [70]: print(matplotlib.rcsetup.all_backends)
[u'GTK', u'GTKAgg', u'GTKCairo', u'MacOSX', u'Qt4Agg', u'Qt5Agg', u'TkAgg', u'WX', u'WXAgg', u'CocoaAgg'
The default backend, called agg, is based on a library for raster graphics which is great for generating
raster formats like PNG.
Normally we don't need to bother with changing the default backend; but sometimes it can be useful to
switch to, for example, PDF or GTKCairo (if you are using Linux) to produce high-quality vector graphics
instead of raster based graphics.
Generating SVG with the svg backend
In [1]: #
# RESTART THE NOTEBOOK: the matplotlib backend can only be selected before pylab is imported!
# (e.g. Kernel > Restart)
#
import matplotlib
matplotlib.use('svg')
import matplotlib.pylab as plt
import numpy
from IPython.display import Image, SVG
In [2]: #
# Now we are using the svg backend to produce SVG vector graphics
#
fig, ax = plt.subplots()
t = numpy.linspace(0, 10, 100)
ax.plot(t, numpy.cos(t)*numpy.sin(t))
plt.savefig("test.svg")
In [3]: #
# Show the produced SVG file.
#
SVG(filename="test.svg")
Out[3]:
5.5
Further reading
5.6
Versions
Chapter 6
Sympy - Symbolic algebra in Python
6.1
Introduction
There are two notable Computer Algebra Systems (CAS) for Python:
SymPy - A python module that can be used in any Python program, or in an IPython session, that
provides powerful CAS features.
Sage - Sage is a full-featured and very powerful CAS environment that aims to provide an open source
system that competes with Mathematica and Maple. Sage is not a regular Python module, but rather
a CAS environment that uses Python as its programming language.
Sage is in some aspects more powerful than SymPy, but both offer very comprehensive CAS functionality.
The advantage of SymPy is that it is a regular Python module and integrates well with the IPython notebook.
In this lecture we will therefore look at how to use SymPy with IPython notebooks. If you are interested
in an open source CAS environment I also recommend reading more about Sage.
To get started using SymPy in a Python program or notebook, import the module sympy:
In [2]: from sympy import *
To get nice-looking LaTeX formatted output run:
In [3]: init_printing()
# or with older versions of sympy/ipython, load the IPython extension
#%load_ext sympy.interactive.ipythonprinting
# or
#%load_ext sympyprinting
6.2
Symbolic variables
In SymPy we need to create symbols for the variables we want to work with. We can create a new symbol
using the Symbol class:
In [4]: x = Symbol('x')
In [5]: (pi + x)**2
Out[5]:
In [6]: # alternative way of defining symbols
a, b, c = symbols("a, b, c")
In [7]: type(a)
Out[7]: sympy.core.symbol.Symbol
We can add assumptions to symbols when we create them:
In [8]: x = Symbol('x', real=True)
In [9]: x.is_imaginary
Out[9]: False
In [10]: x = Symbol('x', positive=True)
In [11]: x > 0
Out[11]:
6.2.1
Complex numbers
6.2.2
Rational numbers
There are three different numerical types in SymPy: Real, Rational, Integer:
In [15]: r1 = Rational(4,5)
r2 = Rational(5,4)
In [16]: r1
Out[16]:
In [17]: r1+r2
Out[17]:
In [18]: r1/r2
Out[18]:
6.3
Numerical evaluation
SymPy uses a library for arbitrary precision arithmetic as its numerical backend, and has predefined SymPy expressions
for a number of mathematical constants, such as: pi, e, oo for infinity.
To evaluate an expression numerically we can use the evalf function (or N). It takes an argument n which
specifies the number of significant digits.
In [19]: pi.evalf(n=50)
Out[19]:
In [20]: y = (x + pi)**2
In [21]: N(y, 5) # same as evalf
Out[21]:
When we numerically evaluate algebraic expressions we often want to substitute a symbol with a
numerical value. In SymPy we do that using the subs function:
In [22]: y.subs(x, 1.5)
Out[22]:
In [23]: N(y.subs(x, 1.5))
Out[23]:
The subs function can of course also be used to substitute Symbols and expressions:
In [24]: y.subs(x, a+pi)
Out[24]:
We can also combine numerical evaluation of expressions with NumPy arrays:
In [25]: import numpy
In [26]: x_vec = numpy.arange(0, 10, 0.1)
In [27]: y_vec = numpy.array([N(((x + pi)**2).subs(x, xx)) for xx in x_vec])
In [28]: fig, ax = plt.subplots()
ax.plot(x_vec, y_vec);
However, this kind of numerical evaluation can be very slow, and there is a much more efficient way to do it:
use the function lambdify to compile a SymPy expression into a function that is much more efficient to
evaluate numerically:
In [29]: f = lambdify([x], (x + pi)**2, 'numpy')
In [30]: y_vec = f(x_vec)
# now we can directly pass a numpy array and f(x) is efficiently evaluated
The speedup when using lambdified functions instead of direct numerical evaluation can be significant,
often several orders of magnitude. Even in this simple example we get a significant speed up:
In [31]: %%timeit
y_vec = numpy.array([N(((x + pi)**2).subs(x, xx)) for xx in x_vec])
10 loops, best of 3: 28.2 ms per loop
In [32]: %%timeit
y_vec = f(x_vec)
The slowest run took 8.86 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.93 µs per loop
6.4
Algebraic manipulations
One of the main uses of a CAS is to perform algebraic manipulations of expressions. For example, we might
want to expand a product, factor an expression, or simplify an expression. The functions for doing these basic
operations in SymPy are demonstrated in this section.
6.4.1 Expand and factor
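The worked cells for this subsection are not reproduced here; a short sketch of the two functions (standard SymPy calls):
expand((x+1)*(x+2)*(x+3))         # -> x**3 + 6*x**2 + 11*x + 6
factor(x**3 + 6*x**2 + 11*x + 6)  # -> (x + 1)*(x + 2)*(x + 3)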
6.4.2
Simplify
The simplify function tries to simplify an expression into a nice-looking form, using various techniques. More
specific alternatives to simplify also exist: trigsimp, powsimp, logcombine, etc.
The basic usage of these functions is as follows:
In [38]: # simplify expands a product
simplify((x+1)*(x+2)*(x+3))
Out[38]:
In [39]: # simplify uses trigonometric identities
simplify(sin(a)**2 + cos(a)**2)
Out[39]:
In [40]: simplify(cos(x)/sin(x))
Out[40]:
6.4.3 apart and together
To manipulate symbolic expressions of fractions, we can use the apart and together functions:
In [41]: f1 = 1/((a+1)*(a+2))
In [42]: f1
Out[42]:
In [43]: apart(f1)
Out[43]:
In [44]: f2 = 1/(a+2) + 1/(a+3)
In [45]: f2
Out[45]:
In [46]: together(f2)
Out[46]:
Simplify usually combines fractions but does not factor:
In [47]: simplify(f2)
Out[47]:
6.5
Calculus
In addition to algebraic manipulations, the other main use of CAS is to do calculus, like derivatives and
integrals of algebraic expressions.
6.5.1
Differentiation
Differentiation is usually simple. Use the diff function. The first argument is the expression to take the
derivative of, and the second argument is the symbol by which to take the derivative:
In [48]: y
Out[48]:
In [49]: diff(y**2, x)
Out[49]:
For higher order derivatives we can do:
In [50]: diff(y**2, x, x)
Out[50]:
In [51]: diff(y**2, x, 2) # same as above
Out[51]:
To calculate the derivative of a multivariate expression, we can do:
In [52]: x, y, z = symbols("x,y,z")
In [53]: f = sin(x*y) + cos(y*z)
$\frac{d^3 f}{dx\, dy^2}$
In [54]: diff(f, x, 1, y, 2)
Out[54]:
6.6 Integration
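The worked examples for integration are not reproduced here; a minimal sketch of SymPy's integrate function (the integrands are illustrative assumptions):
integrate(6*x**5, x)                  # indefinite integral -> x**6
integrate(sin(x), (x, -1, 1))         # definite integral   -> 0
integrate(exp(-x**2), (x, -oo, oo))   # improper integral   -> sqrt(pi)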
6.6.1
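The integration cells from the original notebook are not reproduced in this extract. As a minimal hedged
sketch of SymPy's integrate function (both indefinite and definite forms; expected results shown as
comments):

    from sympy import symbols, integrate, sin, exp, oo

    x, y = symbols("x, y")

    integrate(6*x**5, x)                 # x**6, an indefinite integral (no integration constant)
    integrate(sin(x*y), x)               # -cos(x*y)/y, other symbols are treated as constants
    integrate(exp(-x**2), (x, -oo, oo))  # sqrt(pi), a definite integral over the whole real line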
6.7 Limits
$$ \frac{d f(x, y)}{dx} = \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h} $$
In [67]: h = Symbol("h")
In [68]: limit((f.subs(x, x+h) - f)/h, h, 0)
Out[68]: OK!
We can change the direction from which we approach the limiting point using the dir keyword argument:
In [69]: limit(1/x, x, 0, dir="+")
Out[69]:
In [70]: limit(1/x, x, 0, dir="-")
Out[70]:
6.8
Series
Series expansion is also one of the most useful features of a CAS. In SymPy we can perform a series expansion
of an expression using the series function:
In [71]: series(exp(x), x)
Out[71]:
By default it expands the expression around x = 0, but we can expand around any value of x by
explicitly including a value in the function call:
In [72]: series(exp(x), x, 1)
Out[72]:
And we can explicitly define to which order the series expansion should be carried out:
In [73]: series(exp(x), x, 1, 10)
Out[73]:
The series expansion includes the order of the approximation, which is very useful for keeping
track of the order of validity when we do calculations with series expansions of different order
(see the sketch after the two expansions below):
In [74]: s1 = cos(x).series(x, 0, 5)
s1
Out[74]:
In [75]: s2 = sin(x).series(x, 0, 2)
s2
Out[75]:
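The cell that combines the two expansions is not included above. A hedged sketch of how the order term
keeps track of the validity of the result when series of different order are multiplied (not from the original
notebook; expected results shown as comments):

    from sympy import symbols, sin, cos, expand

    x = symbols("x")

    s1 = cos(x).series(x, 0, 5)   # 1 - x**2/2 + x**4/24 + O(x**5)
    s2 = sin(x).series(x, 0, 2)   # x + O(x**2)

    expand(s1 * s2)               # x + O(x**2): the product is only valid to the lowest order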
6.9 Linear algebra
6.9.1 Matrices
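The matrix examples from the original notebook are not reproduced in this extract. A minimal hedged
sketch of SymPy's Matrix class with symbolic entries (the symbol names are chosen for illustration only):

    from sympy import symbols, Matrix

    m11, m12, m21, m22, b1, b2 = symbols("m11, m12, m21, m22, b1, b2")

    A = Matrix([[m11, m12], [m21, m22]])
    b = Matrix([[b1], [b2]])

    A.det()       # m11*m22 - m12*m21
    A.inv() * b   # symbolic solution of the linear system A*x = b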
6.10 Solving equations
For solving equations and systems of equations we can use the solve function:
In [86]: solve(x**2 - 1, x)
Out[86]:
In [87]: solve(x**4 - x**2 - 1, x)
Out[87]:
System of equations:
In [88]: solve([x + y - 1, x - y - 1], [x,y])
Out[88]:
In terms of other symbolic expressions:
In [89]: solve([x + y - a, x - y - c], [x,y])
Out[89]:
6.11 Further reading
6.12 Versions
Chapter 7
7.1 Fortran
7.1.1 F2PY
F2PY is a program that (almost) automatically wraps fortran code for use in Python: By using the f2py
program we can compile fortran code into a module that we can import in a Python program.
F2PY is part of NumPy, but you will also need a fortran compiler to run the examples below.
7.1.2
        subroutine hellofortran(n)
        integer n
        do 100 i=0, n
            print *, "Fortran says hello"
100     continue
        end
Overwriting hellofortran.f
Generate a python module using f2py:
In [4]: !f2py -c -m hellofortran hellofortran.f
running build
running config cc
unifing config cc, config, build clib, build ext, build commands --compiler options
running config fc
unifing config fc, config, build clib, build ext, build commands --fcompiler options
running build src
build src
building extension "hellofortran" sources
f2py options: []
f2py:> /tmp/tmpz2IPjB/src.linux-x86 64-2.7/hellofortranmodule.c
creating /tmp/tmpz2IPjB/src.linux-x86 64-2.7
Reading fortran codes...
Reading file 'hellofortran.f' (format:fix,strict)
Post-processing...
Block: hellofortran
Block: hellofortran
Post-processing (stage 2)...
Building modules...
Building module "hellofortran"...
Constructing wrapper function "hellofortran"...
hellofortran(n)
Wrote C/API module "hellofortran" to file "/tmp/tmpz2IPjB/src.linux-x86 64-2.7/hellofortranmodul
adding '/tmp/tmpz2IPjB/src.linux-x86 64-2.7/fortranobject.c' to sources.
adding '/tmp/tmpz2IPjB/src.linux-x86 64-2.7' to include dirs.
copying /usr/lib/python2.7/dist-packages/numpy/f2py/src/fortranobject.c -> /tmp/tmpz2IPjB/src.linux-x86
copying /usr/lib/python2.7/dist-packages/numpy/f2py/src/fortranobject.h -> /tmp/tmpz2IPjB/src.linux-x86
build src: building npy-pkg config files
running build ext
customize UnixCCompiler
customize UnixCCompiler using build ext
customize Gnu95FCompiler
Found executable /usr/bin/gfortran
customize Gnu95FCompiler
customize Gnu95FCompiler using build ext
building 'hellofortran' extension
compiling C sources
C compiler: x86 64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-pr
creating /tmp/tmpz2IPjB/tmp
creating /tmp/tmpz2IPjB/tmp/tmpz2IPjB
creating /tmp/tmpz2IPjB/tmp/tmpz2IPjB/src.linux-x86 64-2.7
compile options: '-I/tmp/tmpz2IPjB/src.linux-x86 64-2.7 -I/usr/lib/python2.7/dist-packages/numpy/core/in
x86 64-linux-gnu-gcc: /tmp/tmpz2IPjB/src.linux-x86 64-2.7/hellofortranmodule.c
7.1.3
Fortran says hello
Fortran says hello
Fortran says hello
Fortran says hello
Fortran says hello
Fortran says hello
        subroutine dprod(x, y, n)
        double precision x(n), y
        y = 1.0
        do 100 i=1, n
            y = y * x(i)
100     continue
        end
Overwriting dprod.f
running build
running config cc
unifing config cc, config, build clib, build ext, build commands --compiler options
running config fc
unifing config fc, config, build clib, build ext, build commands --fcompiler options
running build src
build src
building extension "dprod" sources
creating /tmp/tmpWyCvx1/src.linux-x86 64-2.7
f2py options: []
f2py: dprod.pyf
Reading fortran codes...
Reading file 'dprod.pyf' (format:free)
Post-processing...
Block: dprod
Block: dprod
Post-processing (stage 2)...
Building modules...
Building module "dprod"...
Constructing wrapper function "dprod"...
y = dprod(x,[n])
Wrote C/API module "dprod" to file "/tmp/tmpWyCvx1/src.linux-x86 64-2.7/dprodmodule.c"
adding '/tmp/tmpWyCvx1/src.linux-x86 64-2.7/fortranobject.c' to sources.
adding '/tmp/tmpWyCvx1/src.linux-x86 64-2.7' to include dirs.
copying /usr/lib/python2.7/dist-packages/numpy/f2py/src/fortranobject.c -> /tmp/tmpWyCvx1/src.linux-x86
copying /usr/lib/python2.7/dist-packages/numpy/f2py/src/fortranobject.h -> /tmp/tmpWyCvx1/src.linux-x86
build src: building npy-pkg config files
running build ext
customize UnixCCompiler
customize UnixCCompiler using build ext
customize Gnu95FCompiler
Found executable /usr/bin/gfortran
customize Gnu95FCompiler
customize Gnu95FCompiler using build ext
building 'dprod' extension
compiling C sources
C compiler: x86 64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-pr
creating /tmp/tmpWyCvx1/tmp
creating /tmp/tmpWyCvx1/tmp/tmpWyCvx1
creating /tmp/tmpWyCvx1/tmp/tmpWyCvx1/src.linux-x86 64-2.7
compile options: '-I/tmp/tmpWyCvx1/src.linux-x86 64-2.7 -I/usr/lib/python2.7/dist-packages/numpy/core/in
x86 64-linux-gnu-gcc: /tmp/tmpWyCvx1/src.linux-x86 64-2.7/dprodmodule.c
In [14]: dprod.dprod(arange(1,50))
Out[14]: 6.082818640342675e+62
In [15]: # compare to numpy
prod(arange(1.0,50.0))
Out[15]: 6.0828186403426752e+62
In [16]: dprod.dprod(arange(1,10), 5) # only the first 5 elements
Out[16]: 120.0
Compare performance:
In [17]: xvec = rand(500)
In [18]: timeit dprod.dprod(xvec)
1000000 loops, best of 3: 882 ns per loop
In [19]: timeit xvec.prod()
100000 loops, best of 3: 4.45 µs per loop
7.1.4
The cumulative sum of an array of data is a good example of a loop-intensive algorithm: loop
through a vector and store the cumulative sum in another vector.
In [20]: # simple python algorithm: example of a SLOW implementation
# Why? Because the loop is implemented in python.
def py_dcumsum(a):
    b = empty_like(a)
    b[0] = a[0]
    for n in range(1,len(a)):
        b[n] = b[n-1]+a[n]
    return b
Fortran subroutine for the same thing: here we have added the intent(in) and intent(out) as comment
lines in the original fortran code, so we do not need to manually edit the fortran module declaration file
generated by f2py.
In [21]: %%file dcumsum.f
c File dcumsum.f
        subroutine dcumsum(a, b, n)
        double precision a(n)
        double precision b(n)
        integer n
cf2py   intent(in) :: a
cf2py   intent(out) :: b
cf2py   intent(hide) :: n

        b(1) = a(1)
        do 100 i=2, n
            b(i) = b(i-1) + a(i)
100     continue
        end
Overwriting dcumsum.f
We can directly compile the fortran code to a python module:
In [22]: !f2py -c dcumsum.f -m dcumsum
running build
running config cc
unifing config cc, config, build clib, build ext, build commands --compiler options
running config fc
unifing config fc, config, build clib, build ext, build commands --fcompiler options
running build src
build src
building extension "dcumsum" sources
f2py options: []
f2py:> /tmp/tmpfvrMl6/src.linux-x86 64-2.7/dcumsummodule.c
creating /tmp/tmpfvrMl6/src.linux-x86 64-2.7
Reading fortran codes...
Reading file 'dcumsum.f' (format:fix,strict)
Post-processing...
Block: dcumsum
Block: dcumsum
Post-processing (stage 2)...
Building modules...
Building module "dcumsum"...
Constructing wrapper function "dcumsum"...
b = dcumsum(a)
Wrote C/API module "dcumsum" to file "/tmp/tmpfvrMl6/src.linux-x86 64-2.7/dcumsummodule.c"
adding '/tmp/tmpfvrMl6/src.linux-x86 64-2.7/fortranobject.c' to sources.
adding '/tmp/tmpfvrMl6/src.linux-x86 64-2.7' to include dirs.
copying /usr/lib/python2.7/dist-packages/numpy/f2py/src/fortranobject.c -> /tmp/tmpfvrMl6/src.linux-x86
copying /usr/lib/python2.7/dist-packages/numpy/f2py/src/fortranobject.h -> /tmp/tmpfvrMl6/src.linux-x86
build src: building npy-pkg config files
running build ext
customize UnixCCompiler
customize UnixCCompiler using build ext
customize Gnu95FCompiler
Found executable /usr/bin/gfortran
customize Gnu95FCompiler
customize Gnu95FCompiler using build ext
building 'dcumsum' extension
compiling C sources
C compiler: x86 64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-pr
creating /tmp/tmpfvrMl6/tmp
creating /tmp/tmpfvrMl6/tmp/tmpfvrMl6
In [26]: dcumsum.dcumsum(a)
Out[26]: array([  1.,   3.,   6.,  10.,  15.,  21.,  28.,  36.])
In [27]: cumsum(a)
Out[27]: array([  1.,   3.,   6.,  10.,  15.,  21.,  28.,  36.])
7.1.5 Further reading
1. http://www.scipy.org/F2py
2. http://dsnra.jpl.nasa.gov/software/Python/F2PY_tutorial.pdf
3. http://www.shocksolution.com/2009/09/f2py-binding-fortran-python/
7.2
7.3 ctypes
ctypes is a Python library for calling out to C code. It is not as automatic as f2py, and we need to manually
load the library and set properties such as the function's return and argument types. On the other hand,
we do not need to touch the C code at all.
In [32]: %%file functions.c
#include <stdio.h>
void hello(int n);
double dprod(double *x, int n);
void dcumsum(double *a, double *b, int n);
void
hello(int n)
{
    int i;

    for (i = 0; i < n; i++)
    {
        printf("C says hello\n");
    }
}

double
dprod(double *x, int n)
{
    int i;
    double y = 1.0;

    for (i = 0; i < n; i++)
    {
        y *= x[i];
    }

    return y;
}

void
dcumsum(double *a, double *b, int n)
{
    int i;

    b[0] = a[0];
    for (i = 1; i < n; i++)
    {
        b[i] = a[i] + b[i-1];
    }
}
Overwriting functions.c
Compile the C file into a shared library:
In [33]: !gcc -c -Wall -O2 -Wall -ansi -pedantic -fPIC -o functions.o functions.c
!gcc -o libfunctions.so -shared functions.o
The result is a compiled shared library libfunctions.so:
In [34]: !file libfunctions.so
libfunctions.so: ELF 64-bit LSB
Now we need to write wrapper functions to access the C library: to load the library we use the ctypes
package, which is included in the Python standard library (with extensions from numpy for passing arrays to
C). Then we manually set the types of the argument and return values (no automatic code inspection here!).
In [35]: %%file functions.py
import numpy
import ctypes
_libfunctions = numpy.ctypeslib.load_library('libfunctions', '.')
_libfunctions.hello.argtypes = [ctypes.c_int]
_libfunctions.hello.restype = ctypes.c_void_p
_libfunctions.dprod.argtypes = [numpy.ctypeslib.ndpointer(dtype=numpy.float), ctypes.c_int]
_libfunctions.dprod.restype = ctypes.c_double
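The remainder of functions.py is cut off in this extract; it presumably declares the dcumsum signature and
defines the thin Python wrappers (functions.hello, functions.dprod, functions.dcumsum) that are called
below. A hedged sketch of what that missing part could look like (names chosen to match the calls below;
the details are assumptions, not the original file):

    # not from the original file: a sketch of the truncated part of functions.py
    _libfunctions.dcumsum.argtypes = [numpy.ctypeslib.ndpointer(dtype=numpy.float),
                                      numpy.ctypeslib.ndpointer(dtype=numpy.float),
                                      ctypes.c_int]
    _libfunctions.dcumsum.restype = ctypes.c_void_p

    def hello(n):
        return _libfunctions.hello(int(n))

    def dprod(x, n=None):
        if n is None:
            n = len(x)
        x = numpy.asarray(x, dtype=numpy.float)
        return _libfunctions.dprod(x, int(n))

    def dcumsum(a, n):
        a = numpy.asarray(a, dtype=numpy.float)
        b = numpy.empty(len(a), dtype=numpy.float)
        _libfunctions.dcumsum(a, b, int(n))
        return b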
7.3.1
Product function:
In [39]: functions.dprod([1,2,3,4,5])
Out[39]: 120.0
7.3.2
Cumulative sum:
In [40]: a = rand(100000)
In [41]: res_c = functions.dcumsum(a, len(a))
In [42]: res_fortran = dcumsum.dcumsum(a)
In [43]: res_c - res_fortran
Out[43]: array([ 0.,  0.,  0., ...,  0.,  0.,  0.])
7.3.3 Simple benchmark
7.3.4 Further reading
http://docs.python.org/2/library/ctypes.html
http://www.scipy.org/Cookbook/Ctypes
7.4 Cython
Cython is a hybrid between Python and C that can be compiled: basically Python code with type declarations.
In [47]: %%file cy_dcumsum.pyx
cimport numpy
x86 64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fP
In file included from /usr/include/python2.7/numpy/ndarraytypes.h:1761:0,
from /usr/include/python2.7/numpy/ndarrayobject.h:17,
from /usr/include/python2.7/numpy/arrayobject.h:4,
from cy dcumsum.c:352:
/usr/include/python2.7/numpy/npy 1 7 deprecated api.h:15:2: warning: #warning "Using deprecated NumPy API
#warning "Using deprecated NumPy API, disable it by " \
^
In file included from /usr/include/python2.7/numpy/ndarrayobject.h:26:0,
from /usr/include/python2.7/numpy/arrayobject.h:4,
from cy dcumsum.c:352:
/usr/include/python2.7/numpy/ multiarray api.h:1629:1: warning: import array defined but not used [-W
import array(void)
^
In file included from /usr/include/python2.7/numpy/ufuncobject.h:327:0,
from cy dcumsum.c:353:
/usr/include/python2.7/numpy/ ufunc api.h:241:1: warning: import umath defined but not used [-Wunused
import umath(void)
^
x86 64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,r
In [54]: py_dcumsum(a)
Out[54]: array([  1.,   3.,   6.,  10.,  15.,  21.,  28.,  36.])
In [55]: a = rand(100000)
b = empty_like(a)
In [56]: timeit py_dcumsum(a)
10 loops, best of 3: 50.1 ms per loop
7.4.1
When working with IPython (especially in the notebook), there is a more convenient way of compiling
and loading Cython code. Using the %%cython IPython magic (command to IPython), we can simply type
the Cython code in a code cell and let IPython take care of the conversion to C code, compilation and loading
of the function. To be able to use the %%cython magic, we first need to load the extension cythonmagic:
In [58]: %load_ext cythonmagic
In [62]: %%cython
cimport numpy
7.4.2 Further reading
http://cython.org
http://docs.cython.org/src/userguide/tutorial.html
http://wiki.cython.org/tutorials/numpy
7.5 Versions
Chapter 8
8.1 multiprocessing
Python has a built-in process-based library for concurrent computing, called multiprocessing.
In [2]: import multiprocessing
        import os
        import time
        import numpy
PID = 29008 , args = 3
PID = 29006 , args = 6
PID = 29009 , args = 5
PID = 29007 , args = 8
PID = 29008 , args = 7
In [7]: result
Out[7]: [(29006, 1),
         (29007, 2),
         (29008, 3),
         (29009, 4),
         (29009, 5),
         (29006, 6),
         (29008, 7),
         (29007, 8)]
The multiprocessing package is very useful for highly parallel tasks that do not need to communicate
with each other, other than when sending the initial data to the pool of processes and when collecting
the results.
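The cells that defined the task and dispatched it to the pool are not included in this extract. A hedged
sketch of the kind of Pool-based computation that produced the output above (the worker function, pool
size and argument range are assumptions, not the original code):

    # not from the original notebook: a sketch of a multiprocessing.Pool computation
    import multiprocessing
    import os

    def task(args):
        # a dummy worker: report which process handled which argument
        print("PID = %d , args = %d" % (os.getpid(), args))
        return os.getpid(), args

    pool = multiprocessing.Pool(processes=4)   # assumed number of worker processes
    result = pool.map(task, range(1, 9))       # dispatch the arguments 1..8 to the pool
    pool.close()
    pool.join()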
8.2 IPython parallel
IPython includes a very interesting and versatile parallel computing environment, which is very easy to use.
It builds on the concept of IPython engines and controllers, which one can connect to and submit tasks to.
To get started using this framework for parallel computing, one first has to start up an IPython cluster of
engines. The easiest way to do this is to use the ipcluster command,
$ ipcluster start -n 4
Or, alternatively, from the Clusters tab on the IPython notebook dashboard page. This will start
4 IPython engines on the current host, which is useful for multicore systems. It is also possible to set up
IPython clusters that span many nodes in a computing cluster. For more information about possible
use cases, see the official documentation Using IPython for parallel computing.
To use the IPython cluster in our Python programs or notebooks, we start by creating an instance of
IPython.parallel.Client:
In [8]: from IPython.parallel import Client
In [9]: cli = Client()
Using the ids attribute we can retrieve a list of ids for the IPython engines in the cluster:
In [10]: cli.ids
Out[10]: [0, 1, 2, 3]
Each of these engines is ready to execute tasks. We can selectively run code on individual engines:
In [11]: def getpid():
    """ return the unique ID of the current process """
    import os
    return os.getpid()
In [12]: # first try it on the notebook process
getpid()
150
Out[12]: 28995
In [13]: # run it on one of the engines
cli[0].apply_sync(getpid)
Out[13]: 30181
In [14]: # run it on ALL of the engines at the same time
cli[:].apply_sync(getpid)
Out[14]: [30181, 30182, 30183, 30185]
We can use this cluster of IPython engines to execute tasks in parallel. The easiest way to dispatch a
function to different engines is to define the function with the decorator:
@view.parallel(block=True)
Here, view is the engine pool to which we want to dispatch the function (task). Once our function is
defined this way we can dispatch it to the engines using the map method of the resulting object (in
Python, a decorator is a language construct which automatically wraps the function into another function
or a class).
To see how all this works, let's look at an example:
In [15]: dview = cli[:]
In [16]: @dview.parallel(block=True)
def dummy_task(delay):
    """ a dummy task that takes 'delay' seconds to finish """
    import os, time
    t0 = time.time()
    pid = os.getpid()
    time.sleep(delay)
    t1 = time.time()
    return [pid, t0, t1]
In [17]: # generate random delay times for dummy tasks
delay_times = numpy.random.rand(4)
Now, to map the function dummy_task to the random delay time data, we use the map method of
dummy_task:
In [18]: dummy_task.map(delay_times)
Out[18]: [[30181, 1395044753.2096598, 1395044753.9150908],
          [30182, 1395044753.2084103, 1395044753.4959202],
          [30183, 1395044753.2113762, 1395044753.6453338],
          [30185, 1395044753.2130392, 1395044754.1905618]]
Let's do the same thing again with many more tasks and visualize how these tasks are executed on
different IPython engines:
In [19]: def visualize_tasks(results):
    res = numpy.array(results)
    fig, ax = plt.subplots(figsize=(10, res.shape[1]))

    yticks = []
    yticklabels = []
    tmin = min(res[:,1])
    for n, pid in enumerate(numpy.unique(res[:,0])):
        yticks.append(n)
        yticklabels.append("%d" % pid)
        for m in numpy.where(res[:,0] == pid)[0]:
            ax.add_patch(plt.Rectangle((res[m,1] - tmin, n-0.25),
                         res[m,2] - res[m,1], 0.5, color="green", alpha=0.5))

    ax.set_ylim(-.5, n+.5)
    ax.set_xlim(0, max(res[:,2]) - tmin + 0.)
    ax.set_yticks(yticks)
    ax.set_yticklabels(yticklabels)
    ax.set_ylabel("PID")
    ax.set_xlabel("seconds")
In [20]: delay_times = numpy.random.rand(64)
In [21]: result = dummy_task.map(delay_times)
visualize_tasks(result)
That's a nice and easy parallelization! We can see that we utilize all four engines quite well.
But one shortcoming so far is that the tasks are not load balanced, so one engine might be idle while
others still have more tasks to work on.
However, the IPython parallel environment provides a number of alternative views of the engine cluster,
and there is a view that provides load balancing as well (above we have used the direct view, which is why
we called it dview).
To obtain a load balanced view we simply use the load_balanced_view method of the engine cluster
client instance cli:
In [22]: lbview = cli.load_balanced_view()
In [23]: @lbview.parallel(block=True)
def dummy_task_load_balanced(delay):
    """ a dummy task that takes 'delay' seconds to finish """
    import os, time
    t0 = time.time()
    pid = os.getpid()
    time.sleep(delay)
    t1 = time.time()
    return [pid, t0, t1]
In [24]: result = dummy_task_load_balanced.map(delay_times)
visualize_tasks(result)
In the example above we can see that the engine cluster is a bit more efficiently used, and the time to
completion is shorter than in the previous example.
8.2.1 Further reading
There are many other ways to use the IPython parallel environment. The official documentation has a nice
guide:
http://ipython.org/ipython-doc/dev/parallel/
8.3 MPI
When more communication between processes is required, sophisticated solutions such as MPI and OpenMP
are often needed. MPI is a process-based parallel processing library/protocol, and can be used in Python
programs through the mpi4py package:
http://mpi4py.scipy.org/
To use the mpi4py package we include MPI from mpi4py:
from mpi4py import MPI
An MPI Python program must be started using the mpirun -n N command, where N is the number of
processes that should be included in the process group.
Note that the IPython parallel environment also has support for MPI, but to begin with we will use mpi4py
and mpirun in the following examples.
8.3.1 Example 1
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    data = [1.0, 2.0, 3.0, 4.0]
    comm.send(data, dest=1, tag=11)
elif rank == 1:
    data = comm.recv(source=0, tag=11)

print "rank =", rank, ", data =", data
Overwriting mpitest.py
In [26]: !mpirun -n 2 python mpitest.py
rank = 0 , data = [1.0, 2.0, 3.0, 4.0]
rank = 1 , data = [1.0, 2.0, 3.0, 4.0]
8.3.2 Example 2
8.3.3
8.3.4
    return rcvBuf

a = np.load("random-vector.npy")
s = psum(a)

if MPI.COMM_WORLD.Get_rank() == 0:
    print "sum =", s, ", numpy sum =", a.sum()
Overwriting mpi-psum.py
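The first part of mpi-psum.py, which imports mpi4py/numpy and defines psum, is cut off in this extract.
A hedged sketch of a parallel sum built on Allreduce (the exact original implementation may differ):

    # not from the original file: a sketch of a psum implementation
    from mpi4py import MPI
    import numpy as np

    def psum(a):
        # each rank sums its local slice of the data, then Allreduce combines
        # the partial sums from all ranks into rcvBuf on every rank
        locsum = np.array(np.sum(a), 'd')
        rcvBuf = np.array(0.0, 'd')
        MPI.COMM_WORLD.Allreduce([locsum, MPI.DOUBLE],
                                 [rcvBuf, MPI.DOUBLE],
                                 op=MPI.SUM)
        return rcvBuf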
8.3.5 Further reading
http://mpi4py.scipy.org
http://mpi4py.scipy.org/docs/usrman/tutorial.html
https://computing.llnl.gov/tutorials/mpi/
8.4 OpenMP
What about OpenMP? OpenMP is a standard and widely used thread-based parallel API that unfortunately
is not directly useful in Python. The reason is that the CPython implementation uses a global interpreter
lock, making it impossible to simultaneously run several Python threads. Threads are therefore not useful
for parallel computing in Python, unless they are only used to wrap compiled code that does the OpenMP
parallelization (NumPy can do something like that).
This is clearly a limitation in the Python interpreter, and as a consequence all parallelization in Python
must use processes (not threads).
However, there is a way around this that is not that painful. When calling out to compiled code the GIL
is released, and it is possible to write Python-like code in Cython where we can selectively release the GIL
and do OpenMP computations.
In [35]: N_core = multiprocessing.cpu_count()
print("This system has %d cores" % N_core)
This system has 12 cores
Here is a simple example that shows how OpenMP can be used via cython:
In [36]: %load_ext cythonmagic
In [37]: %%cython -f -c-fopenmp --link-args=-fopenmp -c-g
cimport cython
cimport numpy
from cython.parallel import prange, parallel
cimport openmp
def cy_openmp_test():

    cdef int n, N

    # release GIL so that we can use OpenMP
    with nogil, parallel():
        N = openmp.omp_get_num_threads()
        n = openmp.omp_get_thread_num()
        with gil:
            print("Number of threads %d: thread number %d" % (N, n))
In [38]: cy_openmp_test()
Number of threads 12: thread number 0
Number of threads 12: thread number 10
Number of threads 12: thread number 8
Number of threads 12: thread number 4
Number of threads 12: thread number 7
Number of threads 12: thread number 3
Number of threads 12: thread number 2
Number of threads 12: thread number 1
Number of threads 12: thread number 11
Number of threads 12: thread number 9
Number of threads 12: thread number 5
Number of threads 12: thread number 6
8.4.1
For large problem sizes the Cython+OpenMP implementation is faster than numpy.dot.
With this simple implementation, the speedup for large problem sizes is about:
In [51]: ((duration_ref / duration_cy_omp)[-10:]).mean()
Out[51]: 3.0072232987815148
Obviously one could do a better job with more effort, since the theoretical limit of the speed-up is:
In [52]: N_core
Out[52]: 12
8.4.2 Further reading
http://openmp.org
http://docs.cython.org/src/userguide/parallelism.html
8.5 OpenCL
OpenCL is an API for heterogeneous computing, for example using GPUs for numerical computations. There
is a Python package called pyopencl that allows OpenCL code to be compiled, loaded and executed on
the compute units completely from within Python. This is a nice way to work with OpenCL, because the
time-consuming computations should be done on the compute units in compiled code, and in this setup
Python only serves as a control language.
In [53]: %%file opencl-dense-mv.py
import pyopencl as cl
import numpy
import time
# problem size
n = 10000
# platform
platform_list = cl.get_platforms()
platform = platform_list[0]
# device
device_list = platform.get_devices()
device = device_list[0]
if False:
    print("Platform name:" + platform.name)
    print("Platform version:" + platform.version)
    print("Device name:" + device.name)
    print("Device type:" + cl.device_type.to_string(device.type))
    print("Device memory: " + str(device.global_mem_size//1024//1024) + ' MB')
    print("Device max clock speed:" + str(device.max_clock_frequency) + ' MHz')
    print("Device compute units:" + str(device.max_compute_units))
# context
ctx = cl.Context([device]) # or we can use cl.create_some_context()
# command queue
queue = cl.CommandQueue(ctx)
# kernel
KERNEL_CODE = """
//
// Matrix-vector multiplication: r = m * v
//
#define N %(mat_size)d
__kernel
void dmv_cl(__global float *m, __global float *v, __global float *r)
{
    int i, gid = get_global_id(0);

    r[gid] = 0;
    for (i = 0; i < N; i++)
    {
        r[gid] += m[gid * N + i] * v[i];
    }
}
"""
kernel_params = {"mat_size": n}
program = cl.Program(ctx, KERNEL_CODE % kernel_params).build()
# data
A = numpy.random.rand(n, n)
x = numpy.random.rand(n, 1)
# host buffers
h_y = numpy.empty(numpy.shape(x)).astype(numpy.float32)
h_A = numpy.real(A).astype(numpy.float32)
h_x = numpy.real(x).astype(numpy.float32)
# device buffers
mf = cl.mem_flags
d_A_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=h_A)
d_x_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=h_x)
d_y_buf = cl.Buffer(ctx, mf.WRITE_ONLY, size=h_y.nbytes)
# execute OpenCL code
t0 = time.time()
event = program.dmv_cl(queue, h_y.shape, None, d_A_buf, d_x_buf, d_y_buf)
event.wait()
cl.enqueue_copy(queue, h_y, d_y_buf)
t1 = time.time()
print "opencl elapsed time =", (t1-t0)
# Same calculation with numpy
t0 = time.time()
y = numpy.dot(h_A, h_x)
t1 = time.time()
print "numpy elapsed time =", (t1-t0)
# see if the results are the same
print "max deviation =", numpy.abs(y-h_y).max()
Overwriting opencl-dense-mv.py
8.5.1 Further reading
http://mathema.tician.de/software/pyopencl
8.6 Versions
Chapter 9
9.1
9.2
It is often useful to create a new branch in a repository, or a fork or clone of an entire repository,
when we are doing larger experimental development. The main branch in a repository is often called
master or trunk. When work on a branch or fork is completed, it can be merged into the master
branch/repository.
With distributed RCSs such as GIT or Mercurial, we can pull and push changesets between different
repositories. For example, between a local copy of the repository and a central online repository (for
example on a community repository hosting site like github.com).
9.2.1
9.3 Installing git
On Linux:
$ sudo apt-get install git
On Mac (with macports):
$ sudo port install git
The first time you start to use git, you'll need to configure your author information:
$ git config --global user.name 'Robert Johansson'
$ git config --global user.email robert@riken.jp
9.4
To create a brand new empty repository, we can use the command git init repository-name:
In [4]: # create a new git repository called gitdemo:
!git init gitdemo
Reinitialized existing Git repository in /home/rob/Desktop/scientific-python-lectures/gitdemo/.git/
If we want to fork or clone an existing repository, we can use the command git clone repository:
In [5]: !git clone https://github.com/qutip/qutip
Cloning into 'qutip'...
remote: Counting objects: 7425, done.
remote: Compressing objects: 100% (2013/2013), done.
remote: Total 7425 (delta 5386), reused 7420 (delta 5381)
Receiving objects: 100% (7425/7425), 2.25 MiB | 696 KiB/s, done.
Resolving deltas: 100% (5386/5386), done.
Git clone can take a URL to a public repository, like above, or a path to a local directory:
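The cell that demonstrates cloning from a local path is not included in this extract; it would look
something like this (the target directory name is hypothetical):

$ git clone gitdemo gitdemo-clone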
9.5 Status
Using the command git status we get a summary of the current status of the working directory. It shows
if we have modified, added or removed files.
In [34]: !git status
# On branch master
#
# Initial commit
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       Lecture-7-Revision-Control-Software.ipynb
#
nothing added to commit but untracked files present (use "git add" to track)
In this case, only the current ipython notebook has been added. It is listed as an untracked file, and is
therefore not in the repository yet.
9.6
To add a new file to the repository, we first create the file and then use the git add filename command:
In [35]: %%file README
A file with information about the gitdemo repository.
Writing README
In [36]: !git status
# On branch master
#
# Initial commit
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       Lecture-7-Revision-Control-Software.ipynb
#       README
#
nothing added to commit but untracked files present (use "git add" to track)
After having added the file README, the command git status lists it as an untracked file.
In [37]: !git add README
In [38]: !git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
#       new file:   README
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       Lecture-7-Revision-Control-Software.ipynb
#
Now that it has been added, it is listed as a new file that has not yet been committed to the repository.
9.7 Committing changes
When files that are tracked by GIT are changed, they are listed as modified by git status:
In [43]: %%file README
A file with information about the gitdemo repository.
A new line.
Overwriting README
In [44]: !git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   README
#
no changes added to commit (use "git add" and/or "git commit -a")
Again, we can commit such changes to the repository using the git commit -m "message" command.
In [45]: !git commit -m "added one more line in README" README
[master b6db712] added one more line in README
1 file changed, 3 insertions(+), 1 deletion(-)
In [46]: !git status
# On branch master
nothing to commit (working directory clean)
9.8 Removing files
To remove a file that has been added to the repository, use git rm filename, which works similarly to
git add filename:
In [47]: %%file tmpfile
A short-lived file.
Writing tmpfile
Add it:
In [48]: !git add tmpfile
In [49]: !git commit -m "adding file tmpfile" tmpfile
[master 44ed840] adding file tmpfile
1 file changed, 2 insertions(+)
create mode 100644 tmpfile
Remove it again:
In [51]: !git rm tmpfile
rm 'tmpfile'
9.9 Commit logs
The messages that are added to the commit command are supposed to give a short (often one-line) description
of the changes/additions/deletions in the commit. If the -m "message" option is omitted when invoking git
commit, an editor will be opened for you to type a commit message (useful, for example, when a longer
commit message is required).
We can look at the revision log by using the command git log:
In [53]: !git log
commit a9dc0a4b68e8b1b6d973be8f7e7b8f1c92393c17
Author: Robert Johansson <jrjohansson@gmail.com>
Date:
Mon Dec 10 06:54:41 2012 +0100
remove file tmpfile
commit 44ed840422571c62db55eabd8e8768be6c7784e4
Author: Robert Johansson <jrjohansson@gmail.com>
Date:
Mon Dec 10 06:54:31 2012 +0100
adding file tmpfile
commit b6db712506a45a68001c768a6cf6e15e11c62f89
Author: Robert Johansson <jrjohansson@gmail.com>
Date:
Mon Dec 10 06:54:26 2012 +0100
added one more line in README
commit da8b6e92b34fe3838873bdd27a94402ecc121c43
Author: Robert Johansson <jrjohansson@gmail.com>
Date:
Mon Dec 10 06:54:20 2012 +0100
added notebook file
commit 1f26ad648a791e266fbb951ef5c49b8d990e6461
Author: Robert Johansson <jrjohansson@gmail.com>
Date:
Mon Dec 10 06:54:19 2012 +0100
Added a README file
In the commit log, each revision is shown with a timestamp, a unique hash tag, author information and the
commit message.
9.10 Diffs
Each commit results in a changeset, which has a diff describing the changes to the files associated with it.
We can use git diff to see what has changed in a file:
README files usually contains installation instructions, and information about how to get start
Overwriting README
-A new line.
\ No newline at end of file
+README files usually contains installation instructions, and information about how to get started using
\ No newline at end of file
That looks quite cryptic, but it is a standard form for describing changes in files. We can use other tools,
like graphical user interfaces or web-based systems, to get a more easily understandable diff.
In GitHub (a web-based GIT repository hosting service) it can look like this:
In [24]: Image(filename='images/github-diff.png')
Out[24]:
9.11
To discard a change (revert to the latest version in the repository) we can use the checkout command like
this:
In [58]: !git checkout -- README
In [59]: !git status
# On branch master
nothing to commit (working directory clean)
9.12
If we want to get the code for a specific revision, we can use git checkout, giving it the hash code of the
revision we are interested in as argument:
In [60]: !git log
commit a9dc0a4b68e8b1b6d973be8f7e7b8f1c92393c17
Author: Robert Johansson <jrjohansson@gmail.com>
Date:
Mon Dec 10 06:54:41 2012 +0100
remove file tmpfile
commit 44ed840422571c62db55eabd8e8768be6c7784e4
Author: Robert Johansson <jrjohansson@gmail.com>
Date:
Mon Dec 10 06:54:31 2012 +0100
adding file tmpfile
commit b6db712506a45a68001c768a6cf6e15e11c62f89
Author: Robert Johansson <jrjohansson@gmail.com>
Date:
Mon Dec 10 06:54:26 2012 +0100
added one more line in README
commit da8b6e92b34fe3838873bdd27a94402ecc121c43
Author: Robert Johansson <jrjohansson@gmail.com>
Date:
Mon Dec 10 06:54:20 2012 +0100
added notebook file
commit 1f26ad648a791e266fbb951ef5c49b8d990e6461
Author: Robert Johansson <jrjohansson@gmail.com>
Date:
Mon Dec 10 06:54:19 2012 +0100
Added a README file
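The checkout command itself is not shown in this extract; using one of the hashes from the log above, it
would look something like this:

$ git checkout 1f26ad648a791e266fbb951ef5c49b8d990e6461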
9.13
9.13.1 Tags
Tags are named revisions. They are useful for marking particular revisions for later reference. For example,
we can tag our code with the tag paper-1-final when the simulations for paper-1 are finished and the
paper submitted. Then we can always retrieve exactly the code used for that paper, even if we continue
to work on and develop the code for future projects and papers.
In [66]: !git log
commit a9dc0a4b68e8b1b6d973be8f7e7b8f1c92393c17
Author: Robert Johansson <jrjohansson@gmail.com>
Date:
Mon Dec 10 06:54:41 2012 +0100
remove file tmpfile
commit 44ed840422571c62db55eabd8e8768be6c7784e4
Author: Robert Johansson <jrjohansson@gmail.com>
Date:
Mon Dec 10 06:54:31 2012 +0100
adding file tmpfile
commit b6db712506a45a68001c768a6cf6e15e11c62f89
Author: Robert Johansson <jrjohansson@gmail.com>
Date:
Mon Dec 10 06:54:26 2012 +0100
added one more line in README
commit da8b6e92b34fe3838873bdd27a94402ecc121c43
Author: Robert Johansson <jrjohansson@gmail.com>
Date:
Mon Dec 10 06:54:20 2012 +0100
added notebook file
commit 1f26ad648a791e266fbb951ef5c49b8d990e6461
In [67]: !git tag -a demotag1 -m "Code used for this and that purpose"
In [68]: !git tag -l
demotag1
9.14 Branches
With branches we can create diverging code bases in the same repository. They are for example useful for
experimental development that requires a lot of code changes that could break the functionality in the master
branch. Once the development of a branch has reached a stable state it can always be merged back into the
trunk. Branching-development-merging is a good development strategy when several people are involved in
working on the same code base. But even in single-author repositories it can often be useful to keep the
master branch in a working state, always branch/fork before implementing a new feature, and later merge it
back into the main trunk.
In GIT, we can create a new branch like this:
In [70]: !git branch expr1
README files usually contains installation instructions, and information about how to get start
Experimental addition.
Overwriting README
In [76]: !git commit -m "added a line in expr1 branch" README
[expr1 a6dc24f] added a line in expr1 branch
1 file changed, 3 insertions(+), 1 deletion(-)
In [77]: !git branch
* expr1
master
In [78]: !git checkout master
Switched to branch 'master'
In [79]: !git branch
expr1
* master
We can merge an existing branch and all its changesets into another branch (for example the master
branch) like this:
First change to the target branch:
In [82]: !git checkout master
Switched to branch 'master'
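The merge command itself is not included in this extract; once on the target branch, the merge would be
something like:

$ git merge expr1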
9.15
If the repository has been cloned from another repository, for example on github.com, it automatically
remembers the address of the parent repository (called origin):
In [5]: !git remote
origin
In [4]: !git remote show origin
* remote origin
Fetch URL: git@github.com:jrjohansson/scientific-python-lectures.git
Push URL: git@github.com:jrjohansson/scientific-python-lectures.git
HEAD branch: master
Remote branch:
master tracked
Local branch configured for 'git pull':
master merges with remote master
Local ref configured for 'git push':
master pushes to master (up to date)
9.15.1
pull
We can retrieve updates from the origin repository by pulling changesets from origin to our repository:
In [6]: !git pull origin
Already up-to-date.
We can register addresses to many different repositories, and pull in different changesets from different
sources, but the default source is the origin from where the repository was first cloned (and the word origin
could have been omitted from the line above).
9.15.2
push
After making changes to our local repository, we can push changes to a remote repository using git push.
Again, the default target repository is origin, so we can do:
In [7]: !git status
# On branch master
# Untracked files:
#
(use "git add <file>..." to include in what will be committed)
#
#
Lecture-7-Revision-Control-Software.ipynb
nothing added to commit but untracked files present (use "git add" to track)
In [8]: !git add Lecture-7-Revision-Control-Software.ipynb
In [9]: !git commit -m "added lecture notebook about RCS" Lecture-7-Revision-Control-Software.ipynb
[master d0d6a70] added lecture notebook about RCS
1 file changed, 2114 insertions(+)
create mode 100644 Lecture-7-Revision-Control-Software.ipynb
In [11]: !git push
Counting objects: 4, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 118.94 KiB, done.
Total 3 (delta 1), reused 0 (delta 0)
To git@github.com:jrjohansson/scientific-python-lectures.git
2495af4..d0d6a70 master -> master
9.16
Hosted repositories
Github.com is a git repository hosting site that is very popular with both open source projects (for which it
is free) and private repositories (for which a subscription might be needed).
With a hosted repository it is easy to collaborate with colleagues on the same code base, and you get a
graphical user interface where you can browse the code and look at commit logs, track issues etc.
Some good hosted repositories are
Github : http://www.github.com
Bitbucket: http://www.bitbucket.org
In [14]: Image(filename='images/github-project-page.png')
Out[14]:
9.17
There are also a number of graphical user interfaces for GIT. The available options vary a little bit from
platform to platform:
http://git-scm.com/downloads/guis
In [15]: Image(filename='images/gitk.png')
Out[15]:
9.18 Further reading
http://git-scm.com/book
http://www.vogella.com/articles/Git/article.html
http://cheat.errtheblog.com/s/git