An Introduction To Python Programming For Research
James Hetherington
November 4, 2015
Contents
1 Introduction 15
1.1 Why teach Python? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.1.1 Why Python? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.1.2 Why write programs for research? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.1.3 Sensible Input - Reasonable Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.2 Many kinds of Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.2.1 The IPython Notebook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.2.2 Typing code in the notebook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.2.3 Python at the command line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.2.4 Python scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.2.5 Python Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3 Variables 33
3.1 Variable Assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.2 Reassignment and multiple labels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3 Objects and types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.4 Reading error messages. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.5 Variables and the notebook kernel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4 Using Functions 37
4.1 Calling functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.2 Using methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3 Functions are just a type of object! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.4 Getting help on functions and methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.5 Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5 Types 43
5.1 Floats and integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.2 Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.3 Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.4 Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.5 Unpacking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
6 Containers 47
6.1 Checking for containment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
6.2 Mutability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
6.3 Tuples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
6.4 Memory and containers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
6.5 Identity vs Equality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
7 Dictionaries 51
7.1 The Python Dictionary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
7.2 Keys and Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
7.3 Immutable Keys Only . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
7.4 No guarantee of order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
7.5 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
8 Data structures 54
8.1 Nested Lists and Dictionaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
8.2 Exercise: a Maze Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
8.2.1 Solution: my Maze Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
10 Comprehensions 64
10.1 The list comprehension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
10.2 Selection in comprehensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
10.3 Comprehensions versus building lists with append: . . . . . . . . . . . . . . . . . . . . . . . . 64
10.4 Nested comprehensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
10.5 Dictionary Comprehensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
10.6 List-based thinking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
10.7 Classroom Exercise: Occupancy Dictionary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
10.7.1 Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
11 Functions 68
11.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
11.2 Default Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
11.3 Side effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
11.4 Early Return . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
11.5 Unpacking arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
11.6 Sequence Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
11.7 Keyword Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
12 Using Libraries 71
12.1 Import . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
12.2 Why bother? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
12.3 Importing from modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
12.4 Import and rename . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
18 Structured Data 99
18.1 Structured data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
18.2 Json . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
18.3 Unicode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
18.4 Yaml . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
18.5 XML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
18.6 Exercise: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
18.7 Solution: Saving and Loading a Maze . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
21 NumPy 118
21.1 The Scientific Python Trilogy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
21.2 Limitations of Python Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
21.3 The NumPy array . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
21.4 Elementwise Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
21.5 Arange and linspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
21.6 Multi-Dimensional Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
21.7 Array Datatypes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
21.8 Broadcasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
21.9 Newaxis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
21.10 Dot Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
21.11 Array DTypes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
21.12 Record Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
21.13 Logical arrays, masking, and selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
21.14 Numpy memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
23 Understanding the “Greengraph” Example 137
23.1 Classes for Greengraph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
23.2 Invoking our code and making a plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
23.3 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
23.3.1 What’s version control? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
23.3.2 Why use version control? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
23.3.3 Git != GitHub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
23.3.4 How do we use version control? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
23.3.5 What is version control? (Team version) . . . . . . . . . . . . . . . . . . . . . . . . . . 140
23.3.6 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
23.4 Practising with Git . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
23.4.1 Example Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
23.4.2 Programming and documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
23.4.3 Markdown . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
23.4.4 Displaying Text in this Tutorial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
23.4.5 Setting up somewhere to work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
23.5 Solo work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
23.5.1 Configuring Git with your name and email . . . . . . . . . . . . . . . . . . . . . . . . 142
23.5.2 Initialising the repository . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
23.6 Solo work with Git . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
23.6.1 A first example file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
23.6.2 Telling Git about the File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
23.6.3 Our first commit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
23.6.4 Configuring Git with your editor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
23.6.5 Git log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
23.6.6 Hash Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
23.6.7 Nothing to see here . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
23.6.8 Unstaged changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
23.6.9 Staging a file to be included in the next commit . . . . . . . . . . . . . . . . . . . . . 145
23.6.10 The staging area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
23.6.11 Message Sequence Charts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
23.6.12 The Levels of Git . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
23.6.13 Review of status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
23.6.14 Carry on regardless . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
23.6.15 Commit with a built-in-add . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
23.6.16 Review of changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
23.6.17 Git Solo Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
23.7 Fixing mistakes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
23.7.1 Referring to changes with HEAD and ^ . . . . . . . . . . . . . . . . . . . . . . . . . . 151
23.7.2 Reverting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
23.7.3 Conflicted reverts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
23.7.4 Review of changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
23.7.5 Antipatch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
23.7.6 Rewriting history . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
23.7.7 A new lie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
23.7.8 Using reset to rewrite history . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
23.7.9 Covering your tracks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
23.7.10 Resetting the working area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
23.8 Publishing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
23.8.1 Sharing your work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
23.8.2 Creating a repository . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
23.8.3 Paying for GitHub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
23.8.4 Adding a new remote to your repository . . . . . . . . . . . . . . . . . . . . . . . . . . 157
23.8.5 Remotes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
23.8.6 Playing with GitHub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
23.9 Working with multiple files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
23.9.1 Some new content . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
23.9.2 Git will not by default commit your new file . . . . . . . . . . . . . . . . . . . . . . . . 159
23.9.3 Tell git about the new file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
23.10 Changing two files at once . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
23.11 Collaboration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
23.11.1 Form a team . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
23.11.2 Giving permission . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
23.11.3 Obtaining a colleague’s code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
23.11.4 Nonconflicting changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
23.11.5 Rejected push . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
23.11.6 Merge commits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
23.11.7 Nonconflicted commits to the same file . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
23.11.8 Conflicting commits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
23.11.9 Resolving conflicts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
23.11.10 Commit the resolved file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
23.11.11 Distributed VCS in teams with conflicts . . . . . . . . . . . . . . . . . . . . . . . . . . 177
23.11.12 The Levels of Git . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
23.12 Editing directly on GitHub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
23.12.1 Editing directly on GitHub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
23.13 Social Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
23.13.1 GitHub as a social network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
23.14 Fork and Pull . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
23.14.1 Different ways of collaborating . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
23.14.2 Forking a repository on GitHub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
23.14.3 Pull Request . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
23.14.4 Practical example - Team up! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
23.14.5 Some Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
23.15 Git Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
23.15.1 The revision Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
23.15.2 Git concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
23.15.3 The levels of Git . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
23.16 Branches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
23.16.1 Publishing branches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
23.16.2 Find out what is on a branch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
23.16.3 Merging branches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
23.16.4 Cleaning up after a branch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
23.16.5 A good branch strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
23.16.6 Grab changes from a branch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
23.17 Git Stash . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
23.18 Tagging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
23.19 Working with generated files: gitignore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
23.20 Git clean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
23.21 Hunks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
23.21.1 Git Hunks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
23.21.2 Interactive add . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
23.22 GitHub pages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
23.22.1 Yaml Frontmatter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
23.22.2 The gh-pages branch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
23.22.3 UCL layout for GitHub pages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
23.23 Working with multiple remotes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
23.23.1 Distributed versus centralised . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
23.23.2 Referencing remotes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
23.24 Hosting Servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
23.24.1 Hosting a local server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
23.24.2 Home-made SSH servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
23.25 SSH keys and GitHub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
23.26 Rebasing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
23.26.1 Rebase vs merge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
23.26.2 An example rebase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
23.26.3 Fast Forwards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
23.26.4 Rebasing pros and cons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
23.27 Squashing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
23.27.1 Using rebase to squash . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
23.28 Debugging With Git Bisect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
23.28.1 An example repository . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
23.28.2 Solving Manually . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
23.28.3 Solving automatically . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
23.29 Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
23.29.1 A few reasons not to do testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
23.29.2 A few reasons to do testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
23.29.3 Not a panacea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
23.30 Testing primer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
23.30.1 Tests at different scales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
23.30.2 Legacy code hardening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
23.30.3 Testing vocabulary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
23.30.4 Branch coverage: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
24.9.4 Breakpoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
24.9.5 Post-mortem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
24.10 Jenkins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
24.10.1 Test servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
24.10.2 Memory and profiling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
24.11 Extended TDD Example: Monte-Carlo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
24.11.1 Problem: Implement and test a simple Monte-Carlo algorithm . . . . . . . . . . . . . 231
24.12 Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
24.13 Testing frameworks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
24.13.1 Why use testing frameworks? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
24.13.2 Common testing frameworks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
24.13.3 Nose framework: usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
24.14 Testing with floating points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
24.14.1 Floating points are not reals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
24.14.2 Comparing floating points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
24.14.3 Comparing vectors of floating points . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
24.15 Classroom exercise: energy calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
24.15.1 Diffusion model in 1D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
24.15.2 Starting point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
24.15.3 Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
24.15.4 Coverage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
24.16 Mocking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
24.16.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
24.16.2 Mocking frameworks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
24.16.3 Recording calls with mock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
24.17 Using mocks to model test resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
24.17.1 Testing functions that call other functions . . . . . . . . . . . . . . . . . . . . . . . . . 243
24.18 Using a debugger . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
24.18.1 Stepping through the code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
24.18.2 Using the python debugger . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
24.18.3 Basic navigation: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
24.18.4 Breakpoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
24.18.5 Post-mortem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
24.19 Jenkins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
24.19.1 Test servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
24.19.2 Memory and profiling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
24.20 Extended TDD Example: Monte-Carlo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
24.20.1 Problem: Implement and test a simple Monte-Carlo algorithm . . . . . . . . . . . . . 249
24.21 Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
25.8.1 The Python Standard Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
25.8.2 The Python Package Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
25.9 Argparse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
25.10 Packaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
25.10.1 Packaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
25.10.2 Distribution tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
25.10.3 Laying out a project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
25.10.4 Using setuptools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
25.10.5 Installing from GitHub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
25.10.6 Convert the script to a module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
25.10.7 Write an executable script . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
25.10.8 Write an entry point script stub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
25.10.9 Write a readme file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
25.10.10 Write a license file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
25.10.11 Write a citation file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
25.10.12 Define packages and executables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
25.10.13 Write some unit tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
25.10.14 Developer Install . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
25.10.15 Distributing compiled code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
25.10.16 Homebrew . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
25.10.17 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
25.11 Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
25.11.1 Documentation is hard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
25.11.2 Prefer readable code with tests and vignettes . . . . . . . . . . . . . . . . . . . . . . . 266
25.11.3 Comment-based Documentation tools . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
25.12 Example of using Sphinx . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
25.12.1 Write some docstrings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
25.12.2 Set up sphinx . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
25.12.3 Define the root documentation page . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
25.12.4 Run sphinx . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
25.12.5 Sphinx output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
25.13 Engineering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
25.13.1 Software Engineering Stages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
25.13.2 Requirements Engineering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
25.13.3 Functional and architectural design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
25.13.4 Waterfall . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
25.13.5 Why Waterfall? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
25.13.6 Problems with Waterfall . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
25.13.7 Software is not made of bricks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
25.13.8 Software is not made of bricks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
25.13.9 Software is not made of bricks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
25.13.10 The Agile Manifesto . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
25.13.11 Agile is not absence of process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
25.13.12 Elements of an Agile Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
25.13.13 Ongoing Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
25.13.14 Iterative Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
25.13.15 Continuous Delivery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
25.13.16 Self-organising teams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
25.13.17 Agile in Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
25.13.18 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
25.14 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
25.14.1 Refactoring to classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
25.14.2 Refactoring to Inheritance and Polymorphism . . . . . . . . . . . . . . . . . . . . . . . 273
25.14.3 Refactoring to Patterns: Builder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
25.14.4 Refactoring to Patterns: Model/View . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
25.14.5 Using UML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
25.15 Software Licensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
25.15.1 Reuse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
25.15.2 Disclaimer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
25.15.3 Choose a license . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
25.15.4 Open source doesn’t stop you making money . . . . . . . . . . . . . . . . . . . . . . . 274
25.15.5 Plagiarism vs promotion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
25.15.6 Your code is good enough . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
25.15.7 Worry about license compatibility and proliferation . . . . . . . . . . . . . . . . . . . . 274
25.15.8 Academic license proliferation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
25.15.9 Licenses for code, content, and data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
25.15.10 Licensing issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
25.15.11 Permissive vs share-alike . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
25.15.12 Academic use only . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
25.15.13 Patents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
25.15.14 Use as a web service . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
25.15.15 Library linking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
25.15.16 Citing software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
25.15.17 Referencing the license in every file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
25.15.18 Choose a license . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
25.15.19 Open source does not equal free maintenance . . . . . . . . . . . . . . . . . . . . . . . 276
25.16 Managing software issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
25.16.1 Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
25.16.2 Some Issue Trackers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
25.16.3 Anatomy of an issue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
25.16.4 Reporting a Bug . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
25.16.5 Owning an issue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
25.16.6 Status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
25.16.7 Resolutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
25.16.8 Bug triage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
25.16.9 The backlog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
25.16.10 Development cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
25.16.11 GitHub issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
25.16.12 Exercise - Packaging Greengraph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
26 Construction 279
26.1 Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
26.1.1 Construction vs Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
26.1.2 Low-level design decisions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
26.1.3 Algorithms and structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
26.1.4 Architectural design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
26.1.5 Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
26.1.6 Literate programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
26.1.7 Programming for humans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
26.2 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
26.3 Coding Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
26.3.1 One code, many layouts: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
26.3.2 So many choices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
26.3.3 Layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
26.3.4 Layout choices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
26.3.5 Naming Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
26.3.6 Hungarian Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
26.3.7 Newlines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
26.3.8 Syntax Choices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
26.3.9 Syntax choices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
26.3.10 Coding Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
26.3.11 Lint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
26.4 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
26.4.1 Why comment? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
26.4.2 Bad Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
26.4.3 Comments which are obvious . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
26.4.4 Comments which could be replaced by better style . . . . . . . . . . . . . . . . . . . . 284
26.4.5 Comments vs expressive code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
26.4.6 Comments which belong in an issue tracker . . . . . . . . . . . . . . . . . . . . . . . . 284
26.4.7 Comments which only make sense to the author today . . . . . . . . . . . . . . . . . . 285
26.4.8 Comments which are unpublishable . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
26.5 Good comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
26.5.1 Pedagogical comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
26.5.2 Other good comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
26.6 Refactoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
26.6.1 Refactoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
26.6.2 A word from the Master . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
26.6.3 List of known refactorings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
26.6.4 Replace magic numbers with constants . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
26.6.5 Replace repeated code with a function . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
26.6.6 Change of variable name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
26.6.7 Separate a complex expression into a local variable . . . . . . . . . . . . . . . . . . . . 287
26.6.8 Replace loop with iterator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
26.6.9 Replace hand-written code with library code . . . . . . . . . . . . . . . . . . . . . . . 287
26.6.10 Replace set of arrays with array of structures . . . . . . . . . . . . . . . . . . . . . . . 288
26.6.11 Replace constants with a configuration file . . . . . . . . . . . . . . . . . . . . . . . . . 288
26.6.12 Replace global variables with function arguments . . . . . . . . . . . . . . . . . . . . . 288
26.6.13 Merge neighbouring loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
26.6.14 Break a large function into smaller units . . . . . . . . . . . . . . . . . . . . . . . . . . 289
26.6.15 Separate code concepts into files or modules . . . . . . . . . . . . . . . . . . . . . . . . 289
26.6.16 Refactoring is a safe way to improve code . . . . . . . . . . . . . . . . . . . . . . . . . 290
26.6.17 Tests and Refactoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
26.6.18 Refactoring Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
26.7 Introduction to Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
26.7.1 Classes: User defined types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
26.7.2 Declaring a class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
26.7.3 Object instances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
26.7.4 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
26.7.5 Constructor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
26.7.6 Member Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
26.8 Object refactorings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
26.8.1 Replace ad-hoc structure with user defined classes . . . . . . . . . . . . . . . . . . . . 292
26.8.2 Replace function with a method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
26.8.3 Replace method arguments with class members . . . . . . . . . . . . . . . . . . . . . . 293
26.8.4 Replace global variable with class and member . . . . . . . . . . . . . . . . . . . . . . 293
26.8.5 Object Oriented Refactoring Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
26.9 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
26.9.1 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
26.9.2 Object-Oriented Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
26.9.3 Design processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
26.9.4 Design and research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
26.10 More on Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
26.10.1 Object Based Programming Recap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
26.10.2 Class design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
26.10.3 UML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
26.10.4 YUML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
26.11 Information Hiding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
26.11.1 Property accessors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
26.11.2 Class Members . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
26.11.3 Object-based vs Object-Oriented . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
26.12 Inheritance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
26.12.1 Ontology and inheritance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
26.12.2 Inheritance in python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
26.12.3 Inheritance terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
26.12.4 Inheritance and constructors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
26.12.5 Inheritance UML diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
26.12.6 Aggregation vs Inheritance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
26.12.7 Aggregation in UML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
26.12.8 Refactoring to inheritance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
26.13 Polymorphism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
26.13.1 Polymorphism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
26.13.2 Polymorphism and Inheritance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
26.13.3 Undefined Functions and Polymorphism . . . . . . . . . . . . . . . . . . . . . . . . . . 301
26.13.4 Refactoring to Polymorphism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
26.13.5 Interfaces and concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
26.13.6 Interfaces in UML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
26.13.7 Further UML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
26.14 Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
26.14.1 Class Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
26.14.2 Design Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
26.14.3 Reading a pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
26.14.4 Introducing Some Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
26.15 Factory Pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
26.15.1 Factory Pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
26.15.2 Factory UML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
26.15.3 Factory Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
26.15.4 Agent model constructor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
26.15.5 Agent derived classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
26.15.6 Refactoring to Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
26.16 Builder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
26.16.1 Builder Pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
26.16.2 Builder example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
26.16.3 Builder preferred to complex constructor . . . . . . . . . . . . . . . . . . . . . . . . . 306
26.16.4 Using a builder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
26.16.5 Avoid staged construction without a builder. . . . . . . . . . . . . . . . . . . . . . . . 306
26.16.6 Builder Message Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
26.17 Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
26.18 Strategy Pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
26.18.1 Strategy pattern example: sunspots . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
26.18.2 Sunspot cycle has periodicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
26.18.3 Years are not constant length . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
26.18.4 Uneven time series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
26.18.5 Uneven time series design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
26.18.6 Strategy Pattern for Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
26.18.7 Strategy Pattern for Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
26.18.8 Strategy Pattern for Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
26.18.9 Strategy Pattern for Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
26.18.10 Strategy Pattern for Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
26.18.11 Strategy Pattern for Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
26.18.12 Strategy Pattern for Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
26.18.13 Strategy Pattern for Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
26.18.14 Comparison of different algorithms for frequency spectrum of sunspots. . . . . . . . . 312
26.18.15 Deviation of year length from average . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
26.19 Model-View-Controller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
26.19.1 Separate graphics from science! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
26.19.2 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
26.19.3 View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
26.19.4 Controller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
26.20 Exercise: Refactoring The Bad Boids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
26.20.1 Bad Boids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
26.20.2 Your Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
26.20.3 A regression test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
26.20.4 A regression test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
26.20.5 Make the regression test fail . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
26.20.6 Start Refactoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
29 NumPy 357
29.1 NumPy constructors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
29.2 Arraywise Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
36 Cython 385
36.1 Start Coding in Cython . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
36.2 Cython with C Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
36.3 Cython with numpy ndarray . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
36.4 Calling C functions from Cython . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
Chapter 1
Introduction
• Sensible input
• Reasonable output
In [1]: ### Make plot
%matplotlib inline
import numpy as np
import math
import matplotlib.pyplot as plt
theta=np.arange(0,4*math.pi,0.1)
eight=plt.figure()
axes=eight.add_axes([0,0,1,1])
axes.plot(0.5*np.sin(theta),np.cos(theta/2))
We’re going to be mainly working in the IPython notebook in this course. To get hold of a copy of the
notebook, follow the setup instructions shown on the course website, or use the installation in UCL teaching
cluster rooms.
IPython notebooks consist of discussion cells, referred to as “markdown cells”, and “code cells”, which
contain Python. This document has been created using IPython notebook, and this very cell is a Markdown
Cell.
Code cell inputs are numbered, and show the output below.
Markdown cells contain text which uses a simple format to achieve pretty layout, for example, to obtain:
bold, italic
• Bullet
Quote
We write:
**bold**, *italic*
* Bullet
> Quote
• When in a cell, press escape to leave it. When moving around outside cells, press return to enter.
• Outside a cell:
    • Use arrow keys to move around.
    • Press b to add a new cell below the cursor.
    • Press m to turn a cell from code mode to markdown mode.
    • Press shift+enter to run the code in the cell.
    • Press h to see a list of useful keys in the notebook.
• Inside a cell:
    • Press tab to suggest completions of variables. (Try it!)
Supplementary material: Learn more about the notebook here. Try these videos.
In [3]: %%bash
# Above line tells IPython to execute this cell as *shell code*
# not Python, as if we were in a command line
# This is called a 'cell magic'
In [4]: %%bash
echo "print 2*4" > eight.py
python eight.py
In [5]: %%bash
echo '#!/usr/bin/env python' > eight
echo "print 2*4" >> eight
chmod u+x eight
./eight
# Contents of the library file draw_eight.py
import numpy as np
import math
import matplotlib.pyplot as plt
def make_figure():
    theta=np.arange(0,4*math.pi,0.1)
    eight=plt.figure()
    axes=eight.add_axes([0,0,1,1])
    axes.plot(0.5*np.sin(theta),np.cos(theta/2))
    return eight
In a real example, we could edit the file on disk using a program such as Notepad++ for Windows or
Atom for Mac.
In [7]: import draw_eight # Load the library file we just wrote to disk
In [8]: image=draw_eight.make_figure()
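What `import draw_eight` does can be tried without a notebook. The sketch below is ours rather than part of the course material: it writes a hypothetical one-function module called `tiny_library` into a temporary directory, puts that directory on `sys.path` (the list of places Python searches for modules) and imports it. It uses `print()` with a single argument so that it should run under both Python 2 and Python 3.

```python
import importlib
import os
import sys
import tempfile

# Write a hypothetical one-function module into a fresh temporary directory
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "tiny_library.py"), "w") as source:
    source.write("def double(x):\n    return 2 * x\n")

# Python looks through the directories in sys.path when importing
sys.path.insert(0, module_dir)
if hasattr(importlib, "invalidate_caches"):
    # Python 3 caches directory listings; clear them so the new file is seen
    importlib.invalidate_caches()
tiny_library = importlib.import_module("tiny_library")

result = tiny_library.double(4)
print(result)  # prints 8
```

In a notebook, the usual `import tiny_library` statement does the same job once the file sits in the current directory, which is already on the search path.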
Chapter 2
Now, if you try to follow along with this example in an IPython notebook, you’ll probably find that you
just got an error message.
You’ll need to wait until we’ve covered installation of additional Python libraries later in the course, then
come back to this and try again. For now, just follow along and try to get a feel for how programming for
data-focused research works.
In [2]: import geopy  # third-party geocoding library
geocoder=geopy.geocoders.GoogleV3(domain="maps.google.co.uk")
geocoder.geocode('Cambridge',exactly_one=False)
The results come out as a list inside a list: [Name, [Latitude, Longitude]]. Programs represent data
in a variety of different containers like this.
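To see how such nested containers are unpacked, here is a hypothetical result in the [Name, [Latitude, Longitude]] shape just described, filled with the London coordinates that appear later in this chapter. It is not real geocoder output; it only shows the indexing.

```python
# Hypothetical result in the [Name, [Latitude, Longitude]] shape
result = ["London", [51.5073509, -0.1277583]]

name = result[0]          # the first element is the place name
coordinates = result[1]   # the second element is itself a list
latitude = result[1][0]   # index twice to reach inside the inner list
longitude = result[1][1]
```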
2.2.2 Comments
Code after a # symbol doesn’t get run.
This runs
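The cell that produced this output is missing from this extract. A minimal stand-in showing both a whole-line comment and a comment after code might look like this; only the uncommented assignment and the print are executed:

```python
# This whole line is a comment, so Python ignores it
message = "This runs"  # a comment can also follow code on the same line
# message = "This does not run"
print(message)  # prints: This runs
```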
2.2.3 Functions
We can wrap code up in a function, so that we can repeatedly get just the information we want.
Defining functions that bundle code together, so that a complex task looks simple from the outside, is
the most important thing in programming. The output of the function is stated by “return”; the inputs
come in brackets after the function name:
In [5]: geolocate('London')
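The definition of `geolocate` is not shown in this extract, and the real one calls a web service. As an offline sketch of the same shape (an input in brackets, an output given by `return`), here is a hypothetical `lookup_location` that reads from a small hand-written table instead of the internet:

```python
# Hypothetical lookup table standing in for a real geocoding service
KNOWN_PLACES = {
    "London": (51.5073509, -0.1277583),
}


def lookup_location(place):
    """Return (latitude, longitude) for a place in our table."""
    # The input arrives as the argument `place`; the output is what we return
    return KNOWN_PLACES[place]


london = lookup_location("London")
```

Callers only need to know the function's name, what goes in and what comes out; swapping the table for a web request would not change how it is used.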
2.2.4 Variables
We can store a result in a variable:
In [6]: london_location=geolocate("London")
print london_location
(51.5073509, -0.1277583)
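import requests
def map_at(lat, long, satellite=False, zoom=12, size=(400,400), sensor=False):
    # Assumed header, defaults and base URL (missing from this extract)
    base="http://maps.googleapis.com/maps/api/staticmap?"
    params=dict(
        sensor= str(sensor).lower(),
        zoom= zoom,
        size= "x".join(map(str,size)),
        center= ",".join(map(str,(lat,long))),
        style="feature:all|element:labels|visibility:off"
    )
    if satellite:
        params["maptype"]="satellite"
    return requests.get(base,params=params)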
2.2.6 Checking our work
Let’s see what URL we ended up with:
In [9]: url=map_response.url
print url[0:50]
print url[50:100]
print url[100:]
http://maps.googleapis.com/maps/api/staticmap?styl
e=feature%3Aall%7Celement%3Alabels%7Cvisibility%3A
off&center=51.5072%2C-0.1275&zoom=10&maptype=satellite&sensor=false&size=400x400
We can write automated tests so that if we change our code later, we can check the results are still
valid.
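The tests themselves are not shown here. As a hedged sketch of the idea, the hypothetical helper `build_map_url` below rebuilds a static-map URL from the same kind of parameters `map_at` sends, without touching the network, and `test_map_url` checks that the query string still says what we expect. It uses only the standard library's URL tools (`urllib.parse` in Python 3, `urllib` and `urlparse` in Python 2).

```python
try:
    from urllib.parse import urlencode, urlparse, parse_qs  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2
    from urlparse import urlparse, parse_qs

BASE = "http://maps.googleapis.com/maps/api/staticmap?"


def build_map_url(lat, lon, zoom=10, size=(400, 400), satellite=False):
    """Hypothetical offline helper: build the URL without fetching it."""
    params = dict(
        zoom=zoom,
        size="x".join(map(str, size)),
        center=",".join(map(str, (lat, lon))),
    )
    if satellite:
        params["maptype"] = "satellite"
    return BASE + urlencode(params)


def test_map_url():
    """Check the query string still encodes what we asked for."""
    url = build_map_url(51.5072, -0.1275, satellite=True)
    query = parse_qs(urlparse(url).query)
    assert query["center"] == ["51.5072,-0.1275"]
    assert query["size"] == ["400x400"]
    assert query["zoom"] == ["10"]
    assert query["maptype"] == ["satellite"]


test_map_url()
```

If someone later changes how `center` or `size` is formatted, `test_map_url` fails immediately rather than silently producing the wrong map.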
Our previous function returns an object representing the web server’s response. In object oriented
programming, we use the . operator to get access to a particular property of an object; in this case, the
actual image at that URL is in the content property. It’s a big file, so I’ll just get the first few characters:
In [11]: map_response.content[0:20]
Out[11]: ’\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x01\x90’
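Those first bytes are recognisable: every PNG file starts with the same 8-byte signature, followed by an `IHDR` chunk (a 4-byte length, the 4-byte type name, then the image width as a 4-byte big-endian integer). The sketch below, which is ours and uses only the standard `struct` module, decodes the 20 bytes shown above; `0x190` turns out to be 400, matching the 400x400 size we requested.

```python
import struct

# The 20 bytes printed above, as a bytes literal
header = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x01\x90"

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
is_png = header[:8] == PNG_SIGNATURE

# Bytes 8-11: length of the first chunk; bytes 12-15: its type
chunk_length, = struct.unpack(">I", header[8:12])
chunk_type = header[12:16]

# Bytes 16-19: image width as a big-endian unsigned 32-bit integer
width, = struct.unpack(">I", header[16:20])
```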
I can use a library that comes with IPython notebook to display the image. Being able to work with
variables which contain images, or documents, or any other weird kind of data, just as easily as we can with
numbers or letters, is one of the really powerful things about modern programming languages like Python.
In [14]: print "The type of our map result is actually a: ", type(map_png)
In [15]: IPython.core.display.Image(map_png)
Out[15]:
2.2.8 Manipulating Numbers
Now we get to our research project: we want to find out how urbanised the world is, based on satellite
imagery, along a line between two cities. We expect the satellite image to be greener in the countryside.
We’ll use lots more libraries to count how much green there is in an image.
In [16]: from StringIO import StringIO # A library to convert between files and strings
import numpy as np # A library to deal with matrices
from matplotlib import image as img # A library to deal with images
This code has assumed we have our pixel data for the image as a 400 × 400 × 3 array (a 3-d matrix), with
each of the three layers being red, green, and blue pixels.
We find out which pixels are green by comparing, element-by-element, the middle (green, number 1) layer
to the top (red, number 0) and bottom (blue, number 2) layers.
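The comparison code is not shown in this extract. A minimal sketch of the idea, assuming pixel values are floats and using a hypothetical 10% margin (`threshold=1.1`) so that greyish pixels do not count as green, might look like this:

```python
import numpy as np


def is_green(pixels, threshold=1.1):
    """Boolean mask of pixels whose green value beats red and blue.

    `pixels` is a (height, width, 3) RGB array; `threshold` is an
    assumed margin, not a value taken from the original code.
    """
    greener_than_red = pixels[:, :, 1] > threshold * pixels[:, :, 0]
    greener_than_blue = pixels[:, :, 1] > threshold * pixels[:, :, 2]
    return np.logical_and(greener_than_red, greener_than_blue)


def count_green(pixels):
    """Number of pixels flagged by is_green (each True counts as 1)."""
    return int(np.sum(is_green(pixels)))


# A tiny 1x3 test image: one green, one grey and one red pixel
sample = np.array([[[0.1, 0.8, 0.2],
                    [0.5, 0.5, 0.5],
                    [0.9, 0.1, 0.1]]])
```

Because the comparisons work on whole layers at once, the same two lines handle a 1×3 test image or a 400×400 satellite tile.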
Now we just need to parse in our data, which is a PNG image, and turn it into our matrix format:
106725
We’ll also need a function to get an evenly spaced set of places between two endpoints:
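That function is also missing from this extract. One way to write it, sketched here with NumPy's `linspace` (our choice, not necessarily the original's), is to space the latitudes and longitudes evenly and pair them up:

```python
import numpy as np


def location_sequence(start, end, steps):
    """Return `steps` evenly spaced (latitude, longitude) pairs, start to end."""
    lats = np.linspace(start[0], end[0], steps)
    longs = np.linspace(start[1], end[1], steps)
    # Stack into two rows, then transpose so each row is one (lat, long) pair
    return np.vstack([lats, longs]).transpose()


route = location_sequence((0.0, 0.0), (10.0, 20.0), 3)
```

Interpolating straight in latitude and longitude ignores the curvature of the Earth, which is a reasonable simplification over the short distances used here.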
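def show_green_in_png(data):
    # Assumed: decode the PNG and find green pixels (lines missing here)
    pixels = img.imread(StringIO(data))
    green = is_green(pixels)
    out = green[:,:,np.newaxis]*np.array([0,1,0])[np.newaxis,np.newaxis,:]
    buffer = StringIO()
    result = img.imsave(buffer, out, format='png')
    return buffer.getvalue()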
In [23]: IPython.core.display.Image(
map_at(*london_location, satellite=True)
)
Out[23]:
In [24]: IPython.core.display.Image(
show_green_in_png(
map_at(
*london_location,
satellite=True)))
Out[24]:
2.2.10 Looping
We can loop over each element in our list of coordinates, and get a map for that place:
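The loop itself is not shown in this extract, and fetching maps needs a network connection. As an offline sketch of the same `for` construct, with a made-up list of coordinates standing in for the real ones:

```python
# Made-up (latitude, longitude) pairs standing in for real route points
locations = [(51.5, -0.1), (52.0, -1.0), (52.5, -1.9)]

latitudes = []
for location in locations:
    # On each pass, `location` refers to the next pair in the list
    latitude, longitude = location
    latitudes.append(latitude)
```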
So now we can count the green from London to Birmingham!
In [26]: [count_green_in_png(map_at(*location))
for location in location_sequence(geolocate("London"),
geolocate("Birmingham"),
10)]
Out[26]: [106725,
127797,
155996,
158581,
157918,
158665,
158407,
156403,
148491,
138544]
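The expression above is a list comprehension: a compact way of writing a loop that builds a list. A small offline sketch, with a hypothetical `count_letters` standing in for `count_green_in_png`, shows that the comprehension and the explicit loop give the same result:

```python
def count_letters(word):
    """Hypothetical stand-in for count_green_in_png: any one-item function."""
    return len(word)


places = ["London", "Oxford", "Birmingham"]

# List comprehension: build the whole list in one expression
counts = [count_letters(place) for place in places]

# The equivalent explicit loop
counts_from_loop = []
for place in places:
    counts_from_loop.append(count_letters(place))
```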
2.2.11 Plotting graphs
Let’s plot a graph.
In [28]: plt.plot([count_green_in_png(map_at(*location))
for location in location_sequence(geolocate("London"),
geolocate("Birmingham"),
10)])
From a research perspective, of course, this code needs a lot of work. But I hope the power of using
programming is clear.
By putting these together, we can make a function which can plot this graph automatically for any two
places:
In [29]: def green_between(start, end,steps):
return [count_green_in_png(map_at(*location))
for location in location_sequence(
geolocate(start),
geolocate(end),
steps)]
And that’s it! We’ve covered, very very quickly, the majority of the python language, and much of the
theory of software engineering.
Now we’ll go back, carefully, through all the concepts we touched on, and learn how to use them properly
ourselves.
Chapter 3
Variables
In [1]: 2*3
Out[1]: 6
If we want to get back to that result, we have to store it. We put it in a box, with a name on the box.
This is a variable.
In [2]: six=2*3
If we look for a variable that hasn’t ever been defined, we get an error.
---------------------------------------------------------------------------
<ipython-input-4-cd3a57e315ea> in <module>()
----> 1 print seven
NameError: name 'seven' is not defined
None
(None is the special python value for a no-value variable.)
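A short sketch of both points: using a name that was never assigned raises a `NameError` (caught here only so that the example runs to the end), while `None` is an ordinary value that can be stored in a variable and printed.

```python
nothing = None
print(nothing)  # prints None

try:
    print(seven)  # 'seven' has never been assigned
except NameError as error:
    message = str(error)  # the last line of the traceback
```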
Supplementary Materials: There’s more on variables at http://swcarpentry.github.io/python-novice-inflammation/01-numpy.html
Anywhere we could put a raw number, we can put a variable label, and that works fine:
30
216
In [10]: scary = 25
25
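The cells that produced these outputs are missing from this extract. A sketch consistent with the numbers shown (5 × 6 = 30 and 6 × 6 × 6 = 216), followed by a reassignment, is:

```python
six = 2 * 3
print(5 * six)         # a variable can stand wherever a number could: 30

scary = six * six * six
print(scary)           # 216
scary = 25             # reassignment: the label now refers to a new value
print(scary)           # 25
```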
Note that the data that was there before has been lost.
No labels refer to it any more - so it has been “Garbage Collected”! We might imagine something pulled
out of the box, and thrown on the floor, to make way for the next occupant.
In fact, though, it is the label that has moved. We can see this because we have more than one label
referring to the same box:
James
James
James
Hetherington
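A sketch of two labels on one box: the `is` operator tests whether two names refer to the very same object, and moving one label leaves the other where it was. The variable names follow the `nom` and `name` labels mentioned below; the exact cells are missing from this extract.

```python
name = "James"
nom = name                  # a second label on the same object
same_object = nom is name   # True: both labels refer to one object

nom = "Hetherington"        # move the label 'nom'; 'name' is unaffected
```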
So we can now develop a better understanding of our labels and boxes: each box is a piece of space (an
address) in computer memory. Each label (variable) is a reference to such a place.
When the number of labels on a box (“variables referencing an address”) gets down to zero, then the
data in the box cannot be found any more.
After a while, the language’s “Garbage collector” will wander by, notice a box with no labels, and throw
the data away, making that box available for more data.
Old-fashioned languages like C and Fortran don’t have garbage collectors: the programmer must free memory
explicitly. So a memory address with no references to it still takes up memory, and the computer can more
easily run out.
So when I write:
In [19]: name = "Jim"
The following things happen:
1. A new text object is created, and an address in memory is found for it.
2. The variable “name” is moved to refer to that address.
3. The old address, containing “James”, now has no labels.
4. The garbage collector frees the memory at the old address.
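We can watch this happen with the standard `weakref` module, which lets us observe an object without putting a label on it. Strings cannot be weakly referenced, so this sketch uses a hypothetical `Box` class instead. It relies on a detail of CPython, the usual Python implementation, which frees an object as soon as its last reference disappears; other implementations may collect it later.

```python
import weakref


class Box(object):
    """Hypothetical container; unlike str, it can be weakly referenced."""

    def __init__(self, contents):
        self.contents = contents


label = Box("James")
watcher = weakref.ref(label)    # observes the object without being a label

alive_before = watcher() is not None
label = Box("Jim")              # move the only label to a new object
alive_after = watcher() is not None   # False in CPython: the old box is gone
```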
Supplementary materials: There’s an online Python Tutor which is great for visualising memory and
references. Try the scenario we just looked at.
Labels are contained in groups called “frames”: our frame contains two labels, ‘nom’ and ‘name’.
---------------------------------------------------------------------------
<ipython-input-24-76215e50e85b> in <module>()
----> 1 z.wrong
3.4 Reading error messages.
It’s important, when learning to program, to develop the ability to read an error message and find, amid
all the confusing noise, the part of the message that tells you what to change!
We don’t yet know what is meant by AttributeError, or “Traceback”.
In [25]: z2=5-6j
print "Gets to here"
print z.wrong
print "Didn't get to here"
Gets to here
---------------------------------------------------------------------------
<ipython-input-25-891c7b6126ae> in <module>()
1 z2=5-6j
2 print "Gets to here"
----> 3 print z.wrong
4 print "Didn't get to here"
But in the above, we can see that the error happens on the third line of our code cell.
We can also see that the error message, 'complex' object has no attribute 'wrong', tells us something
important. Even if we don’t understand the rest, this is useful for debugging!
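The useful last line of a traceback is also available inside a program. This sketch catches the `AttributeError` and keeps its message, which is exactly the text discussed above:

```python
z = 1 + 5j
try:
    z.wrong
except AttributeError as error:
    message = str(error)

print(message)  # the same text as the last line of the traceback
```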
Chapter 4
Using Functions
In [1]: len("pneumonoultramicroscopicsilicovolcanoconiosis")
Out[1]: 45
In [2]: sorted("Python")
In [3]: len('Jim')*8
Out[3]: 24
In [4]: x=len('Mike')
y=len('Bob')
z=x+y
In [5]: print z
In [6]: "shout".upper()
Out[6]: 'SHOUT'
These are called methods. If you try to use a method defined for a different type, you get an error:
In [7]: 5.upper()
If you try to use a method that doesn’t exist, you get an error:
In [8]: x=5
x.wrong
---------------------------------------------------------------------------
<ipython-input-8-c914fcddd360> in <module>()
1 x=5
----> 2 x.wrong
AttributeError: 'int' object has no attribute 'wrong'
Methods and properties are both kinds of attribute, so both are accessed with the dot operator.
Objects can have both properties and methods:
In [9]: z=1+5j
In [10]: z.real
Out[10]: 1.0
In [11]: z.conjugate()
Out[11]: (1-5j)
In [12]: z.conjugate
In [13]: type(z.conjugate)
Out[13]: builtin_function_or_method
In [14]: somefunc=z.conjugate
In [15]: somefunc()
Out[15]: (1-5j)
In [17]: type(magic)
In [19]: help(sorted)
sorted(...)
sorted(iterable, cmp=None, key=None, reverse=False) --> new sorted list
The ‘dir’ function, when applied to an object, lists all its attributes (properties and methods):
In [20]: dir("Hexxo")
Out[20]: ['__add__',
'__class__',
'__contains__',
'__delattr__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__getattribute__',
'__getitem__',
'__getnewargs__',
'__getslice__',
'__gt__',
'__hash__',
'__init__',
'__le__',
'__len__',
'__lt__',
'__mod__',
'__mul__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__rmod__',
'__rmul__',
'__setattr__',
'__sizeof__',
'__str__',
'__subclasshook__',
'_formatter_field_name_split',
'_formatter_parser',
'capitalize',
'center',
'count',
'decode',
'encode',
'endswith',
'expandtabs',
'find',
'format',
'index',
'isalnum',
'isalpha',
'isdigit',
'islower',
'isspace',
'istitle',
'isupper',
'join',
'ljust',
'lower',
'lstrip',
'partition',
'replace',
'rfind',
'rindex',
'rjust',
'rpartition',
'rsplit',
'rstrip',
'split',
'splitlines',
'startswith',
'strip',
'swapcase',
'title',
'translate',
'upper',
'zfill']
Most of these are confusing methods beginning and ending with __, part of the internals of Python.
Again, just as with error messages, we have to learn to read past the bits that are confusing, to the bit
we want:
Out[21]: 'Hello'
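A sketch of reading past the internals programmatically (Python 3; it uses a list comprehension, which comes later in these notes): keep only the names from dir that do not start with an underscore. The replace call is one plausible way to turn "Hexxo" into 'Hello'; the original cell for Out[21] isn't shown.

```python
# Keep only the "public" names, hiding the double-underscore internals
public_names = [name for name in dir("Hexxo") if not name.startswith("_")]
print(public_names)

# One of those public methods repairs the string
fixed = "Hexxo".replace("x", "l")
print(fixed)  # Hello
```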
4.5 Operators
Now that we know that functions are a way of taking a number of inputs and producing an output, we
should look again at what happens when we write:
In [22]: x = 2 + 3
In [23]: print x
This is just a pretty way of calling an “add” function. Things would be more symmetrical if add were
actually written
x = +(2,3)
where ‘+’ is just the name of the adding function.
In Python, these functions do exist, but they’re actually methods of the first input: they’re the mysterious
functions we saw earlier, whose names begin and end with two underscores.
In [24]: x.__add__(7)
Out[24]: 12
Out[25]: [2, 3, 4, 5, 6]
In [26]: 7-2
Out[26]: 5
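The standard library's operator module provides these functions under ordinary names, which is close to the symmetrical +(2,3) form imagined above. A short sketch (Python 3 print calls):

```python
import operator

x = 2 + 3
print(operator.add(2, 3))         # 5: the function behind the + symbol
print(x.__add__(7))               # 12: the method form, called on the first input
print([2, 3, 4].__add__([5, 6]))  # [2, 3, 4, 5, 6]: + on lists joins them
print(operator.sub(7, 2))         # 5
```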
---------------------------------------------------------------------------
<ipython-input-27-4627195e7799> in <module>()
----> 1 [2, 3, 4] - [5, 6]
TypeError: unsupported operand type(s) for -: 'list' and 'list'
In [28]: [2, 3, 4] + 5
---------------------------------------------------------------------------
<ipython-input-28-84117f41979f> in <module>()
----> 1 [2, 3, 4] + 5
TypeError: can only concatenate list (not "int") to list
Just as in Mathematics, operators have a built-in precedence, with brackets used to force an order of
operations:
14
20
Chapter 5
Types
In [1]: type(5)
Out[1]: int
In [2]: one=1
ten=10
one_float=1.
ten_float=10.
0 0.1
The division operator, when applied to floats, means division of real numbers. But when applied to
integers, it means divide then round down:
In [6]: 10/3
Out[6]: 3
In [7]: 10.0/3
Out[7]: 3.3333333333333335
In [8]: 10/3.0
Out[8]: 3.3333333333333335
So if I have two integer variables, and I want the float division, I need to change the type first. There
is a function for every type name, which is used to convert the input to an output of the desired type.
In [9]: x=float(5)
type(x)
Out[9]: float
In [10]: 10/float(3)
Out[10]: 3.3333333333333335
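Python 3 changed this behaviour: / always means real division, and // is the explicit round-down (floor) division operator, which also exists in Python 2. (In Python 2, from __future__ import division at the top of a file switches / to the Python 3 meaning.) A sketch of the Python 3 behaviour:

```python
true_div = 10 / 3       # 3.3333333333333335 in Python 3 (3 in Python 2)
floor_div = 10 // 3     # 3 in both versions
negative = -10 // 3     # -4: floor division rounds towards minus infinity
mixed = 10 / float(3)   # converting one input first works in either version
```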
I lied when I said that the float type was a real number. It’s actually a computer representation of
a real number called a “floating point number”. Representing √2 or 1/3 perfectly would be impossible in a
computer, so we use a finite amount of memory to approximate them.
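The classic symptom of this finite representation, in both Python 2 and 3: 0.1 and 0.2 are stored only approximately, so their sum is not exactly 0.3. A sketch of comparing floats with a tolerance instead (the 1e-9 threshold is an arbitrary choice here; Python 3.5+ also offers math.isclose):

```python
a = 0.1 + 0.2
print(repr(a))              # 0.30000000000000004
print(a == 0.3)             # False: the stored values differ slightly
print(abs(a - 0.3) < 1e-9)  # True: compare with a tolerance instead
```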
Supplementary material :
• https://docs.python.org/2/tutorial/floatingpoint.html
• http://floating-point-gui.de/formats/fp/
• Advanced: http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
5.2 Strings
Python has a built-in string type, supporting many useful methods.
As for float and int, the name of a type can be used as a function to convert between types:
11
101.0
We can remove extraneous material from the start and end of a string:
Out[15]: 'Hello'
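A sketch of type conversion and stripping; the input values here are illustrative, since the original cells aren't shown:

```python
as_int = int("11")         # 11
as_float = float("101")    # 101.0
as_text = str(42)          # '42'
# strip removes whitespace (spaces, tabs, newlines) from both ends only
cleaned = "   Hello \n".strip()   # 'Hello'
```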
5.3 Lists
Python’s basic container type is the list.
We can define our own list with square brackets:
In [16]: [1, 3, 7]
Out[16]: [1, 3, 7]
Out[17]: list
In [19]: various_things[2]
Out[19]: 'banana'
In [20]: index = 0
various_things[index]
Out[20]: 1
[0, 1, 2, 3, 4]
James,Philip,John,Hetherington
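A short sketch of indexing a list and of moving between strings and lists. The contents of various_things are a guess consistent with the outputs above, not the original cell:

```python
various_things = [1, 2, "banana", 3.4, [1, 2]]  # hypothetical contents
print(various_things[2])    # banana
print(various_things[0])    # 1 (indices start at zero)
print(various_things[-1])   # [1, 2] (negative indices count from the end)

name = "James Philip John Hetherington"
parts = name.split(" ")     # ['James', 'Philip', 'John', 'Hetherington']
print(",".join(parts))      # James,Philip,John,Hetherington
print(list(range(5)))       # [0, 1, 2, 3, 4]
```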
5.4 Sequences
Many other things can be treated like lists. Python calls things that can be treated like lists sequences.
A string is one such sequence type:
1
m
[1, 2]
o Wo
5
6
True
False
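Strings support the same indexing, slicing, len and in operations as lists. A sketch with an illustrative string (the original inputs aren't shown):

```python
greeting = "Hello World"
print(greeting[1])           # e
print(greeting[4:8])         # o Wo: from index 4 up to, but not including, 8
print(len(greeting))         # 11
print("World" in greeting)   # True
print([1, 2, 3][0:2])        # [1, 2]: the same slice syntax on a list
```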
5.5 Unpacking
Multiple values can be unpacked when assigning from sequences, like dealing out decks of cards.
World
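A sketch of unpacking; each label on the left receives one element of the sequence on the right:

```python
zero, one, two = range(3)                 # one label per element
first, second = "Hello World".split(" ")
print(second)                             # World

a, b = 1, 2
a, b = b, a                               # swap two labels via unpacking
```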
Chapter 6
Containers
Out[1]: True
Out[2]: False
In [3]: 2 in range(5)
Out[3]: True
In [4]: 99 in range(5)
Out[4]: False
6.2 Mutability
A list can be modified:
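A sketch of changing a list in place:

```python
numbers = [1, 2, 3]
numbers[0] = 10         # replace an element
numbers.append(4)       # add one element to the end
numbers.extend([5, 6])  # add several
del numbers[1]          # remove the element at index 1
print(numbers)          # [10, 3, 4, 5, 6]
```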
6.3 Tuples
A tuple is an immutable sequence:
---------------------------------------------------------------------------
<ipython-input-7-d3ad0c7e33f1> in <module>()
1 my_tuple = ("Hello", "World")
----> 2 my_tuple[0]="Goodbye"
TypeError: 'tuple' object does not support item assignment
---------------------------------------------------------------------------
<ipython-input-8-fe8069275347> in <module>()
1 fish = "Hake"
----> 2 fish[0] = 'R'
TypeError: 'str' object does not support item assignment
But note that container reassignment is moving a label, not changing an element:
Supplementary material : Try the online memory visualiser for this one.
In [10]: x = range(3)
print x
[0, 1, 2]
In [11]: y = x
print y
[0, 1, 2]
In [12]: z = x[0:3]
y[1] = "Gotcha!"
print x
print y
print z
[0, 'Gotcha!', 2]
[0, 'Gotcha!', 2]
[0, 1, 2]
[0, 'Gotcha!', 2]
[0, 'Gotcha!', 2]
[0, 1, 'Really?']
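A sketch of the difference between a second label and a copy. list() is used so that range behaves the same way in Python 2 and 3, and the is operator checks whether two labels refer to the same object:

```python
x = list(range(3))
y = x          # a second label on the same list
z = x[:]       # a slice builds a new list holding the same elements
y[1] = "Gotcha!"
print(x)       # [0, 'Gotcha!', 2]
print(z)       # [0, 1, 2]
print(y is x)  # True
print(z is x)  # False
```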
In [14]: x=[['a','b'],'c']
y=x
z=x[0:2]
x[0][1]='d'
z[1]='e'
In [15]: x
In [16]: y
In [17]: z
True
False
The == operator checks, element by element, that two containers have the same data. The is operator
checks that they are actually the same object.
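A slice copies only the outer list: any inner lists are still shared, as in the x, y, z example above. The standard library's copy.deepcopy copies all the way down. A sketch:

```python
import copy

x = [["a", "b"], "c"]
shallow = x[:]            # new outer list, same inner list
deep = copy.deepcopy(x)   # new outer list and new inner list
x[0][1] = "d"
print(shallow)            # [['a', 'd'], 'c']
print(deep)               # [['a', 'b'], 'c']
print(deep == [["a", "b"], "c"])  # True: same data
print(deep is x)          # False: different objects
```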
But, and this point is really subtle, for immutables, the Python language might save memory by reusing
a single instantiated copy. This will always be safe, because an immutable object can never be changed
through any of its labels.
True
True
Chapter 7
Dictionaries
Out[1]: 'Luther'
In [3]: print me
['Programmer', 'Teacher']
<type 'dict'>
In [6]: me.keys()
In [7]: me.values()
In [8]: 'Jobs' in me
Out[8]: True
In [9]: 'James' in me
Out[9]: False
Out[10]: True
but:
---------------------------------------------------------------------------
<ipython-input-12-cca03b227ff4> in <module>()
----> 1 illegal = {[1,2]: 3}
TypeError: unhashable type: 'list'
Supplementary material : You can start to learn about the ‘hash table’ here. This material is very advanced,
but, I think, really interesting!
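In brief, dictionary keys must be hashable, which in practice means immutable: a tuple works where a list does not. A sketch (the exact wording of the error message varies between Python versions, so it is only checked for the word "unhashable"):

```python
coordinates = {(0, 0): "origin", (1, 2): "somewhere"}  # tuple keys are fine
print(coordinates[(1, 2)])   # somewhere

message = None
try:
    illegal = {[1, 2]: 3}    # a list is mutable, so it cannot be a key
except TypeError as error:
    message = str(error)
print(message)               # mentions that 'list' is unhashable
```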
7.5 Sets
A set is an unordered collection which cannot contain the same element twice.
print "".join(unique_letters)
a egiHJmonsrth
It has no particular order, but is really useful for checking or storing unique values.
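A sketch, assuming the set was built from the string "James Hetherington" (its letters match the output above, but the original cell isn't shown). Sorting gives a predictable order for display:

```python
unique_letters = set("James Hetherington")
print(len(unique_letters))              # 14: repeated letters are kept once
print("".join(sorted(unique_letters)))  # the space sorts first, then capitals
unique_letters.add("a")                 # adding an existing element changes nothing
print(len(unique_letters))              # 14
print("z" in unique_letters)            # False: a quick membership test
```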
Chapter 8
Data structures
In [1]: UCL={
    'City': 'London',
    'Street': 'Gower Street',
    'Postcode': 'WC1E 6BT'
}
In [2]: James={
    'City': 'London',
    'Street': 'Waterson Street',
    'Postcode': 'E2 8HH'
}
In [4]: addresses
A more complicated data structure, for example for a census database, might have a list of residents or
employees at each address:
In [7]: addresses
'people': ['Clare', 'James', 'Owain']},
{'City': 'London',
'Postcode': 'E2 8HH',
'Street': 'Waterson Street',
'people': ['Sue', 'James']}]
This is then a list of dictionaries, with keys which are strings and values which are strings or lists.
We can go further, e.g.:
In [8]: UCL['Residential']=False
['Clare', 'Sue']
This was an example of a ‘list comprehension’, which we have used to get data out of this structure, and which
we’ll see more of in a moment. . .
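A reconstruction of that comprehension, with the people lists taken from the output above (the original cells that add them aren't shown, so treat the exact setup as an assumption):

```python
UCL = {"City": "London", "Street": "Gower Street", "Postcode": "WC1E 6BT",
       "people": ["Clare", "James", "Owain"]}
James = {"City": "London", "Street": "Waterson Street", "Postcode": "E2 8HH",
         "people": ["Sue", "James"]}
addresses = [UCL, James]

# For each address dictionary, take the first entry of its people list
first_residents = [address["people"][0] for address in addresses]
print(first_residents)   # ['Clare', 'Sue']
```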
• The front room can hold 2 people. James is currently there. You can go outside to the garden, or
upstairs to the bedroom, or north to the kitchen.
• From the kitchen, you can go south to the front room. It fits 1 person.
• From the garden you can go inside to the living room. It fits 3 people. Sue is currently there.
• From the bedroom, you can go downstairs. You can also jump out of the window to the garden. It fits
1 person.
In [1]: house = {
    'living' : {
        'exits': {
            'north' : 'kitchen',
            'outside' : 'garden',
            'upstairs' : 'bedroom'
        },
        'people' : ['James'],
        'capacity' : 2
    },
    'kitchen' : {
        'exits': {
            'south' : 'living'
        },
        'people' : [],
        'capacity' : 1
    },
    'garden' : {
        'exits': {
            'inside' : 'living'
        },
        'people' : ['Sue'],
        'capacity' : 3
    },
    'bedroom' : {
        'exits': {
            'downstairs' : 'living',
            'jump' : 'garden'
        },
        'people' : [],
        'capacity' : 1
    }
}
Chapter 9
• Control whether a program statement should be executed or not, based on a variable. “Conditionality”
• Jump back to an earlier point in the program, and run some statements again. “Branching”
Once we have these, we can write computer programs to process information in arbitrary ways: we are
Turing Complete!
9.2 Conditionality
Conditionality is achieved through Python’s if statement:
In [1]: x = 5
if x < 0:
    print x, " is negative"
x=-10
if x < 0:
    print x, " is negative"
-10 is negative
In [2]: x = 5
if x < 0:
    print "x is negative"
else:
    print "x is positive"
x is positive
In [3]: x = 5
if x < 0:
    print "x is negative"
elif x == 0:
    print "x is zero"
else:
    print "x is positive"
x is positive
Try editing the value of x here, and note which branch of the if statement is executed.
if choice == 'high':
    print 1
elif choice == 'medium':
    print 2
else:
    print 3
9.4 Comparison
True and False are used to represent boolean (true or false) values.
In [5]: 1 > 2
Out[5]: False
Out[6]: True
Out[7]: False
There are subtle implied order comparisons between types, but it would be bad style to rely on these,
because most human readers won’t remember them:
Out[8]: False
Out[9]: True
Any expression that evaluates to True or False can be used to control an if statement.
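A sketch of a few comparison forms. Note that the mixed-type ordering mentioned above is a Python 2 feature: Python 3 refuses it with a TypeError, which is one more reason not to rely on it.

```python
x = 5
print(1 < x < 10)          # True: comparisons can be chained
print(x == 5 and x != 6)   # True
try:
    print(1 < "a")         # Python 2 answers this; Python 3 raises TypeError
except TypeError:
    print("cannot order an int and a str")
```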
9.5 Automatic Falsehood
Various other things automatically count as true or false, which can make life easier when coding:
In [10]: mytext = "Hello"
if mytext:
    print "Mytext is not empty"
mytext2 = ""
if mytext2:
    print "Mytext2 is not empty"
Mytext is not empty
We can use logical not and logical and to combine true and false:
In [11]: x=3.2
if not (x>0 and type(x)==int):
    print x,"is not a positive integer"
3.2 is not a positive integer
not also understands magic conversion from false-like things to True or False.
In [12]: not not "Who's there!" # Thanks to Mysterious Student
Out[12]: True
In [13]: bool("")
Out[13]: False
In [14]: bool("James")
Out[14]: True
In [15]: bool([])
Out[15]: False
In [16]: bool(['a'])
Out[16]: True
In [17]: bool({})
Out[17]: False
In [18]: bool({'name': 'James'})
Out[18]: True
In [19]: bool(0)
Out[19]: False
In [20]: bool(1)
Out[20]: True
But subtly, although these quantities evaluate True or False in an if statement, they’re not themselves
actually True or False under ==:
In [21]: [] == False
Out[21]: False
In [22]: (not not []) == (not not False)
Out[22]: True
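A compact sketch of the same truthiness rules, converting each value with bool (None, not shown above, is also false):

```python
values = ["", "James", [], ["a"], {}, {"name": "James"}, 0, 1, None]
truthiness = [bool(value) for value in values]
print(truthiness)
# [False, True, False, True, False, True, False, True, False]
print([] == False)   # False: an empty list is false-like, but not equal to False
```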
9.6 Indentation
In Python, indentation is semantically significant. You can choose how much indentation to use, so long as
you are consistent, but four spaces is conventional. Please do not use tabs.
In the notebook, and most good editors, when you press <tab>, you get four spaces.
In [23]: if x>0:
    print x
9.7 Pass
A statement expecting indentation must have some indented code. This can be annoying when commenting
things out (with #):
In [24]: if x>0:
    # print x
print "Hello"
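Python's pass statement exists for exactly this case: it does nothing, but it counts as the indented code the if needs. A sketch (Python 3 print):

```python
x = 5
if x > 0:
    # print(x)
    pass          # placeholder statement: keeps the block valid
print("Hello")
```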
9.8 Iteration
Our other aspect of control is looping back on ourselves.
We use for . . . in to “iterate” over lists:
In [1]: mylist = [3, 7, 15, 2]
for whatever in mylist:
    print whatever**2
9
49
225
4
Each time through the loop, the variable in the value slot is updated to the next element of the sequence.
9.9 Iterables
Any sequence type is iterable:
In [2]: vowels="aeiou"
sarcasm = []
for letter in "Okay":
    if letter.lower() in vowels:
        repetition = 3
    else:
        repetition = 1
    sarcasm.append(letter*repetition)
"".join(sarcasm)
Out[2]: 'OOOkaaay'
The above is a little puzzle, work through it to understand why it does what it does, you have
current_year = now.year
In [4]: triples=[
[4,11,15],
[39,4,18]
]
11
4
In [7]: # A reminder that the words you use for variable names are arbitrary:
for hedgehog, badger, fox in triples:
    print badger
11
4
For example, to iterate over the items in a dictionary as pairs:
In [8]: things = {"James": [1976, 'Kendal'],
"UCL": [1826, 'Bloomsbury'],
"Cambridge": [1209, 'Cambridge']}
print things.items()
[('James', [1976, 'Kendal']), ('UCL', [1826, 'Bloomsbury']), ('Cambridge', [1209, 'Cambridge'])]
In [9]: for name, year in founded.items():
    print name, " is ", current_year - year, "years old."
James is 39 years old.
UCL is 189 years old.
Cambridge is 806 years old.
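A sketch that unpacks each dictionary item, and the list inside it, in one go. It uses the things dictionary from above and assumes current_year is 2015, the year these notes were written, which matches the ages printed above:

```python
things = {"James": [1976, "Kendal"],
          "UCL": [1826, "Bloomsbury"],
          "Cambridge": [1209, "Cambridge"]}
current_year = 2015  # assumption: fixed so the results are reproducible

ages = {}
for name, (year, place) in things.items():  # nested unpacking of each pair
    ages[name] = current_year - year
    print(name + " is " + str(ages[name]) + " years old.")
```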
9.13.1 Solution: counting people in the maze
With this maze structure:
In [1]: house = {
    'living' : {
        'exits': {
            'north' : 'kitchen',
            'outside' : 'garden',
            'upstairs' : 'bedroom'
        },
        'people' : ['James'],
        'capacity' : 2
    },
    'kitchen' : {
        'exits': {
            'south' : 'living'
        },
        'people' : [],
        'capacity' : 1
    },
    'garden' : {
        'exits': {
            'inside' : 'living'
        },
        'people' : ['Sue'],
        'capacity' : 3
    },
    'bedroom' : {
        'exits': {
            'downstairs' : 'living',
            'jump' : 'garden'
        },
        'people' : [],
        'capacity' : 1
    }
}
In [2]: capacity = 0
occupancy = 0
for name, room in house.items():
    capacity+=room['capacity']
    occupancy+=len(room['people'])
print "House can fit", capacity, "people, and currently has:", occupancy, "."
Chapter 10
Comprehensions
Out[2]: [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
Out[3]: [1, 8, 64, 512, 4096, 32768, 262144, 2097152, 16777216, 134217728]
Consider the following, and make sure you understand why it works:
In [5]: result=[]
for x in range(30):
    if x%3 == 0:
        result.append(2**x)
print result
This does the same as the comprehension above. The comprehension is generally considered more readable.
Comprehensions are therefore an example of what we call ‘syntactic sugar’: they do not increase the
capabilities of the language.
Instead, they make it possible to write the same thing in a more readable way.
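The comprehension itself isn't shown above; a sketch of a comprehension that matches the loop, checked against it:

```python
result = []
for x in range(30):
    if x % 3 == 0:
        result.append(2 ** x)

comprehension = [2 ** x for x in range(30) if x % 3 == 0]
print(result == comprehension)  # True
print(comprehension[:4])        # [1, 8, 64, 512]
```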
Everything we learn from now on will be either syntactic sugar or interaction with something other than
idealised memory, such as a storage device or the internet. Once you have variables, conditionality, and
branching, your language can do anything. (And this can be proved.)
Out[7]: [0, 1, 0, 2, 1, 0, 3, 2, 1, 0]
If you want something more like a matrix, you need to do two nested comprehensions!
Out[8]: [[0, 1, 2, 3], [-1, 0, 1, 2], [-2, -1, 0, 1], [-3, -2, -1, 0]]
Out[9]: [’a1’, ’a2’, ’a3’, ’b1’, ’b2’, ’b3’, ’c1’, ’c2’, ’c3’]
Out[10]: [[’a1’, ’b1’, ’c1’], [’a2’, ’b2’, ’c2’], [’a3’, ’b3’, ’c3’]]
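The inputs for the last four outputs aren't shown. The comprehensions below reproduce them, so they are plausible reconstructions rather than the original cells: two for clauses in one comprehension give a flat list, while a comprehension inside another gives a list of lists.

```python
flat = [x - y for x in range(4) for y in range(x + 1)]
# [0, 1, 0, 2, 1, 0, 3, 2, 1, 0]
matrix = [[y - x for y in range(4)] for x in range(4)]
# [[0, 1, 2, 3], [-1, 0, 1, 2], [-2, -1, 0, 1], [-3, -2, -1, 0]]
pairs = [letter + digit for letter in "abc" for digit in "123"]
# ['a1', 'a2', 'a3', 'b1', 'b2', 'b3', 'c1', 'c2', 'c3']
columns = [[letter + digit for letter in "abc"] for digit in "123"]
# [['a1', 'b1', 'c1'], ['a2', 'b2', 'c2'], ['a3', 'b3', 'c3']]
```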
There are lots of built-in functions that provide actions on lists as a whole:
Out[12]: True
Out[13]: False
Out[14]: 3
Out[15]: 6
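These are built-in functions rather than methods. A sketch of calls that give outputs like Out[12] to Out[15] (the original inputs aren't shown, so these lists are illustrative):

```python
print(any([True, False, False]))   # True: at least one element is true
print(all([True, False, False]))   # False: not every element is true
print(max([1, 2, 3]))              # 3
print(sum([1, 2, 3]))              # 6
```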
My favourite is map, which is syntactic sugar for a simple list comprehension that applies one function to
every member of a list:
Out[16]: [’0’, ’1’, ’2’, ’3’, ’4’, ’5’, ’6’, ’7’, ’8’, ’9’]
Out[17]: [’0’, ’1’, ’2’, ’3’, ’4’, ’5’, ’6’, ’7’, ’8’, ’9’]
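A sketch comparing the two forms. In Python 3, map returns a lazy iterator rather than a list, so it is wrapped in list() here:

```python
as_strings = [str(x) for x in range(10)]   # the comprehension form
mapped = list(map(str, range(10)))         # apply str to every element
print(mapped == as_strings)                # True
```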
So I can write:
10.7.1 Solution
With this maze structure:
In [1]: house = {
    'living' : {
        'exits': {
            'north' : 'kitchen',
            'outside' : 'garden',
            'upstairs' : 'bedroom'
        },
        'people' : ['James'],
        'capacity' : 2
    },
    'kitchen' : {
        'exits': {
            'south' : 'living'
        },
        'people' : [],
        'capacity' : 1
    },
    'garden' : {
        'exits': {
            'inside' : 'living'
        },
        'people' : ['Sue'],
        'capacity' : 3
    },
    'bedroom' : {
        'exits': {
            'downstairs' : 'living',
            'jump' : 'garden'
        },
        'people' : [],
        'capacity' : 1
    }
}
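One possible comprehension-based way to count capacity and occupancy, using a compact copy of the house above (a sketch, not necessarily the intended solution):

```python
house = {
    "living": {"exits": {"north": "kitchen", "outside": "garden", "upstairs": "bedroom"},
               "people": ["James"], "capacity": 2},
    "kitchen": {"exits": {"south": "living"}, "people": [], "capacity": 1},
    "garden": {"exits": {"inside": "living"}, "people": ["Sue"], "capacity": 3},
    "bedroom": {"exits": {"downstairs": "living", "jump": "garden"},
                "people": [], "capacity": 1},
}

capacity = sum([room["capacity"] for room in house.values()])
occupancy = sum([len(room["people"]) for room in house.values()])
print("House can fit " + str(capacity) + " people, and currently has: " + str(occupancy) + ".")
# House can fit 7 people, and currently has: 2.
```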
Chapter 11
Functions
11.1 Definition
We use def to define a function, and return to pass back a value:
10 [5, 5] fivefive
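The definition itself isn't shown; a minimal function that reproduces the output above (Python 3 print) would be:

```python
def double(x):
    # * means different things for different types: arithmetic, or repetition
    return x * 2

print(double(5), double([5]), double("five"))   # 10 [5, 5] fivefive
```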
In [3]: jeeves()
In [4]: jeeves(’James’)
If you have some parameters with defaults, and some without, those with defaults must go later.
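A sketch of default values. The body of jeeves is hypothetical (only its name and the two calls above come from the notes), and greet is an illustrative extra showing the ordering rule:

```python
def jeeves(name="Sir"):
    # hypothetical body: the notes show only the calls, not the definition
    return "Very good, " + name

print(jeeves())          # Very good, Sir
print(jeeves("James"))   # Very good, James

def greet(greeting, name="World"):   # the defaulted parameter comes last
    return greeting + " " + name
# def greet(name="World", greeting): ...  would be a SyntaxError
```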
z=range(4)
double_inplace(z)
print z
[0, 2, 4, 6]
In this example, we’re using [:] to refer to the existing list itself, and overwrite its contents.
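A sketch of double_inplace (its definition isn't shown above, so this is a reconstruction) next to a hypothetical double_copy that merely reassigns its local label:

```python
def double_inplace(vec):
    # vec[:] = ... overwrites the contents of the caller's list
    vec[:] = [element * 2 for element in vec]

def double_copy(vec):
    # plain assignment only moves the local label; the caller's list is untouched
    vec = [element * 2 for element in vec]

z = list(range(4))
double_inplace(z)
print(z)   # [0, 2, 4, 6]

w = list(range(4))
double_copy(w)
print(w)   # [0, 1, 2, 3]
```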
In [6]: x=5
x=7
x=['a','b','c']
y=x
In [7]: x
In [9]: y
In [11]: x=range(3)
extend(6,x,'a')
print x
In [12]: z=range(9)
extend(6,z,'a')
print z
[0, 1, 2, 3, 4, 5, 6, 7, 8]
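The definition of extend isn't shown above. A hypothetical reconstruction that pads a list in place to a given length, returning early when the list is already long enough (which matches the unchanged z above):

```python
def extend(to, vec, pad):
    # hypothetical reconstruction: pad vec in place until it has `to` elements
    if len(vec) >= to:
        return   # already long enough: leave it alone
    vec[:] = vec + [pad] * (to - len(vec))

x = list(range(3))
extend(6, x, "a")
print(x)   # [0, 1, 2, 'a', 'a', 'a']

z = list(range(9))
extend(6, z, "a")
print(z)   # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```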
print arrow(1, 3)
1 -> 3
In [14]: x=[1,-1]
print arrow(*x)
1 -> -1
electron -> -1
proton -> 1
neutron -> 0
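A sketch of arrow and of the * operator, which unpacks a sequence into separate arguments. The body of arrow and the charges dictionary are assumptions consistent with the output above; dictionary order differs between Python versions, so the lines may print in another order:

```python
def arrow(before, after):
    # hypothetical body, consistent with the "1 -> 3" output above
    return str(before) + " -> " + str(after)

print(arrow(1, 3))   # 1 -> 3
x = [1, -1]
print(arrow(*x))     # 1 -> -1: *x passes the elements as separate arguments

charges = {"neutron": 0, "proton": 1, "electron": -1}   # hypothetical data
for particle in charges.items():
    print(arrow(*particle))   # each (name, charge) pair becomes two arguments
```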
print doubler(1,2,3)
[2, 4, 6]
arrowify(neutron="n",proton="p",electron="e")
electron -> e
proton -> p
neutron -> n
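In a definition, * and ** work the other way round: they gather extra positional arguments into a tuple and extra keyword arguments into a dictionary. A sketch; these bodies are reconstructions, and arrowify returns its lines rather than printing them so they are easy to inspect:

```python
def doubler(*sequence):
    # *sequence gathers any number of positional arguments into a tuple
    return [x * 2 for x in sequence]

def arrowify(**args):
    # **args gathers keyword arguments into a dictionary
    return [key + " -> " + value for key, value in args.items()]

print(doubler(1, 2, 3))   # [2, 4, 6]
for line in arrowify(neutron="n", proton="p", electron="e"):
    print(line)           # order follows the dictionary, which varies by version
```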
Chapter 12
Using Libraries
12.1 Import
To use a function or type from a python library, rather than a built-in function or type, we have to import
the library.
In [1]: math.sin(1.6)
---------------------------------------------------------------------------
<ipython-input-1-ecc9cee3d19a> in <module>()
----> 1 math.sin(1.6)
NameError: name 'math' is not defined
In [2]: import math
In [3]: math.sin(1.6)
Out[3]: 0.9995736030415051
In [4]: type(math)
Out[4]: module
The tools supplied by a module are attributes of the module, and as such, are accessed with a dot.
In [5]: dir(math)
Out[5]: ['__doc__',
'__file__',
'__name__',
'__package__',
'acos',
'acosh',
'asin',
'asinh',
'atan',
'atan2',
'atanh',
'ceil',
'copysign',
'cos',
'cosh',
'degrees',
'e',
'erf',
'erfc',
'exp',
'expm1',
'fabs',
'factorial',
'floor',
'fmod',
'frexp',
'fsum',
'gamma',
'hypot',
'isinf',
'isnan',
'ldexp',
'lgamma',
'log',
'log10',
'log1p',
'modf',
'pi',
'pow',
'radians',
'sin',
'sinh',
'sqrt',
'tan',
'tanh',
'trunc']
In [6]: math.pi
Out[6]: 3.141592653589793
You can always find out where on your storage medium a library has been imported from:
/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/math.so
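A sketch of the common import forms and of finding where a module lives through its __file__ attribute. os is used for the location because, depending on how Python was built, math may be compiled into the interpreter and then has no __file__; the printed path will differ on your machine:

```python
import math
from math import sqrt, pi   # bring selected names directly into scope
import math as m            # or give the module a shorter label
import os

print(math.sin(1.6))   # 0.9995736030415051
print(sqrt(16))        # 4.0
print(m.pi == pi)      # True: both labels refer to the same value
print(os.__file__)     # location of the os module on this machine
```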
Note that import does not install libraries from PyPI. It just makes them available to your current
notebook session, assuming they are already installed. Installing libraries is harder, and we’ll cover it later.
So what libraries are available? Until you install more, you might have just the modules that come with
Python: the standard library.
Supplementary Materials: Review the list of standard library modules:
https://docs.python.org/2/library/
If you installed via Anaconda, then you also have access to a bunch of modules that are commonly used
in research.
Supplementary Materials: Review the list of modules that are packaged with Anaconda by default:
http://docs.continuum.io/anaconda/pkg-docs.html (The green ticks)
We’ll see later how to add more libraries to our setup.
Chapter 13
Just as with other Python types, you use the name of the type as a function to make a variable of that
type:
<type 'int'>
Living
The most common use of a class is to allow us to group data into an object in a way that is easier to
read and understand than organising data into lists and dictionaries.
In [6]: myroom.capacity = 3
myroom.occupants = ["James", "Sue"]
13.2 Methods
So far, our class doesn’t do much!
We define functions inside the definition of a class, in order to give them capabilities, just like the
methods on built-in types.
In [7]: class Room(object):
    def overfull(self):
        return len(self.occupants) > self.capacity
In [9]: myroom.overfull()
Out[9]: False
In [10]: myroom.occupants.append('Clare')
In [11]: myroom.occupants.append('Bob')
In [12]: myroom.overfull()
Out[12]: True
When we write methods, we always write the first function argument as self, to refer to the object
instance itself, the argument that goes “before the dot”.
13.3 Constructors
Normally, though, we don’t want to add data to an object’s attributes on the fly like that. Instead, we define
a constructor that converts input data into an object.
In [15]: living.capacity
Out[15]: 3
Methods which begin and end with two underscores in their names fulfil special capabilities in Python,
such as constructors.
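A sketch of a constructor for the Room class. The __init__ signature (name, capacity) is an assumption chosen to match living.capacity being 3 above, and occupants is kept as a simple list here:

```python
class Room(object):
    def __init__(self, name, capacity):
        # the constructor turns its inputs into attributes of the new object
        self.name = name
        self.capacity = capacity
        self.occupants = []

    def overfull(self):
        return len(self.occupants) > self.capacity

living = Room("livingroom", 3)
print(living.capacity)     # 3
living.occupants.extend(["James", "Sue", "Clare", "Bob"])
print(living.overfull())   # True
```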
For example, the below program might describe our “Maze of Rooms” system:
We define a “Maze” class which can hold rooms:
In [16]: class Maze(object):
    def __init__(self, name):
        self.name = name
        self.rooms = {}
    def occupants(self):
        return [occupant for room in self.rooms.values()
                for occupant in room.occupants.values()]
    def wander(self):
        "Move all the people in a random direction"
        for occupant in self.occupants():
            occupant.wander()
    def describe(self):
        for room in self.rooms.values():
            room.describe()
    def step(self):
        self.describe()
        print
        self.wander()
        print
    def has_space(self):
        return len(self.occupants) < self.capacity
    def available_exits(self):
        return [exit for exit, target in self.exits.items()
                if self.maze.rooms[target].has_space() ]
    def random_valid_exit(self):
        import random
        if not self.available_exits():
            return None
        return random.choice(self.available_exits())
    def destination(self, exit):
        return self.maze.rooms[ self.exits[exit] ]
    def describe(self):
        if self.occupants:
            print self.name, ": ", " ".join(self.occupants.keys())
    def wander(self):
        exit = self.room.random_valid_exit()
        if exit:
            self.use(exit)
And we use these classes to define our people, rooms, and their relationships:
In [19]: james=Person('James')
sue=Person('Sue')
bob=Person('Bob')
clare=Person('Clare')
In [23]: living.add_occupant(james)
In [24]: garden.add_occupant(sue)
garden.add_occupant(clare)
In [25]: bedroom.add_occupant(bob)
In [26]: house.simulate(3)
bedroom : Bob
livingroom : James
garden : Clare Sue
bedroom : James
livingroom : Clare Sue
garden : Bob
bedroom : Sue
livingroom : Bob
garden : James
kitchen : Clare
    def wander(self):
        "Move all the people in a random direction"
        for occupant in self.occupants:
            occupant.wander()
    def describe(self):
        for occupant in self.occupants:
            occupant.describe()
    def step(self):
        self.describe()
        print
        self.wander()
        print
    def has_space(self):
        return self.occupancy < self.capacity
    def available_exits(self):
        return [exit for exit in self.exits if exit.valid() ]
    def random_valid_exit(self):
        import random
        if not self.available_exits():
            return None
        return random.choice(self.available_exits())
    def wander(self):
        exit = self.room.random_valid_exit()
        if exit:
            self.use(exit)
    def describe(self):
        print self.name, "is in the", self.room.name
    def valid(self):
        return self.target.has_space()
In [32]: living=house.add_room('livingroom', 2)
bed = house.add_room('bedroom', 1)
garden = house.add_room('garden', 3)
kitchen = house.add_room('kitchen', 1)
In [38]: house.simulate(3)
James is in the garden
Sue is in the kitchen
Bob is in the livingroom
Clare is in the garden
This is a huge topic, about which many books have been written. The differences between these two
designs are important, and will have long-term consequences for the project. That is how we start to
think about software engineering, as opposed to learning to program, and is an important part of this
course.
Chapter 14
We will often want to save our Python classes, for use in multiple Notebooks. We can do this by writing
text files with a .py extension, and then importing them.
In [1]: import os
if 'mazetool' not in os.listdir(os.getcwd()):
    os.mkdir('mazetool')
class Maze(object):
    def __init__(self, name):
        self.name = name
        self.rooms = []
        self.occupants = []
    def wander(self):
        "Move all the people in a random direction"
        for occupant in self.occupants:
            occupant.wander()
    def describe(self):
        for occupant in self.occupants:
            occupant.describe()
    def step(self):
        self.describe()
        print
        self.wander()
        print
Writing mazetool/maze.py
class Room(object):
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.occupancy = 0
        self.exits = []
    def has_space(self):
        return self.occupancy < self.capacity
    def available_exits(self):
        return [exit for exit in self.exits if exit.valid() ]
    def random_valid_exit(self):
        import random
        if not self.available_exits():
            return None
        return random.choice(self.available_exits())
Writing mazetool/room.py
class Person(object):
    def __init__(self, name, room = None):
        self.name=name
        self.room=room
    def wander(self):
        exit = self.room.random_valid_exit()
        if exit:
            self.use(exit)
    def describe(self):
        print self.name, "is in the", self.room.name
Writing mazetool/person.py
In [5]: %%writefile mazetool/exit.py
class Exit(object):
    def __init__(self, name, target):
        self.name = name
        self.target = target
    def valid(self):
        return self.target.has_space()
Writing mazetool/exit.py
In order to tell Python that our “mazetool” folder is a Python package, we have to make a special file
called __init__.py. If you import things in there, they are imported as part of the package:
In [6]: %%writefile mazetool/__init__.py
from maze import Maze
Writing mazetool/__init__.py
---------------------------------------------------------------------------
<ipython-input-7-fa208048e024> in <module>()
----> 1 myhouse=Maze()
NameError: name 'Maze' is not defined
But now we can import Maze (and the other files will get imported via the chained import statements,
starting from the __init__.py file).
Note the files we have created are on the disk in the folder we made:
In [10]: import os
In [11]: os.listdir(os.path.join(os.getcwd(),'mazetool') )
.pyc files are “Compiled” temporary python files that the system generates to speed things up. They’ll
be regenerated on the fly when your .py files change.
/Library/Python/2.7/site-packages
/usr/local/lib/python2.7/site-packages/IPython/extensions
/Users/ccsprsd/.ipython
In [13]: sys.path.append(os.path.join('/home/jamespjh/devel/libraries/python'))
/home/jamespjh/devel/libraries/python
I’ve thus added a folder to the list of places searched. If you want to do this permanently, you should set
the PYTHONPATH Environment Variable, which you can learn about in a shell course, or can read about
online for your operating system.
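A self-contained sketch of the whole process: write a tiny package to a temporary folder, add that folder to sys.path, and import from it. The package name demo_mazetool is hypothetical, and its __init__.py uses the explicit relative import from .maze, which Python 3 requires (the implicit from maze import Maze above only works in Python 2):

```python
import os
import sys
import tempfile

# Build a tiny package in a temporary folder (demo_mazetool is a hypothetical name)
root = tempfile.mkdtemp()
package_dir = os.path.join(root, "demo_mazetool")
os.mkdir(package_dir)

with open(os.path.join(package_dir, "maze.py"), "w") as module_file:
    module_file.write("class Maze(object):\n"
                      "    def __init__(self, name):\n"
                      "        self.name = name\n")

with open(os.path.join(package_dir, "__init__.py"), "w") as init_file:
    # explicit relative import: valid in Python 2.5+ and required in Python 3
    init_file.write("from .maze import Maze\n")

sys.path.append(root)            # add our folder to the places Python searches
from demo_mazetool import Maze   # runs __init__.py, which imports maze.py

house = Maze("My House")
print(house.name)   # My House
```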
Chapter 15
So let us put it all back together, not forgetting ultimately what it is for.
Let it give us one more final pleasure; drink it and forget it all!
- Richard Feynman
Writing mydata.txt
Where did that go? It went to the current folder, which for a notebook, by default, is where the notebook
is on disk.
In [2]: import os # The 'os' module gives us all the tools we need to search in the file system
os.getcwd() # Use the 'getcwd' function from the 'os' module to find where we are on disk.
Out[2]: '/Users/ccsprsd/jenkins/development/workspace/engineering-publisher/ch01data'
Out[3]: ['mydata.txt']
Yep! Note how we used a list comprehension to filter all the extraneous files.
In [4]: os.path.dirname(os.getcwd())
Out[4]: '/Users/ccsprsd/jenkins/development/workspace/engineering-publisher'
In [5]: "/".join(os.getcwd().split("/")[:-1])
Out[5]: '/Users/ccsprsd/jenkins/development/workspace/engineering-publisher'
But this would not work on Windows, where path elements are separated with a \ instead of a /. So it’s
important to use os.path for this stuff.
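A sketch of the portable os.path equivalents; the file name is illustrative and need not exist:

```python
import os

here = os.getcwd()
parent = os.path.dirname(here)                 # portable "one folder up"
data_file = os.path.join(here, "mydata.txt")   # uses the right separator for this OS
print(os.path.basename(data_file))             # mydata.txt
print(os.path.dirname(data_file) == here)      # True
print(os.sep)                                  # / on Linux and macOS, \ on Windows
```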
Supplementary Materials: If you’re not already comfortable with how files fit into folders, and folders
form a tree, with folders containing subfolders, then look at http://swcarpentry.github.io/shell-novice/01-filedir.html.
Satisfy yourself that after using %%writefile, you can then find the file on disk with Windows Explorer,
OSX Finder, or the Linux Shell.
We can see how in Python we can investigate the file system with functions in the os module, using just
the same programming approaches as for anything else.
We’ll gradually learn more features of the os module as we go, allowing us to move around the disk, walk
around the disk looking for relevant files, and so on. These will be important to master for automating our
data analyses.
In [6]: myfile=open('mydata.txt')
In [7]: type(myfile)
Out[7]: file
We can go line-by-line, by treating the file as an iterable:
Out[8]: ["A poet once said, ’The whole universe is in a glass of wine.’\n",
’We will probably never know in what sense he meant it, \n’,
’for poets do not write to be understood. \n’,
’But it is true that if we look at a glass of wine closely enough we see the entire universe. \
’There are the things of physics: the twisting liquid which evaporates depending\n’,
’on the wind and weather, the reflection in the glass;\n’,
’and our imagination adds atoms.\n’,
"The glass is a distillation of the earth’s rocks,\n",
"and in its composition we see the secrets of the universe’s age, and the evolution of stars. \
’What strange array of chemicals are in the wine? How did they come to be? \n’,
’There are the ferments, the enzymes, the substrates, and the products.\n’,
’There in wine is found the great generalization; all life is fermentation.\n’,
’Nobody can discover the chemistry of wine without discovering, \n’,
’as did Louis Pasteur, the cause of much disease.\n’,
’How vivid is the claret, pressing its existence into the consciousness that watches it!\n’,
’If our small minds, for some convenience, divide this glass of wine, this universe, \n’,
’into parts -- \n’,
’physics, biology, geology, astronomy, psychology, and so on -- \n’,
’remember that nature does not know it!\n’,
’\n’,
’So let us put it all back together, not forgetting ultimately what it is for.\n’,
’Let it give us one more final pleasure; drink it and forget it all!\n’,
’ - Richard Feynman’]
If we do that again, we get nothing: the file has already been read to the end, so there is no more data.
Out[9]: []
In [10]: myfile.seek(0)
[len(x) for x in myfile if 'ut' in x]
It’s really important to remember that a file is a different built-in type from a string.
In [11]: myfile.seek(0)
first = myfile.readline()
In [12]: first
Out[12]: "A poet once said, ’The whole universe is in a glass of wine.’\n"
In [13]: second=myfile.readline()
In [14]: second
Out[14]: ’We will probably never know in what sense he meant it, \n’
We can read the whole remaining file with read:
In [15]: rest=myfile.read()
In [16]: rest
Out[16]: "for poets do not write to be understood. \nBut it is true that if we look at a glass of wine c
This means that when a file is first opened, read is a handy way to get the whole thing as a string:
In [17]: open('mydata.txt').read()
Out[17]: "A poet once said, ’The whole universe is in a glass of wine.’\nWe will probably never know in
You can also read just a few characters:
In [18]: myfile.seek(1335)
In [19]: myfile.read(15)
Out[19]: ’\n - Richard F’
---------------------------------------------------------------------------
<ipython-input-22-8c85154fa12a> in <module>()
----> 1 mystring.readline()
AttributeError: 'str' object has no attribute 'readline'
This is important, because some file format parsers expect input from a file and not a string. We can
convert between them using the StringIO module in the standard library:
In [23]: from StringIO import StringIO
In [24]: mystringasafile=StringIO(mystring)
In [25]: mystringasafile.readline()
Out[25]: ’Hello World\n’
In [26]: mystringasafile.readline()
Out[26]: ’ My name is James’
Note that in a string, \n is used to represent a newline.
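In Python 3, the StringIO module has moved: the same class lives in the standard io module. A sketch using the string from the example above (assumed to be "Hello World\n My name is James", as the two readline outputs suggest):

```python
from io import StringIO   # Python 3 location; Python 2's StringIO module is used above

mystring = "Hello World\n My name is James"
mystringasafile = StringIO(mystring)
print(repr(mystringasafile.readline()))   # 'Hello World\n'
print(repr(mystringasafile.readline()))   # ' My name is James'
```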
15.7 Closing files
We really ought to close files when we’ve finished with them, as each open file uses up system resources. (On
a shared computer, this is particularly important.)
In [27]: myfile.close()
Because it’s so easy to forget this, python provides a context manager to open a file, then close it
automatically at the end of an indented block:
print content
So let us put it all back together, not forgetting ultimately what it is for.
Let it give us one more final pleasure; drink it and forget it all!
- Richard Feynman
The code to be done while the file is open is indented, just like for an if statement.
You should pretty much always use this syntax for working with files.
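A sketch of writing, appending to, and reading back a file with with blocks. The temporary location is an assumption so the example doesn't clutter the current folder; the resulting content resembles the HelloWorldHelloJames output below, though the original cells aren't shown:

```python
import os
import tempfile

# Hypothetical location: a fresh temporary folder, so nothing else is overwritten
path = os.path.join(tempfile.mkdtemp(), "mywrittenfile")

with open(path, "w") as target:   # "w" creates the file, or empties an existing one
    target.write("Hello")
    target.write("World")

with open(path, "a") as target:   # "a" appends to the end instead
    target.write("HelloJames")

with open(path) as source:        # the file is closed when the block ends
    content = source.read()

print(content)         # HelloWorldHelloJames
print(source.closed)   # True
```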
In [31]: with open('mywrittenfile','r') as source:
    print source.read()
HelloWorld
HelloWorldHelloJames
Chapter 16
We’ve seen how to obtain data from our local file system.
The other common place today that we might want to obtain data is from the internet.
It’s very common today to treat the web as a source and store of information; we need to be able to
programmatically download data, and place it in python objects.
We may also want to be able to programmatically upload data, for example, to automatically fill in forms.
This can be really powerful if we want to, for example, do automated meta-analysis across a selection of
research papers.
16.1 URLs
All internet resources are identified by a Uniform Resource Locator (URL).
In [1]: "http://maps.googleapis.com:80/maps/api/staticmap?size=400x400&center=51.51,-0.1275&zoom=12"
Out[1]: 'http://maps.googleapis.com:80/maps/api/staticmap?size=400x400&center=51.51,-0.1275&zoom=12'
Supplementary materials: The structure can actually differ between protocols, so the above is a
simplification. You can see more, for example, at https://en.wikipedia.org/wiki/URI_scheme
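The standard library can split a URL into the parts described above. Here is a minimal sketch, assuming Python 3, where this lives in urllib.parse. It uses the map URL from In [1].

```python
from urllib.parse import urlparse, parse_qs

url = ("http://maps.googleapis.com:80/maps/api/staticmap"
       "?size=400x400&center=51.51,-0.1275&zoom=12")
parts = urlparse(url)

print(parts.scheme)    # http
print(parts.hostname)  # maps.googleapis.com
print(parts.port)      # 80
print(parts.path)      # /maps/api/staticmap
# The query string becomes a dictionary of lists, since a key may repeat.
print(parse_qs(parts.query))
```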
URLs are not allowed to include all characters; we need to, for example, “escape” a space that appears
inside the URL, replacing it with %20, so e.g. a request of http://some example.com/ would need to be
http://some%20example.com/
Supplementary materials: The code used to replace each character is its ASCII code, written in hexadecimal after the % sign (a space is ASCII 32, which is 20 in hexadecimal).
Supplementary materials: The escaping rules are quite subtle. See
https://en.wikipedia.org/wiki/Percent-encoding
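Python can do this escaping for us. Here is a minimal sketch using urllib.parse.quote and unquote, assuming Python 3. By default, quote leaves "/" alone and escapes most other punctuation, including spaces and commas.

```python
from urllib.parse import quote, unquote

print(quote("some example"))      # some%20example
print(hex(ord(" ")))              # 0x20: the ASCII code of a space, in hexadecimal
print(quote("51.51,-0.1275"))     # 51.51%2C-0.1275: a comma is escaped too
print(unquote("some%20example"))  # some example
```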
16.2 Requests
The Python requests library can help us manage and manipulate URLs. It is easier to use than the ‘urllib’
library that is part of the standard library, and is included with Anaconda and Canopy. It sorts out escaping,
parameter encoding, and so on for us.
To request the above URL, for example, we write:
In [2]: import requests
In [4]: response.url
Out[4]: u'http://maps.googleapis.com/maps/api/staticmap?zoom=12&center=51.51%2C-0.1275&size=400x400'
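The query string in Out[4] comes from turning a dictionary of parameters into `key=value` pairs. Requests does this for us. As a sketch of what that encoding involves, the standard library's urlencode (Python 3) produces the same query string from the same parameters. The parameter names are taken from the URL above.

```python
from urllib.parse import urlencode

params = {"zoom": 12, "center": "51.51,-0.1275", "size": "400x400"}
query = urlencode(params)  # keys and values are escaped, then joined with "&"
print(query)  # zoom=12&center=51.51%2C-0.1275&size=400x400

url = "http://maps.googleapis.com/maps/api/staticmap?" + query
```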
When we do a request, the raw result comes back as a sequence of bytes (response.content). For the PNG image requested above, this isn’t very
readable:
In [5]: response.content[0:10]
Out[5]: ’\x89PNG\r\n\x1a\n\x00\x00’
Just as for file access, therefore, we will need to send the text we get to a python module which understands
that file format.
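Every PNG file starts with the same fixed 8-byte signature, which is why Out[5] begins with \x89PNG. Here is a minimal sketch of checking the bytes before handing them to an image library, assuming Python 3 bytes literals. The helper name looks_like_png is hypothetical.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def looks_like_png(data):
    """Hypothetical helper: True if the raw bytes start with the PNG signature."""
    return data[:8] == PNG_SIGNATURE

# The first ten bytes shown in Out[5], written as a bytes literal.
first_bytes = b"\x89PNG\r\n\x1a\n\x00\x00"
print(looks_like_png(first_bytes))   # True
print(looks_like_png(b"<html>..."))  # False
```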
Again, it is important to separate the transport model (e.g. a file system, or an “http request” for the
web) from the data model of the data that is returned.
In [6]: spots=requests.get('http://www.sidc.be/silso/INFO/snmtotcsv.php').text
In [7]: spots[0:80]
This looks like semicolon-separated data, with different records on different lines. (Line separators come
out as \n)
There are many scientific datasets which can now be downloaded like this; integrating the download
into your data pipeline can help to keep your data flows organised.
In [8]: lines=spots.split("\n")
lines[0:5]
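The cell that builds years from lines is not shown. Here is a minimal sketch of the naive approach, assuming the year is the first semicolon-separated field on each line. The two sample lines are made up for illustration and only mimic the column layout.

```python
# Made-up sample text in "year;month;..." form (not real sunspot data).
spots = "2000;01;2000.042;100.0;1.0;10;1\n2000;02;2000.125;110.5;1.0;10;1\n"

lines = spots.split("\n")
# Skip empty lines (e.g. after the final "\n") and take the first field of each record.
years = [line.split(";")[0] for line in lines if line]
print(years)  # ['2000', '2000']
```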
In [10]: years[0:15]
Out[10]: [u'1749', u'1749', u'1749', u'1749', u'1749', u'1749', u'1749', u'1749', u'1749', u'1749', u'1749', u'1749', u'1750', u'1750', u'1750']
But don’t do this yourself! What if, for example, one of the records contains a separator inside it? Most software will
put such content in quotes, so that, for example,
"Something; something"; something; something
contains three fields, but the naive code above would give four fields, of which the first is
"Something
You’ll never manage to get all that right yourself, so you’ll be better off using a library to do it.
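The standard-library csv module already handles quoted separators. Here is a minimal sketch, assuming Python 3, using the example record above. The skipinitialspace option drops the space after each ";".

```python
import csv
import io

line = '"Something; something"; something; something'

naive = line.split(";")
print(len(naive), naive[0])  # 4 "Something

reader = csv.reader(io.StringIO(line), delimiter=";", skipinitialspace=True)
fields = next(reader)
print(fields)  # ['Something; something', 'something', 'something']
```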
Chapter 17
17.3 Python CSV readers
The Python standard library has a csv module. However, it’s less powerful than the CSV capabilities in
NumPy, the main scientific Python library for handling data. NumPy is distributed with Anaconda and
Canopy, so we recommend you just use that.
NumPy has powerful capabilities for handling matrices, and other fun stuff, and we’ll learn about these
later in the course, but for now, we’ll just use NumPy’s CSV reader, and assume it makes us lists and
dictionaries, rather than its more exciting array type.
genfromtxt is a powerful CSV reader. I used the delimiter optional argument to specify the delimiter.
I could also specify names=True if I had a first line naming fields, and comments='#' if I had comment lines ('#' is in fact the default).
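The genfromtxt call itself is not shown here. Here is a minimal sketch, assuming Python 3 and a reasonably recent NumPy that accepts text streams. The two records and their header line are made up for illustration.

```python
import io
import numpy as np

# Made-up records with a header line naming the fields.
text = "year;month;mean\n2000;1;100.0\n2000;2;110.5\n"

# Without names: a plain 2-D float array, skipping the header line.
plain = np.genfromtxt(io.StringIO(text), delimiter=";", skip_header=1)
print(plain.shape)  # (2, 3)

# With names=True: the first line names the fields, which we can then use like dictionary keys.
named = np.genfromtxt(io.StringIO(text), delimiter=";", names=True)
print(named["mean"])
```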
In [4]: sunspots
The plot command accepted an array of ‘X’ values and an array of ‘Y’ values. We used a special NumPy
“:” syntax, which we’ll learn more about later.
CSV
Filename: SN_m_tot_V2.0.csv
Format: Comma Separated values (adapted for import in spreadsheets). The separator is the semicolon ‘;’.
Contents:
• Column 1-2: Gregorian calendar date - Year - Month
• Column 3: Date in fraction of year.
• Column 4: Monthly mean total sunspot number.
• Column 5: Monthly mean standard deviation of the input sunspot numbers.
• Column 6: Number of observations used to compute the monthly mean total sunspot number.
• Column 7: Definitive/provisional marker. ‘1’ indicates that the value is definitive. ‘0’ indicates that the value is still provisional.
In [7]: sunspots
(2015.0, 9.0, 2015.707, 78.1, 6.6, 804.0, 0.0),
(2015.0, 10.0, 2015.79, 61.7, 5.3, 605.0, 0.0)],
dtype=[(’year’, ’<f8’), (’month’, ’<f8’), (’date’, ’<f8’), (’mean’, ’<f8’), (’deviation’,
Chapter 18
Structured Data
18.2 Json
A very common structured data format is JSON.
This allows us to represent data made up of combinations of lists and dictionaries as a text file which looks
a bit like a JavaScript (or Python) data literal.
In [1]: import json
Any nested group of dictionaries and lists can be saved:
In [2]: mydata = {'key': ['value1', 'value2'], 'key2': {'key4':'value3'}}
In [3]: json.dumps(mydata)
Out[3]: '{"key2": {"key4": "value3"}, "key": ["value1", "value2"]}'
Loading data is also really easy:
In [4]: %%writefile myfile.json
{
"somekey": ["a list", "with values"]
}
Writing myfile.json
In [5]: mydata=json.load(open('myfile.json'))
In [6]: mydata
Out[6]: {u'somekey': [u'a list', u'with values']}
In [7]: mydata['somekey']
Out[7]: [u'a list', u'with values']
This is a very nice solution for loading and saving python datastructures.
It’s a very common way of transferring data on the internet, and of saving datasets to disk.
There’s good support in most languages, so it’s a nice inter-language file interchange format.
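Here is a minimal sketch of a full round trip, assuming Python 3. The data goes to a string with dumps/loads and to a file with dump/load. The temporary file name is arbitrary.

```python
import json
import os
import tempfile

mydata = {"key": ["value1", "value2"], "key2": {"key4": "value3"}}

# To and from a string.
text = json.dumps(mydata)
assert json.loads(text) == mydata

# To and from a file; each "with" block closes its file afterwards.
path = os.path.join(tempfile.mkdtemp(), "mydata.json")
with open(path, "w") as target:
    json.dump(mydata, target)
with open(path) as source:
    loaded = json.load(source)
print(loaded == mydata)  # True
```

In Python 3, the strings come back as ordinary str objects, so no ‘u’ prefix appears.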
18.3 Unicode
Supplementary Material: Why do the strings come back with ‘u’ everywhere? These are Unicode
strings, designed to hold all the world’s characters.
18.4 Yaml
Yaml is a very similar dataformat to Json, with some nice additions:
• You don’t need to quote strings if they don’t have funny characters in them
• You can have comment lines, beginning with a #
• You can write dictionaries without the curly brackets: it just notices the colons.
• You can write lists like this:
Writing myfile.yaml
In [10]: yaml.safe_load(open('myfile.yaml'))
Yaml is my favourite format for ad-hoc datafiles, but the library doesn’t ship with default Python
(though it is part of Anaconda and Canopy), so some people still prefer Json for its universality.
Because Yaml gives the option of serialising a list either as newlines with dashes, or with square brackets,
you can control this choice:
In [11]: yaml.dump(mydata)
18.5 XML
Supplementary material: XML is another popular choice when saving nested data structures. It’s very
careful, but verbose. If your field uses XML data, you’ll need to learn a Python XML parser (there are a
few), and about how XML works.
18.6 Exercise:
Use YAML or JSON to save your maze datastructure to disk and load it again.
18.7 Solution: Saving and Loading a Maze
In [1]: house = {
    'living' : {
        'exits': {
            'north' : 'kitchen',
            'outside' : 'garden',
            'upstairs' : 'bedroom'
        },
        'people' : ['James'],
        'capacity' : 2
    },
    'kitchen' : {
        'exits': {
            'south' : 'living'
        },
        'people' : [],
        'capacity' : 1
    },
    'garden' : {
        'exits': {
            'inside' : 'living'
        },
        'people' : ['Sue'],
        'capacity' : 3
    },
    'bedroom' : {
        'exits': {
            'downstairs' : 'living',
            'jump' : 'garden'
        },
        'people' : [],
        'capacity' : 1
    }
}
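The cells that save and reload the maze are not shown. Here is a minimal sketch of one way to do it with the json module, assuming Python 3. It uses a cut-down, two-room version of the house to keep it short. The file name maze.json matches the one used below, and the variable names json_maze_out and json_maze_in are illustrative.

```python
import json

# A cut-down version of the house above (two rooms only).
house = {
    "living": {"exits": {"north": "kitchen"}, "people": ["James"], "capacity": 2},
    "kitchen": {"exits": {"south": "living"}, "people": [], "capacity": 1},
}

with open("maze.json", "w") as json_maze_out:
    json.dump(house, json_maze_out)

with open("maze.json") as json_maze_in:
    maze_again = json.load(json_maze_in)

print(maze_again == house)  # True
```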
In [4]: %%bash
cat 'maze.json'
{"living": {"capacity": 2, "exits": {"outside": "garden", "north": "kitchen", "upstairs": "bedroom"}, "p
In [6]: maze_again
Out[6]: {u'bedroom': {u'capacity': 1,
  u'exits': {u'downstairs': u'living', u'jump': u'garden'},
  u'people': []},
 u'garden': {u'capacity': 3,
  u'exits': {u'inside': u'living'},
  u'people': [u'Sue']},
 u'kitchen': {u'capacity': 1, u'exits': {u'south': u'living'}, u'people': []},
 u'living': {u'capacity': 2,
  u'exits': {u'north': u'kitchen',
   u'outside': u'garden',
   u'upstairs': u'bedroom'},
  u'people': [u'James']}}
Or with YAML:
In [9]: %%bash
cat 'maze.yaml'
bedroom:
capacity: 1
exits: {downstairs: living, jump: garden}
people: []
garden:
capacity: 3
exits: {inside: living}
people: [Sue]
kitchen:
capacity: 1
exits: {south: living}
people: []
living:
capacity: 2
exits: {north: kitchen, outside: garden, upstairs: bedroom}
people: [James]
In [11]: maze_again
Chapter 19
In [2]: quakes.text[0:50]
Out[2]: u’{"type":"FeatureCollection","metadata":{"generated’
Your exercise: determine the location of the largest magnitude earthquake in the UK this century.
You’ll need to:
• Get the text of the web result
• Parse the data as JSON
• Understand how the data is structured into dictionaries and lists
• Where is the magnitude?
• Where is the place description or coordinates?
• Program a search through all the quakes to find the biggest quake
• Find the place of the biggest quake
• Form a URL for Google Maps at that latitude and longitude: look back at the introductory example
• Display that image
"maxlongitude":"1.67",
"minlongitude":"-9.756",
"minmagnitude":"1",
"endtime":"2015-07-13",
"orderby":"time-asc"}
)
Out[4]: dict
In [5]: requests_json.keys()
In [6]: len(requests_json['features'])
Out[6]: 110
In [7]: requests_json['features'][0].keys()
In [8]: requests_json['features'][0]['properties']['mag']
Out[8]: 2.6
In [9]: requests_json['features'][0]['geometry']
4.8
In [12]: lat=largest_so_far['geometry']['coordinates'][1]
long=largest_so_far['geometry']['coordinates'][0]
print "Latitude:", lat, "Longitude:", long
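The search loop that produces largest_so_far is not shown. Here is a minimal sketch, assuming Python 3 and the nesting used above: properties, then mag; and geometry, then coordinates as [longitude, latitude, depth]. The three quakes are made-up stand-ins for requests_json['features'].

```python
# Made-up stand-in for requests_json['features'], with the same nesting.
features = [
    {"properties": {"mag": 2.6}, "geometry": {"coordinates": [-2.0, 53.0, 10.0]}},
    {"properties": {"mag": 4.0}, "geometry": {"coordinates": [-3.0, 52.0, 5.0]}},
    {"properties": {"mag": 3.1}, "geometry": {"coordinates": [-1.0, 51.0, 7.0]}},
]

largest_so_far = features[0]
for quake in features:
    if quake["properties"]["mag"] > largest_so_far["properties"]["mag"]:
        largest_so_far = quake

# The same search in one line, using the built-in max with a key function.
assert max(features, key=lambda q: q["properties"]["mag"]) is largest_so_far

lat = largest_so_far["geometry"]["coordinates"][1]
lon = largest_so_far["geometry"]["coordinates"][0]
print(largest_so_far["properties"]["mag"], lat, lon)  # 4.0 52.0 -3.0
```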
19.6 Get a map at the point of the quake
In [13]: import requests
def request_map_at(lat,long, satellite=False,zoom=12,size=(400,400),sensor=False):
    base="http://maps.googleapis.com/maps/api/staticmap?"
    params=dict(
        sensor= str(sensor).lower(),
        zoom= zoom,
        size= "x".join(map(str,size)),
        center= ",".join(map(str,(lat,long)))
    )
    if satellite:
        params["maptype"]="satellite"
    return requests.get(base,params=params)
Out[15]:
Chapter 20
We tell the IPython notebook to show figures we generate alongside the code that created them, rather than
in a separate window. Lines beginning with a single percent are not Python code: they control how the
notebook deals with Python code.
Lines beginning with two percents are “cell magics”, that tell IPython notebook how to interpret the
particular cell; we’ve seen %%writefile, for example.
The plot command returns a value, just like any function (here, a list of the lines it drew). The notebook then displays
the figure.
To add a title, axis labels etc, we need to get that figure object, and manipulate it. For convenience,
matplotlib allows us to do this just by issuing commands to change the “current figure”:
But this requires us to keep all our commands together in a single cell, and makes use of a “global” single
“current plot”, which, while convenient for quick exploratory sketches, is a bit cumbersome. To produce
proper plots from our notebook for use in papers, Python’s plotting library, matplotlib, defines some types
we can use to treat individual figures as variables, and manipulate them.
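The steps in the rest of this chapter can be collected into one sketch, assuming Python 3 and matplotlib. The figure is held in a variable, and the drawing, labelling and saving go through the figure and axes objects instead of the "current plot". The Agg backend is selected so the sketch also runs outside the notebook. The variable names follow the chapter.

```python
import math
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draw to files, no window needed
import matplotlib.pyplot as plt

xs = [x / 100.0 for x in range(100)]

sine_graph = plt.figure()                          # a Figure object kept in a variable
sine_graph_axes = sine_graph.add_subplot(1, 1, 1)  # one set of axes on that figure
sine_graph_axes.plot(xs, [math.sin(math.pi * x) for x in xs], label="sin(x)")
sine_graph_axes.plot(xs, [math.cos(math.pi * x) for x in xs], label="cos(x)")
sine_graph_axes.set_xlabel("x")
sine_graph_axes.set_ylabel("f(x)")
sine_graph_axes.legend()                           # uses the label= given to each curve
sine_graph.savefig("my_graph.png")
```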
Once we have some axes, we can plot a graph on them:
In [8]: sine_graph_axes.set_ylabel("f(x)")
Now we need to actually display the figure. As always with the notebook, if the last line of a code cell
evaluates to a variable, that variable gets displayed:
In [10]: sine_graph
Out[10]:
We can add another curve:
In [12]: sine_graph
Out[12]:
A legend will help us distinguish the curves:
In [13]: sine_graph_axes.legend()
In [14]: sine_graph
Out[14]:
20.5 Saving figures.
We must be able to save figures to disk, in order to use them in papers. This is really easy:
In [15]: sine_graph.savefig('my_graph.png')
In order to be able to check that it worked, we need to know how to display an arbitrary image in the
notebook.
The programmatic way is like this:
In [16]: import IPython # Get the notebook’s own library for manipulating itself.
IPython.core.display.Image(open('my_graph.png').read())
Out[16]:
20.6 Subplots
We might have wanted the sin and cos graphs on separate axes:
In [17]: double_graph=plt.figure()
<matplotlib.figure.Figure at 0x1128edf10>
In [18]: sin_axes=double_graph.add_subplot(2,1,1)
In [19]: cos_axes=double_graph.add_subplot(2,1,2)
In [21]: sin_axes.set_ylabel("sin(x)")
In [23]: cos_axes.set_ylabel("cos(x)")
In [24]: cos_axes.set_xlabel("100 x")
In [25]: double_graph
Out[25]:
In [26]: double_graph=plt.figure()
sin_axes=double_graph.add_subplot(2,1,1)
cos_axes=double_graph.add_subplot(2,1,2)
cos_axes.set_ylabel("cos(x)")
sin_axes.set_ylabel("sin(x)")
cos_axes.set_xlabel("x")
In [27]: sin_axes.plot([x/100.0 for x in range(100)], [sin(pi*x/100.0) for x in range(100)])
cos_axes.plot([x/100.0 for x in range(100)], [cos(pi*x/100.0) for x in range(100)])
In [28]: double_graph
Out[28]:
20.8 Learning More
There’s so much more to learn about matplotlib: pie charts, bar charts, heat maps, 3-d plotting, animated
plots, and so on. You can learn all this via the Matplotlib Website. You should try to get comfortable with
all this, so please use some time in class, or at home, to work your way through a bunch of the examples.
Chapter 21
NumPy
By combining a plotting library, a matrix maths library, and an easy-to-use interface allowing live plotting
commands in a persistent environment, the powerful capabilities of MATLAB were matched by a free and
open toolchain.
We’ve learned about Matplotlib and IPython in this course already. NumPy is the last part of the trilogy.
In [2]: x
In [3]: x+5
---------------------------------------------------------------------------
<ipython-input-3-7b83a566c210> in <module>()
----> 1 x+5
---------------------------------------------------------------------------
<ipython-input-9-ad82621ab44a> in <module>()
----> 1 my_array.append(4)
For NumPy arrays, you typically don’t change the data size once you’ve defined your array, whereas for
Python lists, you can do this efficiently. However, you get back lots of goodies in return. . .
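Here is a minimal sketch of the difference, assuming Python 3 and NumPy. A list grows in place. An array has no append method, and np.append instead builds a new, larger array.

```python
import numpy as np

my_list = [0, 1, 2, 3]
my_list.append(4)  # lists grow in place

my_array = np.array([0, 1, 2, 3])
print(hasattr(my_array, "append"))  # False: arrays have no append method

# np.append returns a NEW array; my_array itself is unchanged.
bigger = np.append(my_array, 4)
print(my_array.shape, bigger.shape)  # (4,) (5,)
```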
21.4 Elementwise Operations
But most operations can be applied element-wise automatically!
In [10]: my_array + 2
In [11]: big_list=range(10000)
big_array=np.arange(10000)
In [12]: %%timeit
[x**2 for x in big_list]
In [13]: %%timeit
big_array**2
The slowest run took 8.86 times longer than the fastest. This could mean that an intermediate result is
100000 loops, best of 3: 7.46 µs per loop
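Outside the notebook, the %%timeit comparison can be reproduced with the timeit module. Here is a minimal sketch, assuming Python 3, where range is lazy and has to be turned into a list explicitly. The timings it prints will depend on your machine. Only the equality of the two results is guaranteed.

```python
import timeit
import numpy as np

print(np.array([1, 2, 3]) + 2)  # [3 4 5]: the 2 is added to every element

big_list = list(range(10000))
big_array = np.arange(10000)

squares_from_list = [x ** 2 for x in big_list]  # one Python-level operation per element
squares_from_array = big_array ** 2             # a single element-wise array operation
print(squares_from_array.tolist() == squares_from_list)  # True

print(timeit.timeit(lambda: [x ** 2 for x in big_list], number=100))
print(timeit.timeit(lambda: big_array ** 2, number=100))
```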
In [14]: x=np.arange(0,10,0.1)
x
Out[14]: array([ 0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ,
1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1,
2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2,
3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4. , 4.1, 4.2, 4.3,
4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5. , 5.1, 5.2, 5.3, 5.4,
5.5, 5.6, 5.7, 5.8, 5.9, 6. , 6.1, 6.2, 6.3, 6.4, 6.5,
6.6, 6.7, 6.8, 6.9, 7. , 7.1, 7.2, 7.3, 7.4, 7.5, 7.6,
7.7, 7.8, 7.9, 8. , 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7,
8.8, 8.9, 9. , 9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8,
9.9])
In [17]: values
1.11066407, 1.14239733, 1.17413059, 1.20586385, 1.23759711,
1.26933037, 1.30106362, 1.33279688, 1.36453014, 1.3962634 ,
1.42799666, 1.45972992, 1.49146318, 1.52319644, 1.5549297 ,
1.58666296, 1.61839622, 1.65012947, 1.68186273, 1.71359599,
1.74532925, 1.77706251, 1.80879577, 1.84052903, 1.87226229,
1.90399555, 1.93572881, 1.96746207, 1.99919533, 2.03092858,
2.06266184, 2.0943951 , 2.12612836, 2.15786162, 2.18959488,
2.22132814, 2.2530614 , 2.28479466, 2.31652792, 2.34826118,
2.37999443, 2.41172769, 2.44346095, 2.47519421, 2.50692747,
2.53866073, 2.57039399, 2.60212725, 2.63386051, 2.66559377,
2.69732703, 2.72906028, 2.76079354, 2.7925268 , 2.82426006,
2.85599332, 2.88772658, 2.91945984, 2.9511931 , 2.98292636,
3.01465962, 3.04639288, 3.07812614, 3.10985939, 3.14159265])
NumPy comes with ‘vectorised’ versions of common functions which work element-by-element when
applied to arrays:
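The cell that created values is not shown. Its printed contents (100 points ending at 3.14159265) are consistent with something like np.linspace(0, math.pi, 100), which is the assumption in this sketch of applying a vectorised function:

```python
import math
import numpy as np

values = np.linspace(0, math.pi, 100)  # 100 evenly spaced points from 0 to pi
sines = np.sin(values)                 # np.sin is applied element-by-element

# The same numbers as looping with math.sin, which only accepts one number at a time.
looped = [math.sin(v) for v in values]
print(np.allclose(sines, looped))  # True
```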
In [19]: np.zeros([3,4,2])
Out[19]: array([[[ 0., 0.],
[ 0., 0.],
[ 0., 0.],
[ 0., 0.]],
[[ 0., 0.],
[ 0., 0.],
[ 0., 0.],
[ 0., 0.]],
[[ 0., 0.],
[ 0., 0.],
[ 0., 0.],
[ 0., 0.]]])
In [20]: x=np.array(range(40))
y=x.reshape([4,5,2])
y
[[10, 11],
[12, 13],
[14, 15],
[16, 17],
[18, 19]],
[[20, 21],
[22, 23],
[24, 25],
[26, 27],
[28, 29]],
[[30, 31],
[32, 33],
[34, 35],
[36, 37],
[38, 39]]])
In [21]: y[3,2,1]
Out[21]: 35
Including selecting on inner axes while taking all from the outermost:
In [22]: y[:,2,1]
And subselecting ranges:
In [23]: y[2:,:1,:]
[[30, 31]]])
In [24]: y.transpose()
In [25]: y.shape
Out[25]: (4, 5, 2)
In [26]: y.transpose().shape
Out[26]: (2, 5, 4)
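Here is a minimal sketch collecting the indexing examples above for y = np.arange(40).reshape([4, 5, 2]). The element at [i, j, k] is 10*i + 2*j + k, because the inner axes have sizes 5 and 2.

```python
import numpy as np

y = np.arange(40).reshape([4, 5, 2])

corner = y[3, 2, 1]      # 3*10 + 2*2 + 1 = 35
column = y[:, 2, 1]      # all of axis 0, fixed positions on axes 1 and 2
block = y[2:, :1, :]     # a range on every axis keeps all three dimensions
flipped = y.transpose()  # reverses the order of the axes
print(corner, column, block.shape, flipped.shape)
```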
Some numpy functions apply by default to the whole array, but can be chosen to act only on certain
axes:
In [27]: x=np.arange(12).reshape(4,3)
x
In [28]: x.sum(1) # Sum along the second axis, leaving the first.
In [29]: x.sum(0) # Sum along the first axis, leaving the second.
Out[30]: 66
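Here is a minimal sketch of what the axis argument does for the 4 × 3 array above. The axis you name is the one that disappears.

```python
import numpy as np

x = np.arange(12).reshape(4, 3)
# x is [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]

row_sums = x.sum(1)     # collapse axis 1: one total per row, shape (4,)
column_sums = x.sum(0)  # collapse axis 0: one total per column, shape (3,)
total = x.sum()         # no axis: the sum of everything
print(row_sums, column_sums, total)
```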
21.7 Array Datatypes
A Python list can contain data of mixed type:
In [32]: type(x[2])
Out[32]: float
In [33]: type(x[1])
Out[33]: int
In [34]: np.array(x)
NumPy will choose the least-generic-possible datatype that can contain the data:
In [36]: y
In [37]: type(y[0])
Out[37]: numpy.float64
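The cells defining x and y here are not shown. Here is a minimal sketch of how NumPy picks a dtype, assuming Python 3 (where text becomes a Unicode string dtype). The example values are illustrative.

```python
import numpy as np

integers = np.array([1, 2, 3])           # all ints: an integer dtype
mixed = np.array([1, 2, 3.5])            # one float promotes everything to float64
with_text = np.array(["hello", 2, 3.5])  # a string forces a Unicode string dtype

print(integers.dtype.kind, mixed.dtype, with_text.dtype.kind)  # i float64 U
print(type(mixed[0]))  # <class 'numpy.float64'>
```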
21.8 Broadcasting
This is another really powerful feature of NumPy.
By default, array operations are element-by-element:
In [38]: np.arange(5)*np.arange(5)
---------------------------------------------------------------------------
<ipython-input-39-66b7c967724c> in <module>()
----> 1 np.arange(5) * np.arange(6)
ValueError: operands could not be broadcast together with shapes (5,) (6,)
In [40]: np.zeros([2,3])*np.zeros([2,4])
---------------------------------------------------------------------------
<ipython-input-40-4fb354a381e7> in <module>()
----> 1 np.zeros([2,3])*np.zeros([2,4])
ValueError: operands could not be broadcast together with shapes (2,3) (2,4)
In [43]: m1+m2
---------------------------------------------------------------------------
<ipython-input-43-e9085a7f6251> in <module>()
----> 1 m1+m2
ValueError: operands could not be broadcast together with shapes (10,10) (10,5,2)
Except that, if one array has length 1 along a dimension, its data is REPEATED along that dimension to match the other.
In [45]: m1=np.arange(10).reshape([10,1])
m1
Out[45]: array([[0],
[1],
[2],
[3],
[4],
[5],
[6],
[7],
[8],
[9]])
In [46]: m2=m1.transpose()
m2
Out[46]: array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
Out[47]: (10, 1)
In [49]: m1+m2
In [50]: 10*m1+m2
This works for arrays with more than one unit dimension.
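Here is a minimal sketch of the rule with the column m1 and the row m2 from above. Each length-1 axis is stretched to match the other array. Two different lengths greater than 1 on the same axis cannot be reconciled, so NumPy raises a ValueError.

```python
import numpy as np

m1 = np.arange(10).reshape([10, 1])  # a column: shape (10, 1)
m2 = m1.transpose()                  # a row: shape (1, 10)

table = 10 * m1 + m2  # both length-1 axes are stretched, giving shape (10, 10)
print(table.shape)    # (10, 10)
print(table[3, 7])    # 37

try:
    np.arange(5) * np.arange(6)  # lengths 5 and 6 along the same axis
    broadcast_failed = False
except ValueError as error:
    broadcast_failed = True
    print("ValueError:", error)
```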
21.9 Newaxis
Broadcasting is very powerful, and numpy allows indexing with np.newaxis to temporarily create new
one-long dimensions on the fly.
In [51]: x=np.arange(10).reshape(2,5)
y=np.arange(8).reshape(2,2,2)
In [52]: res=x[:,:,np.newaxis,np.newaxis]*y[:,np.newaxis,:,:]
In [53]: res.shape
Out[53]: (2, 5, 2, 2)
In [54]: np.sum(res)
Out[54]: 830
Note that newaxis works because a 3 × 1 × 3 array and a 3 × 3 array contain the same data, differently
shaped:
In [55]: threebythree=np.arange(9).reshape(3,3)
threebythree
In [56]: threebythree[:,np.newaxis,:]
[[3, 4, 5]],
[[6, 7, 8]]])
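A common use of np.newaxis is an outer product. One vector gets a trailing length-1 axis and the other a leading one, and broadcasting fills in the rest. Here is a minimal sketch, checked against NumPy's own np.outer:

```python
import numpy as np

a = np.arange(3)  # shape (3,)
b = np.arange(4)  # shape (4,)

outer = a[:, np.newaxis] * b[np.newaxis, :]   # (3, 1) times (1, 4) gives (3, 4)
print(outer.shape)                            # (3, 4)
print(np.array_equal(outer, np.outer(a, b)))  # True

threebythree = np.arange(9).reshape(3, 3)
print(threebythree[:, np.newaxis, :].shape)   # (3, 1, 3): same nine numbers, extra axis
```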
In [60]: np.array(x)
In [61]: np.array(x).dtype
Out[61]: dtype('float64')
These are, when you get to know them, fairly obvious string codes for datatypes: NumPy supports all
kinds of datatypes beyond the python basics.
NumPy will convert python type names to dtypes:
In [64]: int_array
Out[64]: array([2, 3, 7, 0])
In [65]: float_array
In [66]: int_array.dtype
Out[66]: dtype('int64')
In [67]: float_array.dtype
Out[67]: dtype('float64')
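The input list behind int_array and float_array is not shown. Here is a minimal sketch with hypothetical values, showing that passing Python's int or float as dtype converts the data. Converting floats to integers truncates towards zero.

```python
import numpy as np

x = [1.9, -2.7, 5.0]  # hypothetical input values

int_array = np.array(x, dtype=int)      # Python's int maps to NumPy's default integer dtype
float_array = np.array(x, dtype=float)  # Python's float maps to float64
print(int_array, int_array.dtype.kind)  # the values 1, -2, 5; kind i
print(float_array.dtype)                # float64
```

The exact integer dtype (int64 in the output above) can depend on the platform, which is why the sketch only checks the kind.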
In [68]: x=np.arange(50).reshape([10,5])
In [70]: record_x
Record arrays can be addressed with field names as if they were a dictionary:
In [71]: record_x['col1']
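The cell that built record_x is not shown. Here is a minimal sketch of a small structured ("record") array built directly from a list of tuples. The field names col1 and col2 and the values are illustrative.

```python
import numpy as np

records = np.array([(1, 2.5), (3, 4.5)], dtype=[("col1", int), ("col2", float)])

print(records["col1"])  # the col1 field of every record
print(records[0])       # the first record, with both fields
```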
21.13 Logical arrays, masking, and selection
NumPy defines operators like == and < to apply to arrays element by element:
In [72]: x=np.zeros([3,4])
x
Out[72]: array([[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.]])
In [73]: y=np.arange(-1,2)[:,np.newaxis]*np.arange(-2,2)[np.newaxis,:]
y
Out[73]: array([[ 2, 1, 0, -1],
[ 0, 0, 0, 0],
[-2, -1, 0, 1]])
In [74]: iszero = x==y
iszero
Out[74]: array([[False, False, True, False],
[ True, True, True, True],
[False, False, True, False]], dtype=bool)
A logical array can be used to select elements from an array:
In [75]: y[np.logical_not(iszero)]
Out[75]: array([ 2, 1, -1, -2, -1, 1])
Although when printed, this comes out as a flat list, if assigned to, the selected elements of the array are
changed!
In [76]: y[iszero]=5
In [77]: y
Out[77]: array([[ 2, 1, 5, -1],
[ 5, 5, 5, 5],
[-2, -1, 5, 1]])
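Here is a minimal sketch of the two behaviours, using the y array from above before the zeros were replaced. Reading through a mask gives a flat copy, so changing that copy leaves the original alone. Assigning through the mask changes the original in place.

```python
import numpy as np

y = np.array([[2, 1, 0, -1],
              [0, 0, 0, 0],
              [-2, -1, 0, 1]])

negative = y < 0        # element-wise comparison gives a boolean array
print(y[negative])      # [-1 -2 -1]: the selected values, flattened

selected = y[negative]  # boolean indexing returns a copy...
selected[:] = 99        # ...so changing it leaves y alone
print((y == 99).any())  # False

y[negative] = 0         # but assigning through the mask changes y in place
print(y.min())          # 0
```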
Chapter 22
The Boids!
22.1 Flocking
The aggregate motion of a flock of birds, a herd of land animals, or a school of fish is a beautiful
and familiar part of the natural world. . . The aggregate motion of the simulated flock is created
by a distributed behavioral model much like that at work in a natural flock; the birds choose
their own course. Each simulated bird is implemented as an independent actor that navigates
according to its local perception of the dynamic environment, the laws of simulated physics that
rule its motion, and a set of behaviors programmed into it. . . The aggregate motion of the
simulated flock is the result of the dense interaction of the relatively simple behaviors of the
individual simulated birds.
– Craig W. Reynolds, “Flocks, Herds, and Schools: A Distributed Behavioral Model”, Computer Graphics
21(4), 1987, pp. 25-34. See the original paper
In [2]: boid_count = 10
Out[4]: array([[ 710.35809712, 510.16788062, 1486.85845396, 430.77593244,
1185.14988108, 122.60227361, 484.48948249, 52.73997141,
1635.55381156, 1756.18999319],
[ 1909.51584644, 2763.25337086, 1750.37654827, 408.25217478,
3994.47310684, 2390.07042472, 2255.85236995, 2295.40982104,
593.19707991, 159.85521759]])
We used broadcasting with np.newaxis to apply our upper limit to each boid. rand gives us a random
number between 0 and 1. We multiply by our limits to get a number up to that limit.
Let’s put that in a function:
But each bird will also need a starting velocity. Let’s make these random too:
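The function itself is not shown. Here is a minimal sketch of one way to write it. The call signature is an assumption inferred from the later call new_flock(100, np.array([100,900]), np.array([200,1100])), namely a count plus lower and upper limits for each coordinate. The velocity limits below are also illustrative.

```python
import numpy as np

def new_flock(count, lower_limits, upper_limits):
    """Sketch: a 2 x count array of points drawn uniformly between the limits."""
    width = upper_limits - lower_limits
    # np.newaxis turns the length-2 limit vectors into 2 x 1 columns,
    # which broadcast across all `count` boids.
    return lower_limits[:, np.newaxis] + np.random.rand(2, count) * width[:, np.newaxis]

positions = new_flock(100, np.array([100, 900]), np.array([200, 1100]))
velocities = new_flock(100, np.array([0, -20]), np.array([10, 20]))  # illustrative limits
```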
figure = plt.figure()
axes = plt.axes(xlim=(0, limits[0]), ylim=(0, limits[1]))
scatter=axes.scatter(positions[0,:],positions[1,:])
scatter
Out[9]: <matplotlib.collections.PathCollection at 0x10ef25950>
Then, we define a function which updates the figure for each timestep
In [10]: def update_boids(positions, velocities):
    positions += velocities

def animate(frame):
    update_boids(positions, velocities)
    scatter.set_offsets(positions.transpose())
    return scatter,  # with blit=True, FuncAnimation expects the artists that changed
Call FuncAnimation, and specify how many frames we want:
In [11]: anim=animation.FuncAnimation(figure, animate,
frames=50, interval=50, blit=True)
Save out the figure:
In [12]: anim.save(’boids_1.mp4’)
And download the saved animation
You can even use an external library to view the results directly in the notebook. If you’re on your own
computer, you can download it from https://gist.github.com/gforsyth/188c32b6efe834337d8a (see the notes
on installing libraries).
Unfortunately, if you’re on the teaching cluster, you won’t be able to install it there.
In [13]: from JSAnimation import IPython_display
# Inline animation tool; needs manual install via
# If you don’t have this, you need to save animations as MP4.
positions=new_flock(100, np.array([100,900]), np.array([200,1100]))
anim
Out[13]: <matplotlib.animation.FuncAnimation at 0x10edeef50>
In [15]: positions
In [16]: velocities
In [17]: middle=np.mean(positions, 1)
middle
22.6 Avoiding collisions
We’ll want to add our other flocking rules to the behaviour of the Boids.
We’ll need a matrix giving the distances between each pair of birds. This should be N × N.
In [25]: xpos=positions[0,:]
In [27]: xsep_matrix.shape
Out[27]: (4, 4)
In [28]: xsep_matrix
But in NumPy we can be cleverer than that, and make a 2 by N by N matrix of separations:
In [30]: separations.shape
Out[30]: (2, 4, 4)
And then we can get the sum-of-squares $\delta_x^2 + \delta_y^2$ like this:
In [33]: square_distances
Find the separations only for those birds which are too close:
In [35]: separations_if_close = np.copy(separations)
far_away = np.logical_not(close_birds)
In [36]: separations_if_close[0,:,:][far_away] = 0
separations_if_close[1,:,:][far_away] = 0
separations_if_close
Out[36]: array([[[ 0. , 0. , 0. , 0. ],
[ 0. , 0. , 3.29796454, 11.17309188],
[ 0. , -3.29796454, 0. , 7.87512734],
[ 0. , -11.17309188, -7.87512734, 0. ]],
[[ 0. , 0. , 0. , 0. ],
[ 0. , 0. , -7.95762813, 6.38158724],
[ 0. , 7.95762813, 0. , 14.33921537],
[ 0. , -6.38158724, -14.33921537, 0. ]]])
positions += velocities
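Several of the cells in this section are only partly shown. Here is a minimal sketch gathering the collision-avoidance steps into one function. The function name avoid_collisions, the squared threshold alert_square_distance, and the sign convention are assumptions made for illustration. The convention is that separations[d, i, j] = positions[d, j] - positions[d, i].

```python
import numpy as np

def avoid_collisions(positions, velocities, alert_square_distance):
    """Sketch: push each bird away from every other bird closer than the alert distance."""
    # separations[d, i, j] = positions[d, j] - positions[d, i]; shape 2 x N x N
    separations = positions[:, np.newaxis, :] - positions[:, :, np.newaxis]
    square_distances = np.sum(separations ** 2, 0)  # N x N matrix of dx^2 + dy^2
    close_birds = square_distances < alert_square_distance
    far_away = np.logical_not(close_birds)
    separations_if_close = np.copy(separations)
    separations_if_close[0, :, :][far_away] = 0
    separations_if_close[1, :, :][far_away] = 0
    # Summing over axis 1 (the other bird i) gives, for each bird j,
    # the sum of vectors pointing from its close neighbours towards itself.
    velocities += np.sum(separations_if_close, 1)

# Three hypothetical birds on the x axis: two 1 unit apart, one far away.
positions = np.array([[0.0, 1.0, 100.0],
                      [0.0, 0.0, 0.0]])
velocities = np.zeros((2, 3))
avoid_collisions(positions, velocities, alert_square_distance=10.0)
print(velocities[0])  # bird 0 pushed left, bird 1 pushed right, bird 2 unaffected
```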
22.7 Match speed with nearby birds
This is pretty similar:
positions += velocities
anim=animation.FuncAnimation(figure, animate,
frames=200, interval=50, blit=True)
Hopefully the power of NumPy should be pretty clear now. This would be enormously slower and, I
think, harder to understand using traditional lists.
Chapter 23
We now know enough to understand everything we did in the initial example chapter on the “Greengraph”.
Go back to that part of the notes, and re-read the code.
Now, we can even write it up into a class, and save it as a module.
class Greengraph(object):
    def __init__(self, start, end):
        self.start=start
        self.end=end
        self.geocoder=geopy.geocoders.GoogleV3(domain="maps.google.co.uk")
Writing greengraph/graph.py
In [3]: %%writefile greengraph/map.py
import numpy as np
from StringIO import StringIO
from matplotlib import image as img
import requests

class Map(object):
    def __init__(self, lat, long, satellite=True, zoom=10, size=(400,400), sensor=False):
        base="http://maps.googleapis.com/maps/api/staticmap?"
        params=dict(
            sensor= str(sensor).lower(),
            zoom= zoom,
            size= "x".join(map(str, size)),
            center= ",".join(map(str, (lat, long) )),
            style="feature:all|element:labels|visibility:off"
        )
        if satellite:
            params["maptype"]="satellite"
Writing greengraph/map.py
mygraph=Greengraph('New York','Chicago')
data = mygraph.green_between(20)
In [6]: plt.plot(data)
23.3 Introduction
23.3.1 What’s version control?
Version control is a tool for managing changes to a set of files.
There are many different version control systems:
• Git
• Mercurial (hg)
• CVS
• Subversion (svn)
• ...
• “How can I share my code?”
• “How can I submit a change to someone else’s code?”
• “How can I merge my work with Sue’s?”
Sue                        James
my vcs commit              ...
...                        Join the team
...                        my vcs checkout
...                        Do some programming
...                        my vcs commit
my vcs update              ...
Do some programming        Do some programming
my vcs commit              ...
my vcs update              ...
my vcs merge               ...
my vcs commit              ...
23.3.6 Scope
This course will use the git version control system, but much of what you learn will be valid with other
version control tools you may encounter, including subversion (svn) and mercurial (hg).
In later parts of the course, you will use the version control tools you learn today with actual Python
code.
23.4.3 Markdown
The text files we create will use a simple “wiki” markup style called markdown to show formatting. This is
the convention used in this file, too.
You can view the content of this file in the way Markdown renders it by looking on the web, and compare
the raw text.
In [1]: %%bash
echo some output
some output
Writing somefile.md
But if you are following along, you should edit the file using a text editor. On Windows, we recommend
Notepad++. On Mac, we recommend Atom.
In [4]: import os
top_dir = os.getcwd()
top_dir
Out[4]: ’/Users/ccsprsd/jenkins/development/workspace/engineering-publisher/ch02git’
In [7]: os.chdir(working_dir)
23.5 Solo work
23.5.1 Configuring Git with your name and email
First, we should configure Git to know our name and email address:
In [8]: %%bash
git config --global user.name "James Hetherington"
git config --global user.email "jamespjh@gmail.com"
Initial commit
Writing index.md
Mountains in the UK
===================
England is not very mountainous.
But has some tall hills, and maybe a mountain or two depending on your definition.
In [4]: %%bash
git add index.md
Don’t forget: Any files in repositories which you want to “track” need to be added with git add after
you create them.
In [5]: %%bash
git commit -m "First commit of discourse on UK topography"
In [6]: %%bash
git config --global core.editor vim
In [7]: %%bash
git config --get core.editor
vim
To configure Notepad++ on Windows you’ll need something like the below; ask a demonstrator to help
for your machine.
I’m going to be using vim as my editor, but you can use whatever editor you prefer. (Windows users
could use Notepad++, Mac users could use TextMate or Sublime Text, Linux users could use vim, nano
or emacs.)
23.6.5 Git log
Git now has one change in its history:
In [8]: %%bash
git log
commit 1561abf7777027fdf3549e56f4f3f1a85e652309
Author: James Hetherington <jamespjh@gmail.com>
Date: Wed Nov 4 17:55:46 2015 +0000
In [9]: %%bash
git status
On branch master
nothing to commit, working directory clean
vim index.md
Overwriting index.md
Mountains in the UK
===================
England is not very mountainous.
But has some tall hills, and maybe a mountain or two depending on your definition.
23.6.8 Unstaged changes
In [12]: %%bash
git status
On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: index.md
no changes added to commit (use "git add" and/or "git commit -a")
We can now see that there is a change to “index.md” which is currently “not staged for commit”. What
does this mean?
If we do a git commit now nothing will happen.
Git will only commit changes to files that you choose to include in each commit.
This is a difference from other version control systems, where committing will affect all changed files.
We can see the differences in the file with:
In [13]: %%bash
git diff
diff --git a/index.md b/index.md
index 4f737f1..263ec81 100644
--- a/index.md
+++ b/index.md
@@ -1,4 +1,6 @@
Mountains in the UK
===================
England is not very mountainous.
-But has some tall hills, and maybe a mountain or two depending on your definition.
\ No newline at end of file
+But has some tall hills, and maybe a mountain or two depending on your definition.
+
+Mount Fictional, in Barsetshire, U.K. is the tallest mountain in the world.
\ No newline at end of file
Deleted lines are prefixed with a minus, added lines prefixed with a plus.
23.6.11 Message Sequence Charts
In order to illustrate the behaviour of Git, it will be useful to be able to generate figures in Python of a
“message sequence chart” flavour.
There’s a nice online tool to do this, called “Message Sequence Charts”.
Have a look at https://www.websequencediagrams.com
Instead of just showing you these diagrams, I’m showing you in this notebook how I make them. This is
part of our “reproducible computing” approach; always generating all our figures from code.
Here’s some quick code in the Notebook to download and display an MSC illustration, using the Web
Sequence Diagrams API:
In [15]: %%writefile wsd.py
import requests
import re
import IPython

def wsd(code):
    response = requests.post("http://www.websequencediagrams.com/index.php", data={
        'message': code,
        'apiVersion': 1,
    })
    expr = re.compile(r"(\?(img|pdf|png|svg)=[a-zA-Z0-9]+)")
    m = expr.search(response.text)
    if m is None:
        print("Invalid response from server.")
        return False
    image = requests.get("http://www.websequencediagrams.com/" + m.group(0))
    return IPython.core.display.Image(image.content)
Writing wsd.py
In [16]: from wsd import wsd
%matplotlib inline
wsd("Sender->Recipient: Hello\n Recipient->Sender: Message received OK")
Out[16]:
23.6.12 The Levels of Git
Let’s make ourselves a sequence chart to show the different aspects of Git we’ve seen so far:
In [17]: message="""
Working Directory -> Staging Area : git add
Staging Area -> Local Repository : git commit
Working Directory -> Local Repository : git commit -a
"""
wsd(message)
Out[17]:
On branch master
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
modified: index.md
Untracked files:
(use "git add <file>..." to include in what will be committed)
wsd.py
wsd.pyc
In [19]: %%bash
git commit -m "Add a lie about a mountain"
In [20]: %%bash
git log
commit 47f2a5cb7119582d84116d3646286545f93ad967
Author: James Hetherington <jamespjh@gmail.com>
Date: Wed Nov 4 17:55:48 2015 +0000
commit 1561abf7777027fdf3549e56f4f3f1a85e652309
Author: James Hetherington <jamespjh@gmail.com>
Date: Wed Nov 4 17:55:46 2015 +0000
vim index.md
Overwriting index.md
This last command, git commit -a, automatically adds changes to all tracked files to the staging area as
part of the commit command. So, if you never want to add changes to some tracked files but not
others, you can just use this and forget about the staging area!
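The three levels can be modelled with three dictionaries. This is a hypothetical toy, not Git's implementation, and it ignores deletions. It shows why a plain `git commit` with nothing staged does nothing. It also shows that `git commit -a` stages changes only to files Git already tracks, so a new file stays out of the commit.

```python
class ToyRepo:
    """Hypothetical model of Git's three levels; not Git's implementation."""

    def __init__(self):
        self.working = {}   # filename -> content in the working directory
        self.index = {}     # filename -> content in the staging area
        self.commits = []   # (message, snapshot) pairs in the local repository

    def head(self):
        # Snapshot stored by the latest commit (empty before the first one)
        return self.commits[-1][1] if self.commits else {}

    def add(self, filename):
        # git add: copy the working version into the staging area
        self.index[filename] = self.working[filename]

    def commit(self, message, all_tracked=False):
        if all_tracked:
            # git commit -a: first stage the working versions of files that
            # are already tracked (i.e. already in the index); new files are
            # skipped, and this toy ignores deletions
            for filename in self.index:
                if filename in self.working:
                    self.index[filename] = self.working[filename]
        if self.index == self.head():
            return False  # nothing new staged, so no commit is made
        self.commits.append((message, dict(self.index)))
        return True


repo = ToyRepo()
repo.working["index.md"] = "Mountains in the UK"
repo.add("index.md")
repo.commit("First commit")

repo.working["index.md"] += "\nMount Fictional is the tallest mountain."
repo.working["wsd.py"] = "import requests"  # a new, untracked file

plain = repo.commit("Add a lie")                     # False: nothing staged
with_a = repo.commit("Add a lie", all_tracked=True)  # True: index.md was staged
```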
23.6.16 Review of changes
In [24]: %%bash
git log | head
commit 201110ec7a142c3590a59700c62bcbdf8574dac1
Author: James Hetherington <jamespjh@gmail.com>
Date: Wed Nov 4 17:55:49 2015 +0000
Change title
commit 47f2a5cb7119582d84116d3646286545f93ad967
Author: James Hetherington <jamespjh@gmail.com>
Date: Wed Nov 4 17:55:48 2015 +0000
In [25]: %%bash
git log --oneline
In [26]: message="""
participant "Jim’s repo" as R
participant "Jim’s index" as I
participant Jim as J
J->R: Commit change to index.md
"""
wsd(message)
Out[26]:
23.7 Fixing mistakes
We’re still in our git working directory:
In [1]: import os
top_dir = os.getcwd()
git_dir = os.path.join(top_dir, 'learning_git')
working_dir = os.path.join(git_dir, 'git_example')
os.chdir(working_dir)
working_dir
23.7.2 Reverting
Ok, so now we’d like to undo the nasty commit with the lie about Mount Fictional.
In [2]: %%bash
git revert HEAD^
An editor may pop up with a default commit message, which you can accept and save.
In [3]: %%bash
git log
commit b7b89246d538956237b8c3308bd6502e2c1ef6ee
Author: James Hetherington <jamespjh@gmail.com>
Date: Wed Nov 4 17:55:51 2015 +0000
This reverts commit 47f2a5cb7119582d84116d3646286545f93ad967.
commit 201110ec7a142c3590a59700c62bcbdf8574dac1
Author: James Hetherington <jamespjh@gmail.com>
Date: Wed Nov 4 17:55:49 2015 +0000
Change title
commit 47f2a5cb7119582d84116d3646286545f93ad967
Author: James Hetherington <jamespjh@gmail.com>
Date: Wed Nov 4 17:55:48 2015 +0000
commit 1561abf7777027fdf3549e56f4f3f1a85e652309
Author: James Hetherington <jamespjh@gmail.com>
Date: Wed Nov 4 17:55:46 2015 +0000
23.7.5 Antipatch
Notice how the mistake has stayed in the history.
There is a new commit which undoes the change: this is colloquially called an “antipatch”. This is nice:
you have a record of the full story, including the mistake and its correction.
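The antipatch idea can be sketched in a few lines of Python. This is a toy illustration under simplifying assumptions, not how `git revert` is implemented. Changes are whole lines computed with `difflib.ndiff`, and the history is just a list of snapshots. Reverting a commit then amounts to applying its changes in reverse, as a new entry at the end of the history.

```python
import difflib

def changes(before, after):
    # Keep only the changed lines ndiff reports: "- " removed, "+ " added
    return [d for d in difflib.ndiff(before, after) if d[:2] in ("- ", "+ ")]

def antipatch(patch):
    # Undo a patch by turning every addition into a deletion and vice versa
    flip = {"+": "-", "-": "+"}
    return [flip[d[0]] + d[1:] for d in patch]

honest = ["England is not very mountainous.\n"]
with_lie = honest + ["Mount Fictional is the tallest mountain in the world.\n"]

# History is append-only: the revert is a new snapshot, the mistake stays
history = [honest, with_lie, list(honest)]

mistake = changes(history[0], history[1])
revert = changes(history[1], history[2])
print(mistake)  # ['+ Mount Fictional is the tallest mountain in the world.\n']
print(revert)   # ['- Mount Fictional is the tallest mountain in the world.\n']
```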
Overwriting index.md
In [5]: %%bash
cat index.md
In [6]: %%bash
git diff
@@ -1,4 +1,4 @@
Mountains and Hills in the UK
===================
-England is not very mountainous.
+Engerland is not very mountainous.
But has some tall hills, and maybe a mountain or two depending on your definition.
\ No newline at end of file
In [7]: %%bash
git commit -am "Add a silly spelling"
In [8]: %%bash
git log | head
commit 94dac4477e6486160a628edf488eaa9eefee24fd
Author: James Hetherington <jamespjh@gmail.com>
Date: Wed Nov 4 17:55:51 2015 +0000
commit b7b89246d538956237b8c3308bd6502e2c1ef6ee
Author: James Hetherington <jamespjh@gmail.com>
Date: Wed Nov 4 17:55:51 2015 +0000
In [10]: %%bash
git log --oneline
In [11]: %%bash
cat index.md
Mountains and Hills in the UK
===================
Engerland is not very mountainous.
But has some tall hills, and maybe a mountain or two depending on your definition.
If you want to lose the change from the working directory as well, you can do git reset --hard.
I’m going to get rid of the silly spelling, and I didn’t use --hard, so I’ll reset the file in the working
directory to be the same as in the index:
In [12]: %%bash
git checkout index.md
In [13]: %%bash
cat index.md
In [14]: message="""
Working Directory -> Staging Area : git add
Staging Area -> Local Repository : git commit
Working Directory -> Local Repository : git commit -a
Staging Area -> Working Directory : git checkout
Local Repository -> Staging Area : git reset
Local Repository -> Working Directory: git reset --hard
"""
from wsd import wsd
%matplotlib inline
wsd(message)
Out[14]:
We can add it to Jim’s story:
In [15]: message="""
participant "Jim’s repo" as R
participant "Jim’s index" as I
participant Jim as J
"""
wsd(message)
Out[15]:
23.8 Publishing
We’re still in our working directory:
In [1]: import os
top_dir = os.getcwd()
git_dir = os.path.join(top_dir, 'learning_git')
working_dir = os.path.join(git_dir, 'git_example')
os.chdir(working_dir)
working_dir
23.8.2 Creating a repository
Ok, let’s create a repository to store our work. Hit “new repository” on the right of the GitHub home
screen.
Fill in a short name, and a description. Choose a “public” repository. Don’t choose to add a Readme.
23.8.5 Remotes
The first command sets up the server as a new remote, called origin.
Git, unlike some earlier version control systems, is a “distributed” version control system, which means
you can work with multiple remote servers.
Usually, commands that work with remotes allow you to specify the remote to use, but assume the origin
remote if you don’t.
Here, git push will push your whole history onto the server, and now you’ll be able to see it on the
internet! Refresh your web browser where the instructions were, and you’ll see your repository!
Let’s add these commands to our diagram:
In [4]: message="""
Working Directory -> Staging Area : git add
Staging Area -> Local Repository : git commit
Working Directory -> Local Repository : git commit -a
Staging Area -> Working Directory : git checkout
Local Repository -> Staging Area : git reset
Local Repository -> Working Directory: git reset --hard
Local Repository -> Remote Repository : git push
"""
from wsd import wsd
%matplotlib inline
wsd(message)
Out[4]:
vim lakeland.md
Writing lakeland.md
Lakeland
========
23.9.2 Git will not by default commit your new file
In [7]: %%bash
git commit -am "Try to add Lakeland"
On branch master
Untracked files:
lakeland.md
wsd.py
wsd.pyc
To squelch this message and adopt the new behavior now, use:
See ’git help config’ and search for ’push.default’ for further information.
(the ’simple’ mode was introduced in Git 1.7.11. Use the similar mode
’current’ instead of ’simple’ if you sometimes use older versions of Git)
23.10 Changing two files at once
What if we change both files?
Mountains:
* Helvellyn
Overwriting lakeland.md
Overwriting index.md
In [12]: %%bash
git status
On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: index.md
modified: lakeland.md
Untracked files:
(use "git add <file>..." to include in what will be committed)
wsd.py
wsd.pyc
no changes added to commit (use "git add" and/or "git commit -a")
These changes should really be separate commits. We can do this with careful use of git add, staging the
changes for one commit first, then the other.
In [13]: %%bash
git add index.md
git commit -m "Include lakes in the scope"
Because we “staged” only index.md, the changes to lakeland.md were not included in that commit.
In [14]: %%bash
git commit -am "Add Helvellyn"
[master fa37aa5] Add Helvellyn
1 file changed, 4 insertions(+), 1 deletion(-)
In [15]: %%bash
git log --oneline
In [16]: %%bash
git push
To squelch this message and adopt the new behavior now, use:
See ’git help config’ and search for ’push.default’ for further information.
(the ’simple’ mode was introduced in Git 1.7.11. Use the similar mode
’current’ instead of ’simple’ if you sometimes use older versions of Git)
In [17]: message="""
participant "Jim’s remote" as M
participant "Jim’s repo" as R
participant "Jim’s index" as I
participant Jim as J
note right of J: git commit -m "Include lakes"
I->R: Make a commit from currently staged changes: index.md only
Out[17]:
23.11 Collaboration
23.11.1 Form a team
Now we’re going to get to the most important question of all with Git and GitHub: working with others.
Organise into pairs. You’re going to be working on the website of one of the two of you, together, so
decide who is going to be the leader, and who the collaborator.
In [1]: import os
top_dir = os.getcwd()
git_dir = os.path.join(top_dir, 'learning_git')
working_dir = os.path.join(git_dir, 'git_example')
os.chdir(git_dir)
In [2]: %%bash
pwd
rm -rf github-example # cleanup after previous example
rm -rf partner_dir # cleanup after previous example
/Users/ccsprsd/jenkins/development/workspace/engineering-publisher/ch02git/learning_git
Next, the collaborator needs to find out the URL of the repository: they should go to the leader’s
repository’s GitHub page, and note the URL at the top of the screen. Make sure the “ssh” button is pushed;
the URL should begin with git@github.com.
Copy the URL into your clipboard by clicking on the icon to the right of the URL, and then:
In [3]: %%bash
pwd
git clone git@github.com:UCL/github-example.git
mv github-example partner_dir
/Users/ccsprsd/jenkins/development/workspace/engineering-publisher/ch02git/learning_git
In [5]: %%bash
pwd
ls
Note that your partner’s files are now present on your disk:
In [6]: %%bash
cat lakeland.md
Lakeland
========
Mountains:
* Helvellyn
23.11.4 Nonconflicting changes
Now, both of you should make some changes. To start with, make changes to different files. This will mean
your work doesn’t “conflict”. Later, we’ll see how to deal with changes to a shared file.
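Why do changes to different files combine without trouble? A toy file-level three-way merge makes the rule concrete. It compares each side against the common ancestor and takes whichever side changed. It reports a conflict only when both sides changed the same file in different ways. This is a hypothetical simplification: Git actually merges line by line, so edits to different parts of the same file usually merge cleanly too.

```python
def merge_files(base, ours, theirs):
    """Hypothetical file-level three-way merge.
    Each argument maps filename -> content; a missing file means absent."""
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:        # both sides agree (or neither touched the file)
            result = o
        elif o == b:      # only "theirs" changed this file
            result = t
        elif t == b:      # only "ours" changed this file
            result = o
        else:             # both changed the same file differently
            conflicts.append(name)
            result = o
        if result is not None:
            merged[name] = result
    return merged, conflicts


base = {"index.md": "UK", "lakeland.md": "Helvellyn"}
leader = {**base, "Wales.md": "Tryfan"}         # leader adds a file
partner = {**base, "Scotland.md": "Ben Eighe"}  # partner adds another
merged, conflicts = merge_files(base, leader, partner)

# Now both edit the same file in different ways
leader2 = {**merged, "Wales.md": "Tryfan, Pen y Fan"}
partner2 = {**merged, "Wales.md": "Tryfan, Glyder Fawr"}
_, conflicts2 = merge_files(merged, leader2, partner2)
```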
Both of you should commit, but not push, your changes to your respective files:
E.g., the leader:
In [7]: os.chdir(working_dir)
* Tryfan
* Yr Wyddfa
Writing Wales.md
In [9]: %%bash
ls
Wales.md
index.md
lakeland.md
wsd.py
wsd.pyc
In [10]: %%bash
git add Wales.md
git commit -m "Add wales"
In [11]: os.chdir(partner_dir)
* Ben Eighe
* Cairngorm
Overwriting Scotland.md
In [13]: %%bash
ls
Makefile
Pennines.md
Scotland.md
Wales.md
index.md
lakeland.md
In [14]: %%bash
git add Scotland.md
git commit -m "Add Scotland"
In [15]: %%bash
git push
To squelch this message and adopt the new behavior now, use:
See ’git help config’ and search for ’push.default’ for further information.
(the ’simple’ mode was introduced in Git 1.7.11. Use the similar mode
’current’ instead of ’simple’ if you sometimes use older versions of Git)
In [16]: os.chdir(working_dir)
In [17]: %%bash
git push
To squelch this message and adopt the new behavior now, use:
git config --global push.default simple
See ’git help config’ and search for ’push.default’ for further information.
(the ’simple’ mode was introduced in Git 1.7.11. Use the similar mode
’current’ instead of ’simple’ if you sometimes use older versions of Git)
Do as it suggests:
In [18]: %%bash
git pull
If you wish to set tracking information for this branch you can do so with:
In [19]: %%bash
git push
To squelch this message and adopt the new behavior now, use:
git config --global push.default simple
See ’git help config’ and search for ’push.default’ for further information.
(the ’simple’ mode was introduced in Git 1.7.11. Use the similar mode
’current’ instead of ’simple’ if you sometimes use older versions of Git)
In [20]: os.chdir(partner_dir)
In [21]: %%bash
git pull
Already up-to-date.
In [22]: %%bash
ls
Makefile
Pennines.md
Scotland.md
Wales.md
index.md
lakeland.md
* Tryfan
* Snowdon
Overwriting Wales.md
In [24]: %%bash
git diff
diff --git a/Wales.md b/Wales.md
index 784f1df..e2ca555 100644
--- a/Wales.md
+++ b/Wales.md
@@ -1,9 +1,5 @@
Mountains In Wales
==================
-* Pen y Fan
* Tryfan
-* Snowdon
-* Glyder Fawr
-* Fan y Big
-* Cadair Idris
\ No newline at end of file
+* Snowdon
\ No newline at end of file
In [25]: %%bash
git commit -am "Translating from the Welsh"
In [26]: %%bash
git log --oneline
In [27]: os.chdir(working_dir)
* Pen y Fan
* Tryfan
* Snowdon
Overwriting Wales.md
In [29]: %%bash
git commit -am "Add a beacon"
In [30]: %%bash
git log --oneline
In [31]: %%bash
git push
To squelch this message and adopt the new behavior now, use:
See ’git help config’ and search for ’push.default’ for further information.
(the ’simple’ mode was introduced in Git 1.7.11. Use the similar mode
’current’ instead of ’simple’ if you sometimes use older versions of Git)
In [32]: os.chdir(partner_dir)
In [33]: %%bash
git push
To squelch this message and adopt the new behavior now, use:
See ’git help config’ and search for ’push.default’ for further information.
(the ’simple’ mode was introduced in Git 1.7.11. Use the similar mode
’current’ instead of ’simple’ if you sometimes use older versions of Git)
In [34]: %%bash
git pull
Already up-to-date.
In [35]: %%bash
git push
To squelch this message and adopt the new behavior now, use:
See ’git help config’ and search for ’push.default’ for further information.
(the ’simple’ mode was introduced in Git 1.7.11. Use the similar mode
’current’ instead of ’simple’ if you sometimes use older versions of Git)
In [36]: %%bash
git log --oneline --graph
In [37]: os.chdir(working_dir)
In [38]: %%bash
git pull
If you wish to set tracking information for this branch you can do so with:
In [39]: %%bash
git log --graph --oneline
In [40]: message="""
participant Sue as S
participant "Sue’s repo" as SR
participant "Shared remote" as M
participant "Jim’s repo" as JR
participant Jim as J
"""
from wsd import wsd
%matplotlib inline
wsd(message)
Out[40]:
23.11.8 Conflicting commits
Finally, go through the process again, but this time, make changes which touch the same line.
* Pen y Fan
* Tryfan
* Snowdon
* Fan y Big
Overwriting Wales.md
In [42]: %%bash
git commit -am "Add another Beacon"
git push
To squelch this message and adopt the new behavior now, use:
git config --global push.default simple
See ’git help config’ and search for ’push.default’ for further information.
(the ’simple’ mode was introduced in Git 1.7.11. Use the similar mode
’current’ instead of ’simple’ if you sometimes use older versions of Git)
In [43]: os.chdir(partner_dir)
* Pen y Fan
* Tryfan
* Snowdon
* Glyder Fawr
Overwriting Wales.md
In [45]: %%bash
git commit -am "Add Glyder"
git push
To squelch this message and adopt the new behavior now, use:
See ’git help config’ and search for ’push.default’ for further information.
(the ’simple’ mode was introduced in Git 1.7.11. Use the similar mode
’current’ instead of ’simple’ if you sometimes use older versions of Git)
When you pull, instead of offering an automatic merge commit message, it says:
In [46]: %%bash
git pull
Already up-to-date.
In [47]: %%bash
cat Wales.md
Mountains In Wales
==================
* Pen y Fan
* Tryfan
* Snowdon
* Glyder Fawr
Manually edit the file to combine the changes as seems sensible, and get rid of the conflict-marker symbols
(<<<<<<<, ======= and >>>>>>>):
* Pen y Fan
* Tryfan
* Snowdon
* Glyder Fawr
* Fan y Big
Overwriting Wales.md
In [49]: %%bash
git commit -a --no-edit # I added a No-edit for this non-interactive session. You can edit the
In [50]: %%bash
git push
To squelch this message and adopt the new behavior now, use:
See ’git help config’ and search for ’push.default’ for further information.
(the ’simple’ mode was introduced in Git 1.7.11. Use the similar mode
’current’ instead of ’simple’ if you sometimes use older versions of Git)
In [51]: os.chdir(working_dir)
In [52]: %%bash
git pull
If you wish to set tracking information for this branch you can do so with:
In [53]: %%bash
cat Wales.md
Mountains In Wales
==================
* Pen y Fan
* Tryfan
* Snowdon
* Fan y Big
In [54]: %%bash
git log --oneline --graph
* c4189aa Add another Beacon
* 1e4351d Add a beacon
* 35228fd Add wales
* fa37aa5 Add Helvellyn
* 8eeb1aa Include lakes in the scope
* 9d7c86a Add lakeland
* b7b8924 Revert "Add a lie about a mountain"
* 201110e Change title
* 47f2a5c Add a lie about a mountain
* 1561abf First commit of discourse on UK topography
"""
wsd(message)
Out[55]:
wsd(message)
Out[56]:
23.12 Editing directly on GitHub
23.12.1 Editing directly on GitHub
Note that you can also make changes in the GitHub website itself. Visit one of your files, and hit “edit”.
Make a change in the edit window, and add an appropriate commit message.
That change now appears on the website, but not in your local copy. (Verify this).
Now pull, and check the change is now present on your local version.
You can inspect and clone NumPy’s code on GitHub at https://github.com/numpy/numpy, play around a
bit, and find out how to fix the bug.
NumPy has done so much for you, asking nothing in return, that you really want to contribute back by
fixing the bug for them.
You make all of the changes, but you can’t push them back to NumPy’s repository because you don’t have
permission.
The right way to do this is to fork NumPy’s repository.
1. Fork repository
You will see on the top right of the page a Fork button with an accompanying number indicating how many
GitHub users have forked that repository.
Collaborators need to navigate to the leader’s repository and click the Fork button.
Collaborators: note how GitHub has redirected you to your own GitHub page and you are now looking
at an exact copy of the team leader’s repository.
4. Make, commit and push changes to new branch
For example, let’s create a new file called SouthWest.md and edit it to add this text:
* Exmoor
* Dartmoor
* Bodmin Moor
Save it, and push these changes to your fork’s new branch:
7. Fixes by collaborator
Collaborators will be notified of this comment by email and also on their profile page. Click the link
accompanying this notification to read the comment from the team leader.
Go back to your local repository, make the changes suggested and push them to the new branch.
Add this at the beginning of your file:
git add .
git commit -m "Titles added as requested."
git push origin southwest
This change will automatically be added to the pull request you started.
8. Leader accepts pull request
The team leader will be notified of the new changes that can be reviewed in the same fashion as earlier.
Let’s assume the team leader is now happy with the changes.
Leaders can see in the “Conversation” tab of the pull request a green button labelled Merge pull
request. Click it and confirm the decision.
The collaborator’s pull request has been accepted and now appears in the original repository owned by
the team leader.
Fork and Pull Request done!
In [1]: import os
top_dir = os.getcwd()
git_dir = os.path.join(top_dir, 'learning_git')
working_dir = os.path.join(git_dir, 'git_example')
os.chdir(working_dir)
In [2]: %%bash
git log --graph --oneline
23.15.2 Git concepts
• Each revision has a parent that it is based on
• These revisions form a graph
• Each revision has a unique hash code
• In Sue’s copy, revision 43 is ab3578d6
• Jim might think that is revision 38, but it’s still ab3578d6
• Branches, tags, and HEAD are labels pointing at revisions
• Some operations (like fast forward merges) just move labels.
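These ideas can be modelled in a few lines of Python. The sketch below is a simplified, hypothetical model:

- commit IDs are SHA-1 hashes of just the parent ID and message (real Git hashes a full commit object including the tree, author and date);
- each commit has a single parent;
- branches are entries in a dictionary of labels.

It shows that a fast-forward merge creates no new commit; it only moves a label.

```python
import hashlib

commits = {}  # commit ID -> (parent ID or None, message)
labels = {}   # branch or tag name -> commit ID

def make_commit(parent, message):
    # Toy ID: SHA-1 of the parent ID and message only. Real Git hashes a
    # commit object that also records the tree, author, committer and date.
    sha = hashlib.sha1(f"{parent}\n{message}".encode()).hexdigest()
    commits[sha] = (parent, message)
    return sha

def ancestors(sha):
    # Follow parent links back to the root (single parents only here)
    chain = []
    while sha is not None:
        chain.append(sha)
        sha = commits[sha][0]
    return chain

def fast_forward(branch, target):
    # Possible only when the branch's commit is an ancestor of target;
    # no commit is created, the label simply moves forward.
    if labels[branch] not in ancestors(target):
        raise ValueError("not a fast-forward")
    labels[branch] = target

c1 = make_commit(None, "First commit")
c2 = make_commit(c1, "Change title")
labels["master"] = c1
labels["experiment"] = c2
fast_forward("master", labels["experiment"])
print(labels["master"] == labels["experiment"], len(commits))  # True 2
```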
Understanding all the things git reset can do requires a good grasp of git theory.
• git reset <commit> <filename>: Reset the staged (index) version of that file to the version in a given
commit; the working copy of the file is left unchanged
• git reset --soft <commit>: Move the local repository branch label to that commit, leaving the working
directory and index unchanged
• git reset <commit>: Move the local repository and index to that commit (“--mixed”, the default)
• git reset --hard <commit>: Move the local repository, index, and working directory copy to that state
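The three whole-commit modes differ only in how many levels they move. Here is a hypothetical toy model that makes this explicit. The commit IDs b7b8924 and 94dac44 come from the log above, but the file contents are simplified. The single-file form of `git reset` is not modelled.

```python
def reset(state, target, mode="mixed"):
    """Toy model of `git reset --soft/--mixed/--hard <commit>`.

    state: dict with keys "branch" (commit ID), "index" and "working"
    (filename -> content). target: (commit_id, files) to reset to.
    Returns a new state; hypothetical, not Git's implementation.
    """
    commit_id, files = target
    new = dict(state)
    new["branch"] = commit_id           # every mode moves the branch label
    if mode in ("mixed", "hard"):
        new["index"] = dict(files)      # --mixed (the default) also resets the index
    if mode == "hard":
        new["working"] = dict(files)    # --hard also overwrites the working copy
    return new


older = ("b7b8924", {"index.md": "England is not very mountainous."})
state = {
    "branch": "94dac44",
    "index": {"index.md": "Engerland is not very mountainous."},
    "working": {"index.md": "Engerland is not very mountainous."},
}
soft = reset(state, older, "soft")
mixed = reset(state, older)
hard = reset(state, older, "hard")
```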
23.16 Branches
Branches are incredibly important to why git is cool and powerful.
They are an easy and cheap way of making a second version of your software, which you work on in
parallel, and pull in your changes when you are ready.
In [1]: import os
top_dir = os.getcwd()
git_dir = os.path.join(top_dir, 'learning_git')
working_dir = os.path.join(git_dir, 'git_example')
os.chdir(working_dir)
In [2]: %%bash
git branch # Tell me what branches exist
* master
In [3]: %%bash
git checkout -b experiment # Make a new branch
In [4]: %%bash
git branch
* experiment
master
In [5]: %%bash
git commit -am "Add Cadair Idris"
On branch experiment
Untracked files:
wsd.py
wsd.pyc
* Pen y Fan
* Tryfan
* Snowdon
* Fan y Big
In [8]: %%bash
git checkout experiment
Switched to branch ’experiment’
In [9]: cat Wales.md
Mountains In Wales
==================
* Pen y Fan
* Tryfan
* Snowdon
* Fan y Big
In [11]: %%bash
git branch -r
origin/gh-pages
origin/master
Local branches can be, but do not have to be, connected to remote branches; a connected local branch is
said to “track” its remote branch. git push -u sets up the tracking relationship.
In [12]: %%bash
git branch -vv
* experiment c4189aa Add another Beacon
master c4189aa Add another Beacon
* Ben Eighe
* Cairngorm
* Aonach Eagach
Writing Scotland.md
In [18]: %%bash
git diff Scotland.md
In [19]: %%bash
git commit -am "Commit Aonach onto master branch"
On branch master
Untracked files:
Scotland.md
wsd.py
wsd.pyc
This notation is useful for showing which commits are on which branch:
In [20]: %%bash
git log --left-right --oneline master...experiment
Three dots means “everything which is not a common ancestor” of the two commits, i.e. the differences
between them.
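The three-dot range is a symmetric difference of two sets of ancestors. A small sketch on a hypothetical commit graph reproduces what `--left-right` shows. The names A, B, C, M1, E1 and E2 are made up, and each commit maps to its list of parents. `<` marks commits reachable only from the left-hand side; `>` marks commits reachable only from the right.

```python
# Hypothetical graph: each commit maps to the list of its parents
parents = {
    "A": [], "B": ["A"], "C": ["B"],   # shared history
    "M1": ["C"],                       # only on master
    "E1": ["C"], "E2": ["E1"],         # only on experiment
}

def reachable(start):
    # Every commit reachable from start by following parent links
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

def left_right(left, right):
    # Like `git log --left-right left...right`: commits reachable from
    # exactly one side, marked "<" (left only) or ">" (right only)
    l, r = reachable(left), reachable(right)
    return sorted("< " + c for c in l - r) + sorted("> " + c for c in r - l)

print(left_right("M1", "E2"))  # ['< M1', '> E1', '> E2']
```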
In [21]: %%bash
git branch
git merge experiment
experiment
* master
Already up-to-date.
In [22]: %%bash
git log --graph --oneline HEAD~3..HEAD
experiment
* master
In [24]: %%bash
git branch -d experiment
In [25]: %%bash
git branch
* master
In [26]: %%bash
git branch --remote
origin/gh-pages
origin/master
In [27]: %%bash
git push --delete origin experiment # Remove remote branch - also can use github interface
In [28]: %%bash
git branch --remote
origin/gh-pages
origin/master
In [1]: import os
top_dir = os.getcwd()
git_dir = os.path.join(top_dir, 'learning_git')
working_dir = os.path.join(git_dir, 'git_example')
os.chdir(working_dir)
In [2]: %%writefile Wales.md
Mountains In Wales
==================
* Pen y Fan
* Tryfan
* Snowdon
* Glyder Fawr
* Fan y Big
* Cadair Idris
Overwriting Wales.md
In [3]: %%bash
git stash
git pull
Saved working directory and index state WIP on master: c4189aa Add another Beacon
HEAD is now at c4189aa Add another Beacon
If you wish to set tracking information for this branch you can do so with:
In [4]: %%bash
git stash apply
On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: Wales.md
Untracked files:
(use "git add <file>..." to include in what will be committed)
Scotland.md
wsd.py
wsd.pyc
no changes added to commit (use "git add" and/or "git commit -a")
The “Stash” is a way of temporarily saving your working area, and can help out in a pinch.
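As a rough mental model, the stash is a stack of saved changes. The toy below is hypothetical and much cruder than Git, which stores stashes as commits and merges them back line by line. It shows the round trip used above: save the modified tracked files, pull, then re-apply. Untracked files are left alone, as `git stash` does by default.

```python
class ToyStash:
    """Hypothetical model of `git stash` / `git stash apply`.
    Files are dicts of filename -> content."""

    def __init__(self):
        self.entries = []  # newest last (Git calls the newest stash@{0})

    def save(self, working, head):
        # Remember tracked files whose content differs from the last commit...
        changed = {f: c for f, c in working.items() if f in head and head[f] != c}
        self.entries.append(changed)
        # ...then put those files back to their committed versions
        clean = dict(working)
        clean.update({f: head[f] for f in changed})
        return clean

    def apply(self, working):
        # Re-apply the newest entry. Git merges the changes line by line;
        # this toy overwrites whole files. The entry stays on the stack,
        # as with `git stash apply` (unlike `git stash pop`).
        result = dict(working)
        result.update(self.entries[-1])
        return result


head = {"Wales.md": "Pen y Fan, Tryfan"}
working = {"Wales.md": "Pen y Fan, Tryfan, Cadair Idris", "Scotland.md": "Ben Eighe"}
stash = ToyStash()
clean = stash.save(working, head)
after_pull = {**clean, "index.md": "pulled from remote"}  # pretend `git pull` added a file
restored = stash.apply(after_pull)
```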
23.18 Tagging
Tags are easy-to-read labels for revisions, and can be used anywhere we would name a commit.
Produce real results only with tagged revisions
In [5]: %%bash
git tag -a v1.0 -m "Release 1.0"
git push --tags
* Cross Fell
Writing Pennines.md
In [7]: %%bash
git add Pennines.md
git commit -am "Add Pennines"
In [8]: %%bash
git log v1.0.. --graph --oneline
MDS=$(wildcard *.md)
PDFS=$(MDS:.md=.pdf)
default: $(PDFS)
%.pdf: %.md
	pandoc $< -o $@
Writing Makefile
In [10]: %%bash
make
We now have a bunch of output .pdf files corresponding to each Markdown file.
But we don’t want those to show up in git:
In [11]: %%bash
git status
On branch master
Untracked files:
(use "git add <file>..." to include in what will be committed)
Makefile
Pennines.pdf
Scotland.md
Scotland.pdf
Wales.pdf
index.pdf
lakeland.pdf
wsd.py
wsd.pyc
nothing added to commit but untracked files present (use "git add" to track)
Use .gitignore files to tell Git not to pay attention to files with certain paths:
Writing .gitignore
In [13]: %%bash
git status
On branch master
Untracked files:
(use "git add <file>..." to include in what will be committed)
.gitignore
Makefile
Scotland.md
wsd.py
wsd.pyc
nothing added to commit but untracked files present (use "git add" to track)
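The contents of the `.gitignore` are not shown above. Judging from the status output, this sketch assumes it contains just `*.pdf`. Python's `fnmatch` gives a rough feel for how such shell-style patterns select files, but it only approximates Git's matching rules.

```python
from fnmatch import fnmatch

# Assumption: the .gitignore written above contains the single pattern "*.pdf"
ignore_patterns = ["*.pdf"]

def is_ignored(path, patterns=ignore_patterns):
    # Simplified matching: real .gitignore rules also cover directories,
    # "!" negation and patterns anchored with a leading "/"
    return any(fnmatch(path, pattern) for pattern in patterns)

untracked = ["Makefile", "Pennines.pdf", "Scotland.md", "Scotland.pdf",
             "Wales.pdf", "index.pdf", "lakeland.pdf", "wsd.py", "wsd.pyc"]
still_shown = [f for f in untracked if not is_ignored(f)]
print(still_shown)  # ['Makefile', 'Scotland.md', 'wsd.py', 'wsd.pyc']
```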
In [14]: %%bash
git add Makefile
git add .gitignore
git commit -am "Add a makefile and ignore generated files"
git push
[master b298c68] Add a makefile and ignore generated files
2 files changed, 9 insertions(+)
create mode 100644 .gitignore
create mode 100644 Makefile
warning: push.default is unset; its implicit value has changed in
Git 2.0 from ’matching’ to ’simple’. To squelch this message
and maintain the traditional behavior, use:
To squelch this message and adopt the new behavior now, use:
See ’git help config’ and search for ’push.default’ for further information.
(the ’simple’ mode was introduced in Git 1.7.11. Use the similar mode
’current’ instead of ’simple’ if you sometimes use older versions of Git)
Makefile
Pennines.md
Scotland.md
Wales.md
index.md
lakeland.md
wsd.py
wsd.pyc
23.21 Hunks
23.21.1 Git Hunks
A “hunk” is one contiguous block of changed lines in a diff. This changeset has three hunks:
+import matplotlib
+import numpy as np
+def increment_or_add(key,hash,weight=1):
+ if key not in hash:
+ hash[key]=0
+ hash[key]+=weight
+
data_path=os.path.join(os.path.dirname(
os.path.abspath(__file__)),
-regenerate=False
+regenerate=True
+import matplotlib
+import numpy as np
#Stage this hunk [y,n,a,d,/,j,J,g,e,?]?
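The “Stage this hunk” prompt comes from `git add -p`, which walks through a diff one hunk at a time. Python's `difflib` groups changes into hunks in a comparable way. This is a sketch on a made-up file, not Git's own diff algorithm. With 3 lines of context (Git's default), changes separated by more than twice that many unchanged lines end up in separate hunks.

```python
import difflib

# Hypothetical file: ten lines, changed at the very top and the very bottom
old = [f"line {i}\n" for i in range(1, 10)] + ["regenerate=False\n"]
new = ["import numpy as np\n"] + old[:-1] + ["regenerate=True\n"]

# Keep 3 lines of context; changes separated by more than 2*3 unchanged
# lines are split into separate hunks
matcher = difflib.SequenceMatcher(a=old, b=new)
hunks = list(matcher.get_grouped_opcodes(n=3))
for hunk in hunks:
    # Each opcode is (tag, i1, i2, j1, j2): old[i1:i2] becomes new[j1:j2]
    print([op for op in hunk if op[0] != "equal"])
```

This prints one insertion hunk at the top and one replacement hunk at the bottom.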
---
---
Add a pair of lines with three dashes to the top of each markdown file. This is how GitHub knows which
markdown files to make into web pages.
In [17]: %%writefile index.md
---
title: Github Pages Example
---
Mountains and Lakes in the UK
===================
Centralised                          Distributed
Server has history                   Every user has full history
Your computer has one snapshot       Many local branches
To access history, need internet     History always available
You commit to remote server          Users synchronise histories
cvs, subversion (svn)                git, mercurial (hg), bazaar (bzr)
With modern distributed systems, we can add a second remote. This might be a personal fork on GitHub:
In [1]: import os
top_dir = os.getcwd()
git_dir = os.path.join(top_dir, 'learning_git')
working_dir = os.path.join(git_dir, 'git_example')
os.chdir(working_dir)
In [2]: %%bash
git remote add jamespjh git@github.com:jamespjh/github-example.git
git remote -v
* Cross Fell
* Whernside
Overwriting Pennines.md
In [4]: %%bash
git commit -am "Add Whernside"
In [5]: %%bash
git push jamespjh
To squelch this message and adopt the new behavior now, use:
When push.default is set to ’matching’, git will push local branches
to the remote branches that already exist with the same name.
See ’git help config’ and search for ’push.default’ for further information.
(the ’simple’ mode was introduced in Git 1.7.11. Use the similar mode
’current’ instead of ’simple’ if you sometimes use older versions of Git)
In [6]: %%bash
git fetch
git log --oneline --left-right jamespjh/master...origin/master
fatal: ambiguous argument ’jamespjh/master...origin/master’: unknown revision or path not in the working
Use ’--’ to separate paths from revisions, like this:
’git <command> [<revision>...] -- [<file>...]’
In [7]: %%bash
git diff --name-only origin/master
Pennines.md
Scotland.md
index.md
When you reference remotes like this, you’re working with a cached copy of the last time you interacted
with the remote. You can do git fetch to update local data with the remotes without actually pulling. You
can also get useful information about whether tracking branches are ahead of or behind the remote branches
they track:
In [8]: %%bash
git branch -vv
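The “ahead” and “behind” counts that `git branch -vv` reports are set differences of ancestor sets. Because they are computed against the cached copy of the remote, they are only as fresh as your last fetch. A sketch on a made-up graph: the local branch is at `D`, the cached remote-tracking branch is at `R1`, and both share `A` and `B`.

```python
# Hypothetical graph: local branch at D, cached origin/master at R1
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "R1": ["B"]}

def reachable(start):
    # Every commit reachable from start by following parent links
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

def ahead_behind(local, remote):
    # ahead: commits only on the local side; behind: only on the remote side
    mine, theirs = reachable(local), reachable(remote)
    return len(mine - theirs), len(theirs - mine)

print(ahead_behind("D", "R1"))  # (2, 1)
```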
• Pushing to someone’s working copy is dangerous
• Use git init --bare to make a copy for pushing
• You don’t need to create a “server” as such; any ‘bare’ git repo will do.
In [10]: %%bash
mkdir -p bare_repo
cd bare_repo
git init --bare
In [11]: os.chdir(working_dir)
In [12]: %%bash
git remote add local_bare ../bare_repo
git push local_bare
To squelch this message and adopt the new behavior now, use:
See ’git help config’ and search for ’push.default’ for further information.
(the ’simple’ mode was introduced in Git 1.7.11. Use the similar mode
’current’ instead of ’simple’ if you sometimes use older versions of Git)
To ../bare_repo
* [new branch] gh-pages -> gh-pages
In [13]: %%bash
git remote -v
You can now work with this local repository, just as with any other git server. If you have a colleague
on a shared file system, you can use this approach to collaborate through that file system.
23.24.2 Home-made SSH servers
Classroom exercise: Try creating a server for yourself using a machine you can SSH to:
ssh <mymachine>
mkdir mygitserver
cd mygitserver
git init --bare
exit
git remote add <somename> ssh://user@host/~/mygitserver
git push -u <somename> master
23.26 Rebasing
23.26.1 Rebase vs merge
A git merge is only one of two ways to get someone else’s work into yours. The other is called a rebase.
In a merge, a revision is added, which brings the branches together. Both histories are retained. In a
rebase, git tries to work out
What would you need to have done, to make your changes, if your colleague had already made
theirs?
Git will invent some new revisions, and the result will be a repository with an apparently linear history.
On the “carollian” branch, a commit has been added translating the initial state into Lewis Carroll’s
language:
’Twas brillig,
and the slithy toves
git log --oneline --graph master
* 2a74d89 Dancing
* 6a4834d Initial state
If we now merge carollian into master, the final state will include both changes:
’Twas brillig,
and the slithy toves
danced and spun in the waves
But if we rebase, the final content of the file is still the same, but the graph is different:
* df618e0 Dancing
* 2232bf3 Translate into Caroll’s language
* 6a4834d Initial state
Updating 2232bf3..df618e0
Fast-forward
wocky.md | 1 +
1 file changed, 1 insertion(+)
The rebased branch was rebased on the carollian branch, so this merge was just a question of updating
metadata to redefine the branch label: a “fast forward”.
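Why does the “Dancing” commit get a new hash (2a74d89 before the rebase, df618e0 after)? A commit’s ID depends on its parent, so replaying the same change onto a different parent yields a different commit. That is the sense in which rebase rewrites history. A toy sketch with hypothetical hashing of two strings, not Git's commit format:

```python
import hashlib

def commit_id(parent, change):
    # Toy commit ID that depends on the parent, as real Git commit IDs do
    # (Git hashes a whole commit object; this only hashes two strings)
    return hashlib.sha1(f"{parent}:{change}".encode()).hexdigest()[:7]

def replay(base, changes):
    # Apply a sequence of changes on top of base, returning the new IDs
    ids, parent = [], base
    for change in changes:
        parent = commit_id(parent, change)
        ids.append(parent)
    return ids

initial = commit_id(None, "Initial state")
carollian = replay(initial, ["Translate into Caroll's language"])
master = replay(initial, ["Dancing"])

# Rebasing master onto carollian replays "Dancing" on a new parent
rebased = replay(carollian[-1], ["Dancing"])
linear = [initial] + carollian + rebased
print(linear)
```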
23.26.4 Rebasing pros and cons
Some people like the clean, apparently linear history that rebase provides.
But rebase rewrites history.
If you’ve already pushed, or anyone else has got your changes, things will get screwed up.
If you know your changes are still secret, it might be better to rebase to keep the history clean. If in
doubt, just merge.
23.27 Squashing
A second use of the git rebase command is to rebase your work on top of one of your own earlier commits,
in interactive mode, to “squash” several commits that should really be one:
git log
We can rewrite select commits to be merged, so that the history is neater before we push. This is a great
idea if you have lots of trivial typo commits.
Save the interactive rebase config file, and rebase will build a new history:
git log
de82 Some good work
fc52 A great piece of work
cd27 Initial commit
Note the commit hash codes for ‘Some good work’ and ‘A great piece of work’ have changed, as the
change they represent has changed.
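The effect of a squash-style todo list can be sketched as folding each `squash` entry into the commit before it, keeping both messages. This is a hypothetical model. The typo-fix commit messages are made up, since the original log is not shown in full.

```python
def squash(todo):
    """Toy model of an interactive-rebase todo list.
    Each entry is (action, message); "squash" folds a commit into the
    previous one and keeps both messages. Hypothetical sketch only."""
    result = []
    for action, message in todo:
        if action == "squash" and result:
            result[-1] = result[-1] + "\n" + message
        else:  # "pick": keep the commit as it is
            result.append(message)
    return result

todo = [
    ("pick", "Initial commit"),
    ("pick", "A great piece of work"),
    ("squash", "Fix typo"),           # hypothetical trivial commit
    ("pick", "Some good work"),
    ("squash", "Fix another typo"),   # hypothetical trivial commit
]
history = squash(todo)
print(len(history))  # 3
```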
git bisect
In [1]: import os
top_dir = os.getcwd()
git_dir = os.path.join(top_dir, 'learning_git')
os.chdir(git_dir)
In [2]: %%bash
rm -rf bisectdemo
git clone git@github.com:shawnsi/bisectdemo.git
In [3]: bisect_dir=os.path.join(git_dir,'bisectdemo')
os.chdir(bisect_dir)
In [4]: %%bash
python squares.py 2 # 4
This has been set up to break itself at a random commit, and leave you to use bisect to work out where
it has broken:
In [5]: %%bash
./breakme.sh > break_output
This will make a bunch of commits, one of which is broken, and leave you in the broken final state.
Bisecting manually
In [8]: %%bash
git bisect start
git bisect bad # We know the current state is broken
git checkout master
git bisect good # We know the master branch state is OK
Bisect needs one known good and one known bad commit to get started.
And eventually:
python squares.py 2
4
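Each good or bad answer halves the range of commits still in doubt: bisect is a binary search over the history. The following is a simplified sketch of that search, not git's implementation. It assumes a linear history ordered oldest to newest, in which every commit after the first broken one is also broken; `first_bad_commit` and `is_good` are hypothetical names.

```python
def first_bad_commit(commits, is_good):
    """Return the first bad commit in a linear history (simplified bisect sketch).

    Assumes commits are ordered oldest to newest, commits[0] is good,
    commits[-1] is bad, and every commit after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1  # indices known to be good and bad
    while bad - good > 1:
        middle = (good + bad) // 2
        if is_good(commits[middle]):   # like answering `git bisect good`
            good = middle
        else:                          # like answering `git bisect bad`
            bad = middle
    return commits[bad]


# Usage: 500 hypothetical commits, broken from commit 317 onwards.
history = list(range(500))
checked = []


def is_good(commit):
    checked.append(commit)  # record how many commits we had to test
    return commit < 317


culprit = first_bad_commit(history, is_good)
```

Each check halves the remaining range, so about nine checks are enough for 500 commits, instead of testing them one at a time.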
23.28.3 Solving automatically
If we have an appropriate unit test, we can do all this automatically:
In [9]: %%bash
git bisect start
git bisect bad HEAD # We know the current state is broken
git bisect good master # We know master is good
git bisect run python squares.py 2
Previous HEAD position was 719b7e5... Comment 499
Switched to branch ’buggy’
Traceback (most recent call last):
File "squares.py", line 9, in <module>
print(integer**2)
TypeError: unsupported operand type(s) for ** or pow(): ’str’ and ’int’
Traceback (most recent call last):
File "squares.py", line 9, in <module>
print(integer**2)
TypeError: unsupported operand type(s) for ** or pow(): ’str’ and ’int’
Traceback (most recent call last):
File "squares.py", line 9, in <module>
print(integer**2)
TypeError: unsupported operand type(s) for ** or pow(): ’str’ and ’int’
Traceback (most recent call last):
File "squares.py", line 9, in <module>
print(integer**2)
TypeError: unsupported operand type(s) for ** or pow(): ’str’ and ’int’
Traceback (most recent call last):
File "squares.py", line 9, in <module>
print(integer**2)
TypeError: unsupported operand type(s) for ** or pow(): ’str’ and ’int’
Boom!
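`git bisect run` decides good or bad from the exit status of the command it runs. An exit status of 0 means good, 125 means this commit cannot be tested and should be skipped, and any other code from 1 to 127 means bad. Running `python squares.py 2` directly therefore only flags commits that crash. A hypothetical wrapper like the sketch below also marks a commit bad when the output is wrong. The script name, argument and expected output are assumptions taken from the squares.py example.

```python
import subprocess
import sys

GOOD, BAD = 0, 1  # exit codes understood by `git bisect run`


def bisect_exit_code(script, argument, expected_output):
    """Run `python script argument` and classify the current commit for git bisect."""
    result = subprocess.run([sys.executable, script, argument],
                            capture_output=True, text=True)
    if result.returncode != 0:
        return BAD  # the script crashed, e.g. with the TypeError shown above
    if result.stdout.strip() != expected_output:
        return BAD  # it ran, but printed the wrong answer
    return GOOD
```

Suppose this is saved as a hypothetical check_squares.py that ends with `sys.exit(bisect_exit_code("squares.py", "2", "4"))`. It could then be used as `git bisect run python check_squares.py`.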
23.29 Testing
23.29.1 A few reasons not to do testing
Sensibility                          Sense
It’s boring                          Maybe
Code is just a one-off throwaway     As with most research codes
No time for it                       A bit more code, a lot less debugging
Tests can be buggy too               See above
Not a professional programmer        See above
Will do it later                     See above
• . . . if the test cases cover the bugs
setup input
run program
read output
check output against expected result
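A small test harness automates those four steps. The sketch below is minimal. It assumes the program under test is a plain Python function (here a hypothetical `square`) and that each test case is an (input, expected output) pair.

```python
def square(x):
    """Hypothetical program under test."""
    return x * x


def run_cases(program, cases):
    """Run program on each input; return (input, expected, actual) for every mismatch."""
    failures = []
    for test_input, expected in cases:   # set up input
        actual = program(test_input)     # run program and read output
        if actual != expected:           # check output against expected result
            failures.append((test_input, expected, actual))
    return failures
```

An empty list means every case passed. Each entry in a non-empty list records exactly which input went wrong and how.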
Chapter 24
How to Test
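import matplotlib.pyplot as plt
import matplotlib.patches as patches
from matplotlib.path import Path

def show_fields(field1, field2):
    def vertices(left, bottom, right, top):
        # Corners of the rectangle, returning to the start to close the outline
        return [(left, bottom), (left, top), (right, top), (right, bottom), (left, bottom)]

    codes = [Path.MOVETO,
             Path.LINETO,
             Path.LINETO,
             Path.LINETO,
             Path.CLOSEPOLY]
    path1 = Path(vertices(*field1), codes)
    path2 = Path(vertices(*field2), codes)
    fig = plt.figure()
    ax = fig.add_subplot(111)
    patch1 = patches.PathPatch(path1, facecolor='orange', lw=2)
    patch2 = patches.PathPatch(path2, facecolor='blue', lw=2)
    ax.add_patch(patch1)
    ax.add_patch(patch2)
    ax.set_xlim(0,5)
    ax.set_ylim(0,5)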
show_fields((1.,1.,4.,4.),(2.,2.,3.,3.))
Here, we can see that the area of overlap is the same as the smaller field, with area 1.
We could now go ahead and write a subroutine to calculate that, and also write some test cases for our
answer.
But first, let’s consider the question abstractly: what other cases, not equivalent to this one, might there
be?
For example, this case is still just a full overlap, and is sufficiently equivalent that it’s not worth another
test:
In [3]: show_fields((1.,1.,4.,4.),(2.5,1.7,3.2,3.4))
But this case is no longer a full overlap, and should be tested separately:
In [4]: show_fields((1.,1.,4.,4.),(2.,2.,3.,4.5))
On a piece of paper, now sketch the other cases you think should be treated as non-equivalent. The
answers are in a separate notebook.
In [5]: show_fields((1.,1.,4.,4.),(2,2,4.5,4.5)) # Overlap corner
In [8]: show_fields((1.,1.,4.,4.),(2.5,4,3.5,4.5)) # Just touching from outside
24.1 Using our tests
OK, so how might our tests be useful?
Here’s some code that might correctly calculate the area of overlap:
In [11]: overlap((1.,1.,4.,4.),(2.,2.,3.,3.))
Out[11]: 1.0
In [14]: assert overlap((1.,1.,4.,4.),(2.,2.,4.5,4.5)) == 4.0
---------------------------------------------------------------------------
<ipython-input-15-9b6bffd116ce> in <module>()
----> 1 assert overlap((1.,1.,4.,4.),(4.5,4.5,5,5)) == 0.0
AssertionError:
0.25
In [17]: overlap_left=4.5
overlap_right=4
overlap_width=-0.5
overlap_height=-0.5
Both width and height are negative, resulting in a positive area. The above code didn’t take into account
the non-overlap correctly.
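It should be:
def overlap(field1, field2):
    # Fields are (left, bottom, right, top), as in show_fields
    left1, bottom1, right1, top1 = field1
    left2, bottom2, right2, top2 = field2
    overlap_left=max(left1, left2)
    overlap_bottom=max(bottom1, bottom2)
    overlap_right=min(right1, right2)
    overlap_top=min(top1, top2)
    # Clip at zero: separated fields have no overlap, not a negative width or height
    overlap_height=max(0, (overlap_top-overlap_bottom))
    overlap_width=max(0, (overlap_right-overlap_left))
    return overlap_height*overlap_width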
Note that we reran our other tests, to check that our fix didn’t break anything else. (We call that “fallout”.)
24.1.1 Boundary cases
“Boundary cases” are an important area to test:
• Limit between two equivalence classes: edge and corner sharing fields
• Wherever indices appear, check values at 0, N, N+1
• Empty arrays:
Bad input should be expected and should fail early and explicitly.
Testing should ensure that explicit failures do indeed happen.
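For the overlap problem, the boundary between "partial overlap" and "no overlap" is formed by fields that share an edge or a corner. The sketch below shows such boundary tests. It assumes fields are given as (left, bottom, right, top), as in the show_fields examples, and it uses a compact, clipped version of the overlap calculation so that it stands alone.

```python
def overlap(field1, field2):
    """Area shared by two axis-aligned fields given as (left, bottom, right, top)."""
    left1, bottom1, right1, top1 = field1
    left2, bottom2, right2, top2 = field2
    # Clip at zero so that separated fields give zero, not a spurious positive area
    width = max(0, min(right1, right2) - max(left1, left2))
    height = max(0, min(top1, top2) - max(bottom1, bottom2))
    return width * height


big = (1., 1., 4., 4.)
boundary_cases = {
    "shares part of an edge": (4., 2., 5., 3.),
    "shares only a corner": (4., 4., 5., 5.),
    "just touching from outside": (2.5, 4., 3.5, 4.5),
}
for name, field in boundary_cases.items():
    area = overlap(big, field)
    assert area == 0, (name, area)  # touching is not overlapping
```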
# Do something
In [21]: I_only_accept_positive_numbers(5)
In [22]: I_only_accept_positive_numbers(-5)
---------------------------------------------------------------------------
<ipython-input-22-e283d4657e88> in <module>()
----> 1 I_only_accept_positive_numbers(-5)
There are standard exception types, such as ValueError, that we can raise.
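The body of I_only_accept_positive_numbers is not shown here. The sketch below assumes it only needs to reject non-positive input, so it raises ValueError before doing any work. A hand-written helper (the hypothetical `raises_value_error`) then checks that the explicit failure really happens.

```python
def I_only_accept_positive_numbers(number):
    # Hypothetical body: fail early and explicitly on bad input
    if number <= 0:
        raise ValueError("Input " + str(number) + " is not positive")
    # Do something (stand-in: return the input unchanged)
    return number


def raises_value_error(function, argument):
    """Return True if function(argument) raises ValueError."""
    try:
        function(argument)
    except ValueError:
        return True
    return False
```

Hand-written try/except checks like this quickly become tedious. Test frameworks provide assert_raises-style helpers that replace them.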
We would like to be able to write tests like this:
But to do that, we need to learn about more sophisticated testing tools, called “test frameworks”.
• C++ unit-tests:
– CppTest,
– Boost::Test,
– google-test,
– Catch (best)
• Python unit-tests:
– unittest, which comes with the Python standard library
– nose, which extends unittest
– py.test, a separate third-party framework
• R unit-tests:
– RUnit,
– svUnit (works with SciViews GUI)
• Fortran unit-tests:
– funit,
– pfunit (works with MPI)
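Because unittest ships with Python, it is the easiest of these frameworks to try first. The sketch below is a minimal test case. It uses a hypothetical `mean` function so that the example stands alone, and it includes an empty-input test that checks for an explicit failure at a boundary case.

```python
import unittest


def mean(values):
    """Hypothetical function under test."""
    if len(values) == 0:
        raise ValueError("mean() of an empty sequence is undefined")
    return sum(values) / len(values)


class TestMean(unittest.TestCase):
    def test_single_value(self):
        self.assertEqual(mean([3.0]), 3.0)

    def test_several_values(self):
        # Floating-point result, so compare approximately
        self.assertAlmostEqual(mean([0.1, 0.2, 0.3]), 0.2)

    def test_empty_input_fails_explicitly(self):
        with self.assertRaises(ValueError):
            mean([])
```

If this were saved as a file with the hypothetical name test_mean.py, it could be run with `python -m unittest test_mean`.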
But the real power comes when we write a test file alongside our code files in our homemade packages:
In [26]: %%bash
mkdir -p saskatchewan
touch saskatchewan/__init__.py
def overlap(field1, field2):
    left1, bottom1, right1, top1 = field1
    left2, bottom2, right2, top2 = field2
    overlap_left=max(left1, left2)
    overlap_bottom=max(bottom1, bottom2)
    overlap_right=min(right1, right2)
    overlap_top=min(top1, top2)
    # Here's our wrong code again
    overlap_height=(overlap_top-overlap_bottom)
    overlap_width=(overlap_right-overlap_left)
    return overlap_height*overlap_width
Writing saskatchewan/overlap.py
from nose.tools import assert_equal
from .overlap import overlap

def test_full_overlap():
    assert_equal(overlap((1.,1.,4.,4.),(2.,2.,3.,3.)), 1.0)

def test_partial_overlap():
    assert_equal(overlap((1,1,4,4),(2,2,3,4.5)), 2.0)

def test_no_overlap():
    assert_equal(overlap((1,1,4,4),(4.5,4.5,5,5)), 0.0)
In [29]: %%bash
cd saskatchewan
nosetests
..F
======================================================================
FAIL: saskatchewan.test_overlap.test_no_overlap
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/Users/ccsprsd/jenkins/development/workspace/engineering-publisher/ch03tests/saskatchewan/test o
assert_equal(overlap((1,1,4,4),(4.5,4.5,5,5)), 0.0)
AssertionError: 0.25 != 0.0
----------------------------------------------------------------------
Ran 3 tests in 0.012s
FAILED (failures=1)
Note that it reported which test had failed, how many tests ran, and how many failed.
The symbol ..F means there were three tests, of which the third one failed.
Nose will:
Some options:
Out[30]: 2.220446049250313e-13
Both results are wrong: 2e-13 is the correct answer.
The size of the error will depend on the magnitude of the floating points:
Out[31]: 1.4901161193847656e-08
Or relative:
---------------------------------------------------------------------------
<ipython-input-33-192d51bb43fc> in <module>()
1 from nose.tools import assert_almost_equal
2 magnitude = 0.7
----> 3 assert_almost_equal(0.7, 0.7 + 1e-5, delta = magnitude * 1e-5)
/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittes
561 places)
562 msg = self._formatMessage(msg, standardMsg)
--> 563 raise self.failureException(msg)
564
565 def assertNotAlmostEqual(self, first, second, places=None, msg=None, delta=None):
Here, magnitude should be chosen based on the intrinsic scale of the calculations.
For instance, if the calculation is the result of differences between large numbers:
Out[34]: 0.625
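The sketch below shows why the tolerance should follow the scale of the inputs rather than the scale of the result; the numbers are chosen only for illustration. When two large, nearly equal numbers are subtracted, the rounding error is set by the large numbers. A relative tolerance measured against the small result is therefore far too strict. An absolute tolerance of the order of machine epsilon times the input magnitude is appropriate instead.

```python
import math
import sys

large = 1e8
difference = (large + 0.1) - large   # mathematically 0.1

# Exact comparison fails because of rounding in the large intermediate sum
exact_ok = difference == 0.1

# Tolerance relative to the small result: too strict for a cancellation like this
relative_ok = math.isclose(difference, 0.1, rel_tol=1e-9)

# Tolerance scaled to the magnitude of the inputs (a few machine epsilons of 1e8)
scale_tolerance = large * sys.float_info.epsilon * 10
scaled_ok = math.isclose(difference, 0.1, rel_tol=0.0, abs_tol=scale_tolerance)
```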
24.5.3 Comparing vectors of floating points
Numerical vectors are best represented using numpy.
Numpy ships with a number of assertions (in numpy.testing) to make comparison easy:
It compares the difference between actual and expected to atol + rtol * abs(expected).
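This is the rule used by numpy.testing.assert_allclose, whose rtol and atol arguments set the two tolerances. The short sketch below uses made-up arrays. It shows that the same absolute error can be acceptable for large entries yet too big for small ones; `passes` is a hypothetical helper.

```python
import numpy as np
from numpy.testing import assert_allclose


def passes(actual, expected, rtol, atol):
    """True if assert_allclose accepts actual, i.e. element-wise
    |actual - expected| <= atol + rtol * |expected|."""
    try:
        assert_allclose(actual, expected, rtol=rtol, atol=atol)
    except AssertionError:
        return False
    return True


expected = np.array([1.0, 100.0, 1e6])
tiny_relative_error = expected * (1 + 1e-9)
fixed_absolute_error = expected + 1e-3

print(passes(tiny_relative_error, expected, rtol=1e-7, atol=0))      # within rtol everywhere
print(passes(fixed_absolute_error, expected, rtol=1e-7, atol=0))     # 1e-3 too big for 1.0 and 100.0
print(passes(fixed_absolute_error, expected, rtol=1e-7, atol=2e-3))  # atol absorbs the error
```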
Implementation:
Here, the total energy due to position 2 is 3(3 − 1) = 6, and due to column 7 is 1(1 − 1) = 0. We need
to sum these to get the total energy.
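The worked terms correspond to a total energy E = sum over i of n_i * (n_i - 1), where n_i is the number of particles at position i. The sketch below checks that arithmetic. It assumes the example density [0, 0, 3, 5, 8, 4, 2, 1] that the exercise plots, and it deliberately leaves out the input validation the exercise asks for; `energy_sketch` is a hypothetical name.

```python
import numpy as np


def energy_sketch(density):
    """Sum of n * (n - 1) over all positions (no input validation)."""
    density = np.asarray(density)
    return int(np.sum(density * (density - 1)))


density = [0, 0, 3, 5, 8, 4, 2, 1]
per_position = [n * (n - 1) for n in density]  # contribution of each position
total = energy_sketch(density)
```

Positions holding zero or one particle contribute nothing, so only crowded positions raise the energy.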
In [38]: %%bash
mkdir -p diffusion
touch diffusion/__init__.py
Parameters
----------
Writing diffusion/model.py
In [40]: %%writefile diffusion/test_model.py
from .model import energy

def test_energy():
    """ Optional description for nose reporting """
    # Test something
In [41]: %%bash
cd diffusion
nosetests
.
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK
Now, write your code (in model.py), and tests (in test_model.py), testing as you do.
24.6.3 Solution
Don’t look until after class!
def energy(density):
    """ Energy associated with the diffusion model
        :Parameters:
        density: array of positive integers
            Number of particles at each position i in the array/geometry
    """
    from numpy import array, any, sum
    # Make sure input is an array
    density = array(density)
    # ...of the right kind (integer). Unless it is zero length, in which case type does not matter
    if density.dtype.kind != 'i' and len(density) > 0:
        raise TypeError("Density should be a array of *integers*.")
    # and the right values (positive or null)
    if any(density < 0):
        raise ValueError("Density should be an array of *positive* integers.")
    if density.ndim != 1:
        raise ValueError("Density should be an a *1-dimensional* array of positive integers.")
    # Total energy: sum over positions of n_i * (n_i - 1)
    return sum(density * (density - 1))
Overwriting diffusion/model.py
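from nose.tools import assert_raises, assert_almost_equal
from .model import energy

def test_energy_fails_on_non_integer_density():
    with assert_raises(TypeError) as exception:
        energy([1.0, 2, 3])

def test_energy_fails_on_negative_density():
    with assert_raises(ValueError) as exception: energy([-1, 2, 3])

def test_energy_fails_ndimensional_density():
    with assert_raises(ValueError) as exception: energy([[1, 2, 3], [3, 4, 5]])

def test_zero_energy_cases():
    # Zero energy at zero density
    densities = [ [], [0], [0, 0, 0] ]
    for density in densities:
        assert_almost_equal(energy(density), 0)

def test_derivative():
    from numpy.random import randint
    # Loop over vectors of different sizes (but not empty)
    for vector_size in randint(1, 1000, size=30):
        # Random density of that size, and a random position to perturb
        density = randint(50, size=vector_size)
        element_index = randint(vector_size)
        # modified densities
        density_plus_one = density.copy()
        density_plus_one[element_index] += 1
        # E(n + 1) - E(n) = (n + 1) n - n (n - 1) = 2 n
        expected = 2.0 * density[element_index]
        actual = energy(density_plus_one) - energy(density)
        assert_almost_equal(expected, actual)

def test_derivative_no_self_energy():
    """ If particle is alone, then its participation to energy is zero """
    from numpy import array
    density = array([1, 0, 1, 10, 15, 0])
    density_plus_one = density.copy()
    density_plus_one[1] += 1   # a lone particle at an empty position
    expected = 0
    actual = energy(density_plus_one) - energy(density)
    assert_almost_equal(expected, actual)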
In [44]: %%bash
cd diffusion
nosetests
...
----------------------------------------------------------------------
Ran 6 tests in 0.052s
OK
24.6.4 Coverage
1. Comment out from exception tests in solution
2. In the solution directory, run:
In [45]: %%bash
cd diffusion
nosetests --with-coverage --cover-package=diffusion.model -v --cover-html
----------------------------------------------------------------------
Ran 6 tests in 0.053s
OK
24.7 Mocking
24.7.1 Definition
Mock: verb,
Mocking
Stub routine
• A routine that simulates a more computationally expensive routine, without actually performing any
calculation. Strictly speaking, the term Mocking is reserved for object-oriented approaches.
24.7.3 Recording calls with mock
Mock objects record the calls made to them:
In [46]: from mock import Mock
function = Mock(name="myroutine", return_value=2)
function(1)
function(5, "hello", a=True)
function.mock_calls
Out[46]: [call(1), call(5, ’hello’, a=True)]
The arguments of each call can be recovered
In [47]: name, args, kwargs = function.mock_calls[1]
args, kwargs
Out[47]: ((5, ’hello’), {’a’: True})
Mock objects can return different values for each call
In [48]: function = Mock(name="myroutine", side_effect=[2, "xyz"])
In [49]: function(1)
Out[49]: 2
In [50]: function(1, "hello", {'a': True})
Out[50]: ’xyz’
In [51]: function()
---------------------------------------------------------------------------
<ipython-input-51-2fcbbbc1fe81> in <module>()
----> 1 function()
StopIteration:
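The mock package used here became part of the Python 3 standard library as unittest.mock, with the same Mock class. The sketch below repeats the same calls using the standard-library import. It also shows the call_count and call_args shortcuts.

```python
from unittest.mock import Mock, call

function = Mock(name="myroutine", return_value=2)
first_result = function(1)
function(5, "hello", a=True)

# Every call is recorded, in order
recorded = function.mock_calls
name, args, kwargs = function.mock_calls[1]
latest = function.call_args  # arguments of the most recent call

# side_effect hands out one value per call, then raises StopIteration
sequence = Mock(name="myroutine", side_effect=[2, "xyz"])
values = [sequence(1), sequence(1, "hello", {"a": True})]
try:
    sequence()
    exhausted = False
except StopIteration:
    exhausted = True
```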
24.8 Using mocks to model test resources
Often we want to write tests for code which interacts with remote resources. (E.g. databases, the internet,
or data files.)
We don’t want to have our tests actually interact with the remote resource, as this would mean our tests
failed due to lost internet connections, for example.
Instead, we can use mocks to assert that our code does the right thing in terms of the messages it sends:
the parameters of the function calls it makes to the remote resource.
For example, consider the following code that downloads a map from the internet:
import requests

def map_at(lat, long, satellite=False, zoom=12, size=(400,400), sensor=False):
    base="http://maps.googleapis.com/maps/api/staticmap?"
    params=dict(
        sensor= str(sensor).lower(),
        zoom= zoom,
        size= "x".join(map(str,size)),
        center= ",".join(map(str,(lat,long))),
        style="feature:all|element:labels|visibility:off")
    if satellite:
        params["maptype"]="satellite"
    return requests.get(base,params=params)
Out[54]:
We would like to test that it is building the parameters correctly. We can do this by mocking the
requests object. We need to temporarily replace a method in the library with a mock. We can use “patch”
to do this:
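from mock import patch, ANY
import requests

def test_build_default_params():
    with patch.object(requests, 'get') as mock_get:
        default_map = map_at(51.0, 0.0)
        # ANY matches the base URL; we only check the parameters that were built
        mock_get.assert_called_with(
            ANY,
            params={
                'sensor':'false',
                'zoom':12,
                'size':'400x400',
                'center':'51.0,0.0',
                'style':'feature:all|element:labels|visibility:off'
            }
        )

test_build_default_params()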
That was quiet, so it passed. When I’m writing tests, I usually modify one of the expectations, to
something ‘wrong’, just to check it’s not passing “by accident”, run the tests, then change it back!
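The same pattern works with the standard library alone. In the sketch below, a hypothetical `fetch` function builds a query URL and opens it with urllib.request.urlopen. The test replaces urlopen using unittest.mock.patch, so no network access happens, and then checks which URL would have been requested. The base URL and parameters are made up for illustration.

```python
import urllib.parse
import urllib.request
from unittest.mock import patch


def fetch(base, **params):
    """Hypothetical client: build a query string and download the response body."""
    url = base + urllib.parse.urlencode(sorted(params.items()))
    with urllib.request.urlopen(url) as response:
        return response.read()


def test_fetch_builds_expected_url():
    with patch("urllib.request.urlopen") as mock_urlopen:
        # Make the fake response usable as a context manager whose read() gives b"fake"
        mock_urlopen.return_value.__enter__.return_value.read.return_value = b"fake"
        body = fetch("http://example.com/api?", zoom=12, size="400x400")
    mock_urlopen.assert_called_once_with("http://example.com/api?size=400x400&zoom=12")
    assert body == b"fake"


test_fetch_builds_expected_url()
```

The patch is undone when the with block ends, so later code sees the real urlopen again.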
We want to test that the above function does the right thing. It is supposed to compute the derivative
of a function of a vector in a particular direction.
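The function under test is not reproduced in this extract. The sketch below is a minimal forward-difference version consistent with the test that follows. It assumes the signature partial_derivative(function, at, direction, delta=1.0); the delta default in particular is an assumption.

```python
def partial_derivative(function, at, direction, delta=1.0):
    """Forward-difference estimate of d(function)/d(x[direction]) at the point `at`."""
    f_x = function(at)
    x_plus_delta = at[:]              # copy, so the caller's list is not modified
    x_plus_delta[direction] += delta
    f_x_plus_delta = function(x_plus_delta)
    return (f_x_plus_delta - f_x) / delta


slope = partial_derivative(sum, [0, 0, 0], 1)
```

With a float delta, the perturbed entry of an integer list becomes a float. A function that insists on integer input would then reject it.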
E.g.:
Out[58]: 1.0
How do we assert that it is doing the right thing? With tests like this:
from mock import MagicMock

def test_derivative_2d_y_direction():
    func=MagicMock()
    partial_derivative(func, [0,0], 1)
    func.assert_any_call([0, 1.0])
    func.assert_any_call([0, 0])

test_derivative_2d_y_direction()
We made our mock a “Magic Mock” because otherwise, the mock results f_x_plus_delta and f_x can’t
be subtracted:
In [60]: MagicMock()-MagicMock()
In [61]: Mock()-Mock()
---------------------------------------------------------------------------
<ipython-input-61-fca1dbe33378> in <module>()
----> 1 Mock()-Mock()
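The difference is that MagicMock preconfigures Python's special methods, such as `__sub__`, while a plain Mock does not support them. The small sketch below uses the standard-library unittest.mock; `subtract` is a hypothetical helper.

```python
from unittest.mock import MagicMock, Mock


def subtract(a, b):
    return a - b


magic_difference = subtract(MagicMock(), MagicMock())  # returns another MagicMock

try:
    subtract(Mock(), Mock())
    plain_mock_subtracts = True
except TypeError:  # unsupported operand type(s) for -: 'Mock' and 'Mock'
    plain_mock_subtracts = False
```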
The python debugger is a python shell: it can print and compute values, and even change the values
of the variables at that point in the program.
24.9.4 Breakpoints
Breakpoints tell the debugger where and when to stop. We say b somefunctionname
The debugger is, of course, most used interactively, but here I’m showing a prewritten debugger script:
In [63]: %%writefile commands
restart # restart session
n
b energy # program will stop when entering energy
c # continue program until break point is reached
print density # We are now "inside" the energy function and can print any variable.
Writing commands
In [64]: %%bash
python -m pdb energy_example.py < commands
Alternatively, break-points can be set on files: b file.py:20 will stop on line 20 of file.py.
24.9.5 Post-mortem
Debugging when something goes wrong:
In [65]: %pdb on
from diffusion.model import energy
partial_derivative(energy,[5,6,7,8,0,1],5)
---------------------------------------------------------------------------
<ipython-input-65-cd06eca5c4a7> in <module>()
1 get_ipython().magic(u'pdb on')
2 from diffusion.model import energy
----> 3 partial_derivative(energy,[5,6,7,8,0,1],5)
/Users/ccsprsd/jenkins/development/workspace/engineering-publisher/ch03tests/diffusion/model.pyc
14 # ...of the right kind (integer). Unless it is zero length, in which case type does not ma
15 if density.dtype.kind != 'i' and len(density) > 0:
---> 16 raise TypeError("Density should be a array of *integers*.")
17 # and the right values (positive or null)
18 if any(density < 0):
> /Users/ccsprsd/jenkins/development/workspace/engineering-publisher/ch03tests/diffusion/model.py(16)ene
15 if density.dtype.kind != 'i' and len(density) > 0:
---> 16 raise TypeError("Density should be a array of *integers*.")
17 # and the right values (positive or null)
---------------------------------------------------------------------------
/usr/local/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in showtraceback(self,
1854 if self.call_pdb:
1855 # drop into debugger
-> 1856 self.debugger(force=True)
1857 return
1858
/usr/local/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in <lambda>()
1017 else:
1018 # fallback to our internal debugger
-> 1019 pm = lambda : self.InteractiveTB.debugger(force=True)
1020
1021 with self.readline_no_record:
/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pdb.pyc
208 self.setup(frame, traceback)
209 self.print_stack_entry(self.stack[self.curindex])
--> 210 self.cmdloop()
211 self.forget()
212
/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/cmd.pyc
128 if self.use_rawinput:
129 try:
--> 130 line = raw_input(self.prompt)
131 except EOFError:
132 line = 'EOF'
StdinNotImplementedError: raw_input was called, but this frontend does not support input request
Traceback (most recent call last):
StdinNotImplementedError: raw_input was called, but this frontend does not support input request
24.10 Jenkins
24.10.1 Test servers
Goal:
24.12 Solution
We need to break our problem down into pieces:
24.13 Testing frameworks
24.13.1 Why use testing frameworks?
Frameworks should simplify our lives:
• C unit-tests:
• C++ unit-tests:
– CppTest,
– Boost::Test,
– google-test,
– Catch (best)
• Python unit-tests:
• R unit-tests:
– RUnit,
– svUnit (works with SciViews GUI)
• Fortran unit-tests:
– funit,
– pfunit (works with MPI)
24.13.3 Nose framework: usage
nose is a Python testing framework.
We can use its tools in the notebook for on-the-fly tests. This, happily, includes the
negative-tests example we were looking for a moment ago.
---------------------------------------------------------------------------
<ipython-input-2-ffe48ded21bf> in <module>()
1 with assert_raises(ValueError):
----> 2 assert I_only_accept_positive_numbers(-5)
But the real power comes when we write a test file alongside our code files in our homemade packages:
In [3]: %%bash
mkdir -p saskatchewan
touch saskatchewan/__init__.py
def overlap(field1, field2):
    left1, bottom1, right1, top1 = field1
    left2, bottom2, right2, top2 = field2
    overlap_left=max(left1, left2)
    overlap_bottom=max(bottom1, bottom2)
    overlap_right=min(right1, right2)
    overlap_top=min(top1, top2)
    # Here's our wrong code again
    overlap_height=(overlap_top-overlap_bottom)
    overlap_width=(overlap_right-overlap_left)
    return overlap_height*overlap_width
Overwriting saskatchewan/overlap.py
from nose.tools import assert_equal
from .overlap import overlap

def test_full_overlap():
    assert_equal(overlap((1.,1.,4.,4.),(2.,2.,3.,3.)), 1.0)

def test_partial_overlap():
    assert_equal(overlap((1,1,4,4),(2,2,3,4.5)), 2.0)

def test_no_overlap():
    assert_equal(overlap((1,1,4,4),(4.5,4.5,5,5)), 0.0)
In [6]: %%bash
cd saskatchewan
nosetests
..F
======================================================================
FAIL: saskatchewan.test_overlap.test_no_overlap
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/Users/ccsprsd/jenkins/development/workspace/engineering-publisher/ch03tests/saskatchewan/test o
assert_equal(overlap((1,1,4,4),(4.5,4.5,5,5)), 0.0)
AssertionError: 0.25 != 0.0
----------------------------------------------------------------------
Ran 3 tests in 0.001s
FAILED (failures=1)
Note that it reported which test had failed, how many tests ran, and how many failed.
The symbol ..F means there were three tests, of which the third one failed.
Nose will:
Some options:
Out[7]: 2.220446049250313e-13
Both results are wrong: 2e-13 is the correct answer.
The size of the error will depend on the magnitude of the floating points:
Out[8]: 1.4901161193847656e-08
Or relative:
---------------------------------------------------------------------------
<ipython-input-10-192d51bb43fc> in <module>()
1 from nose.tools import assert_almost_equal
2 magnitude = 0.7
----> 3 assert_almost_equal(0.7, 0.7 + 1e-5, delta = magnitude * 1e-5)
/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittes
561 places)
562 msg = self._formatMessage(msg, standardMsg)
--> 563 raise self.failureException(msg)
564
565 def assertNotAlmostEqual(self, first, second, places=None, msg=None, delta=None):
Here, magnitude should be chosen based on the intrinsic scale of the calculations.
For instance, if the calculation is the result of differences between large numbers:
Out[11]: 0.625
24.14.3 Comparing vectors of floating points
Numerical vectors are best represented using numpy.
Numpy ships with a number of assertions (in numpy.testing) to make comparison easy:
Implementation:
---------------------------------------------------------------------------
<ipython-input-1-b05039655391> in <module>()
1 import numpy as np
2 density = np.array([0, 0, 3, 5, 8, 4, 2, 1])
----> 3 fig, ax = plt.subplots()
4 ax.bar(np.arange(len(density))-0.5, density)
5 ax.xrange=[-0.5, len(density)-0.5]
Here, the total energy due to position 2 is 3(3 − 1) = 6, and due to column 7 is 1(1 − 1) = 0. We need
to sum these to get the total energy.
In [2]: %%bash
mkdir -p diffusion
touch diffusion/__init__.py
Parameters
----------
Overwriting diffusion/model.py
In [5]: %%bash
cd diffusion
nosetests
.
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK
Now, write your code (in model.py), and tests (in test_model.py), testing as you do.
24.15.3 Solution
Don’t look until after class!
def energy(density):
    """ Energy associated with the diffusion model
        :Parameters:
        density: array of positive integers
            Number of particles at each position i in the array/geometry
    """
    from numpy import array, any, sum
    # Make sure input is an array
    density = array(density)
    # ...of the right kind (integer). Unless it is zero length, in which case type does not matter
    if density.dtype.kind != 'i' and len(density) > 0:
        raise TypeError("Density should be a array of *integers*.")
    # and the right values (positive or null)
    if any(density < 0):
        raise ValueError("Density should be an array of *positive* integers.")
    if density.ndim != 1:
        raise ValueError("Density should be an a *1-dimensional* array of positive integers.")
    # Total energy: sum over positions of n_i * (n_i - 1)
    return sum(density * (density - 1))
Overwriting diffusion/model.py
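from nose.tools import assert_raises, assert_almost_equal
from .model import energy

def test_energy_fails_on_non_integer_density():
    with assert_raises(TypeError) as exception:
        energy([1.0, 2, 3])

def test_energy_fails_on_negative_density():
    with assert_raises(ValueError) as exception: energy([-1, 2, 3])

def test_energy_fails_ndimensional_density():
    with assert_raises(ValueError) as exception: energy([[1, 2, 3], [3, 4, 5]])

def test_zero_energy_cases():
    # Zero energy at zero density
    densities = [ [], [0], [0, 0, 0] ]
    for density in densities:
        assert_almost_equal(energy(density), 0)

def test_derivative():
    from numpy.random import randint
    # Loop over vectors of different sizes (but not empty)
    for vector_size in randint(1, 1000, size=30):
        # Random density of that size, and a random position to perturb
        density = randint(50, size=vector_size)
        element_index = randint(vector_size)
        # modified densities
        density_plus_one = density.copy()
        density_plus_one[element_index] += 1
        # E(n + 1) - E(n) = (n + 1) n - n (n - 1) = 2 n
        expected = 2.0 * density[element_index]
        actual = energy(density_plus_one) - energy(density)
        assert_almost_equal(expected, actual)

def test_derivative_no_self_energy():
    """ If particle is alone, then its participation to energy is zero """
    from numpy import array
    density = array([1, 0, 1, 10, 15, 0])
    density_plus_one = density.copy()
    density_plus_one[1] += 1   # a lone particle at an empty position
    expected = 0
    actual = energy(density_plus_one) - energy(density)
    assert_almost_equal(expected, actual)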
In [8]: %%bash
cd diffusion
nosetests
...
----------------------------------------------------------------------
Ran 6 tests in 0.052s
OK
24.15.4 Coverage
1. Comment out from exception tests in solution
2. In the solution directory, run:
In [9]: %%bash
cd diffusion
nosetests --with-coverage --cover-package=diffusion.model -v --cover-html
diffusion.test_model.test_energy_fails_on_negative_density ... ok
diffusion.test_model.test_energy_fails_ndimensional_density ... ok
diffusion.test_model.test_zero_energy_cases ... ok
diffusion.test_model.test_derivative ... ok
If particle is alone, then its participation to energy is zero ... ok
----------------------------------------------------------------------
Ran 6 tests in 0.051s
OK
24.16 Mocking
24.16.1 Definition
Mock: verb,
Mocking
Stub routine
• A routine that simulates a more computationally expensive routine, without actually performing any
calculation. Strictly speaking, the term Mocking is reserved for object-oriented approaches.
In [3]: function = Mock(name="myroutine", side_effect=[2, "xyz"])
In [4]: function(1)
Out[4]: 2
Out[5]: ’xyz’
In [6]: function()
---------------------------------------------------------------------------
<ipython-input-6-2fcbbbc1fe81> in <module>()
----> 1 function()
StopIteration:
import requests

def map_at(lat, long, satellite=False, zoom=12, size=(400,400), sensor=False):
    base="http://maps.googleapis.com/maps/api/staticmap?"
    params=dict(
        sensor= str(sensor).lower(),
        zoom= zoom,
        size= "x".join(map(str,size)),
        center= ",".join(map(str,(lat,long))),
        style="feature:all|element:labels|visibility:off")
    if satellite:
        params["maptype"]="satellite"
    return requests.get(base,params=params)
Out[9]:
We would like to test that it is building the parameters correctly. We can do this by mocking the
requests object. We need to temporarily replace a method in the library with a mock. We can use “patch”
to do this:
That was quiet, so it passed. When I’m writing tests, I usually modify one of the expectations, to
something ‘wrong’, just to check it’s not passing “by accident”, run the tests, then change it back!
We want to test that the above function does the right thing. It is supposed to compute the derivative
of a function of a vector in a particular direction.
E.g.:
Out[13]: 1.0
How do we assert that it is doing the right thing? With tests like this:
from mock import MagicMock

def test_derivative_2d_y_direction():
    func=MagicMock()
    partial_derivative(func, [0,0], 1)
    func.assert_any_call([0, 1.0])
    func.assert_any_call([0, 0])

test_derivative_2d_y_direction()
We made our mock a “Magic Mock” because otherwise, the mock results f_x_plus_delta and f_x can’t
be subtracted:
In [15]: MagicMock()-MagicMock()
In [16]: Mock()-Mock()
---------------------------------------------------------------------------
<ipython-input-16-fca1dbe33378> in <module>()
----> 1 Mock()-Mock()
• s(tep): step into current function in line of code
• l(ist): list program around current position
• w(here): print the current stack (where we are in the code)
• [enter]: repeats last command
• anypythonvariable: print the value of that variable
The python debugger is a python shell: it can print and compute values, and even change the values
of the variables at that point in the program.
24.18.4 Breakpoints
Breakpoints tell the debugger where and when to stop. We say b somefunctionname
The debugger is, of course, most used interactively, but here I’m showing a prewritten debugger script:
Overwriting commands
In [3]: %%bash
python -m pdb energy_example.py < commands
Alternatively, break-points can be set on files: b file.py:20 will stop on line 20 of file.py.
24.18.5 Post-mortem
Debugging when something goes wrong:
1. use w and l for position in code and in call stack
2. use up and down to navigate up and down the call stack
3. inspect variables along the way to understand failure
In [4]: %pdb on
from diffusion.model import energy
partial_derivative(energy,[5,6,7,8,0,1],5)
---------------------------------------------------------------------------
<ipython-input-4-cd06eca5c4a7> in <module>()
1 get_ipython().magic(u'pdb on')
2 from diffusion.model import energy
----> 3 partial_derivative(energy,[5,6,7,8,0,1],5)
> <ipython-input-4-cd06eca5c4a7>(3)<module>()
1 get_ipython().magic(u'pdb on')
2 from diffusion.model import energy
----> 3 partial_derivative(energy,[5,6,7,8,0,1],5)
---------------------------------------------------------------------------
/usr/local/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in showtraceback(self,
1854 if self.call_pdb:
1855 # drop into debugger
-> 1856 self.debugger(force=True)
1857 return
1858
-> 1022 pm()
1023
1024 #-------------------------------------------------------------------------
/usr/local/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in <lambda>()
1017 else:
1018 # fallback to our internal debugger
-> 1019 pm = lambda : self.InteractiveTB.debugger(force=True)
1020
1021 with self.readline_no_record:
/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pdb.pyc
208 self.setup(frame, traceback)
209 self.print_stack_entry(self.stack[self.curindex])
--> 210 self.cmdloop()
211 self.forget()
212
/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/cmd.pyc
128 if self.use_rawinput:
129 try:
--> 130 line = raw_input(self.prompt)
131 except EOFError:
132 line = 'EOF'
StdinNotImplementedError: raw_input was called, but this frontend does not support input request
> /usr/local/lib/python2.7/site-packages/IPython/kernel/zmq/kernelbase.py(646)raw_input()
645 raise StdinNotImplementedError(
--> 646 "raw_input was called, but th
647 )
StdinNotImplementedError: raw_input was called, but this frontend does not support input request
24.19 Jenkins
24.19.1 Test servers
Goal:
24.21 Solution
We need to break our problem down into pieces:
3. A function to determine the probability of a change given the energy difference (1 if decreases, otherwise
based on exponential): change_density()
4. A function to determine whether to execute a change or not by drawing a random number: accept_change()
5. A method to iterate the above procedure: step()
1. Input sanity: e.g. density should be an array of non-negative integers; test this by giving negative values, etc.
2. change_density(): is the density changed by a particle hopping left or right? Do all positions have an
equal chance of moving?
3. accept_change(): will the move be accepted when the second energy is lower?
4. Make a small test case for the main algorithm. (Hint: by using mocking, we can pre-set who to move
where.)
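For item 3, the acceptance rule can be tested deterministically by fixing the random draw with a mock, in the spirit of the hint about pre-setting moves. The standalone sketch below assumes the exponential rule described in step 3 of the plan: always accept a decrease, otherwise accept with probability exp(-(successor - prior) / temperature). The free function and its use of random.random are assumptions, not the solution's code.

```python
import math
import random
from unittest.mock import patch


def accept_change(prior, successor, temperature=1.0):
    """Accept a lower (or equal) energy always; otherwise accept with probability exp(-dE/T)."""
    if successor <= prior:
        return True
    return random.random() < math.exp(-(successor - prior) / temperature)


def test_accept_change_with_fixed_draws():
    # A decrease is accepted whatever the random draw
    with patch("random.random", return_value=0.999):
        assert accept_change(1.0, 0.5)
    # An increase of 1 at T = 1 is accepted with probability exp(-1), about 0.37
    with patch("random.random", return_value=0.1):
        assert accept_change(0.5, 1.5)
    with patch("random.random", return_value=0.9):
        assert not accept_change(0.5, 1.5)


test_accept_change_with_fixed_draws()
```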
In [1]: %%bash
mkdir -p DiffusionExample
class MonteCarlo(object):
""" A simple Monte Carlo implementation """
def __init__(self, energy, density, temperature=1, itermax=1000):
from numpy import any, array
density = array(density)
self.itermax = itermax
if len(density) < 2:
raise ValueError("Density is too short")
# of the right kind (integer). Unless it is zero length, in which case type does not mat
if density.dtype.kind != ’i’ and len(density) > 0:
raise TypeError("Density should be an array of *integers*.")
# and the right values (positive or null)
if any(density < 0):
raise ValueError("Density should be an array of *positive* integers.")
if density.ndim != 1:
raise ValueError("Density should be an a *1-dimensional* array of positive integers.
if sum(density) == 0:
raise ValueError("Density is empty.")
self.current_energy = energy(density)
self.temperature = temperature
self.density = density
250
def random_agent(self, density):
#Particle index
particle = randint(sum(density))
current = 0
for location, n in enumerate(density):
current += n
if current > particle: break
return location
location = self.random_agent(density)
# Move direction
if(density[location]-1<0): return array(density)
if location == 0: direction = 1
elif location == len(density) - 1: direction = -1
else: direction = self.random_direction()
def step(self):
iteration = 0
while iteration < self.itermax:
new_density = self.change_density(self.density)
new_energy = energy(new_density)
251
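def energy(density, coefficient=1):
    """ Energy associated with the diffusion model """
    from numpy import array, any, sum
    # Make sure input is an array
    density = array(density)
    # of the right kind (integer). Unless it is zero length, in which case type does not matter.
    if density.dtype.kind != 'i' and len(density) > 0:
        raise TypeError("Density should be an array of *integers*.")
    # and the right values (positive or null)
    if any(density < 0):
        raise ValueError("Density should be an array of *positive* integers.")
    if density.ndim != 1:
        raise ValueError("Density should be a *1-dimensional* array of positive integers.")
    # Assumed diffusion-model energy: coefficient * sum over sites of n(n-1)/2
    return coefficient * 0.5 * sum(density * (density - 1))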
Writing DiffusionExample/MonteCarlo.py
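`accept_change` implements the Metropolis criterion: a move that does not raise the energy is always accepted, and a move that raises it by ΔE is accepted with probability exp(-ΔE/T), which is what `test_accept_change` below checks. A standalone sketch for experimenting with the rule; the `uniform` parameter is a hypothetical hook added here so the random draw can be pinned, in the spirit of the mocking hint above:

```python
import math
import random

def accept_change(prior, successor, temperature=1.0, uniform=random.random):
    """Metropolis criterion: always accept a move that does not raise the
    energy; otherwise accept with probability exp(-(successor - prior) / T)."""
    if successor <= prior:
        return True
    # uniform() draws from [0, 1); a larger Boltzmann factor means more acceptances
    return math.exp(-(successor - prior) / temperature) > uniform()

# Pinning the draw makes the outcome deterministic
print(accept_change(0.4, 0.5, uniform=lambda: 0.0))  # True
print(accept_change(0.4, 0.5, uniform=lambda: 1.0))  # False
```

With the default `uniform`, the fraction of accepted uphill moves converges to exp(-ΔE/T), so high temperatures accept almost everything and low temperatures almost nothing uphill.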
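In [3]: %matplotlib inline
import sys
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import animation
sys.path.append('DiffusionExample')
from MonteCarlo import MonteCarlo, energy
Temperature = 0.1
density = [np.sin(i) for i in np.linspace(0.1, 3, 100)]
density = np.array(density) * 100
density = density.astype(int)
fig = plt.figure()
ax = plt.axes(xlim=(-1, len(density)), ylim=(0, np.max(density) + 1))
image = ax.scatter(range(len(density)), density)
txt_energy = plt.text(0, 100, 'Energy = 0')
mc = MonteCarlo(energy, density, temperature=Temperature)
def simulate(step):
    energy, density = mc.step()
    image.set_offsets(np.vstack((range(len(density)), density)).T)
    txt_energy.set_text('Energy = %f' % energy)
anim = animation.FuncAnimation(fig, simulate, frames=200, interval=50)
#anim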
In [4]: %%writefile DiffusionExample/test_model.py
from mock import MagicMock
from nose.tools import assert_raises, assert_almost_equal, assert_true
from MonteCarlo import MonteCarlo
def test_input_sanity():
    """ Check incorrect input do fail """
    energy = MagicMock()
    with assert_raises(TypeError):
        MonteCarlo(energy, [1.0, 2, 3])
    with assert_raises(ValueError):
        MonteCarlo(energy, [-1, 2, 3])
    with assert_raises(ValueError):
        MonteCarlo(energy, [[1, 2, 3], [3, 4, 5]])
    with assert_raises(ValueError):
        MonteCarlo(energy, [3])
    with assert_raises(ValueError):
        MonteCarlo(energy, [0, 0])
def test_move_particle_one_over():
    """ Check density is changed by a particle hopping left or right. """
    from numpy import nonzero, multiply
    from numpy.random import randint
    energy = MagicMock()
def test_equal_probability():
    """ Check particles have equal probability of movement. """
    from numpy import array, sqrt, count_nonzero
    energy = MagicMock()
def test_accept_change():
    """ Check that move is accepted if second energy is lower """
    from numpy import sqrt, count_nonzero, exp
    energy = MagicMock()
    mc = MonteCarlo(energy, [1, 1, 1], temperature=100.0)
    # Should always be true. But do more than one draw, in case randomness incorrectly crept into the
    # implementation
    for i in range(10):
        assert_true(mc.accept_change(0.5, 0.4))
        assert_true(mc.accept_change(0.5, 0.5))
    # This should be accepted only part of the time, depending on exponential distribution
    prior, successor = 0.4, 0.5
    accepted = [mc.accept_change(prior, successor) for i in range(10000)]
    assert_almost_equal(
        count_nonzero(accepted) / float(len(accepted)),
        exp(-(successor - prior) / mc.temperature),
        delta = 3e0 / sqrt(len(accepted))
    )
def test_main_algorithm():
    import numpy as np
    from numpy import testing
    from mock import Mock
    density = [1, 1, 1, 1, 1]
    energy = MagicMock()
    mc = MonteCarlo(energy, density, itermax = 5)
Writing DiffusionExample/test_model.py
In [5]: %%bash
cd DiffusionExample
nosetests
...
----------------------------------------------------------------------
Ran 5 tests in 0.386s
OK
Chapter 25
Installing Libraries
That was actually pretty easy, I hope. This is how you’ll install new libraries when you need them.
Troubleshooting:
On mac or linux, you might get a complaint that you need “superuser”, “root”, or “administrator” access.
If so type:
• sudo pip install geopy
and enter your password.
If you get a complaint like: ‘pip is not recognized as an internal or external command’, try the following:
• conda install pip (Windows)
• sudo easy_install pip (Mac, Linux)
Ask me over email if you run into trouble.
• cd my_python_libs
• cd <library name> (e.g. cd JSAnimation-master)
25.6 Libraries
25.6.1 Libraries are awesome
The strength of a language lies as much in the set of libraries available, as it does in the language itself.
A great set of libraries allows for a very powerful programming style:
Not only is this efficient with your programming time, it’s also more efficient with computer time.
The chances are any algorithm you might want to use has already been programmed better by someone
else.
25.7.3 How to choose a library
• When was the last commit?
• How often are there commits?
• Can you find the lead contributor on the internet?
• Do they respond when approached:
– emails to developer list
– personal emails
– tweets
– IRC
– issues raised on GitHub?
• Are there contributors other than the lead contributor?
• Is there discussion of the library on Stack Exchange?
• Is the code on an open version control tool like GitHub?
• Is it on standard package repositories (PyPI, apt/yum/brew)?
• Are there any tests?
• Download it. Can you build it? Do the tests pass?
• Is there an open test dashboard? (Travis/Jenkins/CDash)
• What dependencies does the library itself have? Do they pass this list?
• Are different versions of the library clearly labeled with version numbers?
• Is there a changelog?
25.9 Argparse
This is the standard library for building programs with a command-line interface.
In [1]: %%writefile greeter.py
#!/usr/bin/env python
from argparse import ArgumentParser
parser = ArgumentParser(description = "Generate appropriate greetings")
parser.add_argument('--title', '-t')
parser.add_argument('--polite', '-p', action="store_true")
parser.add_argument('personal')
parser.add_argument('family')
arguments = parser.parse_args()
greeting = "How do you do, " if arguments.polite else "Hey, "
if arguments.title:
    greeting += arguments.title + " "
greeting += arguments.personal + " " + arguments.family + "."
print(greeting)
Writing greeter.py
In [2]: %%bash
#!/usr/bin/env bash
chmod u+x greeter.py
./greeter.py --help
./greeter.py James Hetherington
./greeter.py --polite James Hetherington
./greeter.py James Hetherington --title Dr
positional arguments:
personal
family
optional arguments:
-h, --help show this help message and exit
--title TITLE, -t TITLE
--polite, -p
Hey, James Hetherington.
How do you do, James Hetherington.
Hey, Dr James Hetherington.
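`ArgumentParser.parse_args` also accepts an explicit list of strings, so the same parser can be exercised without a shell, for instance from a unit test. A minimal sketch reusing the greeter's options; `build_parser` and `make_greeting` are hypothetical helper names introduced for illustration:

```python
from argparse import ArgumentParser

def build_parser():
    """Build the same command-line interface as greeter.py."""
    parser = ArgumentParser(description="Generate appropriate greetings")
    parser.add_argument('--title', '-t')
    parser.add_argument('--polite', '-p', action="store_true")
    parser.add_argument('personal')
    parser.add_argument('family')
    return parser

def make_greeting(arguments):
    """Turn parsed arguments into a greeting string."""
    greeting = "How do you do, " if arguments.polite else "Hey, "
    if arguments.title:
        greeting += arguments.title + " "
    return greeting + arguments.personal + " " + arguments.family + "."

# parse_args accepts a list, so no real command line is needed here
args = build_parser().parse_args(["James", "Hetherington", "--title", "Dr"])
print(make_greeting(args))  # Hey, Dr James Hetherington.
```

Keeping the parsing separate from the greeting logic is also what makes the packaged version below testable.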
25.10 Packaging
25.10.1 Packaging
Once we’ve made a working program, we’d like to be able to share it with others.
A good cross-platform build tool is the most important thing: you can always have collaborators build
from source.
25.10.3 Laying out a project
When planning to package a project for distribution, defining a suitable project layout is essential.
In [1]: %%bash
tree --charset ascii greetings
greetings
|-- CITATION.md
|-- LICENSE.md
|-- README.md
|-- conf.py
|-- greetings
|   |-- __init__.py
|   |-- command.py
|   |-- greeter.py
|   `-- test
|       |-- __init__.py
|       |-- fixtures
|       |   `-- samples.yaml
|       `-- test_greeter.py
|-- index.rst
`-- setup.py
3 directories, 12 files
In [2]: %%bash
mkdir -p greetings/greetings/test/fixtures
mkdir -p greetings/scripts
In [3]: %%writefile greetings/setup.py
from setuptools import setup, find_packages
setup(
    name = "Greetings",
    version = "0.1",
    packages = find_packages(exclude=['*test']),
    scripts = ['scripts/greet'],
    install_requires = ['argparse']
)
Overwriting greetings/setup.py
The package will then be available to use everywhere on the system:
In [4]: import greetings
from greetings.greeter import greet
print(greetings.greeter.greet("James", "Hetherington"))
In [5]: %%bash
#!/usr/bin/env bash
greet --help
positional arguments:
personal
family
optional arguments:
-h, --help show this help message and exit
--title TITLE, -t TITLE
--polite, -p
In [6]: %%bash
greet James Hetherington
greet --polite James Hetherington
greet James Hetherington --title Dr
In [7]: %%bash
greet Humphry Appleby --title Sir
Try it!
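In [8]: %%writefile greetings/greetings/greeter.py
def greet(personal, family, title="", polite=False):
    """ Generate a greeting string for a person.
    Parameters
    ----------
    personal: str
        A given name, such as Will or Jean-Luc
    family: str
        A family name, such as Riker or Picard
    title: str
        An optional title, such as Captain or Reverend
    polite: bool
        True for a formal greeting, False for informal.
    Returns
    -------
    string
        An appropriate greeting
    """
    greeting = "How do you do, " if polite else "Hey, "
    if title:
        greeting += title + " "
    greeting += personal + " " + family + "."
    return greeting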
Overwriting greetings/greetings/greeter.py
The documentation string explains how to use the function; don’t worry about this for now, we’ll consider
this next time.
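A docstring is ordinary data attached to the function object: `help()` displays it, and documentation tools such as Sphinx's autodoc read it together with the signature. A minimal sketch using a cut-down copy of `greet`, written here for illustration:

```python
import inspect

def greet(personal, family, title="", polite=False):
    """
    Generate a greeting string for a person.

    Parameters
    ----------
    personal: str
        A given name, such as Will or Jean-Luc
    family: str
        A family name, such as Riker or Picard

    Returns
    -------
    string
        An appropriate greeting
    """
    greeting = "How do you do, " if polite else "Hey, "
    if title:
        greeting += title + " "
    return greeting + personal + " " + family + "."

# The two pieces a documentation tool combines: signature and cleaned docstring
print(inspect.signature(greet))
print(inspect.getdoc(greet).splitlines()[0])  # Generate a greeting string for a person.
```

`inspect.getdoc` strips the common indentation and leading blank lines, so the docstring can be indented naturally inside the function.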
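In [9]: %%writefile greetings/greetings/command.py
from argparse import ArgumentParser
from greetings.greeter import greet
def process():
    parser = ArgumentParser(description = "Generate appropriate greetings")
    parser.add_argument('--title', '-t')
    parser.add_argument('--polite', '-p', action="store_true")
    parser.add_argument('personal')
    parser.add_argument('family')
    arguments = parser.parse_args()
    print(greet(arguments.personal, arguments.family,
                arguments.title, arguments.polite))
if __name__ == "__main__":
    process()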
Overwriting greetings/greetings/command.py
25.10.8 Write an entry point script stub
In [10]: %%writefile greetings/scripts/greet
#!/usr/bin/env python
from greetings.command import process
process()
Writing greetings/scripts/greet
Greetings!
==========
Overwriting greetings/README.md
Overwriting greetings/LICENSE.md
Overwriting greetings/CITATION.md
25.10.13 Write some unit tests
Separating the script from the logical module made this possible:
Overwriting greetings/greetings/test/fixtures/samples.yaml
25.10.16 Homebrew
Homebrew: a Ruby DSL; you host the formula on your own webpage.
See my installer for the cppcourse example
If you’re on OSX, do:
25.10.17 Exercises
We previously looked at Greengraph.py, a script that enables us to explore how green space varies as we
move from the city to the countryside:
def geolocate(place):
    # geocoder: the geopy geocoder object set up in the original Greengraph script
    return geocoder.geocode(place, exactly_one=False)[0][1]
london_location = geolocate("London")
print(london_location)
(51.5073509, -0.1277583)
25.11 Documentation
25.11.1 Documentation is hard
• Good documentation is hard, and very expensive.
• Bad documentation is detrimental.
• Good documentation quickly becomes bad if not kept up-to-date with code changes.
• Professional companies pay large teams of documentation writers.
• Readable code
• Automated tests
• Small code samples demonstrating how to use the API
25.11.3 Comment-based Documentation tools
Documentation tools can produce extensive documentation about your code by pulling out comments near
the beginning of functions, together with the signature, into a web page.
The most popular is Doxygen.
Have a look at an example of some Doxygen output.
Sphinx is nice for Python, and works with C++ as well. Here’s some Sphinx-generated output and the
corresponding source code. Breathe can be used to make Sphinx and Doxygen work together.
Roxygen is good for R.
"""
Generate a greeting string for a person.
Parameters
----------
personal: str
    A given name, such as Will or Jean-Luc
family: str
    A family name, such as Riker or Picard
title: str
    An optional title, such as Captain or Reverend
polite: bool
    True for a formal greeting, False for informal.
Returns
-------
string
    An appropriate greeting
"""
sphinx-quickstart
Which responds:
Please enter values for the following settings (just press Enter to
accept a default value, if one is given in brackets).
and then look at and adapt the generated config, a file called conf.py in the root of the project. This
contains the project’s Sphinx configuration, as Python variables:
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    'sphinx.ext.autodoc',  # Support automatic documentation
    'sphinx.ext.coverage', # Automatically check if functions are documented
    'sphinx.ext.mathjax',  # Allow support for algebra
    'sphinx.ext.viewcode', # Include the source code in documentation
    'numpydoc'             # Support NumPy style docstrings
]
To proceed with the example, we’ll copy a finished conf.py into our folder, though normally you’ll always
use sphinx-quickstart.
import sys
import os
extensions = [
    'sphinx.ext.autodoc',  # Support automatic documentation
    'sphinx.ext.coverage', # Automatically check if functions are documented
    'sphinx.ext.mathjax',  # Allow support for algebra
    'sphinx.ext.viewcode', # Include the source code in documentation
    'numpydoc'             # Support NumPy style docstrings
]
templates_path = ['_templates']
source_suffix = '.rst'
master_doc = 'index'
project = u'Greetings'
copyright = u'2014, James Hetherington'
version = '0.1'
release = '0.1'
exclude_patterns = ['_build']
pygments_style = 'sphinx'
html_theme = 'default'
html_static_path = ['_static']
htmlhelp_basename = 'Greetingsdoc'
latex_elements = {
}
latex_documents = [
    ('index', 'Greetings.tex', u'Greetings Documentation',
     u'James Hetherington', 'manual'),
]
man_pages = [
    ('index', 'greetings', u'Greetings Documentation',
     [u'James Hetherington'], 1)
]
texinfo_documents = [
    ('index', 'Greetings', u'Greetings Documentation',
     u'James Hetherington', 'Greetings', 'One line description of project.',
     'Miscellaneous'),
]
Overwriting ../ch04packaging/greetings/conf.py
.. autofunction:: greetings.greeter.greet
Overwriting ../ch04packaging/greetings/index.rst
In [3]: %%bash
cd ../session04/greetings/
sphinx-build . doc
25.13 Engineering
25.13.1 Software Engineering Stages
• Requirements
• Functional Design
• Architectural Design
• Implementation
• Integration
As a clinician, when I finish an analysis, I want a report to be created on the test results, so that
I can send it to the patient.
As a role, when condition or circumstance applies I want a goal or desire so that benefits occur.
These are easy to map into Gherkin, the behaviour-driven development (BDD) test language.
25.13.4 Waterfall
The Waterfall design philosophy argues that the elements of design should occur in order: first requirements
capture, then functional design, then architectural design. This approach is based on the idea that if a
mistake is made in the design, then programming effort is wasted, so significant effort is spent in trying to
ensure that requirements are well understood and that the design is correct before programming starts.
25.13.9 Software is not made of bricks
Third, software systems operate in a domain determined principally by arbitrary rules about
information and symbolic communication whilst the operation of physical systems is governed
by the laws of physics. Finally, software is readily changeable and thus is changed, it is used in
settings where our uncertainty leads us to anticipate the need to change.
– Prof. Anthony Finkelstein, UCL Dean of Engineering, and Professor of Software Systems Engineering
That is, while there is value in the items on the right, we value the items on the left more.
– Jim Highsmith.
25.13.14 Iterative Development
Agile development maintains a backlog of features to be completed and bugs to be fixed. In each iteration,
we start with a meeting where we decide which backlog tasks will be attempted during the development
cycle, estimating how long each will take, and selecting an achievable set of goals for the “sprint”. At the
end of each cycle, we review the goals completed and missed, and consider what went well, what went badly,
and what could be improved.
We try not to add work to a cycle mid-sprint. New tasks that emerge are added to the backlog, and
considered in the next planning meeting. This reduces stress and distraction.
25.13.18 Conclusion
• Don’t ignore design
• See if there’s a known design pattern that will help
• Do try to think about how your code will work before you start typing
• Do use design tools like UML to think about your design without coding straight away
• Do try to write down some user stories
• Do maintain design documents.
BUT
• Do change your design as you work, updating the documents if you have them
• Don’t go dark – never do more than a couple of weeks programming without showing what you’ve
done to colleagues
• Don’t get isolated from the reasons for your code’s existence, stay involved in the research, don’t be a
Code Monkey.
• Do keep a list of all the things your code needs, estimate and prioritise tasks carefully.
25.14 Exercises
25.14.1 Refactoring to classes
Complete the exercise on Boids from last week, as far as creating a class for a Boid, if you haven’t already.
25.15.2 Disclaimer
Here we attempt to give some basic advice on choosing a license for your software. But:
25.15.3 Choose a license
It is important to choose a license and to create a license file to tell people what it is.
The license lets people know whether they can reuse your code and under what terms. This course has
one, for example.
Your license file should typically be called LICENSE.txt or similar. GitHub will offer to create a license
file automatically when you create a new repository.
XXXX NON-COMMERCIAL EDUCATIONAL LICENSE Copyright (c) 2013 Prof. Foo. All
rights reserved.
You may use and modify this software for any non-commercial purpose within your educational
institution. Teaching, academic research, and personal experimentation are examples of purpose
which can be non-commercial.
You may redistribute the software and modifications to the software for non-commercial purposes,
but only to eligible users of the software (for example, to another university student or faculty
to support joint academic research).
Please don’t do this. Your desire to slightly tweak the terms is harmful to the future software ecosystem.
Also, unless you are a lawyer, you cannot do this safely!
If you want your code to be maximally reusable, use a permissive license. If you want to force other people
using your code to make derivatives open source, use a copyleft license.
If you want to use code that has a permissive license, it’s safe to use it and keep your code secret. If you
want to use code that has a copyleft license, you’ll have to release your code under such a license.
25.15.13 Patents
Intellectual property law distinguishes copyright from patents. This is a complex field, which I am far from
qualified to teach!
People who think carefully about intellectual property law distinguish software licenses based on how
they address patents. Very roughly, if you want to ensure that contributors to your project can’t then go
off and patent their contribution, some licenses, such as the Apache license, protect you from this.
25.15.14 Use as a web service
If I take copyleft code, and use it to host a web service, I have not sold the software.
Therefore, under some licenses, I do not have to release any derivative software. This “loophole” in the
GPL is closed by the AGPL (“Affero GPL”).
H. Wickham. ggplot2: elegant graphics for data analysis. Springer New York, 2009.
@Book{, author = {Hadley Wickham}, title = {ggplot2: elegant graphics for data analysis},
publisher = {Springer New York}, year = {2009}, isbn = {978-0-387-98140-6},
url = {http://had.co.nz/ggplot2/book}, }
Check your license at opensource.org for details of how to apply it to your software. For example, for the
GPL
25.16 Managing software issues
25.16.1 Issues
Code has bugs. It also has features, things it should do.
A good project has an organised way of managing these. Generally you should use an issue tracker.
• Version
• Steps
25.16.6 Status
• Submitted
• Accepted
• Underway
• Blocked
25.16.7 Resolutions
• Resolved
• Will Not Fix
• Not reproducible
• Not a bug (working as intended)
25.16.8 Bug triage
Some organisations use a severity matrix based on:
You should, in the end, be able to pip install your code on a clean computer, and do something similar
to
Chapter 26
Construction
26.1 Construction
Software design gets a lot of press (object orientation, UML, design patterns).
In this session we’re going to look at advice on software construction.
This lecture is available as an IPython Notebook.
26.1.5 Construction
So, we’ve excluded most of the exciting topics. What’s left is the bricks and mortar of software: how letters
and symbols are used to build code which is readable.
Software has beauty at these levels too: stories and characters correspond to architecture and object
design, plots correspond to algorithms, but the rhythm of sentences and the choice of words correspond to
software construction.
Read Code Complete.
26.2 Setup
This notebook is based on a number of fragments of code, with an implicit context. We’ve made a library
to set up the context so the examples work:
input ="2.0"
iOffset=1
offset =1
anothervariable=1
flag1=True
variable=1
flag2=False
def do_something(): pass
from mock import Mock
Mock.__sub__=Mock()
Mock.__abs__=Mock()
chromosome=None
start_codon=None
subsequence=Mock()
transcribe=Mock()
ribe=Mock()
find=Mock()
hawk=Mock()
starling=Mock()
can_see=Mock()
my_name=""
your_name=""
flag1=False
flag2=False
start=0.0
end=1.0
step=0.1
birds=[Mock()]*2
resolution=100
pi=3.141
result= [0]*resolution
import numpy as np
import math
data= [math.sin(y) for y in np.arange(0,pi,pi/resolution)]
import yaml
import os
Writing context.py
def add_to_reaction(a_name,
                    a_reaction):
    l_species = Species(a_name)
    a_reaction.append( l_species )
26.3.3 Layout
In [4]: reaction= {
    "reactants": ["H","H","O"],
    "products": ["H2O"]
}
In [5]: reaction2=(
{
    "reactants":
    [
        "H",
        "H",
        "O"
    ],
    "products":
    [
        "H2O"
    ]
}
)
26.3.7 Newlines
• Newlines make code easier to read
• Newlines make less code fit on a screen
26.3.11 Lint
There are automated tools which enforce coding conventions and check for common mistakes.
These are called linters
E.g. pip install pep8
In [11]: %%bash
pep8 species.py
It is a good idea to run a linter before every commit, or include it in your CI tests.
26.4 Comments
26.4.1 Why comment?
• You’re writing code for people, as well as computers.
• Comments can help you build code, by representing your design
• Comments explain subtleties in the code which are not obvious from the syntax
• Comments explain why you wrote the code the way you did
26.4.3 Comments which are obvious
In [1]: from context import *
Is good. But:
class Agent(object):
    def turn(self):
        self.direction+=self.angular_velocity
    def move(self):
        self.x+=Agent.step_length*sin(self.direction)
        self.y+=Agent.step_length*cos(self.direction)
is probably better.
is OK.
26.4.7 Comments which only make sense to the author today
In [7]: agent.turn() # Turtle Power!
agent.move()
agents[:]=[]# Shredder!
@double
def try_me_twice():
    pass
26.6 Refactoring
26.6.1 Refactoring
To refactor is to:
• Make a change to the design of some software
• Which improves the structure or readability
• But which leaves the actual behaviour of the program completely unchanged.
– Martin Fowler
after:
In [3]: resolution=100
pi=3.141
data= [math.sin(x) for x in np.arange(0,pi,pi/resolution)]
result= [0]*resolution
for i in range(resolution):
    for j in range(i + 1, resolution):
        result[j] += data[i] * data[i-j] / resolution
In [4]: if abs(hawk.facing-starling.facing)<hawk.viewport:
    hawk.hunting()
if abs(starling.facing-hawk.facing)<starling.viewport:
    starling.flee()
After:
if can_see(hawk,starling):
    hawk.hunting()
if can_see(starling,hawk):
    starling.flee()
26.6.6 Change of variable name
Smell: Code needs a comment to explain what it is for
Before:
In [6]: z=find(x,y)
if z:
    ribe(x)
After:
vs
In [10]: sum=0
for i in range(resolution):
    sum+=data[i]
After:
In [11]: sum=0
for value in data:
    sum+=value
After:
26.6.10 Replace set of arrays with array of structures
Smell: A function needs to work with corresponding indices of several arrays:
Before:
After:
Warning: this refactoring greatly improves readability but can make code slower, depending on memory
layout. Be careful.
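The before and after code for this refactoring is not shown here; a minimal sketch with hypothetical bird data, using `collections.namedtuple` as the structure:

```python
from collections import namedtuple

# Before: parallel arrays, where index i must refer to the same bird in each
xs = [0.0, 1.0, 2.0]
ys = [0.0, 0.5, 1.0]
speeds = [1.0, 2.0, 3.0]

def fastest_before(xs, ys, speeds):
    """Position of the fastest bird, juggling indices across three lists."""
    i = max(range(len(speeds)), key=lambda k: speeds[k])
    return xs[i], ys[i]

# After: one list of records, each keeping its own fields together
Bird = namedtuple("Bird", ["x", "y", "speed"])
birds = [Bird(x, y, s) for x, y, s in zip(xs, ys, speeds)]

def fastest_after(birds):
    """Position of the fastest bird, with no index bookkeeping."""
    bird = max(birds, key=lambda b: b.speed)
    return bird.x, bird.y

print(fastest_before(xs, ys, speeds), fastest_after(birds))
```

With NumPy, the parallel-array form keeps each field contiguous in memory, which is one reason the "after" version can be slower for large numerical workloads.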
After:
Writing config.yaml
In [18]: config=yaml.safe_load(open("config.yaml"))
In [19]: viewport=pi/4
if hawk.can_see(starling):
    hawk.hunt(starling)
class Hawk(object):
    def can_see(self,target):
        return (self.facing-target.facing)<viewport
Becomes:
In [20]: viewport=pi/4
if hawk.can_see(starling,viewport):
    hawk.hunt(starling)
class Hawk(object):
    def can_see(self,target,viewport):
        return (self.facing-target.facing)<viewport
Becomes:
After:
def predate(predator,prey):
    if predator.can_see(prey):
        predator.hunt(prey)
    if predator.can_reach(prey):
        predator.eat(prey)
In [25]: class One(object):
    pass
class Two(object):
    def __init__(self):
        self.child = One()
After:
Writing anotherfile.py
class Two(object):
    def __init__(self):
        self.child = One()
26.7 Introduction to Objects
26.7.1 Classes: User defined types
In [1]: class Person(object):
    def __init__(self,name,age):
        self.name=name
        self.age=age
    def grow_up(self):
        self.age+=1
james=Person("James",37)
james.home="London"
26.7.4 Method
Method: A function which is “built in” to a class
my_object=MyClass()
my_object.someMethod(value)
26.7.5 Constructor
Constructor: A special method called when instantiating a new object
my_object = MyClass(value)
In [7]: class MyClass(object):
    def __init__(self):
        self.member = "Value"
my_object = MyClass()
assert(my_object.member == "Value")
After:
if can_see(hawk,starling):
    hawk.hunt()
After:
if hawk.can_see(starling):
    hawk.hunt()
26.8.3 Replace method arguments with class members
Smell: A variable is nearly always used in arguments to a class.
After:
In [14]: name="James"
birthday=[19,10,76]
today=[29,10]
if today==birthday[0:2]:
    print "Happy Birthday, ", name
else:
    print "No birthday for you today."
james=Person([19,10,76],"James")
james.greet_appropriately([29,10])
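The `Person` class used in the last two lines is not shown. A minimal sketch of the refactored version, assuming the birthday is stored as `[day, month, year]`; `greeting_for` is a hypothetical helper that returns the text so it can be checked:

```python
class Person(object):
    def __init__(self, birthday, name):
        self.birthday = birthday  # [day, month, two-digit year]
        self.name = name

    def greeting_for(self, today):
        """today is [day, month]; compare it with the stored day and month."""
        if today == self.birthday[0:2]:
            return "Happy Birthday, " + self.name
        return "No birthday for you today."

    def greet_appropriately(self, today):
        print(self.greeting_for(today))

james = Person([19, 10, 76], "James")
james.greet_appropriately([29, 10])  # No birthday for you today.
```

The name and birthday, which the procedural version passed around as loose variables, now travel together as class members.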
26.8.5 Object Oriented Refactoring Summary
• Replace ad-hoc structure with a class
• Replace function with a method
• Replace method argument with class member
• Replace global variable with class data
26.9 Design
26.9.1 Design
In this session, we will finally discuss the thing most people think of when they refer to “Software Engineer-
ing”: the deliberate design of software. We will discuss processes and methodologies for planned development
of large-scale software projects: Software Architecture.
def move(self, delta_t):
    self.position+= self.velocity*delta_t
class Particle {
  public:
    std::vector<double> position;
    std::vector<double> velocity;
    Particle(std::vector<double> position, std::vector<double> velocity);
    void move(double delta_t);
};
type particle
    real :: position
    real :: velocity
  contains
    procedure :: init
    procedure :: move
end type particle
26.10.3 UML
UML is a conventional diagrammatic notation used to describe “class structures” and other higher level
aspects of software design.
Computer scientists get worked up about formal correctness of UML diagrams and learning the conven-
tions precisely. Working programmers can still benefit from using UML to describe their designs.
26.10.4 YUML
We can see a YUML model for a Particle class with position and velocity data and a move() method
using the YUML online UML drawing tool.
http://yuml.me/diagram/boring/class/[Particle|position;velocity|move%28%29
Here’s how we can use Python code to get an image back from YUML:
In [4]: yuml("[Particle|position;velocity|move()]")
Out[4]:
26.11 Information Hiding
Sometimes, our design for a program would be broken if users start messing around with variables we don’t
want them to change.
Robust class design requires consideration of which subroutines are intended for users to use, and which
are internal. Languages provide features to implement this: access control.
In Python, we use leading underscores to control whether member variables and methods can be accessed
from outside the class.
MyClass().called_inside()
MyClass()._private_method() # Works, but forbidden by convention
MyClass().public_method() # OK
print MyClass()._private_data
print MyClass().public_data
0
0
---------------------------------------------------------------------------
<ipython-input-6-f273bcb10b88> in <module>()
----> 1 MyClass().__private_method() # Generates error
---------------------------------------------------------------------------
<ipython-input-7-9da94df448bc> in <module>()
----> 1 print MyClass().__private_data # Generates error
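The definition of `MyClass` is not shown above. A minimal sketch with hypothetical members illustrating the two conventions: a single leading underscore is only a convention, while a double leading underscore makes Python rewrite (mangle) the name to `_MyClass__name`, which is why the double-underscore calls above fail from outside the class:

```python
class MyClass(object):
    def __init__(self):
        self.public_data = 0
        self._private_data = 0     # single underscore: private by convention only
        self.__really_private = 0  # double underscore: stored as _MyClass__really_private

    def public_method(self):
        return "public"

    def _private_method(self):
        return "private by convention"

    def __private_method(self):
        return "mangled"

    def called_inside(self):
        # Inside the class body, the double-underscore name resolves normally
        return self.__private_method()

obj = MyClass()
print(obj.called_inside())             # mangled
print(obj._MyClass__private_method())  # the rewritten name is still reachable
```

Name mangling guards against accidental clashes in subclasses rather than providing real security: the rewritten name remains accessible to anyone who asks for it.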
becomes:
@property
def name(self):
    return self._first + " " + self._second
Note that the code behaves the same way to the outside user. The implementation detail is hidden by
private variables. In languages without this feature, such as C++, it is best to always make data private,
and always access data through functions:
Counted.howMany() # 0
x=Counted()
Counted.howMany() # 1
z=[Counted() for x in range(5)]
Counted.howMany() # 6
Out[11]: 6
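The `Counted` class itself is not shown above; a minimal sketch that reproduces the output, assuming a class attribute incremented in the constructor and read through a class method:

```python
class Counted(object):
    number_created = 0  # class data, shared by every instance

    def __init__(self):
        # Update the class attribute, not a per-instance copy
        Counted.number_created += 1

    @classmethod
    def howMany(cls):
        return cls.number_created

print(Counted.howMany())  # 0
x = Counted()
print(Counted.howMany())  # 1
z = [Counted() for i in range(5)]
print(Counted.howMany())  # 6
```

Writing `self.number_created += 1` instead would create an instance attribute and leave the shared count at zero, a common pitfall with class data.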
26.12 Inheritance
• Inheritance allows related classes to share code
• Inheritance allows a program to reflect the ontology of kinds of thing in a program.
class Bird(Animal):
    def fly(self): print "Whee!"
class Eagle(Bird):
    def hunt(self): print "I'm gonna eatcha!"
Eagle().beBorn()
Eagle().hunt()
I exist
I’m gonna eatcha!
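The `Animal` base class is not shown; a minimal sketch consistent with the output above, assuming `Animal` defines `beBorn`, which `Eagle` inherits through `Bird`:

```python
class Animal(object):
    def beBorn(self):
        print("I exist")

class Bird(Animal):
    def fly(self):
        print("Whee!")

class Eagle(Bird):
    def hunt(self):
        print("I'm gonna eatcha!")

Eagle().beBorn()  # found on Animal, two levels up
Eagle().hunt()    # defined on Eagle itself
```

Python looks a method up along the class's method resolution order, here `Eagle`, then `Bird`, then `Animal`, then `object`.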
26.12.3 Inheritance terminology
• A derived class derives from a base class
• A subclass inherits from a superclass
class Person(Animal):
    def __init__(self, age, name):
        super(Person, self).__init__(age)
        self.name=name
In [14]: yuml("[Animal]^-[Bird],[Bird]^-[Eagle],[Bird]^-[Starling]%")
Out[14]:
In [15]: yuml("[Model]<>-*>[Boid],[Boid]position++->[Vector],[Boid]velocity++->[Vector]%")
Out[15]:
The open diamond indicates Aggregation, the closed diamond composition. (A given boid might
belong to multiple models, a given position vector is forever part of the corresponding Boid.)
The asterisk represents cardinality, a model may contain multiple Boids.
class Pet(object):
    def __init__(self, age, owner):
        self.age = age
        self.owner = owner
    def birthday(self):
        self.age += 1
After:
class Person(Animal):
    def __init__(self, age, job):
        self.job = job
        super(Person, self).__init__(age)
26.13 Polymorphism
26.13.1 Polymorphism
In [18]: class Dog(object):
    def noise(self):
        return "Bark"
class Cat(object):
    def noise(self):
        return "Miaow"
class Pig(object):
    def noise(self): return "Oink"
class Cow(object):
    def noise(self): return "Moo"
Bark
Bark
Miaow
Oink
Moo
Miaow
class Dog(Animal):
    def noise(self): return "Bark"
class Worm(Animal):
    pass
class Poodle(Animal):
    pass
Bark
I don’t make a noise.
Oink
Moo
I don’t make a noise.
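The `Animal` base class for this example is not shown either; a minimal sketch that reproduces the output above, assuming `Animal` supplies a fallback `noise` used by any subclass that does not override it:

```python
class Animal(object):
    def noise(self):
        # Fallback for any subclass that does not provide its own noise()
        return "I don't make a noise."

class Dog(Animal):
    def noise(self): return "Bark"

class Worm(Animal):
    pass

class Pig(Animal):
    def noise(self): return "Oink"

class Cow(Animal):
    def noise(self): return "Moo"

class Poodle(Animal):
    pass

animals = [Dog(), Worm(), Pig(), Cow(), Poodle()]
for animal in animals:
    print(animal.noise())
```

Because `Poodle` derives from `Animal` rather than `Dog`, it gets the fallback; deriving it from `Dog` would make it bark.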
Instead, we can deliberately leave this undefined, and we get a crash if we access an undefined
method.
---------------------------------------------------------------------------
<ipython-input-21-048812fbd3ee> in <module>()
----> 1 Worm().noise() # Generates error
def noise(self):
    if self.type=="Dog":
        return "Bark"
    elif self.type=="Cat":
        return "Miaow"
    elif self.type=="Cow":
        return "Moo"
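The type-switching `noise` above must be edited whenever a new kind of animal appears. A minimal sketch of the polymorphic alternative, reusing class names from the examples above: each class supplies its own `noise`, the caller never inspects a type, and an animal with no `noise` fails with `AttributeError`:

```python
class Dog(object):
    def noise(self):
        return "Bark"

class Cat(object):
    def noise(self):
        return "Miaow"

class Worm(object):
    pass  # deliberately no noise() method

def noises(animals):
    # No type checks: each object supplies its own noise()
    return [animal.noise() for animal in animals]

print(noises([Dog(), Cat()]))  # ['Bark', 'Miaow']

try:
    Worm().noise()
except AttributeError as error:
    print("Worm has no noise:", error)
```

Adding a new animal now means adding a new class, with no change to `noises`.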
In [23]: yuml("[<<Animal>>]^-.-[Dog]")
Out[23]:
26.13.7 Further UML
UML is a much larger diagram language than the aspects we’ve shown here.
• Message sequence charts show signals passing back and forth between objects (Web Sequence Diagrams)
• Entity Relationship Diagrams can be used to show more general relationships between things in a
system
26.14 Patterns
26.14.1 Class Complexity
We’ve seen that using object orientation can produce quite complex class structures, with classes owning
each other, instantiating each other, and inheriting from each other.
There are lots of different ways to design things, and decisions to make.
Should I inherit from this class, or own it as a member variable? (“is a” vs “has a”)
• Intent
• Motivation
• Applicability
• Structure
• Participants
• Collaborations
• Consequences
• Implementation
• Sample Code
• Factory Method
• Builder
• Handle-Body
• Strategy
---------------------------------------------------------------------------
<ipython-input-1-8ac5e7e89ac3> in <module>()
----> 1 yuml("[Product]^-[ConcreteProduct], [Creator| (v) FactoryMethod()]^-[ConcreteCreator| Factor
In [3]: class AgentModel(object):
    def __init__(self, config):
        self.agents=[]
        for agent_config in config:
            self.agents.append(self.create(**agent_config))
This is the factory method pattern: a common design solution to the need to defer the construction of
daughter objects to a derived class.
There is no need to define an explicit base interface for the “Agent” concept in Python: anything that
responds to “simulate” and “interact” methods will do: this is our Agent concept.
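`create` is left for derived classes to define. A minimal sketch of one such derivation; `Bird` and `BirdModel` are hypothetical names, and `Bird` provides the `simulate` and `interact` methods that make up the Agent concept:

```python
class AgentModel(object):
    def __init__(self, config):
        self.agents = []
        for agent_config in config:
            self.agents.append(self.create(**agent_config))

    def create(self, **kwargs):
        # The factory method: derived classes decide what to build
        raise NotImplementedError

class Bird(object):
    """Hypothetical agent: anything with simulate() and interact() will do."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def simulate(self):
        pass

    def interact(self, other):
        pass

class BirdModel(AgentModel):
    def create(self, x, y):
        return Bird(x, y)

model = BirdModel([{"x": 0, "y": 1}, {"x": 2, "y": 3}])
print(len(model.agents))  # 2
```

The base class owns the construction loop; only the choice of which class to instantiate is deferred to `BirdModel`.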
26.16 Builder
In [6]: from mock import Mock
---------------------------------------------------------------------------
<ipython-input-7-20d53252b1a7> in <module>()
----> 1 yuml("[Director|Construct()]<>->[Builder| (a) BuildPart()], [Builder]^-[ConcreteBuilder| Bui
26.16.2 Builder example
Let’s continue our Agent Based modelling example.
There’s a lot more to defining a model than just adding agents of different kinds: we need to define
boundary conditions, specify wind speed or light conditions.
We could define all of this for an imagined advanced Model with a very very long constructor, with lots
of optional arguments:
In [8]: class Model(object):
    def __init__(self, xsize, ysize,
                 agent_count, wind_speed,
                 agent_sight_range, eagle_start_location):
        pass
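With a builder, the settings are collected one call at a time and the `Model` is only constructed at the end. A minimal sketch; `ModelBuilder`, its method names and its default values are hypothetical, and the `Model` here simply stores its arguments:

```python
class Model(object):
    def __init__(self, xsize, ysize, agent_count, wind_speed,
                 agent_sight_range, eagle_start_location):
        self.xsize, self.ysize = xsize, ysize
        self.agent_count = agent_count
        self.wind_speed = wind_speed
        self.agent_sight_range = agent_sight_range
        self.eagle_start_location = eagle_start_location

class ModelBuilder(object):
    """Collects settings step by step, then builds the Model in one go."""
    def __init__(self):
        # Hypothetical defaults, used for anything the caller does not set
        self.settings = dict(xsize=100, ysize=100, agent_count=0,
                             wind_speed=0.0, agent_sight_range=1.0,
                             eagle_start_location=(0, 0))

    def set_extent(self, xsize, ysize):
        self.settings.update(xsize=xsize, ysize=ysize)
        return self  # returning self allows chained calls

    def set_wind_speed(self, speed):
        self.settings["wind_speed"] = speed
        return self

    def add_agents(self, count):
        self.settings["agent_count"] += count
        return self

    def finish(self):
        return Model(**self.settings)

model = ModelBuilder().set_extent(50, 80).set_wind_speed(3.0).add_agents(10).finish()
print(model.xsize, model.wind_speed, model.agent_count)
```

Callers only mention the settings they care about, and the long constructor is called in exactly one place.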
26.16.6 Builder Message Sequence
Note: Need to add message sequence chart here
26.17 Strategy
In [12]: from numpy import linspace,exp,log,sqrt, array
         import math
         from scipy.interpolate import UnivariateSpline
         from scipy.signal import lombscargle
         from scipy.integrate import cumtrapz
         from numpy.fft import rfft,fft,fftfreq
         import csv
         from StringIO import StringIO
         from datetime import datetime
         import requests
         import matplotlib.pyplot as plt
In [15]: spots=load_sunspots()
         plt.plot(spots)
26.18.2 Sunspot cycle has periodicity
In [16]: spectrum=rfft(spots)
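To see why the spectrum reveals a periodicity, here is a sketch on synthetic data rather than the real sunspot series: an 11-year sine wave sampled once a year for 330 years (both numbers are assumptions, chosen so that the period falls exactly on a frequency bin). `numpy.fft.rfftfreq` gives the frequency, in cycles per sample, of each entry that `rfft` returns.

```python
import numpy as np

years = np.arange(330)                      # one sample per year
signal = np.sin(2 * np.pi * years / 11.0)   # synthetic 11-year cycle
spectrum = np.abs(np.fft.rfft(signal))
frequencies = np.fft.rfftfreq(len(signal), d=1.0)  # cycles per year

peak = np.argmax(spectrum[1:]) + 1  # skip the zero-frequency (mean) term
period = 1.0 / frequencies[peak]
print(round(period, 6))  # 11.0
```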
26.18.3 Years are not constant length
There’s a potential problem with this analysis, however: years are not of constant length, so the
samples are not evenly spaced in time.
We also want to find the period of the strongest periodic signal in the data. There are various methods
we could use for this too, such as integrating the Fourier series by quadrature to find the mean
frequency, or choosing the largest single value.
We could handle each of these choices by defining a derived class of our analyser for each method, but:
• The constructors for each derived class will need arguments for all the numerical methods’ control
parameters, such as the degree of spline for the interpolation method, the order of quadrature for
integrators, and so on.
• Where we have multiple algorithmic choices to make (interpolator, periodogram, peak finder. . . ), the
number of derived classes would explode: class SunspotAnalyzerSplineFFTTrapeziumNearMode is
a bit unwieldy.
• The algorithmic choices are then not available for other projects.
• This design doesn’t fit with a clean ontology of “kinds of things”: there’s no abstract base for
spectrogram generators. . .
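The alternative is to pass each algorithmic choice in as an object with a known method: the Strategy pattern. Here is a minimal sketch, with hypothetical class names and a deliberately simple choice to vary (how to locate the peak of a spectrum):

```python
class MaximumPeakFinder(object):
    """Hypothetical strategy: index of the single largest value."""
    def find(self, values):
        return max(range(len(values)), key=lambda i: values[i])


class WeightedMeanPeakFinder(object):
    """Hypothetical strategy: amplitude-weighted mean index."""
    def find(self, values):
        total = float(sum(values))
        return sum(i * v for i, v in enumerate(values)) / total


class SpectrumAnalyser(object):
    def __init__(self, peak_strategy):
        # The algorithm is passed in as an object, rather than chosen
        # by subclassing SpectrumAnalyser
        self.peak_strategy = peak_strategy

    def peak(self, values):
        return self.peak_strategy.find(values)


spectrum = [0, 1, 4, 1, 0]
print(SpectrumAnalyser(MaximumPeakFinder()).peak(spectrum))       # 2
print(SpectrumAnalyser(WeightedMeanPeakFinder()).peak(spectrum))  # 2.0
```

A new peak finder means one new class, not one per combination of choices, and the finder classes can be reused outside this analyser.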
    def load_data(self):
        start_date_str='1700-12-31'
        end_date_str='2014-01-01'
        self.start_date=self.format_date(start_date_str)
        end_date=self.format_date(end_date_str)
        url_base=("http://www.quandl.com/api/v1/datasets/"+
                  "SIDC/SUNSPOTS_A.csv")
        x=requests.get(url_base,params={'trim_start':start_date_str,
                                        'trim_end':end_date_str,
                                        'sort_order':'asc'})
        secs_per_year=(datetime(2014,1,1)-datetime(2013,1,1)
                      ).total_seconds()
        # Convert the requests result to look like a file buffer
        # before reading it with csv
        data=csv.reader(StringIO(x.text))
        data.next() # Skip header row
        self.series=Series([[
            (self.format_date(row[0])-self.start_date
            ).total_seconds()/secs_per_year
            ,float(row[1])] for row in data])
    def frequency_data(self):
        return self.frequency_strategy.transform(self.series)
26.18.12 Strategy Pattern for Algorithms
Define our concrete solutions with particular strategies
In [22]: fourier_model=AnalyseSunspotData(FourierSplineFrequencyStrategy())
         lomb_model=AnalyseSunspotData(LombFrequencyStrategy())
         nearest_model=AnalyseSunspotData(FourierNearestFrequencyStrategy())
In [23]: comparison=fourier_model.frequency_data().inverse_plot_data+['r']
         comparison+=lomb_model.frequency_data().inverse_plot_data+['g']
         comparison+=nearest_model.frequency_data().inverse_plot_data+['b']
         deviation=365*(fourier_model.series.times-linspace(
             fourier_model.series.start,
             fourier_model.series.end,
             fourier_model.series.count))
26.18.15 Deviation of year length from average
In [25]: plt.plot(deviation)
26.19 Model-View-Controller
26.19.1 Separate graphics from science!
Whenever we are coding a simulation or model we want to:
We often see scientific programs where the code which is used to display what is happening is mixed up
with the mathematics of the analysis. This is hard to understand.
We can do better by separating the Model from the View, and using a “Controller” to manage them.
26.19.2 Model
In [26]: import numpy as np
         class Model(object):
             def __init__(self):
                 self.positions=np.random.rand(100,2)
                 self.speeds=np.random.rand(100,2)+np.array([-0.5,-0.5])[np.newaxis,:]
                 self.deltat=0.01
             def simulation_step(self):
                 self.positions += self.speeds * self.deltat
             def agent_locations(self):
                 return self.positions
26.19.3 View
In [27]: class View(object):
             def __init__(self, model):
                 from matplotlib import pyplot as plt
                 self.figure=plt.figure()
                 axes=plt.axes()
                 self.model=model
                 self.scatter=axes.scatter(model.agent_locations()[:,0],
                                           model.agent_locations()[:,1])
             def update(self):
                 self.scatter.set_offsets(self.model.agent_locations())
26.19.4 Controller
In [28]: class Controller(object):
             def __init__(self):
                 self.model=Model() # Or use Builder
                 self.view=View(self.model)
                 def animate(frame_number):
                     self.model.simulation_step()
                     self.view.update()
                 self.animator=animate
             def go(self):
                 from JSAnimation import IPython_display
                 from matplotlib import animation
                 anim = animation.FuncAnimation(self.view.figure, self.animator,
                                                frames=200, interval=50)
                 return anim
In [29]: contl=Controller()
In [30]: from matplotlib import pyplot as plt
         %matplotlib inline
         contl.go()
For the Exercise, you should start from the GitHub repository, but here’s my terrible code:
In [1]: """
        A deliberately bad implementation of [Boids](http://dl.acm.org/citation.cfm?doid=37401.37406)
        for use as an exercise on refactoring.
        """
        import random
        # Deliberately terrible code for teaching purposes
        def update_boids(boids):
            xs,ys,xvs,yvs=boids
            # Fly towards the middle
            for i in range(len(xs)):
                for j in range(len(xs)):
                    xvs[i]=xvs[i]+(xs[j]-xs[i])*0.01/len(xs)
            for i in range(len(xs)):
                for j in range(len(xs)):
                    yvs[i]=yvs[i]+(ys[j]-ys[i])*0.01/len(xs)
            # Fly away from nearby boids
            for i in range(len(xs)):
                for j in range(len(xs)):
                    if (xs[j]-xs[i])**2 + (ys[j]-ys[i])**2 < 100:
                        xvs[i]=xvs[i]+(xs[i]-xs[j])
                        yvs[i]=yvs[i]+(ys[i]-ys[j])
            # Try to match speed with nearby boids
            for i in range(len(xs)):
                for j in range(len(xs)):
                    if (xs[j]-xs[i])**2 + (ys[j]-ys[i])**2 < 10000:
                        xvs[i]=xvs[i]+(xvs[j]-xvs[i])*0.125/len(xs)
                        yvs[i]=yvs[i]+(yvs[j]-yvs[i])*0.125/len(xs)
            # Move according to velocities
            for i in range(len(xs)):
                xs[i]=xs[i]+xvs[i]
                ys[i]=ys[i]+yvs[i]
        figure=plt.figure()
        axes=plt.axes(xlim=(-500,1500), ylim=(-500,1500))
        scatter=axes.scatter(boids[0],boids[1])
        def animate(frame):
            update_boids(boids)
            scatter.set_offsets(zip(boids[0],boids[1]))
cd bad_boids
python bad_boids.py
You should be able to see some birds flying around, and then disappearing as they leave the window.
26.20.2 Your Task
Transform bad boids gradually into better code, while making sure it still works, using a Refactoring
approach.
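As an illustration of one small, safe refactoring step, the sketch below rewrites the “fly towards the middle” loop for x using the mean position, and checks it against a copy of the original loop. This is a hypothetical example, not part of the exercise repository: the function names and the random test data are our own. The check uses a tolerance, because reordering floating-point arithmetic can change the last few bits of the result, so an exact comparison against a stored fixture may fail after a refactoring like this one.

```python
import random


def fly_towards_middle_bad(xs, xvs):
    # Copied from the "fly towards the middle" loop in bad_boids
    for i in range(len(xs)):
        for j in range(len(xs)):
            xvs[i] = xvs[i] + (xs[j] - xs[i]) * 0.01 / len(xs)


def fly_towards_middle(xs, xvs, strength=0.01):
    # Refactored: the double loop adds strength * (mean - own position)
    middle = sum(xs) / float(len(xs))
    for i in range(len(xs)):
        xvs[i] += (middle - xs[i]) * strength


def test_refactor_matches_original():
    random.seed(0)
    xs = [random.uniform(-450, 50) for _ in range(50)]
    xvs = [random.uniform(0, 10) for _ in range(50)]
    before, after = list(xvs), list(xvs)
    fly_towards_middle_bad(xs, before)
    fly_towards_middle(xs, after)
    for old, new in zip(before, after):
        # Reordered arithmetic rounds differently, so compare with a tolerance
        assert abs(old - new) < 1e-9


test_refactor_matches_original()
```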
import yaml
import boids
from copy import deepcopy
before=deepcopy(boids.boids)
boids.update_boids(boids.boids)
after=boids.boids
fixture={"before":before,"after":after}
fixture_file=open("fixture.yml",'w')
fixture_file.write(yaml.dump(fixture))
fixture_file.close()

import os
import yaml
from nose.tools import assert_equal
from boids import update_boids
def test_bad_boids_regression():
    regression_data=yaml.load(open(os.path.join(os.path.dirname(__file__),'fixture.yml')))
    boid_data=regression_data["before"]
    update_boids(boid_data)
    assert_equal(regression_data["after"],boid_data)

nosetests
Edit the file to make the test fail, see the fail, then reset it:
Chapter 27
All concepts, ideas, or instructions should be in the program in just one place. Every line in the program
should say something useful and important.
We refer to code that respects this principle as DRY code.
In this chapter, we’ll look at some techniques that can enable us to refactor away repetitive code.
Since in many of these places the techniques will involve working with functions as if they were
variables, we’ll learn some functional programming. We’ll also learn more about the innards of how Python
implements classes.
We’ll also think about how to write programs that generate the more verbose, repetitive program we
could otherwise write. We call this metaprogramming.
add(5,6)
Out[1]: 11
How could we do this, in a fictional version of Python which only defined functions of one argument? In
order to understand this, we’ll have to understand several of the concepts of functional programming. Let’s
start with a program which just adds five to something:
In [2]: def add_five(a):
            return a+5
        add_five(6)
Out[2]: 11
OK, we could define lots of these, one for each number we want to add. But that would be infinitely
repetitive. So, let’s try to metaprogram that: we want a function which returns these add_N() functions.
Let’s start with the easy case: a function which returns a function which adds 5 to something:
In [3]: def generate_five_adder():
            def _add_five(a):
                return a+5
            return _add_five
        coolfunction = generate_five_adder()
        coolfunction(7)
Out[3]: 12
OK, so what happened there? Well, we defined a function inside the other function. We can always do
that:
In [4]: def thirty_function():
            def times_three(a):
                return a*3
            def add_seven(a):
                return a+7
            return times_three(add_seven(3))
        thirty_function()
Out[4]: 30
When we do this, the functions enclosed inside the outer function are local functions, and can’t be seen
outside:
In [5]: with assert_raises(NameError):
            add_seven
There’s not really much of a difference between functions and other variables in Python. A function is
just a variable which can have () put after it to call the code!
for fun in x:
    print fun
And we know that one of the things we can do with a variable is return it. So we can return a function,
and then call it outside:
friendlyfunction=deferred_greeting()
# Do something else
print "Just passing the time..."
# OK, Go!
friendlyfunction()
So now, to finish this, we just need to return a function to add an arbitrary amount:
add_3=define_adder(3)
add_3(9)
Out[8]: 12
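The definition of define_adder isn't shown above. A minimal version, consistent with the calls above and with the closure discussion that follows (where increment is its local variable), might be:

```python
def define_adder(increment):
    def adder(a):
        # increment comes from the enclosing call: this is a closure
        return a + increment
    return adder


add_3 = define_adder(3)
print(add_3(9))            # 12
print(define_adder(8)(5))  # 13
```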
We can make this even prettier: let’s make another variable pointing to our define_adder() function:
In [9]: add=define_adder
In [10]: add(8)(5)
Out[10]: 13
27.2.2 Closures
You may have noticed something a bit weird:
In the definition of define_adder, increment is a local variable. It should have gone out of scope and
died at the end of the definition. How can the amount the returned adder function is adding still be kept?
This is called a closure. In Python, whenever a function definition references a variable in the surrounding
scope, that variable is preserved along with the function definition.
You can close over global module variables as well:
greet()
Hello, James
And note that the closure stores a reference to the variable in the surrounding scope: (“Late Binding”)
In [12]: name="Matt"
greet()
Hello, Matt
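Late binding has a well-known consequence when functions are created in a loop: they all see the loop variable's final value. A common workaround is to capture the current value as a default argument, since defaults are evaluated when the function is defined. The function names below are hypothetical.

```python
def make_multipliers_late():
    multipliers = []
    for factor in range(3):
        # Each lambda looks up factor when it is called, not when defined
        multipliers.append(lambda x: x * factor)
    return multipliers


def make_multipliers_early():
    multipliers = []
    for factor in range(3):
        # The default value is evaluated now, so each lambda keeps
        # its own copy of the current factor
        multipliers.append(lambda x, factor=factor: x * factor)
    return multipliers


print([f(10) for f in make_multipliers_late()])   # [20, 20, 20]
print([f(10) for f in make_multipliers_early()])  # [0, 10, 20]
```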
In [13]: numbers=range(10)
This map operation is really important conceptually when understanding efficient parallel programming:
different computers can apply the mapped function to their input at the same time. We call this Single
Program, Multiple Data (SPMD). map is half of the map-reduce functional programming paradigm,
which is key to the efficient operation of much of today’s “data science” explosion.
Let’s continue our functional programming mind-stretch by looking at reduce operations.
We very often want to loop with some kind of accumulator, such as when finding a mean, or finding a
maximum:
mean(range(10))
def my_max(data):
    # Start with the lowest possible value. (sys.float_info.min is the
    # smallest *positive* float, so it would give wrong answers for all-negative data.)
    highest=float('-inf')
    for x in data:
        if x>highest:
            highest=x
    return highest
my_max([2,5,10,-11,-5])
Out[15]: 10
These operations, where we have some variable which is building up a result, and the result is updated
with some operation, can be gathered together as a functional program, taking in the operation to be used
to combine results as an argument:
def my_sum(data):
    def _add(a,b):
        return a+b
    return accumulate(0, _add, data)
print my_sum(range(5))
def bigger(a,b):
    if b>a:
        return b
    return a
def my_max(data):
    return accumulate(float('-inf'), bigger, data)
print my_max([2,5,10,-11,-5])
10
10
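accumulate itself isn't defined above. A minimal version consistent with the calls accumulate(0, _add, data) and accumulate(..., bigger, data) is sketched below; the last line shows why a starting value of sys.float_info.min is risky for a maximum, since that is the smallest positive float, not the most negative number.

```python
import sys


def accumulate(initial, operation, data):
    # Fold the data into a running result, left to right
    result = initial
    for x in data:
        result = operation(result, x)
    return result


def bigger(a, b):
    if b > a:
        return b
    return a


print(accumulate(0, lambda a, b: a + b, range(5)))             # 10
print(accumulate(float("-inf"), bigger, [2, 5, 10, -11, -5]))  # 10
print(accumulate(sys.float_info.min, bigger, [-3, -1]))  # a tiny positive number, not -1
```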
Now, because these operations, bigger and add, are associative, i.e. (a+b)+c = a+(b+c), we could
apply our accumulation to the left half and the right half of the array, each on a
different computer, and then combine the two halves:
1+2+3+4=(1+2)+(3+4)
Indeed, with a bigger array, we can divide-and-conquer more times:
1+2+3+4+5+6+7+8=((1+2)+(3+4))+((5+6)+(7+8))
So with enough parallel computers, we could do this operation on eight numbers in three steps: first, we
use four computers to do one each of the pairwise adds.
Then, we use two computers to add the four totals.
Then, we use one of the computers to do the final add of the two last numbers.
You might be able to do the maths to see that with an N element list, the number of such steps is
proportional to the logarithm of N.
We say that with enough computers, reduction operations are O(ln N)
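We can simulate this divide-and-conquer scheme on one computer, counting the levels of pairwise combination needed. In the sketch below (tree_reduce is a hypothetical helper, not a library function), 8 numbers take 3 levels and 1024 numbers take 10, i.e. about log2 N levels. It assumes a non-empty input and an associative operation, and it runs the levels sequentially rather than in parallel.

```python
def tree_reduce(operation, data):
    """Combine neighbouring pairs level by level; return (result, levels)."""
    values = list(data)
    levels = 0
    while len(values) > 1:
        # Each pair in this level could be combined on a different computer
        paired = [operation(values[i], values[i + 1])
                  for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:  # an odd one out waits for the next level
            paired.append(values[-1])
        values = paired
        levels += 1
    return values[0], levels


print(tree_reduce(lambda a, b: a + b, range(1, 9)))     # (36, 3)
print(tree_reduce(lambda a, b: a + b, range(1, 1025)))  # (524800, 10)
```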
This course isn’t an introduction to algorithms, but we’ll talk more about this O() notation when we
think about programming for performance.
Anyway, this accumulate-under-an-operation process is so fundamental to computing that it’s usually
in standard libraries for languages which allow functional programming:
def my_max(data):
    return reduce(bigger,data,float('-inf'))
my_max([2,5,10,-11,-5])
Out[17]: 10
def most_Gs_in_any_sequence(sequences):
    return max(map(lambda sequence: sequence.count('G'),sequences))
data=[
    "CGTA",
    "CGGGTAAACG",
    "GATTACA"
]
most_Gs_in_any_sequence(data)
Out[18]: 4
def func_name(a,b,c):
    return a+b+c
most_of_given_base_in_any_sequence(data,'A')
Out[20]: 3
The above fragment defined a lambda function as a closure over base. If you understood that, you’ve
got it!
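The definition of most_of_given_base_in_any_sequence isn't shown above; a version matching the description (a lambda that closes over base) and the outputs shown would be:

```python
def most_of_given_base_in_any_sequence(sequences, base):
    # The lambda refers to base from the enclosing function: a closure
    return max(map(lambda sequence: sequence.count(base), sequences))


data = ["CGTA", "CGGGTAAACG", "GATTACA"]
print(most_of_given_base_in_any_sequence(data, 'A'))  # 3
print(most_of_given_base_in_any_sequence(data, 'G'))  # 4
```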
In [21]: def my_max(data): return reduce(lambda a,b: a if a>b else b, data,
                                         float('-inf'))
         my_max([2,5,10,-11,-5])
Out[21]: 10
xs=linspace(-1,2,50)
solved=[xs,map(solve_me,xs),xs,zeros(50)]
plt.plot(*solved)
1.0 -3.44190514264e-21
In [24]: def derivative(func, eps):
             def _func_derived(x):
                 return (func(x+eps)-func(x))/eps
             return _func_derived
         derived=(xs,map(solve_me,xs),xs,map(derivative(solve_me,0.01),xs))
         plt.plot(*derived)
         print newton(derivative(solve_me,0.01),0)
0.495
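import scipy.misc
def derivative(func):
    def _func_derived(x):
        # Differentiate the function we were given, not a fixed global one.
        # (scipy.misc.derivative was removed in SciPy 1.12.)
        return scipy.misc.derivative(func,x)
    return _func_derived
newton(derivative(solve_me),0)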
Out[25]: 0.5
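The same function-returning-a-function idea can be used to build a small root finder. The sketch below is a hypothetical minimal Newton-Raphson solver written for illustration, not SciPy's newton, and it reuses the forward-difference derivative from above. Note that a forward difference is biased: for x**2 - x it gives 2x - 1 + eps, which is zero at (1 - eps)/2 = 0.495 when eps = 0.01, rather than at 0.5; this may explain a result like the 0.495 above.

```python
def derivative(func, eps):
    # Forward-difference approximation, as in the derivative() above
    def _func_derived(x):
        return (func(x + eps) - func(x)) / eps
    return _func_derived


def newton(func, x0, eps=1e-6, tolerance=1e-12, max_steps=100):
    """Hypothetical minimal Newton-Raphson solver (not scipy's newton)."""
    slope = derivative(func, eps)
    x = x0
    for _ in range(max_steps):
        step = func(x) / slope(x)
        x -= step
        if abs(step) < tolerance:
            return x
    raise RuntimeError("Newton iteration did not converge")


def parabola(x):
    return x * x - 2.0


print(newton(parabola, 1.0))  # close to the square root of 2
```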
If you’ve done a moderate amount of calculus, then you’ll find similarities between functional program-
ming in computer science and Functionals in the calculus of variations.
In [1]: for key in baskets:
            print key.upper()
Surprisingly often, we want to iterate over something that takes a moderately large amount of memory to
store. For example, our map images in the green-graph example.
Our green-graph example involved making an array of all the maps between London and Birmingham.
This kept them all in memory at the same time: first we downloaded all the maps, then we counted the
green pixels in each of them.
This would NOT work if we used more points. We need to use a generator.
27.6 Iterators
Consider the basic Python range function:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [3]: total=0
        for x in range(int(1e6)): total+= x
        total
Out[3]: 499999500000
While this was executing, the range() statement allocated a million integers. This is very inefficient.
We don’t actually need a million integers at once, just each integer in turn up to a million.
xrange is like range, but yields an iterable which is not a list.
xrange(3)
0
1
2
An iterable like xrange(3) works, when we iterate over it, by providing an iterator: an object with a next()
method which moves the iterator forward:
In [6]: a=iter(xrange(3))
In [7]: a.next()
Out[7]: 0
In [8]: a.next()
Out[8]: 1
In [9]: a.next()
Out[9]: 2
In [10]: print a.next()
StopIteration
In [11]: total=0
         for x in xrange(int(1e6)): total+= x
         print total
499999500000
Similarly:
In [12]: baskets={
             'apples':5,
             'oranges':3,
             'kiwis':2
         }
In [13]: baskets.items()
In [14]: baskets.iteritems()
27.7 Defining Our Own Iterable
We can make our own iterators by defining classes that implement next() and __iter__() methods: this is the
iterator protocol.
For each of the concepts in Python, like sequence, container, iterable, Python defines a protocol: a set
of methods a class must implement in order to be treated as a member of that concept.
The iterator protocol is the protocol that defines things that support for x in y:.
To define an iterator, the methods that must be supported are next() and __iter__().
next() must update the iterator and return the next value, raising StopIteration when there are none left.
We’ll see why we need to define __iter__ in a moment.
    def __iter__(self):
        return self
    def next(self):
        (self.previous, self.current)=(
            self.current, self.previous+self.current)
        self.limit -=1
        if self.limit<0: raise StopIteration() # This will be
                                               # explained in a few slides!
        return self.current
In [16]: x=fib_iterator(5)
In [17]: x.next()
Out[17]: 2
In [18]: x.next()
Out[18]: 3
In [19]: x.next()
Out[19]: 5
In [20]: x.next()
Out[20]: 8
2
3
5
8
13
In [22]: sum(fib_iterator(5))
Out[22]: 31
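The fib_iterator class above is missing its constructor. Here is a complete version; the __init__ (a limit plus two seed values, both 1) is our own reconstruction, chosen so that it reproduces the outputs shown. The __next__ alias makes the same class work under Python 3, where the protocol method is spelled __next__.

```python
class fib_iterator(object):
    """Iterate over Fibonacci numbers; __init__ is a hypothetical reconstruction."""
    def __init__(self, limit, seed1=1, seed2=1):
        self.limit = limit
        self.previous = seed1
        self.current = seed2

    def __iter__(self):
        return self

    def next(self):
        (self.previous, self.current) = (
            self.current, self.previous + self.current)
        self.limit -= 1
        if self.limit < 0:
            raise StopIteration()
        return self.current

    __next__ = next  # Python 3 name for the same protocol method


print(list(fib_iterator(5)))  # [2, 3, 5, 8, 13]
print(sum(fib_iterator(5)))   # 31
```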
27.7.1 A shortcut to iterables: the __iter__ method.
In fact, if, to be iterated over, a class just wants to behave as if it were some other iterable, you can just
implement __iter__ and return iter(some_other_iterable), without implementing next. For example, an
image class might want to implement some metadata, but behave just as if it were just a 1-d pixel array
when being iterated:
class MyImage(object):
    def __init__(self, pixels):
        self.pixels=array(pixels,dtype='uint8')
        self.channels=self.pixels.shape[2]
    def __iter__(self):
        # return an iterator over the pixels
        # See future NumPy lecture for using reshape
        return iter(self.pixels.reshape(-1,self.channels))
    def show(self):
        plt.imshow(self.pixels, interpolation="None")
x=[[[255,255,0],[0,255,0]],[[0,0,255],[255,255,255]]]
image=MyImage(x)
In [25]: image.channels
Out[25]: 3
In [26]: from webcolors import rgb_to_name
         for pixel in image:
             print rgb_to_name(pixel)
The iterator protocol is to implement both __iter__ and next, while the iterable protocol is to implement
__iter__ and return something iterable.
27.7.2 Generators
There’s a fair amount of “boiler-plate” in the above class-based definition of an iterable.
Python provides another way to specify something which meets the iterator protocol: generators.
x=my_generator()
print x.next()
print x.next()
with assert_raises(StopIteration):
    print x.next()
5
10
A function which has yield statements instead of a return statement returns temporarily: it automag-
ically becomes something which implements next.
Each call of next() returns control to the function where it left off.
Control passes back-and-forth between the generator and the caller. Our Fibonacci example therefore
becomes a function rather than a class.
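A generator version of the sequence is sketched below. The name yield_fibs matches the call in the next cell, but its signature (a limit and two seed values) is our own assumption, chosen to produce the same sequence 2, 3, 5, 8, 13 as fib_iterator(5); with a limit of 5 it gives the sum of 31.

```python
def yield_fibs(limit, seed1=1, seed2=1):
    # The state lives in local variables that survive between yields
    current = seed1
    latest = seed2
    while limit > 0:
        limit -= 1
        current, latest = latest, current + latest
        yield latest


print(list(yield_fibs(5)))  # [2, 3, 5, 8, 13]
print(sum(yield_fibs(5)))   # 31
```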
31
In [30]: plt.plot(list(yield_fibs(20)))
In [31]: with open('example.yaml') as foo:
             print yaml.load(foo)
How could we define our own one of these, if we too have clean-up code we always want to run after a
calling function has done its work, or set-up code we want to do first?
We can define a class that meets an appropriate protocol:
with verbose_context("James"):
    print "Doing it!"
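The class itself isn't shown above. A context manager class implements __enter__ (the set-up code, whose return value is bound by with ... as) and __exit__ (the clean-up code, which runs even if the block raises). This sketch is our own reconstruction, matching the usage above and the messages printed by the generator version.

```python
class verbose_context(object):
    """Hypothetical class-based context manager matching the usage above."""
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        # Set-up code: runs when the with block starts
        print("Get ready for action, " + self.name)
        return self.name.upper()  # bound by "with ... as"

    def __exit__(self, exc_type, exc_value, traceback):
        # Clean-up code: runs when the block ends, even after an exception
        print("You did it")
        return False  # don't suppress any exception


with verbose_context("James") as shout:
    print("Doing it! " + shout)
```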
However, this is pretty verbose! Again, a generator with yield makes for an easier syntax:
from contextlib import contextmanager
@contextmanager
def verbose_context(name):
    print "Get ready for action, ", name
    yield name.upper()
    print "You did it"
27.8.2 Decorators
When doing functional programming, we may often want to define mutator functions which take in one
function and return a new function, such as our derivative example earlier.
In [34]: def repeater(count):
             def wrap_function_in_repeat(func):
                 def _repeated(x):
                     counter=count
                     while counter>0:
                         counter-=1
                         x=func(x)
                     return x
                 return _repeated
             return wrap_function_in_repeat
         fiftytimes=repeater(50)
         fiftyroots=fiftytimes(sqrt)
         print fiftyroots(100)
1.0
It turns out that, quite often, we want to apply one of these to a function as we’re defining a class. For
example, we may want to specify that after certain methods are called, data should always be stored:
class SomeClass(object):
    def __init__(self):
        self.data=[]
        self.stored_data=[]
    def _step1(self, ins):
        self.data=[x*2 for x in ins]
    step1=reset_required(_step1)
In [36]: x=SomeClass()
In [37]: x.step1("Hello")
         print x.data
In [38]: x.step1("World")
         print x.stored_data
[[’HH’, ’ee’, ’ll’, ’ll’, ’oo’], [’WW’, ’oo’, ’rr’, ’ll’, ’dd’]]
Python provides some “syntactic sugar” to make this kind of coding prettier:
In [40]: def reset_required(func):
             def _with_data_save(self, *args):
                 func(self,*args)
                 self.stored_data.append(self.data)
             return _with_data_save
         class SomeClass(object):
             def __init__(self):
                 self.data=[]
                 self.stored_data=[]
             @reset_required
             def step1(self, ins):
                 self.data=[x*2 for x in ins]
         x=SomeClass()
         x.step1("Hello")
         x.step1("World")
         print x.stored_data
[[’HH’, ’ee’, ’ll’, ’ll’, ’oo’], [’WW’, ’oo’, ’rr’, ’ll’, ’dd’]]
Any function which accepts a function as its first argument and returns a function can be used as a
decorator like this.
Much of Python’s standard functionality is implemented as decorators: we’ve seen @contextmanager,
@classmethod and @property. The @contextmanager metafunction, for example, takes in a generator function, and
returns a function whose results conform to the context manager protocol.
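Python's standard library helps with writing such decorators: functools.wraps copies the wrapped function's name and docstring onto the wrapper, which would otherwise report the inner function's name instead. The count_calls decorator below is a hypothetical example.

```python
import functools


def count_calls(func):
    """Hypothetical decorator: count how often func is called."""
    @functools.wraps(func)  # keep func's name and docstring on the wrapper
    def _counted(*args, **kwargs):
        _counted.calls += 1
        return func(*args, **kwargs)
    _counted.calls = 0
    return _counted


@count_calls
def square(x):
    """Return x squared."""
    return x * x


square(3)
square(4)
print(square.calls)     # 2
print(square.__name__)  # square
```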
def test_greeter():
    with open(os.path.join(os.path.dirname(
        __file__),'fixtures','samples.yaml')
    ) as fixtures_file:
        fixtures=yaml.load(fixtures_file)
        for fixture in fixtures:
            yield assert_exemplar(**fixture)
Each time a function beginning with test does a yield, it results in another test.
with assert_raises(AttributeError):
    x=2
    x.foo()
We can now see how nose might have implemented this:
from contextlib import contextmanager
@contextmanager
def reimplement_assert_raises(exception):
    try:
        yield
    except exception:
        pass
    else:
        raise Exception("Expected " + exception.__name__ +
                        " to be raised, nothing was.")
@raises(TypeError, ValueError)
def test_raises_type_error():
    raise TypeError("This test passes")
In [46]: test_raises_type_error()
In [47]: @raises(Exception)
         def test_that_fails_by_passing():
             pass
In [48]: test_that_fails_by_passing()
We could reimplement this ourselves now too:
# Return it
return _output
return wrap_function
In [50]: @homemade_raises_decorator(TypeError)
         def test_raises_type_error():
             raise TypeError("This test passes")
In [51]: test_raises_type_error()
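Most of the body of homemade_raises_decorator is missing above. A minimal sketch of how it might look, handling a single exception type (unlike nose's @raises, which accepts several); the example function names are our own, chosen not to start with test so that a test runner won't collect the deliberately failing one:

```python
import functools


def homemade_raises_decorator(exception):
    """Hypothetical reimplementation of a nose-style @raises, one exception type."""
    def wrap_function(func):
        @functools.wraps(func)
        def _output(*args, **kwargs):
            try:
                func(*args, **kwargs)
            except exception:
                return  # the expected exception was raised: pass
            # Reaching here means nothing was raised: fail
            raise AssertionError(func.__name__ + " did not raise " +
                                 exception.__name__)
        # Return it
        return _output
    return wrap_function


@homemade_raises_decorator(TypeError)
def example_raises_type_error():
    raise TypeError("This test passes")


@homemade_raises_decorator(TypeError)
def example_that_fails_by_passing():
    pass


example_raises_type_error()  # passes silently
```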
27.9 Exceptions
27.9.1 Exceptions
When we learned about testing, we saw that Python complains when things go wrong by raising an “Exception”
naming a type of error:
In [1]: with assert_raises(ZeroDivisionError):
            1/0
Exceptions are objects, forming a class hierarchy. We just raised an instance of the ZeroDivisionError
class, making the program crash.
Out[2]: (ZeroDivisionError,
ArithmeticError,
StandardError,
Exception,
BaseException,
object)
So we can see that a zero division error is a particular kind of Arithmetic Error.
In [3]: x=1
        with assert_raises(TypeError):
            for y in x: print y
        inspect.getmro(TypeError)
When we were looking at testing, we saw that it is important for code to crash with a meaningful
exception type when something is wrong. We raise an Exception with raise. Often, we can look for an
appropriate exception from the standard set to raise.
However, we may want to define our own exceptions. Doing this is as simple as inheriting from Exception:
In [4]: class MyCustomErrorType(Exception):
            pass
        with assert_raises(MyCustomErrorType):
            raise(MyCustomErrorType("Problem"))
    def __str__(self):
        return "Error, category " + str(self.category)
with assert_raises(MyCustomErrorType):
    raise(MyCustomErrorType(404))
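The class definition for the 404 example is only partly visible above. A complete sketch of a custom exception that carries extra data, assuming the stored value is called category as in the fragment above:

```python
class MyCustomErrorType(Exception):
    """Hypothetical custom exception carrying an error category."""
    def __init__(self, category=None):
        Exception.__init__(self, category)
        self.category = category

    def __str__(self):
        return "Error, category " + str(self.category)


try:
    raise MyCustomErrorType(404)
except MyCustomErrorType as error:
    print(error)           # Error, category 404
    print(error.category)  # 404
```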
The real power of exceptions comes, however, not in letting them crash the program, but in letting your
program handle them. We say that an exception has been “thrown” and then “caught”.
print user
anonymous
Note that we specify only the error we expect to happen and want to handle. Sometimes you see code
that catches everything:
In [7]: try:
            config=yaml.lod(open("datasource.yaml"))
            user=config["userid"]
            password=config["password"]
        except:
            user="anonymous"
            password=None
        print user
There was a mistyped function name there, but we did not notice the error, as the generic except caught
it. Therefore, we should catch only the error we want.
with open('datasource3.yaml','w') as outfile:
    outfile.write('user: jamespjh\n')
    outfile.write('password: secret\n')
def read_credentials(source):
    try:
        datasource=open(source)
        config=yaml.load(datasource)
        user=config["userid"]
        password=config["password"]
        datasource.close()
    except IOError:
        user="anonymous"
        password=None
    return user, password
print read_credentials('datasource2.yaml')
print read_credentials('datasource.yaml')
with assert_raises(KeyError):
    print read_credentials('datasource3.yaml')
('jamespjh', 'secret')
('anonymous', None)
This last code has a flaw: the file was successfully opened and the missing key was noticed, but the file was
not explicitly closed. It’s normally OK, as Python will close the file as soon as it notices there are no longer
any references to datasource in memory, after the function exits. But this is not good practice: you should
keep a file handle for as short a time as possible.
        password=None
    finally:
        datasource.close()
    return user, password
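One way to fix this is to put the close() call in a finally clause, which runs however the try block is left. The sketch below is our own version, not the original cell: it reads JSON with the standard json module instead of YAML so that it needs no third-party package, and it opens the file before the try/finally so that finally never refers to a file that failed to open.

```python
import json


def read_credentials(source):
    try:
        datasource = open(source)
    except IOError:  # e.g. the file doesn't exist
        return "anonymous", None
    try:
        config = json.load(datasource)
        return config["userid"], config["password"]
    finally:
        # Runs whether we return normally or a KeyError escapes
        datasource.close()
```

A with open(source) as datasource: block would close the file in exactly the same way, and is usually preferred.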
Exceptions do not have to be caught close to the part of the program calling them. They can be caught
anywhere “above” the calling point in the call stack: control can jump arbitrarily far in the program: up to
the except clause of the “highest” containing try statement.
In [14]: def f1(x):
             try:
                 print "F1Before"
                 f2(x)
                 print "F1After"
             except TypeError:
                 print "F1Except"
In [15]: f1(0)
F1Before
F2Before
F3Before
F3After
F2After
F1After
In [16]: f1(1)
F1Before
F2Before
F3Before
F3Except
F2After
F1After
In [17]: f1(2)
F1Before
F2Before
F3Before
F2Except
F1After
In [18]: f1(3)
F1Before
F2Before
F3Before
F1Except
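The definitions of f2 and f3 aren't shown above. The sketch below is a hypothetical reconstruction that reproduces all four traces: f3 handles KeyError itself, f2 handles any ArithmeticError, and f1 handles TypeError. It records the steps in a list rather than printing them, so they are easy to check.

```python
trace = []  # records which lines ran, instead of printing


def f1(x):
    try:
        trace.append("F1Before")
        f2(x)
        trace.append("F1After")
    except TypeError:
        trace.append("F1Except")


def f2(x):
    try:
        trace.append("F2Before")
        f3(x)
        trace.append("F2After")
    except ArithmeticError:
        trace.append("F2Except")


def f3(x):
    try:
        trace.append("F3Before")
        if x == 1:
            raise KeyError(x)          # caught here, in f3
        if x == 2:
            raise ZeroDivisionError()  # an ArithmeticError: caught in f2
        if x == 3:
            raise TypeError(x)         # only f1 catches this
        trace.append("F3After")
    except KeyError:
        trace.append("F3Except")


for x in range(4):
    del trace[:]
    f1(x)
    print("%d %s" % (x, trace))
```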
    if type(source)==dict:
        name=source['modelname']
    else:
        content=open(source)
        source=yaml.load(content)
        name=source['modelname']
    print name
In [20]: analysis({'modelname':'Super'})
Super
In [22]: analysis('example.yaml')
brilliant
analysis('example.yaml')
brilliant
This approach is more extensible, and behaves properly if we give it some other data-source
which responds like a dictionary or string.
analysis("modelname: Amazing")
Amazing
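The improved version of analysis isn't shown above. Here is a sketch of the “ask forgiveness, not permission” approach: treat the source as a dictionary, and fall back only if that raises TypeError, which is what indexing a string with a string key does. The parse_key_values helper is a hypothetical stand-in for yaml.load that understands only simple key: value lines, and this sketch doesn't handle file names.

```python
def parse_key_values(text):
    """Hypothetical stand-in for yaml.load: parse 'key: value' lines."""
    result = {}
    for line in text.splitlines():
        key, value = line.split(":", 1)
        result[key.strip()] = value.strip()
    return result


def analysis(source):
    try:
        # Ask forgiveness: assume source behaves like a dictionary...
        name = source["modelname"]
    except TypeError:
        # ...a string can't be indexed by "modelname", so parse it instead
        name = parse_key_values(source)["modelname"]
    return name


print(analysis({"modelname": "Super"}))  # Super
print(analysis("modelname: Amazing"))    # Amazing
```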
Sometimes we want to catch an error, partially handle it, perhaps add some extra data to the exception,
and then re-raise to be caught again further up the call stack.
The keyword “raise” with no argument in an except: clause will cause the caught error to be re-thrown.
Doing this is the only circumstance where it is safe to do except: without catching a specific type of error.
In [25]: try:
             # Something
             pass
         except:
             # Do this code here if anything goes wrong
             raise
It can be useful to catch and re-throw an error as you go up the chain, doing any clean-up needed for
each layer of a program.
The error will finally be caught and not re-thrown only at a higher program layer that knows how to
recover. This is known as the “throw low catch high” principle.
Imagine we wanted to make a library to describe some kind of symbolic algebra system:
class Expression(object):
    def __init__(self, terms): self.terms=terms
In [3]: first=Term(['x','y'],[2,1],5)
        second=Term(['x'],[1],7)
        third=Term([],[],2)
        result=Expression([first, second, third])
This is pretty cumbersome.
What we’d really like is to have 2x+y give an appropriate expression.
First, we’ll define things so that we can construct our terms and expressions in different ways.
In [6]: @extend(Term)
        class Term(object):
            def add(self, *others):
                return Expression((self,)+others)
In [7]: @extend(Term)
        class Term(object):
            def multiply(self, *others):
                result_data=dict(self.data)
                result_coeff=self.coefficient
                # Convert arguments to Terms first if they are
                # constants or integers
                others=map(Term,others)
                for another in others:
                    for symbol, exponent in another.data.iteritems():
                        if symbol in result_data:
                            result_data[symbol]+=another.data[symbol]
                        else:
                            result_data[symbol]=another.data[symbol]
                    result_coeff*=another.coefficient
                return Term(result_data,result_coeff)
In [8]: @extend(Expression)
        class Expression(object):
            def add(self, *others):
                # Copy the list, so adding terms doesn't change self
                result=Expression(list(self.terms))
                for another in others:
                    if type(another)==Term:
                        result.terms.append(another)
                    else:
                        result.terms+=another.terms
                return result
In [9]: x=Term('x')
        y=Term('y')
        first=Term(5).multiply(Term('x'),Term('x'),Term('y'))
        second=Term(7).multiply(Term('x'))
        third=Term(2)
        expr=first.add(second,third)
This is better, but we still can’t write the expression in a ‘natural’ way.
However, we can define what * and + do when applied to Terms!:
In [10]: @extend(Term)
         class Term(object):
             def __add__(self, other):
                 return self.add(other)
             def __mul__(self, other):
                 return self.multiply(other)
In [11]: @extend(Expression)
         class Expression(object):
             def multiply(self, another):
                 # Distributive law left as exercise
                 pass
In [12]: x_plus_y=Term('x')+'y'
         x_plus_y.terms[0].data
Out[12]: {'x': 1}
In [13]: five_x_ysq=Term('x')*5*'y'*'y'
         print five_x_ysq.data, five_x_ysq.coefficient
{'y': 2, 'x': 1} 5
This is called operator overloading. We can define what add and multiply mean when applied to our
class.
Note that so far this only works if our Term is on the left-hand side of the operator! However, we can define a
multiplication that works backwards, which Python uses as a fallback when the left-hand operand doesn’t know how to
multiply by our object:
In [14]: @extend(Expression)
         class Expression(object):
             def __radd__(self, other):
                 return self.__add__(other)
In [15]: @extend(Term)
         class Term(object):
             def __rmul__(self, other):
                 return self.__mul__(other)
             def __radd__(self, other):
                 return self.__add__(other)
It’s not easy at the moment to see if these things are working!
In [17]: fivex=5*Term('x')
print fivex.data, fivex.coefficient
{’x’: 1} 5
We can add another operator method, __str__, which defines what happens if we try to print our class:
In [18]: @extend(Term)
         class Term(object):
             def __str__(self):
                 def symbol_string(symbol, power):
                     if power==1:
                         return symbol
                     else:
                         return symbol+'^'+str(power)
                 symbol_strings=[symbol_string(symbol, power)
                                 for symbol, power in self.data.iteritems()]
                 prod='*'.join(symbol_strings)
                 if not prod:
                     return str(self.coefficient)
                 if self.coefficient==1:
                     return prod
                 else:
                     return str(self.coefficient)+'*'+prod
In [19]: @extend(Expression)
         class Expression(object):
             def __str__(self):
                 return '+'.join(map(str,self.terms))
In [20]: first=Term(5)*'x'*'x'*'y'
         second=Term(7)*'x'
         third=Term(2)
         expr=first+second+third
5*y*x^2+7*x+2
We can add lots more operators to classes: __eq__ to determine if objects are equal, __getitem__ to apply
[1] to your object. Probably the most exciting one is __call__, which overrides the () operator and allows us to
define classes that behave like functions! We call these callables.
greeter_instance = Greeter("Hello")
greeter_instance("James")
Hello James
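The Greeter class isn't shown above; a minimal sketch of a callable class (our own version, which returns the greeting rather than printing it) might be:

```python
class Greeter(object):
    """Hypothetical callable class: instances behave like functions."""
    def __init__(self, greeting):
        self.greeting = greeting

    def __call__(self, name):
        # Invoked by greeter_instance(name)
        return self.greeting + " " + name


greeter_instance = Greeter("Hello")
print(greeter_instance("James"))   # Hello James
print(callable(greeter_instance))  # True
```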
We’ve now come full circle in the blurring of the distinction between functions and objects! The full
power of functional programming is really remarkable.
If you want to know more about the topics in this lecture, using a different language syntax, I recommend
you watch the Abelson and Sussman “Structure and Interpretation of Computer Programs” lectures. These
are the Computer Science equivalent of the Feynman Lectures!
In [1]: bananas=0
apples=0
oranges=0
bananas+=1
apples+=1
oranges+=1
The right hand side of these assignments doesn’t respect the DRY principle. We could of course define
a variable for our initial value:
In [2]: initial_fruit_count=0
bananas=initial_fruit_count
apples=initial_fruit_count
oranges=initial_fruit_count
However, this is still not as DRY as it could be: what if we wanted to replace the assignment with, say,
a class constructor and a buy operation:
bananas=Basket()
apples=Basket()
oranges=Basket()
bananas.buy()
apples.buy()
oranges.buy()
We had to make the change in three places. Whenever you see a situation where a refactoring or change
of design might require you to change the code in multiple places, you have an opportunity to make the code
DRYer.
In this case, metaprogramming would let us replace the repeated statements with a single loop over the names of all the variables we want to initialise.
So can we declare a new variable programmatically? Given a list of the names of fruit baskets we want,
initialise a variable with that name?
globals()['apples']
Wow, we can! Every module or class in Python is, under the hood, backed by a special dictionary storing the values in its namespace. So we can create new variables by assigning to this dictionary: globals() gives a reference to the attribute dictionary for the current module.
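The cells that define Basket and create the baskets are not shown. Here is a minimal sketch in Python 3 syntax that creates module-level variables from a list of names by writing to globals(). It assumes a hypothetical Basket class with a count attribute and a buy method:

```python
class Basket(object):
    """Hypothetical fruit basket: counts how many items were bought."""

    def __init__(self):
        self.count = 0

    def buy(self):
        self.count += 1


basket_names = ['bananas', 'apples', 'oranges', 'kiwis']

# Writing to the module's namespace dictionary creates ordinary global variables
for name in basket_names:
    globals()[name] = Basket()

for name in basket_names:
    globals()[name].buy()

print(kiwis.count)  # prints: 1, since 'kiwis' now exists as a normal variable
```

This only works for module-level (global) variables. Inside a function, the same trick does not create local variables.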
print kiwis.count
This is metaprogramming.
I would NOT recommend using it for an example as trivial as the one above. A better, more Pythonic
choice here would be to use a data structure to manage your set of fruit baskets:
In [8]: baskets={}
for name in basket_names:
    baskets[name]=Basket()
print baskets['kiwis'].count
This is the nicest way to do this, I think. Code that seems to need metaprogramming to make it less repetitive can often be DRYed up instead by refactoring the data structure, in a way that is cleaner and easier to understand. Nevertheless, metaprogramming is worth knowing.
In [11]: x=Boring()
x.name="James"
In [12]: x.name
Out[12]: 'James'
And these turn up, as expected, in an attribute dictionary for the instance:
In [13]: x.__dict__
Out[14]: 'James'
If we want to add an attribute given its name as a string, we can use setattr:
In [15]: setattr(x,'age',38)
x.age
Out[15]: 38
And we could do this in a loop to programmatically add many attributes.
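For example, here is a minimal sketch in Python 3 syntax that copies a dictionary of values onto an object with setattr and reads one back with getattr. The Boring class here is a hypothetical empty class standing in for the one used in this section:

```python
class Boring(object):
    """Hypothetical empty class: attributes are added from outside."""


x = Boring()
details = {'name': 'James', 'age': 38}

# setattr(obj, 'attr', value) does the same as obj.attr = value
for attribute, value in details.items():
    setattr(x, attribute, value)

print(x.age)                  # prints: 38
print(getattr(x, 'name'))     # prints: James
print(x.__dict__ == details)  # prints: True
```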
The real power of accessing the attribute dictionary comes when we realise that there is very little
difference between member data and member functions.
Now that we know, from our functional programming, that a function is just a variable that can
be called with (), we can set an attribute to a function, and it becomes a member function!
In [17]: x.describe()
In [18]: x.describe
In [19]: Boring.describe
Note that we set this method as an attribute of the class, not the instance, so it is available to other
instances of Boring:
In [20]: y=Boring()
y.name = 'Jim'
y.age = 99
In [21]: y.describe()
We can define a standalone function, and then bind it to the class. Its first argument automagically becomes self.
In [23]: Boring.birth_year=broken_birth_year
In [24]: x.birth_year()
Out[24]: 1977
In [25]: x.birth_year
Out[25]: <bound method Boring.broken_birth_year of <__main__.Boring object at 0x106085350>>
In [26]: x.birth_year.__name__
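The definitions of describe and broken_birth_year are not shown. Here is a minimal sketch in Python 3 syntax of binding a standalone function to a class. The function name is hypothetical, and its arithmetic assumes the current year is 2015, which is consistent with the age 38 → 1977 output above:

```python
class Boring(object):
    """Hypothetical empty class."""


def standalone_birth_year(self):
    # 'self' is filled in automatically when called through an instance.
    # Assumes the current year is 2015 (ignores whether the birthday has passed).
    return 2015 - self.age


# Attach the function to the class, not to an instance
Boring.birth_year = standalone_birth_year

x = Boring()
x.age = 38
print(x.birth_year())         # prints: 1977
print(x.birth_year.__name__)  # prints: standalone_birth_year

y = Boring()                  # other instances get the method too
y.age = 99
print(y.birth_year())         # prints: 1916
```

The bound method keeps the name of the function it was created from, which is why the output above still mentions broken_birth_year.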
27.11.2 Metaprogramming function locals
We can access the attribute dictionary for the local namespace inside a function with locals(), but writing to this dictionary does not create or change local variables.
Lack of safe programmatic creation of function-local variables is a flaw in Python.
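Here is a small sketch in Python 3 syntax showing why writing to locals() does not help. In CPython, the names a function treats as local are fixed when the function is compiled:

```python
def try_to_create_local():
    # Writing to the dictionary returned by locals() does not create a variable
    locals()['fruit'] = 'kiwi'
    try:
        return fruit  # compiled as a global lookup, since 'fruit' is never assigned here
    except NameError:
        return None


print(try_to_create_local())  # prints: None
```

By contrast, at module level globals() is the real namespace dictionary, so assigning to it does create variables.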
In [29]: me.name
Out[29]: 'James'
Sometimes, metaprogramming will be really helpful in making non-repetitive code, and you should have it in your toolbox, which is why I’m teaching it to you. But doing it all the time overcomplicates matters.
We’ve talked a lot about the DRY principle, but there is another equally important principle:
Whenever you write code and you think, “Gosh, I’m really clever”, you’re probably doing it wrong. Code should be about clarity, not showing off.
Chapter 28
Performance programming
We’ve spent most of this course looking at how to make code readable and reliable. For research work, it is
often also important that code is efficient: that it does what it needs to do quickly.
It is very hard to work out beforehand whether code will be efficient or not: it is essential to profile code, measuring its performance to determine which parts of it are slow.
When we looked at Functional programming, we claimed that code which is conceptualised in terms of
actions on whole data-sets rather than individual elements is more efficient. Let’s measure the performance
of some different ways of implementing some code and see how they perform.
In [2]: xmin=-1.5
ymin=-1.0
xmax=0.5
ymax=1.0
resolution=300
xstep=(xmax-xmin)/resolution
ystep=(ymax-ymin)/resolution
xs=[(xmin+(xmax-xmin)*i/resolution) for i in range(resolution)]
ys=[(ymin+(ymax-ymin)*i/resolution) for i in range(resolution)]
In [3]: %%timeit
data=[[mandel1(complex(x,y)) for x in xs] for y in ys]
In [4]: data1=[[mandel1(complex(x,y)) for x in xs] for y in ys]
In this lesson we will learn how to make a version of this code which runs ten times faster:
return diverged_at_count
In [7]: ymatrix,xmatrix=np.mgrid[ymin:ymax:ystep,xmin:xmax:xstep]
values=xmatrix+1j*ymatrix
data_numpy=mandel_numpy(values)
Out[8]: <matplotlib.image.AxesImage at 0x10537e110>
In [9]: %%timeit
data_numpy=mandel_numpy(values)
10 loops, best of 3: 51.3 ms per loop
Note we get the same answer:
In [10]: sum(sum(abs(data_numpy-data1)))
Out[10]: 0.0
In [1]: xmin=-1.5
ymin=-1.0
xmax=0.5
ymax=1.0
resolution=300
xstep=(xmax-xmin)/resolution
ystep=(ymax-ymin)/resolution
xs=[(xmin+(xmax-xmin)*i/resolution) for i in range(resolution)]
ys=[(ymin+(ymax-ymin)*i/resolution) for i in range(resolution)]
In [2]: def mandel1(position,limit=50):
    value=position
    while abs(value)<2:
        limit-=1
        value=value**2+position
        if limit<0:
            return 0
    return limit
In [3]: data1=[[mandel1(complex(x,y)) for x in xs] for y in ys]
28.2 Many Mandelbrots
Let’s compare our naive Python implementation, which used a list comprehension and took 662 ms, with the following:
In [4]: %%timeit
data2=[]
for y in ys:
    row=[]
    for x in xs:
        row.append(mandel1(complex(x,y)))
    data2.append(row)
In [5]: data2=[]
for y in ys:
    row=[]
    for x in xs:
        row.append(mandel1(complex(x,y)))
    data2.append(row)
Interestingly, not much difference. I would have expected this to be slower, due to the normally high cost
of appending to data.
We ought to be checking if these results are the same by comparing the values in a test, rather than
re-plotting. This is cumbersome in pure Python, but easy with NumPy, so we’ll do this later.
Let’s try a pre-allocated data structure:
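The cell that allocates data3 is not shown. Here is a minimal sketch in Python 3 syntax, assuming data3 is a list of lists of zeros with one row per y value. Note that the tempting [[0]*n]*m shortcut would make every row the same list object:

```python
resolution = 300
xs = [-1.5 + 2.0 * i / resolution for i in range(resolution)]
ys = [-1.0 + 2.0 * i / resolution for i in range(resolution)]

# One independent row per y value, each long enough for every x value
data3 = [[0] * len(xs) for _ in ys]

# Pitfall: this repeats a reference to ONE row, so all rows alias each other
aliased = [[0] * len(xs)] * len(ys)
aliased[0][0] = 1
print(aliased[1][0])  # prints: 1, because every row is the same list
```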
In [8]: %%timeit
for j,y in enumerate(ys):
    for i,x in enumerate(xs):
        data3[j][i]=mandel1(complex(x,y))
In [10]: plt.imshow(data3,interpolation='none')
In [11]: %%timeit
data4=[]
for y in ys:
    bind_mandel=lambda x: mandel1(complex(x,y))
    data4.append(map(bind_mandel,xs))
1 loops, best of 3: 1.04 s per loop
In [12]: data4=[]
for y in ys:
    bind_mandel=lambda x: mandel1(complex(x,y))
    data4.append(map(bind_mandel,xs))
In [13]: plt.imshow(data4,interpolation='none')
In [1]: xmin=-1.5
ymin=-1.0
xmax=0.5
ymax=1.0
resolution=300
xstep=(xmax-xmin)/resolution
ystep=(ymax-ymin)/resolution
xs=[(xmin+(xmax-xmin)*i/resolution) for i in range(resolution)]
ys=[(ymin+(ymax-ymin)*i/resolution) for i in range(resolution)]
Chapter 29
NumPy
Numerical Python, NumPy, is a library that lets us work with floating point data much faster than ordinary Python.
In [3]: np.zeros([3,4,2])
Out[3]: array([[[ 0., 0.],
        [ 0., 0.],
        [ 0., 0.],
        [ 0., 0.]],
       [[ 0., 0.],
        [ 0., 0.],
        [ 0., 0.],
        [ 0., 0.]],
       [[ 0., 0.],
        [ 0., 0.],
        [ 0., 0.],
        [ 0., 0.]]])
In [4]: np.ndarray([2,2,2],dtype='int')
[[ 140264375108176, 4523476336],
[ 4551329752, 0]]])
We can convert a Python sequence, such as a list or an xrange, into an ndarray with the np.array constructor (for a generator, use np.fromiter instead):
In [5]: x=np.array(xrange(5))
print x
[0 1 2 3 4]
But NumPy arrays can only contain one type of data, unlike Python lists, which is one source of their
speed:
In [6]: np.array([1,1.0,'one'])
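The output of that cell is not shown. Here is a quick check, in Python 3 syntax with a recent NumPy (where strings are stored as the 'U' unicode kind), of what NumPy does with mixed types. Rather than storing the types separately, it converts every element to one common type, here strings:

```python
import numpy as np

mixed = np.array([1, 1.0, 'one'])
print(mixed.dtype.kind)   # prints: U  (unicode strings)
print(mixed.tolist())     # prints: ['1', '1.0', 'one']

numbers = np.array([1, 1.0])
print(numbers.dtype)      # prints: float64, because the int is upcast to float
```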
In [7]: x*2
Ndarray multiplication is elementwise, not matrix multiplication or a vector dot product:
In [8]: x*x
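To contrast the two kinds of product, here is a short sketch in Python 3 syntax. It uses np.dot, which gives the dot product for 1-D arrays and matrix multiplication for 2-D arrays:

```python
import numpy as np

x = np.arange(5)             # array([0, 1, 2, 3, 4])

elementwise = x * x          # multiplies matching elements
dot_product = np.dot(x, x)   # sum of those products: 0+1+4+9+16

print(elementwise.tolist())  # prints: [0, 1, 4, 9, 16]
print(dot_product)           # prints: 30

# For 2-D arrays, np.dot performs matrix multiplication
a = np.array([[1, 2], [3, 4]])
print((a * a).tolist())      # prints: [[1, 4], [9, 16]]
print(np.dot(a, a).tolist()) # prints: [[7, 10], [15, 22]]
```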
NumPy’s mathematical functions also work element by element, and are said to be “vectorized” functions.
In [9]: np.sqrt(x)
Numpy contains many useful functions for creating matrices. In our earlier lectures we’ve seen linspace
and arange for evenly spaced numbers.
In [10]: np.linspace(0,10,21)
In [11]: np.arange(0,10,0.5)
In [12]: xmin=-1.5
ymin=-1.0
xmax=0.5
ymax=1.0
resolution=300
xstep=(xmax-xmin)/resolution
ystep=(ymax-ymin)/resolution
ymatrix, xmatrix=np.mgrid[ymin:ymax:ystep,xmin:xmax:xstep]
[[-1. -1. -1. ..., -1. -1. -1. ]
[-0.99333333 -0.99333333 -0.99333333 ..., -0.99333333 -0.99333333
-0.99333333]
[-0.98666667 -0.98666667 -0.98666667 ..., -0.98666667 -0.98666667
-0.98666667]
...,
[ 0.98 0.98 0.98 ..., 0.98 0.98 0.98 ]
[ 0.98666667 0.98666667 0.98666667 ..., 0.98666667 0.98666667
0.98666667]
[ 0.99333333 0.99333333 0.99333333 ..., 0.99333333 0.99333333
0.99333333]]
We can add these together to make a grid containing the complex numbers we want to test for membership
in the Mandelbrot set.
In [14]: values=xmatrix+1j*ymatrix
In [16]: z0=values
z1=z0*z0+values
z2=z1*z1+values
z3=z2*z2+values
In [17]: print z3
[[ 24.06640625+20.75j 23.16610231+20.97899073j
22.27540349+21.18465854j ..., 11.20523832 -1.88650846j
11.57345330 -1.6076251j 11.94394738 -1.31225596j]
[ 23.82102149+19.85687829j 22.94415031+20.09504528j
22.07634812+20.31020645j ..., 10.93323949 -1.5275283j
11.28531994 -1.24641067j 11.63928527 -0.94911594j]
[ 23.56689029+18.98729242j 22.71312709+19.23410533j
21.86791017+19.4582314j ..., 10.65905064 -1.18433756j
10.99529965 -0.90137318j 11.33305161 -0.60254144j]
...,
[ 23.30453709-18.14090998j 22.47355537-18.39585192j
21.65061048-18.62842771j ..., 10.38305264 +0.85663867j
10.70377437 +0.57220289j 11.02562928 +0.27221042j]
[ 23.56689029-18.98729242j 22.71312709-19.23410533j
21.86791017-19.4582314j ..., 10.65905064 +1.18433756j
10.99529965 +0.90137318j 11.33305161 +0.60254144j]
[ 23.82102149-19.85687829j 22.94415031-20.09504528j
22.07634812-20.31020645j ..., 10.93323949 +1.5275283j
11.28531994 +1.24641067j 11.63928527 +0.94911594j]]
In [19]: mandel1(values)
---------------------------------------------------------------------------
<ipython-input-19-34d5142e7f61> in <module>()
----> 1 mandel1(values)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or
This fails: the logic of our current routine would require stopping the loop for some elements and not for others.
We can ask NumPy to vectorise our function for us:
In [20]: mandel2=np.vectorize(mandel1)
In [21]: data5=mandel2(values)
Out[22]: <matplotlib.image.AxesImage at 0x1110120d0>
In [23]: %%timeit
data5=mandel2(values)
This is not significantly faster. When we use vectorize, it’s just hiding a plain old Python for loop under the hood. We want to make the loop over matrix elements take place in the “C Layer”.
What if we just apply the Mandelbrot algorithm without checking for divergence until the end:
return abs(value)<2
In [25]: data6=mandel_numpy_explode(values)
In [26]: def mandel_numpy(position,limit=50):
    value=position
    while limit>0:
        limit-=1
        value=value**2+position
        diverging=abs(value)>2
        # Avoid overflow
        value[diverging]=2
    return abs(value)<2
In [27]: data6=mandel_numpy(values)
In [28]: %%timeit
data6=mandel_numpy(values)
Chapter 30
Logical Arrays
In [30]: diverging=abs(z3)>2
print diverging[30]
[ True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True]
In [31]: x=np.arange(10)*1.0
y=np.ones([10])*5
z=x>y
print z
[False False False False False False True True True True]
In [32]: x[x>80]
Out[32]: array([], dtype=float64)
In [33]: x[np.logical_not(z)]
In [34]: x[z]=5
print x
[ 0. 1. 2. 3. 4. 5. 5. 5. 5. 5.]
30.1 Broadcasting
In our example above, we didn’t compare two arrays to get our logical array, but an array to a scalar. When we apply an operation to things of different shapes, NumPy will broadcast the smaller one, stretching it along missing or length-1 axes to match the larger:
[False False False False False False False False False False]
In [36]: row=np.array([[1,2,3]])
column=np.array([[0],[2],[4]])
print row.shape
(1, 3)
In [37]: print column.shape
(3, 1)
In [38]: row*column
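The result of row*column is not shown above. Here is a small check in Python 3 syntax: a (1, 3) row times a (3, 1) column is stretched into a (3, 3) table of all the products:

```python
import numpy as np

row = np.array([[1, 2, 3]])          # shape (1, 3)
column = np.array([[0], [2], [4]])   # shape (3, 1)

table = row * column                 # both are broadcast to shape (3, 3)
print(table.shape)                   # prints: (3, 3)
print(table.tolist())                # prints: [[0, 0, 0], [2, 4, 6], [4, 8, 12]]

# Comparing an array with a scalar broadcasts the scalar in the same way
x = np.arange(10) * 1.0
print((x > 5).sum())                 # prints: 4
```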
In [39]: x=np.ones([4,1,2])
y=np.ones([1,4,1])
print (x+y).shape
print x+y
(4, 4, 2)
[[[ 2. 2.]
[ 2. 2.]
[ 2. 2.]
[ 2. 2.]]
[[ 2. 2.]
[ 2. 2.]
[ 2. 2.]
[ 2. 2.]]
[[ 2. 2.]
[ 2. 2.]
[ 2. 2.]
[ 2. 2.]]
[[ 2. 2.]
[ 2. 2.]
[ 2. 2.]
[ 2. 2.]]]
return diverged_at_count
In [41]: data7=mandel4(values)
In [42]: plt.imshow(data7,interpolation=’none’)
Out[42]: <matplotlib.image.AxesImage at 0x1118fd110>
In [43]: %%timeit
data7=mandel4(values)
Note that here, all the looping over mandelbrot steps was in Python, but everything below the loop-over-
positions happened in C. The code was amazingly quick compared to pure Python.
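The bodies of these NumPy Mandelbrot functions are not shown in full. Here is a minimal sketch, in Python 3 syntax, of a function in this style; the name mandel_vectorised is hypothetical. The loop over iteration steps is in Python, while each step works on the whole array. Points that have diverged are clamped, as in mandel_numpy above, to avoid overflow. The counts follow the same convention as mandel1:

```python
import numpy as np


def mandel_vectorised(position, limit=50):
    """Escape counts for an array of complex positions, using mandel1's convention."""
    position = np.asarray(position, dtype=complex)
    value = position.copy()
    diverged_at_count = np.zeros(position.shape, dtype=int)
    escaped = np.zeros(position.shape, dtype=bool)
    for step in range(limit + 1):
        # Points that reach |value| >= 2 for the first time at this step
        diverging_now = (np.abs(value) >= 2) & ~escaped
        diverged_at_count[diverging_now] = limit - step
        escaped |= diverging_now
        # Clamp points that have already escaped so they cannot overflow
        value[escaped] = 2
        value = value**2 + position
    return diverged_at_count  # points that never escaped keep the value 0
```

Note that this still iterates every point on every step. It only avoids overflow, not work.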
Can we do better by avoiding a square root?
return diverged_at_count
In [45]: %%timeit
data8=mandel5(values)
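The body of mandel5 is not shown. The idea is that |z| >= 2 exactly when z.real**2 + z.imag**2 >= 4, so the square root inside abs can be skipped. Here is a minimal sketch of that test in Python 3 syntax, with a hypothetical helper name, checked against np.abs:

```python
import numpy as np


def escaped_without_sqrt(value):
    """True where |value| >= 2, computed without the square root inside np.abs."""
    return value.real**2 + value.imag**2 >= 4


values = np.array([0, 1 + 1j, 2, -1.5 + 1.5j, 3j])
print(escaped_without_sqrt(values).tolist())  # prints: [False, False, True, True, True]
print((np.abs(values) >= 2).tolist())         # prints: [False, False, True, True, True]
```

The two tests are mathematically identical. Only points lying within rounding error of the circle |z| = 2 could in principle be classified differently.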
Chapter 31
NumPy Testing
Now, let’s look at calculating those residuals, the differences between the different datasets.
In [46]: data8=mandel5(values)
data5=mandel2(values)
In [47]: np.sum((data8-data5)**2)
Out[47]: 0.0
For our non-numpy datasets, numpy knows to turn them into arrays:
In [48]: data1=[[mandel1(complex(x,y)) for x in xs] for y in ys]
sum(sum((data1-data7)**2))
Out[48]: 0.0
In [49]: data2=[]
for y in ys:
    row=[]
    for x in xs:
        row.append(mandel1(complex(x,y)))
    data2.append(row)
In [50]: data2-data1
---------------------------------------------------------------------------
<ipython-input-50-f2646eae3452> in <module>()
----> 1 data2-data1
In [51]: sum(sum((np.array(data2)-np.array(data1))**2))
Out[51]: 0
NumPy provides some convenient assertions to help us write unit tests with NumPy arrays:
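The cells using them are not shown. Here is a sketch in Python 3 syntax using two of them from numpy.testing: assert_array_equal for exact comparison, and assert_allclose for floating-point results within a tolerance. Both raise AssertionError with a report of the mismatches:

```python
import numpy as np
from numpy.testing import assert_allclose, assert_array_equal

counts_a = np.array([[50, 49], [0, 3]])
counts_b = [[50, 49], [0, 3]]  # a list of lists is converted to an array automatically

# Exact equality, element by element (the shapes must match too)
assert_array_equal(counts_a, counts_b)

# Floating-point results: equal within a relative tolerance
assert_allclose(np.sqrt(np.array([2.0, 8.0])) ** 2, [2.0, 8.0], rtol=1e-12)

detected = False
try:
    assert_array_equal(counts_a, [[50, 49], [0, 4]])
except AssertionError:
    detected = True
print(detected)  # prints: True
```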
Chapter 32
We might worry that we carry on calculating the Mandelbrot values for points that have already diverged.
return diverged_at_count
In [55]: data8=mandel6(values)
In [56]: %%timeit
data8=mandel6(values)
In [57]: plt.imshow(data8,interpolation='none')
This was not faster, even though it was doing less work.
This often happens: on modern computers, branches (if statements, function calls) and memory access are usually the rate-determining step, not maths.
Complicating your logic to avoid calculations therefore sometimes slows you down. The only way to know is to measure.
Chapter 33
We’ve been using Boolean arrays a lot to get access to some elements of an array. We can also do this with
integers:
In [58]: x=np.arange(64)
y=x.reshape([8,8])
y
In [59]: y[[0,5,2]]
In [60]: y[[0,2,5],[1,2,7]]
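The outputs of these two cells are not shown. Here is a short check in Python 3 syntax of what they select. A single list picks whole rows. Two lists of the same length are paired up to pick individual elements:

```python
import numpy as np

y = np.arange(64).reshape([8, 8])   # y[i, j] == 8*i + j

rows = y[[0, 5, 2]]                 # whole rows 0, 5 and 2, in that order
print(rows.shape)                   # prints: (3, 8)
print(rows[:, 0].tolist())          # prints: [0, 40, 16]

# Paired index lists pick single elements: (0, 1), (2, 2) and (5, 7)
print(y[[0, 2, 5], [1, 2, 7]].tolist())  # prints: [1, 18, 47]
```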
We can use a slice, written with :, to select values along a particular axis (a bare : selects all of them):
In [61]: y[0:8:2,[0,2]]
We can mix integer-array selectors, boolean selectors, slices (:) and ordinary integer indices:
In [62]: z=x.reshape([4,4,4])
print z
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
[[16 17 18 19]
[20 21 22 23]
[24 25 26 27]
[28 29 30 31]]
[[32 33 34 35]
[36 37 38 39]
[40 41 42 43]
[44 45 46 47]]
[[48 49 50 51]
[52 53 54 55]
[56 57 58 59]
[60 61 62 63]]]
In [63]: z[:,[1,3],0:3]
In [64]: z[:,np.newaxis,[1,3],0].shape
Out[64]: (4, 1, 2)
When we use basic indexing with integers and : expressions, we get a view on the matrix so a copy is
avoided:
In [65]: a=z[:,:,2]
a[0,0]=-500
z
[[ 32, 33, 34, 35],
[ 36, 37, 38, 39],
[ 40, 41, 42, 43],
[ 44, 45, 46, 47]],
In [66]: z[1]
In [67]: z[...,2]
However, boolean mask indexing and integer-array indexing always cause a copy.
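One way to check this for yourself is np.shares_memory (available in NumPy 1.11 and later), which reports whether two arrays use the same underlying data. Here is a sketch in Python 3 syntax:

```python
import numpy as np

z = np.arange(64).reshape([4, 4, 4])

view = z[:, :, 2]                   # basic indexing: a view onto z's data
fancy = z[:, [1, 3], 0:3]           # integer-array indexing: a copy
masked = z[z > 40]                  # boolean-mask indexing: a copy

print(np.shares_memory(z, view))    # prints: True
print(np.shares_memory(z, fancy))   # prints: False
print(np.shares_memory(z, masked))  # prints: False

view[0, 0] = -500                   # writing through the view changes z
print(z[0, 0, 2])                   # prints: -500
masked[0] = 0                       # writing to the copy leaves z alone
print(z[2, 2, 1])                   # prints: 41
```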
Let’s try again to avoid doing unnecessary work, this time by using new arrays containing the reduced data instead of a mask:
value=value[carry_on]
indices=indices[:,carry_on]
positions=positions[carry_on]
diverged_at_count[diverging_now_indices[0,:],
diverging_now_indices[1,:]]=limit
return diverged_at_count
In [69]: data9=mandel7(values)
In [70]: plt.imshow(data9,interpolation='none')
In [71]: %%timeit
data9=mandel7(values)
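Only fragments of mandel7 are shown above. Here is a minimal sketch of the same idea in Python 3 syntax. After each step, only the points that have not yet diverged are carried forward. It assumes flat (one-dimensional) indices rather than the two-row index array used in the original, and the function name is hypothetical:

```python
import numpy as np


def mandel_shrinking(position, limit=50):
    """Escape counts using ever-smaller arrays of the points still being iterated."""
    position = np.asarray(position, dtype=complex)
    diverged_at_count = np.zeros(position.shape, dtype=int)
    indices = np.arange(position.size)   # flat index of each surviving point
    positions = position.ravel()         # never written to, only filtered
    value = positions.copy()
    for step in range(limit + 1):
        diverging = np.abs(value) >= 2
        # Record the count for points diverging now, at their original locations
        diverged_at_count.flat[indices[diverging]] = limit - step
        carry_on = ~diverging
        # Boolean-mask indexing copies, so the working arrays shrink every step
        value = value[carry_on]
        indices = indices[carry_on]
        positions = positions[carry_on]
        value = value**2 + positions
    return diverged_at_count  # points that never diverged keep the value 0
```

Whether this beats the masked version depends on how many points diverge early versus the cost of the copies, so it has to be timed, as above.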
Chapter 34
We’ve seen that NumPy arrays are really useful. Why wouldn’t we always want to use them for data which
is all the same type?
In [4]: counts=np.arange(1,100000,10000)
In [5]: plt.plot(counts,map(time_append_to_list,counts))
plt.ylim(ymin=0)
In [6]: plt.plot(counts,map(time_append_to_ndarray,counts))
plt.ylim(ymin=0)
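The timing functions themselves are not shown. Here is a minimal sketch, in Python 3 syntax, of what they might look like. These are hypothetical reconstructions using timeit.repeat, and they report the best of three runs in seconds. Appending to a list is amortised O(1), whereas np.append has to copy the whole array into a new one every time:

```python
from timeit import repeat

import numpy as np


def time_append_to_list(count, number=1000):
    before = [0] * count
    def totime():
        before.append(0)  # grows the list in place
    return min(repeat(totime, number=number, repeat=3))


def time_append_to_ndarray(count, number=1000):
    before = np.zeros(count)
    def totime():
        np.append(before, 0)  # returns a new array of size count+1; 'before' is unchanged
    return min(repeat(totime, number=number, repeat=3))


for count in (1000, 100000):
    print(count, time_append_to_list(count), time_append_to_ndarray(count))
```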
In [7]: def time_lookup_middle_element_in_list(count):
    before=[0]*count
    def totime():
        x=before[count/2]
    return repeat(totime,number=10000)
In [9]: plt.plot(counts,map(time_lookup_middle_element_in_list,counts))
plt.ylim(ymin=0)
In [10]: plt.plot(counts,map(time_lookup_middle_element_in_ndarray,counts))
plt.ylim(ymin=0)
But a list performs badly for insertions at the beginning:
In [11]: x=range(5)
In [12]: x
Out[12]: [0, 1, 2, 3, 4]
In [13]: x[0:0]=[-1]
In [14]: x
Out[14]: [-1, 0, 1, 2, 3, 4]
In [16]: plt.plot(counts,map(time_insert_to_list,counts))
plt.ylim(ymin=0)
There are containers in Python that work well for insertion at the start:
In [17]: from collections import deque
In [18]: def time_insert_to_deque(count):
    return repeat('before.appendleft(0)','from collections import deque; before=deque([0]*'+str
In [19]: plt.plot(counts,map(time_insert_to_deque,counts))
plt.ylim(ymin=0)
Out[19]: (0, 0.0014499999999999999)
But looking up in the middle scales badly:
In [21]: plt.plot(counts,map(time_lookup_middle_element_in_deque,counts))
plt.ylim(ymin=0)
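Here is a minimal sketch of both effects in Python 3 syntax, using timeit; the helper name and the counts are illustrative. appendleft on a deque stays cheap as the container grows, and list.insert(0, ...) has to shift every element. Indexing into the middle of a deque, however, has to walk through its internal blocks:

```python
from collections import deque
from timeit import repeat


def best_time(statement, setup, number=1000):
    """Best of three timings, in seconds, of running `statement` `number` times."""
    return min(repeat(statement, setup, number=number, repeat=3))


for count in (1000, 100000):
    setup = ('from collections import deque; '
             'before_list = [0]*%d; before_deque = deque(before_list)' % count)
    insert_list = best_time('before_list.insert(0, 0)', setup)
    insert_deque = best_time('before_deque.appendleft(0)', setup)
    middle_deque = best_time('before_deque[%d]' % (count // 2), setup)
    print(count, insert_list, insert_deque, middle_deque)

demo = deque([1, 2, 3])
demo.appendleft(0)
print(list(demo), demo[2])  # prints: [0, 1, 2, 3] 2
```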
Chapter 35
Dictionary performance
For another example, let’s consider the performance of a dictionary versus a couple of other ways in which
we could implement an associative array.
In [23]: me=[["Name","James"],["Job","Programmer"],["Home","London"]]
In [24]: me_evil=evildict(me)
In [25]: me_evil["Job"]
Out[25]: 'Programmer'
In [26]: me_dict=dict(me)
In [27]: me_dict["Job"]
Out[27]: 'Programmer'
In [29]: me_sorted=sorteddict(me)
In [30]: me_sorted["Job"]
Out[30]: 'Programmer'
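The evildict and sorteddict classes are not defined in the cells shown. Here is a minimal sketch, in Python 3 syntax, of how each might work; both are hypothetical reconstructions. One scans a list of key-value pairs in turn, which is O(N). The other keeps the keys sorted and uses the standard bisect module for an O(log N) binary search:

```python
from bisect import bisect_left


class evildict(object):
    """Hypothetical: a list of (key, value) pairs, searched one by one (O(N))."""

    def __init__(self, data):
        self.data = [tuple(pair) for pair in data]

    def __getitem__(self, akey):
        for key, value in self.data:
            if key == akey:
                return value
        raise KeyError(akey)


class sorteddict(object):
    """Hypothetical: keys kept in sorted order, found by binary search (O(log N))."""

    def __init__(self, data):
        pairs = sorted(tuple(pair) for pair in data)
        self.keys = [key for key, _ in pairs]
        self.values = [value for _, value in pairs]

    def __getitem__(self, akey):
        i = bisect_left(self.keys, akey)  # first position whose key is >= akey
        if i < len(self.keys) and self.keys[i] == akey:
            return self.values[i]
        raise KeyError(akey)


me = [["Name", "James"], ["Job", "Programmer"], ["Home", "London"]]
print(evildict(me)["Job"])    # prints: Programmer
print(sorteddict(me)["Job"])  # prints: Programmer
```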
In [31]: def time_dict_generic(ttype,count,number=10000):
    keys=range(count)
    values=[0]*count
    data=ttype(zip(keys,values))
    def totime():
        x=data[keys[count/2]]
    return repeat(totime,number=number)
In [33]: counts=np.arange(1,1000,100)
plt.plot(counts,map(time_dict,counts))
plt.ylim(ymin=0)
In [34]: plt.plot(counts,map(time_sorted,counts))
plt.ylim(ymin=0)
In [35]: plt.plot(counts,map(time_evil,counts))
plt.ylim(ymin=0)
We can’t really see what’s going on here for the sorted example as there’s too much noise, but theoretically we should get logarithmic asymptotic performance.
We write this down as O(ln N). This doesn’t mean there isn’t also a constant term, or a term proportional to something that grows more slowly (such as ln(ln N)): we always write down just the term that is dominant for large N. Similarly, the hash-table based solution used by dict is O(1) on average, and the simple check-each-in-turn solution is O(N). We saw before that list is O(1) for appends and O(N) for inserts at the beginning. NumPy’s array is O(N) for appends.
Exercise: determine the asymptotic performance of the Boids model in terms of the number of Boids. Make graphs to support this. Bonus: how would the performance scale with the number of dimensions?
Chapter 36
Cython
Cython can be viewed as an extension of Python where variables and functions are annotated with extra information, in particular types. The resulting Cython source code will be compiled into optimized C or C++ code, thereby yielding substantial speed-ups of slow Python code. In other words, Cython provides a way of writing Python with performance comparable to that of C/C++.
In the IPython notebook, everything is a lot easier. One need only load the Cython extension (%load_ext Cython) at the beginning and put the %%cython magic at the top of cells of Cython code. Cells with the %%cython magic will be treated as .pyx code and, consequently, compiled into C.
For details, please see Building Cython Code.
Pure Python Mandelbrot set:
In [1]: xmin=-1.5
ymin=-1.0
xmax=0.5
ymax=1.0
resolution=300
xstep=(xmax-xmin)/resolution
ystep=(ymax-ymin)/resolution
xs=[(xmin+(xmax-xmin)*i/resolution) for i in range(resolution)]
ys=[(ymin+(ymax-ymin)*i/resolution) for i in range(resolution)]
Compiled by Cython:
In [3]: %load_ext Cython
In [4]: %%cython
def mandel_cython(position,limit=50):
    value=position
    while abs(value)<2:
        limit-=1
        value=value**2+position
        if limit<0:
            return 0
    return limit
We have improved the performance by a factor of 1.5 just by using the Cython compiler, without changing the code!
36.2 Cython with C Types
But we can do better by telling Cython what C data type we would use in the code. Note we’re not actually
writing C, we’re writing Python with C types.
First, with a typed variable:
In [7]: %%cython
def var_typed_mandel_cython(position,limit=50):
    cdef double complex value # typed variable
    value=position
    while abs(value)<2:
        limit-=1
        value=value**2+position
        if limit<0:
            return 0
    return limit
In [8]: %%cython
cpdef call_typed_mandel_cython(double complex position,int limit=50): # typed function
    cdef double complex value # typed variable
    value=position
    while abs(value)<2:
        limit-=1
        value=value**2+position
        if limit<0:
            return 0
    return limit
In [13]: import numpy as np
ymatrix,xmatrix=np.mgrid[ymin:ymax:ystep,xmin:xmax:xstep]
values=xmatrix+1j*ymatrix
In [14]: %%cython
import numpy as np
cimport numpy as np
xlim=position.shape[1]
ylim=position.shape[0]
diverged_at=np.zeros([ylim, xlim], dtype=int)
for x in xrange(xlim):
for y in xrange(ylim):
steps=limit
value=position[y,x]
pos=position[y,x]
while abs(value)<2 and steps>=0:
steps-=1
value=value**2+pos
diverged_at[y,x]=steps
return diverged_at
Note the double import of numpy: the standard numpy module and a Cython-enabled version of numpy that ensures fast indexing of, and other operations on, arrays. Both import statements are necessary in code that uses numpy arrays. The new thing in the code above is the declaration of arrays with np.ndarray.
In [18]: numpy_cython_2=np.vectorize(call_typed_mandel_cython)
36.4 Calling C functions from Cython
Example: compare sin() from Python and C library
In [20]: %%cython
import math
cpdef py_sin():
    cdef int x
    cdef double y
    for x in xrange(10000000):
        y=math.sin(x)
In [21]: %%cython
from libc.math cimport sin as csin # import from C library
cpdef c_sin():
    cdef int x
    cdef double y
    for x in xrange(10000000):
        y=csin(x)