Mastering Python Scientific Computing - Sample Chapter
A complete guide for Python programmers to master scientific
computing using Python APIs and tools
Preface
"I am absolutely convinced that in a few decades, historians of science will describe
the period we are in right now as one of deep and significant transformations to the
very structure of science. And in that process, the rise of free openly available tools
plays a central role."
Fernando Perez, creator of IPython
This book covers the Python APIs and toolkits used to perform scientific computing.
It is highly recommended for readers who perform computerized engineering or
scientific computations. Scientific computing is an interdisciplinary field that
requires a background in computer science, mathematics, general science (at least
one branch such as physics, chemistry, environmental science, biology, or
others), and engineering. Python offers a large number of packages, APIs, and
toolkits that support the functionality required by these diverse scientific and
engineering domains.
A large community of users, lots of help and documentation, a large collection of
scientific libraries and environments, great performance, and good support make
Python a great choice for scientific computing.
Chapter 2, A Deeper Dive into Scientific Workflows and the Ingredients of Scientific
Computing Recipes, discusses the various concepts of mathematical and numerical
analysis that are generally required to solve scientific problems. It also gives a
brief introduction to the packages, toolkits, and APIs meant for performing scientific
computing in the Python language.
Chapter 3, Efficiently Fabricating and Managing Scientific Data, discusses all aspects
of the underlying data of scientific applications, including the basic concepts,
various operations, and the formats and software used to store data. It also presents
standard datasets and techniques for preparing synthetic data.
Chapter 4, Scientific Computing APIs for Python, covers the basic concepts, features,
and selected sample programs of various scientific computing APIs and toolkits,
including NumPy, SciPy, and SymPy. This chapter also gives a basic introduction to
interactive computing, data analysis, and data visualization using IPython,
matplotlib, and pandas.
Chapter 5, Performing Numerical Computing, discusses how to perform numerical
computations using the NumPy and SciPy packages of Python. This chapter starts
with the basics of numerical computation and covers a number of advanced concepts,
such as optimization, interpolation, Fourier transformation, signal processing, linear
algebra, statistics, spatial algorithms, image processing, file input/output, and others.
Chapter 6, Applying Python for Symbolic Computing, starts with the fundamentals of the
Computer Algebra System (CAS) and performing symbolic computations using
SymPy. It covers a vast range of topics on CAS, from simple expressions and
basic arithmetic to advanced concepts of mathematics and physics.
Chapter 7, Data Analysis and Visualization, presents the concepts and applications of
matplotlib and pandas for data analysis and visualization.
Chapter 8, Parallel and Large-scale Scientific Computing, discusses the concepts of
high-performance scientific computing using IPython (including MPI),
managing Amazon EC2 clusters using StarCluster, multiprocessing,
multithreading, Hadoop, and Spark.
Chapter 9, Revisiting Real-life Case Studies, illustrates several case studies of scientific
computing applications, libraries, and tools developed using the Python language,
drawn from various engineering and science domains.
Chapter 10, Best Practices for Scientific Computing, discusses the best practices for
scientific computing. It covers best practices for design, coding, data
management, application deployment, high-performance computing, security,
data privacy, maintenance, and support. We also cover best practices for
general Python-based development.
Chapter 1
A background of Python
[Figure: Scientific computing at the intersection of Computer Science (languages/frameworks), the Problem Domain (engineering/science), and Mathematics (algorithms/modeling)]
[Figure: The scientific computing process: Problem → Mathematical Model → Algorithm → Implementation/Simulation → Results Collection → Analyze Results]
The design and analysis of algorithms that solve mathematical problems, particularly
those arising in science and engineering, is known as numerical analysis, and
nowadays it is also called scientific computing. In scientific computing, the problems
under consideration mainly deal with continuous values rather than discrete values;
discrete values are the concern of other areas of computer science. Generally speaking,
scientific computing solves problems that involve functions and equations with
continuous variables, for example, time, distance, velocity, weight, height, size,
temperature, density, pressure, stress, and much more.
Generally, problems of continuous mathematics have approximate solutions, as
an exact solution is not always possible in a finite number of steps. Hence, these
problems are solved using an iterative process that converges to an acceptable
solution, where what counts as acceptable depends on the nature of the specific
problem. The iterative process is not infinite: after each iteration, the current
solution gets closer to the desired solution, and the iteration stops once it is close
enough for the purpose of the simulation. Reviewing the accuracy of the solution and
ensuring swift convergence to it form the gist of the scientific computing process.
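As an illustration of such an iterative process, here is a minimal sketch using Newton's method for square roots, x_{k+1} = (x_k + a/x_k) / 2. The choice of Newton's method is ours, not the text's. The tolerance `tol` stands in for the "acceptable solution", and `max_iter` guarantees that the loop is finite; both names and default values are illustrative assumptions.

```python
def newton_sqrt(a: float, tol: float = 1e-12, max_iter: int = 50) -> tuple:
    """Approximate sqrt(a) with Newton's iteration x_{k+1} = (x_k + a / x_k) / 2.

    Stops when two successive iterates differ by less than `tol` (relative),
    or after `max_iter` iterations, so the process always terminates.
    Returns (approximation, number_of_iterations).
    """
    if a < 0:
        raise ValueError("a must be non-negative")
    if a == 0:
        return 0.0, 0
    x = a if a >= 1 else 1.0  # simple initial guess
    for k in range(1, max_iter + 1):
        x_new = 0.5 * (x + a / x)
        # Acceptable solution: the iterates have stopped changing noticeably
        if abs(x_new - x) <= tol * x_new:
            return x_new, k
        x = x_new
    return x, max_iter


root, iterations = newton_sqrt(2.0)
print(root, iterations)
```

Each pass roughly doubles the number of correct digits, which is why only a handful of iterations are needed here; a method that converged slowly would need far more.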
There are well-established areas of science that use scientific computing to solve
problems. They are as follows:
Atmospheric science
Seismology
Structural analysis
Chemistry
Magnetohydrodynamics
Reservoir modeling
Astronomy/astrophysics
Cosmology
Environmental studies
Nuclear engineering
Recently, some emerging areas have also started harnessing the power of scientific
computing. They include:
Biology
Economics
Materials research
Medical imaging
Animal science
Prerequisite computations: The input data may have been obtained from the
results of previous experiments or simulations, which may have had minor,
acceptable inaccuracies that lead to further approximations. Such
prior processing may be a prerequisite for the subsequent experiments.
The approximate value of the final result of a computation problem may be the
outcome of any combination of the various sources discussed previously. The
accuracy of the final output may be reduced or increased depending on the problem
being solved and the approach used to solve it.
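[Figure: Sources of errors/approximations, grouped into those arising before computation and those arising during computations; the sources shown include modeling errors, simplification of the problem, and errors in experimentation data]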
Error analysis
Error analysis is a process used to observe the impact of such approximations on the
accuracy of an algorithm or computational process. In the subsequent text, we are
going to discuss the basic concepts associated with error analysis.
From the previous discussion on approximations, we can observe that errors fall
into two groups: errors in the input data and errors that arise during the
computations on this input data.
Similarly, computation errors can be divided into two categories:
truncation errors and rounding errors. A truncation error results from reducing
a complex problem to a simpler one, for example, the premature termination of
iterations before the desired accuracy is achieved. A rounding error results
from the limited precision used to represent numbers in the number system of the
computer, and also from performing arithmetic on these
numbers.
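The two kinds of computation error can be seen side by side in a short sketch. Truncating the Taylor series of e^x after a few terms produces a truncation error that shrinks as more terms are kept, while the binary floating-point representation of 0.1, 0.2, and 0.3 produces a rounding error. The function name `exp_taylor` is our own, chosen for illustration.

```python
import math


def exp_taylor(x: float, n_terms: int) -> float:
    """Approximate e**x by the first n_terms of its Taylor series, sum of x**k / k!."""
    total, term = 0.0, 1.0
    for k in range(n_terms):
        total += term
        term *= x / (k + 1)  # next term: x**(k+1) / (k+1)!
    return total


# Truncation error: stopping the series early
for n in (2, 4, 8, 16):
    approx = exp_taylor(1.0, n)
    print(n, approx, abs(math.e - approx))

# Rounding error: 0.1, 0.2 and 0.3 are not exactly representable in binary
print(0.1 + 0.2 == 0.3)        # False
print(abs((0.1 + 0.2) - 0.3))  # about 5.55e-17, not 0
```

Adding more terms reduces the truncation error, but it cannot remove the rounding error, which is a property of the number system itself.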
Ultimately, whether an error is significant or ignorable depends on the scale
of the values. For example, an error of 10 in a final value of 15 is highly significant,
while an error of 10 in a final value of 785 is not that significant, and the same
error of 10 in a final value of 17,685 is ignorable. In general, the impact of
an error is relative to the value of the result. If we know the magnitude of the
final value to be obtained, then by looking at the value of the error we can decide
whether to ignore it or treat it as significant. If the error is significant, then we
should start taking corrective measures.
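The comparison above is the idea of relative error: the absolute error divided by the magnitude of the value. A minimal sketch with the three cases from the text follows; the helper name `relative_error` is ours.

```python
def relative_error(abs_error: float, value: float) -> float:
    """Relative error = |absolute error| / |reference value|."""
    if value == 0:
        raise ValueError("relative error is undefined for a zero reference value")
    return abs(abs_error) / abs(value)


# The same absolute error of 10 against the three final values from the text
for final_value in (15, 785, 17_685):
    rel = relative_error(10, final_value)
    print(f"error 10 on {final_value:>6}: relative error = {rel:.2%}")
```

The same absolute error of 10 corresponds to roughly 67%, 1.3%, and 0.06% of the result, which matches the judgment of "highly significant", "not that significant", and "ignorable".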
Explicit is better than implicit: Most concepts are kept explicit, such as
the explicit Boolean type. Python uses the explicit literal values True
and False for Boolean variables instead of depending on zero or nonzero
integers. Still, it does support the integer-based Boolean concept: nonzero
values are treated as true. Similarly, its for loop can iterate over data
structures without managing an index variable. The same loop can iterate
through lists, tuples, and the characters in a string.
Simple is better than complex: Automatic memory allocation and the garbage
collector manage the allocation and deallocation of memory to avoid complexity.
Another example of simplicity is the simple print statement (the print() function
in Python 3), which avoids the use of file descriptors for simple printing.
Moreover, objects are automatically converted to a printable form, and several
comma-separated values can be printed at once.
Flat is better than nested: Python provides a wide variety of modules in its
standard library. Namespaces in Python are kept in a flat structure, so there is
no need to use very long names, such as java.net.Socket instead of a simple
socket in Python. Python's standard library follows the batteries-included
philosophy. This standard library provides tools suitable for many tasks.
For example, modules for various network protocols are supported for the
development of rich Internet applications. Similarly, modules for graphical user
interface programming, database programming, regular expressions, high-precision
arithmetic, unit testing, and much more are bundled in the standard
library. Some of the modules in the library include networking (socket,
select, SocketServer, BaseHTTPServer, asyncore, asynchat, xmlrpclib,
and SimpleXMLRPCServer), Internet protocols (urllib, httplib, ftplib,
smtpd, smtplib, poplib, imaplib, and json), database (anydbm, pickle,
shelve, and sqlite3), and parallel processing (subprocess,
threading, multiprocessing, and queue).
Sparse is better than dense: The Python standard library is kept shallow,
and the Python Package Index (PyPI) maintains an exhaustive list of third-party
packages that support in-depth operations for a topic. We can use
pip to install such third-party Python packages.
Special cases aren't special enough to break the rules: The philosophy
behind this is that everything in Python is an object. All built-in types
are implemented as objects. The data types that represent numbers have
methods. Even functions are themselves objects with methods.
Errors should never pass silently: Python uses exception handling so that
errors need not be handled inside low-level APIs; instead, they can be handled
at a higher level by the program that uses these APIs. It supports
standard exceptions with specific meanings, and users are allowed
to define their own exceptions for custom error handling. To support debugging,
Python provides tracebacks: by default, when an exception is not handled,
Python prints a complete traceback pointing to the error to stderr. The
traceback includes the source filename, line number, and source code, if it
is available.
Although that way may not be obvious at first unless you're Dutch: The way
that we discussed in the previous point is applicable to the standard library.
Of course, there will be redundancy in third-party modules. For example, we
have support for multiple GUI APIs, such as GTK, wxPython, and KDE.
Similarly, for web programming, we have Django, AppEngine, and Pyramid.
Now is better than never: This statement is meant to motivate users to adopt
Python as their favorite tool. For example, Python provides the ctypes module
to wrap existing C/C++ shared libraries for use in Python programs.
Although never is often better than *right* now: In line with this philosophy, a
Python Enhancement Proposal (PEP 3003) declared a temporary moratorium
(suspension) on all changes to the syntax, semantics, and built-in components
of the language for a specified period, to let alternative Python implementations catch up.
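Several of the principles above can be observed directly in the interpreter. The following Python 3 sketch shows explicit Booleans alongside truthiness, a for loop over different sequence types, methods on numbers and attributes on functions, and a custom exception raised at a low level and handled by the caller. The names `ConvergenceError` and `low_level_solver` are hypothetical, invented for this illustration.

```python
# Explicit is better than implicit: True and False are explicit literals,
# yet nonzero numbers and non-empty strings are still treated as true.
truthy = [bool(v) for v in (0, 42, "", "text")]
print(truthy)  # [False, True, False, True]

# The same for loop walks a list, a tuple, and the characters of a string
# without any index variable to manage.
collected = []
for container in ([1, 2, 3], ("a", "b"), "xyz"):
    for item in container:
        collected.append(item)
print(collected)  # [1, 2, 3, 'a', 'b', 'x', 'y', 'z']

# Everything is an object: numbers have methods, functions have attributes.
print((255).bit_length())        # 8
print((0.5).as_integer_ratio())  # (1, 2)


def square(x):
    return x * x


print(square.__name__)  # square


# Errors should never pass silently: a low-level routine raises a custom
# exception and the higher-level caller decides how to handle it.
class ConvergenceError(Exception):
    """Hypothetical exception for an iteration that fails to converge."""


def low_level_solver(max_iter):
    # Hypothetical solver that always gives up, just to trigger the exception.
    raise ConvergenceError(f"no convergence after {max_iter} iterations")


try:
    low_level_solver(100)
except ConvergenceError as exc:
    handled_message = str(exc)
    print("handled at a higher level:", handled_message)
```

If the try/except were removed, the exception would not pass silently: Python would print a traceback to stderr naming `ConvergenceError` and the line that raised it.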
Language interoperability
Python supports interoperability with most existing technologies. We can call or
use functions, code, packages, and objects written in different languages, such as
MATLAB, C, C++, R, Fortran, and others. A number of options are available
to support this interoperability, such as ctypes, Cython, and SWIG.
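As a small example of the ctypes option, the sketch below calls the C math library's `sqrt` directly from Python. It assumes a POSIX-like system (Linux or macOS) where `ctypes.util.find_library("m")` can locate the library; the library lookup differs on Windows. The wrapper name `c_sqrt` is our own.

```python
import ctypes
import ctypes.util

# Locate the C math library (assumption: a POSIX-like system).
# If the lookup fails, fall back to the symbols already loaded into the
# running process, which on Linux normally include libm.
libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_path) if libm_path else ctypes.CDLL(None)

# Declare the C signature: double sqrt(double)
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double


def c_sqrt(x: float) -> float:
    """Call the C library's sqrt() through ctypes."""
    return libm.sqrt(x)


print(c_sqrt(2.0))
```

Declaring `argtypes` and `restype` matters: without `restype`, ctypes assumes the C function returns an `int`, and the result would be garbage.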
Data structures
Python supports a rich range of data structures, which are among the most important
components in the design and implementation of a program that performs scientific
computations. Built-in support for dictionaries is the most notable feature of
Python's data structure functionality.
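A minimal sketch of a dictionary in a small scientific task, using hypothetical temperature readings keyed by sample name. It shows insertion, iteration over key/value pairs, and a dictionary comprehension that computes the mean reading per sample with the standard library's `statistics.mean`.

```python
from statistics import mean

# Hypothetical data: sample name -> temperature readings in kelvin
readings = {
    "sample_a": [293.1, 293.4, 292.9],
    "sample_b": [310.0, 309.6, 310.4],
}

# Adding a new entry is a constant-time operation on average
readings["sample_c"] = [275.2, 275.0]

# Dictionary comprehension: mean reading per sample
mean_by_sample = {name: mean(values) for name, values in readings.items()}

for name, value in sorted(mean_by_sample.items()):
    print(f"{name}: {value:.2f} K")
```

Keying data by a meaningful name, rather than by a position in a list, keeps the code readable as the number of samples grows.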
Available libraries
Owing to Python's batteries-included philosophy, its bundled library supports a wide
range of standard packages. As Python is an extensible language, a number of
well-tested, special-purpose libraries are also available for a wide range of users.
Let's briefly discuss a few libraries used for scientific computations.
NumPy and SciPy are packages that support most of the mathematical and statistical
operations required for scientific computation. The SymPy library provides
functionality for symbolic computation, covering basic symbolic arithmetic, algebra,
calculus, discrete mathematics, quantum physics, and more. PyTables is a package
used to efficiently process large datasets stored in the form of a
hierarchical database. IPython facilitates interactive computing in Python;
it is a command shell that supports interactive computing in multiple programming
languages. matplotlib is a plotting library for Python/NumPy.
It supports various types of graphs, such as line plots, histograms,
scatter plots, and 3D plots. SQLAlchemy is an object-relational mapping library for
Python. Using this library, we can bring database capabilities
to scientific computations with great performance and ease. Finally, it is time to
introduce a toolkit built on top of the packages we just discussed and a number of
other open source libraries and toolkits. This toolkit is named SageMath, and it is
open source mathematical software.
Summary
In this chapter, we discussed the basic concepts of scientific computing and its
definitions. Then we covered the flow of the scientific computing process. Next,
we briefly discussed some examples from a few science and engineering domains.
After the examples, we explained an effective strategy to solve complex problems.
After that, we covered the concepts of approximation, errors, and related terms.
We also discussed the background of the Python language and its guiding
principles. Finally, we discussed why Python is the most suitable choice for
scientific computing.
In the next chapter, we will discuss various mathematical/numerical analysis
concepts involved in scientific computing. We will also cover various Python
packages, toolkits, and APIs for scientific computing.