1 Mixed Programming
%matplotlib inline
from IPython.display import Image
from re import search
Out[4]: 499999500000
We can see a breakdown of the CPU time into user and sys, and also the wall time needed
to complete the operation. You can see for yourself why wall times differ between two
measurements of the same operation.
By the way, the time module from the standard library can be used in the same way:

import time

def somefunc(iterations):
    result = 0
    for i in range(iterations):
        # do stuff at each iteration
        result += i
    return result

start_time = time.time()
result = somefunc(iterations)
elapsed = time.time() - start_time
per_iter = elapsed / iterations
Since this is a common operation, IPython provides the %time magic function; however, we have already seen that %timeit is a better tool for making estimates.
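For instance, a single %time call reproduces the kind of measurement shown in Out[4] above (a sketch; bigN is assumed to have been set earlier to int(1e6), consistent with the printed value 499999500000):

%time sum(range(bigN))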
In [5]: %%timeit
j = 0
for i in range(bigN): j += i
In [7]: %%timeit
NN = np.arange(bigN)
np.sum(NN)
In [9]: %%timeit
s = ''
for i in range(int(1e4)):
s += mystring
In [10]: %%timeit
s = ''
for i in range(int(1e4)):
s = "".join((s, mystring))
In [12]: %%timeit
s = list()
for i in mylist:
s.append(chr(i))
In [13]: %%timeit
s = list()
s = [chr(i) for i in mylist]
In [14]: %%timeit
s = (chr(i) for i in mylist)
The slowest run took 7.07 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 347 ns per loop
3.1 Exercises
1. Find another way to join the strings and compare the three methods (maybe you can make a graph).
2. Find another way of converting integers to ASCII characters.
3.1.1 Solutions
In [15]: %%timeit
#solution 1
r = list(mystring)
for i in range(int(1e4)):
r.append(mystring)
s = ''.join(r)
In [16]: %%timeit
#solution 2
s = map(chr,mylist)
The slowest run took 6.34 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 188 ns per loop
In [17]: %%timeit
#solution 3
s = list(map(chr,mylist))
In [18]: def simple_pi(num_iter):
             mysum = .0
             step = 1./num_iter
             for i in range(num_iter):
                 x = (i+0.5)*step
                 mysum += 4./(x*x + 1)
             #print(mysum*step)
             return mysum*step
In [19]: simple_pi(bigN)
Out[19]: 3.1415926535897643
In [20]: %%timeit
simple_pi(bigN)
In [22]: numpy_pi(bigN)
Out[22]: 3.1415936535896263
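A possible Numpy version of the same midpoint rule is sketched below (the numpy_pi actually timed in this notebook may differ in detail):

import numpy as np

def numpy_pi(num_iter):
    # vectorized midpoint rule for the integral of 4/(1+x^2) over [0,1]
    step = 1./num_iter
    x = (np.arange(num_iter) + 0.5)*step
    return np.sum(4./(x*x + 1.))*step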
program simple_pi
  implicit none
  real*8 mysum, step, x
  integer i, numiter
  character*10 l
  call getarg(1,l)
  read (l,'(I10)') numiter
  mysum = 0.d0
  step = 1.d0 / dble(numiter)
  Do 10 i = 0, numiter-1
     x = (dble(i)+0.5d0)*step
     mysum = mysum + 4.d0/(x*x + 1.d0)
10 Continue
  write(6,*) mysum*step
end program simple_pi
In [26]: %%timeit
%%bash -s $bigN
./fpi $1 > /dev/null
Note the user and sys values and the cost of opening a BASH subprocess. By the way, notice
how we can pass a variable as a BASH positional argument...
In [28]: print(fbench.stdout)
What about C?

#include <stdlib.h>
#include <stdio.h>
#include <math.h>

int main(int argc, char **argv){
    int i, num_steps = atoi(argv[1]);
    double x, PI, sum = 0.0, step;
    step = 1.0/(double) num_steps;
    for(i=0; i<num_steps; i++){
        x = ((double)i + 0.5)*step;
        sum += 4.0/(x*x + 1.0);
    }
    PI = sum*step;
    printf("PI: %f\n",PI);
    return 0;
}
In [30]: %%bash
gcc -Wall -o cpi simple_pi.c
In [31]: %%bash -s $bigN
./cpi $1
PI: 3.141593
So, it’s not a matter of language (or compiler). [Note the %capture IPython magic]
To wrap up:
1. Python is slower than a compiled language.
2. Numpy (when we can use it) can sometimes perform approximately like compiled code (if we take into account the overhead of a BASH subprocess).
We have also shown that it is fundamental to choose the proper data structure (e.g. map vs generator vs list) for a given job, in order to avoid catastrophic performance. More tips are given in a separate section.
Finally, using a compiler requires some learning. More information about the GNU compiler collection can be found here: https://gcc.gnu.org/ and a quick tutorial is given in a separate section.
4.0.1 Duck typing
Consider the difference between a C variable and a Python one at runtime:
• the C variable is just a pointer to some memory location, with few definite properties
• the Python variable is a "box" containing an instance of a subclass of object
When the code executes, the Python interpreter decides what properties the variable may have
depending on context. The following C code:

int i,j,k;
i = 2;
j = 2;
k = i*j;
return k;

becomes, in Python:
i = 2
j = 2
k = i*j
return k
and each of these operations means that the interpreter creates or uses something like this:
In [34]: Image(filename="box.png")
Out[34]:
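A quick way to see the "box" at work is to inspect a plain integer from the interpreter (a minimal illustration):

i = 2
print(type(i))                 # <class 'int'>: i is a full Python object, not a bare machine word
print(isinstance(i, object))   # True: every int is an instance of a subclass of object
import sys
print(sys.getsizeof(i))        # much larger than a 4- or 8-byte C int, because of the object header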
4.0.2 Bytecode
From Wikipedia:
Bytecode, also known as p-code (portable code), is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (normally numeric addresses) which encode the result of parsing and semantic analysis of things like type, scope, and nesting depths of program objects. They therefore allow much better performance than direct interpretation of source code.
In [35]: %%bash
         cat hello_mod.py
         echo "++++++"
         cat hello_script.py

def hello():
    print("Hello, World!")
++++++
import hello_mod
hello_mod.hello()
quit()

Hello, World!
__pycache__/hello_mod.cpython-35.pyc swig/swigmc.pyc
The .pyc file contains bytecode, so Python is in case 2 above. The Python interpreter loads .pyc files before .py files, so if they are present, it can save some time by not having to re-compile the Python source code.
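Byte-compilation can also be triggered explicitly with the standard py_compile module (a small sketch using the hello_mod.py file above):

import py_compile
# writes hello_mod's bytecode into __pycache__/, just as the interpreter does on import
py_compile.compile("hello_mod.py")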
But how does this affect execution speed?
Compared to an interpreter, a good compiler can look ahead and optimize the code (removing redundant operations, unrolling small loops). This may be guided by user-selected switches, yielding a significant speed-up. However, compilation may be a difficult task by itself, requiring knowledge of the platform and compiler being used.
The dis module allows you to disassemble your Python bytecode.
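For example, disassembling a tiny function shows the stack-based instructions the interpreter actually executes (a minimal illustration):

import dis

def multiply(i, j):
    return i*j

dis.dis(multiply)    # on CPython 3.5 this shows LOAD_FAST, LOAD_FAST, BINARY_MULTIPLY, RETURN_VALUE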
4.0.4 Locality
One of the key features of Numpy is the locality of its data, which allows access to rows and columns as in C (or Fortran), plus the decoration that makes all the fancy stuff possible. The values in the array are contiguous and have a common size.
In [39]: Image(filename="array.png")
Out[39]:
A list object in CPython is represented by the following C structure. ob_item is an array of pointers to the list elements; allocated is the number of slots allocated in memory.

typedef struct {
    PyObject_VAR_HEAD
    PyObject **ob_item;
    Py_ssize_t allocated;
} PyListObject;
Each element in the Python list is reached through a pointer in this buffer of pointers, each of which points to a Python object that encapsulates its own data. Thus, operations like append or pop are cheap, but running over all the elements of the list can be costly, since it involves a great deal of dereferencing and (likely) many copies to and from memory.
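A quick check of the cost of all that pointer chasing (a sketch; exact timings depend on the machine):

import numpy as np

pylist = list(range(int(1e6)))
nparr = np.arange(int(1e6))

%timeit sum(pylist)     # walks a buffer of pointers, unboxing every element
%timeit np.sum(nparr)   # loops over a contiguous, typed buffer in C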
5 Profiling Python
In software engineering, program profiling, software profiling or simply profiling, a form of dynamic program analysis (as opposed to static code analysis), is the investigation of a program's behavior using information gathered as the program executes. The usual purpose of this analysis is to determine which sections of a program to optimize, to increase its overall speed, decrease its memory requirement, or sometimes both.
IPython provides an interface to the cProfile Python module through the %prun magic command, which analyses the time spent in each function call of a Python block:
from numpy.linalg import eigvals

def run_experiment(niter=100):
    K = 100
    results = []
    for _ in range(niter):
        mat = np.random.randn(K, K)
        max_eigenvalue = np.abs(eigvals(mat)).max()
        results.append(max_eigenvalue)
    return results

%prun -l 5 run_experiment()
Ordered by: internal time
List reduced from 33 to 5 due to restriction <5>

   ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
      100    0.629    0.006    0.635    0.006  linalg.py:832(eigvals)
      100    0.047    0.000    0.047    0.000  {method 'randn' of 'mtrand.RandomState' objects}
      300    0.002    0.000    0.002    0.000  {method 'reduce' of 'numpy.ufunc' objects}
      100    0.002    0.000    0.003    0.000  linalg.py:214(_assertFinite)
        1    0.001    0.001    0.684    0.684  <string>:2(run_experiment)
where the main columns are:
• ncalls: number of calls
• tottime: total time spent in the function itself (excluding calls to sub-functions)
• percall: tottime/ncalls
• cumtime: cumulative time spent in the function and all its sub-functions
• percall: cumtime divided by the number of primitive calls
Note the use of _ in the above loop. It is a convention for a throwaway variable whose value can be discarded, as in the short example below.
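For instance:

for _ in range(3):
    print("the loop counter is never used here")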
1. Using a single core makes your calculation too slow; here "slow" may vary from "over lunch" to "before dissertation". Using multiple cores speeds up your calculation (we'll see an example of that).
2. Your main memory per core/CPU is not enough: bigger problems (with more complicated physics, more particles, etc.) may be solved using memory from multiple CPUs (not covered here).
Example: matrix-matrix multiplication is time intensive for big (dense) matrices. However, each row-column dot product is independent of the others and so can be given to a core without the need to communicate between cores mid-task. This can be done by generating multiple processes (as the MPI libraries do) or multiple threads (as pthreads does in Unix). In the following we will take just a glimpse at threads. But what is a process or a thread?
Processes A process is a set of independent executions that run in a memory space separated from other processes. It has a private virtual address space, environment variables and OS identifiers. A process may split its sequence of executions into one or more threads.
In [43]: Image(filename="thread_en.png")
Out[43]:
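As a minimal illustration of threads sharing their process memory, here is a sketch using the standard threading module (because of the GIL this does not speed up CPU-bound pure-Python work, but it shows the programming model):

import threading

def partial_sum(start, stop, out, idx):
    # each thread sums its own slice and stores the result in a shared list
    out[idx] = sum(range(start, stop))

results = [0, 0]
threads = [threading.Thread(target=partial_sum, args=(0, 500000, results, 0)),
           threading.Thread(target=partial_sum, args=(500000, 1000000, results, 1))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sum(results))    # same value as sum(range(1000000))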
6.0.2 OpenMP
From wikipedia:
OpenMP is an application programming interface (API) that supports multi-platform shared memory multiprocessing programming in C, C++, and Fortran, on most platforms, processor architectures and operating systems, including Solaris, AIX, HP-UX, Linux, OS X, and Windows. It consists of a set of compiler directives, library routines, and environment variables that influence run-time behavior.
GCC supports OpenMP with the -fopenmp switch. Note that many versions of BLAS/LAPACK use OpenMP multithreading, and so do Numpy/Scipy when compiled against OpenMP-enabled libraries.
OpenMP allows one to parallelize CPU-intensive blocks of code (for loops) with very little effort using pre-processor directives; the do/while loop in calcPI above will be split into sub-loops, and each core will hold a fraction of the estimate of π:
cpu2 cpu5 cpufreq isolated modalias possible uevent
You can check the load on each core with htop (or with a desktop applet or a utility such as conky).
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <omp.h>
return 0;
}
In [46]: %%bash
gcc -Wall -O3 -fopenmp -o pi_omp pi_omp.c
PI: 3.141593
The OMP_NUM_THREADS environment variable sets the number of threads to use. We can
see how the speed up scales with the number of threads.
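Below this is done with %%bash cells; the same scan can also be scripted from Python with the subprocess module (a sketch, assuming the pi_omp executable built above and bigN as defined earlier):

import os, subprocess, time

for nt in (1, 2, 4):
    env = dict(os.environ, OMP_NUM_THREADS=str(nt))
    t0 = time.time()
    subprocess.run(["./pi_omp", str(bigN)], env=env, stdout=subprocess.DEVNULL)
    print(nt, "threads:", round(time.time() - t0, 3), "s")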
In [49]: %%capture t1
%%timeit
%%bash -s $bigN
export OMP_NUM_THREADS=1
./pi_omp $1 >& /dev/null
In [50]: %%capture t2
%%timeit
%%bash -s $bigN
export OMP_NUM_THREADS=2
./pi_omp $1 >& /dev/null
In [51]: %%capture t4
%%timeit
%%bash -s $bigN
export OMP_NUM_THREADS=4
./pi_omp $1 >& /dev/null
In [52]: %%capture t6
%%timeit
%%bash -s $bigN
export OMP_NUM_THREADS=6
./pi_omp $1 >& /dev/null
In [53]: %%capture t8
%%timeit
%%bash -s $bigN
export OMP_NUM_THREADS=8
./pi_omp $1 >& /dev/null
Between 6 and 8 cores the speed-up is very low and the code is hitting a barrier (at least on the machine where I ran the notebook).
7 Cython
In [55]: Image(filename="cy_logo.png")
Out[55]:
The fundamental nature of Cython can be summed up as follows: Cython is Python with C data types. This means that Cython can handle at the same time:
• intermixed C and Python variables and commands
What happens is this: the Cython compiler reads in a .pyx file and produces a .c file; the C file is compiled; the resulting module is linked against the CPython library and used by the interpreter.
More in detail (from the documentation):
Cython is an optimising static compiler for both the Python programming language and the extended Cython programming language. The Cython language is a superset of the Python language that additionally supports calling C functions and declaring C types on variables and class attributes. This allows the compiler to generate very efficient C code from Cython code. The C code is generated once and then compiles with all major C/C++ compilers.
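Outside the notebook, the compilation step is typically driven by a small setup.py using Cython's cythonize (a sketch; the module name here is just a placeholder):

# setup.py -- build mymodule.pyx into an importable extension with: python setup.py build_ext --inplace
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("mymodule.pyx"))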
So, that’s your world without Cython:
In [56]: Image(filename="wo_cython.png")
Out[56]:
With Cython:
In [57]: Image(filename="wcy.png")
Out[57]:
7.0.1 Summing integers
What if we want to use Cython? Generally speaking, this should include a compilation step,
converting the Cython code to C. In the following, we will take advantage of the %%cython cell
magic to pass compilation options to cython.
In [58]: %%cython
         def mymultiply(int a,int b):
             cdef int c = a*b
             return c

try:
    mymultiply("w","q")
except TypeError as e:
    print(e)
finally:
    print(mymultiply(3.,4.))
an integer is required
12
cdef may be used to declare C types and structures (and also union and enum types). cdef may also be used to declare functions and classes (known as extension types, which behave like builtins) with C attributes, which are then callable from C. cpdef puts a Python wrapper around a C function definition and makes it callable from both C and Python.
In [60]: %%timeit
         %%cython
         n = int(1e6)
         cdef int j = 0
         cdef int i
         for i in range(n):
             j += i

The slowest run took 27.66 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 171 µs per loop
In [61]: %%timeit
j = 0
for i in range(bigN): j += i
Note that C variables are not Python objects, i.e. they are typed:
In [62]: %%cython
         cdef int n
         n = 10.

n = 10.
------------------------------------------------------------
/home/gmancini/.cache/ipython/cython/_cython_magic_45136ee03b41a20bcca5f6fda90449af.pyx:2:4: Cannot assign type 'double' to 'int'
In [63]: %%cython
         cdef double d
         d = 10

Here Cython performs an automatic cast.
return step*mysum
In [65]: %%bash
gcc -O3 -ffast-math -pipe simple_pi.c -o cpi
gfortran -O3 -ffast-math -pipe simple_pi.f -o fpi
In [66]: %%capture cypi
%timeit cy_simple_pi(bigN)
In [67]: %%capture simplepi
%timeit simple_pi(bigN)
In [68]: %%capture num_pi
%timeit numpy_pi(bigN)
In [69]: %%capture f95_pi
%%timeit
%%bash -s $bigN
./fpi $1 >& /dev/null
In [70]: %%capture c_pi
%%timeit
%%bash -s $bigN
./cpi $1 >& /dev/null
This seems to imply that if i is declared as a cdef integer type, Cython will optimise this into a pure C loop. Remember the cost of opening a BASH subprocess.
In [74]: %%cython -a
         def cy_simple_pi(int niter=int(1e6)):
             """
             another version of arctg integration
             using Cython
             """
             cdef double s, mysum=.0, step=1./niter
             cdef int i=0
             for i in range(niter):
                 s = (i+0.5)*step
                 mysum += 4./(s*s + 1.)
             return step*mysum
Invoking Cython (had we built a module using python setup.py ...) generates about 1000 lines of C code, which is then compiled and used when running cpi.cpi(). I.e. Cython has generated valid C code which exposes itself to Python, and its performance is not so bad compared to C.
Let’s see what happens with MonteCarlo
i = 0
inside = 0
for i in range(niter):
    x = 2.*random.random() - 1
    y = 2.*random.random() - 1
    r = x*x + y*y
    if r<=1:
        inside+=1
    i+=1
In [79]: %%bash
gcc -O3 -Wall -ffast-math -pipe -o mc_cpi pimc.c
In [81]: %%cython
         import random
         def pi_mc_cy(int niter=int(1e6)):
             """
             another version of MonteCarlo integration using Cython
             """
             cdef double x, y, r, PI
             cdef int i=0, inside = 0
             for i in range(niter):
                 x = 2.*random.random() - 1
                 y = 2.*random.random() - 1
                 r = x*x + y*y
                 if r<=1:
                     inside+=1
             PI = 4.*inside/niter
             return PI
To sum up, just using %%cython and cdef yields performance an order of magnitude worse than C; but we also have Numpy, not only pure Python. Is it possible to use Numpy arrays together with Cython?
In [84]: %%cython
         import numpy as np
         def pi_mc_np_cy(int niter=10000000):
             """
             another version of MonteCarlo integration using Cython
             """
             cdef double PI,inside
             x = 2.*np.random.rand(niter)-1
             y = 2.*np.random.rand(niter)-1
             r = x*x + y*y
             r = r[r<=1.]
             inside = r.shape[0]
             return 4.0*inside/niter
Why is that? The reason is that working with NumPy arrays here still incurs substantial Python overheads. We can do better by using Cython's typed memoryviews, which provide more direct access to arrays in memory. Like a standard Numpy array view (e.g. a slice object), a memoryview stores information about a memory location without holding the data itself. Being typed, the Cython compiler can access it as it would a standard C array, avoiding interpreter overhead.
When using them, the first step is to create a NumPy array and then declare a memoryview and bind it to the NumPy array.
In [86]: %%cython
         import numpy as np
         from numpy cimport float_t
         def pi_mc_np_cy_mv(int niter=10000000):
             """
             another version of MonteCarlo integration
             using Cython and typed memoryviews
             """
             cdef double PI, inside = 0.
             cdef int i
             x = 2.*np.random.rand(niter)-1
             y = 2.*np.random.rand(niter)-1
             cdef float_t [:] X = x
             cdef float_t [:] Y = y
             for i in range(niter):
                 if X[i]*X[i] + Y[i]*Y[i]<=1.:
                     inside += 1.
             return 4.*inside/niter
In [87]: %%capture mv
         %timeit pi_mc_np_cy_mv(bigN)

In [88]: times.append(search(pat,str(mv)).group(2))
         print(times)
         scale = 200
         colour = ("k","b","r","g","c")
         for point,method in enumerate(("Pure Python","Numpy","C","Cython","MemoryViews")):
             plt.scatter(point+1,times[point],label=method,c=colour[point],s=scale,edgecolor=None)
         plt.legend()
         plt.ylabel("Elapsed time (ms)")
Note that memoryviews support a number of operations, including copy, in analogy with arrays: new_mv[:] = old_mv, or new_mv[...] = old_mv for all dimensions.
In [94]: %%cython
         cdef extern from "string.h":
             int strlen(char *c)
         def get_len(char *message):
             return strlen(message)

In [95]: get_len(bytearray("www",encoding="ascii"))

Out[95]: 3
Releasing the GIL and using OpenMP Cython supports native parallelism with OpenMP; it is also possible to use MPI (e.g. with mpi4py), and it should be possible to exploit Cython to interface with C code using MPI or pthreads (not easy). To use this kind of parallelism, we must release the GIL. Note that the GIL is released whenever low-level C code is running (Numpy comes to mind).
To use OpenMP within Cython you have to:
1. Compile with OpenMP enabled (e.g. %%cython --compile-args=-fopenmp --link-args=-fopenmp) and use prange from cython.parallel for the parallel loops.
2. Release the GIL before a block of code:
with nogil:
    # this block of code is executed after releasing the GIL
    with gil:
        # this block of code will re-enable the GIL in a no-GIL context (kind of omp atomic or serial)
Note that, inside a prange block, any variable updated with an in-place operation is automatically taken to be a reduction variable, which means that the thread-local values are combined after all threads have completed. Further, the index variable is always lastprivate, i.e. it will hold the value of the last iteration.
return mysum*step
In [97]: pi_cy_omp(int(1e8))
Out[97]: 3.1415926535904264
In [98]: %%capture t1
%timeit pi_cy_omp(int(1e8),1)
In [99]: %%capture t2
%timeit pi_cy_omp(int(1e8),2)
In [100]: %%capture t4
%timeit pi_cy_omp(int(1e8),4)
In [101]: %%capture t6
%timeit pi_cy_omp(int(1e8),6)
In [102]: %%capture t8
%timeit pi_cy_omp(int(1e8),8)
In [104]: x = np.array((1,2,4,6,8))
          pat = r'(.*):\s(.*)\s(s|ms)'
          cy = list()
          for i in (t1,t2,t4,t6,t8):
              res = search(pat,str(i))
              cy.append(float(res.group(2)))
              if res.group(3) == "s":
                  cy[-1] = cy[-1]*1e3
          cy = list(map(float,cy))
          plt.plot(x,results,marker='s',color='k',ls='-',label="OpenMP test")
          plt.plot(x,cy,marker='o',color='r',ls=':',label="Cython OpenMP test")
          plt.legend()
          plt.xlabel("Number of cores")
          plt.ylabel("Elapsed time (ms)")
          cy

Out[104]: [2660.0, 1370.0, 775.0, 529.0, 399.0]
8 Exercise
The Jacobi method: given a diagonally dominant matrix A and a vector b, solve:

$$A x = b$$

by iterating:

$$x_i^{(k+1)} = \frac{b_i - \sum_{j \neq i} A_{ij}\, x_j^{(k)}}{A_{ii}}$$

Use Python/Numpy to create a random matrix A such that $A_{ii} \geq 2 \sum_{j \neq i} A_{ij}$ and a random vector b; then implement the Jacobi solver using pure Python, Numpy, Cython (optionally with OpenMP) and Fortran via f2py, as in the solutions below.
You can test that the algorithm converges by calculating, every (n-th) step:

$$\mathrm{conv} = \sum_i \left( x_i^{(k+1)} - x_i^{(k)} \right)^2$$

and comparing it against a preset tolerance; you may also test whether it yields the correct result at the end of the loop by computing the error as:

$$\mathrm{err}_i = b_i - \sum_j A_{ij}\, x_j$$

and

$$\mathrm{err} = \sqrt{\frac{\sum_i \mathrm{err}_i^2}{n}}$$
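In Numpy these two checks are one-liners (a sketch, assuming A, b, xold and xnew are Numpy arrays of matching shape):

import numpy as np

conv = np.sum((xnew - xold)**2)                                 # change between two successive iterates
err = np.sqrt(np.sum((b - np.dot(A, xnew))**2)/b.shape[0])      # RMS residual of the final solution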
Possible testing parameters: a matrix dimension of 1000, tol = 1e-8 and kmax = 100, as used in the solutions below.
8.0.1 Solutions
Pure Python Version
def gen_jac(dim,tol=1e-8):
    A = np.random.rand(dim*dim) + tol
    A.shape = (dim,dim)
    b = np.random.rand(dim) + tol
    x_0 = np.random.rand(dim) + tol
    for i in range(dim):
        A[i,i] = 1.5*(np.sum(A[i,:i])+np.sum(A[i,i:]))
    return A,b,x_0
def solv_jac(A,b,xold,tol=1e-8,kmax=100):
    """
    Jacobi solver
    """
    dim = A.shape[0]
    xnew = np.ones(dim)
    k = 0
    conv = kmax
    while conv > tol and k < kmax:
        for i in range(dim):
            mysum = 0.
            for j in range(dim):
                if j != i:
                    mysum += A[i,j]*xold[j]
            xnew[i] = (b[i] - mysum)/A[i,i]
        conv = np.sum((xnew - xold)**2)
        xold = np.copy(xnew)
        k += 1
    return k,conv,xnew
def test_jac(A,b,x,debug=False):
    """
    test Jacobi solver
    """
    dim = A.shape[0]
    mysum = np.zeros(dim)
    for i in range(dim):
        mysum[i] = b[i] - np.sum(A[i,:]*x)
    err = math.sqrt(np.sum(mysum**2)/dim)
    if not debug:
        return err
    else:
        return err,mysum
In [106]: A,b,x_0 = gen_jac(1000)
32 0.0013954521625608946 7.78475077906e-09
Numpy Version
def solv_jac_np(A,b,xold,tol=1e-8,kmax=100):
    dim = A.shape[0]
    k = 0
    conv = kmax
    D = np.zeros((dim,dim))
    d = np.diag(A)
    np.fill_diagonal(D,d)
    R = A - D
    while conv > tol and k < kmax:
        xnew = (b - np.dot(xold,R))/d
        conv = np.sum((xnew - xold)**2)
        xold = np.copy(xnew)
        k += 1
    return k,conv,xnew
35 0.004884897156685974 7.44462729776e-09
In [112]: %timeit solv_jac_np(A,b,x_0)
1 loop, best of 3: 3.16 s per loop
Cython version
In [113]: %%cython
          import numpy as np
          from numpy cimport float_t
          def solv_jac_cy(A,b,xinit,double tol=1e-8,int kmax=100):
              """
              Cython Jacobi solver
              """
              xold = np.copy(xinit)
              Xold[...] = Xnew
              k += 1
              return k,conv,np.array(Xnew)
In [114]: nk, conv, xnew = solv_jac_cy(A,b,x_0)
err = test_jac(A,b,xnew)
print(nk,err,conv)
35 0.004313748271923786 7.444846214863868e-09
Cython/Numpy version
In [116]: %%cython --compile-args=-fopenmp --link-args=-fopenmp
          import numpy as np
          from cython.parallel import prange
          import cython
          from numpy cimport float_t

          @cython.cdivision(True)
          @cython.boundscheck(False)
          def solv_jac_cy_np(A,b,xinit,int NT,double tol=1e-8,int kmax=100):
              """
              Cython/Numpy Jacobi solver using OpenMP
              """
              xold = np.copy(xinit)
              return k,conv,np.array(Xnew)
35 0.004313748271923786 7.444846214863865e-09
1 loop, best of 3: 1.25 s per loop
module fjac
contains
subroutine jacsolv(kmax,A,b,xold,conv,xnew,order)
implicit none
integer :: order
real(8), dimension(0:order-1,0:order-1) :: A
real(8), dimension(0:order-1) :: xold,b
real(8), dimension(0:order-1) :: xnew
!f2py intent(in) order
!f2py intent(out) xnew
!f2py depend(order) xnew
integer :: i, k, kmax
!f2py intent(in,out) kmax
real(8) :: conv, tol
!f2py intent(out) conv
real(8), dimension(0:order-1) :: dd, xtmp
real(8), dimension(0:order-1, 0:order-1) :: R, D
conv = kmax
tol = 1e-8
xtmp = xold
xnew = 1.
forall (i=0:size(dd)-1) D(i,i) = A(i,i)
forall (i=0:size(dd)-1) dd(i) = 1./A(i,i)
R = A-D
do k=1,kmax
conv = 0.
do i = 0,order-1
xnew(i) = (b(i) - DOT_PRODUCT(xtmp,R(i,:)) )*dd(i)
end do
conv = SUM((xnew-xtmp)*(xnew-xtmp))
xtmp = xnew
end do
kmax = k
end subroutine
end module fjac
This is quite standard Fortran 95 code, with the exception of the !f2py lines, which are preprocessor directives (like the #pragma ones).
Instead of directly calling gfortran, we use f2py3 to generate the appropriate wrapping code and then compile:
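The wrapping and compilation step is a single command (a sketch; the exact flags used for this notebook may have differed):

%%bash
f2py3 -c -m fjac jac_solv.f95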
In [124]: !ls -rt
hello_script.py __pycache__
hello_mod.py jac_solv.f95
thread_en.png swig
simple_pi.f 08_MixedProgramming.slides.html
simple_pi.c pi_omp
pi_omp.c cpi
wcy.png fpi
wo_cython.png mc_cpi
cy_logo.png Untitled.ipynb
box.png 08_MixedProgramming.ipynb
array.png fjac.cpython-35m-x86_64-linux-gnu.so
pimc.c
In [126]: fjac.__doc__
In [127]: A.shape[0]
Out[127]: 10000
35 7.444846214863856e-09
%module testswig
%{
#include "test.h"
%}
%include "test.h"
The %module directive specifies the name of the module to be generated from this wrapper file. The code between %{ and %} is placed, verbatim, in the C output file.
An interface file between SWIG and Numpy exists (numpy.i, shipped with the Numpy sources) and is used below. To estimate π using MonteCarlo by passing Numpy arrays to an underlying C function we need:
/home/gmancini/Dropbox/Calcolo/08_Mixed/swig
#include <stdio.h>
#include <stdlib.h>
#include "swigmc.h"

double calcpi(double *x, double *y, int npoints){
    int i, inside=0;
    double r2, pi;
    for(i=0;i<npoints;i++){
        r2 = x[i]*x[i] + y[i]*y[i];
        if(r2<=1.0){
            inside++;
        }
    }
    pi = 4.0*(double)inside/(double)npoints;
    return pi;
}
In [135]: !cat swigmc.i

%module swigmc
%{
#define SWIG_FILE_WITH_INIT
#include <stdio.h>
#include <stdlib.h>
#include "swigmc.h"
%}

%include "numpy.i"

%init %{
import_array();
%}

%inline %{
double swigpi(int npx, double* x, int npy, double* y){
    printf("%f %f\n",x[0],y[0]);
    double PI;
    PI = calcpi(x,y,npx);
    return PI;
}
%}
SHELL = /bin/sh
OBJECTS = swigmc.o
INCLUDE = swigmc.h
SWIG = swig
SWIGOPT = -python -Wall
SWIGOBJS = swigmc_wrap.o

swigmc_wrap.c: swigmc.i
	$(SWIG) $(SWIGOPT) swigmc.i

swigmc_wrap.o: swigmc_wrap.c
	$(CC) $(CFLAGS) $(CPPFLAGS) -c swigmc.c swigmc_wrap.c

clean:
	rm -f *.so *.o *.pyc swigmc_wrap.c *py *gch

.SUFFIXES : .c .h .o

.c.o:
	$(CC) $(INCLUDE) $(CFLAGS) -c $*.c
import numpy as np
import swigmc

npoints = int(1e7)
x = 2.*np.random.rand(npoints)-1
y = 2.*np.random.rand(npoints)-1
print(x[0],y[0])
pi = swigmc.swigpi(x,y)
print(pi)
In [139]: %%bash
#module load python-2.7
python test.PY
(-0.89016441899965737, -0.060196036224869243)
-0.890164 -0.060196
-0.890164 -0.060196
3.1418348