NumPy Array Operations
In this section, you will learn some basic NumPy array operations:
- Basic Arithmetic (Addition, Subtraction, Multiplication, Division)
- Scalar Operations
- Linear Algebra: Matrix Multiplication, Dot Products
- Aggregate Functions
- Universal Functions (ufunc)
We will also briefly touch on broadcasting, which is explained in more detail later.
Basic Arithmetic
In NumPy, array operations for basic arithmetic are performed element-wise. This means that the operation is performed on each element in the array. Let's see some examples:
import numpy as np
# addition is done elementwise
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
result = a + b # array([5, 7, 9])
print(result)
[5 7 9]
# we use reshape to create a 3x3 matrix for us
# we will learn more about this later
a = np.arange(1,10).reshape((3, 3))
b = np.arange(10,19).reshape((3, 3))
print(a)
print()
print(b)
print()
print(a + b)
[[1 2 3]
[4 5 6]
[7 8 9]]
[[10 11 12]
[13 14 15]
[16 17 18]]
[[11 13 15]
[17 19 21]
[23 25 27]]
Multiplication, Division, and Subtraction
The NumPy array operations for multiplication, division, and subtraction all work the same as addition, which we saw above.
# multiplication is done elementwise
print(a)
print()
print(b)
print()
# elementwise multiplication
print(a * b)
[[1 2 3]
[4 5 6]
[7 8 9]]
[[10 11 12]
[13 14 15]
[16 17 18]]
[[ 10 22 36]
[ 52 70 90]
[112 136 162]]
# division is done elementwise
print(a)
print()
print(b)
print()
# elementwise division
print(b / a)
[[1 2 3]
[4 5 6]
[7 8 9]]
[[10 11 12]
[13 14 15]
[16 17 18]]
[[10. 5.5 4. ]
[ 3.25 2.8 2.5 ]
[ 2.28571429 2.125 2. ]]
# subtraction is done elementwise
print(a)
print()
print(b)
print()
# elementwise subtraction
print(b - a)
[[1 2 3]
[4 5 6]
[7 8 9]]
[[10 11 12]
[13 14 15]
[16 17 18]]
[[9 9 9]
[9 9 9]
[9 9 9]]
Scalar Operations
In NumPy, you can also perform operations with scalars. For example, if you multiply an array by a scalar, the scalar is multiplied by each element in the array. Let's see some examples:
print(a)
print()
# multiply each element by 3
print(a * 3)
[[1 2 3]
[4 5 6]
[7 8 9]]
[[ 3 6 9]
[12 15 18]
[21 24 27]]
# divide each element by 10
print(a / 10)
[[0.1 0.2 0.3]
[0.4 0.5 0.6]
[0.7 0.8 0.9]]
# subtract 5 from each element
print(a - 5)
[[-4 -3 -2]
[-1 0 1]
[ 2 3 4]]
# square each element
print(a ** 2)
[[ 1 4 9]
[16 25 36]
[49 64 81]]
Linear Algebra & NumPy Matrix Multiplication
NumPy also has some built-in functions for linear algebra, which we will briefly cover here. Matrix Multiplication uses the dot function or the @ operator. Below are some examples of NumPy Matrix Multiplication:
# matrix multiplication
a = np.arange(1,10).reshape((3, 3))
b = np.arange(10,19).reshape((3, 3))
# matrix multiplication
c = np.dot(a,b)
print(c)
[[ 84 90 96]
[201 216 231]
[318 342 366]]
# matrix multiplication - same as above
d = a.dot(b)
print(d)
[[ 84 90 96]
[201 216 231]
[318 342 366]]
print(a)
print()
print(b)
print()
# matrix multiplication - same as above
e = a @ b
print(e)
[[1 2 3]
[4 5 6]
[7 8 9]]
[[10 11 12]
[13 14 15]
[16 17 18]]
[[ 84 90 96]
[201 216 231]
[318 342 366]]
# matrix multiplication - same as above
f = np.matmul(a,b)
print(f)
[[ 84 90 96]
[201 216 231]
[318 342 366]]
Why Are There Four Different Methods in NumPy Matrix Multiplication?
Well, first, there are really two methods:
- np.matmul() is the same as @; the latter is just shorthand for the former.
- np.dot() is the same as arr.dot(); the latter is just the method version of the function.
The answer is partly historical. Also, the four methods I just showed you are identical for 1D and 2D arrays, but give different results for higher-dimensional arrays.
The @ operator and matmul function were introduced into NumPy later to let authors be explicit that the operation is intended to be matrix multiplication. This is because the .dot function does not perform matrix multiplication for higher-dimensional arrays.
Generally speaking, as you start to use NumPy, you should use the @ operator or matmul function for matrix multiplication and the .dot function for dot products. Don't worry too much about figuring out "what happens in higher dimensions" - if you need to do that, you can look it up in the documentation. It's quite complicated, so I wouldn't touch it until I needed it.
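To make that advice concrete, here is a quick sketch: np.dot for a 1D dot product, and @ for 2D matrix multiplication.

```python
import numpy as np

# 1D arrays: np.dot computes the dot product (a single number)
v = np.array([1, 2, 3])
w = np.array([4, 5, 6])
print(np.dot(v, w))  # 1*4 + 2*5 + 3*6 = 32

# 2D arrays: prefer @ (or np.matmul) for matrix multiplication
a = np.arange(1, 5).reshape((2, 2))
b = np.arange(5, 9).reshape((2, 2))
print(a @ b)  # [[19 22]
              #  [43 50]]
```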
Just for Fun
Here is the head-scratcher that happens in higher dimensions. I will refer you to the documentation if you want to investigate further. I won't be trying to explain this, though!
It's both:
- beyond the scope of this course
- not something you need to learn about until you need to learn about it. (You will know when! Tackle it then)
import numpy as np
# Random 3D array of shape (2, 3, 4)
A = np.random.rand(2, 3, 4)
# Random 3D array of shape (2, 4, 3)
B = np.random.rand(2, 4, 3)
# Using np.dot
C_dot = np.dot(A, B)
print(C_dot.shape)
# Outputs: (2, 3, 2, 3)
# Using @
C_matmul = A @ B
print(C_matmul.shape)
# Outputs: (2, 3, 3)
(2, 3, 2, 3)
(2, 3, 3)
Linear Algebra Routines
There are a number of linear algebra routines in NumPy. We will not cover them all here, but you can do things like find the determinant of a matrix, find the inverse of a matrix, etc. You can find the full list of routines in the NumPy documentation, under the numpy.linalg module.
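As a small taste, here is a sketch using numpy.linalg to compute a determinant and an inverse:

```python
import numpy as np

m = np.array([[4.0, 7.0],
              [2.0, 6.0]])

# determinant of the matrix: 4*6 - 7*2 = 10
print(np.linalg.det(m))

# inverse of the matrix
inv = np.linalg.inv(m)
print(inv)

# multiplying a matrix by its inverse gives the identity
print(np.allclose(m @ inv, np.eye(2)))  # True
```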
Aggregate Functions
More useful and common than linear algebra routines are aggregate functions. These functions take an array as input and return a single value. For example, you can find the sum of all the elements in an array, the mean, the standard deviation, etc. Let's see some examples:
print(a)
print()
# sum of all elements
print(np.sum(a))
[[1 2 3]
[4 5 6]
[7 8 9]]
45
# mean of all elements
np.mean(a)
5.0
# standard deviation of all elements
np.std(a)
2.581988897471611
Aggregate Functions Along an Axis
It's often more useful to perform aggregate functions along a specific axis. For example, you might want to find the sum of each column in a matrix. Let's see some examples:
print(a)
print()
# sum of each column
np.sum(a, axis=0)
[[1 2 3]
[4 5 6]
[7 8 9]]
array([12, 15, 18])
print(a)
print()
# sum of each row
np.sum(a, axis=1)
[[1 2 3]
[4 5 6]
[7 8 9]]
array([ 6, 15, 24])
What Is a NumPy Axis?
In NumPy (and in Pandas), the axis refers to the dimension of the array that you want to perform the operation on. When we say np.sum(a, axis=0), we mean "sum along the 0th axis", which means down the rows --> which returns the sum of each column. When we say np.sum(a, axis=1), we are saying "sum along axis 1", which means sum along the columns --> which returns the sum of each row. The following picture helps to summarize sums along a NumPy axis.
# some more examples
# standard deviation of each column
print(np.std(a, axis=0))
# standard deviation of each row
print(np.std(a, axis=1))
[2.44948974 2.44948974 2.44948974]
[0.81649658 0.81649658 0.81649658]
# If you are struggling to see it, then I
# suggest using a non-symmetric matrix, such as:
a = np.arange(1,13).reshape((2, 6))
print(a)
print()
# sum of each column, will be 6 values
print(a.sum(axis=0))
print()
# sum of each row, will be 2 values
print(a.sum(axis=1))
[[ 1 2 3 4 5 6]
[ 7 8 9 10 11 12]]
[ 8 10 12 14 16 18]
[21 57]
Other Stats
NumPy has loads of stats you can run. You can check the documentation here
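For example, here is a quick sketch of a few other common statistical functions:

```python
import numpy as np

a = np.arange(1, 13).reshape((2, 6))

print(a.min())               # 1
print(a.max())               # 12
print(np.median(a))          # 6.5
print(np.percentile(a, 75))  # 9.25
```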
NumPy uFuncs
NumPy uFuncs, or Universal functions, are functions that can be applied to each element in an array. These are also called element-wise functions. We have already seen some examples of ufuncs, such as when we multiplied a matrix by a scalar. Here is a list of some other universal functions:
- np.sqrt() - square root
- np.exp() - exponential
- np.log() - natural log
- np.abs() - absolute value
- np.sin() - sine
- np.cos() - cosine
- np.tan() - tangent
- any kind of addition, subtraction, multiplication, division, etc.
# square root of each element
np.sqrt(a)
array([[1. , 1.41421356, 1.73205081, 2. , 2.23606798,
2.44948974],
[2.64575131, 2.82842712, 3. , 3.16227766, 3.31662479,
3.46410162]])
# natural log of each element
np.log(a)
array([[0. , 0.69314718, 1.09861229, 1.38629436, 1.60943791,
1.79175947],
[1.94591015, 2.07944154, 2.19722458, 2.30258509, 2.39789527,
2.48490665]])
# exponential of each element
np.exp(a)
array([[2.71828183e+00, 7.38905610e+00, 2.00855369e+01, 5.45981500e+01,
1.48413159e+02, 4.03428793e+02],
[1.09663316e+03, 2.98095799e+03, 8.10308393e+03, 2.20264658e+04,
5.98741417e+04, 1.62754791e+05]])
uFuncs or Universal Functions are Fast
Universal functions are fast because they are implemented in compiled C code. They operate on entire arrays in tight, optimized loops, performing the operations orders of magnitude faster than if you were to write a Python for loop to do the same thing.
Let's see some examples of speedy NumPy uFuncs in action:
# Create a large array of a million elements
data = np.random.rand(1000000)
# Define a function for adding using a loop
def add_python_loop(arr):
result = np.empty(len(arr))
for i in range(len(arr)):
# adding 1 to each element
result[i] = arr[i] + 1
return result
%%timeit
# Now, we'll use the %%timeit magic command to measure the loop version
add_python_loop(data)
191 ms ± 5.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
# Using a ufunc for adding
result_ufunc = data + 1
# NumPy uses broadcasting and ufuncs to do element-wise
# operations, we will learn about broadcasting soon.
1.14 ms ± 31.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
On my computer, for the for loop method, I got 205 ms ± 13.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) Which is 205 milliseconds, or 0.205 seconds.
For the ufunc method, I got 952 µs ± 26.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each) which is 952 microseconds, or 0.000952 seconds.
That is a lot faster! This speedup is often called vectorized code. So if you see someone saying, "you should vectorize your code," they are saying, "you should use uFuncs instead of for loops". This can be a bit confusing because we are using vectors in both cases, but the difference is that the ufunc is operating on the entire vector at once, whereas the for loop is operating on each element in the vector one at a time.
The basic premise of vectorization is that you should try to use ufuncs whenever possible and avoid for loops whenever possible. This is because ufuncs are much faster than for loops. However, there are some cases where you will need to use a for loop, and that is okay. Just try to use ufuncs whenever possible.
Also, don't forget that premature optimization is the root of all evil. If you are just starting out, don't worry about vectorization. Just write your code in whatever way makes sense to you. Once you have a working program, then you can go back and try to optimize it (when you are convinced it's not fast enough). 0.2 seconds is still pretty quick, after all.
A Word on Statistical Functions
Some functions like np.mean() look like ufuncs because they operate on the entire array. However, they do not operate element-wise; they aggregate data across the array. So they are not ufuncs. They are still definitely faster than running your own for loop to find the mean, but technically they are not ufuncs. This distinction does not really matter, but I thought I would mention it since we are in the weeds here. It's kinda like ... I researched all this to teach you, so I can't help but tell you. But I never knew this, and I have used NumPy for 10+ years just fine...
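If you're curious, you can check this yourself: element-wise ufuncs are instances of np.ufunc, while aggregating functions like np.mean are ordinary Python functions.

```python
import numpy as np

# element-wise ufuncs are instances of np.ufunc
print(isinstance(np.sqrt, np.ufunc))  # True
print(isinstance(np.add, np.ufunc))   # True

# aggregating functions like np.mean are plain Python functions
print(isinstance(np.mean, np.ufunc))  # False
```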
Summary: NumPy Array Operations
- NumPy supports basic arithmetic operations (+, -, *, /) that are performed element-wise on arrays, along with scalar operations where a scalar value is applied to each element of an array.
- Linear algebra operations like NumPy matrix multiplication can be performed using functions like np.dot(), arr.dot(), np.matmul(), or the @ operator, while other linear algebra routines are available for operations like finding determinants and inverses.
- Aggregate functions like np.sum(), np.mean(), and np.std() can be applied to arrays to compute summary statistics, and these functions can operate along specific axes of multi-dimensional arrays.
- NumPy provides universal functions (ufuncs) like np.sqrt(), np.exp(), np.log(), and trigonometric functions that perform element-wise operations on arrays, leveraging optimized C code for improved performance compared to Python loops.