Numpy Crash Course - Sharp Sight
CRASH COURSE
sharpsightlabs.com 1
Table of Contents
A quick introduction to Numpy arrays
A quick introduction to Numpy arrays
One of the cornerstones of the Python data science ecosystem is Numpy, and
That being the case, if you want to learn data science in Python, you’ll need to
What is a Numpy array?
A Numpy array is a collection of elements that have the same data type.
You can think of it like a container that has several compartments that hold data,
We have a set of integers: 88, 19, 46, 74, 94. These values are all integers; they
are all of the same type. You can see that these values are stored in
Very quickly, I’ll explain a little more about some of the properties of a Numpy
array.
Numpy arrays must contain data all of the same type
As I mentioned above, Numpy arrays must contain data all of the same type.
That means that if your Numpy array contains integers, all of the values must be
integers. If it contains floating point numbers, all of the values must be floats.
I won’t write extensively about data types and Numpy data types here. There is
a section below in this blog post about how to create a Numpy array of a
particular type.
If you’re familiar with computing in general, and Python specifically, you’re
probably familiar with indexes. Many data structures in Python have indexes,
If you’re not familiar with indexes though, let me explain. Again, an index is sort
Just like other Python structures that have indexes, the indexes of a Numpy
So if you want to reference the value in the very first location, you need to
reference location “0”. In the example shown here, the value at index 0 is 88.
I’ll explain how exactly to use these indexes syntactically, but to do that, I want
to give you working examples. To give you working examples, I’ll need to explain
There are a lot of ways to create a Numpy array. Really. A lot. Off the top of my
head, I can think of at least a half dozen techniques and functions that will
create a Numpy array (we’ll discuss a few in this Crash Course). In fact, the
But here, I want to start simple. I’ll show you a few very basic ways to do it.
In particular, I’ll show you how to use the Numpy array() function.
To use the Numpy array() function, you call the function and pass in a Python list
as the argument.
Let’s take a look at some examples. We’ll start by creating a 1-dimensional
Numpy array.
You call the function with the syntax np.array(). Keep in mind that before you call
np.array(), you need to import the Numpy package with the code import numpy
as np.
When you call the array() function, you’ll need to provide a list of elements as the
#import Numpy
import numpy as np
We’ve called the np.array() function. The argument to the function is a list of
Note that you can also create Numpy arrays with other data types, besides
integers. I’ll explain how to do that a little later in this Crash Course.
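As a minimal sketch of the call described above, the following code builds a 1-dimensional array from a Python list. The variable name my_array is only illustrative, and the values are the five integers used in the illustration earlier in this Crash Course.

```python
import numpy as np

# Pass a Python list (note the square brackets) to np.array()
my_array = np.array([88, 19, 46, 74, 94])

print(my_array)        # the five values, stored in a single array
print(type(my_array))  # a Numpy ndarray object
```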
To do this using the np.array() function, you need to pass in a list of lists.
# 2-d array
np.array([[1,2,3],[4,5,6]])
Inside of the call to np.array(), there is a list of two lists: [[1,2,3],[4,5,6]]. The
first list is [1,2,3] and the second list is [4,5,6]. Those two lists are contained
inside of a larger list. The whole input is a list of lists. That list of lists is passed
This might be a little confusing if you’re just getting started with Python and
Numpy. In that case, I highly recommend that you review Python lists.
There are also other ways to create a 2-d Numpy array. For example, you can
use the array() function to create a 1-dimensional Numpy array, and then use
# 2-d array
np.array([1,2,3,4,5,6]).reshape([2,3])
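If you want to check that the list-of-lists approach and the reshape approach produce the same 2-d array, a short comparison like the one below should do it. The variable names are illustrative and are not part of the original examples.

```python
import numpy as np

# Approach 1: pass a list of lists
from_nested_lists = np.array([[1, 2, 3], [4, 5, 6]])

# Approach 2: create a 1-d array, then reshape it to 2 rows and 3 columns
from_reshape = np.array([1, 2, 3, 4, 5, 6]).reshape([2, 3])

print(from_nested_lists.shape)                          # (2, 3)
print(np.array_equal(from_nested_lists, from_reshape))  # True
```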
For right now, I don’t want to get too “in the weeds” explaining reshape(), so
I’ll leave this as it is. I just want you to understand that there are a few ways to
I’ll write more about how to create and work with 2-dimensional Numpy arrays
N-dimensional Numpy arrays
Numpy arrays. However, in the interest of simplicity, I’m not going to explain
Using the Numpy array() function, we can also create Numpy arrays with
specific data types. Remember that in a Numpy array, all of the elements must
To do this, we need to use the dtype parameter inside of the array() function.
integer
To create a Numpy array with integers, we can use the code dtype = 'int'.
float
Similarly, to create a Numpy array with floating point numbers, we can use the
code dtype = 'float'.
These are just a couple of examples. Keep in mind that Numpy supports almost
2 dozen data types … many more than what I’ve shown you here.
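Here is a small sketch of the dtype parameter in action. The variable names are made up for illustration, and the exact integer type you see may depend on your platform.

```python
import numpy as np

# Same input list, two different data types
int_array = np.array([1, 2, 3], dtype='int')
float_array = np.array([1, 2, 3], dtype='float')

print(int_array.dtype)    # an integer type; the exact width is platform dependent
print(float_array.dtype)  # float64
print(float_array)        # the values now carry a decimal point
```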
Having said that, a full explanation of Python data types and Numpy data types
is beyond the scope of this book. Just understand that you can specify the data
Common mistakes when creating a Numpy array
I want to point out one common mistake that many beginners make when they
As I mentioned above, when you create a Numpy array with np.array(), you
Many beginners forget to do this and simply provide the values directly to the
In the two examples above, pay close attention to the syntax. The top example
works properly because the integers are contained inside of a Python list. The
second example causes an error because the integers are passed directly to
Having said that, pay attention! Make sure that when you use np.array(),
Again, if you’re confused about this or don’t understand Python lists, I strongly
recommend that you go back and review lists and other basic “built-in types” in
Python.
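To make the mistake concrete, here is a hedged sketch of a correct call next to an incorrect one. The exact exception type and error message are not guaranteed; they have varied between Numpy versions, so the example catches both TypeError and ValueError.

```python
import numpy as np

# Correct: the values are wrapped in a Python list
good_array = np.array([1, 2, 3, 4, 5])
print(good_array)

# Incorrect: the values are passed directly as separate arguments
try:
    bad_array = np.array(1, 2, 3, 4, 5)
except (TypeError, ValueError) as err:
    # The exception type and message depend on your Numpy version
    raised_error = True
    print("np.array(1, 2, 3, 4, 5) failed:", err)
else:
    raised_error = False
```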
Numpy arrays have a set of attributes that you can access. These attributes
include things like the array’s size, shape, number of dimensions, and data type.
I want to show you a few of these. To illustrate them, let’s make a Numpy array
Here, we’ll create a simple Numpy array using np.random.randint().
np.random.seed(72)
simple_array = np.random.randint(low = 0, high = 100, size=5)
simple_array is a Numpy array, and like all Numpy arrays, it has attributes.
You can access those attributes by using a dot after the name of the array,
ndim
simple_array.ndim
1
shape
The shape attribute tells us the number of elements along each dimension.
simple_array.shape
(5,)
What this is telling us is that simple_array has 5 elements along the first axis.
dimensional.)
size
The size attribute tells you the total number of elements in a Numpy array.
simple_array.size
dtype tells you the type of data stored in the Numpy array.
Let’s take a look. We can access the dtype attribute like this:
simple_array.dtype
Which produces the output:
dtype('int64')
Also remember: Numpy arrays contain data that are all of the same type.
For example, we can create a Numpy array with decimal values (i.e., floats):
array_float = np.array([1.99,2.99,3.99] )
array_float.dtype
dtype('float64')
When we construct the array with the above input values, you can see that
decimals).
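To see several attributes side by side, here is a short sketch that uses a small 2-d array. The arrays and variable names are illustrative and are not taken from the examples above. It also shows what happens if you mix integers and floats in the input list: Numpy stores everything as one type.

```python
import numpy as np

array_2d = np.array([[1, 2, 3], [4, 5, 6]])

print(array_2d.ndim)   # 2 (two dimensions)
print(array_2d.shape)  # (2, 3): 2 rows and 3 columns
print(array_2d.size)   # 6 elements in total

# Mixed input: the integers are stored as floats so every element has one type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)     # float64
```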
Now that I’ve explained attributes, let’s examine how to index Numpy arrays.
Indexing is very important for accessing and retrieving the elements of a Numpy
array.
an “index.”
Notice again that the index of the first value is 0.
We can use the index to retrieve specific values in the Numpy array. Let’s take a
np.random.seed(72)
simple_array = np.random.randint(low = 0, high = 100, size=5)
You can print out the array with the following code:
print(simple_array)
In this visual representation, you can see the values stored in the array: 88, 19,
46, 74, 94. But, I’ve also shown you the index values associated with each of
those elements.
The simplest form of indexing is retrieving a single value from the array.
To retrieve a single value from a particular location in the Numpy array, you need
Syntactically, you need to use bracket notation and provide the index inside of
the brackets.
Let me show you an example. Above, we created the Numpy array
simple_array.
To get the value at index 1 from simple_array, you can use the following syntax:
So the code simple_array[1] is basically saying, “give me the value that’s at
Numpy also supports negative index values. Using a negative index allows you
Here’s an example:
simple_array[-1]
We could also retrieve this value by using the index 4 (both will work). But
sometimes you won’t know exactly how long the array is. This is a convenient
I just showed you simple examples of array indexing, but array indexing can be
quite complex.
To do this, we still use bracket notation, but we can use a colon to specify a
simple_array[2:4]
This code is saying, “retrieve the values stored from index 2, up to but excluding
index 4.”
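Here is a compact sketch that pulls the 1-d indexing ideas together. To keep it independent of the random seed, the array is written out by hand with the same five values shown in this Crash Course.

```python
import numpy as np

# Written out by hand so the example does not depend on a random seed
simple_array = np.array([88, 19, 46, 74, 94])

print(simple_array[1])    # 19: the value at index 1
print(simple_array[-1])   # 94: the last value
print(simple_array[4])    # 94: the same value, by its positive index
print(simple_array[2:4])  # [46 74]: index 2 up to, but excluding, index 4
```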
Visually, we can represent this as follows:
Now that you’ve learned how to use indexes in 1-dimensional Numpy arrays,
Working with 2-d Numpy arrays is very similar to working with 1-d arrays. The
major difference (with regard to indexes) is that 2-d arrays have 2 indexes, a row
To retrieve a value from a 2-d array, you need to provide the specific row and
column indexes.
Here’s an example. We’ll create a 2-d Numpy array, and then we’ll retrieve a
value.
np.random.seed(72)
square_array = np.random.randint(low = 0
,high = 100
,size = 25).reshape([5,5])
square_array[2,1]
Here, we’re essentially retrieving the value at row index 2 and column index 1.
This is fairly straightforward. The major challenge is that you need to remember
that the row index is first and the column index is second.
Slicing 2-d Numpy arrays
Finally, let’s review how to retrieve slices from 2-d Numpy arrays. Slicing 2-d
arrays is very similar to slicing 1-d arrays. The major difference is that you need
to provide 2 ranges, one for the rows and one for the columns.
np.random.seed(72)
square_array = np.random.randint(low = 0
,high = 100
,size = 25).reshape([5,5])
square_array[1:3,1:4]
Let’s break this down.
Then, we took a slice of that array. The slice included the rows from index 1 up
to and excluding index 3, and the columns from index 1 up to and excluding
index 4.
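If the random values make it hard to follow which elements are selected, it may help to try the same indexing on an array with predictable contents. The sketch below is an illustration, not the original example: it fills a 5-by-5 array with the numbers 0 through 24 (the name grid is made up), so each value tells you where it sits.

```python
import numpy as np

# A 5x5 array holding 0..24, filled row by row
grid = np.arange(25).reshape([5, 5])

# Single value: row index 2, column index 1
print(grid[2, 1])      # 11

# Slice: rows 1 and 2, columns 1 through 3
print(grid[1:3, 1:4])
# [[ 6  7  8]
#  [11 12 13]]
```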
This might seem a little confusing if you’re a true beginner. In that case, I
recommend working with 1-d arrays first, until you get the hang of them. Then,
start working with relatively small 2-d Numpy arrays until you build your intuition
Numpy axes explained
It will explain what a Numpy axis is. It will also explain how axes work, and how
But before I get into a detailed explanation of Numpy axes, let me just start by
Numpy axes are one of the hardest things to understand in the Numpy system.
If you’re just getting started with Numpy, this is particularly true. Many beginners
Don’t worry, it’s not you. A lot of Python data science beginners struggle with
this.
Having said that, this chapter will explain all the essentials that you need to
know.
Let’s start with the basics. I’ll make Numpy axes easier to understand by
If you’re reading this book, chances are you’ve taken more than a couple of
math classes.
Think back to early math, when you were first learning about graphs.
You learned about Cartesian coordinates. Numpy axes are very similar to axes in
You probably remember this, but just so we’re clear, let’s take a look at a simple
A simple 2-dimensional Cartesian coordinate system has two axes, the x axis
directions).
So if we have a point at position (2, 3), we’re basically saying that it lies 2 units
If all of this is familiar to you, good. You’re halfway to understanding
Numpy axes.
Numpy axes are the directions along the rows and columns
In a 2-dimensional Numpy array, the axes are the directions along the rows and
columns.
Assuming that we’re talking about multi-dimensional arrays, axis 0 is the axis
Keep in mind that this really applies to 2-d arrays and multi-dimensional arrays.
1-dimensional arrays are a bit of a special case, and I’ll explain those later in the
tutorial.
When we’re talking about 2-d and multi-dimensional arrays, axis 1 is the axis
Once again, keep in mind that 1-d arrays work a little differently. Technically, 1-d
arrays don’t have an axis 1. I’ll explain more about this later in the tutorial.
It is probably obvious at this point, but I should point out that array axes in
This is just like index values for Python sequences. In Python sequences – like
lists and tuples – the values in the sequence have an index associated with
them.
So, let’s say that we have a Python list with a few capital letters:
alpha_list = ['A','B','C','D']
alpha_list.index('A')
Here, A is the first item in the list, but the index position is 0.
Essentially all Python sequences work like this. In any Python sequence – like a
Numbering of Numpy axes essentially works the same way. They are numbered
starting with 0. So the “first” axis is actually “axis 0.” The “second” axis is “axis
In the following section, I’m going to show you examples of how Numpy axes
are used in Numpy, but before I show you that, you need to remember that the
The details that I just explained, about axis numbers, and about which axis is
Having said that, before you move on to the examples, make sure you really
Now that we’ve explained how Numpy axes work in general, let’s look at some
These examples are important, because they will help develop your intuition
about how Numpy axes work when used with Numpy functions.
Before we start working with these examples, you’ll need to run a small bit of
code:
import numpy as np
This code will basically import the Numpy package into your environment so you
can work with it. Going forward, you’ll be able to reference the Numpy package
as np in our syntax.
Before I show you the following examples, I want to give you a piece of advice.
To understand how to use the axis parameter in the Numpy functions, it’s very
important to understand what the axis parameter actually controls for each
function.
function, the axis parameter behaves in a way that many people think is counter
intuitive.
I’ll explain exactly how it works in a minute, but I need to stress this point: pay
very careful attention to what the axis parameter actually controls for each
function.
Let’s take a look at how Numpy axes work inside of the Numpy sum function.
When trying to understand axes in Numpy sum, you need to know what the axis
Said differently, the axis parameter controls which axis will be collapsed.
Remember, functions like sum(), mean(), min(), median(), and other statistical
Imagine you have a set of 5 numbers. If you sum up those 5 numbers, the result will
Similarly, when you use np.sum() on a 2-d array with the axis parameter, it is
going to collapse your 2-d array down to a 1-d array. It will collapse the data and
When you use the Numpy sum function with the axis parameter, the axis that
Numpy sum with axis = 0
Here, we’re going to use the Numpy sum function with axis = 0.
And let’s quickly print it out, so you can see the contents.
print(np_array_2d)
[[0 1 2]
 [3 4 5]]
The array, np_array_2d, is a 2-dimensional array that contains the values from
0 to 5 in a 2-by-3 format.
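The code that creates np_array_2d is not reproduced here. One way to build an array with these contents, offered as an assumption rather than as the original code, is to combine np.arange() with reshape():

```python
import numpy as np

# Values 0 through 5, arranged as 2 rows and 3 columns
np_array_2d = np.arange(0, 6).reshape([2, 3])

print(np_array_2d)
# [[0 1 2]
#  [3 4 5]]
```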
Next, let’s use the Numpy sum function with axis = 0.
np.sum(np_array_2d, axis = 0)
array([3, 5, 7])
When we set axis = 0, the function actually sums down the columns. The
result is a new Numpy array that contains the sum of each column. Why?
As I mentioned earlier, the axis parameter indicates which axis gets collapsed.
So when we set axis = 0, we’re not summing across the rows. When we set
axis = 0, we’re aggregating the data such that we collapse the rows … we
collapse axis 0.
Now, let’s use the Numpy sum function on our array with axis = 1.
In this example, we’re going to reuse the array that we created earlier,
np_array_2d.
print(np_array_2d)
OUT:
[[0 1 2]
 [3 4 5]]
Next, we’re going to use the sum function, and we’ll set the axis parameter to
axis = 1.
np.sum(np_array_2d, axis = 1)
array([ 3, 12])
Let me explain.
Again, with the sum() function, the axis parameter sets the axis that gets
Recall from earlier in this tutorial that axis 1 refers to the horizontal direction
summation.
setting axis = 1, Numpy would sum down the columns, but that’s not how it
works.
The code has the effect of summing across the columns. It collapses axis 1.
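One way to convince yourself that the axis parameter names the axis that gets collapsed is to compare the shapes before and after the sum. The sketch below rebuilds a 2-by-3 array holding the values 0 to 5 (the construction with np.arange() is an assumption) and checks both cases.

```python
import numpy as np

np_array_2d = np.arange(0, 6).reshape([2, 3])
print(np_array_2d.shape)             # (2, 3): axis 0 has length 2, axis 1 has length 3

# axis = 0 collapses the rows, leaving one sum per column
sum_axis_0 = np.sum(np_array_2d, axis=0)
print(sum_axis_0, sum_axis_0.shape)  # [3 5 7] (3,)

# axis = 1 collapses the columns, leaving one sum per row
sum_axis_1 = np.sum(np_array_2d, axis=1)
print(sum_axis_1, sum_axis_1.shape)  # [ 3 12] (2,)
```

Notice that in each case, the length of the collapsed axis disappears from the shape.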
Here, we’re going to work with the axis parameter in the context of using the
When we use the axis parameter with the np.concatenate() function, the
axis parameter defines the axis along which we stack the arrays. If that doesn’t
make sense, then work through the examples. It will probably become more
clear once you run the code and see the output.
In both of the following examples, we’re going to work with two 2-dimensional
Numpy arrays:
np_array_1s = np.array([[1,1,1],[1,1,1]])
np_array_9s = np.array([[9,9,9],[9,9,9]])
array([[1, 1, 1],
       [1, 1, 1]])
And:
array([[9, 9, 9],
       [9, 9, 9]])
Numpy concatenate with axis = 0
array([[1, 1, 1],
       [1, 1, 1],
       [9, 9, 9],
       [9, 9, 9]])
Recall what I mentioned a few paragraphs ago. When we use the concatenate
function, the axis parameter defines the axis along which we stack the arrays.
So when we set axis = 0, we’re telling the concatenate function to stack the
two arrays along the rows. We’re specifying that we want to concatenate the
Numpy concatenate with axis = 1
1.
Here, we’re going to reuse the two 2-dimensional Numpy arrays that we just
We’re going to use the concatenate function to combine these arrays together
horizontally.
np.concatenate([np_array_1s, np_array_9s], axis = 1)
Which produces the following output:
array([[1, 1, 1, 9, 9, 9],
       [1, 1, 1, 9, 9, 9]])
If you’ve been reading carefully and you’ve understood the other examples in
These arrays are 2 dimensional, so they have two axes, axis 0 and axis 1. Axis 1
is the axis that runs horizontally across the columns of the Numpy arrays.
since axis 1 is the axis that runs horizontally across the columns.
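To compare the two concatenation cases side by side, here is a small sketch that runs both and prints the resulting shapes. The names stacked_rows and stacked_cols are illustrative.

```python
import numpy as np

np_array_1s = np.array([[1, 1, 1], [1, 1, 1]])
np_array_9s = np.array([[9, 9, 9], [9, 9, 9]])

# axis = 0: stack along the rows, so the result has more rows
stacked_rows = np.concatenate([np_array_1s, np_array_9s], axis=0)
print(stacked_rows.shape)  # (4, 3)

# axis = 1: stack along the columns, so the result has more columns
stacked_cols = np.concatenate([np_array_1s, np_array_9s], axis=1)
print(stacked_cols.shape)  # (2, 6)
```

The axis you pass in is the one that grows; the other axis keeps its original length.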
Warning: 1-dimensional arrays work differently
Hopefully this Numpy axis tutorial helped you understand how Numpy axes
work.
But before I end this section, I want to give you a warning: 1-dimensional arrays
work differently!
Everything that I’ve said in this post really applies to 2-dimensional arrays (and
The axes of 1-dimensional Numpy arrays work differently. For beginners, this is
Having said all of that, let me quickly explain how axes work in 1-dimensional
Numpy arrays.
The important thing to know is that 1-dimensional Numpy arrays only have one
axis.
If 1-d arrays only have one axis, can you guess the name of that axis?
So, in a 1-d Numpy array, the first and only axis is axis 0.
The fact that 1-d arrays have only one axis can cause some results that confuse
Numpy beginners.
Let me show you an example of some of these “confusing” results that can
np_array_1s_1dim = np.array([1,1,1])
np_array_9s_1dim = np.array([9,9,9])
print(np_array_1s_1dim)
print(np_array_9s_1dim)
Output:
[1 1 1]
[9 9 9]
0.
Output:
array([1, 1, 1, 9, 9, 9])
This output confuses many beginners. The arrays were concatenated together
horizontally.
This is different from how the function works on 2-dimensional arrays. If we use
concatenating these arrays along axis 0. The issue is that in 1-d arrays, axis 0
Moreover, you’ll also run into problems if you try to concatenate these arrays on
axis 1.
Try it:
This code causes an error:
on an axis that doesn’t exist in these arrays. Therefore, the code generates an
error.
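Here is a hedged sketch of both 1-d cases: concatenating along axis 0, which works, and along axis 1, which fails. The exact error message may differ between Numpy versions. Numpy's axis error is a kind of ValueError, so that is what the example catches.

```python
import numpy as np

np_array_1s_1dim = np.array([1, 1, 1])
np_array_9s_1dim = np.array([9, 9, 9])

# axis 0 is the only axis of a 1-d array, so the values are joined end to end
joined = np.concatenate([np_array_1s_1dim, np_array_9s_1dim], axis=0)
print(joined)  # [1 1 1 9 9 9]

# axis 1 does not exist for 1-d arrays, so this call raises an error
try:
    np.concatenate([np_array_1s_1dim, np_array_9s_1dim], axis=1)
except ValueError as err:
    axis_1_failed = True
    print("axis = 1 failed:", err)
else:
    axis_1_failed = False
```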
All of this is to say that you need to be careful when working with 1-dimensional
arrays. When you’re working with 1-d arrays, and you use some Numpy
functions with the axis parameter, the code can generate confusing results.
The results make a lot of sense if you really understand how Numpy axes work.
But if you don’t understand Numpy array axes, the results will probably be
confusing.
So make sure that before you start working with Numpy array axes that you
How to use the Numpy arange function
The Numpy arange function (sometimes called np.arange) is a tool for creating
If you’re learning data science in Python, the Numpy toolkit is important. The
Having said that, this tutorial will show you how to use the Numpy arange
function in Python.
It will explain how the syntax works. It will also show you some working
examples of the np.arange function, so you can play with it and see how it
operates.
The Numpy arange function returns evenly spaced numeric values within an
We can call the arange() function like this:
np.arange(5)
Having said that, what’s actually going on here is a little more complicated, so to
fully understand the np.arange function, we need to examine the syntax. Once
we look at the syntax, I’ll show you more complicated examples which will make
The syntax of numpy arange
Like essentially all of the Numpy functions, you call the function name and then
there are a set of parameters that enable you to specify the exact behavior of
the function.
Assuming that you’ve imported Numpy into your environment as np, you call the
Then inside of the arange() function, there are 4 parameters that you can
modify:
• start
• stop
• step
• dtype
Let’s take a look at each of those parameters, so you know what each one does.
start (optional)
stop (required)
The stop parameter indicates the end of the range. Keep in mind that, just like
the end of a Python slice, this value will not be included in the resulting range. (The
examples below will explain and clarify this point.) So essentially, the sequence
step (optional)
The step parameter specifies the spacing between values in the sequence.
This parameter is optional. If you don’t specify a step value, by default the step
value will be 1.
dtype (optional)
Python and Numpy have a variety of data types that can be used here.
Having said that, if you don’t specify a data type, it will be inferred based on the
Now, let’s work through some examples of how to use the Numpy arange
function.
Before you start working through these examples though, make sure that you’ve
# IMPORT NUMPY
import numpy as np
np.arange(stop = 5)
Notice a few things.
First, we didn’t specify a start value. Because of this, the sequence starts at “0.”
Second, when we used the code stop = 5, the “5” serves as the stop position.
This causes Numpy to create a sequence of values starting from 0 (the start
Next, notice the spacing between values. The values are increasing in “steps” of
1. This is because we did not specify a value for the step parameter. If we don’t
Finally, the data type is integer. We didn’t specify a data type with the dtype
parameter, so Numpy has inferred the data type from the other arguments to the
function.
One last note about this example. In this example, we’ve explicitly used stop =
parameter. It’s possible to remove the parameter itself, and just leave the
np.arange(5)
Here, the value “5” is treated as a positional argument to the stop parameter.
Python “knows” that the value “5” serves as the stop point of the range. Having
said that, I think it’s much clearer to explicitly use the parameter names. It’s
Now that you’ve seen a basic example, let’s look at something a little more
complicated.
The code creates an ndarray object like this:
Numpy array: 0, 2, 4, 6.
Let’s take a step back and analyze how this worked. The output range begins at
The output range then consists of values starting from 0 and incrementing in
steps of 2: 2, 4, 6.
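The code for this example is not reproduced here. Based on the values described in this example (a start of 0, a stop of 8, and a step of 2), a call along the following lines should produce that output. Treat it as a reconstruction, not necessarily the original code; the name evens is made up.

```python
import numpy as np

# Start at 0, stop before 8, move in steps of 2
evens = np.arange(start=0, stop=8, step=2)

print(evens)  # [0 2 4 6]
```

The stop value of 8 itself is excluded, which is why the last value is 6.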
The range stops at 6. Why? We set the stop parameter to 8. Remember though,
increment by the step value of 2, it will produce the value of 8, which should be
Specify the data type for np.arange
As noted above, you can also specify the data type of the output array by using
Here’s an example:
First of all, notice the decimal point at the end of each number. This essentially
How did we create this? This is very straightforward, if you’ve understood the
prior examples.
We’ve called the np.arange function starting from 1 and stopping at 5. And
we’ve set the datatype to float by using the syntax dtype = 'float'.
Keep in mind that we used floats here, but we could have used one of several
different data types. Python and Numpy have a couple dozen different data
types. These are all available when manipulating the dtype parameter.
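As a sketch of the float example described above (a start of 1, a stop of 5, and dtype = 'float'), the call might look like this. The exact original code is not shown here, so take this as a reconstruction, and the name float_range as illustrative.

```python
import numpy as np

# Same kind of range as before, but stored as floating point numbers
float_range = np.arange(start=1, stop=5, dtype='float')

print(float_range)        # [1. 2. 3. 4.]
print(float_range.dtype)  # float64
```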
reshape method.
understand.
This code creates a Numpy array like this:
Notice how the code worked. The sequence of values starts at 1, which is the
start value. The final value is 9. That’s because we set the stop parameter equal
to 10; the range of values will be up to but excluding the stop value. Because
Ok. Let’s take this a step further to turn this into a 3 by 3 array.
Here, we’re going to use the reshape method to re-shape this 1-dimensional
Notice that this is just a modification of the code we used a moment ago. We’re
dimensional array with 3 values along the rows and 3 values along the columns.
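Putting the two steps together, a sketch of the arange-then-reshape approach might look like the following. The start and stop values follow the description above (values from 1 up to, but excluding, 10); the variable names are illustrative.

```python
import numpy as np

# Step 1: a 1-dimensional sequence of the values 1 through 9
flat = np.arange(start=1, stop=10, step=1)
print(flat)  # [1 2 3 4 5 6 7 8 9]

# Step 2: re-shape the 9 values into 3 rows and 3 columns
grid = flat.reshape((3, 3))
print(grid)
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]
```

Keep in mind that reshape only works when the new shape holds exactly as many values as the original array (here, 3 times 3 equals 9).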
In the last few examples, I’ve given you an overview of how the Numpy arange
function works.
It’s pretty straightforward once you understand the syntax, and it’s not that hard
to learn.
Having said that, make sure that you study and practice this syntax. Like all
simple examples that I’ve shown above and try to recall the code. Practice and
review this code until you know how the syntax works from memory.
How to use the Numpy linspace function
It’s somewhat similar to the Numpy arange function, in that it creates sequences
There are some differences though. Moreover, some people find the linspace
It’s not that hard to understand, but you really need to learn how it works.
That being said, this tutorial will explain how the Numpy linspace function
works. It will explain the syntax, and it will also show you concrete examples of
Near the end of the chapter, this will also explain a little more about how
Ok, first things first. Let’s look a little more closely at what the np.linspace
Numpy linspace creates sequences of evenly spaced
values within an interval
The Numpy linspace function creates sequences of evenly spaced values within
a defined interval.
Essentially, you specify a starting point and an ending point of an interval, and
then specify the total number of breakpoints you want within that interval
(including the start and end points). The np.linspace function will return a
To illustrate this, here’s a quick example. (We’ll look at more examples later, but
This code produces a Numpy array (an ndarray object) that looks like the
following:
That’s the ndarray that the code produces, but we can also visualize the
We specified that interval with the start and stop parameters. In particular, this
We also specified that we wanted 5 observations within that range. So, the
linspace function returned an ndarray with 5 evenly spaced elements. The first
element is 0. The last element is 100. The remaining 3 elements are evenly
As should be expected, the output array is consistent with the arguments we’ve
Having said that, let’s look a little more closely at the syntax of the
np.linspace function so you can understand how it works a little more clearly.
Obviously, when using the function, the first thing you need to do is call the
To do this, you use the code np.linspace (assuming that you’ve imported
Numpy as np).
stop, and num. These are 3 parameters that you’ll use most frequently with the
linspace function. There are also a few other optional parameters that you can
use.
The parameters of Numpy linspace
There are several parameters that help you control the linspace function: start,
To understand these parameters, let’s take a look again at the following visual:
start
So if you set start = 0, the first number in the new ndarray will be 0.
stop
In most cases, this will be the last value in the range of numbers. Having said
that, if you modify the parameter and set endpoint = False, this value will
not be included in the output array. (See the examples below to understand how
this works.)
num (optional)
The num parameter controls how many total items will appear in the output
array.
For example, if num = 5, then there will be 5 total items in the output array. If,
num = 10, then there will be 10 total items in the output array, and so on.
This parameter is optional. If you don’t provide a value for num, then
endpoint (optional)
The endpoint parameter controls whether or not the stop value is included in
If endpoint = True, then the value of the stop parameter will be included as
If endpoint = False, then the value of the stop parameter will not be
included.
dtype (optional)
Just like in many other Numpy functions, with np.linspace, the dtype
parameter controls the data type of the items in the output array.
If you don’t specify a data type, Numpy will infer the data type based on the
If you do explicitly use this parameter, however, you can use any of the available
Keep in mind that you won’t use all of these parameters every time that you use
Moreover, start, stop, and num are much more commonly used than endpoint
and dtype.
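Here is a minimal sketch of the three most common parameters working together, using the 0-to-100 interval with 5 values described earlier in this chapter. It also checks how many values you get when num is left out; the variable names are illustrative.

```python
import numpy as np

# 5 evenly spaced values from 0 to 100, including both endpoints
breakpoints = np.linspace(start=0, stop=100, num=5)
print(breakpoints)         # 0, 25, 50, 75 and 100, stored as floats

# Without num, linspace falls back to its default number of values
default_count = np.linspace(start=0, stop=100)
print(default_count.size)  # 50
```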
Also keep in mind that you don’t need to explicitly use the parameter names.
You can write code without the parameter names themselves; you can add the
Here’s an example:
np.linspace(0, 100, 5)
This code is functionally identical to the code we used in our previous examples:
The main difference is that we did not explicitly use the start, stop, and num
When you don’t use the parameter names explicitly, Python knows that the first
number (0) is supposed to be the start of the interval. It knows that 100 is
supposed to be the stop. And it knows that the third number (5) corresponds to
the num parameter. Again, when you don’t explicitly use the parameter names,
You’ll see people do this frequently in their code. People will commonly exclude
the parameter names in their code and use positional arguments instead.
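If you’d like to confirm that the two styles are equivalent, a quick check like this should work. The variable names are made up for the illustration.

```python
import numpy as np

# Explicit parameter names
with_names = np.linspace(start=0, stop=100, num=5)

# Positional arguments, in the order start, stop, num
positional = np.linspace(0, 100, 5)

print(np.array_equal(with_names, positional))  # True
```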
Although I realize that it’s a little faster to write code with positional arguments, I
think that it’s clearer to actually use the parameter names. As a best practice,
Examples: how to use numpy linspace
Now that you’ve learned how the syntax works, and you’ve learned about each
A quick example
An example like this would be useful if you’re working with percents in some
metrics for a machine learning classifier, you might use this code to construct
part of your plot. Explaining how to do that is beyond the scope of this post, so
Create interval between 0 and 100, in breaks of 10
10.
The code for this is almost identical to the prior example, except we’re creating
Since it’s somewhat common to work with data with a range from 0 to 100, a
By default (if you don’t set any value for endpoint), this parameter will have the
default value of True. That means that the value of the stop parameter will be
However, if you set endpoint = False, then the value of the stop parameter
Here’s an example.
But because we’re also setting endpoint = False, 5 will not be included as
On the contrary, the output ndarray contains 4 evenly spaced values (i.e.,
Personally, I find that it’s a little un-intuitive to use endpoint = False, so I
don’t use it often. But if you have a reason to use it, this is how to do it.
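To make the endpoint behavior concrete, here is a minimal sketch. The interval from 1 to 5 and num = 4 are assumptions that I picked to match the description above (5 is excluded and 4 values come back).

```python
import numpy as np

# Default behavior (endpoint=True): the stop value 5 is the last element.
with_stop = np.linspace(start=1, stop=5, num=4)

# endpoint=False: the stop value 5 is left out of the output.
without_stop = np.linspace(start=1, stop=5, num=4, endpoint=False)

print(with_stop[-1])   # 5.0
print(without_stop)    # the values 1, 2, 3, 4 (as floats)
```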
data type from the other input arguments. You’ll notice that in many cases, the
If you want to manually specify the data type, you can use the dtype parameter.
identical to how you specify the data type with np.array, specify the data type
Essentially, you use the dtype parameter and indicate the exact Python or
Numpy data type that you want for the output array:
In this case, when we set dtype = int, the linspace function produces an
Again, Python and Numpy have a variety of available data types, and you can
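As a small sketch of the dtype parameter, the following compares the default output type with dtype = int. The input values are assumptions chosen so that every element is a whole number.

```python
import numpy as np

# Without dtype, linspace returns floating point values.
as_float = np.linspace(start=0, stop=100, num=5)

# With dtype=int, the same values come back as integers.
as_int = np.linspace(start=0, stop=100, num=5, dtype=int)

print(as_float.dtype)  # float64
print(as_int)          # integers 0, 25, 50, 75, 100
```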
If you’re familiar with Numpy, you might have noticed that np.linspace is
The essential difference between Numpy linspace and Numpy arange is that
linspace enables you to control the precise end value, whereas arange gives
you more direct control over the increments between values in the sequence.
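Here is a minimal side-by-side sketch of that difference. The specific numbers are assumptions for illustration: linspace is told how many values to create, while arange is told the step size (and its stop value is excluded, so I use 101 to reach 100).

```python
import numpy as np

# linspace: specify the end value and the number of points.
by_count = np.linspace(0, 100, 5)

# arange: specify the step between values; stop (101) is not included.
by_step = np.arange(0, 101, 25)

print(by_count)  # floats: 0, 25, 50, 75, 100
print(by_step)   # integers: 0, 25, 50, 75, 100
```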
To be clear, if you use them carefully, both linspace and arange can be used
to create evenly spaced sequences. To a large extent, these are two similar but
different tools for creating sequences, and which you use will be a matter of
How to Use the Numpy Sum Function
This section will show you how to use the Numpy sum function (sometimes
called np.sum).
In this section, I’ll explain what the function does. I’ll also explain the syntax of
the function step by step. Finally, I’ll show you some concrete examples so you
Let’s very quickly talk about what the Numpy sum function does.
Essentially, the Numpy sum function sums up the elements of an array. It just
takes the elements within a Numpy array (an ndarray object) and adds them
together.
Having said that, it can get a little more complicated. It’s possible to also add up
the rows or add up the columns of an array. This will produce a new array object
Further down in this tutorial, I’ll show you examples of all of these cases, but
first, let’s take a look at the syntax of the np.sum function. You need to
straightforward syntactically.
We typically call the function using the syntax np.sum(). Note that this
assumes that you’ve imported numpy using the code import numpy as np.
Then inside of the np.sum() function there are a set of parameters that enable
The parameters of Numpy sum
The Numpy sum function has several parameters that enable you to control the
Although technically there are 6 parameters, the ones that you’ll use most often
are a, axis, and dtype. I’ve shown those in the image above.
a (required)
The a = parameter specifies the input array that the sum() function will operate
on. It is essentially the array of elements that you want to sum up.
ndarray object).
Having said that, technically the np.sum function will operate on any array-like
np.sum will also operate on Python tuples, Python lists, and other structures
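A quick sketch of np.sum working on plain Python structures. The inputs are arbitrary values that I picked for illustration.

```python
import numpy as np

total_from_list = np.sum([1, 2, 3])           # Python list
total_from_tuple = np.sum((1, 2, 3))          # Python tuple
total_from_nested = np.sum([[1, 2], [3, 4]])  # nested list, treated as 2-d

print(total_from_list, total_from_tuple, total_from_nested)  # 6 6 10
```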
axis (optional)
The axis parameter specifies the axis or axes upon which the sum will be
performed.
Does that sound a little confusing? Don’t feel bad. Many people think that array
I’ll show you some concrete examples below. You can learn about Numpy axes
elsewhere in this book, and the examples will clarify what an axis is, but let me
axis. This is sort of like the Cartesian coordinate system, which has an x-axis
and a y-axis. The different “directions” – the dimensions – can be called axes.
the dimensions are the rows and columns. Again, we can call these dimensions,
Every axis in a numpy array has a number, starting with 0. In this way, they are
So the first axis is axis 0. The second axis (in a 2-d array) is axis 1. For multi-
Critically, you need to remember that axis 0 refers to the rows. Axis 1 refers
to the columns.
Why is this relevant to the Numpy sum function? It matters because when we
use the axis parameter, we are specifying an axis along which to sum up the
values.
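One way to see what the axis argument does is to look at shapes. This is a minimal sketch using a small 2 by 3 array (the name arr is a placeholder): summing along an axis removes that axis from the shape of the result.

```python
import numpy as np

arr = np.array([[0, 2, 4],
                [1, 3, 5]])

print(arr.shape)                  # (2, 3)
print(np.sum(arr, axis=0).shape)  # (3,)  axis 0 (length 2) is summed away
print(np.sum(arr, axis=1).shape)  # (2,)  axis 1 (length 3) is summed away
```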
If you’re still confused about this, don’t worry. There is an example further down
dtype (optional)
The dtype parameter enables you to specify the data type of the output of
np.sum.
So for example, if you set dtype = 'int', the np.sum function will produce a
Numpy array of integers. If you set dtype = 'float', the function will
Python and Numpy have a variety of data types available, so review the
documentation to see what the possible arguments are for the dtype
parameter.
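Here is a small sketch of the dtype parameter in np.sum. The input array is an arbitrary example; the point is only that the same total comes back with a different type.

```python
import numpy as np

values = np.array([1, 2, 3])

total_default = np.sum(values)             # integer input gives an integer total
total_float = np.sum(values, dtype=float)  # same total, returned as a float

print(total_default)  # 6
print(total_float)    # 6.0
```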
out (optional)
The out parameter enables you to specify an alternative array in which to put
Note that the out parameter is optional.
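A minimal sketch of how out can be used, assuming a pre-allocated array whose shape matches the result. The names arr and result are placeholders, and I use floats throughout so the types match.

```python
import numpy as np

arr = np.array([[0.0, 2.0, 4.0],
                [1.0, 3.0, 5.0]])

# Pre-allocate an array with the shape of the result (one value per column).
result = np.empty(3)

# np.sum writes the sums into result and also returns that same array.
returned = np.sum(arr, axis=0, out=result)

print(result)              # the values 1, 5, 9 (as floats)
print(returned is result)  # True
```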
keepdims (optional)
The keepdims parameter enables you to keep the number of dimensions of the
This might sound a little confusing, so think about what np.sum is doing. When
summarizing the values. It either sums up all of the values, in which case it
collapses down an array into a single scalar value. Or (if we use the axis
we use the Numpy sum function, the output should have a reduced number of
dimensions.
But, it’s possible to change that behavior. If we set keepdims = True, the
axes that are reduced will be kept in the output. So if you use np.sum on a 2-
dimensional array and set keepdims = True, the output will be in the form of
a 2-d array.
Still confused by this? Don’t worry. I’ll show you an example of how keepdims
works below.
initial (optional)
The initial parameter enables you to set an initial value for the sum.
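As a quick sketch, the initial value is simply added to the sum of the elements. The numbers here are arbitrary.

```python
import numpy as np

# 10 (the initial value) + 1 + 2 + 3
total = np.sum([1, 2, 3], initial=10)

print(total)  # 16
```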
Ok, now that we’ve examined the syntax, let’s look at some concrete examples. I
think that the best way to learn how a function works is to look at and play with
import numpy as np
Sum the elements of a 1-d array with np.sum
We’re going to create a simple 1-dimensional Numpy array using the np.array
function.
np_array_1d = np.array([0,2,4,6,8,10])
If we print this out with print(np_array_1d), you can see the contents of
this ndarray:
[ 0  2  4  6  8 10]
Now that we have our 1-dimensional array, let’s sum up the values.
Doing this is very simple. We’re going to call the Numpy sum function with the
code np.sum(). Inside of the function, we’ll specify that we want it to operate
np.sum(np_array_1d)
30
Notice that we’re not using any of the function parameters here. This is as
simple as it gets.
When operating on a 1-d array, np.sum will basically sum up all of the values
and produce a single scalar quantity … the sum of the values in the input array.
Sum the elements of a 2-d array with np.sum
Syntactically, this is almost exactly the same as summing the elements of a 1-d
array.
Basically, we’re going to create a 2-dimensional array, and then use the Numpy
Let’s first create the 2-d array using the np.array function:
np_array_2x3 = np.array([[0,2,4],[1,3,5]])
columns.
If we print this out using print(np_array_2x3), you can see the contents:
[[0 2 4]
[1 3 5]]
Next, we’re going to use the np.sum function to add up all of the elements of
This is very straightforward. We’re just going to call np.sum, and the only
argument will be the name of the array that we’re going to operate on,
np_array_2x3:
np.sum(np_array_2x3)
15
Essentially, the Numpy sum function is adding up all of the values contained
within np_array_2x3. When you add up all of the values (0, 2, 4, 1, 3, 5), the
This is very straightforward. When you use the Numpy sum function without
specifying an axis, it will simply add together all of the values and produce a
Having said that, it’s possible to also use the np.sum function to add up the
First, let’s just create the array:
np_array_2x3 = np.array([[0,2,4],[1,3,5]])
following output:
[[0 2 4]
[1 3 5]]
np.sum(np_array_2x3, axis = 0)
array([1, 5, 9])
When we use np.sum with the axis parameter, the function will sum the values
In particular, when we use np.sum with axis = 0, the function will sum over
the 0th axis (the rows). It’s basically summing up the values row-wise, and
To understand this, refer back to the explanation of axes earlier in this book.
Remember: axes are like directions along a Numpy array. They are the
Specifically, axis 0 refers to the rows and axis 1 refers to the columns.
So when we use np.sum and set axis = 0, we’re basically saying, “sum the
Also note that by default, if we use np.sum like this on an n-dimensional Numpy
array, the output will have the dimensions n – 1. So in this example, we used
np.sum on a 2-d array, and the output is a 1-d array. (For more control over the
dimensions of the output array, see the example that explains the keepdims
parameter.)
Similar to adding the rows, we can also use np.sum to sum across the columns.
It works in a very similar way to our prior example, but here we will modify the
First, let’s create the array (this is the same array from the prior example, so if
you’ve already run that code, you don’t need to run this again):
np_array_2x3 = np.array([[0,2,4],[1,3,5]])
This code produces a simple 2-d array with 2 rows and 3 columns.
following output:
[[0 2 4]
[1 3 5]]
Next, we’re going to use the np.sum function to sum the columns.
np.sum(np_array_2x3, axis = 1)
array([6, 9])
Essentially, the np.sum function has summed across the columns of the input
array.
Once again, remember: the “axes” refer to the different dimensions of a Numpy
array. Axis 0 is the rows and axis 1 is the columns. So when we set the
columns only. Specifically, we’re telling the function to sum up the values across
the columns.
In the last two examples, we used the axis parameter to indicate that we want
Notice that when you do this it actually reduces the number of dimensions.
You can see that by checking the dimensions of the initial array, and the
np_array_2x3.ndim
Now, let’s use the np.sum function to sum across the columns:
How many dimensions does the output have? Let’s check the ndim attribute:
np_array_colsum.ndim
This produces the following output:
What that means is that the output array, np_array_colsum, has only 1
dimension. But the original array that we operated on, np_array_2x3, has 2
dimensions.
Why?
When we used np.sum with axis = 1, the function summed across the
This is an important point. By default, when we use the axis parameter, the
np.sum function collapses the input from n dimensions and produces an output
of lower dimensions.
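Here is a compact sketch of that dimension check, using the same array and the same variable names that appear in this example (np_array_2x3 and np_array_colsum).

```python
import numpy as np

np_array_2x3 = np.array([[0, 2, 4], [1, 3, 5]])

# Sum with axis = 1, as in the example above.
np_array_colsum = np.sum(np_array_2x3, axis=1)

print(np_array_2x3.ndim)     # 2
print(np_array_colsum.ndim)  # 1
```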
The problem is, there may be situations where you want to keep the number of
dimensions the same. If your input is n dimensions, you may want the output to
also be n dimensions.
setting axis = 1. But we’re also going to use the keepdims parameter to
keep the dimensions of the output the same as the dimensions of the input:
If you take a look at the ndim attribute of the output array you can see that it has
2 dimensions:
np_array_colsum_keepdim.ndim
This will produce the following:
To understand this better, you can also print the output array with the code
[[6]
[9]]
single column.
This is a little subtle if you’re not well versed in array shapes, so to develop your
print(np_array_colsum)
[6 9]
at least one of the axes. But when we set keepdims = True, this will cause
np.sum to produce a result with the same dimensions as the original input array.
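To compare the two results side by side, here is a minimal sketch that reuses the array and variable names from this example and prints the shape of each output.

```python
import numpy as np

np_array_2x3 = np.array([[0, 2, 4], [1, 3, 5]])

# Default: the summed axis is dropped from the output.
np_array_colsum = np.sum(np_array_2x3, axis=1)

# keepdims = True: the summed axis is kept, with length 1.
np_array_colsum_keepdim = np.sum(np_array_2x3, axis=1, keepdims=True)

print(np_array_colsum.shape)          # (2,)
print(np_array_colsum_keepdim.shape)  # (2, 1)
```

The values are the same in both cases (6 and 9). Only the shape of the container differs.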
Again, this is a little subtle. To understand it, you really need to understand the
basics of Numpy arrays, Numpy shapes, and Numpy axes. So if you’re a little
confused, make sure that you study the basics of Numpy arrays … it will make it
Numpy random seed explained
In this section, I’ll explain how to use the Numpy random seed function, which is
why we need to use Numpy random seed, you actually need to know a little bit
Numpy random seed is simply a function that sets the random seed of the
Unless you have a background in computing and probability, what I just wrote is
That being the case, let me give you a quick introduction to them …
Here, I want to give you a very quick overview of pseudo-random numbers and
more sense.
number” is fairly self explanatory, and it gives us some insight into what pseudo-
Let’s just break down the name a little.
Pseudo-random.
It might sound like I’m being a bit sarcastic here, but that’s essentially what they
are. Pseudo-random numbers are numbers that appear to be random, but are
In the interest of clarity though, let’s see if we can get a definition that’s a little
more precise.
number is:
(Source: https://mathworld.wolfram.com/PseudorandomNumber.html)
as radioactive decay.
randomness/)
I think that these definitions help quite a bit, and they are a great starting point
Really. Just bear with me. This will make sense soon.
random processes.
Setting aside some rare exceptions, computers are deterministic by their very
design. To quote an article at MIT’s School of Engineering “if you ask the same
(Source: https://engineering.mit.edu/engage/ask-an-engineer/can-a-computer-
generate-a-truly-random-number/)
Another way of saying this is that if you give a computer a certain input, it will
… And if you later give a computer the same input, it will produce the same
output.
If the input is the same, then the output will be the same.
The behavior of computers is deterministic …
This introduces a problem: how can you use a non-random machine to produce
random numbers?
Computers solve the problem of generating “random” numbers the same way
As such, they are completely deterministic. However, the numbers that they
Pseudo-random numbers appear to be random
appear to be random.
Even though the numbers are completely determined by the algorithm,
For example, here we’ll create some pseudo-random numbers with the Numpy
randint function:
np.random.seed(1)
np.random.randint(low = 1, high = 10, size = 50)
OUT:
[6, 9, 6, 1, 1, 2, 8, 7, 3, 5, 6, 3, 5, 3, 5, 8, 8,
2, 8, 1, 7, 8, 7, 2, 1, 2, 9, 9, 4, 9, 8, 4, 7, 6,
2, 4, 5, 9, 2, 5, 1, 4, 3, 1, 5, 3, 8, 8, 9, 7]
See any pattern here?
Me neither.
I can assure you though, that these numbers are not random, and are in fact
completely determined by the algorithm. If you run the same code again, you’ll
What I mean is that if you run the algorithm with the same input, it will produce
So you can use pseudo-random number generators to create and then re-create
Let me show you.
numpy.random.randint.
np.random.seed(0)
np.random.randint(10, size = 5)
array([5, 0, 3, 3, 7])
Simple. The algorithm produced an array with the values [5, 0, 3, 3, 7].
Generate pseudo-random integers again
Ok.
… and notice that we’re using np.random.seed in exactly the same way …
np.random.seed(0)
np.random.randint(10, size = 5)
OUTPUT:
array([5, 0, 3, 3, 7])
We ran the exact same code, and it produced the exact same output.
I will repeat what I said earlier: pseudo-random number generators produce
Remember what I wrote earlier: computers and algorithms process inputs into
The numpy.random.seed function provides the input (i.e., the seed) to the
Now you can learn about Numpy random seed.
The “random” numbers generated by Numpy are not exactly random. They are
We use numpy.random.seed in conjunction with other numpy
functions
from Numpy.
numpy.random namespace.
Numpy.
from an input.
In fact, there are several dozen Numpy random functions that enable you to
probability distributions.
I’ll show you a few examples of some of these functions in the examples section
of this tutorial.
What this means is that if you provide the same seed, you will get the same
output.
And if you change the seed, you will get a different output.
The output that you get depends on the input that you give it.
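Here is a minimal sketch of that idea. The helper function draw is hypothetical (it is not part of Numpy); it just sets the seed and then draws five integers so the two cases are easy to compare.

```python
import numpy as np

def draw(seed):
    """Hypothetical helper: seed the generator, then draw 5 integers."""
    np.random.seed(seed)
    return np.random.randint(low=0, high=100, size=5)

same_seed_matches = np.array_equal(draw(0), draw(0))
different_seed_matches = np.array_equal(draw(0), draw(1))

print(same_seed_matches)       # True: same seed, same output
print(different_seed_matches)  # False: different seed, different output
```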
Numpy random seed makes your code repeatable
The important thing about using a seed for a pseudo-random number generator
If you give a pseudo-random number generator the same input, you’ll get the
same output.
There are times when you really want your “random” processes to be
repeatable.
Code that has well defined, repeatable outputs is good for testing.
Numpy random seed makes your code easier to share
The fact that np.random.seed makes your code repeatable also makes it
easier to share.
I post detailed tutorials about how to perform various data science tasks, and I
When I do this, it’s important that people who read the tutorials and run the code
get the same result. If a student reads the tutorial, and copy-and-pastes the
code exactly, I want them to get the exact same result. This just helps them
check their work! If they type in the code exactly as I show it in a tutorial, getting
the exact same result gives them confidence that they ran the code properly.
Again, in order to get repeatable results when we are using “random” functions
Ok … now that you understand what Numpy random seed is (and why we use
The syntax of Numpy random seed
There’s essentially only one parameter, and that is the seed value.
So essentially, to use the function, you just call the function by name and then
Note that in this syntax explanation, I’m using the abbreviation “np” to refer to
Numpy. This is a common convention, but it requires you to import Numpy with
the code “import numpy as np.” I’ll explain more about this soon in the
examples section.
Examples of numpy.random.seed
numpy.random.seed.
Before we look at the examples though, you’ll have to run some code.
To get the following examples to run properly, you’ll need to import Numpy with
import numpy as np
Running this code will enable us to use the alias np in our syntax to refer to
numpy.
This is a common convention in Numpy. When you read Numpy code, it is
might not realize that you need to import Numpy with the code import numpy as
Now that we’ve imported Numpy properly, let’s start with a simple example.
We’ll generate a single random number between 0 and 1 using Numpy random
random.
Here, we’re going to use Numpy to generate a random number between zero
and one. To do this, we’re going to use the Numpy random random function
(AKA, np.random.random).
np.random.seed(0)
np.random.random()
OUTPUT:
0.5488135039273248
Note that the output is a float. It’s a decimal number between 0 and 1.
For the record, we can essentially treat this number as a probability. We can
Now that I’ve shown you how to use np.random.random, let’s just run it again
Here, I just want to show you what happens when you use np.random.seed
np.random.seed(0)
np.random.random()
OUTPUT:
0.5488135039273248
Notice that the number is exactly the same as the first time we ran the code.
Essentially, if you execute a Numpy function with the same seed, you’ll get the
same result.
Next, we’re going to use np.random.seed to set the number generator before
and 99.
np.random.seed(74)
np.random.randint(low = 0, high = 100, size = 5)
OUTPUT:
Numpy random seed sets the seed for the pseudo-random number generator,
and then Numpy random randint selects 5 numbers between 0 and 99.
Let’s just run the code so you can see that it reproduces the same output if you
np.random.seed(74)
np.random.randint(low = 0, high = 100, size = 5)
OUTPUT:
Once again, as you can see, the code produces the same integers if we use the
same seed. As noted previously in the tutorial, Numpy random randint doesn’t
It’s also common to use the Numpy random seed function when you’re doing
random sampling.
Let’s take a look.
np.random.seed(0)
np.random.choice(a = [1,2,3,4,5,6], size = 5)
OUTPUT:
array([5, 6, 1, 4, 4])
As you can see, we’ve basically generated a random sample from the list of
In the output, you can see that some of the numbers are repeated. This is
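By default, np.random.choice samples with replacement (replace = True), which is why a value can appear more than once. As a minimal sketch of the alternative, setting replace = False draws each value at most once. The variable names are placeholders.

```python
import numpy as np

np.random.seed(0)
with_replacement = np.random.choice(a=[1, 2, 3, 4, 5, 6], size=5)

np.random.seed(0)
without_replacement = np.random.choice(a=[1, 2, 3, 4, 5, 6], size=5, replace=False)

print(with_replacement)                  # values may repeat
print(len(set(without_replacement)))     # 5: every value is distinct
```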
Rerun the code
I want to re-run the code just so you can see, once again, that the primary
reason we use Numpy random seed is to create results that are completely
repeatable.
Ok, here is the exact same code that we just ran (with the same seed).
np.random.seed(0)
np.random.choice(a = [1,2,3,4,5,6], size = 5)
OUTPUT:
array([5, 6, 1, 4, 4])
Once again, we used the same seed, and this produced the same output.
Frequently asked questions about np.random.seed
Now that we’ve taken a look at some examples of using Numpy random seed to
questions.
Dude. I just wrote 2000 words explaining what the np.random.seed function
I’ll summarize.
These pseudo-random number generators are algorithms that produce numbers
The code np.random.seed(0) enables you to provide a seed (i.e., the starting
Numpy then uses the seed and the pseudo-random number generator in
… so if what I just wrote doesn’t make sense, please return to the top of the
What number should I use in numpy random seed?
other number.
For the most part, the number that you use inside of the function doesn’t really
make a difference.
You just need to understand that using different seeds will cause Numpy to
Here’s a quick example. We’re going to use Numpy random seed in conjunction
with Numpy random randint to create a set of integers between 0 and 98 (the upper bound that we pass to randint, 99, is not included).
In the first example, we’ll set the seed value to 0.
np.random.seed(0)
np.random.randint(99, size = 5)
and 98. Note that if you run this code again with the exact same seed (i.e. 0),
np.random.seed(1)
np.random.randint(99, size = 5)
OUTPUT:
With a different seed, Numpy random randint created a different set of integers.
Everything else is the same. The code for np.random.randint is the same.
If you use a function from the numpy.random namespace (like
random seed first, Python will actually still use numpy.random.seed in the
background. Numpy will generate a seed value from a part of your computer
If you don’t explicitly set a seed, your code will not have repeatable outputs.
Numpy will generate a seed on its own, but that seed might change moment to
moment. This will make your outputs different every time you run it.
you should use it if you want your code to have repeatable outputs.
What’s the difference between np.random.seed and
np.random.RandomState?
Ok.
Confused?
That’s okay … this answer is a little technical and it requires you to know a little
about how Numpy is structured on the back end. It also requires you to know a
little bit about programming concepts like “global variables.” If you’re a relative
data science beginner, the details that you need to know might be over your
head.
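If you do want a concrete picture, here is a minimal sketch. np.random.seed sets the seed of Numpy's global generator, while np.random.RandomState creates a separate generator object with its own seed. The variable names are placeholders that I chose for illustration.

```python
import numpy as np

# Global approach: seed the shared generator, then call a module-level function.
np.random.seed(0)
from_global = np.random.randint(10, size=5)

# Object approach: a separate generator that carries its own seed.
rng = np.random.RandomState(0)
from_local = rng.randint(10, size=5)

print(from_global)                              # [5 0 3 3 7]
print(np.array_equal(from_global, from_local))  # True
```

With the same seed, both approaches produce the same numbers. The difference is that the RandomState object does not touch the global state that other code may also be using.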
The important thing is that Numpy random seed is probably sufficient if you’re
analytics, data science, and scientific computing, but you need to learn more
How to use numpy random normal in
Python
This tutorial will cover the Numpy random normal function (AKA,
np.random.normal).
If you’re doing any sort of statistics or data science in Python, you’ll often need
to work with random numbers. And in particular, you’ll often need to work with
The Numpy random normal function generates a sample of numbers drawn from
This tutorial will show you how the function works, and will show you how to use
the function.
As you've learned in this book, Numpy is a package for working with numerical
As I mentioned previously, Numpy has a variety of tools for working with
numerical data. In most cases, Numpy’s tools enable you to do one of two
The Numpy random normal function enables you to create a Numpy array that
Hopefully you’re familiar with normally distributed data, but just as a refresher,
Normally distributed data is shaped sort of like a bell, so it’s often called the
“bell curve.”
Now that I’ve explained what the np.random.normal function does at a high
Note that in the following illustration and throughout this blog post, we will
assume that you’ve imported Numpy with the following code: import numpy
Let me explain this. Typically, we will call the function with the name
Inside of the function, you’ll notice 3 parameters: loc, scale, and size.
Let’s talk about each of those parameters.
The np.random.normal function has three primary parameters that control the
loc
This parameter defaults to 0, so if you don’t use this parameter to specify the
scale
The scale parameter controls the standard deviation of the normal distribution.
size
The size parameter controls the size and shape of the output.
Remember that the output will be a Numpy array. Numpy arrays can be 1-
The argument that you provide to the size parameter will dictate the size and
For example, if you specify size = (2, 3), np.random.normal will produce
a numpy array with 2 rows and 3 columns. It will be filled with numbers drawn
Keep in mind that you can create output arrays with more than 2 dimensions, but
np.random.randn.
Just like np.random.normal, the np.random.randn function produces
np.random.normal.
scale = 1.
So this code:
np.random.seed(1)
np.random.normal(loc = 0, scale = 1, size = (3,3))
np.random.seed(1)
np.random.randn(3, 3)
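As a quick check, the two snippets can be run back to back with the same seed and compared. This is a minimal sketch; the variable names are placeholders, and I use np.allclose for the comparison to allow for floating point rounding.

```python
import numpy as np

np.random.seed(1)
from_normal = np.random.normal(loc=0, scale=1, size=(3, 3))

np.random.seed(1)
from_randn = np.random.randn(3, 3)

print(from_normal.shape)                     # (3, 3)
print(np.allclose(from_normal, from_randn))  # True
```

Notice one syntactic difference: np.random.normal takes the shape as a tuple through size, while np.random.randn takes the dimensions as separate arguments.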
Examples: how to use the numpy random normal
function
Now that I’ve shown you the syntax of the numpy random normal function, let’s
Before you work with any of the following examples, make sure that you run the
following code:
import numpy as np
Here, we’re going to use np.random.normal to generate a single observation
np.random.normal(size = 1)
This code will generate a single number drawn from the normal distribution with
loc = 0, scale = 1). Remember, if we don’t specify values for the loc and
This code will look almost exactly the same as the code in the previous
example.
np.random.normal(size = 5)
Here, the value 5 is the value that’s being passed to the size parameter. It
Note as well that because we have not explicitly specified values for loc and
np.random.seed(42)
np.random.normal(size = (2, 3))
Which produces the output:
So we’ve used the size parameter with the size = (2, 3). This has
generated a 2-dimensional Numpy array with 6 values. This output array has 2
To be clear, you can use the size parameter to create arrays with even higher
dimensional shapes.
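For instance, here is a minimal sketch with a 3-dimensional shape. The shape (2, 3, 4) is an arbitrary choice for illustration.

```python
import numpy as np

np.random.seed(42)

# A tuple with three entries produces a 3-dimensional array.
values_3d = np.random.normal(size=(2, 3, 4))

print(values_3d.ndim)   # 3
print(values_3d.shape)  # (2, 3, 4)
print(values_3d.size)   # 24 values in total
```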
To do this, we’ll use the loc parameter. Recall from earlier in the tutorial that the
loc parameter controls the mean of the distribution from which we draw the
Here, we’re going to set the mean of the data to 50 with the syntax loc = 50.
np.random.seed(42)
np.random.normal(size = 1000, loc = 50)
The full array of values is too large to show here, but here are the first several
You can see at a glance that these values are roughly centered around 50. If you
were to calculate the average using the numpy mean function, you would see
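A minimal sketch of that check, using the same seed and arguments as the code above. The variable names are placeholders. Because this is a random sample, the average lands close to 50 rather than exactly on it.

```python
import numpy as np

np.random.seed(42)
values = np.random.normal(size=1000, loc=50)

# The sample mean should be near the loc value of 50.
sample_mean = np.mean(values)
print(sample_mean)
```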
As noted earlier in the blog post, we can modify the standard deviation by using
In this example, we’ll generate 1000 values with a standard deviation of 100.
np.random.seed(42)
np.random.normal(size = 1000, scale = 100)
And here is a truncated output that shows the first few values:
Notice that we set size = 1000, so the code will generate 1000 values. I’ve
only shown the first few values for the sake of brevity.
It’s a little difficult to see how the data are distributed here, but we can use the
np.random.seed(42)
np.random.normal(size = 1000, scale = 100).std()
99.695552529463015
Notice that in this example, we have not used the loc parameter. Remember
that by default, the loc parameter is set to loc = 0, so by default, this data is
centered around 0. We could modify the loc parameter here as well, but for the
Here, we’ll create an array of values with a mean of 50 and a standard deviation
of 100.
np.random.seed(42)
np.random.normal(size = 1000, loc = 50, scale = 100)
I won’t show the output of this operation … I’ll leave it for you to run it yourself.
Let’s quickly discuss the code. If you’ve read the previous examples in this
We’re defining the mean of the data with the loc parameter. The mean of the
We’re defining the standard deviation of the data with the scale parameter.
The code size = 1000 indicates that we’re creating a Numpy array with 1000
values.
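If you'd like to confirm that the output behaves as described, here is a minimal sketch that computes the sample mean and standard deviation. The variable names are placeholders. The results will be near 50 and 100, but not exactly equal to them, because the data is a random sample.

```python
import numpy as np

np.random.seed(42)
values = np.random.normal(size=1000, loc=50, scale=100)

sample_mean = values.mean()  # should be roughly 50
sample_std = values.std()    # should be roughly 100

print(sample_mean, sample_std)
```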
That’s it. That’s how you can use the Numpy random normal function to create
Do you want to master Numpy?
In this book, I’ve shown you the basics of how to use Numpy.
Moreover, there’s a lot more to learn if you want to master data science in
If and when you’re ready to take the next step, you should enroll in one of our
https://www.sharpsightlabs.com/course-directory/