PyTorch Tutorial

Willie Chang
Pranay Manocha
Installing PyTorch
• 💻 On your own computer
• Anaconda/Miniconda: conda install pytorch -c pytorch
• Others via pip: pip3 install torch

• 🌐 On Princeton CS server (ssh cycles.cs.princeton.edu)


• Non-CS students can request a class account.
• Miniconda is highly recommended, because:
• It lets you manage your own Python installation
• It installs locally; no admin privileges required
• It’s lightweight and fits within your disk quota
• Instructions:
• wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
• chmod u+x ./Miniconda3-latest-Linux-x86_64.sh
• ./Miniconda3-latest-Linux-x86_64.sh
• After Miniconda is installed: conda install pytorch -c pytorch
Writing code
• Up to you; feel free to use emacs, vim, PyCharm, etc. if you want.
• Our recommendations:

Jupyter Notebook (also try Jupyter Lab!)
• Install: conda/pip3 install jupyter
• 💻 Run on your computer:
• jupyter notebook
• 🌐 Run on Princeton CS server:
• Pick any 4-digit number, say 1234
• 🌐 hostname -s
• 🌐 jupyter notebook --no-browser --port=1234
• 💻 ssh -N -L 1234:localhost:1234 __@__.cs.princeton.edu
• First blank is username, second is hostname

VS Code
• Install the Python extension.
• 🌐 Install the Remote Development extension.
• Python files can be run like Jupyter notebooks by delimiting cells/sections with #%%
• Debugging PyTorch code is just like debugging any other Python code: see Piazza @108 for info.
Why talk about libraries?
• Advantages of deep learning frameworks

• Quick to develop and test new ideas


• Automatically compute gradients
• Run it all efficiently on GPU to speed up computation
Various Frameworks
• Various Deep Learning Frameworks

• Focus on PyTorch in this session.

Source: CS231n slides


Preview: (and advantages)
• Preview of Numpy & PyTorch & Tensorflow

(Side-by-side comparison of the same computation-graph code in Numpy, Tensorflow, and PyTorch)

Advantages (continued)
• Which one do you think is better?

PyTorch!
• Easy interface − easy-to-use API. Code execution in this framework is straightforward, and it
typically needs fewer lines of code than other frameworks.
• It is easy to debug and understand the code.
• Python usage − the library is considered Pythonic and integrates smoothly with the Python
data science stack.
• It can be considered a NumPy extension to GPUs.
• Computational graphs − PyTorch provides dynamic computational graphs, so a user can
change them during runtime.
• It includes as many layers as Torch.
• It includes a lot of loss functions.
• It allows building networks whose structure depends on the computation itself.
• NLP: account for variable-length sentences. Instead of padding every sentence to a
fixed length, we create graphs with a different number of LSTM cells based on each sentence's
length.
PyTorch
• Fundamental Concepts of PyTorch
• Tensors
• Autograd
• Modular structure
• Models / Layers
• Datasets
• Dataloader
• Visualization Tools like
• TensorboardX (monitor training)
• PyTorchViz (visualise computation graph)
• Various other functions
• loss functions (MSE, CE, etc.)
• optimizers
Typical workflow: Prepare Input Data, Train Model, Evaluate Model
• Prepare input data: load data, iterate over examples
• Train model: train the weights
• Evaluate model: visualise the results
Tensor
• Tensor?
• PyTorch Tensors are just like numpy arrays, but they can run on GPU.
• Operations include: indexing, slicing, reshape, transpose, cross product,
matrix product, element-wise multiplication, etc. (see the sketch below)
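
As a rough illustration of these operations (the values here are arbitrary, not from the original slides):

import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.randn(2, 2)              # random normal values

first_row = a[0]                   # indexing
second_col = a[:, 1]               # slicing
flat = a.reshape(4)                # reshape
at = a.t()                         # transpose
ew = a * b                         # element-wise multiplication
mm = a @ b                         # matrix product

u = torch.tensor([1.0, 0.0, 0.0])  # cross product needs 3-element vectors
v = torch.tensor([0.0, 1.0, 0.0])
cp = torch.cross(u, v)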
Tensor (continued)
• Attributes of a tensor 't':
• t = torch.randn(1)

• requires_grad – makes it a trainable parameter
• By default False
• Turn on:
• t.requires_grad_() or
• t = torch.randn(1, requires_grad=True)
• Accessing the tensor value:
• t.data
• Accessing the tensor gradient:
• t.grad

• grad_fn – history of operations for autograd
• t.grad_fn
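
A small sketch of these attributes in action (the computation is made up for illustration):

import torch

t = torch.randn(1, requires_grad=True)   # trainable parameter
y = (t * 3).sum()                        # build a tiny computation
y.backward()                             # autograd fills in t.grad

print(t.data)      # tensor value, detached from the graph
print(t.grad)      # gradient of y w.r.t. t (here 3.0)
print(y.grad_fn)   # history of operations, e.g. <SumBackward0>

s = torch.randn(1) # alternatively, turn gradients on in-place
s.requires_grad_()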
Loading Data, Devices and CUDA
• Numpy arrays to PyTorch tensors
• torch.from_numpy(x_train)
• Returns a cpu tensor!
• PyTorch tensor to numpy
• t.numpy()
• Using GPU acceleration
• t.to()
• Sends to whatever device (cuda or cpu)
• Fallback to cpu if gpu is unavailable:
• torch.cuda.is_available()
• Check whether something is a cpu/gpu tensor or a numpy array:
• type(t) or t.type()
• returns
• numpy.ndarray
• torch.Tensor
• CPU - torch.FloatTensor
• GPU - torch.cuda.FloatTensor

*Assume 't' is a tensor
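
Putting the above together, a possible sketch (the numpy data is made up for illustration):

import numpy as np
import torch

x_train = np.random.rand(10, 3).astype(np.float32)

t = torch.from_numpy(x_train)        # numpy -> cpu tensor
x_back = t.numpy()                   # cpu tensor -> numpy

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
t = t.to(device)                     # falls back to cpu if no gpu is available

print(type(t))                       # <class 'torch.Tensor'>
print(t.type())                      # torch.FloatTensor or torch.cuda.FloatTensor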


Autograd
• Autograd
• Automatic Differentiation Package
• Don’t need to worry about partial differentiation, chain rule etc..
• backward() does that
• loss.backward()
• Gradients are accumulated across backward() calls by default:
• Need to zero out gradients after each update
• t.grad.zero_()

*Assume 't' is a tensor


Autograd (continued)
• Manual Weight Update - example
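
The slide's code is not reproduced here; a minimal sketch of a manual weight update for a linear model (the toy data and learning rate are assumptions):

import torch

x = torch.randn(100, 1)                 # toy inputs
y = 2 * x + 1                           # toy targets for y = 2x + 1

a = torch.randn(1, requires_grad=True)  # slope
b = torch.randn(1, requires_grad=True)  # intercept
lr = 0.1

for epoch in range(100):
    yhat = a * x + b
    loss = ((yhat - y) ** 2).mean()     # MSE
    loss.backward()                     # autograd computes a.grad and b.grad

    with torch.no_grad():               # update weights outside the graph
        a -= lr * a.grad
        b -= lr * b.grad

    a.grad.zero_()                      # gradients accumulate, so zero them out
    b.grad.zero_()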
Optimizer
• Optimizers (optim package)
• Adam, Adagrad, Adadelta, SGD etc..
• Manual updates are ok if there is a small number of weights
• Imagine updating 100k parameters!
• An optimizer takes the parameters we want to update, the learning rate we want
to use (and possibly many other hyper-parameters as well!)
and performs the updates
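
A sketch of the same toy regression using the optim package instead of manual updates (data and learning rate are assumptions):

import torch
import torch.optim as optim

x = torch.randn(100, 1)
y = 2 * x + 1
a = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)

optimizer = optim.SGD([a, b], lr=0.1)   # could also be optim.Adam, optim.Adagrad, ...

for epoch in range(100):
    yhat = a * x + b
    loss = ((yhat - y) ** 2).mean()
    optimizer.zero_grad()               # reset accumulated gradients
    loss.backward()
    optimizer.step()                    # updates every parameter given to the optimizer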
Loss
• Loss
• Various predefined loss functions to choose from
• L1, MSE, Cross Entropy, ...
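
For illustration, two of the predefined losses (shapes and values are made up):

import torch
import torch.nn as nn

mse = nn.MSELoss()
loss = mse(torch.randn(4, 1), torch.randn(4, 1))

ce = nn.CrossEntropyLoss()              # expects raw logits and integer class labels
logits = torch.randn(4, 3)              # 4 samples, 3 classes
labels = torch.tensor([0, 2, 1, 0])
loss = ce(logits, labels)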
Model

• In PyTorch, a model is represented by a regular Python class that inherits
from the Module class.
• Two components
• __init__(self): defines the parts that make up the model; in our
case, two parameters, a and b
• forward(self, x): performs the actual computation, that is, it outputs a prediction
given the input x
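
A minimal sketch of such a model with parameters a and b (the exact code on the slide may differ):

import torch
import torch.nn as nn

class ManualLinearRegression(nn.Module):
    def __init__(self):
        super().__init__()
        # Wrapping tensors in nn.Parameter registers them as trainable parameters
        self.a = nn.Parameter(torch.randn(1))
        self.b = nn.Parameter(torch.randn(1))

    def forward(self, x):
        # The actual computation: a prediction given the input x
        return self.a * x + self.b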
Model (example)
• Example:

• Properties:
• model = ManualLinearRegression()
• model.state_dict() - returns a dictionary of trainable parameters with their current
values
• model.parameters() - returns an iterator over all trainable parameters in the model
• model.train() or model.eval()
Putting things together
• Sample Code in practice
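
The slide's code is not reproduced here; a possible end-to-end sketch combining the pieces above (data, learning rate and epoch count are assumptions):

import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x_train = torch.randn(100, 1)                          # toy data
y_train = 2 * x_train + 1 + 0.1 * torch.randn(100, 1)
x_train, y_train = x_train.to(device), y_train.to(device)

model = ManualLinearRegression().to(device)            # class from the Model slide sketch
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

model.train()
for epoch in range(100):
    yhat = model(x_train)
    loss = loss_fn(yhat, y_train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(model.state_dict())                              # current values of a and b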
Complex Models
• Complex Model Class
• Predefined 'layer' modules

• 'Sequential' layer modules
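
A sketch of both styles (layer sizes and the TwoLayerNet name are made up for illustration):

import torch
import torch.nn as nn

class TwoLayerNet(nn.Module):                 # custom class using predefined layers
    def __init__(self, d_in, d_hidden, d_out):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_out)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = nn.Sequential(                        # equivalent network with Sequential
    nn.Linear(10, 32),
    nn.ReLU(),
    nn.Linear(32, 1),
)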


Dataset
• Dataset
• In PyTorch, a dataset is represented by a regular Python class that inherits
from the Dataset class. You can think of it as a kind of a Python list of
tuples, each tuple corresponding to one point (features, label)
• 3 components:
• __init__(self)
• __getitem__(self, index)
• __len__(self)
• Unless the dataset is huge (cannot fit in memory), you don't explicitly need to define
this class. Use TensorDataset instead.
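
A minimal sketch of both options (toy data; the CustomDataset name is an assumption):

import torch
from torch.utils.data import Dataset, TensorDataset

class CustomDataset(Dataset):                 # behaves like a list of (features, label) tuples
    def __init__(self, x_tensor, y_tensor):
        self.x = x_tensor
        self.y = y_tensor

    def __getitem__(self, index):
        return (self.x[index], self.y[index])

    def __len__(self):
        return len(self.x)

x_train = torch.randn(100, 1)
y_train = 2 * x_train + 1
train_data = CustomDataset(x_train, y_train)

train_data = TensorDataset(x_train, y_train)  # same thing when the data fits in memory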
Dataloader
• Dataloader
• What happens if we have a huge dataset? Have to train in 'batches'
• Use PyTorch's DataLoader class!
• We tell it which dataset to use, the desired mini-batch size and if we’d like to shuffle it
or not. That’s it!
• Our loader will behave like an iterator, so we can loop over it and fetch a different
mini-batch every time.
Dataloader (example)
• Sample Code in Practice:
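
The slide's code is not shown here; a sketch assuming the model, loss_fn, optimizer and train_data from the earlier sketches:

from torch.utils.data import DataLoader

train_loader = DataLoader(dataset=train_data, batch_size=16, shuffle=True)

for epoch in range(10):
    for x_batch, y_batch in train_loader:     # a different mini-batch every iteration
        yhat = model(x_batch)
        loss = loss_fn(yhat, y_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()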
Split Data
• Random Split for Train, Val and Test Set
• random_split()
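
For example (the split sizes are arbitrary; they must sum to the dataset length):

import torch
from torch.utils.data import TensorDataset, random_split

x = torch.randn(100, 1)
y = 2 * x + 1
dataset = TensorDataset(x, y)

train_set, val_set, test_set = random_split(dataset, [70, 15, 15])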
Saving / Loading Weights
Method 1
• Only inference/evaluation – save only state_dict
• Save:
• torch.save(model.state_dict(), PATH)
• Load:
• model = TheModelClass(*args, **kwargs)
• model.load_state_dict(torch.load(PATH))
• model.eval()

• Convention is to save models using either a .pt or a .pth extension

https://pytorch.org/tutorials/beginner/saving_loading_models.html
Saving / Loading Weights (continued)
• Method 2
• Checkpoint – resume training / inference
• Save:
• torch.save({
'epoch': epoch,
'model_state_dict': model.state_dict(),
'optimizer_state_dict': optimizer.state_dict(),
'loss': loss,
...
}, PATH)
• Load:
• model = TheModelClass(*args, **kwargs)
optimizer = TheOptimizerClass(*args, **kwargs)
checkpoint = torch.load(PATH)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']
model.eval()
# - or -
model.train()
Evaluation
• Two important things:
• torch.no_grad()
• Don't store the history of all computations (no autograd graph is built)
• eval()
• Tells the model which mode to run in; layers like dropout and batch norm behave
differently in training vs. evaluation mode.
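
A sketch of an evaluation loop (model, loss_fn and val_loader are assumed from the earlier sketches):

import torch

model.eval()                          # switch layers like dropout / batch norm to eval mode
with torch.no_grad():                 # no autograd graph is built during inference
    for x_batch, y_batch in val_loader:
        yhat = model(x_batch)
        val_loss = loss_fn(yhat, y_batch)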
Visualization
• TensorboardX (visualise training)
• PyTorchViz (visualise computation graph)

https://github.com/lanpa/tensorboardX/
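
A minimal TensorboardX sketch (the log directory name and logged values are made up):

from tensorboardX import SummaryWriter

writer = SummaryWriter('runs/experiment1')    # log directory

for epoch in range(100):
    loss = 1.0 / (epoch + 1)                  # stand-in for the real training loss
    writer.add_scalar('train/loss', loss, epoch)

writer.close()
# view with: tensorboard --logdir runs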
Visualization (continued)
• PyTorchViz

https://github.com/szagoruyko/pytorchviz
References
• Important References:
• For setting up Jupyter notebooks on the Princeton Ionic cluster
• https://oncomputingwell.princeton.edu/2018/05/jupyter-on-the-cluster/
• Best reference is PyTorch Documentation
• https://pytorch.org/ and https://github.com/pytorch/pytorch
• Good Blogs: (with examples and code)
• https://lelon.io/blog/2018/02/08/pytorch-with-baby-steps
• https://www.tutorialspoint.com/pytorch/index.htm
• https://github.com/hunkim/PyTorchZeroToAll
• Free GPU access for short time:
• Google Colab provides free Tesla K80 GPU of about 12GB. You can run the session in
an interactive Colab Notebook for 12 hours.
• https://colab.research.google.com/
Misc
• Dynamic vs. Static Computation Graph
• Dynamic graph (PyTorch): in each epoch the graph is rebuilt node by node as the
operations run. For the linear regression example: a, b and x_train_tensor combine
into yhat, then yhat and y_train_tensor combine into loss.
• The same construction repeats in Epoch 1, Epoch 2, and so on.
• Building the graph and computing the graph happen at the same time.
• Seems inefficient, especially if we are building the same graph over and over again...
Misc
• Alternative: Static Computation Graphs
• Step 1: Build a computational graph describing our computation (including
finding paths for backprop)
• Step 2: Reuse the same graph on every iteration
