
Introduction to PyTorch

CS236 Section, Autumn 2024


Honglin Chen
What is PyTorch?
A machine learning framework that accelerates the path from research prototyping to production deployment.

Machine learning framework
Deep learning primitives such as data loading, NN layer types, activations, loss functions, and optimizers
Hardware acceleration on NVIDIA GPUs
Libraries for vision, NLP, and audio applications

Research prototyping
Models are Python code, automatic differentiation, and eager mode

Production deployment
TorchScript, TorchServe, quantization


Overview

Motivations
Python
NumPy

Building Blocks
Tensors
Operations
Modules

Examples
MNIST

Beyond PyTorch
Tools
High Level Libraries
Domain Specific Libraries
Motivations
Python vs. NumPy

# Pure Python
X = [1] * 10000
Y = [0.5] * 10000
Z = [None] * 10000

for i in range(10000):
    Z[i] = X[i] * Y[i]

# 2.772092819213867 ms
# Interpreter Overhead
# 64 bit

# NumPy
X = np.full((10000,), 1)
Y = np.full((10000,), 0.5)
Z = X * Y

# 0.08273124694824219 ms
# Low Level Implementation
# Vectorization
Motivations
NumPy vs. PyTorch

# NumPy
X = np.full((10000,), 1)
Y = np.full((10000,), 0.5)
Z = X * Y

# 0.08273124694824219 ms
# Low Level Implementation
# Vectorization

# PyTorch
X = torch.full((10000,), 1.).cuda().requires_grad_()
Y = torch.full((10000,), 0.5).cuda()
Z = X * Y

# 0.3185272216796875 ms
# GPU Acceleration

Z.sum().backward()   # X.grad is populated because X requires grad
dX = X.grad

# Automatic Differentiation
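One caveat when reproducing timings like the ones above: CUDA kernels launch asynchronously, so a fair GPU measurement needs an explicit synchronization. A minimal sketch, assuming a CUDA device is available (the sizes mirror the slide; the timing code uses Python's time module and is only for illustration):

import time
import torch

X = torch.full((10000,), 1., device="cuda", requires_grad=True)
Y = torch.full((10000,), 0.5, device="cuda")

torch.cuda.synchronize()          # make sure prior GPU work has finished
start = time.time()
Z = X * Y
torch.cuda.synchronize()          # wait for the kernel before stopping the clock
print((time.time() - start) * 1000, "ms")

Z.sum().backward()                # populates X.grad because requires_grad=True
print(X.grad[:3])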
Building Blocks
TENSORS
Building Blocks
Tensors / Initialization

torch.tensor([5., 3.])
tensor([5., 3.])  # defaults to torch.float32

torch.from_numpy(np.array([5., 3.]))
tensor([5., 3.], dtype=torch.float64)  # because NumPy defaults to 64-bit floats

torch.tensor([5., 3.]).numpy()
array([5., 3.], dtype=float32)
Building Blocks
Tensors / Initialization

torch.ones(5, 3)
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
Building Blocks
Tensors / Initialization

torch.randn(5, 3)
tensor([[ 0.2349, -0.0427, -0.5053],
        [ 0.6455,  0.1199,  0.4239],
        [ 0.1279,  0.1105,  1.4637],
        [ 0.4259, -0.0763, -0.9671],
        [ 0.6856,  0.5047,  0.4250]])
Building Blocks
Tensors / Initialization

torch.ones_like(tensor)

Input:  tensor([[ 0.2349, -0.0427, -0.5053],
                [ 0.6455,  0.1199,  0.4239]])
Output: tensor([[1., 1., 1.],
                [1., 1., 1.]])
Building Blocks
Tensors / Initialization

torch.empty(5, 3)
tensor([[ 0.0000e+00,  2.5244e-29,  0.0000e+00],
        [ 2.5244e-29,  1.4569e-19,  2.7517e+12],
        [ 7.5338e+28,  3.0313e+32,  6.3828e+28],
        [ 1.4603e-19,  1.0899e+27,  6.8943e+34],
        [ 1.1835e+22,  7.0976e+22,  1.8515e+28]])

# The values are not initialized

Building Blocks
Tensors / Indexing & Reshaping

torch.tensor([[5., 3.]])[0, :]
tensor([5., 3.])

torch.tensor([[5., 3.]]).view(-1)  # infer dimension size
torch.tensor([[5., 3.]]).view(2)
tensor([5., 3.])

torch.tensor([[5., 3.]]).size()
torch.Size([1, 2])
Building Blocks
Tensors / Broadcasting

X = torch.ones((3, 3, 3))
Y = torch.ones((1, 1, 3))
Z = X * Y
Z.size()

torch.Size([3, 3, 3])

# https://pytorch.org/docs/stable/notes/broadcasting.html
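As a concrete use of these broadcasting rules, a per-channel scale can be applied to a whole batch without explicit loops. A small sketch with illustrative shapes and values (not from the slides):

import torch

imgs = torch.randn(8, 3, 32, 32)        # batch of 8 RGB images
scale = torch.tensor([0.5, 1.0, 2.0])   # one factor per channel

# reshape to (1, 3, 1, 1) so it broadcasts over batch, height, and width
out = imgs * scale.view(1, 3, 1, 1)
print(out.shape)                        # torch.Size([8, 3, 32, 32])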
Building Blocks
Tensors / Devices

if torch.cuda.is_available():
    device = torch.device("cuda")         # a CUDA device object
    x = torch.ones(2, device=device)      # directly create a tensor on the GPU
    y = torch.ones(2).to(device)          # or just use strings: `.to("cuda")`
    z = x + y
    print(z)                              # z is on the GPU
    print(z.to("cpu", torch.double))      # `.to("cpu")` moves the tensor to the CPU

# `x.cuda()` and `x.cpu()` also work


Building Blocks
Operations / Primitives

torch.tensor([5., 3.]) + torch.tensor([3., 5.])
tensor([8., 8.])

z = torch.add(x, y)
torch.add(x, y, out=z)
y = y.add_(x)  # in-place y += x

torch.tanh(y)
torch.stack([x, y])

# https://pytorch.org/docs/stable/torch.html
Building Blocks
Operations / Functional

import torch.nn.functional as F

X = torch.randn((64, 3, 256, 256))
W = torch.randn((8, 3, 3, 3))

out = F.conv2d(X, W, stride=1, padding=1)

# Like SciPy
# https://pytorch.org/docs/stable/nn.functional.html
Building Blocks
Operations / Automatic Differentiation

Computation as a graph built at runtime

[Figure: computation graph, x and 2 feed an add node that produces y]

x = torch.ones(2, 2, requires_grad=True)
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

y = x + 2
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
Building Blocks
Operations / Automatic Differentiation

[Figure: full computation graph, x and 2 feed add producing y, y and 3 feed mul producing z, mean produces out]

z = y * 3
out = z.mean()
tensor(9., grad_fn=<MeanBackward1>)

out.backward()    # must be called on a scalar

print(x.grad)     # only leaf tensors have .grad populated

Gradients w.r.t. the input tensors are computed step by step,
from the loss back through the graph in reverse.
Building Blocks
Operations / Automatic Differentiation

x.requires_grad           # True
(x ** 2).requires_grad    # True

# Keeping track of activations is expensive

with torch.no_grad():
    (x ** 2).requires_grad        # False

(x.detach() ** 2).requires_grad   # False
Building Blocks
Operations / nn

# nn (module API)
import torch.nn as nn

X = torch.ones((64, 3, 256, 256))

conv = nn.Conv2d(in_channels=3,
                 out_channels=8,
                 kernel_size=3,
                 stride=1,
                 padding=1)

out = conv(X)

# Inherits from nn.Module
# Implemented using the functional API
# Stores internal state (the weights)

# functional API
import torch.nn.functional as F

X = torch.randn((64, 3, 256, 256))
W = torch.randn((8, 3, 3, 3))

out = F.conv2d(X, W, stride=1, padding=1)
Building Blocks
Operations / Module

import torch.nn as nn

X = torch.ones((64, 3, 256, 256))

conv = nn.Conv2d(in_channels=3,
                 out_channels=8,
                 kernel_size=3,
                 stride=1,
                 padding=1)

# Move the module to GPUs
conv.cuda()

# All states (weights and buffers), e.g. for saving
conv.state_dict()

# Trainable states only
conv.parameters()

# Recursively visit child modules
conv.apply(weight_init)
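`state_dict()` is also the standard way to checkpoint a module. A minimal sketch reusing `conv` and `nn` from above (the file name is arbitrary):

import torch

# save only the parameters and buffers, not the Python object
torch.save(conv.state_dict(), "conv_checkpoint.pt")

# later: rebuild an identical module and load the weights back
conv2 = nn.Conv2d(3, 8, kernel_size=3, stride=1, padding=1)
conv2.load_state_dict(torch.load("conv_checkpoint.pt"))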
Examples
MNIST

Preprocessing
Dataloader
Network
Optimizer
Training
Examples
MNIST / Preprocessing

import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# Convert to a Torch Tensor and normalize

# https://pytorch.org/vision/stable/transforms.html
# e.g. ColorJitter, FiveCrop
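For training-time augmentation, extra transforms can simply be prepended to the Compose pipeline. A sketch using ColorJitter (mentioned above) plus a RandomHorizontalFlip; the flip and all parameter values are illustrative additions, not from the slides:

import torchvision.transforms as transforms

train_transform = transforms.Compose([
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # random color perturbation
    transforms.RandomHorizontalFlip(p=0.5),                # random left-right flip
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])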
Examples
MNIST / Dataloader

import torch
import torchvision

trainset = torchvision.datasets.CIFAR10(
    root='./data', train=True,
    download=True, transform=transform)

# DataLoaders are Python iterables

trainloader = torch.utils.data.DataLoader(
    trainset, batch_size=8,
    shuffle=True, num_workers=2)
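Because the DataLoader is iterable, a single batch can be pulled out directly to inspect shapes. A quick sketch using the `trainloader` defined above:

images, labels = next(iter(trainloader))
print(images.shape)   # torch.Size([8, 3, 32, 32]) for CIFAR10 with batch_size=8
print(labels.shape)   # torch.Size([8])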
Examples
MNIST / Network

import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
Examples
MNIST / Network

import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        ...
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = torch.flatten(self.pool(F.relu(self.conv2(x))), 1)  # keep the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)
Examples
MNIST / Optimizer

import torch.optim as optim

# Instantiate the nn.Module (uses default weight initialization)
net = Net().to("cuda")

# Define the loss function
criterion = nn.CrossEntropyLoss()

# Create the optimizer: https://pytorch.org/docs/stable/optim.html
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
Examples
MNIST / Training

net.train()  # set to training mode (there is also `net.eval()`)

for epoch in range(2):
    for inputs, labels in trainloader:
        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs.to("cuda"))
        loss = criterion(outputs, labels.to("cuda"))
        loss.backward()
        optimizer.step()
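After training, the same pieces can be reused for evaluation. The sketch below is not part of the original slides; it assumes a `testloader` built like `trainloader` but with `train=False`:

net.eval()                      # switch layers like dropout/batchnorm to eval mode
correct, total = 0, 0
with torch.no_grad():           # no need to track gradients for inference
    for inputs, labels in testloader:
        outputs = net(inputs.to("cuda"))
        preds = outputs.argmax(dim=1)
        correct += (preds == labels.to("cuda")).sum().item()
        total += labels.size(0)
print(f"accuracy: {correct / total:.3f}")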
Examples
MNIST / Recap

... transforms.Compose( ...               # Define preprocessing transforms
... torch.utils.data.DataLoader( ...      # Create DataLoader
... class Net(nn.Module): ...             # Define Network
... criterion = nn.CrossEntropyLoss() ... # Define loss function
... optim.SGD(net.parameters(), ...       # Create Optimizer
... for x, y in trainloader: ...          # Iterate over the DataLoader
... outputs = net(inputs)                 # Forward pass
... criterion(outputs, labels) ...        # Compute loss
... optimizer.zero_grad() ...             # Zero out gradients
... loss.backward() ...                   # Backpropagate
... optimizer.step() ...                  # Update weights
Beyond PyTorch
Tools / Keep track of experiments and artifacts
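One built-in option for tracking experiments is the TensorBoard writer that ships with PyTorch (it requires the tensorboard package to be installed). A minimal sketch; the log directory, tag, and logged values are arbitrary placeholders:

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/cs236_example")   # arbitrary log directory

# typically called inside the training loop
writer.add_scalar("loss/train", 0.42, 0)               # (tag, value, global step)
writer.close()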
Beyond PyTorch
High Level Libraries / Distributed & Mixed Precision Training
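For mixed precision specifically, recent PyTorch versions expose automatic mixed precision (AMP) directly. A hedged sketch of how the training loop from the MNIST example could be adapted; `net`, `criterion`, `optimizer`, and `trainloader` are the names from that example:

scaler = torch.cuda.amp.GradScaler()            # scales the loss to avoid fp16 underflow

for inputs, labels in trainloader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():             # run the forward pass in mixed precision
        outputs = net(inputs.to("cuda"))
        loss = criterion(outputs, labels.to("cuda"))
    scaler.scale(loss).backward()               # backward on the scaled loss
    scaler.step(optimizer)                      # unscales gradients, then steps the optimizer
    scaler.update()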
Beyond PyTorch
Domain Specific Libraries / Graph, RL, Probabilistic Programming
