PyTorch
Introduction
PyTorch, accessed via the torch package, is a scientific computing framework centered on representing data as tensors and on automatic differentiation of tensor operations, making it easy to implement differentiable computing. Deep learning is one of its most popular applications, where differentiation of loss functions to form gradients is key to training models on data.
Tensors
Tensors in torch are \(n\)-dimensional array objects, similar to numpy arrays. In fact, you can easily switch between the two:

import numpy as np
import torch

a = np.array([1.0, 2.0])
b = torch.from_numpy(a) # NumPy → PyTorch (shares memory!)
c = b.numpy() # PyTorch → NumPy (also shares memory!)

However, torch tensor objects carry additional information and methods related to automatic differentiation and to where the tensor is located in memory (CPU or GPU).
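As a quick illustration of that extra information, here is a minimal sketch (the variable names are only for illustration) that inspects a tensor's dtype, device, and requires_grad flag, and demonstrates the memory sharing noted in the comments above:

import numpy as np
import torch

a = np.array([1.0, 2.0])
b = torch.from_numpy(a)

# Metadata carried by the tensor
print(b.dtype)          # torch.float64 (inherited from the numpy array)
print(b.device)         # cpu
print(b.requires_grad)  # False by default

# from_numpy shares memory: modifying the array modifies the tensor
a[0] = 10.0
print(b)                # tensor([10.,  2.], dtype=torch.float64)

# Move the tensor to a GPU if one is available (this copies it to that device)
if torch.cuda.is_available():
    print(b.to("cuda").device)  # cuda:0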
Autograd
Deep learning is nearly impossible without the ability to automatically differentiate models. Before automatic differentiation was widely available, demonstrating the value of a new model required careful derivation and implementation of its gradients. Now, all of that work has been automated away.
import torch

t = torch.Tensor([0.8762])
print(t) # This variable is not considered for gradient computations

t.requires_grad = True
print(t) # Now it is

# calculate gradient (of sigmoid(t))
torch.sigmoid(t).backward()

manual_grad = torch.sigmoid(t) * (1 - torch.sigmoid(t)) # manually calculated gradient of sigmoid
print("torch grad: ", t.grad.item(), " manual gradient: ", manual_grad.item())
tensor([0.8762])
tensor([0.8762], requires_grad=True)
torch grad: 0.20754991471767426 manual gradient: 0.20754991471767426
Note that \(\sigma(x) = \frac{1}{1+e^{-x}}\) has derivative \(\sigma'(x) = \frac{e^{-x}}{(1+e^{-x})^2} = \sigma(x)(1-\sigma(x))\).
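If you want an extra sanity check that autograd agrees with the analytic derivative, a central finite-difference approximation works well. This is a minimal sketch; the step size h = 1e-4 is an arbitrary choice, and the input value is the same one used above.

import torch

t = torch.tensor([0.8762], requires_grad=True)
torch.sigmoid(t).backward()  # autograd gradient ends up in t.grad

# Central finite-difference approximation of d/dx sigmoid(x) at x = 0.8762
h = 1e-4
with torch.no_grad():
    fd_grad = (torch.sigmoid(t + h) - torch.sigmoid(t - h)) / (2 * h)

print("autograd: ", t.grad.item(), " finite difference: ", fd_grad.item())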
Application: Linear Model
We fit a linear model to scalar input-output data in the simple models lesson using gradient descent ‘by hand’. Here, we use PyTorch. There are two differences from the pattern we saw at the end of the lesson on simple models:
- The model is defined separately from the loss.
- We don’t implement a gradient function; autodiff derives it for us.
## Plot iterates and loss under GD using torch
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

# 1. Create data
## First as numpy array
x = np.linspace(-1,1,100)
y_noise_free = 4*x + 5 + 0.5
y = y_noise_free+0.5*np.random.randn(100)

## Create tensors, as 2D arrays using unsqueeze
X_train = torch.tensor(x, dtype=torch.float32).unsqueeze(1)
y_train = torch.tensor(y, dtype=torch.float32).unsqueeze(1)

print(X_train.shape) ## two dimensions now
print(y_train.shape)

# 2. Define Linear model
class Lin(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(1, 1)

    def forward(self, x):
        return self.net(x)

model = Lin()

# 3. Define loss
criterion = nn.MSELoss() # Mean Squared Error

print("Start training: ")
# 4. Training loop
for epoch in range(5000):
    # Tell pytorch that model computations are for training now
    model.train()
    # clear accumulated grads
    model.zero_grad()
    outputs = model(X_train)           # predict values ...
    loss = criterion(outputs, y_train) # ... so we can compute loss
    # Automatically get gradients of loss wrt model as it is right now
    loss.backward()
    ## Manually implementing gradient descent. Don't want the update to be tracked by autograd
    with torch.no_grad():
        for param in model.parameters():
            param -= 0.005 * param.grad # θ ← θ − α∇θ L
    if (epoch + 1) % 500 == 0: ## Show us the loss as training proceeds
        print(f"Epoch {epoch+1}/5000 - Loss: {loss.item():.4f}")

# 5. Evaluation
model.eval() # tell pytorch that we will use the model in evaluation mode
with torch.no_grad():
    outputs = model(X_train).numpy() # convert predictions to numpy array to work with matplotlib

plt.scatter(x,y,label="noise data")
plt.scatter(x,y_noise_free,label="true (noise-free)")
plt.scatter(x,outputs[:,0],label="predicted")
plt.legend()
torch.Size([100, 1])
torch.Size([100, 1])
Start training:
Epoch 500/5000 - Loss: 0.4852
Epoch 1000/5000 - Loss: 0.2297
Epoch 1500/5000 - Loss: 0.2212
Epoch 2000/5000 - Loss: 0.2210
Epoch 2500/5000 - Loss: 0.2210
Epoch 3000/5000 - Loss: 0.2210
Epoch 3500/5000 - Loss: 0.2210
Epoch 4000/5000 - Loss: 0.2210
Epoch 4500/5000 - Loss: 0.2210
Epoch 5000/5000 - Loss: 0.2210
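The manual parameter update inside torch.no_grad() matches what torch.optim.SGD (without momentum) applies; the optim module imported above provides it. Here is a minimal sketch of the same training loop using an optimizer instead, assuming the model, criterion, X_train, and y_train defined above:

optimizer = optim.SGD(model.parameters(), lr=0.005)

for epoch in range(5000):
    model.train()
    optimizer.zero_grad()                      # clear accumulated grads
    loss = criterion(model(X_train), y_train)  # forward pass and loss
    loss.backward()                            # gradients of loss wrt parameters
    optimizer.step()                           # θ ← θ − α∇θ L, applied by the optimizer
    if (epoch + 1) % 500 == 0:
        print(f"Epoch {epoch+1}/5000 - Loss: {loss.item():.4f}")

Either version converges to the same fit; optimizers become more convenient once you move to updates with momentum or adaptive learning rates.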