
Cheatsheet - PyTorch

PyTorch is an open-source machine learning library primarily used for deep learning applications. It's known for its flexibility, dynamic computation graph, and Pythonic interface.

1. Tensors: The Building Blocks

Tensors are the fundamental data structure in PyTorch, similar to NumPy arrays but with GPU acceleration and automatic differentiation capabilities.

1.1 Creating Tensors

Operation | Syntax | Example
From data (list, tuple, NumPy array) | torch.tensor(data, dtype=None, device=None) | x = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
Uninitialized tensor | torch.empty(shape) | x = torch.empty(2, 3)
Random values, uniform on [0, 1) | torch.rand(shape) | x = torch.rand(2, 2)
Random values, standard normal | torch.randn(shape) | x = torch.randn(2, 2)
Tensor of zeros | torch.zeros(shape, dtype=None) | x = torch.zeros(3, 3)
Tensor of ones | torch.ones(shape, dtype=None) | x = torch.ones(1, 5)
Tensor filled with a specific value | torch.full(size, fill_value, dtype=None) | x = torch.full((2, 2), 7)
Tensor from a range (given step) | torch.arange(start, end, step, dtype=None) | x = torch.arange(0, 10, 2)
Tensor of evenly spaced values (given count) | torch.linspace(start, end, steps, dtype=None) | x = torch.linspace(0, 1, 5)
Tensor with same shape/dtype/device as another | torch.ones_like(input), torch.zeros_like(input), torch.rand_like(input) | x = torch.ones_like(existing_tensor)
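
A minimal sketch combining a few of the constructors above (variable names are illustrative):

import torch

# Explicit dtype at creation time
a = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
b = torch.zeros(2, 2, dtype=torch.int64)

# Evenly spaced values
c = torch.arange(0, 10, 2)    # tensor([0, 2, 4, 6, 8])
d = torch.linspace(0, 1, 5)   # tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])

# *_like constructors copy shape, dtype, and device from an existing tensor
e = torch.ones_like(a)        # 2x2 float32 tensor of ones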

1.2 Tensor Properties & Conversion

Property/Conversion | Syntax | Example
Shape | tensor.shape or tensor.size() | x.shape  # torch.Size([2, 2])
Data type | tensor.dtype | x.dtype  # torch.float32
Device (CPU/GPU) | tensor.device | x.device  # cpu (or cuda:0)
To NumPy array | tensor.numpy() | np_array = x.numpy()
From NumPy array | torch.from_numpy(np_array) | x = torch.from_numpy(np_array)
To CPU | tensor.cpu() | x_cpu = x_gpu.cpu()
To GPU | tensor.cuda(), tensor.to('cuda'), tensor.to(device) | x_gpu = x_cpu.cuda() or x_gpu = x_cpu.to('cuda')
Change data type | tensor.to(dtype) or tensor.type(dtype) | x = x.to(torch.int64) or x = x.type(torch.float64)
Python scalar (single-element tensors) | tensor.item() | value = single_element_tensor.item()
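
A short sketch of the conversions above; note that torch.from_numpy and tensor.numpy() share memory with the underlying array for CPU tensors, so in-place changes are visible on both sides:

import numpy as np
import torch

x = torch.rand(2, 2)
print(x.shape, x.dtype, x.device)    # torch.Size([2, 2]) torch.float32 cpu

np_array = x.numpy()                 # shares memory with x (CPU tensors only)
y = torch.from_numpy(np_array)       # also shares memory with np_array

x = x.to(torch.float64)              # change dtype (returns a new tensor)
if torch.cuda.is_available():
    x = x.to('cuda')                 # move to GPU only if one is present

value = torch.tensor([3.14]).item()  # Python float from a single-element tensor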

1.3 Tensor Operations

  • Arithmetic: +, -, *, /, %, **, torch.add(), torch.sub(), torch.mul(), torch.div(), torch.pow(), etc.
    • y = x + y or torch.add(x, y, out=result)
    • y.add_(x) (in-place addition)
  • Indexing/Slicing: Same as NumPy.
    • x[0, :], x[:, 1], x[1, 1].item()
  • Reshaping:
    • x.view(new_shape): Returns a new tensor with the same data but different shape. Requires contiguous memory.
    • x.reshape(new_shape): Similar to view, but can handle non-contiguous memory by making a copy if necessary.
    • x.T or x.transpose(dim0, dim1): Transpose.
    • x.permute(dim_order): Rearrange dimensions.
    • x.unsqueeze(dim): Add a dimension.
    • x.squeeze(dim): Remove a dimension (if size is 1).
  • Concatenation:
    • torch.cat((t1, t2), dim=0)
  • Stacking:
    • torch.stack((t1, t2), dim=0)
  • Aggregation:
    • torch.sum(x), x.sum(), x.sum(dim=0)
    • torch.mean(x), x.mean()
    • torch.max(x), x.min(), x.argmax(), x.argmin()
  • Matrix Multiplication:
    • torch.matmul(tensor1, tensor2) or tensor1 @ tensor2
    • torch.mm(tensor1, tensor2) (for 2D matrices)
    • tensor1.mm(tensor2)
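
An illustrative sketch exercising the reshaping, concatenation, aggregation, and matrix-multiplication operations above:

import torch

x = torch.arange(6, dtype=torch.float32)   # tensor([0., 1., 2., 3., 4., 5.])

a = x.view(2, 3)       # reshape to 2x3 (shares data; requires contiguous memory)
b = x.reshape(3, 2)    # reshape, copying if the layout is non-contiguous
c = a.unsqueeze(0)     # shape [1, 2, 3]
d = c.squeeze(0)       # back to shape [2, 3]

cat = torch.cat((a, a), dim=0)     # shape [4, 3]
stk = torch.stack((a, a), dim=0)   # shape [2, 2, 3] (new leading dimension)

print(a.sum(), a.mean(), a.sum(dim=0))  # all elements / all elements / per column
print(a @ a.T)                          # 2x2 result, same as torch.matmul(a, a.T)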

2. Autograd: Automatic Differentiation

torch.autograd is PyTorch's automatic differentiation engine: it records operations on tensors that require gradients and computes those gradients during the backward pass.

  • requires_grad=True: Tells PyTorch to track operations on a tensor for gradient computation.
    • x = torch.tensor([1., 2.], requires_grad=True)
  • tensor.grad: Stores gradients of a scalar loss with respect to the tensor.
  • loss.backward(): Computes gradients. Gradients accumulate, so you often need optimizer.zero_grad().
  • with torch.no_grad():: Temporarily disable gradient tracking. Useful during evaluation or when updating model weights.
    • with torch.no_grad(): pred = model(x)
  • tensor.detach(): Creates a new tensor that shares the same data as tensor but does not require gradients. It's "detached" from the computation graph.

import torch

x = torch.tensor([1., 2.], requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()  # scalar output: mean of 3 * (x + 2)**2

out.backward()  # compute gradients and accumulate them into x.grad
print(x.grad)   # d(out)/dx = 3 * (x + 2) = tensor([9., 12.])

3. Neural Network Modules (torch.nn)

The torch.nn module provides classes for building neural networks.

3.1 Basic Layers

  • Linear (Fully Connected): nn.Linear(in_features, out_features)
  • Convolutional:
    • nn.Conv1d(in_channels, out_channels, kernel_size, ...)
    • nn.Conv2d(in_channels, out_channels, kernel_size, ...)
    • nn.Conv3d(in_channels, out_channels, kernel_size, ...)
  • Pooling:
    • nn.MaxPool2d(kernel_size, stride=None, ...)
    • nn.AvgPool2d(kernel_size, stride=None, ...)
  • Activation Functions:
    • nn.ReLU(), nn.Sigmoid(), nn.Tanh(), nn.LeakyReLU(), nn.Softmax(dim=...)
  • Normalization:
    • nn.BatchNorm1d(num_features), nn.BatchNorm2d(num_features)
  • Dropout:
    • nn.Dropout(p=0.5)
  • Recurrent:
    • nn.RNN(), nn.LSTM(), nn.GRU()
  • Embedding:
    • nn.Embedding(num_embeddings, embedding_dim) (for word embeddings)
  • Containers:
    • nn.Sequential(*layers): A linear stack of modules.
    • nn.ModuleList([module1, module2, ...]): Holds a list of submodules.
    • nn.ParameterList([param1, param2, ...]): Holds a list of parameters.
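
A small illustrative stack built from the layers above using nn.Sequential; the layer sizes are arbitrary, and nn.Flatten (not listed above) bridges the convolutional and linear parts:

import torch
import torch.nn as nn

# Tiny CNN for 1-channel 28x28 inputs (e.g. MNIST-sized images)
net = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # -> [N, 16, 28, 28]
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),                 # -> [N, 16, 14, 14]
    nn.Flatten(),                                # -> [N, 16 * 14 * 14]
    nn.Dropout(p=0.5),
    nn.Linear(16 * 14 * 14, 10),                 # -> [N, 10]
)

out = net(torch.randn(8, 1, 28, 28))  # batch of 8 random "images"
print(out.shape)                      # torch.Size([8, 10])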

3.2 Defining a Custom Neural Network

import torch.nn as nn
import torch.nn.functional as F

class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

model = SimpleNet(input_size=10, hidden_size=5, num_classes=2)
# print(model)
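
A quick forward pass through the model with a random batch (the batch size of 4 is arbitrary):

import torch

x = torch.randn(4, 10)   # batch of 4 samples with input_size=10 features
out = model(x)           # calls SimpleNet.forward under the hood
print(out.shape)         # torch.Size([4, 2]), one raw score per class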

4. Loss Functions (torch.nn and torch.nn.functional)

Calculate how far an output is from a target.

Loss Function | Class (nn) | Functional (F) | Use Case
Mean Squared Error | nn.MSELoss() | F.mse_loss(input, target) | Regression tasks
Cross Entropy | nn.CrossEntropyLoss() | F.cross_entropy(input, target) | Multi-class classification (input is raw scores/logits)
Binary Cross Entropy with Logits | nn.BCEWithLogitsLoss() | F.binary_cross_entropy_with_logits(input, target) | Binary classification (input is raw scores/logits)
Binary Cross Entropy | nn.BCELoss() | F.binary_cross_entropy(input, target) | Binary classification (input is probabilities in [0, 1])
L1 Loss (Mean Absolute Error) | nn.L1Loss() | F.l1_loss(input, target) | Robust regression
Negative Log Likelihood | nn.NLLLoss() | F.nll_loss(input, target) | Multi-class classification (input is log-probabilities)
Kullback-Leibler Divergence | nn.KLDivLoss() | F.kl_div(input, target) | Measuring the difference between two probability distributions
Margin Ranking Loss | nn.MarginRankingLoss() | F.margin_ranking_loss(input1, input2, target) | Ranking tasks
Multi Margin Loss (SVM-like) | nn.MultiMarginLoss() | F.multi_margin_loss(input, target) | Multi-class classification (SVM-style)
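
A minimal sketch of the two most common cases. nn.CrossEntropyLoss expects raw logits and integer class indices; nn.BCEWithLogitsLoss expects raw logits and float targets of the same shape:

import torch
import torch.nn as nn

# Multi-class: logits of shape [batch, num_classes], targets are class indices
logits = torch.randn(4, 3)                # 4 samples, 3 classes
targets = torch.tensor([0, 2, 1, 2])      # one class index per sample
loss = nn.CrossEntropyLoss()(logits, targets)

# Binary: logits and float targets of the same shape
bin_logits = torch.randn(4)
bin_targets = torch.tensor([0., 1., 1., 0.])
bin_loss = nn.BCEWithLogitsLoss()(bin_logits, bin_targets)

print(loss.item(), bin_loss.item())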

5. Optimizers (torch.optim)

Update model weights to minimize the loss.

Optimizer | Class (optim) | Description
Stochastic Gradient Descent | optim.SGD(model.parameters(), lr=0.01, momentum=0.9) | Basic gradient descent; supports momentum.
Adam | optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-08) | Adaptive moment estimation; a popular default with generally good performance.
RMSprop | optim.RMSprop(model.parameters(), lr=0.01, alpha=0.99) | Adaptive learning rate optimizer.
Adagrad | optim.Adagrad(model.parameters(), lr=0.01) | Adaptive learning rates; well suited to sparse data.
Adadelta | optim.Adadelta(model.parameters(), lr=1.0, rho=0.9) | Adaptive learning rate optimizer; less sensitive to the learning rate hyperparameter.

Optimization Step

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Inside training loop:
optimizer.zero_grad() # Clear previous gradients
loss.backward()       # Compute gradients of loss w.r.t. model parameters
optimizer.step()      # Update model parameters

6. Data Loading (torch.utils.data)

Efficiently load data in batches.

6.1 Dataset

Abstract class representing a dataset. Your custom dataset should inherit from it and implement __len__ and __getitem__.

from torch.utils.data import Dataset

class CustomDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data # a tensor or list of tensors
        self.labels = labels # a tensor or list of tensors

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        sample = self.data[idx]
        label = self.labels[idx]
        return sample, label

# Example usage:
# dataset = CustomDataset(some_tensor_data, some_tensor_labels)

6.2 DataLoader

Wraps a Dataset to provide iterators for easy batching, shuffling, and multiprocessing.

from torch.utils.data import DataLoader

# Create dummy data and labels
dummy_data = torch.randn(100, 10) # 100 samples, 10 features
dummy_labels = torch.randint(0, 2, (100,)) # 100 binary labels

dataset = CustomDataset(dummy_data, dummy_labels)

dataloader = DataLoader(
    dataset,
    batch_size=32,
    shuffle=True,
    num_workers=4 # Number of worker subprocesses; use 0 to load data in the main process
)

# Iterate through data
for batch_idx, (inputs, targets) in enumerate(dataloader):
    # inputs.shape will be [32, 10] (or less for last batch)
    # targets.shape will be [32]
    pass

7. GPU Usage (CUDA)

Move models and tensors to GPU for accelerated computation.

# 1. Check for CUDA availability
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# 2. Move tensor to device
x = torch.randn(3, 3).to(device)

# 3. Move model to device
model = SimpleNet(10, 5, 2).to(device)

# Ensure all inputs to the model are also on the same device
# inputs = inputs.to(device)
# targets = targets.to(device)

8. Saving and Loading Models

8.1 Saving

Recommended: Save state_dict (parameters only).

# Save model parameters
torch.save(model.state_dict(), 'model_weights.pth')

# Save the entire model (not recommended: pickled models are fragile across code and PyTorch version changes)
# torch.save(model, 'entire_model.pth')

8.2 Loading

# 1. Instantiate the model architecture
model = SimpleNet(input_size=10, hidden_size=5, num_classes=2)

# 2. Load the state_dict
model.load_state_dict(torch.load('model_weights.pth'))

# 3. Set model to evaluation mode (important for BatchNorm, Dropout)
model.eval()

# For inference:
# with torch.no_grad():
#     output = model(input_tensor)

# To load entire model (if saved that way):
# model = torch.load('entire_model.pth')
# model.eval()

9. Training Loop Structure

# 0. Imports (SimpleNet and DataLoader are defined in the sections above)
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

# 1. Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# 2. Hyperparameters
input_size = 784 # For MNIST
hidden_size = 500
num_classes = 10
num_epochs = 5
batch_size = 100
learning_rate = 0.001

# 3. Dataset and DataLoader (example for MNIST; requires torchvision and
#    `import torchvision.transforms as transforms`). The loop below assumes
#    train_loader and test_loader are defined.
# train_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=transforms.ToTensor(), download=True)
# test_dataset = torchvision.datasets.MNIST(root='./data', train=False, transform=transforms.ToTensor())
# train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
# test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)

# 4. Model instantiation
model = SimpleNet(input_size, hidden_size, num_classes).to(device)

# 5. Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# 6. Training loop
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Move tensors to the configured device
        images = images.reshape(-1, input_size).to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i+1) % 100 == 0:
            print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{len(train_loader)}], Loss: {loss.item():.4f}')

# 7. Evaluation (on test set)
model.eval() # Set model to evaluation mode
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, input_size).to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print(f'Accuracy of the network on the 10000 test images: {100 * correct / total} %')