{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "PMvDlW7YuXD4"
},
"source": [
"# CS480/680, Spring 2023, Assignment 2\n",
"## Designer: Yimu Wang; Instructor: Hongyang Zhang\n",
"Released: June 5; Due: **Extended: June 29, noon**\n",
"\n",
"[Link of the Assignment](https://colab.research.google.com/drive/1WjC7sUNiBRPga7M011Qope7lDuKaxNzc#scrollTo=PMvDlW7YuXD4)\n",
"\n",
"\n",
"Tips:\n",
"- Please save a copy of this notebook to avoid losing your changes.\n",
"- Debug your code and ensure that it can run.\n",
"- Save the output of each cell. Failure to do so may result in your coding questions not being graded.\n",
"- **To accelerate the training time, you can choose 'Runtime' -> 'Change runtime type' -> 'Hardware accelerator' and set 'Hardware accelerator' to 'GPU'. With T4, all the experiments can be done in 5 minutes. In total, this notebook can be run in 20 minutes.** (You can either use or not use GPU to accelearte. It is your choice and it does not affect your grades.)\n",
"- Your grade is independent of the test accuracy (unless it is 10%, as 10% is the accuracy of random guess).\n",
"\n",
"Tips for sumbission:\n",
"- Do not change the order of the problems.\n",
"- Select 'Runtime' -> 'Run all' to run all cells and generate a final \"gradable\" version of your notebook and save your ipynb file.\n",
"- **We recommend using Chrome to generate the PDF.**\n",
"- Also use 'File' -> 'Print' and then print your report from your browser into a PDF file.\n",
"- **Submit both the .pdf and .ipynb files.**\n",
"- **We do not accept any hand-written report.**\n",
"\n"
]
},
{
"cell_type": "markdown",
"source": [
"## Question 0. **[Important] Please name your submission as {Last-name}\\_{First-name}\\_assignment2.ipynb and {Last-name}\\_{First-name}\\_assignment2.pdf. If you do not follow this rule, your grade of assignment 2 will be 0.**"
],
"metadata": {
"id": "AEVYyg3-FPMj"
}
},
{
"cell_type": "markdown",
"metadata": {
"id": "xhdVaFciIPvV"
},
"source": [
"## Question 1. Basics of MLP, CNN, and Transformers (40 points)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "M1TwKbq7IPvV"
},
"source": [
"### 1.1 Given an MLP with an input layer of size 100, two hidden layers of size 50 and 25, and an output layer of size 10, calculate the total number of parameters in the network. (10 points)\n",
"\n",
"This MLP network is a standard MLP with bias terms. The activation function is ReLU. But you do not need to consider the parameters of the activation function.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "D6e9OkDHIPvW"
},
"source": [
"Solution:\n",
"[Your answer here, You should give the full calculation process.]"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Axu9AelNIPvW"
},
"source": [
"### 1.2 Given the loss functions of [mean squared error (MSE)](https://en.wikipedia.org/wiki/Mean_squared_error) and [CE (cross-entropy) loss (between logits and target)](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html), and the predicted logit of a data example (before softmax) is [0.5, 0.3, 0.8], while the target of a data example is [0.3, 0.6, 0.1], calculate the value of the loss (MSE and CE). (10 points)\n",
"\n",
"The loss functions of MSE and CE are as follows,\n",
"$$\n",
"\\begin{aligned}\n",
" \\ell_{MSE}(\\hat{y}, y) &= \\sum_{i \\in C} (\\hat{y}_c - y_c)^2,\\\\\n",
" \\ell_{\\text{CE}}(\\hat{y}, y) &= -\\sum_{i=1}^{C} y_i \\log\\left(\\frac{\\exp(\\hat{y}_i)}{\\sum_{j \\in [C]}\\exp(\\hat{y}_j)}\\right)\\,,\n",
"\\end{aligned}\n",
"$$\n",
"where $\\hat{y}_i$ is the $i$-th element of predict logit (before softmax), $y$ is the $i$-th element of target, and $C$ is the number of classes.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jQtDx5rcIPvW"
},
"source": [
"Solution:\n",
"[Your answer here, You should give the full calculation process.]"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "W8s1lWXVIPvX"
},
"source": [
"### 1.3 Given a convolutional layer in a CNN with an input feature map of size 32x32, a filter size of 3x3, a stride of 1, and no padding, calculate the size of the output feature map. (10 points)\n",
"\n",
"You can refer to [PyTorch Convolutional Layer](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html) for the definition of the convolutional layer."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Tgx-bIK0IPvX"
},
"source": [
"Solution:\n",
"[Your answer here, You should give the full calculation process.]"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "njiSliQOIPvX"
},
"source": [
"### 1.4 Given a Transformer encoder with 1 layer, the input and output dimensions of attention of 256, and 8 attention heads, calculate the total number of parameters in the **self-attention mechanism**. (10 points)\n",
"\n",
"You can refer to [Attention is all you need (Transformer paper)](https://arxiv.org/abs/1706.03762) for reference."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "pcLrJswgIPvY"
},
"source": [
"Solution:\n",
"[Your answer here, You should give the full calculation process.]"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9oe7oy0jwbwu"
},
"source": [
"## Question 2. Implementation of Multi-Layer Perceptron and understanding Gradients (60 points)\n",
"\n",
"In this question, you will learn how to implment a Multi-Layer Perceptron (MLP) PyTorch, test the performance on CIFAR10, and learn what is gradient.\n",
"Please refer to [CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html) and [PyTorch](https://pytorch.org/) for details.\n",
"\n",
"Assuming the MLP follows the following construction\n",
"$$\n",
" \\hat{y} = \\operatorname{softmax}( W_2 \\operatorname{sigmoid}(W_1x + c_1) + c_2)\\,,\n",
"$$\n",
"where $\\hat{y}$ is the prediction (probability) of the input $x$ by the MLP and $W_1$, $W_2$, $c_1$, and $c_2$ are four learnable parameters of the MLP.\n",
"$\\operatorname{softmax}$ and $\\operatorname{sigmoid}$ are the [softmax](https://en.wikipedia.org/wiki/Softmax_function) and [sigmoid](https://en.wikipedia.org/wiki/Sigmoid_function) function.\n",
"\n",
"Please note that the label is one-hot vectors, i.e., $y_i$ are either 0 or 1. And the sum of prediction is equal to 1, i.e., $\\sum_{i\\in[C]}\\hat{y}_i = 1$.\n",
"However, usually in PyTorch, the labels are integers. You may need to transfer integers into one-hot vectors.\n",
"\n",
"Tips: The process of using SGD to train a model is as follows:\n",
"1. Initialize the model parameters.\n",
"2. Calculate the gradients of the model parameters.\n",
"3. Update the model parameters using the gradients.\n",
"4. Test the model.\n",
"5. Repeat 2 - 4 until the model converges or for a given epochs.\n",
"\n",
"You do not need to implement the training procedure. Please follow the instruction of each question. This is just set to help you understanding the training procedure.\n",
"\n",
"\n",
"You can also refer to [SGD](https://optimization.cbe.cornell.edu/index.php?title=Stochastic_gradient_descent) and [PyTorch Tutorial](https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html) for inspiration.\n",
"\n",
"**Please note that you are allowed to use any PyTorch api and any public code to complete the following coding questions. This assignment is help you to learn PyTorch.**"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kX9w6dh7IPvY"
},
"source": [
"### 2.0 Get CIFAR10 data (0 points)\n",
"\n",
"We will show you how to get the data using PyTorch. **You are not allowed to edit the following code.**\n",
"Please see [Dataset and DataLoaders](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html) for details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "0d-o30eEIPvY"
},
"outputs": [],
"source": [
"### YOU ARE NOT ALLOWED TO EDIT THE FOLLOWING CODE. ###\n",
"import torch\n",
"import torchvision.datasets as datasets\n",
"import torchvision.transforms as transforms\n",
"from torch.utils.data import DataLoader\n",
"\n",
"def get_CIFAR10():\n",
" # Set the random seed for reproducibility\n",
" torch.manual_seed(480)\n",
"\n",
" # Load the CIFAR10 dataset and apply transformations\n",
" train_dataset = datasets.CIFAR10(root='./data', train=True,\n",
" transform=transforms.ToTensor(),\n",
" download=True)\n",
" test_dataset = datasets.CIFAR10(root='./data', train=False,\n",
" transform=transforms.ToTensor())\n",
"\n",
" # Define the data loaders\n",
" batch_size = 100\n",
" train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)\n",
" test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)\n",
" return train_dataset, test_dataset, train_loader, test_loader"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "N6YjwwQFIPvZ"
},
"source": [
"### 2.1 Implement the MLP with PyTorch (15 points)\n",
"\n",
"The shape of $W_1$ and $W_2$ should be $256 \\times 3072$ and $10 \\times 256$.\n",
"\n",
"You can refer to [Define a NN](https://pytorch.org/tutorials/recipes/recipes/defining_a_neural_network.html) for inspiration.\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "36NZgKUsIPvZ"
},
"outputs": [],
"source": [
"import torch.nn as nn\n",
"import torchvision.models as models\n",
"\n",
"##### COMPLETE THE FOLLOWING CODE, YOU ARE ALLOWED TO CHANGE THE SKLETON CODE#####\n",
"\n",
"# Define the MLP model in PyTorch\n",
"class MLPinPyTorch(nn.Module):\n",
" def __init__(self, input_dim=3072, hidden_dim=256, output_dim=10):\n",
" super(MLPinPyTorch, self).__init__()\n",
" # TODO: Your code here: init with parameters\n",
" # you are allowed to use any initialization method\n",
" pass\n",
"\n",
" def forward(self, x):\n",
" # TODO: Your code here: forward\n",
" pass"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Xtuen5bgGA94"
},
"source": [
"### 2.2 Calculate the accuracy given true labels and predicted labels (5 points)\n",
"\n",
"You should complete the following code, including the calculation of the accuracy."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "MrpblVbBGA94"
},
"outputs": [],
"source": [
"##### COMPLETE THE FOLLOWING CODE, YOU ARE ALLOWED TO CHANGE THE SKLETON CODE#####\n",
"def accuracy(y, predicted):\n",
" # TODO: Your code here: calculate accuracy\n",
" pass"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_SKtKLNSIPva"
},
"source": [
"### 2.3 Test your implementation on CIFAR10 and reports the accuracy on training and testing datasets (20 points)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "3FOpS2fHIPva"
},
"outputs": [],
"source": [
"device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n",
"##### COMPLETE THE FOLLOWING CODE, YOU ARE ALLOWED TO CHANGE THE SKLETON CODE#####\n",
"def test(model, train_dataloader, test_dataloader):\n",
" # TODO: Your code here: calculate accuracy\n",
" predict_train = []\n",
" predict_test = []\n",
" train_labels = []\n",
" test_labels = []\n",
"\n",
" for inputs, labels in enumerate(train_dataloader):\n",
" inputs = inputs.view(-1, 3072).to(device)\n",
" train_labels.append(labels)\n",
"\n",
" # Forward pass\n",
"\n",
" for inputs, labels in enumerate(test_dataloader):\n",
" inputs = inputs.view(-1, 3072).to(device)\n",
" test_labels.append(labels)\n",
"\n",
" # Forward pass\n",
"\n",
" train_acc = accuracy(train_labels, predict_train)\n",
" test_acc = accuracy(test_labels, predict_test)\n",
" return train_acc, test_acc\n",
"\n",
"###### the following is served for you to check the functions #####\n",
"##### you can change but ensure the following code can output the accuracies #####\n",
"model = MLPinPyTorch()\n",
"train_dataset, test_dataset, train_loader, test_loader = get_CIFAR10()\n",
"train_acc, test_acc = test(model, train_loader, test_loader)\n",
"print(\"train_acc: \", train_acc)\n",
"print(\"test_acc: \", test_acc)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1vwuOX6dIPva"
},
"source": [
"### 2.4 Calculate the gradients $\\frac{\\partial \\ell}{\\partial W_1}$, $\\frac{\\partial \\ell}{\\partial c_1}$, $\\frac{\\partial \\ell}{\\partial W_2}$, and $\\frac{\\partial \\ell}{\\partial c_2}$ using algebra given a data example $(x, y)$ (20 points)\n",
"\n",
"The loss function we use for training the MLP is the [log-loss](https://scikit-learn.org/stable/modules/model_evaluation.html#log-loss), which is defined as follows:\n",
"$$\n",
" \\ell(\\hat{y}, y) = -\\sum_{i=1}^{C} y_i \\log(\\hat{y}_i)\\,,\n",
"$$\n",
"where $C$ is the number of classes, $y_i$ and $\\hat{y}_i$ are the $i$-th index of $y$ and $\\hat{y}$.\n",
"\n",
"Considering the MLP model in question 2, please use chainrule to calculate the gradients. You can directly write LaTex/Markdown in the following cell."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EyP0Yv4zIPva"
},
"source": [
"Solution:\n",
"[Your answer here]"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.16"
}
},
"nbformat": 4,
"nbformat_minor": 0
}