{ "metadata": { "language": "lua", "name": "", "signature": "sha256:a01b499df5ec44b66ed7a020c9df130067282667bb649a8d92a8da1cd1b58f54" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "I learned torch about a year ago. I am not as advanced as Soumith by far.\n", "But for me the learning process is fresh in memory, so I want to hold your hand to take you through some critical and insightful part of my learning process as I remember it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Parallels (roughly)\n", "+ python -> lua\n", "+ numpy -> torch\n", "+ sklearn -> nn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The docs: https://github.com/torch/nn You HAVE to know this URL by heart. For more advanced modules, you're often better off just reading the code." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is probably the number one most important page of the docs to understand how nn works. Read every single word of it. https://github.com/torch/nn/blob/master/doc/module.md" ] }, { "cell_type": "code", "collapsed": false, "input": [ "require 'nn' -- auto imports torch too. \n", "-- quick hint, You can do \"require 'cunn'\" if on a gpu machine, then you have all you need." ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Exercise 1: nn.Linear" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below I make a single linear layer. What is the size of one input sample? What is the batch size? What is the output size? Print the weights and the bias. Do you find the state variables as defined in the Module documentation?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "net = nn.Linear(3,4)\n", "net:zeroGradParameters() --i'll come back to this later\n", "inp = torch.randn(2,3)\n", "out = net:forward(inp)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok so now imagine calculating a loss and receiving a gradient to the output of the \"net\" layer. This gradient needs to have the same size as the output: the derivative of a scalar (=the objective) with respect to each of the net's outputs.\n", "\n", "Backpropagate through the network." ] }, { "cell_type": "code", "collapsed": false, "input": [ "gradientout = torch.randn(2,4)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What is the size of the gradient wrt the network's inputs?" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "During backpropping through the linear layer, not only the gradient wrt input is calculated, also the gradient wrt the parameters and the bias. Find these gradients." ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Take a look at the net's weight and the bias. Scroll up and take a look at the weight & bias before backprop. Explain" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Do the exact same backprop step and look at the gradBias and gradWeight again. Explain" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To make sure you understand what's going on with an update of the weights, let's first set all weights to zero. Now apply an update with learning rate 0.1 and print the new weights." ] }, { "cell_type": "code", "collapsed": false, "input": [ "net.weight:zero()\n", "net.bias:zero()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, now we did our gradient udpate and want to process a new minibatch. Clear the gradient of the bias and gradient of the weights." ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Convolutional layer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At home: do all the previous exercises for a convolutional layer.\n", "\n", "nn.SpatialConvolution(nInputPlane, nOutputPlane, kW, kH, [dW], [dH])\n", "\n", "I make batchsize 1 and output feature maps 1 so you can still print stuff easily." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below I make a single convolutional layer. What is the output size? How does it compare to the input size? Take pen and paper and draw the convolution if you don't understand the relation between input, output and kernel size.\n", "\n", "Print the weights and the bias. Do you find the state variables as defined in the Module documentation?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "net = nn.SpatialConvolution(2,1,3,3)\n", "inp = torch.randn(1,2,5,5) -- batch of 1, 2 input channels, 5x5 image\n", "out = net:forward(inp)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok so now imagine calculating a loss and receiving a gradient to the output of the \"net\" layer. This gradient needs to have the same size as the output: the derivative of a scalar (=the objective) with respect to each of the net's outputs.\n", "\n", "Backpropagate through the network." ] }, { "cell_type": "code", "collapsed": false, "input": [ "gradientout = torch.randn(1,1,3,3)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What is the size of the gradient wrt the network's inputs?" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "During backpropping through the linear layer, not only the gradient wrt input is calculated, also the gradient wrt the parameters and the bias. Find these gradients." ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Take a look at the net's weight and the bias. Scroll up and take a look at the weight & bias before backprop. Explain" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Do the exact same backprop step and look at the gradBias and gradWeight again. Explain" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To make sure you understand what's going on with an update of the weights, let's first set all weights to zero. Now apply an update with learning rate 0.1 and print the new weights." ] }, { "cell_type": "code", "collapsed": false, "input": [ "net.weight:zero()\n", "net.bias:zero()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, now we did our gradient udpate and want to process a new minibatch. Clear the gradient of the bias and gradient of the weights." ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Extra" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How are linear layers' weights and biases initialized? What about conv layers? (look in the code)\n", "\n", "What about gradWeight and gradBias? (test it out with small linear layer)" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }