{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Numpy\n", "\n", "Numpy is Python's library for numerical linear algebra, and provides a wealth of functionality for working with vectors, matrices, and tensors.\n", "\n", "Our first step is to **import** the package (note the \"import as\" shortcut):" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Arrays are the standard data containers in numpy, and can have any number of dimensions.\n", "\n", "Let's create a one-dimensional array:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 3])" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" }, { "name": "stdout", "output_type": "stream", "text": [ "<class 'numpy.ndarray'>\n", "(3,)\n" ] } ], "source": [ "a = np.array([1, 2, 3])\n", "a\n", "print(type(a))\n", "print(a.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Accessing/modifying array elements:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 2 3\n" ] }, { "data": { "text/plain": [ "array([5, 2, 3])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(a[0], a[1], a[2])\n", "a[0] = 5 # Change an element of the array\n", "a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Two-dimensional arrays:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([[1, 2, 3],\n", " [4, 5, 6]]), (2, 3))" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" }, { "name": "stdout", "output_type": "stream", "text": [ "1 2 4\n", "1 2 4\n" ] } ], "source": [ "b = np.array([[1,2,3],[4,5,6]])\n", "b,b.shape\n", "\n", "print(b[0, 0], b[0, 1], b[1, 0])\n", "print(b[0][0], 
b[0][1], b[1][0])\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some functions for creating arrays:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0. 0.]\n", " [ 0. 0.]]\n", "[[ 1. 1.]\n", " [ 1. 1.]]\n", "[[ 7. 7.]\n", " [ 7. 7.]]\n", "[[ 1. 0.]\n", " [ 0. 1.]]\n", "[[ 0.13849416 0.46435903 0.9068119 ]\n", " [ 0.86278242 0.66594523 0.71520984]\n", " [ 0.65630407 0.39285181 0.99686805]]\n", "[ 2. 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9]\n", "[ 1. 1.6 2.2 2.8 3.4 4. ]\n" ] } ], "source": [ "a = np.zeros((2,2)) # Create an array of zeros\n", "print(a)\n", "\n", "b = np.ones((2,2)) # Create an array of ones\n", "print(b)\n", "\n", "c = np.full((2,2), 7.0) # Create a constant array (can also be done with the ones function)\n", "print(c)\n", "\n", "\n", "d = np.eye(2) # Create a 2x2 identity matrix\n", "print(d)\n", "\n", "e = np.random.random((3,3))\n", "print(e) \n", "\n", "f = np.arange(2, 3, 0.1)\n", "print(f)\n", "\n", "g = np.linspace(1., 4., 6)\n", "print(g)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sidenote - getting **help** on python objects:\n", "\n", "For getting help e.g. 
on the Numpy **linspace** function you can do one of the following:\n", "\n", "```python\n", "?np.linspace\n", "```\n", "\n", "or\n", "\n", "```python\n", "help(np.linspace)\n", "```" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Array indexing\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1, 2, 3, 4],\n", " [ 5, 6, 7, 8],\n", " [ 9, 10, 11, 12]])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])\n", "X" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll think of this matrix as containing three feature vectors: each row of the matrix is a vector of features, and each column is the set of values that a given feature has in the data.\n", "\n", "To access the vector of features of the first example in the dataset:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([1, 2, 3, 4]), (4,))" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "row = X[0] # the first row of X\n", "row, row.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To access a column of the matrix (a single feature):" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([1, 5, 9]), (3,))" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "col = X[:, 0]\n", "col, col.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are multiple ways of accessing sub-arrays. 
The first uses **slices**, similarly to python lists:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([[ 6, 7, 8],\n", " [10, 11, 12]]), (2, 3))" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "submatrix = X[1:3, 1:4]\n", "submatrix, submatrix.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here's my favorite way of indexing an array, using an integer array:\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1, 2, 3, 4],\n", " [ 9, 10, 11, 12]])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X[ [0, 2] ] # extract a given set of rows" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1, 3],\n", " [ 5, 7],\n", " [ 9, 11]])" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X[:, [0,2]] # extract a given set of columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Operations on arrays\n", "\n", "You can multiply an array by a scalar:\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([2, 2])" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.array([1,1])\n", "x * 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Add arrays:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([2, 1])" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x + np.array([1,0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dot products " ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "X = np.array([[1,2,3,4], 
[5,6,7,8], [9,10,11,12]])\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's construct a weight vector for a linear classifier:\n" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1, -1, 2, -1])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "w = np.array([1,-1, 2, -1])\n", "w" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can easily compute the dot/inner product of a row of \n", "X with the weight vector:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.dot(X[0], w)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or you can do it for the whole matrix at once:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 5, 9])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.dot(X, w)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This can also be done using methods:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 5, 9])" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X.dot(w)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and can also be achieved using" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 5, 9])" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.inner(X, w)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Other linear algebra operations\n", "\n", "Numpy has many other useful things it can do for you when it comes to vectors and matrices." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's very easy to find the inverse of a matrix:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0.66666667, -0.33333333, 0. ],\n", " [ 0. , 2. , -1. ],\n", " [-0.33333333, -1.33333333, 1. ]])" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Z = np.array([[2,1,1],[1,2,2],[2,3,4]])\n", "np.linalg.inv(Z)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we can easily verify that this is correct:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ True, True, True],\n", " [ True, True, True],\n", " [ True, True, True]], dtype=bool)" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.dot(Z, np.linalg.inv(Z))==np.eye(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can compute some useful statistics over your matrix, such as the mean and standard deviation:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(6.5, array([ 5., 6., 7., 8.]), array([ 2.5, 6.5, 10.5]))" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])\n", "X.mean(), X.mean(axis=0), X.mean(axis=1)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(3.4520525295346629,\n", " array([ 3.26598632, 3.26598632, 3.26598632, 3.26598632]),\n", " array([ 1.11803399, 1.11803399, 1.11803399]))" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X.std(), X.std(axis=0), X.std(axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are lots of other things you can do with numpy, such as stacking arrays vertically and horizontally:" ] }, { "cell_type": "code", "execution_count": 25, 
"metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 3, 4],\n", " [5, 6, 7, 8]])" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "array([1, 2, 3, 4, 5, 6, 7, 8])" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.array( [1,2,3,4] )\n", "y = np.array( [5,6,7,8] )\n", "np.vstack([x,y])\n", "np.hstack([x,y])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Avoid loops when you can\n", "\n", "Consider the following piece of code:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 2, 2, 4],\n", " [ 5, 5, 7],\n", " [ 8, 8, 10],\n", " [11, 11, 13]])" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])\n", "v = np.array([1, 0, 1])\n", "y = np.empty_like(x) # Create an empty matrix with the same shape as x\n", "\n", "# Add the vector v to each row of the matrix x with an explicit loop\n", "for i in range(4):\n", " y[i, :] = x[i, :] + v\n", "\n", "y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we know, loops are slow in python. There is a much more efficient way of doing this:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 2, 2, 4],\n", " [ 5, 5, 7],\n", " [ 8, 8, 10],\n", " [11, 11, 13]])" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])\n", "v = np.array([1, 0, 1])\n", "y = x + v\n", "y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is called **broadcasting**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Numpy documentation\n", "\n", "These were some of the basics that are relevant to our course. 
You can find more details in the [Numpy user manual](https://docs.scipy.org/doc/numpy/user/) and the detailed [reference guide](https://docs.scipy.org/doc/numpy/reference).\n", "The following is a good resource for learning how to [vectorize](http://www.labri.fr/perso/nrougier/from-python-to-numpy/) Python code using Numpy to obtain good performance." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some aspects of this tutorial were inspired by the CS231n [tutorial](http://cs231n.github.io/python-numpy-tutorial/)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 1 }