Python is available for your favorite platform on the downloads page (or you can choose the Anaconda distribution instead).
You can use the Python interpreter interactively by typing *python* in a terminal window. For data analysis, however, we recommend IPython, a nicer front end to Python that works with both a regular Python install and the Anaconda distribution. If you have installed it (or are using department machines), invoke it with:

ipython

To quit, press Ctrl-D.

To run python code in a file *code.py*, either type

run code.py

in the *ipython* interpreter, or

python code.py

at the unix command line.
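For concreteness, here is a minimal sketch of what a *code.py* might contain. The function `circle_area` is a hypothetical example, not part of any course material. The `if __name__ == "__main__":` guard is a common convention that keeps the print statement from running when the file is imported as a module.

```python
# code.py: a hypothetical script used only for illustration.
import math


def circle_area(radius):
    """Return the area of a circle with the given radius."""
    return math.pi * radius ** 2


if __name__ == "__main__":
    # This branch runs under `python code.py` and under `run code.py` in IPython.
    print(circle_area(1.0))
```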

In addition to Python statements and expressions, *ipython* lets you type shell commands and its own special magic commands, and it integrates well with matplotlib, a widely used Python plotting library.

You can use Python as a calculator. For example, what is the value of $(100\cdot 2 - 12^2) / 7 \cdot 5 + 2$?

In [301]: (100*2 - 12**2) / 7*5 + 2
Out[301]: 42.0
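One caveat is worth sketching: the type of the result depends on the Python version and on which division operator you use. The snippet below assumes Python 3, where `/` is true division and always returns a float, while `//` is floor division. The variable names are ours, chosen for illustration.

```python
# Assumes Python 3: / is true division, // is floor division.
numerator = 100 * 2 - 12 ** 2        # ** is exponentiation, so this is 200 - 144

# / and * have equal precedence and are evaluated left to right.
true_div = numerator / 7 * 5 + 2     # (56 / 7) * 5 + 2
floor_div = numerator // 7 * 5 + 2   # integer arithmetic throughout

print(true_div)    # 42.0
print(floor_div)   # 42
```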

In order to compute something like $\sin(\pi/2)$ we first need to *import* the *math* module:

In [303]: import math
In [304]: math.sin(math.pi/2)
Out[304]: 1.0

How do I find out what other mathematical functions are available?

help("math")
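A programmatic alternative, sketched here, is the built-in `dir`, which lists the names a module defines. Filtering out names that begin with an underscore is our own choice, made only to keep the listing readable.

```python
import math

# Public names defined by the math module (functions and constants).
public_names = [name for name in dir(math) if not name.startswith("_")]
print(public_names)

# A few of them in action.
print(math.sqrt(16))     # 4.0
print(math.floor(2.7))   # 2
print(math.cos(0.0))     # 1.0
```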

Can I work with vectors and matrices in python?

Of course! Every data analysis tool worth its bytes should support them.
The `numpy` package provides the required magic.

Vectors and matrices are all represented as numpy arrays. First, some vectors:

In [1]: import numpy as np
In [2]: x = np.array([1,1])

We can multiply a vector by a scalar:

In [3]: x * 2
Out[3]: array([2, 2])

And we can add vectors:

In [4]: x + np.array([1,0])
Out[4]: array([2, 1])
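The point of these two operations is that arithmetic on numpy arrays is applied element by element. The following sketch contrasts that with plain Python lists, where `*` and `+` mean repetition and concatenation. The vector `y` and the other variable names are ours.

```python
import numpy as np

x = np.array([1, 1])
y = np.array([1, 0])

scaled = x * 2           # every element is multiplied by 2
total = x + y            # elements are added pairwise
combo = 3 * x - 2 * y    # a linear combination of the two vectors

# Plain Python lists behave differently: * repeats and + concatenates.
repeated = [1, 1] * 2
joined = [1, 1] + [1, 0]

print(scaled, total, combo)   # [2 2] [2 1] [1 3]
print(repeated, joined)       # [1, 1, 1, 1] [1, 1, 1, 0]
```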

After we introduce matrices, we'll show how to do inner products.

Let's create an array that represents the following matrix: \[\left ( \begin{array}{cc} 1 & 2\\ 4 & 3\\ 5 & 6 \end{array} \right ) \]

In [18]: X = np.array([[1,2], [4,3], [5,6]])
In [19]: X
Out[19]:
array([[1, 2],
       [4, 3],
       [5, 6]])

We'll think of $X$ as the feature matrix of a machine learning dataset. Row $i$ of the matrix holds the features of the $i$th example; indexing starts at 0, so the first example is:

In [20]: X[0]
Out[20]: array([1, 2])

To access a column of the matrix (a single feature):

In [21]: X[:,0]
Out[21]: array([1, 4, 5])
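To make the rows-as-examples, columns-as-features convention concrete, here is a small sketch that reads the dataset dimensions from `X.shape` and pulls out one row and one column. The variable names are illustrative.

```python
import numpy as np

X = np.array([[1, 2], [4, 3], [5, 6]])

# shape is (number of rows, number of columns).
n_examples, n_features = X.shape

second_example = X[1]      # a whole row: the features of one example
second_feature = X[:, 1]   # a whole column: one feature across all examples

print(n_examples, n_features)   # 3 2
print(second_example)           # [4 3]
print(second_feature)           # [2 3 6]
```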

Let's construct a weight vector for a linear classifier:

In [20]: w = np.array([1,-1])

We can easily compute the dot/inner product of a row of $X$ with the weight vector:

In [21]: np.inner(w, X[0])
Out[21]: -1

We can even compute the inner products for all the rows of the matrix all at once:

In [22]: np.inner(w, X)
Out[22]: array([-1, 1, -1])
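As a sketch of how these inner products could feed a linear classifier, the code below checks that the one-shot call matches a row-by-row loop and then turns each score into a label. The decision rule (label +1 when the score is non-negative, otherwise -1) is our assumption for illustration and is not specified in the text above.

```python
import numpy as np

X = np.array([[1, 2], [4, 3], [5, 6]])
w = np.array([1, -1])

# One inner product per row, computed in a single call.
scores = np.inner(w, X)

# The same thing spelled out as an explicit loop over the rows.
loop_scores = [int(np.inner(w, row)) for row in X]

# Assumed decision rule: +1 if the score is >= 0, otherwise -1.
labels = np.where(scores >= 0, 1, -1)

print(scores)        # [-1  1 -1]
print(loop_scores)   # [-1, 1, -1]
print(labels)        # [-1  1 -1]
```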

Let's construct another matrix:

In [33]: A = np.ones((2,3)) * 2
In [34]: A
Out[34]:
array([[ 2.,  2.,  2.],
       [ 2.,  2.,  2.]])

Let's look for a way to compute the matrix product $A \times X$. Our first guess would be to try the multiplication operator, since we saw above that we can multiply a vector by a scalar:

In [36]: A * X
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-36-578836d375ce> in <module>()
----> 1 A * X

ValueError: operands could not be broadcast together with shapes (2,3) (3,2)

So it didn't work: $A$ has shape (2,3) and $X$ has shape (3,2). You can multiply matrices that are the same shape using the `*` operator, but it performs component-wise multiplication rather than a matrix product. Instead, use the `numpy` function `dot` to do matrix multiplication:

In [37]: np.dot(A, X)
Out[37]:
array([[ 20.,  22.],
       [ 20.,  22.]])

It turns out that `dot` is also a method of arrays, so you can also do:

In [38]: A.dot(X)
Out[38]:
array([[ 20.,  22.],
       [ 20.,  22.]])
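The difference between the two kinds of multiplication can be sketched side by side. The snippet also uses the `@` operator, which is not mentioned above: it is Python's matrix-multiplication operator (available from Python 3.5) and for 2-D arrays gives the same result as `dot`.

```python
import numpy as np

A = np.ones((2, 3)) * 2
X = np.array([[1, 2], [4, 3], [5, 6]])

product = np.dot(A, X)   # matrix product: (2, 3) times (3, 2) gives (2, 2)
same_product = A @ X     # the @ operator computes the same matrix product
elementwise = A * A      # shapes match, so * multiplies entry by entry

print(product)             # every row is [20. 22.]
print(elementwise.shape)   # (2, 3)
```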

An array is transposed with the `transpose` method or, equivalently, the `T` attribute:

In [39]: A.transpose()
Out[39]:
array([[ 2.,  2.],
       [ 2.,  2.],
       [ 2.,  2.]])

In [41]: A.T
Out[41]:
array([[ 2.,  2.],
       [ 2.,  2.],
       [ 2.,  2.]])
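As a quick sanity check on how transposition interacts with the matrix product, this sketch verifies the identity $(AX)^T = X^T A^T$ numerically for the two arrays used above. It illustrates the identity on one example and is not a proof.

```python
import numpy as np

A = np.ones((2, 3)) * 2
X = np.array([[1, 2], [4, 3], [5, 6]])

print(A.shape, A.T.shape)   # (2, 3) (3, 2)

# Transposing a product reverses the order of the factors.
lhs = np.dot(A, X).T
rhs = np.dot(X.T, A.T)

print(np.array_equal(lhs, rhs))   # True
```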

And let's take a look at all the methods and attributes that an array has (in IPython, type `A.` and press Tab):

In [39]: A.<TAB>
A.T             A.cumsum        A.min           A.shape
A.all           A.data          A.nbytes        A.size
A.any           A.diagonal      A.ndim          A.sort
A.argmax        A.dot           A.newbyteorder  A.squeeze
A.argmin        A.dtype         A.nonzero       A.std
A.argpartition  A.dump          A.partition     A.strides
A.argsort       A.dumps         A.prod          A.sum
A.astype        A.fill          A.ptp           A.swapaxes
A.base          A.flags         A.put           A.take
A.byteswap      A.flat          A.ravel         A.tobytes
A.choose        A.flatten       A.real          A.tofile
A.clip          A.getfield      A.repeat        A.tolist
A.compress      A.imag          A.reshape       A.tostring
A.conj          A.item          A.resize        A.trace
A.conjugate     A.itemset       A.round         A.transpose
A.copy          A.itemsize      A.searchsorted  A.var
A.ctypes        A.max           A.setfield      A.view
A.cumprod       A.mean          A.setflags

Elements and sub-matrices are easily extracted:

In [42]: X
Out[42]:
array([[1, 2],
       [4, 3],
       [5, 6]])

In [43]: X[0,0]
Out[43]: 1

In [44]: X[-1,-1]
Out[44]: 6

In [46]: X[0:2, 0:2]
Out[46]:
array([[1, 2],
       [4, 3]])

# my favorite way of indexing: using an array!
In [47]: X[ [0,2] ]
Out[47]:
array([[1, 2],
       [5, 6]])
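Two details of indexing are easy to miss, so here is a short sketch. A slice returns a view that shares memory with the original array, while indexing with a list of row numbers returns a copy. The boolean-mask example at the end goes slightly beyond the session above: it selects rows by a condition rather than by position.

```python
import numpy as np

X = np.array([[1, 2], [4, 3], [5, 6]])

block = X[0:2, 0:2]   # slicing gives a view onto X
picked = X[[0, 2]]    # indexing with a list of rows gives a copy

print(np.shares_memory(block, X))    # True
print(np.shares_memory(picked, X))   # False

# Boolean mask: keep the rows whose first feature is greater than 2.
mask = X[:, 0] > 2
big_rows = X[mask]

print(mask)       # [False  True  True]
print(big_rows)   # rows [4 3] and [5 6]
```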

How do I find the inverse of a matrix?

In [2]: z = np.array([[2,1,1],[1,2,2],[2,3,4]])
In [3]: z
Out[3]:
array([[2, 1, 1],
       [1, 2, 2],
       [2, 3, 4]])

In [4]: np.linalg.inv(z)
Out[4]:
array([[ 0.66666667, -0.33333333,  0.        ],
       [ 0.        ,  2.        , -1.        ],
       [-0.33333333, -1.33333333,  1.        ]])

In [5]: np.dot(z, np.linalg.inv(z))
Out[5]:
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])
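A common reason to want an inverse is to solve a linear system $z\,s = b$. As a sketch, the code below solves such a system with `np.linalg.solve`, which does not form the inverse explicitly, and compares the answer with the one obtained through `np.linalg.inv`. The right-hand side `b` is a made-up example, and `np.allclose` is used because floating-point results are rarely exactly equal.

```python
import numpy as np

z = np.array([[2, 1, 1], [1, 2, 2], [2, 3, 4]])
b = np.array([1.0, 2.0, 3.0])   # hypothetical right-hand side

solution = np.linalg.solve(z, b)             # solves z @ s = b directly
via_inverse = np.dot(np.linalg.inv(z), b)    # same answer via the inverse

# Compare with a tolerance rather than with ==.
print(np.allclose(solution, via_inverse))    # True
print(np.allclose(np.dot(z, solution), b))   # True
```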

Let's get on to that all-important step of visualizing data. We will use the matplotlib Python package for that. Let's start by plotting the function $f(x) = x^2$.

First, let's generate the numbers. Well, there are tons of ways to do so. Python has some nifty syntax for generating lists. Watch this! A list comprehension!!

In [9]: f = [i**2 for i in range(10)]
In [10]: f
Out[10]: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

There's an alternative way of doing this using `numpy`:

In [12]: f = np.arange(10)**2
In [13]: f
Out[13]: array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])
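The two approaches produce the same values, as the sketch below checks. It also shows `np.linspace`, which is not used above but is handy when you want evenly spaced non-integer sample points; the endpoints and the number of points here are arbitrary choices.

```python
import numpy as np

squares_list = [i**2 for i in range(10)]   # list comprehension
squares_array = np.arange(10)**2           # vectorized numpy version

print(squares_array.tolist() == squares_list)   # True

# Evenly spaced sample points: 7 values from 0 to 3 inclusive.
xs = np.linspace(0, 3, 7)
ys = xs**2

print(xs)   # 0, 0.5, 1, ..., 3
```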

To plot the data, first import the `pyplot` module.

In [6]: import matplotlib.pyplot as plt
In [7]: plt.plot(range(10), f, 'ob')
Out[7]: [<matplotlib.lines.Line2D at 0x10549b590>]

In order to actually see the plot you need to do:

In [8]: plt.show()

As an alternative, you can put matplotlib in interactive mode before plotting using the command `plt.ion()`.
Also note that plotting functions accept either Python lists or `numpy` arrays.

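We can add a second plot to the same axes by calling *plot* again. Here `x` is redefined to hold the values 0 through 9, so the red diamonds trace the line $y = x$:

In [16]: x = np.arange(10)
In [17]: plt.plot(x, x, 'dr')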
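Putting the plotting steps together, here is a sketch of a complete script. It makes several assumptions of its own: it selects the non-interactive `Agg` backend so that it runs without a display, it uses the `fig, ax = plt.subplots()` style instead of the `plt.plot` calls shown above, and it writes the image into an in-memory buffer rather than a file. The axis labels and legend text are illustrative.

```python
import io

import matplotlib
matplotlib.use("Agg")   # assumption: no display; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)
f = x ** 2

fig, ax = plt.subplots()
ax.plot(x, f, "ob", label="x squared")   # blue circles
ax.plot(x, x, "dr", label="x")           # red diamonds on the same axes
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.legend()

# Render the figure as PNG bytes; pass a filename instead to save to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")

print(len(ax.lines))   # 2
```

In an interactive session you would typically skip the backend line and the buffer, and call `plt.show()` instead.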