$\newcommand{\xv}{\mathbf{x}} \newcommand{\Xv}{\mathbf{X}} \newcommand{\yv}{\mathbf{y}} \newcommand{\zv}{\mathbf{z}} \newcommand{\av}{\mathbf{a}} \newcommand{\Wv}{\mathbf{W}} \newcommand{\wv}{\mathbf{w}} \newcommand{\tv}{\mathbf{t}} \newcommand{\Tv}{\mathbf{T}} \newcommand{\Vv}{\mathbf{V}} \newcommand{\Yv}{\mathbf{Y}} \newcommand{\Zv}{\mathbf{Z}} \newcommand{\muv}{\boldsymbol{\mu}} \newcommand{\sigmav}{\boldsymbol{\sigma}} \newcommand{\phiv}{\boldsymbol{\phi}} \newcommand{\Phiv}{\boldsymbol{\Phi}} \newcommand{\Sigmav}{\boldsymbol{\Sigma}} \newcommand{\Lambdav}{\boldsymbol{\Lambda}} \newcommand{\half}{\frac{1}{2}} \newcommand{\argmax}[1]{\underset{#1}{\operatorname{argmax}}} \newcommand{\argmin}[1]{\underset{#1}{\operatorname{argmin}}}$

Assignment 3: Neural Network Regression

Steve Kommrusch

Key neural network subroutines provided by professor Chuck Anderson

Overview

The goal of this assignment is to compare linear and neural network models applied to the same data set. Given that goal, I'm reusing the technical stock market data I analyzed in Assignment 2, since I'm already familiar with the linear model results. (For the next assignment I'll move on to a new data set.) The data contains several years of stock prices for 9 major technology companies, which I use to attempt to predict Microsoft's stock price 3 months in advance. An example neural network model for this assignment is shown below, along with the matrix equations used to produce the output.

(Network diagram: the inputs $\tilde{\Xv}$ pass through weights $\Vv$ to the hidden units $\tilde{\Zv}$, then through weights $\Wv$ to the outputs $\Yv$.)

$$ \begin{align*} \tilde{\Zv} & = \tanh(\tilde{\Xv} \Vv),\\ \Yv & = \tilde{\Zv} \Wv, \text{ or }\\ \Yv & = \widetilde{\tanh}(\tilde{\Xv} \Vv) \Wv \end{align*} $$

In the above equations, $\widetilde{\tanh}$ represents the hyperbolic tangent function applied elementwise, with a column of 1's appended to the output so the bias weights are computed properly (diagram and equations provided by professor Chuck Anderson).
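To make the matrix shapes concrete, below is a minimal NumPy sketch of this forward pass (random illustrative weights and sizes of my choosing; the assignment's actual implementation is in neuralnetworks.py):

import numpy as np

n, d, h, k = 5, 18, 10, 1                      # samples, inputs, hidden units, outputs
X = np.random.uniform(-1, 1, (n, d))
V = np.random.uniform(-0.1, 0.1, (d + 1, h))   # +1 row of weights for the input bias
W = np.random.uniform(-0.1, 0.1, (h + 1, k))   # +1 row of weights for the hidden bias
Xtilde = np.insert(X, 0, 1, axis=1)            # prepend a column of 1's to the inputs
Ztilde = np.insert(np.tanh(Xtilde @ V), 0, 1, axis=1)  # hidden outputs, plus bias 1's
Y = Ztilde @ W                                 # linear output layer
print(Y.shape)                                 # (5, 1)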

Method

For this assignment, nnA3.tar is provided (by professor Chuck Anderson) for use by the functions I wrote below. nnA3.tar contains the 3 files described below.

  • neuralnetworks.py defines the NeuralNetwork class that allows for single or multilayer neural networks to be initialized, trained, and used with various methods.
  • scaledconjugategradient.py defines scg and related functions for use in the back-propagation phase of neural network training.
  • mlutils.py defines the 'draw' function to visualize the neural network.

I wrote the following functions that train and evaluate linear and neural network models. (Note that model.use(X) is available for neural network models, so no separate use function is needed.)

  • model = trainLinear(X,T,parameters)
  • predict = useLinear(model,X)
  • error = evaluateLinear(model,X,T)
  • model = trainNN(X,T,parameters)
  • error = evaluateNN(model,X,T)
  • results = trainValidateTestKFolds(trainf,evaluatef,X,T,params,nFolds,shuffle,verbose)

trainValidateTestKFolds partitions the data into folds and combines them into training, validation, and testing subsets, applying the train and evaluate functions to the subsets. For any given test fold, it uses the validation subsets to determine the best model parameter values, as sketched below.
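Below is a minimal sketch of the fold rotation this function performs (assuming nFolds=5 and no shuffling): each fold serves once as the test set, and for each test fold the remaining folds rotate through the validation role while the rest are used for training.

# Illustrative only: which folds play which role for nFolds=5
nFolds = 5
for testFold in range(nFolds):
    for validateFold in range(nFolds):
        if testFold == validateFold:
            continue
        trainFolds = [f for f in range(nFolds) if f not in (testFold, validateFold)]
        print('test={} validate={} train={}'.format(testFold, validateFold, trainFolds))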

In [154]:
# Import necessary libraries
import numpy as np
import itertools
import matplotlib.pyplot as plt
%matplotlib inline

# Define useLinear and evaluateLinear based on HW2 code
def useLinear(model,X):
    Xplus1=np.ones((X.shape[0],X.shape[1]+1))
    Xplus1[:,1:]=(X-model['means'])/model['stds']
    return np.dot(Xplus1,model['w'])

def evaluateLinear(model,X,T):
    # Use model
    predict = useLinear(model,X)
    # Return RMSE using prediction
    return np.sqrt(np.mean((predict - T)**2))

# Define trainLinear with lambda (a high lambda limits overfitting of the linear model)
def trainLinear(X,T,lamb):
    means = X.mean(0)
    stds = X.std(0)
    n,d = X.shape
    Xs1 = np.insert( (X - means)/stds, 0, 1, axis=1)
    lambDiag = np.eye(d+1) * lamb
    lambDiag[0,0] = 0       # lambDiag has lambda on the diagonal except at (0,0), the bias term
    w = np.linalg.lstsq( Xs1.T @ Xs1 + lambDiag, Xs1.T @ T)[0]
    return {'w': w, 'means':means, 'stds':stds}
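For reference, trainLinear above computes the standard ridge-regression solution on the standardized inputs, with the bias weight left unpenalized:

$$ \wv = (\tilde{\Xv}^T \tilde{\Xv} + \lambda \Lambdav)^{-1} \tilde{\Xv}^T \Tv $$

where $\tilde{\Xv}$ is the standardized input matrix with a prepended column of 1's, and $\Lambdav$ is the identity matrix with its $(0,0)$ entry zeroed so the bias weight is not shrunk.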
In [96]:
import neuralnetworks as nn

# X and T are the input and target matrices to train on
# params[0] defines the number of layers and neurons per layer for the network
# params[1] defines the number of training iterations to run
def trainNN(X,T,params):
    nnet = nn.NeuralNetwork(X.shape[1],params[0],T.shape[1])
    nnet.train(X, T, errorPrecision=1.e-10, weightPrecision=1.e-10, nIterations=params[1])
    return nnet

def evaluateNN(model,X,T):
    predict = model.use(X)
    return np.sqrt(np.mean((predict - T)**2))
In [97]:
def trainValidateTestKFolds(trainf,evaluatef,X,T,parameterSets,nFolds,
                            shuffle=False,verbose=False):
    # Randomly arrange row indices if shuffle=True
    rowIndices = np.arange(X.shape[0])
    if shuffle:
        np.random.shuffle(rowIndices)
    # Calculate number of samples in each of the nFolds folds
    nSamples = X.shape[0]
    nEach = int(nSamples / nFolds)
    if nEach == 0:
        raise ValueError("partitionKFolds: Number of samples in each fold is 0.")
    # Calculate the starting and stopping row index for each fold.
    # Store in startsStops as list of (start,stop) pairs
    starts = np.arange(0,nEach*nFolds,nEach)
    stops = starts + nEach
    stops[-1] = nSamples
    startsStops = list(zip(starts,stops))
    # Repeat with testFold taking each single fold, one at a time
    results = []

    for testFold in range (nFolds): # Have testFold loop through all nFolds folds
        bestRMSE=float('inf')  # Set best RMSE so far to infinite then test params
        for parmSet in parameterSets:  # For each set of parameter values in the set
            RMSEvalidate=0
            for validateFold in range (nFolds): # Loop through remaining nFolds-1 folds
                if testFold == validateFold:
                    continue
                # trainFolds are remaining nFolds-2 folds, after selecting test and validate
                trainFolds = np.setdiff1d(range(nFolds), [testFold,validateFold])
                # Construct Xtrain and Ttrain by collecting rows for all trainFolds
                rows = []
                for tf in trainFolds:
                    a,b = startsStops[tf]                
                    rows += rowIndices[a:b].tolist()
                Xtrain = X[rows,:]
                Ttrain = T[rows,:]
                # Construct Xvalidate and Tvalidate
                a,b = startsStops[validateFold]
                rows = rowIndices[a:b]
                Xvalidate = X[rows,:]
                Tvalidate = T[rows,:]
                # Use trainf to fit model to training data using parmSet
                model = trainf(Xtrain,Ttrain,parmSet)
                # Calculate the error of this model by calling evaluatef with 
                #  the model and validation data
                RMSEvalidate += evaluatef(model,Xvalidate,Tvalidate)
            RMSEvalidate /= (nFolds-1)
            # If this error is less than the previously best error for parmSet, 
            # update best parameter values and best error
            if RMSEvalidate < bestRMSE:
                bestRMSE=RMSEvalidate
                bestParm = parmSet
        # Make a new set of training data by concatenating the training and 
        # validation data from previous step.
        Xtrain = np.concatenate((Xtrain,Xvalidate))
        Ttrain = np.concatenate((Ttrain,Tvalidate))
        # Retrain, using trainf again, to fit a new model to this new training data.
        model = trainf(Xtrain,Ttrain,bestParm)
        # Calculate error of this new model on the test data, and also on the new 
        #  training data.
        trainRMSE = evaluatef(model,Xtrain,Ttrain)
        # Construct Xtest and Ttest
        a,b = startsStops[testFold]
        rows = rowIndices[a:b]
        Xtest = X[rows,:]
        Ttest = T[rows,:]
        testRMSE = evaluatef(model,Xtest,Ttest)
        # Construct a list of the best parameter values with this training error, 
        #  the mean of the above validation errors, and the testing error
        result = list((bestParm,trainRMSE,bestRMSE,testRMSE))
        # Print this list if verbose == True
        if verbose: print ("Result: ",result)
        # Append this list to a result list
        results.append(result)
    # Return this result list
    return results

The parameters argument to the trainLinear function is just the value of $\lambda$. For the trainNN function it must specify the hidden layer structure and the number of Scaled Conjugate Gradient iterations. Here are some examples:

model = trainNN(X,T,[5, 100])       # Single hidden layer of 5 units, trained 
                                    # for 100 iterations
model = trainNN(X,T,[[10,10], 200]) # Two hidden layers, 10 units each, 
                                    # trained for 200 iterations

Data

My data comes from https://quantquote.com/historical-stock-data, which includes daily stock market data for all 500 of the stocks in the S&P 500 from 2004 through 2013. For this assignment, I chose to predict Microsoft's stock price 3 months into the future using the current stock data for Google, Microsoft, AMD, Nvidia, Intel, Apple, Amazon, IBM, and Hewlett-Packard, as well as the prices of those companies 3 months prior. Hence, there are 18 input variables and 1441 samples. There are 'only' 1441 samples because I include only dates that have data exactly 3 months before and after; if any of those days fall on weekends or market holidays, the sample is not used.
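To make the layout concrete, below is a small sketch that prints the column order assumed throughout this notebook (the stock order matches the Stocks list used in the plotting code that follows):

# Illustrative only: the assumed Stock.txt column layout
Stocks = ["AAPL","AMD","AMZN","GOOG","HPQ","IBM","INTC","MSFT","NVDA"]
cols = ['MSFT price in 3 months (target)'] + Stocks + [s + ' 3 months ago' for s in Stocks]
for i, name in enumerate(cols):
    print('column {:2d}: {}'.format(i, name))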

Below are charts for the 9 stocks included in this analysis. Not explicitly shown are the target to predict (MSFT 3 months in the future) and the charts for all 9 stocks 3 months in the past. Data sample 0 (row 1) is associated with 11/23/2004 and data sample 1440 is associated with 5/8/2013. 3 months is about 40-45 sample positions, and a full year is about 170 samples in this data.

In [155]:
data = np.loadtxt('Stock.txt', usecols=range(19), delimiter=',')
T=np.array(data[:,0],ndmin=2).T    # The first column is the Target to predict: Microsoft stock price in 3 months
X=np.array(data[:,1:])  # The rest of the data is current or previous stock values
Stocks = ["AAPL","AMD","AMZN","GOOG","HPQ","IBM","INTC","MSFT","NVDA"]
%precision 4
plt.figure(figsize=(15,10))
plt.suptitle('Stock prices', fontsize=40)
for c in range(9):  # Just plot the 9 stocks over time, not +/- 3 month values
    plt.subplot(3,3, c+1)
    plt.plot(data[:,c+1],'-')
    plt.ylabel(Stocks[c])
    plt.xlabel("Sample number (approx time in work days)")
    plt.grid(True)

Results

Linear results

Below are the results for 3 different runs of trainLinear. Notice that with 5 folds the shuffle=True run has much better test RMSE results than the shuffle=False run. I believe this is because the samples in the middle of the data (about rows 600 to 700) are very hard for a linear model to predict correctly. If those rows are randomly spread amongst the training data, the model can account for them a bit better, but if they are all grouped into one fold then the results are poor. Note that with enough folds this effect becomes smaller, so using 20 folds allows the unshuffled model to train a bit better, as evidenced by the smaller lambdas found to match the data. Large lambda values imply the data in the validation folds is 'unpredictable', and hence the weights are minimized.

In [156]:
result = trainValidateTestKFolds(trainLinear, evaluateLinear ,X, T, np.append(np.arange(0,9.1,1),(10,30,100,300,1000,3000,10000)), nFolds=5, shuffle=False)
print('Linear Model, nFolds=5, shuffle=False\n{:^10s} {:>10s} {:>10s} {:>10s}'.format('lambda', 'train', 'validate', 'test RMSE'))
for x in result:
    print('{:^10.2f} {:>10.3f} {:10.3f} {:10.3f}'.format(*x))
    
result = trainValidateTestKFolds(trainLinear, evaluateLinear ,X, T, np.append(np.arange(0,9.1,1),(10,30,100,300,1000,3000,10000)), nFolds=20, shuffle=False)
print('Linear Model, nFolds=20, shuffle=False\n{:^10s} {:>10s} {:>10s} {:>10s}'.format('lambda', 'train', 'validate', 'test RMSE'))
for x in result:
    print('{:^10.2f} {:>10.3f} {:10.3f} {:10.3f}'.format(*x))
    
result = trainValidateTestKFolds(trainLinear, evaluateLinear ,X, T, np.append(np.arange(0,9.1,0.1),(10,30,100,1000,10000)), nFolds=5, shuffle=True)
print('Linear Model, nFolds=5, shuffle=True\n{:^10s} {:>10s} {:>10s} {:>10s}'.format('lambda', 'train', 'validate', 'test RMSE'))
for x in result:
    print('{:^10.2f} {:>10.3f} {:10.3f} {:10.3f}'.format(*x))
Linear Model, nFolds=5, shuffle=False
  lambda        train   validate  test RMSE
 10000.00       3.069      4.043      2.789
 3000.00        2.486      3.109      3.441
 10000.00       2.541      3.603      4.832
 3000.00        2.704      4.101      2.041
 3000.00        2.435      3.064      5.151
Linear Model, nFolds=20, shuffle=False
  lambda        train   validate  test RMSE
 1000.00        2.363      2.557      0.856
  300.00        2.184      2.528      1.205
 1000.00        2.367      2.558      1.302
 1000.00        2.300      2.463      2.992
 1000.00        2.347      2.476      2.258
 1000.00        2.335      2.479      2.319
  300.00        2.076      2.404      3.817
  300.00        2.114      2.461      2.626
  30.00         1.864      2.364      4.344
 1000.00        2.198      2.401      5.675
 1000.00        2.360      2.574      1.172
 1000.00        2.282      2.431      3.374
 1000.00        2.322      2.482      2.370
  100.00        2.006      2.396      3.145
  300.00        2.109      2.416      3.019
  300.00        2.188      2.512      1.584
 1000.00        2.335      2.490      2.586
  300.00        2.181      2.544      1.312
  300.00        2.143      2.436      2.555
 1000.00        2.263      2.387      4.394
Linear Model, nFolds=5, shuffle=True
  lambda        train   validate  test RMSE
   0.80         2.007      2.045      1.904
   1.80         1.992      2.032      1.972
   0.90         1.919      1.950      2.256
   1.20         1.997      2.032      1.950
   1.30         1.994      2.023      1.978

Based on the lambda results for the shuffle=True runs, below I show the model's ability to fit the training data with lambda=1. (Training vs. test results are shown later in comparison to the neural network results.) Also shown below is the RMSE of the model relative to the actual data as a running series. The RMSE at each point covers the 50 samples before and after the given sample number, and confirms that the data in the center of the graph (approximately corresponding to the economic crash of 2008) is not predicted well by the model.

In [158]:
model = trainLinear(X,T,1)
Y = useLinear(model,X)
n = X.shape[0]
rmse = np.zeros(n)  # running RMSE over a sliding window of the nearest 100 samples
for i in range(n):
    lo, hi = max(0, i - 50), min(n, i + 50)
    rmse[i] = np.sqrt(np.mean((Y[lo:hi] - T[lo:hi])**2))
plt.plot(T,'-',label="Actual data")
plt.plot(Y,'-',label="Model prediction")
plt.plot(rmse,'-',label="RMSE of nearest 100 samples")
plt.ylabel("Microsoft stock price")
plt.xlabel("Sample number (approx time in workdays)")
plt.legend(loc='best')
plt.ylim(0,45)
plt.grid(True);

Neural network results

Below are results for various neural network tests. I initially tested with the neural network sizes shown in the first parms setting (and iteration counts of 100 and 500). These results showed that a neural network with only 1 hidden layer of 20 neurons tended to fit the data best, but other options had some merit. So I reran with some of the network sizes adjusted a bit to explore further, and found that 1 layer of 25 or 2 layers of 15 and 10 looked pretty good. Finally, I ran all 3 'good' options with more iterations to find the best of the group (that run took about 10 minutes on my laptop). The best option appears to be 2 layers with 15 and 10 neurons, and it converged to the lowest RMSE in only 500 iterations. It's interesting that the test RMSE isn't improving much even as I tune the parameter list; this suggests to me that the neural network may be overfitting the data in some way.

In [159]:
parms = list(itertools.product([5, 10, 15, 20, [10, 10], [10, 5, 10]], [100, 500]))
result = trainValidateTestKFolds(trainNN, evaluateNN, X, T, parms, nFolds=5, shuffle=True)
print('{:>30s} {:>10s} {:>10s} {:>10s}'.format('(Hidden Units, Iterations)', 'train', 'validate', 'test RMSE'))
for x in result:
    print('{:>30s} {:10.3f} {:10.3f} {:10.3f}'.format(str(x[0]), *x[1:]))
    (Hidden Units, Iterations)      train   validate  test RMSE
               ([10, 10], 500)      0.478      0.680      0.656
                     (20, 500)      0.451      0.651      0.615
                     (20, 500)      0.441      0.637      0.597
                     (15, 500)      0.483      0.671      0.628
                     (20, 500)      0.420      0.653      0.626
In [161]:
parms = list(itertools.product([15, 20, 25, [15, 10], [10, 5], [10, 10]], [100, 500]))
result = trainValidateTestKFolds(trainNN, evaluateNN, X, T, parms, nFolds=5, shuffle=True)
print('{:>30s} {:>10s} {:>10s} {:>10s}'.format('(Hidden Units, Iterations)', 'train', 'validate', 'test RMSE'))
for x in result:
    print('{:>30s} {:10.3f} {:10.3f} {:10.3f}'.format(str(x[0]), *x[1:]))
    (Hidden Units, Iterations)      train   validate  test RMSE
               ([15, 10], 500)      0.474      0.664      0.566
                     (25, 500)      0.411      0.643      0.660
               ([15, 10], 500)      0.402      0.646      0.633
               ([15, 10], 500)      0.401      0.656      0.646
                     (25, 500)      0.422      0.684      0.559
In [162]:
parms = list(itertools.product([20, 25, [15, 10]], [200, 500, 1000, 1500]))
result = trainValidateTestKFolds(trainNN, evaluateNN, X, T, parms, nFolds=5, shuffle=True)
print('{:>30s} {:>10s} {:>10s} {:>10s}'.format('(Hidden Units, Iterations)', 'train', 'validate', 'test RMSE'))
for x in result:
    print('{:>30s} {:10.3f} {:10.3f} {:10.3f}'.format(str(x[0]), *x[1:]))
    (Hidden Units, Iterations)      train   validate  test RMSE
                     (20, 500)      0.429      0.653      0.686
               ([15, 10], 500)      0.409      0.664      0.591
               ([15, 10], 500)      0.401      0.660      0.661
                     (20, 500)      0.436      0.651      0.636
               ([15, 10], 500)      0.417      0.647      0.564

Note how much lower the RMSE is for the neural network than for the linear model: test RMSE ranges from 0.56 to 0.69 for the neural network but 1.9 to 2.3 for the linear model. Below is a graph showing how well the neural network learned the training data. Below the graph of actual versus model prediction is the RMSE of the nearest 100 samples again, but this time multiplied by 10 so the variations remain visible. The error is more constant across the series, implying the neural network was able to fit the data during the economic crash better than the linear model.

In [163]:
model = trainNN(X,T,[[15, 10], 500])
Y = model.use(X)
n = X.shape[0]
rmse = np.zeros(n)  # running RMSE over the nearest 100 samples, scaled by 10 for visibility
for i in range(n):
    lo, hi = max(0, i - 50), min(n, i + 50)
    rmse[i] = 10*np.sqrt(np.mean((Y[lo:hi] - T[lo:hi])**2))
plt.plot(T,'-',label="Actual data")
plt.plot(Y,'-',label="Model prediction")
plt.plot(rmse,'-',label="RMSE*10 of nearest 100 samples")
plt.ylabel("Microsoft stock price")
plt.xlabel("Sample number (approx time in workdays)")
plt.legend(loc='best')
plt.ylim(0,45)
plt.grid(True);

Graphical comparison of linear and neural network models

For this head-to-head comparison, I train all the models in a way that mimics real model usage: the training data is the first 1270 samples in the data set, leaving about a year of time to predict on the final 171 samples. Below I graph 3 different neural network runs against the linear model and actual data for comparison. Also shown are the RMSE values for the 4 models against the actual data. Even though the neural network was able to train much tighter to the training data, none of the 3 neural network runs beat the linear model in actual prediction on this test. Note also that 2 of the neural network models share the same architecture, just trained twice with different starting random weights. It seems that it is easy for a neural network to overtrain on this data set. The 3rd neural network I tested has only 5 neurons in 1 layer, and it generally did about as well as the deeper networks.

In [203]:
Xtrain=X[0:1270,:];      Ttrain=T[0:1270,:];
Xtest=X[1270:1441,:];    Ttest=T[1270:1441,:]
modell = trainLinear(Xtrain,Ttrain,1)
Yl = useLinear(modell,Xtest)
modeln1 = trainNN(Xtrain,Ttrain,[[15,10], 500])
Yn1 = modeln1.use(Xtest)
modeln2 = trainNN(Xtrain,Ttrain,[[15,10], 500])
Yn2 = modeln2.use(Xtest)
modeln3 = trainNN(Xtrain,Ttrain,[5, 500])
Yn3 = modeln3.use(Xtest)

plt.plot(Ttest,'-',label="Actual data")
plt.plot(Yn1,'-',label="NN Model [15,10] prediction #1")
plt.plot(Yn2,'-',label="NN Model [15,10] prediction #2")
plt.plot(Yn3,'-',label="NN Model [5] prediction")
plt.plot(Yl,'-',label="Linear Model prediction")
plt.ylabel("Microsoft stock price")
plt.xlabel("Sample number (approx time in workdays)")
plt.legend(loc='best')
plt.ylim(20,45)
plt.grid(True);
print("NN [15,10] RMSE#1=%.2f  NN [15,10] RMSE#2=%.2f  NN [5] RMSE=%.2f  Linear RMSE=%.2f" %
      (np.sqrt(np.mean((Yn1-Ttest)**2)),
      np.sqrt(np.mean((Yn2-Ttest)**2)),
      np.sqrt(np.mean((Yn3-Ttest)**2)),
      np.sqrt(np.mean((Yl-Ttest)**2))))
NN [15,10] RMSE#1=2.34  NN [15,10] RMSE#2=3.98  NN [5] RMSE=2.55  Linear RMSE=2.10

Below are shown the input data order and the weights used by both the linear and 5-neuron models. By way of illustration, note that the linear model found the current IBM price and the IBM price 3 months ago to be the 2 most relevant features for predicting Microsoft stock 3 months in the future. Looking at the 2 sets of weights for the neural network, and squinting a bit, it seems that a similar relevance was found (the IBM data row has 4 modest red squares and 1 large red square, and most of the weights in the 2nd layer are negative).

In [204]:
colnames=['Bias']; colnames.extend(Stocks); colnames.extend(Stocks)
for i in range(10,19): colnames[i] += " 3 months ago"
for i in range(len(colnames)): print("%33s: %.3f" % (colnames[i],modell['w'][i]))
                             Bias: 24.073
                             AAPL: -0.876
                              AMD: -1.433
                             AMZN: 0.321
                             GOOG: 1.267
                              HPQ: 0.251
                              IBM: 2.193
                             INTC: 0.686
                             MSFT: 0.488
                             NVDA: 0.704
                AAPL 3 months ago: 0.446
                 AMD 3 months ago: 1.203
                AMZN 3 months ago: 0.292
                GOOG 3 months ago: -0.055
                 HPQ 3 months ago: -1.393
                 IBM 3 months ago: -1.969
                INTC 3 months ago: -0.276
                MSFT 3 months ago: 0.200
                NVDA 3 months ago: -0.148
In [205]:
modeln3.draw(['x'],['y'])

Additional testing

In the cells below are tests to ensure the subroutines I wrote behave as expected. I included the 'famous' XOR function as a test, which the multilayer neural network can learn but the linear model cannot.
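For intuition on why the linear model fails, note that a linear fit $y = w_0 + w_1 x_1 + w_2 x_2$ of the XOR function would have to satisfy

$$ \begin{align*} w_0 &= 0, & w_0 + w_2 &= 1,\\ w_0 + w_1 &= 1, & w_0 + w_1 + w_2 &= 0. \end{align*} $$

Adding the two middle equations gives $2w_0 + w_1 + w_2 = 2$, while adding the first and last gives $2w_0 + w_1 + w_2 = 0$, a contradiction. The best a linear model can do is output values near the mean of 0.5, which is exactly what the last test below shows.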

In [206]:
# This is the linear example provided in the assignment description:
X = np.arange(100).reshape((-1, 1))
T = np.abs(X -50) + X

result = trainValidateTestKFolds(trainLinear, evaluateLinear ,X, T, range(0,101,10), nFolds=5, shuffle=False)
                                 
print('Linear Model\n{:^10s} {:>10s} {:>10s} {:>10s}'.format('lambda', 'train', 'validate', 'test RMSE'))
for x in result:
    print('{:^10.2f} {:>10.3f} {:10.3f} {:10.3f}'.format(*x))
# The linear model will always produce the same result if shuffle is off, so check the sum
if (np.sum(result) > 657.04 and np.sum(result) < 657.05):  print("Passed")
else:                                                      print("Failed")
Linear Model
  lambda        train   validate  test RMSE
  30.00        13.576     20.035     19.301
  100.00       22.438     25.508     20.099
  30.00        14.419     20.677     25.194
  70.00        21.098     30.736      9.615
  100.00       13.036     15.960     55.356
Passed
In [207]:
# This is the neural network example provided in the assignment description:
parms = list(itertools.product([2, 5, 10, 20, [5, 5], [10, 2, 10]], [10, 20, 100, 500]))
result = trainValidateTestKFolds(trainNN, evaluateNN, X, T, parms, nFolds=5, shuffle=False)
print('NN Model\n{:>30s} {:>10s} {:>10s} {:>10s}'.format('(Hidden Units, Iterations)', 'train', 'validate', 'test RMSE'))
for x in result:
    print('{:>30s} {:10.3f} {:10.3f} {:10.3f}'.format(str(x[0]), *x[1:]))
if (sum(result[2][1:]) < 10.0 and sum(result[4][1:]) > 7.0): print("Passed")
else:                                                        print("Failed")
NN Model
    (Hidden Units, Iterations)      train   validate  test RMSE
                 ([5, 5], 500)      0.164      2.375      0.293
                 ([5, 5], 500)      0.207      4.100      0.510
            ([10, 2, 10], 500)      0.070      1.633      2.093
                     (20, 500)      0.322     10.813      0.945
                     (20, 500)      0.157      3.634      5.777
Passed
In [208]:
# Check that 2 target columns and 4 input columns work as expected in the linear model
X = np.arange(100).reshape((50,2))
T = np.abs(X -50) + X
X = np.arange(200).reshape(50,4)

result = trainValidateTestKFolds(trainLinear, evaluateLinear ,X, T, range(0,101,10), nFolds=5, shuffle=False)
                                 
print('{:^10s} {:>10s} {:>10s} {:>10s}'.format('lambda', 'train', 'validate', 'test RMSE'))
for x in result:
    print('{:^10.2f} {:>10.3f} {:10.3f} {:10.3f}'.format(*x))
# The linear model will always produce the same result if shuffle is off, so check the sum
if (np.sum(result) > 744.89 and np.sum(result) < 744.90):  print("Passed")
else:                                                      print("Failed")
trainLinear(X,T,0.2)
  lambda        train   validate  test RMSE
  60.00        13.569     20.030     19.296
  100.00       19.195     27.653     16.207
  70.00        15.034     20.640     25.202
  100.00       19.494     31.380      8.328
  100.00       11.720     17.076     50.071
Passed
Out[208]:
{'means': array([  98.,   99.,  100.,  101.]),
 'stds': array([ 57.7235,  57.7235,  57.7235,  57.7235]),
 'w': array([[ 74.    ,  75.    ],
        [  6.9919,   7.2082],
        [  6.9919,   7.2082],
        [  6.9919,   7.2082],
        [  6.9919,   7.2082]])}
In [210]:
# Check that NN works as expected with 2 target columns
result = trainValidateTestKFolds(trainNN, evaluateNN, X, T, parms, nFolds=5, shuffle=False)
print('{:>30s} {:>10s} {:>10s} {:>10s}'.format('(Hidden Units, Iterations)', 'train', 'validate', 'test RMSE'))
for x in result:
    print('{:>30s} {:10.3f} {:10.3f} {:10.3f}'.format(str(x[0]), *x[1:]))
if (sum(result[0][1:])<6 and sum(result[3][1:])<19 and sum(result[1][1:])>1): print("Passed")
else:                                                                         print("Failed")
    (Hidden Units, Iterations)      train   validate  test RMSE
                 ([5, 5], 500)      0.251      2.531      0.580
            ([10, 2, 10], 500)      0.253      2.892      0.247
                     (10, 500)      0.120      1.760      3.049
                     (20, 500)      0.252     12.617      1.142
                     (20, 500)      0.156      4.076      8.961
Passed
In [213]:
# Learn the XOR function: at least 2 hidden units are needed for low RMSE
X=np.array([[0,0],[0,1],[1,0],[1,1],[0,0],[0,1],[1,0],[1,1],[0,0],[0,1],[1,0],[1,1]])
T=np.array([[0],  [1],  [1],  [0],  [0],  [1],  [1],  [0],  [0],  [1],  [1],  [0]])
parms = list(itertools.product([1, 2, 5, [5, 5]], [10, 20]))
result = trainValidateTestKFolds(trainNN, evaluateNN, X, T, parms, nFolds=5, shuffle=False)
print('{:>30s} {:>10s} {:>10s} {:>10s}'.format('(Hidden Units, Iterations)', 'train', 'validate', 'test RMSE'))
for x in result:
    print('{:>30s} {:10.3f} {:10.3f} {:10.3f}'.format(str(x[0]), *x[1:]))
    (Hidden Units, Iterations)      train   validate  test RMSE
                  ([5, 5], 20)      0.017      0.001      0.024
                       (5, 20)      0.000      0.052      0.001
                       (5, 20)      0.000      0.001      0.000
                       (2, 20)      0.317      0.046      0.501
                       (5, 20)      0.000      0.001      0.000
In [214]:
# Check that linear model can't learn XOR function
result = trainValidateTestKFolds(trainLinear, evaluateLinear ,X, T, range(0,101,10), nFolds=5, shuffle=False)

print('{:^10s} {:>10s} {:>10s} {:>10s}'.format('lambda', 'train', 'validate', 'test RMSE'))
for x in result:
    print('{:^10.2f} {:>10.3f} {:10.3f} {:10.3f}'.format(*x))
# The linear model has 0.5 RMSE on a 0/1 output and 100 lambda; it just outputs the average
  lambda        train   validate  test RMSE
  100.00        0.498      0.505      0.509
  100.00        0.498      0.505      0.509
  100.00        0.498      0.505      0.509
  100.00        0.498      0.505      0.509
  100.00        0.500      0.509      0.500

Grading

Your notebook will be run and graded automatically. Download A3grader.tar and extract A3grader.py from it. Run the code in the following cell to demonstrate an example grading session. You should see a perfect score of 80/100 if your functions are defined correctly. The remaining 20% will be based on the instructor's reading of your notebook. We will be looking for how well the method is explained in text with some LaTeX math, and how well the results are summarized.

In [215]:
%run -i "A3grader.py"
 Testing: result = trainValidateTestKFolds(trainLinear,evaluateLinear,X,T,
                  range(0,101,10),nFolds=5,shuffle=False)
 Your result is
    10   3.158   4.132   2.414
    20   4.368   5.021   3.641
    10   3.245   4.178   5.03
    20   4.448   6.07   2.024
    20   2.426   2.972   10.89
20/20 points. First column, of best lambda values, is correct.
20/20 points. Columns of RMSE values are correct.

 Testing:
   import itertools
   parms = list(itertools.product([[5],[5,5],[2,2,2]], [10,50,100,200]))
   te = []
   for rep in range(5):
       result = trainValidateTestKFolds(trainNN,evaluateNN,X,T,
                                        parms,
                                        nFolds=4,shuffle=False)
       resulte = np.array([r[1:] for r in result])
       meanTestRMSE = resulte[:,-1].mean()
       print('     ',meanTestRMSE)
       te.append(meanTestRMSE)
      1.8943064223
      0.932034657603
      1.74883968203
      1.49452365068
      1.71647459935
40/40 points. Mean test RMSE is less than 5 as it should be.

notebooks Grade is 80/100
Up to 20 more points will be given based on the qualty of your descriptions of the method and the results.