{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Searching for Good Weights in a Linear Model\n",
    "\n",
    "Topics covered in this notebook:\n",
    "\n",
    "1. Loading air quality data as pandas DataFrame.\n",
    "1. Extracting data we want from the DataFrame, into $X$ and $T$.\n",
    "1. A linear model.\n",
    "1. Optimizing weights of linear model with manual guessing. \n",
    "1. Optimizing weights with Coordinate Descent.\n",
    "1. Optimizing weights with Run and Twiddle.\n",
    "1. Optimizing weights with Stochastic Gradient Descent.\n",
    "1. Optimizing weights with AdamW.\n",
    "\n",
    "You can skip ahead in the video recording of this lecture by scanning for a large colored banner announcing the next topic, like the banners produced by `new_topic` in this notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "n_topics = 8\n",
    "topic_i = 0\n",
    "\n",
    "def new_topic(txt):\n",
    "    global topic_i\n",
    "    topic_i += 1\n",
    "    \n",
    "    txt = f'\\n ({topic_i} of {n_topics})\\n\\n ' + txt + ' \\n'\n",
    "    font = {'family': 'serif',\n",
    "        'color':  'darkblue',\n",
    "        'weight': 'bold',\n",
    "        'size': 40,\n",
    "        }\n",
    "    # plt.figure(figsize=(30, 18))\n",
    "    plt.axis('off')\n",
    "    plt.text(0.5, 0.5, txt, ha='center',  wrap=True,\n",
    "             backgroundcolor='lightyellow', \n",
    "             fontdict=font,\n",
    "             bbox=dict(facecolor='yellow',\n",
    "                       edgecolor='blue',\n",
    "                       linewidth=5))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading air quality data as a pandas DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "new_topic('Loading air quality data as pandas DataFrame')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:49:32.991321Z",
     "start_time": "2023-08-21T19:49:32.013416Z"
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "import pandas  # for reading csv file\n",
    "from IPython.display import display, clear_output  # for animations later in this notebook"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, let's find and download some interesting data.  The [machine learning repository at the University of California, Irvine](http://archive.ics.uci.edu/ml) is a great resource for publicly available data with explanations for machine learning researchers.  Here we download the air quality data set.  If `curl` is not available on your system, you may use the above link to find and download this data.  It is useful to go to the link and find the page describing this data set. That page is [here](http://archive.ics.uci.edu/ml/datasets/Air+quality)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:49:39.218899Z",
     "start_time": "2023-08-21T19:49:38.452514Z"
    }
   },
   "outputs": [],
   "source": [
    "!curl -O https://archive.ics.uci.edu/ml/machine-learning-databases/00360/AirQualityUCI.zip\n",
    "!unzip -o AirQualityUCI.zip"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will use the [pandas](https://pandas.pydata.org/) package to read this data.  The `pandas.read_csv` function is extremely useful for reading in all kinds of data with various peculiarities.  Here are the first few lines of `AirQualityUCI.csv`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:49:41.945232Z",
     "start_time": "2023-08-21T19:49:41.828632Z"
    }
   },
   "outputs": [],
   "source": [
    "!head AirQualityUCI.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice a few things.  Fields are separated by semi-colons.  The first line contains the name of each variable, one per column.  Each row is one sample.  Each line ends with two semi-colons. Not immediately obvious is that the decimal values follow the European convention of using a comma instead of a decimal point.  Not demonstrated in these first few lines is the fact that missing measurements are given the value -200.\n",
    "\n",
    "All of these issues can be dealt with directly in the call to `pandas.read_csv`.  I don't mean to imply that I got this right on my first try.  The two most puzzling issues were the two semi-colons at the end of each line and the commas for decimal points.  The double semi-colons caused the data returned by `pandas.read_csv` to have more columns than I expected, which is why the call below keeps only the first 15 columns with `usecols=range(15)`.\n",
    "\n",
    "Very good pandas tutorials are available, such as [Pandas Illustrated: The Definitive Visual Guide to Pandas](https://betterprogramming.pub/pandas-illustrated-the-definitive-visual-guide-to-pandas-c31fa921a43) by Lev Maximov, and [Pandas tutorials](http://pandas.pydata.org/pandas-docs/stable/tutorials.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:49:43.181061Z",
     "start_time": "2023-08-21T19:49:43.148909Z"
    }
   },
   "outputs": [],
   "source": [
    "data = pandas.read_csv('AirQualityUCI.csv', delimiter=';', decimal=',', usecols=range(15), na_values=-200)\n",
    "data = data.dropna(axis=0)\n",
    "data.shape"
   ]
  },
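  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what each argument to `pandas.read_csv` does, here is a minimal sketch on a tiny in-memory file.  The three rows are illustrative values, not lines copied from `AirQualityUCI.csv`, but they reproduce its quirks: semi-colon separators, commas as decimal points, two semi-colons at the end of every line, and -200 marking a missing measurement.  `io.StringIO` lets `read_csv` read a string as if it were a file.\n",
    "\n",
    "```python\n",
    "import io\n",
    "import pandas\n",
    "\n",
    "# Illustrative stand-in for AirQualityUCI.csv (not real measurements).\n",
    "csv_text = '''Date;Time;CO(GT);T;;\n",
    "01/01/2000;08.00.00;2,5;12,0;;\n",
    "01/01/2000;09.00.00;-200;12,5;;\n",
    "01/01/2000;10.00.00;1,5;13,0;;'''\n",
    "\n",
    "demo = pandas.read_csv(io.StringIO(csv_text),\n",
    "                       delimiter=';',     # fields are separated by semi-colons\n",
    "                       decimal=',',       # '2,5' means 2.5\n",
    "                       usecols=range(4),  # drop the empty columns created by the trailing ';;'\n",
    "                       na_values=-200)    # read -200 as a missing value (NaN)\n",
    "print(demo)\n",
    "\n",
    "demo = demo.dropna(axis=0)  # drop every row that has any missing value\n",
    "print(demo.shape)           # only the two complete rows remain\n",
    "```\n",
    "\n",
    "Leaving out `usecols` in this sketch would produce two extra all-NaN columns named `Unnamed: 4` and `Unnamed: 5`, and then `dropna` would remove every row."
   ]
  },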
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So, we have 827 rows and 15 columns of data.  This means that we read 827 samples that do not have missing values, and each sample contains 15 values.  Let's look at the first few rows of this data matrix, called a `DataFrame` in `pandas`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:49:44.225075Z",
     "start_time": "2023-08-21T19:49:44.196286Z"
    }
   },
   "outputs": [],
   "source": [
    "data.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's create a simple problem for playing with this data.  Let's say we want to predict the level of carbon monoxide from the time of day.  The column `Time` contains the hour, but not just the hour.  9am will appear as 09.00.00.  Whoopee!  This will give us a chance to practice our skills at extracting substrings, converting strings to integers, and doing these steps for all of the `Time` values within a concise little list comprehension.  You don't know what this is?  Well, it is time to get comfortable not knowing, and typing 'python list comprehension' into your favorite web search engine."
   ]
  },
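  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If list comprehensions are new to you, this small sketch, using a few made-up `Time` strings, shows that a comprehension builds the same list as an explicit `for` loop.  The last version uses the pandas `.str` accessor, a vectorized alternative that this notebook does not use below.\n",
    "\n",
    "```python\n",
    "import pandas\n",
    "\n",
    "times = pandas.Series(['09.00.00', '10.00.00', '23.00.00'])  # made-up example values\n",
    "\n",
    "# Explicit loop: take the first two characters and convert them to an int\n",
    "hours_loop = []\n",
    "for t in times:\n",
    "    hours_loop.append(int(t[:2]))\n",
    "\n",
    "# The same computation as a list comprehension\n",
    "hours_comp = [int(t[:2]) for t in times]\n",
    "\n",
    "# A vectorized pandas alternative using the .str accessor\n",
    "hours_pandas = times.str[:2].astype(int).tolist()\n",
    "\n",
    "print(hours_loop, hours_comp, hours_pandas)\n",
    "```"
   ]
  },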
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:49:45.169638Z",
     "start_time": "2023-08-21T19:49:45.164341Z"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "data['Time'][:10]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:49:45.633034Z",
     "start_time": "2023-08-21T19:49:45.628672Z"
    }
   },
   "outputs": [],
   "source": [
    "[t for t in data['Time'][:10]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:49:46.063397Z",
     "start_time": "2023-08-21T19:49:46.059251Z"
    }
   },
   "outputs": [],
   "source": [
    "[t[:2] for t in data['Time'][:10]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:49:46.438308Z",
     "start_time": "2023-08-21T19:49:46.434561Z"
    }
   },
   "outputs": [],
   "source": [
    "[int(t[:2]) for t in data['Time'][:10]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:49:46.813454Z",
     "start_time": "2023-08-21T19:49:46.810781Z"
    }
   },
   "outputs": [],
   "source": [
    "hour = [int(t[:2]) for t in data['Time']]\n",
    "len(hour)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To find the carbon monoxide measurements for each sample, read the data description at the UCI web site, which explains that column `CO(GT)` is the ground truth measurement of carbon monoxide: the true hourly averaged CO concentration in mg/m$^3$, from a reference analyzer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:49:47.768028Z",
     "start_time": "2023-08-21T19:49:47.764070Z"
    }
   },
   "outputs": [],
   "source": [
    "data.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:49:48.160427Z",
     "start_time": "2023-08-21T19:49:48.155639Z"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "CO = data['CO(GT)']\n",
    "CO[:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Extracting data we want from the DataFrame, into $X$ and $T$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "new_topic('Extracting data we want from the DataFrame, into $X$ and $T$')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here I will introduce a convention I will follow throughout this class.  Inputs to a model are given in a matrix named `X`.  Samples are in rows, and the components, measurements, variables, thingies of each sample are given in the columns.  The desired, correct outputs for each sample are given in a matrix named `T`, for **T**argets.  The $i^{th}$ row of `X` is Sample $i$, whose correct target output is in row $i$ of `T`.  Yes, you excellent software developers, `X` and `T` are parallel arrays, which should set off alarms in your coding brains.  As long as we remember that we cannot reorder the rows in `X` without doing the same reordering of rows in `T`, we will be okay.\n",
    "\n",
    "Let's set this up for our hour to CO problem."
   ]
  },
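  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is a minimal sketch of the safe way to reorder parallel arrays, using small made-up arrays named `X_demo` and `T_demo` rather than our data: draw one random permutation of the row indices and apply that same permutation to both arrays.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "X_demo = np.array([[0], [1], [2], [3]])      # inputs, one sample per row\n",
    "T_demo = np.array([[10], [11], [12], [13]])  # targets; row i goes with row i of X_demo\n",
    "\n",
    "rng = np.random.default_rng(42)\n",
    "order = rng.permutation(X_demo.shape[0])  # one random ordering of the row indices\n",
    "\n",
    "X_shuffled = X_demo[order]  # apply the SAME ordering to both arrays\n",
    "T_shuffled = T_demo[order]\n",
    "\n",
    "# Every target still sits next to its input (here T = X + 10 for every sample).\n",
    "print(np.all(T_shuffled == X_shuffled + 10))\n",
    "```\n",
    "\n",
    "Shuffling `X_demo` with one call to `rng.permutation` and `T_demo` with a separate call would break this pairing."
   ]
  },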
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:49:48.952142Z",
     "start_time": "2023-08-21T19:49:48.947746Z"
    }
   },
   "outputs": [],
   "source": [
    "T = CO\n",
    "T = np.array(T).reshape((-1, 1))  # make T have one column and as many rows as needed to hold the values of T\n",
    "Tnames = ['CO']\n",
    "\n",
    "X = np.array(hour).reshape((-1, 1))\n",
    "Xnames = ['Hour']\n",
    "\n",
    "print('X.shape =', X.shape, 'Xnames =', Xnames, 'T.shape =', T.shape, 'Tnames =', Tnames)\n",
    "# or, more concisely, using the '=' specifier in f-strings (Python 3.8 and later),\n",
    "print(f'{X.shape=} {Xnames=} {T.shape=} {Tnames=}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Repeat after me... plot early and often!  We can never have too many visualizations.  This next plot verifies that we have defined `X` and `T` correctly.  What else do you notice?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.plot(X, T)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Whoa!  That's a mess.  Why?\n",
    "\n",
    "Let's instead plot a dot marker at each data point. And never, never forget to label the $x$ and $y$ axes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:49:49.980885Z",
     "start_time": "2023-08-21T19:49:49.788005Z"
    }
   },
   "outputs": [],
   "source": [
    "plt.plot(X, T, '.')\n",
    "plt.xlabel(Xnames[0])\n",
    "plt.ylabel(Tnames[0]);  # semi-colon here prevents printing the cryptic result of call to plt.ylabel()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Well, what do you think?  Will we be able to predict `CO` from `Hour` with a linear model?  The predictions of a linear model must appear as a straight line in this plot."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## A linear model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "new_topic('A linear model')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What is a linear model?  \n",
    "\n",
    "Well, a linear model of one variable is specified with a y-intercept and a slope.  These are the two parameters of the linear model.  Let's call them `w0` and `w1`.  If the output of the linear model is `y`, then we have `y = w0 + x * w1`.  LaTeX makes a nice mathy representation of this.\n",
    "\n",
    "$$f(x) = w_0 + x\\, w_1$$\n",
    "\n",
    "Let's wrap this up in a little function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:49:50.573280Z",
     "start_time": "2023-08-21T19:49:50.570769Z"
    }
   },
   "outputs": [],
   "source": [
    "def linear_model(x, w0, w1):\n",
    "    return w0 + x * w1"
   ]
  },
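  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick check, here is a sketch that copies `linear_model` under the hypothetical name `linear_model_demo` and applies it to a single hour and to a small column of hours.  NumPy broadcasting lets the same function accept a scalar or an `(N, 1)` array like `X`.  The weights 0.4 and 0.3 are just sample guesses.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def linear_model_demo(x, w0, w1):  # same formula as linear_model above\n",
    "    return w0 + x * w1\n",
    "\n",
    "print(linear_model_demo(9, 0.4, 0.3))          # 0.4 + 9 * 0.3, roughly 3.1\n",
    "hours = np.array([0, 9, 23]).reshape((-1, 1))  # column of hours, like X\n",
    "Y_demo = linear_model_demo(hours, 0.4, 0.3)    # one prediction per row\n",
    "print(Y_demo.shape)                            # (3, 1), same shape as hours\n",
    "```"
   ]
  },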
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So, what values should we use for `w0` and `w1` to make a good prediction of `CO`?  What method shall we use to find good values?  How about good old guessing, or maybe we could call this trial and error.  We will just pick some values and plot the predictions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Optimizing weights of linear model with manual guessing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "new_topic('Optimizing weights of linear model with manual guessing')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:49:51.657192Z",
     "start_time": "2023-08-21T19:49:51.491191Z"
    }
   },
   "outputs": [],
   "source": [
    "w0 = 0\n",
    "w1 = 1\n",
    "\n",
    "Y = linear_model(X, w0, w1)\n",
    "\n",
    "plt.plot(X, T, '.', label='Actual CO')\n",
    "plt.plot(X, Y, 'r.-', label='Predicted CO')\n",
    "plt.xlabel(Xnames[0])\n",
    "plt.ylabel(Tnames[0])\n",
    "plt.legend();  # make legend using the label strings"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Well, clearly our predictions are climbing much too quickly.  The slope, or `w1`, is too high.  Try a smaller value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:49:52.458138Z",
     "start_time": "2023-08-21T19:49:52.275781Z"
    }
   },
   "outputs": [],
   "source": [
    "w1 = 0.1\n",
    "Y = linear_model(X, w0, w1)\n",
    "\n",
    "plt.plot(X, T, '.', label='Actual CO')\n",
    "plt.plot(X, Y, 'r.-', label='Predicted CO')\n",
    "plt.xlabel(Xnames[0])\n",
    "plt.ylabel(Tnames[0])\n",
    "plt.legend(); "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Maybe too low."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:49:53.785598Z",
     "start_time": "2023-08-21T19:49:53.603530Z"
    }
   },
   "outputs": [],
   "source": [
    "w1 = 0.3\n",
    "Y = linear_model(X, w0, w1)\n",
    "\n",
    "plt.plot(X, T, '.', label='Actual CO')\n",
    "plt.plot(X, Y, 'r.-', label='Predicted CO')\n",
    "plt.xlabel(Xnames[0])\n",
    "plt.ylabel(Tnames[0])\n",
    "plt.legend(); "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Okay.  Now we can try to increase the y-intercept, `w0`, a bit."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:49:54.731291Z",
     "start_time": "2023-08-21T19:49:54.547800Z"
    }
   },
   "outputs": [],
   "source": [
    "w0 = 0.4\n",
    "Y = linear_model(X, w0, w1)\n",
    "\n",
    "plt.plot(X, T, '.', label='Actual CO')\n",
    "plt.plot(X, Y, 'r.-', label='Predicted CO')\n",
    "plt.xlabel(Xnames[0])\n",
    "plt.ylabel(Tnames[0])\n",
    "plt.legend(); "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We could do this all day.  (Well, maybe a few more times.)  \n",
    "\n",
    "What we really need is a way to quantify how well our linear model is doing.  Let's define a function to calculate the error by calling `linear_model` to get our predictions, `Y`, and comparing them to the target `T` values.  The comparison will use the common root-mean-square error, or RMSE: the difference between `T` and `Y` is squared, averaged, and the square root of the result is returned."
   ]
  },
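  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In symbols, for $N$ samples with targets $T_n$ and predictions $Y_n$,\n",
    "\n",
    "$$\\text{RMSE} = \\sqrt{\\frac{1}{N}\\sum_{n=1}^{N} (T_n - Y_n)^2}$$\n",
    "\n",
    "Here is a small worked example with made-up numbers that follows those steps one at a time.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "T_demo = np.array([[1.0], [2.0], [3.0]])  # made-up targets\n",
    "Y_demo = np.array([[1.0], [2.0], [5.0]])  # made-up predictions\n",
    "\n",
    "errors = T_demo - Y_demo          # [0, 0, -2]\n",
    "squared = errors ** 2             # [0, 0, 4]\n",
    "mean_squared = np.mean(squared)   # 4 / 3\n",
    "rmse_value = np.sqrt(mean_squared)\n",
    "print(rmse_value)                 # about 1.1547\n",
    "```\n",
    "\n",
    "Squaring makes every error count positively, and taking the square root at the end puts the result back in the units of `CO`."
   ]
  },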
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:49:55.463124Z",
     "start_time": "2023-08-21T19:49:55.460024Z"
    }
   },
   "outputs": [],
   "source": [
    "def rmse(X, T, w0, w1):\n",
    "    Y = linear_model(X, w0, w1)\n",
    "    return np.sqrt(np.mean((T - Y)**2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we are ready to automate the guessing approach we attempted above.  Let's define a search algorithm that\n",
    "\n",
    "   - bumps a weight up and down to determine which direction decreases the error,\n",
    "   - repeatedly shifts that weight in that direction until the error stops decreasing,\n",
    "   - repeats these steps with the other weight, and\n",
    "   - repeats all of these steps multiple times.\n",
    "   \n",
    "This search algorithm is sometimes called [coordinate descent](https://en.wikipedia.org/wiki/Coordinate_descent)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Optimizing weights with Coordinate Descent"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "new_topic('Optimizing weights with Coordinate Descent')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, let's modify `w0`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:49:57.042867Z",
     "start_time": "2023-08-21T19:49:57.036031Z"
    }
   },
   "outputs": [],
   "source": [
    "w0 = 0.4   # Initial guess at weight values\n",
    "w1 = 0.5\n",
    "\n",
    "dw = 0.1   # How much to change a weight's value on each step.\n",
    "\n",
    "current_error = rmse(X, T, w0, w1)\n",
    "up_error = rmse(X, T, w0 + dw, w1)\n",
    "down_error = rmse(X, T, w0 - dw, w1)\n",
    "\n",
    "if down_error < current_error:\n",
    "    dw = -dw\n",
    "    new_error = down_error\n",
    "else:\n",
    "    new_error = up_error\n",
    "    \n",
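    "while new_error < current_error:  # step only while the error strictly decreases\n",
    "    current_error = new_error\n",
    "    w0 = w0 + dw                    # accept the improving step\n",
    "    print(f'{w0=:5.2f} {current_error=:.5f}')\n",
    "    new_error = rmse(X, T, w0 + dw, w1)  # error one step further; stop if it is worse"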
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's modify $w_1$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:49:57.946008Z",
     "start_time": "2023-08-21T19:49:57.940504Z"
    }
   },
   "outputs": [],
   "source": [
    "dw = 0.1\n",
    "current_error = rmse(X, T, w0, w1)\n",
    "up_error = rmse(X, T, w0, w1 + dw)\n",
    "down_error = rmse(X, T, w0, w1 - dw)\n",
    "\n",
    "if down_error < current_error:\n",
    "    dw = -dw\n",
    "    new_error = down_error\n",
    "else:\n",
    "    new_error = up_error\n",
    "    \n",
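    "while new_error < current_error:  # step only while the error strictly decreases\n",
    "    current_error = new_error\n",
    "    w1 = w1 + dw                    # accept the improving step\n",
    "    print(f'{w1=:5.2f} {current_error=:.5f}')\n",
    "    new_error = rmse(X, T, w0, w1 + dw)  # error one step further; stop if it is worse"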
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lots of repeated code here.  We don't want to copy and paste for each iteration.  All steps are put together in the following function.  Let's collect the RMSE after each update in a list named `error_sequence`, and also the values of `w0` and `w1` after each update in a list named `W_sequence`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:49:58.714784Z",
     "start_time": "2023-08-21T19:49:58.708200Z"
    }
   },
   "outputs": [],
   "source": [
    "def coordinate_descent(errorF, X, T, w0, w1, dw, nSteps):\n",
    "    \n",
    "    step = 0\n",
    "    current_error = errorF(X, T, w0, w1)\n",
    "    error_sequence = [current_error]\n",
    "    W_sequence = [[w0, w1]]\n",
    "    changed = False\n",
    "\n",
    "    while step < nSteps:\n",
    "\n",
    "        step += 1\n",
    "        \n",
    "        if not changed:\n",
    "            dw = dw * 0.1\n",
    "            \n",
    "        changed = False\n",
    "        \n",
    "        # Iteratively update w0, until no improvement.\n",
    "\n",
    "        up_error = errorF(X, T, w0 + dw, w1)\n",
    "        down_error = errorF(X, T, w0 - dw, w1)\n",
    "        \n",
    "        if down_error < current_error:\n",
    "            dw = -dw\n",
    "            \n",
    "        while True:\n",
    "            new_w0 = w0 + dw\n",
    "            new_error = errorF(X, T, new_w0, w1)\n",
    "            if new_error >= current_error or step > nSteps:\n",
    "                break\n",
    "            changed = True\n",
    "            w0 = new_w0\n",
    "            W_sequence.append([w0, w1])\n",
    "            error_sequence.append(new_error)\n",
    "            current_error = new_error\n",
    "            step += 1\n",
    "\n",
    "        # Now iteratively update w1, until no improvement.\n",
    "        \n",
    "        up_error = errorF(X, T, w0, w1 + dw)\n",
    "        down_error = errorF(X, T, w0, w1 - dw)\n",
    "        \n",
    "        if down_error < current_error:\n",
    "            dw = -dw\n",
    "            \n",
    "        while True:\n",
    "            new_w1 = w1 + dw\n",
    "            new_error = errorF(X, T, w0, new_w1)\n",
    "            if new_error >= current_error or step > nSteps:\n",
    "                break\n",
    "            changed = True\n",
    "            w1 = new_w1\n",
    "            W_sequence.append([w0, w1])\n",
    "            error_sequence.append(new_error)\n",
    "            current_error = new_error\n",
    "            step += 1\n",
    "\n",
    "    # When nSteps have been taken, return the two weights and the two lists of sequences.\n",
    "    return w0, w1, error_sequence, W_sequence"
   ]
  },
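  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The function above is written for exactly two weights.  As a sketch of how the same idea extends to a weight vector of any length, here is a hypothetical `coordinate_descent_nd` that cycles through the weights one at a time, stepping each in whichever direction lowers the error until it stops improving, and shrinking the step size when a full pass makes no progress.  It is run on a made-up bowl-shaped error function, not on the air quality data.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def coordinate_descent_nd(error_f, w, step=1.0, n_passes=100, shrink=0.5):\n",
    "    # Hypothetical generalization of coordinate_descent to any number of weights.\n",
    "    w = np.array(w, dtype=float)\n",
    "    current = error_f(w)\n",
    "    for _ in range(n_passes):\n",
    "        improved = False\n",
    "        for i in range(len(w)):              # adjust one coordinate at a time\n",
    "            for direction in (step, -step):  # try increasing, then decreasing\n",
    "                while True:\n",
    "                    trial = w.copy()\n",
    "                    trial[i] += direction\n",
    "                    trial_error = error_f(trial)\n",
    "                    if trial_error >= current:\n",
    "                        break                # this direction no longer helps\n",
    "                    w, current = trial, trial_error\n",
    "                    improved = True\n",
    "        if not improved:\n",
    "            step *= shrink                   # no progress this pass: use smaller steps\n",
    "    return w, current\n",
    "\n",
    "def error_f(w):\n",
    "    # Made-up bowl-shaped error with its minimum at w = [1, -2, 3]\n",
    "    return np.sum((w - np.array([1.0, -2.0, 3.0])) ** 2)\n",
    "\n",
    "w_best, e_best = coordinate_descent_nd(error_f, [0.0, 0.0, 0.0])\n",
    "print(w_best, e_best)\n",
    "```\n",
    "\n",
    "Here `n_passes` counts full sweeps over all weights, unlike `nSteps` above, which counts individual weight updates.  Each sweep still touches only one weight per update, which is why this approach scales poorly to models with very many weights."
   ]
  },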
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will need some functions to help us create plots showing the error going down and the sequence of weight values that were tried. *Read through this code.  It will be very helpful to fully understand this code.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:49:59.548647Z",
     "start_time": "2023-08-21T19:49:59.539626Z"
    }
   },
   "outputs": [],
   "source": [
    "def plot_sequence(error_sequence, W_sequence, label):\n",
    "    plt.subplot(1, 2, 1)\n",
    "    plt.plot(error_sequence, 'o-', label=label)\n",
    "    plt.xlabel('Steps')\n",
    "    plt.ylabel('Error')\n",
    "    plt.legend()\n",
    "    plt.subplot(1, 2, 2)\n",
    "    W_sequence = np.array(W_sequence)\n",
    "    plt.plot(W_sequence[:, 0], W_sequence[:, 1], '.-', label=label)\n",
    "    plot_error_surface()\n",
    "\n",
    "def plot_error_surface():\n",
    "    n = 20\n",
    "    w0s = np.linspace(-5, 5, n) \n",
    "    w1s = np.linspace(-0.5, 1.0, n) \n",
    "    w0s, w1s = np.meshgrid(w0s, w1s)\n",
    "    surface = []\n",
    "    for w0i in range(n):\n",
    "        for w1i in range(n):\n",
    "            surface.append(rmse(X, T, w0s[w0i, w1i], w1s[w0i, w1i]))\n",
    "    plt.contourf(w0s, w1s, np.array(surface).reshape((n, n)), cmap='bone')\n",
    "    # plt.colorbar()\n",
    "    plt.xlabel('w_bias')\n",
    "    plt.ylabel('w')\n",
    "    \n",
    "def show_animation(model, error_sequence, W_sequence, X, T, label):\n",
    "    W_sequence = np.array(W_sequence)\n",
    "    fig = plt.figure(figsize=(15, 8))\n",
    "    plt.subplot(1, 3, 1)\n",
    "    error_line, = plt.plot([], [])\n",
    "    plt.xlim(0, len(error_sequence))\n",
    "    plt.ylim(0, max(error_sequence))\n",
    "\n",
    "    plt.subplot(1, 3, 2)\n",
    "    plot_error_surface()\n",
    " \n",
    "    w_line, = plt.plot([], [], 'y.-', label=label)\n",
    "    plt.legend()\n",
    "\n",
    "    plt.subplot(1, 3, 3)\n",
    "    plt.plot(X, T, 'o')\n",
    "    model_line, = plt.plot([], [], 'r-', lw=3, alpha=0.5, label=label)\n",
    "    plt.xlim(0, 24)\n",
    "    plt.ylim(np.min(T), np.max(T))\n",
    "\n",
    "    for i in range(len(W_sequence)):\n",
    "        \n",
    "        error_line.set_data(range(i), error_sequence[:i])\n",
    "        w_line.set_data(W_sequence[:i, 0], W_sequence[:i, 1])\n",
    "        Y = model(X, W_sequence[i, 0], W_sequence[i, 1])\n",
    "        model_line.set_data(X, Y)\n",
    "\n",
    "        #plt.pause(0.001)\n",
    "\n",
    "        clear_output(wait=True)\n",
    "        display(fig)\n",
    "    clear_output(wait=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's try these functions to illustrate coordinate descent, given the initial values of `w0` and `w1`, and the parameter values that control the optimization, `nSteps` and `dw`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:50:01.121116Z",
     "start_time": "2023-08-21T19:50:01.108313Z"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "w0 = -2\n",
    "w1 = 0.5\n",
    "nSteps = 200\n",
    "dw = 10\n",
    "w0, w1, error_sequence, W_sequence = coordinate_descent(rmse, X, T, w0, w1, dw, nSteps)\n",
    "print(f'Coordinate Descent: Error is {rmse(X, T, w0, w1):.2f}   W is {w0:.2f}, {w1:.2f}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Well, did we succeed?   Hard to know from these three numbers.  Let's plot stuff to get a better understanding."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:50:21.891170Z",
     "start_time": "2023-08-21T19:50:03.248623Z"
    },
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "show_animation(linear_model, error_sequence, W_sequence, X, T, 'coord desc')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Okay, well that's fun, but updating one weight at a time becomes kind of silly when we try to apply this to models with many more weights, like thousands or millions.  Instead, we need a way to find a direction in which we can change both weights, meaning all weights, on each step."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Optimizing weights with Run and Twiddle"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "new_topic('Optimizing weights with Run and Twiddle')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How about this?  Take a step in some direction.  If error decreases, continue in that direction.  If error does not decrease, pick a random direction.  Repeat.\n",
    "\n",
    "This has been called \"run and twiddle\", or \"run and tumble\".  [This Wikipedia page](https://en.wikipedia.org/wiki/Flagellum#Motor) describes how single-celled organisms such as bacteria use rotating flagella to move: when the flagella turn in a coordinated fashion the cell swims forward in its current direction (a run), and when the rotation reverses the cell tumbles and heads off in a new, random direction.\n",
    "\n",
    "Now we will be changing `w0` and `w1` together, so that we can step through the two-dimensional weight space in various directions.  Representing our two weights as a two-component vector, and rewriting some functions to accept a vector, simplifies the code a bit.  We will actually represent the weights, `W`, as a column matrix of two components.  It will look like\n",
    "\n",
    "$$W =\\begin{bmatrix} w_0  \\\\ w_1  \\end{bmatrix}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's add a couple of functions to apply our model to data and to calculate the RMSE that we wish to minimize.  Then we will slightly modify the plotting functions to accept the name of the model function.  We will use a different model towards the end of these notes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def linear_model(X, W):\n",
    "    # W is a column vector: W[0, :] is the bias weight w0, W[1:, :] holds the remaining weights\n",
    "    return W[0, :] + X @ W[1:, :]\n",
    "\n",
    "def rmse(model, X, T, W):\n",
    "    Y = model(X, W)\n",
    "    return np.sqrt(np.mean((T - Y)**2))"
   ]
  },
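  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To confirm that the new matrix form computes the same thing as the earlier `w0 + x * w1`, here is a short sketch with made-up weights.  It copies the new `linear_model` under the hypothetical name `linear_model_matrix` so it can be compared directly with the scalar formula.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def linear_model_matrix(X, W):  # same body as the new linear_model\n",
    "    return W[0, :] + X @ W[1:, :]\n",
    "\n",
    "X_demo = np.array([[0.0], [9.0], [23.0]])  # column of hours, shape (3, 1)\n",
    "W_demo = np.array([[0.4], [0.3]])          # column vector [w0, w1], shape (2, 1)\n",
    "\n",
    "Y_matrix = linear_model_matrix(X_demo, W_demo)  # (1,) + (3, 1) @ (1, 1) -> (3, 1)\n",
    "Y_scalar = 0.4 + X_demo * 0.3                   # the original two-weight formula\n",
    "print(Y_matrix.shape, np.allclose(Y_matrix, Y_scalar))\n",
    "```\n",
    "\n",
    "Writing the model as a matrix product means the same code would handle more input columns, with one extra row in `W` per column of `X`."
   ]
  },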
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:50:22.643617Z",
     "start_time": "2023-08-21T19:50:22.638984Z"
    }
   },
   "outputs": [],
   "source": [
    "def vector_length(v):\n",
    "    return np.sqrt(v.T @ v)\n",
    "\n",
    "def run_and_twiddle(model_f, rmse_f, X, T, W, dW, nSteps, verbose=False):\n",
    "    step = 0\n",
    "    current_error = rmse_f(model_f, X, T, W)\n",
    "    error_sequence = [current_error]\n",
    "    W_sequence = [W.flatten()]\n",
    "    nFails = 0\n",
    "    \n",
    "    while step < nSteps:\n",
    "        new_direction = np.random.uniform(-1, 1, size=W.shape)  # random direction with the same shape as W\n",
    "        if verbose:\n",
    "            print(f'{step=} {nFails=} {new_direction=}')\n",
    "        new_direction = dW * new_direction / vector_length(new_direction)\n",
    "        \n",
    "        if nFails > 10:\n",
    "            dW = dW * 0.8\n",
    "            \n",
    "        while step < nSteps:\n",
    "            new_W = W.copy() + new_direction               # Why call copy() here?\n",
    "            new_error = rmse_f(model_f, X, T, new_W)\n",
    "            step += 1\n",
    "            if new_error >= current_error:\n",
    "                nFails += 1\n",
    "                break\n",
    "            nFails = 0\n",
    "            if verbose:\n",
    "                print(f'good direction {new_direction=}')\n",
    "            W = new_W\n",
    "            W_sequence.append(W.flatten())\n",
    "            error_sequence.append(new_error)\n",
    "            current_error = new_error\n",
    "\n",
    "    return W, error_sequence, W_sequence"
   ]
  },
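  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The key step in `run_and_twiddle` is turning a random vector into a direction of a fixed length `dW`.  Here is a small sketch of just that step.  It uses `np.linalg.norm` in place of `vector_length` and a seeded generator so the result is repeatable; the names are illustrative.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "dW = 0.5  # desired step length\n",
    "\n",
    "direction = rng.uniform(-1, 1, size=(2, 1))                # random column vector, like new_direction\n",
    "step_vector = dW * direction / np.linalg.norm(direction)  # rescale it to length dW\n",
    "\n",
    "print(step_vector.ravel(), np.linalg.norm(step_vector))   # the length equals dW, up to rounding\n",
    "```\n",
    "\n",
    "Dividing by the length keeps every step the same size no matter which random direction is drawn, so `dW` alone controls how far each run step moves."
   ]
  },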
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:51:01.907451Z",
     "start_time": "2023-08-21T19:51:01.899260Z"
    },
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "def plot_error_surface(model):\n",
    "    n = 20\n",
    "    wbiass = np.linspace(-5, 5, n)\n",
    "    ws = np.linspace(-0.5, 1.0, n)\n",
    "    wbiass, ws = np.meshgrid(wbiass, ws)\n",
    "    surface = []\n",
    "    for wbi in range(n):\n",
    "        for wi in range(n):\n",
    "            W = np.array([wbiass[wbi, wi], ws[wbi, wi]]).reshape(-1, 1)\n",
    "            surface.append(rmse(model, X, T, W))\n",
    "    plt.contourf(wbiass, ws, np.array(surface).reshape((n, n)), cmap='bone')\n",
    "    # plt.colorbar()\n",
    "    plt.xlabel('w_bias')\n",
    "    plt.ylabel('w')\n",
    "    \n",
    "def show_animation(model, error_sequence, W_sequence, X, T, label):\n",
    "    W_sequence = np.array(W_sequence)\n",
    "    fig = plt.figure(figsize=(15, 8))\n",
    "    plt.subplot(1, 3, 1)\n",
    "    error_line, = plt.plot([], [])\n",
    "    plt.xlim(0, len(error_sequence))\n",
    "    plt.ylim(0, max(error_sequence))\n",
    "\n",
    "    plt.subplot(1, 3, 2)\n",
    "    plot_error_surface(model)\n",
    " \n",
    "    w_line, = plt.plot([], [], 'y.-', label=label)\n",
    "    plt.legend()\n",
    "\n",
    "    plt.subplot(1, 3, 3)\n",
    "    plt.plot(X, T, 'o')\n",
    "    model_line, = plt.plot([], [], 'r-', lw=3, alpha=0.5, label=label)\n",
    "    plt.xlim(0, 24)\n",
    "    plt.ylim(np.min(T), np.max(T))\n",
    "\n",
    "    for i in range(len(W_sequence)):\n",
    "        \n",
    "        error_line.set_data(range(i), error_sequence[:i])\n",
    "        w_line.set_data(W_sequence[:i, 0], W_sequence[:i, 1])\n",
    "        Y = model(X, W_sequence[i:i + 1, :].T)\n",
    "        model_line.set_data(X, Y)\n",
    "\n",
    "        # plt.pause(0.001)\n",
    "\n",
    "        clear_output(wait=True)\n",
    "        display(fig)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:51:02.728794Z",
     "start_time": "2023-08-21T19:51:02.711323Z"
    },
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "w0 = -2\n",
    "w1 = 0.5\n",
    "W = np.array([w0, w1]).reshape(-1, 1)\n",
    "\n",
    "nSteps = 400\n",
    "dW = 10\n",
    "\n",
    "W, error_sequence, W_sequence = run_and_twiddle(linear_model, rmse, X, T, W, dW, nSteps)\n",
    "print('Run and Twiddle:  Error is {:.2f}   W is {:.2f}, {:.2f}'.format(rmse(linear_model, X, T, W), W[0,0], W[1,0]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:51:15.609145Z",
     "start_time": "2023-08-21T19:51:03.311652Z"
    }
   },
   "outputs": [],
   "source": [
    "show_animation(linear_model, error_sequence, W_sequence, X, T, 'run & twiddle')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Optimizing weights with Stochastic Gradient Descent"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "new_topic('Optimizing weights with Stochastic Gradient Descent')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's call the output of our model `Y` and the error being minimized `E`.\n",
    "\n",
    "To perform gradient descent, we need $\\frac{\\partial E}{\\partial W}$, which we will call `dEdW`. Using the chain rule, we can split its calculation into two factors.\n",
    "\n",
    "$$\\begin{align*}\n",
    "  \\frac{\\partial E}{\\partial W} &= \\frac{\\partial E}{\\partial Y} \\frac{\\partial Y}{\\partial W}\n",
    "  \\end{align*}$$\n",
    "  \n",
    "For a single sample, the error we want to minimize is the squared error, $(T - Y)^2$, and $Y = X W$, so\n",
    "\n",
    "$$\\begin{align*}\n",
    "  \\frac{\\partial E}{\\partial W} &= \\frac{\\partial E}{\\partial Y} \\frac{\\partial Y}{\\partial W} \\\\\n",
    "  \\frac{\\partial E}{\\partial W} &= \\frac{\\partial (T-Y)^2}{\\partial Y} \\frac{\\partial X W}{\\partial W} \\\\\n",
    "  \\frac{\\partial E}{\\partial W} &= -2 (T - Y) X \n",
    "    \\end{align*}$$\n",
    "    \n",
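    "In Python, computing this for all samples at once, we have\n",
    "\n",
    "    dYdW = X\n",
    "    dEdY = -2 * (T - Y)\n",
    "    dEdW = dEdY.T @ dYdW\n",
    "    \n",
    "with two adjustments: we prepend a column of 1s to `X` so that the bias weight $w_0$ is included in the calculation, and we divide by the number of samples, because the error we actually minimize is the *mean* of the squared errors over all samples.\n",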
    "    "
   ]
  },
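  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A quick way to check a hand-derived gradient is to compare it with a finite-difference estimate. This estimate nudges each weight by a tiny amount `h` and measures how much the error changes. The minimal sketch below runs the check on a small, randomly generated dataset rather than the air quality data. The names `X_toy`, `T_toy`, `W_toy`, `mse`, `analytic_grad`, and `numeric_grad` are hypothetical and exist only for this check. The sketch assumes `E` is the *mean* squared error, matching the division by the number of samples in `dEdW`.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Hypothetical toy data (not the air quality data): 5 samples, 1 input variable\n",
    "rng = np.random.default_rng(0)\n",
    "X_toy = rng.uniform(0, 24, size=(5, 1))\n",
    "T_toy = rng.uniform(0, 10, size=(5, 1))\n",
    "W_toy = np.array([[-2.0], [0.5]])\n",
    "\n",
    "def mse(X, T, W):\n",
    "    # Mean squared error of the linear model y = w0 + w1 * x\n",
    "    Y = W[0, 0] + X @ W[1:, :]\n",
    "    return np.mean((T - Y) ** 2)\n",
    "\n",
    "def analytic_grad(X, T, W):\n",
    "    # dE/dW = (dE/dY)^T (dY/dW) / N, with a column of 1s prepended to X for w0\n",
    "    X1 = np.insert(X, 0, 1, axis=1)\n",
    "    Y = X1 @ W\n",
    "    return (-2 * (T - Y)).T @ X1 / X.shape[0]\n",
    "\n",
    "def numeric_grad(X, T, W, h=1e-6):\n",
    "    # Central finite differences, perturbing one weight at a time\n",
    "    grad = np.zeros((1, W.shape[0]))\n",
    "    for j in range(W.shape[0]):\n",
    "        W_plus, W_minus = W.copy(), W.copy()\n",
    "        W_plus[j, 0] += h\n",
    "        W_minus[j, 0] -= h\n",
    "        grad[0, j] = (mse(X, T, W_plus) - mse(X, T, W_minus)) / (2 * h)\n",
    "    return grad\n",
    "\n",
    "print(analytic_grad(X_toy, T_toy, W_toy))\n",
    "print(numeric_grad(X_toy, T_toy, W_toy))\n",
    "```\n",
    "\n",
    "If the derivation is right, the two printed rows should agree to several decimal places. The squared error is quadratic in `W`, so central differences are exact here apart from floating-point rounding."
   ]
  },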
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-07-27T20:42:55.468691Z",
     "start_time": "2021-07-27T20:42:55.465943Z"
    }
   },
   "outputs": [],
   "source": [
    "# Still using linear_model as defined above\n",
    "\n",
    "#def linear_model(X, W):\n",
    "#    # W is column vector\n",
    "#    return W[0,:] + X @ W[1:, :]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:51:20.898636Z",
     "start_time": "2023-08-21T19:51:20.895278Z"
    }
   },
   "outputs": [],
   "source": [
    "# Gradient of Y with respect to W\n",
    "def dYdW(X, T, W):\n",
    "    # One row per sample in X, T.  One column per W component.\n",
    "    # First column is the constant 1 (the derivative of Y with respect to w0).\n",
    "    # Second column is the value of X (the derivative of Y with respect to w1).\n",
    "    return np.insert(X, 0, 1, axis=1)\n",
    "\n",
    "# Gradient of E with respect to Y\n",
    "def dEdY(X, T, W):\n",
    "    Y = linear_model(X, W)\n",
    "    return -2 * (T - Y)\n",
    "    \n",
    "# Gradient of E with respect to W.\n",
    "def dEdW(X, T, W):\n",
    "    result = dEdY(X, T, W).T @ dYdW(X, T, W) / (X.shape[0])\n",
    "    return result.T"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
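    "Now we can define a function, `sgd`, that optimizes the weights by repeatedly taking a small step in the direction of the negative gradient. We call it SGD, the name this method commonly goes by. Strictly speaking, though, our version is full-batch gradient descent: it computes the gradient from all samples at every step, while *stochastic* gradient descent estimates the gradient from a randomly chosen subset of samples."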
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:51:21.628801Z",
     "start_time": "2023-08-21T19:51:21.624897Z"
    }
   },
   "outputs": [],
   "source": [
    "def sgd(model_f, gradient_f, rmse_f, X, T, W, learning_rate, nSteps):\n",
    "    error_sequence = []\n",
    "    W_sequence = []\n",
    "    for step in range(nSteps):\n",
    "        \n",
    "        error_sequence.append(rmse_f(model_f, X, T, W))\n",
    "        W_sequence.append(W.flatten())   # or W.ravel()\n",
    "        \n",
    "        W -= learning_rate * gradient_f(X, T, W)  # HERE IS THE WHOLE ALGORITHM!!\n",
    "        \n",
    "    return W, error_sequence, W_sequence"
   ]
  },
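  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the minibatch form of SGD, each step estimates the gradient from a random subset of the samples rather than from all of them. This makes each step cheaper on large datasets, at the cost of a noisier gradient. The minimal sketch below applies the idea to the same linear model and mean squared error. The function `minibatch_sgd`, its `batch_size`, `n_epochs`, and `seed` parameters, and the toy data drawn from a known line are all hypothetical and not part of this notebook.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def minibatch_sgd(X, T, W, learning_rate, n_epochs, batch_size, seed=0):\n",
    "    # Gradient descent on the mean squared error of y = w0 + w1 * x,\n",
    "    # using a random subset (minibatch) of the samples for each step\n",
    "    rng = np.random.default_rng(seed)\n",
    "    X1 = np.insert(X, 0, 1, axis=1)         # prepend a column of 1s for w0\n",
    "    W = W.astype(float)                     # astype returns a copy, so the caller's W is unchanged\n",
    "    n_samples = X1.shape[0]\n",
    "    for epoch in range(n_epochs):\n",
    "        order = rng.permutation(n_samples)  # new random order of samples each epoch\n",
    "        for start in range(0, n_samples, batch_size):\n",
    "            rows = order[start:start + batch_size]\n",
    "            Y = X1[rows] @ W\n",
    "            grad = X1[rows].T @ (-2 * (T[rows] - Y)) / len(rows)\n",
    "            W -= learning_rate * grad\n",
    "    return W\n",
    "\n",
    "# Toy data generated from t = 3 + 2 x plus a little noise\n",
    "rng = np.random.default_rng(1)\n",
    "X_toy = rng.uniform(0, 1, size=(200, 1))\n",
    "T_toy = 3 + 2 * X_toy + rng.normal(0, 0.01, size=(200, 1))\n",
    "\n",
    "W_fit = minibatch_sgd(X_toy, T_toy, np.zeros((2, 1)), learning_rate=0.1, n_epochs=200, batch_size=20)\n",
    "print(W_fit.ravel())  # should be close to [3, 2]\n",
    "```\n",
    "\n",
    "If `batch_size` equals the number of samples, each epoch is a single full-batch step, which is the same update that `sgd` makes."
   ]
  },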
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:51:22.607503Z",
     "start_time": "2023-08-21T19:51:22.552506Z"
    }
   },
   "outputs": [],
   "source": [
    "w0 = -2 \n",
    "w1 = 0.5\n",
    "W = np.array([w0, w1]).reshape(-1, 1)\n",
    "\n",
    "nSteps = 200\n",
    "learning_rate = 0.005\n",
    "\n",
    "W, error_sequence, W_sequence = sgd(linear_model, dEdW, rmse, X, T, W, learning_rate, nSteps)\n",
    "print('Gradient Descent:  Error is {:.2f}   W is {}'.format(rmse(linear_model, X, T, W), W))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:51:37.029346Z",
     "start_time": "2023-08-21T19:51:23.889189Z"
    }
   },
   "outputs": [],
   "source": [
    "show_animation(linear_model, error_sequence, W_sequence, X, T, 'SGD')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Optimizing weights with AdamW"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "new_topic('Optimizing weights with AdamW')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
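    "Now let's try a widely used variation of gradient descent called Adam, short for adaptive moment estimation. See [ADAM: A Method for Stochastic Optimization](https://arxiv.org/pdf/1412.6980.pdf) by Diederik P. Kingma and Jimmy Lei Ba. The function below implements AdamW, a variant of Adam that adds a weight decay term, which shrinks the weights toward zero at each step (Loshchilov and Hutter, *Decoupled Weight Decay Regularization*). Setting `W_decay_rate` to 0 gives the original Adam algorithm."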
   ]
  },
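  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Two details of the Adam update are easy to miss. First, the moment estimates `m` and `v` start at zero, so they are divided by `1 - beta1 ** (step+1)` and `1 - beta2 ** (step+1)` to remove their bias toward zero in the early steps. Second, dividing by `np.sqrt(vhat)` makes each step roughly `alpha` in size, whatever the scale of the gradient. The minimal sketch below isolates these two details for a single weight with no weight decay. `adam_steps` is a hypothetical helper that mirrors the update in `adamw` and is not part of it.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def adam_steps(grads, alpha=0.02, beta1=0.9, beta2=0.999, epsilon=1e-8):\n",
    "    # Returns the amount Adam would subtract from one weight at each step,\n",
    "    # given that weight's gradient at each step (no weight decay)\n",
    "    m, v = 0.0, 0.0\n",
    "    updates = []\n",
    "    for step, grad in enumerate(grads):\n",
    "        m = beta1 * m + (1 - beta1) * grad          # running mean of gradients\n",
    "        v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients\n",
    "        mhat = m / (1 - beta1 ** (step + 1))        # bias correction\n",
    "        vhat = v / (1 - beta2 ** (step + 1))\n",
    "        updates.append(alpha * mhat / (np.sqrt(vhat) + epsilon))\n",
    "    return np.array(updates)\n",
    "\n",
    "# The first update is close to alpha = 0.02 for both a tiny and a huge gradient\n",
    "print(adam_steps([0.001]), adam_steps([1000.0]))\n",
    "```\n",
    "\n",
    "Without the bias correction, the first update would be `alpha * 0.1 * grad / sqrt(0.001 * grad ** 2)`, about `3.16 * alpha` in size. The reason is that `sqrt(v)` starts at about 3% of its steady value, while `m` starts at 10% of its steady value."
   ]
  },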
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:51:41.731668Z",
     "start_time": "2023-08-21T19:51:41.726618Z"
    }
   },
   "outputs": [],
   "source": [
    "def adamw(model_f, gradient_f, rmse_f, X, T, W, learning_rate, nSteps):\n",
    "    \n",
    "    # Commonly used parameter values\n",
    "    alpha = learning_rate\n",
    "    beta1 = 0.9\n",
    "    beta2 = 0.999\n",
    "    epsilon = 1e-8\n",
    "    W_decay_rate = 0.1   # set to zero for adam algorithm\n",
    "    \n",
    "    m = 0\n",
    "    v = 0\n",
    "    \n",
    "    error_sequence = []\n",
    "    W_sequence = []\n",
    "    \n",
    "    for step in range(nSteps):\n",
    "        \n",
    "        error_sequence.append(rmse_f(model_f, X, T, W))\n",
    "        W_sequence.append(W.flatten())\n",
    "        \n",
    "        grad = gradient_f(X, T, W)\n",
    "        \n",
    "        m = beta1 * m + (1 - beta1) * grad\n",
    "        v = beta2 * v + (1 - beta2) * grad * grad\n",
    "\n",
    "        mhat = m / (1 - beta1 ** (step+1))\n",
    "        vhat = v / (1 - beta2 ** (step+1))\n",
    "        \n",
    "        W -= alpha * mhat / (np.sqrt(vhat) + epsilon) + W_decay_rate * W\n",
    "\n",
    "    return W, error_sequence, W_sequence"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:51:42.817168Z",
     "start_time": "2023-08-21T19:51:42.756686Z"
    }
   },
   "outputs": [],
   "source": [
    "w0 = -2\n",
    "w1 = 0.5\n",
    "W = np.array([w0, w1]).reshape(-1, 1)\n",
    "\n",
    "nSteps = 200\n",
    "learning_rate = 0.02\n",
    "\n",
    "W, error_sequence, W_sequence = adamw(linear_model, dEdW, rmse, X, T, W, learning_rate, nSteps)\n",
    "print('AdamW:  Error is {:.2f}   W is {:.2f}, {:.2f}'.format(rmse(linear_model, X, T, W), W[0,0], W[1,0]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:52:15.842859Z",
     "start_time": "2023-08-21T19:51:43.227522Z"
    }
   },
   "outputs": [],
   "source": [
    "show_animation(linear_model, error_sequence, W_sequence, X, T, 'AdamW')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Let's Try a Slightly Nonlinear Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "new_topic('Let\\'s Try a Slightly Nonlinear Model')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's make a quadratic model now.  The equation for the output of this model for the $i^{th}$ sample is\n",
    "\n",
    "$$ y_i = w_0 + w_1 x_i + w_2 x_i^2$$"
   ]
  },
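  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This equation is an ordinary quadratic polynomial in $x_i$, so NumPy's `np.polyval` can also evaluate it, which gives an independent check on a hand-written model. The input values and weights in the minimal sketch below are made up for illustration. Note that `np.polyval` takes the coefficients in the opposite order from our weights, highest power first.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Hypothetical example values, not the air quality data\n",
    "x_demo = np.array([[0.0], [1.0], [2.0], [3.0]])   # one row per sample\n",
    "w0_demo, w1_demo, w2_demo = 1.0, -2.0, 0.5\n",
    "\n",
    "# Direct translation of y_i = w0 + w1 x_i + w2 x_i^2\n",
    "Y_direct = w0_demo + w1_demo * x_demo + w2_demo * x_demo ** 2\n",
    "\n",
    "# np.polyval wants the coefficients from the highest power down to the constant\n",
    "Y_polyval = np.polyval([w2_demo, w1_demo, w0_demo], x_demo)\n",
    "\n",
    "print(np.hstack((Y_direct, Y_polyval)))   # both columns: 1.0, -0.5, -1.0, -0.5\n",
    "```"
   ]
  },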
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:56:42.166699Z",
     "start_time": "2023-08-21T19:56:42.158933Z"
    }
   },
   "outputs": [],
   "source": [
    "X[:10]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:56:43.198761Z",
     "start_time": "2023-08-21T19:56:43.190446Z"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "X[:10] ** 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:56:44.034433Z",
     "start_time": "2023-08-21T19:56:44.030026Z"
    }
   },
   "outputs": [],
   "source": [
    "X[:10] ** [1, 2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:56:45.185003Z",
     "start_time": "2023-08-21T19:56:45.182342Z"
    }
   },
   "outputs": [],
   "source": [
    "max_degree = 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:56:46.210725Z",
     "start_time": "2023-08-21T19:56:46.206350Z"
    }
   },
   "outputs": [],
   "source": [
    "X[:10] ** [0, 1, 2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:56:46.849860Z",
     "start_time": "2023-08-21T19:56:46.845898Z"
    }
   },
   "outputs": [],
   "source": [
    "range(max_degree)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:56:48.275193Z",
     "start_time": "2023-08-21T19:56:48.271214Z"
    }
   },
   "outputs": [],
   "source": [
    "list(range(max_degree))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:56:48.914691Z",
     "start_time": "2023-08-21T19:56:48.910548Z"
    }
   },
   "outputs": [],
   "source": [
    "list(range(max_degree + 1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:56:49.651201Z",
     "start_time": "2023-08-21T19:56:49.646718Z"
    }
   },
   "outputs": [],
   "source": [
    "X_powers = X ** range(max_degree + 1)\n",
    "X_powers[:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
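    "To build our nonlinear model, let's remove the separate addition of the `w0` weight. Instead, we include an input variable (a column of `X_powers`) that is a constant 1 for all samples (rows of `X_powers`). Raising `X` to the power 0 produces exactly this column, so `w0` becomes the weight that multiplies it."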
   ]
  },
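  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see that the constant column does the job of the bias weight, the minimal sketch below computes the same linear model in two ways with made-up inputs and weights. The first way adds an explicit `w0`, as `linear_model` does. The second uses a single matrix product with a column of 1s. The `_demo` names are hypothetical.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Hypothetical example values, not the air quality data\n",
    "x_demo = np.array([[2.0], [5.0], [7.0]])   # one row per sample\n",
    "W_demo = np.array([[-2.0], [0.5]])         # column vector [w0, w1]\n",
    "\n",
    "# Explicit bias term, in the style of linear_model\n",
    "Y_bias = W_demo[0, :] + x_demo @ W_demo[1:, :]\n",
    "\n",
    "# Constant-1 input column instead: x ** 0 = 1 and x ** 1 = x\n",
    "X_powers_demo = x_demo ** np.arange(2)\n",
    "Y_const = X_powers_demo @ W_demo\n",
    "\n",
    "print(np.hstack((Y_bias, Y_const)))   # both columns: -1.0, 0.5, 1.5\n",
    "```"
   ]
  },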
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:56:51.673556Z",
     "start_time": "2023-08-21T19:56:51.670595Z"
    }
   },
   "outputs": [],
   "source": [
    "def nonlinear_model(X_powers, W):\n",
    "    # W is column vector\n",
    "    return X_powers @ W"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:56:53.186170Z",
     "start_time": "2023-08-21T19:56:53.182732Z"
    }
   },
   "outputs": [],
   "source": [
    "def dYdW(X_powers, T, W):\n",
    "    return X_powers\n",
    "\n",
    "def dEdY(X_powers, T, W):\n",
    "    Y = nonlinear_model(X_powers, W)\n",
    "    return -2 * (T - Y)\n",
    "    \n",
    "# dEdW from before does not need to be changed: it calls dEdY and dYdW by name,\n",
    "# so it automatically uses these new definitions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-21T19:58:53.921416Z",
     "start_time": "2023-08-21T19:58:53.847384Z"
    }
   },
   "outputs": [],
   "source": [
    "max_degree = 2\n",
    "\n",
    "w0 = 0\n",
    "w1 = 0\n",
    "ws_nonlinear = np.zeros(max_degree - 1)\n",
    "\n",
    "W = np.hstack((w0, w1, *ws_nonlinear)).reshape(-1, 1)\n",
    "print(f'{W=}')\n",
    "\n",
    "learning_rate = 0.01\n",
    "nSteps = 400\n",
    "X_powers = X ** range(max_degree + 1)\n",
    "\n",
    "print(X_powers.shape)\n",
    "\n",
    "W, error_sequence, W_sequence = adamw(nonlinear_model, dEdW, rmse, X_powers, T, W, learning_rate, nSteps)\n",
    "print(f'AdamW:  Error is {rmse(nonlinear_model, X_powers, T, W):.2f}   W is {W}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Demonstrate    debugging    here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.plot(error_sequence);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-07-27T20:45:14.331343Z",
     "start_time": "2021-07-27T20:45:13.898503Z"
    }
   },
   "outputs": [],
   "source": [
    "plt.figure(figsize=(10,8))\n",
    "plt.plot(X + np.random.uniform(-0.1, 0.1, X.shape), T, '.', label='Training Data')\n",
    "\n",
    "plt.plot(X, nonlinear_model(X_powers, W), 'ro', label='Prediction on Training Data')\n",
    "\n",
    "plt.xlabel(Xnames[0])\n",
    "plt.ylabel(Tnames[0])\n",
    "\n",
    "plt.legend();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How would you change the previous code cell to plot a continuous red line?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-07-27T20:45:14.331343Z",
     "start_time": "2021-07-27T20:45:13.898503Z"
    }
   },
   "outputs": [],
   "source": [
    "plt.figure(figsize=(10,8))\n",
    "plt.plot(X + np.random.uniform(-0.1, 0.1, X.shape), T, '.', label='Training Data')\n",
    "\n",
    "plt.plot(X, nonlinear_model(X_powers, W), 'r', label='Prediction on Training Data')\n",
    "#   only change is here --------------------^\n",
    "plt.xlabel(Xnames[0])\n",
    "plt.ylabel(Tnames[0])\n",
    "\n",
    "plt.legend();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
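    "Whoops. `plt.plot` connects the points in the order they appear in `X`. Because the samples are not sorted by `X`, the line zigzags back and forth. Sorting the samples by `X` before plotting fixes this."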
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "plt.figure(figsize=(10,8))\n",
    "plt.plot(X, T, '.', label='Training Data')\n",
    "\n",
    "order = np.argsort(X, axis=0).ravel()  # change to 1-dimensional vector\n",
    "plt.plot(X[order], nonlinear_model(X_powers, W)[order], 'r', label='Prediction on Training Data')\n",
    "\n",
    "plt.xlabel(Xnames[0])\n",
    "plt.ylabel(Tnames[0])\n",
    "\n",
    "plt.legend();"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "formats": "ipynb,py:light"
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}