.. _p300_single_subject:

******************************
P300 Waves for Single Subjects
******************************

The Stimulus Channel
====================

Each data file includes EEG recorded during a visual stimulus protocol
designed to elicit P300 waves. In the single-letter protocol, subjects
looked at a computer screen in the center of which single letters were
briefly displayed sequentially, in a random order.  In the first
trial, they were asked to count the number of times the letter 'b'
appeared.  They were then asked to count the number of times the
letter 'd' appeared, and then again counting the number of times the
letter 'p' appeared.  These three trials were then repeated, but
instead of displaying a single letter, nine letters were
simultaneously displayed, in a three-by-three table.  Subjects were
asked to count the number of times the letter 'b' appeared in the
center of the table, and similarly the letter 'd', and the letter 'p'.

The letter being presented in the center of the computer screen is
indicated in the stimulus channel of the data.  For the g.tec
g.GAMMAsys data files, this is the ninth channel.  We can see this by
again plotting all nine channels, but this time of the seventh element
(index 6) in the data list, the one for which the protocol is
``letter-b``.

.. ipython::
   :suppress:

   In [0]: import os

   In [3]: if not os.path.isfile('s21-gammasys-gifford-unimpaired.json'):
      ...:    !wget http://www.cs.colostate.edu/eeg/data/json/zips/s21-gammasys-gifford-unimpaired.json.zip
      ...:    !unzip s21-gamma*zip
      ...:     

.. ipython::

   In [1]: import json

   In [3]: import numpy as np

   In [4]: import matplotlib.pyplot as plt

   In [2]: data = json.load(open('s21-gammasys-gifford-unimpaired.json','r'))

   In [5]: letterd = data[4]

   In [5]: letterd['protocol']

   # Note the transpose here.  Time now advances by row.  Columns are channels
   In [5]: eeg = np.array(letterd['eeg']['trial 1']).T

   In [1]: plt.ion()

   In [5]: plt.figure();

   In [91]: plt.plot(eeg[4000:5000,:9] + 80*np.arange(8,-1,-1));

   In [91]: plt.plot(np.zeros((1000,9)) + 80*np.arange(8,-1,-1),'--',color='gray');

   In [91]: plt.yticks([]);

   In [92]: plt.legend(letterd['channels'] + ['Stimulus']);

   @savefig p300plot1.png width=6in align=center
   In [93]: plt.axis('tight');

The stimulus channel has the values

.. ipython::

   In [1]: np.unique(eeg[:,8])

Its value is 0 only before the trial starts. Of the non-zero values, only
one is positive.  This represents the letter to be counted---the
target letter.  The negative values are nontarget letters, except for
-32, which is the ASCII code for a blank representing the times when
no letter is displayed and the computer screen is black. The
ASCII codes for all stimulus channel values can be converted to letters by

.. ipython::

   In [1]: [chr(int(n)) for n in np.abs(np.unique(eeg[:,8]))]


Segmenting Data into Time Windows
=================================

To investigate the presence or absence of P300 waves, we must segment
the data into time windows following each stimulus presentation.
First, let's take a closer look.  Plot just the data from the stimulus
channel (index 8) and Channel P3 (index 5).

.. ipython::

   In [1]: plt.clf();

   In [1]: plt.plot(eeg[3000:4500,[5,8]]);

   In [1]: plt.axis('tight');

   @savefig p300plot2.png width=6in align=center
   In [1]: plt.xlabel('Time Samples');

It appears that some P300 waves are present following the three
positive stimuli.  Remember that this is the target letter 'd'.  The
positive wave appears roughly 100 samples after the stimulus onset.
Since the sampling frequency is 256 Hz, this is about 0.4 seconds, or
400 milliseconds, after the stimulus.

.. ipython::

   In [1]: 100.0 / letterd['sample rate']

Now let's collect all of the segments from this letter 'd' trial.
First, find the sample indices where each stimulus starts, 
collect the displayed letters for each stimulus, and keep track of
which stimuli are the target letter.

.. ipython::

   In [1]: starts = np.where(np.diff(np.abs(eeg[:,-1])) > 0)[0]

   In [1]: stimuli = [chr(int(n)) for n in np.abs(eeg[starts+1,-1])]

   In [1]: targetLetter = letterd['protocol'][-1]

   In [1]: targetSegments = np.array(stimuli) == targetLetter

   In [1]: len(starts),len(stimuli),len(targetSegments)

   In [1]: for i,(s,stim,targ) in enumerate(zip(starts,stimuli,targetSegments)):
      ...:     print s,stim,targ,'; ',
      ...:     if (i+1) % 5 == 0:
      ...:         print
      ...:

Good. Now, how long should each time window be?  The number of samples
between each stimulus onset is

.. ipython::

   In [1]: np.diff(starts)

and the minimum number is

.. ipython::

   In [1]: np.min(np.diff(starts))

so let's use time windows of 210 samples. We can build a matrix with
rows being segments, and each row being all indices for that segment
by doing this.

.. ipython::

   In [1]: indices = np.array([ np.arange(s,s+210) for s in starts ])

   In [1]: indices.shape

And, finally, we can build an 80 x 210 x 8 array of segments.

.. ipython::

   In [1]: segments = eeg[indices,:8]

   In [1]: segments.shape


Grand Averages to Show P300 Waves
=================================

Typically, segments are averaged to minimize the effect of variations
across multiple stimuli and, hopefully, to emphasize the parts of the
signal that are common to the stimuli.  Let's average all of the
target segments and all of the nontarget segments for the P4 channel
(index 5), and plot two
resulting mean segments.

.. ipython::

   In [1]: segments[targetSegments,:,5].shape

   In [1]: targetMean = np.mean(segments[targetSegments,:,5],axis=0)

   In [1]: nontargetMean = np.mean(segments[targetSegments==False,:,5],axis=0)

   In [1]: plt.clf();

   In [1]: xs = np.arange(len(targetMean)) / 256.0 * 1000  # milliseconds

   In [1]: plt.plot(xs, np.vstack((targetMean,nontargetMean)).T);

   In [1]: plt.legend(('Target','Nontarget'));

   In [1]: plt.axis('tight');

   In [1]: plt.xlabel('Milliseconds After Stimulus');

   @savefig p300plot3.png width=6in align=center
   In [1]: plt.ylabel('Mean Voltage ($\mu$ V)');

Remember that the target mean is over 20 segments, while the nontarget
mean is over 60 segments.  If we balance these numbers by randomly
selecting 20 nontarget segments, the means look like

.. ipython::

   In [1]: targetMean = np.mean(segments[targetSegments,:,5],axis=0)

   In [1]: nontargetSegments = segments[targetSegments==False,:,5]

   In [1]: selectThese = np.random.permutation(range(nontargetSegments.shape[0]))[:20]

   In [1]: nontargetMean = np.mean(nontargetSegments[selectThese,:],axis=0)

   In [1]: plt.clf();

   In [1]: plt.plot(xs, np.vstack((targetMean,nontargetMean)).T);

   In [1]: plt.legend(('Target','Nontarget'));

   In [1]: plt.axis('tight');

   In [1]: plt.xlabel('Milliseconds After Stimulus');

   @savefig p300plot5.png width=6in align=center
   In [1]: plt.ylabel('Mean Voltage ($\mu$ V)');

Running the above code again, with a different random selection of
nontarget trials, results in a different nontarget mean.

.. ipython::
   :suppress:

   In [1]: targetMean = np.mean(segments[targetSegments,:,5],axis=0)

   In [1]: nontargetSegments = segments[targetSegments==False,:,5]

   In [1]: selectThese = np.random.permutation(range(nontargetSegments.shape[0]))[:20]

   In [1]: nontargetMean = np.mean(nontargetSegments[selectThese,:],axis=0)

   In [1]: plt.clf();

   In [1]: plt.plot(xs, np.vstack((targetMean,nontargetMean)).T);

   In [1]: plt.legend(('Target','Nontarget'));

   In [1]: plt.axis('tight');

   In [1]: plt.xlabel('Milliseconds After Stimulus');

   @savefig p300plot6.png width=6in align=center
   In [1]: plt.ylabel('Mean Voltage ($\mu$ V)');