Getting Started with EEG Data

In 2011-2012, the brain-computer interface (BCI) research group at Colorado State University recorded EEG signals from subjects in our lab and in their homes, using three different EEG systems. One goal of this work is to determine whether inexpensive EEG systems (about $7,000) are as effective as more expensive ones (about $40,000) for conducting BCI experiments in the home.

On this page, we summarize the steps you can follow to download some of the data, load it into an IPython environment, and visualize it. We also show examples of looking at P300 ERPs.

Downloading EEG Data

EEG data from multiple subjects can be downloaded from our Public BCI Data site. Let’s select the data files for the first subject in each device column, for subjects recorded in our lab.

_images/eegDownload.png

The zip file should contain six zipped data files.

> cd ~/Download

> unzip eeg.zip
Archive:  eeg.zip
 extracting: s20-activetwo-gifford-unimpaired.json.zip
 extracting: s21-activetwo-gifford-unimpaired.json.zip
 extracting: s20-gammasys-gifford-unimpaired.json.zip
 extracting: s21-gammasys-gifford-unimpaired.json.zip
 extracting: s20-mindset-gifford-unimpaired.json.zip
 extracting: s21-mindset-gifford-unimpaired.json.zip

> rm eeg.zip

> ls -l --block-size=M *json*
-rw-r--r-- 1 ... 84M Mar 12 10:50 s20-activetwo-gifford-unimpaired.json.zip
-rw-r--r-- 1 ...  5M Mar 12 10:50 s20-gammasys-gifford-unimpaired.json.zip
-rw-r--r-- 1 ... 29M Mar 12 10:50 s20-mindset-gifford-unimpaired.json.zip
-rw-r--r-- 1 ... 80M Mar 12 10:51 s21-activetwo-gifford-unimpaired.json.zip
-rw-r--r-- 1 ...  5M Mar 12 10:51 s21-gammasys-gifford-unimpaired.json.zip
-rw-r--r-- 1 ... 28M Mar 12 10:52 s21-mindset-gifford-unimpaired.json.zip

> unzip s20-gammasys-gifford-unimpaired.json.zip
Archive:  s20-gammasys-gifford-unimpaired.json.zip
  inflating: s20-gammasys-gifford-unimpaired.json

> unzip s20-mindset-gifford-unimpaired.json.zip
Archive:  s20-mindset-gifford-unimpaired.json.zip
  inflating: s20-mindset-gifford-unimpaired.json

> unzip s20-activetwo-gifford-unimpaired.json.zip
Archive:  s20-activetwo-gifford-unimpaired.json.zip
  inflating: s20-activetwo-gifford-unimpaired.json

> rm s20*zip
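If you prefer to stay in Python, the same extraction can be sketched with the standard-library zipfile module. This is only an illustration: extract_zip is a hypothetical helper name, and the demonstration runs on a throwaway archive in a temporary directory rather than on the real data files.

```python
import os
import tempfile
import zipfile


def extract_zip(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return the member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


# Build a tiny stand-in archive so the sketch runs without the downloaded data.
demo_dir = tempfile.mkdtemp()
demo_zip = os.path.join(demo_dir, 'demo.json.zip')
with zipfile.ZipFile(demo_zip, 'w') as archive:
    archive.writestr('demo.json', '[]')

extracted = extract_zip(demo_zip, demo_dir)
print(extracted)
```

For the real files, the call would be something like extract_zip('s20-gammasys-gifford-unimpaired.json.zip', '.'), assuming the archive is in the current directory.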

Loading g.GAMMAsys EEG Data into IPython

Let’s start with the smallest file, the one recorded with the g.tec g.GAMMAsys system, which we unzipped above.

The unzipped data can be loaded into an IPython environment.

In [1]: import json

In [2]: data = json.load(open('s20-gammasys-gifford-unimpaired.json','r'))

The variable data is a list of dictionaries, each with the same keys.

In [1]: len(data)
Out[1]: 8

In [2]: data[0].keys()
Out[2]: 
[u'protocol',
 u'sample rate',
 u'notes',
 u'channels',
 u'date',
 u'location',
 u'device',
 u'eeg',
 u'impairment',
 u'subject']
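For repeated use, loading can be wrapped in a small function that closes the file when done and converts each trial to a NumPy array up front. This is a sketch, not part of the original session: load_eeg is a hypothetical name, and it assumes every element has an 'eeg' dictionary whose values are nested lists, as in the files described here. The demonstration writes a tiny stand-in file so that it runs without the real data.

```python
import json
import os
import tempfile

import numpy as np


def load_eeg(path):
    """Load one JSON data file and convert every trial to a NumPy array."""
    with open(path, 'r') as f:  # the file is closed when this block ends
        datalist = json.load(f)
    for element in datalist:
        element['eeg'] = {name: np.array(trial)
                          for name, trial in element['eeg'].items()}
    return datalist


# Stand-in file with one element holding a 2 x 3 "EEG" matrix.
toy_path = os.path.join(tempfile.mkdtemp(), 'toy.json')
with open(toy_path, 'w') as f:
    json.dump([{'protocol': '3minutes',
                'eeg': {'trial 1': [[1, 2, 3], [4, 5, 6]]}}], f)

toy = load_eeg(toy_path)
print(toy[0]['eeg']['trial 1'].shape)
```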

Here is a handy function to show keys and their values in each data element.

import numpy as np

def summarize(datalist):
    """Print every key except 'eeg', then the size of the EEG trials."""
    for i, element in enumerate(datalist):
        print('\nData set {}'.format(i))
        for key in element:
            if key != 'eeg':
                print('  {}: {}'.format(key, element[key]))
        eegtrials = element['eeg']
        # 'approximately' because only the first trial is measured
        shape = np.array(eegtrials['trial 1']).shape
        print(('  eeg: {:d} trials, each a matrix with {:d} rows'
               ' and approximately {:d} columns').format(
                   len(eegtrials), shape[0], shape[1]))

In [1]: summarize(data)

Data set 0
  protocol: 3minutes
  sample rate: 256
  notes: 
  channels: [u'F3', u'F4', u'C3', u'C4', u'P3', u'P4', u'O1', u'O2']
  date: [2012, 3, 8]
  location: gifford
  device: GAMMAsys
  impairment: none
  subject: 20
  eeg: 1 trials, each a matrix with 9 rows and approximately 46330 columns

Data set 1
  protocol: grid-p
  sample rate: 256
  notes: 
  channels: [u'F3', u'F4', u'C3', u'C4', u'P3', u'P4', u'O1', u'O2']
  date: [2012, 3, 8]
  location: gifford
  device: GAMMAsys
  impairment: none
  subject: 20
  eeg: 1 trials, each a matrix with 9 rows and approximately 17692 columns

Data set 2
  protocol: grid-b
  sample rate: 256
  notes: 
  channels: [u'F3', u'F4', u'C3', u'C4', u'P3', u'P4', u'O1', u'O2']
  date: [2012, 3, 8]
  location: gifford
  device: GAMMAsys
  impairment: none
  subject: 20
  eeg: 1 trials, each a matrix with 9 rows and approximately 17695 columns

Data set 3
  protocol: grid-d
  sample rate: 256
  notes: 
  channels: [u'F3', u'F4', u'C3', u'C4', u'P3', u'P4', u'O1', u'O2']
  date: [2012, 3, 8]
  location: gifford
  device: GAMMAsys
  impairment: none
  subject: 20
  eeg: 1 trials, each a matrix with 9 rows and approximately 17696 columns

Data set 4
  protocol: letter-p
  sample rate: 256
  notes: 
  channels: [u'F3', u'F4', u'C3', u'C4', u'P3', u'P4', u'O1', u'O2']
  date: [2012, 3, 8]
  location: gifford
  device: GAMMAsys
  impairment: none
  subject: 20
  eeg: 1 trials, each a matrix with 9 rows and approximately 17692 columns

Data set 5
  protocol: letter-d
  sample rate: 256
  notes: 
  channels: [u'F3', u'F4', u'C3', u'C4', u'P3', u'P4', u'O1', u'O2']
  date: [2012, 3, 8]
  location: gifford
  device: GAMMAsys
  impairment: none
  subject: 20
  eeg: 1 trials, each a matrix with 9 rows and approximately 17691 columns

Data set 6
  protocol: letter-b
  sample rate: 256
  notes: 
  channels: [u'F3', u'F4', u'C3', u'C4', u'P3', u'P4', u'O1', u'O2']
  date: [2012, 3, 8]
  location: gifford
  device: GAMMAsys
  impairment: none
  subject: 20
  eeg: 1 trials, each a matrix with 9 rows and approximately 17692 columns

Data set 7
  protocol: mentaltasks
  sample rate: 256
  notes: 
  channels: [u'F3', u'F4', u'C3', u'C4', u'P3', u'P4', u'O1', u'O2']
  date: [2012, 3, 8]
  location: gifford
  device: GAMMAsys
  impairment: none
  subject: 20
  eeg: 6 trials, each a matrix with 9 rows and approximately 15623 columns
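The column counts in the summary can be turned into recording lengths by dividing by the sample rate. The sketch below uses a hypothetical helper, durations, and a toy list that only mimics the sizes reported for data sets 0 and 1. It assumes each element has 'protocol', 'sample rate', and an 'eeg' dictionary with a 'trial 1' entry.

```python
def durations(datalist):
    """Return a (protocol, seconds) pair for the first trial of each element."""
    result = []
    for element in datalist:
        n_samples = len(element['eeg']['trial 1'][0])  # number of columns in row 0
        seconds = n_samples / float(element['sample rate'])
        result.append((element['protocol'], seconds))
    return result


# Toy stand-in: one-row matrices with the column counts shown in the summary.
toy = [
    {'protocol': '3minutes', 'sample rate': 256,
     'eeg': {'trial 1': [[0.0] * 46330]}},
    {'protocol': 'grid-p', 'sample rate': 256,
     'eeg': {'trial 1': [[0.0] * 17692]}},
]

for protocol, seconds in durations(toy):
    print('{}: {:.1f} s'.format(protocol, seconds))
```

At 256 samples per second, 46,330 columns is about 181 seconds, which matches the 3minutes protocol name.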

Plotting some EEG

The first element of the data list has key-value pair protocol: 3minutes, meaning that this element contains 3 minutes of EEG recorded while the subject was asked to relax and look at the computer screen. Let’s take a look at 2 seconds of this data.

The EEG consists of one matrix with 9 rows and 46,330 columns. The 9 rows correspond to the 8 channels, channels: ['F3', 'F4', 'C3', 'C4', 'P3', 'P4', 'O1', 'O2'], plus one more channel that is used to mark stimulus onset and offset, which is not used for the 3-minute protocol. The number of samples (columns) in one second depends on the sample rate, which for this device, device: GAMMAsys, is 256 samples per second, sample rate: 256. Let’s plot 2 seconds of data (512 samples) from all 9 channels, columns 4,000 through 4,511.

In [1]: import numpy as np

In [2]: import matplotlib.pyplot as plt

In [3]: first = data[0]

In [4]: eeg = np.array(first['eeg']['trial 1'])

In [5]: eeg.shape
Out[5]: (9, 46330)

# Using ending semicolon to suppress output of plotting functions.
In [6]: plt.figure(1);

In [7]: plt.plot(eeg[:,4000:4512].T);

In [8]: plt.axis('tight');
_images/eegplot1.png

Kind of hard to see each channel. Let’s spread them out and not plot the constant, unused, 9th channel. Also, we can add a legend with the channel names. If we reverse the vertical order of the channel plots, they will correspond with the vertical order of the channel names.

In [1]: plt.figure(2);

In [2]: plt.plot(eeg[:8,4000:4512].T + 80*np.arange(7,-1,-1));

In [3]: plt.plot(np.zeros((512,8)) + 80*np.arange(7,-1,-1),'--',color='gray');

In [4]: plt.yticks([]);

In [5]: plt.legend(first['channels']);

In [6]: plt.axis('tight');
_images/eegplot2.png
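The column range used in these plots follows directly from the sample rate. As a sketch, the two hypothetical helpers below convert a time window into column indices and build a matching time axis in seconds. The names window_columns and time_axis are illustrative, not part of any library.

```python
import numpy as np


def window_columns(start_s, duration_s, sample_rate):
    """Convert a window given in seconds to (start, stop) column indices."""
    start = int(round(start_s * sample_rate))
    stop = start + int(round(duration_s * sample_rate))
    return start, stop


def time_axis(start, stop, sample_rate):
    """Return the time in seconds of each column in start:stop."""
    return np.arange(start, stop) / float(sample_rate)


# Two seconds starting at 15.625 s, at the GAMMAsys rate of 256 samples per second.
start, stop = window_columns(15.625, 2.0, 256)
t = time_axis(start, stop, 256)
print(start, stop, len(t))
```

Passing t as the first argument to plt.plot, for example plt.plot(t, eeg[:8, start:stop].T), would label the horizontal axis in seconds rather than in sample counts.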

Again, for EEG from ActiveTwo and Mindset Systems

Now let’s summarize the data from the other two systems. First, rename data to dataGammasys.

In [1]: dataGammasys = data

In [2]: dataActivetwo = json.load(open('s20-activetwo-gifford-unimpaired.json','r'))

In [3]: dataMindset = json.load(open('s20-mindset-gifford-unimpaired.json','r'))

In [4]: summarize(dataMindset[0:2])

Data set 0
  target indicator: []
  protocol: 3minutes
  sample rate: 512
  notes: 
  channels: [u'FP1', u'FP2', u'F3', u'F4', u'C3', u'C4', u'P3', u'P4', u'O1', u'O2', u'F7', u'F8', u'T3', u'T4', u'T5', u'T6', u'CZ', u'FZ', u'PZ']
  device: mindset
  location: gifford
  date: [2012, 4, 7]
  impairment: none
  subject: 20
  eeg: 1 trials, each a matrix with 24 rows and approximately 92160 columns

Data set 1
  target indicator: [0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0]
  protocol: grid-p
  sample rate: 512
  notes: 
  channels: [u'FP1', u'FP2', u'F3', u'F4', u'C3', u'C4', u'P3', u'P4', u'O1', u'O2', u'F7', u'F8', u'T3', u'T4', u'T5', u'T6', u'CZ', u'FZ', u'PZ']
  device: mindset
  location: gifford
  date: [2012, 4, 7]
  impairment: none
  subject: 20
  eeg: 1 trials, each a matrix with 24 rows and approximately 36352 columns

This shows that the Mindset has 19 channels of EEG, but the EEG matrix has 24 rows. The first 19 rows are the EEG channels. Let’s plot them.

In [1]: eegMindset = np.array(dataMindset[0]['eeg']['trial 1'])

In [2]: plt.figure();

In [3]: plt.plot(eegMindset[:19,4000:4512].T + 30*np.arange(18,-1,-1));

In [4]: plt.plot(np.zeros((512,19)) + 30*np.arange(18,-1,-1),'--',color='gray');

In [5]: plt.yticks([]);

In [6]: plt.legend(dataMindset[0]['channels'], prop={'size':10});

In [7]: plt.axis('tight');
_images/eegplot3.png
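The same offset trick is used for every device, only with a different channel count and spacing, so it can be factored into a helper. The sketch below is one possible way to do it. stack_channels is a hypothetical name, and the spacing value is something you would choose by eye for each device.

```python
import numpy as np


def stack_channels(eeg, spacing):
    """Offset each channel vertically so the first channel is drawn on top.

    eeg has shape (n_channels, n_samples). Returns (traces, offsets), where
    traces has shape (n_samples, n_channels), ready for plt.plot, and offsets
    holds the baseline height of each channel.
    """
    n_channels = eeg.shape[0]
    offsets = spacing * np.arange(n_channels - 1, -1, -1)
    return eeg.T + offsets, offsets


# Toy data: 3 flat channels, 5 samples each.
demo = np.zeros((3, 5))
traces, offsets = stack_channels(demo, 30)
print(offsets)
print(traces.shape)
```

With real data, plt.plot(traces) draws the signals, and plt.plot(np.zeros_like(traces) + offsets, '--', color='gray') draws the dashed baselines.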

Now for the data from the ActiveTwo system. First, let’s see which element in the list is for the 3minutes protocol.

In [1]: summarize(dataActivetwo[0:2])

Data set 0
  protocol: mentaltasks
  sample rate: 1024.0
  notes: 
  channels: [u'Fp1', u'AF3', u'F7', u'F3', u'FC1', u'FC5', u'T7', u'C3', u'CP1', u'CP5', u'P7', u'P3', u'Pz', u'PO3', u'O1', u'Oz', u'O2', u'PO4', u'P4', u'P8', u'CP6', u'CP2', u'C4', u'T8', u'FC6', u'FC2', u'F4', u'F8', u'AF4', u'Fp2', u'Fz', u'Cz', u'EXG1', u'EXG2', u'EXG3', u'EXG4', u'EXG5', u'EXG6', u'EXG7', u'EXG8', u'Status']
  date: [2012, 2, 23]
  location: gifford
  device: activetwo
  impairment: none
  subject: 20
  eeg: 1 trials, each a matrix with 41 rows and approximately 334848 columns

Data set 1
  protocol: 3minutes
  sample rate: 1024.0
  notes: 
  channels: [u'Fp1', u'AF3', u'F7', u'F3', u'FC1', u'FC5', u'T7', u'C3', u'CP1', u'CP5', u'P7', u'P3', u'Pz', u'PO3', u'O1', u'Oz', u'O2', u'PO4', u'P4', u'P8', u'CP6', u'CP2', u'C4', u'T8', u'FC6', u'FC2', u'F4', u'F8', u'AF4', u'Fp2', u'Fz', u'Cz', u'EXG1', u'EXG2', u'EXG3', u'EXG4', u'EXG5', u'EXG6', u'EXG7', u'EXG8', u'Status']
  date: [2012, 2, 23]
  location: gifford
  device: activetwo
  impairment: none
  subject: 20
  eeg: 1 trials, each a matrix with 41 rows and approximately 185344 columns

In [2]: eegActivetwo = np.array(dataActivetwo[1]['eeg']['trial 1'])

In [3]: eegActivetwo.shape
Out[3]: (41, 185344)
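Instead of reading through the summaries, the element for a given protocol can also be located programmatically. A minimal sketch, assuming every element carries a 'protocol' key as shown above. find_protocol is a hypothetical helper, and the toy list only mimics the first two ActiveTwo elements.

```python
def find_protocol(datalist, protocol):
    """Return the indices of all elements recorded with the given protocol."""
    return [i for i, element in enumerate(datalist)
            if element['protocol'] == protocol]


# Toy stand-in for dataActivetwo[0:2].
toy = [{'protocol': 'mentaltasks'}, {'protocol': '3minutes'}]
print(find_protocol(toy, '3minutes'))
```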

This data matrix contains 41 rows, one for each of the 41 names in the list of channels:

In [1]: dataActivetwo[1]['channels']
Out[1]: 
[u'Fp1',
 u'AF3',
 u'F7',
 u'F3',
 u'FC1',
 u'FC5',
 u'T7',
 u'C3',
 u'CP1',
 u'CP5',
 u'P7',
 u'P3',
 u'Pz',
 u'PO3',
 u'O1',
 u'Oz',
 u'O2',
 u'PO4',
 u'P4',
 u'P8',
 u'CP6',
 u'CP2',
 u'C4',
 u'T8',
 u'FC6',
 u'FC2',
 u'F4',
 u'F8',
 u'AF4',
 u'Fp2',
 u'Fz',
 u'Cz',
 u'EXG1',
 u'EXG2',
 u'EXG3',
 u'EXG4',
 u'EXG5',
 u'EXG6',
 u'EXG7',
 u'EXG8',
 u'Status']

The channels named EXG1 through EXG6 contain non-EEG data as follows:

Channel  Index  Electrode
EXG1     32     EOG vertical left
EXG2     33     EOG vertical right
EXG3     34     EOG horizontal left
EXG4     35     EOG horizontal right
EXG5     36     earlobe left
EXG6     37     earlobe right
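The indices in this table can be recovered from the channel list itself rather than hard-coded. The sketch below copies the 41 channel names shown above and looks up the rows by name. Treating every name that is neither EXG* nor Status as a scalp EEG channel is an assumption made for illustration.

```python
channels = ['Fp1', 'AF3', 'F7', 'F3', 'FC1', 'FC5', 'T7', 'C3', 'CP1', 'CP5',
            'P7', 'P3', 'Pz', 'PO3', 'O1', 'Oz', 'O2', 'PO4', 'P4', 'P8',
            'CP6', 'CP2', 'C4', 'T8', 'FC6', 'FC2', 'F4', 'F8', 'AF4', 'Fp2',
            'Fz', 'Cz', 'EXG1', 'EXG2', 'EXG3', 'EXG4', 'EXG5', 'EXG6',
            'EXG7', 'EXG8', 'Status']

# Row index of each external channel listed in the table.
exg_rows = {name: channels.index(name)
            for name in ['EXG1', 'EXG2', 'EXG3', 'EXG4', 'EXG5', 'EXG6']}

# Rows assumed to hold scalp EEG: everything that is not EXG* or Status.
eeg_rows = [i for i, name in enumerate(channels)
            if not name.startswith('EXG') and name != 'Status']

print(exg_rows['EXG5'], exg_rows['EXG6'])
print(eeg_rows[0], eeg_rows[-1], len(eeg_rows))
```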

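Typically, the EEG channels (indices 0 through 31) are referenced to the earlobes after the linear trend is removed from each channel. Here we use the mean of the two earlobe channels (indices 36 and 37) as the reference. That’s easy.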

In [1]: import scipy.signal as sig

In [2]: eegActivetwo = sig.detrend(eegActivetwo,1)

In [3]: ref = np.mean(eegActivetwo[36:38,:],axis=0).reshape((1,-1))

In [4]: eeg = eegActivetwo[:32,:] - ref
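The two steps above can be collected into one function and checked on synthetic data. This is only a sketch. reference_to_earlobes is a hypothetical name, the default row ranges follow the ActiveTwo layout described above, and the random matrix stands in for a real recording.

```python
import numpy as np
import scipy.signal as sig


def reference_to_earlobes(raw, eeg_rows=slice(0, 32), ear_rows=slice(36, 38)):
    """Detrend each row, then subtract the mean of the earlobe rows.

    raw has shape (n_channels, n_samples). Returns only the EEG rows.
    """
    detrended = sig.detrend(raw, axis=1)  # remove a linear trend along time
    ref = detrended[ear_rows, :].mean(axis=0, keepdims=True)  # shape (1, n_samples)
    return detrended[eeg_rows, :] - ref


# Synthetic stand-in: 41 channels of noise sharing a common linear drift.
rng = np.random.RandomState(0)
raw = rng.randn(41, 1000) + np.linspace(0, 50, 1000)

eeg_ref = reference_to_earlobes(raw)
print(eeg_ref.shape)
```

Because linear detrending removes both the slope and the offset of each row, every referenced channel ends up with a mean close to zero, which is what makes the stacked plots line up on their dashed baselines.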

Now we can plot all 32 EEG channels.

In [1]: plt.figure();

In [2]: plt.plot(eeg[:,4000:4512].T + 150*np.arange(31,-1,-1));

In [3]: plt.plot(np.zeros((512,32)) + 150*np.arange(31,-1,-1),'--',color='gray');

In [4]: plt.yticks([]);

In [5]: plt.legend(dataActivetwo[1]['channels'][:32], prop={'size':8});

In [6]: plt.axis('tight');
_images/eegplot4.png