Taxonomy of Feature Extraction and Translation Methods for BCI

Participants in the Signal Processing: Feature Extraction and Translation Workshop at the Third International Meeting on Brain-Computer Interface Technology, June 14-19, 2005, Rensselaerville Institute, NY, were asked to summarize the feature extraction and translation methods they have used. This document lists their summaries and contains a draft taxonomy of methods that is continually evolving. We will discuss and modify this taxonomy at the meeting.

The goals of this effort are to discover how our work relates to the collective effort of this community, to prompt a discussion of which methods appear to be most fruitful for various applications, and to highlight new methods yet to be tried.

Please send suggestions for changes to Chuck Anderson at

A related analysis of methods by S. Mason, A. Bashashati, M. Fatourechi, and G. Birch is being developed through an extensive survey of the literature. While the material on this web page focuses on feature extraction and translation, the work of Mason et al. encompasses all of the steps involved in BCI research and application. It is available at


Reasons to make a taxonomy

  1. People new to the field can see an overview.
  2. Experienced people can see how their work relates to that of others.
  3. New methods, or new combinations of methods, are highlighted.

Desirable characteristics of feature extraction and translation methods

  1. Accuracy
    1. Correct at least x% of the time for classification, or correct within error e for x% of the time for continuous output; x and e depend on the application.
    2. Robust to interference from environmental signals and non-EEG biological signals.
    3. Reliable (repeatable?) from hour to hour, day to day, across different applications of electrode cap, over different environments, and different subjects.
    4. Features that are easily discriminated (orthogonal).
  2. Fast, Responsive
    1. BCI decision within y seconds, or a fraction of a second.
    2. Computation time small.
    3. Storage requirements small.
    4. Need not wait for artifact-free segments.
    5. Training time short, requiring reasonable amount of data.
  3. Interpretable
    1. Can explain how BCI decision is being made.
    2. Relate to known electrophysiology and contribute new knowledge.
    3. Will lead to better feature extraction methods.
    4. Intuitive visualization, leading to biofeedback if computed in real time.
  4. Practical
    1. Inexpensive, or at least affordable.
    2. Somewhat portable.
    3. Open source.
    4. Easy setup for subject.
    5. Finely discriminating among multiple thoughts or states.
    6. Automaticity; little effort needed by the subject.


In response to our request for information from workshop participants, we received replies from Chuck Anderson, Benjamin Blankertz, Clemens Brunner, Anna Buttfield, Mehrdad Fatourechi, Greg Gage, Xiaorong Gao, Paul Hammon, Bin He, Ruthy Kaidar, Dean Krusienski, Dennis McFarland, Alois Schloegl, Len Trejo, and Doug Weber. Replies are outlined below.


  1. Feature extraction (not based on knowledge of the desired translation result)
    1. Initial filtering (based on generally applicable knowledge about signals, not on knowledge specific to a particular application)
      1. Spatial
        1. Laplacian filter of neighboring voltages (Krusienski, McFarland, He)
        2. Simple differences of voltages
      2. Temporal
        1. Single frequency passband
        2. 50 or 60 Hz notch filter
        3. Smoothing of single unit firing rates with triangular kernel (Fatourechi)
      3. Multitrial averages synchronized to stimulus
    2. Amplitude (Kaidar, Blankertz)
    3. Frequency
      1. Spatial dependence---Number of channels
        1. Single
          1. FFT (Trejo, Hammon, Brunner, Gao, He, Buttfield, Blankertz, Schloegl)
          2. IFFT (Blankertz)
          3. Multi-taper method, gamma-band (Kaidar)
          4. Matched filter
          5. Wavelets (Fatourechi, Trejo---EOG correction)
        2. Multiple
          1. Phase differences (Brunner)
          2. Matched filters
          3. Multivariate AR
          4. Bispectrum (?)
    4. Geometric subspaces (directions in EEG data space that capture most variation, strongest signal, strongest task-specific signal, etc.)
      1. Linear decomposition of matrix of samples
        1. Maximization of variance captured
          1. Project to components with most variance (Fatourechi, Hammon, Gage)
          2. Project to components chosen by validation
          3. Common spatial patterns (CSP), with a decomposition defined by two data sets (Brunner, Gao, Blankertz)
          4. SVD of lagged, multichannel samples (Anderson)
        2. Higher-order statistics
          1. Independent components analysis, ICA (Hammon, He)
    5. Model (not based on knowledge of desired translation result, unsupervised)
      1. Spatial dependence---Number of channels
        1. Single channel
        2. Multi-channel
      2. Temporal dependence---Independent or dependent on history
        1. None---static model
        2. Some---dynamic model
          1. (adaptive) autoregressive model (McFarland, Hammon, Brunner)
      3. Source localization (He)
      4. Complexity (nonlinear systems)
    6. Feature subset selection
      1. Most variance
      2. Most significant difference, single electrode selection (Kaidar)
      3. Classification or prediction accuracy
        1. Exhaustive search (Anderson)
        2. Genetic search (Fatourechi)
        3. Sequential forward or backward
  2. Translation, into categories for classification or real values for prediction (based on knowledge of the desired translation output, supervised)
    1. Memory based
      1. k-nearest neighbors (Fatourechi, Anderson)
    2. Discriminant functions
      1. Linear
        1. Linear regression (Krusienski, Fatourechi, McFarland, Blankertz)
        2. LDA (Brunner, Anderson, Blankertz, Schloegl)
        3. Partial Least Squares, PLS
        4. Perceptron (Gao)
      2. Quadratic, QDA (Anderson)
      3. Nonlinear
        1. Neural networks (Anderson)
        2. Support vector machines (Hammon, Kaidar)
        3. Decision trees (Anderson)
        4. LVQ (maybe more like feature extraction) (Fatourechi, Schloegl)
    3. Models---per class
      1. Logistic regression (L1-regularized) (Hammon)
      2. Kalman filters (Gage)
      3. LDA, QDA
      4. Kernel Partial Least Squares, KPLS (Trejo)
      5. k-means
      6. Mixture of Gaussians (Anderson, Buttfield)
      7. HMM---hidden Markov models
      8. Combinations
        1. Voting (Trejo, Anderson)
        2. Averaging
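
To make the Laplacian spatial filtering entry in the taxonomy concrete, here is a minimal NumPy sketch; the channel layout and neighbor assignments are illustrative assumptions, not any participant's actual montage:

```python
import numpy as np

def small_laplacian(eeg, neighbors):
    """Laplacian spatial filter: subtract from each listed channel the
    mean voltage of its neighboring channels.
    eeg: array of shape (n_channels, n_samples).
    neighbors: dict mapping channel index -> list of neighbor indices."""
    filtered = eeg.astype(float).copy()
    for ch, nbrs in neighbors.items():
        filtered[ch] = eeg[ch] - eeg[nbrs].mean(axis=0)
    return filtered

# Toy montage (assumed): channel 1 flanked by channels 0 and 2.
eeg = np.array([[1.0, 2.0],
                [5.0, 6.0],
                [3.0, 4.0]])
out = small_laplacian(eeg, {1: [0, 2]})
# Channel 1 becomes [5 - 2, 6 - 3] = [3, 3]; other channels are unchanged.
```

In practice the neighbor sets come from the electrode cap geometry (small vs. large Laplacian), which this sketch leaves to the caller.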
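
Many of the single-channel frequency features above start from FFT band power. A minimal sketch, assuming a 128 Hz sampling rate and an 8-12 Hz (mu-like) band of interest; both values are illustrative:

```python
import numpy as np

def band_power(signal, fs, band):
    """Power of a single-channel signal in a (lo, hi) Hz band via the FFT."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    lo, hi = band
    return power[(freqs >= lo) & (freqs <= hi)].sum()

fs = 128                                 # assumed sampling rate, Hz
t = np.arange(fs) / fs                   # one second of samples
signal = np.sin(2 * np.pi * 10 * t)      # 10 Hz sinusoid as a mu-band stand-in
mu = band_power(signal, fs, (8, 12))
total = band_power(signal, fs, (0, fs / 2))
# Nearly all of the power falls in the 8-12 Hz band for this pure sinusoid.
```

Real EEG calls for windowing and averaging over segments; this sketch shows only the band-power feature itself.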
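
The common spatial patterns entry (under geometric subspaces) can be sketched as whitening followed by an eigendecomposition; the synthetic trials below, with class-dependent channel variances, are purely illustrative:

```python
import numpy as np

def csp_filters(X0, X1):
    """Common spatial patterns via whitening + eigendecomposition.
    X0, X1: trials for each class, shape (n_trials, n_channels, n_samples).
    Returns W whose rows are spatial filters; the last row captures the
    largest share of class-0 variance relative to class 1."""
    def avg_cov(X):
        return np.mean([x @ x.T / np.trace(x @ x.T) for x in X], axis=0)
    C0, C1 = avg_cov(X0), avg_cov(X1)
    d, U = np.linalg.eigh(C0 + C1)     # eigendecompose composite covariance
    P = U / np.sqrt(d)                 # whitening: P.T @ (C0 + C1) @ P = I
    lam, V = np.linalg.eigh(P.T @ C0 @ P)
    return (P @ V).T

# Synthetic data: class 0 has high variance on channel 0, class 1 on channel 1.
rng = np.random.default_rng(1)
X0 = rng.normal(size=(20, 3, 200)) * np.array([3.0, 1.0, 1.0])[:, None]
X1 = rng.normal(size=(20, 3, 200)) * np.array([1.0, 3.0, 1.0])[:, None]
W = csp_filters(X0, X1)
f = W[-1]                              # filter favoring class-0 variance
v0 = np.mean([(f @ x).var() for x in X0])
v1 = np.mean([(f @ x).var() for x in X1])
```

The log variances of a few projections from each end of W are the usual features fed to a classifier; this sketch stops at the filters themselves.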
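
On the translation side, LDA is among the most widely used methods listed above. A two-class Fisher discriminant can be sketched as follows; the toy Gaussian "features" stand in for whatever the extraction stage produces:

```python
import numpy as np

def fit_lda(X0, X1):
    """Two-class Fisher linear discriminant.
    Returns weights w and threshold b; sign(x @ w - b) gives the class."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0.T) + np.cov(X1.T)   # pooled within-class scatter
    w = np.linalg.solve(Sw, m1 - m0)   # discriminant direction
    b = w @ (m0 + m1) / 2.0            # threshold at the class midpoint
    return w, b

# Toy two-dimensional features for two well-separated classes (assumed data).
rng = np.random.default_rng(0)
X0 = rng.normal([0.0, 0.0], 0.5, size=(50, 2))
X1 = rng.normal([3.0, 3.0], 0.5, size=(50, 2))
w, b = fit_lda(X0, X1)
acc = ((X1 @ w - b > 0).mean() + (X0 @ w - b <= 0).mean()) / 2
```

Regularizing Sw (e.g., shrinkage toward the identity) is common when features outnumber trials, a frequent situation in BCI work.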