CS545
Machine Learning

Fall 2009
Department of Computer Science

Assignment 3: Classification Using LDA and QDA

Implement classification procedures for linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA). Demonstrate them using one-dimensional and two-dimensional examples. Then apply them to a real data set of your choosing. Details are presented in the following parts.

Part 1: One-Dimensional Data

One-dimensional data allows intuitive graphs of the components in each algorithm. Generate data from three classes, each drawn from a Gaussian (Normal) distribution, with means 1, 2, and 3, respectively, and a standard deviation of 0.1 for every class. Use 10 samples from each class.

Now that you have generated this training data, pretend you do not know the true means and standard deviation.
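
The data generation described above might be sketched as follows. This is only one possible layout; the seed and the variable names (nPerClass, classMeans, classSd, estMeans, estSds) are assumptions made for illustration.

```r
set.seed(545)  # assumed seed, only so the sketch is repeatable

nPerClass  <- 10
classMeans <- c(1, 2, 3)   # true means, one per class
classSd    <- 0.1          # true standard deviation, same for every class

# Stack the samples for classes 1, 2, 3 into one vector
x <- unlist(lapply(classMeans,
                   function(m) rnorm(nPerClass, mean = m, sd = classSd)))
classes <- rep(1:3, each = nPerClass)

# From here on, use only estimates computed from the training data
estMeans <- tapply(x, classes, mean)
estSds   <- tapply(x, classes, sd)
```

The estimates will differ slightly from the true values because only 10 samples per class are available.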

In a single window, graph

  1. the training data, plotted as x value versus class (1, 2, or 3),
  2. the three curves for p(x | C=k) for k = 1, 2, and 3, for x values in a set of test data generated by x <- seq(0,4,len=100), where p(x | C=k) is calculated using means and standard deviations for each class calculated from the training data,
  3. the curve for p(x) for the test data,
  4. the three curves for p(C=k | x) for k = 1, 2, and 3, for the test data,
  5. the three discriminant functions for the test data,
  6. the class predicted by the classifier for the test data.

Generate the above graphs for LDA, and again for QDA.
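
As a rough sketch of how the quantities in the list above relate to one another in one dimension, the following computes them for both models. Several choices here are assumptions rather than requirements: the helper name classifier1D, the use of the log of p(x | C=k) p(C=k) as the discriminant function (other forms that differ by a constant give the same predictions), and the pooled standard deviation with divisor N - K for LDA.

```r
set.seed(545)  # assumed seed, only for repeatability

# Training and test data as described in Part 1
x <- c(rnorm(10, 1, 0.1), rnorm(10, 2, 0.1), rnorm(10, 3, 0.1))
classes <- rep(1:3, each = 10)
xtest <- seq(0, 4, len = 100)

K <- 3
N <- length(x)
mu    <- as.vector(tapply(x, classes, mean))   # estimated class means
prior <- as.vector(table(classes)) / N         # estimated p(C=k)

# QDA: a separate standard deviation for each class
sdQDA <- as.vector(tapply(x, classes, sd))
# LDA: one pooled standard deviation shared by all classes
sdLDA <- rep(sqrt(sum((x - mu[classes])^2) / (N - K)), K)

# Hypothetical helper: returns the plotted quantities for one model
classifier1D <- function(xtest, mu, sds, prior) {
  ks <- seq_along(mu)
  # p(x | C=k), one column per class
  pxGivenC <- sapply(ks, function(k) dnorm(xtest, mu[k], sds[k]))
  # p(x) = sum over k of p(x | C=k) p(C=k)
  px <- as.vector(pxGivenC %*% prior)
  # discriminant: log p(x | C=k) + log p(C=k)
  delta <- sapply(ks, function(k)
    dnorm(xtest, mu[k], sds[k], log = TRUE) + log(prior[k]))
  # p(C=k | x) from exp(delta), normalized per row; subtracting the
  # row maximum first guards against numerical underflow
  shifted <- exp(delta - apply(delta, 1, max))
  pCGivenX <- shifted / rowSums(shifted)
  list(pxGivenC = pxGivenC, px = px, pCGivenX = pCGivenX,
       delta = delta, predicted = apply(delta, 1, which.max))
}

lda1D <- classifier1D(xtest, mu, sdLDA, prior)
qda1D <- classifier1D(xtest, mu, sdQDA, prior)
```

Each matrix has one row per test point and one column per class, so a call such as matplot(xtest, lda1D$pCGivenX, type = "l") draws the three posterior curves at once.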

Part 2: Two-Dimensional Data

Generate two-dimensional data from three classes using two Normal distributions for each class, for a total of six Normal distributions. Generate training data as 10 samples from each Normal distribution, so 20 samples for each class. For LDA and for QDA, generate contour plots or three-dimensional plots (check out persp()) for each of the graphs listed above except the first one. Do this by generating test samples arranged as a grid on the two-dimensional input space. For the first graph, just plot the digits 1, 2, or 3 on a 2-D plot.
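
One way to arrange test samples as a grid and reshape the results for contour() or persp() is sketched below. The grid ranges, the grid resolution, and the example surface (a single Normal density with diagonal covariance) are placeholders chosen for illustration; in your solution the surface would be one of the quantities computed by your classifier.

```r
# Grid of test points covering the input space (ranges are illustrative)
x1 <- seq(0, 10, length.out = 50)
x2 <- seq(0, 8, length.out = 40)
# One row per grid point; x1 varies fastest
grid <- as.matrix(expand.grid(x1 = x1, x2 = x2))

# Placeholder surface: a Normal density with diagonal covariance,
# which factors into a product of one-dimensional densities
mu <- c(5, 4)
s  <- c(1.5, 1)
z <- dnorm(grid[, 1], mu[1], s[1]) * dnorm(grid[, 2], mu[2], s[2])

# Reshape so rows index x1 and columns index x2,
# the layout contour() and persp() expect
zmat <- matrix(z, nrow = length(x1), ncol = length(x2))

# Draw only in an interactive session
if (interactive()) {
  contour(x1, x2, zmat, xlab = "x1", ylab = "x2")
  persp(x1, x2, zmat, theta = 30, phi = 30)
}
```

Because expand.grid varies its first argument fastest, filling the matrix column by column puts the value for (x1[i], x2[j]) at zmat[i, j].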

Do this twice for each algorithm, first using data distributions for which both algorithms do well, then again using data distributions for which there is a much greater difference in classification accuracy between the two algorithms.

To generate two-dimensional data, you may use rnorm. This means all two-dimensional distributions would have diagonal covariance matrices. This is why we are generating data from two Normal distributions for each class; using two Normal distributions you can produce samples from one class that are very non-Normal. Here is an example of some R code that generates two-dimensional data. Use different means and standard deviations for your solution.

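# Each class has two Normal components; each matrix row holds one
# component's mean as (x1, x2).
means <- list(matrix(c(2,2, 5,5), 2,2,byrow=TRUE),
              matrix(c(3,6, 2,5), 2,2,byrow=TRUE),
              matrix(c(8,2, 7,2), 2,2,byrow=TRUE))
std <- 0.5  # standard deviation used for every component and both dimensions

data <- NULL
for (class in 1:3) {
  mus <- means[[class]]
  # 10 samples from the first component of this class
  data <- rbind(data,
                cbind(rnorm(10,mus[1,1],std), rnorm(10,mus[1,2],std)))
  # 10 samples from the second component of this class
  data <- rbind(data,
                cbind(rnorm(10,mus[2,1],std),rnorm(10,mus[2,2],std)))
}
classes <- c(rep(1,20),rep(2,20),rep(3,20))  # class label for each row of data

# Plot each sample as its class digit, colored by class
plot(data[,1],data[,2],col=classes,pch=paste(classes))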

Part 3: Real Data

Pick a real data set, either from data sets included with R, or from the UCI Machine Learning Repository, or any other source for which you obtain prior approval from the instructor. Acknowledge the source of your data. If the data set is not already partitioned into training and testing sets, divide the data randomly into 80% and 20% partitions, training your classifier on the 80% partition and testing it on the 20% partition. Do this for LDA and QDA on the same randomly chosen partition. Compare the fraction of samples classified correctly, for both the training data and the testing data.
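
A random 80%/20% partition and the fraction-correct measure might look like the sketch below. The built-in iris data, the seed, and the helper name fractionCorrect are assumptions for illustration. The majority-class prediction is only a stand-in showing how the measure is applied; your own LDA and QDA predictions would take its place.

```r
set.seed(545)  # assumed seed, so the same partition can be reused for LDA and QDA

N <- nrow(iris)
trainRows <- sample(N, round(0.8 * N))  # row indices of the 80% partition
train <- iris[trainRows, ]
test  <- iris[-trainRows, ]             # the remaining 20%

# Hypothetical helper: fraction of samples classified correctly
fractionCorrect <- function(predicted, actual) mean(predicted == actual)

# Stand-in predictor: always guess the most frequent training class
majority <- names(which.max(table(train$Species)))
trainAcc <- fractionCorrect(rep(majority, nrow(train)), train$Species)
testAcc  <- fractionCorrect(rep(majority, nrow(test)), test$Species)
```

Drawing the row indices once and reusing them keeps the comparison between the two methods on the same partition.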

If your data includes any discrete-valued attributes, like error codes from computer performance data, you should convert those values into binary-valued indicator variables. Here is one way to convert integer class labels to a matrix of indicator variables:

makeIndicatorVars <- function(Y) {
  if (!is.matrix(Y))
    Y <- matrix(Y)
  classes <- unique(Y)
  N <- nrow(Y)
  K <- length(classes)
  logicalMatrix <- (matrix(Y,N,K) == matrix(classes,N,K,byrow=TRUE))
  mode(logicalMatrix) <- "numeric" ## to convert to numbers 0, 1
  logicalMatrix
}
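
The same function can be applied to a discrete attribute column, as in the sketch below. The function definition is repeated only so that the snippet runs on its own; the data values and the names latency and errorCode are made up for illustration.

```r
makeIndicatorVars <- function(Y) {
  if (!is.matrix(Y))
    Y <- matrix(Y)
  classes <- unique(Y)
  N <- nrow(Y)
  K <- length(classes)
  logicalMatrix <- (matrix(Y,N,K) == matrix(classes,N,K,byrow=TRUE))
  mode(logicalMatrix) <- "numeric" ## to convert to numbers 0, 1
  logicalMatrix
}

# Hypothetical data: one continuous attribute and one discrete attribute
latency   <- c(0.2, 1.5, 0.7, 0.3)
errorCode <- c(404, 500, 404, 200)

# One column per distinct code, in order of first appearance
indicators <- makeIndicatorVars(errorCode)
colnames(indicators) <- paste0("code", unique(errorCode))

# Input matrix with the discrete attribute replaced by its indicators
X <- cbind(latency, indicators)
```

Each row of indicators sums to 1, so the columns are linearly dependent. If that makes an estimated covariance matrix singular, dropping one indicator column is one possible remedy.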

Discuss what you can conclude about the data given the relative results of each method.

Grading

Here is what the grade sheet will look like for this assignment.
CS545: Assignment 3                     Name: ________________________

Grade: ___ out of 100 points

======================================================================
Generation of one-dimensional data (36 points total).

( 2 points):  Correct R code for generating the data.

( 10 points):  R code for plotting of all six graphs for one-dimensional
              data for LDA.
( 7 points):  Observations and discussion of results. 

( 10 points):  R code for plotting of all six graphs for one-dimensional
              data for QDA.
( 7 points):  Observations and discussion of results. 


======================================================================
Generation of two-dimensional data (46 points total)

( 2 points):  Correct R code for generating the data.

( 10 points):  R code for plotting of all six graphs for two-dimensional
              data for LDA.
( 7 points):  Observations and discussion of results. 

( 10 points):  R code for plotting of all six graphs for two-dimensional
              data for QDA.
( 7 points):  Observations and discussion of results. 

( 5 points):  Repetition of above for two-dimensional data generated
              in a way that shows poor classification results.
( 5 points):  Observations and discussion of results. 

======================================================================
Data set of your choice. (11 points total)

( 2 points):  Explanation of source of data and why you have chosen it.
( 2 points):  R code for reading the data and preparing it for classification.
( 2 points):  R code for classifying it and analyzing results.
( 5 points):  Observations and discussion of results.

======================================================================
Report structure.  (5 points total)

( 1 point):  Table of contents included. Heading and subheading structure
             easy to follow and clearly divides report into logical sections.
( 1 point):  Code, math, figure captions, and all other aspects of
             report are well-written and formatted.
( 1 point):  Conclusion section included, describing what you learned
             and which aspects were most difficult.
( 1 point):  References. Include only references that you cite in the report.
( 1 point):  Correct spelling.  Use a spell checker! Correct grammar and
             punctuation. Always proofread the whole report.