Machine Learning Fall 2009 Department of Computer Science
Now that you have generated this training data, pretend you do not know the true means and standard deviations.
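As a rough sketch of what estimating the parameters from the training data could look like, the code below builds hypothetical one-dimensional training data for two classes, estimates each class's mean and standard deviation from the samples, and evaluates p(x | C=k) on a grid of x values. The number of classes, the true parameters, and the names `train`, `muHat`, `sdHat`, and `pxGivenC` are assumptions made for illustration only; use your own data.

```r
set.seed(1)  # reproducible sketch

# Hypothetical training data: two classes of 1-D Normal samples.
# These true parameters are illustrative assumptions, not the assignment's.
n <- 50
train <- data.frame(
  x     = c(rnorm(n, mean = 1, sd = 0.3), rnorm(n, mean = 3, sd = 0.6)),
  class = rep(1:2, each = n)
)

# Estimate each class's mean and standard deviation from the samples alone.
muHat <- tapply(train$x, train$class, mean)
sdHat <- tapply(train$x, train$class, sd)

# Class-conditional densities p(x | C=k), one column per class.
x <- seq(0, 4, len = 100)
pxGivenC <- sapply(1:2, function(k) dnorm(x, muHat[k], sdHat[k]))

# Draw only in an interactive session; remove the guard to plot from a script.
if (interactive())
  matplot(x, pxGivenC, type = "l", lty = 1, ylab = "p(x | C=k)")
```

This version keeps a separate standard deviation per class; a method that assumes a shared spread would replace `sdHat` with a single pooled estimate.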
In a single window, graph
x <- seq(0,4,len=100), where p(x | C=k) is calculated using the means
and standard deviations estimated for each class from the training data,
persp()) for each of
the graphs, except the first one, listed above for LDA and QDA. Do
this by generating samples arranged as a grid on the two-dimensional
input space. For the first graph, just plot digits 1, 2, or 3 on a
2-D plot.
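One possible way to arrange samples as a grid on the two-dimensional input space and hand the result to persp() is sketched below. The grid ranges, the resolution, and the stand-in surface (the density of a single Normal with diagonal covariance) are assumptions for illustration; substitute whatever quantity you need to graph.

```r
# Grid of sample points covering the two-dimensional input space.
# The ranges and resolution here are arbitrary assumptions.
x1 <- seq(0, 10, length.out = 40)
x2 <- seq(0, 8, length.out = 30)
grid <- as.matrix(expand.grid(x1 = x1, x2 = x2))  # one row per grid point

# Stand-in surface: density of one Normal with diagonal covariance.
# Replace this with the quantity to be graphed for LDA or QDA.
mu <- c(5, 4)
sigma <- c(1.5, 1)
z <- dnorm(grid[, 1], mu[1], sigma[1]) * dnorm(grid[, 2], mu[2], sigma[2])

# expand.grid() varies x1 fastest, so filling a matrix column by column
# gives zMat[i, j] = value at (x1[i], x2[j]), the layout persp() expects.
zMat <- matrix(z, nrow = length(x1), ncol = length(x2))

# Draw only in an interactive session; remove the guard to plot from a script.
if (interactive())
  persp(x1, x2, zMat, theta = 30, phi = 30,
        xlab = "x1", ylab = "x2", zlab = "p(x)")
```

The reshaping step is the part that is easy to get wrong: if the matrix is filled in the wrong order, the surface comes out transposed.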
Do this twice for each algorithm, first using data distributions for which both algorithms do well, then again using data distributions for which there is a much greater difference in classification accuracy between the two algorithms.
To generate two-dimensional data, you may use rnorm separately for each dimension.
This means each Normal distribution you sample from will have a diagonal
covariance matrix. This is why we are generating data from two
Normal distributions for each class; by combining two Normal distributions
you can produce samples from one class that are very non-Normal.
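A small numerical check can make this concrete. The sketch below draws two components with independent rnorm() calls per dimension, then treats their union as one class. The sample size and the names `compA`, `compB`, and `oneClass` are assumptions for illustration. Each component's sample covariance should be close to diagonal, while the combined class should show a clearly nonzero off-diagonal entry, which a single diagonal-covariance Normal cannot represent.

```r
set.seed(2)
n <- 5000  # large sample so the estimates are stable; illustrative only

# Each component uses independent rnorm() calls per dimension,
# so its covariance matrix is diagonal.
compA <- cbind(rnorm(n, 2, 0.5), rnorm(n, 2, 0.5))
compB <- cbind(rnorm(n, 5, 0.5), rnorm(n, 5, 0.5))

# One class made of both components.
oneClass <- rbind(compA, compB)

covA     <- cov(compA)     # off-diagonal entries near 0
covClass <- cov(oneClass)  # off-diagonal entries clearly nonzero

print(round(covA, 2))
print(round(covClass, 2))
```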
Here is an example of some R code that generates two-dimensional
data. Use different means and standard deviations for your
solution.
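# Each class is a mixture of two Normals; each row of a means matrix
# is the (x1, x2) mean of one mixture component.
means <- list(matrix(c(2,2, 5,5), 2,2,byrow=TRUE),
              matrix(c(3,6, 2,5), 2,2,byrow=TRUE),
              matrix(c(8,2, 7,2), 2,2,byrow=TRUE))
std <- 0.5
data <- NULL
for (class in 1:3) {
  mus <- means[[class]]
  # 10 samples from the first component of this class
  data <- rbind(data,
                cbind(rnorm(10,mus[1,1],std), rnorm(10,mus[1,2],std)))
  # 10 samples from the second component
  data <- rbind(data,
                cbind(rnorm(10,mus[2,1],std), rnorm(10,mus[2,2],std)))
}
# 20 samples per class, in the order they were generated
classes <- c(rep(1,20),rep(2,20),rep(3,20))
# Plot each sample as its class digit, colored by class
plot(data[,1],data[,2],col=classes,pch=paste(classes))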
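Once two-dimensional data has been generated, the difference between the two algorithms can also be checked numerically. The following is only a sketch under stated assumptions, not a reference solution: it uses hypothetical two-class data with equal means but very different spreads, scores each class with a Normal log-density plus log prior, and obtains LDA by replacing the per-class covariances with their prior-weighted average. The names `score` and `classify` are made up for this example, and accuracy is measured on the training data itself.

```r
set.seed(3)

# Hypothetical two-class data with very different spreads (assumed values).
n <- 200
X <- rbind(cbind(rnorm(n, 0, 0.5), rnorm(n, 0, 0.5)),
           cbind(rnorm(n, 0, 3),   rnorm(n, 0, 3)))
y <- rep(1:2, each = n)

# Discriminant score for one class under a Normal model:
# log prior - 0.5 log|Sigma| - 0.5 * squared Mahalanobis distance.
score <- function(X, mu, Sigma, prior)
  log(prior) - 0.5 * log(det(Sigma)) - 0.5 * mahalanobis(X, mu, Sigma)

classify <- function(X, y, pooled) {
  ks <- sort(unique(y))
  mus <- lapply(ks, function(k) colMeans(X[y == k, , drop = FALSE]))
  covs <- lapply(ks, function(k) cov(X[y == k, , drop = FALSE]))
  priors <- sapply(ks, function(k) mean(y == k))
  if (pooled) {  # LDA: one covariance shared by all classes
    avg <- Reduce(`+`, Map(`*`, covs, priors))
    covs <- rep(list(avg), length(ks))
  }
  # N x K matrix of scores; pick the best-scoring class for each row
  scores <- sapply(seq_along(ks), function(i)
    score(X, mus[[i]], covs[[i]], priors[i]))
  ks[max.col(scores, ties.method = "first")]
}

accLDA <- mean(classify(X, y, pooled = TRUE)  == y)
accQDA <- mean(classify(X, y, pooled = FALSE) == y)
print(c(LDA = accLDA, QDA = accQDA))
```

With distributions like these, a single linear boundary has little to work with, so the pooled version should do noticeably worse. Data whose classes share a similar covariance should bring the two accuracies close together.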
If your data includes any discretely-valued attributes, like error codes from computer performance data, you should convert those values into binary-valued indicator variables. Here is one way to convert integer class labels to a matrix of indicator variables:
makeIndicatorVars <- function(Y) {
  if (!is.matrix(Y))
    Y <- matrix(Y)   ## treat a vector as an N x 1 matrix
  classes <- unique(Y)
  N <- nrow(Y)
  K <- length(classes)
  ## compare the labels, repeated in K columns, with each class value
  logicalMatrix <- (matrix(Y,N,K) == matrix(classes,N,K,byrow=TRUE))
  mode(logicalMatrix) <- "numeric" ## to convert to numbers 0, 1
  logicalMatrix
}
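The same idea applies to non-integer values such as error codes. As an alternative sketch (not the function above), base R's outer() can build the comparison matrix directly. The `errorCode` values below are made up for illustration.

```r
# Hypothetical discretely-valued attribute: error codes from a log.
errorCode <- c("E1", "E7", "E7", "E3", "E1")

# Distinct codes, in order of first appearance.
codes <- unique(errorCode)

# outer() compares every sample with every code, giving an N x K
# logical matrix; multiplying by 1 converts TRUE/FALSE to 1/0.
indicators <- outer(errorCode, codes, "==") * 1
colnames(indicators) <- codes
print(indicators)
```

Each row contains exactly one 1, in the column for that sample's code.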
Discuss what you can conclude about the data given the relative results of each method.
CS545: Assignment 3 Name: ________________________
Grade: ___ out of 100 points
======================================================================
Generation of one-dimensional data (36 points total).
( 2 points): Correct R code for generating the data.
( 10 points): R code for plotting of all six graphs for one-dimensional
data for LDA.
( 7 points): Observations and discussion of results.
( 10 points): R code for plotting of all six graphs for one-dimensional
data for QDA.
( 7 points): Observations and discussion of results.
======================================================================
Generation of two-dimensional data (46 points total)
( 2 points): Correct R code for generating the data.
( 10 points): R code for plotting of all six graphs for two-dimensional
data for LDA.
( 7 points): Observations and discussion of results.
( 10 points): R code for plotting of all six graphs for two-dimensional
data for QDA.
( 7 points): Observations and discussion of results.
( 5 points): Repetition of above for two-dimensional data generated
in a way that shows poor classification results.
( 5 points): Observations and discussion of results.
======================================================================
Data set of your choice. (11 points total)
( 2 points): Explanation of source of data and why you have chosen it.
( 2 points): R code for reading the data and preparing it for classification.
( 2 points): R code for classifying it and analyzing results.
( 5 points): Observations and discussion of results.
======================================================================
Report structure. (5 points total)
( 1 point): Table of contents included. Heading and subheading structure
easy to follow and clearly divides report into logical sections.
( 1 point): Code, math, figure captions, and all other aspects of
report are well-written and formatted.
( 1 point): Conclusion section included, describing what you learned
and which aspects were most difficult.
( 1 point): References. Include only references that you cite in the report.
( 1 point): Correct spelling. Use a spell checker! Correct grammar and
punctuation. Always proofread the whole report.