Assignment 5: Naive Bayes

Due: November 17th at 6pm

Part 1: A few short questions about naive Bayes

  1. Can you use naive Bayes for data that contains both categorical and real-valued features?
  2. The basic assumption in naive Bayes is that all attributes are independent given the label. How can you model just 2 of $d$ features as dependent?
  3. Given a trained naive Bayes classifier, and without access to the training data, how would you select a subset of features that are most predictive of the class label?

Part 2: naive Bayes implementation

Implement a naive Bayes classifier for either categorical or continuous data. Compare its performance to that of an SVM (make sure to perform proper model selection for classifier parameters using internal cross-validation). Use two UCI repository datasets for this task. Several datasets in the repository have categorical data, e.g. nursery school application ranking, census income prediction, and splice junction detection. If you are implementing naive Bayes for categorical data, make sure to include pseudo-counts to avoid overfitting: without them, a feature value never observed with a class in the training data gets an estimated probability of zero.
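
To illustrate the role of pseudo-counts, here is a minimal sketch of categorical naive Bayes with additive smoothing. It is one possible approach, not a reference solution: the function names (`fit_categorical_nb`, `log_scores`, `predict`), the `alpha` parameter, and the toy data are all assumptions made for this example. Each likelihood is estimated as (count + alpha) / (class count + alpha * number of distinct values of that feature).

```python
import math
from collections import Counter


def fit_categorical_nb(X, y, alpha=1.0):
    """Collect the counts needed for smoothed naive Bayes estimates.

    X is a list of tuples of categorical values, y a list of labels.
    alpha is the pseudo-count added to every (class, feature, value) cell.
    """
    n_features = len(X[0])
    class_counts = Counter(y)
    # Distinct values seen in training for each feature.
    values = [set(row[j] for row in X) for j in range(n_features)]
    # counts[c][j][v] = number of class-c examples with feature j equal to v.
    counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    for row, label in zip(X, y):
        for j, v in enumerate(row):
            counts[label][j][v] += 1
    return {
        "alpha": alpha,
        "n": len(y),
        "class_counts": class_counts,
        "values": values,
        "counts": counts,
    }


def log_scores(model, row):
    """Return log P(c) + sum_j log P(x_j | c) for every class c."""
    alpha = model["alpha"]
    scores = {}
    for c, n_c in model["class_counts"].items():
        score = math.log(n_c / model["n"])
        for j, v in enumerate(row):
            k = len(model["values"][j])
            # With alpha > 0 the numerator is never zero, so the log is finite
            # even for values never seen with class c.
            smoothed = (model["counts"][c][j][v] + alpha) / (n_c + alpha * k)
            score += math.log(smoothed)
        scores[c] = score
    return scores


def predict(model, row):
    """Pick the class with the largest log score."""
    scores = log_scores(model, row)
    return max(scores, key=scores.get)


# Toy data (made up for illustration): features are (outlook, temperature).
X_toy = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
y_toy = ["no", "no", "yes", "yes"]
toy_model = fit_categorical_nb(X_toy, y_toy, alpha=1.0)
toy_prediction = predict(toy_model, ("sunny", "hot"))
```

In the toy data, "sunny" and "hot" never occur with the class "yes". With `alpha=0` the corresponding estimates would be zero and `math.log` would raise an error; with `alpha=1` the class still receives a small but nonzero score. The value of `alpha` can itself be treated as a parameter to select by internal cross-validation.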

Grading

Here is what the grading sheet will look like for this assignment. A few general guidelines for this and future assignments in the course:

Grading sheet for assignment 5

Part 1:  40 points.
(14 points):  1st question
(13 points):  2nd question
(13 points):  3rd question

Part 2:  50 points.
(10 points):  Experimental protocol
(20 points):  Correct classifier implementation
(10 points):  Results for the two classifiers on both datasets
(10 points):  Discussion of the results

Report structure, grammar and spelling:  10 points
( 3 points):  Heading and subheading structure easy to follow and
              clearly divides report into logical sections.
( 4 points):  Code, math, figure captions, and all other aspects of  
              report are well-written and formatted.
( 3 points):  Grammar, spelling, and punctuation.