Click here to view the full description of the assignment.

Things to take note of while working on this Assignment:

For this programming assignment, you will work with a single file of about 1.2 GB that contains all the necessary data. The file contains the sentences from each document, each in the form:

AUTHOR_NAME<===>DATE<===>SENTENCE
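As a rough illustration of the record format above, here is a minimal Python sketch that splits one record into its three fields. It assumes each record sits on its own line, which the assignment does not state explicitly. The function name `parse_line` and the sample values in the usage note are hypothetical, and the assignment may expect a different language or framework.

```python
from typing import Optional, Tuple

DELIMITER = "<===>"  # field separator taken from the format shown above


def parse_line(line: str) -> Optional[Tuple[str, str, str]]:
    """Split one record into (author, date, sentence).

    Returns None for lines that do not have all three fields.
    Hypothetical helper, not part of the assignment.
    """
    # maxsplit=2 keeps the sentence whole even if it happens to
    # contain the delimiter itself.
    parts = line.rstrip("\n").split(DELIMITER, 2)
    if len(parts) != 3:
        return None
    author, date, sentence = (part.strip() for part in parts)
    return author, date, sentence
```

For example, `parse_line("Some Author<===>1900<===>A short sentence.")` yields the three fields as a tuple, while a line without both delimiters yields `None` so that malformed input can be skipped rather than crash the job.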

Please find a sample dataset here. You may test your code on this sample dataset on your local cluster.

In your code, you will have to strip non-alphanumeric characters from the text yourself.
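One possible way to do this cleanup is sketched below in Python. The choices here are assumptions, not requirements of the assignment: non-alphanumeric characters are deleted rather than replaced with spaces, only ASCII letters and digits are kept, and the result is lowercased and split on whitespace. The name `clean_sentence` is hypothetical.

```python
import re
from typing import List

# Matches runs of characters that are neither ASCII letters, digits,
# nor whitespace. Assumption: only ASCII alphanumerics should survive.
_NON_ALNUM = re.compile(r"[^A-Za-z0-9\s]+")


def clean_sentence(sentence: str) -> List[str]:
    """Remove non-alphanumeric characters and return lowercase tokens.

    Hypothetical helper; lowercasing and whitespace tokenization are
    assumptions made for this sketch.
    """
    stripped = _NON_ALNUM.sub("", sentence)
    return stripped.lower().split()
```

Note that deleting punctuation joins the pieces of contractions ("It's" becomes "its"). Replacing punctuation with a space instead would split them into separate tokens, so pick whichever matches the full assignment description.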


The path to the large dataset is here


Page last modified on February 06, 2017, at 08:41 PM EST