Programming Assignment 2

Click here to view the full description of the assignment.

FAQs

How do I get my cluster out of safemode?

From your namenode, run the following command: $HADOOP_HOME/bin/hadoop dfsadmin -safemode leave

YARN not starting up properly:

When you run the yarn start command, the terminal output shows, for each slave node, the path to the log for the yarn start process. On any of your slave nodes, go to the directory specified (it is usually /tmp/USERNAME/yarn-logs) and check the log to see what the issue is. If you see multiple "Retrying to Connect" lines towards the end of the log, add the following property to your yarn-site.xml and restart your cluster:
yarn.resourcemanager.admin.address
and give it a value of 0.0.0.0:ANY_UNIQUE_PORT_NUMBER

How do I get the IDF for the unknown document in the online part of the software?

You are required to use the IDF values of the words calculated for the offline part of your software while working with the text from the mystery author to generate the new AAV.
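
As a rough sketch of what reusing the offline IDF values can look like, the Python snippet below builds a TF-IDF vector for the mystery text from a precomputed IDF table. The names build_aav and offline_idf, the sample values, and the max-normalized term frequency are all assumptions made for illustration; substitute the TF definition given in the assignment description. Words missing from the offline table are simply skipped in this sketch.

```python
from collections import Counter


def build_aav(tokens, offline_idf):
    """Return {word: tf * idf} for the mystery text, using offline IDF values.

    Assumption: TF is the raw count divided by the count of the most
    frequent word; replace this with the assignment's TF formula.
    """
    counts = Counter(tokens)
    if not counts:
        return {}
    max_count = max(counts.values())
    aav = {}
    for word, count in counts.items():
        if word not in offline_idf:
            continue  # no offline IDF for this word; skipped in this sketch
        aav[word] = (count / max_count) * offline_idf[word]
    return aav


# Hypothetical IDF table produced by the offline job.
offline_idf = {"whale": 2.0, "sea": 1.0}
mystery_aav = build_aav(["whale", "sea", "whale", "kraken"], offline_idf)
```

The point of the sketch is that no IDF is recomputed from the mystery text itself: only the term counts come from the new document, while every IDF value is looked up in the offline table.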

How do I get the IDF for a word not encountered during the offline calculation?

A new word encountered during the online calculation should be assigned an IDF of log N, where N is the total number of authors. Alternatively, you can ignore the word altogether, since it would not be considered in the similarity calculation later on.
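
The two options above can be sketched in Python as follows. The function name idf_for, its parameters, and the choice of a base-10 logarithm are assumptions for illustration only; use the same logarithm base as in your offline IDF calculation.

```python
import math


def idf_for(word, offline_idf, num_authors, ignore_unseen=False, log=math.log10):
    """Look up the IDF of a word seen in the mystery text.

    Known words reuse the offline value. An unseen word either gets
    log(N), with N the total number of authors, or None when the caller
    chooses to ignore such words.
    Assumption: `log` matches the base used in the offline job.
    """
    if word in offline_idf:
        return offline_idf[word]
    if ignore_unseen:
        return None  # caller should drop this word from the new AAV
    return log(num_authors)


# Hypothetical offline table and author count.
offline_idf = {"sea": 1.0}
known = idf_for("sea", offline_idf, num_authors=10)
unseen = idf_for("kraken", offline_idf, num_authors=10)
dropped = idf_for("kraken", offline_idf, num_authors=10, ignore_unseen=True)
```

Whichever option you pick, apply it consistently to every unseen word in the mystery text.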


Page last modified on March 06, 2017, at 12:32 AM EST