Programming Assignment 0

Click here to view the full description of the assignment.


How to extract the *.zip data files?

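Use the command "tar -zxvf <FILENAME>.zip" from the directory containing the zip file. This works when the file is a gzip-compressed tar archive despite its .zip extension. For a true zip archive, use "unzip <FILENAME>.zip" instead.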
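
If it is unclear which kind of archive a downloaded data file really is, the first two bytes of the file settle it. The following is a minimal sketch, not part of the assignment material: the function names archive_kind and extract_archive are made up for illustration, and it assumes head, od, tar, and unzip are installed.

```shell
# Report the archive type from the first two bytes (the magic number).
archive_kind() {
  local magic
  magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
  case "$magic" in
    504b) echo zip ;;     # "PK": a true zip archive
    1f8b) echo gzip ;;    # gzip stream, e.g. a gzip-compressed tar
    *)    echo unknown ;;
  esac
}

# Extract with the tool that matches the detected type.
extract_archive() {
  case "$(archive_kind "$1")" in
    zip)  unzip "$1" ;;
    gzip) tar -zxvf "$1" ;;
    *)    echo "unrecognized archive: $1" >&2; return 1 ;;
  esac
}
```

After defining the functions in your shell, run extract_archive <FILENAME>.zip from the directory containing the file.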

Problems restarting the HDFS. Cannot view http://<namenode>:<port>.

Please download the sample bash script. Replace the paths in it with your own config directory and the tmp directory that you have set, then execute it from a terminal. This should remove all the tmp directories created on the slave nodes during the previous startup. Now restart HDFS as you normally would.

Should I keep the port numbers for the properties related to my Namenode in core-site.xml and hdfs-site.xml the same?

No. Make sure that each of these properties uses a different port number in its value field.
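
One way to check for a port that has been reused by accident is to pull every port out of the two files and list those that occur more than once. This is only a rough sketch: the function name find_duplicate_ports is hypothetical, and it assumes that each <value>...</value> element sits on a single line and that ports are written in host:port form.

```shell
# Print each port number that appears more than once across the given files.
# Assumption: values look like <value>hdfs://host:9000</value> or <value>host:50070</value>.
find_duplicate_ports() {
  grep -ho '<value>[^<]*:[0-9]\{2,5\}[^<]*</value>' "$@" \
    | grep -o ':[0-9]\{2,5\}' \
    | tr -d ':' \
    | sort \
    | uniq -d
}
```

Call it as find_duplicate_ports <CONFIG_DIR>/core-site.xml <CONFIG_DIR>/hdfs-site.xml, where <CONFIG_DIR> is your own config directory. No output means no port was found twice. The check ignores host names, so a port it reports is only a conflict if both properties refer to the same machine.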

How can I check the contents of my HDFS?

The command to see the contents of your HDFS is similar to the ls command you use in UNIX:
$HADOOP_HOME/bin/hadoop fs -ls /
For instance, if you have a directory called tmp in your HDFS, in order to look into tmp, just type
$HADOOP_HOME/bin/hadoop fs -ls /tmp

How can I upload some test data files into my cluster?

In order to upload your own data to your cluster, put the data in some directory on your computer; let's call it <SOURCE>. Make sure your cluster is up and running. To create a folder (let's call it dataLoc) in HDFS where you want to put that data, use the following command:
$HADOOP_HOME/bin/hadoop fs -mkdir /dataLoc
Now to upload the data into dataLoc, run the following command:
$HADOOP_HOME/bin/hadoop fs -put <SOURCE> /dataLoc
To download the results from folder outputs on the HDFS into your local system, run the following command:
$HADOOP_HOME/bin/hadoop fs -get /outputs <LOCAL_PATH>
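
The create, upload, and list steps above can be wrapped in one small helper. This is a sketch under assumptions rather than a required script: the function name upload_to_hdfs and the HADOOP override variable are invented here for illustration (the override only makes it possible to dry-run the helper with echo), and the -p flag of -mkdir assumes a Hadoop 2.x or later fs shell.

```shell
# Create the HDFS directory, upload a local path into it, then list the result.
upload_to_hdfs() {
  local src="$1" dest="$2"
  # HADOOP is a hypothetical override; by default the usual binary is used.
  local hadoop="${HADOOP:-$HADOOP_HOME/bin/hadoop}"
  [ -e "$src" ] || { echo "no such local path: $src" >&2; return 1; }
  "$hadoop" fs -mkdir -p "$dest" &&
  "$hadoop" fs -put "$src" "$dest" &&
  "$hadoop" fs -ls "$dest"
}
```

For example, upload_to_hdfs <SOURCE> /dataLoc uploads the data and prints the directory listing. Setting HADOOP=echo beforehand prints the three hadoop commands instead of running them.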

Hadoop JARs necessary for importing MapReduce-related classes:

Click here to download the tar containing the JARs that you require. Extract them to any directory of your choice. From your IDE, right-click on your source package and choose Build Path > Configure Build Path > Add External JARs, then browse to the JARs that you extracted in the previous step and select them to import them into your project.

YARN not starting up properly:

When you run the yarn start command, the terminal output shows, for each slave node, the path to the log of the yarn start process. On any of your slave nodes, go to the directory specified (it is usually /tmp/USERNAME/yarn-logs) and see what the issue is. If you see multiple "Retrying to Connect" lines towards the end of the log, please add the following property to your yarn-site.xml and restart your cluster:
and give it a value of

