Computer Science Department

Big Data:
Assignments and Term Project

CS535

Fall 2017

Home Syllabus Schedule Assignments Grading Policy Course Policy Code of Conduct Canvas
Programming Assignment 1

Title: Hyperlink-Induced Topic Search (HITS) over Wikipedia Articles using Apache Spark

Due: Sept. 27 Wednesday 5:00PM
Submission: via Canvas

Description
In this assignment, you will design and implement a system that generates a subset of graphs using Hyperlink-Induced Topic Search (HITS) over currently available Wikipedia articles. HITS, also known as hubs and authorities, is a link analysis algorithm designed to find the key pages for specific web communities.
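As a rough illustration of the hubs-and-authorities idea, the sketch below runs the HITS update on a tiny in-memory edge list in plain Python. It is not the assignment's Spark solution: the function names `hits` and `normalize`, the fixed iteration count, and the choice of L2 normalization are all assumptions made for this example, and the complete description may specify a different normalization or stopping rule.

```python
import math


def normalize(scores):
    """Scale scores to unit L2 norm (an assumed normalization choice)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {k: v / norm for k, v in scores.items()}


def hits(edges, iterations=20):
    """Run HITS on (source, target) link pairs.

    Returns two dicts: hub scores and authority scores.
    """
    edges = list(edges)  # the edge list is traversed many times
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}

    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking in.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        auth = normalize(new_auth)

        # Hub score: sum of the authority scores of pages linked to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += auth[dst]
        hub = normalize(new_hub)

    return hub, auth


if __name__ == "__main__":
    # Hypothetical four-page graph: a and b link to c, a also links to d.
    hub, auth = hits([("a", "c"), ("b", "c"), ("a", "d")])
    print("best hub:", max(hub, key=hub.get))
    print("best authority:", max(auth, key=auth.get))
```

In a Spark implementation the two inner loops would typically become joins and aggregations over a distributed edge collection, but the update order stays the same.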

The goal of this programming assignment is to enable you to gain experience in:

  • Installing and using analytics tools such as HDFS and Apache Spark
  • Generating the root set and base set from link information
  • Implementing iterative algorithms to estimate Hub and Authority scores of Wikipedia articles using Wikipedia dump data
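The root set and base set mentioned above can be sketched as follows. This is a minimal in-memory example under stated assumptions: the root set is taken to be the pages whose title contains a query string, page ids are assumed to be 1-based positions in the title list, and links are assumed to arrive as a mapping from a page id to the ids it links to. The helper names `build_root_set` and `build_base_set` are hypothetical; the complete description defines the actual selection rules.

```python
def build_root_set(titles, query):
    """Return the ids of titles containing the query (case-insensitive).

    Ids are assumed to be 1-based positions in the titles list.
    """
    q = query.lower()
    return {i for i, title in enumerate(titles, start=1) if q in title.lower()}


def build_base_set(root, out_links):
    """Expand the root set with its link neighbours.

    out_links maps a page id to the list of page ids it links to.
    The base set holds the root pages, every page a root page links to,
    and every page that links to a root page.
    """
    base = set(root)
    for src, targets in out_links.items():
        if src in root:
            base.update(targets)           # pages the root set points to
        elif root.intersection(targets):
            base.add(src)                  # pages pointing into the root set
    return base


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the Wikipedia dump.
    titles = ["Apple", "Apple_pie", "Banana", "Cherry", "Date"]
    out_links = {1: [3], 3: [4], 4: [2], 5: [3]}
    root = build_root_set(titles, "apple")
    print(sorted(root))
    print(sorted(build_base_set(root, out_links)))
```

Hub and authority scores are then computed only over links whose endpoints both lie in the base set, which keeps the iterative step far smaller than the full article graph.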

(Complete description)

Dataset
Dataset from the CS servers: [links-simple-sorted.zip] [titles-sorted.zip]
Original dataset instructions [Link]: Downloading data from this site is no longer available. Please use the links listed above.

Walkthrough of installing your Apache Hadoop cluster [Link] [Video]
Walkthrough of installing your Apache Spark cluster [Link]
Port range assignment [Link]


 
Programming Assignment 2

Title: Detecting the Most Popular Topics with Sentiments from Live Twitter Message Streams using the Lossy Counting Algorithm with Apache Storm

Due: Nov. 6 Wednesday 5:00PM

Description

The goal of this programming assignment is to enable you to gain experience in:
- Implementing approximate on-line algorithms using a real-time streaming data processing framework
- Applying sentiment analysis over the near real-time streaming data
- Understanding and implementing parallelism over a real-time streaming data processing framework
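For orientation, here is a minimal single-process sketch of the Lossy Counting algorithm named in the assignment title, written in plain Python rather than as Apache Storm bolts. The class name `LossyCounter`, its method names, and the sample stream are assumptions for illustration only; the error bound `epsilon` and the support threshold follow the usual formulation of the algorithm, and the complete description may set different parameters.

```python
import math


class LossyCounter:
    """Approximate frequency counts over a stream with error bound epsilon."""

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # items per bucket
        self.n = 0                           # items seen so far
        self.entries = {}                    # item -> [count, max undercount]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # current bucket id
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            # A new item may have been missed in up to bucket - 1 buckets.
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # Bucket boundary: drop items that cannot be frequent.
            self.entries = {
                k: v for k, v in self.entries.items() if v[0] + v[1] > bucket
            }

    def frequent(self, support):
        """Return items whose count is at least (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return {k: v[0] for k, v in self.entries.items() if v[0] >= threshold}


if __name__ == "__main__":
    counter = LossyCounter(epsilon=0.2)
    # Hypothetical hashtag stream.
    for tag in ["a", "b", "a", "c", "a", "a", "d", "a", "e", "a"]:
        counter.add(tag)
    print(counter.frequent(support=0.5))
```

Memory stays bounded because rare items are discarded at every bucket boundary, at the cost of undercounting any item by at most `epsilon * n`.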

(Complete description)

Port range assignment [Link]

 
Term Project

Term project planning: Team and topic

Due: August 28 5:00PM via email

Term Project Proposal and Presentation

Due: Oct. 11, 5:00PM via Canvas
Description [Link]
Presentation: 10/17 and 10/19, in class
Schedule:


[10/17, Tuesday]

Team Chokecherry
Team Engelmann Spruce
Team Gambel Oak
Team Limber Pine
Team Lodgepole Pine

[10/19, Thursday]
Team Narrowleaf Cottonwood
Team Peachleaf Willow
Team Pinon Pine
Team Quaking Aspen
Team Rocky Mountain Juniper
Team Rocky Mountain Maple

 

 

Term Project Report and Software
Due: Dec. 4, 5:00PM via Canvas
Description: TBA

Please submit your (1) report and (2) code.

 

Final Presentation
Description: TBA
Presentation: 12/5 and 12/7, in class
Schedule:


[12/5, Tuesday 8:45AM ~ 10:45AM]
Team Rocky Mountain Maple
Team Rocky Mountain Juniper
Team Quaking Aspen
Team Pinon Pine
Team Peachleaf Willow
Team Narrowleaf Cottonwood

[12/7, Thursday 8:45AM ~ 10:45AM]
Team Lodgepole Pine
Team Limber Pine
Team Gambel Oak
Team Engelmann Spruce
Team Chokecherry
Team Boxelder


