Computer Science Department

Introduction to Big Data: assignments CS 435
Fall 2020
| Home | Syllabus | Schedule | Assignments | Grading Policy | Course Policy | Code of Conduct | Canvas |

RECITATION WILL BE HELD VIA MICROSOFT TEAMS, CS435 GROUP, ON FRIDAYS 4:00 PM - 4:50 PM (PARTICIPATION IS OPTIONAL)
VIDEO WILL BE RECORDED AND UPLOADED TO CANVAS

RESOURCES ON REMOTE LOGGING: [Infospaces]

RESOURCES ON REMOTE DESKTOP: [Using VNC on CS Linux Machines]

Programming Assignments
Programming Assignment 0: September 3 By 5:00PM via Canvas
Programming Assignment 1: September 17 By 5:00PM via Canvas
Programming Assignment 2: October 15 By 5:00PM via Canvas
Programming Assignment 3: November 12 By 5:00PM via Canvas

Term Project Phases
Term Project Phase 0: September 4 By 5:00PM via Canvas (cs435@cs.colostate.edu)
Term Project Phase 1: October 29 By 5:00PM via Canvas
Term Project Phase 2 (Software/Report): December 3 By 5:00PM via Canvas
Term Project Phase 3 (Presentation Video Clip): December 7 By 10:00AM via Canvas



Programming Assignment 0:
Creating your own Hadoop cluster and a distinct nodes example

Due: September 3 By 5:00PM

Submission: via Canvas, Individual submission

Objectives:
The goal of this programming assignment is to enable you to gain experience in
--Installing and configuring Hadoop
--Gaining familiarity with the basic features of the Hadoop Distributed File System (HDFS)
--Running a simple MapReduce example
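
As a rough illustration of the distinct-nodes idea, the sketch below simulates the map, shuffle, and reduce steps in plain Python, in memory. It is not the course's reference code and does not use Hadoop. The input format (one edge per line, two whitespace-separated node IDs) and all function names are assumptions made for this example.

```python
from collections import defaultdict


def map_edge(line):
    """Map step: emit (node, None) for both endpoints of one edge line."""
    parts = line.split()
    if len(parts) < 2:
        return []  # skip blank or malformed lines
    return [(parts[0], None), (parts[1], None)]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_node(node, values):
    """Reduce step: each key reaches the reducer once, so emit it once."""
    return node


def distinct_nodes(lines):
    """Run the three steps over an iterable of edge lines."""
    pairs = [pair for line in lines for pair in map_edge(line)]
    groups = shuffle(pairs)
    return sorted(reduce_node(node, values) for node, values in groups.items())


# Hypothetical sample input: node IDs are kept as strings.
edges = ["1 2", "2 3", "1 3", "", "3 4"]
print(distinct_nodes(edges))  # ['1', '2', '3', '4']
```

Deduplication comes from the grouping step: however many edges mention a node, its key is reduced exactly once.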

Full description: [Link]

Helpful infospaces videos: [Hadoop Standalone Mode] [Hadoop Cluster Configuration]

Hadoop Installation Guide: [Link]

Map-Reduce Code for Distinct Nodes of a Social Network: [Link]

Node and Port Assignment: [Link]

Hadoop Configuration Jar: [Link]

Test file: [Link]

Test file for DEMO: [Link]

For technical questions, please contact the GTA.

 

Programming Assignment 1

Due: September 17 By 5:00PM

Submission: via Canvas, Individual submission

Objectives:
The goal of this programming assignment is to enable you to gain experience in
--Profiling a large-scale Social Network Graph using Hadoop MapReduce
--Extracting a feature graph using Hadoop MapReduce

Full description: [Link]

Test file for DEMO: [Link]

Programming Assignment 2

Due: October 22 By 5:00PM

Submission: via Canvas, Individual submission

Objectives:
The goal of this programming assignment is to enable you to gain experience in
--Counting the number of edges and vertices of the Google+ “friend” networks
--Measuring small-world network properties using MapReduce
--Analyzing the results of your measurements
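
To make the measurements above concrete, here is a minimal single-machine sketch in plain Python. It counts vertices and edges and computes the average local clustering coefficient, one commonly used small-world indicator. The assignment's full description defines the actual measures and data format. Here the edge-list format ("u v" per line), the choice to treat the graph as undirected, and all function names are assumptions.

```python
from itertools import combinations


def build_adjacency(edge_lines):
    """Parse 'u v' lines into an undirected adjacency map, ignoring self-loops."""
    adj = {}
    for line in edge_lines:
        parts = line.split()
        if len(parts) < 2 or parts[0] == parts[1]:
            continue
        u, v = parts[0], parts[1]
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_vertices_and_edges(adj):
    """Return (vertex count, undirected edge count)."""
    vertices = len(adj)
    # Every undirected edge appears in exactly two neighbor sets.
    edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    return vertices, edges


def local_clustering(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all vertices."""
    if not adj:
        return 0.0
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# Hypothetical sample graph: a triangle 1-2-3 plus a pendant vertex 4.
adj = build_adjacency(["1 2", "2 3", "1 3", "3 4"])
print(count_vertices_and_edges(adj))  # (4, 4)
print(average_clustering(adj))
```

In a MapReduce setting the same quantities would be computed in stages across many machines. This version only shows what is being counted.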

Full description: [Link]

Test file for DEMO: [Link]

Programming Assignment 3

Due: November 19 By 5:00PM

Submission: via Canvas, Individual submission

Objectives:
The goal of this programming assignment is to enable you to gain experience in
--Installing and using analytics tools such as HDFS and Apache Spark
--Implementing iterative algorithms to estimate PageRank values of Wikipedia articles using Wikipedia dump data
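
For orientation, the following is a small, plain-Python sketch of iterative PageRank on an in-memory graph. It is not the assignment's required formulation and does not use Spark. The damping factor of 0.85, the fixed iteration count, the even redistribution of rank from pages without out-links, and the dictionary input format are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=25):
    """Estimate PageRank; `links` maps each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    if n == 0:
        return {}

    # Start from a uniform distribution over all pages.
    ranks = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(ranks[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {p: base for p in pages}
        # Each page passes an equal share of its damped rank to every target.
        for page, targets in links.items():
            if targets:
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Hypothetical four-page graph.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, rank in sorted(pagerank(links, iterations=100).items()):
    print(page, round(rank, 4))
```

The ranks sum to 1 after every iteration, which is a convenient sanity check when porting the same loop to a distributed framework.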

Full description: [Link]

Apache Spark Installation Guide: [Link]

Sample Word count project: [Link]

Links file: [Link]

Titles file: [Link]

Term Project

The objectives of the term project are:
- Performing large-scale data analytics
- Using technologies typically used in modern data centers
- Interpreting your results to extract insight from the data

Phase 0: Term Project Team Assignment

Due: September 4, 2020 By 5:00PM
Requirement: 3 or 4 team members only
Submission: via Canvas

Phase 1: Term Project Proposal

Due: October 29, 2020 By 5:00PM
Submission: via Canvas, Team submission

Full description of the term project proposal: [Link]

Phase 2: Term Project Submission (Software and Report)

Due: December 3, 2020 By 5:00PM

Submission: via Canvas, Team submission

Description: [Link]

Phase 3: Presentations and Software Demonstration (Video clip + Zoom discussion)

Phase 3 includes a team presentation in class and a software demonstration to the instructor. Your pre-recorded presentation will be 8 minutes long (with 2 minutes of Zoom Q&A discussion). We accept only MP4 format. There will be 3-4 judges in the class session.

Due: December 7 By 10:00AM via Canvas

Your presentation should cover:

Slide 1. Title

Slide 2. Problem description: Describe your problem and goal

Slide 3. Description of your data: The characteristics of your data (e.g. why is it challenging?)

Slide 4 (1 or 2 slides). Your approaches (Methodology)
- Algorithm (e.g. a description of linear regression, and how you applied the algorithm to your problem)
- Programming paradigm (e.g. your MapReduce design)
- System and services

Slide 5. Discussion of your analysis
- What did you find from the results of your analysis?
- Do you think that it is accurate?
- How was the performance of your analysis? (if it is applicable)
- Did you find any challenges during your project? Please share those.

Slide 6. Conclusion
- Summary of your project
- Lessons learned from your project

You can provide a real-time demo if you think it will highlight your software.

Your team will also provide a short demo to the instructor.

Presentation schedule
TBA

 
