Computer Science Department

Introduction to Big Data: schedule CS 435
Fall 2018
| Home | Syllabus | Schedule | Assignments | Grading Policy | Course Policy | Code of Conduct | Canvas |

Note that this schedule will be altered during the semester. Please make sure to check it every week.
Week 1. (8/20, 8/22)
Topics
- Introduction to Big Data
- Course Introduction
- Big Data and Analytics: Data collection, Sampling and Preprocessing
- Overview of Big Data Computing Stack
- Introduction to MapReduce

Readings
J. Ginsberg, et al., “Detecting influenza epidemics using search engine query data,” Nature 457, pp. 1012-1014, February 2009, Link

Anand Rajaraman, Jure Leskovec, and Jeffrey Ullman, “Mining of Massive Datasets”, Cambridge University Press, 2012 --Chapter 1-1.2/1.3 [link]

Useful Links
Keshav's "How to read a paper"[Link]
"How to Read and Understand a Scientific Paper: A Step-by-Step Guide for Non-Scientists"[Link]

 

Lecture Notes
8/20: [DOWNLOAD]
8/22: [DOWNLOAD]

Recitation
8/24: The video clip is now available on the Canvas course page.


Notes


8/24 Restricted Drop Deadline
8/26 Add Without Override Deadline
9/5 Census (Add/Drop Ends)
10/15 End Course Withdrawal Period
10/15 Repeat/Delete Deadline
Colorado State University Academic Calendar [Link]


 

Week 2. (8/27, 8/29)

Topics
- MapReduce Design Pattern I. Numerical Summarization

Readings
Anand Rajaraman, Jure Leskovec, and Jeffrey Ullman, “Mining of Massive Datasets”, Cambridge University Press, 2012 --Chapter 1-1.2/1.3 [link]

Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung, "The Google File System," Proceedings of SOSP 2003, pp. 29-43, Link


Lecture Notes
8/27: [DOWNLOAD]
8/29: [DOWNLOAD]

Recitation
8/31: The video clip is now available on the Canvas course page.

Notes
9/5 Census (Add/Drop Ends)
10/15 End Course Withdrawal Period
10/15 Repeat/Delete Deadline
Colorado State University Academic Calendar [Link]

 
Week 3. (9/3, 9/5)

Topics
- MapReduce Design Pattern II. Filtering Patterns
- MapReduce Design Pattern III. Data Organization Patterns

Readings
Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," in OSDI'04: Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation, Vol. 6, [Link]

Anand Rajaraman, Jure Leskovec, and Jeffrey Ullman, “Mining of Massive Datasets”, Cambridge University Press, 2012 --Chapter 2 [link]

Lecture Notes  
9/3: University Holiday (No Class)
9/5: [DOWNLOAD] (examples updated)

Recitation
9/7: The video clip is now available on the Canvas course page.

Notes


 
Week 4. (9/10, 9/12)

Topics
- MapReduce Design Pattern IV. Join Patterns
- MapReduce Design Pattern V. I/O patterns

Readings
Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," in OSDI'04: Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation, Vol. 6, [Link]

Lecture Notes
9/10: [DOWNLOAD]
9/12: [DOWNLOAD]

Recitation
9/14: The video clip is now available on the Canvas course page.

Notes

 
Week 5. (9/17, 9/19)
Topics
- MapReduce Design Pattern V. I/O patterns
- How MapReduce works



Readings
Anand Rajaraman, Jure Leskovec, and Jeffrey Ullman, “Mining of Massive Datasets”, Cambridge University Press, 2012 --Chapter 3 [link]

Lecture Notes
9/17: [DOWNLOAD]
9/19: TBA


Recitation
9/21: TBA

Notes

 

 
Week 6. (9/24, 9/26)
Topics
- Large-Scale Analytics 1. Web-Scale Link Analysis

Readings
Anand Rajaraman, Jure Leskovec, and Jeffrey Ullman, “Mining of Massive Datasets”, Cambridge University Press, 2012 --Chapter 5 [link]


Lecture Notes
9/24: TBA
9/26: TBA


Recitation
9/28: TBA

Notes

 
Week 7. (10/1, 10/3)

Topics
- Large-Scale Analytics 1. Web-Scale Link Analysis: continued
- Large-Scale Analytics 2. Clustering: K-Means Clustering using the Canopy algorithm
- Midterm Review

Lecture Notes
10/1: TBA
10/3: TBA


Recitation
10/5: TBA

Notes



 
Week 8. (10/8, 10/10)

Topics
- Midterm (10/8)
- Large-Scale Analytics 3. Predictive Analytics: Linear Regression using the Gradient Descent algorithm
- Planning Term Project

Lecture Notes
10/8: TBA
10/10: TBA

Recitation
10/12: TBA

Notes

 
Week 9. (10/15, 10/17)

Topics
- Large-Scale Analytics 4. Recommendation Systems: Collaborative Filtering

Readings
Cheng-Tao Chu, Sang Kyun Kim, Yi-An Lin, YuanYuan Yu, Gary Bradski, Andrew Y. Ng, and Kunle Olukotun. 2006. Map-reduce for machine learning on multicore. In Proceedings of the 19th International Conference on Neural Information Processing Systems (NIPS'06), B. Schölkopf, J. C. Platt, and T. Hoffman (Eds.). MIT Press, Cambridge, MA, USA, 281-288. [Link]

Lecture Notes
10/15: TBA
10/17: TBA

Recitation
10/19: TBA


Notes

 

Part 2. Data Retrieval and Exchange

Week 10. (10/22, 10/24)

Topics
- In-Memory Computing Framework for Scalable Analytics with Apache Spark


Readings
Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica, “Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing,” The 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12) [Link]

 

Lecture Notes
10/22: TBA
10/24: TBA

Recitation
10/26: TBA

Notes

Week 11. (10/29, 10/31)

Topics
- In-Memory Computing Framework for Scalable Analytics with Apache Spark: continued
- Distributed File Systems: Google File System

Readings

Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung, "The Google File System," Proceedings of SOSP 2003, pp. 29-43, Link

Lecture Notes
10/29: TBA
10/31: TBA

Recitation
11/2: TBA

Notes

Week 12. (11/5, 11/7)

Topics
- Distributed File Systems: Google File System
- NoSQL storage system I. Key-Value storage systems (Amazon's Dynamo)


Readings
Giuseppe DeCandia, et al., "Dynamo: Amazon’s Highly Available Key-value Store," Proceedings of the Twenty-First ACM SIGOPS Symposium on Operating Systems Principles, pp. 205-220 [Link]

Lecture Notes
11/5: TBA
11/7: TBA


Recitation
11/9: TBA

Notes

Week 13. (11/12, 11/14)

Topics
- NoSQL storage system I. Key-Value storage systems (Amazon's Dynamo): continued
- NoSQL storage system II. Column Family storage systems (Google's BigTable)



Readings
Giuseppe DeCandia, et al., "Dynamo: Amazon’s Highly Available Key-value Store," Proceedings of the Twenty-First ACM SIGOPS Symposium on Operating Systems Principles, pp. 205-220 [Link]

Lecture Notes
11/12: TBA
11/14: TBA


Recitation
11/16: TBA

Notes

Week 14. (11/19, 11/21)
Fall Recess: no classes are held.

Week 15. (11/26, 11/28)
Topics
- NoSQL storage system II. Column Family storage systems (Google's BigTable): continued
- Data Exchange Models
- Representational State Transfer (REST)

Readings
Roy Thomas Fielding, "Architectural Styles and the Design of Network-based Software Architectures," Chapter 5. Representational State Transfer (REST), 2000 [Link]

Lecture Notes
11/26: TBA
11/28: TBA

Recitation
11/30: TBA

 
Week 16. (12/3, 12/5)

Term Project Presentations: Schedule (TBA)


Notes

Final Exam: 12/14 (Friday), 7:30 AM - 9:30 AM, CSB 130
Fall 2018 Final Exam Schedule
