Computer Science Department

CS 435: Introduction to Big Data, Schedule
Fall 2018
| Home | Syllabus | Schedule | Assignments | Grading Policy | Course Policy | Code of Conduct | Canvas |

Note that this schedule will be updated during the semester, so please check it every week.
Week 1. (8/20, 8/22)
Topics
- Introduction to Big Data
- Course Introduction
- Big Data and Analytics: Data Collection, Sampling, and Preprocessing
- Overview of Big Data Computing Stack
- Introduction to MapReduce

Readings
J. Ginsberg et al., “Detecting influenza epidemics using search engine query data,” Nature 457, pp. 1012-1014, February 2009. [Link]

Anand Rajaraman, Jure Leskovec, and Jeffrey Ullman, “Mining of Massive Datasets,” Cambridge University Press, 2012, Chapter 1-1.2/1.3. [Link]

Useful Links
Keshav's “How to Read a Paper” [Link]
“How to Read and Understand a Scientific Paper: A Step-by-Step Guide for Non-Scientists” [Link]

 

Lecture Notes
8/20: [DOWNLOAD]
8/22: [DOWNLOAD]

Recitation
8/24: The video clip is now available on the Canvas course page.


Notes


8/24 Restricted Drop Deadline
8/26 Add Without Override Deadline
9/5 Census (Add/Drop Ends)
10/15 End Course Withdrawal Period
10/15 Repeat/Delete Deadline
Colorado State University Academic Calendar [Link]


 

Week 2. (8/27, 8/29)

Topics
- MapReduce Design Pattern I. Numerical Summarization

Readings
Anand Rajaraman, Jure Leskovec, and Jeffrey Ullman, “Mining of Massive Datasets,” Cambridge University Press, 2012, Chapter 1-1.2/1.3. [Link]

Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, “The Google File System,” Proceedings of SOSP 2003, pp. 29-43. [Link]


Lecture Notes
8/27: [DOWNLOAD]
8/29: [DOWNLOAD]

Recitation
8/31: The video clip is now available on the Canvas course page.

Notes
9/5 Census (Add/Drop Ends)
10/15 End Course Withdrawal Period
10/15 Repeat/Delete Deadline
Colorado State University Academic Calendar [Link]

 
Week 3. (9/3, 9/5)

Topics
- MapReduce Design Pattern II. Filtering Patterns
- MapReduce Design Pattern III. Data Organization Patterns

Readings
Jeffrey Dean and Sanjay Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” in Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation (OSDI'04), Vol. 6. [Link]

Anand Rajaraman, Jure Leskovec, and Jeffrey Ullman, “Mining of Massive Datasets,” Cambridge University Press, 2012, Chapter 2. [Link]

Lecture Notes  
9/3: University Holiday (No Class)
9/5: [DOWNLOAD] (examples updated)

Recitation
9/7: The video clip is now available on the Canvas course page.

Notes


 
Week 4. (9/10, 9/12)

Topics
- MapReduce Design Pattern IV. Join Patterns
- MapReduce Design Pattern V. I/O Patterns

Readings
Jeffrey Dean and Sanjay Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” in Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation (OSDI'04), Vol. 6. [Link]

Lecture Notes
9/10: [DOWNLOAD]
9/12: [DOWNLOAD]

Recitation
9/14: The video clip is now available on the Canvas course page.

Notes

 
Week 5. (9/17, 9/19)
Topics
- MapReduce Design Pattern V. I/O Patterns
- How MapReduce Works



Readings
Anand Rajaraman, Jure Leskovec, and Jeffrey Ullman, “Mining of Massive Datasets,” Cambridge University Press, 2012, Chapter 3. [Link]

Lecture Notes
9/17: [DOWNLOAD]
9/19: [DOWNLOAD]


Recitation
9/21: The video clip is now available on the Canvas course page.

Notes

 

 
Week 6. (9/24, 9/26)
Topics
- Large-Scale Analytics 1. Web-Scale Link Analysis

Readings
Anand Rajaraman, Jure Leskovec, and Jeffrey Ullman, “Mining of Massive Datasets,” Cambridge University Press, 2012, Chapter 5. [Link]


Lecture Notes
9/24: [DOWNLOAD]
9/26: [DOWNLOAD]


Recitation
9/28: The video clip is now available on the Canvas course page.

Notes

 
Week 7. (10/1, 10/3)

Topics
- Large-Scale Analytics 1. Web-Scale Link Analysis (continued)
- Midterm Review

Lecture Notes
10/1: [DOWNLOAD]
10/3: [DOWNLOAD]

Recitation
10/5: The video clip is now available on the Canvas course page.

Notes



 
Week 8. (10/8, 10/10)

Topics
- Midterm (10/8)
- Large-Scale Analytics 3. Predictive Analytics: Linear Regression Using the Gradient Descent Algorithm
- Planning the Term Project

Lecture Notes
10/8: Midterm
10/10: [DOWNLOAD]

Recitation
10/12: The video clip is now available on the Canvas course page.

Notes

 
Week 9. (10/15, 10/17)

Topics
- Large-Scale Analytics 4. Recommendation Systems: Collaborative Filtering

Readings
Cheng-Tao Chu, Sang Kyun Kim, Yi-An Lin, YuanYuan Yu, Gary Bradski, Andrew Y. Ng, and Kunle Olukotun, “Map-Reduce for Machine Learning on Multicore,” in Proceedings of the 19th International Conference on Neural Information Processing Systems (NIPS'06), B. Schölkopf, J. C. Platt, and T. Hoffman (Eds.), MIT Press, Cambridge, MA, USA, 2006, pp. 281-288. [Link]

Lecture Notes
10/15: [DOWNLOAD]
10/17: [DOWNLOAD]

Recitation
10/19: The video clip is now available on the Canvas course page.


Notes

 

Part 2. Data Retrieval and Exchange

Week 10. (10/22, 10/24)

Topics
- In-Memory Computing Framework for Scalable Analytics with Apache Spark


Readings
Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica, “Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing,” in Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12), 2012. [Link]

 

Lecture Notes
10/22: [DOWNLOAD]
10/24: [DOWNLOAD]

Recitation
10/26: The video clip is now available on the Canvas course page.

Notes

Week 11. (10/29, 10/31)

Topics
- In-Memory Computing Framework for Scalable Analytics with Apache Spark (continued)
- Distributed File Systems: Google File System

Readings
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, “The Google File System,” Proceedings of SOSP 2003, pp. 29-43. [Link]

Lecture Notes
10/29: [DOWNLOAD]
10/31: [DOWNLOAD]

Recitation
11/2: The video clip is now available on the Canvas course page.

Notes

Week 12. (11/5, 11/7)

Topics
- Distributed File Systems: Google File System (continued)
- NoSQL Storage Systems I. Key-Value Storage Systems (Amazon's Dynamo)


Readings
Giuseppe DeCandia et al., “Dynamo: Amazon’s Highly Available Key-value Store,” in Proceedings of the Twenty-First ACM SIGOPS Symposium on Operating Systems Principles (SOSP '07), pp. 205-220. [Link]

Lecture Notes
11/5: [DOWNLOAD]
11/7: [DOWNLOAD]


Recitation
11/9: The video clip is now available on the Canvas course page.

Notes

Week 13. (11/12, 11/14)

Topics
- NoSQL Storage Systems I. Key-Value Storage Systems (Amazon's Dynamo) (continued)
- NoSQL Storage Systems II. Column-Family Storage Systems (Google's Bigtable)



Readings
Giuseppe DeCandia et al., “Dynamo: Amazon’s Highly Available Key-value Store,” in Proceedings of the Twenty-First ACM SIGOPS Symposium on Operating Systems Principles (SOSP '07), pp. 205-220. [Link]

Lecture Notes
11/12: [DOWNLOAD] (updated)
11/14: [DOWNLOAD]

Recitation
11/16: No Recitation

Notes

Week 14. (11/19, 11/21)
Fall Recess: No classes are held.

Week 15. (11/26, 11/28)
Topics
- NoSQL Storage Systems II. Column-Family Storage Systems (Google's Bigtable) (continued)

Readings
Giuseppe DeCandia et al., “Dynamo: Amazon’s Highly Available Key-value Store,” in Proceedings of the Twenty-First ACM SIGOPS Symposium on Operating Systems Principles (SOSP '07), pp. 205-220. [Link]

Lecture Notes
11/26: [DOWNLOAD]
11/28: Final Term Project Presentations I

Recitation
11/30: No Recitation

 
Week 16. (12/3, 12/5)

Term Project Presentations (schedule TBA)


Notes

Final Exam: 12/14 (Friday), 7:30 AM to 9:30 AM, CSB130
Final Exam Guidelines [DOWNLOAD]
Fall 2018 Final Exam Schedule
