Computer Science Department

Introduction to Big Data: schedule CS 435
Spring 2018
| Home | Syllabus | Schedule | Assignments | Grading Policy | Course Policy | Code of Conduct | Canvas |

Note that this schedule is subject to change during the semester. Please check it every week.
Week 1. (1/17)
Topics
Introduction to Big Data
Course Introduction



Readings
J. Ginsberg, et al., “Detecting influenza epidemics using search engine query data,” Nature 457, pp. 1012-1014, February 2009 [Link]

Useful Links
Keshav's "How to read a paper" [Link]
"How to Read and Understand a Scientific Paper: A Step-by-Step Guide for Non-Scientists" [Link]

 

Lecture Notes
1/17 [DOWNLOAD] 

Recitation
1/20: Video


Notes


1/19 Restricted Drop Deadline
1/21 Add Without Override Deadline
1/30 Add/Drop deadline for Most Courses
Colorado State University Academic Calendar [Link]


 

Week 2. (1/22, 1/24)

Topics
- Big Data and Analytics: Data collection, Sampling and Preprocessing

- Overview of Big Data Computing Stack

- Introduction to MapReduce

- MapReduce Design Pattern I. Numerical Summarization

Readings
Anand Rajaraman, Jure Leskovec, and Jeffrey Ullman, “Mining of Massive Datasets”, Cambridge University Press, 2012 --Chapter 1-1.2/1.3 [link]

Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung, "The Google File System," Proceedings of SOSP 2003: 29-43 [Link]


Lecture Notes
1/22: [DOWNLOAD]
1/24: [DOWNLOAD]

Recitation
1/26: Video


Notes

 
Week 3. (1/29, 1/31)

Topics
- MapReduce Design Pattern II. Filtering Patterns

Readings
Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," In OSDI'04: Proceedings of the 6th Symposium on Operating Systems Design & Implementation - Vol. 6 [Link]

Anand Rajaraman, Jure Leskovec, and Jeffrey Ullman, “Mining of Massive Datasets”, Cambridge University Press, 2012 --Chapter 2 [link]

Lecture Notes  
1/29: [DOWNLOAD]
1/31: [DOWNLOAD]

Recitation
2/2: Video

Notes

 
Week 4. (2/5, 2/7)

Topics
MapReduce Design Pattern III. Data Organization Patterns
MapReduce Design Pattern IV. Join Patterns

Readings
Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," In OSDI'04: Proceedings of the 6th Symposium on Operating Systems Design & Implementation - Vol. 6 [Link]

Lecture Notes
2/5: [DOWNLOAD]
2/7: [DOWNLOAD]

Notes
Programming Assignment 1 has been posted. [Link]

 

 
Week 5. (2/12, 2/14)
Topics
- MapReduce Design Pattern V. I/O patterns
- How MapReduce works



Readings
Anand Rajaraman, Jure Leskovec, and Jeffrey Ullman, “Mining of Massive Datasets”, Cambridge University Press, 2012 --Chapter 3 [link]

Lecture Notes
2/12: [DOWNLOAD]
2/14: [DOWNLOAD]


Notes

 

 
Week 6. (2/19, 2/21)
Topics
Large-scale Analytics 1. Web-Scale Link Analysis

Readings
Anand Rajaraman, Jure Leskovec, and Jeffrey Ullman, “Mining of Massive Datasets”, Cambridge University Press, 2012 --Chapter 5 [link]


Lecture Notes
2/19: [DOWNLOAD]
2/21: [DOWNLOAD]


Notes
Midterm: 3/5 in class



 
Week 7. (2/26, 2/28)

Topics
Large-scale Analytics 1. Web-Scale Link Analysis: continued
Large-scale Analytics 2. Clustering: K-Means Clustering using Canopy algorithm
Midterm Review

Lecture Notes
2/26: [DOWNLOAD]
2/28: [DOWNLOAD]


Notes
Midterm: 3/5 in class



 
Week 8. (3/5, 3/7)

Topics
Midterm (3/5)
Large-Scale Analytics 3. Predictive Analytics: Linear Regression using Gradient Descent Algorithm
Planning Term Project

Lecture Notes
3/7: [DOWNLOAD]

Notes

 
Week 9. (3/12, 3/14)

Spring recess: No class

Notes

Part 2. Data Retrieval and Exchange

Week 10. (3/19, 3/21)

Topics
Large-Scale Analytics 4. Recommendation Systems: Collaborative Filtering

Readings
Cheng-Tao Chu, Sang Kyun Kim, Yi-An Lin, YuanYuan Yu, Gary Bradski, Andrew Y. Ng, and Kunle Olukotun. 2006. Map-reduce for machine learning on multicore. In Proceedings of the 19th International Conference on Neural Information Processing Systems (NIPS'06), B. Schölkopf, J. C. Platt, and T. Hoffman (Eds.). MIT Press, Cambridge, MA, USA, 281-288. [Link]

Lecture Notes
3/19: [DOWNLOAD]
3/21: [DOWNLOAD]


Notes

 

 
Week 11. (3/26, 3/28)

Topics
In-Memory Computing Framework for Scalable Analytics with Apache Spark


Readings
Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica, “Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing,” The 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12) [Link]

 

Lecture Notes
3/26: [DOWNLOAD]
3/28: [DOWNLOAD]


Notes


Week 12. (4/2, 4/4)

Topics
In-Memory Computing Framework for Scalable Analytics with Apache Spark - Continued
Distributed File Systems: Google File System

Readings

Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung, "The Google File System," Proceedings of SOSP 2003: 29-43 [Link]

Lecture Notes
4/2: [DOWNLOAD]
4/4: [DOWNLOAD]

Notes

Week 13. (4/9, 4/11)

Topics
Distributed File Systems: Google File System
NoSQL storage system I. Key-Value storage systems (Amazon's Dynamo)


Readings
Giuseppe DeCandia, et al., "Dynamo: Amazon’s Highly Available Key-value Store," Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles, pp. 205-220 [Link]

Lecture Notes
4/9: [DOWNLOAD]
4/11: [DOWNLOAD]


Notes

Week 14. (4/16, 4/18)

Topics
NoSQL storage system I. Key-Value storage systems (Amazon's Dynamo) - Continued
NoSQL storage system II. Column Family storage systems (Google's BigTable)



Readings
Giuseppe DeCandia, et al., "Dynamo: Amazon’s Highly Available Key-value Store," Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles, pp. 205-220 [Link]

Lecture Notes
4/16: [DOWNLOAD]
4/18: [DOWNLOAD]


Notes


Week 15. (4/23, 4/25)
Topics
NoSQL storage system II. Column Family storage systems (Google's BigTable) - Continued
Data Exchange Models
Representational State Transfer (REST)

Readings
Roy Thomas Fielding, "Architectural Styles and the Design of Network-based Software Architectures," Chapter 5. Representational State Transfer (REST), 2000 [Link]

Lecture Notes
4/23: [DOWNLOAD]
4/25: [DOWNLOAD]
Final Exam preparation guide: [DOWNLOAD]

 
Week 16. (4/30, 5/2)

Term Project Presentations: Schedule (TBA)


Notes

Final Exam: 5/7 (Monday), 11:50 AM to 1:50 PM, CSB130
Spring 2018 Final Exam Schedule
