Computer Science Department

Introduction to Big Data: schedule CS 435
Spring 2018
| Home | Syllabus | Schedule | Assignments | Grading Policy | Course Policy | Code of Conduct | Canvas |

Note that this schedule is subject to change during the semester. Please check it every week.
Week 1. (1/17)
Topics
Introduction to Big Data
Course Introduction



Readings
J. Ginsberg, et al., “Detecting influenza epidemics using search engine query data,” Nature 457, pp. 1012-1014, February 2009, Link

Useful Links
Keshav's "How to read a paper"[Link]
"How to Read and Understand a Scientific Paper: A Step-by-Step Guide for Non-Scientists"[Link]

 

Lecture Notes
1/17 [DOWNLOAD] 

Recitation
1/20: Video


Notes


1/19 Restricted Drop Deadline
1/21 Add Without Override Deadline
1/30 Add/Drop deadline for Most Courses
Colorado State University Academic Calendar [Link]


 

Week 2. (1/22, 1/24)

Topics
- Big Data and Analytics: Data collection, Sampling and Preprocessing

- Overview of Big Data Computing Stack

- Introduction to MapReduce

Readings
Anand Rajaraman, Jure Leskovec, and Jeffrey Ullman, “Mining of Massive Datasets”, Cambridge University Press, 2012 --Chapter 1-1.2/1.3 [link]

Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung, "The Google File System"  Proceedings of SOSP 2003: 29-43, Link


Lecture Notes
1/22: [DOWNLOAD]


Notes

 
Week 3. (1/29, 1/31)

Topics
- MapReduce Design Pattern I. Numerical Summarization
- MapReduce Design Pattern II. Filtering Patterns

Readings
Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung, "The Google File System"  Proceedings of SOSP 2003: 29-43, Link

Anand Rajaraman, Jure Leskovec, and Jeffrey Ullman, “Mining of Massive Datasets”, Cambridge University Press, 2012 --Chapter 2 [link]

Lecture Notes
TBA

Notes

 
Week 4. (2/5, 2/7)

Topics
MapReduce Design Pattern III. Data Organization Patterns
MapReduce Design Pattern IV. Join Patterns

Readings
Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," in OSDI'04: Proceedings of the 6th Symposium on Operating Systems Design & Implementation, Vol. 6 [Link]

Lecture Notes
TBA

Notes
Programming Assignment 1 has been posted.[Link]

 
Week 5. (2/12, 2/14)
Topics
- MapReduce Design Pattern V. I/O patterns
- How MapReduce works
- Large-scale Analytics 1. Web-Scale Link and Social Network Analysis



Readings
Anand Rajaraman, Jure Leskovec, and Jeffrey Ullman, “Mining of Massive Datasets”, Cambridge University Press, 2012 --Chapter 3 [link]

Lecture Notes
TBA

Notes

 

 
Week 6. (2/19,2/21)
Topics
Large-scale Analytics 1. Web-Scale Link and Social Network Analysis-continued
Large-scale Analytics 2. Predictive analytics: Linear Regression with Gradient Descent

Readings
Anand Rajaraman, Jure Leskovec, and Jeffrey Ullman, “Mining of Massive Datasets”, Cambridge University Press, 2012 --Chapter 5 [link]


Lecture Notes
TBA

Notes
Midterm: 3/5 in class



 
Week 7. (2/26, 2/28)

Topics
Large-scale Analytics 2. Predictive analytics: K-Means clustering with the Canopy algorithm
Large-scale Analytics 3. Recommendation Systems

Midterm Review

Lecture Notes
TBA

Notes
Midterm: 3/5 in class


 
Week 8. (3/5, 3/7)

Topics
Midterm (3/5)
Evaluation/validation methods
Planning Term Project

Lecture Notes

Notes

 
Week 9. (3/12, 3/14)

Spring recess: No class

Notes

Part 2. Data Retrieval and Exchange

Week 10. (3/19, 3/21)

Topics
Google File System (HDFS)

Readings
Cheng-Tao Chu, Sang Kyun Kim, Yi-An Lin, YuanYuan Yu, Gary Bradski, Andrew Y. Ng, and Kunle Olukotun. 2006. Map-reduce for machine learning on multicore. In Proceedings of the 19th International Conference on Neural Information Processing Systems (NIPS'06), B. Schölkopf, J. C. Platt, and T. Hoffman (Eds.). MIT Press, Cambridge, MA, USA, 281-288. [Link]

Lecture Notes
TBA 


Notes

 

 
Week 11. (3/26, 3/28)

Topics
In-Memory Computing Framework for Scalable Analytics with Apache Spark


Readings
Cheng-Tao Chu, Sang Kyun Kim, Yi-An Lin, YuanYuan Yu, Gary Bradski, Andrew Y. Ng, and Kunle Olukotun. 2006. Map-reduce for machine learning on multicore. In Proceedings of the 19th International Conference on Neural Information Processing Systems (NIPS'06), B. Schölkopf, J. C. Platt, and T. Hoffman (Eds.). MIT Press, Cambridge, MA, USA, 281-288. [Link]

 

Lecture Notes
TBA


Notes


Week 12. (4/2, 4/4)

Topics
In-Memory Computing Framework for Scalable Analytics with Apache Spark - Continued
NoSQL storage system I. Column based storage architecture: HBase

Readings
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, "Bigtable: A Distributed Storage System for Structured Data" OSDI'06: Seventh Symposium on Operating System Design and Implementation, Seattle, WA, November, 2006 [Link]

Lecture Notes
TBA

Notes

Week 13. (4/9, 4/11)

Topics
NoSQL storage system I. Column based storage architecture: HBase - Continued
NoSQL storage system II. Key-Value storage systems (Amazon's Dynamo)


Readings
Giuseppe DeCandia, et al., "Dynamo: Amazon’s Highly Available Key-value Store," Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles, pp. 205-220 [Link]

Lecture Notes
TBA

Notes

Week 14. (4/16, 4/18)

Topics
NoSQL storage system II. Key-Value storage systems (Amazon's Dynamo)-continued
Data flow Management with Hadoop and MapReduce, with a case study of Apache's Pig Latin


Readings
Olston, Christopher and Reed, Benjamin and Srivastava, Utkarsh and Kumar, Ravi and Tomkins, Andrew, "Pig Latin: A Not-so-foreign Language for Data Processing," Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, 2008 [Link]

Lecture Notes
TBA

Notes


Week 15. (4/23, 4/25)
Topics
Data flow Management with Hadoop and MapReduce, with a case study of Apache's Pig Latin - Continued
Data Exchange Models
Representational State Transfer (REST)

Readings
Roy Thomas Fielding, "Architectural Styles and the Design of Network-based Software Architectures," Chapter 5. Representational State Transfer (REST), 2000 [Link]

Lecture Notes
TBA

Notes

 
Week 16. (4/30, 5/2)

Term Project Presentations: Schedule (TBA)


Notes

Final Exam: 5/7 (Monday), 11:50 AM - 1:50 PM, CSB130
Spring 2018 Final Exam Schedule
