Computer Science Department

Introduction to Big Data: schedule CS 435
Spring 2017
| Home | Syllabus | Schedule | Assignments | Grading Policy | Course Policy | Code of Conduct | Canvas |

Note that this schedule may change during the semester. Please check it every week.
Part 0. What is Big Data?
Week 1. (1/18)

Topics
Introduction to Big Data


Readings
J. Ginsberg, et al., “Detecting influenza epidemics using search engine query data,” Nature 457, pp. 1012-1014, February 2009, Link

Useful Links
Keshav's "How to read a paper" [Link]
"How to Read and Understand a Scientific Paper: A Step-by-Step Guide for Non-Scientists" [Link]

Lecture Notes
1/18 [Download]


Notes

1/16 University Holiday
1/17 First day of class in Spring 2017

1/20 Restricted Drop Deadline
1/23 Add Without Override Deadline
3/20 Course Withdrawal Period Ends
Colorado State University Academic Calendar [Link]


 

Part 1. Large Scale Data Analysis with MapReduce
Week 2. (1/23, 1/25)

Topics
Introduction to MapReduce
Distributed File System: Google File System

Readings
Anand Rajaraman, Jure Leskovec, and Jeffrey Ullman, “Mining of Massive Datasets”, Cambridge University Press, 2012 --Chapter 1-1.2/1.3 [link]

Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung, "The Google File System"  Proceedings of SOSP 2003: 29-43, Link


Lecture Notes
1/23 [Download]
1/25 [Download]


Notes

 
Week 3. (1/30, 2/1)

Topics
Distributed File System: Google File System (continued)

Readings
Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung, "The Google File System"  Proceedings of SOSP 2003: 29-43, Link

Anand Rajaraman, Jure Leskovec, and Jeffrey Ullman, “Mining of Massive Datasets”, Cambridge University Press, 2012 --Chapter 2 [link]

Lecture Notes
1/30 [Download]
2/1 [Download]

Notes

 
Week 4. (2/6, 2/8)

Topics
How does MapReduce work?
Introduction to Input and Output Management in MapReduce

Input and Output Patterns in MapReduce

Readings
Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," In OSDI'04: Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation - Vol. 6, [Link]

Lecture Notes
2/6 [Download]
2/8 [Download]

Notes
Programming Assignment 1 has been posted. [Link]

 
Week 5. (2/13, 2/15)

Topics
Finding Similar Items with MapReduce


Readings
Anand Rajaraman, Jure Leskovec, and Jeffrey Ullman, “Mining of Massive Datasets”, Cambridge University Press, 2012 --Chapter 3 [link]

Lecture Notes
2/13 [Download]
2/15 [Download]

Notes

 

 
Week 6. (2/20, 2/22)

Topics
Web-scale Link Analysis with MapReduce

Readings
Anand Rajaraman, Jure Leskovec, and Jeffrey Ullman, “Mining of Massive Datasets”, Cambridge University Press, 2012 --Chapter 5 [link]


Lecture Notes
2/20 [Download]
2/22 [Download]

Notes
Midterm: 3/6 in class



 
Week 7. (2/27, 3/1)

Topics
Link Analysis continued
Data Filtering Patterns in MapReduce: filtering, Bloom filtering, top ten and distinct with MapReduce

Midterm Review [Link]

Lecture Notes
2/27 [Download]
3/1 [Download]

Notes
Midterm: 3/6 in class


 
Week 8. (3/6, 3/8)

Topics
Midterm (3/6)
Input/Output pattern (3/8)

Lecture Notes
3/7 [Download]

Notes

 
Week 9. (3/13, 3/15)

Spring recess: No class

Notes

Part 2. Data Retrieval and Exchange

Part 2. Data Analytics with Voluminous Datasets
Week 10. (3/20, 3/22)

Topics
Validation methods
Linear Regression with Gradient Descent using MapReduce

Lecture Notes
3/20 [Download]

Readings
Cheng-Tao Chu, Sang Kyun Kim, Yi-An Lin, YuanYuan Yu, Gary Bradski, Andrew Y. Ng, and Kunle Olukotun. 2006. Map-reduce for machine learning on multicore. In Proceedings of the 19th International Conference on Neural Information Processing Systems (NIPS'06), B. Schölkopf, J. C. Platt, and T. Hoffman (Eds.). MIT Press, Cambridge, MA, USA, 281-288. [Link]


Notes

 

 
Week 11. (3/27, 3/29)

Topics
Linear Regression with Stochastic Gradient Descent using MapReduce
k-Means clustering algorithm with Canopy algorithm using MapReduce
Input/output pattern
Recommendation systems


Readings
Cheng-Tao Chu, Sang Kyun Kim, Yi-An Lin, YuanYuan Yu, Gary Bradski, Andrew Y. Ng, and Kunle Olukotun. 2006. Map-reduce for machine learning on multicore. In Proceedings of the 19th International Conference on Neural Information Processing Systems (NIPS'06), B. Schölkopf, J. C. Platt, and T. Hoffman (Eds.). MIT Press, Cambridge, MA, USA, 281-288. [Link]

 

Lecture Notes
3/27 [Download]
3/29 [Download]


Notes


Week 12. (4/3, 4/5)

Topics
Column Family storage (Google's BigTable)

Readings
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, "Bigtable: A Distributed Storage System for Structured Data" OSDI'06: Seventh Symposium on Operating System Design and Implementation, Seattle, WA, November, 2006 [Link]

Lecture Notes
4/03 [Download]
4/05 [Download]

Notes

Week 13. (4/10, 4/12)

Topics
Column Family storage (Google's BigTable), continued
Key-Value storage systems (Amazon's Dynamo)


Readings
Giuseppe DeCandia, et al., "Dynamo: Amazon’s Highly Available Key-value Store," Proceedings of the Twenty-First ACM SIGOPS Symposium on Operating Systems Principles, pp. 205-220 [Link]

Lecture Notes
4/10 [Download]
4/12 [Download]

Notes

Week 14. (4/17, 4/19)

Topics
Key-Value storage systems (Amazon's Dynamo), continued
Data Flow Management with Hadoop and MapReduce, with a case study of Apache Pig Latin


Readings
Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and Andrew Tomkins, "Pig Latin: A Not-so-foreign Language for Data Processing," Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, 2008 [Link]

Lecture Notes
4/17 [Download]
4/19 [Download]

Notes


Week 15. (4/24, 4/26)
Topics
Data Exchange Models
Representational State Transfer (REST)

Readings
Roy Thomas Fielding, "Architectural Styles and the Design of Network-based Software Architectures," Chapter 5. Representational State Transfer (REST), 2000 [Link]

Lecture Notes
4/24 [Download]
4/26 [Download]

Preparation guidelines for final exam [Link]

Quiz 6-11 [Link]

Notes

 
Week 16. (5/1, 5/3)

Term Project Presentations: Schedule (TBA)


Notes

Final Exam: 5/11 (Thursday), 7:30 AM - 9:30 AM, CSB130
Spring 2017 Final Exam Schedule
