Computer Science Department

Big Data: Schedule CS535
Spring 2019
Note that this schedule may change during the semester. Please check it every week.

Week 01 | Jan. 23

Topics

1. Introduction to Big Data
Course Introduction


Readings
Keshav's "How to read a paper"[Link]
"How to Read and Understand a Scientific Paper: A Step-by-Step Guide for Non-Scientists"[Link]

Lecture Notes
1/23: [Link] -- Revised

Notes
1/21 University Holiday (No class)
1/25 Restricted Drop Deadline
1/27 Add without Override Deadline
1/28 Add with Override Begins
2/6 Add/Drop Ends
3/25 End of Course Withdrawal Period

CSU Academic Calendar 2018-19 [Link]

Week 02 | Jan. 28 | Jan. 30

Topics
2. Data processing paradigms for Big Data
3. Distributed Computing Models for Scalable Batch Computing: Part 1. MapReduce

Readings
[W2R1] Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," in OSDI '04: Proceedings of the 6th Symposium on Operating Systems Design & Implementation, Vol. 6, [Link]

Lecture Notes
1/28: [Link] -- Answers are added
1/30: [Link] -- Answers are added (Slides 52-55)


Notes




Week 03 | Feb. 4 | Feb. 6

Topics
3. Distributed Computing Models for Scalable Batch Computing: Part 2. In-Memory Cluster Computing - Apache Spark

Readings
[W3R1] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica, "Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing," USENIX NSDI 2012, [Link]

Lecture Notes
2/4: [Link]
2/6: [Link]

Notes

Week 04 | Feb. 11 | Feb. 13

Topics
3. Distributed Computing Models for Scalable Batch Computing: Part 2. In-Memory Cluster Computing - Apache Spark

Readings
[W4R1] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica, "Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing," USENIX NSDI 2012, [Link]

Lecture Notes
2/11: [Link] updated on 2/13/2019
2/13: [Link] updated on 2/13/2019


Notes

Week 05 | Feb. 18 | Feb. 20

Topics
4. Real-time Streaming Computing Models: Apache Storm and Twitter Heron

Readings

[W5R1] Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M. Patel, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, Nikunj Bhagat, Sailesh Mittal, and Dmitriy Ryaboy. 2014. Storm@twitter. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD '14). ACM, New York, NY, USA, 147-156. DOI: https://doi.org/10.1145/2588555.2595641 [Link]

[W5R2] Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg, Sailesh Mittal, Jignesh M. Patel, Karthik Ramasamy, and Siddarth Taneja. 2015. Twitter Heron: Stream Processing at Scale. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15). ACM, New York, NY, USA, 239-250. DOI: https://doi.org/10.1145/2723372.2742788 [Link]

Lecture Notes
2/18: [Link] updated on 2/18/2019
2/20: TBA



Notes


Week 06 | Feb. 25 | Feb. 27

Topics
5. Scalable Distributed File Systems: Google File System I and II

Readings

[W6R1] Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung, "The Google File System," Proceedings of SOSP 2003: 29-43 [Link]

[W6R2] C.K.P. Clarke, Reed-Solomon Error Correction [Link]


Lecture Notes
2/25: TBA
2/27: TBA

Notes

 

Kernel I | Week 07 | Mar. 4 | Mar. 6

Topics
Kernel I: Advanced Big Data Analytics Case Study

Readings

Lecture Notes
3/4: TBA
3/6: TBA


Notes



Kernel I | Week 08 | Mar. 11 | Mar. 13

Topics
Kernel I: Advanced Big Data Analytics Case Study

Readings


Notes

3/13: Rapid Fire Presentation: Term Project Proposal

Week 09 | Mar. 18 | Mar. 20

Spring Break - No Class

Kernel II | Week 10 | Mar. 25 | Mar. 27

Topics
Kernel II: Scalable Computing Models

Readings

Cheng-Tao Chu, Sang Kyun Kim, Yi-An Lin, YuanYuan Yu, Gary Bradski, and Andrew Y. Ng, "Map-Reduce for Machine Learning on Multicore," NIPS 2006: 281-288, [Link]

Lecture Notes
3/25: TBA
3/27: TBA
 


Notes



Kernel II | Week 11 | Apr. 1 | Apr. 3

Topics
Kernel II: Scalable Computing Models

Framework for Real-time data stream analytics (3)
Scalable NoSQL storage systems (1)
- Distributed Hash Tables and Apache/Facebook Cassandra


Readings
Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan, "Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications," Proc. 2001 SIGCOMM, Aug. 2001, pp. 149-160 [Link]

Avinash Lakshman, Prashant Malik, "Cassandra: A Decentralized Structured Storage System," ACM SIGOPS Operating Systems Review, Vol. 44(2), April 2010, pp. 35-40 [Link]


Lecture Notes
4/1: TBA

Notes


Kernel III | Week 12 | Apr. 8 | Apr. 10

Topics
Kernel III: Large Scale Graph Analysis


Readings


Lecture Notes
4/8: TBA
4/10: TBA


Notes

Kernel III | Week 13 | Apr. 15 | Apr. 17

Topics
Kernel III: Large Scale Graph Analysis

Readings

Lecture Notes

4/17: TBA


Notes


Kernel IV | Week 14 | Apr. 22 | Apr. 24

Topics
Kernel IV: Scalable Data Storage, Retrieval, and Analytics


Lecture Notes
4/22: TBA
4/24: TBA

Notes


Kernel IV | Week 15 | Apr. 29 | May 1


Topics

TBA


Readings

TBA


Lecture Notes
4/29: TBA



Week 16 | May 6 | May 8

Topics
Term Project Presentation Session I and II

Notes
Schedule: TBA

