Computer Science Department

Big Data: Schedule CS535
Spring 2020
Note that this schedule may change during the semester. Please check it every week.

Week 01 | Jan. 22

Topics

1. Introduction to Big Data
Course Introduction


Readings
[W1R1] S. Keshav, "How to Read a Paper" [Link]
[W1R2] "How to Read and Understand a Scientific Paper: A Step-by-Step Guide for Non-Scientists" [Link]

Lecture Notes
1/22:
2 slides per page [Download]
6 slides per page [Download]

Notes
1/20 University Holiday (No class)
1/24 End Restricted Drop
1/26 End Add Without Override
2/05 Registration Closes: Last day for dropping courses without record entry, changes in grade option, and tuition and fee adjustment
3/14 Spring Break
3/23 Classes Resume
3/23 End Course Withdrawal ("W") Period
3/23 Repeat/Delete Deadline
5/08 Last Day of Classes: University Withdrawal Deadline

CSU Academic Calendar 2019-20 [Link]

Week 02 | Jan. 27 | Jan. 29

Topics
2. Data processing paradigms for Big Data
3. Distributed Computing Models for Scalable Batch Computing
Part 1. MapReduce
Part 2. In-Memory Cluster Computing: Apache Spark

Readings
[W2R1] Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," In Proceedings of the 6th Symposium on Operating Systems Design & Implementation (OSDI '04), Vol. 6. [Link]

Lecture Notes
1/27:
2 slides per page [Download]
6 slides per page [Download]


1/29:
2 slides per page [Download]
6 slides per page [Download]


Notes


2/05 Registration Closes: Last day for dropping courses without record entry, changes in grade option, and tuition and fee adjustment
3/14 Spring Break
3/23 Classes Resume
3/23 End Course Withdrawal ("W") Period
3/23 Repeat/Delete Deadline
5/08 Last Day of Classes: University Withdrawal Deadline

CSU Academic Calendar 2019-20 [Link]


Week 03 | Feb. 03 | Feb. 05

Topics
3. Distributed Computing Models for Scalable Batch Computing: Part 2. In-Memory Cluster Computing: Apache Spark

Readings
[W3R1] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica, "Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing," In Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI '12), 2012. [Link]

Lecture Notes
2/3:
2 slides per page [Download]
6 slides per page [Download]


2/5:
2 slides per page [Download]
6 slides per page [Download]

Notes



Week 04 | Feb. 10 | Feb. 12

Topics
3. Distributed Computing Models for Scalable Batch Computing: Part 2. In-Memory Cluster Computing: Apache Spark
4. Real-time Streaming Computing Models: Apache Storm and Twitter Heron

Readings

[W4R1] Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M. Patel, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, Nikunj Bhagat, Sailesh Mittal, and Dmitriy Ryaboy. 2014. Storm@twitter. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD '14). ACM, New York, NY, USA, 147-156. DOI: https://doi.org/10.1145/2588555.2595641 [Link]

[W4R2] Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg, Sailesh Mittal, Jignesh M. Patel, Karthik Ramasamy, and Siddarth Taneja. 2015. Twitter Heron: Stream Processing at Scale. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15). ACM, New York, NY, USA, 239-250. DOI: https://doi.org/10.1145/2723372.2742788 [Link]

Lecture Notes
2/10:
2 slides per page [Download]
6 slides per page [Download]

2/12:

2 slides per page [Download]
6 slides per page [Download]



Notes

[GEAR Session I] Week 05 | Feb. 17 | Feb. 19

Guided Exploration for Big Data Analytics Research (GEAR) [Visit the GEAR Session page]
The CS535 Guided Exploration for Big Data Analytics Research (GEAR) Sessions are designed to provide a guided learning environment for advanced topics in Big Data analytics research. GEAR requires active participation from students. Lectures on the fundamental concepts of the targeted topic take up to 75% of class time; about 25% is devoted to student-led research discussions, in which students provide critical reviews of cutting-edge research papers. These discussions give students an opportunity to apply the knowledge and concepts covered in the lectures to real-world problems and to explore future research directions.

Topics

GEAR Session I: Peta-scale Storage Systems
5. Scalable Distributed File Systems: Google File System I and II


Readings (for lectures)
[GEAR-I-1] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, "The Google File System," In Proceedings of SOSP 2003, pp. 29-43 [Link]

[GEAR-I-2] C.K.P. Clarke, "Reed-Solomon Error Correction" [Link]

Lecture Notes
2/17:
2 slides per page [Download]
6 slides per page [Download]

2/19:

2 slides per page [Download]
6 slides per page [Download]


Notes


[GEAR Session I] Week 06 | Feb. 24 | Feb. 26

Topics

GEAR Session I: Peta-scale Storage Systems
6. Scalable NoSQL Systems: DHT-based key-value storage

Readings (for lectures)
[GEAR-I-3] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. 2001. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications (SIGCOMM '01). ACM, New York, NY, USA, 149-160. DOI: https://doi.org/10.1145/383059.383071

[GEAR-I-4] Avinash Lakshman and Prashant Malik. 2010. Cassandra: A Decentralized Structured Storage System. ACM SIGOPS Operating Systems Review 44(2), 35-40. DOI: https://doi.org/10.1145/1773912.1773922

Lecture Notes
2/24:
2 slides per page [Download]
6 slides per page [Download]


2/26: Please submit your slides via Canvas at least 2 hours before the presentation.

Notes

 

[GEAR Session II] Week 07 | Mar. 02 | Mar. 04

Topics
GEAR Session II: Machine Learning for Big Data
7. Large-scale Clustering
8. Deep Learning

Readings (for lectures)

[GEAR-II-1] Bahmani, B., Moseley, B., Vattani, A., Kumar, R. and Vassilvitskii, S., 2012. Scalable k-means++. arXiv preprint arXiv:1203.6402.

[GEAR-II-2] Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc’Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, and Andrew Y. Ng, "Large Scale Distributed Deep Networks," In Advances in Neural Information Processing Systems (NIPS), 2012.

[GEAR-II-3] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng, "TensorFlow: A System for Large-Scale Machine Learning," In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI '16), 2016.

Lecture Notes
3/02:
2 slides per page [Download]
6 slides per page [Download]


3/04:
2 slides per page [Download]
6 slides per page [Download]


Notes



[GEAR Session II] Week 08 | Mar. 09 | Mar. 11

Topics
GEAR Session II: Machine Learning for Big Data

8. Deep Learning with TensorFlow

Readings (for lectures)
[GEAR-II-1] Bahmani, B., Moseley, B., Vattani, A., Kumar, R. and Vassilvitskii, S., 2012. Scalable k-means++. arXiv preprint arXiv:1203.6402.

[GEAR-II-2] Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc’Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, and Andrew Y. Ng, "Large Scale Distributed Deep Networks," In Advances in Neural Information Processing Systems (NIPS), 2012.

[GEAR-II-3] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng, "TensorFlow: A System for Large-Scale Machine Learning," In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI '16), 2016.

Lecture Notes
3/09:
2 slides per page [Download]
6 slides per page [Download]

Notes
3/14 Spring Break
3/23 Classes Resume
3/23 End Course Withdrawal ("W") Period
3/23 Repeat/Delete Deadline
5/08 Last Day of Classes: University Withdrawal Deadline

CSU Academic Calendar 2019-20 [Link]

 

[GEAR Session II] Week 09 | Mar. 23 | Mar. 25

Topics

GEAR Session II: Machine Learning for Big Data
9. Deep Learning with PyTorch

Readings (for lectures)

[GEAR-II-4] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L. and Desmaison, A., 2019. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems (pp. 8024-8035)

Lecture Notes
3/23: No Class
3/25:
2 slides per page [Download]
6 slides per page [Download]

Notes
3/14 Spring Break
3/25 Classes Resume (rescheduled from 3/23)


CSU Academic Calendar 2019-20 [Link]


[GEAR Session III] Week 10 | Mar. 30 | Apr. 01

Topics
GEAR Session III: Big Graph Analysis

Readings (for lectures)

[GEAR-III-1] Grzegorz Malewicz, Matthew H. Austern, Aart J.C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski, “Pregel: A System for Large-Scale Graph Processing”, In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pp. 135-146.

[GEAR-III-2] Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, and Ion Stoica. 2014. GraphX: graph processing in a distributed dataflow framework. In Proceedings of the 11th USENIX conference on Operating Systems Design and Implementation (OSDI'14). USENIX Association, Berkeley, CA, USA, 599-613.

Lecture Notes
3/30:
2 slides per page [Download]
6 slides per page [Download]

4/1: Postponed to 4/6

Notes


[GEAR Session IV] Week 11 | Apr. 06 | Apr. 08

Topics
GEAR Session IV: Large Scale Recommendation Systems and Social Media

Readings (for lectures)

[GEAR-III-2] Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, and Ion Stoica. 2014. GraphX: graph processing in a distributed dataflow framework. In Proceedings of the 11th USENIX conference on Operating Systems Design and Implementation (OSDI'14). USENIX Association, Berkeley, CA, USA, 599-613.

Lecture Notes
4/06: Vertex-cut Graph Processing Model: GraphX
2 slides per page [Download]
6 slides per page [Download]

4/08: Large Scale Recommendation Systems
TBA

 

[GEAR Session IV] Week 12 | Apr. 13 | Apr. 15

Topics
GEAR Session IV: Large Scale Recommendation Systems and Social Media

Readings (for lectures)

TBA

Lecture Notes
4/13: TBA

[GEAR Session V] Week 13 | Apr. 20 | Apr. 22

Topics
GEAR Session V: Algorithmic Techniques for Big Data

Readings (for lectures)

TBA

Lecture Notes
4/20: TBA
4/22: TBA

 

[GEAR Session V] Week 14 | Apr. 27 | Apr. 29

Topics
GEAR Session V: Algorithmic Techniques for Big Data

Readings (for lectures)

TBA

Lecture Notes
4/27: TBA

[Presentation Week] Week 15 | May 04 | May 06

Schedule
TBA