| Note that this schedule is subject to change during the semester. Please check it every week. |
| SECTION A |
Internet-scale Data Storage: Data Lifecycle |
Week 1
1/23, 1/25 |
Lectures
Title: Course Introduction and Introduction to data management
Title: Large Scale Data Storage System
Topics: Storage Technology
Lecture Notes
[Download]
Readings
[Optional Text] Arie Shoshani and Doron Rotem,
"Storage Technology", Scientific Data Management: Challenges, Technology, and Deployment, Chapman & Hall/CRC, 2010
Chapter 1. Storage Technology and Efficient Storage Access
Notes
Term Project Guideline [Download]
Phase 1 submission: Due on Jan. 28 by noon --> Extended to Jan. 31 by noon
|
Week 2
1/28, 1/30, 2/1
|
Lectures
Title: Large Scale Data Storage System - Continued
Topic: Data storage and access: GPFS, Lustre
Topic: Dynamic Storage Management: SRM, Case Study in Data Management in WLCG
Lecture Notes
[Download]
Readings
Frank B. Schmuck and Roger L. Haskin, "GPFS: A Shared-Disk File System for Large Computing Clusters", Proceedings of the USENIX Conference on File and Storage Technologies, pp. 231-244, 2002 [Link]
[Optional Text] Arie Shoshani and Doron Rotem,
"Storage Technology", Scientific Data Management: Challenges, Technology, and Deployment, Chapman & Hall/CRC, 2010
Chapter 2. Parallel Data Storage and Access
Chapter 3. Dynamic Storage Management
Note
Programming Assignment 1 has been posted. Due on Feb. 21 by noon [Link]
|
Week 3
2/4, 2/6, 2/8
|
Lectures
GPFS - continued
Large Scale Data Storage System: Google File System
Topics: Architecture, Chunk Management, Metadata, Consistency Model, Leases and Mutation Order, Replication Management, Garbage Collection, and Fault-Tolerance Scheme
Lecture Notes
2/4 lecture 3-1 [Download]
2/6 lecture 3-2 [Download]
2/8 lecture 3-3
[Download]
Reading
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, "The Google File System", ACM SIGOPS Operating Systems Review, 2003 [Download]
References
Apache Hadoop project: includes HDFS, an open source file system modeled on the Google File System [Link]
Tom White, Hadoop: The Definitive Guide, O'Reilly Media, 2012, ISBN-10: 1449311520, ISBN-13: 978-1449311520 [Link]
|
| Week 4 2/11, 2/13, 2/15 |
Lectures
Topics: Google File System continued
Lecture Notes
2/11 lecture 4-1
[Download]
2/13 lecture 4-2
[Download]
2/15 lecture 4-3
[Download]
Readings
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, "The Google File System", ACM SIGOPS Operating Systems Review, 2003 [Download]
Notes
Term Project: Phase 2 due on 2/11 by noon
Term Project Phase 2 interview: Please check the schedule posted on RamCT!
|
Week 5
2/18, 2/20, 2/22 |
Lectures
Title: Data Cleaning: Duplicate Detection-1
Topics: Introduction to Data Cleaning, Similarity Functions
Lecture Notes
2/18 lecture 5-1
[Download]
2/20 lecture 5-2
[Download]
2/22 lecture 5-3
[Download]
Readings
An Introduction to Duplicate Detection, Felix Naumann and Melanie Herschel [Download]
|
Week 6
2/25, 2/27, 3/1
|
Lectures
Title: Data Cleaning: Duplicate Detection-2
Topics: Duplicate Detection Algorithms, Evaluating Detection Success
Lecture Notes
2/25 lecture 6-1
[Download]
2/27 lecture 6-2
[Download]
3/01 lecture 6-3
[Download]
Readings
An Introduction to Duplicate Detection, Felix Naumann and Melanie Herschel [Download]
|
| SECTION B |
Data Representation, Models and Metadata |
Week 7
3/4, 3/6, 3/8 |
Midterm
Review for Midterm (3/4)
Midterm (3/6) in class
Lectures
Title: Semistructured data and XML - 1
Topics: Bisimulation, Regular Path Expressions, Query Languages for the Semistructured Data Model, Structural Recursion
Lecture Notes
3/04 lecture 7-1 (review for midterm)
[Download]
3/08 lecture 7-3 [Download]
Readings
[Optional Text] Serge Abiteboul, Peter Buneman, and Dan Suciu, "Data on the Web: From Relations to Semistructured Data and XML,"
Morgan Kaufmann Series in Data Management Systems
XML Query Data Model, [Link]
Notes
Assignment #2 due on 4/2 by noon [Download: description] [Download: sample data]
|
Week 8
3/11, 3/13, 3/15 |
Lectures
Title: Semistructured data and XML - 2
Topics: Schema, XML Schema, Query Analysis, Indexing and Compressing Semistructured Data
Lecture Notes
3/11 lecture 8-1 [Download]
3/13 lecture 8-2 [Download]
3/15 lecture 8-3 [Download]
Readings
[Optional Text] Serge Abiteboul, Peter Buneman, and Dan Suciu, "Data on the Web: From Relations to Semistructured Data and XML,"
Morgan Kaufmann Series in Data Management Systems
Note
Term Project Phase 3 report due on 3/11 by noon. (via RamCT)
|
Week 9
3/18, 3/20, 3/22 |
Spring Break
|
Week 10
3/25, 3/27, 3/29 |
Lectures
Title: Semistructured data and XML - 3
Topics: continued
Title: Data about Data: Metadata
Topics:
Dublin Core, metadata for Digital Libraries, Multimedia, Geospatial data, and Interoperability and Exchange of Metadata
Lecture Notes
3/25, 3/27 lecture 10-1 [Download]
3/29 lecture 10-2 [Download]
Readings
[Optional Text] Serge Abiteboul, Peter Buneman, and Dan Suciu, "Data on the Web: From Relations to Semistructured Data and XML,"
Morgan Kaufmann Series in Data Management Systems
|
Week 11
4/1, 4/3, 4/5 |
Lectures
Title: Metadata models and exchange
Topics: SOAP, RESTful Web Services
Lecture Notes
4/1 lecture 11-1 [Download]
4/3 lecture 11-2 [Download]
4/5 lecture 11-3 [Download]
Notes
Assignment #2 due on 4/2 by noon
|
| SECTION C |
Understanding Big Data: Analytics |
Week 12
4/8, 4/10, 4/12 |
Lectures
SOAP, RESTful Web Services
Cloud Data Analytics
Lecture Notes
4/8 lecture 12-1 [Download]
4/10 lecture 12-2 [Download]
4/12 lecture 12-3 [Download]
Readings
[Optional Text] Alan Gates, Programming Pig, O'Reilly, 2011, [Link to the Open Feedback Publishing System]
Notes
Help session for Assignment #3: 1:00 ~ 2:00 PM (CSB 325)
|
Week 13
4/15, 4/17, 4/19 |
Lectures
Hadoop's Pig Latin: Data flow management
Lecture Notes
4/15 lecture 13-1 [Download]
4/17 lecture 13-2 [Download]
4/19 lecture 13-3 [Download]
Readings
[Optional Text] Alan Gates, Programming Pig, O'Reilly, 2011, [Link to the Open Feedback Publishing System]
Notes
Term project phase 4 due on 4/16 by noon --> extended to 4/22 by noon
|
Week 14
4/22, 4/24, 4/26 |
Lectures
Hadoop's Pig Latin: Data flow management (Continued)
Data Ontology and Semantic Web-1
Lecture Notes
4/22 lecture 14-1 [Download]
4/24 lecture 14-2 [Download]
4/26 lecture 14-3 [Download]
Readings
[Optional Text] Dean Allemang and Jim Hendler, Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL, Morgan Kaufmann, 2011, ISBN 978-0-12-385965-5
[Link to the web page for Examples]
Notes
Assignment #3 due on 4/22 by noon [Download]
|
Week 15
4/29, 5/1, 5/3
|
Lectures
Data Ontology and Semantic Web-2
Term Project Presentation
Lecture Notes
4/29 lecture 15-1 [Download]
5/1 lecture 15-2 [Download]
5/3 Term Project Presentation
Matthew (Matt) Stobber and Nicholas Williams
Daniel Hallworth and Drew Hopkins
Johannes Paraan, Nate Prewitt, and Charlie Wahlquist
Readings
[Optional Text] Dean Allemang and Jim Hendler, Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL, Morgan Kaufmann, 2011, ISBN 978-0-12-385965-5
[Link to the web page for Examples] |
Week 16
5/6, 5/8, 5/11
|
Schedules
5/6 Term Project Presentation
Annika Muelbradt and Cameron Tolooee
Spencer Hale and Chris Millard
Jared Koontz, Zach Mcgaughey, and Russell Geroche
5/8 Term Project Presentation
Chris Huval and Subhojeet Mukherjee
Danielle Alexander and Nathan Nuber
5/11 Review for the Final Exam [Download]
Notes
Assignment #4 due on 5/7 by noon [Download] --> extended to 5/11 (5:00 PM)
|
Final Exam |
May 13, 2013, 2:00 ~ 4:00 PM CSU Spring 2013 Final exam schedule |