Department of Computer Science

CS 480-A1 Spring 2013
Principles of Data Management

Schedule
Note that this schedule is subject to change during the semester. Please check it every week.
SECTION A Internet-scale Data Storage: Data Lifecycle
Week 1
1/23, 1/25

Lectures
Title: Course Introduction and Introduction to Data Management

Title: Large-Scale Data Storage Systems
Topics: Storage Technology

Lecture Notes
[Download]

Readings
[Optional Text] Arie Shoshani and Doron Rotem, "Storage Technology", Scientific Data Management: Challenges, Technology, and Deployment, Chapman & Hall/CRC, 2010
Chapter 1. Storage Technology and Efficient Storage Access

Notes
Term Project Guideline [Download]
Phase 1 submission: due Jan. 28 by noon (extended to Jan. 31 by noon)

Week 2
1/28, 1/30, 2/1

Lectures
Title: Large-Scale Data Storage Systems (continued)
Topic: Data storage and access: GPFS, Lustre
Topic: Dynamic Storage Management: SRM, Case Study in Data Management in WLCG

Lecture Notes
[Download]

Readings
Frank B. Schmuck and Roger L. Haskin, "GPFS: A Shared-Disk File System for Large Computing Clusters", Proceedings of the USENIX Conference on File and Storage Technologies, pp. 231-244, 2002 [Link]

[Optional Text] Arie Shoshani and Doron Rotem, "Storage Technology", Scientific Data Management: Challenges, Technology, and Deployment, Chapman & Hall/CRC, 2010
Chapter 2. Parallel Data Storage and Access
Chapter 3. Dynamic Storage Management


Note
Programming Assignment 1 has been posted; due Feb. 21 by noon [Link]

Week 3
2/4, 2/6, 2/8

Lectures
GPFS (continued)
Large-Scale Data Storage Systems: Google File System
Topics: Architecture, Chunk Management, Metadata, Consistency Model, Leases and Mutation Order, Replication Management, Garbage Collection, and Fault-Tolerance Scheme

Lecture Notes
2/4 lecture 3-1 [Download]
2/6 lecture 3-2 [Download]
2/8 lecture 3-3 [Download]

Readings
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, "The Google File System", ACM SIGOPS Operating Systems Review, 2003 [Download]

References
Apache Hadoop project: open-source software whose distributed file system (HDFS) is modeled on the Google File System [Link]
Tom White, Hadoop: The Definitive Guide, O'Reilly Media, 2012, ISBN-10: 1449311520, ISBN-13: 978-1449311520 [Link]

Week 4
2/11, 2/13, 2/15

Lectures
Topics: Google File System (continued)

Lecture Notes
2/11 lecture 4-1 [Download]
2/13 lecture 4-2 [Download]
2/15 lecture 4-3 [Download]

Readings
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, "The Google File System", ACM SIGOPS Operating Systems Review, 2003 [Download]

Notes
Term Project: Phase 2 due on 2/11 by noon
Term Project Phase 2 interview: Please check the schedule posted on RamCT!

Week 5
2/18, 2/20, 2/22

Lectures
Title: Data Cleaning: Duplicate Detection-1
Topics: Introduction to Data Cleaning, Similarity Functions

Lecture Notes
2/18 lecture 5-1 [Download]
2/20 lecture 5-2 [Download]
2/22 lecture 5-3 [Download]

Readings
Felix Naumann and Melanie Herschel, An Introduction to Duplicate Detection [Download]


Week 6
2/25, 2/27, 3/1

Lectures
Title: Data Cleaning: Duplicate Detection-2
Topics: Duplicate Detection Algorithms, Evaluating Detection Success

Lecture Notes
2/25 lecture 6-1 [Download]
2/27 lecture 6-2 [Download]
3/01 lecture 6-3 [Download]

Readings
Felix Naumann and Melanie Herschel, An Introduction to Duplicate Detection [Download]


SECTION B Data Representation, Models and Metadata
Week 7
3/4, 3/6, 3/8

Midterm
Review for Midterm (3/4)
Midterm (3/6) in class
Lectures
Title: Semistructured data and XML - 1
Topics: Bisimulation, Regular Path Expressions, Query Languages for the Semistructured Data Model, Structural Recursion

Lecture Notes
3/04 lecture 7-1 (review for midterm) [Download]
3/08 lecture 7-3 [Download]

Readings
[Optional Text] Serge Abiteboul, Peter Buneman, and Dan Suciu, "Data on the Web: From Relations to Semistructured Data and XML," Morgan Kaufmann Series in Data Management Systems

XML Query Data Model [Link]

Notes
Assignment #2 due on 4/2 by noon [Download: description] [Download: sample data]

Week 8
3/11, 3/13, 3/15

Lectures
Title: Semistructured data and XML - 2
Topics: Schemas, XML Schema, Query Analysis, Indexing and Compressing Semistructured Data

Lecture Notes
3/11 lecture 8-1 [Download]
3/13 lecture 8-2 [Download]
3/15 lecture 8-3 [Download]

Readings
[Optional Text] Serge Abiteboul, Peter Buneman, and Dan Suciu, "Data on the Web: From Relations to Semistructured Data and XML," Morgan Kaufmann Series in Data Management Systems

Note
Term Project Phase 3 report due on 3/11 by noon (submit via RamCT).

Week 9
3/18, 3/20, 3/22

Spring Break

Week 10
3/25, 3/27, 3/29

Lectures
Title: Semistructured data and XML - 3
Topics: (continued)

Title: Data about Data: Metadata
Topics: Dublin Core; Metadata for Digital Libraries, Multimedia, and Geospatial Data; Interoperability and Exchange of Metadata

Lecture Notes
3/25, 3/27 lecture 10-1 [Download]
3/29 lecture 10-2 [Download]

Readings
[Optional Text] Serge Abiteboul, Peter Buneman, and Dan Suciu, "Data on the Web: From Relations to Semistructured Data and XML," Morgan Kaufmann Series in Data Management Systems


Week 11
4/1, 4/3, 4/5

Lectures

Title: Metadata Models and Exchange
Topics: SOAP, RESTful Web Services

Lecture Notes
4/1 lecture 11-1 [Download]
4/3 lecture 11-2 [Download]
4/5 lecture 11-3 [Download]

Notes
Assignment #2 due on 4/2 by noon

SECTION C Understanding Big Data: Analytics

Week 12
4/8, 4/10, 4/12

Lectures

SOAP, RESTful Web Services
Cloud Data Analytics

Lecture Notes
4/8 lecture 12-1 [Download]
4/10 lecture 12-2 [Download]
4/12 lecture 12-3 [Download]

Readings
[Optional Text] Alan Gates, Programming Pig, O'Reilly, 2011 [Link to the Open Feedback Publishing System]

Notes
Help session for Assignment #3: 1:00-2:00 PM (CSB 325)

Week 13
4/15, 4/17, 4/19

Lectures
Hadoop's Pig Latin: Data flow management

Lecture Notes
4/15 lecture 13-1 [Download]
4/17 lecture 13-2 [Download]
4/19 lecture 13-3 [Download]

Readings
[Optional Text] Alan Gates, Programming Pig, O'Reilly, 2011 [Link to the Open Feedback Publishing System]

Notes
Term Project Phase 4 due on 4/16 by noon (extended to 4/22 by noon)

Week 14
4/22, 4/24, 4/26

Lectures
Hadoop's Pig Latin: Data flow management (Continued)
Data Ontology and Semantic Web-1

Lecture Notes
4/22 lecture 14-1 [Download]
4/24 lecture 14-2 [Download]
4/26 lecture 14-3 [Download]

Readings
[Optional Text] Dean Allemang and Jim Hendler, Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL, Morgan Kaufmann, 2011, ISBN 978-0-12-385965-5
[Link to the web page for Examples]

Notes
Assignment #3 due on 4/22 by noon [Download]

Week 15
4/29, 5/1, 5/3

Lectures
Data Ontology and Semantic Web-2
Term Project Presentation

Lecture Notes
4/29 lecture 15-1 [Download]
5/1 lecture 15-2 [Download]

5/3 Term Project Presentation
Matthew (Matt) Stobber and Nicholas Williams
Daniel Hallworth and Drew Hopkins
Johannes Paraan, Nate Prewitt, and Charlie Wahlquist

Readings
[Optional Text] Dean Allemang and Jim Hendler, Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL, Morgan Kaufmann, 2011, ISBN 978-0-12-385965-5
[Link to the web page for Examples]

Week 16
5/6, 5/8, 5/11


Schedule
5/6 Term Project Presentation
Annika Muelbradt and Cameron Tolooee
Spencer Hale and Chris Millard
Jared Koontz, Zach Mcgaughey, and Russell Geroche

5/8 Term Project Presentation
Chris Huval and Subhojeet Mukherjee
Danielle Alexander and Nathan Nuber

5/11 Review for the Final Exam [Download]

Notes
Assignment #4 due on 5/7 by noon [Download] (extended to 5/11, 5:00 PM)

Final Exam

May 13, 2013, 2:00-4:00 PM (CSU Spring 2013 Final Exam Schedule)