Logged in as: guest Log in

Announcements


  • December 10: (Tao) Attention. All dispute on (homework) grades needs to be addressed to the TA by this Friday.  Grades are released and finalized.
  • Decemebr 7: During your assigned period you can download your exam from the following link: Final
  • December 7: I have decided to release the partially graded PS5. Problems #2 and #5. worth a combined 32 pts, are still not graded.
  • EXAM TIMES: The final exam is on 12/7 from 12:00-3:00PM and will be held in SN014 and FB009.
    The Alternate Exam, for those with an official Examination Excuse, is 12/9 from 9:00AM-12:00PM in Sitterson 325.
  • December 3: A rough version of Problem Set #6 is available here. It requires that MongoDB is installed.
  • November 14: There is a new version of Problem Set #5. The changes are to Problem 5. You should redownload it.
  • November 7: Problem Set #5 is now on-line, and is due on 11/21.
  • November 5: Tao's office hour is moved to Wednesday (at the same time).
  • October 31:  We have fixed bugs and made small changes to Problem Set #4; download the new version.
  • October 29:  Problem Set #4 is now on-line. 
  • October 28: (Ziwei) - I need to end my office hour early today at 12 pm.
  • October 14: You can download the Midterm at the following link: Midterm
  • October 12: Monday's Midterm will be open-note and open-internet. The internet can be used to look-up information, but no messaging apps, shared documents, or other means of communication with another person are allowed. The exam itself will be downloaded as a Jupyter notebook and submitted like a problem set. 
  • October 10: If you are unable to take the Midterm at the assigned time of 10/14 from 6pm-8pm, then you must request an alternative examination time at this link. If your request is not allowed, you will get an email before 5pm on 10/11.
  • October 9: [Leonard] Something has come up and I need to move my office hours today from 3pm-5pm to 2pm-4pm. I am sorry for the late notice.
  • October 9: Do not use pandas or csv in problem set 3. Please write your own parser to read and parse the csv file. This is extremely easy using basic Python I/O as follows:
        with open(filename) as fp:
            data=fp.read().split('\n')
        header=data.pop(0).split(',')
        for line in data:
            row = line.split(',')    # doesn't properly handle commas in quotes, but it works for us
            # process row
    
    Where "filename" is a string variable. This also has several advantages over the Panda's based approach, which tries to infer a type for each column.   
  • October 8: If your problem 5 in Problem Set 3 asks to time the query to identify players with the same name who have played for the same team, your query may be truncated. The full query should be 
        SELECT *
        FROM Player P1, PlayedFor R1, Player P2, PlayedFor R2
        WHERE R1.pid=P1.pid AND R2.pid=P2.pid
        AND R1.tid=R2.tid AND R1.pid < R2.pid AND P1.name=P2.name
        GROUP BY P1.pid, P2.pid 
  • October 7: Jupyterhub maintenance 1300-1305. Ensure all work is saved. 
  • October 1: Leonard - Problem Set #3 is now on-line. It only has 6 problems, but 5 require writing short programs.
  • October 1: Ziwei - I will hold additional office hours today from 5 pm - 7 pm.
  • October 1: Boo - No office hours today due to GRE testing. I will be available after 5 pm by email for questions
  • September 26: Problem Set 2: When listing players, teams and games, please list player names for players, mascots for teams and all information in the game relation for games. 
  • September 25: Tao Tao has moved his 3:30-5:30pm Wednesday office hours to Fridays at the same time to minimze overlap. This change is effective immediately.
  • September 24: A new version of problem set #2 is now online that fixes several questions that appear to be asking for two SQL queries.
  • September 17: The second problem set is now available. It is due before the strike of midnight on 10/1. No late submissions will be allowed.
  • September 12: The Jupyterhub will be down briefly at 1:30pm for maintenance. Downtime ~5 min.  
  • September 3: The first problem set is now available. It is due before the strike of midnight on 9/17. No late submissions will be allowed.
  • August 20: First day of class and the course syllabus.

Course Description


Databases are an indispensable tool for managing information, and a course on the principles and practice of database systems is now an integral part of any computer science curricula. This course covers the fundamentals of modern database management systems, in particular relational database systems.

The material covered in Comp 521 can be broken into three areas of emphasis. The first area includes database foundation material such as the relational model, relational algebra, relational calculus, and normal forms. The second area of emphasis is database application programming and includes among others the topics of Structured Query Language (SQL), eXtensible Markup Language (XML), integrating databases into programs, web-based database usage, and triggers and active databases. The third area of emphasis is the systems side of databases, which includes database indexing, efficient query evaluation, the transaction-based model, concurrency, and security.

This course is suitable for computer science majors at both undergraduate and graduate levels. Students who wish to take this course should have some programming experience in a modern language and knowledge of data structures. There will be five homeworks, two mid-terms, and a final exam.

Book, Course Information, and Prerequisites


  Database Textbook

Designing Data-Intensive Applications
Textbook 1 (optional): Database Management Systems, Third edition
by Raghu Ramakrishnan and Johannes Gehrke
McGraw-Hill Higher Education © 2003, ISBN: 0072465638.
Textbook 2 (optional): Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems 1st Edition
by Martin Kleppmann
O'Reilly © 2017, ISBN: 1449373321.
Credit Hours: 3
Location: Murphey 116
Time: TTh 2:00pm-3:15PM
URL: http://csbio.unc.edu/mcmillan/?run=Courses.Comp521F19
Prerequisite: COMP 401, Comp 410,  Comp 411 or equivalents
 

Course Instructors



Leonard Instructor: Leonard McMillan
Office: SN316
email: mcmillan@cs.unc.edu


Tao Tao TA: Tao Tao
Office: SN337
email: ttao@cs.unc.edu


Ziwei Chen LA: Ziwei Chen
Office: SN 325
email: ziwei75@live.unc.edu


Boo Fullwood
LA: Boo Fullwood
Office: SN 325
email: iamboo@live.unc.edu

Office Hours:

Leonard McMillan: Wednesdays 3pm-5pm 
Tao Tao: Tue/Thu/Fri 3:30-5:30pm SN337
Ziwei: Mon 9am-1pm, Thu 11am-1pm in SN325
Boo: Tue 4pm-6pm, Thu 4pm-6pm in SN325

Schedule


Week 1:
T 8/20 Lecture 1: Introduction and Overview (pdf)  
Th 8/22 Lecture 2: Entities and Relations (pdf)  
Week 2:
T 8/27 Lecture 3: The Trouble with Files (pdf)  
Th 8/29 Lecture 4: The Relational Model (pdf)  
Week 3:
T 9/3 Lecture 5: SQL: Basic Queries (pdf) PS1
Th 9/5 Lecture 6: SQL: More Advanced Queries (pdf)  
Week 4:
T 9/10 Instructor out of town (no class)
Th 9/12 Lecture 7: SQL: Joins, Constraints & Triggers (pdf) PS2
Week 5:
T 9/17 Lecture 8: SQL: Modifications and Transactions (pdf) (PS1 due)
Th 9/19 Lecture 9: Exploring a Database (pdf)  
Week 6:
T 9/24 Lecture 10: Database Application Development (pdf)
Th 9/26 Lecture 11: Overview of Storage and Indexing (pdf)  PS3
Week 7:
T 10/1 Lecture 12: Storing and Buffering Data (pdf) PS2 due
Th 10/3 Lecture 13: Tree-Structured Indexes (pdf)  
Week 8:
T 10/8 Lecture 14:  Hash-Based Indexes (pdf)  
Th 10/10 Lecture 15: Overview of Query Evaluation (pdf)  (PS3 due)
Week 9:
M 10/14 Midterm (Covers Lectures 1-14 at 6pm-8pm in SN014 and FB007)
T 10/15 Lecture 16: External Sorting (pdf)
Th 10/17 Fall Break (no class)
Week 10:
T 10/22 Lecture 17: Schema Refinement and Normal Forms (pdf)  
Th 10/24 Lecture 18:  Schema Refinement and Normal Forms (cont)  
Week 11:
T 10/29 Lecture 19: Overview of Transaction Management (pdf)  PS4
Th 10/31 Lecture 20: Normalization Example (Lecture17) and
Transaction Mangement (cont).

Week 12:
T 11/5 Lecture 21: Database Crash Recovery (pdf)  
Th 11/7 Lecture 22: Intro to NoSQL databases (pdf)  PS5 (PS4 due)
Week 13:
T 11/12 Lecture 23: MapReduce for Big Data (pdf)
Th 11/14 Lecture 24: Hadoop Pig and Hive (pdf) PS6
Week 14:
T 11/19 Lecture 25: NoSQL Document Databases (pdf)  
Th 11/21 Lecture 26: NoSQL Columnar Databases (pdf) (PS5 due)
Week 15:
T 11/26 Lecture 27: NoSQL Graph Databases (pdf)  
Th 11/28 Thanksgiving (no class)
Week 16:
T 12/3 Lecture 28: Farewell to Databases (pdf)  (PS6 due)
S 12/7 Final Exam: 12pm-3pm

Course Resources

  1. A simple interactive SQL interpreter that can be embedded in a Jupyter Notebook. You will need to rename the downloaded file to "iSQL.py"
  2. The small Sailor database used for examples in class.
  3. AN even smaller, tiny version of the Sailor database used to demonstrate JOINS.

 

Individual Jupyter Notebooks

 It is recommended that, as an alternative to the class Jupyterhub, you have access to a Jupyter environment either locally or on another cloud service. The recommended path is to use the Azure Cloud Notebooks, but you can install anaconda locally and use that if you wish.

Azure Notebooks

This is the recommended alternative to the Jupyterhub. You will get a personal VM with 4GB of memory which should be plenty for this course.

  • Go to notebooks.azure.com
  • Sign in in the top right
  • Use your onyen@ad.unc.edu login with your onyen password
  • Select 'My Projects' in the top navigation bar
  • Select '+ New Project'
  • Select a name and hit create
  • Select your new project and hit 'Run on Free Cloud'
  • You should now be at a notebook homepage. This can now be used just like the hub.

Local with Anaconda

This completely avoids cloud environments, and gives some other useful python tools, but is somewhat more fiddly

  • Go tohttps://docs.anaconda.com/anaconda/install/
  • Follow the installation instructions for your operating system
  • Open the Navigator
  • Select Launch under Jupyter Notebook
  • A screen like the Jupyterhub should appear in your browser
  • Create a folder for the class

 



Site built using pyWeb version 1.10
© 2010 Leonard McMillan, Alex Jackson and UNC Computational Genetics