Logged in as: guest Log in

Announcements [Lecture Zoom Link] [Office Hours Zoom Link]


  • November 19 (1:27pm): If your problem 12 refers to ""State Unallocated" it should have instead said ""Statewide Unallocated". 
  • November 19 (1:08pm): If your problem 11 refers to the population of counties the query should be for 1 million, not 10 million. BTW, the intent here is to find how many urban counties are in each state.
  • November 19 (12:45pm): On some versions of problem 5 there is a reference to "Covid19" in the first sentance that should be to "Demographics" instead. This change should make the figure and text consistent throughout the remainder of the prompt. Moreover, in the answer cell there is a reference to "CovidDeaths" that should be cahnaged to "DemographicData" AND you should assume that the UPDATE command mentioned in the question is inserted before the second INSERT command.
  • November 19 (12:29pm): On problem 4, if your question makes reference to two differ values for the number of page buffers (i.e. 9 and 7, or 33 and 25) use the first number, (9 and 33 in this case).
  • November 19 (12:19pm): On problem 7 begin the steps of your derivation on the line following the prompt. Allow one line per step. On problem 8 the tuple mentioned in the answer cell might not match the question. Your answer should use the tuple from the question, not the one in the answer cell.
  • November 19: Starting at noon today the Final Exam will be accessible. You do not need to be on Zoom during the exam. However. I will be available on Zoom from 11:30 until the end of the exam on the following Exam Zoom Link. I will essentially be using the Zoom meeting as a waiting room from which I will shuttle students into and out of meeting rooms. During this time, I will priortize people having problems downloading the exam, getting it to work as a Jupyter notebook, helping with broken environments, and submitting it. I will consider requests for clarifications, but I suggest that you first consider the following.

         1) If a problem seems tricky, you are probably over analyzing it. It was my intention to make every problem straightforward.
         2) If you think that information is missing from a problem's prompt. Make a reasonable assumption and insert it as a 
             comment (prefixed with a '#') before the problem's prompt. For example.
                               "# I assume that the query meant to specify the year=2020"
         3) If you would like to submit an explanation of your answer for consideration in determining partial credit, enter it
             on separate lines after your answer. For example:

    # I assume the professor has lost his mind
    Enter the answer to the ultimate question: 42
    My work. (2 sections * (2020 crazy year + 521 hard class)) / 121 angel number
         4) If there are any significant clarifications or changes to the exam, I will post them here on this website during the exam.


  • November 16: I have decided to use tomorrow's class as a Final Review session to allow everyone to attend. 
  • November 15: The grades for PS4 are now posted.
  • November 12: Grades for PS3 should now be online and you should now be able to submit PS5.
  • November 11: An updated version of Problem set #5 is now available. There is a change to question 4 (increases the number of cases from 1000 to 3500) and clarifications of the expected output for some questions has been added. Most important, there is a new version of covid19.json with fixes to the deaths sub-documents. You will need to download it, remove all documents from your counties collection, and then reload it.
  • November 5:  Problem set #5 is now available. Remember that I will drop your problem set with the lowest score.
  • November 3: I have decided to delay the due date for Problem set #4 until Thursday 11/5, thus allowing your to focus and stress over other things tonight. 
  • October 29: An updated version of Problem set #4 is now available. Download it, and transfer your answers to the new copy before submitting.
  • October 20: Problem set #4 is now available, however, I am still tweaking problem 7 8, so expect a second version sometime later in the week.
  • October 15: The midterm grades are now online under your setup menu. I will discuss the grade disttribution in class. Many of you will notice problems that fall into one of three categories. Some of you submitted either empty or nearly empty midterms, when you view your exam you will see than many of your answers are "** No answer **". This was likely due to your uploading an unfinished version of the exam rather than the one you intended. If you fall into this case, send me an email with "Comp521 Midterm empty answer cells" in the subject line. I will deal with these cases one at a time. There is second group of students who submitted a midterm, and see nothing at all (no problems, solutions nor answers) and a grade of 0.0. This is most likely due to your entering an invalid PID. I have notified via email everyone that I think falls in this case. I have fixed a few, but I still have more in the queue. It some cases this is due to a previous Problem set submission which highjacked you true PID. Lastly, there is a third group of people who responded to last lecture survey on the power/internet outage impact. For those people, I am currently holding on to any resubmission that I recieveed from you, unless I had no previous version of your midterm. Handling the replacement case is tricky. Moreover, it was also unclear to me, via breif inspections, if there was any significant difference between the two.  
  • October 13: Please take time to fill in this web survey.
  • October 7: I have changed the due date for Problem Set #3 to Tuesday 10/13 to allow everyone to recover from the midterm.
  • October 6: The midterm can be downloaded from here when it is available. You will have from 3:00pm-4:20pm to take the exam. Your final version must be uploaded before 4:20pm. Please check that the version of the Midtem that you submit is, in fact, the one with your answers and not the empty one that you downloaded. 
  • October 6: While testing the midterm late last night, I inadvertently cleared the graded flag for the first two problem sets. This made it appear as if the problem sets were not graded (the buttons with your scores were missing from your Setup pages). I have reenabled the flags now, and things should be back to normal.
  • October 2: All properly formatted problems sets (Jupyter notebook .ipynb files) are now loaded into the grading system. You must have been logged-in at the time of submission if you did not fill in the signature cell. If you submitted as a guest and you did not fill in the signature block, and you did not enter your Onyen on the submission page, your problem set is lost. Any remaining issues are caused by either modifications to the answer cells in the version submitted, or submission of the wrong notebook file. Also, student in Prof. Bishop's section should now be able to see the questions and solutions to problem set #1, even if they did not submit it. I still have not incorporated the grades from the first homework yet. 
  • October 1. Today's lecture will be replaced by a Midterm review session. It will not be recorded.
  • October 1: Problem sets 1 and 2 are now graded. You can see your score on the setup page if logged into the course website. If you click the button with your score, you should can see how each of your answers was scored. Some of you might see no grade for one or all of your problems. This is generally due to either a submission problem (ex. you submitted an empty copy of your problem set rather than the one with your answers) or you deleted an answer cell and placed your answer where it was not found in the grading system. If you think either of these issues apply to you, you should make a appointment to discuss it with TA Boo Fullwood during one of his office hours. You should show him, via zoom evidence that you either submitted the wrong file, or put your answer in the wrong cell. He will look over your case and collect any version of the problem sets for possible regrading. I still have not brought over all the grades for Prof. Bishop's section yet. I am also still working on a way for those in Prof. Bishop's section to see the answers for Problem Set #1.
  • September 30: A new version 1.1 of Problem Set #3 is now online. I have tried to clarify several of the original questions. Please download a copy of this notebook, and transfer any answers to it.
  • September 24: I have rescheduled the midterm exam for 10/6 so that it does not overlap the Grace Hopper Conference.
  • September 23: Problem Set #3 is now online.
  • September 22: I have decided to extend the deadline for Problem Set #2 by an addtional 24 hrs. It is now due before midnight on 9/23.
  • September 20: Some clarifications on Problem Set #2:
    Ratio's and percentages should be output as float values. You should should also be aware that, by default, our DBMS uses integer arithmetic for expressions involving only integer arguments, bu a mixed expression of floats and ints is promoted to float. Thus, if you are computing a ratio you should use some variant of:
        (1.0 * subsetCount)/totalCount
    and for a percentage:
        (100.0 * subsetCount)/totalCount
    You will not lose credit for omitting a "%" or for including too many significant digits. Integer truncation, however, may lead to a small deduction.

    For Problem 11, the question should say "65 or over" not "over 65". 

  • September 16: I have made a few small changes and clarifications to Problem Set #2. You should download it again and transfer your answers from the old version 1.0 to the new one 1.1. Also, Kat and Boo will be leading a special recitation on 9/17 from 6pm-7pm for students from section 1. They plan to go over the slides and highlight all siginificant differences in the first 6 lectures prior to the section merge. [Recitation Zoom Link]
  • September 15: I have a few major annoucnements. First, I am delaying the due and issue date for problem sets #2 and #3 respectively to 9/22 to give folks from section 001 more time, and to allow me to incorporate more scalabled appraches into grading infrastructure. Second, I am redistributing my office hours over two days W, Th from 11am-noon rather a single block on W from 10am-noon. This will allow me to hold an organizational meeting with the TAs, and I hope it provides more access. I still have not yet garnered control of all the course infrastructure used in section 001. I hope to do so this week.   
  • September 9: I messed up and lost track of my office hours today. Actually, it was a side effect of being triple booked. I plan to mke up for these missed hours early next week. Stay tuned for updates. 
  • September 8: Welcome everyone attending from Comp521-001. We will discuss what is going on, and various senarios during today's class.
  • September 3: Problem Set #2 is now online.
  • September 2: There are two clarifications and fixes for Problem Set #1. On Problem #8, you should also ignore cases where the number of beds in a hosptial is reported as "0" when finding the cases-to-bed ratios. BTW, the zero beds are not usually true, it often means that the hospital did not report a number, as this data was scraped from sources where the hospitals elected to provide information. On Problem 10 there should be a fourth outcome, "confirmed" in addition to the 3 given (death, hospitalized, and recovered).  The point of Problem 10 is to imagine that the primary COVID-19 data was provided in a different form. You should address how you would format this new data in a table, AND how you would provide backward compatibility to the old tables developed in Problems 1-5. Lastly, you should not assume that the new "case-by-case" records mentioned in Problem 10 are identifiable (i.e. don't assume a primary key like a social security number). Instead, you should assume that information about each case reports only the county, sex, race, and date and type (confirmed, death, hospitalized, or recovered). Note: that on a given day there can be many cases from the same county and demographic group, thus, it may not be possible to create a primary key for this relation.
  • August 28: Just like in the real world, there are misspellings and other inconsistencies in the data files you were given for Problem Set #1. You might need to consider this in your answers. The correct answer is one where the data has been cleaned.
  • August 27: I am changing the due date for Problem Set #1 to 9/8 to allow the new cohort from section 001 to catch up.
  • August 26: Here are a few clarifications with regard to Problem Set #1. Notice that the case counts in the files "NCCOVIDConfirmed.csv" and "NCCOVIDDeaths.csv" are cumulative, and my expectation is that your COVID19 table in Problem 3 will include daily rather than cumulative counts. This has no real impact on the problem, but it better clarifies the database's design. You will need to consider the cumulative nature of the raw data, however, in problems 8 (makes it a little tougher) and 9 (makes it simpler). As a reminder, I will be holding office hours today from 10am-noon focusing on getting the transferred student's course environments set up.
  • August 25: I have created accounts for all of the students who recently transferred from section 001. You should now be able to follow the instructions and exercises from Lecture 3. I will also be holding office hours tomorrow at this Zoom Link. In this meeting I will be available to help (and give priority to) anyone who has a problem accessing their course account and Jupyter hub. Office hours will be use a drop-in and group paradigm. If you need to schedule a one-on-one meeting with the instructor you can schedule one during office hours or after lecture.
  • August 22: The Final Exam schedule is now available. Our exam will take place on November 19, 2020 from 3pm-6pm.
  • August 21: The university has called off classes on 8/25, thus we will next meet via Zoom on 8/27. I plan to make adjustments to the schedule to accommodate this change. Problem Set #1 is now posted. Due to the closing of UNC on 8/24-8/25, I may make adjustments to the due date. I am also still working to add the students who transferred from section 001 to section 002. Thus, these new students cannot yet login to the course website or the Jupyter Hub. They can, however, download a copy of all lecture pdfs and the problem set. Stay tuned.
  • August 17: Due to recent outbreaks tomorrow's (8/18) class will be offered online only. Use the given Zoom Link.
  • August 13: Here's the link to the in-class exercise
  • August 12: The website links for Lecture 1 and Zoom are now fixed (lm).
  • August 11: First day of class and the course syllabus.

COVID-19 Considerations


Spacing,Wash,Mask

  • All course meetings will be simulcast via Zoom using a link provided at the top of this website. You are expected to use Zoom if you are exhibiting any COVID-19 symptom, including an elevated temperature, a cough, shortness of breath, fatigue, head or body aches, loss of taste or smell, sore throat, congestion, or runny nose.
  • If the instructor exhibits any symptoms, that day's lecture will be presented online rather than in person. If this happens, students will be able to watch the live broadcast projected in the classroom if they wish.
  • ALL students must adhere to the university's health safety standards in effect at the time of each lecture. Currently, this requires a mask AND social distancing of at least 6 feet while in the classroom.
  • NO students are allowed to sit in the front row of the classroom nor enter through the east door located at the front of the classroom.
  • Exams, problem sets, and live exercises will be online. Access will only be provided during prescribed intervals, which implies that exercises and exams cannot be taken asynchronously.
  • Students will be responsible for cleaning and removing all materials from their desk areas at the end of each class period.
  • The class may at any time revert to being entirely online.

Course Description


Databases are an indispensable tool for managing information, and a course on the principles and practice of database systems is now an integral part of any computer science curricula. This course covers the fundamentals of modern database management systems, in particular relational database systems.

The material covered in Comp 521 can be broken into three areas of emphasis. The first area includes database foundation material such as the relational model, and normal forms. The second area of emphasis is database application programming and includes among others the topics of Structured Query Language (SQL), integrating databases into programs, and web-based database usage. The third area of emphasis is the systems side of databases, which includes database indexing, efficient query evaluation, the transaction-based model, and concurrency.

This course is suitable for computer science majors at both undergraduate and graduate levels. Students who wish to take this course should have some programming experience in a modern language and knowledge of data structures. Course grades will be based on in-class exercises, five problem sets, a mid-terms, and a final exam.

Book, Course Information, and Prerequisites


 

 

 

 

 

 

 

 

 

 

 

 

  Database Textbook
Textbook (optional): Database Management Systems, Third edition
by Raghu Ramakrishnan and Johannes Gehrke
McGraw-Hill Higher Education © 2003, ISBN: 0072465638.
Credit Hours: 3
Location: Sitterson 014
Time: TTh 3:00pm-4:15PM
URL: http://csbio.unc.edu/mcmillan/?run=Courses.Comp521F20
Prerequisites: COMP 401, Comp 410,  Comp 411 or equivalents

Course Instructors


 

 

Leonard Instructor: Leonard McMillan
Office: SN 316
email: mcmillan@cs.unc.edu



 

Boo Fullwood
RA: Boo Fullwood
Office: SN 325
email: iamboo@cs.unc.edu



 

Zhongrui Chen
RA: Zhongrui Chen
Office: SN ???
email: jcpwfloi@cs.unc.edu



 

Juan Garcia
RA: Juan Garcia
Office: SN ???
email: jjgarcia@cs.unc.edu



 

Larry He
RA: Larry He
Office: SN ???
email: lahe@cs.unc.edu



 

Kat Kirchoff
RA: Kat Kirchoff
Office: SN ???
email: kat@cs.unc.edu

Office Hours:

Leonard McMillan: Wednesday and Thursday, 11am-12pm

Boo: M 11am-6:30pm, W 11am-6:30pm
ZhongRui: M 4-5pm, Th 5:15pm-6:45pm, F 9am-noon
Juan: M noon-2pm, T 8am-9am, T 10am-3pm, Th 8am-3pm
Larry: M 8am-2pm, F 8am-2pm
Kat: T 11am-1pm, W 8am-10am, W 11am-1pm, W 3pm-4pm, F 9am-noon

Schedule


 

 

 

Week 1:
T 8/11 Lecture 1: Introduction and Overview (pdf) (video)  
Th 8/13 Lecture 2: Entities and Relations (pdf) (video)  
Week 2:
T 8/18 Lecture 3: The Trouble with Files (pdf) (video) NCDemographics.csv  
Th 8/20 Lecture 4: The Relational Model (pdf) (video) PS1
Week 3:
T 8/25 Lecture 5: SQL: Basic Queries (pdf) Class cancelled to empty campus
Th 8/27 Lecture 6: SQL: Basic Queries (pdf) (video)  
Week 4:
T 9/1 Lecture 7: SQL:Advanced Queries (pdf) (video)  
Th 9/3 Lecture 8: SQL: Joins, Updates, and Transactions (pdf) (video) PS2
Week 5:
T 9/8 Lecture 9: Exploring a Database (pdf) (video) (PS1 due)
Th 9/10 Lecture 10: Database Application Development (pdf) (video)  
Week 6:
T 9/15 Lecture 11: Overview of Storage and Indexing (pdf) (video)  
Th 9/17 Lecture 12: Storing and Buffering Data (pdf) (video)
Week 7:
T 9/22 Lecture 13: Tree-Structured Indexes (pdf)(video) PS3 (PS2 due)
Th 9/24 Lecture 14:  Hash-Based Indexes (pdf) (video)  
Week 8:
T 9/29 Lecture 15: Overview of Query Evaluation (pdf) (video)  
Th 10/1 Midterm Review  
Week 9:
T 10/6 Midterm (Covers Lectures 1-14 and will be given in-class, online)
Th 10/8 Lecture 16: External Sorting (pdf) (video)  
Week 10:
T 10/13 Lecture 17: Schema Refinement and Normal Forms (pdf) (video) (PS3 due)
Th 10/15 Schema Refinement continued (pdf) (video)  
Week 11:
T 10/20 Lecture 18: More on Query Evaluation (pdf) (video) PS4
Th 10/22 Lecture 19: Transaction Scheduling and Preemption (pdf) (video)
Week 12:
T 10/27 Lecture 20: Database Crash Recovery (pdf)(video)  
Th 10/29 Lecture 21: Intro to NoSQL databases (pdf))(video)  PS5
Week 13:
T 11/3 Lecture 22: MapReduce for Big Data (pdf) (video) (PS4 due)
Th 11/5 Lecture 23: Hadoop Pig and Hive (pdf) (video)
Week 14:
T 11/10 Lecture 24: NoSQL Document Databases (pdf) (video)  
Th 11/12 Lecture 24: NoSQL Document Databases (continued) (video)  
Week 15:
T 11/17 Lecture 25: NoSQL Graph Databases (pdf) (PS5 due)
Th 11/19 Final Exam: noon-3pm


Course Resources

  1. A simple interactive SQL interpreter that can be embedded in a Jupyter Notebook. You will need to rename the downloaded file to "iSQL.py"
  2. The small Sailor database used for examples in class.
  3. AN even smaller, tiny version of the Sailor database used to demonstrate JOINS.


Individual Jupyter Notebooks

 It is recommended that, as an alternative to the class Jupyterhub, you have access to a Jupyter environment either locally or on another cloud service. The recommended path is to use the Azure Cloud Notebooks, but you can install anaconda locally and use that if you wish.

Azure Notebooks

This is the recommended alternative to the Jupyterhub. You will get a personal VM with 4GB of memory which should be plenty for this course.

  • Go to notebooks.azure.com
  • Sign in in the top right
  • Use your onyen@ad.unc.edu login with your onyen password
  • Select 'My Projects' in the top navigation bar
  • Select '+ New Project'
  • Select a name and hit create
  • Select your new project and hit 'Run on Free Cloud'
  • You should now be at a notebook homepage. This can now be used just like the hub.

Local with Anaconda

This completely avoids cloud environments, and gives some other useful python tools, but is somewhat more fiddly

  • Go to https://docs.anaconda.com/anaconda/install/
  • Follow the installation instructions for your operating system
  • Open the Navigator
  • Select Launch under Jupyter Notebook
  • A screen like the Jupyterhub should appear in your browser
  • Create a folder for the class




Site built using pyWeb version 1.10
© 2010 Leonard McMillan, Alex Jackson and UNC Computational Genetics