
Announcements

  • May 3: Starting at 8:00am you will be able to download your Final exam here. You must be logged into the course website to download the exam. Read all instructions on the download page, and in the exam notebook. You will have until 11:00am (180 minutes) to complete the exam. Uploading will be disabled once the examination period is over. You can, and are encouraged to, submit as many times as you like. Only the last submission will be kept.
  • May 2: I will be in my office from 5 - 6:30pm today to resolve grading issues. -Daniel
  • April 30: The Final Exam Review Session will be held in SN014 from 5-7pm on Tuesday April 30th.
  • April 19: Because of Good Friday, I'll be moving my OH from Friday to Tuesday from 3-5pm. -Daniel
  • April 17: There is a new version of Problem Set #4, which greatly simplifies Problem #2 so that the subsequent problems can be done. Please download this version. Problem #1 is essentially the same as before and is now the most time-consuming; Problem #3 is now trivial. -Leonard
  • April 16: If you have a 0 for Problem Set #2, do not modify your file. Come see me after class or during my Office Hours to get it fixed. -Daniel
  • April 16: In the LocalAlign function in Lecture 18, no boundary checks (i==0, j==0) are needed since the for-loop ranges are from 1 to len + 1. If you downloaded the 10 sequences as a Fasta file, the LoadFasta function won't work because the file isn't in the proper Fasta format. To fix this, take out data.pop(0), split by '\n', and return data; the for loop over the sequences is no longer needed. Come to my OH if you need help implementing this. -Daniel
  • April 9: I'll be moving my Office Hours from Friday to today from 3-4pm in SN 325 to offer help to anyone still working on Problem Set #3. -Daniel
  • April 9: Problem Set #4 is on-line now. Note that problem #2 is time and space consuming. Stay tuned for possible updates.
  • March 28: Problem Set #3 is on-line now. Note that problem #1 is by far the most computationally difficult, in other words time consuming. You might want to try solving problems 2-5 first. 
  • March 7: Starting at 9:30am you will be able to download a copy of your Midterm here. You must be logged into the course website to download the exam. Read all instructions on the download page, and in the exam notebook. You will have until 10:50am (80 minutes) to complete the exam. Uploading will be disabled once the examination period is over. You can, and are encouraged to, submit as many times as you like. Only the last submission will be kept.
  • March 5: I will be holding extra office hours for questions about the Problem Set and midterm on Tuesday (3/5) in SN325 from 3:30pm-4:30pm. -Daniel  
  • March 1: I will be holding an extra block of office hours today from 2:30-4:30pm in advance of next week's midterm and problem-set due date. -Leonard
  • February 28: Hi all! I will be hosting a Midterm review session from 5pm-7pm in FB009 on Wednesday, March 6th to go over an old midterm, review some concepts covered on the exam, and answer questions. -Daniel
  • February 18: My Office Hours will be from 12pm-2pm today (2/18). -Daniel
  • February 6: I will be holding my office hours today from 3:00-4:30pm because I need to attend a talk (lm).
  • February 5: I will be hosting an additional hour of office hours today (2/5) from 3pm-4pm in SN325 (the conference room) to answer any more questions about Problem Set #1. -Daniel
  • February 4: Just a reminder, Problem Set #1 is due Tuesday at midnight. Remember to put ALL your code in the answer cells!! We cannot see any code outside those cells, so answers placed elsewhere won't be graded!!
  • January 31: A new version of Problem Set #1 is now on-line, with the corrections mentioned below, as well as instructions for submission. You should download a new copy and transfer your answers from the previous version to the current one.
  • January 28: Hey everyone, here are some mistakes in the first version of Problem Set 1, which we'll have fixed in the next version: For number 1, missing should equal 0, not []. For number 2, the histogram doesn't consider the number of kmers which occur 0 times, which you are expected to have in your solution array.
  • January 28: The Python review session will be Tuesday (1/29) 5:00pm-6:00pm, room SN011.
  • January 22: Problem Set #1 is online and due on 2/5/2019
  • January 16: I will hold shorter than usual office hours today from 2-3pm, so that I can attend a previously scheduled dissertation defense (lm).
  • January 15: Plan to bring your laptop to Thursday's class meeting. Also make sure that you can access the Comp555 JupyterHub (details are given at the end of Lecture 2). Optionally, you can also install Jupyter on your personal machine. In some circumstances, this might provide better performance than the JupyterHub's cloud solution. Anaconda is the suggested method for installing Jupyter. You will find links describing how to install Anaconda and Jupyter at the bottom of this page. Make sure that you install Jupyter with Python 3!
  • January 10: First class meeting in SN014. See you there
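
The April 16 announcement above makes two technical points: a local-alignment loop that runs from 1 to len + 1 needs no i==0 or j==0 checks, and a file with one sequence per line can be loaded by splitting on '\n'. The sketch below illustrates both. It is not the Lecture 18 code: the names load_sequences and local_align_score, the scoring values, and the skipping of blank lines are assumptions made for illustration.

```python
def load_sequences(text):
    """Return one sequence per line (hypothetical stand-in for LoadFasta).

    Assumes the input has no '>' header lines, just one sequence per line.
    Skipping blank lines is an addition not mentioned in the announcement.
    """
    return [line.strip() for line in text.split('\n') if line.strip()]


def local_align_score(v, w, match=1, mismatch=-1, indel=-1):
    """Best local alignment score of v and w (scoring values are assumed)."""
    # Row 0 and column 0 are initialized to 0 and never rewritten,
    # so the loops can start at 1 without any boundary checks.
    s = [[0] * (len(w) + 1) for _ in range(len(v) + 1)]
    best = 0
    for i in range(1, len(v) + 1):
        for j in range(1, len(w) + 1):
            diag = s[i - 1][j - 1] + (match if v[i - 1] == w[j - 1] else mismatch)
            s[i][j] = max(0, diag, s[i - 1][j] + indel, s[i][j - 1] + indel)
            best = max(best, s[i][j])
    return best


seqs = load_sequences("TTACGTT\nGGACGGG\n")
print(local_align_score(seqs[0], seqs[1]))
```

Because every cell with i >= 1 and j >= 1 only looks at cells with indices i-1 and j-1, the zero-filled first row and column act as the boundary.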

Course Description


Computational methods are fueling a revolution in the biological sciences. Computers are already nearly as indispensable as microscopes for analyzing and interpreting biological data. As a result, two new multidisciplinary fields, bioinformatics and computational biology, have emerged. This course will explore the computational methods and algorithmic principles driving this revolution. It will cover basic topics in molecular biology, genetics, and proteomics. The course also addresses basic computational theory and algorithms including asymptotic notation, recursion, divide-and-conquer approaches, graph algorithms, dynamic programming, and greedy algorithms. These fundamental concepts from computer science will be taught within the context of motivating problems drawn from contemporary biology. Example biological topics include sequence alignment, motif finding, gene rearrangement, DNA sequencing, protein peptide sequencing, phylogeny, and gene expression analysis.

This course is suitable for both computer science and biology students at both undergraduate and graduate levels. Students who wish to take this course should have some programming experience in a modern programming language. Knowledge of data structures, algorithm design, and biology is helpful but not required. There will be 5 problem sets each with short programming assignments (each worth 8%), a midterm (worth 20%), a final exam (worth 20%), and many unannounced in-class exercises/quizzes (in total worth 20% with the lowest 2 dropped).
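
As a rough illustration of the weighting above (5 x 8% + 20% + 20% + 20% = 100%), here is a minimal sketch of how a course score could be combined. The function name course_grade, the 0-100 score scale, and the equal weighting of the remaining in-class exercises after the lowest 2 are dropped are assumptions, not the official grading procedure.

```python
def course_grade(problem_sets, midterm, final, quizzes):
    """Combine scores (each on an assumed 0-100 scale) into a course percentage."""
    if len(problem_sets) != 5:
        raise ValueError("expected 5 problem set scores")
    # Drop the 2 lowest in-class exercise scores, then average the rest
    # (equal weighting of the remaining exercises is an assumption).
    kept = sorted(quizzes)[2:]
    quiz_avg = sum(kept) / len(kept) if kept else 0.0
    return (0.08 * sum(problem_sets)   # 5 problem sets at 8% each
            + 0.20 * midterm           # midterm, 20%
            + 0.20 * final             # final exam, 20%
            + 0.20 * quiz_avg)         # in-class exercises, 20% in total


print(course_grade([80, 90, 100, 70, 60], 85, 75, [50, 60, 70, 80, 90]))
```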

A syllabus for this offering of Comp555 can be downloaded from here.

Book, Course Information, and Prerequisites


This semester I will not be using a book. In the past, I have used the following textbook, but I plan to deviate from it significantly in this offering. Nonetheless, you may find it useful as a supplement.

Bioinformatics Algorithms: An Active Learning Approach, Vol 1
by Phillip Compeau and Pavel Pevzner
Active Learning Publishers © 2014, ISBN: 0990374610.

Credit Hours: 3
Location: SN014
Time: TTh 9:30-10:45
URL: http://www.csbio.unc.edu/mcmillan/?run=Courses.Comp555S19
Prerequisites: COMP 410, Math 381, or equivalents

Course Instructors




Instructor:  Leonard McMillan
Office:  SN316
email:  mcmillan@cs.unc.edu
Office Hours:  Wednesdays 2pm-4pm



TA:  Daniel Su
Office:  SN341
email:  sudan@live.unc.edu
Office Hours:  Monday 3-5pm, Friday 12-2pm

Schedule


Date Topic Homework
January 10 Lecture 1. Introduction (slides)  
January 15 Lecture 2. Jumping into Genomes (slides) (notebook)  
January 17 Lecture 3. Finding patterns in DNA (slides) (notebook)  
January 22 Lecture 4. Finding hidden patterns in DNA (slides) (notebook) Problem Set #1
January 24 Lecture 5. Finding Motifs in our Lifetime (slides) (notebook)  
January 29 Lecture 6. Assembling a Genome (slides)  
January 31 Lecture 7. Finding Paths in Graphs (slides) (notebook)  
February 5 Lecture 8. Finding Eulerian Paths (slides) (notebook) PS #1 Due
February 7 Lecture 9. The Realities of Genome Assembly (slides) (notebook)  
February 12 Lecture 10. Combinatorial Pattern Matching (slides) (notebook)  
February 14 Lecture 11. Suffix Arrays and BWTs (slides) (notebook) Problem Set #2
February 19 Lecture 12. Multi-string BWTs (slides) (notebook)  
February 21 Lecture 12. Multi-string BWTs (continued)  
February 26 Lecture 13. Protein Sequences and Antibiotics (slides) (notebook)  
February 28 Lecture 14. Determining a Peptide's Sequence (slides) (notebook)  
March 5 Lecture 14 (cont), Discuss midterm  PS #2 Due
March 7 Midterm Exam (open notes, open internet)
March 12 No Class (Spring Break)
March 14 No Class (Spring Break)
March 19 Go over Midterm  
March 21 Lecture 15. Scaling up Peptide Sequencing (slides) (notebook) Problem Set #3
March 26 Lecture 16. Comparing Sequences (slides)  
March 28 Lecture 17. Sequence Alignment (slides) (notebook)  
April 2 Lecture 18. Advanced Sequence Alignment (slides) (notebook)  
April 4 Lecture 19. Adventures in Dynamic Programming (slides) (notebook)  
April 9 Lecture 20. Divide and Conquer Algorithms (slides) Problem Set #4, PS #3 Due
April 11 Lecture 21. Hidden Markov Models (slides) (notebook)  
April 16 Lecture 22. Inferring Ancestry using HMMs (slides) (notebook) (CCGenotypes.csv)  
April 18 Lecture 23. Problems with Problem Sets (slides)  
April 23 Lecture 24. Genome Rearrangements (slides) (notebook) PS #4 Due
April 25 Lecture 24. Genome Rearrangements (cont) (slides) (notebook)  
Friday, May 3 Final Exam (SN014) 8:00am-11:00am

 

Resources


  • Download the package management system Anaconda. It will greatly simplify working with and managing both Python 3 and Jupyter.
  • Follow the installation instructions. In most cases, you should install full Anaconda, rather than Miniconda.
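
Once Anaconda is installed, a quick check like the one below can confirm that you are running Python 3 and that Jupyter is importable. This is only a sketch: the function name check_environment is made up for illustration, and looking for the jupyter_core package is an assumed proxy for a working Jupyter installation.

```python
import importlib.util
import sys


def check_environment():
    """Report whether this interpreter is Python 3 and whether Jupyter is present."""
    return {
        # The course asks for Jupyter with Python 3.
        "python3": sys.version_info.major == 3,
        # jupyter_core ships with Jupyter; find_spec returns None if it is missing.
        "jupyter": importlib.util.find_spec("jupyter_core") is not None,
    }


print(check_environment())
```

If either entry is False, revisit the installation instructions or use the Comp555 JupyterHub instead.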


© 2010 Leonard McMillan, Alex Jackson and UNC Computational Genetics