Announcements


  • May 9: The final exam can be downloaded anytime during the examination period (12pm-3pm). It must be submitted online before the exam period ends. It is designed to take only 120 minutes, but you can use the entire time allotted.
  • May 1: I will hold a course review session on Thursday, May 4 from 5pm to 7pm in SN011. 
  • April 18: Problem Set #4 has been revised. The changes are to Problems #4 and #5. Please download the revised version and transfer your answers to it.
  • April 13: Problem Set #4 is now available and it is due on Monday 4/24.
  • March 29: A study session for Problem Set #3 will be held on Friday, March 31 in SN325 from 2:00 to 3:30pm.
  • March 22: Problem Set #3 is now available and it is due on Monday 4/3.
  • March 20: Grades and solutions to the Midterm and Problem Set #1 are now online. You must log in using your onyen to access them. You'll need to attend class to find out your password.
  • March 8: The Midterm can be downloaded and must be submitted online. You will have 75 minutes to complete it. I recommend that you submit partial versions as you go, because the submission system will be automatically disabled at 12:30.
  • February 28: A study session for Problem Set #2 will be held on Friday, March 3 in SN011 from 2:00 to 3:30pm.
  • February 27: The original test dataset, PS2contigs.fa, for problem 5 of PS#2 had issues, which have now been fixed. Please download it again.
  • February 27: A revised version of Problem Set #2 has been posted. It corrects an error in problem #3 and revises the constraints on problem #5.
  • February 25: A second test dataset, PS2contigs.fa, for problem 5 of PS#2 is now online. Also, note that the lengths of the contigs exceed 100bp and their overlap can be less than 50%, contrary to the specification given in the problem's description. I will revise the problem set and repost it soon.
  • February 20: Problem Set #2 is now available and it is due on Monday 3/6.
  • February 8: I will hold a study session to answer questions related to Problem Set #1 on Friday 2/10 from 2:00 to 3:30pm in SN325.
  • February 7: Jupyter notebooks for each of the lectures are now online. The code and discussion should be intact; however, many image and web links may not work. Where possible, I will try to fix these problems over time, so be prepared to download the notebooks again.
  • February 3: All of the FASTA files needed for Problem Set #1 should now be online.
  • January 31: Problem Set #1 is now available and it is due on Monday 2/13. A link is included in the problem set for submitting it online.
  • January 23: I will hold a tutorial session on Friday 1/27 in SN011 from 1:30-3:00 covering installing Jupyter, getting started with Python, and Rosalind.
  • January 11: First class meeting in SN011. See you there.

Course Description


Computational methods are fueling a revolution in the biological sciences. Computers are already nearly as indispensable as microscopes for analyzing and interpreting biological data. As a result, two new multidisciplinary fields, bioinformatics and computational biology, have emerged. This course will explore the computational methods and algorithmic principles driving this revolution. It will cover basic topics in molecular biology, genetics, and proteomics. The course also addresses basic computational theory and algorithms including asymptotic notation, recursion, divide-and-conquer approaches, graph algorithms, dynamic programming, and greedy algorithms. These fundamental concepts from computer science will be taught within the context of motivating problems drawn from contemporary biology. Example biological topics include sequence alignment, motif finding, gene rearrangement, DNA sequencing, protein peptide sequencing, phylogeny, and gene expression analysis.

This course is suitable for both computer science and biology students at both undergraduate and graduate levels. Students who wish to take this course should have some programming experience in a modern programming language. Knowledge of data structures, algorithm design, and biology is helpful but not required. There will be 5 problem sets each with short programming assignments, a midterm, and a final exam.

A syllabus for this offering of Comp555 can be downloaded from here.

Book, Course Information, and Prerequisites


Here is the book, which I will be supplementing with new materials:

Bioinformatics Algorithms: An Active Learning Approach
by Phillip Compeau and Pavel Pevzner
Active Learning Publishers © 2014, ISBN: 978-0-9903746-0-2.

Credit Hours: 3
Location: SN011
Time: MW 11:15-12:30
URL: http://www.csbio.unc.edu/mcmillan/?run=Courses.Comp555S17
Prerequisites: COMP 410, Math 381, or equivalents

Course Instructors


Instructor: Leonard McMillan
Office: SN316
email: mcmillan@cs.unc.edu
Office Hours: Tuesdays 11am-12pm, 3pm-4pm

Schedule


Date Topic Homework
January 11 Introduction (slides, notebook)  
January 16 No Class (MLK Holiday)
January 18 Exploring a Genome (slides, notebook) (Video Parts 1 & 2)
Reading: Chapter 1 pp 3-14
January 23 Exploring a Genome (Continued) (slides, notebook) (Video Parts 3, 4, & 5)
Reading: Chapter 1 pp 14-45
January 25 Finding Patterns in DNA (slides, notebook) (Video Parts 1 & 2)
Reading: Chapter 3 pp 83-92
January 27 Crash course in Jupyter, Python, and Rosalind (notebook) Link: Rosalind
January 30 Searching for Motifs (slides, notebook) (Video Parts 3, 4, 5 & 6)
Reading: Chapter 3 pp 93-127
Problem Set #1 Assigned
February 1 Protein Sequences and Antibiotics (slides, notebook) (Video Parts 1, 2, & 3)
Reading: Chapter 2 pp 47-58
February 6 Inferring Protein Sequences from Fragments (slides, notebook) (Video Parts 4, 5, & 6)
Reading: Chapter 2 pp 58-66
February 8 Scaling up Peptide Sequencing (slides, notebook) (Video Parts 7, 8, 9, & 10)
Reading: Chapter 2 pp 66-80 
February 10 Problem set #1 study session in SN325 from 2:00pm-3:30pm Due on 2/13
February 13 Assembling a Genome (slides, notebook) (Video Parts 1, 2, 3 & 4)
Reading: Chapter 4 pp 129-152
February 15 Path Finding in Graphs (slides, notebook) (Video Parts 4, 5, 6, 7, & 8)
Reading: Chapter 4 pp 153-187
February 20 The Realities of Genome Assembly (slides, notebook) (Video Parts 9, 10, 11, & 12)
Problem Set #2 Assigned
February 22 Comparing Sequences (slides, notebook) (Video Parts 1, 2, 3, & 4)
Reading: Chapter 5 pp 189-199
February 27 Sequence Alignment (slides, notebook) (Video Parts 5, 6, 7, & 8)
Reading: Chapter 5 pp 200-229
March 1 Advanced Sequence Alignment (slides, notebook) (Video Parts 9, 10, & 11)
Reading: Chapter 5 pp 230-258
March 3 Problem set #2 study session in SN325 from 2:00pm-3:30pm Due on 3/6
March 6 Divide and Conquer Algorithms (slides, notebook) Problem Set #2 due
March 8 Midterm, open book, open notes, online (covers through Lecture 13, 2/27)
March 13 No Class (Spring Break)
March 15
March 20 Go over Midterm  
March 22 Greedy Algorithms (slides, notebook) (Video Parts 1-4)
Problem Set #3 issued
March 27 Genome Rearrangements (slides) (Video Parts 5-9)
March 29 Clustering and Evolution (slides)  
March 31 Problem set #3 study session in SN325 from 2:00pm-3:30pm Due on 4/3
April 3 Imperfect Tree Construction (slides) Not in book
Problem Set #3 Due
April 5 Perfect Phylogeny (slides) Not in book
April 10 Combinatorial Pattern Matching (slides) (Video Parts 1-3)
April 12 Suffix Trees and BWTs (slides) (Video Parts 4-9)
Problem Set #4 Assigned
April 17 Multi-String BWTs (slides) Not in book
April 18 Problem Set #4 Study session  
April 24 Hidden Markov Models (slides) Not in book
April 26 Finding Founder Origins using HMMs (slides) Not in book
May 9 Final Exam 12:00pm-3:00pm, open book, open notes, online (covers Lectures 14-25)

 

Resources


  • PS#2 study session on Friday (SN011 2pm-3:30pm)


Site built using pyWeb version 1.10
© 2010 Leonard McMillan, Alex Jackson and UNC Computational Genetics