Logged in as: guest Log in
Problem Set #1 mcmillan / Version 22

Comp 590-087/790-087: BioAlgorithms -- Fall 2011

Problem Set #1

Issued: 9/4/2011      Due: In class 9/22/2010

 


Homework Information: Some of the problems are probably too long to attempt the night before the due date, so plan accordingly. No late homework will accepted. However, your lowest homework will be dropped. Feel free to work with others, but the work you hand in should be your own.
Note: 590-087 students are not required to answer the starred (*) parts of questions.

 

Question 1. Given a 3 billion base-pair genome, what is the expected length of a unqiue sub-sequence (one that appears nowhere else in the genome)? What are the assumptions of your answer, if any?

How many times would one expect to see a given 10-base subsequence appear in the same genome, assuming all bases are independent and equally likely?

 


 

Question 2. Based on the condons shown in table 3.1 of the book and on page 22 of Lecture 1, which of the three codon positions is the least sensitve to mutation (i.e. substituting an alternative base at that position is least likely to change the amino acid)?

*Mutations can be broken into two classes, transitions,  which exchange bases of a similar shape and size (A ↔ G, C ↔ T), and transversions, which exchange bases of dissimlar sizes (A ↔ C, A ↔ T, C ↔ G, G ↔ T). Eventhough there are twice as many trasversions as transistions, they are far less likely. How many transitions do not result in a change of the amino acid coded (consider all three positions)?

 


 

Question 3. Reconsider the incorrect "Change Problem" algorithm from Lecture 3. In how many of the 99 possible inputs for M does it given an incorrect answer for the denominations c = (25, 20, 10, 5, 1)?

Suppose that you are desigining coinage for a new nation, but you are limited to minting only 3 denominations. How do you choose them such that the average number of coins is minimized for all values of change from 1 to 99? Will the greedy algorithm work with your proposed coinage?

 


 

Question 4. Construct a 21-element partial digest that maximizes the number of calls to Place(), and thus acheives the worst-case performance.

*Explain how breakpoints should be spaced on a line segment in order maximize the number of calls to Place().

 


 

Question 5. Given a long genome sequence G, and a shorter pattern string S, sketch out an algorithm (i.e. write pseudo code) for finding the first occurence of S in G (if any). What is the complexity of your algorithm? Describe the input that gives the worst-case performance for your algorithm.

*Estimate the average performace of your algorithm.

 


 

Programming Problem. Modify the code for PartialDigest given in class so that it outputs only exactly one instance of each correct output. The approach of storing previous results and comparing a potential new output to them is frowned upon. I suggest instumenting the code code to figure out why results are revistied and reorganizing the code to avoid reexamination of decisions  that lead to the same answer.

Turn in a listing of the source code for your algorithm, as well as a print out of its output on the input:

      1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,5,5,6,6,7,7,8,8,8,9,9,10,10,11,11,12,12,12,13,14,15

also provide a print out of your code's output for the 21-element partial digest you came up with for in Question 4.



Posted by allind on 2011-09-08 00:29:58 (references Version 20)

On question 3 part 1, should we assume we're using the US change denominations or the additional 20 cent piece we discussed in class?

Posted by mcmillan on 2011-09-08 19:41:51 (references Version 22)

I should have specified to coin denominations as c = (25,20,10,5,1). The problem has now been updated to reflect this. 

Posted by jkhuang on 2011-09-12 19:22:19 (references Version 22)

In the programming problem, the partialDigest.py script given in class does not seem to run with the given input. (I get an error saying I am trying to remove something from the list that is not there).  Is this something we are expected to debug as well?

Posted by mcmillan on 2011-09-16 00:32:06 (references Version 22)

A new version of partialDigest.py, which addresses the bug for the example dataset of the programming problem is available on the course website.




Site built using pyWeb version 1.10
© 2010 Leonard McMillan, Alex Jackson and UNC Computational Genetics