Logged in as: guest Log in
Problem Set #3 mcmillan / Version 15

Comp 590-087/790-087: BioAlgorithms -- Fall 2011

Problem Set #3

Issued: 10/2/2011      Due: In class 10/25/2010

 


Homework Information: Some of the problems are probably too long to attempt the night before the due date, so plan accordingly. No late homework will accepted. However, your lowest homework will be dropped. Feel free to work with others, but the work you hand in should be your own.
Note: 590-087 students are not required to answer the starred (*) parts of questions.

Question 1. Provide pseudocode for a global alignment algorithm between two sequences v and w, subject to the constraint the alignment contains at most k gaps (blocks of consecutive indels).


Question 2. Solve the Exon Chaining problem given the following wieghted intervals:

(1, 4, 3), (1, 12, 7), (1, 12, 8), (7, 11, 9), (8, 20, 9), (10, 18, 7), (12, 20, 5), (14, 16, 9),
(14, 19, 9), (14, 20, 2), (16, 18, 9), (16, 20, 10), (17, 18, 2), (18, 20, 5), (19, 20, 4)

Draw the associated graph, darken the selected intervals, and give the scoring array for the dynamic programming solution.

Replace all the weights of the given set of intervals with the constant value 1, then re-solve the Exon Chaining problem. As before, computer the scoring array, and draw the graph with the solution highlighted. 

* For the special case of the Exon Chaining problem where all intervals have the same weight, design a greedy algorithm that finds the optimal solution. Provide pseudocode.


Question 3. We want to obtain the DNA sequence of a new genome. Because whole genomes are too large to sequence directly, it is common practice to randomly shear multiple copies of it fragments, creating a total of n (possibly overlapping) fragments f1 . . . fn. Assume that the genome's length is known to be N. Careful alignment of each fragment back to a related genome reveals that fragment fi runs from position bi to position ei. Our goal is to find a minimal tiling path of fragments, that is, the smallest subset of fragments such that at least one fragment covers every base of the genome.

Provide pseudocode for a greedy algorithm to find a minimal tiling path. Is your greedy algorithm correct? Why? (Hint: Prove it is correct or give a counterexample)

Provide pseudocode for a dynamic programming algorithm to find a minimal tiling path.

The number of sequenced fragments that cover a given base is called the base's pileup height. What is the maximum pileup height in a minimal tiling path. Explain why your answer is correct.

** Instead of a minimum tiling path, write pseudocode for a minimum pileup height algorithm. Note that the pileup height at each base position must be at least 1.


Question 4. Draw the recursion tree (see figure 7.1 in the textbook) for MergeSort on the input (0,9,1,8,2,7,3,6,4,5).

Give an unsorted 8 element sequence where in every call the Merge(A,B) there is no interleaving of values between the two input lists (i.e. one could achieve the same result as the merge by either concatenating B and A or A and B)


Programming Project. Implement your dynamic programming solution to the minimum tiling path problem of question 3. Make sure that you print out all answers if there are multiple solutions. First, test your code using the set of 16 intervals given in question 2 (ignore the weights). Then try your code on the following set of intervals:

I = [(1, 12), (1, 16), (2, 15), (2, 20), (3, 22), (4, 23), (6, 26), (7, 23), (8, 24), (9, 23), (13, 32), (14, 25), (14, 30), (15, 29), (15, 32), (16, 35), (16, 35), (18, 36), (19, 34), (19, 34), (20, 37), (21, 39), (23, 34), (23, 42), (24, 35), (24, 42), (25, 44), (26, 43), (27, 47), (30, 45), (31, 41), (31, 48), (31, 48), (32, 45), (35, 48), (36, 48), (36, 49), (36, 53), (36, 56), (39, 55), (39, 55), (41, 52), (42, 62), (44, 55), (45, 60), (47, 62), (48, 62), (48, 64), (48, 65), (50, 60), (50, 67), (51, 61), (51, 70), (52, 64), (52, 67), (53, 65), (54, 65), (55, 65), (57, 75), (59, 77), (61, 74), (63, 76), (63, 82), (64, 75), (65, 82), (66, 77), (66, 78), (66, 80), (66, 86), (67, 78), (68, 81), (70, 81), (70, 88), (71, 88), (73, 89), (74, 90), (74, 93), (75, 91), (78, 92), (80, 94), (81, 96), (81, 97), (82, 98), (82, 98), (84, 94), (85, 99), (85, 100), (86, 96), (88, 98), (89, 99), (89, 99), (91, 100), (92, 99), (93, 99), (93, 99), (94, 100), (95, 100), (96, 100), (97, 99), (98, 100), (99, 99)]

Turn in your source code and answers for both sets of intervals.



Posted by allind on 2011-10-21 16:01:33 (references Version 11)

Question on the starred part of question 2: should our pseudocode account for multiple optimal answers? As in, if there are 2 ways to get the same final value, should our algorithm return both paths?

Posted by mcmillan on 2011-10-24 21:28:36 (references Version 12)

I have recovered the missing text in the starred part of question 3. This section is now double starred meaning that it is entirely optional, but worth extra credit for those needing it.




Site built using pyWeb version 1.10
© 2010 Leonard McMillan, Alex Jackson and UNC Computational Genetics