Logged in as: guest Log in
Problem Set #2 mcmillan / Version 6

Comp 590-087/790-087: BioAlgorithms -- Fall 2011

Problem Set #2

Issued: 9/18/2011      Due: In class 10/6/2010

 


Homework Information: Some of the problems are probably too long to attempt the night before the due date, so plan accordingly. No late homework will accepted. However, your lowest homework will be dropped. Feel free to work with others, but the work you hand in should be your own.
Note: 590-087 students are not required to answer the starred (*) parts of questions.

 

Question 1. Write pseudocode for the TotalDistance(word, DNA) function used in various the MedianSearch algorithms given in section 4.9 and characterize its performance in terms of t, n, and l.

In the BranchAndBoundMedianSearch algorithm the cost of computing TotalDistance seems unwarranted for small prefixes. For example, it is highly likely that all 2-mers appear in most strings over some length. This results in an optimisticDistance of 0, and provides no pruning. Suggest a hueristc for directly calling NextVertex() without testing the TotalDistance based on both the length of the prefix and, n, the sequence length, and write pseudocode that implements this algorithm variant.

 


 

Question 2. Find a permutation with no decreasing strips for which there exists a reversal that reduces the number of breakpoints.

How many permutations are there of 6 elements?  How many have a single breakpoint? How many permuations of 6 elements have exactly 2 breakpoints.

*Generalize your results from above for 6 elements to n elements.

 


 

Question 3. In the cookie game two players begin with two stacks of cookies of heights n and m. In each turn a player eats two cookies from one stack and one cookie from the other. The player who cannot complete his turn loses. Who will win? Describe the winning strategy for any value of n and m (Hint: try dynamic programming).

 


 

Question 4. Fill the global alignment dynamic programming matrix for strings TA and TTCA with affine scoring function defined by match premium 0, mismatch penalty -1, gap opening penalty -1, and gap extension penalty -1. Find all optimal global alignments.

 


 

Question 5. Consider the Manhattan Tourist problem that we discussed in class. Assume that we need to go back to the source (0,0) after we reach the sink (m, n), by going either to the north or to the west (from (m, n) to (0,0)).  And we want to maximize the total number of distinct attractions along the combination of these two paths. Design a dynamic programming algorithm to find this pair of paths.

*How many total paths are there from (0,0) to (m,n)?

 

 

Programming Problem. Modify the Longest Common Subsequence Algorithm compnents LCS(v,w) and PrintLCS(b,v,i,j) given on page 176 to return all answers. Turn in a a code listing and your program's output for the following sequences:

v = "tcagggctgggcataaaaggaagaacagggacagctgctgcttacatttgcttctgaaacaaccgtgttc\
      actagcaaccacaaacagacaccatggtgcatctgactgctgaagagaagagtcttgtctccggcctgtg\
      gggcaaggtgaatgtggacgaagttggcggtgaggccctgggcaggctgctgattgtctacccctggact\
      cagaggttctttgactcctttggggacctgtccactcctgatgctgttatgagcaatgctaaagtgaagg\
      cccatggcaagaaggtgctgaactcctttagtgatggcctgaaaaacctggacaacctcaagggcacctt\
      tgctaagctcagtgagcttcactgtgacaagctgcacgtggatcccgagaacttcaagctcctgggcaac\
      gtgcttgtgtgtgtgctggctcaccactttggcaaagaattcacccctcaggtgcaggctgcctatcaga\
      aggtggtggctggtgtggccaatgccctggctcacaagtaccattgagatcctggcctgtttcctggtga\
      ccactggaagaccctgtctccctaaattctttcttctgaactgggggaaataatgtccaccatcaagggt\
      atggcttctgcctaataaagaaccttcagctcaactttctgattcattc"

w = "acatttgcttctgacacaactgtgttcactagcaacctcaaacagacaccatggtgcatctgactcctgag\
        gagaagtctgccgttactgccctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcag\
        gctgctggtggtctacccttggacccagaggttctttgagtcctttggggatctgtccactcctgatgctg\
        ttatgggcaaccctaaggtgaaggctcatggcaagaaagtgctcggtgcctttagtgatggcctggctcac\
        ctggacaacctcaagggcacctttgccacactgagtgagctgcactgtgacaagctgcacgtggatcctga\
        gaacttcaggctcctgggcaacgtgctggtctgtgtgctggcccatcactttggcaaagaattcaccccac\
        cagtgcaggctgcctatcagaaagtggtggctggtgtggctaatgccctggcccacaagtatcactaagct\
        cgctttcttgctgtccaatttctattaaaggttcctttgttccctaagtccaactactaaactgggggata\
        ttatgaagggccttgagcatctggattctgcctaataaaaaacatttattttcattgc"



Posted by leonardo on 2011-10-02 22:16:48 (references Version 6)

In Question 2, when counting the number of breakpoints, should we consider about the element 0, n+1 and the possible breakpoints next to them?

Posted by mcmillan on 2011-10-03 23:54:17 (references Version 6)

Yes. You should consider breakpoints adjacent to the sequence extensions (0, n+1).

Posted by mcmillan on 2011-10-03 23:57:07 (references Version 6)

I fixed a bug in the PrintLCS( ) code given in class (the "print v[i]," should have been "print v[i-1]"). I also added code to print out the alignment.

Posted by leonardo on 2011-10-05 02:26:09 (references Version 6)

In the Programming Problem, when using the long v and w sequences given above as inputs, I got tons of answers since there are too many ways to get a LCS (if my code is correct). I think the program would not stop in an hour... I wonder anyone else has the same problem?




Site built using pyWeb version 1.10
© 2010 Leonard McMillan, Alex Jackson and UNC Computational Genetics