Problem Set #5 mcmillan / Version 20

Comp 555: BioAlgorithms -- Fall 2013

Problem Set #5

Issued: 11/19/2013      Due: In class 12/03/2013

 


Homework Information: Some of the problems are probably too long to attempt the night before the due date, so plan accordingly. No late homework will be accepted. However, your lowest homework score will be dropped. Feel free to work with others, but the work you hand in should be your own.

 

 

Question 1. Consider the following distance matrix:

    A   B   C   D   E   F
A   0  18  15  21   6  16
B       0  23  19  20  24
C           0  26  17  19
D               0  23  27
E                   0  18
F                       0

 

  1. Verify that this distance matrix is additive using the four-point condition. Show your work.
  2. Design an efficient method to determine “delta” (the trimming parameter) for an iteration of the AdditivePhylogeny algorithm.
  3. Construct a phylogenetic tree corresponding to this distance matrix using the AdditivePhylogeny algorithm. Show the intermediate steps: the intermediate distance matrices during the collapse phase, followed by the corresponding trees during the expansion phase.
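The checks in parts 1 and 2 can be sketched in a few lines of Python. This is a minimal sketch, not a reference solution, and it rests on stated assumptions: "delta" is taken to be the shortest limb (leaf-edge) length, so trimming shortens every leaf edge by delta and every off-diagonal distance by 2*delta; and the O(n) per-leaf limb computation, which fixes one reference leaf j instead of minimizing over all pairs, assumes the matrix is additive and its tree has no degree-2 internal vertices. The helper names (`symmetric`, `four_point_ok`, `limb_length`, `trimming_parameter`, `trim`, `degenerate_triple`) are hypothetical. Only one collapse step is shown; the full recursion and tree expansion are not.

```python
from itertools import combinations, permutations

LEAVES = "ABCDEF"
# Upper triangle of the Question 1 matrix; row i starts at column i.
UPPER = [
    [0, 18, 15, 21, 6, 16],
    [0, 23, 19, 20, 24],
    [0, 26, 17, 19],
    [0, 23, 27],
    [0, 18],
    [0],
]


def symmetric(upper):
    """Mirror an upper-triangular list of rows into a full symmetric matrix."""
    n = len(upper)
    d = [[0] * n for _ in range(n)]
    for i, row in enumerate(upper):
        for offset, value in enumerate(row):
            d[i][i + offset] = d[i + offset][i] = value
    return d


def four_point_ok(d):
    """For every quartet, the two largest of the three pairwise sums must be equal."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted([d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]])
        if sums[1] != sums[2]:
            return False
    return True


def limb_length(d, i):
    """min over k of (d[i][j] + d[i][k] - d[j][k]) / 2 with one reference leaf j fixed.

    Assumes d is additive and its tree has no degree-2 internal vertices, so
    fixing j loses nothing and each limb costs O(n) instead of O(n^2).
    """
    j = 1 if i == 0 else 0
    return min((d[i][j] + d[i][k] - d[j][k]) / 2 for k in range(len(d)) if k not in (i, j))


def trimming_parameter(d):
    """delta = the shortest limb over all leaves, O(n^2) for the whole matrix."""
    return min(limb_length(d, i) for i in range(len(d)))


def trim(d, delta):
    """Shorten every leaf edge by delta, i.e. subtract 2*delta off the diagonal."""
    n = len(d)
    return [[0 if a == b else d[a][b] - 2 * delta for b in range(n)] for a in range(n)]


def degenerate_triple(d):
    """Return (i, j, k) with d[i][j] + d[j][k] == d[i][k] (j lies on the i-k path), or None."""
    for i, j, k in permutations(range(len(d)), 3):
        if d[i][j] + d[j][k] == d[i][k]:
            return i, j, k
    return None


D = symmetric(UPPER)
delta = trimming_parameter(D)
triple = degenerate_triple(trim(D, delta))
print(four_point_ok(D), delta, [LEAVES[x] for x in triple] if triple else None)
```

The brute-force limb length (minimizing over all pairs j, k) is O(n^2) per leaf and O(n^3) per iteration; it is a safe fallback if the no-degree-2 assumption is in doubt.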

 


Question 2. Consider the following SNP panel, where rows are haplotypes and columns are SNPs:

 

    S1  S2  S3  S4  S5  S6
H1   0   0   0   0   0   0
H2   1   0   1   0   0   1
H3   1   0   0   1   0   0
H4   1   0   1   0   0   1
H5   0   0   0   1   1   1
H6   1   1   0   1   0   1

 

Compute and depict the maximal compatible intervals using each of the following:

  1. A Left-to-Right Scan
  2. A Right-to-Left Scan
  3. An Uber Scan
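The three scans can be sketched with the four-gamete test, under which two SNPs are incompatible exactly when all four gametes 00, 01, 10, 11 appear among the haplotypes. This is a minimal sketch with several assumptions: each greedy scan starts a new interval at the SNP that broke the previous one, and the "Uber Scan" is taken here to mean enumerating every maximal compatible interval (an interval whose SNPs are pairwise compatible and that cannot be extended on either side). Check these readings against the lecture's definitions. Function names are hypothetical, and intervals are reported as 0-based (first, last) column indices.

```python
from itertools import combinations

# Rows are haplotypes H1..H6, columns are SNPs S1..S6 (panel above).
PANEL = [
    "000000",
    "101001",
    "100100",
    "101001",
    "000111",
    "110101",
]


def compatible(panel, a, b):
    """Four-gamete test: SNPs a and b are compatible unless 00, 01, 10, 11 all occur."""
    return len({(row[a], row[b]) for row in panel}) < 4


def is_compatible_interval(panel, lo, hi):
    return all(compatible(panel, a, b) for a, b in combinations(range(lo, hi + 1), 2))


def left_to_right(panel):
    """Greedy scan: grow an interval until the next SNP conflicts, then restart at that SNP."""
    m = len(panel[0])
    intervals, start = [], 0
    for s in range(1, m):
        if not all(compatible(panel, t, s) for t in range(start, s)):
            intervals.append((start, s - 1))
            start = s
    intervals.append((start, m - 1))
    return intervals


def right_to_left(panel):
    """The same greedy scan run from the right: mirror the columns, then map indices back."""
    m = len(panel[0])
    mirrored = [row[::-1] for row in panel]
    return sorted((m - 1 - hi, m - 1 - lo) for lo, hi in left_to_right(mirrored))


def all_maximal(panel):
    """Every compatible interval that cannot be extended on either side."""
    m = len(panel[0])
    result = []
    for lo in range(m):
        hi = lo
        while hi + 1 < m and is_compatible_interval(panel, lo, hi + 1):
            hi += 1
        # [lo, hi] is the longest compatible interval starting at lo; it is maximal
        # only if it reaches past the longest interval starting at lo - 1.
        if not result or result[-1][1] < hi:
            result.append((lo, hi))
    return result


for scan in (left_to_right, right_to_left, all_maximal):
    print(scan.__name__, [(lo + 1, hi + 1) for lo, hi in scan(PANEL)])
```

The printed pairs are 1-based, so (1, 3) means S1 through S3. The `all_maximal` loop relies on the fact that any sub-interval of a compatible interval is itself compatible.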

 


 

 

Programming Problem.  (Please submit code by emailing kemal@cs.unc.edu with the subject "COMP 555 PS5")

Consider the following file of genotypes, Gtypes.csv, with four samples. Each row gives the genotypes at one genomic position. The columns correspond to a marker name, a chromosome, and one column of genotype calls for each of the four samples. The column labelled "G2" holds a second-generation cross with the pedigree G2 = FVB/NJ x (PWK/PhJ x WSB/EiJ), and thus is a descendant of the samples genotyped in the other three columns. This animal has one chromosome inherited from its maternal parent (FVB/NJ), while its second chromosome is a mix of its grandparents (PWK/PhJ and WSB/EiJ).

The objective of this programming project is to use a Hidden Markov Model (HMM) to infer the genomic origin (PWK/PhJ or WSB/EiJ) at each marker of the second chromosome.

This problem closely resembles the "Fair-Bet Casino" problem discussed both in the textbook and in class. There are two possible states at every marker: P (PWK/PhJ origin) and W (WSB/EiJ origin). In each state, the HMM emits a G2 genotype call with the following likelihoods:

P State
PWK     FVB     |  f     p     H     N
f       f       |  0.95  0.01  0.02  0.02
p       f       |  0.02  0.02  0.94  0.02
N or H  f       |  0.40  0.10  0.30  0.20
p       N or H  |  0.10  0.40  0.30  0.20
N       H       |  0.10  0.10  0.40  0.40
H       N       |  0.10  0.10  0.40  0.40
N       N       |  0.20  0.20  0.20  0.40
H       H       |  0.20  0.20  0.40  0.20

W State
WSB     FVB     |  f     w     H     N
f       f       |  0.95  0.01  0.02  0.02
w       f       |  0.02  0.02  0.94  0.02
N or H  f       |  0.40  0.10  0.30  0.20
w       N or H  |  0.10  0.40  0.30  0.20
N       H       |  0.10  0.10  0.40  0.40
H       N       |  0.10  0.10  0.40  0.40
N       N       |  0.20  0.20  0.20  0.40
H       H       |  0.20  0.20  0.40  0.20

Here 'f' represents the FVB "nucleotide" genotype call, and 'p'/'w' represent the PWK/WSB nucleotide when it differs from 'f'. The probability of transitioning from the P state to the W state, or vice versa, between adjacent markers is 0.01.
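As in the Fair-Bet Casino, the most likely P/W labelling can be found with the Viterbi algorithm. The sketch below is a minimal, hypothetical illustration, not the expected submission. Its assumptions: raw calls are single nucleotides plus 'H' (assumed heterozygous) and 'N' (assumed no-call); any G2 nucleotide other than the FVB call is encoded as 'p'/'w'; both states are equally likely at the first marker. Reading Gtypes.csv is omitted because only its column order is described. The names `EMIT`, `emission`, `viterbi`, and the toy data are illustrative.

```python
import math

# Emission tables above (the P and W tables hold identical numbers).
# Key: (grandparent call, FVB call) encoded relative to FVB, where "x" stands
# for 'p' (P state) or 'w' (W state) and "NH" means "N or H".
# Value: probability of each encoded G2 call.
EMIT = {
    ("f", "f"):  {"f": 0.95, "x": 0.01, "H": 0.02, "N": 0.02},
    ("x", "f"):  {"f": 0.02, "x": 0.02, "H": 0.94, "N": 0.02},
    ("NH", "f"): {"f": 0.40, "x": 0.10, "H": 0.30, "N": 0.20},
    ("x", "NH"): {"f": 0.10, "x": 0.40, "H": 0.30, "N": 0.20},
    ("N", "H"):  {"f": 0.10, "x": 0.10, "H": 0.40, "N": 0.40},
    ("H", "N"):  {"f": 0.10, "x": 0.10, "H": 0.40, "N": 0.40},
    ("N", "N"):  {"f": 0.20, "x": 0.20, "H": 0.20, "N": 0.40},
    ("H", "H"):  {"f": 0.20, "x": 0.20, "H": 0.40, "N": 0.20},
}
AMBIGUOUS = ("H", "N")  # assumed: H = heterozygous call, N = no call


def emission(parent, fvb, g2):
    """P(G2 call | state); `parent` is the PWK call for state P or the WSB call for W."""
    if fvb not in AMBIGUOUS:
        if parent in AMBIGUOUS:
            key = ("NH", "f")
        else:
            key = ("f", "f") if parent == fvb else ("x", "f")
    elif parent not in AMBIGUOUS:
        key = ("x", "NH")
    else:
        key = (parent, fvb)
    # Assumption: any G2 nucleotide other than the FVB call is encoded as 'x'.
    symbol = g2 if g2 in AMBIGUOUS else ("f" if g2 == fvb else "x")
    return EMIT[key][symbol]


def viterbi(markers, switch=0.01):
    """Most likely P/W path; `markers` is a list of (pwk, wsb, fvb, g2) calls."""
    states = ("P", "W")
    stay, move = math.log(1 - switch), math.log(switch)

    def log_emit(state, marker):
        pwk, wsb, fvb, g2 = marker
        return math.log(emission(pwk if state == "P" else wsb, fvb, g2))

    # Assumption: both states are equally likely at the first marker.
    score = {s: math.log(0.5) + log_emit(s, markers[0]) for s in states}
    backpointers = []
    for marker in markers[1:]:
        new_score, pointer = {}, {}
        for s in states:
            # Best predecessor of s, including the transition cost.
            prev = max(states, key=lambda r: score[r] + (stay if r == s else move))
            new_score[s] = score[prev] + (stay if prev == s else move) + log_emit(s, marker)
            pointer[s] = prev
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final state.
    path = [max(states, key=score.get)]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]


# Hypothetical toy data: five markers that favor P, then five that favor W.
toy = [("A", "G", "A", "A")] * 5 + [("A", "G", "A", "H")] * 5
print("".join(viterbi(toy)))
```

Working in log space avoids numerical underflow on long chromosomes. If per-marker confidence is wanted instead of a single best path, posterior decoding with the forward-backward algorithm is the usual alternative.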




© 2010 Leonard McMillan, Alex Jackson and UNC Computational Genetics