A fixed penalty σ is given to every indel
This can be too severe a penalty for a series of 100 consecutive indels, which is charged 100σ
Gap: a contiguous sequence of indels in one of the rows
Modify the scoring for a gap of length x to be:
-(ρ + σx)
where ρ + σ > 0 is the penalty for introducing a gap (the gap opening penalty)
and σ is the cost of extending it further (the gap extension penalty), with ρ + σ >> σ,
because you do not want to add too much of a penalty for further extending the gap once it is opened
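A minimal sketch comparing the fixed and affine models for a long gap. The function names and the parameter values are hypothetical, chosen only to illustrate the formula -(ρ + σx):

```python
def linear_gap_score(x, sigma):
    """Score of x consecutive indels when every indel costs sigma."""
    return -sigma * x


def affine_gap_score(x, rho, sigma):
    """Score of one gap of length x under the affine model -(rho + sigma*x)."""
    return -(rho + sigma * x) if x > 0 else 0


# Hypothetical parameter values (not from the slides).
fixed_sigma = 2        # per-indel penalty in the fixed model
rho, sigma = 10, 0.5   # affine model: opening costs rho + sigma, extending costs sigma

print(linear_gap_score(100, fixed_sigma))  # -200
print(affine_gap_score(1, rho, sigma))     # -10.5
print(affine_gap_score(100, rho, sigma))   # -60.0
```

With these values a single indel is more expensive under the affine model, while a gap of 100 indels is much cheaper than 100 independent indels.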
To reflect affine gap penalties we have to add “long” horizontal and vertical edges to the edit graph.
Each such edge of length x should have weight -ρ - x·σ
There are many such edges!
Adding them to the graph increases the running time of the alignment algorithm by a factor of n (where n is the length of the sequences)
So the complexity increases from $O(n^2)$ to $O(n^3)$
Levels:
A jumping penalty is assigned to moving from the main level to either the upper level or the lower level (-ρ - σ)
There is a gap extension penalty for each continuation on a level other than the main level (-σ)
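The three levels can be sketched as three dynamic programming tables, which keeps the running time quadratic. This is an illustrative sketch, not a reference implementation: the function name, the match/mismatch scores and the default ρ and σ are assumptions, and returning from the upper or lower level to the main level is assumed to be free.

```python
def affine_alignment_score(v, w, match=1, mismatch=-1, rho=2, sigma=1):
    """Best global alignment score of v and w with affine gap penalties."""
    NEG = float("-inf")
    n, m = len(v), len(w)
    # lower: inside a gap in w (vertical moves); upper: inside a gap in v
    # (horizontal moves); main: matches and mismatches.
    lower = [[NEG] * (m + 1) for _ in range(n + 1)]
    upper = [[NEG] * (m + 1) for _ in range(n + 1)]
    main = [[NEG] * (m + 1) for _ in range(n + 1)]
    main[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:
                # Extend a gap (-sigma) or jump down from main (-(rho + sigma)).
                lower[i][j] = max(lower[i - 1][j] - sigma,
                                  main[i - 1][j] - (rho + sigma))
            if j > 0:
                upper[i][j] = max(upper[i][j - 1] - sigma,
                                  main[i][j - 1] - (rho + sigma))
            if i > 0 or j > 0:
                diag = NEG
                if i > 0 and j > 0:
                    delta = match if v[i - 1] == w[j - 1] else mismatch
                    diag = main[i - 1][j - 1] + delta
                # Returning to the main level costs nothing (assumption).
                main[i][j] = max(diag, lower[i][j], upper[i][j])
    return main[n][m]


print(affine_alignment_score("AAAA", "AA"))  # -2
```

In the usage line, two matches (+2) plus one gap of length 2, scored -(2 + 2·1) = -4, beat two separate gaps of length 1, which would cost -6.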
In a similar way, we represent alignment of 3 sequences as a 3-row matrix
A T _ G C G _
A _ C G T _ A
A T C A C _ A
For 3 sequences of length n, the run time is proportional to $7n^3$, i.e. $O(n^3)$
For k sequences, build a k-dimensional Manhattan grid, with run time proportional to $(2^k-1)n^k$, i.e. $O(2^kn^k)$
Conclusion: the dynamic programming approach for aligning two sequences is easily extended to k sequences, but it is impractical due to its exponential running time
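The factor $2^k-1$ counts the incoming edges of each vertex: every sequence either advances by one symbol or takes a gap, and at least one must advance. A small sketch (the function name is hypothetical) that enumerates these moves:

```python
from itertools import product


def predecessor_moves(k):
    """All nonzero 0/1 step vectors in a k-dimensional alignment grid."""
    return [step for step in product((0, 1), repeat=k) if any(step)]


for k in (2, 3, 4):
    print(k, len(predecessor_moves(k)))
# 2 3
# 3 7
# 4 15
```

For k = 3 this gives the 7 moves behind the $7n^3$ estimate.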
Every multiple alignment induces pairwise alignments
x: AC-GCGG-C
y: AC-GC-GAG
z: GCCGC-GAG
Induces:
x: ACGCGG-C; x: AC-GCGG-C; y: AC-GCGAG
y: ACGC-GAG; z: GCCGC-GAG; z: GCCGCGAG
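Inducing a pairwise alignment amounts to keeping two rows and deleting every column in which both rows have a gap. A minimal sketch, with a hypothetical function name, applied to the x, y, z rows above:

```python
def induced_pairwise(row_a, row_b, gap="-"):
    """Project a multiple alignment onto two rows, dropping all-gap columns."""
    kept = [(a, b) for a, b in zip(row_a, row_b)
            if not (a == gap and b == gap)]
    return "".join(a for a, _ in kept), "".join(b for _, b in kept)


x = "AC-GCGG-C"
y = "AC-GC-GAG"
z = "GCCGC-GAG"

print(induced_pairwise(x, y))  # ('ACGCGG-C', 'ACGC-GAG')
print(induced_pairwise(x, z))  # ('AC-GCGG-C', 'GCCGC-GAG')
print(induced_pairwise(y, z))  # ('AC-GCGAG', 'GCCGCGAG')
```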
Do Pairwise Alignments imply a Multiple Alignment?
Given 3 arbitrary pairwise alignments:
x: ACGCTGG-C; x: AC-GCTGG-C; y: AC-GC-GAG
y: ACGC--GAG; z: GCCGCA-GAG; z: GCCGCAGAG
Can we construct a multiple alignment that induces them?
NOT ALWAYS
Why? Because pairwise alignments may be arbitrarily inconsistent
In some cases optimal pairwise alignments can be combined into a multiple alignment, but in others they cannot, because one pairwise alignment makes a choice that is inconsistent with the overall best choice
AAAATTTT-------- ----AAAATTTT----
----TTTTGGGG---- -OR- --------TTTTGGGG
--------GGGGAAAA GGGGAAAA--------
Is there another way?
We used profile scores earlier when we discussed Motif finding
- A G G C T A T C A C C T G
T A G - C T A C C A - - - G
C A G - C T A C C A - - - G
C A G - C T A T C A C - G G
C A G - C T A T C G C - G G
A 0 5 0 0 0 0 5 0 0 4 0 0 0 0
C 3 0 0 0 5 0 0 2 5 0 3 1 0 0
G 0 0 5 1 0 0 0 0 0 1 0 0 2 5
T 1 0 0 0 0 5 0 3 0 0 0 0 1 0
- 1 0 0 4 0 0 0 0 0 0 2 4 2 0
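The profile is just a per-column count of each symbol in the alignment. A minimal sketch (the function name is hypothetical, and the gap character is written as a plain hyphen) that reproduces the table from the five aligned rows:

```python
def build_profile(rows, alphabet="ACGT-"):
    """Count, for every column, how often each symbol of `alphabet` occurs."""
    width = len(rows[0])
    assert all(len(r) == width for r in rows), "rows must have equal length"
    return {s: [sum(r[c] == s for r in rows) for c in range(width)]
            for s in alphabet}


rows = [
    "-AGGCTATCACCTG",
    "TAG-CTACCA---G",
    "CAG-CTACCA---G",
    "CAG-CTATCAC-GG",
    "CAG-CTATCGC-GG",
]

profile = build_profile(rows)
for symbol, counts in profile.items():
    print(symbol, *counts)
```

Each column of the result sums to 5, the number of rows in the alignment.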
Thus far we have aligned sequences against other sequences
Can we align a sequence against a profile?
Can we align a profile against a profile?
A more general version of the multi-alignment problem:
Given two alignments, can we align them?
x: GGGCACTGCAT
y: GGTTACGTC-- Alignment 1
z: GGGAACTGCAG
w: GGACGTACC-- Alignment 2
v: GGACCT-----
Idea: don’t use the sequences, but align their profiles
x: GGGCAC=TGCAT
y: GGTTAC=GTC--
z: GGGAAC=TGCAG Combined Alignment
|| || | |
w: GG==ACGTACC--
v: GG==ACCT-----
s1: GATTCA s2: GTCTGA s3: GATATT s4: GTCAGC
s2: GTCTGA
s4: GTCAGC (score = 2)

s1: GAT-TCA
s2: G-TCTGA (score = 1)

s1: GAT-TCA
s3: GATAT-T (score = 1)

s1: GATTCA--
s4: G-T-CAGC (score = 0)

s2: G-TCTGA
s3: GATAT-T (score = -1)

s3: GAT-ATT
s4: G-TCAGC (score = -1)
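The six scores are consistent with a simple scheme of +1 per match and -1 per mismatch or indel. That scheme is inferred from the numbers, not stated in the slides. A sketch under that assumption (names are hypothetical) that rescores the six alignments and picks the pair a greedy method would merge first:

```python
def pair_score(row_a, row_b, match=1, mismatch=-1, indel=-1, gap="-"):
    """Column-by-column score of two already-aligned rows of equal length."""
    assert len(row_a) == len(row_b)
    total = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            total += indel
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


alignments = {
    ("s2", "s4"): ("GTCTGA", "GTCAGC"),
    ("s1", "s2"): ("GAT-TCA", "G-TCTGA"),
    ("s1", "s3"): ("GAT-TCA", "GATAT-T"),
    ("s1", "s4"): ("GATTCA--", "G-T-CAGC"),
    ("s2", "s3"): ("G-TCTGA", "GATAT-T"),
    ("s3", "s4"): ("GAT-ATT", "G-TCAGC"),
}

scores = {pair: pair_score(*rows) for pair, rows in alignments.items()}
best = max(scores, key=scores.get)
print(best, scores[best])  # ('s2', 's4') 2
```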
s2: G T C T G A
    | | |   |
s4: G T C A G C
→ s2,4: G T C t/a G a/c

s1 : G A T T C A
s3 : G A T A T T
s2,4: G T C t/a G a/c

s1 : GAT-TCA
s3 : GATAT-T (score = 1 + 1 + 1 - 1 + 1 - 1 - 1 = 1)

s1 : GAT-TCA
s2,4: G-TCtGa (score = 2 - 2 + 2 - 2 + 1 - 1 + 1 = 1)

s3 : GATAT-T
s2,4: G-TCtGa (score = 2 - 2 + 2 - 2 + 1 - 1 - 1 = -1)
Progressive alignment is a variation of a greedy profile alignment algorithm with a somewhat more intelligent strategy for choosing the order of alignments.
Progressive alignment works well for close sequences, but deteriorates for distant sequences
CLUSTAL OMEGA
In ClustalW, ‘W’ stands for ‘weighted’ (different parts of the alignment are weighted differently)
Three-step process
Pairwise alignment