In [6]:
%%javascript
var width = window.innerWidth || document.documentElement.clientWidth || document.body.clientWidth;
var height = window.innerHeight || document.documentElement.clientHeight || document.body.clientHeight;

IPython.notebook.kernel.execute("windowSize = (" + width + "," + height + ")");
// suitable for small screens
nbpresent.mode.tree.set(
    ["app", "theme-manager", "themes", "my-theme"], 
    {
    palette: {
        "blue": { id: "blue", rgb: [0, 153, 204] },
        "black": { id: "black", rgb: [0, 0, 0] },
        "white": { id: "white", rgb: [255, 255, 255] },
        "red": { id: "red", rgb: [240, 32, 32] },
        "gray": { id: "gray", rgb: [128, 128, 128] },
    },
    backgrounds: {
        "my-background": {
            "background-color": "white"
        }
    },
    "text-base": {
        "font-family": "Georgia",
        "font-size": 2.5
    },
    rules: {
        h1: {
            "font-size": 5.5,
            color: "blue",
            "text-align": "center"
        },
        h2: {
            "font-size": 3,
            color: "blue",
            "text-align": "center"
        },
        h3: {
            "font-size": 3,
            color: "black",
        },
        "ul li": {
            "font-size": 2.5,
            color: "black"
        },
        "ul li ul li": {
            "font-size": 2.0,
            color: "black"
        },
        "code": {
            "font-size": 1.6,
        },
        "pre": {
            "font-size": 1.6,
        }
    }
});


# Path Finding in Graphs

<img src="./Media/forkintheroad.jpg" width="600">

* Problem Set #2 will be posted before Friday

<p style="text-align: right; clear: right;">1</p>

# From Last Time

<table style="border: none;">
    <tbody>
    <tr style="border: none;">
    <td colspan="3" style="border: none; text-align: center;">
    <h3>Two methods for assembling the 5-mers from the sequence "<code>GACGGCGGCGCACGGCGCAA</code>"</h3>
    </td>
    </tr>
    <tr style="border: none;">
    <td style="border: none; vertical-align: top;" width="45%">
    <h3>Hamiltonian Path:</h3>
    <img src="./Media/HamiltonianV1.png" width="400">
    Find a path that passes through every <em>vertex</em> of this graph exactly once.
    </td>
    <td style="border: none;" width="10%">
    &nbsp;
    </td>
    <td style="border: none;  vertical-align: top;" width="45%">
    <h3>Eulerian Path:</h3>
    <img src="./Media/EulerV1.png" width="400">
    Find a path that passes through every <em>edge</em> of this graph exactly once.
    </td>
    </tr>
    </tbody>
</table>

<p style="text-align: right; clear: right;">2</p>

# De Bruijn's Problem

<table style="border: none;">
    <tbody>
    <tr style="border: none;">
    <td style="border: none; vertical-align: top; padding: 0px 40px; text-align: center;" width="40%">
    <h3>Nicolaas de Bruijn<br>(1918-2012)</h3>
    <img src="./Media/DeBruijn.jpg" width="200">
    <span style="font-size: 80%">A Dutch mathematician noted for his many contributions in the fields of graph theory, number theory, combinatorics and logic.</span>
    </td>
    <td style="border: none;  vertical-align: top;" width="60%">
    <h3>Minimal Superstring Problem:</h3>
    <p>Find the shortest sequence that contains all 
    $\left|\Sigma\right|^k$ strings of length $k$ from the alphabet $\Sigma$ 
    as substrings.</p>
    <p style="margin-left: 25px;"><b>Example:</b> All strings of length 3 from the alphabet {'0','1'}.</p>
    <img src="./Media/SuperExample.png" width="400">
    <p>He solved this problem by mapping it to a graph. Note that this particular problem leads to a cyclic sequence.</p>
    </td>
    </tr>
    </tbody>
</table>
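
To make the problem statement concrete, here is a minimal sketch (Python 3 syntax) that checks whether a cyclic sequence contains every length-*k* string over an alphabet. The function name `covers_all_kmers` and the test string `'00010111'` are illustrative choices made here; the string is not necessarily the one shown in the figure.

```python
import itertools

def covers_all_kmers(cyclic_seq, k, alphabet):
    """Return True if every length-k string over `alphabet` occurs in
    `cyclic_seq` when the sequence is read as a cycle (wrapping around)."""
    # Append the first k-1 symbols so wrap-around windows become ordinary ones
    unrolled = cyclic_seq + cyclic_seq[:k - 1]
    seen = {unrolled[i:i + k] for i in range(len(cyclic_seq))}
    wanted = {''.join(t) for t in itertools.product(alphabet, repeat=k)}
    return seen == wanted

# An 8-symbol cycle that contains all 2**3 = 8 binary 3-mers
print(covers_all_kmers('00010111', 3, '01'))   # True
# A different 8-symbol cycle that misses '010' and '111'
print(covers_all_kmers('00011011', 3, '01'))   # False
```

A cyclic answer cannot be shorter than $\left|\Sigma\right|^k$ symbols, because each starting position contributes only one window.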

<p style="text-align: right; clear: right;">3</p>

# De Bruijn's Graphs

<table style="border: none;">
    <tbody>
    <tr style="border: none;">
    <td style="border: none; vertical-align: top;" width="45%">
    <p>Minimal Superstrings can be constructed by taking a Hamiltonian path of an n-dimensional De Bruijn graph over k symbols (on this slide <em>n</em> is the string length and <em>k</em> is the alphabet size).</p>
    <img src="./Media/superstring.png" width="400">
    </td>
    <td style="border: none;" width="10%">
    &nbsp;
    </td>
    <td style="border: none;  vertical-align: top;" width="45%">
    <p>Or, equivalently, an Eulerian cycle of an<br>(n−1)-dimensional De Bruijn graph.</p>
    <img src="./Media/supereuler.png" width="400">
    </td>
    </tr>
    </tbody>
</table>
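
The two constructions can be sketched side by side. This is an illustrative sketch in Python 3 syntax, not the class used later in these slides; the function names `debruijn_vertex_graph` and `debruijn_edge_graph` are made up for this example.

```python
import itertools

def debruijn_vertex_graph(alphabet, n):
    """Strings of length n are the *vertices*. An edge u -> v exists when
    the last n-1 symbols of u equal the first n-1 symbols of v."""
    vertices = [''.join(t) for t in itertools.product(alphabet, repeat=n)]
    edges = [(u, u[1:] + c) for u in vertices for c in alphabet]
    return vertices, edges

def debruijn_edge_graph(alphabet, n):
    """Strings of length n are the *edges*. Each one joins its (n-1)-symbol
    prefix to its (n-1)-symbol suffix."""
    strings = [''.join(t) for t in itertools.product(alphabet, repeat=n)]
    vertices = sorted({s[:-1] for s in strings} | {s[1:] for s in strings})
    edges = [(s[:-1], s[1:]) for s in strings]
    return vertices, edges

hv, he = debruijn_vertex_graph('01', 4)
ev, ee = debruijn_edge_graph('01', 4)
print(len(hv), len(he))   # 16 32  (search for a Hamiltonian path here)
print(len(ev), len(ee))   # 8 16   (search for an Eulerian cycle here)
```

The second graph is half the size in both vertices and edges, and each of the 16 target strings appears in it exactly once as an edge.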

<p style="text-align: right; clear: right;">4</p>

# Solving Graph Problems on a Computer

* Graph Representations

<table style="border: none;">
    <tbody>
    <tr style="border: none;">
    <td style="border: none; vertical-align: top;" width="35%">
    <p>An example graph:</p>
    <img src="./Media/simpleGraph.png" width="200">
    </td>
    <td style="border: none; vertical-align: top;" width="30%">
    <p>An Adjacency Matrix:</p>
        <table>
        <tbody>
        <tr style="border: none;"><th> &nbsp;</th><th>A</th><th>B</th><th>C</th><th>D</th><th>E</th></tr>
        <tr style="border: none;"><th>A&nbsp;</th><td>0</td><td>1</td><td>0</td><td>0</td><td>1</td></tr>
        <tr style="border: none;"><th>B&nbsp;</th><td>0</td><td>0</td><td>1</td><td>1</td><td>0</td></tr>
        <tr style="border: none;"><th>C&nbsp;</th><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
        <tr style="border: none;"><th>D&nbsp;</th><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
        <tr style="border: none;"><th>E&nbsp;</th><td>0</td><td>1</td><td>1</td><td>1</td><td>0</td></tr>
        </tbody>
        </table>
    <p>An <em>n &times; n</em> matrix where A<sub>ij</sub> is 1 if there is an edge connecting
    the i<sup>th</sup> vertex to the j<sup>th</sup> vertex and 0 otherwise.</p>
    </td>
    <td style="border: none; vertical-align: top;" width="35%">
    <p>Adjacency Lists:</p>
    <p style="margin-left: 25px;">Edge = [(0,1), (0,4),<br>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(1,2), (1,3),<br>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(2,0),<br>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(3,0),<br>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(4,1), (4,2), (4,3)]</p>
    <p>An array or list of vertex pairs <em>(i,j)</em> indicating an edge from the i<sup>th</sup> vertex to the j<sup>th</sup> vertex.</p>
    </td>
    </tr>
    </tbody>
</table>
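
As a quick check that the two representations above describe the same graph, the following sketch (Python 3 syntax) converts the adjacency matrix into a list of vertex pairs and back. The helper names `matrix_to_edges` and `edges_to_matrix` are illustrative.

```python
# Adjacency matrix from the slide; rows and columns are A, B, C, D, E
matrix = [
    [0, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
]

def matrix_to_edges(matrix):
    """List every (i, j) whose matrix entry is nonzero."""
    return [(i, j) for i, row in enumerate(matrix)
                   for j, entry in enumerate(row) if entry]

def edges_to_matrix(edges, n):
    """Build an n x n 0/1 matrix from a list of (i, j) pairs."""
    result = [[0] * n for _ in range(n)]
    for i, j in edges:
        result[i][j] = 1
    return result

edges = matrix_to_edges(matrix)
print(edges)
# [(0, 1), (0, 4), (1, 2), (1, 3), (2, 0), (3, 0), (4, 1), (4, 2), (4, 3)]
print(edges_to_matrix(edges, 5) == matrix)   # True
```

The matrix always stores $n^2$ entries, while the pair list stores one entry per edge, which is why the list form is attractive for sparse graphs.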

<p style="text-align: right; clear: right;">5</p>

# An adjacency list graph object

In [7]:
class BasicGraph:
    def __init__(self, vlist=[]):
        """ Initialize a Graph with an optional vertex list """
        self.index = {v:i for i,v in enumerate(vlist)}    # looks up index given name
        self.vertex = {i:v for i,v in enumerate(vlist)}   # looks up name given index
        self.edge = []
        self.edgelabel = []
    def addVertex(self, label):
        """ Add a labeled vertex to the graph """
        index = len(self.index)
        self.index[label] = index
        self.vertex[index] = label
    def addEdge(self, vsrc, vdst, label='', repeats=True):
        """ Add a directed edge to the graph, with an optional label. 
        Repeated edges are distinct, unless repeats is set to False. """
        e = (self.index[vsrc], self.index[vdst])
        if (repeats) or (e not in self.edge):
            self.edge.append(e)
            self.edgelabel.append(label)

<p style="text-align: right; clear: right;">6</p>

# Usage example

Let's generate the vertices needed to find De Bruijn's superstring of 4-bit binary strings... and create a graph object using them.

In [8]:
import itertools

binary = [''.join(t) for t in itertools.product('01', repeat=4)]

print binary

G1 = BasicGraph(binary)
for vsrc in binary:
    G1.addEdge(vsrc,vsrc[1:]+'0')
    G1.addEdge(vsrc,vsrc[1:]+'1')
    
print
print "Vertex indices = ", G1.index
print
print "Index to Vertex = ",G1.vertex
print
print "Edges = ", G1.edge

['0000', '0001', '0010', '0011', '0100', '0101', '0110', '0111', '1000', '1001', '1010', '1011', '1100', '1101', '1110', '1111']

Vertex indices =  {'0110': 6, '0111': 7, '0000': 0, '0001': 1, '0011': 3, '0010': 2, '0101': 5, '0100': 4, '1111': 15, '1110': 14, '1100': 12, '1101': 13, '1010': 10, '1011': 11, '1001': 9, '1000': 8}

Index to Vertex =  {0: '0000', 1: '0001', 2: '0010', 3: '0011', 4: '0100', 5: '0101', 6: '0110', 7: '0111', 8: '1000', 9: '1001', 10: '1010', 11: '1011', 12: '1100', 13: '1101', 14: '1110', 15: '1111'}

Edges =  [(0, 0), (0, 1), (1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7), (4, 8), (4, 9), (5, 10), (5, 11), (6, 12), (6, 13), (7, 14), (7, 15), (8, 0), (8, 1), (9, 2), (9, 3), (10, 4), (10, 5), (11, 6), (11, 7), (12, 8), (12, 9), (13, 10), (13, 11), (14, 12), (14, 13), (15, 14), (15, 15)]


<p style="text-align: right; clear: right;">7</p>

# The resulting graph

<img src="./Media/superstring.png" width="700">


<p style="text-align: right; clear: right;">8</p>

# The Hamiltonian Path Problem


Next, we need an *algorithm* to find a path in a graph that visits every node exactly once, if such a path exists.

### How?
<img src="./Media/BruteForce.jpg">
### Approach:
* Enumerate every possible path (all permutations of *N* vertices). Python's <code>itertools.permutations()</code> does this.
* Verify that there is an edge connecting all *N-1* pairs of adjacent vertices

<p style="text-align: right; clear: right;">9</p>

# All vertex permutations &equals; *every*&nbsp; possible path

* A simple graph with 4 vertices

<img src="./Media/Graph4Nodes.png" width="200">

In [9]:
import itertools

start = 0
for path in itertools.permutations([0,1,2,3]):
    if (path[0] != start):
        print
        start = path[0]
    print path,

(0, 1, 2, 3) (0, 1, 3, 2) (0, 2, 1, 3) (0, 2, 3, 1) (0, 3, 1, 2) (0, 3, 2, 1)
(1, 0, 2, 3) (1, 0, 3, 2) (1, 2, 0, 3) (1, 2, 3, 0) (1, 3, 0, 2) (1, 3, 2, 0)
(2, 0, 1, 3) (2, 0, 3, 1) (2, 1, 0, 3) (2, 1, 3, 0) (2, 3, 0, 1) (2, 3, 1, 0)
(3, 0, 1, 2) (3, 0, 2, 1) (3, 1, 0, 2) (3, 1, 2, 0) (3, 2, 0, 1) (3, 2, 1, 0)


Only some of these vertex permutations are actual paths in the graph.
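
A small sketch of that filtering step (Python 3 syntax). The edge set below is hypothetical, chosen only for illustration; it is not read from the figure above.

```python
import itertools

# Hypothetical directed edges on vertices 0..3 (an assumption for this example)
edges = {(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)}

def is_path(perm, edges):
    """True when every adjacent pair in the permutation is joined by an edge."""
    return all((a, b) in edges for a, b in zip(perm, perm[1:]))

valid = [p for p in itertools.permutations(range(4)) if is_path(p, edges)]
print(len(valid), "of 24 permutations are paths")   # 4 of 24 permutations are paths
print(valid)   # [(0, 1, 2, 3), (1, 2, 3, 0), (2, 3, 0, 1), (3, 0, 1, 2)]
```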

<p style="text-align: right; clear: right;">10</p>

# A Hamiltonian Path Algorithm

* Test each vertex permutation to see if it is a valid path
* Let's extend our *BasicGraph* into an *EnhancedGraph* class
* Create the superstring graph and find a Hamiltonian Path

In [11]:
import itertools

class EnhancedGraph(BasicGraph):
    def hamiltonianPath(self):
        """ A Brute-force method for finding a Hamiltonian Path. 
        Basically, all possible N! paths are enumerated and checked
        for edges. Since edges can be reused there are no distinctions
        made for *which* version of a repeated edge. """
        for path in itertools.permutations(sorted(self.index.values())):
            for i in xrange(len(path)-1):
                if ((path[i],path[i+1]) not in self.edge):
                    break
            else:
                return [self.vertex[i] for i in path]
        return []
    
G1 = EnhancedGraph(binary)
for vsrc in binary:
    G1.addEdge(vsrc,vsrc[1:]+'0')
    G1.addEdge(vsrc,vsrc[1:]+'1')

# WARNING: takes about 30 mins
%time path = G1.hamiltonianPath()
print path
superstring = path[0] + ''.join([path[i][3] for i in xrange(1,len(path))])
print superstring

CPU times: user 31min 55s, sys: 9.21 s, total: 32min 4s
Wall time: 31min 54s
['0000', '0001', '0010', '0100', '1001', '0011', '0110', '1101', '1010', '0101', '1011', '0111', '1111', '1110', '1100', '1000']
0000100110101111000


<p style="text-align: right; clear: right;">11</p>

# Visualizing the result

<img src="./Media/superPathV1.png" width="600">

<p style="text-align: right; clear: right;">12</p>

# Is this solution unique?

<table style="border: none;">
<tr style="border: none;">
<td style="border: none;">
How about the path = "0000111101001011000"<br><br>
&bullet; Our Hamiltonian path finder produces a single path, if one exists.<br>
&bullet; How would you modify it to produce every valid Hamiltonian path?<br>
&bullet; How long would that take?<br><br>
One of De Bruijn's contributions is that there are:<br><br>
$$\frac{(\sigma!)^{\sigma^{k-1}}}{\sigma^k}$$
<br>paths leading to superstrings where $\sigma = \left|\Sigma\right|$.
</td>
<td style="border: none;">
<img src="./Media/superPathV2.png" width="500">
</td>
</tr>
</table>



In our case $\sigma = 2$ and *k = 4*, so there should be $\frac{(2!)^{2^3}}{2^4} = 16$ paths (ignoring those that are just different starting points on the same cycle).
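
The count can be evaluated directly from the formula above. A minimal sketch in Python 3 syntax; the function name `count_debruijn_sequences` is illustrative.

```python
from math import factorial

def count_debruijn_sequences(sigma, k):
    """Evaluate (sigma!)^(sigma^(k-1)) / sigma^k using exact integers."""
    return factorial(sigma) ** (sigma ** (k - 1)) // sigma ** k

print(count_debruijn_sequences(2, 4))   # 16     (our 4-bit binary case)
print(count_debruijn_sequences(2, 3))   # 2
print(count_debruijn_sequences(4, 2))   # 20736  (all 2-mers over a 4-letter alphabet)
```

The last line hints at how quickly the number of distinct solutions grows with the alphabet size.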
<p style="text-align: right; clear: right;">13</p>

# Brute Force is slow!

* There are *N!* possible paths for *N* vertices.
  - Our 16 vertices give 20,922,789,888,000 possible paths

<img src="./Media/Maze.gif" width="300">

* There is a fairly simple ***Branch-and-Bound evaluation strategy***
  - Grow the path using only *valid* edges
* Use recursion to extend paths along graph *edges*
* Trick is to maintain two lists:
   - The *path so far*, where each adjacent pair of vertices is connected by an edge
   - *Unused* vertices. When the unused list becomes empty we've found a path

<p style="text-align: right; clear: right;">14</p>

# A Branch-and-Bound Hamiltonian Path Finder

In [17]:
import itertools

class ImprovedGraph(BasicGraph):
    def SearchTree(self, path, verticesLeft):
        """ A recursive Branch-and-Bound Hamiltonian Path search. 
        Paths are extended one node at a time using only available
        edges from the graph. """
        if (len(verticesLeft) == 0):
            self.PathV2result = [self.vertex[i] for i in path]
            return True
        for v in verticesLeft:
            if (len(path) == 0) or ((path[-1],v) in self.edge):
                if self.SearchTree(path+[v], [r for r in verticesLeft if r != v]):
                    return True
        return False
    def hamiltonianPath(self):
        """ A wrapper function for invoking the Branch-and-Bound 
        Hamiltonian Path search. """
        self.PathV2result = []
        self.SearchTree([],sorted(self.index.values()))                
        return self.PathV2result

G1 = ImprovedGraph(binary)
for vsrc in binary:
    G1.addEdge(vsrc,vsrc[1:]+'0')
    G1.addEdge(vsrc,vsrc[1:]+'1')
%timeit path = G1.hamiltonianPath()
path = G1.hamiltonianPath()    # %timeit does not keep the assignment, so run it once more
print path
superstring = path[0] + ''.join([path[i][3] for i in xrange(1,len(path))])
print superstring

10000 loops, best of 3: 127 µs per loop
['0000', '0001', '0010', '0100', '1001', '0011', '0110', '1101', '1010', '0101', '1011', '0111', '1111', '1110', '1100', '1000']
0000100110101111000


That's a considerable speedup, but it *still* might be too slow for some graphs ...
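
With the search this fast, the earlier question of producing *every* Hamiltonian path becomes practical for this graph. The sketch below (Python 3 syntax) is one possible answer, written as a standalone generator instead of a method; the name `all_hamiltonian_paths` is illustrative. It keeps searching after each success instead of returning at the first one.

```python
import itertools

def all_hamiltonian_paths(vertices, edges):
    """Yield every Hamiltonian path, extending partial paths along edges only."""
    edge_set = set(edges)

    def extend(path, unused):
        if not unused:
            yield list(path)
            return
        for v in unused:
            if not path or (path[-1], v) in edge_set:
                for full in extend(path + [v], [r for r in unused if r != v]):
                    yield full

    return extend([], list(vertices))

binary = [''.join(t) for t in itertools.product('01', repeat=4)]
edges = [(v, v[1:] + c) for v in binary for c in '01']

paths = list(all_hamiltonian_paths(binary, edges))
print(len(paths))                                  # 256
print(sum(1 for p in paths if p[0] == '0000'))     # 16
```

The 256 paths are the 16 distinct cycles predicted by De Bruijn's formula, each counted once for every one of its 16 possible starting vertices.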

<p style="text-align: right; clear: right;">15</p>

# Is there a better Hamiltonian Path Algorithm?
* Better in what sense?
* Better = the number of steps to find a solution is polynomial in either the number of edges or vertices
    - Polynomial: ${variable}^{constant}$
    - Exponential: ${constant}^{variable}$ or worse, ${variable}^{variable}$
    - For example, our Brute-Force algorithm was $O(V!)$, which is bounded above by $O(V^V)$, where *V* is the number of vertices in our graph, a problem variable
* We can practically solve only small problems if the algorithm for solving them takes a number of steps that grows exponentially with a problem variable (i.e. the number of vertices); otherwise we must be satisfied with heuristic or *approximate* solutions
* Can we *prove* that there is no algorithm that can find a Hamiltonian Path in a time that is polynomial in the number of vertices or edges in the graph?
    - No one has, and there is a [million-dollar reward](http://www.claymath.org/millennium-problems) if you can!
    - Given an oracle that can suggest an answer (i.e. *Nondeterministically*),
    - it's easy to verify that the answer is correct in *Polynomial* time.
    - If you instead find a polynomial-time algorithm, a lot of known similar problems will suddenly become solvable using your algorithm
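
The "easy to verify" half can be made concrete. The sketch below (Python 3 syntax) checks a proposed Hamiltonian path in time roughly proportional to the size of the graph; the function name `is_hamiltonian_path` is illustrative. The candidate is the path printed by the searches earlier in these slides.

```python
import itertools

def is_hamiltonian_path(path, vertices, edges):
    """Verify a candidate: each vertex appears exactly once and every
    adjacent pair in the path is joined by an edge."""
    edge_set = set(edges)
    visits_each_once = (len(path) == len(vertices)
                        and set(path) == set(vertices))
    follows_edges = all((a, b) in edge_set for a, b in zip(path, path[1:]))
    return visits_each_once and follows_edges

binary = [''.join(t) for t in itertools.product('01', repeat=4)]
edges = [(v, v[1:] + c) for v in binary for c in '01']

candidate = ['0000', '0001', '0010', '0100', '1001', '0011', '0110', '1101',
             '1010', '0101', '1011', '0111', '1111', '1110', '1100', '1000']
print(is_hamiltonian_path(candidate, binary, edges))         # True
print(is_hamiltonian_path(candidate[::-1], binary, edges))   # False
```

Checking an answer takes one pass over the path, while finding it may require exploring a huge search tree. That gap is the heart of the open question.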

<table style="border: none;">
<tr style="border: none;">
<td style="border: none;"><img src="./Media/ComplexityClasses.png"></td>
<td style="border: none;"><img src="./Media/ComplexityCollapses.png"></td>
</tr>
</table>

<p style="text-align: right; clear: right;">16</p>

# De Bruijn's Key Insight

<table style="border: none;">
    <tbody>
    <tr style="border: none;">
    <td style="border: none; vertical-align: top;" width="50%">
    <div style="font-size: 80%;">
    <p>De Bruijn realized that Minimal Superstrings were ***Eulerian cycles*** in a (k−1)-dimensional "De Bruijn graph" (i.e. a graph where the desired strings are *edges*, and vertices are the (k-1)-mer suffixes and prefixes of the string set).</p>
    <p>He also knew that Euler had an ingenious way to solve this problem.</p>
    <p>Recall Euler's desire to construct a tour where each bridge was crossed only once.</p>
    <ul>
    <li style="font-size: 100%;">Start at any vertex *v*, and follow unused edges until you return to *v*</li>
    <li style="font-size: 100%;">As long as there exists any vertex *u* that belongs to the current tour, but has adjacent edges that are not part of the tour
      <ul>
      <li>Start a new trail from *u*</li>
      <li>Follow unused edges until returning to *u*</li>
      <li>Join the new trail to the original tour</li>
      </ul>
    </li>
    </ul>
    <p>He didn't solve the general Hamiltonian Path problem, but he was able to remap the Minimal Superstring problem to a simpler problem. Note that every Minimal Superstring Problem can be formulated as a Hamiltonian Path in some graph, but the converse is not true. Instead, he found a clever mapping of every Minimal Superstring Problem to an Eulerian Path problem.</p>
    <p>Let's demonstrate using the islands and bridges shown to the right</p>
    </div>
    </td>
    <td style="border: none; vertical-align: top;" width="50%">
    <img src="./Media/IslandGraph.png" width="400">
    </td>
    </tr>
    </tbody>
</table>

<p style="text-align: right; clear: right;">17</p>

# An algorithm for finding an Eulerian cycle

Our first path:
<img src="./Media/EulerTourPart1.png" width="600">

Take a side-trip, and merge it in:
<img src="./Media/EulerTourPart2.png" width="600">

<p style="text-align: right; clear: right;">18</p>

# Continue making side trips

Merging in a second side-trip:
<img src="./Media/EulerTourPart3.png" width="600">

Merging in a third side-trip:
<img src="./Media/EulerTourPart4.png" width="600">

<p style="text-align: right; clear: right;">19</p>

# Repeat until there are no more side trips to take
Merging in a final side-trip:
<img src="./Media/EulerTourPart5.png" width="600">

This algorithm requires a number of steps that is linear in the number of graph edges, $O(E)$. The number of edges in a general graph is $E = O(V^2)$ (the adjacency matrix tells us this).
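
Reaching the $O(E)$ bound in practice depends on never rescanning the edge list. One common way to do that is a stack-based formulation of the same side-trip idea, usually credited to Hierholzer. The sketch below (Python 3 syntax) is that alternative formulation, not the code developed in these slides; the name `eulerian_cycle` is illustrative, and it assumes the graph is connected and balanced.

```python
from collections import defaultdict
import itertools

def eulerian_cycle(edges, start):
    """Stack-based Eulerian cycle. Each edge is pushed and popped once.
    Assumes a connected graph in which every vertex is balanced."""
    out_edges = defaultdict(list)
    for src, dst in edges:
        out_edges[src].append(dst)
    stack, tour = [start], []
    while stack:
        v = stack[-1]
        if out_edges[v]:
            # Keep walking along an unused edge
            stack.append(out_edges[v].pop())
        else:
            # Dead end: this vertex is finished, so emit it (in reverse order)
            tour.append(stack.pop())
    tour.reverse()
    return tour

# Edges are the 4-bit strings, vertices are their 3-bit prefixes and suffixes
strings = [''.join(t) for t in itertools.product('01', repeat=4)]
edges = [(s[:-1], s[1:]) for s in strings]
tour = eulerian_cycle(edges, '000')
print(len(tour))                                     # 17 vertices, 16 edges
print(sorted(zip(tour, tour[1:])) == sorted(edges))  # True
```

The dead-end vertices are emitted last-first, which is what splices each side trip into the right place without any list insertion.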

<p style="text-align: right; clear: right;">20</p>

# Converting to code

In [18]:
    # A new method for our Graph Class
    def eulerianPath(self):
        graph = [(src,dst) for src,dst in self.edge]
        currentVertex = self.verifyAndGetStart()
        path = [currentVertex]
        # "next" is the list index where vertices get inserted into our tour
        # it starts at the end (i.e. same as appending), but later "side-trips" will insert in the middle
        next = 1
        while len(graph) > 0:
            # follows a path until it ends
            for edge in graph:
                if (edge[0] == currentVertex):
                    currentVertex = edge[1]
                    graph.remove(edge)
                    path.insert(next, currentVertex)
                    next += 1
                    break
            else:
                # Look for side-trips along the path
                for edge in graph:
                    try:
                        # insert our side-trip after the "u" vertex that it starts from
                        next = path.index(edge[0]) + 1
                        currentVertex = edge[0]
                        break
                    except ValueError:
                        continue
                else:
                    print "There is no path!"
                    return False
        return path

Some issues with our code:
  * Where do we start our tour? (The mysterious verifyAndGetStart() method)
  * Where will it end?
  * How do we know that each side-trip will rejoin the graph at the same point where it began?

<p style="text-align: right; clear: right;">21</p>

# Euler's Theorems

* A graph is balanced if for every vertex the number of incoming edges equals the number of outgoing edges:

$$in(v) = out(v)$$

* **Theorem 1:**  A connected graph has an *Eulerian Cycle* if and only if each of its vertices is balanced.
  - **Sketch of Proof:**
  - In mid-tour of a valid Euler cycle, there must be a path onto an island and another path off
  - This is true until no paths exist
  - Thus every vertex must be balanced


* **Theorem 2**:  A connected graph has an *Eulerian Path* if and only if it contains exactly two semi-balanced vertices and all others are balanced.
  - Exceptions are allowed for the start and end of the tour
  - A single start vertex can have one more outgoing path than incoming paths
  - A single end vertex can have one more incoming path than outgoing paths 

$$\mbox{Semi-balanced vertex: }\left|in(v) - out(v)\right| = 1$$

  - One of the semi-balanced vertices, with $out(v) = in(v) + 1$ is the start of the tour
  - The other semi-balanced vertex, with $in(v) = out(v) + 1$ is the end of the tour
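
Both theorems have two parts: a degree condition and a connectivity condition. The sketch below (Python 3 syntax) tests both; the names `euler_status` and `is_weakly_connected` are illustrative, and "connected" is interpreted here as connected when edge directions are ignored, which is an assumption about the intended meaning.

```python
from collections import Counter

def is_weakly_connected(vertices, edges):
    """True if all vertices are reachable when edge direction is ignored."""
    vertices = list(vertices)
    if not vertices:
        return True
    neighbors = {v: set() for v in vertices}
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    seen, frontier = {vertices[0]}, [vertices[0]]
    while frontier:
        for w in neighbors[frontier.pop()]:
            if w not in seen:
                seen.add(w)
                frontier.append(w)
    return len(seen) == len(vertices)

def euler_status(vertices, edges):
    """Return 'cycle', 'path', or 'none' according to Theorems 1 and 2."""
    out_deg = Counter(src for src, dst in edges)
    in_deg = Counter(dst for src, dst in edges)
    starts = [v for v in vertices if out_deg[v] - in_deg[v] == 1]
    ends = [v for v in vertices if in_deg[v] - out_deg[v] == 1]
    unbalanced = [v for v in vertices if abs(in_deg[v] - out_deg[v]) > 1]
    if unbalanced or not is_weakly_connected(vertices, edges):
        return 'none'
    if not starts and not ends:
        return 'cycle'
    if len(starts) == 1 and len(ends) == 1:
        return 'path'
    return 'none'

print(euler_status('abc', [('a', 'b'), ('b', 'c'), ('c', 'a')]))   # cycle
print(euler_status('abc', [('a', 'b'), ('b', 'c')]))               # path
print(euler_status('ab', [('a', 'a'), ('b', 'b')]))                # none
```

The last example is fully balanced but splits into two pieces, so the degree test alone would wrongly accept it.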

<p style="text-align: right; clear: right;">22</p>

# VerifyAndGetStart code

In [19]:
    # More new methods for the Graph Class
    def degrees(self):
        """ Returns two dictionaries with the inDegree and outDegree
        of each node from the graph. """
        inDegree = {}
        outDegree = {}
        for src, dst in self.edge:
            outDegree[src] = outDegree.get(src, 0) + 1
            inDegree[dst] = inDegree.get(dst, 0) + 1
        return inDegree, outDegree
    def verifyAndGetStart(self):
        inDegree, outDegree = self.degrees()
        start, end = 0, 0
        # node 0 will be the starting node if an Euler cycle is found
        for vert in self.vertex.iterkeys():
            ins = inDegree.get(vert,0)
            outs = outDegree.get(vert,0)
            if (ins == outs):
                continue
            elif (ins - outs == 1):
                end = vert
            elif (outs - ins == 1):
                start = vert
            else:
                start, end = -1, -1
                break
        if (start >= 0) and (end >= 0):
            return start
        else:
            return -1

<p style="text-align: right; clear: right;">23</p>

# A New Graph Class

In [20]:
import itertools

class AwesomeGraph(ImprovedGraph):
    def degrees(self):
        """ Returns two dictionaries with the inDegree and outDegree
        of each node from the graph. """
        inDegree = {}
        outDegree = {}
        for src, dst in self.edge:
            outDegree[src] = outDegree.get(src, 0) + 1
            inDegree[dst] = inDegree.get(dst, 0) + 1
        return inDegree, outDegree
    def verifyAndGetStart(self):
        inDegree, outDegree = self.degrees()
        start = 0
        end = 0
        for vert in self.vertex.iterkeys():
            ins = inDegree.get(vert,0)
            outs = outDegree.get(vert,0)
            if (ins == outs):
                continue
            elif (ins - outs == 1):
                end = vert
            elif (outs - ins == 1):
                start = vert
            else:
                start, end = -1, -1
                break
        if (start >= 0) and (end >= 0):
            return start
        else:
            return -1
    def eulerianPath(self):
        graph = [(src,dst) for src,dst in self.edge]
        currentVertex = self.verifyAndGetStart()
        path = [currentVertex]
        # "next" is where vertices get inserted into our tour
        # it starts at the end (i.e. it is the same as appending),
        # but later "side-trips" will insert in the middle
        next = 1
        while len(graph) > 0:
            for edge in graph:
                if (edge[0] == currentVertex):
                    currentVertex = edge[1]
                    graph.remove(edge)
                    path.insert(next, currentVertex)
                    next += 1
                    break
            else:
                for edge in graph:
                    try:
                        next = path.index(edge[0]) + 1
                        currentVertex = edge[0]
                        break
                    except ValueError:
                        continue
                else:
                    print "There is no path!"
                    return False
        return path
    def eulerEdges(self, path):
        edgeId = {}
        for i in xrange(len(self.edge)):
            edgeId[self.edge[i]] = edgeId.get(self.edge[i], []) + [i]
        edgeList = []
        for i in xrange(len(path)-1):
            edgeList.append(self.edgelabel[edgeId[path[i],path[i+1]].pop()])            
        return edgeList
    def render(self, highlightPath=[]):
        """ Outputs a version of the graph that can be rendered
        using graphviz tools (http://www.graphviz.org/)."""
        edgeId = {}
        for i in xrange(len(self.edge)):
            edgeId[self.edge[i]] = edgeId.get(self.edge[i], []) + [i]
        edgeSet = set()
        for i in xrange(len(highlightPath)-1):
            src = self.index[highlightPath[i]]
            dst = self.index[highlightPath[i+1]]
            edgeSet.add(edgeId[src,dst].pop())
        result = ''
        result += 'digraph {\n'
        result += '   graph [nodesep=2, size="10,10"];\n'
        for index, label in self.vertex.iteritems():
            result += '    N%d [shape="box", style="rounded", label="%s"];\n' % (index, label)
        for i, e in enumerate(self.edge):
            src, dst = e
            result += '    N%d -> N%d' % (src, dst)
            label = self.edgelabel[i]
            if (len(label) > 0):
                if (i in edgeSet):
                    result += ' [label="%s", penwidth=3.0]' % (label)
                else:
                    result += ' [label="%s"]' % (label)
            elif (i in edgeSet):
                result += ' [penwidth=3.0]'                
            result += ';\n'                
        result += '    overlap=false;\n'
        result += '}\n'
        return result

<b>Note:</b> I also added an eulerEdges() method to the class. The Eulerian Path algorithm returns a list of vertices along the path, which is consistent with the Hamiltonian Path algorithm. However, in our case, we are less interested in the series of vertices visited than we are in the series of edges. Thus, eulerEdges() returns the edge labels along a path.

<p style="text-align: right; clear: right;">24</p>

# Finding Minimal Superstrings with an Euler Path

In [23]:
binary = [''.join(t) for t in itertools.product('01', repeat=4)]

nodes = sorted(set([code[:-1] for code in binary] + [code[1:] for code in binary]))
G2 = AwesomeGraph(nodes)
for code in binary:
   # Here I give each edge a label
   G2.addEdge(code[:-1],code[1:],code)

%timeit path = G2.eulerianPath()
path = G2.eulerianPath()    # %timeit does not keep the assignment, so run it once more
print nodes
print path
print G2.eulerEdges(path)

10000 loops, best of 3: 30.2 µs per loop
['000', '001', '010', '011', '100', '101', '110', '111']
[0, 0, 1, 3, 7, 7, 6, 5, 3, 6, 4, 1, 2, 5, 2, 4, 0]
['0000', '0001', '0011', '0111', '1111', '1110', '1101', '1011', '0110', '1100', '1001', '0010', '0101', '1010', '0100', '1000']


Perhaps we should have called it *WickedAwesomeGraph*!

<p style="text-align: right; clear: right;">25</p>

# Our graph and its Euler path 
* In this case our graph was fully balanced, so the Euler Path is a cycle.
* Our tour starts arbitrarily with the first vertex, '000'

<img src="./Media/supereuler.png" width="500">

000 &rarr; 000 &rarr; 001 &rarr; 011 &rarr; 111 &rarr; 111 &rarr; 110 &rarr; 101 &rarr; 011 &rarr; 110 &rarr; 100 &rarr; 001 &rarr; 010 &rarr; 101 &rarr; 010 &rarr; 100 &rarr; 000

superstring = "<code>0000111101100101000</code>"
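
The superstring can be spelled directly from the edge labels along the tour: take the first label whole, then append the last symbol of each label that follows. A minimal sketch in Python 3 syntax, using the edge labels printed earlier; the helper name `spell` is illustrative.

```python
import itertools

# Edge labels along the Euler tour, as returned by eulerEdges() above
labels = ['0000', '0001', '0011', '0111', '1111', '1110', '1101', '1011',
          '0110', '1100', '1001', '0010', '0101', '1010', '0100', '1000']

def spell(labels):
    """Overlap consecutive labels, which share all but one symbol."""
    return labels[0] + ''.join(label[-1] for label in labels[1:])

superstring = spell(labels)
print(superstring)   # 0000111101100101000

# Every 4-bit string should appear somewhere in the result
all_4mers = [''.join(t) for t in itertools.product('01', repeat=4)]
print(all(kmer in superstring for kmer in all_4mers))   # True
```

The linear string has 16 + 3 = 19 symbols. Read as a cycle, the final three symbols repeat the first three and can be dropped.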
<p style="text-align: right; clear: right;">26</p>

# Next Time

* We return to genome assembly
<img src="./Media/Eureka.gif" width="500">

<p style="text-align: right; clear: right;">27</p>