Systems and methods for de novo assembly of nucleotide sequence reads using a modified string graph
Format: Patent
Language: English
Abstract: Systems and methods are presented for automatically assembling, de novo, a set of unordered sequence reads into one or more larger nucleotide sequences. The method first creates two identical sets of the reads, divides each read in both sets into smaller, sorted mer sequences, and then compares the mers of each read in set 1 with the mers of each read in set 2 to exhaustively identify overlapping segments. The overlap information is used to construct a modified assembly string graph; traversing this graph produces a sorted string graph layout file that lists all the reads ordered left to right, together with the approximate starting offset of each read. The sorted layout file is then processed by a novel multiple sequence alignment system that uses mer matches between all the reads overlapping a given position to place matching individual bases from each read into columns, from which an overall consensus sequence is determined.
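A minimal Python sketch of the mer-based overlap search summarized above, assuming exact matching on the forward strand only (no reverse complements, no sequencing errors inside shared mers). Each read is cut into k-mers tagged with their start positions and sorted; the sorted lists from the two identical read sets are merge-joined, and every shared mer votes for an offset between the two reads. The function names, the mer length `k` and the `min_support` threshold are hypothetical illustration choices, not details taken from the patent, and a real assembler would use a much larger `k`.

```python
from collections import Counter

# Hypothetical sketch: function names, k and min_support are illustrative
# assumptions, not details from the patent. Forward strand, exact matches only.


def sorted_mers(read: str, k: int) -> list[tuple[str, int]]:
    """Return every k-mer of `read` with its start position, sorted by k-mer."""
    return sorted((read[i:i + k], i) for i in range(len(read) - k + 1))


def shared_mer_offsets(a: list[tuple[str, int]],
                       b: list[tuple[str, int]]) -> Counter:
    """Merge-join two sorted mer lists; each shared mer votes for pos_a - pos_b."""
    offsets: Counter = Counter()
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i][0] < b[j][0]:
            i += 1
        elif a[i][0] > b[j][0]:
            j += 1
        else:
            mer = a[i][0]
            i_end, j_end = i, j
            # A mer may occur several times per read (repeats): pair every copy.
            while i_end < len(a) and a[i_end][0] == mer:
                i_end += 1
            while j_end < len(b) and b[j_end][0] == mer:
                j_end += 1
            for _, pos_a in a[i:i_end]:
                for _, pos_b in b[j:j_end]:
                    offsets[pos_a - pos_b] += 1
            i, j = i_end, j_end
    return offsets


def find_overlaps(reads: list[str], k: int = 4, min_support: int = 2):
    """Return (a, b, offset, support): read b starts `offset` bases into read a."""
    set1 = [sorted_mers(r, k) for r in reads]
    set2 = [sorted_mers(r, k) for r in reads]  # the identical second set
    overlaps = []
    for a, mers_a in enumerate(set1):
        for b, mers_b in enumerate(set2):
            if a == b:
                continue
            votes = shared_mer_offsets(mers_a, mers_b)
            if not votes:
                continue
            offset, support = votes.most_common(1)[0]
            # offset > 0 keeps one direction per pair (read b starts to the
            # right of read a); offset 0 (duplicates) is ignored in this sketch.
            if offset > 0 and support >= min_support:
                overlaps.append((a, b, offset, support))
    return overlaps
```

With reads drawn from `ATGCGTACCTGAAGTCCA` at offsets 0, 5 and 10, `find_overlaps(["ATGCGTACCT", "TACCTGAAGT", "GAAGTCCA"])` reports read 1 starting 5 bases into read 0 and read 2 starting 5 bases into read 1, each supported by two shared 4-mers. Comparing every read against every other keeps the search exhaustive, as the abstract describes, at a cost quadratic in the number of reads.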
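The layout and consensus stages described in the abstract can be sketched in the same hedged way. This is not the patent's modified string graph or its mer-guided multiple sequence alignment. As a simplified stand-in, the hypothetical `layout_reads` places reads by walking overlap edges breadth-first from read 0 and sorting them by start offset, and the hypothetical `consensus` stacks every read into columns at its offset and takes a majority vote per column. The sketch assumes substitution errors only, so no gaps are ever inserted into the columns.

```python
from collections import Counter, deque

# Hypothetical simplified stand-in for the patent's string-graph layout and
# mer-guided multiple sequence alignment: substitution errors only, no gaps.


def layout_reads(n_reads: int,
                 overlaps: list[tuple[int, int, int]]) -> list[tuple[int, int]]:
    """Place reads left to right from (a, b, offset) overlaps.

    (a, b, offset) means read b starts `offset` bases after read a starts.
    Returns sorted (start, read_index) pairs for the reads connected to read 0.
    """
    edges: dict[int, list[tuple[int, int]]] = {i: [] for i in range(n_reads)}
    for a, b, offset in overlaps:
        edges[a].append((b, offset))
        edges[b].append((a, -offset))  # the same overlap seen from read b
    pos = {0: 0}
    queue = deque([0])
    while queue:
        a = queue.popleft()
        for b, offset in edges[a]:
            if b not in pos:  # first placement wins; conflicting offsets are ignored
                pos[b] = pos[a] + offset
                queue.append(b)
    shift = min(pos.values())  # make the leftmost read start at 0
    return sorted((p - shift, i) for i, p in pos.items())


def consensus(reads: list[str], layout: list[tuple[int, int]]) -> str:
    """Stack each read into columns at its start offset; majority vote per column."""
    length = max(start + len(reads[i]) for start, i in layout)
    columns = [Counter() for _ in range(length)]
    for start, i in layout:
        for j, base in enumerate(reads[i]):
            columns[start + j][base] += 1
    # Ties go to the base seen first; uncovered columns are skipped.
    return "".join(col.most_common(1)[0][0] for col in columns if col)
```

For example, take reads `ATGCGTACCT`, `TACCTGAAGT` and `GAAGTCCA` (offsets 0, 5 and 10 of `ATGCGTACCTGAAGTCCA`) plus a fourth read `GCGTTCCTGA`, a copy of zero-based positions 2 to 11 with an A-to-T substitution, placed 2 bases into read 0. With the overlaps `[(0, 1, 5), (1, 2, 5), (0, 3, 2)]`, the layout is `[(0, 0), (2, 3), (5, 1), (10, 2)]`, the erroneous base is outvoted by the two reads that agree, and the consensus is `ATGCGTACCTGAAGTCCA`.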