Producing Genomic Sequences after Genome Scaffolding with Ambiguous Paths: Complexity, Approximation and Lower Bounds


Bibliographic details
Published in: Algorithmica, 2021-07, Vol. 83 (7), pp. 2063-2095
Authors: Davot, Tom; Chateau, Annie; Giroudeau, Rodolphe; Weller, Mathias; Tabary, Dorine
Format: Article
Language: English
Description
Abstract: Scaffolding is the final step in assembling Next Generation Sequencing data, in which pre-assembled contiguous regions ("contigs") are oriented and ordered using information that links them (for example, the mapping of paired-end reads). Because the genomes of some species are highly repetitive, we allow some contigs to be placed multiple times, thereby generalizing established computational models for this problem. We study the subsequent problems induced by translating solutions of the model back into actual sequences, proposing models and analyzing the complexity of the resulting computational problems. We identify both polynomial-time solvable and NP-hard special cases, for example under planarity or bounded-degree restrictions. Finally, we propose two polynomial-time approximation algorithms with respect to the cut/weight score.
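
The basic operation the abstract refers to, turning an ordered and oriented list of contigs back into a sequence, can be illustrated with a minimal Python sketch. This is not the authors' method: the function names `scaffold_to_sequence` and `reverse_complement`, the `(contig_id, orientation)` encoding, and the `gap_len` parameter are hypothetical. It assumes a reverse orientation means taking the reverse complement, that gaps are filled with a fixed run of `N`, and that a single unambiguous path is already given, which is precisely what the paper's problems do not assume once contigs may repeat.

```python
# Hypothetical sketch: assemble a sequence from one given scaffold path.
# Assumptions: orientation "+" keeps a contig as-is, "-" uses its reverse
# complement; consecutive contigs are separated by gap_len 'N' characters.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def scaffold_to_sequence(path, contigs, gap_len=0):
    """Concatenate oriented contigs along `path`.

    path    -- list of (contig_id, orientation) pairs; a contig_id may
               appear more than once (a repeated placement)
    contigs -- dict mapping contig_id to its DNA string
    """
    pieces = []
    for contig_id, orientation in path:
        seq = contigs[contig_id]
        if orientation == "-":
            seq = reverse_complement(seq)
        elif orientation != "+":
            raise ValueError(f"unknown orientation {orientation!r}")
        pieces.append(seq)
    return ("N" * gap_len).join(pieces)


if __name__ == "__main__":
    contigs = {"c1": "ACGT", "c2": "GGA"}
    path = [("c1", "+"), ("c2", "-"), ("c1", "+")]  # c1 placed twice
    print(scaffold_to_sequence(path, contigs, gap_len=2))
```

With `gap_len=2` the example prints `ACGTNNTCCNNACGT`. The difficulty studied in the paper arises before this step: when a contig is placed several times, the scaffold graph may admit several paths, and choosing which sequences to output is the computational problem.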
ISSN: 0178-4617, 1432-0541
DOI: 10.1007/s00453-021-00819-6