Detecting exact breakpoints of deletions with diversity in hepatitis B viral genomic DNA from next-generation sequencing data

Bibliographic details
Published in: Methods (San Diego, Calif.), 2017-10, Vol. 129, pp. 24-32
Main authors: Cheng, Ji-Hong; Liu, Wen-Chun; Chang, Ting-Tsung; Hsieh, Sun-Yuan; Tseng, Vincent S.
Format: Article
Language: English
Description
Abstract:
• The proposed VirDelect detects exact breakpoints of deletions while taking the characteristics of HBV genomic DNA into account.
• Three phases are proposed to reduce the computational cost of split-read alignment efficiently without losing accuracy.
• VirDelect was validated on both simulated and real data to demonstrate its feasibility.
Many studies have suggested that deletions in the hepatitis B virus (HBV) genome are associated with the development of progressive liver disease, ultimately even resulting in hepatocellular carcinoma (HCC). Among the methods for detecting deletions from next-generation sequencing (NGS) data, few consider the characteristics of viruses, such as high evolution rates and high divergence among different HBV genomes. Sequencing highly divergent HBV genomes with NGS technology produces millions of reads, so detecting exact deletion breakpoints in such large and complex data incurs a very high computational cost. We propose a novel analytical method named VirDelect (Virus Deletion Detect), which uses split-read alignment to detect exact breakpoints and a diversity variable to account for high divergence in single-end read data, reducing computational cost without losing accuracy. We used four simulated read datasets and two real paired-end read datasets of HBV genome sequences to verify VirDelect's accuracy by score functions. The experimental results show that VirDelect outperforms the state-of-the-art method Pindel in accuracy score on all simulated datasets, and that VirDelect had only two base errors even on the real datasets. VirDelect also achieves high accuracy on single-end as well as paired-end read data. VirDelect can serve as an effective and efficient bioinformatics tool for physiologists and is applicable to the analysis of other genomes that resemble HBV in genome length and high divergence.
The software program of VirDelect can be downloaded at https://sourceforge.net/projects/virdelect/.
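The abstract names split-read alignment as the basis for locating exact deletion breakpoints. The Python sketch below illustrates only that general idea, under simplifying assumptions: one deletion per read, ungapped placement of each read part, and a brute-force search over every split point. It is not VirDelect's implementation and reproduces neither its three phases nor its diversity variable. The names `Deletion`, `find_split_deletion`, `min_anchor`, and `max_mismatches` are hypothetical, and `max_mismatches` is only a crude, assumed stand-in for tolerating sequence divergence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Deletion:
    start: int  # 0-based reference index of the first deleted base
    end: int    # 0-based reference index just past the last deleted base

    @property
    def length(self) -> int:
        return self.end - self.start


def mismatches(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_split_deletion(
    read: str, reference: str, min_anchor: int = 5, max_mismatches: int = 0
) -> Optional[Deletion]:
    """Hypothetical brute-force split-read search for a single deletion.

    Each split point divides the read into a prefix and a suffix, and each
    part is placed ungapped on the reference. A deletion is reported when the
    suffix lands strictly downstream of where the prefix ends; the gap between
    the two placements is the deleted reference segment. Among equally good
    placements, the first one found (the leftmost split) is kept.
    """
    best: Optional[Deletion] = None
    best_cost: Optional[int] = None
    for split in range(min_anchor, len(read) - min_anchor + 1):
        prefix, suffix = read[:split], read[split:]
        for i in range(len(reference) - len(prefix) + 1):
            cost_p = mismatches(prefix, reference[i:i + len(prefix)])
            if cost_p > max_mismatches:
                continue
            del_start = i + len(prefix)
            # The suffix must start at least one base after the prefix ends,
            # otherwise the read is contiguous and there is no deletion.
            for j in range(del_start + 1, len(reference) - len(suffix) + 1):
                cost = cost_p + mismatches(suffix, reference[j:j + len(suffix)])
                if cost <= max_mismatches and (best_cost is None or cost < best_cost):
                    best, best_cost = Deletion(del_start, j), cost
    return best


if __name__ == "__main__":
    ref = "GATTACACCGTAGCTTGACCATGGTCAAGTCGGATCCTAG"
    read = ref[:12] + ref[22:]  # simulated read carrying a 10-base deletion
    d = find_split_deletion(read, ref)
    print(d, d.length, ref[d.start:d.end])
```

Because the search is exhaustive over split points and reference positions, it would be far too slow for millions of reads; this is the kind of cost the paper's three phases are described as reducing. When the bases flanking a deletion are repeated, several breakpoint positions explain the read equally well, and this sketch simply keeps the leftmost one.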
ISSN: 1046-2023, 1095-9130
DOI: 10.1016/j.ymeth.2017.08.005