Ultrafast one‐pass FASTQ data preprocessing, quality control, and deduplication using fastp

Bibliographic details
Published in: iMeta, May 2023, Vol. 2, No. 2, e107
Author: Chen, Shifu
Format: Article
Language: English
Description
Abstract: A large amount of sequencing data is generated and processed every day as sequencing technology continues to evolve and sequencing applications expand. One consequence of this data explosion is the increasing cost and complexity of data processing. Preprocessing FASTQ data, which means removing adapter contamination, filtering low‐quality reads, and correcting erroneous bases, is an indispensable but resource‐intensive part of sequencing data analysis. Therefore, although many software applications have been developed to solve this problem, bioinformatics scientists and engineers are still pursuing faster, simpler, and more energy‐efficient software. Several years ago, the author developed fastp, an ultrafast all‐in‐one FASTQ data preprocessor with many modern features. The software has been adopted by many bioinformatics users and has been continuously maintained and updated. Since its first publication, fastp has been greatly improved, making it even faster and more powerful; for instance, the duplication evaluation module has been improved, and a new deduplication module has been added. This study aimed to introduce the new features of fastp and demonstrate how it was designed and implemented. Fastp is a widely adopted tool for FASTQ data preprocessing and quality control. It is ultrafast and versatile, and it can perform adapter removal, global or quality trimming, read filtering, unique molecular identifier processing, base correction, and many other actions within a single pass of data scanning. Fastp has been reconstructed and upgraded with several new features: compared with fastp 0.20.0, the new fastp 0.23.2 is 80% faster.
Highlights:
Fastp is an ultrafast tool that processes FASTQ data in a single pass.
Fastp has been redesigned to make it faster and to generate reproducible results.
Fastp introduces efficient FASTQ‐level deduplication without sorting the reads.
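The abstract's two central ideas, doing all filtering within a single pass over the reads and deduplicating without sorting, can be illustrated with a small sketch. This is not fastp's implementation (fastp is written in C++ and its internal data structures are not described in this record); it is a generic, single-end example under stated assumptions. The helper names (`parse_fastq`, `passes_filters`, `preprocess`) and the filter thresholds (minimum Phred quality 15, at most 40% low-quality bases, at most 5 N bases, minimum length 15, Phred+33 encoding) are hypothetical, illustrative choices. Duplicates are detected by remembering a hash of each kept sequence in a set, so each read is examined once and no sorting is required.

```python
import hashlib
from collections import Counter
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Read(NamedTuple):
    """One FASTQ record: header, sequence, separator and quality lines."""
    name: str
    seq: str
    plus: str
    qual: str


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Stream FASTQ records (four lines each) from any iterable of lines."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        rest = [next(it, None) for _ in range(3)]
        if None in rest:
            raise ValueError(f"truncated FASTQ record: {header!r}")
        seq, plus, qual = (line.rstrip("\r\n") for line in rest)
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield Read(header, seq, plus, qual)


def passes_filters(read: Read, min_qual: int = 15, max_low_frac: float = 0.4,
                   max_n: int = 5, min_len: int = 15, offset: int = 33) -> bool:
    """Hypothetical filter: length, N count and fraction of low-quality bases."""
    if len(read.seq) < max(min_len, 1):
        return False
    if read.seq.upper().count("N") > max_n:
        return False
    # Phred score = ASCII code minus offset (assumes Phred+33 encoding).
    low = sum(1 for ch in read.qual if ord(ch) - offset < min_qual)
    return low / len(read.qual) <= max_low_frac


def preprocess(lines: Iterable[str], **filter_kwargs) -> Tuple[List[Read], Counter]:
    """Filter and deduplicate reads in one pass, without sorting."""
    seen = set()      # 8-byte digests of sequences already kept
    stats = Counter()
    kept = []
    for read in parse_fastq(lines):
        stats["total"] += 1
        if not passes_filters(read, **filter_kwargs):
            stats["low_quality"] += 1
            continue
        # A 64-bit digest keeps memory per read small; a hash collision would
        # wrongly drop a read, which is very unlikely but not impossible.
        key = hashlib.blake2b(read.seq.encode(), digest_size=8).digest()
        if key in seen:
            stats["duplicate"] += 1
            continue
        seen.add(key)
        stats["kept"] += 1
        kept.append(read)
    return kept, stats
```

Usage, with a hypothetical file name: `with open("reads.fq") as fh: kept, stats = preprocess(fh)`. Filtering before deduplication, hashing only the sequence, and ignoring paired-end mates, adapters, and UMIs are simplifications of this sketch; a real preprocessor must also decide how to treat read pairs and how much memory the duplicate index may use.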
ISSN: 2770-596X, 2770-5986
DOI: 10.1002/imt2.107