Abundant human DNA contamination identified in non-primate genome databases

Bibliographic details
Published in:	PloS one 2011-02, Vol.6 (2), p.e16410
Main authors:	Longo, Mark S; O'Neill, Michael J; O'Neill, Rachel J
Format:	Article
Language:	English
Subjects:	Algorithms; Animals; Archives & records; Bacteria; Base Sequence; Bioinformatics; Biology; Contamination; Data bases; Databases, Genetic - standards; Deoxyribonucleic acid; DNA; DNA Contamination; DNA sequencing; Gene sequencing; Genome; Genomes; Genomics; Human performance; Humans; Mice; Molecular Sequence Data; Nucleotide sequence; Phylogeny; Primates; Pseudomonas; Pseudomonas - genetics; Pseudomonas aeruginosa; Screens; Sequence Analysis, DNA; Sequence Homology, Nucleic Acid; Zea mays - genetics

Description
Abstract:	During routine screens of the NCBI databases using human repetitive elements, we discovered an improbably high level of nucleotide identity across a broad range of phyla. To ascertain whether databases containing DNA sequences, genome assemblies, and trace archive reads were contaminated with human sequences, we performed an in-depth search for sequences of human origin in non-human species. Using a primate-specific SINE, AluY, we screened 2,749 non-primate public databases from NCBI, Ensembl, JGI, and UCSC and found 492 to be contaminated with human sequence. These represent species ranging from bacteria (B. cereus) to plants (Z. mays) to fish (D. rerio), with examples found in most phyla. The identification of such extensive contamination of human sequence across databases and sequence types warrants caution among the sequencing community in future sequencing efforts, such as human re-sequencing. We discuss issues this may raise and present data that give insight into how this may be occurring.
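The screen described in the abstract, in which a primate-specific repeat (AluY) is used as a probe to flag non-primate databases that contain human sequence, can be illustrated with a minimal sketch. This is not the authors' pipeline: it swaps in simple exact k-mer matching on both strands in place of whatever alignment search the study used, and the parameters `k` and `min_shared`, the function names, and the toy query sequence are all hypothetical, chosen only for illustration. A real screen would use the actual AluY consensus and a proper alignment tool.

```python
from typing import Iterable


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A<->T, C<->G, N stays N)."""
    comp = str.maketrans("ACGTN", "TGCAN")
    return seq.upper().translate(comp)[::-1]


def is_contaminated(reads: Iterable[str], query: str,
                    k: int = 21, min_shared: int = 5) -> bool:
    """Hypothetical rule: flag a database if any read shares at least
    min_shared k-mers with the query repeat on either strand.
    k and min_shared are illustrative, not values from the study."""
    query_kmers = kmers(query, k) | kmers(reverse_complement(query), k)
    for read in reads:
        if len(kmers(read, k) & query_kmers) >= min_shared:
            return True
    return False


def contamination_rate(databases: dict[str, list[str]], query: str,
                       **kwargs) -> float:
    """Fraction of databases flagged by is_contaminated."""
    if not databases:
        return 0.0
    flagged = sum(is_contaminated(reads, query, **kwargs)
                  for reads in databases.values())
    return flagged / len(databases)


# Toy example: the query is a made-up 40 bp string, NOT the AluY consensus.
toy_query = "ACGTTGCAAGGCTTACCGATCGGATCCAAGTTCAGGCTAA"
toy_dbs = {
    "db_with_insert": ["TTTT" + toy_query + "TTTT"],
    "db_clean": ["A" * 50],
}
print(contamination_rate(toy_dbs, toy_query))  # 0.5
```

Checking both strands matters because a contaminating read can come from either strand of the human genome. Applied to the study's reported counts, the same rate works out to 492 / 2,749, or roughly 17.9% of the screened databases.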
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0016410