Ten simple rules for annotating sequencing experiments
Published in: | PLoS Computational Biology 2020-10, Vol. 16 (10), p. e1008260 |
---|---|
Main authors: | Irene Stevens, Abdul Kadir Mukarram, Matthias Hörtenhuber, Terrence F. Meehan, Johan Rung, Carsten O. Daub |
Format: | Article |
Language: | English |
Summary: | About the Authors: Irene Stevens* (E-mail: irene.stevens@ki.se; Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden; Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden; ORCID: http://orcid.org/0000-0003-3823-1499). Abdul Kadir Mukarram (Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden; ORCID: http://orcid.org/0000-0002-9726-0399). Matthias Hörtenhuber (Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden; ORCID: http://orcid.org/0000-0002-5599-5565). Terrence F. Meehan (European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom). Johan Rung (Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden; Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden; ORCID: http://orcid.org/0000-0001-5875-8429). Carsten O. Daub (Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden; Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden; ORCID: http://orcid.org/0000-0002-3295-8729). Introduction: A file of nucleic acid sequences is not descriptive by itself. Furthermore, metadata provides the basis for supervised machine learning algorithms that use labeled data, and for indexing Next Generation Sequencing datasets in public repositories to support database queries and data discovery. [...] metadata is key for making data Findable, Accessible, Interoperable, and Reusable (FAIR) [1]. 
Several large-scale sequencing projects, such as the Functional Annotation of the Mammalian Genome (FANTOM5) [13], the Encyclopedia of DNA Elements (ENCODE) [14], and the Danio Rerio Encyclopedia of DNA Elements (DANIO-CODE) [15], have established additional, custom metadata models that describe their data systematically and allow integrative analysis of disparate datasets. Under each section, we assigned a weight to each term: required (e.g., biosample type), conditionally required (e.g., the target of a chromatin immunoprecipitation sequencing (ChIP-seq) assay), or optional (e.g., the chemistry version used for sequencing). |
---|---|
ISSN: | 1553-734X, 1553-7358 |
DOI: | 10.1371/journal.pcbi.1008260 |
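
The summary describes weighting metadata terms as required, conditionally required, or optional. A minimal sketch of how such weights could drive a completeness check is given below. The field names (`biosample_type`, `target`, `chemistry_version`, `assay`), the `Term` class, and the `missing_terms` helper are hypothetical illustrations based on the examples in the summary; they are not the schema or tooling used by the authors.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Term:
    name: str
    weight: str  # "required", "conditional", or "optional"
    # Only used for conditional terms: returns True when the term
    # becomes mandatory for the given record.
    condition: Optional[Callable[[Dict[str, str]], bool]] = None


# Hypothetical schema mirroring the three examples in the summary.
SCHEMA = [
    Term("biosample_type", "required"),
    Term("target", "conditional", lambda rec: rec.get("assay") == "ChIP-seq"),
    Term("chemistry_version", "optional"),
]


def missing_terms(record: Dict[str, str], schema: List[Term] = SCHEMA) -> List[str]:
    """Return names of terms that must be present but are absent or empty."""
    missing = []
    for term in schema:
        needed = term.weight == "required" or (
            term.weight == "conditional"
            and term.condition is not None
            and term.condition(record)
        )
        if needed and not record.get(term.name):
            missing.append(term.name)
    return missing


chip = {"assay": "ChIP-seq", "biosample_type": "tissue"}
rna = {"assay": "RNA-seq", "biosample_type": "cell line"}
print(missing_terms(chip))  # ['target']
print(missing_terms(rna))   # []
```

Optional terms never appear in the result, so a record can omit them without being flagged; a conditional term is reported only when its predicate holds for that record.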