Ten simple rules for annotating sequencing experiments

Detailed description

Bibliographic details
Published in:	PLoS computational biology 2020-10, Vol.16 (10), p.e1008260-e1008260
Main authors:	Stevens, Irene; Mukarram, Abdul Kadir; Hörtenhuber, Matthias; Meehan, Terrence F; Rung, Johan; Daub, Carsten O
Format:	Article
Language:	English
Subjects:	Algorithms; Annotations; Archives & records; Artificial intelligence; Bibliographical citations; Bioinformatics; Biology and Life Sciences; Chromatin; Colleges & universities; Computational Biology; Computer and Information Sciences; Data integrity; Datasets; Deoxyribonucleic acid; DNA; Dublin Core Format; Encyclopedias; Experiments; Gene Ontology; Genetics; Genomes; Genomics; Genomics - methods; Genomics - standards; Identification and classification; Immunology; Immunoprecipitation; Laboratories; Learning algorithms; Machine learning; Medicin och hälsovetenskap; Metadata; Molecular biology; Molecular Sequence Annotation - methods; Molecular Sequence Annotation - standards; Next-generation sequencing; Nucleic acids; Nucleotide sequence; Nutrition; Ontology; Research and Analysis Methods; Sequence Analysis, DNA - methods; Sequence Analysis, DNA - standards; Vocabularies & taxonomies
Online access:	Full text

Description
Summary:	About the Authors:
Irene Stevens * E-mail: irene.stevens@ki.se. Affiliations: Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden; Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden. ORCID: http://orcid.org/0000-0003-3823-1499
Abdul Kadir Mukarram. Affiliation: Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden. ORCID: http://orcid.org/0000-0002-9726-0399
Matthias Hörtenhuber. Affiliation: Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden. ORCID: http://orcid.org/0000-0002-5599-5565
Terrence F. Meehan. Affiliation: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
Johan Rung. Affiliations: Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden; Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden. ORCID: http://orcid.org/0000-0001-5875-8429
Carsten O. Daub. Affiliations: Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden; Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden. ORCID: http://orcid.org/0000-0002-3295-8729
Introduction: A file of nucleic acid sequences is not, by itself, descriptive. Furthermore, metadata provides the basis for supervised machine learning algorithms that use labeled data, and for indexing Next Generation Sequencing datasets into public repositories to support database queries and data discovery. [...] metadata is key for making data Findable, Accessible, Interoperable, and Reusable (FAIR) [1].
Several large-scale sequencing projects, such as the Functional Annotation of the Mammalian Genome (FANTOM5) [13], Encyclopedia of DNA Elements (ENCODE) [14], and the Danio Rerio Encyclopedia of DNA Elements (DANIO-CODE) [15], have established additional metadata models to customarily describe their data in a systematic way that allows for integrative analysis of disparate datasets. Under each section, we defined weights on the terms: required (e.g., biosample type), conditionally required (e.g., target of a chromatin immunoprecipitation sequencing (ChIP-seq) assay), and optional (e.g., chemistry version used for sequencing).
ISSN:	1553-734X, 1553-7358
DOI:	10.1371/journal.pcbi.1008260
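
The summary above describes terms weighted as required, conditionally required, or optional. As a rough illustration of how such weights could drive a completeness check, here is a minimal Python sketch. It is not the authors' implementation: the field names (biosample_type, assay_type, chip_target, chemistry_version), the schema layout, and the function missing_terms are all hypothetical and chosen only to mirror the three examples given in the summary.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Term:
    """One metadata term with its weight (all names here are hypothetical)."""
    name: str
    weight: str  # "required", "conditional", or "optional"
    # For conditional terms only: decides whether the term applies to a record.
    applies: Optional[Callable[[dict], bool]] = None


# Assumed schema mirroring the three examples in the summary.
SCHEMA = [
    Term("biosample_type", "required"),
    Term("chip_target", "conditional",
         lambda record: record.get("assay_type") == "ChIP-seq"),
    Term("chemistry_version", "optional"),
]


def missing_terms(record: dict, schema: list = SCHEMA) -> list:
    """Return the names of terms that must be present but are absent or empty."""
    missing = []
    for term in schema:
        needed = term.weight == "required" or (
            term.weight == "conditional"
            and term.applies is not None
            and term.applies(record)
        )
        if needed and not record.get(term.name):
            missing.append(term.name)
    return missing


if __name__ == "__main__":
    rna = {"biosample_type": "tissue", "assay_type": "RNA-seq"}
    chip = {"biosample_type": "tissue", "assay_type": "ChIP-seq"}
    print(missing_terms(rna))   # the ChIP target is not needed for RNA-seq
    print(missing_terms(chip))  # the ChIP target becomes required here
```

Optional terms are never reported as missing in this sketch; a real metadata model might instead flag them as warnings.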