A proteomics sample metadata representation for multiomics integration and big data analysis


Bibliographic details
Main authors: Dai, Chengxin, Füllgrabe, Anja, Pfeuffer, Julianus, Solovyeva, Elizaveta M, Deng, Jingwen, Moreno, Pablo, Kamatchinathan, Selvakumar, Kundu, Deepti Jaiswal, George, Nancy, Fexova, Silvie, Grüning, Björn A, Föll, Melanie Christine, Griss, Johannes, Vaudel, Marc, Audain, Enrique, Locard-Paulet, Marie, Turewicz, Michael, Eisenacher, Martin, Uszkoreit, Julian, Van Den Bossche, Tim, Schwämmle, Veit, Webel, Henry, Schulze, Stefan, Bouyssié, David, Jayaram, Savita, Duggineni, Vinay Kumar, Samaras, Patroklos, Wilhelm, Mathias, Choi, Meena, Wang, Mingxun, Kohlbacher, Oliver, Brazma, Alvis, Papatheodorou, Irene, Bandeira, Nuno, Deutsch, Eric W, Vizcaíno, Juan Antonio, Bai, Mingze, Sachsenberg, Timo, Levitsky, Lev I, Perez-Riverol, Yasset
Format: Article
Language: English
Description
Abstract: The amount of public proteomics data is rapidly increasing, but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve the reproducibility and facilitate the reanalysis and integration of public proteomics datasets.