Ten simple rules for initial data analysis

Typically, researchers do not perform IDA in a systematic way, if at all, or mix IDA activities with subsequent data analysis tasks such as hypothesis generation or exploration, formal analysis, and interpretation of conclusions. The value of an effective IDA strategy for researchers lies in ensurin...

Bibliographic details
Published in: PLoS computational biology 2022-02, Vol.18 (2), p.e1009819-e1009819
Main authors: Baillie, Mark; le Cessie, Saskia; Schmidt, Carsten Oliver; Lusa, Lara; Huebner, Marianne
Format: Article
Language: eng
Online access: Full text
container_end_page e1009819
container_issue 2
container_start_page e1009819
container_title PLoS computational biology
container_volume 18
creator Baillie, Mark
le Cessie, Saskia
Schmidt, Carsten Oliver
Lusa, Lara
Huebner, Marianne
description Typically, researchers do not perform initial data analysis (IDA) in a systematic way, if at all, or they mix IDA activities with subsequent data analysis tasks such as hypothesis generation or exploration, formal analysis, and interpretation of conclusions. The value of an effective IDA strategy for researchers lies in ensuring that data are of sufficient quality, that model assumptions made in the statistical analysis plan (SAP) are satisfied, or that decisions for the statistical analyses are supported and adequately documented. IDA requires domain knowledge, especially from researchers who understand why and how the data were measured and collected, as well as expertise in data management and stewardship, competencies in planning and implementing data analysis, and experience with scientific computing practices. Make IDA reproducible: IDA is a crucial part of the research pipeline, and as such, it should be well documented to promote transparency, utility, and reproducibility. [...] keeping track of changes that you and your collaborators make to project data, programs (including analysis scripts, libraries, and packages), and documentation (including plans and reports) is a key IDA practice [15].
doi_str_mv 10.1371/journal.pcbi.1009819
format Article
fullrecord <record><control><sourceid>gale_plos_</sourceid><recordid>TN_cdi_plos_journals_2640120254</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A695460928</galeid><doaj_id>oai_doaj_org_article_1ec60f7014d942ab816bba7b4ccf7049</doaj_id><sourcerecordid>A695460928</sourcerecordid><originalsourceid>FETCH-LOGICAL-c633t-fa669828afda96c697cddb378501b558360915216d934d1d813c5028b4ac99e03</originalsourceid><addsrcrecordid>eNqVUltrFDEUHsRia_UfiA74osKuuU_yIpTiZaEoaH0OZ5LMmiUzWZMZsf_ebHdauqUvkoeEk-9yzuGrqhcYLTFt8PtNnNIAYbk1rV9ihJTE6lF1gjmni4Zy-fjO-7h6mvMGofJU4kl1TDlBhCp1Ur27dEOdfb8Nrk5TcLnuYqr94EcPobYwQg3F5Sr7_Kw66iBk93y-T6ufnz5enn9ZXHz7vDo_u1gYQem46EAIJYmEzoISRqjGWNvSRnKEW84lFUhhTrCwijKLrcTUcERky8Ao5RA9rV7tdbchZj2PmTURDOHSNmcFsdojbISN3ibfQ7rSEby-LsS01pBGb4LT2BmBugZhZhUj0Eos2haalhlTqkwVrQ-z29T2zho3jAnCgejhz-B_6XX8o6VsEMekCLyZBVL8Pbk86t5n40KAwcVp1zelkklBdl6v70Efnm65R62hDOCHLhZfU451vTdxcJ0v9TOhOCurJLIQ3h4QCmZ0f8c1TDnr1Y_v_4H9eohle6xJMefkututYKR3GbxpX-8yqOcMFtrLuxu9Jd2Ejv4DvpDVyQ</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2640120254</pqid></control><display><type>article</type><title>Ten simple rules for initial data analysis</title><source>Public Library of Science (PLoS) Journals Open Access</source><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central</source><creator>Baillie, Mark ; le Cessie, Saskia ; Schmidt, Carsten Oliver ; Lusa, Lara ; Huebner, Marianne</creator><creatorcontrib>Baillie, Mark ; le Cessie, Saskia ; Schmidt, Carsten Oliver ; Lusa, Lara ; Huebner, Marianne ; Topic Group “Initial Data Analysis” of the STRATOS Initiative ; for the Topic Group “Initial Data Analysis” of the STRATOS Initiative</creatorcontrib><description>Typically, researchers do not perform IDA in a systematic way, if at all, or mix IDA activities with 
subsequent data analysis tasks such as hypothesis generation or exploration, formal analysis, and interpretation of conclusions. The value of an effective IDA strategy for researchers lies in ensuring that data are of sufficient quality, that model assumptions made in the SAP are satisfied, or to support decisions for the statistical analyses (and are adequately documented). IDA requires domain knowledge, especially researchers with an understanding of why and how the data was measured and collected, expertise in data management and stewardship, competencies in planning and implementing data analysis, and experience of scientific computing practices. Make IDA reproducible IDA is a crucial part of the research pipeline, and as such, it should be well documented to promote transparency, utility, and reproducibility. [...]keeping track of changes that you and your collaborators make to project data, programs (including analysis scripts, libraries, and packages), and documentation (including plans and reports) is a key IDA practice [15].</description><identifier>ISSN: 1553-7358</identifier><identifier>ISSN: 1553-734X</identifier><identifier>EISSN: 1553-7358</identifier><identifier>DOI: 10.1371/journal.pcbi.1009819</identifier><identifier>PMID: 35202399</identifier><language>eng</language><publisher>United States: Public Library of Science</publisher><subject>Computer and Information Sciences ; Data Analysis ; Data management ; Data mining ; Decision analysis ; Humans ; Hypotheses ; Information management ; Laws, regulations and rules ; Metadata ; Methods ; Ovarian Neoplasms ; Physical Sciences ; Planning ; Reproducibility ; Research and Analysis Methods ; Researchers ; Science Policy ; Social Sciences ; Statistical analysis ; Subject specialists</subject><ispartof>PLoS computational biology, 2022-02, Vol.18 (2), p.e1009819-e1009819</ispartof><rights>COPYRIGHT 2022 Public Library of Science</rights><rights>2022 Baillie et al. 
This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2022 Baillie et al 2022 Baillie et al</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c633t-fa669828afda96c697cddb378501b558360915216d934d1d813c5028b4ac99e03</citedby><cites>FETCH-LOGICAL-c633t-fa669828afda96c697cddb378501b558360915216d934d1d813c5028b4ac99e03</cites><orcidid>0000-0002-8981-2421 ; 0000-0002-9694-9231 ; 0000-0002-5618-0667 ; 0000-0001-5266-9396</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8870512/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8870512/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,315,729,782,786,866,887,2106,2932,23875,27933,27934,53800,53802</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/35202399$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Baillie, Mark</creatorcontrib><creatorcontrib>le Cessie, Saskia</creatorcontrib><creatorcontrib>Schmidt, Carsten Oliver</creatorcontrib><creatorcontrib>Lusa, Lara</creatorcontrib><creatorcontrib>Huebner, Marianne</creatorcontrib><creatorcontrib>Topic Group “Initial Data Analysis” of the STRATOS Initiative</creatorcontrib><creatorcontrib>for the Topic Group “Initial Data Analysis” of the STRATOS Initiative</creatorcontrib><title>Ten 
simple rules for initial data analysis</title><title>PLoS computational biology</title><addtitle>PLoS Comput Biol</addtitle><description>Typically, researchers do not perform IDA in a systematic way, if at all, or mix IDA activities with subsequent data analysis tasks such as hypothesis generation or exploration, formal analysis, and interpretation of conclusions. The value of an effective IDA strategy for researchers lies in ensuring that data are of sufficient quality, that model assumptions made in the SAP are satisfied, or to support decisions for the statistical analyses (and are adequately documented). IDA requires domain knowledge, especially researchers with an understanding of why and how the data was measured and collected, expertise in data management and stewardship, competencies in planning and implementing data analysis, and experience of scientific computing practices. Make IDA reproducible IDA is a crucial part of the research pipeline, and as such, it should be well documented to promote transparency, utility, and reproducibility. 
[...]keeping track of changes that you and your collaborators make to project data, programs (including analysis scripts, libraries, and packages), and documentation (including plans and reports) is a key IDA practice [15].</description><subject>Computer and Information Sciences</subject><subject>Data Analysis</subject><subject>Data management</subject><subject>Data mining</subject><subject>Decision analysis</subject><subject>Humans</subject><subject>Hypotheses</subject><subject>Information management</subject><subject>Laws, regulations and rules</subject><subject>Metadata</subject><subject>Methods</subject><subject>Ovarian Neoplasms</subject><subject>Physical Sciences</subject><subject>Planning</subject><subject>Reproducibility</subject><subject>Research and Analysis Methods</subject><subject>Researchers</subject><subject>Science Policy</subject><subject>Social Sciences</subject><subject>Statistical analysis</subject><subject>Subject specialists</subject><issn>1553-7358</issn><issn>1553-734X</issn><issn>1553-7358</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>DOA</sourceid><recordid>eNqVUltrFDEUHsRia_UfiA74osKuuU_yIpTiZaEoaH0OZ5LMmiUzWZMZsf_ebHdauqUvkoeEk-9yzuGrqhcYLTFt8PtNnNIAYbk1rV9ihJTE6lF1gjmni4Zy-fjO-7h6mvMGofJU4kl1TDlBhCp1Ur27dEOdfb8Nrk5TcLnuYqr94EcPobYwQg3F5Sr7_Kw66iBk93y-T6ufnz5enn9ZXHz7vDo_u1gYQem46EAIJYmEzoISRqjGWNvSRnKEW84lFUhhTrCwijKLrcTUcERky8Ao5RA9rV7tdbchZj2PmTURDOHSNmcFsdojbISN3ibfQ7rSEby-LsS01pBGb4LT2BmBugZhZhUj0Eos2haalhlTqkwVrQ-z29T2zho3jAnCgejhz-B_6XX8o6VsEMekCLyZBVL8Pbk86t5n40KAwcVp1zelkklBdl6v70Efnm65R62hDOCHLhZfU451vTdxcJ0v9TOhOCurJLIQ3h4QCmZ0f8c1TDnr1Y_v_4H9eohle6xJMefkututYKR3GbxpX-8yqOcMFtrLuxu9Jd2Ejv4DvpDVyQ</recordid><startdate>20220201</startdate><e
nddate>20220201</enddate><creator>Baillie, Mark</creator><creator>le Cessie, Saskia</creator><creator>Schmidt, Carsten Oliver</creator><creator>Lusa, Lara</creator><creator>Huebner, Marianne</creator><general>Public Library of Science</general><general>Public Library of Science (PLoS)</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>ISN</scope><scope>ISR</scope><scope>3V.</scope><scope>7QO</scope><scope>7QP</scope><scope>7TK</scope><scope>7TM</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AL</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>K9.</scope><scope>LK8</scope><scope>M0N</scope><scope>M0S</scope><scope>M1P</scope><scope>M7P</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-8981-2421</orcidid><orcidid>https://orcid.org/0000-0002-9694-9231</orcidid><orcidid>https://orcid.org/0000-0002-5618-0667</orcidid><orcidid>https://orcid.org/0000-0001-5266-9396</orcidid></search><sort><creationdate>20220201</creationdate><title>Ten simple rules for initial data analysis</title><author>Baillie, Mark ; le Cessie, Saskia ; Schmidt, Carsten Oliver ; Lusa, Lara ; Huebner, 
Marianne</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c633t-fa669828afda96c697cddb378501b558360915216d934d1d813c5028b4ac99e03</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Computer and Information Sciences</topic><topic>Data Analysis</topic><topic>Data management</topic><topic>Data mining</topic><topic>Decision analysis</topic><topic>Humans</topic><topic>Hypotheses</topic><topic>Information management</topic><topic>Laws, regulations and rules</topic><topic>Metadata</topic><topic>Methods</topic><topic>Ovarian Neoplasms</topic><topic>Physical Sciences</topic><topic>Planning</topic><topic>Reproducibility</topic><topic>Research and Analysis Methods</topic><topic>Researchers</topic><topic>Science Policy</topic><topic>Social Sciences</topic><topic>Statistical analysis</topic><topic>Subject specialists</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Baillie, Mark</creatorcontrib><creatorcontrib>le Cessie, Saskia</creatorcontrib><creatorcontrib>Schmidt, Carsten Oliver</creatorcontrib><creatorcontrib>Lusa, Lara</creatorcontrib><creatorcontrib>Huebner, Marianne</creatorcontrib><creatorcontrib>Topic Group “Initial Data Analysis” of the STRATOS Initiative</creatorcontrib><creatorcontrib>for the Topic Group “Initial Data Analysis” of the STRATOS Initiative</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: Canada</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Biotechnology Research Abstracts</collection><collection>Calcium &amp; Calcified Tissue Abstracts</collection><collection>Neurosciences 
Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>ProQuest Biological Science Collection</collection><collection>Computing Database</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Biological Science 
Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Access via ProQuest (Open Access)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>PLoS computational biology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Baillie, Mark</au><au>le Cessie, Saskia</au><au>Schmidt, Carsten Oliver</au><au>Lusa, Lara</au><au>Huebner, Marianne</au><aucorp>Topic Group “Initial Data Analysis” of the STRATOS Initiative</aucorp><aucorp>for the Topic Group “Initial Data Analysis” of the STRATOS Initiative</aucorp><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Ten simple rules for initial data analysis</atitle><jtitle>PLoS computational biology</jtitle><addtitle>PLoS Comput Biol</addtitle><date>2022-02-01</date><risdate>2022</risdate><volume>18</volume><issue>2</issue><spage>e1009819</spage><epage>e1009819</epage><pages>e1009819-e1009819</pages><issn>1553-7358</issn><issn>1553-734X</issn><eissn>1553-7358</eissn><abstract>Typically, researchers do not perform IDA in a systematic way, if at all, or mix IDA activities with subsequent data analysis tasks such as hypothesis generation or exploration, formal analysis, and interpretation of conclusions. 
The value of an effective IDA strategy for researchers lies in ensuring that data are of sufficient quality, that model assumptions made in the SAP are satisfied, or to support decisions for the statistical analyses (and are adequately documented). IDA requires domain knowledge, especially researchers with an understanding of why and how the data was measured and collected, expertise in data management and stewardship, competencies in planning and implementing data analysis, and experience of scientific computing practices. Make IDA reproducible IDA is a crucial part of the research pipeline, and as such, it should be well documented to promote transparency, utility, and reproducibility. [...]keeping track of changes that you and your collaborators make to project data, programs (including analysis scripts, libraries, and packages), and documentation (including plans and reports) is a key IDA practice [15].</abstract><cop>United States</cop><pub>Public Library of Science</pub><pmid>35202399</pmid><doi>10.1371/journal.pcbi.1009819</doi><orcidid>https://orcid.org/0000-0002-8981-2421</orcidid><orcidid>https://orcid.org/0000-0002-9694-9231</orcidid><orcidid>https://orcid.org/0000-0002-5618-0667</orcidid><orcidid>https://orcid.org/0000-0001-5266-9396</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1553-7358
ispartof PLoS computational biology, 2022-02, Vol.18 (2), p.e1009819-e1009819
issn 1553-7358
1553-734X
1553-7358
language eng
recordid cdi_plos_journals_2640120254
source Public Library of Science (PLoS) Journals Open Access; MEDLINE; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central
subjects Computer and Information Sciences
Data Analysis
Data management
Data mining
Decision analysis
Humans
Hypotheses
Information management
Laws, regulations and rules
Metadata
Methods
Ovarian Neoplasms
Physical Sciences
Planning
Reproducibility
Research and Analysis Methods
Researchers
Science Policy
Social Sciences
Statistical analysis
Subject specialists
title Ten simple rules for initial data analysis
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-01T05%3A40%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Ten%20simple%20rules%20for%20initial%20data%20analysis&rft.jtitle=PLoS%20computational%20biology&rft.au=Baillie,%20Mark&rft.aucorp=Topic%20Group%20%E2%80%9CInitial%20Data%20Analysis%E2%80%9D%20of%20the%20STRATOS%20Initiative&rft.date=2022-02-01&rft.volume=18&rft.issue=2&rft.spage=e1009819&rft.epage=e1009819&rft.pages=e1009819-e1009819&rft.issn=1553-7358&rft.eissn=1553-7358&rft_id=info:doi/10.1371/journal.pcbi.1009819&rft_dat=%3Cgale_plos_%3EA695460928%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2640120254&rft_id=info:pmid/35202399&rft_galeid=A695460928&rft_doaj_id=oai_doaj_org_article_1ec60f7014d942ab816bba7b4ccf7049&rfr_iscdi=true