Inconsistency in the use of the term "validation" in studies reporting the performance of deep learning algorithms in providing diagnosis from medical imaging
The development of deep learning (DL) algorithms is a three-step process: training, tuning, and testing. Studies are inconsistent in the use of the term "validation", with some using it to refer to tuning and others to testing, which hinders accurate delivery of information and may inadvertent...
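The three steps named in the abstract can be pictured as a three-way data split. The following is only an illustrative sketch, not the authors' method: the function name `three_way_split`, the 70/15/15 percentages, and the fixed seed are assumptions introduced here, and a real study may draw its test set from separate data rather than from a random split.

```python
import random


def three_way_split(items, train_pct=70, tune_pct=15, seed=0):
    """Shuffle items and split them into training, tuning, and testing subsets.

    The percentages are hypothetical defaults; whatever remains after the
    training and tuning subsets becomes the testing subset.
    """
    if train_pct <= 0 or tune_pct <= 0 or train_pct + tune_pct >= 100:
        raise ValueError("percentages must be positive and leave room for a test set")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n_train = len(shuffled) * train_pct // 100
    n_tune = len(shuffled) * tune_pct // 100
    return {
        "training": shuffled[:n_train],                  # fit model weights
        "tuning": shuffled[n_train:n_train + n_tune],    # choose hyperparameters
        "testing": shuffled[n_train + n_tune:],          # report final performance
    }


splits = three_way_split(range(100))
for name, subset in splits.items():
    print(name, len(subset))
```

Naming the subsets "training", "tuning", and "testing" avoids the ambiguity the paper describes, since "validation" could otherwise be read as either of the last two.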
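Published in: | PloS one 2020-09, Vol.15 (9), p.e0238908-e0238908 |
---|---|
Main authors: | Kim, Dong Wook ; Jang, Hye Young ; Ko, Yousun ; Son, Jung Hee ; Kim, Pyeong Hwa ; Kim, Seon-Ok ; Lim, Joon Seo ; Park, Seong Ho |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |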
container_end_page | e0238908 |
---|---|
container_issue | 9 |
container_start_page | e0238908 |
container_title | PloS one |
container_volume | 15 |
creator | Kim, Dong Wook ; Jang, Hye Young ; Ko, Yousun ; Son, Jung Hee ; Kim, Pyeong Hwa ; Kim, Seon-Ok ; Lim, Joon Seo ; Park, Seong Ho |
description | The development of deep learning (DL) algorithms is a three-step process: training, tuning, and testing. Studies are inconsistent in the use of the term "validation", with some using it to refer to tuning and others to testing, which hinders accurate delivery of information and may inadvertently exaggerate the performance of DL algorithms. We investigated the extent of inconsistency in usage of the term "validation" in studies on the accuracy of DL algorithms in providing diagnosis from medical imaging.
We analyzed the full texts of research papers cited in two recent systematic reviews. The papers were categorized according to whether the term "validation" was used to refer to tuning alone, both tuning and testing, or testing alone. We analyzed whether paper characteristics (i.e., journal category, field of study, year of print publication, journal impact factor [JIF], and nature of test data) were associated with the usage of the terminology using multivariable logistic regression analysis with generalized estimating equations. Of 201 papers published in 125 journals, 118 (58.7%), 9 (4.5%), and 74 (36.8%) used the term to refer to tuning alone, both tuning and testing, and testing alone, respectively. A weak association was noted between higher JIF and using the term to refer to testing (i.e., testing alone or both tuning and testing) instead of tuning alone (vs. JIF <5; JIF 5 to 10: adjusted odds ratio 2.11, P = 0.042; JIF >10: adjusted odds ratio 2.41, P = 0.089). Journal category, field of study, year of print publication, and nature of test data were not significantly associated with the terminology usage.
Existing literature has a significant degree of inconsistency in using the term "validation" when referring to the steps in DL algorithm development. Efforts are needed to improve the accuracy and clarity in the terminology usage. |
doi_str_mv | 10.1371/journal.pone.0238908 |
format | Article |
contributor | Hong, Julian C. |
date | 2020-09-11 |
eissn | 1932-6203 |
pmid | 32915901 |
publisher | United States: Public Library of Science |
rights | 2020 Kim et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
orcid | https://orcid.org/0000-0002-2420-8709 ; https://orcid.org/0000-0002-2181-9555 ; https://orcid.org/0000-0002-1257-8315 |
oa | free_for_read |
linktohtml | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7485764/ |
linktopdf | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7485764/pdf/ |
backlink | https://www.ncbi.nlm.nih.gov/pubmed/32915901 |
fulltext | fulltext |
identifier | ISSN: 1932-6203 |
ispartof | PloS one, 2020-09, Vol.15 (9), p.e0238908-e0238908 |
issn | 1932-6203 1932-6203 |
language | eng |
recordid | cdi_plos_journals_2441872237 |
source | MEDLINE; DOAJ Directory of Open Access Journals; Public Library of Science (PLoS) Journals Open Access; EZB-FREE-00999 freely available EZB journals; PubMed Central; Free Full-Text Journals in Chemistry |
subjects | Accuracy ; Algorithms ; Artificial intelligence ; Citation indexes ; Computer and Information Sciences ; Datasets ; Deep learning ; Diagnosis ; Diagnostic imaging ; Diagnostic Imaging - methods ; Humans ; Jargon ; Journal Impact Factor ; Learning algorithms ; Literature reviews ; Machine Learning ; Medical diagnosis ; Medical imaging ; Medical research ; Medical schools ; Medicine ; Medicine and Health Sciences ; Methods ; Periodicals as Topic - standards ; Physical Sciences ; Regression analysis ; Research and Analysis Methods ; Researchers ; Scientific papers ; Systematic review ; Terminology ; Tuning ; Validation Studies as Topic |
title | Inconsistency in the use of the term "validation" in studies reporting the performance of deep learning algorithms in providing diagnosis from medical imaging |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T15%3A12%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Inconsistency%20in%20the%20use%20of%20the%20term%20%22validation%22%20in%20studies%20reporting%20the%20performance%20of%20deep%20learning%20algorithms%20in%20providing%20diagnosis%20from%20medical%20imaging&rft.jtitle=PloS%20one&rft.au=Kim,%20Dong%20Wook&rft.date=2020-09-11&rft.volume=15&rft.issue=9&rft.spage=e0238908&rft.epage=e0238908&rft.pages=e0238908-e0238908&rft.issn=1932-6203&rft.eissn=1932-6203&rft_id=info:doi/10.1371/journal.pone.0238908&rft_dat=%3Cgale_plos_%3EA635164051%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2441872237&rft_id=info:pmid/32915901&rft_galeid=A635164051&rft_doaj_id=oai_doaj_org_article_d6e305a60c144552a1ee53dc67fe982b&rfr_iscdi=true |
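
The percentages quoted in the description field follow directly from the reported counts. As a small arithmetic check (a sketch only; the variable names are introduced here and are not part of the record), the shares can be recomputed from the 201 papers:

```python
# Counts of papers by how they used the term "validation", as reported in the abstract.
counts = {
    "tuning alone": 118,
    "both tuning and testing": 9,
    "testing alone": 74,
}

total = sum(counts.values())

# Share of each usage category, rounded to one decimal place as in the abstract.
percentages = {usage: round(100 * n / total, 1) for usage, n in counts.items()}

print(total)
for usage, pct in percentages.items():
    print(f"{usage}: {pct}%")
```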