Inconsistency in the use of the term "validation" in studies reporting the performance of deep learning algorithms in providing diagnosis from medical imaging

The development of deep learning (DL) algorithms is a three-step process: training, tuning, and testing. Studies are inconsistent in the use of the term "validation", with some using it to refer to tuning and others to testing, which hinders accurate delivery of information and may inadvertent...
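The three steps named above can be made concrete with a small sketch that labels each data subset by its role instead of using the ambiguous word "validation". This is only an illustration, not a method from the paper: the function name `three_way_split`, the 70/15/15 proportions, and the random internal split are all assumptions chosen for brevity.

```python
import random


def three_way_split(items, tuning_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle items and split them into training, tuning, and test subsets.

    The subset names are explicit on purpose: "tuning" is the data used to
    select hyperparameters, and "test" is the data used only for the final
    performance estimate. Fractions and seed are illustrative assumptions.
    """
    if not 0 <= tuning_frac + test_frac < 1:
        raise ValueError("tuning_frac + test_frac must be in [0, 1)")

    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle

    n_test = round(len(shuffled) * test_frac)
    n_tuning = round(len(shuffled) * tuning_frac)

    return {
        "test": shuffled[:n_test],
        "tuning": shuffled[n_test:n_test + n_tuning],
        "training": shuffled[n_test + n_tuning:],
    }


# Hypothetical usage with 100 case identifiers.
splits = three_way_split(range(100))
for role, subset in splits.items():
    print(role, len(subset))
```

Naming the subsets by role means a reader never has to guess whether a reported "validation" result came from data that also guided model selection.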

Bibliographic details
Published in: PloS one 2020-09, Vol.15 (9), p.e0238908-e0238908
Main authors: Kim, Dong Wook; Jang, Hye Young; Ko, Yousun; Son, Jung Hee; Kim, Pyeong Hwa; Kim, Seon-Ok; Lim, Joon Seo; Park, Seong Ho
Format: Article
Language: eng
Online access: Full text
container_end_page e0238908
container_issue 9
container_start_page e0238908
container_title PloS one
container_volume 15
creator Kim, Dong Wook
Jang, Hye Young
Ko, Yousun
Son, Jung Hee
Kim, Pyeong Hwa
Kim, Seon-Ok
Lim, Joon Seo
Park, Seong Ho
description The development of deep learning (DL) algorithms is a three-step process: training, tuning, and testing. Studies are inconsistent in the use of the term "validation", with some using it to refer to tuning and others to testing, which hinders accurate delivery of information and may inadvertently exaggerate the performance of DL algorithms. We investigated the extent of inconsistency in usage of the term "validation" in studies on the accuracy of DL algorithms in providing diagnosis from medical imaging. We analyzed the full texts of research papers cited in two recent systematic reviews. The papers were categorized according to whether the term "validation" was used to refer to tuning alone, both tuning and testing, or testing alone. We analyzed whether paper characteristics (i.e., journal category, field of study, year of print publication, journal impact factor [JIF], and nature of test data) were associated with the usage of the terminology using multivariable logistic regression analysis with generalized estimating equations. Of 201 papers published in 125 journals, 118 (58.7%), 9 (4.5%), and 74 (36.8%) used the term to refer to tuning alone, both tuning and testing, and testing alone, respectively. A weak association was noted between higher JIF and using the term to refer to testing (i.e., testing alone or both tuning and testing) instead of tuning alone (vs. JIF <5; JIF 5 to 10: adjusted odds ratio 2.11, P = 0.042; JIF >10: adjusted odds ratio 2.41, P = 0.089). Journal category, field of study, year of print publication, and nature of test data were not significantly associated with the terminology usage. Existing literature has a significant degree of inconsistency in using the term "validation" when referring to the steps in DL algorithm development. Efforts are needed to improve the accuracy and clarity in the terminology usage.
doi_str_mv 10.1371/journal.pone.0238908
format Article
contributor Hong, Julian C.
publisher United States: Public Library of Science
date 2020-09-11
pmid 32915901
orcidid https://orcid.org/0000-0002-2420-8709
https://orcid.org/0000-0002-2181-9555
https://orcid.org/0000-0002-1257-8315
rights COPYRIGHT 2020 Public Library of Science
2020 Kim et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
oa free_for_read
toplevel peer_reviewed
online_resources
linktohtml https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7485764/
linktopdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7485764/pdf/
fulltext fulltext
identifier ISSN: 1932-6203
ispartof PloS one, 2020-09, Vol.15 (9), p.e0238908-e0238908
issn 1932-6203
1932-6203
language eng
recordid cdi_plos_journals_2441872237
source MEDLINE; DOAJ Directory of Open Access Journals; Public Library of Science (PLoS) Journals Open Access; EZB-FREE-00999 freely available EZB journals; PubMed Central; Free Full-Text Journals in Chemistry
subjects Accuracy
Algorithms
Artificial intelligence
Citation indexes
Computer and Information Sciences
Datasets
Deep learning
Diagnosis
Diagnostic imaging
Diagnostic Imaging - methods
Humans
Jargon
Journal Impact Factor
Learning algorithms
Literature reviews
Machine Learning
Medical diagnosis
Medical imaging
Medical research
Medical schools
Medicine
Medicine and Health Sciences
Methods
Periodicals as Topic - standards
Physical Sciences
Regression analysis
Research and Analysis Methods
Researchers
Scientific papers
Systematic review
Terminology
Tuning
Validation Studies as Topic
title Inconsistency in the use of the term "validation" in studies reporting the performance of deep learning algorithms in providing diagnosis from medical imaging
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T15%3A12%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Inconsistency%20in%20the%20use%20of%20the%20term%20%22validation%22%20in%20studies%20reporting%20the%20performance%20of%20deep%20learning%20algorithms%20in%20providing%20diagnosis%20from%20medical%20imaging&rft.jtitle=PloS%20one&rft.au=Kim,%20Dong%20Wook&rft.date=2020-09-11&rft.volume=15&rft.issue=9&rft.spage=e0238908&rft.epage=e0238908&rft.pages=e0238908-e0238908&rft.issn=1932-6203&rft.eissn=1932-6203&rft_id=info:doi/10.1371/journal.pone.0238908&rft_dat=%3Cgale_plos_%3EA635164051%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2441872237&rft_id=info:pmid/32915901&rft_galeid=A635164051&rft_doaj_id=oai_doaj_org_article_d6e305a60c144552a1ee53dc67fe982b&rfr_iscdi=true
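
The category counts reported in the description field can be checked with a few lines of arithmetic. The sketch below recomputes the percentages from the three counts given in the abstract; the variable names are illustrative, and the combined share for "testing in any form" is derived here from those counts and is not a figure stated in the abstract.

```python
# Counts of papers by how they used the term "validation" (from the abstract).
counts = {
    "tuning alone": 118,
    "both tuning and testing": 9,
    "testing alone": 74,
}

total = sum(counts.values())

# Percentage of all papers in each category, rounded to one decimal place.
percentages = {label: round(100 * n / total, 1) for label, n in counts.items()}

# Derived here (not reported in the abstract): papers that used the term
# for testing in any form, i.e. testing alone or both tuning and testing.
testing_any = counts["testing alone"] + counts["both tuning and testing"]
testing_any_pct = round(100 * testing_any / total, 1)

print(total, percentages, testing_any, testing_any_pct)
```

The three counts sum to the 201 papers stated in the abstract, and the rounded shares match the reported 58.7%, 4.5%, and 36.8%.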