Semi‐supervised training using cooperative labeling of weakly annotated data for nodule detection in chest CT

Purpose: Machine learning algorithms are best trained with large quantities of accurately annotated samples. While natural scene images can often be labeled relatively cheaply and at large scale, obtaining accurate annotations for medical images is both time-consuming and expensive. In this study, we...

Detailed description

Bibliographic details
Published in:	Medical physics (Lancaster) 2023-07, Vol.50 (7), p.4255-4268
Main authors:	Maynord, Michael; Farhangi, M. Mehdi; Fermüller, Cornelia; Aloimonos, Yiannis; Levine, Gary; Petrick, Nicholas; Sahiner, Berkman; Pezeshk, Aria
Format:	Article
Language:	eng
Subjects:	Algorithms; computer aided detection; Humans; Image Processing, Computer-Assisted - methods; Machine Learning; pulmonary nodules; semi‐supervised learning; Supervised Machine Learning; Tomography, X-Ray Computed
Online access:	Full text

container_end_page	4268
container_issue	7
container_start_page	4255
container_title	Medical physics (Lancaster)
container_volume	50
creator	Maynord, Michael; Farhangi, M. Mehdi; Fermüller, Cornelia; Aloimonos, Yiannis; Levine, Gary; Petrick, Nicholas; Sahiner, Berkman; Pezeshk, Aria
description	Purpose: Machine learning algorithms are best trained with large quantities of accurately annotated samples. While natural scene images can often be labeled relatively cheaply and at large scale, obtaining accurate annotations for medical images is both time-consuming and expensive. In this study, we propose a cooperative labeling method that allows us to use weakly annotated medical imaging data to train a machine learning algorithm. Because most clinically produced data are weakly annotated (produced for use by humans rather than machines, and lacking information that machine learning depends upon), this approach allows us to incorporate a wider range of clinical data and thereby increase the training set size. Methods: Our pseudo‐labeling method consists of multiple stages. In the first stage, a previously established network is trained using a limited number of samples with high‐quality expert‐produced annotations. This network is used to generate annotations for a separate, larger dataset that contains only weakly annotated scans. In the second stage, by cross‐checking the two types of annotations against each other, we obtain higher‐fidelity annotations. In the third stage, we extract training data from the weakly annotated scans and combine them with the fully annotated data, producing a larger training dataset. We use this larger dataset to develop a computer‐aided detection (CADe) system for nodule detection in chest CT. Results: We evaluated the proposed approach by presenting the network with different numbers of expert‐annotated scans in training and then testing the CADe system on an independent expert‐annotated dataset. We demonstrate that when the availability of expert annotations is severely limited, the inclusion of weakly labeled data leads to a 5% improvement in the competitive performance metric (CPM), defined as the average of sensitivities at different false‐positive rates. Conclusions: Our proposed approach can effectively merge a weakly annotated dataset with a small, well‐annotated dataset for algorithm training. This approach can help enlarge limited training data by leveraging the large amount of weakly labeled data typically generated in clinical image interpretation.
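
The abstract defines the competitive performance metric (CPM) only as the average of sensitivities at different false‐positive rates. The following is a minimal Python sketch of that averaging, not the authors' evaluation code. It assumes the seven operating points commonly used in nodule detection challenges (1/8 to 8 false positives per scan) and linear interpolation on a FROC curve; the record states neither. The names `ASSUMED_FP_RATES`, `sensitivity_at`, and `cpm` are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence

# ASSUMPTION: operating points in false positives per scan. The abstract only
# says "different false-positive rates"; these seven values are a common choice.
ASSUMED_FP_RATES = (0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0)


def sensitivity_at(fp_rate: float,
                   froc_fps: Sequence[float],
                   froc_sens: Sequence[float]) -> float:
    """Linearly interpolate the sensitivity at `fp_rate` on a FROC curve.

    `froc_fps` must be strictly increasing. Requests outside the measured
    range are clamped to the first or last measured sensitivity.
    """
    if fp_rate <= froc_fps[0]:
        return froc_sens[0]
    if fp_rate >= froc_fps[-1]:
        return froc_sens[-1]
    hi = bisect_right(froc_fps, fp_rate)  # first point strictly above fp_rate
    lo = hi - 1
    x0, x1 = froc_fps[lo], froc_fps[hi]
    y0, y1 = froc_sens[lo], froc_sens[hi]
    return y0 + (y1 - y0) * (fp_rate - x0) / (x1 - x0)


def cpm(froc_fps: Sequence[float],
        froc_sens: Sequence[float],
        fp_rates: Sequence[float] = ASSUMED_FP_RATES) -> float:
    """Average the sensitivities read off the FROC curve at `fp_rates`."""
    if not froc_fps or len(froc_fps) != len(froc_sens):
        raise ValueError("FROC curve needs matching, non-empty point lists")
    total = sum(sensitivity_at(r, froc_fps, froc_sens) for r in fp_rates)
    return total / len(fp_rates)


if __name__ == "__main__":
    # Made-up FROC points, for illustration only (not results from the paper).
    fps = [0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
    sens = [0.50, 0.60, 0.70, 0.80, 0.85, 0.90, 0.95]
    print(f"CPM = {cpm(fps, sens):.3f}")
```

Clamping outside the measured range is a design choice of this sketch; an evaluation that extrapolates or rejects such requests would give different values at the extreme operating points.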
doi_str_mv	10.1002/mp.16219
format	Article
fulltext	fulltext
identifier	ISSN: 0094-2405
ispartof	Medical physics (Lancaster), 2023-07, Vol.50 (7), p.4255-4268
issn	0094-2405 2473-4209
language	eng
recordid	cdi_proquest_miscellaneous_2765073667
source	MEDLINE; Wiley Online Library All Journals; Alma/SFX Local Collection
subjects	Algorithms; computer aided detection; Humans; Image Processing, Computer-Assisted - methods; Machine Learning; pulmonary nodules; semi‐supervised learning; Supervised Machine Learning; Tomography, X-Ray Computed
title	Semi‐supervised training using cooperative labeling of weakly annotated data for nodule detection in chest CT
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-21T01%3A12%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Semi%E2%80%90supervised%20training%20using%20cooperative%20labeling%20of%20weakly%20annotated%20data%20for%20nodule%20detection%20in%20chest%20CT&rft.jtitle=Medical%20physics%20(Lancaster)&rft.au=Maynord,%20Michael&rft.date=2023-07&rft.volume=50&rft.issue=7&rft.spage=4255&rft.epage=4268&rft.pages=4255-4268&rft.issn=0094-2405&rft.eissn=2473-4209&rft_id=info:doi/10.1002/mp.16219&rft_dat=%3Cproquest_cross%3E2765073667%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2765073667&rft_id=info:pmid/36630691&rfr_iscdi=true