Voice-Assisted Image Labeling for Endoscopic Ultrasound Classification Using Neural Networks

Published in: IEEE transactions on medical imaging, 2022-06, Vol. 41 (6), p. 1311-1319
Authors: Bonmati, Ester; Hu, Yipeng; Grimwood, Alexander; Johnson, Gavin J.; Goodchild, George; Keane, Margaret G.; Gurusamy, Kurinchi; Davidson, Brian; Clarkson, Matthew J.; Pereira, Stephen P.; Barratt, Dean C.
Format: Article
Language: eng

Abstract: Ultrasound imaging is a commonly used technology for visualising patient anatomy in real time during diagnostic and therapeutic procedures. High operator dependency and low reproducibility make ultrasound imaging and interpretation challenging, with a steep learning curve. Automatic image classification using deep learning has the potential to overcome some of these challenges by supporting ultrasound training in novices, as well as aiding ultrasound image interpretation in patients with complex pathology for more experienced practitioners. However, deep learning methods require large amounts of data to provide accurate results. Labelling large ultrasound datasets is challenging because labels are retrospectively assigned to 2D images without the 3D spatial context that is available in vivo, or that an operator would infer while visually tracking structures between frames during the procedure. In this work, we propose a multi-modal convolutional neural network (CNN) architecture that labels endoscopic ultrasound (EUS) images from raw verbal comments provided by a clinician during the procedure. We use a CNN composed of two branches, one for voice data and another for image data, which are joined to predict image labels from the spoken names of anatomical landmarks. The network was trained using recorded verbal comments from expert operators. Our results show a prediction accuracy of 76% at the image level on a dataset with five different labels. We conclude that the addition of spoken commentaries can increase the performance of ultrasound image classification and eliminate the burden of manually labelling large EUS datasets necessary for deep learning applications.
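
The abstract describes the method at a high level: a CNN with a voice branch and an image branch whose features are joined to predict one of five anatomical-landmark labels. The record gives no implementation details, so the following is a minimal sketch of that two-branch idea, assuming PyTorch, a spectrogram tensor as the voice input, a single-channel EUS frame as the image input, and simple feature concatenation as the join; the class name, layer sizes, and input shapes are all illustrative assumptions, not the authors' architecture.

```python
# Minimal sketch of a two-branch multi-modal classifier, assuming PyTorch.
# Only the two-branch-plus-fusion structure and the five-label output are
# taken from the abstract; everything else is illustrative.
import torch
import torch.nn as nn

class TwoBranchEUSClassifier(nn.Module):  # hypothetical name
    def __init__(self, num_labels: int = 5):
        super().__init__()
        # Image branch: single-channel EUS frame -> fixed-size feature vector.
        self.image_branch = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (batch, 32)
        )
        # Voice branch: single-channel spectrogram -> fixed-size feature vector.
        self.voice_branch = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (batch, 32)
        )
        # Fusion head: the two feature vectors are joined and mapped to
        # one score per anatomical-landmark label.
        self.head = nn.Linear(32 + 32, num_labels)

    def forward(self, image: torch.Tensor, voice: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.image_branch(image), self.voice_branch(voice)], dim=1)
        return self.head(fused)  # raw logits, one per label

# Usage with dummy batches of 2 frames and 2 spectrograms.
model = TwoBranchEUSClassifier(num_labels=5)
logits = model(torch.randn(2, 1, 128, 128), torch.randn(2, 1, 64, 64))
print(logits.shape)  # torch.Size([2, 5])
```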
DOI: 10.1109/TMI.2021.3139023
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_ieee_primary_9663415</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9663415</ieee_id><sourcerecordid>2615300065</sourcerecordid><originalsourceid>FETCH-LOGICAL-c347t-7f14891b15e91b02990865ce4d0d88b0f8212f856cac1d98c2ebac95867658d63</originalsourceid><addsrcrecordid>eNpdkE1LxDAQhoMoun7cBUEKXrx0naRNmhxlWXVh1YsrHoSSplOJdps1aRH_vVl29eBl3sM87zA8hJxSGFMK6urpfjZmwOg4o5kClu2QEeVcpoznL7tkBKyQKYBgB-QwhHcAmnNQ--Qgy5VgUogReX121mB6HYINPdbJbKnfMJnrClvbvSWN88m0q10wbmVNsmh7r4MbujqZtDp2Gmt0b12XLMIaf8DB6zZG_-X8Rzgme41uA55s84gsbqZPk7t0_ng7m1zPU5PlRZ8WDc2lohXlGCcwpUAKbjCvoZaygkYyyhrJhdGG1koahpU2iktRCC5rkR2Ry83dlXefA4a-XNpgsG11h24IJROUZxBF8Ihe_EPf3eC7-F2kCgZSAYVIwYYy3oXgsSlX3i61_y4plGvzZTRfrs2XW_Oxcr49PFRLrP8Kv6ojcLYBLCL-rVXc5PG7H9auhl0</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2672089010</pqid></control><display><type>article</type><title>Voice-Assisted Image Labeling for Endoscopic Ultrasound Classification Using Neural Networks</title><source>IEEE Electronic Library (IEL)</source><creator>Bonmati, Ester ; Hu, Yipeng ; Grimwood, Alexander ; Johnson, Gavin J. ; Goodchild, George ; Keane, Margaret G. ; Gurusamy, Kurinchi ; Davidson, Brian ; Clarkson, Matthew J. ; Pereira, Stephen P. ; Barratt, Dean C.</creator><creatorcontrib>Bonmati, Ester ; Hu, Yipeng ; Grimwood, Alexander ; Johnson, Gavin J. ; Goodchild, George ; Keane, Margaret G. ; Gurusamy, Kurinchi ; Davidson, Brian ; Clarkson, Matthew J. ; Pereira, Stephen P. ; Barratt, Dean C.</creatorcontrib><description>Ultrasound imaging is a commonly used technology for visualising patient anatomy in real-time during diagnostic and therapeutic procedures. High operator dependency and low reproducibility make ultrasound imaging and interpretation challenging with a steep learning curve. Automatic image classification using deep learning has the potential to overcome some of these challenges by supporting ultrasound training in novices, as well as aiding ultrasound image interpretation in patient with complex pathology for more experienced practitioners. However, the use of deep learning methods requires a large amount of data in order to provide accurate results. Labelling large ultrasound datasets is a challenging task because labels are retrospectively assigned to 2D images without the 3D spatial context available in vivo or that would be inferred while visually tracking structures between frames during the procedure. In this work, we propose a multi-modal convolutional neural network (CNN) architecture that labels endoscopic ultrasound (EUS) images from raw verbal comments provided by a clinician during the procedure. We use a CNN composed of two branches, one for voice data and another for image data, which are joined to predict image labels from the spoken names of anatomical landmarks. The network was trained using recorded verbal comments from expert operators. Our results show a prediction accuracy of 76% at image level on a dataset with 5 different labels. 
We conclude that the addition of spoken commentaries can increase the performance of ultrasound image classification, and eliminate the burden of manually labelling large EUS datasets necessary for deep learning applications.</description><identifier>ISSN: 0278-0062</identifier><identifier>EISSN: 1558-254X</identifier><identifier>DOI: 10.1109/TMI.2021.3139023</identifier><identifier>PMID: 34962866</identifier><identifier>CODEN: ITMID4</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>Artificial neural networks ; Automatic labeling ; Classification ; Datasets ; Deep learning ; Endoscopy ; Humans ; Image classification ; Labeling ; Labelling ; Labels ; Learning curves ; Machine learning ; Neural networks ; Neural Networks, Computer ; Real-time systems ; Reproducibility of Results ; Retrospective Studies ; Task analysis ; Training ; Ultrasonic imaging ; Ultrasonography ; Ultrasound ; voice</subject><ispartof>IEEE transactions on medical imaging, 2022-06, Vol.41 (6), p.1311-1319</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022</rights><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c347t-7f14891b15e91b02990865ce4d0d88b0f8212f856cac1d98c2ebac95867658d63</citedby><cites>FETCH-LOGICAL-c347t-7f14891b15e91b02990865ce4d0d88b0f8212f856cac1d98c2ebac95867658d63</cites><orcidid>0000-0003-4902-0486 ; 0000-0003-0821-1809 ; 0000-0002-9152-5907 ; 0000-0001-9217-5438 ; 0000-0002-5565-1252 ; 0000-0003-2916-655X ; 0000-0002-0313-9134</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9663415$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27903,27904,54736</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9663415$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34962866$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Bonmati, Ester</creatorcontrib><creatorcontrib>Hu, Yipeng</creatorcontrib><creatorcontrib>Grimwood, Alexander</creatorcontrib><creatorcontrib>Johnson, Gavin J.</creatorcontrib><creatorcontrib>Goodchild, George</creatorcontrib><creatorcontrib>Keane, Margaret G.</creatorcontrib><creatorcontrib>Gurusamy, Kurinchi</creatorcontrib><creatorcontrib>Davidson, Brian</creatorcontrib><creatorcontrib>Clarkson, Matthew J.</creatorcontrib><creatorcontrib>Pereira, Stephen P.</creatorcontrib><creatorcontrib>Barratt, Dean C.</creatorcontrib><title>Voice-Assisted Image Labeling for Endoscopic Ultrasound Classification Using Neural Networks</title><title>IEEE transactions on medical imaging</title><addtitle>TMI</addtitle><addtitle>IEEE Trans Med Imaging</addtitle><description>Ultrasound imaging is a commonly used technology for visualising patient anatomy in real-time during diagnostic and therapeutic procedures. High operator dependency and low reproducibility make ultrasound imaging and interpretation challenging with a steep learning curve. Automatic image classification using deep learning has the potential to overcome some of these challenges by supporting ultrasound training in novices, as well as aiding ultrasound image interpretation in patient with complex pathology for more experienced practitioners. 
However, the use of deep learning methods requires a large amount of data in order to provide accurate results. Labelling large ultrasound datasets is a challenging task because labels are retrospectively assigned to 2D images without the 3D spatial context available in vivo or that would be inferred while visually tracking structures between frames during the procedure. In this work, we propose a multi-modal convolutional neural network (CNN) architecture that labels endoscopic ultrasound (EUS) images from raw verbal comments provided by a clinician during the procedure. We use a CNN composed of two branches, one for voice data and another for image data, which are joined to predict image labels from the spoken names of anatomical landmarks. The network was trained using recorded verbal comments from expert operators. Our results show a prediction accuracy of 76% at image level on a dataset with 5 different labels. We conclude that the addition of spoken commentaries can increase the performance of ultrasound image classification, and eliminate the burden of manually labelling large EUS datasets necessary for deep learning applications.</description><subject>Artificial neural networks</subject><subject>Automatic labeling</subject><subject>Classification</subject><subject>Datasets</subject><subject>Deep learning</subject><subject>Endoscopy</subject><subject>Humans</subject><subject>Image classification</subject><subject>Labeling</subject><subject>Labelling</subject><subject>Labels</subject><subject>Learning curves</subject><subject>Machine learning</subject><subject>Neural networks</subject><subject>Neural Networks, Computer</subject><subject>Real-time systems</subject><subject>Reproducibility of Results</subject><subject>Retrospective Studies</subject><subject>Task analysis</subject><subject>Training</subject><subject>Ultrasonic imaging</subject><subject>Ultrasonography</subject><subject>Ultrasound</subject><subject>voice</subject><issn>0278-0062</issn><issn>1558-254X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><sourceid>EIF</sourceid><recordid>eNpdkE1LxDAQhoMoun7cBUEKXrx0naRNmhxlWXVh1YsrHoSSplOJdps1aRH_vVl29eBl3sM87zA8hJxSGFMK6urpfjZmwOg4o5kClu2QEeVcpoznL7tkBKyQKYBgB-QwhHcAmnNQ--Qgy5VgUogReX121mB6HYINPdbJbKnfMJnrClvbvSWN88m0q10wbmVNsmh7r4MbujqZtDp2Gmt0b12XLMIaf8DB6zZG_-X8Rzgme41uA55s84gsbqZPk7t0_ng7m1zPU5PlRZ8WDc2lohXlGCcwpUAKbjCvoZaygkYyyhrJhdGG1koahpU2iktRCC5rkR2Ry83dlXefA4a-XNpgsG11h24IJROUZxBF8Ihe_EPf3eC7-F2kCgZSAYVIwYYy3oXgsSlX3i61_y4plGvzZTRfrs2XW_Oxcr49PFRLrP8Kv6ojcLYBLCL-rVXc5PG7H9auhl0</recordid><startdate>20220601</startdate><enddate>20220601</enddate><creator>Bonmati, Ester</creator><creator>Hu, Yipeng</creator><creator>Grimwood, Alexander</creator><creator>Johnson, Gavin J.</creator><creator>Goodchild, George</creator><creator>Keane, Margaret G.</creator><creator>Gurusamy, Kurinchi</creator><creator>Davidson, Brian</creator><creator>Clarkson, Matthew J.</creator><creator>Pereira, Stephen P.</creator><creator>Barratt, Dean C.</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QF</scope><scope>7QO</scope><scope>7QQ</scope><scope>7SC</scope><scope>7SE</scope><scope>7SP</scope><scope>7SR</scope><scope>7TA</scope><scope>7TB</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>JG9</scope><scope>JQ2</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>NAPCQ</scope><scope>P64</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0003-4902-0486</orcidid><orcidid>https://orcid.org/0000-0003-0821-1809</orcidid><orcidid>https://orcid.org/0000-0002-9152-5907</orcidid><orcidid>https://orcid.org/0000-0001-9217-5438</orcidid><orcidid>https://orcid.org/0000-0002-5565-1252</orcidid><orcidid>https://orcid.org/0000-0003-2916-655X</orcidid><orcidid>https://orcid.org/0000-0002-0313-9134</orcidid></search><sort><creationdate>20220601</creationdate><title>Voice-Assisted Image Labeling for Endoscopic Ultrasound Classification Using Neural Networks</title><author>Bonmati, Ester ; Hu, Yipeng ; Grimwood, Alexander ; Johnson, Gavin J. ; Goodchild, George ; Keane, Margaret G. ; Gurusamy, Kurinchi ; Davidson, Brian ; Clarkson, Matthew J. ; Pereira, Stephen P. ; Barratt, Dean C.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c347t-7f14891b15e91b02990865ce4d0d88b0f8212f856cac1d98c2ebac95867658d63</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Artificial neural networks</topic><topic>Automatic labeling</topic><topic>Classification</topic><topic>Datasets</topic><topic>Deep learning</topic><topic>Endoscopy</topic><topic>Humans</topic><topic>Image classification</topic><topic>Labeling</topic><topic>Labelling</topic><topic>Labels</topic><topic>Learning curves</topic><topic>Machine learning</topic><topic>Neural networks</topic><topic>Neural Networks, Computer</topic><topic>Real-time systems</topic><topic>Reproducibility of Results</topic><topic>Retrospective Studies</topic><topic>Task analysis</topic><topic>Training</topic><topic>Ultrasonic imaging</topic><topic>Ultrasonography</topic><topic>Ultrasound</topic><topic>voice</topic><toplevel>online_resources</toplevel><creatorcontrib>Bonmati, Ester</creatorcontrib><creatorcontrib>Hu, Yipeng</creatorcontrib><creatorcontrib>Grimwood, Alexander</creatorcontrib><creatorcontrib>Johnson, Gavin J.</creatorcontrib><creatorcontrib>Goodchild, George</creatorcontrib><creatorcontrib>Keane, Margaret G.</creatorcontrib><creatorcontrib>Gurusamy, Kurinchi</creatorcontrib><creatorcontrib>Davidson, Brian</creatorcontrib><creatorcontrib>Clarkson, Matthew J.</creatorcontrib><creatorcontrib>Pereira, Stephen P.</creatorcontrib><creatorcontrib>Barratt, Dean C.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Aluminium Industry Abstracts</collection><collection>Biotechnology Research 
Abstracts</collection><collection>Ceramic Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Corrosion Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Materials Business File</collection><collection>Mechanical &amp; Transportation Engineering Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology &amp; Engineering</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Nursing &amp; Allied Health Premium</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE transactions on medical imaging</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Bonmati, Ester</au><au>Hu, Yipeng</au><au>Grimwood, Alexander</au><au>Johnson, Gavin J.</au><au>Goodchild, George</au><au>Keane, Margaret G.</au><au>Gurusamy, Kurinchi</au><au>Davidson, Brian</au><au>Clarkson, Matthew J.</au><au>Pereira, Stephen P.</au><au>Barratt, Dean C.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Voice-Assisted Image Labeling for Endoscopic Ultrasound Classification Using Neural Networks</atitle><jtitle>IEEE transactions on medical imaging</jtitle><stitle>TMI</stitle><addtitle>IEEE Trans Med Imaging</addtitle><date>2022-06-01</date><risdate>2022</risdate><volume>41</volume><issue>6</issue><spage>1311</spage><epage>1319</epage><pages>1311-1319</pages><issn>0278-0062</issn><eissn>1558-254X</eissn><coden>ITMID4</coden><abstract>Ultrasound imaging is a commonly used technology for visualising patient anatomy in real-time during diagnostic and therapeutic procedures. High operator dependency and low reproducibility make ultrasound imaging and interpretation challenging with a steep learning curve. Automatic image classification using deep learning has the potential to overcome some of these challenges by supporting ultrasound training in novices, as well as aiding ultrasound image interpretation in patient with complex pathology for more experienced practitioners. However, the use of deep learning methods requires a large amount of data in order to provide accurate results. Labelling large ultrasound datasets is a challenging task because labels are retrospectively assigned to 2D images without the 3D spatial context available in vivo or that would be inferred while visually tracking structures between frames during the procedure. In this work, we propose a multi-modal convolutional neural network (CNN) architecture that labels endoscopic ultrasound (EUS) images from raw verbal comments provided by a clinician during the procedure. 
We use a CNN composed of two branches, one for voice data and another for image data, which are joined to predict image labels from the spoken names of anatomical landmarks. The network was trained using recorded verbal comments from expert operators. Our results show a prediction accuracy of 76% at image level on a dataset with 5 different labels. We conclude that the addition of spoken commentaries can increase the performance of ultrasound image classification, and eliminate the burden of manually labelling large EUS datasets necessary for deep learning applications.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>34962866</pmid><doi>10.1109/TMI.2021.3139023</doi><tpages>9</tpages><orcidid>https://orcid.org/0000-0003-4902-0486</orcidid><orcidid>https://orcid.org/0000-0003-0821-1809</orcidid><orcidid>https://orcid.org/0000-0002-9152-5907</orcidid><orcidid>https://orcid.org/0000-0001-9217-5438</orcidid><orcidid>https://orcid.org/0000-0002-5565-1252</orcidid><orcidid>https://orcid.org/0000-0003-2916-655X</orcidid><orcidid>https://orcid.org/0000-0002-0313-9134</orcidid></addata></record>
ISSN: 0278-0062
EISSN: 1558-254X
Source: IEEE Electronic Library (IEL)
Subjects:
Artificial neural networks
Automatic labeling
Classification
Datasets
Deep learning
Endoscopy
Humans
Image classification
Labeling
Labelling
Labels
Learning curves
Machine learning
Neural networks
Neural Networks, Computer
Real-time systems
Reproducibility of Results
Retrospective Studies
Task analysis
Training
Ultrasonic imaging
Ultrasonography
Ultrasound
voice
URL: https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T20%3A28%3A49IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Voice-Assisted%20Image%20Labeling%20for%20Endoscopic%20Ultrasound%20Classification%20Using%20Neural%20Networks&rft.jtitle=IEEE%20transactions%20on%20medical%20imaging&rft.au=Bonmati,%20Ester&rft.date=2022-06-01&rft.volume=41&rft.issue=6&rft.spage=1311&rft.epage=1319&rft.pages=1311-1319&rft.issn=0278-0062&rft.eissn=1558-254X&rft.coden=ITMID4&rft_id=info:doi/10.1109/TMI.2021.3139023&rft_dat=%3Cproquest_RIE%3E2615300065%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2672089010&rft_id=info:pmid/34962866&rft_ieee_id=9663415&rfr_iscdi=true