Automated Detection of Radiology Reports that Require Follow-up Imaging Using Natural Language Processing Feature Engineering and Machine Learning Classification
While radiologists regularly issue follow-up recommendations, our preliminary research has shown that anywhere from 35 to 50% of patients who receive follow-up recommendations for findings of possible cancer on abdominopelvic imaging do not return for follow-up. As such, they remain at risk for adve...
Published in: | Journal of digital imaging 2020-02, Vol.33 (1), p.131-136 |
---|---|
Main authors: | Lou, Robert ; Lalevic, Darco ; Chambers, Charles ; Zafar, Hanna M. ; Cook, Tessa S. |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 136 |
---|---|
container_issue | 1 |
container_start_page | 131 |
container_title | Journal of digital imaging |
container_volume | 33 |
creator | Lou, Robert ; Lalevic, Darco ; Chambers, Charles ; Zafar, Hanna M. ; Cook, Tessa S. |
description | While radiologists regularly issue follow-up recommendations, our preliminary research has shown that anywhere from 35 to 50% of patients who receive follow-up recommendations for findings of possible cancer on abdominopelvic imaging do not return for follow-up. As such, they remain at risk for adverse outcomes related to missed or delayed cancer diagnosis. In this study, we develop an algorithm to automatically detect free text radiology reports that have a follow-up recommendation using natural language processing (NLP) techniques and machine learning models. The data set used in this study consists of 6000 free text reports from the author’s institution. NLP techniques are used to engineer 1500 features, which include the most informative unigrams, bigrams, and trigrams in the training corpus after performing tokenization and Porter stemming. On this data set, we train naive Bayes, decision tree, and maximum entropy models. The decision tree model, with an F1 score of 0.458 and accuracy of 0.862, outperforms both the naive Bayes (F1 score of 0.381) and maximum entropy (F1 score of 0.387) models. The models were analyzed to determine predictive features, with term frequency of n-grams such as “renal neoplasm” and “evalu with enhanc” being most predictive of a follow-up recommendation. Key to maximizing performance was feature engineering that extracts predictive information and appropriate selection of machine learning algorithms based on the feature set. |
doi_str_mv | 10.1007/s10278-019-00271-7 |
format | Article |
fullrecord | <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmed_primary_31482317</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2283692097</sourcerecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2283692097</pqid></control><display><type>article</type><title>Automated Detection of Radiology Reports that Require Follow-up Imaging Using Natural Language Processing Feature Engineering and Machine Learning Classification</title><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>Web of Science - Science Citation Index Expanded - 2020</source><source>PubMed Central</source><creator>Lou, Robert ; Lalevic, Darco ; Chambers, Charles ; Zafar, Hanna M. ; Cook, Tessa S.</creator><identifier>ISSN: 0897-1889</identifier><identifier>EISSN: 1618-727X</identifier><identifier>DOI: 10.1007/s10278-019-00271-7</identifier><identifier>PMID: 31482317</identifier><language>eng</language><publisher>Cham: Springer International Publishing</publisher><ispartof>Journal of digital imaging, 2020-02, Vol.33 (1), p.131-136</ispartof><rights>Society for Imaging Informatics in Medicine 2019</rights><rights>Journal of Digital Imaging is a copyright of Springer, (2019). All Rights Reserved.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencescount>32</woscitedreferencescount><orcidid>0000-0003-4723-5416</orcidid></display><links><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7064732/pdf/$$EPDF$$P50$$Gpubmedcentral$$H</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7064732/$$EHTML$$P50$$Gpubmedcentral$$H</linktohtml><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/31482317$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><addata><format>journal</format><genre>article</genre><ristype>JOUR</ristype><stitle>J Digit Imaging</stitle><date>2020-02-01</date><volume>33</volume><issue>1</issue><spage>131</spage><epage>136</epage><pages>131-136</pages><cop>Cham</cop><pub>Springer International Publishing</pub><tpages>6</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0897-1889 |
ispartof | Journal of digital imaging, 2020-02, Vol.33 (1), p.131-136 |
issn | 0897-1889 1618-727X |
language | eng |
recordid | cdi_pubmed_primary_31482317 |
source | Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Web of Science - Science Citation Index Expanded - 2020; PubMed Central |
subjects | Algorithms; Artificial intelligence; Bayesian analysis; Cancer; Datasets; Decision trees; Engineering education; Entropy; Feature extraction; Imaging; Kidney cancer; Language; Learning algorithms; Life Sciences & Biomedicine; Machine learning; Maximum entropy; Medical imaging; Medicine; Medicine & Public Health; Natural language processing; Original Paper; Radiology; Radiology, Nuclear Medicine & Medical Imaging; Science & Technology |
title | Automated Detection of Radiology Reports that Require Follow-up Imaging Using Natural Language Processing Feature Engineering and Machine Learning Classification |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-14T17%3A09%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Automated%20Detection%20of%20Radiology%20Reports%20that%20Require%20Follow-up%20Imaging%20Using%20Natural%20Language%20Processing%20Feature%20Engineering%20and%20Machine%20Learning%20Classification&rft.jtitle=Journal%20of%20digital%20imaging&rft.au=Lou,%20Robert&rft.date=2020-02-01&rft.volume=33&rft.issue=1&rft.spage=131&rft.epage=136&rft.pages=131-136&rft.issn=0897-1889&rft.eissn=1618-727X&rft_id=info:doi/10.1007/s10278-019-00271-7&rft_dat=%3Cproquest_pubme%3E2283692097%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2283692097&rft_id=info:pmid/31482317&rfr_iscdi=true |
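
The abstract describes features built from unigram, bigram, and trigram term frequencies over stemmed tokens, and it reports F1 scores for the trained models. The following is a minimal Python sketch of those two ideas only, not the authors' implementation. The function names (`tokenize`, `ngram_counts`, `f1_score`) and the sample fragment are hypothetical, the input is assumed to be already stemmed (a real pipeline would apply a Porter stemmer from an NLP library first), and the selection of the 1500 most informative features is not shown.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngram_counts(tokens, max_n=3):
    """Count term frequency of every n-gram up to max_n (space-joined strings)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical, already-stemmed report fragment (illustrative, not study data).
tokens = tokenize("renal neoplasm evalu with enhanc ct")
features = ngram_counts(tokens)
```

Here `features` maps each n-gram, such as "renal neoplasm" or "evalu with enhanc", to its count in the fragment. Because F1 ignores true negatives, a model can show high accuracy alongside a much lower F1 score when reports with follow-up recommendations are a minority of the data set.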