Establishment of machine learning-based tool for early detection of pulmonary embolism

Pulmonary embolism (PE) is a complex disease with high mortality and morbidity rates, leading to an increasing societal burden. However, current diagnosis is based solely on symptoms and laboratory data despite the complex pathology of PE, which easily leads to misdiagnosis and missed diagnosis by inexperience...

Bibliographic details
Published in: Computer methods and programs in biomedicine 2024-02, Vol.244, p.107977-107977, Article 107977
Main authors: Liu, Lijue, Li, Yaming, Liu, Na, Luo, Jingmin, Deng, Jinhai, Peng, Weixiong, Bai, Yongping, Zhang, Guogang, Zhao, Guihu, Yang, Ning, Li, Chuanchang, Long, Xueying
Format: Article
Language: eng
Online access: Full text
container_end_page 107977
container_issue
container_start_page 107977
container_title Computer methods and programs in biomedicine
container_volume 244
creator Liu, Lijue
Li, Yaming
Liu, Na
Luo, Jingmin
Deng, Jinhai
Peng, Weixiong
Bai, Yongping
Zhang, Guogang
Zhao, Guihu
Yang, Ning
Li, Chuanchang
Long, Xueying
description Pulmonary embolism (PE) is a complex disease with high mortality and morbidity rates, leading to an increasing societal burden. However, current diagnosis is based solely on symptoms and laboratory data despite the complex pathology of PE, which easily leads to misdiagnosis and missed diagnosis by inexperienced doctors. In particular, CT pulmonary angiography, the gold-standard method, is not widely available.
In this study, we aim to establish a rapid and accurate screening model for pulmonary embolism using machine learning. Importantly, the data required for prediction are easily accessible: routine laboratory data and medical record information of patients. We extracted features from patients' routine laboratory results and medical records, including routine blood tests, biochemical panels, routine coagulation tests and other test results, as well as symptoms and medical history. Samples with a missing-feature rate greater than 0.8 were removed from the original database. Data from 4723 cases were retained, 231 of which were positive for pulmonary embolism. Fifty features were retained through statistical hypothesis testing between positive and negative cases and were used to build the predictive model. To keep the class imbalance from biasing predictions toward the majority class, we used the Synthetic Minority Oversampling Technique (SMOTE) to increase the amount of information on minority samples. Five typical machine learning algorithms were used to model the screening of pulmonary embolism: Support Vector Machines, Logistic Regression, Random Forest, XGBoost, and Back Propagation Neural Networks. Model performance was evaluated mainly by sensitivity, specificity, and AUC. Furthermore, a baseline model built on the features named in the pulmonary embolism guidelines served as a comparison.
We found that XGBoost outperformed the other models, with the highest sensitivity and specificity (0.99 and 0.99, respectively). Moreover, it improved significantly on the baseline model (sensitivity and specificity of 0.76 and 0.76, respectively). More importantly, our model showed a low missed diagnosis rate (0.46) and a high AUC (0.992). Finally, our model takes only about 0.05 s to compute the probability of pulmonary embolism.
In this study, five machine learning classification models were established to assess the likelihood of patients suffering from pulmonary embolism, and the XGBoost model most significantly improved the precision, sensitivity, and AUC for pulmonary embolism screening. Collectively, we have established an AI-based model to accurately predict pulmonary embolism at an early stage.
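Two steps named in the abstract, the removal of sparsely populated samples and the sensitivity/specificity evaluation, can be illustrated with a minimal Python sketch. This is not the authors' code. The function names are hypothetical, missing features are assumed to be encoded as `None`, and the missed diagnosis rate is assumed to follow the common definition FN / (TP + FN); the record does not state which formula the study used.

```python
from typing import Dict, List, Optional, Sequence


def missing_rate(sample: Sequence[Optional[float]]) -> float:
    """Fraction of features in one sample that are missing (encoded as None)."""
    if not sample:
        return 1.0  # an empty sample carries no information
    return sum(value is None for value in sample) / len(sample)


def drop_sparse_samples(
    samples: Sequence[Sequence[Optional[float]]],
    max_missing: float = 0.8,
) -> List[Sequence[Optional[float]]]:
    """Drop samples whose missing-feature rate is greater than max_missing.

    The 0.8 default mirrors the threshold quoted in the abstract.
    """
    return [s for s in samples if missing_rate(s) <= max_missing]


def screening_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Dict[str, float]:
    """Sensitivity, specificity and missed diagnosis rate for 0/1 labels.

    Assumed (standard) definitions:
      sensitivity           = TP / (TP + FN)
      specificity           = TN / (TN + FP)
      missed diagnosis rate = FN / (TP + FN)
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "missed_diagnosis_rate": fn / (tp + fn),
    }
```

Calling `drop_sparse_samples(rows)` on a list of feature rows keeps every row with at most 80% of its features missing, and `screening_metrics(labels, predictions)` returns the three rates as a dictionary. Under the assumed definitions the missed diagnosis rate is the complement of sensitivity, so the two always sum to 1.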
doi_str_mv 10.1016/j.cmpb.2023.107977
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2904153503</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2904153503</sourcerecordid><originalsourceid>FETCH-LOGICAL-c254t-204efd75e555acc9bd095a136661aac3ea84d28521f3a29cbc84b4193bed1f5b3</originalsourceid><addsrcrecordid>eNo9kDtPwzAUhS0EoqXwBxiQR5YUP2InGVHFS6rEAqyW7dzQVHYc4mTov8dRC9OVjs53dPUhdEvJmhIqH_Zr63uzZoTxFBRVUZyhJS0LlhVCinO0TKUqY5IUC3QV454QwoSQl2jBS0p5SfgSfT3FURvXxp2HbsShwV7bXdsBdqCHru2-M6Mj1HgMweEmDDjF7oBrGMGObehmpJ-cD50eDhi8CWnMX6OLRrsIN6e7Qp_PTx-b12z7_vK2edxmlol8zBjJoakLAUIIbW1lalIJTbmUkmptOegyr1kpGG24ZpU1tsxNTituoKaNMHyF7o-7_RB-Joij8m204JzuIExRsYrkVHBBeKqyY9UOIcYBGtUPrU9PK0rU7FPt1exTzT7V0WeC7k77k_FQ_yN_AvkvNkdzZw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2904153503</pqid></control><display><type>article</type><title>Establishment of machine learning-based tool for early detection of pulmonary embolism</title><source>ScienceDirect Journals (5 years ago - present)</source><creator>Liu, Lijue ; Li, Yaming ; Liu, Na ; Luo, Jingmin ; Deng, Jinhai ; Peng, Weixiong ; Bai, Yongping ; Zhang, Guogang ; Zhao, Guihu ; Yang, Ning ; Li, Chuanchang ; Long, Xueying</creator><creatorcontrib>Liu, Lijue ; Li, Yaming ; Liu, Na ; Luo, Jingmin ; Deng, Jinhai ; Peng, Weixiong ; Bai, Yongping ; Zhang, Guogang ; Zhao, Guihu ; Yang, Ning ; Li, Chuanchang ; Long, Xueying</creatorcontrib><description>Pulmonary embolism (PE) is a complex disease with high mortality and morbidity rate, leading to increasing society burden. However, current diagnosis is solely based on symptoms and laboratory data despite its complex pathology, which easily leads to misdiagnosis and missed diagnosis by inexperienced doctors. Especially, CT pulmonary angiography, the gold standard method, is not widely available. 
In this study, we aim to establish a rapid and accurate screening model for pulmonary embolism using machine learning technology. Importantly, data required for disease prediction are easily accessed, including routine laboratory data and medical record information of patients. We extracted features from patients' routine laboratory results and medical records, including blood routine, biochemical group, blood coagulation routine and other test results, as well as symptoms and medical history information. Samples with a feature loss rate greater than 0.8 were deleted from the original database. Data from 4723 cases were retained, 231 of which were positive for pulmonary embolism. 50 features were retained through the positive and negative statistical hypothesis testing which was used to build the predictive model. In order to avoid identification as majority-class samples caused by the imbalance of sample proportion, we used the method of Synthetic Minority Oversampling Technique (SMOTE) to increase the amount of information on minority samples. Five typical machine learning algorithms were used to model the screening of pulmonary embolism, including Support Vector Machines, Logistic Regression, Random Forest, XGBoost, and Back Propagation Neural Networks. To evaluate model performance, sensitivity, specificity and AUC curve were analyzed as the main evaluation indicators. Furthermore, a baseline model was established using the characteristics of the pulmonary embolism guidelines as a comparison model. We found that XGBoost showed better performance compared to other models, with the highest sensitivity and specificity (0.99 and 0.99, respectively). Moreover, it showed significant improvement in performance compared to the baseline model (sensitivity and specificity were 0.76 and 0.76 respectively). More important, our model showed low missed diagnosis rate (0.46) and high AUC value (0.992). 
Finally, the calculation time of our model is only about 0.05 s to obtain the possibility of pulmonary embolism. In this study, five machine learning classification models were established to assess the likelihood of patients suffering from pulmonary embolism, and the XGBoost model most significantly improved the precision, sensitivity, and AUC for pulmonary embolism screening. Collectively, we have established an AI-based model to accurately predict pulmonary embolism at early stage.</description><identifier>ISSN: 0169-2607</identifier><identifier>EISSN: 1872-7565</identifier><identifier>DOI: 10.1016/j.cmpb.2023.107977</identifier><identifier>PMID: 38113803</identifier><language>eng</language><publisher>Ireland</publisher><ispartof>Computer methods and programs in biomedicine, 2024-02, Vol.244, p.107977-107977, Article 107977</ispartof><rights>Copyright © 2023. Published by Elsevier B.V.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c254t-204efd75e555acc9bd095a136661aac3ea84d28521f3a29cbc84b4193bed1f5b3</cites><orcidid>0009-0005-6419-2835 ; 0000-0002-3457-4147 ; 0000-0003-4033-1843</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/38113803$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Liu, Lijue</creatorcontrib><creatorcontrib>Li, Yaming</creatorcontrib><creatorcontrib>Liu, Na</creatorcontrib><creatorcontrib>Luo, Jingmin</creatorcontrib><creatorcontrib>Deng, Jinhai</creatorcontrib><creatorcontrib>Peng, Weixiong</creatorcontrib><creatorcontrib>Bai, Yongping</creatorcontrib><creatorcontrib>Zhang, Guogang</creatorcontrib><creatorcontrib>Zhao, Guihu</creatorcontrib><creatorcontrib>Yang, Ning</creatorcontrib><creatorcontrib>Li, 
Chuanchang</creatorcontrib><creatorcontrib>Long, Xueying</creatorcontrib><title>Establishment of machine learning-based tool for early detection of pulmonary embolism</title><title>Computer methods and programs in biomedicine</title><addtitle>Comput Methods Programs Biomed</addtitle><description>Pulmonary embolism (PE) is a complex disease with high mortality and morbidity rate, leading to increasing society burden. However, current diagnosis is solely based on symptoms and laboratory data despite its complex pathology, which easily leads to misdiagnosis and missed diagnosis by inexperienced doctors. Especially, CT pulmonary angiography, the gold standard method, is not widely available. In this study, we aim to establish a rapid and accurate screening model for pulmonary embolism using machine learning technology. Importantly, data required for disease prediction are easily accessed, including routine laboratory data and medical record information of patients. We extracted features from patients' routine laboratory results and medical records, including blood routine, biochemical group, blood coagulation routine and other test results, as well as symptoms and medical history information. Samples with a feature loss rate greater than 0.8 were deleted from the original database. Data from 4723 cases were retained, 231 of which were positive for pulmonary embolism. 50 features were retained through the positive and negative statistical hypothesis testing which was used to build the predictive model. In order to avoid identification as majority-class samples caused by the imbalance of sample proportion, we used the method of Synthetic Minority Oversampling Technique (SMOTE) to increase the amount of information on minority samples. Five typical machine learning algorithms were used to model the screening of pulmonary embolism, including Support Vector Machines, Logistic Regression, Random Forest, XGBoost, and Back Propagation Neural Networks. 
To evaluate model performance, sensitivity, specificity and AUC curve were analyzed as the main evaluation indicators. Furthermore, a baseline model was established using the characteristics of the pulmonary embolism guidelines as a comparison model. We found that XGBoost showed better performance compared to other models, with the highest sensitivity and specificity (0.99 and 0.99, respectively). Moreover, it showed significant improvement in performance compared to the baseline model (sensitivity and specificity were 0.76 and 0.76 respectively). More important, our model showed low missed diagnosis rate (0.46) and high AUC value (0.992). Finally, the calculation time of our model is only about 0.05 s to obtain the possibility of pulmonary embolism. In this study, five machine learning classification models were established to assess the likelihood of patients suffering from pulmonary embolism, and the XGBoost model most significantly improved the precision, sensitivity, and AUC for pulmonary embolism screening. 
Collectively, we have established an AI-based model to accurately predict pulmonary embolism at early stage.</description><issn>0169-2607</issn><issn>1872-7565</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNo9kDtPwzAUhS0EoqXwBxiQR5YUP2InGVHFS6rEAqyW7dzQVHYc4mTov8dRC9OVjs53dPUhdEvJmhIqH_Zr63uzZoTxFBRVUZyhJS0LlhVCinO0TKUqY5IUC3QV454QwoSQl2jBS0p5SfgSfT3FURvXxp2HbsShwV7bXdsBdqCHru2-M6Mj1HgMweEmDDjF7oBrGMGObehmpJ-cD50eDhi8CWnMX6OLRrsIN6e7Qp_PTx-b12z7_vK2edxmlol8zBjJoakLAUIIbW1lalIJTbmUkmptOegyr1kpGG24ZpU1tsxNTituoKaNMHyF7o-7_RB-Joij8m204JzuIExRsYrkVHBBeKqyY9UOIcYBGtUPrU9PK0rU7FPt1exTzT7V0WeC7k77k_FQ_yN_AvkvNkdzZw</recordid><startdate>202402</startdate><enddate>202402</enddate><creator>Liu, Lijue</creator><creator>Li, Yaming</creator><creator>Liu, Na</creator><creator>Luo, Jingmin</creator><creator>Deng, Jinhai</creator><creator>Peng, Weixiong</creator><creator>Bai, Yongping</creator><creator>Zhang, Guogang</creator><creator>Zhao, Guihu</creator><creator>Yang, Ning</creator><creator>Li, Chuanchang</creator><creator>Long, Xueying</creator><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><orcidid>https://orcid.org/0009-0005-6419-2835</orcidid><orcidid>https://orcid.org/0000-0002-3457-4147</orcidid><orcidid>https://orcid.org/0000-0003-4033-1843</orcidid></search><sort><creationdate>202402</creationdate><title>Establishment of machine learning-based tool for early detection of pulmonary embolism</title><author>Liu, Lijue ; Li, Yaming ; Liu, Na ; Luo, Jingmin ; Deng, Jinhai ; Peng, Weixiong ; Bai, Yongping ; Zhang, Guogang ; Zhao, Guihu ; Yang, Ning ; Li, Chuanchang ; Long, 
Xueying</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c254t-204efd75e555acc9bd095a136661aac3ea84d28521f3a29cbc84b4193bed1f5b3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Liu, Lijue</creatorcontrib><creatorcontrib>Li, Yaming</creatorcontrib><creatorcontrib>Liu, Na</creatorcontrib><creatorcontrib>Luo, Jingmin</creatorcontrib><creatorcontrib>Deng, Jinhai</creatorcontrib><creatorcontrib>Peng, Weixiong</creatorcontrib><creatorcontrib>Bai, Yongping</creatorcontrib><creatorcontrib>Zhang, Guogang</creatorcontrib><creatorcontrib>Zhao, Guihu</creatorcontrib><creatorcontrib>Yang, Ning</creatorcontrib><creatorcontrib>Li, Chuanchang</creatorcontrib><creatorcontrib>Long, Xueying</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Computer methods and programs in biomedicine</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Liu, Lijue</au><au>Li, Yaming</au><au>Liu, Na</au><au>Luo, Jingmin</au><au>Deng, Jinhai</au><au>Peng, Weixiong</au><au>Bai, Yongping</au><au>Zhang, Guogang</au><au>Zhao, Guihu</au><au>Yang, Ning</au><au>Li, Chuanchang</au><au>Long, Xueying</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Establishment of machine learning-based tool for early detection of pulmonary embolism</atitle><jtitle>Computer methods and programs in biomedicine</jtitle><addtitle>Comput Methods Programs Biomed</addtitle><date>2024-02</date><risdate>2024</risdate><volume>244</volume><spage>107977</spage><epage>107977</epage><pages>107977-107977</pages><artnum>107977</artnum><issn>0169-2607</issn><eissn>1872-7565</eissn><abstract>Pulmonary embolism (PE) is a complex disease with high mortality and morbidity 
rate, leading to increasing society burden. However, current diagnosis is solely based on symptoms and laboratory data despite its complex pathology, which easily leads to misdiagnosis and missed diagnosis by inexperienced doctors. Especially, CT pulmonary angiography, the gold standard method, is not widely available. In this study, we aim to establish a rapid and accurate screening model for pulmonary embolism using machine learning technology. Importantly, data required for disease prediction are easily accessed, including routine laboratory data and medical record information of patients. We extracted features from patients' routine laboratory results and medical records, including blood routine, biochemical group, blood coagulation routine and other test results, as well as symptoms and medical history information. Samples with a feature loss rate greater than 0.8 were deleted from the original database. Data from 4723 cases were retained, 231 of which were positive for pulmonary embolism. 50 features were retained through the positive and negative statistical hypothesis testing which was used to build the predictive model. In order to avoid identification as majority-class samples caused by the imbalance of sample proportion, we used the method of Synthetic Minority Oversampling Technique (SMOTE) to increase the amount of information on minority samples. Five typical machine learning algorithms were used to model the screening of pulmonary embolism, including Support Vector Machines, Logistic Regression, Random Forest, XGBoost, and Back Propagation Neural Networks. To evaluate model performance, sensitivity, specificity and AUC curve were analyzed as the main evaluation indicators. Furthermore, a baseline model was established using the characteristics of the pulmonary embolism guidelines as a comparison model. We found that XGBoost showed better performance compared to other models, with the highest sensitivity and specificity (0.99 and 0.99, respectively). 
Moreover, it showed significant improvement in performance compared to the baseline model (sensitivity and specificity were 0.76 and 0.76 respectively). More important, our model showed low missed diagnosis rate (0.46) and high AUC value (0.992). Finally, the calculation time of our model is only about 0.05 s to obtain the possibility of pulmonary embolism. In this study, five machine learning classification models were established to assess the likelihood of patients suffering from pulmonary embolism, and the XGBoost model most significantly improved the precision, sensitivity, and AUC for pulmonary embolism screening. Collectively, we have established an AI-based model to accurately predict pulmonary embolism at early stage.</abstract><cop>Ireland</cop><pmid>38113803</pmid><doi>10.1016/j.cmpb.2023.107977</doi><tpages>1</tpages><orcidid>https://orcid.org/0009-0005-6419-2835</orcidid><orcidid>https://orcid.org/0000-0002-3457-4147</orcidid><orcidid>https://orcid.org/0000-0003-4033-1843</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 0169-2607
ispartof Computer methods and programs in biomedicine, 2024-02, Vol.244, p.107977-107977, Article 107977
issn 0169-2607
1872-7565
language eng
recordid cdi_proquest_miscellaneous_2904153503
source ScienceDirect Journals (5 years ago - present)
title Establishment of machine learning-based tool for early detection of pulmonary embolism
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T20%3A27%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Establishment%20of%20machine%20learning-based%20tool%20for%20early%20detection%20of%20pulmonary%20embolism&rft.jtitle=Computer%20methods%20and%20programs%20in%20biomedicine&rft.au=Liu,%20Lijue&rft.date=2024-02&rft.volume=244&rft.spage=107977&rft.epage=107977&rft.pages=107977-107977&rft.artnum=107977&rft.issn=0169-2607&rft.eissn=1872-7565&rft_id=info:doi/10.1016/j.cmpb.2023.107977&rft_dat=%3Cproquest_cross%3E2904153503%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2904153503&rft_id=info:pmid/38113803&rfr_iscdi=true