An Algorithm of Query Expansion for Chinese EMR Retrieval by Improving Expansion Term Weights and Retrieval Scores

Query expansion (QE) has been widely used in electronic medical record (EMR) retrieval for assisted diagnosis and clinical research. However, existing QE algorithms haven't achieved satisfactory performance in Chinese EMR retrieval, and one noticeable problem is that the weights of expansion te...

Bibliographic Details
Published in: IEEE Access, 2020, Vol. 8, p. 200063-200072
Main authors: Yang, Songchun; Zheng, Xiangwen; Yin, Xiangfei; Mao, Huajian; Zhao, Dongsheng
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page 200072
container_issue
container_start_page 200063
container_title IEEE access
container_volume 8
creator Yang, Songchun
Zheng, Xiangwen
Yin, Xiangfei
Mao, Huajian
Zhao, Dongsheng
description Query expansion (QE) has been widely used in electronic medical record (EMR) retrieval for assisted diagnosis and clinical research. However, existing QE algorithms haven't achieved satisfactory performance in Chinese EMR retrieval, and one noticeable problem is that the weights of expansion terms and retrieval scores have unreasonable factors for lack of the solid consideration of clinical needs. Here we propose an algorithm of QE for Chinese EMR retrieval by improving expansion term weights and retrieval scores. First, the weights of expansion terms are assigned with semantic similarities, category weights and co-occurrence frequencies between expansion terms and multiple query terms. Then the retrieval scores calculated by expansion terms are limited to reduce the query drift caused by high-frequency expansion terms. Experiment results show that our method gets a 33.3% increase in the precision at top 10, a 90.4% increase in the recall, and a 13.2% increase in MAP compared with four baselines. It proves that our improvement scheme can ensure the accuracy of expansion term weights and decrease the query drift caused by QE, which substantially improves the performance of Chinese EMR retrieval.
doi_str_mv 10.1109/ACCESS.2020.3033017
format Article
fullrecord <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_proquest_journals_2460159029</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9250729</ieee_id><doaj_id>oai_doaj_org_article_3603858544014416b0ecaa74799f653e</doaj_id><sourcerecordid>2460159029</sourcerecordid><originalsourceid>FETCH-LOGICAL-c408t-b216f35c50525ca4e9644cdfc493986a2cecabdd53c82e985f3f122690f2563</originalsourceid><addsrcrecordid>eNpNkU1rGzEQhpeSQEKaX5CLoGc7o89dHc3iNoaUkjiQo5C1I1vGXrnSOsT_vko3BOsiMczzzIi3qu4oTCkFfT9r2_lyOWXAYMqBc6D1t-qaUaUnXHJ1cfa-qm5z3kI5TSnJ-rpKs57MduuYwrDZk-jJ0xHTiczfD7bPIfbEx0TaTegxI5n_fibPOKSAb3ZHViey2B9SfAv9-gx4wbQnrxjWmyET23dnxNLFhPl7dentLuPt531TLX_OX9qHyeOfX4t29jhxApphsipbey6dBMmkswK1EsJ13gnNdaMsc-jsquskdw1D3UjPPWVMafBMKn5TLUZrF-3WHFLY23Qy0QbzvxDT2tg0BLdDwxXwRjZSCKBCULWCYra1qLX2SnIsrh-jq_z27xHzYLbxmPqyvGFCAZUamC5dfOxyKeac0H9NpWA-kjJjUuYjKfOZVKHuRiog4hehmYS6OP8BBziOHQ</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2460159029</pqid></control><display><type>article</type><title>An Algorithm of Query Expansion for Chinese EMR Retrieval by Improving Expansion Term Weights and Retrieval Scores</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>Yang, Songchun ; Zheng, Xiangwen ; Yin, Xiangfei ; Mao, Huajian ; Zhao, Dongsheng</creator><creatorcontrib>Yang, Songchun ; Zheng, Xiangwen ; Yin, Xiangfei ; Mao, Huajian ; Zhao, Dongsheng</creatorcontrib><description>Query expansion (QE) has been widely used in electronic medical record (EMR) retrieval for assisted diagnosis and clinical research. 
However, existing QE algorithms haven't achieved satisfactory performance in Chinese EMR retrieval, and one noticeable problem is that the weights of expansion terms and retrieval scores have unreasonable factors for lack of the solid consideration of clinical needs. Here we propose an algorithm of QE for Chinese EMR retrieval by improving expansion term weights and retrieval scores. First, the weights of expansion terms are assigned with semantic similarities, category weights and co-occurrence frequencies between expansion terms and multiple query terms. Then the retrieval scores calculated by expansion terms are limited to reduce the query drift caused by high-frequency expansion terms. Experiment results show that our method gets a 33.3% increase in the precision at top 10, a 90.4% increase in the recall, and a 13.2% increase in MAP compared with four baselines. It proves that our improvement scheme can ensure the accuracy of expansion term weights and decrease the query drift caused by QE, which substantially improves the performance of Chinese EMR retrieval.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2020.3033017</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Algorithms ; BM25 ; co-occurrence ; Drift ; Electronic health records ; Electronic medical record ; Electronic medical records ; Medical research ; Performance enhancement ; Queries ; Query expansion ; Retrieval ; Semantics ; Solids ; word2Vec</subject><ispartof>IEEE access, 2020, Vol.8, p.200063-200072</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2020</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c408t-b216f35c50525ca4e9644cdfc493986a2cecabdd53c82e985f3f122690f2563</citedby><cites>FETCH-LOGICAL-c408t-b216f35c50525ca4e9644cdfc493986a2cecabdd53c82e985f3f122690f2563</cites><orcidid>0000-0001-7940-0514 ; 0000-0002-1139-7889 ; 0000-0002-5609-6270 ; 0000-0003-2616-8891 ; 0000-0002-8424-0372</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9250729$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,864,2102,4024,27633,27923,27924,27925,54933</link.rule.ids></links><search><creatorcontrib>Yang, Songchun</creatorcontrib><creatorcontrib>Zheng, Xiangwen</creatorcontrib><creatorcontrib>Yin, Xiangfei</creatorcontrib><creatorcontrib>Mao, Huajian</creatorcontrib><creatorcontrib>Zhao, Dongsheng</creatorcontrib><title>An Algorithm of Query Expansion for Chinese EMR Retrieval by Improving Expansion Term Weights and Retrieval Scores</title><title>IEEE access</title><addtitle>Access</addtitle><description>Query expansion (QE) has been widely used in electronic medical record (EMR) retrieval for assisted diagnosis and clinical research. However, existing QE algorithms haven't achieved satisfactory performance in Chinese EMR retrieval, and one noticeable problem is that the weights of expansion terms and retrieval scores have unreasonable factors for lack of the solid consideration of clinical needs. Here we propose an algorithm of QE for Chinese EMR retrieval by improving expansion term weights and retrieval scores. First, the weights of expansion terms are assigned with semantic similarities, category weights and co-occurrence frequencies between expansion terms and multiple query terms. 
Then the retrieval scores calculated by expansion terms are limited to reduce the query drift caused by high-frequency expansion terms. Experiment results show that our method gets a 33.3% increase in the precision at top 10, a 90.4% increase in the recall, and a 13.2% increase in MAP compared with four baselines. It proves that our improvement scheme can ensure the accuracy of expansion term weights and decrease the query drift caused by QE, which substantially improves the performance of Chinese EMR retrieval.</description><subject>Algorithms</subject><subject>BM25</subject><subject>co-occurrence</subject><subject>Drift</subject><subject>Electronic health records</subject><subject>Electronic medical record</subject><subject>Electronic medical records</subject><subject>Medical research</subject><subject>Performance enhancement</subject><subject>Queries</subject><subject>Query expansion</subject><subject>Retrieval</subject><subject>Semantics</subject><subject>Solids</subject><subject>word2Vec</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNkU1rGzEQhpeSQEKaX5CLoGc7o89dHc3iNoaUkjiQo5C1I1vGXrnSOsT_vko3BOsiMczzzIi3qu4oTCkFfT9r2_lyOWXAYMqBc6D1t-qaUaUnXHJ1cfa-qm5z3kI5TSnJ-rpKs57MduuYwrDZk-jJ0xHTiczfD7bPIfbEx0TaTegxI5n_fibPOKSAb3ZHViey2B9SfAv9-gx4wbQnrxjWmyET23dnxNLFhPl7dentLuPt531TLX_OX9qHyeOfX4t29jhxApphsipbey6dBMmkswK1EsJ13gnNdaMsc-jsquskdw1D3UjPPWVMafBMKn5TLUZrF-3WHFLY23Qy0QbzvxDT2tg0BLdDwxXwRjZSCKBCULWCYra1qLX2SnIsrh-jq_z27xHzYLbxmPqyvGFCAZUamC5dfOxyKeac0H9NpWA-kjJjUuYjKfOZVKHuRiog4hehmYS6OP8BBziOHQ</recordid><startdate>2020</startdate><enddate>2020</enddate><creator>Yang, Songchun</creator><creator>Zheng, Xiangwen</creator><creator>Yin, Xiangfei</creator><creator>Mao, Huajian</creator><creator>Zhao, Dongsheng</creator><general>IEEE</general><general>The 
Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0001-7940-0514</orcidid><orcidid>https://orcid.org/0000-0002-1139-7889</orcidid><orcidid>https://orcid.org/0000-0002-5609-6270</orcidid><orcidid>https://orcid.org/0000-0003-2616-8891</orcidid><orcidid>https://orcid.org/0000-0002-8424-0372</orcidid></search><sort><creationdate>2020</creationdate><title>An Algorithm of Query Expansion for Chinese EMR Retrieval by Improving Expansion Term Weights and Retrieval Scores</title><author>Yang, Songchun ; Zheng, Xiangwen ; Yin, Xiangfei ; Mao, Huajian ; Zhao, Dongsheng</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c408t-b216f35c50525ca4e9644cdfc493986a2cecabdd53c82e985f3f122690f2563</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Algorithms</topic><topic>BM25</topic><topic>co-occurrence</topic><topic>Drift</topic><topic>Electronic health records</topic><topic>Electronic medical record</topic><topic>Electronic medical records</topic><topic>Medical research</topic><topic>Performance enhancement</topic><topic>Queries</topic><topic>Query expansion</topic><topic>Retrieval</topic><topic>Semantics</topic><topic>Solids</topic><topic>word2Vec</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Yang, Songchun</creatorcontrib><creatorcontrib>Zheng, Xiangwen</creatorcontrib><creatorcontrib>Yin, Xiangfei</creatorcontrib><creatorcontrib>Mao, Huajian</creatorcontrib><creatorcontrib>Zhao, Dongsheng</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 
2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Yang, Songchun</au><au>Zheng, Xiangwen</au><au>Yin, Xiangfei</au><au>Mao, Huajian</au><au>Zhao, Dongsheng</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>An Algorithm of Query Expansion for Chinese EMR Retrieval by Improving Expansion Term Weights and Retrieval Scores</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2020</date><risdate>2020</risdate><volume>8</volume><spage>200063</spage><epage>200072</epage><pages>200063-200072</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>Query expansion (QE) has been widely used in electronic medical record (EMR) retrieval for assisted diagnosis and clinical research. 
However, existing QE algorithms haven't achieved satisfactory performance in Chinese EMR retrieval, and one noticeable problem is that the weights of expansion terms and retrieval scores have unreasonable factors for lack of the solid consideration of clinical needs. Here we propose an algorithm of QE for Chinese EMR retrieval by improving expansion term weights and retrieval scores. First, the weights of expansion terms are assigned with semantic similarities, category weights and co-occurrence frequencies between expansion terms and multiple query terms. Then the retrieval scores calculated by expansion terms are limited to reduce the query drift caused by high-frequency expansion terms. Experiment results show that our method gets a 33.3% increase in the precision at top 10, a 90.4% increase in the recall, and a 13.2% increase in MAP compared with four baselines. It proves that our improvement scheme can ensure the accuracy of expansion term weights and decrease the query drift caused by QE, which substantially improves the performance of Chinese EMR retrieval.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2020.3033017</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0001-7940-0514</orcidid><orcidid>https://orcid.org/0000-0002-1139-7889</orcidid><orcidid>https://orcid.org/0000-0002-5609-6270</orcidid><orcidid>https://orcid.org/0000-0003-2616-8891</orcidid><orcidid>https://orcid.org/0000-0002-8424-0372</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2020, Vol.8, p.200063-200072
issn 2169-3536
2169-3536
language eng
recordid cdi_proquest_journals_2460159029
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals
subjects Algorithms
BM25
co-occurrence
Drift
Electronic health records
Electronic medical record
Electronic medical records
Medical research
Performance enhancement
Queries
Query expansion
Retrieval
Semantics
Solids
word2Vec
title An Algorithm of Query Expansion for Chinese EMR Retrieval by Improving Expansion Term Weights and Retrieval Scores
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T02%3A09%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20Algorithm%20of%20Query%20Expansion%20for%20Chinese%20EMR%20Retrieval%20by%20Improving%20Expansion%20Term%20Weights%20and%20Retrieval%20Scores&rft.jtitle=IEEE%20access&rft.au=Yang,%20Songchun&rft.date=2020&rft.volume=8&rft.spage=200063&rft.epage=200072&rft.pages=200063-200072&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2020.3033017&rft_dat=%3Cproquest_ieee_%3E2460159029%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2460159029&rft_id=info:pmid/&rft_ieee_id=9250729&rft_doaj_id=oai_doaj_org_article_3603858544014416b0ecaa74799f653e&rfr_iscdi=true