Improving Biomedical Question Answering by Data Augmentation and Model Weighting

Biomedical question answering aims to extract the answer to a given question from a biomedical context. Because the biomedical domain is highly specialized, large-scale datasets for domain-specific question answering are difficult to build. Existing methods are limited by the lack of tr...

Bibliographic details
Published in: IEEE/ACM transactions on computational biology and bioinformatics 2023-03, Vol.20 (2), p.1114-1124
Main authors: Du, Yongping; Yan, Jingya; Lu, Yuxuan; Zhao, Yiliang; Jin, Xingnan
Format: Article
Language: eng
container_end_page 1124
container_issue 2
container_start_page 1114
container_title IEEE/ACM transactions on computational biology and bioinformatics
container_volume 20
creator Du, Yongping
Yan, Jingya
Lu, Yuxuan
Zhao, Yiliang
Jin, Xingnan
description Biomedical question answering aims to extract the answer to a given question from a biomedical context. Because the biomedical domain is highly specialized, large-scale datasets for domain-specific question answering are difficult to build. Existing methods are limited by the lack of training data, and their performance is not as good as in open-domain settings; it degrades especially on adversarial samples. We try to resolve these issues. First, effective data augmentation strategies are adopted to improve model training, including sliding window, summarization, and round-trip translation. Second, we propose a model weighting strategy for the final answer prediction in the biomedical domain, which combines the advantages of two models: the open-domain model QANet and BioBERT, which is pre-trained on biomedical domain data. Finally, we apply adversarial training to reinforce the robustness of the model. The public biomedical dataset collected from PubMed and provided by the BioASQ challenge is used to evaluate our approach. The results show that model performance improves significantly compared to a single model and to other models that participated in the BioASQ challenge. The model can learn richer semantic representations from data augmentation and adversarial samples, which helps solve more complex question answering problems in the biomedical domain.
doi_str_mv 10.1109/TCBB.2022.3171388
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_miscellaneous_2658227528</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9765710</ieee_id><sourcerecordid>2795805645</sourcerecordid><originalsourceid>FETCH-LOGICAL-c349t-f0cb40ad9262d92453233e34ef1541ae2eee297a35bca4eb7e9d2bf8664713143</originalsourceid><addsrcrecordid>eNpdkF1LwzAUhoMobk5_gAhS8Mabznynudzm10BRYeJlSNvT2dEPbVpl_97UzV14kwTOc07e8yB0SvCYEKyvFrPpdEwxpWNGFGFRtIeGRAgVai35fv_mIhRasgE6cm6FMeUa80M0YIJHUkg2RM_z8qOpv_JqGUzzuoQ0T2wRvHTg2ryugknlvqHpq_E6uLatDSbdsoSqtb9lW6XBY51CEbxBvnxvPXiMDjJbODjZ3iP0enuzmN2HD09389nkIUwY122Y4STm2KaaSuoPLhhlDBiHzIcmFigAUK0sE3FiOcQKdErjLJKS-00JZyN0uZnr43_2cU2ZuwSKwlZQd85QKSJKlaCRRy_-oau6ayqfzlClRYSF9P-PENlQSVM710BmPpq8tM3aEGx63abXbXrdZqvb95xvJ3exd7fr-PPrgbMNkPt9dmWtpFAEsx-XToKh</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2795805645</pqid></control><display><type>article</type><title>Improving Biomedical Question Answering by Data Augmentation and Model Weighting</title><source>IEEE Electronic Library (IEL)</source><creator>Du, Yongping ; Yan, Jingya ; Lu, Yuxuan ; Zhao, Yiliang ; Jin, Xingnan</creator><creatorcontrib>Du, Yongping ; Yan, Jingya ; Lu, Yuxuan ; Zhao, Yiliang ; Jin, Xingnan</creatorcontrib><description>Biomedical Question Answering aims to extract an answer to the given question from a biomedical context. Due to the strong professionalism of specific domain, it's more difficult to build large-scale datasets for specific domain question answering. Existing methods are limited by the lack of training data, and the performance is not as good as in open-domain settings, especially degrading when facing to the adversarial sample. We try to resolve the above issues. First, effective data augmentation strategies are adopted to improve the model training, including slide window, summarization and round-trip translation. 
Second, we propose a model weighting strategy for the final answer prediction in biomedical domain, which combines the advantage of two models, open-domain model QANet and BioBERT pre-trained in biomedical domain data. Finally, we give adversarial training to reinforce the robustness of the model. The public biomedical dataset collected from PubMed provided by BioASQ challenge is used to evaluate our approach. The results show that the model performance has been improved significantly compared to the single model and other models participated in BioASQ challenge. It can learn richer semantic expression from data augmentation and adversarial samples, which is beneficial to solve more complex question answering problems in biomedical domain.</description><identifier>ISSN: 1545-5963</identifier><identifier>EISSN: 1557-9964</identifier><identifier>DOI: 10.1109/TCBB.2022.3171388</identifier><identifier>PMID: 35486563</identifier><identifier>CODEN: ITCBCY</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>Biological system modeling ; Biomedical data ; Biomedical question answering ; Context modeling ; Data augmentation ; Data models ; Datasets ; deep learning ; Domains ; Machine Learning ; model weighting ; Predictive models ; Questions ; Semantics ; Task analysis ; Training ; Training data ; Weighting</subject><ispartof>IEEE/ACM transactions on computational biology and bioinformatics, 2023-03, Vol.20 (2), p.1114-1124</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2023</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c349t-f0cb40ad9262d92453233e34ef1541ae2eee297a35bca4eb7e9d2bf8664713143</citedby><cites>FETCH-LOGICAL-c349t-f0cb40ad9262d92453233e34ef1541ae2eee297a35bca4eb7e9d2bf8664713143</cites><orcidid>0000-0003-0373-1696 ; 0000-0002-8520-0540</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9765710$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9765710$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/35486563$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Du, Yongping</creatorcontrib><creatorcontrib>Yan, Jingya</creatorcontrib><creatorcontrib>Lu, Yuxuan</creatorcontrib><creatorcontrib>Zhao, Yiliang</creatorcontrib><creatorcontrib>Jin, Xingnan</creatorcontrib><title>Improving Biomedical Question Answering by Data Augmentation and Model Weighting</title><title>IEEE/ACM transactions on computational biology and bioinformatics</title><addtitle>TCBB</addtitle><addtitle>IEEE/ACM Trans Comput Biol Bioinform</addtitle><description>Biomedical Question Answering aims to extract an answer to the given question from a biomedical context. Due to the strong professionalism of specific domain, it's more difficult to build large-scale datasets for specific domain question answering. Existing methods are limited by the lack of training data, and the performance is not as good as in open-domain settings, especially degrading when facing to the adversarial sample. We try to resolve the above issues. 
First, effective data augmentation strategies are adopted to improve the model training, including slide window, summarization and round-trip translation. Second, we propose a model weighting strategy for the final answer prediction in biomedical domain, which combines the advantage of two models, open-domain model QANet and BioBERT pre-trained in biomedical domain data. Finally, we give adversarial training to reinforce the robustness of the model. The public biomedical dataset collected from PubMed provided by BioASQ challenge is used to evaluate our approach. The results show that the model performance has been improved significantly compared to the single model and other models participated in BioASQ challenge. It can learn richer semantic expression from data augmentation and adversarial samples, which is beneficial to solve more complex question answering problems in biomedical domain.</description><subject>Biological system modeling</subject><subject>Biomedical data</subject><subject>Biomedical question answering</subject><subject>Context modeling</subject><subject>Data augmentation</subject><subject>Data models</subject><subject>Datasets</subject><subject>deep learning</subject><subject>Domains</subject><subject>Machine Learning</subject><subject>model weighting</subject><subject>Predictive models</subject><subject>Questions</subject><subject>Semantics</subject><subject>Task analysis</subject><subject>Training</subject><subject>Training 
data</subject><subject>Weighting</subject><issn>1545-5963</issn><issn>1557-9964</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><sourceid>EIF</sourceid><recordid>eNpdkF1LwzAUhoMobk5_gAhS8Mabznynudzm10BRYeJlSNvT2dEPbVpl_97UzV14kwTOc07e8yB0SvCYEKyvFrPpdEwxpWNGFGFRtIeGRAgVai35fv_mIhRasgE6cm6FMeUa80M0YIJHUkg2RM_z8qOpv_JqGUzzuoQ0T2wRvHTg2ryugknlvqHpq_E6uLatDSbdsoSqtb9lW6XBY51CEbxBvnxvPXiMDjJbODjZ3iP0enuzmN2HD09389nkIUwY122Y4STm2KaaSuoPLhhlDBiHzIcmFigAUK0sE3FiOcQKdErjLJKS-00JZyN0uZnr43_2cU2ZuwSKwlZQd85QKSJKlaCRRy_-oau6ayqfzlClRYSF9P-PENlQSVM710BmPpq8tM3aEGx63abXbXrdZqvb95xvJ3exd7fr-PPrgbMNkPt9dmWtpFAEsx-XToKh</recordid><startdate>202303</startdate><enddate>202303</enddate><creator>Du, Yongping</creator><creator>Yan, Jingya</creator><creator>Lu, Yuxuan</creator><creator>Zhao, Yiliang</creator><creator>Jin, Xingnan</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QF</scope><scope>7QO</scope><scope>7QQ</scope><scope>7SC</scope><scope>7SE</scope><scope>7SP</scope><scope>7SR</scope><scope>7TA</scope><scope>7TB</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>JG9</scope><scope>JQ2</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P64</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0003-0373-1696</orcidid><orcidid>https://orcid.org/0000-0002-8520-0540</orcidid></search><sort><creationdate>202303</creationdate><title>Improving Biomedical Question Answering by Data Augmentation and Model Weighting</title><author>Du, Yongping ; Yan, Jingya ; Lu, Yuxuan ; Zhao, Yiliang ; Jin, Xingnan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c349t-f0cb40ad9262d92453233e34ef1541ae2eee297a35bca4eb7e9d2bf8664713143</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Biological system modeling</topic><topic>Biomedical data</topic><topic>Biomedical question answering</topic><topic>Context modeling</topic><topic>Data augmentation</topic><topic>Data models</topic><topic>Datasets</topic><topic>deep learning</topic><topic>Domains</topic><topic>Machine Learning</topic><topic>model weighting</topic><topic>Predictive models</topic><topic>Questions</topic><topic>Semantics</topic><topic>Task analysis</topic><topic>Training</topic><topic>Training data</topic><topic>Weighting</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Du, Yongping</creatorcontrib><creatorcontrib>Yan, Jingya</creatorcontrib><creatorcontrib>Lu, Yuxuan</creatorcontrib><creatorcontrib>Zhao, 
Yiliang</creatorcontrib><creatorcontrib>Jin, Xingnan</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Aluminium Industry Abstracts</collection><collection>Biotechnology Research Abstracts</collection><collection>Ceramic Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Corrosion Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Materials Business File</collection><collection>Mechanical &amp; Transportation Engineering Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology &amp; Engineering</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE/ACM transactions on computational biology and bioinformatics</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Du, Yongping</au><au>Yan, Jingya</au><au>Lu, Yuxuan</au><au>Zhao, Yiliang</au><au>Jin, Xingnan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Improving Biomedical Question Answering by Data Augmentation and Model Weighting</atitle><jtitle>IEEE/ACM transactions on computational biology and bioinformatics</jtitle><stitle>TCBB</stitle><addtitle>IEEE/ACM Trans Comput Biol Bioinform</addtitle><date>2023-03</date><risdate>2023</risdate><volume>20</volume><issue>2</issue><spage>1114</spage><epage>1124</epage><pages>1114-1124</pages><issn>1545-5963</issn><eissn>1557-9964</eissn><coden>ITCBCY</coden><abstract>Biomedical Question Answering aims to extract an answer to the given question from a biomedical context. Due to the strong professionalism of specific domain, it's more difficult to build large-scale datasets for specific domain question answering. Existing methods are limited by the lack of training data, and the performance is not as good as in open-domain settings, especially degrading when facing to the adversarial sample. We try to resolve the above issues. First, effective data augmentation strategies are adopted to improve the model training, including slide window, summarization and round-trip translation. Second, we propose a model weighting strategy for the final answer prediction in biomedical domain, which combines the advantage of two models, open-domain model QANet and BioBERT pre-trained in biomedical domain data. Finally, we give adversarial training to reinforce the robustness of the model. The public biomedical dataset collected from PubMed provided by BioASQ challenge is used to evaluate our approach. The results show that the model performance has been improved significantly compared to the single model and other models participated in BioASQ challenge. 
It can learn richer semantic expression from data augmentation and adversarial samples, which is beneficial to solve more complex question answering problems in biomedical domain.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>35486563</pmid><doi>10.1109/TCBB.2022.3171388</doi><tpages>11</tpages><orcidid>https://orcid.org/0000-0003-0373-1696</orcidid><orcidid>https://orcid.org/0000-0002-8520-0540</orcidid></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1545-5963
ispartof IEEE/ACM transactions on computational biology and bioinformatics, 2023-03, Vol.20 (2), p.1114-1124
issn 1545-5963
1557-9964
language eng
recordid cdi_proquest_miscellaneous_2658227528
source IEEE Electronic Library (IEL)
subjects Biological system modeling
Biomedical data
Biomedical question answering
Context modeling
Data augmentation
Data models
Datasets
deep learning
Domains
Machine Learning
model weighting
Predictive models
Questions
Semantics
Task analysis
Training
Training data
Weighting
title Improving Biomedical Question Answering by Data Augmentation and Model Weighting
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T23%3A08%3A02IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Improving%20Biomedical%20Question%20Answering%20by%20Data%20Augmentation%20and%20Model%20Weighting&rft.jtitle=IEEE/ACM%20transactions%20on%20computational%20biology%20and%20bioinformatics&rft.au=Du,%20Yongping&rft.date=2023-03&rft.volume=20&rft.issue=2&rft.spage=1114&rft.epage=1124&rft.pages=1114-1124&rft.issn=1545-5963&rft.eissn=1557-9964&rft.coden=ITCBCY&rft_id=info:doi/10.1109/TCBB.2022.3171388&rft_dat=%3Cproquest_RIE%3E2795805645%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2795805645&rft_id=info:pmid/35486563&rft_ieee_id=9765710&rfr_iscdi=true