Improving Biomedical Question Answering by Data Augmentation and Model Weighting
Biomedical question answering aims to extract the answer to a given question from a biomedical context. Because the domain demands specialized expertise, it is harder to build large-scale datasets for domain-specific question answering. Existing methods are limited by the lack of tr...
Published in: | IEEE/ACM transactions on computational biology and bioinformatics 2023-03, Vol.20 (2), p.1114-1124 |
---|---|
Main Authors: | Du, Yongping; Yan, Jingya; Lu, Yuxuan; Zhao, Yiliang; Jin, Xingnan |
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
container_end_page | 1124 |
---|---|
container_issue | 2 |
container_start_page | 1114 |
container_title | IEEE/ACM transactions on computational biology and bioinformatics |
container_volume | 20 |
creator | Du, Yongping Yan, Jingya Lu, Yuxuan Zhao, Yiliang Jin, Xingnan |
description | Biomedical question answering aims to extract the answer to a given question from a biomedical context. Because the domain demands specialized expertise, it is harder to build large-scale datasets for domain-specific question answering. Existing methods are limited by the lack of training data, and their performance falls short of that in open-domain settings, degrading especially on adversarial samples. We address these issues as follows. First, effective data augmentation strategies are adopted to improve model training, including sliding window, summarization, and round-trip translation. Second, we propose a model weighting strategy for final answer prediction in the biomedical domain, which combines the advantages of two models: the open-domain model QANet and BioBERT, which is pre-trained on biomedical domain data. Finally, we apply adversarial training to reinforce the robustness of the model. The public biomedical dataset collected from PubMed and provided by the BioASQ challenge is used to evaluate our approach. The results show that model performance improves significantly compared with the single models and with other models that participated in the BioASQ challenge. The model can learn richer semantic expression from data augmentation and adversarial samples, which helps solve more complex question answering problems in the biomedical domain. |
doi_str_mv | 10.1109/TCBB.2022.3171388 |
format | Article |
fullrecord | PMID: 35486563; CODEN: ITCBCY; Publisher: IEEE (United States); Rights: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023; Peer reviewed; Total pages: 11; ORCID: https://orcid.org/0000-0003-0373-1696, https://orcid.org/0000-0002-8520-0540; IEEE Xplore: https://ieeexplore.ieee.org/document/9765710; PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35486563 |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1545-5963 |
ispartof | IEEE/ACM transactions on computational biology and bioinformatics, 2023-03, Vol.20 (2), p.1114-1124 |
issn | 1545-5963 1557-9964 |
language | eng |
recordid | cdi_proquest_miscellaneous_2658227528 |
source | IEEE Electronic Library (IEL) |
subjects | Biological system modeling Biomedical data Biomedical question answering Context modeling Data augmentation Data models Datasets deep learning Domains Machine Learning model weighting Predictive models Questions Semantics Task analysis Training Training data Weighting |
title | Improving Biomedical Question Answering by Data Augmentation and Model Weighting |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T23%3A08%3A02IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Improving%20Biomedical%20Question%20Answering%20by%20Data%20Augmentation%20and%20Model%20Weighting&rft.jtitle=IEEE/ACM%20transactions%20on%20computational%20biology%20and%20bioinformatics&rft.au=Du,%20Yongping&rft.date=2023-03&rft.volume=20&rft.issue=2&rft.spage=1114&rft.epage=1124&rft.pages=1114-1124&rft.issn=1545-5963&rft.eissn=1557-9964&rft.coden=ITCBCY&rft_id=info:doi/10.1109/TCBB.2022.3171388&rft_dat=%3Cproquest_RIE%3E2795805645%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2795805645&rft_id=info:pmid/35486563&rft_ieee_id=9765710&rfr_iscdi=true |