Improving Biomedical Question Answering by Data Augmentation and Model Weighting

Biomedical Question Answering aims to extract an answer to a given question from a biomedical context. Because the domain requires specialized expertise, building large-scale datasets for domain-specific question answering is difficult. Existing methods are limited by the lack of tr...

Detailed description

Bibliographic details
Published in:	IEEE/ACM transactions on computational biology and bioinformatics 2023-03, Vol.20 (2), p.1114-1124
Main authors:	Du, Yongping; Yan, Jingya; Lu, Yuxuan; Zhao, Yiliang; Jin, Xingnan
Format:	Article
Language:	eng
Subjects:	Biological system modeling; Biomedical data; Biomedical question answering; Context modeling; Data augmentation; Data models; Datasets; deep learning; Domains; Machine Learning; model weighting; Predictive models; Questions; Semantics; Task analysis; Training; Training data; Weighting

container_end_page	1124
container_issue	2
container_start_page	1114
container_title	IEEE/ACM transactions on computational biology and bioinformatics
container_volume	20
creator	Du, Yongping; Yan, Jingya; Lu, Yuxuan; Zhao, Yiliang; Jin, Xingnan
description	Biomedical question answering aims to extract an answer to a given question from a biomedical context. Because the domain requires specialized expertise, building large-scale datasets for domain-specific question answering is difficult. Existing methods are limited by the lack of training data, and their performance falls short of that in open-domain settings, degrading especially on adversarial samples. We address these issues in three ways. First, we adopt data augmentation strategies to improve model training, including sliding windows, summarization, and round-trip translation. Second, we propose a model weighting strategy for final answer prediction in the biomedical domain, which combines the advantages of two models: the open-domain model QANet and BioBERT, which is pre-trained on biomedical domain data. Finally, we apply adversarial training to reinforce the robustness of the model. We evaluate our approach on the public biomedical dataset collected from PubMed and provided by the BioASQ challenge. The results show that model performance improves significantly compared with the single models and with other models that participated in the BioASQ challenge. The model learns richer semantic representations from data augmentation and adversarial samples, which helps to solve more complex question answering problems in the biomedical domain.
doi_str_mv	10.1109/TCBB.2022.3171388
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_miscellaneous_2658227528</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9765710</ieee_id><sourcerecordid>2795805645</sourcerecordid><originalsourceid>FETCH-LOGICAL-c349t-f0cb40ad9262d92453233e34ef1541ae2eee297a35bca4eb7e9d2bf8664713143</originalsourceid><addsrcrecordid>eNpdkF1LwzAUhoMobk5_gAhS8Mabznynudzm10BRYeJlSNvT2dEPbVpl_97UzV14kwTOc07e8yB0SvCYEKyvFrPpdEwxpWNGFGFRtIeGRAgVai35fv_mIhRasgE6cm6FMeUa80M0YIJHUkg2RM_z8qOpv_JqGUzzuoQ0T2wRvHTg2ryugknlvqHpq_E6uLatDSbdsoSqtb9lW6XBY51CEbxBvnxvPXiMDjJbODjZ3iP0enuzmN2HD09389nkIUwY122Y4STm2KaaSuoPLhhlDBiHzIcmFigAUK0sE3FiOcQKdErjLJKS-00JZyN0uZnr43_2cU2ZuwSKwlZQd85QKSJKlaCRRy_-oau6ayqfzlClRYSF9P-PENlQSVM710BmPpq8tM3aEGx63abXbXrdZqvb95xvJ3exd7fr-PPrgbMNkPt9dmWtpFAEsx-XToKh</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2795805645</pqid></control><display><type>article</type><title>Improving Biomedical Question Answering by Data Augmentation and Model Weighting</title><source>IEEE Electronic Library (IEL)</source><creator>Du, Yongping ; Yan, Jingya ; Lu, Yuxuan ; Zhao, Yiliang ; Jin, Xingnan</creator><creatorcontrib>Du, Yongping ; Yan, Jingya ; Lu, Yuxuan ; Zhao, Yiliang ; Jin, Xingnan</creatorcontrib><description>Biomedical Question Answering aims to extract an answer to the given question from a biomedical context. Due to the strong professionalism of specific domain, it's more difficult to build large-scale datasets for specific domain question answering. Existing methods are limited by the lack of training data, and the performance is not as good as in open-domain settings, especially degrading when facing to the adversarial sample. We try to resolve the above issues. First, effective data augmentation strategies are adopted to improve the model training, including slide window, summarization and round-trip translation. 
Second, we propose a model weighting strategy for the final answer prediction in biomedical domain, which combines the advantage of two models, open-domain model QANet and BioBERT pre-trained in biomedical domain data. Finally, we give adversarial training to reinforce the robustness of the model. The public biomedical dataset collected from PubMed provided by BioASQ challenge is used to evaluate our approach. The results show that the model performance has been improved significantly compared to the single model and other models participated in BioASQ challenge. It can learn richer semantic expression from data augmentation and adversarial samples, which is beneficial to solve more complex question answering problems in biomedical domain.</description><identifier>ISSN: 1545-5963</identifier><identifier>EISSN: 1557-9964</identifier><identifier>DOI: 10.1109/TCBB.2022.3171388</identifier><identifier>PMID: 35486563</identifier><identifier>CODEN: ITCBCY</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>Biological system modeling ; Biomedical data ; Biomedical question answering ; Context modeling ; Data augmentation ; Data models ; Datasets ; deep learning ; Domains ; Machine Learning ; model weighting ; Predictive models ; Questions ; Semantics ; Task analysis ; Training ; Training data ; Weighting</subject><ispartof>IEEE/ACM transactions on computational biology and bioinformatics, 2023-03, Vol.20 (2), p.1114-1124</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2023</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c349t-f0cb40ad9262d92453233e34ef1541ae2eee297a35bca4eb7e9d2bf8664713143</citedby><cites>FETCH-LOGICAL-c349t-f0cb40ad9262d92453233e34ef1541ae2eee297a35bca4eb7e9d2bf8664713143</cites><orcidid>0000-0003-0373-1696 ; 0000-0002-8520-0540</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9765710$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9765710$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/35486563$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Du, Yongping</creatorcontrib><creatorcontrib>Yan, Jingya</creatorcontrib><creatorcontrib>Lu, Yuxuan</creatorcontrib><creatorcontrib>Zhao, Yiliang</creatorcontrib><creatorcontrib>Jin, Xingnan</creatorcontrib><title>Improving Biomedical Question Answering by Data Augmentation and Model Weighting</title><title>IEEE/ACM transactions on computational biology and bioinformatics</title><addtitle>TCBB</addtitle><addtitle>IEEE/ACM Trans Comput Biol Bioinform</addtitle><description>Biomedical Question Answering aims to extract an answer to the given question from a biomedical context. Due to the strong professionalism of specific domain, it's more difficult to build large-scale datasets for specific domain question answering. Existing methods are limited by the lack of training data, and the performance is not as good as in open-domain settings, especially degrading when facing to the adversarial sample. We try to resolve the above issues. 
First, effective data augmentation strategies are adopted to improve the model training, including slide window, summarization and round-trip translation. Second, we propose a model weighting strategy for the final answer prediction in biomedical domain, which combines the advantage of two models, open-domain model QANet and BioBERT pre-trained in biomedical domain data. Finally, we give adversarial training to reinforce the robustness of the model. The public biomedical dataset collected from PubMed provided by BioASQ challenge is used to evaluate our approach. The results show that the model performance has been improved significantly compared to the single model and other models participated in BioASQ challenge. It can learn richer semantic expression from data augmentation and adversarial samples, which is beneficial to solve more complex question answering problems in biomedical domain.</description><subject>Biological system modeling</subject><subject>Biomedical data</subject><subject>Biomedical question answering</subject><subject>Context modeling</subject><subject>Data augmentation</subject><subject>Data models</subject><subject>Datasets</subject><subject>deep learning</subject><subject>Domains</subject><subject>Machine Learning</subject><subject>model weighting</subject><subject>Predictive models</subject><subject>Questions</subject><subject>Semantics</subject><subject>Task analysis</subject><subject>Training</subject><subject>Training 
data</subject><subject>Weighting</subject><issn>1545-5963</issn><issn>1557-9964</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><sourceid>EIF</sourceid><recordid>eNpdkF1LwzAUhoMobk5_gAhS8Mabznynudzm10BRYeJlSNvT2dEPbVpl_97UzV14kwTOc07e8yB0SvCYEKyvFrPpdEwxpWNGFGFRtIeGRAgVai35fv_mIhRasgE6cm6FMeUa80M0YIJHUkg2RM_z8qOpv_JqGUzzuoQ0T2wRvHTg2ryugknlvqHpq_E6uLatDSbdsoSqtb9lW6XBY51CEbxBvnxvPXiMDjJbODjZ3iP0enuzmN2HD09389nkIUwY122Y4STm2KaaSuoPLhhlDBiHzIcmFigAUK0sE3FiOcQKdErjLJKS-00JZyN0uZnr43_2cU2ZuwSKwlZQd85QKSJKlaCRRy_-oau6ayqfzlClRYSF9P-PENlQSVM710BmPpq8tM3aEGx63abXbXrdZqvb95xvJ3exd7fr-PPrgbMNkPt9dmWtpFAEsx-XToKh</recordid><startdate>202303</startdate><enddate>202303</enddate><creator>Du, Yongping</creator><creator>Yan, Jingya</creator><creator>Lu, Yuxuan</creator><creator>Zhao, Yiliang</creator><creator>Jin, Xingnan</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QF</scope><scope>7QO</scope><scope>7QQ</scope><scope>7SC</scope><scope>7SE</scope><scope>7SP</scope><scope>7SR</scope><scope>7TA</scope><scope>7TB</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>JG9</scope><scope>JQ2</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P64</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0003-0373-1696</orcidid><orcidid>https://orcid.org/0000-0002-8520-0540</orcidid></search><sort><creationdate>202303</creationdate><title>Improving Biomedical Question Answering by Data Augmentation and Model Weighting</title><author>Du, Yongping ; Yan, Jingya ; Lu, Yuxuan ; Zhao, Yiliang ; Jin, Xingnan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c349t-f0cb40ad9262d92453233e34ef1541ae2eee297a35bca4eb7e9d2bf8664713143</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Biological system modeling</topic><topic>Biomedical data</topic><topic>Biomedical question answering</topic><topic>Context modeling</topic><topic>Data augmentation</topic><topic>Data models</topic><topic>Datasets</topic><topic>deep learning</topic><topic>Domains</topic><topic>Machine Learning</topic><topic>model weighting</topic><topic>Predictive models</topic><topic>Questions</topic><topic>Semantics</topic><topic>Task analysis</topic><topic>Training</topic><topic>Training data</topic><topic>Weighting</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Du, Yongping</creatorcontrib><creatorcontrib>Yan, Jingya</creatorcontrib><creatorcontrib>Lu, Yuxuan</creatorcontrib><creatorcontrib>Zhao, 
Yiliang</creatorcontrib><creatorcontrib>Jin, Xingnan</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Aluminium Industry Abstracts</collection><collection>Biotechnology Research Abstracts</collection><collection>Ceramic Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Corrosion Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Materials Business File</collection><collection>Mechanical & Transportation Engineering Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE/ACM transactions on computational biology and bioinformatics</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Du, Yongping</au><au>Yan, Jingya</au><au>Lu, Yuxuan</au><au>Zhao, Yiliang</au><au>Jin, Xingnan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Improving Biomedical Question Answering by Data Augmentation and Model Weighting</atitle><jtitle>IEEE/ACM transactions on computational biology and bioinformatics</jtitle><stitle>TCBB</stitle><addtitle>IEEE/ACM Trans Comput Biol Bioinform</addtitle><date>2023-03</date><risdate>2023</risdate><volume>20</volume><issue>2</issue><spage>1114</spage><epage>1124</epage><pages>1114-1124</pages><issn>1545-5963</issn><eissn>1557-9964</eissn><coden>ITCBCY</coden><abstract>Biomedical Question Answering aims to extract an answer to the given question from a biomedical context. Due to the strong professionalism of specific domain, it's more difficult to build large-scale datasets for specific domain question answering. Existing methods are limited by the lack of training data, and the performance is not as good as in open-domain settings, especially degrading when facing to the adversarial sample. We try to resolve the above issues. First, effective data augmentation strategies are adopted to improve the model training, including slide window, summarization and round-trip translation. Second, we propose a model weighting strategy for the final answer prediction in biomedical domain, which combines the advantage of two models, open-domain model QANet and BioBERT pre-trained in biomedical domain data. Finally, we give adversarial training to reinforce the robustness of the model. The public biomedical dataset collected from PubMed provided by BioASQ challenge is used to evaluate our approach. The results show that the model performance has been improved significantly compared to the single model and other models participated in BioASQ challenge. 
It can learn richer semantic expression from data augmentation and adversarial samples, which is beneficial to solve more complex question answering problems in biomedical domain.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>35486563</pmid><doi>10.1109/TCBB.2022.3171388</doi><tpages>11</tpages><orcidid>https://orcid.org/0000-0003-0373-1696</orcidid><orcidid>https://orcid.org/0000-0002-8520-0540</orcidid></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1545-5963
ispartof	IEEE/ACM transactions on computational biology and bioinformatics, 2023-03, Vol.20 (2), p.1114-1124
issn	1545-5963 1557-9964
language	eng
recordid	cdi_proquest_miscellaneous_2658227528
source	IEEE Electronic Library (IEL)
subjects	Biological system modeling; Biomedical data; Biomedical question answering; Context modeling; Data augmentation; Data models; Datasets; deep learning; Domains; Machine Learning; model weighting; Predictive models; Questions; Semantics; Task analysis; Training; Training data; Weighting
title	Improving Biomedical Question Answering by Data Augmentation and Model Weighting
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T23%3A08%3A02IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Improving%20Biomedical%20Question%20Answering%20by%20Data%20Augmentation%20and%20Model%20Weighting&rft.jtitle=IEEE/ACM%20transactions%20on%20computational%20biology%20and%20bioinformatics&rft.au=Du,%20Yongping&rft.date=2023-03&rft.volume=20&rft.issue=2&rft.spage=1114&rft.epage=1124&rft.pages=1114-1124&rft.issn=1545-5963&rft.eissn=1557-9964&rft.coden=ITCBCY&rft_id=info:doi/10.1109/TCBB.2022.3171388&rft_dat=%3Cproquest_RIE%3E2795805645%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2795805645&rft_id=info:pmid/35486563&rft_ieee_id=9765710&rfr_iscdi=true
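
The abstract in this record names sliding-window augmentation and a model weighting strategy but gives no formulas for either. The Python sketch below is only an illustration of what these two ideas could look like, not the authors' implementation. Everything in it is an assumption: the function names `sliding_windows` and `weighted_answer`, the `size`, `stride`, and `weight_a` parameters, the choice of a linear interpolation of per-candidate scores, and the toy scores in the usage note.

```python
from typing import Dict, List


def sliding_windows(tokens: List[str], size: int, stride: int) -> List[List[str]]:
    """Split a context into overlapping windows of at most `size` tokens.

    Hypothetical helper: each window can serve as an extra training context.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the context
        start += stride
    return windows


def weighted_answer(
    scores_a: Dict[str, float],
    scores_b: Dict[str, float],
    weight_a: float,
) -> str:
    """Return the candidate with the highest weighted sum of two models' scores.

    Assumed scheme (not from the paper): a linear interpolation, where
    `weight_a` is applied to model A and `1 - weight_a` to model B.
    A candidate missing from one model counts as score 0.0 for that model.
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    candidates = sorted(set(scores_a) | set(scores_b))
    if not candidates:
        raise ValueError("no candidate answers given")

    def combined(candidate: str) -> float:
        return (weight_a * scores_a.get(candidate, 0.0)
                + (1.0 - weight_a) * scores_b.get(candidate, 0.0))

    return max(candidates, key=combined)
```

For example, `sliding_windows(list("abcdefg"), size=4, stride=2)` yields three overlapping windows, and `weighted_answer({"aspirin": 0.6, "ibuprofen": 0.4}, {"aspirin": 0.2, "ibuprofen": 0.8}, weight_a=0.5)` selects `"ibuprofen"`. In practice the weight would presumably be tuned on held-out data; the abstract does not say how.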