MarkerGenie: an NLP-enabled text-mining system for biomedical entity relation extraction

Natural language processing (NLP) tasks aim to convert unstructured text data (e.g. articles or dialogues) to structured information. In recent years, we have witnessed fundamental advances in NLP techniques, which have been widely used in many applications such as financial text mining, news recommen...

Bibliographic details
Published in:	Bioinformatics advances 2022, Vol.2 (1), p.vbac035-vbac035
Main authors:	Gu, Wenhao; Yang, Xiao; Yang, Minhao; Han, Kun; Pan, Wenying; Zhu, Zexuan
Format:	Article
Language:	eng
Subjects:	Original Paper
Online access:	Full text

container_end_page	vbac035
container_issue	1
container_start_page	vbac035
container_title	Bioinformatics advances
container_volume	2
creator	Gu, Wenhao; Yang, Xiao; Yang, Minhao; Han, Kun; Pan, Wenying; Zhu, Zexuan
description	Natural language processing (NLP) tasks aim to convert unstructured text data (e.g. articles or dialogues) to structured information. In recent years, we have witnessed fundamental advances in NLP techniques, which have been widely used in many applications such as financial text mining, news recommendation and machine translation. However, their application in the biomedical space remains challenging due to a lack of labeled data and the ambiguities and inconsistencies of biological terminology. In biomedical marker discovery studies, tools that rely on NLP models to automatically and accurately extract relations between biomedical entities are valuable because they can provide a more thorough survey of all available literature and hence a less biased result than manual curation. In addition, the speed of machine reading helps quickly orient research and development. To address the aforementioned needs, we developed automatic training data labeling, rule-based biological terminology cleaning and a more accurate NLP model for binary associative and multi-relation prediction, and integrated them into the MarkerGenie program. We demonstrated the effectiveness of the proposed methods in identifying relations between biomedical entities on various benchmark datasets and case studies. MarkerGenie is available at https://www.genegeniedx.com/markergenie/. Data for model training and evaluation, term lists of biomedical entities, details of the case studies and all trained models are provided at https://drive.google.com/drive/folders/14RypiIfIr3W_K-mNIAx9BNtObHSZoAyn?usp=sharing. Supplementary data are available online.
doi_str_mv	10.1093/bioadv/vbac035
format	Article
fullrecord	<record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9710573</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2769993822</sourcerecordid><originalsourceid>FETCH-LOGICAL-c390t-cb377083f91971b168f555229ada9174d1746378609d4c5a5cde92a015dd00f73</originalsourceid><addsrcrecordid>eNpVUU1LAzEUDKJoUa8eJUcv2yabZrPxIEjxC-rHQcFbeJtkNbqbrcm22H9vSmvRQ8iDN29mmEHohJIhJZKNKteBWYwWFWjC-A4a5AXjGSFjuvtnPkDHMX4QQnIhCjpm--iAFYWUrCwH6PUewqcNN9Y7e47B44fpU2Y9VI01uLfffdY67_wbjsvY2xbXXcBJtrXGaWiw9b3rlzjYBnrXeZwOAujVeIT2amiiPd78h-jl-up5cptNH2_uJpfTTDNJ-kxXTAhSslpSKWhFi7LmnOe5BAOSirFJr2CiLIg0Y82Ba2NlDoRyYwipBTtEF2ve2bxKrnRyFKBRs-BaCEvVgVP_N969q7duoZIc4YIlgrMNQei-5jb2qnVR26YBb7t5VLlIWaWw8jxBh2uoDl2MwdZbGUrUqhG1bkRtGkkHp3_NbeG_-bMf4heKhA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2769993822</pqid></control><display><type>article</type><title>MarkerGenie: an NLP-enabled text-mining system for biomedical entity relation extraction</title><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>Oxford Journals Open Access Collection</source><source>PubMed Central</source><creator>Gu, Wenhao ; Yang, Xiao ; Yang, Minhao ; Han, Kun ; Pan, Wenying ; Zhu, Zexuan</creator><contributor>Arighi, Cecilia</contributor><creatorcontrib>Gu, Wenhao ; Yang, Xiao ; Yang, Minhao ; Han, Kun ; Pan, Wenying ; Zhu, Zexuan ; Arighi, Cecilia</creatorcontrib><description>Natural language processing (NLP) tasks aim to convert unstructured text data (e.g. articles or dialogues) to structured information. In recent years, we have witnessed fundamental advances of NLP technique, which has been widely used in many applications such as financial text mining, news recommendation and machine translation. 
However, its application in the biomedical space remains challenging due to a lack of labeled data, ambiguities and inconsistencies of biological terminology. In biomedical marker discovery studies, tools that rely on NLP models to automatically and accurately extract relations of biomedical entities are valuable as they can provide a more thorough survey of all available literature, hence providing a less biased result compared to manual curation. In addition, the fast speed of machine reader helps quickly orient research and development. To address the aforementioned needs, we developed automatic training data labeling, rule-based biological terminology cleaning and a more accurate NLP model for binary associative and multi-relation prediction into the program. We demonstrated the effectiveness of the proposed methods in identifying relations between biomedical entities on various benchmark datasets and case studies. MarkerGenie is available at https://www.genegeniedx.com/markergenie/. Data for model training and evaluation, term lists of biomedical entities, details of the case studies and all trained models are provided at https://drive.google.com/drive/folders/14RypiIfIr3W_K-mNIAx9BNtObHSZoAyn?usp=sharing. Supplementary data are available at online.</description><identifier>ISSN: 2635-0041</identifier><identifier>EISSN: 2635-0041</identifier><identifier>DOI: 10.1093/bioadv/vbac035</identifier><identifier>PMID: 36699388</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Original Paper</subject><ispartof>Bioinformatics advances, 2022, Vol.2 (1), p.vbac035-vbac035</ispartof><rights>The Author(s) 2022. Published by Oxford University Press.</rights><rights>The Author(s) 2022. Published by Oxford University Press. 
2022</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c390t-cb377083f91971b168f555229ada9174d1746378609d4c5a5cde92a015dd00f73</citedby><cites>FETCH-LOGICAL-c390t-cb377083f91971b168f555229ada9174d1746378609d4c5a5cde92a015dd00f73</cites><orcidid>0000-0001-8479-6904</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710573/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710573/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,724,777,781,861,882,4010,27904,27905,27906,53772,53774</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/36699388$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Arighi, Cecilia</contributor><creatorcontrib>Gu, Wenhao</creatorcontrib><creatorcontrib>Yang, Xiao</creatorcontrib><creatorcontrib>Yang, Minhao</creatorcontrib><creatorcontrib>Han, Kun</creatorcontrib><creatorcontrib>Pan, Wenying</creatorcontrib><creatorcontrib>Zhu, Zexuan</creatorcontrib><title>MarkerGenie: an NLP-enabled text-mining system for biomedical entity relation extraction</title><title>Bioinformatics advances</title><addtitle>Bioinform Adv</addtitle><description>Natural language processing (NLP) tasks aim to convert unstructured text data (e.g. articles or dialogues) to structured information. In recent years, we have witnessed fundamental advances of NLP technique, which has been widely used in many applications such as financial text mining, news recommendation and machine translation. However, its application in the biomedical space remains challenging due to a lack of labeled data, ambiguities and inconsistencies of biological terminology. 
In biomedical marker discovery studies, tools that rely on NLP models to automatically and accurately extract relations of biomedical entities are valuable as they can provide a more thorough survey of all available literature, hence providing a less biased result compared to manual curation. In addition, the fast speed of machine reader helps quickly orient research and development. To address the aforementioned needs, we developed automatic training data labeling, rule-based biological terminology cleaning and a more accurate NLP model for binary associative and multi-relation prediction into the program. We demonstrated the effectiveness of the proposed methods in identifying relations between biomedical entities on various benchmark datasets and case studies. MarkerGenie is available at https://www.genegeniedx.com/markergenie/. Data for model training and evaluation, term lists of biomedical entities, details of the case studies and all trained models are provided at https://drive.google.com/drive/folders/14RypiIfIr3W_K-mNIAx9BNtObHSZoAyn?usp=sharing. 
Supplementary data are available at online.</description><subject>Original Paper</subject><issn>2635-0041</issn><issn>2635-0041</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNpVUU1LAzEUDKJoUa8eJUcv2yabZrPxIEjxC-rHQcFbeJtkNbqbrcm22H9vSmvRQ8iDN29mmEHohJIhJZKNKteBWYwWFWjC-A4a5AXjGSFjuvtnPkDHMX4QQnIhCjpm--iAFYWUrCwH6PUewqcNN9Y7e47B44fpU2Y9VI01uLfffdY67_wbjsvY2xbXXcBJtrXGaWiw9b3rlzjYBnrXeZwOAujVeIT2amiiPd78h-jl-up5cptNH2_uJpfTTDNJ-kxXTAhSslpSKWhFi7LmnOe5BAOSirFJr2CiLIg0Y82Ba2NlDoRyYwipBTtEF2ve2bxKrnRyFKBRs-BaCEvVgVP_N969q7duoZIc4YIlgrMNQei-5jb2qnVR26YBb7t5VLlIWaWw8jxBh2uoDl2MwdZbGUrUqhG1bkRtGkkHp3_NbeG_-bMf4heKhA</recordid><startdate>2022</startdate><enddate>2022</enddate><creator>Gu, Wenhao</creator><creator>Yang, Xiao</creator><creator>Yang, Minhao</creator><creator>Han, Kun</creator><creator>Pan, Wenying</creator><creator>Zhu, Zexuan</creator><general>Oxford University Press</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0001-8479-6904</orcidid></search><sort><creationdate>2022</creationdate><title>MarkerGenie: an NLP-enabled text-mining system for biomedical entity relation extraction</title><author>Gu, Wenhao ; Yang, Xiao ; Yang, Minhao ; Han, Kun ; Pan, Wenying ; Zhu, Zexuan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c390t-cb377083f91971b168f555229ada9174d1746378609d4c5a5cde92a015dd00f73</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Original Paper</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Gu, Wenhao</creatorcontrib><creatorcontrib>Yang, Xiao</creatorcontrib><creatorcontrib>Yang, Minhao</creatorcontrib><creatorcontrib>Han, Kun</creatorcontrib><creatorcontrib>Pan, Wenying</creatorcontrib><creatorcontrib>Zhu, 
Zexuan</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Bioinformatics advances</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Gu, Wenhao</au><au>Yang, Xiao</au><au>Yang, Minhao</au><au>Han, Kun</au><au>Pan, Wenying</au><au>Zhu, Zexuan</au><au>Arighi, Cecilia</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>MarkerGenie: an NLP-enabled text-mining system for biomedical entity relation extraction</atitle><jtitle>Bioinformatics advances</jtitle><addtitle>Bioinform Adv</addtitle><date>2022</date><risdate>2022</risdate><volume>2</volume><issue>1</issue><spage>vbac035</spage><epage>vbac035</epage><pages>vbac035-vbac035</pages><issn>2635-0041</issn><eissn>2635-0041</eissn><abstract>Natural language processing (NLP) tasks aim to convert unstructured text data (e.g. articles or dialogues) to structured information. In recent years, we have witnessed fundamental advances of NLP technique, which has been widely used in many applications such as financial text mining, news recommendation and machine translation. However, its application in the biomedical space remains challenging due to a lack of labeled data, ambiguities and inconsistencies of biological terminology. In biomedical marker discovery studies, tools that rely on NLP models to automatically and accurately extract relations of biomedical entities are valuable as they can provide a more thorough survey of all available literature, hence providing a less biased result compared to manual curation. In addition, the fast speed of machine reader helps quickly orient research and development. 
To address the aforementioned needs, we developed automatic training data labeling, rule-based biological terminology cleaning and a more accurate NLP model for binary associative and multi-relation prediction into the program. We demonstrated the effectiveness of the proposed methods in identifying relations between biomedical entities on various benchmark datasets and case studies. MarkerGenie is available at https://www.genegeniedx.com/markergenie/. Data for model training and evaluation, term lists of biomedical entities, details of the case studies and all trained models are provided at https://drive.google.com/drive/folders/14RypiIfIr3W_K-mNIAx9BNtObHSZoAyn?usp=sharing. Supplementary data are available at online.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>36699388</pmid><doi>10.1093/bioadv/vbac035</doi><orcidid>https://orcid.org/0000-0001-8479-6904</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 2635-0041
ispartof	Bioinformatics advances, 2022, Vol.2 (1), p.vbac035-vbac035
issn	2635-0041 2635-0041
language	eng
recordid	cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9710573
source	DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Oxford Journals Open Access Collection; PubMed Central
subjects	Original Paper
title	MarkerGenie: an NLP-enabled text-mining system for biomedical entity relation extraction
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T10%3A06%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=MarkerGenie:%20an%20NLP-enabled%20text-mining%20system%20for%20biomedical%20entity%20relation%20extraction&rft.jtitle=Bioinformatics%20advances&rft.au=Gu,%20Wenhao&rft.date=2022&rft.volume=2&rft.issue=1&rft.spage=vbac035&rft.epage=vbac035&rft.pages=vbac035-vbac035&rft.issn=2635-0041&rft.eissn=2635-0041&rft_id=info:doi/10.1093/bioadv/vbac035&rft_dat=%3Cproquest_pubme%3E2769993822%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2769993822&rft_id=info:pmid/36699388&rfr_iscdi=true