Cross-lingual semantic annotation of biomedical literature: experiments in Spanish and English

Abstract. Motivation: Biomedical literature is one of the most relevant sources of information for knowledge mining in the field of Bioinformatics. Although English is the most widely addressed language in the field, in recent years there has been a growing interest from the natural language pr...

Bibliographic details
Published in: Bioinformatics 2020-03, Vol.36 (6), p.1872-1880
Main authors: Perez, Naiara, Accuosto, Pablo, Bravo, Àlex, Cuadros, Montse, Martínez-Garcia, Eva, Saggion, Horacio, Rigau, German
Format: Article
Language: eng
container_end_page 1880
container_issue 6
container_start_page 1872
container_title Bioinformatics
container_volume 36
creator Perez, Naiara
Accuosto, Pablo
Bravo, Àlex
Cuadros, Montse
Martínez-Garcia, Eva
Saggion, Horacio
Rigau, German
description Abstract. Motivation: Biomedical literature is one of the most relevant sources of information for knowledge mining in the field of Bioinformatics. Although English is the most widely addressed language in the field, in recent years there has been a growing interest from the natural language processing community in dealing with languages other than English. However, the availability of language resources and tools for appropriate treatment of non-English texts is lagging behind. Our research is concerned with the semantic annotation of biomedical texts in the Spanish language, which can be considered an under-resourced language where biomedical text processing is concerned. Results: We have carried out experiments to assess the effectiveness of several methods for the automatic annotation of biomedical texts in Spanish. One approach is based on the linguistic analysis of Spanish texts and their annotation using an information retrieval and concept disambiguation approach. A second method takes advantage of a Spanish–English machine translation process to annotate English documents and transfer annotations back to Spanish. A third method combines both procedures. Our evaluation shows that a combined system has competitive advantages over the two individual procedures. Availability and implementation: UMLSMapper (https://snlt.vicomtech.org/umlsmapper) and the annotation transfer tool (http://scientmin.taln.upf.edu/anntransfer/) are freely available for research purposes as web services and/or demos. Supplementary information: Supplementary data are available at Bioinformatics online.
doi_str_mv 10.1093/bioinformatics/btz853
format Article
fullrecord <record><control><sourceid>proquest_TOX</sourceid><recordid>TN_cdi_csuc_recercat_oai_recercat_cat_2072_366663</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/bioinformatics/btz853</oup_id><sourcerecordid>2315096878</sourcerecordid><originalsourceid>FETCH-LOGICAL-c439t-e7536e399b947c946b927730160991f68953c8ca570d5e542772a8c1aae6d0843</originalsourceid><addsrcrecordid>eNqNUctOxCAUJUbj-xM0XbqpQnm0uDOT8ZGYuFC3EsrcKqaFCjRRv14mM2rcSUK4hHPOvZyD0BHBpwRLetZab13nw6CTNfGsTZ8NpxtolzCBywpzuZlrKuqSNZjuoL0YXzHmhDG2jXYoqSmucLWLnmbBx1j21j1Pui8iDNplwUI751OW9q7wXZGbDbCwJiN6myDoNAU4L-B9hGAHcCkW1hX3o3Y2vmTuopi75z7XB2ir032Ew_W5jx4v5w-z6_L27upmdnFbGkZlKqHmVACVspWsNpKJVlZ1HpEILCXpRCM5NY3RvMYLDpzlx0o3hmgNYoEbRvcRWemaOBkVwEAwOimv7e9luStcV4qKvGjmnKw4Y_BvE8SkBhsN9L124KeoKko4lqKpmwzla_mlWwE6NeZ_6_ChCFbLONTfONQqjsw7XreY2mzgD-vb_wzAK4Cfxn9qfgHSKZ1E</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2315096878</pqid></control><display><type>article</type><title>Cross-lingual semantic annotation of biomedical literature: experiments in Spanish and English</title><source>Oxford Journals Open Access Collection</source><creator>Perez, Naiara ; Accuosto, Pablo ; Bravo, Àlex ; Cuadros, Montse ; Martínez-Garcia, Eva ; Saggion, Horacio ; Rigau, German</creator><contributor>Wren, Jonathan</contributor><creatorcontrib>Perez, Naiara ; Accuosto, Pablo ; Bravo, Àlex ; Cuadros, Montse ; Martínez-Garcia, Eva ; Saggion, Horacio ; Rigau, German ; Wren, Jonathan</creatorcontrib><description>Abstract Motivation Biomedical literature is one of the most relevant sources of information for knowledge mining in the field of Bioinformatics. In spite of English being the most widely addressed language in the field; in recent years, there has been a growing interest from the natural language processing community in dealing with languages other than English. 
However, the availability of language resources and tools for appropriate treatment of non-English texts is lacking behind. Our research is concerned with the semantic annotation of biomedical texts in the Spanish language, which can be considered an under-resourced language where biomedical text processing is concerned. Results We have carried out experiments to assess the effectiveness of several methods for the automatic annotation of biomedical texts in Spanish. One approach is based on the linguistic analysis of Spanish texts and their annotation using an information retrieval and concept disambiguation approach. A second method takes advantage of a Spanish–English machine translation process to annotate English documents and transfer annotations back to Spanish. A third method takes advantage of the combination of both procedures. Our evaluation shows that a combined system has competitive advantages over the two individual procedures. Availability and implementation UMLSMapper (https://snlt.vicomtech.org/umlsmapper) and the annotation transfer tool (http://scientmin.taln.upf.edu/anntransfer/) are freely available for research purposes as web services and/or demos. Supplementary information Supplementary data are available at Bioinformatics online.</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1460-2059</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btz853</identifier><identifier>PMID: 31730202</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><ispartof>Bioinformatics, 2020-03, Vol.36 (6), p.1872-1880</ispartof><rights>The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com 2019</rights><rights>The Author(s) 2019. Published by Oxford University Press. All rights reserved. 
For permissions, please e-mail: journals.permissions@oup.com.</rights><rights>Oxford University Press. This is a pre-copyedited, author-produced version of an article accepted for publication in Bioinformatics following peer review. The version of record Perez N, Accuosto P, Bravo A, Cuadros M, Martínez-Garcia E, Saggion H, Rigau G. Cross-lingual semantic annotation of biomedical literature: experiments in Spanish and English. Bioinformatics. 2019 Nov 15. is available online at: https://doi.org/10.1093/bioinformatics/btz853 info:eu-repo/semantics/openAccess</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c439t-e7536e399b947c946b927730160991f68953c8ca570d5e542772a8c1aae6d0843</citedby><cites>FETCH-LOGICAL-c439t-e7536e399b947c946b927730160991f68953c8ca570d5e542772a8c1aae6d0843</cites><orcidid>0000-0001-8648-0428</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>230,314,776,780,881,1598,26953,27903,27904</link.rule.ids><linktorsrc>$$Uhttps://dx.doi.org/10.1093/bioinformatics/btz853$$EView_record_in_Oxford_University_Press$$FView_record_in_$$GOxford_University_Press</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/31730202$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Wren, Jonathan</contributor><creatorcontrib>Perez, Naiara</creatorcontrib><creatorcontrib>Accuosto, Pablo</creatorcontrib><creatorcontrib>Bravo, Àlex</creatorcontrib><creatorcontrib>Cuadros, Montse</creatorcontrib><creatorcontrib>Martínez-Garcia, Eva</creatorcontrib><creatorcontrib>Saggion, Horacio</creatorcontrib><creatorcontrib>Rigau, German</creatorcontrib><title>Cross-lingual semantic annotation of biomedical literature: experiments in Spanish and 
English</title><title>Bioinformatics</title><addtitle>Bioinformatics</addtitle><description>Abstract Motivation Biomedical literature is one of the most relevant sources of information for knowledge mining in the field of Bioinformatics. In spite of English being the most widely addressed language in the field; in recent years, there has been a growing interest from the natural language processing community in dealing with languages other than English. However, the availability of language resources and tools for appropriate treatment of non-English texts is lacking behind. Our research is concerned with the semantic annotation of biomedical texts in the Spanish language, which can be considered an under-resourced language where biomedical text processing is concerned. Results We have carried out experiments to assess the effectiveness of several methods for the automatic annotation of biomedical texts in Spanish. One approach is based on the linguistic analysis of Spanish texts and their annotation using an information retrieval and concept disambiguation approach. A second method takes advantage of a Spanish–English machine translation process to annotate English documents and transfer annotations back to Spanish. A third method takes advantage of the combination of both procedures. Our evaluation shows that a combined system has competitive advantages over the two individual procedures. Availability and implementation UMLSMapper (https://snlt.vicomtech.org/umlsmapper) and the annotation transfer tool (http://scientmin.taln.upf.edu/anntransfer/) are freely available for research purposes as web services and/or demos. 
Supplementary information Supplementary data are available at Bioinformatics online.</description><issn>1367-4803</issn><issn>1460-2059</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>XX2</sourceid><recordid>eNqNUctOxCAUJUbj-xM0XbqpQnm0uDOT8ZGYuFC3EsrcKqaFCjRRv14mM2rcSUK4hHPOvZyD0BHBpwRLetZab13nw6CTNfGsTZ8NpxtolzCBywpzuZlrKuqSNZjuoL0YXzHmhDG2jXYoqSmucLWLnmbBx1j21j1Pui8iDNplwUI751OW9q7wXZGbDbCwJiN6myDoNAU4L-B9hGAHcCkW1hX3o3Y2vmTuopi75z7XB2ir032Ew_W5jx4v5w-z6_L27upmdnFbGkZlKqHmVACVspWsNpKJVlZ1HpEILCXpRCM5NY3RvMYLDpzlx0o3hmgNYoEbRvcRWemaOBkVwEAwOimv7e9luStcV4qKvGjmnKw4Y_BvE8SkBhsN9L124KeoKko4lqKpmwzla_mlWwE6NeZ_6_ChCFbLONTfONQqjsw7XreY2mzgD-vb_wzAK4Cfxn9qfgHSKZ1E</recordid><startdate>20200301</startdate><enddate>20200301</enddate><creator>Perez, Naiara</creator><creator>Accuosto, Pablo</creator><creator>Bravo, Àlex</creator><creator>Cuadros, Montse</creator><creator>Martínez-Garcia, Eva</creator><creator>Saggion, Horacio</creator><creator>Rigau, German</creator><general>Oxford University Press</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>XX2</scope><orcidid>https://orcid.org/0000-0001-8648-0428</orcidid></search><sort><creationdate>20200301</creationdate><title>Cross-lingual semantic annotation of biomedical literature: experiments in Spanish and English</title><author>Perez, Naiara ; Accuosto, Pablo ; Bravo, Àlex ; Cuadros, Montse ; Martínez-Garcia, Eva ; Saggion, Horacio ; Rigau, German</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c439t-e7536e399b947c946b927730160991f68953c8ca570d5e542772a8c1aae6d0843</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Perez, Naiara</creatorcontrib><creatorcontrib>Accuosto, 
Pablo</creatorcontrib><creatorcontrib>Bravo, Àlex</creatorcontrib><creatorcontrib>Cuadros, Montse</creatorcontrib><creatorcontrib>Martínez-Garcia, Eva</creatorcontrib><creatorcontrib>Saggion, Horacio</creatorcontrib><creatorcontrib>Rigau, German</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>Recercat</collection><jtitle>Bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Perez, Naiara</au><au>Accuosto, Pablo</au><au>Bravo, Àlex</au><au>Cuadros, Montse</au><au>Martínez-Garcia, Eva</au><au>Saggion, Horacio</au><au>Rigau, German</au><au>Wren, Jonathan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Cross-lingual semantic annotation of biomedical literature: experiments in Spanish and English</atitle><jtitle>Bioinformatics</jtitle><addtitle>Bioinformatics</addtitle><date>2020-03-01</date><risdate>2020</risdate><volume>36</volume><issue>6</issue><spage>1872</spage><epage>1880</epage><pages>1872-1880</pages><issn>1367-4803</issn><eissn>1460-2059</eissn><eissn>1367-4811</eissn><abstract>Abstract Motivation Biomedical literature is one of the most relevant sources of information for knowledge mining in the field of Bioinformatics. In spite of English being the most widely addressed language in the field; in recent years, there has been a growing interest from the natural language processing community in dealing with languages other than English. However, the availability of language resources and tools for appropriate treatment of non-English texts is lacking behind. Our research is concerned with the semantic annotation of biomedical texts in the Spanish language, which can be considered an under-resourced language where biomedical text processing is concerned. 
Results We have carried out experiments to assess the effectiveness of several methods for the automatic annotation of biomedical texts in Spanish. One approach is based on the linguistic analysis of Spanish texts and their annotation using an information retrieval and concept disambiguation approach. A second method takes advantage of a Spanish–English machine translation process to annotate English documents and transfer annotations back to Spanish. A third method takes advantage of the combination of both procedures. Our evaluation shows that a combined system has competitive advantages over the two individual procedures. Availability and implementation UMLSMapper (https://snlt.vicomtech.org/umlsmapper) and the annotation transfer tool (http://scientmin.taln.upf.edu/anntransfer/) are freely available for research purposes as web services and/or demos. Supplementary information Supplementary data are available at Bioinformatics online.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>31730202</pmid><doi>10.1093/bioinformatics/btz853</doi><tpages>9</tpages><orcidid>https://orcid.org/0000-0001-8648-0428</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1367-4803
ispartof Bioinformatics, 2020-03, Vol.36 (6), p.1872-1880
issn 1367-4803
1460-2059
1367-4811
language eng
recordid cdi_csuc_recercat_oai_recercat_cat_2072_366663
source Oxford Journals Open Access Collection
title Cross-lingual semantic annotation of biomedical literature: experiments in Spanish and English
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T06%3A14%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_TOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Cross-lingual%20semantic%20annotation%20of%20biomedical%20literature:%20experiments%20in%20Spanish%20and%20English&rft.jtitle=Bioinformatics&rft.au=Perez,%20Naiara&rft.date=2020-03-01&rft.volume=36&rft.issue=6&rft.spage=1872&rft.epage=1880&rft.pages=1872-1880&rft.issn=1367-4803&rft.eissn=1460-2059&rft_id=info:doi/10.1093/bioinformatics/btz853&rft_dat=%3Cproquest_TOX%3E2315096878%3C/proquest_TOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2315096878&rft_id=info:pmid/31730202&rft_oup_id=10.1093/bioinformatics/btz853&rfr_iscdi=true