From Large Language Models to Knowledge Graphs for Biomarker Discovery in Cancer

Domain experts often rely on the most recent knowledge to understand and communicate specific biological processes, which helps them design strategies for prevention and therapeutic decision-making in various disease scenarios. A challenging scenario for artificial intelligence (AI) is usi...

Bibliographic Details
Published in: arXiv.org 2023-11
Main Authors: Karim, Md Rezaul; Lina Molinas Comet; Shajalal, Md; Beyan, Oya Deniz; Rebholz-Schuhmann, Dietrich; Decker, Stefan
Format: Article
Language: eng
Subjects:
Online Access: Full text
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Karim, Md Rezaul
Lina Molinas Comet
Shajalal, Md
Beyan, Oya Deniz
Rebholz-Schuhmann, Dietrich
Decker, Stefan
description Domain experts often rely on the most recent knowledge to understand and communicate specific biological processes, which helps them design strategies for prevention and therapeutic decision-making in various disease scenarios. A challenging scenario for artificial intelligence (AI) is using biomedical data (e.g., texts, imaging, omics, and clinical data) to provide diagnosis and treatment recommendations for cancerous conditions. Data and knowledge about biomedical entities such as cancers, drugs, genes, proteins, and their mechanisms are spread across structured sources (e.g., knowledge bases (KBs)) and unstructured sources (e.g., scientific articles). A large-scale knowledge graph (KG) can be constructed by extracting and integrating facts about semantically interrelated entities and relations. Such a KG not only allows exploration and question answering (QA) but also enables domain experts to deduce new knowledge. However, exploring and querying large-scale KGs is tedious for non-domain users because they lack an understanding of the data assets and semantic technologies. In this paper, we develop a domain KG to leverage cancer-specific biomarker discovery and interactive QA. For this, we constructed a domain ontology called OncoNet Ontology (ONO), which enables semantic reasoning for validating gene-disease relations across different types of cancer. The KG is further enriched by harmonizing the ONO, metadata, controlled vocabularies, and biomedical concepts extracted from scientific articles with BioBERT- and SciBERT-based information extractors. Further, because the biomedical domain is evolving and new findings often replace old ones, an AI system without access to up-to-date scientific findings is likely to exhibit concept drift while providing diagnosis and treatment. Therefore, we fine-tune the KG using large language models (LLMs) based on more recent articles and KBs.
format Article
publisher Ithaca: Cornell University Library, arXiv.org
date 2023-11-19
rights 2023. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
oa free_for_read
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2023-11
issn 2331-8422
language eng
recordid cdi_proquest_journals_2876764892
source Freely Accessible Journals
subjects Artificial intelligence
Biological activity
Biomarkers
Biomedical data
Cancer
Diagnosis
Health services
Information retrieval
Knowledge
Knowledge bases (artificial intelligence)
Knowledge representation
Large language models
Medical diagnosis
Ontology
Semantics
Subject specialists
title From Large Language Models to Knowledge Graphs for Biomarker Discovery in Cancer
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-04T15%3A28%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=From%20Large%20Language%20Models%20to%20Knowledge%20Graphs%20for%20Biomarker%20Discovery%20in%20Cancer&rft.jtitle=arXiv.org&rft.au=Karim,%20Md%20Rezaul&rft.date=2023-11-19&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2876764892%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2876764892&rft_id=info:pmid/&rfr_iscdi=true