Structured prompt interrogation and recursive extraction of semantics (SPIRES): A method for populating knowledge bases using zero-shot learning

Creating knowledge bases and ontologies is a time-consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data and are not able to populate arbitrary complex nested knowledge...


Bibliographic details
Published in:	arXiv.org 2023-12
Main authors:	Caufield, J Harry; Hegde, Harshad; Emonet, Vincent; Harris, Nomi L; Joachimiak, Marcin P; Matentzoglu, Nicolas; Kim, HyeongSik; Moxon, Sierra A T; Reese, Justin T; Haendel, Melissa A; Robinson, Peter N; Mungall, Christopher J
Format:	Article
Language:	eng
Subjects:	Computer Science - Artificial Intelligence; Computer Science - Learning; Interrogation; Knowledge; Knowledge bases (artificial intelligence); Ontology; Semantics; Training; Zero-shot learning
Online access:	Full text

container_end_page
container_issue
container_start_page
container_title	arXiv.org
container_volume
creator	Caufield, J Harry; Hegde, Harshad; Emonet, Vincent; Harris, Nomi L; Joachimiak, Marcin P; Matentzoglu, Nicolas; Kim, HyeongSik; Moxon, Sierra A T; Reese, Justin T; Haendel, Melissa A; Robinson, Peter N; Mungall, Christopher J
description	Creating knowledge bases and ontologies is a time-consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data and are not able to populate arbitrary complex nested knowledge schemas. Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning (ZSL) and general-purpose query answering from flexible prompts and return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against GPT-3+ to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for all matched elements. We present examples of use of SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical-to-disease causation graphs. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction (RE) methods, but the method has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any training data. This method supports a general strategy of leveraging the language-interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly available databases and ontologies external to the LLM. SPIRES is available as part of the open-source OntoGPT package: https://github.com/monarch-initiative/ontogpt.
doi_str_mv	10.48550/arxiv.2304.02711
format	Article
fullrecord	<record><control><sourceid>proquest_arxiv</sourceid><recordid>TN_cdi_arxiv_primary_2304_02711</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2797430947</sourcerecordid><originalsourceid>FETCH-LOGICAL-a951-794e44d33e3119f97a6a1b87c53c3e5075268a9e055efae77be30b8eb055abcf3</originalsourceid><addsrcrecordid>eNotkM1Kw0AUhQdBsNQ-gCsH3OgidX47ibtSqhYKiu0-TJKbNjWZiTOTWn0KH9nYurrw3cPh8CF0RclYxFKSe-0O1X7MOBFjwhSlZ2jAOKdRLBi7QCPvd4QQNlFMSj5AP6vgujx0DgrcOtu0AVcmgHN2o0NlDdamwA7yzvlqDxgOwen8-LAl9tBoE6rc49vV6-Jtvrp7wFPcQNjaApfW4da2Xd33mA1-N_azhmIDONMePO78H_0GZyO_tQHXoJ3p0SU6L3XtYfR_h2j9OF_PnqPly9NiNl1GOpE0UokAIQrOgVOalInSE02zWOWS5xwkUZJNYp0AkRJKDUplwEkWQ9YDneUlH6LrU-1RV9q6qtHuK_3Tlh619YmbU6LX8tGBD-nOds70m1KmEiU4SYTiv81Icyg</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2797430947</pqid></control><display><type>article</type><title>Structured prompt interrogation and recursive extraction of semantics (SPIRES): A method for populating knowledge bases using zero-shot learning</title><source>arXiv.org</source><source>Free E- Journals</source><creator>Caufield, J Harry ; Hegde, Harshad ; Emonet, Vincent ; Harris, Nomi L ; Joachimiak, Marcin P ; Matentzoglu, Nicolas ; Kim, HyeongSik ; Moxon, Sierra A T ; Reese, Justin T ; Haendel, Melissa A ; Robinson, Peter N ; Mungall, Christopher J</creator><creatorcontrib>Caufield, J Harry ; Hegde, Harshad ; Emonet, Vincent ; Harris, Nomi L ; Joachimiak, Marcin P ; Matentzoglu, Nicolas ; Kim, HyeongSik ; Moxon, Sierra A T ; Reese, Justin T ; Haendel, Melissa A ; Robinson, Peter N ; Mungall, Christopher J</creatorcontrib><description>Creating knowledge bases and ontologies is a time consuming task that relies on a manual curation. 
AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data, and are not able to populate arbitrary complex nested knowledge schemas. Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning (ZSL) and general-purpose query answering from flexible prompts and return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against GPT-3+ to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for all matched elements. We present examples of use of SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical to disease causation graphs. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction (RE) methods, but has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any training data. This method supports a general strategy of leveraging the language interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly-available databases and ontologies external to the LLM. 
SPIRES is available as part of the open source OntoGPT package: https://github.com/ monarch-initiative/ontogpt.</description><identifier>EISSN: 2331-8422</identifier><identifier>DOI: 10.48550/arxiv.2304.02711</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Computer Science - Artificial Intelligence ; Computer Science - Learning ; Interrogation ; Knowledge ; Knowledge bases (artificial intelligence) ; Ontology ; Semantics ; Training ; Zero-shot learning</subject><ispartof>arXiv.org, 2023-12</ispartof><rights>2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,780,881,27904</link.rule.ids><backlink>$$Uhttps://doi.org/10.1093/bioinformatics/btae104$$DView published paper (Access to full text may be restricted)$$Hfree_for_read</backlink><backlink>$$Uhttps://doi.org/10.48550/arXiv.2304.02711$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Caufield, J Harry</creatorcontrib><creatorcontrib>Hegde, Harshad</creatorcontrib><creatorcontrib>Emonet, Vincent</creatorcontrib><creatorcontrib>Harris, Nomi L</creatorcontrib><creatorcontrib>Joachimiak, Marcin P</creatorcontrib><creatorcontrib>Matentzoglu, Nicolas</creatorcontrib><creatorcontrib>Kim, HyeongSik</creatorcontrib><creatorcontrib>Moxon, Sierra A T</creatorcontrib><creatorcontrib>Reese, Justin T</creatorcontrib><creatorcontrib>Haendel, Melissa A</creatorcontrib><creatorcontrib>Robinson, Peter N</creatorcontrib><creatorcontrib>Mungall, 
Christopher J</creatorcontrib><title>Structured prompt interrogation and recursive extraction of semantics (SPIRES): A method for populating knowledge bases using zero-shot learning</title><title>arXiv.org</title><description>Creating knowledge bases and ontologies is a time consuming task that relies on a manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data, and are not able to populate arbitrary complex nested knowledge schemas. Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning (ZSL) and general-purpose query answering from flexible prompts and return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against GPT-3+ to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for all matched elements. We present examples of use of SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical to disease causation graphs. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction (RE) methods, but has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any training data. This method supports a general strategy of leveraging the language interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly-available databases and ontologies external to the LLM. 
SPIRES is available as part of the open source OntoGPT package: https://github.com/ monarch-initiative/ontogpt.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Learning</subject><subject>Interrogation</subject><subject>Knowledge</subject><subject>Knowledge bases (artificial intelligence)</subject><subject>Ontology</subject><subject>Semantics</subject><subject>Training</subject><subject>Zero-shot learning</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><sourceid>GOX</sourceid><recordid>eNotkM1Kw0AUhQdBsNQ-gCsH3OgidX47ibtSqhYKiu0-TJKbNjWZiTOTWn0KH9nYurrw3cPh8CF0RclYxFKSe-0O1X7MOBFjwhSlZ2jAOKdRLBi7QCPvd4QQNlFMSj5AP6vgujx0DgrcOtu0AVcmgHN2o0NlDdamwA7yzvlqDxgOwen8-LAl9tBoE6rc49vV6-Jtvrp7wFPcQNjaApfW4da2Xd33mA1-N_azhmIDONMePO78H_0GZyO_tQHXoJ3p0SU6L3XtYfR_h2j9OF_PnqPly9NiNl1GOpE0UokAIQrOgVOalInSE02zWOWS5xwkUZJNYp0AkRJKDUplwEkWQ9YDneUlH6LrU-1RV9q6qtHuK_3Tlh619YmbU6LX8tGBD-nOds70m1KmEiU4SYTiv81Icyg</recordid><startdate>20231222</startdate><enddate>20231222</enddate><creator>Caufield, J Harry</creator><creator>Hegde, Harshad</creator><creator>Emonet, Vincent</creator><creator>Harris, Nomi L</creator><creator>Joachimiak, Marcin P</creator><creator>Matentzoglu, Nicolas</creator><creator>Kim, HyeongSik</creator><creator>Moxon, Sierra A T</creator><creator>Reese, Justin T</creator><creator>Haendel, Melissa A</creator><creator>Robinson, Peter N</creator><creator>Mungall, Christopher J</creator><general>Cornell University Library, 
arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20231222</creationdate><title>Structured prompt interrogation and recursive extraction of semantics (SPIRES): A method for populating knowledge bases using zero-shot learning</title><author>Caufield, J Harry ; Hegde, Harshad ; Emonet, Vincent ; Harris, Nomi L ; Joachimiak, Marcin P ; Matentzoglu, Nicolas ; Kim, HyeongSik ; Moxon, Sierra A T ; Reese, Justin T ; Haendel, Melissa A ; Robinson, Peter N ; Mungall, Christopher J</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a951-794e44d33e3119f97a6a1b87c53c3e5075268a9e055efae77be30b8eb055abcf3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Learning</topic><topic>Interrogation</topic><topic>Knowledge</topic><topic>Knowledge bases (artificial intelligence)</topic><topic>Ontology</topic><topic>Semantics</topic><topic>Training</topic><topic>Zero-shot learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Caufield, J Harry</creatorcontrib><creatorcontrib>Hegde, Harshad</creatorcontrib><creatorcontrib>Emonet, Vincent</creatorcontrib><creatorcontrib>Harris, Nomi L</creatorcontrib><creatorcontrib>Joachimiak, Marcin P</creatorcontrib><creatorcontrib>Matentzoglu, Nicolas</creatorcontrib><creatorcontrib>Kim, HyeongSik</creatorcontrib><creatorcontrib>Moxon, Sierra A T</creatorcontrib><creatorcontrib>Reese, Justin T</creatorcontrib><creatorcontrib>Haendel, Melissa 
A</creatorcontrib><creatorcontrib>Robinson, Peter N</creatorcontrib><creatorcontrib>Mungall, Christopher J</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><collection>arXiv Computer Science</collection><collection>arXiv.org</collection><jtitle>arXiv.org</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Caufield, J Harry</au><au>Hegde, Harshad</au><au>Emonet, Vincent</au><au>Harris, Nomi L</au><au>Joachimiak, Marcin P</au><au>Matentzoglu, Nicolas</au><au>Kim, HyeongSik</au><au>Moxon, Sierra A T</au><au>Reese, Justin T</au><au>Haendel, Melissa A</au><au>Robinson, Peter N</au><au>Mungall, Christopher J</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Structured prompt interrogation and recursive extraction of semantics (SPIRES): A method for populating knowledge bases using zero-shot 
learning</atitle><jtitle>arXiv.org</jtitle><date>2023-12-22</date><risdate>2023</risdate><eissn>2331-8422</eissn><abstract>Creating knowledge bases and ontologies is a time consuming task that relies on a manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data, and are not able to populate arbitrary complex nested knowledge schemas. Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning (ZSL) and general-purpose query answering from flexible prompts and return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against GPT-3+ to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for all matched elements. We present examples of use of SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical to disease causation graphs. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction (RE) methods, but has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any training data. This method supports a general strategy of leveraging the language interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly-available databases and ontologies external to the LLM. 
SPIRES is available as part of the open source OntoGPT package: https://github.com/ monarch-initiative/ontogpt.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><doi>10.48550/arxiv.2304.02711</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2023-12
issn	2331-8422
language	eng
recordid	cdi_arxiv_primary_2304_02711
source	arXiv.org; Free E- Journals
subjects	Computer Science - Artificial Intelligence; Computer Science - Learning; Interrogation; Knowledge; Knowledge bases (artificial intelligence); Ontology; Semantics; Training; Zero-shot learning
title	Structured prompt interrogation and recursive extraction of semantics (SPIRES): A method for populating knowledge bases using zero-shot learning
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T07%3A06%3A41IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Structured%20prompt%20interrogation%20and%20recursive%20extraction%20of%20semantics%20(SPIRES):%20A%20method%20for%20populating%20knowledge%20bases%20using%20zero-shot%20learning&rft.jtitle=arXiv.org&rft.au=Caufield,%20J%20Harry&rft.date=2023-12-22&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2304.02711&rft_dat=%3Cproquest_arxiv%3E2797430947%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2797430947&rft_id=info:pmid/&rfr_iscdi=true