Linking Surface Facts to Large-Scale Knowledge Graphs

Open Information Extraction (OIE) methods extract facts from natural language text in the form of ("subject"; "relation"; "object") triples. These facts are, however, merely surface forms, the ambiguity of which impedes their downstream usage; e.g., the surface phrase &...

Detailed description


Bibliographic details
Main authors:	Radevski, Gorjan ; Gashteovski, Kiril ; Hung, Chia-Chien ; Lawrence, Carolin ; Glavaš, Goran
Format:	Article
Language:	eng
Subjects:	Computer Science - Artificial Intelligence ; Computer Science - Computation and Language ; Computer Science - Learning

creator	Radevski, Gorjan ; Gashteovski, Kiril ; Hung, Chia-Chien ; Lawrence, Carolin ; Glavaš, Goran
description	Open Information Extraction (OIE) methods extract facts from natural language text in the form of ("subject"; "relation"; "object") triples. These facts are, however, merely surface forms, the ambiguity of which impedes their downstream usage; e.g., the surface phrase "Michael Jordan" may refer to either the former basketball player or the university professor. Knowledge Graphs (KGs), on the other hand, contain facts in a canonical (i.e., unambiguous) form, but their coverage is limited by a static schema (i.e., a fixed set of entities and predicates). To bridge this gap, we need the best of both worlds: (i) high coverage of free-text OIEs, and (ii) semantic precision (i.e., monosemy) of KGs. In order to achieve this goal, we propose a new benchmark with novel evaluation protocols that can, for example, measure fact linking performance on a granular triple slot level, while also measuring whether a system has the ability to recognize that a surface form has no match in the existing KG. Our extensive evaluation of several baselines shows that detection of out-of-KG entities and predicates is more difficult than accurate linking to existing ones, thus calling for more research efforts on this difficult task. We publicly release all resources (data, benchmark and code) at https://github.com/nec-research/fact-linking.
doi_str_mv	10.48550/arxiv.2310.14909
format	Article
creationdate	2023-10-23
genre	article
oa	free_for_read
rights	http://arxiv.org/licenses/nonexclusive-distrib/1.0
sourcetype	Open Access Repository
linktorsrc	https://arxiv.org/abs/2310.14909
backlink	https://doi.org/10.48550/arXiv.2310.14909
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2310.14909
language	eng
recordid	cdi_arxiv_primary_2310_14909
source	arXiv.org
subjects	Computer Science - Artificial Intelligence ; Computer Science - Computation and Language ; Computer Science - Learning
title	Linking Surface Facts to Large-Scale Knowledge Graphs
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T05%3A26%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Linking%20Surface%20Facts%20to%20Large-Scale%20Knowledge%20Graphs&rft.au=Radevski,%20Gorjan&rft.date=2023-10-23&rft_id=info:doi/10.48550/arxiv.2310.14909&rft_dat=%3Carxiv_GOX%3E2310_14909%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
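
The description above mentions scoring fact linking at the level of individual triple slots and checking whether a system recognizes surface forms that have no match in the KG. The following is a minimal Python sketch of what such scoring could look like. It is not the benchmark's actual protocol or code (the linked repository is the place to look for that). The class `LinkedTriple`, the functions `slot_accuracy` and `out_of_kg_recall`, the identifier strings, and the use of `None` as the out-of-KG label are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumption: a slot linked to nothing in the KG is represented by None.
OUT_OF_KG = None

SLOTS = ("subject", "relation", "obj")


@dataclass(frozen=True)
class LinkedTriple:
    """One OIE triple after linking; each slot holds a KG identifier or None."""
    subject: Optional[str]
    relation: Optional[str]
    obj: Optional[str]


def slot_accuracy(gold: List[LinkedTriple], pred: List[LinkedTriple]) -> Dict[str, float]:
    """Return accuracy per slot, plus 'fact' accuracy (all three slots correct)."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    scores = {}
    for slot in SLOTS:
        hits = sum(getattr(g, slot) == getattr(p, slot) for g, p in zip(gold, pred))
        scores[slot] = hits / len(gold)
    scores["fact"] = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return scores


def out_of_kg_recall(gold: List[LinkedTriple], pred: List[LinkedTriple]) -> float:
    """Fraction of gold out-of-KG slots that the system also marked out-of-KG."""
    total = hits = 0
    for g, p in zip(gold, pred):
        for slot in SLOTS:
            if getattr(g, slot) is OUT_OF_KG:
                total += 1
                hits += getattr(p, slot) is OUT_OF_KG
    return hits / total if total else float("nan")


# Hypothetical identifiers, echoing the "Michael Jordan" ambiguity in the description.
gold = [
    LinkedTriple("ent:jordan_player", "rel:plays_for", "ent:bulls"),
    LinkedTriple("ent:jordan_professor", "rel:works_at", OUT_OF_KG),
]
pred = [
    LinkedTriple("ent:jordan_professor", "rel:plays_for", "ent:bulls"),
    LinkedTriple("ent:jordan_professor", "rel:works_at", "ent:bulls"),
]

print(slot_accuracy(gold, pred))
print(out_of_kg_recall(gold, pred))
```

Reporting the slots separately makes it visible when a system links relations well but confuses entities, and the separate out-of-KG score captures the case where a system forces every surface form onto some existing KG entry.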