Linking Surface Facts to Large-Scale Knowledge Graphs

Open Information Extraction (OIE) methods extract facts from natural language text in the form of ("subject"; "relation"; "object") triples. These facts are, however, merely surface forms, the ambiguity of which impedes their downstream usage; e.g., the surface phrase ...

Detailed description

Bibliographic details
Main authors: Radevski, Gorjan; Gashteovski, Kiril; Hung, Chia-Chien; Lawrence, Carolin; Glavaš, Goran
Format: Article
Language: eng
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Radevski, Gorjan
Gashteovski, Kiril
Hung, Chia-Chien
Lawrence, Carolin
Glavaš, Goran
description Open Information Extraction (OIE) methods extract facts from natural language text in the form of ("subject"; "relation"; "object") triples. These facts are, however, merely surface forms, the ambiguity of which impedes their downstream usage; e.g., the surface phrase "Michael Jordan" may refer to either the former basketball player or the university professor. Knowledge Graphs (KGs), on the other hand, contain facts in a canonical (i.e., unambiguous) form, but their coverage is limited by a static schema (i.e., a fixed set of entities and predicates). To bridge this gap, we need the best of both worlds: (i) high coverage of free-text OIEs, and (ii) semantic precision (i.e., monosemy) of KGs. In order to achieve this goal, we propose a new benchmark with novel evaluation protocols that can, for example, measure fact linking performance on a granular triple slot level, while also measuring if a system has the ability to recognize that a surface form has no match in the existing KG. Our extensive evaluation of several baselines shows that detection of out-of-KG entities and predicates is more difficult than accurate linking to existing ones, thus calling for more research efforts on this difficult task. We publicly release all resources (data, benchmark and code) on https://github.com/nec-research/fact-linking.
doi_str_mv 10.48550/arxiv.2310.14909
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2310_14909</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2310_14909</sourcerecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Linking Surface Facts to Large-Scale Knowledge Graphs</title><source>arXiv.org</source><creator>Radevski, Gorjan ; Gashteovski, Kiril ; Hung, Chia-Chien ; Lawrence, Carolin ; Glavaš, Goran</creator><identifier>DOI: 10.48550/arxiv.2310.14909</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Computation and Language ; Computer Science - Learning</subject><creationdate>2023-10</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa></display><links><linktorsrc>$$Uhttps://arxiv.org/abs/2310.14909$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2310.14909$$DView paper in arXiv$$Hfree_for_read</backlink></links><addata><format>journal</format><genre>article</genre><ristype>JOUR</ristype><date>2023-10-23</date><risdate>2023</risdate><doi>10.48550/arxiv.2310.14909</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2310.14909
ispartof
issn
language eng
recordid cdi_arxiv_primary_2310_14909
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Computation and Language
Computer Science - Learning
title Linking Surface Facts to Large-Scale Knowledge Graphs
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T05%3A26%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Linking%20Surface%20Facts%20to%20Large-Scale%20Knowledge%20Graphs&rft.au=Radevski,%20Gorjan&rft.date=2023-10-23&rft_id=info:doi/10.48550/arxiv.2310.14909&rft_dat=%3Carxiv_GOX%3E2310_14909%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true