PromptEM: Prompt-tuning for Low-resource Generalized Entity Matching
Entity Matching (EM), which aims to identify whether two entity records from two relational tables refer to the same real-world entity, is one of the fundamental problems in data management. Traditional EM assumes that the two tables are homogeneous with aligned schemas, whereas in practical scenarios entity records of different formats (e.g., relational, semi-structured, or textual) are commonly involved. Unifying their schemas is impractical because of these format differences. To support EM on entity records of different formats, Generalized Entity Matching (GEM) has been proposed and has recently gained much attention. Existing GEM methods typically operate in a supervised-learning fashion, relying on a large amount of high-quality labeled examples. However, the labeling process is extremely labor-intensive and hinders the adoption of GEM. Low-resource GEM, i.e., GEM that requires only a small number of labeled examples, has therefore become an urgent need. To this end, this paper focuses for the first time on low-resource GEM and proposes a novel method, termed PromptEM. PromptEM addresses three challenging issues in low-resource GEM: designing GEM-specific prompt-tuning, improving pseudo-label quality, and running self-training efficiently. Extensive experimental results on eight real benchmarks demonstrate the superiority of PromptEM in terms of both effectiveness and efficiency.
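To make the GEM setting concrete, the following is a minimal illustrative sketch (not the paper's code; all function and field names are hypothetical): heterogeneous entity records — relational rows, semi-structured objects, or free text — are flattened into a unified textual sequence, and a record pair is wrapped in a cloze-style template of the kind a prompt-tuned masked language model would complete.

```python
# Illustrative sketch only: serializing format-different entity records
# into one textual sequence, the usual first step before a language
# model judges a record pair in (generalized) entity matching.

def serialize(record):
    """Flatten a record into "attr: value" text, regardless of format."""
    if isinstance(record, str):          # textual record: already a sequence
        return record
    parts = []
    for key, value in record.items():
        if isinstance(value, dict):      # semi-structured: recurse into nesting
            parts.append(serialize(value))
        elif isinstance(value, list):    # multi-valued attribute
            parts.append(f"{key}: {', '.join(map(str, value))}")
        else:                            # relational: plain attribute/value cell
            parts.append(f"{key}: {value}")
    return " ; ".join(parts)

def cloze_prompt(left, right):
    """Wrap a serialized pair in a cloze template; a prompt-tuned masked LM
    would be asked to fill the [MASK] slot (e.g. "matched"/"mismatched")."""
    return f"{serialize(left)} and {serialize(right)} are [MASK]."

relational = {"title": "iPhone 13", "price": 799}
semi_structured = {"product": {"title": "Apple iPhone 13",
                               "specs": ["128GB", "blue"]}}
print(cloze_prompt(relational, semi_structured))
```

The serialization makes heterogeneous records comparable as plain text, which is why no schema unification is needed; the actual template design and label words are part of what the paper's GEM-specific prompt-tuning optimizes.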
Published in: | arXiv.org 2022-07 |
---|---|
Main authors: | Wang, Pengfei; Zeng, Xiaocan; Chen, Lu; Fan, Ye; Mao, Yuren; Zhu, Junhao; Gao, Yunjun |
Format: | Article |
Language: | eng |
Subjects: | Labels; Matching |
Online access: | Full text |
container_title | arXiv.org |
creator | Wang, Pengfei; Zeng, Xiaocan; Chen, Lu; Fan, Ye; Mao, Yuren; Zhu, Junhao; Gao, Yunjun |
description | Entity Matching (EM), which aims to identify whether two entity records from two relational tables refer to the same real-world entity, is one of the fundamental problems in data management. Traditional EM assumes that the two tables are homogeneous with aligned schemas, whereas in practical scenarios entity records of different formats (e.g., relational, semi-structured, or textual) are commonly involved. Unifying their schemas is impractical because of these format differences. To support EM on entity records of different formats, Generalized Entity Matching (GEM) has been proposed and has recently gained much attention. Existing GEM methods typically operate in a supervised-learning fashion, relying on a large amount of high-quality labeled examples. However, the labeling process is extremely labor-intensive and hinders the adoption of GEM. Low-resource GEM, i.e., GEM that requires only a small number of labeled examples, has therefore become an urgent need. To this end, this paper focuses for the first time on low-resource GEM and proposes a novel method, termed PromptEM. PromptEM addresses three challenging issues in low-resource GEM: designing GEM-specific prompt-tuning, improving pseudo-label quality, and running self-training efficiently. Extensive experimental results on eight real benchmarks demonstrate the superiority of PromptEM in terms of both effectiveness and efficiency. |
format | Article |
publisher | Cornell University Library, arXiv.org (Ithaca) |
date | 2022-07-16 |
rights | 2022. Published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2022-07 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2688296009 |
source | Free E-Journals |
subjects | Labels; Matching |
title | PromptEM: Prompt-tuning for Low-resource Generalized Entity Matching |