Discovery of Predicate-Oriented Relations among Named Entities Extracted from Thai Texts

Extracting named entities (NEs) and their relations is more difficult in Thai than in other languages due to several Thai-specific characteristics, including no explicit boundaries for words, phrases and sentences; few case markers and modifier clues; high ambiguity in compound words and serial verb...

Bibliographic details
Published in: IEICE Transactions on Information and Systems 2012/07/01, Vol.E95.D(7), pp.1932-1946
Main authors: TONGTEP, Nattapong, THEERAMUNKONG, Thanaruk
Format: Article
Language: English
Online access: Full text
container_end_page 1946
container_issue 7
container_start_page 1932
container_title IEICE Transactions on Information and Systems
container_volume E95.D
creator TONGTEP, Nattapong
THEERAMUNKONG, Thanaruk
description Extracting named entities (NEs) and their relations is more difficult in Thai than in other languages because of several Thai-specific characteristics: no explicit boundaries for words, phrases and sentences; few case markers and modifier clues; high ambiguity in compound words and serial verbs; and flexible word order. Unlike most previous work, which focused on NE relations for specific actions such as work_for, live_in, located_in, and kill, this paper proposes a more general type of NE relation, called the predicate-oriented relation (PoR), in which an extracted action part (verb) serves as the core component that associates related named entities extracted from Thai texts. Lacking a practical parser for the Thai language, we present three types of surface features, i.e., punctuation marks (such as token spaces), entity types, and the number of entities, and then apply five commonly used learning schemes to investigate their performance on predicate-oriented relation extraction. The experimental results show that our approach achieves F-measures of 97.76%, 99.19%, 95.00% and 93.50% on four types of predicate-oriented relation (action-location, location-action, action-person and person-action), respectively, in crime-related news documents, using a data set of 1,736 entity pairs. The effects of NE extraction techniques, feature sets and class imbalance on relation extraction performance are also explored.
doi_str_mv 10.1587/transinf.E95.D.1932
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1221862376</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1221862376</sourcerecordid><originalsourceid>FETCH-LOGICAL-c590t-87961d698f25ce3242752458af80641059963bf2cb9edd3594694b46af1ca3653</originalsourceid><addsrcrecordid>eNpdkEtLLDEQRoMoOD5-gZveCG56bh6ddLIUZ_QKchUZwV2oSVc00g9Nooz_3h5GB7mrguJ89VGHkBNGp0zq-k-O0KfQ--ncyOlsyozgO2TC6kqWTCi2SybUMFVqKfg-OUjphVKmOZMT8jgLyQ0fGD-LwRd3EZvgIGN5GwP2GZviHlvIYehTAd3QPxX_oBu38z6HHDAV89XY7dagj0NXLJ4hFAtc5XRE9jy0CY-_5yF5uJwvLv6WN7dX1xfnN6WThuZS10axRhntuXQoeMVrySupwWuqKkalMUosPXdLg00jpKmUqZaVAs8cCCXFITnb3H2Nw9s7pmy78SNsW-hxeE-Wcc604qJWIyo2qItDShG9fY2hg_hpGbVrj_bHox092pldexxTp98FkBy0fkRcSNsoV1RLWumRu95wLynDE24BiDm4Fv-_Xf_q2DLuGaLFXnwBI4OQFQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1221862376</pqid></control><display><type>article</type><title>Discovery of Predicate-Oriented Relations among Named Entities Extracted from Thai Texts</title><source>J-STAGE Free</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>TONGTEP, Nattapong ; THEERAMUNKONG, Thanaruk</creator><creatorcontrib>TONGTEP, Nattapong ; THEERAMUNKONG, Thanaruk</creatorcontrib><description>Extracting named entities (NEs) and their relations is more difficult in Thai than in other languages due to several Thai specific characteristics, including no explicit boundaries for words, phrases and sentences; few case markers and modifier clues; high ambiguity in compound words and serial verbs; and flexible word orders. 
Unlike most previous works which focused on NE relations of specific actions, such as work_for, live_in, located_in, and kill, this paper proposes more general types of NE relations, called predicate-oriented relation (PoR), where an extracted action part (verb) is used as a core component to associate related named entities extracted from Thai Texts. Lacking a practical parser for the Thai language, we present three types of surface features, i.e. punctuation marks (such as token spaces), entity types and the number of entities and then apply five alternative commonly used learning schemes to investigate their performance on predicate-oriented relation extraction. The experimental results show that our approach achieves the F-measure of 97.76%, 99.19%, 95.00% and 93.50% on four different types of predicate-oriented relation (action-location, location-action, action-person and person-action) in crime-related news documents using a data set of 1,736 entity pairs. The effects of NE extraction techniques, feature sets and class unbalance on the performance of relation extraction are explored.</description><identifier>ISSN: 0916-8532</identifier><identifier>EISSN: 1745-1361</identifier><identifier>DOI: 10.1587/transinf.E95.D.1932</identifier><language>eng</language><publisher>Oxford: The Institute of Electronics, Information and Communication Engineers</publisher><subject>Applied sciences ; Boundaries ; Computer science; control theory; systems ; Exact sciences and technology ; Extraction ; information extraction ; Information systems. Data bases ; Information theory ; Information, signal and communications theory ; Markers ; Memory organisation. 
Data processing ; named entity ; News ; relation extraction ; Sentences ; Serials ; Software ; surface feature ; Telecommunications and information theory ; Texts ; Unbalance</subject><ispartof>IEICE Transactions on Information and Systems, 2012/07/01, Vol.E95.D(7), pp.1932-1946</ispartof><rights>2012 The Institute of Electronics, Information and Communication Engineers</rights><rights>2015 INIST-CNRS</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c590t-87961d698f25ce3242752458af80641059963bf2cb9edd3594694b46af1ca3653</citedby><cites>FETCH-LOGICAL-c590t-87961d698f25ce3242752458af80641059963bf2cb9edd3594694b46af1ca3653</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,1881,4022,27921,27922,27923</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=26085048$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>TONGTEP, Nattapong</creatorcontrib><creatorcontrib>THEERAMUNKONG, Thanaruk</creatorcontrib><title>Discovery of Predicate-Oriented Relations among Named Entities Extracted from Thai Texts</title><title>IEICE Transactions on Information and Systems</title><addtitle>IEICE Trans. Inf. &amp; Syst.</addtitle><description>Extracting named entities (NEs) and their relations is more difficult in Thai than in other languages due to several Thai specific characteristics, including no explicit boundaries for words, phrases and sentences; few case markers and modifier clues; high ambiguity in compound words and serial verbs; and flexible word orders. 
Unlike most previous works which focused on NE relations of specific actions, such as work_for, live_in, located_in, and kill, this paper proposes more general types of NE relations, called predicate-oriented relation (PoR), where an extracted action part (verb) is used as a core component to associate related named entities extracted from Thai Texts. Lacking a practical parser for the Thai language, we present three types of surface features, i.e. punctuation marks (such as token spaces), entity types and the number of entities and then apply five alternative commonly used learning schemes to investigate their performance on predicate-oriented relation extraction. The experimental results show that our approach achieves the F-measure of 97.76%, 99.19%, 95.00% and 93.50% on four different types of predicate-oriented relation (action-location, location-action, action-person and person-action) in crime-related news documents using a data set of 1,736 entity pairs. The effects of NE extraction techniques, feature sets and class unbalance on the performance of relation extraction are explored.</description><subject>Applied sciences</subject><subject>Boundaries</subject><subject>Computer science; control theory; systems</subject><subject>Exact sciences and technology</subject><subject>Extraction</subject><subject>information extraction</subject><subject>Information systems. Data bases</subject><subject>Information theory</subject><subject>Information, signal and communications theory</subject><subject>Markers</subject><subject>Memory organisation. 
Data processing</subject><subject>named entity</subject><subject>News</subject><subject>relation extraction</subject><subject>Sentences</subject><subject>Serials</subject><subject>Software</subject><subject>surface feature</subject><subject>Telecommunications and information theory</subject><subject>Texts</subject><subject>Unbalance</subject><issn>0916-8532</issn><issn>1745-1361</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2012</creationdate><recordtype>article</recordtype><recordid>eNpdkEtLLDEQRoMoOD5-gZveCG56bh6ddLIUZ_QKchUZwV2oSVc00g9Nooz_3h5GB7mrguJ89VGHkBNGp0zq-k-O0KfQ--ncyOlsyozgO2TC6kqWTCi2SybUMFVqKfg-OUjphVKmOZMT8jgLyQ0fGD-LwRd3EZvgIGN5GwP2GZviHlvIYehTAd3QPxX_oBu38z6HHDAV89XY7dagj0NXLJ4hFAtc5XRE9jy0CY-_5yF5uJwvLv6WN7dX1xfnN6WThuZS10axRhntuXQoeMVrySupwWuqKkalMUosPXdLg00jpKmUqZaVAs8cCCXFITnb3H2Nw9s7pmy78SNsW-hxeE-Wcc604qJWIyo2qItDShG9fY2hg_hpGbVrj_bHox092pldexxTp98FkBy0fkRcSNsoV1RLWumRu95wLynDE24BiDm4Fv-_Xf_q2DLuGaLFXnwBI4OQFQ</recordid><startdate>2012</startdate><enddate>2012</enddate><creator>TONGTEP, Nattapong</creator><creator>THEERAMUNKONG, Thanaruk</creator><general>The Institute of Electronics, Information and Communication Engineers</general><general>Oxford University Press</general><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>2012</creationdate><title>Discovery of Predicate-Oriented Relations among Named Entities Extracted from Thai Texts</title><author>TONGTEP, Nattapong ; THEERAMUNKONG, Thanaruk</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c590t-87961d698f25ce3242752458af80641059963bf2cb9edd3594694b46af1ca3653</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2012</creationdate><topic>Applied sciences</topic><topic>Boundaries</topic><topic>Computer science; control theory; 
systems</topic><topic>Exact sciences and technology</topic><topic>Extraction</topic><topic>information extraction</topic><topic>Information systems. Data bases</topic><topic>Information theory</topic><topic>Information, signal and communications theory</topic><topic>Markers</topic><topic>Memory organisation. Data processing</topic><topic>named entity</topic><topic>News</topic><topic>relation extraction</topic><topic>Sentences</topic><topic>Serials</topic><topic>Software</topic><topic>surface feature</topic><topic>Telecommunications and information theory</topic><topic>Texts</topic><topic>Unbalance</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>TONGTEP, Nattapong</creatorcontrib><creatorcontrib>THEERAMUNKONG, Thanaruk</creatorcontrib><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEICE Transactions on Information and Systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>TONGTEP, Nattapong</au><au>THEERAMUNKONG, Thanaruk</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Discovery of Predicate-Oriented Relations among Named Entities Extracted from Thai Texts</atitle><jtitle>IEICE Transactions on Information and Systems</jtitle><addtitle>IEICE Trans. Inf. 
&amp; Syst.</addtitle><date>2012</date><risdate>2012</risdate><volume>E95.D</volume><issue>7</issue><spage>1932</spage><epage>1946</epage><pages>1932-1946</pages><issn>0916-8532</issn><eissn>1745-1361</eissn><abstract>Extracting named entities (NEs) and their relations is more difficult in Thai than in other languages due to several Thai specific characteristics, including no explicit boundaries for words, phrases and sentences; few case markers and modifier clues; high ambiguity in compound words and serial verbs; and flexible word orders. Unlike most previous works which focused on NE relations of specific actions, such as work_for, live_in, located_in, and kill, this paper proposes more general types of NE relations, called predicate-oriented relation (PoR), where an extracted action part (verb) is used as a core component to associate related named entities extracted from Thai Texts. Lacking a practical parser for the Thai language, we present three types of surface features, i.e. punctuation marks (such as token spaces), entity types and the number of entities and then apply five alternative commonly used learning schemes to investigate their performance on predicate-oriented relation extraction. The experimental results show that our approach achieves the F-measure of 97.76%, 99.19%, 95.00% and 93.50% on four different types of predicate-oriented relation (action-location, location-action, action-person and person-action) in crime-related news documents using a data set of 1,736 entity pairs. The effects of NE extraction techniques, feature sets and class unbalance on the performance of relation extraction are explored.</abstract><cop>Oxford</cop><pub>The Institute of Electronics, Information and Communication Engineers</pub><doi>10.1587/transinf.E95.D.1932</doi><tpages>15</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0916-8532
ispartof IEICE Transactions on Information and Systems, 2012/07/01, Vol.E95.D(7), pp.1932-1946
issn 0916-8532
1745-1361
language eng
recordid cdi_proquest_miscellaneous_1221862376
source J-STAGE Free; EZB-FREE-00999 freely available EZB journals
subjects Applied sciences
Boundaries
Computer science; control theory; systems
Exact sciences and technology
Extraction
information extraction
Information systems. Data bases
Information theory
Information, signal and communications theory
Markers
Memory organisation. Data processing
named entity
News
relation extraction
Sentences
Serials
Software
surface feature
Telecommunications and information theory
Texts
Unbalance
title Discovery of Predicate-Oriented Relations among Named Entities Extracted from Thai Texts
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T16%3A38%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Discovery%20of%20Predicate-Oriented%20Relations%20among%20Named%20Entities%20Extracted%20from%20Thai%20Texts&rft.jtitle=IEICE%20Transactions%20on%20Information%20and%20Systems&rft.au=TONGTEP,%20Nattapong&rft.date=2012&rft.volume=E95.D&rft.issue=7&rft.spage=1932&rft.epage=1946&rft.pages=1932-1946&rft.issn=0916-8532&rft.eissn=1745-1361&rft_id=info:doi/10.1587/transinf.E95.D.1932&rft_dat=%3Cproquest_cross%3E1221862376%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1221862376&rft_id=info:pmid/&rfr_iscdi=true
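
The description field above names three kinds of surface features (punctuation marks such as token spaces, entity types, and the number of entities) and reports results as F-measures. The record does not give the exact feature definitions, so the following is only a minimal sketch under stated assumptions: the `Token` class, its tag set, the feature names, and both functions are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Token:
    """One pre-segmented token (hypothetical representation)."""
    text: str
    kind: str              # assumed tag set: "ENTITY", "ACTION", "SPACE", "OTHER"
    entity_type: str = ""  # e.g. "PERSON" or "LOCATION"; empty for non-entities


def surface_features(tokens: List[Token], i: int, j: int) -> Dict[str, object]:
    """Build parser-free features for the candidate pair at positions i and j.

    Assumption: one position holds an entity and the other an action (verb).
    """
    lo, hi = sorted((i, j))
    between = tokens[lo + 1:hi]
    return {
        # Which element comes first (distinguishes person-action from action-person).
        "first_kind": tokens[lo].kind,
        # Entity type of whichever member of the pair is an entity.
        "entity_type": tokens[i].entity_type or tokens[j].entity_type,
        # Punctuation-like clue: token spaces between the pair.
        "spaces_between": sum(t.kind == "SPACE" for t in between),
        # Number of other entities between the pair.
        "entities_between": sum(t.kind == "ENTITY" for t in between),
    }


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (harmonic mean of precision and recall)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Placeholder tokens standing in for a segmented Thai sentence.
example = [
    Token("PERSON_A", "ENTITY", "PERSON"),
    Token(" ", "SPACE"),
    Token("ACTION_X", "ACTION"),
    Token("LOCATION_B", "ENTITY", "LOCATION"),
]
person_action = surface_features(example, 0, 2)
action_location = surface_features(example, 2, 3)
```

A feature dictionary of this shape could be passed to any standard classifier; the record does not say which five learning schemes were used, so none is assumed here.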