Discovery of Predicate-Oriented Relations among Named Entities Extracted from Thai Texts
Published in: | IEICE Transactions on Information and Systems 2012/07/01, Vol.E95.D(7), pp.1932-1946 |
---|---|
Main authors: | TONGTEP, Nattapong; THEERAMUNKONG, Thanaruk |
Format: | Article |
Language: | eng |
Online access: | Full text |
container_end_page | 1946 |
---|---|
container_issue | 7 |
container_start_page | 1932 |
container_title | IEICE Transactions on Information and Systems |
container_volume | E95.D |
creator | TONGTEP, Nattapong; THEERAMUNKONG, Thanaruk |
description | Extracting named entities (NEs) and their relations is more difficult in Thai than in other languages because of several Thai-specific characteristics: no explicit boundaries for words, phrases, and sentences; few case markers and modifier clues; high ambiguity in compound words and serial verbs; and flexible word order. Unlike most previous work, which focused on NE relations of specific actions such as work_for, live_in, located_in, and kill, this paper proposes a more general type of NE relation, called the predicate-oriented relation (PoR), in which an extracted action part (verb) serves as the core component that associates related named entities extracted from Thai texts. Lacking a practical parser for Thai, we present three types of surface features, i.e., punctuation marks (such as token spaces), entity types, and the number of entities, and then apply five commonly used learning schemes to investigate their performance on predicate-oriented relation extraction. Experimental results on a data set of 1,736 entity pairs from crime-related news documents show that our approach achieves F-measures of 97.76%, 99.19%, 95.00%, and 93.50% on the four types of predicate-oriented relation (action-location, location-action, action-person, and person-action, respectively). The effects of NE extraction techniques, feature sets, and class imbalance on relation extraction performance are also explored. |
doi_str_mv | 10.1587/transinf.E95.D.1932 |
format | Article |
publisher | The Institute of Electronics, Information and Communication Engineers |
abbreviated_title | IEICE Trans. Inf. & Syst. |
rights | 2012 The Institute of Electronics, Information and Communication Engineers |
review_status | peer reviewed |
fulltext | fulltext |
identifier | ISSN: 0916-8532 |
ispartof | IEICE Transactions on Information and Systems, 2012/07/01, Vol.E95.D(7), pp.1932-1946 |
issn | 0916-8532 (print); 1745-1361 (online) |
language | eng |
recordid | cdi_proquest_miscellaneous_1221862376 |
source | J-STAGE Free; EZB-FREE-00999 freely available EZB journals |
subjects | Applied sciences; Boundaries; Computer science, control theory, systems; Exact sciences and technology; Extraction; information extraction; Information systems. Data bases; Information theory; Information, signal and communications theory; Markers; Memory organisation. Data processing; named entity; News; relation extraction; Sentences; Serials; Software; surface feature; Telecommunications and information theory; Texts; Unbalance |
title | Discovery of Predicate-Oriented Relations among Named Entities Extracted from Thai Texts |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T16%3A38%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Discovery%20of%20Predicate-Oriented%20Relations%20among%20Named%20Entities%20Extracted%20from%20Thai%20Texts&rft.jtitle=IEICE%20Transactions%20on%20Information%20and%20Systems&rft.au=TONGTEP,%20Nattapong&rft.date=2012&rft.volume=E95.D&rft.issue=7&rft.spage=1932&rft.epage=1946&rft.pages=1932-1946&rft.issn=0916-8532&rft.eissn=1745-1361&rft_id=info:doi/10.1587/transinf.E95.D.1932&rft_dat=%3Cproquest_cross%3E1221862376%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1221862376&rft_id=info:pmid/&rfr_iscdi=true |