Automatic Acquisition of Basic Katakana Lexicon from a Given Corpus
Katakana, Japanese phonogram mainly used for loan words, is a troublemaker in Japanese word segmentation. Since Katakana words are heavily domain-dependent and there are many Katakana neologisms, it is almost impossible to construct and maintain Katakana word dictionary by hand. This paper proposes...
Main Authors: | Nakazawa, Toshiaki; Kawahara, Daisuke; Kurohashi, Sadao |
---|---|
Format: | Conference Proceeding |
Language: | eng |
Subjects: | |
Online Access: | Full text |
container_end_page | 693 |
---|---|
container_issue | |
container_start_page | 682 |
container_title | |
container_volume | |
creator | Nakazawa, Toshiaki; Kawahara, Daisuke; Kurohashi, Sadao |
description | Katakana, Japanese phonogram mainly used for loan words, is a troublemaker in Japanese word segmentation. Since Katakana words are heavily domain-dependent and there are many Katakana neologisms, it is almost impossible to construct and maintain Katakana word dictionary by hand. This paper proposes an automatic segmentation method of Japanese Katakana compounds, which makes it possible to construct precise and concise Katakana word dictionary automatically, given only a medium or large size of Japanese corpus of some domain. |
doi_str_mv | 10.1007/11562214_60 |
format | Conference Proceeding |
fullrecord | <record><control><sourceid>pascalfrancis_sprin</sourceid><recordid>TN_cdi_pascalfrancis_primary_17265713</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>17265713</sourcerecordid><originalsourceid>FETCH-LOGICAL-j285t-a5c88101642175ea0b7989a6c11e953319e42fac5b42278da431b248c0dd6b9c3</originalsourceid><addsrcrecordid>eNpNkEtPwzAQhM1Loio98Qd84cAh4PXbxxKVgqjEBc7RxnWQ-0hKnCL49xgVIfay0n6j0ewQcgnsBhgztwBKcw6y0uyITJyxQkkmwHAJx2QEGqAQQrqTP8ZdhuqUjJhgvHBGinMySWnF8ghwAHxEyul-6LY4RE-n_n0fUxxi19KuoXeY8vEJB1xji3QRPqPPpOm7LUU6jx-hpWXX7_bpgpw1uElh8rvH5PV-9lI-FIvn-WM5XRQrbtVQoPLWAgMtORgVkNXGWYfaAwSnRE4UJG_Qq1pybuwSpYCaS-vZcqlr58WYXB18d5g8bpoeWx9TtevjFvuvKj-rlQGRddcHXcqofQt9VXfdOlXAqp8iq39Fim-2J12O</addsrcrecordid><sourcetype>Index Database</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Automatic Acquisition of Basic Katakana Lexicon from a Given Corpus</title><source>Springer Books</source><creator>Nakazawa, Toshiaki ; Kawahara, Daisuke ; Kurohashi, Sadao</creator><contributor>Kwong, Oi Yee ; Dale, Robert ; Wong, Kam-Fai ; Su, Jian</contributor><creatorcontrib>Nakazawa, Toshiaki ; Kawahara, Daisuke ; Kurohashi, Sadao ; Kwong, Oi Yee ; Dale, Robert ; Wong, Kam-Fai ; Su, Jian</creatorcontrib><description>Katakana, Japanese phonogram mainly used for loan words, is a troublemaker in Japanese word segmentation. Since Katakana words are heavily domain-dependent and there are many Katakana neologisms, it is almost impossible to construct and maintain Katakana word dictionary by hand. 
This paper proposes an automatic segmentation method of Japanese Katakana compounds, which makes it possible to construct precise and concise Katakana word dictionary automatically, given only a medium or large size of Japanese corpus of some domain.</description><identifier>ISSN: 0302-9743</identifier><identifier>ISBN: 9783540291725</identifier><identifier>ISBN: 3540291725</identifier><identifier>EISSN: 1611-3349</identifier><identifier>EISBN: 9783540317241</identifier><identifier>EISBN: 3540317244</identifier><identifier>DOI: 10.1007/11562214_60</identifier><language>eng</language><publisher>Berlin, Heidelberg: Springer Berlin Heidelberg</publisher><subject>Applied sciences ; Artificial intelligence ; Automatic Acquisition ; Compound Noun ; Compound Word ; Computer science; control theory; systems ; Exact sciences and technology ; Input Word ; Speech and sound recognition and synthesis. Linguistics ; Tomato Sauce</subject><ispartof>Lecture notes in computer science, 2005, p.682-693</ispartof><rights>Springer-Verlag Berlin Heidelberg 2005</rights><rights>2005 INIST-CNRS</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/11562214_60$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/11562214_60$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>309,310,775,776,780,785,786,789,4036,4037,27902,38232,41418,42487</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=17265713$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><contributor>Kwong, Oi Yee</contributor><contributor>Dale, Robert</contributor><contributor>Wong, Kam-Fai</contributor><contributor>Su, Jian</contributor><creatorcontrib>Nakazawa, 
Toshiaki</creatorcontrib><creatorcontrib>Kawahara, Daisuke</creatorcontrib><creatorcontrib>Kurohashi, Sadao</creatorcontrib><title>Automatic Acquisition of Basic Katakana Lexicon from a Given Corpus</title><title>Lecture notes in computer science</title><description>Katakana, Japanese phonogram mainly used for loan words, is a troublemaker in Japanese word segmentation. Since Katakana words are heavily domain-dependent and there are many Katakana neologisms, it is almost impossible to construct and maintain Katakana word dictionary by hand. This paper proposes an automatic segmentation method of Japanese Katakana compounds, which makes it possible to construct precise and concise Katakana word dictionary automatically, given only a medium or large size of Japanese corpus of some domain.</description><subject>Applied sciences</subject><subject>Artificial intelligence</subject><subject>Automatic Acquisition</subject><subject>Compound Noun</subject><subject>Compound Word</subject><subject>Computer science; control theory; systems</subject><subject>Exact sciences and technology</subject><subject>Input Word</subject><subject>Speech and sound recognition and synthesis. 
Linguistics</subject><subject>Tomato Sauce</subject><issn>0302-9743</issn><issn>1611-3349</issn><isbn>9783540291725</isbn><isbn>3540291725</isbn><isbn>9783540317241</isbn><isbn>3540317244</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2005</creationdate><recordtype>conference_proceeding</recordtype><recordid>eNpNkEtPwzAQhM1Loio98Qd84cAh4PXbxxKVgqjEBc7RxnWQ-0hKnCL49xgVIfay0n6j0ewQcgnsBhgztwBKcw6y0uyITJyxQkkmwHAJx2QEGqAQQrqTP8ZdhuqUjJhgvHBGinMySWnF8ghwAHxEyul-6LY4RE-n_n0fUxxi19KuoXeY8vEJB1xji3QRPqPPpOm7LUU6jx-hpWXX7_bpgpw1uElh8rvH5PV-9lI-FIvn-WM5XRQrbtVQoPLWAgMtORgVkNXGWYfaAwSnRE4UJG_Qq1pybuwSpYCaS-vZcqlr58WYXB18d5g8bpoeWx9TtevjFvuvKj-rlQGRddcHXcqofQt9VXfdOlXAqp8iq39Fim-2J12O</recordid><startdate>2005</startdate><enddate>2005</enddate><creator>Nakazawa, Toshiaki</creator><creator>Kawahara, Daisuke</creator><creator>Kurohashi, Sadao</creator><general>Springer Berlin Heidelberg</general><general>Springer</general><scope>IQODW</scope></search><sort><creationdate>2005</creationdate><title>Automatic Acquisition of Basic Katakana Lexicon from a Given Corpus</title><author>Nakazawa, Toshiaki ; Kawahara, Daisuke ; Kurohashi, Sadao</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-j285t-a5c88101642175ea0b7989a6c11e953319e42fac5b42278da431b248c0dd6b9c3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2005</creationdate><topic>Applied sciences</topic><topic>Artificial intelligence</topic><topic>Automatic Acquisition</topic><topic>Compound Noun</topic><topic>Compound Word</topic><topic>Computer science; control theory; systems</topic><topic>Exact sciences and technology</topic><topic>Input Word</topic><topic>Speech and sound recognition and synthesis. 
Linguistics</topic><topic>Tomato Sauce</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Nakazawa, Toshiaki</creatorcontrib><creatorcontrib>Kawahara, Daisuke</creatorcontrib><creatorcontrib>Kurohashi, Sadao</creatorcontrib><collection>Pascal-Francis</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Nakazawa, Toshiaki</au><au>Kawahara, Daisuke</au><au>Kurohashi, Sadao</au><au>Kwong, Oi Yee</au><au>Dale, Robert</au><au>Wong, Kam-Fai</au><au>Su, Jian</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Automatic Acquisition of Basic Katakana Lexicon from a Given Corpus</atitle><btitle>Lecture notes in computer science</btitle><date>2005</date><risdate>2005</risdate><spage>682</spage><epage>693</epage><pages>682-693</pages><issn>0302-9743</issn><eissn>1611-3349</eissn><isbn>9783540291725</isbn><isbn>3540291725</isbn><eisbn>9783540317241</eisbn><eisbn>3540317244</eisbn><abstract>Katakana, Japanese phonogram mainly used for loan words, is a troublemaker in Japanese word segmentation. Since Katakana words are heavily domain-dependent and there are many Katakana neologisms, it is almost impossible to construct and maintain Katakana word dictionary by hand. This paper proposes an automatic segmentation method of Japanese Katakana compounds, which makes it possible to construct precise and concise Katakana word dictionary automatically, given only a medium or large size of Japanese corpus of some domain.</abstract><cop>Berlin, Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><doi>10.1007/11562214_60</doi><tpages>12</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0302-9743 |
ispartof | Lecture notes in computer science, 2005, p.682-693 |
issn | 0302-9743; 1611-3349 |
language | eng |
recordid | cdi_pascalfrancis_primary_17265713 |
source | Springer Books |
subjects | Applied sciences; Artificial intelligence; Automatic Acquisition; Compound Noun; Compound Word; Computer science; control theory; systems; Exact sciences and technology; Input Word; Speech and sound recognition and synthesis. Linguistics; Tomato Sauce |
title | Automatic Acquisition of Basic Katakana Lexicon from a Given Corpus |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T00%3A34%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-pascalfrancis_sprin&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Automatic%20Acquisition%20of%20Basic%20Katakana%20Lexicon%20from%20a%20Given%20Corpus&rft.btitle=Lecture%20notes%20in%20computer%20science&rft.au=Nakazawa,%20Toshiaki&rft.date=2005&rft.spage=682&rft.epage=693&rft.pages=682-693&rft.issn=0302-9743&rft.eissn=1611-3349&rft.isbn=9783540291725&rft.isbn_list=3540291725&rft_id=info:doi/10.1007/11562214_60&rft_dat=%3Cpascalfrancis_sprin%3E17265713%3C/pascalfrancis_sprin%3E%3Curl%3E%3C/url%3E&rft.eisbn=9783540317241&rft.eisbn_list=3540317244&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |