Short Texts Classification Through Reference Document Expansion

With the rapid development of information technology, short texts arising from socialized human interaction are gradually becoming predominant in network information streams. Accelerating demand requires the industry to provide more effective classification of these brief texts. However, faced with sho...

Bibliographic details
Published in: Chinese Journal of Electronics 2014-04, Vol.23 (2), p.315-321
Main authors: Yang, Zhen; Fan, Kefeng; Lai, Yingxu; Gao, Kaiming; Wang, Yong
Format: Article
Language: eng
Online access: Full text
container_end_page 321
container_issue 2
container_start_page 315
container_title Chinese Journal of Electronics
container_volume 23
creator Yang, Zhen
Fan, Kefeng
Lai, Yingxu
Gao, Kaiming
Wang, Yong
description With the rapid development of information technology, short texts arising from socialized human interaction are gradually becoming predominant in network information streams. Accelerating demand requires the industry to provide more effective classification of these brief texts. However, faced with short text documents, each of which contains only a few words, traditional document classification models run into difficulty. Aggressive document expansion works remarkably well in many cases but suffers from the assumption of independent, identically distributed observations. We formalize a view of classification using Bayesian decision theory, treat each short text as observations from a probabilistic model, called a statistical language model, and encode classification preferences with a loss function defined by the language models and the external reference document. According to Vapnik's method of structural risk minimization (SRM), the optimal classification action is the one that minimizes the structural risk, which allows one to trade off errors on the training sample against improved generalization performance. We conduct experiments using several corpora of microblog-like data and analyze the experimental results. Relative to established baselines, these experiments show that applying our proposed document expansion method gives a better chance of achieving improved classification performance.
doi_str_mv 10.23919/CJE.2014.10851881
format Article
fulltext fulltext
identifier ISSN: 1022-4653
ispartof Chinese Journal of Electronics, 2014-04, Vol.23 (2), p.315-321
issn 1022-4653
2075-5597
language eng
recordid cdi_chongqing_primary_49522452
source Alma/SFX Local Collection
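subjects information technology (信息技术)
expansion (扩建工程)
text classification (文本分类)
text documents (文本文件)
structural risk minimization (结构风险最小化)
statistical language model (统计语言模型)
observations (观测资料)
Bayesian decision theory (贝叶斯决策理论)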
title Short Texts Classification Through Reference Document Expansion
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T18%3A08%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref_chong&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Short%20Texts%20Classification%20Through%20Reference%20Document%20Expansion&rft.jtitle=Chinese%20Journal%20of%20Electronics&rft.au=Yang,%20Zhen&rft.date=2014-04-01&rft.volume=23&rft.issue=2&rft.spage=315&rft.epage=321&rft.pages=315-321&rft.issn=1022-4653&rft.eissn=2075-5597&rft_id=info:doi/10.23919/CJE.2014.10851881&rft_dat=%3Ccrossref_chong%3E10_23919_CJE_2014_10851881%3C/crossref_chong%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_cqvip_id=49522452&rfr_iscdi=true
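
The abstract in this record describes treating each short text as observations from a statistical language model and choosing the class that minimizes a loss defined with the help of an external reference document. The following is a minimal sketch of that general idea only, not the authors' algorithm: the smoothing scheme (Jelinek-Mercer with a uniform background), the cross-entropy loss, the naive concatenation used for expansion, the toy data, and every name (`smoothed_lm`, `cross_entropy_loss`, `classify`, `VOCAB_SIZE`) are assumptions made for illustration. The structural risk term from the abstract is not modeled here.

```python
import math
from collections import Counter

VOCAB_SIZE = 1000  # assumed vocabulary size for the uniform background model


def smoothed_lm(tokens, lam=0.7):
    """Return a Jelinek-Mercer smoothed unigram model p(w) built from tokens."""
    counts = Counter(tokens)
    total = len(tokens)

    def prob(word):
        # Mix the maximum-likelihood estimate with a uniform background
        # so that unseen words never get probability zero.
        p_ml = counts[word] / total if total else 0.0
        return lam * p_ml + (1.0 - lam) / VOCAB_SIZE

    return prob


def cross_entropy_loss(text_tokens, class_lm):
    """Average negative log-likelihood of a non-empty text under a class model."""
    return -sum(math.log(class_lm(w)) for w in text_tokens) / len(text_tokens)


def classify(short_text, reference_doc, class_lms):
    """Pick the class whose model gives the lowest loss on the expanded text."""
    # Hypothetical expansion step: simply append the reference document's tokens.
    expanded = short_text + reference_doc
    losses = {label: cross_entropy_loss(expanded, lm)
              for label, lm in class_lms.items()}
    return min(losses, key=losses.get), losses


# Toy training data (invented for this sketch).
training = {
    "sports": "match goal team win score player team league".split(),
    "tech": "phone app software update chip device app release".split(),
}
class_lms = {label: smoothed_lm(tokens) for label, tokens in training.items()}

short_text = "new update today".split()
reference_doc = "software release for phone app".split()
label, losses = classify(short_text, reference_doc, class_lms)
print(label, losses)
```

With 0-1 style preferences, picking the lowest-loss class in this way corresponds to the usual Bayes decision rule under equal class priors; a fuller treatment would add priors and a capacity penalty in the spirit of structural risk minimization.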