Short Texts Classification Through Reference Document Expansion
With the rapid development of information technology, short texts arising from social human interaction are gradually becoming predominant in network information streams. Growing demand requires the industry to provide more effective classification of these brief texts. However, faced with sho...
Published in: | Chinese Journal of Electronics 2014-04, Vol.23 (2), p.315-321 |
---|---|
Main authors: | Yang, Zhen; Fan, Kefeng; Lai, Yingxu; Gao, Kaiming; Wang, Yong |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 321 |
---|---|
container_issue | 2 |
container_start_page | 315 |
container_title | Chinese Journal of Electronics |
container_volume | 23 |
creator | Yang, Zhen; Fan, Kefeng; Lai, Yingxu; Gao, Kaiming; Wang, Yong |
description | With the rapid development of information technology, short texts arising from social human interaction are gradually becoming predominant in network information streams. Growing demand requires the industry to provide more effective classification of these brief texts. However, when faced with short text documents, each of which contains only a few words, traditional document classification models run into difficulty. Aggressive document expansion works remarkably well in many cases but suffers from the assumption of independent, identically distributed observations. We formalize a view of classification using Bayesian decision theory, treat each short text as observations from a probabilistic model, called a statistical language model, and encode classification preferences with a loss function defined by the language models and the external reference document. According to Vapnik's method of structural risk minimization (SRM), the optimal classification action is the one that minimizes the structural risk, which provides a result that allows one to trade off errors on the training sample against improved generalization performance. We conduct experiments using several corpora of microblog-like data and analyze the experimental results. Compared with established baselines, the results of these experiments show that applying our proposed document expansion method improves the chance of achieving better classification performance. |
doi_str_mv | 10.23919/CJE.2014.10851881 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 1022-4653 |
ispartof | Chinese Journal of Electronics, 2014-04, Vol.23 (2), p.315-321 |
issn | 1022-4653, 2075-5597 |
language | eng |
recordid | cdi_chongqing_primary_49522452 |
source | Alma/SFX Local Collection |
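subjects | information technology; expansion project; text classification; text documents; structural risk minimization; statistical language model; observational data; Bayesian decision theory |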
title | Short Texts Classification Through Reference Document Expansion |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T18%3A08%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref_chong&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Short%20Texts%20Classification%20Through%20Reference%20Document%20Expansion&rft.jtitle=Chinese%20Journal%20of%20Electronics&rft.au=Yang,%20Zhen&rft.date=2014-04-01&rft.volume=23&rft.issue=2&rft.spage=315&rft.epage=321&rft.pages=315-321&rft.issn=1022-4653&rft.eissn=2075-5597&rft_id=info:doi/10.23919/CJE.2014.10851881&rft_dat=%3Ccrossref_chong%3E10_23919_CJE_2014_10851881%3C/crossref_chong%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_cqvip_id=49522452&rfr_iscdi=true |
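
The abstract describes classifying a short text by expanding it with an external reference document and then choosing the class whose language model gives the lowest loss. The following is only a minimal sketch of that general idea, not the authors' method: the function names, the word-overlap rule for picking reference documents, the `weight` and `alpha` parameters, and the cross-entropy loss are all assumptions made for illustration, and the sketch does not implement structural risk minimization.

```python
import math
from collections import Counter


def unigram_counts(text):
    """Whitespace-tokenized, lower-cased word counts."""
    return Counter(text.lower().split())


def expand(short_text, reference_docs, weight=0.5):
    """Add down-weighted counts from every reference document that
    shares at least one word with the short text (assumed rule)."""
    counts = Counter(unigram_counts(short_text))
    words = set(counts)
    for ref in reference_docs:
        ref_counts = unigram_counts(ref)
        if words & set(ref_counts):
            for word, count in ref_counts.items():
                counts[word] += weight * count
    return counts


def class_model(docs, vocab, alpha=1.0):
    """Unigram language model for one class with additive smoothing."""
    counts = Counter()
    for doc in docs:
        counts.update(unigram_counts(doc))
    total = sum(counts.values()) + alpha * len(vocab)
    return {word: (counts[word] + alpha) / total for word in vocab}


def loss(expanded, model):
    """Cross-entropy of the expanded text under a class model."""
    total = sum(expanded.values())
    return -sum((c / total) * math.log(model[w]) for w, c in expanded.items())


def classify(short_text, training, reference_docs):
    """Return the class label with the smallest loss."""
    expanded = expand(short_text, reference_docs)
    vocab = set(expanded)
    for docs in training.values():
        for doc in docs:
            vocab.update(unigram_counts(doc))
    models = {label: class_model(docs, vocab) for label, docs in training.items()}
    return min(models, key=lambda label: loss(expanded, models[label]))


# Toy data, invented for this example only.
training = {
    "sports": ["the team won the match", "a great goal in the football game"],
    "tech": ["new phone software update released", "the laptop chip is fast"],
}
references = [
    "football match report the striker scored a goal",
    "software update fixes phone bugs",
]
print(classify("striker scored", training, references))
```

In this toy setup neither word of "striker scored" occurs in the training data, so the class decision rests entirely on the words pulled in from the matching reference document. That is the situation the abstract says defeats traditional models on very short texts.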