Automatic content based title extraction for Chinese documents using support vector machine

In this paper, a content-based and domain-independent method for automatically extracting titles from Chinese research papers is proposed. The information contained in the title itself and the similarity between the title and the body of the paper are exploited, under the condition that the experimen...

Bibliographic details
Main authors:	Zhengcao Zhang, Maosong Sun, Shaoming Liu
Format:	Conference proceeding
Language:	eng
Subjects:	Computer science; Data mining; Intelligent systems; Internet; Laboratories; Machine intelligence; Robustness; Sun; Support vector machine classification; Support vector machines
Online access:	Order full text

container_end_page	558
container_issue
container_start_page	553
container_title
container_volume
creator	Zhengcao Zhang ; Maosong Sun ; Shaoming Liu
description	In this paper, a content-based and domain-independent method for automatically extracting titles from Chinese research papers is proposed. The information contained in the title itself and the similarity between the title and the body of the paper are exploited, under the condition that the experiment is carried out on plain text in which no formatting information, such as font, is used. A list of words used only in Chinese titles and a list of words never used in Chinese titles are also collected to facilitate title extraction. We use a support vector machine classifier to perform robust and more adaptable automatic title extraction. The method achieves good performance on a test set of 2438 research papers covering almost all academic disciplines.
doi_str_mv	10.1109/NLPKE.2005.1598799
format	Conference Proceeding
fullrecord	<record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_1598799</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>1598799</ieee_id><sourcerecordid>1598799</sourcerecordid><originalsourceid>FETCH-LOGICAL-i175t-fbcb17ea406253e4fb1e7cfff6178ce8b761f663dc2bbde5d1b1029819f486d03</originalsourceid><addsrcrecordid>eNotkMtKxDAYhQMiKGNfQDd5gdb8TZM0y6GMFyzqQlcuhiT9o5HphSYVfXsrzuGDs_k4i0PIJbACgOnrx_b5YVeUjIkChK6V1ick06pmK1xzCeKMZDF-sjVcCynlOXnbLmnsTQqOunFIOCRqTcSOppAOSPE7zcalMA7UjzNtPsKAEWk3uqVf3UiXGIZ3GpdpGudEv9ClVeuN-xMvyKk3h4jZsTfk9Wb30tzl7dPtfbNt8wBKpNxbZ0GhqZgsBcfKW0DlvPcSVO2wtkqCl5J3rrS2Q9GBBVbqGrSvatkxviFX_7sBEffTHHoz_-yPF_BfeVRUew</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Automatic content based title extraction for Chinese documents using support vector machine</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Zhengcao Zhang ; Maosong Sun ; Shaoming Liu</creator><creatorcontrib>Zhengcao Zhang ; Maosong Sun ; Shaoming Liu</creatorcontrib><description>In this paper, a content-based and domain-independent method for automatically extracting titles from Chinese research papers is proposed. The information contained in the title itself and the similarity between the title and the body of the paper is exploited, under the condition that the experiment is carried out on plain texts in which no any format information such as font is used. A list of words only used in Chinese titles and a list of words never used in Chinese titles are further collected to facilitate the title extraction. We use the support vector machine classifier to perform a robust and more adaptable automatic title extraction. 
The method achieves good performance on a test set consisting of 2438 research papers which cover almost all of the academic disciplines.</description><identifier>ISBN: 9780780393615</identifier><identifier>ISBN: 0780393619</identifier><identifier>DOI: 10.1109/NLPKE.2005.1598799</identifier><language>eng</language><publisher>IEEE</publisher><subject>Computer science ; Data mining ; Intelligent systems ; Internet ; Laboratories ; Machine intelligence ; Robustness ; Sun ; Support vector machine classification ; Support vector machines</subject><ispartof>2005 International Conference on Natural Language Processing and Knowledge Engineering, 2005, p.553-558</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/1598799$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,2052,4036,4037,27902,54895</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/1598799$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Zhengcao Zhang</creatorcontrib><creatorcontrib>Maosong Sun</creatorcontrib><creatorcontrib>Shaoming Liu</creatorcontrib><title>Automatic content based title extraction for Chinese documents using support vector machine</title><title>2005 International Conference on Natural Language Processing and Knowledge Engineering</title><addtitle>NLPKE</addtitle><description>In this paper, a content-based and domain-independent method for automatically extracting titles from Chinese research papers is proposed. The information contained in the title itself and the similarity between the title and the body of the paper is exploited, under the condition that the experiment is carried out on plain texts in which no any format information such as font is used. 
A list of words only used in Chinese titles and a list of words never used in Chinese titles are further collected to facilitate the title extraction. We use the support vector machine classifier to perform a robust and more adaptable automatic title extraction. The method achieves good performance on a test set consisting of 2438 research papers which cover almost all of the academic disciplines.</description><subject>Computer science</subject><subject>Data mining</subject><subject>Intelligent systems</subject><subject>Internet</subject><subject>Laboratories</subject><subject>Machine intelligence</subject><subject>Robustness</subject><subject>Sun</subject><subject>Support vector machine classification</subject><subject>Support vector machines</subject><isbn>9780780393615</isbn><isbn>0780393619</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2005</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNotkMtKxDAYhQMiKGNfQDd5gdb8TZM0y6GMFyzqQlcuhiT9o5HphSYVfXsrzuGDs_k4i0PIJbACgOnrx_b5YVeUjIkChK6V1ick06pmK1xzCeKMZDF-sjVcCynlOXnbLmnsTQqOunFIOCRqTcSOppAOSPE7zcalMA7UjzNtPsKAEWk3uqVf3UiXGIZ3GpdpGudEv9ClVeuN-xMvyKk3h4jZsTfk9Wb30tzl7dPtfbNt8wBKpNxbZ0GhqZgsBcfKW0DlvPcSVO2wtkqCl5J3rrS2Q9GBBVbqGrSvatkxviFX_7sBEffTHHoz_-yPF_BfeVRUew</recordid><startdate>2005</startdate><enddate>2005</enddate><creator>Zhengcao Zhang</creator><creator>Maosong Sun</creator><creator>Shaoming Liu</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>2005</creationdate><title>Automatic content based title extraction for Chinese documents using support vector machine</title><author>Zhengcao Zhang ; Maosong Sun ; Shaoming 
Liu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i175t-fbcb17ea406253e4fb1e7cfff6178ce8b761f663dc2bbde5d1b1029819f486d03</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2005</creationdate><topic>Computer science</topic><topic>Data mining</topic><topic>Intelligent systems</topic><topic>Internet</topic><topic>Laboratories</topic><topic>Machine intelligence</topic><topic>Robustness</topic><topic>Sun</topic><topic>Support vector machine classification</topic><topic>Support vector machines</topic><toplevel>online_resources</toplevel><creatorcontrib>Zhengcao Zhang</creatorcontrib><creatorcontrib>Maosong Sun</creatorcontrib><creatorcontrib>Shaoming Liu</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zhengcao Zhang</au><au>Maosong Sun</au><au>Shaoming Liu</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Automatic content based title extraction for Chinese documents using support vector machine</atitle><btitle>2005 International Conference on Natural Language Processing and Knowledge Engineering</btitle><stitle>NLPKE</stitle><date>2005</date><risdate>2005</risdate><spage>553</spage><epage>558</epage><pages>553-558</pages><isbn>9780780393615</isbn><isbn>0780393619</isbn><abstract>In this paper, a content-based and domain-independent method for automatically extracting titles from Chinese research papers is proposed. 
The information contained in the title itself and the similarity between the title and the body of the paper is exploited, under the condition that the experiment is carried out on plain texts in which no any format information such as font is used. A list of words only used in Chinese titles and a list of words never used in Chinese titles are further collected to facilitate the title extraction. We use the support vector machine classifier to perform a robust and more adaptable automatic title extraction. The method achieves good performance on a test set consisting of 2438 research papers which cover almost all of the academic disciplines.</abstract><pub>IEEE</pub><doi>10.1109/NLPKE.2005.1598799</doi><tpages>6</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISBN: 9780780393615
ispartof	2005 International Conference on Natural Language Processing and Knowledge Engineering, 2005, p.553-558
issn
language	eng
recordid	cdi_ieee_primary_1598799
source	IEEE Electronic Library (IEL) Conference Proceedings
subjects	Computer science ; Data mining ; Intelligent systems ; Internet ; Laboratories ; Machine intelligence ; Robustness ; Sun ; Support vector machine classification ; Support vector machines
title	Automatic content based title extraction for Chinese documents using support vector machine
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T23%3A36%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Automatic%20content%20based%20title%20extraction%20for%20Chinese%20documents%20using%20support%20vector%20machine&rft.btitle=2005%20International%20Conference%20on%20Natural%20Language%20Processing%20and%20Knowledge%20Engineering&rft.au=Zhengcao%20Zhang&rft.date=2005&rft.spage=553&rft.epage=558&rft.pages=553-558&rft.isbn=9780780393615&rft.isbn_list=0780393619&rft_id=info:doi/10.1109/NLPKE.2005.1598799&rft_dat=%3Cieee_6IE%3E1598799%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=1598799&rfr_iscdi=true