Automatic content based title extraction for Chinese documents using support vector machine

In this paper, a content-based and domain-independent method for automatically extracting titles from Chinese research papers is proposed. The information contained in the title itself and the similarity between the title and the body of the paper are exploited, under the condition that the experimen...
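
The abstract (given in full in the description field of this record) names three kinds of evidence: information in the candidate title itself, its similarity to the body, and two word lists. The record does not give the paper's actual feature set, so the following is only a minimal sketch of how such features might be computed for one candidate line of plain text. The function names, the character-bigram cosine similarity, and both word lists are assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter

# Hypothetical cue lists. The paper's real lists are not part of this record;
# these entries are illustrative guesses only.
TITLE_ONLY_WORDS = {"初探", "刍议"}
NEVER_TITLE_WORDS = {"我们", "因此"}


def char_bigrams(text):
    """Count character bigrams, ignoring whitespace (no word segmentation)."""
    chars = [c for c in text if not c.isspace()]
    return Counter(a + b for a, b in zip(chars, chars[1:]))


def cosine(a, b):
    """Cosine similarity of two bigram count vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def candidate_features(line, body):
    """Build an assumed four-dimensional feature vector for one candidate line."""
    return [
        float(len(line)),                                   # length in characters
        cosine(char_bigrams(line), char_bigrams(body)),     # similarity to the body
        float(any(w in line for w in TITLE_ONLY_WORDS)),    # contains a title-only cue
        float(any(w in line for w in NEVER_TITLE_WORDS)),   # contains a never-in-title cue
    ]


if __name__ == "__main__":
    body = "我们提出一种基于支持向量机的标题抽取方法。"
    for line in ["基于支持向量机的标题抽取初探", "我们因此停止"]:
        print(line, candidate_features(line, body))
```

Vectors of this kind, labelled as title or non-title, could then be passed to any support vector machine implementation (for example scikit-learn's `sklearn.svm.SVC`) for training. Which kernel and features the paper actually uses is not stated in this record.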

Bibliographic details
Main authors: Zhengcao Zhang, Maosong Sun, Shaoming Liu
Format: Conference proceedings
Language: eng
Subjects:
Online access: Order full text
container_end_page 558
container_issue
container_start_page 553
container_title
container_volume
creator Zhengcao Zhang
Maosong Sun
Shaoming Liu
description In this paper, a content-based and domain-independent method for automatically extracting titles from Chinese research papers is proposed. The information contained in the title itself and the similarity between the title and the body of the paper are exploited, under the condition that the experiment is carried out on plain text in which no format information, such as font, is used. A list of words used only in Chinese titles and a list of words never used in Chinese titles are further collected to facilitate title extraction. We use a support vector machine classifier to perform robust and more adaptable automatic title extraction. The method achieves good performance on a test set of 2438 research papers covering almost all academic disciplines.
doi_str_mv 10.1109/NLPKE.2005.1598799
format Conference Proceeding
fullrecord <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_1598799</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>1598799</ieee_id><sourcerecordid>1598799</sourcerecordid><originalsourceid>FETCH-LOGICAL-i175t-fbcb17ea406253e4fb1e7cfff6178ce8b761f663dc2bbde5d1b1029819f486d03</originalsourceid><addsrcrecordid>eNotkMtKxDAYhQMiKGNfQDd5gdb8TZM0y6GMFyzqQlcuhiT9o5HphSYVfXsrzuGDs_k4i0PIJbACgOnrx_b5YVeUjIkChK6V1ick06pmK1xzCeKMZDF-sjVcCynlOXnbLmnsTQqOunFIOCRqTcSOppAOSPE7zcalMA7UjzNtPsKAEWk3uqVf3UiXGIZ3GpdpGudEv9ClVeuN-xMvyKk3h4jZsTfk9Wb30tzl7dPtfbNt8wBKpNxbZ0GhqZgsBcfKW0DlvPcSVO2wtkqCl5J3rrS2Q9GBBVbqGrSvatkxviFX_7sBEffTHHoz_-yPF_BfeVRUew</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Automatic content based title extraction for Chinese documents using support vector machine</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Zhengcao Zhang ; Maosong Sun ; Shaoming Liu</creator><creatorcontrib>Zhengcao Zhang ; Maosong Sun ; Shaoming Liu</creatorcontrib><description>In this paper, a content-based and domain-independent method for automatically extracting titles from Chinese research papers is proposed. The information contained in the title itself and the similarity between the title and the body of the paper is exploited, under the condition that the experiment is carried out on plain texts in which no any format information such as font is used. A list of words only used in Chinese titles and a list of words never used in Chinese titles are further collected to facilitate the title extraction. We use the support vector machine classifier to perform a robust and more adaptable automatic title extraction. 
The method achieves good performance on a test set consisting of 2438 research papers which cover almost all of the academic disciplines.</description><identifier>ISBN: 9780780393615</identifier><identifier>ISBN: 0780393619</identifier><identifier>DOI: 10.1109/NLPKE.2005.1598799</identifier><language>eng</language><publisher>IEEE</publisher><subject>Computer science ; Data mining ; Intelligent systems ; Internet ; Laboratories ; Machine intelligence ; Robustness ; Sun ; Support vector machine classification ; Support vector machines</subject><ispartof>2005 International Conference on Natural Language Processing and Knowledge Engineering, 2005, p.553-558</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/1598799$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,2052,4036,4037,27902,54895</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/1598799$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Zhengcao Zhang</creatorcontrib><creatorcontrib>Maosong Sun</creatorcontrib><creatorcontrib>Shaoming Liu</creatorcontrib><title>Automatic content based title extraction for Chinese documents using support vector machine</title><title>2005 International Conference on Natural Language Processing and Knowledge Engineering</title><addtitle>NLPKE</addtitle><description>In this paper, a content-based and domain-independent method for automatically extracting titles from Chinese research papers is proposed. The information contained in the title itself and the similarity between the title and the body of the paper is exploited, under the condition that the experiment is carried out on plain texts in which no any format information such as font is used. 
A list of words only used in Chinese titles and a list of words never used in Chinese titles are further collected to facilitate the title extraction. We use the support vector machine classifier to perform a robust and more adaptable automatic title extraction. The method achieves good performance on a test set consisting of 2438 research papers which cover almost all of the academic disciplines.</description><subject>Computer science</subject><subject>Data mining</subject><subject>Intelligent systems</subject><subject>Internet</subject><subject>Laboratories</subject><subject>Machine intelligence</subject><subject>Robustness</subject><subject>Sun</subject><subject>Support vector machine classification</subject><subject>Support vector machines</subject><isbn>9780780393615</isbn><isbn>0780393619</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2005</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNotkMtKxDAYhQMiKGNfQDd5gdb8TZM0y6GMFyzqQlcuhiT9o5HphSYVfXsrzuGDs_k4i0PIJbACgOnrx_b5YVeUjIkChK6V1ick06pmK1xzCeKMZDF-sjVcCynlOXnbLmnsTQqOunFIOCRqTcSOppAOSPE7zcalMA7UjzNtPsKAEWk3uqVf3UiXGIZ3GpdpGudEv9ClVeuN-xMvyKk3h4jZsTfk9Wb30tzl7dPtfbNt8wBKpNxbZ0GhqZgsBcfKW0DlvPcSVO2wtkqCl5J3rrS2Q9GBBVbqGrSvatkxviFX_7sBEffTHHoz_-yPF_BfeVRUew</recordid><startdate>2005</startdate><enddate>2005</enddate><creator>Zhengcao Zhang</creator><creator>Maosong Sun</creator><creator>Shaoming Liu</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>2005</creationdate><title>Automatic content based title extraction for Chinese documents using support vector machine</title><author>Zhengcao Zhang ; Maosong Sun ; Shaoming 
Liu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i175t-fbcb17ea406253e4fb1e7cfff6178ce8b761f663dc2bbde5d1b1029819f486d03</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2005</creationdate><topic>Computer science</topic><topic>Data mining</topic><topic>Intelligent systems</topic><topic>Internet</topic><topic>Laboratories</topic><topic>Machine intelligence</topic><topic>Robustness</topic><topic>Sun</topic><topic>Support vector machine classification</topic><topic>Support vector machines</topic><toplevel>online_resources</toplevel><creatorcontrib>Zhengcao Zhang</creatorcontrib><creatorcontrib>Maosong Sun</creatorcontrib><creatorcontrib>Shaoming Liu</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zhengcao Zhang</au><au>Maosong Sun</au><au>Shaoming Liu</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Automatic content based title extraction for Chinese documents using support vector machine</atitle><btitle>2005 International Conference on Natural Language Processing and Knowledge Engineering</btitle><stitle>NLPKE</stitle><date>2005</date><risdate>2005</risdate><spage>553</spage><epage>558</epage><pages>553-558</pages><isbn>9780780393615</isbn><isbn>0780393619</isbn><abstract>In this paper, a content-based and domain-independent method for automatically extracting titles from Chinese research papers is proposed. 
The information contained in the title itself and the similarity between the title and the body of the paper is exploited, under the condition that the experiment is carried out on plain texts in which no any format information such as font is used. A list of words only used in Chinese titles and a list of words never used in Chinese titles are further collected to facilitate the title extraction. We use the support vector machine classifier to perform a robust and more adaptable automatic title extraction. The method achieves good performance on a test set consisting of 2438 research papers which cover almost all of the academic disciplines.</abstract><pub>IEEE</pub><doi>10.1109/NLPKE.2005.1598799</doi><tpages>6</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISBN: 9780780393615
ispartof 2005 International Conference on Natural Language Processing and Knowledge Engineering, 2005, p.553-558
issn
language eng
recordid cdi_ieee_primary_1598799
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Computer science
Data mining
Intelligent systems
Internet
Laboratories
Machine intelligence
Robustness
Sun
Support vector machine classification
Support vector machines
title Automatic content based title extraction for Chinese documents using support vector machine
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T23%3A36%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Automatic%20content%20based%20title%20extraction%20for%20Chinese%20documents%20using%20support%20vector%20machine&rft.btitle=2005%20International%20Conference%20on%20Natural%20Language%20Processing%20and%20Knowledge%20Engineering&rft.au=Zhengcao%20Zhang&rft.date=2005&rft.spage=553&rft.epage=558&rft.pages=553-558&rft.isbn=9780780393615&rft.isbn_list=0780393619&rft_id=info:doi/10.1109/NLPKE.2005.1598799&rft_dat=%3Cieee_6IE%3E1598799%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=1598799&rfr_iscdi=true