Automatic content based title extraction for Chinese documents using support vector machine
In this paper, a content-based and domain-independent method for automatically extracting titles from Chinese research papers is proposed. The information contained in the title itself and the similarity between the title and the body of the paper are exploited, under the condition that the experimen...
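Main authors: | Zhengcao Zhang, Maosong Sun, Shaoming Liu |
---|---|
Format: | Conference proceeding |
Language: | eng |
Subjects: | Computer science ; Data mining ; Intelligent systems ; Internet ; Laboratories ; Machine intelligence ; Robustness ; Sun ; Support vector machine classification ; Support vector machines |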
container_end_page | 558 |
---|---|
container_issue | |
container_start_page | 553 |
container_title | |
container_volume | |
creator | Zhengcao Zhang ; Maosong Sun ; Shaoming Liu |
description | In this paper, a content-based and domain-independent method for automatically extracting titles from Chinese research papers is proposed. The information contained in the title itself and the similarity between the title and the body of the paper are exploited, under the condition that the experiment is carried out on plain text in which no format information, such as font, is used. A list of words used only in Chinese titles and a list of words never used in Chinese titles are further collected to facilitate title extraction. We use a support vector machine classifier to perform robust and more adaptable automatic title extraction. The method achieves good performance on a test set of 2438 research papers covering almost all academic disciplines. |
doi_str_mv | 10.1109/NLPKE.2005.1598799 |
format | Conference Proceeding |
fulltext | fulltext_linktorsrc |
identifier | ISBN: 9780780393615 |
ispartof | 2005 International Conference on Natural Language Processing and Knowledge Engineering, 2005, p.553-558 |
issn | |
language | eng |
recordid | cdi_ieee_primary_1598799 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Computer science ; Data mining ; Intelligent systems ; Internet ; Laboratories ; Machine intelligence ; Robustness ; Sun ; Support vector machine classification ; Support vector machines |
title | Automatic content based title extraction for Chinese documents using support vector machine |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T23%3A36%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Automatic%20content%20based%20title%20extraction%20for%20Chinese%20documents%20using%20support%20vector%20machine&rft.btitle=2005%20International%20Conference%20on%20Natural%20Language%20Processing%20and%20Knowledge%20Engineering&rft.au=Zhengcao%20Zhang&rft.date=2005&rft.spage=553&rft.epage=558&rft.pages=553-558&rft.isbn=9780780393615&rft.isbn_list=0780393619&rft_id=info:doi/10.1109/NLPKE.2005.1598799&rft_dat=%3Cieee_6IE%3E1598799%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=1598799&rfr_iscdi=true |
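
The abstract names three kinds of evidence for deciding whether a line of plain text is the title: words used only in titles, words never used in titles, and the similarity between the candidate line and the body. The record does not give the paper's actual feature definitions or word lists, so the following is only a minimal sketch under stated assumptions: the two word sets are hypothetical placeholders, character bigrams stand in for real Chinese word segmentation, and cosine similarity is one plausible similarity measure. The resulting feature vectors could then be passed to a support vector machine classifier.

```python
import math
from collections import Counter

# Hypothetical placeholder lists; the paper's real lists are not in this record.
TITLE_ONLY_WORDS = {"初探", "浅谈"}
NEVER_TITLE_WORDS = {"我们", "因此"}


def char_bigrams(text):
    """Split text into overlapping character bigrams (assumed stand-in for word segmentation)."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


def cosine(a, b):
    """Cosine similarity between two bigram count vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def line_features(line, body):
    """Build an illustrative feature vector for one candidate title line."""
    return [
        float(any(w in line for w in TITLE_ONLY_WORDS)),   # contains a title-only word
        float(any(w in line for w in NEVER_TITLE_WORDS)),  # contains a never-in-title word
        cosine(Counter(char_bigrams(line)), Counter(char_bigrams(body))),  # line/body similarity
    ]


if __name__ == "__main__":
    title = "基于支持向量机的标题抽取初探"
    body = "我们因此使用支持向量机进行标题抽取。"
    print(line_features(title, body))
```

Only plain text is used here, matching the abstract's condition that no format information such as font is available.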