Citation recognition for scientific publications in digital libraries

A method based on part-of-speech (PoS) tagging is used to recognize the structure of bibliographic references. This method operates on a roughly structured ASCII file produced by OCR. Because reference structures are heterogeneous, the method works bottom-up, without an a priori model, gathering st...

Bibliographic details
Main authors: Besagni, D., Belaid, A.
Format: Conference proceeding
Language: eng
Subjects:
Online access: Order full text
container_end_page 252
container_issue
container_start_page 244
container_title
container_volume
creator Besagni, D.
Belaid, A.
description A method based on part-of-speech (PoS) tagging is used to recognize the structure of bibliographic references. This method operates on a roughly structured ASCII file produced by OCR. Because reference structures are heterogeneous, the method works bottom-up, without an a priori model, gathering structural elements from basic tags to subfields and fields. Significant tags are first grouped into homogeneous classes according to their categories and then reduced to canonical forms corresponding to record fields: "authors", "title", "conference name", "date", etc. Unlabeled tokens are integrated into one field or another either by applying PoS correction rules or by using an inter- or intra-field model generated from well-detected records. The prototype performs satisfactorily across different record layouts and character recognition qualities. Without manual intervention, 96.6% of words are correctly attributed, and about 75.9% of the 2,575 references are completely segmented.
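The bottom-up idea in the description can be illustrated with a toy sketch. This is not the authors' system: the coarse tag set (`YEAR`, `INITIAL`, `CAP`, `WORD`, `PUNCT`), the single correction rule, and the fallback that sends unlabeled tokens to "title" are all assumptions made for illustration, since the abstract does not list the actual tags or rules.

```python
import re

def tokenize(reference):
    # Initials such as "D." stay whole; other words and punctuation are split.
    return re.findall(r"[A-Z]\.|\w+|[^\w\s]", reference)

def tag_token(token):
    # Hypothetical coarse tags standing in for real part-of-speech tags.
    if re.fullmatch(r"(19|20)\d{2}", token):
        return "YEAR"
    if re.fullmatch(r"[A-Z]\.", token):
        return "INITIAL"
    if re.fullmatch(r"[^\w\s]", token):
        return "PUNCT"
    if token[0].isupper():
        return "CAP"
    return "WORD"

# "Significant" tags that directly indicate a record field (assumed mapping).
SEED_FIELDS = {"INITIAL": "authors", "YEAR": "date"}

def segment(reference):
    tokens = tokenize(reference)
    tags = [tag_token(t) for t in tokens]
    # Step 1: label tokens whose tag is significant; others stay unlabeled (None).
    fields = [SEED_FIELDS.get(tag) for tag in tags]
    # Step 2: assumed correction rule: a capitalized word followed by
    # a comma and an initial is treated as an author surname.
    for i in range(len(tokens) - 2):
        if tags[i] == "CAP" and tokens[i + 1] == "," and tags[i + 2] == "INITIAL":
            fields[i] = "authors"
    # Step 3: gather tokens into fields; unlabeled tokens fall back to "title".
    record = {}
    for token, tag, field in zip(tokens, tags, fields):
        if tag == "PUNCT":
            continue
        record.setdefault(field or "title", []).append(token)
    return {name: " ".join(parts) for name, parts in record.items()}
```

Calling `segment("Besagni, D., Belaid, A. Citation recognition for scientific publications. 2004")` groups the surnames and initials under "authors", the year under "date", and the remaining words under "title". A real system would need far richer tags and rules to cope with OCR noise and varied layouts.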
doi_str_mv 10.1109/DIAL.2004.1263253
format Conference Proceeding
publisher IEEE
isbn 076952088X
9780769520889
creationdate 2004
tpages 9
rights Attribution
oa free_for_read
orcidid https://orcid.org/0000-0001-9639-7281
sourcerecordid oai_HAL_inria_00100181v1
linktohtml https://ieeexplore.ieee.org/document/1263253
backlink https://inria.hal.science/inria-00100181
fulltext fulltext_linktorsrc
identifier ISBN: 076952088X
ispartof First International Workshop on Document Image Analysis for Libraries, 2004. Proceedings, 2004, p.244-252
issn
language eng
recordid cdi_ieee_primary_1263253
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Bibliometrics
Character recognition
Computer Science
Information analysis
Optical character recognition software
Other
Production
Prototypes
Software libraries
Tagging
Turning
Watches
title Citation recognition for scientific publications in digital libraries
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-01T09%3A45%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-hal_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Citation%20recognition%20for%20scientific%20publications%20in%20digital%20libraries&rft.btitle=First%20International%20Workshop%20on%20Document%20Image%20Analysis%20for%20Libraries,%202004.%20Proceedings&rft.au=Besagni,%20D.&rft.date=2004&rft.spage=244&rft.epage=252&rft.pages=244-252&rft.isbn=076952088X&rft.isbn_list=9780769520889&rft_id=info:doi/10.1109/DIAL.2004.1263253&rft_dat=%3Chal_6IE%3Eoai_HAL_inria_00100181v1%3C/hal_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=1263253&rfr_iscdi=true
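The resolver link above carries the citation as key/value pairs in its query string, with referent metadata under keys prefixed `rft.`. As a minimal sketch, the snippet below decodes those pairs with Python's standard `urllib.parse`; the helper name `referent_fields` is hypothetical, and the URL is a shortened excerpt of the link in this record, not the full original.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened excerpt of the resolver link above (most parameters omitted).
url = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=proceeding"
    "&rft.atitle=Citation%20recognition%20for%20scientific%20publications"
    "%20in%20digital%20libraries"
    "&rft.au=Besagni,%20D.&rft.date=2004&rft.spage=244&rft.epage=252"
    "&rft_id=info:doi/10.1109/DIAL.2004.1263253"
)

def referent_fields(openurl):
    """Return the percent-decoded 'rft.' fields of an OpenURL query string."""
    query = parse_qs(urlsplit(openurl).query)
    prefix = "rft."
    # parse_qs returns a list per key; keep the first value of each field.
    return {key[len(prefix):]: values[0]
            for key, values in query.items()
            if key.startswith(prefix)}

fields = referent_fields(url)
print(fields["atitle"], fields["spage"], fields["epage"])
```

Keys such as `rft_id` use an underscore rather than a dot and are therefore skipped by this helper; they would need separate handling if the DOI is wanted.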