Indexing Names of Persons in a Large Dataset of a Newspaper

An index is a very good tool for finding the necessary information from a set of documents. So far, the extant index tools in both the printed and digital newspaper versions are not sufficient to help users find information. Users must browse the entire newspaper to fulfill their needs or discover l...

Detailed description

Bibliographic details
Main authors:	Pirovani, Juliana P. C.; Nogueira, Matheus; de Oliveira, Elias
Format:	Conference proceeding
Language:	eng
Subjects:	Computation and Language; Computer Science; Gold Collection (GC); Local Grammars (LG); NER System; Newspaper Pages; Personal Names
Online access:	Full text

container_end_page	155
container_issue
container_start_page	147
container_title
container_volume	11122
creator	Pirovani, Juliana P. C. ; Nogueira, Matheus ; de Oliveira, Elias
description	An index is a very good tool for finding the necessary information from a set of documents. So far, the extant index tools in both the printed and digital newspaper versions are not sufficient to help users find information. Users must browse the entire newspaper to fulfill their needs or discover later on, after spending a considerable amount of energy, that the information they had been seeking is not available. We propose here to use state-of-the-art strategies for extracting named entities specifically for person names and, with an index of names, provide the user with an important tool to find names within newspaper pages. The state-of-the-art system considered used the Golden Collection of the First and Second HAREM, a reference for Named Entity Recognition systems in Portuguese, as training and test sets respectively. Furthermore, we created a new training dataset from the actual newspaper’s articles. In this case, we processed 100 articles of the newspaper and managed to correctly find 87.0% of the extant names and their respective partial citations.
doi_str_mv	10.1007/978-3-319-99722-3_15
format	Conference Proceeding
fullrecord	<record><control><sourceid>proquest_hal_p</sourceid><recordid>TN_cdi_hal_primary_oai_HAL_hal_04026626v1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>EBC6307020_140_160</sourcerecordid><originalsourceid>FETCH-LOGICAL-h277t-6c7354832a0bd107b431cce53d9caf9be4bca4d0ea4a2b56f754848e357042be3</originalsourceid><addsrcrecordid>eNo1kE1P3DAQhl2gqMt2_wGHXDkYxh7HjtUTAlpWWgEHKnGzJtnZj7KbpHH46L-vswuH0UjvPO8cHiFOFZwrAHfhXSFRovLSe6e1xKDyL-IEU7ILng7ESFmlJKLxh2KS-M-bskdiBAhaemfwWJwoyHVRWA_um5jE-AcANBTe5mYkfkzrOb-v62V2R1uOWbPIHriLTR2zdZ1RNqNuydk19RS5H66U3fFbbKnl7rv4uqBN5MnHHovfP28er27l7P7X9OpyJlfauV7aymFuCtQE5VyBKw2qquIc576ihS_ZlBWZOTAZ0mVuFy7RpmDMHRhdMo7F2f7vijah7dZb6v6Fhtbh9nIWhgwMaGu1fVWJ1Xs2JrBechfKpnmOQUEYvIbkKWBIpsJOYxi8ppLZl9qu-fvCsQ88tCqu-4421YraPjkJFsElc0GZNBbwPy99dAE</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype><pqid>EBC6307020_140_160</pqid></control><display><type>conference_proceeding</type><title>Indexing Names of Persons in a Large Dataset of a Newspaper</title><source>Springer Books</source><creator>Pirovani, Juliana P. C. ; Nogueira, Matheus ; de Oliveira, Elias</creator><contributor>Gonçalo Oliveira, Hugo ; Villavicencio, Aline ; Gamallo, Pablo ; Abad, Alberto ; Paetzold, Gustavo Henrique ; Moreira, Viviane ; Ramisch, Carlos ; Caseli, Helena ; Abad, Alberto ; Gonçalo Oliveira, Hugo ; Caseli, Helena ; Villavicencio, Aline ; Moreira, Viviane ; Paetzold, Gustavo Henrique ; Gamallo, Pablo ; Ramisch, Carlos</contributor><creatorcontrib>Pirovani, Juliana P. C. 
; Nogueira, Matheus ; de Oliveira, Elias ; Gonçalo Oliveira, Hugo ; Villavicencio, Aline ; Gamallo, Pablo ; Abad, Alberto ; Paetzold, Gustavo Henrique ; Moreira, Viviane ; Ramisch, Carlos ; Caseli, Helena ; Abad, Alberto ; Gonçalo Oliveira, Hugo ; Caseli, Helena ; Villavicencio, Aline ; Moreira, Viviane ; Paetzold, Gustavo Henrique ; Gamallo, Pablo ; Ramisch, Carlos</creatorcontrib><description>An index is a very good tool for finding the necessary information from a set of documents. So far, the extant index tools in both the printed and digital newspaper versions are not sufficient to help users find information. Users must browse the entire newspaper to fulfill their needs or discover later on, after spending a considerable amount of energy, that the information they had been seeking is not available. We propose here to use state-of-the-art strategies for extracting named entities specifically for person names and, with an index of names, provide the user with an important tool to find names within newspaper pages. The state-of-the-art system considered used the Golden Collection of the First and Second HAREM, a reference for Named Entity Recognition systems in Portuguese, as training and test sets respectively. Furthermore, we created a new training dataset from the actual newspaper’s articles. 
In this case, we processed 100 articles of the newspaper and managed to correctly find 87.0% of the extant names and their respective partial citations.</description><identifier>ISSN: 0302-9743</identifier><identifier>ISBN: 9783319997216</identifier><identifier>ISBN: 3319997211</identifier><identifier>ISBN: 331999722X</identifier><identifier>ISBN: 9783319997223</identifier><identifier>EISSN: 1611-3349</identifier><identifier>EISBN: 331999722X</identifier><identifier>EISBN: 9783319997223</identifier><identifier>DOI: 10.1007/978-3-319-99722-3_15</identifier><identifier>OCLC: 1052886907</identifier><identifier>LCCallNum: Q334-342</identifier><language>eng</language><publisher>Switzerland: Springer International Publishing AG</publisher><subject>Computation and Language ; Computer Science ; Gold Collection (GC) ; Local Grammars (LG) ; NER System ; Newspaper Pages ; Personal Names</subject><ispartof>Computational Processing of the Portuguese Language, 2018, Vol.11122, p.147-155</ispartof><rights>Springer Nature Switzerland AG 2018</rights><rights>Distributed under a Creative Commons Attribution 4.0 International License</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0003-2066-7980 ; 0000-0002-4157-6503</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Uhttps://ebookcentral.proquest.com/covers/6307020-l.jpg</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/978-3-319-99722-3_15$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/978-3-319-99722-3_15$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>230,309,779,780,784,789,793,885,27925,38255,41442,42511</link.rule.ids><backlink>$$Uhttps://hal.science/hal-04026626$$DView record in HAL$$Hfree_for_read</backlink></links><search><contributor>Gonçalo Oliveira, Hugo</contributor><contributor>Villavicencio, 
Aline</contributor><contributor>Gamallo, Pablo</contributor><contributor>Abad, Alberto</contributor><contributor>Paetzold, Gustavo Henrique</contributor><contributor>Moreira, Viviane</contributor><contributor>Ramisch, Carlos</contributor><contributor>Caseli, Helena</contributor><contributor>Abad, Alberto</contributor><contributor>Gonçalo Oliveira, Hugo</contributor><contributor>Caseli, Helena</contributor><contributor>Villavicencio, Aline</contributor><contributor>Moreira, Viviane</contributor><contributor>Paetzold, Gustavo Henrique</contributor><contributor>Gamallo, Pablo</contributor><contributor>Ramisch, Carlos</contributor><creatorcontrib>Pirovani, Juliana P. C.</creatorcontrib><creatorcontrib>Nogueira, Matheus</creatorcontrib><creatorcontrib>de Oliveira, Elias</creatorcontrib><title>Indexing Names of Persons in a Large Dataset of a Newspaper</title><title>Computational Processing of the Portuguese Language</title><description>An index is a very good tool for finding the necessary information from a set of documents. So far, the extant index tools in both the printed and digital newspaper versions are not sufficient to help users find information. Users must browse the entire newspaper to fulfill their needs or discover later on, after spending a considerable amount of energy, that the information they had been seeking is not available. We propose here to use state-of-the-art strategies for extracting named entities specifically for person names and, with an index of names, provide the user with an important tool to find names within newspaper pages. The state-of-the-art system considered used the Golden Collection of the First and Second HAREM, a reference for Named Entity Recognition systems in Portuguese, as training and test sets respectively. Furthermore, we created a new training dataset from the actual newspaper’s articles. 
In this case, we processed 100 articles of the newspaper and managed to correctly find 87.0% of the extant names and their respective partial citations.</description><subject>Computation and Language</subject><subject>Computer Science</subject><subject>Gold Collection (GC)</subject><subject>Local Grammars (LG)</subject><subject>NER System</subject><subject>Newspaper Pages</subject><subject>Personal Names</subject><issn>0302-9743</issn><issn>1611-3349</issn><isbn>9783319997216</isbn><isbn>3319997211</isbn><isbn>331999722X</isbn><isbn>9783319997223</isbn><isbn>331999722X</isbn><isbn>9783319997223</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2018</creationdate><recordtype>conference_proceeding</recordtype><recordid>eNo1kE1P3DAQhl2gqMt2_wGHXDkYxh7HjtUTAlpWWgEHKnGzJtnZj7KbpHH46L-vswuH0UjvPO8cHiFOFZwrAHfhXSFRovLSe6e1xKDyL-IEU7ILng7ESFmlJKLxh2KS-M-bskdiBAhaemfwWJwoyHVRWA_um5jE-AcANBTe5mYkfkzrOb-v62V2R1uOWbPIHriLTR2zdZ1RNqNuydk19RS5H66U3fFbbKnl7rv4uqBN5MnHHovfP28er27l7P7X9OpyJlfauV7aymFuCtQE5VyBKw2qquIc576ihS_ZlBWZOTAZ0mVuFy7RpmDMHRhdMo7F2f7vijah7dZb6v6Fhtbh9nIWhgwMaGu1fVWJ1Xs2JrBechfKpnmOQUEYvIbkKWBIpsJOYxi8ppLZl9qu-fvCsQ88tCqu-4421YraPjkJFsElc0GZNBbwPy99dAE</recordid><startdate>2018</startdate><enddate>2018</enddate><creator>Pirovani, Juliana P. C.</creator><creator>Nogueira, Matheus</creator><creator>de Oliveira, Elias</creator><general>Springer International Publishing AG</general><general>Springer International Publishing</general><scope>FFUUA</scope><scope>1XC</scope><orcidid>https://orcid.org/0000-0003-2066-7980</orcidid><orcidid>https://orcid.org/0000-0002-4157-6503</orcidid></search><sort><creationdate>2018</creationdate><title>Indexing Names of Persons in a Large Dataset of a Newspaper</title><author>Pirovani, Juliana P. C. 
; Nogueira, Matheus ; de Oliveira, Elias</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-h277t-6c7354832a0bd107b431cce53d9caf9be4bca4d0ea4a2b56f754848e357042be3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Computation and Language</topic><topic>Computer Science</topic><topic>Gold Collection (GC)</topic><topic>Local Grammars (LG)</topic><topic>NER System</topic><topic>Newspaper Pages</topic><topic>Personal Names</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Pirovani, Juliana P. C.</creatorcontrib><creatorcontrib>Nogueira, Matheus</creatorcontrib><creatorcontrib>de Oliveira, Elias</creatorcontrib><collection>ProQuest Ebook Central - Book Chapters - Demo use only</collection><collection>Hyper Article en Ligne (HAL)</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Pirovani, Juliana P. 
C.</au><au>Nogueira, Matheus</au><au>de Oliveira, Elias</au><au>Gonçalo Oliveira, Hugo</au><au>Villavicencio, Aline</au><au>Gamallo, Pablo</au><au>Abad, Alberto</au><au>Paetzold, Gustavo Henrique</au><au>Moreira, Viviane</au><au>Ramisch, Carlos</au><au>Caseli, Helena</au><au>Abad, Alberto</au><au>Gonçalo Oliveira, Hugo</au><au>Caseli, Helena</au><au>Villavicencio, Aline</au><au>Moreira, Viviane</au><au>Paetzold, Gustavo Henrique</au><au>Gamallo, Pablo</au><au>Ramisch, Carlos</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Indexing Names of Persons in a Large Dataset of a Newspaper</atitle><btitle>Computational Processing of the Portuguese Language</btitle><date>2018</date><risdate>2018</risdate><volume>11122</volume><spage>147</spage><epage>155</epage><pages>147-155</pages><issn>0302-9743</issn><eissn>1611-3349</eissn><isbn>9783319997216</isbn><isbn>3319997211</isbn><isbn>331999722X</isbn><isbn>9783319997223</isbn><eisbn>331999722X</eisbn><eisbn>9783319997223</eisbn><abstract>An index is a very good tool for finding the necessary information from a set of documents. So far, the extant index tools in both the printed and digital newspaper versions are not sufficient to help users find information. Users must browse the entire newspaper to fulfill their needs or discover later on, after spending a considerable amount of energy, that the information they had been seeking is not available. We propose here to use state-of-the-art strategies for extracting named entities specifically for person names and, with an index of names, provide the user with an important tool to find names within newspaper pages. The state-of-the-art system considered used the Golden Collection of the First and Second HAREM, a reference for Named Entity Recognition systems in Portuguese, as training and test sets respectively. Furthermore, we created a new training dataset from the actual newspaper’s articles. 
In this case, we processed 100 articles of the newspaper and managed to correctly find 87.0% of the extant names and their respective partial citations.</abstract><cop>Switzerland</cop><pub>Springer International Publishing AG</pub><doi>10.1007/978-3-319-99722-3_15</doi><oclcid>1052886907</oclcid><tpages>9</tpages><orcidid>https://orcid.org/0000-0003-2066-7980</orcidid><orcidid>https://orcid.org/0000-0002-4157-6503</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 0302-9743
ispartof	Computational Processing of the Portuguese Language, 2018, Vol.11122, p.147-155
issn	0302-9743 1611-3349
language	eng
recordid	cdi_hal_primary_oai_HAL_hal_04026626v1
source	Springer Books
subjects	Computation and Language ; Computer Science ; Gold Collection (GC) ; Local Grammars (LG) ; NER System ; Newspaper Pages ; Personal Names
title	Indexing Names of Persons in a Large Dataset of a Newspaper
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-23T21%3A23%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_hal_p&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Indexing%20Names%20of%20Persons%20in%20a%20Large%20Dataset%20of%20a%20Newspaper&rft.btitle=Computational%20Processing%20of%20the%20Portuguese%20Language&rft.au=Pirovani,%20Juliana%20P.%20C.&rft.date=2018&rft.volume=11122&rft.spage=147&rft.epage=155&rft.pages=147-155&rft.issn=0302-9743&rft.eissn=1611-3349&rft.isbn=9783319997216&rft.isbn_list=3319997211&rft.isbn_list=331999722X&rft.isbn_list=9783319997223&rft_id=info:doi/10.1007/978-3-319-99722-3_15&rft_dat=%3Cproquest_hal_p%3EEBC6307020_140_160%3C/proquest_hal_p%3E%3Curl%3E%3C/url%3E&rft.eisbn=331999722X&rft.eisbn_list=9783319997223&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=EBC6307020_140_160&rft_id=info:pmid/&rfr_iscdi=true