NAMED ENTITIES DISTRIBUTION IN NEWSPAPER ARTICLES

This paper focuses on the distribution of named entities in texts for e-learning. Named Entity Recognition (NER) is an important topic in natural language processing and is essential for understanding text automatically. In this article we experimented...

Detailed description

Bibliographic details
Published in:	eLearning and Software for Education 2016, Vol.12 (1), p.231-238
Main authors:	Matei, Liviu Sebastian; Trăuşan-Matu, Ştefan
Format:	Article
Language:	eng
Subjects:	ICT Information and Communications Technologies; Media studies; Sociology of Education
Online access:	Full text

container_end_page	238
container_issue	1
container_start_page	231
container_title	eLearning and Software for Education
container_volume	12
creator	Matei, Liviu Sebastian; Trăuşan-Matu, Ştefan
description	This paper focuses on the distribution of named entities in texts for e-learning. Named Entity Recognition (NER) is an important topic in natural language processing and is essential for understanding text automatically. In this article we tested our approach on newspaper articles, but the same technique can be applied to texts written by students, to documentation, or to other sources of knowledge used in e-learning. The analysis covers 19,043 Reuters newspaper articles and three types of named entities: person names, locations and organizations. The named entities were extracted with the Stanford NER software. The statistics we compute include the average number of named entities per sentence, defined as the number of distinct named entities in an article divided by its number of sentences, and the average position of named entities in the text. For each type of named entity we also compute its distribution within the sentence and plot the result using cubic spline interpolation. From the positions of named entities in the text we infer a function, based on the Poisson distribution, that models the distribution of named entities. These statistics can be used in different areas of NLP. For example, if named entities are known to occur more often in certain areas of a text, a statistical NER system can give those areas higher priority. Another use is automatic text summarization. Some summarization approaches always include the first sentence of the text. Because named entities are an essential factor in automatic summarization, knowing their typical distribution would let a summarizer select the sentences that are most likely to contain them.
This matters for e-learning because it makes topics and important sentences in learning materials easier to identify, so users can acquire knowledge more easily or decide which knowledge source is relevant for them.
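The per-article statistics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, which this record does not show. It assumes the named entities have already been extracted (for example with Stanford NER) and are supplied as one list of (entity_text, entity_type) tuples per sentence; the function name `entity_stats`, that input shape, and the choice to measure position as sentence index divided by sentence count are all hypothetical. The cubic spline plot and the Poisson-based model are not covered here.

```python
from collections import defaultdict


def entity_stats(sentences):
    """Compute simple named-entity statistics for one article.

    `sentences` holds one item per sentence; each item is a list of
    (entity_text, entity_type) tuples. This input shape is an assumption
    made for illustration, not something taken from the paper.
    """
    n_sentences = len(sentences)
    if n_sentences == 0:
        return 0.0, {}

    distinct = set()
    positions = defaultdict(list)
    for index, entities in enumerate(sentences):
        for text, entity_type in entities:
            distinct.add((text, entity_type))
            # Relative position of the mention; 0.0 is the first sentence.
            positions[entity_type].append(index / n_sentences)

    # Distinct named entities divided by the number of sentences.
    density = len(distinct) / n_sentences
    # Average relative position of the mentions of each entity type.
    mean_position = {t: sum(p) / len(p) for t, p in positions.items()}
    return density, mean_position


# Made-up example article with four sentences.
article = [
    [("Reuters", "ORGANIZATION"), ("London", "LOCATION")],
    [],
    [("London", "LOCATION")],
    [("John Smith", "PERSON")],
]
density, mean_position = entity_stats(article)
print(density)
print(mean_position)
```

Averaging these values over a corpus would give corpus-level figures of the kind the abstract describes; binning the relative positions instead of averaging them would give an empirical distribution that could then be smoothed or fitted.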
format	Article
fullrecord	<record><control><sourceid>ceeol</sourceid><recordid>TN_cdi_ceeol_journals_522410</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ceeol_id>522410</ceeol_id><sourcerecordid>522410</sourcerecordid><originalsourceid>FETCH-ceeol_journals_5224103</originalsourceid><addsrcrecordid>eNpjYuA0MjAz07WwMDJkgbINjMwiOBh4i4uzDIDA0sTQwNSEk8HQz9HX1UXB1S_EM8TTNVjBxTM4JMjTKTTE099PwdNPwc81PDjAMcA1SMExKMTT2cc1mIeBNS0xpziVF0pzM8i4uYY4e-gmp6bm58Rn5ZcW5QHF402NjIB2GBOQBgAnziv0</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>NAMED ENTITIES DISTRIBUTION IN NEWSPAPER ARTICLES</title><source>Education Source (EBSCOhost)</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>Matei, Liviu Sebastian ; Trăuşan-Matu, Ştefan</creator><creatorcontrib>Matei, Liviu Sebastian ; Trăuşan-Matu, Ştefan</creatorcontrib><description>This paper is focused on the distribution of named entities inside texts for e-learning. An important topic in natural language processing is represented by Named Entities Recognition (NER), which is essential in order to be able to automatically understand the text. In this article we experimented our approach with newspaper articles, but the same technique can be applied to texts written by students, documentation or different other sources of knowledge used in e-learning. In order to perform this analysis we considered a number of 19043 Reuters newspaper articles. The following types of named entities were considered: person names, locations and organizations. For the extraction of the named entities we used Stanford NER software. Among the statistics that we are computing are the average number of sentences per named entity, which is defined as the number of distinct named entities from an article divided by the number of sentences and the average position in the text. 
We are also computing for each type of named entity the distribution in the sentence and we are representing graphically the result by using cubic spline interpolation. Using the position of named entities in the text we will infer a function based on Poisson’s distribution that models the distribution of named entities. These statistics can be used afterwards in different areas of NLP. For example, if we know that usually named entities are found more often in certain areas of the text, a statistical NER can give a higher priority to those. Another example of usage for such a statistic is the case of automatic text summarization. Some summarization approaches consider always the first sentence of the text. Due to the fact that named entities are an essential factor for automatic summarization, if we would know the typical distribution of named entities we could consider in the summary the sentences that are more probably containing these special tokens. This is important for e-learning because it permits us to identify more easily topics or important sentences within learning materials and thus users are able to acquire knowledge more easily or to determine which knowledge source is relevant for them.</description><identifier>ISSN: 2066-026X</identifier><identifier>EISSN: 2066-8821</identifier><language>eng</language><publisher>Carol I National Defence University Publishing House</publisher><subject>ICT Information and Communications Technologies ; Media studies ; Sociology of Education</subject><ispartof>eLearning and Software for Education, 2016, Vol.12 (1), p.231-238</ispartof><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Uhttps://www.ceeol.com//api/image/getissuecoverimage?id=picture_2016_24464.png</thumbnail><link.rule.ids>314,780,784,4024</link.rule.ids></links><search><creatorcontrib>Matei, 
Liviu Sebastian</creatorcontrib><creatorcontrib>Trăuşan-Matu, Ştefan</creatorcontrib><title>NAMED ENTITIES DISTRIBUTION IN NEWSPAPER ARTICLES</title><title>eLearning and Software for Education</title><addtitle>Conference proceedings of »eLearning and Software for Education« (eLSE)</addtitle><description>This paper is focused on the distribution of named entities inside texts for e-learning. An important topic in natural language processing is represented by Named Entities Recognition (NER), which is essential in order to be able to automatically understand the text. In this article we experimented our approach with newspaper articles, but the same technique can be applied to texts written by students, documentation or different other sources of knowledge used in e-learning. In order to perform this analysis we considered a number of 19043 Reuters newspaper articles. The following types of named entities were considered: person names, locations and organizations. For the extraction of the named entities we used Stanford NER software. Among the statistics that we are computing are the average number of sentences per named entity, which is defined as the number of distinct named entities from an article divided by the number of sentences and the average position in the text. We are also computing for each type of named entity the distribution in the sentence and we are representing graphically the result by using cubic spline interpolation. Using the position of named entities in the text we will infer a function based on Poisson’s distribution that models the distribution of named entities. These statistics can be used afterwards in different areas of NLP. For example, if we know that usually named entities are found more often in certain areas of the text, a statistical NER can give a higher priority to those. Another example of usage for such a statistic is the case of automatic text summarization. 
Some summarization approaches consider always the first sentence of the text. Due to the fact that named entities are an essential factor for automatic summarization, if we would know the typical distribution of named entities we could consider in the summary the sentences that are more probably containing these special tokens. This is important for e-learning because it permits us to identify more easily topics or important sentences within learning materials and thus users are able to acquire knowledge more easily or to determine which knowledge source is relevant for them.</description><subject>ICT Information and Communications Technologies</subject><subject>Media studies</subject><subject>Sociology of Education</subject><issn>2066-026X</issn><issn>2066-8821</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><sourceid>REL</sourceid><recordid>eNpjYuA0MjAz07WwMDJkgbINjMwiOBh4i4uzDIDA0sTQwNSEk8HQz9HX1UXB1S_EM8TTNVjBxTM4JMjTKTTE099PwdNPwc81PDjAMcA1SMExKMTT2cc1mIeBNS0xpziVF0pzM8i4uYY4e-gmp6bm58Rn5ZcW5QHF402NjIB2GBOQBgAnziv0</recordid><startdate>2016</startdate><enddate>2016</enddate><creator>Matei, Liviu Sebastian</creator><creator>Trăuşan-Matu, Ştefan</creator><general>Carol I National Defence University Publishing House</general><scope>AE2</scope><scope>BIXPP</scope><scope>REL</scope></search><sort><creationdate>2016</creationdate><title>NAMED ENTITIES DISTRIBUTION IN NEWSPAPER ARTICLES</title><author>Matei, Liviu Sebastian ; Trăuşan-Matu, Ştefan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-ceeol_journals_5224103</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>ICT Information and Communications Technologies</topic><topic>Media studies</topic><topic>Sociology of Education</topic><toplevel>online_resources</toplevel><creatorcontrib>Matei, Liviu 
Sebastian</creatorcontrib><creatorcontrib>Trăuşan-Matu, Ştefan</creatorcontrib><collection>Central and Eastern European Online Library (C.E.E.O.L.) (DFG Nationallizenzen)</collection><collection>CEEOL: Open Access</collection><collection>Central and Eastern European Online Library</collection><jtitle>eLearning and Software for Education</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Matei, Liviu Sebastian</au><au>Trăuşan-Matu, Ştefan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>NAMED ENTITIES DISTRIBUTION IN NEWSPAPER ARTICLES</atitle><jtitle>eLearning and Software for Education</jtitle><addtitle>Conference proceedings of »eLearning and Software for Education« (eLSE)</addtitle><date>2016</date><risdate>2016</risdate><volume>12</volume><issue>1</issue><spage>231</spage><epage>238</epage><pages>231-238</pages><issn>2066-026X</issn><eissn>2066-8821</eissn><abstract>This paper is focused on the distribution of named entities inside texts for e-learning. An important topic in natural language processing is represented by Named Entities Recognition (NER), which is essential in order to be able to automatically understand the text. In this article we experimented our approach with newspaper articles, but the same technique can be applied to texts written by students, documentation or different other sources of knowledge used in e-learning. In order to perform this analysis we considered a number of 19043 Reuters newspaper articles. The following types of named entities were considered: person names, locations and organizations. For the extraction of the named entities we used Stanford NER software. Among the statistics that we are computing are the average number of sentences per named entity, which is defined as the number of distinct named entities from an article divided by the number of sentences and the average position in the text. 
We are also computing for each type of named entity the distribution in the sentence and we are representing graphically the result by using cubic spline interpolation. Using the position of named entities in the text we will infer a function based on Poisson’s distribution that models the distribution of named entities. These statistics can be used afterwards in different areas of NLP. For example, if we know that usually named entities are found more often in certain areas of the text, a statistical NER can give a higher priority to those. Another example of usage for such a statistic is the case of automatic text summarization. Some summarization approaches consider always the first sentence of the text. Due to the fact that named entities are an essential factor for automatic summarization, if we would know the typical distribution of named entities we could consider in the summary the sentences that are more probably containing these special tokens. This is important for e-learning because it permits us to identify more easily topics or important sentences within learning materials and thus users are able to acquire knowledge more easily or to determine which knowledge source is relevant for them.</abstract><pub>Carol I National Defence University Publishing House</pub><tpages>8</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 2066-026X
ispartof	eLearning and Software for Education, 2016, Vol.12 (1), p.231-238
issn	2066-026X 2066-8821
language	eng
recordid	cdi_ceeol_journals_522410
source	Education Source (EBSCOhost); EZB-FREE-00999 freely available EZB journals
subjects	ICT Information and Communications Technologies; Media studies; Sociology of Education
title	NAMED ENTITIES DISTRIBUTION IN NEWSPAPER ARTICLES
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T08%3A35%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ceeol&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=NAMED%20ENTITIES%20DISTRIBUTION%20IN%20NEWSPAPER%20ARTICLES&rft.jtitle=eLearning%20and%20Software%20for%20Education&rft.au=Matei,%20Liviu%20Sebastian&rft.date=2016&rft.volume=12&rft.issue=1&rft.spage=231&rft.epage=238&rft.pages=231-238&rft.issn=2066-026X&rft.eissn=2066-8821&rft_id=info:doi/&rft_dat=%3Cceeol%3E522410%3C/ceeol%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ceeol_id=522410&rfr_iscdi=true