Automatic text summarization using latent semantic analysis

The paper considers state-of-the-art methods of automatic text summarization that build summaries as generic extracts. The original text is represented as a numerical matrix. Matrix columns correspond to text sentences, and each sentence is represented in...

Bibliographic details
Published in:	Programming and computer software 2011-11, Vol.37 (6), p.299-305
Main authors:	Mashechkin, I. V.; Petrovskiy, M. I.; Popov, D. S.; Tsarev, D. V.
Format:	Article
Language:	eng
Subjects:	Artificial Intelligence; Computer Science; Operating Systems; Representations; Semantic analysis; Semantics; Sentences; Software Engineering; Software Engineering/Programming and Operating Systems; Standard data
Online access:	Full text

container_end_page	305
container_issue	6
container_start_page	299
container_title	Programming and computer software
container_volume	37
creator	Mashechkin, I. V. ; Petrovskiy, M. I. ; Popov, D. S. ; Tsarev, D. V.
description	The paper considers state-of-the-art methods of automatic text summarization that build summaries as generic extracts. The original text is represented as a numerical matrix whose columns correspond to sentences; each sentence is a vector in the term space. Latent semantic analysis is then applied to this matrix to obtain a representation of the sentences in a topic space, whose dimensionality is much lower than that of the initial term space. The most important sentences are selected on the basis of their representation in the topic space, and their number is determined by the required summary length. The paper also presents a new generic text summarization method that uses nonnegative matrix factorization to estimate sentence relevance. The proposed relevance estimate is based on normalizing the topic space and then weighting each topic using the sentence representations in that space. On the DUC 2001 and DUC 2002 standard data sets, the proposed method shows better summarization quality and performance than state-of-the-art methods.
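
The pipeline outlined in the description (term-by-sentence matrix, factorization into a low-dimensional topic space, sentence scoring, selection by summary length) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the whitespace tokenizer, the raw term counts, the Lee-Seung multiplicative updates, and in particular the normalization and topic-weighting rule, which is one plausible reading of the abstract and not the authors' published formula.

```python
import random

def term_sentence_matrix(sentences):
    """Rows are terms, columns are sentences; entries are raw term counts."""
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    A = [[0.0] * len(sentences) for _ in vocab]
    for j, toks in enumerate(tokenized):
        for w in toks:
            A[index[w]][j] += 1.0
    return A

def transpose(X):
    return [list(row) for row in zip(*X)]

def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def nmf(A, k, iters=200, seed=0, eps=1e-9):
    """Approximate A (m x n) by W (m x k) times H (k x n), all entries >= 0."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # Multiplicative update for H, then for W (Lee-Seung style).
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[t][j] * num[t][j] / (den[t][j] + eps) for j in range(n)]
             for t in range(k)]
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][t] * num[i][t] / (den[i][t] + eps) for t in range(k)]
             for i in range(m)]
    return W, H

def summarize(sentences, k=2, length=2):
    """Return the `length` highest-scoring sentences in original order."""
    A = term_sentence_matrix(sentences)
    W, H = nmf(A, k)
    # Assumed normalization: give each topic column of W unit sum and
    # move its scale into the matching row of H.
    for t in range(k):
        scale = sum(W[i][t] for i in range(len(W)))
        H[t] = [h * scale for h in H[t]]
    # Assumed topic weight: the topic's share of the total mass in H.
    total = sum(sum(row) for row in H) or 1.0
    weights = [sum(row) / total for row in H]
    scores = [sum(weights[t] * H[t][j] for t in range(k))
              for j in range(len(sentences))]
    top = sorted(range(len(sentences)), key=lambda j: scores[j], reverse=True)
    return [sentences[j] for j in sorted(top[:length])]

if __name__ == "__main__":
    doc = [
        "the matrix represents sentences in the term space",
        "topics are found by matrix factorization",
        "the weather was pleasant that day",
        "sentences are scored in the topic space",
    ]
    for sentence in summarize(doc, k=2, length=2):
        print(sentence)
```

The sketch uses only the standard library so that it runs anywhere; for real documents a numerical library and a proper tokenizer with stop-word removal and term weighting would be the more practical choice.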
doi_str_mv	10.1134/S0361768811060041
format	Article
eissn	1608-3261
publisher	Dordrecht: SP MAIK Nauka/Interperiodica
general	SP MAIK Nauka/Interperiodica; Springer Nature B.V
rights	Pleiades Publishing, Ltd. 2011
toplevel	peer_reviewed; online_resources
addtitle	Program Comput Soft
date	2011-11-01
tpages	7
fulltext	fulltext
identifier	ISSN: 0361-7688
ispartof	Programming and computer software, 2011-11, Vol.37 (6), p.299-305
issn	0361-7688 1608-3261
language	eng
recordid	cdi_proquest_journals_2918649526
source	ProQuest Central UK/Ireland; SpringerLink Journals - AutoHoldings; ProQuest Central
subjects	Artificial Intelligence ; Computer Science ; Operating Systems ; Representations ; Semantic analysis ; Semantics ; Sentences ; Software Engineering ; Software Engineering/Programming and Operating Systems ; Standard data
title	Automatic text summarization using latent semantic analysis
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-12T00%3A17%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Automatic%20text%20summarization%20using%20latent%20semantic%20analysis&rft.jtitle=Programming%20and%20computer%20software&rft.au=Mashechkin,%20I.%20V.&rft.date=2011-11-01&rft.volume=37&rft.issue=6&rft.spage=299&rft.epage=305&rft.pages=299-305&rft.issn=0361-7688&rft.eissn=1608-3261&rft_id=info:doi/10.1134/S0361768811060041&rft_dat=%3Cproquest_cross%3E2918649526%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2918649526&rft_id=info:pmid/&rfr_iscdi=true