Automatic text summarization using latent semantic analysis

Bibliographic details
Published in: Programming and computer software 2011-11, Vol.37 (6), p.299-305
Main authors: Mashechkin, I. V., Petrovskiy, M. I., Popov, D. S., Tsarev, D. V.
Format: Article
Language: English
Online access: Full text
container_end_page 305
container_issue 6
container_start_page 299
container_title Programming and computer software
container_volume 37
creator Mashechkin, I. V.
Petrovskiy, M. I.
Popov, D. S.
Tsarev, D. V.
description This paper considers state-of-the-art methods of automatic text summarization that build summaries in the form of generic extracts. The original text is represented as a numerical matrix whose columns correspond to the text's sentences, with each sentence represented as a vector in the term space. Latent semantic analysis is then applied to this matrix to construct a representation of the sentences in the topic space, whose dimensionality is much lower than that of the initial term space. The most important sentences are selected on the basis of their representation in the topic space, and the number of sentences selected is determined by the required summary length. The paper also presents a new generic text summarization method that uses nonnegative matrix factorization to estimate sentence relevance. The proposed relevance estimate is based on normalizing the topic space and then weighting each topic using the sentences' representation in that space. On the standard DUC 2001 and DUC 2002 data sets, the proposed method shows better summarization quality and performance than state-of-the-art methods.
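The abstract outlines the pipeline but gives no formulas. The Python sketch below (standard library only) illustrates the NMF-based variant under several assumptions that are not taken from the paper: matrix entries are raw term frequencies, the factorization A ≈ WH uses Lee-Seung multiplicative updates, "normalizing the topic space" is read as rescaling each topic column of W to unit length (with H rescaled to compensate), and each topic's weight is its share of the total sentence mass in H. All function names are hypothetical, and the SVD-based LSA baseline is not shown.

```python
import math
import random
import re
from collections import Counter


def term_sentence_matrix(sentences):
    """Rows are terms, columns are sentences. Raw counts are an assumed weighting."""
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    A = [[0.0] * len(sentences) for _ in vocab]
    for j, toks in enumerate(tokenized):
        for term, count in Counter(toks).items():
            A[index[term]][j] = float(count)
    return A


def transpose(X):
    return [list(row) for row in zip(*X)]


def matmul(X, Y):
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def nmf(A, k, iters=200, seed=0, eps=1e-9):
    """Factor nonnegative A (m x n) as W (m x k) @ H (k x n) with multiplicative updates."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)  # both k x n
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))  # both m x k
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    return W, H


def sentence_relevance(W, H):
    """Assumed scoring: unit-norm topics, topic weight = share of mass, weighted sum."""
    k, n = len(H), len(H[0])
    # Scale each topic column of W to unit L2 norm; multiply H's row by the same norm
    # so that the product W @ H is unchanged.
    norms = [math.sqrt(sum(row[i] ** 2 for row in W)) for i in range(k)]
    Hn = [[H[i][j] * norms[i] for j in range(n)] for i in range(k)]
    total = sum(map(sum, Hn)) or 1.0
    weights = [sum(row) / total for row in Hn]
    return [sum(weights[i] * Hn[i][j] for i in range(k)) for j in range(n)]


def summarize(sentences, k=2, length=1):
    """Return the `length` highest-scoring sentences in their original order."""
    A = term_sentence_matrix(sentences)
    W, H = nmf(A, k)
    scores = sentence_relevance(W, H)
    top = sorted(range(len(sentences)), key=lambda j: scores[j], reverse=True)[:length]
    return [sentences[j] for j in sorted(top)]
```

Here `k` plays the role of the topic-space dimensionality, which the abstract notes is much lower than that of the term space, and `length` is the number of sentences in the requested summary.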
doi_str_mv 10.1134/S0361768811060041
format Article
addtitle Program Comput Soft
publisher SP MAIK Nauka/Interperiodica (Dordrecht)
rights Pleiades Publishing, Ltd. 2011
status peer_reviewed
fulltext fulltext
identifier ISSN: 0361-7688
ispartof Programming and computer software, 2011-11, Vol.37 (6), p.299-305
issn 0361-7688
1608-3261
language eng
recordid cdi_proquest_journals_2918649526
source ProQuest Central UK/Ireland; SpringerLink Journals - AutoHoldings; ProQuest Central
subjects Artificial Intelligence
Computer Science
Operating Systems
Representations
Semantic analysis
Semantics
Sentences
Software Engineering
Software Engineering/Programming and Operating Systems
Standard data
title Automatic text summarization using latent semantic analysis
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-12T00%3A17%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Automatic%20text%20summarization%20using%20latent%20semantic%20analysis&rft.jtitle=Programming%20and%20computer%20software&rft.au=Mashechkin,%20I.%20V.&rft.date=2011-11-01&rft.volume=37&rft.issue=6&rft.spage=299&rft.epage=305&rft.pages=299-305&rft.issn=0361-7688&rft.eissn=1608-3261&rft_id=info:doi/10.1134/S0361768811060041&rft_dat=%3Cproquest_cross%3E2918649526%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2918649526&rft_id=info:pmid/&rfr_iscdi=true