Automatic text summarization using latent semantic analysis
The paper reviews state-of-the-art methods of automatic text summarization that build summaries in the form of generic extracts. The original text is represented as a numerical matrix. Matrix columns correspond to text sentences, and each sentence is represented in...
Published in: | Programming and computer software 2011-11, Vol.37 (6), p.299-305 |
---|---|
Main authors: | Mashechkin, I. V.; Petrovskiy, M. I.; Popov, D. S.; Tsarev, D. V. |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 305 |
---|---|
container_issue | 6 |
container_start_page | 299 |
container_title | Programming and computer software |
container_volume | 37 |
creator | Mashechkin, I. V.; Petrovskiy, M. I.; Popov, D. S.; Tsarev, D. V. |
description | The paper reviews state-of-the-art methods of automatic text summarization that build summaries in the form of generic extracts. The original text is represented as a numerical matrix whose columns correspond to the sentences of the text, each sentence being a vector in the term space. Latent semantic analysis is then applied to this matrix to construct a representation of the sentences in the topic space, whose dimensionality is much lower than that of the initial term space. The most important sentences are chosen on the basis of their representation in the topic space, and the number of sentences chosen is determined by the required length of the summary. The paper also presents a new generic text summarization method that uses nonnegative matrix factorization to estimate sentence relevance. The proposed relevance estimate is based on normalizing the topic space and then weighting each topic using the representation of the sentences in the topic space. The proposed method shows better summarization quality and performance than state-of-the-art methods on the DUC 2001 and DUC 2002 standard data sets. |
doi_str_mv | 10.1134/S0361768811060041 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2918649526</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2918649526</sourcerecordid><originalsourceid>FETCH-LOGICAL-c316t-df2e2f962a3d82aee8ac1b01973c30f94f2a39ecedd27abb082f25da55843e43</originalsourceid><addsrcrecordid>eNp1kE9LxDAQxYMouK5-AG8Fz9VMkqYJnpZFV2HBg3svaTtZuvTPmqTg-ulNqeBBPA3M-73HzCPkFug9ABcP75RLyKVSAFRSKuCMLEBSlXIm4ZwsJjmd9Ety5f2BUoiQWJDH1RiGzoSmSgJ-hsSPXWdc8xU3Q5-Mvun3SWsC9lHCzvQTaHrTnnzjr8mFNa3Hm5-5JLvnp936Jd2-bV7Xq21acZAhrS1DZrVkhteKGURlKigp6JxXnFotbFQ0VljXLDdlSRWzLKtNlinBUfAluZtjj274GNGH4jCMLt7gC6ZBSaEzJiMFM1W5wXuHtji6Jr5yKoAWU0XFn4qih80eH9l-j-43-X_TNxQ8aJY</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2918649526</pqid></control><display><type>article</type><title>Automatic text summarization using latent semantic analysis</title><source>ProQuest Central UK/Ireland</source><source>SpringerLink Journals - AutoHoldings</source><source>ProQuest Central</source><creator>Mashechkin, I. V. ; Petrovskiy, M. I. ; Popov, D. S. ; Tsarev, D. V.</creator><creatorcontrib>Mashechkin, I. V. ; Petrovskiy, M. I. ; Popov, D. S. ; Tsarev, D. V.</creatorcontrib><description>In the paper, the most state-of-the-art methods of automatic text summarization, which build summaries in the form of generic extracts, are considered. The original text is represented in the form of a numerical matrix. Matrix columns correspond to text sentences, and each sentence is represented in the form of a vector in the term space. Further, latent semantic analysis is applied to the matrix obtained to construct sentences representation in the topic space. The dimensionality of the topic space is much less than the dimensionality of the initial term space. The choice of the most important sentences is carried out on the basis of sentences representation in the topic space. 
The number of important sentences is defined by the length of the demanded summary. This paper also presents a new generic text summarization method that uses nonnegative matrix factorization to estimate sentence relevance. Proposed sentence relevance estimation is based on normalization of topic space and further weighting of each topic using sentences representation in topic space. The proposed method shows better summarization quality and performance than state-of-the-art methods on the DUC 2001 and DUC 2002 standard data sets.</description><identifier>ISSN: 0361-7688</identifier><identifier>EISSN: 1608-3261</identifier><identifier>DOI: 10.1134/S0361768811060041</identifier><language>eng</language><publisher>Dordrecht: SP MAIK Nauka/Interperiodica</publisher><subject>Artificial Intelligence ; Computer Science ; Operating Systems ; Representations ; Semantic analysis ; Semantics ; Sentences ; Software Engineering ; Software Engineering/Programming and Operating Systems ; Standard data</subject><ispartof>Programming and computer software, 2011-11, Vol.37 (6), p.299-305</ispartof><rights>Pleiades Publishing, Ltd. 2011</rights><rights>Pleiades Publishing, Ltd. 
2011.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c316t-df2e2f962a3d82aee8ac1b01973c30f94f2a39ecedd27abb082f25da55843e43</citedby><cites>FETCH-LOGICAL-c316t-df2e2f962a3d82aee8ac1b01973c30f94f2a39ecedd27abb082f25da55843e43</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1134/S0361768811060041$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2918649526?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml><link.rule.ids>314,780,784,21387,27923,27924,33743,41487,42556,43804,51318,64384,64388,72240</link.rule.ids></links><search><creatorcontrib>Mashechkin, I. V.</creatorcontrib><creatorcontrib>Petrovskiy, M. I.</creatorcontrib><creatorcontrib>Popov, D. S.</creatorcontrib><creatorcontrib>Tsarev, D. V.</creatorcontrib><title>Automatic text summarization using latent semantic analysis</title><title>Programming and computer software</title><addtitle>Program Comput Soft</addtitle><description>In the paper, the most state-of-the-art methods of automatic text summarization, which build summaries in the form of generic extracts, are considered. The original text is represented in the form of a numerical matrix. Matrix columns correspond to text sentences, and each sentence is represented in the form of a vector in the term space. Further, latent semantic analysis is applied to the matrix obtained to construct sentences representation in the topic space. The dimensionality of the topic space is much less than the dimensionality of the initial term space. The choice of the most important sentences is carried out on the basis of sentences representation in the topic space. The number of important sentences is defined by the length of the demanded summary. 
This paper also presents a new generic text summarization method that uses nonnegative matrix factorization to estimate sentence relevance. Proposed sentence relevance estimation is based on normalization of topic space and further weighting of each topic using sentences representation in topic space. The proposed method shows better summarization quality and performance than state-of-the-art methods on the DUC 2001 and DUC 2002 standard data sets.</description><subject>Artificial Intelligence</subject><subject>Computer Science</subject><subject>Operating Systems</subject><subject>Representations</subject><subject>Semantic analysis</subject><subject>Semantics</subject><subject>Sentences</subject><subject>Software Engineering</subject><subject>Software Engineering/Programming and Operating Systems</subject><subject>Standard data</subject><issn>0361-7688</issn><issn>1608-3261</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2011</creationdate><recordtype>article</recordtype><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp1kE9LxDAQxYMouK5-AG8Fz9VMkqYJnpZFV2HBg3svaTtZuvTPmqTg-ulNqeBBPA3M-73HzCPkFug9ABcP75RLyKVSAFRSKuCMLEBSlXIm4ZwsJjmd9Ety5f2BUoiQWJDH1RiGzoSmSgJ-hsSPXWdc8xU3Q5-Mvun3SWsC9lHCzvQTaHrTnnzjr8mFNa3Hm5-5JLvnp936Jd2-bV7Xq21acZAhrS1DZrVkhteKGURlKigp6JxXnFotbFQ0VljXLDdlSRWzLKtNlinBUfAluZtjj274GNGH4jCMLt7gC6ZBSaEzJiMFM1W5wXuHtji6Jr5yKoAWU0XFn4qih80eH9l-j-43-X_TNxQ8aJY</recordid><startdate>20111101</startdate><enddate>20111101</enddate><creator>Mashechkin, I. V.</creator><creator>Petrovskiy, M. I.</creator><creator>Popov, D. S.</creator><creator>Tsarev, D. 
V.</creator><general>SP MAIK Nauka/Interperiodica</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>8FE</scope><scope>8FG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope></search><sort><creationdate>20111101</creationdate><title>Automatic text summarization using latent semantic analysis</title><author>Mashechkin, I. V. ; Petrovskiy, M. I. ; Popov, D. S. ; Tsarev, D. V.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c316t-df2e2f962a3d82aee8ac1b01973c30f94f2a39ecedd27abb082f25da55843e43</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2011</creationdate><topic>Artificial Intelligence</topic><topic>Computer Science</topic><topic>Operating Systems</topic><topic>Representations</topic><topic>Semantic analysis</topic><topic>Semantics</topic><topic>Sentences</topic><topic>Software Engineering</topic><topic>Software Engineering/Programming and Operating Systems</topic><topic>Standard data</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Mashechkin, I. V.</creatorcontrib><creatorcontrib>Petrovskiy, M. I.</creatorcontrib><creatorcontrib>Popov, D. S.</creatorcontrib><creatorcontrib>Tsarev, D. 
V.</creatorcontrib><collection>CrossRef</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><jtitle>Programming and computer software</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Mashechkin, I. V.</au><au>Petrovskiy, M. I.</au><au>Popov, D. S.</au><au>Tsarev, D. V.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Automatic text summarization using latent semantic analysis</atitle><jtitle>Programming and computer software</jtitle><stitle>Program Comput Soft</stitle><date>2011-11-01</date><risdate>2011</risdate><volume>37</volume><issue>6</issue><spage>299</spage><epage>305</epage><pages>299-305</pages><issn>0361-7688</issn><eissn>1608-3261</eissn><abstract>In the paper, the most state-of-the-art methods of automatic text summarization, which build summaries in the form of generic extracts, are considered. The original text is represented in the form of a numerical matrix. 
Matrix columns correspond to text sentences, and each sentence is represented in the form of a vector in the term space. Further, latent semantic analysis is applied to the matrix obtained to construct sentences representation in the topic space. The dimensionality of the topic space is much less than the dimensionality of the initial term space. The choice of the most important sentences is carried out on the basis of sentences representation in the topic space. The number of important sentences is defined by the length of the demanded summary. This paper also presents a new generic text summarization method that uses nonnegative matrix factorization to estimate sentence relevance. Proposed sentence relevance estimation is based on normalization of topic space and further weighting of each topic using sentences representation in topic space. The proposed method shows better summarization quality and performance than state-of-the-art methods on the DUC 2001 and DUC 2002 standard data sets.</abstract><cop>Dordrecht</cop><pub>SP MAIK Nauka/Interperiodica</pub><doi>10.1134/S0361768811060041</doi><tpages>7</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0361-7688 |
ispartof | Programming and computer software, 2011-11, Vol.37 (6), p.299-305 |
issn | 0361-7688; 1608-3261 |
language | eng |
recordid | cdi_proquest_journals_2918649526 |
source | ProQuest Central UK/Ireland; SpringerLink Journals - AutoHoldings; ProQuest Central |
subjects | Artificial Intelligence; Computer Science; Operating Systems; Representations; Semantic analysis; Semantics; Sentences; Software Engineering; Software Engineering/Programming and Operating Systems; Standard data |
title | Automatic text summarization using latent semantic analysis |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-12T00%3A17%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Automatic%20text%20summarization%20using%20latent%20semantic%20analysis&rft.jtitle=Programming%20and%20computer%20software&rft.au=Mashechkin,%20I.%20V.&rft.date=2011-11-01&rft.volume=37&rft.issue=6&rft.spage=299&rft.epage=305&rft.pages=299-305&rft.issn=0361-7688&rft.eissn=1608-3261&rft_id=info:doi/10.1134/S0361768811060041&rft_dat=%3Cproquest_cross%3E2918649526%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2918649526&rft_id=info:pmid/&rfr_iscdi=true |
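
The pipeline outlined in the description field (term-by-sentence matrix, factorization into a low-dimensional topic space, sentence ranking, extract of a requested length) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the paper: the tokenizer, raw term counts as matrix entries, the Lee-Seung multiplicative updates used for the nonnegative factorization, the deterministic initialization, and in particular the scoring rule in `sentence_scores`, which is only one plausible reading of "normalization of topic space and further weighting of each topic". All function names are hypothetical.

```python
import re


def term_sentence_matrix(sentences):
    """Build A (terms x sentences): A[i][j] counts term i in sentence j."""
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    terms = sorted({t for toks in tokenized for t in toks})
    row = {t: i for i, t in enumerate(terms)}
    A = [[0.0] * len(sentences) for _ in terms]
    for j, toks in enumerate(tokenized):
        for t in toks:
            A[row[t]][j] += 1.0
    return terms, A


def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(x * y for x, y in zip(r, c)) for c in Yt] for r in X]


def frobenius_error(A, W, H):
    """Frobenius norm of A - W H, used to check the fit."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)) ** 0.5


def scale(X, num, den, eps):
    """Element-wise multiplicative update X * num / (den + eps)."""
    return [[x * a / (d + eps) for x, a, d in zip(rx, rn, rd)]
            for rx, rn, rd in zip(X, num, den)]


def nmf(A, k, iterations=200, eps=1e-9):
    """Approximate A (m x n) by W (m x k) times H (k x n), all nonnegative."""
    m, n = len(A), len(A[0])
    # Deterministic positive start values (an arbitrary choice for this sketch).
    W = [[1.0 + ((i + 1) * (t + 2) % 7) / 7.0 for t in range(k)] for i in range(m)]
    H = [[1.0 + ((t + 3) * (j + 1) % 5) / 5.0 for j in range(n)] for t in range(k)]
    for _ in range(iterations):
        Wt = transpose(W)
        H = scale(H, matmul(Wt, A), matmul(matmul(Wt, W), H), eps)
        Ht = transpose(H)
        W = scale(W, matmul(A, Ht), matmul(W, matmul(H, Ht)), eps)
    return W, H


def sentence_scores(W, H):
    """Assumed scoring rule: normalize topics, weight them, score sentences."""
    k = len(H)
    # Scale each topic (column of W) to unit length and rescale the matching
    # row of H by the same factor, so the product W H is unchanged.
    norms = [sum(w[t] ** 2 for w in W) ** 0.5 or 1.0 for t in range(k)]
    Hn = [[h * norms[t] for h in H[t]] for t in range(k)]
    # Weight of a topic: its share of the total mass over all sentences.
    total = sum(map(sum, Hn)) or 1.0
    weights = [sum(r) / total for r in Hn]
    return [sum(wt * r[j] for wt, r in zip(weights, Hn)) for j in range(len(Hn[0]))]


def summarize(sentences, n_sentences, k=2):
    """Return the n_sentences best-scoring sentences in their original order."""
    _, A = term_sentence_matrix(sentences)
    W, H = nmf(A, k)
    scores = sentence_scores(W, H)
    top = sorted(range(len(sentences)), key=lambda j: -scores[j])[:n_sentences]
    return [sentences[j] for j in sorted(top)]


example = [
    "Cats chase mice in the barn.",
    "Dogs chase cats around the yard.",
    "Stocks fell sharply today.",
    "Markets and stocks fell after the report.",
]
print(summarize(example, 2))
```

The sketch avoids third-party dependencies so that it runs as-is; for real documents a numerical library and a term weighting scheme such as tf-idf would be the more usual choice. The number of topics `k` and the iteration count are free parameters here, and the record does not state which values the authors used.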