Understanding Document Semantics from Summaries: A Case Study on Hindi Texts

The summary of a document contains words that actually contribute to the semantics of the document. Latent Semantic Analysis (LSA) is a mathematical model that is used to understand document semantics by deriving a semantic structure based on patterns of word correlations in the document. When using LSA...

Bibliographic details
Published in: ACM transactions on Asian and low-resource language information processing 2017-03, Vol.16 (1), p.1-20
Main authors: Krishnamurthi, Karthik, Panuganti, Vijayapal Reddy, Bulusu, Vishnu Vardhan
Format: Article
Language: eng
Online access: Full text
container_end_page 20
container_issue 1
container_start_page 1
container_title ACM transactions on Asian and low-resource language information processing
container_volume 16
creator Krishnamurthi, Karthik
Panuganti, Vijayapal Reddy
Bulusu, Vishnu Vardhan
description The summary of a document contains words that actually contribute to the semantics of the document. Latent Semantic Analysis (LSA) is a mathematical model that is used to understand document semantics by deriving a semantic structure based on patterns of word correlations in the document. When LSA is used to capture semantics from summaries, it performs quite well despite being completely independent of any external sources of semantics. However, LSA can be remodeled to enhance its capability to analyze correlations within texts. Taking advantage of the model being language independent, this article presents two stages of LSA remodeling to understand document semantics in the Indian context, specifically from Hindi text summaries. The first stage of remodeling provides supplementary information, such as document category and domain information. The second stage of remodeling uses a supervised term weighting measure in the process. The remodeled LSA’s performance is empirically evaluated in a document classification application by comparing its classification accuracies to those of plain LSA. The remodeled LSA improves on the plain model by 4.7% to 6.2%. The results suggest that summaries of documents efficiently capture the semantic structure of documents and are an alternative to full-length documents for understanding document semantics.
doi_str_mv 10.1145/2956236
format Article
fulltext fulltext
identifier ISSN: 2375-4699
ispartof ACM transactions on Asian and low-resource language information processing, 2017-03, Vol.16 (1), p.1-20
issn 2375-4699
2375-4702
language eng
recordid cdi_crossref_primary_10_1145_2956236
source ACM Digital Library Complete
title Understanding Document Semantics from Summaries: A Case Study on Hindi Texts
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T08%3A52%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Understanding%20Document%20Semantics%20from%20Summaries:%20A%20Case%20Study%20on%20Hindi%20Texts&rft.jtitle=ACM%20transactions%20on%20Asian%20and%20low-resource%20language%20information%20processing&rft.au=Krishnamurthi,%20Karthik&rft.date=2017-03-31&rft.volume=16&rft.issue=1&rft.spage=1&rft.epage=20&rft.pages=1-20&rft.issn=2375-4699&rft.eissn=2375-4702&rft_id=info:doi/10.1145/2956236&rft_dat=%3Ccrossref%3E10_1145_2956236%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
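
The abstract describes plain LSA as deriving a semantic structure from word correlations. As a rough illustration only, the sketch below builds a term-document count matrix and extracts a rank-k latent space with orthogonal iteration on the document Gram matrix, which stands in for the truncated SVD that LSA normally uses. The toy corpus, the function names, and the choice k=2 are assumptions made for this example and are not taken from the article; the article's remodeling stages (category and domain information, supervised term weighting) are not reproduced here.

```python
import math

def term_document_matrix(docs):
    """Rows are terms, columns are documents, entries are raw counts."""
    vocab = sorted({w for d in docs for w in d.split()})
    row = {w: i for i, w in enumerate(vocab)}
    a = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w in d.split():
            a[row[w]][j] += 1.0
    return vocab, a

def gram(a):
    """Document-by-document matrix A^T A."""
    n = len(a[0])
    return [[sum(r[i] * r[j] for r in a) for j in range(n)] for i in range(n)]

def top_k_document_vectors(a, k, iters=100):
    """Orthogonal iteration on A^T A; returns one k-dimensional vector per document."""
    g = gram(a)
    n = len(g)
    # Deterministic, linearly independent starting vectors (hypothetical choice).
    basis = [[float((i + 1) ** p) for i in range(n)] for p in range(k)]
    for _ in range(iters):
        # Multiply every basis vector by A^T A.
        basis = [[sum(g[i][j] * q[j] for j in range(n)) for i in range(n)]
                 for q in basis]
        # Gram-Schmidt re-orthonormalization.
        ortho = []
        for q in basis:
            for p in ortho:
                dot = sum(x * y for x, y in zip(q, p))
                q = [x - dot * y for x, y in zip(q, p)]
            norm = math.sqrt(sum(x * x for x in q))
            ortho.append([x / norm for x in q])
        basis = ortho
    # Singular values of A are the square roots of q^T (A^T A) q.
    sigmas = [math.sqrt(sum(q[i] * g[i][j] * q[j]
                            for i in range(n) for j in range(n)))
              for q in basis]
    # Document d is represented by its sigma-scaled coordinates.
    return [[s * q[d] for s, q in zip(sigmas, basis)] for d in range(n)]

def cosine(x, y):
    dot = sum(p * q for p, q in zip(x, y))
    return dot / (math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y)))

# Toy corpus (invented for illustration): two sports and two politics "summaries".
docs = [
    "cricket match team",
    "team match score",
    "election vote party",
    "party vote minister",
]
vocab, a = term_document_matrix(docs)
vectors = top_k_document_vectors(a, k=2)
print(round(cosine(vectors[0], vectors[1]), 3))  # same-topic pair
print(round(cosine(vectors[0], vectors[2]), 3))  # cross-topic pair
```

Because the procedure only counts whitespace-separated tokens, the same sketch should apply unchanged to whitespace-tokenized Hindi text, which is the language-independence property the abstract relies on. In practice a library SVD routine would replace the hand-written iteration.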