Understanding Document Semantics from Summaries: A Case Study on Hindi Texts

The summary of a document contains words that actually contribute to the semantics of the document. Latent Semantic Analysis (LSA) is a mathematical model that is used to understand document semantics by deriving a semantic structure based on patterns of word correlations in the document. When using LSA...

Bibliographic Details
Published in:	ACM transactions on Asian and low-resource language information processing 2017-03, Vol.16 (1), p.1-20
Main Authors:	Krishnamurthi, Karthik; Panuganti, Vijayapal Reddy; Bulusu, Vishnu Vardhan
Format:	Article
Language:	eng
Online Access:	Full text

container_end_page	20
container_issue	1
container_start_page	1
container_title	ACM transactions on Asian and low-resource language information processing
container_volume	16
creator	Krishnamurthi, Karthik ; Panuganti, Vijayapal Reddy ; Bulusu, Vishnu Vardhan
description	The summary of a document contains words that actually contribute to the semantics of the document. Latent Semantic Analysis (LSA) is a mathematical model that is used to understand document semantics by deriving a semantic structure based on patterns of word correlations in the document. When using LSA to capture semantics from summaries, it is observed that LSA performs quite well despite being completely independent of any external sources of semantics. However, LSA can be remodeled to enhance its capability to analyze correlations within texts. By taking advantage of the model being language independent, this article presents two stages of LSA remodeling to understand document semantics in the Indian context, specifically from Hindi text summaries. One stage of remodeling is done by providing supplementary information, such as document category and domain information. The second stage of remodeling is done by using a supervised term weighting measure in the process. The remodeled LSA’s performance is empirically evaluated in a document classification application by comparing the accuracies of classification to plain LSA. An improvement in the performance of LSA in the range of 4.7% to 6.2% is achieved from the remodel when compared to the plain model. The results suggest that summaries of documents efficiently capture the semantic structure of documents and are an alternative to full-length documents for understanding document semantics.
doi_str_mv	10.1145/2956236
format	Article
fullrecord	<record><control><sourceid>crossref</sourceid><recordid>TN_cdi_crossref_primary_10_1145_2956236</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_1145_2956236</sourcerecordid><originalsourceid>FETCH-LOGICAL-c230t-af924bb753b4783fb251c44d3fab2b2bdacd7fc0057938d1d03e9c9ff01e1e083</originalsourceid><addsrcrecordid>eNo1j8tKA0EURBtRMMTgL8zO1Zh7-_ZjeinxCQEXMeuhnzLi9Ej3ZOHfGzFSi6rVoQ5j1wi3iEKuuZGKkzpjC05atkIDP__fyphLtqr1AwBQaKUAF2y9zyGWOtschvze3E_-MMY8N7s42jwPvjapTGOzO4yjLUOsV-wi2c8aV6desv3jw9vmud2-Pr1s7rat5wRza5PhwjktyQndUXJcohciULKOHxOsDzp5AKkNdQEDUDTepAQYMUJHS3bzx_VlqrXE1H-V4Xjhu0fof1X7kyr9AHZcRV0</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Understanding Document Semantics from Summaries: A Case Study on Hindi Texts</title><source>ACM Digital Library Complete</source><creator>Krishnamurthi, Karthik ; Panuganti, Vijayapal Reddy ; Bulusu, Vishnu Vardhan</creator><creatorcontrib>Krishnamurthi, Karthik ; Panuganti, Vijayapal Reddy ; Bulusu, Vishnu Vardhan</creatorcontrib><description>Summary of a document contains words that actually contribute to the semantics of the document. Latent Semantic Analysis (LSA) is a mathematical model that is used to understand document semantics by deriving a semantic structure based on patterns of word correlations in the document. When using LSA to capture semantics from summaries, it is observed that LSA performs quite well despite being completely independent of any external sources of semantics. However, LSA can be remodeled to enhance its capability to analyze correlations within texts. By taking advantage of the model being language independent, this article presents two stages of LSA remodeling to understand document semantics in the Indian context, specifically from Hindi text summaries. 
One stage of remodeling is done by providing supplementary information, such as document category and domain information. The second stage of remodeling is done by using a supervised term weighting measure in the process. The remodeled LSA’s performance is empirically evaluated in a document classification application by comparing the accuracies of classification to plain LSA. An improvement in the performance of LSA in the range of 4.7% to 6.2% is achieved from the remodel when compared to the plain model. The results suggest that summaries of documents efficiently capture the semantic structure of documents and is an alternative to full-length documents for understanding document semantics.</description><identifier>ISSN: 2375-4699</identifier><identifier>EISSN: 2375-4702</identifier><identifier>DOI: 10.1145/2956236</identifier><language>eng</language><ispartof>ACM transactions on Asian and low-resource language information processing, 2017-03, Vol.16 (1), p.1-20</ispartof><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c230t-af924bb753b4783fb251c44d3fab2b2bdacd7fc0057938d1d03e9c9ff01e1e083</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27903,27904</link.rule.ids></links><search><creatorcontrib>Krishnamurthi, Karthik</creatorcontrib><creatorcontrib>Panuganti, Vijayapal Reddy</creatorcontrib><creatorcontrib>Bulusu, Vishnu Vardhan</creatorcontrib><title>Understanding Document Semantics from Summaries: A Case Study on Hindi Texts</title><title>ACM transactions on Asian and low-resource language information processing</title><description>Summary of a document contains words that actually contribute to the semantics of the document. 
Latent Semantic Analysis (LSA) is a mathematical model that is used to understand document semantics by deriving a semantic structure based on patterns of word correlations in the document. When using LSA to capture semantics from summaries, it is observed that LSA performs quite well despite being completely independent of any external sources of semantics. However, LSA can be remodeled to enhance its capability to analyze correlations within texts. By taking advantage of the model being language independent, this article presents two stages of LSA remodeling to understand document semantics in the Indian context, specifically from Hindi text summaries. One stage of remodeling is done by providing supplementary information, such as document category and domain information. The second stage of remodeling is done by using a supervised term weighting measure in the process. The remodeled LSA’s performance is empirically evaluated in a document classification application by comparing the accuracies of classification to plain LSA. An improvement in the performance of LSA in the range of 4.7% to 6.2% is achieved from the remodel when compared to the plain model. 
The results suggest that summaries of documents efficiently capture the semantic structure of documents and is an alternative to full-length documents for understanding document semantics.</description><issn>2375-4699</issn><issn>2375-4702</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><recordid>eNo1j8tKA0EURBtRMMTgL8zO1Zh7-_ZjeinxCQEXMeuhnzLi9Ej3ZOHfGzFSi6rVoQ5j1wi3iEKuuZGKkzpjC05atkIDP__fyphLtqr1AwBQaKUAF2y9zyGWOtschvze3E_-MMY8N7s42jwPvjapTGOzO4yjLUOsV-wi2c8aV6desv3jw9vmud2-Pr1s7rat5wRza5PhwjktyQndUXJcohciULKOHxOsDzp5AKkNdQEDUDTepAQYMUJHS3bzx_VlqrXE1H-V4Xjhu0fof1X7kyr9AHZcRV0</recordid><startdate>20170331</startdate><enddate>20170331</enddate><creator>Krishnamurthi, Karthik</creator><creator>Panuganti, Vijayapal Reddy</creator><creator>Bulusu, Vishnu Vardhan</creator><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>20170331</creationdate><title>Understanding Document Semantics from Summaries</title><author>Krishnamurthi, Karthik ; Panuganti, Vijayapal Reddy ; Bulusu, Vishnu Vardhan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c230t-af924bb753b4783fb251c44d3fab2b2bdacd7fc0057938d1d03e9c9ff01e1e083</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Krishnamurthi, Karthik</creatorcontrib><creatorcontrib>Panuganti, Vijayapal Reddy</creatorcontrib><creatorcontrib>Bulusu, Vishnu Vardhan</creatorcontrib><collection>CrossRef</collection><jtitle>ACM transactions on Asian and low-resource language information processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Krishnamurthi, Karthik</au><au>Panuganti, Vijayapal Reddy</au><au>Bulusu, Vishnu 
Vardhan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Understanding Document Semantics from Summaries: A Case Study on Hindi Texts</atitle><jtitle>ACM transactions on Asian and low-resource language information processing</jtitle><date>2017-03-31</date><risdate>2017</risdate><volume>16</volume><issue>1</issue><spage>1</spage><epage>20</epage><pages>1-20</pages><issn>2375-4699</issn><eissn>2375-4702</eissn><abstract>Summary of a document contains words that actually contribute to the semantics of the document. Latent Semantic Analysis (LSA) is a mathematical model that is used to understand document semantics by deriving a semantic structure based on patterns of word correlations in the document. When using LSA to capture semantics from summaries, it is observed that LSA performs quite well despite being completely independent of any external sources of semantics. However, LSA can be remodeled to enhance its capability to analyze correlations within texts. By taking advantage of the model being language independent, this article presents two stages of LSA remodeling to understand document semantics in the Indian context, specifically from Hindi text summaries. One stage of remodeling is done by providing supplementary information, such as document category and domain information. The second stage of remodeling is done by using a supervised term weighting measure in the process. The remodeled LSA’s performance is empirically evaluated in a document classification application by comparing the accuracies of classification to plain LSA. An improvement in the performance of LSA in the range of 4.7% to 6.2% is achieved from the remodel when compared to the plain model. The results suggest that summaries of documents efficiently capture the semantic structure of documents and is an alternative to full-length documents for understanding document semantics.</abstract><doi>10.1145/2956236</doi><tpages>20</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 2375-4699
ispartof	ACM transactions on Asian and low-resource language information processing, 2017-03, Vol.16 (1), p.1-20
issn	2375-4699 2375-4702
language	eng
recordid	cdi_crossref_primary_10_1145_2956236
source	ACM Digital Library Complete
title	Understanding Document Semantics from Summaries: A Case Study on Hindi Texts
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T08%3A52%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Understanding%20Document%20Semantics%20from%20Summaries:%20A%20Case%20Study%20on%20Hindi%20Texts&rft.jtitle=ACM%20transactions%20on%20Asian%20and%20low-resource%20language%20information%20processing&rft.au=Krishnamurthi,%20Karthik&rft.date=2017-03-31&rft.volume=16&rft.issue=1&rft.spage=1&rft.epage=20&rft.pages=1-20&rft.issn=2375-4699&rft.eissn=2375-4702&rft_id=info:doi/10.1145/2956236&rft_dat=%3Ccrossref%3E10_1145_2956236%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true