Deep Learning for Hindi Text Classification: A Comparison
Natural Language Processing (NLP) and especially natural language text analysis have seen great advances in recent times. Usage of deep learning in text processing has revolutionized the techniques for text processing and achieved remarkable results. Different deep learning architectures like CNN, L...
Published in: | arXiv.org 2020-01 |
---|---|
Main authors: | Joshi, Ramchandra ; Goel, Purvi ; Joshi, Raviraj |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Joshi, Ramchandra ; Goel, Purvi ; Joshi, Raviraj |
description | Natural Language Processing (NLP), and especially natural language text analysis, has seen great advances in recent times. The use of deep learning has revolutionized text-processing techniques and achieved remarkable results. Different deep learning architectures like CNN, LSTM, and the very recent Transformer have been used to achieve state-of-the-art results on a variety of NLP tasks. In this work, we survey a host of deep learning architectures for text classification tasks. The work is specifically concerned with the classification of Hindi text. Research on the classification of the morphologically rich and low-resource Hindi language, written in Devanagari script, has been limited due to the absence of a large labeled corpus. In this work, we used translated versions of English data-sets to evaluate models based on CNN, LSTM and Attention. Multilingual pre-trained sentence embeddings based on BERT and LASER are also compared to evaluate their effectiveness for the Hindi language. The paper also serves as a tutorial for popular text classification techniques. |
doi_str_mv | 10.48550/arxiv.2001.10340 |
format | Article |
fullrecord | <record><control><sourceid>proquest_arxiv</sourceid><recordid>TN_cdi_arxiv_primary_2001_10340</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2348151361</sourcerecordid><originalsourceid>FETCH-LOGICAL-a521-c06ee5a14fc4956c539ab5562a57be7e173678b1f5278dd35c6d3c79c6ade2b3</originalsourceid><addsrcrecordid>eNotj09LwzAYh4MgOOY-gCcDnlvzp2_SehtVN6Hgwd3L2zSVjC2pSSfz2zs3T7_Lw4_nIeSOs7woAdgjxqP7zgVjPOdMFuyKzISUPCsLIW7IIqUtY0woLQDkjFTP1o60sRi98590CJGune8d3djjROsdpuQGZ3BywT_RJa3DfsToUvC35HrAXbKL_52Tj9eXTb3OmvfVW71sMgTBM8OUtYC8GExRgTIgK-wAlEDQndWWa6l02fEBhC77XoJRvTS6Mgp7Kzo5J_eX13NWO0a3x_jT_uW157wT8XAhxhi-DjZN7TYcoj8ptUIWJQcuFZe_OihRIA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2348151361</pqid></control><display><type>article</type><title>Deep Learning for Hindi Text Classification: A Comparison</title><source>arXiv.org</source><source>Free E- Journals</source><creator>Joshi, Ramchandra ; Goel, Purvi ; Joshi, Raviraj</creator><creatorcontrib>Joshi, Ramchandra ; Goel, Purvi ; Joshi, Raviraj</creatorcontrib><description>Natural Language Processing (NLP) and especially natural language text analysis have seen great advances in recent times. Usage of deep learning in text processing has revolutionized the techniques for text processing and achieved remarkable results. Different deep learning architectures like CNN, LSTM, and very recent Transformer have been used to achieve state of the art results variety on NLP tasks. In this work, we survey a host of deep learning architectures for text classification tasks. The work is specifically concerned with the classification of Hindi text. The research in the classification of morphologically rich and low resource Hindi language written in Devanagari script has been limited due to the absence of large labeled corpus. 
In this work, we used translated versions of English data-sets to evaluate models based on CNN, LSTM and Attention. Multilingual pre-trained sentence embeddings based on BERT and LASER are also compared to evaluate their effectiveness for the Hindi language. The paper also serves as a tutorial for popular text classification techniques.</description><identifier>EISSN: 2331-8422</identifier><identifier>DOI: 10.48550/arxiv.2001.10340</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Classification ; Computer Science - Computation and Language ; Computer Science - Information Retrieval ; Computer Science - Learning ; Deep learning ; Evaluation ; Hindi language ; Machine learning ; Natural language ; Natural language processing ; Statistics - Machine Learning ; Text editing</subject><ispartof>arXiv.org, 2020-01</ispartof><rights>2020. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,780,881,27902</link.rule.ids><backlink>$$Uhttps://doi.org/10.48550/arXiv.2001.10340$$DView paper in arXiv$$Hfree_for_read</backlink><backlink>$$Uhttps://doi.org/10.1007/978-3-030-44689-5_9$$DView published paper (Access to full text may be restricted)$$Hfree_for_read</backlink></links><search><creatorcontrib>Joshi, Ramchandra</creatorcontrib><creatorcontrib>Goel, Purvi</creatorcontrib><creatorcontrib>Joshi, Raviraj</creatorcontrib><title>Deep Learning for Hindi Text Classification: A Comparison</title><title>arXiv.org</title><description>Natural Language Processing (NLP) and especially natural language text analysis have seen great advances in recent times. Usage of deep learning in text processing has revolutionized the techniques for text processing and achieved remarkable results. Different deep learning architectures like CNN, LSTM, and very recent Transformer have been used to achieve state of the art results variety on NLP tasks. In this work, we survey a host of deep learning architectures for text classification tasks. The work is specifically concerned with the classification of Hindi text. The research in the classification of morphologically rich and low resource Hindi language written in Devanagari script has been limited due to the absence of large labeled corpus. In this work, we used translated versions of English data-sets to evaluate models based on CNN, LSTM and Attention. 
Multilingual pre-trained sentence embeddings based on BERT and LASER are also compared to evaluate their effectiveness for the Hindi language. The paper also serves as a tutorial for popular text classification techniques.</description><subject>Classification</subject><subject>Computer Science - Computation and Language</subject><subject>Computer Science - Information Retrieval</subject><subject>Computer Science - Learning</subject><subject>Deep learning</subject><subject>Evaluation</subject><subject>Hindi language</subject><subject>Machine learning</subject><subject>Natural language</subject><subject>Natural language processing</subject><subject>Statistics - Machine Learning</subject><subject>Text editing</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><sourceid>GOX</sourceid><recordid>eNotj09LwzAYh4MgOOY-gCcDnlvzp2_SehtVN6Hgwd3L2zSVjC2pSSfz2zs3T7_Lw4_nIeSOs7woAdgjxqP7zgVjPOdMFuyKzISUPCsLIW7IIqUtY0woLQDkjFTP1o60sRi98590CJGune8d3djjROsdpuQGZ3BywT_RJa3DfsToUvC35HrAXbKL_52Tj9eXTb3OmvfVW71sMgTBM8OUtYC8GExRgTIgK-wAlEDQndWWa6l02fEBhC77XoJRvTS6Mgp7Kzo5J_eX13NWO0a3x_jT_uW157wT8XAhxhi-DjZN7TYcoj8ptUIWJQcuFZe_OihRIA</recordid><startdate>20200119</startdate><enddate>20200119</enddate><creator>Joshi, Ramchandra</creator><creator>Goel, Purvi</creator><creator>Joshi, Raviraj</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><scope>AKY</scope><scope>EPD</scope><scope>GOX</scope></search><sort><creationdate>20200119</creationdate><title>Deep Learning for Hindi Text 
Classification: A Comparison</title><author>Joshi, Ramchandra ; Goel, Purvi ; Joshi, Raviraj</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a521-c06ee5a14fc4956c539ab5562a57be7e173678b1f5278dd35c6d3c79c6ade2b3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Classification</topic><topic>Computer Science - Computation and Language</topic><topic>Computer Science - Information Retrieval</topic><topic>Computer Science - Learning</topic><topic>Deep learning</topic><topic>Evaluation</topic><topic>Hindi language</topic><topic>Machine learning</topic><topic>Natural language</topic><topic>Natural language processing</topic><topic>Statistics - Machine Learning</topic><topic>Text editing</topic><toplevel>online_resources</toplevel><creatorcontrib>Joshi, Ramchandra</creatorcontrib><creatorcontrib>Goel, Purvi</creatorcontrib><creatorcontrib>Joshi, Raviraj</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering 
Collection</collection><collection>arXiv Computer Science</collection><collection>arXiv Statistics</collection><collection>arXiv.org</collection><jtitle>arXiv.org</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Joshi, Ramchandra</au><au>Goel, Purvi</au><au>Joshi, Raviraj</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Deep Learning for Hindi Text Classification: A Comparison</atitle><jtitle>arXiv.org</jtitle><date>2020-01-19</date><risdate>2020</risdate><eissn>2331-8422</eissn><abstract>Natural Language Processing (NLP) and especially natural language text analysis have seen great advances in recent times. Usage of deep learning in text processing has revolutionized the techniques for text processing and achieved remarkable results. Different deep learning architectures like CNN, LSTM, and very recent Transformer have been used to achieve state of the art results variety on NLP tasks. In this work, we survey a host of deep learning architectures for text classification tasks. The work is specifically concerned with the classification of Hindi text. The research in the classification of morphologically rich and low resource Hindi language written in Devanagari script has been limited due to the absence of large labeled corpus. In this work, we used translated versions of English data-sets to evaluate models based on CNN, LSTM and Attention. Multilingual pre-trained sentence embeddings based on BERT and LASER are also compared to evaluate their effectiveness for the Hindi language. The paper also serves as a tutorial for popular text classification techniques.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><doi>10.48550/arxiv.2001.10340</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2020-01 |
issn | 2331-8422 |
language | eng |
recordid | cdi_arxiv_primary_2001_10340 |
source | arXiv.org; Free E- Journals |
subjects | Classification ; Computer Science - Computation and Language ; Computer Science - Information Retrieval ; Computer Science - Learning ; Deep learning ; Evaluation ; Hindi language ; Machine learning ; Natural language ; Natural language processing ; Statistics - Machine Learning ; Text editing |
title | Deep Learning for Hindi Text Classification: A Comparison |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T05%3A08%3A47IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Deep%20Learning%20for%20Hindi%20Text%20Classification:%20A%20Comparison&rft.jtitle=arXiv.org&rft.au=Joshi,%20Ramchandra&rft.date=2020-01-19&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2001.10340&rft_dat=%3Cproquest_arxiv%3E2348151361%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2348151361&rft_id=info:pmid/&rfr_iscdi=true |