Roman Urdu Word Variations and Normalized Sentiment Review Dataset (RUWV-NSR)

We have developed two unique Roman Urdu datasets, translated into English. The first dataset focuses on Roman Urdu words and their spelling variations. This dataset is structured in an Excel file with five columns labeled "Var-1" to "Var-5," each representing up to five variation...

Bibliographic details
Main author: Ahmed, Mudasar
Format: Dataset
Language: eng
Subjects: Sentiment Analysis; Statistical Natural Language Processing; Text Mining
Online access: Order full text
creator Ahmed, Mudasar
description We have developed two Roman Urdu datasets, translated into English. The first dataset covers Roman Urdu words and their spelling variations. It is an Excel file with five columns labeled "Var-1" to "Var-5," which hold up to five Roman Urdu spelling variations for each word, followed by a final column, "common," which contains the most frequently used spelling of that word. In total, this dataset includes 5,244 unique Roman Urdu words, which, together with their variations, amount to 19,527 words. The second dataset contains Roman Urdu reviews, each labeled with a sentiment. Because Roman Urdu spelling on the web is highly variable, with users often inventing their own spellings, we have normalized the spelling of words across these reviews. To our knowledge, this dataset is the first of its kind and the largest collection of Roman Urdu reviews, with a total of 28,090 reviews categorized into five sentiment classes. It is particularly suited to analyzing Roman Urdu content such as online product reviews or Roman Urdu articles, which are becoming increasingly common, and it can support sentiment analysis and other language processing applications.
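The description implies a simple lookup-based normalization: every spelling listed in "Var-1" to "Var-5" maps to the word in "common". The following is a minimal sketch of that idea, not the author's actual procedure. The rows are hypothetical stand-ins written for illustration (they are not taken from the dataset), and the whitespace tokenization and lowercasing are assumptions.

```python
# Hypothetical rows mimicking the described layout: up to five variants
# ("Var-1" .. "Var-5") plus a "common" column. The words are illustrative
# only and are NOT copied from the dataset.
ROWS = [
    {"Var-1": "acha", "Var-2": "achha", "Var-3": "accha",
     "Var-4": None, "Var-5": None, "common": "acha"},
    {"Var-1": "nahi", "Var-2": "nahin", "Var-3": "nai",
     "Var-4": "nhi", "Var-5": None, "common": "nahi"},
]

VARIANT_COLUMNS = [f"Var-{i}" for i in range(1, 6)]


def build_lookup(rows):
    """Map every listed variant (lowercased) to its row's common spelling."""
    lookup = {}
    for row in rows:
        common = row["common"]
        for col in VARIANT_COLUMNS:
            variant = row.get(col)
            if variant:  # skip empty cells
                lookup[variant.strip().lower()] = common
    return lookup


def normalize(text, lookup):
    """Replace each whitespace-separated token with its common spelling.

    Tokens that are not in the lookup are left unchanged.
    """
    return " ".join(lookup.get(tok.lower(), tok) for tok in text.split())


LOOKUP = build_lookup(ROWS)
print(normalize("yeh product achha nahin hai", LOOKUP))
```

With the real spreadsheet, the rows could presumably be loaded with `pandas.read_excel` and passed to `build_lookup` as dictionaries. The record does not give the file name, so that step is left out here.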
doi_str_mv 10.17632/v5jfhsvtmd.5
format Dataset
fullrecord <record><control><sourceid>datacite_PQ8</sourceid><recordid>TN_cdi_datacite_primary_10_17632_v5jfhsvtmd_5</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_17632_v5jfhsvtmd_5</sourcerecordid><originalsourceid>FETCH-datacite_primary_10_17632_v5jfhsvtmd_53</originalsourceid><addsrcrecordid>eNqVjr0OgjAYALs4GHV0_0YdiiBBH8CfuMhQBMbmiy2xhramrRh9eokxcXa5W244QqZJHCXrVbpcdNm1ufguaBFlQ3JkVqOB0ok71NYJqNApDMoaD2gE5NZpbNVLCiikCUr3ACY7JR-wxYBeBpixsq5oXrD5mAwabL2cfD0idL87bQ5U9OlZBclvTml0T57E_LPDfzs8S__t3xC6ReM</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>dataset</recordtype></control><display><type>dataset</type><title>Roman Urdu Word Variations and Normalized Sentiment Review Dataset (RUWV-NSR)</title><source>DataCite</source><creator>Ahmed, Mudasar</creator><creatorcontrib>Ahmed, Mudasar</creatorcontrib><description>We have developed two unique Roman Urdu datasets, translated into English. The first dataset focuses on Roman Urdu words and their spelling variations. This dataset is structured in an Excel file with five columns labeled "Var-1" to "Var-5," each representing up to five variations of Roman Urdu spellings for individual words. The final column, "common," contains the most frequently used spelling for each word. In total, this dataset includes 5,244 unique Roman Urdu words, which, when combined with their variations, amount to 19,527 words. The second dataset contains Roman Urdu reviews, each labeled with a sentiment. Given the variability in Roman Urdu spellings found on the web, where users often create their own spelling variations, we have normalized the spelling of words across these reviews. This dataset is the first of its kind, containing the largest collection of Roman Urdu reviews, with a total of 28,090 reviews categorized into five sentiment classes. 
This dataset is particularly valuable for analyzing Roman Urdu content in contexts such as online product reviews or Roman Urdu articles, which are becoming increasingly common. It offers significant potential for sentiment analysis and language processing applications.</description><identifier>DOI: 10.17632/v5jfhsvtmd.5</identifier><language>eng</language><publisher>Mendeley Data</publisher><subject>Sentiment Analysis ; Statistical Natural Language Processing ; Text Mining</subject><creationdate>2024</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>780,1892</link.rule.ids><linktorsrc>$$Uhttps://commons.datacite.org/doi.org/10.17632/v5jfhsvtmd.5$$EView_record_in_DataCite.org$$FView_record_in_$$GDataCite.org$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Ahmed, Mudasar</creatorcontrib><title>Roman Urdu Word Variations and Normalized Sentiment Review Dataset (RUWV-NSR)</title><description>We have developed two unique Roman Urdu datasets, translated into English. The first dataset focuses on Roman Urdu words and their spelling variations. This dataset is structured in an Excel file with five columns labeled "Var-1" to "Var-5," each representing up to five variations of Roman Urdu spellings for individual words. The final column, "common," contains the most frequently used spelling for each word. In total, this dataset includes 5,244 unique Roman Urdu words, which, when combined with their variations, amount to 19,527 words. The second dataset contains Roman Urdu reviews, each labeled with a sentiment. Given the variability in Roman Urdu spellings found on the web, where users often create their own spelling variations, we have normalized the spelling of words across these reviews. 
This dataset is the first of its kind, containing the largest collection of Roman Urdu reviews, with a total of 28,090 reviews categorized into five sentiment classes. This dataset is particularly valuable for analyzing Roman Urdu content in contexts such as online product reviews or Roman Urdu articles, which are becoming increasingly common. It offers significant potential for sentiment analysis and language processing applications.</description><subject>Sentiment Analysis</subject><subject>Statistical Natural Language Processing</subject><subject>Text Mining</subject><fulltext>true</fulltext><rsrctype>dataset</rsrctype><creationdate>2024</creationdate><recordtype>dataset</recordtype><sourceid>PQ8</sourceid><recordid>eNqVjr0OgjAYALs4GHV0_0YdiiBBH8CfuMhQBMbmiy2xhramrRh9eokxcXa5W244QqZJHCXrVbpcdNm1ufguaBFlQ3JkVqOB0ok71NYJqNApDMoaD2gE5NZpbNVLCiikCUr3ACY7JR-wxYBeBpixsq5oXrD5mAwabL2cfD0idL87bQ5U9OlZBclvTml0T57E_LPDfzs8S__t3xC6ReM</recordid><startdate>20241007</startdate><enddate>20241007</enddate><creator>Ahmed, Mudasar</creator><general>Mendeley Data</general><scope>DYCCY</scope><scope>PQ8</scope></search><sort><creationdate>20241007</creationdate><title>Roman Urdu Word Variations and Normalized Sentiment Review Dataset (RUWV-NSR)</title><author>Ahmed, Mudasar</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-datacite_primary_10_17632_v5jfhsvtmd_53</frbrgroupid><rsrctype>datasets</rsrctype><prefilter>datasets</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Sentiment Analysis</topic><topic>Statistical Natural Language Processing</topic><topic>Text Mining</topic><toplevel>online_resources</toplevel><creatorcontrib>Ahmed, Mudasar</creatorcontrib><collection>DataCite (Open Access)</collection><collection>DataCite</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Ahmed, 
Mudasar</au><format>book</format><genre>unknown</genre><ristype>DATA</ristype><title>Roman Urdu Word Variations and Normalized Sentiment Review Dataset (RUWV-NSR)</title><date>2024-10-07</date><risdate>2024</risdate><abstract>We have developed two unique Roman Urdu datasets, translated into English. The first dataset focuses on Roman Urdu words and their spelling variations. This dataset is structured in an Excel file with five columns labeled "Var-1" to "Var-5," each representing up to five variations of Roman Urdu spellings for individual words. The final column, "common," contains the most frequently used spelling for each word. In total, this dataset includes 5,244 unique Roman Urdu words, which, when combined with their variations, amount to 19,527 words. The second dataset contains Roman Urdu reviews, each labeled with a sentiment. Given the variability in Roman Urdu spellings found on the web, where users often create their own spelling variations, we have normalized the spelling of words across these reviews. This dataset is the first of its kind, containing the largest collection of Roman Urdu reviews, with a total of 28,090 reviews categorized into five sentiment classes. This dataset is particularly valuable for analyzing Roman Urdu content in contexts such as online product reviews or Roman Urdu articles, which are becoming increasingly common. It offers significant potential for sentiment analysis and language processing applications.</abstract><pub>Mendeley Data</pub><doi>10.17632/v5jfhsvtmd.5</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.17632/v5jfhsvtmd.5
language eng
recordid cdi_datacite_primary_10_17632_v5jfhsvtmd_5
source DataCite
subjects Sentiment Analysis
Statistical Natural Language Processing
Text Mining
title Roman Urdu Word Variations and Normalized Sentiment Review Dataset (RUWV-NSR)
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-12T04%3A21%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-datacite_PQ8&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=unknown&rft.au=Ahmed,%20Mudasar&rft.date=2024-10-07&rft_id=info:doi/10.17632/v5jfhsvtmd.5&rft_dat=%3Cdatacite_PQ8%3E10_17632_v5jfhsvtmd_5%3C/datacite_PQ8%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
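The link above is an OpenURL (Z39.88-2004) request whose query string repeats the record's author, date and DOI as `rft.*` parameters. As a rough sketch, those fields can be read back with Python's standard `urllib.parse`. The URL below is a shortened copy containing only a subset of the original parameters, and the helper name `openurl_fields` is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Shortened copy of the resolver link (only a subset of its parameters).
OPENURL = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=unknown&rft.au=Ahmed,%20Mudasar&rft.date=2024-10-07"
    "&rft_id=info:doi/10.17632/v5jfhsvtmd.5&rft_id=info:oai/"
)

DOI_PREFIX = "info:doi/"


def openurl_fields(url):
    """Extract author, date and DOI from an OpenURL query string."""
    params = parse_qs(urlsplit(url).query)
    # rft_id may occur several times; keep the value carrying the DOI.
    doi = next(
        (v[len(DOI_PREFIX):] for v in params.get("rft_id", [])
         if v.startswith(DOI_PREFIX)),
        None,
    )
    return {
        "author": params.get("rft.au", [None])[0],
        "date": params.get("rft.date", [None])[0],
        "doi": doi,
    }


FIELDS = openurl_fields(OPENURL)
print(FIELDS)
```

The extracted DOI can be resolved by prefixing it with `https://doi.org/`.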