Curated-Processed-Reannotated Turkish e-commerce sentimet analysis dataset

The dataset was compiled from publicly available sources, including Hugging Face, GitHub, and Kaggle. To ensure data quality, we performed preprocessing steps such as deduplication, removal of non-Turkish entries, and exclusion of short reviews (fewer than three words). Python and the pandas library...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Ezin, Ercan, Savran Kiziltepe, Rukiye
Format: Dataset
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Ezin, Ercan
Savran Kiziltepe, Rukiye
description The dataset was compiled from publicly available sources, including Hugging Face, GitHub, and Kaggle. To ensure data quality, we performed preprocessing steps such as deduplication, removal of non-Turkish entries, and exclusion of short reviews (fewer than three words). Python and the pandas library were used for data cleaning and formatting. For sentiment labeling, we used ChatGPT4-o-mini in a zero-shot approach, batch-processing approximately 100 reviews per request. We chose zero-shot labeling after observing that providing additional instructions led to a decline in labeling accuracy with ChatGPT4-o-mini. The prompt instructed the model to classify each review’s sentiment as Positive, Negative, or Neutral, without any specific examples or prior information. The prompt format was: • System Message: "You are a sentiment analysis assistant." • User Message: "Please analyze the sentiment of the following review dictionary and return the result in the format 'id,label' where label should be one of these: Positive, Negative, or Neutral." This zero-shot approach resulted in high consistency, which we validated by comparing the model's output with human annotations, observing a strong correlation in sentiment labeling accuracy. This ensured reliable labeling across the entire dataset.
doi_str_mv 10.17632/nvkcfnkh47.1
format Dataset
fullrecord <record><control><sourceid>datacite_PQ8</sourceid><recordid>TN_cdi_datacite_primary_10_17632_nvkcfnkh47_1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_17632_nvkcfnkh47_1</sourcerecordid><originalsourceid>FETCH-datacite_primary_10_17632_nvkcfnkh47_13</originalsourceid><addsrcrecordid>eNqVjrEKwjAURbM4iDq65wdSGyv2A4oiTiLdwyN9pSFNKnmvQv9eK4Kz073cc4cjxFbnmS6PxX4Xn9620XeHMtNLca3GBIyNuqXBItG73RFiHHheZT0m76iTqOwQAiaLkjCyC8gSIvQTOZINMBDyWixa6Ak331wJdT7V1UXN3DpG80guQJqMzs3HxfxcjC7-_b8A5vdF0Q</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>dataset</recordtype></control><display><type>dataset</type><title>Curated-Processed-Reannotated Turkish e-commerce sentimet analysis dataset</title><source>DataCite</source><creator>Ezin, Ercan ; Savran Kiziltepe, Rukiye</creator><creatorcontrib>Ezin, Ercan ; Savran Kiziltepe, Rukiye</creatorcontrib><description>The dataset was compiled from publicly available sources, including Hugging Face, GitHub, and Kaggle. To ensure data quality, we performed preprocessing steps such as deduplication, removal of non-Turkish entries, and exclusion of short reviews (fewer than three words). Python and the pandas library were used for data cleaning and formatting. For sentiment labeling, we used ChatGPT4-o-mini in a zero-shot approach, batch-processing approximately 100 reviews per request. We chose zero-shot labeling after observing that providing additional instructions led to a decline in labeling accuracy with ChatGPT4-o-mini. The prompt instructed the model to classify each review’s sentiment as Positive, Negative, or Neutral, without any specific examples or prior information. The prompt format was: • System Message: "You are a sentiment analysis assistant." • User Message: "Please analyze the sentiment of the following review dictionary and return the result in the format 'id,label' where label should be one of these: Positive, Negative, or Neutral." This zero-shot approach resulted in high consistency, which we validated by comparing the model's output with human annotations, observing a strong correlation in sentiment labeling accuracy. This ensured reliable labeling across the entire dataset.</description><identifier>DOI: 10.17632/nvkcfnkh47.1</identifier><language>eng</language><publisher>Mendeley Data</publisher><subject>e-Commerce ; Large Language Model ; Sentiment Analysis ; Turkish Language</subject><creationdate>2024</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0002-3862-7621</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>781,1895</link.rule.ids><linktorsrc>$$Uhttps://commons.datacite.org/doi.org/10.17632/nvkcfnkh47.1$$EView_record_in_DataCite.org$$FView_record_in_$$GDataCite.org$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Ezin, Ercan</creatorcontrib><creatorcontrib>Savran Kiziltepe, Rukiye</creatorcontrib><title>Curated-Processed-Reannotated Turkish e-commerce sentimet analysis dataset</title><description>The dataset was compiled from publicly available sources, including Hugging Face, GitHub, and Kaggle. To ensure data quality, we performed preprocessing steps such as deduplication, removal of non-Turkish entries, and exclusion of short reviews (fewer than three words). Python and the pandas library were used for data cleaning and formatting. For sentiment labeling, we used ChatGPT4-o-mini in a zero-shot approach, batch-processing approximately 100 reviews per request. We chose zero-shot labeling after observing that providing additional instructions led to a decline in labeling accuracy with ChatGPT4-o-mini. The prompt instructed the model to classify each review’s sentiment as Positive, Negative, or Neutral, without any specific examples or prior information. The prompt format was: • System Message: "You are a sentiment analysis assistant." • User Message: "Please analyze the sentiment of the following review dictionary and return the result in the format 'id,label' where label should be one of these: Positive, Negative, or Neutral." This zero-shot approach resulted in high consistency, which we validated by comparing the model's output with human annotations, observing a strong correlation in sentiment labeling accuracy. This ensured reliable labeling across the entire dataset.</description><subject>e-Commerce</subject><subject>Large Language Model</subject><subject>Sentiment Analysis</subject><subject>Turkish Language</subject><fulltext>true</fulltext><rsrctype>dataset</rsrctype><creationdate>2024</creationdate><recordtype>dataset</recordtype><sourceid>PQ8</sourceid><recordid>eNqVjrEKwjAURbM4iDq65wdSGyv2A4oiTiLdwyN9pSFNKnmvQv9eK4Kz073cc4cjxFbnmS6PxX4Xn9620XeHMtNLca3GBIyNuqXBItG73RFiHHheZT0m76iTqOwQAiaLkjCyC8gSIvQTOZINMBDyWixa6Ak331wJdT7V1UXN3DpG80guQJqMzs3HxfxcjC7-_b8A5vdF0Q</recordid><startdate>20241029</startdate><enddate>20241029</enddate><creator>Ezin, Ercan</creator><creator>Savran Kiziltepe, Rukiye</creator><general>Mendeley Data</general><scope>DYCCY</scope><scope>PQ8</scope><orcidid>https://orcid.org/0000-0002-3862-7621</orcidid></search><sort><creationdate>20241029</creationdate><title>Curated-Processed-Reannotated Turkish e-commerce sentimet analysis dataset</title><author>Ezin, Ercan ; Savran Kiziltepe, Rukiye</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-datacite_primary_10_17632_nvkcfnkh47_13</frbrgroupid><rsrctype>datasets</rsrctype><prefilter>datasets</prefilter><language>eng</language><creationdate>2024</creationdate><topic>e-Commerce</topic><topic>Large Language Model</topic><topic>Sentiment Analysis</topic><topic>Turkish Language</topic><toplevel>online_resources</toplevel><creatorcontrib>Ezin, Ercan</creatorcontrib><creatorcontrib>Savran Kiziltepe, Rukiye</creatorcontrib><collection>DataCite (Open Access)</collection><collection>DataCite</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Ezin, Ercan</au><au>Savran Kiziltepe, Rukiye</au><format>book</format><genre>unknown</genre><ristype>DATA</ristype><title>Curated-Processed-Reannotated Turkish e-commerce sentimet analysis dataset</title><date>2024-10-29</date><risdate>2024</risdate><abstract>The dataset was compiled from publicly available sources, including Hugging Face, GitHub, and Kaggle. To ensure data quality, we performed preprocessing steps such as deduplication, removal of non-Turkish entries, and exclusion of short reviews (fewer than three words). Python and the pandas library were used for data cleaning and formatting. For sentiment labeling, we used ChatGPT4-o-mini in a zero-shot approach, batch-processing approximately 100 reviews per request. We chose zero-shot labeling after observing that providing additional instructions led to a decline in labeling accuracy with ChatGPT4-o-mini. The prompt instructed the model to classify each review’s sentiment as Positive, Negative, or Neutral, without any specific examples or prior information. The prompt format was: • System Message: "You are a sentiment analysis assistant." • User Message: "Please analyze the sentiment of the following review dictionary and return the result in the format 'id,label' where label should be one of these: Positive, Negative, or Neutral." This zero-shot approach resulted in high consistency, which we validated by comparing the model's output with human annotations, observing a strong correlation in sentiment labeling accuracy. This ensured reliable labeling across the entire dataset.</abstract><pub>Mendeley Data</pub><doi>10.17632/nvkcfnkh47.1</doi><orcidid>https://orcid.org/0000-0002-3862-7621</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.17632/nvkcfnkh47.1
ispartof
issn
language eng
recordid cdi_datacite_primary_10_17632_nvkcfnkh47_1
source DataCite
subjects e-Commerce
Large Language Model
Sentiment Analysis
Turkish Language
title Curated-Processed-Reannotated Turkish e-commerce sentimet analysis dataset
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-16T05%3A22%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-datacite_PQ8&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=unknown&rft.au=Ezin,%20Ercan&rft.date=2024-10-29&rft_id=info:doi/10.17632/nvkcfnkh47.1&rft_dat=%3Cdatacite_PQ8%3E10_17632_nvkcfnkh47_1%3C/datacite_PQ8%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true