Curated-Processed-Reannotated Turkish e-commerce sentimet analysis dataset

The dataset was compiled from publicly available sources, including Hugging Face, GitHub, and Kaggle. To ensure data quality, we performed preprocessing steps such as deduplication, removal of non-Turkish entries, and exclusion of short reviews (fewer than three words). Python and the pandas library...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Ezin, Ercan, Savran Kiziltepe, Rukiye
Format:	Dataset
Sprache:	eng
Schlagworte:	e-Commerce Large Language Model Sentiment Analysis Turkish Language
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Ezin, Ercan Savran Kiziltepe, Rukiye
description	The dataset was compiled from publicly available sources, including Hugging Face, GitHub, and Kaggle. To ensure data quality, we performed preprocessing steps such as deduplication, removal of non-Turkish entries, and exclusion of short reviews (fewer than three words). Python and the pandas library were used for data cleaning and formatting. For sentiment labeling, we used ChatGPT4-o-mini in a zero-shot approach, batch-processing approximately 100 reviews per request. We chose zero-shot labeling after observing that providing additional instructions led to a decline in labeling accuracy with ChatGPT4-o-mini. The prompt instructed the model to classify each review’s sentiment as Positive, Negative, or Neutral, without any specific examples or prior information. The prompt format was: • System Message: "You are a sentiment analysis assistant." • User Message: "Please analyze the sentiment of the following review dictionary and return the result in the format 'id,label' where label should be one of these: Positive, Negative, or Neutral." This zero-shot approach resulted in high consistency, which we validated by comparing the model's output with human annotations, observing a strong correlation in sentiment labeling accuracy. This ensured reliable labeling across the entire dataset.
doi_str_mv	10.17632/nvkcfnkh47.1
format	Dataset
fullrecord	<record><control><sourceid>datacite_PQ8</sourceid><recordid>TN_cdi_datacite_primary_10_17632_nvkcfnkh47_1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_17632_nvkcfnkh47_1</sourcerecordid><originalsourceid>FETCH-datacite_primary_10_17632_nvkcfnkh47_13</originalsourceid><addsrcrecordid>eNqVjrEKwjAURbM4iDq65wdSGyv2A4oiTiLdwyN9pSFNKnmvQv9eK4Kz073cc4cjxFbnmS6PxX4Xn9620XeHMtNLca3GBIyNuqXBItG73RFiHHheZT0m76iTqOwQAiaLkjCyC8gSIvQTOZINMBDyWixa6Ak331wJdT7V1UXN3DpG80guQJqMzs3HxfxcjC7-_b8A5vdF0Q</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>dataset</recordtype></control><display><type>dataset</type><title>Curated-Processed-Reannotated Turkish e-commerce sentimet analysis dataset</title><source>DataCite</source><creator>Ezin, Ercan ; Savran Kiziltepe, Rukiye</creator><creatorcontrib>Ezin, Ercan ; Savran Kiziltepe, Rukiye</creatorcontrib><description>The dataset was compiled from publicly available sources, including Hugging Face, GitHub, and Kaggle. To ensure data quality, we performed preprocessing steps such as deduplication, removal of non-Turkish entries, and exclusion of short reviews (fewer than three words). Python and the pandas library were used for data cleaning and formatting. For sentiment labeling, we used ChatGPT4-o-mini in a zero-shot approach, batch-processing approximately 100 reviews per request. We chose zero-shot labeling after observing that providing additional instructions led to a decline in labeling accuracy with ChatGPT4-o-mini. The prompt instructed the model to classify each review’s sentiment as Positive, Negative, or Neutral, without any specific examples or prior information. The prompt format was: • System Message: "You are a sentiment analysis assistant." • User Message: "Please analyze the sentiment of the following review dictionary and return the result in the format 'id,label' where label should be one of these: Positive, Negative, or Neutral." This zero-shot approach resulted in high consistency, which we validated by comparing the model's output with human annotations, observing a strong correlation in sentiment labeling accuracy. This ensured reliable labeling across the entire dataset.</description><identifier>DOI: 10.17632/nvkcfnkh47.1</identifier><language>eng</language><publisher>Mendeley Data</publisher><subject>e-Commerce ; Large Language Model ; Sentiment Analysis ; Turkish Language</subject><creationdate>2024</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0002-3862-7621</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>781,1895</link.rule.ids><linktorsrc>$$Uhttps://commons.datacite.org/doi.org/10.17632/nvkcfnkh47.1$$EView_record_in_DataCite.org$$FView_record_in_$$GDataCite.org$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Ezin, Ercan</creatorcontrib><creatorcontrib>Savran Kiziltepe, Rukiye</creatorcontrib><title>Curated-Processed-Reannotated Turkish e-commerce sentimet analysis dataset</title><description>The dataset was compiled from publicly available sources, including Hugging Face, GitHub, and Kaggle. To ensure data quality, we performed preprocessing steps such as deduplication, removal of non-Turkish entries, and exclusion of short reviews (fewer than three words). Python and the pandas library were used for data cleaning and formatting. For sentiment labeling, we used ChatGPT4-o-mini in a zero-shot approach, batch-processing approximately 100 reviews per request. We chose zero-shot labeling after observing that providing additional instructions led to a decline in labeling accuracy with ChatGPT4-o-mini. The prompt instructed the model to classify each review’s sentiment as Positive, Negative, or Neutral, without any specific examples or prior information. The prompt format was: • System Message: "You are a sentiment analysis assistant." • User Message: "Please analyze the sentiment of the following review dictionary and return the result in the format 'id,label' where label should be one of these: Positive, Negative, or Neutral." This zero-shot approach resulted in high consistency, which we validated by comparing the model's output with human annotations, observing a strong correlation in sentiment labeling accuracy. This ensured reliable labeling across the entire dataset.</description><subject>e-Commerce</subject><subject>Large Language Model</subject><subject>Sentiment Analysis</subject><subject>Turkish Language</subject><fulltext>true</fulltext><rsrctype>dataset</rsrctype><creationdate>2024</creationdate><recordtype>dataset</recordtype><sourceid>PQ8</sourceid><recordid>eNqVjrEKwjAURbM4iDq65wdSGyv2A4oiTiLdwyN9pSFNKnmvQv9eK4Kz073cc4cjxFbnmS6PxX4Xn9620XeHMtNLca3GBIyNuqXBItG73RFiHHheZT0m76iTqOwQAiaLkjCyC8gSIvQTOZINMBDyWixa6Ak331wJdT7V1UXN3DpG80guQJqMzs3HxfxcjC7-_b8A5vdF0Q</recordid><startdate>20241029</startdate><enddate>20241029</enddate><creator>Ezin, Ercan</creator><creator>Savran Kiziltepe, Rukiye</creator><general>Mendeley Data</general><scope>DYCCY</scope><scope>PQ8</scope><orcidid>https://orcid.org/0000-0002-3862-7621</orcidid></search><sort><creationdate>20241029</creationdate><title>Curated-Processed-Reannotated Turkish e-commerce sentimet analysis dataset</title><author>Ezin, Ercan ; Savran Kiziltepe, Rukiye</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-datacite_primary_10_17632_nvkcfnkh47_13</frbrgroupid><rsrctype>datasets</rsrctype><prefilter>datasets</prefilter><language>eng</language><creationdate>2024</creationdate><topic>e-Commerce</topic><topic>Large Language Model</topic><topic>Sentiment Analysis</topic><topic>Turkish Language</topic><toplevel>online_resources</toplevel><creatorcontrib>Ezin, Ercan</creatorcontrib><creatorcontrib>Savran Kiziltepe, Rukiye</creatorcontrib><collection>DataCite (Open Access)</collection><collection>DataCite</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Ezin, Ercan</au><au>Savran Kiziltepe, Rukiye</au><format>book</format><genre>unknown</genre><ristype>DATA</ristype><title>Curated-Processed-Reannotated Turkish e-commerce sentimet analysis dataset</title><date>2024-10-29</date><risdate>2024</risdate><abstract>The dataset was compiled from publicly available sources, including Hugging Face, GitHub, and Kaggle. To ensure data quality, we performed preprocessing steps such as deduplication, removal of non-Turkish entries, and exclusion of short reviews (fewer than three words). Python and the pandas library were used for data cleaning and formatting. For sentiment labeling, we used ChatGPT4-o-mini in a zero-shot approach, batch-processing approximately 100 reviews per request. We chose zero-shot labeling after observing that providing additional instructions led to a decline in labeling accuracy with ChatGPT4-o-mini. The prompt instructed the model to classify each review’s sentiment as Positive, Negative, or Neutral, without any specific examples or prior information. The prompt format was: • System Message: "You are a sentiment analysis assistant." • User Message: "Please analyze the sentiment of the following review dictionary and return the result in the format 'id,label' where label should be one of these: Positive, Negative, or Neutral." This zero-shot approach resulted in high consistency, which we validated by comparing the model's output with human annotations, observing a strong correlation in sentiment labeling accuracy. This ensured reliable labeling across the entire dataset.</abstract><pub>Mendeley Data</pub><doi>10.17632/nvkcfnkh47.1</doi><orcidid>https://orcid.org/0000-0002-3862-7621</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.17632/nvkcfnkh47.1
ispartof
issn
language	eng
recordid	cdi_datacite_primary_10_17632_nvkcfnkh47_1
source	DataCite
subjects	e-Commerce Large Language Model Sentiment Analysis Turkish Language
title	Curated-Processed-Reannotated Turkish e-commerce sentimet analysis dataset
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-16T05%3A22%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-datacite_PQ8&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=unknown&rft.au=Ezin,%20Ercan&rft.date=2024-10-29&rft_id=info:doi/10.17632/nvkcfnkh47.1&rft_dat=%3Cdatacite_PQ8%3E10_17632_nvkcfnkh47_1%3C/datacite_PQ8%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true