Curated-Processed-Reannotated Turkish e-commerce sentimet analysis dataset
The dataset was compiled from publicly available sources, including Hugging Face, GitHub, and Kaggle. To ensure data quality, we performed preprocessing steps such as deduplication, removal of non-Turkish entries, and exclusion of short reviews (fewer than three words). Python and the pandas library...
Gespeichert in:
Hauptverfasser: | , |
---|---|
Format: | Dataset |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Ezin, Ercan Savran Kiziltepe, Rukiye |
description | The dataset was compiled from publicly available sources, including Hugging Face, GitHub, and Kaggle. To ensure data quality, we performed preprocessing steps such as deduplication, removal of non-Turkish entries, and exclusion of short reviews (fewer than three words). Python and the pandas library were used for data cleaning and formatting.
For sentiment labeling, we used ChatGPT4-o-mini in a zero-shot approach, batch-processing approximately 100 reviews per request. We chose zero-shot labeling after observing that providing additional instructions led to a decline in labeling accuracy with ChatGPT4-o-mini. The prompt instructed the model to classify each review’s sentiment as Positive, Negative, or Neutral, without any specific examples or prior information. The prompt format was:
• System Message: "You are a sentiment analysis assistant."
• User Message: "Please analyze the sentiment of the following review dictionary and return the result in the format 'id,label' where label should be one of these: Positive, Negative, or Neutral."
This zero-shot approach resulted in high consistency, which we validated by comparing the model's output with human annotations, observing a strong correlation in sentiment labeling accuracy. This ensured reliable labeling across the entire dataset. |
doi_str_mv | 10.17632/nvkcfnkh47.1 |
format | Dataset |
fullrecord | <record><control><sourceid>datacite_PQ8</sourceid><recordid>TN_cdi_datacite_primary_10_17632_nvkcfnkh47_1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_17632_nvkcfnkh47_1</sourcerecordid><originalsourceid>FETCH-datacite_primary_10_17632_nvkcfnkh47_13</originalsourceid><addsrcrecordid>eNqVjrEKwjAURbM4iDq65wdSGyv2A4oiTiLdwyN9pSFNKnmvQv9eK4Kz073cc4cjxFbnmS6PxX4Xn9620XeHMtNLca3GBIyNuqXBItG73RFiHHheZT0m76iTqOwQAiaLkjCyC8gSIvQTOZINMBDyWixa6Ak331wJdT7V1UXN3DpG80guQJqMzs3HxfxcjC7-_b8A5vdF0Q</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>dataset</recordtype></control><display><type>dataset</type><title>Curated-Processed-Reannotated Turkish e-commerce sentimet analysis dataset</title><source>DataCite</source><creator>Ezin, Ercan ; Savran Kiziltepe, Rukiye</creator><creatorcontrib>Ezin, Ercan ; Savran Kiziltepe, Rukiye</creatorcontrib><description>The dataset was compiled from publicly available sources, including Hugging Face, GitHub, and Kaggle. To ensure data quality, we performed preprocessing steps such as deduplication, removal of non-Turkish entries, and exclusion of short reviews (fewer than three words). Python and the pandas library were used for data cleaning and formatting.
For sentiment labeling, we used ChatGPT4-o-mini in a zero-shot approach, batch-processing approximately 100 reviews per request. We chose zero-shot labeling after observing that providing additional instructions led to a decline in labeling accuracy with ChatGPT4-o-mini. The prompt instructed the model to classify each review’s sentiment as Positive, Negative, or Neutral, without any specific examples or prior information. The prompt format was:
• System Message: "You are a sentiment analysis assistant."
• User Message: "Please analyze the sentiment of the following review dictionary and return the result in the format 'id,label' where label should be one of these: Positive, Negative, or Neutral."
This zero-shot approach resulted in high consistency, which we validated by comparing the model's output with human annotations, observing a strong correlation in sentiment labeling accuracy. This ensured reliable labeling across the entire dataset.</description><identifier>DOI: 10.17632/nvkcfnkh47.1</identifier><language>eng</language><publisher>Mendeley Data</publisher><subject>e-Commerce ; Large Language Model ; Sentiment Analysis ; Turkish Language</subject><creationdate>2024</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0002-3862-7621</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>781,1895</link.rule.ids><linktorsrc>$$Uhttps://commons.datacite.org/doi.org/10.17632/nvkcfnkh47.1$$EView_record_in_DataCite.org$$FView_record_in_$$GDataCite.org$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Ezin, Ercan</creatorcontrib><creatorcontrib>Savran Kiziltepe, Rukiye</creatorcontrib><title>Curated-Processed-Reannotated Turkish e-commerce sentimet analysis dataset</title><description>The dataset was compiled from publicly available sources, including Hugging Face, GitHub, and Kaggle. To ensure data quality, we performed preprocessing steps such as deduplication, removal of non-Turkish entries, and exclusion of short reviews (fewer than three words). Python and the pandas library were used for data cleaning and formatting.
For sentiment labeling, we used ChatGPT4-o-mini in a zero-shot approach, batch-processing approximately 100 reviews per request. We chose zero-shot labeling after observing that providing additional instructions led to a decline in labeling accuracy with ChatGPT4-o-mini. The prompt instructed the model to classify each review’s sentiment as Positive, Negative, or Neutral, without any specific examples or prior information. The prompt format was:
• System Message: "You are a sentiment analysis assistant."
• User Message: "Please analyze the sentiment of the following review dictionary and return the result in the format 'id,label' where label should be one of these: Positive, Negative, or Neutral."
This zero-shot approach resulted in high consistency, which we validated by comparing the model's output with human annotations, observing a strong correlation in sentiment labeling accuracy. This ensured reliable labeling across the entire dataset.</description><subject>e-Commerce</subject><subject>Large Language Model</subject><subject>Sentiment Analysis</subject><subject>Turkish Language</subject><fulltext>true</fulltext><rsrctype>dataset</rsrctype><creationdate>2024</creationdate><recordtype>dataset</recordtype><sourceid>PQ8</sourceid><recordid>eNqVjrEKwjAURbM4iDq65wdSGyv2A4oiTiLdwyN9pSFNKnmvQv9eK4Kz073cc4cjxFbnmS6PxX4Xn9620XeHMtNLca3GBIyNuqXBItG73RFiHHheZT0m76iTqOwQAiaLkjCyC8gSIvQTOZINMBDyWixa6Ak331wJdT7V1UXN3DpG80guQJqMzs3HxfxcjC7-_b8A5vdF0Q</recordid><startdate>20241029</startdate><enddate>20241029</enddate><creator>Ezin, Ercan</creator><creator>Savran Kiziltepe, Rukiye</creator><general>Mendeley Data</general><scope>DYCCY</scope><scope>PQ8</scope><orcidid>https://orcid.org/0000-0002-3862-7621</orcidid></search><sort><creationdate>20241029</creationdate><title>Curated-Processed-Reannotated Turkish e-commerce sentimet analysis dataset</title><author>Ezin, Ercan ; Savran Kiziltepe, Rukiye</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-datacite_primary_10_17632_nvkcfnkh47_13</frbrgroupid><rsrctype>datasets</rsrctype><prefilter>datasets</prefilter><language>eng</language><creationdate>2024</creationdate><topic>e-Commerce</topic><topic>Large Language Model</topic><topic>Sentiment Analysis</topic><topic>Turkish Language</topic><toplevel>online_resources</toplevel><creatorcontrib>Ezin, Ercan</creatorcontrib><creatorcontrib>Savran Kiziltepe, Rukiye</creatorcontrib><collection>DataCite (Open Access)</collection><collection>DataCite</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Ezin, Ercan</au><au>Savran Kiziltepe, Rukiye</au><format>book</format><genre>unknown</genre><ristype>DATA</ristype><title>Curated-Processed-Reannotated Turkish e-commerce sentimet analysis dataset</title><date>2024-10-29</date><risdate>2024</risdate><abstract>The dataset was compiled from publicly available sources, including Hugging Face, GitHub, and Kaggle. To ensure data quality, we performed preprocessing steps such as deduplication, removal of non-Turkish entries, and exclusion of short reviews (fewer than three words). Python and the pandas library were used for data cleaning and formatting.
For sentiment labeling, we used ChatGPT4-o-mini in a zero-shot approach, batch-processing approximately 100 reviews per request. We chose zero-shot labeling after observing that providing additional instructions led to a decline in labeling accuracy with ChatGPT4-o-mini. The prompt instructed the model to classify each review’s sentiment as Positive, Negative, or Neutral, without any specific examples or prior information. The prompt format was:
• System Message: "You are a sentiment analysis assistant."
• User Message: "Please analyze the sentiment of the following review dictionary and return the result in the format 'id,label' where label should be one of these: Positive, Negative, or Neutral."
This zero-shot approach resulted in high consistency, which we validated by comparing the model's output with human annotations, observing a strong correlation in sentiment labeling accuracy. This ensured reliable labeling across the entire dataset.</abstract><pub>Mendeley Data</pub><doi>10.17632/nvkcfnkh47.1</doi><orcidid>https://orcid.org/0000-0002-3862-7621</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.17632/nvkcfnkh47.1 |
ispartof | |
issn | |
language | eng |
recordid | cdi_datacite_primary_10_17632_nvkcfnkh47_1 |
source | DataCite |
subjects | e-Commerce Large Language Model Sentiment Analysis Turkish Language |
title | Curated-Processed-Reannotated Turkish e-commerce sentimet analysis dataset |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-16T05%3A22%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-datacite_PQ8&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=unknown&rft.au=Ezin,%20Ercan&rft.date=2024-10-29&rft_id=info:doi/10.17632/nvkcfnkh47.1&rft_dat=%3Cdatacite_PQ8%3E10_17632_nvkcfnkh47_1%3C/datacite_PQ8%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |