Curated-Processed-Reannotated Turkish e-commerce sentimet analysis dataset

The dataset was compiled from publicly available sources, including Hugging Face, GitHub, and Kaggle. To ensure data quality, we performed preprocessing steps such as deduplication, removal of non-Turkish entries, and exclusion of short reviews (fewer than three words). Python and the pandas library...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Ezin, Ercan, Savran Kiziltepe, Rukiye
Format: Dataset
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:The dataset was compiled from publicly available sources, including Hugging Face, GitHub, and Kaggle. To ensure data quality, we performed preprocessing steps such as deduplication, removal of non-Turkish entries, and exclusion of short reviews (fewer than three words). Python and the pandas library were used for data cleaning and formatting. For sentiment labeling, we used ChatGPT4-o-mini in a zero-shot approach, batch-processing approximately 100 reviews per request. We chose zero-shot labeling after observing that providing additional instructions led to a decline in labeling accuracy with ChatGPT4-o-mini. The prompt instructed the model to classify each review’s sentiment as Positive, Negative, or Neutral, without any specific examples or prior information. The prompt format was: • System Message: "You are a sentiment analysis assistant." • User Message: "Please analyze the sentiment of the following review dictionary and return the result in the format 'id,label' where label should be one of these: Positive, Negative, or Neutral." This zero-shot approach resulted in high consistency, which we validated by comparing the model's output with human annotations, observing a strong correlation in sentiment labeling accuracy. This ensured reliable labeling across the entire dataset.
DOI:10.17632/nvkcfnkh47.1