Clustering Tasks and Decision Trees with Elegiac Poets
Main author: | |
---|---|
Format: | Dataset |
Language: | spa |
Summary: | The dataset contains files generated during a Natural Language Processing (NLP) and automatic text analysis task. Attached is a Jupyter notebook with the complete code, along with several Excel files (.xlsx) containing organized information. Additionally, there are three folders that include files generated during the Silhouette calculation, K-means clustering, and feature extraction using decision trees.
The three folders are:
1. Silhouette Calculation: Contains PNG images of Silhouette plots for various analysis configurations.
2. K-means Clustering: Contains pickle (.pkl) files with features and labels for each combination of excluded author, n-gram type, n-gram range, and matrix type.
3. Feature Extraction: Contains CSV files with lists of documents by cluster and the most important features along with information gain and information gain ratio metrics.
Other file formats included in the dataset are:
- CSV files containing Silhouette scores, optimal clustering results, cluster assignments, and optimal cluster assignments.
- PNG images of scatter plots colored by author and by cluster.
- Pickle files containing the top features extracted during the analysis. |
---|---|
DOI: | 10.5281/zenodo.12682693 |
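
The summary mentions information gain and information gain ratio as the metrics stored alongside the extracted features. As a rough illustration of what those two numbers measure, the following is a minimal sketch using only the Python standard library. It is not the code from the dataset's notebook: the function names and the toy values are hypothetical, and the notebook may compute these metrics differently (for example through a decision-tree library).

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def information_gain(feature_values, labels):
    """Drop in label entropy after splitting the documents on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels inside each feature-value group.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature_values, labels):
    """Information gain normalised by the entropy of the split itself."""
    split_info = entropy(feature_values)
    if split_info == 0:  # the feature takes a single value: no split at all
        return 0.0
    return information_gain(feature_values, labels) / split_info


# Hypothetical toy data: cluster assignment of four documents and whether
# each document contains some n-gram (1) or not (0). Not taken from the dataset.
clusters = [0, 0, 1, 1]
informative_ngram = [1, 1, 0, 0]    # separates the two clusters perfectly
uninformative_ngram = [1, 0, 1, 0]  # tells nothing about the cluster

ig_informative = information_gain(informative_ngram, clusters)
ig_uninformative = information_gain(uninformative_ngram, clusters)
gr_informative = gain_ratio(informative_ngram, clusters)
```

In this toy case the n-gram that separates the two clusters perfectly has an information gain of 1 bit, while the one that is evenly spread across both clusters has a gain of 0. The gain ratio divides the gain by the entropy of the split, which penalises features that fragment the documents into many small groups.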