Product Length Predictions with Machine Learning: An Integrated Approach Using Extreme Gradient Boosting

Bibliographic details
Published in: SN Computer Science, 2024-06, Vol. 5 (6), p. 659, Article 659
Authors: Thakur, Abhishek; Kumar, Ankit; Mishra, Sudhansu Kumar; Behera, Subhendu Kumar; Sethi, Jagannath; Sahu, Sitanshu Sekhar; Swain, Subrat Kumar
Format: Article
Language: English
Description
Abstract: The study introduces a novel machine learning approach for predicting product lengths that handles diverse data types, including numeric, textual and categorical data, and extracts valuable information from the dataset to improve prediction accuracy. This is achieved by combining text vectorization, a gradient boosting algorithm and feature encoding of categorical data, specifically Term Frequency-Inverse Document Frequency (TF-IDF), eXtreme Gradient Boosting (XGBoost) and target encoding. Our method begins with thorough data preparation, removing outliers and filling in missing values, and then extracts important features from the product titles, descriptions and bullet points present in the dataset. We convert the text of product titles, descriptions and bullet points into numerical form using TF-IDF, which captures the weighted frequency of words as TF-IDF feature vectors and thereby makes the text usable by the learning algorithm. Our training process uses RandomizedSearchCV to optimize the XGBoost model's hyperparameters on the TF-IDF vectors and target-encoded product type IDs. This allows the model to handle variability and uncertainty in product length predictions effectively. These techniques make the method adaptable and enable accurate prediction of product length in e-commerce, which can support inventory management across diverse products. Its utility can further extend to optimizing supply chain operations, improving demand forecasting across a variety of products, and aiding strategic planning for procurement and stock levels.
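The two feature-construction steps named in the abstract, TF-IDF vectors over the concatenated title, description and bullet-point text, and target encoding of product type IDs, can be sketched as follows. This is a minimal, standard-library illustration under stated assumptions, not the authors' code: the field names (`title`, `description`, `bullet_points`, `product_type_id`), the whitespace tokenizer and the `smoothing` parameter of the target encoder are hypothetical choices. Outlier removal, imputation of numeric fields, and the XGBoost model itself (which would typically be an `xgboost.XGBRegressor` tuned with scikit-learn's `RandomizedSearchCV`) are not shown.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace; punctuation handling is omitted for brevity.
    return text.lower().split()


def tfidf_fit(docs):
    """Learn a sorted vocabulary and smoothed IDF weights from training documents.

    Uses idf(t) = ln((1 + n) / (1 + df(t))) + 1, the same smoothed form as
    scikit-learn's TfidfVectorizer default.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    vocab = sorted(df)
    idf = {term: math.log((1 + n) / (1 + df[term])) + 1 for term in vocab}
    return vocab, idf


def tfidf_transform(doc, vocab, idf):
    """Map one document to an L2-normalised TF-IDF vector over the learned vocabulary."""
    counts = Counter(tokenize(doc))  # terms outside the vocabulary are ignored
    vec = [counts[term] * idf[term] for term in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def target_encode_fit(categories, targets, smoothing=10.0):
    """Smoothed mean target per category (e.g. per product type ID).

    Rare categories are shrunk toward the global mean; fit this on training
    rows only so that the encoding does not leak validation targets.
    """
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), Counter()
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    encoding = {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }
    return encoding, global_mean


def build_features(product, vocab, idf, encoding, global_mean):
    """Concatenate TF-IDF text features with the encoded product type ID.

    The field names are hypothetical stand-ins for the dataset's title,
    description, bullet-point and product-type columns.
    """
    text = " ".join(
        product.get(field) or ""  # treat missing text fields as empty strings
        for field in ("title", "description", "bullet_points")
    )
    # Product types unseen during fitting fall back to the global mean target.
    type_code = encoding.get(product["product_type_id"], global_mean)
    return tfidf_transform(text, vocab, idf) + [type_code]
```

In practice the vocabulary, IDF weights and category encodings would be fitted on the training split only and reused unchanged for validation and test rows; the resulting feature rows are what a gradient-boosted regressor would consume.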
ISSN: 2661-8907, 2662-995X
DOI: 10.1007/s42979-024-02999-8