Strengthening Sentence Similarity Identification Through OpenAI Embeddings and Deep Learning

Bibliographic details
Published in: International journal of advanced computer science & applications, 2024, Vol. 15 (4)
Main authors: Korade, Nilesh B., Salunke, Mahendra B., Bhosle, Amol A., Kumbharkar, Prashant B., Asalkar, Gayatri G., Khedkar, Rutuja G.
Format: Article
Language: English
Online access: Full text
Description
Summary: Detecting similarity between sentences benefits a variety of systems, including customer support chatbots, educational platforms, e-commerce customer inquiries, and community forums or question-answering systems. One of the primary problems that online question-answering platforms and customer service chatbots face is the large number of duplicate inquiries posted on the platform. In addition to cluttering the platform, these repetitive queries degrade the quality of the content and make it harder for visitors to locate pertinent information. Sentence similarity therefore needs to be detected automatically in order to improve the user experience and meet user expectations quickly. The present study uses the Quora dataset to construct a framework for discovering similarity in sentence pairs. As part of the research, the authors built additional attributes from the textual data to improve the accuracy of similarity prediction. The study investigates several vectorization methods and their influence on accuracy. To convert preprocessed text input into numerical vectors, the authors implemented Word2Vec, FastText, Term Frequency-Inverse Document Frequency (TF-IDF), CountVectorizer (CV), and OpenAI embeddings. To judge sentence similarity, the vectors produced by each approach were used with several methods: cosine similarity, Random Forest (RF), AdaBoost, XGBoost, LSTM, and CNN. The results show that all algorithms trained on OpenAI embeddings yield excellent outcomes. The OpenAI embeddings provide rich information to the models trained on them and have significant potential for capturing sentence similarity.
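The simplest pairing named in the summary, count-based vectors compared by cosine similarity, can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the function names (`tokenize`, `count_vector`, `cosine_similarity`, `is_duplicate`), and the 0.75 threshold are all assumptions chosen for illustration, and the sketch uses only the Python standard library rather than the CountVectorizer implementation the study refers to.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed preprocessing)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def count_vector(text):
    """Map each token to its raw count, giving a sparse bag-of-words vector."""
    return Counter(tokenize(text))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_duplicate(q1, q2, threshold=0.75):
    """Flag a question pair as duplicate when similarity reaches a hypothetical threshold."""
    return cosine_similarity(count_vector(q1), count_vector(q2)) >= threshold


if __name__ == "__main__":
    q1 = "How do I learn Python?"
    q2 = "How can I learn Python?"
    print(cosine_similarity(count_vector(q1), count_vector(q2)))
    print(is_duplicate(q1, q2))
```

Count vectors only capture word overlap, so paraphrases that share few words score low; this is the kind of limitation that the learned embeddings and trained classifiers compared in the study are meant to address.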
ISSN: 2158-107X, 2156-5570
DOI: 10.14569/IJACSA.2024.0150485