Enhancing User Experience on Q&A Platforms: Measuring Text Similarity Based on Hybrid CNN-LSTM Model for Efficient Duplicate Question Detection

Bibliographic details
Published in: IEEE Access 2024, Vol. 12, p. 34512-34526
Main authors: Faseeh, Muhammad; Khan, Murad Ali; Iqbal, Naeem; Qayyum, Faiza; Mehmood, Asif; Kim, Jungsuk
Format: Article
Language: English
Description
Abstract: This research introduces an innovative approach for identifying duplicate questions within the Stack Overflow community, a challenging task in NLP. Leveraging deep learning techniques, our proposed methodology combines Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks to capture both local and long-term dependencies in textual data. We employ word embeddings, specifically Google's Word2Vec and GloVe, to enhance text representation. Extensive experiments on the Stack Overflow dataset demonstrate the effectiveness of our approach, achieving an impressive accuracy of 87.09% and a recall rate of 87.%. The integration of CNN and LSTM models significantly streamlines preprocessing, making it a valuable tool for detecting duplicate questions. Future directions include extending the model to multiple languages and exploring alternative word embedding techniques. Our approach presents promising applications beyond Stack Overflow, offering solutions for identifying similar questions on various QA platforms.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3358422
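
The accuracy and recall quoted in the abstract are standard metrics for a binary classifier over question pairs (duplicate or not duplicate). The following is a minimal sketch of how such figures are typically computed, not the authors' evaluation code. The function name `accuracy_and_recall` and the toy labels are hypothetical and are not taken from the paper or its dataset.

```python
def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall) for binary labels, where 1 marks a duplicate pair."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labeled pair is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # duplicates found
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-duplicates kept apart
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # duplicates missed

    accuracy = (tp + tn) / len(pairs)
    # Recall is the share of true duplicates that the model flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Hypothetical toy labels for eight question pairs (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

acc, rec = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={acc:.2%} recall={rec:.2%}")
```

Recall matters for duplicate detection because a missed duplicate (a false negative) leaves a redundant question on the platform, whereas accuracy alone can look high when most pairs are non-duplicates.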