Semantic segment extraction and matching for Internet FAQ retrieval

Bibliographic details
Published in:	IEEE Transactions on Knowledge and Data Engineering, 2006-07, Vol. 18 (7), pp. 930-940
Main authors:	WU, Chung-Hsien; YEH, Jui-Feng; LAI, Yu-Sheng
Format:	Article
Language:	English
Subjects:	Applied sciences; Artificial intelligence; Computer science; control theory; systems; Computer systems and distributed systems. User interface; Data mining; Data processing. List processing. Character string processing; deduction and theorem proving; Exact sciences and technology; Explosives; Filtering; Information retrieval; Information systems. Data bases; Internet; knowledge processing; Matching; Mathematical models; Memory organisation. Data processing; Natural language processing; Natural languages; Navigation; Ontologies; Queries; query formulation; Recall; retrieval models; Search engines; Segments; Semantics; Similarity; Software; Speech and sound recognition and synthesis. Linguistics; Studies

Description
Abstract:	This investigation presents a novel approach to semantic segment extraction and matching for retrieving information from Internet FAQs with natural language queries. Two semantic segments, the question category segment (QS) and the keyword segment (KS), are extracted from the input queries and the FAQ questions with a semiautomatically derived question-semantic grammar. A semantic matching method is presented to estimate the similarity between the semantic segments of the query and the questions in the FAQ collection. Additionally, the vector space model (VSM) is adopted to measure the similarity between the query and the answers of the QA pairs. Finally, a multistage ranking strategy is adopted to determine the optimally performing combination of similarity metrics. The experimental results illustrate that the proposed method achieves an average rank of 4.52 and a top-10 recall rate of 90.89 percent. Compared with the query-expansion method, this method improves the performance by 4.82 places in the average rank of correct answers, 25.34 percent in the top-5 recall rate, and 5.21 percent in the top-10 recall rate.
ISSN:	1041-4347, 1558-2191
DOI:	10.1109/TKDE.2006.115
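
Illustrative sketch (not part of the catalog record or the article): the abstract mentions a vector space model for scoring a query against FAQ answers, and reports results as average rank and top-k recall. The Python below shows one common way such a score and such metrics can be computed. It is not the authors' implementation. The TF-IDF weighting, the whitespace tokenizer, and every function name here are assumptions made for illustration, and the grammar-based QS/KS segment matching and the multistage ranking described in the abstract are not covered.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_idf(documents):
    """Smoothed inverse document frequency for each term in the collection."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Sparse TF-IDF vector; terms absent from the collection are ignored."""
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_answers(query, answers):
    """Return answer indices ordered from most to least similar to the query."""
    idf = build_idf(answers)
    q_vec = tfidf_vector(query, idf)
    scores = [cosine(q_vec, tfidf_vector(a, idf)) for a in answers]
    return sorted(range(len(answers)), key=lambda i: scores[i], reverse=True)


def average_rank(ranks):
    """Mean 1-based rank at which the correct answer was returned."""
    return sum(ranks) / len(ranks)


def top_k_recall(ranks, k):
    """Fraction of queries whose correct answer appears within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Toy data, invented for this example only.
answers = [
    "reset your password from the account settings page",
    "shipping usually takes three to five business days",
    "refunds are issued to the original payment method",
]
ranking = rank_answers("how do i reset my password", answers)
print(ranking)
```

In this toy collection only the first answer shares terms with the query, so it is ranked first. The two metric helpers take a list of 1-based ranks of the correct answer, one per test query, which mirrors how figures such as "average rank" and "top-10 recall rate" are usually defined.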