A new text-based w-distance metric to find the perfect match between words

Bibliographic details
Published in:	Journal of intelligent & fuzzy systems 2020-01, Vol.38 (3), p.2661-2672
Main authors:	Ali, Munwar; Jung, Low Tang; Hosam, Osama; Wagan, Asif Ali; Shah, Rehan Ali; Khayyat, Mashael
Format:	Article
Language:	eng
Subjects:	Algorithms; Computer simulation; Data mining; Euclidean geometry; Machine learning; Similarity; Similarity measures; Strings
Online access:	Full text

Description
Summary:	The k-NN algorithm is an instance-based learning algorithm that is widely used in data mining applications. Its core engine is the distance/similarity function, and its performance varies with the choice of that function. The traditional distance/similarity functions used in k-NN do not handle mix-mode words well, that is, cases where one string contains multiple substrings or words. Examples are the two-word string “Employee Name”, the one-word string “Name”, and a string of more than two words such as “Name of Employee”. Different distance/similarity functions face this ambiguity, which makes it difficult to find the perfect match between words. To improve perfect-match calculation in the traditional k-NN algorithm, a new similarity distance metric named word-distance (w-distance) is developed. A perfect match helps to identify the exact required value. The proposed w-distance is a hybrid of distance and similarity because it handles the dissimilarity and similarity features of strings at the same time. The simulation results showed that w-distance yields better k-NN performance than the Euclidean distance and the cosine similarity.
ISSN:	1064-1246, 1875-8967
DOI:	10.3233/JIFS-179552
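
Illustration:	The ambiguity described in the summary can be reproduced with the two baseline measures it names. The sketch below is not the w-distance itself, whose definition is not given in this record. It assumes a simple bag-of-words representation (lowercased, whitespace-split tokens), and the helper names `tokens`, `cosine_similarity`, `euclidean_distance`, and `nearest` are hypothetical, chosen only for this example.

```python
import math
from collections import Counter


def tokens(s):
    # Assumption: words are lowercased and split on whitespace.
    return s.lower().split()


def cosine_similarity(a, b):
    # Cosine similarity between bag-of-words count vectors.
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    # Euclidean distance between bag-of-words count vectors.
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    vocab = set(ca) | set(cb)
    return math.sqrt(sum((ca[t] - cb[t]) ** 2 for t in vocab))


def nearest(query, candidates, k=1):
    # k-NN style lookup: the k candidates closest to the query.
    return sorted(candidates, key=lambda c: euclidean_distance(query, c))[:k]


if __name__ == "__main__":
    query = "Employee Name"
    for candidate in ["Name", "Name of Employee", "Employee Name"]:
        print(
            candidate,
            round(euclidean_distance(query, candidate), 4),
            round(cosine_similarity(query, candidate), 4),
        )
```

Under these assumptions, the Euclidean distance from “Employee Name” to “Name” and to “Name of Employee” is the same, so a k-NN lookup based on it cannot tell the two candidates apart, while cosine similarity ranks them differently. This is the kind of disagreement between measures that the summary points to.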