Possibilistic Similarity Measures for Data Science and Machine Learning Applications

Bibliographic Details
Published in: IEEE Access, 2020-01, Vol. 8, p. 1-1
Main Authors: Charfi, Amal, Bouhamed, Sonda Ammar, Bosse, Eloi, Kallel, Imene Khanfir, Bouchaala, Wassim, Solaiman, Basel, Derbel, Nabil
Format: Article
Language: English
Subjects:
Online Access: Full text
Description
Abstract: Measuring similarity is of great interest in many research areas such as data science, machine learning, pattern recognition, text analysis and information retrieval, to name a few. The literature has shown that possibility is an attractive notion in the context of distinguishability assessment and can lead to very efficient and computationally inexpensive learning schemes. This paper focuses on determining the similarity between two possibility distributions. A review of existing similarity measures within the possibilistic framework is presented first. Then, the similarity measures are analyzed with respect to their capacity to satisfy a set of properties that a similarity measure should possess. Most of the existing possibilistic similarity measures produce undesirable outcomes since they generally depend on the application context. A new similarity measure, called InfoSpecificity, is introduced, and the similarity measures are categorized into three main methods: morphic-based, amorphic-based and hybrid. Two experiments are conducted using four benchmark databases. The aim of the experiments is to compare the efficiency of the possibilistic similarity measures when applied to real data. The empirical experiments have shown good results for the hybrid methods, particularly with the InfoSpecificity measure. In general, the hybrid methods outperform the other two categories when evaluated on small samples, i.e., in a poor-data context (or poorly informed environment) where possibility theory can be used to the greatest benefit.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.2979553
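The abstract above does not reproduce the InfoSpecificity formula, so the following is only a minimal Python sketch of one distance-based possibilistic similarity measure of the kind the paper reviews, in the style of the information-affinity family from the possibilistic literature. The function names, the weights kappa and lam, and the example distributions are illustrative assumptions, not the paper's own definitions.

import numpy as np

def manhattan_distance(pi1, pi2):
    # Normalized Manhattan (L1) distance between two possibility distributions
    # defined over the same finite universe.
    pi1, pi2 = np.asarray(pi1, dtype=float), np.asarray(pi2, dtype=float)
    return np.abs(pi1 - pi2).sum() / len(pi1)

def inconsistency(pi1, pi2):
    # Inconsistency degree: 1 minus the height of the conjunctive (min) combination.
    pi1, pi2 = np.asarray(pi1, dtype=float), np.asarray(pi2, dtype=float)
    return 1.0 - np.minimum(pi1, pi2).max()

def affinity_similarity(pi1, pi2, kappa=1.0, lam=1.0):
    # Distance-based possibilistic similarity: combines the normalized L1
    # distance with the inconsistency of the two distributions and returns
    # a value in [0, 1], where 1 means the distributions are identical.
    d = manhattan_distance(pi1, pi2)
    inc = inconsistency(pi1, pi2)
    return 1.0 - (kappa * d + lam * inc) / (kappa + lam)

# Example: two normalized possibility distributions over the same four states.
pi_a = [1.0, 0.7, 0.3, 0.0]
pi_b = [0.9, 1.0, 0.2, 0.1]
print(affinity_similarity(pi_a, pi_b))  # about 0.875 for these distributions

The weights kappa and lam simply balance the distance term against the inconsistency term; equal weights are a common default in the literature, but they are an assumption here rather than a recommendation from the paper.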