On Rank Selection in Non-Negative Matrix Factorization Using Concordance

The choice of the factorization rank of a matrix is critical, e.g., in dimensionality reduction, filtering, clustering, deconvolution, etc., because selecting a rank that is too high amounts to adjusting the noise, while selecting a rank that is too low results in the oversimplification of the signa...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Mathematics (Basel) 2023-11, Vol.11 (22), p.4611
Hauptverfasser: Fogel, Paul, Geissler, Christophe, Morizet, Nicolas, Luta, George
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:The choice of the factorization rank of a matrix is critical, e.g., in dimensionality reduction, filtering, clustering, deconvolution, etc., because selecting a rank that is too high amounts to adjusting the noise, while selecting a rank that is too low results in the oversimplification of the signal. Numerous methods for selecting the factorization rank of a non-negative matrix have been proposed. One of them is the cophenetic correlation coefficient (ccc), widely used in data science to evaluate the number of clusters in a hierarchical clustering. In previous work, it was shown that ccc performs better than other methods for rank selection in non-negative matrix factorization (NMF) when the underlying structure of the matrix consists of orthogonal clusters. In this article, we show that using the ratio of ccc to the approximation error significantly improves the accuracy of the rank selection. We also propose a new criterion, concordance, which, like ccc, benefits from the stochastic nature of NMF; its accuracy is also improved by using its ratio-to-error form. Using real and simulated data, we show that concordance, with a CUSUM-based automatic detection algorithm for its original or ratio-to-error forms, significantly outperforms ccc. It is important to note that the new criterion works for a broader class of matrices, where the underlying clusters are not assumed to be orthogonal.
ISSN:2227-7390
2227-7390
DOI:10.3390/math11224611