Continuous Semi-Supervised Nonnegative Matrix Factorization


Bibliographic details
Published in: Algorithms 2023-04, Vol. 16 (4), p. 187
Main authors: Lindstrom, Michael R., Ding, Xiaofu, Liu, Feng, Somayajula, Anand, Needell, Deanna
Format: Article
Language: English
Description
Abstract: Nonnegative matrix factorization can be used to automatically detect topics within a corpus in an unsupervised fashion. The technique approximates a nonnegative matrix as the product of two nonnegative matrices of lower rank. In certain applications it is desirable to extract topics and use them to predict quantitative outcomes. In this paper, we show that nonnegative matrix factorization can be combined with regression on a continuous response variable by minimizing a penalty function that adds a weighted regression error to a matrix factorization error. We show theoretically that as the weighting increases, the regression error in training decreases weakly. We test our method on synthetic data and on real data from Rate My Professors reviews, predicting an instructor’s rating from the text of their reviews. In practice, when used as a dimensionality reduction method (when the number of topics chosen in the model is fewer than the true number of topics), the method performs better than doing regression after topics are identified, both during training and testing, and it retains interpretability.
ISSN:1999-4893
DOI:10.3390/a16040187
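
The penalty function described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the names `penalty`, `matmul`, `X`, `y`, `W`, `H`, `theta`, and `lam` are hypothetical, the squared-error form and the orientation of the matrices (columns of `X` as documents) are guesses, and the paper's exact objective may differ, for example in scaling constants. The sketch only evaluates the objective; it does not perform the minimization.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def penalty(X, y, W, H, theta, lam):
    """Factorization error plus lam times regression error (assumed form).

    X: n x m nonnegative data matrix (columns are documents)
    W: n x r topic matrix, H: r x m topic weights per document
    y: m continuous responses, theta: r regression weights
    lam: nonnegative weighting of the regression error
    """
    WH = matmul(W, H)
    # Squared Frobenius norm of X - WH.
    factorization_error = sum(
        (x - wh) ** 2
        for x_row, wh_row in zip(X, WH)
        for x, wh in zip(x_row, wh_row)
    )
    # Predicted response for document j is the dot product of theta
    # with column j of H.
    predictions = [sum(t * h for t, h in zip(theta, col)) for col in zip(*H)]
    regression_error = sum((yj - pj) ** 2 for yj, pj in zip(y, predictions))
    return factorization_error + lam * regression_error


# Toy data (made up): X factors exactly as W H, so only the
# regression term contributes to the penalty.
X = [[1.0, 2.0], [2.0, 4.0]]
W = [[1.0], [2.0]]
H = [[1.0, 2.0]]
theta = [3.0]
y = [3.0, 5.0]
value = penalty(X, y, W, H, theta, lam=2.0)  # 0 + 2 * (0^2 + 1^2) = 2.0
```

With `lam = 0` the objective reduces to the plain factorization error, which corresponds to the unsupervised case; larger values of `lam` put more weight on fitting the response.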