Semisupervised learning of hierarchical latent trait models for data visualization

Bibliographic details
Published in:	IEEE Transactions on Knowledge and Data Engineering, 2005-03, Vol. 17 (3), p. 384-400
Main authors:	Nabney, I.T.; Sun, Y.; Tino, P.; Kaban, A.
Format:	Article
Language:	English
Subjects:	Applied sciences; Boundaries; Clusters; Computer science, control theory, systems; Construction; Data analysis; Data mining; Data visualization; Document mining; Exact sciences and technology; Feedback; Gaussian noise; Hierarchical model; Information systems. Data bases; Interactive; Latent trait model; Magnification factors; Mathematical models; Memory organisation. Data processing; Multidimensional systems; Quantization; Semisupervised learning; Software; Studies; Sun; Visualization

Description
Abstract:	Recently, we have developed the hierarchical generative topographic mapping (HGTM), an interactive method for visualization of large high-dimensional real-valued data sets. We propose a more general visualization system by extending HGTM in three ways, which allows the user to visualize a wider range of data sets and better support the model development process. 1) We integrate HGTM with noise models from the exponential family of distributions. The basic building block is the latent trait model (LTM). This enables us to visualize data of inherently discrete nature, e.g., collections of documents, in a hierarchical manner. 2) We give the user a choice of initializing the child plots of the current plot in either interactive or automatic mode. In the interactive mode, the user selects "regions of interest", whereas in the automatic mode, an unsupervised minimum message length (MML)-inspired construction of a mixture of LTMs is employed. The unsupervised construction is particularly useful when high-level plots are covered with dense clusters of highly overlapping data projections, making it difficult to use the interactive mode. Such a situation often arises when visualizing large data sets. 3) We derive general formulas for magnification factors in latent trait models. Magnification factors are a useful tool to improve our understanding of the visualization plots, since they can highlight the boundaries between data clusters. We illustrate our approach on a toy example and evaluate it on three more complex real data sets.
ISSN:	1041-4347, 1558-2191
DOI:	10.1109/TKDE.2005.49