A Self-Organizing Principle for Learning Nonlinear Manifolds

Bibliographic Details
Published in: Proceedings of the National Academy of Sciences - PNAS 2002-12, Vol.99 (25), p.15869-15872
Main authors: Agrafiotis, Dimitris K.; Xu, Huafeng
Format: Article
Language: English
Description
Summary: Modern science confronts us with massive amounts of data: expression profiles of thousands of human genes, multimedia documents, subjective judgments on consumer products or political candidates, trade indices, global climate patterns, etc. These data are often highly structured, but that structure is hidden in a complex set of relationships or high-dimensional abstractions. Here we present a self-organizing algorithm for embedding a set of related observations into a low-dimensional space that preserves the intrinsic dimensionality and metric structure of the data. The embedding is carried out by using an iterative pairwise refinement strategy that attempts to preserve local geometry while maintaining a minimum separation between distant objects. In effect, the method views the proximities between remote objects as lower bounds of their true geodesic distances and uses them as a means to impose global structure. Unlike previous approaches, our method can reveal the underlying geometry of the manifold without intensive nearest-neighbor or shortest-path computations and can reproduce the true geodesic distances of the data points in the low-dimensional embedding without requiring that these distances be estimated from the data sample. More importantly, the method is found to scale linearly with the number of points and can be applied to very large data sets that are intractable by conventional embedding procedures.
ISSN: 0027-8424, 1091-6490
DOI: 10.1073/pnas.242424399
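
The iterative pairwise refinement described in the summary can be illustrated with a minimal sketch. The summary does not state the update rule, the learning-rate schedule, or how "local" and "remote" pairs are told apart, so everything concrete here is an assumption made for illustration: the function name `pairwise_refine`, the `cutoff` radius that separates local from remote pairs, the linearly decaying `rate`, and the symmetric half-step update. Pairs within the cutoff are always moved toward their input distance; pairs beyond it are only pushed apart when the embedding places them closer than their input distance, which treats that distance as a lower bound.

```python
import math
import random


def pairwise_refine(points, dist, dim=2, cutoff=1.0, steps=20_000,
                    rate_start=1.0, rate_end=0.01, seed=0):
    """Embed `points` in `dim` dimensions by random pairwise refinement.

    Hypothetical sketch: parameter names, defaults, and the update rule
    are assumptions, not taken from the catalog summary.
    """
    rng = random.Random(seed)
    n = len(points)
    # Start from random coordinates in the unit cube.
    coords = [[rng.random() for _ in range(dim)] for _ in range(n)]
    eps = 1e-9  # guards against division by zero for coincident points
    for step in range(steps):
        # Assumed schedule: learning rate decays linearly over the run.
        rate = rate_start + (rate_end - rate_start) * step / steps
        i, j = rng.sample(range(n), 2)           # one random pair per step
        target = dist(points[i], points[j])      # input-space proximity
        current = math.dist(coords[i], coords[j])
        # Local pairs: always refine. Remote pairs: refine only if the
        # embedding violates the lower bound (points are too close).
        if target <= cutoff or current < target:
            scale = rate * 0.5 * (target - current) / (current + eps)
            for k in range(dim):
                delta = scale * (coords[i][k] - coords[j][k])
                coords[i][k] += delta
                coords[j][k] -= delta
    return coords


# Usage: 50 points on a half circle in 2D, unrolled onto a line (1D).
arc = [(math.cos(t), math.sin(t))
       for t in (math.pi * k / 49 for k in range(50))]
line = pairwise_refine(arc, math.dist, dim=1, cutoff=0.2)
print(line[0], line[-1])
```

Each step touches a single pair and needs no neighbor lists or shortest-path tables. In this sketch the number of steps is a free parameter chosen by the caller.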