cite2vec: Citation-Driven Document Exploration via Word Embeddings

Effectively exploring and browsing document collections is a fundamental problem in visualization. Traditionally, document visualization is based on a data model that represents each document as the set of its comprised words, effectively characterizing what the document is. In this paper we take an...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	IEEE transactions on visualization and computer graphics 2017-01, Vol.23 (1), p.691-700
Hauptverfasser:	Berger, Matthew, McDonough, Katherine, Seversky, Lee M.
Format:	Artikel
Sprache:	eng
Schlagworte:	Citation analysis Computer vision Context Data visualization document visualization Documents Object detection Scientific papers Semantics Tracking Two dimensional displays Visualization word embeddings
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Effectively exploring and browsing document collections is a fundamental problem in visualization. Traditionally, document visualization is based on a data model that represents each document as the set of its comprised words, effectively characterizing what the document is. In this paper we take an alternative perspective: motivated by the manner in which users search documents in the research process, we aim to visualize documents via their usage, or how documents tend to be used. We present a new visualization scheme - cite2vec - that allows the user to dynamically explore and browse documents via how other documents use them, information that we capture through citation contexts in a document collection. Starting from a usage-oriented word-document 2D projection, the user can dynamically steer document projections by prescribing semantic concepts, both in the form of phrase/document compositions and document:phrase analogies, enabling the exploration and comparison of documents by their use. The user interactions are enabled by a joint representation of words and documents in a common high-dimensional embedding space where user-specified concepts correspond to linear operations of word and document vectors. Our case studies, centered around a large document corpus of computer vision research papers, highlight the potential for usage-based document visualization.
ISSN:	1077-2626 1941-0506
DOI:	10.1109/TVCG.2016.2598667