Track, Attend, and Parse (TAP): An End-to-End Framework for Online Handwritten Mathematical Expression Recognition

Bibliographic Details
Published in: IEEE Transactions on Multimedia, January 2019, Vol. 21, No. 1, pp. 221-233
Authors: Zhang, Jianshu; Du, Jun; Dai, Lirong
Format: Article
Language: English
Abstract: In this paper, we introduce Track, Attend, and Parse (TAP), an end-to-end neural network approach for online handwritten mathematical expression recognition (OHMER). The architecture of TAP consists of a tracker and a parser. The tracker employs a stack of bidirectional recurrent neural networks with gated recurrent units (GRUs) to model the input handwritten traces, which fully exploits the dynamic trajectory information available in OHMER. Following the tracker, the parser adopts a GRU equipped with guided hybrid attention (GHA) to generate notations. The proposed GHA is composed of a coverage-based spatial attention, a temporal attention, and an attention guider. Moreover, we demonstrate the strong complementarity between offline information (static-image input) and online information (ink-trajectory input) by blending a fully convolutional network-based watcher into TAP. Inherently, unlike traditional methods, this end-to-end framework requires neither explicit symbol segmentation nor a predefined expression grammar for parsing. Validated on benchmarks published by the CROHME competition, the proposed approach outperforms the state-of-the-art methods and achieves the best reported results, with expression recognition accuracies of 61.16% on CROHME 2014 and 57.02% on CROHME 2016, using only the official training dataset.
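
As an illustration of the pipeline summarized in the abstract, the following is a minimal, hypothetical PyTorch sketch of a TAP-style encoder-decoder: a stacked bidirectional GRU "tracker" encodes per-point trajectory features, and a GRU "parser" attends over the resulting annotations to emit symbols one step at a time. It is not the authors' implementation; the guided hybrid attention (coverage-based spatial attention, temporal attention, attention guider) and the FCN-based watcher are omitted, and the class names, layer sizes, 9-dimensional input features, and 120-token vocabulary are all arbitrary assumptions made for this example.

```python
# Illustrative sketch only (not the authors' code): a bidirectional-GRU
# "tracker" over pen-trace features plus a GRU "parser" with additive
# attention over the tracker annotations.
import torch
import torch.nn as nn


class Tracker(nn.Module):
    """Stacked bidirectional GRU over trajectory features (e.g. x, y, pen state)."""
    def __init__(self, in_dim=9, hidden=256, layers=4):
        super().__init__()
        self.rnn = nn.GRU(in_dim, hidden, num_layers=layers,
                          bidirectional=True, batch_first=True)

    def forward(self, traces):             # traces: (B, T, in_dim)
        annotations, _ = self.rnn(traces)  # (B, T, 2 * hidden)
        return annotations


class Parser(nn.Module):
    """GRU decoder with additive attention over the tracker annotations."""
    def __init__(self, vocab, emb=256, hidden=256, ann_dim=512, att=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.cell = nn.GRUCell(emb + ann_dim, hidden)
        self.att_ann = nn.Linear(ann_dim, att)
        self.att_state = nn.Linear(hidden, att)
        self.att_score = nn.Linear(att, 1)
        self.out = nn.Linear(hidden, vocab)

    def step(self, prev_token, state, annotations):
        # Additive attention: score every trace position, build a context vector.
        e = self.att_score(torch.tanh(
            self.att_ann(annotations) + self.att_state(state).unsqueeze(1)))
        alpha = torch.softmax(e, dim=1)                  # (B, T, 1)
        context = (alpha * annotations).sum(dim=1)       # (B, ann_dim)
        state = self.cell(torch.cat([self.embed(prev_token), context], -1), state)
        return self.out(state), state                    # next-token logits, state


if __name__ == "__main__":
    # Shape check with random data: 8 traces of 150 points, 9 features each.
    tracker, parser = Tracker(), Parser(vocab=120)
    ann = tracker(torch.randn(8, 150, 9))
    logits, h = parser.step(torch.zeros(8, dtype=torch.long),
                            torch.zeros(8, 256), ann)
    print(logits.shape)  # torch.Size([8, 120])
```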
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2018.2844689