Mapping Natural Language Commands to Web Elements
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Abstract: | The web provides a rich, open-domain environment with textual, structural,
and spatial properties. We propose a new task for grounding language in this
environment: given a natural language command (e.g., "click on the second
article"), choose the correct element on the web page (e.g., a hyperlink or
text box). We collected a dataset of over 50,000 commands that capture various
phenomena such as functional references (e.g. "find who made this site"),
relational reasoning (e.g. "article by john"), and visual reasoning (e.g.
"top-most article"). We also implemented and analyzed three baseline models
that capture different phenomena present in the dataset. |
---|---|
DOI: | 10.48550/arxiv.1808.09132 |