Natural Language Interactions in Autonomous Vehicles: Intent Detection and Slot Filling from Passenger Utterances
Springer LNCS Proceedings for CICLing 2019
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Understanding passenger intents and extracting relevant slots are important
building blocks towards developing contextual dialogue systems for natural
interactions in autonomous vehicles (AV). In this work, we explored AMIE
(Automated-vehicle Multi-modal In-cabin Experience), the in-cabin agent
responsible for handling certain passenger-vehicle interactions. When the
passengers give instructions to AMIE, the agent should parse such commands
properly and trigger the appropriate functionality of the AV system. In our
current explorations, we focused on AMIE scenarios describing usages around
setting or changing the destination and route, updating driving behavior or
speed, finishing the trip and other use-cases to support various natural
commands. We collected a multi-modal in-cabin dataset with multi-turn dialogues
between the passengers and AMIE using a Wizard-of-Oz scheme via a realistic
scavenger hunt game activity. After exploring various recent Recurrent Neural
Network (RNN) based techniques, we introduced our own hierarchical joint
models to recognize passenger intents along with relevant slots associated with
the action to be performed in AV scenarios. In our experiments, these models
outperformed certain competitive baselines and achieved overall F1 scores of
0.91 for utterance-level intent detection and 0.96 for slot filling tasks. In
addition, we conducted initial speech-to-text explorations by comparing
intent/slot models trained and tested on human transcriptions versus noisy
Automatic Speech Recognition (ASR) outputs. Finally, we compared results for
single-passenger rides versus rides with multiple passengers. |
---|---|
DOI: | 10.48550/arxiv.1904.10500 |
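
The summary reports F1 scores for utterance-level intent detection and for slot filling. As a rough illustration of what those two tasks produce and how a per-label F1 can be computed, here is a minimal Python sketch. The intent name `SetDestination`, the IOB tag `Location`, the example utterance, and the helper `f1_for_label` are all hypothetical; this record does not give the paper's actual label set or averaging scheme.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LabeledUtterance:
    tokens: List[str]
    intent: str        # one utterance-level label (hypothetical name)
    slots: List[str]   # one IOB tag per token (hypothetical tag set)


def f1_for_label(gold: List[str], pred: List[str], label: str) -> float:
    """F1 for a single label over two aligned label sequences."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative passenger command with gold annotations (assumed, not from the dataset).
gold = LabeledUtterance(
    tokens=["take", "me", "to", "the", "nearest", "cafe"],
    intent="SetDestination",
    slots=["O", "O", "O", "O", "B-Location", "I-Location"],
)

# A made-up model prediction that misses the start of the slot.
predicted_slots = ["O", "O", "O", "O", "O", "I-Location"]

f1_b = f1_for_label(gold.slots, predicted_slots, "B-Location")
f1_i = f1_for_label(gold.slots, predicted_slots, "I-Location")
f1_o = f1_for_label(gold.slots, predicted_slots, "O")
```

The same helper applies to intent detection by passing one gold and one predicted intent label per utterance instead of one tag per token. An overall score would then average the per-label values, with the weighting left as a choice this sketch does not make.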