Towards Motion Forecasting with Real-World Perception Inputs: Are End-to-End Approaches Competitive?
Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Motion forecasting is crucial in enabling autonomous vehicles to anticipate
the future trajectories of surrounding agents. To do so, it requires solving
mapping, detection, tracking, and then forecasting problems, in a multi-step
pipeline. In this complex system, advances in conventional forecasting methods
have been made using curated data, i.e., with the assumption of perfect maps,
detection, and tracking. This paradigm, however, ignores any errors from
upstream modules. Meanwhile, an emerging end-to-end paradigm that tightly
integrates the perception and forecasting architectures into joint training
promises to solve this issue. However, the evaluation protocols of the two
paradigms have so far been incompatible, making a direct comparison impossible. In
fact, conventional forecasting methods are usually neither trained nor tested in
real-world pipelines (e.g., with upstream detection, tracking, and mapping
modules). In this work, we aim to bring forecasting models closer to
real-world deployment. First, we propose a unified evaluation pipeline for
forecasting methods with real-world perception inputs, allowing us to compare
conventional and end-to-end methods for the first time. Second, our in-depth
study uncovers a substantial performance gap when transitioning from curated to
perception-based data. In particular, we show that this gap (1) stems not only
from differences in precision but also from the nature of imperfect inputs
provided by perception modules, and (2) is not trivially reduced by simply
fine-tuning on perception outputs. Based on extensive experiments, we provide
recommendations for critical areas that require improvement and guidance
towards more robust motion forecasting in the real world. The evaluation
library for benchmarking models under standardized and practical conditions is
provided at https://github.com/valeoai/MFEval. |
---|---|
DOI: | 10.48550/arxiv.2306.09281 |