Consistent Aggregation of Objectives with Diverse Time Preferences Requires Non-Markovian Rewards
Format: Article
Language: English
Abstract: As the capabilities of artificial agents improve, they are being increasingly deployed to service multiple diverse objectives and stakeholders. However, the composition of these objectives is often performed ad hoc, with no clear justification. This paper takes a normative approach to multi-objective agency: from a set of intuitively appealing axioms, it is shown that Markovian aggregation of Markovian reward functions is not possible when the time preference (discount factor) for each objective may vary. It follows that optimal multi-objective agents must admit rewards that are non-Markovian with respect to the individual objectives. To this end, a practical non-Markovian aggregation scheme is proposed, which overcomes the impossibility with only one additional parameter for each objective. This work offers new insights into sequential, multi-objective agency and intertemporal choice, and has practical implications for the design of AI systems deployed to serve multiple generations of principals with varying time preference.
DOI: 10.48550/arxiv.2310.00435
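The impossibility claim summarized in the abstract can be made concrete with a small numeric sketch. The snippet below is not the paper's construction; the reward streams, weights, and discount factors are made-up placeholders. It only illustrates why scalarizing rewards first and then applying a single discount factor cannot in general reproduce the weighted sum of individually discounted returns, and how carrying one running discount accumulator per objective (loosely echoing the abstract's "one additional parameter for each objective", though the paper's actual scheme may differ) recovers the target value.

```python
# Minimal sketch (not the paper's construction) of aggregating two objectives
# with different discount factors. All numeric values are illustrative.

gammas = (0.9, 0.99)           # per-objective time preferences
weights = (0.5, 0.5)           # fixed aggregation weights
rewards = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # (r1_t, r2_t) at each step

# Target objective: weighted sum of each objective's own discounted return.
target = sum(
    w * sum(g**t * r[i] for t, r in enumerate(rewards))
    for i, (w, g) in enumerate(zip(weights, gammas))
)

def markovian_return(gamma):
    # Attempt 1 (Markovian): scalarize the per-step rewards first, then apply
    # a single discount factor. For general reward streams no single gamma
    # reproduces the target, because the effective weight between objectives
    # shifts over time when their discount factors differ.
    return sum(
        gamma**t * sum(w * r[i] for i, w in enumerate(weights))
        for t, r in enumerate(rewards)
    )

def non_markovian_return():
    # Attempt 2 (non-Markovian): keep one running discount accumulator per
    # objective as extra information and fold it into each step's aggregate
    # reward. The aggregate now depends on elapsed history, not just the
    # current state, i.e. it is non-Markovian w.r.t. the original objectives.
    accum = [1.0, 1.0]
    total = 0.0
    for r in rewards:
        total += sum(w * a * ri for w, a, ri in zip(weights, accum, r))
        accum = [a * g for a, g in zip(accum, gammas)]
    return total

print("target                :", round(target, 5))
print("single-gamma attempts :", {g: round(markovian_return(g), 5) for g in (0.9, 0.95, 0.99)})
print("per-objective accum.  :", round(non_markovian_return(), 5))
```

For these placeholder values the per-objective accumulator scheme matches the target exactly (1.89005), while the single-gamma scalarizations do not; the extra per-objective number is what makes the aggregate reward depend on history rather than on the current state alone.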