Reinforcement Learning of Multi-Party Trading Dialog Policies

Trading dialogs are a kind of negotiation in which an exchange of ownership of items is discussed, and these kinds of dialogs are pervasive in many situations. Recently, there has been an increasing amount of research on applying reinforcement learning (RL) to negotiation dialog domains. However, in...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Transactions of the Japanese Society for Artificial Intelligence 2015/07/01, Vol.31(4), pp.B-FC1_1-14
Hauptverfasser: Hiraoka, Takuya, Georgila, Kallirroi, Nouri, Elnaz, Traum, David, Nakamura, Satoshi
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Trading dialogs are a kind of negotiation in which an exchange of ownership of items is discussed, and these kinds of dialogs are pervasive in many situations. Recently, there has been an increasing amount of research on applying reinforcement learning (RL) to negotiation dialog domains. However, in previous research, the focus was on negotiation dialog between two participants only, ignoring cases where negotiation takes place between more than two interlocutors. In this paper, as a first study on multi-party negotiation, we apply RL to a multi-party trading scenario where the dialog system (learner) trades with one, two, or three other agents. We experiment with different RL algorithms and reward functions. We use Q-learning with linear function approximation, least-squares policy iteration, and neural fitted Q iteration. In addition, to make the learning process more efficient, we introduce an incremental reward function. The negotiation strategy of the learner is learned through simulated dialog with trader simulators. In our experiments, we evaluate how the performance of the learner varies depending on the RL algorithm used and the number of traders. Furthermore, we compare the learned dialog policies with two strong hand-crafted baseline dialog policies. Our results show that (1) even in simple multi-party trading dialog tasks, learning an effective negotiation policy is not a straightforward task and requires a lot of experimentation; and (2) the use of neural fitted Q iteration combined with an incremental reward function produces negotiation policies as effective or even better than the policies of the two strong hand-crafted baselines.
ISSN:1346-0714
1346-8030
DOI:10.1527/tjsai.B-FC1