A model of discrete choice based on reinforcement learning under short-term memory

Bibliographic Details
Published in: Journal of Mathematical Psychology, 2020-12, Vol. 99, p. 102455, Article 102455
Author: Perepelitsa, Misha
Format: Article
Language: English
Online Access: Full text
Description
Summary: A family of models of individual discrete choice is constructed by means of statistical averaging of the choices made by a subject in a reinforcement learning process, where the subject has a short, k-term memory span. The choice probabilities in these models combine, in a non-trivial, non-linear way, the initial learning bias and the experience gained through learning. The properties of such models are discussed and, in particular, it is shown that the choice probabilities deviate from Luce’s Choice Axiom; moreover, the axiom is recovered as the memory span increases. Two applications in utility theory are considered. In the first, we use the discrete choice model to generate a binary preference (weak stochastic order) relation on simple lotteries. We show that this relation violates the transitivity and independence axioms of expected utility theory. Furthermore, we establish the dependence of the preferences on frames, with risk aversion for gains and risk seeking for losses. Based on these findings, we consider a parametric model of choice built on the probability maximization principle, as a model for deviations from the expected utility principle. To illustrate the approach, we apply this model to the classical problem of demand for insurance.
•Reinforcement learning (RL) choice probabilities lead to non-LCA models.
•Binary preferences derived from RL choice probabilities are intransitive and nonlinear.
•Increasing the memory span of the decision maker brings choices into conformity with LCA.
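
The abstract does not spell out the learning rule, so the sketch below is only an illustration of the kind of construction it describes: a learner who remembers the last k payoffs received from each alternative and chooses among alternatives with probability proportional to the remembered payoff sums (a payoff-matching rule), with Bernoulli payoffs, a single optimistic memory entry standing in for the initial learning bias, and Monte Carlo averaging of the resulting choices. The function choice_probabilities and all parameter values are illustrative assumptions, not the paper’s actual model.

import random
from collections import deque

def choice_probabilities(payoff_means, k, n_steps=500_000, seed=1):
    """Monte Carlo estimate of long-run choice probabilities for a learner
    with a k-term memory span, under an ASSUMED payoff-matching rule:
    each alternative is chosen with probability proportional to the sum
    of the last k payoffs remembered for it."""
    rng = random.Random(seed)
    n = len(payoff_means)
    # One optimistic entry per alternative plays the role of the initial
    # learning bias; it is evicted once k real payoffs have arrived.
    memories = [deque([1.0], maxlen=k) for _ in range(n)]
    counts = [0] * n
    for _ in range(n_steps):
        weights = [sum(m) + 1e-9 for m in memories]  # epsilon avoids all-zero weights
        i = rng.choices(range(n), weights=weights)[0]
        counts[i] += 1
        # Bernoulli payoff; only the chosen alternative's memory is updated.
        memories[i].append(1.0 if rng.random() < payoff_means[i] else 0.0)
    return [c / n_steps for c in counts]

if __name__ == "__main__":
    # Constant-ratio (LCA) check: under Luce's Choice Axiom the odds of A
    # over B would not change when a third alternative C is offered.
    for k in (1, 50):
        pA, pB = choice_probabilities([0.8, 0.4], k)
        qA, qB, _ = choice_probabilities([0.8, 0.4, 0.6], k)
        print(f"k={k:2d}: odds(A:B) without C = {pA/pB:.3f}, with C = {qA/qB:.3f}")

In this sketch, with the short memory span the estimated odds of A over B shift when C is added to the menu, a violation of the constant-ratio form of Luce’s Choice Axiom, while with the long span the two menus give nearly identical odds, consistent with the abstract’s claim that LCA is recovered as the memory span increases.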
ISSN: 0022-2496
eISSN: 1096-0880
DOI: 10.1016/j.jmp.2020.102455