Sequential Interdiction with Incomplete Information and Learning
Published in: Operations Research, January 2019, Vol. 67, No. 1, pp. 72-89
Main authors: J.S. Borrero, O. Prokopyev, D. Saure
Format: Article
Language: English
Summary: In “Sequential Interdiction with Incomplete Information and Learning,” J.S. Borrero, O. Prokopyev, and D. Saure study a general class of interdiction problems in which the leader has incomplete information about the formulation solved by the follower and the two interact repeatedly over time. In this setting, the leader learns by observing the follower’s reactions to the leader’s actions. The authors show that strong notions of optimality are not attainable but that the proposed policies attain a form of weak optimality. These policies are greedy and robust in the sense that the leader reacts “optimally” in each period based on the information at hand and takes a robust approach to handling missing information. The policies are shown to provide a real-time certificate of optimality and, in a series of numerical experiments, to consistently outperform a reasonable benchmark.
We present a framework for a class of sequential decision-making problems in the context of general interdiction problems, in which a leader and a follower repeatedly interact. At each period, the leader allocates resources to disrupt the performance of the follower (e.g., as in defender–attacker or network interdiction problems), who, in turn, minimizes some cost function over a set of activities that depends on the leader’s decision. Although the follower has complete knowledge of its own problem, the leader has only partial information and needs to learn about the cost parameters, available resources, and the follower’s activities from the feedback generated by the follower’s actions. We measure a policy’s performance by its time-stability, defined as the number of periods it takes for the leader to match the actions of an oracle with complete information. In particular, we propose a class of greedy and robust policies and show that these policies are weakly optimal, eventually match the oracle’s actions, and provide a real-time certificate of optimality. We also study a lower bound on the performance of any policy based on the notion of a semioracle. Our numerical experiments demonstrate that the proposed policies consistently outperform a reasonable benchmark and perform fairly close to the semioracle.
The online appendix is available at https://doi.org/10.1287/opre.2018.1773.
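The Python sketch below is a deliberately simplified toy for building intuition about learning from follower feedback, time-stability, and a real-time optimality certificate. It is not the authors’ greedy-and-robust policies. It assumes, hypothetically, that the follower picks a single minimum-cost activity among those the leader has not interdicted, that the leader may interdict at most `k` activities, that the leader knows the set of activities and a common upper bound on their costs, and that the leader observes the cost of whichever activity the follower chooses. Unknown costs are filled in with that upper bound. This is an optimistic, swapped-in rule from the leader’s point of view, so the leader’s predicted follower cost never falls below the true one. If the observed cost equals the prediction, the current action is therefore full-information optimal. The names `oracle_value`, `greedy_action`, `simulate`, and `time_stability` are hypothetical. Time-stability is approximated by matching the oracle’s objective value rather than its action, because optimal actions need not be unique.

```python
import math
from typing import List, Optional, Set, Tuple

# One record per period: (period, interdicted activities, observed follower cost, certified?)
Record = Tuple[int, Set[int], float, bool]


def oracle_value(costs: List[float], k: int) -> float:
    """Full-information optimum of the toy game: interdicting the k cheapest
    activities leaves the follower paying the (k+1)-th smallest cost."""
    return sorted(costs)[k]


def greedy_action(estimates: List[float], k: int) -> Set[int]:
    """Interdict the k activities with the smallest estimated cost; this
    maximizes the follower's estimated minimum cost (ties broken by index)."""
    order = sorted(range(len(estimates)), key=lambda j: estimates[j])
    return set(order[:k])


def simulate(costs: List[float], k: int, upper_bound: float, horizon: int) -> List[Record]:
    """Repeated toy game (hypothetical). Assumes upper_bound >= every true cost
    and k < len(costs)."""
    n = len(costs)
    known: List[Optional[float]] = [None] * n  # costs revealed so far
    history: List[Record] = []
    for t in range(1, horizon + 1):
        # Fill unknown costs with the upper bound (optimistic for the leader).
        estimates = [c if c is not None else upper_bound for c in known]
        action = greedy_action(estimates, k)
        available = [j for j in range(n) if j not in action]
        predicted = min(estimates[j] for j in available)
        # The follower best-responds using the true costs.
        choice = min(available, key=lambda j: costs[j])
        observed = costs[choice]
        known[choice] = observed  # feedback: the chosen activity's cost
        # predicted is the largest estimated value over all actions, and estimates
        # never undercut true values, so observed == predicted certifies optimality.
        certified = math.isclose(observed, predicted)
        history.append((t, action, observed, certified))
    return history


def time_stability(history: List[Record], costs: List[float], k: int) -> int:
    """Last period whose follower cost differs from the oracle's value
    (0 if the leader matches the oracle from the first period on)."""
    target = oracle_value(costs, k)
    last_miss = 0
    for t, _, observed, _ in history:
        if not math.isclose(observed, target):
            last_miss = t
    return last_miss


if __name__ == "__main__":
    costs = [4.0, 1.0, 7.0, 3.0, 5.0]
    runs = simulate(costs, k=2, upper_bound=10.0, horizon=6)
    for t, action, observed, certified in runs:
        print(t, sorted(action), observed, certified)
    print("time-stability:", time_stability(runs, costs, k=2))
```

Take costs `[4.0, 1.0, 7.0, 3.0, 5.0]`, `k = 2`, and an upper bound of `10.0`. The toy leader first attains the oracle value `4.0` in period 3 but can only certify it in period 4. Matching the oracle and being able to prove that match are therefore different events. In every uncertified period, the follower reveals a previously unknown cost, so in this toy the certificate is reached after at most as many periods as there are activities.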
ISSN: 0030-364X, 1526-5463
DOI: 10.1287/opre.2018.1773