The Value-of-Information in Matching With Queues

Bibliographic details
Published in:	IEEE/ACM Transactions on Networking, 2017-02, Vol. 25 (1), pp. 29-42
Main author:	Huang, Longbo
Format:	Article
Language:	English
Subjects:	Algorithm design and analysis; Algorithms; Approximation algorithms; Control algorithms; Convergence; Crowdsourcing; Delay; Delays; dual learning; Heuristic algorithms; learning module; Machine learning; Matching; Modules; Optimization; queueing; Queues; Random access memory; System dynamics

Description
Abstract:	We consider the problem of optimal matching with queues in dynamic systems and investigate the value-of-information. In such systems, operators match tasks and resources stored in queues, with the objective of maximizing the system utility of the matching reward profile minus the average matching cost. This problem appears in many practical systems; the main challenges are the no-underflow constraints and the lack of matching-reward information and system dynamics statistics. We develop two online matching algorithms, Learning-aided Reward optimAl Matching (LRAM) and Dual-LRAM (DRAM), to resolve both challenges. Both algorithms are equipped with a learning module for estimating the matching-reward information, while DRAM incorporates an additional module for learning the system dynamics. We show that both algorithms achieve an $O(\epsilon + \delta_r)$ close-to-optimal utility performance for any $\epsilon > 0$, while DRAM achieves faster convergence and better delay than LRAM: $O(\delta_\pi/\epsilon + \log(1/\epsilon)^2)$ delay and $O(\delta_\pi/\epsilon)$ convergence under DRAM, compared with $O(1/\epsilon)$ delay and convergence under LRAM ($\delta_r$ and $\delta_\pi$ are the maximum estimation errors for the reward and the system dynamics, respectively). Our results show that information about different system components can play very different roles in algorithm performance, and they suggest a novel way to design joint learning-control algorithms.
ISSN:	1063-6692 1558-2566
DOI:	10.1109/TNET.2016.2564700
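
The no-underflow constraint and the reward-learning idea mentioned in the abstract can be illustrated with a toy model. The sketch below is not LRAM or DRAM; it is a minimal illustration under assumed simplifications: a single task queue and a single resource queue, one task and one resource consumed per match, and a plain sample mean as the reward estimate. The class name `MatchingQueues` and all of its fields and methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MatchingQueues:
    """Toy system with one task queue and one resource queue (assumed model)."""
    tasks: int = 0
    resources: int = 0
    reward_sum: float = 0.0  # total reward observed so far
    matches: int = 0         # total number of matches performed

    def arrive(self, new_tasks: int, new_resources: int) -> None:
        """Add newly arrived tasks and resources to their queues."""
        self.tasks += new_tasks
        self.resources += new_resources

    def match(self, requested: int, reward_per_match: float) -> int:
        """Perform up to `requested` matches without underflowing either queue."""
        # No-underflow: never match more items than either queue holds.
        feasible = max(0, min(requested, self.tasks, self.resources))
        self.tasks -= feasible
        self.resources -= feasible
        self.reward_sum += feasible * reward_per_match
        self.matches += feasible
        return feasible

    def estimated_reward(self) -> float:
        """Sample-mean estimate of the per-match reward (0.0 before any match)."""
        return self.reward_sum / self.matches if self.matches else 0.0
```

For example, after `q = MatchingQueues()` and `q.arrive(3, 1)`, a call to `q.match(2, reward_per_match=1.5)` performs only one match, because the resource queue holds a single item.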