Optimal Policy for Sequential Stochastic Resource Allocation
Document Type
Article
Publication Date
11-2018
Abstract
A gambler in possession of R chips/coins is allowed N(>R) pulls/trials at a slot machine. Upon pulling the arm, the slot machine realizes a random state i ɛ{1, ..., M} with probability p(i) and the corresponding positive monetary reward g(i) is presented to the gambler. The gambler can accept the reward by inserting a coin in the machine. However, the dilemma facing the gambler is whether to spend the coin or keep it in reserve hoping to pick up a greater reward in the future. We assume that the gambler has full knowledge of the reward distribution function. We are interested in the optimal gambling strategy that results in the maximal cumulative reward. The problem is naturally posed as a Stochastic Dynamic Program whose solution yields the optimal policy and expected cumulative reward. We show that the optimal strategy is a threshold policy, wherein a coin is spent if and only if the number of coins r exceeds a state and stage/trial dependent threshold value. We illustrate the utility of the result on a military operational scenario.
Source Publication
Procedia Computer Science
Recommended Citation
Krishnamoorthy, K., Pachter, M., & Casbeer, D. W. (2016). Optimal Policy for Sequential Stochastic Resource Allocation. Procedia Computer Science, 95, 483–488. https://doi.org/10.1016/j.procs.2016.09.325
Comments
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License, which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way. CC BY-NC-ND 4.0
Sourced from the published version of record cited below.
Part of the special issue of Procedia Computer Science: Complex Adaptive Systems Los Angeles, CA November 2-4, 2016