Approximate dynamic programming with state aggregation applied to UAV perimeter patrol
One encounters the curse of dimensionality when applying dynamic programming to determine optimal policies for large‐scale controlled Markov chains. In this paper, we provide a reward‐based aggregation method to construct suboptimal policies for a perimeter surveillance control problem that gives rise to a large‐scale Markov chain. The novelty of this approach lies in circumventing the need for value iteration over the entire state space. Instead, the state space is partitioned and the value function is approximated by a constant over each partition. We associate a meta‐state with each partition, and the transition probabilities between these meta‐states are known. The state‐aggregation approach significantly reduces the computational burden and lends itself to value iteration over the aggregated state space. We provide bounds to assess the quality of the approximation and give numerical results that support the proposed methodology.
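The aggregation scheme described above can be sketched in a few lines of NumPy. The sketch below is illustrative only: it uses a small random MDP and a fixed uniform blocking of states into meta‐states, whereas the paper derives its partitions from a reward‐based rule for the UAV patrol chain. All sizes (`n_states`, `n_meta`, the discount factor) and the uniform within‐partition weights are assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (hypothetical sizes; the paper's perimeter patrol chain is far larger).
n_states, n_actions, gamma = 12, 2, 0.9
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)            # row-stochastic transition matrices
r = rng.random((n_actions, n_states))        # stage rewards

# Partition the state space into meta-states. A fixed blocking is used here
# for illustration; a reward-based rule would choose the partitions instead.
n_meta = 4
block = n_states // n_meta
phi = np.repeat(np.arange(n_meta), block)    # state -> meta-state map
Agg = np.zeros((n_states, n_meta))
Agg[np.arange(n_states), phi] = 1.0          # aggregation (membership) matrix
D = Agg.T / block                            # disaggregation: uniform weights

# Aggregated dynamics and rewards; each row of P_meta is still stochastic.
P_meta = np.stack([D @ P[a] @ Agg for a in range(n_actions)])
r_meta = r @ D.T                             # shape (n_actions, n_meta)

# Value iteration over the (much smaller) aggregated state space.
V = np.zeros(n_meta)
for _ in range(1000):
    Q = r_meta + gamma * (P_meta @ V)        # shape (n_actions, n_meta)
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-12:
        break
    V = V_new

# The approximate value function is piecewise constant over the partitions:
# every state inherits the value of its meta-state.
V_approx = V[phi]
```

Because the iteration runs over `n_meta` meta‐states rather than `n_states` original states, each Bellman backup costs O(n_meta²) per action instead of O(n_states²), which is the source of the computational savings the abstract refers to.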