Reinforcement learning is one of the more attractive machine learning techniques because of its unsupervised learning structure and its ability to continue learning even as its operating environment changes. This capacity to learn without supervision in a changing environment extends to complex domains through function approximation of the domain's policy; the function approximation used here is fuzzy state aggregation. This article combines fuzzy state aggregation with the current policy hill climbing methods, Win or Learn Fast (WoLF) and policy-dynamics-based WoLF (PD-WoLF), and shows that the combination exceeds the learning rate and performance of fuzzy state aggregation paired with Q-learning. Test results in the TileWorld domain demonstrate that the policy hill climbing methods outperform the existing Q-learning implementations.
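The WoLF policy hill climbing update named above can be sketched as follows. This is a generic discrete-state sketch of WoLF-PHC (two learning rates, stepping faster when losing than when winning), not the paper's fuzzy-state-aggregation implementation; all class, method, and parameter names are illustrative, and an epsilon-greedy exploration term is added so the sketch does not lock onto an early policy.

```python
import random

class WoLFPHC:
    """Sketch of Win or Learn Fast policy hill climbing (WoLF-PHC).

    Discrete states only; the paper's fuzzy state aggregation layer is
    omitted. Parameter names and defaults are illustrative assumptions.
    """

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9,
                 delta_win=0.05, delta_lose=0.2):
        self.nA = n_actions
        self.alpha, self.gamma = alpha, gamma
        self.dw, self.dl = delta_win, delta_lose          # win < lose rate
        self.Q = [[0.0] * n_actions for _ in range(n_states)]
        self.pi = [[1.0 / n_actions] * n_actions for _ in range(n_states)]
        self.avg_pi = [[1.0 / n_actions] * n_actions for _ in range(n_states)]
        self.count = [0] * n_states

    def act(self, s, eps=0.1):
        # Sample from the mixed policy, with a little epsilon exploration.
        if random.random() < eps:
            return random.randrange(self.nA)
        r, cum = random.random(), 0.0
        for a, p in enumerate(self.pi[s]):
            cum += p
            if r <= cum:
                return a
        return self.nA - 1

    def update(self, s, a, reward, s_next):
        # Standard off-policy Q-learning backup.
        self.Q[s][a] += self.alpha * (
            reward + self.gamma * max(self.Q[s_next]) - self.Q[s][a])
        # Maintain the running-average policy for the win/lose test.
        self.count[s] += 1
        for b in range(self.nA):
            self.avg_pi[s][b] += (
                self.pi[s][b] - self.avg_pi[s][b]) / self.count[s]
        # "Winning" means the current policy scores better under Q
        # than the average policy does; then step cautiously.
        cur = sum(p * q for p, q in zip(self.pi[s], self.Q[s]))
        avg = sum(p * q for p, q in zip(self.avg_pi[s], self.Q[s]))
        delta = self.dw if cur > avg else self.dl
        # Hill-climb the policy toward the greedy action.
        best = max(range(self.nA), key=lambda b: self.Q[s][b])
        for b in range(self.nA):
            if b == best:
                self.pi[s][b] = min(1.0, self.pi[s][b] + delta)
            else:
                self.pi[s][b] = max(0.0, self.pi[s][b] - delta / (self.nA - 1))
        total = sum(self.pi[s])
        self.pi[s] = [p / total for p in self.pi[s]]
```

On a toy single-state problem where one action is strictly better, the policy probability mass migrates to the better action, with the asymmetric deltas making recovery from a losing policy faster than drift away from a winning one.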
Eighth IASTED International Conference on Control and Applications (CA 2006)
Wardell, D., & Peterson, G. L. (2006). Fuzzy State Aggregation and Off-Policy Reinforcement Learning for Stochastic Environments. In Proceedings of the Eighth IASTED International Conference on Control and Applications (CA 2006), 133–138.