Gradient descent learning algorithms have proven effective at solving mixed-strategy games. The policy hill-climbing (PHC) variants WoLF (Win or Learn Fast) and PDWoLF (Policy Dynamics based WoLF) both converge rapidly to equilibrium solutions by tuning their gradient parameters more accurately than standard Q-learning. Likewise, cooperative learning techniques based on weighted strategy sharing (WSS) and expertness measurements improve agent performance when multiple agents pursue a common goal. Combining these cooperative techniques with fast gradient descent learning lets an agent's performance converge to a solution at an even faster rate. This claim is verified in a stochastic grid-world environment using a limited-visibility hunter-prey model with both random and intelligent prey. Across five different expertness measurements, cooperative learning with each PHC algorithm converges faster than independent learning when agents learn strictly from better-performing agents.
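The WoLF-PHC update named in the abstract follows the scheme of Bowling and Veloso: a standard Q-learning update, plus a hill-climbing step on the mixed policy whose step size switches between a small "winning" rate and a larger "losing" rate depending on whether the current policy outperforms its running average. The sketch below is a minimal single-agent illustration of that scheme, not the paper's implementation; the learning-rate values and class interface are assumptions for the example.

```python
import random
from collections import defaultdict


class WoLFPHC:
    """Minimal WoLF policy hill-climbing sketch (illustrative parameters)."""

    def __init__(self, actions, alpha=0.5, gamma=0.9,
                 delta_win=0.01, delta_lose=0.04):
        # delta_lose > delta_win: learn fast when losing, cautiously when winning
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma
        self.delta_win, self.delta_lose = delta_win, delta_lose
        self.Q = defaultdict(lambda: {a: 0.0 for a in actions})
        self.pi = defaultdict(lambda: {a: 1.0 / len(actions) for a in actions})
        self.avg_pi = defaultdict(lambda: {a: 1.0 / len(actions) for a in actions})
        self.visits = defaultdict(int)

    def choose(self, s):
        r, cum = random.random(), 0.0
        for a in self.actions:
            cum += self.pi[s][a]
            if r <= cum:
                return a
        return self.actions[-1]

    def update(self, s, a, reward, s_next):
        # Standard Q-learning step
        target = reward + self.gamma * max(self.Q[s_next].values())
        self.Q[s][a] += self.alpha * (target - self.Q[s][a])
        # Incrementally track the average policy for this state
        self.visits[s] += 1
        for act in self.actions:
            self.avg_pi[s][act] += (self.pi[s][act] - self.avg_pi[s][act]) / self.visits[s]
        # "Win or learn fast": compare current policy value to average-policy value
        value = sum(self.pi[s][x] * self.Q[s][x] for x in self.actions)
        avg_value = sum(self.avg_pi[s][x] * self.Q[s][x] for x in self.actions)
        delta = self.delta_win if value > avg_value else self.delta_lose
        # Shift probability mass toward the greedy action, keeping pi a distribution
        best = max(self.actions, key=lambda x: self.Q[s][x])
        for act in self.actions:
            if act == best:
                continue
            step = min(self.pi[s][act], delta / (len(self.actions) - 1))
            self.pi[s][act] -= step
            self.pi[s][best] += step
```

The variable step size is the gradient-parameter accuracy the abstract refers to: PDWoLF replaces the average-policy comparison with a policy-dynamics test, but the surrounding update is the same.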
9th IASTED International Conference on Artificial Intelligence and Soft Computing (ASC 2005)
Cousin, K., & Peterson, G. L. (2005). Cooperative reinforcement learning using an expert-measuring weighted strategy with WoLF. In Proceedings of the 9th IASTED International Conference on Artificial Intelligence and Soft Computing (ASC 2005), pp. 165-170. Track 481-196.