Document Type
Conference Proceeding
Publication Date
7-12-2008
Abstract
This paper merges hierarchical reinforcement learning (HRL) with ant colony optimization (ACO) to produce an HRL-ACO algorithm capable of generating solutions for large domains. The paper describes two implementations of the new algorithm: the first is a modification of Dietterich’s MAXQ-Q HRL algorithm; the second is a hierarchical ant colony system algorithm. Both implementations generate solutions faster, with little to no significant change in solution quality on the tested problem domains. The application of ACO to the MAXQ-Q algorithm replaces its reinforcement learning component, Q-learning, with the ACO-based variant Ant-Q. The resulting algorithm, MAXQ-AntQ, converges to solutions not significantly different from those of MAXQ-Q in 88% of the time. The paper then transfers HRL techniques to the ACO domain and the traveling salesman problem (TSP). To apply HRL to ACO, a task hierarchy must first be created for the TSP: a data clustering algorithm creates these subtasks, and an ACO algorithm solves both the individual subproblems and the complete problem. Two clustering algorithms are tested, k-means and G-means. The results demonstrate that the clustering-based algorithm produces solutions 20 times faster, with a 5-10% decrease in solution quality attributable to the effects of clustering.
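To make the decomposition described in the abstract concrete, below is a minimal, hypothetical sketch, not the authors' implementation: it partitions TSP cities with plain k-means, builds a tour inside each cluster with an Ant-Q-style constructor (using Gambardella and Dorigo's update AQ(r,s) ← (1−α)AQ(r,s) + α(ΔAQ + γ max AQ(s,·))), and stitches the sub-tours together naively. The function names (kmeans, antq_tour, hierarchical_aco_tsp), the parameter values (alpha, gamma, beta, q0), and the centroid-ordering stitch are all illustrative assumptions; the paper's G-means variant and MAXQ-AntQ details are not reproduced here.

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means: partition city coordinates into k spatial clusters."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        # Assign each city to its nearest center, then recenter.
        labels = np.argmin(((points[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def antq_tour(points, n_ants=10, n_iters=50,
              alpha=0.1, gamma=0.3, beta=2.0, q0=0.9, seed=0):
    """Ant-Q-style tour construction on one cluster (parameters assumed)."""
    rng = np.random.default_rng(seed)
    n = len(points)
    if n == 1:
        return [0]
    dist = np.sqrt(((points[:, None] - points[None]) ** 2).sum(-1))
    heur = 1.0 / (dist + np.eye(n) + 1e-12)   # visibility eta = 1/d
    aq = np.full((n, n), 1.0 / n)             # AQ values (pheromone analogue)
    best_tour, best_len = None, np.inf
    for _ in range(n_iters):
        for _ in range(n_ants):
            tour = [rng.integers(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                r = tour[-1]
                cand = np.array(sorted(unvisited))
                score = aq[r, cand] * heur[r, cand] ** beta
                # Pseudo-random-proportional rule: exploit with probability q0.
                if rng.random() < q0:
                    s = cand[np.argmax(score)]
                else:
                    s = rng.choice(cand, p=score / score.sum())
                # Ant-Q local update toward the discounted best next value.
                rest = unvisited - {s}
                future = aq[s, np.array(sorted(rest))].max() if rest else 0.0
                aq[r, s] = (1 - alpha) * aq[r, s] + alpha * gamma * future
                tour.append(s)
                unvisited.remove(s)
            length = dist[tour, np.roll(tour, -1)].sum()
            if length < best_len:
                best_tour, best_len = tour, length
        # Delayed reinforcement along the best tour found so far.
        for r, s in zip(best_tour, np.roll(best_tour, -1)):
            aq[r, s] = (1 - alpha) * aq[r, s] + alpha / best_len
    return best_tour

def hierarchical_aco_tsp(points, k):
    """Partition the TSP with k-means, solve each cluster with Ant-Q,
    then concatenate sub-tours in order of cluster centroid x-coordinate."""
    labels = kmeans(points, k)
    order = np.argsort([points[labels == j][:, 0].mean()
                        if np.any(labels == j) else np.inf for j in range(k)])
    full_tour = []
    for j in order:
        idx = np.flatnonzero(labels == j)
        if len(idx) == 0:
            continue
        sub = antq_tour(points[idx])
        full_tour.extend(idx[sub].tolist())
    return full_tour

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    cities = rng.random((60, 2)) * 100        # 60 random cities
    tour = hierarchical_aco_tsp(cities, k=4)
    print("tour length:",
          np.sqrt(((cities[tour] - cities[np.roll(tour, -1)]) ** 2)
                  .sum(-1)).sum())
```

The naive centroid-ordering stitch stands in for whatever subtour-combination step the full algorithm uses; swapping kmeans for a G-means routine, which chooses the number of clusters automatically, would mirror the paper's second clustering variant.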
Source Publication
Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation (GECCO '08)
Recommended Citation
Erik J. Dries and Gilbert L. Peterson. 2008. Scaling ant colony optimization with hierarchical reinforcement learning partitioning. In Proceedings of the 10th annual conference on Genetic and evolutionary computation (GECCO '08). Association for Computing Machinery, New York, NY, USA, 25–32. https://doi.org/10.1145/1389095.1389100
Comments
©2008 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the U.S. Government.
AFIT Scholar furnishes the accepted draft of this conference paper. The version of record, as published by ACM in the proceedings, is available to subscribers through the DOI link on this page.
Shared in accordance with ACM's green open access policies, as published on the ACM website.
Funding note: This work was supported in part through AFRL/SNRN Lab Task 06SN02COR from the Air Force Office of Scientific Research (AFOSR).