Date of Award
3-2025
Document Type
Thesis
Degree Name
Master of Science in Operations Research
Department
Department of Operational Sciences
First Advisor
Matthew A. Robbins, PhD
Abstract
This research uses reinforcement learning (RL) to train two blue agents, each equipped with a directed energy weapon (DEW), in a 2v2 within-visual-range air combat maneuvering problem. A phased solution approach is employed to repeatedly tune and train several implementations of two RL algorithms: Proximal Policy Optimization (PPO) and Double Deep Q Network (DDQN). Phase I of training uses reward shaping for basic flight elements such as altitude, airspeed, and target proximity. Phase II builds on the policies developed in Phase I, but its rewards emphasize winning the aerial engagement by any means necessary. DDQN significantly outperforms PPO in Phase I, obtaining a superlative policy that shoots down both red aircraft in 43.1% of engagements (compared to 21.3% for PPO). In Phase II, however, PPO produces a superlative policy that shoots down both red aircraft in 61.1% of simulated engagements, compared to 53.4% for the superlative DDQN policy. While PPO ultimately produces the superlative policy with the highest combat win percentage, the DDQN superlative policy appears more generalizable and broadly applicable to differing air combat environments. In addition to comparing the two algorithms’ superlative policies, we use them to demonstrate a proof of concept for evaluating how adjusting DEW settings might affect combat effectiveness, finding that increasing the DEW firing angle range significantly improves blue’s mean total reward across simulated episodes.
AFIT Designator
AFIT-ENS-MS-25-M-197
Recommended Citation
Wilson, Caden W., "Learning To Dogfight: Proximal Policy Optimization vs. Double Deep Q Network For 2v2 Air Combat with Directed Energy Weapons in AFSIM" (2025). Theses and Dissertations. 8283.
https://scholar.afit.edu/etd/8283
Comments
An embargo was observed for this posting.
Distribution A: Approved for public release, Distribution Unlimited. PA case number 88ABW-2025-0328