Date of Award

3-2025

Document Type

Thesis

Degree Name

Master of Science in Operations Research

Department

Department of Operational Sciences

First Advisor

Matthew A. Robbins, PhD

Abstract

This research uses reinforcement learning (RL) to train two blue agents, each equipped with a directed energy weapon (DEW), in a 2v2 within-visual-range air combat maneuvering problem. A phased solution approach is employed to repeatedly tune and train implementations of two RL algorithms: Proximal Policy Optimization (PPO) and Double Deep Q-Network (DDQN). Phase I of training includes reward shaping for basic flight elements such as altitude, airspeed, and target proximity. Phase II of training builds on the policies developed in Phase I, but its rewards emphasize winning the aerial engagement by any means necessary. DDQN significantly outperforms PPO in Phase I, obtaining a superlative policy that shot down both red aircraft in 43.1% of engagements (compared to 21.3% for PPO). In Phase II, however, PPO produced a superlative policy that shot down both red aircraft in 61.1% of simulated engagements, compared to 53.4% for the superlative DDQN policy. While PPO ultimately produces the superlative policy with the highest combat win percentage, the DDQN superlative policy appears more generalizable and more broadly applicable to different air combat environments. In addition to comparing the two algorithms' superlative policies against each other, we use them to demonstrate a proof of concept for evaluating how adjusting DEW settings might affect combat effectiveness, finding that increasing the DEW firing-angle range significantly improves blue's mean total reward across simulated episodes.
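To make the two-phase reward structure described above concrete, the sketch below shows one plausible way such rewards could be expressed. It is a minimal illustration only: every state field, threshold, and weight is an assumption made for readability, not a value taken from the thesis.

```python
# Hypothetical sketch of the two-phase reward shaping described in the
# abstract. All names, thresholds, and weights are illustrative assumptions,
# not values from the thesis itself.
from dataclasses import dataclass


@dataclass
class BlueState:
    altitude_ft: float         # blue agent altitude
    airspeed_kts: float        # blue agent airspeed
    range_to_target_nm: float  # distance to the nearest red aircraft
    reds_destroyed: int        # red aircraft shot down this episode (0-2)
    blue_destroyed: bool       # whether this blue agent was shot down


def phase1_reward(s: BlueState) -> float:
    """Dense shaping reward for basic flight: altitude, airspeed, proximity."""
    r = 0.0
    if 10_000 <= s.altitude_ft <= 30_000:   # stay in a safe altitude band
        r += 0.1
    if 250 <= s.airspeed_kts <= 600:        # maintain maneuvering airspeed
        r += 0.1
    # Small bonus that grows as the agent closes on its target.
    r += max(0.0, 1.0 - s.range_to_target_nm / 20.0) * 0.2
    return r


def phase2_reward(s: BlueState) -> float:
    """Sparse outcome reward: win the engagement by any means necessary."""
    r = 1.0 * s.reds_destroyed  # credit each red kill
    if s.blue_destroyed:
        r -= 1.0                # penalize losing the blue aircraft
    return r


# Example: a state late in a winning engagement.
s = BlueState(altitude_ft=22_000, airspeed_kts=420,
              range_to_target_nm=3.0, reds_destroyed=2, blue_destroyed=False)
print(phase1_reward(s), phase2_reward(s))
```

Under this reading, Phase I gives the agents frequent feedback while they learn basic airmanship, and Phase II shifts credit to engagement outcomes once competent flight behavior exists to build on.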

AFIT Designator

AFIT-ENS-MS-25-M-197

Comments

An embargo was observed for this posting.

Distribution A: Approved for public release; distribution unlimited. PA case number 88ABW-2025-0328
