Date of Award

3-2025

Document Type

Thesis

Degree Name

Master of Science in Operations Research

Department

Department of Operational Sciences

First Advisor

Matthew A. Robbins, PhD

Abstract

This research uses reinforcement learning (RL) to train two blue agents, each equipped with a directed energy weapon (DEW), in a 2v2 within-visual-range air combat maneuvering problem. A phased solution approach is employed to repeatedly tune and train implementations of two RL algorithms: Proximal Policy Optimization (PPO) and Double Deep Q-Network (DDQN). Phase I of training includes reward shaping for basic flight elements such as altitude, airspeed, and target proximity. Phase II of training builds on the policies developed in Phase I, but its rewards emphasize winning the aerial engagement by any means necessary. DDQN significantly outperforms PPO in Phase I, obtaining a superlative policy that shot down both red aircraft in 43.1% of engagements (compared to 21.3% for PPO). In Phase II, however, PPO produced a superlative policy that shot down both red aircraft in 61.1% of simulated engagements, compared to 53.4% for the superlative DDQN policy. While PPO ultimately produces the superlative policy with the highest combat win percentage, the DDQN superlative policy appears more generalizable and more broadly applicable to different air combat environments. In addition to comparing the two algorithms' superlative policies against each other, we use them to demonstrate a proof of concept for evaluating how adjusting DEW settings might affect combat effectiveness, finding that increasing the DEW firing-angle range significantly improves blue's mean total reward across simulated episodes.
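To make the two-phase reward structure described above concrete, the sketch below shows one plausible way such rewards could be expressed. It is a minimal illustration only: every state field, threshold, and weight is an assumption made for readability, not a value taken from the thesis.

```python
# Hypothetical sketch of the two-phase reward shaping described in the
# abstract. All names, thresholds, and weights are illustrative assumptions,
# not values from the thesis itself.
from dataclasses import dataclass


@dataclass
class BlueState:
    altitude_ft: float         # blue agent altitude
    airspeed_kts: float        # blue agent airspeed
    range_to_target_nm: float  # distance to the nearest red aircraft
    reds_destroyed: int        # red aircraft shot down this episode (0-2)
    blue_destroyed: bool       # whether this blue agent was shot down


def phase1_reward(s: BlueState) -> float:
    """Dense shaping reward for basic flight: altitude, airspeed, proximity."""
    r = 0.0
    if 10_000 <= s.altitude_ft <= 30_000:   # stay in a safe altitude band
        r += 0.1
    if 250 <= s.airspeed_kts <= 600:        # maintain maneuvering airspeed
        r += 0.1
    # Small bonus that grows as the agent closes on its target.
    r += max(0.0, 1.0 - s.range_to_target_nm / 20.0) * 0.2
    return r


def phase2_reward(s: BlueState) -> float:
    """Sparse outcome reward: win the engagement by any means necessary."""
    r = 1.0 * s.reds_destroyed  # credit each red kill
    if s.blue_destroyed:
        r -= 1.0                # penalize losing the blue aircraft
    return r


# Example: a state late in a winning engagement.
s = BlueState(altitude_ft=22_000, airspeed_kts=420,
              range_to_target_nm=3.0, reds_destroyed=2, blue_destroyed=False)
print(phase1_reward(s), phase2_reward(s))
```

Under this reading, Phase I gives the agents frequent feedback while they learn basic airmanship, and Phase II shifts credit to engagement outcomes once competent flight behavior exists to build on.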

AFIT Designator

AFIT-ENS-MS-25-M-197

Comments

An embargo was observed for this posting.

Distribution A: Approved for public release; distribution unlimited. PA case number 88ABW-2025-0328
