Date of Award
3-2025
Document Type
Thesis
Degree Name
Master of Science
Department
Department of Operational Sciences
First Advisor
Matthew A. Robbins, PhD
Abstract
Artificial intelligence (AI) grows ever more important in warfighting. Emerging technologies allow for the use of AI to control aircraft and weapons systems. This research investigates the application of reinforcement learning (RL) through the Proximal Policy Optimization (PPO) algorithm to a two-versus-two (2v2) beyond-visual-range (BVR) air combat maneuvering problem (ACMP). Implemented in the Advanced Framework for Simulation, Integration, and Modeling (AFSIM), the methodology frames the engagement as a Markov decision process, wherein an autonomous RL agent learns continuous control decisions (throttle, pitch, roll, and yaw) under a cooperative communication scheme. A multi-phase curriculum-learning approach facilitates the progressive acquisition of flight stability, weapon deployment, and air combat tactics. Through hyperparameter tuning and reward shaping, the PPO agent demonstrates the emergent capacity to balance offensive missile usage with evasive maneuvers. Findings highlight the algorithm’s potential to evolve intelligent, adaptive behaviors in aerial engagements, offering pathways to improved tactical simulations and future research in reinforcement learning for combat aviation.
AFIT Designator
AFIT-ENS-MS-25-M-169
Recommended Citation
Joseph, Daniel B., "Proximal Policy Optimization Applied to the Beyond Visual Range Air Combat Maneuvering Problem" (2025). Theses and Dissertations. 8281.
https://scholar.afit.edu/etd/8281
Included in
Aviation Commons, Data Science Commons, Operational Research Commons
Comments
An embargo was observed for this posting.
Distribution A: Approved for public release; distribution unlimited. PA case number: 88ABW-2025-0320.