We formulate the first generalized air combat maneuvering problem (ACMP), called the MvN ACMP, wherein M friendly autonomous unmanned combat aerial vehicles (AUCAVs) engage N enemy AUCAVs, and we develop a Markov decision process (MDP) model to control the team of M Blue AUCAVs. The MDP model leverages a 5-degree-of-freedom aircraft state transition model and formulates a directed energy weapon capability. Rather than solving the MDP exactly, we adopt a model-based reinforcement learning approach, approximate dynamic programming (ADP). It uses an approximate policy iteration algorithm to obtain high-quality approximate policies, which are assessed against a high-performing benchmark policy. The ADP algorithm uses a multi-layer neural network as the regression mechanism for value function approximation. One-versus-one and two-versus-one scenarios are constructed to test whether an AUCAV can outmaneuver and destroy a superior enemy AUCAV. Each scenario is evaluated from offensive, defensive, and neutral starts, yielding 6 problem instances. The ADP policies outperform the position-energy benchmark policy in 4 of the 6 problem instances. Results show that the ADP policies mimic certain basic fighter maneuvers and section tactics.

