Date of Award


Document Type


Degree Name

Master of Science


Department of Electrical and Computer Engineering

First Advisor

Clark N. Taylor, PhD


This thesis introduces a monocular vision-based approach for six-degree-of-freedom (6-DoF) pose estimation of a known object. A convolutional neural network (CNN) locates known features of the object in an image. These detected features, together with their known locations on the object, are passed to a Perspective-n-Point (PnP) algorithm, which estimates the pose of the target object with respect to the camera. The primary difficulty with CNN-based methods is the large amount of training data needed to train the network effectively. To overcome this difficulty, a 3D model of the real-world object is created and rendered in a visualization environment to produce images of the object from many different perspectives and against differing backgrounds. This approach enables the creation of a very large truth dataset in a short time. The synthetic imagery is used to train a YOLO network, enabling rapid and accurate feature recognition in a single image. The solution achieves an average error magnitude of less than 3.43 cm at the contact point (1 to 2 meters).
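As a rough illustration of the geometry behind the PnP step, the sketch below projects known object features into the image with a pinhole camera model and measures the reprojection error that a PnP solver drives toward zero. It is not the thesis implementation: the function names, the square feature layout, the camera intrinsics (`fx`, `fy`, `cx`, `cy`), and the poses are all hypothetical values chosen for the example, and the "detected" pixels are simulated rather than produced by a YOLO network.

```python
import math

def rotation_z(yaw):
    """Rotation matrix for a rotation of `yaw` radians about the camera z-axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def project(points_obj, R, t, fx, fy, cx, cy):
    """Project object-frame 3D points to pixels for the pose (R, t)."""
    pixels = []
    for X in points_obj:
        # Object frame -> camera frame: Xc = R X + t
        xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
        # Perspective division, then apply the (assumed) intrinsics
        pixels.append((fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy))
    return pixels

def mean_reprojection_error(detected, predicted):
    """Average pixel distance between detected and predicted feature locations."""
    return sum(math.dist(d, p) for d, p in zip(detected, predicted)) / len(detected)

# Hypothetical feature layout: corners of a 20 cm square on the object (meters).
FEATURES = [(-0.1, -0.1, 0.0), (0.1, -0.1, 0.0), (0.1, 0.1, 0.0), (-0.1, 0.1, 0.0)]
INTRINSICS = dict(fx=800.0, fy=800.0, cx=320.0, cy=240.0)  # assumed camera

# Simulate feature detections at a "true" pose 1.5 m in front of the camera.
R_true, t_true = rotation_z(0.1), (0.0, 0.0, 1.5)
detected = project(FEATURES, R_true, t_true, **INTRINSICS)

# A candidate pose that is off by 2 cm laterally gives a nonzero error.
t_guess = (0.02, 0.0, 1.5)
error = mean_reprojection_error(detected, project(FEATURES, R_true, t_guess, **INTRINSICS))
print(f"mean reprojection error: {error:.2f} px")
```

A PnP solver searches over the rotation and translation for the pose that minimizes this error across all detected features. In practice an existing solver would normally be used instead of hand-written projection code.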

AFIT Designator



A 12-month embargo was observed.

Approved for public release. Case number on file.