Date of Award
3-2023
Document Type
Thesis
Degree Name
Master of Science
Department
Department of Electrical and Computer Engineering
First Advisor
Clark N. Taylor, PhD
Abstract
This thesis introduces a monocular vision-based approach for six-degree-of-freedom (6 DoF) pose estimation of a known object. The proposed solution uses a convolutional neural network (CNN) to find known features of the object in an image. These detected features, together with their known locations on the object, are passed to a Perspective-n-Point (PnP) algorithm to estimate the pose of the target object with respect to the camera. The primary difficulty with CNN-based methods is the large amount of training data needed to train the network effectively. To overcome this difficulty, a 3D model of the real-world object is created and rendered in a visualization environment to produce images of the object from many different perspectives and against differing backgrounds. This approach enables the creation of a very large truth dataset in a short period of time. The synthetic imagery is used to train a YOLO network, enabling rapid and accurate feature recognition in a single image. The solution achieves an average error magnitude of less than 3.43 cm at the contact point (1 to 2 meters).
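As a rough illustration of the geometry behind the PnP step described in the abstract, the sketch below projects known 3D feature locations into an image with a pinhole camera model and measures the mean reprojection error for a candidate pose. A PnP solver searches for the rotation and translation that drive this error toward zero. This is not the thesis's implementation: the feature coordinates, camera intrinsics, poses, and all function names are hypothetical values chosen for illustration, the rotation is restricted to a single axis for brevity, and the "detections" are synthesized from a reference pose rather than produced by a trained network.

```python
import math

def rotation_z(yaw):
    """3x3 rotation about the camera z axis (simplified, hypothetical orientation)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def project(point, R, t, fx, fy, cx, cy):
    """Pinhole projection of an object-frame 3D point into pixel coordinates."""
    # Transform the point from the object frame into the camera frame.
    x, y, z = (sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3))
    # Perspective divide, then apply focal lengths and principal point.
    return (fx * x / z + cx, fy * y / z + cy)

def mean_reprojection_error(object_pts, image_pts, R, t, fx, fy, cx, cy):
    """Average pixel distance between detected and reprojected features."""
    total = 0.0
    for p, (u, v) in zip(object_pts, image_pts):
        pu, pv = project(p, R, t, fx, fy, cx, cy)
        total += math.hypot(pu - u, pv - v)
    return total / len(object_pts)

# Hypothetical feature locations on the object, in meters (object frame).
OBJECT_PTS = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.0, 0.2, 0.0), (0.1, 0.1, 0.05)]

# Hypothetical camera intrinsics: fx, fy, cx, cy in pixels.
K = (800.0, 800.0, 320.0, 240.0)

# Reference pose used only to synthesize stand-in "detections".
R_true, t_true = rotation_z(0.1), (0.05, -0.02, 1.5)
detections = [project(p, R_true, t_true, *K) for p in OBJECT_PTS]

# Error at the reference pose versus a pose that is 10 cm too far away.
err_true = mean_reprojection_error(OBJECT_PTS, detections, R_true, t_true, *K)
err_wrong = mean_reprojection_error(OBJECT_PTS, detections, R_true, (0.05, -0.02, 1.6), *K)
print(f"error at reference pose: {err_true:.6f} px")
print(f"error at offset pose:    {err_wrong:.6f} px")
```

In practice a library PnP solver would take the detected pixel locations and the known 3D feature locations and return the pose directly; the residual above is only the quantity such a solver evaluates.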
AFIT Designator
AFIT-ENG-MS-23-M-060
Recommended Citation
Tran, Quang Ngoc, "Monocular Vision and Machine Learning for Pose Estimation" (2023). Theses and Dissertations. 6940.
https://scholar.afit.edu/etd/6940
Comments
A 12-month embargo was observed.
Approved for public release. Case number on file.