Date of Award
3-2025
Document Type
Thesis
Degree Name
Master of Science
Department
Department of Operational Sciences
First Advisor
Bruce A. Cox, PhD
Abstract
Classifying previously unseen objects poses a significant challenge for traditional computer vision algorithms, which rely on extensive labeled training data. Zero-shot reasoning offers a way to overcome this limitation. This research explores a novel method for image recognition, using the Animals with Attributes 2 (AWA2) dataset as a proof of concept. A multi-label ResNet50 model predicts core attributes such as color, ear shape, or number of limbs. These attributes are then passed to ChatGPT, which draws on its extensive knowledge base to classify the animal. This approach removes the need to train on every possible class: the computer vision model is trained on the available classes to recognize general attributes, while ChatGPT performs the actual classification using its broad pretraining. This division of labor reduces the need to retrain the computer vision model when new classes emerge and also reduces training data requirements for minority classes. The method shows promise for domains where unseen classes appear often, such as autonomous surveillance or emerging threat identification. Preliminary tests are potentially encouraging, with a near state-of-the-art validation accuracy of 36.6% on unseen classes; however, a test accuracy of 3% on a different set of reserved unseen classes indicates significant room for methodological or training improvement and may point to overfitting on the validation set. Although our method does not match the best reported accuracies for zero-shot learning, the framework demonstrates a scalable alternative to the current computer vision paradigm.
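The hand-off between the two stages described in the abstract can be illustrated with a small sketch. This is not the thesis code: it assumes the CNN emits one raw logit per attribute, decodes each attribute independently with a sigmoid and a 0.5 threshold (an assumed value), and formats the surviving attributes into a text prompt. The attribute names, logits, candidate animals, prompt wording, and the helper names `predicted_attributes` and `build_prompt` are all hypothetical, and the actual ChatGPT query is deliberately left out.

```python
import math
from typing import List, Sequence

# Hypothetical subset of attribute names for illustration only;
# the real AWA2 attribute vocabulary is larger.
ATTRIBUTES: List[str] = ["black", "white", "stripes", "hooves", "long neck", "flippers"]


def sigmoid(x: float) -> float:
    """Map a raw logit to an independent per-attribute probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_attributes(logits: Sequence[float], names: Sequence[str],
                         threshold: float = 0.5) -> List[str]:
    """Multi-label decoding: keep each attribute whose own sigmoid
    probability clears the threshold (no softmax across attributes)."""
    if len(logits) != len(names):
        raise ValueError("one logit per attribute is required")
    return [name for name, z in zip(names, logits) if sigmoid(z) >= threshold]


def build_prompt(attrs: Sequence[str], candidates: Sequence[str]) -> str:
    """Format predicted attributes as a zero-shot classification request
    for a language model. The wording is an illustrative assumption."""
    return (
        "An animal has the following attributes: " + ", ".join(attrs) + ". "
        "Which of these animals is it most likely to be? "
        "Options: " + ", ".join(candidates) + ". Answer with one name."
    )


if __name__ == "__main__":
    # Made-up logits standing in for the CNN's output on a single image.
    logits = [2.1, 1.7, 3.0, 2.4, -1.5, -3.2]
    attrs = predicted_attributes(logits, ATTRIBUTES)
    print(attrs)  # ['black', 'white', 'stripes', 'hooves']
    # The resulting string would be sent to the language model (call not shown).
    print(build_prompt(attrs, ["zebra", "giraffe", "seal"]))
```

A per-attribute sigmoid is used rather than a softmax because attributes are not mutually exclusive: an animal can be both black and striped. Only the attribute head depends on the training classes, so adding a new candidate animal changes the prompt, not the network.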
AFIT Designator
AFIT-ENS-MS-25-M-174
Recommended Citation
Wegner, Michael A., "Improving Zero Shot Learning by Linking Multi-label CNNs with LLMs" (2025). Theses and Dissertations. 8243.
https://scholar.afit.edu/etd/8243
Comments
An embargo was observed for posting this thesis.
This work is marked Distribution A, Approved for Public Release. PA case number 88ABW-2025-0352