Date of Award


Document Type


Degree Name

Master of Science in Electrical Engineering


Department of Electrical and Computer Engineering

First Advisor

Brett Borghetti, PhD


Despite ongoing improvements in machine translation, machine translators still cannot incorporate the context from which the source text may have been derived. They translate text from a source language into a target language without observing any visual context. This work aims to produce a neural machine translation model that accepts both text and image context, acting as a multimodal translator from Mandarin Chinese to English. The model was trained on a small multimodal dataset of 700 images and sentences and compared to a translator trained only on the text associated with those images. The model was also trained on a larger text-only corpus of 21,000 sentences, with and without the addition of the small multimodal dataset. Notable differences appeared between the text-only and multimodal translators when trained on the small dataset of 700 sentences and images; however, no observable differences were found between the translators trained on the larger text corpus. Further research with a larger multimodal dataset could clarify the utility of multimodal machine translation.

AFIT Designator


DTIC Accession Number