Date of Award

3-2023

Document Type

Thesis

Degree Name

Master of Science in Operations Research

Department

Department of Operational Sciences

First Advisor

Phillip M. LaCasse, PhD

Abstract

There is strong motivation in both civilian and military circles to understand the attitudes, motivations, feelings, and emotions of a population of interest. Social media is a rich source of information self-disclosed by individuals from all walks of life about virtually every domain of the human experience, but the vast quantity of data is impossible to analyze effectively without advanced natural language processing algorithms. This research creates a transfer-learning-based emotion classification model for Indonesian-language Twitter data. Transfer learning consists of two steps: pre-training and fine-tuning. Three variations of Indonesian Bidirectional Encoder Representations from Transformers (IndoBERT) are tested, with hyperparameters tuned via designed experiment. The top IndoBERT model, tested on an open-source corpus of 4,403 labeled Indonesian Tweets, outperforms all known prior studies with an F1 score of approximately 0.791. Additionally, this research explores the relationship between training set size and model validity for fine-tuning of the transfer learning models; models are trained on datasets ranging from 100 to 3,900 observations and then validated on five unique test sets. Results indicate that as few as 1,000 observations can yield results comparable to those obtained with the full training corpus. Finally, this research proposes a self-supervised approach using embedded emojis for sentiment labeling in order to alleviate the need for translation. Initial results are encouraging, with an F1 score of 0.454 on a five-emotion dataset and 0.746 on a two-sentiment dataset.
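The emoji-based labeling idea mentioned at the end of the abstract can be illustrated with a minimal sketch. The abstract does not describe the thesis's actual procedure, so everything here is an assumption made for illustration: the emoji-to-emotion table, the five emotion names, the tie-breaking rule, and the function names `label_from_emojis` and `strip_emojis` are all hypothetical.

```python
from collections import Counter
from typing import Optional

# Hypothetical emoji-to-emotion lookup, for illustration only.
# It is NOT the mapping or the label set used in the thesis.
EMOJI_TO_EMOTION = {
    "😡": "anger",
    "😱": "fear",
    "😂": "happy",
    "😍": "love",
    "😢": "sadness",
}


def label_from_emojis(text: str) -> Optional[str]:
    """Return the emotion whose mapped emojis occur most often in the text.

    Returns None when the text has no mapped emoji or when the two most
    frequent emotions tie (an assumed rule: ambiguous tweets stay unlabeled).
    """
    counts = Counter(EMOJI_TO_EMOTION[ch] for ch in text if ch in EMOJI_TO_EMOTION)
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def strip_emojis(text: str) -> str:
    """Remove mapped emojis so a classifier cannot read the label off the input."""
    return "".join(ch for ch in text if ch not in EMOJI_TO_EMOTION).strip()


if __name__ == "__main__":
    tweet = "Senang sekali hari ini 😂😂"
    print(label_from_emojis(tweet), "|", strip_emojis(tweet))
```

In this sketch the emoji supplies the training label and is then removed from the text, so no human annotation or translation is needed to build a fine-tuning set; how well such labels match human judgment is exactly what the reported F1 scores measure.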

AFIT Designator

AFIT-ENS-MS-23-M-154

Comments

A 12-month embargo was observed.

Approved for public release. Case number on file.
