Master of Science in Electrical Engineering
Department of Electrical and Computer Engineering
Martin Desimio, PhD
This thesis examines methods for isolated digit recognition without using time alignment. Resource requirements for isolated word recognizers that use time alignment can become prohibitively large as the vocabulary to be classified grows. Methods that achieve recognition rates comparable to those of current time-alignment-based recognizers, but without the alignment step, are therefore needed. The goals of this research are to find feature sets for speech recognition that perform well without time alignment, and to identify classifiers that provide good performance with these features. Using the digits from the TI46 database, baseline speaker-independent recognition rates of 95.2% for the complete speaker set and 98.1% for the male speaker set are established using dynamic time warping (DTW). This work begins with features derived from spectrograms of each digit. Based on a critical band frequency scale covering the telephone bandwidth (300-3000 Hz), these critical band energy features are classified alone and in combination with several other feature sets, using several different classifiers. With this method, there is one "short" feature vector per word. For speaker-independent recognition using the complete speaker set and a multi-layer perceptron (MLP) classifier, a recognition rate of 92.4% is achieved. For the same classifier with the male speaker set, a recognition rate of 97.1% is achieved. For the male speaker set, there is no statistically significant difference between the results using DTW and those using the MLP without time alignment. This shows that there are feature sets that may provide high recognition rates for isolated word recognition without the need for time alignment.
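To make the contrast in the abstract concrete, the following Python sketch places a textbook DTW distance next to one possible way of reducing a variable-length word to a single fixed-length vector. It is only an illustration under stated assumptions: the function names `dtw_distance` and `pooled_features`, the `segments` parameter, and the segment-averaging scheme are hypothetical and are not taken from the thesis, whose actual feature extraction is not described in this record.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW cost between two sequences of feature vectors.

    Every comparison against a stored template fills an
    len(a) x len(b) table, which is the cost that grows with vocabulary.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def pooled_features(frames, segments=3):
    """Hypothetical fixed-length summary: average each band's energy
    over `segments` equal time slices of a non-empty list of frames.

    The output length is segments * bands, whatever the word duration,
    so no time alignment is needed before classification.
    """
    n = len(frames)
    bands = len(frames[0])
    out = []
    for s in range(segments):
        lo = s * n // segments
        hi = max((s + 1) * n // segments, lo + 1)  # never an empty slice
        chunk = frames[lo:hi]
        for k in range(bands):
            out.append(sum(f[k] for f in chunk) / len(chunk))
    return out
```

With this kind of summary, two utterances of different durations map to vectors of the same length, which a classifier such as an MLP can consume directly, whereas `dtw_distance` must be evaluated against every stored template.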
Gay, Jeffrey M., "Isolated Digit Recognition without Time Alignment" (1994). Theses and Dissertations. 6410.