Date of Award
Master of Science
Department of Electrical and Computer Engineering
Martin DeSimio, PhD
We develop and evaluate an artificial neural network (ANN) based technique that compensates for mismatched classifier training and testing conditions in speaker identification (SID). One ANN per feature per speaker is trained to map that feature from a corrupted condition to an undistorted condition. As a result, a classifier trained under one condition may be used to classify data collected under a different condition. Speech utterances from 168 speakers, collected in a studio and also re-recorded after transmission over telephone networks, are used to develop and test the method. Peak formant resonant frequencies, their bandwidths, and pitch are used as features. These features from the studio speech are used to train Gaussian Mixture Model (GMM) classifiers. Portions of the studio and telephone speech are used to train the compensation ANNs. In mismatched train and test conditions, features from telephone speech are modified by the trained ANNs and applied to the GMMs trained with features from studio speech. Without compensation, SID accuracy is 6%. With the compensation method developed in this work, SID accuracy under mismatched conditions is 58.3%. Previous research on the same data, using the commonly used Mel Frequency Cepstral Coefficients as features and a typical compensation method of Cepstral Mean Subtraction with Band Limiting, gives SID accuracy of 27.4% with the same type of classifiers.
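The mapping step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the data are synthetic, the channel distortion is a made-up linear model, the network size and training settings are arbitrary, and a single Gaussian stands in for the GMM. The names `fit_mapper`, `apply_mapper`, and `mean_log_lik` are hypothetical. The sketch trains one small network for one feature of one speaker on paired telephone and studio values, then checks whether the compensated feature lies closer to the studio condition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, paired stand-in data for ONE feature of ONE speaker (values in Hz).
# The distortion below is an assumed channel model, not the thesis data.
studio = rng.normal(500.0, 40.0, size=400)
telephone = 0.6 * studio + 150.0 + rng.normal(0.0, 5.0, size=400)
train, test = slice(0, 300), slice(300, 400)

# Standardise inputs and targets with training-set statistics.
x_mu, x_sd = telephone[train].mean(), telephone[train].std()
y_mu, y_sd = studio[train].mean(), studio[train].std()


def fit_mapper(x, y, hidden=8, lr=0.05, steps=2000):
    """Fit a one-hidden-layer tanh network y ~ f(x) by batch gradient descent."""
    x, y = x[:, None], y[:, None]
    w1 = rng.normal(0.0, 0.5, (1, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    for _ in range(steps):
        h = np.tanh(x @ w1 + b1)                      # hidden activations
        grad_out = 2.0 * (h @ w2 + b2 - y) / len(x)   # d(MSE)/d(prediction)
        grad_h = (grad_out @ w2.T) * (1.0 - h ** 2)   # backprop through tanh
        w2 -= lr * (h.T @ grad_out)
        b2 -= lr * grad_out.sum(axis=0)
        w1 -= lr * (x.T @ grad_h)
        b1 -= lr * grad_h.sum(axis=0)
    return w1, b1, w2, b2


def apply_mapper(params, x):
    """Run the trained network on standardised inputs."""
    w1, b1, w2, b2 = params
    return (np.tanh(x[:, None] @ w1 + b1) @ w2 + b2)[:, 0]


def mean_log_lik(x, mu, sd):
    """Average log-likelihood under one Gaussian (a one-component GMM stand-in)."""
    return np.mean(-0.5 * np.log(2 * np.pi * sd ** 2) - (x - mu) ** 2 / (2 * sd ** 2))


params = fit_mapper((telephone[train] - x_mu) / x_sd, (studio[train] - y_mu) / y_sd)
compensated = apply_mapper(params, (telephone[test] - x_mu) / x_sd) * y_sd + y_mu

# Held-out comparison: distance to the paired studio values, and fit to the
# speaker model trained on studio features only.
mse_before = np.mean((telephone[test] - studio[test]) ** 2)
mse_after = np.mean((compensated - studio[test]) ** 2)
ll_raw = mean_log_lik(telephone[test], y_mu, y_sd)
ll_comp = mean_log_lik(compensated, y_mu, y_sd)
print(f"MSE to studio: raw={mse_before:.1f}, compensated={mse_after:.1f}")
print(f"Mean log-likelihood under studio model: raw={ll_raw:.3f}, compensated={ll_comp:.3f}")
```

In a full system along the lines of the abstract, one such network would be trained per feature per speaker, and the compensated feature vectors would be scored against each speaker's studio-trained GMM. How the per-speaker networks are selected at test time is not stated in the abstract, so the sketch leaves that step out.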
DTIC Accession Number
Fitzgerald, Edmund A., "Channel-Mismatch Compensation in Speaker Identification Feature Selection and Adaptation with Artificial Neural Networks" (1998). Theses and Dissertations. 5629.