Date of Award
12-1992
Document Type
Thesis
Degree Name
Master of Science in Electrical Engineering
Department
Department of Electrical and Computer Engineering
First Advisor
Steven K. Rogers, PhD
Abstract
The TIMIT and KING databases, as well as a ten-day AFIT speaker corpus, are used to compare proven spectral processing techniques to an auditory neural representation for speaker identification. The feature sets compared are Linear Predictive Coding (LPC) cepstral coefficients and auditory nerve firing rates from the Payton model. This auditory model captures the mechanisms found in the human middle and inner auditory periphery, as well as neural transduction. Clustering algorithms were used to generate speaker-specific codebooks: one statistically based and the other a neural approach. These algorithms are the Linde-Buzo-Gray (LBG) algorithm and the Kohonen self-organizing feature map (SOFM). The LBG algorithm consistently provided optimal codebook designs, with correspondingly better classification rates. The resulting Vector Quantization (VQ) distortion-based classification indicates that the auditory model provides slightly reduced recognition on clean, studio-quality recordings (LPC 100%, Payton 90%), yet achieves performance similar to the LPC cepstral representation in both degraded environments (both 95%) and on test data recorded over multiple sessions (both over 98%). A variety of normalization techniques, preprocessing procedures, and classifier fusion methods were examined for this biologically motivated feature set.
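For context, the sketch below illustrates the general VQ distortion-based classification scheme the abstract describes: an LBG-style codebook is trained per speaker from feature vectors (such as LPC cepstra), and a test utterance is assigned to the speaker whose codebook yields the lowest average quantization distortion. This is a minimal illustration under assumed choices (Euclidean distortion, NumPy, illustrative codebook sizes and random stand-in features), not the thesis implementation.

```python
# Sketch of LBG codebook training and VQ distortion-based speaker identification.
# All sizes, names, and data here are illustrative assumptions, not the thesis code.
import numpy as np

def lbg_codebook(features, size=64, eps=0.01, iters=20):
    """Train a codebook with the Linde-Buzo-Gray splitting procedure.

    features : (n_frames, n_dims) array of feature vectors
    size     : desired codebook size (a power of two)
    """
    codebook = features.mean(axis=0, keepdims=True)           # start with one centroid
    while codebook.shape[0] < size:
        # split every centroid into a perturbed +/- pair, then refine (Lloyd iterations)
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(iters):
            d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
            nearest = d.argmin(axis=1)
            for k in range(codebook.shape[0]):
                members = features[nearest == k]
                if len(members):
                    codebook[k] = members.mean(axis=0)
    return codebook

def avg_distortion(features, codebook):
    """Average distance from each frame to its nearest codeword."""
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()

def identify(test_features, codebooks):
    """Return the speaker whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda spk: avg_distortion(test_features, codebooks[spk]))

# Toy usage with random stand-ins for per-speaker cepstral feature matrices.
rng = np.random.default_rng(0)
codebooks = {f"spk{i}": lbg_codebook(rng.normal(i, 1.0, (500, 12)), size=8)
             for i in range(3)}
test = rng.normal(1, 1.0, (200, 12))
print(identify(test, codebooks))   # expected: spk1
```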
AFIT Designator
AFIT-GE-ENG-92D-11
DTIC Accession Number
ADA259076
Recommended Citation
Colombi, John M., "Cepstral and Auditory Model Features for Speaker Recognition" (1992). Theses and Dissertations. 7128.
https://scholar.afit.edu/etd/7128
Comments
The author's Vita page is omitted.