Speaker Normalization Methods for Vowel Cognition: Comparative Analysis Using Neural Network and Nearest Neighbor Classifiers
Carpenter, Gail A.
Govindarajan, Krishna K.
MetadataShow full item record
Intrinsic and extrinsic speaker normalization methods are systematically compared using a neural network (fuzzy ARTMAP) and L1 and L2 K-Nearest Neighbor (K-NN) categorizers trained and tested on disjoint sets of speakers of the Peterson-Barney vowel database. Intrinsic methods include one nonscaled, four psychophysical scales (bark, bark with endcorrection, mel, ERB), and three log scales, each tested on four combinations of F0 , F1, F2, F3. Extrinsic methods include four speaker adaptation schemes, each combined with the 32 intrinsic methods: centroid subtraction across all frequencies (CS), centroid subtraction for each frequency (CSi), linear scale (LS), and linear transformation (LT). ARTMAP and KNN show similar trends, with K-NN performing better, but requiring about ten times as much memory. The optimal intrinsic normalization method is bark scale, or bark with endcorrection, using the differences between all frequencies (Diff All). The order of performance for the extrinsic methods is LT, CSi, LS, and CS, with fuzzy ARTMAP performing best using bark scale with Diff All; and K-NN choosing psychophysical measures for all except CSi.
RightsCopyright 1993 Boston University. Permission to copy without fee all or part of this material is granted provided that: 1. The copies are not made or distributed for direct commercial advantage; 2. the report title, author, document number, and release date appear, and notice is given that copying is by permission of BOSTON UNIVERSITY TRUSTEES. To copy otherwise, or to republish, requires a fee and / or special permission.