Neural Dynamics of Variable Rate Speech Categorization
What is the neural representation of a speech code as it evolves in real time? A neural model of this process, called the ARTPHONE model, is developed to quantitatively simulate data concerning segregation and integration of phonetic percepts, as exemplified by the problem of distinguishing "topic" from "top pick" in natural discourse. Psychoacoustic data concerning categorization of stop consonant pairs indicate that the closure time between syllable-final (VC) and syllable-initial (CV) transitions determines whether consonants are segregated, i.e., perceived as distinct, or integrated, i.e., fused into a single percept. Hearing two stops in a VC-CV pair that are phonetically the same, as in "top pick," requires about 150 msec more closure time than hearing two stops in a VC₁-C₂V pair that are phonetically different, as in "odd ball." When the distribution of closure intervals over trials is experimentally varied, subjects' decision boundaries between one-stop and two-stop percepts always occur near the mean closure interval (Repp, 1980). The neural model traces these properties to dynamical interactions between a working memory for short-term storage of phonetic items and a list categorization network that groups, or chunks, sequences of the phonetic items in working memory. These interactions automatically adjust their processing rate to the speech rate via automatic gain control. The speech code in the model is a resonant wave that emerges after bottom-up signals from the working memory select list chunks which, in turn, read out top-down expectations that amplify consistent working memory items. The resonance between bottom-up and top-down information develops on a slower time scale than the processing of bottom-up information alone. It focuses attention upon speech groupings in working memory that are expected based upon past experience, while inhibiting speech features that are not expected, as in phonemic restoration.
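The boundary-tracking finding above can be illustrated with a minimal sketch. This is a hypothetical toy model, not the ARTPHONE equations: a leaky running mean of recent closure intervals serves as the one-stop/two-stop decision boundary, so the boundary settles near the mean of whatever closure-interval distribution a block of trials presents. All parameter values here are arbitrary assumptions for illustration.

```python
# Hypothetical toy model (not the ARTPHONE equations): the decision
# boundary drifts toward a leaky running mean of recent closure intervals,
# so it ends up near the mean of the block's distribution, as in Repp (1980).
import random

def run_block(closure_intervals, rate=0.1):
    boundary = 100.0  # initial boundary in msec (arbitrary assumption)
    percepts = []
    for closure in closure_intervals:
        percepts.append("two stops" if closure > boundary else "one stop")
        boundary += rate * (closure - boundary)  # leaky running-mean update
    return boundary, percepts

random.seed(0)
short_block = [random.gauss(80, 15) for _ in range(200)]   # mean closure 80 msec
long_block = [random.gauss(180, 15) for _ in range(200)]   # mean closure 180 msec
b_short, _ = run_block(short_block)
b_long, _ = run_block(long_block)
print(round(b_short), round(b_long))  # each boundary settles near its block's mean
```

The same stimulus can thus fall on either side of the boundary depending on the statistics of the surrounding trials, which is the signature of a rate- and context-adaptive categorization process.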
As in other examples drawn from Adaptive Resonance Theory, it is proposed that all conscious speech percepts are resonant events. In the case of VC₁-C₂V pairs, such a resonance may be rapidly reset by inputs, such as C₂, that are inconsistent with a top-down expectation, say of C₁, or, in the absence of a top-down mismatch, by a collapse of resonant activation due to a habituative process that can take a much longer time to occur, as illustrated by the categorical boundary for VC-CV pairs. The categorical boundary for integration of VC-CV persists 150 msec longer than that of VC₁-C₂V because of the resonant dynamics that subserve perception of C. These categorization data may thus be understood as emergent properties of a resonant process that adjusts its dynamics to track the speech rate.
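The two offset mechanisms described above, fast reset by top-down mismatch versus slow collapse through habituation, can be caricatured in a few lines. This is an assumed two-timescale sketch, not the paper's dynamics: resonant activity tracks a slowly depleting transmitter gate, while a mismatching input suppresses the activity much faster, so the same resonance lasts far longer when no mismatch arrives. All coefficients are illustrative assumptions.

```python
# Assumed two-timescale caricature (not the ARTPHONE equations): resonant
# activity x tracks a slowly habituating transmitter gate z; a mismatching
# input (e.g., C2 against the expectation of C1) resets x much faster.
def resonance_duration(mismatch_at=None, threshold=0.1, max_steps=2000):
    x, z = 1.0, 1.0  # resonant activity and habituative gate
    for t in range(max_steps):
        z += -0.01 * z * x   # gate depletes slowly while resonance is active
        x += 0.1 * (z - x)   # activity tracks the depleting gate
        if mismatch_at is not None and t >= mismatch_at:
            x *= 0.5         # fast reset by top-down mismatch
        if x < threshold:
            return t         # resonance has collapsed
    return max_steps

print(resonance_duration())                # slow habituative collapse (hundreds of steps)
print(resonance_duration(mismatch_at=50))  # collapse within a few steps of the mismatch
```

The large gap between the two durations is the toy analogue of the roughly 150-msec difference between the VC-CV and VC₁-C₂V categorical boundaries.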