Cross-language study of voicing categorization of rate-varied speech

Kyoko Nagao and Kenneth de Jong

Dept. of Linguist., Indiana Univ., 322 Memorial Hall, Bloomington, IN 47405


Speaking rate pervasively affects the acoustic properties of linguistic categories. However, a wealth of previous literature shows that listeners can still systematically identify those categories. For example, voice onset time (VOT), a major acoustic cue differentiating voicing contrasts, becomes shorter as speech rate increases. Listeners' category boundaries on a VOT continuum shift to lower values when syllable duration decreases, and when asked to choose the best examples of a category, listeners select tokens with shorter VOTs at fast rates.
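The boundary shift described above is conventionally measured as the VOT at which a listener's identification function crosses 50% /p/ responses. A minimal sketch of that computation, using invented illustrative response proportions (not data from this study), might look like:

```python
def boundary_50(vots, p_props):
    """Estimate the /b/-/p/ category boundary as the VOT (ms) where the
    identification function crosses 50% /p/ responses, by linear
    interpolation between adjacent continuum steps."""
    for (v0, p0), (v1, p1) in zip(zip(vots, p_props),
                                  zip(vots[1:], p_props[1:])):
        if p0 < 0.5 <= p1:
            return v0 + (0.5 - p0) / (p1 - p0) * (v1 - v0)
    raise ValueError("identification function never crosses 50%")

# Hypothetical proportions of /p/ responses at each VOT step (ms):
short_syllable = boundary_50([0, 10, 20, 30, 40], [0.02, 0.10, 0.55, 0.90, 1.00])
long_syllable  = boundary_50([0, 10, 20, 30, 40], [0.01, 0.05, 0.30, 0.80, 0.98])

# Rate normalization predicts a lower boundary for shorter syllables:
print(short_syllable < long_syllable)  # True
```

In practice a logistic function would typically be fitted to the full response curve, but linear interpolation suffices to show how the boundary, and hence its rate-dependent shift, is extracted.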

These previous perception studies almost exclusively used computer-generated stimuli. The present study examines the effect of speech rate on voicing categorization in naturally produced speech. In addition, we examine whether the same effect obtains for listeners from different linguistic backgrounds. Four native speakers of American English repeated syllables (/bi/ and /pi/) at increasing rates in time with a metronome. Three-syllable stimuli were spliced from the repetitive speech and presented to English, Japanese, and Korean listeners. VOT and syllable duration were also measured for each syllable, showing overlap between the categories at the fastest rates.

In Experiment 1, English listeners exhibited rate normalization as in previous studies. However, they identified /p/ and /b/ with a category boundary at a much shorter VOT than boundaries found in previous studies with synthesized stimuli. This difference may be due to the extraordinarily wide range of VOT values used in those studies. The boundary found in the current study closely matches the actual division point between /b/ and /p/ productions.

In Experiment 2, English listeners identified each consonant and rated its goodness. Mean goodness ratings showed that the best /b/ shifts to lower VOT values and the best /p/ to higher VOT values when syllable duration is longer. Also, mean ratings were lower when listeners misidentified a consonant than when they identified it correctly, suggesting that listeners were sensitive to fine-grained differences in the fast, ambiguous stimuli.

In Experiment 3, Japanese and Korean listeners, with one group of each in the US and one in the country of origin, performed the same identification task as in Experiment 1. The same rate normalization effects were found for each of the four groups. However, their identification functions were not the same as those of the English listeners, but were systematically shifted in the direction of the native language. This cross-language difference strongly suggests that rate normalization here is not a general auditory mechanism, but is based on the distribution of consonants that the listeners have experienced.