Wavelet Parameters for Speech Synthesis

Abstract:

A standard method of analyzing human speech is to divide it into its constituent linguistic elements. These elements, called phonemes, are considered the basic building blocks of speech; when concatenated, they form meaningful words and phrases. Synthesizing speech from individual phonemes, however, can produce discontinuous and sometimes unintelligible sounds, because the transitions between phonemes that occur naturally during speech production are often missing from synthesized speech. The phenomenon responsible for these transitions is called coarticulation, the overlap of one articulation with its neighbors. A recently proposed model of speech coarticulation uses wavelet system characterization to describe the time-frequency behavior of a coarticulated utterance. The objectives of this research are to verify the proposed wavelet system coarticulation model by resynthesizing speech that was analyzed with it, and to explore the model by using it to synthesize new speech.

The coarticulated speech studied here is the consonant-vowel-consonant combination that occurs in words such as "deed" or "bib", where the two consonants are the same. These words, along with their vowels spoken in isolation, are recorded digitally and processed to create the wavelet system model of coarticulated speech. The result is a wavelet transform of the coarticulated word with respect to its isolated vowel, which shows how the vowel changes in time and frequency when spoken in context. Synthesizing the original speech by the inverse process serves to verify the model. An informal visual and aural comparison of the synthesized speech with the original shows that the two are nearly identical and that the wavelet system characterization process is reversible. When the coarticulation model of one word is applied to isolated vowels different from that word's own vowel, the resulting utterances are consistently identified as the original word. It is concluded that substituting a different vowel into the wavelet system coarticulation model affects only how accurately the original word is reproduced; it does not synthesize new words. The insights gained from this research may be used to produce more intelligible synthetic speech.
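The analysis/resynthesis loop described above can be illustrated with a small sketch. The abstract does not give the exact form of the wavelet system characterization, so the following rests on stated assumptions. It treats the coarticulated word as the output of a time-varying system driven by the isolated vowel, and takes the system function to be the pointwise ratio of the two signals' complex band coefficients. In place of a true continuous wavelet transform, it uses a hypothetical log-spaced Gaussian filter bank normalized to a tight frame so that the inverse is exact. It also assumes the word and vowel recordings are equal in length and time-aligned. All function names (`filter_bank`, `analyze`, `synthesize`, `characterize`, `resynthesize`) are illustrative, and the random signals are synthetic stand-ins, not speech.

```python
import numpy as np


def filter_bank(n, num_bands=24, f_min=0.005):
    """Hypothetical wavelet-like bank: log-spaced Gaussian band-pass filters
    plus a low-pass residual, normalised so the squared responses sum to one
    at every FFT bin (a tight frame), which makes analysis/synthesis exact."""
    f = np.abs(np.fft.fftfreq(n))                   # |frequency|, cycles/sample
    centres = np.geomspace(f_min, 0.5, num_bands)
    width = np.log(centres[1] / centres[0])         # spacing on a log-frequency axis
    log_f = np.log(np.maximum(f, 1e-12))            # avoid log(0) at DC
    bands = [np.exp(-0.5 * ((log_f - np.log(c)) / width) ** 2) for c in centres]
    lowpass = np.exp(-0.5 * (f / f_min) ** 2)       # covers DC and lowest frequencies
    g = np.vstack([lowpass] + bands)
    return g / np.sqrt(np.sum(g ** 2, axis=0))


def analyze(x, g):
    """Complex (analytic) band coefficients W[k, t]; Re(W[k]) is band k of x."""
    n = len(x)
    h = np.zeros(n)                                 # analytic-signal mask
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0                             # Nyquist bin for even n
    return np.fft.ifft(np.fft.fft(x)[None, :] * g * h[None, :], axis=1)


def synthesize(W, g):
    """Inverse of analyze: re-filter the real part of each band and sum."""
    bands = np.fft.fft(W.real, axis=1) * g
    return np.fft.ifft(bands.sum(axis=0)).real


def characterize(word, vowel, g, rel_eps=1e-12):
    """Assumed system function H = W_word / W_vowel, regularised where the
    vowel has almost no energy in a given band and instant."""
    if len(word) != len(vowel):
        raise ValueError("word and vowel must be equal-length, aligned signals")
    Wy, Wx = analyze(word, g), analyze(vowel, g)
    power = np.abs(Wx) ** 2
    eps = rel_eps * power.max()
    return Wy * np.conj(Wx) / (power + eps)


def resynthesize(H, vowel, g):
    """Apply a system function to a (possibly different) vowel and invert."""
    return synthesize(H * analyze(vowel, g), g)


# Usage with synthetic stand-ins (not real recordings).
rng = np.random.default_rng(0)
n = 4096
vowel = rng.standard_normal(n)
word = vowel * np.hanning(n) + 0.1 * rng.standard_normal(n)
g = filter_bank(n)

H = characterize(word, vowel, g)
rebuilt = resynthesize(H, vowel, g)            # verification: same vowel
other_vowel = rng.standard_normal(n)
new_utterance = resynthesize(H, other_vowel, g)  # exploration: different vowel
print("relative reconstruction error:",
      np.max(np.abs(rebuilt - word)) / np.max(np.abs(word)))
```

Reapplying `H` to the same vowel mirrors the verification step: apart from the small regularization term, it reproduces the word. Reapplying it to a different vowel mirrors the exploration step. A tight-frame filter bank was chosen over a sampled continuous wavelet transform only because it gives an exact, easily checked inverse.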


Brian Tuttle

Graduate Program in Acoustics


Page Last Modified Tuesday, 02-Jan-2007 02:38:46 EST