NovaSpeech is currently focusing on a hybrid speech synthesis project funded in part by grant #R44 DC006761-02 from the National Institute on Deafness and Other Communication Disorders of the National Institutes of Health. This project expands on techniques first presented by Hertz (2002), in which the best features of concatenative unit selection and formant-based synthesis are combined. In our hybrid approach, for both limited and unlimited vocabulary applications, only a small number of “intrinsic units” from the target speaker, such as stressed vowels, need to be prestored; other “adaptable units” can be produced by rule.
In the first phase of our project, we validated hybrid synthesis as a viable technique, demonstrating convincingly through formal perceptual tests that the majority of segments in an utterance can be replaced by formant-synthesized segments or by segments from other speakers (often even from speakers of the opposite sex and vastly different ages) with virtually no degradation to the resulting speech quality. The hybrid utterances were perceived as natural, as sounding like the intended speaker, and as highly intelligible.
In the second phase of the project, we have been building an actual working hybrid system (initially for English). Unlike existing systems, which generally trade off one desirable property for another, this system will provide human-sounding, intelligible, and mimetic speech, yet have small storage requirements, support the cost-efficient addition of new voices, and be suitable for implementation on virtually any hardware platform. As a result, the technology will be well suited to virtually any unlimited vocabulary synthesis application.
The system will be of special benefit to speech-impaired individuals, who have a particularly great need for natural-sounding, individualized voices on a broad range of devices. With the hybrid system, individuals who know they will lose their voice due to illness or surgery will be able to cost-efficiently capture and utilize their pre-injury voice in a voice output communication aid; and all speech-impaired users will be able to obtain reliable, appropriate, individualized voices that can “grow” with them as they mature and age.
For more details about our experimental results and the perceptual models on which hybrid synthesis is based, see our Publications page.