The module Wave_synth generates speech from the information available in the utterance structure. The items relevant for this conversion are each phone's name, duration, and pitch.
In its current implementation, the waveform synthesis is based on the diphones of the MBROLA project ([2] http://tcts.fpms.ac.be/synthesis/). The MBROLA project provides a synthesis engine (phone-to-speech conversion) and diphone databases for a large number of languages, including German. We currently base our voices on the male (de2) and female (de3 and de1) German voices of the MBROLA project.
The interface between Festival and the MBROLA synthesis engine is a file that contains a list of phones with durations and pitch information. This file is fed into the MBROLA synthesis engine. The resulting speech is then loaded back into the Festival utterance and can be played.
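The file-based interface described above can be illustrated with a minimal sketch. It is not the Wave_synth code; it assumes the usual MBROLA .pho layout, in which each line holds a phone name, a duration in milliseconds, and optional pairs of (position in percent of the duration, pitch in Hz). The function name format_pho, the Segment type, and the example phones and values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# One segment: (phone name, duration in ms, pitch targets).
# Each pitch target is (position in % of the phone's duration, pitch in Hz).
# This tuple layout is an assumption made for the sketch, not Festival's
# internal representation of the utterance structure.
Segment = Tuple[str, int, Sequence[Tuple[int, int]]]


def format_pho(segments: List[Segment]) -> str:
    """Render segments as text in the assumed MBROLA .pho layout."""
    lines = []
    for name, duration_ms, targets in segments:
        fields = [name, str(duration_ms)]
        # Append each pitch target as "position pitch" after the duration.
        for position_pct, pitch_hz in targets:
            fields += [str(position_pct), str(pitch_hz)]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


# Illustrative values only: "_" marks silence, phone names follow SAMPA.
example: List[Segment] = [
    ("_", 100, []),
    ("h", 60, []),
    ("a", 120, [(0, 120), (100, 110)]),
    ("l", 70, []),
    ("o:", 180, [(50, 100)]),
    ("_", 100, []),
]

pho_text = format_pho(example)
print(pho_text, end="")
```

Text produced this way would be written to a file and handed to the MBROLA engine together with a voice database such as de2. The engine's output is the speech that is loaded back into the Festival utterance.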