The first step in synthesizing text is tokenization. A
token is, generally speaking, an atom separated from the rest
of the text by whitespace. Punctuation marks at the beginning or end of a
token are stripped from it and saved as features of the corresponding item
in the relation named Token. These features can be accessed with
(item.feat TOKEN "punc") or (item.feat TOKEN "whitespace"),
where TOKEN is an item of that relation. Punctuation
marks and whitespace characters are defined in the file
festival/lib/token.scm:
(defvar token.punctuation "\"'`.,:;!?(){}[]")
(defvar token.prepunctuation "\"'`({[")
(defvar token.whitespace "\t \n\r")
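The rules above can be sketched in Python. This is a simplified re-implementation for illustration only, not Festival's own tokenizer. It splits the text at the whitespace characters, strips trailing punctuation and leading prepunctuation from each raw token, and stores them in fields named after the Festival features "whitespace", "prepunctuation" and "punc". The Token class and the tokenize function are hypothetical names, and keeping a token that consists only of punctuation unchanged is an assumption of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass

# Character classes as quoted from festival/lib/token.scm.
PUNCTUATION = "\"'`.,:;!?(){}[]"
PREPUNCTUATION = "\"'`({["
WHITESPACE = "\t \n\r"


@dataclass
class Token:
    name: str
    whitespace: str = ""      # whitespace preceding the token
    prepunctuation: str = ""  # leading punctuation stripped from the token
    punc: str = ""            # trailing punctuation stripped from the token


def tokenize(text: str) -> list[Token]:
    """Hypothetical sketch of whitespace/punctuation tokenization."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        # Collect the whitespace that precedes the next token.
        start = i
        while i < n and text[i] in WHITESPACE:
            i += 1
        ws = text[start:i]
        if i == n:
            break
        # Collect the raw token up to the next whitespace character.
        start = i
        while i < n and text[i] not in WHITESPACE:
            i += 1
        raw = text[start:i]
        # Strip trailing punctuation first, then leading prepunctuation.
        end = len(raw)
        while end > 0 and raw[end - 1] in PUNCTUATION:
            end -= 1
        begin = 0
        while begin < end and raw[begin] in PREPUNCTUATION:
            begin += 1
        if begin == end:
            # Assumption: a token made only of punctuation keeps its raw text.
            tokens.append(Token(raw, ws))
        else:
            tokens.append(Token(raw[begin:end], ws, raw[:begin], raw[end:]))
    return tokens
```

For the input He said: "Hello, world!" this yields the names He, said, Hello and world; the token Hello carries the prepunctuation " and the punctuation ",", and world carries the punctuation !".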
For the German version of Festival, this tokenization is extended only
by the hyphen ("-"), which is added to token.prepunctuation (file
festival/lib/german/ims_german_voices.scm). That
is, hyphenated compounds are split into words, with the hyphen serving
as the delimiter.
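The described effect on hyphenated compounds can be sketched by treating the hyphen as one more delimiter alongside the default whitespace characters. This is a hypothetical illustration of the behaviour stated above, not the code in ims_german_voices.scm; the name split_words is made up, and punctuation handling is deliberately left out to isolate the delimiter rule.

```python
from __future__ import annotations

# Assumption: default Festival whitespace plus the hyphen as an extra delimiter.
DELIMITERS = "\t \n\r" + "-"


def split_words(text: str) -> list[str]:
    """Split text at whitespace and hyphens, dropping empty pieces."""
    words, current = [], []
    for ch in text:
        if ch in DELIMITERS:
            # A delimiter ends the current word, if one is in progress.
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words
```

For example, split_words("Ost-West-Konflikt heute") returns ["Ost", "West", "Konflikt", "heute"].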