Programmer Guide/SPU Reference/KLATTSYN
Revision as of 17:31, 18 November 2010
KLATTSYN - Klatt parameter synthesis
Usage:
KLATTSYN SR N SYNMODE OUTMODE TCONFIG TFRAME
Inputs:
The KLATTSYN atom expects the following six input parameters:
- SR
- The sampling rate.
- N
- The frame length (i.e. the number of samples per evaluation step).
- SYNMODE
- The synthesis mode. It specifies whether the single parameter set given in the TFRAME input is used in a loop to produce a stationary synthesis ("loop") or whether TFRAME supplies a predefined parameter set for each evaluation step ("list").
- OUTMODE
- The output mode. The following mode constants are supported:
- "all" for full synthesis output
- "voice" for output of only the voice source
- "aspiration" for output of only the aspiration source
- "frics" for the frication output
- "glotout" for output of voicing and aspiration
- "par_glotout" for output of voicing and aspiration in the parallel tract
- "outbypas" for output of only the bypass path
- "sourc" for source output
- TCONFIG
- Global configuration parameter table. See the description of the global configuration table under Notes for details.
- TFRAME
- Frame configuration parameter table. See the description of the frame parameter table under Notes for details.
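The difference between the two SYNMODE values can be illustrated with a small Python sketch. This is not STx code and not the atom's implementation; the function name `frame_for_step` and the representation of TFRAME as a list of dictionaries are assumptions made for illustration only.

```python
def frame_for_step(synmode, tframe_rows, step):
    """Return the parameter row used at evaluation step `step` (0-based).

    Hypothetical helper: `tframe_rows` stands in for the TFRAME table.
    """
    if not tframe_rows:
        raise ValueError("TFRAME needs at least one row")
    if synmode == "loop":
        # Stationary synthesis: the same parameter set is reused at every step.
        return tframe_rows[0]
    if synmode == "list":
        # One predefined parameter set per evaluation step.
        return tframe_rows[step]
    raise ValueError(f"unknown SYNMODE: {synmode!r}")


rows = [{"F0": 120, "AV": 60}, {"F0": 110, "AV": 55}]
looped = frame_for_step("loop", rows, 5)   # always the first row
listed = frame_for_step("list", rows, 1)   # the row for step 1
```

In "loop" mode only the first row matters, which matches the note that the frame table needs only one row in that mode.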
Outputs:
The atom has the following outputs:
- Y
- The generated output signal.
- I
- The number of synthesis frames.
- T
- The length of the synthesized signal.
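As a rough orientation, the relation between SR, N and the number of frames can be worked out as below. The sketch assumes that every synthesis frame contributes exactly N samples; the unit in which the atom reports T (samples or seconds) is not stated here, so both are computed. The function names are illustrative and not part of STx.

```python
def frame_duration_s(sr, n):
    """Duration of one frame of n samples at sampling rate sr (in Hz)."""
    return n / sr


def signal_length(sr, n, frames):
    """Return (samples, seconds), assuming each frame yields n samples."""
    samples = n * frames
    return samples, samples / sr


# Example values chosen for illustration: 16 kHz, 80 samples per frame.
step_s = frame_duration_s(16000, 80)             # 5 ms per evaluation step
samples, seconds = signal_length(16000, 80, 200)
```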
Function:
The KLATTSYN SPAtom encapsulates the synthesizer functionality provided by the Klatt C++ class and can be used within SPUs in the STx macro language.
Klatt parameter synthesis generates synthesized speech from a set of parameters. It was developed by Dennis H. Klatt and described in Klatt, D.H. (1980), "Software for a cascade/parallel formant synthesizer", Journal of the Acoustical Society of America 67 (3), 971-995. The parameter set used by this implementation is based on the refined set described in Klatt, D.H. and Klatt, L.C. (1990), "Analysis, synthesis, and perception of voice quality variations among female and male talkers", Journal of the Acoustical Society of America 87 (2), 820-857.
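For readers unfamiliar with formant synthesis, the basic building block of the synthesizer described in Klatt (1980) is a second-order digital resonator defined by a formant frequency F and bandwidth BW. The Python sketch below follows the difference equation from that paper; it is meant as an illustration of the principle and makes no claim about how the STx C++ class is written. The function names are illustrative.

```python
import math


def resonator_coeffs(f_hz, bw_hz, sr):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2] (Klatt 1980)."""
    t = 1.0 / sr                                   # sampling period in seconds
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * f_hz * t)
    a = 1.0 - b - c                                # gives unity gain at 0 Hz
    return a, b, c


def resonate(x, a, b, c):
    """Filter the sample sequence x through the resonator."""
    y1 = y2 = 0.0                                  # previous two output samples
    out = []
    for xn in x:
        yn = a * xn + b * y1 + c * y2
        out.append(yn)
        y2, y1 = y1, yn
    return out


# A first formant at 500 Hz with 60 Hz bandwidth, 10 kHz sampling rate.
a, b, c = resonator_coeffs(500.0, 60.0, 10000.0)
step_response = resonate([1.0] * 3000, a, b, c)
```

In the cascade configuration several such resonators are connected in series, while in the parallel configuration each resonator has its own amplitude control and the outputs are summed.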
The implementation consists of two parts. The native C++ part includes a class that implements the actual synthesizer and an SPAtom class that encapsulates the synthesizer functionality and provides an interface for the STx macro language. The second part consists of two STx classes: one for interfacing with the C++ part and one providing a set of toolbox functions for several contexts within STx.
Notes:
The frame parameter table (TFRAME) contains a row for each frame (or only one row for synthesis in loop mode). The following column names are evaluated:
Field | Description |
F0 | Voicing fundamental frequency in Hz |
AV | Amplitude of voicing in dB (0 to 70) |
F1 | First formant frequency in Hz (200 to 1300) |
B1 | First formant bandwidth in Hz (40 to 1000) |
F2 | Second formant frequency in Hz (550 to 3000) |
B2 | Second formant bandwidth in Hz (40 to 1000) |
F3 | Third formant frequency in Hz (1200 to 4999) |
B3 | Third formant bandwidth in Hz (40 to 1000) |
F4 | Fourth formant frequency in Hz (1200 to 4999) |
B4 | Fourth formant bandwidth in Hz (40 to 1000) |
F5 | Fifth formant frequency in Hz (1200 to 4999) |
B5 | Fifth formant bandwidth in Hz (40 to 1000) |
F6 | Sixth formant frequency in Hz (1200 to 4999) |
B6 | Sixth formant bandwidth in Hz (40 to 2000) |
FNZ | Nasal zero frequency in Hz (248 to 528) |
BNZ | Nasal zero bandwidth in Hz (40 to 1000) |
FNP | Nasal pole frequency in Hz (248 to 528) |
BNP | Nasal pole bandwidth in Hz (40 to 1000) |
ASP | Amplitude of aspiration in dB (0 to 70) |
Kopen | Number of samples in open period (10 to 65) |
Aturb | Breathiness in voicing (0 to 80) |
TLT | Voicing spectral tilt in dB (0 to 24) |
AF | Amplitude of frication in dB (0 to 80) |
Kskew | Skewness of alternate periods (0 to 40 in sample#/2) |
A1 | Amplitude of parallel 1st formant in dB (0 to 80) |
B1p | Par. 1st formant bandwidth in Hz (40 to 1000) |
A2 | Amplitude of F2 frication in dB (0 to 80) |
B2p | Par. 2nd formant bandwidth in Hz (40 to 1000) |
A3 | Amplitude of F3 frication in dB (0 to 80) |
B3p | Par. 3rd formant bandwidth in Hz (40 to 1000) |
A4 | Amplitude of F4 frication in dB (0 to 80) |
B4p | Par. 4th formant bandwidth in Hz (40 to 1000) |
A5 | Amplitude of F5 frication in dB (0 to 80) |
B5p | Par. 5th formant bandwidth in Hz (40 to 1000) |
A6 | Amplitude of F6 (same as r6pa) (0 to 80) |
B6p | Par. 6th formant bandwidth in Hz (40 to 2000) |
ANP | Amplitude of parallel nasal pole in dB (0 to 80) |
AB | Amplitude of bypass frication in dB (0 to 80) |
AVp | Amplitude of voicing, parallel, in dB (0 to 70) |
Gain0 | Overall gain (60 dB is unity) (0 to 60) |
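The ranges in the table above can be checked before a frame table is handed to the atom. The following Python sketch is an illustration only: whether KLATTSYN itself rejects or clamps out-of-range values is not stated here, and the names `FRAME_RANGES` and `out_of_range` are hypothetical. F0 is omitted because the table gives no range for it.

```python
# Ranges copied from the frame parameter table.
FRAME_RANGES = {
    "AV": (0, 70), "ASP": (0, 70), "AVp": (0, 70),
    "F1": (200, 1300), "F2": (550, 3000),
    "F3": (1200, 4999), "F4": (1200, 4999),
    "F5": (1200, 4999), "F6": (1200, 4999),
    "FNZ": (248, 528), "FNP": (248, 528),
    "Kopen": (10, 65), "Aturb": (0, 80), "TLT": (0, 24),
    "AF": (0, 80), "Kskew": (0, 40),
    "ANP": (0, 80), "AB": (0, 80), "Gain0": (0, 60),
}
# Bandwidths: 40 to 1000 Hz, except the sixth formant (40 to 2000 Hz).
for name in ("B1", "B2", "B3", "B4", "B5", "BNZ", "BNP",
             "B1p", "B2p", "B3p", "B4p", "B5p"):
    FRAME_RANGES[name] = (40, 1000)
for name in ("B6", "B6p"):
    FRAME_RANGES[name] = (40, 2000)
# Parallel formant amplitudes A1..A6: 0 to 80.
for i in range(1, 7):
    FRAME_RANGES[f"A{i}"] = (0, 80)


def out_of_range(row):
    """Return (field, value, low, high) for every value outside its range."""
    problems = []
    for field, value in row.items():
        limits = FRAME_RANGES.get(field)
        if limits is not None and not (limits[0] <= value <= limits[1]):
            problems.append((field, value, limits[0], limits[1]))
    return problems


frame = {"F0": 120, "AV": 60, "F1": 1500, "B1": 60, "Gain0": 60}
issues = out_of_range(frame)   # F1 exceeds the documented 200 to 1300 Hz
```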
The global configuration table (TCONFIG) contains three fields, named "ID", "NumVal" and "StrVal". Depending on the type of parameter, the value must be specified in either the "NumVal" field or the "StrVal" field.
ID | NumVal | StrVal | Description |
synthesismodel | | cascadeparallel, allparallel | Specifies whether the synthesizer should use the cascade tract for formant synthesis or only the parallel tract |
nfcascade | 0-6 | | If the cascade tract is used, this parameter specifies the number of formants to be used |
glsource | | impulsive, natural, sampled | Type of glottal source (impulsive, natural or sampled) |
samplefactor | 0.00001 | | Multiplication factor for glottal samples (default = 0.00001) |
f0flutter | 0-100 | | Percentage of f0 flutter (0-100, default = 0) |
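The rule that each parameter value goes into either "NumVal" or "StrVal" can be sketched in Python as follows. This is only a model of the table layout, not STx code: the helper `config_row`, the dictionary representation of a row and the use of `None` for the unused field are assumptions made for illustration.

```python
# Parameters whose value is given in "StrVal", with the documented choices.
STRING_VALUES = {
    "synthesismodel": ("cascadeparallel", "allparallel"),
    "glsource": ("impulsive", "natural", "sampled"),
}
# Parameters whose value is given in "NumVal", with the documented range.
NUMERIC_RANGES = {
    "nfcascade": (0, 6),
    "samplefactor": None,      # no range is documented, only a default
    "f0flutter": (0, 100),
}


def config_row(param_id, value):
    """Build one row of the global configuration table (hypothetical helper)."""
    if param_id in STRING_VALUES:
        if value not in STRING_VALUES[param_id]:
            raise ValueError(f"{param_id}: unsupported value {value!r}")
        return {"ID": param_id, "NumVal": None, "StrVal": value}
    if param_id in NUMERIC_RANGES:
        limits = NUMERIC_RANGES[param_id]
        if limits is not None and not (limits[0] <= value <= limits[1]):
            raise ValueError(f"{param_id}: {value} outside {limits}")
        return {"ID": param_id, "NumVal": value, "StrVal": None}
    raise KeyError(param_id)


tconfig = [
    config_row("synthesismodel", "cascadeparallel"),
    config_row("nfcascade", 5),
    config_row("glsource", "natural"),
    config_row("samplefactor", 0.00001),
    config_row("f0flutter", 0),
]
```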
The KLATTSYN atom was added to S_TOOLS-STx in version 3.9.