Supplemental Materials
The material below supplements selected publications by Peter Birkholz and coworkers. It contains, for example, audio stimuli that were used in the perception experiments described in the papers.
Birkholz P, Gabriel F, Kürbis S, Echternach M (submitted). How the peak glottal area affects LPC-based formant estimates of vowels.
The supplemental material here contains the CAD files for 3D-printing the ten vocal tract resonators, the CAD files needed to create the silicone vocal fold model, and the audio files of the synthetic vowels generated with the physical models and the computer simulations.
Birkholz P, Stone S, Kürbis S (2019). Comparison of different methods for the voiced excitation of physical vocal tract models. In: Birkholz P, Stone S (eds.) Studientexte zur Sprachkommunikation: Elektronische Sprachsignalverarbeitung 2019 (TUDPress, Dresden)
The supplemental material here contains the CAD files for 3D-printing the eight vocal tract resonators for the eight long German vowels and the audio files of the stimuli generated using the different excitation methods.
Birkholz P, Pape D (submitted). How modeling entrance loss and flow separation in a two-mass model affects the oscillation and synthesis quality.
The supplemental material here contains the audio stimuli used for the perception experiments.
Stone S, Marxen M, Birkholz P (submitted). Construction and Evaluation of a Parametric One-Dimensional Vocal Tract Model.
The supplemental material here contains the audio stimuli used for the perception experiments and an additional table with parameter values.
Klause F, Stone S, Birkholz P (2017). A head-mounted camera system for the measurement of lip protrusion and opening during speech production. In: Trouvain J, Steiner I, Möbius B (eds.) Studientexte zur Sprachkommunikation: Elektronische Sprachsignalverarbeitung 2017 (TUDPress, Dresden), pp. 145-151 [pdf]
This paper presents a head-mounted camera system for the simultaneous measurement of lip protrusion and opening during speech production. All necessary files for the construction of the camera helmet, and the software and scripts for the extraction of the parameters can be downloaded here.
Birkholz P, Martin L, Xu Y, Scherbaum S, Neuschaefer-Rube C (2016). Manipulation of the prosodic features of vocal tract length, nasality and articulatory precision using articulatory synthesis. Computer Speech & Language, 41, pp. 116-127
In this study we examined the articulatory generation of secondary prosodic features using articulatory speech synthesis. To this end, we manipulated certain articulatory features of re-synthesized German words and asked listeners to rate the prosodic effects. We applied the following articulatory manipulations:
- Best possible resynthesis of the natural utterance (no manipulation)
- Simulation of a longer vocal tract (larynx lowering and lip protrusion)
- Simulation of a shorter vocal tract (larynx raising and lip retraction)
- Nasalized articulation of all sonorants
- Reduced articulatory effort (lower speed of target approximation)
- Increased articulatory effort (higher speed of target approximation)
- Slight centralization of all vowels and consonants (towards an indifferent articulation)
- Strong centralization of all vowels and consonants
Birkholz P, Martin L, Willmes K, Kröger BJ, Neuschaefer-Rube C (2015). The contribution of phonation type to the perception of vocal emotions in German: an articulatory synthesis study. Journal of the Acoustical Society of America, 137(3), pp. 1503–1512
In this study we examined how changing the phonation type alone affects the perception of vocal emotions in re-synthesized portrayed emotional utterances when the other prosodic parameters (phone duration, pitch) remain the same as in the original utterance. Below are the stimuli for one of the examined sentences ("Der Lappen liegt auf dem Eisschrank"), which was originally spoken with seven emotional expressions. For each emotion, you will first hear the original utterance re-synthesized as well as possible with regard to phonation type, phone duration, and pitch. Then you will hear the same sentence with the phonation type replaced by purely breathy voice, purely modal voice, and purely pressed voice.
Mumtaz R, Preuß S, Neuschaefer-Rube C, Hey C, Sader R, Birkholz P (2014). Tongue Contour Reconstruction from Optical and Electrical Palatography. IEEE Signal Processing Letters, 21(6), pp. 658-662
In this study we examined the potential of optical and electrical palatography (OPG and EPG) for the reconstruction of the tongue contour. To this end, we extracted the tongue contour and the corresponding (virtual) EPG and OPG data from MRI corpora of the vocal tract of two different speakers, and trained linear models to predict the tongue contours from the sensor data.
This supplemental material contains the tongue and vocal tract contours extracted from the MRI samples as SVG files, the tongue points and simulated sensor data in Excel tables, and MATLAB scripts for the cross-validation and EPG index calculation.
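The general idea of predicting contour points from sensor data with a linear model, and scoring it by cross-validation, can be illustrated with a short sketch. This is not the code from the supplemental material (which is provided as MATLAB scripts) and not necessarily the procedure used in the paper: the leave-one-out scheme, the function names, the array sizes, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def fit_linear(X, Y):
    """Least-squares fit of Y ~ [X, 1] @ W (linear map plus offset)."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    W, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return W


def predict_linear(W, X):
    """Apply a fitted linear model to new sensor vectors."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    return A @ W


def leave_one_out_rmse(X, Y):
    """Leave-one-out cross-validation: hold out each sample once,
    fit on the rest, and return the overall RMSE of the predictions."""
    n = X.shape[0]
    sq_errors = []
    for i in range(n):
        train = np.arange(n) != i
        W = fit_linear(X[train], Y[train])
        pred = predict_linear(W, X[i:i + 1])
        sq_errors.append(np.mean((pred - Y[i:i + 1]) ** 2))
    return float(np.sqrt(np.mean(sq_errors)))


# Synthetic stand-in data (hypothetical sizes, NOT the MRI-derived data):
# rows are samples, X holds sensor readings, Y holds contour coordinates.
rng = np.random.default_rng(0)
n_samples, n_sensors, n_coords = 40, 5, 6
X = rng.normal(size=(n_samples, n_sensors))
true_W = rng.normal(size=(n_sensors, n_coords))
Y = X @ true_W + 0.01 * rng.normal(size=(n_samples, n_coords))

rmse = leave_one_out_rmse(X, Y)
print(f"leave-one-out RMSE: {rmse:.4f}")
```

With real data, the rows of `X` would be the (virtual) EPG or OPG readings for each MRI sample and the rows of `Y` the corresponding tongue points; the held-out error then indicates how well the contour can be reconstructed for samples the model has not seen.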
Birkholz P, Kröger BJ, Neuschaefer-Rube C (2011). Articulatory synthesis of words in six voice qualities using a modified two-mass model of the vocal folds. In: Proc. of the 1st International Workshop on Performative Speech and Singing Synthesis, Vancouver, BC [pdf]
In this study we examined the capability of our extended two-mass model of the vocal folds to synthesize words in different voice qualities. The following stimuli were synthesized and judged by listeners with respect to the perceived voice quality:
- Intended normal voice quality
- Intended pressed voice quality
- Intended breathy voice quality
- Intended whispery voice quality (pressed breathy/murmur)
- Intended vocal fry
- Intended falsetto
Birkholz P, Kröger BJ, Neuschaefer-Rube C (2010). Stimmsynthese mit einem Zwei-Massen-Modell der Stimmlippen mit dreieckigem Öffnungsquerschnitt. In: 27th Jahrestagung der DGPP, Aachen, Germany [pdf]
- Synthetic vowel stimuli for several steps on the continuum from the minimal to the maximal glottal rest area using the classical two-mass model of the vocal folds (wav). All stimuli were perceived as having a slightly pressed to normal voice quality.
- Synthetic vowel stimuli for several steps on the continuum from the minimal to the maximal glottal rest area using the new two-mass model of the vocal folds with a triangular glottis shape in the rest position (wav). As in reality, the voice quality changes from pressed through normal to breathy as the glottal rest area increases.