Speech and Hearing Science Lab Vienna


Welcome to the Speech and Hearing Sciences Lab!

The SHS-Lab is a work group affiliated with the Medical University of Vienna (MUV), Department of Otorhinolaryngology, Division of Phoniatrics-Logopedics.

The research conducted at the SHS-Lab focuses on voice disorders and their clinical care, especially on techniques for voice quality assessment. This includes research on pathological voice production and its acoustic emissions, as well as on the human auditory sensations provoked by pathological speech sounds. A second, more recent focus of the Lab is auditory health assessment using computer models of normal and impaired hearing. From 2018 to 2022, the Lab hosts the research project KLI722-B30, funded by the Austrian Science Fund within its Clinical Research Program.

The Lab's general scientific framework is as follows. Various kinds of computer models of pathological speech and hearing are used for (i) health status monitoring via functional assessment, and (ii) treatment effect prediction based on statistical modelling. Particular emphasis is placed on the physiological interpretability and explanatory power of the models, avoiding typical black-box classification approaches whenever feasible. Translational research and personalized medicine are fostered by the close proximity to clinical facilities, which is a particular advantage for a signal-processing-focused group.
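
To illustrate this modelling philosophy, the following Python sketch fits an interpretable logistic regression to hypothetical pre-treatment voice features. The feature names, data, and model choice are illustrative assumptions for this page, not the Lab's actual pipeline.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Hypothetical pre-treatment voice features for 50 patients:
    # jitter (%), shimmer (dB), harmonics-to-noise ratio (dB).
    X = rng.normal(loc=[1.0, 0.35, 15.0], scale=[0.4, 0.1, 4.0], size=(50, 3))
    # Hypothetical binary outcome: 1 = voice quality improved after treatment.
    y = (X[:, 2] + rng.normal(0.0, 2.0, 50) > 15.0).astype(int)

    model = LogisticRegression().fit(X, y)

    # Unlike a black-box classifier, the fitted coefficients can be read as
    # per-feature log-odds contributions, which supports interpretation.
    for name, coef in zip(["jitter", "shimmer", "HNR"], model.coef_[0]):
        print(f"{name}: {coef:+.3f} log-odds per unit")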

Particular research foci are as follows. The Lab hosts the ‘Database of Pathological and Non-Pathological Voices’, which currently comprises approximately 700 high-speed videos with simultaneous high-quality audio recordings of 160 subjects (as of Oct. 2019). Voice quality types currently under investigation include (i) diplophonia, in which two pitches are perceived in the voice simultaneously; (ii) vocal fry, which is characterized by the perception of individual glottal pulses; (iii) phase differences observed in vocal fold vibration, which are related to vocal health status; and (iv) voices with extra glottal pulses, which cause a rough or raspy voice quality.

The videos are analyzed via computational imaging, in particular graphical segmentation. Synthetic videos of vocal fold vibration are used as well. With regard to diplophonia, a particular focus is the tracking of multiple simultaneous fundamental frequencies. Analysis-by-synthesis is applied to either glottal area waveforms or audio waveforms (both techniques are sketched below).

Auditory perceptual phenomena are linked with particular signal types either qualitatively, or quantitatively by means of computational hearing models, and are validated against data obtained from expert listeners. A current focus with regard to impaired hearing is the distinction of cochlear gain loss from synaptopathy, using computational hearing models for diagnostic purposes, as well as the extraction of auditory health parameters for treatment effect prediction and treatment type selection, e.g., hearing aids or cochlear implants.
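
As an illustration of tracking multiple simultaneous fundamental frequencies, the following Python sketch picks the two strongest low-frequency spectral peaks of a synthetic two-source signal. The signal and all parameter values are illustrative assumptions; the Lab's actual tracking methods are more sophisticated than simple peak picking.

    import numpy as np
    from scipy.signal import find_peaks

    fs = 16000  # sampling rate in Hz (illustrative)
    t = np.arange(0, 0.5, 1 / fs)

    # Synthetic diplophonic signal: two harmonic sources at 110 Hz and 150 Hz.
    x = sum(np.sin(2 * np.pi * k * 110 * t) / k for k in range(1, 6)) \
        + sum(np.sin(2 * np.pi * k * 150 * t) / k for k in range(1, 6))

    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), 1 / fs)

    # Take the two strongest spectral peaks below 300 Hz as F0 candidates.
    low = freqs < 300
    peaks, props = find_peaks(spectrum[low], height=0)
    top2 = peaks[np.argsort(props["peak_heights"])[-2:]]
    print("Estimated F0s (Hz):", np.sort(freqs[low][top2]).round(1))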
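
The analysis-by-synthesis principle can likewise be sketched in a few lines: a parametric waveform model is resynthesized for a range of candidate parameter values, and the candidate that minimizes the resynthesis error is selected. The one-parameter raised-cosine model below is a deliberate simplification for illustration only.

    import numpy as np

    fs = 4000  # high-speed video frame rate in Hz (illustrative)
    t = np.arange(0, 0.1, 1 / fs)

    def synthesize(f0):
        # Raised-cosine pulse train as a crude glottal area model.
        return 0.5 * (1 - np.cos(2 * np.pi * f0 * t))

    # "Observed" glottal area waveform: a synthetic target plus noise.
    observed = synthesize(123.0) + np.random.default_rng(1).normal(0.0, 0.05, len(t))

    # Analysis-by-synthesis: search for the F0 whose resynthesis best
    # matches the observation in the least-squares sense.
    candidates = np.linspace(80.0, 200.0, 1201)
    errors = [np.sum((observed - synthesize(f0)) ** 2) for f0 in candidates]
    print(f"Estimated F0: {candidates[int(np.argmin(errors))]:.1f} Hz")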