Researchers at DTU Health Tech use computational models of speech understanding to explore ways of improving speech communication, for example in online voice-chat applications.
Most people have experienced that it is often easier to understand what someone is saying when their voice is spatially separated from interfering noises. What may be less well known is that we all carry a spatial filter of our own, created by the anatomy of the head and ears. This spatial filter is referred to as the head-related transfer function (HRTF).
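For a concrete picture of what such a spatial filter does: in digital audio, applying an HRTF amounts to convolving a mono signal with a left-ear and a right-ear head-related impulse response (HRIR). The sketch below is not from the article; the HRIRs are synthetic placeholders (real ones come from acoustic measurements or public databases), and it only illustrates the interaural time and level differences an HRTF introduces.

```python
# Minimal sketch: an HRTF as a per-ear filter. A mono signal convolved with
# left/right HRIRs yields a two-channel (binaural) signal.
import numpy as np
from scipy.signal import fftconvolve

fs = 44100                        # sample rate in Hz
mono = np.random.randn(fs)        # 1 s of placeholder "speech"

# Placeholder HRIRs (assumption, not measured data): a delayed, attenuated
# right ear mimics a source located on the listener's left.
hrir_left = np.zeros(256); hrir_left[0] = 1.0
hrir_right = np.zeros(256); hrir_right[30] = 0.6

binaural = np.stack([fftconvolve(mono, hrir_left),
                     fftconvolve(mono, hrir_right)], axis=1)
print(binaural.shape)             # (samples, 2): left- and right-ear signals
```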
Researchers from DTU Health Tech have investigated the following hypothesis: the benefit gained from spatially separating speech sources depends on the individual HRTF. Consequently, a person's anatomy could give them an advantage in understanding speech over individuals with a different anatomy.
Postdoc Axel Ahrens, who is part of the Hearing Systems section headed by Professor Torsten Dau at DTU Health Tech, has employed a computational speech intelligibility model that has previously been shown to correctly predict speech intelligibility in certain conditions. The work was done in collaboration with colleagues from the University of Malaga and Facebook Reality Labs. “Using the model, we showed that the speech intelligibility cues are largely different for various HRTFs. Thus, some listeners might have speech intelligibility advantages over others. However, it remains unclear if these differences can also be found in real-life scenarios or if other factors contribute to a larger extent,” Axel Ahrens says.
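To make the hypothesis tangible, here is a toy illustration, not the model used in the study: one simple HRTF-dependent intelligibility cue is the "better-ear" signal-to-noise ratio, i.e. the SNR at whichever ear receives less masker energy. With placeholder HRIRs standing in for two hypothetical anatomies, the predicted benefit differs between listeners.

```python
# Toy better-ear SNR comparison across two hypothetical listeners.
# All HRIRs below are invented placeholders, not measured HRTFs.
import numpy as np
from scipy.signal import fftconvolve

def better_ear_snr_db(target, masker, hrirs_target, hrirs_masker):
    """Broadband SNR at the better ear; hrirs_* are (left, right) pairs."""
    snrs = []
    for h_t, h_m in zip(hrirs_target, hrirs_masker):
        t = fftconvolve(target, h_t)
        m = fftconvolve(masker, h_m)
        snrs.append(10 * np.log10(np.sum(t**2) / np.sum(m**2)))
    return max(snrs)

rng = np.random.default_rng(0)
target = rng.standard_normal(44100)   # placeholder target speech
masker = rng.standard_normal(44100)   # placeholder interfering talker

# Listener B's (invented) anatomy shadows the lateral masker more strongly
# at the left ear, so the same geometry yields a larger better-ear SNR.
listener_a = dict(
    target=(np.r_[1.0, np.zeros(255)], np.r_[1.0, np.zeros(255)]),
    masker=(np.r_[0.8, np.zeros(255)], np.r_[0.5, np.zeros(255)]))
listener_b = dict(
    target=(np.r_[1.0, np.zeros(255)], np.r_[1.0, np.zeros(255)]),
    masker=(np.r_[0.3, np.zeros(255)], np.r_[0.9, np.zeros(255)]))

for name, l in [("A", listener_a), ("B", listener_b)]:
    snr = better_ear_snr_db(target, masker, l["target"], l["masker"])
    print(f"listener {name}: better-ear SNR = {snr:.1f} dB")
```

The study's model is considerably more sophisticated; this sketch only shows why different spatial filters can, in principle, yield different intelligibility cues for the same scene.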
These findings may, for example, have implications for virtual reality applications or video-chat programmes, where spatial separation of the voices could improve the communication experience. When virtually separating voices in applications such as Zoom or MS Teams, developers should consider choosing the spatial filter that gives the best possible speech-understanding advantage.
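As a rough sketch of that idea (an assumption on my part, not a method from the paper), a conferencing app could render each talker through a different HRIR pair so the voices arrive from distinct directions. The placeholder_hrir_pair helper below is hypothetical and only mimics interaural time and level differences; a real application would load measured HRIRs.

```python
# Sketch: spread three talkers across azimuths and mix one binaural stream.
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(1)
talkers = [rng.standard_normal(44100) for _ in range(3)]  # placeholder voices

def placeholder_hrir_pair(azimuth_deg, n=256):
    """Crude stand-in for a measured HRTF: the far ear gets a delayed,
    attenuated copy, with both effects growing with lateral angle."""
    itd = int(round(abs(azimuth_deg) / 90 * 30))      # delay in samples
    ild = 1.0 - 0.5 * abs(azimuth_deg) / 90           # level at the far ear
    near, far = np.zeros(n), np.zeros(n)
    near[0], far[itd] = 1.0, ild
    return (near, far) if azimuth_deg < 0 else (far, near)  # (left, right)

mix = None
for voice, az in zip(talkers, (-60, 0, 60)):          # distinct directions
    hl, hr = placeholder_hrir_pair(az)
    spatial = np.stack([fftconvolve(voice, hl), fftconvolve(voice, hr)], axis=1)
    mix = spatial if mix is None else mix + spatial
print(mix.shape)   # one binaural stream with three spatially separated voices
```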
Read more in the full paper here.
Photo: Colourbox.com