Distr@l feasibility study on the professional use of AI-generated German speaker voices in media production
We are delighted that our Distr@l research project will start on 1 September 2024!
The project is supported and funded by the Distr@l funding programme of the Hessian Ministry for Digitalization and Innovation.
The project is being carried out by the two partners VOX-OVER and ADACOR.
VOX-OVER specializes in voice production for high-quality media productions such as film dubbing, audiobooks and e-learning. It contributes expertise in audio, sound recording, language, voice coloration and tonality to the joint project.
ADACOR is a cloud service provider and expert in the implementation and operation of modern applications and cloud infrastructures.
ADACOR contributes its expertise in the areas of AI infrastructure, programming and data to the project.
The future potential of AI-generated speech is enormous. Simple corrections and updates, the creation of complete content at the touch of a button, the integration of terminology and pronunciation databases, the choice of voice-over artist and much more are steadily increasing the demand for professional implementation and use.
The Transformer technology made famous by ChatGPT makes it possible to generate artificial speech of a completely new quality. This opens up potential for professional media productions, targeted customer communication and new accessibility applications.
Problem: The existing models are not yet consistent, reliable or good enough for high-quality applications when it comes to pronunciation, emotion, undertones, dialects and voice coloration. This is where we come in.
Interest in and demand for AI-generated speech for media productions and customer communication are currently growing exponentially. In many applications, voices previously recorded by professional voice-over artists will in future be replaced by voice cloning. The basic prerequisites for this are quality, acceptance and due consideration of data protection, information and cyber risk aspects.
As part of the feasibility study, we therefore want to examine the following questions in a proof of concept:
Is it possible to develop a German speech corpus for training Transformer models for high-quality media productions?
Are the leading Transformer audio models (SpeechT5, Bark, MMS) qualitatively suitable for the new speech corpus?
Are the artificially generated voices convincing enough in quality for the fields of application described, and what further opportunities and risks arise from this?
Our aim is to create a platform that takes into account the technical requirements of professional audio post-production as well as the linguistic and legal aspects.





