The IberSPEECH 2022 Challenge starts!
Zaragoza, April 22, 2022
- RTVE Databases
The RTVE databases are composed of three corpora released in 2018, 2020 and 2022. The data is available subject to the terms of a license agreement with RTVE.
The new RTVEDB2022 contains a new 50-hour test partition for the Albayzín-RTVE 2022 challenge and an extra partition for training ASR systems, with audio and subtitles aligned thanks to the work of Vicomtech. The new training resources comprise 168 hours of transcribed audio, automatically aligned at the phrase level, so that each phrase includes its start and end timestamps. The Albayzín-RTVE 2022 Challenge uses all the corpora. The license agreement allows the use of both releases. More detailed information and the license agreement form can be found at the following link...
- 3/24 TV channel (Corporació Catalana de Mitjans Audiovisuals, CCMA)
The Catalan broadcast news database from the 3/24 TV channel, proposed for the 2010 Albayzín Audio Segmentation Evaluation, was recorded by the TALP Research Center of the UPC in 2009 under the Tecnoparla project, funded by the Generalitat de Catalunya. The Corporació Catalana de Mitjans Audiovisuals (CCMA), owner of the multimedia content, allows its use for technology research and development. The database consists of around 87 hours of recordings in which speech is present in 92% of the segments, music 20% of the time and background noise 40% of the time. Another class, called others, was defined and accounts for 3% of the time. As for overlapping classes, speech occurs together with noise 40% of the time and together with music 15% of the time. The data will be supplied in PCM format: mono, little-endian, 16-bit resolution, 16 kHz sampling frequency.
- Aragón Radio (Corporación Aragonesa de Radio y Televisión, CARTV)
The database donated by the Corporación Aragonesa de Radio y Televisión (CARTV) consists of around twenty hours of Aragón Radio broadcasts. This data set is around 85% speech, 62% music and 30% noise, distributed such that 35% of the audio contains music along with speech, 13% contains noise along with speech and 22% is speech alone. The data will be supplied in PCM format: mono, little-endian, 16-bit resolution, 16 kHz sampling frequency.
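As an illustration of the audio format stated for the 3/24 TV channel and Aragón Radio data, the following minimal sketch reads one file into 16-bit samples and reports its duration. It assumes the files are raw, headerless PCM, which the description does not confirm; if the data is delivered in a container such as WAV, a container-aware reader should be used instead. The function name read_raw_pcm16le is hypothetical and is not part of any tool distributed with the databases.

```python
import array
import sys

SAMPLE_RATE_HZ = 16_000   # 16 kHz sampling frequency, as stated in the text
SAMPLE_WIDTH_BYTES = 2    # 16-bit resolution, as stated in the text


def read_raw_pcm16le(path):
    """Return (samples, duration_seconds) for a mono 16-bit little-endian file.

    Assumption: the file is headerless PCM (no WAV/RIFF header).
    """
    with open(path, "rb") as f:
        data = f.read()
    # Drop a trailing odd byte, if any, so the buffer holds whole samples only.
    data = data[: len(data) - (len(data) % SAMPLE_WIDTH_BYTES)]
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(data)
    if sys.byteorder == "big":
        # The data is little endian; swap bytes on big-endian hosts.
        samples.byteswap()
    return samples, len(samples) / SAMPLE_RATE_HZ
```

At these settings one hour of audio occupies 16,000 × 2 × 3,600 = 115,200,000 bytes, which is useful for checking that a download is complete.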
- Albayzín Database Download
Please contact the Albayzín Evaluations organizers to download the 3/24 TV channel and Aragón Radio databases.
For the RTVE databases, follow the instructions given at the following link....