RTVE Databases


RTVE2018 Database

The RTVE2018 database is a collection of complete TV shows drawn from diverse genres and broadcast by the Spanish national public television (RTVE) from 2015 to 2018. It contains a total of 569 hours and 22 minutes of audio. About 460 hours are provided with subtitles, and about 109 hours have human-revised transcriptions. The database is divided into four partitions: a training partition (train), two development partitions (dev1 and dev2), and a test partition. Additionally, the database includes a set of text files extracted from all the subtitles broadcast by the RTVE 24H Channel during 2017.
The evaluation plans give more detailed information about the use of the database for speech-to-text transcription, speaker diarization and multimodal diarization.

RTVE2020 Database

The RTVE2020 database is a collection of complete TV shows drawn from diverse genres and broadcast by the Spanish national public television (RTVE) from 2018 to 2019. It contains a total of 55 hours and 40 minutes of audio, all of it with human-revised transcriptions. Additionally, 33 hours and 21 minutes have been labeled with speaker, face and scene descriptors.
Information about the content of the database can be found in the RTVE2020 database description report.
The evaluation plans give more detailed information about the use of the database for speech-to-text transcription, speaker diarization and multimodal diarization.

RTVE2022 Database

The RTVE2022 database (RTVE2022DB) is a collection of diverse audio material recorded from the 1960s to the present. It ranges from historical recordings to popular broadcast TV shows and fictional shows. The database contains three partitions: a training partition with audio and the corresponding aligned subtitles for training ASR systems; a development partition with audio, the corresponding broadcast subtitles, and reference subtitles for the Text and Speech Alignment evaluation; and a test partition with the audio for all the challenges.
The evaluation plans give more detailed information about the use of the database in the Albayzín evaluations.

RTVE Databases information

Information about the content of the RTVE databases can be found in the databases description report.

RTVE Databases License

The data is available subject to the terms of a license agreement with RTVE.
To download the RTVE databases (2018, 2020 and 2022), a representative of your research group, company, etc. must sign the license agreement:

  • License agreement: request the license form by sending an email to fondodocumentalrtve@rtve.es with Cc to lleida@unizar.es.
  • Return a scanned copy of the signed license to fondodocumentalrtve@rtve.es with Cc to lleida@unizar.es, with the subject: RTVE2022 database license. In the email body, you must indicate the user and the purpose of the research.
  • Finally, send the original signed license by post to the RTVE Documentary Archive Department, indicating the user, the purpose of the research and the period of use:
    CORPORACION RTVE
    Dir. FONDO DOCUMENTAL RTVE
    Avda. Radiotelevisión, 4
    28223 Pozuelo de Alarcón
    Madrid
    Spain

RTVE Databases Download

Once we have processed your license application, you will receive an email with instructions for downloading the databases.
If you have any questions, do not hesitate to contact us at lleida@unizar.es.