RTVE Databases


RTVE2018 Database

The RTVE2018 database is a collection of whole TV shows drawn from diverse genres and broadcast by the public Spanish National Television from 2015 to 2018. It contains a total of 569 hours and 22 minutes of audio. About 460 hours are provided with subtitles, and about 109 hours have human-revised transcriptions. The database is divided into four partitions: a training partition (train), two development partitions (dev1 and dev2), and a test partition. Additionally, the database includes a set of text files extracted from all the subtitles broadcast by the RTVE 24H Channel during 2017.
Information about the content of the database can be found in the RTVE2018 database description report.

The RTVE2018 database has been used in the IberSPEECH-RTVE 2018 Challenge and in the Albayzín-RTVE 2020 Challenge. The evaluation plans give more detailed information about the use of the database for speech-to-text transcription, speaker diarization and multimodal diarization.

RTVE2020 Database

The RTVE2020 database is a collection of whole TV shows drawn from diverse genres and broadcast by the public Spanish National Television from 2018 to 2019. It contains a total of 55 hours and 40 minutes of audio. The whole database has human-revised transcriptions. Additionally, 33 hours and 21 minutes have been labeled in terms of speaker, face and scene descriptors.
Information about the content of the database can be found in the RTVE2020 database description report.

The RTVE2020 database has been used in the Albayzín-RTVE 2020 Challenge. The evaluation plans give more detailed information about the use of the database for speech-to-text transcription, speaker diarization and multimodal diarization.

RTVE Databases License

The data is available subject to the terms of a license agreement with RTVE.
To download the RTVE databases (2018 and 2020), a representative of your research group, company, or similar organization must sign the license agreement:

  • License agreement: request the license form by sending an email to fondodocumentalrtve@rtve.es with Cc to lleida@unizar.es.
  • Please return a scanned copy of the signed license to fondodocumentalrtve@rtve.es with Cc to lleida@unizar.es, using the subject: RTVE2020 database license. In the email body you must indicate the user and the purpose of the research.
  • Finally, the original signed license must be sent by post to the RTVE Documentary Archive Department, indicating the user, the purpose of the research and the period of use:
    CORPORACION RTVE
    Dir. FONDO DOCUMENTAL RTVE
    Avda. Radiotelevisión, 4
    28223 Pozuelo de Alarcón
    Madrid
    Spain

RTVE2020 Database Download

Once we have processed your data license application, you will receive an email with the instructions to download the databases.
If you have any questions, do not hesitate to contact us at lleida@unizar.es.