RTVE2018 Database

The RTVE2018 database is a collection of complete TV shows drawn from diverse genres and broadcast by the Spanish public national television from 2015 to 2018. It contains a total of 569 hours and 22 minutes of audio. About 460 hours are provided with subtitles, and about 109 hours have human-revised transcriptions. The database is divided into four partitions: a training partition (train), two development partitions (dev1 and dev2), and a test partition. Additionally, the database includes a set of text files extracted from all the subtitles broadcast by the RTVE 24H Channel during 2017.
Information about the content of the database can be found in the RTVE2018 database description report.

The RTVE2018 database was used in the IberSPEECH-RTVE 2018 Challenge. The evaluation plans give more detailed information about the use of the database for speech-to-text transcription, speaker diarization, and multimodal diarization.

RTVE2018 Database License

The data is available subject to the terms of a license agreement with RTVE.
To download the RTVE2018 data, a representative of your research group, company, or other organization must sign the license agreement:

  • License agreement: Spanish, English
  • Please return a scanned copy of the signed license to fondodocumentalrtve@rtve.es, with Cc to lleida@unizar.es and the subject "RTVE2018 database license". In the email body, you must indicate the user and the purpose of the research.
  • Finally, the original signed license must be sent by post to the RTVE Documentary Archive Department, indicating the user, the purpose of the research, and the period of use:
    Avda. Radiotelevisión, 4
    28223 Pozuelo de Alarcón

RTVE2018 Database Download

Once we have processed your license application, you will receive an email with instructions for downloading the database.
If you have any questions, do not hesitate to contact us at lleida@unizar.es.