The IberSpeech-RTVE 2018 Challenge starts!
Zaragoza, June 18, 2018
Albayzin Evaluation
IberSpeech-RTVE 2018 Challenge
Evaluation Plans
Following the success of previous ALBAYZIN technology evaluations held since 2006, this year's ALBAYZIN evaluations will focus on multimedia analysis of TV broadcast content. Under the framework of the newly created Cátedra RTVE de la Universidad de Zaragoza, the IberSPEECH-RTVE Challenge 2018 will provide participants with an annotated TV broadcast database in Spanish and the necessary tools for the evaluations. The evaluation is supported by the Spanish Thematic Network on Speech Technology (RTTH) and the Cátedra RTVE de la Universidad de Zaragoza, and is organized by ViVoLab, Universidad de Zaragoza. The evaluation will be conducted as part of the IberSpeech 2018 conference, to be held in Barcelona from 21 to 23 November 2018. The following challenges are proposed:
- IberSPEECH-RTVE 2018 Speech to Text Transcription Challenge
- IberSPEECH-RTVE 2018 Speaker Diarization Challenge
- IberSPEECH-RTVE 2018 Multimodal Diarization Challenge
Additionally, the Albayzin 2018 Search on Speech Challenge, organized by Universidad San Pablo-CEU and AuDIaS from the Universidad Autónoma de Madrid, will also use the RTVE2018 database. The evaluation schedule is as follows:
- June 18, 2018: Registration opens.
- June 25, 2018: Release of the training and development data for the Speech to Text Transcription (S2TC), Speaker Diarization (SDC) and Multimodal Diarization (MDC) challenges.
- June 30, 2018: Release of the training and development data for the Search on Speech Challenge (SoSC).
- September 24, 2018: Registration deadline. Release of the evaluation data.
- October 21, 2018: Deadline for submission of results and system descriptions.
- October 31, 2018: Results distributed to the participants.
- IberSpeech 2018 conference (Barcelona, November 21-23, 2018): Official publication of the results.
- IberSPEECH-RTVE 2018 Speech to Text Transcription Challenge
The IberSPEECH-RTVE Speech to Text Transcription Challenge is a new challenge in the ALBAYZIN evaluation series. It is supported by the Spanish Thematic Network on Speech Technology and the Cátedra RTVE de la Universidad de Zaragoza, and is organized by ViVoLab, Universidad de Zaragoza. The Speech to Text transcription evaluation consists of automatically transcribing different types of TV shows. For this evaluation, RTVE has licensed more than 550 hours of its own TV productions together with the corresponding subtitles. The shows cover a wide variety of scenarios, from studio to live broadcast and from read to spontaneous speech, with different Spanish accents, including Latin-American accents, and a great variety of content. Part of the content has been labelled thanks to the Spanish Thematic Network on Speech Technology and the Cátedra RTVE de la Universidad de Zaragoza.

Eighty hours of different TV shows have been manually transcribed: around forty-five hours will be used for development and another thirty-five hours for testing. The TV show content used for development and testing is different. The rest of the shows, more than 450 hours, can be used for training acoustic and language models. The selected TV shows range from broadcast news with highly verbatim subtitles to live shows with subtitles produced by respeaking. Participants are free to use these TV shows or any other data to train their systems. The training data used must be described in the system description report to be presented at IberSpeech 2018.
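This paragraph does not spell out the scoring metric, but speech-to-text systems are conventionally scored with Word Error Rate (WER). Purely as an illustration, and not as the official scoring pipeline (text normalization and the treatment of subtitle artifacts are out of scope here), the sketch below computes WER as the word-level edit distance between a reference and a hypothesis transcript, normalized by the reference length; the example sentences are invented.

```python
# Minimal WER sketch (illustrative only, not the official scorer).
def wer(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit distance: d[i][j] is the minimum number of
    # word substitutions, deletions and insertions turning ref[:i] into hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Toy example with invented sentences:
ref = "buenas noches y bienvenidos al telediario"
hyp = "buenas noches bienvenidos al telediario"
print(f"WER = {wer(ref, hyp):.1%}")  # one deletion over six words, about 16.7%
```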
- IberSPEECH-RTVE 2018 Speaker Diarization Challenge
The IberSPEECH-RTVE Speaker Diarization Challenge is a new challenge in the ALBAYZIN evaluation series. The evaluation is supported by the Spanish Thematic Network on Speech Technology (RTTH) and the Cátedra RTVE de la Universidad de Zaragoza, and is organized by ViVoLab, Universidad de Zaragoza. The Speaker Diarization evaluation consists of segmenting broadcast audio documents according to the different speakers and linking those segments which originate from the same speaker. For this evaluation, the evaluation database has been donated by RTVE and labelled thanks to the Spanish Thematic Network on Speech Technology and the Cátedra RTVE de la Universidad de Zaragoza. Around sixteen hours from two different TV shows will be used for development, and another sixteen hours from two other shows will be used for testing. For training, the Catalan broadcast news database from the 3/24 TV channel proposed for the 2010 Albayzin Audio Segmentation Evaluation [1-3] and the Corporación Aragonesa de Radio y Televisión (CARTV) database proposed for the 2016 Albayzin Speaker Diarization evaluation [4] will be provided.

No a priori knowledge is provided about the number or the identity of the speakers in the audio to be analyzed. In the provided training data, the presence of noise, music and speech is annotated, but not in the development or test partitions. The Diarization Error Rate (DER), as defined in the RT evaluations organized by NIST [5], will be used as the scoring metric. Two different conditions are proposed this year: a closed-set condition, in which only the data provided within this Albayzin evaluation can be used for training, and an open-set condition, in which external data can be used for training as long as it is publicly accessible to everyone (not necessarily free of charge). Participants can submit systems to one or both conditions independently.
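Purely as an illustration of what the Diarization Error Rate measures, here is a simplified, frame-based sketch. It is not the NIST RT scoring tool referenced above: it ignores forgiveness collars and overlapping speech and uses a greedy rather than optimal speaker mapping, and the segment format and labels are invented for the example.

```python
# Simplified frame-based DER sketch (illustrative only, not the NIST scorer).
from collections import Counter

FRAME = 0.01  # scoring resolution in seconds

def to_frames(segments, total_dur):
    """Label each 10 ms frame with a speaker (None = non-speech).
    segments: list of (start_sec, end_sec, speaker_label)."""
    n = int(round(total_dur / FRAME))
    labels = [None] * n
    for start, end, spk in segments:
        for i in range(int(round(start / FRAME)), min(int(round(end / FRAME)), n)):
            labels[i] = spk
    return labels

def der(reference, hypothesis, total_dur):
    ref = to_frames(reference, total_dur)
    hyp = to_frames(hypothesis, total_dur)

    # Greedy one-to-one mapping of hypothesis labels to reference labels,
    # pairing the labels that co-occur most often (the official scorer
    # computes an optimal mapping instead).
    overlap = Counter((h, r) for r, h in zip(ref, hyp) if r and h)
    mapping = {}
    for (h, r), _ in overlap.most_common():
        if h not in mapping and r not in mapping.values():
            mapping[h] = r

    miss = fa = conf = 0
    for r, h in zip(ref, hyp):
        if r and not h:
            miss += 1                          # missed speech
        elif h and not r:
            fa += 1                            # false-alarm speech
        elif r and h and mapping.get(h) != r:
            conf += 1                          # speaker confusion
    ref_speech = sum(1 for r in ref if r) or 1
    return (miss + fa + conf) / ref_speech

# Toy example with invented segments and labels:
reference = [(0.0, 5.0, "spk1"), (5.0, 10.0, "spk2")]
hypothesis = [(0.0, 6.0, "A"), (6.0, 10.0, "B")]
print(f"DER = {der(reference, hypothesis, 10.0):.1%}")  # 1 s of confusion over 10 s of speech -> 10.0%
```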
- IberSPEECH-RTVE 2018 Multimodal Diarization Challenge
The IberSPEECH-RTVE Multimodal Diarization Challenge is a new challenge in the ALBAYZIN evaluation series. It is supported by the Spanish Thematic Network on Speech Technology and the Cátedra RTVE de la Universidad de Zaragoza, and is organized by ViVoLab, Universidad de Zaragoza. The multimodal diarization evaluation consists of segmenting broadcast audiovisual documents according to a closed set of speakers and faces, and linking those segments which originate from the same speaker and face. For this evaluation, a list of characters to recognize will be given; the remaining characters in the audiovisual document will be discarded for evaluation purposes. For each segment, system outputs must state who is speaking and who appears in the image, out of the list of characters. For each character, a set of face pictures and a short audiovisual document will be provided. The evaluation will use shows ranging from studio recordings to live broadcasts: two hours will be used for development and four hours for testing.
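The output format is not detailed in this paragraph; the sketch below merely illustrates, with hypothetical field and character names, the kind of per-segment information a system has to produce: for each time span, which enrolled character (if any) is speaking and which enrolled characters appear on screen.

```python
# Hypothetical per-segment output structure (field and character names invented).
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class MultimodalSegment:
    start: float                       # segment start time (seconds)
    end: float                         # segment end time (seconds)
    speaker: Optional[str] = None      # enrolled character speaking, if any
    faces: List[str] = field(default_factory=list)  # enrolled characters on screen

# Toy hypothesis for a short excerpt; characters outside the given list
# are simply not reported.
hypothesis = [
    MultimodalSegment(0.0, 4.2, speaker="presenter_1", faces=["presenter_1"]),
    MultimodalSegment(4.2, 9.0, speaker="guest_3", faces=["presenter_1", "guest_3"]),
    MultimodalSegment(9.0, 12.5, speaker=None, faces=["guest_3"]),  # music, no enrolled speaker
]

for seg in hypothesis:
    print(f"{seg.start:6.1f}-{seg.end:6.1f}s  "
          f"speaking: {seg.speaker or '-':12s}  "
          f"on screen: {', '.join(seg.faces) or '-'}")
```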