Albayzin Evaluation
IberSpeech-RTVE 2018 Challenge


FAQ Section

Q1: In the "Closed condition", can we use text outside the data provided by the organization to train language models?

No, in the "Closed condition", language models can only be trained using data provided by the organization (transcripts + closed-captions) and no external resources are allowed.

Q2: Regarding the composition of the eval partition, should we expect the same shows as in the dev partitions?

The shows in the eval partition will be different from the shows in the dev partitions. No new episodes from the same shows should be expected.

Q3: Reading the IberSPEECH-RTVE 2018 Speaker Diarization Challenge evaluation plan again, I have just realized that we have neither the "Aragón Radio database" nor the "3/24 TV channel database". How can we get them?

You can request them by sending an e-mail to albayzinevaluations@gmail.com. You will receive the download instructions in response.

Q4: Since closed-captions are not verbatim transcriptions of the audio, in the "Closed condition", can we use a previously trained system to perform filtering or alignment over these data?

Yes, considering that these tasks are for preparing the training data and not for training models. As long as no external data are used to train the models, participants are allowed to use tools developed with external resources. The descriptions of these tools must be included in the system description submitted to the conference, along with the description of the data used to develop these tools.

Q5: Can we use a pre-existing speaker diarization system to segment the training data in the "Closed condition"?

Yes, considering that this task is for preparing the training data. As long as no external data are used to train the models, participants are allowed to use previously developed tools. The descriptions of these tools must be included in the system description submitted to the conference, along with the description of all the data used to develop these tools.

Q6: How is sclite called to get the performance of my system on the dev partitions?

You have to use the .stm files provided in the stm folder of dev1 and dev2 as the reference, and your system output in plain text (utf-8) with the .txt extension. Let us say that we are evaluating the LM-20171107.acc audio file. We can find the LM-20171107.stm file in the stm folder of dev1. If the output file of our system is LM-20171107_SITE_P_SET.txt, then we call sclite as follows:
sclite -r LM-20171107.stm stm -h LM-20171107_SITE_P_SET.txt txt
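
When several dev files have to be scored, the call above can be generated per file. The following is only a minimal sketch in Python: the helper names, the folder arguments and the "_SITE_P_SET" suffix handling are assumptions made for illustration and are not part of the evaluation plan. It only builds the argument lists and does not run sclite itself.

```python
from pathlib import Path


def sclite_command(stm_path, hyp_path):
    """Build the argument list for the sclite call shown in the answer above."""
    return ["sclite", "-r", str(stm_path), "stm", "-h", str(hyp_path), "txt"]


def commands_for_folder(stm_dir, hyp_dir, suffix="_SITE_P_SET"):
    """Pair every reference .stm file with its hypothesis .txt file.

    Assumption: the hypothesis file is named <audio name><suffix>.txt,
    as in the LM-20171107_SITE_P_SET.txt example. References without a
    matching hypothesis file are skipped.
    """
    commands = []
    for stm in sorted(Path(stm_dir).glob("*.stm")):
        hyp = Path(hyp_dir) / f"{stm.stem}{suffix}.txt"
        if hyp.exists():
            commands.append(sclite_command(stm, hyp))
    return commands
```

Each returned list can be printed with " ".join(cmd) to check it, or passed to subprocess.run once sclite is installed and on the PATH.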

Q7: How is md-eval-v22.pl called to get the performance of my system on the dev partitions?

You have to use the .rttm files provided in the rttm folder of dev1 and dev2 as the reference, and your system output in rttm format. We will use the best match between hypothesis and reference labels. The -b flag must be used to get the best match:
md-eval-v22.pl -c 0.25 -b -r SPKR-REFERENCE.rttm -s SPKR-SYSTEM.rttm
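
For participants whose diarization system does not write rttm directly, the sketch below shows one possible way to produce SPEAKER records and to assemble the call above, in Python. It assumes the commonly used NIST RTTM layout (type, file, channel, onset, duration, then <NA> placeholders around the speaker label); please check it against the reference .rttm files provided with dev1 and dev2 before relying on it. The function names and the example speaker label are hypothetical.

```python
def rttm_line(file_id, start, duration, speaker, channel=1):
    """Format one speaker segment as an RTTM SPEAKER record.

    Assumption: the usual 10-field layout with <NA> for unused fields.
    Times are in seconds.
    """
    return (
        f"SPEAKER {file_id} {channel} {start:.3f} {duration:.3f} "
        f"<NA> <NA> {speaker} <NA> <NA>"
    )


def md_eval_command(ref_rttm, sys_rttm, collar=0.25):
    """Build the argument list for the md-eval-v22.pl call shown above."""
    return [
        "md-eval-v22.pl", "-c", str(collar), "-b",
        "-r", str(ref_rttm), "-s", str(sys_rttm),
    ]


def write_rttm(path, file_id, segments):
    """Write (start, duration, speaker) tuples to an rttm file, one per line."""
    with open(path, "w", encoding="utf-8") as out:
        for start, duration, speaker in segments:
            out.write(rttm_line(file_id, start, duration, speaker) + "\n")
```

For example, rttm_line("LM-20171107", 12.5, 3.25, "spk01") gives the record "SPEAKER LM-20171107 1 12.500 3.250 <NA> <NA> spk01 <NA> <NA>".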

Q8: What template do we have to use to submit the system description?

The same as the one used for the regular papers of the IberSpeech Conference. Papers can be up to 4 pages long, plus one additional page for references. You can download the template from the following link:
http://interspeech2018.org/assets/images/IS2018_paper_kit.zip
Please include in your description all the essential information that readers need to understand the key aspects of your systems. The paper must clearly describe your systems, what you did, the algorithms, and the training and development data used.