D3 Article in professional conference proceedings
New data, benchmark and baseline for L2 speaking assessment for low-resource languages (2023)
Kurimo, M., Getman, Y., Voskoboinik, E., Al-Ghezi, R., Kallio, H., Kuronen, M., von Zansen, A., Hilden, R., Kronholm, S., Huhta, A., & Linden, K. (2023). New data, benchmark and baseline for L2 speaking assessment for low-resource languages. In Proceedings of the 9th Workshop on Speech and Language Technology in Education (SLaTE) (pp. 166-170). International Speech Communication Association. https://doi.org/10.21437/SLaTE.2023-32
Publication details
All authors or editors: Kurimo, Mikko; Getman, Yaroslav; Voskoboinik, Ekaterina; Al-Ghezi, Ragheb; Kallio, Heini; Kuronen, Mikko; von Zansen, Anna; Hilden, Raili; Kronholm, Sirkku; Huhta, Ari; et al.
Parent publication: Proceedings of the 9th Workshop on Speech and Language Technology in Education (SLaTE)
Place and date of conference: Dublin, Ireland, 18.-20.8.2023
Publication year: 2023
Publication date: 18/08/2023
Pages range: 166-170
Number of pages in the book: 186
Publisher: International Speech Communication Association
Place of Publication: Grenoble
Publication country: France
Publication language: English
DOI: https://doi.org/10.21437/SLaTE.2023-32
Publication open access: Openly available
Publication channel open access: Open Access channel
Publication is parallel published (JYX): https://jyx.jyu.fi/handle/123456789/88853
Abstract
The development of large multilingual speech models makes it possible to construct high-quality speech technology even for low-resource languages. In this paper, we present the speech data of L2 learners of Finnish and Finland Swedish that we have recently collected for training and evaluation of automatic speech recognition (ASR) and automatic speaking assessment (ASA). It includes over 4000 recordings by over 300 students per language in short read-aloud and free-form tasks. The recordings have been manually transcribed and assessed for pronunciation, fluency, range, accuracy, task achievement, and a holistic proficiency level. We also present an ASR and ASA benchmarking setup that we have constructed using this data, and include results from our baseline systems built by fine-tuning a self-supervised multilingual model for the target language. In addition to benchmarking, our baseline system can be used by L2 students and teachers for online self-training and evaluation of oral proficiency.
Keywords: oral language skills; speech (phenomena); speech recognition; evaluation; second language; Finnish as a second language; Swedish as a second language; Finland Swedish; multilingualism; language learning
Free keywords: ASR; L2 speaking assessment; wav2vec2.0; low-resource languages
Related projects
- Digital support for training and assessing second language speaking
- Kuronen, Mikko
- Research Council of Finland
Ministry reporting: Yes
VIRTA submission year: 2023