A4 Article in conference proceedings
TallVocabL2Fi : A Tall Dataset of 15 Finnish L2 Learners’ Vocabulary (2022)

Robertson, F., Chang, L.-H., & Söyrinki, S. (2022). TallVocabL2Fi : A Tall Dataset of 15 Finnish L2 Learners’ Vocabulary. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, J. Odijk, & S. Piperidis (Eds.), LREC 2022 : Proceedings of the 13th Conference on Language Resources and Evaluation. European Language Resources Association. LREC proceedings. https://aclanthology.org/2022.lrec-1.685/

JYU authors or editors

Robertson, Frankie
Söyrinki, Sini

Publication details

All authors or editors: Robertson, Frankie; Chang, Li-Hsin; Söyrinki, Sini

Parent publication: LREC 2022 : Proceedings of the 13th Conference on Language Resources and Evaluation

Parent publication editors: Calzolari, Nicoletta; Béchet, Frédéric; Blache, Philippe; Choukri, Khalid; Cieri, Christopher; Declerck, Thierry; Goggi, Sara; Isahara, Hitoshi; Maegaard, Bente; Mariani, Joseph; Mazo, Hélène; Odijk, Jan; Piperidis, Stelios

Place and date of conference: Marseille, France, 20-25 June 2022

ISBN: 979-10-95546-72-6

Journal or series: LREC proceedings

eISSN: 2522-2686

Publication year: 2022

Publisher: European Language Resources Association

Publication country: France

Publication language: English

Persistent website address: https://aclanthology.org/2022.lrec-1.685/

Publication open access: Openly available

Publication channel open access: Open Access channel

Publication is parallel published (JYX): https://jyx.jyu.fi/handle/123456789/84663

Abstract

Previous work on measuring second language learners' vocabulary has tended to focus on learners' knowledge of small numbers of words, often with the aim of estimating vocabulary size. This paper presents a "tall" dataset containing information about a few learners' knowledge of many words, suitable for evaluating Vocabulary Inventory Prediction (VIP) techniques, including those based on Computerised Adaptive Testing (CAT). In comparison to previous comparable datasets, the learners come from varied backgrounds, so as to reduce the risk of overfitting when the dataset is used for machine-learning-based VIP. The dataset contains both a self-rating test and a translation test, which are used to derive a measure of reliability for learner responses. The dataset creation process is documented, and the relationships between variables concerning the participants, such as their completion time, their language ability level, and the triangulated reliability of their self-assessment responses, are analysed. The word list is constructed by taking into account the extensive derivational morphology of Finnish, and infrequent words are included in order to account for explanatory variables beyond word frequency.

Keywords: second language; learning; language learning; words; vocabulary (knowledge); measurement; measuring methods; evaluation; data; machine learning

Free keywords: word knowledge; word response data; mental lexicon; Finnish; learner data

Contributing organizations

Ministry reporting: Yes

Reporting year: 2022

JUFO rating: 1