'Broken Finnish': Accent perceptions in societal gatekeeping (SA 315581/JY 21000042461) project research dataset

Halonen, Mia; Ahola, Sari; Hirvelä, Tuija; Huhta, Ari; Neittaanmäki, Reeta; Ohranen, Sari; Ullakonoja, Riikka. (2023). 'Broken Finnish': Accent perceptions in societal gatekeeping (SA 315581/JY 21000042461) project research dataset. V. 1.9.2022. University of Jyväskylä. https://doi.org/10.17011/jyx/dataset/85233.

JYU authors:

Contact person (yes/no): Yes
Contact person (yes/no): Yes
Contact person (yes/no): No
Contact person (yes/no): No
Contact person (yes/no): No
Contact person (yes/no): No
Contact person (yes/no): No
Contact person (yes/no): No
Contact person (yes/no): No

All authors: Halonen, Mia; Ahola, Sari; Hirvelä, Tuija; Huhta, Ari; Neittaanmäki, Reeta; Ohranen, Sari; Ullakonoja, Riikka

Funders: Research Council of Finland

Right-holders:

Contributors:

Availability and identifiers

Availability: Contact owner

Publication year: 2023

URN identifier in original repository: http://urn.fi/URN:NBN:fi:jyu-202301301520

DOI identifier in original repository: https://doi.org/10.17011/jyx/dataset/85233

URN identifier in JYX: http://urn.fi/URN:NBN:fi:jyu-202301301520

DOI identifier in JYX: https://doi.org/10.17011/jyx/dataset/85233

Description of the dataset

Description: Data of the project 'Broken Finnish': Accent perceptions in societal gatekeeping (SA 315581/JY 21000042461). Includes project data (designed and constructed on the longitudinal data in 2015; gathered in 2016; analyzed in 2018-2021) and National Certificates of Language Proficiency (NCLP) examination longitudinal data from 2009-2019.

The project includes two sets of data:

1) Rating data (gathered on an internet platform during 2015-2016). Informants: 44 NCLP raters; 50 L2 Finnish speaker test takers (speech samples from 10 Arabic, 10 Estonian, 10 Finland Swedish, 10 Russian and 10 Thai L1 speakers; 5 male and 5 female speakers in each L1 group).

Data outcome: a) numeric ratings on a six-step scale (based on the NCLP rating criteria) of the focus group’s speech performances; b) verbal descriptions of the performances; c) assumptions about the speakers’ L1 and the degree of certainty of each assumption (on a five-step scale); d) speech samples of the test takers (1.5 min each).

Data output formats: .xlsx (Microsoft Excel) and .sav (IBM SPSS Statistics) formats; statistical analyses (Rasch; MFRM; R) and modeling of the data; .wav format for Praat analyses; .mp3 format for compressed data (to decrease the size of the files) in the rating platform and for research presentations; transcriptions of the samples and Praat analyses.

2) Long-term data from the NCLP test system (2012-2016). 122 (=all) raters; 33,316 test takers (over 200 first languages).
Data outcome: background information on a) the raters: age, gender, education, length of experience; b) the test takers: L1, age, gender, education and length of Finnish studies.

Data output formats: .xlsx (Microsoft Excel) and .sav (IBM SPSS Statistics).

The data consist of more than 100,000 data entries/points. As the data are part of the active assessment system, they grow all the time. This metadata description covers only the period 2009-2019, which has been used in the project Broken Finnish (Rikkinäistä suomea).

The project focuses on accent perceptions in the National Certificates of Language Proficiency test in Finland. It explores how the test takers’ pronunciation is perceived as ‘foreign accent’ by the raters and how these perceptions affect the general proficiency rating. As the test is the most common way to prove language proficiency for the labour market and citizenship, it is a crucial societal gatekeeper.

The focus is on speakers from four migrant groups (Arabic, Estonian, Russian, and Thai) and from an older official minority group in Finland, Finland Swedish. The migrant groups are among the largest in Finland, and all the groups face negative stereotyping there. The project studies whether recognition or assumptions of the accents, possibly followed by stereotypes concerning the speaker groups, might affect the rating of speech proficiency.

In addition to studying accent perceptions, the project focuses on the assessment criteria of oral language proficiency: their use, their internal relations, and their relation to the general proficiency level assessment. Which of the oral language skills (fluency, coherence, vocabulary, structures and pronunciation) correlate best with the perceived general proficiency and, thus, with the assessment in the NCLP?

The research team consists of sociolinguists, (socio)phoneticians, language test researchers, and statisticians.

Language: Finnish; English

Free keywords: National certificate for Language Proficiency; oral proficiency test in Finnish; language assessment; rating; assessment criteria; citizenship application; L1 = first language and its effect in proficiency rating; accent; stereotyping.

Keywords (YSO): language tests; citizenship; equality policy; language acquisition; language examinations; Finnish as a second language; Thai language; Estonian language; Russian language; Arabic language; equality (fundamental rights); legislation; Language Act; adult language proficiency test; Nationality Act; oral language skills; criteria; personal assessment; demographic statistics; compilation of statistics; statistics (data); statistics (discipline); Finland Swedish

Fields of science: 112 Statistics and probability; 6121 Languages; 519 Social and economic geography; 5142 Social policy

Follow-up groups: Accounting (School of Business and Economics JSBE) YLA

Do you deal with data concerning special categories of personal data in your research?: Yes

Projects related to dataset

‘Broken Finnish’: Accent perceptions in societal gatekeeping
- Halonen, Mia
- Research Council of Finland
01/09/2018-31/08/2024
