'Broken Finnish': Accent perceptions in societal gatekeeping (SA 315581/JY 21000042461) project research dataset


Halonen, Mia; Ahola, Sari; Hirvelä, Tuija; Huhta, Ari; Neittaanmäki, Reeta; Ohranen, Sari; Ullakonoja, Riikka. (2023). 'Broken Finnish': Accent perceptions in societal gatekeeping (SA 315581/JY 21000042461) project research dataset. V. 1.9.2022. University of Jyväskylä. https://doi.org/10.17011/jyx/dataset/85233.


JYU authors
  • Contact person (yes/no): Yes
  • Contact person (yes/no): Yes
  • Contact person (yes/no): No
  • Contact person (yes/no): No
  • Contact person (yes/no): No
  • Contact person (yes/no): No
  • Contact person (yes/no): No
  • Contact person (yes/no): No
  • Contact person (yes/no): No

All authors: Halonen, Mia; Ahola, Sari; Hirvelä, Tuija; Huhta, Ari; Neittaanmäki, Reeta; Ohranen, Sari; Ullakonoja, Riikka

Funders: Research Council of Finland

Right-holders

Contributors


Availability and identifiers

Availability: Contact owner

Publication year: 2023

URN identifier in original repository: http://urn.fi/URN:NBN:fi:jyu-202301301520

DOI identifier in original repository: https://doi.org/10.17011/jyx/dataset/85233

URN identifier in JYX: http://urn.fi/URN:NBN:fi:jyu-202301301520

DOI identifier in JYX: https://doi.org/10.17011/jyx/dataset/85233


Description of the dataset

Description: Data of the project 'Broken Finnish': Accent perceptions in societal gatekeeping (SA 315581/JY 21000042461). The dataset includes project data (designed and constructed on the longitudinal data in 2015; gathered in 2016; analyzed 2018-2021) and longitudinal data from the National Certificates of Language Proficiency (NCLP) examination, 2009-2019.

The project includes two sets of data:

1) Rating data (gathered on an internet platform during 2015-2016). Informants: 44 NCLP raters; speech samples from 50 L2 Finnish test takers (10 each from Arabic, Estonian, Finland Swedish, Russian and Thai L1 speakers, with 5 male and 5 female speakers in each L1 group).

Data outcome: a) numeric ratings of the focus group’s speech performances on a six-step scale (based on the NCLP rating criteria); b) verbal descriptions of the performances; c) assumptions about the speakers’ L1 and the degree of certainty of each assumption (on a five-step scale); d) speech samples of the test takers (1.5 min each).

Data output formats: .xlsx (Microsoft Excel) and .sav (IBM SPSS Statistics) files; statistical analyses (Rasch; MFRM; R) and modeling of the data; .wav files for Praat analyses; .mp3 files as compressed data (to reduce file size) on the rating platform and for research presentations; transcriptions of the samples and Praat analyses.

2) Long-term data from the NCLP test system (2012-2016). 122 (=all) raters; 33,316 test takers (over 200 first languages).
Data outcome: background information on a) the raters: age, gender, education and length of experience; b) the test takers: L1, age, gender, education and length of Finnish studies.

Data output formats: .xlsx (Microsoft Excel) and .sav (IBM SPSS Statistics).

The data consist of more than 100 000 data entries/points. As the data are part of an active assessment system, they grow all the time. This metadata description covers only the period 2009-2019, which has been used in the project Broken Finnish (Rikkinäistä suomea).

The project focuses on accent perceptions in the National Certificates of Language Proficiency test in Finland. It explores how the test takers’ pronunciation is perceived as ‘foreign accent’ by the raters and how these perceptions affect the general proficiency rating. As the test is the most common way to prove language proficiency for the labour market and citizenship, it is a crucial societal gatekeeper.

The focus is on speakers from the Arabic, Estonian, Russian and Thai migrant groups and from an older official minority group in Finland, Finland Swedish speakers. These migrant groups are among the largest in Finland, and all of the groups face negative stereotyping there. The project studies whether recognition of or assumptions about the accents, possibly followed by stereotypes concerning the speaker groups, might affect speech proficiency ratings.

In addition to studying accent perceptions, the project focuses on the assessment criteria of oral language proficiency: their use, their internal relations and their relation to the general proficiency level assessment. It asks which of the oral language skills (fluency, coherence, vocabulary, structures and pronunciation) correlate best with the perceived general proficiency and, thus, with the assessment in the NCLP.

The research team consists of sociolinguists, (socio)phoneticians, language test researchers, and statisticians.

Language: Finnish; English

Free keywords: National certificate for Language Proficiency; oral proficiency test in Finnish; language assessment; rating; assessment criteria; citizenship application; L1 = first language and its effect in proficiency rating; accent; stereotyping.

Keywords (YSO): language tests; citizenship; equality policy; language acquisition; language examinations; Finnish as a second language; Thai language; Estonian language; Russian language; Arabic language; equality (fundamental rights); legislation; Language Act; adult language proficiency test; Nationality Act; oral language skills; criteria; personal assessment; demographic statistics; compilation of statistics; statistics (data); statistics (discipline); Finland Swedish

Fields of science: 112 Statistics and probability; 6121 Languages; 519 Social and economic geography; 5142 Social policy

Follow-up groups: Accounting (School of Business and Economics JSBE) YLA

Do you deal with data concerning special categories of personal data in your research? Yes


Projects related to dataset


Publications and other outputs related to dataset


Last updated on 2024-04-04 at 14:22