Common Language Resources and Technology Infrastructure (FIN-CLARIAH)


Main funder

Funder's project number: 345619


Funds granted by main funder (€)

  • 208 356,00


Funding program


Project timetable

Project start date: 01/01/2022

Project end date: 31/12/2023


Summary

FIN-CLARIAH is the premier Finnish digital research infrastructure (RI) for Social Sciences and Humanities (SSH), comprising two components: FIN-CLARIN and DARIAH-FI. In their first joint development project, the FIN-CLARIAH
components seek to significantly broaden the scope of their digital SSH infrastructure support by consolidating and enhancing their
resources in three major data-oriented directions:

1. to reach beyond processing of spoken standard Finnish into colloquial speech (Goal 1)
2. to cater to a broad range of SSH research needs for processing unstructured text (Goal 2)
3. to facilitate research based on metadata (Goal 3).

The project involves all Finnish universities with research in SSH. While historically the SSH have not been at the forefront of the use
of digital technology, the field in Finland has the potential to undergo a digital transformation. The aim of this development project is to ensure
that this transformation happens in an orderly fashion, without duplicated effort or reinventing the wheel.

In Finland, digitization of materials for SSH research is well underway, but one of the main problems from a research perspective is that the data is scattered. Incompatible formats and interfaces create difficulties for researchers, and there is a danger of duplicated effort when developing tools to manage and process the datasets. The primary concern of the RI project
is therefore to ensure that both data and functionality are consolidated under a unified national RI operated on the efficient computing
infrastructure provided by CSC - the IT Center for Science. CSC currently serves as the technical host of the Language Bank of Finland,
a national RI service center for the SSH field based on open access to resources and services for SSH researchers.

The FIN-CLARIAH ecosystem has already made great strides, for example by creating unified processes for negotiating research rights
to materials and by developing unified access mechanisms for the resulting datasets. Using these, FIN-CLARIAH already
makes available large collections of textual and multi-modal resources as well as tools for analysing and enriching them. However, recent advances in neural network technology, the availability of supercomputing, and the growth of large digital SSH datasets have created clear opportunities and needs for further development of a common SSH data and tools infrastructure.


Principal Investigator


Primary responsible unit


Last updated on 2022-06-07 at 12:45