Common Language Resources and Technology Infrastructure (Fin-Clariah)
Main funder
Funder's project number: 358726
Funds granted by main funder (€)
156 110,00
Funding program
Project timetable
Project start date: 01/01/2024
Project end date: 31/12/2025
Summary
Foundational machine learning models, most prominently large language models such as GPT, but also Whisper for speech and text
and CLIP for images and text, are game changers for research as well. This is particularly true in the humanities and social sciences, where
these kinds of materials make up the bulk of the research data. In this project, our main aim is to upgrade the infrastructure to take advantage
of this recent progress.
FIN-CLARIAH is the premier Finnish digital research infrastructure (RI) for Social Sciences and Humanities
(SSH), comprising two components: FIN-CLARIN and DARIAH-FI. The two FIN-CLARIAH components will significantly upgrade
their digital SSH infrastructural support:
1. to upgrade the processing of languages spoken in Finland
2. to upgrade a broad range of tools that process unstructured text for SSH research
3. to facilitate research in audio-visual culture by processing metadata
4. to support the uptake of transformer technology among SSH researchers
To do so, the infrastructure needs to upgrade its capabilities to 1) ingest new types of data (in particular speech, images and multimodal data in
addition to text), 2) develop new pre-trained downstream foundational models tuned to these datasets, and 3) develop tools with which
these downstream foundational models can be further fine-tuned for particular research tasks. To accomplish these goals, the
project involves all relevant Finnish universities, drawing on the expertise of leading interdisciplinary research groups in multiple fields
of data-centric humanities and social sciences. These groups have experience in developing both such models and workflows based on them,
in fields ranging from linguistics through game studies to visual culture analysis. By infrastructuring the work of these groups, the
project will make their advances accessible to a much wider group of researchers.
Aside from this core focus, FIN-CLARIAH will also carry out needed upgrades to the basic data management, versioning and workflow
automation capabilities that underlie the infrastructure. Furthermore, we will upgrade the licence negotiation and data-access
procedures of the infrastructure to better account for new requirements arising from both the new types of data ingested and the
changing legal landscape for personal and copyrighted data. Finally, the project will continue its work on community engagement to
ensure that the facilities developed have the widest possible impact.
Principal Investigator
Primary responsible unit
Follow-up groups
Profiling area: School of Resource Wisdom (University of Jyväskylä JYU) JYU.Wisdom; School of Wellbeing (University of Jyväskylä JYU) JYU.Well