A prototype system for obtaining and managing training data for multilingual learning

The project aims to empower less-resourced language communities to create parallel corpora for machine translation, enhancing language preservation and cultural heritage through an open-source prototype.

Grant

€150,000

2023

Project details

Introduction

It is difficult to build high-quality machine translation systems for less-resourced languages, such as the minority languages of Europe. State-of-the-art machine translation is trained on large parallel corpora, that is, texts paired with their translations. Such corpora are not available for less-resourced languages.
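To make the term "parallel corpus" concrete, the following is a minimal sketch of one way aligned sentence pairs could be stored and lightly curated. It is an illustration only, not the project's prototype: the names `SentencePair`, `curate` and `to_tsv` are hypothetical, the sentences are placeholders, and the tab-separated output format is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentencePair:
    """One aligned unit of a parallel corpus (hypothetical structure)."""
    source: str  # sentence in the less-resourced language
    target: str  # its translation in the other language


def curate(pairs):
    """Drop pairs with an empty side and exact duplicates, keeping order."""
    seen = set()
    kept = []
    for pair in pairs:
        src, tgt = pair.source.strip(), pair.target.strip()
        if not src or not tgt:
            continue  # an unaligned sentence is of no use for training
        if (src, tgt) in seen:
            continue  # exact duplicate of an earlier pair
        seen.add((src, tgt))
        kept.append(SentencePair(src, tgt))
    return kept


def to_tsv(pairs):
    """Serialise pairs as one tab-separated line per pair (assumed format)."""
    return "\n".join(f"{p.source}\t{p.target}" for p in pairs)


if __name__ == "__main__":
    # Placeholder text; a real corpus would hold actual sentences.
    raw = [
        SentencePair("source sentence 1", "target sentence 1"),
        SentencePair("source sentence 1 ", "target sentence 1"),  # duplicate
        SentencePair("", "target sentence 2"),                    # missing source
        SentencePair("source sentence 3", "target sentence 3"),
    ]
    print(to_tsv(curate(raw)))
```

A machine translation system would be trained on many such pairs; the curation step here only removes incomplete and duplicated pairs and says nothing about translation quality.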

Project Overview

We will provide a system for the rapid and inexpensive creation of new parallel corpora. Our Proof of Concept (PoC) project will produce an open-source prototype that builds on findings from the PI's ERC Starting Grant (StG), and will determine how to handle intellectual property rights (IPR) and future funding.

Key Innovations

The key innovation of the prototype is that members of less-resourced language communities will be able to use it themselves. Current systems require an extensive background in natural language processing. Allowing the community to create and curate parallel data has clear social benefits.

Social Impact

The creation of high-quality machine translation systems for less-resourced languages will allow more content to be created in these languages, which will play a strong role in preserving them. Curated parallel data will also be useful in activities such as education and cultural heritage research.

Funding Landscape

Government funding is available for digital language preservation for many of the 7,000 languages spoken on Earth. Companies with online translation systems, such as Google and DeepL/Linguee, are not addressing this market because the return on investment (ROI) is too low. It makes more sense to empower local communities to create such parallel data.

Evaluation and Future Development

We will carefully evaluate our prototype to ensure that it meets the needs of these communities. Along with the creation of the prototype, we will determine how best to structure the IPR to support future development.

Considerations for Implementation

We will consider two possibilities: consulting, which we have already carried out for the Sorbian community, and a certification scheme for users of our system. We will also consider applications in:

Commercial machine translation
Multilingual classification problems such as hate speech detection

Financial details & Timeline

Financial details

Grant amount	€150,000
Total project budget	€150,000

Timeline

Start date	1 October 2023
End date	30 September 2025
Grant year	2023

Partners & Locations

Project partners

TECHNISCHE UNIVERSITAET MUENCHEN (coordinator)
LUDWIG-MAXIMILIANS-UNIVERSITAET MUENCHEN

Country/countries

Germany

Similar projects within the European Research Council

Project	Description	Scheme	Amount	Year
Evaluating and Programming Intelligent Chatbots for Any Language	EPICAL aims to enhance intelligent chatbots by integrating low-resource languages through innovative techniques in text generation, translation, and speech processing, promoting multilingual inclusivity.	ERC Advanced Grant	€2,498,200	2025
DEep COgnition Learning for LAnguage GEneration	This project aims to enhance NLP models by integrating machine learning, cognitive science, and structured memory to improve out-of-domain generalization and contextual understanding in language generation tasks.	ERC Consolid...	€1,999,595	2023
Uncovering the creative process: from inception to reception of translated content using machine translation	INCREC aims to analyze the creative processes of professional translators using neural machine translation to enhance translation quality and user experience in literary and audiovisual contexts.	ERC Consolid...	€1,993,643	2023
Responsive classifiers against hate speech in low-resource settings	Respond2Hate aims to empower users in low-resource settings to locally filter hate speech from social media using adaptive NLP models, enhancing online safety without relying on tech companies.	ERC Proof of...	€150,000	2023
Linguistic traces: low-frequency forms as evidence of language and population history	This project aims to reconstruct early European languages by analyzing low-frequency linguistic variants in historical texts, integrating philology with deep learning to uncover cultural interactions.	ERC Advanced...	€2,498,135	2025

Similar projects from other schemes

Project	Description	Scheme	Amount	Year
Pas ChatGPT aan voor Dierentalen (Adapting ChatGPT for Animal Languages)	This project investigates the feasibility of technology that translates animal languages into human languages to improve communication between humans and animals.	Mkb-innovati...	€20,000	2023
Quantum Computing Powered Neural Machine Translation	The project investigates the feasibility of using quantum computing to develop more efficient translation models, so that Tolq.com can gain a leading position in the translation market.	Mkb-innovati...	€20,000	2020
Haalbaarheidsonderzoek naar ontwikkeling van VocalMind Ai software om taalbarrière op te heffen (Feasibility study on developing VocalMind Ai software to remove the language barrier)	This feasibility study focuses on developing VocalMindAi, a unique translation tool that converts text into speech in order to promote fair water distribution worldwide within a community.	Mkb-innovati...	€20,000	2023
Haalbaarheid GPT Domain Specific Neural Network Text (Feasibility of GPT Domain Specific Neural Network Text)	Tolq.com IP B.V. is investigating the feasibility of an innovative text generator to streamline the costly and time-consuming writing process for multilingual websites.	Mkb-innovati...	€20,000	2021
Bias Neutraliser	CorTexter is developing deep learning software to recognise and neutralise unintended bias in recruitment texts, promoting equal opportunities for job seekers.	Mkb-innovati...	€20,000	2021
