A prototype system for obtaining and managing training data for multilingual learning
The project aims to empower less-resourced language communities to create parallel corpora for machine translation, enhancing language preservation and cultural heritage through an open-source prototype.
Project details
Introduction
It is difficult to build high-quality machine translation systems for less-resourced languages, such as the minority languages of Europe. State-of-the-art machine translation systems are trained on large parallel corpora, that is, collections of texts paired with their translations. Such corpora are not available for less-resourced languages.
Project Overview
We will provide a system for the rapid and inexpensive creation of new parallel corpora. Our Proof of Concept (PoC) project will produce an open-source prototype that builds on findings from the PI's ERC Starting Grant (StG). It will also determine the intellectual property rights (IPR) arrangements and future funding.
Key Innovations
The key innovation of the prototype will be that less-resourced language communities can use it themselves. Current systems require an extensive background in natural language processing. Allowing communities to create and curate their own parallel data has clear social benefits.
Social Impact
The creation of high-quality machine translation systems for less-resourced languages will enable more content to be created in these languages, which plays a strong role in preserving them. Curated parallel data will also be useful in other activities, such as education and cultural heritage research.
Funding Landscape
Government funding is available for digital language preservation for many of the approximately 7,000 languages spoken worldwide. Companies with online translation systems, such as Google and DeepL/Linguee, are not addressing this market because the return on investment (ROI) is too low. It therefore makes more sense to empower local communities to create such parallel data themselves.
Evaluation and Future Development
We will carefully evaluate our prototype to ensure that it meets the needs of the language communities. Alongside the creation of the prototype, we will determine how best to structure the IPR to support future development.
Considerations for Implementation
Two possibilities we will consider are consulting, which we have already carried out for the Sorbian community, and a certification scheme for users of our system. We will also consider:
- Commercial machine translation
- Multilingual classification problems, such as hate speech detection
Financial details & Timeline
Financial details
Grant amount | €150,000 |
Total project budget | €150,000 |
Timeline
Start date | 1 October 2023 |
End date | 30 September 2025 |
Grant year | 2023 |
Partners & Locations
Project partners
- TECHNISCHE UNIVERSITAET MUENCHEN (coordinator)
- LUDWIG-MAXIMILIANS-UNIVERSITAET MUENCHEN
Country
- Germany
Similar projects within the European Research Council
Project | Description | Scheme | Amount | Year |
---|---|---|---|---|
Evaluating and Programming Intelligent Chatbots for Any Language | EPICAL aims to enhance intelligent chatbots by integrating low-resource languages through innovative techniques in text generation, translation, and speech processing, promoting multilingual inclusivity. | ERC Advanced... | €2,498,200 | 2025 |
DEep COgnition Learning for LAnguage GEneration | This project aims to enhance NLP models by integrating machine learning, cognitive science, and structured memory to improve out-of-domain generalization and contextual understanding in language generation tasks. | ERC Consolid... | €1,999,595 | 2023 |
Uncovering the creative process: from inception to reception of translated content using machine translation | INCREC aims to analyze the creative processes of professional translators using neural machine translation to enhance translation quality and user experience in literary and audiovisual contexts. | ERC Consolid... | €1,993,643 | 2023 |
Responsive classifiers against hate speech in low-resource settings | Respond2Hate aims to empower users in low-resource settings to locally filter hate speech from social media using adaptive NLP models, enhancing online safety without relying on tech companies. | ERC Proof of... | €150,000 | 2023 |
Linguistic traces: low-frequency forms as evidence of language and population history | This project aims to reconstruct early European languages by analyzing low-frequency linguistic variants in historical texts, integrating philology with deep learning to uncover cultural interactions. | ERC Advanced... | €2,498,135 | 2025 |
Similar projects from other schemes
Project | Description | Scheme | Amount | Year |
---|---|---|---|---|
Adapting ChatGPT for Animal Languages | This project investigates the feasibility of technology that translates animal languages into human languages, in order to improve communication between humans and animals. | Mkb-innovati... | €20,000 | 2023 |
Quantum Computing Powered Neural Machine Translation | The project investigates the feasibility of using quantum computing to develop more efficient translation models, so that Tolq.com can gain a leading position in the translation market. | Mkb-innovati... | €20,000 | 2020 |
Feasibility study into the development of VocalMind Ai software to remove the language barrier | This feasibility study focuses on developing VocalMindAi, a unique translation tool that converts text into speech to promote fair water distribution worldwide within a community. | Mkb-innovati... | €20,000 | 2023 |
Feasibility of GPT Domain Specific Neural Network Text | Tolq.com IP B.V. is investigating the feasibility of an innovative text generator to optimize the costly and time-consuming process of writing multilingual websites. | Mkb-innovati... | €20,000 | 2021 |
Bias Neutraliser | CorTexter is developing deep learning software to detect and neutralize unintended bias in recruitment texts, thereby promoting equal opportunities for job seekers. | Mkb-innovati... | €20,000 | 2021 |