Evaluating and Programming Intelligent Chatbots for Any Language
EPICAL aims to enhance intelligent chatbots by integrating low-resource languages through innovative techniques in text generation, translation, and speech processing, promoting multilingual inclusivity.
Project details
Introduction
Intelligent chatbots (ICs) such as ChatGPT have revolutionized content generation for a handful of languages such as English, but 7,099 languages are currently spoken in the world. EPICAL will, for the first time, determine how to add new low-resource languages (LRLs) to ICs.
Project Advances
We will make six advances to revolutionize the capabilities of ICs, unifying different areas of research that are incorrectly studied separately. We will:
- Determine how to generate hallucination-free text using ICs, and how to enter a virtuous cycle in which LRL text is created using cross-lingual knowledge from ICs, then quickly post-edited and used for training, resulting in a better LRL representation in the IC.
- Develop more powerful encoding and language adaptation approaches which combine the benefits of fine-tuning and adapters, taking full advantage of linguistically related languages to model LRLs.
- Enable ICs to reason about their own LRL capabilities and determine what they know and do not know.
- Unify research on machine translation and ICs to obtain ICs which can translate to LRLs with state-of-the-art accuracy.
- Enable high-quality text-to-speech and automatic speech recognition for LRLs with ICs, thereby unifying research on low-resource speech processing with research on LRL text processing.
- Develop a novel evaluation methodology including a robust method for automatically measuring fact hallucination.
Research Group
My research group is well known for LRL research, which sets it apart from large commercial labs that focus only on the top 200 languages. Our work is critical for a multilingual Europe that values the role of minority languages, culture, and heritage.
Impact
Our innovations will benefit natural language processing beyond text generation and machine translation and strongly impact other areas of machine learning research suffering from data bottlenecks.
Financial details & Timeline
Financial details
Grant amount | €2,498,200 |
Total project budget | €2,498,200 |
Timeline
Start date | 1 March 2025 |
End date | 28 February 2030 |
Grant year | 2025 |
Partners & Locations
Project partners
- TECHNISCHE UNIVERSITAET MUENCHEN (coordinator)
Country(ies)
Similar projects within the European Research Council
Project | Description | Scheme | Amount | Year |
---|---|---|---|---|
A prototype system for obtaining and managing training data for multilingual learning | The project aims to empower less-resourced language communities to create parallel corpora for machine translation, enhancing language preservation and cultural heritage through an open-source prototype. | ERC Proof of... | €150,000 | 2023 |
DEep COgnition Learning for LAnguage GEneration | This project aims to enhance NLP models by integrating machine learning, cognitive science, and structured memory to improve out-of-domain generalization and contextual understanding in language generation tasks. | ERC Consolid... | €1,999,595 | 2023 |
Educational Inequalities and Generative AI: A Focus on Language Diversity | ChatEQUITY explores the impact of generative AI on educational inequalities by developing a new assessment tool and analyzing access disparities among students and teachers in diverse linguistic contexts. | ERC Consolid... | €2,195,500 | 2025 |
Why do infants learn language so fast? A reverse engineering approach | This project develops a computational model to explore how infants efficiently learn language through statistical learning and three additional mechanisms, aiming to produce outcomes comparable to children's language acquisition. | ERC Advanced... | €2,494,625 | 2025 |
Controlling Large Language Models | Develop a framework to understand and control large language models, addressing biases and flaws to ensure safe and responsible AI adoption. | ERC Starting... | €1,500,000 | 2024 |
Similar projects from other schemes
Project | Description | Scheme | Amount | Year |
---|---|---|---|---|
Feasibility study into AIPerLearn (AI-Powered Personalized Learning) | STARK Learning is investigating the application and training of AI models to automate the development of personalized teaching materials while safeguarding their quality and validation. | Mkb-innovati... | €20,000 | 2023 |
Research into proprietary language models in the GMU AI Digital Marketplace | GMU is investigating the development of its own LLaMa language model based on its own data, in order to create an AI Digital Marketplace and reduce its dependence on international companies. | Mkb-innovati... | €20,000 | 2023 |
Edion | The project is developing an automated system for generating physics questions, to save teachers time and give students a more engaging learning experience. | Mkb-innovati... | €20,000 | 2023 |
Adapt ChatGPT for Animal Languages | This project investigates the feasibility of technology that translates animal languages into human languages to improve communication between humans and animals. | Mkb-innovati... | €20,000 | 2023 |
Project POLIGEN-AI | The project focuses on developing a reliable "fact-based" chatbot to combat disinformation and support informed decision-making, with attention to technical and legal feasibility. | Mkb-innovati... | €20,000 | 2023 |