Evaluating and Programming Intelligent Chatbots for Any Language

EPICAL aims to extend intelligent chatbots to low-resource languages through new techniques in text generation, translation, and speech processing, promoting multilingual inclusivity.

Grant

€ 2.498.200

2025

Project details

Introduction

Intelligent chatbots (ICs) such as ChatGPT have revolutionized content generation for a few languages such as English, but 7099 languages are currently spoken in the world. EPICAL will, for the first time, determine how to add new low-resource languages (LRLs) to ICs.

Project Advances

We will make six advances to revolutionize the capabilities of ICs, unifying different areas of research that are incorrectly studied separately. We will:

Determine how to generate hallucination-free text using ICs, and how to enter a virtuous cycle where LRL text is created using cross-lingual knowledge from ICs and then quickly post-edited and trained upon, resulting in a better LRL representation in the IC.
Develop more powerful encoding and language adaptation approaches which combine the benefits of fine-tuning and adapters, taking full advantage of linguistically related languages to model LRLs.
Enable ICs to reason about their own LRL capabilities and determine what they know and do not know.
Unify research on machine translation and ICs to obtain ICs which can translate to LRLs with state-of-the-art accuracy.
Enable high-quality text-to-speech and automatic speech recognition of LRLs with ICs, thereby unifying research on low-resource speech processing with research on LRL text processing.
Develop a novel evaluation methodology including a robust method for automatically measuring fact hallucination.

Research Group

My research group is well known for LRL research, in contrast to large commercial labs, which focus only on the top 200 languages. Our work is critical for a multilingual Europe which values the role of minority languages, culture, and heritage.

Impact

Our innovations will benefit natural language processing beyond text generation and machine translation, and will strongly impact other areas of machine learning research that suffer from data bottlenecks.

Financial details & Timeline

Financial details

Grant amount	€ 2.498.200
Total project budget	€ 2.498.200

Timeline

Start date	1 March 2025
End date	28 February 2030
Grant year	2025

Partners & Locations

Project partners

TECHNISCHE UNIVERSITAET MUENCHEN (coordinator)

Countries

Germany

Similar projects within the European Research Council

Project	Summary	Scheme	Amount	Year
A prototype system for obtaining and managing training data for multilingual learning	The project aims to empower less-resourced language communities to create parallel corpora for machine translation, enhancing language preservation and cultural heritage through an open-source prototype.	ERC Proof of Concept	€ 150.000	2023
DEep COgnition Learning for LAnguage GEneration	This project aims to enhance NLP models by integrating machine learning, cognitive science, and structured memory to improve out-of-domain generalization and contextual understanding in language generation tasks.	ERC Consolid...	€ 1.999.595	2023
Educational Inequalities and Generative AI: A Focus on Language Diversity	ChatEQUITY explores the impact of generative AI on educational inequalities by developing a new assessment tool and analyzing access disparities among students and teachers in diverse linguistic contexts.	ERC Consolid...	€ 2.195.500	2025
Why do infants learn language so fast? A reverse engineering approach	This project develops a computational model to explore how infants efficiently learn language through statistical learning and three additional mechanisms, aiming to produce comparable outcomes to children's language acquisition.	ERC Advanced...	€ 2.494.625	2025
Controlling Large Language Models	Develop a framework to understand and control large language models, addressing biases and flaws to ensure safe and responsible AI adoption.	ERC Starting...	€ 1.500.000	2024

Similar projects from other schemes

Project	Summary	Scheme	Amount	Year
Feasibility study into AIPerLearn (AI-Powered Personalized Learning)	STARK Learning investigates the application and training of AI models to automate the development of personalized teaching materials and to safeguard their quality and validation.	Mkb-innovati...	€ 20.000	2023
Research into in-house language models in the GMU AI Digital Marketplace	GMU investigates the development of its own LLaMa language model, based on its own data, to create an AI Digital Marketplace and reduce dependence on international companies.	Mkb-innovati...	€ 20.000	2023
Edion	The project develops an automated system for question generation in physics to save teachers time and offer students a more engaging learning experience.	Mkb-innovati...	€ 20.000	2023
Adapt ChatGPT for Animal Languages	This project investigates the feasibility of technology that translates animal languages into human languages to improve communication between humans and animals.	Mkb-innovati...	€ 20.000	2023
Project POLIGEN-AI	The project focuses on developing a reliable "fact-based" chatbot to combat disinformation and support informed decisions, with attention to technical and legal feasibility.	Mkb-innovati...	€ 20.000	2023
