Evaluating and Programming Intelligent Chatbots for Any Language

EPICAL aims to enhance intelligent chatbots by integrating low-resource languages through innovative techniques in text generation, translation, and speech processing, promoting multilingual inclusivity.

Grant
€2,498,200
2025

Project Details

Introduction

Intelligent chatbots (ICs) such as ChatGPT have revolutionized content generation for a few languages such as English, but 7,099 languages are currently spoken in the world. EPICAL will, for the first time, determine how to add new low-resource languages (LRLs) to ICs.

Project Advances

We will make six advances that revolutionize the capabilities of ICs, unifying research areas that have so far been incorrectly studied separately. We will:

  1. Determine how to generate hallucination-free text using ICs, and how to enter a virtuous cycle in which LRL text is created using cross-lingual knowledge from ICs, then quickly post-edited and used for training, resulting in a better LRL representation in the IC.
  2. Develop more powerful encoding and language adaptation approaches which combine the benefits of fine-tuning and adapters, taking full advantage of linguistically related languages to model LRLs.
  3. Enable ICs to reason about their own LRL capabilities and determine what they know and do not know.
  4. Unify research on machine translation and ICs to obtain ICs which can translate to LRLs with state-of-the-art accuracy.
  5. Enable high-quality text-to-speech and automatic speech recognition of LRLs with ICs, thereby unifying research on low-resource speech processing with research on LRL text processing.
  6. Develop a novel evaluation methodology including a robust method for automatically measuring fact hallucination.

Research Group

My research group is well known for LRL research, which sets it apart from large commercial labs that focus only on the top 200 languages. Our work is critical for a multilingual Europe that values the role of minority languages, culture, and heritage.

Impact

Our innovations will benefit natural language processing beyond text generation and machine translation, and will strongly impact other areas of machine learning research that suffer from data bottlenecks.

Financial Details & Timeline

Financial Details

Grant amount: €2,498,200
Total project budget: €2,498,200

Timeline

Start date: 1 March 2025
End date: 28 February 2030
Grant year: 2025

Partners & Locations

Project Partners

  • TECHNISCHE UNIVERSITAET MUENCHEN (coordinator)

Country

Germany

Similar Projects within the European Research Council

ERC Proof of...

A prototype system for obtaining and managing training data for multilingual learning

The project aims to empower less-resourced language communities to create parallel corpora for machine translation, enhancing language preservation and cultural heritage through an open-source prototype.

€150,000
ERC Consolid...

DEep COgnition Learning for LAnguage GEneration

This project aims to enhance NLP models by integrating machine learning, cognitive science, and structured memory to improve out-of-domain generalization and contextual understanding in language generation tasks.

€1,999,595
ERC Consolid...

Educational Inequalities and Generative AI: A Focus on Language Diversity

ChatEQUITY explores the impact of generative AI on educational inequalities by developing a new assessment tool and analyzing access disparities among students and teachers in diverse linguistic contexts.

€2,195,500
ERC Advanced...

Why do infants learn language so fast? A reverse engineering approach

This project develops a computational model to explore how infants efficiently learn language through statistical learning and three additional mechanisms, aiming to produce outcomes comparable to those of children's language acquisition.

€2,494,625
ERC Starting...

Controlling Large Language Models

This project develops a framework to understand and control large language models, addressing their biases and flaws to ensure safe and responsible AI adoption.

€1,500,000

Similar Projects from Other Schemes

Mkb-innovati...

Feasibility Study of AIPerLearn (AI-Powered Personalized Learning)

STARK Learning is investigating the application and training of AI models to automate the development of personalized teaching materials while safeguarding their quality and validation.

€20,000
Mkb-innovati...

Research into Proprietary Language Models in the GMU AI Digital Marketplace

GMU is investigating the development of its own LLaMa language model, trained on its own data, to create an AI Digital Marketplace and reduce its dependence on international companies.

€20,000
Mkb-innovati...

Edion

The project is developing an automated question-generation system for physics to save teachers time and give students a more engaging learning experience.

€20,000
Mkb-innovati...

Adapt ChatGPT for Animal Languages

This project investigates the feasibility of technology that translates animal languages into human languages to improve communication between humans and animals.

€20,000
Mkb-innovati...

Project POLIGEN-AI

The project focuses on developing a reliable "fact-based" chatbot to combat disinformation and support informed decision-making, with attention to technical and legal feasibility.

€20,000