Natural Language Understanding for non-standard languages and dialects

DIALECT aims to enhance Natural Language Understanding by developing algorithms that integrate dialectal variation and reduce bias in data and labels for fairer, more accurate language models.

Grant

€ 1.997.815

2022

Project details

Introduction

Dialects are ubiquitous and, for many speakers, part of everyday life. They carry important social and communicative functions. Yet dialects, and non-standard languages in general, are a blind spot in research on Natural Language Understanding (NLU). Despite recent breakthroughs, NLU still fails to take linguistic diversity into account. Because language variation is not modeled, language models are biased and have high error rates on dialect data. This failure excludes millions of speakers today and prevents the development of future technology that can adapt to such users.

Need for a Paradigm Shift

To account for linguistic diversity, a paradigm shift is needed: away from data-hungry algorithms that learn passively from large data and single ground-truth labels, which are known to be biased. To go past current learning practices, the key is to tackle variation at both ends: in the input data and in label bias.

With DIALECT, I propose such an integrated approach: devising algorithms that aid transfer from the rich variability in inputs, together with interactive learning that integrates human uncertainty in labels. This will reduce the need for data and enable better adaptation and generalization.
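As a rough illustration of what integrating human uncertainty in labels could look like, the sketch below keeps annotator votes as a distribution and uses it as the training target instead of a single majority label. This is a generic soft-label example, not the method DIALECT commits to; the function names `soft_label` and `soft_cross_entropy`, the class names, and the example votes are all hypothetical.

```python
import math
from collections import Counter


def soft_label(votes, classes):
    """Turn a list of annotator votes into a probability distribution over classes."""
    counts = Counter(votes)
    total = len(votes)
    return [counts[c] / total for c in classes]


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def soft_cross_entropy(logits, target):
    """Cross-entropy between a soft target distribution and the model's softmax output."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


# Hypothetical example: four annotators disagree on one dialect utterance.
classes = ["positive", "negative"]
votes = ["positive", "positive", "negative", "positive"]

target = soft_label(votes, classes)            # keeps the 3-to-1 disagreement
loss = soft_cross_entropy([2.0, 0.5], target)  # hypothetical model scores
print(target, loss)
```

A majority-vote label would discard the dissenting annotator; the soft target keeps that disagreement as part of the training signal.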

Objectives of DIALECT

Advances in salient areas of deep learning research now make it possible to tackle this challenge. DIALECT's objectives are to devise:

a) New algorithms and insights to address extremely scarce data setups and biased labels.
b) Novel representations which integrate auxiliary sources of information, such as complementing text data with speech.
c) New datasets with conversational data in its most natural form.

Conclusion

By integrating dialectal variation into models able to learn from scarce data and biased labels, the foundations will be established for fairer and more accurate NLU to break down language and literary barriers. I am privileged to carry out this integration as I have contributed to research in top venues on both cross-lingual learning and learning from biased labels.

Financial details & Timeline

Financial details

Grant amount	€ 1.997.815
Total project budget	€ 1.997.815

Timeline

Start date	1 October 2022
End date	30 September 2027
Grant year	2022

Partners & Locations

Project partners

LUDWIG-MAXIMILIANS-UNIVERSITAET MUENCHEN (coordinator)

Country(ies)

Germany

Similar projects within the European Research Council

Project	Summary	Scheme	Amount	Year
Diving into Data Diversity for Fair and Robust Natural Language Processing	DataDivers aims to create a framework for measuring data diversity in NLP datasets to enhance model fairness and robustness through empirical and theoretical insights.	ERC Starting Grant	€ 1.500.000	2025
DEep COgnition Learning for LAnguage GEneration	This project aims to enhance NLP models by integrating machine learning, cognitive science, and structured memory to improve out-of-domain generalization and contextual understanding in language generation tasks.	ERC Consolid...	€ 1.999.595	2023
Next-Generation Natural Language Generation	This project aims to enhance natural language generation by integrating neural models with symbolic representations for better control, adaptability, and reliable evaluation across various applications.	ERC Starting...	€ 1.420.375	2022
Language in the Dyad. Linking linguistic and neural alignment.	This project investigates the relationship between neural and linguistic alignment in dyadic communication using EEG hyper-scanning and interactive language games to enhance understanding of dialogue dynamics.	ERC Consolid...	€ 1.997.048	2025
Personalized and Subjective approaches to Natural Language Processing	PERSONAE aims to revolutionize NLP by developing personalizable language technologies that empower individuals to adapt subjective tasks like sentiment analysis and abusive language detection.	ERC Starting...	€ 1.499.775	2024

Similar projects from other schemes

Project	Summary	Scheme	Amount	Year
Project Hominis	The project focuses on developing an ethical AI system for natural language processing that minimizes bias and manages technical, economic, and regulatory risks.	Mkb-innovati...	€ 20.000	2022
Een standaard voor productiewaardige Deep Learning systemen (A standard for production-grade Deep Learning systems)	The project focuses on improving audio and video analysis systems through collaboration between Media Distillery, NovoLanguage, and a partner, aiming for higher quality and faster development via shared technologies.	Mkb-innovati...	€ 104.061	2016
Bias Neutraliser	CorTexter is developing deep learning software to detect and neutralize unintended bias in recruitment texts, promoting equal opportunities for job seekers.	Mkb-innovati...	€ 20.000	2021
