An Application for leveraging large-scale historical textbases

HistText is a user-friendly application designed for large-scale data mining of historical texts, enabling scholars to extract insights from vast multilingual corpora using advanced machine learning techniques.

Grant

€150,000

2024

Project details

Introduction

HistText is an application developed to address the challenges of large-scale data mining in textual corpora, with a particular focus on historical documents. It was created within the ERC-funded ENP-China project, which studies the evolution of Chinese elites from the 19th century to 1949, and results from a collaboration between historians and computer scientists exploring machine learning applications for extensive text archives.

Features

Designed to manage databases containing billions of words across millions of multilingual documents, HistText offers a robust and versatile platform that streamlines the process of extracting and visualizing valuable insights. The application features:

A user-friendly interface
Advanced text analysis techniques
Powerful data visualization capabilities

It gives novice users a simplified way to run complex data queries and analyses, and offers a comprehensive R library for more experienced users.

Challenges

The main challenge of the proof of concept is to make HistText a fully packageable and transferable tool that caters to the specialized needs of scholars and institutions holding vast digital repositories.

Impact

With its focus on advanced text analysis and user accessibility, HistText stands as an invaluable resource not only for academics in the digital humanities but also for students and the general public.

Broader Applications

HistText has the potential to be integrated into a wide range of institutions, including:

Libraries
Digital content providers
Other educational and research institutions

The platform is exceptionally well-suited for analyzing a wide range of text genres, including newspapers, periodicals, directories, and diaries, among others.

Conclusion

By offering a scalable, user-friendly, and methodologically rigorous tool, HistText aims to revolutionize how we approach large-scale textual analysis, providing a new pathway for understanding historical documents.

Financial details & Timeline

Financial details

Grant amount	€150,000
Total project budget	€150,000

Timeline

Start date	1 September 2024
End date	28 February 2026
Grant year	2024

Partners & Locations

Project partners

UNIVERSITE D'AIX MARSEILLE (lead partner)

Country/Countries

France

Similar projects within the European Research Council

Project	Summary	Scheme	Amount	Year
Modelling Text as a Living Object in Cross-Document Context	InterText establishes a comprehensive framework for intertextuality in NLP, enabling efficient cross-document understanding through novel data models and neural representations for diverse applications.	ERC Advanced Grant	€2,499,721	2023
Tool for the Analysis of Information Transfer in Manuscript Cultures	The TInTraMac project develops a free, flexible research tool to encourage scholars to adopt digital methods for analyzing manuscript texts, enhancing accessibility and collaboration in textual studies.	ERC Proof of...	€150,000	2023
Geology of Texts, Genealogy of Concepts, Intellectual Ecosystems: Mapping the Indic and Tibetic Buddhist Text Corpora	Intellexus aims to uncover the interdependent development of Indic and Tibetic Buddhist texts and ideas through innovative mapping and visualization methods, enhancing understanding of their cultural traditions.	ERC Synergy ...	€9,902,166	2024
Migrations of Textual and Scribal Traditions via Large-Scale Computational Analysis of Medieval Manuscripts in Hebrew Script	MIDRASH aims to develop an interdisciplinary methodology using advanced technologies to study and reconstruct medieval Hebrew manuscripts, enhancing understanding of Jewish literary culture and its historical significance.	ERC Synergy ...	€10,296,259	2023
Building an AI-tool to facilitate the integration, accessibility, and usability of heterogeneous cultural heritage data on medieval manuscripts.	ManuscriptAI aims to integrate and enhance access to medieval manuscript data using machine learning, promoting digital preservation and inclusivity in Europe's cultural heritage narrative.	ERC Proof of...	€150,000	2024

Similar projects from other schemes

Project	Summary	Scheme	Amount	Year
Bias Neutraliser	CorTexter is developing deep learning software to detect and neutralize unintended biases in recruitment texts, promoting equal opportunities for job seekers.	Mkb-innovati...	€20,000	2021
Elementa Labs MIT 2022 – Digitaliseren van Lab notebooks van kennisinstanties	The project digitizes old lab notebooks from universities to make valuable knowledge accessible, so that new researchers can avoid mistakes and experiment more efficiently.	Mkb-innovati...	€19,200	2022
Real time knowledge extraction from unstructured big data streams	This project develops an application for structuring unstructured social media data in order to improve productivity in the agricultural sector through machine learning.	Mkb-innovati...	€199,307	2017
CorTexter	CorTexter is developing an inclusive tool for data-driven screening and matching that prevents bias and uses additional information to promote fair employment opportunities.	Mkb-innovati...	€20,000	2020
Inzet van computational linguistics voor het vergaren van military intelligence	This project investigates the feasibility of using computational linguistics to gather military intelligence in order to improve security.	Mkb-innovati...	€20,000	2023
