An Application for Leveraging Large-Scale Historical Textbases
HistText is a user-friendly application designed for large-scale data mining of historical texts, enabling scholars to extract insights from vast multilingual corpora using advanced machine learning techniques.
Project details
Introduction
HistText is an application developed to address the complex challenges of large-scale data mining in textual corpora, with a particular focus on historical documents. It was created within the ERC-funded ENP-China project, which studies the evolution of Chinese elites from the 19th century to 1949. It grew out of a collaboration between historians and computer scientists exploring machine learning applications for large text archives.
Features
Designed to manage databases containing billions of words across millions of multilingual documents, HistText offers a robust and versatile platform that streamlines the process of extracting and visualizing valuable insights. The application features:
- A user-friendly interface
- Advanced text analysis techniques
- Powerful data visualization capabilities
It offers novice users a simplified way to run complex queries and analyses, and provides a comprehensive R library for more advanced users.
Challenges
The main challenge this Proof of Concept project aims to tackle is turning HistText into a fully packaged, transferable tool that meets the specialized needs of scholars and institutions holding large digital repositories.
Impact
By combining advanced text analysis with accessibility, HistText aims to be a valuable resource not only for digital humanities researchers but also for students and the general public.
Broader Applications
Beyond academic research, HistText could be integrated into a wide range of institutions, including:
- Libraries
- Digital content providers
- Other educational and research institutions
The platform is exceptionally well-suited for analyzing a wide range of text genres, including newspapers, periodicals, directories, and diaries, among others.
Conclusion
By offering a scalable, user-friendly, and methodologically rigorous tool, HistText aims to revolutionize how we approach large-scale textual analysis, providing a new pathway for understanding historical documents.
Financial details & timeline
Financial details
Grant amount | €150,000 |
Total project budget | €150,000 |
Timeline
Start date | 1 September 2024 |
End date | 28 February 2026 |
Grant year | 2024 |
Partners & locations
Project partners
- UNIVERSITE D'AIX MARSEILLE (coordinator)
Country: France
Similar projects within the European Research Council
Project | Description | Scheme | Amount | Year |
---|---|---|---|---|
Modelling Text as a Living Object in Cross-Document Context | InterText establishes a comprehensive framework for intertextuality in NLP, enabling efficient cross-document understanding through novel data models and neural representations for diverse applications. | ERC Advanced... | €2,499,721 | 2023 |
Tool for the Analysis of Information Transfer in Manuscript Cultures | The TInTraMac project develops a free, flexible research tool to encourage scholars to adopt digital methods for analyzing manuscript texts, enhancing accessibility and collaboration in textual studies. | ERC Proof of... | €150,000 | 2023 |
Geology of Texts, Genealogy of Concepts, Intellectual Ecosystems: Mapping the Indic and Tibetic Buddhist Text Corpora | Intellexus aims to uncover the interdependent development of Indic and Tibetic Buddhist texts and ideas through innovative mapping and visualization methods, enhancing understanding of their cultural traditions. | ERC Synergy ... | €9,902,166 | 2024 |
Migrations of Textual and Scribal Traditions via Large-Scale Computational Analysis of Medieval Manuscripts in Hebrew Script | MIDRASH aims to develop an interdisciplinary methodology using advanced technologies to study and reconstruct medieval Hebrew manuscripts, enhancing understanding of Jewish literary culture and its historical significance. | ERC Synergy ... | €10,296,259 | 2023 |
Building an AI-tool to facilitate the integration, accessibility, and usability of heterogeneous cultural heritage data on medieval manuscripts | ManuscriptAI aims to integrate and enhance access to medieval manuscript data using machine learning, promoting digital preservation and inclusivity in Europe's cultural heritage narrative. | ERC Proof of... | €150,000 | 2024 |
Similar projects from other funding schemes
Project | Description | Scheme | Amount | Year |
---|---|---|---|---|
Bias Neutraliser | CorTexter is developing deep learning software to detect and neutralize unintended bias in recruitment texts, promoting equal opportunities for job seekers. | Mkb-innovati... | €20,000 | 2021 |
Elementa Labs MIT 2022 – Digitizing lab notebooks of knowledge institutions | The project digitizes old university lab notebooks to make valuable knowledge accessible, so that new researchers can avoid mistakes and experiment more efficiently. | Mkb-innovati... | €19,200 | 2022 |
Real time knowledge extraction from unstructured big data streams | This project is developing an application that uses machine learning to structure unstructured social media data in order to improve productivity in the agricultural sector. | Mkb-innovati... | €199,307 | 2017 |
CorTexter | CorTexter is developing an inclusive tool for data-driven screening and matching that prevents bias and draws on additional information to promote fair employment opportunities. | Mkb-innovati... | €20,000 | 2020 |
Use of computational linguistics for gathering military intelligence | This project investigates the feasibility of using computational linguistics to gather military intelligence in order to improve security. | Mkb-innovati... | €20,000 | 2023 |