Scalable Learning for Reproducibility in High-Dimensional Biomedical Signal Processing: A Robust Data Science Framework
ScReeningData aims to develop a scalable learning framework to enhance statistical robustness and reproducibility in high-dimensional data analysis, reducing false positives across scientific domains.
Project details
Introduction
Data science has rapidly pushed the boundaries of signal processing and statistical learning beyond their traditional domains. Powerful and complex machine learning architectures have evolved to distinguish relevant information from randomness, artifacts, and irrelevant data.
Challenges in Existing Frameworks
However, existing learning frameworks lack computationally scalable, tractable, and robust methods for high-dimensional data. Consequently, discoveries in, for example, genomic data can turn out to be coincidental findings that happen to reach statistical significance.
Need for Appropriate Learning Frameworks
As long as groundbreaking advances in biotechnology are not accompanied by appropriate learning frameworks, valuable research effort is wasted on pursuing false positives.
Project Overview
ScReeningData develops a coherent, fast, and scalable learning framework that jointly addresses the fundamental challenges of:
- Drastically reducing computational complexity
- Providing statistical and robustness guarantees
- Quantifying reproducibility in large-scale and high-dimensional settings
Innovative Approach
An unprecedented approach is developed that builds upon very recent work of the PI. The underlying concept is to repeat randomized controlled experiments that use computer-generated fake variables as negative controls to trigger an early stopping of the learning algorithms, thereby mitigating the so-called curse of dimensionality.
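As a concrete illustration of this concept, the following minimal Python sketch mimics the idea: computer-generated dummy variables are appended to the data as negative controls, a greedy forward selection runs until a preset number of dummies has entered the model, and the randomized experiment is repeated to estimate how often each real variable is selected. All names and parameters here (e.g., `dummy_screened_selection`, `dummies_to_stop`) are hypothetical, and the procedure is a simplified stand-in for the general concept, not the project's actual algorithm.

```python
import numpy as np

def dummy_screened_selection(X, y, n_dummies=100, dummies_to_stop=10,
                             n_repetitions=20, seed=None):
    """Illustrative sketch: forward selection with computer-generated fake
    (dummy) variables acting as negative controls that trigger early stopping.
    Hypothetical names and defaults; not the project's actual method."""
    rng = np.random.default_rng(seed)
    n, p = X.shape                      # assumes columns of X are standardized
    counts = np.zeros(p)

    for _ in range(n_repetitions):
        # Append random dummy variables that carry no true signal.
        X_aug = np.hstack([X, rng.standard_normal((n, n_dummies))])

        residual = y.astype(float)
        selected, dummies_hit = [], 0

        # Greedy forward selection; stop once enough dummies have entered,
        # i.e., once the algorithm is likely fitting noise rather than signal.
        while dummies_hit < dummies_to_stop and len(selected) < X_aug.shape[1]:
            corr = np.abs(X_aug.T @ residual)
            corr[selected] = 0.0                       # do not reselect
            j = int(np.argmax(corr))
            selected.append(j)
            if j >= p:                                 # a fake variable entered
                dummies_hit += 1
            # Refit on the selected columns and update the residual.
            coef, *_ = np.linalg.lstsq(X_aug[:, selected], y, rcond=None)
            residual = y - X_aug[:, selected] @ coef

        # Record the real variables selected before the early stop.
        counts[sorted({j for j in selected if j < p})] += 1

    return counts / n_repetitions                      # selection frequencies
```

The dummies act as negative controls: because they are pure noise, the point at which they start entering the model calibrates when to stop, and repeating the randomized experiment turns one-off selections into stable selection frequencies.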
Advantages of Proposed Methods
In contrast to existing methods, the proposed methods are fully tractable and scale to ultra-high dimensions. The gains from advanced robust learning methods that run ultra-fast and come with tight guarantees on the targeted false-positive rate are enormous.
Impact on Scientific Domains
They enable new reproducible discoveries to be made with high statistical power. Owing to the fundamental nature and broad applicability of the proposed learning methods, the impact of this project extends far beyond the biomedical signal processing use cases considered here, benefiting all scientific domains that analyze high-dimensional data.
Financial details & Timeline
Financial details
Grant amount | € 1,500,000 |
Total project budget | € 1,500,000 |
Timeline
Start date | 1 September 2022 |
End date | 31 August 2027 |
Grant year | 2022 |
Partners & Locations
Project partners
- TECHNISCHE UNIVERSITAT DARMSTADT (coordinator)
Country(ies)
Similar projects within the European Research Council
Project | Description | Scheme | Amount | Year
---|---|---|---|---
Reconciling Classical and Modern (Deep) Machine Learning for Real-World Applications | APHELEIA aims to create robust, interpretable, and efficient machine learning models that require less data by integrating classical methods with modern deep learning, fostering interdisciplinary collaboration. | ERC Consolidator Grant | € 1,999,375 | 2023
Provable Scalability for high-dimensional Bayesian Learning | This project develops a mathematical theory for scalable Bayesian learning methods, integrating computational and statistical insights to enhance algorithm efficiency and applicability in high-dimensional models. | ERC Starting Grant | € 1,488,673 | 2023
Inference in High Dimensions: Light-speed Algorithms and Information Limits | The INF^2 project develops information-theoretically grounded methods for efficient high-dimensional inference in machine learning, aiming to reduce costs and enhance interpretability in applications like genome-wide studies. | ERC Starting Grant | € 1,662,400 | 2024
The missing mathematical story of Bayesian uncertainty quantification for big data | This project aims to enhance scalable Bayesian methods through theoretical insights, improving their accuracy and acceptance in real-world applications like medicine and cosmology. | ERC Starting Grant | € 1,492,750 | 2022
Cascade Processes for Sparse Machine Learning | The project aims to democratize deep learning by developing small-scale, resource-efficient models that enhance fairness, robustness, and interpretability through innovative techniques from statistical physics and network science. | ERC Starting Grant | € 1,499,285 | 2023