Scalable Learning for Reproducibility in High-Dimensional Biomedical Signal Processing: A Robust Data Science Framework

ScReeningData aims to develop a scalable learning framework to enhance statistical robustness and reproducibility in high-dimensional data analysis, reducing false positives across scientific domains.

Grant
€ 1.500.000
2022

Project details

Introduction

Data science has quickly expanded the boundaries of signal processing and statistical learning beyond their accustomed domains. Powerful and complex machine learning architectures have evolved to distinguish relevant information from randomness, artifacts, and irrelevant data.

Challenges in Existing Frameworks

However, existing learning frameworks lack computationally scalable, tractable, and robust methods for high-dimensional data. Consequently, discoveries in genomic data, for example, may be coincidental findings that merely happen to reach statistical significance.

Need for Appropriate Learning Frameworks

As long as groundbreaking advances in biotechnology are not accompanied by appropriate learning frameworks, valuable efforts are spent on researching false positives.

Project Overview

ScReeningData develops a coherent, fast, and scalable learning framework that jointly addresses the fundamental challenges of:

  1. Drastically reducing computational complexity
  2. Providing statistical and robustness guarantees
  3. Quantifying reproducibility in large-scale and high-dimensional settings

Innovative Approach

An unprecedented approach is developed that builds upon very recent work of the PI. The underlying concept is to repeat randomized controlled experiments that use computer-generated fake variables as negative controls to trigger an early stopping of the learning algorithms, thereby mitigating the so-called curse of dimensionality.
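
The idea of using fake variables as negative controls can be illustrated with a small sketch. This is not the PI's algorithm: the function names (`forward_select_with_dummies`, `selection_frequencies`), the greedy forward-selection step, the Gaussian dummies, and the 0.75 voting threshold are all assumptions chosen for illustration, and no false-positive guarantee is claimed for this simplified version. Each random experiment appends fresh dummy columns to the data, adds variables one at a time, and stops as soon as a dummy is picked; variables that are selected in most experiments are kept.

```python
import numpy as np


def forward_select_with_dummies(X, y, n_dummies, max_dummies, rng):
    """One random experiment (illustrative): greedy forward selection on
    [X, dummies], stopped once `max_dummies` dummy columns were selected."""
    n, p = X.shape
    dummies = rng.standard_normal((n, n_dummies))  # computer-generated fake variables
    Z = np.hstack([X, dummies])
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)       # standardize all columns
    y_centered = y - y.mean()
    residual = y_centered.copy()
    in_model = np.zeros(Z.shape[1], dtype=bool)
    active, dummies_in = [], 0
    max_steps = min(n - 1, Z.shape[1])
    while dummies_in < max_dummies and len(active) < max_steps:
        corr = np.abs(Z.T @ residual)
        corr[in_model] = -np.inf                   # never pick a column twice
        j = int(np.argmax(corr))
        active.append(j)
        in_model[j] = True
        if j >= p:                                 # a negative control entered:
            dummies_in += 1                        # count it toward early stopping
        coef, *_ = np.linalg.lstsq(Z[:, active], y_centered, rcond=None)
        residual = y_centered - Z[:, active] @ coef
    # Report only the real variables that entered before stopping.
    return np.array([j for j in active if j < p], dtype=int)


def selection_frequencies(X, y, n_experiments=20, n_dummies=None,
                          max_dummies=1, seed=0):
    """Repeat the random experiment and return, for each real variable,
    the fraction of experiments in which it was selected."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    n_dummies = p if n_dummies is None else n_dummies
    counts = np.zeros(p)
    for _ in range(n_experiments):
        selected_once = forward_select_with_dummies(
            X, y, n_dummies, max_dummies, rng)
        counts[selected_once] += 1
    return counts / n_experiments


# Synthetic example: 3 relevant variables out of 50 (assumed toy setting).
data_rng = np.random.default_rng(1)
n, p = 200, 50
X = data_rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = 3.0
y = X @ beta + data_rng.standard_normal(n)

freq = selection_frequencies(X, y)
selected = np.flatnonzero(freq > 0.75)             # assumed voting threshold
print("selected variables:", selected)
```

Stopping at the first dummy means each experiment fits only a handful of variables instead of a full model, which is where the computational savings in such a scheme would come from. How the number of dummies, the stopping point, and the voting threshold should be calibrated to control false positives is exactly what a rigorous method has to establish, and this sketch does not attempt it.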

Advantages of Proposed Methods

In contrast to existing methods, the proposed methods are completely tractable and scalable to ultra-high dimensions. Robust learning methods that run ultra-fast and carry tight guarantees on the targeted rate of false positives would bring enormous gains.
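
One standard way to make a "targeted rate of false positives" precise is the false discovery rate (FDR), with the true positive rate (TPR) as the matching notion of statistical power. The source does not state which error criterion the project uses, so the criterion and the notation below are assumptions for illustration: R is the number of selected variables, V the number of falsely selected ones, p_1 the number of truly relevant variables, and α the user-chosen target level.

```latex
\mathrm{FDR} = \mathbb{E}\!\left[\frac{V}{\max(R, 1)}\right] \le \alpha,
\qquad
\mathrm{TPR} = \mathbb{E}\!\left[\frac{R - V}{\max(p_1, 1)}\right]
```

Under this reading, a method with a tight guarantee keeps the FDR at or just below α while making the TPR as large as possible.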

Impact on Scientific Domains

The proposed methods enable new, reproducible discoveries with high statistical power. Because the methods are fundamental and broadly applicable, the impact of this project extends far beyond the biomedical signal processing use cases considered, benefiting all scientific domains that analyze high-dimensional data.

Financial details & Timeline

Financial details

Grant amount: € 1.500.000
Total project budget: € 1.500.000

Timeline

Start date: 1 September 2022
End date: 31 August 2027
Grant year: 2022

Partners & Locations

Project partners

  • TECHNISCHE UNIVERSITAT DARMSTADT (coordinator)

Country(ies)

Germany

Similar projects within the European Research Council

ERC Consolid...

Reconciling Classical and Modern (Deep) Machine Learning for Real-World Applications

APHELEIA aims to create robust, interpretable, and efficient machine learning models that require less data by integrating classical methods with modern deep learning, fostering interdisciplinary collaboration.

€ 1.999.375
ERC Starting...

Provable Scalability for high-dimensional Bayesian Learning

This project develops a mathematical theory for scalable Bayesian learning methods, integrating computational and statistical insights to enhance algorithm efficiency and applicability in high-dimensional models.

€ 1.488.673
ERC Starting...

Inference in High Dimensions: Light-speed Algorithms and Information Limits

The INF^2 project develops information-theoretically grounded methods for efficient high-dimensional inference in machine learning, aiming to reduce costs and enhance interpretability in applications like genome-wide studies.

€ 1.662.400
ERC Starting...

The missing mathematical story of Bayesian uncertainty quantification for big data

This project aims to enhance scalable Bayesian methods through theoretical insights, improving their accuracy and acceptance in real-world applications like medicine and cosmology.

€ 1.492.750
ERC Starting...

Cascade Processes for Sparse Machine Learning

The project aims to democratize deep learning by developing small-scale, resource-efficient models that enhance fairness, robustness, and interpretability through innovative techniques from statistical physics and network science.

€ 1.499.285