Explainable Machine Learning for Identifying the Full Heterogeneity of Peptidoforms and Proteoforms

explAInProt aims to enhance proteomics by developing explainable, end-to-end machine learning models to identify undetected protein variants and improve clinical applications through advanced sequencing methods.

Grant
€1,992,500
2024

Project details

Introduction

Mass spectrometry-driven proteomics provides deep insights into the workings of cells. Still, the vast majority of proteoforms, which represent the full heterogeneity of molecular forms of protein products in a sample, currently remain undetected in proteomics experiments.

Knowledge Gaps

This lack of information strongly restricts our knowledge of disease progression, possible biomarkers, and therapeutic targets across a large number of diseases. Several machine learning approaches have been developed for proteomics data, but because they are not trained end-to-end, they cannot capture the full wealth of information in proteomic mass spectra, and they commonly remain unexplained black boxes.

Project Overview

Within explAInProt, my team and I will develop representations of spectra that allow explainable, end-to-end machine learning models to be deployed on the wealth of available proteomic data, covering both bottom-up and top-down spectra, in order to identify novel protein variants.

Importance of Explanations

Explanations will make it possible to trace the origin of predictions and will help reduce bias, building the trustworthiness that AI systems require for clinical applications.

Verification Strategies

To verify results, we will pioneer orthogonal, real-time strategies based on selective sequencing and amino acid calling, which we will introduce for nanopore sequencing devices as a complementary acquisition method.

Addressing Dark Matter

Combined, these advances will allow us to drastically increase our knowledge of the current dark matter of mass spectrometry-driven proteomics: proteins and peptides that are non-canonically modified, are non-tryptic, potentially carry multiple amino acid substitutions, have no close match in databases, or result from structural variants such as fusion proteins, and that therefore remain undetected in current analyses.

Applicability

We will highlight applicability in two areas of particular concern for current approaches:

  1. The detection of structural variants in proteomic mass spectra.
  2. The characterization of novel microbial organisms without sufficient database information.

Financial details & timeline

Financial details

Grant amount: €1,992,500
Total project budget: €1,992,500

Timeline

Start date: 1 December 2024
End date: 30 November 2029
Grant year: 2024

Partners & locations

Project partners

  • HASSO-PLATTNER-INSTITUT FUR DIGITAL ENGINEERING GGMBH (coordinator)

Country

Germany

Similar projects within the European Research Council

ERC Starting...

Learning Isoform Fingerprints to Discover the Molecular Diversity of Life

This project aims to revolutionize proteomics by developing a novel data analysis strategy using deep learning to discover and quantify protein isoforms through their unique multi-dimensional fingerprints (ORIGINs).

€1,498,939
ERC Advanced...

A Native Mass Spectrometry Systemic View of Cellular Structural Biology

This project aims to enhance native mass spectrometry for studying protein interactions and diversity in their natural cellular environments, advancing structural biology and related fields.

€2,954,167
ERC Consolid...

Deciphering regulatory principles of proteasome heterogeneity and the degradation landscape in cancer

The project aims to enhance understanding of proteasome activity in cancer through MAPP technology, exploring its role in tumor-immune interactions and potential for improving immunotherapy outcomes.

€1,978,750
ERC Starting...

Deep Spatial Proteomics: connecting cellular neighbourhoods to functional states

Developing Deep Spatial Proteomics (DSP) to link cellular neighborhoods to proteome states, aiming to uncover disease mechanisms and improve patient stratification in cancer immunotherapy.

€1,470,851
ERC Proof of...

Precise, Rapid and Scalable Proteomics Solutions for Archaeology, Ecology, Wildlife Forensics and Food-chain Authentication

The PReciSe project aims to develop a fast, cost-effective proteomics method for taxonomic identification to enhance archaeological, ecological, and food supply chain verification.

€150,000

Similar projects from other funding schemes

Mkb-innovati...

Feasibility study: analysis device for intact and folded proteins

Portal Biotech is developing an innovative nanopore technology for measuring intact proteins, with the aim of revolutionizing diagnostics and improving clinical decision-making.

€20,000
Mkb-innovati...

Feasibility study: portable nanopore device for the identification of proteins and biomarkers.

Portal Biotech is developing a portable analysis device based on nanopore technology to enable real-time protein measurements, which would revolutionize diagnostics.

€20,000
EIC Accelerator

The ProM platform: New ways to drug the undruggable

PROSION's ProM platform aims to unlock and target the 85% of the human proteome that is considered undruggable, developing new therapies for hard-to-treat diseases such as cancer.

€2,461,375
EIC Accelerator

First time ultra-sensitive and simultaneous quantification of proteins, interactions, and post-translational modifications in single cells to enable exponential growth in proteomics and interactomics

PICO-NGS aims to revolutionize proteomics by enabling ultra-sensitive, highly parallel measurement of proteins, interactions, and modifications, accelerating advances in various industries.

€2,498,125