Scalable Graph Algorithms for Bioinformatics using Structure, Parameterization and Dynamic Updates

This project aims to develop scalable exact graph algorithms for processing sequencing data, enhancing accuracy in RNA transcript discovery and genomic database indexing.

Subsidie
€ 1.999.868
2025

Projectdetails

Introduction

Sequencing technologies have developed to be cheap and accurate, leading to major breakthroughs, such as the complete sequence of a human genome, the creation of nationwide population gene banks, or the discovery of novel viruses. As the amount of data produced grows exponentially and their applications become more broad and complex, the community needs accurate computational methods that scale.

Algorithmic Challenges

At the core of many algorithmic methods for processing sequencing data is the basic primitive of finding a set of paths or walks in graphs of various nature. Under different formulations and objective functions, the resulting problems can be:

  1. NP-hard (e.g. flow decompositions)
  2. Polynomial-time (e.g. path covers)

These challenges are impractical on large graphs. Thus, many practical tools prefer fast heuristics to exact algorithms. While these may be optimized for specific inputs, they may not be reliable or accurate in general, which is a highly relevant issue in, e.g., medical and life-science research.

Project Goals

This project will develop general methods to massively scale such exact graph algorithms. The approach will include:

  1. Novel Graph Structures:

    • Safety structures, e.g., sets of paths that can be quickly found to appear in all optimal solutions and thus simplify the problem.
    • Variation structures that limit the hardness of a problem only to graph areas rich in genetic variation.
  2. Modern Algorithmic Techniques:

    • Parameterizing polynomial algorithms to run in time linear in the graph size and superlinear only in a small parameter.
    • Dynamic algorithms that, as the input grows, update solutions based only on the new data.

Applications

We will apply these methods in two high-impact applications:

  1. Long-read RNA transcript discovery
  2. Indexing massive and rapidly growing genomic databases

Conclusion

This project paves the way for exact graph algorithms usable independently of the problem complexity or of the input size, applicable to real-world problems.

Financiële details & Tijdlijn

Financiële details

Subsidiebedrag€ 1.999.868
Totale projectbegroting€ 1.999.868

Tijdlijn

Startdatum1-9-2025
Einddatum31-8-2030
Subsidiejaar2025

Partners & Locaties

Projectpartners

  • HELSINGIN YLIOPISTOpenvoerder

Land(en)

Finland

Vergelijkbare projecten binnen European Research Council

ERC Consolid...

Planetary-scale indexing of sequencing data

Develop a planetary genomic search engine to efficiently index and analyze vast DNA and RNA sequencing data, enabling groundbreaking biological discoveries and improved data accessibility.

€ 1.933.625
ERC Advanced...

The sequencing microscope - a path to look at the molecules of biology

This project aims to develop a novel technique that uses sequencing data to infer spatial information in tissues, enhancing our understanding of biological systems without advanced microscopy.

€ 2.500.000
ERC Proof of...

Computational multiplexing to optimise next-generation sequencing

Developing MultiSeq, a bioinformatics solution to streamline and reduce costs in NGS library preparation, aiming to democratize sequencing technology and enhance its application across industries.

€ 150.000
ERC Starting...

Provable Scalability for high-dimensional Bayesian Learning

This project develops a mathematical theory for scalable Bayesian learning methods, integrating computational and statistical insights to enhance algorithm efficiency and applicability in high-dimensional models.

€ 1.488.673
ERC Starting...

Towards a New Theory of Optimal Dynamic Graph Algorithms

This project aims to systematically study and establish intrinsically optimal dynamic graph algorithms with constant update times, potentially revolutionizing the field and enhancing real-world applications.

€ 1.400.000

Vergelijkbare projecten uit andere regelingen

EIC Pathfinder

Processing-in-memory architectures and programming libraries for bioinformatics algorithms

This project aims to enhance genomics research by developing energy-efficient, cost-effective edge computing solutions using processing-in-memory technologies for high-throughput sequencing data analysis.

€ 1.966.665
EIC Pathfinder

Computation driven development of novel vivo-like-DNA-nanotransducers for biomolecules structure identification

This project aims to develop DNA-nanotransducers for real-time detection and analysis of conformational changes in biomolecules, enhancing understanding of molecular dynamics and aiding drug discovery.

€ 3.000.418