Scalable Graph Algorithms for Bioinformatics using Structure, Parameterization and Dynamic Updates
This project aims to develop scalable exact graph algorithms for processing sequencing data, enhancing accuracy in RNA transcript discovery and genomic database indexing.
Projectdetails
Introduction
Sequencing technologies have developed to be cheap and accurate, leading to major breakthroughs, such as the complete sequence of a human genome, the creation of nationwide population gene banks, or the discovery of novel viruses. As the amount of data produced grows exponentially and their applications become more broad and complex, the community needs accurate computational methods that scale.
Algorithmic Challenges
At the core of many algorithmic methods for processing sequencing data is the basic primitive of finding a set of paths or walks in graphs of various nature. Under different formulations and objective functions, the resulting problems can be:
- NP-hard (e.g. flow decompositions)
- Polynomial-time (e.g. path covers)
These challenges are impractical on large graphs. Thus, many practical tools prefer fast heuristics to exact algorithms. While these may be optimized for specific inputs, they may not be reliable or accurate in general, which is a highly relevant issue in, e.g., medical and life-science research.
Project Goals
This project will develop general methods to massively scale such exact graph algorithms. The approach will include:
-
Novel Graph Structures:
- Safety structures, e.g., sets of paths that can be quickly found to appear in all optimal solutions and thus simplify the problem.
- Variation structures that limit the hardness of a problem only to graph areas rich in genetic variation.
-
Modern Algorithmic Techniques:
- Parameterizing polynomial algorithms to run in time linear in the graph size and superlinear only in a small parameter.
- Dynamic algorithms that, as the input grows, update solutions based only on the new data.
Applications
We will apply these methods in two high-impact applications:
- Long-read RNA transcript discovery
- Indexing massive and rapidly growing genomic databases
Conclusion
This project paves the way for exact graph algorithms usable independently of the problem complexity or of the input size, applicable to real-world problems.
Financiële details & Tijdlijn
Financiële details
Subsidiebedrag | € 1.999.868 |
Totale projectbegroting | € 1.999.868 |
Tijdlijn
Startdatum | 1-9-2025 |
Einddatum | 31-8-2030 |
Subsidiejaar | 2025 |
Partners & Locaties
Projectpartners
- HELSINGIN YLIOPISTOpenvoerder
Land(en)
Vergelijkbare projecten binnen European Research Council
Project | Regeling | Bedrag | Jaar | Actie |
---|---|---|---|---|
Planetary-scale indexing of sequencing dataDevelop a planetary genomic search engine to efficiently index and analyze vast DNA and RNA sequencing data, enabling groundbreaking biological discoveries and improved data accessibility. | ERC Consolid... | € 1.933.625 | 2023 | Details |
The sequencing microscope - a path to look at the molecules of biologyThis project aims to develop a novel technique that uses sequencing data to infer spatial information in tissues, enhancing our understanding of biological systems without advanced microscopy. | ERC Advanced... | € 2.500.000 | 2024 | Details |
Computational multiplexing to optimise next-generation sequencingDeveloping MultiSeq, a bioinformatics solution to streamline and reduce costs in NGS library preparation, aiming to democratize sequencing technology and enhance its application across industries. | ERC Proof of... | € 150.000 | 2023 | Details |
Provable Scalability for high-dimensional Bayesian LearningThis project develops a mathematical theory for scalable Bayesian learning methods, integrating computational and statistical insights to enhance algorithm efficiency and applicability in high-dimensional models. | ERC Starting... | € 1.488.673 | 2023 | Details |
Towards a New Theory of Optimal Dynamic Graph AlgorithmsThis project aims to systematically study and establish intrinsically optimal dynamic graph algorithms with constant update times, potentially revolutionizing the field and enhancing real-world applications. | ERC Starting... | € 1.400.000 | 2022 | Details |
Planetary-scale indexing of sequencing data
Develop a planetary genomic search engine to efficiently index and analyze vast DNA and RNA sequencing data, enabling groundbreaking biological discoveries and improved data accessibility.
The sequencing microscope - a path to look at the molecules of biology
This project aims to develop a novel technique that uses sequencing data to infer spatial information in tissues, enhancing our understanding of biological systems without advanced microscopy.
Computational multiplexing to optimise next-generation sequencing
Developing MultiSeq, a bioinformatics solution to streamline and reduce costs in NGS library preparation, aiming to democratize sequencing technology and enhance its application across industries.
Provable Scalability for high-dimensional Bayesian Learning
This project develops a mathematical theory for scalable Bayesian learning methods, integrating computational and statistical insights to enhance algorithm efficiency and applicability in high-dimensional models.
Towards a New Theory of Optimal Dynamic Graph Algorithms
This project aims to systematically study and establish intrinsically optimal dynamic graph algorithms with constant update times, potentially revolutionizing the field and enhancing real-world applications.
Vergelijkbare projecten uit andere regelingen
Project | Regeling | Bedrag | Jaar | Actie |
---|---|---|---|---|
Processing-in-memory architectures and programming libraries for bioinformatics algorithmsThis project aims to enhance genomics research by developing energy-efficient, cost-effective edge computing solutions using processing-in-memory technologies for high-throughput sequencing data analysis. | EIC Pathfinder | € 1.966.665 | 2022 | Details |
Computation driven development of novel vivo-like-DNA-nanotransducers for biomolecules structure identificationThis project aims to develop DNA-nanotransducers for real-time detection and analysis of conformational changes in biomolecules, enhancing understanding of molecular dynamics and aiding drug discovery. | EIC Pathfinder | € 3.000.418 | 2022 | Details |
Processing-in-memory architectures and programming libraries for bioinformatics algorithms
This project aims to enhance genomics research by developing energy-efficient, cost-effective edge computing solutions using processing-in-memory technologies for high-throughput sequencing data analysis.
Computation driven development of novel vivo-like-DNA-nanotransducers for biomolecules structure identification
This project aims to develop DNA-nanotransducers for real-time detection and analysis of conformational changes in biomolecules, enhancing understanding of molecular dynamics and aiding drug discovery.