Planetary-scale indexing of sequencing data
Develop a planetary genomic search engine to efficiently index and analyze vast DNA and RNA sequencing data, enabling groundbreaking biological discoveries and improved data accessibility.
Project details
Introduction
A huge amount of planetary-scale DNA and RNA sequencing data is now publicly available, and its volume doubles roughly every two years. However, analyzing this data efficiently is currently impossible because of its sheer size, measured in petabases. Many high-impact biological discoveries are within reach but remain blocked by the lack of fast search algorithms.
I recently demonstrated this potential by discovering an order of magnitude more RNA virus species than were previously known, across all public RNA sequencing samples. A global index, that is, a planetary genomic search engine, would make search across petabase-scale data instant and inexpensive.
Hypothesis
I hypothesize that I can create a searchable index of all public DNA and RNA sequencing data. Leveraging my unique expertise in algorithms and data structures for biological sequences, I plan to:
- Design efficient methods to assemble and compress all available sequencing data.
- Construct an external-memory index that will support versatile biological queries.
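To make the idea of a searchable sequencing index concrete, here is a toy sketch; it is not the project's method. It assumes a k-mer inverted index, a common technique for sequence search: each canonical k-mer (the smaller of a k-mer and its reverse complement, so both DNA strands match) maps to the set of samples containing it, and a query reports the fraction of its k-mers found in each sample. The k-mer length `K`, the `KmerIndex` class, the `min_fraction` threshold and the sample IDs are all hypothetical, and the index lives in memory, whereas a petabase-scale index would have to be compressed and kept in external memory.

```python
from collections import defaultdict

# Hypothetical toy parameter: real sequence indexes typically use much longer k-mers.
K = 5

# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the smaller of a k-mer and its reverse complement, so both strands match."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def canonical_kmers(seq: str, k: int = K):
    """Yield the canonical k-mers of seq, skipping windows with non-ACGT characters."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= set("ACGT"):
            yield canonical(window)


class KmerIndex:
    """Toy in-memory inverted index mapping canonical k-mers to sample IDs.

    A petabase-scale index would instead have to be compressed and kept in
    external memory; this class only illustrates the query semantics.
    """

    def __init__(self, k: int = K):
        self.k = k
        self.postings = defaultdict(set)

    def add_sample(self, sample_id: str, reads):
        """Record every canonical k-mer of every read as present in sample_id."""
        for read in reads:
            for kmer in canonical_kmers(read, self.k):
                self.postings[kmer].add(sample_id)

    def query(self, seq: str, min_fraction: float = 0.8):
        """Return {sample_id: fraction} for samples containing enough query k-mers."""
        query_kmers = set(canonical_kmers(seq, self.k))
        if not query_kmers:
            return {}
        hits = defaultdict(int)
        for kmer in query_kmers:
            # .get avoids inserting empty entries into the defaultdict.
            for sample_id in self.postings.get(kmer, ()):
                hits[sample_id] += 1
        total = len(query_kmers)
        return {
            sample_id: count / total
            for sample_id, count in hits.items()
            if count / total >= min_fraction
        }


# Usage with made-up sample IDs and reads.
index = KmerIndex()
index.add_sample("sample_A", ["ACGTACGTTGCA", "TTGACCA"])
index.add_sample("sample_B", ["GGGGCCCCAAAA"])
print(index.query("ACGTACGTTG"))  # {'sample_A': 1.0}
print(index.query("CAACGTACGT"))  # reverse complement of the query above: {'sample_A': 1.0}
```

Because k-mers are stored in canonical form, a query and its reverse complement return the same samples, which matters for sequencing reads drawn from either strand.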
Potential Impact
A planetary sequencing data index will enable a myriad of bioinformatics analyses that are currently out of reach.
I will demonstrate the utility of the index by:
- Constructing a database of human transcripts with novel disease associations.
- Discovering novel microbial species.
- Providing a search engine for environmental metagenomes.
The resulting collection of assembled genomes and compressed reads, unprecedented in scale, will remove a major barrier to data accessibility, making access several orders of magnitude more efficient and transforming the scale of future bioinformatics analyses.
Financial details & Timeline
Financial details
Grant amount | € 1,933,625 |
Total project budget | € 1,933,625 |
Timeline
Start date | 1 September 2023 |
End date | 31 August 2028 |
Grant year | 2023 |
Partners & Locations
Project partners
- INSTITUT PASTEUR (coordinator)
Country: France
Similar projects within the European Research Council
Project | Description | Scheme | Amount | Year |
---|---|---|---|---|
Scalable Graph Algorithms for Bioinformatics using Structure, Parameterization and Dynamic Updates | This project aims to develop scalable exact graph algorithms for processing sequencing data, enhancing accuracy in RNA transcript discovery and genomic database indexing. | ERC Consolid... | € 1,999,868 | 2025 |
The sequencing microscope - a path to look at the molecules of biology | This project aims to develop a novel technique that uses sequencing data to infer spatial information in tissues, enhancing our understanding of biological systems without advanced microscopy. | ERC Advanced... | € 2,500,000 | 2024 |
Linking genome variation with haplotype-resolved sequencing | The project aims to validate and scale the haplotagging technique for DNA sequencing, enhancing haplotype context while integrating with existing Illumina technology to improve disease detection. | ERC Proof of... | € 150,000 | 2022 |
Computational multiplexing to optimise next-generation sequencing | Developing MultiSeq, a bioinformatics solution to streamline and reduce costs in NGS library preparation, aiming to democratize sequencing technology and enhance its application across industries. | ERC Proof of... | € 150,000 | 2023 |
Optical Sequencing inside Live Cells with Biointegrated Nanolasers | HYPERION aims to revolutionize intracellular biosensing by using plasmonic nanolasers for real-time detection of RNA, enhancing our understanding of molecular processes in living cells. | ERC Starting... | € 1,577,695 | 2022 |
Similar projects from other funding schemes
Project | Description | Scheme | Amount | Year |
---|---|---|---|---|
Processing-in-memory architectures and programming libraries for bioinformatics algorithms | This project aims to enhance genomics research by developing energy-efficient, cost-effective edge computing solutions using processing-in-memory technologies for high-throughput sequencing data analysis. | EIC Pathfinder | € 1,966,665 | 2022 |
Next Generation Molecular Data Storage | This project aims to develop a cost-effective and efficient DNA nanostructure-based data storage system, enhancing longevity and reducing electronic waste compared to traditional media. | EIC Pathfinder | € 2,418,514 | 2023 |
Proof of concept of affordable and scalable DNA data storage by developing a writing technology based on enzymatic DNA synthesis with a very dense semiconductor integrated circuit (CMOS chip) | The Hyperion project aims to develop affordable, scalable DNA data storage technology through advanced enzymatic synthesis and high-density chip multiplexing, targeting exabyte capacity and efficiency. | EIC Pathfinder | € 4,000,000 | 2023 |
DNA-based Infrastructure for Storage and Computation | The DISCO project aims to engineer a robust DNA-based storage and computing platform, starting with a 10-bit prototype and scaling to hundreds of bits using advanced molecular techniques. | EIC Pathfinder | € 3,993,665 | 2023 |
Interoperable end-to-end platform of scalable and sustainable high-throughput technologies for DNA-based digital data storage | PEARL-DNA aims to develop a high-throughput, modular DNA-based data storage platform to enhance longevity, efficiency, and integration in sustainable data management solutions. | EIC Pathfinder | € 3,999,857 | 2023 |