Planetary-scale indexing of sequencing data

Develop a planetary genomic search engine to efficiently index and analyze vast DNA and RNA sequencing data, enabling groundbreaking biological discoveries and improved data accessibility.

Grant
€ 1,933,625
2023

Project details

Introduction

A huge amount of DNA and RNA sequencing data from across the planet is now available, and its volume continues to double every two years. However, analyzing this data efficiently is currently infeasible because of its size, which is measured in petabases. Many high-impact biological discoveries remain out of reach for lack of fast search algorithms.
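
To put the doubling rate in concrete terms, the short sketch below computes the growth factor implied by a fixed two-year doubling period. The function name and the assumption of perfectly steady exponential growth are illustrative only; the five-year horizon matches the project duration listed in the timeline.

```python
def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Factor by which the data volume grows after `years`,
    assuming (hypothetically) a constant doubling period."""
    return 2.0 ** (years / doubling_period_years)


# Growth over a five-year span with a two-year doubling period.
five_year_factor = growth_factor(5.0)
print(f"Growth over five years: {five_year_factor:.2f}x")
```

Under this assumption, the data volume would grow between five- and six-fold over the lifetime of the project.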

I recently demonstrated this potential by searching all public RNA samples and discovering an order of magnitude more RNA virus species than were previously known. A global index, i.e. a planetary genomic search engine, would unlock instant and inexpensive search within petabase-scale data.

Hypothesis

I hypothesize that I can create a searchable index for all of the public DNA and RNA sequencing data. Leveraging my unique expertise across algorithms and data structures for biological sequences, my plan is to:

  1. Design efficient methods to assemble and compress all available sequencing data.
  2. Construct an external-memory index that will support versatile biological queries.
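
The project text does not specify how the index will be designed. As a rough illustration of what searching sequencing data through an index can look like, the sketch below builds a toy inverted index from k-mers (substrings of length k) to sample names. It is held entirely in memory, unlike the external-memory index proposed above, and all function names, sample names, and sequences are hypothetical.

```python
from collections import defaultdict


def kmers(sequence: str, k: int):
    """Yield every overlapping substring of length k."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def build_index(samples: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the set of sample names that contain it."""
    index = defaultdict(set)
    for name, sequence in samples.items():
        for kmer in kmers(sequence, k):
            index[kmer].add(name)
    return index


def search(index, query: str, k: int, min_fraction: float = 1.0) -> list[str]:
    """Return the samples that contain at least `min_fraction`
    of the distinct k-mers of the query, sorted by name."""
    query_kmers = set(kmers(query, k))
    if not query_kmers:
        return []
    hits = defaultdict(int)
    for kmer in query_kmers:
        for name in index.get(kmer, ()):
            hits[name] += 1
    return sorted(
        name for name, count in hits.items()
        if count / len(query_kmers) >= min_fraction
    )


# Made-up toy data, for illustration only.
samples = {
    "sample_A": "ACGTACGTGA",
    "sample_B": "TTGACCGTAC",
}
index = build_index(samples, k=4)
print(search(index, "CGTAC", k=4))
print(search(index, "GTGA", k=4))
```

A query is answered by looking up its k-mers rather than scanning every sample, which is the general idea that makes indexed search fast. A real petabase-scale index would additionally need compression and on-disk data structures, which this sketch does not attempt.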

Potential Impact

A planetary sequencing data index will enable a myriad of bioinformatics analyses that are currently out of reach.

I will demonstrate the utility of the index by:

  1. Constructing a database of human transcripts with novel disease associations.
  2. Discovering novel microbial species.
  3. Providing a search engine for environmental metagenomes.

The resulting unprecedented collection of assembled genomes and compressed reads will remove a major barrier to data accessibility, improve the efficiency of data access by several orders of magnitude, and revolutionize the scale of future bioinformatics analyses.

Financial details & timeline

Financial details

Grant amount: € 1,933,625
Total project budget: € 1,933,625

Timeline

Start date: 1 September 2023
End date: 31 August 2028
Grant year: 2023

Partners & locations

Project partners

  • INSTITUT PASTEUR (coordinator)

Country

France
