Computational methods for the interpretation of genomic structural variants

Research project objectives

Structural variants (SVs) of DNA fragments constitute a significant part of the total genetic variation observed in the human population. They also play a crucial role in the pathogenesis of many genetic diseases and in the evolution of the human genome. To date, SVs have been linked to many diseases, including dozens of genomic disorders (mainly caused by recurrent copy number variants) and hundreds of monogenic diseases with Mendelian inheritance. In recent years, increasingly advanced SV detection algorithms have been developed for data from different sequencing and microarray technologies. Our group has also contributed to the identification of SVs and to the analysis of genome features that mediate the formation of rearrangements. However, a disease phenotype can currently be linked to an SV in only about 10 percent of analyzed cases, and discovering the mechanism of pathogenicity of a given variant is more challenging still. Building on our experience in the analysis of genome rearrangements, we would like to contribute to this field by providing computational tools that support SV interpretation. We will address three specific tasks:

  • SVAssembly: implement an algorithmic solution for the assembly of complex and highly repetitive SV regions from long-read sequencing data
  • PathoScore: develop a biologically sound statistical model of the structural genomic features underlying the pathogenicity of rearrangements
  • SVAnnotate: provide a tool that systematically prioritizes SVs that may contribute to disease by disrupting the 3D structure of chromosomes, with a focus on non-coding SVs

Research project methodology

The proposed research tasks cover the analysis of human genome rearrangements using data from modern high-throughput technologies such as long-read sequencing and Hi-C experiments. We would like to explore the potential of an integrative approach to support the clinical interpretation of structural variants. Our tools will provide access to various databases of genomic information, enabling the exploration of distant genetic regulation, disruption of chromatin structure, and several phenotype-linked annotations. Moreover, we will propose a statistical model for evaluating the genomic features that may underlie a patient's phenotype. All genomic information will be modelled under the null hypothesis of a benign SV; p-values calculated from this model will yield a statistically sound ranking of genes potentially linked to the phenotype.
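The ranking step can be illustrated with a minimal sketch. It assumes, purely for illustration, that each gene receives a single numeric feature score and that the null distribution is represented by scores observed for benign SVs; the function names, the gene labels, and the choice of a one-sided empirical p-value are hypothetical and are not the proposed PathoScore model.

```python
from bisect import bisect_left


def empirical_p_value(observed, null_scores):
    """One-sided empirical p-value of `observed` against benign (null) scores.

    Returns the fraction of null scores that are >= observed, with the
    usual +1 correction so that the p-value is never exactly zero.
    """
    ordered = sorted(null_scores)
    n_greater_or_equal = len(ordered) - bisect_left(ordered, observed)
    return (n_greater_or_equal + 1) / (len(ordered) + 1)


def rank_genes(patient_scores, null_scores_by_gene):
    """Rank genes by ascending p-value (most surprising under the null first)."""
    ranked = [
        (gene, empirical_p_value(score, null_scores_by_gene[gene]))
        for gene, score in patient_scores.items()
    ]
    return sorted(ranked, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical scores; GENE_A and GENE_B are placeholder labels.
    null = {"GENE_A": [1.0, 2.0, 3.0, 4.0], "GENE_B": [1.0, 2.0, 3.0, 4.0]}
    patient = {"GENE_A": 9.0, "GENE_B": 1.0}
    for gene, p in rank_genes(patient, null):
        print(gene, p)
```

An empirical p-value is used here only because it needs no distributional assumptions; a parametric null model could be substituted without changing the ranking logic.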

Expected impact of the research project on the development of science

Because detected SVs can so rarely be linked to patient phenotypes, any tool supporting researchers and clinicians in this area will have a significant impact on the development of both basic science and translational medicine. Similarly, de novo assembly of rearrangement regions may help explain differences in the symptoms of patients carrying the same genomic variant.

International collaboration

Our longstanding cooperation with the group of Pawel Stankiewicz started in 2010 and focuses on genomic rearrangements. The unique opportunity to analyze a large genomic database of patients who have undergone aCGH analysis, coupled with our algorithmic and modeling skills, has resulted in several successful projects. The outcomes were published in high-impact journals such as Human Mutation, Genome Biology, Genome Research, Nucleic Acids Research, and PLOS Genetics. Our most notable recent finding (a Breakthrough paper in NAR) changed the perception of human genome stability: we showed that even short homologous DNA fragments (Alu transposable elements) may mediate structural variation via non-allelic homologous recombination.