CompBio @ MIMUW

Research projects

Modelling stress-induced transposon activity

The project aims to perform a bioinformatic and experimental analysis of the proliferation dynamics of transposable elements (TEs) under environmental stress. Within the project we intend to develop a computational model describing the relationship between changes in the genomes of organisms, caused by the activity of mobile genetic elements, and the level of their adaptation to a changing environment.

We also plan an experimental analysis based on plant genomes: we will examine the activity of transposons in higher organisms through a comparative study of the number and distribution of selected transposon families in the Medicago truncatula genome. The data will come from cultured plant cells exposed to physiological stress. The experimental data will be used to verify the TE proliferation model and to test hypotheses concerning the impact of TE activity on the adaptation of organisms.

Moreover, we develop algorithms and data structures to represent knowledge about the activities of the various TE families. Further, we focus on inferring optimal evolutionary scenarios that enable us to identify the chronology of transposon activity.

Integrative systems biology: inferring from massive heterogeneous data.

Modern biotechnology offers efficient techniques for large-scale measurements of molecular responses to targeted interventions into cellular processes. Besides experimental techniques for high-throughput analysis, various formal methods and algorithmic approaches have been proposed for molecular modelling. These include logical inference techniques for studying possible system evolution, statistical data mining for gene expression and proteomic studies, and stochastic simulations of dynamic system behavior.

Our approach aims to interrelate heterogeneous and often noisy "-omics" data by relying on mathematics, statistics and computer science. Our objective is to address the specific challenges of the following tasks and to obtain deep biological insights from them:

Modelling the peptide degradation process from LC-MS/MS data.
Determining the regulatory mechanisms behind gene expression patterns.
Sensitivity analysis of signalling pathways.
Detecting aberrations in diseased genomes.

Detecting DNA Copy Number Variations in array based CGH data.

We propose a novel multiple-sample aCGH analysis methodology aimed at detecting rare copy-number variations (CNVs). In contrast to the majority of previous approaches, which deal with cancer datasets, we focus on constitutional genomic abnormalities identified in a diverse spectrum of human diseases.

Our method is tested on exon aCGH arrays from several hundred samples of patients affected with developmental delay/intellectual disability, epilepsy, or autism. The robust statistical framework applied in our method makes it possible to eliminate the influence of a widespread technical artifact termed 'waves'.
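The idea of calling rare events against a cohort can be illustrated with a generic robust outlier rule. The following is only a minimal sketch and not the methodology described above: the function name `rare_deviations`, the median/MAD scoring, the threshold of 5 and the toy log2 ratios are all assumptions made for illustration. Each probe is scored against the median of all samples, so an offset shared by the whole cohort at a probe is absorbed by the median, while a deviation seen in a single sample stands out. A real analysis would additionally require several consecutive probes to agree before reporting a CNV.

```python
from statistics import median

def rare_deviations(log_ratios, threshold=5.0):
    """Flag (sample, probe_index) pairs whose log2 ratio is a robust outlier.

    log_ratios: dict mapping sample name -> list of per-probe log2 ratios.
    The median/MAD rule and the threshold are illustrative assumptions.
    """
    samples = sorted(log_ratios)
    n_probes = len(log_ratios[samples[0]])
    calls = []
    for j in range(n_probes):
        column = [log_ratios[s][j] for s in samples]
        centre = median(column)
        mad = median([abs(v - centre) for v in column])
        # 1.4826 rescales the MAD to the standard deviation of a normal distribution.
        scale = max(1.4826 * mad, 1e-9)
        for s, v in zip(samples, column):
            if abs(v - centre) / scale > threshold:
                calls.append((s, j))
    return calls

# Toy cohort: probe 1 carries an offset shared by all samples,
# probe 2 carries a loss present in sample s3 only.
cohort = {
    "s1": [0.02, 0.31, 0.01],
    "s2": [-0.01, 0.29, -0.02],
    "s3": [0.00, 0.30, -1.00],
    "s4": [0.01, 0.32, 0.02],
    "s5": [-0.02, 0.28, 0.00],
}
calls = rare_deviations(cohort)
```

On this toy cohort only the isolated loss in `s3` is flagged; the shared offset at probe 1 produces no call.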

Modeling serum proteolytic activity from LC-MS/MS data.

In this project we model the serum proteolysis process from tandem mass spectrometry data. The parameters of the peptide degradation process inferred from LC-MS/MS data correspond directly to the activity of specific enzymes present in the serum samples of patients and healthy donors. Our approach integrates the existing knowledge about peptidase activity stored in the MEROPS database with an efficient procedure for estimating the model parameters. Taking into account the inherent stochasticity of the process, the proteolytic activity is modeled with the use of the Chemical Master Equation (CME). Assuming the stationarity of the Markov process, we calculate the expected values of digested peptides in the model. The parameters are fitted to minimize the discrepancy between those expected values and the peptide activities observed in the MS data.
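For orientation, the general form of the CME and of the fitting problem can be written as follows. The notation is an assumption introduced here, not taken from the project: x is the vector of peptide copy numbers, a_j and ν_j are the propensity and state-change vector of the j-th cleavage reaction, π_θ is the stationary distribution for parameters θ, and y_i is the observed value for peptide i. The squared-error discrepancy in the last line is likewise only one possible choice.

```latex
% Chemical Master Equation for M reactions
\frac{\partial P(\mathbf{x}, t)}{\partial t}
  = \sum_{j=1}^{M} \Big[ a_j(\mathbf{x} - \boldsymbol{\nu}_j)\, P(\mathbf{x} - \boldsymbol{\nu}_j, t)
  - a_j(\mathbf{x})\, P(\mathbf{x}, t) \Big]

% Stationarity: the distribution \pi_\theta no longer changes in time
0 = \sum_{j=1}^{M} \Big[ a_j(\mathbf{x} - \boldsymbol{\nu}_j)\, \pi_\theta(\mathbf{x} - \boldsymbol{\nu}_j)
  - a_j(\mathbf{x})\, \pi_\theta(\mathbf{x}) \Big]

% Parameter fit (squared error assumed as the discrepancy measure)
\hat{\theta} = \arg\min_{\theta} \sum_{i} \Big( \mathbb{E}_{\pi_\theta}[X_i] - y_i \Big)^{2}
```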

Heuristic for the exploration of phylogenetic tree space

This task is related to the design of efficient algorithms for the problem of optimal species tree inference from a set of input gene trees. Our work is focused on several parts: (1) designing faster algorithms for the local search problem, that is, more efficient exploration of a local neighbourhood of a given species tree, (2) the problem of starting tree generation, and (3) a hill-climbing heuristic for the general search.
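The overall search scheme can be sketched in a few lines. This is a hedged illustration under simplifying assumptions, not the algorithms developed in the project: trees are rooted binary nested tuples of leaf names, the neighbourhood is defined by rooted nearest-neighbour interchange (NNI) moves, and the cost is a simple count of gene-tree clusters missing from the species tree, used here as a stand-in for a reconciliation cost. All function names are hypothetical.

```python
def clusters(tree):
    """Return (leaf set, set of clusters) of a rooted binary tree given as nested tuples."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    left_leaves, left_clusters = clusters(tree[0])
    right_leaves, right_clusters = clusters(tree[1])
    leaves = left_leaves | right_leaves
    return leaves, left_clusters | right_clusters | {leaves}

def cost(species_tree, gene_trees):
    """Stand-in cost: number of gene-tree clusters absent from the species tree."""
    species_clusters = clusters(species_tree)[1]
    return sum(len(clusters(g)[1] - species_clusters) for g in gene_trees)

def nni_neighbours(tree):
    """Yield every tree that is one rooted NNI move away from `tree`."""
    if isinstance(tree, str):
        return
    left, right = tree
    if not isinstance(left, str):   # swap a child of `left` with `right`
        a, b = left
        yield ((a, right), b)
        yield ((right, b), a)
    if not isinstance(right, str):  # swap a child of `right` with `left`
        c, d = right
        yield (c, (left, d))
        yield (d, (c, left))
    for n in nni_neighbours(left):  # moves located deeper in the tree
        yield (n, right)
    for n in nni_neighbours(right):
        yield (left, n)

def hill_climb(start, gene_trees):
    """Move to the best neighbour until no neighbour improves the cost."""
    current, current_cost = start, cost(start, gene_trees)
    while True:
        scored = [(cost(t, gene_trees), t) for t in nni_neighbours(current)]
        if not scored:
            return current, current_cost
        best_cost, best = min(scored, key=lambda pair: pair[0])
        if best_cost >= current_cost:
            return current, current_cost
        current, current_cost = best, best_cost

gene_trees = [
    (("a", "b"), ("c", "d")),
    (("a", "b"), ("c", "d")),
    ((("a", "b"), "c"), "d"),
]
final_tree, final_cost = hill_climb(((("a", "c"), "b"), "d"), gene_trees)
```

The starting tree is arbitrary here; in practice the quality of the starting tree and the cost of scoring each neighbour dominate the running time, which is why parts (1) and (2) above matter.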

Elucidating regulatory mechanisms downstream of a signaling pathway using informative experiments.

Signaling cascades are triggered by extra-cellular stimulation and propagate the signal to regulate transcription. Systematic reconstruction of this regulation requires pathway-targeted, informative experimental data. However, experimental design is difficult since even highly informative experiments might be redundant with other experiments. In addition, experimental outcomes vary not only between different genetic perturbations but also between the combinations of environmental stimuli. We have developed a practical algorithmic framework that iterates design of experiments and reconstruction of regulatory relationships downstream of a given pathway. The experimental design component of the framework, called MEED, proposes a set of experiments that can be performed in the lab and given as input to the reconstruction component. Both components take advantage of expert knowledge about the signaling system under study, formalized in a predictive logical model. The reconstruction component reconciles the model predictions with the data from the designed experiments to provide a set of identified target genes, their regulators in the pathway and their regulatory mechanisms. Reconstruction based on uninformative data may lead to ambiguous conclusions about the regulation. To avoid ambiguous reconstruction, MEED maximizes diversity between the predicted expression profiles of genes regulated through different mechanisms.
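The notion of choosing experiments that keep regulatory mechanisms distinguishable can be illustrated with a small greedy selection. This sketch is not the MEED algorithm: the function names, the binary predicted profiles, and the diversity score (the number of mechanism pairs separated by at least one chosen experiment) are assumptions chosen for brevity.

```python
from itertools import combinations

def diversity(predictions, chosen):
    """Count mechanism pairs whose predicted profiles differ on some chosen experiment."""
    return sum(
        any(predictions[m1][e] != predictions[m2][e] for e in chosen)
        for m1, m2 in combinations(sorted(predictions), 2)
    )

def greedy_design(predictions, experiments, k):
    """Pick up to k experiments, each time adding the one that separates most pairs."""
    chosen = []
    for _ in range(k):
        remaining = [e for e in experiments if e not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda e: diversity(predictions, chosen + [e]))
        if diversity(predictions, chosen + [best]) == diversity(predictions, chosen):
            break  # every remaining experiment is redundant with those already chosen
        chosen.append(best)
    return chosen

# Hypothetical model predictions: expression (1 = up, 0 = unchanged) of a gene
# under each regulatory mechanism in each candidate experiment.
predictions = {
    "mechA": {"e1": 1, "e2": 0, "e3": 1},
    "mechB": {"e1": 1, "e2": 1, "e3": 1},
    "mechC": {"e1": 1, "e2": 1, "e3": 0},
}
design = greedy_design(predictions, ["e1", "e2", "e3"], k=3)
```

In this toy case `e1` is never selected because all mechanisms predict the same outcome for it, while `e2` and `e3` together separate every pair of mechanisms.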

Computational modeling of transcriptional regulation in the context of chromatin dynamics

The aim of the project is to develop a new computational model for transcriptional regulation by integrating high-throughput data on transcription factor (TF) binding with global data on chromatin state. The intended model needs to account for temporal changes both in TF binding and in chromatin states in addition to their relation to temporal changes in target gene expression. At the end of the process we should be able to make specific predictions regarding target gene expression for genes involved in early Drosophila development. It is also desirable for the model to allow for introspection, i.e. analysis of conditional dependencies between variables. The initial formulation of the model will be guided by statistical analyses of available modENCODE data, collected from staged whole embryos, as well as data specific to mesoderm development, in collaboration with biologists.

Structural analysis and prediction of cis-regulatory modules in genomes of higher eukaryotes

Transcription factor binding to gene promoters controls transcriptional processes in cells. In higher eukaryotes, transcription factor binding sites tend to cluster into cis-regulatory modules (CRMs). We analyse structural properties of experimentally verified CRMs and their conservation among related species. Based on this analysis we develop CRM prediction methods. An additional objective is to infer the regulatory mechanisms of gene expression from sequence and microarray data. In particular, we recover relationships between cis-regulatory features and expression patterns, analyse dependencies between gene expression profiles and infer related regulatory interactions.

Protein-Protein Interaction networks

Large-scale protein-protein interaction (PPI) networks are now available for human and many model organisms. The arising challenge is to analyze these data to reveal the basic components and organization of the cellular machinery. Pioneering studies have shown that cross-species comparison is an effective approach for uncovering key modules in PPI networks. Early successes have in turn stimulated the search for new methods, with a more solid grounding in mathematical models and better scalability, to allow the comparison of multiple networks. We developed a novel framework for comparing PPI networks across species, providing new insights into the evolution of these systems. Our approach is based on the reconstruction of a hypothetical PPI network of the common ancestor of the considered species. The reconstruction algorithm is built upon a proposed model of protein network evolution, which takes into account the phylogenetic history of the proteins and the rewiring of their interactions. Initial application of our procedure to networks of D. melanogaster, C. elegans and S. cerevisiae revealed that the most probable ancestral interactions often correspond to known protein complexes. We are now extending the framework to provide practical methods for transferring and integrating PPI evidence from multiple datasets and species and for studying theoretical properties of large evolving networks.

Comparative genomics of plant transposons

The main objective of this project is to develop computational tools and efficient algorithms for analysing the evolution of DNA transposons in the sequenced genomes of plants. Laboratory verification of the bioinformatic results is carried out on plant material representing the family Fabaceae, subfamily Papilionoideae. We try to determine whether the hypothetical period of activity of selected transposons corresponds to the time of speciation and whether species belonging to the same evolutionary line are characterized by the presence of similar families of transposons. Our study aims at understanding the evolutionary mechanisms that led to such a broad spread of transposons in eukaryotic genomes. Moreover, we expect that this knowledge will translate into practical applications in the fields of "transposon tagging" and genetic modification of crop plants.

Predicting cell type-specific transcription factor cooperative binding

Transcription factors (TFs) are essential for the regulation of gene expression. Their activating effect is achieved by binding to specific DNA sequence fragments in the regulatory regions of the genome, usually close to each other. This binding is usually studied in isolation, even though TFs typically build up regulatory complexes with other TFs, chromatin modifiers and co-factor proteins. In order to elucidate the mechanisms of transcription regulation, it is crucial to determine which TFs exhibit direct cooperative binding to DNA, and to infer the precise nature of these interactions. Just as SOX2-OCT4 cooperativity is a defining feature of embryonic stem cells, other such interactions could play an essential role in the regulatory networks of many other cell types. Our objective is to systematically infer pairs of TFs binding cooperatively to DNA as dimers. For each predicted dimer, we also aim to derive the spatial arrangement of individual TFs on DNA and the cell-type specificity of the interaction. Our current approach is based on large-scale analysis of motif complex enrichment in regulatory regions specific for individual cell types.
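What an enrichment test for a motif pair in cell-type-specific regions might look like is sketched below. It is a simplified illustration under stated assumptions, not the project's pipeline: motif hits are given as (TF name, start position) pairs, a "dimer" is a pair of hits at one fixed spacing, and enrichment is assessed with a one-sided hypergeometric test. The function names, the spacing of 6 bp and the toy hit positions are hypothetical.

```python
from math import comb

def has_dimer(hits, tf_a, tf_b, spacing):
    """True if some tf_a hit is followed by a tf_b hit exactly `spacing` bp downstream."""
    starts_b = {pos for tf, pos in hits if tf == tf_b}
    return any(tf == tf_a and pos + spacing in starts_b for tf, pos in hits)

def hypergeom_tail(k, N, K, n):
    """P(X >= k) when drawing n of N regions, K of which contain the dimer."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)

def dimer_enrichment(regions, specific, tf_a, tf_b, spacing):
    """p-value for over-representation of the dimer among cell-type-specific regions."""
    with_dimer = {r for r, hits in regions.items() if has_dimer(hits, tf_a, tf_b, spacing)}
    k = len(with_dimer & specific)
    return hypergeom_tail(k, len(regions), len(with_dimer), len(specific))

# Toy data: motif hits per regulatory region (positions are made up).
regions = {
    "r1": [("SOX2", 10), ("OCT4", 16)],
    "r2": [("SOX2", 5), ("OCT4", 11)],
    "r3": [("SOX2", 7), ("OCT4", 30)],
    "r4": [("OCT4", 3)],
    "r5": [("SOX2", 1), ("OCT4", 7)],
    "r6": [],
}
p_value = dimer_enrichment(regions, {"r1", "r2"}, "SOX2", "OCT4", spacing=6)
```

Repeating the test over a range of spacings and orientations would give a profile from which a preferred arrangement could be read off, with appropriate correction for multiple testing.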

Funding sources

Foundation for Polish Science (Fundacja na rzecz Nauki Polskiej)

Software projects

MSARC: Multiple sequence alignment without guide-trees.
Bmap: Efficient and error-tolerant sequencing read mapping.
URec: Unrooted REConciliation
CTX-PSI-BLAST: Context sensitive version of protein BLAST.
mz2m: Program for interpretation of peptide mass spectra.
MetaPepSeq: A metaserver for protein identification in mass spectrometry experiments.
TIRfinder: A tool for mining class II transposons.
MEMOFinder: De novo motif finding using different predictors and a database.
BNFinder: Bayesian network topology inference.
Tav4SB: Web service operations for analysis of the kinetic models of biological systems.
joda: Joint deregulation analysis
bgmm: Knowledge in mixture modelling
meed: Model Expansion experimental design
BRAIN: Baffling Recursive Algorithm for Isotope distributioN calculations
