The project aims to carry out a bioinformatic and experimental analysis of the proliferation dynamics of transposable elements (TEs) under environmental stress conditions. Within the project we intend to develop a computational model describing the relationship between changes in the genomes of organisms, caused by the activity of mobile genetic elements, and the level of their adaptation to a changing environment.
We also plan to perform an experimental analysis based on plant genomes: we will examine the activity of transposons in higher organisms through a comparative study of the number and distribution of selected transposon families in the Medicago truncatula genome. The data will come from cultured plant cells exposed to physiological stress. The experimental data obtained will be used to verify the TE proliferation model and to test hypotheses concerning the impact of TE activity on the adaptation of organisms.
Moreover, we develop algorithms and data structures to represent knowledge about the activities of the various TE families. Further, we focus on inferring optimal evolutionary scenarios that enable us to identify the chronology of transposon activity.
Modern biotechnology offers efficient techniques for large-scale measurements
of molecular responses to targeted interventions into cellular processes.
Besides experimental techniques for high-throughput analysis, various formal
methods and algorithmic approaches have been proposed for molecular modelling.
These include logical inference techniques for studying possible system evolution,
statistical data mining for gene expression and proteomic studies, and stochastic simulations of dynamic system behavior.
Our approach aims to interrelate heterogeneous and often noisy "-omics" data by relying on mathematics, statistics and computer science. It is our objective to address the specific challenges and obtain deep biological insights relating to the following tasks:
Modelling peptide degradation process from LC-MS/MS data.
Determining the regulatory mechanisms of gene expression patterns.
Sensitivity analysis of signalling pathways.
Detecting aberrations in diseased genomes.
Detecting DNA Copy Number Variations in array-based CGH data.
We propose a novel multiple-sample aCGH analysis methodology aimed at
detecting rare Copy-Number Variations (CNVs). In contrast to the majority
of previous approaches, which deal with cancer datasets,
we focus on constitutional genomic abnormalities identified
in a diverse spectrum of human diseases.
Our method is tested on exon aCGH arrays from several hundred samples of
patients affected with developmental delay/intellectual
disability, epilepsy, or autism. The robust
statistical framework applied in our method makes it possible
to eliminate the influence of a widespread technical artifact
termed 'waves'.
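The idea of separating a shared technical artifact from rare, sample-specific signal can be illustrated with a minimal sketch. It assumes, purely for illustration, that the wave component at a probe is common to most samples while a rare CNV affects only a few of them, so that a per-probe median across samples estimates the wave. The function names, the data layout and the threshold of 0.3 are hypothetical and do not describe the statistical framework actually used in the project.

```python
from statistics import median


def remove_shared_waves(log_ratios):
    """Subtract the per-probe median across samples.

    log_ratios[s][p] is the log2 ratio of sample s at probe p (assumed layout).
    The median is robust, so a rare CNV carried by a few samples barely moves it.
    """
    n_probes = len(log_ratios[0])
    probe_medians = [
        median(sample[p] for sample in log_ratios) for p in range(n_probes)
    ]
    return [
        [sample[p] - probe_medians[p] for p in range(n_probes)]
        for sample in log_ratios
    ]


def candidate_probes(corrected_sample, threshold=0.3):
    """Return indices of probes whose corrected log ratio exceeds the
    (hypothetical) threshold in absolute value."""
    return [p for p, value in enumerate(corrected_sample) if abs(value) > threshold]


# Toy data: five samples share the same wave; sample 0 carries a deletion
# spanning probes 2 and 3.
wave = [0.1, 0.2, 0.1, -0.1, -0.2, -0.1]
samples = [list(wave) for _ in range(5)]
samples[0][2] -= 1.0
samples[0][3] -= 1.0
corrected = remove_shared_waves(samples)
print(candidate_probes(corrected[0]))
```

A real analysis would additionally need segmentation and a noise model; the sketch only shows why a multiple-sample view helps against artifacts that recur across arrays.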
Modeling serum proteolytic activity from LC-MS/MS data.
In this project we model the serum proteolysis
process from tandem mass spectrometry data. The parameters of the peptide
degradation process inferred from LC-MS/MS data correspond directly to
the activity of specific enzymes present in the serum samples of
patients and healthy donors.
Our approach integrates the existing knowledge about
peptidase activity stored in the MEROPS database with an
efficient procedure for estimating the model parameters. Taking
into account the inherent stochasticity of the process, the
proteolytic activity is modeled with the use of the Chemical Master
Equation (CME). Assuming the stationarity of the Markov
process, we calculate the expected values of digested peptides
in the model. The parameters are fitted to minimize the
discrepancy between those expected values and the peptide
activities observed in the MS data.
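For reference, the general form of the CME and the equation for the expected values can be written as below. The notation is introduced here only for illustration: x is the vector of peptide copy numbers, a_j and \nu_j are the propensity and the stoichiometric vector of reaction j, and the matrix A and vector b collect the rate constants under the additional assumption that every propensity is at most linear in x (first-order cleavage plus a constant inflow). The reaction scheme actually used in the project is not specified here.

```latex
% Chemical Master Equation for the probability P(x,t) of state x at time t
\frac{\partial P(x,t)}{\partial t}
  = \sum_{j} \Big[ a_j(x-\nu_j)\,P(x-\nu_j,t) \;-\; a_j(x)\,P(x,t) \Big]

% With propensities that are at most linear in x, the means obey a closed linear system
\frac{d}{dt}\,\mathbb{E}[X(t)] = A\,\mathbb{E}[X(t)] + b

% Stationarity: the time derivative vanishes
A\,\mathbb{E}[X] + b = 0
```

Under these assumptions, the stationary expected values are the solution of a linear system in the rate constants, and fitting amounts to choosing the constants that bring this solution as close as possible to the observed values.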
Heuristics for exploring the space of phylogenetic trees
This task is related to the design of efficient algorithms for the
problem of inferring an optimal species tree from a set of input gene
trees. Our work focuses on three parts: (1) designing faster
algorithms for the local search problem, that is, more efficient
exploration of the local neighbourhood of a given species tree, (2) the
problem of generating a starting tree, and (3) a hill-climbing heuristic
for the general search.
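The interplay of local neighbourhood exploration and hill climbing can be sketched as follows. This is an illustrative toy, not the algorithms developed in the project: trees are rooted binary trees encoded as nested 2-tuples of leaf names, the neighbourhood is generated by rooted nearest-neighbour interchanges (NNI), and the cost is a stand-in, namely the total number of clusters that differ between the species tree and each gene tree. All function names are hypothetical.

```python
def clusters(tree):
    """Return (leaf set, set of clusters) of a rooted binary tree
    given as nested 2-tuples with string leaves."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    left_leaves, left_clusters = clusters(tree[0])
    right_leaves, right_clusters = clusters(tree[1])
    leaves = left_leaves | right_leaves
    return leaves, left_clusters | right_clusters | {leaves}


def cost(species_tree, gene_trees):
    """Stand-in cost: summed cluster symmetric difference to all gene trees."""
    species_clusters = clusters(species_tree)[1]
    return sum(len(species_clusters ^ clusters(g)[1]) for g in gene_trees)


def nni_neighbours(tree):
    """Yield every tree that is one rooted NNI move away from `tree`."""
    if isinstance(tree, str):
        return
    left, right = tree
    if not isinstance(left, str):   # swap `right` with a child of `left`
        a, b = left
        yield ((a, right), b)
        yield ((b, right), a)
    if not isinstance(right, str):  # swap `left` with a child of `right`
        a, b = right
        yield ((left, a), b)
        yield ((left, b), a)
    for n in nni_neighbours(left):  # moves deeper in the left subtree
        yield (n, right)
    for n in nni_neighbours(right): # moves deeper in the right subtree
        yield (left, n)


def hill_climb(start, gene_trees):
    """Move to the best neighbour until no neighbour improves the cost."""
    current, current_cost = start, cost(start, gene_trees)
    while True:
        best = min(nni_neighbours(current),
                   key=lambda t: cost(t, gene_trees), default=None)
        if best is None or cost(best, gene_trees) >= current_cost:
            return current, current_cost
        current, current_cost = best, cost(best, gene_trees)


gene_trees = [(("a", "b"), ("c", "d")), (("a", "b"), ("c", "d"))]
start = ((("a", "b"), "c"), "d")
print(hill_climb(start, gene_trees))
```

The sketch recomputes the cost of every neighbour from scratch, which is exactly the step that faster local search algorithms aim to avoid. Like any hill-climbing procedure it can stop in a local optimum, which is why the choice of the starting tree matters.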
Elucidating regulatory mechanisms downstream of a signaling
pathway using informative experiments.
Signaling cascades are triggered by extra-cellular stimulation and
propagate the signal to regulate transcription. Systematic
reconstruction of this regulation requires pathway-targeted,
informative experimental data. However, experimental design is
difficult since even highly informative experiments might be redundant
with other experiments. In addition, experimental outcomes vary not
only between different genetic perturbations but also between
combinations of environmental stimuli.
We have developed a practical algorithmic framework that iterates between
the design of experiments and the reconstruction of regulatory relationships
downstream of a given pathway. The experimental design component of
the framework, called MEED, proposes a set of experiments that can be
performed in the lab and then given as input to the reconstruction
component. Both components take advantage of expert knowledge about
the signaling system under study, formalized in a predictive logical
model. The reconstruction component reconciles the model predictions
with the data from the designed experiments to provide a set of
identified target genes, their regulators in the pathway and their
regulatory mechanisms. Reconstruction based on uninformative data may
lead to ambiguous conclusions about the regulation. To avoid ambiguous
reconstruction, MEED maximizes diversity between the predicted
expression profiles of genes regulated through different mechanisms.
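The notion of diversity between predicted profiles can be made concrete with a small sketch. It is a simplified greedy stand-in and not the MEED algorithm itself: we assume that a logical model has already produced, for every candidate regulatory mechanism, a predicted discrete expression state in each candidate experiment, and we pick experiments one at a time so that the least distinguishable pair of mechanisms becomes as distinguishable as possible. The data layout and function names are hypothetical.

```python
from itertools import combinations


def pairwise_differences(predictions, chosen):
    """For every pair of mechanisms, count the chosen experiments in which
    their predicted expression states differ."""
    return [
        sum(predictions[a][e] != predictions[b][e] for e in chosen)
        for a, b in combinations(sorted(predictions), 2)
    ]


def greedy_design(predictions, experiments, budget):
    """Greedily select up to `budget` experiments.

    predictions[mechanism][experiment] is the predicted state (e.g. -1, 0, 1).
    Each step maximises the smallest pairwise difference and uses the total
    pairwise difference as a tie-breaker.
    """
    chosen = []
    for _ in range(budget):
        remaining = [e for e in experiments if e not in chosen]
        if not remaining:
            break

        def score(e):
            d = pairwise_differences(predictions, chosen + [e])
            return (min(d), sum(d))

        chosen.append(max(remaining, key=score))
    return chosen


predictions = {
    "mechanism_A": {"e1": 1, "e2": 1, "e3": 0},
    "mechanism_B": {"e1": 1, "e2": 0, "e3": 0},
    "mechanism_C": {"e1": 1, "e2": 0, "e3": 1},
}
print(greedy_design(predictions, ["e1", "e2", "e3"], budget=2))
```

In this toy input, experiment e1 yields the same prediction for all three mechanisms and is therefore uninformative, whereas e2 and e3 together separate every pair of mechanisms.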
Computational modeling of transcriptional regulation
in the context of chromatin dynamics
The aim of the project is to develop a new computational model for
transcriptional regulation by integrating high-throughput data on
transcription factor (TF) binding with global data on chromatin
state. The intended model needs to account for temporal changes
both in TF binding and in chromatin states in addition to their
relation to temporal changes in target gene expression. At the end
of the process we should be able to make specific predictions
regarding target gene expression for genes involved in early
Drosophila development. It is also desirable for the model to allow
for introspection, i.e. analysis of conditional dependencies
between variables. The initial formulation of the model will be
guided by statistical analyses of available modENCODE data
collected from staged whole embryos, as well as of data specific to
mesoderm development, in collaboration with biologists.
Structural analysis and prediction of cis-regulatory modules in
genomes of higher eukaryotes
Transcription factor binding to gene promoters controls
transcriptional processes in cells. In higher eukaryotes,
transcription factor binding sites tend to cluster into
cis-regulatory modules (CRMs).
We analyse structural properties of experimentally verified CRMs
and their conservation among related species. Based on this
analysis we develop CRM prediction methods.
An additional objective is to infer gene expression regulatory mechanisms
from sequence and microarray data.
In particular, we recover relationships between cis-regulatory
features and expression patterns,
analyse dependencies between gene expression profiles and infer
related regulatory interactions.
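The observation that binding sites cluster into CRMs suggests a very simple baseline, sketched below: group predicted binding-site positions along a sequence whenever consecutive sites are close to each other, and report groups with enough sites. This is only an illustration and not the prediction method developed in the project; the function name and the default values of `max_gap` and `min_sites` are arbitrary assumptions.

```python
def cluster_sites(positions, max_gap=200, min_sites=3):
    """Group binding-site positions (in bp) into candidate modules.

    Consecutive sites at most `max_gap` bp apart fall into the same cluster.
    Returns (start, end, number_of_sites) for clusters with >= `min_sites` sites.
    """
    clusters, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] > max_gap:
            # The gap is too large: close the current cluster.
            if len(current) >= min_sites:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    if len(current) >= min_sites:
        clusters.append((current[0], current[-1], len(current)))
    return clusters


sites = [100, 150, 260, 1000, 5000, 5100, 5150, 5300]
print(cluster_sites(sites))
```

A baseline of this kind ignores which factors bind and whether the sites are conserved among related species, which are precisely the structural properties analysed above.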
Protein-Protein Interaction networks
Large-scale protein-protein interaction (PPI) networks are now
available for human and many model organisms. The arising challenge is to
analyze these data to reveal the basic components and organization of the
cellular machinery. Pioneering studies have shown that cross-species
comparison is an effective approach for uncovering key modules in PPI
networks. Early successes have in turn stimulated the search for new methods
with a more solid grounding in mathematical models and better
scalability, to allow the comparison of multiple networks. We have developed a
novel framework for comparing PPI networks across
species, providing new insights into the evolution of these systems. Our
approach is based on the reconstruction of a hypothetical PPI network of the
common ancestor of the considered species. The reconstruction algorithm
is built upon a proposed model of protein network evolution, which takes
into account the phylogenetic history of the proteins and the
rewiring of their interactions. Initial application of our procedure to the networks of
D. melanogaster, C. elegans and S. cerevisiae revealed that the most probable ancestral
interactions often correspond to known protein complexes. We are now extending the
framework to provide practical methods for transferring and integrating PPI
evidence from multiple datasets and species, and for studying theoretical
properties of large evolving networks.
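To give a rough intuition for how evidence from several species can point to an ancestral interaction, the sketch below computes a crude conservation score. It is not the probabilistic evolutionary model described above: it ignores the phylogenetic history and rewiring, and simply assumes that proteins have already been grouped into families descending from one ancestral protein each. The data layout, identifiers and function name are hypothetical.

```python
from itertools import combinations


def ancestral_support(families, networks):
    """Score each pair of ancestral proteins by the fraction of species in
    which at least one pair of their descendants interacts.

    families: {ancestor: {species: [protein, ...]}}
    networks: {species: set of frozenset({protein, protein}) edges}
    """
    support = {}
    for fam_a, fam_b in combinations(sorted(families), 2):
        hits = 0
        for species, edges in networks.items():
            members_a = families[fam_a].get(species, [])
            members_b = families[fam_b].get(species, [])
            if any(frozenset((p, q)) in edges
                   for p in members_a for q in members_b):
                hits += 1
        support[(fam_a, fam_b)] = hits / len(networks)
    return support


families = {
    "F1": {"fly": ["f1"], "worm": ["w1"], "yeast": ["y1"]},
    "F2": {"fly": ["f2"], "worm": ["w2"], "yeast": ["y2a", "y2b"]},
    "F3": {"fly": ["f3"]},
}
networks = {
    "fly": {frozenset(("f1", "f2")), frozenset(("f2", "f3"))},
    "worm": {frozenset(("w1", "w2"))},
    "yeast": {frozenset(("y1", "y2b"))},
}
print(ancestral_support(families, networks))
```

In the toy data, the pair (F1, F2) is supported in all three species, while (F2, F3) is supported only in fly, so the former would be the stronger candidate for an ancestral interaction under this naive score.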
Comparative genomics of plant transposons
The main objective of this project is to develop computational tools and
efficient algorithms for analysing the evolution
of DNA transposons in the sequenced genomes of plants.
Laboratory verification of the bioinformatic results is carried out on
plant material representing the family Fabaceae, subfamily Papilionoideae.
We try to determine whether the hypothetical period of activity of
selected transposons corresponds to the time of speciation and whether
species belonging to the same evolutionary lineage are characterized by
the presence of similar families of transposons.
Our study aims at understanding the evolutionary mechanisms that led
to such a broad spread of transposons in eukaryotic genomes. Moreover,
we expect that this knowledge will translate into practical
applications in the fields of transposon tagging and genetic
modification of crop plants.
Transcription factors (TFs) are essential for the regulation of gene expression. Their activating effect is achieved by binding to the specific DNA sequence fragments in the regulatory regions of the genome, usually close to each other. This binding is usually studied in isolation, even though TFs typically build up regulatory complexes with other TFs, chromatin modifiers and co-factor proteins. In order to elucidate the mechanisms of transcription regulation, it is crucial to determine which TFs exhibit direct cooperative binding to DNA, and infer the precise nature of these interactions. Just as SOX2-OCT4 cooperativity is a defining feature of embryonic stem cells, other such interactions could play an essential role in the regulatory networks of many other cell types. Our objective is to systematically infer pairs of TFs binding cooperatively to DNA as dimers. For each predicted dimer, we also aim to derive the spatial arrangement of individual TFs on DNA and the cell-type specificity of the interaction. Our current approach is based on large-scale analysis of motif complex enrichment in regulatory regions specific for individual cell types.
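One common way to quantify enrichment of this kind is a hypergeometric test, sketched below under stated assumptions. We assume a background set of regulatory regions, a subset specific to one cell type, and the set of regions that contain a given pair of TF motifs in a given arrangement; the test asks whether the pair occurs in the cell-type-specific regions more often than expected by chance. The function names and inputs are hypothetical, and the sketch is not a description of the analysis pipeline used in the project.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    return sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)
    ) / comb(N, n)


def pair_enrichment(all_regions, specific_regions, regions_with_pair):
    """P-value for over-representation of a motif pair in cell-type-specific
    regions. All arguments are sets of region identifiers;
    `specific_regions` is assumed to be a subset of `all_regions`."""
    N = len(all_regions)
    K = len(regions_with_pair & all_regions)
    n = len(specific_regions)
    k = len(regions_with_pair & specific_regions)
    return hypergeom_sf(k, N, K, n)


all_regions = set(range(10))
specific_regions = {0, 1, 2, 3, 4}
regions_with_pair = {0, 1, 2, 3, 4}
print(pair_enrichment(all_regions, specific_regions, regions_with_pair))
```

When many motif pairs, spacings and cell types are tested, the resulting p-values would additionally need a correction for multiple testing.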