UREC - Unrooted REConciliation

(c) Copyright 2005-2006 by P.Gorecki

General description
Urec is software for reconciling unrooted gene trees with rooted species trees, and for inferring species phylogenies from sets of unrooted gene family trees, based on the results presented in [1,2].
A description of the software is included in the packages below.

Downloads
Unix v1.02 sources (run make to compile): urec102src.tgz
Windows executables v1.01 (console application; mingw compilation): urec101mingw.zip (v1.02 coming soon)

Old versions

Changes
  • v1.02 - improvements in the random species/gene tree generator: -u and -E options

The result of experiments
General info
The sequences, grouped into families, were taken from http://cbi.labri.fr/Genolevures/. The unrooted gene trees were computed with ClustalW. Species:
  • c - C. glabrata
  • k - K. lactis
  • d - D. hansenii
  • y - Y. lipolytica
  • s - S. cerevisiae
The names of the gene families (taken from Genolevures) and their unrooted gene trees: fam2gtrees.txt.
Input files
The sets of gene and species trees:
  • strees.txt - the set of all 105 rooted species trees with leaves labeled c, k, d, y and s.
  • gtrees.txt - the set of 4807 unrooted gene family trees computed by ClustalW.
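The figure of 105 species trees matches the number of rooted binary trees on five labeled leaves, (2n-3)!! = 1*3*5*7 = 105 for n = 5. A minimal shell sketch of that count follows; the function name count_rooted_trees is introduced here for illustration and is not part of Urec.

```shell
# Number of rooted binary trees on n labeled leaves:
# (2n-3)!! = 1 * 3 * 5 * ... * (2n-3)
# count_rooted_trees is a hypothetical helper, not a Urec command.
count_rooted_trees() {
    n=$1
    result=1
    k=3
    while [ "$k" -le $((2 * n - 3)) ]; do
        result=$((result * k))
        k=$((k + 2))
    done
    echo "$result"
}

# Five species: c, k, d, y, s
count_rooted_trees 5
```

If strees.txt holds one tree per line (an assumption about the file layout), wc -l < strees.txt should print the same number.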
The first experiment (by cost)
The summary of costs can be computed by:
    urec -b -S strees.txt -G gtrees.txt -cC
or, in sorted form (on Unix, using sort):
    urec -b -S strees.txt -G gtrees.txt -cC | sort -n -k2
To compute the distributions of gene duplications and losses, add the "-d" option:
    urec -b -S strees.txt -G gtrees.txt -cCd
To obtain the distributions as attributes in nested parenthesis notation:
    urec -b -S strees.txt -G gtrees.txt -cCx
Results:
  • result.txt - a file containing the 105 rooted species tree topologies sorted by mutation cost (the first is the optimal species tree); each line contains:
    • number of the species tree
    • topology in nested parenthesis notation
    • mutation cost (dups+losses)
    • total number of duplications
    • total number of gene losses
  • extended_result.txt - an extended version of the previous file; each line contains:
    • mutation cost
    • a pair consisting of the number of duplications and the number of gene losses
    • a topology with attributes associated with each node; the attributes define the distribution of gene duplications and gene losses in the species tree
  • extended_result.pdf - a visualization of extended_result.txt
  • details - a directory containing 4708 ps-files (grouped into subdirectories), one per gene family; each file contains an optimal rooted gene tree and its embedding into the optimal species tree; note that only one optimal rooted gene tree is shown; the others can be obtained by changing the topology of the hat; see [2] for more details
  • A directory with all files
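As a quick sanity check on result.txt, the mutation cost on each line should equal the sum of the last two columns (duplications plus losses). The sketch below assumes whitespace-separated fields in the order listed above and topologies written without spaces; check_costs is a hypothetical helper, not a Urec command.

```shell
# check_costs FILE: report lines where cost != duplications + losses.
# Assumed field order (an assumption about the file layout):
#   number  topology  cost  duplications  losses
# Usage: check_costs result.txt
check_costs() {
    awk 'BEGIN { bad = 0 }
         NF >= 5 && $3 != $4 + $5 {
             printf "line %d: cost %s != %s + %s\n", NR, $3, $4, $5
             bad = 1
         }
         END { exit bad }' "$1"
}
```

The function prints every inconsistent line and returns a non-zero status if it finds one, so it can be chained, for example: check_costs result.txt && echo "costs are consistent".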
The second experiment (by voting)
Computed by:
    urec -v -S strees.txt -G gtrees.txt
or, in sorted form (on Unix, using sort):
    urec -v -S strees.txt -G gtrees.txt -cC | sort -r -n -k2
Results (the points are not normalized): resultsvoting.txt
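Since the points in resultsvoting.txt are not normalized, one way to compare them is to express each score as a share of the total. The sketch below assumes the score sits in the second whitespace-separated field (as the sort -k2 call above suggests); normalize_points is a hypothetical helper, and dividing by the total is only one possible normalization.

```shell
# normalize_points FILE [COL]: append to each line its share of the
# column total. COL defaults to 2 (an assumption about the layout).
# Usage: normalize_points resultsvoting.txt
normalize_points() {
    awk -v col="${2:-2}" '
        { line[NR] = $0; val[NR] = $col; total += $col }
        END {
            for (i = 1; i <= NR; i++)
                printf "%s %.4f\n", line[i], (total ? val[i] / total : 0)
        }' "$1"
}
```

The original line is kept and the normalized value is added as an extra last field, so the output can still be sorted with the same sort call.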

Bibliography
[1] Pawel Gorecki and Jerzy Tiuryn, URec: a system for unrooted reconciliation, Bioinformatics 2007 23(4):511-512
[2] Pawel Gorecki and Jerzy Tiuryn, Inferring phylogeny from whole genomes, Bioinformatics 2007 23: e116-e122 (Proc. of ECCB 2006)

Last update: 20.2.2007