Protein family trees
Protein family trees should be provided in an enriched recursive format similar to Newick, except that siblings from the
same species are grouped in square brackets and the species name follows the closing bracket. This extra information is needed
to identify the species each node comes from. Each line in the file should describe one protein family tree
as follows:
([nodeA1, nodeA2, ...]speciesA, [nodeB1, nodeB2, ...]speciesB)familyName;
Where each node has the following recursive format:
node := proteinName if node is a leaf, or:
node := ([nodeC1, nodeC2, ...]speciesC, [nodeD1, nodeD2, ...]speciesD)nodeName
Example:
([([ce_ce13562]ce,[dm_dm178]dm)bi_i1318]Bilateria,[sc_sc5512,sc_sc1833,sc_sc3895,sc_sc4997]sc)root_222;
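The grammar above can be read with a small recursive-descent parser. The following Python code is a minimal sketch, not a reference implementation: the names Node and parse_family_tree are hypothetical, and it assumes that names contain no whitespace and none of the characters ( ) [ ] , ; and that whitespace between tokens can be ignored.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    # Species of the enclosing [...] group; stays None for the family root.
    species: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def parse_family_tree(line: str) -> Node:
    """Parse one line of the bracketed tree format into a Node tree."""
    text = line.strip()
    pos = 0

    def peek() -> str:
        # Skip whitespace, then return the next character ("" at end of input).
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1
        return text[pos] if pos < len(text) else ""

    def expect(ch: str) -> None:
        nonlocal pos
        if peek() != ch:
            raise ValueError(f"expected {ch!r} at offset {pos}")
        pos += 1

    def read_name() -> str:
        nonlocal pos
        peek()  # skips leading whitespace
        start = pos
        while pos < len(text) and text[pos] not in "()[],;" and not text[pos].isspace():
            pos += 1
        if start == pos:
            raise ValueError(f"expected a name at offset {pos}")
        return text[start:pos]

    def parse_group() -> List[Node]:
        # group := '[' node (',' node)* ']' speciesName
        expect("[")
        members = [parse_node()]
        while peek() == ",":
            expect(",")
            members.append(parse_node())
        expect("]")
        species = read_name()
        for member in members:
            member.species = species
        return members

    def parse_node() -> Node:
        # node := proteinName | '(' group (',' group)* ')' nodeName
        if peek() != "(":
            return Node(read_name())
        expect("(")
        children = parse_group()
        while peek() == ",":
            expect(",")
            children.extend(parse_group())
        expect(")")
        return Node(read_name(), children=children)

    root = parse_node()
    expect(";")
    return root


example = ("([([ce_ce13562]ce,[dm_dm178]dm)bi_i1318]Bilateria,"
           "[sc_sc5512,sc_sc1833,sc_sc3895,sc_sc4997]sc)root_222;")
root = parse_family_tree(example)
print(root.name)                                    # root_222
print([(c.name, c.species) for c in root.children])
```

In this sketch the species label of each bracketed group is stored on the nodes inside it, so the internal node bi_i1318 carries the species Bilateria while its two leaves carry ce and dm. Flattening the groups into a single list of children is a design choice made for brevity; a group-preserving structure would work equally well.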