TACO: Transcription factor Association from Complex Overrepresentation

Citations

Jankowski A., Prabhakar S., Tiuryn J.: TACO: a general-purpose tool for predicting cell-type–specific transcription factor dimers. BMC Genomics 2014, 15:208.

Jankowski A., Szczurek E., Jauch R., Tiuryn J., Prabhakar S.: Comprehensive prediction in 78 human cell lines reveals rigidity and compactness of transcription factor dimers. Genome Res. 2013, 23:1307-18.

Contact

If you have any questions or comments, please contact Aleksander Jankowski.

TACO is actively developed; for the latest version, see github.com/ajank/taco.

Overview

TACO, or Transcription factor Association from Complex Overrepresentation, is a program to predict overrepresented motif complexes in any genome-wide set of regulatory regions.

Release package

The latest packaged version of TACO is 1.0. The release package, TACO-1.0.tar.gz, contains both the source code and example specifications. It is licensed under the GNU General Public License. For the latest development version, see github.com/ajank/taco.

Installation instructions

TACO is written in C++ and should run on any Unix-like operating system, such as Linux or Mac OS X. To compile it, run make. After a successful compilation, the executable file src/taco can be copied to a system-wide directory, such as /usr/local/bin.

TACO makes use of R library functions. If R is not installed, or was not built as a library, install the standalone R math library instead; it is found in the package libRmath-devel or r-mathlib, depending on the system distribution. If you encounter any problems with the compilation, please contact the author.

Example specifications

In the release package, a few example specification files are provided. To repeat the analyses, you will need:

the human reference genome (hg19) in FASTA format
a motif database – use either TRANSFAC (commercial), JASPAR or SwissRegulon
a list of input datasets in narrowPeak or BED format
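
Both accepted input formats start with the same three columns (chromosome, 0-based start, end), so a downloaded dataset can be sanity-checked before it is passed to TACO. The following Python sketch is one way to do that. The names Peak and read_peaks are hypothetical and not part of TACO; the sketch assumes the standard BED/narrowPeak column layout and uncompressed text input.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Peak(NamedTuple):
    """First three columns shared by BED and narrowPeak records."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def read_peaks(handle: TextIO) -> Iterator[Peak]:
    """Yield regions from a BED or narrowPeak text stream.

    Blank lines, comments and 'track'/'browser' header lines are skipped.
    A ValueError is raised for records that lack the three mandatory
    columns or whose coordinates are inconsistent.
    """
    for line_no, line in enumerate(handle, start=1):
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split(None, 3)  # only the first three columns matter
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: expected at least 3 columns")
        start, end = int(fields[1]), int(fields[2])
        if start < 0 or end <= start:
            raise ValueError(f"line {line_no}: invalid interval {start}-{end}")
        yield Peak(fields[0], start, end)


if __name__ == "__main__":
    # Made-up records, for illustration only.
    sample = io.StringIO(
        "chr1\t100\t200\tpeak1\t500\t.\t10.0\t-1\t-1\t50\n"
        "chr2\t5\t15\n"
    )
    for peak in read_peaks(sample):
        print(peak.chrom, peak.start, peak.end)
```

ENCODE narrowPeak files are often distributed gzip-compressed; in that case, open them with gzip.open(path, "rt") before passing the handle to read_peaks.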

We provide example lists of UW and Duke open chromatin datasets, as well as the respective URLs of the narrowPeak files to be downloaded from the ENCODE Project. To download these files, go to the wgEncodeUwDnase_hg19 or wgEncodeOpenChromDnase_hg19 subdirectory and run wget -i urls.list.
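
After the download, it can be useful to confirm that every file listed in urls.list has actually arrived. The Python sketch below does this under the assumption that wget used its default behaviour of naming each file after the last component of the URL path. The function names expected_filenames and missing_downloads are hypothetical and not part of TACO.

```python
from pathlib import Path, PurePosixPath
from typing import Iterable, List
from urllib.parse import urlparse


def expected_filenames(url_lines: Iterable[str]) -> List[str]:
    """Return the local file name expected for each non-empty URL line."""
    names = []
    for line in url_lines:
        url = line.strip()
        if not url:
            continue
        # Assumption: the file is saved under the last URL path component.
        names.append(PurePosixPath(urlparse(url).path).name)
    return names


def missing_downloads(urls_file: str, directory: str = ".") -> List[str]:
    """List the files named in urls_file that are absent from directory."""
    with open(urls_file) as handle:
        names = expected_filenames(handle)
    return [name for name in names if not (Path(directory) / name).exists()]


if __name__ == "__main__":
    # Run from within the subdirectory that contains urls.list.
    for name in missing_downloads("urls.list"):
        print("missing:", name)
```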

We also provide an example list of K562 ChIP-seq peaks, and the respective URLs, in a similar manner. To repeat this analysis, for each dataset you will also need the top 5 motifs found in the ChIP-seq peaks using MEME. They can be downloaded from Factorbook or generated locally.

Example results

Using a set of UW open chromatin datasets in mouse, we generated a comprehensive list of 186 cooperativity predictions; see the graphical view (44 MB PDF file) and the tabular description of the underlying motif complexes. For a comprehensive list of 603 cooperativity predictions in human, please refer to our previous work.

Documentation

The full documentation of TACO is provided on a separate page.