External (user-supplied) motifs
MMF allows users to supply their own motifs as input. These external motifs can be of two kinds, which are considered at different stages of the computation:
- Because MMF can directly support only a limited number of tools, the user is welcome to submit the results of another de novo search algorithm, which MMF then considers in addition to the results of its default tools. MMF accepts several formats for user-specified motifs.
- A user may also choose to use their own reference database of known motifs instead of the JASPAR database provided by MMF. The main difference is that the threshold for dividing motifs into clusters may be determined from the minimal distance between motifs in the reference database, so the choice of database affects it. The format of the database is also different, because the motifs need meaningful names.
Motifs from the JASPAR database
JASPAR is a collection of transcription factor DNA-binding preferences, modeled as matrices. It consists of the following sub-databases:
- JASPAR_CORE (default) - contains a curated, non-redundant set of 123 profiles, derived from published collections of experimentally defined transcription factor binding sites for multicellular eukaryotes
- JASPAR_PHYLOFACTS - consists of 174 profiles that were extracted from phylogenetically conserved gene upstream elements. It is a mix of known and as yet undefined motifs
- JASPAR_FAM - consists of models describing shared binding properties of structural classes of transcription factors
- JASPAR_POLII - consists of models describing patterns found in RNA Polymerase II (Pol II) promoters
- JASPAR_CNE - a collection of 233 matrix profiles derived by clustering of overrepresented motifs from human conserved non-coding elements
- JASPAR_SPLICE - contains matrix profiles of human canonical and non-canonical splice sites, as matching donor:acceptor pairs
More detailed information about these sub-databases can be found at the JASPAR project website.
Reference: Sandelin A, Alkema W, Engstrom P, Wasserman WW, Lenhard B. JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D91-4.
Website: http://jaspar.genereg.net/
Accepted formats for user-provided motifs
Format description | Example data describing the motif | Logo of the motif created with the WebLogo tool |
---|---|---|
Position Frequency Matrix (PFM) | 0 5 0 1<br>0 2 0 4<br>5 1 0 0<br>1 4 1 0<br>6 0 0 0 | |
Matrix format used in the Transfac database | 01 0 3 1 1 C<br>02 0 0 1 4 T<br>03 0 4 0 1 C<br>04 0 0 0 5 T<br>05 5 0 0 0 A<br>06 5 0 0 0 A | |
Matrix format from Transfac with a header line and arbitrary data in the first and last columns | PO A C G T<br>XXX1 1 3 4 1 XXX6<br>XXX2 0 0 1 8 XXX7<br>XXX3 1 7 0 1 XXX8<br>XXX4 3 0 0 6 XXX9<br>XXX5 9 0 0 0 XXX10 | |
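The two Transfac-style layouts can be read with the same routine. The following is a minimal sketch, not MMF's actual parser: the function name `parse_transfac_matrix` is hypothetical, and it assumes that the four count columns are always in the order A, C, G, T (as the `PO A C G T` header suggests) and that any field after the fourth count can be ignored.

```python
def parse_transfac_matrix(text):
    """Parse Transfac-style rows into a list of per-position count dicts.

    Assumed row layout: <label> <A> <C> <G> <T> [<trailing field>].
    An optional header row starting with "PO" is skipped.
    """
    matrix = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "PO":
            continue  # skip blank lines and the optional header
        if len(fields) < 5:
            raise ValueError(f"expected at least 5 fields, got: {line!r}")
        # Fields 1..4 are the counts; field 0 and any trailing field are ignored.
        a, c, g, t = (float(x) for x in fields[1:5])
        matrix.append({"A": a, "C": c, "G": g, "T": t})
    return matrix


sample = """\
01 0 3 1 1 C
02 0 0 1 4 T
03 0 4 0 1 C
04 0 0 0 5 T
05 5 0 0 0 A
06 5 0 0 0 A
"""

matrix = parse_transfac_matrix(sample)
# Most frequent nucleotide at each position.
consensus = "".join(max(column, key=column.get) for column in matrix)
print(consensus)
```

Because the first and last fields are discarded, the same function also handles the variant with a header line and arbitrary data in the outer columns.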
Format of the database provided by the user
The external motif database must use the same format as the MATRIX_DATA.txt file in the JASPAR database. Each line must consist of four fields: <motif's identifier> <symbol> <position> <frequency>. A good example can be found here.
A short example of a database containing two motifs:
    MA0001 A 1 0.0000
    MA0001 A 2 3.0000
    MA0001 A 3 79.0000
    MA0001 A 4 40.0000
    MA0001 C 1 94.0000
    MA0001 C 2 75.0000
    MA0001 C 3 4.0000
    MA0001 C 4 3.0000
    MA0001 G 1 1.0000
    MA0001 G 2 0.0000
    MA0001 G 3 3.0000
    MA0001 G 4 4.0000
    MA0001 T 1 2.0000
    MA0001 T 2 19.0000
    MA0001 T 3 11.0000
    MA0001 T 4 50.0000
    MA0052 T 5 49.0000
    MA0052 T 4 55.0000
    MA0052 T 3 0.0000
    MA0052 T 2 58.0000
    MA0052 T 1 7.0000
    MA0052 G 5 0.0000
    MA0052 G 4 0.0000
    MA0052 G 3 0.0000
    MA0052 G 2 0.0000
    MA0052 G 1 0.0000
    MA0052 C 5 0.0000
    MA0052 C 4 1.0000
    MA0052 C 3 1.0000
    MA0052 C 2 0.0000
    MA0052 C 1 50.0000
    MA0052 A 5 9.0000
    MA0052 A 4 2.0000
    MA0052 A 3 57.0000
    MA0052 A 2 0.0000
    MA0052 A 1 1.0000
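A database in this four-field format can be checked and regrouped into one matrix per motif before it is passed to MMF. The sketch below is illustrative only and is not part of MMF: the names `parse_motif_database` and `to_rows` are hypothetical, and it assumes whitespace-separated fields, symbols limited to A, C, G and T, and records that may appear in any order (as in the MA0052 entries above).

```python
from collections import defaultdict


def parse_motif_database(text):
    """Return {motif_id: {position: {symbol: frequency}}} from four-field lines."""
    motifs = defaultdict(lambda: defaultdict(dict))
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 fields, got {len(fields)}")
        motif_id, symbol, position, frequency = fields
        if symbol not in ("A", "C", "G", "T"):
            raise ValueError(f"line {line_no}: unexpected symbol {symbol!r}")
        motifs[motif_id][int(position)][symbol] = float(frequency)
    # Convert nested defaultdicts to plain dicts for the caller.
    return {motif_id: dict(positions) for motif_id, positions in motifs.items()}


def to_rows(positions):
    """Return one [A, C, G, T] row per position, in ascending position order."""
    return [[positions[p].get(s, 0.0) for s in "ACGT"] for p in sorted(positions)]


# First two positions of motif MA0001 from the example above.
sample = """\
MA0001 A 1 0.0000
MA0001 A 2 3.0000
MA0001 C 1 94.0000
MA0001 C 2 75.0000
MA0001 G 1 1.0000
MA0001 G 2 0.0000
MA0001 T 1 2.0000
MA0001 T 2 19.0000
"""

database = parse_motif_database(sample)
rows = to_rows(database["MA0001"])
print(rows)
```

Keying each motif by position rather than relying on line order means that a file listing positions in descending order is handled the same way as one listing them in ascending order.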