External motifs
MMF can cluster external motifs together with those found by motif-discovery programs. The main source of such motifs is the JASPAR database, but custom motifs (in one of the formats described below) or an external database can also be added.
Motifs from the Jaspar database
JASPAR is a collection of transcription factor DNA-binding preferences, modeled as matrices. It consists of the following sub-databases:
- JASPAR_CORE (default) - contains a curated, non-redundant set of 123 profiles, derived from published collections of experimentally defined transcription factor binding sites for multicellular eukaryotes
- JASPAR_PHYLOFACTS - consists of 174 profiles that were extracted from phylogenetically conserved gene upstream elements. It is a mix of known and as of yet undefined motifs
- JASPAR_FAM - consists of models describing shared binding properties of structural classes of transcription factors
- JASPAR_POLII - consists of models describing patterns found in RNA Polymerase II (Pol II) promoters
- JASPAR_CNE - a collection of 233 matrix profiles derived by clustering of overrepresented motifs from human conserved non-coding elements
- JASPAR_SPLICE - contains matrix profiles of human canonical and non-canonical splice sites, as matching donor:acceptor pairs
More detailed information about these sub-databases can be found on the JASPAR project website (link below).
Reference: Sandelin A, Alkema W, Engstrom P, Wasserman WW, Lenhard B. JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D91-4.
Website: http://jaspar.genereg.net/
Acceptable formats of motifs provided by the user
Format description | Example data describing the motif | Logo of the motif created with the WebLogo tool |
---|---|---|
Position Frequency Matrix (PFM) | 0 5 0 1<br>0 2 0 4<br>5 1 0 0<br>1 4 1 0<br>6 0 0 0 | |
Matrix format used in the Transfac database | 01 0 3 1 1 C<br>02 0 0 1 4 T<br>03 0 4 0 1 C<br>04 0 0 0 5 T<br>05 5 0 0 0 A<br>06 5 0 0 0 A | |
Matrix format from Transfac with a header line and arbitrary data in the first and last columns | PO A C G T<br>XXX1 1 3 4 1 XXX6<br>XXX2 0 0 1 8 XXX7<br>XXX3 1 7 0 1 XXX8<br>XXX4 3 0 0 6 XXX9<br>XXX5 9 0 0 0 XXX10 | |
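The three layouts above differ only in what surrounds the four counts for each position, so a single reader can cover all of them. The following Python sketch is illustrative and is not part of MMF: the function names `parse_count_rows` and `consensus` are hypothetical. It assumes one motif position per line and counts given in the order A, C, G, T, as the `PO A C G T` header and the consensus letters in the Transfac example suggest.

```python
def parse_count_rows(text):
    """Return one [A, C, G, T] count row per motif position.

    Assumption: each non-empty line is one position. Lines with four
    fields are plain PFM rows; lines with six fields carry a label in
    the first column and extra data in the last column (Transfac style).
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "PO":
            continue  # skip blank lines and the optional header line
        if len(fields) == 4:
            counts = fields
        elif len(fields) == 6:
            counts = fields[1:5]  # drop the first and last columns
        else:
            raise ValueError(f"unexpected number of fields: {line!r}")
        rows.append([float(value) for value in counts])
    return rows


def consensus(rows):
    """Pick the most frequent base at each position (first base wins ties)."""
    return "".join("ACGT"[row.index(max(row))] for row in rows)


TRANSFAC_EXAMPLE = """\
01 0 3 1 1 C
02 0 0 1 4 T
03 0 4 0 1 C
04 0 0 0 5 T
05 5 0 0 0 A
06 5 0 0 0 A
"""

HEADED_EXAMPLE = """\
PO A C G T
XXX1 1 3 4 1 XXX6
XXX2 0 0 1 8 XXX7
XXX3 1 7 0 1 XXX8
XXX4 3 0 0 6 XXX9
XXX5 9 0 0 0 XXX10
"""

if __name__ == "__main__":
    print(consensus(parse_count_rows(TRANSFAC_EXAMPLE)))
    print(consensus(parse_count_rows(HEADED_EXAMPLE)))
```

For the Transfac example, the consensus computed from the counts agrees with the letters given in its last column, which is a quick sanity check on the column order.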
Format of the database provided by the user
The external motif database must use the same format as the MATRIX_DATA.txt file in the JASPAR database. Each line must consist of four fields: <motif's identifier> <symbol> <position> <frequency>. A good example can be found here.
A short example of a database containing two motifs:
MA0001 A 1 0.0000
MA0001 A 2 3.0000
MA0001 A 3 79.0000
MA0001 A 4 40.0000
MA0001 C 1 94.0000
MA0001 C 2 75.0000
MA0001 C 3 4.0000
MA0001 C 4 3.0000
MA0001 G 1 1.0000
MA0001 G 2 0.0000
MA0001 G 3 3.0000
MA0001 G 4 4.0000
MA0001 T 1 2.0000
MA0001 T 2 19.0000
MA0001 T 3 11.0000
MA0001 T 4 50.0000
MA0052 T 5 49.0000
MA0052 T 4 55.0000
MA0052 T 3 0.0000
MA0052 T 2 58.0000
MA0052 T 1 7.0000
MA0052 G 5 0.0000
MA0052 G 4 0.0000
MA0052 G 3 0.0000
MA0052 G 2 0.0000
MA0052 G 1 0.0000
MA0052 C 5 0.0000
MA0052 C 4 1.0000
MA0052 C 3 1.0000
MA0052 C 2 0.0000
MA0052 C 1 50.0000
MA0052 A 5 9.0000
MA0052 A 4 2.0000
MA0052 A 3 57.0000
MA0052 A 2 0.0000
MA0052 A 1 1.0000
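Because every record carries its own motif identifier, symbol and position, the records can appear in any order (the second motif in the example is listed from the last position to the first). As a rough illustration of how such a file could be read, here is a minimal Python sketch. It is not MMF's own loader: the function name `parse_motif_database` is hypothetical, and it assumes whitespace-separated fields, one record per line, and positions numbered from 1.

```python
from collections import defaultdict


def parse_motif_database(text):
    """Map each motif identifier to a list of [A, C, G, T] rows.

    Assumptions: four whitespace-separated fields per record
    (identifier, symbol, position, frequency), positions starting at 1,
    and a frequency of 0.0 for any cell that is not listed.
    """
    cells = defaultdict(dict)
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 fields, got {len(fields)}")
        motif_id, symbol, position, frequency = fields
        if symbol not in "ACGT":
            raise ValueError(f"line {line_no}: unknown symbol {symbol!r}")
        cells[motif_id][(int(position), symbol)] = float(frequency)

    motifs = {}
    for motif_id, entries in cells.items():
        length = max(position for position, _ in entries)
        motifs[motif_id] = [
            [entries.get((position, symbol), 0.0) for symbol in "ACGT"]
            for position in range(1, length + 1)
        ]
    return motifs


# First two positions of MA0052, in the order used in the example above.
SAMPLE = """\
MA0052 T 2 58.0000
MA0052 T 1 7.0000
MA0052 G 2 0.0000
MA0052 G 1 0.0000
MA0052 C 2 0.0000
MA0052 C 1 50.0000
MA0052 A 2 0.0000
MA0052 A 1 1.0000
"""

if __name__ == "__main__":
    for motif_id, rows in parse_motif_database(SAMPLE).items():
        print(motif_id, rows)
```

Keying the intermediate dictionary by (position, symbol) means the input order does not matter, at the cost of holding the whole file in memory, which should be acceptable for databases of this size.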