Help
This page serves as a help reference for the options available in the web input form. Every option has a short description, which can be accessed by clicking on the question mark next to it in the form.
Input data
Input - the input file must contain sequences in FASTA format. If the MEME program is used, an additional limitation applies: each sequence name must be at most 24 characters long (the sequence name is everything in the title line following the ">" up to the first blank).
An example of a FASTA file:
>SEQ1; M: AACTaGAGTT at 12
ttagaatggttAACTaGAGTTccgtcaggccattgataccgcacagttggtaacactcac
ctatatggaaggtaatgtagtggagcgcgtggttgcgtag
>SEQ2; M: AACTTGAGTg at 18
aatgcaaatctgtccttAACTTGAGTgcacacactatgtctcgtaaccatgacggtgaag
gtacggaacatgcgggccacagtttcgcgggtcttgggtt
>SEQ3; M: AACTTGAtTT at 53
actccatgtactgggcttaacaagcccatgctgacgcagaagtttggcatggAACTTGAt
TTtcatgcttttataccgccgttaacttcatctatcctca
>SEQ4; M: AACTTGcGTT at 28
ggcgtatacacacacgactcagtaaagAACTTGcGTTgctggtcgctccctgagcaggag
ggtacgtagtgcgtaaacgtagtcatagttggctaaccct
>SEQ5; M: AACTTGAaTT at 53
agagtgggcacgcgggaaacggtgaaaaaagtccagactcaagcggttggttAACTTGAa
TTccacccgagtcgtacacgtatggaagtcgagttctagt
>SEQ6; M: AAtTTGAGTT at 63
taacgcagcttgcataatacatgacacgatttggccttcgtcacgggctacgtctattgt
ccAAtTTGAGTTaagtgccgagttcaatagaaccgtccga
>SEQ7; M: AACTTGAcTT at 1
AACTTGAcTTaacggtttacagcgctgtcgcaccgtcaagcgaccctgctgtcctggata
aatggtgccgacatccagattcggtggagtactccttccg
>SEQ8; M: AACTTGAGTc at 63
ctaggggggctgctactacgattaatgagggccacgcggcagaccggcatgcagtggacc
gaAACTTGAGTcgcccacgtgcccctactttgtccgtgga
>SEQ9; M: AACTTGAGgT at 11
ttgataatggAACTTGAGgTtctaaaatgagtcctgagtcgactcgaaacagattacggt
cggagaaccccattaggttgtacaggcgatagaatggaaa
>SEQ10; M: AACTcGAGTT at 71
tttaatgtgtattctattgtaattaggtgtcacttaggctacgcccacatttgatgaagc
cagtaattcgAACTcGAGTTgcgtggctgcttacgactcg
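The MEME name limit can be checked locally before a file is uploaded. The following is a minimal sketch, not part of the web service: the helper names `fasta_names` and `too_long_names` and the constant `MEME_MAX_NAME` are hypothetical, and the only rule it encodes is the one stated above (the name is everything after ">" up to the first blank, at most 24 characters).

```python
import io

MEME_MAX_NAME = 24  # limit stated in this help page for the MEME program


def fasta_names(handle):
    """Yield each sequence name: the text after '>' up to the first blank."""
    for line in handle:
        if line.startswith(">"):
            parts = line[1:].split()
            yield parts[0] if parts else ""


def too_long_names(handle, limit=MEME_MAX_NAME):
    """Return the names that exceed the given length limit."""
    return [name for name in fasta_names(handle) if len(name) > limit]


# Illustrative input; the second record is made up to trigger the check.
example = io.StringIO(
    ">SEQ1; M: AACTaGAGTT at 12\n"
    "ttagaatggttAACTaGAGTTccgtcaggccattgataccgcacagttggtaacactcac\n"
    ">a_sequence_name_longer_than_24_chars some description\n"
    "acgtacgt\n"
)
print(too_long_names(example))
```

Note that only the part of the title line before the first blank counts, so a long description after the name does not violate the limit.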
Organism - the organism that the sequences come from. A different DNA background model is used for each of the listed species.
Motif Prediction
Programs - Select which programs you want to use for motif finding. Details...
Search on both strands - whether to search for motif occurrences only on the given DNA strand or on its reverse complement as well.
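To illustrate what searching the reverse complement means, here is a small sketch using exact string matching. It is only an illustration: the motif discovery programs use their own probabilistic models, and the function names `reverse_complement` and `find_on_strands` are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_on_strands(seq, motif, both_strands=True):
    """Return (strand, 0-based start) pairs for exact, case-insensitive hits."""
    seq, motif = seq.upper(), motif.upper()
    patterns = [("+", motif)]
    rc = reverse_complement(motif)
    # A palindromic motif equals its reverse complement; skip the duplicate.
    if both_strands and rc != motif:
        patterns.append(("-", rc))
    hits = []
    for strand, pattern in patterns:
        start = seq.find(pattern)
        while start != -1:
            hits.append((strand, start))
            start = seq.find(pattern, start + 1)
    return hits


print(reverse_complement("AACTTGAGTT"))
print(find_on_strands("ccAACTCAAGTTgg", "AACTTGAGTT"))
print(find_on_strands("ccAACTCAAGTTgg", "AACTTGAGTT", both_strands=False))
```

In this example the motif occurs only on the reverse strand, so it is found only when both strands are searched.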
Motif length - expected motif length; this value is passed to the motif discovery programs. Note: the returned motifs may not be of exactly the given length
Number of results for each program - the maximum number of motifs returned by each motif discovery program. This value should be kept relatively low (1-15), as otherwise the clustering process (especially the pairwise comparison of all motifs) may be very time-consuming.
External motif predictions - (optional) a file with user-supplied motifs. These motifs are clustered together with the predicted ones and (optionally) with motifs from the reference database. Acceptable file formats are described here
Reference Motif Database
Reference database - select a variant of the JASPAR database to be used. details...
User supplied database - (optional) instead of using JASPAR as the reference DB, you may prefer to use your own database. It must be in the format described here.
Note: if you specify any database in this field, JASPAR will not be used.
Motif comparison
Distribution comparison function - Select one of the supplied metrics for comparing probability distributions defined by motifs.
Comparison type - there are two ways of obtaining the distributions to be compared from motifs. The first approach, called columns comparison, takes the columns of each motif's PSPM (choose Motif from the select element). The second approach (choose Sequence) checks how well each motif fits at each position of the input sequences and is described here
Motif filtering threshold - the filtering phase takes place right after the new motifs are discovered. The value specified in this field determines which motifs returned by the same MDP (Motif Discovery Program) are treated as the same motif: if the distance between two such motifs is less than this value, one of them is removed. A value of 0 skips the filtering phase.
Note: this value should be adjusted to the chosen comparison type and function. Generally, the recommended values are 0.01 - 0.2.
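The filtering rule can be sketched as follows. This is an illustration under stated assumptions, not the service's actual code: motifs are equal-length PSPMs given as lists of columns, the distance is a mean Euclidean column distance chosen only for the example, and when two motifs are closer than the threshold the one seen earlier is kept (the help text does not say which one is removed). All function names are hypothetical.

```python
import math


def column_distance(p, q):
    """Euclidean distance between two probability columns (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def motif_distance(m1, m2):
    """Mean column distance; assumes both motifs have the same length."""
    return sum(column_distance(p, q) for p, q in zip(m1, m2)) / len(m1)


def filter_motifs(motifs, threshold, distance=motif_distance):
    """Drop motifs closer than `threshold` to an already kept motif."""
    if threshold == 0:  # a value of 0 skips the filtering phase
        return list(motifs)
    kept = []
    for motif in motifs:
        if all(distance(motif, k) >= threshold for k in kept):
            kept.append(motif)
    return kept


a = [[0.70, 0.10, 0.10, 0.10], [0.10, 0.70, 0.10, 0.10]]
b = [[0.69, 0.11, 0.10, 0.10], [0.10, 0.70, 0.10, 0.10]]  # nearly identical to a
c = [[0.10, 0.10, 0.10, 0.70], [0.10, 0.10, 0.70, 0.10]]
print(len(filter_motifs([a, b, c], 0.05)))
print(len(filter_motifs([a, b, c], 0)))
```

Because the scale of the distances depends on the metric, a threshold that works for one comparison function may be too strict or too loose for another.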
Motif clustering
Clustering threshold - this value tells the program when to stop the hierarchical clustering process. The distance between two clusters is defined as the average distance between the motifs in these clusters, computed with the specified metric. Once the smallest distance between two clusters of motifs is greater than a value x, no more clusters are merged. The value x is computed as follows:
- Assume that threshold is the value specified in the motif clustering threshold field
- If use relative threshold is not checked, x = threshold
- If use relative threshold is checked, x = threshold * minRefDB, where minRefDB is the smallest distance between clustered motifs from the reference database
Hence, the relative threshold should be used if we know that each clustered motif from the reference DB should appear in a different cluster.
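The stopping rule and the computation of x can be sketched as below. This is a minimal illustration, assuming a precomputed symmetric distance matrix `dist` and a list `ref_indices` marking which motifs come from the reference database; these names, and the functions `effective_threshold`, `average_linkage` and `cluster`, are hypothetical and do not describe the service's internals.

```python
from itertools import combinations


def effective_threshold(threshold, use_relative, ref_indices, dist):
    """Compute x from the threshold field and the relative-threshold option."""
    if not use_relative:
        return threshold
    # minRefDB: smallest distance between clustered reference motifs
    min_ref_db = min(dist[i][j] for i, j in combinations(ref_indices, 2))
    return threshold * min_ref_db


def average_linkage(c1, c2, dist):
    """Average distance between the members of two clusters."""
    return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))


def cluster(dist, x):
    """Merge the closest clusters until the smallest distance exceeds x."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        d, a, b = min(
            (average_linkage(clusters[a], clusters[b], dist), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        if d > x:
            break
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    return clusters


# Made-up distances; motifs 0 and 1 play the role of reference motifs.
dist = [
    [0.0, 0.8, 0.1, 0.9],
    [0.8, 0.0, 0.7, 0.2],
    [0.1, 0.7, 0.0, 0.75],
    [0.9, 0.2, 0.75, 0.0],
]
x = effective_threshold(0.5, True, [0, 1], dist)
print(x, cluster(dist, x))
```

In this made-up example the two reference motifs end up in separate clusters, each joined by its closest predicted motif.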
Consensus motifs
Column trimming threshold - positions at the edges of the consensus motif are trimmed until a position with an information content of at least the specified value is reached.
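Edge trimming can be sketched as follows. The help text does not state how information content is computed, so this sketch assumes the common definition for DNA columns, log2(4) minus the Shannon entropy in bits, with no small-sample correction. The function names are hypothetical.

```python
import math


def information_content(column):
    """Assumed definition: log2(alphabet size) minus Shannon entropy (bits)."""
    entropy = -sum(p * math.log2(p) for p in column if p > 0)
    return math.log2(len(column)) - entropy


def trim_edges(pspm, threshold):
    """Remove edge columns until one reaches the threshold on each side."""
    start, end = 0, len(pspm)
    while start < end and information_content(pspm[start]) < threshold:
        start += 1
    while end > start and information_content(pspm[end - 1]) < threshold:
        end -= 1
    return pspm[start:end]


pspm = [
    [0.25, 0.25, 0.25, 0.25],  # uninformative edge column
    [0.97, 0.01, 0.01, 0.01],
    [0.25, 0.25, 0.25, 0.25],  # inner column, not on an edge after trimming
    [0.01, 0.01, 0.97, 0.01],
    [0.30, 0.30, 0.20, 0.20],  # weak edge column
]
print(len(trim_edges(pspm, 0.5)))
```

Only edge columns are removed; a low-information column in the middle of the motif is kept.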
Column similarity function for consensus - the chosen function is used to compare the columns of each motif with the consensus motif built so far.
Output
Provide data for weblogo - you may wish to create a graphical representation of the consensus motif using the WebLogo tool. In that case, the size of the alignment representing this consensus can be specified. The more sequences in the alignment, the better the logo (i.e. the closer it is to the actual PSPM of the consensus motif).
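The relation between alignment size and fidelity can be illustrated with a short sketch. How the service actually builds the alignment is not described here; the sketch assumes that each column is sampled independently from the PSPM, and `sample_alignment` and `column_frequencies` are hypothetical names.

```python
import random

ALPHABET = "ACGT"


def sample_alignment(pspm, size, seed=0):
    """Draw `size` sequences, sampling each position from its PSPM column."""
    rng = random.Random(seed)
    return [
        "".join(rng.choices(ALPHABET, weights=col)[0] for col in pspm)
        for _ in range(size)
    ]


def column_frequencies(alignment):
    """Observed base frequencies in each alignment column."""
    n = len(alignment)
    return [
        [col.count(base) / n for base in ALPHABET]
        for col in zip(*alignment)
    ]


pspm = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]]
for size in (10, 1000):
    print(size, column_frequencies(sample_alignment(pspm, size)))
```

With a small alignment the observed frequencies can deviate noticeably from the PSPM; with a larger one they tend to approach it, which is why a larger alignment usually gives a more faithful logo.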