MMF - documentation.

Motif finding

The first step in MMF is launching motif discovery programs. Currently, four of them are available:

BioProspector
MDscan
MEME
Weeder

Each of these programs can be downloaded and used locally.

The user can set the following parameters for all of them at once:

whether to search only the input sequences or their complementary sequences as well
the desired motif length
the maximum number of returned motifs
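
As an illustration only, the three shared parameters above could be held in one settings object that is passed to every program wrapper. The names MotifSearchSettings, reverse_complement and sequences_to_search are hypothetical and are not part of MMF. The sketch also assumes that "complementary sequences" means the reverse-complement strand.

```python
from dataclasses import dataclass


@dataclass
class MotifSearchSettings:
    """Hypothetical container for the parameters shared by all programs."""
    both_strands: bool = True   # also search the complementary strand
    motif_length: int = 8       # desired motif length
    max_motifs: int = 5         # maximum number of returned motifs


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def sequences_to_search(sequences, settings):
    """Return the input sequences, plus their reverse complements if requested."""
    result = list(sequences)
    if settings.both_strands:
        result += [reverse_complement(s) for s in sequences]
    return result
```

Keeping the settings in one place means each program wrapper only has to translate them into its own command-line options.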

At the end of this step, the programs' results are gathered together.
[+ TODO: Inner filtering]

BioProspector uses a Gibbs sampling strategy together with a Markov background model, which captures the dependencies between neighbouring non-motif bases.
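
To make the idea of a Markov background concrete, the sketch below estimates the probability of each base given the preceding bases and uses it to score a sequence. It is not BioProspector's code. The function names, the default order of 1 and the pseudocount are assumptions chosen for illustration, and the input is assumed to contain only the letters A, C, G and T.

```python
import math
from collections import defaultdict

BASES = "ACGT"


def train_markov_background(sequences, order=1, pseudocount=1.0):
    """Estimate P(base | previous `order` bases) from background sequences."""
    counts = defaultdict(lambda: dict.fromkeys(BASES, 0.0))
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1.0
    model = {}
    for context, base_counts in counts.items():
        total = sum(base_counts.values()) + pseudocount * len(BASES)
        model[context] = {b: (c + pseudocount) / total
                          for b, c in base_counts.items()}
    return model


def background_log_prob(seq, model, order=1):
    """Log-probability of seq under the model (the first `order` bases are skipped)."""
    uniform = 1.0 / len(BASES)
    logp = 0.0
    for i in range(order, len(seq)):
        probs = model.get(seq[i - order:i])
        # Contexts never seen in training fall back to a uniform distribution.
        logp += math.log(probs[seq[i]] if probs else uniform)
    return logp
```

A Gibbs sampler can compare this background probability with the probability of the same bases under the current motif model when deciding where to place a motif site.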

Reference: Liu X, Brutlag DL, Liu JS. BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pacific Symposium on Biocomputing 2001;:127-38.

Website: http://robotics.stanford.edu/~xsliu/BioProspector/

License: MIT license

MDscan was designed specifically for ChIP-array experiments, but it can also be used in other experiments where only some of the sequences may contain motif sites. The algorithm combines the advantages of two search strategies: word enumeration and iterative updating of the motif's PSSM (position-specific scoring matrix).

Reference: Liu XS, Brutlag DL, Liu JS. An algorithm for finding protein-DNA binding sites with applications to chromatin immunoprecipitation microarray experiments. Nature Biotechnology 2002 Aug;20(8):835-9.

Website: http://ai.stanford.edu/~xsliu/MDscan/
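
The combination of word enumeration and PSSM updating can be sketched roughly as follows: a PSSM is built from a few seed words, each candidate word is scored against it, and the matrix is rebuilt whenever a candidate is accepted. This is a simplified illustration, not MDscan's actual procedure. The function names, the uniform background, the pseudocount and the acceptance threshold are all assumptions.

```python
import math

BASES = "ACGT"


def build_pssm(words, pseudocount=0.5, background=0.25):
    """Build a log-odds PSSM (one dict per column) from equal-length words."""
    total = len(words) + pseudocount * len(BASES)
    pssm = []
    for j in range(len(words[0])):
        column = [w[j] for w in words]
        pssm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                     for b in BASES})
    return pssm


def score(word, pssm):
    """Sum of the log-odds scores of a word's bases, column by column."""
    return sum(col[base] for col, base in zip(pssm, word))


def refine(seed_words, candidates, threshold=0.0):
    """Add each candidate that scores above the threshold and rebuild the PSSM."""
    accepted = list(seed_words)
    pssm = build_pssm(accepted)
    for word in candidates:
        if score(word, pssm) > threshold:
            accepted.append(word)
            pssm = build_pssm(accepted)
    return pssm, accepted
```

For example, with the seeds TGACTC, TGACTC and TGAGTC, the candidate TGACTA scores well above zero and is added, while GCCGGA scores below zero and is left out.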

License: MIT license

MEME (Multiple EM for Motif Elicitation) uses a statistical method, expectation maximization (EM), to identify highly conserved regions.
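
The EM idea can be illustrated with a much simplified model that assumes exactly one motif occurrence per sequence and a uniform background. This is not MEME's implementation. The function names, the seeding scheme and the pseudocount are assumptions made for the sketch.

```python
BASES = "ACGT"


def seed_pwm(seed, hit=0.7):
    """Initial position weight matrix that favours the bases of a seed word."""
    miss = (1.0 - hit) / 3
    return [{b: (hit if b == s else miss) for b in BASES} for s in seed]


def em_step(sequences, pwm, background=0.25, pseudocount=0.1):
    """One EM iteration: weight every start position, then re-estimate the matrix."""
    width = len(pwm)
    counts = [dict.fromkeys(BASES, pseudocount) for _ in range(width)]
    for seq in sequences:
        # E-step: likelihood ratio (motif vs. background) of each start position.
        ratios = []
        for start in range(len(seq) - width + 1):
            r = 1.0
            for j in range(width):
                r *= pwm[j][seq[start + j]] / background
            ratios.append(r)
        total = sum(ratios)
        # M-step (accumulation): add each window, weighted by its posterior.
        for start, r in enumerate(ratios):
            z = r / total
            for j in range(width):
                counts[j][seq[start + j]] += z
    return [{b: c / sum(col.values()) for b, c in col.items()} for col in counts]


def consensus(pwm):
    """Most probable base in each column."""
    return "".join(max(col, key=col.get) for col in pwm)
```

Repeating em_step a few times from a reasonable seed concentrates the weights on the conserved positions, so that the matrix columns approach the base frequencies of the shared motif.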

References: Timothy L. Bailey, Charles Elkan, Fitting a mixture model by expectation maximization to discover motifs in biopolymers, Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, 1994.

Timothy L. Bailey, Nadya Williams, Chris Misleh, and Wilfred W. Li, MEME: discovering and analyzing DNA and protein sequence motifs, Nucleic Acids Research, Vol. 34, pp. W369-W373, 2006.

Website: http://meme.sdsc.edu/meme/meme.html

License: MEME is copyrighted software and can be licensed for commercial use.

Weeder searches for candidate motifs by scanning a suffix tree built from the input sequences. In addition, the program uses a background model based on precomputed frequencies of all possible 6- and 8-bp subsequences in several of the most important organisms.
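
A background table of this kind can be pictured as the relative frequency of every k-bp subsequence in a set of reference sequences. The sketch below shows one way such a table could be computed. It is not Weeder's code or file format, and the function name kmer_frequencies is hypothetical.

```python
from collections import Counter


def kmer_frequencies(sequences, k):
    """Relative frequency of every k-bp subsequence observed in the sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}
```

Calling kmer_frequencies(genome_sequences, 6) and kmer_frequencies(genome_sequences, 8), where genome_sequences is a placeholder for the reference data, would give tables analogous to the 6- and 8-bp frequencies mentioned above.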

Reference: Giulio Pavesi, Giancarlo Mauri, Graziano Pesole, An algorithm for finding signals of unknown length in DNA sequences, Bioinformatics, Vol. 17, Suppl. 1, June 2001, pp. S207-S214.

Website: http://159.149.109.16:8080/weederWeb/

License: Please see Weeder license.
