The seed concept is central to the theory of sequence alignment, and many efficient homology search tools rely on seeds. Recently, subset seeds have been proposed for similarity search in protein sequences.
We experimentally evaluate the applicability of the subset seed concept to protein homology search. We advocate the use of multiple subset seeds derived from a hierarchical tree of amino acid residues.
Our method computes, by an evolutionary algorithm, seeds that are specific for a given protein family. The representation of seeds by deterministic finite automata (DFAs) is developed and built into the NCBI-BLAST software.
This extended tool, named SeedBLAST, is open-source software freely available for download.
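
To make the idea of a subset seed derived from a hierarchical tree of residues more concrete, the following minimal Python sketch treats each seed symbol as one level of such a tree, that is, a partition of the amino acid alphabet into groups, and declares two residues a match at a position when they fall into the same group. The seed symbols (`#`, `@`, `_`), the residue groupings, and the function names `seed_hits` and `find_hits` are assumptions chosen for illustration only; they are not the tree, the seeds, or the DFA-based implementation used in SeedBLAST.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical seed alphabet: each symbol is one level of an illustrative
# residue hierarchy (a partition of the 20 amino acids). These groupings are
# assumptions for this sketch, not the tree used by SeedBLAST.
SEED_ALPHABET = {
    "#": list(AMINO_ACIDS),                      # leaves: exact match only
    "@": ["ILMV", "FWY", "KR", "DE", "NQ",
          "ST", "AG", "C", "H", "P"],            # intermediate level
    "_": [AMINO_ACIDS],                          # root: any pair matches
}


def build_lookup(groups):
    """Map each residue to the index of the group that contains it."""
    return {res: i for i, group in enumerate(groups) for res in group}


LOOKUP = {sym: build_lookup(groups) for sym, groups in SEED_ALPHABET.items()}


def seed_hits(seed, s, t):
    """Return True if the seed accepts the ungapped alignment of s and t."""
    if not (len(seed) == len(s) == len(t)):
        raise ValueError("seed and both words must have the same length")
    return all(LOOKUP[c][a] == LOOKUP[c][b] for c, a, b in zip(seed, s, t))


def find_hits(seed, query, subject):
    """Naively list all (i, j) offsets where the seed hits query vs. subject."""
    k = len(seed)
    return [
        (i, j)
        for i in range(len(query) - k + 1)
        for j in range(len(subject) - k + 1)
        if seed_hits(seed, query[i:i + k], subject[j:j + k])
    ]
```

For example, `seed_hits("#@_#", "AIKC", "AVDC")` accepts the pair because I and V share a group under `@`, whereas replacing V with F causes a rejection. The quadratic scan in `find_hits` is only for clarity; a practical tool would index seed keys or, as in the abstract, use an automaton.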