Swelfe - Help
SIM algorithm applied to the search for internal repeats in DNA and amino acid sequences and in 3D structures
Structures must be in PDB format. PDB files can be found on the Protein Data Bank site.
You must choose only one structure, and if the structure has several chains, only one chain. You can either upload the PDB file or, more easily, type the PDB code followed by the chain identifier in the corresponding box (for example, 1b7fA).
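To illustrate the combined identifier, here is a minimal Python sketch that splits an entry such as 1b7fA into the 4-character PDB code and the chain. The function name parse_pdb_chain is hypothetical. It assumes a classic PDB code (a digit followed by three alphanumeric characters) and a single-character chain identifier, and it is not Swelfe's own input handling.

```python
import re

# Hypothetical helper: a classic PDB code is a digit followed by three
# alphanumeric characters; the chain ID is assumed to be one character.
_PDB_CHAIN = re.compile(r"^([0-9][A-Za-z0-9]{3})([A-Za-z0-9])$")


def parse_pdb_chain(identifier: str) -> tuple[str, str]:
    """Split an identifier such as '1b7fA' into ('1b7f', 'A')."""
    match = _PDB_CHAIN.match(identifier.strip())
    if match is None:
        raise ValueError(f"expected a PDB code plus one chain ID, got {identifier!r}")
    code, chain = match.groups()
    # PDB codes are case-insensitive; chain IDs are kept exactly as typed.
    return code.lower(), chain


print(parse_pdb_chain("1b7fA"))  # ('1b7f', 'A')
```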
You can choose the "3 levels" option: the program will search for the DNA and amino acid sequences corresponding to your PDB file and compute alignments for the DNA sequence, the amino acid sequence and the 3D structure. Be careful: the sequences will be truncated to the length of the PDB structure.
Sequences must be in FASTA format. You must submit only one sequence.
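A minimal sketch of the one-sequence rule, assuming plain FASTA text as input: read_single_fasta (a hypothetical helper, not part of Swelfe) returns the header and the concatenated sequence, and rejects input that contains zero or several records.

```python
def read_single_fasta(text: str) -> tuple[str, str]:
    """Return (header, sequence) from FASTA text holding exactly one record."""
    records: list[tuple[str, list[str]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            records.append((line[1:].strip(), []))
        elif records:
            records[-1][1].append(line)  # continuation of the current sequence
        else:
            raise ValueError("sequence data found before the first '>' header")
    if len(records) != 1:
        raise ValueError(f"expected exactly one FASTA record, found {len(records)}")
    header, chunks = records[0]
    return header, "".join(chunks).upper()


print(read_single_fasta(">query\nACGTAC\ngtacgt\n"))  # ('query', 'ACGTACGTACGT')
```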
Optional parameters
For structures, the program takes into account the maximum RRMSD and one other parameter (by default, a minimum score of 350).
For amino acid sequences, the program considers two parameters, one of which is the substitution matrix (by default, a p-value below 0.01 and the Blosum62 matrix).
For DNA sequences, you must choose only one parameter (by default, a p-value below 0.01).
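The documented defaults can be summarised as a small lookup table. The key names below (max_rrmsd, min_score, max_p_value, matrix) and the helper defaults_for are illustrative assumptions, not the server's actual form fields; only the values come from this help page.

```python
# Illustrative key names; only the values come from this help page.
DEFAULTS: dict[str, dict[str, object]] = {
    "structure": {"max_rrmsd": 0.5, "min_score": 350},
    "protein": {"max_p_value": 0.01, "matrix": "Blosum62"},
    "dna": {"max_p_value": 0.01},
}


def defaults_for(kind: str) -> dict[str, object]:
    """Return a fresh copy of the default parameters for one input type."""
    try:
        return dict(DEFAULTS[kind])
    except KeyError:
        raise ValueError(f"unknown input type: {kind!r}") from None
```

Returning a copy means a caller can override one value, for example params["matrix"] = "Pam250", without changing the shared defaults.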
Maximum RRMSD (only for structures)
The relative RMSD (RRMSD) is calculated after the Smith-Waterman alignment and after the best superimposition of the two repeats; it checks that the two repeats superimpose well. Unlike the plain RMSD, it is independent of the length of the repeat (Betancourt & Skolnick 2001).
By default, the maximum RRMSD is 0.5. Repeats whose RRMSD exceeds this threshold are not shown.
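The sketch below shows the two steps behind this check, under stated assumptions. This help page does not say how Swelfe superimposes the repeats, so Horn's quaternion method is used here as one standard way to get the lowest RMSD over all rigid-body superpositions; its largest eigenvalue is found by shifted power iteration to stay within the standard library. The normaliser reference_rmsd is a hypothetical caller-supplied stand-in for the length correction of Betancourt & Skolnick (2001), whose exact formula is not reproduced here.

```python
import math

Coord = tuple[float, float, float]


def _centered(points: list[Coord]) -> list[Coord]:
    """Translate a point set so that its centroid is at the origin."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    return [(p[0] - c[0], p[1] - c[1], p[2] - c[2]) for p in points]


def superposed_rmsd(a: list[Coord], b: list[Coord], iterations: int = 500) -> float:
    """Lowest RMSD between two equal-length point sets over all rigid-body
    superpositions (Horn's quaternion method)."""
    if len(a) != len(b) or not a:
        raise ValueError("point sets must be non-empty and of equal length")
    a, b = _centered(a), _centered(b)
    # Cross-covariance S[i][j] = sum over points of a_i * b_j.
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    k = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    g = sum(x * x for p in a + b for x in p)  # sum of squared norms of both sets
    # All eigenvalues of k lie in [-g/2, g/2]; adding g/2 makes the matrix
    # positive semidefinite, so power iteration finds the largest eigenvalue.
    shift = g / 2
    lam = -math.inf
    for start in range(4):  # one basis start is never orthogonal to the top eigenvector
        v = [1.0 if i == start else 0.0 for i in range(4)]
        for _ in range(iterations):
            w = [sum(k[i][j] * v[j] for j in range(4)) + shift * v[i] for i in range(4)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        kv = [sum(k[i][j] * v[j] for j in range(4)) for i in range(4)]
        lam = max(lam, sum(x * y for x, y in zip(v, kv)))  # Rayleigh quotient
    return math.sqrt(max(g - 2.0 * lam, 0.0) / len(a))


def relative_rmsd(a: list[Coord], b: list[Coord], reference_rmsd: float) -> float:
    """Superposed RMSD divided by a hypothetical caller-supplied reference RMSD."""
    if reference_rmsd <= 0:
        raise ValueError("reference_rmsd must be positive")
    return superposed_rmsd(a, b) / reference_rmsd


line_a = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
line_b = [(2.0, 0.0, 0.0), (-2.0, 0.0, 0.0)]
print(round(superposed_rmsd(line_a, line_b), 6))  # 1.0
```

With the default threshold, a pair of repeat copies would be kept when relative_rmsd(copy1, copy2, reference) is at most 0.5.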
P-value (only for sequences)
This option checks that the scores of repeats are statistically significant, using the method of Waterman and Vingron (1994) with 100 random sequences.
The smaller the p-value, the more significant the repeat.
The default value is 0.01.
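Waterman and Vingron (1994) estimate significance from the scores of random sequences with an extreme-value approximation. The Python sketch below is a simplified stand-in, not their estimator. It shuffles the query (an assumption: shuffling preserves residue composition), collects the best score of each shuffled copy, fits a Gumbel distribution by the method of moments and returns P(S ≥ score). The names shuffled_best_scores and gumbel_p_value are hypothetical, and best_score can be any self-alignment scorer.

```python
import math
import random
import statistics
from typing import Callable

EULER_GAMMA = 0.5772156649015329


def shuffled_best_scores(seq: str, best_score: Callable[[str], float],
                         n: int = 100, seed: int = 0) -> list[float]:
    """Best self-alignment score of n shuffled copies of seq."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n):
        letters = list(seq)
        rng.shuffle(letters)  # keeps the residue composition of the query
        scores.append(best_score("".join(letters)))
    return scores


def gumbel_p_value(score: float, random_scores: list[float]) -> float:
    """P(S >= score) under a Gumbel law fitted by the method of moments."""
    if len(random_scores) < 2:
        raise ValueError("need at least two random scores")
    sd = statistics.stdev(random_scores)
    if sd == 0:
        raise ValueError("random scores have zero variance")
    beta = sd * math.sqrt(6) / math.pi                         # scale
    mu = statistics.mean(random_scores) - EULER_GAMMA * beta   # location
    z = (score - mu) / beta
    if z < -700:
        return 1.0  # avoid overflow in exp for scores far below the mode
    # 1 - exp(-exp(-z)), computed stably when z is large
    return -math.expm1(-math.exp(-z))
```

Fitting a distribution matters here: with only 100 random sequences, an empirical count alone could not resolve p-values much below 0.01.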
Minimum score
This option selects repeats by score. It is very useful for structures, because no p-value can be calculated for structural repeats.
The default is 350 for structures, which is equivalent to a match of 7 or 8 perfectly aligned residues.
Number of repeats
You can choose how many repeats you want to obtain. This option can detect small repeats that are not significant or whose score is below the threshold.
Minimum length
You can choose the minimum length of the repeats.
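The minimum score, the number of repeats and the minimum length all act as filters on the list of repeats. The sketch below shows one plausible way to combine them; the Repeat type, select_repeats and the rule that max_count overrides min_score are assumptions inferred from the descriptions above, not Swelfe's documented behaviour. As a side note, a default of 350 for 7 or 8 perfectly aligned residues implies roughly 44 to 50 points per aligned residue (350/8 ≈ 44, 350/7 = 50).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Repeat:
    score: float
    length: int  # number of aligned residues


def select_repeats(repeats: list[Repeat], min_score: float = 350.0,
                   min_length: int = 1, max_count: Optional[int] = None) -> list[Repeat]:
    """Hypothetical combination of the three options. When max_count is
    given, the best max_count repeats are returned even below min_score."""
    long_enough = [r for r in repeats if r.length >= min_length]
    ranked = sorted(long_enough, key=lambda r: r.score, reverse=True)
    if max_count is not None:
        return ranked[:max_count]
    return [r for r in ranked if r.score >= min_score]
```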
Substitution matrix
For amino acid sequences, you can choose the substitution matrix. Blosum62 is used by default, but you can also use Blosum45, Blosum80, Pam30, Pam50, Pam125 or Pam250.
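To show how a substitution matrix scores aligned residues, here is a small excerpt of BLOSUM62 restricted to isoleucine, leucine and valine, with hypothetical lookup and ungapped-scoring helpers. Only these six entries are included; a real implementation would load the full matrix.

```python
# Six entries of the standard BLOSUM62 matrix (I, L, V only);
# symmetric pairs are stored once.
BLOSUM62_ILV = {
    ("I", "I"): 4, ("L", "L"): 4, ("V", "V"): 4,
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1,
}


def pair_score(matrix: dict[tuple[str, str], int], x: str, y: str) -> int:
    """Look up a substitution score in either orientation."""
    return matrix[(x, y)] if (x, y) in matrix else matrix[(y, x)]


def ungapped_score(matrix: dict[tuple[str, str], int], s: str, t: str) -> int:
    """Sum of substitution scores for two equal-length, gap-free segments."""
    if len(s) != len(t):
        raise ValueError("segments must have the same length")
    return sum(pair_score(matrix, x, y) for x, y in zip(s, t))


print(ungapped_score(BLOSUM62_ILV, "ILV", "LIV"))  # 8
```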
Gap opening and extension penalties
You can change the default values.
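The gap penalties enter the alignment recurrences. Below is a minimal sketch of a local self-alignment with affine gaps (Gotoh's recurrences on top of Smith-Waterman), restricted to cells j > i so that the sequence is not trivially matched with itself. It returns only the best score; the SIM algorithm named at the top of this page goes further and reports several non-intersecting local alignments, which this sketch does not attempt. The function name and the cost convention (gap_open for the first gapped position, gap_extend for each further one) are assumptions.

```python
from typing import Callable

NEG_INF = float("-inf")


def best_internal_repeat_score(seq: str, score: Callable[[str, str], float],
                               gap_open: float, gap_extend: float) -> float:
    """Best local alignment score of seq against itself, restricted to j > i."""
    n = len(seq)
    h = [[0.0] * (n + 1) for _ in range(n + 1)]      # best alignment ending at (i, j)
    e = [[NEG_INF] * (n + 1) for _ in range(n + 1)]  # ... ending with a gap in the first copy
    f = [[NEG_INF] * (n + 1) for _ in range(n + 1)]  # ... ending with a gap in the second copy
    best = 0.0
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):  # upper triangle only: skip the trivial self-match
            e[i][j] = max(h[i][j - 1] - gap_open, e[i][j - 1] - gap_extend)
            f[i][j] = max(h[i - 1][j] - gap_open, f[i - 1][j] - gap_extend)
            h[i][j] = max(0.0, h[i - 1][j - 1] + score(seq[i - 1], seq[j - 1]),
                          e[i][j], f[i][j])
            best = max(best, h[i][j])
    return best


def match_mismatch(x: str, y: str) -> float:
    return 1.0 if x == y else -1.0


print(best_internal_repeat_score("ACGTACGT", match_mismatch, 2.0, 1.0))  # 4.0
```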
Remove overlapping repeats
For structures with long or many alpha helices, or for very repetitive sequences, the repeats found can overlap each other considerably. This option removes repeats that overlap each other by more than X%.
This filter is applied after the alignment step.
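A greedy version of this filter is sketched below. It assumes that each repeat is stored as (score, first copy, second copy) with inclusive residue intervals, that higher-scoring repeats are kept first, and that overlap is measured as the shared length divided by the length of the shorter interval. The representation, the order and the overlap definition are all assumptions; this help page does not specify them.

```python
Interval = tuple[int, int]                     # inclusive residue positions
RepeatPair = tuple[float, Interval, Interval]  # (score, first copy, second copy)


def overlap_fraction(a: Interval, b: Interval) -> float:
    """Shared length of two inclusive intervals divided by the shorter length."""
    shared = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    shorter = min(a[1] - a[0] + 1, b[1] - b[0] + 1)
    return shared / shorter


def remove_overlapping(repeats: list[RepeatPair],
                       max_overlap_percent: float) -> list[RepeatPair]:
    """Visit repeats from best to worst score and drop any whose copies overlap
    the corresponding copies of an already-kept repeat by more than the limit."""
    kept: list[RepeatPair] = []
    for rep in sorted(repeats, key=lambda r: r[0], reverse=True):
        clash = any(
            100.0 * max(overlap_fraction(rep[1], k[1]), overlap_fraction(rep[2], k[2]))
            > max_overlap_percent
            for k in kept
        )
        if not clash:
            kept.append(rep)
    return kept
```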