What is 3DMSS-Sites?
3DMSS-Sites provides means to search for common 3D substructures between structures. The search is general: 3DMSS-Sites searches for similar "clouds" of coordinates between sets of coordinates, i.e. it is independent of the order of the atoms in the structures. Atom or residue type compatibilities can be defined to avoid irrelevant pairings.
3DMSS-Sites relies on two algorithms with different properties: Escan [1] and CSR [2].
A 3DMSS-Sites query can be a protein structure, a 3D motif, a collection of motifs, or the bank of non-redundant motifs extracted from the Catalytic Site Atlas. Thus, 3DMSS-Sites can search for occurrences of any known catalytic site geometrically compatible with a given protein structure.
As an alternative to this help, consider reading the 3DMSS-Sites FAQ (frequently asked questions).
        

1. History
2. Concepts
3. Usage

4. Examples, sample tests

History:

1998: Escan and CSR algorithms are published.
2004: Escan and CSR algorithms are implemented as separate online services.
2005: 3DMSS v1.0. Combines Escan and the original CSR algorithm.
2006: 3DMSS-Sites v1.5. Enhanced CSR version. 3DMSS is coupled with the collection of catalytic sites identified in the Catalytic Site Atlas.

Concepts:

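In the 3DMSS-Sites service, two different algorithms can be employed, and both consider atomic coordinates without taking the order of the atoms into account: Escan [1], which uses coordinates plus atom/residue types, and CSR [2], which uses coordinates only.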
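To make order-independent matching concrete, here is a minimal Python sketch. It is neither Escan nor CSR: it exhaustively tries every assignment of query atoms to target atoms and scores each assignment by the largest mismatch between corresponding intra-set distances. The function names are hypothetical, and exhaustive enumeration is only practical for a handful of atoms.

```python
from itertools import permutations
from math import dist


def distance_error(query, target, pairing):
    """Largest difference between corresponding intra-query and intra-target
    distances, when query atom i is paired with target atom pairing[i]."""
    worst = 0.0
    for i in range(len(query)):
        for j in range(i + 1, len(query)):
            d_query = dist(query[i], query[j])
            d_target = dist(target[pairing[i]], target[pairing[j]])
            worst = max(worst, abs(d_query - d_target))
    return worst


def best_pairing(query, target):
    """Try every injective pairing of query atoms onto target atoms.
    Since all pairings are enumerated, the result does not depend on atom order."""
    best_error, best = float("inf"), None
    for pairing in permutations(range(len(target)), len(query)):
        error = distance_error(query, target, pairing)
        if error < best_error:
            best_error, best = error, pairing
    return best_error, best


# A right triangle (sides 1.5, 2.0 and 2.5 Å) ...
query = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.0, 0.0)]
# ... and the same triangle shifted by 10 Å along x, stored in another order, plus a decoy atom.
target = [(10.0, 2.0, 0.0), (5.0, 5.0, 5.0), (11.5, 0.0, 0.0), (10.0, 0.0, 0.0)]
error, pairing = best_pairing(query, target)   # error == 0.0, pairing == (3, 2, 0)
```

Because only distances are compared, this toy criterion cannot distinguish a motif from its mirror image; an RMSd computed after superposition with proper rotations does not have that blind spot.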

Usage:

1. Prepare your query (not mandatory for 3DMSS-Sites). Watch the format! PDBpart can help!
2. Prepare your bank (not mandatory for 3DMSS-Sites/CSR/Escan). PDBPileUp can help!

3. Select an algorithm and run the search

3DMSS-Sites
  • Combines the Escan and CSR(re) methods
  • Predefined motifs and the bank of Catalytic Site Atlas motifs
  • Coordinates plus atom/residue types

Access to former service versions remains temporarily possible here, although we discourage using these versions.

CSR
  • Coordinates only
  • CSR1 (older version)

Escan
  • Coordinates plus atom/residue types
  • Escan1 (older version)



PDBPileUp: This service makes it easy to create specific banks. It piles up PDB files into the multiPDB format accepted by the 3DMSS-Sites search algorithms.

PDBpart: This service helps define a query. It applies a mask to the residues of a PDB file and returns only the selected residues.

PDBTM: This service applies the 3D transformation matrices returned as REMARKs in 3DMSS-Sites result files.

OpenBabel: Interconversion of file formats.
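The PDBpart and PDBTM operations can be approximated in a few lines of Python. This is a hypothetical sketch, not the services' code: `apply_residue_mask` keeps the residues listed as (chain, residue number) pairs and drops Escan REMARKs attached to removed atoms, and `transform_pdb_lines` assumes the transformation is applied as x' = R·x + t. The actual matrix convention and REMARK layout used by 3DMSS-Sites are not described here.

```python
def apply_residue_mask(pdb_lines, keep):
    """PDBpart-like mask (hypothetical): keep ATOM/HETATM lines whose
    (chain ID, residue number) is in `keep`, plus the REMARK lines that follow them."""
    out, skipping = [], False
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            # Chain ID is PDB column 22, residue number columns 23-26.
            skipping = (line[21], int(line[22:26])) not in keep
        elif not line.startswith("REMARK"):
            skipping = False  # HEADER, END, ... are always kept
        if not skipping:
            out.append(line)
    return out


def transform_pdb_lines(pdb_lines, rotation, translation):
    """PDBTM-like step (hypothetical): replace each x by R·x + t, with R a 3x3
    nested list and t a 3-tuple, rewriting PDB columns 31-54 in place."""
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            x = [float(line[30:38]), float(line[38:46]), float(line[46:54])]
            new = [sum(rotation[i][k] * x[k] for k in range(3)) + translation[i]
                   for i in range(3)]
            line = f"{line[:30]}{new[0]:8.3f}{new[1]:8.3f}{new[2]:8.3f}{line[54:]}"
        out.append(line)
    return out
```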


Input/Output formats:

   
The input/output format is based on the PDB file format.
       The Escan REMARK line format is as follows:
       REMARK     keyword value
             <---> 5 blanks (mandatory)

       REMARKs apply to the preceding ATOM or HETATM line.
       Possible keywords are:
       keyword=WEIGHT.  (note the dot)
            WEIGHT is the weight assigned to the atom. By default (no weight specified), the weight is 1.
              Example:
               ATOM    371 OD1  ASP    51      -0.661  -1.665   1.498 0.000 0.000
               REMARK     WEIGHT. 2.00
       keyword=MATCH.
            MATCH is a UNIX regular expression specifying atom types matched by the coordinates:
                  .  any symbol
                  *  preceding expression repeated 0 or more times
                  +  preceding expression repeated 1 or more times
                  |  or
                  [] domain (set of characters)
                  () grouping
               Example:
               ATOM    371 OD1  ASP    51      -0.661  -1.665   1.498 0.000 0.000
               REMARK     MATCH. O.*
                means that OD1 371 can match any oxygen. To specify a match against only OD or OE, one should use:
                REMARK     MATCH. OD|OE
                or
                REMARK     MATCH. O(D|E)

                If no MATCH is specified, an ATOM line only matches atoms with the same name.
       keyword=RESMATCH.
                This is the counterpart of MATCH for residue names.
                Example:
               ATOM    371 OD1  ASP    51      -0.661  -1.665   1.498 0.000 0.000
               REMARK     RESMATCH. GLY|T.*
                implies OD1 371 can match any atom in GLY or any atom in a residue whose name starts with T (e.g. THR, TYR, TRP).
                If no RESMATCH is specified, the atom can match any residue type (i.e. the default is equivalent to RESMATCH. .*).
       It is possible to combine MATCH and RESMATCH:
       Example:
               ATOM    371 OD1  ASP    51      -0.661  -1.665   1.498 0.000 0.000
               REMARK     MATCH. O.*
               REMARK     RESMATCH. ASP
       OD1 371 can match any atom whose name starts with O in an ASP residue.
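As an illustration only, the REMARK conventions above can be read with Python's `re` module. This sketch assumes that Escan's "UNIX regular expressions" behave like Python patterns for the operators listed, and that a pattern must match the whole internal atom or residue name (so `O.*` accepts `OD` but not `CO`); both points are assumptions, and the function names are hypothetical. Atom names passed to `is_compatible` are expected in Escan's internal form, as explained in the note that follows.

```python
import re

# "REMARK" + exactly 5 blanks + KEYWORD + "." + value
REMARK_RE = re.compile(r"REMARK {5}(WEIGHT|MATCH|RESMATCH)\.\s*(.*?)\s*$")


def parse_query(pdb_lines):
    """Attach each Escan REMARK to the preceding ATOM/HETATM line."""
    atoms = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            # Defaults: weight 1, MATCH = same name only, RESMATCH = any residue.
            atoms.append({"line": line, "WEIGHT": 1.0, "MATCH": None, "RESMATCH": ".*"})
        elif atoms:
            m = REMARK_RE.match(line)
            if m:
                key, value = m.groups()
                atoms[-1][key] = float(value) if key == "WEIGHT" else value
    return atoms


def is_compatible(query_atom, query_name, target_name, target_resname):
    """Can a target atom be paired with this query atom?
    Names are Escan internal atom names (e.g. "OD" for " OD1")."""
    if query_atom["MATCH"] is None:
        name_ok = target_name == query_name
    else:
        name_ok = re.fullmatch(query_atom["MATCH"], target_name) is not None
    res_ok = re.fullmatch(query_atom["RESMATCH"], target_resname) is not None
    return name_ok and res_ok
```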

       Note about PDB atom names for MATCH. and RESMATCH.

        PDB atom names are specified in columns 13 to 16, i.e. many atom
        names begin with a space character and all atom names are 4 characters
        long (for example, the alpha-carbon name is " CA " and the peptide nitrogen is " N  ").
        In Escan, these PDB atom names are converted into an
        internal representation before matching rules are applied.
        You need to know these conversion rules in order to write correct regular
        expressions for atom names.

        Hereafter "WXYZ" refers to the 4-character code of the atom name in PDB
        (for example, for alpha-carbon atoms, "WXYZ" is " CA ")

        If the first character, "W", is in the [A-Z] range then
                Escan matchname is ":WX"
        else if "Y" is in the [A-Z] range
                Escan matchname is "XY"
        else
                Escan matchname is "X"

      Examples:

        PDB name      ->     Escan internal name
        " CA "        ->     "CA"         (alpha-carbon)
        "ZN  "        ->     ":ZN"        (Zinc)
        "CA  "        ->     ":CA"        (Calcium)
        " CO "        ->     "CO"
        " HA1"        ->     "HA"
        " OD1"        ->     "OD"
        " O1D"        ->     "O"
        "C1  "        ->     ":C1"                                                                     

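The three conversion rules translate directly into code. The Python sketch below (function names hypothetical) reproduces the table above and reads the name from columns 13 to 16 of an ATOM/HETATM line.

```python
def escan_name(pdb_name):
    """Escan internal match name for a 4-character PDB atom name "WXYZ":
    ':WX' if W is a letter, else 'XY' if Y is a letter, else 'X'."""
    w, x, y, _ = pdb_name.ljust(4)[:4]
    if "A" <= w <= "Z":
        return ":" + w + x
    if "A" <= y <= "Z":
        return x + y
    return x


def escan_name_from_atom_line(line):
    """Apply the conversion to the atom name in PDB columns 13-16."""
    return escan_name(line[12:16])
```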

    1. Bank:
      • Both CSR and Escan accept multiPDB files as bank. A multiPDB file is a single file containing a series of PDB entries. Each individual entry MUST start with a HEADER line and end with an END line. To preserve OpenBabel compatibility, COMPND is also accepted instead of HEADER. For CSR, however, it is important that the bank contain homogeneous data (e.g. protein traces only, or protein heavy atoms only, but not a mix of traces and full proteins with all heavy atoms).
      • The PDBPileUp utility can easily generate multiPDB files from PDB IDs or by piling up PDB files iteratively.
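A multiPDB bank as described above can be assembled and split again with a few lines of Python. This is a hypothetical sketch, not the PDBPileUp code: entries are given as (name, lines) pairs, a HEADER line is added only when an entry starts with neither HEADER nor COMPND, and existing END lines are replaced by a single closing END.

```python
def pile_up(entries):
    """Build multiPDB text from (name, lines) pairs; every entry starts with
    HEADER (or COMPND) and ends with END."""
    out = []
    for name, lines in entries:
        body = [line for line in lines if line.rstrip() != "END"]
        if not body or not body[0].startswith(("HEADER", "COMPND")):
            out.append(f"HEADER    {name}")
        out.extend(body)
        out.append("END")
    return "\n".join(out) + "\n"


def split_multipdb(text):
    """Split multiPDB text into entries (lists of lines), each running from a
    HEADER or COMPND line to the next END line; lines outside entries are ignored."""
    entries, current = [], None
    for line in text.splitlines():
        if current is None:
            if line.startswith(("HEADER", "COMPND")):
                current = [line]
        else:
            current.append(line)
            if line.rstrip() == "END":  # "ENDMDL" does not close an entry
                entries.append(current)
                current = None
    return entries
```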
    2. Output:
      • Each candidate match found by CSR is reported on a KIT line, for example:
          KIT =      12       3       1 ; N =     4     4 ; RMS , SUP :     0.567706 0.759076
        Here, N reports the number of atoms involved in the match. Such a match may not meet the requirements of the search parameters.
      • Accepted matches are followed by a series of lines starting with MATCH:, for example:
          MATCH:     8     0.655     0.914
        where 8 is the number of atoms paired in the hit, and 0.655 and 0.914 are the RMSd and the tolerance.
      • For requests leading to no hit, the KIT lines may help you adjust the maximal number of atoms. You may also consider increasing the number of iterations, and check carefully any indications related to CUTOFF VALUE TOO SMALL or TOO LARGE.
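When scanning long result files, the KIT and MATCH lines can be extracted with regular expressions. In this hypothetical Python sketch, the MATCH fields are named after the description above (paired atoms, RMSd, tolerance), while the KIT fields are returned under neutral names, since only N is explained here.

```python
import re

KIT_RE = re.compile(
    r"KIT\s*=\s*([\d\s]+?)\s*;\s*N\s*=\s*(\d+)\s+(\d+)\s*;"
    r"\s*RMS\s*,\s*SUP\s*:\s*(\S+)\s+(\S+)"
)
MATCH_RE = re.compile(r"MATCH:\s+(\d+)\s+(\S+)\s+(\S+)")


def parse_csr_output(text):
    """Collect KIT lines and MATCH lines from CSR output text."""
    kits, matches = [], []
    for line in text.splitlines():
        line = line.strip()
        if m := KIT_RE.match(line):
            kits.append({
                "kit": [int(v) for v in m.group(1).split()],
                "n": (int(m.group(2)), int(m.group(3))),
                "rms_sup": (float(m.group(4)), float(m.group(5))),
            })
        elif m := MATCH_RE.match(line):
            matches.append({
                "atoms": int(m.group(1)),
                "rmsd": float(m.group(2)),
                "tolerance": float(m.group(3)),
            })
    return kits, matches
```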

Search Parameters

Note: the documentation for the individual Escan and CSR services is currently kept on this page to provide additional help in choosing 3DMSS-Sites parameters.
         Note: for CSR, all parameters except the cutoff distance and those related to the number of iterations are used only a posteriori, to filter the solutions identified.
            Without this filtering, CSR would return a solution for each query against each member of the bank.
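The a posteriori filtering described in the note can be pictured as follows. This is a hypothetical Python sketch: `min_atoms` and `max_rmsd` stand in for whichever CSR filter parameters are set, and the candidate values are illustrative, not real results.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    entry: str    # bank entry in which the best solution was found
    atoms: int    # number of paired atoms
    rmsd: float   # RMSd of the pairing


def filter_solutions(candidates, min_atoms, max_rmsd):
    """The search yields one best solution per query/bank-entry pair;
    only those meeting the (hypothetical) thresholds are reported."""
    return [c for c in candidates if c.atoms >= min_atoms and c.rmsd <= max_rmsd]


# Illustrative values only.
candidates = [Candidate("entry1", 8, 0.655), Candidate("entry2", 4, 0.30),
              Candidate("entry3", 9, 2.10)]
kept = filter_solutions(candidates, min_atoms=8, max_rmsd=1.0)
```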
Examples



hiv3
HIV-1 protease catalytic site query:
(this motif is already defined in 3DMSS-Sites, but it can be copied/pasted into the Escan/CSR input query field)

HEADER    COMPLEX (ASPARTIC PROTEASE/INHIBITOR)   27-JAN-98   1A30
TITLE     HIV-1 PROTEASE COMPLEXED WITH A TRIPEPTIDE INHIBITOR
ATOM    197  CG  ASP A  25      15.491  27.364   6.131  1.00 16.91           C
ATOM    198  OD1 ASP A  25      15.413  27.271   4.884  1.00 18.26           O
ATOM    199  OD2 ASP A  25      14.999  26.534   6.915  1.00 19.87           O
ATOM    204  CB  THR A  26      16.491  31.246   1.312  1.00 14.58           C
ATOM    205  OG1 THR A  26      16.917  30.009   0.732  1.00 14.02           O
ATOM    950  CG  ASP B  25      15.325  25.224   1.561  1.00 14.53           C
ATOM    951  OD1 ASP B  25      15.806  25.931   2.467  1.00 16.12           O
ATOM    952  OD2 ASP B  25      14.535  24.296   1.789  1.00 21.36           O
ATOM    957  CB  THR B  26      20.365  28.766   2.172  1.00 18.08           C
ATOM    958  OG1 THR B  26      19.571  29.418   3.168  1.00 18.28           O
END

Note: a bank containing only 5 aspartyl proteases, for testing with CSR the same catalytic site as the HIV-1 protease query, can be accessed here. (For CSR, you must set the minimal number of atoms to 8 and, to start, keep the other default values.)
Using CSR, which does not take atom types into account, good geometric matches are found that do not correspond to the site. Such a result is perfectly consistent with the goal of CSR. It illustrates the different scopes in which CSR and Escan should be used, and the extreme care required, when addressing a particular problem, to choose the correct algorithm and to prepare both query and bank in a relevant manner. The probability that a good geometric match occurs between a small query and proteins having several thousand atoms is high. CSR is best suited for larger problems!

Match in 1B5F:

1B5F
ATOM    248  CG  ASP A  32      15.540  27.419   6.153  0.00 24.19
ATOM    249  OD1 ASP A  32      15.684  27.233   4.927  0.00 19.15
ATOM    250  OD2 ASP A  32      14.739  26.748   6.828  0.00 22.25
ATOM    255  CB  THR A  33      16.193  31.216   1.412  0.00 15.94
ATOM    256  OG1 THR A  33      16.811  30.036   0.895  0.00 16.07
ATOM   1712  CG  ASP A 215      15.375  25.247   1.499  0.00 19.27
ATOM   1713  OD1 ASP A 215      15.897  25.975   2.357  0.00 19.41
ATOM   1714  OD2 ASP A 215      14.602  24.312   1.764  0.00 18.85
ATOM   1719  CB  SER A 216      20.398  28.679   2.094  0.00 21.56
ATOM   1720  OG  SER A 216      19.674  29.195   3.203  0.00 23.20
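To compare the hiv3 query with the 1B5F match numerically, the sketch below computes the RMSd of the ten atoms, first on the coordinates as listed and then after an optimal rigid superposition (Kabsch algorithm, using NumPy). It assumes the atoms are paired in listing order (ASP A 25 with ASP 32, THR A 26 with THR 33, ASP B 25 with ASP 215, THR B 26 with SER 216); the function names are hypothetical, and this is not necessarily how 3DMSS-Sites computes its scores.

```python
import numpy as np

# Coordinates copied from the hiv3 query (1A30) and the 1B5F match, in listing order.
QUERY = np.array([
    [15.491, 27.364, 6.131], [15.413, 27.271, 4.884], [14.999, 26.534, 6.915],
    [16.491, 31.246, 1.312], [16.917, 30.009, 0.732],
    [15.325, 25.224, 1.561], [15.806, 25.931, 2.467], [14.535, 24.296, 1.789],
    [20.365, 28.766, 2.172], [19.571, 29.418, 3.168],
])
MATCH_1B5F = np.array([
    [15.540, 27.419, 6.153], [15.684, 27.233, 4.927], [14.739, 26.748, 6.828],
    [16.193, 31.216, 1.412], [16.811, 30.036, 0.895],
    [15.375, 25.247, 1.499], [15.897, 25.975, 2.357], [14.602, 24.312, 1.764],
    [20.398, 28.679, 2.094], [19.674, 29.195, 3.203],
])


def raw_rmsd(a, b):
    """RMSd between paired rows of a and b, without any fitting."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


def superposed_rmsd(a, b):
    """RMSd after the optimal rigid superposition of b onto a (Kabsch)."""
    a0 = a - a.mean(axis=0)
    b0 = b - b.mean(axis=0)
    u, _, vt = np.linalg.svd(b0.T @ a0)
    d = np.sign(np.linalg.det(u @ vt))  # forbid reflections
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    return raw_rmsd(a0, b0 @ rot)


raw = raw_rmsd(QUERY, MATCH_1B5F)
sup = superposed_rmsd(QUERY, MATCH_1B5F)
```

Because the 1B5F coordinates listed above already lie close to the query frame, the raw RMSd is already small (about 0.21 Å); the superposed value can only be lower or equal.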


Serine protease catalytic site query:
(this motif is already defined in 3DMSS-Sites, but it can be copied/pasted into the Escan input query field)

HEADER    1A3B HIS/ASP/SER CATALYTIC SITE
ATOM    551  ND1 HIS H  57       8.981  -8.152  15.830  1.00 17.35           N 
ATOM    552  CD2 HIS H  57       9.053  -9.467  17.592  1.00 14.44           C 
ATOM    553  CE1 HIS H  57      10.246  -8.415  16.098  1.00 20.68           C 
ATOM    554  NE2 HIS H  57      10.307  -9.197  17.145  1.00 21.37           N 
ATOM   1043  CG  ASP H 102       7.273  -5.970  13.672  1.00 14.00           C 
ATOM   1044  OD1 ASP H 102       6.614  -5.729  14.754  1.00 13.43           O 
ATOM   1045  OD2 ASP H 102       8.266  -6.793  13.690  1.00 16.16           O 
ATOM   1795  N   SER H 195      15.641  -7.104  17.788  1.00 14.34           N 
ATOM   1798  O   SER H 195      14.090  -5.235  19.221  1.00 15.16           O 
ATOM   1800  OG  SER H 195      13.458  -8.838  16.720  1.00 17.11           O 
END




Zinc binding site query:
(this motif is already defined in 3DMSS-Sites, but it can be copied/pasted into the Escan input query field)

HEADER  BINDING SITE OF ZN IN 1HSZ                            1hsz 
ATOM    718  SG  CYS A  97       3.734  16.785 -18.443  1.00 18.15           S  
REMARK     MATCH. N.*|O.*|S.*
REMARK     RESMATCH. CYS|HIS|HOH
ATOM    737  SG  CYS A 100       2.286  15.349 -15.180  1.00 18.51           S  
REMARK     MATCH. N.*|O.*|S.*
REMARK     RESMATCH. CYS|HIS|HOH
ATOM    761  SG  CYS A 103       5.608  14.289 -16.252  1.00 16.69           S  
REMARK     MATCH. N.*|O.*|S.*
REMARK     RESMATCH. CYS|HIS|HOH
ATOM    826  SG  CYS A 111       3.074  13.055 -18.482  1.00 21.91           S  
REMARK     MATCH. N.*|O.*|S.*
REMARK     RESMATCH. CYS|HIS|HOH
END


[Figure: 2pec-2bsp, views 1 and 2]



References
[1] Escan: Escalier, V., Pothier, J., Soldano, H. and Viari, A. "Pairwise and multiple identification of three-dimensional common substructures in proteins." J. Comput. Biol. (1998) 5(1):41-56.
[2] CSR: Petitjean, M. "Interactive maximal common 3D substructure searching with the combined SDM/RMS algorithm." Comput. Chem. (1998) 22(6):463-465.