BioComputing

    RAPHAEL - determination of periodicity in protein structures


Quick Help and References

Description
Repeat proteins form a distinct class of structures in which folding is greatly simplified. Several classes have been defined, with solenoid repeats of periodicity between ca. 5 and 40 residues being the most challenging to detect. From a structural point of view, finding repeats may be complicated by the presence of insertions or multiple domains. To the best of our knowledge, no automated methods are available to characterize solenoid repeats from structure. This server analyzes a single protein structure of interest to determine its periodicity.

E-Mail address
Optional. Supply an address if you want to receive an email once the server has finished processing your job.

Job Name
An optional title for your submission, shown in the header of the output. We suggest providing one, especially when submitting multiple queries, as they may complete in a different order.

Structure

This is where you provide your structure. There are two ways to do so:
  • Enter the 4-letter PDB ID. For example, entering only 9ANT runs RAPHAEL on chains A, B, C, D, E and F of this PDB entry. You can also provide a chain ID to restrict the calculation to a single chain. For NMR entries, only the first model is considered.

    We recommend submitting single chains or domains, since RAPHAEL was trained on CATH domains. RAPHAEL performs very well on domains and even on multi-domain chains. However, it remains untested on large structures with multiple chains (e.g. 9ANT).

  • Upload a file in PDB format. The file must conform to the PDB specification.
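If you want to submit a single chain from a multi-chain or NMR file, you can extract it locally before uploading. The following is a minimal Python sketch, not part of RAPHAEL: the function name `extract_chain` is hypothetical, and it assumes a standard fixed-column PDB file (chain identifier in column 22). It keeps only the ATOM records of the requested chain and stops at the first ENDMDL record, so only the first model is written, matching how the server treats NMR entries.

```python
def extract_chain(pdb_text: str, chain_id: str) -> str:
    """Return the ATOM records of one chain, taken from the first model only.

    Hypothetical helper for preparing an upload; not part of RAPHAEL.
    Assumes the fixed-column PDB format: record name in columns 1-6 and
    chain identifier in column 22 (index 21 in Python).
    """
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "ENDMDL":
            # End of the first model: ignore any further NMR models.
            break
        # Only ATOM records are kept; HETATM records (ligands, water,
        # modified residues such as MSE) are dropped in this sketch.
        if record == "ATOM" and len(line) > 21 and line[21] == chain_id:
            kept.append(line)
    kept.append("END")
    return "\n".join(kept) + "\n"
```

For example, `open("chainA.pdb", "w").write(extract_chain(open("9ant.pdb").read(), "A"))` would write chain A of a locally downloaded 9ANT file; both file names are placeholders.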
Output
The output of the RAPHAEL server is divided into three parts:

The top section contains periodicity statistics for the structure (see Figure 1). On the left, several values computed from the structure are listed. For each value, the list below points to the corresponding equation in our paper:

  • Total score: Equation 3 in the paper.
  • Variance in period matrix: Equation 4 in the paper.
  • Variance in sequence separation for 6 Å contacts: Equation 6 in the paper.
  • Predicted repeat length: the average period in the period matrix with outliers removed. See the Methods section of the paper for details.
  • Minimum distance between N and C terminus divided by protein length: Equation 5 in the paper, normalized by the protein length.
  • Concentration of 15 Å contacts with sequence separation > 55: Equation 6 in the paper.
  • SVM SCORE: the final output of the machine learning algorithm. Negative values indicate globular structures, while positive values indicate repeat structures. The larger the magnitude, the more confident the prediction that the structure is repeated or globular.
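The predicted repeat length is described only as the average period after outlier removal. As an illustration, here is a minimal Python sketch that assumes outliers are periods lying more than `k` median absolute deviations (MAD) from the median; this rule and the function names are assumptions, and the Methods section of the paper describes the procedure RAPHAEL actually uses. A small helper applying the sign convention of the SVM score is included.

```python
from statistics import mean, median


def predicted_repeat_length(periods: list[float], k: float = 3.0) -> float:
    """Mean period after discarding outliers (illustrative rule, k >= 1).

    Assumption: an outlier is a period farther than k median absolute
    deviations (MAD) from the median. RAPHAEL's own rule may differ.
    """
    if not periods:
        raise ValueError("at least one period is required")
    med = median(periods)
    mad = median(abs(p - med) for p in periods)
    kept = [p for p in periods if abs(p - med) <= k * mad]
    return float(mean(kept))


def svm_label(score: float) -> str:
    """Negative -> globular, positive -> repeat; zero is left undecided (assumption)."""
    if score > 0:
        return "repeat"
    if score < 0:
        return "globular"
    return "undecided"
```

With the illustrative periods `[24, 25, 24, 23, 25, 48, 12]`, the median is 24 and the MAD is 1, so 48 and 12 are discarded and the sketch returns 24.2. A median-based rule is used here because, unlike a mean and standard deviation, it is not itself pulled towards the outliers it is meant to remove.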
Figure 1. The JMOL structure and periodicity statistics.
On the right, the structure is visualized in a JMOL applet.

Figure 2 shows the sequence information output by RAPHAEL, in three forms:

  • Amino acids.
  • Secondary structure.
  • Insertions: each residue is assigned either to a periodic part of the structure or to an insertion (a non-periodic bulge). See the paper for details.
Figure 2. The sequence information in three forms: amino acids, secondary structure, and whether each residue is considered periodic.
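Since each residue is labelled as periodic or as an insertion, contiguous insertion stretches can be recovered from that track. The Python sketch below assumes the track has been read into a list of per-residue flags (for example, from a string in which insertions are marked with `I`); this encoding and the function name `insertion_segments` are hypothetical, not the server's actual output format. Positions are counted along the sequence, so they may differ from PDB residue numbers.

```python
def insertion_segments(is_insertion: list[bool], min_length: int = 1) -> list[tuple[int, int]]:
    """Return 1-based (start, end) sequence positions of insertion stretches.

    is_insertion holds one flag per residue, in sequence order
    (hypothetical input derived from the per-residue insertion track).
    """
    segments = []
    start = None
    for i, flag in enumerate(is_insertion):
        if flag and start is None:
            start = i  # an insertion stretch begins here
        elif not flag and start is not None:
            segments.append((start, i - 1))  # the stretch ended at the previous residue
            start = None
    if start is not None:
        # The sequence ends inside an insertion.
        segments.append((start, len(is_insertion) - 1))
    # Convert to 1-based positions and drop stretches shorter than min_length.
    return [(s + 1, e + 1) for s, e in segments if e - s + 1 >= min_length]
```

For example, `insertion_segments([c == "I" for c in "PPPIIIIPPPPI"])` returns `[(4, 7), (12, 12)]`; passing `min_length=2` keeps only `(4, 7)`.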

Two images are also generated: (i) the period matrix and (ii) a profile measuring the variance from the calculated period. The period matrix plots the frequency of periods emanating from each residue. The variance profile tends towards 0 for large deviations from the calculated average period and towards infinity for highly periodic regions. See the section of the paper titled "2.4 Finding insertions".
Figure 3. The period matrix and the variance profile from the period matrix.
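The exact definition of the variance profile is given in section 2.4 of the paper. As a hedged sketch of a quantity with the limiting behaviour described above, the Python code below takes, for each residue, the periods emanating from it in the period matrix and returns the reciprocal of their mean squared deviation from the average period: it approaches 0 when the periods deviate strongly and becomes infinite when all periods match. The input layout and the function name `variance_profile` are assumptions for illustration.

```python
import math


def variance_profile(periods_per_residue: list[list[float]], avg_period: float) -> list[float]:
    """Reciprocal mean squared deviation of each residue's periods from avg_period.

    Illustrative stand-in for the profile defined in section 2.4 of the paper:
    values approach 0 for large deviations and become infinite when every
    period equals avg_period.
    """
    profile = []
    for periods in periods_per_residue:
        if not periods:
            # Assumption: a residue with no periods is treated as non-periodic.
            profile.append(0.0)
            continue
        msd = sum((p - avg_period) ** 2 for p in periods) / len(periods)
        profile.append(math.inf if msd == 0 else 1.0 / msd)
    return profile
```

For example, `variance_profile([[24, 24], [20, 28], []], 24)` gives `[inf, 0.0625, 0.0]`.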

Examples
Below is a link to sample output from the RAPHAEL server.

Example - 1hm9 chain A: this chain contains a large inserted domain at the N terminus, which is detected as such by RAPHAEL.

References

If you use the server in work leading to publications, please cite:
  • Methods paper:
    Ian Walsh, Francesco G. Sirocco, Giovanni Minervini, Carlo Ferrari and Silvio C.E. Tosatto,
    RAPHAEL: Recognition, periodicity and insertion assignment of solenoid protein structures. Bioinformatics, 28 (24), pp. 3257-3264. (2012)


    (c) 04/2012 Ian Walsh and Silvio Tosatto