RAPHAEL - Data resource
Precompiled sets
This page provides precompiled protein test sets, generated with RAPHAEL, that are believed to be of wider use.
To determine periodicity information for your own PDB structures, use
the RAPHAEL server.
The test sets contain raw lists of CATH domains and PDB chains.
(NB: The test sets are quite large, in the range of tens of MB per download.)
TRAINING AND TESTING SETS, Downloads and rankings:
- 35% sequence identity:
  - 247 non-repeats:
    294 CATH domains used for training and testing.
  - 105 repeats:
    104 CATH domains used for training and testing.
  - Insertions:
    Residues visually assigned as periodic or as not part of the periodicity.
  - Repeat length:
    The length of the repeating unit, assigned visually.
DISCOVERY SETS, Downloads and rankings:
- 35% sequence identity:
  - Ranked CATH 3.4 domain list (SVM score ≥ 0)
    All CATH domains with an SVM score ≥ 0, each with its period matrix, a PyMOL-generated JPEG image and a link to CATH.
    Useful for a quick visualization of the CATH 3.4 repeated domains. Domains are sorted from the highest SVM score down to 0
    (a higher SVM score indicates higher periodicity). Redundancy reduction at S level (pairwise sequence identity ≤ 35%).
  - CATH 3.4 domains (SVM score ≥ 0)
    All CATH domains with an SVM score ≥ 0, as PDB and FASTA files. Redundancy reduction at S level (pairwise sequence identity ≤ 35%).
  - Ranked CATH 3.4 domain list (SVM score ≥ 1)
    All CATH domains with an SVM score ≥ 1, each with its period matrix, a PyMOL-generated JPEG image and a link to CATH.
    Useful for a quick visualization of the CATH 3.4 repeated domains. Domains are sorted from the highest SVM score down to 1
    (a higher SVM score indicates higher periodicity). Redundancy reduction at S level (pairwise sequence identity ≤ 35%).
  - CATH 3.4 domains (SVM score ≥ 1)
    All CATH domains with an SVM score ≥ 1, as PDB and FASTA files. Redundancy reduction at S level (pairwise sequence identity ≤ 35%).
- 60% sequence identity:
  - Ranked CATH 3.4 domain list (SVM score ≥ 0)
    All CATH domains with an SVM score ≥ 0, each with its period matrix, a PyMOL-generated JPEG image and a link to CATH.
    Useful for a quick visualization of the CATH 3.4 repeated domains. Domains are sorted from the highest SVM score down to 0
    (a higher SVM score indicates higher periodicity). Redundancy reduction at S level (pairwise sequence identity ≤ 60%).
  - CATH 3.4 domains (SVM score ≥ 0)
    All CATH domains with an SVM score ≥ 0, as PDB and FASTA files. Redundancy reduction at S level (pairwise sequence identity ≤ 60%).
  - Ranked CATH 3.4 domain list (SVM score ≥ 1)
    All CATH domains with an SVM score ≥ 1, each with its period matrix, a PyMOL-generated JPEG image and a link to CATH.
    Useful for a quick visualization of the CATH 3.4 repeated domains. Domains are sorted from the highest SVM score down to 1
    (a higher SVM score indicates higher periodicity). Redundancy reduction at S level (pairwise sequence identity ≤ 60%).
  - CATH 3.4 domains (SVM score ≥ 1)
    All CATH domains with an SVM score ≥ 1, as PDB and FASTA files. Redundancy reduction at S level (pairwise sequence identity ≤ 60%).
- 40% sequence identity:
  - Ranked PDB chain list (SVM score ≥ 0)
    All PDB chains with an SVM score ≥ 0, each with its period matrix and a PyMOL-generated JPEG image.
    Useful for a quick visualization of the PDB repeated chains. Chains are sorted from the highest SVM score down to 0
    (a higher SVM score indicates higher periodicity). Redundancy reduction at 40% using CD-HIT.
  - PDB chains (SVM score ≥ 0)
    All PDB chains with an SVM score ≥ 0, as PDB and FASTA files. Redundancy reduction at 40% using CD-HIT.
  - Ranked PDB chain list (SVM score ≥ 1)
    All PDB chains with an SVM score ≥ 1, each with its period matrix and a PyMOL-generated JPEG image.
    Useful for a quick visualization of the PDB repeated chains. Chains are sorted from the highest SVM score down to 1
    (a higher SVM score indicates higher periodicity). Redundancy reduction at 40% using CD-HIT.
  - PDB chains (SVM score ≥ 1)
    All PDB chains with an SVM score ≥ 1, as PDB and FASTA files. Redundancy reduction at 40% using CD-HIT.
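The score thresholds and ordering used for the ranked lists above can be illustrated with a minimal Python sketch. This is not the RAPHAEL code: the function name `rank_by_svm_score`, the entry identifiers and the scores are all hypothetical, and nothing is assumed here about the actual file layout of the ranked lists.

```python
def rank_by_svm_score(scored_entries, threshold=0.0):
    """Keep entries whose score is >= threshold, sorted from highest score down."""
    kept = [(name, score) for name, score in scored_entries if score >= threshold]
    # Higher SVM score indicates higher periodicity, so sort in descending order.
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical identifiers and scores, for illustration only.
example = [
    ("entry_a", 1.7),
    ("entry_b", -0.4),
    ("entry_c", 0.2),
    ("entry_d", 1.0),
]

ranked_at_0 = rank_by_svm_score(example, threshold=0.0)
ranked_at_1 = rank_by_svm_score(example, threshold=1.0)
```

With a threshold of 1 only the strongest candidates remain, which is why the "SVM score ≥ 1" sets are subsets of the corresponding "SVM score ≥ 0" sets.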
Instructions:
Each set comes as a ".tar.gz" file, which has to be unzipped and
untarred. To do this on Linux, type the following into a shell:
$> cd wherever-you-downloaded-the-package
$> tar xvfz archive-name.tar.gz
$> cd archive-name
In the archive-name directory you will find the *.pdb structures and *.fasta sequences
in the folders named "pdb" and "fasta", respectively.
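The same unpacking step can also be scripted. The following is a minimal Python sketch using the standard `tarfile` and `pathlib` modules; the helper names `extract_set` and `list_structures_and_sequences` are hypothetical, and the sketch assumes only the layout described above (a top-level directory containing "pdb" and "fasta" folders).

```python
import tarfile
from pathlib import Path


def extract_set(archive_path, dest_dir="."):
    """Unpack a .tar.gz archive into dest_dir and return dest_dir as a Path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Only extract archives obtained from a source you trust.
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest)
    return dest


def list_structures_and_sequences(set_dir):
    """Return sorted lists of the *.pdb and *.fasta files in the 'pdb' and 'fasta' folders."""
    root = Path(set_dir)
    pdb_files = sorted((root / "pdb").glob("*.pdb"))
    fasta_files = sorted((root / "fasta").glob("*.fasta"))
    return pdb_files, fasta_files


# Example usage (archive-name is a placeholder, as in the shell commands above):
#   dest = extract_set("archive-name.tar.gz")
#   pdb_files, fasta_files = list_structures_and_sequences(dest / "archive-name")
#   print(len(pdb_files), "structures;", len(fasta_files), "sequences")
```

Comparing the two file counts is a quick sanity check that the download and extraction completed.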
If you encounter any problems using the downloaded test sets, please contact the author at
silvio@cribi.unipd.it.
(c) Ian Walsh for Biocomputing UP, 04/2012