Features
All applications are in the bin folder, where you will find a set of ready-to-use programs. Each of them accepts the -h option, which lists the available options. The following sections describe every application and give at least one usage example for each.
Biopool library
The Biopool class implementation follows the composite design pattern; for a complete description of the class hierarchy see the [Doxygen documentation]. Without going into implementation details, a Protein object is just a container for vectors representing chains. Each vector has 2 elements: the Spacer and the LigandSet. The Spacer is the container for AminoAcid objects, whereas the LigandSet is a container for all other molecules and ions, including DNA/RNA chains. Ultimately all molecules, both in the Spacer and in the LigandSet, are collections of Atom objects. The main feature of Biopool is that each AminoAcid object in the Spacer is connected to its neighbours by means of one rotational vector plus one translational vector. This representation makes it easy to modify the protein structure, and many functions are provided to modify, perturb or transform the relative position of residues efficiently.
Figure: rotation and translation vectors.
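The idea of linking consecutive residues through a rotation plus a translation can be illustrated with a small standalone sketch. It does not use the Biopool classes: Vec3, Mat3, rotationZ, transformPoint and moveDownstream are hypothetical names introduced only for this example, and the rotation is written as a matrix for simplicity. The point is that changing one rotation/translation pair moves every downstream point in a single pass.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal types for illustration; these are not the Biopool classes.
using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;

// Rotation matrix for a rotation of `angle` radians around the z axis.
Mat3 rotationZ(double angle) {
    const double c = std::cos(angle);
    const double s = std::sin(angle);
    Mat3 r{};
    r[0] = {c, -s, 0.0};
    r[1] = {s, c, 0.0};
    r[2] = {0.0, 0.0, 1.0};
    return r;
}

// Apply the rigid transformation p' = R * p + t to a single point.
Vec3 transformPoint(const Mat3& rot, const Vec3& trans, const Vec3& p) {
    Vec3 out = trans;
    for (std::size_t i = 0; i < 3; ++i)
        for (std::size_t j = 0; j < 3; ++j)
            out[i] += rot[i][j] * p[j];
    return out;
}

// Move every point from index `first` to the end; earlier points are untouched.
void moveDownstream(std::vector<Vec3>& points, std::size_t first,
                    const Mat3& rot, const Vec3& trans) {
    assert(first <= points.size());
    for (std::size_t i = first; i < points.size(); ++i)
        points[i] = transformPoint(rot, trans, points[i]);
}
```

Calling moveDownstream(points, k, rot, trans) leaves the first k points in place and repositions the rest, which mirrors how a change in one residue's relative placement propagates along a chain.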
Figure: object representation.
Victor includes different packages: Biopool, Lobo and Energy. Each package is identified by a directory, starting with a capital letter, in the main Victor path. Inside each package you will find the Source folder containing the class code and the APPS directory containing useful utilities. In the main path you will find the data folder, which contains symbolic links to the data files used by the individual packages. The main Victor path should also contain the bin directory, which holds the most important programs, simply copied from the APPS folders.
Parsing a PDB file (PdbLoader)
Biopool uses the PdbLoader class to load PDB files. By default it loads all standard residues and hetero atoms, excluding nucleotides and water molecules. When possible it also tries to place hydrogen atoms on every amino acid included in the Spacer and determines the secondary structure with the DSSP algorithm. The simplest way to load a PDB into a Protein object is:
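#include <PdbLoader.h>
#include <Protein.h>
#include <fstream>
#include <iostream>
#include <string>
using namespace Victor::Biopool;
using namespace Victor;
int main( int argc, char* argv[] ) {
    std::string inputFile = "MyPdbFile.pdb";
    std::ifstream inFile( inputFile.c_str() );
    if ( !inFile ) { // stop early if the file cannot be opened
        std::cerr << "Cannot open " << inputFile << std::endl;
        return 1;
    }
    PdbLoader pl(inFile); // creates the PdbLoader object
    Protein prot;
    prot.load( pl ); // loads the PDB content into the Protein object
    return 0;
}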
A complete explanation of how to generate a new project is available in the tutorial section.
Get the secondary structure
There are 3 different ways to get the secondary structure in Victor. The first (inaccurate) simply parses the HELIX and SHEET fields in the PDB file. The second infers the secondary structure from torsion angles. The last uses an implementation of the DSSP algorithm; it may show small (negligible) differences compared to the original algorithm, but it is the most accurate way to calculate the secondary structure.
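As a rough illustration of the second approach, the sketch below assigns a one-letter state to each residue from its phi/psi pair. It is not the Victor implementation: the Torsion struct, the classify and assign functions, and above all the angle thresholds are assumptions chosen only to mark approximate helical and extended regions of the Ramachandran plot.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Backbone torsion angles of one residue, in degrees (hypothetical type).
struct Torsion {
    double phi;
    double psi;
};

// Classify one residue as helix (H), strand (E) or coil (C).
// The region boundaries are illustrative assumptions, not Victor's values.
char classify(const Torsion& t) {
    assert(t.phi >= -180.0 && t.phi <= 180.0);
    assert(t.psi >= -180.0 && t.psi <= 180.0);
    const bool helix = t.phi >= -160.0 && t.phi <= -20.0 &&
                       t.psi >= -120.0 && t.psi <= 50.0;
    const bool strand = t.phi >= -180.0 && t.phi <= -45.0 &&
                        (t.psi >= 90.0 || t.psi <= -170.0);
    if (helix) return 'H';
    if (strand) return 'E';
    return 'C';
}

// Build the secondary structure string for a whole chain.
std::string assign(const std::vector<Torsion>& chain) {
    std::string ss;
    for (const Torsion& t : chain) ss += classify(t);
    return ss;
}
```

A per-residue rule like this ignores hydrogen bonding and neighbouring residues, which is one reason a DSSP-style assignment is more accurate.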
Align
Remember to set environment variables before running any application:
export VICTOR_ROOT=/<your_folder>/victor/
export PATH=$PATH:/<your_folder>/victor/bin/
For a preliminary overview and explanation of Align classes visit the Introduction page.
Supposing you have already found a template candidate, you need to align it against your target sequence. The subali application lets you choose from a wide range of algorithms, strategies and parameters.
Strategy:
- Sequence to sequence
- Profile to sequence
- Profile to profile
Alignment algorithm:
- Local
- Global
- Freeshift
Blosum substitution matrix:
- 62
- 45
- 50
- 80
Weighting scheme (only for profiles):
- PSIC (Sunyaev et al., 1999)
- Henikoff (Henikoff & Henikoff, 1994)
- SecDivergence (Rychlewski et al., 2000)
Scoring function (only for profile to profile alignments):
- Sum of pairs
- CrossProduct
- LogAverage (Ohsen and Zimmer, 2003; von Ohsen et al., 2003)
- Dot product
- DotPFreq (Wang and Dunbrack, 2004)
- DotPOdds (Wang and Dunbrack, 2004)
- EDistance (Euclidean distance)
- Pearson (Pietrokovski, 1996)
- JensenShannon (Yona and Levitt, 2002)
- Atchley metric
- AtchleyDistance (Atchley et al., 2005)
- AtchleyCorrelation (Atchley et al., 2005)
The simplest code fragment to generate a global alignment is:
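Blosum sub(matrixFile); // substitution matrix loaded from matrixFile
SequenceData ad(2, seq1, seq2); // alignment data holding the 2 sequences
ScoringS2S sc(&sub, &ad); // sequence to sequence scoring scheme
NWAlign nwAlign(&sc, &ad, gapPenalty, gapExtension); // global alignment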
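To make the dynamic programming behind a global alignment concrete, here is a minimal standalone Needleman-Wunsch sketch. It does not use the Victor classes: globalScore is a hypothetical function, it scores with a simple match/mismatch scheme instead of a Blosum matrix, and it uses a single linear gap penalty instead of separate open and extension penalties.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Score of the best global alignment of `a` and `b`.
// `gap` is the (non-positive) score added for every gap position.
int globalScore(const std::string& a, const std::string& b,
                int match, int mismatch, int gap) {
    assert(gap <= 0);
    const std::size_t n = a.size();
    const std::size_t m = b.size();
    std::vector<std::vector<int>> f(n + 1, std::vector<int>(m + 1, 0));

    // First column and first row: a prefix aligned only against gaps.
    for (std::size_t i = 1; i <= n; ++i) f[i][0] = static_cast<int>(i) * gap;
    for (std::size_t j = 1; j <= m; ++j) f[0][j] = static_cast<int>(j) * gap;

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const int diag =
                f[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);
            const int up = f[i - 1][j] + gap;    // gap in b
            const int left = f[i][j - 1] + gap;  // gap in a
            f[i][j] = std::max({diag, up, left});
        }
    }
    return f[n][m];  // global alignment: score of the bottom-right cell
}
```

A local alignment would instead clamp every cell at zero and take the maximum over the whole matrix, which is the main difference between the algorithm choices listed above.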
Sequence to sequence alignment
subali can generate complex alignments using the full functionality provided by the software. One of the alignment types is sequence to sequence; with this type, neither a weighting scheme nor a scoring function is used.
subali --in ../samples/AlignSeq2SeqData/input.fasta
The input.fasta file should contain the target and the template sequences in FASTA format.
The application sets some default values, which the user can change with the corresponding options. For more detail use:
subali -h
The values used by default are:
Alignment algorithm : local
Suboptimal alignments : 1
Substitution matrix : blosum62
Gap function : AGP
GAP open : 12.00
Gap extension : 3
Secondary information : no
Output : screen
For more alignment types see the Tutorial.
Energy
Remember to set environment variables before running any application:
export VICTOR_ROOT=/<your_folder>/victor/
export PATH=$PATH:/<your_folder>/victor/bin/
How to obtain the solvation potential
pdb2solv is an application that creates a file containing the frequencies of occurrence of each residue type with burial r, which are needed to derive the solvation potentials for all the amino acids in a PDB. The solvation potential for a residue a is defined as:
S = RT * ln(fa(r)/f(r))
Where r is the degree of residue burial, fa(r) is the frequency of occurrence of residue a with burial r and f(r) is the frequency of occurrence of all residues with burial r.
The degree of burial (r) of a residue is defined as the number of Cβ atoms (of other residues) lying in a sphere centred in its Cβ with a radius of 10 Å (non polar) or 7 Å (polar).
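The two definitions above translate directly into code. The sketch below is a standalone illustration, not the pdb2solv source: Point, burial and solvationPotential are hypothetical names, atoms exactly on the sphere surface are counted as inside (an assumption), and the formula is implemented exactly as written above, with rt standing for the product RT.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cartesian coordinates of a C-beta atom, in Angstrom (hypothetical type).
struct Point {
    double x, y, z;
};

// Degree of burial: number of C-beta atoms of other residues lying within
// `radius` of the C-beta at position `index`.
int burial(const std::vector<Point>& cb, std::size_t index, double radius) {
    assert(index < cb.size());
    int count = 0;
    for (std::size_t i = 0; i < cb.size(); ++i) {
        if (i == index) continue;  // skip the residue itself
        const double dx = cb[i].x - cb[index].x;
        const double dy = cb[i].y - cb[index].y;
        const double dz = cb[i].z - cb[index].z;
        if (dx * dx + dy * dy + dz * dz <= radius * radius) ++count;
    }
    return count;
}

// S = RT * ln(fa(r) / f(r)), where `fa` and `f` are the two frequencies.
double solvationPotential(double fa, double f, double rt) {
    assert(fa > 0.0 && f > 0.0);
    return rt * std::log(fa / f);
}
```

With this definition the potential is zero when a residue type occurs at burial r exactly as often as residues in general, and its sign changes depending on whether the residue type is over- or under-represented at that burial.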
To obtain the solv.par file used by pdb2energy, frst and other applications, you need to run this command for every PDB in your dataset.
pdb2solv -i ../samples/119L.pdb
The output depends on the given options and considers a maximum of 30 bins. By default it is written to test.out; use the -o option to set a different name.
Non polar output file format
total quantity of residues evaluated | AA type(3L) | frequencies
Polar output file format
P | total quantity of residues evaluated | AA type(3L) | frequencies | Polar frequency |
For a detailed example see pdb2solv example
How to obtain torsion angles from a PDB
The pdb2tor application extracts the torsion angles of each residue. As input it takes a PDB file and the corresponding chain, or a file listing PDB ids, which can optionally include the chain. If the chain is omitted, the application uses the first parsed chain.
Structure of the PDB file list
Using the first chain of each PDB:
PDBID
PDBID
PDBID
To use a specific chain for each PDB, add the --complete option:
PDBID chain
PDBID chain
PDBID chain
This application can also be used to generate the tor.par file used by the TAP application. To generate it, run the following command on the TOP500H database:
pdb2tor -I <filelist> --complete
Output format (-A option: gives the per-residue phi, psi, omega, chi, pre-phi and pre-psi angles)
AA Type(one letter format) | Number | pre-phi | pre-psi | phi | psi | omega | chi1 | chi2
Output format (using -r option)
phi | psi | AA type | pre phi | pre psi | omega | #carbons | chi1 | chi2
For a detailed example see pdb2tor example
How to obtain normalized energy from a PDB
The pdb2torenergy application calculates a pseudo-energy that evaluates the quality of a structural model, expressed as a single real number. This program lets you obtain the normalized energy described in the TAP paper. By default the program uses the tor.par file created by pdb2tor. Multiple PDBs or PDB chains can be used to calculate the normalized energy. Depending on the options, the energy is calculated for the entire chain or for each individual residue.
Per residue, one value for each residue in the PDB:
pdb2torenergy -i samples/119L.pdb --allchains -p
Per pdb, one value:
pdb2torenergy -i samples/119L.pdb --allchains
For chain A:
pdb2torenergy -i samples/1IHQ.pdb -c A
For a detailed example see pdb2torenergy example
How to obtain FRST value from a PDB
The frst application calculates the FRST value using the solvation potential, the torsion angles and the RAPDF potential. It needs some input files, all of which can be generated with other Energy/Lobo applications; alternatively you can use the pre-generated ones available in the victor/data/ folder.
Default Input files
- tor.par, created by pdb2tor
- solv.par, created by pdb2solv
- ram.par
The -v option will print the following energies separately: Rapdf, Solvation, Hydrogen bonds and Torsion.
To calculate the average energy over a chain, for example in an NMR ensemble:
frst -i samples/16PK.pdb
To calculate the average over many PDB files:
frst -I samples/filelist
For a detailed example see frst example
How to obtain TAP value from a PDB
The pdb2tap application evaluates the quality of a protein model or of a structure determined by X-ray crystallography. The method is based on a relative pseudo-energy calculated from the side-chain and backbone torsion angle propensities. Both are normalized against the global minimum and maximum for the protein under consideration. The TAP energy, also known as normalized torsion angle propensity, gives an indication of the degree of nativeness of the protein model.
The application requires the tor.par file that can be created with the pdb2tor application.
Output format
phi | psi | AA type | pre phi | pre psi | omega | #carbons | chi1 | chi2
Values close to 1 are associated with a good, native-like structure, whereas poor structures give values close to 0. An example is available at: http://www.biomedcentral.com/content/supplementary/1471-2105-8-155-s1.txt
Input data
The application can be used with one or many PDBs and PDB chains.
Single structure Xray using one chain:
pdb2tap -i samples/102M.pdb -c A
Output:
Single structure Xray using all chains (all chains in the PDB):
pdb2tap -i samples/1A3W.pdb -P sal --allchains
Multiple models NMR using one chain:
pdb2tap -i samples/1IHQ.pdb -P sal -c A --nmr
Prints the TAP value for each model, followed by the average, standard deviation, minimum and maximum TAP value over all models.
Multiple models NMR using all chains (all chains in the PDB):
pdb2tap -i samples/1IHQ.pdb -P sal --allchains --nmr
Prints the average TAP value over all chains in each model, followed by the average, standard deviation, minimum and maximum over all models.
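The summary printed for NMR ensembles can be reproduced from the per-model values with a few lines of C++. The sketch below is only an illustration: Summary and summarize are hypothetical names, and it assumes the population standard deviation (dividing by the number of models), which may differ from what pdb2tap actually does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Summary statistics over the per-model values (hypothetical type).
struct Summary {
    double mean;
    double sd;
    double min;
    double max;
};

// Mean, population standard deviation, minimum and maximum of `values`.
Summary summarize(const std::vector<double>& values) {
    assert(!values.empty());
    double sum = 0.0;
    for (double v : values) sum += v;
    const double mean = sum / static_cast<double>(values.size());

    double squares = 0.0;
    for (double v : values) squares += (v - mean) * (v - mean);
    const double sd = std::sqrt(squares / static_cast<double>(values.size()));

    const auto range = std::minmax_element(values.begin(), values.end());
    return {mean, sd, *range.first, *range.second};
}
```

Passing one value per model gives the four numbers of the summary; switching to the sample standard deviation only requires dividing by the number of models minus one.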
For a detailed example see pdb2tap example
For more reference see:
Fine-grained statistical torsion angle potentials are effective in discriminating native protein structures. PMID: 16712465
Lobo
Lobo is a Loop Modeling software that uses pre-calculated Look-Up Tables (LUTs) that represent loop fragments of various sizes to speed up calculation. LUTs can be generated once and stored, only requiring loading during loop modeling.
Conformations are produced by recursively dividing the segment until the backbone coordinates can be derived analytically.
Remember that the environment variables must be set before running any of the following applications. Be careful to include the final "/" in the path.
export VICTOR_ROOT=/<your_folder>/victor/
export PATH=$PATH:/<your_folder>/victor/bin/
Obtain torsion angles from a PDB
How to obtain torsion angles of a PDB
loop2torsion prints the phi and psi angles of all amino acids in a selected chain.
loop2torsion -i samples/2R8O.pdb -c A
The output contains one line per residue with the phi and psi angles, followed by a B-factor of 1.
-72.1 157 1.0
-165 142 1.0
122 -172 1.0
-126 98.1 1.0
....
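For readers who want to check such values by hand, a torsion angle is the dihedral angle defined by four consecutive atoms: phi of residue i uses C(i-1), N(i), CA(i), C(i), and psi uses N(i), CA(i), C(i), N(i+1). The following standalone sketch computes a dihedral with the common atan2 formulation. It is not the loop2torsion source; Vec3, sub, dot, cross and dihedral are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>

// Cartesian coordinates of one atom (hypothetical type).
struct Vec3 {
    double x, y, z;
};

Vec3 sub(const Vec3& a, const Vec3& b) {
    return {a.x - b.x, a.y - b.y, a.z - b.z};
}

double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Dihedral angle in degrees, in the range [-180, 180], defined by the four
// points p0-p1-p2-p3 (rotation around the p1-p2 bond).
double dihedral(const Vec3& p0, const Vec3& p1, const Vec3& p2, const Vec3& p3) {
    const Vec3 b1 = sub(p1, p0);
    const Vec3 b2 = sub(p2, p1);
    const Vec3 b3 = sub(p3, p2);
    const Vec3 n1 = cross(b1, b2);  // normal of the plane p0, p1, p2
    const Vec3 n2 = cross(b2, b3);  // normal of the plane p1, p2, p3
    const double b2len = std::sqrt(dot(b2, b2));
    assert(b2len > 0.0);
    const double y = b2len * dot(b1, n2);
    const double x = dot(n1, n2);
    return std::atan2(y, x) * 180.0 / std::acos(-1.0);
}
```

Using atan2 instead of acos keeps the sign of the angle, which is needed to distinguish, for example, phi = -60 from phi = +60.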
How to cluster angle data
ClusterRama clusters a Ramachandran distribution. The input file can be, for example, the tor.par file generated with the Energy module (see the Energy section). To obtain the clustered data using a cutoff value of 100:
ClusterRama -i data/tor.par -o outRama -c 100.0
The output contains the number of values in the input file, the angles and the corresponding residue name:
12
-55.07 -44.61 GLY
76.11 -172.4 GLY
-139.2 129 GLY
...
How to generate clustered lookup tables
LoopTableTest generates tables of protein entries for the Lobo algorithm.
LoopTableTest -A 1 -B 1 -O output.lt -R outRama -S s
The output.lt file created is not a plain text file; use the LoopTablePlot application to print the corresponding angle values:
Min:
EP: -4.126 ED: -1.281 N: -0.9997 MP: -1.582 MD: -0.4919 MN: -0.9949
EP: 2.6 ED: -1.332 N: -1 MP: 1.521 MD: 0.4671 MN: -0.8217
EP: -3.966 ED: -1.289 N: -0.9836 MP: -1.598 MD: -0.7378 MN: -0.5885
Max:
EP: 3.437 ED: 1.022 N: 0.6597 MP: 0.9131 MD: 0.5203 MN: 0.8068
EP: 4.856 ED: 0.1761 N: 0.6105 MP: 2.486 MD: 0.9987 MN: 0.6888
EP: 3.592 ED: 1.27 N: 0.9813 MP: 1.307 MD: 0.8342 MN: 0.7185
----------------------------
Entry 0
EP: -2.737 ED: -0.01248 N: -0.02252 MP: -0.8014 MD: 0.2146 MN: 0.6219
EP: 2.699 ED: -1.172 N: 0.5104 MP: 1.879 MD: 0.921 MN: -0.3856
EP: 1.984 ED: -0.6955 N: -0.8596 MP: 1.022 MD: 0.3252 MN: 0.6816
To create the Ramachandran input file that contains the clustered data, use the ClusterRama application.
How to generate LUTs using Ramachandran clustered data
The ClusterLoopTable program creates a new clustered LUT from a LUT already created with LoboLUT or loboLUT_all, using a given cutoff value. In this example a cutoff of 10 is applied to a LUT of length 5.
ClusterLoopTable -I data/aa5.lt -O data/aa5clustered.lt -C 10.0
The output is not a plain text file; to see its content use the LoopTablePlot application.
How to analyze the backbone geometry of a PDB
BackboneAnalyzer is an application that analyzes a PDB file in terms of bond lengths and bond angles. As input it takes the PDB file and the chain to evaluate:
backboneAnalyzer -i samples/2R8O.pdb -c A
The printed output includes the minimum, maximum and average bond lengths and bond angles, and the corresponding standard deviations.
-------------------------------------------------------
            Bond Lengths              Bond Angles
Num   N->CA   CA->C'  C'->N    N->CA   CA->C'  C'->N
-------------------------------------------------------
Min:  1.4450  1.5019  1.3206   116.87  104.83  112.55
Max:  1.4804  1.5479  4.0701   158.03  118.34  158.56
-------------------------------------------------------
Avg:  1.4636  1.5272  1.3505   121.58  111.71  116.73
SD:   0.0054  0.0067  0.2074   2.45    2.16    1.98
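The two quantities in this table are simple functions of the atomic coordinates. As a standalone sketch (not the BackboneAnalyzer source; Atom, bondLength and bondAngle are hypothetical names), a bond length is the distance between two atoms and a bond angle is the angle at the middle atom of three consecutive atoms:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Cartesian coordinates of one backbone atom, in Angstrom (hypothetical type).
struct Atom {
    double x, y, z;
};

// Distance between two atoms.
double bondLength(const Atom& a, const Atom& b) {
    const double dx = a.x - b.x;
    const double dy = a.y - b.y;
    const double dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Angle a-b-c in degrees, measured at the middle atom b.
double bondAngle(const Atom& a, const Atom& b, const Atom& c) {
    const double ab = bondLength(a, b);
    const double cb = bondLength(c, b);
    assert(ab > 0.0 && cb > 0.0);
    const double d = (a.x - b.x) * (c.x - b.x) +
                     (a.y - b.y) * (c.y - b.y) +
                     (a.z - b.z) * (c.z - b.z);
    // Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    const double cosine = std::max(-1.0, std::min(1.0, d / (ab * cb)));
    return std::acos(cosine) * 180.0 / std::acos(-1.0);
}
```

A value far from the average, such as the maximum C'->N length of 4.0701 in the table above, typically points to a chain break or a missing residue rather than to a real peptide bond.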
How to create LUTs
How to create a LUT
The construction of the LUTs is separate from modelling and has to be executed only once. LoboLUT is the program that creates a look-up table of a specific length. To create a LUT for modelling loops of length N, it is first necessary to create LUTs of sizes 2 to N/2. In every case the application creates a binary file containing the corresponding values for the selected length.
Create a first LUT of length 2:
loboLUT -A 1 -B 1 -O aa2.lt --table <destination path>/ -R data/tor.par
Add 1 residue:
loboLUT -A aa2.lt -B 1 -O aa3.lt --table <destination path>/ -R data/tor.par
Create a table of length 4 combining two smaller LUTs.
loboLUT -A aa2.lt -B aa2.lt -O aa4.lt --table <destination path>/ -R data/tor.par
To avoid the tedious task of creating all the LUTs by hand, you can use LoboLUT_all, which does it for you automatically.
N.B. Remember to set the VICTOR_ROOT path and to select a convenient destination path.
How to create LUTs for a fragment of size N
LoboLUT_all is a Perl script that automatically generates all the LUTs needed to model a fragment of length N. For example, to create LUTs for a fragment of length 5 you can run the following command:
loboLUT_all -c 5
This will create LUTs for fragments of length 2, 3 and 5. For more details see also loboLUT_all example
Convert a binary LUT into text
LUTs are generally saved in binary format for both performance and space efficiency. LoopTablePlot converts a LUT into a human-readable text format. For example, to generate the corresponding plot for the LUT aa5.lt (created previously):
LoopTablePlot -i aa5.lt -o <plot output file> -s l
The -s option defines the numerical precision (small=s, medium=m, large=l), which strongly affects the storage size. For a detailed example see LoopTablePlot example
How to model a loop
How to identify loops in PDBs
CreateLoopTestset is a program that allows you to model a single loop. It gives the user full flexibility in setting the parameters for ranking and modelling. It finds the starting and ending positions of loops in one or multiple PDB files. Its output can be used to model the loop with the LoopModelTest application. To obtain the list of starting and ending points:
createLoopTestset -o listLoops -i samples/filelist
Where the content in filelist is for example:
samples/173D
samples/2MKPC
samples/4JDG
samples/173L
samples/3A0R
The output will be:
index1 (-s): 9 index2 (-e) 13
index1 (-s): 44 index2 (-e) 49
index1 (-s): 52 index2 (-e) 57
index1 (-s): 62 index2 (-e) 70
..........
Where (-s) and (-e) are the starting and ending positions respectively.
How to model a loop
LoopModelTest generates possible loop conformations and creates a PDB file for each solution:
LoopModelTest -i samples/<pdb_file.pdb> -c A -s X -e Y
Where X and Y are the start and end positions obtained with CreateLoopTestset, and -c A tells the program to work on chain A of the given PDB file. Using the information obtained with CreateLoopTestset:
LoopModelTest -i samples/119L.pdb -c A -s 8 -e 12
Remember to create the LUT table for a fragment of length 7 with loboLUT_all.
The new PDB files are created in the working path. The output columns correspond to the global RMS, end RMS, bond length, bond angle and torsion angle:
Results:
1.35 121 180
0 global RMS= 0.416 ( 0.366) end-RMS= 0.234 1.17 126 175
1 global RMS= 0.356 ( 0.295) end-RMS= 0.0822 1.38 121 -176
......
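For reference, an RMS deviation between a modelled loop and the native loop can be computed from matched atom coordinates as sketched below. This is a standalone illustration, not the LoopModelTest source: Coord and rmsd are hypothetical names, and the function assumes that both coordinate sets are already in the same reference frame, so no superposition is performed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cartesian coordinates of one atom, in Angstrom (hypothetical type).
struct Coord {
    double x, y, z;
};

// Root mean square deviation between two equally sized, matched atom lists.
double rmsd(const std::vector<Coord>& a, const std::vector<Coord>& b) {
    assert(!a.empty() && a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double dx = a[i].x - b[i].x;
        const double dy = a[i].y - b[i].y;
        const double dz = a[i].z - b[i].z;
        sum += dx * dx + dy * dy + dz * dz;  // squared distance of atom i
    }
    return std::sqrt(sum / static_cast<double>(a.size()));
}
```

Restricting both lists to the atoms at the end of the loop gives a deviation of the kind reported as end-RMS, a measure of how well the loop closes onto the rest of the structure.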