Documentation and help
Contact
For questions, feedback and suggestions please contact us using this site
Overview
SBS EpiToolKit is a service of the Division for Simulation of Biological Systems at the University of Tuebingen. The aim of this website and its services is to support immunological research. It provides a collection of methods from computational immunology for the prediction of MHC ligands or potential T-cell epitopes. Additionally, SNEPv2 extends epitope prediction with the ability to analyze the influence of protein polymorphisms on the immunogenicity of the resulting polymorphic peptides.
Navigation through the individual steps is guided by the navigation bar at the top of each page of the prediction pipeline. Within the pipeline, the bar contains a short description of the current step. The color of each step in the navigation bar corresponds to the heading colors of this documentation and the help page.
1. Sequence input
   This step provides input fields for entering sequences, or for requesting sequences from databases, for which epitope predictions should be performed.
2. Sequence information
   This step displays the search results for sequences and shows additional information. Sequences can be selected or deselected for further steps of the prediction pipeline.
3. Allele selection
   This step provides the available methods and alleles. Select the desired alleles in the allele tree and start predictions.
4. Prediction results
   This step displays the prediction results. Here, different filters can be applied to the predictions.
The corresponding help page can be accessed during each step of the pipeline by clicking the help button.
Select alleles
In this step the alleles for which predictions should be performed have to be selected. A tree with the default method and the most common alleles is offered (see A6. Default settings). To access all available prediction methods (see A4. Prediction methods) and alleles, click Advanced options. Advanced options offers two ways to create your personal allele selection:
  1. Select prediction methods and peptide lengths for which available alleles should be displayed. Clicking Create tree from selection creates an allele tree containing all available alleles for the specified methods and peptide lengths.
  2. Enter alleles in the input field separated by blanks. Use the common allele format, e.g. A*0201 or B*1501. Click Search alleles to search for the entered alleles and create a tree containing all prediction methods and peptide lengths available for the alleles found. Check the Allele selection bar at the top of the allele selection field to see whether all requested alleles are available.
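
The blank-separated allele input described in option 2 can be checked before it is pasted into the input field. The following is a minimal sketch and not part of EpiToolKit: the function name is hypothetical, and the pattern is an assumption that only covers the four-digit style of the examples above (A*0201, B*1501). The server may accept other forms.

```python
import re

# Assumed pattern: locus name, '*', four digits (e.g. A*0201).
# This is an illustration only; EpiToolKit may accept other formats.
ALLELE_PATTERN = re.compile(r"^[A-Z][A-Z0-9]*\*\d{4}$")


def split_alleles(text):
    """Split blank-separated input into (well_formed, malformed) lists."""
    well_formed, malformed = [], []
    for token in text.split():
        if ALLELE_PATTERN.match(token):
            well_formed.append(token)
        else:
            malformed.append(token)
    return well_formed, malformed
```

A well-formed name is not necessarily an available one, so the Allele selection bar remains the place to confirm which alleles were actually found.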

Run predictions

Once the desired alleles have been selected in the allele tree, all information needed for epitope prediction has been entered. There are two ways to obtain the epitope predictions:
  1. Display predictions.   Do not select the Export predictions checkbox. Click next to run predictions and display the results in your browser.
  2. Export predictions.   Select the Export predictions checkbox. An output format and a filter method can be chosen in the export field. Default values are preselected (see A6. Default settings). Possible file formats are .csv and .xls (see A2. File formats). For filter methods see A5. Filter methods. Click next to run epitope predictions. After the predictions have been completed, an export button is displayed to download them. Reloading the page after exporting returns to the allele selection.
A1.   Polymorphism format
To specify polymorphic sequences, put the polymorphisms in the FASTA header of the corresponding sequence. Please use the following header format:

> ID | [ Polymorphismlist  ]
    Polymorphismlist:   Semicolon separated list of Polymorphisms
       Polymorphism:  Position : Aminoacidlist
          Position:  Polymorphic sequence position (Integer)
          Aminoacidlist:  Comma separated list of observed amino acids (One letter code)

By default the first specified amino acid is treated as the reference residue and should be the amino acid contained in the entered sequence. Any number of amino acids can be specified for one polymorphism. Note, however, that in protein sequences containing non-standard amino acids the affected residues are replaced by a dummy ('X'). Peptides containing an 'X' are not considered in epitope prediction.

Example:
> sequence | [5:A,G;10:R,M,V]
XXXXAXXXXRXXXXXXXXXXXXXXXXXXXXXXX
codes for
XXXXAXXXXRXXXXXXXXXXXXXXXXXXXXXXX
XXXXAXXXXMXXXXXXXXXXXXXXXXXXXXXXX
XXXXAXXXXVXXXXXXXXXXXXXXXXXXXXXXX
XXXXGXXXXRXXXXXXXXXXXXXXXXXXXXXXX
XXXXGXXXXMXXXXXXXXXXXXXXXXXXXXXXX
XXXXGXXXXVXXXXXXXXXXXXXXXXXXXXXXX
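
The expansion shown in the example can be sketched in a few lines of Python. This is an illustration of the header format only, not the code used by the server; the function names parse_header and expand_variants are hypothetical, and no check is made that the first listed amino acid matches the entered sequence.

```python
from itertools import product


def parse_header(header):
    """Parse '> ID | [5:A,G;10:R,M,V]' into (id, {position: [amino acids]})."""
    body = header.lstrip(">").strip()
    seq_id, _, poly_part = body.partition("|")
    poly_text = poly_part.strip().strip("[]").strip()
    polymorphisms = {}
    for entry in poly_text.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # header without polymorphisms
        position, residues = entry.split(":")
        polymorphisms[int(position)] = [r.strip() for r in residues.split(",")]
    return seq_id.strip(), polymorphisms


def expand_variants(sequence, polymorphisms):
    """Return every sequence obtained by combining the listed amino acids."""
    positions = sorted(polymorphisms)
    variants = []
    for combination in product(*(polymorphisms[p] for p in positions)):
        residues = list(sequence)
        for position, amino_acid in zip(positions, combination):
            residues[position - 1] = amino_acid  # positions are 1-based
        variants.append("".join(residues))
    return variants
```

Called with the header and sequence from the example, expand_variants yields the six sequences in the order listed above, with the reference combination first.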
A2.   File formats
Two file formats are available for exporting prediction results.
  1. XLS.   This option offers a single xls file for download. The file contains a separate worksheet for every entered sequence. For simple epitope prediction the results for each sequence are ordered by peptide length. For SNEPv2 the results are ordered by polymorphic position and by peptide length.
  2. CSV.   The Comma Separated Values files are compressed and archived in one ZIP file, which is offered for download. The ZIP file also contains a README file with information about the file nomenclature, the creation date and the applied filter.
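
A downloaded CSV export can be read without unpacking it by hand. The sketch below uses only the Python standard library; the function name read_export is hypothetical, and UTF-8 encoding and comma delimiters are assumptions, since the exact file nomenclature and column layout are described in the README inside the archive rather than here.

```python
import csv
import io
import zipfile


def read_export(zip_source):
    """Return {member name: list of CSV rows} for every .csv file in the archive.

    zip_source may be a file path or a binary file-like object.
    """
    tables = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skips the README and any other non-CSV member
            with archive.open(name) as handle:
                # Assumption: the files are UTF-8 encoded and comma separated.
                text = io.TextIOWrapper(handle, encoding="utf-8", newline="")
                tables[name] = list(csv.reader(text))
    return tables
```

Each value is a plain list of rows, so the first row can be inspected to see which columns a given export contains.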
A3.   Databases

NCBI RefSeq

The RefSeq database [1] is a collection of DNA, RNA and protein sequences for a variety of organisms. For this service a local copy of this database is used. This copy is periodically updated to provide the most recent RefSeq release.

Current local version:    Release 31   (2008/11/10)

UniProtKB/Swiss-Prot

The UniProt/Swiss-Prot database [2] is a collection of protein sequences for a variety of organisms. It is a manually annotated and reviewed database with much additional information. For this service a local copy of this database is used. This copy is periodically updated to provide the most recent UniProt/Swiss-Prot release.

Current local version:    Release 54.8   (2008/02/12)
A4.   Prediction methods
This section gives a short description of all available prediction methods. Details on the prediction methods can be found in the respective publications.
  1. SYFPEITHI.  [5]   SYFPEITHI is based on Position Specific Scoring Matrices (PSSMs). The matrices are manually generated from naturally processed MHC ligands from the SYFPEITHI database.
  2. Bimas/HLA_Bind.  [6]   HLA_Bind was developed at the BioInformatics and Molecular Analysis Section (BIMAS) at the NIH. The prediction method uses Position Specific Scoring Matrices (PSSMs) that are derived from experimentally determined relative binding affinities. The original values in the matrices are log-transformed to obtain an additive scoring scheme.
  3. Epidemix.  [7]   Epidemix is based on Position Specific Scoring Matrices (PSSMs). The matrices are statistically computed on the positive training set for SVMHC.
  4. SVMHC.  [8]   SVMHC uses Support Vector Machine (SVM) classification to predict MHC binding peptides. The method is trained on known MHC binding peptides from the SYFPEITHI database and randomly generated non-binders.
  5. UniTope.  [9]   UniTope uses a single Support Vector Machine (SVM) model for the prediction of all MHC class I alleles. The model is trained on known MHC binding peptides and on structural information on MHC:peptide complexes. UniTope performs a binary classification and does not provide scores that represent the relative binding affinities of the peptides.
  6. Hammer (TEPITOPE).  [10]   This method is based on Position Specific Scoring Matrices (PSSMs) and predicts binding peptides for MHC class II. The virtual matrices were published by Sturniolo et al. [10] and are used by the TEPITOPE software. Please note that the matrices were taken from the original paper without changes or updates.
  7. MHCIIMulti.  [11]   This method provides predictions for potential T-cell epitopes for a large number of MHC class II alleles.
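
Several of the methods above are based on PSSMs with an additive scoring scheme. The following minimal sketch shows how such a matrix can be applied to a sequence. It is not taken from any of the listed tools: the matrix representation (one dictionary per peptide position), the function names and the toy values are assumptions made for illustration. Windows containing the dummy residue 'X' are skipped, in line with A1.

```python
def score_peptide(peptide, matrix):
    """Additive PSSM score: sum the matrix entry for each residue at its position."""
    if len(peptide) != len(matrix):
        raise ValueError("peptide length must match the number of matrix columns")
    return sum(column[residue] for residue, column in zip(peptide, matrix))


def score_sequence(sequence, matrix):
    """Score every window of matrix length; windows containing 'X' are skipped.

    Returns a list of (1-based start position, peptide, score) tuples.
    """
    length = len(matrix)
    results = []
    for start in range(len(sequence) - length + 1):
        peptide = sequence[start:start + length]
        if "X" in peptide:
            continue
        results.append((start + 1, peptide, score_peptide(peptide, matrix)))
    return results


# Toy 3-column matrix over a two-letter alphabet.
# The values are invented for illustration and belong to no real allele.
TOY_MATRIX = [
    {"A": 1.0, "G": 0.0},
    {"A": 0.5, "G": 2.0},
    {"A": 0.0, "G": 1.5},
]
```

With the toy matrix, the peptide AGG scores 1.0 + 2.0 + 1.5 = 4.5. Real matrices cover all twenty amino acids and the peptide lengths supported by the respective method.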

Creation dates of the methods' matrices and data files:
SYFPEITHI matrices: November 2007
Bimas/HLA_Bind matrices: February 2008
Epidemix matrices: June 2006
SVMHC: Same as SVMHC webserver
UniTope: June 2007
MHCIIMulti: February 2008
Hammer matrices: See Sturniolo et al. (1999)
A5.   Filter methods
Filter methods are used to determine a threshold that separates the peptides into binders and non-binders based on the predicted scores. Peptides with a score higher than the threshold are considered binders; peptides with a score lower than the threshold are considered non-binders. The following filter options are available:
  1. No filtering.   All predicted scores are displayed. The peptides are not classified as binders or non-binders.
  2. Filter using halfmax scores.   For all matrix-based methods, the halfmax score is defined as half of the maximal value obtainable from the matrix (half of the sum over the maximum value in each column of the matrix). The halfmax scores are used as thresholds. For SVM-based predictions halfmax scores are not defined. The 2%-thresholds are used instead (see "Filtering by percentage"). Only peptides classified as binders are displayed.
  3. Filtering by percentage.   The thresholds are determined based on the score distribution on a large set of peptides derived from natural proteins. The thresholds can be interpreted as follows:
    With a 2%-threshold, for example, two percent of the peptides used to compute the background distribution would be classified as binders. The advantage of this filtering method is that, in contrast to halfmax filtering, thresholds for different alleles are comparable. The input format for the percentage thresholds is a float in the interval [0,1], e.g. 0.02 for a two percent threshold. Only peptides classified as binders are displayed.
Notice.   UniTope directly performs a binary classification. For consistency, all filter methods are also available for UniTope predictions but do not have an influence on the classification - peptides with score 1 are always classified as binders whereas peptides with score 0 are always classified as non-binders.
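
The two threshold definitions above can be written down as a short sketch. It is meant as an illustration under stated assumptions, not as the server's implementation: the matrix is assumed to be a list with one dictionary of residue scores per column, the function names are hypothetical, and treating a score equal to the threshold as a binder is an assumption, since only scores above and below the threshold are specified.

```python
def halfmax_threshold(matrix):
    """Half of the sum over the maximum value in each matrix column."""
    return 0.5 * sum(max(column.values()) for column in matrix)


def percentage_threshold(background_scores, fraction):
    """Score cut-off such that about `fraction` of the background scores reach it.

    `fraction` is a float in [0, 1], e.g. 0.02 for a two percent threshold.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    ranked = sorted(background_scores, reverse=True)
    count = int(round(fraction * len(ranked)))
    if count == 0:
        return float("inf")  # no peptide is classified as a binder
    return ranked[count - 1]


def is_binder(score, threshold):
    """Assumption: a score equal to the threshold counts as a binder."""
    return score >= threshold
```

For a background of 100 distinct scores and a fraction of 0.02, the threshold is the second-highest score, so exactly two background peptides are classified as binders.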
A6.   Default settings
For ease of use, EpiToolKit predefines default values for all adjustable parameters. The default settings are:
  1. Prediction method:    SYFPEITHI
  2. Filter method:    Halfmax scores
How to cite
M. Feldhahn, P. Thiel, M. M. Schuler, N. Hillen, S. Stevanovic, H.-G. Rammensee and Oliver Kohlbacher
EpiToolKit--a web server for computational immunomics.
Nucleic Acids Research (2008)
PubMed ID: 18440979
References
[1] Pruitt K.D., Tatusova T., Maglott D.R.
NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.
Nucleic Acids Res 35:D61--D65 (2007)
[2] The UniProt Consortium
The Universal Protein Resource (UniProt).
Nucleic Acids Res 35:D193--D197 (2007)
[3] Schuler M.M., Dönnes P., Nastke M.D., Kohlbacher O., Rammensee H.G., Stevanovic S.
SNEP: SNP-derived epitope prediction program for minor H antigens.
Immunogenetics 57:816--820 (2005)
[4] Sherry S.T., Ward M.H., Kholodov M., Baker J., Phan L., Smigielski E.M., Sirotkin K.
dbSNP: the NCBI database of genetic variation.
Nucleic Acids Res 29:308--311 (2001)
[5] Rammensee H., Bachmann J., Emmerich N.P., Bachor O.A., Stevanovic S.
SYFPEITHI: database for MHC ligands and peptide motifs.
Immunogenetics 50:213--219 (1999)
[6] Parker K.C., Bednarek M.A., Coligan J.E.
Scheme for ranking potential HLA-A2 binding peptides based on independent binding of individual peptide side-chains.
J Immunol 152:163--175 (1994)
[7] Feldhahn M.
FRED: A Framework for T-Cell Epitope prediction.
Diploma thesis, University of Tuebingen (2006)
[8] Dönnes P., Kohlbacher O.
SVMHC: a server for prediction of MHC-binding peptides.
Nucleic Acids Res 34:W194--W197 (2006)
[9] Feldhahn M., Toussaint N., Ziehm M.
UniTope.
Unpublished
[10] Sturniolo T., Bono E., Ding J., Raddrizzani L., Tuereci O., Sahin U., Braxenthaler M., Gallazzi F., Protti M.P., Sinigaglia F., Hammer J.
Generation of tissue-specific and promiscuous HLA ligand databases using DNA microarrays and virtual HLA class II matrices.
Nat Biotechnol 17:555--561 (1999)
[11] Pfeifer N., Kohlbacher O.
Multiple instance learning allows MHC class II epitope predictions for alleles without experimental data.
Submitted (2008)