Documentation and help
| Contact |
| For questions, feedback and suggestions please contact us using this site |
Overview
Sequence input (Epitope prediction)
The first step is to specify the sequence(s) for which predictions should be performed. There are four input fields, each offering
a different way to access or define sequences. Please use only one of these input fields.
Notice:
In protein sequences containing non-standard amino acids, the affected residues are replaced by a placeholder ('X'). Peptides containing an 'X' are not considered in epitope prediction.
- RefSeq. This input field provides sequences from a local copy of NCBI RefSeq [1]. Use RefSeq accessions or GeneIDs as search keys. Separate multiple search keys with spaces.
- Swiss-Prot. This input field provides sequences from a local copy of UniProtKB/Swiss-Prot [2]. Use primary accession numbers, Swiss-Prot entry names or gene names as search keys. Separate multiple search keys with spaces.
- FASTA upload. Use this option to upload sequence files in FASTA format.
- Paste sequence(s). Use this input field to paste your own sequences. Either paste one or more sequences in FASTA format or enter bare sequences without headers. In the latter case, separate the sequences by blank lines.
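The handling of pasted input and of non-standard residues described in this section can be illustrated with a minimal sketch. It is not the server's implementation: the function names, the choice of the 20 canonical amino acids as the "standard" alphabet, and the sliding-window enumeration of peptides are assumptions made for illustration only.

```python
import re

# Assumption: "standard" means the 20 canonical amino acids.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def parse_pasted_sequences(text):
    """Return (identifier, sequence) pairs from FASTA or blank-line-separated input."""
    if text.lstrip().startswith(">"):
        records = []
        for line in text.splitlines():
            line = line.strip()
            if line.startswith(">"):
                # A header line starts a new record.
                records.append([line[1:].strip(), ""])
            elif line and records:
                # Sequence lines are appended to the current record.
                records[-1][1] += line
        return [(name, seq) for name, seq in records]
    # Bare sequences: blank lines separate the individual sequences.
    blocks = [b for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    return [(f"seq{i}", "".join(b.split())) for i, b in enumerate(blocks, start=1)]


def mask_nonstandard(sequence):
    """Replace every non-standard residue by the placeholder 'X'."""
    return "".join(c if c in STANDARD_AA else "X" for c in sequence.upper())


def peptides(sequence, length):
    """Yield (1-based position, peptide) for all windows that contain no 'X'."""
    for start in range(len(sequence) - length + 1):
        peptide = sequence[start:start + length]
        if "X" not in peptide:
            yield start + 1, peptide


if __name__ == "__main__":
    for name, seq in parse_pasted_sequences("ACDUEFGHIK\n\nMKLVNPQRST"):
        masked = mask_nonstandard(seq)
        print(name, masked, list(peptides(masked, 9)))
```

In this sketch a single non-standard residue removes every peptide window that overlaps it, which is why such positions produce gaps in the prediction tables.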
Sequence input (SNEPv2 [3])
The first step is to specify the sequence(s) for which predictions should be performed. There are four input fields, each offering
a different way to access or define sequences. Please use only one of these input fields.
Notice:
In protein sequences containing non-standard amino acids, the affected residues are replaced by a placeholder ('X'). Peptides containing an 'X' are not considered in epitope prediction.
- RefSeq. This input field provides sequences from a local copy of NCBI RefSeq [1]. Use RefSeq accessions or GeneIDs as search keys. Separate multiple search keys with spaces. For every sequence, SNEPv2 searches NCBI dbSNP [4] for known polymorphisms. The Heterozygosity restriction option limits the search to SNPs whose average heterozygosity lies within a percentage range. To use this option, select the corresponding checkbox and define the range: the left selection sets the lower bound and the right selection sets the upper bound. Please note that not all SNP entries in dbSNP carry a heterozygosity annotation. To also access SNPs without heterozygosity annotation, do not select the Heterozygosity restriction checkbox.
- Swiss-Prot. This input field provides sequences from a local copy of UniProtKB/Swiss-Prot [2]. Use primary accession numbers, Swiss-Prot entry names or gene names as search keys. Separate multiple search keys with spaces. For every sequence, SNEPv2 extracts known polymorphisms from the Swiss-Prot entries.
- FASTA upload. Use this option to upload sequence files in FASTA format. Sequence polymorphisms have to be specified in the FASTA header. For details on specifying your own polymorphisms see A1. Polymorphism format.
- Paste sequence(s). Use this input field to paste your own sequences. Paste one or more sequences in FASTA format. Sequence polymorphisms have to be specified in the FASTA header. For details on specifying your own polymorphisms see A1. Polymorphism format.
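The effect of the Heterozygosity restriction option can be sketched as follows. The `Snp` record, its field names and the inclusive treatment of both bounds are hypothetical and chosen only for illustration; the sketch merely mirrors the behaviour described above, where SNPs without annotation are only returned when the restriction is switched off.

```python
from typing import NamedTuple, Optional


class Snp(NamedTuple):
    """Hypothetical SNP record; heterozygosity is a percentage or None if not annotated."""
    rs_id: str
    position: int
    heterozygosity: Optional[float]


def filter_by_heterozygosity(snps, lower=None, upper=None):
    """Apply the heterozygosity restriction.

    With no bounds given (checkbox not selected) all SNPs are returned,
    including those without annotation. With bounds, only annotated SNPs
    inside [lower, upper] are kept (inclusive bounds are an assumption).
    """
    if lower is None or upper is None:
        return list(snps)
    return [
        snp for snp in snps
        if snp.heterozygosity is not None and lower <= snp.heterozygosity <= upper
    ]


if __name__ == "__main__":
    # Made-up example records, not real dbSNP entries.
    snps = [Snp("rs_a", 12, 5.0), Snp("rs_b", 40, 48.0), Snp("rs_c", 77, None)]
    print(filter_by_heterozygosity(snps))            # restriction off
    print(filter_by_heterozygosity(snps, 10, 50))    # restriction on
```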
Sequence results (Epitope prediction)
For every query the search result is displayed. Additional information is presented for every available sequence.
To display a sequence, click the View button below the sequence identifier. If a listed sequence should not be
processed further, deselect the checkbox to the left of the sequence identifier. By default, all sequences are selected for epitope
prediction. For sequences from either RefSeq or Swiss-Prot, links to the original resources are available.
Sequence results (SNEPv2)
For every query the search result is displayed. Additional information is presented for every available sequence.
To display a sequence, click the View button below the sequence identifier. If a listed sequence should not be
processed further, deselect the checkbox to the left of the sequence identifier. By default, all sequences are selected for epitope
prediction. For sequences from either RefSeq or Swiss-Prot, links to the original resources are available.
If polymorphisms were found, their details are listed with the other sequence information. Each polymorphism can be
selected or deselected individually for further processing using the adjoining checkbox.
Select alleles
In this step, the alleles for which predictions should be performed have to be selected. A tree with the default method
and the most common alleles is offered (see A6. Default settings). To access all available prediction methods (see A4. Prediction methods)
and alleles, click Advanced options. Advanced options offers two ways of creating your personal allele selection:
- Select the prediction methods and peptide lengths for which available alleles should be displayed. Clicking Create tree from selection creates an allele tree containing all available alleles for the specified methods and peptide lengths.
- Enter alleles in the input field, separated by spaces. Use the common allele format, e.g. A*0201 or B*1501. Click Search alleles to search for the entered alleles and create a tree containing all prediction methods and peptide lengths available for the alleles found. Check the Allele selection bar at the top of the allele selection field to see whether all requested alleles are available.
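Before submitting a longer allele list, it can help to check the entries locally. The following is a minimal sketch: the regular expression is an assumption derived only from the two examples above (locus name, asterisk, four digits) and does not reflect which alleles the server actually knows.

```python
import re

# Assumed pattern: locus letters, optional digits (e.g. DRB1), '*', four digits.
ALLELE_PATTERN = re.compile(r"[A-Z]+[0-9]*\*[0-9]{4}")


def split_allele_query(query):
    """Split a blank-separated query into well-formed and rejected entries."""
    well_formed, rejected = [], []
    for token in query.split():
        if ALLELE_PATTERN.fullmatch(token.upper()):
            well_formed.append(token.upper())
        else:
            rejected.append(token)
    return well_formed, rejected


if __name__ == "__main__":
    print(split_allele_query("A*0201 b*1501 HLA-A2"))
```

A well-formed entry may still be unavailable on the server, so the Allele selection bar remains the authoritative check.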
Run predictions
After the desired alleles have been selected in the allele tree, all information necessary for epitope prediction has been entered. There are two ways to obtain the epitope predictions:
- Display predictions. Do not select the Export predictions checkbox. Click next to run the predictions and display the results in your browser.
- Export predictions. Select the Export predictions checkbox. An output format and a filter method can be chosen in the export field. Default values are preselected (see A6. Default settings). Possible file formats are .csv and .xls (see A2. File formats). For filter methods see A5. Filter methods. Click next to run the epitope predictions. After the predictions have been completed, an export button is displayed to download them. Finally, reloading the page after exporting returns to the allele selection.
Prediction results (Epitope prediction)
Predictions for every target sequence can be accessed via the tabs on top of the predictions frame.
To display another sequence, click the corresponding tab. By default, the predictions are filtered (see A6. Default settings).
To get unfiltered predictions or to change the filter method click Advanced options.
A separate table is displayed for every peptide length. The tables can be sorted by each column in
the third row (Pos, Sequence, [ Alleles,... ] ). To sort by a certain column, click the corresponding
table cell. A second click on the same table cell reverses the sort order.
Advanced options
In the advanced options field, different filter methods can be applied to the predictions. Three options are available (see A5. Filter methods). To apply a filter, select the appropriate entry and click Apply filter.
Export predictions
To download the currently displayed prediction results, select the Export predictions checkbox. In the export field, different file formats can be chosen. Possible formats are .csv and .xls (see A2. File formats). Click export to export the prediction results.
Prediction results (SNEPv2)
Predictions for every target sequence can be accessed via the tabs on top of the predictions frame.
To display another sequence, click the corresponding tab. By default, the predictions are filtered (see A6. Default settings).
To get unfiltered predictions or to change the filter method click Advanced options.
The buttons SNEPv2 predictions and Full predictions can be used to toggle between the SNEPv2 and the epitope
predictions without polymorphisms. A separate result table is presented for every polymorphic peptide and peptide length.
The tables can be sorted by each column in the third row (Pos, Sequence, [ Alleles,... ] ). To sort by a certain column, click the corresponding
table cell. A second click on the same table cell reverses the sort order.
Advanced options
In the advanced options field, different filter methods can be applied to the predictions. Three options are available (see A5. Filter methods). To apply a filter, select the appropriate entry and click Apply filter.
Export predictions
To download the currently displayed prediction results, select the Export predictions checkbox. In the export field, different file formats can be chosen. Possible formats are .csv and .xls (see A2. File formats). Click export to export the predictions.
A1. Polymorphism format
To specify polymorphic sequences, put the polymorphisms in the FASTA header
of the corresponding sequence. Please use the following header format:
> ID | [ Polymorphismlist ]
Polymorphismlist: Semicolon separated list of Polymorphisms
Polymorphism: Position : Aminoacidlist
Position: Polymorphic sequence position (Integer)
Aminoacidlist: Comma separated list of observed amino acids (One letter code)
By default, the first specified amino acid is treated as the reference residue and should be the amino acid contained in the entered sequence. Any number of amino acids can be specified for one polymorphism. Note, however, that in protein sequences containing non-standard amino acids the affected residues are replaced by a placeholder ('X'). Peptides containing an 'X' are not considered in epitope prediction.
Example:
> sequence | [5:A,G;10:R,M,V]
XXXXAXXXXRXXXXXXXXXXXXXXXXXXXXXXX
codes for
XXXXAXXXXRXXXXXXXXXXXXXXXXXXXXXXX
XXXXAXXXXMXXXXXXXXXXXXXXXXXXXXXXX
XXXXAXXXXVXXXXXXXXXXXXXXXXXXXXXXX
XXXXGXXXXRXXXXXXXXXXXXXXXXXXXXXXX
XXXXGXXXXMXXXXXXXXXXXXXXXXXXXXXXX
XXXXGXXXXVXXXXXXXXXXXXXXXXXXXXXXX
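The header format and the expansion shown in the example can be reproduced with a short sketch. The function names are hypothetical, and raising an error when the reference residue does not match the sequence is an assumption; the documentation only states that the first amino acid should be the one contained in the entered sequence. Positions are taken as 1-based, which is consistent with the example.

```python
import itertools
import re


def parse_header(header):
    """Split '> ID | [pos:A,B;pos:C,D]' into (ID, {position: [amino acids]})."""
    match = re.fullmatch(r">\s*(.*?)\s*\|\s*\[(.*)\]\s*", header)
    if match is None:
        raise ValueError(f"not a polymorphism header: {header!r}")
    seq_id, polymorphism_list = match.groups()
    polymorphisms = {}
    for item in polymorphism_list.split(";"):
        if not item.strip():
            continue
        position, residues = item.split(":")
        polymorphisms[int(position)] = [r.strip() for r in residues.split(",")]
    return seq_id, polymorphisms


def expand_variants(sequence, polymorphisms):
    """Return every sequence variant encoded by the polymorphisms."""
    positions = sorted(polymorphisms)
    for pos in positions:
        # Assumption: a mismatching reference residue is treated as an error.
        if sequence[pos - 1] != polymorphisms[pos][0]:
            raise ValueError(f"reference residue mismatch at position {pos}")
    variants = []
    for combination in itertools.product(*(polymorphisms[p] for p in positions)):
        residues = list(sequence)
        for pos, amino_acid in zip(positions, combination):
            residues[pos - 1] = amino_acid
        variants.append("".join(residues))
    return variants


if __name__ == "__main__":
    seq_id, polys = parse_header("> sequence | [5:A,G;10:R,M,V]")
    for variant in expand_variants("XXXXAXXXXRXXXXXXXXXXXXXXXXXXXXXXX", polys):
        print(variant)
```

With two amino acids at position 5 and three at position 10, the sketch yields 2 x 3 = 6 variants, in the same order as the list in the example.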
A2. File formats
Two file formats are available for exporting prediction results.
- XLS. This option provides a single .xls file for download, containing a separate worksheet for every entered sequence. For simple epitope prediction, the results for each sequence are ordered by peptide length. For SNEPv2, the results are ordered by polymorphic position and by peptide length.
- CSV. The comma-separated values (CSV) files are bundled into a single ZIP archive, which is offered for download. The archive also contains a README file with information about the file naming scheme, the creation date and the applied filter.
A3. Databases
NCBI RefSeq
The RefSeq database [1] is a collection of DNA, RNA and protein sequences for a variety of organisms. For this service a local copy of this database is used. This copy is periodically updated to provide the most recent RefSeq release.
Current local version: Release 39 (2010/02/17)
UniProtKB/Swiss-Prot
The UniProtKB/Swiss-Prot database [2] is a collection of protein sequences for a variety of organisms. It is a manually annotated and reviewed database with extensive additional information. For this service a local copy of this database is used. This copy is periodically updated to provide the most recent UniProtKB/Swiss-Prot release.
Current local version: Release 57.14 (2010/02/15)
A4. Prediction methods
This section gives a short description of all available prediction methods. Details on the prediction methods can be found in the respective publications.
- SYFPEITHI. [5] SYFPEITHI is based on Position Specific Scoring Matrices (PSSMs). The matrices are manually generated from naturally processed MHC ligands from the SYFPEITHI database.
- Bimas/HLA_Bind. [6] HLA_Bind was developed at the BioInformatics and Molecular Analysis Section (BIMAS) at the NIH. The prediction method uses Position Specific Scoring Matrices (PSSMs) that are derived from experimentally determined relative binding affinities. The original values in the matrices are log-transformed to obtain an additive scoring scheme.
- Epidemix. [7] Epidemix is based on Position Specific Scoring Matrices (PSSMs). The matrices are statistically computed on the positive training set for SVMHC.
- SVMHC. [8] SVMHC uses Support Vector Machine (SVM) classification to predict MHC-binding peptides. The method is trained on known MHC-binding peptides from the SYFPEITHI database and randomly generated non-binders.
- UniTope. [9] UniTope uses a single Support Vector Machine (SVM) model for the prediction of all MHC class I alleles. The model is trained on known MHC-binding peptides and on structural information on MHC:peptide complexes. UniTope performs a binary classification and does not provide scores that represent the relative binding affinities of the peptides.
- Hammer (TEPITOPE). [10] This method is based on Position Specific Scoring Matrices (PSSMs) and predicts binding peptides for MHC class II. The virtual matrices were published by Sturniolo et al. [10] and are used by the TEPITOPE software. Please note that the matrices were taken from the original paper without changes or updates.
- MHCIIMulti. [11] This method provides predictions for potential T-cell epitopes for a large number of MHC class II alleles.
Creation dates of the methods' matrices and data files:
| SYFPEITHI matrices | November 2007 |
| Bimas/HLA_Bind matrices | February 2008 |
| Epidemix matrices | June 2006 |
| SVMHC | Same as SVMHC webserver |
| UniTope | June 2007 |
| MHCIIMulti | February 2008 |
| Hammer matrices | See Sturniolo et al. (1999) |
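Several of the methods above rely on Position Specific Scoring Matrices with an additive scoring scheme. The following minimal sketch shows how such a score is computed and why log-transforming multiplicative coefficients, as described for Bimas/HLA_Bind, turns a product into a sum. The matrix layout (one dictionary per peptide position) and the toy values are assumptions for illustration and are not taken from any of the real matrices.

```python
import math


def pssm_score(peptide, matrix):
    """Additive PSSM score: sum of the matrix entries selected by each residue.

    `matrix` holds one dict per peptide position, mapping amino acid to value.
    """
    if len(peptide) != len(matrix):
        raise ValueError("peptide length does not match the matrix")
    return sum(column[residue] for column, residue in zip(matrix, peptide))


def log_transform(coefficients):
    """Convert multiplicative coefficients into additive log values."""
    return [
        {residue: math.log(value) for residue, value in column.items()}
        for column in coefficients
    ]


if __name__ == "__main__":
    # Toy two-position matrix with made-up coefficients.
    coefficients = [{"A": 2.0, "G": 0.5}, {"L": 4.0, "V": 1.0}]
    additive = log_transform(coefficients)
    # The sum of logs equals the log of the product 2.0 * 4.0.
    print(pssm_score("AL", additive), math.log(2.0 * 4.0))
```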
A5. Filter methods
Filter methods are used to determine a threshold that separates the peptides into binders and non-binders
based on the predicted scores. Peptides with a score higher than the threshold will be considered as binders,
peptides with score lower than the threshold will be considered as non-binders.
The following filter options are available:
- No filtering. All predicted scores are displayed. The peptides are not classified as binders or non-binders.
- Filter using halfmax scores. For all matrix-based methods, the halfmax score is defined as half of the maximal value obtainable from the matrix (half of the sum over the maximum value in each column of the matrix). The halfmax scores are used as thresholds. For SVM-based predictions halfmax scores are not defined; the 2% thresholds are used instead (see "Filtering by percentage"). Only peptides classified as binders are displayed.
- Filtering by percentage. The thresholds are determined from the score distribution on a large set of peptides derived from natural proteins. They can be interpreted as follows: with a 2% threshold, for example, two percent of the peptides used to compute the background distribution would be classified as binders. The advantage of this filtering method is that, in contrast to halfmax filtering, thresholds for different alleles are comparable. The percentage threshold is entered as a float in the interval [0,1], e.g. 0.02 for a two percent threshold. Only peptides classified as binders are displayed.
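Both threshold definitions can be written down in a few lines. This is a sketch under stated assumptions rather than the server's code: the matrix is represented as one dictionary per column, the background distribution is a plain list of scores, and ties in the background scores are ignored when picking the percentage threshold.

```python
def halfmax_threshold(matrix):
    """Half of the sum over the maximum value in each column of the matrix."""
    return 0.5 * sum(max(column.values()) for column in matrix)


def percentage_threshold(background_scores, fraction):
    """Threshold above which roughly `fraction` of the background scores lie.

    `fraction` is a float in [0, 1], e.g. 0.02 for a two percent threshold.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in the interval [0, 1]")
    ranked = sorted(background_scores, reverse=True)
    n_binders = int(len(ranked) * fraction)
    if n_binders >= len(ranked):
        return float("-inf")  # every score counts as a binder
    # Scores strictly higher than this value are the top `n_binders` scores.
    return ranked[n_binders]


def binders(scored_peptides, threshold):
    """Keep only peptides whose score is higher than the threshold."""
    return [(peptide, score) for peptide, score in scored_peptides if score > threshold]


if __name__ == "__main__":
    toy_matrix = [{"A": 2.0, "G": 1.0}, {"L": 4.0, "V": 0.0}]  # made-up values
    print(halfmax_threshold(toy_matrix))
    background = list(range(100))                               # made-up scores
    print(percentage_threshold(background, 0.02))
```

Because the percentage threshold is tied to a rank in a shared background set rather than to the scale of one matrix, it selects a comparable share of peptides for each allele.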
A6. Default settings
For ease of use, EpiToolKit predefines default values for all adjustable parameters. The default settings are:
- Prediction method: SYFPEITHI
- Filter method: Halfmax scores
How to cite
| M. Feldhahn, P. Thiel, M. M. Schuler, N. Hillen, S. Stevanovic, H.-G. Rammensee and Oliver Kohlbacher |
| EpiToolKit--a web server for computational immunomics. |
| Nucleic Acids Research 2008 Jul 1;36(Web Server issue):W519-22. Epub 2008 Apr 24. |
| PubMed (PMID: 18440979) |
References
| [1] | Pruitt K.D. , Tatusova T. , Maglott D.R. |
| NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. | |
| Nucleic Acids Res 35:D61--D65 (2007) |
| [2] | The UniProt Consortium |
| The Universal Protein Resource (UniProt). | |
| Nucleic Acids Res 35:D193--D197 (2007) |
| [3] | Schuler M.M. , Dönnes P. , Nastke M.D. , Kohlbacher O. , Rammensee H.G. , Stevanovic S. |
| SNEP: SNP-derived epitope prediction program for minor H antigens. | |
| Immunogenetics 57:816--820 (2005) |
| [4] | Sherry S.T. , Ward M.H. , Kholodov M. , Baker J. , Phan L. , Smigielski E.M. , Sirotkin K. |
| dbSNP: the NCBI database of genetic variation. | |
| Nucleic Acids Res 29:308--311 (2001) |
| [5] | Rammensee H. , Bachmann J. , Emmerich N.P. , Bachor O.A. , Stevanovic S. |
| SYFPEITHI: database for MHC ligands and peptide motifs. | |
| Immunogenetics 50:213--219 (1999) |
| [6] | Parker K.C. , Bednarek M.A. , Coligan J.E. |
| Scheme for ranking potential HLA-A2 binding peptides based on independent binding of individual peptide side-chains. | |
| J Immunol 152:163--175 (1994) |
| [7] | Feldhahn M. |
| FRED: A Framework for T-Cell Epitope prediction | |
| Diploma thesis, University of Tuebingen (2006) |
| [8] | Dönnes P., Kohlbacher O. |
| SVMHC: a server for prediction of MHC-binding peptides. | |
| Nucleic Acids Res 34:W194--W197 (2006) |
| [9] | Feldhahn M. , Toussaint N. , Ziehm M. |
| UniTope. | |
| Unpublished |
| [10] | Sturniolo T. , Bono E. , Ding J. , Raddrizzani L. , Tuereci O. , Sahin U. , Braxenthaler M. , Gallazzi F. , Protti M.P. , Sinigaglia F. , Hammer J. |
| Generation of tissue-specific and promiscuous HLA ligand databases using DNA microarrays and virtual HLA class II matrices. | |
| Nat Biotechnol 17:555--561 (1999) |
| [11] | Pfeifer N. , Kohlbacher O. |
| Multiple instance learning allows MHC class II epitope predictions for alleles without experimental data. | |
| Lecture Notes in Bioinformatics: Proceedings of WABI 2008 : (2008) |
