Source data – BEI Resources Catalog # NR-15433

Purpose of Analysis

The purpose of this analysis was to provide CCHI researchers with class II binding information on the epitopes that NIAID makes available to them. The analysis was performed by Toby Cohen (recent graduate of Brown University) in response to requests and guidance from CCHI PI Annie De Groot, URI Professor and CEO/CSO, EpiVax, using tools developed at EpiVax by Matt Ardito, Gary Kurzky, and principal architect Bill Martin. Additional information on conservation with H1N1 strains can be provided upon request.

This information is provided free of charge and restriction to NIAID. If the data are used in a publication, the above individuals and EpiVax would like to be acknowledged, and we would ask to review the text of the manuscript to be sure that the analysis is presented as described. Contact Annie De Groot or Toby Cohen with questions.

Data Source

Peptides are the critical reagent for immunogenicity studies. To accelerate H1N1 vaccine design, the NIAID provides a series of 139 peptides, known as BEI Resources Catalog # NR-15433, that span the entire amino acid sequence of the hemagglutinin (HA) protein of A/California/04/09 H1N1 (GenPept: ACQ76318).

Each of the 139 peptides is 15 amino acids long and overlaps its neighbor by 11 amino acids, except for the final peptide in the series, which consists of only 14 amino acids. To obtain the sequences of the 139 peptides, we applied for level 1 institutional status, as mandated by BEI (contact information is available from BEI).
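The tiling scheme described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the procedure BEI used to design the set. The function name `tile_peptides` is hypothetical, and the 566-residue length is inferred from the figures in the text (139 peptides, a step of 4 residues, a 14-residue final peptide) rather than quoted from the product sheet. A placeholder string stands in for the real HA sequence.

```python
def tile_peptides(sequence, length=15, step=4):
    """Cut a protein into overlapping peptides.

    Returns a list of (start, end, peptide) tuples with 1-based,
    inclusive positions. A 15-mer tiled with a step of 4 overlaps
    its neighbor by 11 residues.
    """
    peptides = []
    for offset in range(0, len(sequence), step):
        chunk = sequence[offset:offset + length]
        peptides.append((offset + 1, offset + len(chunk), chunk))
        if offset + length >= len(sequence):
            break  # this peptide reached the C-terminus; stop tiling
    return peptides


# Placeholder only: 566 identical residues, NOT the real HA sequence.
placeholder_ha = "A" * 566
peptides = tile_peptides(placeholder_ha)
print(len(peptides), peptides[-1][:2])
```

Under these assumptions the sketch yields 139 peptides, the last of which spans positions 553-566 and is 14 residues long, in agreement with the description of the set.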

When registration was complete, we were able to download the product sheet, which contained the sequences for the 139 peptides. We also downloaded the whole HA sequence of A/California/04/09 H1N1 (GenPept: ACQ76318) from NCBI.

>> Immunogenicity Analysis of BEI peptides
>> Immunogenicity Analysis of the BEI Parent sequence by EpiMatrix
>> Comparison of BEI sequences with clustered epitopes originally selected by EpiMatrix
>> Tools used for this analysis
>> Conclusions

Immunogenicity Analysis of BEI peptides

The sequences for the 139 peptides were uploaded to EpiMatrix version 1.2 as a .cls file named “H1N1_HA_BEI.cls”. The peptides were all assigned to a protein labeled “NR-15433”, with the start designation of each peptide corresponding to its position within the HA sequence. The whole HA sequence was uploaded as a .pep file named “H1N1_HA_BEI_WHOLE.pep”, with the protein labeled “GB_ACQ76318.1”.

We elected to evaluate only class II epitopes for this “pilot”. We started our evaluation by parsing each provided sequence into overlapping 9-mer frames, where each frame overlaps the last by 8 amino acids. Each frame was then analyzed for binding potential with respect to each of a panel of eight common class II alleles. We consider each frame-by-allele evaluation a single “assessment”. All EpiMatrix assessment scores (Z-scores) equal to or above 1.64 are defined as “Hits”; that is to say, potentially immunogenic and worthy of further consideration.

An immunogenicity score is calculated by summing the Z-scores of all of the EpiMatrix “Hits” in a given peptide and adjusting for our expectation based on the length of the submitted peptide. We established our expectation (i.e. the number of high-scoring 9-mers we would expect to find by chance alone) by testing a large number of randomly generated protein sequences. Immunogenicity scores above zero indicate an excess of putative T-cell epitopes and thus an increased potential for immunogenicity. Immunogenicity scores below zero indicate a dearth of putative T-cell epitopes and a reduced potential for immunogenicity. We believe peptides scoring above 10 contain an unusually high number of potential T-cell epitopes.

Of the 139 peptides, we predict that only 23 (17%) will bind promiscuously to the most common HLA-DRB1* alleles (defined as a score higher than 10). A file containing the complete EpiMatrix analysis of the 139 BEI peptides is available for download: “EpiMatrix_H1N1 Peptide Analysis Data.xls”. A summary of the analysis can be found under the tab “Overview- Sorted By Score”; the results can also be viewed online.
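The scoring procedure described in this section can be outlined as follows. This is a rough sketch under stated assumptions, not the EpiMatrix implementation: the `z_score` callable stands in for the EpiMatrix matrices, which are not reproduced here, and the length adjustment is modeled as a simple subtraction of a per-frame expectation (`expected_per_frame`), which is an assumed form. The text says only that the sum is adjusted for an expectation based on peptide length.

```python
HIT_THRESHOLD = 1.64  # Z-score cutoff for a "Hit", as stated in the text


def nine_mer_frames(peptide, width=9):
    """Overlapping frames; each frame overlaps the previous one by width - 1."""
    return [peptide[i:i + width] for i in range(len(peptide) - width + 1)]


def immunogenicity_score(peptide, alleles, z_score, expected_per_frame):
    """Sum the Z-scores of all hits, then subtract a length-based expectation.

    z_score(frame, allele) is a hypothetical stand-in for one EpiMatrix
    assessment. expected_per_frame is an assumed parameter: the hit-score
    sum one would expect per 9-mer frame by chance alone.
    """
    frames = nine_mer_frames(peptide)
    hit_sum = 0.0
    for frame in frames:
        for allele in alleles:
            z = z_score(frame, allele)  # one frame-by-allele assessment
            if z >= HIT_THRESHOLD:
                hit_sum += z
    return hit_sum - expected_per_frame * len(frames)


# Toy scorer with invented values, for demonstration only.
def toy_z_score(frame, allele):
    return 2.0 if frame.startswith("D") else 0.5


toy_alleles = [f"allele_{i}" for i in range(1, 9)]  # eight placeholder names
score = immunogenicity_score("ACDEFGHIKLMNPQR", toy_alleles, toy_z_score, 1.0)
print(score)
```

A 15-mer contains seven 9-mer frames, so against a panel of eight alleles each full-length peptide receives 56 assessments.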

Immunogenicity Analysis of the BEI Parent sequence by EpiMatrix

While 23 of the peptides offered through BEI are predicted to be promiscuous, we can use EpiMatrix to identify an even smaller peptide set that more effectively covers all of the immunogenic regions of the HA protein. ClustiMer identifies areas of high epitope density within the sequence of a protein. Using ClustiMer, we found 16 peptides that score above 10 on our EpiMatrix scale. In addition, ClustiMer identifies immunogenic peptide sequences that contain flanking residues on the amino and carboxy termini. Having an extra three amino acids on either end of the peptide has been shown to enhance immunogenicity. Many of the peptides provided by BEI have their most immunogenic regions starting or ending at the terminal residues. This could result in missing many protective epitopes.

A summary of the peptides identified by ClustiMer is provided in the accompanying Excel file “EpiMatrix_H1N1 Peptide Analysis Data.xls” under the tab “Best Peptides Class II Clusters”; the results can also be viewed online.

Comparison of BEI sequences with clustered epitopes originally selected by EpiMatrix

Several of the peptides provided by BEI overlap with the clusters selected by EpiMatrix; however, two clusters identified by EpiMatrix are not represented by any of the peptides provided by BEI. In other words, the 16 peptides identified by EpiMatrix and ClustiMer are better able to cover all of the immunogenic regions of the H1N1 HA protein. None of the BEI peptides score higher than their EpiMatrix equivalents; however, this is partly because the EpiMatrix clusters are all slightly longer and therefore contain more epitopes.

Tools used for this analysis

EpiMatrix. EpiMatrix, a T-cell epitope mapping algorithm developed by the principal scientists at EpiVax, screens protein sequences for 9 to 10 amino acid long peptide segments predicted to bind to one or more MHC alleles [1,2]. EpiMatrix uses the pocket profile method for epitope prediction, which was first described by Sturniolo and Hammer in 1999 [3]. For reasons of efficiency and simplicity, predictions are limited to the eight most common HLA class II alleles and six “supertype” HLA class I alleles [4]. EpiMatrix raw scores are normalized with respect to a score distribution derived from a very large set of randomly generated peptide sequences. Any peptide scoring above 1.64 on the EpiMatrix “Z” scale (approximately the top 5% of any given peptide set) has a significant chance of binding to the MHC molecule for which it was predicted. Peptides scoring above 2.32 on the scale (the top 1%) are extremely likely to bind; the scores of most well-known T-cell epitopes fall within this range [5,6,7]. EpiMatrix has been successfully applied to the analysis of previously published epitopes [8], and in the prospective selection of epitopes from HIV [9], Mycobacterium tuberculosis [10], tularemia [11] and vaccinia virus [12].
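The correspondence between the Z-score cutoffs and the quoted percentages can be checked with the Python standard library. The sketch assumes that normalized scores follow a standard normal distribution, which is consistent with the percentages given in the text but is an assumption here, not a statement about how EpiMatrix computes them. The helper name `top_fraction` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def top_fraction(z):
    """Fraction of a standard normal distribution lying above z."""
    return 1.0 - standard_normal.cdf(z)


for cutoff in (1.64, 2.32):
    print(f"Z >= {cutoff}: top {top_fraction(cutoff):.2%} of scores")
```

Under that assumption, a cutoff of 1.64 leaves roughly 5% of scores above it and a cutoff of 2.32 leaves roughly 1%.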

ClustiMer. ClustiMer identifies “clustered” or promiscuous epitopes [9,10]. We have observed that potential T-cell epitopes are not randomly distributed throughout protein sequences but instead tend to “cluster” in specific regions. In our experience, T-cell epitope “clusters” range from 9 to roughly 25 amino acids in length and, considering their affinity to multiple alleles and across multiple frames, can contain anywhere from 4 to 40 binding motifs. We usually limit our class II epitope selections for vaccine design to selected “promiscuous epitopes”: epitopes that have the potential to be recognized in the context of more than one HLA [13,14,15].

EpiBars. Further, we have observed that many of the most reactive T-cell epitope clusters present a feature we refer to as an “EpiBar”. An EpiBar is a single 9-mer frame that is predicted to be reactive to at least four different HLA alleles. EpiBars may be a signature feature of highly immunogenic, promiscuous class II epitopes. Sequences that contain EpiBars include Influenza Hemagglutinin 307-319 (Cluster score of 18) and Tetanus Toxin 825-850 (Cluster score of 16). An example of an immunogenic peptide that contains an EpiBar is shown below. Note the horizontal bar of high Z scores at position 308.

Influenza Hemagglutinin
Typical EpiMatrix analysis. Z-score (Top Percentages) indicates the potential of a 9-mer frame to bind to a given HLA allele. All Z-scores in the top 5% (>1.64) are considered “hits”. Though not hits, scores in the top 10% are considered elevated; scores outside the top 10% are masked for simplicity. Frames containing four or more alleles scoring above 1.64 are colloquially referred to as “EpiBars” and are highlighted in yellow (see frame 308:YVKQNTLKL). This band-like pattern is characteristic of promiscuous epitopes. The influenza peptide scores extremely high for all 8 alleles in EpiMatrix; the deviation compared to expectation is +17.62.
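The EpiBar definition can be expressed as a simple filter over a table of frame-by-allele Z-scores. The following is a minimal sketch. The function name `find_epibars`, the allele labels, and all of the scores are invented for illustration and are not EpiMatrix output. The cutoff is applied as "equal to or above 1.64", following the definition of a hit given earlier.

```python
def find_epibars(z_by_frame, threshold=1.64, min_alleles=4):
    """Return the start positions of frames that qualify as EpiBars.

    z_by_frame maps a frame start position to a dict of
    {allele_label: z_score}. A frame is an EpiBar when at least
    min_alleles of its scores reach the hit threshold.
    """
    epibars = []
    for start, scores in z_by_frame.items():
        hits = sum(1 for z in scores.values() if z >= threshold)
        if hits >= min_alleles:
            epibars.append(start)
    return sorted(epibars)


# Invented scores for three adjacent frames and eight placeholder alleles.
alleles = [f"allele_{i}" for i in range(1, 9)]
demo_scores = {
    307: dict(zip(alleles, [1.7, 0.2, 1.9, 0.4, 0.1, 0.3, 1.1, 0.0])),
    308: dict(zip(alleles, [2.4, 1.8, 2.9, 1.7, 2.1, 0.9, 1.2, 1.0])),
    309: dict(zip(alleles, [0.3, 0.1, 0.8, 0.5, 0.2, 0.0, 0.4, 0.6])),
}
print(find_epibars(demo_scores))
```

In this made-up table only frame 308 has four or more hits, so it is the only frame reported.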


Conclusions

We have provided an analysis of the BEI peptides that may be of use for selection of limited sets of highly promiscuous epitopes (for T-cell help). Several of these peptides overlap with clusters that were also selected by EpiMatrix on the parent sequence; comparisons between the ‘best clusters’ and the BEI peptides are provided.

There is no reason to expect that the overlapping peptides provided by BEI are optimally configured for HLA presentation, and in fact they are not. If the desired binding 9-mer is too close to either the N- or C-terminus, the MHC-peptide interaction might be ablated [16]. Important epitopes such as the ones contained in peptides 37-51, 61-75, and 533-547 are presented at the N-terminal or C-terminal extreme of the peptide, while many others, such as 117-131 and 165-179, have the most important 9-mer in the second or second-to-last frame of the peptide. When successive peptides are offset by four residues (overlapping by 11), as they are in the BEI set, stabilizing flanks may be missing, and immunogenicity could be reduced.
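The flanking argument can be made concrete with a little arithmetic. The sketch below computes how many residues flank a 9-mer frame on each side within a given peptide. The function names are hypothetical, and the three-residue minimum is taken from the flank length mentioned in the ClustiMer discussion above. Frame positions used in the example are illustrative and are not taken from the analysis files.

```python
def flank_lengths(peptide_start, peptide_end, frame_start, frame_width=9):
    """Return (n_flank, c_flank): residues before and after the frame.

    Positions are 1-based and inclusive, numbered along the parent protein.
    """
    frame_end = frame_start + frame_width - 1
    if frame_start < peptide_start or frame_end > peptide_end:
        raise ValueError("frame does not lie within the peptide")
    return frame_start - peptide_start, peptide_end - frame_end


def has_full_flanks(peptide_start, peptide_end, frame_start, min_flank=3):
    """True when both flanks contain at least min_flank residues."""
    n_flank, c_flank = flank_lengths(peptide_start, peptide_end, frame_start)
    return n_flank >= min_flank and c_flank >= min_flank


# A 15-mer spanning 37-51 holds seven 9-mer frames (starts 37 through 43).
for start in range(37, 44):
    print(start, flank_lengths(37, 51, start), has_full_flanks(37, 51, start))
```

Because 9 + 3 + 3 = 15, only the central frame of a 15-mer has three flanking residues on both sides; a frame at either end of the peptide has no flank at all on one side.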

We know from our own research that sub-optimally configured peptides lead to sub-optimal HLA binding and T-cell activation. We suggest that the epitopes identified in this analysis be synthesized in addition to the epitopes provided through BEI. In silico predictive tools, like EpiMatrix, could be used to prospectively design efficient epitope sets that are more likely to be immunogenic [6].


[1] De Groot, AS, Jesdale, BM, Szu, E, Schafer, JR. An interactive web site providing MHC ligand predictions: application to HIV research. AIDS Res. and Human Retroviruses. 1997;13:539-541.

[2] Schafer JA, Jesdale BM, George JA, Kouttab NM, De Groot, AS. Prediction of well-conserved HIV-1 ligands using a Matrix-based Algorithm, EpiMatrix. Vaccine. 1998;16(19):1880-1884.

[3] Sturniolo, T, Bono, E, Ding, J, Raddrizzani, L, Tuereci, O, Sahin, U, Braxenthaler, M, Gallazzi, F, Protti, MP, Sinigaglia, F, and Hammer, J. Generation of tissue-specific and promiscuous HLA ligand databases using DNA microarrays and virtual HLA class II matrices. Nature Biotech. 1999;17:555-561.

[4] De Groot AS, Knopf PM, Martin B. De-immunization of therapeutic proteins by T cell epitope modification. Mire-Sluis, A. Ed. State of the Art Analytical Methods for the Characterization of Biological Products and Assessment of Comparability. Dev. Biol. Basel, Karger, 2005;122:137-160.

[5] De Groot AS, McMurry J, Marcon L, Franco J, Rivera D, Kutzler M, Weiner D, Martin B. Developing an epitope-driven tuberculosis (TB) vaccine. Vaccine. 2005;23:2121-31.

[6] Moise L, McMurry JA, Buus S, Frey S, Martin WD, De Groot AS. In Silico-Accelerated Identification of Conserved and Immunogenic Variola/Vaccinia T-Cell Epitopes. Vaccine. 2009 Oct 30;27(46):6471-9.

[7] De Groot AS, Martin W. Reducing risk, improving outcomes: bioengineering less immunogenic protein therapeutics. Clin Immunol. 2009 May;131(2):189-201.

[8] Meister GE, Roberts CG, Berzofsky JA, De Groot AS. Two novel T cell epitope prediction algorithms based on MHC-binding motifs; comparison of predicted and published epitopes from Mtb and HIV protein sequences. Vaccine. 1995;(6):581-91.

[9] Bond KB, Sriwanthana B, Hodge TW, De Groot AS, Mastro TD, Young NL, Promadej N, Altman JD, Limpakarnjanarat K, McNicholl JM. An HLA-directed molecular and bioinformatics approach identifies new HLA-A11 HIV-1 subtype E cytotoxic T lymphocyte epitopes in HIV-1-infected Thais. AIDS Res Hum Retroviruses. 2001;17(8):703-17.

[10] Dong, Y, Dimaria, S, Sun, X, Jesdale, BM, De Groot, AS, Rom, WN, Bushkin, Y. HLA-A2-restricted CD8+ cytotoxic T cell responses to novel Mycobacterium tuberculosis targets superoxide dismutase and alanine dehydrogenase. Infect. Immun. 2004;72(4):2412-5.

[11] McMurry JA, Gregory SH, Moise L, Rivera D, Buus S, De Groot AS. Diversity of Francisella tularensis Schu4 antigens recognized by T lymphocytes after natural infections in humans: identification of candidate epitopes for inclusion in a rationally designed tularemia vaccine. Vaccine. 2007;25:3179-91.

[12] Otero M, Calarota SA, Dai A, De Groot AS, Boyer JD, Weiner DB. Efficacy of novel plasmid DNA encoding vaccinia antigens in improving current smallpox vaccination strategy. Vaccine. 2006;24:4461-70.

[13] Sette A, Sidney J. HLA supertypes and supermotifs: a functional perspective on HLA polymorphism. Curr. Op. Immunol. 1998;10:478.

[14] Panina-Bordignon P, Tan A, Termijtelen A, Demotz S, Corradin G, Lanzavecchia A. Universally immunogenic T cell epitopes: promiscuous recognition by T cells. European Journal of Immunology. 1989;19:2237-2242.

[15] De Groot AS, Clerici M, Hosmalin A, Hughes SH, Barnd D, Hendrix CW, Houghten R, Shearer GM, Berzofsky JA. Human Immunodeficiency virus reverse transcriptase T helper epitopes identified in mice and humans: correlation with a cytotoxic T cell epitope. J. Infect Dis. 1991;164:1058-1065.

[16] Godkin AJ, Smith KJ, Willis A, Tejada-Simon MV, Zhang J, Elliott T, Hill AV. Naturally processed HLA class II peptides reveal highly conserved immunogenic flanking region sequence preferences that reflect antigen processing rather than peptide-MHC interactions. J Immunol. 2001 Jun 1;166(11):6720-7.