Lineage‐specific epitope profiles for HPAI H5 pre‐pandemic vaccine selection and evaluation

Background Multiple highly pathogenic avian influenza (HPAI) H5 viruses continue to co‐circulate. This has complicated pandemic preparedness and confounded effective vaccine candidate selection and evaluation. Objectives In this study, we aimed to predict and map the diversity of CD8+ T‐cell epitopes among H5 hemagglutinin (HA) gene lineages to estimate CD8+ T‐cell immunity in humans induced by vaccine candidates. Methods A dataset consisting of 1125 H5 HA sequences collected between 1996 and 2017 from avian and human hosts was assembled for phylogenetic and lineage‐specific epitope analyses. Conserved epitopes were predicted from WHO‐endorsed vaccine candidates and representative clade‐defining strains by pairwise comparison with the Immune Epitope Database (IEDB). The distribution of predicted epitopes was mapped to each HPAI H5 lineage. We assume that high similarity and conservancy of predicted epitopes from vaccine candidates among all circulating HPAI H5 lineages is correlated with high immunity. Results A total of 49 conserved CD8+ T‐cell epitopes were predicted at 28 different amino acid positions of the HA protein. Mapping these epitopes to the phylogenetic tree allowed us to develop epitope profiles, or “fingerprints,” for each HPAI H5 lineage. Vaccine epitope percentage analyses showed that some epitope profiles were highly conserved for all H5 isolates and may be valuable for universal vaccine design. However, the positions with low coverage may explain why the vaccine candidates do not always function well. Conclusions These findings demonstrate that our analytical approach to evaluate conserved CD8+ T‐cell epitope prediction in a phylogenetic framework may provide important insights for computational design of vaccine selection and future epitope‐based design.


| INTRODUCTION
Highly pathogenic avian influenza (HPAI) H5 virus causes a highly infectious disease in birds and severe respiratory infections in humans. 1 The HPAI H5N1 virus was first isolated from geese (prototype virus: A/goose/Guangdong/1/96) in China in 1996, and the first human infection occurred in Hong Kong in 1997. 2 Since its emergence, HPAI H5N1 virus has rapidly evolved into at least ten co-circulating lineages infecting many different host species. 3 HPAI H5 is endemic in domestic poultry populations throughout China, South-East Asia, and the Middle East, with sporadic human outbreaks occurring in some countries. 4 The virus has also re-emerged numerous times, spilling over from wild birds to cause outbreaks in domestic populations in Europe, Africa, and North America. 4-7 Control of these outbreaks has resulted in hundreds of millions of domestic birds dying from illness or being culled. 8,9 Since 2003, the estimates of global loss from these outbreaks run into the billions of dollars. 9,10 In December 2014, a novel virus carrying a different neuraminidase type (N2) but containing the HA gene from HPAI H5N1 viruses was detected in North America, leading to a widespread multistate outbreak among domestic poultry. 9,11 In addition to the risk of human infections and pandemic emergence, the loss of protein from decimated domestic flocks in H5 endemic regions is a severe threat to human health for resource-challenged communities. 12 A human-adapted pandemic H5 virus may emerge, either through reassortment with circulating human-adapted viruses or through evolutionary selection following repeated human infection. Even though human infection with HPAI H5 is relatively rare, when it does occur, the case fatality rate is high. From 2003 to 2014, 667 human cases were identified, of which 58.9% were fatal. 13
Although there is no evidence of sustained human-to-human transmission, the rapid emergence of novel strains of H5 viruses from animal reservoirs combined with the sporadic human infections indicates a high risk for pandemic HPAI H5 emergence. Two recent studies independently demonstrated that as few as three to five mutations in HPAI H5N1 viruses could allow effective aerosol transmission between ferrets. 14,15 Furthermore, about 50% of novel HPAI H5 strains generated from reassortment between contemporary avian H5N1 and human H3N2 showed a high degree of compatibility and potential virulence. 16 Effective vaccines are important to mitigate the pandemic threat of HPAI H5 virus. For H5 virus, pre-pandemic vaccine candidate strains are selected based on genetic surveillance data and comparative antigenic profiles. 17 Viruses that stimulate broadly reactive antibodies against diverse strains should be ideal vaccine candidates. 18 Highly effective vaccine candidates for HPAI H5 are difficult to predict due to co-circulation of diverse lineages and genotypes.
One possible approach to develop highly cross-reactive vaccine candidates for HPAI H5 is based on the cytotoxic T-lymphocytes (CD8+ T cells), which tend to target more conserved influenza virus proteins (epitopes). This approach takes advantage of the memory CD8+ T cells that develop in response to vaccination or previous infection. 19,20 Evolutionarily conserved viral epitopes may be recognized by T cells or B cells/antibodies that have been previously primed by circulating influenza strains and can provide broad protection across different influenza types (also known as preexisting immunity). 21 The H5 HA protein is a major surface protein that plays a pivotal role in viral infection and is the primary immune target. Given the high diversity of HPAI H5 lineages, CD8+ T-cell epitopes that are conserved across lineages may provide insights into whether a vaccine candidate may induce protection against other diverse or drifted HPAI viruses.
CD8+ T-cell responses are important for viral clearance, promoting disease recovery and reducing disease severity. 22 Evidence from human 23,24 and animal 25 studies shows that preexisting memory CD8+ T-cell responses directed at conserved and/or cross-reactive epitopes can protect subjects from severe influenza infection and provide a measure of protection for the immune-naive host. 24 Although some epitope analyses, such as for H7N9, 26 H3N2, 27 and H1N1, 21,26-28 have been reported, no systematic studies of conserved HPAI H5 CD8+ T-cell epitopes have been conducted.
In this study, we described the phylogenetic distribution of HPAI H5N1 epitopes binding to HLA-A and recognized by human CD8+ T cells and developed lineage-specific epitope profiles. Lineage-specific epitope profiles based on similarity to vaccine candidate epitopes may be used to evaluate the vaccine candidates for their potential capability of inducing memory CD8+ T-cell responses in humans. Furthermore, understanding the spatial and phylogenetic distribution of conserved and variable epitopes should be integrated into pre-pandemic planning and vaccine selection.

Nucleotide sequences were aligned using MUSCLE v3.8. 29 After subsampling (criteria described in supplemental materials), a representative dataset was determined (n=1095; details in Table S3). In addition, we included 30 vaccine candidate strains selected by WHO for HPAI H5Nx 30 in our final dataset (n=1125; full taxa in Fig. S2).

| Conserved epitope prediction and Conservancy analysis
WHO vaccine candidates and clade-defining reference strains of HPAI H5 viruses were used to predict all possible conserved virus epitopes recognized by human CD8+ T cells based on the availability of a database of previously identified epitope regions. 31,32 Beginning with the start codon (ATG), nucleotide sequences of these regions were translated into protein sequences in BioEdit v7.2.5. 33 The protein sequences were uploaded into IEDB MHC-I Binding Predictions (http://tools.immuneepitope.org/mhci/) to predict epitopes that bind to human class I peptide-MHC supertypes. Conserved epitopes were predicted by searching the translated HA protein, where each nine-amino acid (aa) chain 34 was compared to previously identified CD8+ T-cell epitopes of influenza virus based on pairwise similarity.
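The nine-residue scan described above can be sketched in a few lines. This is a minimal illustration only: the function names (`nine_mers`, `identity`, `best_match`), the short toy sequence, and the toy reference peptide are hypothetical and are not taken from the study, and simple per-residue identity is assumed here as a stand-in for whatever similarity measure the IEDB tools apply.

```python
from typing import Iterator, List, Tuple


def nine_mers(protein: str, k: int = 9) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start position, peptide) for every k-residue window."""
    for i in range(len(protein) - k + 1):
        yield i + 1, protein[i:i + k]


def identity(a: str, b: str) -> float:
    """Fraction of identical residues between two equal-length peptides."""
    if len(a) != len(b):
        raise ValueError("peptides must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_match(peptide: str, known: List[str]) -> Tuple[str, float]:
    """Return the reference epitope most similar to `peptide` and its identity."""
    return max(((ref, identity(peptide, ref)) for ref in known),
               key=lambda pair: pair[1])


# Toy inputs (assumptions for illustration, not study data).
toy_protein = "DQICIGYHANNSTEQ"
toy_known = ["ICIGYHANN"]

windows = list(nine_mers(toy_protein))
for start, peptide in windows:
    ref, score = best_match(peptide, toy_known)
    print(start, peptide, ref, round(score, 2))
```

Each window is labeled by the position of its first residue, which mirrors how the epitopes are named in this study.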
Artificial neural network binding affinity (ANN IC50) is one of the indices used to measure the computational affinity of each epitope for HLA-A. Epitopes with an IC50 of less than 50 nmol/L were recognized as strong-binding, while an IC50 between 50 and 500 nmol/L was considered weak-binding. 34 For convenience, 9-aa epitopes were named based on the position of their first amino acid in the HA protein chain, numbered after removing the signal peptide so that the chain begins with aspartic acid (translated from nucleotides GAT) (ie, name corresponds to aa position 1-552). The predicted epitopes were compared to the HA alignment of the 1125-taxon dataset described above, in order to identify epitopes conserved among circulating H5 strains 35 (http://tools.immuneepitope.org/tools/conservancy/iedb_input). From this analysis, conserved epitopes corresponding to the predicted 9-aa epitope could be identified and mapped on to the estimated phylogeny.
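The IC50 thresholds above translate directly into a small classifier. The sketch below is illustrative: the function name `classify_binding`, the label "non-binder" for values above 500 nmol/L, and the treatment of exactly 500 nmol/L as weak-binding are assumptions, since the text specifies only the two ranges.

```python
def classify_binding(ic50_nmol_per_l: float) -> str:
    """Classify a predicted peptide by its ANN IC50 (nmol/L).

    < 50        -> "strong"      (threshold from the text)
    50 to 500   -> "weak"        (upper bound inclusive: an assumption)
    > 500       -> "non-binder"  (label is an assumption)
    """
    if ic50_nmol_per_l < 0:
        raise ValueError("IC50 cannot be negative")
    if ic50_nmol_per_l < 50:
        return "strong"
    if ic50_nmol_per_l <= 500:
        return "weak"
    return "non-binder"


# Hypothetical IC50 values, used only to exercise the thresholds.
for value in (12.5, 50.0, 320.0, 750.0):
    print(value, classify_binding(value))
```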

| Validation of predicted epitopes against experimentally defined epitope
The predicted CD8+ T-cell epitopes of H5 HA protein in Table 2, column Epitope 1 (other mutants of epitopes in columns Epitope 2-5 were not included in this comparison analysis), were screened against the Immune Epitope Database (IEDB) (www.immuneepitope.org) repository through the Influenza Research Database (https://www.fludb.org/), which contains the experimentally defined epitope information present in the published literature. Comparing the predicted CD8+ T-cell epitopes against these experimentally confirmed epitopes would help in identifying those epitopes that are evolutionarily conserved. We assume that epitopes that are conserved across lineages will induce some degree of immune protection against various lineages.

| Phylogenetic and coalescent analysis
The final dataset of 1125 nucleotide sequences of the HA gene was used to reconstruct the phylogeny of HPAI H5 with both maximum-likelihood (ML) and Bayesian phylogenetic methods. The ML phylogenetic tree was inferred with RAxML using the GTR-GAMMA model. 36 No bootstrap analysis was conducted. The best-scoring ML tree was automatically generated from three runs by RAxML, and vis-

| Conserved epitope mapping
The distribution of conserved CD8+ T-cell epitopes across all HPAI H5 isolates was mapped onto the ML tree by new grouped clades (S-part II). The percentages of conserved epitopes presented in each clade were calculated and displayed as a pie graph for each epitope position. For diverse epitopes in one position, the percentage of each epitope was calculated, and the pie graph contained different colors to represent different epitopes in Table 2.
We defined the isolates from 01/01/2012 to 03/31/2017 as currently circulating strains. To understand the distribution of conserved CD8+ T-cell epitopes in these circulating strains at different geographic locations, a similar strategy of epitope mapping as above was used. Only the percentages of conserved epitopes were calculated for each geographic location, not for each clade.
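The percentage calculations behind the pie graphs can be sketched as a simple grouped count. This is a hedged illustration, not the authors' code: the record layout (`clade`, `location`, `collected`, `epitope_code`), the function names, and the toy records are all hypothetical; only the date window (01/01/2012 to 03/31/2017) and the idea of grouping by clade or by geographic location come from the text.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, List


def epitope_percentages(records: List[dict], group_key: str) -> Dict[str, Dict[str, float]]:
    """Percentage of each epitope code within each group (clade or location)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for rec in records:
        counts[rec[group_key]][rec["epitope_code"]] += 1
    return {
        group: {code: 100.0 * n / sum(c.values()) for code, n in c.items()}
        for group, c in counts.items()
    }


def is_current(collected: date) -> bool:
    """True if the isolate falls in the 'currently circulating' window."""
    return date(2012, 1, 1) <= collected <= date(2017, 3, 31)


# Toy records for one epitope position (assumed data, for illustration only).
records = [
    {"clade": "2.2", "location": "A", "collected": date(2013, 5, 1), "epitope_code": "1"},
    {"clade": "2.2", "location": "A", "collected": date(2006, 1, 1), "epitope_code": "2"},
    {"clade": "2.2", "location": "B", "collected": date(2015, 1, 1), "epitope_code": "1"},
    {"clade": "1", "location": "B", "collected": date(2016, 1, 1), "epitope_code": "1"},
]

by_clade = epitope_percentages(records, "clade")            # all isolates, per clade
current = [r for r in records if is_current(r["collected"])]
by_location = epitope_percentages(current, "location")      # current isolates, per location
print(by_clade)
print(by_location)
```

Running the same function once per epitope position would give one pie graph per position and group.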

| Clade-specific epitope profile
Clade-specific epitope profiles were calculated to examine how effectively the vaccine candidates can simulate conserved CD8+ T-cell epitopes across the diverse HPAI H5 viruses. For this analysis, we assume that high similarity and conservancy of predicted epitopes from vaccine candidates among all HPAI H5 lineages is correlated with high immunity. Thirty vaccine candidates covered clades 1, 2.1, 2.2, 2.3, 4, and 7; therefore, epitope profiles were created for these groups.
Conserved epitopes in one specific vaccine strain were examined to determine their degree of similarity to all HPAI H5 isolates in the vaccine-corresponding clade. Clade-specific epitope profile results were reported as the proportion of H5 strains in each clade that matched each epitope in this vaccine candidate. For ease of interpretation, the individual proportion for each epitope in the vaccine candidates is shown as a heat map. Mean epitope coverage, as the measure of the overall capability of one vaccine candidate to induce CD8+ T-cell immunity in humans, was calculated by averaging the proportions across all epitope positions.
Further, the epitope profile was created for currently circulating H5 strains. The same strategy as above was employed, but the denominator for the proportion is the number of currently circulating strains in each clade. The vaccine candidates selected for clades no longer circulating were not included in the epitope profile analysis.
The heat map of epitope profile for currently circulating strains is displayed in Fig. S1.
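The clade-specific profile and mean epitope coverage described in this section amount to a per-position proportion followed by an average. The sketch below assumes, for simplicity, that a strain "matches" when its peptide at a position is identical to the vaccine candidate's peptide; the study's similarity criterion may be looser. The function names, positions, and placeholder peptides are hypothetical.

```python
from typing import Dict, List


def epitope_profile(vaccine: Dict[int, str],
                    strains: List[Dict[int, str]]) -> Dict[int, float]:
    """Proportion of strains carrying the vaccine's peptide at each epitope position."""
    if not strains:
        raise ValueError("at least one strain is required")
    return {
        pos: sum(strain.get(pos) == peptide for strain in strains) / len(strains)
        for pos, peptide in vaccine.items()
    }


def mean_coverage(profile: Dict[int, float]) -> float:
    """Average of the per-position proportions (mean epitope coverage)."""
    return sum(profile.values()) / len(profile)


# Placeholder peptides (not real epitopes), keyed by epitope position.
vaccine = {10: "AAAAAAAAA", 43: "CCCCCCCCC"}
strains = [
    {10: "AAAAAAAAA", 43: "CCCCCCCCC"},
    {10: "AAAAAAAAA", 43: "CCCCCCCCD"},
    {10: "AAAAAAAAA", 43: "CCCCCCCCC"},
    {10: "AAAAAAAAB", 43: "CCCCCCCCD"},
]

profile = epitope_profile(vaccine, strains)
print(profile, mean_coverage(profile))
```

To reproduce the profile for currently circulating strains, `strains` would be restricted to isolates collected in the study's 2012 to 2017 window, which changes the denominator as described above.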

| Phylogenetic structure and evolutionary history of HPAI H5
Bayesian simulation of phylogenetic history using 1125 full-length H5 HA nucleotide sequences (Figure 1) showed lineage diversification consistent with the WHO/OIE/FAO H5N1 classification system. 40 The major clades, such as clades 1, 2.1, 2.2, 2.3, and 7, persisted through to the most recent sampling date included. A number of smaller clades did not persist and became extinct before 2010. The times of the most recent common ancestors (TMRCAs) for circulating HPAI H5 clades are presented in Table 1.
Contemporary strains, defined as those isolated between 01/01/2012 and 03/31/2017, comprised 253 isolates. They were found

FIGURE 1 Bayesian relaxed clock phylogenetic tree highlighting HPAI H5 strains isolated since 2012. Red branches represent vaccine candidates. Strains isolated since 2012 are defined as currently circulating strains in this study. Tree nodes annotated with stars are the most recent common ancestors between currently circulating strains and their closest vaccine candidates. TMRCAs are shown in Table 3.

Fewer (14.62%) belonged to clade 2.2, which displayed a ladder-like pattern, suggesting that most variants did not persist in the populations or regions surveyed. While 9.49% of these isolates belonged to clade 1, the remainder belonged to clades 7 (3.56%) and 2.1 (2.37%).

| Prediction of conserved CD8+ T-cell epitopes
Based on the 30 vaccine candidates selected by the WHO (Table S4) and 28 clade-defining strains (Table S5)

The study aimed to provide information for vaccine development, and the number of conserved viral epitopes predicted to react with CD8+ T cells was specifically displayed for each vaccine candidate and its clade category. The number of conserved epitopes in each vaccine strain was similar across different clades; that is, each vaccine candidate contained 13-21 conserved viral epitopes recognized by CD8+ T cells, with two to four strong-binding epitopes. In contrast, the number of weak-binding epitopes was approximately three to five times that of the strong-binding epitopes for each vaccine. Importantly, the epitopes that we identified could be validated through comparison with experimentally identified CD4+ T-cell epitopes, revealing that 43% (12/28) of our predicted CD8+ T-cell epitopes in Table 2, column Epitope 1, were conserved with previously identified CD4+ T-cell epitopes (Table S6).

| Epitope distribution across all H5 clades on phylogenetic tree
The distribution of CD8+ T-cell epitopes across all HPAI H5 clades was mapped to the tips of the ML phylogenetic tree (Figure 2). The

| Epitope profile of clade-specific vaccine candidates
Epitope profiles were examined for HPAI H5 isolates to estimate the capability of each vaccine candidate to induce immunity. All 30 vaccine candidates were compared to all isolates to identify homologous epitope profiles in each corresponding clade (Figure 4). For the 29 vaccine candidates of clades 1, 2.1-2.3, and 7, the profile was calculated only for currently circulating strains (Fig. S1).
The majority of epitope positions in each vaccine candidate had very high proportions (blue, Figure 4) that simulate all clade-specific

TABLE 2 Strong- and weak-binding HPAI H5 epitopes included in this study. Conventionally, the position of the starting amino acid along the HA protein sequence after removing the signal peptide was used to identify the epitope. Bold and underlined format represents a strong-binding epitope. The epitope in each column was coded as "1," "2," "3," "4," "5," accordingly. For example, in the column "EPITOPE 2," if epitope SLDGVKPI was present in position 43 in a certain H5N1 strain, then it was coded as "2" for position 43.

Further epitope profile analysis for currently circulating H5 strains showed similar results (Fig. S1). To connect epitope profile with phylogenetic information, the TMRCAs between the currently circulating strains and their closest vaccine candidate (Table 3) were reported for each clade based on the Bayesian phylogenetic tree (black stars, Figure 1). Generally, the majority of TMRCAs were close to recent years ranging from January 2008 to

| DISCUSSION
In this study, we estimated the diversity and spatial distribution of HPAI H5 epitopes binding to HLA-A and recognized by human CD8+  12,41 The potential severity of an H5 pandemic remains a major concern. 42 One strategy to mitigate large-scale loss of life is to stockpile effective vaccine candidates that can be deployed in the event of an H5 pandemic occurrence. Combining our phylogenetic analysis with lineage-specific mapping of CD8+ T-cell epitopes across all HPAI H5 clades provides a unique CD8+ T-cell epitope fingerprint that allows for important conserved characteristics to be easily identified. In addition, data generated through genetic surveillance may be utilized for crude, but rapid predictions of effective vaccine candidates against emerging lineages and pandemic risk assessment.
An ideal pre-pandemic vaccine candidate strain would contain all important characteristics of specific strains to be effective, yet still share enough characteristics with other circulating viruses to induce broad protection against as many potentially pandemic viruses as possible. 43 CD4+/CD8+ T-cell epitopes represent heritable phenotypic characteristics that are recognized by the host immune system. CD8+ T-cell epitopes for internal genes and CD4+ T-cell epitopes for HA were mostly studied for epitope-based vaccine design. 19,22,44,45 Studies showed that influenza HA does have conserved CD8+ T-cell epitopes. 21 For the evaluation of immunity induction of vaccine candidates selected by the WHO, our results of highly conserved epitopes indicate potential cross-reactivity of vaccine candidates to circulating viruses in other clades. In our analysis, assuming that high similarity and conservancy of predicted epitopes from vaccine candidates among all HPAI H5 lineages is correlated with high immunity, we can predict that vaccine candidates in clades 2.3 and 7 might not stimulate broadly reactive immune protection. Vaccine candidates are often selected based on antigenic divergence from the population mean. 48 However, the rate of antigenic divergence is a function of epidemic size, duration of outbreaks, and geographic structuring of lineages. 48,49 In addition, the long-term circulation of some lineages may result in older vaccine candidates stimulating less specific epitope coverage for currently circulating strains due to genetic drift. 50

TABLE 3 The divergence times estimated between currently circulating HPAI H5 strains and their closest vaccine candidate. Nodes: Ancestral nodes between currently circulating strains and their closest vaccine candidate, marked as black stars in Figure 1. MC, Mean overall coverage for each vaccine candidate from Fig. S1. a The dates are presented as decimal years. TMRCA, the time of most recent common ancestor; 95% BCI, 95% Bayesian credible interval.

Legend: Non-reactive epitope; Epitope 1 in Table 2; Epitope 2 in Table 2; Epitope 3 in Table 2; Epitope 4 in Table 2; Epitope 5 in Table 2; Missing data

to balance the contribution of outbreaks from different locations and years. In addition, the prediction of human epitopes is based on vaccine candidates that have primarily been collected from avian hosts.
Host adaptation mutations, including silent mutations associated with changes in codon usage bias, have not been considered in our epitope prediction. Moreover, an analysis focused only on HA protein has limitations. 52 Many studies 19,52 have reported the immunodominance of T-cell responses to epitopes derived from internal proteins, as these proteins are highly conserved. Hence, to identify all potential vaccine candidate epitopes (from all types of proteins) for HPAI H5, it would be worth conducting further studies that consider the inductive potential of other viral proteins. Even though conserved CD8+ T-cell epitopes were predicted from genetic motifs, which may provide insight into developing a universal vaccine, inaccuracies from epitope prediction programs may generate misleading results that need to be confirmed experimentally. 43,53 A clear limitation of sequence-based inference is that the utility of these highly conserved epitopes to induce broadly protective immunity needs to be experimentally tested.
Studies in animal models or in vitro human CD8+ T cells are necessary and will help confirm the true immunogenic potential of HA epitopes.
Currently, commercial inactivated whole-virus vaccines are produced by exposing the virus of interest to a cross-linking agent such as formaldehyde or an alkylating agent such as beta-propiolactone.
Incorporating the diversity of predicted epitopes may be useful for computationally optimized design of broadly cross-reactive vaccine candidates, including the development of T-cell epitope-based vaccines. 54 With experimental confirmation of true immunogenic CD8+ T-cell epitopes, the spatial and phylogenetic distribution of conserved and variable epitopes should be integrated into pre-pandemic planning and vaccine selection to ensure broad reactivity and potential effectiveness. The frequency of expression of different HLA alleles varies across different ethnicities. 55 Future studies integrating HLA allele frequency, host demographics, epitope predictions, the whole viral proteome, and viral genomic diversity into computational models may provide novel insights into vaccine effectiveness and risk assessment of a potential pandemic.

ACKNOWLEDGEMENT
This study was partially funded by the National Institutes of Health (NIH) Centers for Excellence in Influenza Research and Surveillance (contract #HHSN272201400006C). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Table 2 for each position

Legend: Non-reactive epitope; Epitope 1 in Table 2; Epitope 2 in Table 2; Epitope 3 in Table 2; Epitope 4 in Table 2; Epitope 5 in Table 2; Missing data

FIGURE 4 Heat map showing homology to epitope profiles predicted from clade-specific vaccine candidates among contemporary HPAI H5 circulating viruses. Clade-specific epitope profile reported as the proportion of H5 strains in each clade with high similarity to epitope profiles of the vaccine candidate. The color range represents the proportion from 0.00 to 1.00 in the heat map. The mean epitope coverage was calculated by averaging the proportions across the different epitope positions.