A serum protein signature of APOE genotypes in centenarians

Abstract
The discovery of treatments to prevent or delay dementia and Alzheimer's disease is a priority. The gene APOE is associated with cognitive change and late-onset Alzheimer's disease, and epidemiological studies have provided strong evidence that the ε2 allele of APOE has a neuroprotective effect and is associated with increased longevity and an extended healthy lifespan in centenarians. In this study, we correlated APOE genotype data of 222 participants of the New England Centenarian Study (75 centenarians, 82 centenarian offspring, and 65 controls, including 55 carriers of APOE ε2) with aptamer-based serum proteomics (SomaLogic technology) of 4,785 human proteins corresponding to 4,137 genes. We discovered a signature of 16 proteins associated with different APOE genotypes and replicated the signature in three independent studies. We also show that the protein signature tracks with gene expression profiles in brains of patients with late-onset Alzheimer's disease versus healthy controls. Finally, we show that seven of these proteins correlate with cognitive function patterns in longitudinally collected data. In particular, this analysis suggests that baculoviral IAP repeat containing 2 (BIRC2) is a novel biomarker of neuroprotection associated with the neuroprotective allele of APOE. Therefore, molecular targeting of APOE ε2 may help preserve cognitive function.

cognitive function (Corder et al., 1993; Liu, Liu, Kanekiyo, Xu, & Bu, 2013). The ε3 allele is the "neutral" allele in many ethnic groups, while ε2, the least common allele, emerged as a longevity variant when Schachter et al. noted an increased frequency of ε2 in French centenarians (Schachter et al., 1994). Since then, several studies have provided evidence that ε2 has a neuroprotective effect (Kim et al., 2017), decreases neuroinflammation (Dorey, Chang, Liu, Yang, & Zhang, 2014), and promotes longevity (Sebastiani et al., 2018) and healthy aging (Kulminski et al., 2016; Wu & Zhao, 2016). We therefore hypothesize that identifying the targets of this allele may lead to the discovery of treatments that help maintain good cognitive function and avoid cognitive impairment with aging.
Despite multiple research efforts, the biological mechanisms associated with variants of APOE, particularly the e 2 allele, are still unclear.
One strategy to understand the pathways linking APOE alleles to phenotypes is to examine their biological products, starting, for example, from the genes that are in cis and in trans with APOE alleles. The challenges of these analyses are the tissue specificity of the results, the relative rarity of carriers of the ε2 allele in the population, and the fact that relevant tissues such as brain are not easily accessible. Recent studies have shown that APOE and additional proteins associated with APOE genotypes can be detected in serum (Emilsson et al., 2018) and plasma (Muenchhoff et al., 2017; Rezeli et al., 2015; Simon et al., 2012; Sun et al., 2018). This opens new research avenues both to decipher the mechanisms linking genotypes to phenotypes and to provide sensitive, noninvasive biomarkers of AD, of the progression of cognitive decline, or of protection from these phenotypes.
In this study, we leveraged the over-representation of ε2 in centenarians and their offspring to correlate APOE genotype data of 222 participants of the New England Centenarian Study (75 centenarians, 82 centenarian offspring, and 65 controls, including 55 carriers of APOE ε2) with aptamer-based serum proteomics (SomaLogic technology) of 4,785 human proteins corresponding to 4,137 genes. We discovered and replicated a list of 16 proteins that are associated with different APOE genotypes and map to distinct gene expression profiles in brains of LOAD patients and healthy controls. We also show that some of the proteins in the signature correlate with patterns of cognitive function.

| NECS
The New England Centenarian Study (NECS) is a study of centenarians, their long-lived siblings, offspring, and controls, who are either individuals with one parent who died at age 72–74 (the average life expectancy for the centenarian birth cohort) or spouses of centenarian offspring (Sebastiani & Perls, 2012). The study began by recruiting centenarians in the Boston metropolitan area in 1994 and expanded in the late 1990s to include participants from across North America and from other English-speaking countries. The age of participants is carefully validated (Young et al., 2010), and participants are followed up annually to assess their health, physical function, and cognitive function. Cognitive function is assessed annually in centenarians using the 37-point Blessed Information-Memory-Concentration (BIMC) test (Kawas, Karagiozis, Resau, Corrada, & Brookmeyer, 1995), and every other year in centenarians' offspring and controls using the Telephone Interview for Cognitive Status (TICS). TICS is based on 12 items with a maximum of 51 points and assesses orientation to time and place, episodic memory, language, and working memory (Brandt, Spencer, & Folstein, 1988). An abbreviated version of TICS, which includes the tasks of counting backwards, subtracting sevens, and immediate and delayed word-list recall with a maximum of 27 points, was validated against detailed in-person neuropsychological testing and clinician adjudication (Crimmins, Kim, Langa, & Weir, 2011). All subjects provided informed consent under protocols approved by the Boston University Medical Campus IRB.

| InCHIANTI
The Invecchiare in Chianti (InCHIANTI) study is a population-based prospective cohort study, conducted in the Chianti region of Tuscany, Italy, that aims to identify factors that influence mobility with age (Ferrucci et al., 2000). Briefly, 1,453 individuals aged 20 to 102 years were randomly selected from city registries. Blood and plasma samples collected after an overnight fast were stored for genomic DNA extraction and measurement of plasma proteins. The study protocol was approved by the Italian National
Serum samples were selected from 227 NECS participants (79 centenarians, 83 offspring, and 65 controls) who were alive at least 1 year after the blood draw and were free of major aging-related diseases for at least 1 year from the time of the blood draw (Table 1). The 227 serum samples from the NECS biorepository were assayed with 5,034 SOMAmers. To avoid biases from technical procedures and sample processing, the samples were randomized into analytic batches of 84 samples or fewer, and the plates were assayed as a set.
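To make the batching step concrete, the following is a minimal Python sketch of simple randomization into batches of at most 84 samples. It assumes unstratified random assignment into the fewest batches of near-equal size; the actual NECS design may also have balanced subject type or genotype across plates, and the sample IDs and function name are hypothetical.

```python
import math
import random


def randomize_batches(sample_ids, max_batch_size=84, seed=None):
    """Shuffle samples, then deal them into the fewest batches of size <= max_batch_size."""
    rng = random.Random(seed)
    ids = list(sample_ids)
    rng.shuffle(ids)
    n_batches = math.ceil(len(ids) / max_batch_size)
    # Dealing round-robin keeps batch sizes within one sample of each other.
    return [ids[i::n_batches] for i in range(n_batches)]


samples = [f"NECS_{i:03d}" for i in range(227)]  # hypothetical sample IDs
batches = randomize_batches(samples, seed=2018)
print([len(batch) for batch in batches])  # [76, 76, 75]
```

With 227 samples and a cap of 84, three batches are needed, and round-robin dealing yields sizes of 76, 76, and 75 rather than 84, 84, and 59.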
The SOMAscan results passed a quality control assessment of median intra- and interassay variability (CV ≤ 15%), similar to the variability previously reported for SOMAscan assays (Candia et al., 2017).
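As an illustration of this QC metric, the sketch below computes a per-analyte coefficient of variation (CV = standard deviation / mean) from replicate measurements and then takes the median across analytes. It assumes a (replicates × analytes) array of RFUs: for intra-assay CV the replicates would come from the same plate, and for inter-assay CV from the same control sample run on different plates. The RFU values are made up.

```python
import numpy as np


def cv_percent(replicate_rfu):
    """Per-analyte coefficient of variation (%) from a (replicates x analytes) RFU array."""
    x = np.asarray(replicate_rfu, dtype=float)
    return 100.0 * x.std(axis=0, ddof=1) / x.mean(axis=0)


# Hypothetical RFUs: three replicate measurements of one control sample, four analytes.
rfu = np.array([
    [1000.0, 520.0, 80.0, 15000.0],
    [1100.0, 480.0, 95.0, 14500.0],
    [900.0, 500.0, 70.0, 15500.0],
])
cv = cv_percent(rfu)
median_cv = float(np.median(cv))
print(np.round(cv, 1), median_cv)
print("passes QC" if median_cv <= 15.0 else "fails QC")
```

Note that one analyte here exceeds 15% on its own, but the median across analytes (7.0%) is what the QC criterion above refers to.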
Proteomic profiles of 987 plasma samples from InCHIANTI were assayed using the 1.3K SOMAscan Assay at the Trans-NIH Center for Human Immunology, Autoimmunity, and Inflammation (CHI), National Institute of Allergy and Infectious Diseases, National Institutes of Health. The experimental procedures used for proteomic assessment and normalization were consistent with previously reported experiments using the same technology (Tanaka et al., 2018). The relative abundance of proteins in the plasma samples corresponds to the abundance of bound SOMAmer reagents. The SOMAscan readout is expressed in relative fluorescence units (RFUs), which are directly proportional to the relative abundance of SOMAmer reagents.

| SNP genotyping
APOE alleles were inferred from SNPs rs7412 and rs429358, which were either genotyped using real-time PCR in 2,010 NECS participants or, in participants for whom additional DNA was not available but genome-wide genotype data were available, imputed using IMPUTE2. Genotype data were available for 222 subjects. In InCHIANTI, the two SNPs rs7412 and rs429358 were genotyped using a TaqMan assay (Applied Biosystems, Inc. [ABI]).
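The allele-calling step can be sketched as a lookup from the two SNPs: on the forward strand, the ε2 haplotype carries T at both rs429358 and rs7412, ε3 carries T at rs429358 and C at rs7412, and ε4 carries C at both. This minimal sketch assumes hard-called genotypes reported as forward-strand C/T allele pairs (imputed dosages would first need to be converted to best-guess calls) and resolves the phase-ambiguous double heterozygote as ε2ε4, ignoring the very rare ε1 haplotype. The function name and genotype labels are hypothetical.

```python
# Unphased genotype -> (number of rs429358 C alleles, number of rs7412 T alleles).
# Haplotypes: e2 = rs429358 T + rs7412 T; e3 = T + C; e4 = C + C.
APOE_LOOKUP = {
    (0, 0): "e3e3",
    (0, 1): "e2e3",
    (0, 2): "e2e2",
    (1, 0): "e3e4",
    (1, 1): "e2e4",  # double heterozygote; the alternative e1e3 is extremely rare
    (2, 0): "e4e4",
}


def apoe_genotype(rs429358: str, rs7412: str) -> str:
    """Call the APOE genotype from two unphased, forward-strand allele pairs such as 'CT'."""
    pairs = (rs429358.upper(), rs7412.upper())
    for p in pairs:
        if len(p) != 2 or not set(p) <= {"C", "T"}:
            raise ValueError("expected two C/T alleles per SNP")
    key = (pairs[0].count("C"), pairs[1].count("T"))
    # Combinations implying an e1 haplotype are not called.
    return APOE_LOOKUP.get(key, "rare/uncallable")


print(apoe_genotype("TT", "CT"))  # e2e3
print(apoe_genotype("CT", "CC"))  # e3e4
```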

| Statistical analysis
Using principal component analysis, we identified three outlier samples in the set of 227 and removed them from subsequent analyses. The levels of the 4,785 proteins were log-transformed and, for each protein, values more than three standard deviations from the mean were removed. The association of each protein with APOE genotypes was analyzed using a fixed-effect ANCOVA model adjusted for sex, age of the serum sample, and age of the participant at blood draw. Specifically, the following ANCOVA model was fitted for each analyte:
$$\log y_{\mathrm{protein}} \sim \beta_0 + \beta_{\mathrm{gender}}\, x_{\mathrm{gender}} + \beta_{\mathrm{year}}\, x_{\mathrm{year}} + \beta_{\mathrm{age}}\, x_{\mathrm{age}} + \sum_{g} \beta_g\, x_g,$$
where $x_{\mathrm{gender}}$, $x_{\mathrm{year}}$, and $x_{\mathrm{age}}$ denote sex, age of the serum sample, and age at blood draw; the dummy variables $x_g$ denote carriers of APOE genotype $g \in \{\varepsilon2\varepsilon2, \varepsilon2\varepsilon3, \varepsilon2\varepsilon4, \varepsilon3\varepsilon4\}$; and $\beta_g$ represents the log-transformed fold change of the analyte in carriers of genotype $g$ relative to carriers of the common genotype ε3ε3.
We selected significant proteins based on the F test of the four genotype coefficients, with 4 and 214 degrees of freedom, using a Benjamini-Hochberg false discovery rate (FDR) < 0.01 to correct for multiple testing. q-values were calculated using the qvalue package in R. For comparison with published protein quantitative trait loci (pQTLs), we analyzed the association between each of the two SNPs rs7412 and rs429358 and the levels of the significant proteins in the 222 NECS samples using linear regression of the log-transformed levels adjusted for sex and age at blood draw. We also analyzed the associations between SNPs rs6857, rs769449, and rs2075650 in the APOE locus and the levels of the significant proteins, adjusting for APOE genotype, sex, and age at blood draw. We chose these three SNPs because they have been reported in the longevity literature as possible genetic variants of longevity and healthy aging, with effects independent of the APOE alleles (Sebastiani et al., 2018). Finally, we correlated the log-transformed levels of the proteins associated with APOE genotypes with longitudinal change in TICS score using linear regression adjusted for APOE genotype, sex, age, and education, and we estimated the regression coefficients using generalized estimating equations to account for repeated measures.

| Replication
The results were replicated in three independent datasets.
Note: Numbers in the first row represent genotype counts, stratified by subject type. Other numbers are means with standard deviations in parentheses. Abbreviations: cent, centenarians; MI, myocardial infarction; offs, offspring; contr, controls.

| Replication in InCHIANTI
Five of the 16 proteins were measured in 987 plasma samples from InCHIANTI participants using the same SOMAscan technology on a platform with 1,301 proteins. The RFU values were natural log-transformed, and outliers beyond 3 SD were removed. The association between the log-transformed levels of the five proteins and APOE genotypes was analyzed using linear regression adjusted for age, sex, and study site (Chianti or Ripoli).
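The per-protein analysis shared by the discovery ANCOVA and this replication (natural-log transform, removal of values beyond 3 SD, a linear model with genotype dummies and covariates, a joint F test of the genotype terms, and Benjamini-Hochberg FDR control) can be sketched in Python with statsmodels on simulated data. This is not the authors' R code: the column names (apoe, sex, storage_years, age), the simulated effect sizes, and the helper functions are hypothetical, storage time stands in for the discovery covariates (InCHIANTI used study site instead), and the Storey q-values from the R qvalue package are not reproduced.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
n = 222
GENOTYPES = ["e3e3", "e2e2", "e2e3", "e2e4", "e3e4"]
covariates = pd.DataFrame({
    "apoe": rng.choice(GENOTYPES, size=n, p=[0.55, 0.05, 0.20, 0.05, 0.15]),
    "sex": rng.integers(0, 2, n),
    "storage_years": rng.uniform(1, 20, n),
    "age": rng.uniform(45, 114, n),
})


def preprocess(rfu):
    """Natural-log transform; set values more than 3 SD from the mean to NaN."""
    log_rfu = np.log(rfu)
    z = (log_rfu - log_rfu.mean()) / log_rfu.std()
    return log_rfu.where(z.abs() <= 3)


def genotype_f_test(log_protein):
    """Fit the per-protein model and jointly test all genotype dummies (reference e3e3)."""
    data = covariates.assign(y=log_protein)
    fit = smf.ols(
        "y ~ sex + storage_years + age + C(apoe, Treatment(reference='e3e3'))",
        data=data,
    ).fit()  # rows with NaN (trimmed outliers) are dropped by the formula interface
    is_geno = np.array([name.startswith("C(apoe") for name in fit.params.index])
    contrast = np.eye(len(fit.params))[is_geno]  # one restriction per genotype coefficient
    f_res = fit.f_test(contrast)
    return float(np.squeeze(f_res.pvalue)), fit


# Simulated RFUs: the first three proteins depend on genotype, the rest do not.
effects = {"e3e3": 0.0, "e2e2": 0.6, "e2e3": 0.4, "e2e4": 0.1, "e3e4": -0.3}
signal = covariates["apoe"].map(effects).to_numpy()
proteins = pd.DataFrame({
    f"protein_{j}": np.exp(7.0 + (signal if j < 3 else 0.0) + rng.normal(0, 0.25, n))
    for j in range(40)
})

pvalues, fits = [], []
for name in proteins.columns:
    p, fit = genotype_f_test(preprocess(proteins[name]))
    pvalues.append(p)
    fits.append(fit)

reject, p_bh, _, _ = multipletests(pvalues, alpha=0.01, method="fdr_bh")
print("significant at FDR < 0.01:", list(proteins.columns[reject]))
# exp(beta_g) is the fold change relative to e3e3 for the first protein.
print(np.exp(fits[0].params.filter(like="C(apoe")).round(2))
```

Testing the genotype coefficients jointly, rather than one at a time, is what gives the F test its 4 numerator degrees of freedom when all four non-reference genotypes are present.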

| Replication in published pQTLs in plasma and serum
We extracted all significant associations between SNPs rs7412 and rs429358 and cis- and trans-proteins discovered in 3,301 plasma samples (Sun et al., 2018) and in 5,457 serum samples (Emilsson et al., 2018), and compared them with the associations between the two SNPs and the levels of the 16 proteins in the 222 NECS samples. The results of the comparison are in Table 4.

| In silico validation
We evaluated the 16 genes corresponding to the APOE signature in gene expression data of postmortem brain tissue from three … using samples across all brain tissues, as well as within specific brain regions. Additionally, the skewness of the LOAD patients' distribution (that is, the over-representation of LOAD patients among lower APOE signature scores) was assessed by a nonparametric one-sample Kolmogorov-Smirnov (KS) test of deviation from the uniform distribution. The KS statistic and its significance were computed with the R function ks.test.
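The skewness test can be sketched as follows: rank all samples by signature score, map the ranks of the LOAD samples into (0, 1), and compare those positions with the uniform distribution, here using scipy.stats.kstest in place of R's ks.test. The rank-to-position mapping is an assumption about how the test was set up, ties are ignored, and the scores and group sizes are simulated for illustration only.

```python
import numpy as np
from scipy.stats import kstest


def load_rank_ks(scores, is_load):
    """KS test of LOAD samples' normalized score ranks against Uniform(0, 1).

    If LOAD samples were spread evenly across the ranking, their positions would be
    roughly uniform; a large statistic indicates they cluster at one end.
    """
    scores = np.asarray(scores, dtype=float)
    is_load = np.asarray(is_load, dtype=bool)
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)  # 1 = lowest score
    positions = (ranks[is_load] - 0.5) / len(scores)            # mapped into (0, 1)
    return positions, kstest(positions, "uniform")


rng = np.random.default_rng(42)
n_load, n_ctrl = 129, 101  # illustrative group sizes
scores = np.concatenate([rng.normal(-1.0, 1.0, n_load), rng.normal(1.0, 1.0, n_ctrl)])
is_load = np.array([True] * n_load + [False] * n_ctrl)
positions, result = load_rank_ks(scores, is_load)
print(f"mean LOAD position = {positions.mean():.2f}, "
      f"KS D = {result.statistic:.3f}, p = {result.pvalue:.2g}")
```

A mean position well below 0.5 together with a small p-value corresponds to the over-representation of LOAD samples among low signature scores described above.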
All analyses were conducted in R version 3.5.
… genotype of APOE that is more prevalent in healthy agers and centenarians (Sebastiani et al., 2018). The ages of study participants ranged from 45 to 114 years, but mean age was comparable across genotype groups. By design, the participants included in this study were healthy and survived at least 1 year beyond the blood draw. Table 2 lists the 16 proteins significantly associated with APOE genotypes at 1% FDR. Table S1 shows … The heatmap in Figure 1c shows good separation of samples by … Table S2. We annotated the 16 proteins in the signature using DAVID (Huang et al., 2009), with the background restricted to the proteins assayed on the SOMAscan array. The analysis showed that the signature was enriched for proteins containing at least one coiled-coil domain (FDR < 0.1%); these proteins are highlighted in Table 2.

| Replication
We investigated the association between APOE genotypes and the levels of the five of the 16 proteins that were also measured in 987 plasma samples from InCHIANTI participants. The results in Table 3 show significant replication of the associations with APOE, APOB, and CRYZL1, with consistent effects, whereas PSME1 and CKAP2 did not vary by APOE genotype in this dataset.
We estimated the associations between the two SNPs and the 16 proteins listed in Table 2 in the 222 NECS serum samples, and Table 4 compares these results with those published in serum (supplementary table in Emilsson et al., 2018) and plasma (supplementary table in Sun et al., 2018). For all proteins except APOB, we found a significant association with rs7412 or rs429358 in the NECS serum samples, and all associations were significantly replicated in one or both studies with consistent effects. Note that the plasma protein scan used a reduced SOMAscan array covering fewer than 3,000 proteins, which limited the replication set, whereas the serum protein scan used a SOMAscan array comparable to the one used in our study. The genetic effects reported by Emilsson et al. (2018) were estimated after a Yeo-Johnson transformation of the protein data to improve normality; hence, they are not directly comparable to our effects, which were estimated on log-transformed protein data. However, the directions of effect are all consistent.
Interestingly, this analysis showed significant replication of the effect of the T allele of rs7412 on levels of PSME1 and CKAP2, consistent with the overexpression of these two proteins in carriers of the ε2 allele, although these associations failed to replicate in InCHIANTI.
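Why effects estimated on Yeo-Johnson-transformed data are comparable to ours in direction but not in magnitude can be illustrated with simulated data. The sketch fits a per-allele slope on log-transformed values and on Yeo-Johnson-transformed values (scipy.stats.yeojohnson, with λ chosen by maximum likelihood). Both transformations are monotone increasing, so the sign of the effect is preserved while its scale changes. The allele counts and protein levels are hypothetical and on an arbitrary scale.

```python
import numpy as np
from scipy.stats import yeojohnson

rng = np.random.default_rng(1)
n = 500
dosage = rng.integers(0, 3, n)  # hypothetical count of rs7412 T alleles per person
protein = 10.0 + 2.0 * dosage + rng.normal(0.0, 1.5, n)  # simulated levels, arbitrary units

log_protein = np.log(protein)
yj_protein, lam = yeojohnson(protein)  # lambda chosen by maximum likelihood

# Per-allele effect as the least-squares slope on each scale.
beta_log = np.polyfit(dosage, log_protein, 1)[0]
beta_yj = np.polyfit(dosage, yj_protein, 1)[0]
print(f"lambda = {lam:.2f}, beta_log = {beta_log:.3f}, beta_yj = {beta_yj:.3f}")
print("same direction:", bool(np.sign(beta_log) == np.sign(beta_yj)))
```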
To evaluate whether additional SNPs in the APOE locus could explain the association between APOE genotypes and the 16 proteins in the signature, we analyzed the association between the 16 proteins and each of the SNPs rs6857, rs769449, and rs2075650, adjusting for sex, age at blood draw, and APOE genotype. While each of the 16 proteins remained significantly associated with APOE genotypes, none of the SNPs rs6857, rs769449, and rs2075650 was a significant pQTL for these proteins in the multi-SNP analysis (Table S3).

| Association of the APOE signature to LOAD status in brain tissues
We evaluated the APOE signature in gene expression data of postmortem brain tissue from 129 LOAD patients and 101 healthy controls to test the hypothesis that the serum protein signature corresponds to distinct gene expression signatures in the brains of LOAD patients and healthy controls. … Figure 2b confirm that LOAD subjects have a significantly lower signature score than controls, both across and within brain regions (p < 2.2E−6).

| Effect on cognitive function
Note: Columns e2e2, e2e3, e2e4, and e3e4 report the fold change of protein level relative to e3e3, and p+ is the p-value from the F test with 4 and 214 degrees of freedom after adjusting for sex, age at blood draw, and length of sample storage. Columns E2 and E4 report fold changes for the genotype groups E2 (e2e2 or e2e3) and E4 (e2e4, e3e4, or e4e4) relative to e3e3, and p and p++ are p-values from t tests adjusted for age, sex, and study site.
We …
nominally significant associations with TICS score (Table 5, p < .05), and PSME1 showed a borderline association with TICS score (p = .0751). The associations of BIRC2, CEP57, KMT2C, and APOE with TICS score remained significant even after Bonferroni correction for multiple testing (p < .05/16 ≈ .003). Increasing levels of BIRC2, PSME1, CEP57, and LRRN1 were associated with increasing TICS scores in carriers of one or more ε2 alleles, whereas increasing levels of CTF1 were associated with decreasing TICS scores in the same genetic background (Figure 3). The analysis also showed that, in carriers of one or more ε4 alleles, increasing levels of KMT2C, CEP57, and LRRN1 were associated with increasing TICS scores, whereas increasing levels of APOE were associated with decreasing TICS scores. We also fitted models with interactions between protein levels and age at TICS to test the hypothesis that levels of proteins in the signature modify the rate of change of TICS score, but none of the interactions reached statistical significance; the interaction between BIRC2 levels and age was borderline significant (p = .057) in carriers of one or more ε2 alleles.

| DISCUSSION
Comprehensive measurement of the proteome in a large number of samples has been challenging. The SomaLogic aptamer-based technology has emerged in the last few years as a robust, high-throughput assay for quick and scalable measurement of protein levels (Davies et al., 2012). Recent publications have shown the richness of the human plasma and serum proteome and that serum proteins can help detect potential regulatory mechanisms (Emilsson et al., 2018) and reveal accessible diagnostic and prognostic biomarkers (Hathout, 2015). In this work, we combined the largest-plex aptamer-based proteomic platform available with a relatively large number of serum samples from a unique population enriched for ε2 carriers, healthy agers, and extreme survivors to discover a novel protein signature of APOE genotypes. The signature includes known cis- and trans-proteins associated with APOE genotypes (Muenchhoff et al., 2017), namely APOB and APOE, and 14 proteins not previously associated with specific APOE genotypes. We replicated part of the results in plasma proteins profiled with a smaller platform based on the same technology (Tanaka et al., 2018), and we also showed agreement between our results and the pQTLs in serum and plasma discovered in the APOE locus in much larger studies (Emilsson et al., 2018; Sun et al., 2018).
Some of the proteins identified are particularly noteworthy. The signature includes APOE and APOB, which are specifically expressed in brain, liver, gastrointestinal tissues, and skin, while the other 14 proteins are expressed in multiple tissues and have a variety of biological functions, described in Table S2. Baculoviral IAP repeat containing 2 (BIRC2, also known as cIAP1) is a member of a family …
Note: NECS: beta coefficients and standard errors estimated from linear regression of log-transformed protein data, adjusted for age and sex. Science (Serum): beta coefficients estimated from linear regression of Yeo-Johnson-transformed protein data (Emilsson et al., 2018); cis- and trans-effects available only for rs7412. Nature (Plasma): beta coefficients and standard errors estimated from fixed-effect inverse-variance meta-analysis of two cohorts (Sun et al., 2018); cohort-specific results reported beta coefficients of SNPs on log-transformed protein levels, adjusted for sex, age, BMI, and additional covariates. Data for missing proteins were not available.
BIRC2 is a negative regulator of the noncanonical NF-kappaB pathway (Mak et al., 2014), though a positive regulator of canonical NF-kappaB signaling (Hinz et al., 2010). Alzheimer's disease has been associated with inflammation in the brain (Heppner, Ransohoff, & Becher, 2015; Kinney et al., 2018), so the finding that this ligase is associated with better outcomes is of interest. S100 calcium binding protein A13 (S100A13) is a member of the S100 family of proteins, which are involved in several biological functions, and interacts with the receptor for advanced glycation end products (RAGE) (Rani, Sepuru, & Yu, 2014). RAGE activation is associated with neuroinflammation and neurodegeneration, and although the mechanism remains unclear, there is strong evidence supporting a role of RAGE in several neurodegenerative diseases, including AD (Ray, Juranek, & Rai, 2016). VPS29, a retromer complex component, belongs to a group of vacuolar protein sorting (VPS) genes that may be related to AD pathology (Vieira et al., 2010). Tubulin folding cofactor A (TBCA) is involved in the pathway leading to correct folding of β-tubulin. Some literature on AD pathogenesis suggests an interaction between tubulin and tau (Puig, Ferrer, Luduena, & Avila, 2005; Salama et al., 2018), and the association between TBCA and APOE genotypes suggests a possible mechanism of genetic regulation. CRYZL1 is encoded by the gene CRYZL1 on chromosome 21 and contains a NAD(P)H binding site. Interestingly, trisomy 21 is associated with a higher risk of early-onset AD, and NAD(P)H oxidase is upregulated in the AD brain (Block, 2008).
LRRN1 (leucine-rich repeat neuronal 1) is a secreted protein that has previously been associated with Alzheimer's disease at the RNA level (Bai et al., 2014). In chicks, LRRN1 is required for the formation of the midbrain (Tossell et al., 2011), but its function seems to be to define neuronal boundaries, so it may be inhibitory for differentiated neurons. LRRN1 is also highly expressed in unfavorable neuroblastoma, which also suggests a negative role in neuronal differentiation, although it may be a positive inducer of proliferation (Hossain et al., 2012). Another secreted protein that appears to be upregulated in carriers of the ε4 allele is cardiotrophin-1 (CTF1).
This result is surprising, since two separate papers found overexpression of this gene to be protective in mouse models of Alzheimer's disease (…; Wang, Liu, Liu, Li, & Wang, 2017). Our findings do not suggest this, although the upregulation might, of course, be "compensatory." Cardiotrophin-1 acts through the IL-6 receptor and is therefore an activator of inflammatory signaling. Perhaps, then, it is not surprising that a pro-inflammatory molecule is positively associated with Alzheimer's disease.
Nine of the 16 proteins in the signature contain a coiled-coil domain. Compared with 472 proteins with a coiled-coil domain among the 4,127 annotated proteins, nine of 16 represents an almost fivefold enrichment (p = .006, Fisher's exact test). Coiled-coil domains are potentially involved in the aggregation of amyloid (Fiumara, Fioriti, Kandel, & Hendrickson, 2010), which suggests that a possible neuroprotective mechanism associated with ε2 could be to limit the accumulation of β-amyloid. Another surprising aspect of the result is that some of these proteins are thought to be intracellular. It is not uncommon, however, to find cytoplasmic proteins in sera by SOMAscan (Geyer et al., 2016). For example, BIRC2 encodes an E3 ubiquitin ligase that does not contain a signal sequence, but its upregulation in the blood may be a general indicator of increased expression body-wide.
FIGURE 2 Projection of the APOE signature in brain RNA samples. (a) Heatmap of the signature score generated with GSVA in 690 brain samples. UP-DN: all 16 genes; UP: nine genes whose proteins increase in carriers of the ε2 allele; DN: seven genes whose proteins increase in carriers of the ε4 allele. Displayed above the heatmap is the plot of the KS test assessing the significance of the skewness in the distribution of LOAD patients toward low levels of the UP-DN score (see Methods). (b) Boxplots of the signature score in brains of LOAD patients (pink) and healthy controls (blue).
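The coiled-coil enrichment (nine of 16 signature proteins versus 472 of 4,127 annotated proteins) can be checked with a one-sided Fisher's exact test on a 2 × 2 table, assuming that the background of 4,127 proteins includes the 16 signature proteins. The DAVID background and test settings used for the reported result may differ from this plain test, so the sketch is not expected to reproduce the reported p-value exactly; it does reproduce the roughly fivefold enrichment.

```python
from scipy.stats import fisher_exact

sig_cc, sig_total = 9, 16    # signature proteins: with coiled coil / total
bg_cc, bg_total = 472, 4127  # annotated background: with coiled coil / total (assumed to include the 16)

# Rows: in signature / not in signature; columns: coiled coil / no coiled coil.
table = [
    [sig_cc, sig_total - sig_cc],
    [bg_cc - sig_cc, (bg_total - bg_cc) - (sig_total - sig_cc)],
]
fold_enrichment = (sig_cc / sig_total) / (bg_cc / bg_total)  # 4.92
_, p_value = fisher_exact(table, alternative="greater")
print(f"fold enrichment = {fold_enrichment:.2f}, one-sided p = {p_value:.2g}")
```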
Emilsson and coauthors used a SOMAscan platform with 4,785 proteins to profile 5,457 serum samples and used scale-free network analysis to show that serum proteins cluster into a small number of modules (Emilsson et al., 2018). They showed that a cluster of SNPs in the APOE locus is associated with a lipoprotein-enriched module of 27 serum proteins, which shares APOE, TBCA, APOB, S100A13, CRYZL1, and C5orf38 with our signature.
Our analysis shows that the overlapping associations are attributable to the specific APOE genotypes rather than to other genetic variants in the same locus. In addition, Emilsson's work suggests that protein modules in serum have a 37.3% agreement with gene expression signatures in various tissues. Consistent with their results, our analysis shows that the genes encoding the proteins in the APOE signature produce brain transcriptional profiles that distinguish AD patients from healthy controls. This result suggests that the serum protein signature could be a candidate biomarker for AD resistance, diagnosis, and possibly prognosis. However, its value as a serum biomarker of AD needs to be assessed and replicated in larger samples to establish its clinical value.
The small sample size of this study and the selection of healthy subjects limited the power to correlate the protein signature with aging markers. However, we were able to show that seven of the proteins in the APOE signature are associated with TICS score in particular genetic backgrounds, suggesting that these proteins have predictive value in addition to the putative neuroprotective role of the ε2 allele of APOE and could be novel targets for neuroprotective interventions. For example, the pattern of BIRC2 levels across APOE genotypes (Figure 1) and the strong association between BIRC2 and TICS score in carriers of one or more ε2 alleles (Table 5) suggest that this protein may be expressed only in carriers of one or more ε2 alleles and that additional factors contribute to its variable expression level, which correlates positively with better cognitive function. The patterns of PSME1 and CEP57 and their positive correlation with TICS score in carriers of one or more ε2 alleles suggest that compounds that increase these protein levels could also lead to neuroprotection. The pattern of association between LRRN1 and TICS score, however, is less clear, since increasing values of LRRN1 predict higher TICS scores in carriers of both the ε2 and the ε4 alleles. This protein needs more in-depth characterization to understand its role relative to APOE genotypes and cognitive status.
The advantage of working with serum and plasma proteins is that blood is easily accessible and therefore ideal for biomarker discovery, but the lack of tissue specificity may make the biological mechanisms harder to understand. Recently established bioinformatic resources of human proteins, for example, the Human Protein Atlas (Thul & Lindskog, 2018), provide detailed annotation of protein expression in more than 80 tissues and cell types and can be used to help generate hypotheses about the biological mechanisms that translate genetic variants into expressed phenotypes. A recurring debate in the field is whether serum (which lacks clotting factors) or plasma should be used. We could replicate 9 of the associations discovered between serum proteins and APOE genotypes in plasma, but no data were available to test the associations of BIRC2, CEP57, KIN, S100A13, C5orf38, CKAP, and UBA2 in plasma. Soares et al. (2012) identified a signature of APOE genotypes in plasma that included APOE and APOB and the additional proteins CXCL9 and IL13.
FIGURE 3 Scatter plots of TICS score (y-axis) versus RFU of the seven proteins listed in Table 5. Red: carriers of E2; green: carriers of E4; black: carriers of E3. APOE genotype groups were defined as in … 5
Levels of IL13 were not associated with APOE genotypes for any of the SOMAmers included in the platform. We detected a statistically significant association between levels of CXCL9 and APOE genotypes (p = .0015), but with inconsistent effects.