Identification and validation of an eight‐gene expression signature for predicting high Fuhrman grade renal cell carcinoma

Clear cell renal cell carcinoma (ccRCC) is a malignancy with heterogeneous outcomes. Renal mass biopsies are commonly used to characterize the disease and aid prognosis. Although contemporary reports show that the pathological diagnosis of malignancy is accurate, classification of Fuhrman grade from biopsy specimens remains unreliable. We used The Cancer Genome Atlas (TCGA) database to develop a gene expression signature for distinguishing high‐grade (G3/4) from low‐grade (G1/2) disease. The expression profile was then validated for performance in 283 frozen renal cancer samples and for clinical use in 127 ex vivo renal mass biopsy (RMB) samples. The area under the curve (AUC) was used to quantify discriminative ability, and AUCs were compared using the DeLong test. In the discovery dataset, we identified a 24‐gene signature for high‐grade disease with an AUC of 0.884. After application to the development dataset, an eight‐gene profile was defined that achieved an AUC of 0.823. The accuracy of the eight‐gene panel was maintained in the RMB samples (AUC = 0.821). In summary, using a three‐stage design, we validated an eight‐gene expression signature for predicting high Fuhrman grade in ccRCC. This tool may help to characterize ccRCC biopsy specimens.

Alternatively, renal mass biopsies (RMBs) may be considered for small lesions in selected individuals and for advanced RCC to establish the diagnosis. 8 Pathological features from RMBs may guide active surveillance strategies, cryosurgery and radiofrequency ablation in early RCC and first-line therapy in advanced RCC. However, sampling error and tumor heterogeneity have contributed to the inaccuracy of RMBs. 9 Although improved biopsy approaches have reduced the biopsy failure rate, [10][11][12][13] precise grading of ccRCC remains difficult.
Recent studies have attempted to characterize the molecular basis of aggressiveness and clinical outcome of ccRCC. [14][15][16][17] With advances in expression profiling and bioinformatic technologies, efforts are being made to identify molecular classifiers that refine the current grading method for ccRCC. 14,18 These models were mostly based on information about tumor grade and stage. However, precise pathological grading of RMB samples remains difficult, and the accuracy of predicting high grade with histological methods was as low as 70% in most centers. 19 There is a need for a ccRCC gene profile from RMB specimens that would lead to more precise Fuhrman grading.
In our study, we demonstrated that gene expression profiling of RMB specimens could yield diagnostic information for predicting high Fuhrman grade. An eight-gene signature, identified from gene expression profiles and validated in an RMB dataset, showed high sensitivity and specificity for Fuhrman grading. The value of this gene signature may lie in its use on RMBs to initiate an active surveillance protocol for elderly or medically comorbid patients. The signature may also be of academic interest for the molecular understanding of the Fuhrman grading system.

Patient samples
Level 3 RNAseq expression data for kidney renal clear cell carcinoma (KIRC) samples, generated on the Illumina HiSeq 2000 RNA sequencing platform, were obtained from The Cancer Genome Atlas (TCGA) data portal (https://genome-cancer.ucsc.edu/proj/site/hgHeatmap/). Only tumor samples were included in the study. Tumor transcriptomic profiles of 20,534 genes were measured in 478 patients with primary ccRCC. Clinical information, including complete Fuhrman grade (FG) for the selected subjects, was retrieved from the "Clinical Biotab" section of the data matrix based on the Biospecimen Core Resource (BCR) identification numbers of the patients. Extended demographic parameters for these patients, as characterized by the TCGA consortium, are shown in Table 1.
The Fudan University Shanghai Cancer Center (FUSCC) development cohort consisted of 283 patients who had undergone nephron-sparing surgery (NSS) or nephrectomy at FUSCC without any pretreatment and whose ccRCC was histologically confirmed by an experienced pathologist. These patients were consecutively enrolled from 2009 to 2012. Frozen tumor tissues were stored at −80°C once resected.
The RMB set was acquired from June 2013 to July 2014 at FUSCC from 127 patients who underwent radical or partial nephrectomy for a suspicious renal mass and whose tumors were biopsied ex vivo. To mimic RMBs, six core biopsies were taken from each tumor using an 18-gauge biopsy needle (two from a central location and four from a peripheral location) immediately after partial or radical nephrectomy, as previously described. 20 Three cores were separately snap-frozen in liquid nitrogen and stored at −80°C for RNA preparation. The remaining three cores were fixed in 10% formalin for standard histological processing. Histological type and Fuhrman grade of the biopsy specimens were reviewed by the same pathologist. Clinical data, including pathological examination of the surgical specimen, were collected from electronic medical records at FUSCC. Our study was approved by the ethical committee of FUSCC, and each patient provided written informed consent before participation.

RNA preparation and cDNA synthesis
Each frozen tissue specimen was cut into small pieces using a clean disposable operation blade in RNase-free 1.5 ml Eppendorf tubes on ice. Then, 0.4 ml Trizol (Invitrogen, Life Technology, Carlsbad, CA) was added to the tubes and samples were ground manually using a grinding device (OSE-Y10/Y20, TIANGEN, Beijing, China) with a disposable RNase-free grinding rod (WL046, TIANGEN). Next, 0.6 ml Trizol was added after grinding, the tubes were briefly vortexed, and 200 µl chloroform (Merck #102445) was added. Phase separation was performed manually without touching the lower phase, and RNA precipitation was completed with 500 µl isopropanol. The RNA pellets were washed with 1 ml of ice-cold 70% ethanol, and the RNA was dissolved in 50 µl diethylpyrocarbonate (DEPC) water and digested using 2 U DNase at 37°C for 20 min, followed by ethanol precipitation. 21 RNA extraction from core-biopsy samples was similar to that from uncut tissues. The total RNA of the three biopsy cores was mixed together before treatment with DNase. cDNA preparation was performed according to the manufacturer's instructions using a RevertAid First Strand cDNA Synthesis Kit (K1622; Thermo Fisher Scientific, Waltham, MA). Then, 500 ng template RNA was used in the reverse transcription reaction with 1 µl of random primer (0.2 µg/µl) in a total volume of 20 µl. The reaction mixture was incubated for 5 min at 25°C followed by 60 min at 42°C. cDNA concentration was measured using a NanoVue (28923215, GE Healthcare), and cDNA was diluted to 200 ng/µl with DEPC water.

What's new? Advanced clear cell renal cell carcinoma (ccRCC) is an aggressive and often fatal disease. Despite the existence of well-established grading systems, challenges remain in ccRCC diagnosis, including the need to overcome grading inaccuracies during renal mass biopsy (RMB). Here, using data from the Cancer Genome Atlas, a predictive gene panel was developed to improve efforts to distinguish between high- and low-grade ccRCC. An eight-gene signature was validated clinically in RMB samples from ccRCC patients and was significantly more accurate in predicting high-grade ccRCC than conventional approaches. The novel gene signature could facilitate risk-stratification and therapeutic decision-making in ccRCC.

Quantitative reverse transcription-polymerase chain reaction (qRT-PCR) analysis
qRT-PCR primer sequences of the 24 candidate genes are listed in Supporting Information Table S1. All primers were used at an annealing temperature of 60°C. For each well, 200 ng cDNA template was used for SYBR Green (638320, Takara, Japan) qRT-PCR analysis. Amplification was performed using an Applied Biosystems 7900HT Fast Real-Time PCR System (Life Technologies), and all measurements were taken in triplicate. The melting curve of each measurement was checked; only concordant results were included in the subsequent analysis. ACTB (β-actin) was used as the internal control. ΔCt was calculated as the mean Ct value of each gene minus the mean Ct value of ACTB. The −ΔCt value of each gene was used for binary logistic regression and model construction.
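The normalization described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic (mean Ct of the gene minus mean Ct of ACTB, then negated); the function name and the triplicate Ct readings are hypothetical and are not taken from the study data.

```python
from statistics import mean

def neg_delta_ct(target_ct, reference_ct):
    """Return the negated delta-Ct: -(mean Ct of target - mean Ct of reference)."""
    return -(mean(target_ct) - mean(reference_ct))

# Hypothetical triplicate Ct readings for one sample (illustrative values only).
candidate_gene_ct = [27.9, 28.1, 28.0]
actb_ct = [18.0, 18.2, 17.8]

value = neg_delta_ct(candidate_gene_ct, actb_ct)
print(round(value, 2))
```

Negating ΔCt means that a larger value corresponds to higher expression relative to the reference gene, which keeps the sign of a regression coefficient easy to interpret.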

Data analysis
All statistical analysis steps, including data preprocessing, gene selection, classification model construction and independent testing, were performed with R software and packages from the Bioconductor project. 22,23 Significant gene selection was performed by the least absolute shrinkage and selection operator (LASSO) method using the LARS package. 24 As a typical penalized regression method, LASSO selects predictive genes and simultaneously estimates the regression coefficients in the multiple linear regression model, and is thus particularly useful for candidate gene selection and parameter estimation in high-dimensional genomic data. The prediction accuracy was estimated by tenfold cross-validation: the dataset was divided into ten approximately equally sized subsets, a prediction model was trained on nine subsets, and prediction was conducted for the remaining subset. This training and testing process was repeated 10 times so that predictions were obtained for each subset. For the data obtained by qRT-PCR, the unpaired Mann-Whitney test was used for the comparison between low- and high-grade samples. All significance tests were two-sided, and a p value of <0.05 was considered significant. A stepwise logistic regression model was used to select predictive markers based on the development dataset. The predicted probability of being diagnosed with a low- or high-grade tumor was used as a surrogate marker to construct a receiver operating characteristic (ROC) curve. The area under the ROC curve was used as an accuracy index to identify the best combination of multiple markers.
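The tenfold cross-validation scheme described above can be illustrated with a minimal Python sketch. The study itself used R; the function names, the random seed, and the interleaved fold assignment below are assumptions made for illustration, not the authors' implementation.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]

def train_test_splits(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each fold serves as the test set once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 478 is the number of TCGA patients reported in the text.
splits = list(train_test_splits(478, k=10))
print(len(splits), sorted(len(test) for _, test in splits))
```

Each sample appears in exactly one test fold, so every sample receives one out-of-fold prediction once all ten rounds are complete.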

Results
The detailed workflow of our study is shown in Figure 1. Clinicopathological characteristics of the TCGA and FUSCC development cohorts are listed in Table 1. The tumor grade distributions of the TCGA and FUSCC development cohorts were similar (p = 0.526). A total of 127 ex vivo biopsy patients were enrolled in the RMB cohort, including six benign and nine non-ccRCC cases as well as 112 ccRCC patients (Table 2).
Patients in the TCGA cohort were mostly Caucasian, whereas the development and RMB cohorts were entirely Asian. No difference in sex distribution was observed in either the development or the RMB cohort. High grade was correlated with a higher percentage of tumor necrosis in both the TCGA and FUSCC development cohorts (all p < 0.001). Although age distributions differed among the three cohorts, age was not significantly related to grade in the TCGA cohort (p = 0.508).

TCGA data analysis and candidate gene selection
The RNAseq dataset included gene expression levels for 20,534 genes in 221 low FG and 269 high FG patients. First, we performed LASSO analysis to identify candidate genes related to FG. With the tenfold cross-validation process, a list of 87 candidate genes was identified. Second, in multivariate logistic regression, 24 genes remained statistically significant and were selected to build a logistic regression model. The diagnostic performance of the 24-gene logit model, measured by AUC, reached 0.88 (detailed in Table 3). Among these, nine genes were overexpressed in low-grade patients and 15 genes were overexpressed in high-grade patients.

Development of the expression signature in the FUSCC cohort
The 24 genes selected in the discovery set were further evaluated by qRT-PCR analysis in 283 Chinese ccRCC patients. In univariate analysis, 10 of the 24 genes (ATOH8, ATP1A3, C10orf4, CHMP4C, CNGA1, NCRNA00116, PLA2G15, PPP1R1A, SPOCK1 and TXNDC16) were confirmed to be significantly differentially expressed between low and high FG patients. In multivariate analysis, C10orf4 and TXNDC16 failed to reach statistical significance and thus were excluded from the final panel (Table 4). The predicted probability of being diagnosed with high FG tumors from the logit model based on the eight genes was used to construct the ROC curve, as follows: logit(p = high

Figure 1. To identify a robust gene expression signature to distinguish low and high Fuhrman grade ccRCC, we used TCGA dataset of 478 samples as a discovery set. A list of 24 candidate genes was selected based on their differential expression pattern between low and high Fuhrman grade samples (Table 3). Then, we further investigated the expression profiles of candidate genes using qRT-PCR. An eight-gene expression signature was identified and trained using 134 low-grade and 149 high-grade ccRCC samples. Finally, the eight-gene expression signature was validated using 127 prospectively collected ex vivo renal biopsy specimens (6 benign, 112 ccRCC and 9 non-ccRCC cases).

Predictive capability of the eight-gene expression signature in RMB dataset
In molecular profile analysis, parameters estimated from the FUSCC development set were used to predict the probability of being diagnosed with a high FG tumor for these patients. The AUC of the eight-gene expression signature was 0.821 (95% CI, 0.737-0.887; Fig. 2b). The eight-gene signature could distinguish grade IV ccRCC with an AUC of 0.925 (95% CI, 0.860-0.966; Fig. 2c), and it could identify malignant tumors from benign masses with an AUC of 0.915 (95% CI, 0.852-0.957; Fig. 2d). Moreover, this gene set performed well in mixed histology types (AUC = 0.818, 95% CI, 0.736-0.882; Fig. 2e).
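Applying a model with fixed parameters to a new cohort and scoring it by AUC can be sketched as follows. This is a minimal illustration only: the gene labels GENE_A and GENE_B, the intercept, the coefficients and the four samples are hypothetical placeholders, not the published model or study data. The AUC is computed as the probability that a randomly chosen high-grade case receives a higher score than a randomly chosen low-grade case, with ties counted as one half.

```python
import math

def predict_probability(neg_dct, intercept, coefs):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(coefs[gene] * neg_dct[gene] for gene in coefs)
    return 1.0 / (1.0 + math.exp(-linear))

def roc_auc(scores, labels):
    """Rank-based AUC: share of positive/negative pairs ordered correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute an AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical frozen parameters (placeholders, NOT the published coefficients).
INTERCEPT = 0.5
COEFS = {"GENE_A": 0.8, "GENE_B": -0.6}

# Hypothetical validation samples: (negated delta-Ct per gene, 1 = high grade).
cohort = [
    ({"GENE_A": -4.0, "GENE_B": -7.0}, 1),
    ({"GENE_A": -6.5, "GENE_B": -5.0}, 0),
    ({"GENE_A": -5.0, "GENE_B": -8.0}, 1),
    ({"GENE_A": -7.0, "GENE_B": -6.0}, 0),
]
probs = [predict_probability(x, INTERCEPT, COEFS) for x, _ in cohort]
labels = [y for _, y in cohort]
print(round(roc_auc(probs, labels), 3))
```

Because the parameters are not re-estimated on the validation samples, the resulting AUC reflects how well the model transfers to a new cohort and does not measure how well it fits the data it was trained on.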
By standard histological methods, ex vivo core biopsies of the 127 renal tumors yielded noninformative results in 8 cases and misclassification in 28 cases, with an AUC of 0.626 (95% CI, 0.531-0.719), significantly lower than that of the eight-gene signature (p = 0.001). Using conventional clinical parameters (tumor size, age, gender, stage, necrosis) to predict high grade, AUC  Fig. 1f).

Performance of the eight-gene signature in outcome prediction
We applied the eight-gene signature equation to calculate the probability of high FG in the FUSCC development cohort and then divided the patients into three groups according to this probability. A Kaplan-Meier curve was plotted (Supporting Information Fig. S1). We also evaluated the capability of our model to predict overall survival in the 478 TCGA patients. We found that a high model score was associated with poor prognosis. In detail, the C-index of the eight-gene model (eight-gene signature and stage) reached 0.747. When the eight-gene signature alone was added to the SSIGN model, 25 the C-index increased to 0.755 (C-index of SSIGN alone, 0.751; p = 0.048). We used bootstrapping to internally validate and calibrate the three models. Fig. S2 depicts the calibration curves of the three models, which plot the predicted vs. observed 5-year survival of the TCGA cohort. The plots demonstrated good calibration (R code in Supporting Information).
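For readers unfamiliar with the C-index, the following Python sketch shows one common definition (Harrell's concordance index) on right-censored data. It is an illustration under simplifying assumptions: tied survival times are ignored, the function name is hypothetical, and the four patients are invented values. It is not the R code referenced in the Supporting Information.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the patient with the
    higher risk score experiences the event first (score ties count 0.5)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i has an observed event
            # strictly before patient j's last follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical follow-up (months), event indicator (1 = death, 0 = censored)
# and model risk scores for four patients.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.2]
print(concordance_index(times, events, risk))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives a scale for reading the reported values of 0.747 to 0.755.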

Discussion
In addition to patient characteristics and the expertise of surgeons, the choice of treatment for ccRCC is mainly based on evaluating its biological potential. Hence, clear identification of prognostic factors such as FG and tumor size would help urologists distinguish progressive malignancies that require immediate intervention or targeted therapies from indolent ones that are suitable for active surveillance or ablation. In parallel with the established imaging technology used to determine T stage, grading of ccRCC is another important parameter for clinicians. However, accurate determination of FG is difficult because of the heterogeneity of RCCs and issues associated with incomplete sampling. In our study, a pragmatic approach was used to identify candidate biomarkers predicting high FG to address this important clinical need. Using the authoritative, high-dimensional genomic data of TCGA, we applied an established bioinformatics algorithm to develop the predictive gene panel. Moreover, we validated this gene panel using an RMB cohort. In the ex vivo RMB dataset, the accuracy of this eight-gene signature was significantly higher than that of conventional methods. Interestingly, the model performed well in distinguishing malignant and grade IV tumors, and non-ccRCC specimens had little effect on the model. These features make the model more valuable in clinical RMB practice. To our knowledge, this is the first gene expression signature for Fuhrman grade classification in ccRCC.
Renal biopsies are commonly performed on patients with late stage renal masses or with small lesions that are followed by cryotherapy or ablation. The European Association of Urology (EAU) recommends biopsy for all patients undergoing active surveillance, and the American Urological Association (AUA) lists RMB as an option for such patients. The accuracy of RMB has been demonstrated to be higher than 60% for histologic diagnosis (cancer vs. benign); however, it remains poor for grade. Previous studies evaluating RMB performance in tumor grading by histological procedure suggested that the accurate diagnosis rate was nearly 60%, ranging from 43 to 75%. 19 In our study, the eight-gene signature markedly improved the accuracy of tumor grading compared with the histological method, increasing the AUC by 0.195 (p = 0.001). This gene panel also had better discriminative ability than conventional parameters using tumor size, AJCC stage and necrosis. 26 Therefore, this signature could increase the accuracy of histological diagnosis in RMB samples.
A major strength of our study is that the samples were derived from two large populations, giving high statistical power and a low false discovery rate (FDR). Another advantage is that the 24-gene signature was derived from TCGA data, because the quality of clinical samples is important for obtaining reliable and reproducible molecular profiling data. Third, the subsequent development and validation stages were based on qRT-PCR, which is a fast, quantitative method with high reproducibility and is widely applicable in hospitals. However, the results of RNAseq in the TCGA discovery dataset and of qRT-PCR in the FUSCC development dataset differed. Since the purpose of our study was to create an easy-to-use method for predicting high-grade ccRCC, the final model is qRT-PCR based. To test the reproducibility of this model, we validated our results in biopsy samples with the qRT-PCR method.
With the development of medical technology and molecular therapy, treatment methods for ccRCC have improved. Less invasive strategies such as ablative therapy or active surveillance are now being discussed for the treatment of small ccRCCs. Patient selection and cancer subtype classification are becoming important. So far, many approaches have been applied to establish the molecular subtypes of cancers. 14,27 These attempts were a big step forward in the understanding of ccRCC. However, in these models, FG and stage remained the two most significant discriminators of outcome. Therefore, to avoid overdiagnosis and overtreatment of ccRCC patients due to the low accuracy of regular histological testing of RMB samples, a high FG biomarker test is required. The development and validation of the eight-gene signature made it not only applicable to regular partial or radical nephrectomy specimens, but also predictive in ex vivo RMB samples. This could provide additional information on biological behavior to both physicians and pathologists for consulting and decision making. It could also help urologists to initiate an active surveillance protocol in elderly or medically comorbid patients who undergo RMB.

Tumor Markers and Signatures
Most of the genes listed in the signature were found to be involved in differentiation and the extracellular matrix. To some extent, this result suggests that the signature could explain the nature of the Fuhrman grading system, which is based upon nuclear size and shape and the prominence of nucleoli. 2 ATOH8 is recognized as a transcription factor that has important roles in cell differentiation and developmental biology. [28][29][30] ATP1A3 and CNGA1 are important in cellular energy metabolism, 31,32 while CHMP4C belongs to the chromatin-modifying protein family and is associated with nucleus organization. 33 PLA2G15 encodes an enzyme that regulates multifunctional lysophospholipids and exists in exosomes. 34 PPP1R1A is a protein phosphatase gene, and differential expression of this gene has been reported to be associated with multistep carcinogenesis. 35 SPOCK1 was reported to be a TGF-β target associated with epithelial-mesenchymal transition (EMT) in lung cancer 36 and with tumor proliferation and metastasis in hepatocellular and gallbladder cancer. 37,38 Finally, NCRNA00116 is a long intergenic nonprotein coding RNA without a clear functional annotation as yet. 39 These genes are poorly studied in ccRCC, and further research may lead to a better understanding of ccRCC and Fuhrman grading. The interactions of these genes with other well-known genes that affect the biological behavior and anticancer therapy of RCC, such as VHL, 40 SMADs, 41 BAP1, 42 PD-L1 43 and CAIX, 44 remain unclear.
Finally, the eight-gene classifier proved to be predictive of prognosis. Tumor stage alone does not determine outcome. Thus, in the comparison with SSIGN, 25 we added stage to the model as in a previous report. 14 The comparison with the SSIGN model showed that our model is associated with ccRCC prognosis, which reflects the reliability of our model to some extent. Since the model codes of clearcode34, 14 ccA/ccB 45 and S-3 score 46 were not available to us, we could not compare our model with them. In a previous report, the 34-gene model clearcode34 did not match the ability of the 90-gene model ccA/ccB. 46 The advantage of clearcode34 is that fewer genes make it easier to apply. In the same way, our model is also easy to apply. Although our model did not match previous models, it may be useful in special situations, such as insufficient biopsy sampling for determining tumor grade, or early stage tumors in patients with severe comorbidity who are suitable for active surveillance.
However, there are some limitations. First, although the discovery set was based on a global TCGA cohort, the development and RMB sets were derived from the same Asian center, and thus selection bias may exist. Second, the number of cases in the RMB group was small and only Asians were included. Future prospective studies in large cohorts of patients of different ethnic backgrounds will be needed to fully refine the integrated grading algorithm. Third, because biopsies were taken ex vivo, the nondiagnostic rate in our study is unknown. Fourth, considering the biological heterogeneity of tumors, the limited biopsy data may not provide enough molecular information for a reliable answer. Also, six biopsies per tumor in the RMB cohort are very different from what can currently be obtained in practice from a patient with a renal mass. Moreover, ex vivo biopsies are very different from percutaneous renal biopsy in the preoperative setting. The influence of intratumoral heterogeneity (ITH) should be considered and validated in further studies. Therefore, our study will require validation in a cohort with preoperative biopsies using FFPE specimens.

Conclusions
After external validation, these methods may serve as examples of tools available for the diagnosis and risk-stratification of patients with ccRCC to assist in optimal individualized therapy.