Who should be included in a clinical trial of screening for bladder cancer?
A decision analysis of data from the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial
Because the incidence of bladder cancer is relatively low, screening might have a better ratio of benefits to harms if it is restricted to a high-risk population. Data from the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial were used and simple decision analytic techniques were applied to compare different eligibility criteria for a screening trial.
For a variety of possible eligibility criteria, the percentage of the population aged 55 years to 74 years and classified as being at high risk for developing invasive or high-grade carcinoma, and therefore likely to benefit from screening, was calculated. Regression models were used to calculate a risk score based on age, sex, smoking history, and family history of bladder cancer. The reduction in cases was calculated given hypothetical risk reductions associated with screening. The trade-off between patients screened and tumors avoided was calculated as a net benefit.
The 5-year probability of being diagnosed with invasive bladder cancer was 0.24%. Using a risk score > 6 or > 8 as the eligibility criterion for a trial was generally superior to including all older adults. In a typical scenario, a risk score > 6 would result in approximately 25% of the population being screened to prevent 57 invasive or high-grade bladder cancers per 100,000 population; screening the entire population would prevent only an additional 38 cases.
Screening for bladder cancer can be optimized by restricting it to a subgroup of patients considered to be at elevated risk. Different eligibility criteria for a screening trial can be compared rationally using decision-analytic techniques. Cancer 2013. © 2012 American Cancer Society.
Population-based trends have demonstrated that major reductions in cancer mortality have occurred most often under only 1 of 2 circumstances: reduced exposure to a carcinogen (stomach, liver, and lung cancer) or screening (breast, cervical, and colon cancer). Bladder cancer would appear to be a natural candidate for screening due to the availability of an inexpensive, noninvasive test (urinalysis) and a lead time between screen detection and progression to advanced disease that is sufficiently long to allow for intervention. The majority of bladder cancers are of urothelial cell origin and arise from the bladder epithelium. The natural pathway of progression of invasive bladder tumors is believed to proceed from cancer development within the epithelium to subsequent invasion of the detrusor muscle.
The prognosis for patients with bladder cancer is closely associated with the stage of disease at the time of presentation. Between 25% and 35% of tumors present as muscle-invasive lesions. These tumors are associated with a high mortality rate, whereas lower stage lesions are frequently cured with less intensive treatments. The early detection of tumors that are destined to progress may allow for earlier curative intervention and potentially could obviate the need for radical cystectomy or chemotherapy.
Although bladder cancer is the fourth most common tumor among US males, the gender-specific incidence in the general population is between 80% and 90% lower than that of other cancers, such as breast or prostate, that are typically subject to screening.1 This has caused inevitable problems for screening studies. For example, Messing et al identified only 10 high-grade or invasive bladder tumors in 1575 participants in a bladder cancer screening study2 and Lotan et al found only 1 high-grade lesion in a series of 1502 patients considered to be at high risk for the development of bladder cancer due to either a smoking history or chemical exposure.3
The natural response to such studies is that bladder cancer screening should be restricted to those patients at highest risk. Yet this immediately requires an answer as to what constitutes high risk. In the case of bladder cancer, for example, it is known that smoking is a risk factor, but a decision has to be taken as to how many pack-years is enough to place someone in a high-risk category. We have previously argued4 that the choice of risk cut points varies between trials. For example, the National Lung Screening Trial included only smokers or those who recently quit smoking, with a smoking history of ≥30 pack-years,5 whereas the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial included older individuals irrespective of smoking history. It is often unclear how trials have determined cut points, such as smoking history, for eligibility criteria. We have shown in both a methodologic article4 and a practical application6 that risk cut points can be chosen rationally on the basis of a simple, decision-analytic methodology. The overall objective is to determine a risk cutoff that would capture a sufficiently large number of invasive cancers without subjecting an unnecessarily high percentage of the population to unneeded screening. In this article, we applied this methodology to a bladder cancer data set to address the question of the optimal risk group for a trial of bladder cancer screening. We assumed that, were the trial to demonstrate a benefit, the criteria used to select patients for bladder cancer screening in practice would be close to those used in the trial. As such, the trial was designed to reflect what would be the optimum strategy at the population level.
MATERIALS AND METHODS
Data for this investigation were obtained from the PLCO trial on January 14, 2010, which included participant follow-up to December 31, 2008. The PLCO trial has been described previously.7 In brief, men and women aged 55 years to 74 years who reported no history of prostate, lung, colorectal, or ovarian cancer were enrolled between 1993 and 2001 at 10 centers around the United States in a randomized trial designed to evaluate the effect of screening for those 4 cancers. The PLCO trial did not screen for bladder cancer. At baseline, participants completed a self-report questionnaire that collected information regarding exposures including demographics, smoking habits, personal medical history, and family history of cancer. Cancer cases were ascertained through the routine follow-up of positive trial screening examinations and through the use of a mailed annual study update questionnaire. All cancers were confirmed by retrieving medical records, which were then abstracted by certified tumor registrars at each of the screening centers.
For this analysis, the data sets were separated into training (one-third; n = 49,873) and validation (two-thirds; n = 99,746) sets by the PLCO data management group. We used the training data set to evaluate a variety of screening strategies; the 4 strategies with the best performance characteristics were then evaluated in the validation data set. We used both risk group and risk score approaches. To develop risk scores, we used a Cox proportional hazards regression model to calculate model coefficients; these were divided by the smallest coefficient and rounded to the nearest integer to derive “points” for each variable that were summed for the total risk score. The variables contributing to the risk score were age, sex, smoking history, and family history of bladder cancer. Risk scores were calculated by assigning 2 points for age ≥ 65 years, 2 points for a smoking history of 10 pack-years to 19 pack-years, 4 points for a smoking history of ≥ 20 pack-years, 4 points for male sex, and 1 point for family history of bladder cancer. We compared several potential screening strategies that were developed using simple decision rules (eg, screen if the individual is aged ≥ 65 years and a smoker with ≥ 10 pack-years).
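The point assignments described above can be expressed as a small function. This is a minimal sketch rather than the authors' code: the function and parameter names are hypothetical, and treating the 10 to 19 pack-year band as 10 ≤ pack-years < 20 is an assumption about how fractional values would be handled.

```python
def risk_score(age, male, pack_years, family_history):
    """Sum the points listed in the text (all names are illustrative)."""
    score = 0
    if age >= 65:
        score += 2          # age >= 65 years
    if pack_years >= 20:
        score += 4          # smoking history >= 20 pack-years
    elif pack_years >= 10:
        score += 2          # assumed band: 10 <= pack-years < 20
    if male:
        score += 4          # male sex
    if family_history:
        score += 1          # family history of bladder cancer
    return score


def eligible(score, cutoff):
    """Eligibility rule of the form 'risk score > cutoff'."""
    return score > cutoff


# A 70-year-old man with 25 pack-years and no family history: 2 + 4 + 4 points
example = risk_score(age=70, male=True, pack_years=25, family_history=False)
print(example, eligible(example, 6), eligible(example, 8))
```

Under these point values the maximum possible score is 11, so a cutoff of > 8 can only be met by men with ≥ 20 pack-years who are also aged ≥ 65 years or report a family history.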
The objective was to identify a group of participants with an increased probability of developing invasive bladder cancer because these patients have the potential to benefit the most from a screening program. We assumed that detecting a bladder cancer when in situ would lower the likelihood that the patient would develop more advanced stage disease, which increases the risk of radical surgery and cancer-specific death. High-grade in situ tumors, although rare, are generally considered aggressive, and therefore we included grade 3 or 4 carcinoma in situ in our definition of an event. We also assumed that invasive cancers diagnosed in the PLCO trial were detected clinically, rather than by screening, on the grounds that screening for bladder cancer remains very rare in the United States.
We estimated the screening rate in the population under different criteria for trial eligibility by calculating the percentage of patients in our data set meeting each criterion. To calculate the sensitivity and specificity of the various screening strategies for survival time data, we used a previously published methodology.6 In brief, we used Kaplan-Meier estimates of survival at a landmark time, predefined to be 5 years, and converted these to relevant conditional probabilities using Bayes' theorem.
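The conversion from group-level 5-year risks to sensitivity and specificity can be sketched as follows. This is an illustration of Bayes' theorem applied at a landmark time, assuming the 5-year event probabilities (1 minus the Kaplan-Meier survival estimate) are already available for eligible and ineligible participants; the function name and the input values are hypothetical and are not taken from the PLCO data.

```python
def sensitivity_specificity(frac_high, risk_high, risk_low):
    """Convert group-level landmark risks into sensitivity and specificity.

    frac_high: fraction of the cohort meeting the eligibility criterion
    risk_high: 5-year event probability among those meeting the criterion
    risk_low:  5-year event probability among those not meeting it
    """
    # Overall 5-year event probability (law of total probability)
    p_event = frac_high * risk_high + (1 - frac_high) * risk_low
    # P(eligible | event) by Bayes' theorem
    sensitivity = frac_high * risk_high / p_event
    # P(not eligible | no event) by Bayes' theorem
    specificity = (1 - frac_high) * (1 - risk_low) / (1 - p_event)
    return sensitivity, specificity


# Hypothetical inputs, chosen only to illustrate the arithmetic
sens, spec = sensitivity_specificity(frac_high=0.5, risk_high=0.003, risk_low=0.001)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```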
Because we were treating incidence as a binary event, the relative risk (RR) was defined as the risk of invasive or high-grade disease at 5 years with screening divided by the 5-year risk without screening. As before, we assumed a constant RR reduction. For example, given an RR of 0.7, a patient with a 10% probability of invasive bladder cancer without screening would have a 7% risk with screening; a patient with a 1% baseline risk would have a 0.7% risk if screened.
To identify the optimal eligibility criteria for a screening trial, we needed to account for the finding that some approaches would reduce the event rate more than others, but would involve a greater percentage of the population being screened. To calculate whether the reduction in malignant disease offsets the increase in the percentage of the population screened, we needed to consider the maximum number of patients a clinician would be willing to screen to prevent 1 invasive or high-grade cancer. This is known as the “number-needed-to-screen threshold” or “number-willing-to-screen” (NWS) and is a judgment that can vary from clinician to clinician. This number tells us how a physician weighs the benefits of detecting a potentially invasive bladder cancer case while it is still in situ against the harms of screening, which include inconvenience, anxiety, and harm from workup and overtreatment, as well as the financial costs of follow-up interventions such as imaging and cystoscopy. The NWS for other cancers has been reported to range from 1000 to 2000 to prevent 1 death.8 We used a lower range to reflect the change of endpoint from mortality to invasive cancer.
The reduction in the event rate is estimated by assuming various RR reductions in the percentage of the population that is subject to screening, such that:
Events if screening = the number of events in the population not eligible for screening + (the number of events in the population eligible for screening × the RR associated with screening)
Reduction in event rate = (events if no screening − events if screening) ÷ population size
The clinical net benefit is then calculated using the screening rate, defined as the number of individuals in the population recommended to undergo screening divided by the total population.
Net benefit = reduction in event rate − (screening rate ÷ NWS)
In other words, the screening rate is weighted by a factor related to the degree of harm or cost associated with screening. Because net benefit includes both screening and event rates, the optimal screening strategy is the one with the highest net benefit, irrespective of the absolute size of the differences in net benefit. To illustrate the calculation of net benefit, imagine that a trialist would subject no more than 1000 individuals to yearly screening to find 1 invasive bladder cancer while it is still in situ. Further imagine that the trialist wished to evaluate a screening strategy that recommended screening for 50% of the population, and that, without screening, there would be 200 invasive tumors per 100,000 population, 150 of which would be found in the 50% of the population that is screened. If the anticipated RR with screening is 0.6, implementation of the screening strategy would lead to 50 invasive tumors in those not screened plus 150 × 0.6 = 90 in those subject to screening, for a total of 140 invasive tumors. This means that the incidence of invasive tumors would be reduced by 200 − 140 = 60 per 100,000 population while screening 50% of 100,000 = 50,000. Given an NWS of 1000, the net benefit is 60 − (50,000 ÷ 1000) = 10 per 100,000 population.
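The worked example above can be reproduced in a few lines. This is a minimal sketch of the net benefit formulas as stated in the text, expressed per 100,000 population; the function and argument names are hypothetical.

```python
def net_benefit(events_not_eligible, events_eligible, rr, screened, nws):
    """Net benefit in events prevented per population of the size supplied.

    events_not_eligible: events expected among those not offered screening
    events_eligible:     events expected (without screening) among those eligible
    rr:                  relative risk of an event with screening (eg, 0.6)
    screened:            number of individuals offered screening
    nws:                 number-willing-to-screen threshold
    """
    events_no_screening = events_not_eligible + events_eligible
    events_if_screening = events_not_eligible + events_eligible * rr
    reduction = events_no_screening - events_if_screening
    return reduction - screened / nws


# Values from the illustration: 200 tumors per 100,000, 150 of them in the
# 50% of the population that is screened, RR = 0.6, NWS = 1000
nb = net_benefit(events_not_eligible=50, events_eligible=150,
                 rr=0.6, screened=50_000, nws=1000)
print(round(nb, 6))
```

A positive value means the strategy is preferable to screening no one at the chosen NWS; a strategy with no screening always has a net benefit of 0.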
To calculate sample size for trials using different eligibility criteria, we used a standard formula for a binary comparison of percentages9 with a power of 80% and an alpha of 5%. Event rates with and without screening were calculated for different RRs and eligibility criteria as described earlier; the number needing to be assessed for eligibility was calculated by multiplying the sample size by the reciprocal of the percentage of the population meeting the eligibility criteria. All analyses were conducted using Stata 11.0 software (StataCorp LP, College Station, Tex).
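The sample size step can be sketched with a commonly used normal-approximation formula for comparing two proportions. The exact formula from the cited reference is not reproduced in the text, so this version (equal allocation, 2-sided alpha, no continuity correction) is an assumption and may not match the published figures exactly; the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(p_control, rr, alpha=0.05, power=0.80):
    """Per-arm sample size for comparing 2 proportions (normal approximation)."""
    p_screen = p_control * rr                      # event rate with screening
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 2-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_screen) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_control * (1 - p_control)
                                 + p_screen * (1 - p_screen))) ** 2
    return ceil(numerator / (p_control - p_screen) ** 2)


def number_to_assess(total_sample_size, eligible_fraction):
    """Sample size multiplied by the reciprocal of the eligible fraction."""
    return ceil(total_sample_size / eligible_fraction)


# Hypothetical inputs: 1% 5-year risk in the eligible group, RR = 0.7,
# and 25% of the assessed population meeting the eligibility criteria
n_arm = sample_size_per_arm(p_control=0.01, rr=0.7)
print(n_arm, number_to_assess(2 * n_arm, 0.25))
```

Because the required sample size grows rapidly as the baseline event rate falls, restricting eligibility to a higher-risk group lowers the number randomized while raising the number who must be assessed.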
RESULTS
Table 1 shows the baseline characteristics of the training and validation data sets. The median age of the patients at the time of randomization was 62 years (interquartile range, 58 years-67 years) and approximately one-half of the patients (51%) were female. Slightly more than one-half of the participants reported a history of smoking (53%), and 25% reported a smoking history of ≥ 30 pack-years. High-grade in situ cancer was rare, equivalent to only 2% of invasive cases.
Table 1. Summary of Patient Characteristics
|Characteristic||Training Set (n = 49,873)||Validation Set (n = 99,746)|
|Age at randomization (IQR), y||62 (58-67)||62 (58-67)|
|Female||25,348 (51%)||50,666 (51%)|
|Married||37,717 (76%)||75,193 (75%)|
|White race||44,139 (89%)||88,104 (88%)|
|No. of comorbidities|| || |
|0||14,752 (30%)||29,645 (30%)|
|1||16,262 (33%)||32,162 (32%)|
|2||10,794 (22%)||21,438 (21%)|
|≥3||8065 (16%)||16,501 (17%)|
|Smoking history (pack-y)|| || |
|0 (nonsmoker)||23,664 (47%)||47,356 (47%)|
|1-10||4766 (10%)||9491 (10%)|
|10-20||4995 (10%)||9846 (10%)|
|20-30||3915 (8%)||7711 (8%)|
|≥30||12,533 (25%)||25,342 (25%)|
|Family history of bladder cancer||901 (2%)||1769 (2%)|
|Invasive or high-grade bladder cancer||264||506|
Over the entire period of follow-up, 770 patients developed invasive or high-grade bladder cancer (264 patients in the training cohort and 506 patients in the validation cohort). The 5-year probability for the entire validation cohort was 0.24% (95% confidence interval, 0.21%-0.27%).
The 5-year risk of bladder cancer in the validation set was 0.341% versus 0.181% for participants aged ≥ 65 years compared with those aged < 65 years; 0.121%, 0.239%, and 0.434%, respectively, for those with smoking histories of < 10 pack-years, 10 pack-years to 19 pack-years, and ≥ 20 pack-years; 0.405% versus 0.078% in males and females; and 0.230% versus 0.238% in those with and those without a family history of bladder cancer. It can be observed that family history did not discriminate in the validation set. This may be because of the relative rarity of a reported family history (< 2%) and the consequent imprecision of statistical estimates of the association between family history and outcome.
After evaluating a variety of risk criteria on the training set, 4 were chosen for evaluation on the validation set: 2 approaches based on risk scores and, as comparison groups, the strategies of screening all or none of the participants. Table 2 shows the number of participants screened under each of these 4 strategies and the expected number of diagnoses of invasive or high-grade bladder cancer within 5 years given various levels of screening effectiveness. Compared with the strategy of screening no one, screening those with a risk score > 8 requires screening only 8.4% of the population, while reducing the rate of invasive or high-grade cancer by 6% to 14% (14-34 events per 100,000 population), depending on the assumed RR reduction of screening. Using the less strict threshold for screening of a risk score > 6 results in a greater percentage of the population screened (23.4%), but a larger reduction in the event rate (12%-30%; 28-71 events within 5 years per 100,000 population).
Table 2. Percentage of Patients Screened Using Each Screening Strategy and the Associated Reduction in the 5-Year Risk of Invasive or High-Grade Bladder Cancer for Various RRs (Events per 100,000 Population, With Percentage Reduction in Parentheses)
|Screening strategy||Screened||Sensitivity||Specificity||RR = 0.5||RR = 0.6||RR = 0.7||RR = 0.8|
|None||0.0%||0.0%||100.0%||238 (0%)||238 (0%)||238 (0%)||238 (0%)|
|Risk score >8||8.4%||28.8%||91.7%||204 (14%)||211 (12%)||218 (9%)||224 (6%)|
|Risk score >6||23.4%||59.8%||76.7%||167 (30%)||181 (24%)||195 (18%)||210 (12%)|
|All||100.0%||100.0%||0.0%||119 (50%)||143 (40%)||167 (30%)||190 (20%)|
Table 3 shows the net benefit for each screening strategy across a range of NWS and the RRs of invasive disease. The optimal screening strategy is the one with the highest net benefit. When screening is assumed to be more effective (smaller RR) and tolerable (higher NWS), approaches that screen more participants are optimal (risk score > 6). When screening is believed to be less effective, the optimal strategy is to screen fewer participants (those with a risk score of > 8 or none). One would only opt not to screen anyone if the NWS was ≤ 500 and the assumed risk reduction was ≤ 20% or if the NWS was ≤ 250 and the assumed risk reduction was ≤ 40%. It is interesting to note that there was no combination of screening effectiveness and tolerability for which the optimal strategy was to screen everyone in the population. In other words, the risk score is important to identify population groups at sufficiently high risk for the development of invasive bladder cancer that could benefit from screening.
Table 3. Net Benefit Associated With Various Screening Strategies and RRs for NWS of 250, 500, 750, 1000, and 1500 Individuals
|Risk score >8|| || || || |
|Risk score >6|| || || || |
|All|| || || || |
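The pattern described in the text can be checked by recomputing net benefit from the rounded figures in Table 2. The following sketch is a recalculation under that assumption, not a reproduction of the published Table 3 values; the variable and function names are hypothetical, and small differences from the original table are possible because the inputs are rounded.

```python
BASELINE = 238  # events per 100,000 within 5 years with no screening (Table 2)

# strategy: (percentage screened, events per 100,000 at each assumed RR)
STRATEGIES = {
    "None": (0.0, {0.5: 238, 0.6: 238, 0.7: 238, 0.8: 238}),
    "Risk score > 8": (8.4, {0.5: 204, 0.6: 211, 0.7: 218, 0.8: 224}),
    "Risk score > 6": (23.4, {0.5: 167, 0.6: 181, 0.7: 195, 0.8: 210}),
    "All": (100.0, {0.5: 119, 0.6: 143, 0.7: 167, 0.8: 190}),
}


def net_benefit(strategy, rr, nws):
    """Net benefit per 100,000: events prevented minus (number screened / NWS)."""
    pct_screened, events = STRATEGIES[strategy]
    reduction = BASELINE - events[rr]
    number_screened = pct_screened / 100 * 100_000
    return reduction - number_screened / nws


def best_strategy(rr, nws):
    """Strategy with the highest net benefit for a given RR and NWS."""
    return max(STRATEGIES, key=lambda s: net_benefit(s, rr, nws))


for nws in (250, 500, 750, 1000, 1500):
    row = ", ".join(f"RR {rr}: {best_strategy(rr, nws)}" for rr in (0.5, 0.6, 0.7, 0.8))
    print(f"NWS {nws}: {row}")
```

With these inputs, screening everyone is never the highest-net-benefit strategy, and screening no one is preferred only at the lowest NWS values combined with modest assumed risk reductions.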
Some power calculations are given in Table 4. It is clear that a randomized trial of screening for bladder cancer would only be feasible if restricted to a high-risk group; a trial including all older adults would require approximately 3 times as many patients. It is also clear that conducting a trial in a subgroup of patients at increased risk would require large numbers to be assessed for eligibility. However, this would likely prove to be only a minor practical challenge given that the criteria would be so simple and may be available in electronic medical records. A family history of bladder cancer may not be routinely recorded but reclassifies only a small minority of patients. For example, excluding a family history from the risk score would reclassify fewer than approximately 250 individuals per 100,000 population as being at high risk, only 1 of whom would be found to have invasive bladder cancer within 5 years. Therefore, family history might be dropped from the eligibility assessment if obtaining these data caused practical difficulties.
Table 4. Power Calculations for Various Trial Scenarios
|Risk score >8||20,268||242,027||37,610||449,114||88,152||1,052,653|
|Risk score >6||27,338||116,682||50,734||216,539||118,930||507,608|
As sensitivity analyses, we excluded any cancers occurring within the first year on the grounds that these may not be prevented by screening. This did not substantively change our results. For example, a screening strategy that used a risk score > 8 would have a specificity of 92% and a sensitivity of 31% instead of 92% and 29%, respectively. We also examined all cancers occurring within 10 years, instead of 5 years. Due to the higher number of cancers detected with a longer follow-up, using 10 years did shift the optimal screening strategy toward those that screened a larger percentage of the population. For example, the optimal screening strategy would be to screen all individuals if the RR was 0.6 and the NWS was ≥ 1000. Nonetheless, a risk score > 6 was the favored strategy for most scenarios.
DISCUSSION
The results of the current study clearly demonstrate that screening for bladder cancer needs to be restricted to a subgroup of patients considered to be at elevated risk. Under reasonable assumptions for the benefits of screening (in terms of reducing radical surgery and cancer-specific death) and harms (in terms of inconvenience, anxiety, and the harms associated with the workup of false-positive results), as well as financial costs, implementing screening for a high-risk subgroup in the population was clearly preferable to screening all or none of the population. Moreover, we were able to compare different definitions of high risk using decision-analytic methods to identify optimal criteria for bladder cancer screening.
Our modeling approach is not based on a particular approach to screening, either in terms of frequency or a type of test. We assume that different tests will vary with respect to effectiveness (compare a highly sensitive test given yearly with a less sensitive test given every 5 years) and tolerability (compare an expensive test with poor specificity with an inexpensive, highly specific test). As such, we provide the results for a variety of different scenarios regarding tolerability (in terms of NWS) and effectiveness (in terms of RR).
There are several possible limitations to the current study. First, our population was comprised of those volunteering for a screening trial. Although this makes the cohort ideal for our primary aim, namely to determine the inclusion criteria for a trial of bladder cancer screening, PLCO volunteers may not be representative of the population as a whole.10 This would appear to be an inherent limitation of randomized trials regarding cancer screening. Second, occupational exposure to chemicals such as azo dyes is known to increase the risk of bladder cancer,11 and such exposures were not recorded in the PLCO trial. Given that pertinent occupational exposure is relatively rare, it is difficult to justify including it formally in a risk prediction model. One might imagine that occupational exposure would be a matter of clinical judgment; for example, a physician might choose to screen an older patient who had been exposed, even if that person had never smoked. Third, it is possible that participation in the PLCO trial reduced the risk of invasive bladder cancer. For example, an early stage bladder cancer might be diagnosed and treated in a patient being evaluated for prostate cancer. That said, prostate and colorectal screening are widely prevalent in the community, and ovarian screening affects only women, who are known to be at a lower risk of bladder cancer. Moreover, this effect would only have been observed in the 50% of our cohort randomized to screening. Hence, we do not anticipate that this would have a major impact on our findings.
We did not include race in our model. Although whites have a higher incidence of bladder cancer, mortality is higher among African Americans.12 Therefore, it would be difficult to justify including race as a predictor, one that would lead to fewer African American individuals undergoing screening. Nonetheless, we did repeat our analyses adjusting for race and found no important differences; for example, the RR for a 1-point increase in risk score was identical to 2 decimal places.
We also demonstrated that restricting screening to a high-risk group would dramatically improve the feasibility of a randomized trial of bladder screening. Although the maximum number of patients that need to be screened to prevent one invasive or high-grade bladder cancer is a clinician judgment that can vary, it is clear that delineating parameters before the trial that result in a population truly at risk for the disease dramatically improves the likelihood of determining whether a novel biomarker or treatment strategy is of value.
The exact nature of the optimal screening strategy for bladder cancer has yet to be established. Microscopic blood in the urine has been found in essentially all patients with bladder cancer if serially evaluated. Simple home microscopic urine evaluation can be accomplished with a chemical reagent strip for hemoglobin analyses. This has served as the screening test of choice in the majority of previous bladder cancer screening efforts to determine who should undergo definitive workup. Microhematuria, however, is a nonspecific finding, identified in 15% to 20% of all patients screened.13, 14 The poor specificity of microhematuria and the relatively low incidence of bladder cancer in the general population lead to many unnecessary investigations (cystoscopy and imaging), which greatly increases the cost of such a screening strategy and decreases its relative benefit and patient acceptance. Other urinary biomarkers have been approved by the US Food and Drug Administration for the detection of bladder cancer15; however, their use as a potential secondary screening test has yet to be established.
The results of the current study have demonstrated that any trial of bladder cancer screening should be restricted to a subgroup at elevated risk. Moreover, it has been shown that different eligibility criteria for risk can be compared rationally using decision-analytic techniques.
Note Added in Proof
Supported in part by funds from David H. Koch provided through the Prostate Cancer Foundation, the Sidney Kimmel Center for Prostate and Urologic Cancers, and a P50-CA92629 SPORE grant from the National Cancer Institute to Dr. P. T. Scardino.
CONFLICT OF INTEREST DISCLOSURES
The authors made no disclosures.