Abstract
Background and objectives: The Short Form 12 (SF-12) has not been validated for long-term dialysis patients. The study compared physical and mental component summary (PCS/MCS) scores from the SF-36 with those from the embedded SF-12 in a national cohort of dialysis patients.
Design, setting, participants, & measurements: All 44,395 patients who had scorable SF-36 and SF-12 from January 1, 2006, to December 31, 2006, and were treated at Fresenius Medical Care, North America facilities were included. Death and first hospitalization were followed for up to 1 year from the date of survey. Correlation and agreement were obtained between PCS-36 and PCS-12 and MCS-36 and MCS-12; then Cox models were constructed to compare associated hazard ratios (HRs) between them.
Results: Physical and mental dimensions both exhibited excellent intraclass correlation coefficients of 0.94. Each incremental point for both PCS-12 and PCS-36 was associated with a 2.4% lower adjusted HR of death and 0.4% decline in HR for first hospitalization (both P < 0.0001). Corresponding improvement in HR of death for each MCS point was 1.2% for MCS-12 and 1.3% for MCS-36, whereas both had similar 0.6% lower HR for hospitalization per point (all P < 0.0001).
Conclusions: The use of the SF-12 alone or as part of a larger survey is valid in dialysis patients. Composite scores from the SF-12 and SF-36 have similar prognostic association with death and hospitalization risk. Prospective longitudinal studies of SF-12 surveys that consider responsiveness to specific clinical, situational, and interventional changes are needed in this population.
The medical outcome survey Short Form 36 (SF-36) has been widely used and validated as a quality of life (QoL) assessment tool for the general population and in various subpopulations (1), including patients who have ESRD and are on dialysis (2–14). These studies have shown that physical (PCS) and mental component summary (MCS) scores from the SF-36 are significantly associated with clinical indicators (e.g., hemoglobin, albumin, dialysis dosage), morbidity, and mortality in the dialysis population, even after adjustment for case mix and other factors; however, Ware et al. (15) have since used regression methods to select 12 of the 36 items that are covered by the SF-36 to reproduce the PCS and MCS scores. The shortened questionnaire, known as the SF-12, required only one third of the usual time for completion of the SF-36, with the trade-off being loss of information from eight domain scores, namely general health, vitality, physical functioning, role-physical, bodily pain, social functioning, role-emotional, and mental health (1,16). Direct comparisons between both PCS-36 and PCS-12 and between MCS-36 and MCS-12 have indicated very good correlation and agreement in the general population (15,17), the elderly (18), and some specific subpopulations, including patients with rheumatoid arthritis (19) and ischemic stroke (20) and after myocardial infarction (21).
The SF-12 has not been validated specifically for patients who are on long-term dialysis, although it was used in lieu of the SF-36 in two small studies (22,23). We also found a study that reported mean SF-12 component scores from 38 dialysis patients (from a larger cohort of patients with chronic kidney disease) as part of the recently developed Kidney Disease Quality of Life-36 (KDQoL-36) (24). Furthermore, information regarding any association between morbidity and mortality rates with SF-12 component scores in this population is lacking. This cross-sectional study aimed to measure agreement between the SF-36 and the embedded SF-12 in a large, contemporary, nationally distributed population of long-term dialysis patients. In addition, we compared implications of PCS and MCS derived from both methods on the basis of their respective associations with hazard rates for hospitalization and death.
Materials and Methods
Study Population
An automated reminder alerts the social worker to offer the SF-36 survey to every patient who initiates dialysis therapy in a Fresenius Medical Care, North America (FMCNA) facility once the patient has passed the 45th day of treatment and then, after each completed survey (or refusal to participate), at 6-month intervals thereafter. Between January 1, 2006, and December 31, 2006, 80,049 prevalent dialysis patients from approximately 1100 FMCNA-legacy facilities had at least one opportunity to complete the survey. Among them, 44,395 (55%) unique patients had scorable SF-36 and SF-12 responses (i.e., “responders”), forming the basis of this report.
Case-mix information (age, gender, race, diabetes, vintage, and dialysis modality) was collected as of the survey date for both responders and nonresponders; the latter group comprised patients who were unable to respond (e.g., because of cognitive or language difficulties), were unwilling to respond, had incomplete/unscorable responses, or postponed addressing the survey and never completed it. For responders, age was calculated on the date of survey, whereas vintage was defined as the time elapsed between each patient's date of first dialysis and the survey date. For nonresponders, we substituted the date that the survey was offered for “survey date.”
For responders, all available laboratory values from routine monthly evaluations that were performed by a single laboratory (Spectra Laboratories, Rockleigh, NJ) were averaged over the 3 months leading up to the survey date; these included albumin (by bromcresol green method), creatinine, hemoglobin, phosphorus, calcium, ferritin, and transferrin saturation. Dialysis dosage was collected and averaged during the same period, and hemodialysis (HD) dosage obtained from two-sample variable volume urea kinetic modeling was converted into weekly standardized Kt/V to allow for pooling and analytical compatibility with peritoneal dialysis dosage (25,26). First hospitalization and mortality (including withdrawal from dialysis) were tracked for a follow-up period of up to 1 year from the date of survey. Patients who were lost to follow-up as a result of transplantation, recovery of kidney function, or transfer out of the FMCNA system contributed person-time at risk until their last day before discharge.
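The follow-up rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' SAS code: the function name `follow_up`, its parameters, and the 365-day window are hypothetical choices made for the example.

```python
from datetime import date, timedelta
from typing import Optional, Tuple

# Assumption: "up to 1 year from the date of survey" is taken as 365 days.
MAX_FOLLOW_UP_DAYS = 365


def follow_up(
    survey_date: date,
    event_date: Optional[date] = None,
    discharge_date: Optional[date] = None,
) -> Tuple[int, bool]:
    """Return (days at risk, event observed) for one patient.

    event_date: date of death or first hospitalization, if any.
    discharge_date: date of transplantation, recovery of kidney function,
        or transfer out of the system, if any (hypothetical field name).
    """
    end_of_window = survey_date + timedelta(days=MAX_FOLLOW_UP_DAYS)
    last_day_at_risk = end_of_window
    if discharge_date is not None:
        # Person-time at risk ends on the last day before discharge.
        day_before_discharge = discharge_date - timedelta(days=1)
        last_day_at_risk = min(end_of_window, day_before_discharge)
    if event_date is not None and event_date <= last_day_at_risk:
        # The event falls inside the at-risk window, so it is counted.
        return (event_date - survey_date).days, True
    # Otherwise the patient is censored at the end of the at-risk window.
    return (last_day_at_risk - survey_date).days, False


# A patient surveyed on March 1, 2006, who died on June 1, 2006.
print(follow_up(date(2006, 3, 1), event_date=date(2006, 6, 1)))  # (92, True)
```

A patient who transfers out before any event is returned with `False` as the second element, which is the form that time-to-event software typically expects for a censored observation.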
SF-36 and SF-12 QoL Scores
The SF-36 summary scores (PCS-36 and MCS-36) range from 0 to 100, with higher scores representing better self-reported health, and they were calculated using standard (US-derived) scoring algorithms from Ware et al. (1,16). General health and vitality are domains shared by PCS and MCS. In addition, PCS encompasses physical functioning, role-physical, and bodily pain, whereas MCS includes social functioning, role-emotional, and mental health. The embedded SF-12 uses only 12 questions from the SF-36 to reproduce the PCS and MCS scores that would have been obtained from 35 of 36 questions on the SF-36 (15). An overview of the structure of each survey is provided in Table 1. The SF-12 summary scores (PCS-12 and MCS-12) also range from 0 to 100 and were calculated using the SAS algorithm program from the KDQoL work group, developed for scoring the SF-12 components of the KDQoL-36 (27).
Table 1. Overview of SF-36 and embedded SF-12, with SF-36 scales and marks for questions with major contributions to each of the PCS and MCS scores
Statistical Analyses
Pearson linear correlation coefficient (r), Spearman rank correlation coefficient (ρ), and intraclass correlation coefficient were calculated for comparison of SF-36 and SF-12 to describe agreement. Pearson correlation coefficients were also determined (1) within subsets of race, gender, and dialysis modality to determine consistency within subgroups and (2) to assess the relationship between SF-36 domain scores and SF-12 component scores. Cox proportional hazard models were constructed to determine associations between SF-36 component scores individually, with hospitalization as well as mortality rates, both with and without adjustment for case mix and laboratory variables. In parallel, similar models were constructed using SF-12 component scores. A final multivariable model was then constructed with both PCS-36 and MCS-36 as predictor variables and for side-by-side comparison, a second model substituting both PCS-12 and MCS-12 while retaining all of the other variables unchanged. No imputation was attempted for missing values, and all analyses were performed using SAS 9.1.3 (SAS Institute, Cary, NC).
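The distinction between correlation and agreement that these three statistics capture can be illustrated with a short sketch. It is not the authors' SAS implementation, and the paper does not state which form of intraclass correlation was used; the two-way, absolute-agreement, single-measure form below is an assumption made for the example, as are all function names and the toy scores.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson linear correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def icc_agreement(x, y):
    """Two-way, absolute-agreement, single-measure ICC for two instruments.

    Assumption: this form is chosen for illustration only.
    """
    n, k = len(x), 2
    pooled = list(x) + list(y)
    grand = mean(pooled)
    row_means = [(a + b) / 2 for a, b in zip(x, y)]
    col_means = [mean(x), mean(y)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for v in pooled)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Toy scores (not study data): the second instrument reads 2 points higher.
long_form = [20, 30, 40, 50]
short_form = [22, 32, 42, 52]
print(round(pearson_r(long_form, short_form), 3))
print(round(spearman_rho(long_form, short_form), 3))
print(round(icc_agreement(long_form, short_form), 3))
```

With these toy values, Pearson r and Spearman ρ both equal 1 because the ordering and spacing are preserved, whereas the intraclass correlation is about 0.988 because the constant 2-point offset counts against absolute agreement.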
Results
The study cohort of 44,395 patients (55% response rate) had a mean age of 61.2 ± 15.1 years; 46% were female, 57% were white, and 51% had diabetes; mean vintage was approximately 3 years, and the majority (94%) of patients were treated with in-center HD. These characteristics, shown in Table 2, were similar to those of the 35,654 (45%) nonresponders, although statistical comparisons indicated significant differences in all categories at P < 0.01 except female gender (P = 0.9), a result, in part, of the large sample size. The comparative distributions of SF-36 and SF-12 scores are shown in Figure 1; a slight right shift for PCS-12 versus PCS-36 and a slight left shift for MCS-12 versus MCS-36 were noted. The responders' mean PCS-36 and PCS-12 scores were 33.1 ± 10.5 and 35.3 ± 9.8, respectively, whereas the mean MCS-36 and MCS-12 scores were 48.0 ± 11.2 and 46.9 ± 10.7, respectively. In addition to the frame shift, we observed skewness at the extremes (skewness parameter for PCS-36 = 0.27, PCS-12 = 0.29, MCS-36 = −0.29, and MCS-12 = −0.25), also shown in Figure 2.
Table 2. Patient characteristics of all patients surveyed from January 1 through December 31, 2006
Figure 1. Frequency distribution curves showing overlap of responders' survey scores for (A) PCS-36 with PCS-12 and (B) MCS-36 with MCS-12.
Figure 2. Scatter plots showing the linear correlation (r) between (A) PCS-36 and PCS-12 and (B) MCS-36 and MCS-12.
Excellent linear correlation was noted between PCS-36/PCS-12 and MCS-36/MCS-12 measures, with both having the same Pearson coefficient (r = 0.94, P < 0.0001; Figure 2). Furthermore, rank-order correlation was similarly high, with Spearman ρ = 0.94 for both comparisons (P < 0.0001). The intraclass correlation coefficient values between both PCS scores and both MCS scores were also 0.94 (P < 0.0001), indicating that scores from these two instruments not only were highly correlated but also had excellent agreement. Additional subset analysis within subgroups of race and gender indicated that r = 0.94 consistently, whereas in different dialysis modalities, r = 0.94 for in-center HD and r = 0.95 for each of peritoneal dialysis and home HD (all P < 0.0001); therefore, there was excellent correlation and agreement between PCS-12 and PCS-36 as well as between MCS-12 and MCS-36.
The PCS-12 also exhibited a correlation profile toward the eight SF-36 domains that was almost identical to that of PCS-36, a pattern that was mirrored when comparing MCS-36 with MCS-12 (Table 3). Because of slight shifts in distribution curves for PCS-12 and MCS-12 evident in both Figures 1 and 2, there were differences in absolute group mean scores compared with the SF-36 among patients who were hospitalized and those who died during the 1-year follow-up period (Table 4); however, the “gap” in mean scores that was observed between patients with and without outcomes was consistent between PCS-36 and PCS-12 and between MCS-36 and MCS-12. The differences in means between hospitalized and nonhospitalized patients were 3.4 points for PCS-36 and 3.0 points for PCS-12, whereas they were 1.6 points for MCS-36 and 1.7 points for MCS-12. Similarly, between those who died and survivors, differences were 5.1 points for PCS-36 and 4.7 points for PCS-12 and 2.1 points for MCS-36 and 2.2 points for MCS-12; therefore, although the comparative scores were not exactly the same and the thresholds were different, the ability of either measure to separate those with desirable outcomes (e.g., survived or not hospitalized) from those with poor outcomes (e.g., died or hospitalized) was statistically significant, and the magnitude of the difference in scores relative to different outcomes was similar between SF-12 and SF-36.
Table 3. Linear correlation coefficients comparing SF-36 and SF-12 component scores
Table 4. Average SF-36 and SF-12 scores among patients grouped by observed outcomes, with a follow-up period of up to 1 year from the date of survey
The risk profiles for mortality when using each of PCS-36, PCS-12, MCS-36, and MCS-12 individually, in unadjusted, case-mix–adjusted, and case-mix– and laboratory-adjusted models are shown in Figure 3. The corresponding risk profiles for hospitalization are shown in Figure 4. The risk profiles are markedly similar between PCS-36 and PCS-12 as well as between MCS-36 and MCS-12. Furthermore, we note more prominent hazard ratios (HRs) associated with PCS than MCS in both the mortality and hospitalization analyses, indicating that PCS was a stronger predictor of these outcomes than MCS. In addition, there seems to be a greater difference in HRs among categories in models for mortality versus models for hospitalization, indicating that both the PCS and the MCS were more predictive of mortality than hospitalization. When PCS and MCS were combined in a multivariable model (Table 5), SF-36 and SF-12 component scores exhibited virtually identical HRs and the other independent variables in the model similarly had HRs unchanged. Each incremental PCS-12 and PCS-36 point was associated with identical 2.4% lower adjusted HR of death and 0.4% decline in HR for first hospitalization (both P < 0.0001). Corresponding improvement in HR of death for each MCS point was 1.2% for MCS-12 and 1.3% for MCS-36, whereas both had a similar 0.6% lower HR for hospitalization per point (all P < 0.0001).
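To make the per-point percentages concrete, the sketch below scales a per-point hazard ratio to a larger score difference. It assumes the component score enters the Cox model as a single continuous, log-linear term, so that effects multiply across points; the function names are hypothetical, and the calculation is illustrative rather than a result reported by the study.

```python
import math


def hr_per_point(percent_lower: float) -> float:
    """Hazard ratio per 1-point increase, given the percent reduction per point."""
    return 1.0 - percent_lower / 100.0


def hr_for_difference(percent_lower_per_point: float, points: float) -> float:
    """Hazard ratio implied for a score difference of `points`.

    Assumption: the log hazard is linear in the score, so the Cox
    coefficient is log(HR per point) and it scales with the difference.
    """
    beta = math.log(hr_per_point(percent_lower_per_point))
    return math.exp(beta * points)


# 2.4% lower HR of death per PCS point, applied to a 10-point difference.
print(round(hr_for_difference(2.4, 10), 3))
```

Under this assumption, a 10-point higher PCS corresponds to a hazard ratio for death of roughly 0.78, which is a somewhat smaller reduction than the 24% obtained by multiplying 2.4% by 10.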
Figure 3. Risk profile from Cox proportional hazard models for time to death using PCS-36 (A), PCS-12 (B), MCS-36 (C), and MCS-12 (D).
Figure 4. Risk profile from Cox proportional hazard models for time to first hospitalization using PCS-36 (A), PCS-12 (B), MCS-36 (C), and MCS-12 (D).
Table 5. Comparative HRs within similar multivariable Cox models for hospitalization and mortality that contain both PCS and MCS scores as determinant variables in addition to all case-mix and laboratory biomarkers
Discussion
To our knowledge, this report is the largest cross-sectional study of patients with ESRD that combines SF-12 and SF-36 information with accompanying risk estimates for hospitalization and mortality. Our survey response rate of 55% compares favorably with the 47.6% of the American cohort reported in the Dialysis Outcomes and Practice Patterns Study (DOPPS) (28). The responders' demographic characteristics were similar enough to those of nonresponders, and because the study population was distributed nationally, we believe that these results are potentially generalizable to the US dialysis population. Results indicate that group PCS and MCS information derived from the SF-12 is highly correlated and in agreement with that derived from the SF-36 in this population. In addition, we show for the first time that implications for hospitalization or mortality HRs derived from SF-36 composite scores are equally applicable to SF-12–derived composite scores; therefore, use of PCS-12/MCS-12 in lieu of PCS-36/MCS-36, either alone or as part of a larger questionnaire (e.g., KDQoL-36), is valid in the US ESRD population.
We confirm that, on average, PCS scores are much lower in patients with ESRD than in the general population (approximately 17 points less by PCS-36 and approximately 15 points less by PCS-12), consistent with previous reports in large US prevalent dialysis cohorts (2,3,5,6). Similarly, we confirm that average MCS scores are only slightly lower (approximately 3 to 4 points less) than in the general population. Although a decade apart, the distribution of SF-36 PCS and MCS scores from this study almost exactly mirrors that of the FMCNA cohort from 1996 (5). Furthermore, the decline in adjusted relative risk for death that we detected for each 1-point increase in PCS (2.4%) was similar in magnitude to that reported in other large studies: 2.0% from Lowrie et al. (5), approximately 2.1% from De Oreo et al. (2) (derived from 10.4% decline per 5 points), and 2.1% in the US cohort of DOPPS by Mapes et al. (28) (derived from 21% decline per 10 points). The HR for mortality in this study decreased by 1.3% for each incremental MCS-36 point (1.2% for MCS-12) and was most consistent with DOPPS data, which revealed a 1.3% lower HR per 1-point change in MCS (derived from 13% decline per 10 points) (28). Reported values ranged from a 1.4% decline for each 5-point increase in MCS (De Oreo et al.) to a 2% decline for each 1-point increase in MCS (Lowrie et al.) (2,5). However, significant increases in death risk accrue as soon as PCS-12 falls below 44 (for PCS-36, below 40), whereas a slightly higher risk was associated with MCS-12/36 below 50.
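The per-point figures derived above from 5-point and 10-point estimates appear consistent with simple division of the published decline by the number of points. The sketch below contrasts that linear rescaling with a multiplicative (compounded) rescaling, which is what a log-linear Cox model would imply. The function names are hypothetical, and the comparison is offered only as a sensitivity check, not as a correction to the cited studies.

```python
def per_point_linear(percent_decline: float, points: float) -> float:
    """Per-point decline (%) by simple division."""
    return percent_decline / points


def per_point_compounded(percent_decline: float, points: float) -> float:
    """Per-point decline (%) assuming the HR compounds over `points`."""
    hr = 1.0 - percent_decline / 100.0
    return 100.0 * (1.0 - hr ** (1.0 / points))


# (decline in %, number of points) as quoted in the text above.
published = {
    "PCS, 10.4% per 5 points": (10.4, 5),
    "PCS, 21% per 10 points": (21.0, 10),
    "MCS, 13% per 10 points": (13.0, 10),
}
for label, (decline, points) in published.items():
    linear = per_point_linear(decline, points)
    compounded = per_point_compounded(decline, points)
    print(f"{label}: linear {linear:.2f}%, compounded {compounded:.2f}%")
```

The two approaches differ by only a few tenths of a percentage point for these inputs, so the comparison with the 2.4% and 1.3% per-point estimates from this study is not materially affected by the choice.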
The corresponding decline in hazard rate for first hospitalization was only 0.4% per incremental PCS point in this study, slightly lower than the 0.9% decline per 1-point increment of PCS from DOPPS (Lowrie et al. reported odds ratios for hospitalization and De Oreo et al. reported hospital days per patient-year) (2,5,28). For MCS, the HR for time to first hospitalization was 0.6% lower per point in this study and 5% lower per 10-point increment in DOPPS. Similarly, hospitalization risk begins to increase at PCS-12/36 below 44 and MCS-12 below 50, whereas for MCS-36, scores up to 59 were associated with increased risk (of borderline significance) in adjusted models. Of note, adjustment for case-mix and laboratory variables had a larger impact on the hazard rates associated with PCS than MCS, perhaps indicative of a stronger correlation between these variables and physical well-being.
Although the PCS and MCS both are known to predict hard outcomes in ESRD, losing information provided by the eight domains of the SF-36 may decrease the ability to detect more specific changes in a patient's functional well-being. For example, the QoL improvement shown in one seminal study, which resulted from increasing hematocrit with recombinant erythropoietin in new dialysis patients, would have been less impressive if only PCS and MCS had been reported, absent the much larger changes detected in vitality, social functioning, mental health, and physical functioning (13). Thus, the usefulness of the SF-12 in isolation will depend on the purpose of investigators, notwithstanding the logistical ease of implementation when compared with the SF-36. The Centers for Medicare and Medicaid Services Interpretative Guidelines memo [S&C-09-01, version 1.1, 10/03/08 (29)] accompanying the recently updated Conditions for Coverage for ESRD facilities (42 CFR part 494) identified the KDQoL-36 as the preferred standardized physical and mental assessment tool for psychosocial status, on the basis of recommendations from the National Quality Forum and the Centers for Medicare and Medicaid Services Clinical Performance Measures Work Group, with consideration that use of the KDQoL-36 is free from royalty fees. Loss of the eight domain scores will be offset by the addition of kidney disease–specific questions on the burden, symptoms, and effects of kidney disease on daily life (24). In this analysis, we used the scoring algorithm from the KDQoL work group, thereby providing a better understanding of the PCS-12 and MCS-12 that will eventually be obtained from implementing the KDQoL-36 (27).
Taken together, results suggest that SF-12–derived component scores in ESRD are just as good as their SF-36 counterparts. Furthermore, norms and other interpretation guidelines from previous work using the SF-36 in the US dialysis population are useful in interpreting the SF-12. The strengths of this study include results that were robust, a diverse source population with a distribution that is national in scope, a relatively high survey response rate, and a contemporary time period reflecting current dialysis practices and technology; however, the study has several limitations, with the first three of them inherent to the study design. First, this was a cross-sectional study that did not take into account longitudinal changes. Second, results obtained with regard to prediction of death and hospitalization hazard rates were not necessarily causal and should be interpreted with caution. Third, the strengths of the association observed may have a larger variance when studying groups of a much smaller size. Fourth, this analysis does not necessarily apply to non-US patient populations, although our findings may be true in these populations as well. Clearly there are international and intercontinental variations in scores and interpretation of scores, evident in DOPPS (6). Fifth, application and implications represented here pertain to groups of patients, and the role of SF-12 component scores as an adjunct to clinical decision making in individual patients requires further investigation. We need longitudinal studies in individual patients that can also assess the sensitivity of SF-12 measurements to changes in clinical condition and interventions as well as the potential for patient fatigue or “burnout” with repeated periodic survey administration.
Finally, potential “context” bias may arise from our use of the embedded SF-12 (within the SF-36) as opposed to isolated implementation of the SF-12 questionnaire; however, some reassurance may be gained from a study conducted by Ware et al. (30) in a sample of 525 patients for whom the product moment correlation between answers to isolated SF-12 questions and the SF-12 items embedded in the SF-36 was exceptionally high (r = 0.999).
Conclusions
This study validates the use of the SF-12 alone or as part of a larger survey (e.g., KDQoL-36) in long-term dialysis patients. Both PCS-12 and MCS-12 correlated closely with their SF-36 counterparts and had nearly identical prognostic associations with death and hospitalization risk. Norms and other interpretation guidelines from previous work using the SF-36 PCS and MCS in the US dialysis population will be applicable in interpreting the SF-12 moving forward. Further study is needed to determine the utility of longitudinal SF-12 measurements with regard to responsiveness to specific clinical, situational, and interventional or therapeutic changes, not only in patient groups but also within individual patients who are on long-term dialysis.
Disclosures
All authors are employees of Fresenius Medical Care North America.
Acknowledgments
A previous version of this work was published as an abstract (J Am Soc Nephrol 19: 289A, 2008).
We thank Dr. Fred Finkelstein for providing a wonderful overview of the state of the science of evaluating health-related QoL for our research team. We are grateful to FMCNA social workers for diligently attempting to collect QoL information from our dialysis patients. We also thank Norma Ofsthun and Lori Vienneau for sharing their automated SF-36 scoring algorithm (in SAS) that has been used extensively in previous FMCNA projects.
Footnotes
- Published online ahead of print. Publication date available at www.cjasn.org.
- Received October 12, 2009.
- Accepted November 18, 2009.
- Copyright © 2010 by the American Society of Nephrology