Abstract
Background and objectives: Whereas current GFR estimating equations approximate direct GFR measurement at a single time point, formulas that capitalize on changes in easily measured biologic parameters could improve the accuracy and precision of GFR estimation.
Design, setting, participants, & measurements: In the Chronic Kidney Disease in Children Cohort (aged 1 to 16 yr), we measured GFR by plasma disappearance of iohexol (iGFR) and biomarkers at the first two annual visits. Models took the form GFR_{2} = a[GFR_{1}/40]^{b}[X_{2}/X_{1}]^{c}, where GFR_{2} and GFR_{1} represented the current and previous years' iGFR, 40 ml/min per 1.73 m^{2} was the cohort mean, and X_{2}/X_{1} was the change in predictors over time. Using data from 360 participants with a median age of 12.1 yr, we evaluated the predictive performance of a past GFR measurement and 20 other variables using a two-thirds random sample of the data. A one-third sample was reserved for validation.
Results: Previous iGFR measurements were strongly predictive of subsequent iGFR, and adding change in height/serum creatinine significantly improved the explanatory power to 78%. In the validation set, the correlation between estimated and measured GFR was 0.88, and 48 and 88% of estimated GFRs were within 10 and 30% of observed iGFRs, respectively. When the past GFR measurement was not used, addition of change in markers to a cross-sectional model did not improve prediction.
Conclusions: Longitudinal formulas to estimate iGFR capitalize on the high predictive power of previous iGFR measurements and in this study yielded a parsimonious prediction model with the potential for assessing progression in the clinical setting.
GFR, the gold standard for assessing kidney function, is a measure of the volume of plasma filtered each minute through the glomeruli of the kidneys. Direct measurement of GFR requires timed blood draws or urine collection to estimate clearance of an endogenous or exogenous marker that is ideally freely filtered and neither reabsorbed nor secreted. These methods can be challenging and time-consuming for both study personnel and participants. Investigators, therefore, often turn to estimation of GFR using current measurements of biomarkers, such as serum creatinine (SCr). A valid and precise estimate of GFR that is based on easily measured variables could greatly simplify studies of kidney disease. Although available formulas provide reasonable approximation to clearance-based direct GFR measurements, the variability can be large (1,2), yielding the potential for attenuation of risk estimates for risk factors under study.
A recently published GFR-estimating equation by Schwartz et al. (3) achieved improved prediction by adding cystatin C and blood urea nitrogen (BUN) to SCr. Another approach, however, is to recover more information from known predictors by taking advantage of a longitudinal study design. The predictive value of current biomarkers and bioeffects (hereafter called simply biomarkers) could be augmented with past biomarker data by using the change over time of a given biomarker as the predictive quantity in an estimating equation. More important, one powerful predictor that could be incorporated using a longitudinal context for estimation would be past GFR measurements. This idea forms the basis for transition models that regress current outcomes on past values of the outcome and on current and past values of predictors. Such models provide one tool for the analysis of longitudinal data and have been used in etiologic investigations of personal cigarette smoking on changes in pulmonary function in children (4); however, these models have not been extended to the arena of prediction, although they have much to offer.
Using a method that incorporates past measured GFR into the estimation of current values has a number of advantages. Grounding the estimation in past GFR levels smoothes the GFR function over time and lends continuity to the GFR data. Within the context of a longitudinal study, a more natural estimation vehicle is a longitudinal prediction model that capitalizes on past kidney function assessment and borrows strength from biomarker information at both a previous and a current visit. In research settings, participants could undergo direct GFR measurement at fewer time points, reducing the stress of study involvement while maintaining the integrity of GFR information over designs with pure estimation strategies. There are also clinical benefits given that patients often have a previous GFR measurement, and clinicians could capitalize on that information to predict current GFR and, potentially, future decline.
Here, we describe the development of a longitudinal estimating equation for GFR within the context of the Chronic Kidney Disease in Children Cohort Study (CKiD). The CKiD uses iohexol plasma clearance data to measure GFR (iGFR) in each member of the cohort. This method has been shown to be a robust direct measurement of GFR (5–7). We evaluated the predictive benefit of capitalizing on these direct GFR measurements taken in a previous annual study visit for estimation of a current GFR. The strength of past measurements in predicting current GFR, we hypothesized, would yield a parsimonious estimating equation with good predictive credentials. Given that direct GFR measurements are not available in many study settings, we also explored the predictive gain of incorporating information on past biomarker levels into a cross-sectional biomarker-only estimating equation.
Materials and Methods
Study Participants and Design
The CKiD has been described previously (8). Briefly, children with mild to moderate CKD (30 to 90 ml/min per 1.73 m^{2}) were recruited on the basis of the original Schwartz formula (1,9,10) from 43 participating pediatric nephrology centers. Eligible children were 1 to 16 yr of age and had never undergone dialysis or organ transplant. GFR was determined from plasma iohexol disappearance curves at baseline, 1 yr later, and every other year thereafter. Data from participants with attainable iGFR measurements in visit 1 (baseline) and in visit 2 (1 yr later) were used to build and validate GFR estimating equations.
GFR Measurement and Biomarker Assays
At the study visit, an intravenous line or butterfly needle was used to administer 5 ml of iohexol. A second intravenous line was saline locked and used for obtaining blood samples for measurement of SCr and BUN; an aliquot was also obtained for HPLC determination of an iohexol blank. SCr (enzymatic) and BUN were analyzed centrally at the CKiD's laboratory at the University of Rochester (G.J.S.) on an Advia 2400 (Siemens Diagnostics, Tarrytown, NY). Blood samples were collected at four time points (10, 30, 120, and 300 min) after infusion, and serum iohexol concentrations were used to calculate iGFR. The method of GFR determination using the plasma disappearance of iohexol in a two-compartment system has been previously reported (2). GFR values were scaled to body surface area (BSA), which was determined using the formula of Haycock et al. (11). Chemistry panels (sodium, potassium, chloride, BUN, SCr, calcium, and phosphorus) and iohexol concentrations were entered by the Central Biochemistry Laboratory into a web-based data management system (Nephron) developed by the Data Coordinating Center.
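As an illustration of the BSA scaling step, the following Python sketch computes BSA and rescales a raw clearance to 1.73 m^{2}. The coefficients are the commonly cited values for the Haycock formula and are not stated in this article; the function names and the example inputs are hypothetical.

```python
def haycock_bsa(weight_kg: float, height_cm: float) -> float:
    """Body surface area in m^2 using the commonly cited Haycock coefficients
    (assumption: these values are not given in the article itself)."""
    return 0.024265 * weight_kg ** 0.5378 * height_cm ** 0.3964


def scale_gfr_to_bsa(gfr_ml_min: float, bsa_m2: float) -> float:
    """Express a raw clearance (ml/min) per 1.73 m^2 of body surface area."""
    return gfr_ml_min * 1.73 / bsa_m2


# Hypothetical child: 40 kg, 150 cm, raw clearance of 30 ml/min.
example_bsa = haycock_bsa(40.0, 150.0)
example_scaled_gfr = scale_gfr_to_bsa(30.0, example_bsa)
```

Because children in the cohort generally have a BSA below 1.73 m^{2}, the scaled value is larger than the raw clearance in most cases.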
Statistical Analysis
Twenty variables (age, height, weight, BSA, glomerulonephritis diagnosis, male, white race, height/SCr [ht/SCr], BUN, albumin, phosphate, calcium, glucose, proteinuria, hemoglobin, potassium, systolic BP, hematocrit, carbon dioxide, and sodium) that had previously been demonstrated to be associated (as either cause or effect) with kidney function (12–15) were available in the data set for inclusion in a predictive model of GFR. Because cystatin C had been processed in only 59% of the participants, it was not used in the primary equation development but was used for sensitivity analysis (16). Parameter estimates from bivariable regressions adjusting for the variability in iGFR_{2} explained by iGFR_{1} were evaluated for predictive strength in the full data set. Factors that were significantly associated with iGFR_{2} in the adjusted models (P < 0.05) were further evaluated in multivariate models using a two-thirds random sample of the data, which formed a training set for development of a prediction model. Demographic and morphologic variables were included to explain remaining variability after all of the biomarkers were assessed. A one-third random sample was reserved as an independent data set for validating the model fit.
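A random two-thirds/one-third split of this kind can be sketched as follows. This is a minimal illustration only: the function name, the seed, and the rounding rule for the training-set size are assumptions, and the article does not describe the software or procedure actually used.

```python
import random


def split_two_thirds(ids, seed=0):
    """Randomly assign about two-thirds of the ids to a training set and the
    remainder to a validation set (hypothetical helper, not the study code)."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * 2 / 3)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical participant identifiers 1..30.
train_ids, valid_ids = split_two_thirds(range(1, 31), seed=2009)
```

Changing the seed produces a different split, which is one simple way to repeat an analysis over several resamplings.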
The longitudinal GFR estimating equation was of the form log(iGFR_{2}) = α + γlog(iGFR_{1}/40) + βlog(X_{2}/X_{1}) + ε, where iGFR_{2} and iGFR_{1} represented the current and previous year's iGFR and X_{2}/X_{1} was the series of current year divided by the previous year's biomarker values. The estimate of the cohort mean (40 ml/min per 1.73 m^{2}) was used to standardize iGFR and make estimated effects interpretable. Specifically, exponentiating the intercept, α, yields the GFR in visit 2 for a child whose GFR in visit 1 was 40 ml/min per 1.73 m^{2} and whose biomarker values remained unchanged between visits (X_{2} = X_{1}). The parameter γ is the power that describes the impact on GFR_{2} of the GFR at visit 1 as it deviates from the cohort mean, in individuals with the same changes in the Xs. Likewise, the parameter vector β is the power that quantifies the impact of the ratio of visit 2 to visit 1 biomarker values in individuals with the same GFR_{1}. In particular, for two children who have the same GFR_{1} but one has twice the change in a biomarker (X_{2}/X_{1}) relative to the other, the one with the higher change is expected to have 2^{β} times the GFR_{2} of the other. The quantity ε represents the error between estimated GFR_{2} and the observed value on the log scale.
Similarly, a complementary biomarker-only longitudinal formula that added the change in biomarker levels to a cross-sectional model was developed. The equation was of the form log(iGFR_{2}) = α + δlog(X_{2}) + βlog(X_{2}/X_{1}) + ε, where log(X_{2}) was the log of the visit 2 values centered at their means and X_{2}/X_{1} was defined as previously. The parameter interpretations are similar to those already described, with the parameter vector δ now representing the power that describes the impact on GFR_{2} of the biomarker values in visit 2. The log normality assumption for these models was validated by fitting a generalized gamma distribution and assessing whether the shape parameter estimate was statistically different from a value of 0 (3,17).
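The equivalence between the log-linear form and the multiplicative form GFR_{2} = a[GFR_{1}/40]^{b}[X_{2}/X_{1}]^{c}, with a = exp(α), b = γ, and c = β, can be checked numerically. The sketch below uses placeholder coefficients that are purely hypothetical and are not the estimates reported in Table 3; the function names are likewise illustrative.

```python
import math


def predict_gfr2_log(alpha, gamma, betas, gfr1, ratios, center=40.0):
    """Evaluate the log-linear form, then back-transform to the GFR scale."""
    log_gfr2 = alpha + gamma * math.log(gfr1 / center)
    log_gfr2 += sum(b * math.log(r) for b, r in zip(betas, ratios))
    return math.exp(log_gfr2)


def predict_gfr2_power(alpha, gamma, betas, gfr1, ratios, center=40.0):
    """Evaluate the same model written as a product of powers."""
    estimate = math.exp(alpha) * (gfr1 / center) ** gamma
    for b, r in zip(betas, ratios):
        estimate *= r ** b
    return estimate


# Hypothetical coefficients for illustration only (not from Table 3).
ALPHA, GAMMA, BETAS = math.log(40.0), 0.9, [0.6]

# Reference child: GFR_1 equal to the centering value and unchanged biomarker.
reference = predict_gfr2_log(ALPHA, GAMMA, BETAS, gfr1=40.0, ratios=[1.0])

# Two children with the same GFR_1; one has twice the biomarker ratio.
child_a = predict_gfr2_power(ALPHA, GAMMA, BETAS, gfr1=50.0, ratios=[0.9])
child_b = predict_gfr2_power(ALPHA, GAMMA, BETAS, gfr1=50.0, ratios=[1.8])
fold_difference = child_b / child_a  # equals 2 ** BETAS[0] up to rounding
```

For the reference child, the prediction reduces to exp(α), which is the interpretation of the intercept given above.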
Results
A total of 362 study participants had available iGFR measurements that were based on iohexol plasma disappearance curves from the first and second annual study visits. Two individuals with extreme changes in iGFR from visit 1 (62.7 and 40.0 ml/min per 1.73 m^{2}) to visit 2 (8.1 and 7.5 ml/min per 1.73 m^{2}, respectively) were removed, leaving data from 360 children for analysis. The median age was 12.1 yr (visit 2), the baseline (visit 1) median iGFR was 42.7 ml/min per 1.73 m^{2}, and the median iGFR at visit 2 was 42.2 ml/min per 1.73 m^{2}. Descriptive statistics of the visit 2 characteristics and the ratios of visit 2 to visit 1 values are provided in Table 1. The median ratio for iGFR is <1.0, indicating declining values, although the decline was minimal during the first year of the study.
Table 2 shows the results of univariate linear regression models for iGFR_{2} (in the log scale) on each one of the variables in Table 1. In addition, it provides the adjusted regression coefficient arising from a regression of the bivariable model including iGFR_{1}. The explained variability is the percentage of the variability in iGFR_{2} accounted for by the independent variable after removing the effects of other variables in the model. From the unadjusted estimates, we found the strongest predictors of the log(iGFR_{2}) to be log(iGFR_{1}) (R^{2} = 68%), the change in log(ht/SCr) (R^{2} = 20%), the change in log(BUN) (R^{2} = 3%), log(age) (R^{2} = 3%), the change in log(phosphate) (R^{2} = 2%), log(height) (R^{2} = 2%), and the change in log(albumin) (R^{2} = 1%). These factors remained strongly associated with current iGFR after removing the effect of previous iGFR by adding it to the model. Weight and BSA were also significant in the adjusted models but were strongly correlated with height, which had the highest R^{2} estimate of the three. Glomerular diagnosis and the change in log(calcium) were also significant factors independent of the previous iGFR.
In the training set of 220 randomly selected children from the 329 who had complete data for all of the variables of interest (iGFR_{1}, ht/SCr, BUN, albumin, age, height, phosphate, calcium, and glomerular diagnosis), we evaluated significant predictive factors (P < 0.05) from the adjusted regressions shown in Table 2. Table 3 shows the parameter estimates and predictive properties of the models, which contained various combinations of the predictors log(iGFR_{1}), change in log(ht/SCr), the change in log(BUN), the change in log(albumin) and log(height at visit 2). Other factors were NS predictors in adjusted multivariable models. Including all of the significant biomarkers resulted in model III with an R^{2} of 79%, a root mean squared error (RMSE) of 0.190, and 84% of the predicted GFR_{2} values within 30% of the observed iGFRs. Adding log(height_{2}) to the model yielded a small increase in R^{2} to 80% (model IV); however, an estimating equation using only GFR_{1} and the change in log(ht/SCr) (model IIa) resulted in a model with similar performance.
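The fit statistics quoted here are computed on the log scale. A minimal sketch of how R^{2} and RMSE could be obtained from observed and predicted GFR values is shown below; the degrees-of-freedom convention (n minus the number of fitted parameters) is an assumption, as the article does not state which convention was used, and the function name is hypothetical.

```python
import math


def log_scale_fit_stats(observed, predicted, n_params):
    """Return (R^2, RMSE) for a model fitted to log-transformed outcomes.

    Assumption: RMSE uses the residual degrees of freedom n - n_params.
    """
    y = [math.log(v) for v in observed]
    y_hat = [math.log(v) for v in predicted]
    n = len(y)
    y_bar = sum(y) / n
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # residual sum of squares
    sst = sum((a - y_bar) ** 2 for a in y)             # total sum of squares
    return 1.0 - sse / sst, math.sqrt(sse / (n - n_params))
```

Because the residuals are differences of logarithms, an RMSE of 0.190 corresponds to a typical multiplicative error factor of about exp(0.190) ≈ 1.21 on the original GFR scale.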
The models from Table 3 were evaluated in an independent set of data (not used for model development) that consisted of 109 participants with results shown in Table 4. Because the estimated GFR originates from data on the log scale, both the bias and the 95% limits of agreement are presented as percentages. The model with only the previous GFR and the change in log(ht/SCr) (model IIa) yielded the best performance in the validation set with a correlation of 0.88, errors within 10 and 30% of 48 and 88%, respectively, and a small bias of 0.5%. The predictive performance of the estimation using all of the predictors (model IV) is graphically illustrated in Figures 1 and 2 for the training and validation set, respectively. The departure of the predicted GFR estimates from the regression line through the data, which represents model I, highlights the degree to which the additional biomarker terms contribute information to the estimation.
Table 5 contains the results from replacing iGFR_{1} in the model with the log of the cross-sectional measurements of visit 2 predictive biomarkers from Table 3 (ht/SCr, BUN, and albumin). Model V, the purely cross-sectional model, results in an R^{2} of 76%, an RMSE of 0.204, and percentages of errors within 10 and 30% of 35 and 82%, respectively. When terms for the change in those biomarkers from visit 1 to visit 2 were added to the model, only the change in the log of ht/SCr was significant (Table 5, model VI). The R^{2} increased slightly to 77%, the RMSE became 0.202, and the percentages of errors within 10 and 30% remained similar (36 and 81%, respectively). Hence, current levels of ht/SCr and BUN were important for predicting current GFR, whereas the speed at which the current level of ht/SCr is achieved was statistically significant but added little to the predictive performance of the model.
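The validation summaries described above can be sketched in a few lines of Python. The exact computations behind Table 4 are not given in the article, so the definitions here are assumptions: accuracy is the share of estimates whose relative error is at most 10 or 30% of the observed value, and bias and 95% limits of agreement are taken from the mean and SD of log(estimated/observed), back-transformed to percentages. The function name is hypothetical.

```python
import math
from statistics import mean, stdev


def agreement_summary(estimated, observed):
    """Summarize agreement between estimated and measured GFR (assumed
    definitions; see the accompanying text)."""
    n = len(observed)
    rel_err = [abs(e - o) / o for e, o in zip(estimated, observed)]
    log_ratio = [math.log(e / o) for e, o in zip(estimated, observed)]
    m, s = mean(log_ratio), stdev(log_ratio)

    def to_pct(x):
        # Back-transform a log-scale quantity to a percentage difference.
        return 100.0 * (math.exp(x) - 1.0)

    return {
        "pct_within_10": 100.0 * sum(r <= 0.10 for r in rel_err) / n,
        "pct_within_30": 100.0 * sum(r <= 0.30 for r in rel_err) / n,
        "bias_pct": to_pct(m),
        "loa_pct": (to_pct(m - 1.96 * s), to_pct(m + 1.96 * s)),
    }


# Hypothetical estimated and observed values, for illustration only.
summary = agreement_summary([100.0, 108.0, 125.0, 140.0], [100.0] * 4)
```

In this toy example, two of the four estimates fall within 10% and three fall within 30% of the observed value.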
Evaluating the predictive performance of model VI in the validation set (Table 6), the change of ht/SCr did not affect the predictive value or fit of the estimating equation. The bias decreased to −2.1 from −3.2%, but the 95% limits of agreement remained approximately the same width.
The distribution of iGFR_{2} was found to be consistent with a lognormal distribution in the models presented (e.g., 95% confidence interval for shape parameter of model IV: −0.4242 to 0.0496). Although we presented the results from only one data split (two thirds training and one third validation), 10 repetitions of the analysis with resampling of the training and validation set produced consistent results.
Discussion
Cohort studies offer inferential advantages over cross-sectional studies in their ability to provide information on individual changes in biomarkers and factors of interest over time. Evaluating cohort data cross-sectionally ignores the additional information on disease natural history that is captured in longitudinal measurement. Prediction models should similarly benefit from capitalizing on the longitudinal data collection by using past and present biomarker data to predict the measure of interest. In this study, we evaluated whether this supposition would hold true for the estimation of GFR. We found that a direct iGFR measurement taken in the first year of the CKiD study was highly predictive of the GFR value measured in the second year of the study. Variability in the prediction was reduced by adding the change in ht/SCr to the model. The change in log BUN and the change in log albumin added slightly to the predictive performance in the training set of data but had a detrimental effect on the error percentages in the validation data set, although BUN has been found valuable in cross-sectional GFR estimating formulas (3). Performance in an independent data set indicated that an estimating equation based on the previous iGFR measurement and the change in ht/SCr resulted in the fewest errors and a bias of 0.5%.
The predictive power of the change in cystatin C, when evaluated in a subanalysis of the 60% (n = 122) with turbidimetric (DAKO SD, Copenhagen, Denmark) cystatin C measurements, did not contribute significantly (P = 0.08) to the prediction of the longitudinal model III in Table 3. Because cystatin C has been shown to contribute substantially to crosssectional equations (3,18,19), it is possible that a larger availability of measurements and/or alternative methods, such as nephelometry (20) (Siemens DadeBehring), for measuring cystatin C will result in improvements of the prediction of the proposed longitudinal equations.
Although the change in ht/SCr was informative when complementing the previous GFR measurement, the change in this biomarker was not as informative when added to a cross-sectional equation. The performance in both the training and validation data sets indicates that visit 1 biomarker levels do not contain much information regarding GFR 1 yr later after taking into account the attained levels observed at visit 2; however, the decline in GFR during a single year was minimal in a majority of the participants. The benefit to prediction from previous biomarker measurements may be realized only with greater degrees of progression after more follow-up time has accrued.
These results indicate that the use of immediate biomarker values provides an adequate GFR estimate, but future investigations will be needed to evaluate the potential for using past biomarker data and direct GFR measurement to anticipate GFR decline. Current results suggest that when GFR measurements are available, using that past GFR information and the change in ht/SCr during the follow-up period improves the estimation compared with cross-sectional prediction with equivalent biomarkers. Model IIa in Table 3, which relies primarily on the previous year's GFR measurement, yielded an RMSE of 0.197 compared with 0.204 for the purely cross-sectional biomarker-only equation, model V in Table 5.
The limitations to drawing broad conclusions from these results include the selectivity of participants in CKiD, who primarily represent children who have urologic disease, are under the care of pediatric nephrologists, and are mostly at Tanner stage I. In addition, the analyses used the logarithm of the predictors, which may prove, with the accumulation of more follow-up time, to be a suboptimal transformation for some biomarkers; however, the assumptions that accompanied the form of our equation were evaluated and not found to be violated to an extent that would invalidate our conclusions.
Conclusions
We have shown that during a 1-year follow-up period, longitudinal direct measurement of GFR can be complemented with consistent estimates of GFR using transition models and biomarker data. The marriage of direct and estimated GFR provides a continuity of GFR data that befits the longitudinal study platform. Additional follow-up time and the availability of new biomarker data hold the promise for realizing the full potential of longitudinal estimating equations.
Disclosures
None.
Acknowledgments
The CKiD is funded by the National Institute of Diabetes and Digestive and Kidney Diseases with additional funding from the National Institute of Neurologic Disorders and Stroke; the National Institute of Child Health and Human Development; and the National Heart, Lung, and Blood Institute (UO1DK66143, UO1DK66174, UO1DK66116, and U01DK082194). GE Healthcare, Amersham Division, provided iohexol for the GFR measurements. The CKiD web site is located at http://www.statepi.jhsph.edu/ckid.
Data in this article were collected by the CKiD with clinical coordinating centers (principal investigators) at Children's Mercy Hospital and the University of Missouri–Kansas City (B.W.) and Johns Hopkins School of Medicine (S.F.), data coordinating center at the Johns Hopkins Bloomberg School of Public Health (A.M.), and the Central Biochemistry Laboratory at the University of Rochester (G.J.S.).
Footnotes

Published online ahead of print. Publication date available at www.cjasn.org.
 Received March 16, 2009.
 Accepted August 10, 2009.
 Copyright © 2009 by the American Society of Nephrology