## Summary

**Background and objectives** Creatinine excretion rate (CER) indicates timed urine collection accuracy. Although equations to estimate CER exist, their bias and precision are untested and none simultaneously include age, sex, race, and weight.

**Design, setting, participants, & measurements** Participants (*n* = 2466) from three kidney disease trials were randomly allocated into equation development (2/3) and internal validation (1/3) data sets. CER served as the dependent variable in linear regression to develop new equations. Their stability was assessed within the internal validation data set. Among 987 individuals from three additional studies the equations were externally validated and compared with existing equations.

**Results** Mean age was 46 years, 42% were women, and 9% were black. Age, sex, race, weight, and serum phosphorus improved model fit. Two equations were developed, with or without serum phosphorus. In external validation, the new equations showed little bias (mean difference [measured − estimated CER] −0.7% [95% confidence interval −2.5% to 1.0%] and 0.3% [95% confidence interval −2.6% to 3.1%], respectively) and moderate precision (estimated CER within 30% of measured CER among 79% [76% to 81%] and 81% [77% to 85%], respectively). Corresponding numbers within 15% were 51% [48% to 54%] and 54% [50% to 59%]. Compared with existing equations, the new equations had similar accuracy but showed less bias in individuals with high measured CER.

**Conclusions** CER can be estimated with commonly available variables with little bias and moderate precision, which may facilitate assessment of accuracy of timed urine collections.

## Introduction

In most instances, outpatient collection of timed urine specimens for assessment of GFR and proteinuria has been replaced by serum creatinine-based estimating equations and spot urine albumin concentrations, respectively. However, timed urine specimens remain critical as a confirmatory test for GFR and proteinuria estimates (1); in the diagnostic evaluation of nephrolithiasis (2,3) and secondary hypertension (4); and for assessment of dietary sodium, potassium, and protein intake (5). Because timed urine collections are often over- or undercollected in the outpatient setting (6), their clinical use requires determination of whether an individual urine specimen is accurately collected. Historically, this has been accomplished by comparing the measured urinary creatinine excretion rate (CER) to an individual's expected CER. Imbembo and Walser combined data from four studies of CER across a wide age spectrum and used ideal weights from the Metropolitan Life Insurance Company tables. They reported that the expected CER was approximately 15 to 25 mg/kg per day in men and 10 to 20 mg/kg per day in women (7,8), which has since frequently been used as the clinical standard of expected CER to which measured CER (mCER) is compared. However, extremes of these “acceptable” normal ranges vary by as much as 65% in men and 100% in women; thus, timed urine collections may be deemed inaccurate only when grossly over- or undercollected. These ranges also ignore factors such as age and race, which influence CER. These features limit the clinical utility of timed urine specimens and may contribute to erroneous clinical decision-making.
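The weight-based rule described above can be written as a short check. This is a minimal sketch rather than a clinical tool: the function name and dictionary are illustrative assumptions, and the example uses a hypothetical body weight (the original ranges were derived using ideal weights).

```python
# Traditional expected CER ranges (mg/kg per day) quoted in the text.
TRADITIONAL_RANGE = {"male": (15.0, 25.0), "female": (10.0, 20.0)}


def within_traditional_range(cer_mg_per_day, weight_kg, sex):
    """Return True if CER per kg of body weight falls inside the traditional range."""
    low, high = TRADITIONAL_RANGE[sex]
    cer_per_kg = cer_mg_per_day / weight_kg
    return low <= cer_per_kg <= high


# A hypothetical 80-kg man is accepted anywhere from 1200 to 2000 mg/d.
print(within_traditional_range(1200, 80, "male"))  # True
print(within_traditional_range(2000, 80, "male"))  # True
print(within_traditional_range(1100, 80, "male"))  # False
```

The width of the accepted interval in this example illustrates why only grossly inaccurate collections are flagged by the traditional rule.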

When serum creatinine is in steady state, the predominant determinant of CER is endogenous creatinine generation, which is largely a function of muscle mass (9) and therefore differs by sex (7,10,11), age (7,9,10,12), race (12,13), and body weight (7,10,11,13). Although prior CER estimating equations have been proposed (7,10,11,13), none incorporate all of these variables. To our knowledge, none have been externally validated in populations distinct from those in which they were derived; thus, their bias and precision are unknown.

Here, we use pooled data from six kidney disease studies. Participants collected timed urine specimens in the outpatient setting. We develop two new CER estimation equations in three of the studies and then externally validate and compare their performance to existing CER estimating equations in participants from the remaining three studies.

## Materials and Methods

### Study Population

The Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) is a research group that pools data from prior studies to develop and validate GFR estimating equations (14). A total of 26 studies were identified and divided into two data sets: one for equation development and internal validation (10 studies) and a second for external validation (16 studies). Key inclusion criteria were availability of GFR measurement by iothalamate or other exogenous filtration markers, calibration of serum creatinine assay, and willingness of the investigators to share individual patient data.

The analysis presented here includes six studies that provided timed urine collection data. The studies were divided into two categories, consistent with those established for GFR development and validation. The first consisted of the Modification of Diet in Renal Disease study (5), the Captopril in Diabetic Nephropathy Study (15), and the Diabetes Control and Complications Trial (16). It comprised 2466 individuals and was used to form the development data set (two thirds of participants, selected by a random number generator) and the internal validation data set (the remaining one third). The second category consisted of three additional studies (the Nephrotest Chronic Kidney Disease Study [17], the Consortium for Radiologic Imaging Studies of Polycystic Kidney Disease [CRISP] study [18], and the Groningen REnal hemodynamics Cohort [GRECO] study [19,20]). It comprised 987 individuals and was used for external validation. Thus, in aggregate, the analysis presented here included 3453 participants.
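The random two-thirds/one-third allocation can be sketched as follows. This is an illustration only: the analyses were run in Stata, and the function name, the seed, and the use of integer participant identifiers are assumptions made for the example.

```python
import random


def split_development_validation(participant_ids, seed=0):
    """Randomly allocate two thirds of participants to development, the rest to internal validation."""
    rng = random.Random(seed)  # the seed is an assumption; the paper does not report one
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    n_dev = round(len(shuffled) * 2 / 3)
    return shuffled[:n_dev], shuffled[n_dev:]


# 2466 hypothetical participant identifiers from the first three studies.
development, internal_validation = split_development_validation(range(2466))
print(len(development), len(internal_validation))  # 1644 822
```

The counts printed here are those of the raw split; they do not account for participants later excluded for implausible mCER values.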

### Measurements

Detailed description of each of the six studies is provided elsewhere (5,15–21). In each, age, sex, and race were determined by self-report. GFR was measured as clearance of ^{51}Cr-EDTA in one study (17) and iothalamate in the remainder. In each study, participants received instructions about voiding to an empty bladder before initiation of the timed urine collection, collecting all urine, and refrigerating specimens before their return to study personnel. Urine creatinine was measured by the alkaline picrate assay in one study (22) and by the kinetic Jaffe colorimetric assay in the remainder. Urine volume was recorded, and CER was expressed in milligrams per day. The collection time was 24 hours in all but one study, which used a 4-hour collection (23). When converted to the 24-hour equivalent, CER in this study was similar to that in the remaining studies. This study was included within the development and internal validation data set.
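The conversion of a timed collection to a 24-hour CER is simple arithmetic, sketched below. The function name and the assumption that urine creatinine is reported in mg/dL and volume in mL are illustrative; the source states only that CER was expressed in milligrams per day.

```python
def creatinine_excretion_rate(urine_creatinine_mg_dl, urine_volume_ml, collection_hours=24.0):
    """Return CER in mg/day, scaled to a 24-hour equivalent."""
    creatinine_mg = urine_creatinine_mg_dl * urine_volume_ml / 100.0  # 1 dL = 100 mL
    return creatinine_mg * 24.0 / collection_hours


# Hypothetical 24-hour collection: 100 mg/dL in 1500 mL.
print(creatinine_excretion_rate(100, 1500))  # 1500.0
# Hypothetical 4-hour collection: 100 mg/dL in 250 mL.
print(creatinine_excretion_rate(100, 250, collection_hours=4))  # 1500.0
```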

### Equation Development

We developed CER estimating equations in the development data set using least-squares linear regression. We excluded subjects with mCER levels that were biologically implausible *a priori* (mCER <350 or >3500 mg/d, *n* = 38 [1.1%]). We evaluated age, sex, race, weight, height, and laboratory measurements selected *a priori* as surrogates of nutrition or muscle mass (serum creatinine, urea nitrogen, bicarbonate, total cholesterol, glucose, calcium, phosphorus, and albumin). We evaluated the influence of each on the linear regression model *R*^{2} and root mean square error (RMSE). Age, sex, and race were forced into the model, given their known relationship to creatinine generation. To provide a parsimonious list, additional variables were retained when they increased the *R*^{2} by >0.02.
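The exclusion rule, the two fit statistics, and the retention rule described above can be sketched as helper functions. These are assumptions for illustration, not the original Stata code: the function names are hypothetical, the fitted values would come from whatever regression routine is used, and RMSE is computed here with a denominator of *n* (a degrees-of-freedom-adjusted denominator is an equally plausible choice).

```python
import math


def is_plausible(mcer_mg_per_day, low=350.0, high=3500.0):
    """Exclusion rule from the text: drop mCER < 350 or > 3500 mg/d."""
    return low <= mcer_mg_per_day <= high


def r_squared_and_rmse(observed, fitted):
    """Return (R^2, RMSE) for fitted values from any regression model."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


def retain_variable(r2_without, r2_with, threshold=0.02):
    """Retention rule from the text: keep a variable if R^2 rises by more than 0.02."""
    return (r2_with - r2_without) > threshold


# Toy values (mg/d), not study data.
r2, rmse = r_squared_and_rmse([1000, 1200, 1400, 1600], [1100, 1100, 1500, 1500])
print(round(r2, 3), round(rmse, 1))
print(retain_variable(0.50, 0.53))  # True
print(retain_variable(0.50, 0.51))  # False
```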

In sensitivity analysis, we evaluated models of natural log-transformed CER and CER corrected for body surface area (BSA) in a similar fashion to that described above. The resulting equations had similar precision, bias, and overall accuracy to the untransformed equations in external validation data sets, so we present data for untransformed CER equations only. Multiplicative interaction terms were generated by multiplying each of the retained variables with one another. Each interaction term was individually included in the final regression model and evaluated for its influence on the *R*^{2} and RMSE. None improved overall model accuracy (change in *R*^{2} < 0.02).

### Equation Evaluation

Each participant's estimated CER (eCER) was calculated using the newly derived equations and four previously published CER estimating equations (7,10,11,13). Agreement between mCER and eCER was evaluated graphically by plotting their difference (mCER − eCER) against eCER (24,25). A least-squares linear regression line and a Lowess smoother (bandwidth 0.8) were superimposed to facilitate evaluation of bias. Bias was expressed as the median difference (mCER − eCER) and percent difference ([mCER − eCER] × 100/mCER). The interquartile range of the difference and percent difference was used to assess precision. Overall accuracy reflects a combination of bias and precision and was expressed as the percentage of individuals with eCER within 15% (P_{15}), 20% (P_{20}), and 30% (P_{30}) of mCER and RMSE in the external validation data set. P_{15} was chosen by consensus of the authors because we considered errors in measurement >15% potentially unacceptable for clinical purposes if used for a confirmatory test for GFR. P_{30} was chosen as a commonly used criterion to evaluate accuracy of GFR estimating equations (14,21), thereby providing a reference for comparison to other estimating equations in clinical practice.
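The bias, precision, and accuracy metrics defined above can be computed as in the following sketch. The function and key names are hypothetical, the quartile method is Python's default rather than Stata's, and "within X%" is taken to include the boundary; all three are assumptions.

```python
import math
import statistics


def evaluate(mcer, ecer):
    """Summarize bias, precision, and accuracy of eCER against mCER (both in mg/d)."""
    diff = [m - e for m, e in zip(mcer, ecer)]
    pct_diff = [(m - e) * 100.0 / m for m, e in zip(mcer, ecer)]

    def iqr(values):
        q1, _, q3 = statistics.quantiles(values, n=4)
        return q3 - q1

    def p_within(limit_pct):
        hits = sum(1 for p in pct_diff if abs(p) <= limit_pct)
        return 100.0 * hits / len(pct_diff)

    return {
        "bias_mg_d": statistics.median(diff),     # median of (mCER - eCER)
        "bias_pct": statistics.median(pct_diff),  # median percent difference
        "iqr_mg_d": iqr(diff),                    # precision, absolute scale
        "iqr_pct": iqr(pct_diff),                 # precision, percent scale
        "P15": p_within(15),
        "P20": p_within(20),
        "P30": p_within(30),
        "rmse": math.sqrt(sum(d * d for d in diff) / len(diff)),
    }


# Toy values (mg/d), not study data.
summary = evaluate([1000, 1000, 1000, 1000], [900, 1100, 1250, 600])
print(summary["P15"], summary["P30"])  # 50.0 75.0
```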

In the external validation data set, we compared bias, precision, and accuracy of the newly developed equations to four existing equations (7,10,11,13). The original equation by Rule *et al.* was expressed as the natural log of CER and was normalized to 1.73 m^{2} of BSA (9). Thus, we converted this equation by exponentiation, and multiplying by BSA/1.73, to facilitate comparisons. BSA was calculated by the DuBois equation (26).
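The conversion described above can be sketched as follows. The DuBois formula is the standard one (weight in kg, height in cm); the function names are hypothetical, and the log-scale input stands in for the output of the published equation, whose coefficients are not reproduced here.

```python
import math


def bsa_dubois(weight_kg, height_cm):
    """Body surface area (m^2) by the DuBois formula."""
    return 0.007184 * weight_kg ** 0.425 * height_cm ** 0.725


def to_unindexed_cer(ln_cer_per_173, weight_kg, height_cm):
    """Convert ln(CER per 1.73 m^2) to CER in mg/d for this individual."""
    return math.exp(ln_cer_per_173) * bsa_dubois(weight_kg, height_cm) / 1.73


# Hypothetical individual: 70 kg, 170 cm, log-scale estimate equal to ln(1500).
print(round(bsa_dubois(70, 170), 2))
print(round(to_unindexed_cer(math.log(1500), 70, 170)))
```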

Confidence intervals were computed using bootstrap methods (2000 bootstraps) for the absolute and percentage difference and the RMSE. We used binomial approximation to estimate standard errors for P_{15}, P_{20}, and P_{30} (27).
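Both interval methods can be sketched briefly. The source does not state which bootstrap variant was used, so the percentile method shown here is an assumption, as are the function names, the seed, and the normal-approximation multiplier of 1.96.

```python
import math
import random
import statistics


def bootstrap_ci(values, statistic=statistics.median, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement n_boot times and sort the resulting estimates.
    estimates = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = estimates[round(n_boot * alpha / 2)]
    upper = estimates[round(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


def binomial_ci(proportion, n, z=1.96):
    """Normal-approximation confidence interval for a proportion such as P30."""
    se = math.sqrt(proportion * (1 - proportion) / n)
    return proportion - z * se, proportion + z * se


# Toy data, not study data.
print(bootstrap_ci(list(range(1, 102))))
print(binomial_ci(0.5, 100))
```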

Analyses were conducted using Stata statistical software, version 11.0 (Stata Corporation, College Station, TX).

## Results

### Clinical Characteristics

Among the 3453 subjects from all six studies, mean age was 46 years, 42% were women, 9% were black, and median GFR was 50 (interquartile range 30 to 82) ml/min per 1.73 m^{2}. Mean CER differed by sex (1613 ± 428 mg/d in men and 1100 ± 285 mg/d in women), and values were similar to prior studies in community-living populations (28,29). Younger age and greater weight were associated with greater CER (Figure 1). Holding age and weight constant, the CER SD was 337 mg/d in men and 343 mg/d in women. Seventy-two percent of men and 79% of women had CER within the commonly used clinically acceptable ranges (15 to 25 mg/kg per day in men and 10 to 20 mg/kg per day in women).

Table 1 shows the characteristics of the participants in the development, internal validation, and external validation data sets. Participants in the development and internal validation data sets were similar except the internal validation data set had a higher percentage of whites and higher serum phosphorus levels. Compared with the development and internal validation data sets, participants in the external validation data set were older; were more frequently white; and more frequently had moderate CKD (eGFR 30 to 59 ml/min per 1.73 m^{2}), higher serum albumin levels, lower serum phosphorus levels, and lower CER.

### Equation Development

Sex, weight, and serum phosphorus all improved model fit (increment in *R*^{2} > 0.02 for each factor; see Supplemental Table 1). Black race contributed less to the *R*^{2}; however, only 179 (11%) subjects in the development data set were black, and sensitivity analyses demonstrated that bias improved within blacks when the race coefficient was retained. Inclusion of height; diabetes; or serum concentrations of creatinine, urea nitrogen, albumin, bicarbonate, total cholesterol, glucose, and calcium did not improve model fit (change in *R*^{2} < 0.02). Serum phosphorus improved model fit but is not uniformly available in clinical practice or in research settings, so its inclusion may limit the utility of the final equation (hereafter referred to as equation E). Therefore, we elected to determine the stability and external validity of equation E and the preceding equation without serum phosphorus (hereafter referred to as equation D) in the internal and external validation data sets.

### Equation Validation

Table 2 shows the performance of the newly developed equations in the internal and external validation data sets. We observed little bias (<1%) in the internal validation data set. Precision was moderate because the interquartile range around the percent difference was 26% in either equation in internal validation. For accuracy, the mCER was within 15% of the eCER (P_{15}) among 59% of individuals with either equation and within 30% of eCER (P_{30}) among 85% of individuals with equation D and 86% with equation E.

Next, we evaluated equation performance in the external validation data set. The percent difference between mCER and eCER (bias) remained <1% with either equation, and the interquartile ranges were modestly higher compared with observations in internal validation (32% and 27% for equations D and E, respectively). With equation D, 51% of participants had eCER within 15% and 79% had eCER within 30% of mCER. Corresponding values for equation E were 54% and 81%, respectively. Accuracy was also assessed by comparison of the RMSE (lower RMSE demonstrates greater accuracy). The RMSE was slightly higher in external validation compared with internal validation for both equations.

We evaluated the performance of the equations in subgroups defined by variables retained in the equations. The number of individuals within strata was limited, particularly among blacks and persons with eGFR ≥90 ml/min per 1.73 m^{2}. The overall accuracy (lower RMSE) was better in women compared with men, and in normal-weight compared with overweight participants using equation D (Table 3). Bias was greater at higher GFR. For example, mCER was approximately 9% higher than eCER among individuals with GFR ≥90 ml/min per 1.73 m^{2} and 8% lower than eCER among those with GFR <30 ml/min per 1.73 m^{2}. However, overall accuracy was similar across GFR categories. Similar observations were made in subgroup analysis using equation E (Supplemental Table 2).

Next, we compared the performance of the new equations to equations previously published by other groups. Compared with these studies, the CKD-EPI population was at least 3 times larger, and its participants were younger and had lower mean GFR (Supplemental Table 3). Table 4 summarizes the performance of the new equations and four existing equations in external validation. Equation E had the lowest bias and greatest accuracy by comparison of the point estimates; however, the 95% confidence intervals overlapped with most of the other equations. The Cockcroft–Gault equation had statistically significantly greater bias compared with any of the remaining equations. Figure 2 graphically depicts bias across the range of CER. Equations D and E and the equation by Rule and colleagues had little bias across the spectrum, whereas the equations by Cockcroft–Gault, Walser, and Goldwasser tended to overestimate CER among individuals with the highest CER values.

## Discussion

To our knowledge, this is the first study to externally validate CER estimating equations in populations distinct from where they were derived. Our study therefore provides the first assessment of bias, precision, and accuracy of four existing and two new CER equations (see Table 4 legend). With the exception of the Cockcroft–Gault equation, the remaining equations showed little bias on average, moderate precision, and similar accuracy. The two new equations and that by Rule and colleagues showed the least bias among individuals with the highest CER values. These equations may have clinical utility for evaluating completeness of timed urine collections in clinical practice.

This study provides a first step to improving evaluation of timed urine collection in clinical practice. Preceding this study, the range of CER values deemed clinically acceptable had extremes as great as 100%, which might lead to erroneous clinical decision-making. For example, true urine aldosterone levels obtained in evaluation of secondary hypertension might be 50% or 200% of the measured level without recognition by the clinician. This might lead to the erroneous conclusion of normal aldosterone status, when in fact the patient has hyperaldosteronism, or *vice versa*. If used in clinical practice, the new equations will allow a more refined assessment of expected CER, incorporating variables such as age, race, and body weight, rather than depending on broad ranges that ignore these important determinants of CER.

Overall, the precision and accuracy of the CER equations were similar to GFR estimating equations (14). However, when used for GFR assessment, timed urine collections are usually used as a confirmatory test. Ideally one would want greater accuracy for confirmatory tests. For other purposes such as assessment of dietary intake, 30% errors may be acceptable. We offer several cutpoints (P_{15}, P_{20}, and P_{30}) to allow tailored evaluation of collection accuracy depending on the clinical scenario. The reasons underlying the remaining imprecision are uncertain. Residual imprecision may reflect errors in timed urine collections used in this study. If such errors were random, they would decrease precision without affecting the mean level, thereby not affecting bias, which is consistent with the data observed here. It is also possible that there is some biologic variability in CER that was not captured by the CER equations. An important next step will be to evaluate the equations in urine collections obtained under extremely strict quality control, such as in metabolic wards with bladder scans and/or Foley catheters. If bias remains low and precision is improved, the data would suggest that collection inaccuracies may have contributed to the residual imprecision observed here. For now, we recommend considering the accuracy required by the clinical scenario and comparing the mCER to the corresponding threshold ranges reported here. If mCER is outside of this range, it may be prudent to repeat the timed collection. If the individual's mCER is reproducible in the second specimen compared with their first, the average value of the two measurements will improve accuracy. If not, it may be more likely that the original collection was indeed inaccurately collected.
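The recommendation to match the tolerance to the clinical scenario can be illustrated for a single collection. This is a sketch under stated assumptions: the scenario labels and function name are hypothetical, the 15% and 30% tolerances follow the rationale given for P_{15} and for dietary assessment, and the eCER value would come from equation D or E, whose coefficients are not reproduced here.

```python
# Scenario-specific tolerances (percent of mCER), following the text's rationale.
TOLERANCE_PCT = {"gfr_confirmation": 15.0, "dietary_intake": 30.0}


def collection_within_tolerance(mcer, ecer, scenario):
    """True if eCER lies within the scenario's tolerance of mCER (same criterion as P15/P30)."""
    pct_diff = abs(mcer - ecer) * 100.0 / mcer
    return pct_diff <= TOLERANCE_PCT[scenario]


# Hypothetical collection: mCER 1000 mg/d against an eCER of 1250 mg/d (25% apart).
print(collection_within_tolerance(1000, 1250, "gfr_confirmation"))  # False
print(collection_within_tolerance(1000, 1250, "dietary_intake"))    # True
```

A collection that fails the chosen tolerance would, per the recommendation above, be a candidate for a repeat timed collection.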

Although the overall accuracy of the equations was similar, there was some bias depending on the level of GFR. This may reflect extrarenal creatinine excretion and metabolism among persons with lower GFR (30). Thus, the performance of the equations should be validated in individuals without CKD before their use in that setting. If the degree of bias is considered clinically important, a correction factor (equivalent to the percent bias for the individual's level of GFR) could be incorporated.

Strengths of this study include its relatively large sample size and availability of demographic and laboratory data commonly available in clinical practice. These features allowed for equation development, external validation, comparison of performance, and subgroup analysis all within one study. The study also has limitations. The study did not include subjects with advanced liver disease or cancer, amputees, or other individuals with muscle mass far from population norms. Because CER and muscle mass are strongly correlated (9,13), we anticipate that such individuals may have lower CER, and that the CER estimating equations may be unreliable (9). Most subjects had kidney disease and many were participants in clinical trials. Prevalence of black race was low (*n* = 303, 9%), which may influence the precision of the black race coefficient.

In conclusion, commonly available clinical variables allow estimation of an individual's expected CER with little bias and with moderate precision. Comparing mCER to eCER calculated by these equations may prove useful to evaluate whether timed urine specimens are accurate before they are used for diagnostic purposes. For now, caution should be used in using the equations in patients without kidney disease and in persons with muscle mass outside of population norms.

## Disclosures

None.

## Acknowledgments

We thank Ms. Clydene Nee for review of the manuscript and administrative assistance. The National Institute of Diabetes and Digestive and Kidney Diseases (UO1 DK 053869) supported the CKD-EPI collaboration. The National Heart, Lung, and Blood Institute (R01HL096851-01) supported Dr. Ix.

## Footnotes

Published online ahead of print. Publication date available at www.cjasn.org.

Supplemental information for this article is available online at www.cjasn.org.

- Received June 7, 2010.
- Accepted September 9, 2010.

- Copyright © 2011 by the American Society of Nephrology