Abstract
Background and objectives: Billing claims are increasingly examined beyond administrative functions as outcomes measures in observational research. Few studies have described the performance of billing claims as surrogate measures of clinical events among kidney transplant recipients.
Design, setting, participants, & measurements: We investigated the sensitivity of Medicare billing claims for clinically verified cardiovascular diagnoses (five categories) and procedures (four categories) in a novel database linking Medicare claims to electronic medical records of one transplant program. Cardiovascular events identified in medical records for 571 Medicare-insured transplant recipients in 1991 through 2002 served as reference measures.
Results: Within a claims-ascertainment period spanning ±30 d of clinically recorded dates, aggregate sensitivity of single claims was higher for case definitions incorporating Medicare Parts A and B for diagnoses and procedures (90.9%) compared with either Part A (82.3%) or Part B (84.6%) alone. Perfect capture of the four procedures was possible within ±30 d or with short claims window expansion, but sensitivity for the diagnoses trended lower with all study algorithms (91.2% with window up to ±90 d). Requirement for additional confirmatory diagnosis claims did not appreciably reduce sensitivity. Sensitivity patterns were similar in the early compared with late periods of the study.
Conclusions: Combined use of Medicare Parts A and B billing claims composes a sensitive measure of cardiovascular events after kidney transplant. Further research is needed to define algorithms that maximize specificity as well as sensitivity of claims from Medicare and other insurers as research measures in this population.
Cardiovascular events account for 30 to 50% of mortality among kidney transplant recipients and compose one of the most important public health concerns in this population (1,2). Single-center medical records may provide detailed descriptions of clinical complications such as cardiovascular diagnoses and procedures after transplantation, but medical records are difficult to analyze unless integrated and stored electronically. The Organ Procurement and Transplantation Network (OPTN) collects national survey data on comorbidities at transplant but not in follow-up forms, and the current strategic focus of the OPTN is to reduce rather than broaden the scope of survey data collection to lower time and effort burdens on centers (3,4). Administrative billing claims of insurance providers compile dated, diagnosis-linked records of physician encounters, hospitalizations, and procedures that may be useful as proxy measures of clinical cardiovascular events in epidemiologic studies of large samples, including populations of transplant recipients.
The International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) is an internationally accepted method for classifying diseases, diagnoses, and procedures in the processing of bills for medical services (5). The ICD-9-CM codes on claims submitted to national insurance providers such as Medicare are increasingly examined beyond their administrative functions as surrogate measures in studies of health care use, care quality, clinical outcomes, and syndromic surveillance (6–8). Despite the growing popularity of billing claims as epidemiologic research measures, there is a relative paucity of data validating the accuracy of claims in representing clinical diagnoses and exposures in general population and transplantation samples. Among the published claims validation studies, Hebert et al. (9) studied the ability of Medicare claims with diagnosis codes for diabetes to identify self-reported diabetes among 6958 elderly, rural adults surveyed in 1992 through 1993. The authors found that a minimum requirement of one institutional claim or two ambulatory claims in the previous 2 yr provided good accuracy for research purposes (κ = 0.77, sensitivity 0.76, specificity 0.98).
The “Hebert case definition” caught the attention of epidemiologists in transplantation, and related algorithms have been applied to describe not only the incidence of posttransplantation diabetes but also an array of clinical conditions in kidney transplant recipients, including malignant, infectious, gastrointestinal, metabolic, psychiatric, and some cardiovascular complications (10–18). Other studies of acute posttransplantation cardiovascular events have used single claims as case definitions (19–21). On the basis of Medicare data for elderly Pennsylvania residents in 1999 through 2000, Kiyota et al. (22) found that single-hospital claims with diagnosis codes for acute myocardial infarction offered 94% positive predictive value (a metric of measurement accuracy that frames specificity in the context of prevalence) for myocardial infarctions recorded in hospital records, but these authors did not report sensitivity metrics for the claims-based definition. The accuracy of Medicare claims for cardiovascular events has not been assessed for patients with insurance eligibility on the basis of transplantation rather than age.
To improve understanding of the accuracy and limitations of billing claims as measures of cardiovascular events among kidney transplant recipients, we examined a novel, aggregate database of electronic medical records from one large Midwestern transplant program linked at the patient level with administrative billing claims of Medicare. Using this compilation of two independent data sources, we assessed the sensitivity of Medicare billing claims to detect clinically verified cardiovascular events and investigated whether sensitivity differed among several claims algorithms. In particular, we examined (1) sensitivity of combined Part A and Part B claims, as compared with use of either billing source alone; (2) comprehensiveness of detection in claims for procedures compared with diagnoses; (3) possible reductions in sensitivity for diagnoses with application of a more stringent requirement for additional confirmatory claims in the case definition; (4) the impact of extended claims capture periods on sensitivity; and (5) sensitivity patterns over time during the study.
Materials and Methods
Data Sources and Linkage Methods
The data source for this study is a novel construction in which electronic medical records from the Washington University Kidney Transplant Program Database (WU-KTDB) were linked at the patient level with national OPTN records and Medicare billing claims as compiled within the US Renal Data System (USRDS). After institutional review board, National Institutes of Health, and USRDS approvals, recipient identifier numbers from the WU-KTDB were linked using Social Security Numbers, names, and dates of birth to USRDS_IDs. Names and Social Security Numbers were removed after the linkage procedure, such that records for individual patients in the analytic files are identified by anonymous WU-KTDB patient codes and USRDS_IDs. Match validation included confirmation of agreement of transplant dates recorded in clinical records with OPTN reports.
Sampling Criteria
For this analysis, we retrospectively sampled adult (≥18 yr of age) recipients of kidney transplants at Washington University in 1991 through 2002. To allow examination of Medicare claims as diagnosis and procedure measures, analytic samples were restricted to patients with Medicare as their primary insurer. Medicare payer status for transplant recipients is tracked within “Payer History” records of the USRDS (23); we also required a Medicare payment for the initial transplant hospitalization of at least $15,000 as indication of usage of Medicare primary benefits (24). Patients included in measures agreement analyses for particular clinical events were required to have active Medicare benefits at the time of the reference event, and thus analytic subsamples varied according to the event of interest.
Measures of Cardiovascular Diagnoses and Procedures
The American Society of Transplantation defines cardiovascular disease as coronary heart disease, cerebrovascular disease, and peripheral vascular disease (25). We also studied congestive heart failure, atrial fibrillation, and venous thromboembolism as cardiovascular diagnoses that have been associated with morbidity and mortality after transplantation (12,15,16,26). Along with diagnoses, we studied several cardiovascular procedures tracked in the WU-KTDB: Cardiac catheterization, coronary artery bypass grafting, amputation, and revascularization of peripheral vascular disease.
Reference Standard for Events
The clinical criteria used for scoring cardiovascular events are defined in Table 1. Information describing the clinical course of Washington University's transplant recipients is prospectively and retrospectively entered into a secured research database by trained nurse coordinators. Out-of-center events are included when they are reported back to the center with sufficient supporting detail to meet diagnostic criteria. Because many transplant recipients return to community providers for general care and some outside events are not communicated to the center, the transplant center data were not expected to capture comprehensively all out-of-center cardiovascular events but were considered as the reference standard for clinically verified events when captured. Complete follow-up of patient vital status and allograft status is maintained for all transplant recipients at the center.
Table 1. Definitions of cardiovascular diagnoses and procedures as recorded in the transplant center's clinical database
Billing Claims as Comparison Measures
Evidence of clinical events was ascertained from Medicare billing claims using ICD-9-CM Diagnosis Codes, ICD-9 Procedure Codes, and Current Procedural Terminology codes submitted with claims. Table 2 lists the specific codes used to identify the cardiovascular diagnoses and procedures of interest within claims-based algorithms.
Table 2. Billing claim codes used for identification of cardiovascular diagnoses and procedures
Statistical Analysis
We explored a series of algorithms for claims-based ascertainment of diagnoses and procedures. First, we examined case definitions formed by single institutional claims (Medicare Part A), single physician/supplier claims (Medicare Part B), or one claim of either type (Part A or B) within ±30 d of event date recorded in the WU-KTDB. This primary capture window was chosen because dates of some events in the clinical record were recorded as month and year, without day of month. Next, we examined case capture by expanding the window of eligible single claims successively to within ±45, ±60, and ±90 d of the clinically recorded event.
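The single-claim case definitions described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's SAS code: the record layout, the field names (`patient`, `category`, `part`, `date`), and the toy records are all hypothetical, and claims are assumed to have already been mapped from billing codes to an event category.

```python
from datetime import date

def claim_in_window(event_date, claim_date, window_days=30):
    """True when the claim date lies within +/- window_days of the clinical event date."""
    return abs((claim_date - event_date).days) <= window_days

def event_captured(event, claims, parts=("A", "B"), window_days=30):
    """True when any claim for the same patient and event category, from an
    allowed Medicare part, falls inside the capture window."""
    return any(
        c["patient"] == event["patient"]
        and c["category"] == event["category"]
        and c["part"] in parts
        and claim_in_window(event["date"], c["date"], window_days)
        for c in claims
    )

def sensitivity(events, claims, parts=("A", "B"), window_days=30):
    """Proportion of reference events that are captured by claims."""
    captured = sum(event_captured(e, claims, parts, window_days) for e in events)
    return captured / len(events)

# Hypothetical toy records for illustration only (not study data).
events = [
    {"patient": 1, "category": "MI", "date": date(1999, 3, 15)},
    {"patient": 2, "category": "stroke", "date": date(2000, 6, 1)},
]
claims = [
    {"patient": 1, "category": "MI", "part": "A", "date": date(1999, 3, 20)},
    {"patient": 2, "category": "stroke", "part": "B", "date": date(2000, 7, 20)},
]
```

Passing `parts=("A",)` or `parts=("B",)` gives the single-source definitions, and raising `window_days` to 45, 60, or 90 mirrors the successive window expansions.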
For diagnoses only, we also applied a variation of the Hebert method similar to that used by the USRDS Annual Data Report, which defines diagnoses on the basis of one Part A or two Part B claims submitted at least 1 d but no more than 365 d apart, in which the latest claim date is defined as date of diagnosis (2); we allowed final diagnosis date to fall up to 30 d after the date in the clinical record. In contrast to diagnoses that may be confirmed or excluded in practice on the basis of clinical reassessment, procedures are discrete events that may be represented by single claims; therefore, Hebert case definitions were not applied to procedures.
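One way to express the adapted-Hebert rule in code is sketched below. The function names and the `(part, claim_date)` tuple layout are hypothetical. The sketch also assumes that no lower bound is placed on how early the claims-based diagnosis date may fall relative to the clinical date, because the text states only the upper limit of 30 d.

```python
from datetime import date
from itertools import combinations

def hebert_dates(claims):
    """Return sorted candidate diagnosis dates from one patient's claims for one diagnosis.

    claims: list of (part, claim_date) tuples, with part equal to "A" or "B".
    """
    # A single Part A (institutional) claim is sufficient on its own.
    dates = {d for part, d in claims if part == "A"}
    # Part B claims must come in pairs separated by 1 to 365 days.
    part_b = sorted(d for part, d in claims if part == "B")
    for first, second in combinations(part_b, 2):
        if 1 <= (second - first).days <= 365:
            dates.add(second)  # the later claim of the pair dates the diagnosis
    return sorted(dates)

def hebert_captured(event_date, claims, max_days_after=30):
    """True when some qualifying diagnosis date falls no later than
    max_days_after days past the clinically recorded date (no lower bound assumed)."""
    return any((d - event_date).days <= max_days_after for d in hebert_dates(claims))
```

Because the diagnosis date is taken from the later Part B claim, a confirmatory claim that arrives long after the clinical event can push the diagnosis date outside the allowed interval even when the first claim was timely.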
A schematic of the study design and event ascertainment procedures from the two data sources is shown in Figure 1. Data sets were merged and analyzed with SAS 9.1 for Windows (SAS Institute, Cary, NC). The sensitivity of each claims-based algorithm was computed as the proportion of reference events in the WU-KTDB among Medicare-insured patients that were captured by claims. Confidence intervals (CI) were computed as 95% CI for corresponding proportions. Differences in proportions were compared by two-proportions Z tests. The test statistic is computed as follows:

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\left[\hat{p}_p\,(1 - \hat{p}_p)\right]^{1/2}\left[\frac{1}{n_1} + \frac{1}{n_2}\right]^{1/2}}$$

wherein the proportions compared are $\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$, and the pooled proportion is $\hat{p}_p = (x_1 + x_2)/(n_1 + n_2)$. Hypotheses examined included whether (1) sensitivity of combined Part A or B claims (±30 d) as a case definition across all diagnoses and procedures was superior to sensitivity achieved with use of Part A or Part B claims alone; (2) claims sensitivity varied according to event type (i.e., differed for procedures versus diagnoses); (3) claims sensitivity improved with capture window expansion; (4) the requirement for additional confirmatory diagnosis claims (Hebert approach) imposed a cost of reduced sensitivity; and (5) sensitivity patterns changed over time during the study (1991 through 1997 versus 1998 through 2002). Hypotheses with a rational directionality were examined with a one-tailed test. Specifically, the addition of claims data for event ascertainment by inclusion of both Medicare parts or by an expanded capture window could increase sensitivity or produce no change but not reduce sensitivity (one-tailed); the requirement for additional confirmatory diagnosis claims could reduce or not alter the sensitivity achieved with single claims but not increase sensitivity (one-tailed). Differences in sensitivity between procedures and diagnoses and across time periods of the study were examined with a more conservative two-tailed test because of the possibility of superiority for either event type or of either time period.
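The two-proportions Z test with a pooled proportion can be sketched in Python using only the standard library. The function names are hypothetical. The counts in the usage note are not tabulated in the text; they are back-calculated from the percentages and event totals reported in the Results for the comparison of study periods (for example, 92.0% of 75 events is taken as 69 captured events), so they should be read as an assumption.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for the difference of two proportions, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / se

def p_value(z, tails=2):
    """Upper-tail probability for |z|, doubled for a two-tailed test.

    The one-tailed value assumes the observed difference lies in the
    hypothesized direction.
    """
    upper = 1.0 - normal_cdf(abs(z))
    return upper if tails == 1 else 2.0 * upper
```

With the back-calculated counts, `two_proportion_z(69, 75, 91, 101)` rounds to 0.43 for all events, `two_proportion_z(52, 57, 69, 79)` to 0.71 for diagnoses, and `two_proportion_z(17, 18, 22, 22)` to -1.12 for procedures, in line with the Z values reported for the early versus late study periods.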
Figure 1. Study design: Sampling, clinical event abstraction, and billing claims ascertainment. CV, cardiovascular; KT, kidney transplant; A, primary claims capture window; B, expanded window for claims ascertainment; C, adapted-Hebert algorithm for detection of diagnoses in claims.
Results
A total of 1128 kidney transplant recipients were recorded in the WU-KTDB in 1991 through 2002, and all matched uniquely to USRDS_IDs; of these, 571 met criteria for primary Medicare insurance at transplantation. The number of individual cardiovascular events per diagnosis or procedure category in the clinical data ranged from 19 to 85; event counts during periods of reported Medicare eligibility ranged from 4 to 40 (Table 3).
Table 3. Sensitivity of Medicare claims for detection of clinically recorded cardiovascular diagnoses and procedures, according to claims-based ascertainment algorithm
Within a ±30 d window for claims ascertainment, the range of sensitivities of claims for clinically recorded cardiovascular events was 75.0 to 100.0% for Part A claims alone and 75.0 to 100.0% for Part B claims alone but improved to 83.3 to 100.0% for case definitions using Part A or B claims. The improvement with combined use of information from both billing sources compared with either source alone was statistically significant, with aggregate sensitivity of single Parts A or B claims across diagnoses and procedures of 90.9% compared with 82.3% for Part A (Z = 2.35, one-tailed P = 0.009) and 84.6% for Part B (Z = 1.79, one-tailed P = 0.037) alone (Table 4).
Table 4. Statistical comparison of differences in the overall sensitivity of claims according to case definition and event type
Perfect capture of the four studied procedures by claims was possible within ±30 d or with short claims window expansion. The sensitivity of combined Part A or B claims sought within ±30 d of clinically recorded dates was 100% for coronary bypass grafting, amputation, and peripheral revascularization. Capture of cardiac catheterization by Part A or B was nearly complete (95.4%) within ±30 d and was perfect (100.0%) with lengthening the claims capture window to ±45 d. In contrast, sensitivity of claims for the studied diagnoses trended lower with all study algorithms (91.2% with window up to ±90 d; Z = 1.95, two-tailed P = 0.05).
Overall, expansion of the period for claims ascertainment up to ±90 d did not affect sensitivity of combined Part A or B claims for the collection of study events (P = 0.22). Secondary analyses suggested higher capture of potentially chronic diagnoses of atrial fibrillation and venous thromboembolism with window expansion, in that sensitivity of the combined-claims algorithm for atrial fibrillation was maximized at 96.9% by window expansion to ±60 d and sensitivity for venous thromboembolism was maximized at 94.4% by window expansion to ±90 d. Aggregate sensitivity with extended capture window up to ±90 d showed a nonsignificant trend toward higher capture than that achieved by querying claims within ±30 d of the clinically recorded date for these two diagnoses (96.0 versus 90.0%; Z = 1.18, one-tailed P = 0.12). Sensitivity was unchanged for myocardial infarction, heart failure, or stroke, and the pooled sensitivity of claims in a ±90-d window for events aside from atrial fibrillation and venous thromboembolism was of similar magnitude to that achieved within the primary capture window (92.1 versus 91.3%; Z = 0.22, one-tailed P = 0.41).
The requirement for additional confirmatory diagnosis claims in an adapted Hebert algorithm produced a slight decrease in maximal sensitivity achieved by single Part A or B claims for myocardial infarction but otherwise did not reduce sensitivity for the studied cardiovascular diagnoses. The overall detection of diagnoses with the adapted Hebert algorithm was of similar magnitude and statistically indistinguishable from that achieved with single Part A or B claims (88.2 versus 89.0%; Z = −0.19, one-tailed P = 0.43).
Sensitivity patterns were similar across the years of the study. Overall, there were 75 eligible cardiovascular events (57 diagnoses, 18 procedures) in the reference clinical data in 1991 through 1997 and 101 events (79 diagnoses, 22 procedures) in 1998 through 2002. The sensitivity of claims within a ±30 d window for study events in 1991 through 1997 compared with 1998 through 2002 was as follows: All events, 92.0 versus 90.1% (Z = 0.43, two-tailed P = 0.67); diagnoses, 91.2 versus 87.3% (Z = 0.71, two-tailed P = 0.48); and procedures, 94.4 versus 100% (Z = −1.12, two-tailed P = 0.27).
Discussion
Reduction in survey data collection, directed at lowering time and effort burdens on transplant centers, has been deemed a key focus of the OPTN strategic plan (3,4). In this policy climate, alternative measures such as billing claims will likely assume growing importance as measures of clinical events in epidemiologic research among kidney transplant recipients. To date, there are limited data on the accuracy of claims-based measures in this population. We investigated the ability of Medicare billing claims algorithms to detect cardiovascular events recorded in the electronic medical records of one transplant program and observed several key findings: (1) Medicare Part A and Part B claims showed similar but nonoverlapping performance, such that overall sensitivity for diagnoses and procedures was statistically superior with use of algorithms combining claims from both institutions and physician/suppliers; thus, components of unique clinical information are captured in the two billing sources; (2) perfect capture of the four studied procedures was possible within ±30 d or with short claims window expansion; in contrast, maximal sensitivity for all diagnoses was somewhat less than 100% with all study algorithms; (3) expansion of claims capture window up to ±90 d did not increase sensitivity for the collection of cardiovascular events overall but produced a trend toward improved capture for two potentially chronic diagnoses; and (4) the requirement for additional confirmatory claims did not appreciably reduce sensitivity of claims for the diagnoses of interest.
It is notable that perfect capture of clinical events by billing claims in this study was possible only for the study procedures. This finding is logically consistent with the primary purpose of claims codes to support reimbursement. Claims may be most sensitive for events that maximize reimbursement, including expensive surgeries and procedures, but be more prone to miss some diagnoses. Although all of the cardiovascular diagnoses studied are clinically important, claims seem unlikely to capture completely all diagnoses in complex patients such as transplant recipients, who often have multiple active diagnoses present at a single care encounter. Notably, the first two diagnosis codes for ambulatory visits in our transplant clinic are generally allocated to indicate management of the renal allograft (V42.0) and immunosuppression (V58.69). The lower sensitivity of diagnoses may also be related to the intermittency of visits for chronic conditions, whereas procedures are often managed as discrete events. Both the study of Hebert et al. (9) and a more recent investigation of national Veterans Affairs administrative data (27) found that a 2-yr surveillance period maximized the sensitivity of claims for diagnosis of diabetes.
Ideally, dates on billing claims should correspond closely in time to dates of clinically recorded events. The observed need for expanded ascertainment windows to maximize capture of atrial fibrillation and venous thromboembolism in this analysis likely reflects both the nature of the clinical reference and the types of events. Date entry for some events in the clinical record was performed as month and year, so a minimum ±30 d capture window was chosen as a primary study definition. Furthermore, retrospective ascertainment of events may have produced some imprecision in the clinical reference. We previously found that expanding the detection window in Medicare claims from ±30 d to ±90 d improved concordance of pharmacy claims with both electronic medical records and OPTN survey reports (28). Notably, widening the claims capture window had only a modest impact on the sensitivity of claims for the overall collection of cardiovascular procedures and diagnoses in this study, producing no change for two thirds of the events. The number of captured atrial fibrillation and venous thromboembolism events, potentially chronic cardiovascular conditions, was larger with the expanded window, although statistical significance of the change was limited by the small sample of reference events. Aside from imprecision in date recording, it is possible that events managed over multiple clinical encounters were present at both of the discrepant dates recorded in the medical records and claims but reached priority for clinical recording and for billing at somewhat different times; however, this subanalysis was not a primary hypothesis and requires examination with a larger sample.
Implications of the observed deficits in the sensitivity of claims for cardiovascular diagnoses can be framed by considering claims-based estimates of disease frequency from previously published epidemiologic studies. For example, recent studies using single Part A or B Medicare claims as event measures among national samples of kidney transplant recipients reported 1-yr cumulative incidences of 3.0% for stroke/transient ischemic attack (21) and 5.6% for myocardial infarction (19), whereas the 1-yr incidence of new-onset congestive heart failure was 10.2% by the Hebert method (15). Assuming generalization of the sensitivity estimates from this study, the corrected incidences of these events would be 3.4, 6.7, and 11.0%, which do not represent marked changes for population-based estimates. Furthermore, a substantial focus of claims-based epidemiology is to estimate relative risks for events in relation to baseline factors of interest, and relative measures would not be affected by submaximal sensitivity unless it is differential with respect to a factor of interest.
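The correction described above is presumably a simple division of the claims-based incidence by the sensitivity of the claims algorithm, which assumes that claims produce no false-positive events. The sketch below uses hypothetical function names. Because the event-specific sensitivities are not restated in this paragraph, it also back-calculates the sensitivities implied by the reported and corrected incidences; those implied values are derived here for illustration and are not figures reported by the study.

```python
def corrected_incidence(observed_pct, sensitivity):
    """Incidence corrected for incomplete capture, assuming no false-positive claims."""
    return observed_pct / sensitivity

def implied_sensitivity(observed_pct, corrected_pct):
    """Sensitivity implied by a reported incidence and its corrected value."""
    return observed_pct / corrected_pct

# (claims-based incidence %, corrected incidence %) as quoted in the text.
reported = {
    "stroke/transient ischemic attack": (3.0, 3.4),
    "myocardial infarction": (5.6, 6.7),
    "congestive heart failure": (10.2, 11.0),
}

for event, (observed, corrected) in reported.items():
    sens = implied_sensitivity(observed, corrected)
    print(f"{event}: implied sensitivity {sens:.2f}")
```

Under these assumptions the implied sensitivities are roughly 0.88, 0.84, and 0.93, and dividing each claims-based incidence by its implied sensitivity returns the corrected values of 3.4, 6.7, and 11.0%.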
An ideal surrogate measure optimizes both sensitivity and specificity for reference events, but often there is a tradeoff between sensitivity and specificity, and the relative importance of these metrics of measurement accuracy may depend on intended use. Given the nature of the available reference source of cardiovascular events, in which some cardiovascular events that were managed in patients’ home communities may have escaped recording in the center's database, we studied the ability of claims to detect documented events as one dimension of measure performance in this analysis. A previous comparison of single-hospital claims for myocardial infarction with hospital charts among general Medicare beneficiaries documented high positive predictive value of claims but did not report other metrics of measurement accuracy such as sensitivity (22). Although we did not examine specificity in this analysis, the requirement for additional confirmatory diagnosis claims as in the Hebert method is an approach designed to improve specificity by reducing capture of cases in which a diagnosis is under consideration but not yet confirmed (9). Notably, application of a more stringent, adapted-Hebert case definition did not appreciably reduce the sensitivity of single claims for cardiovascular diagnoses in our analysis. In a previous comparison of billing claims in Canada with self-reported diagnoses of cardiovascular health measures including myocardial infarction, “other heart disease,” and stroke, the requirement for more than one claim was universally associated with decrements in measure concordance (a measure of total agreement adjusted for chance) (29).
Our study is limited by availability of only Medicare claims for comparison with the clinical database. Medicare claims are particularly relevant to research among kidney transplant recipients because, unlike the eligibility requirements of age ≥65 or disability in the general population, renal allograft recipients are offered disease-specific Medicare entitlement, and Medicare is the largest single insurer in this population; however, our findings may not generalize to the administrative claims of other insurance systems. Comparisons based on small samples face risk for type 2 errors, but our sample size was adequate for detection of some important differences. Furthermore, similarities in the magnitudes of many proportions that were statistically similar in this analysis suggest that differences that could prove statistically significant in larger samples may not be meaningful in practical terms.
Our analyses used clinical records from one center as reference measures. Extension of our analyses using aggregated clinical data from multiple centers is warranted to assess the robustness of these findings. The limited research on the accuracy of billing claims as study measures in dialysis and transplant populations does not reflect a lack of importance of the topic but rather likely reflects the technical complexity of assembling suitable data. Most payers, including Medicare, are not care providers and maintain databases limited to records of claims and benefits eligibility. Validation of claims-based research measures requires linkage of claims data to external data containing alternative measures of specific diagnoses and procedures of interest, such as electronic medical records, clinical databases, or surveys. Along with the need for electronically recorded alternative measures, requirements for access to identifying individual information to implement unique patient-level data links pose challenges for measures agreement studies with administrative data. Our study of claims sensitivity using linked single-center clinical records and Medicare claims is an important step that we hope will inspire multicenter, collaborative assessments of the accuracy of claims as research measures in transplantation.
Conclusions
This investigation of a novel linkage of Medicare billing claims and clinical records demonstrates generally good sensitivity of claims for cardiovascular events among kidney transplant recipients. Our findings support use of combined Part A and B claims to maximize sensitivity and suggest that the sensitivity of claims may be somewhat higher for procedures than for diagnoses. Administrative data provide measures for conducting timely, cost-effective, and unobtrusive research on large populations. Future collaborative and ideally prospective study is warranted to define algorithms that maximize specificity as well as sensitivity of claims from Medicare and other insurers as research measures in this population.
Disclosures
None.
Acknowledgments
K.L.L. received support from grant K08DK073036 from the National Institute of Diabetes and Digestive and Kidney Diseases. D.C.B. received support from grant P30DK079333 from the National Institute of Diabetes and Digestive and Kidney Diseases.
Portions of data reported here have been supplied by the US Renal Data System.
Footnotes
Published online ahead of print. Publication date available at www.cjasn.org.
The interpretation and reporting of these data are the responsibility of the authors and in no way should be seen as an official policy or interpretation of the US government, the National Institute of Diabetes and Digestive and Kidney Diseases, or the National Institutes of Health.
See related editorial, “Reliability of Medicare Claim Forms for Outcome Studies in Kidney Transplant Recipients: Epidemiology in Clinical Outcome Trials,” on pages 1156–1158.
- Received January 30, 2009.
- Accepted April 18, 2009.
- Copyright © 2009 by the American Society of Nephrology