Original Articles: Chronic Kidney Disease

A Concept–Wide Association Study of Clinical Notes to Discover New Predictors of Kidney Failure

Karandeep Singh, Rebecca A. Betensky, Adam Wright, Gary C. Curhan, David W. Bates and Sushrut S. Waikar
CJASN December 2016, 11 (12) 2150-2158; DOI: https://doi.org/10.2215/CJN.02420316
Karandeep Singh
*Division of Learning and Knowledge Systems, Department of Learning Health Sciences and
†Division of Nephrology, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, Michigan; Departments of
Rebecca A. Betensky
‡Biostatistics,
Adam Wright
§Division of General Internal Medicine, Department of Medicine and
‖Department of Medicine, Harvard Medical School, Boston, Massachusetts
Gary C. Curhan
‖Department of Medicine, Harvard Medical School, Boston, Massachusetts
¶Division of Renal Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts; and
**Epidemiology, and
David W. Bates
§Division of General Internal Medicine, Department of Medicine and
‖Department of Medicine, Harvard Medical School, Boston, Massachusetts
††Health Policy and Management, Harvard T.H. Chan School of Public Health, Boston, Massachusetts;
Sushrut S. Waikar
‖Department of Medicine, Harvard Medical School, Boston, Massachusetts
¶Division of Renal Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts; and

Abstract

Background and objectives Identifying predictors of kidney disease progression is critical to the development of strategies to prevent kidney failure. Clinical notes provide a unique opportunity for big data approaches to identify novel risk factors for disease.

Design, setting, participants, & measurements We used natural language processing tools to extract concepts from the preceding year’s clinical notes among patients newly referred to a tertiary care center’s outpatient nephrology clinics and retrospectively evaluated these concepts as predictors for the subsequent development of ESRD using proportional subdistribution hazards (competing risk) regression. The primary outcome was time to ESRD, accounting for a competing risk of death. We identified predictors from univariate and multivariate (adjusting for Tangri linear predictor) models using a 5% threshold for false discovery rate (q value <0.05). We included all patients seen by an adult outpatient nephrologist between January 1, 2004 and June 18, 2014 and excluded patients seen only by transplant nephrology, with preexisting ESRD, with fewer than five clinical notes, with no follow-up, or with no baseline creatinine values.

Results Among the 4013 patients selected in the final study cohort, we identified 960 concepts in the unadjusted analysis and 885 concepts in the adjusted analysis. Novel predictors identified included high–dose ascorbic acid (adjusted hazard ratio, 5.48; 95% confidence interval, 2.80 to 10.70; q<0.001) and fast food (adjusted hazard ratio, 4.34; 95% confidence interval, 2.55 to 7.40; q<0.001).

Conclusions Novel predictors of human disease may be identified using an unbiased approach to analyze text from the electronic health record.

  • chronic kidney disease
  • end stage kidney disease
  • natural language processing
  • informatics
  • electronic health record
  • adult
  • Ascorbic Acid
  • Cohort Studies
  • creatinine
  • Disease Progression
  • Electronic Health Records
  • fast foods
  • Follow-Up Studies
  • humans
  • kidney
  • Kidney Diseases
  • Kidney Failure, Chronic
  • Natural Language Processing
  • nephrology
  • Outpatients
  • Renal Insufficiency
  • Retrospective Studies
  • risk factors
  • Tertiary Care Centers

Introduction

Every year, over 100,000 Americans develop ESRD requiring RRT with hemodialysis, peritoneal dialysis, or kidney transplantation (1). Predictors of kidney disease have traditionally been identified through classic epidemiologic approaches, whereby individual risk factors are adjusted for known or suspected confounders and evaluated for their association with CKD progression. This has led to the identification of a number of potential associations with the development of ESRD (2–8), including race, comorbid conditions, conventional and novel biomarkers, lifestyle factors, and medications.

A limitation of retrospective or prospective cohort studies is that associations are typically tested with prespecified covariates. This is akin to candidate gene approaches used to study the genetic basis of diseases. Genome–wide association studies have enabled the discovery of new associations through a paradigm of simultaneous, unbiased testing of multiple associations. Similar approaches have been used to discover new associations between diseases and environmental exposures using environment–wide association studies (9,10) and between a single genetic variant and multiple phenotypes using phenome–wide association studies (11).

Although work has been done using approaches, such as topic modeling, to incorporate unstructured notes into prediction models (12), modern epidemiologic approaches have not used the clinical narrative in the discovery of disease associations. Clinical notes may contain a rich description of numerous epidemiologic exposures. Discovering associations on the basis of the clinical narrative carries the added complexity of an open cohort, where patients may enter and leave the cohort at various time points and may be lost to follow-up or experience competing events.

In this study, we present and demonstrate a new methodology, which we term a concept–wide association study, for examining relationships between several thousand concepts extracted from clinical notes and the development of ESRD.

Materials and Methods

Study Design

We conducted a retrospective cohort study using the full text of clinical notes in the year before the first outpatient general nephrology visit. The date of the first outpatient general nephrology visit was identified through visit and billing data. The outcome was defined using International Classification of Diseases, 9th revision diagnosis codes (585.6, V42.0, and 996.81) and procedure codes (90970, 50360, 50365, 50370, 50380, 55.61, and 55.69) as the date on which RRT was first performed. All identified events were adjudicated through chart review.

Population Studied

Patients were included if they were seen between January 1, 2004, and June 18, 2014 by an adult nephrologist at a Brigham and Women’s Hospital–affiliated outpatient clinic. Exclusion criteria included visits only with transplant nephrology, known ESRD before the first nephrology visit, fewer than five clinical notes in the year preceding the first nephrologist visit, no documented follow-up, or no baseline creatinine values. The threshold of five notes was chosen, because 85% of patients had at least five notes in the year preceding the first nephrology visit.

Data Collection

We obtained data from the Partners Research Patient Data Registry, a centralized clinical data warehouse. We obtained information on patient demographics (age, sex, and race), billing codes, the full text of all electronic clinical notes, and laboratory values, including serum creatinine, calcium, phosphorus, albumin, bicarbonate, and the urine albumin-to-creatinine ratio. We defined baseline laboratory values for each laboratory test as the first available result on or after the first nephrology visit up to 365 days after the visit. Death was determined from the Social Security Death Index, and ascertainment was limited to 30 days after the final clinical note. If ESRD did not occur by this time, the observation was treated as censored. The Partners Healthcare Institutional Review Board approved this study, and the need for informed consent was waived. We adhered to the Declaration of Helsinki.

Extraction of Concepts from Clinical Notes

Concepts rather than individual words were evaluated so that phrases such as “CHF” and “congestive heart failure” could be considered together when evaluating their association with kidney failure. Concepts were extracted from all clinical notes starting from 1 year before the first nephrologist visit up to but not including the nephrology visit (Figure 1). All clinical note types (including phone calls, outpatient notes, and inpatient notes) were included. Of note, Brigham and Women’s Hospital’s electronic health record is primarily an outpatient record; although it contains admission and initial consultation notes, it typically does not contain inpatient progress notes. Concepts were coded as binary variables for each patient. Concepts with <1% patient prevalence were not evaluated further.
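
The binary coding and prevalence filter described above can be sketched as follows. This is a minimal illustration in Python, not the authors' code: the function name `binary_concept_matrix`, the input layout (patient identifier mapped to a list of extracted concepts), and the toy data are all assumptions made for the example.

```python
from collections import Counter

def binary_concept_matrix(patient_concepts, min_prevalence=0.01):
    """Code each concept as present (1) or absent (0) per patient and
    drop concepts found in fewer than min_prevalence of patients."""
    n_patients = len(patient_concepts)
    # Count patients (not mentions) in whose notes each concept appears.
    counts = Counter()
    for concepts in patient_concepts.values():
        counts.update(set(concepts))
    kept = sorted(c for c, n in counts.items() if n / n_patients >= min_prevalence)
    matrix = {}
    for pid, concepts in patient_concepts.items():
        present = set(concepts)
        matrix[pid] = [1 if c in present else 0 for c in kept]
    return kept, matrix

# Hypothetical toy input: patient ID -> concepts found in that patient's notes.
toy = {
    "p1": ["heart failure", "proteinuria", "proteinuria"],
    "p2": ["proteinuria"],
    "p3": ["fast food"],
    "p4": [],
}
# A 50% threshold is used here only so the tiny example filters something.
kept, matrix = binary_concept_matrix(toy, min_prevalence=0.5)
```

Repeated mentions within one patient count once, which matches the per-patient binary coding; only "proteinuria" survives the toy threshold.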

Figure 1.

Clinical notes from the year preceding the first nephrology visit were processed, and the extracted concepts were evaluated for associations with time-to-ESRD.

Concepts were extracted from notes in a two-step process. First, negated phrases were removed with the NegEx negation engine (13) run in Python 2.7 (14). Second, notes were processed with the National Library of Medicine’s MetaMap software (15) (version 2013v2), which maps phrases to Unified Medical Language System codes known as concept unique identifiers. Extracted concepts were restricted to the Systematized Nomenclature of Medicine–Clinical Terms and RxNorm ontologies to map general and drug concepts, respectively. Concepts were not limited by semantic type, and therefore, all types of concepts contained in the notes were extracted (e.g., diagnoses, medications, signs, and symptoms). Mapping of phrases to multiple concepts was allowed (e.g., “heart failure” maps to concepts for both “heart failure” and “congestive heart failure”).
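
To make the two-step idea concrete, the following toy sketch drops negated text and then maps phrases to concepts. It is deliberately simplistic and does not use NegEx or MetaMap: the negation cues, the phrase table, and the function `extract_concepts` are hypothetical stand-ins, and whole sentences (rather than only the negated phrases) are discarded for brevity.

```python
import re

# Hypothetical negation cues and phrase-to-concept table, for illustration only.
NEGATION_CUES = ("no ", "denies ", "without ")
PHRASE_TO_CONCEPTS = {
    "chf": ["congestive heart failure"],
    "congestive heart failure": ["congestive heart failure"],
    "heart failure": ["heart failure", "congestive heart failure"],
}

def extract_concepts(note):
    """Step 1: skip sentences containing a negation cue.
    Step 2: map remaining phrases to concepts (one phrase may map to several)."""
    concepts = set()
    for sentence in re.split(r"[.;\n]+", note.lower()):
        padded = " " + sentence.strip() + " "
        if any(" " + cue in padded for cue in NEGATION_CUES):
            continue  # crude stand-in for a real negation engine
        for phrase, mapped in PHRASE_TO_CONCEPTS.items():
            if re.search(r"\b" + re.escape(phrase) + r"\b", padded):
                concepts.update(mapped)
    return concepts

found = extract_concepts("History of CHF. No heart failure symptoms today.")
```

In this example the first sentence contributes the concept for “congestive heart failure” through the abbreviation, and the negated second sentence contributes nothing.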

Natural language processing systems may occasionally create erroneous mappings (e.g., the phrase “GN” in a clinical note maps to the concept for “Guinea Republic”). Instead of reporting the name of the concept intended by the phrase mappings, we report the phrase (or phrases) matching with the highest frequency for each concept (i.e., we report “GN” and not “Guinea Republic”).
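
Choosing a display label by phrase frequency can be sketched as below. The function name `display_phrases` and the (phrase, concept) input format are assumptions for illustration; ties are broken here by first occurrence, whereas the text above reports all tied phrases.

```python
from collections import Counter, defaultdict

def display_phrases(mappings):
    """Return, for each concept, the source phrase that mapped to it most often."""
    phrase_counts = defaultdict(Counter)
    for phrase, concept in mappings:
        phrase_counts[concept][phrase] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in phrase_counts.items()}

# Hypothetical (phrase, concept) pairs as they might come out of a mapping tool.
pairs = [
    ("GN", "Guinea Republic"),
    ("GN", "Guinea Republic"),
    ("Guinea", "Guinea Republic"),
    ("CHF", "congestive heart failure"),
]
labels = display_phrases(pairs)
```

Here the concept “Guinea Republic” would be reported under the label “GN”, the phrase that actually produced most of its matches.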

Statistical Analyses

Proportional subdistribution hazards (competing risk) regression as described by Fine and Gray (16) was performed with ESRD defined as the event of interest and death considered as a competing risk. The Fine and Gray (16) model directly assesses the effect of covariates on the cumulative incidence of a particular type of failure in a competing risks setting. The subdistribution hazard is the rate of ESRD per unit time among individuals who have not yet developed ESRD by that time, whether they are still alive or have died without ESRD before that time. That is, individuals who have died without ESRD are treated as though they are still at risk for ESRD. Competing risk regression is preferable for prognostic questions, whereas Cox regression is preferable for etiologic questions (17). We were interested in identifying predictors that would differentiate patients whose kidney disease progressed to ESRD from those who either did not progress or died (a question of prognosis), and therefore, we chose competing risk regression as our primary approach.
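
For readers who prefer notation, the quantities described above can be written in the standard Fine and Gray form. The symbols are introduced here for illustration and are not taken from the article: T is the time to the first event, ε is the event type (1 for ESRD, 2 for death without ESRD), and x is the binary indicator for a concept.

```latex
% Subdistribution hazard for ESRD: the risk set keeps subjects who died without ESRD
\lambda_1(t) = \lim_{\Delta t \to 0} \frac{1}{\Delta t}\,
  P\bigl(t \le T < t + \Delta t,\ \varepsilon = 1 \,\big|\,
         T \ge t \ \text{or}\ (T < t,\ \varepsilon \ne 1)\bigr)

% Cumulative incidence of ESRD expressed through the subdistribution hazard
F_1(t) = P(T \le t,\ \varepsilon = 1)
       = 1 - \exp\!\Bigl(-\int_0^t \lambda_1(s)\,ds\Bigr)

% Proportional subdistribution hazards model for a single concept
\lambda_1(t \mid x) = \lambda_{1,0}(t)\,\exp(\beta x)
```

Under this model, exp(β) is the subdistribution hazard ratio for the concept, and because F_1 depends only on λ_1, a hazard ratio above 1 corresponds directly to a higher cumulative incidence of ESRD.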

Competing risk regression for each concept was carried out in two phases: (1) unadjusted and (2) adjusted for a published ESRD risk prediction score developed by Tangri et al. (18) using age, race, sex, and baseline laboratory values (serum creatinine, calcium, phosphorus, albumin, bicarbonate, and the logarithm of the urine albumin-to-creatinine ratio).

Multiple hypothesis testing (19) was accounted for using the method by Storey (20), which controls the false discovery rate, defined as the expected proportion of false positives among all significant hypotheses. Using this method, P values were transformed into q values. Hazard ratios (HRs) and 95% confidence intervals (95% CIs) were not adjusted for multiple testing. Concepts with q values <0.05 were reported as associations; this equates to a 5% expected proportion of false positives among all concepts declared to have associations. The false discovery rate method was chosen, because it explicitly controls the error rate of test conclusions among significant results, scales well in the face of increasing numbers of tests, and has higher power compared with the Bonferroni method (19). Concepts with HR>1 were identified as positive predictors, and concepts with HR<1 were identified as negative predictors. Analyses were performed in R 3.1.2 (21). The q values were computed using the qvalue R package (available on Bioconductor) by Storey et al. (22), and multiple imputation was performed using the mice R package (23).

Identifying Associated Concepts

Each concept is likely to have a number of associations with other concepts in either the same direction (e.g., diabetes mellitus and insulin) or opposite directions (e.g., women and men). For a given concept, knowing its associated concepts may help in evaluating its plausibility as a true risk factor or a confounder. The Φ-coefficient is a measure of association for two binary variables, and it can be derived by computing a Pearson correlation on binary variables. We reported the three phrases with the greatest Φ-coefficients (in either direction) for each concept identified in our analysis after excluding duplicates.
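
The Φ-coefficient for two binary concept indicators can be computed directly from the 2×2 table of co-occurrence counts, as in this small sketch (the function name `phi_coefficient` and the toy vectors are illustrative only):

```python
from math import sqrt

def phi_coefficient(x, y):
    """Phi coefficient for two equal-length sequences of 0/1 values."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        raise ValueError("phi is undefined when either variable is constant")
    return (n11 * n00 - n10 * n01) / denom

# Hypothetical indicators for four patients.
same = phi_coefficient([1, 1, 0, 0], [1, 1, 0, 0])      # concepts always co-occur
opposite = phi_coefficient([1, 1, 0, 0], [0, 0, 1, 1])  # concepts never co-occur
unrelated = phi_coefficient([1, 0, 1, 0], [1, 1, 0, 0])
```

This gives the same value as a Pearson correlation computed on the two 0/1 vectors.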

Handling Missing Data

The adjusted analysis required calculation of the Tangri score, which is dependent on several variables (18). Missing values were multiply imputed using regression switching with predictive mean matching, a nonparametric method (24). Five datasets were imputed incorporating the event flag, log time, age, race, sex, serum creatinine, calcium, phosphorus, albumin, bicarbonate, and log urine albumin-to-creatinine ratio. The Tangri score was calculated using the imputed variables. Results were pooled using the rules by Rubin (25).
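
Pooling across the imputed datasets follows a simple recipe: average the estimates, then combine the within-imputation and between-imputation variances. The sketch below assumes pooling is done on the log hazard ratio scale and uses made-up numbers; `pool_rubin` is a hypothetical helper, not a function from the mice package.

```python
from math import sqrt

def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances using Rubin's rules."""
    m = len(estimates)
    pooled = sum(estimates) / m
    within = sum(variances) / m                                   # mean within-imputation variance
    between = sum((e - pooled) ** 2 for e in estimates) / (m - 1)  # between-imputation variance
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)

# Hypothetical log hazard ratios and variances from five imputed datasets.
log_hrs = [1.0, 1.2, 0.8, 1.1, 0.9]
variances = [0.04] * 5
pooled_log_hr, pooled_se = pool_rubin(log_hrs, variances)
```

The pooled standard error is larger than the within-imputation standard error alone (0.2 in this toy case), reflecting the added uncertainty from the missing data.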

Sensitivity Analyses

The analysis was repeated using Cox regression with death considered as a censoring event. Death-censored ESRD has been used by several ESRD prediction models but may be problematic for the purpose of estimating the risk of ESRD, because death likely represents informative censoring (8,18,26–30). The hazards from a cause–specific hazard model are conditional on survival and cannot be interpreted as marginal hazards that ignore death. If death is independent of ESRD, results similar to those obtained by competing risk regression would be expected with the cause–specific Cox regression. However, in settings where a variable of interest is associated with a competing event, the two approaches may yield conflicting results (31). The adjusted competing risk and Cox analyses were repeated using nonmissing covariates (age, sex, black race, and eGFR) to limit bias from imputation.

Results

After applying inclusion and exclusion criteria to 9817 patients seen in nephrology clinic, we identified 4013 patients who constituted the analytic cohort (Figure 2). Of these, 134 were confirmed to have developed ESRD during follow-up, and 160 were confirmed to have died without developing ESRD (Table 1). Median follow-up time was 1.2 years (range, 0–10.2 years).

Figure 2.

Approximately half of the patients considered were included in the study cohort.

Table 1.

Baseline characteristics of the patients included in the study and patients excluded on the basis of having fewer than five clinical notes in the year before the first nephrology visit

After processing 103,962 clinical notes authored by 4589 distinct clinicians with NegEx and MetaMap, 38,698 unique concepts were identified. Of these, 7576 were present in at least 1% of patients and subsequently studied.

Predictors of ESRD from Univariate Analysis

Using a false discovery rate threshold of <5% (q<0.05) (Table 2), competing risk regression identified 960 concepts with q values <0.05 (Supplemental Table 1). Of these, 184 had an HR>1, and 776 had an HR<1.

Table 2.

Unadjusted competing risk regression for ESRD showing concepts with q values <0.05 (top 10 positive and negative predictors shown; full results are in Supplemental Table 1)

Of the positive predictors included in the Tangri score (18), a CKD risk prediction score on the basis of demographic and laboratory data, the unadjusted analysis identified men (“man”), “creatinine,” and “proteinuria” as concepts associated with higher risk of ESRD. Mention of “proteinuria” had an HR of 2.16 (95% CI, 1.51 to 3.10; q<0.001), and “nephrotic-range proteinuria” had an HR of 5.09 (95% CI, 2.48 to 10.40; q<0.001). Other notable positive predictors included concepts related to heart failure; coronary artery disease; type 1 diabetes; complications of CKD, such as hyperkalemia and anemia; medications associated with each of these conditions; and noncompliance with medications. Interestingly, the mention of “no swelling” was also associated with ESRD.

Negative predictors for ESRD included a large number of factors that may be indicative of a positive association with death (e.g., “ventilator”), poor candidacy for dialysis or kidney transplant (e.g., “metastatic cancer”), or a lower risk of ESRD (e.g., “female”). Although we cannot distinguish among these reasons for a given concept, all contribute negatively to the cumulative incidence of progression to ESRD in the framework of a competing risk model. Interestingly, “female” had an HR of 0.58 (95% CI, 0.41 to 0.83; q=0.02), which is a near-perfect reciprocal of the risk conferred by “male,” whereas “healthful food” had an HR of <0.01 (95% CI, <0.01 to <0.01; q<0.001), which is in the opposite direction of the risk conferred by “fast food” (HR, 3.62; 95% CI, 1.72 to 7.63; q=0.005) but with a much larger effect size.

Predictors of ESRD after Adjusting for Tangri Score

After adjusting for the Tangri score, we identified 885 concepts using a false discovery rate threshold of <5% (q<0.05) (Table 3), of which 130 had HRs>1 and 755 had HRs<1 using competing risk regression (Supplemental Table 3). Seven hundred seventy-two concepts were identified in both the univariate and multivariate competing risk analyses.

Table 3.

Competing risk regression for ESRD adjusted for Tangri score showing concepts with q values <0.05 (top 10 positive and negative predictors shown; full results are in Supplemental Table 3)

Missing Data

Among variables required for imputation of the Tangri score, the proportion of missingness in the final cohort was 56.1% for the urine albumin-to-creatinine ratio, 37.8% for phosphorus, 9.8% for serum albumin, 2.6% for calcium, 2.4% for bicarbonate, and 0% for age, sex, and eGFR.

Sensitivity Analyses

Using Cox regression models, 129 concepts were found to have q<0.05 in the univariate analysis: 120 concepts with HRs>1 and nine concepts with HRs<1 (Supplemental Table 2); 127 concepts were identified by both competing risk and Cox regression, 833 were by competing risk regression only, and 2 were by Cox regression only. After adjusting for the Tangri score using Cox regression, no concepts were found to have q<0.05.

The adjusted competing risk analysis was repeated using nonmissing covariates. Adjusting for age, sex, race, and eGFR, 843 concepts were identified with q<0.05, of which 95 had HRs>1 and 748 had HRs<1 (Supplemental Table 4). Of these, 791 concepts were identified in both the primary analysis and this analysis, 52 concepts were identified in this analysis only, and 94 concepts were identified in the primary analysis only. Cox regression with this set of covariates identified seven concepts (Supplemental Table 5), all with HRs>1, in contrast to no concepts identified in the primary adjusted Cox analysis.

Discussion

This is the first study to use an unbiased approach using text from the clinical notes to identify predictors of human disease. The approach’s face validity was confirmed by the identification of several well established risk factors for ESRD, including men, degree of proteinuria, diabetic kidney disease, heart failure, and anemia. This study also identified novel predictors of ESRD that have not been previously described, such as fast food and high–dose ascorbic acid.

This paper describes a hypothesis-generating approach, and caution is needed when interpreting the results. Identified predictors could fall into any of seven possible categories.

  1. Vague and unlikely to be meaningful

  2. Confounded by indication

  3. Mention of the concept in a note, rather than its actual presence, indicates a risk (a specific type of confounding by indication)

  4. False positive result (due to 5% false discovery rate)

  5. Associated with the event of interest but not causal

  6. Associated with the competing event

  7. True risk factor (associated with the event of interest and causal)

Confounding by Indication

Confounding may be obvious in some cases (docusate as a marker of hospitalization) and less obvious in others. The association of pes planus (flat foot) or volar (sole of the foot) with ESRD may be confounded by documentation of a foot examination among diabetic individuals referred to podiatry for diabetic nephropathy. Of the 54 patients noted to have a finding of flat foot, 39 had diabetes mellitus mentioned in their notes. The association of “Ascorbic Acid 500 MG” with ESRD may be confounded by its association with “cardiac transplantation” (a relationship that was identified through a systematic evaluation of interconcept associations), although the Φ-coefficient of 0.28 reflects only a weak association between the two concepts. These situations are akin to the linkage disequilibrium problem observed in genome–wide association studies, where false associations may be identified when two genes are in close proximity.

Biologically Plausible Predictors

High–dose ascorbic acid could conceivably lead to a higher risk of ESRD through its metabolism to oxalate, the most common constituent of kidney stones (a risk factor for CKD progression [32,33]) and the metabolic abnormality found in primary hyperoxaluria, a group of rare genetic diseases associated with kidney failure. Fast food is known to be high in sodium and phosphate content (34), and both excessive salt intake and hyperphosphatemia are known to blunt the renoprotective effects of angiotensin–converting enzyme inhibitors and promote CKD progression (35,36). However, neither of these concepts was identified in the adjusted Cox regression analysis, and therefore, the higher cumulative incidence of ESRD for these two concepts may in part be driven by a lower risk of death. In both instances, the lower risk of death may be driven by confounding due to healthy user effect (37).

Sensitivity to Statistical Approach

Cox regression identified fewer predictors than competing risk regression in both the unadjusted and adjusted analyses. The likely explanation for the observed differences is that Cox regression and competing risk regression ask two different but related questions. Cox regression tests whether the mention of a concept is directly associated with the outcome of ESRD, and competing risk regression tests whether individuals with a given concept present in their notes were more likely to (live to) experience ESRD (31). If a concept is associated with a higher risk of death (the competing event), the HR for ESRD as measured by competing risk regression will be lower than the cause-specific hazard, because fewer individuals are alive and able to experience ESRD and vice versa. Because many of the concepts have varying associations with death, it is not surprising that concepts that have either a positive or negative association with death are identified with competing risk regression but not with Cox regression.
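
A small numeric illustration may help show why the two approaches can disagree. The sketch below computes a nonparametric cumulative incidence of ESRD that accounts for death as a competing event and compares it with the naive 1 minus Kaplan-Meier estimate obtained when deaths are censored. It is a toy calculation, not a regression model and not the analysis performed in this study; the event coding and the data are hypothetical.

```python
def cumulative_incidence(times, events, cause=1):
    """Nonparametric cumulative incidence of `cause` at the last observed time.

    events: 0 = censored, 1 = ESRD, 2 = death without ESRD (hypothetical coding).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0  # probability of being free of any event just before time t
    cif = 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [e for tt, e in data if tt == t]
        d_cause = sum(1 for e in tied if e == cause)
        d_any = sum(1 for e in tied if e != 0)
        cif += surv * d_cause / at_risk
        surv *= 1 - d_any / at_risk
        at_risk -= len(tied)
        i += len(tied)
    return cif

def one_minus_km(times, events, cause=1):
    """1 - Kaplan-Meier, treating every outcome other than `cause` as censoring."""
    recoded = [1 if e == cause else 0 for e in events]
    return cumulative_incidence(times, recoded, cause=1)

# Five hypothetical patients: two deaths, two ESRD events, one censored.
times = [1, 2, 3, 4, 5]
events = [2, 1, 2, 1, 0]
cif_esrd = cumulative_incidence(times, events)  # deaths handled as competing events
naive = one_minus_km(times, events)             # deaths handled as censoring
```

In this toy cohort, two of five patients reach ESRD, so the cumulative incidence is 0.4, whereas censoring the deaths yields 0.625. The more deaths there are in a group, the further the cumulative incidence of ESRD falls below the death-censored estimate, which is the mechanism described above.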

To a lesser extent, differences in results were also observed when choosing covariates that did not require imputation. Because over one half of patients had missing values for albuminuria and albuminuria is likely to be monitored more closely in patients with severe or worsening kidney disease, there are likely to be differences in patients with and without missing values. Multiple imputation is fairly robust to this problem as long as the plausible contributors to missingness are included in the imputation process, but we cannot know this definitively.

Limitations

The primary limitation of this study is that its findings are drawn from a single tertiary care center, which may have idiosyncrasies in documentation style and patient characteristics that may differ from other institutions. Validating this analysis in other cohorts is needed. Other limitations include missing covariate information and the open cohort design.

This approach was successful in translating the clinical narrative into a tool for the discovery of possible predictors that have not been previously linked to kidney failure. Future studies to replicate our findings and approach would be informative. The approach outlined here could potentially be used for patient-level prognosis, for population health management, or as a tool to identify previously unsuspected risk factors for CKD progression (38). As the adoption of electronic health records continues to rise and a generation of individuals has their entire health histories stored electronically, this approach provides a novel way to gain potential insights about disease risk as a natural byproduct of care delivery and electronic health record documentation.

Disclosures

All authors have completed and submitted the International Committee of Medical Journal Editors Form for Disclosure of Potential Conflicts of Interest. D.W.B. is a coinventor on Patent No. 6029138 held by Brigham and Women’s Hospital on the use of decision support software for medical management licensed to the Medicalis Corporation (Toronto, ON, Canada). He holds a minority equity position in the privately held company Medicalis Corporation, which develops web–based decision support for radiology test ordering. He serves on the board for SEA Medical Systems (San Jose, CA), which makes intravenous pump technology. He is on the clinical advisory board for Zynx, Inc. (Los Angeles, CA), which develops evidence-based algorithms. He consults for EarlySense (Ramat Gan, Israel), which makes patient safety monitoring systems. He receives equity and cash compensation from QPID, Inc. (Boston, MA), a company focused on intelligence systems for electronic health records. He receives cash compensation from CDI (Negev), Ltd. (Beersheba, Israel), which is a not for profit incubator for health information technology startups. He receives equity from Enelgy (Northridge, CA), which makes software to support evidence–based clinical decisions. He receives equity from Ethosmart (Ein Iron, Israel), which makes mobile applications to help patients with chronic diseases. He receives equity from Intensix (Netanya, Israel), which makes software to support clinical decision making in intensive care. He receives equity from MDClone (Beersheba, Israel), which takes clinical data and produces deidentified versions of it. The financial interests of D.W.B. have been reviewed by Brigham and Women’s Hospital and Partners HealthCare in accordance with their institutional policies. Otherwise, no conflicts of interest were reported.

Acknowledgments

Because Dr. Curhan is the Editor-in-Chief of CJASN, he was not involved in the peer-review process for this manuscript. Another editor oversaw the peer-review and decision-making process for this manuscript.

This research was supported, in part, by a National Institutes of Health T32 training grant awarded to the Division of Renal Medicine at Brigham and Women’s Hospital.

The funding source had no role in the study design, conduct, analysis, or decision to submit the manuscript.

Footnotes

  • Published online ahead of print. Publication date available at www.cjasn.org.

  • This article contains supplemental material online at http://cjasn.asnjournals.org/lookup/suppl/doi:10.2215/CJN.02420316/-/DCSupplemental.

  • Received March 4, 2016.
  • Accepted September 1, 2016.
  • Copyright © 2016 by the American Society of Nephrology

Clinical Journal of the American Society of Nephrology
Vol. 11, Issue 12
December 07, 2016

Keywords

  • chronic kidney disease
  • end stage kidney disease
  • Natural Language Processing
  • informatics
  • electronic health record
  • Adult
  • Ascorbic Acid
  • cohort studies
  • creatinine
  • disease progression
  • Electronic Health Records
  • fast foods
  • follow-up studies
  • humans
  • kidney
  • kidney diseases
  • Kidney Failure, Chronic
  • nephrology
  • Outpatients
  • renal insufficiency
  • Retrospective Studies
  • risk factors
  • Tertiary Care Centers

Print ISSN - 1555-9041 Online ISSN - 1555-905X
