Abstract
Background and objectives: Mutation-based molecular diagnostics of autosomal dominant polycystic kidney disease (ADPKD) is complicated by locus and allelic heterogeneity, large multi-exon gene structure and duplication in PKD1, and a high level of unclassified variants (UCVs). Comprehensive screening of PKD1 and PKD2 by two recent studies has shown that atypical splice mutations account for 3.5% to 5% of ADPKD. We evaluated the role of bioinformatic prediction of atypical splice mutations and determined the pathogenicity of an atypical PKD2 splice variant from a multiplex ADPKD (TOR101) family.
Design, setting, participants, & measurements: Using PubMed, we identified 17 atypical PKD1 and PKD2 splice mutations. We found that bioinformatics analysis was often useful for evaluating the pathogenicity of these mutations, although RT-PCR is needed to provide the definitive proof.
Results: Sequencing of both PKD1 and PKD2 in an affected subject of TOR101 failed to identify a definite mutation, but revealed several UCVs, including an atypical PKD2 splice variant. Linkage analysis with microsatellite markers indicated that TOR101 was PKD2-linked and IVS8 + 5G→A was shown to cosegregate only with affected subjects. RT-PCR of leukocyte mRNA from an affected subject using primers from exons 7 and 9 revealed six splice variants that resulted from activation of different combinations of donor and acceptor cryptic splice sites, all terminating with premature stop codons.
Conclusions: The data provide strong evidence that IVS8 + 5G→A is a pathogenic mutation for PKD2. This case highlights the importance of functional analysis of UCVs.
Autosomal dominant polycystic kidney disease (ADPKD) is the most common hereditary kidney disorder worldwide, affecting approximately one in 500 live births. It is characterized by focal development and progressive enlargement of renal cysts, leading to end-stage renal disease (ESRD) in late middle age. Typically, only a few renal cysts are detected in most affected subjects before 30 yr of age. However, by the fifth decade of life, hundreds to thousands of renal cysts are found in most patients. Overall, ADPKD accounts for 5% to 8% of ESRD in developed countries (1,2). Extrarenal complications of ADPKD are variable and include inguinal hernias, colonic diverticula, valvular heart disease, and intracranial arterial aneurysms (1).
Mutations of two genes, PKD1 (MIM 601313) and PKD2 (MIM 173910), account for approximately 85% and 15% of all cases of ADPKD in linkage-characterized European populations (3,4). Although the clinical manifestations of PKD1 and PKD2 overlap completely, a strong locus effect on renal disease severity is evident with more severe renal disease in PKD1 than PKD2 (median age at ESRD: 54 yr versus 74, respectively) (5). PKD1 is a large gene consisting of 46 exons with an open reading frame of approximately 13 kb and is predicted to encode a protein of 4302 amino acids. Its entire 5′ region up to exon 33 has been duplicated six times proximally on chromosome 16p, and the presence of these highly homologous pseudogenes has made genetic analysis of PKD1 difficult (1,2). Recent availability of protocols for long-range and locus-specific amplification of PKD1 has enabled the complete mutation screening of this complex gene (6–9). In contrast, PKD2 is a single-copy gene consisting of 15 exons with an open reading frame of approximately 3 kb and is predicted to encode a protein of 968 amino acids (1,2).
The diagnosis of ADPKD is straightforward in affected subjects with a positive family history and enlarged kidneys with multiple cysts (6). Renal ultrasound is a useful method for this purpose, and age-dependent criteria based on cyst number have been derived for subjects born with 50% risk of PKD1 or PKD2 (6,10). However, ultrasound diagnosis of ADPKD in younger at-risk subjects with equivocal or negative findings and in subjects affected by PKD2 or de novo disease remains a challenge (6). For these reasons, molecular screening is a useful tool in the clinical setting. However, marked allelic heterogeneity is evident, with over 200 different PKD1 and over 50 different PKD2 mutations reported to date (2,6–9,11–13). The majority of these mutations are unique and scattered throughout both genes. Although most of them are predicted to be protein truncating (frame-shift deletion/insertion, nonsense or canonical splice changes), a large number of unclassified variants (UCVs; in-frame deletions, mis-sense and atypical splice changes) have also been reported (7–9). Comprehensive screening of both PKD1 and PKD2 by two recent studies identified definitive and probable mutations in 42% to 63% and 26% to 37% of patients, respectively (8,9). These two studies also reported that atypical splice mutations account for approximately 3.5% to 5% of ADPKD (8,9). In the current study, we performed bioinformatic analysis on 17 reported atypical PKD1 and PKD2 splice mutations and evaluated its utility. We also determined the pathogenicity of an atypical splice variant found in a family affected by PKD2 and highlight the importance of functional analysis of UCVs in molecular diagnostic testing.
Materials and Methods
Bioinformatic Analysis of Reported Atypical Splice Site Variants
We used PubMed (http://www.ncbi.nlm.nih.gov/sites/entrez?db=PubMed) to identify all of the atypical splice variants for PKD1 and PKD2 reported to date. We then performed bioinformatic analysis of the splice variants using Neural Network (http://www.fruitfly.org/seq_tools/splice.html) and the maximum entropy (MAXENT) model (14).
Study Subjects
All available family members from TOR101 were clinically assessed for ADPKD. After receiving their informed consent, we reviewed their medical records and used renal ultrasonography to screen all at-risk subjects without a known diagnosis of ADPKD. We used the following criteria for the diagnosis of ADPKD: (1) the presence of at least three renal cysts (unilateral or bilateral) in an at-risk subject younger than 30 yr, (2) the presence of at least two cysts in each kidney in an at-risk subject age 30 to 59 yr, or (3) the presence of at least four cysts in each kidney in an at-risk subject age 60 yr or older (10). All study subjects provided a blood sample for serum creatinine and DNA genetic analysis. The Institutional Review Board of the University Health Network in Toronto approved all of the protocols used for this study.
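The three age-dependent ultrasound criteria listed above can be written as a small decision function. This is only an illustrative sketch: the function name and its arguments are hypothetical, it assumes age is given in whole years, and it encodes nothing beyond the cyst counts quoted in the text (it is not a substitute for clinical assessment).

```python
def meets_ultrasound_criteria(age_years, cysts_left, cysts_right):
    """Return True if an at-risk subject meets the cyst-count criteria quoted above.

    Hypothetical helper for illustration; age is assumed to be in whole years.
    """
    if age_years < 30:
        # Criterion 1: at least three cysts in total, unilateral or bilateral
        return cysts_left + cysts_right >= 3
    if age_years < 60:
        # Criterion 2: at least two cysts in each kidney (age 30 to 59 yr)
        return cysts_left >= 2 and cysts_right >= 2
    # Criterion 3: at least four cysts in each kidney (age 60 yr or older)
    return cysts_left >= 4 and cysts_right >= 4
```

For example, `meets_ultrasound_criteria(25, 3, 0)` is true because unilateral cysts count toward the total before age 30, whereas `meets_ultrasound_criteria(45, 4, 1)` is false because the second criterion requires two cysts in each kidney.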
Gene-Based Mutation Screening
Sequence analysis of PKD1 and PKD2 was performed in a clinically affected subject (II: 4) using a commercial diagnostic service (Athena Diagnostics, Worcester, MA; http://www.athenadiagnostics.com/content/test-catalog/find-test/service) (8,9). Briefly, genomic DNA was used for locus-specific long-range PCR amplification of eight segments encompassing the entire PKD1 duplicated region. The long-range PCR products served as templates for 43 nested PCRs, and the unique region of PKD1 and the entire PKD2 were amplified from genomic DNA in 28 additional PCRs. All 71 PCR products were bidirectionally sequenced, including the coding regions and exon–intron splice junctions of both genes (9).
Haplotype and Linkage Analysis
Genomic DNA was extracted from peripheral blood leukocytes using the FlexiDNA extraction Kit (Qiagen, Mississauga, ON, Canada). All available family members from TOR101 were genotyped with microsatellite markers at both PKD1 and PKD2 loci by means of a published protocol (15). The locations of the markers relative to PKD1 are as follows (the number between markers denotes intermarker distance in centimorgans): D16S521–2.0-HBAP1–2.0-KG8/PKD1-0.8-D16S2618 (16). KG8 is an intragenic marker located within the 3′ end of PKD1. The locations of the markers relative to PKD2 are as follows: D4S231–2.0-D4S1534–2.3-SPP1–0.2-PKD2-2.5-D4S423 (17). Genotyping was performed by [32P] α-deoxycytidine triphosphate-labeling of PCR products and was analyzed by PAGE and autoradiography. Haplotypes were constructed by hand and by using the program GENEHUNTER (v2.1_r5) (18). Two-point and multipoint linkages with “affected-only analysis” were performed with M-LINK from the FASTLINK package (v4.0) (19,20) and GENEHUNTER (v2.1_r5), respectively. An autosomal dominant model with a disease allele frequency of 0.001 and a phenocopy rate of 0.001 was assumed. Marker allele frequencies were obtained from married-in subjects and reconstruction of the genotypes of the founders.
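For readers unfamiliar with lod scores, the idea behind two-point linkage can be illustrated with the textbook phase-known formula, in which the likelihood of the observed meioses at recombination fraction θ is compared with the likelihood under no linkage (θ = 0.5). The sketch below is a deliberate simplification: it is not what M-LINK or GENEHUNTER compute, it ignores the phenocopy rate, allele frequencies, and pedigree structure used in this study, and the function name is hypothetical.

```python
import math


def two_point_lod(recombinants, informative_meioses, theta):
    """Textbook phase-known two-point lod score (illustrative simplification).

    lod(theta) = log10[ theta^r * (1 - theta)^(n - r) / 0.5^n ]
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    r, n = recombinants, informative_meioses
    if r < 0 or r > n:
        raise ValueError("recombinants must lie between 0 and informative_meioses")
    log_null = n * math.log10(0.5)  # likelihood under free recombination
    if theta == 0.0:
        # With theta = 0 any observed recombinant makes the likelihood zero
        return -log_null if r == 0 else float("-inf")
    log_alt = r * math.log10(theta) + (n - r) * math.log10(1.0 - theta)
    return log_alt - log_null
```

Under this simplified model, each fully informative, non-recombinant meiosis contributes log10(2), or about 0.3, to the lod score at θ = 0.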
Segregation Analysis of an Atypical PKD2 Splice Variant
PCR amplification of exon 8 of PKD2, including the mutated atypical splice site (IVS8 + 5G→A), was performed from genomic DNA using a forward and a reverse primer, 5′-CCCGCCGCCCCCGCCGTTATTATAATACAGTCACACCATTTTGTTT-3′ (the first 16 bases of this primer provide a 5′-GC clamp) and 5′-TGAGAAGCAGTGACAACTGTGA-3′, and HotStarTaq DNA polymerase (Qiagen) for 35 cycles at an annealing temperature of 60°C. Five microliters of the PCR reaction was then used for denaturing HPLC (DHPLC) at 53.5°C using the WAVE system (Transgenomic Inc., Omaha, NE), and the elution profiles of the normal and mutant variants were used for segregation analysis in TOR101.
RT-PCR and Splice Variant Analysis
Total RNA was extracted from peripheral blood mononuclear cells using the Trizol method (Invitrogen, Carlsbad, CA). Reverse transcription was performed in a 20-μl volume with 2 μg of total RNA, 1 μl of oligo(dT)12–18, and SuperScript II RNase H-Reverse Transcriptase (Invitrogen, Carlsbad, CA) according to the manufacturer's instructions. One microliter of RT product from the first-strand reaction was amplified in a 25-μl hot-start PCR reaction containing 0.25 μM of each primer and HotStarTaq DNA polymerase (Qiagen, Mississauga, ON, Canada). The forward and reverse primers used were 5′-CCCAACTTTGAGCATCTGG-3′ and 5′-CCAAAACTCGATTAGCTTCCTC-3′, respectively. The amplified fragment spans the last part of exon 7, the entire exon 8, and the beginning of exon 9. Cycling conditions were 94°C for 15 min, followed by 36 cycles of 94°C for 45 s, annealing temperature for 45 s, and 72°C for 1 min (touch-down annealing temperature from 63 to 60.5°C through the first six cycles).
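The touch-down annealing profile described above can be laid out cycle by cycle. The sketch below assumes an even decrement across the six touch-down cycles (0.5°C per cycle, reaching 60.5°C on the sixth cycle) and a constant 60.5°C thereafter; the text does not state the step size, so both points are assumptions, and the function name is hypothetical.

```python
def touchdown_annealing_schedule(start_c=63.0, end_c=60.5,
                                 touchdown_cycles=6, total_cycles=36):
    """Return the annealing temperature (in degrees C) for each PCR cycle.

    Assumes a linear decrement over the touch-down cycles, then a constant
    annealing temperature for the remaining cycles.
    """
    step = (start_c - end_c) / (touchdown_cycles - 1)
    temps = [round(start_c - i * step, 2) for i in range(touchdown_cycles)]
    temps += [end_c] * (total_cycles - touchdown_cycles)
    return temps


schedule = touchdown_annealing_schedule()
print(schedule[:6])  # the six touch-down cycles
```

With the default arguments, the remaining 30 cycles all anneal at 60.5°C.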
Cloning of the RT-PCR products was performed with the TOPO TA Cloning Kit (Invitrogen) according to the manufacturer's instructions. Four microliters of gel-extracted PCR products, excluding the lower wild-type band from II:4 (Figure 2A), were used for the ligation reaction, and the mixture was incubated at room temperature for 10 min. Then 2 μl of the TOPO cloning reaction mix were added to One Shot Chemically Competent E. coli (Invitrogen) and mixed gently. After incubation on ice for 10 min, the mixture was heat-shocked at 42°C for 30 s and immediately transferred to ice. Two hundred fifty microliters of SOC medium was added to the vial containing the reaction mix, which was then shaken horizontally at 37°C at 200 rpm for 1 h. Fifty microliters of the transformation mix were spread on two prewarmed selective LB plates (containing 100 μg/ml ampicillin with 40 μl of 40 mg/ml X-gal) and incubated at 37°C overnight. All white colonies were inoculated on a plate, numbered serially, and resuspended individually in 50 μl of water. After 5 min of heating at 95°C and centrifugation, 1 μl of supernatant from each colony was added to a final 10-μl PCR reaction using the same RT-PCR conditions, except that the cycle number was reduced to 30. Thirty randomly chosen plasmids with cloned cDNA inserts were sequenced using the RT-PCR primers.
To determine the relative frequencies of the wild type (WT) and six splice variants (Table 2), purified PCR products from 97 randomly chosen white colonies were sized on a 2.5% agarose gel. To distinguish the WT allele (334 bp) from splice variant V (330 bp), all samples with PCR products of approximately 330 bp were digested with Hinf I (New England Biolabs, Ipswich, MA), which cleaved only splice variant V into 260- and 70-bp fragments, and resized on a 2.5% agarose gel. Similarly, to distinguish splice variants II and III (410 and 408 bp, respectively), all samples with PCR products of approximately 410 bp were digested with Bsr I (New England Biolabs), which cleaved only splice variant III into approximately 370- and 40-bp fragments, and resized on a 2.5% agarose gel.
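The two-step logic above (size first, then a diagnostic digest for products that cannot be separated by size) can be summarized as a short decision function. This is a sketch only: the function name, the return labels, and the 10-bp size tolerance are hypothetical, and only the two ambiguous size classes described in the text are modeled.

```python
def resolve_clone(approx_size_bp, cut_by_hinf1=None, cut_by_bsr1=None,
                  tolerance_bp=10):
    """Assign a colony-PCR product to WT or a splice variant where a digest is needed.

    tolerance_bp is an assumed gel-sizing tolerance, not a value from the study.
    """
    if abs(approx_size_bp - 330) <= tolerance_bp:
        # WT (334 bp) and variant V (330 bp): only variant V is cut by Hinf I
        if cut_by_hinf1 is None:
            return "WT or variant V (Hinf I digest needed)"
        return "variant V" if cut_by_hinf1 else "WT"
    if abs(approx_size_bp - 410) <= tolerance_bp:
        # Variants II (410 bp) and III (408 bp): only variant III is cut by Bsr I
        if cut_by_bsr1 is None:
            return "variant II or III (Bsr I digest needed)"
        return "variant III" if cut_by_bsr1 else "variant II"
    return "resolved by size alone"
```

Products outside the two ambiguous size classes fall through to the final branch, reflecting that the remaining variants were distinguished on the gel without digestion.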
Splice Site Prediction
Exonic splicing enhancers (ESEs) are common cis-regulatory elements that act as binding sites for Ser/Arg-rich proteins (SR proteins), a family of conserved splicing factors that participate in multiple steps of the splicing pathway (21). We used ESEfinder (http://exon.cshl.edu/ESE), which predicts ESE binding sites based on the consensus motifs of four SR proteins (SF2/ASF, SC35, SRp40 and SRp55), to assess whether IVS8 + 5G→A might affect an ESE binding site (21). Three predictive software packages were used to calculate the strengths of the 5′ and 3′ cryptic splice sites of our cloned RT-PCR products: (1) SpliceSiteFinder (http://violin.genet.sickkids.on.ca/∼ali/splicesitefinder.html), which uses the Shapiro and Senapathy consensus matrix, reflecting the degree of conservation at each position of the consensus 5′ss motif (22); (2) the Maximum Dependency Decomposition (MDD) model, which uses a decision-tree approach that emphasizes the strongest dependencies in the early branches of the tree (23); and (3) MAXENT, which can model the dependencies between different positions by using a maximum-entropy distribution consistent with lower order marginal constraints (24–26). The latter two software tools are available from the Burge lab at MIT (http://genes.mit.edu/burgelab/maxent/Xmaxentscan_scoreseq.html) (24,25).
Results
From the published literature, we identified 11 atypical splice variants for PKD1 and 3 for PKD2 (7–9,11–13), and performed bioinformatic analysis on them using Neural Network and MAXENT (Table 1). Although seven of 11 PKD1 and one of three PKD2 atypical splice variants are definitive (Class A) mutations confirmed by RT-PCR, our bioinformatic analysis failed to identify four of them (IVS31 + 25del19; IVS43 + 14del20; IVS43 + 17del18; IVS25–16G>A) as pathogenic. Three of these latter mutations are 18- to 20-bp deletions within a small intron (75 to 87 bp), resulting in a shortened intron that may be suboptimal for normal splicing (13). However, one or both of the above algorithms predicted aberrant splicing in the remaining four mutations. For example, Neural Network predicted the activation of a cryptic splice site 8 bp upstream from the authentic 3′ splice site by the PKD1 mutation IVS15–10C>A as a result of a decreased 3′ authentic splice site score (from 0.95 [WT] to 0.79 [mutant]) and an increased 3′ cryptic splice site score (from < 0.1 [WT] to 0.98 [mutant]). This prediction was also supported by bioinformatic analysis using MAXENT (Table 1). Similarly, both Neural Network and MAXENT predicted aberrant splicing for the atypical PKD1 splice variants IVS24 + 5G>C and IVS17 + 4del4, and the atypical PKD2 splice variant IVS8 + 5G>A. However, without RT-PCR confirmation, these mutations were classified as probably or likely pathogenic, and their clinical utility remains to be defined (8,9). In the case of the atypical PKD1 splice variant IVS24 + 28G>T, the changes of mutant 5′ authentic and cryptic splice site scores from MAXENT (5.97→7.2), but not Neural Network (0.91→0.63), suggest that this variant may be pathogenic. Last, both algorithms predicted the atypical PKD1 splice variants IVS26 + 76C>A, IVS20–16C>G, IVS37–4C>T, and IVS10–4A>G, and the atypical PKD2 splice variant IVS6–4T>C, to be neutral polymorphisms, although none of these predictions has been confirmed by RT-PCR (Table 1).
Table 1. Bioinformatic analysis of PKD1 and PKD2 atypical splice variants
Subject III:9 from TOR101 was referred to us for evaluation as a potential living kidney donor to her affected aunt (II:4) (Figure 1B). She was at risk for ADPKD but had a negative renal ultrasound at age 25 yr. To assess her risk for ADPKD, we sought to identify the pathogenic mutation in her aunt by sequencing of both PKD1 and PKD2 using a commercial service (Athena Diagnostics, Worcester, MA). Three mis-sense variants and one intronic variant were identified, but none was reported to be definitively pathogenic (Table 2). Of note, the atypical PKD2 splice mutation (IVS8 + 5G→A) (Figure 1A) we identified in TOR101 has been independently reported as a probable pathogenic mutation by two recent studies (8,9). To further delineate the molecular genetic defect of TOR101, we genotyped eight affected subjects from this family using microsatellite markers at both PKD1 and PKD2. Affected-only linkage analysis indicated that TOR101 was PKD2 linked (parametric multipoint lod score: 2.06 for PKD2 and −2.54 for PKD1), and the PKD2 haplotype 1-2-1-2 cosegregated with all of the affected subjects (Figure 1B). Consistent with PKD2 linkage, none of the older affected family members (II:2, eGFR of 45 ml/min at age 70 yr; II:4, eGFR of 20 ml/min at age 69 yr; and II:6, eGFR of 52 ml/min at age 66 yr) from TOR101 had ESRD at their last follow-up. DHPLC of genomic exon 8 PCR amplicons showed that IVS8 + 5G→A displayed a distinct elution profile compared with control and cosegregated with all of the affected subjects in TOR101 (Figure 1C).
Figure 1. (A) Sequence tracing of a heterozygous PKD2 mis-sense splice site variant (IVS8 + 5G→A) identified in II:4. (B) Pedigree structure and haplotype analysis. The PKD2 haplotype (D4S231-D4S1534-SPP1-D4S423) 1-2-1-2 segregated with all of the affected subjects. (C) DHPLC showing a distinct elution profile for IVS8 + 5G→A in II:4, which cosegregated with all of the affected subjects. Squares and circles denote male and female, respectively. The cross symbol denotes the study subject; the solid filled symbol, the affected subject.
Table 2. Direct sequencing of PKD1 and PKD2 identified four unclassified variants in an affected subject (II:4) from TOR101
RT-PCR using total RNA from peripheral blood mononuclear cells and primers from exons 7 and 9 revealed multiple extra bands of higher molecular weight in the affected subject II:4 compared with one distinct band in the normal control (Figure 2A). Gel-purified RT-PCR products from II:4 were subcloned and miniprepped, and 30 randomly chosen plasmids with cloned cDNA inserts were sequenced. In addition to the WT allele, six splice variants were identified, all terminating with a premature stop codon (Figure 2B and Table 3). Sequence analysis of the six splice variants revealed activation of three 5′ cryptic splice sites (A, B, C) and two 3′ cryptic splice sites (B′, C′) along with the authentic 5′ (D) and 3′ (A′) splice sites in different combinations (Figure 2C). To evaluate the relative frequencies of these splice variants, we performed PCR on 97 randomly selected white colonies using the same RT-PCR primers from exons 7 and 9. Purified PCR products (some also digested with Hinf I or Bsr I; see Materials and Methods) were analyzed according to their size by 2.5% agarose gel electrophoresis. Fifty-seven clones (approximately 58%) yielded a WT allele, and the remaining 40 clones yielded six splice variants in varying frequencies (Table 3).
Figure 2. (A) RT-PCR from peripheral blood mononuclear cells using primers from exons 7 and 9. The normal control subject (lane 1) had one distinct band, whereas the affected subject (II:4) had multiple extra bands of higher molecular weight (lane 2). Lane 3 is a negative control without DNA. (B) Diagrammatic representation of the different splice variants identified from II:4, which were produced through activation of different combinations of donor and acceptor cryptic splice sites. (C) Representation of the authentic (D and A′) and activated cryptic (B, C, A, C′ and B′) splice sites along the genomic DNA.
Table 3. Effects of splice variants on their encoded protein products
Using ESEfinder (17), we identified three SRp55 motifs with significant scores (> 2.676) overlapping the authentic IVS8 splice site (data not shown). The IVS8 + 5G→A transition, in turn, is predicted to result in the loss of one of these motifs, and could therefore diminish the affinity of SRp55 for the authentic 5′ splice site in favor of other cryptic splice sites. We also used three algorithms to score the relative strengths of the splice sites used in the cloned RT-PCR variants (Table 4). In general, there was good agreement between these algorithms, with the authentic 5′ and 3′ (D(IVS8 + 5G) and A′) splice sites being the most preferred, followed by A, B, C, and D(IVS8 + 5A) in descending order among the 5′ cryptic splice sites, and B′ and C′ in descending order among the 3′ cryptic splice sites. However, the Shapiro and Senapathy algorithm predicted that the mutant variant D(IVS8 + 5A) would be more preferred than the 5′ cryptic splice sites B and C, which is the opposite of the results predicted by the MDD and MAXENT algorithms. In addition, although both the MDD and MAXENT algorithms predict a stronger preference for the 5′ cryptic splice site A than B, the observed frequencies of usage of these splice sites suggest the converse.
Table 4. Predicted strength of authentic and cryptic splice sites and frequency of their use
Discussion
Sequence-based mutation screening for PKD1 and PKD2 is now available and provides a novel means for improved diagnosis in ADPKD (6–8). This approach is particularly useful in the clinical evaluation of younger at-risk subjects with equivocal imaging results and in patients with PKD2 or de novo ADPKD. Conversely, gene-based molecular diagnostics may also be used for disease exclusion in the evaluation of younger subjects at risk of ADPKD for living-related kidney donation (6). Comprehensive screening of both PKD1 and PKD2 by two large recent studies identified definitive and probable mutations in 42% to 63% and 26% to 37% of cases, respectively (8,9). The assignment of pathogenicity for the former class of mutations (e.g., protein-truncating) is straightforward. In contrast, the assignment of pathogenicity for the latter class of mutations (e.g., nonconserved coding sequence mis-sense variants, in-frame deletions, and atypical splice changes), which must be distinguished among a high level of UCVs, remains challenging in the absence of a functional assay, and the clinical utility of this latter class of mutations needs to be further defined. Among the UCVs, splicing defects may be difficult to detect given the recent realization that certain silent coding sequence mis-sense mutations can disrupt pre-mRNA processing, with dramatic effects on the structure of the gene product (21). In addition, aberrant splicing can also occur with mis-sense changes affecting 5′ and 3′ splice sites other than the invariant GT and AG dinucleotides, respectively (27–29). Because most mutation screening is performed using genomic DNA as the template, the frequency of atypical splicing mutations is likely underestimated.
The atypical PKD2 splice variant IVS8 + 5G→A we described in TOR101 has been recently reported by Rossetti et al. and was predicted to be possibly pathogenic on the basis of conservation of the genomic sequence (8). By linkage, segregation, and RT-PCR studies, we have provided further and stronger evidence for the pathogenicity of this mutation. Indeed, similar atypical splice mutations have been documented in other diseases. For example, IVS51 + 5G→A in CDH23 caused Usher syndrome type 1D through in-frame skipping of exon 51 (27). In addition, several atypical splice mutations, confirmed by RT-PCR, have been reported for PKD1 and PKD2 (see Table 1) (8,9,11–13). Pre-mRNA splicing is a complex cellular process in which the removal of introns is performed through the interactions of the spliceosome, cis-regulatory splice enhancers and silencers, and transregulatory factors (28,29). Using ESEfinder, we found that the IVS8 + 5G→A transition might result in the loss of one of three SRp55 motifs and could therefore diminish the affinity of SRp55 for the authentic 5′ splice site in favor of other cryptic splice sites. We also assessed the relative strengths of all of the donor and acceptor splice sites used in the normal and splice variants. The donor sites were scored based on their affinity for the U1 snRNA 5′ terminus using a 9-bp motif from position −3 to + 6 relative to the GT sequence (21,30). We found that the IVS8 + 5G→A transition appeared to have a dramatic effect in reducing the strength of the authentic 5′ splice site (Table 4), which allows other cryptic sites (e.g., A and B) with relatively high strengths to compete for splicing. This may help explain why the G nucleotide is highly conserved at the + 5 position in approximately 80% of the 5′ splice sites examined across five different species (24).
Although the splice site scores predicted by the three algorithms used in this study were generally concordant, they did not always predict the observed frequencies of splice site usage, suggesting that these models only partially account for the complex splicing process.
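The 9-bp donor motif used for scoring (positions −3 to +6 relative to the exon–intron boundary) can be extracted as sketched below. The function names are hypothetical, and the example sequence is the generic consensus donor CAG|GTAAGT, not the actual PKD2 IVS8 sequence, which is not reproduced here; scoring of the resulting 9-mer would still be done with the tools named in Materials and Methods.

```python
def donor_motif(exon_tail, intron_head):
    """Return the 9-nt donor motif: last 3 exon bases plus first 6 intron bases."""
    exon_tail, intron_head = exon_tail.upper(), intron_head.upper()
    if len(exon_tail) < 3 or len(intron_head) < 6:
        raise ValueError("need at least 3 exonic and 6 intronic bases")
    return exon_tail[-3:] + intron_head[:6]


def substitute_plus5(motif, base):
    """Replace the +5 intronic base of a 9-nt donor motif.

    Indices 0-2 hold positions -3 to -1 and indices 3-8 hold +1 to +6,
    so position +5 is index 7.
    """
    if len(motif) != 9:
        raise ValueError("motif must be 9 nt long")
    return motif[:7] + base.upper() + motif[8:]


wild_type = donor_motif("ttcag", "gtaagtcc")   # generic consensus example
mutant = substitute_plus5(wild_type, "A")      # models a +5 G-to-A change
print(wild_type, mutant)
```

The wild-type and mutant 9-mers produced this way are the kind of paired input that a donor-site scoring model compares when estimating the effect of a +5 substitution.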
In summary, we identified 17 atypical PKD1 and PKD2 splice mutations from the published literature. We found that bioinformatic analysis can be useful for evaluating the pathogenicity of these mutations, although RT-PCR is needed to provide definitive proof. In a multiplex family with ADPKD, we identified an atypical PKD2 splice mutation, IVS8 + 5G→A, which cosegregated with the affected members of this PKD2-linked family. Bioinformatic analysis further showed that this mutation is predicted to disrupt an evolutionarily conserved binding motif for an exonic splice enhancer. RT-PCR showed that the mutation resulted in six splice variants through activation of different donor and acceptor cryptic splice sites, with all mutant gene products predicted to terminate with premature stop codons. Taken together, our data provide definitive evidence that IVS8 + 5G→A is a pathogenic mutation for PKD2 and highlight the importance of functional analysis of UCVs in molecular diagnostic testing.
Disclosures
None.
Acknowledgments
We are indebted to all of the participating members of the study family (TOR101). This work was supported by a grant from the Kidney Foundation of Canada (to Y.P.).
Footnotes
Published online ahead of print. Publication date available at www.cjasn.org.
K.W. and X.Z. contributed equally to this study.
- Received February 27, 2008.
- Accepted October 21, 2008.
- Copyright © 2009 by the American Society of Nephrology