The Prp19-associated complex [NTC (nineteen complex)] plays a crucial role in intron removal during premature mRNA splicing in eukaryotes. Only one component of the NTC, Cwc2, is capable of binding RNA. In the present study we report the 1.9 Å (1 Å=0.1 nm) X-ray structure of the Cwc2 core domain, which is both necessary and sufficient for RNA binding. The Cwc2 core domain contains two sub-domains, a CCCH-type ZnF (zinc finger) and an RRM (RNA recognition motif). Unexpectedly, the ZnF domain and the RRM form a single folding unit, held together by extensive hydrophobic interactions and hydrogen bonds. Structure-guided mutational analysis revealed that the intervening loop [known as the RB loop (RNA-binding loop)] between ZnF and RRM plays an essential role in RNA binding. In addition, a number of highly conserved positively charged residues on the β-strands of RRM make an important contribution to RNA binding. Intriguingly, these residues and a portion of the RB loop constitute an extended basic surface strip that extends halfway around Cwc2. The present study serves as a framework for understanding the regulatory function of the NTC in RNA splicing.
- RNA recognition motif
- nineteen complex (NTC)
- structural biology
- zinc finger
Intron removal in pre-mRNA (premature mRNA) is an essential step in eukaryotic mRNA processing carried out by the spliceosome, a multi-component RNP (ribonucleoprotein) assembly [1,2]. The spliceosome contains five major components, namely U1, U2, U4, U5 and U6 snRNPs (small nuclear RNPs), which collectively catalyse a two-step transesterification reaction. In addition to these five snRNPs, several protein complexes are involved in this crucial reaction. The NTC (nineteen complex) is formed by the scaffold protein Prp19 and a number of associated splicing factors. The NTC joins and stays with the spliceosome during the two-step splicing reaction, indicating an important role for this complex in pre-mRNA splicing [3,4]. For example, the NTC is required for stable association of the U5 and U6 snRNPs with the spliceosome and regulates the second step of the reaction. Many RNA–protein and RNA–RNA interactions are specified by the NTC [1,6].
There are at least ten components in the yeast NTC [5,7,8], among which only one protein, Cwc2/NTC40 (hereafter referred to as Cwc2), is capable of binding to RNA. Cwc2 is predicted to contain two RNA-binding motifs at its N-terminus, including a CCCH-type ZnF (zinc finger) and an RRM (RNA recognition motif). Furthermore, the flexible C-terminus of Cwc2 interacts with the WD40 domain of Prp19. In vivo depletion of Cwc2 resulted in the destabilization of spliceosome snRNA. Purified full-length Cwc2 protein exhibited normal RNA-binding capacity with low sequence specificity, whereas the RRM together with the C-terminal flexible region of Cwc2 displayed a reduced level of RNA binding. However, it remains unclear how Cwc2 binds to RNA.
Both the ZnF and the RRM are among the most abundant and well-studied structural motifs in eukaryotes. The ZnF, which occurs in many structural variants, is capable not only of recognizing nucleic acids but also of mediating protein–protein interactions [11–14]. On the basis of structural analysis, ZnF motifs were classified into eight fold groups. The TIS11d family, with a CX8CX5CX3H sequence (where X is any amino acid), was identified as a new subgroup of ZnF motif that contains few secondary structural elements. However, no other structure is available to support this newly classified subtype. On the other hand, RRM-containing proteins play an important role in most post-transcriptional processes owing to their diverse modes of RNA binding in higher organisms. The canonical RRM binds to RNA through three aromatic residues located in the consensus sequences termed RNP1 and RNP2 [16,17]. Bioinformatic studies indicated that half of RRM-containing proteins harbour multiple copies of RRM or other domains, which are often found to be ZnF motifs. The RRM is involved not only in RNA recognition but also in protein–protein interactions. Despite several reports on protein–protein interactions involving RRMs [18–22], it remains unknown whether an RRM can directly associate with a ZnF and, if so, how this might happen.
In the present study we have determined the structure of the core domain of Cwc2 at 1.9 Å (1 Å=0.1 nm) resolution by X-ray crystallography. Structural analysis revealed that the core domain of Cwc2 protein contains two subdomains, a CCCH-type ZnF domain and an RRM. To our surprise, the ZnF domain and the RRM are closely associated with each other through a large buried interface, appearing as a single folding unit. The extensive hydrophobic interface between ZnF and RRM is augmented by networks of hydrogen bonds. The ZnF domain shows remarkable structural similarity to the TIS11d family, suggesting that it belongs to this new subtype. Interestingly, the linker loop connecting the ZnF and RRM domains, together with specific residues on several β-strands of the RRM, forms a positively charged surface strip that plays an essential role in RNA binding. The structural features, together with biochemical characterization, allowed us to propose a working model for RNA recognition.
Protein preparation and crystallization
All clones were generated using a standard PCR-based cloning strategy, and Cwc2 mutants were generated by two-step PCR. The identities of individual clones were verified through double-strand plasmid sequencing. Cwc2 variants were overexpressed in the Escherichia coli strain BL21(DE3) at 18°C using pET15b vectors with an N-terminal His6 tag or pET21b vectors with a C-terminal His6 tag. Cwc2 ZnF domain and RRM were cloned in pET15b and pBB75 vectors respectively and were co-expressed in E. coli strain BL21(DE3). The soluble fraction of the E. coli lysate was purified over a Ni-NTA (Ni2+-nitrilotriacetate) column (Qiagen). After affinity purification, all proteins were further purified by cation-exchange chromatography (Source-15S; GE Healthcare) and size-exclusion chromatography (Superdex-200; GE Healthcare). The protein concentrations were determined by spectroscopic measurement at 280 nm. Crystals were grown at 18°C using the hanging-drop vapour diffusion method. Rod-shaped crystals appeared overnight in the well buffer containing 21% PEG [poly(ethylene glycol)] 3350, 200 mM ammonium citrate and 100 mM sodium citrate, pH 6.5, and grew to full size in 3 days.
Data collection, structure determination and refinement
SAD (single-wavelength anomalous dispersion) and native data for the complex of Cwc2-(1–121) and Cwc2-(133–227) were collected at the SSRF (Shanghai Synchrotron Radiation Facility) beamline BL17U. Native data for Cwc2-(1–227) were collected on a Rigaku Saturn 944+ CCD (charge-coupled-device) detector configured with a Rigaku MicroMax-007HF generator. All data were integrated and scaled using the HKL2000 package. Further processing was carried out using programs from the CCP4 suite.
The zinc position in the Zn-SAD data of the complex of Cwc2-(1–121) and Cwc2-(133–227) was determined by the program SHELXD. The identified single zinc atom was refined and the initial phases were generated in the program PHASER with the SAD experimental phasing module. Real-space constraints were applied to the SAD electron density with density modification. A crude model was traced automatically using the program BUCCANEER and was optimized further by RESOLVE together with PHASER. Manual model building and refinement were performed iteratively with COOT and PHENIX. The Cwc2 model obtained from the Zn-SAD data was used for molecular replacement with the program PHASER against the native data of the complex of Cwc2-(1–121) and Cwc2-(133–227); the resulting native structure of the complex was then used as the molecular replacement model for determination of the Cwc2-(1–227) native structure. Both structures were refined with COOT and PHENIX iteratively. Data collection and refinement statistics are summarized in Table 1.
Limited proteolysis assay
The full-length Cwc2 protein was incubated with increasing concentrations of trypsin (Sigma) in 20 μl reaction buffer containing 20 mM Tris/HCl, pH 8.0, and 150 mM NaCl at room temperature (25°C) for 10 min, followed by addition of 0.2 μl 100 mM PMSF. Half of each sample was separated on SDS/PAGE (15% gel) and stained with Coomassie Blue R250. The stable band was excised and identified by MS analysis with Q-Star (ABI) after in-gel trypsin digestion.
EMSA (electrophoretic mobility-shift assay)
The yeast U6 snRNA was prepared and 32P-labelled according to a previously published procedure. For EMSA, a range of different concentrations of Cwc2 or mutant proteins were incubated with 20000 d.p.m. 32P-labelled RNA probe for 30 min on ice in reaction buffer (20 mM Hepes/KOH, pH 7.9, 300 mM NaCl, 10 mM ZnCl2, 10% glycerol, 10 mM dithiothreitol, 0.45 mg/ml E. coli tRNA and 0.8 mg/ml BSA), followed by addition of 0.4 vol. glycerol loading dye (0.05% Bromophenol Blue, 10% glycerol). Reactions were then resolved on 15 cm native 5% acrylamide gels (37.5:1 acrylamide/bisacrylamide) in 0.5× Tris-borate buffer containing 6% glycerol at 20 V/cm for approximately 1.5 h. Dried gels were exposed to phosphorimager screens and analysed by a Typhoon 9400 variable scanner (Amersham Pharmacia).
Overall structure of the N-terminus of Cwc2
Cwc2 from Saccharomyces cerevisiae has been reported to contain a CCCH-type ZnF domain and a non-consensus RRM (Figure 1A). The C-terminal sequences of Cwc2 are known to interact with Prp19. Sequence alignment revealed that the N-terminal two-thirds of Cwc2 are highly conserved in its orthologues from yeast to human, whereas their C-terminal sequences show little homology (see Supplementary Figure S1 at http://www.BiochemJ.org/bj/441/bj4410591add.htm). To identify stable structural core domain(s) of Cwc2, the full-length protein (residues 1–339) was subjected to digestion by increasing amounts of trypsin. Figure 1(B) clearly shows that Cwc2 contains a trypsin-resistant core domain with an apparent molecular mass of approximately 25 kDa. The boundaries of this stable core domain were identified by MS to include residues 1–227 (results not shown).
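As a rough consistency check on the apparent mass of the trypsin-resistant band, the mass of a fragment can be estimated from its length. The sketch below assumes an average residue mass of about 110 Da, a general rule of thumb that is not taken from the present study, and applies it to a 227-residue fragment such as the crystallized core (residues 1–227). The function and constant names are illustrative only.

```python
# Rule-of-thumb estimate of protein mass from chain length.
# ASSUMPTION: ~110 Da per residue on average; the true value depends on
# amino acid composition and is not reported in the text.
AVERAGE_RESIDUE_MASS_DA = 110.0


def approx_mass_kda(n_residues: int) -> float:
    """Return an approximate molecular mass in kDa for a chain of n_residues."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    # A 227-residue fragment comes out close to the ~25 kDa band seen on the gel.
    print(f"{approx_mass_kda(227):.1f} kDa")
```

An exact value would require summing the residue masses of the actual sequence; the estimate is only meant to show that a fragment of this length is compatible with a band near 25 kDa.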
We crystallized the structural core domain (residues 1–227) in the P2₁2₁2₁ space group (Table 1). The structure was determined at 1.9 Å resolution using a zinc-based SAD method (Table 1). The structural core domain of Cwc2 has a globular appearance, with a diameter of approximately 42 Å. As anticipated, the overall structure is composed of two domains (Figures 2A and 2B), a ZnF domain (1–120) and an RRM (136–227). The 15-residue linker sequence (121–135) between these two domains was named the RB loop (RNA-binding loop) owing to its essential role in RNA binding (described below). Five amino acids (127–131) in the middle of the RB loop had discontinuous electron density, suggesting a flexible conformation. Analysis of the surface electrostatic potential revealed a positively charged continuous strip which extends from the RB loop to the β-strands of RRM (Figure 2B), suggesting a potential role in RNA binding. This strip consists of at least eight positively charged residues: three lysine residues (Lys132, Lys133 and Lys135) from the C-terminal end of the RB loop, two arginine residues (Arg172 and Arg174) from β2, Lys187 from β3, Lys224 from β4 and Lys179 from the loop connecting β2 and β3 (Figure 2C and Supplementary Figure S1).
Unexpectedly, the ZnF domain is closely stacked against the RRM (Figure 3A), raising the possibility that these two domains might interact with each other independently without any covalent linker. To examine this scenario, we co-expressed these two domains (1–121 and 122–227, or 1–132 and 133–227). Interestingly, the untagged ZnF domain could be pulled down by Ni-NTA resin only when it was co-expressed with the His6-tagged RRM (see the Supplementary Experimental section and Supplementary Figure S2 at http://www.BiochemJ.org/bj/441/bj4410591add.htm). Consistent with this result, 11 residues of the RB loop in the Cwc2 structural core domain (residues 1–227) could be removed by elevated concentrations of proteases. The protease-resistant core was also crystallized and the structure was determined at 1.7 Å resolution (see Supplementary Figure S3 at http://www.BiochemJ.org/bj/441/bj4410591add.htm and Table 1). The structure is almost identical to that of the intact Cwc2 core domain, with an rmsd (root mean squared deviation) of 0.142 Å for all aligned Cα atoms.
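For readers unfamiliar with the rmsd quoted above, the quantity is the square root of the mean squared distance between paired atoms. The following is a minimal sketch, assuming the two Cα coordinate sets have already been superimposed and paired one-to-one; it does not perform the superposition itself and is not the software used for the comparison reported here. The function name and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean squared deviation between two paired lists of (x, y, z) points.

    ASSUMPTION: the structures are already superimposed and the i-th point in
    coords_a corresponds to the i-th point in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Toy example: every atom displaced by 1 Å along z.
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(a, b))
```

In practice the value depends on which atoms are aligned and how the superposition is optimized, so numbers from different programs are not always directly comparable.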
The interaction of RRM with ZnF is primarily mediated by the α10 helix. The interaction includes a network of extensive hydrophobic contacts, and four inter-domain hydrogen bonds. Notably, the hydrophobic contacts are mediated by the single aromatic residue Phe194 on α10 of RRM, which is nestled in a hydrophobic cave formed by seven non-polar amino acids from ZnF: Leu53, Phe72, Phe76, Ala77, Ile93, Pro94 and Phe112. Reinforcing these hydrophobic contacts, the carboxylate side-chain of Glu197 on α10 of RRM accepts three charge-stabilized hydrogen bonds from the side-chains of Arg114 and Lys78 on ZnF (Figures 3C and 3D). In addition, Glu193 on α10 of RRM also accepts a hydrogen bond from Lys116 of the ZnF motif (Figure 3C). Glu193, Phe194 and Glu197, which make important contributions to the RRM–ZnF interaction, are highly conserved from yeast to human. Mutation of these residues led to abrogation of the interaction between RRM and ZnF, which was validated by pull-down assay (Supplementary Figure S2). These results suggested that the ZnF domain indeed interacts with the RRM to form a stable structure in the absence of a connecting loop.
The structure of the CCCH ZnF domain
The CCCH ZnF domain (residues 1–120) comprises seven short α-helices. The bound zinc ion is tetrahedrally co-ordinated by three cysteine residues (Cys73, Cys81 and Cys87) and one histidine residue (His91), which are arranged in a Cys-X7-Cys-X5-Cys-X3-His sequence (Figure 4A and Supplementary Figure S1). This zinc-binding core (residues 72–93) is surrounded by helices α2 and α6 and the N-terminal loop sequences 1–14 and 18–30 (Figure 4A). The position of the zinc atom was determined by its anomalous signal, and the electron density of the entire metal-binding core is of excellent quality (see Supplementary Figure S4 at http://www.BiochemJ.org/bj/441/bj4410591add.htm). The zinc-binding core (residues 72–93) contains only a short helix, α5, between the first two cysteine residues and is preceded by a loop and followed by another loop (Figure 4B). This structural fold comprises few secondary structural elements and is different from those of the eight known ZnF fold groups. A novel ZnF fold has been identified in the single-stranded RNA-binding protein TIS11d. Structural alignment revealed that the ZnF domain from Cwc2 adopts a fold similar to that of the first or second ZnF domain of TIS11d (Figure 4C). We propose that the ZnF domains present in Cwc2 and TIS11d be classified as a new subtype of ZnF structure (Figure 4C).
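The Cys-X7-Cys-X5-Cys-X3-His spacing described above can be written as a simple sequence pattern. The sketch below scans a one-letter protein sequence for that spacing with a regular expression. It is an illustration only: the test sequence is synthetic rather than the Cwc2 sequence, the function name is hypothetical, and a spacing match on its own says nothing about whether a zinc-binding fold actually forms.

```python
import re

# C, 7 residues, C, 5 residues, C, 3 residues, H (19 residues in total).
# The lookahead lets overlapping candidates be reported as well.
CCCH_PATTERN = re.compile(r"(?=(C.{7}C.{5}C.{3}H))")


def find_ccch_motifs(sequence: str):
    """Return (1-based start position, matched segment) for each spacing match."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in CCCH_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Synthetic sequence built to contain exactly one motif starting at residue 3.
    toy = "MA" + "C" + "A" * 7 + "C" + "G" * 5 + "C" + "S" * 3 + "H" + "KK"
    for start, segment in find_ccch_motifs(toy):
        print(start, segment)
```

With the residue numbering given in the text, a match of this pattern spans 19 residues, which agrees with the distance from Cys73 to His91.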
TIS11d binds to RNA via hydrogen bonds, mediated by positively charged residues in TIS11d, and stacking interactions between two conserved aromatic residues and the RNA bases. Mutation of either of the two aromatic residues in TIS11d resulted in abrogation of the RNA-binding activity. Of the two aromatic residues of TIS11d, only one appears to be conserved in Cwc2. This residue, Tyr89 in Cwc2, is exposed to solvent and thus might be involved in RNA binding. However, the missense mutation Y89A in Cwc2 had only a moderate effect on RNA binding (see Supplementary Figure S5 at http://www.BiochemJ.org/bj/441/bj4410591add.htm), suggesting a limited role for the aromatic residues of the ZnF domain. In addition, the ZnF domain of Cwc2 does not contain the corresponding residues that mediate the critical hydrogen bonds in TIS11d–RNA interactions.
Because the isolated ZnF domain of Cwc2 remains completely insoluble, we are unable to assess its RNA-binding ability. The observation that several α-helices (α1, α2 and α6) and two loops (1–14 and 18–30) surround the ZnF domain raises the possibility that the ZnF domain could interact with RNA under conditions in which these α-helices and/or N-terminal loops undergo major conformational changes during dynamic spliceosome assembly.
The structure of the RRM
The RRM of Cwc2 adopts a canonical βαββαβ topology, with a four-stranded antiparallel β-sheet (β1, β2, β3 and β4) packed against two α-helices (α9 and α10) (Figure 5A). The overall structure is similar to that of other typical RRMs, as exemplified by superposition with the second RRM of Prp24 (Figure 5B). In addition, it contains a short helix, α8, between strand β1 and helix α9.
To examine the role of RRM in RNA binding, we used an RNA-binding assay with the full-length 112-nucleotide U6 snRNA, with which Cwc2 directly interacts. As anticipated, the full-length Cwc2 and the trypsin-resistant core domain exhibited robust RNA-binding activity and differed only moderately in binding affinity (Figure 6A, lanes 1–10, and Supplementary Figure S6 at http://www.BiochemJ.org/bj/441/bj4410591add.htm). In contrast, the C-terminal fragment of Cwc2 (residues 227–339) exhibited no detectable RNA-binding activity (Figure 6A, lanes 11–15). These findings demonstrate that Cwc2 binds to RNA via its structural core domain. Intriguingly, the RRM (122–227) alone failed to bind RNA (Figure 6B, lanes 1–5).
A representative RRM usually recognizes RNA using two conserved sequence elements, RNP1 in β3 and RNP2 in β1. Aromatic amino acids from these two elements stack with two nucleotides, whereas the first positively charged residue of RNP1 (lysine or arginine) donates a hydrogen bond to the phosphate group between the two nucleotides [16,17]. On the basis of sequence alignment (Figure 1A and Supplementary Figure S1), the corresponding aromatic residues were identified to be Tyr138 in RNP2 and Phe183 in RNP1 in Cwc2. In addition, a positively charged residue, Lys179, is located in close proximity to RNP1. A triple mutation in the RRM of the core domain (Y138A, K179A and F183A) led to complete abolishment of RNA binding (Figure 6B, lanes 6–10), confirming an essential role for these residues in RNA binding. The loss of RNA binding was not due to structural disruption, because the triple mutant Cwc2 (Y138A/K179A/F183A) exhibited the same solution behaviour on gel filtration as the wild-type Cwc2 (see Supplementary Figure S7 at http://www.BiochemJ.org/bj/441/bj4410591add.htm). Together with the observation that RRM alone is unable to bind to RNA, these results suggest a requirement of additional structural elements in Cwc2 for RNA binding.
The RB loop plays an important role in RNA binding
The extended RB loop (residues 122–135) connects the ZnF domain and the RRM. The C-terminal portion of the RB loop contains a stretch of positively charged amino acids, RKKNK (residues 131–135) (Supplementary Figure S1). These residues appear to form a positively charged strip together with a number of other residues from several β-strands of the RRM (Figures 2B and 2C and Supplementary Figure S1), hinting at a potential RNA binding site. To examine this hypothesis, we individually mutated the basic amino acids (Arg131–Lys135) of the RB loop to alanine. These single missense mutations failed to alter the RNA-binding activity of Cwc2 (results not shown). However, when all four positively charged amino acids were replaced by alanine, the RNA-binding activity of the quadruple mutant (R131A/K132A/K133A/K135A) was nearly abolished (Figure 6C, lanes 1–5). Therefore, we named this connecting loop the RB loop.
To further examine the importance of the RB loop integrity in RNA binding, we co-expressed two discrete domains with the RB loop sequences linked to either ZnF or RRM (residues 1–135 and 136–227, or 1–121 and 122–227). These proteins were purified to homogeneity and tested for their RNA-binding activity in vitro. Strikingly, the RNA-binding mode of ZnF and the RRM–RB loop complex (residues 1–121 and 122–227) was dramatically changed (Figure 6C, lanes 6–10). Additionally, there was little RNA-binding activity for ZnF–RB loop and RRM complex (residues 1–135 and 136–227) (Figure 6C, lanes 11–15). These results strongly suggest that the covalent linkage of the RB loop with the ZnF and RRM domains is important for U6 snRNA binding.
Previous structural studies on RRM revealed a critical role in RNA binding for both the β-sheet surface of the RRM and the loops connecting the β-sheets and α-helices [16,32–35]. In addition, the N-terminal extension of the 65 kDa C-terminal RRM facilitated RNA-binding activity indirectly via stabilization of the RRM structure. In the present study, we provide the first piece of evidence that the linker sequences outside the RRM – those connecting the ZnF domain and the RRM – directly interact with RNA.
In the present study, we report the high-resolution crystal structure of the bulk of Cwc2, an essential protein for RNA splicing and the only known RNA-binding factor in the NTC. An unanticipated structural feature is the tight association between the ZnF domain and the RRM, and this feature appears to be essential for the RNA-binding activity of Cwc2. Guided by the structure, we performed RNA-binding studies and identified essential amino acids on the surface of the structure.
Surprisingly, these amino acids do not simply define a localized surface epitope. Rather, they collectively form an elongated strip that extends from the RB loop on one side of the globular Cwc2 protein to the β-strands of the RRM located on the opposite side. On the basis of these results and analyses, we propose a novel RNA-binding model for the Cwc2 protein (Figure 6D). Whereas the C-terminal portion of Cwc2 interacts with Prp19, the scaffold protein of the NTC, the RB loop and RRM bind to RNA through the positively charged strip (Figure 6D). Supporting this model, the critical aromatic residues in the RRM of Cwc2 are highly conserved from yeast to human (see Supplementary Figure S1). Intriguingly, however, the positively charged RB loop residues required for RNA binding are highly conserved only in yeast and less so in higher organisms (Supplementary Figure S1). This could reflect evolutionary differences in the NTC. For example, the human Cwc2 orthologue RBM22 contains only one lysine residue in the C-terminal portion of the inter-domain connecting loop, suggesting a variation in RNA binding in humans.
One important unanswered question is whether Cwc2 binds to specific RNA sequences or whether it merely exhibits relatively non-specific RNA-binding activity to assist the RNA splicing function. The gradual shift of the U6 snRNA in the presence of increasing amounts of Cwc2 is consistent with relatively non-specific RNA-binding activity. But this could be due to the lack of a specific binding site in the U6 snRNA. Additionally, Cwc2 failed to bind to different RNA ligands comprising stem-loop, single-stranded or double-stranded RNA from U6 snRNA respectively (see Supplementary Figure S8 at http://www.BiochemJ.org/bj/441/bj4410591add.htm). We have begun to address this issue by attempting to identify potential RNA binding site(s) from a pool of random and degenerate RNA sequences. The preliminary results do not support stringent RNA sequence recognition by Cwc2.
Our biochemical assay indicated that the Cwc2 RRM alone is insufficient for binding to RNA. This might be due to its poorly conserved RNP1 sequence compared with the consensus RNP1 motif (Figure 1A). The first residue of RNP1 in Cwc2 is cysteine, rather than the positively charged amino acid that occupies this position in other RNP1 sequences and directly binds to a phosphate group of RNA. In addition, a lysine residue (Lys185) is located at the fifth position and its side chain contributes to the positively charged strip (Figures 2C and 5C). Importantly, the presence of Lys185 results in a register shift of one amino acid for Phe186, which is now buried in the hydrophobic core of the RRM and is therefore unable to interact with the RNA base in the manner observed for the corresponding residue in the consensus RNP1 motif.
RNA splicing is a multi-step reaction requiring multiple protein–RNA complexes. In these steps, the NTC plays an obligate role in the regulation of spliceosome rearrangement and maintenance of splicing fidelity. The present study represents an important step towards understanding RNA binding by the NTC. The results are somewhat unexpected and give rise to an atypical model of RNA binding. Our proposed model of RNA binding by Cwc2, which is based on mutational analysis, remains to be experimentally verified through further biochemical and structural investigations.
Peilong Lu, Guifeng Lu and Ping Yin performed all of the experiments and analysed the experimental data. Li Wang and Wenqi Li purified the recombinant proteins. Chuangye Yan determined the structure. Ping Yin supervised the project and prepared the manuscript.
This work was supported by the Special China Postdoctoral Science Foundation [grant number 201003126].
We are grateful to members of Yigong Shi's laboratory for careful discussion. We thank the scientists J. He and S. Huang at the beamline BL17U of the Shanghai Synchrotron Radiation Facility.
The structures of the mRNA splicing complex component Cwc2 amino acids 1–121+133–227 and 1–227 alone have been deposited in the Protein Data Bank under accession numbers 3U1L and 3UIM respectively.
Abbreviations: EMSA, electrophoretic mobility-shift assay; Ni-NTA, Ni2+-nitrilotriacetate; NTC, nineteen complex; pre-mRNA, premature mRNA; RB loop, RNA-binding loop; rmsd, root mean squared deviation; RNP, ribonucleoprotein; RRM, RNA recognition motif; SAD, single-wavelength anomalous dispersion; snRNP, small nuclear ribonucleoprotein; ZnF, zinc finger
- © The Authors Journal compilation © 2012 Biochemical Society