Research article

Crystal structure of Ssu72, an essential eukaryotic phosphatase specific for the C-terminal domain of RNA polymerase II, in complex with a transition state analogue

Yong Zhang, Mengmeng Zhang, Yan Zhang


Reversible phosphorylation of the CTD (C-terminal domain) of the eukaryotic RNA polymerase II largest subunit represents a critical regulatory mechanism during the transcription cycle and mRNA processing. Ssu72 is an essential phosphatase conserved in eukaryotes that dephosphorylates phosphorylated Ser5 of the CTD heptapeptide. Its function is implicated in transcription initiation, elongation and termination, as well as RNA processing. In the present paper we report the high resolution X-ray crystal structures of Drosophila melanogaster Ssu72 phosphatase in the apo form and in complex with an inhibitor mimicking the transition state of phosphoryl transfer. Ssu72 facilitates dephosphorylation of the substrate through a phosphoryl-enzyme intermediate, as visualized in the complex structure of Ssu72 with the oxo-anion compound inhibitor vanadate at a 2.35 Å (1 Å=0.1 nm) resolution. The structure resembles the transition state of the phosphoryl transfer with vanadate exhibiting a trigonal bi-pyramidal geometry covalently bonded to the nucleophilic cysteine residue. Interestingly, the incorporation of oxo-anion compounds greatly stabilizes a flexible loop containing the general acid, as shown by an increase in the melting temperature of Ssu72 measured by differential scanning fluorimetry. The Ssu72 structure exhibits a core fold with a similar topology to that of LMWPTPs [low-molecular-mass PTPs (protein tyrosine phosphatases)], but with an insertion of a unique ‘cap’ domain to shelter the active site from the solvent with a deep groove in between where the CTD substrates bind. Mutagenesis studies in this groove established the functional roles of five residues (Met17, Pro46, Asp51, Tyr77 and Met85) that are essential specifically for substrate recognition.

  • C-terminal domain
  • RNA polymerase II
  • transcription regulation
  • transcription termination
  • tyrosine phosphatase


The CTD (C-terminal domain) of RNA polymerase II in eukaryotes orchestrates the temporal and spatial control of transcription and epigenetic regulation of gene expression [1]. The CTD, found only in eukaryotes, consists of 15–52 tandem heptapeptide repeats of the consensus sequence Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7 [2,3]. The local conformation of the CTD, set by its phosphorylation and prolyl-isomerization states, provides the platform for recruiting various transcriptional regulatory factors. CTD phosphorylation occurs primarily at serine residues 2 and 5 and serves as a prominent regulator of gene expression [4]. It has previously been shown that Ser7 can also be phosphorylated in vivo and its phosphorylation is important for snRNA (small nuclear RNA) biogenesis [5]. CTD regulation also occurs through prolyl-isomerization of the proline residues in the CTD heptapeptide sequence [6]. By switching the cis–trans conformation of a proline residue adjacent to a phosphorylated serine residue, the interaction of the CTD and the binding partners it recruits can be modulated.
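The repeat architecture described above can be illustrated with a minimal sketch. It assumes an idealized CTD built only from perfect consensus heptads (real CTDs contain non-consensus repeats), and the function names are hypothetical helpers introduced here for illustration.

```python
# Consensus heptad from the text: Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7.
CONSENSUS = "YSPTSPS"


def idealized_ctd(n_repeats: int) -> str:
    """Return an idealized CTD made of n_repeats perfect consensus heptads."""
    if n_repeats < 1:
        raise ValueError("n_repeats must be positive")
    return CONSENSUS * n_repeats


def heptad_positions(ctd: str, position_in_heptad: int) -> list:
    """1-based sequence indices of one heptad position (1-7) in every repeat."""
    if not 1 <= position_in_heptad <= 7:
        raise ValueError("position_in_heptad must be between 1 and 7")
    return list(range(position_in_heptad, len(ctd) + 1, 7))


# 52 repeats is the upper bound quoted in the text.
ctd = idealized_ctd(52)
ser5_sites = heptad_positions(ctd, 5)  # candidate Ser5 phosphorylation sites
print(len(ctd), len(ser5_sites))
```

Indexing by heptad position rather than by absolute residue number mirrors how the text refers to Ser2, Ser5 and Ser7.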

Dynamic reversible phosphorylation of the CTD plays an essential role, not only in the recruitment and assembly of transcription complexes, but also in the temporal control of transcription and mRNA processing (reviewed in [7]). Ser5 phosphorylation is required for assembly of the PIC (pre-initiation complex) and facilitates mRNA capping via recruitment of capping enzymes. As the transcription complex moves away from the initiation site, Ser5 gradually becomes dephosphorylated, whereas Ser2 is phosphorylated. Ser2 phosphorylation is the predominant CTD pattern on both elongating and terminating RNA polymerase II, which ensures efficient 3′-RNA processing by triggering the recruitment of the 3′-RNA processing machinery. Non-phosphorylated CTDs are necessary for the recycling of RNA polymerase II and its subsequent binding to a promoter.

Phosphorylation/dephosphorylation of the CTD is catalysed by cyclin-dependent kinases and CTD-specific phosphatases [8]. The best characterized CTD-specific phosphatases are the Fcp/Scp (small CTD phosphatase) family. Fcp1 is essential for cell survival in budding and fission yeasts, presumably due to its function in RNA polymerase II recycling [9]. Scp has been identified as a phos.Ser5 (phosphorylated Ser5) phosphatase [10], but it only affects a subset of genes involved in neuronal differentiation [11]. It is a component of the neuronal silencing complex REST (repressor element 1-silencing transcription factor) and prohibits the inappropriate differentiation of neuronal cells [11]. Scp1 and Fcp1 share the same active-site topology and reaction mechanism [12,13], but are distinct in their preferences for Ser2 or Ser5 dephosphorylation and domain architecture [14]. Interestingly, Scps which exhibit a strong preference towards phos.Ser5 are conserved in higher eukaryotes, but not in yeast. Inactivating Scp genes will promote neurogenesis, but does not lead to cell death [11]. It is also observed that even though neuronal gene expression can be de-repressed upon Scp inactivation, general transcription in the cell is not eliminated [11]. Therefore eukaryotes must have other Ser5 phosphatases which function as housekeeping proteins for RNA polymerase II recycling, whereas Scps function at an epigenetic level and only specifically affect the expression profile of a subset of genes.

A novel phosphatase, Ssu72, was identified as the Ser5 phosphatase in yeast, and its phosphatase activity was shown to be essential for cell survival and transcription cycling [15,16]. Ssu72 belongs to a highly conserved protein family found in eukaryotes. Compromising Ssu72 phosphatase activity results in the accumulation of a hyperphosphorylated Ser5 form of the CTD [16]. Ssu72 was first identified by its genetic interaction with the transcription factor TFIIB (transcription factor IIB), as a mutation in Ssu72 disrupted this interaction and affected the accuracy of start site selection [15]. Further studies identified Ssu72 as a subunit of the CPF (cleaving and processing factor) complex, indicating its participation in mRNA processing regulation [17]. Ssu72 may also be involved in transcription termination as mutations in Ssu72 directly alter the termination of the expression of snoRNA (small nucleolar RNA) [18]. Interestingly, prolyl-isomerization regulation of the CTD, mediated by Ess1 in yeast, is also linked functionally to the phosphatase activity of Ssu72 [19]. It has been proposed that Ess1 can change the balance of the cis–trans conformation of proline, thereby adjusting the suitability of Ser5–Pro6 as an Ssu72 substrate, which in turn determines the pathway selection for transcriptional termination [19]. Therefore the effectiveness of Ssu72-mediated dephosphorylation of Ser5 of the CTD can be modulated by the prolyl-isomerization state of the CTD. Recently, an atypical phosphatase Rtr1 has been defined as a CTD phosphatase, but its biological role is not fully understood [20].

To gain insight into the molecular mechanism of phosphoryl transfer by the Ssu72 phosphatase family and the substrate recognition of Ssu72, we purified recombinant yeast, Drosophila and human Ssu72 to examine their catalytic activities and kinetic properties against a generic phosphatase substrate and their natural substrate CTD peptides. Furthermore, the X-ray crystal structures of Drosophila Ssu72 were obtained. The structure of Ssu72 clearly indicates it is a unique member of the LMWPTP [low-molecular-mass PTP (protein tyrosine phosphatase)] superfamily, and further exhibits a deep groove that may potentially bind to the CTD of RNA polymerase II. A complex structure of Ssu72 with vanadate highlights the formation of a phosphoryl-cysteine intermediate and mimics the transition state of the phosphoryl transfer reaction. A unique cap domain excludes direct access to the active site of Ssu72 by substrate and provides a possible explanation for the selectivity of Ssu72. We have also identified several residues in the CTD binding groove that play essential roles in substrate recognition.


Cloning and protein purification

The Ssu72 genes from various organisms were cloned from Saccharomyces cerevisiae genomic DNA, Drosophila melanogaster synthetic genes (Drosophila Genomics Resource Center) and Homo sapiens cDNA (Origene). The genes encoding Ssu72 were amplified by PCR using primers with convenient ligation-independent cloning sites [21] and corresponding templates. The PCR products were directly inserted into an in-house-generated pET28b derivative vector, pETHis8-SUMO, which was linearized with T4 DNA polymerase. The resulting expression plasmid pETHis8-SUMO-ssu encodes a corresponding Ssu72 fusion protein with an N-terminal 124-amino-acid tag consisting of a His8 leader (MGSSHHHHHHHHSSGSDSEVNQEAKP) followed by an 86-amino-acid SUMO (small ubiquitin-related modifier) fragment and a PreScission protease [22] recognition sequence (ALEVLFQGPGSG). Escherichia coli BL21 (DE3) were transformed with the pETHis8-SUMO-ssu vectors and grown in Luria–Bertani medium containing 50 μg/ml kanamycin at 37 °C until the D600 reached 0.6–1.0. The cultures were supplemented with 0.5 mM IPTG (isopropyl β-D-thiogalactopyranoside) and then grown at 16 °C for 16 h. The induced cells were harvested by centrifugation (5000 rev./min, Beckman Avanti J-26, JLA 8.1 rotor) and disrupted by sonication on ice (Misonix sonicator 4000, amplitude 90%) for 5 cycles (30 s each cycle, with 1 s on/off pulses, and a 4 min pause between cycles). The recombinant Ssu72 protein was affinity purified using a Ni-NTA (Ni2+-nitrilotriacetate) column (Qiagen) and eluted with buffer A [250 mM imidazole, 500 mM NaCl, 100 mM Tris/HCl (pH 8.5) and 10 mM 2-mercaptoethanol]. EDTA was added to the eluted protein to a final concentration of 3 mM to chelate any residual nickel ions. The eluted protein was then dialysed against buffer B [20 mM Tris/HCl (pH 8.5), 100 mM NaCl and 10 mM 2-mercaptoethanol] and digested with PreScission protease (GE Healthcare) at 4 °C overnight. The cleaved tag and uncleaved fusion protein were removed by reloading the digested sample on a second Ni-NTA column equilibrated with buffer C [30 mM imidazole, 20 mM Tris/HCl (pH 8.5), 100 mM NaCl and 10 mM 2-mercaptoethanol]. Flow-through fractions were pooled and applied on a MonoQ column, and proteins were eluted with a NaCl gradient. Ssu72 proteins were further purified by gel filtration using a Superdex-75 column (GE Healthcare) equilibrated against buffer D [25 mM Hepes (pH 7.5), 80 mM NaCl and 1 mM DTT (dithiothreitol)]. Finally, the pure Ssu72 proteins were concentrated with a 10 kDa Vivaspin-20 concentrator (GE Healthcare) to ~10 mg/ml.
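As a consistency check on the tag description, the following sketch adds up the three tag components quoted in the text. The variable names are introduced here for illustration. The split at the Gln/Gly bond reflects the usual PreScission cleavage position within LEVLFQ/GP, so the residues predicted to remain on Ssu72 are an inference rather than a statement from the text.

```python
# Tag components exactly as quoted in the text.
HIS8_LEADER = "MGSSHHHHHHHHSSGSDSEVNQEAKP"
SUMO_FRAGMENT_LENGTH = 86          # residues; sequence not given in the text
PRESCISSION_SITE = "ALEVLFQGPGSG"

# Total tag length; the text states 124 amino acids.
tag_length = len(HIS8_LEADER) + SUMO_FRAGMENT_LENGTH + len(PRESCISSION_SITE)

# Assumption: cleavage occurs after the single Gln of the recognition
# sequence, leaving the downstream residues on the N-terminus of Ssu72.
residual_on_ssu72 = PRESCISSION_SITE.split("Q", 1)[1]

print(tag_length, residual_on_ssu72)
```

The three parts (26 + 86 + 12 residues) sum to the 124-amino-acid tag stated above.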

Crystallization and compound soaking

Crystallization trials were carried out using sitting-drop vapour diffusion at 4 °C. An initial Ssu72 crystal was identified from the Classics suite crystal screen (Qiagen) by mixing equal amounts of protein solution (10 mg/ml) and reservoir solution containing 0.1 M Hepes sodium salt (pH 7.5), 10% (v/v) propan-2-ol and 20% (w/v) PEG [poly(ethylene) glycol] 4000. Subsequently, production of the crystal was optimized under the following conditions: 0.1 M Hepes sodium salt (pH 7.5), 10% (v/v) propan-2-ol and 14–18% (w/v) PEG 4000. Crystals appeared within 1 week and grew for 10 more days. The crystallization solution supplemented with 20–25% (v/v) glycerol served as the cryoprotectant for all crystals used in the experiments. To obtain the complex structure of Ssu72 with vanadate, apo crystals were soaked in mother liquor containing 2 mM sodium orthovanadate for 2 h prior to cryoprotection. X-ray diffraction data were collected at beamline 5.0.1 (Advanced Light Source) using a 2×2 ADSC CCD (charge-coupled device) detector. Data were processed with HKL2000 and are summarized in Table 1.

Table 1 Crystallographic data statistics

Parentheses indicate statistics for highest resolution shell.

Structure determination and refinement

The crystal structures of Drosophila Ssu72 were determined by MR (molecular replacement) with the program Phaser from the CCP4 package [23], using a low-resolution structure (PDB code 3FMV; 3.3 Å) from the Northeast Structural Genomics Consortium as the initial search model. Initial refinement was carried out using the Phenix refinement suite [24] under NCS (non-crystallographic symmetry) restraints with a 5% test set of reflections excluded for Rfree cross-validation [25]. σA-weighted 2Fo−Fc and Fo−Fc electron-density maps were calculated after each cycle of refinement and inspected to guide model rebuilding using Coot [26]. For the complex structure of Ssu72 and vanadate, the locations of the vanadate group were clear in Fo−Fc maps. The inhibitor model was built into the electron density using Coot. The final models were evaluated by PROCHECK [27]. Refinement statistics are summarized in Table 1. PyMOL (DeLano Scientific) was used to produce molecular graphics renditions.

Phosphatase activity assays

An assay for non-specific phosphatase activity was carried out using the general substrate PNPP (p-nitrophenyl phosphate) (Fluka, Sigma–Aldrich) in 0.2 ml of reaction mixture containing 0.1 M citrate buffer (pH 6.0), 1 mM DTT, various PNPP concentrations (0–40 mM for Drosophila and human Ssu72, 0–320 mM for yeast Ssu72), and 2.0 μg of the enzyme. After incubation at 28 °C (or 37 °C for human Ssu72) for 15 min, the reaction was terminated by adding an equal volume of 2.0 M NaOH, and the released p-nitrophenol was measured at 405 nm using a Tecan infinite 200 microplate reader. The optimum pH for enzyme activity was determined in 0.1 M citrate buffer, 0.1 M Mes buffer and 0.1 M Tris/HCl (adjusted in the range of pH 4–9) at 28 °C (or 37 °C for human Ssu72) with 2.4 mM PNPP as the substrate. Temperature effects on activity were measured in 0.1 M citrate buffer (pH 6.0) at different temperatures. Michaelis–Menten kinetic parameters for purified Ssu72 towards PNPP were determined by measuring initial reaction rates at various PNPP concentrations in the above reaction buffer. Data were fitted to the Michaelis–Menten equation with the program Origin7.5 (OriginLab).
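The fitting step can be illustrated with a minimal sketch. The fits reported here were performed in Origin7.5; the code below is not that procedure. It substitutes a Hanes–Woolf linearization (S/v = S/Vmax + Km/Vmax) solved by ordinary least squares, which needs only the standard library. The data are synthetic and noise-free, generated for illustration, and the function names are hypothetical.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def hanes_woolf_fit(substrate, rates):
    """Estimate (vmax, km) by least squares on S/v versus S.

    Hanes-Woolf form: S/v = (1/Vmax) * S + Km/Vmax.
    A stand-in for a nonlinear fit; it weights errors differently.
    """
    pts = [(s, s / v) for s, v in zip(substrate, rates) if s > 0 and v > 0]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two usable data points")
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 1.0 / slope, intercept / slope  # vmax, km


# Synthetic demonstration data, NOT measurements from this study.
# Concentrations (mM) fall inside the 0-40 mM range used for Drosophila Ssu72.
conc_mM = [2.5, 5.0, 10.0, 20.0, 40.0]
true_vmax, true_km = 1.30, 11.5  # rate per enzyme (s^-1) and Km (mM)
rates = [michaelis_menten(s, true_vmax, true_km) for s in conc_mM]

vmax_fit, km_fit = hanes_woolf_fit(conc_mM, rates)
print(vmax_fit, km_fit)
```

With real, noisy data a nonlinear least-squares fit of the untransformed equation is generally preferred, because linearizations distort the error structure.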

The substrate specificity of the Drosophila Ssu72 was examined using Ser5-phosphorylated CTD peptides (Anaspec) by Malachite Green colorimetric assay [29]. The assay was performed in a 200 μl PCR tube in 20 μl of assay buffer [0.1 M Mes (pH 6.0) and 1 mM DTT] containing various concentrations of Ser5-phosphorylated CTD peptide in the absence and the presence of purified Ssu72 at 10 ng per reaction. The assay tubes were incubated in a PCR machine (Bio-Rad) at 28 °C for 10 min. The reaction was stopped with 80 μl of Malachite Green reagent (Biomol Green), and the absorbance was read at 620 nm according to the manufacturer's instructions. Kinetic data were analysed according to the Michaelis–Menten equation with the program Origin7.5 (OriginLab).

DSF (differential scanning fluorimetry)

Yeast, Drosophila and human Ssu72s at various concentrations (1–50 μM) were mixed with 1 mM sodium vanadate in 96-well low-profile PCR plates (ABgene, catalogue number AB-0700) and incubated on ice for 30 min. SYPRO orange dye was added into each well immediately before placing the plate in a real-time PCR machine (LightCycler 480, Roche). The protein unfolding experiment was carried out with an increase of temperature from 20 °C to 85 °C. The melting temperature curves of Ssu72s were monophasic and Tm values were derived from the curves.

CD-monitored thermal denaturation

Drosophila Ssu72 (43 μM) was incubated with 1 mM sodium orthovanadate in 20 mM Hepes buffer (pH 7.5). The CD signal was monitored at 220 nm with an AVIV model 420 spectropolarimeter equipped with a thermoelectric temperature control unit. Data points were collected every minute (that is, every 1 °C) as the sample temperature increased from 30 to 70 °C at 1 °C per min. The melting temperature was obtained by fitting the data to a Boltzmann sigmoidal function with Origin7.5 (OriginLab).
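The Boltzmann sigmoidal model used for the melting curves can be sketched as follows. This is an illustration under stated assumptions, not the Origin7.5 fit itself: the code evaluates the standard four-parameter Boltzmann function and then estimates Tm by a simpler midpoint interpolation rather than nonlinear regression. The curve is synthetic, and all parameter values and function names are hypothetical.

```python
import math


def boltzmann(temp, bottom, top, tm, slope):
    """Four-parameter Boltzmann sigmoid; the signal is midway at temp == tm."""
    return bottom + (top - bottom) / (1.0 + math.exp((tm - temp) / slope))


def midpoint_tm(temps, signal):
    """Estimate Tm as the temperature where the signal crosses the midpoint
    between its minimum and maximum, using linear interpolation.
    A simple stand-in for a full nonlinear fit."""
    half = (min(signal) + max(signal)) / 2.0
    for i in range(len(temps) - 1):
        y0, y1 = signal[i], signal[i + 1]
        if y0 != y1 and (y0 - half) * (y1 - half) <= 0:
            return temps[i] + (half - y0) * (temps[i + 1] - temps[i]) / (y1 - y0)
    raise ValueError("signal never crosses its midpoint")


# Synthetic curve sampled every 1 degree C from 30 to 70 degrees C,
# matching the sampling described in the text. Values are illustrative only.
temps = [float(t) for t in range(30, 71)]
signal = [boltzmann(t, -20.0, -5.0, 44.3, 2.0) for t in temps]

tm_estimate = midpoint_tm(temps, signal)
print(round(tm_estimate, 2))
```

The midpoint estimate is adequate when both baselines are well sampled; a regression fit is more robust when the transition lies near the edge of the temperature range.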


Phosphatase activity of Ssu72 from various organisms and the overall structure of Drosophila Ssu72

The sequence alignment of ten Ssu72 phosphatases from various organisms in the NCBI (National Center for Biotechnology Information) database (Figure 1A) shows a high degree of conservation from yeast to human (43% identity for yeast with human, 60% identity for Drosophila with human), consistent with the proposed biological role of Ssu72 as an essential phosphatase for the CTD of RNA polymerase II. To study the phosphatase activity and structure of the Ssu72 family, the genes derived from S. cerevisiae, D. melanogaster and H. sapiens were cloned and overexpressed in E. coli. The proteins were purified to homogeneity and phosphatase activities were evaluated by a generic phosphatase assay, the PNPP assay. All three enzymes exhibit phosphatase activity with an optimal pH around 6.0–6.5 (results not shown). Yeast Ssu72 has the lowest enzymatic activity with a kcat of 0.090±0.002 s−1, and a Km of 38±3 mM (Supplementary Figure S1C). For Drosophila Ssu72, much higher activity is observed with the kcat and Km determined to be 1.30±0.04 s−1 and 11.5±0.9 mM (Table 2 and Supplementary Figure S1A). The kcat and Km of human Ssu72 are comparable with those of the Drosophila homologue and were determined to be 0.47±0.01 s−1 and 11.0±0.8 mM (Supplementary Figure S1B). Our kinetic data for Ssu72 are consistent with previously reported results on yeast and human Ssu72 [30]. The results confirm that the proteins, which were subsequently used in structural studies, are catalytically active.
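To compare the three enzymes on a single scale, the reported kcat and Km values can be combined into the catalytic efficiency kcat/Km. The sketch below uses only the central values quoted above and ignores the reported uncertainties; the dictionary and function names are introduced here for illustration.

```python
# (kcat in s^-1, Km in mM) towards PNPP, central values from the text.
KINETICS = {
    "yeast": (0.090, 38.0),
    "Drosophila": (1.30, 11.5),
    "human": (0.47, 11.0),
}


def catalytic_efficiency(kcat_per_s, km_mM):
    """Return kcat/Km in M^-1 s^-1 (Km converted from mM to M)."""
    return kcat_per_s / (km_mM * 1e-3)


efficiency = {name: catalytic_efficiency(kcat, km)
              for name, (kcat, km) in KINETICS.items()}

for name, value in efficiency.items():
    print(f"{name}: {value:.1f} M^-1 s^-1")
```

On this measure the Drosophila enzyme is the most efficient of the three towards PNPP and the yeast enzyme the least, in line with the ordering of kcat values described above.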

Figure 1 Sequence alignment and structure of Ssu72

(A) Primary sequence alignment of Ssu72 from Drosophila (Dro, GenBank® accession number NP_608342), Homo sapiens (Hom, GenBank® accession number NP_054907), Saccharomyces cerevisiae (Sce, GenBank® accession number NP_014177), Schizosaccharomyces pombe (Spo, GenBank® accession number NP_594076), Rattus norvegicus (Rat, GenBank® accession number NP_001020828), Bos taurus (Bos, GenBank® accession number XP_595220), Gallus gallus (Gal, GenBank® accession number NP_001007876), Salmo salar (Sal, GenBank® accession number NP_001136192), Xenopus laevis (Xen, GenBank® accession number NP_001084864) and Arabidopsis thaliana (Ara, GenBank® accession number NP_177523). Helices and strands are indicated by coils and arrows respectively. Active-site residues are marked with filled triangles below the alignment. The residues (also included in the mutagenesis study) for the tentative substrate CTD proline binding are marked by open triangles. (B) Surface representation of the Drosophila Ssu72 apo protein structure. The signature motif residue arginine residue is highlighted in orange, and the nucleophilic cysteine residue is highlighted in red. (C) Ribbon representation of the Drosophila Ssu72 structure. The active-site essential residues are shown in stick representation. (D) Surface representation of the conserved residues (shown in orange) on the Drosophila Ssu72 structure. Partially conserved residues are shown in light blue. (E) Superimposition of Drosophila Ssu72 (core domain in pale green, cap domain in cyan, finger region in yellow) and human LMWPTP 1 (PDB code 1XWW, core domain in white, a short helix that differs in different isoforms coloured in red).

Table 2 Probing the residues that are important for the substrate CTD peptide binding

N.A., no activity detected.

Drosophila Ssu72 was crystallized, and the diffraction data from apo crystals were processed to a resolution of 2.85 Å (1 Å=0.1 nm). The protein crystallizes in space group P21212 (unit cell a=158.1 Å, b=101.9 Å, c=65.6 Å; α=β=γ=90.0 °) with four molecules per asymmetric unit. The protein structure consistently exhibits a high thermal factor (Table 1). This may also explain the difficulty in obtaining a high-resolution structure for apo Ssu72 as well as the crystal's high sensitivity to environmental changes. The four molecules in each asymmetric unit are nearly identical with the exception of two flexible loops (residues 47–53 and 127–134). The topology can be approximately divided into two portions with a deep groove cutting through the surface of the molecule, separating the protein into the ‘cap’ and the ‘core’ domains (Figure 1B). The connections between the ‘core’ and ‘cap’ domains are two flexible loops, suggesting that the individual domains can move in a hinge-like motion to desolvate the active site and impose selectivity toward substrates. The essential catalytic residue Cys13, whose substitution results in loss of phosphatase activity and death of yeast strains [15], is located at the tip of the groove (Figure 1B). The ‘core’ domain of the protein has a typical Rossmann fold with a series of twisted β-strands sandwiched by α-helices at each side (Figure 1C). The ‘cap’ portion (residues 41–92), which is well-conserved among Ssu72 from different species, is unique with no similar sequence or structure identified in other protein families (Figures 1A and 1D). When we colour the protein according to residue conservation, the highly conserved residues are localized around the groove dividing the ‘cap’ and ‘core’ domains of the protein, indicating functional importance of this groove (Figure 1D).

Ssu72 is a LMWPTP

Unlike kinases, which evolved from the same ancestor and therefore maintain the same three-dimensional topology, protein phosphatases adopt different catalytic mechanisms for the phosphoryl-transfer reaction. Three different catalytic mechanisms are identified among protein phosphatases (Supplementary Figure S2). The first category, called cysteine-based phosphatases, utilizes cysteine as the nucleophile at the active site [31,32]. This well-studied mechanism establishes that a phosphate group from the substrate is transferred to the thiol group as a phosphoryl-intermediate, which in turn undergoes hydrolysis. On the other hand, serine/threonine phosphatases, such as PP1 and PP2, utilize a dramatically different strategy for the hydrolysis of phosphate monoesters by using di-metal ions and activating a water molecule to directly break the P–O bond without a phosphoryl-protein intermediate (reviewed in [33]). The third category, HAD (haloacid dehalogenase)-like phosphatases, including the CTD phosphatase Scp/Fcp family, resembles the two-step reaction mechanism of cysteine-based phosphatases, but utilizes aspartic acid as a nucleophile [34].

The cysteine-based protein phosphatases are further divided into three subfamilies based on the relative location of the signature motif, termed the ‘PTP’ loop (-CX5R-), and their preferences for substrates. The first subfamily, classical tyrosine phosphatases, is usually approx. 30 kDa with the PTP loop close to the middle of the protein, and it specifically dephosphorylates phosphoryl tyrosine. Dual-specificity protein phosphatases, on the other hand, recognize both tyrosine and serine/threonine phosphorylation, as indicated by the name, with the PTP loop situated closer to the C-terminal end of the protein. The third subfamily, LMWPTP, has a single catalytic phosphatase domain of approx. 18 kDa. The PTP loop is located at the N-terminus of the LMWPTP, with the nucleophilic cysteine usually located around residues 10–15. LMWPTP favours phosphoryl-tyrosine as a substrate even though high concentrations of phosphoryl-serine/threonine are also subjected to LMWPTP-catalysed hydrolysis in vitro [35].
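The -CX5R- signature (a cysteine, any five residues, then an arginine) translates directly into a regular expression. The sketch below scans a sequence for the motif and reports 1-based residue positions. The test sequence is a made-up toy string, not the Ssu72 sequence; it is built only so that the cysteine falls at position 13 and the arginine at position 19, the residue numbers discussed in this paper for Drosophila Ssu72.

```python
import re

# Lookahead so that overlapping occurrences are also reported.
PTP_LOOP = re.compile(r"(?=C.{5}R)")


def find_ptp_loops(sequence):
    """Return (cys_position, arg_position) pairs, 1-based, for each CX5R match."""
    return [(m.start() + 1, m.start() + 7) for m in PTP_LOOP.finditer(sequence)]


# Hypothetical toy sequence (NOT a real protein): 12 filler residues,
# then C at position 13, five residues, R at position 19, S at position 20.
toy_sequence = "M" + "A" * 11 + "C" + "GGGGG" + "R" + "S" + "A" * 10

loops = find_ptp_loops(toy_sequence)
print(loops)
```

A match near the start of a sequence of roughly 18 kDa would be in keeping with the LMWPTP arrangement described above, although the motif alone is short and a match is not sufficient evidence of family membership.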

Although Ssu72 exhibits very low primary sequence identity with proteins from any of the cysteine-based phosphatase families, apart from the presence of the PTP motif, the fold of the core domain is highly similar to that of LMWPTP (Figure 1E). A -CX5R- signature motif is located at the N-terminus of the protein. Although the primary sequence consensus is only 15%, the overall fold of Ssu72 exhibits a Z score of 11.6 in a DALI search [36] and an rmsd (root mean square deviation) of 1.7 Å when superimposed with the main chain of LMWPTP (PDB code 1XWW) (Ssu72 6–195 excluding residues 41–96; LMWPTP 1–157 excluding residues 50–67).
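For readers unfamiliar with the rmsd statistic quoted above, a minimal sketch of the calculation follows. It assumes that the two coordinate sets are already superimposed and that atoms are paired one-to-one in the same order; it does not perform the optimal superposition or the structure-based alignment that a program such as DALI carries out. The coordinates are invented for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired (x, y, z) coordinates.

    Assumes the sets are pre-superimposed and listed in matching order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


# Invented coordinates (in Angstroms): every atom displaced by 1 A along z.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]

print(rmsd(set_a, set_b))
```

Because the deviations are squared before averaging, a few poorly matching atoms raise the rmsd disproportionately, which is one reason the cap and variable regions are excluded from the comparison reported above.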

Complex structure of Ssu72 with the inhibitor vanadate

Another characteristic of cysteine-based phosphatases is their sensitivity to oxo-anion compounds such as vanadate, tungstate and molybdate [37]. These compounds can inhibit cysteine-based phosphatases by forming transition-state or product analogues with cysteine nucleophiles. In order to better understand the reaction mechanism and roles of catalytic residues, apo Ssu72 crystals were transferred to a crystallization medium that contained 2 mM sodium orthovanadate. A structure of Ssu72 in which the vanadate ion was co-ordinated in the active site was then compared with the apo Ssu72 structure. Interestingly, the incorporation of the vanadate ion stabilizes the crystal, which exhibits a higher resolution and lower thermal factor than apo Ssu72 crystals (Table 1).

The structure of the Ssu72–vanadate complex was solved using apo Ssu72 as a search model in MR and refined to a resolution of 2.35 Å (Figure 2A). The refined model is almost identical with apo Ssu72, with the signature motif (-CX5R-) located at the crevice. Strong positive electron density can be observed close to the nucleophilic cysteine residue (Cys13) with a characteristic trigonal bipyramidal coordination (Figure 2B). A vanadate group can be built into the electron density with three oxygen atoms at equatorial positions, and with the cysteine sulfur atom and a fourth oxygen atom at the two apical positions (Figure 2C). The distance of vanadium to sulfur is refined to 2.3–2.4 Å, and the four oxygen atoms are all approx. 1.9 Å away, consistent with a previously published vanadate adduct with bovine LMWPTP [38]. The S–V bond length suggests covalent bond formation between the compound and the active-site residue of Ssu72, explaining the inhibitory effect of vanadate compounds on Ssu72 (Figure 2C).

Figure 2 Complex structure of Ssu72 and vanadate

The green broken lines indicate hydrogen bonds in all parts of the Figure. The numbers are the distances of hydrogen bonds (in Ångstroms). (A) Stereoview of 2Fo−Fc electron density (blue) of active-site residues of Ssu72 contoured at 2σ. The nucleophilic cysteine residue is highlighted in yellow. (B) 2Fo−Fc electron-density map (green) of vanadate contoured at 1.6σ. (C) The interactions between vanadate and active site Cys13 and Asp144. The S–V bond between Cys13 and vanadate (blue broken line) is within the covalent bond range. (D) The bound vanadate mimics the formation of the trigonal-bipyramidal transition state in the phosphoryl-transfer reaction. (E) The hydroxyl group of the Ser20 side chain forms a hydrogen bond with the nucleophilic Cys13. Asp141 is located distal to the active site with its carbon atom coloured blue.

The other conserved residue of the signature motif, Arg19, is located close to the nucleophilic cysteine residue (Figure 2C), and has dual functions in the phosphoryl-transfer reaction that are conserved in all cysteine-based phosphatases [39]. First, the positively charged arginine side chain can help to recruit the phosphate group to the active site through electrostatic interactions. More importantly, the side chain of arginine has the potential to stabilize the tetrahedral intermediate by forming hydrogen bonds with vanadate [40] (Figure 2D). The stability of the co-planar bidentate is essential for the phosphoryl-transfer reaction since lysine cannot fully substitute for arginine in Yersinia PTP [40]. Indeed, in the complex structure of Ssu72 with vanadate, the two nitrogen atoms of the guanidinium group from the highly conserved Arg19 pair with two equatorial oxygen atoms in the vanadate group (Figure 2D). The pairing of the vanadate oxygen and side chain of Arg19 constitutes a six-membered ring hydrogen-bonding network that presumably reduces the activation free energy for the phosphoryl-transfer reaction. The structure mimics the formation of the trigonal-bipyramidal transition state in the phosphoryl-transfer reaction (Figure 2D) and explains how such a high-energy state is stabilized in Ssu72. Additional hydrogen bonding occurs between the backbone amides of the PTP loop and the equatorial oxygen atoms (Figure 2D). As found in other cysteine-based phosphatases, a serine residue is located after this arginine residue and potentially stabilizes the thiolate anion of the nucleophilic cysteine by lowering its pKa [41]. Indeed, in our structure, the hydroxyl group of the Ser20 side chain forms a hydrogen bond with the nucleophilic Cys13, which can potentially promote the deprotonated state of cysteine and facilitate the nucleophilic attack on phosphorus (Figure 2E). The elimination of this hydrogen bond by replacing Ser20 with alanine abolished the phosphatase activity of the protein in our PNPP assay (results not shown).

Aspartate-containing flexible loop

In addition to the -CX5R- PTP motif, another highly conserved residue in all LMWPTPs is an aspartic acid residue, usually 110 amino acids away from the nucleophilic cysteine residue. The three-dimensional position of this aspartic acid residue is conserved in all cysteine-based phosphatases. In Yersinia PTPase, this conserved aspartate residue accounts for the basic limb of the pH-dependence curve [42]. This aspartate residue fills the role of general acid, providing the proton for the leaving group of the substrate after phosphoryl transfer [39]. In the primary sequence of Drosophila Ssu72, two aspartate residues, Asp141 and Asp144 (Asp140 and Asp143 in humans), are highly conserved in this area. The mutation of either of these two aspartate residues in human Ssu72 reduced the kcat/Km of Ssu72 by approx. 20-fold [30]. To identify which is the general acid for the phosphoryl-transfer reaction, we inspected the interaction of the transition-state analogue vanadate with active site residues. In the complex structure of Ssu72 and vanadate, this highly conserved position is occupied by Asp144, which can make two potential hydrogen bonds with vanadate (Figure 2C). In contrast, Asp141 is 12.7 Å away from the vanadate group (Figure 2E). Therefore we conclude that Asp144 is the general acid that contributes to the reaction, and the loss of activity caused by the Asp141 mutation might result from structural disruption. Indeed, when we mutated Asp141 to alanine, the protein was expressed (detected by SDS/PAGE), but did not accumulate in the soluble fraction, probably due to a protein-folding defect.

In other cysteine-based phosphatases, this loop containing the general acid/base is presumably highly flexible and can adopt the active conformation upon ligand binding or swing away when the active site is empty. For example, in classical PTPs, this aspartate residue is 10 Å away from the active site in the absence of substrate and only extends into the active site when it is occupied [39]. In our crystal structure of Drosophila Ssu72 complexed with vanadate, the loop is in the ‘active’ form with Asp144 extending into the active site, mimicking the conformation of an active site for phosphoryl transfer (Figure 2C). Without the vanadate compound, the loop still extends into the active site, forming a hydrogen bond with a highly ordered water molecule.

To biochemically confirm that the addition of the transition state analogue vanadate stabilizes the conformation of the loop containing the aspartate residue, we used DSF to assay the effect of an oxo-anion compound on the folding of Ssu72 [43]. In this method, fluorescence from a dye with affinity for hydrophobic surfaces is monitored. The fluorescence increases upon the exposure of hydrophobic surfaces during protein unfolding. In our experiment, the incorporation of the vanadate compound consistently stabilized Ssu72 and increased the Tm. For yeast Ssu72, Tm increased from 37.8±0.2 °C to 43.5±0.3 °C upon vanadate addition. Similar trends were observed in human Ssu72 (Tm increased from 44.0±0.3 °C to 49.2±0.5 °C) and Drosophila Ssu72 (Tm from 42.5±0.2 °C to 45.6±0.4 °C) (Supplementary Figures S3A–S3C). We also used CD to detect the effect of the vanadate compound to correlate with our DSF results. Even though the buffer condition is slightly different, the stabilization effect of vanadate is consistent, with the Tm of Drosophila Ssu72 increasing from 44.0±0.3 °C to 46.2±0.2 °C (Supplementary Figure S3D). We reason that the binding of vanadate stabilizes the flexible loop containing general acid Asp144 through hydrogen bonding, thereby enhancing protein folding.
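The size of the stabilization can be made explicit by computing the Tm shift for each measurement. The sketch below uses the values quoted above. The combined uncertainty is an illustrative estimate that assumes the two reported errors are independent and add in quadrature; the text does not state this treatment, and the names used are introduced here.

```python
import math

# (Tm, error) in degrees C without and with vanadate, as quoted in the text.
TM_VALUES = {
    "yeast (DSF)": ((37.8, 0.2), (43.5, 0.3)),
    "human (DSF)": ((44.0, 0.3), (49.2, 0.5)),
    "Drosophila (DSF)": ((42.5, 0.2), (45.6, 0.4)),
    "Drosophila (CD)": ((44.0, 0.3), (46.2, 0.2)),
}


def delta_tm(apo, bound):
    """Return (shift, combined_error); errors assumed independent."""
    (tm_apo, err_apo), (tm_bound, err_bound) = apo, bound
    return tm_bound - tm_apo, math.hypot(err_apo, err_bound)


shifts = {name: delta_tm(apo, bound) for name, (apo, bound) in TM_VALUES.items()}

for name, (shift, err) in shifts.items():
    print(f"{name}: +{shift:.1f} +/- {err:.1f} degrees C")
```

Under this assumption every shift exceeds its combined uncertainty several-fold, which supports reading the increase as a genuine stabilization rather than measurement scatter.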

The ‘cap’ region

Unlike the bottom core portion of the Ssu72 protein, whose topology is highly conserved in all cysteine-based phosphatases, the ‘cap’ region (residues 41–92) of Ssu72 is unique and shows no structural similarity to any other protein in a DALI search (Z score less than 4.0) [36]. From Glu41 to Glu57, two short anti-parallel β-strands are connected by an extended flexible loop resembling a ‘finger’, comprising the most dynamic portion of the Ssu72 structure (Figure 1E). This mobile tip shields the active-site pocket from the bulk solvent and limits access to the binding pocket of Ssu72. When we mutated a residue at the tip of the finger, Asp51, to alanine, binding to the CTD substrate was greatly reduced, even though the phosphatase activity was not affected when tested using the PNPP assay (Table 2). Consistent with our mutagenesis result, a complex structure of human Ssu72 with the substrate CTD and the scaffold protein symplekin was recently determined, showing the substrate CTD bound close to this flexible region (Figure 3A) [44].

Figure 3 Important residues for Ssu72 function

(A) Superimposition of Drosophila (Dro) Ssu72 and the human Ssu72–CTD complex (PDB code 3O2Q). Drosophila Ssu72 is coloured light blue, and human Ssu72 is coloured salmon. The CTD peptide is represented as a white stick. The flexible loop region (indicated by ‘Finger’) closes over the CTD-binding site in the human Ssu72–CTD peptide complex structure. (B) Superimposition of the active sites of Ssu72 (cyan) and LMWPTP (PDB code 1XWW, pink). The key residues are represented as white sticks in LMWPTP and as orange sticks in Ssu72. The vanadate is shown in yellow to indicate the active site. (C) Stereoview of the CTD-binding groove presented by superimposition of the CTD peptide from the symplekin–Ssu72–CTD complex structure (PDB code 3O2Q) on to the Drosophila Ssu72 structure. The residues in the mutagenesis study are shown in stick representation.

Unlike that of Ssu72, the active site of LMWPTP (PDB code 1XWW) is highly exposed (Figure 1E). This architecture is consistent with the substrate specificity of these phosphatases. LMWPTPs target receptor tyrosine kinases such as platelet-derived growth factor receptor, insulin receptor [45], ephrin receptor [46] and fibroblast growth factor receptor [47]. Therefore the active site of LMWPTP needs to be open and accessible to accommodate such bulky substrates. On the other hand, the only substrate Ssu72 recognizes in vivo is the CTD of RNA polymerase II, which is highly disordered and presumably adopts a long extended conformation to fit into the more enclosed active site of Ssu72.

One important regulatory mechanism for LMWPTP activity is the differential phosphorylation of Tyr131 and Tyr132 [48]. Phosphorylation of these two sites is proposed to increase the activity of the phosphatase or to recruit the SH2 (Src homology 2) domain of Grb2 to LMWPTP [48]. In contrast, no regulation by phosphorylation has yet been observed for Ssu72. The flexible loop that contains the two tyrosine residues (Tyr131 and Tyr132) in LMWPTP is substantially shorter in Ssu72, and the two tyrosine residues are absent from the Ssu72 sequence (Figure 3B). We speculate that, instead of using phosphorylation as a regulatory mechanism, Ssu72 uses the unique cap domain to modulate phosphatase activity and substrate selectivity. The use of divergent cap domains to limit the accessibility of the active site is a common strategy in biology. For example, the HAD superfamily, a large family of phosphoryl-transfer proteins, utilizes different ‘cap’ domains to mediate phosphoryl transfer on different substrates [34]. A core domain with a Rossmann fold is conserved among all 3000 HAD family members.

Potential proline-binding pocket

Ssu72 can dephosphorylate the hyperphosphorylated CTD in vitro [16]. To test the activity of Drosophila Ssu72 towards the CTD, we used CTD-derived peptides of different lengths and phosphorylation patterns to identify the minimal optimal length for CTD recognition. It has been noted that the binding of the CTD by Ssu72 is highly dependent on the tyrosine residue of the repeat following phos.Ser5 [49]. This is quite different from human Scp, for which Pro3, N-terminal to the phos.Ser5, is essential for CTD recognition [13]. To evaluate this, we first tested a 17-mer CTD peptide with a single phos.Ser5 site (SPSYSPTSPSYSPTpSPS), which is the optimal substrate for Scps. However, the activity was relatively low, with a kcat of 0.19±0.01 s−1 and a Km of 0.85±0.09 mM. Consistently, with a singly phosphorylated double-repeat peptide (YSPTSPSYSPTpSPS) as the substrate, the kcat was 0.44±0.03 s−1 and the Km was 1.2±0.2 mM. The best activity was measured when both Ser5 residues were phosphorylated in the double-repeat CTD peptide (YSPTpSPSYSPTpSPS), giving a kcat of 0.397±0.004 s−1 and a Km of 0.152±0.006 mM (Table 2 and Supplementary Figure S4). This turnover rate is 20-fold higher than that previously reported for yeast Ssu72 (kcat estimated to be 0.02 s−1) [49], and comparable with that of the other Ser5 phosphatases, Scps (Km=0.21±0.05 mM and kcat=2.44±0.04 s−1) [13]. We further identified a minimal optimal peptide (10-mer, YSPTpSPSYSPT) which shows comparable activity with Km=0.96±0.03 mM and kcat=0.42±0.04 s−1 (Table 2). This result suggests that Scp and Ssu72 require different recognition elements for substrate binding, even though both dephosphorylate phos.Ser5 of the CTD. The residues that contribute greatly to the binding of the CTD by Scp are primarily located N-terminal to the phos.Ser5 subject to dephosphorylation [13]. Conversely, Ssu72 requires certain C-terminal residues following the phos.Ser5 for CTD peptide recognition.
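One way to compare the peptides quoted above on a single scale is the ratio kcat/Km, often called the catalytic efficiency. The short Python sketch below computes it from the central values in the text, together with the ratio of turnover numbers behind the 20-fold comparison with yeast Ssu72. It is an illustrative calculation only: the dictionary labels are made up for this example, the reported errors are ignored, and kcat/Km is a standard derived quantity, not one reported by the authors.

```python
# (kcat in s^-1, Km in mM): central values quoted in the text for
# Drosophila Ssu72. The dictionary keys are labels made up for this sketch.
KINETICS = {
    "17-mer, one phos.Ser5": (0.19, 0.85),
    "double repeat, one phos.Ser5": (0.44, 1.2),
    "double repeat, two phos.Ser5": (0.397, 0.152),
    "minimal peptide": (0.42, 0.96),
}

def catalytic_efficiency(kcat, km_mM):
    """Return kcat/Km in M^-1 s^-1, converting Km from mM to M."""
    return kcat / (km_mM * 1e-3)

EFFICIENCY = {name: catalytic_efficiency(*vals) for name, vals in KINETICS.items()}

# Ratio of turnover numbers: doubly phosphorylated peptide (0.397 s^-1)
# against the estimate quoted for yeast Ssu72 (0.02 s^-1).
FOLD_OVER_YEAST = 0.397 / 0.02

for name, eff in EFFICIENCY.items():
    print(f"{name}: kcat/Km = {eff:.0f} M^-1 s^-1")
print(f"kcat ratio vs. yeast Ssu72: {FOLD_OVER_YEAST:.1f}")
```

On this measure the doubly phosphorylated double-repeat peptide has the highest value of the four, mainly because of its lower Km.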

Most recently, the complex structure of human Ssu72 bound to the CTD and scaffold protein symplekin was reported [44]. A particularly interesting question about Ssu72 is the identity of the residues that play essential roles in substrate recognition. We reasoned that mutation of such residues would not interfere with active-site phosphatase activity, as characterized by the PNPP assay, but should result in much lower activity towards CTD peptides. To investigate which residues are important specifically for the substrate recognition of Ssu72, we mutated residues located in the substrate-binding groove. We were particularly interested in the residues that might contribute to the binding of Pro6 in the CTD sequence. Unlike in complexes with other CTD-binding proteins such as Scp1 or Cgt1 [1], Pro6 adopts a unique cis conformation upon Ssu72 binding [44]. Through mutagenesis and steady-state kinetics, we identified five residues that play pivotal roles in the specific activity towards the CTD (Table 2). A pair of methionine residues, Met17 and Met85, clamp on to Pro6 of the CTD, with Pro46 on the opposite side, through hydrophobic interactions (Figure 3C). Mutation of either of the two methionine residues or of Pro46 to alanine results in the loss of Ssu72 activity towards the CTD peptide, even though the activity towards PNPP is unaffected or insignificantly reduced (Table 2). On the other hand, the loss of activity due to mutation of Tyr77 is surprising, since Tyr77 does not directly contribute to substrate binding. Instead, it orients the side chain of Asn54 through hydrogen bonding and stabilizes the conformation of the highly flexible region (residues 41–57). Comparison of the human Ssu72–CTD complex structure with our Drosophila structures reveals that the Ssu72 structure is highly conserved upon vanadate incorporation, the only difference being at this highly flexible region (residues 41–57), which encloses the CTD substrate in the complex structure (Figure 3A). This region is formed by two anti-parallel β-strands with a flexible loop and is highly dynamic in the apo structure (Figure 1E). The phosphate group of phos.Ser5 of the CTD is found at the location of the vanadate ion in our structure. Residues on this flexible loop are particularly important for the function of Ssu72. The D51A mutant exhibits phosphatase activity comparable with that of wild-type Ssu72 in the PNPP assay, but is greatly compromised when phosphorylated CTD peptides are used as the substrate (Table 2 and Figure 3C). Another mutation, Y56A, disrupts the protein activity in both the PNPP assay and the Malachite Green assay (Table 2). Not all residues located in the groove contribute to Ssu72 activity. For example, the alanine mutation of Leu82, which is located very close to the substrate CTD, results in no loss of phosphatase activity or substrate specificity (Table 2).

Ssu72 is an essential protein in eukaryotes whose phosphatase activity plays a key role in RNA polymerase II recycling and the selection of the transcription termination pathway. In the present study, we solved the high-resolution X-ray crystal structures of Drosophila Ssu72 in the apo form and in complex with vanadate, mimicking the transition state of the phosphoryl transfer. These structures show that Ssu72 represents a unique subfamily of LMWPTP with a ‘cap’ domain that confers its substrate specificity. Moreover, kinetic studies of Drosophila Ssu72 using phos.Ser5 peptides as the substrates demonstrated that Ssu72 can dephosphorylate phos.Ser5 of the CTD peptide. A deep groove engulfing the nucleophile Cys13 between the core domain and the ‘cap’ domain was observed in the Ssu72 structures and is predicted to be the binding site for the CTD peptide. In addition, the complex structure of Ssu72 and vanadate mimics the formation of the trigonal bi-pyramidal transition state in the phosphoryl-transfer reaction and explains how such a high-energy state is stabilized in Ssu72.


Yong Zhang, Mengmeng Zhang and Yan Zhang designed the experiment. Yong Zhang conducted the experiments. Mengmeng Zhang and Yan Zhang wrote the paper.


This work was supported by the National Institutes of Health [grant number R03DA030556] (to Y.Z.) and a University of Texas startup grant [grant number 19171662] (to Y.Z.).


We thank the staff of the Advanced Light Source (ALS), Berkeley, CA, U.S.A. for assistance during data collection at beam line 5.0.1. We also thank Dr Katherine Brown and Dr Annie Gnanam for help with the CD measurements, and Dr Xi Chen for helpful discussion.


  • The atomic co-ordinates and structure factors (PDB codes 3OMW and 3OMX) have been deposited in the Protein Data Bank.

Abbreviations: CTD, C-terminal domain; DSF, differential scanning fluorimetry; DTT, dithiothreitol; HAD, haloacid dehalogenase; LMWPTP, low-molecular-mass protein tyrosine phosphatases; MR, molecular replacement; Ni-NTA, Ni2+-nitrilotriacetate; PEG, poly(ethylene) glycol; phos.Ser5, phosphorylated Ser5; PNPP, p-nitrophenyl phosphate; rmsd, root mean square deviation; Scp, small CTD phosphatase

