Biochemical Journal

Review article

Evolution of protein phosphatases in plants and animals

Greg B. G. Moorhead, Veerle De Wever, George Templeton, David Kerk


Protein phosphorylation appears to be a universal mechanism of protein regulation. Genomics has provided the means to compile inventories of protein phosphatases across a wide selection of organisms and this has supplied insights into the evolution of this group of enzymes. Protein phosphatases evolved independently several times yielding the groups we observe today. Starting from a core catalytic domain, phosphatases evolved by a series of gene duplication events and by adopting the use of regulatory subunits and/or fusion with novel functional modules or domains. Recent analyses also suggest that the serine/threonine specific enzymes are more ancient than the PTPs (protein tyrosine phosphatases). It is likely that the latter played a key role at the onset of metazoan evolution in conjunction with the tremendous expansion of tyrosine kinases and PTPs at this point. In the present review, we discuss the evolution of the PTPs, the serine/threonine specific PPP (phosphoprotein phosphatase) and PPM (metallo-dependent protein phosphatase) families and the more recently discovered phosphatases that utilize an aspartate-based catalytic mechanism. We will also highlight examples of convergent evolution and several phosphatases which are unique to plants.

  • genomics
  • kelch phosphatase
  • laforin
  • phylogenetics
  • protein tyrosine phosphatase (PTP)
  • starch excess 4 (SEX4)
  • TFIIF associating component of CTD phosphatase/small CTD phosphatase (FCP1/SCP)


There are few individuals whose contribution to science has had a more profound effect on the study of biology than that of Charles Darwin. It is an honour to write an article on protein phosphatase evolution to commemorate his 200th birthday and 150 years since the 1859 publication of “On the Origin of Species”. We live in a time when we are beginning to understand evolution at a molecular level and, having passed into the post-genomic era, we are currently seeing a transition to an age of evolutionary and comparative genomics, which ideally will culminate in evolutionary proteomics. These emerging fields will undoubtedly have a profound effect not only on evolutionary and genomic biology, but on biological research in general.

Recent studies have highlighted the vast number of modified proteins within proteomes and the often large number of functional groups found on any given protein. Many studies have also reinforced the view that protein phosphorylation is a common means of regulating most cellular processes, is ancient in origin, and is probably a universal covalent modification among organisms [1–3]. This is highlighted by the fact that protein kinases and phosphatases, which respectively add and remove phosphate on proteins, constitute 2–4% of the genes in a typical eukaryotic genome [4,5]. Early studies on protein phosphorylation focused on its role in modifying enzyme activity, while more recently it has been found to play a role in protein localization and protein turnover, and to provide a specific interaction site for other proteins [6]. As with protein kinases, the study of protein phosphatases began in the field of glycogen metabolism and, through biochemistry, molecular biology and most recently genomics, we can now catalogue the phosphatase complement of organisms and begin to examine evolutionary relationships between the protein phosphatases. This review will discuss the spectrum of protein phosphatases, evolutionary relationships, convergent evolution and the role of phosphatases in the evolution of metazoans.


Proteins can be phosphorylated on nine amino acids (tyrosine, serine, threonine, cysteine, arginine, lysine, aspartate, glutamate and histidine) with serine, threonine and tyrosine phosphorylation being predominant in eukaryotic cells and playing key regulatory roles [5]. Although phosphorylation on aspartate and histidine residues in bacterial two-component systems has been well characterized, several recent phospho-proteomic studies have shown bacterial protein phosphorylation on serine, threonine and tyrosine to be more prevalent than originally thought [7–9]. In three bacterial species, 63–79 proteins were found to be phosphorylated, with a surprising distribution on serine, threonine and tyrosine of approx. 69, 22 and 9% respectively. In humans, phosphorylation on serine, threonine and tyrosine is approx. 86.4, 11.8 and 1.8% respectively [10]. The enzymes that dephosphorylate these amino acids (serine, threonine and tyrosine) form four groups based on unique catalytic signatures/domain sequences and substrate preference. The majority of phospho-serine and phospho-threonine dephosphorylation is accounted for by the PPP (phosphoprotein phosphatase) [PP (protein phosphatase) 1, PP2A, PP2B (or PP3), PP4, PP5, PP6 and PP7] and PPM (metallo-dependent protein phosphatase; PP2C) families, which include the enzymes shown in parentheses. Remarkably, the PPM and PPP families are unrelated in sequence and probably evolved from two unique ancestral genes, but converged to have highly related structures at the catalytic centre [11]. The PTP (protein tyrosine phosphatase) family is defined by the catalytic signature CX5R and is very diverse in domain structure and substrate preference, with some family enzymes now recognized to dephosphorylate complex carbohydrates, mRNA and phosphoinositides [12,13]. Finally, in the most recently classified group, the aspartate-based phosphatases, catalysis is driven by an aspartic acid signature (DXDXT/V); this group includes the FCP/SCP [TFIIF (transcription initiation factor IIF)-associating component of CTD (C-terminal domain) phosphatase/small CTD phosphatase] and HAD (haloacid dehalogenase) family enzymes. We refer readers to several excellent reviews on phosphatase classification and functions [12–19].
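The catalytic signatures named above lend themselves to a simple pattern search. The following is a minimal sketch, assuming that "X" denotes any residue and that "T/V" denotes threonine or valine; the function name and the toy fragments are invented for illustration and are not real protein sequences. Patterns this short will also match by chance in unrelated proteins, so a hit is only a starting point for classification, not evidence of family membership.

```python
import re

# Signature patterns as written in the text; "X" is read as "any residue".
SIGNATURES = {
    "PTP (CX5R)": r"C.{5}R",
    "aspartate-based (DXDXT/V)": r"D.D.[TV]",
}

def find_signatures(sequence):
    """Return {family: [(start, matched_text), ...]} with 0-based start positions."""
    hits = {}
    for family, pattern in SIGNATURES.items():
        # Wrapping the pattern in a lookahead reports overlapping matches too.
        lookahead = re.compile(f"(?=({pattern}))")
        hits[family] = [(m.start(), m.group(1)) for m in lookahead.finditer(sequence)]
    return hits

# Toy fragments invented for illustration only.
toy_ptp = "AAVHCSAGIGRSGTAA"
toy_asp = "MKLDLDGTLLAA"

print(find_signatures(toy_ptp))
print(find_signatures(toy_asp))
```

In practice, family assignment relies on alignment of the whole catalytic domain, with the signature serving as a quick sanity check.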


Having a complete inventory of phosphatases for several organisms reveals many aspects of this group of enzymes, with one striking feature being immediately apparent: ancient phosphatase domains functionally evolved by docking to novel regulatory subunits (most PPP members), or through additional domains that were acquired by fusing with these catalytic domains at the gene level (PTP, PPM and aspartate-based families) (see Figure 1). Although the PTP core domain has also evolved to accommodate specific substrates, other phosphatase catalytic subunits typically display broad substrate specificity in vitro. As a result, specificity of function is brought about by an array of modules or domains and regulatory subunits. Domains and subunits provide a means of subcellular targeting, allow docking of other proteins and inositol lipids and often permit regulation by covalent modifications [20]. The evolutionary strategy of adding domains on to a core unit was first noted for the PTPs, but is also apparent for the PPM (PP2C) family (especially in plants) and a number of other phosphatases (e.g. PP5). The PPP family members PP1 and PP2A are some of the most highly conserved enzymes known. Conversely, they obtain specificity for their catalytic engines via interactions with many novel regulatory subunits. The best example of this is PP1, with nearly 100 recognized regulatory subunits (Figure 1). Two of the more recently identified members of the PPP family, PP4 and PP6, followed a similar evolutionary route as PP1 and PP2A, with a remarkable conservation of regulatory subunits across species [21,22]. For instance, the PP4-interacting proteins, R2 and R3α and β of humans, have readily apparent orthologues in the Saccharomyces cerevisiae genome that have been shown to complex with the yeast PP4 catalytic subunit [23].

Figure 1 Distinct evolutionary routes for protein phosphatases

Catalytic and regulatory modules have evolved by two means to accommodate specific substrates and/or functions. (A) represents evolution by domain fusion on a genome level, exemplified by PTP evolution. This evolutionary mode is also followed by some PPP members (e.g. kelch-like phosphatases and aspartate-based phosphatases). The ancestral catalytic domain (blue rectangle) fused with a regulatory module (red oval). Subsequent evolution of regulatory modules (pentagon, oval, triangle) confers unique properties on the catalytic domain. Modifications in the catalytic domain throughout evolution provide further substrate specificity (depicted by the shading of the catalytic domain). Some catalytic domains have become inactive (red cross). (B) reflects evolution via complex formation on a proteome level, conceptualized using PP1 as an example. The catalytic subunit (green oval) accommodates novel substrates via interaction with numerous regulatory subunits (rounded triangle, pentagon, star). The RVxF/W motif (black bar) is the primary PP1-docking motif.


The PTPs are defined through their common signature motif (CX5R) that drives catalysis [13]. It is thought that this motif and catalytic mechanism evolved independently three times to yield three groups or classes of PTPs [12]. The human genome encodes more than 100 PTPs with orthologues for all but one in mouse [12]. Of the more than 100 PTPs, 11 are catalytically inactive and 16 dephosphorylate either glycogen (1), mRNA (2) or phosphoinositides (13), and not proteins. It is thought that the remainder of these PTPs dephosphorylate phosphotyrosine and, in some cases, also phosphoserine and phosphothreonine. Class I, which include the classic and dual-specificity enzymes, have a common PTP domain structural fold and are by far the largest group of PTPs and are further divided into subfamilies. The classic enzymes (receptor and non-receptor) are given this name as they defined the PTPs and they all dephosphorylate tyrosine residues. The DSPs (dual-specificity phosphatases; also known as DUSPs; this group includes all non-protein phosphatase PTPs) are further divided into the MAPKP [MAPK (mitogen-activated protein kinase) phosphatase], slingshot, PRL (phosphatase of regenerating liver), atypical DSP, CDC14 (cell division cycle 14), PTEN (phosphatase and tensin homologue deleted on chromosome 10) and MTMs (myotubularins). Class II and III are represented by the LMPTP (low-molecular-mass PTP) and Cdc25 isoforms and are thought to have evolved from a bacterial rhodanese-like enzyme and a bacterial arsenate reductase respectively (see [12,13,24] for further details).

In plants, no true protein tyrosine kinases exist, even though in Arabidopsis there are at least 1055 protein kinases (compared with 518 in humans) of which ∼630 are RLKs (receptor-like kinases). These RLKs are most like human Pelle kinases that display serine/threonine kinase activity, and as such it is believed that all plant RLKs are serine/threonine kinases [25,26]. Consistent with the absence of tyrosine kinases is the near total lack of plant phosphotyrosine-specific phosphatases in the PTP family (i.e. classic enzymes) [27–29]. As with plants, other genomes examined that were found to lack tyrosine kinases also have very few PTPs [30]. For example, most species that belong to the apicomplexa (protozoans) have no class II, class III or classic PTPs and only a very few DSPs (in this case, PRL and MAPKP). Apicomplexa genomes also have no obvious protein tyrosine kinases [30]. As will be discussed below, it is now thought that the PTPs that dephosphorylate phosphotyrosine evolved before tyrosine kinases due to the action of promiscuous serine/threonine kinases that phosphorylate tyrosine residues. The appearance of true tyrosine kinases coincides with a dramatic expansion of the PTP superfamily just before the evolution of metazoans.

Sequence and structural analysis of the class I PTPs reveals a universal structural fold supporting the concept of evolution from a common ancestral gene [12]. Consistent with this is evidence of gene duplication. The highly sequence-related pairs TCPTP (T-cell PTP)/PTP1B, PTPα/PTPε and PTPD1/PTPD2 each share very similar exon structures in the genomes of human, mouse and chimpanzee [31]. Human DUSP24 is localized adjacent to DUSPs 13a and 13b indicating two ancestral gene duplications. Humans have 21 RPTPs (receptor PTPs) and 12 of them have a tandem arrangement of intracellular catalytic domains (D1 is membrane proximal and D2 is membrane distal). With only one exception, all D1 are active and all D2 are inactive, yet all D2s are still well conserved compared with other PTP domains [13,32]. Phylogenetic analysis of PTP domains shows that D1 and D2 are from separate clades (i.e. all D1s are most like each other), but group together when compared with all other PTP domains, indicating that the PTP domain duplicated within the ancestral PTP gene before the whole gene duplicated and ultimately gave rise to the 12 RPTPs [32]. Recent analysis has shown that the catalytic cysteine residue of the CX5R motif is susceptible to oxidation due to the same properties that make it a good nucleophile [33]. Nature has evolved two mechanisms to protect this group from irreversible oxidation [34]. PTP1B has been shown to form a reversible cyclic sulfenamide with the catalytic cysteine residue forming a bond with the amide nitrogen of the neighbouring serine residue. PTPs from the DSP, LMPTP and CDC25 groups have been demonstrated to form disulfide bonds between the catalytic cysteine residue and another cysteine residue of the same protein [34]. Although the D2 domains of RPTPs are inactive, they intriguingly maintain their CX5R signature and are inactivated by loss of other key catalytic amino acids. The remarkable conservation of D2 PTP domains and specifically the CX5R motifs has been postulated to be due to the potential role of these domains as cellular redox sensors [34].

Certain members of the PTP family exemplify how sequence and function conservation can go hand in hand. The class I dual-specificity enzyme Cdc14 controls mitotic exit in metazoans and yeast. Searching for this enzyme in plants and eukaryotic (green) algae revealed a single orthologue in Chlamydomonas reinhardtii with sequence coverage over both the N-terminal region and PTP domain [27]. Cdc14-like protein(s) are also present in higher-plant genomes, yet phylogenetic analysis showed that they form a distinct clade, separate from characterized Cdc14 proteins. These Cdc14-like sequences lack key catalytic and substrate targeting residues and thus could not function as active phosphatases, but seem to have been co-opted into protein–protein interaction domains [27]. Although putative Cdc14 orthologues have been noted in many protozoans, there is an absence of this PTP in all apicomplexans, except Cryptosporidium [30].

The single class II enzyme, also known as LMPTP, is tyrosine-specific and widespread, being found in all kingdoms including Archaea and most bacterial species, and is one of only three PTPs in yeast [12,35]. A survey of several bacterial LMPTPs shows that the enzyme is 30–40% identical with the human enzyme. This astonishing conservation suggests an ancient conserved function, but to date no clear role has been defined for LMPTP.

The sole class III PTP, also known as Cdc25, dephosphorylates the cell-cycle protein kinase Cdc2 [36]. The Cdc25s display sequence similarity to rhodaneses and are thought to have evolved from a bacterial, rhodanese-like protein [12,27]. Our genomic analysis of several plant and algal species for phosphatases related to human Cdc25 revealed that, as with Cdc14, there appears to be no Cdc25 in plants [27]. Several, but not all, unicellular parasites have Cdc25 orthologues. A notable absence is again in the apicomplexa [30]. Candidate plant and algal Cdc25s actually form a clade with several arsenate reductases and not with the well-characterized human and yeast Cdc25s, suggesting that these candidates are really arsenate reductases. These sequences also lack the N-terminal regulatory domain found in true Cdc25s. This insight, along with the lack of Cdc14 in plants, indicates that the role that these phosphatases play in cell-cycle control was established after ancestral plants and animals/yeast diverged, or was lost early in the plant lineage.

Inactive PTPs

A number of enzymes are inactive, yet conserved, in many organisms (e.g. the pseudokinases), which opens discussion towards their biological function [37–39]. The phosphatases are no exception to this, particularly the PTPs since the core catalytic motif is well defined and inactive mutations are readily identified [12,13]. Inactive PTPs display PTP-like domain structures, but possess mutations in the cysteine or arginine residues of the CX5R motif, in an upstream conserved aspartate residue or in combinations of the three, all exemplified by the STYX domain proteins and the D2 domain of RPTPs [13,40]. In the case of STYX, such alterations are thought to disrupt the enzymatic activity of the phosphatase, yet potentially maintain interaction with phosphorylated substrates ([12,40] and references therein), not unlike how a 14-3-3 protein docks a phosphoserine or threonine motif. Mutations must have occurred early in evolution since various inactive PTPs {e.g. tensins, PTPLA/Bs [PTP-like (proline instead of catalytic arginine), member A/B]} are conserved within eukaryotes [41,42].

The MTMs, a well-defined class I PTP subfamily, are phosphoinositide phosphatases specific for the 3-position of PtdIns3P and PtdIns(3,5)P2. These phosphoinositides provide docking sites for proteins involved in endosomal–lysosomal membrane trafficking. Only a few MTMs have been found in a number of apicomplexan species, while we identified two presumably active MTMs in Arabidopsis [27]. In humans, seven of the 16 MTMs are predicted to be inactive [12]. Their characterization indicates that they play essential roles, some of which are linked to human disease. For example, loss of either MTMR2 or MTMR13, an active and an inactive MTM respectively, causes a form of Charcot–Marie–Tooth disease [43]. Hence, the inactive protein is equally essential to prevent disease onset, and both proteins are thought to interact.

Another family of potentially inactive PTPs is the tensins. The tensin family has four members in humans (Tensins 1–3 and Cten), each containing an SH2 (Src homology 2) and PTB (phosphotyrosine-binding) domain at their C-termini [44]. Primary sequence analyses predict either an inactive PTP domain [45] or a PTEN-C2-like domain in the N-terminal region of Tensins 1–3 [46]. Cten lacks the N-terminal region and therefore the putative PTP domain. Recently, expression of Tensins 1 and 2 and Cten was shown to stimulate cell migration, whereas Tensin 3 expression had an inhibitory effect. Interestingly, our sequence alignment shows that Tensin 3 maintains the CX5R motif, whereas Tensins 1 and 2 display mutations in these residues (cysteine to asparagine and arginine to lysine respectively; results not shown), and Cten obviously lacks the CX5R motif.

Finally, the PTP-like proteins PTPLA/B are classified as inactive PTPs, although they have no sequence homology with the PTPs except for the CX5R motif, where in their case the arginine residue is mutated into a proline [47]. Significant progress towards the elucidation of their biological function has been made with the PTPLA homologues in plants and baker's yeast, PAS2 (Pasticcino 2) and Phs1 respectively. Reciprocal complementation studies identified them as functional orthologues [48], while PTPLA/B expression also rescues Phs1-impaired cells [49]. Both PAS2 and Phs1 function as dehydratases in the elongase complex required for the production of very-long-chain fatty acids. Further studies are required to clarify the role of the CX5R motif in these proteins and the function of PTPLA in metazoan organisms.

Overall, these examples show that loss of key catalytic residues, but overall domain conservation, led these proteins to evolve from their initial function as phosphate-removing enzymes towards different functions, e.g. phosphotyrosine-recognition with potential scaffolding and localization capacities.
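The signature changes described for the inactive PTPs can be flagged mechanically once a sequence has been aligned to the CX5R motif. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the 7-residue windows are toy strings rather than real sequences, and the window is assumed to be already aligned to the signature position. An intact signature does not imply activity, since the D2 domains of RPTPs keep CX5R but are inactivated by changes elsewhere.

```python
def classify_cx5r(window):
    """Report whether a 7-residue window keeps the Cys and Arg of the CX5R signature."""
    if len(window) != 7:
        raise ValueError("expected a 7-residue window (C, five residues, R)")
    changes = []
    if window[0] != "C":
        changes.append(f"C->{window[0]}")   # catalytic cysteine replaced
    if window[6] != "R":
        changes.append(f"R->{window[6]}")   # signature arginine replaced
    if not changes:
        return "signature intact"
    return "signature altered: " + ", ".join(changes)

# Toy windows mimicking the kinds of substitution mentioned in the text.
print(classify_cx5r("CSAGIGR"))  # both signature residues present
print(classify_cx5r("NSAGIGR"))  # cysteine replaced, as described for one tensin
print(classify_cx5r("CSAGIGP"))  # arginine replaced by proline, as in PTPLA/B
```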


Until recently, phosphotyrosine signalling was considered the hallmark of intercellular communication in multicellular animals, as unicellular organisms (including yeast) lack true tyrosine kinases. Previously, a moderate selection of SH2 and PTP domains was noted in the genomes of plants, fungi and Dictyostelium and in a few cases shown to be functional, as predicted [50–52]. Baker's yeast, for example, has three PTPs [12,52,53] and phospho-proteomics of several yeast species has shown a handful of tyrosine phosphorylated proteins, even in the absence of true tyrosine kinases [54–56]. This is explained by what has been described as promiscuous serine/threonine kinase activity and the action of the well-characterized MAPKK (MAPK kinase) enzymes that phosphorylate MAPK activation loops in a TXY motif [57].

The choanoflagellates are unicellular protists and, through comparative genomics of mitochondrial DNA, they are currently considered the closest living relatives of metazoans [58]. Recent completion of the first choanoflagellate (Monosiga brevicollis) genome has revealed an abundance of tyrosine kinases (∼128), phosphotyrosine-specific PTPs (∼39) and proteins with the phosphotyrosine-docking domains SH2 (∼123) and PTB (∼20) [39,57,58]. Surprisingly, these numbers exceed those of any metazoan and support the concept that the phosphotyrosine-signalling machinery was present prior to the evolution of multicellular animals.

These landmark genomic analyses tell us that although phosphorylation on tyrosine was present prior to metazoan evolution, it had a limited role until tyrosine kinases evolved. With the tools of kinase, phosphatase and phosphotyrosine-docking domain at hand, a rapid expansion of all components occurred in some pre-metazoan ancestor that ultimately led to choanoflagellates and multicellular animals. A comparison of the 38 classic (tyrosine-specific) PTPs of humans with the 39 M. brevicollis PTPs starkly shows that only four are probably true orthologues, which presumably evolved in a pre-metazoan ancestor, while the remainder evolved independently in each lineage with unique combinations of signalling domains. This rapid expansion of the PTPs and the phosphotyrosine-signalling system certainly played, to some extent, a role in metazoan evolution. In contrast, as discussed in [39,57], other well-recognized signalling domains/systems (excluding small GTPases, phosphoserine/threonine kinases, phosphatases and docking domains) are much more ancient and appear in combinations that are common to M. brevicollis, metazoans and other eukaryotes, like plants and fungi, serving as a reminder of how ancient other (non-PTP) phosphatases are.


The PPM family phosphatases, which include the PP2C and pyruvate dehydrogenase phosphatase, are Mn2+/Mg2+-dependent serine/threonine-specific enzymes that are resistant to microcystin, okadaic acid and other classic toxin inhibitors of the PPP family. Unlike most PPP members, the PP2C enzymes do not have additional subunits but, like the PTPs, do display a wide variety of additional domains that confer unique functions. All eukaryotic PPM enzymes have 11 conserved motifs with nine highly conserved amino acids, four of which are aspartate residues that co-ordinate metal ions necessary for catalysis. Analysis of 16 archaeal genomes shows the presence of one PPM gene, while bacteria display putative PPM genes in 27 out of 121 genomes examined [59], grouped into two subfamilies. Interestingly, subfamily II enzymes usually have additional domains with most being GAF and PAS domains, while subfamily I genes are nearly always lone catalytic domains [59]. While most bacterial species with PPMs have one or a few PPMs, two Streptomyces species have as many as 49 PPMs. Phylogenetic analysis of these enzymes with eukaryotic PPMs reveals that the archaeal and bacterial enzymes are rooted in the eukaryotic PPMs, suggesting that the Mn2+/Mg2+-dependent enzymes originated in eukaryotes and radiated into bacteria and archaea by horizontal gene transfer [59]. Because PPM genes are spread widely throughout the bacterial kingdom, it is thought PPMs originated very early in the eukaryotic lineage, then spread horizontally into bacteria early in that lineage, and probably transferred several times. Support for an early eukaryotic origin is upheld by the observation that PPM phosphatases are found throughout eukaryotes, including plants, and have no sequence homology with the ancient PPP enzymes.

The PPM enzymes are the largest phosphatase family in plants, with 76 members in Arabidopsis thaliana. Phylogenetics assembles them into ten subfamilies with six non-clustered genes [60]. What is apparent from analysis of this group is the large number of N- and C-terminal additions to these genes, which are thought to confer specificity and/or localization on enzyme function. These domains include MAPK-docking regions, FHA (forkhead associated) domains, MORN (membrane occupation and recognition nexus) repeats and putative transmembrane-spanning domains [60].

The PP2C members of humans have orthologues in most other vertebrates [61]. Each vertebrate PP2C member (for instance all PP2Cεs) clusters with the others, supporting the idea that the family arose by a series of duplication events. A further phylogenetic reconstruction of the metazoan PP2Cs reveals that all enzymes cluster into two groups with one group having unique sequences near the active site, as previously noted in mammalian PP2Cs [61]. Analysis also indicates that two series of rapid duplications took place, one before the emergence of bilaterians and a second at the emergence of vertebrates. The vertebrate-specific duplications have been linked to tissue-specific expression patterns and may contribute to tissue development in vertebrates. Interestingly, many other (non-phosphatase) gene families, including the protein kinase subfamilies, evolved via a series of duplication events [4], reminiscent of PP2C, supporting the concept that expansion of signalling protein families was a major driving force during key events of metazoan evolution.


The PPP family is defined by three signature motifs (-GDXHG-, -GDXVDRG- and -GNHE-) within a ∼280 amino acid catalytic domain. PPP enzymes are regarded as very ancient, with most members widely distributed in all eukaryotes and most bacterial and archaeal genomes having at least one PPP-like member. A comparison of the PPP complements of humans and Arabidopsis shows that plants have a complete absence of PP2B, but an expansion of the number of PP1, PP2A, PP4 and PP6 genes and the addition of a number of novel enzymes. Remarkably, the catalytic domains of mammalian PP1s are 76–88% and 90% identical with plant and fungal PP1s respectively. Perhaps more remarkable is that human PP1β is 100% identical with the rat, mouse, rabbit and chicken enzymes and 97% identical with the zebra fish, goldfish and Atlantic salmon enzymes. Other noteworthy PPP complements are found in the protozoan parasites Encephalitozoon cuniculi, with only five PPP enzymes, and Entamoeba histolytica and Trichomonas vaginalis, both having large expansions with 81 and 169 predicted PPPs respectively [62], compared with 13 and 26 in humans and plants [27]. In addition there is the notable absence of PP7 in yeast, but the presence of the unique PPP members PPQ, PPZ1, PPZ2 and PPG1.
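Identity figures such as those quoted above depend on how the comparison is counted. The following minimal sketch assumes two sequences that have already been aligned to equal length, and it counts identical residues over the columns where neither sequence has a gap; other conventions (for example, dividing by the full alignment length) give different numbers. The function name and the toy strings are illustrative only.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns (an assumption; conventions differ)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

# Toy aligned fragments, not real PP1 sequences.
print(percent_identity("GDIHGQ", "GDLHGQ"))
print(percent_identity("AC-E", "ACDE"))
```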

We have generated an unrooted tree for the PPP members in humans and Arabidopsis thaliana (Figure 2), and the result is consistent with similar analyses using a broader range of eukaryotic PPP members [19,62,63]. PP5, PP7 and two plant-specific enzymes form a clade separate from other PPP members, suggesting two distinct branches that probably separated early in eukaryotic evolution. It has been proposed that the N-terminal TPR (tetratricopeptide repeat) domains of PP5 and the C-terminal EF hand extensions of PP7 were originally separate regulatory subunits and gene fusion yielded the products we see today [62]. Because these extra domains are conserved across eukaryotes, this too must have been an early event in PP5 and PP7 evolution. With the presumed exception of the Kelch-domain phosphatases, which were derived from a PP1 ancestor (Figure 1 and [62], discussed below), all other PPP enzymes of this branch have regulatory subunits that define the function of these proteins. As indicated previously, because a majority of these additional regulatory subunits are present in genomes of other eukaryotes and are often functionally conserved, it is likely that most originated early in eukaryotic evolution (Figure 1). This may explain why these PPP enzymes are so highly conserved. Having these interaction partners that define the function of the catalytic subunit certainly would hinder additional evolution of the catalytic engine. For instance, the majority of the PP1 regulatory subunits have a primary docking site called the RVXF/W motif that allows interaction with PP1 in a hydrophobic cleft on the surface of the enzyme [64–66]. Undoubtedly this docking site was exploited in the history of PP1 allowing it to be recruited again and again to specifically dephosphorylate new target proteins by incorporation of this short docking motif in the regulatory subunit (Figure 1). Consistent with this proposal is the knowledge that nearly 100 PP1-binding proteins have now been identified and it is expected that there are at least that many more several times over [14,24].
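Because the PP1-docking motif is so short, candidate sites can be listed with a simple scan. The sketch below reads RVXF/W literally as Arg-Val-any residue-Phe or Trp, which is an assumption made for illustration; it is not a validated predictor, and a four-residue pattern will also occur by chance in proteins that never bind PP1. The function name and toy sequences are hypothetical.

```python
import re

# Literal reading of "RVXF/W": R, V, any residue, then F or W.
# The lookahead allows overlapping candidates to be reported.
RVXF = re.compile(r"(?=(RV.[FW]))")

def find_rvxf(sequence):
    """Return [(start, matched_text), ...] for candidate docking motifs (0-based)."""
    return [(m.start(), m.group(1)) for m in RVXF.finditer(sequence)]

# Toy sequences invented for illustration.
print(find_rvxf("MSKRVSFADE"))
print(find_rvxf("AARVSAGG"))
```

A candidate site would still need experimental support, since accessibility and flanking residues are not considered here.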

Figure 2 Phylogenetic analysis of the PPP family enzymes from Homo sapiens and A. thaliana

An unrooted tree was generated by comparison of catalytic domains from members of the PPP family from human and A. thaliana. Catalytic domains were aligned using ClustalX, hand-optimized using GeneDoc, and the bootstrap tree was generated using the neighbour-joining function of ClustalX [27]. Proteins with an At suffix are from A. thaliana, while an Hs suffix denotes human sequences. The Arabidopsis sequences are: TOPP1 (At2g29400), TOPP2 (At5g59160), TOPP3 (At1g64040), TOPP4 (At2g39840), TOPP5 (At3g46820), TOPP6 (At5g43380), TOPP7 (At4g11240), TOPP8 (At5g27840), TOPP9 (not generally referred to as TOPP9, but listed as this here for clarity; At3g05580), AtPP2A-1 (At1g59830), AtPP2A-2 (At1g10430), AtPP2A-3 (At3g58500), AtPP2A-4 (At2g42500), AtPP2A-5 (At1g69960), AtPP4-1 (At4g26720), AtPP4-2 (At5g55260), AtPP5 (At2g42810), AtPP6 (At1g50370), AtFYPP3 (At3g19980) and AtPP7 (At5g63870). The human sequences are (GenBank® accession numbers): HsPP1α (NP_002699), HsPP1β (NP_002700), HsPP1γ (NP_002701), HsPP2A-α (NP_002706), HsPP2A-β (NP_004147), HsPP2B-α (NP_000935), HsPP2B-β (NP_066955) and HsPP2B-γ (NP_005596), HsPP4 (NP_002711), HsPP5 (NP_006238), HsPP6 (NP_002712), HsPP7-1 (PPEF1; NP_006231) and HsPP7-2 (PPEF2; NP_006230).
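The bootstrap step mentioned in the legend rests on resampling alignment columns with replacement and rebuilding the tree from each replicate. The sketch below shows only that resampling step in generic form; it is not ClustalX's implementation, and the function name, the toy alignment and the fixed seed are assumptions made for illustration.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: dict mapping sequence name -> aligned sequence (all equal length).
    rng: a random.Random instance, passed in so that replicates are reproducible.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # Draw the same column indices for every sequence so columns stay intact.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}

# Toy alignment, not real PPP catalytic domains.
toy = {"seqA": "ACDE", "seqB": "ACDF", "seqC": "GCDE"}
replicate = bootstrap_alignment(toy, random.Random(0))
print(replicate)
```

Each replicate would then be passed to the tree-building method, and the fraction of replicates recovering a given grouping gives its bootstrap support.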

A unique group of PP1-related enzymes with extra domains is the PPKLs (PP1 and kelch-like). The PPKL enzyme family is defined by a large, kelch-repeat-containing N-terminal extension and a C-terminal PP1-like phosphatase domain. The kelch domain is a widespread and ancient motif, with limited conserved residues [67]. The founding members of the PPKL family are the Plasmodium falciparum protein PfPPα [68] and four proteins identified in A. thaliana [28,69]. BLAST searches with the kelch-repeat domain of these five proteins against translated genomes identified further family members in the kingdom plantae (green algae, vascular plants, mosses) and the superphylum alveolata, with candidates in both the obligate parasitic apicomplexa and the free-living aquatic ciliophora (Supplementary Figure S1). We did uncover PPKL members in Solanum lycopersicum (tomato) and Vitis vinifera (grape) yet, despite extensive searches, which included newly sequenced genomes, we could not identify orthologues in red algae, fungi or animals, nor in any non-eukaryotic species, in keeping with previous reports [69,70].

Comparisons of the conserved residues in the PPP family with the PPKL enzymes [71,72] indicate that most, but not all, conserved PPP residues are maintained in this family (see Supplementary Figure S2). This analysis further underscores that (i) PPKLs contain five inserts in their phosphatase domain [68,69]; (ii) PPKL enzymes are more closely related to PP1 than to PP2A; (iii) a significant number of PP1 family residues, particularly those involved in small-molecule-inhibitor binding, either diverge from the consensus or are preceded by peptide sequences which may alter inhibitor binding [73]; and (iv) PPKL clades can be defined by certain residues. For further details please refer to Supplementary Figure S2 and [68–70].

The origin of the PPKL family is still under debate, as is the origin of the apicomplexa, to which a large number of the PPKLs are annotated (Supplementary Figure S1 and [74,75]). We built a phylogenetic tree with the PPKL family members using either domain separately (results not shown) or whole protein sequences (Supplementary Figure S1; this Figure is representative for each approach). We, and others, have found PPKLs only in plants, green algae and several Alveolata species [70]. This shared genotype between otherwise dissimilar phyla supports the current alveolate origin hypothesis, whereby a progenitor of (green and red) algae and plants yielded an ancestral enzyme (i.e. the kelch domain/PP1-like enzyme), which was retained in these organisms via secondary endosymbiosis of the algae and subsequent nuclear gene transfer [76,77]. Further evolution of the alveolates and green plants could thus produce the plant-related kelch-domain phosphatases. This hypothesis could also explain why both apicomplexa and green plants lack most PTP members. Finally, the current PPKL distribution, which lacks members in the chromista, also supports recent studies that question the strength of the current grouping that yields the chromalveolate phylum [74,75]. We refer readers to the following websites ( or for current classification schemes.


While the other families of protein phosphatases have been studied for more than 20 years, those utilizing aspartate-based catalysis have only been noted in the last decade. The first identified member of this family was FCP1, a TFIIF-associated RNA polymerase II CTD phosphatase [78]. The aspartate-based catalysis group is often divided into FCP/SCP-like and HAD-family phosphatases, but both actually share the same catalytic motif and belong to different groups within the HAD superfamily. The HAD-family phosphatases are ancient proteins, with FCP1 conserved throughout Eukaryota, like members of the PPP family. However, unlike most members of the PPP family, specificity of function has been gained through the addition of other domains to the catalytic module, not through association with other proteins (see Figure 1A for a modular representation). One common additional domain is the BRCT domain, which may function to direct these phosphatases to phosphorylated substrates.

The evolutionary conservation of FCP1, as with other phosphatases mentioned above, is completely logical where we understand biochemical function. FCP1 conservation throughout eukaryota is probably due to its role in the dephosphorylation of the CTD of RNA polymerase II, a process that is itself ancient and conserved. This is in contrast with the EYA (eyes absent) protein, another aspartate-based phosphatase, which has been identified only in multicellular eukaryotes (one to four members per organism), and plays a role in development. Mutations of these proteins lead to severe developmental abnormalities, such as Branchio-oto-renal syndrome, caused by mutation of the human EYA1 protein [79]. Most EYA homologues contain an N-terminal transactivation domain that forms an active transcription factor along with orthologues of the SIX (Sine oculis) protein of Drosophila. The exception to this rule is in plants, where Arabidopsis and Oryza express EYA orthologues that lack transactivation domains [27]. The absence of the transactivation domain itself is not surprising, as plants lack a homologue of SIX. However, since plants also lack all of the systems currently known to be affected by mutation of the EYA proteins, the function of these proteins in plants is still unknown. The aspartate-based phosphatases provide two excellent examples of how protein sequence conservation can be used to infer functional conservation. The first is a ubiquitin-like domain containing CTD phosphatase, UBLCP1, which, like the EYA proteins, is conserved throughout multicellular eukaryotes. Only a single study exists on the human protein, demonstrating both phosphatase activity and nuclear localization of UBLCP1 [80]. The second is a 50 kDa translocase of the inner mitochondrial membrane (termed TIM50 or TIMM50), an essential component of the mitochondrial inner membrane in both human and yeast cells [81]. This protein has been studied in some detail; however, very little attention has been given to the presence of the phosphatase domain, save a single study demonstrating protein phosphatase activity [81]. Simply due to their respective conservation patterns, we infer that UBLCP1 will be involved, as are the EYA proteins, with multicellular development, and that the phosphatase domain of TIM50 plays an important role in the regulation of mitochondrial translocation.

Conversely, where species have ‘unique’ versions of these proteins, it suggests a species-specific role. Arabidopsis, rice and poplar genomes indicate a dramatic expansion in the total number of FCP-like phosphatases, with more than double the number of proteins compared with humans (19, 19 and 23 respectively, compared with eight in humans [27]). Hence these proteins may have a number of unique roles in plants, indeed demonstrated by recent studies implicating two of the unique members, CPL1 and CPL2, in jasmonic acid biosynthesis, as well as stress and auxin responses [82,83].
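The scale of this expansion can be checked with simple arithmetic. The short sketch below uses only the counts quoted above from [27]; the variable and function names are illustrative.

```python
# Counts of FCP-like phosphatases as quoted in the text (from [27]).
FCP_LIKE_COUNTS = {"Arabidopsis": 19, "rice": 19, "poplar": 23}
HUMAN_COUNT = 8


def fold_expansion(counts, reference):
    """Ratio of each species' count to the reference (human) count."""
    return {species: n / reference for species, n in counts.items()}


FOLD = fold_expansion(FCP_LIKE_COUNTS, HUMAN_COUNT)

for species, fold in FOLD.items():
    print(f"{species}: {fold:.3f}-fold relative to human")
```

The ratios are 19/8 = 2.375 for Arabidopsis and rice and 23/8 = 2.875 for poplar, so each plant genome carries more than twice the human complement.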


Two of the best examples of convergent evolution at the molecular level come from the phosphatase field. We have already noted the structural convergence of the PPP and PPM active sites with little or no sequence relatedness in their respective genes. More recently the human and plant PTPs, laforin and SEX4 (starch excess 4), have been demonstrated to be functional equivalents [84]. Lafora disease is an autosomal recessive disorder characterized by the accumulation of long-stranded, poorly branched glycogen (polyglucosan), known as Lafora bodies. Nearly 50% of Lafora disease patients have a recessive mutation in EPM2A (epilepsy of progressive myoclonus type 2 gene A) [85] that encodes laforin, a DSP with a carbohydrate-binding module (CBM 20) N-terminal to the PTP domain (Figure 3). Laforin is the only human phosphatase with a carbohydrate-docking module (PP1 associates with glycogen through separate glycogen-binding subunits) and displays the currently unique ability among PTPs to dephosphorylate complex carbohydrates [84,86]. Consistent with results from a mouse knockout of EPM2A, laforin is now recognized as a glycogen phosphatase, and mutation of the laforin gene leads to aberrant accumulation of glycogen and disease development. Both starch and glycogen are phosphorylated molecules and only recently has the significance of this modification been appreciated. In plants the phosphorylation and dephosphorylation of starch is critical to its accumulation and appropriate mobilization in leaves [87–89]. The power of genomics combined with phylogenetics (and biochemistry) was nicely demonstrated when laforin was further investigated [84]. A number of protists (unicellular eukaryotes) synthesize and store floridean starch, another complex carbohydrate related to amylopectin. Genomics of several of these protists (Toxoplasma gondii, Eimeria tenella, Tetrahymena thermophila, Paramecium tetraurelia and Cyanidioschyzon merolae) has revealed putative laforin orthologues. Characterization of the least-conserved laforin (from C. merolae) showed it has biochemical properties similar to human laforin and localizes to floridean starch in vivo [84]. Each of these five protists identified with a laforin orthologue is of red algal descent, contains a true mitochondrion and produces floridean starch. Other organisms of red algal descent lack laforin orthologues [88] and in each case they were noted not to make floridean starch and/or not to have true mitochondria. Further analysis of genomic data revealed that laforin orthologues exist only in these five protists and all vertebrates, but not in any invertebrate animal or other eukaryotic, bacterial or archaeal genomes. In the same study it was suggested that laforin originated in an ancestral red alga early in eukaryotic evolution and was maintained only in vertebrates and those organisms that produce floridean starch.

Figure 3 Domain structure and gene organization of SEX4 and laforin

The plant starch phosphatase SEX4 is shown with its chloroplast transit peptide (cTP), dual-specificity (or PTP) phosphatase domain (DSP) and carbohydrate-binding module 20 (CBM20). Human laforin has dual-specificity (or PTP) phosphatase domain (DSP) and carbohydrate-binding module 20 (CBM20) domains, but they are reversed in order indicating convergent evolution to a common target substrate.

A genetic screen for plants that produce excess starch (SEX mutants) yielded several novel mutants including one designated SEX4 whose locus was mapped to At3g52180 in Arabidopsis. This gene, like laforin, encodes a DSP with a carbohydrate-binding module (CBM 20), but in this case the domain order is reversed (Figure 3), indicating that SEX4 and laforin are not orthologues. SEX4 was predicted and shown to localize to the chloroplast (the site of starch synthesis/degradation) and to associate with and dephosphorylate starch [87–89]. The observations that these are the only genes in animals and plants whose mutation causes inappropriate glycogen/starch accumulation, and that both encode CBM and DSP domains, led Gentry et al. [84] to postulate that SEX4 and laforin are functional equivalents. In an elegant study they generated stable cell lines of the Arabidopsis SEX4 mutant sex4-3 transformed with either wild-type SEX4 or human laforin (preceded by a SEX4 chloroplast transit peptide). Both of these reverted the sex4-3 mutant starch-excess phenotype. This is an excellent example of convergent evolution showing that plants and animals evolved functionally equivalent phosphatases.
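The domain-order argument can be made concrete with a small comparison. The sketch below encodes the two architectures as ordered lists (N- to C-terminus) using only the domain labels given in Figure 3. The function names are illustrative, and the choice to set the transit peptide aside as a targeting signal rather than a functional domain is an assumption of this sketch.

```python
# Domain order from N- to C-terminus, as described in the text and Figure 3.
SEX4 = ["cTP", "DSP", "CBM20"]
LAFORIN = ["CBM20", "DSP"]

# Assumption of this sketch: the chloroplast transit peptide is a targeting
# signal and is ignored when comparing functional domain architecture.
TARGETING_SIGNALS = {"cTP"}


def functional_domains(architecture):
    """Drop targeting signals, keeping the remaining domains in order."""
    return [d for d in architecture if d not in TARGETING_SIGNALS]


def same_domains_reversed(arch_a, arch_b):
    """True if both proteins carry the same domains but in opposite order."""
    a = functional_domains(arch_a)
    b = functional_domains(arch_b)
    return sorted(a) == sorted(b) and a == b[::-1] and a != b


print(same_domains_reversed(SEX4, LAFORIN))
```

A shared domain set in reversed order is what would be expected if the two genes assembled the same modules independently rather than descending from a common ancestor, which is the reasoning used here to argue for convergence.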


The age of comparative genomics will have a major impact on evolutionary biology and biological research in general. Already this field has played a role in resolving key issues in the phylogenetic relationships between organisms (particularly those that sit at evolutionary nodes) and shedding light on the emergence of basic cellular processes [90] and evolution of species-specific traits [91].

Information gleaned from genomics studies will also allow new large-scale or systems biology approaches to answer research questions in all fields of biology. For example, knockdown and knockout studies can be initiated that take a complement of phosphatases and ask what phenotype is generated when the expression of an individual enzyme is dramatically modified or eliminated. We can also overexpress individual enzymes from the phosphatase catalogue in cells and then examine their influence on the phosphorylation state of particular proteins. Indeed, the age of comparative genomics has meant that the protein phosphatases have come of age.


This work was supported by the Natural Sciences and Engineering Research Council of Canada [grant number 216895]; the Alberta Ingenuity Center for Carbohydrate Science [grant number RT707047]; and the Alberta Cancer Board [grant number 23161].

Abbreviations: CDC, cell division cycle; CTD, C-terminal domain; DSP (or DUSP), dual-specificity phosphatase; EPM2A, epilepsy of progressive myoclonus type 2 gene A; EYA, eyes absent; FCP/SCP, TFIIF (transcription initiation factor IIF)-associating component of CTD phosphatase/small CTD phosphatase; HAD, haloacid dehalogenase; LMPTP, low-molecular-mass PTP; MAPK, mitogen-activated protein kinase; MAPKP, MAPK phosphatase; MTM, myotubularin; PAS2, Pasticcino 2; PP1 etc., protein phosphatase 1 etc.; PPKL, protein phosphatase 1 and kelch-like; PPM, metallo-dependent protein phosphatase; PPP, phosphoprotein phosphatase; PRL, phosphatase of regenerating liver; PTB domain, phosphotyrosine-binding domain; PTEN, phosphatase and tensin homologue deleted on chromosome 10; PTP, protein tyrosine phosphatase; PTPLA/B, PTP-like (proline instead of catalytic arginine), member A/B; RLK, receptor-like kinase; RPTP, receptor PTP; SEX4, starch excess 4; SH2 domain, Src homology 2 domain; SIX, Sine oculis; TFIIF, transcription initiation factor IIF; UBLCP1, ubiquitin-like domain containing CTD phosphatase 1

