Review article

The histidine phosphatase superfamily: structure and function

Daniel J. Rigden


The histidine phosphatase superfamily is a large functionally diverse group of proteins. They share a conserved catalytic core centred on a histidine which becomes phosphorylated during the course of the reaction. Although the superfamily is overwhelmingly composed of phosphatases, the earliest known and arguably best-studied member is dPGM (cofactor-dependent phosphoglycerate mutase). The superfamily contains two branches sharing very limited sequence similarity: the first containing dPGM, fructose-2,6-bisphosphatase, PhoE, SixA, TIGAR [TP53 (tumour protein 53)-induced glycolysis and apoptosis regulator], Sts-1 and many other activities, and the second, smaller, branch composed mainly of acid phosphatases and phytases. Human representatives of both branches are of considerable medical interest, and various parasites contain superfamily members whose inhibition might have therapeutic value. Additionally, several phosphatases, notably the phytases, have current or potential applications in agriculture. The present review aims to draw together what is known about structure and function in the superfamily. With the benefit of an expanding set of histidine phosphatase superfamily structures, a clearer picture of the conserved elements is obtained, along with, conversely, a view of the sometimes surprising variation in substrate-binding and proton donor residues across the superfamily. This analysis should contribute to correcting a history of over- and mis-annotation in the superfamily, but also suggests that structural knowledge, from models or experimental structures, in conjunction with experimental assays, will prove vital for the future description of function in the superfamily.

  • acid phosphatase
  • cofactor-dependent phosphoglycerate mutase
  • fructose-2,6-bisphosphatase
  • histidine phosphatase
  • structure–function relationship


Research into the histidine phosphatase superfamily dates back to 1935, when an activity capable of interconverting 2PGA and 3PGA (2- and 3-phosphoglycerate) was found in yeast [1]. Of course, it was only later that the sequence of the enzyme responsible, dPGM (cofactor-dependent phosphoglycerate mutase), was determined [2], and later still that it was realized that very different activities, the first being F26BPase (fructose-2,6-bisphosphatase), could be catalysed by homologous enzymes [3]. In the 20 years since, a great many different functions have been associated with the superfamily and structures have been determined for some of the respective proteins. All of these new discoveries have been phosphatase activities of one kind or another, illustrating that, ironically enough, the earliest and arguably best-studied member of the superfamily is anomalous in catalysing primarily a mutase reaction. Although the superfamily has undoubtedly benefited from containing the key glycolytic enzyme dPGM, the subject of many early studies [4], a reading of the literature also illustrates a disadvantage arising from the mutase-dominated superfamily history: many superfamily members discovered in genomes have been suggested to be mutases when a phosphatase activity is in fact much more likely. Indeed, like other superfamilies [5], the histidine phosphatases present considerable challenges to automated genome annotation tools and have a history of widespread mis- and over-annotation. Even human interpretation of sequence relationships has been regularly biased towards mutase activity rather than the actual predominant phosphatase activity [6–8]. An analysis of the structure–function relationship in the family is essential to inform future function annotation and is one of the objectives of the present review. Another is to bring together, for the first time, the literature on activities of histidine phosphatases and thereby illustrate how impressively diverse they are, involving many proteins of medical or applied interest.


Iterative database searches in current databases using well-known characterized proteins such as dPGMs or F26BPases provide clear evidence for a deep split of the superfamily into two branches. PSI-BLAST searches using queries from branch 1 fail to produce branch 2 members among hit lists and vice versa. As discussed in detail below, the functions associated with branch 1 are much more diverse than those in branch 2, the latter containing mainly enzymes characterized as APs (acid phosphatases) or phytases (Table 1). Nevertheless, modern profile–profile comparison methods readily detect the distant evolutionary relationship between the two branches, a relationship revealed unambiguously by the first crystal structure of a branch 2 enzyme in 1993 [9]. In fact, even early sequence studies predicted a relationship between dPGMs and F26BPases (in branch 1) and APs (in branch 2) on the basis of their sharing a small RHG motif, near the beginning of the sequence, whose histidine residue was already known to be phosphorylated during catalysis [10].
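
The RHG-motif observation can be illustrated with a minimal sketch. The Python below reports the position of the histidine in any RHG tripeptide found within an N-terminal window of a protein sequence. The function name, the 60-residue window and the toy sequence are illustrative assumptions, not values taken from the studies cited, and a three-residue pattern is far too short to establish homology on its own.

```python
import re

def find_rhg_histidines(sequence, window=60):
    """Return 1-based positions of the His in each RHG motif that lies
    entirely within the first `window` residues of `sequence`.

    `window` is an arbitrary illustrative cut-off for 'near the beginning
    of the sequence'; it is not a value from the literature.
    """
    n_terminal = sequence.upper()[:window]
    # m.start() is the 0-based index of the Arg, so the His is at
    # m.start() + 1 (0-based), i.e. m.start() + 2 in 1-based numbering.
    return [m.start() + 2 for m in re.finditer("RHG", n_terminal)]

if __name__ == "__main__":
    # Invented toy sequence, not a real protein.
    toy = "MKLILVRHGESEWNKENRFTGW"
    print(find_rhg_histidines(toy))
```

A hit from such a scan is best read as a prompt for closer inspection by profile-based comparison, since the tripeptide will also occur by chance in unrelated proteins.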

Table 1 The principal known or reliably inferred molecular functions of histidine phosphatase domains grouped into broad functional categories

Detail on reactions catalysed, their broader significance and further references can be found in Supplementary Table S2 and in the text.

Sequences in the histidine phosphatase superfamily are very diverse, both within and between the two principal branches. Supplementary Table S1 shows percentage sequence identities determined for pairwise comparisons between members of known structure. In branch 1, SixA, which is considerably smaller than the other members (see below), shares only 15–19% sequence identity with them. The similarly sized dPGM, PhoE and F26BPase share just 19–28% sequence identity between different catalytic activities. The comparison of dPGMs and BPGMs (bisphosphoglycerate mutases) shows how simple sequence-identity-based function annotation can easily go astray: human dPGM and BPGM, with different principal catalytic activities, share 53% sequence identity, whereas the value for the bacterial dPGMs from Escherichia coli and Mycobacterium tuberculosis, catalysing the same reaction, is only 42%. Between branches 1 and 2 the highest observed percentage sequence identity is just 18%. Low values, down to 9%, are also recorded for pairwise comparisons within branch 2 where less well conserved N- and C-terminal extensions can contribute to diversity.
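
For readers unfamiliar with how such figures are obtained, the sketch below computes percentage identity from two rows of a pairwise alignment. It assumes one common convention, in which identical residues are counted over columns where neither sequence has a gap; the text above does not state which convention was used for the quoted values, and other denominators (for example the length of the shorter sequence) give different numbers. The function name and the example strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage identity between two rows of a pairwise alignment.

    Assumed convention: identical residues divided by the number of
    columns in which neither row has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either row
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared

if __name__ == "__main__":
    # Invented alignment rows: 5 identities over 6 gap-free columns.
    print(round(percent_identity("ACDE-FG", "ACDQKFG"), 1))
```

Because the result depends on both the alignment and the denominator chosen, identities near the low end of the ranges quoted above are best compared only when computed in the same way.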

The division of the superfamily is reflected in its twin entries in the Pfam domain database [11]. The ‘phosphoglycerate mutase family’ entry (PF00300) corresponds to branch 1 and the ‘histidine acid phosphatase’ entry (PF00328) represents branch 2. Their relationship is acknowledged by their union in the ‘phosphoglycerate mutase-like superfamily’ clan (CL0071). In the present version of the Pfam database (version 22, July 2007), branch 1 with just over 3000 members outnumbers branch 2 by four to one. Interestingly, archaea contain no members of branch 2 and just 37 representatives of branch 1. These numbers, and the diversity of the archaeal sequences, suggest that the common ancestor of archaea lacked a histidine phosphatase and that a small number of organisms later acquired branch 1 members by lateral gene transfer. Notably, branch 2 proteins are largely found in eukaryotes (634 sequences) with fewer in bacteria (108 sequences). In branch 1, the reverse is true, with bacterial sequences (2192) outnumbering eukaryotic proteins (781).
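
The census figures quoted above can be cross-checked with simple arithmetic. The sketch below assumes that the bacterial, eukaryotic and archaeal counts given in the text account for essentially all members of each Pfam entry, which the text does not state explicitly; the variable and function names are illustrative only.

```python
# Pfam version 22 counts as quoted in the text.
# Assumptions: branch 2 has 0 archaeal members ("no members"), and
# sequences outside these three groups are ignored.
BRANCH_1 = {"bacteria": 2192, "eukaryotes": 781, "archaea": 37}
BRANCH_2 = {"bacteria": 108, "eukaryotes": 634, "archaea": 0}

def total(counts):
    """Total number of sequences across all groups."""
    return sum(counts.values())

def fraction(counts, group):
    """Fraction of a branch's sequences that belong to `group`."""
    return counts[group] / total(counts)

if __name__ == "__main__":
    print(total(BRANCH_1), total(BRANCH_2))
    print(round(total(BRANCH_1) / total(BRANCH_2), 2))
    print(round(fraction(BRANCH_1, "bacteria"), 2))
    print(round(fraction(BRANCH_2, "eukaryotes"), 2))
```

Under these assumptions the totals are 3010 and 742, a ratio of about 4.1, in line with ‘just over 3000’ and ‘four to one’; roughly 73% of branch 1 sequences are bacterial, whereas roughly 85% of branch 2 sequences are eukaryotic.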

The COG (Clusters of Orthologous Groups) database [12], designed to group orthologous proteins from genome sequences, covers a small portion of the histidine phosphatase superfamily with three entries. However, there are a number of problems with these entries, problems that automated annotation pipelines have unfortunately propagated into large numbers of individual sequence annotations. For example, examination of COG0588, ‘phosphoglycerate mutase 1’, shows that many members do not share substrate-binding residues (discussed later) with the well-characterized E. coli dPGM and therefore are probably not genuine dPGM enzymes. Worse, COG0406, containing predominantly bacterial sequences, is labelled ‘fructose-2,6-bisphosphatase’ on the basis of the inclusion of a Saccharomyces cerevisiae sequence with that activity, yet F26BP has not been detected outside eukaryotes. {A putatively active PFK-2 (phosphofructokinase 2)/F26BPase has been found in two Desulfovibrio species [13,14], neither present in COG0406.} The remaining entry is COG2062, ‘phosphohistidine phosphatase SixA’.

Importantly, many members of the superfamily cannot be reliably annotated with any of the known functions (see below) since they bear very limited sequence similarity to characterized members. Clustering experiments (D. J. Rigden, unpublished work) show many well-defined groups, presumably sharing the same function, for which no functional data are available. Given sufficient knowledge of the structure–function relationship in the superfamily, it should be possible to apply bioinformatics tools to predict their functions.


A census

Supplementary Table S2 presents a compilation of functions characterized for members of the histidine phosphatase superfamily. It contains only activities, demonstrated or strongly inferred, with potential physiological relevance: many enzymes will hydrolyse unnatural model substrates, for example. A condensed version with activities placed in broad categories is shown as Table 1. The most impressive feature of the superfamily is the range of activities catalysed. This is manifest most obviously in the presence of the dPGMs in a list otherwise dominated by phosphatases. As discussed later, a common catalytic machinery is shared by mutases and phosphatases. The list also shows that specificities among enzymes vary dramatically from small molecules such as phosphoglycerate (Mr=187), the substrate of dPGM, to large phosphorylated proteins such as the HPt (histidine-containing phosphotransfer) domain of ArcB [15], the substrate of SixA (Mr=9050).

dPGM, as a participant in both glycolysis and gluconeogenesis, contributes to both catabolic and anabolic pathways. Other anabolic enzymes include CobC, involved in cobalamin synthesis [16], mannitol-1-phosphatase, which forms the Eimeria parasite storage compound mannitol [17], and Neo13, characterized from Streptomyces fradiae, which contributes to synthesis of the antibiotic neomycin [18]. Catabolic enzymes include EPP (ecdysteroid phosphate phosphatase), which releases free ecdysteroids from conjugated reserves in insects [19,20], and glucose-1-phosphatase, required for growth of E. coli on a glucose 1-phosphate carbon source [21]. Fungi have two superfamily members involved in scavenging phosphate from extracellular sources: Pho5p, which degrades extracellular nucleotides [22], and phytase, which releases the phosphate groups from phytic acid [23,24].

As well as metabolic functions, a surprisingly large proportion of the family is involved in signalling or regulation of one kind or another. EPP, mentioned above, is likely to be important for insect development, whereas two distinct members of the family, F26BPase and TIGAR [TP53 (tumour protein 53)-induced glycolysis and apoptosis regulator], have roles, demonstrated or strongly suggested respectively, in controlling the concentration of intracellular F26BP (fructose 2,6-bisphosphate) and thereby influencing the rates of glycolysis and gluconeogenesis in eukaryotic cells [25–27]. For a fuller discussion of F26BPases, and their N-terminal PFK-2 domains, the reader is referred to other reviews [13,14]. Interestingly, the well-characterized F26BP-binding residues of F26BPase are not conserved in TIGAR, indicating that the two proteins have converged on the same substrate and share more distant ancestry than their common activity would suggest [28]. The primary in vivo role of PMU1 (whose name derives from mis-annotation as a potential phosphomutase) in yeast is likely to involve regulation of the concentration of another small molecule, AICAR (5-amino-4-imidazolecarboxamide riboside), which has an influence over transcription of adenine synthesis genes [8]. Eukaryotic Minpp1 (multiple inositol-polyphosphate phosphatase 1) regulates the cellular concentrations of other signalling molecules, namely inositol pentakisphosphate and inositol hexakisphosphate [29]. Also in eukaryotes, the signalling molecule lysophosphatidic acid is a known substrate of both lysophosphatidic acid phosphatase [30] and PAP (prostatic acid phosphatase) [31].

Protein phosphatases from both branches have regulatory roles. In branch 1, SixA acts on a phosphohistidine residue in ArcB with a well-characterized role as a regulator of two-component signalling in bacteria [15,32]. In branch 2, superfamily members target phosphotyrosine residues in important regulatory proteins such as ErbB2 [33], ErbB4 [34] and the EGF (epidermal growth factor) receptor [35]. Other protein phosphatase activities in branch 2 have interestingly different roles, in vertebrate bone development [36] and in excystation of the parasite Giardia lamblia [37].

In two cases, eukaryotic members of the superfamily have had wholly non-catalytic roles proposed for them: protein–protein interactions in the case of human PGAM5 [38] and stabilization/assembly in the case of the TFIIIC (transcription factor IIIC) τ 55 kDa subunit in yeast [39]. However, in both cases, the prerequisites for catalysis (see below) are apparently present so that catalytic roles remain possible. This situation is graphically illustrated by the story of Sts-1, a key T-cell regulator [6,40,41] with a multidomain structure. The presence of a histidine phosphatase domain led investigators to test for dPGM activity and find none [6]. Since loss of that domain destroyed the dimeric quaternary structure, it was proposed that it functioned merely to assemble the dimer [6]. However, after its close kinship with EPP was recognized, it was tested for and was found to have activity against ecdysteroid and steroid phosphates [20], although its true in vivo substrate may be neither.

There are several cases where indirect evidence provides broad clues as to the function of histidine phosphatase superfamily members. For example, the cell-surface Pho4 protein of Schizosaccharomyces pombe can be repressed by thiamine, suggesting a role in thiamine metabolism, possibly cleavage of a thiamine phosphate [42]. In the case of BluF from Rhodobacter capsulatus, it is the genomic location of its gene in an operon containing exclusively cobalamin synthesis genes which argues strongly for a role (distinct from CobC above, also present in R. capsulatus) in that process [43]. Similarly, ORF2 (open reading frame 2) of the plasmid-borne avrPphF gene in Pseudomonas syringae pathovar phaseolicola, a histidine phosphatase, was implicated in bacterial virulence, a role later confirmed, although its substrate remains unknown [44]. For a homologue from Arabidopsis thaliana, At74, induction by glucose and an increase in glucose uptake by overexpressing plants point to a role in carbohydrate metabolism [45].

Domain fusions are among the most potent sources of functional clues in the superfamily. Figure 1 illustrates the domain content of at least partly characterized members of the histidine phosphatase superfamily. Genome searches suggest that further functionally suggestive domain architectures exist in other parts of the superfamily (D. J. Rigden, unpublished work). First among these domain combinations is the pairing of F26BPase with the kinase domain (placed N-terminally) that catalyses the near-reverse reaction: synthesis of F26BP. The relative activity of these two domains is regulated by phosphorylation state in a way not yet fully understood [14], and differs between mammalian isoenzymes due to intrinsic sequence features [46]. Very recently, a potentially similar situation has been discovered involving a branch 2 histidine phosphatase domain [47]. The N-terminal domain of yeast Vip1 has inositol-hexakisphosphate kinase activity and is followed by a phosphatase domain of as yet unknown specificity, but which is very likely to act on multiply phosphorylated inositol species. The SH3 (Src homology 3) and UBASH (ubiquitin-associated and SH3 domain-containing) domains found in EPP and Sts-1, suggestive of protein- and ubiquitin-binding roles respectively, have been investigated experimentally [6], confirming predictions. The same proteins also contain 2H phosphoesterase domains [48], as yet uncharacterized, but likely to share ligands in some way with the histidine phosphatase domain. Additional domains have been noted in some PFK-2/F26BPase sequences [13,49]. Thus, in some trypanosomatid sequences, multiple ankyrin repeat motifs are present [49], indicating a possible site of interaction of the PFK-2/F26BPase protein with others, while plant homologues contain a carbohydrate-binding module at their N-terminus, suggesting a possible means to control PFK-2/F26BPase activity [13]. Finally, histidine phosphatase domains are found N-terminally fused to dehydrogenase domains in proteins involved in opine synthesis [50–52]. With the C-terminal domains assigned functions such as agropine synthesis reductase [50], it is clear that the histidine phosphatase domains will be involved in some way in the opine synthetic pathway.

Figure 1 Domain architectures of selected histidine phosphatase superfamily members

All domain combinations that are at least partially characterized are shown, as well as some designed to illustrate the range of sizes of phosphatase domains. Domains are drawn approximately to scale. Catalytic domains are coloured blue (histidine phosphatase domains; dark for branch 1, light for branch 2), red (PFK-2 domains), purple (the 2H phosphoesterase domain seen in Sts-1 and relatives [20]), pink (the reductase domain of enzymes involved in opine synthesis [50–52]) and brown (the Vip1 kinase domain [47]). Other domains are coloured mauve (the carbohydrate-binding module of plant PFK-2/F26BPases [13]), green (the ankyrin repeats seen in some trypanosomatid PFK-2/F26BPases [49]; darker shades indicating more reliably assigned repeats), and grey [UBA (ubiquitin-associated)] or yellow (SH3), both found in Sts-1 and relatives [20].

Subcellular localization

When the known subcellular localizations of histidine phosphatase superfamily members are listed, an interesting dichotomy is observed. In branch 1, cytoplasmic location is the norm with other members found partially or wholly in the nucleus (Table 2). The only known exceptions to this rule are the predicted periplasmic location of the Ais/AfrS family [53], and the presence in yeast cell wall of dPGM [54]. In contrast, all branch 2 family members appear to enter the secretory pathway. Some remain in the ER (endoplasmic reticulum), others are found at the cell surface, periplasm or cell wall, and others are simply secreted (Table 2). Lysosomal AP, unusually, is transported as a type I membrane protein from the ER, via the plasma membrane, to the lysosome [55] where its luminal portion is slowly proteolytically cleaved.

Table 2 Subcellular localization of some histidine phosphatase superfamily members

There seems to be no obvious intrinsic reason for the largely distinct subcellular localizations of the two branches, but the fact that the division is visible across bacteria and eukaryotes perhaps suggests that each branch became specialized for an intra- or extra-cellular use soon after an early divergence. In accordance with their subcellular localization, branch 2 enzymes commonly, although not invariably, contain disulfide bonds, with Swiss-Prot [56] entries cataloguing anything from zero to five of them per protein. This range is confirmed by available crystal structures, since an enzyme from Francisella tularensis (PDB code 2GLC) has none, whereas Aspergillus fumigatus phytase (PDB code 1QWO) has five. The disulfide bonds, where present, are found throughout the structure, but not near the catalytic site, indicating further a general stabilizing function rather than any regulatory role. Swiss-Prot entries indicate no known disulfide bonds in branch 1, but conservation and proximity in a molecular model suggest strongly that the unusual periplasmic branch 1 enzymes of the Ais/AfrS family contain two [53].

Medical and applied importance

As expected from their diverse important roles, deficiencies or mutations in many human members of the histidine phosphatase superfamily, from both branches 1 and 2, lead to diseases. Deficiency conditions of varying degrees of severity result from loss of activity of dPGM [57], BPGM [58], lysosomal AP [59] or macrophage AP [60].

For some branch 1 human enzymes, there is potential therapeutic interest in inhibition of activity. One exciting emerging medical angle concerns dPGM as a sensitive node of the glycolytic pathway, which is up-regulated in cancer cells of various kinds [61]. For example, when phosphorylation of dPGM was prevented in a tumour cell line, a reduction in glycolysis and arrest of tumour growth were observed [62]. Later, a compound, MJE3, isolated as having an anti-proliferative effect on breast tumour cells [63], was found to target dPGM by covalent attachment of a spiroepoxide group to a lysine residue near the catalytic site [64]. Modulation of erythrocyte BPGM activity is also of potential medical interest. BPGM is the sole enzyme responsible for controlling the concentration of 2,3-BPG (2,3-bisphosphoglycerate) in erythrocytes. Its synthase activity produces 2,3-BPG, while the phosphatase side reaction degrades it (Table 1). 2,3-BPG lowers the apparent affinity of haemoglobin for oxygen by binding selectively to the deoxygenated form. A rise in 2,3-BPG concentration is consistently seen in conditions of hypoxia [65], suggesting an adaptive response, whereas lowering 2,3-BPG concentrations has an anti-sickling effect [66]. Since BPGM is responsible for the loss of 2,3-BPG in stored blood, there is also an applied interest in discovery of an agent that would inhibit its phosphatase activity [67].

Although it is fair to say that we still lack a complete understanding of AP activities and in vivo substrates, these branch 2 enzymes have important clinical applications (reviewed in [68]). In particular, PAP has long been used as a serum marker for prostate cancer [69], and its overexpression on cancer cells is being exploited for immunotherapy [70]. More speculatively, both well-characterized activities of PAP lead to down-regulation of cancer growth [31,33], suggesting that the enzyme may have therapeutic value. Conversely, should inhibition of the enzyme be desired, good inhibitors are already available [71]. TRAP (tartrate-resistant acid phosphatase) is a marker for bone resorption and can be monitored in patients with metabolic bone disorders [72]. Again in humans, the suggestion that multiple inositol-polyphosphate phosphatase could contribute to the pathogenesis of some thyroid carcinomas [73] means that specific inhibitors could be of therapeutic interest.

In other medically relevant species, histidine phosphatases are also of interest. For example, a peripheral vacuolar AP is required for excystation of the parasite Giardia lamblia [37], suggesting that a selective inhibitor of that enzyme might have value as an anti-parasitic agent. In another group of parasites, the Leishmania species, three APs have been described and, although conclusive evidence is lacking, their conservation among Leishmania species is interpreted as suggestive of an important role in the growth or development of the parasite [74]. Finally, a better understanding of the molecular functions of the histidine phosphatases implicated in synthesis of different classes of antibiotics [18,50–52,75] may help efforts to produce novel antibiotics.

The most well-known applied use of a histidine phosphatase is the addition of phytase to animal feed, first in order to increase the availability of the phosphate component of phytate, obviating the need to supply phosphorus supplements, and secondly to liberate iron and other trace elements from their phytate chelation [76]. To this end, phytases with improved molecular properties have been both designed through mutations of known enzymes [77] and sought by screening (e.g. [78]). Another potential agricultural application could involve EPP [19,20] as a novel insecticide target. Although formal validation is still required, the role of EPP in liberating inactive stores of ecdysteroids during insect development suggests that its inhibition should have serious consequences for the insect. Finally, parasitic protozoa of the genus Eimeria are a significant problem in the poultry industry. These organisms use mannitol-1-phosphatase during the synthesis of mannitol, an essential storage compound, suggesting that inhibitors of this enzyme, absent from the host, could be useful for parasite control [79].


Table 3 presents a compilation of all histidine phosphatase superfamily structures available at the time of writing. As the deposition dates show, structural studies date back to the early days of protein crystallography. The first dPGM structure was deposited in 1975, 1 year after publication of the first Protein Data Bank [80] newsletter, which listed just 12 proteins for which co-ordinates were available. Unfortunately, erroneous positioning of catalytic histidine residues misled efforts to understand the mechanism [4] until the availability of a corrected structure in 1997 [81]. The first branch 2 crystal structure was that of rat PAP in 1993 [82]. In recent years, structure determinations have accelerated, so that at least nine distinct molecular functions are now covered. In order to provide a structural context for discussion of catalysis and substrate diversity, all known structures of the superfamily were collected together and superimposed with MUSTANG [83], with post-processing of the results by STACCATO [84]. A dendrogram representation of the structures, on the basis of a consensus calculation on the results of six different structural superposition methods, was created using the PROCKSI server and is shown in Figure 2. This Figure confirms the broad division of the superfamily into two branches, the first containing dPGMs, F26BPases and other activities, and the second containing APs and phytases.

Table 3 Available structures of members of the histidine phosphatase superfamily

1 Å=0.1 nm

Figure 2 Consensus tree derived from analysis of all histidine phosphatase superfamily structures

The Figure was generated at the PROCKSI server. References for the protein structures are shown in Table 3.

A selection of 11 structures was made in order to represent the broad diversity of known structures in the superfamily. Comparison of these, in order of increasing size, with the near-minimal core α/β domain present in SixA [85] reveals how insertions into the fold differentiate between different lineages and define the substrate-binding site, thereby conferring the observed activity. These insertions, sometimes as long as the core domain itself, together with the sometimes substantial N- and C-terminal extensions, explain the fact that the largest histidine phosphatase structure yet determined is almost three times the size of the smallest. PDB accession codes and references can be found in Table 3. A structure-based sequence alignment of representative histidine phosphatases is shown in Figure 3.

Figure 3 Structure-based sequence alignment of selected histidine phosphatase superfamily members

The Figure was generated by post-processing of a MUSTANG [83] structure alignment with STACCATO [84]. The structures shown are PDB codes 1UJC, E. coli SixA [85]; 1H2E, G. stearothermophilus PhoE [92]; 1K6M, human liver F26BPase [46]; 1E58, E. coli dPGM [89]; 1NT4, E. coli glucose-1-phosphatase [91]; 1DKQ, E. coli phytase [103]; 1QWO, A. fumigatus phytase [93]. The secondary structures of the smallest (1UJC) and largest (1QWO) proteins are shown below the alignment with β-strands of the core β-sheet (six in 1UJC, seven in 1QWO) and selected helices numbered. The alignment is shaded blue according to sequence conservation with certain sets of residues (see text for details) picked out as follows: white on red, conserved catalytic core (see also Figure 6); black on green, proton donors; black on yellow, additional members of the ‘phosphate pocket’; black on purple, substrate binding residues. Boxed blue residues are non-binding, but conserved positions lying near the catalytic histidine (see Figure 6 and text). Note that the alanine mutations present at position 17 of 1DKQ and position 18 of 1NT4 have been replaced by the naturally occurring histidine. The Figure was produced using JALVIEW [153].

The principal structural difference between SixA and the Thermus thermophilus protein of unknown function lies in the 35-residue insertion of the latter between β3 and α3. An insertion, of at least this size, is present here in all structures apart from SixA. In addition, the β1–α1 loop of the T. thermophilus protein is slightly larger and adopts a different conformation, packing against the β3–α3 insertion. The characterized M. tuberculosis phosphatase [86] resembles the T. thermophilus protein, but has a slightly larger α4–β5 loop which also packs against the β3–α3 insertion. The monomeric S. pombe dPGM and Geobacillus stearothermophilus PhoE (Figure 4b) follow this same general scheme, although, of course, with detailed sequence differences defining different specificities (see below). In F26BPase, a substantial 25-residue C-terminal tail is added to this scheme (Figure 4c). The tail twice traverses the entrance to the catalytic site, essentially blocking substrate access, and its removal increases F26BPase activity [87]. Although a tail is absent from S. pombe dPGM, the E. coli dPGM structures show how the dPGM C-terminal tails, where present, adopt a completely different conformation from those in F26BPases (Figure 4d). dPGM tails are shorter and lie along the rim of the entrance to the catalytic site. Again with the exception of the S. pombe enzymes, dPGMs contain an additional ∼25-residue section in the large β3–α3 insertion (Figure 4d), which, in the tetrameric enzymes such as that from S. cerevisiae, is involved in forming the dimer–dimer interface. Insertions from dimeric dPGMs exhibit sequence and conformational similarity to those of tetrameric enzymes, illustrating that relatively small sequence differences must define quaternary state. Indeed a single mutation at this interface in S. cerevisiae dPGM dramatically destabilized the tetrameric state without affecting kinetic properties [88]. Sequence differences between dimeric E. coli dPGM and the tetrameric S. cerevisiae enzyme in this region have been analysed [89].

Figure 4 Cross-eyed stereo cartoons of structures of representative histidine phosphatase superfamily members

Structures are shown as cartoons and are coloured grey except for orange, N-terminal region; red, C-terminal tail; magenta, insertion in the β3–α3 region; blue, insertion in the β1–α1 region; and turquoise, oligomeric dPGM-specific insertion (d). The structures are: (a) PDB code 1UJC, E. coli SixA [85]; (b) PDB code 1H2E, G. stearothermophilus PhoE [92]; (c) PDB code 1K6M, human liver F26BPase [46]; (d) PDB code 1E58, E. coli dPGM [89]; (e) PDB code 1NT4, E. coli glucose-1-phosphatase [91]; (f) PDB code 1DKQ, E. coli phytase [103]; (g) PDB code 1QWO, A. fumigatus phytase [93]. The phosphorylable histidine residue of SixA is shown as green spheres to localize the catalytic site.

In branch 2, the comparison of the E. coli glucose-1-phosphatase structure with SixA again shows insertions after β1 and β3, the former once again packing against the latter (Figure 4e). Importantly, however, the β3–α3 insertion is very large with approx. 100 residues. This insertion bears no obvious sequence similarity to the branch 1 insertions at the same position, and is composed almost entirely of α-helices. The E. coli glucose-1-phosphatase structure contains other insertions far from the catalytic site and a long C-terminal region, again apparently unrelated to comparable regions in branch 1 of the superfamily. The C-terminal extension forms a short region of antiparallel β-sheet with β6. In E. coli phytase, the β3–α3 insertion is even longer, at 140 residues almost as long as SixA itself (Figure 4f). Finally, and remarkably, the A. fumigatus phytase structure differs from the two branch 2 structures mentioned above in having a large N-terminal extension, of largely irregular secondary structure, which inserts into the back of the substrate-binding site, opposite the entrance (Figure 4g), resulting in the availability of Gln27 and Tyr28 for substrate interaction (see below).

The superimposed structures reveal that, quite surprisingly, the entrances to the catalytic sites lie on opposite faces of the protein in the two branches. Thus Figures 4(b)–4(d) show three branch 1 enzymes where substrate enters from the right past the C-terminal regions (red) with important roles in allowing and responding to substrate binding. The three branch 2 structures shown in Figures 4(e)–4(g) contain catalytic sites accessed from the left. This unusual situation has presumably been made possible by the positioning of most of the catalytic apparatus after the central two strands, β1 and β4, of the core β-sheet. In this way, additions to the fold can form a substrate-binding site facing in either direction.

It is tempting to consider the simple SixA structure, with few short loops between the regular secondary structure elements of the core (Figure 4a), as resembling the ancestor of all present-day members of the superfamily. By this hypothesis, evolution added insertions to the basic scaffold to produce structures containing cavities suitable for binding of small-molecule substrates. The absolute lack of sequence or structural similarity between the respective insertions in branch 1 and branch 2 structures (Figure 4) strongly suggests that this decoration of a basic fold happened twice independently. Furthermore, the parallel processes of elaboration on the basic fold began early, since both branches are widely represented in both bacteria and eukaryotes.


Structural determinants

Catalytic activity in the superfamily centres on phosphorylation and dephosphorylation of a histidine residue that follows the first β-strand of the fold (His8 in E. coli SixA) (see Figures 5 and 6). A scheme for this catalytic cycle is shown in Figure 5. The in-line transfer [90] of the phospho group from substrate to enzyme occurs with the aid of several residues, forming a ‘phosphate pocket’, that hydrogen bond to the phospho group before, during and after transfer. These include a pair of flanking arginine residues, positions 7 and 55 in E. coli SixA numbering, and the other histidine, His108, which are completely conserved in known active members of the superfamily (see, e.g., Figure 3). Interestingly, the alignment also reveals a few other residues that are highly conserved between otherwise very diverse proteins (see also Figure 6). The most obvious of these is Gly9 which forms part of the well-known and characteristic RHG motif. Gly9 packs closely against the protein core and is well-conserved as a glycine or occasionally an alanine residue. The structure of E. coli glucose-1-phosphatase (PDB code 1NT4 [91]) captures a very rare example of an asparagine residue at this position and shows how this larger side chain leads to structural displacements elsewhere in the fold. The carbonyl of Gly9 makes an important hydrogen bond with the phosphorylable histidine side chain holding it in an appropriate conformation for catalysis (Figure 6). The backbone nitrogen of Gly9 makes a further hydrogen bond with the side chain of Thr59 (SixA numbering) which is >95% conserved as threonine or serine in both branches of the superfamily. The alignment also reveals a L[S/T]XXG motif in the region between β1 and β2. As shown in Figure 6, the glycine residue (numbered 27 in SixA) is conserved for steric reasons; a larger side chain here would disturb the catalytic apparatus. 
The leucine side chain contacts several key residues (His8, Gly9 and Arg55), while the threonine/serine residue stabilizes the local structure through hydrogen bonds to both the backbone of the neighbour of Gly9 (not sequence-conserved) and the backbone of Gly27. Taken as a whole, this hotspot of conservation in the second layer behind the catalytic site illustrates how important maintenance of catalytic histidine conformation is for activity.
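
The sequence signatures described above can be turned into a simple first-pass screen. The following is a minimal sketch, not a validated annotation method: it looks for the RHG motif (allowing alanine in place of the glycine, as noted above) and then for a downstream L[S/T]XXG motif. The function name, the 40-residue search window and the toy sequence are assumptions made for illustration; the toy sequence is invented so that its motifs fall at the SixA positions quoted in the text (Arg7, His8, Gly9, Gly27) and is not the real SixA sequence.

```python
import re

# Motif patterns taken from the text: RHG (Gly occasionally Ala) and L[S/T]XXG.
RHG_MOTIF = re.compile(r"RH[GA]")
LSTXXG_MOTIF = re.compile(r"L[ST]..G")


def find_core_motifs(seq, max_gap=40):
    """Return 1-based positions of RHG-like motifs and the first L[S/T]XXG
    motif found within max_gap residues downstream (max_gap is an assumption)."""
    seq = seq.upper()
    hits = []
    for match in RHG_MOTIF.finditer(seq):
        window_start = match.end()
        window = seq[window_start:window_start + max_gap]
        second = LSTXXG_MOTIF.search(window)
        hits.append({
            "rhg_pos": match.start() + 1,   # position of the Arg
            "his_pos": match.start() + 2,   # candidate phosphorylable His
            "lstxxg_pos": window_start + second.start() + 1 if second else None,
        })
    return hits


# Invented toy sequence (hypothetical, for illustration only).
TOY_SEQ = "MQVFIMRHGQADSNAPDYERPNLTEQGKEAVAK"

for hit in find_core_motifs(TOY_SEQ):
    print(hit)
```

A hit of this kind only indicates that the conserved core may be present. As the rest of this section makes clear, it says nothing about the proton donor or the substrate-binding residues, and hence little about the actual activity.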

Figure 5 Catalytic mechanism of the histidine phosphatase superfamily

The essentially invariant four residues of the catalytic core (see also Figures 3 and 6) are shown numbered as in E. coli SixA. His8 is phosphorylated during the course of the reaction. The other three residues interact electrostatically with the phospho group before, during and after its transfer and form most or all of the ‘phosphate pocket’. Additional neutral or positive residues, represented as PP in the diagram, may also contribute to the ‘phosphate pocket’ by hydrogen-bonding to the phospho group (see also Figure 3). The proton donor, an aspartate or glutamate residue whose position varies in different families (Figure 3), is shown as PD.

Figure 6 The near-superimposable conserved catalytic cores of E. coli SixA and A. fumigatus phytase

The cores are shown in green and nearby conserved residues in purple for E. coli SixA (left; 156 residues; PDB code 1UJC [85]) and A. fumigatus phytase (right; 442 residues; PDB code 1QWO [93]) which share 12% sequence identity overall. Broken lines represent hydrogen bonds. Selected β-strands are drawn and labelled. The tungstate ion binding to the ‘phosphate pocket’ of SixA is drawn as ball-and-stick in both panels. The Figure was generated using PyMOL (DeLano Scientific). A three-dimensional interactive version of this Figure is available online.

Aside from the two histidine and two arginine residues mentioned above, other interactions with the phospho group (the PP residue in Figure 5) vary to a surprising extent (see Figure 3). In most of branch 1, Asn16 (E. coli dPGM numbering) provides a hydrogen bond. In the M. tuberculosis phosphatase [86], a serine residue is present at this position and may substitute functionally, although with the available crystal structure, and its non-physiological ligands, it is hard to say. Any Asn16-like interaction is clearly absent in SixA, however, due to a local conformational difference [85]. An additional interaction with the phospho group is provided by Gln22 in PhoE [92]. In SixA, allowing for a small conformational change on substrate binding, Arg21 is well positioned to interact with a phospho group [85], as predicted [53]. The same modelling predicted that a different residue, Arg64 would fulfil the corresponding role in the Ais family [53]. Recent modelling suggests that a further lineage in the superfamily, the EPP/Sts-1 family, has independently evolved an arginine residue at this position [20]. In branch 2 of the family, the phosphorylated structure of A. fumigatus phytase (PDB code 1QWO; [93]) reveals that the pair of arginine residues and second histidine residue are supplemented by Arg62 which is positioned similarly to Asn16 in E. coli dPGM [93]. The same interactions are predicted to hold for all branch 2 enzymes of presently known structure.

For the most studied members of the superfamily, dPGM, F26BPase and AP, there have been many mutagenesis studies supporting the roles of the key catalytic residues (see, e.g., [88,94–98]). Typically, loss of these residues results in mutant proteins with no or much reduced catalytic activity, but there is an important exception that should be mentioned. When the phosphorylable His256 of rat testis F26BPase was replaced with an alanine residue, a highly unexpected 17% of catalytic activity remained [99], probably due to water taking over the nucleophilic role [100]. This result argues that caution should be exercised in inferring lack of catalytic activity for homologous sequences lacking elements of the otherwise conserved catalytic machinery.

An important component of the catalytic scheme (Figure 5) is the proton donor, required to donate a proton to the leaving group as the substrate transfers its phospho group to the enzyme. The only exception, not requiring proton donation, would be priming of dPGM with 1,3-BPG, involving transfer of the acyl 1-phosphate [4]. In most branch 1 members, a conserved glutamate residue, Glu88 (E. coli dPGM numbering), functions as proton donor [101]. It was therefore a surprise when it was realized that the lineages having a minimal fold lack this residue [92] and are predicted to use differently placed aspartate residues for this function. The crystal structure of SixA [85] suggests that the predicted conserved Asp18 is indeed likely to act as proton donor. In branch 2, early mutagenesis data clearly pointed to Asp304 (E. coli phytase numbering) as the proton donor [102], and crystal structures show it to be well positioned with respect to the rest of the catalytic apparatus. The side chain of Asp304, although contributed by a different part of the sequence, occupies approximately the same position in superimposed structures as Glu88 in E. coli dPGM. Asp304 as proton donor remains the consensus view, but it has been argued that the equivalent to the preceding histidine residue may, under certain circumstances, act as a secondary proton donor, at least in glucose-1-phosphatase [91]. This would require movement of the histidine side chain from the crystallographically observed position [91]. It is worth noting that such movement would be harder in branch 1 enzymes since proton donor Glu88, or non-catalytic Glu74 in SixA, packs against the histidine side chain. A further suggestion is that Asp304 and equivalents simply facilitate an intramolecular proton movement on the substrate [71]. During the hydrolysis of the phosphoenzyme structure, the same acidic residues activate bound water molecules for nucleophilic attack. 
Crystal structures have revealed suitably positioned water molecules in, for example, E. coli dPGM [89] and E. coli phytase [103] complex structures. When proton donors are highlighted on a sequence alignment (Figure 3), it becomes clear that, within the superfamily, they can be placed either shortly after strands β1 (SixA), β3 (dPGMs etc.) or β4 (branch 2).
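
The variation in proton donor placement summarized above can be recorded as a small lookup table, for example when labelling an alignment. The sketch below is illustrative only: the residues, reference numberings and strand positions are those given in the text, whereas the class name, field names and helper function are hypothetical.

```python
from typing import NamedTuple


class ProtonDonor(NamedTuple):
    lineage: str         # group of enzymes, as described in the text
    residue: str         # proton donor residue
    numbering: str       # reference enzyme used for residue numbering
    follows_strand: str  # core strand shortly after which the donor lies


# Entries transcribed from the text; not an exhaustive catalogue.
PROTON_DONORS = (
    ProtonDonor("minimal-fold lineages such as SixA", "Asp18", "E. coli SixA", "β1"),
    ProtonDonor("most branch 1 members (dPGMs etc.)", "Glu88", "E. coli dPGM", "β3"),
    ProtonDonor("branch 2", "Asp304", "E. coli phytase", "β4"),
)


def donors_after(strand):
    """Return the donor residues recorded as lying shortly after a given strand."""
    return [d.residue for d in PROTON_DONORS if d.follows_strand == strand]


print(donors_after("β3"))
```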

The anomalous mutases

Most of the superfamily are simple phosphatases carrying out the scheme in Figure 5. However, there are anomalous members in the form of the mutases: dPGMs and BPGMs. Each of these catalyses three reactions (Table 1), but at very different relative rates [4]. Of these, the phosphatase reaction conforms to the scheme in Figure 5. The ‘priming’ of dPGMs, i.e. their phosphorylation by 2,3-BPG or 1,3-BPG on the phosphorylable histidine residue, can be considered as an incomplete phosphatase reaction. The ‘synthase’ reaction, the production of 2,3-BPG from 1,3-BPG, the predominant activity of BPGM, would first involve the transfer of the 1-phosphate of 1,3-BPG to the enzyme. After reorientation of the resulting 3PGA in the catalytic site, the phospho group attaches to the 2-position and product 2,3-BPG is released from the enzyme. The mutase reaction, catalysed at the highest rate by dPGMs, involves binding of one phosphoglycerate form, either 3PGA or 2PGA, to the already phosphorylated (‘primed’) enzyme and transfer of the phospho group to the empty 2- or 3-position. Intermediate 2,3-BPG then reorients within the catalytic site, transferring the other phospho group to the enzyme. The primed form of the enzyme is thereby restored and phosphoglycerate is released. The distinguishing characteristic of mutases compared with phosphatases is thus the ability to maintain intermediates bound while allowing their reorientation within the active site. In the case of dPGMs, a plausible mechanism for this reorientation has been proposed [104]. Differences between dPGMs and BPGMs relate first to the necessity of BPGMs to release 2,3-BPG, whereas for dPGMs, it is the obligatory intermediate, and secondly to the requirement of dPGMs to stabilize the phosphorylated form of the enzyme and protect it from hydrolysis.
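
The bookkeeping of the mutase reaction described above can be followed in a short sketch. It simply tracks where the phospho groups sit at each step (on the histidine residue or on the 2- and 3-positions of the ligand) and makes no attempt to model rates or structures. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    event: str
    enzyme_phosphorylated: bool          # is the catalytic His phosphorylated?
    ligand_phospho_positions: frozenset  # glycerate positions carrying phospho groups


def mutase_cycle(substrate_position):
    """Trace one mutase turnover starting from 2PGA (2) or 3PGA (3)."""
    if substrate_position not in (2, 3):
        raise ValueError("substrate must be 2PGA (2) or 3PGA (3)")
    product_position = 5 - substrate_position  # 3 -> 2, 2 -> 3
    return [
        Step("phosphoglycerate binds the primed enzyme",
             True, frozenset({substrate_position})),
        Step("phospho group moves from His to the empty position, giving 2,3-BPG",
             False, frozenset({2, 3})),
        Step("2,3-BPG reorients while remaining bound",
             False, frozenset({2, 3})),
        Step("the other phospho group returns to His and product is released",
             True, frozenset({product_position})),
    ]


for step in mutase_cycle(3):
    print(step.event, "| primed:", step.enzyme_phosphorylated,
          "| ligand:", sorted(step.ligand_phospho_positions))
```

The enzyme begins and ends the cycle in the primed state, which is why retention of the 2,3-BPG intermediate, rather than its release, is essential to the mutase activity.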

The C-terminal tail has long been implicated in stabilizing the phosphoenzyme form of dPGM, since both the naturally tail-free dPGM from S. pombe and a proteolysed form of S. cerevisiae dPGM lacking seven C-terminal residues have lower mutase and higher phosphatase reaction rates when compared with dPGMs having the tails [105]. Interpretation of these data was complicated by the lack of interpretable density for the tail in all S. cerevisiae structures. The position of the tail was only seen on the determination of a structure of E. coli dPGM in its phosphorylated form, strongly linking ordering of the tail and phosphorylation state [89]. The earlier hypothesis that the tail helped to exclude water from the catalytic site, and thereby avoid hydrolysis of the phosphoenzyme [4], was rendered unlikely by the presence of ordered water molecules in the catalytic site of the phosphoenzyme structure [89]. Instead, the C-terminal tail appears to provide additional stabilizing interactions with the phosphorylated state, through interactions of tail residues with Asn19. This residue, and Asn16, which forms two hydrogen bonds with the phosphohistidine residue, lie on a loop whose conformation varies according to phosphorylation state [106]. Interestingly, PhoE complex structures indicate that, although C-terminal tail structuring similarly accompanies modification of the phosphorylable histidine residue, a different mechanism involving Arg9 and the C-terminal carboxy group applies [92].

For dPGMs, there are no reliable structures of complexes with substrates since there are reasons to doubt the validity of the reported 3PGA complex [107], as discussed in [106]. Nevertheless, bound sulfate and vanadate seen in other structures have offered clues as to likely substrate-binding residues [104,106]. The binding of 2PGA and 3PGA (to phosphorylated enzyme) and 2,3-BPG in two different orientations (as retained intermediate in the mutase reaction, bound to unphosphorylated enzyme) was modelled in E. coli dPGM [106]. The prediction was that backbone amide groups of Thr22 and Gly23 bind the carboxy group in all cases, and that side chains of Arg89, Tyr91, Arg115, Arg116 and Asn185 bind the distal phospho group. For BPGM, a remarkable set of complex structures, after various periods of soaking in ligands, have recently provided a detailed molecular picture of 2,3-BPG binding and transfer of its phospho group to the enzyme [90]. The dPGM predictions are broadly confirmed, suggesting that subtle differences are responsible for the different spectra of activities of dPGM and BPGM. Different explanations have been suggested, frequently involving Gly14 (human BPGM numbering; serine in dPGMs) and Ser24 (glycine in dPGMs) [106,109]. Misleadingly, the first BPGM structure [109] showed structural differences from dPGMs that turned out not to be present in its ligated form [90]. Now that a direct comparison of phosphorylated dPGM and BPGM can be made, it is clear that neither residue obviously influences activity directly. However, a noted restriction of catalytic site volume [109], in part relating to differences in the C-terminal tail region, is still apparent in BPGM compared with dPGM. This could easily impede the reorientation of 2,3-BPG within the catalytic site that lies at the heart of the mutase reaction. 
Additionally, although the sets of substrate-binding residues (seen in BPGM, modelled in dPGM) are very similar, a structural superposition shows many conformational differences in their side chains that could relate to different affinities for 1,3-BPG or 2,3-BPG between BPGMs and dPGMs. One particularly interesting position is residue 100 (BPGM numbering), occupied by a conserved arginine residue in BPGMs and by a conserved lysine residue in dPGMs. Deciphering subtle differences in ligand affinities between the two families will probably require the determination of a crystal structure of one in complex with 1,3-BPG.

In passing, it is intriguing to note that the present superfamily is not the only example of evolutionarily related phosphatases and mutases, other examples being the cofactor-independent phosphoglycerate mutases, one of whose domains is homologous with alkaline phosphatases [110] and β-phosphoglucomutase, closely related to the phosphatase branch of the HAD (haloalkanoic acid dehalogenase) enzyme superfamily [111]. In all cases, transient phosphorylation of the enzyme is involved in catalysis.

Substrate binding in phosphatases

Information regarding which residues are responsible for substrate binding in other members of the superfamily is not available for all structures; indeed the substrate(s) of some enzymes of known structure remain unknown [112]. In the case of F26BPase, the crystal structure of the complex with fructose 6-phosphate and phosphate [100], as well as mutagenesis data [113], implicate the following set of residues in substrate binding: Ile267, Tyr336, Arg350, Lys354, Tyr365, Gln391 and Arg395 (numbering for rat testis enzyme). For PhoE, no complex is available with natural ligand, but its preferred substrate, α-naphthyl phosphate, can be convincingly docked into its notably hydrophobic active site, suggesting that Met21, Trp109, Val153, Leu86 and Phe108 are all substrate-binding residues [114]. The crystal structure of E. coli SixA alone allowed for the prediction of a tentative model of the complex with its substrate, the ArcB HPt domain [85]. The model implicated SixA residues Asn26, Glu30 and Lys156 in substrate binding.

In branch 2 of the superfamily, complexes with inhibitors (vanadate, tartrate, tungstate) are again more common than substrate complexes. Nevertheless, substrate complex crystal structures are available for both E. coli enzymes, the phytase and the glucose-1-phosphatase [91]. The principal phytate-binding residues in the former are Thr305, Arg267 and Lys24 (there are other interactions via water). In glucose-1-phosphatase, Glu196 and Tyr247 (via water) bind glucose-1-phosphate, whereas the specificity of the enzyme for the 3-position of its alternative substrate phytate is defined by the gating residue Leu24 [91]. The mode of interaction of phytate with A. fumigatus phytase can be predicted based on the E. coli phytase complex [115]. In this position, phytate interacts with the catalytic core plus Gln27, Tyr28, Lys68, His189, Lys278, Tyr282 and Asn340.

For future efforts in genome annotation in the superfamily, it is important to know which regions of the enzymes are likely to contribute to core catalysis and/or the determination of substrate specificity. As mentioned above, the conserved catalytic core is contributed by residues lying in the regions after strands β1, β2 and β4 (see Figures 3 and 6), whereas proton donors map to the portions following β1, β3 and β4 (Figure 3). These are the basic components for catalysis (although the surprising activity of the H256A F26BPase mutant must be remembered). To these may be added additional ‘phosphate pocket’ interactions, varying between lineages, but, so far, always contributed by residues lying in the β1–β2 loop (Figure 3). The pattern for specificity-determining residues, binding the remainder of the substrate, is very different. These are to be found scattered throughout the sequence alignment, clustered in the loops following β1, β3 and β4, but also at various points in the large and very diverse insertions between β3 and β4, in the N-terminus of A. fumigatus phytase [93] and at the very C-terminus of SixA [85].


Although large-scale sequencing projects continue to discover novel protein families awaiting function elucidation, function predictions within superfamilies are equally valuable. The histidine phosphatase superfamily, although possessing an ever-lengthening list of known functions, undoubtedly harbours many novel activities awaiting discovery. Sequence and structure analyses paint a clear picture of the evolution of two deep-rooted branches. Two-thirds of the first branch comprises bacterial proteins, usually intracellular and of widely varied functions. The second branch, predominantly eukaryotic and extracellular, contains some well-known enzymes such as phytase, and other members, the APs, of partial or inconclusive functional annotation. Humans contain several proteins from both branches, many with known or potential medical importance. Phosphatases in parasites may also have therapeutic value, whereas others have (potentially) important agricultural applications.

After years of sporadic progress, the number of structures of histidine phosphatases is rising rapidly, allowing for a review of the level of our understanding of the structure–function relationship of the superfamily. All histidine phosphatase structures contain a conserved catalytic core, and conserved ancillary residues involved in maintaining the correct orientation of the core, but vary dramatically in their proton donor and in substrate-binding residues. This structural variability of specificity determinants makes function annotation in the superfamily harder still and suggests that structural knowledge will be important, both for reliable annotation of new sequences with known activities and for prediction of novel activities, prior to experimental demonstration. Potentially, both modelling (see, e.g., [20,53,116]) and further crystal structures could be used for structure-based function predictions.

In summary, with many families and even structures of unknown function, the histidine phosphatase superfamily illustrates well the problems that have led experimentalists to develop large-scale function screens [117], crystallographers to suggest a structural genomics approach with superfamilies [118] and bioinformaticians to seek novel structure-based function assignment methods (see, e.g. [119]). An investigation of the histidine phosphatase superfamily by any or all of these methods would certainly yield many new insights.

Abbreviations: AP, acid phosphatase; 1,3-BPG, 1,3-bisphosphoglycerate; 2,3-BPG, 2,3-bisphosphoglycerate; BPGM, bisphosphoglycerate mutase; dPGM, cofactor-dependent phosphoglycerate mutase; EPP, ecdysteroid phosphate phosphatase; ER, endoplasmic reticulum; F26BP, fructose 2,6-bisphosphate; F26BPase, fructose-2,6-bisphosphatase; HPt, histidine-containing phosphotransfer; Minpp1, multiple inositol-polyphosphate phosphatase 1; PAP, prostatic acid phosphatase; PFK-2, phosphofructokinase 2; 2PGA, 2-phosphoglycerate; 3PGA, 3-phosphoglycerate; PGAM5, phosphoglycerate mutase 5; SH3, Src homology 3; TFIIC, transcription factor IIC; TIGAR, TP53 (tumour protein 53)-induced glycolysis and apoptosis regulator; TRAP, tartrate-resistant acid phosphatase; UBASH, ubiquitin-associated and SH3 domain-containing

