β-Glucosidase from Kluyveromyces marxianus (KmBglI) belongs to GH3 (glycoside hydrolase family 3). The enzyme is particularly unusual in that a PA14 domain (pf07691), for which a carbohydrate-binding role has been claimed, is inserted into the catalytic core sequence. In the present study, we determined the enzymatic properties and crystal structure of KmBglI in complex with glucose at a 2.55 Å (1 Å=0.1 nm) resolution. A striking characteristic of KmBglI was that the enzyme activity was essentially limited to disaccharides, and when trisaccharides were used as the substrates the activity was drastically decreased. This chain-length specificity is in sharp contrast with the preferred action on oligosaccharides of barley β-D-glucan glucohydrolase (ExoI), which does not have a PA14 domain insertion. The structure of subsite (−1) of KmBglI is almost identical with that of Thermotoga neapolitana β-glucosidase and is also similar to that of ExoI; however, the structures of subsite (+1) differ significantly among them. In KmBglI, the loops extending from the PA14 domain cover the catalytic pocket to form subsite (+1), and hence simultaneously become a steric hindrance that could limit the chain length of the substrates to be accommodated. Mutational studies demonstrated the critical role of the loop regions in determining the substrate specificity. The active-site formation mediated by the PA14 domain of KmBglI is reminiscent of the α-complementation of β-galactosidase exerted by its N-terminal domain, to which the PA14 domain shows structural resemblance. The present study is the first to reveal the structural basis of the interaction between the PA14 domain and a carbohydrate.
- chain-length specificity
- crystal structure
- glycoside hydrolase family 3
- PA14 domain
- subsite (+1)
β-Glucosidases (EC 3.2.1.21) occur in most living organisms and play fundamental roles in various biological processes. The enzymes catalyse the hydrolysis of O-glucosidic bonds to release Glc (glucose) units from the non-reducing end. The β-glucosidase gene from Kluyveromyces marxianus ATCC12424 (formerly known as K. fragilis) has been cloned and its amino acid sequence determined. The enzyme was classified in GH3 (glycoside hydrolase family 3), one of the largest families in the CAZy database (http://www.cazy.org). Currently, GH3 comprises approx. 2400 members. Despite the large number of sequences, structural information is available for only five enzymes, i.e. β-D-glucan glucohydrolase from Hordeum vulgare (ExoI), two β-N-acetylhexosaminidases from Vibrio cholerae (NagZ) and Bacillus subtilis (YbbD) (PDB code 3BMX) [5,6], exo-1,3-1,4-β-glucanase (ExoP) from Pseudoalteromonas sp. BB1 (PDB code 3F93) and β-glucosidase from Thermotoga neapolitana (Bgl3B). Among them, ExoI, NagZ and Bgl3B have been characterized both structurally and enzymatically. The domain constitutions of the three enzymes are quite different. NagZ, which consists of 330 residues, is a one-domain enzyme of the (β/α)8-barrel fold. ExoI, which is a protein of 605 residues, consists of two domains: the N-terminal (β/α)8-barrel domain and the C-terminal (α/β)6-sandwich domain (see Figure 1F). Bgl3B (721 amino acid residues) has a slightly longer C-terminal stretch than ExoI, and this C-terminal extension adopts a FnIII (fibronectin type III)-like fold, which makes Bgl3B a three-domain enzyme (see Figure 1E).
β-Glucosidase from K. marxianus ATCC12424 (845 residues) shares 26% and 27% amino acid sequence identities with ExoI and Bgl3B respectively. Interestingly, it possesses a distinct domain architecture that is clearly different to ExoI and Bgl3B, i.e. its GH3-C domain [structurally corresponding to a (α/β)6-sandwich] is divided into two units by the insertion of a PA14 domain (pf07691). The PA14 domain, named after its location in the protective antigen of anthrax toxin, was originally identified by an iterative database search starting from the insertion sequences of GH3 enzymes. The crystal structure of anthrax toxin revealed that the PA14 domain consists of a two-layered β-sheet. The PA14 domain is found in a variety of proteins such as glycosidases, glycosyltransferases, proteases, amidases, toxins, adhesins and signalling molecules. The Flo1 protein of yeast is involved in the cell-wall sugar-mediated flocculation, and Kobayashi et al. localized the sugar-recognition site of the protein at its N-terminal PA14 domain. Human fibrocystin-L, mutation of which causes polycystic kidney and hepatic disease, also has the PA14 domain, and several mutations have been mapped in the domain. These observations suggest that the PA14 domain has a carbohydrate-binding role, and, in some cases, it plays a fundamental role in biological events.
Currently, the PA14 domain occurs in approx. 820 sequences, and approx. 230 are present as the insertion sequences in the GH3 enzymes. The PA14 domain also occurs in GH2, 10, 20 and 31; however, these GH enzymes have the PA14 domains separately from the catalytic core domains. Consequently, the GH3 enzymes with the PA14 domain insertions should be attractive targets to elucidate the structure–function relationship of GH3 enzymes and PA14 domains, individually and in combination. As representatives of GH3 enzymes with the PA14 domain insertion, two β-glucosidases from Agrobacterium tumefaciens (AtCbgI) [13,14] and Volvariella volvacea (VvBglII) and one β-xylosidase from Prevotella ruminicola (PrXyl3A) have been characterized. However, our knowledge regarding the PA14 domain still remains incomplete, due to the lack of structure-based biochemical analysis.
We have succeeded in molecular cloning, purification and crystallization of β-glucosidase from K. marxianus NBRC1777 (KmBglI), which shares 98% identity with β-glucosidase from K. marxianus ATCC12424. In the present paper, we report the detailed enzymatic and structural analyses of KmBglI. Structure-based mutational analyses revealed that the PA14 domain plays a critical role in determining the substrate specificity at subsite (+1). The binding mainly occurs at loop regions of the PA14 domain, and the critical residue for sugar binding is conserved among KmBglI, AtCbgI and VvBglII, but not in PrXyl3A, which shows a different substrate specificity to the other three enzymes. This is the first study that unequivocally demonstrates the interaction between the PA14 domain and a carbohydrate. Structural and functional similarities are also suggested between the PA14 domain of KmBglI and the N-terminal domain of β-galactosidase from Escherichia coli.
Strains and chemicals
E. coli BL21 (DE3) cells and pET3a were purchased from Novagen (EMD Chemicals). pNP (p-nitrophenyl)-glycosides were purchased from Wako Pure Chemical Industries. Sophorose, gentiobiose and 4-MU-Glc (4-methylumbelliferyl β-glucoside) were purchased from Sigma–Aldrich. Laminari- and cello-oligosaccharides were from Seikagaku Corporation. Laminaritetraose was further purified by size-exclusion chromatography (Bio-Gel P2; Bio-Rad).
The cDNA cloning of KmBglI has been described in our previous paper. To prepare the N-terminal His6-tagged protein, the bglI gene was amplified by PCR using KOD-plus polymerase (Toyobo) with the primer pairs (BglI-f and BglI-r) listed in Supplementary Table S1 (at http://www.BiochemJ.org/bj/431/bj4310039add.htm). The forward primer contained an NdeI site and six repeating CAC codons for histidine, and the reverse primer contained a BamHI site. The amplified fragment was inserted into the corresponding sites of a pET3a vector. The resulting plasmid pET3a-KmBglI was used to transform E. coli BL21 (DE3) cells. Expression and purification of the enzyme were performed as described in the Supplementary Experimental section (at http://www.BiochemJ.org/bj/431/bj4310039add.htm). The purified protein was dialysed against 20 mM citrate/phosphate buffer (pH 6.0) and concentrated by Nanosep 10K Omega (Pall). The protein concentration was calculated from the absorbance at 280 nm with a molar absorption coefficient of 105825 M−1·cm−1, which was estimated from the amino acid composition of KmBglI.
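The concentration calculation described above follows the Beer–Lambert relation, c = A280/(ε·l). A minimal Python sketch is given here for illustration only. It assumes a 1 cm path length (not stated in the text) and uses the calculated subunit mass of the non-tagged protein (94524 Da, quoted in the Results) to convert to mg/ml; the function names and the example absorbance are hypothetical.

```python
# Beer-Lambert estimate of protein concentration from the absorbance at 280 nm.
EPSILON_280 = 105825.0   # M^-1 cm^-1, molar absorption coefficient given in the text
SUBUNIT_MASS = 94524.0   # g/mol, calculated mass of the non-tagged subunit (from the text)


def molar_concentration(a280, epsilon=EPSILON_280, path_cm=1.0):
    """Return the subunit concentration in M. path_cm=1.0 is an assumption."""
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return a280 / (epsilon * path_cm)


def mass_concentration(a280, mass=SUBUNIT_MASS, epsilon=EPSILON_280, path_cm=1.0):
    """Return the concentration in mg/ml (numerically equal to g/l)."""
    return molar_concentration(a280, epsilon, path_cm) * mass


if __name__ == "__main__":
    a280 = 0.50  # hypothetical reading, not a value from the paper
    print(f"{molar_concentration(a280) * 1e6:.2f} uM subunit")
    print(f"{mass_concentration(a280):.3f} mg/ml")
```

The His6 tag adds slightly to the mass, so the mg/ml figure for the tagged enzyme would be marginally higher than this sketch reports.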
Enzyme assays were performed at the optimal pHs of the WT (wild-type) KmBglI. The substrates used were pNP-Glc (pNP-β-D-glucopyranoside), pNP-Xyl (pNP-β-D-xylopyranoside), pNP-Fuc (pNP-β-D-fucopyranoside), pNP-Ara (pNP-α-L-arabinofuranoside), pNP-Gal (pNP-β-D-galactopyranoside), pNP-GlcA (pNP-β-D-glucuronide), pNP-GlcNAc (pNP-N-acetyl-β-D-glucosaminide) and pNP-GalNAc (pNP-N-acetyl-β-D-galactosaminide). The reaction mixture contained 50 mM citrate/phosphate buffer (pH 5.5), substrate and enzyme in a total volume of 300 μl. After incubation for an appropriate time at 30 °C, the reaction was stopped by adding 300 μl of 1 M Na2CO3. The amounts of released pNP were determined by measuring the absorbance at 405 nm. When 4-MU-Glc was used as the substrate, fluorescence was detected by excitation and emission at wavelengths of 360 and 460 nm respectively. Activities toward β-linked gluco-oligosaccharides were examined in 50 mM citrate/phosphate buffer (pH 6.0). The reactions were performed at 30 °C, and stopped by heat treatment at 95 °C for 2 min. The amount of Glc released from disaccharides was measured using a glucose hexokinase assay kit (Sigma–Aldrich), in which the concentrations of the released Glc were estimated to be half of the total Glc concentrations. Activities on laminaritriose and cellotriose were determined by analysing the reaction products using high-performance anion-exchange chromatography with a CarboPac PA1 column, followed by pulsed amperometric detection (Dionex ICS3000). The elution was performed by a linear gradient of 0–0.5 M sodium acetate in 125 mM NaOH at a flow rate of 1 ml/min for 30 min.
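The glucose bookkeeping for the disaccharide assays can be sketched as follows. Each hydrolysed glucose disaccharide yields two Glc molecules, so the number of cleavage events is half of the total Glc measured. The function names, units and example numbers below are hypothetical and serve only to illustrate the arithmetic.

```python
def cleavage_events(total_glc_um, glc_per_event=2):
    """Convert total Glc (uM) from the hexokinase assay into hydrolysis events (uM).

    glc_per_event=2 corresponds to a glucose disaccharide (two Glc per cleavage).
    """
    if glc_per_event <= 0:
        raise ValueError("glc_per_event must be positive")
    return total_glc_um / glc_per_event


def turnover_rate(total_glc_um, enzyme_um, seconds, glc_per_event=2):
    """Return the apparent rate per enzyme subunit (s^-1) for one time point."""
    if enzyme_um <= 0 or seconds <= 0:
        raise ValueError("enzyme concentration and time must be positive")
    return cleavage_events(total_glc_um, glc_per_event) / (enzyme_um * seconds)


if __name__ == "__main__":
    # Hypothetical example: 100 uM Glc detected after 60 s with 0.02 uM enzyme.
    print(cleavage_events(100.0))            # hydrolysis events in uM
    print(turnover_rate(100.0, 0.02, 60.0))  # apparent rate in s^-1
```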
The kinetic parameters were calculated by curve fitting the experimental data with the Michaelis–Menten equation, using Grafit 4 (Erithacus Software). The substrate concentrations were varied from 0.3- to 2-fold the respective Km values, a range in which transglucosylation was not observed. When trisaccharides were used as the substrates, kcat/Km values were determined at low substrate concentrations (less than one-third of the Km value for the respective disaccharides). In the case of the hydrolysis of laminaritriose by the wild-type enzyme, Glc was the sole observable product because the formed laminaribiose was immediately hydrolysed; therefore the activity was estimated by dividing the amount of liberated Glc by three. In the other assays, activities were determined within the range in which the concentrations of mono- and di-saccharide formed from trisaccharide were identical.
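For readers who wish to reproduce this kind of analysis, the sketch below estimates Vmax and Km from rate data. It is not the nonlinear fit performed in Grafit; it uses the Hanes–Woolf linearization (S/v = S/Vmax + Km/Vmax) with ordinary least squares, chosen here only because it needs no external libraries. The function names and the synthetic, noise-free data are hypothetical; the substrate range of 0.3- to 2-fold Km mirrors the range stated above.

```python
def michaelis_menten(s, vmax, km):
    """Michaelis-Menten rate at substrate concentration s."""
    return vmax * s / (km + s)


def hanes_woolf_fit(substrate, rate):
    """Estimate (vmax, km) by least squares on S/v = S/Vmax + Km/Vmax."""
    if len(substrate) != len(rate) or len(substrate) < 2:
        raise ValueError("need at least two paired observations")
    x = list(substrate)
    y = [s / v for s, v in zip(substrate, rate)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                    # equals 1 / Vmax
    intercept = mean_y - slope * mean_x  # equals Km / Vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


if __name__ == "__main__":
    # Synthetic data generated from the laminaribiose values quoted in the Results
    # (kcat = 47 s^-1, Km = 0.96 mM); rates are expressed per enzyme subunit.
    true_kcat, true_km = 47.0, 0.96
    conc = [f * true_km for f in (0.3, 0.5, 1.0, 1.5, 2.0)]
    rates = [michaelis_menten(s, true_kcat, true_km) for s in conc]
    kcat, km = hanes_woolf_fit(conc, rates)
    print(f"kcat = {kcat:.2f} s^-1, Km = {km:.3f} mM")
```

With real, noisy data a weighted nonlinear fit is generally preferable, since linearizations distort the error structure.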
Modes of action of WT and Δ(503–512) enzymes on oligosaccharides were examined by TLC. The reaction mixtures containing 0.1 mM laminari- or 1 mM cello-oligosaccharides in a total volume of 100 μl were incubated at 30 °C for 30 min. The amounts of WT and Δ(503–512) enzymes added to the assay mixtures were standardized according to their enzyme activities on the respective disaccharides. The reaction products were concentrated by vacuum centrifugation and spotted on to a TLC plate (Silica gel 60; Merck). The plate was developed by 80% acetonitrile, and the carbohydrates were visualized by heating the plate after briefly soaking in orcinol/H2SO4 reagent.
Enzyme preparation, crystallization and MAD (multiple anomalous dispersion) data collection of SeMet (selenomethionine)-labelled protein have been described previously. The SOLVE/RESOLVE programs were used for site detection of selenium, phase calculation and initial model building of the MAD data set. The high-resolution data set [2.15 Å (1 Å=0.1 nm)] of SeMet-KmBglI was collected using synchrotron radiation (BL17A; Photon Factory, Tsukuba, Japan). Non-labelled, non-tagged KmBglI was prepared by the same procedure as for SeMet-KmBglI. The non-labelled KmBglI was crystallized at 20 °C using a reservoir solution consisting of 40 mM potassium dihydrogen phosphate (pH 5.1), 16% (w/v) PEG [poly(ethylene glycol)] 8000, 20% (v/v) glycerol and 10 mM Glc. The crystal was flash-cooled in a nitrogen stream at 100 K, and the X-ray diffraction data set was collected using synchrotron radiation (BL6A). The data sets were processed and scaled using HKL2000. The structure of non-labelled KmBglI was solved by starting from the refined SeMet-labelled structure. Visual inspection of the models was carried out using Coot. Water molecules were added using the built-in find-water function of Coot and individually checked for significant signal and consistent contact with H-bond donor/acceptor. Cycles of refinement without non-crystallographic symmetry restraints were run using Refmac5. The Figures were prepared using PyMOL (DeLano Scientific; http://www.pymol.org), and the structural alignment was performed by LSQMAN. The buried surface area was calculated by PISA.
Characterization of a recombinant KmBglI
A recombinant His-tagged KmBglI was used to determine the enzymatic properties. Although six histidine residues were inserted after the N-terminal methionine residue, the numbering of the residues has been given according to the non-tagged protein (1–845) throughout the present study. The purified protein migrated as a single band with an apparent molecular mass of 95000 Da on SDS/PAGE, which closely matches the calculated molecular mass (94524 Da). Size-exclusion chromatography gave a native molecular mass of 390 kDa, consistent with tetramer formation (Supplementary Figure S1 at http://www.BiochemJ.org/bj/431/bj4310039add.htm). The enzyme was stable up to 45 °C for 30 min and in a pH range of 4.5–9.0 (Supplementary Figure S1). The optimum temperature was 55 °C, and the optimal pH was 5.5 for pNP-Glc and 6.0 for laminaribiose (Supplementary Figure S1). Specific activities on pNP-Glc were identical between His-tagged and non-tagged enzymes (results not shown).
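The oligomeric state inferred above can be checked with a one-line calculation: the native mass divided by the subunit mass should be close to an integer. The sketch below uses the two masses quoted in the text; the function name is hypothetical, and the subunit mass is that of the non-tagged protein, so the ratio is approximate.

```python
def subunits_per_assembly(native_kda, subunit_da):
    """Return (raw ratio, nearest integer) of native mass to subunit mass."""
    if native_kda <= 0 or subunit_da <= 0:
        raise ValueError("masses must be positive")
    ratio = native_kda * 1000.0 / subunit_da
    return ratio, round(ratio)


if __name__ == "__main__":
    # 390 kDa from size-exclusion chromatography, 94524 Da calculated subunit mass.
    ratio, n = subunits_per_assembly(390.0, 94524.0)
    print(f"ratio = {ratio:.2f}, nearest oligomer = {n}")
```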
The glycon specificity of KmBglI was examined using pNP-monosaccharides (Table 1). The enzyme showed the highest activity towards pNP-Glc (kcat=150 s−1; kcat/Km=1500 mM−1·s−1) and a considerable activity towards pNP-Xyl (kcat=17 s−1; kcat/Km=76 mM−1·s−1), whereas it showed faint activity on pNP-Fuc, pNP-Ara and pNP-Gal (kcat/Km=2.0, 1.5 and 0.32 mM−1·s−1 respectively). No activity was detected for pNP-GalNAc, pNP-GlcNAc and pNP-GlcA. The substrate specificity was also examined by using β-glucosyl oligosaccharides (Table 1). KmBglI hydrolysed laminaribiose (47 s−1) and cellobiose (54 s−1) at a similar catalytic centre activity, but the Km value for cellobiose (17 mM) was 18-fold higher than that for laminaribiose (0.96 mM). The enzyme hydrolysed sophorose with a catalytic efficiency (3.5 mM−1·s−1) similar to that for cellobiose (3.2 mM−1·s−1), whereas it hydrolysed gentiobiose at the lowest efficiency (0.33 mM−1·s−1) among the disaccharides. The kcat/Km values for laminaritriose (0.27 mM−1·s−1) and cellotriose (0.086 mM−1·s−1) were 180- and 37-fold lower than those for laminaribiose and cellobiose respectively.
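The fold differences quoted above follow directly from the kinetic constants given in the text. The sketch below recomputes them; the dictionary layout and function names are hypothetical, and only values stated in this paragraph are used.

```python
# Wild-type constants quoted in the text (kcat in s^-1, Km in mM).
WT_DISACCHARIDES = {
    "laminaribiose": {"kcat": 47.0, "km": 0.96},
    "cellobiose": {"kcat": 54.0, "km": 17.0},
}
# kcat/Km values for the trisaccharides quoted in the text (mM^-1 s^-1).
WT_TRISACCHARIDES = {"laminaritriose": 0.27, "cellotriose": 0.086}


def catalytic_efficiency(kcat, km):
    """Return kcat/Km in mM^-1 s^-1."""
    if km <= 0:
        raise ValueError("Km must be positive")
    return kcat / km


def fold_difference(a, b):
    """Return how many times larger a is than b."""
    return a / b


if __name__ == "__main__":
    lam = catalytic_efficiency(**WT_DISACCHARIDES["laminaribiose"])
    cel = catalytic_efficiency(**WT_DISACCHARIDES["cellobiose"])
    print(f"laminaribiose kcat/Km = {lam:.1f}, cellobiose kcat/Km = {cel:.1f}")
    print(f"Km ratio (cellobiose/laminaribiose) = {fold_difference(17.0, 0.96):.1f}")
    print(f"biose/triose, laminari series = {fold_difference(lam, 0.27):.0f}")
    print(f"biose/triose, cello series = {fold_difference(cel, 0.086):.0f}")
```

The results agree with the rounded figures in the text (approx. 49 and 3.2 mM−1·s−1, an 18-fold Km difference, and approx. 180- and 37-fold drops for the trisaccharides).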
The crystal of SeMet-labelled protein was obtained without Glc, whereas the crystal of the non-labelled protein grew only in the presence of Glc. Consequently, the crystal structures of the apo-form of SeMet-labelled protein and the Glc-complexed form of the non-labelled protein were determined at 2.15 and 2.55 Å resolutions respectively. The statistics for data collection and refinement are given in Tables 2 and 3 respectively. Both crystals belong to the space group C2 and contain four subunits per ASU (asymmetric unit). There is no remarkable difference between the SeMet-labelled and non-labelled structures. Figure 1(A) shows the Glc complex structure in the ASU. The four molecules form a tetramer related by a non-crystallographic 222 symmetry. The buried molecular surface areas are approx. 1040 Å2 (AC and BD interface) and 1060 Å2 (AB and CD interface).
Overall structure of KmBglI
Figure 1(D) shows the monomeric structure of the Glc-complexed KmBglI. All four chains in the ASU have identical domain architectures consisting of an N-terminal (β/α)8-fold-like domain (residues 1–295; blue), an (α/β)6-sandwich domain (residues 307–381 and 560–658; green), a PA14 domain (residues 392–559; yellow) and a C-terminal domain (residues 700–845; orange). The rmsd (root-mean square deviation) for the Cα atoms among the four chains was within 0.23 Å, except for the PA14 domains which contain flexible regions with disordered structures (Figure 1B and Table 3). Comparative analysis using the Dali server revealed that KmBglI adopts a similar structure to Bgl3B (Z score=46.2, rmsd=2.1 Å for 652 residues) (Figures 1E and 2) and ExoI (Z score=35.5, rmsd=2.3 Å for 491 residues) (Figures 1F and 2) (the secondary structures are designated as those of ExoI). The N-terminal domain of ExoI adopts a canonical (β/α)8-barrel fold, whereas the N-terminal domain of KmBglI adopts a (β/α)8-barrel-like fold, in which strand-a and strand-c are connected by a short loop (residues 22–28) and consequently two α-helices corresponding to helix-A and -B of ExoI are deleted (Figures 1G and 1I). This large deletion is also observed in Bgl3B (Figure 1H). The topology of the N-terminal domain of KmBglI is thus interpreted as a ββ(β/α)6-barrel, in which the second β-strand is antiparallel. The second domain has a six-stranded β-sheet (β-strands i–n) sandwiched by α-helices to form an (α/β)6-sandwich structure. The patterns of the secondary structure of this domain are similar to those of Bgl3B and ExoI with some exceptions (discussed later) (Figures 1D–1F and 2). Interestingly, the chain of the second domain of KmBglI is separated into two units by the insertion of a PA14 domain (residues 392–559; yellow) between β-strand-k and helix-K (Figures 1D and 2).
This domain has the closest similarity to the PA14 domain of anthrax protective antigen (PDB code 1ACC) (Z score=9.4, rmsd=3.1 Å for 117 residues), although their amino acid sequences show less than 25% identity. Both of them have a jelly-roll fold with ten antiparallel β-strands (β-strands 1–10) which form a two-layered β-sheet. The other closely related structures are the PA14 domain of C2 toxin of Clostridium botulinum (PDB code 2J42) (Z score=8.5, rmsd=3.2 Å for 121 residues) and, interestingly, the N-terminal domain of GH2 enzymes including E. coli β-galactosidase (Z score=8.3, rmsd=3.2 Å for 121 residues), although they have not been recognized as PA14 domains. The C-terminal domain of KmBglI has an immunoglobulin-like fold, which is almost identical with the FnIII domain of Bgl3B (Figures 1 and 2).
The catalytic core of GH3 β-glucosidases consists of the N-terminal and (α/β)6-sandwich domains [4,7]. The two domains of KmBglI superimpose well with those of Bgl3B (Z score=41.4, rmsd=2.0 Å for 455 residues) and ExoI (Z score=34.0, rmsd=2.3 Å for 437 residues), and they share a common domain orientation. The buried areas of the N-terminal-(α/β)6-sandwich and N-terminal–C-terminal domain interfaces are approx. 1560 Å2 and 1100 Å2 respectively. The PA14 domains are also tightly associated with the other domains (average buried surface area is approx. 1200 Å2 in chains A, C and D); but some residues in chain B are disordered.
In the Glc-complex structure, each subunit in the ASU holds one Glc molecule. All Glc molecules are present in the β-anomeric state and adopt a 4C1 chair conformation (Figure 1C). The Glc molecule was bound in a cleft located between the N-terminal and (α/β)6-sandwich domains, and the cleft is covered by the loops (including Phe445 and Phe508, see below) of the PA14 domain. The β-anomeric hydroxy group forms a hydrogen bond with Glu590 (2.6 Å), and the anomeric carbon is located at a distance of 2.9 Å from Asp225. In the SeMet-labelled protein, a glycerol molecule (cryoprotectant) occupies each of the four Glc-binding sites (Supplementary Figure S2 at http://www.BiochemJ.org/bj/431/bj4310039add.htm).
Characterization of mutant forms of KmBglI
To better understand the catalytic mechanism of KmBglI, mutant enzymes [D225A, F445A, F508A, E590A and Δ(503–512)] were constructed and characterized (see the Supplementary Experimental section). Asp225 and Glu590 were assumed to be the catalytic nucleophile and acid/base residues respectively. Phe445, Phe508 and the region 503–512 were assumed to be involved in the subsite (+1) formation. The kinetic parameters of D225A and E590A mutants for 4-MU-Glc are summarized in Supplementary Table S2 (at http://www.BiochemJ.org/bj/431/bj4310039add.htm). The Km and kcat values of F445A and F508A for pNP-Glc were slightly decreased as compared with those of the WT enzyme (Table 4). The F445A substitution did not significantly alter the kinetic parameters for laminari- and cello-biose, whereas the F508A replacement led to an approx. 6-fold increase of the Km values and 4-fold decrease of the kcat values for the disaccharides. The kinetic parameters of the deletion mutant Δ(503–512) for pNP-Glc, laminaribiose and cellobiose were not significantly different to those of the F508A mutant. The modes of action of WT and Δ(503–512) enzymes on laminari- and cello-oligosaccharides were examined by TLC (Figures 3A and 3B). The WT enzyme hydrolysed the disaccharides significantly faster than the tri- and tetra-saccharides, whereas the Δ(503–512) mutant released glucose from the oligosaccharides (degree of polymerization ≥3) at a level comparable with that observed for the disaccharides. The kcat/Km values of the Δ(503–512) mutant for laminari- and cello-trioses were calculated to be 0.17 and 0.02 mM−1·s−1, being approx. 63% and 23% of the values of the WT enzyme respectively.
Tetramer formation of KmBglI
KmBglI forms a tetramer with tight subunit–subunit interactions in the crystal, reflecting the tetrameric form in solution. Most of the reported GH3 enzymes, including ExoI, exist as monomers, but Clostridium thermocellum β-glucosidase (CtBglB), VvBglII and PrXyl3A have been shown to form a dimer, tetramer and nonamer respectively. The crystal structure of KmBglI suggests that the tetramer formation is not essential for catalysis because the subunit interfaces are far from the Glc-binding site.
Structure and specificity of subsite (−1)
GH3 includes enzymes acting on β-glucoside, β-xyloside, β-N-acetylglucosaminide and α-L-arabinofuranoside. Interestingly, the glycon specificity is diverse across the members; some are quite specific to one glycosidic linkage, whereas others can act on two or more linkages. KmBglI hydrolyses pNP-Glc with a high catalytic efficiency and hydrolyses pNP-Xyl, pNP-Fuc, pNP-Ara and pNP-Gal with lower efficiencies (Table 1). TnBglB, the closest homologue of Bgl3B (96% identity), hydrolyses pNP-Glc most efficiently and pNP-Xyl, pNP-Fuc and pNP-Gal at the levels of 2–43% of pNP-Glc. In contrast, ExoI is reported to be specific to β-glucoside.
The KmBglI–Glc complex structure was compared with the Glc complex structure of Bgl3B (Figure 4A) and the S-laminaribioside (Glcβ1–3SGlcβ1–3OGlc–thiopNP) complex structure of ExoI (Figure 4B). The Glc molecule of KmBglI overlapped with the Glc in Bgl3B and the non-reducing end Glc of S-laminaribioside in ExoI; thus it must occupy the subsite (−1). The catalytic residues of Bgl3B (Asp242 and Glu458) and ExoI (Asp285 and Glu491) are well overlapped on Asp225 and Glu590 of KmBglI. The results of the mutational study revealed the importance of these residues (Supplementary Table S2). Pozzo et al. reported that the rotamer angle of the tryptophan residue that is next to the nucleophile significantly differs between the structures of Bgl3B (χ=−92 °) and ExoI (χ=−169 °). ExoI uses this tryptophan residue for the formation of subsite (+1), whereas Bgl3B uses it for the formation of subsite (−1). In KmBglI, the rotamer angle of the tryptophan residue (Trp226) is quite similar (χ=−90 °) to that of Bgl3B, and the structure of subsite (−1) of KmBglI is almost identical with that of Bgl3B. The strict substrate specificity of ExoI and the relatively relaxed substrate specificities of Bgl3B and KmBglI may arise from their structural differences at subsite (−1).
Structure and specificity of subsite (+1)
KmBglI hydrolysed sophorose, laminaribiose, cellobiose and gentiobiose with varied catalytic efficiencies (0.33–49 mM−1·s−1), whereas ExoI can hydrolyse them at a comparable level. In the S-laminaribioside complex structure of ExoI, the reducing end Glc is clamped by Trp286 and Trp434 at subsite (+1) (Figures 4B and 4E). When a laminaribiose molecule was modelled into the structure of KmBglI by reference to S-laminaribioside in ExoI, the side chains of Phe445 and Phe508 from the PA14 domain sandwich the Glc molecule (Figures 4B and 4C). Phe445 of KmBglI is located at the structurally neighbouring position of Trp286 of ExoI, and its side chain forms hydrophobic interactions with C5 and C6 of the Glc. This residue is not strongly conserved among the PA14 domains inserted into GH3 enzymes (Figure 5), and the F445A substitution had a moderate effect on the activity (Table 4). These results suggest that Phe445 is important, but not essential, for the catalytic activity of KmBglI. On the other hand, the side chain of Phe508 in the loop linking β-strands 7 and 8 (7-8loop) (Figure 5) is located at an almost identical position with that of Trp434 of ExoI, although the positions of their Cα atoms are 8.3 Å apart (Figure 4B). The F508A substitution had a pronounced effect on the activities for disaccharides. The catalytic efficiencies toward laminaribiose and cellobiose decreased to 5 and 3% of the WT enzyme respectively. These results, and its conservation among the PA14 domain sequences inserted into GH3 (Figure 5), indicate that Phe508 plays a significant role in the catalytic activity. The PA14 domain-mediated subsite (+1) formation of KmBglI is thus totally different to that of ExoI, for which Trp286 from the (β/α)8-fold domain and Trp434 from the (α/β)6-sandwich domain are responsible. This structural difference may explain the difference in the catalytic efficiencies between the enzymes towards β-glucosides.
In Bgl3B, the edge strand in the (α/β)6-sandwich domain (β-strand-k of ExoI and KmBglI) is missing and the corresponding region is replaced by an α-helix and a long unstructured loop (residues 409–434) (Figure 2). This loop, although its tip is not visible in the structure, is presumably involved in subsite (+1) formation (Figure 4D). These findings suggest that the three enzymes have independently evolved with respect to subsite (+1) formation.
Several GH3 enzymes effectively hydrolyse oligosaccharides. The ratios of the kcat/Km values of ExoI and the relative activities of CtBglB and Azospirillum irakense β-glucosidase (AiCelA) for cellobiose/cellotriose/cellotetraose are reported to be 0.1:1:1.5, 3.2:1:0.9 and 2.6:1:1 respectively (Figure 3C). In contrast, KmBglI, AtCbgI and VvBglII, all of which possess a PA14 domain insertion, are quite specific towards small substrates. The kcat/Km ratios of KmBglI for laminaribiose/laminaritriose and cellobiose/cellotriose are 180:1 and 37:1 respectively (Figure 3C), and AtCbgI and VvBglII cannot release glucose from pNP-cellobioside [14,15]. The difference of the chain-length specificities between ExoI and KmBglI can be explained by the structural difference at the entrances of their catalytic pockets. The active site of ExoI is covered by a loop between strand-j and helix-J2 (30 residues including Trp434) from the (α/β)6-sandwich domain, which results in the formation of a deep pocket (Figure 4E). In contrast, the corresponding loop of KmBglI takes a different conformation and yields a shallow pocket. Instead, the PA14 domain covers the active site and consequently deepens it. The PA14 domain approaches the active site via two loops; one is a loop linking β-strands 3 and 4 (residues 444–459 including Phe445) and the other is the 7-8loop (residues 498–514 including Phe508). The latter may sterically hinder the binding of long-chain oligosaccharides (Figure 4C and see below). PrXyl3A, which also possesses a PA14 domain insertion, is able to act on xylo-oligosaccharides, as well as xylobiose. Sequence alignment of the PA14 domains from four enzymes showed that PrXyl3A lacks the region corresponding to the 7-8loop of KmBglI (Figure 5). Accordingly, we constructed the loop deletion mutant, Δ(503–512), and examined its substrate specificity (Figure 3 and Table 4).
The deletion caused a drastic decrease of the kcat/Km values for the disaccharides to 1% of the WT enzyme, probably due to the lack of Phe508. However, remarkably, the mutant retained 63% and 23% activities towards laminari- and cello-trioses respectively, as compared with the WT enzyme. Consequently, the ratios of the kcat/Km values of the mutant for laminaribiose/laminaritriose and cellobiose/cellotriose became 3:1 and 2:1 respectively, indicating that the mutant no longer showed the chain-length specificity (Figure 3C).
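The loss of chain-length specificity in the deletion mutant can be illustrated numerically. The sketch below uses the kcat/Km values quoted in the text for the WT enzyme and for the mutant acting on trisaccharides. The mutant values for disaccharides are not quoted individually, so they are approximated here as 1% of the WT values, following the statement above; this approximation and all names in the code are assumptions made for illustration.

```python
# kcat/Km values in mM^-1 s^-1 taken from the text.
WT = {"laminaribiose": 49.0, "laminaritriose": 0.27,
      "cellobiose": 3.2, "cellotriose": 0.086}
MUTANT_TRIOSES = {"laminaritriose": 0.17, "cellotriose": 0.02}

# Assumption: mutant disaccharide efficiencies approximated as 1% of WT.
MUTANT_BIOSE_FRACTION = 0.01
MUTANT = dict(MUTANT_TRIOSES)
MUTANT["laminaribiose"] = MUTANT_BIOSE_FRACTION * WT["laminaribiose"]
MUTANT["cellobiose"] = MUTANT_BIOSE_FRACTION * WT["cellobiose"]


def retained_percent(mutant_value, wt_value):
    """Return the mutant efficiency as a percentage of the WT efficiency."""
    return 100.0 * mutant_value / wt_value


def biose_to_triose_ratio(table, series):
    """Return kcat/Km(biose) / kcat/Km(triose) for 'laminari' or 'cello'."""
    return table[series + "biose"] / table[series + "triose"]


if __name__ == "__main__":
    for series in ("laminari", "cello"):
        triose = series + "triose"
        print(series,
              f"WT ratio = {biose_to_triose_ratio(WT, series):.0f}:1,",
              f"mutant ratio = {biose_to_triose_ratio(MUTANT, series):.1f}:1,",
              f"triose activity retained = "
              f"{retained_percent(MUTANT[triose], WT[triose]):.0f}%")
```

Under these assumptions the mutant ratios come out close to the 3:1 and 2:1 figures stated above, compared with approx. 180:1 and 37:1 for the WT enzyme.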
In ExoI, a helix-like loop that connects the (β/α)8-barrel and (α/β)6-sandwich domains could act as a hinge that allows the domains to move relative to each other during successive catalytic events [32–34]. In KmBglI, the N-terminal and (α/β)6-sandwich domains are also connected by a linker, but the domains are unlikely to move because the tetrameric assembly would limit such movement. Alternatively, the significantly higher B-factors of the PA14 domain suggest that subsite (+1) is flexible, which could facilitate the capture of the substrates (Figure 1B and Table 3). Formation of the catalytic pocket via a domain–domain interaction has been reported in several GHs. For example, an active site of E. coli β-galactosidase (LacZ) is formed by the interaction (known as α-complementation) between the (β/α)8-barrel domain and the N-terminal domain that adopts a similar structure to the PA14 domain (Z score=8.3, rmsd=3.2 Å for 121 residues). Interestingly, both the PA14 domain of KmBglI and the N-terminal domain of LacZ participate in their active site formation via loops extending from the same edge of the domains (results not shown).
The PA14 domain frequently occurs in GHs, bacterial toxins, yeast adhesins and mammalian signalling molecules, and the carbohydrate-binding function has long been claimed without any structural evidence. In the present study, using the KmBglI protein, we revealed the structural basis of the interaction between the PA14 domain and a carbohydrate. Unfortunately, since the sequences of PA14 domains are not strongly conserved, especially at the loop regions, the role(s) of the domain cannot be generalized at present. But, particularly for GH3 members, the domain could form the subsite(s) and affect the substrate specificity. The overall structure of KmBglI is similar to those of Bgl3B and ExoI, indicating that these enzymes have evolved from a common ancestor. However, their active-site structures are quite different. KmBglI appears to have evolved to become more active on small substrates by accepting the insertion of the PA14 domain into the (α/β)6-sandwich.
Erina Yoshida performed the biochemical experiments, interpreted the results and participated in the writing of the paper. Masafumi Hidaka performed the structural analysis, interpreted the results and participated in the writing of the paper. Shinya Fushinobu supervised the structural studies. Takashi Koyanagi, Hiromichi Minami and Hisanori Tamaki provided their expertise in genetic experiments. Motomitsu Kitaoka provided his expertise in enzymology. Takane Katayama designed the research, interpreted the results and edited the paper. Hidehiko Kumagai designed the research and obtained the funding for the work.
This work was supported, in part, by a Grant-In-Aid from the New Energy and Industrial Technology Development Organization (NEDO), Japan.
We thank the staff of the Photon Factory for the X-ray data collection and Dr N. Nagano for helpful comments on our manuscript.
The atomic co-ordinates and structure factors (PDB codes 3ABZ and 3AC0) have been deposited in the Protein Data Bank.
Abbreviations: ASU, asymmetric unit; AtCbgI, Agrobacterium tumefaciens β-glucosidase; Bgl3B, Thermotoga neapolitana DSM4359 β-glucosidase; CtBglB, Clostridium thermocellum β-glucosidase; ExoI, Hordeum vulgare β-D-glucan glucohydrolase; FnIII, fibronectin type III; GH, glycoside hydrolase; KmBglI, Kluyveromyces marxianus NBRC1777 β-glucosidase; MAD, multiple anomalous dispersion; 4-MU, 4-methylumbelliferyl; NagZ, Vibrio cholerae β-N-acetylhexosaminidase; pNP, p-nitrophenyl; PrXyl3A, Prevotella ruminicola β-xylosidase; rmsd, root-mean square deviation; SeMet, selenomethionine; TnBglB, Thermotoga neapolitana Z2706-MC24 β-glucosidase; VvBglII, Volvariella volvacea β-glucosidase II; WT, wild-type
- © The Authors Journal compilation © 2010 Biochemical Society