Biochemical Journal

Research article

Role of a PA14 domain in determining substrate specificity of a glycoside hydrolase family 3 β-glucosidase from Kluyveromyces marxianus

Erina Yoshida, Masafumi Hidaka, Shinya Fushinobu, Takashi Koyanagi, Hiromichi Minami, Hisanori Tamaki, Motomitsu Kitaoka, Takane Katayama, Hidehiko Kumagai


β-Glucosidase from Kluyveromyces marxianus (KmBglI) belongs to the GH3 (glycoside hydrolase family 3). The enzyme is particularly unusual in that a PA14 domain (pf07691), for which a carbohydrate-binding role has been claimed, is inserted into the catalytic core sequence. In the present study, we determined the enzymatic properties and the crystal structure of KmBglI in complex with glucose at 2.55 Å (1 Å=0.1 nm) resolution. A striking characteristic of KmBglI is that its activity is essentially limited to disaccharides; when trisaccharides were used as substrates, the activity decreased drastically. This chain-length specificity is in sharp contrast with the preferred action on oligosaccharides of barley β-D-glucan glucohydrolase (ExoI), which does not have a PA14 domain insertion. The structure of subsite (−1) of KmBglI is almost identical with that of Thermotoga neapolitana β-glucosidase and is also similar to that of ExoI; however, the structures of subsite (+1) differ significantly among them. In KmBglI, loops extending from the PA14 domain cover the catalytic pocket to form subsite (+1), and hence simultaneously create a steric hindrance that could limit the chain length of the substrates that can be accommodated. Mutational studies demonstrated the critical role of the loop regions in determining the substrate specificity. The active-site formation mediated by the PA14 domain of KmBglI is reminiscent of the α-complementation of β-galactosidase exerted by its N-terminal domain, to which the PA14 domain shows structural resemblance. The present study is the first to reveal the structural basis of the interaction between a PA14 domain and a carbohydrate.

  • chain-length specificity
  • crystal structure
  • β-glucosidase
  • glycoside hydrolase family 3
  • PA14 domain
  • subsite (+1)


β-Glucosidases (EC occur in most living organisms and play fundamental roles in various biological processes. The enzymes catalyse the hydrolysis of O-glucosidic bonds to release Glc (glucose) units from the non-reducing end. The β-glucosidase gene from Kluyveromyces marxianus ATCC12424 (formerly known as K. fragilis) has been cloned and its amino acid sequence determined [1]. The enzyme was classified in GH3 (glycoside hydrolase family 3) [2], one of the largest families in the CAZy database [3]. Currently, GH3 comprises approx. 2400 members. Despite the large number of sequences, structural information is available for only five enzymes, i.e. β-D-glucan glucohydrolase from Hordeum vulgare (ExoI) [4], two β-N-acetylhexosaminidases from Vibrio cholerae (NagZ) and Bacillus subtilis (YbbD) (PDB code 3BMX) [5,6], exo-1,3-1,4-β-glucanase (ExoP) from Pseudoalteromonas sp. BB1 (PDB code 3F93) and β-glucosidase from Thermotoga neapolitana (Bgl3B) [7]. Among them, ExoI, NagZ and Bgl3B have been characterized both structurally and enzymatically. The domain compositions of these three enzymes are quite different. NagZ, which consists of 330 residues, is a single-domain enzyme with a (β/α)8-barrel fold. ExoI, a protein of 605 residues, consists of two domains: the N-terminal (β/α)8-barrel domain and the C-terminal (α/β)6-sandwich domain (see Figure 1F). Bgl3B (721 amino acid residues) has a slightly longer C-terminal stretch than ExoI, and this C-terminal extension adopts an FnIII (fibronectin type III)-like fold, which makes Bgl3B a three-domain enzyme (see Figure 1E).

Figure 1 The overall structure of KmBglI, T. neapolitana β-glucosidase (Bgl3B) and barley β-D-glucan glucohydrolase (ExoI)

The N-terminal domain (blue), (α/β)6-sandwich domain (green), PA14 domain (yellow), C-terminal domain (orange) and linker region (black) are shown. The bound Glc (KmBglI and Bgl3B) and S-laminaribioside (ExoI) molecules are shown as sphere models (red). (A) The tetrameric structure of the Glc complex of KmBglI. Chain A is shown as a ribbon model and the other chains are shown as Cα trace models. (B) Structural differences between the four PA14 domains of KmBglI. The chains are superimposed excluding the PA14 domain, and coloured as described above. (C) The liganded Glc molecule is shown as a ball-and-stick model with the |Fo|−|Fc| electron density map (2.5σ). The broken lines indicate hydrogen bonds. Residues are coloured as described above. (D–F) The monomer structures of KmBglI (D), Bgl3B (E) and ExoI (F). Bound molecules in the active site and the catalytic residues are shown as sphere and stick models respectively. (G–I) The structural differences among the N-terminal domains of KmBglI (G), Bgl3B (H) and ExoI (I). The domains are shown as ribbon diagrams, and the A−1 and A−2 helices located at the N-terminus of the domain are coloured cyan. The secondary structures that form the (β/α)8-barrel scaffold are labelled. The topologically different chains are coloured magenta.

β-Glucosidase from K. marxianus ATCC12424 (845 residues) shares 26% and 27% amino acid sequence identity with ExoI and Bgl3B respectively. Interestingly, it possesses a domain architecture clearly different from those of ExoI and Bgl3B, i.e. its GH3-C domain [structurally corresponding to an (α/β)6-sandwich] is divided into two units by the insertion of a PA14 domain (pf07691) [8]. The PA14 domain, named after its location in the protective antigen of anthrax toxin [9], was originally identified by an iterative database search starting from the insertion sequences of GH3 enzymes [10]. The crystal structure of anthrax toxin revealed that the PA14 domain consists of a two-layered β-sheet [9]. The PA14 domain is found in a variety of proteins such as glycosidases, glycosyltransferases, proteases, amidases, toxins, adhesins and signalling molecules [10]. The yeast Flo1 protein is involved in flocculation mediated by cell-wall sugars, and Kobayashi et al. [11] localized the sugar-recognition site of the protein to its N-terminal PA14 domain. Human fibrocystin-L, mutation of which causes polycystic kidney and hepatic disease, also has a PA14 domain, and several mutations have been mapped to this domain [12]. These observations suggest that the PA14 domain has a carbohydrate-binding role and, in some cases, plays a fundamental role in biological events.

Currently, the PA14 domain occurs in approx. 820 sequences, approx. 230 of which are present as insertion sequences in GH3 enzymes. The PA14 domain also occurs in GH2, 10, 20 and 31 enzymes; however, in these enzymes the PA14 domain is separate from the catalytic core domains [10]. Consequently, GH3 enzymes with PA14 domain insertions should be attractive targets for elucidating the structure–function relationships of GH3 enzymes and PA14 domains, individually and in combination. As representatives of GH3 enzymes with a PA14 domain insertion, two β-glucosidases from Agrobacterium tumefaciens (AtCbgI) [13,14] and Volvariella volvacea (VvBglII) [15] and one β-xylosidase from Prevotella ruminicola (PrXyl3A) [16] have been characterized. However, our knowledge of the PA14 domain remains incomplete owing to the lack of structure-based biochemical analysis.

We have previously reported the molecular cloning, purification and crystallization of β-glucosidase from K. marxianus NBRC1777 (KmBglI), which shares 98% identity with β-glucosidase from K. marxianus ATCC12424 [17]. In the present paper, we report detailed enzymatic and structural analyses of KmBglI. Structure-based mutational analyses revealed that the PA14 domain plays a critical role in determining the substrate specificity at subsite (+1). The binding occurs mainly at loop regions of the PA14 domain, and the residue critical for sugar binding is conserved among KmBglI, AtCbgI and VvBglII, but not in PrXyl3A, which shows a substrate specificity different from those of the other three enzymes. This is the first study that unequivocally demonstrates the interaction between a PA14 domain and a carbohydrate. We also suggest structural and functional similarities between the PA14 domain of KmBglI and the N-terminal domain of β-galactosidase from Escherichia coli.


Strains and chemicals

E. coli BL21 (DE3) cells and pET3a were purchased from Novagen (EMD Chemicals). pNP (p-nitrophenyl)-glycosides were purchased from Wako Pure Chemical Industries. Sophorose, gentiobiose and 4-MU-Glc (4-methylumbelliferyl β-glucoside) were purchased from Sigma–Aldrich. Laminari- and cello-oligosaccharides were from Seikagaku Corporation. Laminaritetraose was further purified by size-exclusion chromatography (Bio-Gel P2; Bio-Rad).

Enzyme preparation

The cDNA cloning of KmBglI has been described in our previous paper [17]. To prepare the N-terminal His6-tagged protein, the bglI gene was amplified by PCR using KOD-plus polymerase (Toyobo) with the primer pair (BglI-f and BglI-r) listed in Supplementary Table S1. The forward primer contained an NdeI site and six repeating CAC codons for histidine, and the reverse primer contained a BamHI site. The amplified fragment was inserted into the corresponding sites of the pET3a vector. The resulting plasmid, pET3a-KmBglI, was used to transform E. coli BL21 (DE3) cells. Expression and purification of the enzyme were performed as described in the Supplementary Experimental section. The purified protein was dialysed against 20 mM citrate/phosphate buffer (pH 6.0) and concentrated using Nanosep 10K Omega (Pall). The protein concentration was calculated from the absorbance at 280 nm with a molar absorption coefficient of 105825 M−1·cm−1, which was estimated from the amino acid composition of KmBglI.
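As a worked illustration of the concentration calculation, the short Python sketch below applies the Beer–Lambert law (A = ε·c·l) with the stated ε280 of 105825 M−1·cm−1. The 1 cm default path length, the use of the calculated subunit mass (94524 Da, ignoring the small contribution of the His6 tag) to convert molar to mass concentration, and the function name are assumptions made for illustration.

```python
# Hypothetical helper: protein concentration from A280 via the Beer-Lambert law.
EPSILON_280 = 105825.0     # molar absorption coefficient, M^-1 cm^-1 (value from the text)
SUBUNIT_MASS_DA = 94524.0  # calculated subunit mass, Da (His6 tag ignored; an assumption)


def concentration_from_a280(a280: float, path_cm: float = 1.0) -> tuple[float, float]:
    """Return (subunit concentration in uM, mass concentration in mg/ml)."""
    molar = a280 / (EPSILON_280 * path_cm)  # c = A / (epsilon * l), in mol/l
    mg_per_ml = molar * SUBUNIT_MASS_DA     # mol/l * g/mol = g/l = mg/ml
    return molar * 1e6, mg_per_ml


# Example with an arbitrary absorbance reading of 1.0 in a 1 cm cuvette.
conc_um, conc_mg_ml = concentration_from_a280(1.0)
print(f"{conc_um:.2f} uM subunit, {conc_mg_ml:.3f} mg/ml")
```

An A280 of 1.0 thus corresponds to roughly 9.4 µM subunit, or about 0.89 mg/ml.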

Enzyme assay

Enzyme assays were performed at the optimal pH values of WT (wild-type) KmBglI. The substrates used were pNP-Glc (pNP-β-D-glucopyranoside), pNP-Xyl (pNP-β-D-xylopyranoside), pNP-Fuc (pNP-β-D-fucopyranoside), pNP-Ara (pNP-α-L-arabinofuranoside), pNP-Gal (pNP-β-D-galactopyranoside), pNP-GlcA (pNP-β-D-glucuronide), pNP-GlcNAc (pNP-N-acetyl-β-D-glucosaminide) and pNP-GalNAc (pNP-N-acetyl-β-D-galactosaminide). The reaction mixture contained 50 mM citrate/phosphate buffer (pH 5.5), substrate and enzyme in a total volume of 300 μl. After incubation for an appropriate time at 30 °C, the reaction was stopped by adding 300 μl of 1 M Na2CO3. The amount of released pNP was determined by measuring the absorbance at 405 nm. When 4-MU-Glc was used as the substrate, fluorescence was measured at excitation and emission wavelengths of 360 and 460 nm respectively. Activities towards β-linked gluco-oligosaccharides were examined in 50 mM citrate/phosphate buffer (pH 6.0). The reactions were performed at 30 °C and stopped by heat treatment at 95 °C for 2 min. The Glc released from disaccharides was measured using a glucose hexokinase assay kit (Sigma–Aldrich); because each hydrolysed disaccharide yields two Glc molecules, the amount of hydrolysed disaccharide was taken as half of the total Glc detected. Activities on laminaritriose and cellotriose were determined by analysing the reaction products by high-performance anion-exchange chromatography on a CarboPac PA1 column with pulsed amperometric detection (Dionex ICS3000). Elution was performed with a linear gradient of 0–0.5 M sodium acetate in 125 mM NaOH at a flow rate of 1 ml/min for 30 min.
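Converting the Glc detected into the amount of substrate turned over is simple stoichiometry: two Glc per disaccharide hydrolysed and, for laminaritriose with the WT enzyme (where the laminaribiose formed is immediately hydrolysed, as noted in the kinetic-analysis paragraph), three Glc per trisaccharide. The sketch below expresses this as a turnover rate. The function name, dictionary keys and example numbers are hypothetical, and a single end-point measurement within the initial-rate range is assumed.

```python
# Glc molecules released per substrate molecule hydrolysed (stoichiometry from the Methods).
GLC_PER_SUBSTRATE = {
    "disaccharide": 2,       # Glc-Glc -> 2 Glc
    "laminaritriose_wt": 3,  # WT: laminaribiose formed is hydrolysed at once -> 3 Glc
}


def turnover_rate(glc_released_um: float, minutes: float, enzyme_nm: float, case: str) -> float:
    """Hypothetical helper: substrate hydrolysed per enzyme molecule per second (s^-1).

    Assumes a single end-point measurement taken within the initial-rate range.
    """
    substrate_um = glc_released_um / GLC_PER_SUBSTRATE[case]  # uM substrate hydrolysed
    rate_um_per_s = substrate_um / (minutes * 60.0)            # uM s^-1
    return rate_um_per_s / (enzyme_nm / 1000.0)                # divide by [E] in uM


# Hypothetical example: 120 uM Glc after 10 min with 10 nM enzyme on a disaccharide.
print(turnover_rate(120.0, 10.0, 10.0, "disaccharide"))
```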

The kinetic parameters were calculated by fitting the experimental data to the Michaelis–Menten equation using Grafit 4 (Erithacus Software). The substrate concentrations were varied from 0.3- to 2-fold the respective Km values, a range in which transglucosylation was not observed. When trisaccharides were used as substrates, kcat/Km values were determined at low substrate concentrations (less than one-third of the Km value for the respective disaccharides). In the hydrolysis of laminaritriose by the WT enzyme, Glc was the sole observable product because the laminaribiose formed was immediately hydrolysed; the activity was therefore estimated by dividing the amount of liberated Glc by three. In the other assays, activities were determined within the range in which the concentrations of the mono- and di-saccharide formed from the trisaccharide were equal.
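The fitting step can be illustrated without Grafit. The Python sketch below is not Grafit's algorithm; it is a minimal nonlinear least-squares fit of v = Vmax[S]/(Km + [S]) that exploits the fact that, for a fixed Km, the best Vmax has a closed form, and then minimizes the residual over Km by golden-section search. It also includes the low-substrate estimate v ≈ (kcat/Km)[E][S] used for the trisaccharides. The search bounds, the enzyme concentration and the noise-free synthetic data (built from the laminaribiose constants in Table 1) are assumptions for illustration, and all function names are hypothetical.

```python
import math


def mm_rate(s: float, vmax: float, km: float) -> float:
    """Michaelis-Menten rate: v = Vmax*[S] / (Km + [S])."""
    return vmax * s / (km + s)


def fit_michaelis_menten(s_values, v_values, km_lo, km_hi, tol=1e-10):
    """Least-squares estimates of (Vmax, Km).

    For fixed Km the model is linear in Vmax, so Vmax has a closed form; the
    remaining 1-D problem in Km is solved by golden-section search, assuming
    the residual is unimodal on [km_lo, km_hi].
    """
    def best_vmax(km):
        f = [s / (km + s) for s in s_values]
        return sum(v * fi for v, fi in zip(v_values, f)) / sum(fi * fi for fi in f)

    def sse(km):
        vmax = best_vmax(km)
        return sum((v - mm_rate(s, vmax, km)) ** 2 for s, v in zip(s_values, v_values))

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = km_lo, km_hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if sse(c) < sse(d):  # minimum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:                # minimum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    km = 0.5 * (a + b)
    return best_vmax(km), km


def kcat_over_km_low_s(v: float, enzyme: float, s: float) -> float:
    """Apparent kcat/Km from a rate measured at [S] << Km, where v ~ (kcat/Km)[E][S]."""
    return v / (enzyme * s)


# Synthetic, noise-free data from kcat = 47 s^-1 and Km = 0.96 mM (laminaribiose, Table 1)
# with a hypothetical [E] of 0.001 mM; [S] spans 0.3- to 2-fold Km as in the Methods.
E_MM = 0.001
s_grid = [f * 0.96 for f in (0.3, 0.5, 1.0, 1.5, 2.0)]
v_obs = [mm_rate(s, 47.0 * E_MM, 0.96) for s in s_grid]
vmax_fit, km_fit = fit_michaelis_menten(s_grid, v_obs, km_lo=0.01, km_hi=10.0)
kcat_fit = vmax_fit / E_MM
low_s_estimate = kcat_over_km_low_s(mm_rate(0.01, 47.0 * E_MM, 0.96), E_MM, 0.01)
print(f"kcat = {kcat_fit:.3f} s^-1, Km = {km_fit:.4f} mM, low-[S] kcat/Km = {low_s_estimate:.1f}")
```

The low-[S] estimate slightly underestimates kcat/Km (here by about 1%, since [S]/Km ≈ 0.01), which is why that approach is restricted to substrate concentrations well below Km.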


The modes of action of WT and Δ(503–512) enzymes on oligosaccharides were examined by TLC. Reaction mixtures containing 0.1 mM laminari- or 1 mM cello-oligosaccharides in a total volume of 100 μl were incubated at 30 °C for 30 min. The amounts of WT and Δ(503–512) enzymes added to the assay mixtures were standardized according to their activities on the respective disaccharides. The reaction products were concentrated by vacuum centrifugation and spotted onto a TLC plate (Silica gel 60; Merck). The plate was developed with 80% acetonitrile, and the carbohydrates were visualized by heating the plate after briefly soaking it in orcinol/H2SO4 reagent.


Enzyme preparation, crystallization and MAD (multi-wavelength anomalous dispersion) data collection of SeMet (selenomethionine)-labelled protein have been described previously [17]. The SOLVE/RESOLVE programs were used for selenium site detection, phase calculation and initial model building from the MAD data set [18]. The high-resolution data set [2.15 Å (1 Å=0.1 nm)] of SeMet-KmBglI was collected using synchrotron radiation (BL17A; Photon Factory, Tsukuba, Japan). Non-labelled, non-tagged KmBglI was prepared by the same procedure as SeMet-KmBglI. The non-labelled KmBglI was crystallized at 20 °C using a reservoir solution consisting of 40 mM potassium dihydrogen phosphate (pH 5.1), 16% (w/v) PEG [poly(ethylene glycol)] 8000, 20% (v/v) glycerol and 10 mM Glc. The crystal was flash-cooled in a nitrogen stream at 100 K, and the X-ray diffraction data set was collected using synchrotron radiation (BL6A). The data sets were processed and scaled using HKL2000 [19]. The structure of non-labelled KmBglI was solved starting from the refined SeMet-labelled structure. Visual inspection of the models was carried out using Coot [20]. Water molecules were added using the built-in find-water function of Coot and individually checked for significant electron density and plausible contacts with hydrogen-bond donors/acceptors. Cycles of refinement without non-crystallographic symmetry restraints were run using Refmac5 [21]. Figures were prepared using PyMOL (DeLano Scientific), and structural alignment was performed with LSQMAN [23]. Buried surface areas were calculated with PISA [24].


Characterization of a recombinant KmBglI

A recombinant His-tagged KmBglI was used to determine the enzymatic properties. Although six histidine residues were inserted after the N-terminal methionine residue, residues are numbered according to the non-tagged protein (1–845) throughout the present study. The purified protein migrated as a single band with an apparent molecular mass of 95 kDa on SDS/PAGE, which closely matches the calculated molecular mass (94524 Da). Size-exclusion chromatography indicated a native molecular mass of 390 kDa, consistent with a tetramer (Supplementary Figure S1). The enzyme was stable up to 45 °C for 30 min and in the pH range 4.5–9.0 (Supplementary Figure S1). The optimum temperature was 55 °C, and the optimal pH was 5.5 for pNP-Glc and 6.0 for laminaribiose (Supplementary Figure S1). The specific activities on pNP-Glc of the His-tagged and non-tagged enzymes were identical (results not shown).
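The oligomeric-state assignment rests on the ratio of the native mass from size-exclusion chromatography to the calculated subunit mass. The sketch below, with a hypothetical function name, makes that arithmetic explicit; rounding to the nearest integer is only a heuristic, since size-exclusion estimates depend on molecular shape and the His6 tag adds slightly to the subunit mass.

```python
def estimate_oligomer(native_kda: float, subunit_da: float) -> tuple[float, int]:
    """Hypothetical helper: ratio of native to subunit mass and the nearest integer."""
    ratio = native_kda * 1000.0 / subunit_da
    return ratio, round(ratio)


ratio, n_subunits = estimate_oligomer(390.0, 94524.0)  # values from the text
print(f"native/subunit = {ratio:.2f} -> {n_subunits} subunits")
```

The ratio of about 4.1 supports the tetramer assignment, which the crystal structure also shows.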

The glycon specificity of KmBglI was examined using pNP-monosaccharides (Table 1). The enzyme showed the highest activity towards pNP-Glc (kcat=150 s−1; kcat/Km=1500 mM−1·s−1) and considerable activity towards pNP-Xyl (kcat=17 s−1; kcat/Km=76 mM−1·s−1), whereas it showed faint activity on pNP-Fuc, pNP-Ara and pNP-Gal (kcat/Km=2.0, 1.5 and 0.32 mM−1·s−1 respectively). No activity was detected for pNP-GalNAc, pNP-GlcNAc and pNP-GlcA. The substrate specificity was also examined using β-glucosyl oligosaccharides (Table 1). KmBglI hydrolysed laminaribiose (47 s−1) and cellobiose (54 s−1) with similar catalytic centre activities, but the Km value for cellobiose (17 mM) was 18-fold higher than that for laminaribiose (0.96 mM). The enzyme hydrolysed sophorose with a catalytic efficiency (3.5 mM−1·s−1) similar to that for cellobiose (3.2 mM−1·s−1), whereas it hydrolysed gentiobiose with the lowest efficiency (0.33 mM−1·s−1) among the disaccharides. The kcat/Km values for laminaritriose (0.27 mM−1·s−1) and cellotriose (0.086 mM−1·s−1) were 180- and 37-fold lower than those for laminaribiose and cellobiose respectively.
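The fold differences quoted above follow directly from the kinetic constants in the text. The sketch below recomputes them, deriving kcat/Km for the disaccharides from kcat and Km; variable names are illustrative, and small discrepancies from the quoted values reflect rounding in Table 1.

```python
# Kinetic constants quoted in the text (Table 1).
# kcat in s^-1, Km in mM, kcat/Km in mM^-1 s^-1.
KCAT = {"laminaribiose": 47.0, "cellobiose": 54.0}
KM = {"laminaribiose": 0.96, "cellobiose": 17.0}
KCAT_OVER_KM_TRI = {"laminaritriose": 0.27, "cellotriose": 0.086}


def efficiency(disaccharide: str) -> float:
    """Catalytic efficiency kcat/Km (mM^-1 s^-1) computed from kcat and Km."""
    return KCAT[disaccharide] / KM[disaccharide]


km_fold = KM["cellobiose"] / KM["laminaribiose"]                             # Km ratio
lam_fold = efficiency("laminaribiose") / KCAT_OVER_KM_TRI["laminaritriose"]  # di/tri
cel_fold = efficiency("cellobiose") / KCAT_OVER_KM_TRI["cellotriose"]        # di/tri

for label, value in [("Km cello/lami", km_fold), ("lami2/lami3", lam_fold), ("cello2/cello3", cel_fold)]:
    print(f"{label}: {value:.1f}-fold")
```

This gives about 17.7-, 181- and 36.9-fold, matching the rounded 18-, 180- and 37-fold values, and shows that kcat/Km for laminaribiose is about 49 mM−1·s−1, the highest among the disaccharides tested.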

Table 1 Substrate specificity of KmBglI


The crystal of SeMet-labelled protein was obtained without Glc [17], whereas the crystal of the non-labelled protein grew only in the presence of Glc. Consequently, the crystal structures of the apo-form of SeMet-labelled protein and the Glc-complexed form of the non-labelled protein were determined at 2.15 and 2.55 Å resolutions respectively. The statistics for data collection and refinement are given in Tables 2 and 3 respectively. Both crystals belong to the space group C2 and contain four subunits per ASU (asymmetric unit). There is no remarkable difference between the SeMet-labelled and non-labelled structures. Figure 1(A) shows the Glc complex structure in the ASU. The four molecules form a tetramer related by a non-crystallographic 222 symmetry. The buried molecular surface areas are approx. 1040 Å² (A–C and B–D interfaces) and 1060 Å² (A–B and C–D interfaces).

Table 2 Data collection statistics

Numbers in parentheses correspond to the shell of data at the highest resolution.

Table 3 Refinement statistics and contents in the ASU

Overall structure of KmBglI

Figure 1(D) shows the monomeric structure of the Glc-complexed KmBglI. All four chains in the ASU have identical domain architectures consisting of an N-terminal (β/α)8-fold-like domain (residues 1–295; blue), an (α/β)6-sandwich domain (residues 307–381 and 560–658; green), a PA14 domain (residues 392–559; yellow) and a C-terminal domain (residues 700–845; orange). The rmsd (root-mean-square deviation) for the Cα atoms between any two of the four chains was within 0.23 Å, except for the PA14 domains, which contain flexible regions with disordered structures (Figure 1B and Table 3). Comparative analysis using the Dali server revealed that KmBglI adopts a structure similar to those of Bgl3B (Z score=46.2, rmsd=2.1 Å for 652 residues) [7] (Figures 1E and 2) and ExoI (Z score=35.5, rmsd=2.3 Å for 491 residues) [4] (Figures 1F and 2) (secondary structure elements are designated as in ExoI). The N-terminal domain of ExoI adopts a canonical (β/α)8-barrel fold, whereas the N-terminal domain of KmBglI adopts a (β/α)8-barrel-like fold in which strand-a and strand-c are connected by a short loop (residues 22–28), and consequently the two α-helices corresponding to helix-A and helix-B of ExoI are deleted (Figures 1G and 1I). This large deletion is also observed in Bgl3B (Figure 1H). The topology of the N-terminal domain of KmBglI is thus interpreted as a ββ(β/α)6-barrel, in which the second β-strand is antiparallel. The second domain has a six-stranded β-sheet (β-strands i–n) sandwiched by α-helices to form an (α/β)6-sandwich structure. The secondary structure patterns of this domain are similar to those of Bgl3B [7] and ExoI [4], with some exceptions (discussed later) (Figures 1D–1F and 2). Interestingly, the chain of the second domain of KmBglI is separated into two units by the insertion of the PA14 domain (residues 392–559; yellow) between β-strand-k and helix-K (Figures 1D and 2). 
This domain has the closest similarity to the PA14 domain of anthrax protective antigen (PDB code 1ACC) (Z score=9.4, rmsd=3.1 Å for 117 residues) [9], although their amino acid sequences show less than 25% identity. Both of them have a jelly-roll fold with ten antiparallel β-strands (β-strands 1–10) which form a two-layered β-sheet. The other closely related structures are the PA14 domain of C2 toxin of Clostridium botulinum (PDB code 2J42) (Z score=8.5, rmsd=3.2 Å for 121 residues) [25] and, interestingly, the N-terminal domain of GH2 enzymes including E. coli β-galactosidase [26] (Z score=8.3, rmsd=3.2 Å for 121 residues), although they have not been recognized as PA14 domains. The C-terminal domain of KmBglI has an immunoglobulin-like fold, which is almost identical with the FnIII domain of Bgl3B (Figures 1 and 2).
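The discontinuous domain boundaries given above can be expressed as a simple residue-to-domain lookup. The sketch below (hypothetical names) uses only the residue ranges stated in the text; residues outside them, such as 296–306, 382–391 and 659–699, are reported as unassigned. It places the nucleophile Asp225 in the N-terminal domain, the acid/base Glu590 in the second segment of the (α/β)6-sandwich domain, and Phe445 and Phe508 in the PA14 insertion.

```python
# Residue ranges (inclusive) stated in the text; numbering of the non-tagged protein.
DOMAINS = {
    "N-terminal": [(1, 295)],
    "(alpha/beta)6-sandwich": [(307, 381), (560, 658)],  # split by the PA14 insertion
    "PA14": [(392, 559)],
    "C-terminal": [(700, 845)],
}


def domain_of(residue: int) -> str:
    """Hypothetical helper: name of the domain whose ranges contain the residue."""
    for name, segments in DOMAINS.items():
        if any(start <= residue <= end for start, end in segments):
            return name
    return "unassigned"


def domain_length(name: str) -> int:
    """Number of residues in a domain, summed over its segments."""
    return sum(end - start + 1 for start, end in DOMAINS[name])


for label, res in [("Asp225", 225), ("Phe445", 445), ("Phe508", 508), ("Glu590", 590), ("res 680", 680)]:
    print(label, "->", domain_of(res))
```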

Figure 2 Comparison of the amino acid sequences of KmBglI, Bgl3B and ExoI

The secondary structures and their designations are shown: arrows and coils represent the β-strands and α-helices respectively. The residues involved in substrate recognition are indicated by filled triangles. The structure-based sequence alignment was carried out with MATRAS [36], and the secondary structure was assigned with DSSP [37].

The catalytic core of GH3 β-glucosidases consists of the N-terminal and (α/β)6-sandwich domains [4,7]. These two domains of KmBglI superimpose well on those of Bgl3B (Z score=41.4, rmsd=2.0 Å for 455 residues) and ExoI (Z score=34.0, rmsd=2.3 Å for 437 residues), and they share a common domain orientation. The buried areas of the N-terminal–(α/β)6-sandwich and N-terminal–C-terminal domain interfaces are approx. 1560 Å² and 1100 Å² respectively. The PA14 domains are also tightly associated with the other domains (average buried surface area approx. 1200 Å² in chains A, C and D); in chain B, however, some residues are disordered.

Glucose-binding site

In the Glc-complex structure, each subunit in the ASU holds one Glc molecule. All Glc molecules are in the β-anomeric form and adopt a 4C1 chair conformation (Figure 1C). The Glc molecule is bound in a cleft located between the N-terminal and (α/β)6-sandwich domains, and the cleft is covered by loops of the PA14 domain (including Phe445 and Phe508, see below). The β-anomeric hydroxy group forms a hydrogen bond with Glu590 (2.6 Å), and the anomeric carbon is located 2.9 Å from Asp225. In the SeMet-labelled protein, a glycerol molecule (from the cryoprotectant) occupies each of the four Glc-binding sites (Supplementary Figure S2).

Characterization of mutant forms of KmBglI

To better understand the catalytic mechanism of KmBglI, mutant enzymes [D225A, F445A, F508A, E590A and Δ(503–512)] were constructed and characterized (see the Supplementary Experimental section). Asp225 and Glu590 were assumed to be the catalytic nucleophile and acid/base residues respectively. Phe445, Phe508 and the region 503–512 were assumed to be involved in subsite (+1) formation. The kinetic parameters of the D225A and E590A mutants for 4-MU-Glc are summarized in Supplementary Table S2. The Km and kcat values of F445A and F508A for pNP-Glc were slightly decreased compared with those of the WT enzyme (Table 4). The F445A substitution did not significantly alter the kinetic parameters for laminari- and cello-biose, whereas the F508A replacement led to an approx. 6-fold increase in the Km values and a 4-fold decrease in the kcat values for the disaccharides. The kinetic parameters of the deletion mutant Δ(503–512) for pNP-Glc, laminaribiose and cellobiose were not significantly different from those of the F508A mutant. The modes of action of WT and Δ(503–512) enzymes on laminari- and cello-oligosaccharides were examined by TLC (Figures 3A and 3B). The WT enzyme hydrolysed the disaccharides significantly faster than the tri- and tetra-saccharides, whereas the Δ(503–512) mutant released glucose from the oligosaccharides (degree of polymerization ≥3) at levels comparable to those observed with the disaccharides. The kcat/Km values of the Δ(503–512) mutant for laminaritriose and cellotriose were calculated to be 0.17 and 0.02 mM−1·s−1, approx. 63% and 23% of the values of the WT enzyme respectively.
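Because kcat/Km is a ratio, independent fold changes in kcat and Km combine multiplicatively. The sketch below applies this to the approximate F508A changes stated above and recomputes the Δ(503–512) trisaccharide percentages from the quoted kcat/Km values; variable names are illustrative.

```python
# F508A on the disaccharides: Km up ~6-fold, kcat down ~4-fold (approximate, from the text).
km_fold_increase = 6.0
kcat_fold_decrease = 4.0
# kcat/Km changes to (1/4) / 6 = 1/24 of the WT value.
relative_efficiency = (1.0 / kcat_fold_decrease) / km_fold_increase
print(f"F508A efficiency ~ {100 * relative_efficiency:.1f}% of WT")

# kcat/Km (mM^-1 s^-1) on trisaccharides quoted in the text.
wt = {"laminaritriose": 0.27, "cellotriose": 0.086}
delta_503_512 = {"laminaritriose": 0.17, "cellotriose": 0.02}
percent_of_wt = {k: 100.0 * delta_503_512[k] / wt[k] for k in wt}
for substrate, pct in percent_of_wt.items():
    print(f"Delta(503-512) on {substrate}: {pct:.0f}% of WT")
```

The combined estimate of roughly 4% agrees with the decreases to 5% and 3% of WT efficiency that the paper reports for laminaribiose and cellobiose.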

Table 4 Kinetic parameters of mutant KmBglI proteins on various substrates

Numbers in square brackets represent the ratios of the values as compared with the WT enzyme.

Figure 3 Modes of action of WT KmBglI and the Δ(503–512) mutant on oligosaccharides

The amounts of WT and Δ(503–512) enzymes added were standardized according to enzyme activities on the respective disaccharides; i.e. 0.3 μg of WT and 10 μg of Δ(503–512) were used for the hydrolysis of laminari-oligosaccharides, and 4.5 μg of WT and 90 μg of Δ(503–512) were used for the hydrolysis of cello-oligosaccharides. TLC analysis of the reaction products containing (A) laminari-oligosaccharides (G1, glucose; G2, laminaribiose; G3, laminaritriose; G4, laminaritetraose) and (B) cello-oligosaccharides (G1, glucose; G2, cellobiose; G3, cellotriose; G4, cellotetraose). (C) The preference for disaccharides relative to trisaccharides of several GH3 enzymes. With regard to WT and Δ(503–512) KmBglI, the ratios (fold) are estimated by comparing their kcat/Km values. As for Clostridium thermocellum β-glucosidase (CtBglB) [27], Azospirillum irakense β-glucosidase (AiCelA) [31] and ExoI [30], the specific activities (units/mg) were compared.


Tetramer formation of KmBglI

KmBglI forms a tetramer with tight subunit–subunit interactions in the crystal, consistent with its tetrameric form in solution. Most of the reported GH3 enzymes, including ExoI [4], exist as monomers, but Clostridium thermocellum β-glucosidase (CtBglB) [27], VvBglII [15] and PrXyl3A [16] have been shown to form a dimer, tetramer and nonamer respectively. The crystal structure of KmBglI suggests that tetramer formation is not essential for catalysis because the subunit interfaces are far from the Glc-binding site.

Structure and specificity of subsite (−1)

GH3 includes enzymes acting on β-glucosides, β-xylosides, β-N-acetylglucosaminides and α-L-arabinofuranosides. Interestingly, glycon specificity is diverse across the members; some are quite specific for one type of glycoside, whereas others can act on two or more types. KmBglI hydrolyses pNP-Glc with high catalytic efficiency and pNP-Xyl, pNP-Fuc, pNP-Ara and pNP-Gal with lower efficiencies (Table 1). TnBglB, the closest homologue of Bgl3B (96% identity), hydrolyses pNP-Glc most efficiently and pNP-Xyl, pNP-Fuc and pNP-Gal at 2–43% of the level for pNP-Glc [28]. In contrast, ExoI is reported to be specific for β-glucosides [29].

The KmBglI–Glc complex structure was compared with the Glc complex structure of Bgl3B (Figure 4A) [7] and the S-laminaribioside (Glcβ1–3SGlcβ1–3OGlc–thiopNP) complex structure of ExoI (Figure 4B) [30]. The Glc molecule of KmBglI overlaps with the Glc in Bgl3B and with the non-reducing-end Glc of S-laminaribioside in ExoI; it therefore occupies subsite (−1). The catalytic residues of Bgl3B (Asp242 and Glu458) and ExoI (Asp285 and Glu491) overlap well with Asp225 and Glu590 of KmBglI. The results of the mutational study revealed the importance of these residues (Supplementary Table S2). Pozzo et al. [7] reported that the rotamer angle of the tryptophan residue next to the nucleophile differs significantly between the structures of Bgl3B (χ=−92°) and ExoI (χ=−169°). ExoI uses this tryptophan residue to form subsite (+1) [30], whereas Bgl3B uses it to form subsite (−1). In KmBglI, the rotamer angle of the corresponding tryptophan residue (Trp226) is very similar (χ=−90°) to that of Bgl3B, and the structure of subsite (−1) of KmBglI is almost identical with that of Bgl3B. The strict substrate specificity of ExoI and the relatively relaxed substrate specificities of Bgl3B and KmBglI may arise from these structural differences at subsite (−1).
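Comparing the χ values above (−90° being very close to −92° but far from −169°) is a comparison of periodic torsion angles, so the difference should be taken modulo 360°. A minimal sketch with a hypothetical helper name, using the χ values quoted in the text:

```python
def torsion_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two torsion angles, in degrees (0-180)."""
    d = (a_deg - b_deg) % 360.0  # Python's % returns a value in [0, 360) here
    return min(d, 360.0 - d)


CHI = {"KmBglI Trp226": -90.0, "Bgl3B": -92.0, "ExoI": -169.0}  # values from the text
print(torsion_difference(CHI["KmBglI Trp226"], CHI["Bgl3B"]))  # KmBglI vs Bgl3B
print(torsion_difference(CHI["KmBglI Trp226"], CHI["ExoI"]))   # KmBglI vs ExoI
```

The helper gives 2° and 79° for these two comparisons. The wrap-around matters for angles near ±180°: for example, 170° and −170° differ by only 20°.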

Figure 4 Structural comparison among KmBglI, Bgl3B and ExoI at the active sites

(A and B) Wall-eyed stereoviews showing the superposition of KmBglI and Bgl3B (A), and of KmBglI and ExoI (B). KmBglI and bound Glc are shown as in Figure 1. Bgl3B and its bound Glc molecule, and ExoI and its bound S-laminaribioside molecule (ball-and-stick), are coloured grey. The broken lines indicate the Cα-atom distances between Phe508 of KmBglI and Trp434 of ExoI and between Trp226 of KmBglI and Met316 of ExoI. (C–E) Active-site pocket formation in KmBglI (C), Bgl3B (D) and ExoI (E). The molecular surfaces of the N-terminal domain (blue) and the (α/β)6-sandwich domain (green) are shown, rendered transparent to show the bound molecules. (C) The PA14 domain (yellow) and the side chains of Trp226, Phe445 and Phe508 of KmBglI are shown as a ribbon diagram and stick models respectively. The broken line indicates the disordered loop between residues 540 and 542. The bound Glc molecule in the KmBglI structure (magenta) and a laminaribiose molecule modelled by reference to S-laminaribioside of ExoI (cyan) are shown as stick models. (D) A long loop (residues 409–434, red) and the side chain of Trp243 of Bgl3B are shown as a ribbon diagram and stick model respectively. The broken line indicates the disordered structure between residues 418 and 422 (the amino acid sequence is DSWGT). (E) The side chains of Trp286 and Trp434, and the bound S-laminaribioside molecule, are shown as stick models.

Structure and specificity of subsite (+1)

KmBglI hydrolysed sophorose, laminaribiose, cellobiose and gentiobiose with widely varying catalytic efficiencies (0.33–49 mM−1·s−1), whereas ExoI hydrolyses them at comparable levels [30]. In the S-laminaribioside complex structure of ExoI, the reducing-end Glc is clamped by Trp286 and Trp434 at subsite (+1) (Figures 4B and 4E). When a laminaribiose molecule was modelled into the structure of KmBglI by reference to S-laminaribioside in ExoI, the side chains of Phe445 and Phe508 from the PA14 domain sandwiched the Glc molecule (Figures 4B and 4C). Phe445 of KmBglI is located at the structurally neighbouring position of Trp286 of ExoI, and its side chain forms hydrophobic interactions with C5 and C6 of the Glc. This residue is not strongly conserved among the PA14 domains inserted into GH3 enzymes (Figure 5), and the F445A substitution had a moderate effect on the activity (Table 4). These results suggest that Phe445 is important, but not essential, for the catalytic activity of KmBglI. On the other hand, the side chain of Phe508 in the loop linking β-strands 7 and 8 (7-8loop) (Figure 5) is located at an almost identical position to that of Trp434 of ExoI, although their Cα atoms are 8.3 Å apart (Figure 4B). The F508A substitution had a pronounced effect on the activities for disaccharides: the catalytic efficiencies towards laminaribiose and cellobiose decreased to 5% and 3% of those of the WT enzyme respectively. These results, together with the conservation of Phe508 among the PA14 domain sequences inserted into GH3 enzymes (Figure 5), indicate that Phe508 plays a significant role in the catalytic activity. The PA14 domain-mediated subsite (+1) formation of KmBglI is thus entirely different from that of ExoI, for which Trp286 from the (β/α)8-fold domain and Trp434 from the (α/β)6-sandwich domain are responsible. This structural difference may explain the difference in catalytic efficiencies towards β-glucosides between the enzymes. 
In Bgl3B, the edge strand in the (α/β)6-sandwich domain (β-strand-k of ExoI and KmBglI) is missing, and the corresponding region is replaced by an α-helix and a long unstructured loop (residues 409–434) (Figure 2). This loop, although its tip is not visible in the structure, is presumably involved in subsite (+1) formation (Figure 4D) [7]. These findings suggest that the three enzymes have evolved independently with respect to subsite (+1) formation.

Figure 5 Amino acid sequence alignment of the PA14 domains of KmBglI, V. volvacea β-glucosidase (VvBglII), A. tumefaciens β-glucosidase (AtCbgI) and P. ruminicola β-xylosidase (PrXyl3A)

The alignment was performed with PROMALS [37], which incorporates secondary structure prediction, and shaded with BoxShade 3.21. The β-strands and α-helices predicted by PSIPRED [38] are indicated as (e) and (h) below the sequences respectively. The β-strands in KmBglI are shown as arrows. Residues Phe445 and Phe508 of KmBglI are indicated by open circles.

PA14 domain

Several GH3 enzymes hydrolyse oligosaccharides effectively. The ratios of the kcat/Km values of ExoI [30] and of the relative activities of CtBglB [27] and Azospirillum irakense β-glucosidase (AiCelA) [31] for cellobiose/cellotriose/cellotetraose are reported to be 0.1:1:1.5, 3.2:1:0.9 and 2.6:1:1 respectively (Figure 3C). In contrast, KmBglI, AtCbgI and VvBglII, all of which possess a PA14 domain insertion, are highly specific towards small substrates. The kcat/Km ratios of KmBglI for laminaribiose/laminaritriose and cellobiose/cellotriose are 180:1 and 37:1 respectively (Figure 3C), and AtCbgI and VvBglII cannot release glucose from pNP-cellobioside [14,15]. The difference in chain-length specificity between ExoI and KmBglI can be explained by the structural difference at the entrances of their catalytic pockets. The active site of ExoI is covered by a loop between strand-j and helix-J2 (30 residues, including Trp434) from the (α/β)6-sandwich domain, which results in a deep pocket (Figure 4E). In KmBglI, by contrast, the corresponding loop adopts a different conformation and yields a shallow pocket; instead, the PA14 domain covers the active site and consequently deepens it. The PA14 domain approaches the active site via two loops: one links β-strands 3 and 4 (residues 444–459, including Phe445), and the other is the 7-8loop (residues 498–514, including Phe508). The latter may sterically hinder the binding of long-chain oligosaccharides (Figure 4C and see below). PrXyl3A, which also possesses a PA14 domain insertion, is able to act on xylo-oligosaccharides as well as xylobiose [16]. Sequence alignment of the PA14 domains from the four enzymes showed that PrXyl3A lacks the region corresponding to the 7-8loop of KmBglI (Figure 5). Accordingly, we constructed the loop deletion mutant Δ(503–512) and examined its substrate specificity (Figure 3 and Table 4).
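To put the ratios in the preceding paragraph on one scale, each disaccharide:trisaccharide ratio can be inverted to express trisaccharide activity as a percentage of disaccharide activity. This is a minimal illustrative sketch using only the values quoted in the text. Note that the ExoI and KmBglI values are kcat/Km ratios whereas the CtBglB and AiCelA values are relative activities, so the comparison across enzymes is only semi-quantitative; the dictionary keys and the helper `trisaccharide_percent` are hypothetical names chosen for illustration.

```python
# Disaccharide:trisaccharide ratios quoted in the text (trisaccharide = 1).
# Caveat: ExoI and KmBglI values are kcat/Km ratios, whereas CtBglB and AiCelA
# values are relative activities, so this comparison is only semi-quantitative.
RATIOS = {
    "ExoI (cellobiose/cellotriose)": 0.1,
    "CtBglB (cellobiose/cellotriose)": 3.2,
    "AiCelA (cellobiose/cellotriose)": 2.6,
    "KmBglI (laminaribiose/laminaritriose)": 180.0,
    "KmBglI (cellobiose/cellotriose)": 37.0,
}

def trisaccharide_percent(di_to_tri: float) -> float:
    """Trisaccharide activity expressed as a percentage of disaccharide activity."""
    return 100.0 / di_to_tri

for label, ratio in RATIOS.items():
    print(f"{label}: trisaccharide = {trisaccharide_percent(ratio):.1f}% "
          f"of disaccharide")
```

On this scale the enzymes without a PA14 insertion keep roughly a third or more of their disaccharide activity on the trisaccharide (ExoI even exceeds it), whereas KmBglI keeps only a few per cent or less.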
The deletion caused a drastic decrease in the kcat/Km values for the disaccharides, to 1% of those of the WT enzyme, probably owing to the loss of Phe508. Remarkably, however, the mutant retained 63% and 23% of the WT activity towards laminaritriose and cellotriose respectively. Consequently, the ratios of the kcat/Km values of the mutant for laminaribiose/laminaritriose and cellobiose/cellotriose became 3:1 and 2:1 respectively, indicating that the mutant no longer showed chain-length specificity (Figure 3C).
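The mutant ratios in the preceding paragraph follow directly from the WT ratios and the retained activities. If the mutant keeps a fraction f_di of the WT kcat/Km on the disaccharide and a fraction f_tri on the trisaccharide, its disaccharide:trisaccharide ratio equals the WT ratio multiplied by f_di/f_tri. The sketch below checks this with the rounded figures from the text (about 1% for the disaccharides, 63% and 23% for the trisaccharides); because the 1% value is itself rounded, agreement is expected only to the nearest integer. The function name `mutant_ratio` is an illustrative assumption.

```python
# Consistency check for the Δ(503–512) kcat/Km ratios, using rounded values
# from the text. If the mutant retains fractions f_di and f_tri of the WT
# kcat/Km on the di- and trisaccharide, then
#   mutant di:tri ratio = WT di:tri ratio * f_di / f_tri

def mutant_ratio(wt_ratio: float, f_di: float, f_tri: float) -> float:
    """Disaccharide:trisaccharide kcat/Km ratio expected for the mutant."""
    return wt_ratio * f_di / f_tri

lam = mutant_ratio(180.0, 0.01, 0.63)  # laminaribiose/laminaritriose, WT 180:1
cel = mutant_ratio(37.0, 0.01, 0.23)   # cellobiose/cellotriose, WT 37:1
print(f"laminari series: {lam:.1f}:1, cello series: {cel:.1f}:1")
```

Both values round to the reported 3:1 and 2:1, showing that the loss of chain-length specificity arises because the deletion hits disaccharide turnover far harder than trisaccharide turnover.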

In ExoI, a helix-like loop that connects the (β/α)8-barrel and (α/β)6-sandwich domains could act as a hinge that allows the domains to move relative to each other during successive catalytic events [32–34]. In KmBglI, the N-terminal and (α/β)6-sandwich domains are also connected by a linker, but the domains are unlikely to move because the tetrameric assembly would restrict such movement. Alternatively, the significantly higher B-factors of the PA14 domain (Figure 1B and Table 3) suggest that subsite (+1) may be flexible, which could facilitate the capture of substrates. Formation of the catalytic pocket via a domain–domain interaction has been reported in several GHs. For example, an active site of E. coli β-galactosidase (LacZ) is formed by the interaction (known as α-complementation) between the (β/α)8-barrel domain and the N-terminal domain, which adopts a structure similar to that of the PA14 domain (Z score=8.3, rmsd=3.2 Å for 121 residues) [35]. Interestingly, both the PA14 domain of KmBglI and the N-terminal domain of LacZ participate in active-site formation via loops extending from the same edge of the domains (results not shown).
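For readers less familiar with the structural-similarity figures in the preceding paragraph, rmsd is the square root of the mean squared distance between matched atom pairs. The quoted value comes from a structural comparison that first aligns residues and optimally superposes the two domains; the sketch below assumes both steps have already been done and shows only the final rmsd formula. The function `rmsd` and the coordinates are hypothetical toy values, not taken from KmBglI or LacZ, and the Z score is not computed here.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]  # (x, y, z) in Å

def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between matched atoms.

    Assumes the two lists are already residue-matched (same order, same
    length) and already superposed; no alignment or fitting is done here.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(squared / len(a))

# Hypothetical toy coordinates: every matched atom is displaced by 1 Å
# along z, so the rmsd is 1 Å.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(model, target))  # 1.0
```

With a single atom pair the same formula reduces to an ordinary interatomic distance, the quantity behind measurements such as the 8.3 Å separation between the Cα atoms of Phe508 and Trp434.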

The PA14 domain frequently occurs in GHs, bacterial toxins, yeast adhesins and mammalian signalling molecules [10], and a carbohydrate-binding function has long been proposed for it without any structural evidence. In the present study, using the KmBglI protein, we revealed the structural basis of the interaction between the PA14 domain and a carbohydrate. Unfortunately, because the sequences of PA14 domains are not strongly conserved, especially in the loop regions, the role(s) of the domain cannot be generalized at present. Nevertheless, at least in GH3 members, the domain could form the subsite(s) and affect the substrate specificity. The overall structure of KmBglI is similar to those of Bgl3B and ExoI, indicating that these enzymes have evolved from a common ancestor. However, their active-site structures are quite different. KmBglI appears to have evolved to become more active on small substrates by accommodating the insertion of the PA14 domain into the (α/β)6-sandwich domain.


Erina Yoshida performed the biochemical experiments, interpreted the results and participated in the writing of the paper. Masafumi Hidaka performed the structural analysis, interpreted the results and participated in the writing of the paper. Shinya Fushinobu supervised the structural studies. Takashi Koyanagi, Hiromichi Minami and Hisanori Tamaki provided their expertise in genetic experiments. Motomitsu Kitaoka provided his expertise in enzymology. Takane Katayama designed the research, interpreted the results and edited the paper. Hidehiko Kumagai designed the research and obtained the funding for the work.


This work was supported, in part, by a Grant-In-Aid from the New Energy and Industrial Technology Development Organization (NEDO), Japan.


We thank the staff of the Photon Factory for the X-ray data collection and Dr N. Nagano for helpful comments on our manuscript.


  • The atomic co-ordinates and structure factors (PDB codes 3ABZ and 3AC0) have been deposited in the Protein Data Bank.

Abbreviations: ASU, asymmetric unit; AtCbgI, Agrobacterium tumefaciens β-glucosidase; Bgl3B, Thermotoga neapolitana DSM4359 β-glucosidase; CtBglB, Clostridium thermocellum β-glucosidase; ExoI, Hordeum vulgare β-D-glucan glucohydrolase; FnIII, fibronectin type III; GH, glycoside hydrolase; KmBglI, Kluyveromyces marxianus NBRC1777 β-glucosidase; MAD, multiple anomalous dispersion; 4-MU, 4-methylumbelliferyl; NagZ, Vibrio cholerae β-N-acetylhexosaminidase; pNP, p-nitrophenyl; PrXyl3A, Prevotella ruminicola β-xylosidase; rmsd, root-mean-square deviation; SeMet, selenomethionine; TnBglB, Thermotoga neapolitana Z2706-MC24 β-glucosidase; VvBglII, Volvariella volvacea β-glucosidase II; WT, wild-type

