Research article

Solution structure of family 21 carbohydrate-binding module from Rhizopus oryzae glucoamylase

Yu-Nan Liu, Yen-Ting Lai, Wei-I Chou, Margaret Dah-Tsyr Chang, Ping-Chiang Lyu


CBMs (carbohydrate-binding modules) function independently to assist carbohydrate-active enzymes. Family 21 CBMs contain approx. 100 amino acid residues, and some members have starch-binding or glycogen-binding activities. We report here the first structure of a family 21 CBM, the SBD (starch-binding domain) of Rhizopus oryzae glucoamylase (RoCBM21), determined by NMR spectroscopy. This CBM has a β-sandwich fold with an immunoglobulin-like structure. Ligand-binding properties of RoCBM21 were analysed by chemical-shift perturbations and automated docking. Structural comparisons with previously reported SBDs revealed two types of topology, type I and type II: CBM20, CBM25, CBM26 and CBM41 show type I topology, whereas CBM21 and CBM34 show type II topology. According to the chemical-shift perturbations, RoCBM21 contains two ligand-binding sites. Residues in site II are similar to those found in the family 20 CBM from Aspergillus niger glucoamylase (AnCBM20). Site I, however, is embedded in a region with unique sequence motifs found only in some members of CBM21. Additionally, docking of β-cyclodextrin and malto-oligosaccharides indicates that the side chains of Y83 and W47 (one-letter amino acid code) form the central part of the conserved binding platform in the SBD. The structure of RoCBM21 provides the first direct evidence of the structural features of an SBD from CBM21 and of the basis for its protein–carbohydrate recognition.

  • carbohydrate-active enzyme
  • carbohydrate-binding module (CBM)
  • glucoamylase
  • Rhizopus oryzae
  • solution structure
  • starch-binding domain (SBD)


Carbohydrates are the most diverse biopolymers in biological systems, and this diversity manifests in several ways, including (1) the number of monomeric units in polymers, (2) the types of monosaccharides and (3) the kinds of glycosidic linkages. This diversity allows carbohydrates to play numerous critical biological roles [1], including energy and carbon storage, mechanical support and buffering against force or environmental stress [2,3], modulation of cell physiology, cellular signalling and molecular recognition [4]. For instance, cellulose is the major structural component of plant tissue, and chitin constitutes the exoskeleton of arthropods. Other examples include glycosaminoglycans, which form the extracellular ‘glue’ that cushions mechanical forces on joints [2], and xanthan gum, which contributes to the formation of bacterial biofilms that protect bacteria from environmental stress [3]. Starch and glycogen serve as energy stores for plants and animals respectively.

Starch is an abundant polysaccharide composed of two types of D-glucose polymers: amylose and amylopectin. Amylose consists of mostly linear chains of α-1,4-linked glucose residues, whereas amylopectin is the highly branched component of starch, containing 5–6% α-1,6-linkages. Together, amylose and amylopectin fold into helical structures [5,6] that are further organized into compact granules, an efficient way of storing energy and carbon. Glycogen, by contrast, is a readily mobilized storage form of glucose in animals. The glucose residues in glycogen are also linked by α-1,4-glycosidic bonds, with α-1,6-glycosidic branch points at about every tenth residue. The controlled breakdown of glycogen buffers the blood glucose level.

Biodegradation of polysaccharides allows photosynthetically fixed energy and carbon to enter the recycling pathways of the ecosystem. Cells have developed various carbohydrate-active enzymes [7,8], such as glycoside hydrolases, glycosyltransferases, polysaccharide lyases and carbohydrate esterases, which modify polysaccharides to activate their biological functions. Several of these carbohydrate-active enzymes have a modular structure comprising a catalytic module, one or more CBMs (carbohydrate-binding modules) and some ancillary modules. A CBM is defined as a contiguous amino acid sequence (∼50–200 amino acids) within a carbohydrate-active enzyme that has a discrete fold and carbohydrate-binding activity. To date, 45 families of CBMs have been recorded on the Carbohydrate-Active Enzyme website (

CBM family 21 is a family of modules containing ∼100 amino acid residues. This family currently has 63 entries: one bacterial, 29 fungal and 33 metazoan ( The starch-binding function of the CBM from Rhizopus oryzae glucoamylase has been extensively studied [9,10]. Protein phosphatase-1s (PP1s), which respond to signals such as insulin [11], act on carbohydrates indirectly by controlling glycogen metabolism in eukaryotes. Moreover, regulation of this enzyme is strongly associated with diabetes in humans [12]. The glycogen-binding functions of PP1G (PP1 regulatory subunit) in several higher eukaryotes have also been widely studied; PP1Gs bind glycogen via family 21 CBMs [13].

Seven of the 45 CBM families (20, 21, 25, 26, 34, 41 and 45) have starch-binding activity. Starch-binding CBMs in glycoside hydrolases are also called SBDs (starch-binding domains). The dissociation constants (Kd) for the binding of SBDs to starch are in the micromolar range [9,14]. SBDs are functionally independent of the catalytic domains and are postulated to disrupt the structure of starch and to anchor the catalytic domain to insoluble starch [14–17]. Prior to the present study, no structure of a CBM21 had been available, and thus the detailed interactions between CBM21s and starch or glycogen remained unclear. Here we describe the first structure of a family 21 CBM, from R. oryzae glucoamylase (RoCBM21), as determined by NMR spectroscopy, and compare it with the structures of several previously reported SBDs. Our results illustrate both the conservation of the starch-binding fold and the diverse ways in which residues in SBDs interact with ligands.


Clone construction

The gene encoding RoCBM21 was cloned following a previously reported protocol with slight modification [9]. The DNA fragment encoding the SBD (A1–T106; for brevity, the one-letter code for amino acids is used) of R. oryzae glucoamylase was amplified by PCR using the forward primer 5′-CATATGGCAAGTATTCCTAGCAGT-3′ and the reverse primer 5′-CTCGAGTTATGTAGATACTTGGT-3′ (the NdeI site, CATATG, and the XhoI site, CTCGAG, are at the 5′ ends of the forward and reverse primers respectively). The PCR product was cloned into the pGEM-T Easy cloning vector (Promega) and verified by DNA sequencing. The SBD DNA fragment was subsequently ligated into the NdeI and XhoI sites of the pET23a(+) expression vector (Novagen) to generate pET-RoCBM21. The RoCBM21 sequence reported here differs from the database sequence (NCBI protein entry: ABB77799) at one position (T53→I53), because the gene was cloned from a local strain of R. oryzae.
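As a quick illustration of the primer design described above, the sketch below checks that each primer carries the expected restriction site and that the primers translate in frame as expected. It assumes only the two primer sequences given in the text and the standard recognition sequences of NdeI (CATATG) and XhoI (CTCGAG); the helper names are hypothetical and not part of any published pipeline. The last check reads the reverse complement of the reverse primer in the frame whose TAA stop codon sits immediately before the XhoI site.

```python
# Hypothetical helper sketch: sanity-check the RoCBM21 PCR primers.
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, ..., GGG; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))

# Recognition sequences of the two enzymes used for cloning into pET23a(+).
RESTRICTION_SITES = {"NdeI": "CATATG", "XhoI": "CTCGAG"}

FORWARD = "CATATGGCAAGTATTCCTAGCAGT"  # 5'->3', from the text
REVERSE = "CTCGAGTTATGTAGATACTTGGT"   # 5'->3', from the text


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from the first base; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def find_sites(seq: str) -> dict:
    """Map enzyme name -> 0-based position of its first recognition site in seq."""
    return {name: seq.find(site) for name, site in RESTRICTION_SITES.items() if site in seq}


if __name__ == "__main__":
    print(find_sites(FORWARD))                # -> {'NdeI': 0}
    print(find_sites(REVERSE))                # -> {'XhoI': 0}
    start = FORWARD.find("ATG")               # the ATG inside CATATG is the start codon
    print(translate(FORWARD[start:]))         # -> MASIPSS (Met, then A1 to S6)
    sense_end = reverse_complement(REVERSE)   # coding strand at the 3' end of the gene
    print(translate(sense_end[2:17]))         # -> QVST* (ends at T106, then TAA stop)
```

The C-terminal reading (Q, V, S, T, stop) is consistent with T106 being the last residue of the construct.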

Sample purification

Escherichia coli BL21-Gold (DE3) cells (Novagen) transformed with pET-RoCBM21 were grown in Luria–Bertani medium containing ampicillin (100 μg/ml) at 37 °C with shaking at 250 rev./min until the attenuance (D600) reached 0.6. For isotope labelling, BL21-Gold (DE3) cells carrying pET-RoCBM21 were cultured in M9 medium [18] with 15NH4Cl as the nitrogen source and [13C]glucose as the carbon source. Protein synthesis was induced by adding isopropyl β-D-thiogalactoside to a final concentration of 400 μM, and the culture was incubated at 20 °C for 16 h. The cells were harvested by centrifugation at 3700 g for 15 min at 4 °C, resuspended in 20 ml of 10 mM sodium acetate, pH 4.5, and sonicated. Cell debris was removed by centrifugation at 16000 g for 15 min at 4 °C. The supernatant was applied to amylose resin (New England BioLabs) equilibrated and washed with 10 mM sodium acetate, pH 4.5, and the bound protein was eluted with 10 mM glycine, pH 10. The eluate was dialysed against 10 mM sodium acetate, pH 4.5, and concentrated using an Amicon concentrator (Mr cut-off 3000). The concentrated sample was further applied to a HiTrap SP column (GE Healthcare) and washed with sodium acetate buffer, pH 4.5, containing 50 mM NaCl.

NMR spectroscopy for structure determination

NMR data were acquired on a Bruker Avance 600 MHz or 800 MHz spectrometer. For structure determination, 1 mM RoCBM21 (unlabelled, 15N-labelled or 13C/15N-double-labelled) was dissolved in 10 mM sodium acetate, pH 4.5, and NMR experiments were performed at 25 °C. Protein concentrations were determined with the Bio-Rad Protein Assay. Backbone assignment was accomplished with HNCA [19–21], HN(CO)CA [19,21], HNCACB [22], CBCA(CO)NH [23,24], HNCO [19–21] and HN(CA)CO [21] experiments [25]. Because RoCBM21 contains a relatively high proportion of aromatic residues, assignment of the aromatic side chains was assisted by HBCBCGCDHD and HBCBCGCDCEHE experiments [26]. Resonances of the remaining atoms were assigned using both NOESY [27,28] and 1H–15N HSQC (heteronuclear single-quantum coherence)-NOESY [29,30] with the assistance of through-bond correlation spectra. Homonuclear DQF (double quantum-filtered)-COSY [31], TOCSY [27] and 1H–15N HSQC-TOCSY were used to obtain through-bond correlations. The mixing times were 90 ms for TOCSY spectra and 50, 100 or 150 ms for NOESY spectra. All two-dimensional (2D) spectra were recorded with 512 t1 increments and 2048 complex t2 data points and were processed using TopSpin 1.3 (Bruker). Distance restraints were derived from NOESY spectra recorded with a 100 ms mixing time. To identify protected amide protons, a 2D 1H–15N HSQC spectrum was recorded after freeze-dried RoCBM21 had been dissolved in 99% 2H2O at 25 °C for 36 h. Because RoCBM21 did not readily dissolve after freeze-drying, excess 2H2O was added to dissolve the protein completely and was subsequently removed by freeze-drying to yield a 500 μl sample. Hydrogen-bond restraints were obtained from the HSQC amide proton protection data and confirmed by surrounding NOE (nuclear Overhauser effect) signals and HNCOHB (through-hydrogen-bond coherence) [32]. 
Dihedral angle constraints were obtained using the TALOS program [33] with chemical shifts of N, HA, CA, CB and C atoms. Sodium 2,2-dimethyl-2-silapentane-5-sulphonate was used as an internal reference for proton chemical shifts, and heteronuclear chemical shifts were referenced assuming γ15N/γ1H=0.101329118 and γ13C/γ1H=0.251449530.
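The indirect referencing described above amounts to scaling the DSS 1H zero-point frequency by the quoted ratios to obtain the 0 p.p.m. frequency of each heteronucleus. A minimal sketch, assuming a hypothetical DSS 1H frequency of 600.13 MHz (the actual spectrometer value is not given in the text); only the two ratios come from the text.

```python
# Indirect chemical-shift referencing to DSS, using the ratios quoted in the text.
XI_RATIOS = {"15N": 0.101329118, "13C": 0.251449530}


def reference_frequencies(dss_1h_hz: float) -> dict:
    """Zero-p.p.m. frequency (Hz) of each heteronucleus, from the DSS 1H frequency."""
    return {nucleus: dss_1h_hz * ratio for nucleus, ratio in XI_RATIOS.items()}


def shift_ppm(signal_hz: float, reference_hz: float) -> float:
    """Chemical shift in p.p.m. relative to the given zero-p.p.m. reference frequency."""
    return (signal_hz - reference_hz) / reference_hz * 1e6


if __name__ == "__main__":
    dss_1h = 600.13e6  # hypothetical DSS 1H frequency in Hz
    refs = reference_frequencies(dss_1h)
    print({k: round(v / 1e6, 6) for k, v in refs.items()})  # reference frequencies in MHz
    amide_n = refs["15N"] * (1 + 120e-6)  # a hypothetical 15N signal 120 p.p.m. downfield
    print(round(shift_ppm(amide_n, refs["15N"]), 6))
```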

Structure calculation and structural analyses

Partially assigned peak lists and chemical-shift lists were acquired from manual assignment using the program SPARKY 3 ( The peak intensities were derived using the default peak-fitting protocol assuming Lorentzian lineshapes. Structure calculations for RoCBM21 were carried out using CNS 1.1 [35] and ARIA 2.0 [36] with torsion angle dynamics and standard simulated annealing protocols [37]. These calculations were followed by explicit water refinement using the OPLS force field [38]. Of 200 structures that we obtained, the 15 structures with the lowest total energies were selected for analysis. Their quality was assessed with PROCHECK-nmr [39]. The SBD structures for comparison are CBMs from A. niger glucoamylase (AnCBM20) [40], Thermoactinomyces vulgaris R-47 α-amylase I (TvCBM34 I) [41], Bacillus halodurans maltohexaose-forming amylase (BhCBM25 and BhCBM26) [42] and Klebsiella pneumoniae pullulanase (KpCBM41) [43].

Chemical-shift perturbations

To identify the ligand-binding residues of RoCBM21 and characterize their interactions, maltotriose, maltoheptaose and β-cyclodextrin were used as ligands in chemical-shift perturbation experiments. Maltotriose and maltoheptaose were prepared as 100 mM stocks, whereas β-cyclodextrin was prepared as a 20 mM stock because of its lower solubility. RoCBM21 (1 mM) was titrated with each ligand, and 1H–15N HSQC spectra of RoCBM21 were recorded to monitor the interactions. Weighted average 1H and 15N chemical-shift changes were calculated using Equation (1) [44–47]:

[Equation (1): weighted average of the 1H and 15N chemical-shift changes]
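Because the exact form of Equation (1) is not available in this text, the sketch below uses one widely used weighting, Δδ = √[(Δδ_H² + (0.2·Δδ_N)²)/2], as an assumption; the 15N scaling factor in the authors' equation may differ. It also flags residues whose change exceeds one standard deviation of all changes, in the spirit of the 1 S.D. thresholds used in the Results (whether the authors used the population or sample S.D. is not stated). The residue labels and shift values in the usage lines are invented for illustration.

```python
import math
import statistics

N_WEIGHT = 0.2  # assumed 15N scaling factor (not taken from Equation (1))


def weighted_shift_change(d_h: float, d_n: float, n_weight: float = N_WEIGHT) -> float:
    """Weighted average of 1H and 15N chemical-shift changes, in p.p.m."""
    return math.sqrt((d_h ** 2 + (n_weight * d_n) ** 2) / 2.0)


def significant_residues(free: dict, bound: dict, n_weight: float = N_WEIGHT):
    """free/bound map residue -> (1H p.p.m., 15N p.p.m.).

    Returns (threshold, flagged): threshold is the population S.D. of all weighted
    changes, and flagged lists residues above it, largest change first.
    """
    changes = {
        res: weighted_shift_change(bound[res][0] - free[res][0],
                                   bound[res][1] - free[res][1], n_weight)
        for res in free if res in bound
    }
    threshold = statistics.pstdev(list(changes.values()))
    flagged = sorted((r for r, c in changes.items() if c > threshold),
                     key=lambda r: changes[r], reverse=True)
    return threshold, flagged


if __name__ == "__main__":
    # Hypothetical peak positions before (free) and after (bound) ligand addition.
    free = {"r1": (8.00, 120.0), "r2": (7.50, 118.0), "r3": (8.20, 121.0), "r4": (8.40, 110.0)}
    bound = {"r1": (8.30, 121.0), "r2": (7.70, 119.0), "r3": (8.21, 121.05), "r4": (8.40, 110.0)}
    threshold, flagged = significant_residues(free, bound)
    print(round(threshold, 3), flagged)
```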

Docking simulations

AutoDock 3.05 [48] was used to simulate the binding modes of the carbohydrate ligands at the binding sites. The carbohydrate molecules were docked to each binding site in separate simulations. Affinity grids of 90×90×90 points, centred on the binding sites with 0.375 Å (1 Å=0.1 nm) spacing, were calculated using the program AutoGrid [48]. The Lamarckian genetic algorithm was used for conformational searches. For each carbohydrate at each of the two binding sites, 100 trials were run, each with a population size of 150. Initial positions and conformations were chosen randomly. The translation step was 2.0 Å, and the rotation step was 50°. Other docking parameters were as follows: mutation rate, 0.02; cross-over rate, 0.8; elitism, 1; local search rate, 0.06; and 1 million energy evaluations. Final conformations from the 100 trials were clustered using an rmsd (root mean square deviation) tolerance of 1.5 Å.
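The grid and clustering settings above can be made concrete. Treating 90 as the number of grid intervals per axis (an assumption about how the grid size is counted), each affinity grid spans 90 × 0.375 = 33.75 Å along each axis. The final clustering step groups poses by rmsd; the sketch below is a simplified greedy scheme in the spirit of AutoDock's rmsd clustering (poses sorted by energy, each joining the first cluster whose lowest-energy seed lies within the tolerance), not AutoDock's own implementation. Because all poses share the fixed receptor frame, rmsd is computed without superposition. The function names are hypothetical.

```python
import math

GRID_INTERVALS = 90   # grid size per axis, as in the text
GRID_SPACING = 0.375  # Å
BOX_EDGE = GRID_INTERVALS * GRID_SPACING  # 33.75 Å per side


def pose_rmsd(a, b):
    """rmsd (Å) between two poses with atoms in the same order, without refitting."""
    if len(a) != len(b):
        raise ValueError("poses must have the same number of atoms")
    sq = sum((p - q) ** 2 for atom_a, atom_b in zip(a, b) for p, q in zip(atom_a, atom_b))
    return math.sqrt(sq / len(a))


def cluster_poses(poses, energies, tolerance=1.5):
    """Greedy rmsd clustering; returns lists of pose indices, lowest-energy seed first."""
    order = sorted(range(len(poses)), key=lambda i: energies[i])
    clusters = []
    for i in order:
        for cluster in clusters:
            if pose_rmsd(poses[i], poses[cluster[0]]) < tolerance:
                cluster.append(i)
                break
        else:
            clusters.append([i])  # no existing seed within tolerance: start a new cluster
    return clusters
```

In use, `poses` would hold the final conformations of the 100 trials (as lists of atomic coordinates) and `energies` their docked energies; the first cluster then contains the lowest-energy pose.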


NMR spectra and molecular structure

The 1H–15N HSQC spectrum of RoCBM21 showed a well-dispersed pattern of peaks characteristic of a β-strand-rich protein (Figure 1A). The spectral width of the proton dimension was set to 16 p.p.m. to accommodate all peaks. The most upfield chemical shift was −1.181 p.p.m., corresponding to I79 Hδ1, and the Hε1 of W70 gave the most downfield chemical shift at 11.88 p.p.m. Owing to the large number of aromatic residues in RoCBM21 (18/106), several chemical shifts were affected by the π-electron ring currents of aromatic rings. An atom close to an aromatic ring is shielded if it lies above or below the ring and deshielded if it lies in the plane of the ring. Some assigned chemical shifts were even flagged as anomalous or suspicious by the software currently used by BMRB to check for chemical-shift outliers [49], and these chemical shifts were carefully verified at the request of the BMRB annotator. For example, the chemical shift of V39 Hβ (0.739 p.p.m.) may be affected by the rings of Y83 and W47, N52 Hβ3 (0.775 p.p.m.) by the ring of W47, N97 Hβ3 (−0.09 p.p.m.) by the ring of Y102, I79 Hγ12 (−0.22 p.p.m.) by the ring of Y40, and Y102 Hβ3 (0.482 p.p.m.) by the ring of F82; as a result, these resonances are shifted upfield of the values normally expected for these atoms. In the HNCA, HN(CO)CA, CBCA(CO)NH, HNCACB, HNCO and HN(CA)CO spectra, the backbone sequential connectivity proceeds continuously through the whole protein sequence, except at proline residues and G18. The connectivity between D17, G18 and S19 was absent in all of these backbone experiments, and thus the amide nitrogen chemical shift of G18 could not be assigned unambiguously. Example strips demonstrating backbone connectivity in the HNCACB spectrum are shown in Figure 1(B).
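The shielding/deshielding rule above follows from the simplest point-dipole picture of a ring current, in which the shift induced at a nucleus is proportional to (1 − 3cos²θ)/r³, where r is the distance from the ring centre and θ the angle between that vector and the ring normal. The sketch below implements only this approximation (more realistic models, such as Johnson–Bovey or Haigh–Mallion, are used by chemical-shift predictors), and the `strength` prefactor is a hypothetical value rather than a calibrated constant.

```python
import math


def ring_current_shift(atom, ring_center, ring_normal, strength=1.0):
    """Point-dipole ring-current shift (p.p.m.): strength * (1 - 3cos^2(theta)) / r^3.

    atom, ring_center: (x, y, z) in Å; ring_normal: any vector perpendicular to the ring.
    strength: hypothetical positive prefactor in p.p.m.·Å^3.
    Negative values mean upfield (shielded); positive values mean downfield (deshielded).
    """
    v = [a - c for a, c in zip(atom, ring_center)]
    r = math.sqrt(sum(x * x for x in v))
    n_len = math.sqrt(sum(x * x for x in ring_normal))
    cos_theta = sum(x * n for x, n in zip(v, ring_normal)) / (r * n_len)
    return strength * (1.0 - 3.0 * cos_theta ** 2) / r ** 3


if __name__ == "__main__":
    center, normal = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
    print(ring_current_shift((0.0, 0.0, 3.0), center, normal, strength=10.0))  # above ring: < 0
    print(ring_current_shift((3.0, 0.0, 0.0), center, normal, strength=10.0))  # in plane: > 0
```

The induced shift changes sign at the "magic angle" (cos²θ = 1/3), which is why atoms stacked over a ring, like several of the upfield-shifted protons listed above, move upfield while atoms at the ring edge move downfield.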

Figure 1 NMR spectra of RoCBM21

(A) Assignment of amide resonances of RoCBM21 on a 1H–15N HSQC spectrum. All backbone amide peaks are well resolved, except for Q10, which overlaps D12, and Y86, which overlaps E87. (B) Example assignment strips from residue F21 to Y26 in the HNCACB spectrum. Cα peaks are phased positive (black) and Cβ peaks negative (grey). (C, D) Antiparallel secondary structures, with a bulge in the N-sheet and a loop in the C-sheet. Thick arrows indicate sequential Hα-to-amide-proton NOEs, thick double-headed arrows indicate interstrand Hα–Hα NOEs, thin double-headed arrows indicate interstrand amide-proton–amide-proton NOEs, thin arrows indicate interstrand Hα-to-amide-proton NOEs and dotted lines indicate interstrand hydrogen bonds. NOEs in the loop regions are not shown.

The structures were calculated on the basis of 2247 restraints: 2071 NOE-derived distance restraints, 102 dihedral angle restraints and 74 hydrogen-bond distance restraints, obtained as described in the Experimental section. NOEs of the N- and C-β-sheets are illustrated in Figures 1(C) and 1(D). A total of 200 structures were generated in the final iteration of the ARIA calculation, and the 15 structures with the lowest total energies were chosen for analysis and deposited in the PDB (Figure 2A). The structure with the lowest rmsd to the average structure was chosen as the representative structure (Figure 2B). The NMR statistics are summarized in Table 1. The rmsd with respect to the average structure was 0.48±0.06 Å for backbone atoms and 0.96±0.11 Å for heavy atoms in the well-defined region (residues 8–16, 20–40, 52–60, 67–87, 92–95 and 102–104), and 1.14±0.31 Å for backbone atoms and 1.43±0.29 Å for heavy atoms over all residues. In the Ramachandran plot, 95% of non-glycine and non-proline residues are in the most-favoured or additionally allowed regions, and 98.5% are within the generously allowed region. N97 falls in the disallowed region in most members of the ensemble, as do N45 and N101 in several members. N97 and N101 are located in loop 8, and N45 is in loop 4; both loops are in the most flexible region of RoCBM21 (see Supplementary Figure 1 at
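The ensemble statistics above (rmsd to the average structure and choice of the representative model) can be outlined with a Kabsch superposition. The sketch below assumes the models have already been reduced to matching atom sets (for example, backbone atoms of the well-defined residues) held as NumPy arrays; it iteratively fits every model to the current mean, then reports each model's rmsd to the mean and the index of the closest model. It is an illustration, not the procedure implemented in ARIA, CNS or the authors' analysis, and the function names are hypothetical.

```python
import numpy as np


def kabsch_fit(mobile: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Return mobile (N x 3) optimally rotated and translated onto target (N x 3)."""
    mobile_c, target_c = mobile.mean(axis=0), target.mean(axis=0)
    a, b = mobile - mobile_c, target - target_c
    u, _, vt = np.linalg.svd(a.T @ b)        # SVD of the 3 x 3 covariance matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))    # correct for a possible reflection
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return a @ rotation.T + target_c


def rmsd(x: np.ndarray, y: np.ndarray) -> float:
    """Coordinate rmsd between two N x 3 arrays (no fitting)."""
    return float(np.sqrt(((x - y) ** 2).sum(axis=1).mean()))


def ensemble_to_mean(models: np.ndarray, n_iter: int = 5):
    """models: M x N x 3. Returns (rmsd of each model to the mean, index of closest model)."""
    mean = models[0]
    for _ in range(n_iter):
        fitted = np.array([kabsch_fit(m, mean) for m in models])
        mean = fitted.mean(axis=0)
    rmsds = np.array([rmsd(kabsch_fit(m, mean), mean) for m in models])
    return rmsds, int(np.argmin(rmsds))
```

Averaging and reporting `rmsds.mean()` and `rmsds.std()` would give figures of the same kind as the ±values quoted above, and the returned index identifies the representative model.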

Figure 2 Solution structure of RoCBM21

(A) Stereo view of the RoCBM21 ensemble. The front side of RoCBM21 is shown with the N-terminal loop up and C-terminal loop down. Strands are in cyan. (B, C) Ribbon view and surface view of RoCBM21 respectively.

Table 1 Structural statistics of RoCBM21 in aqueous solution at pH 4.5 and 298 K

The well-defined region refers to residues 8–16, 20–40, 52–60, 67–87, 92–95 and 102–104. Note: 1 kcal=4.184 kJ.

The RoCBM21 domain contains 106 residues, and its sequence has little similarity (<25%) to those of other SBD families. The solution structure of RoCBM21, however, shows a conventional β-sandwich fold with an immunoglobulin-like architecture, features characteristic of most CBMs [17]. The β-sandwich is symmetric and is composed of eight antiparallel β-strands: β1 (V9–Y16), β2 (F21–V27), β3 (V34–D42), β4 (I53–G60), β5 (Y67–A74), β6 (I79–V88), β7 (T92–N95) and β8 (Y102–V104). These strands form two β-sheets: an N-terminal β-sheet (N-sheet) (Figure 2B) consisting of β1, β2 and β5, and a C-terminal β-sheet (C-sheet) (Figure 2C) consisting of β3, β6, β7 and β8; the two sheets are connected by β4. Strand β2, in the middle of the N-sheet, pairs antiparallel with β1 and β5, whereas β6, in the middle of the C-sheet, pairs antiparallel with β3, β7 and β8. Strand β4, which is partially paired antiparallel with β3 and β5, lies across both β-sheets. The hydrophobic core of RoCBM21 is composed of V9, L11, I14, Y16, F21, I25, V27, W70 and F72 in the N-sheet and V36, V38, Y40, F82, I84, Y86, V88, Y93, Y102 and V104 in the C-sheet. In β1, residues L11–Y14 are not hydrogen-bonded to β2 and form a bulge (Figure 1C). Both β7 and β8 are hydrogen-bonded to β6 and are linked by loop 8 (Figure 1D). The solvent-accessible surface of the RoCBM21 structure, onto which the DelPhi electrostatic potential [50] was mapped, is shown in Figure 2(C).

Structural comparisons with SBDs

We compared the RoCBM21 structure with those of different SBD families (Table 2). Figures 3(A), 3(B) and 3(C) show the similarities in primary, secondary and tertiary structure respectively. Equivalent β-strands and loops in different families of SBDs are labelled and grey-scale-coloured according to the structure of RoCBM21 (Figure 3A). Two types of topology can be discerned from the structural comparison: AnCBM20, BhCBM25, BhCBM26 and KpCBM41 have type I topology, whereas RoCBM21 and TvCBM34 have type II topology. The two topology types are similar, except that the strand correspondence must be shifted by one to superimpose them. For example, β1 in AnCBM20 is equivalent to β2 in RoCBM21; subsequent strands can be matched one by one, so that the final strand (β8) of RoCBM21 is superimposable with strand 7 of AnCBM20. Notably, the final strand in AnCBM20 plays the same role as β1 in RoCBM21, forming hydrogen bonds with the middle strand (β2 or its equivalent) of the N-terminal β-sheet. All β-strands in RoCBM21 are antiparallel, whereas the first and last strands in AnCBM20 are parallel. In sum, the overall topology of RoCBM21 (type II) is similar to that of AnCBM20 (type I), but the order of the equivalent strands is sequentially shifted by one. Most N-terminal SBDs appear to have type II topology, whereas C-terminal SBDs have type I topology (except for KpCBM41) (Table 2) [43].
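The one-strand shift between the two topologies can be written as a simple index mapping. A sketch, assuming (as the comparison above implies) eight equivalent strands in each topology, with RoCBM21 βn corresponding to AnCBM20 strand n−1 and RoCBM21 β1 corresponding to the final AnCBM20 strand; the function name and the strand count constant are hypothetical.

```python
N_STRANDS = 8  # assumed number of equivalent strands in both topologies


def type_ii_to_type_i(strand: int) -> int:
    """Map a type II strand number (RoCBM21) to the equivalent type I strand (AnCBM20)."""
    if not 1 <= strand <= N_STRANDS:
        raise ValueError(f"strand must be between 1 and {N_STRANDS}")
    # Cyclic shift by one: beta2 -> 1, ..., beta8 -> 7, and beta1 -> the final strand (8).
    return (strand - 2) % N_STRANDS + 1


if __name__ == "__main__":
    for s in range(1, N_STRANDS + 1):
        print(f"RoCBM21 beta{s} -> AnCBM20 strand {type_ii_to_type_i(s)}")
```

The mapping relabels structurally equivalent positions only; in both proteins the chain still runs from the N- to the C-terminus.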

Figure 3 Type I topology and type II topology of SBDs

The regions of the eight β-strands in RoCBM21 and their corresponding strands in other SBDs are shown. Strands 7 and 8 form hydrogen bonds with strand 6. The extra loop between the first two β-strands (corresponding to the bulged structure of RoCBM21) at the N-terminus of TvCBM34 I is underlined. (A) Primary structures of SBDs. The sequences corresponding to secondary structure are indicated. (B) Schematic diagrams of type I topology and type II topology. The strand order is equivalent to that shown in (A) and is shown above the type-II-topology diagram. (C) Three-dimensional structures of type I topology (represented by AnCBM20) and type II topology (represented by RoCBM21).

Table 2 Structural comparison of different families of SBDs with RoCBM21

The sequence ranges in the PDB files of BhCBM25 and BhCBM26 correspond to amino acids 863–958 and 771–863 of open reading frame BH0413.

Ligand-binding and chemical-shift perturbation

The three carbohydrates tested in the present study (maltotriose, maltoheptaose and β-cyclodextrin) showed similar patterns of chemical-shift perturbation: the same amino acid residues were affected in the titrations, but the magnitude of the changes differed between carbohydrates. The 1H–15N HSQC spectra of RoCBM21 before and after titration with β-cyclodextrin are overlaid in Figure 4(A), and the chemical-shift changes of the affected residues are summarized in Figure 4(B). RoCBM21 residues that exhibited significant chemical-shift perturbations (threshold of 1 S.D.: 0.07 p.p.m. for β-cyclodextrin and maltoheptaose, and 0.03 p.p.m. for maltotriose) were mapped, together with the two carbohydrate-binding sites, on to the three-dimensional structure (Figure 4C). On the basis of these perturbations, the residues affected by ligand binding can be categorized into three groups. First, residues A41, W47, N52, Y83, K85, K91, D95, N96, N97 and S99 are located at the site corresponding to site I of previously reported SBDs. Second, residues N29, I30, A31, Y32, S33, K34, S57, F58, I62, N66, Y67, E68 and Y69 form the site corresponding to site II. Third, and interestingly, several residues located in the hydrophobic core were also affected by ligand binding: L11, V36, V38, W70 and I79. Loops 1, 4 and 8 are flexible regions with average rmsd values >1.5 Å (see Supplementary Figure 1 at Besides their high flexibility, loops 4 and 8, which enclose site I, share another feature: they are rich in asparagine residues (N46, N48, N49, N50 and N52 in loop 4, and N96, N97, N98 and N101 in loop 8). Titration with carbohydrate ligands caused large chemical-shift perturbations in asparagine residues N50, N52, N96, N97 and N98, suggesting that residues in these polyasparagine loops might be involved in CBM–starch interactions. However, the specific role of these polyasparagine loops requires further study. 
The presence of these polyasparagine loops is a distinct feature of some members of CBM21 [51]. Two molecules of β-cyclodextrin were docked into site I and site II of RoCBM21 respectively (Figure 4D). In site I, the side chains of W47 and Y83 form the central part of the binding platform [42], playing roles equivalent to those of H26, W34 and W74 in CBM25 and of Y23, Y25 and W36 in CBM26. These two aromatic side chains undergo hydrophobic interactions (we avoid the term ‘stacking’ for fear of confusing organic chemists) with the rings of glucose residues and cooperate with residues surrounding site I to bind ligands. Docking of maltotriose and maltoheptaose (see Supplementary Figure 3 at reveals that maltoheptaose interacts with W47 and Y83 in a similar pattern to β-cyclodextrin, whereas maltotriose lies further away from W47. This is consistent with the smaller chemical-shift perturbations of loop-4 residues in the maltotriose titration. In site II, residues N29, Y32 and E68 (cooperating with residues surrounding site II) interact similarly with all three ligands, as illustrated by the large chemical-shift changes observed. Dissociation constants (Kd) were extracted by fitting the chemical-shift perturbations to a theoretical binding curve [14,52,53]. Fitting the perturbations of site-I residues N50, W47 (side chain) and Y83 gave a Kd of 23.5±2.5 μM; this value is close to that reported for site I of CBM20 [14,53], but slightly higher than the value reported by Chou et al. [9]. Fitting the perturbations of site-II residues N29, I30 and E68 (Y32 was not used because its peaks were missing during the low-concentration titration) gave a Kd of 12.5±1.3 μM, in agreement with the previously reported value [9].
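The text does not specify the theoretical curve used for the Kd fits, so the sketch below assumes the common 1:1 binding model with ligand depletion, in which the complex concentration is [(P + L + Kd) − √((P + L + Kd)² − 4PL)]/2 and the observed change is a per-residue maximal change Δδ_max times the fraction of protein bound. A single Kd shared by several residues (each with its own Δδ_max) is found by grid search. The protein concentration, ligand concentrations and data in the usage example are hypothetical; a real analysis would typically use nonlinear least squares (for example SciPy's `curve_fit`) and report fitting errors.

```python
import math


def fraction_bound(p_tot: float, l_tot: float, kd: float) -> float:
    """Fraction of protein in the 1:1 complex, allowing for ligand depletion (same units)."""
    b = p_tot + l_tot + kd
    complex_conc = (b - math.sqrt(max(b * b - 4.0 * p_tot * l_tot, 0.0))) / 2.0
    return complex_conc / p_tot


def fit_shared_kd(p_tot, ligand, curves, kd_grid):
    """Grid-search a Kd shared by all residues in `curves` (residue -> observed changes).

    For each trial Kd, each residue's maximal change is solved by linear least squares;
    the Kd giving the lowest total squared error is returned with those maxima.
    """
    best = None
    for kd in kd_grid:
        f = [fraction_bound(p_tot, l, kd) for l in ligand]
        ff = sum(x * x for x in f)
        if ff == 0.0:
            continue  # no bound fraction at any titration point
        d_max = {res: sum(x * y for x, y in zip(f, obs)) / ff for res, obs in curves.items()}
        sse = sum((y - d_max[res] * x) ** 2
                  for res, obs in curves.items() for x, y in zip(f, obs))
        if best is None or sse < best[0]:
            best = (sse, kd, d_max)
    if best is None:
        raise ValueError("no usable Kd in grid")
    return best[1], best[2]


if __name__ == "__main__":
    p_tot = 50.0                                  # hypothetical protein concentration, μM
    ligand = [0, 10, 25, 50, 100, 200, 400, 800]  # hypothetical ligand concentrations, μM
    true_kd = 20.0                                # hypothetical Kd used to simulate data
    curves = {"res_a": [0.30 * fraction_bound(p_tot, l, true_kd) for l in ligand],
              "res_b": [0.15 * fraction_bound(p_tot, l, true_kd) for l in ligand]}
    kd_grid = [0.5 * i for i in range(1, 401)]    # 0.5 to 200 μM
    kd, d_max = fit_shared_kd(p_tot, ligand, curves, kd_grid)
    print(kd, {k: round(v, 3) for k, v in d_max.items()})
```

Fitting one shared Kd across several residues mirrors the way the site-I and site-II values above combine perturbations from more than one residue.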

Figure 4 Ligand-binding and ligand-docking studies of RoCBM21

(A) The 1H–15N HSQC spectrum of RoCBM21 before (black peaks) and after (red peaks) titration with β-cyclodextrin. Peaks with larger chemical-shift perturbations are indicated with green arrows. (B) Weighted average chemical-shift changes plotted against residue number. Black, light yellow and red represent the ligands maltotriose, maltoheptaose and β-cyclodextrin respectively. The perturbation thresholds (dotted lines) are set to 0.07 p.p.m. for β-cyclodextrin and maltoheptaose and 0.03 p.p.m. for maltotriose. (C) RoCBM21 structure labelled with residues affected upon titration. Residues whose chemical-shift changes exceed the thresholds are considered significantly affected (and are thus hypothesized to play critical roles in ligand binding) and are shown as stick structures. (D) Two molecules of β-cyclodextrin docked into RoCBM21.


We report here the first solution structure of a family 21 CBM. On the basis of the structures currently available, SBD topologies can be classified into two types. The type II topology, exemplified by CBM21 and CBM34, contains only antiparallel β-strands; its first strand is hydrogen-bonded to the second strand, which is the centre strand of the N-sheet. In the type I topology, the first strand is the centre strand of the N-sheet, and the final strand takes the place of the first strand of the type II topology.

Aromatic and polar residues play vital functional roles in CBMs [54–57]. Aromatic residues are thought to undergo hydrophobic interactions with the pyranose rings of carbohydrates. The roles of aromatic residues in RoCBM21–carbohydrate binding have been studied by site-directed mutagenesis [9]: the mutations Y32A, Y67A, Y83A, W47A and Y93A each significantly alter starch adsorption. Chemical-shift perturbations of Y32, Y67, Y83 and W47 were also observed in the 1H–15N HSQC spectra, consistent with the mutagenesis results. Y93 showed no chemical-shift perturbation, but its side chain is oriented toward the core of RoCBM21; mutation of this residue might therefore affect the stability of RoCBM21 and thus affect carbohydrate binding indirectly. Y32, located in loop 3 of site II, showed the largest chemical-shift change. Interestingly, Y67 is close to site II, but its side chain points outward from the N-sheet rather than toward site II, and there are corresponding outward-pointing aromatic residues in several other SBDs (e.g. CBM34, CBM25 and CBM26). The significance of these conserved outward-pointing aromatic residues remains unclear. Both Y83 and W47 are in site I, with Y83 in the middle of the C-sheet and W47 in loop 4. The chemical-shift changes observed for both residues are consistent with the effects of their mutation and confirm that these two residues are involved in ligand binding. The side chain of W47 shows a chemical-shift change upon soluble ligand binding (Supplementary Figure 2 at This highlights the importance of such a tryptophan ring in starch binding.

C-terminal deletions have been made in A. awamori glucoamylase (97% similar to A. niger glucoamylase) [58]. Deletion of residues 597–615 does not affect raw-starch adsorption. The removed fragment (residues 597–615) corresponds to β1 and part of β8 of RoCBM21 (Figures 3B and 3C). The rest of the truncated SBD is equivalent to the region from β2 to loop 8, which is invariant in both the type I and type II topologies of SBDs, implying a minimal scaffold for raw-starch adsorption. This scaffold is also found in the N3 domain of K. pneumoniae pullulanase, although no starch-binding activity has been reported for it. The N3 domain is a CBM-like domain that folds into a type-II-like topology. Further removal of residues 571–596 abolishes raw-starch adsorption but retains some ability to assist the catalytic domain in the digestion of raw starch. The remaining part of the SBD is equivalent to the fragment from β2 to β5 of SBDs (Figures 3B and 3C). This result defines the minimal scaffold with which the binding domain can contribute to raw-starch digestion. The N-domain of isoamylase (PDB code: 2BF2) [59], which is not classified as a CBM, also contains such a functional scaffold. Deletion of residues 559–570 is equivalent to the removal of β5. This deletion includes the critical core tryptophan residue W562 (W70 in RoCBM21), and the resulting disruption of the hydrophobic core eliminates the minimal structural scaffold.

Our structure of RoCBM21 improves understanding of both the conserved fold of SBDs and the differences between these domains. Several mammalian proteins, including human PP1G, contain glycogen-binding family 21 CBMs, suggesting a critical role for this family in glycogen synthesis and sugar metabolism [11]. The RoCBM21 structure provides a basis for modelling other glycogen-binding family 21 CBMs, and better models will help to define the molecular mechanism by which PP1G regulates glycogen synthesis and carbohydrate metabolism. However, a model of PP1G built by Souchet et al. [60] (by homology modelling of a ∼300-residue protein fragment) deviates substantially from our RoCBM21 structure in the putative CBM21 region. Whether the CBM21 in PP1G sacrifices overall similarity to the ∼300-residue fragment to maintain a modular structure, as in RoCBM21, or sacrifices modular homology to form part of an (α/β)8-barrel, as Souchet et al. [60] proposed, is an interesting question worthy of further investigation.

Carbohydrates are important materials for the food and beverage industries and, perhaps in the near future, will also play significant roles in the fuel industry. Detailed knowledge of both carbohydrate manipulation, through studies of the catalytic domains of carbohydrate-active enzymes, and carbohydrate binding, through studies of CBMs, is vital to an overall understanding of these enzymes. Structural investigations of various SBD families and of different CBM21 members have allowed us to characterize inter- and intra-family variations among CBMs and the different ways in which these proteins recognize carbohydrates. Such studies may therefore make it possible to engineer novel carbohydrate-active enzymes to meet future human needs.


The 800 MHz NMR spectra were obtained at High-Field Biomacromolecular NMR Core Facility (Taipei City, Taiwan) supported by the National Research Program for Genomic Medicine. The RoCBM21 clone was kindly given by Mr Chia-Chin Sheu at Simpson Biotech Co. (Taipei, Taiwan). This work was supported by the National Science Council, Taiwan, R.O.C. (grant numbers 95-2752-B-007-003-PEA and 94-2311-B-007-005) and The National Tsing Hua University Structural Proteomic Center for Drug Discovery (Hsinchu, Taiwan) (grant 95N2517E1 to P.-C.L. and 95-2627-B-007-003 to M.D.-T.C.). We thank Dr Shou-Lin Chang (Department of Life Sciences, National Tsing Hua University, Hsinchu, Taiwan), Dr Shang-Te Danny Hsu (Bijvoet Center, Utrecht University, Utrecht, The Netherlands) and Dr Chi-Fon Chang (NMR Supporting Team, High-Field Biomacromolecular Solution NMR core facility, Institute of Biomedical Sciences, Taipei, Taiwan) for discussing NMR experiments, Ms Tsun-Ai Yo and Ms Nan-Lu Ho (National Tsing Hua University, Hsinchu, Taiwan) for technical help, and Dr Bunzo Mikami (Division of Applied Life Sciences, Graduate School of Agriculture, Kyoto University, Gokasho, Uji, Kyoto, Japan) for sharing files and information about CBM41.


  • The atomic co-ordinates and structure factors (2DJM) have been deposited in the PDB (Protein Data Bank), Research Collaboratory for Structural Bioinformatics, Rutgers University, New Brunswick, NJ, U.S.A. ( The chemical shifts of RoCBM21 have been deposited in the BMRB (Biological Magnetic Resonance Data Bank), University of Wisconsin-Madison, Madison, WI, U.S.A.) under the entry number BMRB7083.

Abbreviations: AnCBM20, family 20 CBM from Aspergillus niger glucoamylase; BhCBM25 and BhCBM26, family 25 and family 26 CBMs from Bacillus halodurans maltohexaose-forming amylase; BMRB, Biological Magnetic Resonance Data Bank; CBM, carbohydrate-binding module; 2D, two-dimensional; NOE, nuclear Overhauser effect; PDB, Protein Data Bank; PP1, protein phosphatase 1; PP1G, protein phosphatase-1 regulatory subunit; rmsd, root mean square deviation; RoCBM21, family 21 CBM from Rhizopus oryzae glucoamylase; SBD, starch-binding domain; TvCBM34 I and TvCBM34 II, family 34 CBMs from Thermoactinomyces vulgaris α-amylase I and α-amylase II; for brevity, the one-letter code is used for amino acid residues (e.g. Y83 is tyrosine-83).

