An enzyme's substrate specificity is one of its most important characteristics. The quantitative comparison of broad-specificity enzymes requires the selection of a homogeneous set of substrates for experimental testing, determination of substrate-specificity data and analysis using multivariate statistics. We describe a systematic analysis of the substrate specificities of nine wild-type and four engineered haloalkane dehalogenases. The enzymes were characterized experimentally using a set of 30 substrates selected using statistical experimental design from a set of nearly 200 halogenated compounds. Analysis of the activity data showed that the most universally useful substrates in the assessment of haloalkane dehalogenase activity are 1-bromobutane, 1-iodopropane, 1-iodobutane, 1,2-dibromoethane and 4-bromobutanenitrile. Functional relationships among the enzymes were explored using principal component analysis. Analysis of the untransformed specific activity data revealed that the overall activity of wild-type haloalkane dehalogenases decreases in the following order: LinB~DbjA>DhlA~DhaA~DbeA~DmbA>DatA~DmbC~DrbA. After transforming the data, we were able to classify haloalkane dehalogenases into four SSGs (substrate-specificity groups). These functional groups are clearly distinct from the evolutionary subfamilies, suggesting that phylogenetic analysis cannot be used to predict the substrate specificity of individual haloalkane dehalogenases. Structural and functional comparisons of wild-type and mutant enzymes revealed that the architecture of the active site and the main access tunnel significantly influences the substrate specificity of these enzymes, but is not its only determinant. The identification of other structural determinants of the substrate specificity remains a challenge for further research on haloalkane dehalogenases.
- haloalkane dehalogenase
- phylogenetic analysis
- principal component analysis
- protein engineering
- substrate specificity
Enzymes are biological catalysts that are essential components of every biological system and are valuable in biotechnology. The key functional characteristics of an enzyme are its catalytic activity towards different substrates and its substrate specificity, i.e. the range of substrates it can convert. As such, the identification of enzymes that efficiently catalyse new chemical reactions or display novel substrate specificities is of great scientific and practical interest. The traditional way of isolating novel biocatalysts is a time-consuming multistep process involving the enrichment of organisms from a natural resource, construction of a genomic library, cloning of the library into a host organism, screening for appropriate activity, and protein purification and biochemical characterization. This process has been greatly accelerated by the development of new techniques in molecular biology and bioinformatics, including high-throughput techniques for screening mutant and metagenomic libraries, methods for the in silico identification of potential targets using sequence database searches and bioinformatics tools, and various novel approaches to protein engineering [1,2].
HLDs (haloalkane dehalogenases; EC 3.8.1.5) are enzymes that catalyse hydrolytic cleavage of the carbon–halogen bond in a wide range of halogenated compounds. They have a number of potential practical applications, including roles in industrial biocatalysis [3,4], bioremediation, detoxification, biosensing and molecular imaging. The properties of several HLDs have been improved by directed evolution [9–12]. A substantial body of knowledge concerning the structure and function of HLDs also allows construction of modified enzymes by rational design [4,13,14].
Structurally, HLDs belong to the α/β-hydrolase superfamily. Their active site is buried in the predominantly hydrophobic cavity at the interface of the α/β-hydrolase core domain and the helical cap domain, and is connected to the bulk solvent by access tunnels [4,15–18]. The active-site residues that are essential for catalysis are referred to as the catalytic pentad, and comprise a nucleophilic aspartate residue, a basic histidine residue, an aspartic or glutamic acid moiety that serves as a general acid and either two tryptophan residues or a tryptophan–asparagine pair that serve to stabilise the leaving halide ion. The HLD family currently includes 14 distinct enzymes with experimentally confirmed dehalogenation activity [20–27]. An analysis of the sequences and structures of these HLDs and their homologues divided the family into three phylogenetic subfamilies, HLD-I, HLD-II and HLD-III, which differ mainly in the composition of their catalytic pentad and cap domain.
To date, HLDs have been isolated from bacterial strains originating from soil [23,24,27,28], sea water [22,27], obligate animal pathogens [20,21], plant symbionts and plant parasites. Although the biological function of many HLDs remains unknown, those that were isolated from bacteria inhabiting contaminated soil are known to be involved in metabolic pathways that enable the host organisms to utilize halogenated compounds as carbon sources [23,28,30]. HLDs catalyse the hydrolysis of chlorinated, brominated and iodinated alkanes, alkenes, cycloalkanes, alcohols, carboxylic acids, esters, ethers, epoxides, amides and nitriles [4,31,32], and are thus broad-specificity enzymes, active across a wide range of substrate classes. The substrate specificity of HLDs can be described in terms of a quantitative profile of their specific activities with respect to a set of specific substrates. Quantitative comparisons of such specificity profiles can be used to identify appropriate catalysts for practical applications and to further our understanding of the relationships between the enzymes in terms of their function, structure and evolution.
The present study focused on the comparison and classification of the substrate specificities of nine members of the HLD family. A functional classification of the HLDs was carried out using PCA [PC (principal component) analysis] and the classification thus derived was compared with one derived on the basis of the enzymes' evolutionary relationships. The purpose of this comparison was to see whether the substrate specificity of individual HLDs reflects the evolution of the family and thus could be predicted from established phylogenetic classifications. Factors influencing the substrate specificity of HLDs were assessed by structural and functional comparison of wild-type and mutant enzymes. The present study also identifies ‘universal’ substrates converted by all of the enzymes examined, as well as ‘preferred’ and ‘characteristic’ substrates for individual SSGs (substrate-specificity groups). Such knowledge will be useful for the selection of appropriate biocatalysts for specific biotechnological applications and the development of platforms for screening HLD activity in different hosts, environments or in vitro samples.
All halogenated compounds used were of at least 95% purity, and were purchased from Sigma–Aldrich.
Preparation of enzymes and activity assay
The wild-type HLDs examined were: DatA from Agrobacterium tumefaciens C58, DbeA from Bradyrhizobium elkanii USDA94 (T. Prudnikova, P. Rezacova, Z. Prokop, T. Mozga, Y. Sato, M. Kuty, Y. Nagata, J. Damborsky, I. Kuta-Smatanova and R. Chaloupkova, unpublished work), DbjA from Bradyrhizobium japonicum USDA110, DhaA from Rhodococcus rhodochrous NCIMB 13064, DhlA from Xanthobacter autotrophicus GJ10, DmbA and DmbC from Mycobacterium bovis 5033/66, DrbA from Rhodopirellula baltica SH1 and LinB from Sphingobium japonicum UT26 (Supplementary Table S1 at http://www.BiochemJ.org/bj/435/bj4350345add.htm). Mutant enzymes were constructed by rational design or focused directed evolution and include DbeA1 and DbeA2, which carry the insertions Val-Ala-Glu-Glu-Gln-Asp-His-Ala-Glu between residues 142 and 143 and Glu-Val-Ala-Glu-Glu-Gln-Asp-His-Ala between residues 141 and 142 respectively (R. Chaloupkova, T. Mozga, Y. Sato, T. Prudnikova, T. Koudelakova, E. Chovancova, P. Rezacova, Y. Nagata, I. Kuta-Smatanova and J. Damborsky, unpublished work); DbjAΔ, from which residues His140-Thr-Glu-Val-Ala-Glu-Glu146 were deleted; and DhaA31, which incorporates the substitutions I135F, C176Y, V245F, L246I and Y273F (Supplementary Table S2 at http://www.BiochemJ.org/bj/435/bj4350345add.htm). His-tagged enzymes were heterologously expressed in Escherichia coli or Mycobacterium smegmatis strains using appropriate vectors and purified to homogeneity using immobilized metal-affinity chromatography as described previously [11,13,21,22,26]. The specific activities of HLDs towards the set of 30 halogenated substrates were taken from previous studies [11,22,29] (T. Prudnikova, P. Rezacova, Z. Prokop, T. Mozga, Y. Sato, M. Kuty, Y. Nagata, J. Damborsky, I. Kuta-Smatanova and R. Chaloupkova, unpublished work) or determined under the conditions used in those studies for DbjA, DhaA, DhlA, DmbA and LinB (Supplementary Table S3 at http://www.BiochemJ.org/bj/435/bj4350345add.htm).
Enzyme concentration was estimated using Bradford reagent (Sigma–Aldrich) with BSA as a standard. Specific activity was measured using reagents containing mercuric thiocyanate and ferric ammonium sulfate; the halide ions released during the dehalogenase reaction were quantified by an end-point spectrophotometric measurement. Reactions were carried out in 100 mM glycine buffer (pH 8.6) in 25 cm3 microflasks closed by Mininert valves (Alltech) at 37 °C. The initial concentration of the halogenated substrates in the reaction mixture was determined on a GC Trace 2000 gas chromatograph (Thermo Fisher Scientific) equipped with a flame ionization detector and a DB-FFAP capillary column (30 m×0.25 mm×0.25 μm; J&W Scientific) (Supplementary Table S3). After the reaction was initiated by the addition of enzyme, samples were periodically withdrawn with a 1 cm3 syringe (Hamilton) over the 40 min measurement period and immediately mixed with 35% nitric acid to stop the reaction. The mercuric thiocyanate and ferric ammonium sulfate reagents were subsequently added to the collected samples, and the absorbance of the final mixture was measured in a microtitre plate at 460 nm using a Sunrise spectrophotometer (Tecan). Spontaneous hydrolysis of the substrates in buffer was measured in an abiotic control. Specific activities were quantified from the initial linear slope of halide concentration plotted against time, after subtraction of spontaneous hydrolysis. The kinetic constants of all nine wild-type HLDs towards 1-chlorobutane (substrate number 4) or 1-iodobutane (substrate number 29) were collected from the literature or measured as described in the Supplementary Experimental section (Supplementary Table S4 at http://www.BiochemJ.org/bj/435/bj4350345add.htm).
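The rate calculation described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes that halide concentrations have already been converted from the 460 nm absorbance to mM (for example via a halide standard curve), that the enzymatic reaction and the abiotic control were sampled at the same times, and that only points from the initial linear phase are supplied. The function names and the output unit (µmol per min per mg) are hypothetical choices for illustration. Because a least-squares slope is linear in the data, subtracting the abiotic slope gives the same result as fitting the point-wise difference when the sampling times coincide.

```python
import numpy as np


def initial_slope(times_min, halide_mM):
    """Least-squares slope (mM per min) of halide concentration against time."""
    t = np.asarray(times_min, dtype=float)
    c = np.asarray(halide_mM, dtype=float)
    slope, _intercept = np.polyfit(t, c, 1)  # degree-1 fit returns [slope, intercept]
    return slope


def specific_activity(times_min, halide_enzyme_mM, halide_abiotic_mM, enzyme_mg_per_ml):
    """Net initial rate divided by enzyme concentration (µmol halide per min per mg).

    Hypothetical helper: pass only points from the initial linear phase, sampled
    at the same times in the enzymatic reaction and in the abiotic control.
    """
    net_rate = initial_slope(times_min, halide_enzyme_mM) - initial_slope(
        times_min, halide_abiotic_mM
    )
    # 1 mM = 1 µmol per mL, so (mM/min) / (mg/mL) = µmol per min per mg protein;
    # divide by 60 if a per-second unit is wanted
    return net_rate / enzyme_mg_per_ml


# Usage with made-up numbers (not data from the study):
times = [0, 10, 20, 30, 40]                       # min
with_enzyme = [0.01 + 0.05 * t for t in times]   # mM halide released
abiotic = [0.005 * t for t in times]             # mM halide from spontaneous hydrolysis
print(specific_activity(times, with_enzyme, abiotic, enzyme_mg_per_ml=0.1))
```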
A matrix containing the activity data for the nine wild-type HLDs towards 30 substrates (Supplementary Tables S3 and S5 at http://www.BiochemJ.org/bj/435/bj4350345add.htm) was analysed by PCA to uncover the relationships between individual HLDs (cases) and their substrates (variables). PCA of the data matrix X allows it to be expressed as the product of two new matrices plus a noise matrix of residuals: X=TP'+E [34,35]. The score matrix T (nine HLDs×number of PCs) summarizes the X-variables, the loading matrix P' (number of PCs×30 substrates) shows the influence of individual variables on the projection model and the residual matrix E (nine HLDs×30 substrates) quantifies the differences between the original values and the projections. The underlying principles of PCA can be visualized by considering its geometrical interpretation [35,36]. Nine points, representing the activities of individual HLDs, distributed in 30-dimensional space cannot be visualized directly. PCA projects these points on to a lower-dimensional subspace, and establishes a reduced set of new orthogonal co-ordinates called PCs. PCs are fitted to points in multi-dimensional space by the least-squares method, such that the first PC is aligned in the direction of maximum variance in the data set, the second is aligned in the direction of the maximum remaining variance, and so on. The co-ordinate values of individual cases in the new co-ordinate system are called scores (t), and the projection of the data points on to the two-dimensional plane defined by any two PCs is called a score plot. The cosines of the angles between a given PC and the axes defined by the original variables are called loadings (p), and they represent the contributions of the original variables to a particular PC. PCA was conducted using the Statistica 8.0 software package (StatSoft). Two PCAs were performed.
In the first, the raw data concerning individual enzymes' specific activities towards particular substrates were used as the primary input data. In the second, the raw data were log-transformed and weighted relative to the individual enzyme's activity towards other substrates prior to analysis, in order to better discern individual enzymes' specificity profiles. Thus: (i) each specific activity value was incremented by 1 unit to avoid logarithmic transformation of zero values; (ii) the log of this new value was taken; and (iii) this log value was then divided by the sum of all the log values for that particular enzyme to give a log-transformed weighted measure of that enzyme's activity towards that specific substrate relative to its activity towards all of the other substrates considered. These transformed data were used to identify enzymes with interesting or unusual specificity profiles, without regard to their overall specific activity. The score plots obtained from the analysis of these log-transformed data were used to classify the HLDs into SSGs; substrates that were important in defining individual groups were identified from the loading plots. The co-ordinates of individual enzymes in the space defined by the biologically significant PCs arising from this analysis were used to calculate a matrix of Euclidean distances. This matrix was in turn used to construct a dendrogram to characterize the similarities of individual HLDs in terms of their substrate specificity profiles. The dendrogram was generated using the NJ (neighbour-joining) method, as implemented in the DISTTREE program in the VANILLA v1.2 software package.
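A compact sketch of transformation steps (i) to (iii), the decomposition X=TP'+E and the Euclidean distance matrix might look like this. It uses NumPy's SVD rather than Statistica, and it assumes that the columns (substrates) are mean-centred but not scaled before PCA, which the text does not specify. The function names and the random activity matrix are hypothetical placeholders, not the study's data. One detail worth noting: the base of the logarithm in step (ii) does not affect the result, because dividing by the enzyme's sum of logs in step (iii) cancels it.

```python
import numpy as np


def log_weight_transform(activity):
    """Steps (i)-(iii): log(x + 1), then divide by each enzyme's (row's) sum of logs."""
    a = np.asarray(activity, dtype=float)
    logs = np.log10(a + 1.0)                       # (i) add 1 unit, (ii) take the log
    return logs / logs.sum(axis=1, keepdims=True)  # (iii) weight within each enzyme


def pca(X, n_components):
    """PCA of the column-centred matrix by SVD, so that Xc = T P' + E."""
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)                      # centre each substrate column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]     # scores: enzymes x PCs
    P = Vt[:n_components].T                        # loadings: substrates x PCs
    E = Xc - T @ P.T                               # residuals: enzymes x substrates
    explained = s ** 2 / np.sum(s ** 2)            # fraction of variance per PC
    return T, P, E, explained[:n_components]


def euclidean_distances(T):
    """Pairwise Euclidean distances between enzymes in the retained PC space."""
    diff = T[:, None, :] - T[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Usage on synthetic placeholder data (9 enzymes x 30 substrates):
rng = np.random.default_rng(0)
activity = rng.gamma(shape=0.5, scale=50.0, size=(9, 30))
W = log_weight_transform(activity)
T, P, E, explained = pca(W, n_components=3)
D = euclidean_distances(T)                         # input for a NJ dendrogram
print(np.round(explained, 3), D.shape)
```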
The phylogenetic analysis of HLDs was carried out as described previously. Briefly, all of the available sequences of HLDs and their closest homologues were gathered from the NCBI (National Center for Biotechnology Information) non-redundant protein database using PSI-BLAST searches. HLDs were separated from other related protein families by clustering using CLANS. A multiple sequence alignment of HLDs was constructed using MUSCLE v3.5 and was then manually refined using the BioEdit v7.0.1 sequence editor. Selected regions of the alignment were used to estimate a suitable evolutionary model and parameters with PROTTEST, and then for phylogenetic reconstruction by the NJ method. A distance matrix for NJ inference was generated using the MLDIST program of the VANILLA v1.2 package according to the WAG model of amino-acid substitution. The resulting phylogenetic tree was rooted by outgroup analysis. A Mantel test, performed using version 2.11.1 of the ‘R’ environment for statistical computing and graphics, was used to investigate the correlation between the matrix of the HLDs' phylogenetic distances and the matrix of Euclidean distances obtained from the PCA comparing the substrate specificity profiles of the wild-type HLDs.
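Both the specificity dendrogram and the phylogenetic tree were inferred by neighbour joining from a distance matrix. For readers unfamiliar with the method, the following is a minimal sketch of the standard Saitou–Nei algorithm; it is not the DISTTREE implementation used in the study. The function name is hypothetical, the tree is returned as a Newick string with the final edge split in half for display, and the outgroup rooting described above is not performed. The example matrix is a small textbook-style additive case, not HLD data.

```python
import numpy as np


def neighbour_joining(D, labels):
    """Saitou-Nei neighbour joining on a symmetric distance matrix; returns Newick."""
    D = np.asarray(D, dtype=float)
    nodes = list(labels)
    while len(nodes) > 2:
        n = len(nodes)
        r = D.sum(axis=1)
        # Q(i, j) = (n - 2) d(i, j) - r_i - r_j; the pair with minimal Q is joined
        Q = (n - 2) * D - r[:, None] - r[None, :]
        np.fill_diagonal(Q, np.inf)
        i, j = np.unravel_index(np.argmin(Q), Q.shape)
        if i > j:
            i, j = j, i
        # branch lengths from the new internal node u to i and j
        li = 0.5 * D[i, j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = D[i, j] - li
        joined = f"({nodes[i]}:{li:.4g},{nodes[j]}:{lj:.4g})"
        # distance from u to every remaining node k
        du = 0.5 * (D[i] + D[j] - D[i, j])
        keep = [k for k in range(n) if k not in (i, j)]
        D_new = np.zeros((n - 1, n - 1))
        D_new[:-1, :-1] = D[np.ix_(keep, keep)]
        D_new[-1, :-1] = du[keep]
        D_new[:-1, -1] = du[keep]
        D = D_new
        nodes = [nodes[k] for k in keep] + [joined]
    half = D[0, 1] / 2  # split the last edge so the output is a valid Newick tree
    return f"({nodes[0]}:{half:.4g},{nodes[1]}:{half:.4g});"


example = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbour_joining(example, ["a", "b", "c", "d", "e"]))
# prints ((c:4,(a:2,b:3):3):1,(d:2,e:1):1);
```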
Characterization of wild-type and engineered HLDs with a homogeneous set of substrates
The substrate specificities of nine wild-type and four mutant HLDs with respect to a homogeneous set of 30 substrates (Figure 1) were studied and quantitatively compared. This substrate set was selected from 194 potential HLD substrates using statistical experimental design, so as to sample the entire space of 28 different physico-chemical properties (Supplementary Experimental section and Supplementary Table S6 at http://www.BiochemJ.org/bj/435/bj4350345add.htm). An identical set of substrates and assay conditions was used for the characterization of all of the enzymes, because any differences in substrates or conditions would have made the subsequent statistical analysis less reliable. All of the HLDs examined exhibited good relative activities towards 1-bromobutane (substrate number 18), 1-iodopropane (substrate number 28), 1-iodobutane (substrate number 29), 1,2-dibromoethane (substrate number 47) and 4-bromobutanenitrile (substrate number 141). 1,2-Dichloroethane (substrate number 37), 1,2-dichloropropane (substrate number 67), 1,2,3-trichloropropane (substrate number 80), chlorocyclohexane (substrate number 115) and (bromomethyl)cyclohexane (substrate number 119) were found to be generally poor substrates for the HLDs examined (Figure 2).
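The actual substrate-selection design is given in the Supplementary Experimental section and is not reproduced here. As a rough illustration of what sampling a 28-dimensional property space means, the sketch below uses greedy maximin (farthest-point) selection on autoscaled descriptors. This is a swapped-in space-filling technique chosen for brevity, not necessarily the design used in the study; the function name is hypothetical and the descriptor matrix is random placeholder data.

```python
import numpy as np


def maximin_selection(descriptors, k, seed_index=None):
    """Greedily pick k rows that are maximally spread in autoscaled descriptor space."""
    X = np.asarray(descriptors, dtype=float)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                      # leave constant properties unscaled
    Z = (X - X.mean(axis=0)) / sd          # autoscale each physico-chemical property
    if seed_index is None:
        # start from the compound farthest from the centre of the property space
        seed_index = int(np.argmax(np.linalg.norm(Z, axis=1)))
    chosen = [seed_index]
    d_min = np.linalg.norm(Z - Z[seed_index], axis=1)  # distance to nearest chosen compound
    while len(chosen) < k:
        nxt = int(np.argmax(d_min))        # the compound least well represented so far
        chosen.append(nxt)
        d_min = np.minimum(d_min, np.linalg.norm(Z - Z[nxt], axis=1))
    return chosen


# Usage on placeholder data: 194 candidate compounds x 28 properties
rng = np.random.default_rng(0)
candidates = rng.normal(size=(194, 28))
selected = maximin_selection(candidates, k=30)
print(sorted(selected))
```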
Functional classification of wild-type HLDs
The matrix of the HLDs' untransformed specific activities towards the various substrates was subjected to PCA. Three biologically significant PCs were identified, which together accounted for 79% of the variance in the primary dataset (Supplementary Table S7 at http://www.BiochemJ.org/bj/435/bj4350345add.htm). PC1 ranked the enzymes according to the magnitude of their overall activity towards the tested substrates: LinB~DbjA > DhlA~DhaA~DbeA~DmbA>DatA~DmbC~DrbA (Supplementary Figure S1 at http://www.BiochemJ.org/bj/435/bj4350345add.htm). LinB and DbjA were generally the most active of the HLDs analysed; their specific activities were two to three orders of magnitude greater than those of DmbC and DrbA. PC2 and PC3 further separated the HLDs; specifically, these components identified three enzymes with unique specific activities towards several substrates (Supplementary Figure S2 at http://www.BiochemJ.org/bj/435/bj4350345add.htm). DmbA exhibits high activity towards 2-iodobutane (substrate number 64), 1-chloro-2-(2-chloroethoxy)ethane (substrate number 111) and chlorocyclopentane (substrate number 138). DbjA exhibits exceptionally high activity towards 2-bromo-1-chloropropane (substrate number 76) and chlorocyclopentane (substrate number 138), and also catalyses the dehalogenation of the highly resistant substrates 1,2-dichloropropane (substrate number 67) and 1,2,3-trichloropropane (substrate number 80). DhlA possesses exceptional activity towards the chlorinated substrates 1,2-dichloroethane (substrate number 37) and 1,3-dichloropropane (substrate number 38).
Analysis of the untransformed data revealed that enzymes with similar overall activities can nevertheless have divergent activity profiles; similarities in the magnitude of two enzymes' overall activities can obscure interesting and potentially useful differences in their reactivity towards specific substrates. In such cases, data pre-treatment methods can be used to facilitate the interpretation of datasets by emphasizing biologically relevant information. To this end, the activity data were log-transformed and weighted as described in the Experimental section, in order to minimize complications arising from differences in the enzymes' absolute catalytic proficiency and to emphasize the differences in their specificity profiles (Figure 3 and Supplementary Table S5). The top three biologically significant PCs from this second model accounted for 62% of the variance in the transformed dataset (Supplementary Table S7). These PCs summarize each HLD's activity across all 30 tested substrates, so that HLDs with similar specificity profiles cluster together. On the basis of the model, the HLDs were divided into four SSGs (Figure 3A, left-hand panel): (i) SSG-I comprising DbjA, DhaA, DhlA and LinB, (ii) SSG-II containing DmbA, (iii) SSG-III containing DrbA and (iv) SSG-IV comprising DatA, DbeA and DmbC. This classification of the HLDs was primarily due to differences in their position along PC1 and PC2 (Figure 3A, left-hand panel); the classification of DrbA and DmbA into separate groups was justified by the difference in their position along PC3 (Supplementary Figure S4A at http://www.BiochemJ.org/bj/435/bj4350345add.htm). HLDs in the same SSG exhibited common substrate preferences that differentiated them from HLDs in other groups (Table 1 and Figure 3, left-hand panel). HLDs in SSG-I are characterized primarily by their catalytic robustness: their activity can be detected towards most of the tested substrates.
All members are active towards at least one of the poorly degradable compounds 1,2-dichloroethane (substrate number 37), 1,2-dichloropropane (substrate number 67), 1,2,3-trichloropropane (substrate number 80) and chlorocyclohexane (substrate number 115). Enzymes in SSG-II, SSG-III and SSG-IV are more selective for specific halogenated compounds. The substrate-specificity profile of SSG-II is similar to that of SSG-I, as is evident, for example, from the good conversion of 1,2-dibromoethane (substrate number 47) and 1-bromo-2-chloroethane (substrate number 137). On the other hand, SSG-II is distinguished by good activity towards substrates that are otherwise not preferred and by inactivity towards 1,3-di-iodopropane (substrate number 54). DrbA from SSG-III possesses extremely low or zero activity towards all of the tested compounds. A unique preference for 1-chlorobutane (substrate number 4) and inactivity towards otherwise good substrates are characteristic of SSG-III. SSG-IV is mainly characterized by a preference for terminally substituted brominated and iodinated propanes and butanes.
Functional classification of mutant HLDs
In addition to the wild-type enzymes, four mutants (DbeA1, DbeA2, DbjAΔ and DhaA31; see Supplementary Table S5) were examined. The incorporation of these enzymes' specificity data generated a new PCA model, whose top three biologically significant PCs accounted for 58% of the total variance in the dataset (Supplementary Table S7). The incorporation of the data on the mutant enzymes did not affect the proposed functional classification of the HLDs (Figure 3A, right-hand panel), demonstrating the robustness of the model constructed for the wild-type enzymes. The engineered HLDs were found to cluster in the same SSG as their ‘parent’ enzymes.
The most pronounced difference between a mutant and its ‘parent’ in terms of substrate specificity was observed with DhaA31. Relative to DhaA, DhaA31 exhibited decreased relative activity towards longer substrates such as 1-bromohexane (substrate number 20) and 1-iodohexane (substrate number 31), and increased relative activity towards 2-iodobutane (substrate number 64), 1,2-dibromopropane (substrate number 72) and 1,2,3-trichloropropane (substrate number 80). Compared with DbjA, DbjAΔ exhibited a loss of activity towards 2-iodobutane (substrate number 64) and decreased relative activity towards 2-bromo-1-chloropropane (substrate number 76). However, it also exhibited a gain in activity towards 1,2-dibromo-3-chloropropane (substrate number 155) and enhanced relative activity towards 1,3-di-iodopropane (substrate number 54), 1,2-dibromopropane (substrate number 72) and 1-bromo-2-chloroethane (substrate number 137). Relative to the ‘parent’ enzyme, the DbeA1 and DbeA2 mutants exhibited improved relative activity towards 1-bromobutane (substrate number 18), 1-bromohexane (substrate number 20) and 1,3-dibromopropane (substrate number 48), and reduced relative activity towards 3-chloro-2-methylprop-1-ene (substrate number 209), 1,5-dichloropentane (substrate number 40), 2-iodobutane (substrate number 64) and 1,3-di-iodopropane (substrate number 54), which was the best substrate for the ‘parent’ DbeA.
Comparison of the functional and evolutionary classifications of HLDs
The examined dataset included representatives of all three HLD phylogenetic subfamilies. The HLD-I subfamily was represented by a single enzyme, DhlA. The HLD-II subfamily was represented by six enzymes, DatA, DbeA, DbjA, DhaA, DmbA and LinB, whereas the HLD-III subfamily was represented by DmbC and DrbA. DbeA and DbjA have a protein sequence identity of 71% and are thus the most closely related pair of enzymes in the dataset, followed by DmbA and LinB, which showed a 68% protein sequence identity (Supplementary Table S8 at http://www.BiochemJ.org/bj/435/bj4350345add.htm). On the other hand, the DbjA–DrbA and DhlA–DmbA pairs both have mutual protein sequence identities of 19%, and are thus the most dissimilar pairs of enzymes in the dataset.
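Pairwise identity values such as those in Supplementary Table S8 depend on how alignment columns are counted, and the convention used for that table is not stated here. A minimal sketch, assuming identity is computed over the aligned columns in which neither sequence has a gap (one common convention among several); the function name and the toy sequences are hypothetical:

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    # keep only columns where both sequences have a residue
    pairs = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != gap and y != gap
    ]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


print(percent_identity("ACDE-G", "ACDF-G"))  # prints 80.0
```

Dividing instead by the alignment length or by the length of the shorter sequence gives lower values for gapped pairs, so identities are only comparable when computed with the same convention.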
A comparison of the phylogenetic tree with the substrate specificity dendrogram revealed that members of the same phylogenetic subfamily are spread across different SSGs (Figure 4). DhlA from the HLD-I subfamily is in SSG-I, along with the HLD-II subfamily members DbjA, DhaA and LinB. DmbA did not cluster together with its close relative LinB; instead, it forms a separate cluster, SSG-II. The other two HLD-II members, DbeA and DatA, are in SSG-IV together with DmbC from the HLD-III subfamily; the second representative of HLD-III, DrbA, is in its own specificity group, SSG-III. Consistent with this, the Mantel test did not detect a statistically significant correlation between the enzymes' evolutionary relationships and their substrate specificity profiles (rs=−0.286; P=0.915).
The biochemical characterization of nine HLDs with a set of 30 halogenated substrates, followed by multivariate statistical analysis, phylogenetic inference and structural comparisons, allowed us to investigate the relationships between the structure, function and evolution of this broad-specificity family of enzymes. The analysis of the substrate preferences of individual wild-type HLDs revealed that 1-bromobutane (substrate number 18), 1-iodopropane (substrate number 28), 1-iodobutane (substrate number 29), 1,2-dibromoethane (substrate number 47) and 4-bromobutanenitrile (substrate number 141) are good substrates for all nine enzymes. These ‘universal’ substrates are suitable for screening or biochemical characterization of putative HLDs.
Substrate specificities of individual HLDs
PCA carried out with untransformed data ranked the enzymes according to their absolute activities along PC1. LinB and DbjA possess the highest activities of all the analysed HLDs, and are thus the most suitable family members for mechanistic studies [4,13] and biotechnological applications [4–7]. At the other end of the spectrum, DatA, and especially DmbC and DrbA, have very low specific activities towards most of the tested substrates. In the case of DrbA and DmbC, this may be related to the unique composition of their catalytic pentad, Asp-His-Asp+Asn-Trp, or to their highly oligomeric structures. Their low activity values may also reflect incompatibility with the selected class of substrates. Nonetheless, DrbA, which originates from a marine organism, exhibited good catalytic efficiency and high relative activity towards 1-iodobutane (substrate number 29) (Supplementary Tables S4 and S5); this compound is produced by marine algae, along with other iodinated compounds. The low activity of DatA may be due to its unusual active site, in which a tyrosine residue takes the place of the tryptophan residue located next to the nucleophile. This tryptophan residue is typically involved in stabilizing the leaving halide in HLDs and is highly conserved in other members of the HLD family. To date, DatA is the only characterized HLD to feature this exchange.
The distribution of HLDs along PC2 and PC3 highlighted certain unique functional properties of individual enzymes, such as the ability to convert resistant organic compounds or high activity towards specific substrates. Knowledge of these important catalytic properties is useful when selecting HLDs for use as biocatalysts or biosensing components. Relative to other HLDs, DhlA possesses a uniquely high activity towards 1,2-dichloroethane (substrate number 37). DhlA is naturally produced by 1,2-dichloroethane-degrading micro-organisms, which have already been used successfully in a full-scale groundwater treatment plant. DbjA is the only characterized HLD to exhibit significant activity towards the persistent compound 1,2-dichloropropane (substrate number 67) (Supplementary Table S3). DhaA and DbjA can also convert the highly toxic environmental pollutant 1,2,3-trichloropropane (substrate number 80), albeit at a slow rate (Supplementary Table S3). Enzymes having at least some activity towards a target compound can be further optimized by protein engineering; the catalytic efficiency of DhaA towards 1,2,3-trichloropropane (substrate number 80) has recently been improved by a factor of 26 by means of directed evolution, resulting in an efficient catalyst for biotechnological applications.
Functional and evolutionary classifications of HLDs
To examine the similarities and differences in the substrate specificities of the wild-type and mutant enzymes, the raw data were transformed to suppress the confounding effects of the different absolute activities of individual enzymes. The wild-type HLDs were divided into four SSGs; SSG-I consisted of DbjA, DhaA, DhlA and LinB. It has been suggested in previous studies [26,49,50] that these enzymes belong in different specificity classes; our analysis suggests that the previously observed differences between the members of this group are relatively insignificant if one considers a broader range of enzymes and substrates. The common feature of the SSG-I enzymes is their catalytic robustness. In particular, the SSG-I members exhibited measurable activity towards most of the chlorinated compounds, suggesting that these HLDs can effectively stabilize a chloride leaving group. Kinetic analysis with 1-chlorobutane (substrate number 4) revealed that SSG-I enzymes also exhibit higher turnover numbers than the other tested HLDs (Supplementary Table S4). The highest specific activities observed with this group of enzymes were obtained with brominated ethanes and propanes. This is consistent with earlier studies, which showed that dibrominated compounds with low LUMO (lowest unoccupied molecular orbital) energies are efficiently and rapidly dehalogenated by HLDs [32,51,52].
Three SSG-I members, DbjA, DhaA and LinB, belong to the HLD-II subfamily. Notably, the substrate-specificity profiles of these three enzymes are more similar to that of DhlA from the HLD-I subfamily than to those of the other three HLD-II members in the dataset, namely DmbA (which we classified into SSG-II), and DatA and DbeA (both classified into SSG-IV). The fact that the DbeA and DmbA enzymes were not classified into SSG-I alongside their close evolutionary relatives DbjA and LinB demonstrates that a close evolutionary relationship between two HLDs does not necessarily imply that they will have similar activity and specificity profiles. At the time of writing, only 14 members of the HLD family have been experimentally characterized and shown to be dehalogenation-competent. However, a recent sequence database search and bioinformatics analysis identified more than 200 putative members of this family (E. Chovancova, unpublished work).
These results indicate that it is not possible to predict the substrate specificity of putative HLDs solely on the basis of sequence similarities with experimentally characterized family members. This is in accordance with previous observations that a subtle change in the key active-site residues can lead to modulation, or even a switch, of enzyme substrate specificity [53,54]. Several mutants of LinB carrying a single point substitution at the opening of the access tunnel have been reported to have modified activities towards various halogenated substrates. Similarly, a few mutations in the specificity-determining regions of HLDs have led to changes in substrate specificity during laboratory and natural evolution. We speculate that the incongruence between the phylogenetic and functional classifications of HLDs reflects a certain ‘plasticity’ of these enzymes. This would enable the host organisms to quickly evolve the capacity to convert novel substrates, which is essential for the adaptation of bacteria to various living environments.
Statistical analysis of the merged dataset of wild-type and mutant dehalogenases demonstrated that the developed PCA model can be used to classify characterized members of the HLD family (Supplementary Figure S4). The specificity group of any newly isolated HLD with a determined specificity profile can be predicted using the protocols described in the Experimental section.
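As an illustration of the general idea, and not of the protocol described in the Experimental section, the sketch below fits a PCA to the activity profiles of reference enzymes (rows) across substrates (columns), projects a new enzyme's profile onto the leading components, and assigns it to the group whose centroid in PC space is nearest. The row normalization (each profile divided by its total activity, so that relative specificity rather than overall activity is compared), the two-component cut-off, the nearest-centroid rule and all function names are assumptions, and the activities and group labels are hypothetical.

```python
import numpy as np


def to_profiles(activities):
    """Hypothetical transformation: scale each enzyme's activities to sum to 1,
    so that profiles reflect relative specificity rather than overall activity."""
    x = np.atleast_2d(np.asarray(activities, dtype=float))
    return x / x.sum(axis=1, keepdims=True)


def fit_pca(profiles, n_components=2):
    """Return the column means and the leading principal axes (computed via SVD)."""
    mean = profiles.mean(axis=0)
    _, _, vt = np.linalg.svd(profiles - mean, full_matrices=False)
    return mean, vt[:n_components]


def assign_group(new_activities, ref_activities, ref_groups, n_components=2):
    """Assign a new enzyme to the reference group with the nearest PC-space centroid."""
    ref = to_profiles(ref_activities)
    mean, axes = fit_pca(ref, n_components)
    ref_scores = (ref - mean) @ axes.T
    new_score = ((to_profiles(new_activities) - mean) @ axes.T)[0]
    groups = np.asarray(ref_groups)
    centroids = {g: ref_scores[groups == g].mean(axis=0) for g in np.unique(groups)}
    return min(centroids, key=lambda g: np.linalg.norm(new_score - centroids[g]))


# Hypothetical activities of four reference enzymes towards three substrates.
ref_activities = [[10, 1, 1],
                  [9, 2, 1],
                  [1, 1, 10],
                  [1, 2, 9]]
ref_groups = ["group A", "group A", "group B", "group B"]
print(assign_group([8, 1, 2], ref_activities, ref_groups))  # group A
```

Nearest-centroid assignment is used here only because it is the simplest rule; with real data, the transformation and the number of retained components would need to follow the published protocol.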
Structural determinants of substrate specificity in HLDs
We have previously proposed that the substrate specificity of individual HLDs is influenced by the architecture of their active-site cavities and the anatomy of their access tunnels [11,13,17,51,57,58]. The active-site cavities of DbjA and LinB are the largest of all HLDs whose structures are known, and both enzymes do indeed perform well with larger substrates such as monohalogenated butanes, pentanes, hexanes, cyclopentanes and cyclohexanes. Their large active sites are also consistent with their very broad substrate specificity. The cavities of DhaA and DmbA are smaller and therefore cannot accommodate these larger substrates as readily. The smallest and most occluded active-site cavity is that of DhlA [15,59]; it is optimized for its ‘natural’ substrate, 1,2-dichloroethane (substrate number 37). Notably, this enzyme shows enhanced activity towards other small substrates. The key role of the access tunnels in controlling the specificity of HLDs was strikingly demonstrated in a recent directed-evolution experiment that sought to improve the activity of DhaA towards 1,2,3-trichloropropane (substrate number 80). The resulting mutant, DhaA31, carries five substitutions in its access tunnels that increase the occlusion of its active site and restrict the access of water molecules to it. The exclusion of water in turn stabilizes the activated complex, increasing the activity of the mutant towards halogenated ethanes and propanes. However, the mutant also exhibits decreased activity towards longer haloalkanes such as hexanes, presumably because of steric hindrance between the alkyl chains of these substrates and the large hydrophobic residues introduced into the access tunnels.
Comparative analysis of closely related HLDs provided further insight into the structural determinants of their substrate specificity. DmbA shares 68% sequence identity with LinB, yet the two enzymes were classified into different SSGs. While their catalytic residues are positioned identically [17,18], there are significant differences in the anatomies of their active-site cavities and main access tunnels, which might be responsible for the observed differences in substrate specificity (Figure 5A). DbeA and DbjA share 71% sequence identity and provide a second example of a pair of closely related enzymes with different kinetic properties and substrate specificities (Supplementary Tables S3 and S4). Compared with DbeA, DbjA carries an insertion of nine amino acids between the main and cap domains. A structural comparison of the two enzymes revealed that their active-site cavities and main access tunnels are structurally similar; the main structural difference lies in the conformational behaviour of the His139 residue, which is located close to the insertion (Figure 5B). The His139 residue adopts two alternative conformations in DbjA, but only one conformation in DbeA (T. Prudnikova, P. Rezacova, Z. Prokop, T. Mozga, Y. Sato, M. Kuty, Y. Nagata, J. Damborsky, I. Kuta-Smatanova and R. Chaloupkova, unpublished work). The role of the conformational behaviour of His139 in controlling the enzyme's specificity was probed using the deletion mutant DbjAΔ. In DbjAΔ, His139 adopts only one conformation, resembling that observed in DbeA (Figure 5C). However, DbjAΔ retains the substrate specificity of DbjA, indicating that His139 does not play an essential role in controlling substrate specificity. This conclusion was further supported by experiments with two mutants of DbeA, DbeA1 and DbeA2, which were constructed to mimic the active site and the main access tunnel of DbjA (R. Chaloupkova, T. Mozga, Y. Sato, T. Prudnikova, T. Koudelakova, E. Chovancova, P. Rezacova, Y. Nagata, I. Kuta-Smatanova and J. Damborsky, unpublished work). As was the case with DbjAΔ, their substrate-specificity profiles were more similar to that of their ‘parent’ enzyme, DbeA, than to that of the target protein, DbjA.
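Pairwise identities such as the 68% (DmbA/LinB) and 71% (DbeA/DbjA) values quoted above are computed from a sequence alignment, and the result depends on how gapped positions, such as those created by the nine-residue insertion in DbjA, are counted. The sketch below assumes that the two sequences are already aligned and divides the number of identical residues by the number of columns in which neither sequence has a gap. This is one common convention, not necessarily the one used to obtain the values reported here, and the sequences in the example are hypothetical fragments.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences, ignoring gapped columns.

    Assumed convention: identical residues / columns where both sequences
    have a residue (other conventions divide by the full alignment length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


# Hypothetical aligned fragments; the gap mimics a short insertion in one sequence.
print(f"{percent_identity('MKLV--AE', 'MKIVGHAE'):.1f}")  # 83.3
```

Dividing by the full alignment length instead would give 62.5% for the same pair (5 identical residues out of 8 columns), which illustrates why the convention should be stated alongside any identity value.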
By comparing wild-type and mutant HLDs, we were able to address an intriguing question, namely whether it is possible to interconvert the substrate specificities of two HLDs by modifying their active-site cavities and main access tunnels. Even when the active-site and main-tunnel residues of the mutants were identical to those of the target enzyme, no switch in substrate specificity was detected. All of the mutants exhibited substrate specificities similar to those of their respective ‘parent’ enzymes and were classified into the same SSGs as their ‘parents’ by PCA, indicating that one or more key determinants of HLD substrate specificity lie outside the residues we mutated. Thus the interconversion of substrate specificity remains one of the challenges for the rational design of HLDs. In addition to re-engineering the active site and the main access tunnel, it may also be necessary to modify auxiliary access tunnels or tunnel openings, the distribution of charges on the protein's surface [61,62], protein solvation or protein dynamics.
Tana Koudelakova determined the activities of DhaA and DhlA, carried out the PCA, interpreted the data and contributed to writing the manuscript. Eva Chovancova conducted the phylogenetic analysis, interpreted the data and contributed to writing the manuscript. Jan Brezovsky interpreted the data, conducted computer modelling and contributed to writing the manuscript. Marta Monincova selected the set of substrates for testing, and determined the activities and kinetic constants of DbjA and DmbA. Andrea Fortova determined the activities of LinB. Jiri Jarkovsky designed the transformation of the primary dataset, carried out the Mantel test and interpreted the data. Jiri Damborsky conceived the project, calculated molecular descriptors, interpreted the data and contributed to writing the manuscript.
This work was supported by the Ministry of Education, Youth and Sports of the Czech Republic [grant numbers LC06010, MSM0021622412, MSM0021622413 and CZ.1.05/2.1.00/01.0001 (to T.K., J.J., J.B. and A.F., respectively)]. Support from the Grant Agency of the Czech Academy of Sciences [grant number IAA401630901] (to J.D.) is also gratefully acknowledged.
We thank Tomas Mozga and Radka Chaloupkova (Masaryk University, Brno, Czech Republic), Pavlina Rezacova (Institute of Molecular Genetics of the Academy of Sciences of the Czech Republic, Prague, Czech Republic), Tatyana Prudnikova and Ivana Kuta-Smatanova (Institute of Systems Biology and Ecology of the Academy of Sciences of the Czech Republic, Nove Hrady, Czech Republic) for providing their unpublished data for analysis, and Hana Moskalikova (Enantis Ltd., Brno, Czech Republic) for her help with the measurement of catalytic constants.
Abbreviations: HLD, haloalkane dehalogenase; NJ, neighbour-joining; PC, principal component; PCA, PC analysis; SSG, substrate-specificity group
- © The Authors Journal compilation © 2011 Biochemical Society