Like other forms of engineering, metabolic engineering requires knowledge of the components (the ‘parts list’) of the target system. Lack of such knowledge impairs both rational engineering design and diagnosis of the reasons for failures; it also poses problems for the related field of metabolic reconstruction, which uses a cell's parts list to recreate its metabolic activities in silico. Despite spectacular progress in genome sequencing, the parts lists for most organisms that we seek to manipulate remain highly incomplete, due to the dual problem of ‘unknown’ proteins and ‘orphan’ enzymes. The former are all the proteins deduced from genome sequence that have no known function, and the latter are all the enzymes described in the literature (and often catalogued in the EC database) for which no corresponding gene has been reported. Unknown proteins constitute up to about half of the proteins in prokaryotic genomes, and much more than this in higher plants and animals. Orphan enzymes make up more than a third of the EC database. Attacking the ‘missing parts list’ problem is accordingly one of the great challenges for post-genomic biology, and a tremendous opportunity to discover new facets of life's machinery. Success will require a co-ordinated community-wide attack, sustained over years. In this attack, comparative genomics is probably the single most effective strategy, for it can reliably predict functions for unknown proteins and genes for orphan enzymes. Furthermore, it is cost-efficient and increasingly straightforward to deploy owing to a proliferation of databases and associated tools.
- comparative genomics
- metabolic reconstruction
- orphan enzyme
- pathway hole
- unknown protein
Metabolic engineering, the targeted manipulation of pathways and transporters using recombinant DNA, is a fairly mature technology for micro-organisms and a maturing one for plants [2–4]. However, its potential is often limited by ignorance of the components of the target metabolic network, i.e. the system's ‘parts list’. In many cases, this ignorance extends to the core components of target pathways, such as metabolite transporters in plants [5,6], but even more often it involves not knowing what else besides the core components is ‘out there’ in the system. For example, engineering the lysine biosynthesis pathway in plants uncovered a previously unknown lysine catabolism enzyme as a key component of the lysine network.
Ignorance of parts lists also limits the effectiveness of metabolic reconstructions [8–10] (using the genome sequence to reproduce an organism's complete metabolic network in silico) by giving rise to metabolic gaps or ‘missing network content’. Besides its many applications in metabolic engineering, metabolic reconstruction is being increasingly used to explore the interaction of microbes with their environments [14,15] and to understand pathogen function inside and outside the host [16–18]. But if the parts list of proteins in the genome is highly incomplete, as it often is, an organism's capabilities will inevitably be underestimated. Similarly, in biomedicine, proteomic and transcriptomic screens for disease states have uncovered many unknowns (biomarkers) that correlate with these states [19,20]. But going from there to mechanistic understanding and rational drug therapies demands knowledge of function [21,22].
It is therefore clearly crucial to know organisms' metabolic parts lists, i.e. to assign functions to all of the proteins associated with metabolism. But we are still far from this goal, and the gap between the richness of genomic information and our knowledge of protein function is, in a certain sense, actually growing. Because this ‘unknown’ protein problem has not had coverage commensurate with its importance, the first section of this review documents its scope. Because one of the most powerful ways of attacking the problem, i.e. comparative genomics (taken to mean the integrated analysis of genomes and post-genomic data), is still underutilized by biochemists, the second section outlines the principles whereby comparative genomics can predict functions for unknown proteins. The last section illustrates application of these principles using as examples enzymes that bacteria and eukaryotes have in common.
‘UNKNOWN’ PROTEINS: THE ELEPHANT IN THE ROOM
The scope of the unknown protein problem
The large-scale sequencing of genomes has revealed that 30–40% of the proteins encoded by typical bacterial genomes have no clearly known function [8,23]. Moreover, many of the ‘known’ functions may be uncertain inasmuch as they are unsupported by experimental evidence; even in an organism as well studied as Escherichia coli, there is experimental information for only 54% of the gene products. The prevalence of unknowns is even greater in archaeal and eukaryotic genomes, and is well over 50% in higher plants and animals [8,24–26] (Figure 1A).
Thus, with ~1000 genomes now completed, a conservative average of 3000 genes per genome, and roughly one-third of the encoded proteins lacking a known function, it follows that today's databases contain ~10⁶ unknown proteins. Some of these are organism-specific (so-called ‘ORFans’ [8,27]), but the vast majority belong to unknown orthologue families, of which there are thousands [8,28]. Furthermore, as more genomes are sequenced, more protein families are found (Figure 1B, blue line) and only a minority of them have known or partially known functions (red line). Of course, only a fraction of unknown protein families are associated with metabolism (as enzymes, transporters or regulators). But there is reason to think that it is a significant fraction given (i) the prevalence of gaps in known metabolic networks, (ii) the fact that new metabolic functions continue to be discovered even in well-characterized organisms such as E. coli, and (iii) the many cases where the same pathway step turns out to be mediated by totally different proteins in different organisms (‘non-orthologous displacement’).
The reverse side of the unknown protein problem is that some 36% of the 3736 enzymes with an EC number have no matching protein or gene sequences; these have been termed ‘orphan enzymes’ [9,31–33] and are listed in the ORENZA and ADOMETA databases (Table 1). Since only 60–80% of enzymes have EC numbers, this implies that there are ~1900 orphan enzymes in total. Like unknown protein families, the number of orphan enzymes is growing (Figure 1C).
The dual problem of proteins with no matching function and biochemical functions with no matching protein is thus a huge one. Making these matches presents one of the most urgent challenges of the post-genomic era; it can only be met by community-wide mobilization [8,34,35].
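The back-of-envelope figures in this section can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names are ours, the 35% unknown fraction and 70% EC coverage are simply midpoints of the ranges quoted above, and the extrapolation assumes that enzymes lacking an EC number are orphaned at the same rate as those that have one (an assumption the text does not spell out).

```python
def estimate_unknown_proteins(n_genomes, genes_per_genome, unknown_fraction):
    """Rough count of unknown proteins across all sequenced genomes."""
    return n_genomes * genes_per_genome * unknown_fraction


def estimate_total_orphans(ec_entries, orphan_fraction, ec_coverage):
    """Extrapolate the orphan-enzyme count beyond the EC database.

    Assumption (ours): enzymes without an EC number are orphaned at the
    same rate as enzymes that have one.
    """
    total_enzymes = ec_entries / ec_coverage
    return orphan_fraction * total_enzymes


# Figures quoted in the text; the midpoints (0.35, 0.70) are our choice.
unknowns = estimate_unknown_proteins(1000, 3000, 0.35)
orphans_in_ec = 0.36 * 3736
orphans_total = estimate_total_orphans(3736, 0.36, 0.70)

print(f"unknown proteins:  ~{unknowns:.1e}")
print(f"orphans within EC: ~{orphans_in_ec:.0f}")
print(f"orphans in total:  ~{orphans_total:.0f}")
```

Calling `estimate_total_orphans` with coverages of 0.80 and 0.60 gives a range that brackets the ~1900 figure, which shows how sensitive the estimate is to the assumed coverage of the EC database.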
Ubiquitous unknowns: the top targets
As noted above, most unknown proteins belong to orthologue families that occur in a range of genomes. In some cases, this range is extremely broad, and includes most or even all forms of life from bacteria and archaea to higher plants and mammals [8,28]. There are many such widely distributed (henceforth ‘ubiquitous’) proteins, as shown by the OrthoMCL database of orthologous protein families. For instance, most bacteria share >400 orthologue families with Arabidopsis and humans; of these, about half lack known functions. Ubiquitous proteins are plainly ancient in origin and must have crucial functions in metabolism, transport or core cellular processes such as translation that are shared by all organisms [8,10]. Thus, among all the families of unknown proteins, the ubiquitous ones merit the highest priority for functional characterization because they have the greatest potential payoff in new biological knowledge [8,10]. Fortunately, they are also the best targets for comparative genomics approaches, as we now discuss.
THE PREDICTIVE POWER OF COMPARATIVE GENOMICS
Beyond homology-based predictions
Homology-based approaches to predicting function, from pairwise sequence comparisons to fold-recognition algorithms, obviously only work when at least one of the orthologues in a family has an experimentally verified function. Although long-range homology can sometimes correctly place unknown proteins in a general class (e.g. ‘esterase’), assigning a precise function calls for approaches that go beyond homology. Enter comparative genomics. Broadly defined, comparative genomics is the integration of different types of genomic and post-genomic evidence to link protein with function. It began in the late 1990s just after the first set of genomes was sequenced. Ten years later, the success stories are now plentiful and several reviews have covered both techniques and specific examples (e.g. [40–43]).
The ‘guilt by association’ principle
The basic principle by which comparative genomics predicts functions is ‘guilt by association’: it finds associations between known and unknown genes in sequenced genomes, and deduces probable functions from these associations. A familiar example is the grouping of bacterial genes into operons, in which the genes encode related functions such as steps in a metabolic pathway. In this case, the function of an unknown gene can be inferred from those of known genes in the operon. Many other types of associations besides operonic arrangements can be derived from whole-genome datasets and their attendant post-genomic resources [41,45–47]. These are summarized in Figure 2 and briefly described below, along with relevant databases and tools (which are listed with their URLs in Table 1). The databases listed are primarily for bacteria and plants, reflecting the authors’ expertise.
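The operon case can be sketched as a simple vote: the unknown gene inherits the pathway most often annotated among its operon partners. This is a minimal illustration, not a published algorithm; the gene names, annotations and the `predict_from_operon` helper are all hypothetical.

```python
from collections import Counter


def predict_from_operon(operon, annotations):
    """Guess a pathway for the unannotated genes of an operon.

    Returns (pathway, support), where support is the fraction of annotated
    partners that share the winning pathway, or None if no partner in the
    operon is annotated.
    """
    votes = Counter(annotations[g] for g in operon if g in annotations)
    if not votes:
        return None
    pathway, count = votes.most_common(1)[0]
    return pathway, count / sum(votes.values())


# Hypothetical toy data: gene names and annotations are invented.
annotations = {
    "geneA": "folate synthesis",
    "geneB": "folate synthesis",
    "geneC": "transport",
}
operon = ["geneA", "unknown1", "geneB", "geneC"]

prediction = predict_from_operon(operon, annotations)
print(prediction)
```

A prediction of this kind is only a hypothesis to be tested; its weight increases when the same association recurs in diverse genomes.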
Associations based on genomes
Of the ways in which genes can be associated, gene clustering, i.e. proximity in the genome, is the most generally useful. Although not absent from eukaryotes [48,49], clustering is far more marked in prokaryotes, where functionally related genes not only are arranged in operons, but also can be divergently transcribed from the same promoter region or may simply be neighbours or near-neighbours, even though not co-transcribed [45,50]. On average, ~35% of bacterial metabolic genes are in conserved clusters. Clusters that are conserved across diverse genomes are the most informative [45,50], which is one reason ubiquity is so helpful. Gene clustering can be analysed using the STRING, SEED and MicrobesOnline databases, among others.
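The idea that clusters conserved across diverse genomes are the most informative can be illustrated with a toy counter of conserved neighbours. The sketch below is ours and makes simplifying assumptions: each genome is a linear list of orthologue-family labels (strand and chromosome circularity are ignored), and the family names and gene orders are invented. It is not how STRING, SEED or MicrobesOnline implement their analyses.

```python
from collections import Counter


def neighbour_pairs(gene_order, window=2):
    """Return unordered pairs of families lying within `window` positions."""
    pairs = set()
    for i, family in enumerate(gene_order):
        for other in gene_order[i + 1 : i + 1 + window]:
            if other != family:
                pairs.add(frozenset((family, other)))
    return pairs


def conserved_pairs(genomes, window=2, min_genomes=3):
    """Keep family pairs that are neighbours in at least `min_genomes` genomes."""
    counts = Counter()
    for gene_order in genomes.values():
        # Each genome contributes at most one vote per pair.
        counts.update(neighbour_pairs(gene_order, window))
    return {pair: n for pair, n in counts.items() if n >= min_genomes}


# Hypothetical gene orders; 'unk1' stands for an unknown protein family.
genomes = {
    "genome_A": ["x1", "folB", "unk1", "folK", "folP", "x2"],
    "genome_B": ["folK", "unk1", "folB", "y1", "y2"],
    "genome_C": ["z1", "z2", "folP", "folK", "unk1"],
    "genome_D": ["unk1", "w1", "w2", "w3", "folK"],
}

conserved = conserved_pairs(genomes)
for pair, n in conserved.items():
    print(sorted(pair), "neighbours in", n, "genomes")
```

In a real analysis the genomes counted should be phylogenetically diverse, because neighbours shared only by close relatives may reflect common ancestry rather than functional linkage.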
Phylogenetic occurrence profiles
Another very useful type of association is phylogenetic co-occurrence, whose underlying principles are that enzymes of the same pathway will be either all present in or all absent from a given organism [23,41] and that genes that functionally replace each other will have reciprocal (anticorrelated) distributions. The presence/absence patterns of genes among genomes can often identify candidates for ‘missing’ genes such as those encoding orphan enzymes, or link unknown genes to known pathways. Phylogenetic profiles can be analysed using STRING, PHYDBAC, MBGD, the Signature Genes tool at NMPDR and the Phylogenetic Profiler at JGI. The two latter tools are designed to detect genes whose occurrence is correlated or anticorrelated among user-specified sets of organisms.
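Both principles (co-occurrence and anticorrelation) can be made concrete by treating each gene's distribution as a vector of 1s and 0s over a fixed list of genomes and computing a correlation between vectors. The sketch below uses a plain Pearson correlation, which is one reasonable choice rather than the method of any particular tool named above; the profiles are invented toy data.

```python
from math import sqrt


def profile_correlation(a, b):
    """Pearson correlation between two presence/absence profiles (0/1 lists)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a gene present (or absent) everywhere is uninformative
    return cov / sqrt(var_a * var_b)


def rank_by_profile(query, profiles):
    """Sort genes from most co-occurring to most anticorrelated with `query`."""
    scored = [(name, profile_correlation(query, p)) for name, p in profiles.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical profiles over eight genomes (1 = gene present, 0 = absent).
nadA = [1, 1, 1, 0, 0, 1, 0, 1]
profiles = {
    "nadB": [1, 1, 1, 0, 0, 1, 0, 1],       # same pathway: co-occurs
    "candidate": [0, 0, 0, 1, 1, 0, 1, 0],  # possible functional replacement
    "unrelated": [1, 0, 1, 1, 0, 0, 1, 1],
}

ranked = rank_by_profile(nadA, profiles)
for name, r in ranked:
    print(f"{name:10s} r = {r:+.2f}")
```

Genes at the top of the ranking are candidates for the same pathway; genes at the bottom are candidates for non-orthologous displacement or an alternative route.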
In a gene-fusion event, separate parent gene products are encoded in a single multifunctional polypeptide. Such fusions, which have been called ‘Rosetta stone’ proteins, suggest a high probability of functional interaction between the two proteins, e.g. as enzymes in the same pathway or as components of a protein complex [53,54]. Just as with gene clustering, if the function of one of the fused genes is known and the other is not, the fusion allows strong functional predictions. Prokaryotic gene-fusion events are catalogued in the FusionDB database.
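A ‘Rosetta stone’ search can be sketched as follows: find pairs of protein families that are carried by one polypeptide in some genome but are encoded as separate proteins in another. This is a toy illustration under our own assumptions (each protein is represented as a tuple of family labels, and all names are invented); it does not describe how FusionDB is built.

```python
from itertools import combinations


def rosetta_stone_links(proteomes):
    """Find family pairs fused in one genome but separate in another.

    `proteomes` maps a genome name to a list of proteins, each protein being
    a tuple of the family labels (domains) it contains.
    """
    fused_pairs = {}  # (famX, famY) -> genomes where one protein carries both
    standalone = {}   # genome -> families that occur as single-family proteins
    for genome, proteins in proteomes.items():
        standalone[genome] = {p[0] for p in proteins if len(p) == 1}
        for domains in proteins:
            for pair in combinations(sorted(set(domains)), 2):
                fused_pairs.setdefault(pair, set()).add(genome)

    links = {}
    for pair, fused_in in fused_pairs.items():
        separate_in = {
            genome
            for genome, families in standalone.items()
            if genome not in fused_in and set(pair) <= families
        }
        if separate_in:
            links[pair] = {"fused_in": fused_in, "separate_in": separate_in}
    return links


# Hypothetical proteomes; family labels are invented.
proteomes = {
    "genome_A": [("famX", "famY"), ("famZ",)],
    "genome_B": [("famX",), ("famY",), ("famZ",)],
    "genome_C": [("famX",), ("famW",)],
}

links = rosetta_stone_links(proteomes)
print(links)
```

If famX has a known function and famY does not, the fusion in genome_A suggests that famY in genome_B acts in the same pathway or complex as famX.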
Shared regulatory sites
Genes participating in the same pathway or process are often regulated by a common protein recognizing a specific DNA sequence, or by common riboswitches [55,56]. Finding shared regulatory sites is thus a powerful way to find genes that are functionally linked. Gene regulation databases include SwissRegulon and PRODORIC.
Metabolic reconstruction is both a goal and a method; the quest to reconstruct an organism's full metabolic repertoire in silico itself helps discover and rationalize that repertoire. Thus reconstructing a complete functional pathway from the set of genes in a genome using reference biochemical knowledge, as pioneered by E. Selkov, is of great value in inferring function from various kinds of genomic data because it imposes consistency [57,58]. The completeness of the reconstructed pathway indicates the correctness of initial gene function assignments and establishes which pathway steps are not yet connected to a gene. Metabolic reconstruction is most effective when applied iteratively; problems of wrong functional assignments and missing genes become apparent, and are resolved, in successive cycles. One way to implement metabolic reconstruction is via a ‘subsystems’ approach, in which a metabolic pathway (a ‘subsystem’) is analysed by experts across a large collection of genomes in parallel [60,61]. This approach is particularly helpful in identifying and making sense of pathway variants (e.g. truncated pathways or non-orthologous displacements). Another, more widespread, approach is genome-wide metabolic reconstruction and modelling, which has a wider scope of metabolism coverage but is essentially focused on a single organism. It can nonetheless reveal pathway gaps or inconsistencies that may otherwise be missed [12,58].
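The consistency check at the heart of metabolic reconstruction can be sketched as a search for ‘pathway holes’: steps of a reference pathway that have no gene assigned in a given genome. The example below is a minimal illustration with invented step names, gene identifiers and genomes; real subsystem analyses involve expert curation that this sketch does not attempt to capture.

```python
def pathway_holes(pathway_steps, genome_roles):
    """Return (holes, completeness) for one genome.

    `genome_roles` maps a pathway step to the list of genes assigned to it.
    A step with no assigned gene is reported as a hole.
    """
    holes = [step for step in pathway_steps if not genome_roles.get(step)]
    completeness = (len(pathway_steps) - len(holes)) / len(pathway_steps)
    return holes, completeness


# Hypothetical five-step reference pathway and gene assignments.
pathway_steps = ["step_1", "step_2", "step_3", "step_4", "step_5"]
genomes = {
    "genome_A": {
        "step_1": ["a0101"], "step_2": ["a0102"], "step_3": ["a0103"],
        "step_4": ["a0104"], "step_5": ["a0105"],
    },
    "genome_B": {
        "step_1": ["b0456"], "step_2": ["b0457"],
        "step_4": ["b0460"], "step_5": ["b0461"],
    },
    "genome_C": {"step_2": ["c0900"]},
}

report = {name: pathway_holes(pathway_steps, roles) for name, roles in genomes.items()}
for name, (holes, completeness) in report.items():
    print(f"{name}: {completeness:.0%} complete, holes = {holes}")
```

A genome such as genome_B, with a single hole in an otherwise complete pathway, is a good place to look for a missing gene or a non-orthologous displacement. A genome such as genome_C, with one isolated step, more probably lacks the pathway, in which case the lone assignment deserves re-examination.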
Associations based on post-genomic resources
As well as genomes themselves, various kinds of functional genomic data can yield functional associations between proteins. Although such post-genomic data are often still too noisy to be used as primary sources, they can be very effectively combined with genomics-based data.
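One simple way to combine several noisy lines of evidence is a ‘noisy-OR’ rule, in which each evidence type contributes an independent probability that two genes are functionally linked. The sketch below is only an illustration: the independence of evidence types is a simplifying assumption of ours, the scores are invented, and no scoring scheme of this kind is prescribed in this review.

```python
def combine_evidence(scores):
    """Noisy-OR combination of per-evidence link probabilities.

    Each score is the (assumed independent) probability that one evidence
    type correctly indicates a functional link between two genes.
    """
    p_no_link = 1.0
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must lie between 0 and 1")
        p_no_link *= 1.0 - score
    return 1.0 - p_no_link


# Hypothetical scores for one gene pair.
evidence = {
    "gene clustering": 0.6,
    "co-expression": 0.3,
    "protein-protein interaction": 0.2,
}

combined = combine_evidence(evidence.values())
print(f"combined score: {combined:.3f}")
```

The combined score exceeds any single score, which captures the point that weak post-genomic signals become useful when they agree with genomics-based associations.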
Gene expression profiles
Associations can be derived from co-expression datasets (from microarrays), which are now well developed for model bacteria as well as for plants and animals (e.g. [62–64]). Moreover, the sets of conditions and (for plants and animals) the site or developmental stage in which a gene is expressed can provide vital clues about function. Microarray databases and tools include MicrobesOnline and GenExpDB for bacteria, and ATTED and the Golm Transcriptome Database for Arabidopsis.
At the protein level, protein–protein interaction datasets [from two-hybrid or TAP (tandem affinity purification) tag experiments] have a value analogous to that of microarray datasets. Also, for plants or other eukaryotes, organellar proteome data can sometimes rule in or rule out a possible function, for instance in the case of an enzyme of a pathway whose organellar location is known. Protein–protein interaction databases include DIP, APID and (for E. coli) eNet. Plant proteome databases are PPDB and SUBA II.
Essentiality and other phenotype data
The availability of large-scale bacterial and plant knockout collections, along with databases on knockout phenotypes, can quickly show whether a gene is essential or is associated with a particular phenotype [66–68]. Besides revealing associations directly (e.g. when auxotrophy connects a gene with a biosynthetic pathway), phenotype data, especially essentiality data, pinpoint important genes. Essentiality data for bacteria are integrated into the SEED database; plant phenome databases include RAPID, SeedGenes and Chloroplast2010.
Structural genomics projects have determined the structures of hundreds of proteins of unknown function, many of which are ubiquitous. Although structural genomics is usually unable to assign a specific function to a target protein, three-dimensional structures help, via fold recognition, to establish long-range homology when this is obscured at the sequence level, and thus contribute to general class functional assignments. Furthermore, a structure can be very helpful for comparative genomics because the ligands that the protein is computationally predicted to bind can be compared with possible substrates inferred from, e.g., gene clustering evidence. Protein structures are compiled in the Protein Data Bank. If no structure is available, structure prediction algorithms such as PHYRE and PSIPRED GenTHREADER can be useful substitutes.
The genome deluge
A total of over 1000 prokaryotic and eukaryotic genomes have now been completely sequenced, approx. 4000 more are in the pipeline (Figure 3A), and the pace continues to quicken. This progress is highly favourable for comparative genomics, because a crucial feature of comparative genomics associations is that the number that can be found grows roughly as the square of the number of genomes, as shown schematically in the inset of Figure 3(A). The power of comparative genomics to identify functional associations between genes will thus keep growing rapidly. Moreover, since post-genomic datasets are also expanding rapidly, and analysing multiple types of associations improves predictions [41,44,45], the specificity and robustness of predictions will also keep growing. This means that many functions that are elusive today will become predictable in the foreseeable future.
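The roughly quadratic growth can be illustrated by counting pairwise genome comparisons, n(n − 1)/2, as a proxy for the number of discoverable associations. Using pairwise comparisons as the proxy is our simplification; the genome counts are those quoted above.

```python
def genome_pairs(n_genomes):
    """Number of distinct genome pairs available for comparison."""
    return n_genomes * (n_genomes - 1) // 2


completed = genome_pairs(1000)             # genomes sequenced so far
with_pipeline = genome_pairs(1000 + 4000)  # after the pipeline is finished
growth = with_pipeline / completed

print(f"pairs now:          {completed:,}")
print(f"pairs with pipeline: {with_pipeline:,}")
print(f"fold increase:       {growth:.1f}")
```

A 5-fold increase in genomes thus yields an approximately 25-fold increase in possible pairwise comparisons.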
SYNERGY OF PROKARYOTE–EUKARYOTE INTEGRATIONS
Of the genomes completed so far, approx. 10% come from a diverse set of eukaryotes in which all major groups are represented (Figure 3B); the percentage of eukaryotes among ongoing genomes is similar and their absolute number is almost 4-fold higher (Figure 3C). These eukaryotic genomes, which already collectively encode some 1.8×10⁶ ORFs (open reading frames), can now or soon will be included in comparative genomics analyses. Such inclusion is very valuable because analysing prokaryotic and eukaryotic genomes together yields information that cannot be obtained by looking at either group alone, and many discoveries have now been made this way. This section illustrates the synergy using three historical examples involving metabolic pathways of engineering interest, i.e. folate synthesis, NAD synthesis and leucine degradation, plus a case study showing how much faster an engineering target enzyme can be found with comparative genomics than without it.
Example 1: a missing folate biosynthesis enzyme
The folate biosynthesis pathway is an attractive engineering target in bacteria [72,73] and plants. Although the other pathway genes had been identified, until recently the gene for one enzyme (dihydroneopterin triphosphate pyrophosphatase) was missing in both groups (Figure 4A). This enzyme can be viewed as mediating the committing step in folate biosynthesis since its substrate, dihydroneopterin triphosphate, has three other known fates in various organisms (Figure 4A). Partial purification and characterization of dihydroneopterin triphosphate pyrophosphatase from E. coli had shown that it is a small (17 kDa) protein that requires Mg²⁺ for activity and is optimally active at pH 8.5. Comparative genomics analysis (Figure 4B) revealed a gene (ylgG) encoding a small protein belonging to the Nudix family embedded in a folate synthesis operon in Lactococcus lactis and other bacteria. This made YlgG a prime candidate for the missing enzyme as Nudix family members include nucleoside triphosphate pyrophosphatases (dihydroneopterin triphosphate is structurally analogous to a nucleoside triphosphate) and Nudix enzymes characteristically require a bivalent cation and have an alkaline pH optimum. Experimental tests showed that inactivating ylgG in L. lactis resulted in dihydroneopterin triphosphate accumulation and folate depletion, and that recombinant YlgG had high dihydroneopterin triphosphate pyrophosphatase activity; ylgG was consequently renamed folQ. The equivalent E. coli gene (nudB) was identified 2 years later via a classical strategy involving cloning and characterizing all 13 E. coli Nudix proteins, which demanded notably more effort than the comparative genomics approach. Lastly, having identified the L. lactis enzyme, it was possible to show that its closest homologue in Arabidopsis also had high dihydroneopterin triphosphate pyrophosphatase activity (Figure 4B).
Example 2: the tryptophan to quinolinate route in NAD synthesis
Manipulating levels of NAD and related cofactors, i.e. NAD(P)(H), is a useful tool for metabolic engineering [78,79]. Such engineering requires knowledge of the NAD biosynthesis pathway genes, to which comparative genomics has contributed significantly for the early pathway steps leading to quinolinate, the universal de novo precursor of the pyridine ring of NAD [80,81]. Before the advent of comparative genomics, two different pathways to quinolinate were known: the two-enzyme ‘prokaryotic’ pathway from aspartate and the five-enzyme ‘eukaryotic’ route from tryptophan (Figure 5A). However, in certain bacteria, classical radiotracer studies had demonstrated ¹⁴C incorporation from tryptophan into NAD and some of the ‘eukaryotic’ pathway enzyme activities had been detected, pointing to the existence of an alternative pathway in these organisms. Comparative genomics analysis identified candidates for all five bacterial genes of this pathway, all of which were then validated by complementation and biochemical assays. The most crucial observations leading to identification of the genes for the alternative pathway were the absence from some genomes of genes encoding both enzymes (NadA and NadB) of the ‘prokaryotic’ pathway (Figure 5B) and the presence of various operon-like gene clusters containing homologues of four out of the five ‘eukaryotic’ pathway enzymes (Figure 5C). The one missing enzyme, KFA (N-formylkynurenine formamidase), of the bacterial pathway (which is non-orthologous to eukaryotic KFA) was correctly predicted from its tendency to cluster with the other four (Figure 5C).
Example 3: the leucine-degradation pathway
Leucine degradation yields acetyl-CoA and acetoacetate, which are important intermediates in primary and secondary metabolism, including the synthesis of hydroxymethylglutaryl-CoA and thence isoprenoids and sterols (Figure 6A). The leucine degradation pathway has been well studied in humans and all of the human genes are elucidated and characterized (Figure 6B). In contrast, before comparative genomics work, relatively little was known about this pathway in bacteria, and no bacterial genes had been connected directly to steps after isovaleryl-CoA.
Attempts to identify bacterial genes solely by homology of their products with those of eukaryotic genes produced ambiguous results since most leucine-degradation enzymes belong to large families of paralogues. Such paralogues usually retain a ‘general class’ function (e.g. ‘dehydrogenase’), but differ widely in substrate specificity. However, a comparative genomics approach (outlined in Figure 6B) provided convincing evidence for the presence of the entire pathway of leucine catabolism in a number of diverse bacteria. The first step was identification of a conserved gene cluster containing the bacterial orthologues (genes 2b and 4) of two of the human genes (Figure 6C). This observation enabled upgrading functional predictions for two additional bacterial genes in the same cluster (genes 1 and 2a) from a general class to a specific function. At the time that this analysis was performed, no methylglutaconyl-CoA hydratase gene had been identified in any organism. Another conserved bacterial gene in the cluster (gene 3), a member of the enoyl-CoA hydratase family, was predicted to fulfil this functional role, and this prediction was projected to the orthologous gene in the human genome. The prediction for the human gene has since been verified experimentally [83,84], nicely illustrating ‘two-way comparative genomics traffic’ between prokaryotes and eukaryotes.
Another functional inference concerns the last conserved member of the same cluster (gene 5) (Figure 6C). Its assignment as acetoacetyl-CoA synthetase is supported by homology with other acyl-CoA synthetases and by clustering with the leucine-catabolism pathway where acetoacetate is a final product. The gene cluster in Bacillus halodurans contains two paralogous forms (genes 5 and 5′), whereas each of the very similar clusters in Bacillus anthracis and Bacillus subtilis has either one or the other, suggesting that they are isofunctional. Traditionally, acetoacetyl-CoA synthetase has not been considered to be closely tied to leucine catabolism, but the gene clustering evidence strongly suggests that this is so, at least in some bacteria.
Case study: identifying the plant choline-oxidizing enzyme
The two-step pathway from choline to glycine betaine (Figure 7A) has long been a target for metabolic engineering of resistance to salinity and water deficit in bacteria and plants because glycine betaine is a potent osmoprotectant [85,86]. The genes for the E. coli pathway had been cloned and sequenced by 1991: betA, encoding a membrane-bound FAD-containing choline dehydrogenase, and betB, encoding a soluble NAD-linked betaine aldehyde dehydrogenase. In E. coli, these genes are clustered with betT, specifying a choline transporter, and betI, coding for a transcriptional repressor (Figure 7B). Identical or similar choline oxidation pathways and gene clusters occur in many other bacteria; these clusters can also include betC, coding for choline sulfatase (Figure 7B) or choline transporter genes other than betT (e.g. opuAC).
Investigation of the plant pathway showed that it is plastid-localized, that the second enzyme is a betaine aldehyde dehydrogenase as in bacteria, and that the first is not a dehydrogenase, but a ferredoxin-dependent choline mono-oxygenase [90,91]. The ferredoxin electron donor can be reduced photosynthetically or by ferredoxin–NADP reductase plus NADPH in darkness. The plant choline-oxidizing system thus comprises three proteins: choline mono-oxygenase, ferredoxin and ferredoxin–NADP reductase (Figure 7A). Cloning and characterization of choline mono-oxygenase showed it to be a Rieske-type [2Fe–2S] enzyme [92,93]. Rieske-type oxygenases with reductase and ferredoxin components were already well known in bacteria, but choline mono-oxygenase was the first such case from plants.
After discovery of choline mono-oxygenase activity in 1989, it took 8 years to identify the gene: 6 years to purify the protein and 2 more to clone the cDNA from peptide sequence data. But had it been possible to apply comparative genomics to the search for the plant choline mono-oxygenase gene, it could, as we now explain, have been identified in approx. 2 h using the SEED database and its tools.
The starting point for our retrospective analysis is the sequence of plant betaine aldehyde dehydrogenase, which appeared in 1990. This protein has many strong homologues in bacteria, first among them being betB proteins, whose genes cluster with betA and other bet genes (Figure 7B). However, certain of these homologues are encoded by genes in a different sort of cluster. This second sort of cluster contains a gene for a Rieske-type protein as well as various other genes of choline and glycine betaine metabolism, including betC and dimethylglycine and sarcosine oxidases (Figure 7C). A gene specifying a reductase–ferredoxin fusion protein to service the Rieske-type protein is sometimes present, as is a betA gene (Figure 7C). These clusters strongly implicate the Rieske-type protein in choline metabolism, most probably as a choline oxygenase. (The co-occurrence of the Rieske-type gene with betA is not inconsistent with this inference because these enzymes could be alternatives. They have opposite cofactor requirements: an electron donor for choline mono-oxygenase compared with an electron acceptor for BetA, and choline mono-oxygenase has an oxygen requirement which BetA does not.) When the Rieske-type protein from clusters such as those in Figure 7C is used to search plant genomes, the only BlastP hits are choline mono-oxygenases.
The inference that the bacterial Rieske-type proteins are choline mono-oxygenases awaits experimental validation, but this makes no difference to the chain of reasoning. This chain would be quite strong enough to warrant experimental tests of the plant homologues were their function unknown.
In the present review, we have sought to convince biochemists that the unknown protein problem is vast, and that comparative genomics can help to solve it, especially when prokaryote and eukaryote genomes are analysed together. Although comparative genomics approaches are being adopted by more and more researchers, they remain underutilized. Given how fast the power of these approaches is increasing, and will continue to increase (Figure 3A), this underutilization means the loss of many opportunities and even, as the choline mono-oxygenase case suggests, significant waste of time and effort (“8 years in the lab can save 2 hours at the computer”).
Barriers to the adoption of comparative genomics have been noted briefly elsewhere, but they bear additional comment here. First, there is a perception that the necessary bioinformatic skills are specialist ones. This is not the case; powerful but fairly intuitive websites such as STRING and SEED (Table 1) now bring comparative genomics tools within the reach of any experimentalist after a few hours of instruction. Another perception is that a prerequisite for comparative genomics is high-level literacy in the metabolism, physiology, ecology and systematics of a wide range of prokaryotes and eukaryotes. This barrier is minimal; online databases now make all the necessary background knowledge just a few mouse-clicks away, so it can easily be acquired ‘on the fly’. Lastly, solving the unknown protein problem can seem daunting. But, as noted at the outset, it can be achieved by a sustained community effort. In practice, such an effort will require sharing of unpublished ideas, predictions and observations in a co-ordinated fashion. Although this requires some change of mindset away from the classical ‘single-PI specialist’ model, this is not a utopian dream, because adopting the new mindset requires only enlightened self-interest: researchers rapidly realize that much more progress can be made with it than without it.
Supported by the National Science Foundation [grant number MCB-0839926 to A. D. H.], the National Institutes of Health [grant number AI066244-01 to V.deC.-L.], and by an endowment from the C. V. Griffin Sr. Foundation.
We thank Dr Elizabeth Vierling for discussions that prompted this review, Dr Alex Toker for the invitation to write it, and Dr Ross Overbeek and Dr Andrei Osterman for providing the analyses summarized in Figure 1(B) and for insightful criticism of the manuscript. We also thank Dr Tim Helentjaris for the “8 years in the lab…” quote.
Abbreviations: KFA, N-formylkynurenine formamidase
- © The Authors Journal compilation © 2010 Biochemical Society