RGMs (repulsive guidance molecules) comprise a recently discovered family of GPI (glycosylphosphatidylinositol)-linked cell-membrane-associated proteins found in most vertebrate species. The three proteins, RGMa, RGMb and RGMc, products of distinct single-copy genes that arose early in vertebrate evolution, are ∼40–50% identical to each other in primary amino acid sequence, and share similarities in predicted protein domains and overall structure, as inferred by ab initio molecular modelling; yet the respective proteins appear to undergo distinct biosynthetic and processing steps, whose regulation has not been characterized to date. Each RGM also displays a discrete tissue-specific pattern of gene and protein expression, and each is proposed to have unique biological functions, ranging from axonal guidance during development (RGMa) to regulation of systemic iron metabolism (RGMc). All three RGM proteins appear capable of binding selected BMPs (bone morphogenetic proteins), and interactions with BMPs mediate at least some of the biological effects of RGMc on iron metabolism, but to date no role for BMPs has been defined in the actions of RGMa or RGMb. RGMa and RGMc have been shown to bind to the transmembrane protein neogenin, which acts as a critical receptor to mediate the biological effects of RGMa on repulsive axonal guidance and on neuronal survival, but its role in the actions of RGMc remains to be elucidated. Similarly, the full spectrum of biological functions of the three RGMs has not been completely characterized yet, and will remain an active topic of ongoing investigation.
- axon guidance
- gene evolution
- gene structure
- iron metabolism
- protein modelling
- repulsive guidance molecule (RGM)
The RGM (repulsive guidance molecule) gene family consists of three members, RGMa, RGMb and RGMc [1–6]. Each gene encodes a protein whose expression is restricted to a small number of tissues, and each protein is hypothesized to have distinct biological functions, ranging from control of iron metabolism to regulation of axonal guidance and neuronal survival in the developing nervous system. The family takes its name from the axonal guidance molecule RGMa, a protein found primarily in the developing and adult central nervous system [1–3,7]. A second member, RGMb (or Dragon), is also detected in the nervous system, but in a different expression pattern from that of RGMa [4,8]. The biological actions of RGMb are poorly characterized to date. The third member of the family is RGMc [also called HJV (haemojuvelin), HFE2 (HLA-like protein involved in iron (Fe) homoeostasis) and DL-M (Dragon-like muscle)]. Unlike RGMa or RGMb, RGMc is not expressed in the nervous system, but is instead produced by striated muscle and the liver [3,5,8,9]. Surprisingly, RGMc regulates iron metabolism: inactivating mutations cause juvenile haemochromatosis, a severe systemic iron overload disorder in humans. To date, there has been no comprehensive assessment of the most fundamental aspects of the biology of the RGM family, including regulation of gene expression, control of protein biosynthesis, the relationship of protein structure to function, or the mechanisms of action of each RGM protein. In the present review we address the molecular biology and biochemistry of the RGM family, attempt to define and critically evaluate what is known, and identify new areas for future investigation.
RGMa
Chromosomal organization and gene structure
RGMa has been identified as a single-copy gene in ten mammalian and eight non-mammalian vertebrates (Table 1). A single RGM gene also has been described in several invertebrate species, including urochordates, echinoderms, molluscs and nematodes, as discussed in the section on molecular evolution below. In vertebrates, RGMa is one of six conserved genes in a syntenic locus, as shown by comparing the corresponding regions of the human, mouse and chicken genomes (Figure 1). In these three species, RGMa is transcribed in the opposite orientation from the other nearby genes. The locus is also conserved in zebrafish (Figure 1). Within the cluster of six conserved genes near RGMa in human, mouse and chicken, Mctp2 (multiple C2 domains, transmembrane 2) lies 5′ to RGMa, and Chd2 (chromodomain helicase DNA-binding protein 2), St8sia2 (ST8 α-N-acetyl-neuraminide α-2,8-sialyltransferase 2) and Slco3a1 (solute carrier organic anion transporter family member 3A1) lie 3′. The latter three genes are also positioned downstream of RGMa in the zebrafish genome, but the upstream gene Mctp2 is absent (Figure 1). In addition, in all four species, Nr2f2 (nuclear receptor subfamily 2, group F, member 2) is located upstream of RGMa, although its relative orientation and its distance from RGMa vary among species (∼2 Mb in the human and mouse genomes and ∼830 kb in zebrafish, where its transcriptional direction is reversed) (Figure 1).
Human and mouse RGMa genes are of comparable size (∼46 and ∼44 kb respectively) and have a similar organization, being composed of four exons separated by three variably sized introns, although the precise 5′ end of exon 1 has not been defined in either species (Figure 2 and Table 2). In both genes, exon 1 is non-coding and contains most of the 5′ UTR (untranslated region) of RGMa mRNA. Exon 2 contains the remaining 35 nucleotides of the 5′ UTR and the first 26 codons of the RGMa protein, whereas exon 3 encodes the next 72 codons (73 in mouse), and exon 4 the remaining 328 codons (321 in mouse), plus a 3′ UTR of ∼1800 nucleotides and a single polyadenylation signal (Figure 2). The four exons are well conserved between human and mouse RGMa, with nucleotide identity ranging from 64% for exon 1 to 99% for exon 2 (calculated using data in [12–15]). The three introns are less conserved than the exons (<30% compared with ∼60% identity respectively), although their lengths are similar in the two species (Figure 2). Although four exons have been identified in the zebrafish RGMa gene, the nucleotide sequence of exon 1 is not similar to its mammalian counterparts [14–16]. In the chicken, the 5′ end of the largest RGMa cDNA could not be mapped to the RGMa locus, possibly because the genomic sequence is incomplete in this region, and its DNA sequence also differs markedly from that of other species. Thus only three exons have been identified definitively in chicken RGMa, corresponding to mammalian exons 2–4 (Figure 2).
RGMa was initially cloned from mRNA isolated from chick embryonic optic tectum. Subsequently, RGMa transcripts were shown to be expressed at the highest levels in both the adult and developing central nervous system of chicken, mouse and zebrafish [1–4,7,18]. RGMa mRNA also has been detected at lower levels in peripheral tissues, including heart, lung, liver, skin, kidney and testis, at least in the adult rat. By Northern blotting, the major RGMa transcript in the mouse is ∼3.6 kb in length, which is consistent with the aggregate size of the four RGMa exons [13,20]. Other minor transcripts have been seen by Northern blotting, but their relationship to the RGMa gene has not been established [19,21].
In the developing mouse embryo, RGMa mRNA has been detected as early as embryonic day (E) 8.5 in the neural folds of the central nervous system. Later in development, RGMa transcripts are found in several brain regions, including the hippocampus, the midbrain, the ventricular zone of the cortex, and parts of the brainstem and spinal cord [1,8,21]. Similar observations have been reported in the developing chicken [2,7] and zebrafish. The biochemical processes responsible for these distinct patterns of RGMa gene expression in the central nervous system have not been elucidated, in large part because almost nothing is known about the organization or function of the RGMa gene promoter, about the mechanisms regulating RGMa gene transcription, or about RGMa mRNA turnover. Similarly, the signalling pathways that govern RGMa gene expression in different tissues and in response to physiological and pathological stimuli have not been characterized.
Protein sequence and expression
The initial cloning of chick RGMa cDNA revealed a cell-membrane-associated, GPI (glycosylphosphatidylinositol)-linked, two-chain protein derived from a primary translation product of 432 amino acids. Subsequent cloning of human and mouse RGMa cDNAs predicted similarly sized proteins of 434 and 438 residues respectively, which are 91% identical to each other and 80% identical to chick RGMa (Table 3). In all three species and in zebrafish, the N-terminal signal peptide of RGMa is estimated to be ∼30 residues, although the first amino acid of the mature protein has not been determined experimentally. The RGMa precursor also contains a conserved C-terminal GPI-attachment signal of ∼45 amino acids. This segment is removed in the endoplasmic reticulum during RGMa biosynthesis, when the GPI anchor is added to the nascent protein [2,22]. Other recognizable protein elements in RGMa include an RGD motif (arginine-glycine-aspartic acid; a potential integrin-binding site [2,23]) and a partial vWD (von Willebrand type-D) domain [2,24] that contains the site of internal cleavage that generates two-chain RGMa (Figure 3) (these domains and other aspects of the biochemistry of RGM proteins are discussed in the section on structure–function relationships below). The mechanism of intramolecular cleavage of RGMa has not been established, although cleavage appears to occur during biosynthesis. Mature RGMa is thus a disulfide-bonded two-chain protein, composed of an N-terminal fragment of ∼123 residues and a C-terminal segment of ∼238 residues [2,25], that is linked to the outer face of the plasma membrane by its C-terminal GPI anchor [2,26,27] (Figure 3B). The number and pattern of disulfide bonds formed by the 14 cysteines of mature RGMa have not yet been established (a molecular model is discussed in the section on structure–function relationships below).
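Percent identity values such as those in Table 3 depend on how an alignment is scored. As a minimal sketch, assuming a pairwise alignment has already been produced (gaps written as '-'), one common convention counts identical residues over the columns in which neither sequence has a gap; other conventions divide by the full alignment length or by the length of the shorter sequence, so values from different studies are not always directly comparable. The function name and the toy sequences below are hypothetical and are not RGM sequences.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Convention assumed here: identical residues divided by the number of
    columns in which neither sequence carries a gap.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # keep only columns where both sequences have a residue
    paired = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not paired:
        return 0.0
    identical = sum(x == y for x, y in paired)
    return 100.0 * identical / len(paired)


# Toy alignment (hypothetical): 7 identical residues in 8 gap-free columns
print(percent_identity("MKT-RGDAN", "MKTQRGEAN"))  # 87.5
```

Dividing by the full alignment length (nine columns) instead would give about 77.8% for the same pair, which illustrates why the scoring convention should be stated alongside identity values.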
RGMa also appears to be a glycoprotein, with three potential asparagine-linked glycosylation sites in mammals and two in the chicken (Figure 3A) [2,26]. At present it is not known if other RGMa isoforms exist, such as single-chain species, or whether soluble forms of the protein are found in the extracellular fluid.
Physiological functions and mechanisms of action
RGMa was identified as a factor that guides axons from the temporal half of the developing chicken retina toward the anterior optic tectum by repulsion, and membranes derived from cells expressing chick RGMa were shown to inhibit temporal retinal growth cones while having little effect on nasal growth cones. Perhaps surprisingly, given these initial observations, genetic knockout of RGMa in mice did not alter retinal axonal patterning, but instead caused defects in neural tube closure. Thus the exact in vivo functions of RGMa in mammals remain to be determined.
RGMa has been shown to regulate repulsive guidance of retinal axons by binding to neogenin [7,28], a transmembrane protein that is also a receptor for netrins, a family of secreted molecules involved in neuronal development and cell survival (reviewed in ). Unlike netrins, RGMa does not bind to proteins related to neogenin, such as DCC (deleted in colorectal cancer) or members of the Unc (unco-ordinated) subfamily, although recent observations suggest an indirect association with Unc5b. In addition to regulating retinal axonal guidance, the interaction between RGMa and neogenin has been found to promote neuronal survival. Initial studies of the early events triggered after RGMa binds to neogenin have implicated several signal transduction intermediates, including protein kinase C, the small GTPase RhoA, RhoA kinase [27,30] and focal adhesion kinase [31,32], as well as the putative transcriptional co-activator LIM-only protein 4, but the full spectrum of biochemical mechanisms through which neogenin mediates the biological effects of RGMa has not been established.
Similar to other members of the RGM family, RGMa also binds to selected BMPs (bone morphogenetic proteins) [19,34], which belong to the TGF (transforming growth factor)-β family of growth factors. In initial biochemical studies, a fusion protein composed of human RGMa linked to the IgG Fc fragment was shown in cross-linking experiments to bind radiolabelled BMP-2 and BMP-4, but not BMP-7 or TGF-β1. In cell-based studies, overexpression of RGMa increased the activity of a co-transfected promoter–reporter gene containing a BRE (BMP-response element), whereas knockdown of endogenous RGMa reduced reporter gene expression. Although these preliminary observations are intriguing, a role for BMPs in the biological actions of RGMa has not been defined.
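BRE reporter experiments of the kind described above are usually summarized as a fold change relative to a control transfection. A minimal sketch follows, assuming a dual-luciferase format in which the BMP-responsive firefly signal is normalized to a co-transfected Renilla control in each well; the text does not state which reporter or normalization was used, and all function names and readings below are hypothetical, not data from the cited studies.

```python
from statistics import mean


def normalized_signal(firefly, renilla):
    """Per-well reporter activity: firefly (BRE-driven) / Renilla (control)."""
    if len(firefly) != len(renilla):
        raise ValueError("one control reading is needed per well")
    return [f / r for f, r in zip(firefly, renilla)]


def fold_change(test_wells, control_wells):
    """Mean normalized activity of test wells relative to control wells."""
    return mean(normalized_signal(*test_wells)) / mean(normalized_signal(*control_wells))


# Hypothetical triplicate readings: (firefly, renilla)
control = ([1000, 1200, 1100], [100, 120, 110])
rgma_overexpression = ([3000, 3300, 2700], [100, 110, 90])
rgma_knockdown = ([500, 600, 550], [100, 120, 110])

print(fold_change(rgma_overexpression, control))  # 3.0 (increase)
print(fold_change(rgma_knockdown, control))       # 0.5 (reduction)
```

Normalizing each well before averaging corrects for differences in transfection efficiency between wells; a real analysis would also test replicate-level values statistically rather than rely on a ratio of means alone.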
RGMb
Chromosomal organization and gene structure
RGMb is a single-copy gene in the eight mammalian and seven non-mammalian vertebrates in which it has been identified (Table 1). Similar to RGMa, RGMb resides within a conserved chromosomal locus, and comprises one of five linked genes that are found in the same relative orientation to each other in the human, mouse and chicken genomes (Figure 4). In each of these species, RGMb is located in a tail-to-tail transcriptional orientation with Chd1, in a relationship similar to that of RGMa and Chd2 (compare Figures 1 and 4). This suggests that a duplication event involving this chromosomal region occurred during evolution prior to the emergence of mammals. Further away and upstream of RGMb are Riok2 (right open reading frame kinase 2), Lix1 (Limb expression 1) and Lnpep (leucyl/cystinyl aminopeptidase) (Figure 4). In contrast, to date very little is known about the chromosomal environment of RGMb in the zebrafish genome (Figure 4).
The human RGMb gene is ∼25 kb in length and contains five exons (Figure 5 and Table 2), including two 5′ non-coding exons (1 and 2) that encode ∼406 nucleotides of the ∼524-nucleotide 5′ UTR of RGMb mRNA. The 5′ end of exon 1 has not been mapped. The remaining 118 nucleotides of the 5′ UTR are found in exon 3, which also includes the first 45 codons of the coding region. Exon 4 encodes the next 170 codons, and exon 5 the remaining 222 codons plus a 3′ UTR of 308 nucleotides that includes a single polyadenylation signal (Figure 5). In the mouse genome, only three RGMb exons have been identified to date, and these correspond to exons 3–5 of the human RGMb gene (Figure 5). The 3′ UTR of mouse RGMb mRNA, encoded by the last of these exons (mouse exon 3), is much longer than its human counterpart, being ∼2.5 kb in length. In zebrafish, only the coding region of RGMb has been mapped to the genome, where it is found within three distinct exons (Figure 5).
RGMb was discovered by an informatics-based search for genes related to RGMa, and was independently cloned as a gene whose putative promoter was bound by the homeodomain transcription factor DRG11, which is expressed in sensory neurons of the DRG (dorsal root ganglia) [4,36,37]. RGMb (DRG-‘ON’ or Dragon) was co-localized with DRG11 mRNA in dorsal root ganglia and in the spinal cord. RGMb mRNA also was detected in the developing neural tube before the onset of DRG11 expression, and has been found in other areas of the nervous system where DRG11 is not produced. This latter result suggests that RGMb gene expression is controlled by regulatory factors in addition to DRG11. In situ hybridization experiments have shown that RGMb mRNA is expressed in the DRG, in the spinal cord excluding the ventricular zone, in the retina and optic nerve, and in other distinct regions of the brain, including the developing mouse midbrain, hindbrain and forebrain [1,4,8,38], although the pattern of RGMb gene expression does not overlap appreciably with that of RGMa. RGMb mRNA also has been detected in the nervous system of the developing zebrafish and in the reproductive tract of rodents. Northern blotting studies indicate a single RGMb transcript of ∼4.2 kb in mice [1,4], which is approximately the combined size of the three mouse RGMb exons (Table 2). As with RGMa, the mechanisms controlling RGMb gene expression in different tissues or under different physiological or pathological conditions have not been characterized, and virtually nothing is known about the structure or function of the RGMb gene promoter.
Protein sequence and expression
Cloning of mouse RGMb cDNA revealed a predicted protein of 438 amino acids [1,4], which is 89% identical to human RGMb (437 amino acids) and 65% identical to zebrafish RGMb (436 amino acids) (Table 3). The primary RGMb translation product is predicted to contain an N-terminal signal peptide of ∼50 residues, although this has not been verified experimentally, and a C-terminal GPI-attachment signal of ∼35 amino acids [1,4]. Other identifiable motifs in RGMb include a partial vWD element. After forced expression of mouse RGMb in HEK-293 and COS-7 cells, only a single protein band of ∼50 kDa was detected in cell extracts by immunoblotting, and a similarly sized protein was released into the culture medium after incubation of cells with PI-PLC (phosphoinositide-specific phospholipase C), which cleaves the GPI anchor [1,4]. These results indicate that only a single-chain RGMb species is attached to the outer face of the cell membrane [4,40] (Figure 3B), even though the protein contains a putative internal proteolytic cleavage site similar to that in RGMa. RGMb also appears to be a glycoprotein, with up to two predicted asparagine-linked glycosylation sites (Figure 3A). As with RGMa, mature RGMb contains 14 cysteines whose organization into disulfide bonds has not been established (but see the discussion of potential molecular models in the section on structure–function relationships below).
Potential physiological functions
No biological functions of RGMb have been elucidated, except for its possible ability to promote cell–cell adhesion by homophilic interactions [1,4], and its capability to bind selected BMPs [40,41]. As with RGMa, overexpressed full-length RGMb has been found to increase the activity of a promoter–reporter gene containing a BMP-responsive transcriptional control element in cell culture systems [39,40], but unlike RGMa, RGMb has not been shown to bind to neogenin.
RGMc
Chromosomal organization and gene structure
RGMc is a single-copy gene in the nine mammalian and six non-mammalian vertebrates in which it has been identified (Table 1). Unlike RGMa and RGMb, RGMc has not been found to date in the chicken or other avian species. In the human and mouse genomes, RGMc is one of ten linked genes in a syntenic locus that includes, among others, Txnip (thioredoxin interacting protein), Polr3gl [polymerase (RNA) III (DNA directed) polypeptide G-like], Ankrd34 (ankyrin repeat domain 34), Lix1l (related to Lix1, which maps near RGMb) and Chd1l [related to Chd1 and Chd2, which are located near RGMb and RGMa respectively (compare Figures 1, 4 and 6)]. Of note, however, the relative transcriptional orientation of RGMc and Chd1l (tail-to-head) differs from that of RGMa–Chd2 and RGMb–Chd1 (tail-to-tail). Moreover, the chromosomal environment of RGMc in zebrafish differs from that in mammals (Figure 6). Although two Txnip-like genes and Polr3gl lie adjacent to zebrafish RGMc, as in mammals, Mtx1 and Thbs3a are located just upstream of zebrafish RGMc, whereas their mouse counterparts lie more than 8 Mb from mouse RGMc. Furthermore, no Chd homologue is present at the zebrafish RGMc locus.
Human and mouse RGMc genes are similar in size (∼4.3 and ∼4.0 kb respectively; Table 2) and organization, each being composed of four exons separated by three introns (Figure 7), and are considerably smaller than mammalian RGMa or RGMb genes (Table 2). In both species, exon 1 is ∼160 nucleotides long (although its 5′ end has not been identified) and contains most of the 5′ UTR of RGMc mRNA. The remaining 90 nucleotides of the 5′ UTR are found in exon 2, along with the first 31 codons of the RGMc protein (28 in mouse). Exon 3 encodes the next 173 codons (169 in mouse), and exon 4 the remaining 222 codons (223 in mouse), plus a 3′ UTR of ∼1150 nucleotides with a single polyadenylation signal (Figure 7). The four RGMc exons are well conserved between the mouse and human genes, with nucleotide sequence identity ranging from 73 to 83% (calculated using data in [12–15]). The three introns are less conserved, although their lengths are similar in mouse and human (Figure 7). The zebrafish RGMc gene is larger than its mammalian counterparts and contains five exons distributed over ∼11.4 kb (Figure 7). Exons 1 and 2 are non-coding, but are not similar in DNA sequence to mammalian RGMc exon 1. In contrast, zebrafish exons 3–5 correspond to mammalian RGMc exons 2–4, with nucleotide sequence identity ranging from 50 to 59%.
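The codon counts above can be cross-checked against the RGMc precursor lengths given later in this section (426 residues in human and 420 in mouse), and cumulative sums show which exon encodes a given residue. The sketch below assumes that the per-exon counts exclude the stop codon, and it ignores intron phase (a codon split across a splice junction is assigned wholly to one exon); the dictionary layout and function names are illustrative only.

```python
from itertools import accumulate

# Codons encoded by each coding exon of RGMc, as stated in the text
RGMC_CODING_EXONS = {
    "human": {"exon 2": 31, "exon 3": 173, "exon 4": 222},
    "mouse": {"exon 2": 28, "exon 3": 169, "exon 4": 223},
}


def exon_ranges(codons_per_exon: dict) -> dict:
    """Map each exon to the (first, last) codon numbers it encodes (1-based)."""
    ends = list(accumulate(codons_per_exon.values()))
    starts = [1] + [end + 1 for end in ends[:-1]]
    return dict(zip(codons_per_exon, zip(starts, ends)))


def exon_for_residue(codons_per_exon: dict, residue: int) -> str:
    """Name of the exon whose codon range contains the given residue number."""
    for exon, (first, last) in exon_ranges(codons_per_exon).items():
        if first <= residue <= last:
            return exon
    raise ValueError(f"residue {residue} lies outside the coding region")


for species, exons in RGMC_CODING_EXONS.items():
    print(species, sum(exons.values()), exon_ranges(exons))
# human 426 {'exon 2': (1, 31), 'exon 3': (32, 204), 'exon 4': (205, 426)}
# mouse 420 {'exon 2': (1, 28), 'exon 3': (29, 197), 'exon 4': (198, 420)}
```

Under these assumptions, and numbering residues from the initiator methionine of the human precursor, the juvenile haemochromatosis substitutions D172E and G320V discussed later fall in exon 3 and exon 4 respectively (`exon_for_residue(RGMC_CODING_EXONS["human"], 172)` returns `'exon 3'`).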
RGMc was discovered independently as a gene within a locus linked to the human iron overload disorder juvenile haemochromatosis, as an mRNA related to RGMa and RGMb [1,3,4,8], and as a novel transcript expressed during skeletal muscle differentiation. In addition to skeletal muscle, RGMc mRNA has been detected in the heart and liver [1,5,8]. During mouse development, RGMc transcripts are found first in the somites, the precursors of skeletal muscle, as early as E11.5, before muscle can be identified morphologically. Similar observations have been made in zebrafish [4,16]. In the mouse, RGMc mRNA is detected in the heart and liver by E13.5 [5,42].
Very little is known about the regulation of the RGMc gene. In mice, RGMc mRNA levels were shown to increase in the liver, but not in skeletal or cardiac muscle, after systemic injection of bacterial lipopolysaccharide. As with RGMa and RGMb, however, the biochemical mechanisms controlling RGMc gene transcription or mRNA stability in different tissues or under different physiological or pathological conditions have not been established, and virtually nothing is known about the structure or function of the RGMc gene promoter.
Protein sequence, processing and expression
The initial cloning of human and mouse RGMc cDNAs revealed primary translation products of 426 and 420 amino acids respectively, with a predicted N-terminal signal peptide of ∼31 residues and a C-terminal GPI-attachment signal of ∼45 amino acids [1,3,9], although, as for the other RGM molecules, the precise boundaries have not been determined experimentally. Mouse and human RGMc precursor proteins are 88% identical to each other (Table 3). Like RGMa, RGMc contains up to three asparagine-linked glycosylation sites, and it shares several protein motifs with its paralogues, including an RGD sequence and a partial vWD domain with a conserved proteolytic cleavage site (Figure 3A). In addition, and unlike RGMa or RGMb, mammalian RGMc proteins contain a furin-like PPC (pro-protein convertase) recognition and cleavage sequence near the C-terminus (Figure 3A), and the protein has been shown to be cleaved by furin at this site [43–45]. As a consequence, RGMc appears to undergo a complex series of biosynthetic and processing steps, leading to the production of four distinct protein isoforms in skeletal muscle and after expression of the recombinant protein in heterologous mammalian cells [9,43,45,46]. Two of these isoforms, a disulfide-bonded two-chain species similar to RGMa and a single-chain isoform similar to RGMb, are attached to the extracellular face of the plasma membrane by a GPI linkage [9,43,45,47] (Figure 3B). In addition, single-chain RGMc species have been detected in the extracellular fluid of cultured cells and in blood [9,43–48] (Figure 3B). These two soluble proteins differ at their C-termini, the smaller being derived from the larger by PPC-mediated proteolytic cleavage [9,43,45]. Results of biosynthesis experiments also support the idea that both soluble single-chain RGMc proteins originate from the single-chain cell-associated molecule [9,43]. Analogous studies have not been reported for RGMa or RGMb.
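The sequence features named above (an RGD tripeptide, asparagine-linked glycosylation sites and a furin-like PPC site) are short linear motifs that can be located with regular expressions. A minimal sketch, assuming the consensus N-X-S/T (X not proline) for N-glycosylation sites and R-X-(K/R)-R for a canonical furin site; the pattern set, function name and toy sequence are hypothetical, and a match marks only a candidate site, since glycosylation and cleavage must be confirmed experimentally.

```python
import re

# Consensus patterns written as Python regular expressions (assumed definitions)
MOTIFS = {
    "RGD": r"RGD",                    # potential integrin-binding tripeptide
    "N-glycosylation": r"N[^P][ST]",  # N-X-S/T, X not proline
    "furin/PPC": r"R.[KR]R",          # canonical R-X-(K/R)-R furin site
}


def find_motifs(seq: str) -> dict:
    """Return 1-based start positions of every (possibly overlapping) match."""
    seq = seq.upper()
    # the zero-width lookahead lets overlapping motifs all be reported
    return {
        name: [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]
        for name, pattern in MOTIFS.items()
    }


toy = "MKTRGDANVSLRRKRAGNPSQ"  # hypothetical sequence, not an RGM protein
print(find_motifs(toy))
# {'RGD': [4], 'N-glycosylation': [8], 'furin/PPC': [12]}
```

The N-P-S stretch near the end of the toy sequence is deliberately not reported, because a proline in the middle position is excluded by the N-glycosylation consensus.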
As in RGMa and RGMb, the disulfide bonding pattern of the 14 cysteines found in mature full-length RGMc has not been experimentally defined, but a possible model is discussed below.
Physiological functions and mechanisms of action
A role for RGMc in systemic iron metabolism was first inferred when mutations in the human gene were linked to the severe iron overload disorder juvenile haemochromatosis. This relationship was strengthened when mice engineered to lack RGMc were found to accumulate excess iron in multiple tissues [42,49]. It has been postulated that RGMc normally acts to induce expression of the secreted hepatic peptide hepcidin [6,42], which negatively regulates both the uptake of dietary iron from the duodenum and the release of stored iron from macrophages [6,50]. Humans with juvenile haemochromatosis and mice with RGMc deficiency have low levels of serum or urinary hepcidin [51,52], and mice lacking RGMc also have diminished expression of hepcidin mRNA in the liver [42,49]. The mechanism by which RGMc regulates hepcidin is under active investigation; the leading hypothesis is that cell-membrane-associated RGMc facilitates signalling by BMPs through their receptors to promote hepcidin gene expression [41,53–55]. In this model, soluble RGMc has been proposed to act as an inhibitor, presumably by sequestering BMPs away from cell-surface receptors [45,48].
Similar to RGMa, RGMc binds to the extracellular portion of neogenin [46,47,56], although the role of neogenin in the biological actions of RGMc has not been established. One report has demonstrated preferential binding of two-chain RGMc to neogenin, and mouse versions of two juvenile haemochromatosis-associated RGMc substitution mutants, D172E and G320V, which did not form a two-chain species [9,46], were unable to bind neogenin. Similar results were observed with the human G320V juvenile haemochromatosis-associated protein [9,43,45,47]. In other experiments, neogenin was unable to alter BMP-mediated hepcidin gene expression, although it is unclear which RGMc protein isoforms were used in these studies. Further studies will be needed to elucidate the biochemical mechanisms by which RGMc regulates systemic iron metabolism under different physiological conditions, to determine whether neogenin has a role in the biological actions of RGMc, and to characterize the functions of the different RGMc species in normal physiology and in disease.
MOLECULAR EVOLUTION OF THE RGM FAMILY
One unresolved question about the RGM family concerns the evolutionary relationships among its three members. To address this issue, we performed a series of phylogenetic analyses based on multiple sequence alignments of selected RGM proteins, chosen according to two criteria: (i) only well-annotated sequences were used, in which the protein translated from the mRNA and from the genomic sequence is identical, and (ii) ‘mammalian bias’ was minimized by selecting RGM genes from a diversity of organisms. Three out of four analyses supported the hypothesis that RGMc diverged from a common ancestor earlier than RGMa or RGMb did (see the legend to Figure 8 for a summary of methods). Two of the phylogenetic trees are presented in Figure 8. Similar conclusions were reached by Schmidtmer and Engelkamp, whereas Camus and Lambert have advocated the alternative view that RGMa and RGMc are more closely related to one another.
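The methods actually used for Figure 8 are summarized in its legend and are not reproduced here. Purely to illustrate how a distance-based tree encodes the question of which paralogue diverged first, the sketch below applies UPGMA (average-linkage clustering) to a hypothetical distance matrix. UPGMA assumes a constant rate of evolution, which real RGM sequences need not obey, and the distances are invented placeholders, not values derived from RGM alignments.

```python
from itertools import combinations


def upgma(labels, distances):
    """Build a rooted Newick tree by UPGMA.

    distances maps frozenset({a, b}) -> distance between leaves a and b.
    """
    clusters = {label: 1 for label in labels}  # cluster name -> number of leaves
    d = dict(distances)
    while len(clusters) > 1:
        # merge the closest pair of current clusters
        a, b = min(combinations(sorted(clusters), 2), key=lambda p: d[frozenset(p)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = f"({a},{b})"
        # distance to the new cluster = size-weighted average of the old distances
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters)) + ";"


# Hypothetical distances (fraction of differing sites), for illustration only
dist = {
    frozenset(("RGMa", "RGMb")): 0.50,
    frozenset(("RGMa", "RGMc")): 0.55,
    frozenset(("RGMb", "RGMc")): 0.58,
}
print(upgma(["RGMa", "RGMb", "RGMc"], dist))  # ((RGMa,RGMb),RGMc);
```

In a rooted tree of this shape RGMc branches off first, the topology favoured by three of the four analyses described above; if the RGMa–RGMc distance were instead the smallest, the same code would group RGMa with RGMc, the arrangement proposed by Camus and Lambert.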
Inspection of RGM genomic loci strengthens the view that RGMa and RGMb are more closely related to each other than to RGMc. RGMa and RGMb are physically linked to Chd2 and Chd1 respectively in mammalian, chicken and zebrafish genomes (Figures 1 and 4), and each is part of a more extensive syntenic linkage group that includes, in order (at least in the human genome), RGMA–CHD2–ST8SIA2–SLCO3A1 and RGMB–CHD1–ST8SIA4–SLCO4C1, indicating that the organization of paralogous genes within the duplicated chromosomal regions has been maintained (Figures 1 and 4). In contrast, only a Chd1-related gene (Chd1l) is found in the same chromosomal locus as RGMc in mammals, and it lies much farther from RGMc than Chd2 or Chd1 lie from RGMa or RGMb respectively (compare Figures 1, 4 and 6). Also, in mammals, the Lix1-related gene Lix1l is found near RGMc, but in a different relationship from that of Lix1 and RGMb (compare Figures 4 and 6).
Single RGM genes have been identified in several invertebrates. The evidence for an RGM protein is strongest in the sea squirt Ciona intestinalis, where a polyadenylated mRNA has been characterized that corresponds to the four-exon genomic DNA sequence (NCBI accession number AK173741) and encodes a predicted protein of 637 amino acids (calculated using Transeq), with multiple cysteine residues (15 in the putative mature protein compared with 14 in vertebrate RGMs) and overall similarity of 40%, 38% and 27% to mouse RGMa, RGMb and RGMc respectively. Like RGMb, Ciona RGM contains no RGD motif, but instead has an RGN sequence [15,57]. Like mammalian RGMc, Ciona RGM has a predicted PPC site near its C-terminus. To date, however, this putative protein has not been characterized.
An RGM gene also has been identified in the purple sea urchin Strongylocentrotus purpuratus, where it maps near a CHD1-like gene (LOC575959), as seen at the RGMa and RGMb loci in vertebrates (Figures 1 and 4). The protein predicted to be encoded by this gene contains an RGD motif and 16 cysteines (14 of which align with the 14 conserved cysteines in mammalian RGMs), and is ∼40% identical to mammalian RGMa or RGMb and ∼35% identical to RGMc. In the nematode Caenorhabditis elegans, a single RGM gene also has been predicted, but the putative protein is <30% identical to mammalian RGMs, lacks several of the conserved cysteine residues found in mammalian RGM proteins and, unlike vertebrate RGM proteins, contains neither an RGD nor an RGN sequence. Although a single RGM has been reported in molluscs (California brown sea slug, Aplysia californica), definitive genomic evidence is lacking. Further analysis of putative RGM genes and their encoded proteins in invertebrates is needed for a more complete understanding of the evolution and functions of the RGM family.
STRUCTURE–FUNCTION RELATIONSHIPS AMONG RGM PROTEINS
Three-dimensional structures can provide critical insights into structure–function relationships within a protein family. Although no such information is yet available for the RGM family, computational methods such as comparative modelling [60,61], fold recognition and ab initio techniques [63,64] have the potential to help overcome this deficiency. Comparative modelling can approximate the three-dimensional structure of a target protein for which only the amino acid sequence is available, provided that an experimentally determined three-dimensional ‘template’ structure exists for a protein sharing >30% sequence identity with the target. Alternatively, threading (fold recognition) methods, which search for an optimal fit of a query sequence onto known three-dimensional protein structures in databases, can be used when comparative modelling is unsuccessful. However, neither comparative modelling nor threading identified appropriate templates for RGM proteins. We therefore constructed initial structural models for the RGM family using ab initio approaches, which predict structures from the physical properties of the primary amino acid sequence. We employed the Rosetta ab initio modelling software, because it has been the most consistent and accurate method for predicting structures of folded domains in a series of blind trials (CASP: Critical Assessment of techniques for protein Structure Prediction [63–70]). For the RGM family, structural fragments were generated using the Rosetta fragment server with input amino acid sequence information from 22 RGM proteins (see legend to Figure 9). One thousand independent simulations were run, and the resulting models were grouped into clusters according to structural similarity, as outlined in the legend to Figure 9. All ab initio models analysed suggest that RGM proteins adopt a two-lobed structure (Figure 9).
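The clustering of the 1000 Rosetta models is described in the legend to Figure 9. As a generic illustration only, and not necessarily the procedure used there, a common greedy scheme repeatedly picks the model with the most neighbours within an RMSD cut-off as a cluster centre and removes that centre's neighbours. The sketch assumes that pairwise Cα RMSD values have already been computed after optimal superposition; the function name, cut-off and matrix are hypothetical.

```python
def greedy_cluster(rmsd, cutoff):
    """Greedy RMSD clustering of structural models.

    rmsd: square matrix (list of lists) of pairwise RMSDs in Å.
    Returns a list of (centre, members) tuples, largest clusters first.
    """
    remaining = set(range(len(rmsd)))
    clusters = []
    while remaining:
        # centre = unassigned model with the most unassigned neighbours
        # (ties go to the lowest index, because max() keeps the first maximum)
        centre = max(
            sorted(remaining),
            key=lambda i: sum(rmsd[i][j] <= cutoff for j in remaining),
        )
        members = sorted(j for j in remaining if rmsd[centre][j] <= cutoff)
        clusters.append((centre, members))
        remaining -= set(members)
    return clusters


# Hypothetical 5-model example: models 0-2 and 3-4 form two structural families
toy_rmsd = [
    [0.0, 1.0, 1.5, 8.0, 8.0],
    [1.0, 0.0, 1.2, 8.0, 8.0],
    [1.5, 1.2, 0.0, 8.0, 8.0],
    [8.0, 8.0, 8.0, 0.0, 1.1],
    [8.0, 8.0, 8.0, 1.1, 0.0],
]
print(greedy_cluster(toy_rmsd, cutoff=2.0))  # [(0, [0, 1, 2]), (3, [3, 4])]
```

The centre of the most populated cluster is often taken as a representative model, on the reasoning that low-energy folds tend to be sampled repeatedly across independent simulations.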
Mature RGMa, RGMb and RGMc each contain 14 similarly placed cysteine residues (Figure 3A), and all appear to be disulfide-bonded proteins [9,45,47]. However, the number and location of their disulfide bonds are unknown. The majority of ab initio models predict a disulfide bond between Cys9 and either Cys7 or Cys8, although one model suggests two disulfide bonds (Figure 9A, cysteine residues shown as purple space-filling models); such a linkage could be responsible for maintaining the two-chain forms of RGMa or RGMc. Disulfide bonds between Cys11 and Cys12 and between Cys13 and Cys14 are also predicted in all models generated, and are located within the C-terminal part of the two-lobed structure (Figure 9A). Although the connectivity varies slightly between models, the majority of predictions suggest two disulfide bonds in the N-terminal lobe, between Cys1 and Cys2 and between Cys4 and Cys5, for a total of 5 or 6 disulfide linkages per RGM molecule. This would leave 2–4 free cysteines in the protein (Figure 9A). Direct experiments are needed to define the actual disulfide bonding pattern of each RGM family member.
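The arithmetic behind the estimate of 5 or 6 bonds and 2–4 free cysteines is free = 14 − 2 × (number of bonds), provided that no cysteine is paired twice. A minimal Python sketch that checks a candidate pairing and lists the unpaired residues; the five-bond pattern below uses Cys7 as the partner of Cys9 purely for illustration, and the function name is hypothetical.

```python
def free_cysteines(n_cys: int, bonds: list) -> list:
    """Return the cysteines (numbered 1..n_cys) left unpaired by `bonds`.

    Raises ValueError if a cysteine is out of range or appears in more than one bond.
    """
    used = set()
    for a, b in bonds:
        for c in (a, b):
            if not 1 <= c <= n_cys:
                raise ValueError(f"Cys{c} is outside 1..{n_cys}")
            if c in used:
                raise ValueError(f"Cys{c} is assigned to more than one disulfide bond")
            used.add(c)
    return [c for c in range(1, n_cys + 1) if c not in used]


# Illustrative five-bond pattern based on the majority predictions in the text
# (Cys9 paired with Cys7 here; Cys8 is the alternative partner).
predicted = [(1, 2), (4, 5), (7, 9), (11, 12), (13, 14)]
free = free_cysteines(14, predicted)
print(len(predicted), "bonds;", len(free), "free cysteines:", free)
# prints: 5 bonds; 4 free cysteines: [3, 6, 8, 10]
# a sixth bond would leave 14 - 2 * 6 = 2 free cysteines
```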
von Willebrand factor is a glycoprotein that helps mediate platelet adhesion at sites of blood vessel damage and binds blood clotting Factor VIII [24,71]. It contains five distinct types of structural domain (vWA, B, C, D and CK), and one of these (type D) has been recognized in all RGM proteins. Our ab initio models suggest that this partial vWD domain is highly structured and contains surface-exposed α-helices and β-strands (yellow region in Figure 9). These features are consistent with the crystal structure of the entire vWD domain [RCSB (Research Collaboratory for Structural Bioinformatics) protein structural database accession number 1ijb]. The partial vWD region of RGMs contains the site of intramolecular proteolytic cleavage that generates the two-chain forms of RGMa and RGMc (see Figures 3A and 3B), and this cleavage has been hypothesized to occur by acid-labile hydrolysis of the peptide bond between an aspartic acid and a proline residue. In the model depicted in Figure 9, these two amino acids are located on the surface of the protein (space-filling model in Figure 9B). Of note, substitution of glutamic acid for this aspartic acid residue in human RGMc (D172E) causes juvenile haemochromatosis, and in biochemical experiments the mutant protein does not form a two-chain molecule [9,46]. Another disease-causing substitution in human RGMc, G320V, also appears to block production of the two-chain protein [9,46]. The ab initio model depicted in Figure 9 suggests that Gly320 is located on a surface in proximity to Asp172. On the basis of this model, it thus appears possible that the G320V substitution, which increases side-chain volume and hydrophobicity, inhibits interactions with an as yet unidentified protein or protease, thereby preventing proteolysis at Asp172. Alternatively, the substitution may induce conformational changes that indirectly impair proteolytic cleavage at Asp172.
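Candidate acid-labile Asp-Pro bonds can be located with a simple sequence scan. The Python sketch below is a hypothetical helper run on a toy sequence, not on RGMc; it only flags D-P dipeptides and says nothing about whether cleavage actually occurs, which depends on structure and conditions. The toy Asp-to-Glu change mirrors, in kind, the D172E substitution described above.

```python
def asp_pro_bonds(seq: str) -> list:
    """Return 1-based positions of Asp (D) residues immediately followed by Pro (P).

    Each hit marks a candidate acid-labile Asp-Pro peptide bond.
    """
    seq = seq.upper()
    return [i + 1 for i in range(len(seq) - 1) if seq[i] == "D" and seq[i + 1] == "P"]


# Toy, hypothetical sequences (not RGMc); numbering is local to each string.
wild_type = "MKDPLLGDPAE"
d3e_mutant = "MKEPLLGDPAE"  # Asp3 -> Glu
print(asp_pro_bonds(wild_type))   # [3, 8]
print(asp_pro_bonds(d3e_mutant))  # [8]
```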
RGMa and RGMc each contain an RGD motif, a tripeptide classically identified as an integrin-binding element, whereas RGMb does not [3,23]. Structurally, RGD motifs are found at or near the end of an α-helix, and our ab initio models map the RGD sequence of RGMs to a loop between two α-helices on the surface of the protein (Figure 9A). The exact function of this motif in RGMa or RGMc is not known, although substitutions of glycine with valine or arginine (G99V or G99R) appear to cause juvenile haemochromatosis in humans [6,73], and the analogously mutated mouse RGMc (G92V) was unable to bind BMP-2 in biochemical assays.
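A short Python sketch showing how an RGD tripeptide can be located and how a substitution at its glycine removes the motif. The sequence, the `G5V`/`G5R` substitutions and the helper names are hypothetical; numbering refers to the toy string, not to human or mouse RGMc.

```python
import re


def find_motif(seq: str, motif: str) -> list:
    """1-based start positions of every (possibly overlapping) occurrence of `motif`."""
    pattern = f"(?={re.escape(motif.upper())})"  # zero-width lookahead allows overlaps
    return [m.start() + 1 for m in re.finditer(pattern, seq.upper())]


def substitute(seq: str, mutation: str) -> str:
    """Apply a point substitution written like 'G5V' (wild type, 1-based position, new residue)."""
    wt, pos, new = mutation[0], int(mutation[1:-1]), mutation[-1]
    if seq[pos - 1] != wt:
        raise ValueError(f"residue {pos} is {seq[pos - 1]}, not {wt}")
    return seq[: pos - 1] + new + seq[pos:]


# Toy, hypothetical sequence with an RGD tripeptide at residues 4-6.
toy = "AAKRGDLLQ"
print(find_motif(toy, "RGD"))                     # [4]
print(find_motif(substitute(toy, "G5V"), "RGD"))  # []
print(find_motif(substitute(toy, "G5R"), "RGD"))  # []
```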
RGM proteins contain several putative asparagine-linked glycosylation sites and have been shown to be glycoproteins [2,9,26], although the functional role of glycosylation has not yet been established for any RGM family member. In our ab initio structural models, at least two of these sites map to the surface of the molecule (Figure 9). As noted earlier, RGMc, but not RGMa or RGMb, contains a pro-protein convertase recognition and cleavage site near the C-terminus of the mature protein (Figure 3). As seen in Figure 9(A), this part of the protein also maps to a surface loop in our ab initio model, and thus would potentially be readily accessible to targeted proteolysis by furin or other pro-protein convertases.
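Both kinds of site discussed here can be flagged from primary sequence. The Python sketch below assumes the usual N-X-S/T sequon (X not Pro) for N-linked glycosylation and the R-X-[K/R]-R consensus commonly used for furin; a match is necessary but not sufficient for modification or cleavage, and the toy sequence and helper names are hypothetical.

```python
import re


def n_glyc_sequons(seq: str) -> list:
    """1-based positions of Asn in N-X-S/T sequons where X is not Pro."""
    return [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", seq.upper())]


def furin_sites(seq: str) -> list:
    """1-based positions of the final Arg in R-X-[K/R]-R motifs (cleavage would follow it)."""
    return [m.start() + 4 for m in re.finditer(r"(?=R.[KR]R)", seq.upper())]


toy = "GNLSAPNPTQRVRRLE"  # hypothetical sequence, not an RGM
print(n_glyc_sequons(toy))  # [2]  (Asn7 is skipped because it is followed by Pro)
print(furin_sites(toy))     # [14] (motif R11-V12-R13-R14)
```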
SUMMARY AND CHALLENGES FOR THE FUTURE
The RGM family appears to have comprised three genes since early in vertebrate evolution, as all three were present in the common ancestor of mammals and fish. Each gene is expressed in a distinct developmental and tissue-specific pattern: RGMa and RGMb are produced in different parts of the central nervous system, and RGMc is synthesized in striated muscle and liver. The molecular mechanisms governing such diverse tissue-restricted gene expression have not been established, and little is known about the structure or function of RGM gene promoters, their mechanisms of transcriptional regulation, or the control of RGM mRNA processing or stability. At the protein level, the three RGM family members share several motifs and are predicted by our ab initio modelling to have similar three-dimensional structures, but the respective proteins appear to undergo distinct biosynthetic and processing steps, whose regulation has not been characterized. Functionally, all three RGM proteins appear capable of binding selected BMPs, although the binding domains have not been mapped. Interactions with selected BMPs may mediate at least some of the effects of RGMc on hepcidin gene expression, but no role for BMPs has yet been defined in the actions of RGMa or RGMb. Only RGMa and RGMc have so far been shown to bind neogenin, and although signalling through neogenin is critical for the effects of RGMa on repulsive axonal guidance and neuronal survival, its role in the actions of RGMc remains to be elucidated. Similarly, the full spectrum of biological functions of the three RGMs has not yet been completely characterized, and will remain an active topic of investigation.
This work was supported by the National Institutes of Health [grant numbers R01 DK42748 (to P. R.), T32 HL007781 and F30 HL095327 (to C. J. S.)] and the National Science Foundation [grant number NSF-0746589 (to U. S.)].
We thank Kevin Kendall at MacVector for advice and guidance.
Abbreviations: BMP, bone morphogenetic protein; Chd, chromodomain helicase DNA-binding protein; DRG, dorsal root ganglion; E, embryonic day; GPI, glycosylphosphatidylinositol; Lix1, Limb expression 1; Mctp2, multiple C2 domains, transmembrane 2; PI-PLC, phosphoinositide-specific phospholipase C; Polr3gl, polymerase (RNA) III (DNA directed) polypeptide G-like; PPC, pro-protein convertase; RGD motif, arginine-glycine-aspartic acid; RGM, repulsive guidance molecule; Slco/SLCO, solute carrier organic anion transporter family; St8sia/ST8SIA, ST8 α-N-acetyl-neuraminide α-2,8-sialyltransferase; TGF, transforming growth factor; Txnip, thioredoxin interacting protein; Unc, unco-ordinated; UTR, untranslated region; vWD, von Willebrand type D
- © The Authors Journal compilation © 2009 Biochemical Society