CUB domains are 110-residue protein motifs exhibiting a β-sandwich fold and mediating protein–protein interactions in various extracellular proteins. Recent X-ray structural and mutagenesis studies have led to the identification of a particular CUB domain subset, cbCUB (Ca2+-binding CUB domain). Unlike other CUB domains, these harbour a homologous Ca2+-binding site that underlies a conserved binding site mediating ionic interaction between two of the three conserved acidic Ca2+ ligands and a basic (lysine or arginine) residue of a protein ligand, similar to the interactions mediated by the low-density lipoprotein receptor family. cbCUB-mediated protein–ligand interactions usually involve multipoint attachment through several cbCUBs, resulting in high-affinity binding through avidity, despite the low affinity of individual interactions. The aim of the present review is to summarize our current knowledge about the structure and functions of cbCUBs, which represent the majority of the known CUB repertoire and are involved in a variety of major biological functions, including immunity and development, as well as in various cancer types. Examples discussed in the present review include a wide range of soluble and membrane-associated human proteins, as well as some archaeal and invertebrate proteins. The fact that these otherwise unrelated proteins share a common Ca2+-dependent ligand-binding ability suggests a mechanism inherited from very primitive ancestors. The information provided in the present review should stimulate further investigations on the crucial interactions mediated by cbCUB-containing proteins.
- Ca2+-binding site
- CUB domain
- cbCUB (Ca2+-binding CUB domain)
- ionic bond
- protein–ligand interaction
CUB domains were initially identified in various extracellular proteins, most of which were known to be involved in developmental processes, and were named after the proteins in which they were initially discovered, i.e. the human complement proteases C1r and C1s, the embryonic sea urchin protein Uegf, and the human BMP-1 (bone morphogenetic protein-1). The CUB domain repertoire currently features several thousand domains occurring in over 2000 soluble or cell-membrane-associated proteins involved in a wide range of vital functions, including complement activation, developmental patterning, tissue repair, axon guidance and angiogenesis, cell signalling, fertilization, haemostasis, inflammation, neurotransmission, receptor-mediated endocytosis, and tumour suppression [2,3]. Spanning approximately 110 amino acids, CUB domains were initially recognized from a seven-residue consensus sequence, including four cysteine residues conserved in most cases except for the N-terminal CUB domains of C1r and C1s. The first CUB domain crystal structures to be solved were those of PSP-I (porcine seminal plasma protein 1)/PSP-II and bovine aSFP (acidic seminal fluid protein), three members of the mammalian spermadhesin family involved in sperm–egg binding and displaying a single CUB domain architecture. These analyses revealed a compact ellipsoidal structure assembled from ten β-strands organized in a sandwich of two five-stranded β-sheets, each containing two parallel and four antiparallel strands (Figure 1). For several years, this architecture was considered the generic CUB domain fold, until cbCUBs (Ca2+-binding CUB domains) were discovered.
EMERGENCE OF A cbCUB SUBSET
Evidence that the CUB domain repertoire comprises a particular subset endowed with Ca2+-binding ability arose recently, from the resolution of the crystal structures of the N-terminal CUB1-EGF (epidermal growth factor domain) segments of human C1s and MASP2 [MBL (mannan-binding lectin)-associated serine protease 2] [5,6]. Both CUB domains were found to lack the first two β-strands present in spermadhesins, hence featuring two four-stranded β-sheets, each made of antiparallel strands (Figure 1). Such a deletion of both β1 and β2 strands is observed solely in the CUB1 domains of the mammalian C1r/C1s/MASP family [7,8]. Deletion of β1 is, in contrast, a feature common to all cbCUBs (Figure 2). Compared with the spermadhesins, a major and quite unexpected characteristic of the human C1s and MASP2 structures was the presence of a Ca2+-binding site on the distal edge of their CUB domain (Figure 1). In both cases, coordination of the Ca2+ ion was found to involve three acidic residues (glutamic acid and two aspartic acid residues). Together with a tyrosine residue closely associated with the Ca2+-binding sites, this triad proved to be conserved in a large proportion of the CUB repertoire, giving rise to the hypothesis that this signature defined a CUB domain subset with the specific ability to bind Ca2+. The crystal structures of the CUB2 domain from neuropilin-1, of the human MASP1/3 CUB1–EGF–CUB2 segment, and more recently of CUB domains 5–8 of CUBN (cubilin) provide additional examples of cbCUBs and validate the link between the Tyr-Glu-Asp-Asp signature (Figure 2) and the presence of a Ca2+-binding site.
STRUCTURAL FEATURES OF THE cbCUB CA2+-BINDING SITE
Comparative analysis of the nine currently available CUB domain Ca2+-binding site structures shows highly conserved features, with subtle differences (Table 1 and Figure 3). The same mode of Ca2+ coordination by the acidic triad is conserved. The residues equivalent to Glu45 and Asp98 of human C1s (coloured orange in Figure 3) are exposed to the surface and always coordinate Ca2+ through a single carboxy oxygen. In contrast, the central residue of the triad (Asp53 in C1s, coloured red in Figure 3) is buried and is in most cases a bidentate Ca2+ ligand. This buried Asp is stabilized by a hydrogen-bond interaction with the hydroxy group of the conserved tyrosine residue of the cbCUB signature (Tyr17 in C1s). Additional Ca2+ coordination is more variable and involves interactions with one or two main-chain carbonyl groups provided by the central loop L9, plus one or two water molecules (Figures 2 and 3). The complete coordination sphere can only be seen in high-resolution structures where the positions of water molecules are clearly defined. As illustrated in Figure 3, the Ca2+ coordination has a pentagonal bipyramid geometry. As seen for other types of Ca2+-binding sites, slight variations of the coordination are observed. For example, the Ca2+ ligands contributed by loop L9 in the cbCUB modules vary slightly in their number, nature and position in the sequence (Figures 2 and 3). Whereas both aspartic acid residues of the acidic triad appear to be strictly conserved, glutamic acid is replaced by aspartic acid in some cases, such as in the human C1r CUB2 domain (Figure 2). In this case, there is evidence that such a replacement does not prevent formation of a functional Ca2+-binding site, indicating some tolerance at this position.
STRUCTURAL COMPARISON WITH OTHER CA2+-BINDING SITES
The cbCUB Ca2+-binding site is positioned on the edge of a β-sheet (Figure 1), a location reminiscent of the synaptotagmin I C2B domain (which holds several Ca2+ ions) and the C-terminal module of the thermostable Thermotoga maritima xylanase, two binding sites involving four acidic residues. The fact that the L5 and L9 loops contributing Ca2+ ligands each connect the two sides of the β-sheet appears to be a structural feature relatively specific to cbCUBs, although it is also observed in haemolysin-type Ca2+-binding repeats. A more thorough comparison of other β-sheet-based Ca2+-binding sites is beyond the scope of the present review and can be found elsewhere. The residues contributing Ca2+ ligands in cbCUBs are located at the tips of strands β6 and β9, and in loops L5 and L9, in contrast with EF hands and LA [LDLR (low-density lipoprotein receptor) type A] modules, where these are provided by a helix–loop–helix motif (Figure 3). The resulting seven-ligand Ca2+ coordination seen in cbCUBs is quite similar to that of the classical EF hand, and is intermediate between the six-ligand octahedral geometry characteristic of LA modules (Figure 3D) and the complete coordination sphere seen in the human C1s cbEGF (Ca2+-binding EGF domain)-binding site involving eight ligands as in most cbEGFs (Figure 3E).
The direct Ca2+-coordination sphere involves three acidic residues in cbCUBs, and four in classical EF hands. This high number of negative charges in comparison with other types of Ca2+-binding sites is considered to represent an optimal charge configuration and favours the binding of multivalent ions compared with univalent ones, such as Na+. Whereas no ion exchange was seen in the case of the human C1s cbEGF Ca2+-binding site, harbouring a complete coordination sphere, the C1s cbCUB1-binding site can accommodate Mg2+ (Figure 3F) as well as a lanthanide ion, as also observed for EF hands. In contrast with Ca2+ binding (Figures 3A and 3B), no direct Mg2+ ligand is provided by the human C1s CUB1 loop L9, which therefore exhibits some flexibility since its conformation slightly differs between the two subunits of the C1s CUB1–EGF dimer. Nevertheless, the Mg2+ ion retains a stabilizing role, since loops L5 and L9 are both ordered, which is not the case in the absence of a divalent ion. An illustration of the stabilizing role of Ca2+ is provided by a comparison of the homologous CUB1 domains of human and rat MASP2 (Figure 4), which shows that, in the latter, several residues (particularly Asp101 and Tyr102) within the Ca2+-binding area are disordered due to the lack of Ca2+ stabilization [6,8]. The stabilizing effect of Ca2+ provides a basis for the recent observation that the human C1r CUB2 domain has a compact folded structure in the presence of Ca2+, and a disordered flexible conformation in the absence of Ca2+. The stabilizing role of Ca2+ and Mg2+ and the occurrence of Ca2+-induced conformational changes have also been described in the case of EF-hand-binding sites.
A PROTEIN–PROTEIN INTERACTION PLATFORM UNDER CA2+ CONTROL
An essential role of the cbCUBs of C1r, C1s and the MASPs is to mediate assembly of complexes between these proteases and their cognate recognition proteins C1q, MBL and the ficolins. The observation that, in the presence of a divalent ion, the free side-chain oxygens of the two monodentate acidic Ca2+ ligands point towards the outside of the cbCUBs, approximately in the same direction (Figures 3A, 3B, 3F and 4A) prompted us to postulate that these residues could directly mediate electrostatic bonds. Indeed, site-directed mutations in human MASP2 and MASP3 allowed mapping of the residues interacting with MBL and the ficolins, revealing homologous binding sites for these proteins (Figure 5) involving residues from loops L5 and L9 of the cbCUBs either contributing Ca2+ ligands or located in close vicinity of the Ca2+-binding site [6,9]. Conversely, point mutants of MBL and the ficolins were also generated, revealing that interaction of these proteins with the MASPs involves a conserved lysine residue located in their collagen-like region [17,18], thus providing strong indication of an ionic interaction between this residue and the outer acidic Ca2+-binding residues of the MASP cbCUBs. Further support for this hypothesis was provided by the identification of the C1q-binding sites of C1r and C1s, indicating that these involve the outer (glutamic acid and aspartic acid) Ca2+ ligands of the CUB1 domains of C1r and C1s and the CUB2 domain of C1r (Figure 5G), these residues probably forming an electrostatic interaction with conserved lysine residues of the C1q collagen-like triple helix. The first direct experimental evidence of such interactions between acidic Ca2+ ligands of a cbCUB domain and a basic residue contributed by a protein ligand was obtained recently from the crystal structure of a complex between the IF–Cbl [Cbl (cobalamin, vitamin B12)-bound IF (intrinsic factor)] and the IF–Cbl-binding segment of CUBN comprising cbCUBs 5–8.
This structure shows that the CUBN–IF interaction involves primarily salt bridges between the two outer Ca2+ ligands of CUB6 (Glu1096 and Asp1146) and CUB8 (Glu1328 and Asp1373) and the side chains of Lys159 and Arg323 contributed by the α and β domains of IF respectively (see details of the CUB6–8–IFα interaction in Figures 5D and 5F). Both of the CUBN–IF interfaces are stabilized by additional hydrophobic interactions and hydrogen bonds involving residues contributed by loops L5 and L9, which are different in CUB6 and CUB8. These analyses thus provide structural evidence that the Ca2+ ion both stabilizes the conformation of the surrounding loops L5 and L9 and positions the side chains of its outer monodentate glutamic acid and aspartic acid ligands in an orientation that allows them to form a salt bridge with a reactive lysine or arginine residue contributed by a protein ligand. This mechanism probably applies to most other cbCUBs.
A LOW-AFFINITY INTERACTION SIMILAR TO THOSE MEDIATED BY THE LDLR FAMILY
Ca2+-binding sites are known to mediate a wide variety of interactions. In several instances, such as C-lectins, α2β1 integrins and annexin 5, the Ca2+ ion itself is directly involved in the interaction [19–21]. The cbEGF mediates stable dimerization of CUB1–EGF segments in the C1r/C1s/MASP family. In this case, a hydrophobic interaction is mediated by a Ca2+ ligand (Phe135 in C1s; Figure 3E). The ionic interactions mediated by cbCUBs involve a further type of mechanism that appears to be very similar to those carried out by the LDLR family, as exemplified by the interactions between LA modules 3 and 4 of LDLR and its associated protein RAP (receptor-associated protein) (Figures 6A and 6B). Thus, despite their complete lack of homology with cbCUBs, both LA modules also form an ionic interaction with a lysine residue of RAP through two exposed acidic Ca2+ ligands. The interactions of ApoER2 (apolipoprotein E receptor 2) with its ligand reelin (Figures 6C and 6D), and of the LDLR EGF1 domain with PCSK9 (proprotein convertase subtilisin/kexin type 9) (Figures 6E and 6F) also involve the same type of ionic bonds between one or two acidic Ca2+ ligands and a basic (lysine or arginine) residue.
As judged from the cases investigated so far, a characteristic feature of the proteins containing cbCUBs is that interaction with their ligands always engages multiple CUB domains (Figures 5A and 5B), ranging from two for CUBN and PCPE-1 (procollagen C-proteinase enhancer-1) [10,26] to four for MASP dimers, up to six in the case of the C1s–C1r–C1r–C1s tetramer. In the case of PCPE-1, the cooperative binding of the CUB1 and CUB2 domains to the procollagen substrate of BMP-1 is facilitated by the presence of a flexible linker between the two domains. In fact, individual cbCUBs appear to bind their ligands with rather low affinity, and therefore efficient interaction requires cooperative binding through multiple domains. The same requirement for a multipoint interaction also applies to LA modules. Thus, except for the recognition of the secreted brain glycoprotein reelin where a single LA module of ApoER2 is required (Figures 6C and 6D), at least two consecutive LA modules are needed to achieve high-affinity ligand binding through avidity, thereby ensuring specificity [27,28]. In other cases, such as the LDLR/PCSK9 and ApoER2/reelin interactions, increased affinity is achieved by means of other (mainly hydrophobic) interactions in addition to the electrostatic ones.
As for cbCUBs, the fact that they mediate low-affinity interactions probably arises for a large part from the nature of the underlying Ca2+-binding site, and particularly from the fact that the Ca2+ ion is fully exposed to the solvent and therefore exchangeable. The only available Ca2+-binding affinity is that of the recombinant CUB2 domain of human C1r, which was reported to bind Ca2+ with a rather high Kd (430 μM), and is therefore expected to be only partially saturated in blood. For comparison, the Kd values determined for Ca2+-binding sites such as EF hands, LA domains and cbEGFs mostly lie in the low micromolar range, with a few exceptions >100 μM. The Kd measured using the isolated C1r CUB2 domain might, however, not be representative of its Ca2+-binding affinity in situ, considering the possible stabilizing role of neighbouring protein modules. In contrast with the above figure, it is noteworthy that engagement of six cbCUBs allows the native C1s–C1r–C1r–C1s tetramer to bind C1q with very high affinity (Kd = 0.7–7.0 × 10⁻¹⁰ M) [11,31]. Moreover, the functional significance of these low-affinity interactions is illustrated by the fact that a single mutation of the outer aspartic acid ligand of MASP2-CUB1 resulted in a severe pathological deficiency.
Along the same lines, the cbCUB-mediated electrostatic interactions of CUBN are known to be strongly sensitive to low pH, raising the possibility of a decreased affinity in acidic compartments such as the endosome. From a general standpoint, the low affinity of the interactions mediated by cbCUBs appears to be particularly well adapted to the assembly of multimolecular proteases or to receptor–ligand interactions, in which most of these domains are involved, since the function and/or regulation of both types of complexes involves assembly/disassembly processes.
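The statement that a site with a Kd of 430 μM is only partially saturated in blood can be illustrated with a simple single-site binding calculation. The sketch below is not taken from the studies cited here: it assumes a 1:1 equilibrium with no cooperativity, and the free Ca2+ concentration of 1.2 mM is an assumed, textbook-range value for ionized plasma calcium rather than a figure given in the present review. The function and constant names are illustrative only.

```python
def fractional_occupancy(free_ca_molar: float, kd_molar: float) -> float:
    """Fraction of sites occupied for a simple 1:1 binding equilibrium.

    theta = [Ca] / (Kd + [Ca]); assumes a single independent site.
    """
    if free_ca_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return free_ca_molar / (kd_molar + free_ca_molar)


# Kd reported in the text for the isolated human C1r CUB2 domain.
KD_C1R_CUB2 = 430e-6  # mol/L

# ASSUMPTION: approximate free (ionized) Ca2+ in plasma; not from the review.
ASSUMED_FREE_CA = 1.2e-3  # mol/L

occupancy = fractional_occupancy(ASSUMED_FREE_CA, KD_C1R_CUB2)
print(f"Estimated occupancy: {occupancy:.2f}")  # prints "Estimated occupancy: 0.74"
```

Under these assumptions roughly three-quarters of the isolated sites would hold Ca2+, which is consistent with partial saturation. As noted above, the value for the isolated domain may not reflect the affinity in the context of the intact protein.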
cbCUB, AN EXTRACELLULAR DOMAIN INVOLVED IN MAJOR BIOLOGICAL FUNCTIONS
As illustrated in Figure 7, cbCUBs are found in a wide variety of soluble and cell-membrane-associated proteins. They often occur in multiple copies and, with a few exceptions such as CDCP2 (CUB-domain-containing protein 2), associate with many different types of domains. Remarkably, proteins containing cbCUBs are involved in extremely diverse functions. Many of the soluble proteins carry a serine protease or zinc protease domain, and these enzymes fulfil highly specialized functions such as complement activation (C1r, C1s and MASPs) [33,34], extracellular matrix assembly and growth factor activation [BMP-1, TLL-1 (tolloid-like 1) and TLL-2], or egg fertilization [OVCH2 (ovochymase 2)]. DMBT1 (deleted in malignant brain tumours 1; also called gp-340) is a candidate tumour suppressor in various cancers and is also involved in mucosal innate immunity against bacteria, whereas the inactive serine protease PAMR1 (peptidase domain-containing protein associated with muscle regeneration 1) is associated with muscle regeneration. Cell-membrane-associated proteins also participate in a variety of biological functions, such as intestinal uptake of vitamin B12 (CUBN), or regulation of complement activation and inflammation in the developing central nervous system [CSMD1 (CUB and sushi domain-containing protein 1)]. Nrp-1 (neuropilin-1) and Nrp-2, which are co-receptors for class 3 semaphorins and vascular endothelial growth factor, are involved in vascular development and tumorigenesis, whereas MFRP (membrane frizzled-related protein) is involved in eye development. Several proteins such as CUZD1 (CUB and zona pellucida-like domain-containing protein 1), LRP12 (LDLR-related protein 12), SEZ6L (seizure 6-like), or CDCP1 have been implicated in different types of cancer.
It appears likely that the highly specialized functions of these proteins involve, for a large part, electrostatic interactions between their constituent cbCUBs and protein ligands, along the same scheme as demonstrated experimentally in the case of CUBN. The interaction of the neuronal receptors NRP-1 and -2 with class 3 semaphorins obviously represents a further example of such an interaction, considering that the semaphorin 3A-binding site in the CUB1 domain of NRP-1 was mapped to a region adjacent to the putative Ca2+-binding site, comprising residues from loops L3, L5 and L9, the NRP-1 binding site of semaphorin 3A being assigned to a positively charged region of its receptor-binding domain. In the same way, the essential role of cbCUBs in the interactions mediated by TLL proteases is underlined by the loss of function resulting from mutations of predicted glutamic acid or aspartic acid Ca2+ ligands in CUB domains 2–4 of Drosophila tolloid protein and in CUB2 of human BMP-1. Mutations of putative Ca2+ ligands in the CUB1 domain of human PCPE-1 also inhibit its stimulating activity towards BMP-1, a strong inhibitory effect being also produced by mutation of Phe90 in loop L7, suggesting additional contribution of hydrophobic interactions. The ligand residues interacting with the cbCUBs of tolloids and PCPE-1 have not yet been identified.
Along the same lines, the fact that DMBT1 binds Gram-negative and Gram-positive bacteria in a Ca2+-dependent manner suggests that its cbCUBs are involved in this process. It is also tempting to hypothesize that the ability of rat CSMD1 to specifically block activation of the classical complement pathway arises from its ability to dissociate the C1 complex, by competing through its cbCUB domains with the C1s–C1r–C1r–C1s tetramer for interaction with reactive lysine residues on C1q. In contrast, the observation that TSG6 (tumour-necrosis-factor-stimulated gene 6 protein) binds through its CUB domain to fibronectin in a divalent-cation-independent manner argues against a Ca2+-dependent ionic interaction.
AN ANCIENT DOMAIN CONSERVED THROUGH EVOLUTION
The fact that a similar interaction mode is conserved in a variety of otherwise functionally unrelated proteins suggests that this property has been inherited from primitive ancestors. Indeed, MASP homologues containing cbCUBs have been identified in several primitive invertebrates, including the sea anemone Nematostella vectensis, the amphioxus Branchiostoma belcheri and the ascidian Boltenia villosa (Figure 2). Interestingly, in N. vectensis and B. belcheri, the CUB1 and CUB2 domains show more sequence homology with their counterparts in the human MASPs than with each other. TLL proteins containing cbCUBs have also been identified in N. vectensis (Figure 2), and BMP homologues have been shown to play a role in the development of sea anemones. Likewise, a gene identified in the archaeon Aciduliprofundum boonei features the sequence signature of cbCUBs (Figure 2). Interestingly, in addition to MASP, N. vectensis also expresses genes corresponding to the C3 and factor B complement proteins, suggesting that a primitive multicomponent innate defence system was established in a common ancestor of Cnidaria and Bilateria more than 600 million years ago.
Although CUB domains were identified almost two decades ago, the existence of a particular Ca2+-binding subset was recognized only recently. It seems clear now from available sequence data that this subset represents the majority of existing CUB domains and has evolved from a very primitive common ancestor. Recent structural data have confirmed previous proposals [6,9] that the Ca2+ ion not only stabilizes the corresponding extremity of the CUB domain, but also provides a framework of amino acids, including exposed glutamic acid and aspartic acid residues, serving a further function, the ionic interaction with protein ligands. Proteins containing cbCUBs have highly specific roles, often involved in major biological functions such as immune defence and development, and it is likely that these domains largely contribute to this specialization, by conferring on them the ability to specifically recognize their protein ligands.
Proteins containing cbCUBs can be easily identified at the sequence level through their conserved Tyr-Glu-Asp-Asp motif in addition to the CUB signature. The information provided in the present review can be harnessed to carry out in-depth investigations of their interaction properties, using a combination of site-directed mutagenesis, three-dimensional structural analyses and functional analyses. These studies should facilitate identification of the ligands of these proteins and of the residues involved in their binding sites, which, in case of a Ca2+-dependent interaction, are expected to be contributed by loops L5 and L9. Such investigations are timely and crucial, considering the major biological implications of proteins containing cbCUB domains.
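As an illustration of how the Tyr-Glu-Asp-Asp signature could be screened at the sequence level, the following minimal sketch checks the four signature positions once a candidate CUB domain has been aligned to human C1s CUB1. The reference numbering (Tyr17, Glu45, Asp53, Asp98) and the tolerance of aspartic acid at the glutamic acid position are taken from the text above; the alignment step itself is not shown, and the function name, data layout and example inputs are hypothetical.

```python
from typing import Mapping

# Signature positions in human C1s CUB1 numbering (from the text) and the
# one-letter residues accepted at each of them.
CBCUB_SIGNATURE = {
    17: {"Y"},       # conserved Tyr hydrogen-bonded to the buried Asp
    45: {"E", "D"},  # outer ligand: Glu, with Asp tolerated (e.g. C1r CUB2)
    53: {"D"},       # buried central Asp, in most cases bidentate
    98: {"D"},       # outer Asp ligand
}


def has_cbcub_signature(residues: Mapping[int, str]) -> bool:
    """Return True if all four signature positions carry an accepted residue.

    `residues` maps C1s-equivalent positions (obtained from an alignment
    performed elsewhere) to one-letter amino acid codes. A missing position
    counts as a mismatch.
    """
    return all(
        residues.get(position) in allowed
        for position, allowed in CBCUB_SIGNATURE.items()
    )


# Hypothetical inputs, for illustration only.
canonical = {17: "Y", 45: "E", 53: "D", 98: "D"}
glu_to_asp = {17: "Y", 45: "D", 53: "D", 98: "D"}
non_binding = {17: "Y", 45: "E", 53: "D", 98: "N"}

print(has_cbcub_signature(canonical))    # True
print(has_cbcub_signature(glu_to_asp))   # True
print(has_cbcub_signature(non_binding))  # False
```

A positive result only indicates that the sequence signature is present. Whether a functional Ca2+-binding site actually forms still has to be established experimentally, as in the mutagenesis and structural studies discussed above.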
The authors' work is supported by the Commissariat à l'Energie Atomique, the Centre National de la Recherche Scientifique, the Université Joseph Fourier, Grenoble, and in part by a grant from the CNRS.
Abbreviations: ApoER2, apolipoprotein E receptor 2; aSFP, acidic seminal fluid protein; BMP, bone morphogenetic protein; cbCUB, Ca2+-binding CUB domain; Cbl, cobalamin; CDCP, CUB-domain-containing protein; CSMD, CUB and sushi domain-containing protein; CUBN, cubilin; CUZD, CUB and zona pellucida-like domain-containing protein; DMBT, deleted in malignant brain tumours; EGF, epidermal growth factor domain; cbEGF, Ca2+-binding EGF domain; IF, intrinsic factor; LDLR, low-density lipoprotein receptor; LA, LDLR type A; LRP, LDLR-related protein; MBL, mannan-binding lectin; MASP, MBL-associated serine protease; MFRP, membrane frizzled-related protein; Nrp, neuropilin; OVCH, ovochymase; PAMR, peptidase domain-containing protein associated with muscle regeneration; PCPE-1, procollagen C-proteinase enhancer-1; PCSK9, proprotein convertase subtilisin/kexin type 9; PSP, porcine seminal plasma protein; RAP, receptor-associated protein; TLL, tolloid-like; TSG6, tumour-necrosis-factor-stimulated gene 6 protein
- © The Authors Journal compilation © 2011 Biochemical Society