The enamel matrix protein amelogenin is secreted by ameloblasts into the extracellular space to guide the formation of highly ordered hydroxyapatite mineral crystallites and is subsequently almost completely removed during mineral maturation. Amelogenin interacts with the transmembrane proteins CD63 and LAMP (lysosome-associated membrane protein) 1, which are involved in endocytosis. Exogenously added amelogenin has been observed to move rapidly into CD63/LAMP1-positive vesicles in cultured cells. In the present study, we demonstrate that the CD63 region defined by amino acid residues 103–205 interacts not only with amelogenin, but also with other enamel matrix proteins (ameloblastin and enamelin). A detailed characterization of binding regions in amelogenin, CD63 and LAMP1 reveals that the amelogenin region defined by residues PLSPILPELPLEAW is responsible for the interaction with CD63 through residues 165–205, with LAMP1 through residues 226–251, and with the related LAMP2 protein through residues 227–259. We predict that the amelogenin binding region is: (i) hydrophobic; (ii) largely disordered; and (iii) accessible to the external environment. In contrast, the binding region of CD63 is likely to be organized in a ‘7’ shape within the mushroom-like structure of CD63 EC2 (extracellular domain 2). In vivo, the interactions between the secreted enamel matrix proteins and the membrane-bound proteins are likely to occur at the specialized secretory surfaces of ameloblast cells called Tomes' processes. Such protein–protein interactions may be required to establish short-term order of the forming matrix and/or to mediate feedback signals to the transcriptional machinery of ameloblasts and/or to remove matrix protein debris during enamel biomineralization.
- enamel matrix
- lysosome-associated membrane protein 1 (LAMP1)
- protein–protein interaction
- yeast two-hybrid assay
With the identification of a cDNA sequence for amelogenin in 1983  and the subsequent discoveries of additional organic components of the enamel extracellular matrix, including ameloblastin , enamelin  and amelotin , our understanding of enamel formation has advanced significantly. The expression of these structural proteins remains relatively unique to the developing tooth organ; primarily to the ectoderm-derived enamel layer during amelogenesis, but amelogenin, ameloblastin and enamelin are also transiently expressed in ectomesenchyme-derived dentin during dentinogenesis [5,6]. Certain proteases, including the serine protease kallikrein-4  and the matrix metalloproteinase 20 , are described as relatively unique to the developing enamel. The spatiotemporal expression for each of these enamel proteins continues to be defined. Of these proteins, amelogenin contributes more than 90% of the bulk of the organic matrix, and is absolutely essential for proper enamel formation [9,10]. Amelogenin is secreted by ameloblasts into the extracellular space to guide hydroxyapatite crystal formation and is subsequently almost completely removed during the enamel maturation.
At the ameloblast Tomes' processes, protein–protein interactions occur between the secreted enamel matrix proteins and membrane-bound proteins. The enamel matrix proteins amelogenin, ameloblastin and enamelin each interact with CD63 in vitro . CD63 is a member of the tetraspanin family, in which most members are cell-surface proteins that are characterized by the presence of four transmembrane segments . The tetraspanin proteins mediate signal transduction events that play roles in the regulation of cell development, activation, growth and motility . CD63, and other tetraspanins, are known to form complexes with integrins and act as organizers of membrane microdomains and signalling complexes [13,14]. CD63 resides not only in the cytoplasmic membranes of most cell types, but also in late endosomes, lysosomes and secretory vesicles, and traffics among these different compartments . This has led to the suggestion that CD63 plays a role in the recycling of membrane components and the uptake of degraded proteins from the extracellular matrix .
Amelogenin interacts with LAMP (lysosome-associated membrane protein) 1. LAMP1 is a transmembrane protein that is highly expressed in late endosomes and lysosomes and is often used as a marker for these two organelles. LAMP1 immunoreactivity is also observed at the plasma membrane and in early endocytic compartments. The presence of LAMP1 on the plasma membrane suggests that LAMP1 acts as a cell-surface intermediate that can be shuttled to the lysosome through endocytosis. Thus LAMP1 may be involved in endocytosis, pinocytosis or phagocytosis. Recent studies have shown that exogenously added amelogenin moves rapidly into the CD63/LAMP1-positive vesicles that subsequently localize to the perinuclear region. Collectively, these observations suggest a possible mechanism by which amelogenin, or degraded amelogenin peptides, can be removed from the extracellular matrix during enamel formation and maturation through direct interaction with CD63 and LAMP1 at the ameloblast Tomes' processes, followed by trafficking into the cell cytoplasm. In the present study, we identified the protein–protein-interaction regions in amelogenin, CD63 and LAMP1, and explored possible structural information of the binding regions using computational biology methods.
MATERIALS AND METHODS
Detection of protein–protein interactions by Y2H (yeast two-hybrid) assay
The Y2H assay system was used to identify and to confirm interactions between enamel matrix proteins and membrane-bound proteins. The Y2H assay system uses two vectors, pGBKT7 and pGADT7 (BD Biosciences Clontech). The pGBKT7 vector expresses proteins, or protein regions, fused to the GAL4 DNA-binding domain, whereas the pGADT7 vector expresses proteins, or protein regions, fused to the GAL4 AD (activation domain). The GAL4-AD-fusion protein is targeted to the yeast nucleus by the SV40 (simian virus 40) nuclear localization sequence. The native signal peptides of proteins studied here were excluded to ensure that the fusion proteins were transported into the nucleus.
The cDNA sequences and open reading frame information for the enamel matrix proteins and membrane-bound proteins used in the present study can be accessed from the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov). The protein sequences and nomenclature are illustrated in Figure 1.
The cloning of full-length genes of mouse amelogenin M180 (using the cDNA and protein reference numbers NM_009666 and NP_033796.1; M180 refers to the 180-amino-acid amelogenin protein that excludes the 16-amino-acid N-terminal signal peptide), mouse enamelin (NM_017468) and rat ameloblastin (NM_012900) into pGBKT7 vectors has been described previously. In the present study, various fragments of amelogenin M180 were cloned into pGBKT7. The entire open reading frames of human CD63 (NM_001780 coding for protein NP_001771.1) and human LAMP1 (NM_005561 coding for protein NP_005552.3), and cDNA fragments of CD63, LAMP1 and human LAMP2 (NM_002294 coding for protein NP_002285.1) were cloned into pGADT7 vectors. PCR primers to the coding regions of the genes were synthesized. All primers except for INV48566043, which targets the pUC ori region of pGBKT7, were designed to contain restriction enzyme sites for efficient in-frame cloning (Table 1). The complete details of engineering the constructs are not given here, but are available from M.L.P. on request. All constructs were engineered such that the ‘defined’ amino acid regions immediately followed insertion at the multi-cloning site, and the C-terminal amino acid was immediately followed by a stop codon. The plasmid constructs were created following standard protocols. The constructed plasmids were sequenced across their cloning sites and the entire insert to confirm correct orientation, sequence and reading frame. Each pair of candidate ‘bait’ (pGBKT7-insert) and candidate ‘prey’ (pGADT7-insert) constructs was co-transformed into the yeast host strain PCY2. The β-galactosidase activity was detected by the filter assay described previously.
Protein secondary structure and solvent accessibility prediction
The SABLE server  (http://sable.cchmc.org) was used to analyse the secondary structure and solvent accessibility of amelogenin M180. The SABLE server utilizes advanced machine learning protocols, evolutionary profiles and predicted RSA (relative solvent accessibility) to compute protein secondary structures . The SABLE server has been trained with a subset of 860 protein families derived from the Pfam protein family database. The prediction accuracy was estimated to be between 77.0 and 78.4% for the three-state classification on different control sets comprising 603 proteins with no homology with proteins included in the training . The prediction accuracy is continuously and independently evaluated to be 70–80% by the EVA web server (http://cubic.bioc.columbia.edu/eva/) .
Prediction of the disorder propensity of amelogenin M180
The IUPred server  (http://iupred.enzim.hu) was used to predict the disorder propensity of amelogenin M180. The IUPred server recognizes disorder regions from the amino acid sequence based on the estimated pairwise energy content . The underlying assumption of the IUPred server is that intrinsically unstructured proteins adopt no stable structure because their amino acid composition does not allow sufficient favourable interactions to form . The IUPred server predicted experimentally determined disorder at approx. 90% accuracy at CASP6 .
The three-dimensional structure model of CD63 EC2 (extracellular domain 2)
The Protein Homology/analogY Recognition Engine (Phyre) is a successor of 3D-PSSM (http://www.sbg.bio.ic.ac.uk/phyre), which is a web-based method for protein fold recognition using one- and three-dimensional sequence profiles coupled with secondary structure and solvation potential information. The prediction performance is continuously and independently evaluated by EVA. In the present study, Phyre was used to model the three-dimensional structure of the EC2 of CD63, and the Accelrys® Discovery Studio Visualizer was used to generate presentations of the three-dimensional structure.
RESULTS
Determination of protein–protein-interaction regions with the Y2H assay
Previously, Y2H screening of a mouse embryo [E17 (embryonic day 17)] cDNA library in conjunction with mouse amelogenin and rat ameloblastin as baits identified the transmembrane protein CD63 as an interacting partner of both enamel proteins . Subsequently, these protein–protein interactions were confirmed using the Y2H filter assay to assess protein interactions between human CD63 and (i) mouse amelogenin, (ii) mouse enamelin, and (iii) rat ameloblastin . These previously identified protein–protein interactions were used in this study as positive controls (Table 2, rows b–d). Tompkins et al.  have identified the interaction between mouse amelogenin M59 and mouse LAMP1 by affinity pull-down and far-Western assays. In the present study, we used a Y2H assay to demonstrate the interaction of mouse full-length amelogenin M180 with human LAMP1 (Table 2, row e).
Tetraspanin CD63 has two extracellular domains and three intracellular domains, with both the N- and C-termini located within the cytoplasm. The extracellular domains are named EC1 and EC2 and are defined as amino acids 33–51 and 103–203 respectively, based on the 238 amino acids in both mouse and human full-length CD63 protein (using the amino acid sequences and numbering identified by GenBank® accession numbers NP_031679.1 and NP_001771.1 respectively). The binding of CD63-(166–205) to amelogenin has been reported previously . In the present study, amino acids 103–205 from the human CD63 protein were used to establish that amelogenin, ameloblastin and enamelin proteins each interacted with CD63-(103–205) (Table 2, rows f–h, and compared with negative and positive controls at rows a and z). Subsequently amino acids 165–205 from human CD63 were subcloned into pGADT7 and used to confirm a strong interaction of CD63-(165–205) with full-length amelogenin (Table 2, row i).
The amelogenin isoform M59, also known as LRAP (leucine-rich amelogenin peptide) , and M180 interact with LAMP1 (Table 2, row e). Since M59 is a partial subset of the amino acid residues contained within the larger M180 protein , we sought to identify the possible regions responsible for the amelogenin–LAMP1 interaction. We cloned the amelogenin cDNA fragments corresponding to exons 3 and 5 (amino acids 3–33) or the amino acids corresponding to exon 6D (residues 155–179) into the ‘bait’ vector (pGBKT7). We cloned the human LAMP1 cDNA fragments corresponding to amino acids 1–121, 95–251, 226–361 or 342–386 into the ‘prey’ vector (pGADT7). The C-terminal region of LAMP1 beyond amino acid 386 was not investigated because this region is not exposed to the extracellular environment. The β-galactosidase activity from the Y2H assay suggests that amelogenin-(155–179), as defined by exon 6D, is entirely responsible for the interaction of amelogenin with LAMP1 (Table 2, rows o–s compared with rows j–n). The two regions of LAMP1 showing the strongest interaction with amelogenin-(155–179) were LAMP1-(95–251) and -(226–361) (Table 2, rows p and q).
Fine mapping of amelogenin-(155–179) was performed by dividing further the domain between Trp168 and Pro169. This site was chosen since it corresponds to the primary proteolytic cleavage site responsible for the removal of the C-terminal teleopeptide from the full-length amelogenin . Amelogenin-(155–168) and -(169–179) were tested against the following peptide regions: (i) the overlapping region of LAMP1 (amino acids 226–251) that showed the strongest interactions with amelogenin (Table 2, rows p and q); (ii) the highly homologous region of human LAMP2 defined by amino acids 227–259; and (iii) human CD63-(165–205). LAMP2 is predicted to exhibit a functional redundancy in vivo with LAMP1, as suggested by the mouse gene knockout study [29,30]. The results from the Y2H assay indicate that amelogenin-(155–168) was entirely responsible for the interaction with LAMP1-(226–251), LAMP2-(227–259) and CD63-(165–205) (Table 2, rows t–v, when compared with rows w–y). The assays in Table 2 were performed at various times, but always with the same positive and negative controls. A single Y2H filter assay highlighting these regions is shown (Figure 2A).
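The residue bookkeeping behind this fine mapping can be sketched in a few lines of Python. This is only an illustration, not part of the cloning procedure: the helper names `overlap` and `split_fragment` are hypothetical, and the fragment sequence is assembled from the residue strings quoted elsewhere in this paper (P155LSPILPELPLEAW168 and P169ATDKTKREEVD180), assuming that amelogenin-(155–179) simply lacks the final Asp180.

```python
def overlap(a, b):
    """Return the inclusive residue range shared by two inclusive ranges, or None."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


def split_fragment(seq, first_residue, last_of_left):
    """Split a fragment after residue `last_of_left` (1-based protein numbering).

    `first_residue` is the protein-level number of the first residue in `seq`.
    """
    cut = last_of_left - first_residue + 1
    if not 0 < cut < len(seq):
        raise ValueError("cut site lies outside the fragment")
    return seq[:cut], seq[cut:]


# LAMP1 prey fragments with the strongest interaction (Table 2, rows p and q).
shared = overlap((95, 251), (226, 361))

# Assumed sequence of amelogenin-(155-179): residues 155-168 followed by
# residues 169-179 (the quoted 169-180 peptide without its last residue).
AMEL_155_179 = "PLSPILPELPLEAW" + "PATDKTKREEVD"[:-1]

# Divide between Trp168 and Pro169.
left, right = split_fragment(AMEL_155_179, 155, 168)
print(shared, left, right)
```

Under these assumptions, the shared LAMP1 range is residues 226–251 and the left-hand amelogenin fragment is the 14-residue motif that ends at Trp168.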
Regions of LAMP1 (amino acids 226–251), LAMP2 (amino acids 227–259) and CD63 (amino acids 165–205) all interact with the same small motif of amelogenin (P155LSPILPELPLEAW168). To assess whether similarities exist among the binding regions of LAMP1, LAMP2 and CD63, we aligned these peptide regions with ClustalW using the Gonnet matrix. The overall alignment shows moderate similarity (∼27%) within the three regions (Figure 2B). However, limiting the comparison to residues 3–17 shows a ∼60% amino acid similarity.
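To make the notion of percentage similarity over a sub-range of an alignment concrete, the following Python sketch scores each alignment column as identical, similar or different. It is an assumption-laden illustration: the exact similarity measure used for Figure 2B is not stated here, the "similar" class is approximated with the strong conservation groups that ClustalW reports under the Gonnet matrix, and the three aligned strings are a hypothetical toy alignment rather than the LAMP1, LAMP2 and CD63 sequences.

```python
# ClustalW "strong" conservation groups (Gonnet PAM250 score > 0.5).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]


def column_class(column):
    """Classify one alignment column as gap, identical, similar or different."""
    residues = set(column)
    if "-" in residues:
        return "gap"
    if len(residues) == 1:
        return "identical"
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return "similar"
    return "different"


def percent_conserved(aligned, start=1, end=None):
    """Percentage of identical or similar columns between 1-based columns start..end."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    end = length if end is None else end
    columns = list(zip(*aligned))[start - 1:end]
    hits = sum(column_class(col) in ("identical", "similar") for col in columns)
    return 100.0 * hits / len(columns)


# Hypothetical toy alignment, NOT the alignment shown in Figure 2B.
toy = ["ACDEFLIK", "ACDQYLVR", "GCDEFMIK"]
print(percent_conserved(toy), percent_conserved(toy, 1, 2))
```

Restricting `start` and `end` mirrors the comparison of residues 3–17 described above: the same columns are scored, but the percentage is taken over the shorter window only.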
Bioinformatic analysis of the amelogenin-binding region
The three-dimensional molecular structure of amelogenin has not been determined. Our attempt to compute the three-dimensional molecular structure of amelogenin M180 failed to produce a reliable model. To obtain limited structural information for the amelogenin binding region (amino acids 155–168), we analysed the amelogenin protein sequence with various bioinformatics tools. The secondary structures of amelogenin M180 were predicted using SABLE (Figure 3A). We predict that there are two helices and one β-strand within the previously defined ‘A-domain’ (residues 1–42) . The amelogenin region from amino acids 42 to 180 is predicted to be in a coil structure with a probability of 80–92%.
The disorder probability for amelogenin M180 was predicted with the IUPred server (Figure 3B). The profiles for both the A-domain and the B-domain responsible for amelogenin self-assembly are distinguishable from the remaining regions of amelogenin. The A-domain (residues 1–42) is predicted to be structured, whereas the remainder of the protein (amino acids 43–180) is predicted to be largely disordered. The amelogenin region (amino acids 155–168) identified by the Y2H assay as responsible for amelogenin interactions with members of the LAMP family and CD63 corresponds largely to the previously identified self-assembly B-domain (residues 157–173) and is predicted to be disordered.
The hydrophilicity of amelogenin M180 was re-calculated with ProtScale using Hopp and Woods values and a window size of 5 (Figure 3C). With the exception of the short C-terminus (11 amino acids), the M180 protein is largely hydrophobic.
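The sliding-window calculation can be sketched as follows. This is a minimal illustration using the published Hopp and Woods hydrophilicity values and an unweighted moving average with a window of 5; any additional server options (for example edge weighting or normalization) are not stated here and are not modelled. The function name `window_profile` is hypothetical, and the two input peptides are the amelogenin residue strings quoted in this paper (residues 155–168 and 169–180), not the full M180 sequence.

```python
# Hopp and Woods hydrophilicity values (positive = hydrophilic).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def window_profile(seq, window=5):
    """Unweighted moving average of Hopp-Woods values; one score per window."""
    scores = [HOPP_WOODS[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


BINDING_MOTIF = "PLSPILPELPLEAW"   # amelogenin residues 155-168
C_TERMINAL = "PATDKTKREEVD"        # amelogenin residues 169-180

motif_profile = window_profile(BINDING_MOTIF)
tail_profile = window_profile(C_TERMINAL)

motif_mean = sum(motif_profile) / len(motif_profile)
tail_mean = sum(tail_profile) / len(tail_profile)
print(round(motif_mean, 2), round(tail_mean, 2))
```

With these values, the binding motif averages below zero (hydrophobic on this scale), whereas every window of the charged C-terminal peptide scores above zero, in line with the contrast drawn in the text.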
The relative solvent accessibility of amelogenin M180 was calculated with the SABLE server (Figure 3D). The majority of the buried regions of M180 are located within the A-domain, whereas, beyond the A-domain, the M180 protein is largely exposed. In short, the amelogenin M180-binding region (amino acids 155–168) is largely disordered, hydrophobic and accessible.
Three-dimensional structure modelling of the binding region in CD63
The molecular structure of the EC2 of tetraspanin CD81 has been determined using X-ray crystallography [32,33]. The atomic structure of CD81 EC2 was used as the template to build a three-dimensional model of CD63 EC2. The predicted three-dimensional structure of CD63 EC2 is presented in schematic backbone and solvent-accessible surface (Figure 4). As expected, the EC2 of CD63 is in a mushroom-like structure. The backbone of the amelogenin-binding region of CD63 coloured green is in a ‘7’ shape, and forms part of the ‘mushroom head’, as well as part of the ‘mushroom stalk’. The three residues V203LV205 are lacking in this model. These residues are very likely to be part of the transmembrane helix, as indicated by the computational analysis of the transmembrane region (results not shown).
We were unable to build a reliable three-dimensional molecular structural model for LAMP1 and LAMP2 because a suitable structure template, which is normally derived from experimentally determined structures, is currently unavailable. Both LAMP1 and LAMP2 belong to the LAMP family and share similar domain structures. The LAMP1-(226–251) and LAMP2-(227–259) regions are located within ‘Domain 2’, immediately adjacent to the hinge region between Domain 1 and Domain 2.
DISCUSSION
The identification that amelogenin binds to CD63  and LAMP1  led us to hypothesize that amelogenin may be rapidly taken up by the ameloblast cell through receptor-mediated endocytosis . The detailed characterization of the binding regions in amelogenin, CD63 and LAMP1 by the Y2H assay reveals that the region (P155LSPILPELPLEAW168) of amelogenin M180 is responsible for the interaction with CD63 through the domain specified by amino acids 165–205, LAMP1 through the domain specified by amino acids 226–251 and LAMP2 through the domain specified by amino acids 227–259. The binding region (residues 155–168) of amelogenin is hydrophobic, and is predicted to be largely disordered and accessible to the external environment. The region defined by amino acids 165–205 of CD63 is likely to be in a ‘7’ shape within the mushroom-like structure of CD63 EC2.
All of the data presented in Table 2 were derived from the Y2H assays. No significant ambiguity clouded our interpretation, as smaller regions of amelogenin, CD63 and LAMP1 were tested in multiple ways. However, we found that, for amelogenin, residues 155–179 interacted weakly with LAMP1-(1–121) (Table 2, row o). In this instance, there is a moderate similarity (∼38%) between LAMP1 amino acids 226–251 and 36–64 based on the ClustalW alignment. This weak homology is likely to explain the weak interaction noted. Fine dissection of regions of ameloblastin and enamelin responsible for binding to CD63, as well as the interaction of ameloblastin and enamelin with LAMP1/LAMP2, were not examined in the present study, and a separate investigation is needed to explore those interaction domains.
Attempts to characterize the structure of enamel matrix by X-ray crystallography have been made since the 1960s [10,34–39]. To date, the X-ray crystallographic structure of full-length amelogenin has yet to be solved. The secondary structures of amelogenin have been characterized experimentally since the 1980s by CD and FTIR (Fourier-transform infrared) spectroscopy methods [10,40,41]. However, the composition and location for each of the selected structural elements remain vague. Secondary-structure predictions of M180 suggested that there are two helices and one β-strand within the A-domain at the N-terminal region, whereas the remaining regions of amelogenin are in a random coil. The scarcity of highly ordered secondary structures (helix and β-strand), and the large random-coil region, imply that monomeric amelogenin M180 may not have a stable tertiary structure. Amelogenin M180 is rich in proline (24.4%) and glutamine (13.9%), and contains 13 Gln-Pro-Xaa repeats. This amino acid compositional bias and the short repeats are characteristic of ‘intrinsically unstructured proteins’, supporting our prediction for the propensity of mouse amelogenin M180 to exhibit a disordered structure. Our conclusion of ‘largely disordered’ is consistent with two previous reports on the structure of the enamel matrix: (i) “The organic material of dental enamel seems to be completely disordered at the molecular level”; and (ii) “70% of the fetal enamel protein chains exhibit rapid (≤10⁻⁶ s), nearly isotropic molecular motion”. Such disordered regions are quite often lacking from the electron-density maps obtained by X-ray crystallography, and are flexible and dynamic in solution. A popular hypothesis is that natively disordered proteins are malleable, leading to advantages with respect to functions such as regulation and binding of diverse ligands [39,45].
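The two sequence statistics cited above (percentage composition and the number of Gln-Pro-Xaa tripeptides) are straightforward to compute for any sequence. The Python sketch below shows one way to do so; the function names are hypothetical, the counting rule for repeats (every occurrence of Gln-Pro followed by any residue, including overlapping occurrences) is an assumption, and the inputs are the 14-residue binding motif quoted in this paper plus a made-up toy string, not the full M180 sequence.

```python
import re
from collections import Counter


def percent_composition(seq):
    """Percentage of each residue type present in `seq`."""
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def count_gln_pro_xaa(seq):
    """Count Gln-Pro-Xaa tripeptides (one-letter code: Q, P, any residue).

    A zero-width lookahead is used so that overlapping occurrences,
    as in QPQPL, are each counted.
    """
    return len(re.findall(r"(?=QP[A-Z])", seq))


BINDING_MOTIF = "PLSPILPELPLEAW"   # amelogenin residues 155-168
TOY_REPEATS = "QPHQPMQPQPL"        # hypothetical string for illustration only

motif_comp = percent_composition(BINDING_MOTIF)
print(round(motif_comp["P"], 1), motif_comp.get("Q", 0.0))
print(count_gln_pro_xaa(TOY_REPEATS), count_gln_pro_xaa(BINDING_MOTIF))
```

Applied to the full M180 sequence from the accession given in the Materials and methods section, the same functions would be expected to reproduce figures of the kind quoted above, provided the repeat-counting rule matches the one used for the reported count.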
Amelogenin is a largely hydrophobic molecule with a hydrophilic C-terminus; it has low solubility under physiological conditions and tends to form nanosphere assemblies. The amelogenin nanospheres assemble further to form microribbons. Nanospheres of purified bacterially produced recombinant amelogenin have been demonstrated in vitro, and similar nanostructures are also observed in vivo. Several models of amelogenin assemblies have been proposed [37,38,48]. The common feature among these models is the presence of a hydrophobic core surrounded by a hydrophilic shell. The hydrophilic shell is very likely to be composed of the highly charged C-terminal region (P169ATDKTKREEVD180), which has been referred to as the C-terminal teleopeptide of amelogenin [1,28]. This region defined by residues 169–180 is specifically cleaved shortly after secretion [28,49], and this proteolytic processing is essential for proper enamel formation. After this cleavage, the stability of the amelogenin assemblies is predicted to decrease as a result of a significant decrease in molecular charge and a concurrent sharp increase in hydrophobicity at the outer shells. This physical change may also initiate the disassembly of amelogenin nanospheres. After the removal of the teleopeptide region, the binding region of amelogenin (residues 155–168) would also be exposed at the surface of amelogenin nanospheres, and could be made available to the plasma membrane at the Tomes' processes.
The only available experimental structure for the tetraspanin family is the crystal structure of CD81 EC2, which has been resolved by Kitadokoro et al. [32,33]. Sequence analyses suggest that all tetraspanin EC2s share a mushroom-like structure, with a highly variable region embedded in the ‘mushroom head’. The crystal structures of CD81 EC2 have been used to model the structures of other tetraspanins: CD53, Q9V3R4 and CD82. The three-dimensional structure of the CD63 EC2 presented here was built with Phyre using the atomic structure of CD81 EC2 (PDB code 1G8Q) as the template. The ‘mushroom stalk’ of the EC2 is suggested to interact with other tetraspanins in the tetraspanin web. Based on our three-dimensional modelling data, amino acids 165–184 of the CD63 EC2 would be more available and accessible than the EC2 region defined by amino acids 185–205 for intermolecular interactions (e.g. interactions with amelogenin). As supported by the homology analysis (Figure 2B), we postulate that the tetraspanin region (V165PDSCCINVTVGCGINFNEK184) plays a central role in supporting a protein–protein interaction with amelogenin.
CD63, LAMP1 and LAMP2 are ubiquitously expressed and localize to the plasma membrane. They are also present in the membranes of the endosome/late endosome and lysosome [15,17]. A lysosomal targeting motif is present at the C-termini of CD63, LAMP1 and LAMP2 , suggesting that they all may employ a similar mechanism in their trafficking from the plasma membrane to the lysosome. Although CD63 and LAMP1 are expressed by ameloblasts at all stages of amelogenesis, CD63 is more highly expressed in late-secretory and post-secretory stages, whereas LAMP1 is more highly expressed in the early secretory stage . Protein interactions between the secreted enamel matrix proteins (such as amelogenin, ameloblastin and enamelin) and membrane-bound proteins (such as CD63, LAMP1 and LAMP2) probably occur at the ameloblast Tomes' processes. Such interactions may be required to establish short-term order of the forming matrix, and/or to mediate feedback signals to the transcriptional machinery of ameloblasts, and/or to quickly remove matrix protein debris during enamel biomineralization.
This work was supported by Grants DE006988, DE013045, DE013404 and DE014867 from the National Institutes of Health, National Institute of Dental and Craniofacial Research.
Abbreviations: AD, activation domain; EC, extracellular domain; LAMP, lysosome-associated membrane protein; LRAP, leucine-rich amelogenin peptide; Y2H, yeast two-hybrid
- © The Authors Journal compilation © 2007 Biochemical Society