The genetic element pSSVx from Sulfolobus islandicus, strain REY15/4, is a hybrid between a plasmid and a fusellovirus. This plasmid–virus hybrid infects several species of the hyperthermophilic acidophilic crenarchaeon Sulfolobus. The open reading frame orfc68 of pSSVx encodes a 7.7 kDa protein that does not show significant sequence homology with any protein with known three-dimensional structure. EMSA (electrophoretic mobility-shift assay) experiments, DNA footprinting and CD analyses indicate that recombinant C68, purified from Escherichia coli, binds to two different operator sites that are located upstream of its own promoter. The three-dimensional structure, solved by a single-wavelength anomalous diffraction experiment on a selenomethionine derivative, shows that the protein assumes a swapped-hairpin fold, which is a distinctive fold associated with a family of prokaryotic transcription factors, such as AbrB from Bacillus subtilis. Nevertheless, C68 constitutes a novel representative of this family because it shows several peculiar structural and functional features.
- plasmid–virus hybrid
- Sulfolobus islandicus
- transcription factor
Studies on crenarchaeal viruses have shown that they possess unusual morphotypes and quasi-orphan genome sequences, leading to the identification of seven new families of double-stranded DNA viruses, of which the Fuselloviridae are the best studied. The viral particles are approx. 60 nm×90 nm in size, are spindle- or lemon-shaped and have tail fibres that emanate from one end.
Two distinct genetic elements, SSV2 and pSSVx, belong to this viral family and coexist in the same Sulfolobus islandicus REY15/4 host, thus representing the only known two-virus system in Archaea. pSSVx is a satellite virus that generates virus particles with the help of SSV2-associated packaging mechanisms. In a previous study, we demonstrated that replication of SSV2 and pSSVx is induced during growth of the natural host REY15/4. This kind of physiological induction represents a unique feature in the interactions of crenarchaeal viruses with their hosts.
Analyses of the transcriptional patterns of pSSVx genes over the whole host growth cycle showed that only seven of the nine ORFs (open reading frames) identified were actively transcribed. Strikingly, the expression of these seven ORFs displayed significant variations in temporal control, as well as in the complexity of the transcriptional patterns, suggesting a close interdependence between gene expression and pSSVx replication. Two of these ORFs, orf154 and orf288, are conserved in the Fuselloviridae family , and encode proteins that are necessary for specific pSSVx DNA recognition and pre-packaging. The remaining pSSVx genome sequence is typically plasmidic, with the putative minimal replicon shared with members of the pRN plasmid family [5–9]. This conserved region includes orf60, orf892-RepA and orf76, which encode CopG (a copy number control protein), RepA (a replication initiator protein) and PlrA (a putative plasmid regulatory protein) respectively . orf91 encodes a protein with a putative zinc-finger motif that is similar to domains of nucleic acid-binding proteins in plant RNA viruses (carlaviruses), as well as transcriptional activators of late genes in both coliphages and relative satellites . orfc68 is the only reverse-oriented pSSVx gene and has no homologues in the crenarchaeal family of pRN plasmids . The protein encoded by this gene, namely C68, is a quasi-orphan protein since it shares significant homology only with other crenarchaeal proteins whose biological role is unknown. In fact, it is highly similar (42% identity) to ORFc56 of pSSVi , and there are homologues in Sulfolobus tokodaii , Sulfolobus acidocaldarius  and S. islandicus genomes .
Because C68 homologues are not present in other prokaryotic plasmids, it has been speculated that its acquisition by the pSSVx genome was crucial for viral-like function and the ability to respond to diverse viral stimuli. Interestingly, unlike all of the other pSSVx mRNAs, the orfc68 transcript gradually accumulates, even after the plasmid copy number of pSSVx has reached its plateau value and the life cycle of pSSVx is complete. This suggests that orfc68 expression is up-regulated and that it may regulate the pSSVx life cycle and/or the host–virus interaction process.
In the present paper, we report a functional and structural characterization of C68. Secondary-structure predictions pointed out a significant structural similarity with some prokaryotic transcription factors. Thus, to verify whether C68 can act as transcription factor, we analysed its ability to bind to its own promoter in vitro by spectroscopy and EMSAs (electrophoretic mobility-shift assays). X-ray diffraction studies were also performed to provide a detailed three-dimensional characterization. These data revealed that, although C68 exhibits a swapped-hairpin-like fold typical of a family of prokaryotic transcription factors, it presents several novel features at both functional and structural levels. The present study provides insights into the variety of quasi-orphan genes from crenarchaeal viruses, considering that only a few viral proteins have been studied in detail so far and in particular only a few three-dimensional structures are available in the PDB.
MATERIALS AND METHODS
Cloning of orfc68 and overexpression and purification of C68
The orfc68 gene was PCR-amplified from the plasmid pSSVrt using the primers 5′orfc68 (5′-ACGAACAAATTGTTTCATATGAGACCAGGCATACG-3′) and 3′orfc68 (5′-TTATCAAAAAAACACTCGAGTTAAATTTTTAATTCCTTCACC-3′).
The PCR product was digested with NdeI and XhoI and ligated into NdeI/XhoI-digested pET30. Overexpression of C68 in Escherichia coli BL21-CodonPlus®(DE3)-RIL cells was induced at a D600 of 0.7 by the addition of 0.5 mM IPTG (isopropyl β-D-thiogalactopyranoside) for 3 h. The cells from 1 litre of culture were resuspended in 20 ml of ice-cold lysis buffer (50 mM sodium phosphate buffer, 100 mM NaCl and 1 mM EDTA, pH 7.0) containing Complete™ protease inhibitor cocktail tablets (Roche). Lysis was performed as described by Cannio et al.
To purify C68, the cleared lysate was dialysed overnight against 20 mM sodium phosphate buffer (pH 7.5) and then loaded on to a 1 ml Resource S column (GE Healthcare) connected to an AKTA™ chromatography system. The elution was performed with a linear salt gradient from 0 to 500 mM NaCl in 20 mM sodium phosphate buffer (pH 7.5). Protein-containing fractions were pooled and loaded on to a Superdex 75 10/30 column (GE Healthcare) and the elution was carried out in 50 mM sodium phosphate buffer (pH 7.5) containing 200 mM KCl. LC (liquid chromatography)–MS was undertaken as described by D'Ambrosio et al. to verify the molecular mass of the recombinant protein.
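As a quick sanity check on the cloning strategy, the following minimal sketch confirms that each primer carries the recognition site of the enzyme used for digestion. The recognition sequences (CATATG for NdeI, CTCGAG for XhoI) are the standard ones; the names PRIMERS, SITES and sites_in are illustrative only and are not part of the original protocol.

```python
# Primer sequences as printed in the text (5'->3').
PRIMERS = {
    "5'orfc68": "ACGAACAAATTGTTTCATATGAGACCAGGCATACG",
    "3'orfc68": "TTATCAAAAAAACACTCGAGTTAAATTTTTAATTCCTTCACC",
}

# Standard recognition sequences of the two restriction enzymes.
SITES = {"NdeI": "CATATG", "XhoI": "CTCGAG"}


def sites_in(seq):
    """Return the sorted names of the enzymes whose site occurs in seq."""
    return sorted(name for name, site in SITES.items() if site in seq)


for primer_name, primer_seq in PRIMERS.items():
    print(primer_name, sites_in(primer_seq))
```

In this sketch the forward primer is expected to report only NdeI and the reverse primer only XhoI, which is what directional cloning into the NdeI/XhoI-digested vector requires.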
A seleno-L-methionine derivative of C68 (SeMetC68) was prepared by growing recombinant E. coli BL21(DE3)-RIL cells in 1 litre of M9 minimal medium containing 0.2% glucose, 1 mM MgSO4, 0.1 mM CaCl2, 50 μg/l kanamycin and 100 μg/l thiamine. After reaching a D600 of 0.7, an amino acid mixture (50 mg/l isoleucine, leucine and valine and 100 mg/l phenylalanine, threonine and lysine) was added to the culture. After equilibration, 60 mg/l seleno-L-methionine was added to the culture and induction was performed. The seleno-L-methionine derivative was purified using a protocol similar to that used for the native protein.
Purified C68 was analysed by gel-filtration chromatography connected to MiniDAWN Treos light-scattering system (Wyatt Technology) equipped with a QELS (quasi-elastic light scattering) module for mass value and hydrodynamic radius (Rh) measurements. A 500 μg sample (1 mg/ml) was loaded on a S75 10/30 column, equilibrated in 50 mM Tris/HCl (pH 8.0) and 150 mM NaCl. A constant flow rate of 0.5 ml/min was applied. Data were analysed using Astra 188.8.131.52 software (Wyatt Technology). In batch mode, C68 samples were prepared in 50 mM Tris/HCl (pH 8.0) and 150 mM NaCl. A stock solution of 3.5 mg/ml was filtered through a 0.02 μm Millex syringe driven filter unit (Millipore). After a further measure of protein concentration, samples were prepared at the following concentrations: 0.5, 1.0, 2.0 and 3.0 mg/ml.
Binding of C68 to target DNA sequences was measured by EMSA using PCR fragments or synthetic oligonucleotides as DNA probes. The 54 bp and 134 bp fragments (promoter of orfc68 and CopG/c68 intergenic region respectively, as described in Figure 1) were amplified by PCR from pSSVrt as template, using the primer pairs 5′orfc68pr (5′-GTCTCATAATAAACAATTTGTTCGT-3′)/3′orfc68pr (5′-GCTTTTCGTATGCGTAATATTTAAAA-3′) and 5′orfc68pr/3′int (5′-ACCATTTTGTCACCAGGTACAGTA-3′) respectively.
The EcoRI-digested labelled probes were generated by a fill-in reaction with Klenow polymerase (Roche) and [α-32P]dATP (PerkinElmer Life Science).
Thermal pre-incubation of the purified C68 protein was conducted for 15 min at 50 °C in assay buffer [25 mM Tris/HCl (pH 8.0), 50 mM KCl, 10 mM MgCl2, 1 mM DTT (dithiothreitol) and 5% (v/v) glycerol] in the presence of 500 ng of poly(dI-dC)·(dI-dC) as competitor. The labelled probes were added at a concentration of 0.030 μM (for the 54 bp promoter region) and 0.015 μM (for the 134 bp CopG/c68 intergenic region). The binding reactions were performed with increasing amounts of C68 (4–8 μM) for 15 min at 50 °C. In the displacement experiments, the binding reactions were performed with 7 or 8 μM C68 with the concurrent addition of increasing amounts of unlabelled probe (1:1, 1:10, 1:100, 1:1000 and 1:10000 ratio of labelled/unlabelled DNA) to the EMSA mixture.
In order to analyse the binding of C68 to each of the two interacting regions (IR1 and IR2) identified by foot-printing (see below), the complementary primers IR1A (5′-GATATACGCATACGAAAAGCAGC-3′) and IR1B (5′-GCTGCTTTTCGTATGCGTATATC-3′) as well as IR2A (5′-GATACGGGGTAATACCCAAAAGTG-3′) and IR2B (5′-CACTTTTGGGTATTACCCCGTATC-3′) were annealed. The EMSAs were then performed as described above and DNA–C68 complexes were fractionated by 10% native PAGE.
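Annealing works only if each oligonucleotide pair is fully complementary. The sketch below checks this for the four sequences listed above by comparing the reverse complement of each A strand with its B partner. The helper names reverse_complement and fully_complementary are illustrative and not part of the original protocol.

```python
# Translation table for Watson-Crick complements of unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Oligonucleotide pairs as printed in the text (all written 5'->3').
OLIGOS = {
    "IR1": ("GATATACGCATACGAAAAGCAGC", "GCTGCTTTTCGTATGCGTATATC"),
    "IR2": ("GATACGGGGTAATACCCAAAAGTG", "CACTTTTGGGTATTACCCCGTATC"),
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def fully_complementary(strand_a, strand_b):
    """True if strand_b is the exact reverse complement of strand_a."""
    return reverse_complement(strand_a) == strand_b


for site, (strand_a, strand_b) in OLIGOS.items():
    print(site, len(strand_a), fully_complementary(strand_a, strand_b))
```

With the sequences as printed, both pairs form blunt-ended duplexes of 23 bp (IR1) and 24 bp (IR2).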
Hydroxyl radical footprinting
The CopG/c68 intergenic region used as probe was PCR-amplified with the 3′int oligonucleotide labelled previously at the 5′ end with T4 polynucleotide kinase (Roche).
Aliquots of 1–6 pmol of C68, 100 ng of poly(dI-dC)·(dI-dC) and the radiolabelled PCR fragment (~500 c.p.s.) were combined in a 50 μl binding reaction in 50 mM Tris/HCl (pH 8.0), 80 mM KCl, 25 mM MgCl2 and 1 mM DTT and incubated for 20 min at 50 °C. The hydroxyl radical cleavage was performed by adding 16 μM iron/32 μM EDTA, 0.8 mM sodium ascorbate and 0.012% H2O2 to each binding-reaction tube and allowing the reaction to proceed for 10 min at room temperature (25 °C). The cleavage was stopped by adding 5 μl of stop solution (3.6 mM thiourea, 4.5 mM EDTA and 110 mM sodium acetate). The DNAs were then precipitated with 95% ethanol overnight at −20 °C. The sequencing reaction was performed with the fmol DNA Cycle Sequencing system (Promega) according to the instructions of the manufacturer. The samples were then loaded on a 6% polyacrylamide denaturing gel.
Far-UV CD spectra (260–190 nm) were recorded by using a Jasco J-715 spectropolarimeter, equipped with a PTC-423S/15 Peltier temperature controller. CD measurements were carried out using a 0.1-cm-pathlength cell and a protein concentration of 10 μM in a 10 mM Tris/HCl (pH 8.0) buffer.
For the titration with the DNA, concentration of the double-stranded oligonucleotides (IR1 and IR2) was varied from 0 to 10 μM and the CD signal was monitored at 213 nm. A sequence region located in the promoter of orf76 was used as negative control for non-specific binding. The baselines were corrected by subtracting buffer and DNA spectrum.
Thermal unfolding curves were recorded in the temperature mode at 213 nm from 20 °C to 105 °C. The chemical unfolding spectra of C68 (10 μM) were recorded in presence of increasing concentrations of guanidinium chloride (from 0 to 8 M) following incubation overnight in the presence of the denaturant. CD spectra were signal averaged over at least three independent scans and the baseline corrected by subtracting the buffer spectrum.
Preparation of crude extract of Sulfolobus cells and in vivo expression analyses
Western blot analysis of C68 was performed on protein extracts from cells of S. islandicus REY15/4. 250 ml of cell cultures grown as described previously  were resuspended in 5 ml of 25 mM Hepes buffer (pH 7.0) and then sonicated. Protein extracts (100 μg) were then run on a SDS/PAGE gel (15%), electroblotted on to PVDF membranes and detected immunologically using rabbit polyclonal antisera raised against C68 (Igtech). Antigen–antibody interactions were detected with horseradish-peroxidase-conjugated secondary antibodies and an enhanced chemiluminescence kit (GE Healthcare). To determine the relative abundances of C68 in extracts prepared from cells grown to different growth phases, aliquots (100 μg) of the extracts were run on SDS/PAGE together with increasing amounts of purified recombinant C68. The relative amount of C68 was quantified using a Gel-Doc phosphorimager and Quantity One software (Bio-Rad Laboratories). As a control for the relative abundance of extrachromosomal pSSVx, DNA was extracted from a similar amount of cells harvested at the same growth phases, as described above, and run on an agarose gel.
Crystallization and X-ray data collection
Crystallization experiments were carried out on native and SeMetC68, using the hanging-drop vapour-diffusion method . The search for initial crystallization conditions was performed using Hampton Research Crystal Screen kits I and II . Crystals of native protein were obtained at a protein concentration of 6 mg/ml with reservoir solution consisting of 1.6 M sodium formate and 0.1 M sodium acetate (pH 3.6) at 298 K, whereas SeMetC68 was crystallized at a protein concentration of 5 mg/ml with reservoir solution consisting of 1.4 M ammonium sulfate and 0.1 M sodium acetate (pH 5.4), at 298 K.
Both crystals belonged to the I23 space group and were found to contain two molecules within their asymmetric unit. Native and SAD (single-wavelength anomalous diffraction) datasets were collected at the Synchrotron source Elettra in Trieste at the temperature of 100 K, using a Mar CCD (charge-coupled device) detector. In both cases, since the crystallization solution was not suitable to provide cryoprotection, the crystals were washed rapidly in the reservoir solution containing 25% (v/v) glycerol and immediately flash-frozen in a nitrogen-gas stream at 100 K. Datasets were processed using the HKL crystallographic data reduction package (Denzo/Scalepack)  and statistics are given in Table 1.
Structure determination and refinement
The structure of C68 was determined by the SAD method. Determination and refinement of two selenium sites and phase calculations were carried out with the program SOLVE  using data between 50.0 and 2.8Å (1Å=0.1 nm) resolution, and the initial phases were improved further with RESOLVE . The model of C68 was built automatically by using RESOLVE  and optimized manually by using the program O . The structure was refined against the native dataset using CNS . The first cycles of the refinement were carried out with two-fold NCS (non-crystallographic symmetry) restraints with an energy barrier of 300 kcal/mol·Å2 (1 kcal=4.184 kJ). After Rfactor and Rfree reached 0.270 and 0.301 respectively, the NCS restraint was removed. The ordered water molecules were added automatically and checked individually. Each peak contoured at 3σ in the (Fo−Fc) maps was identified as a water molecule, provided that hydrogen bonds would be allowed between this site and the model. Many cycles of manual rebuilding, using the program O  and positional and temperature refinement using the program CNS , were necessary to reduce the crystallographic Rfactor and Rfree values (in the 50.0–2.8Å resolution range) to 0.258 and 0.288 respectively. Restrained individual B-factor refinement was not performed until the last cycle. Table 1 summarizes the refinement statistics. Co-ordinates and structure factors have been deposited with the PDB under accession code 3O27.
Expression, purification and quaternary-structure characterization of C68
C68 protein was purified to homogeneity in a two-step procedure with cation-exchange and gel-filtration chromatographies (Figure 2A). The purification yield was approx. 3 mg of protein per litre of culture. SDS/PAGE analysis showed that the purified protein was a single band with an expected molecular mass of ~7 kDa (Figure 2A, lane 5), which agrees with the theoretical molecular mass (7744 Da) and the mass determined by LC–MS (7742 Da).
To assess the quaternary structure of C68, a combination of size-exclusion chromatography, multi-angle light scattering and QELS was performed. These analyses showed that C68 has a molecular mass of 14470 Da (±0.9%), which indicates that the protein is a dimer (Figure 2B). DLS (dynamic light scattering) analyses were also performed to see whether the molecular size distribution of C68 changed at increasing protein concentrations. This analysis was carried out at 20 °C with protein concentrations ranging from 0.5 to 3.0 mg/ml. No significant variation in the Rh value (2.0±0.45 nm) was observed as a function of the protein concentration (Figure 2B, inset). These findings show that, under the experimental conditions used, no species larger than dimers exist in solution.
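The dimer assignment follows from dividing the mass measured by light scattering by the theoretical monomer mass. A minimal sketch of that arithmetic, using only the two values quoted in the text (the function name oligomeric_state is illustrative):

```python
MONOMER_MASS_DA = 7744.0    # theoretical mass of one C68 chain
MEASURED_MASS_DA = 14470.0  # mass obtained by light scattering


def oligomeric_state(measured_da, monomer_da):
    """Return the mass ratio and the nearest whole number of subunits."""
    ratio = measured_da / monomer_da
    return ratio, round(ratio)


ratio, subunits = oligomeric_state(MEASURED_MASS_DA, MONOMER_MASS_DA)
print(f"mass ratio = {ratio:.2f}, nearest oligomeric state = {subunits}")
```

The ratio falls just short of 2, which is why the measured mass is interpreted as a dimer rather than a monomer or a trimer.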
C68 binding to DNA
The DNA-binding activity of C68 was analysed by EMSA with DNA fragments derived from the region upstream of the orfc68 gene. A labelled fragment containing the orfc68 promoter region from −47 to +7 was used. The mobility of the DNA fragment was shifted upon the addition of increasing amounts of C68 (from 4 to 8 μM) in the presence of 40 ng/μl competitor DNA [poly(dI-dC)·(dI-dC)] (Figure 3A). Increasing amounts (10–10000-fold excess) of unlabelled specific DNA fragments gradually decreased the intensity of the shift (Figure 3B). Conversely, binding of C68 to its own promoter was not affected by the presence of increasing amounts of non-specific competitor DNA (polylinker region of the pUC28 vector) at up to a 2000-fold excess (Figure 3C). Furthermore, under the same conditions, C68 did not bind to a DNA fragment containing a 50 bp 5′-flanking region of the orf154/orf288 operon of the pSSVx genome (results not shown). These results demonstrate that C68 binds specifically to its own promoter. The dissociation constant of the C68–promoter interaction was calculated by incubating increasing amounts of protein with the regulatory region (results not shown) and analysing the intensity of the shifted complex by signal-to-noise densitometric scanning. The dissociation constant (~4 μM), defined as the protein concentration at which 50% of the DNA probe is bound, was calculated using GraphPad Prism software.
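To illustrate what an apparent dissociation constant of ~4 μM implies across the protein range used in the EMSAs, the sketch below evaluates a simple single-site hyperbolic binding model. This model is an assumption made here for illustration (protein in large excess over the probe, no co-operativity); it is not the fitting procedure used in GraphPad Prism, and the name fraction_bound is hypothetical.

```python
KD_UM = 4.0  # apparent dissociation constant quoted in the text (micromolar)


def fraction_bound(protein_um, kd_um=KD_UM):
    """Fraction of probe bound under a single-site hyperbolic model.

    Assumes the free protein concentration equals the total protein
    concentration, i.e. protein is in large excess over the DNA probe.
    """
    return protein_um / (kd_um + protein_um)


# Illustrative concentrations spanning the 4-8 micromolar range of the assay.
for conc in (4.0, 6.0, 8.0):
    print(f"{conc:.0f} uM C68 -> {fraction_bound(conc):.2f} of probe bound")
```

Under these assumptions half of the probe is bound at 4 μM and about two-thirds at 8 μM, so the assay range would sample the middle of the binding curve rather than saturation.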
Binding of C68 to a larger DNA region (from −127 to +7), including the orfc68 promoter and that of the adjacent orf60 gene, was also investigated. As shown in Figure 3(D), C68 forms two specific and distinct concentration-dependent complexes (B1 and B2), indicating that the interaction occurred at two distinct binding sites. At 6 μM C68, only the B1 complex was present, whereas at higher concentrations, an additional band (B2) was visible. A rather rapid transition in the formation of complex B2, at the expense of B1, was detected at 9 μM. The B1 species appeared to precede B2, thus indicating that formation of B2 is a two-step process and suggesting the existence of an apparent binding co-operativity. At the highest protein concentrations used, a third band (Figure 3D, labelled W) appeared. As only two specific binding sites were identified by footprinting (see below), this additional band probably results from non-specific C68–DNA interactions or from the formation of aggregates.
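The probe lengths can be recovered from the promoter coordinates. The sketch below assumes the usual numbering convention in which the transcription start site is +1 and there is no position 0; that convention is an assumption here, as is the helper name span_length.

```python
def span_length(start, end):
    """Number of positions between start and end inclusive.

    Assumes promoter-style numbering with no position 0, so a span that
    crosses from negative to positive coordinates is one shorter than
    plain integer arithmetic would give.
    """
    low, high = sorted((start, end))
    length = high - low + 1
    if low < 0 < high:
        length -= 1  # position 0 does not exist in this numbering
    return length


print(span_length(-47, 7))   # orfc68 promoter probe
print(span_length(-127, 7))  # larger probe including the orf60 promoter
```

Under this convention the two spans come to 54 and 134 positions, matching the fragment sizes given in Materials and Methods.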
The region immediately adjacent to IR1 might have a role in enhancing or stabilizing the binding of C68 to IR1. Indeed, when the IR1 site is located in the central part of the DNA sequence, the probe is completely saturated at 6 μM (Figure 3D), whereas when it is located at the border, C68 only binds the DNA partially (Figure 3A).
Control experiments performed with a ~130-bp-long region encompassing the promoter region of the orf154/orf288 operon showed no change in gel mobility (results not shown), demonstrating that C68 binds specifically to the orf60–orfc68 intergenic region.
To define the exact binding sites of C68, we performed a hydroxyl radical footprinting analysis of the −127 to +7 region. As shown in Figure 4, C68 protects two contiguous nucleotide stretches within this probe: the first site (IR1), extending from −31 to −47, is present in the orfc68 5′-flanking region and partially overlaps the BRE (B recognition element) sequence. The proximity of this binding site to crucial elements of its own promoter suggests that C68 has a direct auto-regulatory effect in vivo. The second site (IR2), which is clearly identified by the flanking hypersensitive nucleotides, is located in the region from −50 to −69 and is equidistant from the orfc68 and orf60 promoters. DNase I hyperreactivity indicates major local DNA deformation, such as bending or minor groove widening. Considering that C68 is a dimer in solution, its binding sites are expected to comprise inverted repeat sequences, as is the case for many transcription regulators. An imperfect palindrome is detected only in the IR1 site as shown in Figure 4. However, both binding sites contain a TATG(X/XX)TTTTC consensus sequence whose centres are separated by 21 bp.
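The consensus can be located in the IR1 and IR2 oligonucleotides listed in Materials and Methods with a short pattern search. In the sketch below, the TATG(X/XX)TTTTC consensus is expressed as a regular expression with a one- or two-nucleotide spacer; the search is run on the strand paired with IR1A or IR2A, written 3′→5′ from left to right. This orientation is simply where the pattern occurs in the sequences as printed and is not a statement about which strand C68 contacts. The names CONSENSUS and find_consensus are illustrative.

```python
import re

# TATG, then one or two arbitrary nucleotides, then TTTTC.
CONSENSUS = re.compile(r"TATG[ACGT]{1,2}TTTTC")
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Top strands (IR1A and IR2A) as printed in the text, 5'->3'.
TOP_STRANDS = {
    "IR1": "GATATACGCATACGAAAAGCAGC",
    "IR2": "GATACGGGGTAATACCCAAAAGTG",
}


def find_consensus(top_strand):
    """Search the paired strand, written 3'->5', for the consensus.

    Returns the matched text, or None if the pattern is absent.
    """
    bottom_3_to_5 = top_strand.translate(COMPLEMENT)  # complement, not reversed
    match = CONSENSUS.search(bottom_3_to_5)
    return match.group(0) if match else None


for site, top in TOP_STRANDS.items():
    print(site, find_consensus(top))
```

With these sequences the IR1 match has a single-nucleotide spacer and the IR2 match a two-nucleotide spacer, consistent with the (X/XX) notation of the consensus.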
Subsequently, we performed binding assays of single sites using double-stranded oligonucleotides containing either one of the binding sites identified by footprinting, all embedded in fragments of identical length and flanking sequence composition. In both cases, only one retarded band was observed and, at a concentration of 4–8 μM, C68 bound more efficiently (up to 3-fold) to IR1 than IR2 (Figure 5A).
Combining the results of footprinting with the EMSAs (Figures 3D and 5A), which show that: (i) C68 affinity towards the IR2 site is lower than that towards IR1; (ii) complex B2 is detectable only at high concentration of C68; (iii) the efficiency of the binding of C68 at each site is lower than that observed when IR1 and IR2 are located on the same DNA fragment (Figure 3D); and (iv) DNase I-hypersensitive sites adjacent to IR2 are present, we conclude that the B1 complex is formed mainly upon interaction of C68 at the palindromic IR1 site. Furthermore the interaction of C68 at IR1 might induce conformational deformation on the DNA sequence that makes the low-affinity and non-palindromic IR2 site more accessible for C68 binding.
Insight into structural features of C68 and its DNA-binding mode by spectroscopic analyses
Far-UV CD spectra show that C68 has predominantly a β-structure (Figure 5B) that is highly stable and was not significantly affected by exposure to increasing temperatures (results not shown). In addition, similarly to other thermophilic proteins, C68 was resistant to guanidinium chloride and had a single inflection point (Cm) that was centred at 3.5 M guanidinium chloride (results not shown).
The conformational change upon binding of C68 to IR1 and IR2 was studied further (Figures 5B and 5C). The CD spectra of the protein–DNA complexes showed a decrease in optical ellipticity in the spectral region at 210–230 nm compared with free protein (Figure 5B). The change in ellipticity increased with the amount of DNA added up to stoichiometrically equal amounts of protein and DNA, with a more pronounced effect upon binding to IR1. A transition from unordered regions to β-sheets was observed after the addition of both DNA sequences and was most noticeable for IR1 (Figure 5C), which agrees with the EMSA results. The DNA sequence derived from the region upstream of orf76, which encodes the PlrA protein from pSSVx, was used as negative control. Addition of this control DNA caused only a slight modification in the CD spectra, indicating that the changes in the secondary structure are sequence-specific (Figure 5C).
In vivo expression analysis of C68
Changes in the C68 expression levels during growth of the pSSVx natural host, S. islandicus REY15/4, were estimated by Western blot analysis. Host cells were harvested at middle (0.6 units at D600/ml) and late (1.3 units at D600/ml) growth, i.e. before and after induction of pSSVx replication respectively (30–38 h) (Figure 6A). The accumulation of C68 followed a trend similar to that observed for the transcript, with the highest amount detected at the late-exponential phase of growth (Figure 6B). Interestingly, C68 represented approx. 0.025% of the total proteins in cells at the late growth phase.
C68 was crystallized in the space group I23 with two molecules per asymmetric unit, termed A and B. The structure was solved by a SAD experiment on SeMetC68 and was refined to an Rfactor of 25.2% and an Rfree of 28.9%, using 4764 reflections in the resolution range 50.0–2.8Å. In general, monomer A was better defined in the electron-density maps; for this reason, it will be used subsequently for the structural comparison with other proteins. The first three N-terminal residues of both monomers and the protein regions 17–21 of monomer A and 16–23 of monomer B were not added to the final model, as a consequence of their poor definition in the maps. The refined structure has good geometry with RMSDs (root mean square deviations) of 0.008Å and 1.4° from ideal bond lengths and angles respectively. The average temperature factor (B) for all of the atoms was 43.2Å2. The stereochemical quality of the model was assessed by Procheck. The most favoured and additionally allowed regions of the Ramachandran plot contained 91.1% and 8.9% of non-glycine residues respectively. The refinement statistics are summarized in Table 1.
The two independent molecules in the asymmetric unit of C68 have some important structural differences. Indeed, the RMSD value calculated for the superimposition of the Cα atoms was 1.89Å. Such differences were observed at the N-terminus (residues 4–8) and C-terminus (residues 66–68), and at the loop region 16–23, and when these regions were excluded, the RMSD value fell to 0.73Å. These structural differences cause a diverse succession of the secondary-structural elements. Indeed the structure of the monomer A consists of five β-strands (β1, residues 10–11; β1*, residues 14–16; β2, residues 24–29; β3, residues 45–52; β4, residues 55–62), one α-helix (α1, residues 30–37) and one 310 helix (G1*, residues 64–67) (Figures 7 and 8), while monomer B lacks the secondary-structural elements β1, β1* and G1*. The absence of these secondary-structural elements in monomer B could reflect its worse definition in the electron-density maps with respect to monomer A.
The two monomers interact with each other through a swapped hairpin mechanism, generating a stable dimer that was identified by light-scattering and gel-filtration analyses. This dimer is characterized by a large β-sheet scaffold, composed of strands β1, β3, β4, β4′ and β3′ on one side and β1*, β2 and β2′ on the other side (Figure 7) (the prime symbol indicates secondary-structural elements of monomer B). There are extensive interactions between the two monomers, with a large buried surface area of 4029.8Å2. Dimer formation is stabilized by several intermolecular hydrogen bonds and a large hydrophobic core that mainly includes the side chains of residues belonging to β3, β4, β3′ and β4′ and, to a lesser extent, residues of all the other strands.
A structure similarity search of the C68 protein with the DALI server identified four proteins, characterized by a swapped-hairpin fold, with a Z-score between 5.7 and 3.1. These proteins are the N-terminal domains of AbrB [28,29], Abh [30,31] and SpoVT from Bacillus subtilis, as well as the E. coli chromosomal MazE [33,34]. Interestingly, all of these proteins are transcription factors, supporting the hypothesis that C68 may regulate transcription. Worthy of note is that C68 is the sole protein among these five transcription factors that contains only a DNA-binding domain. The other four proteins possess other structural domains that are involved in either protein oligomerization or binding to partner proteins [31–37].
Figure 8 shows the structure-based sequence alignment of C68 with these four similar proteins. Although all of the proteins share a common fold, they have important differences in structural details, which is expected because of the low sequence identity (23.8% for C68/AbrB, 21.4% for C68/Abh, 24.6% for C68/SpoVT and 11.8% for C68/MazE). Generally speaking, the succession of the secondary-structural elements is conserved, even though the length of the single elements varies. The main differences are the presence in C68 of two additional structural elements, a β-strand at the N-terminus (β1*) and a 310 helix at the C-terminus (G1*), and of two large insertions, between strands β1* and β2 as well as β3 and β4 (Figure 8). These differences may be related to protein-specific structural requirements for the selection of the correct DNA target.
Previously reported modelling studies on the complex that AbrB forms with its cognate promoter revealed that strand β1, helix α1 and the loops connecting the secondary-structural elements β1 and β2 (LP1), as well as α1 and β3 (LP2), are the structural elements involved in DNA binding. The structural superposition of C68 with AbrB in this complex (Figure 9) highlighted that, even though C68 recognizes DNA utilizing the same structural elements as AbrB, some important differences are present. Indeed, the region mainly involved in DNA recognition, namely LP1, is much longer in C68 and contains small residues (three glycine and one serine) (Figure 8) that may allow for extension deep into the major groove of DNA (Figure 9). However, this region is completely disordered in our structure and could be stabilized upon DNA binding. Moreover, most of the residues which in AbrB are critical for DNA binding, namely Arg8, Lys9, Arg15, Arg23, Arg24 and Lys31, are not conserved in C68 (Figure 8), even if their role could be played by some charged residues (Arg14, Lys31 and Lys40) present in their close neighbourhood (Figure 9).
Although the mechanisms of gene expression regulation have been widely studied, relatively little is known about transcription in Archaea compared with Eukarya and Bacteria [35–37]. Whereas archaeal RNA polymerase and basal transcription factors are very similar to molecular components of eukaryal RNA polymerase II, their transcriptional regulators strongly resemble bacterial ones. This ‘hybrid’ transcriptional apparatus probably uses novel and uncharacterized mechanisms to regulate gene expression. Many proteins encoded by archaeal genetic elements do not have homologues in the data banks, thus studies of archaeal gene expression may identify new proteins and mechanisms. Because they have small genomes that are easy to study, viruses and plasmids represent attractive models to understand complex biological mechanisms, such as transcription, the cell cycle and regulation of gene expression. These mobile genetic elements drive the evolution of new species and/or novel functions because of their unique ability to reshuffle genetic material through horizontal gene transfer.
Bioinformatic studies indicated that C68, a protein encoded by the hybrid plasmid–virus pSSVx isolated from the hyperthermophilic archaeon S. islandicus REY15/4, was a putative transcription factor. In the present study, we have functionally and structurally analysed this novel protein to elucidate its function. C68 is a dimeric protein that shows a swapped-hairpin fold, which is a distinctive structural characteristic of a family of prokaryotic transcription factors, whose representative member is AbrB from Bacillus subtilis [28,29]. These transcription factors can act as transcriptional activators or repressors towards a plethora of genes. C68 binds to its own promoter at two different sites, IR1 and IR2, located upstream of the BRE sequences; both of these sites contain an TATG(X/XX)TTTTC consensus sequence. In vivo expression analyses of the mRNA and protein levels indicate that C68 is up-regulated during the pSSVx life cycle and reaches its highest expression level in the late-stationary phase . Altogether, these data strongly suggest that C68 is a transcriptional activator that modulates its own expression. Analogous to previously characterized archaeal transcriptional activators, C68 may exert its function by facilitating the recruitment of TBP (TATA-box-binding protein) and/or stabilizing the PIC (pre-initiation complex) through binding sequences upstream of BRE [38–42]. Our data do not indicate whether C68 exerts its physiological function through binding to one or both sites. However, because the affinity for the IR1 site is higher than for IR2, and because binding to IR2 seems to be concentration-dependent, C68 binding at IR2 may only serve to enhance transcription activation. Alternatively, since IR2 is equidistant from the promoter region of orfc68 and orf60, binding to IR2 might regulate also the expression of the adjacent reverse-oriented orf60 gene.
So far, only a few members of the AbrB-like superfamily have been characterized functionally and structurally, such as AbrB [28,29,43,44], Abh [30,31] and SpoVT from B. subtilis [32,45], as well as the E. coli chromosomal MazE [33,34,46]. The first three proteins are transition state regulators that control genes involved in the transition between the active and the stationary growth phases of B. subtilis [28–32]. MazE functions as an antidote for the chromosomal addiction module MazE/MazF, which plays a role in bacterial cell death [33,34]. All of these transcription factors are multi-domain proteins that contain a swapped-hairpin N-terminal domain, involved in DNA binding, and either a C-terminal domain responsible for oligomerization (AbrB, Abh and SpoVT) or a C-terminal acidic extension responsible for binding to partner proteins (MazE). However, in both of these cases, the C-terminal region seems to regulate DNA binding. AbrB, Abh and SpoVT are tetrameric proteins [28–32] that interact with numerous DNA targets; these proteins may recognize a general DNA tertiary structure rather than a consensus sequence [43,47] and are therefore ‘promiscuous’ in the selection of diverse targets. In contrast, MazE auto-regulates its own expression through the formation of a hexameric MazE2–MazF4 complex that binds to specific palindromic sequences [29,33,46].
C68 presents several peculiar features with respect to the other members of the family: (i) it is a dimer and does not require a C-terminal region for exerting its physiological function, being self-consistent in the selection of DNA targets; (ii) it recognizes a specific DNA sequence; (iii) residues that in other proteins are responsible for DNA recognition are not conserved (Figure 8); (iv) the number of positively charged residues in DNA-binding regions is lower, as observed in MazE, with respect to that of AbrB, Abh and SpoVT (Figure 8 and see Supplementary Figure S1 at http://www.BiochemJ.org/bj/435/bj4350157add.htm); and (v) the putative DNA-binding regions contain only positively charged residues, similarly to MazE and differently from AbrB, Abh and SpoVT, which also contain negatively charged residues (Figure 8 and see Supplementary Figure S1). Altogether, these distinctive features show that C68 adopts a completely different DNA-binding mechanism, taking a special place in the family. It is tempting to speculate that the loss of promiscuity of C68 with respect to AbrB, Abh and SpoVT is related to the absence of the C-terminal domain, which, in the other members of the family, has a regulatory role. In our case, regulation of DNA binding could be associated with the longer LP1 loop. Although the structural data show that LP1 is completely disordered, CD analyses indicate that binding to a target DNA induces local folding, which may be attributable mainly to structural rearrangements of this loop. Other studies have shown that intrinsically disordered regions undergo structural transition towards folded forms upon binding to a specific target and that this ligand-induced structural transition may be a simple mechanism to regulate cellular processes. Combining the CD data with the footprinting results, we conclude that the interaction of C68 with its targets requires both local flexibility of the protein and DNA deformation upon C68 binding.
Altogether, these data show that C68 is a novel representative of the swapped-hairpin transcription factor superfamily. Because this protein is encoded by the pSSVx genetic element that is dispensable for host survival, it has probably lost structural features that are responsible for the pleiotropic behaviour of the AbrB-like members during its evolution. The distinctive structural characteristics and binding to a specific consensus sequence may restrict the function of this transcription factor to relatively few genes.
To verify whether C68 can act as a transcription factor towards other genes as well, we searched the S. islandicus genome for the C68 consensus sequences and identified a promoter of a DNA-binding protein located in a CRISPR (cluster of regularly interspaced palindromic repeats) locus that contains two binding sites strikingly similar to the IR1 and IR2 sequences. CRISPR and the associated protein genes (cas genes) are ubiquitous in Archaea and Bacteria, and these proteins may function similarly to an immune system by preventing the invasion of foreign genomic elements [49,50]. It seems reasonable that C68 may regulate the CRISPR locus; further studies to verify this hypothesis are currently underway. Interestingly, homologues of C68 have been found mostly in the S. islandicus genome. Because these species are characterized by the presence of diverse genetic elements, it seems plausible that C68 homologues may regulate CRISPR-mediated host–virus interactions.
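A consensus search of this kind can be illustrated with a minimal sketch. It is not the procedure used in the present study, which is not described at this level of detail; it assumes that TATG(X/XX)TTTTC denotes TATG, a spacer of one or two arbitrary nucleotides, and TTTTC, and that both strands are scanned. The names `CONSENSUS`, `reverse_complement` and `find_sites` are hypothetical, and the example sequence is a toy string, not a pSSVx or S. islandicus sequence.

```python
import re

# Assumed reading of TATG(X/XX)TTTTC: TATG, then 1 or 2 arbitrary bases, then TTTTC.
# The lookahead lets overlapping candidate sites be reported as well.
CONSENSUS = re.compile(r"(?=(TATG[ACGT]{1,2}TTTTC))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq):
    """Scan both strands for the assumed consensus.

    Returns a list of (strand, start, site) tuples, where start is the
    0-based position of the leftmost base of the site on the forward strand.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []

    # Forward strand: positions are reported directly.
    for m in CONSENSUS.finditer(seq):
        hits.append(("+", m.start(1), m.group(1)))

    # Reverse strand: scan the reverse complement, then map the match
    # back to forward-strand coordinates.
    for m in CONSENSUS.finditer(reverse_complement(seq)):
        site = m.group(1)
        hits.append(("-", n - (m.start(1) + len(site)), site))

    return sorted(hits, key=lambda h: h[1])


if __name__ == "__main__":
    toy = "GGTATGATTTTCCC"  # toy sequence for illustration only
    for strand, start, site in find_sites(toy):
        print(strand, start, site)
```

In practice, candidate sites found this way would still need to be filtered by their position relative to annotated promoters, as only sites upstream of a BRE/TATA region are likely to be relevant.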
Patrizia Contursi, Katia D'Ambrosio, Luciano Pirone, Emilia Pedone, Tiziana Aucelli and Giuseppina De Simone performed experiments; Patrizia Contursi, Katia D'Ambrosio, Emilia Pedone, Qunxin She, Giuseppina De Simone and Simonetta Bartolucci designed experiments; Patrizia Contursi, Katia D'Ambrosio, Emilia Pedone, Qunxin She, Giuseppina De Simone and Simonetta Bartolucci analysed data; Patrizia Contursi, Katia D'Ambrosio, Emilia Pedone and Giuseppina De Simone wrote the paper.
This work was grant-aided by the Università Federico II di Napoli in the framework of a programme named ‘short-term mobility’ for researchers, by the Danish Council for Independent Research: Technology and Production Sciences [grant number 274-07-0116 to Q.S.] and by the Ministero dell'Istruzione, dell'Università e della Ricerca Scientifica [grant number E61J100000200001].
The structural co-ordinates for C68 from the hybrid virus–plasmid pSSVx will appear in the PDB under accession code 3O27.
Abbreviations: BRE, B recognition element; CRISPR, cluster of regularly interspaced palindromic repeats; DLS, dynamic light scattering; DTT, dithiothreitol; EMSA, electrophoretic mobility-shift assay; IPTG, isopropyl β-D-thiogalactopyranoside; IR, interacting region; LC, liquid chromatography; NCS, non-crystallographic symmetry; ORF, open reading frame; QELS, quasi-elastic light scattering; Rh, hydrodynamic radius; RMSD, root mean square deviation; SAD, single-wavelength anomalous diffraction; SeMetC68, seleno-L-methionine derivative of C68
- © The Authors Journal compilation © 2011 Biochemical Society