(C) PLOS One

This story was originally published by PLOS One and is unaltered.

A structural classification of the variant surface glycoproteins of the African trypanosome [1]

['Sara Đaković', 'Division Of Structural Biology Of Infection', 'Immunity', 'German Cancer Research Center', 'Heidelberg', 'Johan P. Zeelen', 'Anastasia Gkeka', 'Division Of Immune Diversity', 'Monica Chandra', 'Monique Van Straaten']

Date: 2023-09

Long-term immune evasion by the African trypanosome is achieved through repetitive cycles of surface protein replacement with antigenically distinct versions of the dense Variant Surface Glycoprotein (VSG) coat. Thousands of VSG genes and pseudo-genes exist in the parasite genome that, together with genetic recombination mechanisms, allow for essentially unlimited immune escape from the adaptive immune system of the host. The diversity space of the "VSGnome" at the protein level was thought to be limited to a few related folds whose structures were determined more than 30 years ago. However, recent progress has shown that the VSGs possess significantly more architectural variation than had been appreciated. Here we combine experimental X-ray crystallography (presenting structures of N-terminal domains of coat proteins VSG11, VSG21, VSG545, VSG558, and VSG615) with deep-learning prediction using Alphafold to produce models of hundreds of VSG proteins. We classify the VSGnome into groups based on protein architecture and oligomerization state, contextualize recent bioinformatics clustering schemes, and extensively map VSG-diversity space. We demonstrate that in addition to the structural variability and post-translational modifications observed thus far, VSGs are also characterized by variations in oligomerization state and possess inherent flexibility and alternative conformations, lending additional variability to what is exposed to the immune system. Finally, these additional experimental structures and the hundreds of Alphafold predictions confirm that the molecular surfaces of the VSGs remain distinct from variant to variant, supporting the hypothesis that protein surface diversity is central to the process of antigenic variation used by this organism during infection.

The African trypanosome is a single-celled parasite that causes African Sleeping Sickness in humans and related diseases in many animals. It reproduces in the blood of an infected organism, completely exposed to the immune system, but survives due to its ability to repeatedly disguise itself with new surface coat proteins that fool the antibody response. The African trypanosome has thousands of these Variant Surface Glycoproteins (VSGs) in its genome to switch onto the parasite surface. Therefore, trypanosomiasis is centered on the antibody-VSG interaction and how it varies over time. At the most basic level, understanding the possibilities and constraints of VSG protein architecture is a critical foundation to understand the interaction of the pathogen with the host immune system. The work in this paper thoroughly characterizes this genomic collection of VSGs at the molecular level, classifying the coat proteins into related families based on several important criteria, thereby providing the groundwork for developing an understanding of how this organism is able to evade immunity and thrive during infection.

To address this issue, we use a collection of experimentally determined structures of VSGs (including many solved over the last few years by our group, with five previously unpublished structures included in this manuscript) as well as hundreds of predicted structures generated by the deep-learning system AlphaFold, which has proven powerful at creating accurate three-dimensional protein models from amino acid sequence alone [ 27 , 28 ]. The combined experimental and predicted data establish a structure-based classification scheme for the VSG proteins that is generally consistent with previous attempts to classify the VSGs, but that also places classification efforts on an architectural and functional foundation.

Thus, in the last few years, many notions regarding the VSGs have had to be reexamined. Concurrently, the diversity space of the VSGs has been broadened dramatically, raising the question of whether the thousands of possible VSG proteins in the genome could be organized into a coherent schema relating sequence, structure, and function. Along those lines, several groups have undertaken bioinformatic analysis of the “VSGnome” (the set of possible VSG proteins from the trypanosome genome). Most prominent have been two papers that have used sequence clustering algorithms to divide the VSGs into subclasses [ 25 , 26 ]. While these papers differed in aspects of methodology and resultant classification, there was broad agreement on the classification of VSGs that appeared to be much more highly related to each other. However, none of these efforts directly took protein architecture into account, likely due to the paucity of experimentally determined structures available at the time of the analyses.

The uniqueness of each VSG molecular surface led to the assumption that antigenic variation occurred exclusively through amino acid sequence divergence. Recent structures and mass spectrometric analyses of VSGs have shown that this is not solely the case. Many VSGs can be modified by post-translational O-linked glycosylation at the top of the molecule and this modification has been shown to be potently immunomodulatory [ 17 ]. Intriguingly, the same class that adopts the trimeric oligomerization is to-date the only class in which such O-linked glycosylation has been observed.

Furthermore, for the thirty years after the determination of the first VSG structures [ 19 – 23 ], it was assumed that all VSGs were homodimers. However, more recent structures and biochemical analyses have shown that two broad VSG classes can be distinguished by their oligomeric state: one class harboring exclusively dimers and the other class characterized by concentration-dependent trimers (existing in solution in either a stable monomeric or trimeric state, depending on the protein concentration [ 16 , 17 , 24 ]). At the extreme densities of packing in the two-dimensional trypanosome surface coat, it is possible that all VSGs of this class will exist in the trimeric state, although no high-resolution imaging of the parasite coat that could establish this has been reported.

Therefore, at the heart of antigenic variation in the African trypanosome are the NTDs of the VSGs. These large subdomains are elongated folds centered on a three-helix bundle scaffold. The three-helix bundle is a ubiquitous fold harbored in proteins with diverse activities, where the structural elements specific for differing functions are often coded by sequences inserted between the helices or at the termini of the scaffold. All VSGs studied to date are so structured, possessing subdomains at spatially opposite ends of the bundle: a top lobe and bottom lobe. The recent elucidation of additional VSG structures upturned the notion that all VSGs would possess highly similar protein folds to the initial structures determined in the 1990s. What is now evident from recent work is that the VSGs possess much more architectural and topological variation than had initially been appreciated [ 16 – 18 ].

The mature VSG proteins are attached to the cell surface by a GPI-anchor, and consist of two main regions: (1) a large, N-terminal domain (NTD) of roughly 300–400 amino acids that is most distal to the membrane and (2) a smaller, membrane-proximal C-terminal domain (CTD) spanning 80–120 amino acids beneath the NTD that harbors the GPI-anchor [ 11 ]. Multiple studies have shown that the CTD is minimally immunogenic [ 12 ] and quite possibly inaccessible to the immune system when it is part of the coat [ 12 , 13 ], which correlates well with it being much more highly conserved in sequence than the NTD [ 14 , 15 ]. The NTD is therefore presumed to be the most antigenic portion of the VSG. Consistent with this hypothesis, the NTD typically shows approximately 10–30% identity from variant to variant [ 14 ]. Most of the conserved residues occur in the architecturally common regions of the VSGs, whereas the molecular surfaces (as visualized by protein structures) show very little similarity [ 11 ].

Within the blood, the trypanosome population is continuously exposed to the immune system of the host, yet is able to thrive and persist. This feat is made possible by a highly optimized system of antigenic variation, in which the ~10 million, monoallelically expressed molecules of the Variant Surface Glycoprotein (VSG) coat undergo repeated cycles of “switching”, a process by which antigenically distinct VSGs are expressed at different times [ 7 – 9 ]. This creates a cyclic process of trypanosome growth to high parasitemia, immune response and clearance of the dominant VSG variants, and the subsequent growth of immune-escape variants expressing different VSGs. This process leaves the host in a perpetual state of infection that cannot be cleared without pharmacological intervention, typically leading to long-term morbidity and mortality [ 10 ]. Central to this process are the thousands of VSG genes and pseudogenes in the parasite genome that serve as the parasite’s extensive antigen repertoire.

African trypanosomiasis is a human and animal infectious disease caused by several species of protozoan parasites of the genus Trypanosoma [ 1 , 2 ]. These single-celled, eukaryotic parasites can live and reproduce extracellularly in the bloodstream of the host, and are transmitted by the tsetse fly vector (Glossina sp.) [ 3 ]. The geographical range of the tsetse fly in Africa correlates with the distribution of human trypanosomiasis, which at present covers a region of 8 million km² between 14 and 20 degrees latitude [ 4 ]. African trypanosomiasis has hampered Central Africa’s economic progress due to its impact on both human and livestock populations [ 5 , 6 ].

Model quality was assessed quantitatively using the root-mean-square deviation (RMSD) and the global distance test total score (GDT_TS) [ 43 ], as calculated by the AS2TS system ( http://as2ts.proteinmodel.org/ ) [ 44 ]. Parameters for the structural alignments and scoring were taken from the default suggestions to match those used for scoring in CASP ( https://proteopedia.org/wiki/index.php/Calculating_GDT_TS ); only one chain of each oligomer was used for the comparison.
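The two scores can be illustrated with a minimal sketch. It assumes that the model and reference C-alpha coordinates are already paired and superimposed, which is a simplification: the AS2TS server performs its own superposition, and a full GDT_TS calculation searches for the best superposition at each distance cutoff. The function names are hypothetical and are not part of AS2TS.

```python
import math


def per_residue_distances(model, reference):
    """Distances (in Å) between paired C-alpha coordinates.

    Assumption: both coordinate lists are already superimposed and
    list the same residues in the same order.
    """
    if len(model) != len(reference):
        raise ValueError("coordinate lists must have the same length")
    return [math.dist(a, b) for a, b in zip(model, reference)]


def rmsd(distances):
    """Root-mean-square deviation over all paired residues."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Mean percentage of residues within 1, 2, 4 and 8 Å of the reference."""
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)
```

A lower RMSD and a higher GDT_TS (maximum 100) both indicate closer agreement between a predicted model and the experimental structure. GDT_TS is less sensitive than RMSD to a few badly placed residues, because each residue only counts as inside or outside each cutoff.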

The multiple sequence alignment (MSA) was generated for each sequence by the MMseqs2 web server implemented in ColabFold, using the UniRef, PDB70, and environmental sequence databases [ 42 ]. The server aligns the input sequence against the databases and prepares the input files for the structure prediction, which ran locally on our Nvidia P-4000 GPU. The structural models were improved by recycling three times (the software default). In the final step, ColabFold performs a relaxation/energy minimization using AMBER (Assisted Model Building with Energy Refinement) [ 42 ]. ColabFold created 5 predictions for each sequence, ranked by pLDDT for monomers and by TM-score for oligomers.

For the structure predictions, LocalColabFold [ 42 ] version 1.3.0 was installed on a local computer (using https://github.com/YoshitakaMo/localcolabfold ). The LocalColabFold command line interface was used with arguments specifying an input FASTA file and an output directory, otherwise using the default options for structure prediction. For monomer prediction, “colabfold_batch --amber --templates --use-gpu-relax --num-recycle 3 input.fasta output_directory” was used. For oligomer prediction, the input FASTA file contained the VSG NTD sequence twice for class A and three times for class B, with the copies separated by colons, and “--model-type AlphaFold2-multimer-v2” was added to the colabfold_batch command line.
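The input preparation described above can be sketched as follows. This is an illustrative helper, not the authors' script: the function names and the example record name are hypothetical, while the flags (written here with double hyphens) and the colon-separated chain convention are those given in the text.

```python
# Number of chain copies per class, as described in the text:
# class A VSGs were modeled as dimers, class B VSGs as trimers.
COPIES_BY_CLASS = {"A": 2, "B": 3}


def oligomer_fasta(name, ntd_sequence, copies):
    """Return a FASTA record with the NTD sequence repeated `copies`
    times, the copies joined by colons (one copy gives a monomer record)."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return f">{name}\n{':'.join([ntd_sequence] * copies)}\n"


def colabfold_command(fasta_path, out_dir, multimer=False):
    """Build the colabfold_batch argument list quoted in the text."""
    cmd = [
        "colabfold_batch",
        "--amber",
        "--templates",
        "--use-gpu-relax",
        "--num-recycle", "3",
    ]
    if multimer:
        cmd += ["--model-type", "AlphaFold2-multimer-v2"]
    return cmd + [fasta_path, out_dir]
```

For example, `oligomer_fasta("VSG_example", sequence, COPIES_BY_CLASS["B"])` yields a single record whose sequence line holds three colon-separated copies, and the list returned by `colabfold_command(..., multimer=True)` could be passed to `subprocess.run` on a machine where LocalColabFold is installed.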

The Trypanosoma brucei brucei Lister 427 VSG sequences were obtained from a publicly accessible database at the Rockefeller University ( https://tryps.rockefeller.edu/ ). Structural prediction was performed only for the N-terminal domain sequences (spanning approximately 350 amino acids, depending on the VSG). As the signal peptide is not present in the mature VSG protein, the signal sequence predicted by SignalP 6.0 ( https://services.healthtech.dtu.dk/services/SignalP-6.0/ ) was removed. The N-terminal domain was taken to end at the end of the helix that connects the N-terminal bottom lobe with the C-terminal domain.
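As a minimal sketch of this trimming step, assume that the signal peptide length has been read from the SignalP 6.0 output and that the last NTD residue (the end of the connecting helix) has been identified separately. Both values are inputs here; the function name and the toy sequence in the usage note are hypothetical.

```python
def mature_ntd(precursor, signal_peptide_length, ntd_end):
    """Return the mature N-terminal domain sequence.

    precursor: full-length protein sequence, including the signal peptide.
    signal_peptide_length: number of N-terminal residues to remove
        (assumed to come from a signal peptide predictor).
    ntd_end: 1-based position, in precursor numbering, of the last
        NTD residue (assumed to be the end of the connecting helix).
    """
    if not 0 <= signal_peptide_length < ntd_end <= len(precursor):
        raise ValueError("positions are inconsistent with the sequence")
    return precursor[signal_peptide_length:ntd_end]
```

With a toy 22-residue sequence, a 5-residue signal peptide, and an NTD ending at residue 15, the call `mature_ntd(sequence, 5, 15)` returns residues 6 to 15 of the precursor.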

A dataset was collected at a wavelength of 1.0 Å at the Paul Scherrer Institut, Villigen. The VSG545 dataset was highly anisotropic, so the Staraniso server [ 41 ] was used to determine the anisotropic diffraction cutoff, and the modified output data file was used to solve the structure. The structure was solved by molecular replacement using a model of a dimer predicted with AlphaFold [ 27 ]. After model building and refinement using PHENIX and COOT, the final model contains a truncated dimer in the asymmetric unit, likely produced by further proteolysis in the crystallization drop. Final model statistics are shown in S1 Table. As with VSG21, VSG545 was found to be truncated, with the far C-terminal sequence corresponding to the lower lobe removed.

The protein was concentrated to 2 mg/ml and crystallized at 22°C by vapour diffusion using sitting drops formed by mixing a 1:1 volume ratio of the protein with an equilibration buffer consisting of 100 mM MES pH = 6.5, 22% PEG 8000 and 300 mM Li2SO4. Crystals appeared after about 8 weeks, and the same condition supplemented with 25% PEG 400 was used as a cryoprotectant for crystals flash-cooled in liquid nitrogen.

To generate the N-terminal domain, the concentrated protein was subjected to limited proteolytic digestion. VSG545 at a concentration of 2 mg/ml was mixed with endoproteinase LysC (New England Biolabs) at a 1:1200 LysC:VSG ratio and incubated for 1 hour at 37°C. The reaction was terminated by adding TLCK to a final concentration of 50 μg/ml. The protein was further purified (separating the NTD from the CTD) by size exclusion chromatography.

VSG545 cloning and verification were performed in the same manner as for VSG558 (except that a hygromycin concentration of 5 μg/ml was used for selection). For purification of VSG545, cells from a 2.4 L culture were pelleted and lysed with 20 ml 0.4 mM ZnCl2 containing a protease inhibitor cocktail (Roche cOmplete). After centrifugation (10,000 g, 10 min), the pellet was resuspended in 20 ml 10 mM sodium phosphate buffer, pH 8, containing protease inhibitors (42°C) and centrifuged (10,000 g, 10 min). The supernatant was passed through a 20 ml Q-sepharose Fast-flow column (Cytiva) equilibrated with 10 mM sodium phosphate buffer, pH 8. The flow-through was collected and concentrated. To remove protease inhibitors, the protein was purified on a Superdex 200 Increase 10/300 GL column (GE Healthcare) equilibrated in 10 mM HEPES/NaOH pH = 8.0, 150 mM NaCl. The fractions containing the VSG protein were pooled and concentrated.

A T. b. brucei strain stably expressing VSG21 was obtained as a kind gift of Dr. Hee-Sook Kim [ 39 ]. VSG21 was expressed and purified from trypanosomes according to the same protocol as VSG11. Full-length VSG21 at a concentration of 12.3 mg/ml was crystallized at 22°C by vapour diffusion using sitting drops formed by mixing a 1:1 volume ratio of the protein with an equilibration buffer consisting of 100 mM Tris/HCl pH = 7.0, 200 mM CaCl2 and 20% (W/V) PEG 3335. For data collection, crystals were soaked in 100 mM Tris/HCl pH = 7.0, 25% (V/V) ethylene glycol and 20% (W/V) PEG 3335 and flash-cooled in liquid nitrogen. A native dataset was collected at the Paul Scherrer Institut, Villigen. The structure was solved by molecular replacement in the PHENIX package using a model of a VSG21 dimer predicted by AlphaFold, followed by model optimization and refinement using PHENIX and COOT. Final model statistics are shown in S1 Table. VSG21 was found to be truncated, with the far C-terminal sequence corresponding to the lower lobe removed.

VSG558 was expressed and purified from trypanosomes in the same manner as VSG11. Purified full-length VSG558 was mixed with trypsin at a 1:50 trypsin:VSG ratio and incubated for 1 hour on ice, and the NTD was isolated by size exclusion chromatography on a Superdex 200 10/300 GL column equilibrated in 10 mM HEPES/NaOH pH = 8.0, 150 mM NaCl. After purification, the VSG558 NTD was concentrated to 5 mg/ml and crystallized at 22°C by vapour diffusion using hanging drops formed by mixing a 1:1 volume ratio of the protein with an equilibration buffer consisting of 100 mM citric acid/NaOH pH = 5.5 and 17.5% PEG 3350. The crystals appeared after one week, and the same condition supplemented with 25% PEG 400 or 25% MPD was used as a cryoprotectant for crystals flash-cooled in liquid nitrogen. The crystals diffracted to 1.74 Å, and data were collected at a wavelength of 1.0 Å at the Paul Scherrer Institut, Villigen. The structure was solved by molecular replacement using a model of a dimer predicted with AlphaFold [ 27 ], followed by model optimization and refinement using PHENIX and COOT. Final model statistics are shown in S1 Table.

Clones were initially screened by flow cytometry for loss of VSG2 expression using a monoclonal VSG2 WT antibody [ 30 ]. To determine the binding of antisera to live trypanosomes, 2 x 10⁶ parasites were collected and incubated with 200 μl FITC-conjugated VSG2 WT antisera (FITC conjugation kit, Abcam ab102884) (1:200) in cold HMI-9 without FBS for 10 min on ice. Cells were washed twice with cold HMI-9, resuspended in 200 μl cold HMI-9 and immediately analyzed with a Guava EasyCyte 4HT Flow Cytometer (Luminex). In a second step, negative clones were sequenced by isolating RNA using the RNeasy Mini Kit with on-column DNAse digestion (Qiagen), followed by cDNA synthesis with the Superscript IV first strand synthesis system (Thermo Fisher). The sequences were then amplified using Phusion High-Fidelity DNA Polymerase (New England Biolabs), a forward primer binding to the spliced leader sequence and a reverse primer binding to the VSG 3´ untranslated region. The final products were purified by gel extraction from a 1% gel with the NucleoSpin Gel and PCR clean-up kit (Macherey-Nagel) and verified by Sanger sequencing.

VSG558 was amplified from genomic DNA of T. brucei brucei strain Lister 427 VSG2-expressing cells. The plasmid used to generate VSG558-expressing trypanosomes was a modification of previous plasmids designed for integration into the trypanosome genome [ 30 ]. The plasmid was first linearized by EcoRV (New England Biolabs), and then transfected into VSG2-expressing cells (2T1): 10 μg of plasmid was mixed with 100 μl of 4 x 10⁷ cells in Tb-BSF buffer (90 mM Na2HPO4, pH 7.3, 5 mM KCl, 0.15 mM CaCl2, 50 mM HEPES, pH 7.3), using an AMAXA nucleofector (Lonza) program X-001, as previously described [ 29 ]. After 6 h, hygromycin was added to a concentration of 25 μg/ml, and single-cell clones were obtained by serial dilutions in 24-well plates and collected after 5 days.

The methylated VSG615 protein was crystallized at 22°C by vapour diffusion using hanging drops formed by mixing a 1:1 volume ratio of the protein with an equilibration buffer consisting of 23% (w/v) PEG 4000, 100 mM sodium cacodylate pH = 6.0, 10 mM ZnCl2. For data collection, crystals were soaked in the same buffer augmented to 20% glycerol and flash-cooled in liquid nitrogen. A native dataset was collected at the Paul Scherrer Institut, Villigen. The structure was solved by molecular replacement in the PHENIX package using a model of a VSG615 trimer predicted by AlphaFold (loop regions with a pLDDT below 50 were removed for the search model). The structure is characterized by a very high Wilson B factor (85.19 Å²), high atomic temperature factors, and correspondingly high disorder in many regions of the model, although the electron density for the O-linked sugars (a principal reason for pursuing the VSG615 structure) was clear. Final model statistics are shown in S1 Table.
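Trimming a predicted model by confidence can be sketched as below. AlphaFold writes the per-residue pLDDT into the B-factor column of its PDB output, so low-confidence atoms can be dropped by reading that column. This is a simplified illustration, not the authors' procedure: it removes every atom below the threshold rather than only loop regions, and the function name is hypothetical.

```python
def drop_low_confidence(pdb_text, min_plddt=50.0):
    """Remove ATOM/HETATM records whose pLDDT is below `min_plddt`.

    Assumption: the input is PDB-format text from AlphaFold, where the
    B-factor field (columns 61-66) holds the per-residue pLDDT.
    All other records (for example TER or END) are kept unchanged.
    """
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            plddt = float(line[60:66])  # fixed-width B-factor field
            if plddt < min_plddt:
                continue
        kept.append(line)
    return "\n".join(kept) + "\n"
```

The trimmed text could then be written to a file and supplied as the molecular replacement search model. A per-atom cutoff like this one can leave isolated residues behind, so inspecting the result before use would be sensible.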

To generate the NTD of VSG615, the concentrated protein was subjected to limited proteolytic digestion using trypsin. The VSG at a concentration of 1 mg/ml was mixed with trypsin (5 mg/ml) (Sigma Aldrich) at a 1:50 trypsin:VSG ratio and incubated for 3 hours on ice. The reaction was terminated by adding PMSF to a final concentration of 1 mM. The protein was further purified (separating the NTD from the CTD) by size exclusion chromatography on a HiLoad 16/600 Superdex 200 pg column (GE Healthcare) equilibrated in 20 mM HEPES/NaOH pH = 7.5, 150 mM NaCl. The fractions containing the NTD of VSG615 from size exclusion chromatography were concentrated to 10 mg/ml in a final volume of 500 μl. The lysine residues on the protein were subsequently methylated by reductive alkylation [ 40 ]. 10 μl of 1 M borane dimethylamine complex (DMAB) and 20 μl of 1 M formamide were added to the protein solution and mixed gently. The mixture was incubated for 2 hours in the dark at 4°C with rotation, and the entire process was repeated. Prior to overnight incubation, 5 μl of 1 M DMAB was added. To stop the reaction, 1 M Tris pH 7.5 was added to bring the reaction to a final volume of 1 ml. The buffer was exchanged to 20 mM HEPES pH 7.5, 150 mM NaCl by size exclusion chromatography on a Superdex 200 10/300 GL column. The fractions containing the methylated protein were further concentrated to 10 mg/ml for crystallization.

T. brucei brucei expressing VSG615 was obtained as a kind gift of Dr. Hee-Sook Kim [ 39 ] and was cultured at 37°C and 5% CO2 in HMI-9 media (PAN Biotech) supplemented with 10% fetal calf serum (Gibco), L-cysteine and ß-mercaptoethanol. The cells from a 4 L culture were pelleted and the VSG615 protein purified through modifications of previously published protocols [ 31 ]. The cells were lysed with 40 ml 0.2 mM ZnCl2. After centrifugation (10,000 g, 10 min), the pellet was resuspended in 30 ml 20 mM HEPES/NaOH pH = 7.5, 150 mM NaCl (42°C) and centrifuged (10,000 g, 10 min). The supernatant was passed through a 25 ml Q-sepharose Fast-flow column (GE Healthcare) equilibrated with 20 mM HEPES/NaOH pH = 7.5, 150 mM NaCl. The flow-through was collected and concentrated to a final concentration of 1 mg/ml.

Native VSG11 WT datasets were collected at a wavelength of 1.0 Å at the Paul Scherrer Institut, Villigen. For phasing, data from an iodine-soaked crystal were collected at 1.54 Å on a Rigaku X-ray generator with a DECTRIS PILATUS3 R detector. The structure was solved by single wavelength anomalous diffraction (SAD) using SHELX [ 32 ] and the HKL3000 suite [ 33 ]. The initial model was built using Arp/wARP [ 34 ], with PHENIX [ 35 ], COOT [ 36 ] and PDB_REDO [ 37 ] used for model optimization and refinement. That model was placed in a high-resolution VSG11 WT -Iodine dataset (collected at 1 Å at the SLS synchrotron; see S1 Table) by molecular replacement with PHASER [ 38 ], and the model was optimized and refined with PHENIX-REFINE [ 35 ] and with cycles of manual model building using COOT [ 36 ]. The structures of VSG11 WT -Oil, VSG11 WT -AS, and VSG11 N2C -18mer were solved by molecular replacement using the refined VSG11 WT -Iodine model with the PHASER package [ 38 ] of PHENIX, and the models were optimized and refined with PHENIX-REFINE [ 35 ] and with cycles of manual model building using COOT [ 36 ]. The structure of VSG11 N2C -18mer is characterized by a high Wilson B factor (66.92 Å²) and disorder in many regions of the model. Final statistics are shown in S1 Table.

VSG11 N2C -18mer crystals were grown at 22°C by vapor diffusion using hanging drops with a 1:1 volume ratio of 6 mg/ml protein to equilibration buffer containing 19% (w/v) PEG 2000MME, 0.2 M NaCl, 0.1 M MES pH 6.0. For cryoprotection, the crystals were transferred to the same buffer as used for the equilibration supplemented with 25% (v/v) glycerol and were flash-cooled in liquid nitrogen.

VSG11 WT -AS crystals were grown at 22°C by vapor diffusion using hanging drops with a 1:1 volume ratio of 6mg/ml protein to an equilibration buffer consisting of 0.1M sodium acetate pH 4.5 and 2M ammonium sulfate. For cryoprotection the crystals were transferred to the same buffer as that used for equilibration but supplemented with 25% v/v glycerol and were flash-cooled in liquid nitrogen.

VSG11 WT crystals containing only the N-terminal domain appeared after several days at 22°C using the hanging drop method with 6 mg/ml VSG11 (full-length) protein in a 1:1 volume ratio against 100 mM Tris/HCl pH = 7.5 1.6–1.75M sodium-potassium tartrate. CryoOil (MiTeGen) was used as cryoprotectant and the VSG11 WT -Oil crystals flash-cooled in liquid nitrogen. The loss of the CTD is presumed to have occurred during the crystallization stage. VSG11 WT -Iodine crystals soaked in 200 mM KI, 100 mM Tris/HCl pH 8.0 and 1.7 M NaKTartrate were flash-cooled directly in liquid nitrogen.

All VSG11 constructs were expressed in T. b. brucei cultured at 37°C and 5% CO2 in HMI-9 media (PAN Biotech) supplemented with 10% fetal calf serum (Gibco), L-cysteine and ß-mercaptoethanol. The cells from a 3.6 liter culture were pelleted and the VSG11 proteins purified through modifications of previously published protocols [ 31 ]. The cells were lysed with 40 ml 0.4 mM ZnCl2. After centrifugation (10,000 g, 10 min), the pellet was resuspended in 30 ml 20 mM HEPES/NaOH pH = 8.0, 150 mM NaCl (42°C) and centrifuged (10,000 g, 10 min). The supernatant was passed through a 25 ml Q-sepharose Fast-flow column (GE Healthcare) equilibrated with 20 mM HEPES/NaOH pH = 8.0, 150 mM NaCl. The flow-through was collected and concentrated. The protein was further purified on a HiLoad 16/600 Superdex 200 pg column (GE Healthcare) equilibrated in 10 mM HEPES/NaOH pH = 8.0, 150 mM NaCl. The fractions containing the VSG protein were pooled and concentrated.

Clones were initially screened by flow cytometry for loss of VSG2 expression using a monoclonal VSG2 WT antibody [ 30 ] and for gain of VSG11 expression with anti-VSG11 antisera. To determine the binding of antisera to live trypanosomes, 1 x 10⁶ parasites were collected and incubated with VSG2 WT antisera (1:4000) or VSG11 WT antisera (1:1000) together with Fc block (1:200, BD Pharmingen) in cold HMI-9 without FBS for 10 min at 4°C. Cells were washed once with cold HMI-9 and resuspended in 200 μl cold HMI-9 with rat anti-mouse IgM-FITC (1:500, Biolegend). After one wash with cold HMI-9, cells were resuspended in 150 μl HMI-9 and immediately analyzed with a FACSCalibur (BD Biosciences) and FlowJo software (v10). In a second step, clones were sequenced by isolating RNA using the RNeasy Mini Kit (Qiagen), followed by DNAse treatment with the TURBO DNA-free kit (Invitrogen) and cDNA synthesis with ProtoScript II First Strand cDNA Synthesis (New England Biolabs). The sequences were then amplified using Phusion High-Fidelity DNA Polymerase (New England Biolabs), a forward primer binding to the spliced leader sequence and a reverse primer binding to the VSG 3´ untranslated region. The final products were purified by gel extraction from a 1% gel with the NucleoSpin Gel and PCR clean-up kit (Macherey-Nagel) and sent for Sanger sequencing.

Plasmids used to generate VSG11 WT and VSG11 N2C are described in [ 17 ]. Plasmids were first linearized by EcoRV (New England Biolabs), and then transfected into VSG2-expressing cells (2T1): 10 μg of each plasmid was mixed with 100 μl of cells (at a concentration of between 2.5 x 10⁷ and 3 x 10⁷) in Tb-BSF buffer (90 mM Na2HPO4, pH 7.3, 5 mM KCl, 0.15 mM CaCl2, 50 mM HEPES, pH 7.3), using an AMAXA nucleofector (Lonza) program X-001, as previously described [ 29 ]. Blasticidin at a concentration of 100 μg/ml was added after 6 h, and single-cell clones were obtained by serial dilutions in 24-well plates and collected after 5 days.

Two constructs of VSG11 NTD were used in the structural studies in this manuscript (all VSGs, unless otherwise noted, are from the strain Lister 427). The first was the wild type sequence (VSG11 WT ) that (1) crystallized as a monomer in the asymmetric unit (with a crystallographic trimer) in sodium-potassium tartrate with two structures determined: VSG11 WT -Iodine, a crystal soaked in Na/K iodine diffracting to 1.27Å, and VSG11 WT -Oil, a crystal diffracting to 1.23 Å resolution cryo-cooled in oil, and (2) crystallized as two monomers in the asymmetric unit (not a dimer and with a crystallographic trimer–see Results below) in ammonium sulfate (diffracting to 1.75 Å, denoted VSG11 WT -AS). The second construct of VSG11 was a chimeric form consisting of the VSG11 WT NTD connected to the VSG2 CTD, denoted VSG11 N2C . This construct crystallized with 18 monomers of VSG11 N2C in an asymmetric unit comprised of six trimeric assemblies (denoted VSG11 N2C -18mer). VSG11 WT -18mer crystals were also obtained that diffracted to nearly 3Å resolution, but the best diffracting crystals were produced by the VSG11 N2C construct (2.6 Å resolution). Therefore, the presence of the VSG2 CTD was not required for the formation of the 18mer form, and since in both cases the CTDs were not present in the crystallized form, it is unlikely that the 18mer form is tied in any manner to the chimeric form.

Results

Overall classification of the VSGs based on structures

Altogether, as of the writing of this manuscript, there are fourteen published, experimentally determined protein structures of the VSG NTD from Trypanosoma brucei in hand: VSG1, VSG2, VSG3, VSG11, VSG13, VSG21, VSG397, VSG531, VSG545, VSG558, VSG615, VSG1954, VSGsur, and IlTat1.24. These fourteen structures not only map well to the bioinformatic clustering schemes published previously, but they better discriminate between them and put the classification schemes on an architectural foundation. We have sought to take these structures, with an eye to the bioinformatics work, and create what we conclude is a more explanatory organizational scheme for the VSG proteins (Fig 4). To provide continuity with the previous classification schemes of Hutchinson [15] and Cross [25], we have preserved the A/B designation for classes. However, this has required a change in the meaning of subdivisions within class A to reflect insight from the protein structures (discussed below). Finally, this schema was then tested by modeling hundreds of VSG proteins with the deep learning system AlphaFold and comparing the resultant folds to the classes we created.

Fig 4. Structure-based VSG Classification Scheme. The two broad subclasses of VSG protein discussed in the main text are denoted at the top of the figure (class A and class B). Subclasses are shown in different colors with a collection of structures illustrating each class contained in a color-coordinated box matching the subclass name. Structures are drawn as in previous figures with the exception that O-linked glycans are colored red like N-linked glycans. Beneath each subclass name in gray are the clustering designations of the classes in previous papers discussed in the main text. Cyan backgrounds indicate structures solved prior to 2017. “N” and “C” refer to the N- and C-termini of the respective NTDs (colored by chain and placed nearby the terminal residue itself). As discussed in the text, VSG21 and VSG545 crystal structures are missing the bottom lobe sequence in the models due to truncation of the protein during crystallization. https://doi.org/10.1371/journal.pntd.0011621.g004

To begin, there are two broad “super-families” of VSG structures based on the topological arrangement of the bottom lobe in the primary sequence (classes A and B, Figs 4 and S6). All VSG NTD structures determined have a bottom lobe subdomain structure (with the exceptions of VSG21 and VSG545 in this report that had this domain proteolytically removed). In class A the bottom lobe residues are present at the C-terminal portion of the NTD sequence, directly following the final helix of the bundle. In contrast, in class B the bottom lobe residues are present at the N-terminal portion of the NTD sequence between the amino acids that form the first and second helices of the bundle (S6 Fig). Further cementing this broad division into two main super-families are two observations. 
The first is that all class A VSGs studied to date are found as dimers in solution and in protein crystals (the latter either as non-crystallographic dimers in the asymmetric unit or as monomers in the asymmetric unit where a crystallographic two-fold symmetry produces the dimer, [18,21,45]), whereas all studied class B VSGs are characterized by the same trimeric arrangement in the crystals (either through crystallographic or non-crystallographic symmetry, with monomers or trimers in the asymmetric unit, respectively), with biochemical evidence of monomer to trimer transitions based on protein concentration [24]. The second is that many of the class B VSGs are post-translationally modified by O-linked carbohydrates, whereas none of the class A VSGs have been found so modified. When comparing with previous efforts to classify the VSGs, our class A would correspond to class “A” in Cross [25], classes N1-N3/N5 in Weirather [26], and the older class A of Carrington [47]. Class B would correspond in these sources to classes B, N4, and B, respectively.

Within our class A superfamily are two large structural subclasses, A1 and A2 (Fig 4). In class A2 are found all the VSG structures that were solved prior to 2018 (IlTat1.24, VSG1, and VSG2), illustrating that the set of conclusions about VSG structure and function accepted for over a quarter century were based on a very limited subset of VSGs containing highly related protein folds. This subclass is characterized by a top lobe that contains all the cysteine disulfides in the NTD, a top lobe fold that is a hodge-podge of alpha helices and beta-strands, and N-linked glycan chains located in the bottom lobe of the NTD. In sharp contrast, subclass A1 VSGs are significantly longer due to the presence of a large beta-sheet subdomain for the top lobe (forming a beta-sandwich in the dimer).
Further distinguishing A1 from A2 are the distributed nature of the disulfide bonds (present throughout the length of the VSG in A1), the presence of a “middle lobe” of secondary structure straddling the beginning of the three-helix bundle, and the location of the N-linked sugar(s) just below the beta-sandwich top lobe. This dramatically different arrangement of structural elements is reflected in the differing positions of the N-terminus of the NTD, namely toward the middle of the VSG fold in A1 but at the very top of the fold in A2. Additionally, the folds of the A1 VSGs from experimental and predicted structures (see below) suggest that this subclass can be further subdivided into three groups based on the size, conformation, and twist of the top lobe beta-sheet and the width of the space between the three-helix bundles in the dimer. These subclasses of A1 were denoted in previous sequence-clustering classification systems as the separate classes A1 and A3 [25] and N1, N3, and N5 [26]. However, all these subclasses are structurally similar to each other in the manners described above while differing markedly from the A2/N2 classes. Therefore, we considered it better to combine A1/A3 and N1/N3/N5 into a single class, A1 (a family contrasted to A2), and then subdivide them within A1: A1a, A1b, A1c (Fig 4). In this subdivision, we split the A3 class from Cross into A1b and A1c (which correspond to classes N5 and N3 in Weirather, respectively) based on differences in the top lobe architecture (e.g., the smaller top lobe beta-sandwich in A1c compared to A1b and the presence of a second beta-sheet over the middle lobe in A1c that is not present in A1b). Finally, Cross et al. [25] divide the class B VSGs into two subgroups, whereas Weirather et al. [26] do not subdivide their equivalent class, N4.
In contrast to the marked structural divergences between classes A1 and A2, and even the differences within the distinct subclasses of A1, we find no broad structural differences within the class B VSGs (examining features such as the protein fold, disulfides, N- or O-linked glycans, or oligomerization). However, three more subtle differences between members of the B class can be used to divide them into two subgroups, B1 and B2. The first two concern helical regions. First, one helix that exists in the top lobe of the B2 class is not generally present in the B1 class (S7A Fig). Second, one of the bundle helices in class B1 is disordered in places relative to the same helix in many class B2 members (S7B Fig). Third, in class B2 several VSGs possess an NTD with more amino acids. In the AlphaFold modelling discussed below, these additional residues are predicted to form both an extended loop in the disordered region of the helix and also longer loops in the top lobe (S7A Fig). Buttressing this subdivision of class B, when multiple B1 and B2 VSGs (from experimental and AlphaFold predicted models) are analyzed by structure (using the Dali Server [48]), these are divided into two classes consistent with the B1 and B2 groupings produced by sequence analysis (see the structural dendrogram in S7C Fig). We have therefore denoted a split in Fig 4 between the B1 and B2 classes.
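The correspondences and the top-level A/B rule described in this section can be summarized as a small lookup table. The following Python sketch is illustrative only and is not part of the authors' analysis. The names `SCHEME`, `OLIGOMER`, `top_level_class`, and `subclasses_for` are hypothetical, the residue ranges in the usage example are invented, and the A1a row (Cross A1, Weirather N1) is inferred by elimination from the text rather than stated outright. Cross's two class B subgroups are not named in this section, so both B1 and B2 are simply mapped to Cross "B".

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (first residue, last residue), hypothetical numbering

# Subclass -> designation in earlier sequence-clustering schemes, as described
# in the text. The A1a row is an inference by elimination, not a quoted fact.
SCHEME = {
    "A1a": {"cross": "A1", "weirather": "N1"},  # inferred by elimination
    "A1b": {"cross": "A3", "weirather": "N5"},
    "A1c": {"cross": "A3", "weirather": "N3"},
    "A2": {"cross": "A2", "weirather": "N2"},
    "B1": {"cross": "B", "weirather": "N4"},  # Cross subgroup names not given here
    "B2": {"cross": "B", "weirather": "N4"},
}

# Oligomerization observed so far for each top-level class (per the text).
OLIGOMER = {"A": "dimer", "B": "trimer"}


def top_level_class(bundle_helices: List[Span], bottom_lobe: Span) -> Optional[str]:
    """Assign class A or B from where the bottom lobe sits in the NTD sequence.

    Class A: bottom lobe directly follows the final helix of the bundle.
    Class B: bottom lobe lies between the first and second bundle helices.
    Returns None when neither arrangement applies.
    """
    helices = sorted(bundle_helices)
    lobe_start, lobe_end = bottom_lobe
    if lobe_start > helices[-1][1]:
        return "A"
    if len(helices) >= 2 and helices[0][1] < lobe_start and lobe_end < helices[1][0]:
        return "B"
    return None


def subclasses_for(scheme: str, label: str) -> List[str]:
    """List the subclasses of this paper that correspond to an older label."""
    return sorted(sub for sub, old in SCHEME.items() if old[scheme] == label)


# Usage with invented residue ranges, for illustration only.
class_a_like = top_level_class([(10, 40), (60, 90), (100, 130)], (140, 180))
class_b_like = top_level_class([(10, 40), (100, 130), (140, 170)], (50, 90))
print(class_a_like, OLIGOMER[class_a_like])
print(class_b_like, OLIGOMER[class_b_like])
print(subclasses_for("cross", "A3"))
print(subclasses_for("weirather", "N4"))
```

A flat dictionary was chosen here because the text defines the scheme as a set of named correspondences; the structural criteria for the finer subclasses (top lobe size, twist, helix disorder) are qualitative and are deliberately not encoded.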

[END]
---
[1] Url: https://journals.plos.org/plosntds/article?id=10.1371/journal.pntd.0011621

Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.
