(C) PLOS One
This story was originally published by PLOS One and is unaltered.
. . . . . . . . . .



Mycobacteria that cause tuberculosis have retained ancestrally acquired genes for the biosynthesis of chemically diverse terpene nucleosides [1]

Jacob A. Mayfield, Division of Rheumatology, Inflammation and Immunity, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America

Date: 2024-10

Mycobacterium tuberculosis (Mtb) releases the unusual terpene nucleoside 1-tuberculosinyladenosine (1-TbAd) to block lysosomal function and promote survival in human macrophages. Using conventional approaches, we found that genes Rv3377c and Rv3378c, but not Rv3376, were necessary for 1-TbAd biosynthesis. Here, we introduce linear models for mass spectrometry (limms) software as a next-generation lipidomics tool to study the essential functions of lipid biosynthetic enzymes on a whole-cell basis. Whole-cell lipid profiles analyzed with limms deepened the phenotypic landscape of comparative mass spectrometry experiments and identified a large family of approximately 100 terpene nucleoside metabolites downstream of Rv3378c. We validated the identity of previously unknown adenine-, adenosine-, and lipid-modified tuberculosinol-containing molecules using synthetic chemistry and collisional mass spectrometry, including comprehensive profiling of bacterial lipids that fragment to adenine. We tracked terpene nucleoside genotypes and lipid phenotypes among Mycobacterium tuberculosis complex (MTC) species that did or did not evolve to productively infect human or nonhuman mammals. Although 1-TbAd biosynthesis genes were thought to be restricted to the MTC, we identified the locus in unexpected species outside the MTC. Sequence analysis of the locus showed nucleotide usage characteristic of plasmids from plant-associated bacteria, clarifying the origin and timing of horizontal gene transfer to a pre-MTC progenitor. The data demonstrated a correlation between high-level terpene nucleoside biosynthesis and mycobacterial competence for human infection, and revealed 2 mechanisms of 1-TbAd biosynthesis loss. Overall, the selective gain and evolutionary retention of tuberculosinyl metabolites in modern species that cause human TB suggest a role in human TB disease, and the newly discovered molecules represent candidate disease-specific biomarkers.

Funding: This work was supported by the National Institutes of Health (U19 AI162584 to DBM and KR; R01 AI165573 to DBM; U19 AI1625598 to DRS; U19 AI162598, R01 AI146194 and DP2 AI164249 to SM and BTG). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Data Availability: All data relevant to the manuscript are provided within the paper and its Supporting Information files. S1 Data contains raw data and R code in the R Markdown document S1_Data.Rmd that together can be used to reproduce all analyses and figures. Users must modify the path statements in S1_Data.Rmd to point to the provided files and must also have all R packages and dependencies installed, including the limms package. S1 Data includes the following files:
README.txt: a copy of the data statement
S1_Data.Rmd: an R Markdown document of annotated R code and explanatory comments
S1_Data.html: an R Markdown report produced from S1_Data.Rmd
S1_Data_files: the graphical output of S1_Data.Rmd
012623_terpene_qPCR.csv: a raw data file of qPCR results from S8 Fig
012623_terpene_TF.csv: transcription factor overexpression data from S8 Fig
070524_AUC_TbAd_MgGAST.csv: measurements of TbAd with varied Mg (S9 Fig B–E)
altered_TbAd_fragments.csv: observed mass spec fragments from S7 Fig
MycoMassDB.csv: an iteration of a database of known Mtb lipids
phenoDKOp.csv: the covariate key for the samples in the Rv3377-8c experiment
phenoKO3X.csv: the covariate key for the samples in the Rv3378c experiment
phenoMtbC3.csv: the covariate key for the samples in the MTC experiment
xsetDKOp: xcms object of aligned mass spec peaks for the Rv3377-8c experiment
xset3X: xcms object of aligned mass spec peaks for the Rv3378c experiment
xsetStrains_061521: xcms object of aligned mass spec peaks for the MTC experiment
plKO3X.snrc.csv: the mass spec peak list for the Rv3378c experiment
outtree_noeuds.nwk: nearest-neighbor orthogroup tree from Fig 5C
phyliptree.phy: NCBI taxonomy tree from Fig 5A
RBH_summary.csv: reciprocal BLAST hit matrix of MTC strains (Fig 5C)
Rv3378del_RNA_covar.csv: the covariate key for Rv3377-8c transcriptomics
Rv3378del_RNAseq.csv: read counts for Rv3377-8c transcriptomics
TbAd_AUC.csv: area under the curve for TbAd in Fig 1B
TbAd_deriv_mass.R: R list of terpene nucleoside masses
terpene_functions.R: bespoke R functions used in these analyses
terpene_OD600.csv: OD600 measurements for strains (S1 Fig G)
Mayfield_S5Fig_data.xlsx: raw data and calculations for standard addition (S5 Fig B and C)
S1_raw_images.pdf: uncropped gel images from S1 Fig (includes irrelevant lanes)

The R limms package is available at https://github.com/jamayfie/limms. The limms package vignette, an R Markdown report with detailed descriptions of all limms functions, arguments, and design considerations and with examples of how to use each function, is provided in S2 Data as LIMMS_vignette.html. The limms package includes data used for working examples in the vignette and help pages, including S1 Table in the manuscript. Raw RNAseq data are available through NCBI as BioProject accession number PRJNA1146031.

Copyright: © 2024 Mayfield et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Our central goals were to determine the origin, evolutionary timing, transfer mechanism, and biochemical outputs of the known tuberculosinyl biosynthetic locus in mycobacteria. However, the complexity of mass spectrometry-based experiments comparing multiple strains and species required more advanced bioinformatic tools for comparative metabolomics. The output of a modern mass spectrometer is a list of ion masses, retention times, and peak intensities that can exceed 10,000 multidimensional data points (molecular events). Hence, comparative experiments incur a substantial multiple hypothesis testing penalty. Further, peak-finding algorithms are prone to artifacts for events near the threshold of detection, and these artifacts distort statistical testing by introducing intermittent zero values. Comparisons such as nested samples, paired sample analysis, dose responses, and time courses have high discovery potential but are best analyzed using contrast-based methods instead of pairwise testing. Extending the first-generation comparative lipidomics platform for two-way analysis [4], here we introduce limms as a second-generation profiling tool for differential abundance analysis using flexible contrast-based comparisons, linear models, and Bayesian shrinkage of variance [18]. Using limms, we identified an unexpected and large family of approximately 100 previously unknown tuberculosinyl compounds. Further, by combining sequence analysis and chemotyping, we identified the likely origin and timing of horizontal transfer of the locus. This analysis revealed that, on an evolutionary time scale, the gain of constitutively high 1-TbAd biosynthesis correlated with acquisition of the capacity to cause human TB.
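Bayesian shrinkage of variance can be illustrated with a moderated two-group t statistic of the empirical Bayes kind popularized by the limma package. In this approach, each event's sample variance is pulled toward a prior variance before the t statistic is formed. The Python sketch below is not the limms implementation. The prior variance `s0_sq` and prior degrees of freedom `d0` are hypothetical inputs that limma-style tools would estimate from all events, and the intensities are invented log2 values for a single event.

```python
import math
from statistics import mean, variance

def pooled_variance(a, b):
    """Pooled within-group sample variance and its residual degrees of freedom."""
    d = len(a) + len(b) - 2
    s2 = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / d
    return s2, d

def moderated_t(a, b, s0_sq, d0):
    """Two-group t statistic using an empirical Bayes (shrunken) variance.

    s0_sq and d0 are the prior variance and prior degrees of freedom. Tools in
    the limma family estimate them from all events; here they are hypothetical
    inputs supplied by the caller.
    """
    s2, d = pooled_variance(a, b)
    s2_post = (d0 * s0_sq + d * s2) / (d0 + d)  # shrink toward the prior variance
    se = math.sqrt(s2_post * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / se, d0 + d     # statistic and its degrees of freedom

# Hypothetical log2 intensities of one event in two strains (3 replicates each).
parent = [20.1, 20.3, 20.2]
mutant = [17.0, 17.1, 17.05]
t_mod, df = moderated_t(parent, mutant, s0_sq=0.25, d0=4.0)
t_ord, df_ord = moderated_t(parent, mutant, s0_sq=0.25, d0=0.0)  # d0 = 0 gives the ordinary t
```

Setting `d0=0` recovers the ordinary pooled t statistic. With a positive `d0`, an event whose replicates happen to agree very closely, as in the mutant group here, no longer receives an inflated statistic. This matters when thousands of events are tested at once.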

The 1-TbAd biosynthetic genes can influence mycobacterial survival in cells by inhibiting lysosomal acidification [9,10], which promotes pathogen survival in mouse macrophages [6,11] and in lungs during early infection in vivo [12]. These functional data align with the known chemical mechanism by which 1-TbAd acts as a weak base [10]. They also agree with data showing that Rv3378c or 1-TbAd has an essential role in blocking lysosomal maturation and autophagy, 2 cellular processes involved in escape from host killing [11,13,14]. However, gene silencing experiments with mixed Mtb strains suggest that the locus is not essential for infection [15]. One explanation consistent with all of these data is that tuberculosinyl metabolites are decisive for infection outcomes in certain circumstances, such as persistence through nutrient limitation in macrophages, which 1-TbAd was recently shown to influence [11]. Mice have limited ability to model the early survival of single bacteria, transmission, and persistence events that occur during human tuberculosis disease, which highlights the need for human data to understand possible roles of 1-TbAd in virulence. Mycobacterial pathogens competent to infect mammals appeared in the MTC through evolution from nonvirulent soil mycobacteria, and only a subset further disseminated among humans as epidemic TB disease. Therefore, we asked whether variations in 1-TbAd biosynthesis genes among MTC species, which occurred over the same time frame as acquisition of the capability for productive human infection, could offer clues to TB disease [16,17].

One recently discovered mycobacteria-restricted lipid is the lysosomotropic base 1-tuberculosinyladenosine (1-TbAd), which comprises >1% of total Mtb lipid [5]. The tandem genes Rv3377c and Rv3378c are responsible for 1-TbAd biosynthesis. Their atypical GC content and lack of orthologs suggested horizontal gene transfer from an undetermined source [6], a hypothesis later extended to include the adjacent gene Rv3376 [7]. Chemical and genetic investigations showed that Rv3378c encodes the tuberculosinyl transferase that generates 1-TbAd [5], the rearrangement product N6-TbAd [8], and the by-product isotuberculosinol [7]. Rv3377c is presumed to encode a synthase for the unusual halimane lipid tuberculosinol pyrophosphate [7], while Rv3376 encodes a haloacid dehalogenase ortholog with an unknown role. Only the function of Rv3378c has been directly determined, and the breadth of molecules made by this enzyme remains unknown. Further, the origin, regulation, and function of this putative locus among mycobacteria that vary in virulence and human tropism remain unknown.

Whereas most mycobacterial species are nonpathogenic or infect nonhuman hosts, Mycobacterium tuberculosis (Mtb) is an obligate human pathogen that causes lung disease worldwide, killing more than 1 million people per year. The lipid-rich mycobacterial envelope contributes to the global burden of tuberculosis (TB) disease as a major source of phenotypic variance and virulence factors. In addition to forming the primary barrier with the host, mycobacterial lipids carry out specific functions that induce cough [1], moderate immunity [2], and mediate antibiotic resistance [3]. Mass spectrometry has revealed thousands of mycobacterial lipids organized into 58 classes [4]. This emphasizes the extreme complexity of the evolved mycobacterial lipidome, but it also provides a path to new pathogen-shed diagnostics and drug targets.

Results

Targeted analysis of 1-TbAd biosynthesis gene functions

Given 1-TbAd's ability to block lysosome function in macrophages and to promote mycobacterial growth in macrophage culture [6,11] and in vivo [12], we sought to understand more about 1-TbAd production by testing the functions of all 3 biosynthetic genes using targeted knockouts. Because Rv3378c is known to be essential for 1-TbAd production [10], here we deleted Rv3376 and Rv3377c through gene replacement and also created a double mutant of Rv3377c and Rv3378c. Strains were validated through sequencing and RT-PCR (S1A–S1F Fig), and the deletions did not alter growth in 7H9 medium (S1G Fig). We complemented the Rv3378c and Rv3377c-Rv3378c double deletions, but all attempts to complement the Rv3376 or Rv3377c deletion strains failed. We hypothesized that non-native expression of these genes was genotoxic. Rv3377c and Rv3378c are thought to act sequentially, producing tuberculosinyl pyrophosphate and conjugating it to adenosine, respectively (Fig 1A, red) [5,7]. Targeted mass spectrometry detected 1-TbAd and its rearrangement product N6-TbAd in parental Mtb H37Rv strains. Here and for subsequent terpene nucleosides, TbAd ions are reported as [M+H]+. Rv3377c was indeed necessary for both TbAd forms (Fig 1B). Failure to complement the Rv3377c deletion meant that a second-site effect was not ruled out. However, we noted that isolates with 2 different loss-of-function alleles in Rv3377c were also defective in 1-TbAd production [10]. In contrast, deletion of Rv3376 reduced but did not eliminate 1-TbAd. This result rules out an essential biosynthetic function but is consistent with an accessory role (Figs 1B and S2). Nakano [19] demonstrated that Rv3376 has phosphatase activity, which might augment geranylgeranyl pyrophosphate pools (Fig 1A, green).
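Fig 1B quantifies these targeted measurements as the area under the extracted ion chromatogram within a retention time window. A minimal sketch of that integration follows. It assumes the chromatogram has already been extracted for the target m/z, and it uses the trapezoidal rule, a common choice but not necessarily the one used for Fig 1B. The function `xic_auc` and the example trace are hypothetical.

```python
def xic_auc(times, intensities, rt_min, rt_max):
    """Trapezoidal area under an extracted ion chromatogram between rt_min and rt_max.

    times must be sorted in ascending order; points outside the window are
    ignored, and no interpolation is done at the window edges.
    """
    pts = [(t, y) for t, y in zip(times, intensities) if rt_min <= t <= rt_max]
    area = 0.0
    for (t0, y0), (t1, y1) in zip(pts, pts[1:]):
        area += (t1 - t0) * (y0 + y1) / 2.0  # area of one trapezoid
    return area

# Hypothetical chromatogram: retention time (min) and ion intensity.
rt = [10.0, 10.1, 10.2, 10.3, 10.4]
signal = [0.0, 100.0, 200.0, 100.0, 0.0]
peak_area = xic_auc(rt, signal, 10.0, 10.4)
```

Applying the same function to a solvent blank over the identical window gives a background area. Fig 1B uses a background area of this kind as the measurement threshold.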


Fig 1. Engineered deletions of 1-TbAd biosynthesis genes reveal gene functions and greatly expand the lipid signature. (A) Schematic of the 1-TbAd biosynthetic pathway. (B) Area under the curve of extracted ion chromatograms measured 1-TbAd production by the parental Mtb strain (H37Rv), by single or two-gene knockouts, and by the Rv3378c deletion complemented with Rv3378c. A Benjamini–Hochberg adjusted p value is indicated only for significant pairwise t tests (*: p < 0.05, **: p < 0.01, ***: p < 0.001, ****: p < 0.0001). To indicate the measurement threshold, the peak area in the retention time window corresponding to Mtb H37Rv 1-TbAd [M+H]+ was also measured in intervening solvent blank samples. (C) Comparative metabolomics analysis showed genetic control of differentially abundant molecules. Positive mode mass spectrometry data were analyzed by comparing deletions in Rv3377c, Rv3378c, or both genes to the H37Rv parental strain. Differential abundance determined by t tests was compared with differential abundance determined by a linear model fit using limms. The number of significant events (p < 0.05 after Benjamini–Hochberg adjustment) that also changed more than 2-fold is indicated (blue rectangle). The most abundant 1- and N6-tuberculosinyladenosine peaks are flagged (red and orange circles, respectively). Using limms, events with similar patterns of change in all 3 comparisons were identified by F-test. Events with >4-fold decrease in all 3 mutants and p < 0.001 (Benjamini–Hochberg adjusted p value of the F-test) are shown in black. (D) An independent metabolomic comparison of the Rv3378c deletion with the H37Rv parent strain and with the Rv3378c deletion complemented with Rv3378c was analyzed for differentially abundant positive mode events. Significantly changed events were determined using t tests or limms.
The numbers of changed events (Benjamini–Hochberg adjusted p < 0.05 and 2-fold or greater change), the gene-dependent events (blue rectangle), and the most abundant 1- and N6-tuberculosinyladenosine peaks are indicated as in Fig 2A. The data in Fig 1B–1D can be found in S1 Data. https://doi.org/10.1371/journal.pbio.3002813.g001
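The significance criteria in this legend can be sketched as follows: a Benjamini–Hochberg adjusted p < 0.05 combined with at least a 2-fold change, plus the star notation in panel B. The sketch illustrates the standard Benjamini–Hochberg step-up adjustment and a simple filter on log2 fold change. It is not code from limms, and the function names and example values are hypothetical.

```python
import math

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def stars(q):
    """Star label for an adjusted p value, using the cutoffs in the legend."""
    for cutoff, label in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if q < cutoff:
            return label
    return ""

def changed_events(log2_fc, pvals, alpha=0.05, min_fold=2.0):
    """Indices of events with adjusted p < alpha and |fold change| >= min_fold."""
    q = benjamini_hochberg(pvals)
    threshold = math.log2(min_fold)
    return [i for i, (fc, qi) in enumerate(zip(log2_fc, q))
            if qi < alpha and abs(fc) >= threshold]

# Hypothetical log2 fold changes and raw p values for four events.
hits = changed_events([3.0, -2.5, 0.5, 4.0], [0.01, 0.04, 0.03, 0.5])
```

Here the second event (log2 fold change -2.5, raw p = 0.04) drops out after adjustment. This is the multiple testing penalty at work.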

limms for untargeted metabolomics

Whereas conventional approaches measure the effect of gene deletion on expected or known products of enzymes (Fig 1A and 1B), lipidomics platforms enable untargeted measurement of the full scope of effects on the organism, including effects on unknown molecules. These effects are expressed as the percentage of all detected lipids that meet defined change criteria (Fig 1C and 1D). To achieve this phenotypic expansion, comparative lipidomics relies on differential abundance, linking the mass, retention time, and intensity values of unnamed “molecular events” [4] to genetic or conditional effects. This untargeted approach can discover previously unknown compounds, connect chemicals to biosynthetic genes lacking known substrates or products, and link metabolites to unexpected or emergent networks. However, when applied to high-resolution mass spectrometry data, this approach creates a large multiple hypothesis testing problem. For example, comparing lipid extracts from the parental strain and the single and double knockouts in Rv3377c and Rv3378c generated 24,300 events to test in multi-way comparisons. Furthermore, mass spectrometry event lists are not immediately amenable to statistical pipelines for identifying differential abundance because of intermittent zero intensities and technical variability. Therefore, we wrote the open-source R package limms to overcome these limitations and support a next-generation metabolomics platform. This software normalizes and imputes mass spectrometry data, facilitates contrast-based statistical comparisons [20], applies p value adjustments [20], and supports data visualization (S2 Data).
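Normalization and imputation are needed because event tables contain intermittent zeros. The exact methods that limms uses are documented in its vignette (S2 Data) and are not reproduced here. Instead, the sketch below substitutes two common, simple choices as assumptions. First, each zero is replaced by half of that event's smallest nonzero intensity. Second, each sample is median-centered on the log2 scale. The function name and the toy matrix are hypothetical.

```python
import math
from statistics import median

def impute_and_normalize(matrix):
    """Impute zeros and median-center samples on the log2 scale.

    matrix[event][sample] holds raw intensities, with 0 meaning "not detected".
    Assumed choices (illustrative only): each zero becomes half of that event's
    smallest nonzero intensity, and each sample is shifted so its median is 0.
    """
    logged = []
    for row in matrix:
        positive = [x for x in row if x > 0]
        # Hypothetical fallback of 1.0 (log2 = 0) for events never detected.
        floor = min(positive) / 2 if positive else 1.0
        logged.append([math.log2(x if x > 0 else floor) for x in row])
    n_samples = len(matrix[0])
    medians = [median(row[j] for row in logged) for j in range(n_samples)]
    return [[row[j] - medians[j] for j in range(n_samples)] for row in logged]

# Two hypothetical events measured in three samples; one value is missing (0).
normalized = impute_and_normalize([[1024, 0, 2048], [16, 32, 64]])
```

Half-minimum imputation keeps each missing value below everything actually observed for that event. It is only one of several reasonable choices.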
Unlike prior approaches that are constrained to pairwise or all-ways comparisons [4,21,22], limms allowed flexible specification of multi-way contrasts, including a three-way complementation analysis (Fig 1D) and a four-way epistasis analysis (Fig 1C). Paired samples, nested contrasts, time courses, and dose-response analyses are also supported, and their use is explained in the limms vignette (S2 Data). limms works with data from any mass spectrometry platform, chromatography system, and type of metabolite. Broad applicability was demonstrated by reanalysis of previously published data that measured intracellular metabolites from Saccharomyces cerevisiae using a different LC-MS system [23]. Changes in sulfur-containing amino acids were expected when yeast lacking cystathionine beta-synthase (CBS) activity were trans-complemented with human alleles. However, analysis using limms found statistically significant changes extending well beyond the CBS pathway (S3 Fig, S1 Table, and S2 Data). These data are used as worked examples in the limms vignette (S2 Data) and help pages.
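A contrast is a set of weights, summing to zero, applied to group means. The sketch below estimates one contrast and its standard error for a single event under a pooled variance, without the variance moderation used by limms. The group names, values, and weights are hypothetical. The complementation weights compare the deletion with the average of the parent and complemented strains. A second set of weights shows an interaction contrast of the kind often used to ask about epistasis.

```python
import math
from statistics import mean, variance

def contrast_estimate(groups, weights):
    """Estimate sum_g w_g * mean_g and its standard error under a pooled variance.

    groups: dict mapping group name -> list of log2 intensities for one event.
    weights: dict mapping group name -> contrast weight (weights must sum to zero).
    """
    if abs(sum(weights.values())) > 1e-12:
        raise ValueError("contrast weights must sum to zero")
    df = sum(len(v) - 1 for v in groups.values())
    s2 = sum((len(v) - 1) * variance(v) for v in groups.values()) / df
    est = sum(w * mean(groups[g]) for g, w in weights.items())
    se = math.sqrt(s2 * sum(w * w / len(groups[g]) for g, w in weights.items()))
    return est, se, df

# Hypothetical log2 intensities for one event.
groups = {
    "parent": [20.0, 20.2, 19.8],
    "deletion": [17.0, 17.2, 16.8],
    "complemented": [20.0, 20.2, 19.8],
}
# Complementation: deletion versus the average of parent and complemented strains.
complementation = {"deletion": 1.0, "parent": -0.5, "complemented": -0.5}
est, se, df = contrast_estimate(groups, complementation)
t_stat = est / se

# Interaction weights for a hypothetical four-group double-knockout design:
# (double - parent) - (single_a - parent) - (single_b - parent).
epistasis = {"double": 1.0, "single_a": -1.0, "single_b": -1.0, "parent": 1.0}
```

In this design, one contrast is built from all three groups at once. A pairwise workflow would instead need two separate t tests and an informal comparison of their results.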

limms revealed an unexpected lipidomic phenotype

Using limms, lipidomics analysis of all events from the Mtb genetic studies focused on genetically controlled metabolites whose intensity values changed at least 2-fold with an adjusted p value < 0.05 (Fig 1C and 1D and S3 Data). Because 1-TbAd and N6-TbAd were the only known products of this locus (Fig 1A), the large number of changed events was highly unexpected: 894 events after deletion of Rv3377c, 1,012 after deletion of Rv3378c, and 1,291 after double deletion (Fig 1C). The mass-retention time addresses of changed events overlapped extensively among all 3 mutants (Fig 1C; F-test p < 0.001 in S3 Data). This overlap is consistent with the proposed pathway in Fig 1A but did not clarify the order of gene action. Separately, we compared the Rv3378c deletion mutant to the parental strain and to the complemented Rv3378c deletion, using complementation to increase statistical stringency and to address possible second-site mutations. In this comparison, 292 events showed lower intensity when Rv3378c was deleted (Fig 1D, blue), while 50 up-regulated events showed small increases in intensity. Hence, 2 independent experiments showed a marked expansion of the lipid phenotype beyond 1-TbAd and N6-TbAd. Comparative lipidomics without limms is possible by applying statistical methods piecemeal. For example, the R package xcms, used here for peak picking and alignment [22], tabulates p values for pairwise t tests. However, in these cases the mass spectrometry data are not normalized and p values are not adjusted. In addition, complex experimental designs can only be approached by looking for overlaps among data sets by non-statistical means such as Venn diagrams. To compare results with and without limms, we applied pairwise t tests to the 2 mass spectrometry data sets in Fig 1C and 1D.
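The Venn-diagram style of analysis mentioned above amounts to intersecting the sets of events that pass a threshold in each pairwise comparison. The sketch below, with hypothetical comparison names and values, shows why this approach is conservative. An event that is decreased in every mutant but narrowly misses the threshold in one comparison is lost, and the intersection itself carries no p value.

```python
def decreased_in_all(results, alpha=0.05, min_log2_fc=1.0):
    """Events that pass the decrease criteria in every comparison.

    results maps comparison name -> {event_id: (log2_fold_change, adjusted_p)}.
    An event must have adjusted_p < alpha and log2 fold change <= -min_log2_fc
    (at least a 2-fold decrease by default) in each comparison.
    """
    passing = [
        {e for e, (fc, q) in per_event.items() if q < alpha and fc <= -min_log2_fc}
        for per_event in results.values()
    ]
    return set.intersection(*passing) if passing else set()

# Hypothetical pairwise results for three mutant-versus-parent comparisons.
results = {
    "delta_3377c_vs_parent": {"e1": (-2.0, 0.01), "e2": (-3.0, 0.06), "e3": (-1.5, 0.001)},
    "delta_3378c_vs_parent": {"e1": (-2.5, 0.02), "e2": (-3.1, 0.01), "e3": (-1.2, 0.03)},
    "double_vs_parent": {"e1": (-2.2, 0.001), "e2": (-3.3, 0.001), "e3": (-0.5, 0.001)},
}
overlap = decreased_in_all(results)
```

Event `e2` is strongly decreased in all three comparisons, yet it is excluded because its adjusted p value in one comparison is 0.06.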
With limms, more events were detected as differentially abundant, many fewer events localized near the x-axis (high fold change but p > 0.01), and the lead compounds 1- and N6-tuberculosinyladenosine were better differentiated. When the overlap between the t test results in Fig 1C was computed, only 52 events were significantly decreased in all 3 mutant strains. In contrast, the F-test in limms found 194 significantly decreased events in the same data. Furthermore, the F-test provided a p value that directly addressed the hypothesis that the single and double mutants altered the same events. We mapped the events that the F-test identified as most changed onto the individual mutant contrasts. The decreased events occupied similar space in all 3 mutants (Fig 1C, black), consistent with almost identical lipid phenotypes for all 3 mutant strains.
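The F-test asks whether an event's mean intensity differs among groups jointly, rather than one comparison at a time. When the three contrasts are each mutant versus the parent, testing them jointly is equivalent to asking whether all four group means are equal. A classical one-way ANOVA F statistic therefore gives the unmoderated version of that test. The sketch below computes only that classical statistic. It does not reproduce the variance moderation that limms applies or its p values, and the example values are hypothetical.

```python
from statistics import mean

def one_way_f(groups):
    """Classical one-way ANOVA F statistic and its (numerator, denominator) df."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k)), (k - 1, n - k)

# Hypothetical log2 intensities: parent, two single mutants, double mutant.
f_stat, dfs = one_way_f([[20.0, 20.4], [16.1, 16.3], [15.9, 16.2], [16.0, 16.4]])
```

For two groups, F equals the square of the pooled two-sample t statistic, which is a quick sanity check. Because F detects any difference among groups, a direction filter must still be applied separately. The >4-fold decrease in all 3 mutants used in Fig 1C is an example of such a filter.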

Origin on a plasmid from plant-associated bacteria

Pethe and colleagues noted that the atypical GC content of Rv3377c and Rv3378c was consistent with horizontal gene transfer [6], a finding Becq and colleagues extended to include Agrobacterium or Rhizobium as possible donors [37]. Our initial efforts to find orthologs of Rv3376, Rv3377c, or Rv3378c outside of mycobacteria failed. We therefore searched all available DNA sequences for a genetic donor whose nucleotide composition more closely matched that of the TbAd biosynthesis genes (Fig 5F). In agreement with Becq and colleagues, the 1-TbAd locus most closely resembled sequences from Agrobacterium and Rhizobium (Fig 5F). We further identified these sequences as plasmids from plant-associated bacteria. A short stretch of the Bradyrhizobium chromosome was also identified (Fig 5F), but the rest of that genome was not implicated, which is more consistent with a sequence transferred to both Bradyrhizobium and M. tuberculosis. The ancestral genes were not identified directly. However, Agrobacterium tumefaciens tumor-inducing (Ti) plasmids like the one identified here encode specialized machinery for horizontal gene transfer [38], providing a plausible mechanism for punctuated evolution. We speculate that the terpene biosynthesis genes Rv3377c and Rv3378c were collated with the hydrolase Rv3376, which was less diverged from the Mtb genome and is oppositely transcribed. We further speculate that this collation occurred before a single horizontal gene transfer into the common ancestor of the MTC, M. lacus, and M. decipiens (Fig 5G). Genetic changes subsequently tuned 1-TbAd production while MTC species evolved distinct host tropisms, and the species that cause human TB epidemics show high-level constitutive 1-TbAd biosynthesis.
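Comparisons of nucleotide composition of this kind usually start from simple per-sequence statistics. Examples include overall GC content, GC content in sliding windows, and GC content at third codon positions (GC3), which largely reflects synonymous sites. The specific composition metric behind Fig 5F is not described in this section, so the sketch below shows only these basic statistics. The function names, window sizes, and sequences are hypothetical.

```python
def gc_content(seq):
    """Fraction of G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no unambiguous bases")
    return (seq.count("G") + seq.count("C")) / acgt

def gc_windows(seq, window, step):
    """GC content in consecutive windows of length `window`, advancing by `step`."""
    return [gc_content(seq[i:i + window]) for i in range(0, len(seq) - window + 1, step)]

def gc3(cds):
    """GC content at third codon positions of an in-frame coding sequence."""
    return gc_content(cds[2::3])

# Hypothetical sequences for illustration only.
profile = gc_windows("GGGGAAAA", window=4, step=2)
third_position_gc = gc3("ATGGCCTAA")
```

A locus acquired by horizontal transfer often shows window values or GC3 that stand apart from the host genome average. Such a difference flags the region for the kind of donor search described above.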

[END]
---
[1] Url: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3002813

Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.

via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/plosone/