(C) PLOS One
This story was originally published by PLOS One and is unaltered.
. . . . . . . . . .
Phylogenomics of Leptospira santarosai, a prevalent pathogenic species in the Americas [1]
Authors and affiliations (as listed): Diana Chinchilla, Centro Nacional de Referencia de Bacteriología, Instituto Costarricense de Investigación y Enseñanza en Nutrición y Salud (INCIENSA), La Unión, Cartago, Costa Rica; Cecilia Nieves, Bacterial Symbionts Evolution, Centre Armand-Frappier Santé Biotechnologie
Date: 2023-11
In conclusion, we report a comprehensive genome analysis of pathogenic Leptospira species with a focus on L. santarosai. Our study sheds new light onto the genomic diversity, evolutionary history, and epidemiology of leptospirosis in America and globally. Our findings also expand our knowledge of the genes driving O-antigen diversity. In addition, our work provides a framework for understanding the virulence and spread of L. santarosai and for improving its surveillance in both humans and animals.
Here we investigated the genome diversity of the main pathogenic Leptospira species based on a collection of 914 genomes from strains isolated around the world. Genome analyses revealed species-specific genome size and GC content, and an open pangenome in the pathogenic species, except for L. mayottensis. Taking advantage of a new set of genomes of L. santarosai strains isolated from patients in Costa Rica, we took a closer look at this species. L. santarosai strains are largely distributed in America, including the Caribbean islands, with over 96% of the available genomes originating from this continent. Phylogenetic analysis showed high genetic diversity within L. santarosai, and the clonal groups identified by cgMLST were strongly associated with geographical areas. Serotype identification based on serogrouping and/or analysis of the O-antigen biosynthesis gene loci further confirmed the great diversity of strains within the species.
Leptospirosis is a complex zoonotic disease mostly caused by a group of eight pathogenic species (L. interrogans, L. borgpetersenii, L. kirschneri, L. mayottensis, L. noguchii, L. santarosai, L. weilii, L. alexanderi), with a wide spectrum of animal reservoirs and patient outcomes. Leptospira interrogans is considered the leading causative agent of leptospirosis worldwide and is the most studied species. However, the genomic features and phylogeography of the other pathogenic Leptospira species remain to be determined.
Leptospirosis is an emerging zoonosis caused by pathogenic species of a highly heterogeneous genus. Most studies have focused on Leptospira interrogans, which is responsible for the majority of human infections worldwide. By contrast, very little is known about the other pathogenic species, including L. santarosai, which may represent a public health problem in both humans and animals on the American continent. Our comparative genomic analyses of the pathogenic species revealed that L. santarosai is characterized by an open pangenome with high genetic and serovar diversity. This first study of L. santarosai isolates not only contributes to the global understanding of genomics and evolution within the pathogenic Leptospira species but also provides the groundwork for better surveillance of this pathogen.
Funding: This research was supported by Santé Publique France through the French National Reference Center of Leptospirosis, the Institut Pasteur through grant PTR 30-2017 (MP and FJV) and by National Institutes of Health grant P01 AI 168148 (MP and FJV). CN received a Calmette & Yersin Ph.D. studentship from the Institut Pasteur International Network. FJV received a Junior 1 and Junior 2 research scholar salary award from the Fonds de Recherche du Québec – Santé. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Copyright: © 2023 Chinchilla et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Phylogenomic analysis of L. santarosai genomes will enable a better understanding of the genetic diversity and genome features of this pathogenic species, which is prevalent in most countries of the American continent.
In the present study, we first analyzed the pangenome of the pathogenic Leptospira species and then took a closer look at the genetic diversity of L. santarosai, including a set of strains recently isolated from patients in Costa Rica, a country where leptospirosis is endemic [ 12 ].
Our previous analysis of the distribution of pathogenic Leptospira species showed that L. interrogans is the most frequently encountered and globally distributed species [ 1 ]. This cosmopolitan species is also by far the most studied, notably in terms of virulence and molecular epidemiology. By contrast, very little is known to date about the geographical distribution, reservoirs, genomic features and virulence factors of pathogenic species other than L. interrogans. In the same analysis, we showed that some pathogenic species were geographically restricted [ 1 ]. For example, only a few reports have described the existence of L. santarosai outside the American continent. L. santarosai, named after Carlos A. Santa Rosa, a Brazilian veterinary microbiologist who pioneered the study of leptospirosis in Brazil, was first described in 1987 [ 11 ]. L. santarosai is predominant in many countries of Central and South America.
Over the past decade, the number of described Leptospira species has grown rapidly, from 22 in 2014 to 69 in 2022 [ 4 ], largely owing to improved protocols for culture isolation from the environment [ 5 , 6 ] and the widespread adoption of next-generation sequencing [ 7 ]. Within the genus Leptospira, eight species (L. interrogans, L. kirschneri, L. noguchii, L. santarosai, L. mayottensis, L. borgpetersenii, L. alexanderi and L. weilii), which diverged after a specific node of evolution, constitute the most virulent group of pathogenic species [ 8 ]. These Leptospira species are the causative agents of leptospirosis in both humans and animals, leading to a high disease burden in tropical countries [ 9 ] and major economic losses in the livestock sector [ 10 ].
Leptospira is a highly heterogeneous bacterial genus divided into pathogenic and saprophytic species and further subdivided into more than 300 serovars, which are defined according to the structural heterogeneity of the lipopolysaccharide (LPS) O-antigen. Nowadays, strain identification is mainly based on genome analysis, and core genome multilocus sequence typing (cgMLST) [ 1 ] enables identification at the species level and below. Recent studies have also shown that whole-genome sequences can be used to predict Leptospira serotypes on the basis of the rfb locus, which contains the genes for O-antigen biosynthesis [ 2 , 3 ]. This approach offers a promising alternative to conventional serotyping, which is laborious, time-consuming, expensive and requires a high level of expertise.
The sequencing data generated in this study are available in the NCBI database under the BioSample accession numbers SAMN34670613, SAMN34670614, SAMN34670615, SAMN34670616, SAMN34670617, SAMN34670618, SAMN34670619, SAMN34670620, SAMN34670621, SAMN34670622, SAMN34670623, SAMN34670624, SAMN34670625, SAMN34670626, SAMN34670627, SAMN34670628, SAMN34670629, SAMN34670630. Genome sequences used in this study are also available at: https://bigsdb.pasteur.fr/cgi-bin/bigsdb/bigsdb.pl?db=pubmlst_leptospira_isolates&page=query&project_list=21&submit=1
Core genome MLST (cgMLST) typing was performed using a scheme based on 545 core genes as previously described [ 1 ]. The L. santarosai core-genome phylogeny was constructed from the alignment of 1,288 core genes produced by the Roary analysis (60% identity cut-off, option -i 60). The best-fit model and the maximum-likelihood phylogenetic tree were determined with IQ-TREE version 1.6.11 [ 22 ], using 10,000 ultrafast bootstrap replicates [ 23 ]. L. interrogans str. Fiocruz L1-130 and L. borgpetersenii str. M84 were used as outgroups. Tree branches were transformed with the "proportional" option in FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/), which adjusts branch distances according to the number of tips under each node to improve visualization of the tree. Gene presence/absence analyses among the rfb clusters of the genomes studied here were performed by protein-level searches using BLASTP [ 24 ], followed by network association with NetworkX version 2.6.2 [ 25 ]. A similarity threshold of 60% was applied, as previously described [ 2 ]. The presence/absence table obtained from the network association analysis was converted into a binary CSV file, where 0 represents gene absence and 1 represents gene presence. This binary table was subjected to hierarchical clustering based on shared protein-encoding genes (options: Euclidean distance, Ward linkage) using the tools available at https://mev.tm4.org. Jaccard's similarity index was used to measure the similarity between rfb patterns.
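As an illustration of the last step, the sketch below computes Jaccard's similarity index between two rows of a binary presence/absence table of the kind described above. The gene columns and the two example rfb patterns are hypothetical, and returning 1.0 for two all-zero rows is an assumption of this sketch, not something stated in the study.

```python
def jaccard_similarity(pattern_a, pattern_b):
    """Jaccard index between two binary gene presence/absence vectors.

    Each vector holds 0 (gene absent) or 1 (gene present), one entry
    per gene cluster, in the same column order for both genomes.
    """
    if len(pattern_a) != len(pattern_b):
        raise ValueError("patterns must cover the same gene clusters")
    shared = sum(1 for a, b in zip(pattern_a, pattern_b) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(pattern_a, pattern_b) if a == 1 or b == 1)
    # Assumption: two empty patterns are treated as identical.
    return 1.0 if either == 0 else shared / either


# Hypothetical rfb patterns for two strains over six gene clusters.
strain_x = [1, 1, 1, 0, 0, 1]
strain_y = [1, 1, 0, 0, 1, 1]
similarity = jaccard_similarity(strain_x, strain_y)  # 3 shared / 5 present in either
```

A value of 1 means the two strains carry exactly the same set of rfb genes, and 0 means they share none.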
Average Nucleotide Identity (ANI) and Percentage of Conserved Proteins (POCP) were calculated for the 64 L. santarosai genome sequences as well as for L. interrogans str. Fiocruz L1-130 and L. borgpetersenii str. M84, used as outgroups (S1 and S2 Figs). Genomes were annotated with Prokka version 1.13.7 [ 21 ]. ANI and POCP matrices were inferred using the OMCL algorithm via GET_HOMOLOGUES version 20190411 [ 12 ]. Briefly, to calculate ANI, the option -A was employed along with option -a to use nucleotide sequences and perform BLASTN. This process generated a tab-separated file containing average percentage sequence identity values between pairs of genomes, calculated from sequences within all identified clusters (option t = 0). This tab-separated file served as the input to create a symmetric matrix, in which the genomes were clustered based on their ANI values. Dendrograms based on this clustering were generated on both sides of the matrix to visually represent the proximity among genomes. Similarly, POCP was calculated by including the option -P and performing default BLASTP searches. This step yielded another tab-separated file, which was subsequently used to create a symmetric matrix. As for the ANI matrix, the genomes were clustered based on the percentage of conserved proteins shared between pairs of genomes. These values are calculated as $\mathrm{POCP} = (C_a + C_b)/(\mathrm{total}_a + \mathrm{total}_b)$, where $C_a$ and $C_b$ denote the number of conserved proteins from genome a in genome b and from genome b in genome a, respectively, normalized by the sum of total proteins in each genome. The clustering process also generated dendrograms, indicating the proximity among genomes in terms of conserved proteins.
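The POCP formula above can be sketched as a small function. This is a minimal illustration and not the GET_HOMOLOGUES implementation; the function name and the protein counts in the usage example are hypothetical.

```python
def pocp(conserved_a_in_b, conserved_b_in_a, total_a, total_b):
    """POCP = (C_a + C_b) / (total_a + total_b), returned as a fraction.

    conserved_a_in_b: proteins of genome a with a conserved match in genome b
    conserved_b_in_a: proteins of genome b with a conserved match in genome a
    total_a, total_b: total number of proteins in each genome
    """
    if total_a <= 0 or total_b <= 0:
        raise ValueError("protein totals must be positive")
    if conserved_a_in_b > total_a or conserved_b_in_a > total_b:
        raise ValueError("conserved counts cannot exceed genome totals")
    return (conserved_a_in_b + conserved_b_in_a) / (total_a + total_b)


# Hypothetical counts for a pair of genomes.
value = pocp(3000, 3100, 3800, 3900)
percent = 100 * value  # multiply by 100 to express the result as a percentage
```

Because the numerator and denominator both sum over the two genomes, the value is symmetric in a and b, which is why the resulting matrix is symmetric.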
Genome size and GC content for highly virulent Leptospira species were determined through DFAST annotation [ 18 ]. Individual values were plotted and grouped per species, with the mean and standard deviation displayed. Genome size and GC content were compared among Leptospira species using the Kruskal-Wallis rank sum test, and between two phylogenetically related groups using the Wilcoxon rank test. Post-hoc comparisons were performed using Dunn's Kruskal-Wallis multiple comparisons test (Dunn, 1964). P-values were adjusted with the Bonferroni method. Statistical analyses were performed in R [ 19 ], using the FSA package [ 20 ].
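To make the Bonferroni adjustment concrete, here is a minimal sketch in Python (the study itself used R and the FSA package). Each raw p-value is multiplied by the number of comparisons and capped at 1. The raw p-values below are hypothetical, and the count of 21 comparisons assumes that all seven species are compared pairwise.

```python
from math import comb


def bonferroni_adjust(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# With seven species, an all-pairs post-hoc test involves C(7, 2) comparisons.
n_pairwise = comb(7, 2)

# Hypothetical raw p-values from three comparisons.
adjusted = bonferroni_adjust([0.001, 0.02, 0.6])
```

The adjustment is conservative: the more comparisons are made, the smaller a raw p-value must be to remain below a given significance level.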
Comparative analyses of the pangenome were performed using two software packages: Roary version 3.11.2 [ 16 ], and a combination of the COG and OMCL algorithms in GET_HOMOLOGUES version 20190411 [ 12 ]. Both methods yielded a similar number of gene clusters. In the Roary analysis, a 60% identity cut-off was applied to define gene clusters (option -i 60), and no other parameters were modified. Among the Roary outputs, a tab-separated file containing the number of genes in the pangenome was used to create a graph depicting the variation in the number of gene clusters as a function of the number of genomes analyzed. Roary iterated 10 times, calculating the number of new genes added as each genome was sequentially incorporated into the analysis. This graph facilitated a quick determination of whether the pangenome was open or closed and allowed for the calculation of the α coefficient in Heaps' law ($n = \kappa N^{\gamma}$, with $\gamma = 1 - \alpha$) [ 17 ]. In addition, GET_HOMOLOGUES was used to infer the distribution of the pangenome into cloud, shell, soft-core and core genome. This was achieved by generating a tab-separated pangenome matrix file that included all the clusters identified by both the COG and OMCL algorithms. The matrix represented the intersection of the two methods and served as input for the parse_pangenome_matrix.pl script within GET_HOMOLOGUES, which classified the clusters as cloud (shared by up to 2 genomes), shell (shared by more than 2 genomes but less than 93% of genomes analyzed), soft-core (shared by 93–99% of genomes), or core (shared by 100% of genomes). Owing to the large number of genomes available for L. interrogans and L. borgpetersenii, and to the redundancy observed in serogroups and serovars, representative genomes of each serogroup/serovar were selected to reduce computational costs. Excluding genomes with redundant identities is not expected to alter the pangenome distribution significantly.
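Two steps of this paragraph lend themselves to a short sketch: estimating the Heaps' law exponent from a pangenome accumulation curve, and assigning a gene cluster to a pangenome compartment. The code below is an illustration under stated assumptions, not the procedure used in the study. It fits the exponent by ordinary least squares on log-transformed values, which is one simple choice among several, and it treats clusters present in more than 99% but fewer than 100% of genomes as soft-core, a case the thresholds above leave open. Function names and the synthetic curve are hypothetical.

```python
import math


def fit_heaps_law(genome_counts, pangenome_sizes):
    """Fit n = kappa * N**gamma by linear regression on log(n) versus log(N).

    Returns (kappa, gamma, alpha) with alpha = 1 - gamma.
    """
    xs = [math.log(n) for n in genome_counts]
    ys = [math.log(s) for s in pangenome_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    kappa = math.exp(mean_y - gamma * mean_x)
    return kappa, gamma, 1.0 - gamma


def classify_cluster(n_genomes_with_gene, n_genomes_total):
    """Assign a gene cluster to core, cloud, soft-core or shell."""
    if n_genomes_with_gene == n_genomes_total:
        return "core"
    if n_genomes_with_gene <= 2:
        return "cloud"
    # Assumption: anything from 93% up to (but excluding) 100% is soft-core.
    if n_genomes_with_gene / n_genomes_total >= 0.93:
        return "soft-core"
    return "shell"


# Synthetic accumulation curve that follows n = 4000 * N**0.3 exactly.
counts = list(range(1, 21))
sizes = [4000 * n ** 0.3 for n in counts]
kappa, gamma, alpha = fit_heaps_law(counts, sizes)
```

In the usual reading of Heaps' law, an α below 1 (γ above 0) means the pangenome keeps growing as genomes are added, that is, it is open.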
Genomic DNA for Illumina sequencing was extracted from exponential-phase cultures using a MagNA Pure 96 Instrument (Roche, Meylan, France). Next-generation sequencing (NGS) was performed using the Nextera XT DNA Library Preparation kit and the NextSeq 500 sequencing system (Illumina, San Diego, CA, USA) at the Mutualized Platform for Microbiology (P2M) at Institut Pasteur. CLC Genomics Workbench 9 software (Qiagen, Hilden, Germany) was used for analyses. The generated contig sequences, together with the sample metadata, are available in BIGSdb hosted at the Institut Pasteur (https://bigsdb.pasteur.fr/leptospira/). We also downloaded additional genome sequences of Leptospira isolates from the NCBI database (S1 Table). Only genomes meeting the following quality requirements were selected for further analyses: i) sequencing coverage >30x, ii) number of contigs <600, iii) cumulative contig length within the typical range of Leptospira genomes (3.6–6 Mb), iv) GC content within the typical range of Leptospira genomes (35–48%), and v) <100 uncalled cgMLST alleles out of the 545 pre-defined core genes.
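The five quality requirements can be expressed as a single filter. The sketch below is illustrative only: the class and field names are hypothetical, and treating the bounds of the genome size and GC ranges as inclusive is an assumption, since the text does not say.

```python
from dataclasses import dataclass


@dataclass
class AssemblyStats:
    coverage: float          # mean sequencing depth (x)
    n_contigs: int           # number of contigs in the assembly
    total_length_bp: int     # cumulative contig length in base pairs
    gc_percent: float        # GC content in percent
    uncalled_alleles: int    # uncalled cgMLST alleles out of 545 core genes


def passes_quality(stats):
    """Return True if an assembly meets all five requirements listed in the text."""
    return (
        stats.coverage > 30
        and stats.n_contigs < 600
        and 3_600_000 <= stats.total_length_bp <= 6_000_000  # bounds assumed inclusive
        and 35.0 <= stats.gc_percent <= 48.0                  # bounds assumed inclusive
        and stats.uncalled_alleles < 100
    )


# Hypothetical assemblies: one acceptable, one too fragmented and incomplete.
good = AssemblyStats(80.0, 120, 4_000_000, 41.8, 3)
fragmented = AssemblyStats(80.0, 884, 4_000_000, 41.8, 135)
kept = [s for s in (good, fragmented) if passes_quality(s)]
```

An assembly with 884 contigs and 135 uncalled alleles fails criteria ii and v regardless of its other values.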
Isolates sequenced in this study (n = 153) were obtained from the collections of the French National Reference Center for Leptospirosis (Institut Pasteur, Paris, France), Laboratorio de Genética Molecular (Instituto Venezolano de Investigaciones Científicas, Caracas, Venezuela), Institut Pasteur of Alger (Algiers, Algeria), Institute of Veterinary Bacteriology (University of Bern, Switzerland), Molecular Epidemiology and Public Health Laboratory (School of Veterinary Sciences, Massey University, New Zealand), Instituto de Higiene (Facultad de Medicina, Universidad de la República, Montevideo, Uruguay), Universidade Federal Fluminense (Rio de Janeiro, Brazil), Faculty of Veterinary Medicine (University of Zagreb, Croatia), National Collaborating Centre for Reference and Research on Leptospirosis (Academic Medical Center, Amsterdam, the Netherlands), Laboratory of Zoonoses (Pasteur Institute in Saint Petersburg, Saint Petersburg, Russia), Institute for Medical Research (Malaysia), Faculty of Medicine and Health Sciences (University Putra Malaysia, Malaysia), Leptospirosis Research and Expertise Unit (Institut Pasteur Nouvelle-Calédonie, Nouméa, New Caledonia), and Kimron Veterinary Institute (Israel). We also downloaded genomes from our previous studies, including isolates from the collections of Lao-Oxford-Mahosot Hospital-Wellcome Trust-Research Unit (LOMWRU) (Microbiology Laboratory, Mahosot Hospital, Vientiane, Lao People’s Democratic Republic), Unidad Mixta Pasteur-Instituto Nacional de Investigación Agropecuaria (Institut Pasteur of Montevideo, Montevideo, Uruguay), Centre Hospitalier de Mayotte (France), and Department of Mycology-Bacteriology (Institute of Tropical Medicine Pedro Kourí, Havana, Cuba) [ 1 , 2 , 13 – 15 ], as well as genomes from the NCBI database. Information on the strains and genomes used in this study is provided in S1 and S2 Tables.
According to the decree number 40556-s of the General Health Law of Costa Rica, epidemiological studies that incorporate the review of clinical records do not require the approval of an ethics-scientific committee. Additionally, no written informed consent from patients was required, as the study was conducted as part of the routine diagnosis at the Centro Nacional de Referencia de Bacteriología of the Instituto Costarricense de Investigación y Enseñanza en Nutrición y Salud (INCIENSA). No additional clinical specimens were collected for the purpose of the study. Human samples were anonymized, and collection of the samples was conducted according to the Declaration of Helsinki.
Results and discussion
Distribution of pathogenic Leptospira species shows that L. santarosai isolates are mostly from the Americas
We first investigated the geographical distribution of pathogenic Leptospira species using 914 genomes of isolates collected between 1928 and 2022 (S1 Table). The species included in our study are L. interrogans (n = 410), L. borgpetersenii (n = 264), L. kirschneri (n = 88), L. mayottensis (n = 33), L. noguchii (n = 31), L. santarosai (n = 64), and L. weilii (n = 24); L. alexanderi, with only 2 isolates in our database, was not included in this study. Strains were isolated from human (50%) and animal (49%) samples, in Europe (18%), Africa (2%), Indian Ocean (14%), Caribbean islands (6%), Central America (3%), South America (13%), North America (4%), Central Asia, South Asia, East and Western Asia (11%), Southeast Asia (14%), and Australia and the Pacific region (15%) (Fig 1). Although this study relies on the genomes available in the databases, which may introduce a bias, L. interrogans, L. kirschneri, and L. borgpetersenii are distributed worldwide, L. weilii is mostly found in Asia, Australia and the Pacific region, L. mayottensis in the Indian Ocean, and L. noguchii and L. santarosai in America, as previously shown [1].
Fig 1. Geographic origins of the most frequent pathogenic Leptospira species in our genome database (n = 914). Each pie chart corresponds to a given world region. As shown in our map, L. santarosai (n = 64) is mostly found in America (North America, Central America, South America and the Caribbean islands). The base layer of the map is freely available from outline-world-map.com.
https://doi.org/10.1371/journal.pntd.0011733.g001
Leptospirosis is endemic in most countries of South and Central America, as well as in the Caribbean region [9,26–28]. In addition, most outbreaks of leptospirosis have been reported in Latin America and the Caribbean region [29], where the disease is widespread in domestic and wild animals [12]. However, comprehensive data concerning human and animal leptospirosis remain scarce in most American countries [30]. We previously studied the genomes of L. noguchii isolated from humans and animals in America [2], but our knowledge of L. santarosai, the other prevalent species in America, is rather limited. Here, we sequenced 18 L. santarosai strains, including twelve strains that were isolated from patients in Costa Rica in 2020–2021. The ANI and POCP values calculated for the 64 L. santarosai strains further confirmed that they all belong to the same species (S1 and S2 Figs). Of the 64 L. santarosai strains in our genome database, 28 were isolated in South America (Brazil, Colombia, Ecuador, Peru), 21 in Central America (Costa Rica, Panama), 6 in North America (US; not including Puerto Rico), 7 in the Caribbean region (Martinique, Guadeloupe, Trinidad and Tobago and Puerto Rico), and only two strains were isolated outside the Americas (China and Democratic Republic of the Congo) (Fig 1 and S2 Table). Of note, L. santarosai has not been isolated in Uruguay, where a large number of Leptospira strains have been isolated from cattle [15]. L. santarosai strains in our study were isolated from humans (n = 38), bovines (n = 12), rodents (n = 8, including rats, spiny rats, capybara and muskrat), opossums (n = 2), a dog (n = 1), a goat (n = 1), a pig (n = 1), and a raccoon (n = 1) (S2 Table). Previous studies have shown that L. santarosai can be detected from different sources in many countries of America and the Caribbean region. It is the predominant species in humans, rodents and dogs in Peru and Colombia [31,32].
In Peru, it has additionally been found in rural environmental water samples (but not in urban samples), as well as in association with pigs and cattle [33]. In Brazil, L. santarosai has been isolated from dogs [34], cattle [35], goats [36], and capybaras [37]. Moreover, it has also been identified in patients in French Guiana [38], Guadeloupe [39] and the US [40]. Only a few reports have described the existence of L. santarosai outside the American continent. Some years after the original description of L. santarosai [11], Brenner et al. listed 65 L. santarosai strains, of which only three were isolated outside America [41]. One L. santarosai strain was isolated from a patient in Sri Lanka in 1966, but the species has never been reported in this country since [42, 43]. The other two strains were isolated in Denmark and Indonesia but, again, L. santarosai has not subsequently been isolated in these countries. More recently, a strain belonging to L. santarosai serogroup Grippotyphosa was isolated from a patient in India and its genome sequenced [44]. However, because of its highly fragmented genome (884 contigs) and missing genomic data (135 uncalled cgMLST alleles), cluster assignment was not possible for this isolate and we removed its genome from our analysis. The serogroup Shermani is commonly reported in serological surveys in animals in Asia [45–48]. Unfortunately, there is no evidence as to whether the infecting strains described in these studies were L. santarosai or another species such as L. noguchii or L. inadai, which also contain serovars from the serogroup Shermani [49]. Finally, the other country outside the Americas where L. santarosai has been reported is Taiwan, in East Asia. Serogroup Shermani, presumably belonging to L. santarosai, is predominant among patients with severe leptospirosis in Taiwan [50]. However, only one L. santarosai strain, strain CCF, has been isolated from a patient with leptospirosis in Taiwan [51] and this strain, for which we do not have the complete genome [52], is no longer available (personal communication of Prof Chih-Wei Yang).
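The regional and host tallies reported above for the 64 L. santarosai genomes can be cross-checked with a few lines of arithmetic. The sketch below only restates counts given in the text; the dictionary keys and the function name are illustrative.

```python
# Counts of L. santarosai genomes by region of isolation, as reported in the text.
region_counts = {
    "South America": 28,
    "Central America": 21,
    "North America": 6,
    "Caribbean": 7,
    "Outside the Americas": 2,
}

# Counts by host, as reported in the text.
host_counts = {
    "human": 38, "bovine": 12, "rodent": 8, "opossum": 2,
    "dog": 1, "goat": 1, "pig": 1, "raccoon": 1,
}


def percent_from_americas(counts):
    """Percentage of genomes isolated in the Americas (Caribbean included)."""
    total = sum(counts.values())
    american = total - counts["Outside the Americas"]
    return 100.0 * american / total


total_by_region = sum(region_counts.values())
total_by_host = sum(host_counts.values())
american_share = percent_from_americas(region_counts)
```

Both tallies add up to 64, and 62 of 64 genomes (about 96.9%) come from the Americas, in line with the statement that over 96% of the available genomes originate from this continent.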
High genetic diversity of L. santarosai strains
To further investigate the genetic diversity of L. santarosai isolates, we used a core genome MLST (cgMLST) scheme [1] (Fig 4). The L. santarosai strains (n = 64) were divided into 55 cgMLST clonal groups (cgCGs), showing a high intraspecies genetic diversity (Fig 4), as reported in previous studies [29,32,39,57,58]. Among the 55 cgCGs, none is composed of more than 3 strains (S1 Table), and none is composed of both human and animal strains. We therefore cannot identify transmission of L. santarosai clones between different hosts. There is a wide range of possible reservoirs for L. santarosai in the Americas. Some countries in the region are among the largest cattle producers in the world, so these animals could be important reservoirs for human infections. The Americas also exhibit great biodiversity, so many species of wild animals, such as rodents and marsupials, and domestic animals, such as dogs, may be involved in transmission cycles. Among the 15 strains isolated from patients in Costa Rica, only two (id1256 and id1260) belong to the same clonal group, further confirming the great diversity of strains even within one small country.
Fig 4. Phylogenetic tree of L. santarosai strains. Maximum-likelihood phylogeny based on the variable sites of the cgMLST scheme consisting of 545 core genes, showing the distribution of species, serogroups and geographic origins. The L. santarosai strains (n = 64) were divided into 55 cgMLST clonal groups belonging to 9 different serogroups; for 20 strains the serogroup is unknown or undetermined. Colors indicate strains isolated from the same geographic region (Central America in red, South America in blue, North America in green, Caribbean in orange, Southern Asia in purple, and Middle Africa in black). Branch lengths were not used to ease readability of groups and isolates.
https://doi.org/10.1371/journal.pntd.0011733.g004
Unfortunately, analyses to identify associations of Leptospira genotypes with particular epidemiological variables (host reservoir, disease outcome, etc.) cannot be performed with our small sample size. However, we could determine some phylogeographic lineages. A clear geographical separation of the clonal groups was observed for strains from (i) Central America, comprising isolates from Costa Rica (15; all human strains) and Panama (6), but also one strain from South America (Colombia); South America, which was further divided into two divergent groups, (ii) one containing strains from Brazil (10; mostly bovine strains) and (iii) the other including strains from Peru (9); and (iv) the Caribbean islands, with strains from Guadeloupe (2), Martinique (2), Trinidad (1) and Puerto Rico (1) (Fig 4). This suggests an ancestral presence of this species in these different countries, followed by separate evolution with little or no geographic diffusion. By contrast, previous phylogenetic analyses of L. noguchii [2] and L. interrogans [1] did not reveal a correlation of genotype with geographical distribution.
[END]
---
[1] Url:
https://journals.plos.org/plosntds/article?id=10.1371/journal.pntd.0011733
Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.