(C) PLOS One [1]. This unaltered content originally appeared in journals.plosone.org.
Licensed under Creative Commons Attribution (CC BY) license.
url:
https://journals.plos.org/plosone/s/licenses-and-copyright
------------
RefPlantNLR is a comprehensive collection of experimentally validated plant disease resistance proteins from the NLR family
Authors: Jiorgos Kourelis (The Sainsbury Laboratory, University of East Anglia, Norwich Research Park, Norwich, United Kingdom), Toshiyuki Sakai, Hiroaki Adachi, Sophien Kamoun
Date: 2021-11
Reference datasets are critical in computational biology. They help define canonical biological features and are essential for benchmarking studies. Here, we describe a comprehensive reference dataset of experimentally validated plant nucleotide-binding leucine-rich repeat (NLR) immune receptors. RefPlantNLR consists of 481 NLRs from 31 genera belonging to 11 orders of flowering plants. This reference dataset has several applications. We used RefPlantNLR to determine the canonical features of functionally validated plant NLRs and to benchmark 5 NLR annotation tools. This revealed that although NLR annotation tools tend to retrieve the majority of NLRs, they frequently produce domain architectures that are inconsistent with the RefPlantNLR annotation. Guided by this analysis, we developed a new pipeline, NLRtracker, which extracts and annotates NLRs from protein or transcript files based on the core features found in the RefPlantNLR dataset. The RefPlantNLR dataset should also prove useful for guiding comparative analyses of NLRs across the wide spectrum of plant diversity and identifying understudied taxa. We hope that the RefPlantNLR resource will contribute to moving the field beyond a uniform view of NLR structure and function.
Funding: This work has been supported by Gatsby Charitable Foundation (https://www.gatsby.org.uk/) (TS, SK), Biotechnology and Biological Sciences Research Council (BBSRC BB/P012574 (Plant Health ISP)) (SK), European Research Council (grant number 743165, https://cordis.europa.eu/project/id/743165) (SK), Japan Society for the Promotion of Plant Science Postdoctoral fellowship (HA), and BASF Plant Science (JK, SK). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
The goal of this study is to provide a curated reference dataset of experimentally validated plant NLRs. This version of RefPlantNLR (v.20210712_481) consists of 481 NLRs from 31 genera belonging to 11 orders of flowering plants. We used RefPlantNLR to determine the canonical features of functionally validated plant NLRs and to benchmark NLR extraction tools. We found that these tools can extract the majority of NLRs in the RefPlantNLR dataset; however, the domain architectures they produce are often inconsistent with those of RefPlantNLR. To simplify NLR extraction, functional annotation, and phylogenetic analysis, we developed NLRtracker: a pipeline that uses InterProScan [31] and predefined NLR motifs [32] to extract NLRs and provide domain architecture analyses based on the canonical features found in the RefPlantNLR dataset. Additionally, NLRtracker outputs the extracted NB-ARC domain, facilitating downstream phylogenetic analysis. RefPlantNLR should also prove useful in guiding comparative and phylogenetic analyses of plant NLRs and identifying understudied taxa for future studies.
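As a rough illustration of how domain hits might be turned into a domain architecture string and how a domain can be cut out of a sequence, consider the following Python sketch. It is not NLRtracker's code: the `Hit` type, the short domain labels, and the example coordinates are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One annotated domain on a protein (hypothetical structure)."""
    label: str  # short domain label, e.g. "CC", "NB-ARC", "LRR" (assumed names)
    start: int  # 1-based start position on the protein
    end: int    # 1-based inclusive end position


def domain_architecture(hits):
    """Order hits from N- to C-terminus and collapse consecutive repeats."""
    ordered = sorted(hits, key=lambda h: (h.start, h.end))
    labels = []
    for hit in ordered:
        # Several adjacent LRR hits are reported as a single "LRR" block.
        if not labels or labels[-1] != hit.label:
            labels.append(hit.label)
    return "-".join(labels)


def extract_domain(sequence, hit):
    """Return the subsequence covered by a hit (1-based, inclusive)."""
    return sequence[hit.start - 1:hit.end]


# Invented coordinates for a CC-NLR-like protein.
example_hits = [
    Hit("LRR", 600, 650),
    Hit("NB-ARC", 180, 520),
    Hit("CC", 1, 150),
    Hit("LRR", 660, 720),
]
print(domain_architecture(example_hits))
```

Sorting by position before collapsing repeats keeps the architecture string independent of the order in which the annotation tool reports its hits.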
Given their multidomain nature, sequence diversity, and complex evolutionary history, prediction of NLR genes from plant genomes is challenging. Several bioinformatic tools have been developed to extract plant NLRs from sequence datasets. As input, these tools either take annotated genomic features and transcriptomic data or can be run directly on unannotated genomic sequence. NLR-Parser, RGAugury, RRGPredictor, and DRAGO2 identify transcript and protein sequences that have features of NLRs and are best described as NLR extractors [25–28]. RGAugury, RRGPredictor, and DRAGO2 also extract other classes of immune-related genes in addition to NLRs. These tools use predefined motifs to classify sequences as NLRs, but they differ in their methods and pipelines. NLR-Annotator (an extension of NLR-Parser) and NLGenomeSweeper can also use unannotated genome sequences as input to predict the genomic locations of NLRs [29, 30]. This output then requires manual annotation to extract the final gene models, and some of the annotated loci may represent partial or pseudogenized genes.
The mechanism of pathogen detection by NLRs can be either direct or indirect [4]. Direct recognition involves the NLR protein binding a pathogen-derived molecule or serving as a substrate for the enzymatic activity of a pathogen virulence protein (known as an effector). Indirect detection is conceptualized by the guard and decoy models, in which the status of a host component (the guardee or decoy) is monitored by the NLR [18, 19]. Some sensor NLRs known as NLR-IDs contain noncanonical “integrated domains” that can function as decoys to bait pathogen effectors and enable pathogen detection [20–22]. These extraneous domains appear to have evolved by fusion of an effector target domain into an NLR [20, 21, 23]. The sequence diversity of integrated domains in NLR-IDs is staggering, indicating that novel domain acquisitions have repeatedly occurred throughout the evolution of plant NLRs [21, 24].
Plant NLRs likely evolved from multifunctional receptors to specialized receptor pairs and networks [14, 15]. NLRs that combine pathogen detection and immune signaling activities into a single protein are referred to as “functional singletons,” whereas NLRs that have specialized in pathogen recognition or immune signaling are referred to as “sensor” or “helper” NLRs, respectively. About one-quarter of NLR genes occur as “genetic singletons” in plant genomes, whereas the others form genetic clusters often near telomeres [16]. This genomic clustering likely aids the evolutionary diversification of this gene family and subsequent emergence of pairs and networks [6, 15]. The emerging picture is that NLRs form genetic and functional receptor networks of varying complexity [15, 17].
Fig 1. (A) Domain architecture of typical plant NLRs. The structural features and conserved motifs of the NB-ARC are indicated. (B) The number of experimentally validated NLRs per plant genus (N = 481), and (C) the per genus reduced redundancy set at a 90% sequence similarity threshold (N = 303) are plotted as a stacked bar graph. (D) The class of pathogen to which NLRs in the RefPlantNLR dataset confer a response, and (E) the same for the per genus reduced redundancy set at a 90% sequence similarity threshold, are plotted as a stacked bar graph. Some NLRs may be involved in the response against multiple classes of pathogens, while others have a helper role or are found to be involved in allelic variation in autoimmune/hybrid necrosis responses. The number of experimentally validated NLRs belonging to the monophyletic TIR-NLR, CC-NLR, CCR-NLR, or CCG10-NLR subclades is indicated. Underlying data and R code to reproduce the figures are in S5 Data. CC, coiled-coil; HD, helical domain of apoptotic protease-activating factors; LRR, leucine-rich repeat; NB, P-loop containing NTPase domain; NLR, nucleotide-binding leucine-rich repeat; TIR, Toll/interleukin-1 receptor; WD, winged helix domain.
NLRs occur widely across all kingdoms of life where they generally function in non-self-perception and innate immunity [3, 8, 9]. In the broadest biochemical definition, NLRs share a similar multidomain architecture consisting of a nucleotide-binding and oligomerization domain (NOD) and a superstructure-forming repeat (SSFR) domain [10]. The NOD is either an NB-ARC (nucleotide-binding adaptor shared by APAF-1, certain R gene products, and CED-4) or NACHT (neuronal apoptosis inhibitory protein, MHC class II transcription activator, HET-E incompatibility locus protein from Podospora anserina, and telomerase-associated protein 1), whereas the SSFR domain can be formed by ankyrin (ANK) repeats, tetratricopeptide repeats (TPRs), armadillo (ARM) repeats, WD repeats, or leucine-rich repeats (LRRs) [10, 11]. Plant NLRs exclusively carry an NB-ARC domain with the C-terminal SSFR consisting typically of LRRs (Fig 1A). The NB-ARC domain has been used to determine the evolutionary relationships between plant NLRs, given that it is the only domain that produces reasonably good global alignments across all members of the family. In flowering plants (angiosperms), NLRs form 3 main monophyletic groups with distinct N-terminal domain fusions: the TIR-NLR subclade containing an N-terminal Toll/interleukin-1 receptor (TIR) domain, the CC-NLR subclade containing an N-terminal Rx-type coiled-coil (CC) domain, and the CCR-NLR subclade containing an N-terminal RPW8-type CC (CCR) domain [12]. Additionally, Lee and colleagues [13] have recently proposed that the G10-subclade of NLRs is a monophyletic group containing a distinct type of CC (here referred to as CCG10; CCG10-NLR). NLRs also occur in nonflowering plants where they carry additional types of N-terminal domains such as kinases and α/β hydrolases [11].
Reference datasets are critical in computational biology [1, 2]. They help define canonical biological features and are essential to benchmarking studies. Reference datasets are particularly important for defining the sequence and domain features of gene and protein families. Despite this, curated collections of experimentally validated sequences are still lacking for several widely studied gene and protein families. One example is the nucleotide-binding leucine-rich repeat (NLR) family of plant proteins. NLRs constitute the predominant class of disease resistance (R) genes in plants [3–5]. They function as intracellular receptors that detect pathogens and activate an immune response that generally leads to disease resistance. NLRs are thought to be engaged in a coevolutionary tug-of-war with pathogens and pests. As such, they tend to be among the most polymorphic genes in plant genomes, both in terms of sequence diversity and copy number variation [6]. Ever since their first discovery in the 1990s, hundreds of NLRs have been characterized and implicated in pathogen and self-induced immune responses [4]. NLRs are among the most widely studied and economically valuable plant proteins, given their importance in breeding crops with disease resistance [7].
Results and discussion
Construction of the RefPlantNLR dataset
To construct the current version of RefPlantNLR (v.20210712_481, S1–S3 Dataset), we manually crawled through the literature, extracting plant NLRs that have been experimentally validated to at least some degree. We defined experimental validation broadly as genes reported to be involved in any of the following: (1) disease resistance; (2) disease susceptibility, including effector-triggered immune pathology or trailing necrosis to viruses; (3) hybrid necrosis; (4) autoimmunity; (5) NLR helper function or involvement in downstream immune responses; (6) negative regulation of immunity; and (7) well-described allelic series of NLRs with different pathogen recognition spectra even if not reported to confer disease resistance. We defined NLRs as sequences containing the NB-ARC domain (Pfam signature PF00931) or a P-loop containing nucleoside triphosphate hydrolases (NTPase) domain (SUPERFAMILY signature SSF52540) combined with plant-specific NLR motifs [32] (see Material and methods for the motifs used) (Fig 1A). This resulted in 479 sequences. We also included RXL [33], which has an N-terminal Rx-type CC domain and C-terminal LRR domain, as well as AtNRG1.3 [34], which has a C-terminal LRR domain, both of which contain the RNBS-D motif of the NB-ARC domain but otherwise do not get annotated with a P-loop containing NTPase domain. Altogether, these 481 sequences form the current version of RefPlantNLR (S1 Table). In addition to the 481 NLRs present in this version of RefPlantNLR, we separately collected several characterized animal, bacterial, and archaeal NB-ARC proteins (S2 Table, S4 Dataset), which can be used as outgroups for comparative analyses. Furthermore, several characterized plant immune components have features often found in NLRs, such as the RPW8-type CC or the TIR domain, but lack the NB-ARC domain or NB-ARC–associated motifs that we used to define NLRs (see above). Since these proteins may have common origins with plant NLRs or may be useful for comparative analysis of these domains, we have collected them separately as well (S3 Table, S5–S7 Dataset).
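The inclusion rule described above can be sketched as a small predicate. This is a minimal illustration rather than the authors' implementation. It assumes that the motif requirement applies only to the P-loop NTPase case (the text can also be read as applying it to both cases), and the function name and the boolean `has_nlr_motifs` input are hypothetical. Manually curated additions such as RXL and AtNRG1.3 fall outside this rule.

```python
# Signature accessions named in the text.
NB_ARC_PFAM = "PF00931"                 # Pfam NB-ARC domain
P_LOOP_NTPASE_SUPERFAMILY = "SSF52540"  # SUPERFAMILY P-loop containing NTPase


def is_nlr(signatures, has_nlr_motifs):
    """Return True if a sequence meets the NLR definition as read here.

    signatures: iterable of signature accessions found on the sequence.
    has_nlr_motifs: whether plant-specific NLR motifs were detected
        (hypothetical flag; motif detection itself is not shown).
    """
    found = set(signatures)
    # Assumption: an NB-ARC hit is sufficient on its own.
    if NB_ARC_PFAM in found:
        return True
    # Assumption: a P-loop NTPase hit counts only together with NLR motifs.
    return P_LOOP_NTPASE_SUPERFAMILY in found and bool(has_nlr_motifs)


print(is_nlr({"PF00931"}, has_nlr_motifs=False))
print(is_nlr({"SSF52540"}, has_nlr_motifs=False))
```

If the stricter reading is intended, the first branch would also need to check `has_nlr_motifs`.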
[1] Url:
https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001124
(C) Plos One. "Accelerating the publication of peer-reviewed science."
Licensed under Creative Commons Attribution (CC BY 4.0)
URL:
https://creativecommons.org/licenses/by/4.0/