(C) PLOS One
This story was originally published by PLOS One and is unaltered.

Functional mapping of N-terminal residues in the yeast proteome uncovers novel determinants for mitochondrial protein import [1]

Salomé Nashed (Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, UMR, Laboratoire de Biologie Computationnelle et Quantitative, Paris), Houssam El Barbry, Médine Benchouaia, Angélie Dijoux-Maréchal

Date: 2023-10

N-terminal ends of polypeptides are critical for the selective co-translational recruitment of N-terminal modification enzymes. However, it is unknown whether specific N-terminal signatures differentially regulate protein fate according to their cellular functions. In this work, we developed an in-silico approach to detect functional preferences in cellular N-terminomes, and identified in S. cerevisiae more than 200 Gene Ontology terms with specific N-terminal signatures. In particular, we discovered that Mitochondrial Targeting Sequences (MTS) show a strong and specific over-representation at position 2 of hydrophobic residues known to define potential substrates of the N-terminal acetyltransferase NatC. We validated mitochondrial precursors as co-translational targets of NatC by selective purification of translating ribosomes, and found that their N-terminal signature is conserved in Saccharomycotina yeasts. Finally, systematic mutagenesis of the position 2 in a prototypal yeast mitochondrial protein confirmed its critical role in mitochondrial protein import. Our work highlights the hydrophobicity of MTS N-terminal residues and their targeting by NatC as important features for the definition of the mitochondrial proteome, providing a molecular explanation for mitochondrial defects observed in yeast or human NatC-depleted cells. Functional mapping of N-terminal residues thus has the potential to support the discovery of novel mechanisms of protein regulation or targeting.

Mitochondria play a central role in eukaryotic cells, and defects in their biogenesis are implicated in many serious human diseases. Most mitochondrial proteins are encoded in the nucleus, translated in the cytoplasm and directed to their final destination by N-terminal extensions called mitochondrial targeting sequences (MTSs). Efficient import is critical for mitochondrial function and for the success of gene therapies that replace a defective mitochondrial gene with a functional, non-mutated version in the nucleus. Through a systematic analysis of amino acid preferences in yeasts, we observed a strong and specific overrepresentation, at position 2 of MTSs, of hydrophobic residues that are known to prevent cleavage of the hydrophobic initiator methionine and to define potential substrates of the N-terminal acetyltransferase NatC. Therefore, most MTSs have two consecutive hydrophobic residues at positions 1 and 2, a feature that is conserved throughout evolution. Using CRISPR/Cas9 technology, we showed that mutation of mitochondrial proteins at position 2 reduced their import efficiency. We thus demonstrated that the residue at position 2 of the MTS is an important determinant of mitochondrial import and is under strong selective pressure. These findings may have valuable implications for improving the therapy of human mitochondrial diseases.

Funding: M.G. received funding from the Sorbonne University Emergence program, and from the ARC Foundation for Cancer Research ( https://www.fondation-arc.org/recherche-cancer , PJA 20171206624). A.D.-M. received a salary from Sorbonne University as part of the Emergence program. L.G. received a Master 2 scholarship from the Systems Biology Network of the Institute of Biology Paris-Seine. H.E.B. received a PhD grant from the doctoral school Complexité Du Vivant of Sorbonne University, and S.N.'s PhD was funded by the program for disabled students of the Centre National de la Recherche Scientifique. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Data Availability: Scripts used to perform analyses of amino acid usage bias are available at https://tinyurl.com/mr2r54x8 . Mass spectrometry proteomic data were deposited on the ProteomeXchange Consortium via the PRIDE partner repository with dataset ID PXD034922 ( http://www.ebi.ac.uk/pride/archive/projects/PXD034922 ). The microarray data and the related protocols are available at the ArrayExpress website with the dataset identifiers E-MTAB-11772 ( https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-11772/ ). All other relevant data are within the manuscript and its Supporting Information files.

Our work has revealed an unknown and critical feature of MTSs. Defects in mitochondrial biogenesis are implicated in a number of serious pathologies, including neuropathies, cardiovascular disorders, myopathies, neurodegenerative diseases and cancers [ 24 ]. In a long-term perspective, understanding the role of this signature for mitochondrial protein import is therefore crucial for human health. For example, this knowledge could be very useful for the design of mitochondrial gene therapies based on the targeting of specific proteins to the mitochondrial compartment [ 25 ]. Beyond the results obtained for the proteins of the mitochondrial compartment, we provide the community with the first functional mapping of N-terminal residues, whose future exploration will undoubtedly contribute to the discovery of novel mechanisms of protein regulation or targeting.

Most mitochondrial proteins are translated by cytoplasmic ribosomes and addressed to the mitochondrial compartment via a cleavable presequence, the so-called Mitochondrial Targeting Sequence (MTS). We found that, throughout the Saccharomycotina yeast lineage, MTS-bearing mitochondrial precursors present, just after the iMet, predominantly hydrophobic amino acids known to define potential NatC substrates (i.e., Leu, Phe, Ile, and Trp). Affinity-selective purification of the translating ribosomes then confirmed their co-translational recognition by this N-terminal acetyltransferase. Finally, we demonstrated the functional significance of the bias detected at position 2 of MTSs by site-directed mutagenesis of this position in a dominant negative allele of the essential mitochondrial protein Hsp60p. Only the amino acids found to be overrepresented at position 2 of MTSs in our in-silico analyses allowed the efficient mitochondrial import of the toxic Hsp60p, as revealed by the associated loss of cell viability.

This extensive knowledge on the general importance of protein N-terminal signatures sharply contrasts with the lack of data describing their potential implication in specific functional pathways. To address this issue, we have developed an in-silico approach at the proteome scale allowing for the systematic detection of functional biases in the use of the 20 amino acids at position 2 of proteins. The rationale of our approach is that, given the importance of the residue at position 2 in the life cycle of proteins, evolutionary constraints could have led to the selection of specific amino acids at this position in groups of proteins from the same cellular compartment or involved in the same biological pathway. As a proof of concept, we analyzed the S. cerevisiae proteome to statistically assess all significant and specific overrepresentations of one or more amino acids at position 2 in protein subsets sharing the same Gene Ontology (GO) annotations. Using this approach, we were able to identify various groups of proteins with common GO annotations and characterized by particular amino acid preferences at position 2. We hypothesized that these newly identified N-terminal signatures are likely associated with important functional roles. We further characterized, with a combination of in-silico analyses and experimental assays, the function of the strong position-specific bias detected at position 2 for mitochondrial proteins.

The biological importance of these N-terminal modifications is underlined by the strong defects observed when the corresponding enzymes are inactivated. Complete inactivation of MetAP activity is lethal in Escherichia coli, Salmonella typhimurium, and Saccharomyces cerevisiae, and the identification of human MetAP1 and MetAP2 as targets for putative anticancer drugs further confirmed the importance of this enzyme family [ 16 , 17 ]. In addition, N-terminal acetylation has been implicated in several diseases in humans, including cancers, developmental disorders and Parkinson’s disease [ 7 ], and in defects in photosynthesis and growth in plants [ 18 ]. At the molecular level, the nature of the second residue and the associated N-terminal modifications have been implicated in all major steps of the protein life cycle, including folding and aggregation, protein interactions and complex formation, protein subcellular targeting, and protein turnover through proteasomal degradation pathways (reviewed in [ 7 ]). In particular, during the last decades, the exploration of the N-end rule pathway, which links the nature of the N-terminus to in vivo protein stability, has emphasized the importance of protein N-end signatures and demonstrated that iMet cleavage and N-terminal acetylation are important players in the early control of the protein life cycle [ 15 , 19 – 22 ]. Finally, preferences for the use of amino acids at position 2, different from those observed globally in the proteome, have been described in various species, suggesting that selection pressures act on this particular protein position. In addition, these biases are taxon-specific, indicating progressive changes during evolution [ 23 ].

For instance, methionine aminopeptidases (MetAPs) remove the initiator methionine (iMet) in a large fraction of the nascent chains displaying an amino acid with a small radius of gyration (Ala, Cys, Gly, Pro, Ser, Thr, or Val) at position 2 [ 6 ]. In addition, N-terminal acetyltransferases (NATs) catalyze the irreversible covalent attachment of an acetyl group (CH3CO) to the free α-amino group (NH3+) at the protein N-terminus. Three major NATs, namely NatA, NatB and NatC, which acetylate nascent chains in a co-translational manner, are conserved from yeast to human [ 7 ]. The actual impact of the inactivation of NATs on the N-terminal acetylation of subsets of cellular proteins was measured by various proteomics techniques, eventually providing experimental data on the N-terminal acetylation status for up to 10% of the cellular proteome in yeast and in human [ 8 – 12 ]. All these studies led to the conclusion that NatA can acetylate the N-termini beginning with Ala, Cys, Gly, Ser, Thr, or Val after removal of iMet, whereas the other two can acetylate the uncleaved iMet if it is followed by a second specific residue, namely Asn, Asp, Gln, or Glu for NatB, and Ile, Leu, Phe, or Trp for NatC. Based on these substrate specificities together with the observed frequencies of the 20 amino acids at position 2 in proteomes and the partial N-terminal acetylome data, it was estimated that 80–90% of human and 50–70% of yeast proteins could potentially be acetylated by one of these three enzymes [ 7 , 13 , 14 ]. Additionally, methyl-, myristoyl-, and palmitoyltransferases can also selectively modify, during or after translation, the N-terminus of proteins after iMet cleavage [ 5 ]. Finally, after protein cleavage by diverse peptidases including MetAPs, arginine can be post-translationally added by arginyl-tRNA transferases at the N-terminal end of proteins exposing specific N-terminal residues such as Cys, Asp and Glu [ 15 ].
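As an illustration only, the position-2 rules summarized above can be encoded as a small lookup. The residue sets are those quoted in the text; the function name and the returned dictionary are hypothetical, and the output describes potential substrates rather than measured acetylation.

```python
# Residue sets taken from the paragraph above (one-letter codes).
METAP_CLEAVED = set("ACGPSTV")        # small radius of gyration: iMet usually removed
NATA_AFTER_CLEAVAGE = set("ACGSTV")   # exposed residue that NatA can acetylate
NATB_SECOND = set("NDQE")             # iMet kept, followed by Asn/Asp/Gln/Glu
NATC_SECOND = set("ILFW")             # iMet kept, followed by Ile/Leu/Phe/Trp


def predict_nterminal_fate(sequence: str) -> dict:
    """Predict iMet cleavage and the candidate NAT from the residue at position 2.

    Hypothetical helper: it only applies the rules quoted in the text.
    """
    if len(sequence) < 2 or sequence[0].upper() != "M":
        raise ValueError("expected a sequence starting with the initiator Met")
    second = sequence[1].upper()
    imet_cleaved = second in METAP_CLEAVED
    if imet_cleaved and second in NATA_AFTER_CLEAVAGE:
        candidate = "NatA"
    elif second in NATB_SECOND:
        candidate = "NatB"
    elif second in NATC_SECOND:
        candidate = "NatC"
    else:
        candidate = None  # e.g. Pro (cleaved, no NAT listed) or Lys (no rule quoted)
    return {"position2": second, "imet_cleaved": imet_cleaved, "candidate_nat": candidate}
```

A precursor starting with Met-Leu would thus be flagged as a potential NatC substrate with its iMet retained, which is the case the rest of this work focuses on.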

As soon as they emerge from the ribosomal tunnel, the protein nascent chains recruit factors that will play key roles in their life cycle [ 1 – 5 ]. Such recruitment is dependent on the nature of the protein N-terminal amino acid residues. In particular, the specific recognition and co-translational action of several N-terminal modification enzymes is determined by the type of the amino acid residue directly following the initiator methionine, namely the residue at position 2.

Results

Detection of functional biases in amino acid usage at position 2 in S. cerevisiae proteome

The amino acid usage at position 2 of S. cerevisiae proteins (Fig 1A) is clearly different from the average amino acid distribution in its proteome. Strikingly, we observed a serine at this position in nearly 25% of the yeast proteins. This overrepresentation is both highly significant (HGT score of 268, corresponding to a hypergeometric test p-value of 10^-268; see Methods for a definition of the HGT score) and specific to position 2 (aspecificity score of 0%, see Methods). By contrast, several amino acids are underrepresented (HGT score < -3 and aspecificity score < 5%), including the hydrophobic residues leucine, isoleucine, and tyrosine as well as charged or polar residues such as arginine, glutamate and glutamine. This asymmetric distribution, with the over-representation of serine, is conserved in budding yeasts (S1A Fig), as indicated by proteome analysis of 17 yeasts spanning nearly 400 million years of evolution of the Saccharomycotina lineage [26,27].
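The relationship between a hypergeometric p-value and an HGT score can be sketched as follows. This is a minimal sketch, not the authors' script: the text only states that an HGT score of 268 corresponds to p = 10^-268 and that negative scores mark under-representation, so the sign convention, the function name and the choice of counts (N, K, n, k) are assumptions.

```python
from math import comb, log10


def hgt_score(k: int, n: int, K: int, N: int) -> float:
    """Signed -log10(p) from a one-sided hypergeometric test.

    N: proteins in the reference set; K: those with the residue at position 2;
    n: proteins in the tested subset; k: those in the subset with the residue.
    Positive = over-representation, negative = under-representation
    (sign convention assumed here, not taken from the paper's Methods).
    """
    lowest, highest = max(0, n - (N - K)), min(n, K)
    if not lowest <= k <= highest:
        raise ValueError("k is impossible for these N, K, n")
    over = k * N >= n * K  # observed count at or above the expectation n*K/N
    support = range(k, highest + 1) if over else range(lowest, k + 1)
    # Exact integer arithmetic avoids float underflow for p-values like 1e-268.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in support)
    score = log10(comb(N, n)) - log10(tail)  # equals -log10(p)
    return score if over else -score
```

With this convention, a score above 3 corresponds to p < 10^-3 for an excess, and a score below -3 to p < 10^-3 for a deficit, matching the thresholds used in the text.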

Fig 1. Specific amino acids are over-represented at position 2 in S. cerevisiae proteome. (A) Serine is the most represented amino acid at position 2 in S. cerevisiae proteome. Frequencies (Barplot, left) of the 20 amino acids at position 2 (green bars) are compared to their average usage at any position in S. cerevisiae proteome (blue bars). Table (right) recapitulates quantitative values calculated to identify significant amino acid usage biases at position 2: position2/mean log2Ratio, HGT score obtained from hypergeometric tests, and aspecificity scores (see Methods). Amino acids for which the absolute value of the HGT score is greater than 3 (p-value < 10^-3) and the aspecificity score is lower than 5% are highlighted. Such values indicate a significant and specific bias of usage at position 2 revealing a possible selection pressure. (B) Several protein groups defined by their annotation in the Gene Ontology database have a specific N-terminal signature. (B1) GO term selection procedure illustrated on the particular case of N-terminal preferences for leucine. Twenty-eight GO terms describing components were extracted on the basis of their significant overrepresentation of leucine at position 2 assessed by hypergeometric test (HGT score > 4). After several filters, 7 of them were finally retained, including a single GO BestN term and 6 GO BestF terms (see more details in S1 Supplementary Methods and S2 Fig). (B2) Heatmap of HGT scores for the 6 BestN GO terms that best describe the GO terms selected in the "components" classification when applying this strategy to the 20 amino acids. Stars indicate significant HGT scores (*HGT>3, **HGT>4). (B3) Bar graph showing the percentages of utilization of the 20 amino acids at position 2 in all selected BestN and BestF GO terms in the "components" classification. Significant amino acid overrepresentations (HGT score > 3) are highlighted. Amino acids are sorted in decreasing order of use at position 2 in the proteome. https://doi.org/10.1371/journal.pgen.1010848.g001

We further developed a computational approach to explore the functional relevance of amino acid usage biases at the proteome scale and assess their statistical significance (Fig 1B1). We took advantage of the robust Gene Ontology annotations of the S. cerevisiae proteome to detect all significant and specific overrepresentations of one or more amino acids at position 2 in protein subsets defined by common GO annotations. In a first step, we scanned the diverse GO terms describing the components, the cellular pathways or the protein molecular functions and we calculated the frequencies of the 20 amino acids at position 2 in the associated protein subsets. We then identified the subsets displaying significantly higher frequencies than expected by the overall distribution of amino acids at position 2. We assessed the significance of the enrichments by calculating the HGT scores derived from p-values obtained in hypergeometric tests. With this strategy, we identified 232 GO terms with a significantly increased frequency (HGT>4) for at least one amino acid. These GO terms were distributed as follows in the different GO categories: 72 GO terms corresponding to components, 138 to cellular pathways and 22 to molecular functions (S1 Table). We designed a dedicated algorithm to reduce this initial list and eliminate information redundancy in each category (See S1 Supplementary Methods and S2 Fig). Rather than relying on the hierarchical relationships between the GO terms, as usually done by existing solutions [28], our algorithm directly analyzes the characteristics of the protein subsets defined by these GO terms, such as the number of proteins, the enrichment factors, and the overlaps between the subsets. 
In a first filtering step, we eliminate GO terms with the lowest frequency bias (<1.8 fold change) and corresponding to very generic components or processes, which typically include a very high number of proteins. Our algorithm then retains the GO terms encompassing the largest number of proteins and maximizing the coverage of the original dataset (called BestN GO terms, see Methods). When possible, it complements this set of BestN GO terms with one or more smaller GO terms displaying the highest position 2 frequency biases (called BestF GO terms, see Methods). Each BestF is ultimately linked to the nearest BestN based on the overlap of their respective sets of proteins. Fig 1B2 and 1B3 illustrate the results of this filtering for the GO category "components" and represent the amino acid usage biases observed in the 17 GO terms that were finally retained after processing the initial list of 72 GO terms. Results for the other two GO categories, "cellular pathways" and "molecular functions", are available as supplementary data (see S1 Table and S3 Fig in which the original lists were reduced from 138 to 21 and from 22 to 10 GO terms, respectively). The final list of GO terms "components" includes the 6 BestN GO terms "Ribonucleoprotein complex", "Chromosome", "Mitochondrion", "Nuclear pore central transport channel", "Extracellular Region" and "Storage vacuole" (Fig 1B2). These GO terms covered 85% of the proteins with the amino acid preferences associated with the 55 GO terms retained after the first filtering step that removed low bias generic GO terms. The 11 GO BestF terms (Fig 1B3) provided a more detailed picture of protein subsets with specific N-terminal amino acid usage. 
For example, "90S preribosome" (3.4-fold increase in glycine utilization at position 2) and "large cytosolic ribosomal subunit" (5.1-fold increase in alanine utilization at position 2) led to a much better characterization of the proteins involved in the bias observed in the more general GO term "ribonucleoprotein complex". Similarly, the "Kinetochore" proteins (4.4-fold increase in aspartate utilization at position 2) explain a large part of the aspartate bias detected in the GO term "Chromosome" since they account for 12 of the 21 proteins responsible for the detection of this bias. The BestF GO terms also pointed to several mitochondrial sub-compartments with high preferences for leucine at position 2 (4.0- to 13.8-fold increase in leucine usage), suggesting that this bias involved only a specific subset of the mitochondrial proteins localized in the "Mitochondrial inner membrane" and the "Mitochondrial matrix". The physiological significance of the amino acid utilization biases detected remains to be elucidated, and some of them might be related to the activity of N-terminal modifying enzymes such as MetAPs or NATs. One of the strongest biases revealed by our approach is the dramatic overrepresentation of leucine at position 2 in GO terms related to the mitochondrial compartment. In this work, we sought to clarify, in S. cerevisiae, the functional significance of usage bias at position 2 of mitochondrial precursors and its potential relationship with NatC.
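The redundancy filter described above can be pictured as a greedy coverage problem followed by an overlap-based linking step. The sketch below is a simplified stand-in for the algorithm of S1 Supplementary Methods, which is not reproduced here: only the 1.8-fold threshold comes from the text, while the function name, the `target_coverage` stop rule, the linking criterion and the toy data (protein identifiers and fold changes) are all assumptions.

```python
def select_go_terms(terms, min_fold=1.8, target_coverage=0.8):
    """Pick broad 'BestN' terms greedily, then attach sharper 'BestF' terms.

    terms: {name: {"proteins": set of protein ids carrying the preferred
                   residue at position 2, "fold": enrichment fold change}}
    min_fold comes from the text; target_coverage is a hypothetical stop rule.
    """
    kept = {name: t for name, t in terms.items() if t["fold"] >= min_fold}
    universe = set()
    for t in kept.values():
        universe |= t["proteins"]

    best_n, covered = [], set()
    while len(covered) < target_coverage * len(universe):
        # Greedy step: take the term adding the most not-yet-covered proteins.
        name = max(kept, key=lambda n: len(kept[n]["proteins"] - covered))
        if not kept[name]["proteins"] - covered:
            break
        best_n.append(name)
        covered |= kept[name]["proteins"]

    best_f = {}
    for name, t in kept.items():
        if name in best_n or not best_n:
            continue
        # Nearest BestN = the one sharing the most proteins with this term.
        parent = max(best_n, key=lambda b: len(t["proteins"] & kept[b]["proteins"]))
        shared = t["proteins"] & kept[parent]["proteins"]
        if shared and t["fold"] > kept[parent]["fold"]:
            best_f[name] = parent
    return best_n, best_f


# Invented toy input: identifiers and fold changes are placeholders.
toy_terms = {
    "mitochondrion": {"proteins": {"p1", "p2", "p3", "p4", "p5", "p6"}, "fold": 2.0},
    "mitochondrial matrix": {"proteins": {"p1", "p2", "p3"}, "fold": 5.0},
    "chromosome": {"proteins": {"p7", "p8", "p9", "p10"}, "fold": 2.5},
    "kinetochore": {"proteins": {"p7", "p8"}, "fold": 4.4},
    "cytoplasm": {"proteins": {"p1", "p7", "p11", "p12"}, "fold": 1.2},
}
best_n, best_f = select_go_terms(toy_terms)
```

On this toy input the low-bias "cytoplasm" term is discarded, the two broad terms are kept as BestN, and each smaller, more strongly biased term is linked to the broad term it overlaps with.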

The N-terminal Mitochondrial Targeting Sequence has a specific conserved signature at position 2 typical of NatC potential substrates

We first investigated whether the high overrepresentation of leucine at position 2 of the mitochondrial precursors could correlate with other specific features in the amino acid composition of their N-terminal region. In particular, precursors of matrix, inner membrane and intermembrane space proteins depend for their mitochondrial import on an N-terminal sequence forming an amphiphilic alpha helix, 15–50 residues long, enriched in hydrophobic and positively charged residues [29,30]. Such a mitochondrial targeting sequence, called MTS, is observed in 361 of the 726 yeast proteins annotated as mitochondrial with high confidence (S2 Table, see Methods). Consistently, the 361 mitochondrial precursors harboring an MTS showed a dramatic overrepresentation of arginine and an underrepresentation of negatively charged residues in positions 3 to 20 of their N-terminal region (Fig 2A1). This profile contrasted strongly with that of mitochondrial precursors lacking an MTS (Fig 2A2). Strikingly, we found that the bias at position 2 described above for the proteins associated with mitochondrial GO terms was specific to mitochondrial proteins with an MTS (Fig 2A1). More precisely, four hydrophobic residues (Leu, Phe, Trp and Ile) are significantly overrepresented (HGT value ranging from 3.9 for isoleucine to 71 for leucine) at position 2 of proteins with an MTS (Fig 2A1). Furthermore, it should be noted that these biases in favor of Leu, Phe, Trp and Ile are strictly restricted to position 2 of the MTS.

Fig 2. The specific N-terminal preference for large hydrophobic amino acids in MTSs makes them putative substrates of NatC. (A) Heatmap of HGT scores showing specific uses of amino acids in MTSs, including the previously unknown specific preference at position 2 for large hydrophobic amino acids. HGT scores were calculated to compare amino acid usage at the first 20 positions of mitochondrial precursors with (A1) or without (A2) an MTS with their average usage in the proteome at these same positions. (B) Graphical representation of amino acid frequencies at position 2 of mitochondrial proteins revealing that a large majority of those carrying an MTS are putative substrates of NatC. The following amino acids are significantly overrepresented in position 2 of MTS compared to the proteome: Leu (p-val = 10^-71), Phe (p-val = 10^-15), Trp (p-val = 10^-6) and Ile (p-val = 10^-4). (C) Heatmap of HGT scores showing that amino acid preferences at position 2 of MTSs are conserved in 17 budding yeasts of the Saccharomycotina lineage. In each species, HGT scores were calculated to compare the use of amino acids at position 2 of MTSs with their average use at the same position in the corresponding proteome. The overrepresentation of large hydrophobic amino acids, especially leucine, is observed in all analyzed species. The phylogenetic tree was modified from [26]. https://doi.org/10.1371/journal.pgen.1010848.g002

Quantitative analysis of the distribution of amino acids at position 2 of the MTSs (Fig 2B) shows the importance of this previously unknown N-terminal signature of the mitochondrial addressing sequence: nearly 60% of them have an N-terminal signature with a Leu, Phe, Trp or Ile, which contrasts strongly with the low representation of these residues at position 2 in the proteome (see L, F, I, W residue frequencies in Fig 1A). 
For instance, leucine accounts for more than 35% of the amino acids observed at position 2 of the MTSs, whereas it is poorly used in the S. cerevisiae proteome at this position (Fig 1A). Interestingly, we also found an overall significant under-representation at position 2 of the MTSs of amino acids with small radii of gyration (Gly, Ala, Ser, Cys, Thr, Pro and Val), known to induce the cleavage of iMet by methionine aminopeptidase [6]. Finally, the analysis of amino acid usage biases in the MTSs of 17 budding yeasts of the Saccharomycotina lineage [26,27], demonstrated that this specific signature at position 2 is a conserved characteristic of mitochondrial addressing sequences (Fig 2C). Indeed, in all the studied species, we observed: (1) a strong over-representation of the same hydrophobic amino acids at position 2 and in particular of leucine whose HGT score exceeds the value of 20 in all species, and (2) a bias restricted to position 2 since the HGT scores of leucine, phenylalanine, isoleucine and tryptophan were systematically lower than 3.9 for all the other positions of the MTSs (S1B Fig). Interestingly, the conserved amino acid bias described above at position 2 of MTSs perfectly correlated with the known specificities of N-acetyl transferases in yeast (Fig 2B). Indeed, leucine, phenylalanine, isoleucine and tryptophan, which are overrepresented at this position, are known to define potential substrates of NatC in yeast when positioned after iMet [9]. Conversely, serine, alanine and threonine, which are the most under-represented at this position, are known to induce iMet cleavage and subsequent targeting by NatA [8]. Together, these observations strongly suggest that position 2 of N-terminal mitochondrial targeting sequences is under selective pressure and that the nature of the residue at position 2 is crucial for the fate of mitochondrial precursors imported through the MTS-dependent pathway. 
The correspondence between the amino acid usage bias at position 2 of MTSs and the preferences of NatC suggests a link between the fate of mitochondrial precursors and NatC targeting. These observations raise two important questions, which are addressed in the following sections. First, are mitochondrial precursors indeed targeted by the N-terminal acetyltransferase NatC? And second, what are the functional consequences of changing the residue located at position 2 of a typical MTS?
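The position-specificity argument made in this section (a bias for Leu, Phe, Ile and Trp at position 2 that is absent from neighboring positions) rests on a per-position tally of residues across a set of N-terminal sequences. A minimal sketch of such a tally is given below; the function name is hypothetical and the four sequences are invented toy strings, not real yeast precursors.

```python
def residue_profile(sequences, residues="LFIW", n_positions=20):
    """Fraction of sequences carrying one of `residues` at each position.

    Returns a list whose index 0 is position 1 (the initiator Met).
    Sequences shorter than a given position are ignored for that position.
    """
    wanted = set(residues.upper())
    profile = []
    for pos in range(n_positions):
        column = [seq[pos].upper() for seq in sequences if len(seq) > pos]
        hits = sum(1 for aa in column if aa in wanted)
        profile.append(hits / len(column) if column else 0.0)
    return profile


# Invented toy N-termini, not real yeast precursors.
toy_mts = ["MLSRR", "MFARK", "MSLRR", "MLRRA"]
profile = residue_profile(toy_mts, n_positions=5)
```

Comparing such a profile between MTS-bearing precursors and the whole proteome, position by position, is what the heatmaps of Fig 2A summarize with HGT scores instead of raw fractions.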

---
[1] Url: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1010848

Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.
