(C) PLOS One
This story was originally published by PLOS One and is unaltered.



Integration of estimated regional gene expression with neuroimaging and clinical phenotypes at biobank scale [1]

['Nhung Hoang', 'Department Of Computer Science', 'Vanderbilt University', 'Nashville', 'Tennessee', 'United States Of America', 'Neda Sardaripour', 'Department Of Biomedical Engineering', 'Grace D. Ramey', 'Biological']

Date: 2024-09

An understanding of human brain individuality requires the integration of data on brain organization across people and brain regions, molecular and systems scales, as well as healthy and clinical states. Here, we help advance this understanding by leveraging methods from computational genomics to integrate large-scale genomic, transcriptomic, neuroimaging, and electronic-health record data sets. We estimated genetically regulated gene expression (gr-expression) of 18,647 genes, across 10 cortical and subcortical regions of 45,549 people from the UK Biobank. First, we showed that patterns of estimated gr-expression reflect known genetic–ancestry relationships, regional identities, as well as inter-regional correlation structure of directly assayed gene expression. Second, we performed transcriptome-wide association studies (TWAS) to discover 1,065 associations between individual variation in gr-expression and gray-matter volumes across people and brain regions. We benchmarked these associations against results from genome-wide association studies (GWAS) of the same sample and found hundreds of novel associations relative to these GWAS. Third, we integrated our results with clinical associations of gr-expression from the Vanderbilt Biobank. This integration allowed us to link genes, via gr-expression, to neuroimaging and clinical phenotypes. Fourth, we identified associations of polygenic gr-expression with structural and functional MRI phenotypes in the Human Connectome Project (HCP), a small neuroimaging-genomic data set with high-quality functional imaging data. Finally, we showed that estimates of gr-expression and magnitudes of TWAS were generally replicable and that the p-values of TWAS were replicable in large samples. Collectively, our results provide a powerful new resource for integrating gr-expression with population genetics of brain organization and disease.

Funding: This work was supported by the National Institutes of Health (1RF1MH125933 to MR; R35GM127087 to JAC; R01HG011138 to ERG; R01GM140287 to ERG) and the National Science Foundation (2207891 to MR). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2024 Hoang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Much of human neuroscience seeks to understand the biological basis of individual variation in brain organization [1–6]. Studies have shown that this variation is stable over time [7,8], predicts function or behavior [9,10], and can act as a fingerprint of healthy [11,12] and diseased [13,14] brain states. They have also shown that much of this variation is strongly heritable and therefore genetically encoded [15–18]. Separately, complementary studies have shown the presence of correlated variation in gene expression and neural organization across brain regions [19–27]. Collectively, this literature motivates the need for integrative analyses of brain individuality across people and brain regions.

Such integrative analyses ultimately require data on genomes, brain-wide gene expression, as well as neuroimaging and clinical phenotypes in the same human populations. Correspondingly, such analyses are hampered, at present, by the lack of these multifaceted data. Instead, the genetic basis of individual variation in neuroimaging phenotypes is primarily investigated with genome-wide association studies (GWAS) [16–18,28–32]. Prominent examples of these studies have used data from the ENIGMA Consortium [33,34], the UK Biobank [35–37], and the ABCD Project [38]. These studies have linked variation in phenotypes to single-nucleotide polymorphisms (SNPs), variants of DNA base pairs at specific positions in the genome. Strengths of these studies include the ability to scan whole genomes and to directly discover nucleotide-level underpinnings of neuroimaging phenotypes. Limitations of these studies include the inability to disambiguate correlated association patterns of adjacent SNPs (known in genetics as linkage disequilibrium) and, more generally, to identify biological mechanisms of variation in neuroimaging phenotypes. They also include the need to test millions of associations (1 test for each pair of SNP and phenotype) and the consequent burden on statistical power necessitated by stringent correction for these many tests. In practice, robust GWAS for many complex phenotypes, such as height or blood pressure, can require samples from millions of people [39–41]. The costs of imaging the brain, however, make it impossible to acquire samples of this size in neuroimaging research [42]. Collectively, these limitations have left gaps in existing analyses of human brain individuality.

Here, we help to bridge these gaps by estimating genetically regulated gene expression, or gr-expression, across cortical and subcortical brain regions. Gene expression is regulated by multiple genetic and environmental factors. Our estimation focuses on one of these factors, genetically encoded elements that are close to the gene along the linear genome (cis-genetic regulation) [43]. We do not consider other factors, including genetically encoded elements far from the gene (trans-genetic regulation), as well as environmental factors. The genetics literature includes a variety of methods for estimating regional gr-expression from genetic data [44,45]. Our study uses Joint-Tissue Imputation, a state-of-the-art method that trains linear regression models of gr-expression on directly measured gene expression from postmortem samples [43].

We used this estimated gr-expression to perform transcriptome-wide association studies or TWAS. We specifically associated Joint-Tissue estimates of gr-expression with neuroimaging phenotypes and brain-related clinical phenotypes. TWAS follow the same methodology as GWAS, except that they link variation of neuroimaging phenotypes to regionally specific gr-expression of genes, rather than to regionally agnostic variation of SNPs. TWAS have several advantages over GWAS: they integrate signals across multiple SNPs, provide interpretable results at the level of genes, are less susceptible to linkage disequilibrium, and require many fewer statistical tests. However, TWAS are also limited to genes with available estimates of regional gr-expression and, like GWAS, are ultimately association studies that cannot alone establish causal effects of genes on phenotypes.

TWAS are common in the wider genomics literature [44–48] but, despite their advantages, are rare in neuroimaging genomics. We hypothesize that one major reason for their lack of adoption lies in the relatively theoretical nature of their appeal to neuroimaging researchers. First, the indirect nature of estimated gr-expression can make it difficult to relate this quantity to directly assayed gene expression of regional transcriptomic studies. Second, the similarly indirect nature of TWAS can make it difficult to ascertain the practical advantages of these studies relative to the more established GWAS. For example, the few existing TWAS of neuroimaging phenotypes in the literature [49–52] have not benchmarked these analyses against GWAS. Third, and related to these limitations, the field lacks integrated resources that link associations of regional estimates of gr-expression and SNPs on the one hand, to neuroimaging and clinical phenotypes on the other hand.

We propose that overcoming these limitations can help facilitate the adoption of TWAS in neuroimaging genomics. Here, we help to do so by using estimated gr-expression to integrate large-scale genomic, transcriptomic, neuroimaging, and clinical data sets. First, we showed that patterns of estimated gr-expression recapitulate brain regional identities and inter-regional correlation structure of directly assayed gene expression. Second, we used these estimates to perform TWAS of gr-expression and gray-matter volumes in the UK Biobank data set [35–37]. We directly benchmarked these TWAS against GWAS to show broad similarities but also important differences in the interpretability and statistical power of these approaches. Third, we integrated our results with an independent TWAS of brain-related clinical phenotypes from BioVU, the Vanderbilt Biobank [53]. This integration linked SNPs and genes to neuroimaging and clinical phenotypes through associations with estimated gr-expression. Fourth, we built polygenic models of gr-expression to discover associations of gr-expression with neuroimaging phenotypes in the Human Connectome Project (HCP) [54], a small neuroimaging-genomic data set with high-quality functional imaging data. Finally, we showed that estimates of gr-expression were replicable in an independent data set. We also showed that magnitudes of TWAS were generally replicable while p-values of TWAS were replicable in large samples of the UK Biobank. We developed a browser-based application for interactive exploration of our multifaceted association results. Collectively, our analyses help to facilitate the adoption of TWAS in neuroimaging genomics.

Results

Estimation of genetically regulated gene expression across brain regions at biobank scale

We used Joint-Tissue Imputation [43], a recently developed state-of-the-art method from computational genomics, to estimate the genetically regulated expression of 18,647 genes across 10 cortical and subcortical brain regions for 45,549 people from the UK Biobank (64 ± 7.7 years old, 52% female) and 657 people in the HCP (29 ± 3.6 years old, 52% female).

Joint-Tissue Imputation models estimate genetically regulated gene expression (gr-expression) as a weighted linear combination of SNPs that are close to the gene of interest along the linear genome. These models learn weights for each tissue–gene pair by training on genetic sequences and directly measured gene expression from postmortem samples (Fig 1A). Joint-Tissue Imputation leverages shared patterns of genetic regulation across brain regions to improve the estimation of gr-expression in individual regions. In this way, this method extends and generalizes PrediXcan, a pioneering estimation method that models gr-expression by training models only on expression data from the brain region of interest [55].
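The weighted-sum form of these models can be illustrated with a minimal sketch. The function name, SNP identifiers, weights, and dosages are hypothetical and chosen only for illustration; trained Joint-Tissue Imputation weights come from the published models, and treating a missing SNP as a dosage of zero is a simplifying assumption of this sketch.

```python
def estimate_gr_expression(dosages, weights, intercept=0.0):
    """Estimate gr-expression of one gene in one region for one person.

    dosages: dict mapping SNP id -> allele dosage (0, 1, or 2).
    weights: dict mapping SNP id -> model weight for this gene-region pair.
    SNPs missing from `dosages` are treated as dosage 0 (an assumption).
    """
    return intercept + sum(w * dosages.get(snp, 0.0) for snp, w in weights.items())


# Hypothetical weights for one gene-region model and dosages for one person.
weights = {"rs_a": 0.5, "rs_b": -0.25, "rs_c": 0.0}
person = {"rs_a": 2, "rs_b": 1, "rs_c": 0}

gr = estimate_gr_expression(person, weights)
print(gr)
```

A weight of exactly zero, as for `rs_c`, mimics the sparsity that regularized regression tends to produce: such SNPs do not contribute to the estimate.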

Fig 1. Estimation of genetically regulated gene expression from genetic data. (A) Pipeline for estimation of gr-expression with Joint-Tissue Imputation. Left: Joint-Tissue Imputation models are trained on genetic sequences and directly assayed gene expression from postmortem brain samples in the GTEx and PsychENCODE projects. Center: The models are trained to estimate gr-expression as a weighted sum of SNPs that are close to the gene of interest along the linear genome. The estimation includes elastic-net regularization because the number of these SNPs typically exceeds the number of samples in the training data. Right: The trained models were used to estimate gr-expression from genetic sequences of neuroimaging-genomic samples in the UK Biobank and the HCP. (B) An illustration of the 10 cortical and subcortical regions with available models of gr-expression. Numbers in parentheses refer to all models that passed baseline performance thresholds for the prediction of observed gene expression on held-out data (r2 > 0.01 and p FDR < 0.05). (C, D) Predictive performance of gr-expression models on held-out data from the GTEx data set. (C) Histograms of r2, the variance of directly assayed gene expression explained by estimated gr-expression. (D) Histograms of p-values (−log 10 p FDR ) on these r2 values. Regions are colored as in panel B. FDR, false discovery rate; GTEx, Genotype-Tissue Expression Project; HCP, Human Connectome Project; SNP, single-nucleotide polymorphism. https://doi.org/10.1371/journal.pbio.3002782.g001

In our study, we used Joint-Tissue Imputation models that were previously trained on whole-genome sequences and gene-expression data from 838 brain samples in the Genotype-Tissue Expression Project (GTEx) [56]. The samples comprise 10 cortical and subcortical regions (Fig 1B). To test the replicability of our analyses, we additionally used the same models trained on sequencing and expression data from 415 independent samples of the dorsolateral prefrontal cortex (DLPFC) in the PsychENCODE Project [57]. Collectively, we considered 94,345 Joint-Tissue Imputation models, or all performant brain-regional models currently available in the literature.

Joint-Tissue Imputation models have been extensively validated in previous work [44–48]. This validation included quantifying the relationship of gr-expression to directly assayed expression. In this study, we adopted all models of gr-expression that passed baseline performance thresholds for the prediction of observed gene expression on held-out data (r2 > 0.01 and p FDR < 0.05). In practice, the predictive performance of gr-expression models spanned a wide range (Fig 1C and 1D). Low predictive performance does not necessarily mean that the models are inaccurate because the genetic regulation of gene expression—the upper bound on predictive performance—varies considerably for individual genes. Moreover, relatively low associations between gr-expression and assayed expression are more than offset by gains in statistical power of transcriptome-wide association analyses, as we describe below.
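The baseline filter (r2 > 0.01 and p FDR < 0.05) can be sketched as follows. The sketch assumes the Benjamini-Hochberg procedure for the FDR adjustment, which is a common choice but is not specified in the text above; the gene names and values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def passes_thresholds(r2, p_fdr, r2_min=0.01, alpha=0.05):
    """Apply the two baseline performance thresholds to one model."""
    return r2 > r2_min and p_fdr < alpha


# Hypothetical models: (gene, held-out r2, unadjusted p-value).
models = [
    ("GENE_A", 0.20, 0.001),
    ("GENE_B", 0.005, 0.002),
    ("GENE_C", 0.05, 0.04),
    ("GENE_D", 0.03, 0.5),
]
p_fdr = benjamini_hochberg([p for _, _, p in models])
kept = [name for (name, r2, _), q in zip(models, p_fdr) if passes_thresholds(r2, q)]
print(kept)
```

In this toy input, one model fails on r2 alone and another passes the nominal p-value threshold but not the adjusted one, which shows why both criteria are applied together.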

Genetically regulated gene expression recapitulates the organization of directly assayed gene expression

We began by testing the extent to which gr-expression recapitulated existing knowledge of genetic-ancestry relationships, brain-regional identities, as well as inter-regional correlations of directly assayed gene expression. First, we tested if gr-expression patterns reflected known genetic-ancestry relationships from the ethnically diverse sample of the UK-Biobank cohort (Methods, S1 Table). Genetic ancestry denotes genetic commonalities within groups of people but does not necessarily reflect genealogical ancestry (family lines) or self-reported ethnicity. We followed standard practice to estimate genetic ancestry using principal component analysis of gene data. We specifically used principal component analysis to generate low-dimensional embeddings of brain-wide gr-expression from each person (using the people × [brain-wide gr-expression] matrix). As expected, this analysis partitioned people into clusters of African, Asian, and European populations with gradients between these clusters reflecting known patterns of genetic admixture (Fig 2A). This embedding reflects patterns of genetic ancestry that are known and were previously described in analyses of genetic-sequence data [58].

Fig 2. Genetic ancestry, regional identity, and inter-regional organization of estimated gr-expression. (A, B) Principal component embeddings of estimated gr-expression from the ethnically diverse sample of the UK-Biobank cohort (S1 Table). (A) An embedding of brain-wide gr-expression: scatter plots of principal components of the people × [brain-wide gr-expression] matrix, where people denote people from the UK-Biobank sample and brain-wide gr-expression denotes brain-wide estimates of gr-expression for all genes that had Joint-Tissue Imputation models for each of the 10 regions. (B) An embedding of regional gr-expression: scatter plots of principal components of the regions × [regional gr-expression] matrix where regions denote the 10 regions of people from the UK-Biobank sample and regional gr-expression denotes regional estimates of gr-expression for all genes that had Joint-Tissue Imputation models for each of these regions. (C–K) A 3 × 3 matrix of plots of inter-regional coexpression: correlations between directly assayed expression and estimated gr-expression. The first row and column show results on directly assayed gene expression data from the Allen Human Brain Atlas. The second row and column show results on directly assayed gene expression data from the GTEx project. The third row and column show results on estimated gr-expression from the ethnically diverse sample of the UK-Biobank cohort. (C, G, K) Associations between inter-regional coexpression and Euclidean distance in each data set. (D, E, H) Associations between inter-regional coexpression across data sets. P-values denote the probability of obtaining coexpression of at least equal magnitude in data with preserved correlation coefficients between coexpression and Euclidean distance (estimated from 10,000 random samples). (F, I, J) Heatmaps of inter-regional coexpression, averaged across people in each data set (regional numbers follow numbers in panel B). DLPFC, dorsolateral prefrontal cortex; GTEx, Genotype-Tissue Expression Project. https://doi.org/10.1371/journal.pbio.3002782.g002

Second, we tested if gr-expression patterns reflected regional brain identities across people in the same sample. For this analysis, we generated principal component embeddings of individual region-specific gr-expression (using the regions × [regional gr-expression] matrix). This analysis partitioned gr-expression into well-delineated regional clusters and revealed anatomically interpretable groups of cortical, limbic, and basal ganglionic clusters (Fig 2B). Collectively, these results show that gr-expression simultaneously reflects genetic-ancestry identities across people and brain-regional identities within people. They imply, specifically, that associations of gr-expression, or TWAS, can capture variation across people, similarly to GWAS, as well as variation across regions, similarly to regional transcriptomic studies.

Third, we compared inter-regional correlations of estimated gr-expression to inter-regional correlations of directly assayed expression data from the Allen Human Brain Atlas and the GTEx Project. Recent studies have shown that inter-regional coexpression exponentially decays as a function of inter-regional distance [59,60]. We reproduced these relationships by showing strong inverse nonlinear relationships between inter-regional coexpression in the Allen and GTEx data and Euclidean distance: Allen versus distance r spearman = −0.711 and GTEx versus distance r spearman = −0.721 (Fig 2C and 2G). We found a similar, albeit weaker, relationship in the estimated gr-coexpression data: UK Biobank versus distance r spearman = −0.480 (Fig 2K). More directly, we found strong linear relationships between the inter-regional coexpression in the Allen and GTEx data: Allen versus GTEx r pearson = 0.683 (Fig 2D).
We found similar relationships between estimated and directly assayed inter-regional coexpression: UKB versus Allen r pearson = 0.613 and UKB versus GTEx r pearson = 0.861 (Fig 2E and 2H). Heatmaps of all coexpression patterns reflected associations between cortical, basal ganglionic, and other subcortical systems (Fig 2F, 2I and 2J). Finally, we showed that the relationship of coexpression with distance was not sufficient to explain these similarities of coexpression (p ≤ 0.005 for all tests). Collectively, these results provide multifaceted support for the biological validity, anatomical interpretability, and practical utility of estimated gr-expression. In this way, they establish a foundation for the use of gr-expression in neuroimaging TWAS.
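The choice of Spearman correlation for the coexpression-distance relationship, and Pearson correlation for comparisons between data sets, can be illustrated with a short sketch. Spearman correlation is the Pearson correlation of ranks, so it captures monotonic but nonlinear relationships such as exponential decay. The distances and coexpression values are hypothetical, and the helper names are not taken from the study's code.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical region pairs: distance (mm) and coexpression that decays nonlinearly.
distance = [10.0, 20.0, 40.0, 80.0]
coexpression = [0.9, 0.6, 0.35, 0.3]

rho = spearman(distance, coexpression)
r = pearson(distance, coexpression)
print(rho, r)
```

For a strictly decreasing but nonlinear relationship like this one, the rank-based coefficient reaches its extreme value while the linear coefficient does not.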

TWAS link genetically regulated gene expression with regional gray-matter volumes

We hypothesized that the integration of multiple SNPs into models of regional gr-expression would allow us to detect novel and neurobiologically meaningful associations. To test this hypothesis, we performed TWAS to identify associations between individual variation of regional gr-expression and gray-matter volumes (Fig 3A). Gray-matter volumes are heritable phenotypes that have been linked to many genetic variants in previous GWAS [16,17,31]. We focused our association studies on 8 regions with available FreeSurfer [61] segmentations and therefore excluded substantia nigra and hypothalamus from subsequent analyses (see Methods for regional definitions).

Fig 3. Variation of regional gr-expression and regional volumes across people. (A) A pipeline for transcriptome-wide association studies, or TWAS, of neuroimaging phenotypes. The inputs to TWAS comprise values of regional gr-expression (left) and regional phenotypes (right), estimated in the same people. The outputs are associations between the individual variation of regionally specific gr-expression and neuroimaging phenotypes across people (center). (B) Within-regional associations of gr-expression and gray-matter volumes for 2 representative regions. Each point denotes an association between the individual variation of gr-expression and volume in the same region. The horizontal axis shows the chromosome location of individual genes. The vertical axis shows the p-values (–log 10 p) of associations. Solid-color points represent associations that pass the thresholds of p FDR = 0.05 or p Bonferroni = 0.05 (horizontal lines). Source data can be found in S2 Table. (C) Associations between SNP-based GWAS and gene-based TWAS for 2 representative regions. Left: Scatter plots of p-values (–log 10 p) for associations of all genes and SNPs. These plots preserve all genes and SNPs but lack the one-to-one relationship between genes and SNPs. Right: Corresponding scatter plots for the best-performing genes and SNPs. Each gene in TWAS matches with its best-performing SNP in GWAS. Similarly, each SNP in GWAS matches with its best-performing gene in TWAS. These plots show one-to-one relationships but exclude many genes and SNPs. (D) Numbers of associations (p FDR < 0.05 or p Bonferroni < 0.05) detected with TWAS and GWAS. Solid colors denote numbers of associations detected with TWAS alone. Beige colors denote number of genes detected with GWAS alone. Stripe patterns denote numbers of genes detected with both TWAS and GWAS. The top bar for each region adopts an FDR correction for TWAS associations (p FDR < 0.05), while the bottom bar adopts a stricter Bonferroni correction (p Bonferroni < 0.05). (E, F) Enrichment analyses of TWAS for biological annotations in the NHGRI-EBI GWAS Catalog. (E) Enrichment for biological annotations of genes whose gr-expression predicted regional volumes (p FDR < 0.05). Each point represents a biological annotation associated with at least 1 gene. The horizontal axis shows the p-values (–log 10 p FDR ) of individual annotations. Source data can be found in S3 Table. (F) Relationship between p-values and brain-relatedness of biological annotations. The horizontal axis shows bins of p-values (–log 10 p FDR ). The vertical axis shows the fraction of brain-related annotations within each bin. The p-value on the correlation coefficient was computed by permuting the annotations (estimated from 10,000 random samples). (G, H) Heatmaps of inter-regional TWAS between gr-expression and regional volumes. (G) Absolute numbers of associations. Numbers of genes whose gr-expression in 1 region (columns) predicted (p FDR < 0.05) the volume of another region (rows). Source data can be found in S4 Table. (H) Overlap coefficients. Number of genes that were common to both intra-regional and inter-regional associations in G, normalized by the size of the smaller of the intra- and inter-regional gene sets. FDR, false discovery rate; GWAS, genome-wide association studies; SNP, single-nucleotide polymorphism; TWAS, transcriptome-wide association studies. https://doi.org/10.1371/journal.pbio.3002782.g003

Our first TWAS inferred associations between gray-matter volumes and gr-expression of the same regions. To minimize the confounders of genetic ancestry, we restricted our analyses to the “White British” sample of the UK-Biobank cohort (S1 Table) [37]. We therefore performed TWAS on 39,565 people (52.2% female, 64.3 ± 7.7 years old), with covariates of genetic ancestry, sex, and age (Methods).
We identified 1,065 associations (of 778 unique genes) between gr-expression and the volumes of 8 brain regions (p FDR < 0.05, Fig 3B and S1 and S2 Tables). The number of regional associations varied from 68 genes in the amygdala to 205 genes in the cerebellar hemisphere. Many genes that were found in this analysis, including CRHR1, ARL17A, NSF, and OGFOD2, have been implicated in previous GWAS of regional brain volumes, and have also been linked to brain disorders, including epilepsy, schizophrenia, and brain cancer [62–64].
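A single gene-level test in such a TWAS can be sketched as a regression of a regional volume on gr-expression with adjustment for covariates. The sketch below is an illustration under stated assumptions rather than the study's pipeline: it assumes ordinary least squares, uses a single covariate (age) instead of the full set of genetic ancestry, sex, and age, estimates only the effect size (no p-value), and relies on the standard residualization shortcut in which both variables are first regressed on the covariate. All data are synthetic.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def residualize(values, covariate):
    """Remove the linear effect of one covariate from `values`."""
    intercept, slope = ols_fit(covariate, values)
    return [v - (intercept + slope * c) for v, c in zip(values, covariate)]


def twas_effect(gr_expression, volume, covariate):
    """Covariate-adjusted effect of gr-expression on volume.

    Regressing residualized volume on residualized gr-expression gives the
    same slope as a multiple regression that includes the covariate.
    """
    expr_res = residualize(gr_expression, covariate)
    vol_res = residualize(volume, covariate)
    return ols_fit(expr_res, vol_res)[1]


# Synthetic data: volume depends on gr-expression (effect 2.0) and on age.
age = [55.0, 60.0, 65.0, 70.0, 75.0]
gr_expression = [0.2, -0.1, 0.4, 0.0, 0.3]
volume = [2.0 * g - 0.05 * a + 10.0 for g, a in zip(gr_expression, age)]

beta = twas_effect(gr_expression, volume, age)
print(beta)
```

Repeating such a test for every gene-region model, and then correcting the resulting p-values for multiple testing, yields the transcriptome-wide set of associations.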

TWAS reinforce GWAS associations and discover novel associations

To show the methodological advantages of gene-based TWAS, we directly compared these studies to SNP-based GWAS. We made this comparison in 3 complementary ways.

Direct relationship to GWAS. First, we performed a GWAS on the same sample and compared our TWAS associations for individual genes to GWAS results for the SNPs that formed part of corresponding models of gr-expression. These comparisons were dominated by many-to-many relationships between genes and SNPs, because several SNPs typically associate with the gr-expression of a single gene, and similarly, a single SNP can contribute to the gr-expression of several genes. The correlations between GWAS and TWAS p-values were moderate but statistically significant (0.275 ≤ r spearman ≤ 0.373, p < 0.001 for all regions, Fig 3C left, S1 Fig). To focus on the strongest TWAS and GWAS signals, we filtered these data in a way that retained the lowest p-value SNP for each gene and, simultaneously, the lowest p-value gene for each SNP. This process resulted in much stronger and strictly one-to-one relationships (0.479 ≤ r spearman ≤ 0.583, p < 0.001 for all regions, Fig 3C right and S1 Fig). Collectively, these results show that gene-based TWAS associations are related to, but also distinct from, SNP-based GWAS associations.

Statistical power. Second, we investigated the nature of these differences by contrasting the number of associations detected by TWAS and GWAS. The high multiple-testing burden of GWAS typically requires strict genome-wide Bonferroni corrections. By contrast, the relatively smaller number of statistical tests in TWAS results in a lower multiple-testing burden, and the expected polygenic associations of many phenotypes make it common to adopt less strict false discovery rate (FDR) corrections as an alternative to Bonferroni [48]. In our analyses, TWAS under both corrections identified many more genes than the corresponding GWAS (Fig 3D). Specifically, under FDR correction, TWAS detected associations of 673 unique genes (p FDR < 0.05) that lacked GWAS associations of corresponding SNPs (p Bonferroni < 0.05). Many of these genes have been previously linked to brain-related disorders, including Alzheimer’s disease (WDR12, AGFG2, and CDK5RAP3), schizophrenia (SRA1, WDR55, CORO7, DDAH2, PCDHA8), autism spectrum disorder (MAPK3, PCDHA13), and major depressive disorder (ZMAT2 and ITIH4) [65–74]. Separately, under Bonferroni correction, TWAS detected associations of 110 unique genes (p Bonferroni < 0.05) that lacked GWAS associations of corresponding SNPs (p Bonferroni < 0.05). These results show that TWAS discover associations of many genes that are undetected with GWAS.

Neurobiological interpretability. Third, to interpret the function of discovered genes more systematically, we tested the enrichment of our TWAS results using the NHGRI-EBI GWAS Catalog, a catalog of gene annotations curated from all human GWAS in the current literature [75]. We discovered 276 enriched biological annotations at p FDR < 0.05 (Fig 3E and S3 Table) and found that brain-related annotations were much more likely to be enriched than other annotations in the catalog (p < 0.001). Moreover, in addition to the overall enrichment for brain-related annotations, we found a strong positive correlation between the p-values of the enrichment and the fraction of discovered brain-related annotations (r spearman = 0.964, p < 0.001, Fig 3F). In other words, we found that the most enriched gene annotations were primarily brain related. S2 Fig shows that these enrichments were replicable with a Bonferroni correction on TWAS associations. Collectively, these results show the neurobiological relevance of our discoveries.
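The filtering step that reduces many-to-many gene-SNP relationships to one-to-one pairs can be sketched as a mutual-best match. This is one possible reading of the description above, not a confirmed reproduction of the study's procedure: a pair is kept only when the SNP has the lowest GWAS p-value among the SNPs in the gene's model and the gene has the lowest TWAS p-value among the genes whose models include that SNP. Gene names, SNP names, and p-values are hypothetical.

```python
def mutual_best_pairs(gene_snps, twas_p, gwas_p):
    """Return sorted (gene, SNP) pairs that are each other's best match.

    gene_snps: dict mapping gene -> list of SNPs in its gr-expression model.
    twas_p:    dict mapping gene -> TWAS p-value.
    gwas_p:    dict mapping SNP -> GWAS p-value.
    """
    # Lowest-p SNP for each gene.
    best_snp = {g: min(snps, key=gwas_p.__getitem__) for g, snps in gene_snps.items()}

    # Invert the mapping to find the genes that each SNP contributes to.
    snp_genes = {}
    for gene, snps in gene_snps.items():
        for snp in snps:
            snp_genes.setdefault(snp, []).append(gene)

    # Lowest-p gene for each SNP.
    best_gene = {s: min(genes, key=twas_p.__getitem__) for s, genes in snp_genes.items()}

    return sorted((g, s) for g, s in best_snp.items() if best_gene[s] == g)


# Hypothetical inputs with overlapping gene models.
gene_snps = {"G1": ["s1", "s2"], "G2": ["s2", "s3"], "G3": ["s3"]}
twas_p = {"G1": 1e-6, "G2": 1e-3, "G3": 0.2}
gwas_p = {"s1": 0.01, "s2": 1e-5, "s3": 0.04}

pairs = mutual_best_pairs(gene_snps, twas_p, gwas_p)
print(pairs)
```

Because each retained gene maps to exactly one SNP and vice versa, the result is strictly one-to-one, at the cost of excluding genes and SNPs without a mutual best partner.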

TWAS discover associations of genetically regulated gene expression in one brain region with gray-matter volumes of other regions

Separately, we built on our region-specific TWAS findings to test for associations between gr-expression in one brain region and gray-matter volumes of other regions. Such associations are undefined for SNPs (because all cells share the same genome), but are interpretable for gr-expression (because of known inter-regional similarities in gene expression and organization [15,20,23,25]). In practical terms, these analyses also help to discover associations of regional volumes with genes for which these regions currently lack models of gr-expression (Fig 1B). Inter-regional TWAS discovered between 73 and 209 (median 133) associations (p FDR < 0.05) of gr-expression in one region with the volume of another region (Fig 3G and S4 Table). gr-Expression in the amygdala and anterior cingulate had the largest number of such associations (Fig 3G, columns) relative to the total available number of gr-expression models in each region (Fig 1B). For example, the gr-expression of FOXO3 in the anterior cingulate predicted the volumes of all 8 regions. This gene has been strongly linked to healthy aging in diverse human populations [76–78]. By contrast, the volume of putamen was predicted by the largest number of genes from other regions (Fig 3G, rows). Several of these genes, including MYLK2, KTN1, DCC, BCL2L1, TPX2, and HELZ, were associated with putamen volume in previous studies [16,79–84]. In particular, in our study, the gr-expression of MYLK2 and KTN1 predicted putamen volume in all regions that had gr-expression models of these genes (in 8 and 4 regions, respectively). In other cases, gr-expression of some genes in many regions predicted volumes of many other regions. For example, the gr-expression of LRRC37A2 in all 8 regions predicted volumes of all regions except putamen and caudate.
Similarly, gr-expression of MAPT in the cerebellar hemisphere predicted all volumes except putamen and caudate. Both LRRC37A2 and MAPT have been linked to Parkinson’s disease, and MAPT encodes for tau and has been well studied in the Alzheimer’s disease literature [50,85,86]. We finally quantified the overlap between intra-regional and inter-regional associations. A heatmap of overlap coefficients of these associations formed 3 anatomically distinct groupings of cortical, basal ganglionic, and limbic regions (Fig 3H). These groupings show that the volumes of anatomically similar regions are more likely to share gene associations or, alternatively, that genes from one region are associated with volumes of anatomically similar regions. S2 Fig shows that these groupings were replicable with a Bonferroni correction on TWAS associations. Collectively, these results suggest a strong relationship between gr-expression profiles of anatomically similar brain regions and, more generally, show the utility of inter-regional TWAS of neuroimaging phenotypes.
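The overlap coefficient used to compare intra-regional and inter-regional gene sets is the size of their intersection divided by the size of the smaller set. The following sketch uses hypothetical gene sets; returning 0.0 when either set is empty is a convention of this sketch and is not taken from the study.

```python
def overlap_coefficient(a, b):
    """Size of the intersection divided by the size of the smaller set."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0  # convention assumed here for empty gene sets
    return len(a & b) / min(len(a), len(b))


# Hypothetical gene sets for one pair of regions.
intra_regional = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
inter_regional = {"GENE_A", "GENE_B", "GENE_E"}

coefficient = overlap_coefficient(intra_regional, inter_regional)
print(coefficient)
```

Normalizing by the smaller set means that a small gene set fully contained in a larger one has a coefficient of 1, which avoids penalizing regions that have fewer available gr-expression models.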

Genetically regulated gene expression links regional volumes with clinical phenotypes

We next moved beyond literature-based annotations to test whether gr-expression associations can link regional volumes with clinical phenotypes. To achieve this, we integrated our results with a separate TWAS on a sample of 70,439 people in BioVU, a biobank that contains DNA samples and de-identified electronic health records for patients at Vanderbilt University Medical Center [53,87,88]. Clinical phenotypes derived from electronic health records in BioVU were represented by phenotype codes extracted from International Classification of Diseases (ICD-9) billing codes. The BioVU TWAS used the same Joint-Tissue Imputation models to estimate gr-expression and to discover clinical associations (Fig 4A). In what follows, we filtered this clinical TWAS to focus on 156 brain-related clinical phenotypes. We then compared associations of regional gr-expression with these phenotypes to associations in our inter-regional neuroimaging TWAS.
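The comparison described above amounts to finding regional gr-expression features that are significant in both TWAS. A minimal sketch follows, assuming each TWAS is available as rows of (gene, expression region, target, FDR-adjusted p-value); this row layout, the function names, and all values are hypothetical and are not taken from the study's data files.

```python
from collections import defaultdict

ALPHA = 0.05  # significance threshold on FDR-adjusted p-values


def significant_targets(rows, alpha=ALPHA):
    """Map (gene, expression_region) -> set of targets with p_fdr < alpha."""
    hits = defaultdict(set)
    for gene, region, target, p_fdr in rows:
        if p_fdr < alpha:
            hits[(gene, region)].add(target)
    return hits


def link_volumes_to_clinical(volume_rows, clinical_rows, alpha=ALPHA):
    """Keep gr-expression features significant in both the volume and clinical TWAS."""
    vol = significant_targets(volume_rows, alpha)
    clin = significant_targets(clinical_rows, alpha)
    shared = vol.keys() & clin.keys()
    return {key: (vol[key], clin[key]) for key in sorted(shared)}


# Hypothetical toy rows: (gene, expression region, target, p_fdr).
volume_rows = [
    ("GENE_A", "region_1", "volume_region_2", 0.01),
    ("GENE_A", "region_1", "volume_region_3", 0.20),
    ("GENE_B", "region_2", "volume_region_2", 0.03),
]
clinical_rows = [
    ("GENE_A", "region_1", "phenotype_1", 0.02),
    ("GENE_B", "region_2", "phenotype_2", 0.30),
    ("GENE_C", "region_3", "phenotype_1", 0.001),
]

links = link_volumes_to_clinical(volume_rows, clinical_rows)
for (gene, region), (volumes, phenotypes) in links.items():
    print(gene, region, sorted(volumes), sorted(phenotypes))
```

Keying on the (gene, expression region) pair rather than on the gene alone mirrors the text's emphasis on gr-expression "in a specific region" as the unit that links the two TWAS.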

Fig 4. Association of gr-expression with both neuroimaging and clinical phenotypes. (A) Pipeline for BioVU TWAS: transcriptome-wide association studies of regional gr-expression and clinical phenotypes from the BioVU Biobank. Top left: Inputs to TWAS comprise electronic health records and DNA samples of the same people. Top center: Clinical phenotypes are extracted from ICD-9 codes present in electronic health records. Bottom left and center: Regional gr-expression is estimated from DNA samples of the same people. Right: Clinical phenotypes and regional gr-expression are combined in the BioVU TWAS. (B) Heatmap showing the number of times that genes (rows) with regional gr-expression (columns) were linked to both regional volumes and clinical phenotypes. Each count denotes a regional gr-expression that was associated (p_FDR < 0.05) with both a regional volume in the UK Biobank TWAS and a brain-related clinical phenotype in the BioVU TWAS. (C) Heatmap showing the number of genes with regional gr-expression that linked regional volumes (columns) with clinical phenotypes (rows). Each count denotes a regional gr-expression that was associated (p_FDR < 0.05) with both a regional volume in the UK Biobank TWAS and a brain-related clinical phenotype in the BioVU TWAS. (D) Enrichment of clinical phenotypes for genes whose gr-expression predicted (p_FDR < 0.05) regional volumes (rows) in the UK Biobank TWAS. Each point represents a brain-related clinical phenotype associated with at least 1 gene. The horizontal axis shows the p-values (−log10 p_FDR) of individual phenotypes. Source data can be found in S5 Table. FDR, false discovery rate; TWAS, transcriptome-wide association studies. https://doi.org/10.1371/journal.pbio.3002782.g004

We identified 98 genes whose gr-expression in a specific region was associated (p_FDR < 0.05) with both volumes in the UK Biobank TWAS and brain-related clinical phenotypes in the BioVU TWAS (Fig 4B).
There were 22 genes in this set whose gr-expression in 4 or more regions linked volumes and clinical phenotypes. In previous GWAS and clinical studies, these genes have been associated with neurogenesis (WNT3) [89,90], neurodevelopmental delays (QRICH1) [91,92], addiction (HCG27) [93], depression (CCDC71, CYP21A2) [94,95], and other brain-related disorders [96,97]. BioVU clinical phenotypes that shared associations of gr-expression with regional volumes included a variety of nervous system symptoms and disorders, most prominently demyelinating diseases, motor-related symptoms, and dementia (Fig 4C). Several HLA genes that play a major role in the immune response (including HLA-B/C, HLA-DRB1, and HLA-DRB5) were associated with 2 or more regional volumes and simultaneously with demyelinating diseases, including multiple sclerosis, a prominent immune-mediated disorder [98]. In addition, genes in the HLA-DR and HLA-DQ families were associated with volumes of the cerebellar hemisphere and hippocampus in the UK Biobank and simultaneously with the abnormal movement phenotype in the BioVU TWAS. These associations represent candidate causal mechanisms for linking these genes with Parkinson’s disease and other movement disorders [99–102]. Genes C4B, MST1, and LRRC37A showed similar patterns of associations, thereby supporting and extending previous links to motor disorders [86,103–105]. Separately, we identified 9 brain-related clinical phenotypes that were enriched (p_FDR < 0.05) for genes whose gr-expression predicted regional volumes (Fig 4D and S5 Table). Most of these phenotypes were enriched for genes that predicted multiple regional volumes. For example, myoclonus was enriched for genes that predicted volumes of 6 regions, while multiple sclerosis and lack of coordination were enriched for genes that predicted volumes of 4 regions.
Further, senile dementia was enriched for genes that predicted hippocampal and cerebellar volumes, while speech disturbances was enriched for genes that predicted anterior cingulate volume. The majority of motor-related clinical phenotypes were enriched for genes that predicted volumes of the cerebellum, a well-known center of motor control. S3 Fig shows that our association and enrichment analyses were replicable with a Bonferroni correction on TWAS associations. Overall, these results show that associations of gr-expression with phenotypes at different biological scales can be combined to reveal genes that link regional volumes and clinical phenotypes. Despite differences in samples and phenotype modalities, we identified a large overlap between the 2 TWAS in their associations with regional gr-expression. Furthermore, we found evidence in related literature that supports associations between regional volumes and an array of brain-related disorders. Collectively, these findings highlight the integrated relationships between gene expression and brain phenotypes and the implications of these relationships for the study of brain-related disorders.
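One common way to test whether a clinical phenotype is enriched for volume-associated genes is a one-sided hypergeometric test. This section does not state which enrichment test was used, so the sketch below is an assumption for illustration only; the function names, the background set, and the gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    lower = max(k, 0)
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(lower, upper + 1))
    return hits / total


def enrichment_p(volume_genes, phenotype_genes, background):
    """One-sided p-value that phenotype genes are over-represented among volume genes."""
    background = set(background)
    volume_genes = set(volume_genes) & background
    phenotype_genes = set(phenotype_genes) & background
    k = len(volume_genes & phenotype_genes)
    return hypergeom_sf(k, len(background), len(phenotype_genes), len(volume_genes))


# Hypothetical toy input: 10 background genes, 4 linked to a phenotype,
# 3 linked to a regional volume, 2 shared between the two sets.
background = {f"GENE_{i}" for i in range(10)}
phenotype_genes = {"GENE_0", "GENE_1", "GENE_2", "GENE_3"}
volume_genes = {"GENE_0", "GENE_1", "GENE_5"}

p_value = enrichment_p(volume_genes, phenotype_genes, background)
print(f"enrichment p = {p_value:.4f}")
```

In a full analysis, p-values of this kind would be computed for every phenotype and region and then adjusted for multiple testing, consistent with the p_FDR < 0.05 threshold reported in the text.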

Replicability of estimated genetically regulated gene expression and TWAS

Finally, we tested the replicability of our analyses in 3 complementary ways. First, we tested the replicability of gr-expression models by comparing the estimated gr-expression of the DLPFC using models trained on 2 distinct postmortem samples: our main sample from GTEx and an independent replication sample from PsychEncode [43]. We found that models trained on the 2 samples had highly similar patterns of gr-expression (Pearson r of gr-expression: median 0.799, Q1–Q3 0.559–0.917, Fig 6A). Likewise, we found similar TWAS of these models with DLPFC volumes (Spearman r = 0.540, p < 0.001, Figs 6B and S6). These results suggest that our framework for estimating gr-expression is robust to the training data, at least for sufficiently large samples. Second, we tested the replicability of association p-values and magnitudes in the UK Biobank using the independent HCP TWAS. As we saw above, the small HCP sample produced almost no associations at p_FDR < 0.05. Correspondingly, we found that a small percentage of associations with p_FDR < 0.05 in the UK Biobank were also present at the nominal threshold of p < 0.05 in the HCP TWAS (median 7.00%, Q1–Q3 4.53%–7.75%, Fig 6C). By contrast, the magnitudes of individual associations are strongly correlated with p-values (Spearman r between magnitudes and −log10 p: median 0.783, Q1–Q3 0.773–0.786, S6 Fig) but, unlike p-values, are relatively independent of the sample size [115]. Correspondingly, we found consistently strong correlations between magnitudes of associations that passed p_FDR < 0.05 in the UK Biobank TWAS (Spearman r: median 0.518, Q1–Q3 0.486–0.622, all p < 0.005, Figs 6D, 6E and S6).

Fig 6. Replicability of estimated genetically regulated gene expression and TWAS. (A, B) Replication of estimated gr-expression trained on independent PsychEncode data. (A) Histogram of correlations between gr-expression of the DLPFC estimated with models trained on GTEx data and independent PsychEncode data. (B) Scatter plot of TWAS associations based on gr-expression of the DLPFC estimated with models trained on GTEx data and independent PsychEncode data. Each point denotes p-values of associations between estimated gr-expression and DLPFC gray-matter volumes in the white-British sample of the UK-Biobank cohort. (C–E) Replication of genes that passed p_FDR = 0.05 in discovery TWAS of gray-matter volumes. (C) Percentages of genes that were replicated at nominal p < 0.05 in replication TWAS. Source data can be found in S2 Data. (D) Correlations between effect magnitudes of genes in the replication and discovery TWAS. Dots denote analyses on the full UK Biobank (discovery) and HCP (replication) samples. Box plots denote analyses of discovery-replication splits of the white-British UK-Biobank sample, ordered from small to large replication samples. Each box plot was estimated from 300 random splits. (E) Scatter plots of effect magnitudes in the UK Biobank and HCP TWAS. Each point denotes effect magnitudes for a gene that showed p_FDR < 0.05 in the UK Biobank TWAS. DLPFC, dorsolateral prefrontal cortex; FDR, false discovery rate; GTEx, Genotype-Tissue Expression Project; HCP, Human Connectome Project; TWAS, transcriptome-wide association studies. https://doi.org/10.1371/journal.pbio.3002782.g006

Third, we repeated these analyses on TWAS of discovery and replication subsets generated from 1,200 random splits of the white-British UK-Biobank sample (S2 Data).
These additional analyses showed that replication samples of the same size as our HCP sample (657 people) had similarly small percentages of replicable associations (median 6.60%, Q1–Q3 6.20%–6.82%) and that larger samples showed much higher percentages (Fig 6C). Likewise, these analyses showed that replication samples of the same size as our HCP sample had strong correlations between magnitudes of effects (Spearman r: median 0.575, Q1–Q3 0.568–0.579; all p < 0.001) and that larger samples showed modestly increased correlations between magnitudes (Figs 6D and S6). Collectively, these analyses suggest that the estimated gr-expression and magnitudes of TWAS associations were generally replicable, while the p-values of TWAS associations were replicable in large replication samples.
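The two replication summaries used above, the percentage of discovery associations that reach nominal significance in a replication sample and the rank correlation of effect magnitudes, can be sketched as follows. The input layout (a mapping from gene to effect and p-value), the function names, and the toy values are hypothetical; whether signed effects or absolute magnitudes were correlated is not specified in this section, so the sketch simply uses the values it is given.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def replication_summary(discovery, replication, alpha=0.05):
    """Return (% replicated at nominal p < alpha, Spearman r of effects).

    discovery:   {gene: (effect, p_fdr)}; replication: {gene: (effect, p)}.
    """
    genes = [g for g, (_, p_fdr) in discovery.items()
             if p_fdr < alpha and g in replication]
    if len(genes) < 2:
        raise ValueError("need at least 2 shared discovery genes")
    replicated = sum(replication[g][1] < alpha for g in genes)
    pct = 100.0 * replicated / len(genes)
    rho = spearman([discovery[g][0] for g in genes],
                   [replication[g][0] for g in genes])
    return pct, rho


# Hypothetical toy input, not values from the study.
discovery = {"GENE_A": (0.30, 0.001), "GENE_B": (-0.20, 0.01),
             "GENE_C": (0.10, 0.04), "GENE_D": (0.05, 0.50)}
replication = {"GENE_A": (0.25, 0.03), "GENE_B": (-0.10, 0.20),
               "GENE_C": (0.15, 0.40), "GENE_D": (0.00, 0.90)}

pct, rho = replication_summary(discovery, replication)
print(f"replicated: {pct:.1f}%  Spearman r: {rho:.3f}")
```

In the toy input only one of three discovery genes reaches nominal significance in the replication set, yet the effect ranks agree perfectly, which illustrates how p-value replication can be low in a small sample while effect magnitudes remain consistent.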

[END]
---
[1] Url: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3002782

Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.
