(C) PLOS One
This story was originally published by PLOS One and is unaltered.



A comparison of anatomic and cellular transcriptome structures across 40 human brain diseases [1]

['Yashar Zeighami', 'Douglas Research Centre', 'Department of Psychiatry', 'McGill University', 'Montreal', 'Montreal Neurological Institute', 'Trygve E. Bakken', 'Allen Institute for Brain Science', 'Seattle', 'Washington']

Date: 2023-04

S1 Text. Supporting Figures: Fig A in S1 Text.

Classification and global burden of brain-related diseases. Major human brain diseases and their classification according to the Global Burden of Disease (GBD) study [1,2], partitioned into 7 broad classes. The GBD study established the standard Disability-Adjusted Life Years (DALY) metric to quantify disease burden, defined as the years lost due to premature death plus the years lived with disability. DALY scores are shown according to the 2019 study for several larger classes, with error bars in white indicating the minimum and maximum projected loss of life and healthy years. While cerebrovascular diseases, including brain ischemia and infarction and related disorders, dominate (global 2017 DALY 55.1 million, not shown), the combined toll of psychiatric disorders is nearly twice that DALY (110 million). Neurodegenerative diseases account for less (38.2 million), primarily through older populations with Alzheimer’s disease and related dementia (30.5 million DALY). The color palette for these major GBD classes is used throughout the analysis.

Fig B in S1 Text. Neurological disorders and associated genes. (A) Jaccard clustering based on the relative percentage of shared genes (shown in gray scale) between GBD classes for the disease genes in this study. Inset numbers: number of genes in the intersection, with the diagonal giving the total number unique to each class. (B) Similar clustering of 40 neurological diseases and disorders. Top panel: fraction of genes uniquely associated with each disease. Color panel: GBD class membership of each disease. Details of diseases, gene sets, and metadata are given in S1 Table. Whereas the number of unique genes associated with GBD class psychiatric diseases (801) is 6 times larger than for neurodegenerative diseases (132), a finer resolution does not reflect this bias: 110 genes (28.6%) are unique to bipolar disorder, whereas 31 genes (30.3%) are unique to Parkinson’s disease and 59 (88.0%) are unique to hereditary spastic paraplegia.

Fig C in S1 Text.
Biological process and pathway ontology analysis (www.toppgene.org) of genes uniquely associated with major GBD classes reflects common identifying annotations for these disease classes, measured by FDR q-value. The color code in the legend for GBD classes is used throughout the analysis. Specific associations of interest include well-known alterations in synapse structure and function (FDR q = 9.56 × 10⁻⁵⁰) [3] and abnormal levels of extracellular neurotransmitter concentrations [4] in several psychiatric and neurologic disorders (q = 1.25 × 10⁻²²). Major depressive disorder is one of the most important mental disorders associated with altered serotonergic activity [5], with a less clear association in schizophrenia [6] and addiction [7]. Recent studies show that chronic type II diabetes mellitus (DM) is closely associated with neurodegeneration (q = 2.07 × 10⁻⁵), especially AD [8]. The primary signaling pathway activated in insulin signaling is the phosphoinositide 3-kinase (PI3K)-protein kinase B (Akt) signaling stream, and defective IGF binding or IRS-1 signaling, as a result of insulin resistance, leads to cognitive decline in patients [9]. Hedgehog (Hh) is one of a few signaling pathways that are frequently used during development for intercellular communication; it is important for organogenesis of almost all organs in mammals, as well as in regeneration and homeostasis. This includes the brain and spinal cord, and mutations in the human SHH gene and in genes that encode its downstream intracellular signaling pathway cause several clinical disorders, including holoprosencephaly [10]. Brain tumors and other cancers are strongly associated with defects in signal-transduction proteins, and cancers caused by certain viruses have contributed greatly to our understanding of signal-transduction proteins and pathways [11]. Chronic morphine-induced molecular adaptation of the cAMP cascade has been confirmed in many and has been widely related to opioid dependence and withdrawal [12].
These unique GBD class ontology annotations represent molecular functions and pathways central to these major classes.

Fig D in S1 Text. Transcriptome patterning of 40 brain diseases with clustering after removing pairwise overlapping genes also identifies 5 anatomic groups. Most distinctive is the strong match of ADG 1 and ADG 2, demonstrating the identity and distinction of these groups. Removing common genes retains the association of the majority of ADG 3 psychiatric, substance abuse, and movement diseases. The grouping of diseases in ADG 5 is identically preserved in the clustering, overall indicating common structure with Fig 1, and with 67% agreement for pairs of diseases contained in the same ADG class.

Fig E in S1 Text. Clustering stability analysis for disorders with high gene count and overlap. To ensure that the co-clustering of psychiatric disorders is not the result of the high number of genes associated with these diseases or of overlapping genes (see Fig B in S1 Text), we performed a clustering consistency analysis by sampling 200 genes from any disorder with more than 200 associated genes and repeating the clustering analysis with the same N = 5 cluster size requirement. We repeated this procedure 1,000 times and calculated the number of times each pair of disorders was co-clustered. The figure shows the frequency ratio of co-clustering across these 1,000 repeated analyses and indicates a stable cluster assignment.

Fig F in S1 Text. Reproducibility of ADG clustering. A hold out analysis was conducted by averaging the z-score normalized expression within each of the ADG groups identified in the full analysis of Fig 1, with data from one of the 6 brains left out. In the right annotation, “1 ADG 1” indicates that brain 1 data was removed and the diseases in the ADG group were averaged in the remaining 5 brains. Data is presented over 57 structures common to all 6 brains.
Viewed as rows across structures, the expression patterning is highly consistent across hold out datasets, with average correlation (ADG 1, ADG 2, ADG 3, ADG 4, ADG 5) = (0.983, 0.971, 0.976, 0.988, 0.977). Viewed as columns across structures, the patterning has consistent differential expression across ADG groups. The annotation bar on top of the heatmap shows the maximum repeatable differential signature observed in each structure. The signature is exact (6) in all hold out brains for 27 structures and agrees in all but one for 19 additional structures, with only LA, PRF, and Arc displaying variability. The expression signature itself is computed and compared as follows. For each structure and each hold out dataset, the z-scored expression values are rank ordered, giving a permutation of 1, 2, 3, 4, 5 from lowest to highest across ADG 1–5. Each expression pattern is assigned a unique integer n through unique prime factorization as $n = 2^{(1)} 3^{(2)} 5^{(3)} 7^{(4)} 11^{(5)}$, where the exponent (k) is the rank of ADG k in that permutation, and these integers are tabulated to find the most frequently occurring pattern across hold out brains. The count of the most frequent signature (3–6) is shown in the annotation bar, indicating similar conservation of signature to the hold out analysis, with 6 representing the exact relationship of ADG groups in all brains.

Fig G in S1 Text. Holdout analysis and ADG. (Diagonal and upper) In each of 6 Allen Human Brain Atlas (AHBA) subjects, the mean disease transcription profile for each of 40 diseases across structures is computed, and the most similar (Euclidean distance) disease in the remaining 5 subjects is identified. The upper diagonal matrix shows the distribution of identified diseases, with key 0–6 indicating the number of assignments to a given disease. Thus, ataxia with score 6 has a transcriptomic profile more similar to ataxia for each brain than to any other disease in the remaining brains.
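The nearest-disease matching described for Fig G can be illustrated with a minimal Python sketch. It is not the study's code: it assumes that each subject's disease profile is compared against every disease profile of every other subject individually (rather than against an average of the remaining subjects), and the subject labels, disease names, and expression values are hypothetical toy data. The `symmetrize` helper mirrors the averaging of the matrix with its transpose that the caption mentions.

```python
from math import dist


def nearest_disease_votes(profiles):
    """Count which disease each disease profile is matched to in held-out brains.

    profiles: dict subject -> dict disease -> list of mean expression values
              (one value per brain structure).
    Returns votes[d][e] = number of subjects whose profile for disease d is
    closest (Euclidean distance) to disease e among the other subjects.
    Assumption: "closest" is taken over all individual held-out profiles.
    """
    diseases = sorted(next(iter(profiles.values())))
    votes = {d: {e: 0 for e in diseases} for d in diseases}
    for subject, own in profiles.items():
        others = [p for s, p in profiles.items() if s != subject]
        for d in diseases:
            # Smallest (distance, disease) pair over all held-out profiles.
            _, best = min(
                (dist(own[d], other[e]), e) for other in others for e in diseases
            )
            votes[d][best] += 1
    return votes


def symmetrize(votes):
    """Average the vote matrix with its transpose."""
    return {d: {e: (votes[d][e] + votes[e][d]) / 2 for e in votes} for d in votes}


# Toy data (made up): 3 subjects, 2 diseases, 2 structures.
toy_profiles = {
    "H1": {"ataxia": [1.0, 0.0], "glioma": [0.0, 1.0]},
    "H2": {"ataxia": [0.9, 0.1], "glioma": [0.1, 0.9]},
    "H3": {"ataxia": [1.1, 0.0], "glioma": [0.0, 1.2]},
}
votes = nearest_disease_votes(toy_profiles)
sym = symmetrize(votes)
```

In this toy case every subject's ataxia profile is matched to ataxia in the other subjects, so the diagonal entry equals the number of subjects, which corresponds to the score of 6 for six brains in the figure.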
Since the closest neighbor is an asymmetric definition, the average of the matrix and its transpose is presented. A majority (29/40) of diseases are uniquely identified by majority voting. ADG groups 3, 4, and 5 have high identifiability across subjects, while there is higher misclassification between ADG 1 and 2. Percent exact as in Fig 1C is ADG 1–5 (0.716, 0.537, 0.644, 0.958, 0.875). Color bar shows Global Burden of Disease (GBD) groups. (Lower diagonal) A more stringent hold out analysis is conducted by first eliminating common genes between the diseases as in Fig 1 and then seeking the closest disease in transcriptome profile other than the given disease. Here, the distribution of disease mapping between brains is more variable, with within-ADG mapping for ADG 1–5 of (0.361, 0.187, 0.970, 0.175, 0.008).

Fig H in S1 Text. Weighted gene clustering of brain disorders. To evaluate the effect of gene importance as reflected in the literature, we used the literature-based gene–disease association weights provided by the DisGeNET dataset. Each gene–disease association (GDA) has a score based on the following formula: GDA-score = C + M + I + L, where C is based on curated data sources; M is based on mouse and rat animal model reports; I is based on GDAs inferred from the Human Phenotype Ontology and GDAs inferred from variant–disease associations (VDAs) reported by ClinVar, the GWAS Catalog, and GWAS db; and L is based on the number of publications reporting the given GDA.
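As a minimal sketch of this weighting, the Python below computes a GDA-score from source counts and uses it to form a weighted average expression profile. The thresholds are transcribed from the component definitions that the caption spells out next; the function names, argument names, gene labels, and expression values are hypothetical and are not taken from DisGeNET or from the study's code.

```python
def gda_score(n_curated, n_animal, n_inferred, n_pubs):
    """GDA-score = C + M + I + L from the four source counts."""
    if n_curated == 1:
        c = 0.3
    elif n_curated == 2:
        c = 0.5
    elif n_curated > 2:
        c = 0.6
    else:
        c = 0.0
    m = 0.2 if n_animal > 0 else 0.0      # mouse and rat model sources
    i = 0.1 if n_inferred > 0 else 0.0    # inferred associations
    l = 0.01 * n_pubs if n_pubs <= 9 else 0.1  # publication support, capped
    return c + m + i + l


def weighted_mean_expression(expression, weights):
    """Weighted average expression per brain region for one disease.

    expression: dict gene -> list of expression values (one per region)
    weights:    dict gene -> GDA-score of that gene for the disease
    """
    genes = [g for g in expression if weights.get(g, 0.0) > 0]
    total = sum(weights[g] for g in genes)
    if total == 0:
        raise ValueError("no gene has a positive weight")
    n_regions = len(expression[genes[0]])
    return [
        sum(weights[g] * expression[g][r] for g in genes) / total
        for r in range(n_regions)
    ]


# Toy example with two hypothetical genes and two regions.
scores = {
    "GENE_A": gda_score(n_curated=3, n_animal=1, n_inferred=1, n_pubs=12),
    "GENE_B": gda_score(n_curated=1, n_animal=0, n_inferred=0, n_pubs=2),
}
toy_expression = {"GENE_A": [1.0, 0.0], "GENE_B": [0.0, 1.0]}
profile = weighted_mean_expression(toy_expression, scores)
```

With equal weights this reduces to the plain gene average, so the sketch only changes how much each gene contributes to the disease profile.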
More specifically, with each parenthesized condition equal to 1 when true and 0 otherwise: $C(N_1) = 0 + 0.3 \times (N_1 = 1) + 0.5 \times (N_1 = 2) + 0.6 \times (N_1 > 2)$, where $N_1$ is the number of curated sources including CGI, CLINGEN, GENOMICS ENGLAND, CTD, PSYGENET, ORPHANET, and UNIPROT; $M(N_2) = 0 + 0.2 \times (N_2 > 0)$, where $N_2$ is the number of mouse and rat sources from RGD, MGD, and CTD; $I(N_3) = 0 + 0.1 \times (N_3 > 0)$, where $N_3$ is the number of sources from HPO, CLINVAR, GWASCAT, and GWASDB; and $L(N_4) = 0 + N_4 \times 0.01 \times (N_4 \le 9) + 0.1 \times (N_4 > 9)$, where $N_4$ is the number of publications supporting a GDA in the sources LHGDN and BEFREE (see details in https://www.disgenet.org/dbinfo). Using the GDA-score for each gene–disease association, we then calculated a weighted average expression representing the disease-related global gene expression pattern across brain regions, which replaces the equally weighted gene expression average. Using this approach, we repeated the main analysis for the AHBA dataset. The results show that the new approach preserves the main disease categories, going from tumor and neurodegenerative disorders toward psychiatric and motor disorders, with a very similar expression pattern across brain regions, going from subcortical nuclei to cortical expression, as observed in Fig 1A. Overall pairwise disease ADG membership agrees with the original clustering at 85%.

Fig I in S1 Text. Temporal evolution of average gene expression across 40 brain disorders. The mean disease-related gene expression was calculated for each disease across brain regions for each time point using the BrainSpan dataset (https://www.brainspan.org/) across developmental and adult years. Interestingly, tumor-based disorders expressing genes involved in regulation of cell population proliferation (see Fig C in S1 Text) have a biphasic early life and late expression pattern, while developmental disorders show early expression and drug abuse and psychiatric disorders show higher expression later, followed by later stage expression in certain movement-related and neurodegenerative disorders.
We emphasize that one must be cautious in drawing exact conclusions from these patterns, since they are averaged across a multitude of genes and brain structures with heterogeneous gene expression patterns, and this figure only shows the most dominant modes of expression across the lifespan that survive the averaging process. Based on proximity in the dendrogram, the hierarchical clustering preserves many of the adult associations. Annotation shows that GBD associations of diseases moderately agree.

Fig J in S1 Text. Pairwise comparison of ADG. Pairwise Benjamini–Hochberg (BH) corrected (BH < 0.05) t tests between ADG groups 1–5. Individual t tests highlight the distinction in cortex expression between ADG 3 and other groups. The most significant structural ADG differences occur between ADG 1 and 3 in cortex (frontal lobe (FL, p < 2.71 × 10⁻⁷), short insular gyri (SIG, 6.2 × 10⁻⁹), long insular gyri (LIG, 5.57 × 10⁻⁸)), in amygdala, basolateral nucleus (BLA, 1.8 × 10⁻⁹), basomedial nucleus (BMA, 4.49 × 10⁻¹⁰), in cerebellar nuclei, globose nucleus (Glo, 1.18 × 10⁻⁹), and in myelencephalon, vestibular nuclei (8Ve, 2.34 × 10⁻⁸). ADG 2 and 3 are distinguished in hippocampus (CA1, 2.18 × 10⁻⁸), subiculum (S, 8.31 × 10⁻⁸), and in amygdala (AMG), amygdalo-hippocampal transition zone (ATZ, 1.94 × 10⁻¹⁰; BLA, 1.00 × 10⁻¹⁰; BMA, 5.63 × 10⁻¹⁰), and ADG 3 and 4 in thalamus, anterior group of nuclei (DTA, 3.01 × 10⁻⁷), lateral group of nuclei, dorsal division (DTLv, 6.47 × 10⁻⁹), and hypothalamus, posterior hypothalamic area (PHA, 1.21 × 10⁻⁶). While there is no significant variation in the thalamus (TH, p = 0.338), myelencephalon (0.247), and cerebellum (CB, 0.966), differential telencephalic expression between the psychiatric, substance abuse, and movement groups (ADG 3) and other ADGs is demonstrated by applying paired t tests between groups.
Here, ADG 1 and ADG 3 are distinguished through differences in frontal lobe (FL, p < 2.71 × 10⁻⁷), hippocampus, dentate gyrus (DG, p < 3.46 × 10⁻⁶), and amygdala, basomedial nucleus (BMA, p < 4.49 × 10⁻¹⁰). Finally, ADG 4 and 5 differences are characterized by diencephalon expression: thalamus, anterior group of nuclei (DTA, p < 3.01 × 10⁻⁷), lateral group of nuclei, dorsal division (DTLv), and hypothalamus, posterior hypothalamic area (PHA, p < 1.21 × 10⁻⁶).

Fig K in S1 Text. Expression levels of brain and non-brain diseases. (A) Expression levels of genes from the Allen Human Brain Atlas (AHBA) classified as brain disease associated in this study (green), non-brain disease associated from the OMIM study of [13] (gray), and the remaining genes of AHBA not in these sets (red). Brain disease genes do not have significant expression differences from non-brain related genes, but both are different from non-disease associated genes with marginal significance. (B) Distribution of differential stability (DS) by major Global Burden of Disease classes. Horizontal mean ρ = 0.521 of 17,348 genes, with p-values showing the significance (corrected for class size) of the GBD mean differing from the global mean. (C) Disease gene stability for 40 diseases sorted by median DS; colors are GBD classification. Minimum and maximum stable genes for each disease are shown. DS: differential stability. The set of high DS genes annotated (right) is substantially enriched for Gene Ontology biological processes and pathways compared to lower DS (left).

Fig L in S1 Text. Anatomic markers for DS genes. For each of the 40 diseases, the highest and lowest differentially stable (DS) genes are selected. This results in 36 unique genes for low DS and 32 for high DS, whose expression profiles are shown top (low DS) and bottom (high DS). High DS genes select for structural anatomic markers and cell types.
This general expression consistency, with less randomness and reduced variation, is seen in the expression profiles of high DS genes.

Fig M in S1 Text. Disease-associated canonical expression modules. Canonical module M1–M32 expression patterns are highly consistent across all 6 AHBA individuals, and patterns identified using any 5 brains could be found reproducibly in the sixth [13]. The modules range from structure-specific markers to complex co-expression patterns in the data, and several of the modules are specific to the ADG 1–5 groups. In addition to M1 and M12 cited in the manuscript, M2 defines hippocampal expressing genes and M6 cortex-hippocampus co-expression; both are strongly represented by diseases in ADG 3. Representative genes and their correlation to the module eigengene are shown: PRKCA and STX1A are implicated in schizophrenia [14,15], and ITGA4 and MEF2C in autistic disorder [16,17]. M10 defines striatum expressing genes and is common among ADG 3 and 4 diseases. ADORA2A has been studied in amphetamine-related disorders [18], depressive disorders, and schizophrenia [19]; ANO3 in dystonia [20], Parkinson’s disease; ALDH1A2 in Parkinsonian disorders [21] and schizophrenia [22]; and SEMA5A in autistic disorder [23]. Modules M24 and M25 are highly glial enriched and common in ADG 1 and 2 diseases and effectively absent in ADG 3–5. FANCG has been studied in neurofibromatosis 1 [24], PPM1D in glioma [25], AIF1 in Parkinson’s disease [26], and TREM2 in Alzheimer’s disease [27] and amyotrophic lateral sclerosis [28].

Fig N in S1 Text. ADG group comparison within canonical modules. Corrected t tests between ADG groups for average disease correlation to the 32 canonical modules M1–M32. Each set of data in the test consists of the correlation values in Fig 2C for those diseases in the corresponding ADG group at a fixed module. The tests are performed for all 6 pairs and each module independently.
The -log10 Benjamini–Hochberg corrected values shown further validate the clustering of Fig 1 and provide more insight into the cell patterning of ADG groups.

Fig O in S1 Text. Holdout analysis on canonical modules and ADG. Comparison of holdout analysis for the mean profile of Fig 1 and based on the canonical modules of Fig 2. (A) Reproduction of holdout analysis for AHBA mean profile as in S6 Fig (upper diagonal). In each of 6 Allen Human Brain Atlas (AHBA) subjects, the mean disease transcription profile across structures is computed, and the most similar (Euclidean distance) disease in the remaining 5 subjects is identified. The matrix shows the distribution of identified diseases, with key 0–6 indicating the number of assignments to a given disease. Perfect agreement in all subjects is a 6. (B) Similar analysis using canonical module assignments for 6 AHBA brains. Module-based assignment shows better definition of ADG 1 and 2 and less variance in ADG 3, with the main psychiatric diseases (bipolar disorder, schizophrenia, autistic disorder, and depression) more closely identified. (C, D) Classification results by ADG and GBD categories. (E) Performance results for ADG and GBD comparing mean and module profiling. Mean is based on the analysis of Fig 1 and Fig F in S1 Text; module is based on canonical module assignments. An ADG or GBD label indicates that the correct class was identified; Exact indicates that the precise disease was identified. Mean ADG class identification is reduced 10% for modules but exact disease specification is improved 4%, while for GBD groupings there is improvement of both 4.5% across all classes and 4% for exact disease identification.

Fig P in S1 Text. Human MTG cellular data, expression level, specificity, and diseases. (A) RNA-seq gene expression quantification with absolute expression levels estimated as counts per million (CPM) using exonic reads from [29]. (B) Cell type specificity was calculated based on the Tau-score (τ) defined in [30].
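For orientation, a minimal Python sketch of a tau specificity score follows. It assumes the commonly used form of the index, τ = Σ(1 − x_i / x_max) / (n − 1) over n cell types, which may differ in detail from the definition in [30]; any preprocessing such as log transformation of CPM values is not restated in this caption and is not applied here. The function name and the example values are hypothetical.

```python
def tau_specificity(expression):
    """Tau specificity index of one gene across n cell types.

    Assumed form: tau = sum_i (1 - x_i / x_max) / (n - 1).
    tau is 0 for uniform expression and 1 for expression in a single cell type.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("need at least two cell types")
    x_max = max(expression)
    if x_max <= 0:
        return 0.0  # convention chosen here for unexpressed genes
    return sum(1.0 - x / x_max for x in expression) / (n - 1)


# Made-up expression levels for three hypothetical genes over 4 cell types.
uniform = tau_specificity([5.0, 5.0, 5.0, 5.0])
specific = tau_specificity([0.0, 0.0, 10.0, 0.0])
graded = tau_specificity([10.0, 5.0, 0.0])
```

Under this form, broadly expressed genes score near 0 and single cell type markers score near 1.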
This measure has previously been employed using the same dataset [29]. Distribution of τ for brain disease associated, non-brain disease, and unassociated genes. (C) Bar distribution plots of cell type specificity for 24 cortex expressing diseases, ordered by median specificity and colored by phenotypic GBD class. The correlation between the cell type-specific tau score and the mesoscale differential stability metric is 0.445.

Fig Q in S1 Text. Comparing cell type clusters (CTG). Corrected paired t tests are used to compare significant expression differences between pairs of CTG groups, e.g., CTG 1–CTG 2, at a fixed cell type. Overbar: ANOVA at each of 75 fixed cell types, clustered as in Fig 3 over 3 CTG groups. The highest variability is seen among IT excitatory and non-neuronal cell types and, at the subclass level, GABAergic Vip cell types, consistent with the excitatory and inhibitory gradients of Fig 3.

Fig R in S1 Text. (A) Clustering matrices for correlation between 24 cortically expressing diseases based on non-overlapping genes for both AHBA and cell type MTG data. Data is shown for both matrices (upper diagonal MTG, lower diagonal AHBA) with clustering based on the MTG data of Fig 3. There is general structural correspondence of these matrices, and the overall disease–disease Pearson correlation between the matrices is ρ = 0.615. (B) For each of these 2D embeddings and each disease, the mean Euclidean distance from each disease to other diseases within the same GBD group is computed, as well as the mean distance to diseases not in that GBD group. The ratio of these quantities, $\mathrm{GBD}(d_i)$, is a measure of the relative association of that disease with other diseases in the same GBD class. In symbols, as . Diseases are then grouped by their GBD class, showing general agreement between the approaches, except for astrocytoma, which is a significant outlier better classified using the mesoscale AHBA data.
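The distance ratio described in panel (B) can be sketched in Python as follows. This is an illustration under stated assumptions: the ratio is taken as the mean within-class distance divided by the mean distance to diseases outside the class (the caption does not preserve which quantity is the numerator), and the disease labels, class labels, and 2D coordinates are hypothetical.

```python
from math import dist


def gbd_ratio(embedding, gbd_class, disease):
    """Mean distance to same-class diseases over mean distance to the rest.

    embedding: dict disease -> (x, y) coordinates in a 2D embedding
    gbd_class: dict disease -> GBD class label
    Assumption: within-class distance is the numerator, so values below 1
    mean the disease sits closer to its own GBD class than to other classes.
    """
    within, outside = [], []
    for other, point in embedding.items():
        if other == disease:
            continue
        d = dist(embedding[disease], point)
        if gbd_class[other] == gbd_class[disease]:
            within.append(d)
        else:
            outside.append(d)
    if not within or not outside:
        raise ValueError("need at least one disease inside and outside the class")
    return (sum(within) / len(within)) / (sum(outside) / len(outside))


# Toy embedding (made up): two classes with two diseases each.
toy_embedding = {"d1": (0.0, 0.0), "d2": (1.0, 0.0), "d3": (0.0, 4.0), "d4": (3.0, 4.0)}
toy_classes = {"d1": "Psychiatric", "d2": "Psychiatric", "d3": "Movement", "d4": "Movement"}
ratio_d1 = gbd_ratio(toy_embedding, toy_classes, "d1")
```

Computing the ratio separately in the AHBA and MTG embeddings would give the two values per disease that the figure compares.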
Solid color: AHBA brain wide; dark gray: MTG cell type; light gray: consensus.

Fig S in S1 Text. Expression profiles of unique genes in autism, bipolar disorder, and schizophrenia. Normalized gene expression for uniquely expressing genes in autism (n = 19), bipolar disorder (n = 20), and schizophrenia (n = 25), clustered by expression level over 24 excitatory cell types. The 3 diseases show distinct expression profiles across excitatory types, with schizophrenia widely expressing most genes.

Fig T in S1 Text. Human and mouse EWCE distributions. (A) Aligned transcriptomic taxonomy of cell types in human MTG to 2 distinct mouse cortical areas, primary visual cortex (V1) and a premotor area, the anterior lateral motor cortex (ALM), from [29] allows comparison of cell type enrichments between species. Scatterplot of disease-subclass EWCE values for mouse and human colored by CTG 1–4. Pie chart insets show percentages of CTG and GBD phenotypic classes of the top 10% outliers from the regression line, representing the most significant EWCE differences. Percentages (CTG 1, 0.363; CTG 2, 0.252; CTG 3, 0.220; CTG 4, 0.163). GBD Phenotype (Psychiatric, 0.137; Substance, 0.180; Movement, 0.125; Neurodegenerative, 0.05; Brain tumors, 0.112; Developmental, 0.244; Brain Related, 0.150). (B) Significant species-distinct EWCE based on FDR correction of permutation-based p-values by disease and cell type. Fig 5C of the main manuscript displays the EWCE values, whereas here those values having significant p-values in either species are shown. Disease clustering is as in Fig 3 with the same annotations and with color code (blue: human, orange: mouse, black: both species). Top barplot: number of cell type enrichments by species.

https://doi.org/10.1371/journal.pbio.3002058.s001

(DOCX)

[END]
---
[1] Url: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3002058

Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.
