(C) PLOS One [1]. This unaltered content originally appeared in journals.plosone.org.
Licensed under Creative Commons Attribution (CC BY) license.
url:
https://journals.plos.org/plosone/s/licenses-and-copyright
------------
Integrative analysis of genomic variants reveals new associations of candidate haploinsufficient genes with congenital heart disease
['Enrique Audain', 'Department Of Congenital Heart Disease', 'Pediatric Cardiology', 'University Hospital Of Schleswig-Holstein', 'Kiel', 'German Center For Cardiovascular Research', 'Dzhk', 'Anna Wilsdon', 'School Of Life Sciences', 'University Of Nottingham']
Date: 2021-09
Numerous genetic studies have established a role for rare genomic variants in Congenital Heart Disease (CHD) at the copy number variation (CNV) and de novo variant (DNV) level. To identify novel haploinsufficient CHD disease genes, we performed an integrative analysis of CNVs and DNVs identified in probands with CHD including cases with sporadic thoracic aortic aneurysm. We assembled CNV data from 7,958 cases and 14,082 controls and performed a gene-wise analysis of the burden of rare genomic deletions in cases versus controls. In addition, we performed variation rate testing for DNVs identified in 2,489 parent-offspring trios. Our analysis revealed 21 genes which were significantly affected by rare CNVs and/or DNVs in probands. Fourteen of these genes have previously been associated with CHD while the remaining genes (FEZ1, MYO16, ARID1B, NALCN, WAC, KDM5B and WHSC1) have only been associated in small cases series or show new associations with CHD. In addition, a systems level analysis revealed affected protein-protein interaction networks involved in Notch signaling pathway, heart morphogenesis, DNA repair and cilia/centrosome function. Taken together, this approach highlights the importance of re-analyzing existing datasets to strengthen disease association and identify novel disease genes and pathways.
Congenital heart disease (CHD) is the most common congenital anomaly and represents a major global health burden. Multiple studies have identified a key genetic component contributing to the aetiology of CHD. However, despite the advances in the field of CHD within the last three decades, the genetic causes underlying CHD are still not fully understood. Herein we have assembled a large patient CHD cohort and performed a data-driven meta-analysis of genomic variants in CHD. This analysis has allowed us to strengthen the disease association of known CHD genes, as well as identifying novel haploinsufficient CHD candidate genes.
Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: The Department of Molecular and Human Genetics at Baylor College of Medicine receives revenue from clinical genetic testing conducted at Baylor Genetics Laboratories. M.E.H. is a co-founder of, consultant to and holds shares in Congenica, a genetics diagnostic company.
Funding: This study was supported by the German Center for Cardiovascular Research (DZHK) partner sites Berlin, Kiel; the Competence Network for Congenital Heart Defects, National Register for Congenital Heart Defects and KinderHerz (E.A. and M.P.H). BK is supported by a British Heart Foundation Personal Chair (CH/13/2/30154). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Data Availability: All relevant data are within the manuscript and its Supporting Information files. The data used in this study have been already published. The reference for each individual study is shown in S2 and S3 Tables. The assembled DNV dataset used in this study is provided in the S14 Table . In addition, we have provided a BED file with the CNV dataset ( S15 Table ).
This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
Overlaying both CNVs and PTVs has been used to define novel CHD relevant disease genes in contiguous gene disorders [ 8 , 9 ]. Following this principle, we have performed a genome-wide integrative meta-analysis of published and publicly available datasets of CNVs and DNVs identified in probands with CHD. This analysis, which is one of the larger meta-analyses of genomic variants in CHD so far, strengthens the disease association of known CHD genes and identifies novel haploinsufficient CHD candidate genes.
CHD is multifactorial, with both environmental and genetic risk factors [ 3 , 4 ]. Familial aggregation of CHD including Thoracic aortic aneurysm (TAA), as well as a large proportion of genomic copy number variants (CNVs) and de novo intragenic variations (DNVs) in probands with CHD suggest a strong genetic component. An estimated 4–20% of CHD cases are due to rare CNVs, suggesting that a significant part of CHD is caused by gene-dosage defects [ 5 ]. Recently, exome sequencing in large cohorts has been used to identify novel disease genes and strengthen known disease associations through the demonstration of an excess of de novo protein truncating variants (PTV) and rare inherited loss-of-function (LOF) variants in probands with CHD [ 6 , 7 ].
Congenital Heart Disease (CHD) accounts for a large fraction of foetal and infant deaths, with incidence rates ranging from 7–9 per 1000 live births [ 1 ]. Within the last 30 years, survival rates have substantially increased due to improvements in surgical, interventional and clinical intensive care resulting in a rapidly growing number of CHD survivors reaching adulthood [ 2 ]. Nevertheless, there is still increased morbidity and mortality in individuals with CHD, resource utilization is high especially among severely affected patients, and importantly, the underlying etiology remains unclear for the majority of cases.
The protein-protein interaction networks (a-d, for clusters 1, 3, 8 and 9 respectively) were identified using GeNets ( S4 Fig ). Proteins are shown as nodes, interactions as edges. Enrichment for CNVs (blue) and DNVs (green) are highlighted. Proteins with no specific enrichment for CNV and/or DNVs but with B-H adjusted metaP < 0.05 are highlighted in red. The size of the circles denotes if the genes was found significantly highly and/or differentially expressed in the heart (large circles: significant expression; small circles: non-significant). The distribution of CHD case-CNVs and control-CNVs are shown for each cluster. Significant difference in the CNV distribution was calculated using a Wilcox rank sum test. The horizontal bar plots show the top ten GO enriched terms for each cluster (output from Enrichr tool). X-axis in the horizontal bar plot denotes the combined score from Enrichr, which is computed by multiplying the log-transformed p-value and the z-score. Bar color encoded the GO biological process significant level (dark blue: FDR < 5%, light blue: FDR 5–10%, grey: FDR > 10%).
CNVs and DNVs can affect heart development either through haploinsufficiency of a single gene, or through its combined impact on the function of several genes. Indeed, oligogenic models have been implicated in CHD, and proteins acting in the same complex or pathway are known to be encoded in genomic clusters [ 18 , 19 ]. We therefore conducted a systems-level analysis to identify global mechanisms by which haploinsufficiency might promote CHD. In particular, we assessed the combined effect of CNVs and DNVs with respect to human protein-protein interactions (PPIs). The InWeb and ConsesusPathDB databases provides ranked information about experimentally determined physical interactions and, therefore, serves as a proxy to understand the functional effects of CNV/DNVs on human protein complexes (Materials and Methods). The genes with Benjamini–Hochberg adjusted metaP < 0.05 (n = 492 genes) were used as seeds to build a PPI network from the data available in InWEb and ConsessusPathDB. No additional interections were considered. The final network consisted of 164 proteins and 290 interactions ( S4 Fig ). A total of 10 overlapping sub-clusters within this network were identified using the in-built clustering algorithm implemented in GeNets [ 20 ] (Materials and Methods). Gene-ontology (GO) enrichment analysis suggested that four out of these ten sub-clusters are enriched for genes involved in Notch signaling pathway, cardiocyte differentiation, DNA repair and centrosome function ( Fig 6 ). All the four clusters accommodate more CNV deletions in CHD cases compared to controls. Six out of the ten sub-clusters did not show significant enrichment for any particular biological process.
Panels show three different heart development stages: early development, maturation and infant/adult. X-axis denotes the combined p-value from DNV and CNV analysis (metaP, at different cut-offs). Y-axis denotes the genes’ mean expression in the heart (log scale). The 21 significant candidate CHD genes ( Table 1 ) are contained in the fraction with the higher expression (red box). Differences between the distributions were tested using a two-sided Wilcoxon rank sum test (reference group: all genes). ****: P<0.0001, ***: P<0.001, **: P<0.01, *: P<0.05, ns: non-significant.
We next evaluated the expression pattern of the 21 significant genes (Bonferroni corrected metaP < 0.05) in the heart using RNA-Seq data from human tissues at different developmental time points [ 17 ]. We stratified the analysis based on stages of heart development (see Materials and Methods ). Our analysis revealed that the most significant genes (metaP < 1 x 10e -5 ) show significantly increased mean expression in the heart (P < 0.0001, Wilcox test) at different developmental stages (development, maturation and infant/adult), compared to all protein-coding genes ( Fig 5 ). Moreover, 18 out of 21 genes fall in the in the top quartile of heart expression in both developmental and maturation stages ( S3 Fig ). To complement our expression analysis, we compared gene expression during human heart development with expression in two other mesodermal organs: kidney and liver. This allowed identification of genes with significant changes in its expression levels during crucial heart developmental stages, which would have not been possible when focusing on expression levels alone (Materials and Methods). We found that 17 out of 21 CHD candidate genes are differentially expressed in the heart (R 2 > 0.50, Bonferroni corrected P < 0.01) when compared to its expression levels in kidney and/or liver. Interestingly, the three genes (FEZ1, NALCN and MYO16) which are not among the highly expressed genes, were found to be significantly differentially expressed during heart development compared to kidney and/or liver ( S12 Table ).
As described above, we performed the same integrative approach for the LVOTO and conotruncal groups to identify CHD subtype-specific genes. Our analysis showed four significant genes ( S11 Table ). Three are observed in LVOTO (KMT2D, KMT2A and TAB2) and a single gene showed significant enrichment in the conotruncal subtype (NSD1).
We performed a further analysis based on specific CHD subtypes in addition to the collective analysis of all CHD phenotypes. The analysis focused on simplex CHD cases, within two main categories: Conotruncal (N CNV = 873, N DNV = 234) and LVOTO (N CNV = 594, N DNV = 351). We only included cases with a clear phenotypic description and without any overlapping phenotypic features between the two categories (LVOTO/Conotruncal). The conotruncal group consisted mostly of Tetralogy of Fallot (TOF), Truncus arteriosus and Transposition of great arteries (TGA), whereas LVOTO mainly constituted Aortic stenosis (AS) including Bicuspid aortic valve disease (BAV), Coarctation and Hypoplastic left heart syndrome (HLHS).
Cases/Controls: Number of cases and controls carrying CNV deletions overlapping the gene in the CNV analysis. P cnv : p-value from the CNV permutation test. nsDNV: Number of constrained non-synonymous variations in the de novo analysis. P dnv : p-value from the DNV analysis. Significant: The analysis where the gene was significant (dnv: DNV analysis, cnv: CNV analysis, both: Both analysis, none: Non-significant neither DNV nor CNV analysis). metaP: combined p-value (P dnv and P cnv ) using the Fisher method. P ihw : Bonferroni corrected p-value using independent hypothesis weighting (IHW) and LOEUF metric as covariate. LOEUF: o/e LOF ratio upper bound fraction from gnomAD. *All the 21 genes were significant after combining their p-values and applying Bonferroni correction. 1 Evidence is from mouse models [ 24 , 62 ].
To identify high confidence haploinsufficient CHD disease genes, we performed a joint analysis integrating the results from the CNV and the DNV analysis. We combined the results from both analyses (P dnv and P cnv ) using the Fisher combine method. We demonstrated that both enriched genes for DNV and CNV deletions are significantly represented among LOF constraint genes (measured by the o/e LOF ratio). Therefore, we applied a Bonferroni multiple testing correction using independent hypothesis weighting [ 16 ] (IHW) by incorporating the gene o/e LOF ratio, as a measure of haploinsufficiency ( S2 Fig ). Our analysis revealed 21 genes that were significantly enriched for CNV deletions and/or non-synonymous DNV ( Table 1 ). A gene was included in the final set of haploinsufficient CHD disease genes if it reached a significant corrected metaP < 0.05 (after Bonferroni adjustment with IHW).
X-axis denotes the P-values from the DNV analysis (binned). Y-axis denotes the o/e LOF ratio upper bound fraction (LOEUF). All groups were compared against the LOEUF distribution of all protein-coding genes (purple). Differences between the distributions were tested using a two-sided Wilcoxon rank sum test. ****: P<0.0001, ns: non-significant.
We next evaluated the distribution of o/e LOF ratio at different levels of DNV enrichment (genes were split based on P dnv ). Since the o/e ratio of LOF variation in each gene is strongly affected by its length, we instead used the 90% upper bound of its confidence interval (termed LOEUF), which keeps the direct estimate of the o/e ratio and allows to distinguish small genes from large genes, as suggested by Karczewski et al [ 10 ]. We observed that the genes with higher enrichment for nsDNV (lower P dnv ) show a significant decreased LOEUF compared to the mean of all protein-coding genes ( Fig 4 ).
We identified 14 genes significantly enriched in the DNV analysis (P < 0.05 after Bonferroni correction for multiple testing, S10 Table ). All of these genes were affected by at least two constrained non-synonymous DNV (nsDNV) and show significant overlap with 11/14 (78.6%) of the genes being known CHD disease genes. CHD7 (OMIM 214800) was the most significant haploinsufficient gene (P = 2.84 x 10 −26 ) with 18 nsDNVs identified in the patient cohort. Other highly enriched genes for nsDNV—KMT2D (OMIM 147920), KMT2A (OMIM 605130), NSD1 (OMIM 117550), TAB2 (OMIM 614980), and ADNP (OMIM 615873)—have been previously associated with different types of neurodevelopmental disorders with co-occurrence of CHD. In the case of KDM5B (OMIM 618109), it has only been described in the context of a recessive neurodevelopmental phenotype with cases presenting ASD (Atrial septal defects) [ 14 , 15 ].
We used two available statistical methods, Mupit [ 12 ] and DeNovoWEST [ 13 ], which test the significance of observed DNV at gene level, by comparing the number of observed variations with the number of expected variations (based on a sequence-dependent variation recurrence rate, see Materials and Methods ). While Mupit focuses on enrichment of protein-truncating DNVs specifically, the DeNovoWEST test incorporates missense constraint information at variant level and applies a unified severity scale at variant level based on the empirically-estimated positive predictive value of being pathogenic. Based on the complementary results of both tests [ 13 ], we reported the minimal observed DNV p-value (P dnv ) per gene.
To identify an independent set of haploinsufficient CHD candidate genes, we combined de novo variations identified in two large-scale CHD case-control studies [ 6 , 7 ] and performed a gene-based de novo variation (DNV) burden test [ 12 ]. We analysed a total of 4,195 rare DNVs within 2,534 genes in the patient cohort. After classifying every variant into functional groups (Materials and Methods) 526 of these variants were predicted to be protein-truncating and 2,647 were missense. We evaluated for potential differences of the DNV rates between cohorts (see Materials and Methods ). Comparison of the rate of each variant type across the groups was non-significant (P > 0.05, Poisson test, S9 Table ).
We independently performed a genome-wide test without the TAA cases to evaluate its impact on CHD. As expected, most of the genes (447 out of 528) remained significant after removing the contribution of TAA cases, since ~90% of the cases in the analyzed CNV cohort were CHD. Ten genes were significantly enriched independently when analyzing CHD and TAA cases, while 61 were significantly enriched only in TAA cases ( S8 Table ).
In addition, we tested previously described CNV deletion syndrome regions (
https://decipher.sanger.ac.uk/disorders#syndromes/overview ) associated with developmental disorders and/or CHD for enrichment in our analysis (Materials and Methods). We found eight of these regions enriched in the dataset ( S7 Table ), with the 16p11.2-p12.2 locus being the region with the largest number of deletions in cases (n = 230).
The plot shows the distribution of rare CNV deletions (green track) in CHD cases, the differences between the overlapping CNV deletions in cases and controls (black track) and highlight the location of the 63 significant loci discovered (in magenta).
To perform a systematic, genome-wide identification of potential haploinsufficient CHD disease genes and loci, we analysed the CNV burden of 19,969 protein-coding genes (GENCODE v19). To this end, we compared the number of rare CNV deletions (MAF < 0.01) among cases and controls for each gene, and identified genes with significant CNV burden using a permutation test (significance level of adjusted P < 0.05, see Materials and Methods ). If a CNV spanned two or more genes, all affected protein-coding genes were considered in the analysis. The distributions of rare CNV deletions in CHD cases across all 22 human autosomes is shown in Fig 3 . Significant candidate genes had a median number of 12 overlapping CNVs in cases, compared to a median of 0 overlapping CNVs in controls ( S1A Fig ). Because CNVs can be large chromosomal aberrations, multiple genes were affected by some of the CNVs. In total, 528 genes (Sheet A in S6 Table ) reached significance (Permutation test, P adjusted < 0.05). These 528 genes encompass a total of 63 loci (Sheet B in S6 Table , highlighted in magenta in Fig 3 ). The sizes of these loci range from 558 bp to 10.5 Mbp, with a median value of 243 Kbp ( S1B Fig ). The number of genes per locus ranged from 1 to 48, with a median value of 3 ( S1C Fig ). Only 16 loci contained a single gene (Sheet B in S6 Table ).
When the regression-based analysis was performed at different levels of the observed/expected LOF ratio (oeLOF) constraint metric ( Fig 2 and S5 Table ), we observed the higher enrichment toward the most LOF constrained genes (oeLOF < 0.01, P = 9.55 x 10 −18 , OR = 1.40) and still a moderate enrichment for genes with oeLOF < 0.1 (P = 0.002, OR = 1.09). No enrichment was observed in the set with oeLOF ratio > = 0.1 (P = 0.03, OR = 0.99). Based on these results we conclude that haploinsufficiency causes a significant component of CHD.
Haploinsufficiency has been shown to cause a reasonable proportion of CHD [ 5 ]. Thus, genes known to be associated with CHD and genes which are intolerant for LOF variations should be deleted more often in probands with CHD than in controls. To test this hypothesis, we performed a CNV burden test using sets of genes known to be involved in CHD. In addition, we included genes known to be associated with developmental disorders, a curated list of known haploinsufficient disease genes, autosomal recessive disease genes and genes predicted to be intolerant to LOF variations (based on the observed/expected LOF ratio from gnomAD [ 10 ]). The burden test was performed using a logistic regression framework [ 11 ] (implemented in PLINK v1.7). Fig 1 and S4 Table summarize the results from the burden test on the different gene sets: known CHD genes (grouped in syndromic, non-syndromic, monoallelic and biallelic), developmental disorder genes, haploinsufficiency disease genes, autosomal recessive genes and all protein-coding genes. We tested all protein-coding genes to address the possibility that the analyses could be biased by differences in the CNV rate within the case and control groups, since we have assembled our cohort from different datasets. We did not observe genome-wide (all tested protein-coding genes) enrichment (P = 0.39, OR = 0.99) nor enrichment in the autosomal recessive gene set (P = 0.52, OR = 1.03) when comparing rare CNV deletions in cases vs controls. In contrast, the analysis revealed significant differences in the burden of CNV deletions between cases and controls for the set of haploinsufficiency genes (P = 8.29 x 10 −13 , OR = 2.27). As expected, our analysis revealed significant enrichment for the set of known CHD genes, which is mainly explained by the contribution of monoallelic CHD genes (P = 2.04 x 10 −31 , OR = 4.13) and syndromic CHD gene set (P = 1.66 x 10 −33 , OR = 4.06). Unlike the monoallelic and syndromic CHD gene sets, no significant enrichment was found for the nonsyndromic (P = 0.75, OR = 1.16) and biallelic (P = 0.08, OR = 1.87) CHD gene sets. Our analysis revealed a moderate enrichment of rare CNVs in the developmental disorder gene set (P = 6.90 x 10 −11 , OR = 1.75).
We assembled a cohort with 7,958 cases (comprising both non-syndromic CHD, syndromic CHD and TAA cases) and 14,082 controls ( S1 Table ). Of the total of cases, 777 (~10%) were diagnosed with Thoracic Aortic Aneurysm (TAA). An overview of the sources used to assemble the present cohort is listed in S2 Table (for CHD cases) and S3 Table (for controls). We applied a set of quality control filters to our assembled CNV data before performing case-control association tests (Materials and Methods). In addition, common CNVs (minor allele frequency (MAF) in controls > 0.01) were excluded from the analysis. After filtering, 6,746 cases and 14,024 controls remained for further downstream analysis. Furthermore, we built a dataset of de novo variations (DNVs) identified in 2,489 probands with CHD from parent-offspring trios [ 6 , 7 ].
Discussion
We performed a meta-analysis of rare genomic variants in a cohort of 10,447 CHD probands, which provides a useful resource for interpreting CNVs and DNVs identified in patients with CHD. We implemented a statistical approach which allows the integration of different types of genomic variants to discover novel genes associated with CHD. Our data-driven integrative analysis took into account three major criteria at the genomic level: a) gene enrichment for DNVs, b) gene enrichment for CNV deletions and c) gene intolerance for LOF variations. Our analysis identified 21 significant haploinsufficient CHD genes. Fourteen of these are known CHD genes, and the remaining seven genes have not previously been associated with CHD (Table 1).
To further strengthen associations, we made use of a newly published human transcriptome atlas covering different developmental, maturation and adult stages in numerous organs [17]. Similar to previous results [7], our analysis highlights that the majority of the 21 significant genes are highly expressed during critical stages of heart development. Unlike earlier studies [7,21] which did not address the importance of expression changes over time, we evaluated the differential expression patterns of genes by comparing levels of expression in the heart, kidney and liver at different time points in development. This analysis allowed us to strengthen disease association for genes not falling under the high expression group and highlight the critical importance of all 21 genes independently of the genomic approach. This aspect is complemented by the fact that the majority of genes (14/21) were already known to cause CHD. To further strengthen disease associations, spatiotemporal expression at single-cell resolution during critical cardiac developmental timepoints and analyses of animal models with targeted mutation in the candidate disease genes is warranted. This could strengthen disease association further and provide pathophysiological information.
Among the 21 likely haploinsufficient disease genes for which the combined analyses showed enrichment (Bonferroni corrected metaP < 0.05), 14 genes (CHD7, KMT2D, KMT2A, NOTCH1, NSD1, TAB2, ANKRD11, ADNP, DYRK1A, RBFOX2, KANSL1, ELN, MED13L and GATA6) are well-established CHD genes, and our data confirms this association. To the best of our knowledge, association between CHD and seven genes (KDM5B, WHSC1, WAC, NALCN, ARID1B, FEZ1 and MYO16) had either not been established, or had been reported in small cases studies or a single individual only.
KDM5B is not an established CHD gene thus far, although one patient with compound heterozygous frameshift variants had an ASD [14]. While some have argued against the haploinsufficiency of the gene [22], our analysis suggests KDM5B as a plausible haploinsufficient CHD gene. Additional functional studies are warranted to confirm its role in CHD.
A recent CNV meta-analysis [23] based on non-syndromic CHD patients found that duplication of WHSC1 (also known as NSD2) is a possible cause of CHD. However, haploinsufficiency of WHSC1 has not previously been associated with CHD. In support of its role in CHD, Whsc1 has been reported to cause heart malformations in mouse models [24]. In addition, WHSC1 is known to interact with NKX2.5 [24]. In spite of this, the low incidence of CHD in individuals with Wolf-Hirschhorn syndrome suggests that haploinsufficiency of WHSC1 alone does not cause CHD.
Heterozygous truncating variations in WAC, as well as CNV deletions involving this gene, have been recently associated with the DeSanto-Shinawi syndrome, a rare neurodevelopmental disorder characterized by global developmental delay [25,26]. Furthermore, in two non-consanguineous unrelated individuals with heart malformations, among other disorders [27], microdeletions at 10p11.23-p12.1 (overlapping ARMC4, MPP7, BAMBI and WAC) were identified. Despite these isolated reports, no definite association between WAC and CHD has been established.
DNVs in NALCN have been reported to cause a dominant condition characterized by multiple features including developmental delay, congenital contractures of the limbs and face and hypotonia [28,29]. However, among the phenotypes observed, CHD have been not described thus far.
Heterozygous variation of ARID1B is a frequent cause of intellectual disability [30,31]. A recent analysis of 143 patients with ARID1B variations showed that individuals display a spectrum of clinical characteristics. Congenital heart defects were observed in 19.5% of the patients [32].
FEZ1 is a neurodevelopmental gene, which has been associated with schizophrenia [33]. Fez1 has been reported to be regulated by Nkx2-5 in heart progenitors in mice, suggesting a possible role in heart development [34].
MYO16 (NYAP3) encodes an unconventional myosin protein, involved in regulation of neuronal morphogenesis [35]. We have not found an association between MYO16 and heart development in the literature.
Although, several genes have been shown to be altered in syndromic and non-syndromic cases with CHD and TAA (e.g. HEY2 [36], MYH11 and NOTCH1 [37]), among the 10 genes significant in our analysis for TAA, CHD and the combined scenario, none has been reported previously to be associated with either CHD or TAA. Given the limited data size and only accessing CNV calls from TAA cases, future studies looking at CNV and DNV in both phenotypes are required to establish stronger genotype-phenotype correlation to better understand a possibly shared genetic architecture for the two disease entities.
Also, our study did not identify strong signals associated with specific CHD subtypes. We identified four genes with significant enrichment (adjusted metaP < 0.05) within the two evaluated CHD subtypes (LVOTO and conotruncal defects). Three were associated with LVOTO, and the contribution was mainly from DNV. KMT2D was enriched in LVOTO, which is consistent with the reported spectrum of CHD in patients with Kabuki syndrome, where a large proportion of individuals have LVOTO type CHD [38,39]. PDA and septal defects predominate in Wiedemann-Steiner Syndrome (KMT2A). However, aortic insufficiency and BAV have all been reported [40] suggesting that LVOTO might form part of the phenotypic spectrum. TAB2 was also enriched in LVOTO. Mutations in TAB2 are associated with a wide range of cardiac phenotypes [41–43]. Deletions at 6q24 causing haploinsufficiency of TAB2 have been associated with outflow tract abnormalities, including LVOTO [44,45], which might point to a role of TAB2 LVOTO pathogenesis. NSD1 was the only gene identified as significant among conotruncal defects. The significance of this is unclear as septal defects predominate in Sotos Syndrome. However, patients in previous studies were ascertained focusing on Sotos syndrome (OMIM 117550) rather than CHD. Given the current size of each CHD subgroup and the low number of CNV and DNV events in these genes, these results cannot be considered conclusive. More precise phenotypic descriptions for each individual are necessary to increase the genotype-phenotype correlation for individual genes and improve the identification of genetic subnetworks, along with larger sample sizes.
In addition to the gene-centered analysis, we also applied a systems-level analysis in order to identify potential novel pathophysiological mechanisms affected by haploinsufficiency. In this approach, we took advantage of GeNets [20], a computational framework for the analysis of protein-protein interactions, developed for the interpretation of genomic data. Our analysis allowed us to identify PPI clusters enriched for genes affected by CNVs and/or DNV in patients. Furthermore, GO enrichment analysis suggested distinct biological functions for four of these clusters.
Cluster 1 (Fig 6A) contains proteins involved in the Notch signaling pathway. Our data corroborate previous studies that confirm the central role of Notch pathway in the pathophysiology of CHD [46] and highlights the shared contribution of CNVs and DNVs within the cluster. Cluster 3 (Fig 6B) contains proteins driving essentials processes in the development of the heart such as atrial septum and cardiac right ventricle morphogenesis as well as proteins playing significant role in the positive regulation of gene expression. These mechanisms has been well studied elsewhere [47]. Interestingly, three out of the seven candidate novel CHD genes (WAC, ARID1B and KDMB5) were found to be contributing to these two clusters. Cluster 8 (Fig 6C) showed enrichment for processes related with chromosome organization and DNA repair. Association between DNA repair and CHD is not well established thus far. Cluster 9 (Fig 6D) was found to be associated with microtubule organizing function. This biological process has been not described in the context of CHD, although an earlier report [48] describes complex CHD among the phenotypes in individuals with 15q11.2 deletion syndrome, which involves the tubulin gamma complex protein 5.
Given the heterogenous data sources and the complex inheritance patterns often observed in patients with CHD, our study has limitations. Firstly, the patient data was collected from almost 200 different published sources, and in many cases it was only possible to obtain data from CNV calls which had already been suggested to be pathogenic. Thus, we are aware that our patient data are incomplete because genome-wide CNV data are missing from a large part of the patient cohort and re-emphasizes already established associations. This is not the case for controls, for which genome-wide data was used. As a direct consequence, even though the difference between the rates of CNV deletions in controls and cases decreased dramatically after applying a quality control filtering step, a slight difference remained between both cohorts. The lack of collected CNV data spanning sex chromosomes limited the analysis only to autosomal chromosomes. In addition, the distribution of CNVs that overlap known microdeletion syndromes such as DiGeorge syndrome and Williams syndrome is overrepresented in the dataset. Similarly, the degree of phenotyping varied across the different studies, and often only basic phenotypic terms relating to CHD were available. This made it impossible to refine the diagnosis to a precise phenotypic class of CHD in many individuals.
Previous research has shown that the chances of finding a genetic cause of CHD is higher in syndromic, rather than non-syndromic CHD [6,7]. Therefore, it is not surprising that known CHD genes were enriched in this cohort. To identify non-syndromic causes of CHD, it is important to take into account previous findings, which have shown a significant excess of apparently deleterious inherited PTVs in unaffected parents [6]. To help address the challenges of identifying non-syndromic CHD genes, we have used an integrative approach, which has allowed us to look for novel CHD associations in a larger sample size, in a binary fashion. This will help to facilitate future studies.
Variable expression and reduced penetrance are common features in CHD, including in even well-established conditions such as Noonan Syndrome [49]. Most cases of CHD occur as a one off in the family and when recurrences do happen, CHD subtypes are more likely to be discordant than concordant [50]. This suggests that other modifying factors may be at play, including genetic, or environmental factors, or both. This presents a further challenge in identifying genetic causes of CHD. Moving forward, detailed phenotypic descriptions using a standardized system, and better data sharing strategies (including primary data), will facilitate further gene discovery and improved genotype-phenotype correlation in CHD subgroups.
In summary, we have performed an integrative analysis of CNVs, DNVs, o/e LOF ratio and expression during heart development amongst more than 10,000 CHD patients. Our analyses identify seven potential disease genes and mechanisms with novel association with CHD and strengthen previously reported associations.
[END]
[1] Url:
https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1009679
(C) Plos One. "Accelerating the publication of peer-reviewed science."
Licensed under Creative Commons Attribution (CC BY 4.0)
URL:
https://creativecommons.org/licenses/by/4.0/
via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/plosone/