(C) PLOS One [1]. This unaltered content originally appeared in journals.plosone.org.
Licensed under Creative Commons Attribution (CC BY) license.
url:
https://journals.plos.org/plosone/s/licenses-and-copyright
------------
Deciphering the genetic architecture and ethnographic distribution of IRD in three ethnic populations by whole genome sequence analysis
['Pooja Biswas', 'Shiley Eye Institute', 'University Of California San Diego', 'La Jolla', 'California', 'United States Of America', 'School Of Biotechnology', 'Reva University', 'Bengaluru', 'Karnataka']
Date: 2021-12
Patients with inherited retinal dystrophies (IRDs) were recruited from two understudied populations, Mexico and Pakistan, as well as a third well-studied population of European Americans to define the genetic architecture of IRD by performing whole-genome sequencing (WGS). Whole-genome analysis was performed on 409 individuals from 108 unrelated pedigrees with IRDs. All patients underwent an ophthalmic evaluation to establish the retinal phenotype. Although the 108 pedigrees in this study had previously been examined for mutations in known IRD genes using a wide range of methodologies including targeted gene(s) or mutation(s) screening, linkage analysis and exome sequencing, the gene mutations responsible for IRD in these 108 pedigrees were not determined. WGS was performed on these pedigrees using Illumina X10 at a minimum of 30X depth. The sequence reads were mapped against hg19 followed by variant calling using GATK. The genome variants were annotated using SnpEff, PolyPhen2, and CADD score; the structural variants (SVs) were called using GenomeSTRiP and LUMPY. We identified potential causative sequence alterations in 61 pedigrees (57%), including 39 novel and 54 reported variants in IRD genes. For 57 of these pedigrees the observed genotype was consistent with the initial clinical diagnosis; the remaining 4 had the clinical diagnosis reclassified based on our findings. In seven pedigrees (12%) we observed atypical causal variants, i.e. unexpected genotype(s), including 4 pedigrees with causal variants in more than one IRD gene within all affected family members, one pedigree with intrafamilial genetic heterogeneity (different affected family members carrying causal variants in different IRD genes), one pedigree carrying a dominant causative variant present in pseudo-recessive form due to consanguinity and one pedigree with a de-novo variant in the affected family member. Combined atypical and large structural variants contributed to about 20% of cases.
Among the novel mutations, 75% of those detected in Mexican pedigrees and 50% of those detected in European American pedigrees have not been reported in any other population, whereas only 20% of those detected in Pakistani pedigrees were not previously reported. The remaining novel IRD causative variants were listed in gnomAD but were found to be very rare and population specific. Mutations in known IRD associated genes contributed to pathology in 63% of Mexican, 60% of Pakistani and 45% of European American pedigrees analyzed. Overall, the contribution of known IRD gene variants to disease pathology in these three populations was similar to that observed in other populations worldwide. This study revealed a spectrum of mutations contributing to IRD in three populations, identified a large proportion of novel potentially causative variants that are specific to the corresponding population or not reported in gnomAD, and shed light on the genetic architecture of IRD in these diverse global populations.
The study was performed to identify the underlying cause of inherited retinal degeneration (IRD) in 409 individuals from 108 families. Primarily, these families were recruited from three different geographic regions: Mexico, Pakistan and European Americans from the United States. Blood samples were collected from all individuals for genome analysis. This analysis detected causative variants in 61 out of the 108 pedigrees. A total of 93 gene variants were found in the 61 families. Among these, 54 were previously reported as causative variants and the remaining 39 have not been reported in IRD pedigrees. Interestingly, 54% of these novel variants were not listed in gnomAD. In addition to these findings, complex causative genotypes were observed in 20% of pedigrees. Overall, causative variants were detected in 63% Mexican, 60% Pakistani and 45% European American pedigrees. This study revealed the distribution of IRD causative variants in pedigrees with diverse ethnic and geographic backgrounds.
The affordable cost structure of whole-genome sequencing in recent years [ 9 – 13 ] has enabled the analysis of all genes including their untranslated regions and provided opportunities to identify causal variants in patients with IRDs with broad genetic and phenotypic heterogeneity. Utilizing these advances in the current study, we present the genetic analysis of IRD in 108 pedigrees. These pedigrees are mainly from three populations: the understudied populations from Pakistan (Punjab province) and Mexico as well as the well-studied European American population (individuals of European ancestry from North America). Analysis of these pedigrees revealed atypical sequence alterations and provided a glimpse of the genetic architecture of IRD in these distinctly diverse global populations.
Retinal disease genes have been identified previously by linkage analysis, homozygosity mapping, and sequencing the coding regions of several genes associated with genetic and genomic markers. The subsequent development of targeted screening panels for pathogenic variants in known IRD genes greatly improved genetic diagnosis but failed to identify novel variants and novel genes involved in IRD [ 3 – 5 ]. Gene arrays to selectively capture and sequence candidate genes are reported to result in the identification of mutations in 60%-70% of IRD patients [ 3 , 5 , 6 ]. Advances in whole-exome sequencing (WES) enabled the identification of causal variants associated with Mendelian diseases in known or novel genes efficiently [ 7 , 8 ]. Nevertheless, about 30%-40% of cases remain unresolved. Further, while the majority of studies conducted so far focused on selected populations, the genomic architecture of IRD in certain populations remains unknown.
Inherited retinal degenerations (IRDs) are a group of diseases that result in dysfunction or progressive degeneration of retinal cells, causing a profound bilateral loss of vision. IRDs are relatively rare; it is currently estimated that they affect 1 in 3000 individuals [ 1 ]. Significant heterogeneity has been reported in the phenotype of IRD patients, with wide variation in the age of onset, rate of progression, severity of the disease, and clinical symptoms. Variants in the same gene may also lead to markedly diverse phenotypes as well as different patterns of inheritance. Currently, at least 271 genes are known to be associated with IRD [ 2 ].
Overall, the underlying cause of IRD was identified in about 57% of pedigrees. However, the rate of causative mutation identification varied among Mexican (63%), Pakistani (60%) and European American (45%) pedigrees. The proportion of novel IRD causative mutations detected in each of these cohorts also varied, from about 42% and 41% in Mexican and European American pedigrees, respectively, to 60% in Pakistani pedigrees. Further, among the novel IRD causative SNVs, 20% of those detected in Pakistani pedigrees were not listed in the gnomAD database, while 50% and 75% of novel SNVs in European American and Mexican pedigrees, respectively, were not in the gnomAD database ( Table 5 ).
(C) Analysis of pedigrees from the United States. Fifty-four pedigrees, which comprise 50% of the total analyzed in this study, are of European ancestry. Causative mutations were detected in 24 (45%) pedigrees and included 25 known mutations (~59%), 14 novel single nucleotide variants (SNVs) and 3 novel structural changes (Tables 1 , 2 , 3 , 4 and 5 ). Seven (~50%) of the 14 novel SNVs in known IRD genes were not listed in the gnomAD database ( Table 5 ). Two variants detected in our European American cohort, c.722-1G>T in GUCY2D and c.1217G>A in CNGB1, were reported in the African population at low frequency (gnomAD database), while the remaining variants appear to be unique to the European population ( Fig 6 and Table 5 ).
(B) Analysis of pedigrees from Pakistan. In this study, we have analyzed 15 consanguineous Pakistani pedigrees with multiple affected members and identified causative mutations in 9 IRD genes in 9 pedigrees (60%) while the causative mutations were not detected in 6 pedigrees (40%) ( Fig 6 ). Among the causative mutations detected in IRD associated genes, 6 are novel (60%) and 4 are previously reported ( Fig 6 and Table 5 ). Four of the 5 novel mutations involving SNVs detected were reported in gnomAD database as extremely rare variants in South Asians, one in Europeans and the remaining one (~20%) was not listed (Tables 1 , 2 , 3 , 4 and 5 ).
(A) Analysis of pedigrees from Mexico. In the current study, WGS analysis of 35 pedigrees with recessive retinal dystrophy excluding STGD1 detected 18 previously reported and 13 novel (42%) causative mutations in known IRD genes in 22 pedigrees (63%) leaving the remaining 13 pedigrees unresolved ( Fig 6 ). Nine (75%) of the 12 novel mutations involving SNVs observed in cases from Mexico were not listed in the gnomAD database while the remaining are reported only in the Latino population as very rare variants (Tables 1 , 2 , 3 , 4 and 5 ). Mutations in USH2A are the most frequent cause of recessive retinal degeneration in this population with four pedigrees from the current study (RF.VI148.1215, RF.VI145.1215, RF.VI129.0714, and RF.VI127.0514) and two additional pedigrees from our previous studies with causative mutations in USH2A [ 4 ]. Female carriers of RPGR mutations in two pedigrees (RF.VI123.0514 and RF.VI153.0216) developed retinal degeneration phenotype as reported earlier [ 73 ].
In summary, a novel de-novo causative variant c.940A>G, p.Lys314Glu in the IMPDH1 gene associated with autosomal dominant RP was observed in one pedigree (RF.VI13.0707); and a previously known dominant mutation in PRPF3 (c.1481C>T) was detected in RF.197.0113 in a pseudo-recessive pattern due to multiple consanguineous marriages. In addition, potentially pathogenic variants in two independent genes both segregating with the disease and each sufficient to cause pathology were detected in four out of the 7 pedigrees with atypical genotypes. Besides these 7 pedigrees, we previously reported the identification of mutations in two independent genes as the underlying cause of IRD in separate branches of a pedigree by WGS in a European American pedigree [ 11 ].
A-D: Mexican pedigrees with atypical mutations. A. RF.VI13.0707 pedigree with a de-novo mutation in IMPDH1; B. RF.VI104.0514 pedigree in which a homozygous nonsense mutation in C2orf71 was detected in generation II while a previously reported homozygous splice site mutation in the CLN3 gene was observed in the proband in generation IV, demonstrating the involvement of two different genes in IRD pathology in different generations; C. RF.VI157.0216 pedigree with mutations in the genes OPN1SW and TOPORS, associated with dominant color blindness and retinitis pigmentosa; D. RF.VI111.0514 pedigree in which a novel heterozygous causative mutation in PRPF8 and a known mutation in PRPF31, each sufficient to cause dominant IRD, were observed in monozygotic affected twins; E. RF.197.0113 consanguineous pedigree from Pakistan in which a previously known dominant-acting mutation in PRPF3 segregated in a pseudo-recessive pattern; F. RF.M.1111, a European American pedigree in which causative variants in more than one known IRD gene were observed to segregate with disease: a homozygous mutation in PDE6G and a hemizygous mutation in OPN1LW; G. RF.K.0216 Indian pedigree with a heterozygous PRPH2 mutation that is sufficient to cause retinal dystrophy and an additional mutation in ROM1 that can lead to digenic RP along with the PRPH2 variant. The asterisk indicates the availability of whole-genome sequencing data.
In addition to the large deletion, two affected (IV:3 & IV:6) and one unaffected (IV:5) offspring of an affected female (III:3) were observed to carry a rare heterozygous potentially pathogenic variant c.659T>G, p.Phe220Cys (allele frequency in gnomAD = 0.00002) in the rhodopsin gene. While samples of the parents of these individuals were not available for genetic analysis, the novel rhodopsin variant c.659T>G, p.Phe220Cys was not detected in either maternal grandparent (II:1 & II:2), suggesting possible paternal (III:4) inheritance of this variant in the three siblings (IV:3, IV:5 & IV:6). Further, this variant was not detected in the rest of the pedigree, which excludes c.659T>G, p.Phe220Cys as the variant responsible for IRD pathology in the rest of the extended pedigree. The two affected individuals IV:3 and IV:6 have both the large deletion encompassing UBR2 and PRPH2 and the c.659T>G variant in the rhodopsin gene. The impact of having both sequence alterations in these individuals is unknown.
(D) A large structural change involving two genes. Clinical evaluation of eight affected individuals in a four-generation pedigree (C790) led to the diagnosis of autosomal dominant macular degeneration (MD) with no non-ocular abnormalities co-segregating with the MD phenotype ( Fig 3D1 ). Analysis of WGS of five affected members and seven unaffected members revealed a heterozygous 33 Kb deletion on chromosome 6 (Chr6: g.42,643,442_42,676,411del) in affected members and not in unaffected relatives. This deletion included two adjacent genes present in opposite orientation: exons 39 and 40 of UBR2 and exons 2 and 3 of PRPH2 ( Fig 3D2 ). Segregation analysis of 7 affected and 15 unaffected members using qPCR confirmed the segregation of the chromosome 6 deletion with the phenotype ( Fig 3D3 ). The PRPH2 gene alterations including loss of function mutations have been implicated in dominant MD [ 57 ] and other retinal dystrophies, while the UBR2 gene is not associated with IRD or any other pathological condition.
(C) A 22.8Kb deletion in the CERKL gene. In pedigree RF.T.8.11, a previously reported nonsense mutation p.Arg257* in the CERKL gene was identified in the heterozygous state by exome sequence analysis in the proband I:1, who was adopted [ 53 ]. Whole-genome sequence analysis of this individual identified a large novel heterozygous 22.8 Kb deletion (Chr2: g.182,456,422_182,479,267del) on chromosome 2 ( Fig 3C ). Analysis of the samples of his two offspring established the compound heterozygous nature of the nonsense variant and the large deletion in the affected individual. The 22.8 Kb chromosome 2 deletion includes the entire coding sequence of exon 2 (243 bp), ~10.3 Kb of intron 1 (Chr2: g.182,456,422_182,468,805del) and 12.3 Kb of intron 2 (Chr2: g.182,468,565_182,479,267del) of the CERKL gene. The nonsense change is predicted to truncate the protein or result in nonsense-mediated decay (NMD) of the transcript [ 53 ]. The deletion of the 22.8 Kb sequence encompassing exon 2 of CERKL may also result in the formation of a truncated protein due to a coding region frameshift, or the transcript may undergo NMD. Both sequence alterations detected in the CERKL gene in this individual are predicted to lead to the loss of functional protein; null mutations in CERKL have been established as the underlying cause of IRD [ 55 , 56 ].
(A) A 1.6Mb deletion in EYS segregating with IRD. Analysis of the WGS of two affected (II:1 & II:2) and one unaffected sibling (II:3) from a Mexican pedigree RF.VI96.0210 ( Fig 3A1 ) revealed a novel, 1.6 Mb homozygous deletion on chromosome 6 (Chr6: g.65,994,849_67,582,755del) in both affected members. This deletion was not observed in the unaffected sibling. The deleted region encompasses exons 1 to 12 and the 5’-untranslated region of the EYS gene, which is implicated in recessive retinal degeneration ( Fig 3A2 ). PCR amplification of exons 1 to 12 of EYS in this pedigree revealed the loss of these exons in II:1 and II:2 ( Fig 3A3 ). Amplification with primers flanking the deleted region followed by sequencing showed overlapping paralogous repeat sequences and deletion of the intervening 1.6 Mb region ( Fig 3A4 ) in the affected members.
Five unique structural variants were identified in EYS, LCA5, CERKL, PRPH2, and CNGB3 in five different pedigrees, one with dominant macular degeneration and four with recessive retinal degeneration. (A1) A 1.6Mb homozygous deletion Chr6: g.65,994,849_67,582,755del segregates with recessive retinal degeneration in a Mexican pedigree RF.VI96.0210. (A2) Schematic diagram depicting the 1.6Mb homozygous deletion Chr6: g.65,994,849_67,582,755del encompassing exons 1 to 12 and about 1.6Mb of 5’-untranslated region of the EYS gene. (A3) PCR amplification of EYS exons 1, 6, and 11 detected the expected size product in unaffected individuals (I:1, I:2, II:3), whereas no PCR product was observed in the two affected individuals (II:1 and II:2). (A4) Amplification with primers flanking the deleted region followed by sequencing revealed the junction point in individual II:1 due to the 1.6Mb deletion. Examination of the sequence flanking the junction point detected paralogous repeat sequences on both sides of the deleted region (blue and yellow boxes). (B1) In a consanguineous Pakistani pedigree RF.277.0113, a 110Kb homozygous deletion in LCA5 (chr6: g.80,205,052_80,315,592del) segregated with the phenotype. (B2) The novel 110Kb homozygous deletion includes exons 1 to 4 of LCA5. (B3) PCR amplification showed the absence of exons 1 to 4 of LCA5 in both affected individuals while exon 5 is present in all family members. (B4) Amplification with primers flanking the deletion generated the junction fragment carrying the deletion. Sequencing this PCR product revealed the presence of paralogous repeat sequences flanking the junction point in affected individuals. Sequences marked with yellow and blue rectangles represent the paralogous sequence on both sides of the deleted region.
(C1) A previously reported heterozygous stop mutation CERKL p.Arg257* and a novel heterozygous large 22.8 Kb deletion (Chr2: g.182,456,422_182,479,267del) on chromosome 2, which includes exon 2 of CERKL, are observed in trans configuration in the proband of RF.T.8.11. (C2) The schematic diagram shows the 22.8Kb deletion, which includes ~10.3Kb of intron 1, exon 2 (243bp) and 12.3Kb of intron 2 of the CERKL gene. (C3) The WGS reads mapped to the deleted region showed a decrease in read depth. (C4) Electropherogram showing the sequence of the junction fragment generated by amplification with primers flanking the deletion revealed the specific boundaries of the deletion that includes exon 2 of CERKL. (D1) The segregation analysis revealed a heterozygous 33Kb deletion on chromosome 6 (Chr6: g.42,643,442_42,676,411del) segregating with the disease. (D2) A cartoon depicting the deleted region, which includes two different genes present in opposite orientation: exons 39 and 40 of UBR2 and exons 2 and 3 of PRPH2. (D3) Analysis of 7 affected and 15 unaffected members using qPCR confirmed the presence of the heterozygous deletion on chromosome 6 in affected members and not in unaffected relatives. (E1) A set of compound heterozygous deletions including a novel 7Kb deletion (Chr8: g.87,616,103_87,623,431del) and a previously known 7bp deletion p.Arg274Valfs*13 in the CNGB3 gene were observed in the RF.M.0592 pedigree with a single affected individual. (E2) The novel 7Kb heterozygous deletion (Chr8: g.87,616,103_87,623,431del) (pink rectangle) includes coding exon 15 of CNGB3. (E3) qPCR analysis confirmed the presence of the heterozygous deletion of CNGB3 exon 15 in II:1, which was inherited from the mother (I:2). The asterisk indicates the availability of whole-genome sequencing data.
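The sizes quoted for the five deletions can be checked directly from their genomic coordinates. The following is a minimal sketch and not part of the study's analysis pipeline: it assumes the HGVS convention that g.A_Bdel removes positions A through B inclusive, and the function name `deletion_size` and its regular expression are hypothetical helpers introduced only for this illustration.

```python
import re

# Hypothetical helper pattern for strings such as
# "Chr6: g.65,994,849_67,582,755del" (case-insensitive "chr" prefix).
_DEL_RE = re.compile(r"chr(\w+):\s*g\.([\d,]+)_([\d,]+)del", re.IGNORECASE)


def deletion_size(description: str) -> int:
    """Return the number of deleted bases, counting both end positions."""
    match = _DEL_RE.fullmatch(description.strip())
    if match is None:
        raise ValueError(f"unrecognised deletion description: {description!r}")
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if end < start:
        raise ValueError("end coordinate precedes start coordinate")
    # Inclusive interval: both the first and last deleted base are counted.
    return end - start + 1


# Coordinates copied from the figure legend (reads were mapped to hg19).
DELETIONS = {
    "EYS": "Chr6: g.65,994,849_67,582,755del",
    "LCA5": "chr6: g.80,205,052_80,315,592del",
    "CERKL": "Chr2: g.182,456,422_182,479,267del",
    "UBR2/PRPH2": "Chr6: g.42,643,442_42,676,411del",
    "CNGB3": "Chr8: g.87,616,103_87,623,431del",
}

if __name__ == "__main__":
    for gene, description in DELETIONS.items():
        print(f"{gene}: {deletion_size(description):,} bp")
```

Under that convention the five intervals span 1,587,907 bp, 110,541 bp, 22,846 bp, 32,970 bp and 7,329 bp, in line with the rounded sizes (1.6 Mb, 110 Kb, 22.8 Kb, 33 Kb and 7 Kb) given in the legend.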
Five different pedigrees carried novel structural variants. These included one pedigree with dominant macular degeneration and four with recessive retinal degeneration. Of the recessive pedigrees, two carried the novel structural variant in the homozygous state, one carried it together with a previously reported nonsense mutation, and one carried it together with a previously reported frameshift mutation ( Table 3 and Fig 3 ). Three of these pedigrees are European American while one each is Mexican and Pakistani.
This analysis detected USH2A variants as the underlying cause of disease in eight different pedigrees. Among these, only one was a novel variant while the remaining 15 were reported previously (Tables 1 and 2 ). The targeted mutation screening performed prior to WGS on a subset of cases did not include all currently known IRD genes nor cover all variants in a given gene; our current WGS screening resulted in the identification of variants in known genes in this set of pedigrees.
The segregation analysis revealed 35 previously reported mutations that were identified in 25 pedigrees. There were seven frameshift, 17 missense, seven premature stop codon mutations, and four splice site altering changes. Of these ten are homozygous, six dominant heterozygous, two X-linked, and 17 compound heterozygous mutations found in these pedigrees. Pedigrees A-H are Mexican, I-K are Pakistani and L-V are European American, W-X are Ashkenazi Jewish and Y is Indian. The asterisk indicates the availability of whole-genome sequencing data.
In 22 pedigrees (9 Mexican, 4 Pakistani and 9 European American), 26 rare, potentially pathogenic novel (not previously reported as causative) variants in 17 different known IRD genes were identified as likely causative mutations. Seven of these pedigrees also have 8 previously reported mutations in known IRD genes. Among the variants detected, 9 were homozygous (in 8 pedigrees), 21 compound heterozygous (in 10 pedigrees), 2 dominant acting heterozygous (in 2 pedigrees), and 2 were X-linked variants (in 2 pedigrees) ( Table 1 ). Five of these variants were nonsense, 15 missense, 9 frameshift, and 5 intronic splice altering variants. Sanger sequencing analysis of all available family members confirmed co-segregation of candidate variants with IRD ( Fig 1 ).
ExAC database has constraint Z scores for 18,225 genes. In our analysis, we included 271 retinal disease-associated genes from the RetNet database [ 2 ] and 58 other possible candidate genes associated with IRD based on their expression in relevant cells and function. Among these were 311 genes listed in the ExAC database including 183 recessive, 75 dominant, 9 X-linked genes, and 44 undefined genes. Positive Z scores indicated increased variation intolerance and therefore these 311 genes had fewer variants than expected. Autosomal dominant/X-linked IRD related genes were highly conserved and sequence alterations in these genes have among the highest Z-scores. Therefore, we used Z-scores to prioritize the candidate variants for dominant and X-linked related genes but not for recessive genes.
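The prioritization rule described above (rank candidates in dominant and X-linked genes by constraint Z-score, and do not apply that ranking to recessive genes) can be sketched as follows. This is an illustrative sketch only: the `Candidate` class, the `prioritize` function, the placeholder gene labels and the Z-score values are all hypothetical and are not data or code from the study; returning the unranked candidates as a separate list is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Inheritance classes whose candidates are ranked by constraint Z-score.
Z_RANKED_MODES = {"dominant", "X-linked"}

# Gene counts by inheritance class as reported in the text.
GENES_IN_EXAC = {"recessive": 183, "dominant": 75, "X-linked": 9, "undefined": 44}


@dataclass(frozen=True)
class Candidate:
    gene: str         # placeholder label, not a real finding
    inheritance: str  # "recessive", "dominant", "X-linked" or "undefined"
    z_score: float    # gene-level constraint Z-score (made-up value)


def prioritize(candidates: List[Candidate]) -> Tuple[List[Candidate], List[Candidate]]:
    """Split candidates into (ranked by Z, left in original order)."""
    ranked = sorted(
        (c for c in candidates if c.inheritance in Z_RANKED_MODES),
        key=lambda c: c.z_score,
        reverse=True,  # higher Z means less tolerant of variation
    )
    unranked = [c for c in candidates if c.inheritance not in Z_RANKED_MODES]
    return ranked, unranked


# Made-up demonstration input.
DEMO = [
    Candidate("GENE_A", "recessive", 3.1),
    Candidate("GENE_B", "dominant", 1.2),
    Candidate("GENE_C", "X-linked", 2.5),
    Candidate("GENE_D", "dominant", 4.0),
]

if __name__ == "__main__":
    ranked, unranked = prioritize(DEMO)
    print("ranked by Z:", [c.gene for c in ranked])
    print("not ranked:", [c.gene for c in unranked])
    print("genes with Z-scores:", sum(GENES_IN_EXAC.values()))
```

Keeping the recessive candidates out of the Z-score ordering mirrors the stated choice to use constraint only where a single altered allele is expected to cause disease.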
We observed a total of 56,299 CNVs including 25,357 deletions, 13,223 duplications, and 17,719 insertions in 404 samples. More than half of the CNVs, 29,142 (52%), were classified as common because they were found in more than 30 samples. The CNV calling software (GenomeSTRiP) detected CNVs with lengths greater than 1000 bp. In our analysis, we identified CNVs ranging from 1000 bp to 313,600 bp. The CNVs were called with a quality score, and those with a score <1 were classified as likely false positives.
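The two thresholds stated above (a quality score below 1 marks a likely false positive; presence in more than 30 samples marks a common CNV) can be expressed as a small classifier. This is a hedged sketch: the `CnvCall` record, the `classify` function and the label strings, including "rare_candidate" for everything that passes both checks, are hypothetical and do not reproduce the caller's actual output format.

```python
from dataclasses import dataclass

MIN_QUALITY = 1.0          # calls scoring below 1 were treated as likely false positives
COMMON_SAMPLE_COUNT = 30   # CNVs seen in more than 30 samples were considered common

# Counts reported in the text.
CNV_COUNTS = {"deletion": 25_357, "duplication": 13_223, "insertion": 17_719}
COMMON_CNVS = 29_142


@dataclass(frozen=True)
class CnvCall:
    kind: str        # "deletion", "duplication" or "insertion"
    length_bp: int   # reported length of the call
    quality: float   # caller-assigned quality score
    n_samples: int   # number of samples carrying the call


def classify(call: CnvCall) -> str:
    """Apply the quality threshold first, then the frequency threshold."""
    if call.quality < MIN_QUALITY:
        return "likely_false_positive"
    if call.n_samples > COMMON_SAMPLE_COUNT:
        return "common"
    return "rare_candidate"  # hypothetical label for calls kept for review


if __name__ == "__main__":
    total = sum(CNV_COUNTS.values())
    print(f"total CNVs: {total:,}")
    print(f"common fraction: {100 * COMMON_CNVS / total:.1f}%")
    print(classify(CnvCall("deletion", 22_846, 5.0, 1)))
```

The three reported categories sum to the stated total of 56,299, and 29,142 of 56,299 is about 51.8%, which matches the rounded 52%.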
Between 3.77 and 4.84 million SNVs, including ~850,000 small INDELs, were detected from autosomes in every individual, and no outliers or plate biases were observed. Similarly, no outliers were observed in the X and Y chromosome data. The heterozygous and homozygous ratios were normal on autosomes as well as sex chromosomes in each female and male sample. Among the total SNVs observed, 112,335 (0.37%) were annotated as missense variants. These include 79,428 (71%) known and 32,907 (29%) novel variants.
Analysis of sequence data identified 202 female and 202 male subjects, consistent with our records, and validated relationships based on identity by descent (IBD) mapping analysis. The total number of reads obtained on each individual ranged from 765 million to 1,903 million, of which 78% to 95% were detected as appropriately mapped reads, indicating the high quality of the sequence data. Analysis using the GATK best practice pipeline identified 30,071,475 single nucleotide variants (SNVs) in total, including 23,409,845 single nucleotide polymorphisms (SNPs) and 6,661,630 INDELs. The number of variants in each sample ranged from 3.77 to 4.84 million SNVs. A total of 18,301,653 known and 11,769,822 novel (based on dbSNP147) SNVs were observed in 404 subjects. Among the total number of identified SNVs, 21,026,019 (70%) were identified as very rare SNVs (allele frequency < 0.001). There were 186,501 (0.61%) rare SNVs with a moderate/possibly disease-causing predicted effect, while only 53,101 (0.18%) were predicted to be deleterious/probably damaging.
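The variant totals in this paragraph can be cross-checked with a few lines of arithmetic. The snippet below is only a consistency sketch over the counts quoted in the text; the names `PARTITIONS`, `SUBSETS` and `percent_of_total` are hypothetical and introduced for illustration.

```python
TOTAL_SNVS = 30_071_475

# Pairs of counts that the text presents as splitting the full call set.
PARTITIONS = {
    "SNP / INDEL": (23_409_845, 6_661_630),
    "known / novel (dbSNP147)": (18_301_653, 11_769_822),
}

# Subsets of the full call set quoted with a percentage.
SUBSETS = {
    "very rare (AF < 0.001)": 21_026_019,
    "deleterious / probably damaging": 53_101,
}


def percent_of_total(count: int, total: int = TOTAL_SNVS) -> float:
    """Return count as a percentage of the total number of SNVs."""
    return 100.0 * count / total


if __name__ == "__main__":
    for label, (first, second) in PARTITIONS.items():
        status = "ok" if first + second == TOTAL_SNVS else "mismatch"
        print(f"{label}: {first + second:,} ({status})")
    for label, count in SUBSETS.items():
        print(f"{label}: {percent_of_total(count):.2f}%")
```

Both two-way splits add up to the reported total of 30,071,475 variants, and the very rare and predicted-deleterious subsets work out to roughly 70% and 0.18% of that total.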
The pattern of inheritance was observed to be recessive in 76 pedigrees, dominant in 25, and X-linked in 7. However, after completing the analysis, the pattern of inheritance was corrected in 4 pedigrees based on the causative mutations detected. One pedigree with multiple consanguineous marriages (RF.197.0113) was originally classified as recessive but determined to be dominant with a pseudo-recessive pattern of inheritance. Similarly, two pedigrees RF.VI123.0514 and RF.VI153.0216 were originally classified as dominant and recessive respectively but mutations in X-linked genes were identified as the underlying cause of the phenotype. One pedigree originally classified as dominant (RF.VI116.1215) was re-classified as recessive.
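The effect of the four corrections on the inheritance tallies can be worked through explicitly. This sketch assumes that the counts of 76 recessive, 25 dominant and 7 X-linked pedigrees describe the classification before the corrections were applied; the text does not state the revised totals, so the output below is a derived illustration under that assumption, and the names `INITIAL`, `RECLASSIFIED` and `revised_counts` are hypothetical.

```python
from collections import Counter

# Assumed to be the classification before the four corrections.
INITIAL = Counter({"recessive": 76, "dominant": 25, "X-linked": 7})

# Pedigree ID -> (original pattern, corrected pattern), as described in the text.
RECLASSIFIED = {
    "RF.197.0113": ("recessive", "dominant"),    # dominant, pseudo-recessive pattern
    "RF.VI123.0514": ("dominant", "X-linked"),
    "RF.VI153.0216": ("recessive", "X-linked"),
    "RF.VI116.1215": ("dominant", "recessive"),
}


def revised_counts(initial: Counter, changes: dict) -> Counter:
    """Move one pedigree from its original to its corrected class per change."""
    counts = Counter(initial)
    for before, after in changes.values():
        counts[before] -= 1
        counts[after] += 1
    return counts


if __name__ == "__main__":
    revised = revised_counts(INITIAL, RECLASSIFIED)
    for pattern in ("recessive", "dominant", "X-linked"):
        print(f"{pattern}: {INITIAL[pattern]} -> {revised[pattern]}")
    print("total pedigrees:", sum(revised.values()))
```

Under the stated assumption, the corrected tallies would be 75 recessive, 24 dominant and 9 X-linked pedigrees, with the total unchanged at 108.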
Discussion
Analysis of the whole-genome sequence of this cohort, comprising 404 individuals from 108 pedigrees with inherited retinal degeneration, identified 93 causal variants in 232 individuals in 61 (57%) pedigrees. Among the causative variants detected, 39 (42%) are novel and 54 (58%) are previously reported variants in 44 well established IRD associated genes and two IRD genes we recently reported [10, 14]. Although a majority of pedigrees underwent prior screening for mutations in known genes without success, WGS analysis identified causative variants in IRD genes. This is primarily due to the limitations of the mutation screening panels used over the past two decades, which did not include many currently known IRD associated genes. Further, the early versions of exome capture probes did not cover complete coding sequences. Variants in novel genes, or variants in non-coding regions of known IRD genes that have unknown impact or are yet to be annotated, may contribute to the phenotype in the 47 pedigrees that remained unresolved in this study.
The outcomes of the analysis of 108 IRD pedigrees provided insight into the genetic architecture of IRD. Overall, novel mutations were identified in genes known to be associated with IRD in 36 pedigrees while previously reported mutations were detected in 25 pedigrees. The majority of the mutations (60%) were missense or stop-gain variants, 23% were frameshift variants, 12% were potential splice-altering variants, and only 5% were structural variants. All the causative CNVs detected in this study were novel. Analysis of the sequence flanking these deletions revealed microhomologies, suggesting potential non-homologous end-joining leading to these deletions (Fig 3). Atypical genotypes were detected in a set of pedigrees (12%). These included causative mutations in more than one gene that segregated with IRD. While causative mutation(s) in one gene is potentially sufficient to explain pathology, the impact of having an additional causative mutation in a second IRD gene is unknown due to the significant overlap in the phenotype of IRDs. Further, intrafamilial genetic heterogeneity was observed in one pedigree. Such cases reveal the need for a comprehensive analysis of all known IRD genes for molecular diagnosis, counseling, and particularly for treatment decisions. In several cases, heterozygous pathogenic variants were also detected in IRD genes in addition to the primary causative mutations. A deeper phenotype-genotype analysis on a larger cohort, in the context of additional pathogenic variants, may provide further insight into variation in the IRD phenotype and molecular pathology of IRD. The occurrence of de-novo mutations is rare in retinal disease genes [74–77], and a heterozygous de-novo mutation in IMPDH1 was detected in one affected individual in our cohort. This is the first report of a de-novo variant in the IMPDH1 gene.
It is interesting to note that only a small proportion of novel causative genes were identified despite a significant proportion of our pedigrees originating from understudied populations. Further, the two novel genes observed to carry causative mutations in our cohort were detected in small pedigrees of European Americans [10, 14]. The low number of novel IRD causative genes detected is consistent with the low number of novel IRD genes reported in the literature in the past few years [2]. An exponential increase in novel IRD gene discovery occurred in two major spurts between 2000–2005 and 2010–2015 [2]. The spurts coincided with the development of advanced genome analysis tools and consequent enhancement in our knowledge of the architecture of the genome. Continuing with this trend, recent studies revealed the contribution of atypical genomic changes in IRD genes to pathology [78–80]. Our findings are consistent with the observation that the discovery of novel IRD genes is approaching a plateau phase and atypical genomic alterations in known IRD genes may contribute to about 10%-15% of cases [12, 79]. The number of unrelated pedigrees with mutations in recently identified novel IRD genes, both in our studies and in the literature, is small, suggesting these mutations could be more recent or private and are not major contributors to IRD. The underlying cause of pathology in 47 (43%) pedigrees that remained unresolved in our cohort after WGS may also involve atypical genotypes including alterations in non-coding sequences or in regions of the genome that are not well understood [80–82]. Therefore, gaining a deeper understanding of the genome, particularly the impact of non-coding variants, may improve our understanding of the molecular architecture of IRD and help resolve the remaining cases. Further advances in genome analysis methodologies may also facilitate the detection of the molecular cause of IRD in these unresolved pedigrees.
The families analyzed in this study are primarily from the understudied populations of Pakistan and Mexico and a third, well-studied European American population. About a third of the pedigrees included in this study are from Mexico, a unique population in which the genetics of IRD are not well understood. Comprehensive genetic analysis of IRD in this population has been reported primarily in two publications including one of our own [4, 32, 83–85]. Our previous analysis of 6 Mexican pedigrees from this region using whole-exome sequencing detected 3 novel and 6 known causative variants in IRD associated genes [4]. Zenteno et al described targeted genetic analysis of a cohort of probands with IRD and detection of mutations in 66% of cases with 48% of these mutations being novel [32]. The current analysis of 35 pedigrees using WGS detected causative mutations in 63% of pedigrees from Mexico, and 42% of these mutations are novel. These findings are similar to the observations reported in the prior two publications and reflect the understudied nature of this population [4, 32]. Further, 75% of the novel potentially pathogenic SNVs detected in our study are not listed in the gnomAD database. Since the Mexican population is an admixture of indigenous peoples and individuals of European ancestry [86, 87], the detection of a large proportion of novel variants not listed in gnomAD may be due to their possible origin in the indigenous population of Mexico, which is not well represented in the gnomAD data set.
The second population included in our analysis is from the Punjab province of Pakistan. Until recently, the genetics of IRD in this population was not well studied. The structure of the Pakistani population is unique, with endogamous sub-populations of multi-ethnic origin and high consanguinity in each of these populations [88–90]. Our earlier studies on 208 multigenerational pedigrees from the same region with a diagnosis of recessive IRD [7, 91–104] found homozygous causative mutations in 149 pedigrees (~71%). So far, mutations in novel genes were observed in only five (2.5%) unrelated Pakistani pedigrees in our cohort (ASRGL1 [99], IFT43 [104], ZNF513 [105], SLC24A1 [106], and CLCC1 [93]), while the remaining resolved pedigrees (97.5%) had mutations in known IRD genes. Among the mutations detected in known genes, p.Pro363Thr in RPE65 is the most common causative mutation found in this population [7, 91, 107]; this variant was observed only in the South Asian population (gnomAD database). An independent study on a cohort of Pakistani families also reported 70% novel and 30% previously identified variants in IRD associated genes [108–122]. Consistent with these findings, causative mutations were detected in 60% of pedigrees in the current study cohort, with 60% of the mutations being novel. However, unlike the novel SNVs in the Mexican population, the majority of these novel IRD associated SNVs (7 out of 8) were listed in the South Asian population in the gnomAD database.
Interestingly, the mutation detection rate was lower in European American pedigrees (45%) than in Mexican and Pakistani pedigrees (63% and 60%, respectively). Despite the well-studied nature of this population, 41% of the mutations detected in this study cohort are novel. Furthermore, 50% of these novel causative SNVs are not listed in the gnomAD database.
Overall, USH2A is the gene most frequently associated with IRD in the current study cohorts, followed by EYS, CERKL, CRX, IMPG1 and RPGR (Fig 7). Studies describing the genetic analysis of IRD in geographically distinct populations using a range of methods have been reported [12, 32, 123–132]. These studies found USH2A to be frequently associated with recessive RP worldwide, including in the European, Mexican and Pakistani populations [12]. In addition, the involvement of selected genes, including EYS, RPE65 and CEP290, in IRD is reported at higher frequency in certain populations [133]. Further, the involvement of ZNF513 and INPP5E in IRD is reported only in Pakistani and European populations, respectively [134]. Population-specific founder mutations have also been reported [135]. Our previous studies on the Pakistani population identified the p.Pro363Thr variant in RPE65, which is specific to the South Asian population, as the common causative mutation [7, 92]. The distribution of potentially causative variants detected in the study cohort is consistent with findings in other populations. Although the Pakistani population and some of the sub-populations in Mexico are endogamous in nature, causative variants were not observed at higher frequency in these populations than in other populations.
The majority of novel mutations identified in our cohort are either not listed in the gnomAD database or observed at very low frequency in Latino (for the Mexican), South Asian (for the Pakistani), or European (for the European American) populations (Tables 1, 3, 4 and 5). It is unknown whether the novel variants detected in cases from the Mexican population are more recent variants in the Latino population or originated in the indigenous population, which might not be well represented in gnomAD data. Similarly, all the novel causative variants found in the Pakistani cohort are either absent or occur at very low frequency in the South Asian population, suggesting that they are unique to this population. Further, these were observed in only one or a few Pakistani pedigrees despite the endogamous nature of this population. Surprisingly, a similar trend was observed with the novel mutations detected in the well-studied European American population. Eight out of 21 novel mutations detected in European American pedigrees, including the AGBL5 and IFT88 variants, were not listed in gnomAD, while the remaining are specific to the European population. These findings suggest that the novel mutations detected in our cohort are possibly specific to their corresponding populations or are private mutations, particularly the ones observed in European Americans. A majority of pedigrees analyzed in the current study were prescreened for mutations using targeted mutation screening methodologies designed based on data predominantly from European Americans [136–138]. This bias has possibly contributed to the detection of a high proportion of novel causative variants, particularly in the set of European American pedigrees.
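The grouping described above (variants absent from gnomAD versus variants seen at very low frequency only in the ancestry-matched population) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study. The function name, the category labels, the input format (a mapping from gnomAD ancestry-group label to allele frequency) and the frequency cutoff of 1e-4 are all assumptions made for illustration, since the text does not state a numeric threshold. The labels "amr", "sas" and "nfe" are the gnomAD group codes assumed here to correspond to the Latino, South Asian and European populations.

```python
from typing import Mapping, Optional

# Hypothetical cutoff for "very low frequency"; the paper does not state one.
RARE_AF_THRESHOLD = 1e-4

# gnomAD ancestry-group labels assumed to match each study cohort.
COHORT_TO_GNOMAD_GROUP = {
    "Mexican": "amr",            # Latino / Admixed American
    "Pakistani": "sas",          # South Asian
    "European American": "nfe",  # European (non-Finnish)
}


def classify_novel_variant(
    cohort: str,
    allele_freqs: Optional[Mapping[str, float]],
    threshold: float = RARE_AF_THRESHOLD,
) -> str:
    """Place a novel variant into one of four illustrative categories.

    allele_freqs maps a gnomAD group label to an allele frequency;
    None (or an all-zero mapping) means the variant is not listed.
    """
    if allele_freqs is None or all(af == 0 for af in allele_freqs.values()):
        return "absent from gnomAD"

    matched = COHORT_TO_GNOMAD_GROUP[cohort]
    observed = {group for group, af in allele_freqs.items() if af > 0}

    if observed == {matched}:
        if allele_freqs[matched] < threshold:
            return "rare, matched population only"
        return "not rare in matched population"
    return "seen in other populations"


if __name__ == "__main__":
    # Made-up frequencies, for demonstration only.
    print(classify_novel_variant("Mexican", None))
    print(classify_novel_variant("Pakistani", {"sas": 2e-5, "nfe": 0.0}))
```

Variants in the first two categories are the ones the discussion treats as possibly population-specific or private. The cutoff would need to be chosen to suit the inheritance model and disease prevalence.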
Overall, the findings on the geographically diverse and understudied Mexican and Pakistani populations and the well-studied Caucasian population, including our own data, revealed that the pattern of distribution of IRD causative mutations in this cohort was similar to the findings reported in other worldwide populations. As the number of pedigrees studied from each ethnic group is small, analysis of additional IRD cases from the understudied Pakistani and Mexican populations may provide better insight into the genetic architecture of IRD in these populations. Further, appropriate classification of the clinical relevance of novel potentially causative variants, using population-specific information and the impact of the corresponding gene, will facilitate improved genetic diagnosis for patients from worldwide populations [139].
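To make the caveat about small cohort sizes concrete, the following Python sketch computes a 95% Wilson score interval for a detection rate. It is an illustration added here, not an analysis performed in the study. The overall count (61 of 108 pedigrees) is taken from the abstract. The Mexican count of 22 of 35 is an assumption inferred from the reported 63% of 35 pedigrees, and the function name is hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successes <= total:
        raise ValueError("successes must lie between 0 and total")

    p = successes / total
    z2 = z * z
    denom = 1 + z2 / total
    centre = (p + z2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # 61 of 108: from the abstract. 22 of 35: inferred from "63% of 35 pedigrees".
    for label, solved, n in [("All pedigrees", 61, 108), ("Mexican pedigrees", 22, 35)]:
        low, high = wilson_interval(solved, n)
        print(f"{label}: {solved}/{n} solved, 95% CI {low:.2f} to {high:.2f}")
```

The interval for the 35 Mexican pedigrees is noticeably wider than the interval for the full cohort, so differences in detection rate between populations of this size should be interpreted cautiously.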
This study, using WGS and in-depth integrated analysis of the nature and type of mutations in different populations, provided insight into the population-specific genetic architecture of IRD and enabled its comparison to other worldwide populations. Such information will be helpful in the design of efficient population-specific tools for molecular diagnosis, genetic counseling, and decisions on the selection of therapies. Further analysis of the 47 pedigrees that remained unresolved in this study may lead to the identification of causative variants in novel genes, or of non-coding variants that can contribute to the phenotype by modifying enhancer-promoter interactions or other yet-to-be-identified functions of non-coding sequences.
[1] Url:
https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1009848
(C) Plos One. "Accelerating the publication of peer-reviewed science."
Licensed under Creative Commons Attribution (CC BY 4.0)
URL:
https://creativecommons.org/licenses/by/4.0/