(C) PLOS One
This story was originally published by PLOS One and is unaltered.
Sequencing-based fine-mapping and in silico functional characterization of the 10q24.32 arsenic metabolism efficiency locus across multiple arsenic-exposed populations [1]
Authors: Meytal Batya Chernoff, Dayana Delgado, Lin Tong. Affiliations: Department of Public Health Sciences, University of Chicago, Chicago, Illinois, United States of America; Interdisciplinary Scientist Training Program, University of Chicago Pritzker School of Medicine.
Date: 2023-02
Inorganic arsenic is highly toxic and carcinogenic to humans. Exposed individuals vary in their ability to metabolize arsenic, and variability in arsenic metabolism efficiency (AME) is associated with risks of arsenic-related toxicities. Inherited genetic variation in the 10q24.32 region, near the arsenic methyltransferase (AS3MT) gene, is associated with urine-based measures of AME in multiple arsenic-exposed populations. To identify potential causal variants in this region, we applied fine-mapping approaches to targeted sequencing data generated for exposed individuals from Bangladeshi, American Indian, and European American populations (n = 2,357, 557, and 648, respectively). We identified three independent association signals for Bangladeshis, two for American Indians, and one for European Americans. The size of the confidence sets for each signal varied from 4 to 85 variants. There was one signal shared across all three populations, represented by the same SNP in American Indians and European Americans (rs191177668) and in strong linkage disequilibrium (LD) with a lead SNP in Bangladesh (rs145537350). Beyond this shared signal, differences in LD patterns, minor allele frequency (MAF) (e.g., rs12573221: ~13% in Bangladesh vs. ~0.2% among American Indians), and/or heterogeneity in effect sizes across populations likely contributed to the apparent population specificity of the additional identified signals. One of our potential causal variants influences AS3MT expression and nearby DNA methylation in numerous GTEx tissue types (with rs4919690 as a likely causal variant). Several SNPs in our confidence sets overlap transcription factor binding sites and cis-regulatory elements (from ENCODE). Taken together, our analyses reveal multiple potential causal variants in the 10q24.32 region influencing AME, including a variant shared across populations, and elucidate potential biological mechanisms underlying the impact of genetic variation on AME.
Inorganic arsenic is highly toxic, and exposure to arsenic increases risk for multiple diseases, including cancer. Individuals differ in their ability to metabolize and excrete arsenic, in part due to inherited genetic variation in and around the AS3MT gene, and these differences affect arsenic toxicity risk. To identify candidate causal variants in the AS3MT region, we applied fine-mapping methods to targeted sequencing data from the Health Effects of Arsenic Longitudinal Study (HEALS), the Strong Heart Study (SHS), and the New Hampshire Skin Cancer Study (NHSCS) (Bangladeshi, American Indian, and European American populations, respectively). We detected 3 independent association signals in HEALS, 2 in SHS, and 1 in NHSCS, and we identified a set of candidate causal variants for each of these signals. One of the identified signals represents a potential causal variant that affects arsenic metabolism across all three populations. Using omics-QTL co-localization analyses, we show that some of the identified variants act through regulation of AS3MT in multiple tissue types. Overall, this work increases our understanding of variation in the AS3MT region and its role in arsenic metabolism across populations.
Funding: Targeted DNA sequencing across all cohorts was supported by National Institutes of Health grant R01 ES023834 (to B.L.P.). The Health Effects of Arsenic Longitudinal Study was supported by National Institutes of Health grants R35 ES028379 (to B.L.P.), R21 ES024834 (to B.L.P.), P42ES010349 (to J.G.), R01 CA107431 (to H.A.), P30 ES027792 (to H.A.), R24 ES028532 (to H.A.), and R24 TW009555 (to H.A.). The New Hampshire Skin Cancer Study was supported by U.S. National Institutes of General Medicine grant P20GM104416 (to M.R.K.) and by National Institutes of Health grants P42ES007373 (to C.Y. Chen) and R01CA057494 (to M.R.K.). The Strong Heart Study has been funded in whole or in part with federal funds from the National Heart, Lung, and Blood Institute, National Institutes of Health, Department of Health and Human Services, under contract numbers 75N92019D00027, 75N92019D00028, 75N92019D00029, and 75N92019D00030. The study was previously supported by research grants R01HL109315, R01HL109301, R01HL109284, R01HL109282, R01HL109319, and R01HL090863, by cooperative agreements U01HL41642, U01HL41652, U01HL41654, U01HL65520, and U01HL65521, and by the National Institute of Environmental Health Sciences (P42ES033719, R01ES032638, and past grant R01ES021367). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This work was also supported by the National Institute of Environmental Health Sciences under award number R35ES028379-03S1 (B.L.P.), the National Institute of General Medicine under award number T73M007281, the National Institute of Environmental Health Sciences under award number 5F30ES031858-02 (M.B.C.), a Susan G. Komen Research Training Grant under award number GTDR16376189, and the National Institute of Aging under award number T32AG51146-5.
The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Data Availability: All summary statistics generated in this study are included within the manuscript and its supporting information files. Requests for the individual-level data underlying the results presented in the study can be made by contacting [email protected]. Requests will then be routed to the three individual studies, as each study has its own data access mechanism. Normalized expression matrices, summary statistics for eQTLs and mQTLs, and covariates used for QTL mapping are available at the GTEx Portal (https://gtexportal.org/home/datasets). Normalized DNAm data are available at GEO (GSE213478); access to the raw DNAm data is provided through the AnVIL platform (https://anvil.terra.bio/#workspaces/anvil-datastorage/AnVIL_GTEx_V9_hg38). All GTEx protected data are available via dbGaP (phs000424.v9 and phs000424.v8.p2). GTEx whole genome sequencing data can be requested through dbGaP (https://gtexportal.org/home/protectedDataAccess).
The causal variants underlying the observed associations between AS3MT/10q24.32 variation and AME remain unknown. Previous studies have had limited SNP density and small sample sizes and have focused on single populations. In this work, we generate targeted sequencing data for multiple arsenic-exposed populations to identify candidate causal variants underlying the association between 10q24.32 variants and AME. We perform in silico functional annotation to assess potential functional impact and further prioritize potential causal variants. Finally, we conduct co-localization analyses to examine variants’ potential effects on AS3MT expression and/or nearby DNA methylation. This work enhances our understanding of the genetic mechanisms underlying variation in AME and susceptibility to iAs toxicity.
Individuals’ arsenic metabolism efficiency (AME) is often represented as the percentage of each arsenic species in urine relative to all species (iAs%, MMA%, and DMA%) [7,14], with higher DMA% indicating more efficient metabolism. While age, sex, and environmental factors contribute to inter-individual variation in AME [14,26], inherited genetic variation also plays an important role. Variation in the AS3MT/10q24.32 region has shown clear association with AME in multiple populations, including Bangladeshi populations and American Indian communities in the US, with multiple independent association signals identified [14,27–29]. Prior studies have found that higher urinary MMA% and/or lower urinary DMA% are associated with increased risk for cancer, cardiovascular disease [30], and arsenic-induced skin lesions [14]. AME-associated SNPs have also been shown to affect the risk of arsenic-induced skin lesions, reflecting increased arsenic toxicity among those with lower AME [31].
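As a minimal sketch of how these AME measures are derived, assuming that "all species" means the sum of iAs, MMA, and DMA (with arsenobetaine excluded) and using hypothetical concentrations in μg/L, the three percentages can be computed per urine sample as follows; the function name is illustrative only.

```python
def species_percentages(ias: float, mma: float, dma: float) -> dict:
    """Return iAs%, MMA%, and DMA% for one urine sample.

    Assumption: the denominator is iAs + MMA + DMA (arsenobetaine excluded);
    all inputs must be concentrations in the same unit (e.g., ug/L).
    """
    total = ias + mma + dma
    if total <= 0:
        raise ValueError("total arsenic must be positive")
    return {
        "iAs%": 100.0 * ias / total,
        "MMA%": 100.0 * mma / total,
        "DMA%": 100.0 * dma / total,
    }


# Hypothetical sample: 20 ug/L iAs, 15 ug/L MMA, 65 ug/L DMA
pct = species_percentages(20.0, 15.0, 65.0)
print(pct)  # {'iAs%': 20.0, 'MMA%': 15.0, 'DMA%': 65.0}
```

Because the three percentages sum to 100, only two of them carry independent information for a given sample.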
iAs metabolism in humans consists of a series of reduction and methylation reactions occurring primarily in the liver, with some metabolism potentially occurring in other tissues such as the kidney [23,24]. iAs enters the body in the form of arsenite (iAs(III)) or arsenate (iAs(V)). Based on the Challenger model of metabolism [14,23,25], iAs(V) can be reduced to iAs(III), which can be methylated in a reaction catalyzed by arsenic (+3 oxidation state) methyltransferase (AS3MT), producing monomethylarsonic acid (MMA(V)) [14], which can in turn be reduced to monomethylarsonous acid (MMA(III)). A second methylation step, also catalyzed by AS3MT, produces dimethylarsinic acid (DMA(V)), which can be reduced to dimethylarsinous acid (DMA(III)), the end-product of iAs metabolism in humans.
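The reaction sequence above can be written down as an ordered list of steps, which makes the alternation of reductions and AS3MT-catalyzed methylations explicit. This is purely an illustrative data structure (the names `Step`, `CHALLENGER_PATHWAY`, and `trace` are ours); enzymes for the reduction steps are left as `None` because the text does not name them.

```python
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    substrate: str
    product: str
    reaction: str            # "reduction" or "methylation"
    enzyme: Optional[str]    # None where the text names no enzyme


# Challenger-model sequence as summarized in the text above
CHALLENGER_PATHWAY = [
    Step("iAs(V)", "iAs(III)", "reduction", None),
    Step("iAs(III)", "MMA(V)", "methylation", "AS3MT"),
    Step("MMA(V)", "MMA(III)", "reduction", None),
    Step("MMA(III)", "DMA(V)", "methylation", "AS3MT"),
    Step("DMA(V)", "DMA(III)", "reduction", None),
]


def trace(pathway: List[Step]) -> List[str]:
    """Return the chain of species, checking each step starts where the last ended."""
    species = [pathway[0].substrate]
    for step in pathway:
        if step.substrate != species[-1]:
            raise ValueError(f"broken chain at {step.substrate}")
        species.append(step.product)
    return species


print(" -> ".join(trace(CHALLENGER_PATHWAY)))
# iAs(V) -> iAs(III) -> MMA(V) -> MMA(III) -> DMA(V) -> DMA(III)
```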
Arsenic-contaminated groundwater is a global public health issue, affecting >220 million individuals worldwide, with >85% of highly exposed individuals living in South Asia, based on an exposure level of ≥10 μg/L in drinking water [1,2]. The International Agency for Research on Cancer (IARC) classifies inorganic arsenic (iAs) as a “Group 1” human carcinogen [3], with chronic exposure increasing the risk of bladder [4], kidney, lung [5], liver, and skin cancers [3,6]. Exposure to iAs is also associated with increased risk for diabetes [7], as well as cardiovascular, cerebrovascular, and neurologic diseases [8–11]. A hallmark of chronic iAs exposure is the appearance of skin lesions (arsenical hyperkeratosis), typically on the hands and feet of exposed individuals [12,13]. The most common source of iAs exposure is contaminated drinking water [14], with ~2.1 million individuals in the U.S. [15] and 35 to 77 million in Bangladesh [16,17] exposed to iAs above 10 μg/L, the maximum contaminant level set by the U.S. Environmental Protection Agency (EPA) [18] and World Health Organization (WHO) [1,19]. Other sources of arsenic exposure include consumption of contaminated seafood and rice [20–22].
Arsenobetaine, a form of organic arsenic found in seafood (see Methods), was correlated with DMA% in all three cohorts (correlation of 0.18, 0.12, and 0.22 in HEALS, SHS, and NHSCS, respectively); however, no clear differences were seen between our initial analysis results and those from a model that included adjustment for arsenobetaine. To further test the robustness of our results, we performed a sensitivity analysis excluding individuals with DMA% >85 (76 in HEALS, 224 in NHSCS, and 169 in SHS) to avoid including DMA measures strongly influenced by exposure to organic arsenic. We found no significant change in results, with our lead SNPs still appearing among the top SNPs in the association analyses of all three populations.
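The logic of this sensitivity analysis can be sketched as refitting the same per-SNP model after dropping samples above the DMA% cutoff. This is a minimal sketch under stated assumptions: an additive linear model fitted by ordinary least squares with numpy, simulated (not study) data, and hypothetical helper names; the real analyses also adjusted for population structure.

```python
import numpy as np


def snp_effect(dma_pct, dosage, covariates):
    """OLS per-allele effect of one SNP on DMA%, adjusting for covariates."""
    X = np.column_stack([np.ones(len(dosage)), dosage, covariates])
    coef, *_ = np.linalg.lstsq(X, dma_pct, rcond=None)
    return coef[1]  # coefficient on the dosage column


def exclude_high_dma(dma_pct, dosage, covariates, cutoff=85.0):
    """Return (full-sample effect, effect with DMA% <= cutoff, number excluded)."""
    keep = dma_pct <= cutoff
    full = snp_effect(dma_pct, dosage, covariates)
    restricted = snp_effect(dma_pct[keep], dosage[keep], covariates[keep])
    return full, restricted, int((~keep).sum())


# Simulated data only (hypothetical values, not study data)
rng = np.random.default_rng(0)
n = 2000
dosage = rng.binomial(2, 0.3, n).astype(float)
age = rng.uniform(20, 70, n)
sex = rng.integers(0, 2, n).astype(float)
dma = 70 + 3 * dosage - 0.05 * age + rng.normal(0, 8, n)
covars = np.column_stack([age, sex])
full, restricted, n_excluded = exclude_high_dma(dma, dosage, covars)
print(f"full: {full:.2f}, DMA% <= 85: {restricted:.2f}, excluded: {n_excluded}")
```

Restricting on the outcome can itself slightly attenuate estimates, so similar (rather than identical) effects are what a robust signal would look like in this kind of check.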
Because individual characteristics such as sex and smoking are known to be associated with DMA%, we performed a series of effect modification analyses to assess whether these variables modify the effects of our identified variants. We found no interaction between sex and the genetic effects on DMA% (S9 Table); the effects of our identified variants were similar in males and females. We also examined the interaction between smoking and the lead SNPs and observed evidence of interaction between current smoking and the lead SNP in NHSCS (p = 0.04), but not in HEALS or SHS.
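An effect modification analysis of this kind amounts to adding a SNP-by-modifier product term to the linear model and testing its coefficient. The sketch below is an assumption about the general approach, not the authors' code: OLS with numpy, a normal approximation to the t statistic for the p-value (reasonable at these sample sizes), simulated data, and a hypothetical function name.

```python
import math
import numpy as np


def interaction_test(y, dosage, modifier, covariates):
    """OLS fit of y ~ dosage + modifier + dosage*modifier + covariates.

    Returns (interaction coefficient, standard error, two-sided p-value);
    the p-value uses a normal approximation, reasonable for large n.
    """
    X = np.column_stack([np.ones(len(y)), dosage, modifier,
                         dosage * modifier, covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)          # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # OLS covariance of coefficients
    b, se = coef[3], math.sqrt(cov[3, 3])
    return b, se, math.erfc(abs(b / se) / math.sqrt(2))


# Simulated example: the SNP effect on DMA% is larger among current smokers
rng = np.random.default_rng(1)
n = 3000
dosage = rng.binomial(2, 0.3, n).astype(float)
smoker = rng.binomial(1, 0.3, n).astype(float)
age = rng.uniform(20, 70, n)
y = 70 + 2 * dosage - smoker + 4 * dosage * smoker - 0.05 * age + rng.normal(0, 8, n)
b, se, p = interaction_test(y, dosage, smoker, age)
print(f"interaction = {b:.2f} (SE {se:.2f}), p = {p:.1e}")
```

The same function covers the sex analysis by passing a 0/1 sex indicator as `modifier`.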
Under the assumption that 50% of DMA% signals are mQTLs (p₁₂ = 5×10⁻⁶), we observed co-localization between the DMA% signal represented by rs4919687 and mQTLs in two tissue types: transverse colon (one CpG) and ovary (two CpGs) (S8 Table). In colon, the associated CpG was in the gene body of CYP17A1, while in ovary both associated CpGs were in the gene body of AS3MT. We did not observe any co-localization between mQTLs and the DMA% association signals represented by either rs12573221 (HEALS set 2) or rs145537350 (HEALS set 1).
Among the nine tissue types analyzed, we identified mQTLs in strong LD with rs4919687 (HEALS set 3) in five tissue types: transverse colon, kidney cortex, lung, ovary, and testis. We also identified mQTLs in high LD with rs12573221 (HEALS set 2) in five tissue types: breast (mammary), lung, skeletal muscle, prostate, and testis. Finally, we identified mQTLs in high LD with rs145537350 (HEALS set 1) in four tissue types: breast (mammary), kidney cortex, lung, and whole blood.
Under the assumption that 50% of DMA% SNPs are cis-eQTLs (p₁₂ = 5×10⁻⁶), we identified 25 tissue types in which a cis-eQTL for BORCS7 co-localized with the DMA% signal represented by rs4919687. Thus, both AS3MT and BORCS7 eQTLs co-localize with this DMA% signal. We also observed co-localization (PP > 80%) between this DMA% signal and tissue-specific eQTLs for CYP17A1OS (thyroid) and CYP17A1 (frontal cortex of the brain) (S7 Table). As with the AS3MT results, reducing p₁₂ (to a 5% probability of co-localization) resulted in much lower probabilities of co-localization for BORCS7 (S7 Table).
Beyond AS3MT, we examined 27 additional genes within 500 kb of AS3MT. HEALS DMA% lead SNP rs4919687 (HEALS set 3) was in high LD with eQTLs for multiple genes (in at least one tissue type), including CYP17A1OS, AL356608.1, CYP17A1, BORCS7, NT5C2, and WBP1L. Among these, a BORCS7 eQTL (represented by rs11191421 and rs4919690) showed strong LD with HEALS lead SNP rs4919687 (set 3) in 43 tissue types. We also observed two genes, NFKB2 and RPARP-AS1, with cis-eQTLs whose lead SNPs were in high LD with HEALS DMA% lead SNP rs12573221 (set 2) in tibial nerve and transverse colon tissues, respectively. Finally, we observed one gene, SUFU, with a cis-eQTL in high LD with a DMA% association signal from SHS (rs191177668, association-based set 1) in tibial artery.
Considering the additional lead signals identified in our population-specific analyses, we observed co-localization between DMA% signal rs12573221 (HEALS set 2) and an AS3MT eQTL in aortic artery. Even under the most liberal priors, we observed no evidence of co-localization for DMA% signals represented by rs145537350 (HEALS set 1) or rs191177668 (shared SNP in SHS and NHSCS).
Varying the prior probability that a DMA% SNP is also a cis-eQTL (p₁₂) from 50% to 5% decreased the number of tissue types in which co-localization was observed (PP of CCV > 80%) between AS3MT cis-eQTLs and rs4919687 from 21 to 4 (Table 5 and Fig 6A) and resulted in no co-localization between AS3MT cis-eQTLs and any of the other DMA% signals.
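Why lowering p₁₂ shrinks the list of co-localizing tissues can be seen from how that prior enters the posterior. Assuming the approximate Bayes factor framework of the coloc R package (the text's p₁₂/PP terminology suggests it, but this is an assumption), p₁₂ multiplies only the H4 (shared causal variant) term, so scaling p₁₂ by a factor c while holding p₁ and p₂ fixed gives PP4′ = c·PP4 / (1 − PP4 + c·PP4). The helper below (our name) applies that rescaling.

```python
def rescale_pp_h4(pp_h4: float, factor: float) -> float:
    """Posterior probability of H4 after multiplying the prior p12 by `factor`.

    Only the H4 term depends on p12, so its unnormalized weight scales by
    `factor` while the weights of H0-H3 stay the same.
    """
    if not 0.0 <= pp_h4 <= 1.0:
        raise ValueError("pp_h4 must be a probability")
    weighted = factor * pp_h4
    return weighted / ((1.0 - pp_h4) + weighted)


# Moving from a 50% to a 5% prior divides p12 by 10.
for pp in (0.99, 0.95, 0.85):
    print(pp, round(rescale_pp_h4(pp, 0.1), 3))
# 0.99 0.908
# 0.95 0.655
# 0.85 0.362
```

Under this sketch, only signals with an initial PP.H4 of roughly 0.976 or higher stay above the 80% threshold after a tenfold reduction in p₁₂, which is why a less permissive prior retains only the most strongly supported tissues.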
We observe evidence of co-localization in 21 GTEx tissue types, with four example tissue types shown in Panel A (aorta, adipose-visceral omentum, lung, and nerve-tibial). The low efficiency (low DMA%) allele at rs4919687 is consistently associated with lower AS3MT expression in all tissues for which co-localization was observed.
Following the identification of eQTLs in high LD with our AME-associated SNPs, we performed a co-localization analysis to determine whether the same causal variants affect both AME and AS3MT expression, which would suggest a regulatory mechanism by which our identified SNPs influence AME. Under the assumption that 50% of DMA% SNPs are eQTLs (p₁₂ = 5×10⁻⁶), we found evidence for co-localization between AS3MT cis-eQTLs and the DMA% signal represented by rs4919687 in 21 tissue types (posterior probability (PP) of a common causal variant (CCV) > 80%) (Table 5). The rs4919687 allele associated with decreased AME (lower DMA%) was associated with lower AS3MT expression across all tissue types in which co-localization was observed (Fig 6B), consistent with a mechanism in which lower AS3MT mRNA levels result in lower protein levels and lower enzymatic activity, thereby decreasing AME.
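The co-localization test can be sketched with a small numpy re-implementation of the Wakefield approximate Bayes factor (ABF) calculation used by the coloc R package. The text does not name the software, so this is an assumption, as are the single-causal-variant-per-trait model, effects on a standardized trait scale, a prior effect SD of 0.15, and priors p₁ = p₂ = 10⁻⁵ chosen so that p₁₂ = 5×10⁻⁶ matches the "50% of DMA% SNPs" assumption. Function names are ours, and the inputs are toy values.

```python
import numpy as np


def log_abf(beta, se, prior_sd=0.15):
    """Wakefield log approximate Bayes factor per SNP for a quantitative trait."""
    beta = np.asarray(beta, dtype=float)
    v = np.asarray(se, dtype=float) ** 2
    w = prior_sd ** 2                 # prior variance of the true effect
    r = w / (w + v)
    return 0.5 * (np.log1p(-r) + r * beta ** 2 / v)


def logsumexp(x):
    x = np.asarray(x, dtype=float)
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))


def coloc_abf(l1, l2, p1=1e-5, p2=1e-5, p12=5e-6):
    """Posterior probabilities of H0..H4 for two traits measured on the same SNPs.

    H3: distinct causal variants; H4: one shared causal variant (the "CCV").
    """
    s1, s2, s12 = logsumexp(l1), logsumexp(l2), logsumexp(np.add(l1, l2))
    # log(exp(s1 + s2) - exp(s12)): all SNP pairs minus same-SNP pairs.
    # The gap is clipped at 0 because rounding can push it marginally above 0.
    gap = min(s12 - (s1 + s2), 0.0)
    with np.errstate(divide="ignore"):
        h3 = s1 + s2 + np.log(abs(np.expm1(gap)))
    lh = np.array([0.0,
                   np.log(p1) + s1,
                   np.log(p2) + s2,
                   np.log(p1) + np.log(p2) + h3,
                   np.log(p12) + s12])
    return np.exp(lh - logsumexp(lh))


se = np.full(5, 0.01)
dma = log_abf([0.1, 0, 0, 0, 0], se)          # toy DMA% signal at SNP 0
eqtl_same = log_abf([0.1, 0, 0, 0, 0], se)    # toy eQTL at the same SNP
eqtl_diff = log_abf([0, 0.1, 0, 0, 0], se)    # toy eQTL at a different SNP
print(coloc_abf(dma, eqtl_same).round(3))     # mass concentrates on H4
print(coloc_abf(dma, eqtl_diff).round(3))     # mass concentrates on H3
```

"PP of CCV > 80%" in the text corresponds to the last element (H4) of the returned vector exceeding 0.8.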
AS3MT is expressed in most human tissue types, and expression is highly variable, ranging from 411.5 TPM in the adrenal gland to 0.92 TPM in whole blood (S10 Fig). We identified cis-eQTLs for AS3MT in 45 of the 47 tissue types analyzed, with 27 tissue types having multiple (up to four) AS3MT cis-eQTLs. We then asked whether any of these eQTLs shared a common causal variant with our AME association signals, which would point to a regulatory mechanism by which our identified SNPs affect AME. Cis-eQTLs in 22 tissue types had a lead eSNP in high LD (r² > 0.7) with rs4919687, a lead DMA% SNP in HEALS found in confidence set 3. Among these, rs4919690 (in confidence set 3) was the lead eSNP in 14 tissue types. Eight tissue types had a lead eSNP in high LD with rs12573221 (lead SNP in HEALS confidence set 2), and 5 tissue types had a lead eSNP in high LD with rs145537350 (lead SNP in HEALS set 1) and rs191177668 (shared lead SNP in SHS and NHSCS). Examining the individual lead SNPs associated with urinary DMA% in the context of our eQTL analysis, we found that HEALS lead SNP rs4919687 had p < 5×10⁻⁸ in 40 tissue types and the shared lead SNP in SHS and NHSCS, rs191177668, had p < 5×10⁻⁸ in eight tissue types (based on the primary eQTL analysis). The second lead SNP in SHS (rs4919688) was not identified in any of our QTL analyses.
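The r² > 0.7 criterion for "high LD" can be illustrated with the genotype-based r², the squared Pearson correlation between two SNPs' allele dosages, which is a common proxy when haplotype phase is not used. This is a minimal sketch with hypothetical genotypes; per the figure legend, the paper's LD estimates came from 1000 Genomes reference populations rather than from a calculation like this one on study data.

```python
import numpy as np


def ld_r2(dosage_a, dosage_b):
    """Squared Pearson correlation between two SNPs' allele dosages (0/1/2)."""
    a = np.asarray(dosage_a, dtype=float)
    b = np.asarray(dosage_b, dtype=float)
    return float(np.corrcoef(a, b)[0, 1] ** 2)


def in_high_ld(dosage_a, dosage_b, threshold=0.7):
    """True if the two SNPs exceed the r^2 threshold used for 'high LD'."""
    return ld_r2(dosage_a, dosage_b) > threshold


# Hypothetical genotypes for eight individuals at a DMA% SNP and an eSNP
dma_snp = [0, 1, 2, 1, 0, 2, 1, 0]
esnp = [0, 1, 2, 1, 0, 2, 0, 0]
print(round(ld_r2(dma_snp, esnp), 3), in_high_ld(dma_snp, esnp))  # 0.841 True
```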
We aligned credible set variants with markers of open chromatin (DNase I), candidate cis-regulatory elements (cCREs), H3K27ac, H3K4me3, and transcription factor (TF) binding sites (specifically TFs in cells and tissue types related to the kidney, liver, and heart). Three variants in HEALS confidence set 1 (Table 4 and S9 Fig), one variant in HEALS set 3 (Table 4 and Fig 5), and 12 of the 50 variants in HEALS set 2 overlapped at least one of the examined features. Examining the 85-variant SHS confidence set that contains the shared lead variant in SHS and NHSCS (rs191177668) as well as the HEALS lead variant rs145537350, we found that 24 of the variants overlap functional features (S6 Table).
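At its core, this annotation step is an interval-overlap check between credible-set variant positions and feature intervals. The sketch below assumes all coordinates are on one chromosome and build and are 1-based and closed; ENCODE BED files are 0-based and half-open, so a real pipeline would convert before comparing. Variant IDs, positions, and feature labels here are hypothetical.

```python
def annotate(variants, features):
    """Return {variant_id: [labels]} for features overlapping each variant.

    variants: list of (variant_id, position); features: list of
    (start, end, label). Coordinates are assumed 1-based and inclusive.
    A linear scan is adequate for the tens of variants in a credible set.
    """
    return {
        vid: [label for start, end, label in features if start <= pos <= end]
        for vid, pos in variants
    }


# Hypothetical credible-set variants and annotation intervals
variants = [("snp_a", 1_000), ("snp_b", 2_500), ("snp_c", 4_000)]
features = [(900, 1_200, "DNase I"), (950, 1_050, "H3K27ac"),
            (3_990, 4_010, "TF:CTCF")]
hits = annotate(variants, features)
n_overlapping = sum(bool(labels) for labels in hits.values())
print(hits, n_overlapping)
# {'snp_a': ['DNase I', 'H3K27ac'], 'snp_b': [], 'snp_c': ['TF:CTCF']} 2
```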
We examined the shared signal through a meta-analysis of the isolated HEALS and SHS association signals using MANTRA. The resulting 95% confidence set contained 8 variants, including the primary lead SNPs in HEALS (rs145537350) and SHS (rs191177668), which had posterior inclusion probabilities of 0.488 and 0.251, respectively (S3 Table). This confidence set also included the tertiary lead SNP in HEALS (rs4919687), a variant included in HEALS set 3 and in the summary statistic-based SHS set 1.
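A 95% confidence (credible) set of this kind is typically built greedily: rank variants by posterior probability and add them until the cumulative probability reaches the coverage level. MANTRA's exact procedure may differ; this is a generic sketch, and the variant IDs and probabilities below are hypothetical (they do not reproduce S3 Table).

```python
def credible_set(probs, coverage=0.95):
    """Smallest set of variants whose posterior probabilities sum to >= coverage.

    probs: dict mapping variant id -> posterior probability of being causal
    (assumed to sum to ~1 under a single-causal-variant model).
    Returns (variant ids in descending probability, cumulative probability).
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for vid, p in ranked:
        chosen.append(vid)
        total += p
        if total >= coverage:
            break
    return chosen, total


# Hypothetical posterior probabilities for nine variants
probs = {"v1": 0.488, "v2": 0.251, "v3": 0.12, "v4": 0.07, "v5": 0.03,
         "v6": 0.02, "v7": 0.01, "v8": 0.006, "v9": 0.005}
members, cumulative = credible_set(probs)
print(members, round(cumulative, 3))  # ['v1', 'v2', 'v3', 'v4', 'v5'] 0.959
```

When LD is extensive, probability is spread over many correlated variants and the set grows, which is the pattern discussed for SHS.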
The results of ancestry-specific fine-mapping of individual-level data performed using SuSiE [33,34] were consistent with those of our association analyses, with three distinct 95% confidence sets identified in HEALS and two in SHS (Fig 4). We found overlapping SNPs in the confidence sets for HEALS set 1 and SHS set 1 (S2, S3, and S4 Tables). For SHS, the confidence sets identified through analysis of summary statistics were somewhat different from those obtained from analyses of individual-level data, with SNPs in the summary statistic-based SHS confidence set 1 overlapping with SNPs in HEALS sets 1 and 3 (S2–S5 Tables). However, in both the summary statistic and individual-level data analyses, no overlap was observed between HEALS confidence set 2 and any SHS set, or between SHS set 2 and any HEALS set. SuSiE did not identify a confidence set for NHSCS, potentially due to the weaker associations observed in this cohort.
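To show how SuSiE separates multiple signals, here is a deliberately simplified sketch of its iterative Bayesian stepwise selection (IBSS): each of L "single effects" is refit in turn against the residual left by the others, yielding a per-SNP posterior inclusion vector (alpha) for each effect. Unlike the susieR package, this sketch keeps the prior and residual variances fixed (the residual variance is crudely set to var(y)), runs a fixed number of iterations with no ELBO convergence check, and applies no credible-set purity filter. Function names and the simulated data are ours.

```python
import numpy as np


def single_effect_regression(X, r, sigma2, prior_var):
    """Bayesian single-effect regression of residual r on each column of X.

    Returns alpha (posterior probability that each SNP is the one causal
    variant, uniform prior) and each SNP's posterior mean effect.
    """
    xtx = np.sum(X ** 2, axis=0)
    xtr = X.T @ r
    bhat = xtr / xtx
    s2 = sigma2 / xtx                          # sampling variance of bhat
    log_bf = (0.5 * np.log(s2 / (s2 + prior_var))
              + 0.5 * bhat ** 2 / s2 * prior_var / (prior_var + s2))
    alpha = np.exp(log_bf - log_bf.max())
    alpha /= alpha.sum()
    post_var = 1.0 / (1.0 / prior_var + xtx / sigma2)
    return alpha, post_var * xtr / sigma2


def ibss(X, y, L=3, prior_var=1.0, n_iter=50):
    """Simplified IBSS with fixed variances; returns an (L, p) alpha matrix."""
    X = X - X.mean(axis=0)                     # center genotypes and phenotype
    y = y - y.mean()
    sigma2 = float(np.var(y))                  # crude, conservative residual variance
    alpha = np.zeros((L, X.shape[1]))
    mu = np.zeros_like(alpha)
    for _ in range(n_iter):
        for l in range(L):
            b_others = (alpha * mu).sum(axis=0) - alpha[l] * mu[l]
            alpha[l], mu[l] = single_effect_regression(
                X, y - X @ b_others, sigma2, prior_var)
    return alpha


# Simulated genotypes with two causal SNPs (columns 3 and 12) and no LD
rng = np.random.default_rng(2)
G = rng.binomial(2, 0.3, size=(500, 20)).astype(float)
y = 0.8 * G[:, 3] - 0.8 * G[:, 12] + rng.normal(0, 1, 500)
alpha = ibss(G, y)
print(alpha.argmax(axis=1), alpha.max(axis=1).round(2))
```

Each row of `alpha` can then be turned into a 95% credible set with the greedy construction described for the meta-analysis; a diffuse row (no dominant SNP) corresponds to an effect that SuSiE would not report, as happened for NHSCS.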
We examined the impact of including BMI as a covariate in each cohort. Although BMI was associated with DMA% in each cohort, the associations for the lead SNPs identified in each cohort were largely unchanged. In the cohorts with available measurements of arsenic in drinking water (HEALS and NHSCS), water arsenic exposure was inversely associated with DMA% (S8 Fig), as reported previously [14,32]. However, we detected no interaction reaching P < 0.05 between any of the lead SNPs and exposure level measured in drinking water.
SHS and NHSCS share a lead SNP (rs191177668), and this SNP is in LD (r² = 0.5–1.0, depending on the reference population) with HEALS lead SNP rs145537350 (Fig 3), suggesting a shared causal variant across cohorts. In HEALS, the shared SHS/NHSCS lead SNP rs191177668 has a p-value of 8×10⁻⁸ and a beta estimate of -0.062 (isolated primary signal). After controlling for the primary signal, this SNP has p-values of 0.6 and 0.2 in the HEALS secondary and tertiary analyses, respectively. We also observed two signals in HEALS and one in SHS that appear distinct (population-specific), with minimal LD among the lead SNPs (Fig 3). Some lead SNPs differ substantially in MAF across populations (Table 3). For example, SHS lead SNP rs145537350 is common in SHS (MAF = 0.14) but much rarer in HEALS (0.007) and NHSCS (0.005). We further examined MAF differences for the lead SNPs across SHS centers (S1 Table) and found generally consistent MAFs across centers, with the exception of rs4919688, which is more common in the Arizona center (MAF = 0.482) than in the Dakotas and Oklahoma centers (0.252 and 0.221, respectively).
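The MAF comparisons across cohorts and SHS centers rest on a simple calculation: the alternate-allele frequency is the summed dosage divided by twice the number of individuals, and the MAF is the smaller of that frequency and its complement. A minimal sketch, with hypothetical genotypes and center labels (not study data):

```python
def minor_allele_frequency(dosages):
    """MAF from alternate-allele dosages (0/1/2, or fractional if imputed)."""
    n = len(dosages)
    if n == 0:
        raise ValueError("no genotypes supplied")
    alt = sum(dosages) / (2 * n)
    return min(alt, 1 - alt)


def maf_by_group(dosages, groups):
    """MAF computed separately within each group (e.g., study center)."""
    by_group = {}
    for d, g in zip(dosages, groups):
        by_group.setdefault(g, []).append(d)
    return {g: minor_allele_frequency(ds) for g, ds in by_group.items()}


# Hypothetical genotypes for two centers of 100 individuals each
dosages = [0] * 70 + [1] * 24 + [2] * 6 + [2] * 90 + [1] * 10
centers = ["center_A"] * 100 + ["center_B"] * 100
print(maf_by_group(dosages, centers))
```

Note that in the second hypothetical center the alternate allele is the major allele, so its MAF (0.05) refers to the other allele; checking which allele is minor in each group matters when comparing frequencies across populations.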
P-values are from linear models adjusted for age, sex, and population structure (in HEALS and SHS). The top panel for each population shows the primary association (adjusted for lead SNPs from all non-primary signals, if present). Additional panels show p-values for secondary (HEALS and SHS) and tertiary (HEALS) association signals (adjusted for lead SNPs from all other signals). Three signals were identified for HEALS, two for SHS, and one for NHSCS. LD estimates are based on 1000 Genomes reference populations (SAS for HEALS, MXL/PUR/CLM/PEL for SHS, and EUR for NHSCS).
Conditional association analysis of TOPMed-imputed data identified association signals for urinary DMA% in the 10q24.32 region for all three cohorts (Fig 2). We identified three independent signals in HEALS (lead SNPs rs145537350, rs12573221, and rs4919687), two in SHS (rs191177668 and rs4919688), and one in NHSCS (rs191177668). The per-allele association estimates for these lead SNPs varied in magnitude from ~2% to ~12% (in DMA% units) (Table 2). These results are similar to those observed for the non-imputed data (S3 Fig). Analysis of MMA% and iAs% across all three cohorts produced results generally consistent with those observed for DMA%, with similar association signals detected for all three arsenic species (S4, S5, S6, and S7 Figs). Alleles associated with increased DMA% tended to be associated with decreased MMA% and iAs%.
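The text does not spell out how independent signals were selected; forward stepwise conditioning is one common approach, sketched here under that assumption. At each round, every SNP is tested in a model that also includes the covariates and all previously selected lead SNPs, and the best SNP is added if it passes the threshold. The sketch uses OLS with a normal approximation for p-values, assumes no candidate SNP is perfectly collinear with an already-selected lead, and runs on simulated data; function names are ours.

```python
import math
import numpy as np


def snp_pvalue(y, g, base):
    """Two-sided p-value for SNP dosage g in the OLS model y ~ base + g.

    Uses a normal approximation to the t statistic (fine for large n).
    """
    X = np.column_stack([base, g])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[-1, -1])
    return math.erfc(abs(coef[-1] / se) / math.sqrt(2))


def stepwise_conditional(y, G, covariates, alpha=5e-8, max_signals=5):
    """Forward selection of independent signals, conditioning on earlier leads.

    Returns lead-SNP column indices of G in selection order.
    """
    leads = []
    for _ in range(max_signals):
        base = np.column_stack([np.ones(len(y)), covariates]
                               + [G[:, j] for j in leads])
        pvals = [1.0 if j in leads else snp_pvalue(y, G[:, j], base)
                 for j in range(G.shape[1])]
        best = int(np.argmin(pvals))
        if pvals[best] >= alpha:
            break
        leads.append(best)
    return leads


# Simulated data with two independent signals (columns 2 and 9)
rng = np.random.default_rng(3)
n = 1000
G = rng.binomial(2, 0.3, size=(n, 15)).astype(float)
age = rng.uniform(20, 70, n)
y = 1.0 * G[:, 2] + 0.6 * G[:, 9] - 0.02 * age + rng.normal(0, 1, n)
print(stepwise_conditional(y, G, age))
```

Once all leads are found, each signal can be displayed conditional on the other leads, matching the figure legend's description of primary, secondary, and tertiary panels.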
Total urinary arsenic varied substantially across cohorts, with higher concentrations in HEALS compared to SHS and NHSCS (Fig 1). This difference was also reflected in measures of arsenic in participants’ drinking water (S1 Fig). DMA% was highest in NHSCS and lowest in HEALS (Table 1 and S2 Fig). MMA% was highest in SHS and lowest in NHSCS, while iAs% was highest in HEALS and lowest in NHSCS (Table 1).
Discussion
The relationships among AS3MT genotype, gene expression in the 10q24.32 region, and arsenic metabolism are well established; however, questions remain regarding the precise causal variants driving these associations, potential differences in associations across ancestry groups, and the specific genes in the region influenced by 10q24.32 genotypes across different tissue types. In this project, we applied fine-mapping approaches to sequencing-based genotype data in the 10q24.32 AME-associated region to identify candidate causal variants across three cohorts exposed to varying levels of arsenic in their drinking water. Fine-mapping analyses revealed that there are likely multiple causal variants in the 10q24.32 region affecting AME (represented by DMA%), with at least one causal variant likely shared across populations. In silico functional annotation and QTL co-localization further revealed that several of our candidate causal variants overlap regulatory features and affect expression of AS3MT and local DNA methylation.
Under the assumption of shared biological mechanisms and shared causal variants across populations, cross-population association analyses can narrow the list of potential causal variants in a region by identifying SNPs showing consistent evidence of association across all examined populations [35–37]. However, this did not happen in our study: meta-analysis of the shared association signal in HEALS and SHS produced a confidence set of 8 variants, larger than the corresponding confidence sets based on HEALS alone (4 and 7 variants). This is likely due to the broader signal observed in SHS, which yields a confidence set with many more SNPs than those observed for HEALS (the result of more extensive LD among nearby variants in SHS). This extensive LD in SHS makes discriminating between potential causal variants more challenging in the meta-analysis context.
A novel finding of our analysis is the identification of associations that appear to be population-specific in both HEALS and SHS, which require further examination. Failure to replicate genetic associations across populations has been observed in many studies and has driven the increased emphasis on diversity in GWAS [38–43]. One explanation for a lack of replication is differences in allele frequencies, which can reduce power in populations where the variant has a low MAF [34]. For example, one of the HEALS confidence sets (lead SNP rs12573221, confidence set 2) has a MAF range of 6.1–16.5% in HEALS and 0.1–4.4% in SHS; thus, these SNPs are likely too rare in SHS to be examined at the present sample sizes. Similarly, SHS signal 2 (lead SNP rs4919688, confidence set 2) has a MAF of 26.8% in SHS and 2.6% in HEALS, which may explain the lack of replication of this SHS signal in HEALS. Sample sizes also differed across our three populations. Larger sample sizes, as in HEALS, increase study power and decrease the standard errors of effect estimates, improving our ability to identify association signals [44]. It is possible that we simply lacked the power to identify all association signals in all populations.
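The joint role of MAF and sample size can be made concrete with a standard power approximation for an additive SNP effect in a linear model: the test statistic has non-centrality |β|·√(n·2p(1−p))/σ, where 2p(1−p) is the variance of a Hardy-Weinberg genotype dosage. In the sketch below, the sample sizes and approximate MAFs come from the text (n = 2,357 and 557; rs12573221 MAF ~13% vs ~0.2%), but the effect size and residual SD are hypothetical placeholders, not estimates from this study; the function name is ours.

```python
from statistics import NormalDist


def additive_power(n, maf, beta, sigma, alpha=5e-8):
    """Approximate power to detect an additive SNP effect in a linear model.

    Non-centrality = |beta| * sqrt(n * 2p(1-p)) / sigma, with a normal
    approximation to the two-sided test at significance level alpha.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    ncp = abs(beta) * (n * 2 * maf * (1 - maf)) ** 0.5 / sigma
    return std_normal.cdf(ncp - z_crit) + std_normal.cdf(-ncp - z_crit)


# beta (DMA% units) and sigma (SD of DMA%) are hypothetical placeholders.
for label, n, maf in [("HEALS-sized, MAF 13%", 2357, 0.13),
                      ("SHS-sized, MAF 13%", 557, 0.13),
                      ("SHS-sized, MAF 0.2%", 557, 0.002)]:
    print(label, round(additive_power(n, maf, beta=5.0, sigma=10.0), 3))
```

Under these placeholder values, the same effect is almost certainly detected in a HEALS-sized sample at 13% MAF, detected only about half the time in an SHS-sized sample, and essentially never at 0.2% MAF, which mirrors the replication argument above.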
Beyond MAF, differences in LD likely contribute to the differences in observed signals across populations. HEALS signal 3 is represented by rs4919687, which has a MAF > 0.1 in all populations. However, while this variant represents an independent candidate causal variant and confidence set (set 3) in HEALS, it co-occurs with SNPs from HEALS confidence set 1 in SHS confidence set 1 (derived from summary statistics). Furthermore, we find the HEALS tertiary lead SNP (in HEALS confidence set 3) among the top 30 SNPs of the SHS primary association signal. Thus, a single confidence set in SHS (set 1) may capture two confidence sets in HEALS. Differences in LD patterns across the populations are a possible explanation for these observations [35,38,45], causing two distinct signals in HEALS to be indistinguishable in SHS.
In addition to these factors, differences in subject recruitment and inclusion among the studies used for this work may also have contributed to the differences in association signals. NHSCS included individuals both with and without skin cancer. Arsenic exposure and AME are risk factors for skin cancer, so this selection could bias the observed associations [30,46]. Chronic arsenic exposure is also associated with increased health risks, so the long-standing exposure in HEALS may also distinguish it from the other populations and make direct comparison challenging [46]. It is also possible that the observed population-specific signals reflect true differences in genetic effects across populations. Gene-gene or gene-environment interactions could result in different effect size estimates across populations [38,40]. Differences in exposure level across populations may also affect the observed associations [40,41,43]. For instance, attenuation of SNP effects in populations with low exposure may result in low power to detect association [40]. We examined the gene-environment interaction between our lead SNPs and arsenic exposure in HEALS; while arsenic exposure did have an independent effect on DMA%, there was no evidence of effect modification of any of our identified variants by arsenic exposure (S10 Table).
We sought to understand the regulatory mechanisms by which the causal variants affect gene function using co-localization analyses focused on AS3MT, surrounding genes, and DNA methylation features. AS3MT is expressed at detectable levels in most tissues (S10 Fig), allowing us to examine co-localization across a wide range of tissues. Expression is highest in the adrenal gland, potentially due to co-regulation with nearby CYP17A1, which plays a role in steroid hormone synthesis, or to protection against arsenic, which can disrupt endocrine function [47,48]. While co-localization was not detected in adrenal tissue, it was detected in many tissue types with low expression (e.g., subcutaneous adipose). This suggests that our results were not driven by variability in expression across tissues and that low expression levels did not prevent QTL detection.
The liver is the major site of arsenic metabolism, but we observed only suggestive evidence of co-localization in this tissue type (Table 5 and S11 Fig). We detected AS3MT eQTLs in liver (rs4919690, p = 7.15×10⁻⁸), but the relatively small liver sample size may have limited our statistical power to robustly identify co-localization across all sets of priors analyzed.
Despite the ancestry mismatch between our Bangladeshi participants and the GTEx donors, we found compelling evidence of co-localization between our DMA% association signal (represented by rs4919687, HEALS set 3) and a multi-tissue cis-eQTL for AS3MT in 21 tissue types. The minor allele at SNP rs4919687 was associated with decreased DMA% and decreased AS3MT expression in tissues in which co-localization was observed, supporting the hypothesis that decreased expression results in lower amounts of the enzyme and ultimately lower AME. Our co-localization results suggest that HEALS SNP rs4919690 may be the causal variant underlying this signal (posterior inclusion probability, PIP = 0.09), as it is the lead eSNP for the co-localizing AS3MT eQTL in 14 tissue types and for the co-localizing BORCS7 eQTL in 9 tissue types. The repeated appearance of this SNP as a lead SNP in GTEx QTL analyses (with genotyping based on whole-genome sequencing) supports a causal role. Some co-localization was also observed between AS3MT eQTLs and a second association signal in HEALS, suggesting that this signal may also influence arsenic metabolism by regulating AS3MT expression across tissues.
No prior evidence suggests a role for BORCS7 in arsenic metabolism; however, we observe co-localization between a DMA% association signal and cis-eQTLs for BORCS7 in multiple tissues (as observed previously [49]). Expression levels of AS3MT and BORCS7 are correlated in nearly all tissues in which co-localization is observed, suggesting co-regulation by a common causal variant (and potentially other mechanisms) or two causal variants in very strong LD. The correlated expression of the two genes has been noted previously [50]. A mouse strain carrying the human BORCS7/AS3MT locus was created to study arsenic metabolism, because the AS3MT promoter abuts the 3’ UTR of BORCS7 [51]. However, that study did not describe any specific role for BORCS7 in arsenic metabolism. It is likely that the SNPs in this region are pleiotropic, influencing the expression of both AS3MT and the surrounding genes. Furthermore, co-regulation of these genes has been previously reported in multiple tissue types [31,49,50,52].
The co-localization of mQTLs with DMA% association signals provides further support for rs4919690 as a causal variant, as mQTLs for 10q24.32 CpGs co-localize with eQTLs for AS3MT in lung tissue (represented by lead SNP rs4919690). The DMA%-decreasing allele at rs4919690 (A) is associated with increased methylation of cg08650961, a CpG located in the body of the CNNM2 gene outside of a CpG island, and with decreased AS3MT expression in lung tissue.
The population mismatch between our AME study populations (Bangladeshi and American Indian) and the GTEx donors (primarily of European ancestry) likely decreased our power to detect co-localization in both our eQTL and mQTL analyses. Furthermore, the smaller sample size of SHS likely increased the standard errors of association estimates in this population; combined with the population mismatch between SHS and GTEx, this likely reduced our ability to detect co-localization in SHS. Our mQTL analysis sample sizes were also small (fewer than 100 samples for some tissues), and these analyses were restricted to the 9 tissue types with available DNA methylation (DNAm) data, which limited our power to detect mQTLs and co-localization. Additionally, in the populations examined in this study, we are not able to fully assess the contribution of organic sources of arsenic to variability in our AME phenotype, which could bias the observed associations.
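The effect of sample size on precision can be made explicit with the standard approximation for the standard error of a per-allele effect estimate from linear regression with additive genotype coding, assuming Hardy-Weinberg equilibrium and ignoring covariates. Here σ is the residual SD of the phenotype, n the sample size, and f the minor allele frequency. Assuming (for illustration only) equal σ and f in both cohorts, plugging in the SHS and HEALS sample sizes (557 and 2,357) suggests that SHS estimates carry roughly twice the standard error:

```latex
\operatorname{SE}(\hat\beta) \approx \frac{\sigma}{\sqrt{2\,n\,f(1-f)}}
\qquad\Longrightarrow\qquad
\frac{\operatorname{SE}_{\text{SHS}}}{\operatorname{SE}_{\text{HEALS}}}
\approx \sqrt{\frac{n_{\text{HEALS}}}{n_{\text{SHS}}}}
= \sqrt{\frac{2357}{557}} \approx 2.06
```

The same expression shows why a variant that is rare in one population (small f) is estimated much less precisely there, independently of sample size.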
In this study, we do not assess the association of AME-related SNPs with arsenic toxicity risks. However, we have previously shown that the association signals represented by rs12573221 and rs4919687 are clearly associated with arsenic-induced skin lesion risk in HEALS [53]. Similarly, the novel signal for SHS and NH that we report here (rs191177668), which is in strong LD with, and can serve as a proxy for, the novel signal in HEALS (rs145537350), is also associated with skin lesion risk in the previously described dataset (OR = 1.64, CI: 1.20 to 2.24) [53].
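An odds ratio and its interval can be converted back to the log-odds scale, which is how such intervals are usually constructed for logistic regression. The confidence level is not stated in this excerpt; the sketch assumes a 95% Wald interval, and because the bounds are rounded, the back-calculated standard error and z statistic are only approximate. A quick consistency check is that the point estimate should lie near the geometric mean of the bounds (√(1.2 × 2.24) ≈ 1.64), which it does.

```python
import math
from statistics import NormalDist


def or_ci_to_log_scale(odds_ratio: float, lower: float, upper: float,
                       level: float = 0.95):
    """Back-calculate log(OR), its SE, z and two-sided p from an OR and CI.

    ASSUMES a Wald interval that is symmetric on the log-odds scale.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    log_or = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_or / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return log_or, se, z, p


log_or, se, z, p = or_ci_to_log_scale(1.64, 1.2, 2.24)
print(f"log OR = {log_or:.3f}, SE = {se:.3f}, z = {z:.2f}, p = {p:.4f}")
```

Under these assumptions the interval corresponds to a z statistic of roughly 3.1; this is an arithmetic back-calculation, not a value reported in the study.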
This study builds on previous work in the 10q24.32 region [29,31,49,54,55] in several ways. Previous studies have found signs of positive selection near the AS3MT gene associated with efficient arsenic metabolism [56]. They found selection signals in multiple populations, including several from South America and East Asia, and found that several of the SNPs associated with the protective haplotype were also associated with MMA%. None of the identified SNPs showed a significant association with DMA%, although in the context of our study one of them was in LD with the secondary lead SNP in SHS.
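Because LD patterns and allele frequencies differ between populations (for example, rs12573221 has a MAF of about 13% in Bangladesh but about 0.2% among American Indians), a SNP can tag a signal in one cohort but not in another. The two usual pairwise LD measures, r² and D′, can be computed from phased haplotypes as sketched below; the haplotype counts are hypothetical. The second example illustrates that when two alleles have very different frequencies, D′ can equal 1 while r² stays low, which limits how well one SNP can serve as a proxy for another.

```python
from typing import Sequence, Tuple


def pairwise_ld(haplotypes: Sequence[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (r2, signed D') for two biallelic SNPs from phased haplotypes.

    Each haplotype is (allele at SNP 1, allele at SNP 2), coded 1 for the
    allele of interest (e.g. the minor allele) and 0 otherwise.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n
    p_b = sum(h[1] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b                          # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    # Maximum attainable |D| given the allele frequencies and the sign of D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    return r2, d_prime


# Hypothetical counts: alleles always co-occur at the same frequency.
perfect = [(1, 1)] * 10 + [(0, 0)] * 90
# Allele A (2%) only ever occurs with allele B (13%): D' = 1 but low r2.
skewed = [(1, 1)] * 2 + [(0, 1)] * 11 + [(0, 0)] * 87
print(pairwise_ld(perfect))
print(pairwise_ld(skewed))
```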
We leverage data from multiple arsenic-exposed cohorts with diverse ancestry in a single study, allowing us to consider both shared and population-specific effects of inherited genetic variation on AME. Additionally, our targeted sequencing data enabled us to identify a novel, independent association between 10q24.32 variation and DMA% in HEALS that we were unable to detect in our previous array-based work [28]. We further increased our sample size for cis-eQTL analyses using the latest data from GTEx and incorporated both expression and methylation data into our examination of the SNPs' mechanisms of action. Finally, we provide evidence that there are likely multiple causal SNPs within the 10q24.32 region associated with AME. Together, these features allowed us to provide evidence regarding the potential causal variants and mechanisms underlying the established association between the 10q24.32 region and AME. Future studies can build on these findings, potentially establishing cellular or animal models with perturbations of potential causal sites in order to directly assess the impact of specific alleles on arsenic metabolism.
---
[1] Url:
https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1010588
Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.
via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/plosone/