(C) PLOS One [1]. This unaltered content originally appeared in journals.plosone.org.
Licensed under Creative Commons Attribution (CC BY) license.
url:
https://journals.plos.org/plosone/s/licenses-and-copyright
------------
Genome-wide methylation data improves dissection of the effect of smoking on body mass index
Authors: Carmen Amador (MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, United Kingdom); Yanni Zeng (Faculty of Forensic Medicine, Zhongshan School of Medicine)
Date: 2021-12
Variation in obesity-related traits has a genetic basis with heritabilities between 40 and 70%. While the global obesity pandemic is usually associated with environmental changes related to lifestyle and socioeconomic changes, most genetic studies do not include all relevant environmental covariates, so the genetic contribution to variation in obesity-related traits cannot be accurately assessed. Some studies have described interactions between a few individual genes linked to obesity and environmental variables, but there is no agreement on their total contribution to differences between individuals. Here we compared self-reported smoking data and a methylation-based proxy to explore the effect of smoking and genome-by-smoking interactions on obesity-related traits from a genome-wide perspective and to estimate the amount of variance they explain. Our results indicate that exploiting omic measures can improve models for complex traits such as obesity and can be used as a substitute for, or jointly with, environmental records to better understand causes of disease.
Most diseases and health-related outcomes are influenced by genetic and environmental variation. Hundreds of genetic variants associated with obesity-related traits, like body mass index (BMI), have been previously identified, as well as lifestyles contributing to obesity risk. Furthermore, certain combinations of genetic variants and lifestyles may change the risk of obesity more than expected from their individual effects. One obstacle to further research is the difficulty in measuring relevant environmental impacts on individuals. Here, we studied how genetics (genome-wide markers) and tobacco smoking (self-reported) affect BMI. We also used DNA methylation, a blood-based biomarker, as a proxy for self-reported information to assess tobacco usage. We incorporated the effect of interactions between genetics and self-reported smoking or methylation. We estimated that genetics accounted for 50% of the variation in BMI. Self-reported smoking status contributed only 2% of BMI variation, increasing to 22% when estimated using DNA methylation. Interactions between genes and smoking contributed an extra 10%. This work highlights the potential of using biomarkers to proxy lifestyle measures and expand our knowledge on disease, and suggests that the environment may have long-term effects on our health through its impact on the methylation of disease-associated loci.
Funding: The authors want to acknowledge funding from the Medical Research Council UK (MRC, https://mrc.ukri.org/funding/): MC_UU_00007/10, MC_PC_U127592696, MC_PC_U127561128; the BBSRC (https://bbsrc.ukri.org/funding/): BBS/E/D/30002275, BBS/E/D/30002276, and a Wellcome Trust (https://wellcome.org/grant-funding) Investigator Award to AMM: 220857/Z/20/Z. YZ was supported by the General Program of National Natural Science Foundation of China (81971270) and Sun Yat-sen University Young Teacher Key Cultivate Project. Generation Scotland received core support from the Chief Scientist Office of the Scottish Government Health Directorates [CZD/16/6] and the Scottish Funding Council [HR03006] and is currently supported by the Wellcome Trust [216767/Z/19/Z]. Genotyping of the GS:SFHS samples was funded by the Medical Research Council UK and the Wellcome Trust (Wellcome Trust Strategic Award "STratifying Resilience and Depression Longitudinally" (STRADL) Reference 104036/Z/14/Z). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Data Availability: Datasets supporting the conclusions of this article are included within the article and its supporting information tables. Generation Scotland data are available from the MRC IGC Institutional Data Access / Ethics Committee (https://www.ed.ac.uk/generation-scotland/for-researchers/access) for researchers who meet the criteria for access to confidential data. The managed access process ensures that approval is granted only to research that falls within the terms of participant consent, which does not allow participant information to be made publicly available. UK Biobank data are available to researchers in academic, commercial, and charitable settings anywhere in the world by applying at https://www.ukbiobank.ac.uk/register-apply/.
Here, we aim to estimate the contribution to obesity variation of smoking and its interaction with genetic variation in two different cohorts, using self-reported measures of smoking and a methylomic proxy for smoking. Thus, we measured the contribution of smoking-associated methylation signatures and genome-by-methylation interactions to trait variation. We performed analyses in both sexes jointly and independently, also including genome-by-smoking-by-sex interactions, and we showed that omics data can be exploited as proxies for environmental exposures to improve our understanding of complex trait architecture. We observed that, using an appropriate set of CpG sites, methylation can be used to model trait variation associated with smoking and with genome-by-smoking interactions, suggesting potential applications for better prediction and prognosis of complex disease and for expanding these modelling approaches to other environments and traits.
In this study, we aim to estimate the contribution of smoking and its interaction with genetic variation to obesity variation, using self-reported measures of smoking and a methylomic proxy of smoking exposure. We hypothesised that use of a proxy, rather than self-reported smoking, and fitting genome-by-smoking interactions would lead to a more accurate model. DNA methylation is an epigenetic mark that can be affected by genetics and environmental exposures [17–22]. Variation in methylation is correlated with gene expression, plays a crucial role in development and in maintaining genomic stability [23–25], and has been associated with disease [26–30] and aging [31,32]. Epigenome-wide association analyses (EWAS) have identified multiple associations between DNA methylation levels at specific genomic locations and smoking [18,33–35]. These so-called signatures of smoking in the epigenome can help discriminate the smoking status of the individuals in a cohort [19] and, if sufficiently accurate, could improve on self-reported measures by adding information that they do not capture accurately, such as passive smoking or the real quantity of tobacco smoked.
Variation in obesity-related traits such as body mass index (BMI) has a complex basis with heritabilities ranging from 40 to 70%, with the genetic variants detected to date explaining up to 5% of BMI variation [1]. In addition to genetics, studies suggest that the increase in obesity prevalence in recent decades is linked to environmental causes, such as dietary changes and a more sedentary lifestyle [2–5]. The fact that not all relevant environmental effects have been accounted for in genetic studies has potentially reduced GWAS power to detect susceptibility variants. On top of this, several studies suggest that gene-by-environment interactions also play an important role in obesity and other complex traits [2,6–10], and many researchers are focusing on finding interactions between specific genes and certain environments. Genotype-by-age interactions and genotype-by-sex interactions have also been detected for several health-related traits [10–12]. Recently, when performing GWAS on traits like BMI, lipids, and blood pressure, several studies have stratified their samples on the basis of smoking status or have explicitly modelled interactions, leading to identification of new genetic variants associated with those traits [13–15]. Some studies have attempted to quantify the overall contribution of genetic interactions with smoking. Robinson et al. [12] estimated them to explain around 4% of BMI variation in a subset of unrelated UK Biobank samples. In contrast, also in UK Biobank, using a new approach that only requires summary statistics, Shin & Lee [16] estimated the contribution of the interactions to be much smaller: 0.6% of BMI variation.
Fig 4 shows the estimates of the proportion of BMI variance explained by different sources included in the mixed linear models in ~9K individuals in Generation Scotland (GS9K, right panel), including models with methylation and genome-by-methylation interactions, for models with self-reported smoking status fitted as a fixed effect. Results for other traits are displayed in S1 Fig, and full details of the analyses for all traits, including estimates, standard errors and log-likelihood ratio tests, and results for smoking status fitted as a random effect, are shown in S8 Table. Inclusion of the methylation covariance matrix improved the models for all traits and explained 0.7% of the variance for height and between 3 and 5% of the variance for obesity-related traits. After including this smoking-associated methylation component in the model, the variation explained by self-reported smoking status dropped to zero for all traits (S8 Table, Model = GKEM), i.e., smoking-associated methylation absorbed the variance explained by the self-reported variable. When exploring the interactions with self-reported smoking status, the estimates in the subset of individuals with methylation data available (N ~ 9K) are substantially larger than in the whole cohort. For example, for BMI, the size of the genome-by-smoking component increased from 4% (GS18K) to 13% (GS9K); however, due to the large standard errors, these two estimates are not significantly different from each other. Inclusion of the genome-by-methylation interaction component nominally improved the model fit for weight, BMI, and waist circumference, with estimates of the interaction component of over 20% of the trait variance. When the two interaction components (genome-by-smoking and genome-by-methylation) were fitted jointly, the estimates were not significant for either interaction component (or just nominally significant in the case of genome-by-methylation for BMI).
To explore the value of DNA methylation data as a proxy for environmental variation, we modelled similarity between individuals based on their DNA methylation levels at a subset of 62 CpG sites previously associated with smoking [18,33] and which had heritabilities lower than 40%, aiming to target methylation variation that is predominantly capturing environmental variation (for details see Methods). To show that our models can provide accurate estimates we performed a series of simulations. Details and results for those are shown in S1 Text.
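As an illustration of what "similarity between individuals based on their DNA methylation levels" can look like in practice, the sketch below builds a methylation similarity matrix by analogy with a genomic relationship matrix: each CpG site is standardised across individuals and the cross-products are averaged over sites. This construction is an assumption made for illustration; the paper's Methods define the matrix actually used. The function name `methylation_similarity` and the toy values are hypothetical.

```python
from statistics import mean, pstdev


def methylation_similarity(meth):
    """Return an n x n similarity matrix from methylation levels.

    meth: list of n individuals, each a list of m methylation values
    (one per CpG site). Assumed construction: standardise each site to
    mean 0 and variance 1, then average cross-products over the m sites.
    """
    n, m = len(meth), len(meth[0])
    z_by_site = []
    for j in range(m):
        site = [row[j] for row in meth]
        mu, sd = mean(site), pstdev(site)
        if sd == 0:
            raise ValueError(f"CpG site {j} has no variation")
        z_by_site.append([(v - mu) / sd for v in site])
    # M[i][k] = (1/m) * sum over sites of z_i * z_k
    return [
        [sum(z[i] * z[k] for z in z_by_site) / m for k in range(n)]
        for i in range(n)
    ]


# Toy data: 4 individuals x 3 CpG sites (hypothetical values).
toy = [
    [0.1, 0.8, 0.5],
    [0.2, 0.7, 0.4],
    [0.9, 0.1, 0.6],
    [0.8, 0.2, 0.3],
]
M = methylation_similarity(toy)
```

With this scaling the diagonal averages to 1, so the variance attached to the matrix in a mixed model is on the same scale as the other components. Individuals 0 and 1 in the toy data have similar profiles and therefore a positive off-diagonal entry.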
Fig 3 shows the proportion of BMI variance explained by the genome-by-smoking interactions in each of the cohorts and sub-cohorts (Generation Scotland, four UK Biobank groups and the UK Biobank meta-analysis). Results for other traits are displayed in S2 Fig, and full details of the analyses for all traits, including estimates, standard errors and log-likelihood ratio tests, are shown in S4, S5 and S6 Tables. Results for the genome-by-smoking-by-sex interactions are shown in S3 Fig and S7 Table.
We sought to replicate the results observed in Generation Scotland with data from the UK Biobank cohort (UKB). Analyses were run in four sub-cohorts for computational reasons (G1, G2, G3 and G4, grouping individuals in geographically close recruitment centres; for more information see Methods and S3 Table), with the two sexes considered jointly and separately in three different analyses (the sample size of these groups permitted estimates to be obtained with the two sexes separately). Individual sub-cohort analyses were meta-analysed.
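The text does not state here how the sub-cohort estimates were combined. A common choice is a fixed-effect, inverse-variance weighted meta-analysis, sketched below under that assumption. The function name and the example estimates and standard errors are hypothetical and are not taken from the paper's tables.

```python
import math


def inverse_variance_meta(estimates, std_errors):
    """Fixed-effect inverse-variance weighted meta-analysis (assumed method).

    Returns the pooled estimate, its standard error, and a two-sided
    P value from a normal approximation.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return pooled, pooled_se, p_value


# Hypothetical interaction estimates from four sub-cohorts (G1..G4).
est = [0.06, 0.05, 0.01, 0.07]
se = [0.02, 0.02, 0.02, 0.02]
pooled, pooled_se, p_value = inverse_variance_meta(est, se)
```

With equal standard errors the pooled estimate is the plain mean of the sub-cohort estimates, and its standard error shrinks by a factor of the square root of the number of sub-cohorts.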
The heritability estimates of all analysed traits (i.e., proportion of the variance captured by G and K matrices together) are consistent with previous estimates in the same cohort [36]. The estimated contributions of smoking status (and the other covariates) to trait variation ranged between 0.35% (for height, assessed as a negative control, as we do not expect to find the same type of effects as with obesity-related measures) and 1.2% (for HDL cholesterol) and are shown in S2 Table. When included as a random effect, smoking explained between 0.1% (for height) and 2.5% (for HDL cholesterol) of trait variation (S1 Table). Our models identified significant genome-by-smoking interactions for weight, BMI, fat percentage and HDL cholesterol (with log-likelihood ratio tests showing that the models including the interaction were significantly better), explaining between 4 and 8% of trait variation (Table 1), similar to the values of Robinson et al. [12] for BMI. When the interactions included sex (genome-by-smoking-by-sex interactions), the component was significant for all traits and explained between 2 and 9% of the variance (S2 Table).
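The log-likelihood ratio tests mentioned above compare a model with the interaction component against the same model without it. A minimal sketch follows, assuming the convention often used for a single variance component tested at the boundary (variance = 0): the test statistic is referred to a 50:50 mixture of a point mass at zero and a chi-square with 1 degree of freedom. That convention, the function name, and the log-likelihood values are assumptions for illustration; the source does not state the exact procedure.

```python
import math


def lrt_pvalue_variance_component(loglik_full, loglik_reduced):
    """P value for adding one variance component (assumed mixture null).

    loglik_full: maximised log-likelihood with the extra component.
    loglik_reduced: maximised log-likelihood without it.
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    if stat <= 0.0:
        return 1.0
    # Upper tail of a chi-square with 1 df is erfc(sqrt(x / 2));
    # halve it for the 50:50 mixture with a point mass at zero.
    return 0.5 * math.erfc(math.sqrt(stat / 2.0))


# Hypothetical log-likelihoods for models with and without GxSmk.
p = lrt_pvalue_variance_component(-1000.0, -1001.92073)
```

In this example the statistic is about 3.84, the usual 5% critical value for 1 degree of freedom, so the mixture P value is about 0.025.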
Results of GKGxSmk model for all traits in GS18K and meta-analysis of the recruitment centre-based sub-cohorts in UK Biobank. The table shows, for each trait, the proportion of the phenotypic variance explained (Var), its standard error (SE), the log-likelihood ratio test P value (LRT P, only for the interaction), the meta-analysis P value (P), for each of the components in the model: Genetic (G), Kinship (K) and Genome-by-Smoking interaction (GxSmk). Highlighted P values indicate nominally significant results for the GxSmk component.
The aim of this work was to explore the influence of smoking and genome-by-smoking interactions on trait variation, modelling them from self-reported information and using DNA methylation, in both sexes jointly and separately. We used a variance component approach to fit a linear mixed model including a set of covariance matrices representing: two genetic effects (G: common SNP-associated genetic effects, and K: pedigree-associated genetic effects not captured by the genotyped markers at a population level; including matrix K in the analyses allows related individuals in the sample to be used), environmental effects reflecting the impact of smoking (modelled as fixed or random effects), and genome-by-smoking effects (GxSmk) representing sharing of both genetics (G) and environment (smoking, Smk). We estimated the proportion of variation that each component explained for seven obesity-related measures: weight, body mass index (BMI), waist circumference (waist), hip circumference (hips), waist-to-hip ratio (WHR), fat percentage (fat%), and HDL cholesterol (HDL), as well as for height, which served as a negative control. We defined the environment using either self-reported questionnaire data or its associated methylation signature as a proxy. A summary of the experimental design used in this study is shown in Fig 1. For more detailed information, see Methods.
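The variance component model described in this paragraph can be written compactly. The block below is a sketch in generic mixed-model notation and not the authors' exact specification: the symbols $\mathbf{y}$, $\mathbf{X}$, $\boldsymbol{\beta}$ and the $\sigma^2$ terms are introduced here for illustration, smoking is shown as a fixed effect inside $\mathbf{X}\boldsymbol{\beta}$, and writing the interaction covariance as the element-wise product of G with a smoking-sharing indicator matrix $\mathbf{S}$ is one common construction that the Methods may define differently.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \mathbf{g} + \mathbf{k}
             + \mathbf{u}_{GxSmk} + \boldsymbol{\varepsilon}, \\
\mathbf{g} &\sim N(\mathbf{0},\, \mathbf{G}\,\sigma^2_{G}), \qquad
\mathbf{k} \sim N(\mathbf{0},\, \mathbf{K}\,\sigma^2_{K}), \\
\mathbf{u}_{GxSmk} &\sim N(\mathbf{0},\, (\mathbf{G}\circ\mathbf{S})\,\sigma^2_{GxSmk}), \qquad
\boldsymbol{\varepsilon} \sim N(\mathbf{0},\, \mathbf{I}\,\sigma^2_{e}), \\
S_{ij} &=
\begin{cases}
1 & \text{if individuals } i \text{ and } j \text{ share the same smoking status (assumed)},\\
0 & \text{otherwise},
\end{cases} \\
\sigma^2_{P} &= \sigma^2_{G} + \sigma^2_{K} + \sigma^2_{GxSmk} + \sigma^2_{e}, \qquad
h^2 = \frac{\sigma^2_{G} + \sigma^2_{K}}{\sigma^2_{P}}, \qquad
\text{Var}_{GxSmk} = \frac{\sigma^2_{GxSmk}}{\sigma^2_{P}}.
\end{aligned}
```

Under this notation, the heritability is the share of variance captured by G and K together, and the proportion attributed to the interaction is its variance component divided by the total. The sum for $\sigma^2_{P}$ assumes each covariance matrix is scaled so that its diagonal averages to about 1.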
Discussion
Most complex diseases have moderate heritabilities, with various environmental sources of variation, for example, lifestyle and socioeconomic differences between individuals, also contributing to disease risk [5]. These diseases, particularly obesity, pose major challenges for public health and are associated with heavy economic burdens [3,4,37]. To prevent the problems resulting from complex diseases, effective personalised approaches that help individuals to reach and maintain a healthy lifestyle are required. To achieve that aim, knowledge of environmental effects and gene-by-environment interactions (GxE, i.e., the differential effects of an environmental exposure on a trait in individuals with different genotypes [38]) is required. This is a challenge, particularly for environmental factors that are not easy to measure or that are measured with considerable error. It has previously been assumed that GxE effects contribute to variation in obesity-related traits [6,8], but their total contribution to trait variation was not known. Previous analyses exploring GxE in obesity, as well as in other traits, took advantage of particular individual genetic variants with known effects, or constructed polygenic scores combining several genetic variants that reflect the genetic risk of individuals [39,40]. Here we analysed the contribution of interactions between the genome (as a whole) and smoking, using both self-reported measures of smoking and methylation data as a proxy for smoking.
Our estimates of the effects of genome-by-smoking interactions in obesity-related traits are larger than those estimated in Shin and Lee [16] but in line with Robinson et al. [12] for BMI. However, our analyses indicate that the magnitude is substantially different in the two sexes, with interactions playing a bigger role in males for most traits studied (weight, waist, hips, fat%). Joint analysis of males and females provides less accurate estimates, suggesting that splitting the sexes or modelling the interactions with sex is a more sensible way of analysing the data. The estimates of the variance explained by the interaction components obtained from the genome-by-methylation analyses were large, but so were their standard errors. These results, despite not being significant after multiple testing correction, are potentially interesting and should be investigated further. Some studies have suggested that there is potential confounding between interaction and covariance effects in linear mixed models. The CpG sites used to model the methylation similarity between individuals were previously corrected for genomic effects (see Methods), removing potential covariance between the genetic and methylation effects [41,42].
We estimated that the impact of genome-by-smoking interactions ranges from 5 to 10% of variation in the studied traits, with the exception of height, which we used as a negative control. Our results suggest a larger interaction component in traits associated with weight (BMI, weight, waist, hips) than in those more related to adiposity (waist-to-hip ratio, fat percentage). The biological interpretation of these interactions is that some genes contributing to obesity differences between individuals have different effects depending on smoking status. This could be mediated in several ways, for example, via genetic variants that affect both obesity and smoking. Some metabolic factors associated with food intake, such as leptin, are suspected to play a role in smoking behaviours, and the rewarding effects of food and nicotine are partly mediated by common neurobiological pathways [43]. For example, if these common genetic architectures balance the two behaviours (i.e., more tobacco consumption leading to eating less [43]), the genetic effects on obesity-related traits will differ depending on smoking status. The interactions could also be driven by gene-by-gene interactions (GxG), i.e., genetic variants affecting obesity being modulated by smoking-associated genetic variants. Under this scenario, smoking status would be capturing smoking-associated variants, and the genome-by-smoking interaction would represent GxG instead of GxE. However, given the relatively small heritability of tobacco smoking (SNP heritability ~18% [44]), it is unlikely that all the variation we detected is driven by GxG.
One of the sub-groups of UK Biobank (G3) showed consistently non-significant estimates of the interactions for all traits. The different behaviour of this sub-cohort is not driven by characteristics like the proportion of smokers (S3 Table), or by its genetic stratification. Without any other evidence we cannot attribute these systematically lower estimates to anything but chance.
When we estimated the effect of smoking using the methylomic proxy (62 CpG sites associated with smoking from two independent studies [18,33]), the smoking-associated variance increased substantially for all traits (from 2% to 6% for BMI). The methylation component captured the same variance as the self-reported component and some extra variation (S8B Table). This increase in the variation captured could be due to a better ability to distinguish between different levels of smoking (e.g., the self-reported status does not include the amount of tobacco smoked, while methylation might capture this information better). These smoking-associated CpG sites could also be picking up variation from other environmental sources that are not exclusively driven by smoking but are correlated with it, such as alcohol intake. When checking the literature for associations between the 62 CpG sites and other environmental measures (S9 Table), we found that 20 of these CpGs have previously been associated with age, 15 with alcohol intake or alcohol dependence, 11 with educational attainment, 10 with different types of cancer, and a few with other diseases [45,46]. Unlike for smoking, for most of these associations with other traits it is unclear whether they are causal or whether they could equally be driven by smoking (e.g., alcohol consumption is associated with smoking and so may be picking up a smoking signal).
The fact that variation in obesity can be explained by CpG sites associated with smoking does not imply a causal effect of smoking or methylation on obesity. Methylation is affected by both genetic and environmental effects. Here we selected a subset of CpG sites with moderate to small heritability (lower than 40%, S9 Table) and we modelled them jointly with a genomic similarity matrix, making it unlikely that the variance picked up by the methylation matrix is genetic in nature. While most changes in methylation at these CpG sites are thought to be causally driven by smoking [18], associations between methylation and other complex traits, such as BMI, are less well characterised and most likely reflect reverse causation [47] (i.e., BMI affecting methylation). However, since our aim was to use methylation as a proxy for the environment, causality does not affect the conclusions of the study. It is nevertheless important to note the variable nature of methylation data, which, unlike an individual's genetics, change over the life course, making methylation measured far back in time less relevant in a prediction framework [48]. Although this approach should be useful in other populations, a relevant set of CpG sites should be selected to reflect associations relevant to the demographic and ethnic background of each population [49].
To conclude, we showed that methylation data can be used as a proxy to assess smoking contributions to complex trait variation. We used DNA methylation levels at CpG sites associated with smoking as a proxy for smoking status to assess the contribution of smoking to variation in obesity-related traits. This principle could be extended to take advantage of the wealth of uncovered associations between various omics and environmental exposures of interest, particularly those that are difficult to measure. In humans, relevant interactions could be investigated by exploiting the links between methylation and alcohol intake, metabolomics and diets, the gut microbiome and diets, etc., and, expanding to other species, between the gut microbiome and greenhouse emissions in cattle. This could help expand our knowledge of their contribution to complex phenotypes and, potentially, help us understand the underlying biology and improve prediction and prognosis.
[END]
[1] Url:
https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1009750
(C) Plos One. "Accelerating the publication of peer-reviewed science."
Licensed under Creative Commons Attribution (CC BY 4.0)
URL:
https://creativecommons.org/licenses/by/4.0/