(C) PLOS One
This story was originally published by PLOS One and is unaltered.
. . . . . . . . . .



Using epigenomics to understand cellular responses to environmental influences in diseases [1]

['Julia J. Wattacheril', 'Department Of Medicine', 'Center For Liver Disease', 'Transplantation', 'Columbia University Irving Medical Center', 'New York Presbyterian Hospital', 'New York', 'United States Of America', 'Srilakshmi Raj', 'Division Of Genomics']

Date: 2023-02

It is a generally accepted model that environmental influences can exert their effects, at least in part, by changing the molecular regulators of transcription that are described as epigenetic. As there is biochemical evidence that some epigenetic regulators of transcription can maintain their states long term and through cell division, an epigenetic model encompasses the idea of maintenance of the effect of an exposure long after it is no longer present. The evidence supporting this model is mostly from the observation of alterations of molecular regulators of transcription following exposures. With the understanding that the interpretation of these associations is more complex than originally recognised, this model may be oversimplistic; therefore, adopting novel perspectives and experimental approaches when examining how environmental exposures are linked to phenotypes may prove worthwhile. In this review, we have chosen to use the example of nonalcoholic fatty liver disease (NAFLD), a common, complex human disease with strong environmental and genetic influences. We describe how epigenomic approaches combined with emerging functional genetic and single-cell genomic techniques are poised to generate new insights into the pathogenesis of environmentally influenced human disease phenotypes exemplified by NAFLD.

It is now timely to rethink how we use molecular genomic assays to understand how the environment influences cells, and how this influence can vary because of differences in the DNA sequence between individuals. NAFLD is a useful focus, representing a disease of major public health importance with strong environmental influences, multiple extrahepatic manifestations, and variable interindividual susceptibility to disease development and progression. For example, the fibrosis that is part of more advanced NAFLD is influenced by glucose as an environmental agent and represents a major target for intervention to prevent the morbidity and mortality of end-stage liver disease. The goal of this review is to prompt innovative ways of thinking about how to gain therapeutic insights into environmentally driven diseases such as NAFLD, using novel molecular genomic and cellular approaches.

The constructive way of thinking about the nonepigenetic effects influencing our interpretation of EWAS is that they are not merely impediments to interpretation, but instead offer alternative insights into disease pathophysiology. We will review how a systematic change in either a cell subtype proportion or a genetic variant can lead to a DNA methylation, transcriptional or other changes that we collectively refer to as “molecular genomic” processes, in preference to the ambiguous term “epigenetic.” Such systematic changes reflect cell subtype or DNA variant associations with the exposure or disease and would be valuable to understand as potential contributors to the phenotype. The study of the relationship between genetic and molecular genomic variation is typically described as functional genomics and encompasses the identification of sequence variants that influence molecular genomic processes, including gene expression levels (expression quantitative trait loci (eQTLs)), DNA methylation (meQTLs), and chromatin accessibility (caQTLs). Some functional variants are only revealed following an environmental exposure, uncovering a potential way that individuals can differ in their responses to the same environmental challenges [ 2 ]. Furthermore, now that we are in the era of increasingly sophisticated single-cell genomic assays, we are finding unexpected heterogeneity within canonical cell types, and effects of functional sequence variants that are restricted to subtypes of cells [ 3 ], representing unprecedented insights into how environmental influences act within a tissue.

In this review, we take a fresh look at the evidence for epigenetic mediation of environmental influences. We use nonalcoholic fatty liver disease (NAFLD) including nonalcoholic steatohepatitis (NASH), as our disease focus. NAFLD is an excellent paradigm for a common disease about which much is known in terms of environmental and genetic risk factors. We describe how NAFLD has been studied to date with exploratory EWAS as an example of how the oversimplified interpretation of results may be misleading.

We have, however, become more critical in our interpretation of genome-wide studies of epigenetic mediators, described as epigenome-wide association studies (EWAS). We have assumed that a change in a transcriptional regulator like DNA methylation, found when comparing pools of cells, reflects individual cells undergoing reprogramming. This interpretation is now recognised to be too simplistic and ignores other ways the same outcome can occur. Evidence linking environmental exposures with epigenetic modifications needs to be reassessed as part of this updated perspective.

Many human diseases have a clear environmental contribution. For decades, it has been assumed that the environment can influence the regulation of gene expression through “epigenetic” mechanisms, which can be interpreted as meaning the molecular regulators of transcription, such as DNA methylation and chromatin states. The extracellular environmental cue acting to modify epigenetic states has even been given its own name, the “epigenator” [ 1 ]. Evidence supporting this model has mostly come from findings of altered epigenetic patterns following environmental exposures. It is a widely accepted model that epigenetic alterations are the primary mediators of environmental influences, potentially propagating these effects long after the cessation of exposure because of the assumed property that epigenetic regulation can self-propagate through cell division.

It would be misleading to give the impression that the only discoveries in this field were being made in mammals or in more economically developed countries. In 1966, Madeleine Charnier, working at the University of Dakar, Senegal, found that sex determination in a reptile species (the rainbow Agama lizard, Agama agama) depended on the temperature at which the embryo develops [ 16 ]. Temperature-dependent sex determination occurs in many amphibians and fish, another example of a phenotype dependent on environment and not determined by DNA differences. Plant biology was revealing numerous striking examples of exposures apparently being “remembered” by the organism, such as vernalisation, the response of plants to the prolonged cold exposure of winter, which was found to induce the silencing of the FLC gene [ 17 ], thus maintaining a memory of the past exposure.

Earlier work also supported DNA methylation responding to environmental influences. In 1984, rats were fed a methyl-deficient diet with the idea that it would promote neoplasia. The diet was associated with the development of decreased DNA methylation in the livers of the animals [ 7 ]. An influential model was the “viable yellow” mouse, animals with a mutation with variable effects on phenotype, ranging from no apparent phenotype at all (pseudoagouti) to a phenotype of yellow fur, obesity, hyperinsulinaemia, and an increased rate of malignancies. In 1998, Wolff showed that feeding pregnant dams a diet designed to increase DNA methylation increased the proportion of pseudoagouti offspring [ 8 ], subsequently shown to be associated with increasing DNA methylation at the mutation site, the insertion of an IAP retroelement upstream of the Nonagouti gene [ 9 , 10 ]. This was a fascinating paradigm, indicating that maternal diet during pregnancy could influence the eventual adult phenotype of her offspring, and represents part of the foundation for the field of the Developmental Origins of Health and Disease (DOHaD) [ 11 ]. The idea that the organism retains a memory of a past exposure involved a different use of the term “epigenetic,” to describe molecular regulatory processes that were heritable from parent to daughter cells [ 12 ]. Intrauterine nutritional deprivation during the Dutch Famine of 1944–1945 was described to be associated with obesity in male offspring in young adulthood, limited to those exposed during the first half of pregnancy, whereas exposure in the last trimester and the first months of life was found to have significantly reduced rates of obesity [ 13 ]. While the epidemiologists performing these studies were concerned that there could be some confounding effects on these findings, such as the greater fertility and fecundity of women from higher socioeconomic classes [ 14 , 15 ], the model emerged of a foetus adapting to a stressful environment in utero and retaining this memory of exposure in a way that is maladaptive postnatally. This finding became a model for epigenetic mediation of environmental exposures long after the environmental exposure was no longer present.

The idea of a regulatory layer of information, acting to influence cellular properties without causing DNA changes, puts molecular epigenetic processes in the spotlight as potential mediators of differences between cells or organisms that share the same DNA sequence. The puzzle of discordance of monozygotic twins for conditions that clearly had a genetic contribution prompted speculation that the environment could be different (nonshared) for each twin and that this difference in exposures caused only one to develop the disease [ 5 ]. Early studies of DNA methylation showed what appeared to be a progressive difference with age between monozygotic twins [ 6 ], an association that suggested a mechanism for this discordance.

The ability of the same DNA to behave in different ways in different cell types appears to indicate a layer of information residing on top of the DNA sequence, for which the back-translation of epi- (above, upon) -genetics (DNA sequence) seemed a useful descriptive term. It has also become commonplace to use the word “epigenetic” to describe events that appear to occur variably in cells and organisms in ways that are not mediated by DNA sequence variability, making “epigenetic” synonymous with “nongenetic” in a further use of the term.

In this review, we define the environment as the influences extrinsic to and influential for the cell, tissue, or organism. Epigenomic studies can be defined as genome-wide mapping of molecular mediators of transcriptional regulation, given that such mediators have been traditionally described to have “epigenetic” properties [ 4 ].

The second problem is that these global regulators of transcription lack sequence specificity and therefore do not have the ability to choose specific loci for selective activity. Once again, a separate mediators has to be active initially, for example, when short chain fatty acid exposure causes specific subgroups of promoters to be selected for increased histone crotonylation [ 28 ]. Later, we make the case that this is likely to be mediated by sequence-specific TFs establishing regulatory patterns in response to environmental challenges, patterns that are passed on to the global regulators of transcription, allowing TFs to have a primary role in the model of epigenomic responsiveness to environmental stimuli.

The first is cellular memory. With the exception of DNA methylation, there are no clear biochemical mechanisms for self-propagation of the specific molecular events in Table 1 to daughter chromatids following cell division. What needs to be invoked to support the idea that these chromatin constituents have long-term consequences is a two-step model, one involving an initial environmental perturbation, followed by the maintenance of a new equilibrium of transcriptional regulators that have the capacity to maintain their patterns long term and through cell division. For example, is it possible that a hyperglycaemic event that increases O-GlcNAcylation acutely then induces EZH2 activity nearby in the genome, causing de novo patterns of formation of H3K27me3 that are then maintained, even after the hyperglycaemia subsides?

These observations indicate that the environment may be able to act through regulators of chromatin and modifiers of DNA to influence gene expression. While these are intriguing findings and provide clear candidates for the mediation of environmental effects on transcriptional regulation, there are two problems that remain to be overcome before invoking any as primary mediators of long-term, stable cellular reprogramming.

We assume that a cell is faithfully able to propagate its transcriptional regulatory organisation to daughter cells as a way of transmitting the memory of a prior environmental exposure. However, there is another factor worth considering—it appears that some histone modifications and histone variants can directly mediate responses to the environment. In Table 1 , we summarise some of these possible environmental influences and molecular regulatory responses.

DNA methylation remains the paradigm for a transcriptional regulator that can maintain a biochemical memory through cell division, as 5-methylcytosine can be propagated from a parent chromatid to both daughter chromatids. There is also evidence for propagation of a repressive histone modification (histone H3 lysine 27 trimethylation, H3K27me3) through a number of generations of the nematode C. elegans [ 18 ]. TFs appear to remain bound to their DNA target sites through DNA replication, a phenomenon described as “mitotic bookmarking” [ 19 ], representing a further way that memory of the molecular organisation of the chromatid in a parent cell can be passed to replicated chromatids in daughter cells.

“Epigenetic” molecular regulators of transcription are numerous. The pattern throughout most of the mammalian genome is one of DNA wrapped around octamers of histones to form a nucleic acid:protein complex called a nucleosome. Most of the cytosines located at CG dinucleotides genome-wide are modified by the addition of a methyl group to form 5-methylcytosine. Where the genome departs from these default patterns typically represents the locations of regulatory elements, where sequence-specific proteins like TFs bind to the DNA. The histones in the flanking nucleosomes acquire patterns of posttranslational modifications (PTMs) that flag these sites as promoters, enhancers, or other regulatory elements. Further distinctive patterns of DNA and histone modifications are found at loci undergoing transcription, while repressive histone modifications or histone variants represent a more macro level of organisation, defining heterochromatic regions of the genome.

It is encouraging that the few genome-wide DNA methylation studies published in recent years suggest that blood DNA methylation measurements provide a stable and accurate assessment of risk when calculating MRS, for a range of phenotypes [ 99 – 102 ]. One recent study including the largest reported MRS cohort (n = 831) indicated that DNA methylation scores outperform baseline risk and GRS, improving imputation of 139 outcomes, compared with just 22 improved through GRS [ 103 ]. Many of these MRS replicated in external cohorts of different ethnicities and showed variable but robust replication across kidney-related traits in diverse populations. This remains an emerging area of research but looks to have room to improve when the accuracy of MRS is enhanced with increased sample sizes.

DNA methylation risk scores (MRS) are proposed to capture the effects of the environment in generating a risk score for a phenotype [ 98 ]. Like a GRS, the MRS summarises information across multiple informative loci in the genome to generate a single score, but based on risks associated with DNA methylation values, not with genetic variants. There are numerous factors that limit the utility and widespread calculation of MRS. The first is that there are too few large-scale EWAS with concurrent genotyping to provide adequate external datasets to calculate MRS [ 98 ]. The DNA methylation arrays may also introduce bias that skews interpretability due to the microarray design. For traits such as NAFLD that involve strong environmental effects, the hope would be that MRS would capture these influences and would offer predictive power beyond sequence variation alone.

Both GWAS and EWAS have been touted as having great promise for precision medicine, particularly for polygenic disorders such as NAFLD and NASH [ 94 ]. For polygenic traits, risk prediction through a risk score offers one such avenue. Typically, genetic risk scores (GRS; also described as polygenic risk scores (PRS)) are constructed to identify high-risk individuals based on GWAS results. The GRS is calculated as the weighted sum of the risk alleles for a trait in an individual, using weights determined by the best statistically powered GWA study for the trait. GRS have shown mixed utility in the ability to identify high-risk groups of individuals [ 95 ], and limited translation across ethnicities [ 96 ]. Recent NAFLD GWAS from the UK Biobank have identified >90 associated genomic loci, allowing the development of a GRS that successfully identified high-risk cases as having an odds ratio of 2.1 compared to individuals with the lowest NAFLD GRS score [ 94 ]. This difference is more modest than GRS for other diseases, such as the 3-fold higher risk for coronary artery disease [ 97 ].

The nine studies of Table 2 are used to represent the broader field of EWAS. Like most EWAS, each study has its own strengths, as defined above, but equally no individual study can be said to have addressed all of the problems inherent to how EWAS are currently performed. The potential outcome that results from failing to account for these problems is that changes in DNA methylation could be overinterpreted, assumed to represent cellular reprogramming responding to an environmental provocation, and causative of the phenotype, whereas the DNA methylation change could instead be due to changes in cell subtype composition within the tissue, genetic differences between individuals, or the consequence of the hepatic phenotype.

Finally, as mentioned above, if we know something about the genetic susceptibility affecting individuals within the cohort affected by the disease, that information is probably worth including. NAFLD has been found to be associated with protein-coding sequence variation, including the I148M missense variant in PNPLA3 that compromises the individual’s ability to hydrolyse triglycerides [ 92 ]. It would be unwise to assume that this subgroup of patients with NAFLD or NASH is experiencing the same environmental exposures at the tissue and cellular level as individuals with different or no recognisable pathogenic variants. Including protein-coding pathogenic variant information as a likely source of variability in EWAS is prudent as a major influence on phenotypic outcomes. It is likely that the pathogenic burden in the genome (genetic variation contributing to disease) is concentrated in loci of open chromatin in disease-relevant tissues [ 93 ]. In this “omnigenic model” of complex traits, genes with modest effects throughout the genome are influencing disease etiology through cell- and tissue-specific effects. This model supports the findings with NAFLD that genetic and epigenetic mechanisms interact with environment in a variable, tissue- and cell-dependent manner that introduces variability to EWAS outcomes.

In performing these kinds of EWAS, the ancestries of the patients studied should be described. Two of the Table 2 studies were on Han Chinese [ 57 , 60 ]; one was on “Caucasian women” [ 59 ], and one on Japanese patients [ 58 ]. The effect of DNA sequence variability on DNA methylation variability is so strong that ancestry has been shown to be predictable from DNA methylation assays [ 86 ]. Furthermore, with clear differences in susceptibility to NAFLD and NASH as complications of obesity between Hispanic and Black American patients [ 87 , 88 ], not accounting for ancestry in study design could be a strong influence on the results obtained if the affected individuals are disproportionately of one ancestral category. The meQTLs in a specific tissue type can differ between individuals of different genetic ancestries [ 89 , 90 ], which could cause the appearance of DNA methylation differences associated with a disease if the individuals studied in the disease group are disproportionately from distinct ancestral origins to the controls. The variability in DNA methylation and gene expression can even occur at fine-scale geographic levels, as demonstrated in a study identifying within-island DNA methylation and gene expression differences in individuals of diverse ancestry in Indonesia [ 91 ].

When the proportion of DNA methylation variation attributable to DNA sequence variation has been estimated, using different assays, tissue types, and ways of making these estimates, very substantial effects have been found, estimated to vary between 14% and 80% [ 78 – 85 ]. Of the studies in Table 2 , that by Nano and colleagues was distinctive for identifying which DNA methylation changes were attributable to DNA sequence variation. They found four sites of distinctive DNA methylation that survived their other rigorous filtering criteria, of which three were attributable to DNA sequence variation, leaving them with a single locus near the SLC7A11 gene where DNA methylation appeared to be associated with hepatic steatosis on its own and not genetic variability between the groups tested [ 55 ].

These are identified by finding a difference in DNA methylation between individuals that correlates with having one or other allele for a DNA sequence variant, in this case showing a variant on the same chromosome (a cis-meQTL). A significant change in DNA methylation associated with these differing allelic states defines the meQTL. Whether the meQTL causes the DNA methylation change is less certain—within the haplotype containing the meQTL and the locus where DNA methylation was tested will include other DNA sequence variants, one or more of which could be directly influencing the DNA methylation as a functional variant.

Finally, the influence of genetic sequence variation between the individuals studied needs to be considered. It is now widely recognised that DNA sequence variability can be associated with DNA methylation differences between individuals, defining functional variants referred to as meQTLs [ 77 ]. These are usually revealed when both genotyping and DNA methylation studies are performed on cohorts of individuals, testing whether differences in DNA methylation (the quantitative trait) are associated with the presence of different alleles for a variant at a locus within the flanking 1 Mb. In Fig 2 , we illustrate the example of a locus with a C/C genotype on both alleles in 81% of people in the population, a heterozygous C/A in 18% and a homozygous A/A in 1%. DNA methylation at a nearby site averages 40% in the C/C individuals, 30% in the C/A, and 20% in the A/A people, revealing a trend in DNA methylation associated with genotype. It may not be the variant itself with the C:A polymorphism that mediates the effect on the DNA methylation, as that locus is transmitted with a substantial amount of flanking DNA, representing a haplotype in which a separate functional variant could also be carried.

Of the studies in Table 2 , the Nano group [ 55 ] applied the minfi software package [ 74 ] to account for six leukocyte subtypes in analysing their results of peripheral blood studies, a mainstream approach in EWAS at present. Had cell subtypes been studies in liver itself, multiple changes would be expected, confirmed by the Johnson study that used the EpiDISH cell subtype deconvolution approach [ 75 ], revealing a decrease in epithelial cells and increases in immune cells, in particular natural killer (NK) T lymphocytes, with the progression of fibrosis in their liver samples. In the earlier stages of NAFLD, reprogramming of cellular properties should be reflected by the accumulation of large amounts of lipids in the cytoplasm of hepatocytes, manifested histologically by the displacement of the nucleus to a peripheral intracellular location. Cell subtype compositional changes would be likely to include the influx of inflammatory cells into the liver parenchyma with disease progression, and the transdifferentiation of hepatic stellate cells to the myofibroblasts that produce the extracellular matrix proteins that cause fibrotic scarring of the organ. The cells composing the liver affected by NAFLD or NASH are therefore substantially different to those in normal, healthy livers, both in terms of their innate properties and the representations of different subtypes. The development of single-cell transcription profiles from these kinds of tissues is likely to generate new ways of testing for both reprogramming and cell subtype proportional changes in disease like NAFLD [ 76 ].

It is possible for cell subtype proportions to vary within the tissue studied and between the groups tested and lead to a result showing changes of DNA methylation without any cells in the samples having undergone molecular reprogramming. For a DNA methylation change to be attributable to this influence, the cell subtype proportional change has to be consistently differing between the groups—in other words, the different proportion of a specific cell subtype needs to be nonrandomly present across the individuals in the test group compared with the controls.

It has been proposed that a good way of addressing this issue of confounded interpretation of reverse causation is through the use of a more difficult study design: longitudinal sampling of the same patients over time [ 73 ]. The Ahrens study in Table 2 accomplished this in a clever manner, sampling liver before and after bariatric surgery [ 53 ]. They found DNA methylation changes to be partially reversible at a subset of loci when liver biopsies were compared after bariatric surgery and associated dramatic weight loss (averaging 40 kg per person). By including a “healthy obese” group in their study, the possibility that the environmental exposure of obesity represents an independent and confounding influence on DNA methylation in the liver is addressed, allowing a focus on a subset of loci that is more likely to be involved in disease pathogenesis [ 54 ].

A further issue is whether the DNA methylation changes can be assumed to cause the phenotype, or whether they are caused by the phenotype, the latter being an example of “reverse causation.” This is a limitation inherent to cross-sectional study designs in which one group compared already has the phenotype [ 64 ]. Examples of reverse causation include obesity-related phenotypes that have been found to change DNA methylation of peripheral blood leukocytes (studying body mass index (BMI) [ 70 , 71 ] or blood lipid profiles [ 72 ]). Reverse causation represents another way that we can overinterpret epigenetic association studies, by making the incorrect assumption that the DNA methylation changes are causing the phenotype.

The median number of individuals in each cohort in Table 2 is 65, falling to 61.5 in the studies using liver samples. Obtaining well-characterised, high-quality liver samples for molecular studies is very challenging, reflected by these limited numbers. The question about how many samples are needed to power an epigenetic association study is frequently raised [ 68 , 69 ]. There are three major interacting variables that can be considered—the degree of change of DNA methylation, the number of sites tested, and the number of samples compared. When you use fewer samples, you can only confidently attribute changes of DNA methylation that are of greater magnitude and at fewer sites. It is a reasonable conclusion that these studies in Table 2 detected only a limited subset of the total likely number of DNA methylation changes occurring in these individuals.

Some human tissues are very accessible (e.g., peripheral blood leukocytes, buccal epithelium, hair follicles), but many common diseases are mediated by organs that would require sampling using invasive and risky procedures (e.g., liver, lungs, brain). A frequent question is whether an accessible tissue like peripheral blood leukocytes can report the epigenetic events occurring in an inaccessible tissue in the same individual [ 66 , 67 ]. What tends to be missing in EWAS projects using surrogate tissues is a clear rationale at the outset for the choice of the tissue used. If all that is required from the study is the development of a robust biomarker, in which DNA methylation changes reflect or predict a disease but without the need to understand why the DNA methylation change occurred, the use of surrogate, easily accessible tissues is highly desirable. If the question is whether the mechanism of the disease can be revealed through studies of a tissue like peripheral blood, that is when the rationale should be explicit—is the assumption that all cells in the body are changing their transcriptional regulatory patterns in the same way, both in the accessible and inaccessible tissues? Three of the NAFLD studies in Table 2 use peripheral blood leukocyte DNA for DNA methylation assays [ 55 , 57 , 60 ], making these studies examples of the use of surrogate reporter tissues. For their results in peripheral blood leukocytes to be informative about the physiology of cells in the liver, the genes and pathways would have to be active in both of these very distinctive tissue types and dysregulated in the same manner in response to an environmental exposure that may or may not be comparable in these distinct tissues. At present, such an explicit description of the rationale for the study or any discussion about whether this model is realistic is typically absent from reports of epigenetic studies of surrogate tissues.

These studies can be used to help illustrate some of the strengths and weaknesses of epigenetic association studies, as have been extensively reviewed in prior publications [ 62 – 65 ]. We break down some of the major issues in the following five categories.

We describe below several studies demonstrating tissue- and genomic region-dependent variation in DNA methylation in NAFLD, the foundation for proposals that DNA methylation has a role in the pathogenesis of NAFLD [ 52 ]. There have been several human EWAS performed on cohorts of patients with NAFLD that are worth examining as examples of a broader field of use of epigenomic assays to study environmental effects. Studies involving DNA methylation assays were chosen as this is at present by far the most commonly used epigenomic assay in EWAS. In Table 2 , we list eight studies involving DNA methylation studies of hepatic steatosis. None of these include extrahepatic phenotyping and are instead focused solely on liver disease.

NAFLD helps to illustrate an important point—the environmental exposure extrinsic to the organism may not be the same exposure experienced by the cells in the affected tissue ( Fig 1 ). In the case of NAFLD, the exposure extrinsic to the cell are the lipotoxic lipid species that influence hepatocytes in a poorly understood stress model. More relevant is the extrinsic exposure that causes NAFLD to progress to significant fibrosis, exemplified by high-fructose corn syrup [ 49 , 50 ], while the equivalent cellular exposure inducing fibrosis is the inflammatory cytokine TGFß [ 51 ].

Genetic susceptibility is a factor of importance in the development of NAFLD, which has a strong heritable component and involves several known loci [ 43 ]. NASH can develop in lean individuals, especially when carrying genetic risk alleles [ 44 ]. DNA sequence changes identified through genome-wide association studies (GWAS) have been causally associated with NAFLD [ 45 – 48 ], discussed in more detail below.

The pathogenesis of NAFLD involves both environmental exposures and genetic predisposition [ 39 , 40 ]. The causative environmental exposures are generally involved in causing obesity, such as the Western diet, but distinct gut microbiome complements in NAFLD patients are also invoked as possible contributory exposures [ 41 ]. It comes as no surprise that DOHaD models that are linked to obesity are also linked to NAFLD, with exposures during foetal and early neonatal life to maternal under- and overnutrition, excess glucocorticoids, and environmental pollutants linked to offspring NAFLD [ 42 ]. Whether these exposures act primarily to cause offspring obesity, with NAFLD as a secondary consequence, or whether early developmental programming predisposes to liver damage independently remains unclear.

We are using NAFLD as the focal point for the discussions to follow. NAFLD represents a major complication of the obesity epidemic, with NAFLD now the most common form of liver disease worldwide [ 33 , 34 ], estimated to affect 6 to 30 million people in the United States, including 600,000 who have advanced to developing cirrhosis [ 35 ]. A spectrum of histological stages characterizes the disease, ranging from simple fat accumulation (or steatosis) to an inflammatory phenotype, NASH, which can progress to involve fibrosis and cirrhosis [ 36 ]. Clinical outcomes with cirrhosis include decompensation, portal hypertension, liver transplantation, hepatocellular carcinoma, and death. Long-term follow-up of NAFLD patients confirms that NASH patients have a higher risk of liver-related mortality than non-NASH patients [ 37 ]. The economic burden of NASH is significant—the lifetime costs in the USA for NASH patients alone in 2017 was estimated to exceed US$200 billion [ 38 ].

The epigenome-wide association study (EWAS) refers to the testing of samples from individuals with a phenotype or exposure of interest using an assay testing the molecular regulators of gene transcription. This broad picture narrows significantly in practice, as most EWAS involve testing human blood leukocytes and the use of DNA methylation microarrays to survey across the entire genome. “Epigenome-wide” should not be taken to indicate comprehensive coverage of the genome, as microarrays typically represent no more than a few percent of the CG dinucleotides at which DNA methylation occurs. The groups are compared for consistent differences in DNA methylation levels between them, a positive outcome reflected by loci showing changes that are statistically significant. The underlying hypothesis driving these studies is the assumption that the phenotype being studied involves a reprogramming of the transcriptional regulation of the cell, not DNA sequence changes. By identifying DNA methylation differences between the groups, we not only gain evidence for this model, we also find the genes involved in changing the properties of the cell as part of the development of the phenotype. As will be discussed in the main text, it is not straightforward to interpret the results of these studies, for a number of reasons. The positive view is that the factors that make EWAS difficult to interpret, if understood, allow insights into the development of the phenotype, but at the expense of the hypothesis of cellular reprogramming.

When attempting to understand whether epigenetic changes occur associated with traits or disease phenotypes, an EWAS has been the typical approach. As introduced in Box 1 , an EWAS is generally performed on multiple individuals in a test group (with the phenotype) and a control group that is matched for potentially influential variables like age, sex, and ancestry. The same tissue type is sampled in all individuals, and an epigenome-wide assay is performed, usually studying DNA methylation at numerous loci in the genome, ranging from tens of thousands to millions of sites. For environmental studies, testing the same individuals in conditions when exposed and not exposed is an alternative study design that eliminates the influence of genetic variability, as discussed later, and it may be possible to quantify the exposure so that you do not have to compare two groups, but can instead correlate exposure as a continuous variable with the DNA methylation changes. When there is a locus or set of loci that has a difference of DNA methylation that appears statistically nonrandom, this represents a positive outcome for the EWAS. Generally, the result of a positive EWAS is one of numerous loci showing significant differences in DNA methylation, which generally leads to an attempt to interpret why this group of genomic regions underwent regulatory changes, involving linking each site with a gene, and then looking for a coherence of properties of these genes, using gene ontology or pathway information.

Embracing the sources of variation

Spurious associations reveal systematic changes If it sounds daunting to consider the possibility that we need to go beyond the DNA methylation change and understand its cause in terms of cell subtype or DNA sequence variants, there is an important point to consider. For one of these influences to influence the results, sending the DNA methylation in a specific direction so that it differs between groups, the spurious influence needs to be nonrandomly distributed between the groups. For example, if you are testing liver samples and there are more Kupffer cells in samples from one group compared with the other, this will cause the appearance of a DNA methylation change related to the loci that are distinctively methylated in that cell type. Likewise, in the situation of the meQTL above, where the C allele is associated with increased and the A allele with decreased DNA methylation, you will only see a resulting difference in DNA methylation between groups if the C allele is overrepresented in one group and not the other. These typically unrecognized influences, if studied, reveal cellular changes and genetic associations with the disease being studied, which represent potentially valuable insights into its pathogenesis.

Examples of cell subtype changes following environmental exposures It should be no surprise that environmental cues prompt responses that involve the composition of a tissue changing in terms of cell subtypes. Endocrine disrupting chemicals are characterised by altering sexual differentiation during development, a stark example where entire organs form differently in response to an environmental exposure. We have previously highlighted [65] an interesting model of a cell subtype response to vitamin A deficiency during intrauterine development. In this mouse model, the associated tissue phenotype was quite subtle, involving hyperplasia of smooth muscle surrounding distal airways, established during lung development and persisting into adulthood, causing bronchoconstriction in response to a methylcholine challenge [104]. A very similar phenotype that recapitulates a component of the human asthma phenotype was found following prenatal deficiency of vitamin D [105]. While relatively subtle from a histological point of view, the change in cell subtype composition and organisation in the lung tissue was enough to cause a measurable phenotype. EWAS of environmental exposures have also revealed cell subtype changes in peripheral blood. In a study of low-level intrauterine exposure to arsenic, DNA methylation of umbilical cord blood leukocytes was tested and analysed using an approach comparable to that described to be used by Nano group [55] earlier. These researchers found that arsenic exposure was associated with a higher proportion of CD8+ T lymphocytes [106]. The authors noted that the deconvolution approach based on DNA methylation profiles of adult reference cell types does not work as expected in cord blood, as was later systematically reviewed [107], and could therefore call their results into question. However, a separate study of arsenic exposure to adult mice showed an increase in CD8+ T cell proportions in bronchoalveolar lavage samples [108], suggesting that the exposure has consistent effects to increase this proportion of lymphocytes in different tissues. A consistently robust association from blood leukocyte DNA methylation studies is with a history of cigarette smoking [109]. One of the loci at which DNA methylation is distinctive in smokers is at the gene for the GPR15 surface marker. When this specific marker was studied further, a complex picture emerged. Initially, it appeared that GPR15 defined a specific T lymphocyte subtype in blood that increased in proportion following smoking [110]. However, the same researchers followed up with a study that showed GPR15 to be present on many T lymphocyte subtypes following smoking, indicating that the protein is induced by cigarette smoke exposure in many cell subtypes and does not necessarily reflect a change in cell subtype proportions, as they had originally proposed [111]. It should be noted that DNA methylation studies of peripheral blood generate a robust biomarker of smoking, a very useful indicator of a relevant environmental exposure when studying a phenotype like lung cancer [112]. When performing studies on human liver tissue, given the increasing availability of single-cell RNA-seq data from both healthy [116] and diseased [117] livers, it should be possible to go back to the results of those studies from Table 2 that included gene expression analyses of liver tissue [53,54,56,59] and get an indication of what proportion of the observed DNA methylation differences are due to cell subtype changes (Box 2). Rather than regarding this as undermining the results of an EWAS, such an additional layer of information potentially enhances insights into disease pathogenesis, by revealing cell subtype proportion changes occurring as part of the development or progression of the disease. Furthermore, if this kind of approach permits removal of DNA methylation changes that are due to cell subtype changes, the remaining signal is much more likely to represent the kind of cellular reprogramming events typically sought in an EWAS. Box 2. How can cell subtype proportions be measured? The complete blood count and differential white cell measurement provides quantification of lymphocytes, granulocytes (neutrophils, eosinophils, and basophils), and monocytes. This represents a restricted group of major subtypes of white blood cells, recognising that there are many subtypes of lymphocytes in particular, and monocytes are also inherently heterogeneous [113]. If DNA methylation microarray data are available from a tissue and if subtypes of cells have been tested to identify loci within the microarray with distinctive DNA methylation patterns in the different cell subtypes, the data can be analysed in a way that allows estimation of the proportions of each cell type present. This technique has been developed primarily for white blood cells, testing a different set of cell subtypes than is reported by the clinical differential white cell count: granulocytes, monocytes, and four subtypes of lymphocytes, B cells, natural killer cells, and CD4 and CD8 T cells [114]. If gene expression data are available, a comparable approach to allow estimation of cell subtypes can be performed, a salient example being CIBERSORT, which allows 22 different white blood cell subtype proportions to be estimated [115]. When reference gene expression data have not been generated from isolated cell subtypes in a tissue, the results of single-cell transcription studies can instead be used [76], allowing cell subtype estimations to be broadened beyond blood to other organs and tissues. If the vitamin A or vitamin D deficiency studies of the mouse lung described above were performed by a currently typical EWAS approach, sampling the bulk lung tissue from animals in each exposure group, testing DNA methylation patterns, and removing those changes attributable to cell subtype changes, we would be eliminating from further consideration the smooth muscle changes that mediate the bronchoconstriction phenotype in these animals. Likewise, the finding from DNA methylation studies of an increased proportion of CD8+ T lymphocytes following arsenic exposure is consistent with what has been found through immunological studies of the effects of toxicity of this heavy metal [118]. These cell subtype proportion changes occurring nonrandomly in patients with a disease are not, therefore, an artefact to discard, but instead represent an insight into disease pathogenesis, and should be harvested as useful information.

How to test for influences by DNA sequence variants An issue with any GWAS, or its extension to study environmental effects through GEWIS and GxE approaches, is that it does not point us to a specific locus at nucleotide resolution when it finds a phenotypic association. Instead, it implicates a haplotype, usually at least tens of kilobases in size and containing potentially thousands of variants [133]. To refine the search for the causative locus within the haplotype, one strategy has been to exploit the differences in phenotypic susceptibility between individuals of different ancestries to fine-map the region [134]. Another strategy uses the effects described earlier for DNA sequence variability influencing transcription and its regulatory mechanisms, identifying the subset of functional variants in these regions, allowing them to be prioritised as potentially mediating the phenotype. The most studied effect of functional variants is on gene expression. By treating the level of expression of a gene as a quantitative phenotype and correlating it with genetic variation, a locus where variation of the DNA sequence is associated with a change in gene expression can be defined. The sequences tested can be on the same chromosome and relatively close (typically <1 Mb) to the gene whose expression level is measured, a cis relationship, or further away or on another chromosome, a trans relationship. The outcome sought is a change similar to that in Fig 2, with the variant described as an eQTL and the target gene an eGene. By identifying the eQTLs in a haplotype implicated by an EWAS, the number of variants can be reduced from hundreds to a very limited number [133]. An even more powerful way of refining the search for causal variants is through studies that look for effects of a variant present on one but not the other allele, leading to an imbalance of expression of a linked gene, referred to as “allele-specific expression” [135]. This helps to safeguard against attributing function to a variant in a region of the genome in high linkage disequilibrium with neighbouring variants, one of which may instead be mediating the functional effect. Something interesting happens when you look for eQTLs after challenging the cell with an exposure. The groundbreaking study that revealed “response” eQTLs used an exposure of dendritic cells to infection by Mycobacterium tuberculosis. The authors found that while most eQTLs remained the same before and after exposure, a subset was present only in the uninfected or in the infected cells, and were described as response eQTLs. Within the panel of eQTLs genome wide, the response eQTLs were enriched at loci identified by GWAS as being involved in susceptibility to tuberculosis [136]. Response eQTLs have now been identified for multiple different exposures and cell types. Human monocytes have been tested following Toll-like receptor 4 (TLR4) stimulation [137], human dendritic cells were exposed to E. coli lipopolysaccharide, influenza, or interferon-β (IFN-β) [138], human monocytes were treated with interferon-γ (IFN-γ) or lipopolysaccharide [139], monocyte-derived macrophages were infected with Listeria monocytogenes or Salmonella typhimurium [140], while primary monocytes were exposed to ligands activating Toll-like receptor pathways (TLR1/2, TLR4, and TLR7/8) and to influenza virus, in samples from Africans and Europeans [141]. A study that exposed human macrophages to IFNγ, Salmonella enterica serovar Typhimurium, or a combination of the two exposures implicated certain TFs in mediating the response to infection, with a primary effect on PU.1 and secondary effects on stimulus-specific TFs, such as NF-κB and STAT2 [142]. A study of whole blood from 1,000 individuals, using exposures to three bacteria, a fungus, a live virus, and a superantigen, demonstrated that variability in responses between individuals was less influenced by age and sex and more by genetic factors, identifying response eQTLs enriched at loci implicated in autoimmune and inflammatory disorders [143]. Allele-specific expression has also proven valuable for discovering response eQTLs to factors such as BMI and exercise in large observational cohorts [144]. As well as these studies of infection and the immune system, we performed a study identifying response eQTLs following exposure of cardiomyocytes to anthracycline [2]. These response eQTLs were more enriched than preexposure eQTLs for loci implicated by GWAS for anthracycline-induced cardiotoxicity. The approach to study allele-specific expression has also been successfully scaled to a high number (50) of environmental exposures in five different cell types, revealing a large number of genes with GxE effects [145]. What all of these studies have in common is that they represent in vitro exposures by agents known to induce responses by the cells used. An obvious question that arises is how to apply these approaches to other diseases. Maintaining our focus on NAFLD, earlier, we described the environmental exposures to the individual may end up translating into exposures to the cells of the liver, including the effects of lipotoxicity on mixed cells types within the liver. We need to ask whether we can study one cell type in isolation, or whether effects of an exposure require the physical relationship between the cells that compose the normal liver. Rather than sampling primary liver cells from human subjects, which has substantially more risk than a blood draw to sample immune cells, we can use the approach of the anthracycline/cardiomyocytes study, which generated cardiomyocytes from induced pluripotent stem cells (iPSCs) from multiple individuals [2]. iPSCs can also be used to generate organoids that contain many of the cell subtypes of an organ, including liver organoids [146], which may be another avenue worth pursuing in studies of responses to exposures, although we highlight the caution that the field of organoid research is still in its early stages [147]. Once we have our exposure and cell system in place, we can move to molecular studies. The genotyping of samples is a foundation for understanding how genetic variability influences environmental responses and the transcriptional and epigenomic data generated. Furthermore, as mentioned above, the genotypic contribution to disease outcome is likely expressed in a tissue- and cell-specific manner [93]. Genotype information can be inferred from DNA methylation microarrays [148], while it has also been found that Gap Hunter can reveal ancestry information [86]. However, these approaches generates much less genotyping information than more mainstream approaches such as the use of microarrays that represent common variants in the genome, ideally designed to be as informative as possible across diverse populations [149]. An alternative is low-coverage whole genome sequencing, at a depth that is insufficient to identify every variant in the genome but allows the identification of many common variants through imputation methods that leverage large reference panels, and the additional revelation of some lower frequency variants that would not be detected by microarrays [150]. The next molecular assay is typically gene expression analysis, allowing eQTL identification, as described already. However, there are other molecular assays that reveal the effects of functional variants. While we have described meQTLs earlier as loci responding to functional variants by changing their DNA methylation, in practice, these have not been used to test for environmentally responsive loci in a way comparable with eQTL studies. Chromatin accessibility QTLs (caQTLs) represent an intriguing alternative that is understudied at present. An ingenious pooling approach was used to reveal caQTLs in lymphoblastoid cell lines from 1,000 individuals from 10 populations, revealing population-specific caQTLs [151], although this study did not involve any in vitro exposure. A recent, groundbreaking study combined the comparison of cells before and after exposure, testing multiple immune cell types, and identifying both response eQTLs and response caQTLs. The authors found evidence for function of candidate causal variants that would have been undetectable using more mainstream approaches studying resting cells [152]. A problem with eQTLs is that the expression of a gene is generally influenced by multiple cis regulators [153], making it difficult to link local sequence variation with the quantitative trait of gene expression. Chromatin accessibility, on the other hand, is likely to vary at the locus containing the functional variant, lending itself to allele-specific studies [151,154]. A shift in focus from eQTLs to caQTLs may be fruitful for functional variant analyses. Finally, it should be borne in mind that the relatively common variants that we can identify through microarray or low-coverage whole genome sequencing with imputation may only contribute a small proportion of variability of gene expression. Even with detailed haplotype information (in mostly European populations), the imputation can only predict variants down to a frequency of 1/1,000 (10−3) [155]. A study of 360 LCLs derived from European individuals combined deep whole genome sequencing and RNA sequencing to identify which variants influenced the heritability of levels of gene expression. Approximately 90% of the heritability was found to be associated with sequence variants that occurred only once in the cohort (singletons) and at a minor allele frequency of <0.01% in the gnomAD database [156]. If we assume that the variants identified in this study represent those with effects on DNA methylation (meQTLs) and chromatin accessibility (caQTLs), we can expect that only a small proportion of variants causing changes in expression, DNA methylation and chromatin accessibility will be revealed by current approaches, that deep whole genome sequencing will reveal many more variants affecting these molecular phenotypes, and that most functional variants will be present in the genome in a heterozygous state because of their population rarity.

[END]
---
[1] Url: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1010567

Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.

via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/plosone/