(C) PLOS One [1]. This unaltered content originally appeared in journals.plosone.org.
Licensed under Creative Commons Attribution (CC BY) license.
url:https://journals.plos.org/plosone/s/licenses-and-copyright

------------



Causal inference for heritable phenotypic risk factors using heterogeneous genetic instruments

Jingshu Wang (Department of Statistics, University of Chicago, Chicago, Illinois, United States of America); Qingyuan Zhao (Department of Pure Mathematics and Mathematical Statistics, University of Cambridge)

Date: 2021-08

Model overview

From the causal model to GWAS summary statistics

Our framework starts with a set of structural equations that jointly specify the generative model for the disease $Y$, which depends on the $K$ observed risk factors $X = (X_1, \cdots, X_K)$ of interest and on the vector $Z = (Z_1, Z_2, \cdots)$ containing all genetic information of an individual (Fig 1a):

$$X_k = g_k(U, Z, E_{X_k}), \quad k = 1, \cdots, K, \qquad Y = X^T \beta + f(U, Z, E_Y). \tag{1}$$

Here $U$ represents unknown non-heritable confounding factors, and $E_{X_k}$ and $E_Y$ are random noise acting on $X_k$ and $Y$ respectively. The parameter of interest, $\beta$, quantifies the causal effect of the vector of risk factors $X$ on $Y$. Mendel's laws of inheritance suggest that the genotypes $Z$ are randomized during conception and are generally independent of the environmental factors ($Z \perp (U, E_{X_1}, \cdots, E_{X_K}, E_Y)$). The function $f(U, Z, E_Y)$ represents the causal effect of unmeasured risk factors on $Y$, which can be heritable (contributed by $Z$) or non-heritable (contributed by $U$). The non-parametric functions $f(\cdot)$ and $g_k(\cdot)$ allow interactions among the SNPs in $Z$ and the variables $(U, E_{X_k}, E_Y)$ in their causal effects on $X$ and $Y$.

Under this model, there is horizontal pleiotropy for a SNP $j$ if $Z_j$ has a nonzero association with $f(U, Z, E_Y)$. This is the case, for example, when $Z_j$ acts on $Y$ through a pathway affecting unmeasured risk factors, or when $Z_j$ is in linkage disequilibrium (LD) with such a locus.

Fig 1. Model overview. a, The causal directed graph represented by structural equations (1). b, The existence of a pleiotropic pathway 2 (purple) can result in multiple modes of the profile likelihood. c, Multi-modality of the profile likelihood can reflect causal direction. d, The work-flow with GRAPPLE. https://doi.org/10.1371/journal.pgen.1009575.g001

Now consider the case where only GWAS summary statistics are available, i.e. the estimated marginal associations between each SNP $j$ and the risk factors and disease traits, and where $p$ SNPs are selected in total. Let $\Gamma_j$ be the true association between SNP $j$ and $Y$, and $\gamma_j$ be the vector of true marginal associations between SNP $j$ and $X$. Later, we will denote their estimated values from GWAS summary statistics as $\hat\Gamma_j$ and $\hat\gamma_j$. Then, as shown in Materials and Methods, model (1) results in the linear relationship

$$\Gamma_j = \gamma_j^T \beta + \alpha_j, \quad j = 1, \cdots, p, \tag{2}$$

where, for binary $Y$, the parameter $\beta$ in model (2) is a conservatively biased version of $\beta$ in model (1). This relationship holds even when the functions $f(\cdot)$ and $g_k(\cdot)$ in (1) are not linear. Here, $\alpha_j$ is the marginal association between $Z_j$ and $f(U, Z, E_Y)$, representing the unknown horizontal pleiotropy of SNP $j$.

One can immediately see that identifying $\beta$ is impossible without further assumptions on $\alpha_j$. Early MR methods such as IVW [5] assumed that all instruments are valid, that is, $\alpha_j = 0$ for all $j$. Other methods such as Weighted Median [7] or MR-PRESSO [9] assume that $\alpha_j$ is sparsely nonzero. However, the assumption of no or sparse pleiotropy follows from statistical convenience rather than biological insight. As discussed in the Introduction, horizontal pleiotropy is pervasive for most complex traits. One assumption that allows pervasive pleiotropy is the InSIDE assumption [6], under which $\alpha_j \perp \gamma_j$, or alternatively the random effect model [10, 16], under which $\alpha_j \sim N(0, \tau^2)$ for most genetic instruments.
Unfortunately, the InSIDE assumption can easily be violated if the pleiotropic effects of the selected genetic variants are driven by shared pleiotropic mechanisms. Some more recent MR methods such as LCV [26], MRMix [12], Contamination mixture [13] and CAUSE [15] recognize this limitation of the InSIDE assumption and allow a subset of the genetic instruments to be associated with a common hidden pleiotropic pathway. For instance, using the above notation, both CAUSE and MRMix assume that, for the SNPs that violate the InSIDE assumption, the pleiotropic effects satisfy (when $K = 1$) $\alpha_j = a\gamma_j + \alpha_j'$, where $a\gamma_j$ represents the correlated pleiotropic effects due to a confounding pathway and $\alpha_j' \perp \gamma_j$. This is a more realistic assumption than InSIDE, though it then becomes difficult to distinguish the true causal effect $\beta$ from the pleiotropic direction $\beta + a$. Allowing for only one pleiotropic pathway also makes the model restrictive for real datasets.
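To make the identification problem concrete, the following minimal sketch simulates summary statistics under the linear relationship (2) with $K = 1$ and compares a simple IVW-type estimate with and without correlated pleiotropy. All numerical values, the function names, and the simplification of a common standard error for every SNP are assumptions made for illustration; they are not taken from the paper.

```python
import random

def simulate_summary_stats(p, beta, a, frac_corr, tau, se, seed=0):
    """Draw (gamma_hat, Gamma_hat) for p SNPs under Gamma_j = beta*gamma_j + alpha_j."""
    rng = random.Random(seed)
    gamma_hat, Gamma_hat = [], []
    for _ in range(p):
        gamma = rng.gauss(0.0, 0.1)      # true SNP-exposure association (assumed scale)
        alpha = rng.gauss(0.0, tau)      # pleiotropy independent of gamma (InSIDE-type)
        if rng.random() < frac_corr:     # SNP also acts through a shared hidden pathway
            alpha += a * gamma           # correlated pleiotropy a * gamma_j
        Gamma = beta * gamma + alpha     # linear relationship (2) with K = 1
        gamma_hat.append(gamma + rng.gauss(0.0, se))
        Gamma_hat.append(Gamma + rng.gauss(0.0, se))
    return gamma_hat, Gamma_hat

def ivw_estimate(gamma_hat, Gamma_hat):
    """IVW estimate when all SNPs share one standard error: regression through the origin."""
    num = sum(g * G for g, G in zip(gamma_hat, Gamma_hat))
    den = sum(g * g for g in gamma_hat)
    return num / den

beta, a = 0.5, 0.5  # hypothetical causal effect and pleiotropic slope
inside_only = ivw_estimate(*simulate_summary_stats(2000, beta, a, 0.0, 0.02, 0.01))
with_shared = ivw_estimate(*simulate_summary_stats(2000, beta, a, 0.3, 0.02, 0.01))
print(f"IVW, InSIDE-type pleiotropy only:       {inside_only:.3f}")
print(f"IVW, 30% of SNPs on a shared pathway:   {with_shared:.3f}")
```

In this toy setting the first estimate should land close to $\beta$, while the second should drift toward $\beta + a$ roughly in proportion to the share of SNPs on the shared pathway, which is the ambiguity described above.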

Identify multiple pleiotropic pathways and the direction of causality

The key idea underlying GRAPPLE is that multiple pleiotropic pathways can be detected from the shape of the profile likelihood function computed under the no-pleiotropy assumption. This allows us to probe the underlying causal mechanism without explicit assumptions on pleiotropic patterns (Fig 1b). When $K = 1$, the GWAS summary statistics reduce to the scalars $\hat\gamma_j$ and $\hat\Gamma_j$, with standard errors $\sigma_{1j}$ and $\sigma_{2j}$. By the central limit theorem, the joint distribution of $(\hat\gamma_j, \hat\Gamma_j)$ is approximately multivariate normal,

$$\begin{pmatrix} \hat\gamma_j \\ \hat\Gamma_j \end{pmatrix} \sim N\left( \begin{pmatrix} \gamma_j \\ \Gamma_j \end{pmatrix}, \begin{pmatrix} \sigma_{1j}^2 & \theta\sigma_{1j}\sigma_{2j} \\ \theta\sigma_{1j}\sigma_{2j} & \sigma_{2j}^2 \end{pmatrix} \right), \tag{3}$$

where $\theta$ is a shared sample correlation that can be estimated as $\hat\theta$ (see Materials and Methods). When there is no horizontal pleiotropy in the $p$ selected independent genetic instruments ($\alpha_j = 0$ for $j = 1, 2, \cdots, p$), the robust profile likelihood [10] is given by

$$l(b) = -\sum_{j=1}^{p} \rho\left( \frac{\hat\Gamma_j - b\hat\gamma_j}{\sqrt{\sigma_{2j}^2 - 2b\theta\sigma_{1j}\sigma_{2j} + b^2\sigma_{1j}^2}} \right), \tag{4}$$

where $\rho(\cdot)$ is Tukey's Biweight loss or any other robust loss function. As described in more detail in Materials and Methods, the profile likelihood is obtained by profiling out the nuisance parameters $\gamma_1, \cdots, \gamma_p$ in the full likelihood from (3), and is then robustified by replacing the $L_2$ loss with Tukey's Biweight loss to increase the sensitivity of mode detection. Under the no-pleiotropy or InSIDE assumption, the function $l(b)$ should have only one mode, near the true causal effect $b = \beta$.

Now consider the case where a second genetic pathway (Pathway 2) also contributes substantially to the disease, and some instrument loci are also associated with Pathway 2 (Fig 1b). In this scenario, SNPs that are associated with $X$ only through Pathway 2 can contribute to a second mode in the profile likelihood at location $\beta + \kappa/\delta$, where $\kappa$ and $\delta$ quantify the causal effect of Pathway 2 on $Y$ and its marginal association with $X$, respectively (Materials and Methods). Similarly, multiple pleiotropic pathways generally result in multiple modes of $l(b)$.
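The robust profile likelihood and the two-pathway scenario can be illustrated with a short sketch. It evaluates $l(b)$ on a grid for toy data in which one group of SNPs has ratio $\beta$ and a second group has ratio $\beta + \kappa/\delta$. The tuning constant `c = 4.685` (a common default for Tukey's biweight), the grid, the helper names and every simulated value are assumptions for illustration, not GRAPPLE's actual implementation.

```python
import math
import random

def tukey_biweight(r, c=4.685):
    """Tukey's biweight loss: close to r^2/2 near zero, constant once |r| >= c."""
    if abs(r) >= c:
        return c * c / 6.0
    return (c * c / 6.0) * (1.0 - (1.0 - (r / c) ** 2) ** 3)

def robust_profile_loglik(b, gamma_hat, Gamma_hat, s1, s2, theta=0.0):
    """l(b): minus the summed robust loss of the standardised residuals."""
    total = 0.0
    for g, G, sd1, sd2 in zip(gamma_hat, Gamma_hat, s1, s2):
        # Variance of Gamma_hat_j - b * gamma_hat_j under the bivariate normal model
        var = sd2 ** 2 - 2.0 * b * theta * sd1 * sd2 + (b * sd1) ** 2
        total -= tukey_biweight((G - b * g) / math.sqrt(var))
    return total

def local_modes(grid, values):
    """Grid points at which l(b) has a local maximum."""
    return [(grid[i], values[i]) for i in range(1, len(grid) - 1)
            if values[i] > values[i - 1] and values[i] >= values[i + 1]]

# Toy data (hypothetical numbers): 120 SNPs follow the causal slope beta,
# 80 SNPs follow the shifted slope beta + shift, where shift plays the role of kappa/delta.
rng = random.Random(1)
beta, shift, se, p = 0.3, 0.9, 0.005, 200
gamma_hat, Gamma_hat = [], []
for j in range(p):
    gamma = rng.choice([-1.0, 1.0]) * rng.uniform(0.05, 0.15)
    slope = beta if j < 120 else beta + shift
    gamma_hat.append(gamma + rng.gauss(0.0, se))
    Gamma_hat.append(slope * gamma + rng.gauss(0.0, se))
s1 = s2 = [se] * p

grid = [i / 100.0 for i in range(-50, 251)]
values = [robust_profile_loglik(b, gamma_hat, Gamma_hat, s1, s2) for b in grid]
for b, l in local_modes(grid, values):
    print(f"local mode at b = {b:.2f}, l(b) = {l:.1f}")
```

Because the biweight loss saturates for large residuals, SNPs belonging to the other group contribute a constant to $l(b)$ near each mode, so the two groups show up as separate peaks instead of being averaged into a single intermediate slope.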
Thus, we can use multiple modes in a plot of $l(b)$ to diagnose the presence of horizontal pleiotropic effects that are grouped by different pleiotropic pathways.

The existence of pleiotropic pathways complicates MR and makes the causal effects of the risk factors potentially unidentifiable. Specifically, when Pathway 2 exists, the GWAS summary statistics alone cannot provide information to distinguish $\beta$ from $\beta + \kappa/\delta$. Instead of making further untestable assumptions, such as that one pathway "dominates" the other, we suggest that whenever multiple modes are detected, the investigator should try to find biomarkers for each mode and collect more GWAS data to adjust for confounding risk factors. GRAPPLE facilitates this by identifying marker SNPs of each mode, as well as the mapped genes and GWAS traits of each marker SNP (see Materials and Methods). This allows researchers to use their expert knowledge to infer possible confounding risk factors that contribute to each mode. With GWAS summary statistics of these confounding traits, GRAPPLE can perform a multivariable MR analysis under the InSIDE assumption for the remaining horizontal pleiotropic effects (Materials and Methods).

The detection of multiple modes can also be used to determine the causal direction (Fig 1c). If the wrong causal direction is specified in model (1) and $Y$ is a cause of $X$, the genetic variants associated with $X$ can be classified into two groups: those associated with $X$ through $Y$, and those associated with $X$ through another pathway unrelated to $Y$. In the former case, $\gamma_j = \beta\Gamma_j$, where $\beta$ is the causal effect of $Y$ on $X$, and these SNPs should contribute to a mode around $1/\beta$. In the latter case, a SNP $j$ satisfies $\gamma_j \neq 0$ but $\Gamma_j = 0$, and would contribute to a mode of $l(b)$ at 0. Thus, there will be two modes in the robust profile likelihood, with one mode around 0. This idea can be viewed as an extension of bidirectional MR [29, 30].
Bidirectional MR is based on the assumption that when $X$ is a cause of $Y$, most of the genetic instruments for $Y$ should be unassociated with $X$ because they affect $Y$ through a different pathway, so the reverse MR would indicate a zero effect of $Y$ on $X$. GRAPPLE makes this inference more robust by using the fundamentally different shapes of the robust profile likelihood plots in the two directions. In the correct causal direction, the plot should show only one mode, around the true causal effect $\beta$. In the incorrect reverse direction, where the true outcome is treated as the risk factor and the true risk factor is treated as the outcome, the plot of the robust profile likelihood will have two modes: one around 0, representing the variants directly related to the true outcome, and one around $1/\beta$, representing the variants indirectly related to the true outcome through the true risk factor.
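The contrast between the two directions can be reproduced in a toy simulation. In the sketch below, $X$ truly causes $Y$ with effect $\beta = 0.5$; half of the SNPs act on $X$ and the other half act on $Y$ directly. The selection rule (an absolute z-score above 5), the equal standard errors, $\theta = 0$, and all simulated values are assumptions chosen for illustration only.

```python
import math
import random

def biweight(r, c=4.685):
    """Tukey's biweight loss with an assumed tuning constant c."""
    return c * c / 6.0 if abs(r) >= c else (c * c / 6.0) * (1.0 - (1.0 - (r / c) ** 2) ** 3)

def modes(expo_hat, out_hat, se, lo=-1.0, hi=3.0, step=0.01):
    """Local maxima of the robust profile likelihood on a grid, tallest first."""
    n = int(round((hi - lo) / step)) + 1
    grid = [lo + i * step for i in range(n)]
    vals = [-sum(biweight((o - b * e) / (se * math.sqrt(1.0 + b * b)))
                 for e, o in zip(expo_hat, out_hat)) for b in grid]
    found = [(round(grid[i], 2), vals[i]) for i in range(1, n - 1)
             if vals[i] > vals[i - 1] and vals[i] >= vals[i + 1]]
    return sorted(found, key=lambda t: -t[1])

def select(stat_hat, se, z=5.0):
    """Indices of SNPs whose association z-score exceeds a hypothetical cutoff."""
    return [j for j, s in enumerate(stat_hat) if abs(s) / se > z]

rng = random.Random(2)
beta, se = 0.5, 0.005
x_hat, y_hat = [], []
for j in range(200):
    eff = rng.choice([-1.0, 1.0]) * rng.uniform(0.08, 0.2)
    if j < 100:
        gx, gy = eff, beta * eff   # SNP acts on X and reaches Y only through X
    else:
        gx, gy = 0.0, eff          # SNP acts on Y through a pathway unrelated to X
    x_hat.append(gx + rng.gauss(0.0, se))
    y_hat.append(gy + rng.gauss(0.0, se))

fwd = select(x_hat, se)            # instruments for X (correct direction)
rev = select(y_hat, se)            # instruments for Y (reverse direction)
forward = modes([x_hat[j] for j in fwd], [y_hat[j] for j in fwd], se)
reverse = modes([y_hat[j] for j in rev], [x_hat[j] for j in rev], se)
print("forward modes:", [b for b, _ in forward])
print("reverse modes:", [b for b, _ in reverse])
```

In the forward direction the tallest mode should sit near $\beta$, while the reverse analysis should show one mode near 0 and another near $1/\beta = 2$.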

Weak genetic instruments: A curse or a blessing?

Besides satisfying the no-horizontal-pleiotropy assumption, a SNP must have a non-zero association with the risk factor of interest to be a valid genetic instrument. In most MR pipelines, SNPs are selected as instruments only when their p-values are below $10^{-8}$, which is required to guarantee a low family-wise error rate (FWER) for GWAS data. Using such a stringent threshold also helps to avoid weak instrument bias [31], where measurement errors in $\hat\gamma_j$ are not ignorable and lead to bias in $\hat\beta$. However, such a stringent selection threshold may result in very few, or even no, instruments being selected when the GWAS is under-powered, and may still not be adequate to avoid weak instrument bias. Further, when our goal is to jointly model the effects of multiple risk factors (the setting where $X$ is a vector), it is unrealistic to assume that all selected SNPs have strong effects on every risk factor. In addition, the high polygenicity of complex traits indicates that weak instruments far outnumber strong instruments, and collectively they may substantially improve the estimation accuracy.

In GRAPPLE, we use a flexible p-value threshold for instrument selection, which can be as stringent as $10^{-8}$ or as relaxed as $10^{-2}$. Based on the profile likelihood framework of MR-RAPS [10], GRAPPLE can provide valid inference for $\beta$ that avoids weak instrument bias for multiple risk factors even when the p-value threshold is as large as $10^{-2}$. This flexible p-value threshold is beneficial for several reasons. First, including moderate and weak instruments may increase power, especially for under-powered GWAS. Second, for MR with multiple risk factors, where it is inevitable to include SNPs that have weak associations with some of the risk factors, we can obtain much more accurate causal effect estimates than methods that can only deal with jointly strong SNPs.
More importantly, we can investigate the stability of the estimates across a series of p-value thresholds and get a more complete picture of the underlying horizontal pleiotropy. In practice, we suggest that researchers vary the selection p-value threshold from a stringent one (say $10^{-8}$) to a relaxed one (say $10^{-2}$), both when detecting multiple modes and when estimating causal effects.
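A threshold sweep of this kind can be sketched as follows. The example assumes, for illustration only, that instruments are selected on an independent selection GWAS, that all SNPs share one standard error, that $\theta = 0$, and that there is no pleiotropy. Under those assumptions the non-robust profile likelihood has a closed-form maximiser (orthogonal regression through the origin), which is used here as a simplified stand-in for GRAPPLE's estimator and compared with a plain IVW-type estimate.

```python
import math
import random

def two_sided_p(z):
    """Two-sided normal p-value for a z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def ivw(g, G):
    """Regression of Gamma_hat on gamma_hat through the origin (equal standard errors)."""
    return sum(x * y for x, y in zip(g, G)) / sum(x * x for x in g)

def profile_mle(g, G):
    """Maximiser of -sum (G_j - b*g_j)^2 / (1 + b^2): equal SEs, theta = 0, no pleiotropy."""
    s_gg = sum(x * x for x in g)
    s_GG = sum(y * y for y in G)
    s_gG = sum(x * y for x, y in zip(g, G))
    d = s_GG - s_gg
    return (d + math.sqrt(d * d + 4.0 * s_gG ** 2)) / (2.0 * s_gG)

def threshold_sweep(sel_hat, gamma_hat, Gamma_hat, se, thresholds):
    """Select SNPs at each p-value threshold and re-estimate (assumes some SNP passes)."""
    rows = []
    for thr in thresholds:
        idx = [j for j, s in enumerate(sel_hat) if two_sided_p(s / se) < thr]
        g = [gamma_hat[j] for j in idx]
        G = [Gamma_hat[j] for j in idx]
        rows.append((thr, len(idx), ivw(g, G), profile_mle(g, G)))
    return rows

# Toy polygenic trait (hypothetical numbers): many SNPs with small effects on X.
rng = random.Random(3)
beta, se, p = 0.4, 0.01, 5000
sel_hat, gamma_hat, Gamma_hat = [], [], []
for _ in range(p):
    gamma = rng.gauss(0.0, 0.02)
    sel_hat.append(gamma + rng.gauss(0.0, se))        # selection GWAS
    gamma_hat.append(gamma + rng.gauss(0.0, se))      # exposure GWAS
    Gamma_hat.append(beta * gamma + rng.gauss(0.0, se))  # outcome GWAS

results = threshold_sweep(sel_hat, gamma_hat, Gamma_hat, se, [1e-8, 1e-6, 1e-4, 1e-2])
for thr, n, b_ivw, b_pl in results:
    print(f"p < {thr:.0e}: {n:5d} SNPs, IVW = {b_ivw:.3f}, profile likelihood = {b_pl:.3f}")
```

As the threshold is relaxed, more and weaker instruments enter. In this toy setting the IVW-type estimate is attenuated toward zero by the measurement error in $\hat\gamma_j$, whereas the profile likelihood estimate should remain close to $\beta$ and become more precise.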


[1] Url: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1009575

(C) Plos One. "Accelerating the publication of peer-reviewed science."
Licensed under Creative Commons Attribution (CC BY 4.0)
URL: https://creativecommons.org/licenses/by/4.0/

