(C) PlosOne
This story was originally published on plosone.org. The content has not been altered[1]
Licensed under Creative Commons Attribution (CC BY) license.
URL: https://journals.plos.org/plosone/s/licenses-and-copyright
--------------------
High-throughput framework for genetic analyses of adverse drug reactions using electronic health records
Authors: Neil S. Zheng (Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America); Cosa. Stone (Division of Allergy, Pulmonary, Critical Care Medicine)
Understanding the contribution of genetic variation to drug response can improve the delivery of precision medicine. However, genome-wide association studies (GWAS) for drug response are uncommon and are often hindered by small sample sizes. We present a high-throughput framework to efficiently identify eligible patients for genetic studies of adverse drug reactions (ADRs) using “drug allergy” labels from electronic health records (EHRs). As a proof of concept, we conducted GWAS for ADRs to 14 common drugs/drug groups with 81,739 individuals from Vanderbilt University Medical Center’s BioVU DNA Biobank. We identified 7 genetic loci associated with ADRs at P < 5 × 10⁻⁸, including known genetic associations such as CYP2D6 and OPRM1 for CYP2D6-metabolized opioid ADRs. Additional expression quantitative trait loci and phenome-wide association analyses added evidence to the observed associations. Our high-throughput framework is both scalable and portable, enabling impactful pharmacogenomic research to improve precision medicine.
Adverse drug reactions are a considerable burden on the healthcare system. Genetic studies can improve our understanding of the pathophysiological mechanisms of adverse drug reactions but have been hindered by small sample sizes, because drug responses are recorded less often than physiological traits and common diseases. Here, we present a high-throughput framework to efficiently identify eligible patients for genetic studies of adverse drug reactions from electronic health records. We validated our approach by conducting genome-wide association studies for adverse reactions to 14 common drugs/drug groups with 81,739 individuals from Vanderbilt University Medical Center’s BioVU DNA Biobank, identifying 7 genetic loci associated with adverse drug reactions. Our high-throughput framework can enable impactful pharmacogenomic research to help develop clinical guidelines for delivering the right drug to the right person.
Funding: The study was supported by the National Institutes of Health under grant numbers R01 HL133786 (W-QW), R35 GM131770 (CMS), P50 GM115305 (JCD, EJP, DMR), and R01 HG010863 (EJP). CPC was also funded by grant R01 AR073764 and by Veterans Health Administration Merit Award 1I01CX001741. The datasets used for the analyses described were obtained from Vanderbilt University Medical Center's resources, the Synthetic Derivative, which are supported by institutional funding and by National Center for Advancing Translational Sciences grant 2UL1 TR000445-06 from NCATS/NIH. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Data Availability: Data cannot be shared publicly because they include confidential genetic and electronic health record data. Data are available from the VUMC Synthetic Derivative and BioVU DNA Biobank (contact via [email protected]) for researchers who meet the criteria for access to confidential data. Data have been deposited in NCBI dbGaP under accession number phs002306.v1.p1 (https://www.ncbi.nlm.nih.gov/gap/study/status/40059).
This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
Genome-wide association studies (GWAS) have contributed substantially to precision medicine, providing critical insights into the physiological and pathophysiological mechanisms of human complex traits and diseases. [1,2] However, fewer than 10% of published GWAS have focused on drug response. [3] Adverse drug reactions (ADRs) are a considerable burden on patients and healthcare systems as a major source of hospitalization, morbidity, and mortality. [4–7] The scarcity of pharmacogenomic GWAS of ADRs hinders our ability to deliver the right drug to the right person. [3,6–9]
A significant challenge for pharmacogenomic discovery is small sample size. [3,10] Drug response phenotypes, such as ADRs, are recorded less often than physiological traits and common diseases. [3,7,10] Traditional studies that recruit patient cohorts remain cumbersome and costly, and they usually have limited statistical power to detect genetic predictors with small effect sizes. [3,7,10] Biobanks linked to electronic health records (EHRs) can generate large datasets for efficient discovery and replication GWAS. [7,10,11] However, defining drug response using EHR data (i.e., pharmacological phenotyping) remains difficult. Unlike disease phenotypes, which can be represented with diagnostic codes, drug response information is often embedded in clinical notes, [11,12] complicating the development and implementation of uniform methods to extract drug response phenotypes. [11,13]
In this study, we investigated the feasibility of using the allergy section in EHRs to conduct high-throughput GWAS of reported ADRs. In routine practice, healthcare providers often use “drug allergy” labels in an allergy section to document a patient’s intolerance or allergy to a drug, as reported by the patient or observed by a healthcare provider. [14,15] Despite the “allergy” name, the documented information most clearly satisfies the definition of an ADR, which includes any noxious, unintended, or undesired effect of a drug experienced at normal therapeutic doses. [7,16] The ADR information in this section is meant to be reconciled at every patient encounter to capture new information. The allergy section is semi-structured (i.e., it has some structure but does not adhere to a rigorous format), which allows easy retrieval of adverse reaction information without sophisticated natural language processing and enables high-throughput analysis when linked to genetic data.
We hypothesized that drug allergy labels from the allergy sections in EHRs can be leveraged to efficiently identify reported ADRs. We developed and applied a high-throughput approach for identifying ADRs from allergy sections in EHRs from Vanderbilt University Medical Center’s (VUMC) Synthetic Derivative. Then, using VUMC’s BioVU DNA Biobank, [17] we conducted GWAS of ADRs to 14 drugs or drug classes in a subset of 67,323 individuals with self-reported European ancestry (EA), followed by trans-ethnic validation in 14,416 individuals with self-reported African ancestry (AA). Additional expression quantitative trait loci (eQTL) analyses and phenome-wide association analyses (PheWAS) were performed on the lead variants. [18,19]
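The allergy-section phenotyping described above can be illustrated with a minimal sketch. It assumes that each patient's allergy section is available as a list of free-text entries and uses simple case-insensitive word matching against drug groups. The helper names (`DRUG_GROUPS`, `allergy_tokens`, `classify`) and the control rule (a documented allergy section that mentions none of the group's drugs) are hypothetical simplifications, not the authors' published pipeline. The opioid group lists the four CYP2D6-metabolized prodrugs named in this article; the statin list is illustrative only.

```python
from __future__ import annotations

import re

# Hypothetical drug-group definitions. The opioid group uses the four
# CYP2D6-metabolized prodrugs named in the article; the statin list is illustrative.
DRUG_GROUPS: dict[str, set[str]] = {
    "codeine": {"codeine"},
    "cyp2d6_opioids": {"codeine", "hydrocodone", "oxycodone", "tramadol"},
    "any_statin": {"atorvastatin", "simvastatin", "rosuvastatin", "pravastatin"},
}


def allergy_tokens(entries: list[str]) -> set[str]:
    """Lower-case alphabetic tokens from all allergy-section entries of one patient."""
    tokens: set[str] = set()
    for entry in entries:
        tokens.update(re.findall(r"[a-z]+", entry.lower()))
    return tokens


def classify(entries: list[str], group: str) -> str | None:
    """Return 'case', 'control', or None when nothing is documented.

    Assumed rule: a case mentions any drug in the group; a control has a
    documented allergy section (including 'NKDA') that mentions none of them.
    """
    if not entries:
        return None
    if allergy_tokens(entries) & DRUG_GROUPS[group]:
        return "case"
    return "control"


# Hypothetical patients and their allergy-section entries
patients = {
    "p1": ["Tylenol #3 (codeine) - nausea"],
    "p2": ["Penicillin - rash"],
    "p3": ["NKDA"],
    "p4": [],
}
labels = {pid: classify(entries, "cyp2d6_opioids") for pid, entries in patients.items()}
print(labels)
```

Here p1 becomes a case, p2 and p3 become controls, and p4 is excluded. Plain word matching would miss brand names that do not spell out the ingredient, so a real pipeline would likely need a drug-name normalization step (for example, mapping entries to RxNorm ingredients) before matching.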
A summary of selected EHR characteristics for all individuals with available EHRs at VUMC and for the selected BioVU individuals is shown in Table 1. The BioVU cohort had a mean EHR length of 10.6 years, more than double the mean EHR length for all VUMC individuals (4.4 years). A greater proportion of BioVU individuals (95.3%) had information documented in their allergy section compared with all VUMC individuals (62.4%). Similarly, a greater proportion of BioVU individuals (63.0%) had at least one reported ADR compared with all VUMC individuals (28.6%). While the proportion of individuals with information documented in their allergy section was similar between EAs and AAs, the proportion with reported ADRs was greater among EAs than AAs, both for all VUMC individuals and for the BioVU cohort.
The most frequently documented ADRs were to penicillins (17.4%), sulfa drugs (11.6%), and codeine (9.1%). We selected the 10 drugs or drug classes most frequently reported in the allergy sections: penicillins, sulfa drugs, codeine, morphine, aspirin, lisinopril, levofloxacin, erythromycin, meperidine, and cephalexin. These top 10 were the same for EAs and AAs, with differences in ordering. We also observed that ADRs to statins as a class were reported frequently. Because this class of drugs shares a similar metabolic pathway, we defined ADRs to any statin for a grouped analysis and further broke these down into atorvastatin-only and simvastatin-only ADRs. Likewise, we grouped CYP2D6-metabolized opioid prodrugs, including codeine, hydrocodone, oxycodone, and tramadol, for a grouped analysis. [20] Cases and controls for the GWAS of the 14 adverse drug or drug group reactions are shown in Table 2. Types of adverse drug reactions for the 14 selected drugs or drug groups are summarized in S1 Table. The type of reaction is not always documented in the allergy section; the percentage missing ranges from 24.4% to 58.0%.
The genetic analyses in EAs identified genome-wide significant signals (P < 5 × 10⁻⁸) for 7 of the 14 adverse drug reactions. The lead variant for each signal is shown in Table 3, and additional correlated variants are reported in S2 Table. Trans-ethnic validation of the EA signals in the AA cohort yielded no significant findings (S3 Table). Genome-wide analyses in AA individuals were excluded from our primary analysis because the limited sample size could produce unstable point estimates and inflated false discovery rates. Nonetheless, significant ADR-genetic associations in AAs may be informative for future studies and are included in S4 Table.
The opioids shown in Fig 1 are prodrugs metabolized by CYP2D6 to morphine or morphine-like active metabolites. We identified a strong genome-wide significant association signal near the CYP2D6 gene for codeine and CYP2D6-metabolized opioid ADRs (Fig 1). Near the CYP2D6 locus, the minor allele of rs9620007 (G) was associated with reduced risk of codeine ADRs (odds ratio [OR] = 0.84; 95% confidence interval [CI] = 0.79 to 0.89) and CYP2D6-metabolized opioid ADRs (OR = 0.86; 95% CI = 0.82 to 0.90). The nearby variant rs739296 (A) was also associated with reduced risk of CYP2D6-metabolized opioid ADRs (OR = 0.86; 95% CI = 0.83 to 0.90) and, specifically, of nausea/vomiting reactions to CYP2D6-metabolized opioids (OR = 0.80; 95% CI = 0.74 to 0.86). We also found a significant association between OPRM1 and CYP2D6-metabolized opioid ADRs: individuals carrying the minor allele of the lead variant rs62436463 (T) were less likely to have a reported ADR (OR = 0.84; 95% CI = 0.79 to 0.90). Notably, the minor allele of the exonic variant rs1799971 (G) in OPRM1, which is in high linkage disequilibrium (LD) with rs62436463, was also associated with reduced risk of CYP2D6-metabolized opioid ADRs (OR = 0.86; 95% CI = 0.82 to 0.91).
Fig 1. A) Manhattan plots of genome-wide association studies (GWAS) for codeine (left) and CYP2D6-metabolized opioid (right) adverse drug reactions (ADRs). Red lines show the genome-wide significance level (P < 5.0 × 10⁻⁸). B) CYP2D6 locus for CYP2D6-metabolized opioid ADRs. SNPs are colored according to their linkage disequilibrium (LD, based on the 1000 Genomes phase 3 EUR reference panel) with the lead variant rs739296 (22:42389948), which is marked with a purple diamond. The lead variant rs9620007 (22:42405657) for codeine ADRs is also labeled. The dotted gray line shows the genome-wide significance level (P < 5.0 × 10⁻⁸).
For meperidine ADRs, the analysis revealed a genome-wide significant association signal upstream of PTHLH and two significantly associated variants in FIPL1 and SERINC5 (Fig 2A). We also identified a genome-wide significant signal in the major histocompatibility complex (MHC) region for penicillin ADRs (Fig 2B). The minor allele of the lead variant rs115200108, located between HLA-B and MICA, was significantly associated with increased risk of penicillin ADRs (OR = 1.30; 95% CI = 1.21 to 1.39).
Using data from the Genotype-Tissue Expression (GTEx) project, we evaluated the correlation between the lead variants of the genetic loci identified by GWAS and the expression levels of putative target genes. For CYP2D6-metabolized opioid ADRs, the A allele of the lead variant rs739296 in the CYP2D6 locus was most significantly associated with decreased WBP2NL expression in adipose tissue (normalized effect size [NES] = -0.33; P = 1.9 × 10⁻¹¹) and increased CYP2D6 expression in brain tissue (NES = 0.55; P = 5.3 × 10⁻¹¹). The T allele of the lead variant rs62436463 and the G allele of the exonic variant rs1799971 in OPRM1 were both associated with higher OPRM1 expression in the cerebellum, with NES of 0.70 (P = 9.5 × 10⁻⁸) and 0.63 (P = 1.4 × 10⁻⁷), respectively.
The A allele of the lead variant rs11049274 in PTHLH for meperidine ADRs was significantly associated with increased PTHLH expression in musculoskeletal tissue (NES = 0.28; P = 6.5 × 10⁻⁵). The A allele of rs115200108 for penicillin ADRs was most significantly associated with higher MIR6891 expression in adipose tissue (NES = 1.3; P = 2.0 × 10⁻¹³) and with reduced MICA expression in whole blood (NES = -0.72; P = 2.0 × 10⁻¹³).
To compare our framework with the ability of diagnosis codes to identify ADRs, we performed PheWAS of the lead variants from the identified genetic loci (CYP2D6, OPRM1, PTHLH, HLA/MICA) (S1 Fig). The lead variant rs115200108 in the HLA/MICA risk locus was associated with increased risk of ‘Poisoning by antibiotic’ (OR = 2.37; 95% CI = 1.90 to 2.84; P = 3.0 × 10⁻⁴) but did not reach phenome-wide significance (P < 5.0 × 10⁻⁵).
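The odds ratios and 95% confidence intervals reported in these results can be converted back into an approximate log-odds effect size, standard error, and P value, which is a quick consistency check against the genome-wide threshold of 5 × 10⁻⁸. The sketch below assumes a Wald interval on the log-odds scale, CI = exp(β ± 1.96·SE), which is standard for logistic-regression GWAS but is not stated in this article. Because the reported OR and CI are rounded to two decimals, the result is only approximate. The function name is hypothetical.

```python
from __future__ import annotations

import math

GENOME_WIDE_P = 5e-8  # genome-wide significance threshold used in this study


def stats_from_or_ci(odds_ratio: float, ci_low: float, ci_high: float,
                     z_crit: float = 1.959964) -> tuple[float, float, float, float]:
    """Approximate beta, SE, z, and two-sided P from an OR and its 95% CI.

    Assumes the CI is a Wald interval on the log-odds scale:
    CI = exp(beta +/- z_crit * SE).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal tail probability
    return beta, se, z, p


# Reported values for rs9620007 and codeine ADRs: OR = 0.84 (95% CI 0.79 to 0.89)
beta, se, z, p = stats_from_or_ci(0.84, 0.79, 0.89)
print(f"beta={beta:.3f}  SE={se:.4f}  z={z:.2f}  P~{p:.1e}  below 5e-8: {p < GENOME_WIDE_P}")
```

For rs9620007, this gives a z statistic of roughly -5.7 and an approximate P on the order of 10⁻⁸, below the genome-wide threshold, consistent with the reported signal.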
Discussion
In this study, we present a high-throughput and scalable approach to conducting large-scale, genome-wide analyses of adverse drug reactions. Our framework can be adapted or shared between institutions, facilitating collaboration between sites. Using EHRs allowed us to study ADRs in individuals with diverse clinical and ethnic backgrounds under the conditions of routine clinical care. As shown in this study, which clinical observations or patient-reported details physicians choose to document as drug allergies in the EHR, and how they document them, may provide useful information. In addition, our results demonstrate the potential of EHRs and our framework to efficiently generate pharmacogenomic findings, which can provide insights for optimizing drug therapy with maximal efficacy and minimal adverse effects.
We found that 28.6% of individuals at VUMC had at least one drug listed in the allergy section of their EHRs. This is consistent with other studies that have reported 20% to 35% of their populations having at least one drug allergy label in their EHRs. [14,21] The genotyped BioVU cohort is a patient cohort (i.e., it receives more frequent medical care than the general population) and has denser EHR data, which may explain the higher proportion of the BioVU cohort (66.0%) that reported at least one ADR. We also observed a lower proportion of reported ADRs among AAs than EAs, consistent with a previous report. [14] As noted in that study, the difference in reported ADRs between AAs and EAs may reflect a documentation bias that has been reported in other clinical domains. [14]
Using our ADR case-control definitions, our analyses identified genetic loci for 7 of the 14 selected drug/drug group allergies. Variants in two well-known genetic loci, CYP2D6 and OPRM1, were associated with reduced risk of CYP2D6-metabolized opioid ADRs. The analysis of eQTL data from the GTEx project showed that variants in the CYP2D6 locus and in OPRM1 were associated with elevated expression of these genes in the brain. [22] Previous studies have implicated both loci in opioid response and metabolism. [23–25] Notably, an independent report on variants associated with reduced risk of opioid-induced vomiting in a 23andMe cohort supports our finding that the minor alleles of rs9620007 near CYP2D6 and rs1799971 in OPRM1 were associated with reduced risk of CYP2D6-metabolized opioid ADRs. [26] Furthermore, our analysis of CYP2D6-metabolized opioid-related nausea or vomiting identified the same locus near CYP2D6 as associated with reduced risk. However, CYP2D6 metabolic activity also varies greatly with copy number variation, [23] which was not available for this study, so further work is needed to better understand the contributions of genetic variation to CYP2D6-metabolized opioid ADRs. Additionally, studies have reported that patients carrying the G allele of rs1799971 in OPRM1 required higher opioid doses for pain relief. [27,28] It is possible that patients carrying the minor allele of the significant OPRM1 variants experienced reduced opioid effectiveness, which may affect their opioid sensitivity and their risk of adverse reactions depending on the opioid dose.
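The statement that rs1799971 is in high LD with the lead variant rs62436463 refers to the correlation between the two variants across individuals. When two variants are highly correlated, their association signals largely reflect one underlying signal. A minimal sketch of one common estimate, the squared Pearson correlation of unphased allele dosages, is shown below. The genotype vectors are made up for illustration. This is not necessarily how LD was computed in this study, whose regional plots use the 1000 Genomes phase 3 EUR reference panel.

```python
from __future__ import annotations


def dosage_r2(g1: list[float], g2: list[float]) -> float:
    """Squared Pearson correlation between allele dosages (0, 1, 2) of two variants.

    A common approximation of LD r^2 when haplotype phase is unknown.
    """
    n = len(g1)
    if n != len(g2) or n < 2:
        raise ValueError("need two dosage vectors of equal length >= 2")
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    var1 = sum((a - m1) ** 2 for a in g1)
    var2 = sum((b - m2) ** 2 for b in g2)
    if var1 == 0 or var2 == 0:
        raise ValueError("monomorphic variant: r^2 is undefined")
    return cov * cov / (var1 * var2)


# Hypothetical minor-allele dosages for eight individuals at two nearby variants
lead = [0, 1, 2, 1, 0, 0, 1, 2]
nearby = [0, 1, 2, 1, 0, 1, 1, 2]
print(round(dosage_r2(lead, nearby), 3))
```

In this toy example the two variants disagree in only one individual, so r² is high (about 0.82).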
We also identified HLA-MICA as a risk locus for penicillin ADRs, which is supported by a recent large-scale genetic analysis of penicillin allergy that included data from UK Biobank, Estonian Biobank, and BioVU. That study also showed a strong association between penicillin allergy labels and the HLA-MICA region, with a different lead variant. [29] The eQTL analysis showed that the minor allele of the lead variant rs115200108 in the HLA-MICA risk locus for penicillin ADRs was associated with reduced MICA expression in whole blood. In the PheWAS, the minor allele of rs115200108 was associated with increased risk of ‘Poisoning by antibiotic’ but did not reach phenome-wide significance. This finding suggests that our approach not only captures ADR phenotypes that are not covered by diagnosis codes but may also provide more power for genetic analyses than diagnosis codes alone.
There have been no previous studies of associations between PTHLH and meperidine allergy. In our eQTL analysis, the lead variant in the PTHLH risk locus for meperidine allergy was associated with increased PTHLH expression in musculoskeletal tissue; however, further investigation is needed to confirm this finding. Trans-ethnic validation among individuals with self-reported African ancestry did not replicate any genome-wide significant associations, but this analysis may have been limited by the smaller sample size. In addition, we performed genetic imputation with reference panels from the Haplotype Reference Consortium, which were developed predominantly with individuals of European ancestry and therefore may not be adequate for individuals with self-reported African ancestry. [30] The genome-wide analyses in the African ancestry cohort were limited by the same factors: small sample size and predominantly European-ancestry reference panels. Further improvements in ADR documentation and genetic reference panels, as well as the continued growth of EHR data, may help determine the generalizability of these findings in diverse populations. Because of its high-throughput nature, our framework should be readily adaptable to other large multi-ancestry EHR-based biobanks for future analyses.
There are several additional limitations to this study and approach. Drug allergy labels are entered into the allergy section by healthcare providers, but this information is often self-reported by patients or subject to interpretation by the person receiving the information and entering the data, introducing potential documentation or selection bias. For instance, patients who communicate with their healthcare providers more frequently, whether because of their specific conditions or because of socio-behavioral factors, may be more likely to report their adverse drug reactions. A better understanding of the factors that affect the likelihood of receiving a drug allergy label may improve our ability to use EHRs to study ADRs. Additionally, some controls were likely misclassified. Controls who were exposed to the drug and experienced an adverse reaction may not have reported the reaction to their clinician. Similarly, controls who were never exposed to the drug and had only the “no known drug allergy” label may experience an adverse reaction when exposed. However, misclassification of cases as controls most likely biases the results toward the null and leads to an underestimation of the true contribution of genetic variation to ADRs.
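The claim that misclassifying cases as controls biases results toward the null can be checked with a small worked example. The sketch assumes that under-documentation is nondifferential: a fraction s (`sensitivity`) of true ADR cases is recorded regardless of genotype, and the remaining true cases are counted as controls. The risks r0 (non-carriers) and r1 (carriers) are hypothetical. Under these assumptions the carrier frequency cancels, and the expected observed odds ratio is (r1·s / (1 − r1·s)) / (r0·s / (1 − r0·s)).

```python
def expected_or(r0: float, r1: float, sensitivity: float) -> float:
    """Expected observed OR when only a fraction `sensitivity` of true cases is documented.

    r0, r1: true ADR risk in non-carriers and carriers (hypothetical values).
    Undocumented cases are counted as controls independently of genotype,
    so the carrier frequency cancels out of the odds ratio.
    """
    if not 0 < sensitivity <= 1:
        raise ValueError("sensitivity must be in (0, 1]")
    p0 = r0 * sensitivity  # probability that a non-carrier is a documented case
    p1 = r1 * sensitivity  # probability that a carrier is a documented case
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


r0, r1 = 0.20, 0.15  # hypothetical protective variant; risk ratio r1 / r0 = 0.75
for s in (1.0, 0.75, 0.5, 0.25):
    print(f"sensitivity={s:.2f}  observed OR={expected_or(r0, r1, s):.3f}")
```

With these values, the true OR is about 0.71. Lower documentation sensitivity moves the observed OR toward the risk ratio of 0.75, and therefore toward 1, so the effect is underestimated. If reporting depended on genotype, this guarantee would no longer hold.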
While a drug allergy label in the allergy section is consistent with a previous adverse reaction to the drug, more detailed questioning often reveals that a true allergy is less certain. [15,31] For instance, the vast majority of patients labeled as having a penicillin allergy were typically labeled in childhood. [32–34] Studies in allergy practice show that >95% of these individuals who undergo validated skin testing and challenge will tolerate penicillin, in part because this allergic response wanes over time. [31] Our analysis did not consider the possibility that patients had lost their allergic tendency and been delabeled for a drug allergy, so our results should be interpreted as ‘ever or never’ having reported an adverse reaction to a drug. Indeed, it is more challenging to capture specific details in the EHR when identifying individuals who ever had a penicillin allergy label, rather than those who currently have a penicillin allergy.
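The distinction between an ‘ever’ and a ‘current’ drug allergy label can be made concrete with a short sketch. It assumes a hypothetical dated log of additions and removals (for example, delabeling after testing) in a patient's allergy section. Whether such a log is available depends on the EHR system, and the function names are illustrative. The ‘ever’ definition corresponds to how the results of this study should be interpreted.

```python
from __future__ import annotations

from datetime import date

# Hypothetical audit trail of allergy-section changes for one patient:
# (date, action, drug), where action is "add" or "remove" (e.g., delabeling).
events: list[tuple[date, str, str]] = [
    (date(1995, 6, 1), "add", "penicillin"),
    (date(2018, 3, 9), "remove", "penicillin"),
    (date(2019, 1, 2), "add", "codeine"),
]


def ever_labeled(events: list[tuple[date, str, str]], drug: str) -> bool:
    """'Ever' phenotype: the drug was added to the allergy section at any time."""
    return any(action == "add" and d == drug for _, action, d in events)


def currently_labeled(events: list[tuple[date, str, str]], drug: str) -> bool:
    """'Current' phenotype: replay events in date order; the last action wins."""
    labeled = False
    for _, action, d in sorted(events):
        if d == drug:
            labeled = action == "add"
    return labeled


for drug in ("penicillin", "codeine"):
    print(drug, ever_labeled(events, drug), currently_labeled(events, drug))
```

In this example, penicillin counts toward the ‘ever’ phenotype even though the label was later removed, whereas a ‘current’ definition would exclude it.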
We also observed that clinicians often do not enter information in the allergy section in a standardized manner, especially in older EHRs. Drug allergies and drug intolerances are frequently documented together in the allergy section without clear distinction. In addition, allergy section entries often omit details such as severity, type of reaction (e.g., anaphylaxis vs. rash), specific dose, and time of administration, limiting nuanced analyses. Although the CYP2D6-metabolized opioid nausea/vomiting findings demonstrate that our framework can extract more detailed ADR phenotypes, the frequency of missing reaction information hinders high-throughput analyses of specific adverse effects. Because of this, the high-throughput nature of our framework means that our genetic analyses were likely driven by milder, more frequent reactions (e.g., rash from penicillin) rather than by rarer phenotypes such as Stevens-Johnson syndrome. Consequently, genetic variants identified with our framework need further follow-up to better understand the potential risks of a medication for a patient. For instance, labeling a patient as broadly ‘at risk’ for an ADR may lead to suboptimal therapy even if the reaction is a common, expected side effect.
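The effect of missing reaction types on reaction-specific analyses, such as the CYP2D6-metabolized opioid nausea/vomiting analysis, can be sketched as follows. The entries, group names, and matching terms are hypothetical. In this study, the percentage of entries without a documented reaction type ranged from 24.4% to 58.0% across drug groups (S1 Table). In a sketch like this one, entries without a reaction type still count toward the drug-level phenotype but cannot enter a reaction-specific one.

```python
from __future__ import annotations

# Hypothetical allergy-section entries: (drug group, documented reaction or None)
entries: list[tuple[str, str | None]] = [
    ("cyp2d6_opioids", "nausea"),
    ("cyp2d6_opioids", None),
    ("cyp2d6_opioids", "vomiting"),
    ("cyp2d6_opioids", "itching"),
    ("penicillins", "rash"),
    ("penicillins", None),
]

# Hypothetical terms defining a nausea/vomiting sub-phenotype
NAUSEA_VOMITING = {"nausea", "vomiting", "nausea/vomiting"}


def percent_missing(entries: list[tuple[str, str | None]], group: str) -> float:
    """Percent of a drug group's entries with no documented reaction type."""
    reactions = [r for g, r in entries if g == group]
    if not reactions:
        raise ValueError(f"no entries for group {group!r}")
    return 100 * sum(r is None for r in reactions) / len(reactions)


def count_reaction(entries: list[tuple[str, str | None]], group: str, terms: set[str]) -> int:
    """Count a group's entries whose documented reaction matches one of `terms`."""
    return sum(1 for g, r in entries if g == group and r is not None and r.lower() in terms)


print(percent_missing(entries, "cyp2d6_opioids"))
print(count_reaction(entries, "cyp2d6_opioids", NAUSEA_VOMITING))
```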
These observations highlight the need to capture more accurate and relevant drug response information. Our framework will yield better outcomes as newer EHR systems introduce more explicit semantic meaning (e.g., allergy vs. intolerance), structured inputs and questionnaires (e.g., drop-down menus or checkboxes), [15] and more high-quality data in the allergy section. Although these improvements require time and planning, it is encouraging that our current study, despite these limitations, successfully identified several known genetic associations for ADRs.
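As an illustration of the kind of structured input described above, the sketch below defines a hypothetical allergy-section record that separates allergies from intolerances and records reactions, severity, and onset. The field and class names are assumptions. They are loosely inspired by the type (allergy or intolerance) and reaction severity elements of the HL7 FHIR AllergyIntolerance resource and are not taken from any specific EHR product.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from enum import Enum


class EntryType(Enum):
    ALLERGY = "allergy"
    INTOLERANCE = "intolerance"


class Severity(Enum):
    MILD = "mild"
    MODERATE = "moderate"
    SEVERE = "severe"


@dataclass(frozen=True)
class AllergyEntry:
    """Hypothetical structured allergy-section entry."""
    drug: str
    entry_type: EntryType
    reactions: tuple[str, ...] = ()
    severity: Severity | None = None
    onset: date | None = None

    def is_complete(self) -> bool:
        """True if both the reaction and its severity are documented."""
        return bool(self.reactions) and self.severity is not None


entries = [
    AllergyEntry("penicillin", EntryType.ALLERGY, ("hives",), Severity.MODERATE, date(2001, 5, 4)),
    AllergyEntry("codeine", EntryType.INTOLERANCE, ("nausea",)),
]
intolerances = [e.drug for e in entries if e.entry_type is EntryType.INTOLERANCE]
complete = [e.drug for e in entries if e.is_complete()]
print(intolerances, complete)
```

With explicit fields like these, an analysis could restrict cases to true allergies or to a specific reaction without parsing free text, and could measure documentation completeness directly.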
In summary, our results demonstrate the utility and efficacy of a high-throughput framework for identifying ADRs and eligible individuals from EHRs for large-scale studies. Our approach is scalable and portable and can help accelerate impactful pharmacogenomic research for advancing precision medicine.
[1] URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1009593
via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/plosone/