(C) PLOS Digital Health
This story was originally published by PLOS Digital Health and is unaltered.



Development and preliminary testing of Health Equity Across the AI Lifecycle (HEAAL): A framework for healthcare delivery organizations to mitigate the risk of AI solutions worsening health inequities [1]

Jee Young Kim (Duke Institute for Health Innovation, Duke Health, Durham, North Carolina, United States of America), Alifia Hasan, Katherine C. Kellogg (Sloan School of Management, Massachusetts Institute of Technology)

Date: 2024-08

The use of data-driven technologies such as Artificial Intelligence (AI) and Machine Learning (ML) is growing in healthcare. However, the proliferation of healthcare AI tools has outpaced regulatory frameworks, accountability measures, and governance standards needed to ensure safe, effective, and equitable use. To address these gaps and tackle a common challenge faced by healthcare delivery organizations, a case-based workshop was organized, and a framework was developed to evaluate the potential impact of implementing an AI solution on health equity. The Health Equity Across the AI Lifecycle (HEAAL) framework was co-designed with extensive engagement of clinical, operational, technical, and regulatory leaders across healthcare delivery organizations and ecosystem partners in the US. It assesses five equity assessment domains (accountability, fairness, fitness for purpose, reliability and validity, and transparency) across eight key decision points in the AI adoption lifecycle. It is a process-oriented framework containing, in total, 37 step-by-step procedures for evaluating an existing AI solution and 34 procedures for evaluating a new AI solution. Within each procedure, it identifies the key stakeholders and data sources needed to carry the procedure out. HEAAL guides healthcare delivery organizations in mitigating the risk of AI solutions worsening health inequities, and it indicates the resources and support required to assess the potential impact of AI solutions on health inequities.

In healthcare, the use of data-driven technologies such as Artificial Intelligence (AI) and Machine Learning (ML) is increasing. However, the lack of robust regulations and standards poses a challenge to their safe and equitable use. To bridge this gap, we brought together healthcare leaders from various backgrounds in a workshop and developed the Health Equity Across the AI Lifecycle (HEAAL) framework. HEAAL evaluates how the use of AI might affect health equity. It examines five crucial domains—accountability, fairness, fitness for purpose, reliability and validity, and transparency—across eight key decision points in the AI adoption process. HEAAL offers tailored procedures for assessing both existing and new AI solutions, along with relevant stakeholders and data sources. By providing step-by-step guidance, HEAAL empowers healthcare delivery organizations to comprehend and mitigate the risk of AI exacerbating health inequities.

Funding: This work was supported by the Gordon and Betty Moore Foundation (Grant # 10849 to JYK, AH, SM, HS, AV, DEV, MAL, MP, IDR, and MPS). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2024 Kim et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

To address these gaps and tackle a common challenge faced by healthcare delivery organization leaders, we, the Health AI Partnership (HAIP), organized a case-based workshop [36] and developed a framework to assess how the use of an AI solution might affect health equity. In the present research, we define health equity as the attainment of optimal health for all people regardless of race, ethnicity, disability, sexual orientation, gender identity, socioeconomic status, geography, preferred language, and other factors that may affect access to care and health outcomes [37]. This manuscript offers a comprehensive overview of the development and testing of the framework, which is designed specifically for leaders in healthcare delivery organizations. We named this framework Health Equity Across the AI Lifecycle (HEAAL). We aim to (1) provide a detailed overview of the procedures in the framework, along with relevant data sources and stakeholders, and (2) describe in detail the participatory design research methodologies used to develop the framework, to inform future stakeholder engagement efforts.

Our prior work, in which we interviewed 89 individuals from 10 US healthcare delivery organizations and ecosystem partners [35], revealed that healthcare delivery leaders find it challenging to identify and objectively measure the potential impact of an AI product on health inequities. Even though the interviewees included 13 AI ethics and bias experts, we were not able to reach a consensus on the best approaches to assess AI products for potential impacts on health inequities.

Numerous academic papers have surfaced potential causes of bias in AI products, including lack of representation and diversity in model training data [18–20], lack of sufficient historical data to build an accurate model [21], outlier events with unprecedented data [22], bias captured in specific data measurements [23, 24], bias captured in unstructured text [25, 26], bias embedded within outcome labels used to train models [11, 12], and models learning shortcuts unrelated to the disease process to generate diagnostic predictions [27, 28]. Numerous reviews and frameworks have described categories of bias in AI products and proposed steps to address them [29–34]. To date, however, there has been no comprehensive set of actionable procedures spanning the AI product lifecycle that healthcare delivery organization leaders can adopt internally to mitigate the risk of AI products worsening health inequities.

However, the proliferation of healthcare AI tools has outpaced regulatory frameworks, accountability measures, and governance standards needed to ensure safe, effective, and equitable use [3, 9, 10]. Past research has documented numerous incidents in which healthcare AI technologies perpetuated bias and inequities [11–13]. To address this issue, in 2022 and 2023, government officials from the White House [14], the HHS Office for Civil Rights [15], the Office of the National Coordinator for Health Information Technology (ONC) [16], and the Office of the Attorney General in California [17] took action to protect against healthcare AI worsening inequities. While these regulatory actions describe what harms to avoid, they leave significant room for interpretation of how healthcare delivery organizations should implement these principles.

The use of data-driven technologies such as Artificial Intelligence (AI) and Machine Learning (ML) is growing in healthcare. These technologies can be valuable tools for streamlining clinical workflows, aiding clinical decision-making, and improving clinical operations [1–4]. For example, the integration of AI and ML in healthcare supports the detection and management of sepsis [5], the prevention of unanticipated intensive care unit transfers [6], and the automated calculation of left ventricular ejection fraction [7]. AI and ML can promote earlier detection of diseases, more consistent collection and analysis of medical data, and greater access to care [8].

During the Deliver phase, the final prototype was refined and prepared for dissemination. Design researchers incorporated all remaining feedback into the prototype and generated the first version of the framework, which was named Health Equity Across the AI Lifecycle (HEAAL). HEAAL was then shared with the two other case study teams, from NYP and PCCI, which plan to apply HEAAL in evaluating their postpartum depression and patient segmentation algorithms and to publish their findings.

Responses to guiding questions were gathered and synthesized to create the initial prototype, which contained procedures for evaluating six health equity assessment domains. After initial testing by a case study team, this prototype evolved into the second prototype, which was structured around eight key decision points of AI adoption and tested again by the case study team. It was then shared with the framework developers and the HAIP leadership team for feedback and evaluation.

The second prototype was also shared with the framework developers and the HAIP leadership team for review. One major concern was that the framework did not sufficiently describe procedures related to one of the assessment domains, "policy and regulation." HAIP leaders with regulatory expertise cautioned that engaging regulatory stakeholders in some procedures was not sufficient to assess the policy and regulation domain. Fig 3 shows how prototypes were developed during the Develop phase.

The case study team was satisfied with the updated structure of the framework. They liked how the procedures flowed sequentially from the beginning to the end of the AI lifecycle. The project manager reported that, while the framework demanded substantial effort, it remained manageable to navigate, and found it particularly helpful for understanding potential gaps in algorithms. The data scientists provided additional feedback on how the assessment could be conducted more efficiently: they suggested rearranging some procedures into a different sequential order and modifying the descriptions of others. They also suggested that, once each procedure is completed, users should understand how to interpret its outputs and what to do next.

A project manager and two data scientists from the DIHI case study team were recruited to test the usability of the second prototype. The team followed the procedures described in the framework to analyze the same pediatric sepsis prediction algorithm. With the updated content and structure of the framework, it was important to examine whether the framework addressed the pain points raised during the initial usability testing.

By incorporating feedback from the data scientists, design researchers generated the second prototype of the framework. The second prototype mapped procedures from the six domains of assessment to the HAIP eight key decision points of the AI product life cycle [35]. At this stage, each procedure was tagged with the relevant stakeholders to involve, the datasets required for analyses, and the health equity assessment domains it addressed, as sketched below.
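As a purely illustrative sketch (not an artifact of HEAAL or HAIP), tagged procedures of this kind could be represented as small records that are then listed sequentially by decision point; every field name and value below is a hypothetical assumption.

```python
# Hypothetical sketch of a tagged, HEAAL-style procedure record; the field
# names and example values are illustrative assumptions, not HEAAL content.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Procedure:
    decision_point: int  # 1-8, per the HAIP AI product life cycle
    domain: str          # e.g., "fairness", "accountability"
    description: str     # the step-by-step instruction text
    stakeholders: List[str] = field(default_factory=list)  # who to involve
    datasets: List[str] = field(default_factory=list)      # data required

procedures = [
    Procedure(
        decision_point=2,
        domain="fairness",
        description="Examine local retrospective data for identified inequities.",
        stakeholders=["data scientist", "clinician"],
        datasets=["local retrospective EHR extract"],
    ),
]

# Listing all procedures sequentially by decision point, as the data
# scientists recommended, then reduces to a simple sort.
procedures.sort(key=lambda p: (p.decision_point, p.domain))
```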

Another suggestion from the data scientists was to describe some of the procedures more concretely, with actionable guidance. For example, they requested that the framework explicitly state the personnel or resources required for each procedure. Similarly, they requested more detailed descriptions of the roles and responsibilities of individual decision-makers, advocating for statements like "seek approval from _____ stakeholder" instead of "engage _____ stakeholder."

A major suggestion from the data scientists was to restructure the framework. They found that some procedures were redundant across different assessment domains. These redundancies created inefficiencies, forcing the data scientists to move back and forth between assessment domains to repeat similar analyses. To address this issue, they recommended listing the procedures of all assessment domains sequentially, using the previously developed HAIP eight key decision points of the AI product life cycle [35].

Data scientists from the DIHI case study team tested the first prototype of the framework by applying its procedures to analyze a pediatric sepsis prediction algorithm. This process was essential to ensure that the framework was pragmatic and usable in practice. After the analysis, they reported its results, shared their experiences using the framework, and suggested areas for improvement.

Framework developers individually provided answers to each guiding question listed under the six domains of assessment. Design researchers consolidated the responses from all framework developers into a single document, organizing them sequentially to serve as procedures for assessing the concerns described in each guiding question. Framework developers and design researchers then iterated on the document together. The revised document became the first prototype of the framework, containing six assessment domains and a set of actionable procedures under each domain.

After the workshop, the design researchers reviewed the notes they had taken during the workshop and extracted key insights. They then mapped these insights onto the eight clusters of assessment domains that the framework developers had previously created. This activity ensured that novel ideas shared by workshop participants were incorporated into the framework's content.

Seventy-seven people with various domains of expertise from 10 healthcare delivery organizations and 4 ecosystem partners attended the workshop. Clinical, technical, operational, and regulatory stakeholders, as well as AI ethics experts, shared their perspectives on the workshop topic through different activities, as described in the accompanying Formal Comment [36]. Design researchers took notes on the discussions that took place during the workshop.

Six framework developers individually reviewed two case studies and were asked to identify the major domains of assessment, or concerns, that healthcare delivery organization leaders should evaluate when deciding whether an AI solution can be implemented into clinical practice safely, effectively, and equitably. For each domain of assessment, they were asked to describe it and to propose how it might be assessed and what data might be required. Design researchers compiled the responses from all framework developers in a single place and asked the framework developers to cluster similar ideas together. Ultimately, this activity resulted in the creation of eight unique clusters.

A total of three case studies were curated. A Duke Institute for Health Innovation (DIHI) team developed an initial example case study for a pediatric sepsis prediction algorithm. This case study was not presented at the workshop but was used to illustrate the case study format to the other teams. Teams from NewYork-Presbyterian (NYP) and Parkland Center for Clinical Innovation (PCCI) then curated case studies for postpartum depression and patient segmentation algorithms, respectively, using the structure provided by the DIHI team [38, 39]. The case studies served as real-world examples to facilitate ideation and discussion among workshop participants. More information about the workshop is presented in the accompanying Formal Comment [36].

The present research was considered a quality improvement (QI) project that did not involve human subjects research. Thus, it was exempted from IRB review and approval at Duke University Health System. All participants provided verbal consent to participate in the co-design processes and to have anonymized data used in analyses.

HEAAL was collaboratively designed through extensive engagement with clinical, operational, technical, and regulatory leaders from healthcare delivery organizations and ecosystem partners in the US (Fig 1). Three innovation teams were recruited as case study teams; they curated case studies and presented them at the workshop. Seventy-seven representatives from ten healthcare delivery organizations and four ecosystem partners participated in the workshop and shared their experiences of adopting AI within their respective settings. Six framework developers (a clinician, a community representative, a computer scientist, a legal and regulatory expert, a project manager, and a sociotechnical scholar) were recruited to create a scaffolding for the framework and develop its procedures. Eight HAIP leaders with clinical, community engagement, computer science, operational, and regulatory expertise evaluated the framework and provided feedback. Three design researchers facilitated the framework design process by collecting and synthesizing data from all other participants, refraining from generating data themselves. The design involved two rounds of divergent and convergent processes across four phases: discover, define, develop, and deliver (Fig 2).

Results

HEAAL, presented in the supporting information (S1 Appendix and S2 Appendix), was established after a series of activities: curating case studies, surfacing domains of assessment, hosting a workshop, synthesizing insights, developing two prototypes, conducting two rounds of usability testing, and gathering feedback. Over the course of seven months, clinical, technical, operational, and regulatory stakeholders and AI ethics experts from healthcare delivery organizations and ecosystem partners contributed a great deal of time and effort to these framework development activities.

Five domains of assessment

HEAAL addresses five health equity assessment domains: (1) accountability, (2) fairness, (3) fitness for purpose, (4) reliability and validity, and (5) transparency.

Accountability refers to the principle of holding individuals, organizations, or systems responsible for their actions, decisions, and the outcomes of the proposed AI solution. This assessment domain entails overseeing potential substantial adverse impacts that may arise after the solution is integrated, identifying the stakeholders responsible for managing and controlling the solution throughout its lifecycle, and developing plans for continuous monitoring. It highlights the role of a governance committee or designated stakeholders within a healthcare delivery organization who oversee the risk of potential negative consequences arising from use of the solution. It suggests that the governance committee or designated stakeholders should have a clear understanding of the legal and internal policy constraints with which the solution must comply and should proactively develop intervention plans. Additionally, they should devise strategies for ongoing monitoring, feedback, and evaluation. The assessment of accountability ensures that the solution remains adaptable to evolving circumstances and emerging health equity concerns, sustains safe performance, and continues to improve over time.

Fairness is defined as the ethical principle of treating individuals or groups impartially and without bias in the procurement, development, integration, and maintenance of the proposed AI solution. This assessment domain focuses on the equal allocation of resources and opportunities across different individuals or groups to prevent any unjust or discriminatory outcomes that may arise from use of the solution. It involves establishing and evaluating fairness criteria for the model's performance and its work environment. The assessment of fairness ensures that the solution performs equitably across disadvantaged and advantaged patient subgroups and helps healthcare delivery organizations track progress towards achieving equity objectives. By understanding the factors that contribute to potentially inequitable technical, clinical, and operational outcomes, the fairness assessment strives to mitigate existing disparities and to prevent new ones that may arise from adoption of the solution.

Fitness for purpose is defined as the extent to which the proposed AI solution is appropriate for solving the identified problem posed by the intended use. This assessment domain evaluates whether the solution aligns with the specific goals, requirements, and contexts for which it was designed and implemented. It involves defining the intended and unintended uses, constraints, and target population for the solution. It also encompasses evaluating the suitability of an ML model compared to a simpler heuristic model for addressing the problem at hand. The fitness for purpose assessment emphasizes engaging the solution's intended users and patient community members from the target population in the evaluation process. This active involvement ensures that the solution aligns not only with technical specifications but also with the broader goals and needs of its intended users, patient community members, and other relevant stakeholders within a specific context. Ultimately, the fitness for purpose assessment ensures that the solution is designed to address the identified problem comprehensively across disadvantaged and advantaged patient subgroups.

Reliability and validity refer to the performance of the proposed AI solution with respect to its consistency and accuracy. A reliable model produces consistent and reproducible output given the same or similar input data over multiple instances; reliability promotes confidence in the solution's performance. A valid model produces output that accurately measures or predicts the intended outcome of interest; validity ensures that the model measures what it is intended to measure, reflects the real-world phenomenon it is meant to represent, and addresses the specific problem it was designed for. The assessment of reliability and validity ensures that the solution consistently achieves pre-specified performance targets across technical and clinical measures.

Transparency is defined as the clarity and openness with which it is explained how the proposed AI solution is developed, integrated, and maintained. This assessment domain highlights the importance of comprehensive communication with users and other affected stakeholders, including members of disadvantaged and advantaged patient subgroups. Effective communication should go beyond providing details about the technical specifications of the model and its intended use; it should entail the disclosure of information about the potential harms, risks, limitations, and impacts associated with the solution. The assessment of transparency empowers users and other affected stakeholders to make informed decisions about using the solution and helps them make progress towards equity objectives.

Initially, "policy and regulation" emerged as a sixth health equity assessment domain. Throughout the co-design process, participants stressed the importance of healthcare delivery organizations adapting to the changing regulatory landscape. Ultimately, however, it was not included in HEAAL because there was no universal set of procedures that applied to diverse AI use cases across the US. Given the dynamic nature of regulations, the broad coverage of health equity assessment concerns within the framework, and the large number of jurisdiction-specific actions, HAIP leaders confirmed that no single set of procedures could adequately address policy and regulation across diverse AI use cases. For the time being, healthcare delivery organizations need to monitor federal and local regulators, including the offices of state Attorneys General and departments of health. A forum for streamlining and summarizing the evolving landscape may be needed so that healthcare delivery organizations have a go-to place to ensure compliance with federal and local policy and regulation. New procedures may need to be added to HEAAL to support healthcare delivery organizations seeking to comply with emerging regulations and policies.
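To make the fairness domain concrete, here is a minimal, assumption-laden sketch (not part of HEAAL itself) of how a data science team might compare model performance across patient subgroups; the column names, metrics, and decision threshold are all hypothetical choices for illustration.

```python
# Illustrative only: compare a binary classifier's performance across
# patient subgroups. Assumes a dataframe with an outcome column ("label"),
# a model-score column ("score"), and a subgroup column; these names and
# the 0.5 threshold are assumptions for this example.
import pandas as pd
from sklearn.metrics import recall_score, roc_auc_score

def subgroup_performance(df: pd.DataFrame, group_col: str,
                         y_true: str = "label", y_score: str = "score",
                         threshold: float = 0.5) -> pd.DataFrame:
    """Compute AUROC and sensitivity for each patient subgroup."""
    rows = []
    for group, g in df.groupby(group_col):
        y_pred = (g[y_score] >= threshold).astype(int)
        rows.append({
            group_col: group,
            "n": len(g),                                    # subgroup size
            "auroc": roc_auc_score(g[y_true], g[y_score]),  # discrimination
            "sensitivity": recall_score(g[y_true], y_pred), # true positive rate
        })
    result = pd.DataFrame(rows)
    # The between-group gap is one candidate fairness metric; the tolerable
    # gap must be pre-specified as part of the equity objectives.
    for metric in ("auroc", "sensitivity"):
        result[f"{metric}_gap"] = result[metric].max() - result[metric]
    return result
```

For example, subgroup_performance(predictions, "race_ethnicity") would return one row per subgroup together with its gap from the best-performing subgroup; which metric and which tolerable gap to use is exactly the kind of choice the fairness procedures ask teams to establish and document.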

Achieving health equity

HEAAL provides guidance for healthcare delivery organizations to assess the baseline level of health inequity, establish equity objectives for implementing the chosen AI solution, and evaluate progress towards these objectives across eight key decision points in the AI lifecycle.

The assessment of the baseline level of health inequity involves several procedures during the initial two decision points. First, it begins with an analysis of the current state of health inequity, conducted through a literature review on epidemiology and consultation with personnel who have a deep understanding of patient experiences, such as healthcare providers, patient navigators, and patient community members. Then, local retrospective healthcare data are scrutinized to determine whether the identified health inequities are present within the local healthcare delivery setting. The information obtained from both procedures is synthesized to compile a comprehensive list of health inequities and to identify disadvantaged patient subgroups.

Following the measurement of the current state of health inequity, the third decision point entails establishing equity objectives for implementing the AI solution in terms of both health and economic outcomes. These objectives may range from maintaining the current level of inequity to reducing it significantly. Defining the equity objectives involves identifying the fairness metrics most suitable for the AI product to attain the established goals, as well as documenting the rationale behind the selection of those metrics.

The pursuit of equity objectives progresses through the subsequent three decision points. The fourth decision point centers on solution design, which is informed by input from both end-users and members of disadvantaged patient subgroups. Engaging end-users helps make the solution accessible, inclusive, and usable by all, while involving members of disadvantaged patient subgroups uncovers the specific support they need to derive maximum benefit from the solution. The fifth decision point places a strong emphasis on evaluating the performance of the model using both retrospective and prospective data sourced from local healthcare providers; this entails a thorough evaluation of model performance against the fairness metrics across disadvantaged and advantaged patient subgroups. The sixth decision point focuses on the communication and education provided to end-users, members of disadvantaged patient subgroups, and other stakeholders affected by the clinical integration of the AI solution. This outreach raises awareness of existing health inequities, potential biases among users, and their consequences. Moreover, it facilitates the collection of feedback, ultimately advancing progress towards the equity objectives.

The final two decision points involve ongoing monitoring of shifts in health inequities among disadvantaged and advantaged patient subgroups. This continuous evaluation determines whether the implementation of the AI solution moves the organization closer to its equity objectives. If the monitoring results diverge from these objectives, the AI solution undergoes either updates or decommissioning.
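As a hedged continuation of the earlier sketch (an illustration, not a HEAAL prescription), the monitoring step at the final decision points could compare the current subgroup performance gap with the gap measured at baseline and flag divergence from the equity objectives; the tolerance value is a hypothetical, organization-defined parameter.

```python
# Illustrative monitoring check: has the subgroup performance gap widened
# beyond what the equity objectives allow? "tolerance" is a hypothetical
# allowance for metric noise between monitoring windows.
def check_equity_objective(baseline_gap: float, current_gap: float,
                           tolerance: float = 0.02) -> str:
    """Flag whether monitoring results diverge from the equity objective."""
    if current_gap <= baseline_gap + tolerance:
        return "on track: continue monitoring"
    return "diverging: escalate for model update or decommissioning"

# Example: the sensitivity gap widened from 0.03 at baseline to 0.09.
print(check_equity_objective(baseline_gap=0.03, current_gap=0.09))
# -> diverging: escalate for model update or decommissioning
```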

[END]
---
[1] Url: https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000390
