(C) PLOS One

This story was originally published by PLOS One and is unaltered.

The First Generative AI Prompt-A-Thon in Healthcare: A Novel Approach to Workforce Engagement with a Private Instance of ChatGPT [1]

['William R. Small', 'Department of Health Informatics', 'NYU Langone Health', 'New York', 'United States of America', 'Department of Medicine', 'NYU Grossman School of Medicine', 'Kiran Malhotra', 'Department of Ophthalmology', 'Vincent J. Major']

Date: 2024-10

Abstract
Background: Healthcare crowdsourcing events (e.g., hackathons) facilitate interdisciplinary collaboration and encourage innovation. Peer-reviewed research has not yet considered a healthcare crowdsourcing event focusing on generative artificial intelligence (GenAI), which generates text in response to detailed prompts and has vast potential for improving the efficiency of healthcare organizations. Our event, the New York University Langone Health (NYULH) Prompt-a-thon, primarily sought to inspire and build AI fluency within our diverse NYULH community, and to foster collaboration and innovation. Secondarily, we sought to analyze how participants’ experience was influenced by their prior GenAI exposure and by whether they received sample prompts during the workshop.
Methods: Executing the event required the assembly of an expert planning committee, who recruited diverse participants, anticipated technological challenges, and prepared the event. The event was composed of didactics and workshop sessions, which educated participants and allowed them to experiment with using GenAI on real healthcare data. Participants were given novel “project cards” associated with each dataset that illuminated the tasks GenAI could perform and, for a random set of teams, sample prompts to help them achieve each task (the public repository of project cards can be found at https://github.com/smallw03/NYULH-Generative-AI-Prompt-a-thon-Project-Cards). Afterwards, participants were asked to fill out a survey with 7-point Likert-style questions.
Results: Our event was successful in educating and inspiring hundreds of enthusiastic in-person and virtual participants across our organization on the responsible use of GenAI in a low-cost and technologically feasible manner. On average, participants responded positively to each of the survey questions (e.g., confidence in their ability to use and trust GenAI). Critically, participants reported a self-perceived increase in their likelihood of using GenAI, and of promoting colleagues’ use of it, for their daily work. No significant differences were seen in the surveys of those who received sample prompts with their project task descriptions.
Conclusion: The first healthcare Prompt-a-thon was an overwhelming success, with minimal technological failures, positive responses from diverse participants and staff, and evidence of post-event engagement. These findings will be integral to planning future events at our institution, and to others looking to engage their workforce in utilizing GenAI.

Citation: Small WR, Malhotra K, Major VJ, Wiesenfeld B, Lewis M, Grover H, et al. (2024) The First Generative AI Prompt-A-Thon in Healthcare: A Novel Approach to Workforce Engagement with a Private Instance of ChatGPT. PLOS Digit Health 3(7): e0000394. https://doi.org/10.1371/journal.pdig.0000394
Editor: Janna Hastings, University of Zurich Faculty of Medicine: Universitat Zurich Medizinische Fakultat, SWITZERLAND
Received: October 26, 2023; Accepted: June 10, 2024; Published: July 23, 2024
Copyright: © 2024 Small et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting Information file.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.

Introduction
Generative artificial intelligence (GenAI) use exploded with ChatGPT’s release to the public in late 2022. Large language models (LLMs), which learn from vast amounts of text data to “understand” the context in which words appear and, ultimately, their meaning, belong to a class of artificial intelligence (AI) called GenAI. [1,2] ChatGPT sparked a user interface revolution in AI, making the extensive knowledge base that LLMs like GPT-3 and GPT-4 amass more accessible to everyone by accepting natural language, rather than code, as a prompt to generate responses that mimic human syntax. [1–3] Several industries have recognized the advantages GenAI can offer: studies of consultants and customer service agents showed increased worker productivity and reduced employee attrition when workers were given access to GenAI tools. [4–5] For healthcare organizations to realize GenAI’s transformative potential, they must upskill their workforces in its responsible use or risk breaches of patients’ or researchers’ data privacy and perpetuation of biases that exist in training data and user prompts. [6–8]
In the spring of 2023, NYULH, a large academic medical center in the New York City area, developed and publicized workforce policies for the use of the public ChatGPT application and launched a secure and private ChatGPT-like instance as a user interface for the use of GenAI models like GPT-4. All members of the workforce were invited to apply for access to this instance and experiment with data not allowed within the public instance, such as patient information and intellectual property. Leaders from the NYULH MCIT Department of Health Informatics and the Predictive Analytics Unit helped supervise and guide this exploration.
In response to the workforce’s profound interest in this technology, and the risks of staff using it without proper training (e.g., putting patient health information into the public instance of ChatGPT), we needed an efficient method for engaging and educating our community that promoted collaboration and innovation. Consequently, we decided to hold an in-person event: the first GenAI “Prompt-a-thon” in healthcare, to bring together staff from all corners of our institution to learn from local experts, experiment with GenAI, and share ideas about how GenAI could revolutionize their work.
Our Prompt-a-thon builds on the burgeoning body of literature on crowdsourcing contests (e.g., hackathons and datathons), which have successfully promoted innovation, utilization of new technologies, and inter-professional collaboration within healthcare communities. [3,9–19] For example, a hackathon for junior surgical physician-scientists exemplified how these events can foster innovation by accelerating academic productivity. [15] Furthermore, Aboab et al. reported that incorporating actual healthcare data into these workshops led to the development of decision support tools and novel research directly applicable to real-world tasks. [10] Others have showcased how these events equip participants with transferable skills relevant to their daily work, [16] but to realize post-event success, organizers must ensure participating teams are diverse in their demographic attributes and technical capabilities. [15] Healthcare hackathons that promote GenAI utilization with real-world data [5] and involve a diverse community [15] thus have the potential to educate early adopters, upskill an entire workforce, and revolutionize patient care.
The best method of incorporating LLMs within a healthcare hackathon is unclear. We consulted various precedents for hackathon best practices [9–12] and considered whether to favor determinism (e.g., highly structured workshops) or creativity (e.g., allowing individuals maximal freedom to dictate their problem-solving approach during workshops). In favor of determinism, best practices posed by Silver et al. state that healthcare hackathons should define the purpose to stakeholders, select an apt theme or problem, choose the right time and venue, and ensure clear expectations for all participants. [9] In favor of creativity, Falk et al. underscored the significance of the participant experience and the need for event customization to encourage diverse participation. [12] To study how tipping the scales towards determinism or creativity affects participants’ experience with the technology and the event, we performed an intervention in which some groups received more structure than others during the workshop portion of the event. This structure took the form of curated sample prompts designed to help participants achieve various tasks with their dataset.
There were several objectives in organizing this event: i) educate the workforce on the benefits and responsible use of GenAI, ii) promote collaboration and innovation both during and after the event, and iii) examine the effects of prior GenAI exposure and the provision of sample prompts during the event on participant experience. The findings from the inaugural Prompt-a-thon will inform future approaches to engage our community in leveraging GenAI to improve healthcare.

Discussion
The NYULH Prompt-a-thon aimed to augment the understanding and utilization of GenAI within our workforce, with the broader goal of democratizing the technology throughout our health system. Our event was designed to demonstrate the capabilities of GenAI, encourage its use for research and clinical tasks within NYU Langone Health, and ascertain how to run an engaging and educational Prompt-a-thon for a professional, diverse healthcare community. We confirmed our hypotheses that the event would bolster participants’ confidence and comfort using GenAI while underscoring its limitations and ethical considerations.
The feasibility of future Prompt-a-thons is supported by the broad engagement of our healthcare community, our ability to effectively support the technological challenges inherent in such an event, and the low cost of technology usage during the event. The 412 virtual and 70 in-person attendees, recruited with minimal advertising, reflect the vast interest in this type of event and its success in activating the workforce around GenAI, particularly the 18.5% who applied and had not yet used GPT. The heterogeneity of employees who attended this event highlights the ability of such an event to reach all corners of a healthcare system. Further supporting the feasibility of this event, technological challenges such as bandwidth and privacy concerns were anticipated and prepared for, such that very few blocked calls to the API occurred and users were satisfied with their experience. Other organizations can reference our event metrics to ensure they can right-size their technology capacity to the participant demands we observed (e.g., 24 prompts per user per session; max rates of 26 prompts per minute in a 70-attendee event).
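As a rough illustration of how another organization might apply these event metrics, the sketch below scales the two reported figures (24 prompts per user per session, and a peak of 26 prompts per minute with 70 attendees) to a different attendee count. The linear scaling, the `headroom` safety multiplier, and the function name `estimate_capacity` are assumptions made for illustration only; they are not part of the reported results, and real demand may not scale linearly.

```python
import math

# Figures reported above for the 70-attendee in-person event.
OBSERVED_ATTENDEES = 70
OBSERVED_PEAK_PROMPTS_PER_MINUTE = 26
OBSERVED_PROMPTS_PER_USER_PER_SESSION = 24


def estimate_capacity(attendees: int, headroom: float = 1.5) -> dict:
    """Scale the observed metrics linearly to a new attendee count.

    Linear scaling and the headroom multiplier are illustrative
    assumptions, not findings from the event.
    """
    if attendees <= 0:
        raise ValueError("attendees must be positive")
    if headroom < 1.0:
        raise ValueError("headroom must be at least 1.0")

    # Multiply before dividing so exact multiples of 70 stay exact.
    expected_peak = (
        OBSERVED_PEAK_PROMPTS_PER_MINUTE * attendees / OBSERVED_ATTENDEES
    )
    return {
        # Total prompts to budget for in one session.
        "prompts_per_session": OBSERVED_PROMPTS_PER_USER_PER_SESSION * attendees,
        # Peak request rate if demand scales with attendance.
        "expected_peak_per_minute": expected_peak,
        # Rate limit to provision, rounded up after applying headroom.
        "provisioned_peak_per_minute": math.ceil(expected_peak * headroom),
    }


if __name__ == "__main__":
    print(estimate_capacity(140, headroom=1.5))
```

Under these assumptions, an event twice the size would budget for twice the session volume and provision a per-minute rate limit somewhat above the scaled peak; the headroom value should be chosen to match the organization's own risk tolerance.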
The survey findings suggest our event was received well by most attendees, was successful in increasing participants’ confidence and comfort using GenAI, and would likely drive engagement in the broader community due to participants’ strong desire to encourage others to use GenAI and attend similar events. These results support the notion that crowd events like hackathons can foster positive participant attitudes and accelerate innovation. [9] Callcut et al. discussed how these events can foster networking with domain experts and accelerate academic productivity, [15] which aligns with the enthusiasm demonstrated by participants after the lightning talks by experts in the field and their enhanced willingness to disseminate excitement about and utilization of GenAI across the health system.
Although users, on average, reported that their trust in GenAI improved because of the event, this item scored the lowest of all survey questions. This suggests we were successful in our objective of promoting responsible use of GenAI by encouraging skepticism of GenAI outputs among participants without hampering their excitement for exploring its innovative capabilities. Recognizing the need for healthcare professionals and researchers to adapt to rapidly evolving technology, we equipped our participants with knowledge on responsible GenAI use by highlighting its potential to amplify bias, [6–8] data privacy issues, [4–5] and inherent technological limitations. [21–25] Crucially, we emphasized that mitigating these concerns requires the vigilance of subject-matter experts who are aware of these issues. The Prompt-a-thon’s success in promoting participants’ healthy skepticism towards GenAI stems from their understanding of its capabilities and limitations. The event also stressed that participant-generated prompts could lead to biased outputs, so participants should employ prompt engineering techniques to limit or understand biases present in their prompts (e.g., chain-of-thought prompting [21–22]) and recruit diverse feedback when addressing unacceptable outputs.
The Prompt-a-thon received positive qualitative feedback from participants and mentors, who found the event informative, inspiring, and diverse, though they identified a few areas for improvement. Both groups agreed that objectives could have been made clearer before the workshop portion of the event began; this lack of focus appeared to be more pronounced in teams without sample prompts. Teams were intentionally heterogeneous with respect to role, department, and specialty because diverse teams have been shown to be more innovative, [19] and innovation was a central objective for event organizers. However, mentors reported that some individuals used GenAI to address their own work-related tasks rather than collaborating with their teams. Participants’ goals likely included developing local solutions for tasks specific to their individual job role. Therefore, they may have been more motivated to collaborate if groups were more homogeneous with respect to their department or specialty, which must be balanced against event organizers’ innovation-oriented objectives. To promote greater team collaboration, we could have utilized the strategy of Hynes et al., who provided participants with detailed instructions and examples weeks in advance of the event. [18] Additionally, devoting more of the didactics sessions to preparing participants for the workshop, making the workshop more structured with explicit instructions at multiple time periods, and enabling report-outs of best practices teams identified (followed by time for teams to experiment with implementing these best practices) may have further facilitated collaboration. These conclusions align with prior work stating that it is critical to set clear expectations for hackathon participants [11,18] and account for the differing goals of participants and organizers. [12]
Our experience with the Prompt-a-thon provided valuable insights into the nuances of prompt engineering for novice users in healthcare or research roles. The use of project cards, which delineated potential projects into themes, relevant data sources, and core tasks, successfully demystified the complexities of GenAI for our participants. The core tasks were inspired by existing literature on prompt engineering, [21–25] but sought to offer a simpler, more accessible framework that illuminated GenAI’s broad applicability to participants’ daily work. Providing sample prompts to a random selection of clinical teams, while having no significant impact on survey results, was found during qualitative feedback sessions with mentors to alleviate participants’ “blank page anxiety,” facilitating more rapid progress. Our experience revealed the importance of providing a structured, simplified framework to novice prompt engineers; offering distinct, well-defined tasks and examples via the project cards facilitated a smoother and more fruitful interaction with GenAI, enhancing their overall experience and learning outcomes.
The evaluation of the Prompt-a-thon has several limitations. First, we did not use validated scales because there is limited research on optimal evaluations of crowdsourcing events, especially those that involve LLMs. In addition, our assessment of prior experience with GenAI was not sensitive enough to capture the likely underlying variance: most participants characterized their prior GenAI experience as “moderate,” limiting our confidence in comparisons across groups stratified by experience. Second, the generalizability of the findings may be limited to those who work at large academic medical centers whose workforce may be more experienced with and supported in using new technologies. Third, our sample size was limited by event constraints, which included physical space, bandwidth, and other associated costs.

Conclusion
In conclusion, the Prompt-a-thon successfully brought together diverse healthcare professionals to explore and engage with GenAI, demonstrating its transformative potential and accessibility. The event effectively taught participants responsible GenAI use by highlighting its limitations and associated prompt engineering solutions and by fostering personal accountability for GenAI outputs employed in their professional work. It also provided valuable insights for future initiatives, emphasizing the need for more specific instruction, particularly on prompt engineering, and for more structure, such as providing time for participants to share effective practices as a group and then apply them. As part of our ongoing commitment to fostering a supportive GenAI community, we will continue to provide resources for participants, including support office hours. We also have “mini-Prompt-a-thons” planned with domain-specific champions and datasets to enhance engagement and provide more targeted learning experiences. We believe the “Prompt-a-thon” intervention and associated learnings are readily scalable to other healthcare institutions interested in democratizing GenAI. Educating our collective healthcare workforce in GenAI’s strengths and limitations is essential to realizing this technology’s promise of improving patient care.

Acknowledgments
We would like to thank our event participants and staff for generously giving their time and sharing their insights with us. Without their contributions, this research would not have been possible. We would like to acknowledge our NYULH colleagues and collaborators for their valuable feedback and discussions throughout this project. Those include, but are not limited to, Mark Triola, Eric Oermann, Kelly Ruggles, Lavender Jiang, Erin Lostraglio, and Nader Mherabi. We would also like to acknowledge our Microsoft partners, who were instrumental in realizing this event. They include, but are not limited to, Amy Brandt, Ana Del Campo Mendizabal, Cat Stolar, Greg Dinin, Nikhil Keeppanasseril, Rachel Romanczukiewicz, and Tim Fairlie.

---
[1] Url: https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000394

Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.
