(C) PLOS One [1]. This unaltered content originally appeared in journals.plosone.org.
Licensed under Creative Commons Attribution (CC BY) license.
url:
https://journals.plos.org/plosone/s/licenses-and-copyright
------------
A systematic review of the prediction of hospital length of stay: Towards a unified framework
['Kieran Stone', 'Department Of Computer Science', 'Aberystwyth University', 'Ceredigion', 'Wales', 'United Kingdom', 'Reyer Zwiggelaar', 'Phil Jones', 'Bronglais District General Hospital', 'Aberystwyth']
Date: 2022-05
In addition to regression analysis, data mining approaches to LoS prediction have become more popular. A data mining approach is one designed to extract usable information from a larger set of raw data. These approaches use techniques such as clustering and classification to perform knowledge discovery from data. Although there is some overlap with the studies mentioned previously, particularly [70], these techniques help the user to uncover hidden patterns and relationships in large datasets. It is from these relationships that knowledge can be extracted to support decision making in a hospital or clinical environment. This is more commonly known as medical data mining [73]. As previously mentioned, there has been limited machine learning research that directly considers the goal of predicting LoS; instead, the currently adopted machine learning methodologies focus on patients with specific conditions, or the work centres on the factors that influence LoS in different contexts. In this section, the symbolic and sub-symbolic approaches [74] that have been employed for the task of LoS prediction are explored.
3.3.1 Sub-symbolic approaches.
Sub-symbolic learners such as Neural Networks (NNs) and Support Vector Machines (SVMs) have been used in a wide range of applications. They have the ability to implicitly identify relationships between a series of independent features and their corresponding dependent features that would otherwise have been unattainable. In the last 30 years, there has been increased interest in sub-symbolic approaches in clinical settings, particularly in the area of predicting LoS.
In 1993, a NN was used to predict ICU length of stay following cardiac surgery [75]. The population consisted of 1,409 patients who underwent open heart surgery in Toronto. The network was able to effectively divide patients into three heterogeneous groups: low, intermediate and high risk of prolonged LoS. The overall performance of the network was evaluated using the area under the Receiver Operating Characteristic (ROC) curve, which was 0.7094 in the training set and 0.6960 in the test set. It was concluded that the network performed to a satisfactory level but required further prospective clinical testing to determine whether or not it would be clinically useful.
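To make the evaluation metric concrete, the following is a minimal sketch of how the area under the ROC curve can be computed from risk scores, using the pairwise-ranking formulation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half). The function name `roc_auc` and the toy labels and scores are illustrative assumptions and are not taken from [75].

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for prolonged LoS, 0 otherwise (hypothetical encoding).
    scores: predicted risk scores, higher meaning more likely prolonged.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy data, invented for illustration only.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.2, 0.3]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported test-set value of 0.6960 in context.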
In 1994, similar work tested the effect of diagnosis on training a NN to predict the LoS of psychiatric patients involuntarily admitted to a state hospital [76]. A series of NNs were trained which represented schizophrenia, affective disorders and diagnosis-related group. The features used to train the networks included a patient’s demographics, severity of illness and nature of residence, amongst other features identified as significant in assessing a patient’s LoS. Comparing the NN predictions with actual LoS indicated accuracy rates ranging from 35% to 70%. The validity of these predictions was measured by comparing the LoS estimates with a clinical treatment team’s predictions at 72 hours after admission. In all cases, the NN was able to predict as well as or better than the treatment team.
Equally, [78] investigated the LoS of patients receiving care at a post-coronary care unit, with the aim of predicting stays of 1 to 20 days. An ANN [77] was trained using 629 patient records, with 127 records used as a test set. There was an average difference of 1.4 days per record between the actual LoS in the test set and the predictions of the network. LoS was predicted to within 1 day of the actual value with an overall accuracy of 72%. It was concluded that ANN-based classifiers demonstrated an ability to utilise common patient admission characteristics as predictors for LoS.
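The two figures reported for this study, the average difference in days and the share of predictions falling within 1 day of the actual LoS, can be sketched as follows. This is only an illustration of how such metrics are commonly computed; the function names and the toy values are assumptions, and the exact definitions used in [78] may differ.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference, in days, between actual and predicted LoS."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def within_tolerance_accuracy(actual, predicted, tolerance_days=1):
    """Fraction of predictions within `tolerance_days` of the actual LoS."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance_days)
    return hits / len(actual)


# Toy LoS values in days, invented for illustration only.
actual_los = [3, 7, 5, 12, 2]
predicted_los = [4, 6, 8, 11, 2]

mae = mean_absolute_error(actual_los, predicted_los)
acc = within_tolerance_accuracy(actual_los, predicted_los)
print(f"mean absolute difference = {mae} days, within 1 day = {acc:.0%}")
```

Reporting both values is useful because a small number of large misses can inflate the average difference while leaving the within-tolerance accuracy largely unchanged.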
In a further study [79], ANN-based learners were also employed to stratify the length of stay of cardiac patients into risk groups based upon preoperative and initial postoperative patient feature values. The work focused on 1,292 patients who underwent cardiac surgery between 2001 and 2003 in the department of cardiothoracic surgery. A review of contemporary literature identified 15 preoperative risk factors and 17 operative and postoperative features that had an influence on LoS. ANNs and ensembles of ANNs were applied to the scaled data. The study concluded that ensembles of ANN-based learners were better suited to the task of predicting LoS for postoperative cardiac patients than ANN-based learners in isolation.
In [80], three different learners were applied to the task of classifying the LoS of coronary patients. These three techniques were drawn from different areas of machine learning, namely Decision Trees [81], Support Vector Machines (SVMs) [82] and ANNs. The data consisted of the records of 4,948 patients who had suffered from coronary artery disease and included 36 different features. The dataset was partitioned into a training set and a testing set: 80% of the data was used for training and 20% for testing. The training set was used to select the optimal hyperparameters of the models and the testing set was used to evaluate each model’s predictive ability. This study determined that all three algorithms were able to predict LoS with varying degrees of accuracy, with SVM scoring the highest at 96.4%. It was also revealed that there was a strong tendency for LoS to be longer in patients with lung or respiratory disorders and high blood pressure. It is important to note, however, that despite the SVM algorithm outperforming the others, it can be very difficult to understand the underlying rules that are learned by sub-symbolic techniques, as opposed to symbolic learners such as decision trees or logistic regression, where the output is transparent to human scrutiny. This aspect is often of paramount importance for medical applications and diagnosis, as human expert input is often used to assist in understanding the data and the underlying relationships between the features and the decision classes [83].
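The partitioning described for [80] can be sketched as below. This is a minimal illustration, assuming a random shuffle with a fixed seed; the helper `train_test_split` here is a hypothetical stdlib-only function (not the scikit-learn one), and carving a validation subset out of the training data for hyperparameter selection is an assumption about one common way to keep the test set untouched, not a description of what [80] did.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of `records` and split it into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in for 4,948 patient records; real records would carry 36 features each.
records = list(range(4948))

# 80% training, 20% testing, as described in the text.
train, test = train_test_split(records, test_fraction=0.2, seed=0)

# Assumption: hold out part of the training data to compare hyperparameters,
# so that the test set is only used once, for the final evaluation.
fit, validation = train_test_split(train, test_fraction=0.2, seed=1)

print(len(train), len(test), len(fit), len(validation))
```

Keeping hyperparameter selection inside the training partition matters because tuning against the test set would make the reported accuracy optimistic.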
Two-stage LoS prediction was utilised for predischarge and preadmission patients in [99]. The predischarge stage makes use of all of the available data for hospital in-patients, and the preadmission stage uses only the data that is available prior to a patient’s admission. The overall prediction results of the predischarge patients were used to evaluate the LoS prediction performance at the preadmission stage. The data set contained records of 2,377 cardiovascular disease patients with one of three diagnoses: Heart Failure (HF), Acute Myocardial Infarction (AMI) and Coronary Atherosclerosis (CAS). For CAS patients, the classification model generated using an ANN achieved an accuracy of 88.07% to 89.64% with a mean absolute error (MAE) of 1.06 to 1.11 at the predischarge stage, and 88.31% to 89.65% with an MAE of 1.03 to 1.07 at the preadmission stage. For HF and AMI patients, the prediction accuracy ranged from 64.12% to 66.07% with an MAE of 3.83 to 3.91 at the predischarge stage, and from 63.69% to 65.72% with an MAE of 3.87 to 3.97 at the preadmission stage.
In 2020, a study provided an accurate patient-specific risk prediction for one-year postoperative mortality or cardiac transplantation and prolonged hospital LoS, with the purpose of assisting clinicians and patients’ families in the preoperative decision making process [98]. The study applied five machine learning algorithms (ridge logistic regression, decision tree, random forest, gradient boosting and deep neural network) which predicted and calculated individual patient risk for mortality and prolonged LoS using the Pediatric Heart Network Single Ventricle Reconstruction Trial dataset. A Markov Chain Monte Carlo simulation method was used to impute missing data before the features were fed to the machine learning models. The deep neural network model demonstrated 89 ± 4% accuracy and an area under the ROC curve of 0.95 ± 0.02.
Similarly, in 2020, a study proposed two general-purpose multi-modal network architectures to enhance patient representation learning by combining sequential and unstructured clinical notes with structured data [100]. The proposed fusion models leverage document embeddings for the representation of long clinical note documents, either convolutional neural networks or long short-term memory networks to model the sequential clinical notes and temporal signals, and one-hot encoding for static information representation. The performance of the proposed models on three risk prediction tasks (hospital mortality, 30-day readmission and long LoS prediction) was evaluated using data derived from the MIMIC III dataset. The results showed that, by combining unstructured clinical notes with structured data, the proposed models outperform other models that utilise either unstructured notes or structured data only.
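As a small illustration of the static-information step, the sketch below one-hot encodes a categorical field and joins it to a note embedding. It is a simplified assumption about how such inputs might be combined: the category list, the placeholder embedding values and the use of plain concatenation in `fuse` are all hypothetical, and the models in [100] additionally apply convolutional or recurrent layers to the sequential inputs, which are not shown here.

```python
def one_hot(value, categories):
    """Encode `value` as a vector with a single 1.0 at its category position."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


def fuse(note_embedding, static_vector):
    """Join the two representations by simple concatenation (an assumption)."""
    return list(note_embedding) + list(static_vector)


# Hypothetical static field and categories, for illustration only.
ADMISSION_TYPES = ["elective", "emergency", "urgent"]

# Placeholder for a document embedding of a clinical note; a real one would be
# produced by an embedding model and have far more dimensions.
note_embedding = [0.12, -0.40, 0.88]

static_vector = one_hot("emergency", ADMISSION_TYPES)
patient_vector = fuse(note_embedding, static_vector)
print(patient_vector)
```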
As recently as 2021, a study assessed the effectiveness of machine learning models that use daily ward round notes to predict the likelihood of discharge within 2 days and within 7 days, as well as to produce an estimated date of discharge on a daily basis [101]. Daily ward round notes and relevant discrete features were collected from the electronic medical records of patients admitted under General Medicine at the Royal Adelaide Hospital over an 8-month period. Artificial neural networks and logistic regression were effective at predicting discharge within 48 hours of a given ward round note, achieving AUCs of 0.80 and 0.78, respectively. Prediction of discharge within 7 days of a given note was less accurate, with the artificial neural network returning an AUC of 0.68 and logistic regression an AUC of 0.61.
The success of sub-symbolic learners is well established; they perform well in identifying complex, non-linear relationships in the data. Nevertheless, the “black-box” nature of these techniques is an obstacle to acceptance by clinicians and medical experts in a hospital environment. It is likely that clinicians will be reluctant to welcome the achievements of these approaches, despite the benefits their predictive abilities might bring, as there is no explicit explanation for the derivation of their results. Inevitably, this calls for systems that support decisions which are explainable and transparent, especially with the rise of legal and privacy legislation in the form of the European General Data Protection Regulation (GDPR), which could make justifying the use of black-box approaches more difficult. As such, research towards building explainable AI (XAI) systems has become increasingly prevalent, particularly in the medical domain [86]. Explainable models typically take two forms: post-hoc systems and ante-hoc systems. Post-hoc systems provide local explanations for a specific decision and make it reproducible on demand. An example of this is BETA (Black Box Explanations through Transparent Approximations), a model-agnostic framework, first outlined in [87], for explaining the behaviour of a given black-box classifier by optimising for both high agreement between the explanation and the original model and general interpretability of the explanation. Ante-hoc systems, however, are interpretable by design and have been termed “glass box approaches” [88]. Examples of such approaches include decision trees and fuzzy inference systems. Fuzzy inference systems in particular have historically been designed from expert knowledge or data, and provide a valuable framework for interaction between human expert knowledge and hidden knowledge in the data [89].
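The notion of agreement between a black-box model and a transparent approximation can be illustrated with a simple fidelity score: the fraction of inputs on which the two give the same decision. The sketch below is not an implementation of BETA; the two toy models, their thresholds and the patient tuples are hypothetical and serve only to show what is being measured.

```python
def fidelity(black_box, surrogate, inputs):
    """Fraction of inputs on which the surrogate reproduces the black-box decision."""
    agreements = sum(1 for x in inputs if black_box(x) == surrogate(x))
    return agreements / len(inputs)


def black_box(patient):
    """Stand-in for an opaque model: 1 means predicted prolonged LoS."""
    age, comorbidities = patient
    return 1 if 4 * age + 50 * comorbidities > 350 else 0


def surrogate(patient):
    """Transparent approximation: a single human-readable rule on age."""
    age, _ = patient
    return 1 if age >= 70 else 0


# Hypothetical (age, number of comorbidities) pairs, invented for illustration.
patients = [(80, 1), (65, 3), (50, 0), (72, 0), (90, 2), (40, 1), (75, 2), (60, 1)]

score = fidelity(black_box, surrogate, patients)
print(f"fidelity = {score:.2f}")
```

A surrogate with higher fidelity explains more of the black-box behaviour, but usually at the cost of a more complicated rule set, which is the trade-off that post-hoc approaches have to balance.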
A further example of XAI is the use of high-performance generalised additive models with pairwise interactions (GAMs), which have been applied to the medical domain and can yield explainable and scalable models with a high degree of predictive accuracy on large datasets [90]. Currently, in the domain of LoS prediction, there is a lack of work which makes use of explainable sub-symbolic learners. However, the increasingly widespread applicability of these models necessitates explanations in order to hold sub-symbolic approaches accountable in healthcare environments [83].
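For reference, a generalised additive model with pairwise interactions is commonly written in the following general form. The notation is an assumption chosen for illustration and is not taken from [90]: g is a link function, each f_j is a learned one-dimensional shape function of a single feature, each f_ij is a learned function of a pair of features, and S is the selected set of feature pairs.

```latex
g\bigl(\mathbb{E}[y]\bigr) = \beta_0 + \sum_{j} f_j(x_j) + \sum_{(i,j) \in \mathcal{S}} f_{ij}(x_i, x_j)
```

Because each term depends on at most two features, every f_j can be plotted as a curve and every f_ij as a heat map, which is what makes the fitted model open to inspection.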
Furthermore, modelling length of stay using deep learning raises ethical concerns. Even a healthcare task as simple as determining whether a patient has a disease can be skewed by how prevalent diseases are, or by how they manifest in specific patient populations. One example would be a model created to predict whether a patient will develop heart failure. Training such a model requires data from patients with heart failure and from patients without it. The selection of these patients can often rely on parts of EHR data that are skewed by either a lack of access to care or abnormalities in clinical care. More specifically, clinical protocol can affect the frequency and observation of abnormal tests [84], and naive data collection can yield inconsistent labels in chest X-rays [85]. Biased labelling, and the models that result from it, can be a crucial factor in the clinical resource management of a healthcare system.
In the same way, developers of clinical LoS models may also choose to predict healthcare costs. This means that a machine learning model will seek to predict which patients will have a prolonged LoS in hospital and which patients will cost the healthcare provider more in the future. Some model developers may use healthcare costs as a proxy for future health needs to guide accurate healthcare interventions for high-cost patients. However, it is also possible that other modellers may explicitly want to identify patients who will have a high healthcare cost in order to reduce the total cost of healthcare. This could lead to healthcare providers denying care to those predicted to have a prolonged LoS in hospital. This is a potentially worrying trend: because socioeconomic factors affect access to both financial resources and healthcare, these models could yield predictions that exacerbate inequities.
As well as considering the ethical implications of using machine learning models on EHR data, it is also important to consider the opportunities for bias when using EHR data for machine learning in clinical decision support. As the utilisation of machine learning in healthcare is increasing rapidly, the underlying data sources and methods of data collection should undergo examination. As mentioned previously, it is possible that these modelling approaches could worsen or perpetuate existing health inequalities. Any observational study or statistical modelling method could succumb to bias; however, the data that is available in healthcare has the potential to affect important clinical decision support tools that are based on machine learning. One example of such bias is missing data bias. EHR data may only contain more severe cases for specific patient populations, leading models to make incorrect inferences about the risk for such cases. Incomplete data can result in large portions of the population being eliminated and in inaccurate predictions for certain patient groups. To be clear, this could affect vulnerable patient populations and as such could lead to patients having more fractured care and/or being seen at multiple or varying healthcare institutions.
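The effect of incomplete records on who remains in the data can be shown with a small sketch. It assumes a complete-case filter (dropping any record with a missing required field), which is one common preprocessing choice; the group labels, field names and values are synthetic and invented purely for illustration.

```python
from collections import Counter


def group_shares(records):
    """Proportion of records belonging to each patient group."""
    counts = Counter(r["group"] for r in records)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def complete_cases(records, required_fields):
    """Keep only records in which every required field is present."""
    return [r for r in records
            if all(r.get(field) is not None for field in required_fields)]


# Synthetic cohort: group B has the lab result recorded far less often.
records = (
    [{"group": "A", "lab_result": 1.0} for _ in range(6)]
    + [{"group": "B", "lab_result": 1.0}]
    + [{"group": "B", "lab_result": None} for _ in range(3)]
)

before = group_shares(records)
after = group_shares(complete_cases(records, ["lab_result"]))
print("before filtering:", before)
print("after filtering: ", after)
```

In this toy cohort, group B falls from 40% of the records to one in seven after filtering, so a model trained on the filtered data would see very little of the group whose data is least complete.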
Measurement error is another important bias that could be apparent in EHR data. For example, patients of a lower socioeconomic status may be more likely to be seen in teaching clinics, as opposed to private care, where the data input or clinical reasoning could be less accurate than, or different from, that for patients of a higher socioeconomic status. This implicit bias could lead to disparities in the level of care provided to patients with different socioeconomic backgrounds. A machine learning model trained on healthcare data collected in this environment could inaccurately learn to treat patients of low socioeconomic status according to less than optimal care or according to the implicit bias of the data.
[END]
[1] Url:
https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000017
(C) Plos One. "Accelerating the publication of peer-reviewed science."
Licensed under Creative Commons Attribution (CC BY 4.0)
URL:
https://creativecommons.org/licenses/by/4.0/