Interpretable machine learning for automated left ventricular scar quantification in hypertrophic cardiomyopathy patients [1]
Authors: Zeinab Navidi, Jesse Sun, et al. (Division of Cardiology, Peter Munk Cardiac Centre, Toronto General Hospital, University Health Network, University of Toronto, Toronto; Department of Computer Science; Vector Institute)
Date: 2023-04
Scar quantification on cardiovascular magnetic resonance (CMR) late gadolinium enhancement (LGE) images is important in risk-stratifying patients with hypertrophic cardiomyopathy (HCM), given the value of scar burden in predicting clinical outcomes. We aimed to develop a machine learning (ML) model that contours left ventricular (LV) endo- and epicardial borders and quantifies CMR LGE images from HCM patients. We retrospectively studied 2557 unprocessed images from 307 HCM patients followed at the University Health Network (Canada) and Tufts Medical Center (USA). LGE images were manually segmented by two experts using two different software packages. Using a 6SD LGE intensity cutoff as the gold standard, a 2-dimensional convolutional neural network (CNN) was trained on 80% of the data and tested on the remaining 20%. Model performance was evaluated using the Dice Similarity Coefficient (DSC), Bland-Altman analysis, and Pearson's correlation. The 6SD model DSC scores were good to excellent at 0.91 ± 0.04, 0.83 ± 0.03, and 0.64 ± 0.09 for LV endocardium, epicardium, and scar segmentation, respectively. The bias and limits of agreement for the percentage of LGE to LV mass were low (-0.53 ± 2.71%), and the correlation was high (r = 0.92). This fully automated, interpretable ML algorithm allows rapid and accurate scar quantification from CMR LGE images. The program does not require manual image pre-processing and was trained with multiple experts and software packages, increasing its generalizability.
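As context for the 6SD gold standard, the sketch below shows the conventional nSD thresholding rule: a pixel is labeled scar if it lies within the myocardium and its intensity exceeds the mean of a reference region of normal ("remote") myocardium by n standard deviations. This is a minimal illustration of the general technique, not the study's exact implementation; the function name and the explicit remote-region mask are our assumptions.

import numpy as np

def nsd_scar_mask(lge_image, myocardium_mask, remote_mask, n_sd=6.0):
    # Reference statistics from visually normal ("remote") myocardium.
    remote = lge_image[remote_mask > 0]
    cutoff = remote.mean() + n_sd * remote.std()
    # Scar = myocardial pixels brighter than the nSD cutoff.
    return (myocardium_mask > 0) & (lge_image > cutoff)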
Accurate scar quantification of cardiac magnetic resonance (CMR) late gadolinium enhancement (LGE) images is important in managing hypertrophic cardiomyopathy (HCM) patients. We developed a 2D convolutional neural network to quantify CMR LGE in HCM patients that is computationally interpretable and trained using multicenter data analyzed by two expert readers using two different analysis packages. Our model demonstrated low bias, narrow limits of agreement, and high correlation with expert analysis. A benchmarking comparison was performed between our algorithm and a standard U-Net model, with and without cropping of the raw images. Our method showed superior performance and has high potential for clinical adoption.
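The agreement statistics used throughout this paper (Bland-Altman bias with 95% limits of agreement, and Pearson's correlation) can be reproduced for any pair of per-patient measurements. A minimal sketch assuming NumPy/SciPy; the helper name and inputs are hypothetical:

import numpy as np
from scipy import stats

def agreement_stats(model_vals, expert_vals):
    model_vals = np.asarray(model_vals, dtype=float)
    expert_vals = np.asarray(expert_vals, dtype=float)
    diff = model_vals - expert_vals
    bias = diff.mean()                    # Bland-Altman bias
    half_width = 1.96 * diff.std(ddof=1)  # 95% limits of agreement
    r, p = stats.pearsonr(model_vals, expert_vals)
    return bias, (bias - half_width, bias + half_width), r, p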
Funding: Funding for this study was provided by the Peter Munk Cardiac Centre Innovation Fund and the MSH-UHN AMO Innovation Fund. BW is partially supported by the CIFAR AI Chair Program. WT is supported by a Heart and Stroke Foundation of Canada National New Investigator Award. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Data Availability: Data cannot be shared publicly because of HIPAA requirements. For data requests, please contact the corresponding author at [email protected]. All code is available at a public link: https://drive.google.com/drive/folders/1197aHAFmLWqknuvrKAm51i7fH4q8c7Bo and will be shared on GitHub after publication.
Introduction

Hypertrophic cardiomyopathy (HCM) is the most common inheritable cardiomyopathy, with a reported prevalence as high as 1 in 200 [1]. Patients with HCM can develop myocardial fibrosis, which is associated with heart failure and sudden cardiac death [2–5]. Late gadolinium enhancement (LGE) techniques on cardiovascular magnetic resonance (CMR) imaging allow for non-invasive detection and quantification of fibrosis in patients with HCM. Due to its prognostic value, current guidelines for the management of HCM patients recommend assessment of LGE by CMR as an important component of risk stratification [6–8]. However, in current practice, LGE quantification can be subjective and time-consuming, and requires training to delineate both the myocardial borders and the hyper-enhanced regions on the LGE images [9–12]. These issues are even more pronounced in HCM patients, in whom scar is most often patchy and multifocal [12].

Recently, machine learning (ML), and specifically deep convolutional neural networks (CNNs), have been used to automate CMR LGE image segmentation in HCM patients [13–18]. However, many of these ML algorithms required image pre-processing or relied on a single expert reader as their reference standard, potentially limiting their generalizability and adoptability. The Shape Attentive U-Net (SAUNet), a previously developed algorithm from our group, focuses on model interpretability and robustness, and has shown promising performance on medical image segmentation [19]. We aimed to use SAUNet to develop a 2-dimensional (2D), computationally interpretable CNN model that efficiently and accurately segments left ventricular (LV) endo- and epicardial borders and quantifies scar on LGE CMR images in HCM patients, with minimal pre-processing and using a single NVIDIA Tesla P100 graphics card.
Results

Baseline patient demographics are presented in Table 1. The median age of patients was 52 years (interquartile range, IQR, 39–61 years) and the majority (70%) were male. The short-axis LGE series consisted of 8 ± 2 images per patient (IQR 7–9 images). LGE images from 307 HCM patients were divided into mutually exclusive training and testing subsets. The training set included 247 patients (2056 images), of whom 200 patients had LGE scar (927 images). The testing set consisted of 60 patients (501 images), of whom 53 patients had LGE scar (253 images). See S1 Table for more detailed information.

Model development is described in Fig 1. The average analysis time for one image using our algorithm was less than 70 milliseconds using a single NVIDIA Tesla P100 GPU. Figs 2 and 3 provide examples of the expert-based analysis and the contours predicted by the SAUNet algorithm in patients with and without scar. Heatmaps from different layers of the model were generated to visualize its focus at different steps. These intermediate-level outputs for each layer of the SAUNet model aided in identifying which layers required modification to improve the final segmentation performance for faulty predictions. The spatial attention maps of the final Dual Attention Block (DA-Block) are provided in Figs 2 and 3 for the corresponding samples to highlight the regions of interest used by the model during intermediate stages of the computation [19].

Fig 2. Examples of patients with mild (a; 8% LGE) and large (b; 51% LGE) scar burden. The leftmost column is the expert-based label or ground truth, and the second column is the model prediction. The third and fourth columns are the spatial attention heatmaps at the 1/2 and 1/4 resolutions, respectively. Note that not all slices from each patient are presented.

Fig 3. Examples of patients with no scar. The leftmost column is the original MRI image, the second column is the expert-based label or ground truth, and the third column is the model prediction. The fourth and fifth columns are the spatial attention heatmaps at the 1/2 and 1/4 resolutions, respectively.

LV segmentation by the SAUNet 4SD model demonstrated excellent similarity to manual segmentation, with a DSC score of 0.92 ± 0.04 for the LV endocardium and 0.83 ± 0.03 for the LV epicardium. For LV LGE scar quantification, the DSC score was good at 0.60 ± 0.08. There was no significant difference between the average LGE scar mass quantified by the 4SD model and manual expert analysis (3.77 ± 7.11 g model versus 4.56 ± 7.23 g expert, p = 0.55, Table 2). The per-patient correlation between the SAUNet 4SD model LGE scar mass and the expert manually quantified LGE scar mass was high (r = 0.92, Fig 4A). Bland-Altman analysis for scar mass demonstrated that the 4SD model had a low bias of -0.79 g with limits of agreement of -6.26 g to 4.68 g (Fig 4B).

The percentage LGE quantified by the SAUNet 4SD model was 2.55 ± 4.94%, compared to 3.47 ± 5.62% obtained by manual expert analysis, which was not significantly different (p = 0.34). Correlation between the SAUNet 4SD model and the manually quantified percentage LGE scar was high (r = 0.90, Fig 4C). For %LGE, Bland-Altman analysis demonstrated that the 4SD model slightly underestimated LGE, with a bias of -0.93% and limits of agreement of -5.64% to 3.78% (Fig 4D). There was no significant difference between sites in either the scar mass or the percentage of LGE quantified.

LV segmentation by the SAUNet 6SD model demonstrated excellent similarity to manual segmentation, with a DSC score of 0.91 ± 0.04 for the LV endocardium and 0.83 ± 0.03 for the LV epicardium. For LV LGE scar quantification, the DSC score was good at 0.64 ± 0.09. There was no significant difference between the average LGE scar mass quantified by the 6SD model and manual expert analysis (2.21 ± 4.38 g model versus 2.68 ± 4.77 g expert, p = 0.50, Table 2). The per-patient correlation between the 6SD model LGE scar mass and the expert manually quantified LGE scar mass was high (r = 0.91, Fig 5A). Bland-Altman analysis for scar mass demonstrated that the 6SD model had a low bias of -0.56 g with limits of agreement of -4.44 g to 3.32 g (Fig 5B).

The percentage LGE quantified by the SAUNet 6SD model was 1.28 ± 2.76%, compared to 1.80 ± 3.38% obtained by manual expert analysis, which was not significantly different (p = 0.35). Correlation between the 6SD model and the manually quantified percentage LGE scar was high (r = 0.92, Fig 5C). Bland-Altman analysis for LGE% demonstrated that the 6SD model had a low bias of -0.53% with limits of agreement of -3.23% to 2.18% (Fig 5D).

Tables 3 and 4 provide the comparisons for LV segmentation and scar detection between the 4SD and 6SD SAUNet models and the U-Net models. While the DSC scores for SAUNet and U-Net endo- and epicardial segmentation were close, the DSC scores for scar prediction were 5–7% higher for the 4SD and 6SD SAUNet models than for the implemented U-Net models. Cropping the images to isolate the LV region improved the absolute DSC scores of both the SAUNet and U-Net models by 2–8% compared to their respective results without image cropping. However, the results of the SAUNet model remained superior to U-Net (Table 3).

The mass and LGE% statistics of the ground truth for the two sex groups, using the 6SD SAUNet model, are provided in S2 Table. By Welch's t-test, there were no differences between the predicted and ground truth means for either sex.
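For reference, the two quantities reported above can be computed as follows: the Dice Similarity Coefficient between binary masks, and a scar mass estimate obtained by scaling the scar voxel volume by myocardial density (conventionally about 1.05 g/mL). This is a sketch under assumed voxel geometry; the helper names are ours, not the study code's:

import numpy as np

def dice(pred_mask, true_mask):
    # Dice Similarity Coefficient: 2*|A intersect B| / (|A| + |B|).
    pred, true = pred_mask.astype(bool), true_mask.astype(bool)
    denom = pred.sum() + true.sum()
    return 2.0 * np.logical_and(pred, true).sum() / denom if denom else 1.0

def scar_mass_g(scar_mask, pixel_area_mm2, slice_thickness_mm, density_g_per_ml=1.05):
    # Voxel volume (mm^3 -> mL) times assumed myocardial density.
    volume_ml = scar_mask.sum() * pixel_area_mm2 * slice_thickness_mm / 1000.0
    return volume_ml * density_g_per_ml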
Discussion
In this study, we used multicenter CMR data to develop and validate a fully automated deep learning algorithm that contours the LV endo- and epicardial borders and quantifies LGE in patients with HCM. Based on our experiments, the pipeline provides accurate and robust scar quantification, as it was trained with data from different sites, vendors, readers, and analysis packages. It also has high clinical utility: it requires no manual image pre-processing and can rapidly analyze standard CMR LGE images using a single graphics card. Finally, it is built on the SAUNet architecture, which provides strong computational interpretability through attention maps that can be visualized during the intermediate stages of segmentation.
Compared to previous studies specifically investigating HCM patients, the CMR images used in our study were collated from multiple vendors, and two distinct sets of these images were analyzed by two different readers using different software packages. The incorporation of data from different sites, vendors, readers, and analysis packages enabled us to develop a more robust model. Studies have demonstrated significant inter-reader variability in CMR LGE image analysis, which is exacerbated by the patchy multifocal CMR LGE appearance in HCM patients [9–12]. As such, incorporating more than one reader reduces the risk of bias that may develop in a deep learning model trained using contours from one clinician. Given the aforementioned advantages, the dataset used to train and test the algorithm allowed us to develop a model with greater potential for wider clinical use.
Moreover, we trained and validated our model to run efficiently on a single NVIDIA GPU and to analyze uncropped images. Our algorithm processes one image in less than 0.07 seconds, which is comparable to previous programs. At this rate, an average CMR study of approximately 8 LGE images would require less than 0.56 seconds to analyze. This is considerably shorter than the time currently required for experts to manually segment the LV and quantify the scar burden from CMR LGE images [20]. This offers time savings to clinicians and reduces the amount of training required to perform CMR LGE scar analysis. It should also increase the number of patients receiving quantitative scar burden measurement rather than a qualitative assessment.
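The reported throughput is straightforward to verify with a GPU-synchronized timing loop. A sketch assuming a PyTorch model and a preloaded stack of images (both placeholders):

import time
import torch

@torch.no_grad()
def ms_per_image(model, images, device="cuda"):
    # `images` is an N x C x H x W tensor; one forward pass per image.
    model = model.to(device).eval()
    images = images.to(device)
    _ = model(images[:1])              # warm-up to exclude CUDA init cost
    torch.cuda.synchronize()
    start = time.perf_counter()
    for i in range(images.shape[0]):
        _ = model(images[i:i + 1])
    torch.cuda.synchronize()
    return 1000.0 * (time.perf_counter() - start) / images.shape[0]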
Our pipeline does not require the extra step of manually cropping the images to remove the structures surrounding the LV. Most previously developed programs were trained and tested on such LV-focused CMR LGE images because cropping reduces computational requirements: the algorithm no longer needs to first localize the LV before segmenting it. The ability to analyze the complete image allows this program to be integrated more easily into an automated clinical workflow.
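For concreteness, the manual cropping step that our pipeline avoids typically reduces to taking a bounding box around the LV plus a margin. A hypothetical illustration (the function and margin are ours, shown only to make the benchmark's "cropped" condition explicit):

import numpy as np

def crop_to_lv(image, lv_mask, margin=16):
    # Bounding box of the (assumed non-empty) LV mask.
    rows, cols = np.any(lv_mask, axis=1), np.any(lv_mask, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    # Expand by a margin, clipped to the image borders.
    r0, c0 = max(r0 - margin, 0), max(c0 - margin, 0)
    r1 = min(r1 + margin, image.shape[0] - 1)
    c1 = min(c1 + margin, image.shape[1] - 1)
    return image[r0:r1 + 1, c0:c1 + 1]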
To demonstrate the improvements gained by our SAUNet-based algorithm, we compared our results against a U-Net model trained and tested on the same samples with the same gold standard definition. Our benchmarking also assessed the impact of cropping the images to isolate the LV. CMR LGE quantification is a more difficult task than LV segmentation, as demonstrated by the lower DSC scores in this analysis and in prior publications. Nevertheless, our SAUNet model performed better at scar segmentation than the standard U-Net, and this advantage persisted with cropped images. Overall, SAUNet yields an algorithm that outperforms the standard U-Net architecture [16].
The use of SAUNet also improves both computational interpretability and final prediction performance. SAUNet is a 2D architecture that we developed, which uses shape-dependent information in addition to texture information to improve a CNN model's robustness [19]. We adapted this architecture to HCM LV segmentation and LGE quantification. Previously developed HCM algorithms used off-the-shelf U-Net models, which lack interpretability. In comparison, SAUNet allows multi-level computational interpretability and removes the need for additional post-hoc computations or gradient-based saliency methods for sensitivity analysis [19,21]. The intrinsic attention maps within the SAUNet model are a strong alternative to saliency methods, with clear computational advantages [19]. Namely, attention maps at varying resolutions and layers of the model are all computed during the forward pass of an image, removing the need for repeated post-hoc computations to obtain saliency maps at different points in the model. We suggest this computational interpretability is beneficial: a seamless way to visualize comprehensible intermediate stages of the model is useful for debugging the pipeline, especially with larger datasets of greater variance (i.e., from different readers and centres). Existing work on automated scar quantification in HCM patients proposed programs with no framework or guidelines for interpretability analysis [13–16]. By verifying that an algorithm is not perpetuating biases, a valuable tool can be created to help address the challenges clinicians face in medical image analysis.
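For comparison, extracting comparable intermediate activations from an off-the-shelf network requires attaching forward hooks and rerunning inference. The sketch below shows that generic PyTorch pattern; the layer names are model-specific placeholders, whereas SAUNet exposes its attention maps natively in a single forward pass:

import torch

def capture_activations(model, image, layer_names):
    maps, handles = {}, []
    modules = dict(model.named_modules())
    for name in layer_names:
        # Default-arg trick binds the current `name` inside the closure.
        def hook(module, inputs, output, name=name):
            maps[name] = output.detach().cpu()
        handles.append(modules[name].register_forward_hook(hook))
    with torch.no_grad():
        _ = model(image)   # single forward pass fills `maps`
    for h in handles:
        h.remove()
    return maps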
---
[1] URL: https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000159
Published and (C) by PLOS One. Content appears under a Creative Commons Attribution (CC BY 4.0) license.