Cardiac function in a large animal model of myocardial infarction at 7 T: deep learning based automatic segmentation … – Nature.com

Posted: Published on May 20th, 2024

This post was added by Dr Simmons

The data used in this study are part of a comprehensive large animal study [12,13,14]. Details of our data can be accessed via the Zenodo repository (see section Data availability).

The methods of image acquisition, DL model training, segmentation, and analysis used in this study are described below. For a schematic illustration of the study procedure, see Fig. 1.

Fig. 1. Schematic representation of the study design. Eleven animals (seven infarct pigs and four sham pigs) were each imaged four times in a 7 T MR scanner. The acquired high-resolution images were labelled using different methods. Two observers performed manual segmentation. The end-diastolic and end-systolic labels from observer one were used in a transfer learning algorithm to re-train a pre-trained DL model, which has a U-Net structure (illustrated schematically) and a ResNet34 backbone. In the transfer learning process, the model was trained using 560 high-resolution 7 T images and the labels of observer one (manually created ED/ES labels, or empty segmentation masks if no tissue to be segmented was visible). A further 212 images served for validation: different stages of the model were tested on them and the performance evaluated, parameters were changed, and training continued. The model with the best performance (highest Dice score) was then selected as our model. It was tested on 288 images it had not seen before (test set) and provided labels for those images. In addition, we segmented the images using an automatic tool within the clinical software Medis. All segmentations were then compared with each other in a statistical analysis. Dice scores and Hausdorff distances of the labels, as well as the derived cardiac parameters, were calculated and compared. MRI magnetic resonance imaging, DL deep learning, ED end-diastolic, ES end-systolic.

The large animal study was approved by the District Government of Lower Franconia, Germany (Grant 55.2.2-2532.2-1134-16), and all experiments were performed in accordance with relevant guidelines and regulations. The study report follows the recommendations in the ARRIVE guidelines. Details regarding experimental animals and experimental procedures have been previously reported by Schreiber et al. [16]. Experiments were performed in three blocks of n = 4 animals, where the first two blocks belonged to the treatment group and the third block to the sham group. Since one animal died following infarct induction, the corresponding data were omitted from this study. No blinding was applied with respect to groups. Blinding applied with respect to outcome and data analysis is described in the section Manual segmentation. The same concept was applied to the image quality rating.

We thus included a total number of eleven pigs. In seven of these, myocardial infarction was induced by 90-min occlusion of the left anterior descending artery (LAD) using a balloon catheter inserted via a femoral coronary catheter, after baseline magnetic resonance imaging (MRI).

Four sham animals were used as a control group and received the same intervention with the exception of the balloon catheter inflation and occlusion of the coronary artery. Each of the animals underwent a total of four 7 T MRI scans: one baseline scan before the procedure (MRI 1) and three scans (MRI 2–4) at different times (3 ± 1 days, 12 ± 1 days, and 58 ± 1 days) after infarction or sham procedure [12].

MR images were acquired on a 7 T MAGNETOM Terra system (Siemens Healthineers, Erlangen, Germany). We used three in-house built 8Tx/16Rx coils [17] of different sizes to adapt to the increasing weight of the pigs throughout the study.

Scan parameters for high-resolution cine imaging were slice thickness: 6 mm, in-plane spatial resolution: 0.4 mm × 0.4 mm, TE/TR: 3.18/49.52 ms, echo spacing: 6.2 ms, bandwidth: 893 Hz/Px, and flip angle: optimal (15–27°). A short-axis stack includes 30 frames per cardiac cycle and 11–16 slices from base to apex. The measurements were performed under breath hold.

To assess the quality of the high-resolution cine images, each image in the end diastole and end systole was rated from one (best) to four (worst) based on three criteria (artefacts, noise, and general image quality). The scores were defined as (1) no artefacts/hardly any noise/very good image quality, (2) minor artefacts/noise/reduced image quality that does not affect the delineation of structures, (3) artefacts/noise/reduced image quality that affects the delineation of structures and may lead to misinterpretation, and (4) nondiagnostic image due to major disturbances. The three ratings were then summed to obtain a total score for each image ranging from three (best possible result) to twelve (worst possible result) [18].

Post-processing of the obtained MR images was performed using the commercially available software Medis Suite (QMass, Version 3.1.16.8, Medis Medical Imaging Systems, Leiden, Netherlands).

A standardized procedure was followed for manual segmentation of the short-axis cine stack [19]. The end-systolic and end-diastolic phases were selected based on the visually smallest and largest volume of the left ventricular (LV) blood pool, respectively. Epi- and endocardial borders of the myocardium were then delineated in these phases. Papillary muscles were not excluded from the blood pool, since both in- and exclusion are presented as valid approaches in the guidelines [19] and the original DL model is not trained to recognize and label papillary muscles.

After one observer completed the segmentation, it was repeated by the same observer after a period of at least one week to evaluate the intra-observer variability. In addition, all scans were segmented by a second observer to assess the inter-observer variability. The two examiners were blinded to each other's segmentation; only the end-diastole and end-systole were set to the same phases for all observers prior to segmentation to allow calculation of Dice scores.

All figures showing CMR images with myocardial contours were processed subsequently. To improve contours with respect to general visibility and colour-blind readers, green and red pixels of the epicardial and endocardial contours were re-coloured blue and magenta, respectively. We used Adobe Photoshop CS6 (Version 13.0, Adobe Systems Incorporated, San Jose, California, USA) for this purpose.

CMR analysis software usually provides tools for fully automated LV segmentation. We used Medis Suite (QMass, Version 3.1.16.8) for CMR post-processing, which is intended for clinical use in human patients. We tested the automatic tool in QMass on our 7 T images of porcine hearts.

The starting point for the deep learning was a pre-trained model published by Ankenbrand et al. [15]. This model has a U-Net architecture [20] with a ResNet34 backbone [21], implemented in fastai [22]. Pre-training was performed using cardiac MRI data from the "Data Science Bowl Cardiac Challenge Data" [23]. Prediction is done for three classes (background, left ventricular cavity, and left ventricular myocardium) on images scaled to 256 × 256 pixels.

To increase the amount of training data and make the predictions more consistent, various methods of data augmentation were applied. The images were rotated, flipped, and zoomed, and contrast and brightness were changed (flip [left–right], rotation [90°], lighting [0.4], and zoom [1.2]).

Scanning the eleven pigs four times each resulted in a total number of 44 scans. Four of those scans had to be excluded from the study as high-resolution short-axis cine stacks were not recorded during the measurements. The remaining forty scans (24 of infarct animals, 16 of sham animals) were divided into three different subsets. This was done animal-wise: six (four infarct and two sham) were assigned to the training set, two (one infarct and one sham) to the validation set and three (two infarct and one sham) to the test set. It was ensured that the animals were divided equally according to infarct or sham group. However, within the groups, the animals were distributed randomly. This resulted in a total of 560 training images, 212 validation images, and 288 images for the test set. Supplementary Table S1 shows the number of images per scan and the division into the subsets for transfer learning in detail.
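The animal-wise division described above can be sketched as follows. This is a minimal illustration, not the authors' code: the animal IDs, the seed, and the function name `split_animals` are hypothetical. Only the group sizes (4/1/2 infarct and 2/1/1 sham animals for training/validation/test) are taken from the text.

```python
import random

def split_animals(infarct_ids, sham_ids, seed=0):
    """Assign whole animals (never single images) to train/val/test.

    Group sizes follow the text: 4/1/2 infarct and 2/1/1 sham animals.
    The seed and the ID strings are illustrative assumptions.
    """
    rng = random.Random(seed)
    infarct = list(infarct_ids)
    sham = list(sham_ids)
    rng.shuffle(infarct)  # random assignment within the infarct group
    rng.shuffle(sham)     # random assignment within the sham group
    return {
        "train": infarct[:4] + sham[:2],
        "val": infarct[4:5] + sham[2:3],
        "test": infarct[5:7] + sham[3:4],
    }

infarct_pigs = [f"infarct_{i}" for i in range(1, 8)]  # 7 hypothetical IDs
sham_pigs = [f"sham_{i}" for i in range(1, 5)]        # 4 hypothetical IDs
subsets = split_animals(infarct_pigs, sham_pigs)
```

Splitting by animal rather than by image keeps all scans of one pig in a single subset, so the test set contains only animals the model has never seen.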

Re-training of the base model was performed in two steps. In the first step, all parameters except for those from the final parameter group were set as un-trainable (frozen). We trained for 100 epochs this way. An epoch is one full pass through the training data. We used the Adam optimizer [24] to minimize the general Dice loss as implemented in fastai version 2 [22]. At this stage the maximum learning rate, which determines how strongly the parameters are adjusted in each optimization step, was set to 10⁻⁴. Checkpoints of the model were saved every 10 epochs. In the second step, the models of all 10 checkpoints were compared with respect to their Dice scores on the validation set. The model with the highest Dice score was used as the basis for another 100 epochs with all parameters set as trainable (unfrozen) and a maximum learning rate of 10⁻⁵. Afterwards, the model with the overall highest Dice score on the validation set was selected for further analyses. A test set consisting of scans of three pigs (two infarct pigs and one sham pig, 288 images) was excluded from the training process to evaluate the performance of the model.
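The checkpoint selection between the two training steps amounts to picking the saved model with the highest validation Dice score. The sketch below shows only that bookkeeping; it does not train anything, and the Dice values in it are invented for illustration, not results from the study.

```python
def checkpoint_epochs(n_epochs=100, every=10):
    """Epochs at which a checkpoint is saved (every 10th of 100 epochs)."""
    return list(range(every, n_epochs + 1, every))

def best_checkpoint(val_dice_by_epoch):
    """Return the (epoch, Dice) pair with the highest validation Dice score."""
    return max(val_dice_by_epoch.items(), key=lambda item: item[1])

# Hypothetical validation Dice scores for the frozen stage (made-up numbers).
frozen_stage = {10: 0.80, 20: 0.84, 30: 0.86, 40: 0.85, 50: 0.87}
start_epoch, start_dice = best_checkpoint(frozen_stage)
# The checkpoint from `start_epoch` would seed the unfrozen stage.
```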

The results of the manual segmentation could be calculated directly in QMass, while the contours generated by the DL model had to be imported into the software first. Medis uses dedicated contour files (.con) to store contour information. DL generated contours were transferred into such a contour file and imported into Medis for further analysis.

Based on the segmentation, various cardiac parameters were calculated: ejection fraction (EF), stroke volume (SV), LV mass, end-systolic volume (ESV), and end-diastolic volume (EDV). EDV and ESV [ml] were calculated by summing the voxels within the endocardial contour of all slices of the end-diastole and end-systole, respectively. SV [ml] was calculated as EDV minus ESV. EF [%] is expressed as SV divided by EDV, multiplied by 100. LV mass [g] was calculated as the difference of the total epicardial and endocardial volume in end-diastole, multiplied by the specific density of myocardium (1.05 g/ml) [19].
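The formulas above can be written out as a short sketch. It assumes that each slice contributes its enclosed contour area times the slice thickness (6 mm in this study) with no gap between slices, which is equivalent to summing voxels; the function and parameter names are illustrative and not taken from QMass.

```python
MYOCARDIAL_DENSITY_G_PER_ML = 1.05  # specific density of myocardium from the text

def volume_ml(areas_mm2, slice_thickness_mm):
    """Sum per-slice areas (mm^2) times slice thickness (mm); 1 ml = 1000 mm^3."""
    return sum(areas_mm2) * slice_thickness_mm / 1000.0

def cardiac_parameters(endo_ed, endo_es, epi_ed, slice_thickness_mm=6.0):
    """Compute EDV, ESV, SV, EF and LV mass from per-slice contour areas."""
    edv = volume_ml(endo_ed, slice_thickness_mm)  # end-diastolic volume
    esv = volume_ml(endo_es, slice_thickness_mm)  # end-systolic volume
    sv = edv - esv                                # stroke volume
    ef = 100.0 * sv / edv                         # ejection fraction in percent
    epi_volume = volume_ml(epi_ed, slice_thickness_mm)
    lv_mass = (epi_volume - edv) * MYOCARDIAL_DENSITY_G_PER_ML
    return {"EDV": edv, "ESV": esv, "SV": sv, "EF": ef, "LV_mass": lv_mass}

# Two-slice toy example with made-up areas in mm^2.
params = cardiac_parameters(
    endo_ed=[1000.0, 2000.0], endo_es=[500.0, 1000.0], epi_ed=[2000.0, 3000.0]
)
```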

The following approach was taken in the overview assessment of the contours generated by the DL model. In some cases, the short-axis stack included images of the base of the heart that were above the part of the heart that guidelines suggest to segment. Therefore, only images that were also labelled manually were included in the evaluation. These were then examined and classified as labelled correctly, incorrectly, or not labelled at all. Any missing or incorrect contours could easily be manually added or adjusted in the software. This was intentionally avoided to be able to compare unedited results.

To quantify how close the automatically generated contours are to the manually drawn contours, we used two geometric metrics [25]:

The Dice score measures the volumetric overlap of two contours, with a value of 1 indicating perfect agreement and 0 indicating no agreement between the two contours. It was calculated for the left ventricle (DICE_LV) and the myocardium (DICE_MY). The Dice score of a contour A and a contour B is calculated as twice the overlap of the two contours, divided by the sum of the areas of A and B:

$$\text{Dice score}=\frac{2\cdot \left|A\cap B\right|}{\left|A\right|+\left|B\right|}.$$
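As a minimal sketch of this formula, the two labels can be represented as sets of pixel coordinates. The set representation and the handling of two empty masks (returning `None`, since the ratio is 0/0) are assumptions made here for illustration; the text does not specify them.

```python
def dice_score(mask_a, mask_b):
    """Dice score of two masks given as collections of (row, col) pixels."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        # 0/0 is undefined; returning None is a choice made in this sketch.
        return None
    return 2 * len(a & b) / (len(a) + len(b))

# Two 2x2 squares that share one column of two pixels.
square_a = {(0, 0), (0, 1), (1, 0), (1, 1)}
square_b = {(0, 1), (0, 2), (1, 1), (1, 2)}
overlap_score = dice_score(square_a, square_b)  # 2 * 2 / (4 + 4)
```

If only one of the two masks is empty, the numerator is zero and the function returns 0.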

The Hausdorff distance (HD) is the maximum distance between two contours; a low value therefore indicates high agreement. The HD of two contours A and B is calculated as follows: for each point a on contour A, the minimum distance to any point b on contour B is determined, giving the distance d(a, B). The largest of these distances over all points of A is then taken. The same procedure is applied in the other direction using the distances d(b, A). The HD is defined as the maximum of these two values:

$$HD(A,B)=\max \left\{ \underset{a\in A}{\max}\, d(a,B),\ \underset{b\in B}{\max}\, d\left(b,A\right)\right\}$$

with d(a, B) being the minimal distance from point a to contour B and d(b, A) being the minimal distance from point b to contour A.
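The definition can be sketched with contours given as lists of sampled (x, y) points. Using sampled points only approximates the distance to a continuous contour, and returning infinity whenever either contour is empty (including the case where both are empty) is an assumption of this sketch.

```python
import math

def directed_distance(contour_a, contour_b):
    """Largest d(a, B) over all a in A, with d(a, B) the minimal distance to B."""
    return max(min(math.dist(a, b) for b in contour_b) for a in contour_a)

def hausdorff_distance(contour_a, contour_b):
    """Symmetric Hausdorff distance between two point-sampled contours."""
    if not contour_a or not contour_b:
        return math.inf  # a missing contour gives an infinite distance
    return max(
        directed_distance(contour_a, contour_b),
        directed_distance(contour_b, contour_a),
    )

# Toy example: the point (4, 0) of B lies 3 units from the nearest point of A.
contour_a = [(0.0, 0.0), (1.0, 0.0)]
contour_b = [(0.0, 0.0), (4.0, 0.0)]
hd = hausdorff_distance(contour_a, contour_b)
```

The two directed distances differ here (1 from A to B, 3 from B to A), which is why the maximum of both directions is taken.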

Both metrics quantify how strongly the two compared contours agree mathematically.

Images where the DL model provides a label but the observer does not (and vice versa) result in a Dice score of 0 and an infinite HD.

We assessed the differences in clinical measures that were calculated based on the two methods of segmentation. All statistical analysis of the predicted cardiac parameters was done using OriginPro, Version 2021 (OriginLab Corporation, Northampton, Massachusetts, USA) and Microsoft Excel 2016 (Microsoft, Redmond, Washington, USA).

Continuous variables were checked for normal distribution using a Shapiro–Wilk test.

Paired Student's t-tests were performed to test for significant differences. Since for each parameter (EF, SV, LV mass, EDV, and ESV) four hypotheses were tested (observer one vs. repeat, observer one vs. observer two, observer one vs. DL model, and observer one vs. DL model including only scans in the test set), the overall α of 0.05 was adjusted according to a Bonferroni correction in order to decrease the risk of a type I error for multiple testing. Therefore, for each t-test a p-value of < 0.0125 was considered statistically significant. For the assessment of the intra-class correlation coefficient (ICC), we used a two-way mixed-effects model based on absolute agreement. It was calculated and interpreted according to the guidelines of Koo and Li: values < 0.5 were classified as poor, between 0.5 and 0.75 as moderate, between 0.75 and 0.9 as good, and > 0.9 as excellent [26]. The coefficient of variability (CoV) was calculated as the standard deviation of the difference divided by the mean of two values [27,28,29]. We used a Bland–Altman analysis to determine intra-observer and inter-observer variability, plotting the difference of the values against the mean of two values [30]. Additionally, Pearson correlation plots were created, and the corresponding r values were calculated.
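Three of these quantities reduce to short calculations, sketched below with the Python standard library. The sketch makes several assumptions not stated in the text: the CoV uses the sample standard deviation and the mean over all values of both methods and is returned as a fraction, and the Bland–Altman limits of agreement are taken as bias ± 1.96 standard deviations, which is the common convention. The example values are invented.

```python
import statistics

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

def coefficient_of_variability(x, y):
    """SD of the paired differences divided by the mean of all values."""
    diffs = [a - b for a, b in zip(x, y)]
    return statistics.stdev(diffs) / statistics.mean(list(x) + list(y))

def bland_altman(x, y):
    """Per-pair means and differences, plus bias and limits of agreement."""
    diffs = [a - b for a, b in zip(x, y)]
    means = [(a + b) / 2 for a, b in zip(x, y)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)  # assumed convention
    return {"means": means, "diffs": diffs, "bias": bias, "limits": limits}

threshold = bonferroni_alpha(0.05, 4)  # four hypotheses per parameter
method_one = [10.0, 20.0, 30.0]        # made-up values, e.g. volumes in ml
method_two = [12.0, 18.0, 33.0]
result = bland_altman(method_one, method_two)
cov = coefficient_of_variability(method_one, method_two)
```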
