Validation and reproducibility of cardiovascular 4D-flow MRI from two vendors using 2 × 2 parallel imaging acceleration in pulsatile flow phantom and in vivo with and without respiratory gating

Background 4D-flow magnetic resonance imaging (MRI) is increasingly used. Purpose To validate 4D-flow sequences in phantom and in vivo, comparing volume flow and kinetic energy (KE) head-to-head, with and without respiratory gating. Material and Methods Achieva dStream (Philips Healthcare) and MAGNETOM Aera (Siemens Healthcare) 1.5-T scanners were used. Phantom validation measured pulsatile, three-dimensional flow with 4D-flow MRI and laser particle imaging velocimetry (PIV) as reference standard. Ten healthy participants underwent three cardiac MRI examinations each, consisting of cine-imaging, 2D-flow (aorta, pulmonary artery), and 2 × 2 accelerated 4D-flow with (Resp+) and without (Resp−) respiratory gating. Examinations were acquired consecutively on both scanners and one examination repeated within two weeks. Volume flow in the great vessels was compared between 2D- and 4D-flow. KE were calculated for all time phases and voxels in the left ventricle. Results Phantom results showed high accuracy and precision for both scanners. In vivo, higher accuracy and precision (P < 0.001) was found for volume flow for the Aera prototype with Resp+ (–3.7 ± 10.4 mL, r = 0.89) compared to the Achieva product sequence (–17.8 ± 18.6 mL, r = 0.56). 4D-flow Resp− on Aera had somewhat larger bias (–9.3 ± 9.6 mL, r = 0.90) compared to Resp+ (P = 0.005). KE measurements showed larger differences between scanners on the same day compared to the same scanner at different days. Conclusion Sequence-specific in vivo validation of 4D-flow is needed before clinical use. 4D-flow with the Aera prototype sequence with a clinically acceptable acquisition time (<10 min) showed acceptable bias in healthy controls to be considered for clinical use. Intra-individual KE comparisons should use the same sequence.


Introduction
Quantification of the intracardiac or vascular blood flow in three dimensions over time using magnetic resonance imaging (MRI), so called 4D-flow, enables the study of novel physiological parameters revealing insights into pathophysiological conditions.
One example is kinetic energy (KE), which has shown differences between healthy volunteers and patients (1)(2)(3) and has been proposed to provide risk stratification of patients with various heart diseases (3). Furthermore, the use of 4D-flow has been proposed as an effective and more accurate method to quantify unstable and spatially complex flow patterns, e.g. in valvular lesions (3)(4)(5)(6). Also, one single 4D-flow acquisition may save time compared to several 2D-flows in patients (7)(8)(9). Validation of the used techniques are needed in order for 4D-flow to be utilized clinically and to compare results between centers. To this date, most validation studies have been limited to one vendor platform. Thus, there is a need to demonstrate how 4Dflow performs on scanners from different vendors. Furthermore, the robustness of measured physiological parameters, i.e. repeatability, is most often studied using a scan and rescan at the same imaging occasion (10), but to be a reliable biomarker the measured parameter should be reproducible, i.e. stable over a longer time period (10). Finally, the data acquisition needs to be fast enough to fit into a clinical protocol. One way to achieve this is to acquire the images without respiratory navigator gating, which lowers the scanning time by up to 60% (11). The use of non-respiratory-gated 4D-flow in patients and controls with moderate parallel imaging acceleration using acceleration factor 2 has previously been validated (11). Modern scanner hardware allows for higher acceleration factors due to better signal-tonoise ratio, enabling further reduction of scan times by increasing the parallel imaging acceleration factor beyond 2 (12). However, the effect of higher parallel imaging acceleration factors on 4D-flow data quality with and without respiratory gating is not known. Therefore, the aims of this study were to validate highly accelerated 4D-flow sequences from two vendors head-to-head with respect to flow volumes and KE by performing scans in vitro in a three-dimensional (3D) pulsatile phantom setup with laser particle image velocimetry (PIV) as the reference standard, and in vivo: (i) with and without respiratory gating; (ii) on different platforms on the same day for reproducibility; and (iii) on the same scanner on different days for repeatability.

Material and Methods
The regional ethical committee approved the study and written informed consent was obtained from each subject. Healthy controls (n ¼ 10, see Table 2) underwent 1.5-T cardiac MRI scans including 2D cine in long and shortaxis planes during breath-hold, 2D flow of the aorta and pulmonary artery during free-breathing, and 4D-flow. The scanners used were a MAGNETOM Aera, Siemens Healthcare, Erlangen, Germany and an Achieva dStream, Philips Healthcare, Best, The Netherlands. The study protocol included: (i) repeating the scan back-to-back on the same day with the two different scanners for reproducibility test (10); and (ii) repeating the examination on one of the scanners within two weeks for repeatability (10). To minimize differences caused by circadian rhythm on cardiac physiology, all MRI scans were performed after 13:00 and the order of the two scans alternated between individuals. Additionally, all participants fasted 2 h before the measurements and were provided with a small snack (fruit and pastry) and water before each scan.
Whole heart 4D-flow with retrospective ECG-triggering and 2 Â 2 acceleration (GRAPPA and SENSE reconstruction for Aera and Achieva, respectively) were acquired with (Respþ) and without (RespÀ) respiratory navigator gating. On the Siemens scanner, a prototype 4D-flow sequence (WIP 785K) enabling retrospective ECG-triggering was used. The software version at Achieva dStream was R5.1.7. Typical parameters for 4D-flow are shown in Table 1. Number of time phases acquired depended on heart rate and was set to the maximum with a preserved segmentation factor/views per segment of 2 and images were then reconstructed to 40 time phases. The images covered the entire heart in an oblique transversal view, with phase encoding (fold-over direction) in the anteroposterior direction.

Image analysis
All MR images were analyzed using the Segment software (http://segment.heiberg.se) with an in-house developed module for 4D-PC-MRI analysis (13). Quality was graded by two observers (with ten and seven years of experience with 4D-flow, respectively) in consensus on a four-grade scale of 0-3, where 0 is excellent quality, 1 is good, 2 is acceptable, and 3 is inadequate image quality warranting exclusion of data. A first-order phase background correction (14,15) and phase unwrapping (16) was performed before analysis.
The integrated flow per heartbeat (flow volume) was calculated from the ascending and descending aorta and pulmonary artery from both 2D-and 4D-flow acquisitions in identical imaging planes. The vessels were semi-automatically outlined in the 2D-flow images and the contours transferred to the 4D-flow. The ratio of pulmonary to aortic (systemic) flow (QP/ QS) was calculated as this is a common clinical metric obtained from cardiac MRI.
The endocardium of the right and left ventricles was manually outlined in all time phases of the cine shortaxis stack and the delineations transferred to the 4Dflow dataset. KE for each voxel in the ventricle was calculated as KE ¼ ½mv 2 (m is mass of the voxel and v is the velocity in the voxel) and summed over the ventricle for each time phase.
Stroke volume (SV) from cine images was measured for comparison to flow to determine if 4D-flow underestimates or 2D-flow overestimates SV. Inter-observer variability was determined by a second observer for 66 vessels in 22 4D-flow datasets randomly selected from both scanners.

In vitro validation
Phantom validation was performed as shown in Fig. 1a. The phantom setup enables measurement of a pulsatile and fully 3D water flow with both 4D-flow MRI and laser PIV as the reference standard. Five different pump programs were used, with SV in the range of 12-37 mL. Acquisition parameters for 4D-flow were similar to the in vivo scans described above (see Supplementary file for more detailed information). Laser PIV resulted in a spatial resolution of 1.5 Â 1.5 mm 2 in a single sagittal slice along the main flow direction and a temporal resolution of 10 ms. Gadolinium contrast agent (Dotarem, Guerbet, France) was added to the water to achieve a T1 relaxation time roughly comparable to that of blood (1350-1550 ms). Pump stability was quantified by measuring SV before and after each 4D-flow acquisition using the timer and beaker method.
Linear background phase correction was performed for all phantom 4D flow datasets. A linear fit was performed in stationary areas of the phantom in each dataset and the fit then subtracted. Comparison between 4D-flow and laser PIV data was performed on the basis of velocity values along the main vortex flow direction (feet-head) and by computing KE in the vortex ring (17).

Statistical analysis
Statistical analysis was performed using GraphPad (v5, La Jolla, CA, USA). Values are presented as means AE standard deviation (SD). Differences in results were assessed using the non-parametric paired Wilcoxon test. Results with a P value < 0.05 were considered statistically significant. Pearson's correlation analysis and Bland-Altman analysis were performed and presented as bias AE SD.

In vivo validation
In total, 57 4D-flow scans were acquired in the ten individuals (characteristics summarized in Table 2). One participant could not undergo repeated MR within 14 days due to logistical reasons and two acquisitions were therefore not performed; in addition, one acquisition (Achieva) did not reconstruct correctly and images were therefore lost. The rescans were performed 6 AE 3 days after the first examination.
Average scan duration for Aera was 9 AE 3 (Respþ) and 6 AE 2 (RespÀ) min, and for Achieva 17 AE 3 (Respþ) and 10 AE 1 (RespÀ) min. Image quality was better for Aera compared to Achieva for both respiratory (0.7 AE 0.6 vs. 2.0 AE 0.7, lower means better image quality) and non-respiratory-gated (0.9 AE 0.6 vs. 2.5 AE 0.5) acquisitions. Intra-scanner comparison showed no difference in image quality between respiratory and nonrespiratory triggered acquisitions (Fig. 2). On the Aera, no 4D-flow acquisitions had inadequate image quality but 4 Respþ (31%) and 8 RespÀ (57%) 4D-flow acquisitions had inadequate image quality on the Achieva. Data are presented for all acquired data and with exclusion of acquisitions with inadequate image quality.
Examples of flow curves from one individual are shown in Fig. 3. Compared to standard 2D-flow, 4Dflow underestimated the flow volume on both Aera and Achieva (Table 3, Figs. 4 and 5) scanners. 4D-flow scans with respiratory gating on Aera showed the lowest bias of all scans and had higher precision both compared to Aera 4D-flow RespÀ (P ¼ 0.005) and Achieva Respþ (P < 0.001). Bias was not different for Respþ 4D-flow compared to RespÀ on Achieva (P ¼ 0.87). The QP/QS was slightly higher for 4D-flow compared to 2D-flow for both scanners ( Table 3). The use of respiratory gating did not change the QP/QS for Aera (P ¼ 0.89) or Achieva (P ¼ 0.057).
SV from 2D-flow before and after the 4D-flow acquisitions showed low bias and variability (0.6 AE 8.0 mL), which indicates that there was no systematic physiological drift in SV during scans.
KE curves over the heart cycle with in vivo 4D-flow showed that late diastolic peak KE and mean KE were lower on the Aera compared to the Achieva (Table 4) when using navigator respiratory gating (Respþ). The use of respiratory gating was associated with lower peak KE in systole and late diastole for Aera but higher systolic and lower early diastolic peak KE for Achieva ( Table 4).
The repeatability of KE for both scanners is shown in Table 5. Notably, the bias for a 4D-flow acquisition compared to a repeated acquisition within 14 days on the same scanner was generally lower compared to the bias comparing scans from the two vendors on the same day.

Discussion
This study validated 4D-flow sequences with high acceleration from two different vendors' MR scanners    in vivo measurements and highlights the importance of rigorous validation at each site before clinical use. The flow results from the 4D-flow prototype sequence show the possibility for 4D-flow to replace repeated 2D-flow measurements. Furthermore, this study showed the possible use of free breathing 4D-flow acquisitions without respiratory gating with high acceleration on the Aera with the prototype sequence with low differences in bias compared to results from respiratory gating and thus a further possibility to decrease 4Dflow scan time. KE measurements show differences between scanners on the same day that are larger than the differences in KE from the same scanner over time. This means that longitudinal KE measurements need to be performed using the same MR scanner.
A previous multi-center, multi-vendor 2D-flow validation showed that velocity offset varied between sites having the same vendor and sequence (18). This may also apply to 4D-flow and the suboptimal performance of 4D-flow on the Achieva scanner in this study may be scanner-or software-specific rather than  vendor-specific. However, of note, a recent 4D-flow two-center validation study performed on Philips Ingenia scanners showed similar correlation between aortic and mitral flows as in our study (19). Until a multi-center, multi-vendor 4D-flow validation study has shown agreement between scanners, validation at each site performing 4D-flow is therefore needed before clinical or research use of this technique. The proposed validation in this study compared 2D-flows in the aorta and pulmonary artery. The calculated QP/QS was in the range of 0.9-1.3, which is deemed to be acceptable for clinical use for shunt quantifications.

Relation to earlier studies
Our results on the Philips Achieva dStream show similar correlations as Garg et al. using a 1.5-T Philips Ingenia without respiratory navigating using a spoiled gradient echo sequence with k-segmentation of 2 and SENSE 2 in the phase encoding plane (19). Garg et al. found a modest correlation for net-flow consistency between aortic outflow and mitral inflow (r ¼ 0.58, y ¼ 0.74x þ 26.99) similar to our results comparing 4D-flows to 2D-flows (r ¼ 0.51, y ¼ 0.53x þ 21.99). They concluded that the 4D-flow using EPI-acceleration showed better image quality and higher consistency compared to the SENSE-accelerated sequence, however only with a modest correlation to 2D-flow for velocities. Previous work showed better accuracy on 4D-flow on Philips Achieva 3-T compared to 1.5-T Achieva with SENSE-acceleration of 2 and k-segmentation 2 and that acceleration with k-t-BLAST causes underestimation of flow volumes (12). This underestimation of velocity and flow using  k-t-BLAST was also replicated in the study by Garg et al. (19) but not seen in the 3-T study using a 32channel coil (20). We have previously shown a low bias of 4D-flow with SENSE-acceleration of 2 and ksegmentation 2, without respiratory gating validating the mitral inflows against planimetric SV in healthy volunteers and patients (11), and the RespÀ results of this study reinforce this view. Kamphuis et al. recently evaluated scan-rescan SV using an EPI-accelerated 4D-flow sequence compared to planimetric SV on a 3-T Philips Ingenia system (21) and found lower SV from 4D-flow but good 4D in-scan consistency. Compared to our study with comparison of 2D-flow and 4D-flow on a 1.5-T Philips scanner, Kamphuis et al. found lower bias (bias on first scan: -11 mL, 95% CI ¼ -34--12 mL) on the 3-T system. Notably, the bias from Kamphuis et al. are similar to our results from Siemens Aera 1.5T without respiratory navigator. Petersson et al. showed lower correlation between aortic and pulmonary flows with cartesian 4D-flow (SENSE 2 and k-segmentation 2, r ¼ 0.73, y ¼ 0.79x þ 24.55) compared to a spiral readout 4Dflow sequence (r ¼ 0.94, y ¼ 1.03 þ 3.56) on a 1.5-T Philips Achieva (22). Of note, the correlation in their study was calculated between two flows obtained from 4D-flows and not compared to an external reference such as 2D-flow or planimetric SV.
The bias in our study is lower compared to Hanneman et al. who compared 2D-and 4D-flow for QP/QS in patients with a prospectively triggered 4Dflow sequence in a 1.5-T Siemens Avanto (23). They found bias of -21.9 AE 12.2 mL for the pulmonary artery and -10.7 AE 13.1 mL for aorta; the lower bias in our study may be due to healthy volunteers scanned but may also be due to improvements of the 4D-flow prototype. Valverde et al. validated 4D-flow during free breathing with two-fold SENSE acceleration in patients with Fontan physiology to 2D-flow and showed bias low enough to motivate 4D-flow to replace the longer scan times of repeated 2D-flow acquisitions (9).
Frydrychowicz et al. compared blood-pool contrastenhanced radially under-sampled PC-4D-flow (VIPR) to 2D-flow and cine SV on a 3-T GE scanner and found that phantom data correction was necessary to reach accuracy needed for clinical scans (24). Of note, the bias in their study was comparable to our study at the 1.5-T Aera.
Recently, Stoll et al. showed test-retest variability of KE at a 3-T Siemens Trio and showed a higher coefficient of variation for KE at two different time points (6.2-16.1%) compared to scan-rescan at the same day (3.5-17.7%) (25).
In summary, our results using the prototype sequence on the Aera scanner compare favorably to previously published results, and the results without respiratory navigation compare well to results from 3-T scans. The results from the Philips sequence with Cartesian k-space acquisition and SENSE acceleration of 4 are comparable to previously published results with lower acceleration factors but the number of scans with inadequate image quality and the rather large bias should provide caution to use as high acceleration factors as in our study.
The Philips sequence has shown high accuracy and precision in flow phantoms, both for two-fold SENSE acceleration in a previous study (26) and for 2 Â 2 SENSE in the present results. However, the in vivo results using 2 Â 2 SENSE showed an underestimation of SV compared to 2D-flow. Possible explanations for this discrepancy may include the simplified, symmetric geometry in the phantom compared to human anatomy, an absence of respiratory movement, and less pronounced background phase offsets introduced by the phantom compared to those of the human body.
4D-flow tends to underestimate peak flow velocities due to the lower temporal resolution compared to 2D-flow. This could give lower peak flows as seen in Fig. 4 for Achieva. The phantom validation was only done up to 40 cm/s but the correlation analysis showed that the Achieve had a slope of 0.91, meaning that even velocities as low as 40 cm/s tended to be underestimated. The Aera had a slope of 0.98, meaning that there was little underestimation in this velocity range. Thus, there was no underestimation of KE with the Aera. Further validation studies of 4D-flow are needed to show to what degree higher peak velocities in patients are underestimated. These studies should probably be done in comparison with Doppler ultrasound as 2D-flow MRI may also have problems in detecting the peak velocity in a stenosis, mainly due to the problem of finding the correct imaging plane for detecting the peak velocity with 2D techniques. Indeed, Jarvis et al. showed that 4D-flow showed higher peak velocities compared to 2D-flow but no difference between 4D-flow and Doppler ultrasound (27).
This study included a relatively small population and did not include patients. The reason for not including patients are that three repeated scans with two 4D-flow acquisitions each was used in our study protocol and this is neither feasible nor ethical to ask of patients.
In conclusion, intracardiac 4D-flow accelerated by 2 Â 2 parallel imaging could be acquired with adequate quality even without respiratory gating and within a clinically feasible time window (10 min) using a retrospectively ECG-gated prototype 4D-flow sequence on the Aera. Even though in vitro phantom validation showed high accuracy of velocities and KE on both scanners, the in vivo accuracy differed significantly. Therefore, this study shows the importance of in vivo validation of 4D-flow for each scanner and sequence before clinical use. Patients followed over time with KE estimation need to be examined using the same scanner.

Acknowledgments
Philips Healthcare and Siemens Healthcare are gratefully acknowledged for research collaborations making the 4Dflow sequences available.

Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Karin Markenroth Bloch was an employee of Philips Healthcare during the acquisition of the study; Andreas Greiser and Frederik Testud are employees of Siemens Healthcare. The other authors declare no conflicts of interest.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was funded by the Swedish Research Council (2016-01617); Region of Ska˚ne, Sweden; Medical Faculty at Lund University, Sweden; Swedish Heart and Lung Foundation; Swedish Medical Association.