Introduction
The past decade has seen a gradual increase in the use of the Monte Carlo (MC) method for dose calculation in clinical radiotherapy, and a number of commercial treatment planning systems (TPSs) have started to offer MC dose engines as an option.1–4 To increase the calculation speed, numerous variance reduction techniques, such as history repetition, Russian roulette, and photon splitting, are employed.5 In most MC dose engines for treatment planning, particles are tracked and energy is scored in uniform geometric voxels that are resampled from the original computed tomography (CT) image data at a decreased spatial resolution. The original CT resolution of 512 × 512 pixels per slice is not usually used in MC simulations, mainly because the calculations would be very slow: an extremely large number of particles is required to reduce the statistical uncertainty of small voxels to an acceptable level.
It is well known that the voxel size affects the speed of MC calculations. Kawrakow6 demonstrated that the central processing unit (CPU) time spent by the MC code VMC++ on voxel-to-voxel boundary crossing and energy loss deposition is proportional to the number of voxel boundaries crossed by the particle track and thus inversely proportional to the voxel size. A small voxel size gives better dose resolution but results in longer calculation times. A large voxel size increases the calculation speed, since both the number of voxel boundary crossings and the number of particles required to achieve a given statistical uncertainty are reduced. The choice of voxel size in MC simulations is therefore a trade-off between precision and speed. The voxel size (eg, 5 or 2 mm) is chosen at the beginning of the simulation to uniformly discretize the dose volume.
Several techniques have been investigated to increase the speed of MC simulations based on the geometric voxel representation. One approach is to use a scoring grid with different voxel sizes.7–9 The scoring grid is used for energy deposition only, while the CT scan data reside in a separate "geometry grid" used for particle transport. The scoring voxels are either groups of original CT voxels or separate superimposed spherical voxels, and their size and spacing are predefined by the user. Another approach that allows nonuniform voxel sizes is to use an octree as a compression algorithm to represent the CT data.10,11 An octree is a data structure in which each node has 8 subnodes; it is built by recursively subdividing a cubic volume into symmetric octants. This technique has been used in the general purpose MC code GEANT412 to reduce memory consumption as well as to decrease calculation time. Other approaches include the region-oriented approach13 and the non-voxelated approach.14,15 The region-oriented approach segments the CT image volume into regions of homogeneous composition and density; it was implemented in the GEANT4 toolkit,12 and the calculation time was reported to decrease by up to a factor of 15. The non-voxelated approach isolates regions of uniform density and composition from the scoring grid to minimize the number of boundary crossings; it was implemented in DOSXYZnrc,16 with a reported speedup factor of up to 1.2 for CT phantoms. Yet another technique for increasing the calculation speed is the macro MC method, which applies a precalculated particle path library, generated in a well-defined simple local geometry, to a patient's heterogeneous global geometry; large steps are taken through the global geometry to accelerate the calculations.17
In this work, we postulate that by creating multi-resolution nonuniform voxels, with smaller voxels in regions of large density gradient and larger voxels in smooth regions, we can increase the simulation speed while maintaining similar accuracy. To illustrate the potential improvements in calculation time and efficiency from this strategy, we devised a volume decomposition technique that discretizes a CT image volume into multi-resolution nonuniform voxels. The algorithm employs an adaptive mesh refinement (AMR) technique,18 which starts with a uniform coarse mesh and then recursively splits each coarse voxel into 2 subvoxels whenever the density variation within the voxel exceeds a preset limit. The AMR technique is widely used in hydrodynamics to resolve the details of shock waves by locally changing the grid spacing; there, the mesh is refined dynamically over time. In our work, we use static mesh refinement in the sense that the nonuniform mesh is built beforehand and stays the same during the calculation. All neighbor information for a voxel is stored in memory as a tree-like data structure; therefore, the voxel containing a given particle location, as required by the tracking algorithm, can be found quickly.
Materials and Methods
Algorithm to Generate Nonuniform Voxels
The multi-resolution subdivision (MRS) procedure to generate nonuniform voxels of variable size from a three-dimensional (3D) CT image volume is carried out as follows:
1. The 3D CT image volume is converted to the mass density volume pixel by pixel, without changing the resolution, using a CT number-to-density calibration curve. The end result is a 3D density volume with dimensions of 512 × 512 × (number of slices).
2. The density volume is then discretized into a coarse mesh. Generally, we keep the slice thickness and use a 5 mm spacing in both the X and Y directions. The density assigned to a coarse voxel is the average of the densities of the CT voxels whose centers lie inside it; partial CT voxels within the coarse voxel are neglected. We also calculate the average density for each half of the coarse voxel along the X, Y, and Z directions.
3. The density variation ΔDx,y,z along the X, Y, and Z directions within a coarse voxel is calculated as the difference between the average densities of the 2 halves of the voxel in that direction. The largest density difference and its associated direction are stored for each coarse voxel.
4. If the density variation (ΔDx, ΔDy, or ΔDz) in the coarse voxel is greater than a predefined split criterion (eg, 0.5 g/cm³), the voxel is divided in half along the stored direction; otherwise, the voxel remains intact. The average densities stored in step 2 are assigned to the 2 subvoxels.
5. All voxels, including newly created subvoxels, are scrutinized repeatedly for further splitting until the density variation within every voxel is below the criterion (see the sketch following this list). Finally, a material index is assigned to each voxel according to the density range-to-material table.
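To make steps 3 to 5 concrete, the following C++ sketch shows the splitting loop. It is a minimal illustration under our own naming (Voxel, DensityFn, halfMean, and refine are hypothetical, not the actual GMC code), with the CT density volume abstracted as a position-dependent function:

```cpp
#include <cmath>
#include <vector>
#include <functional>

// The density of the underlying CT volume, abstracted as a function of position.
using DensityFn = std::function<double(double x, double y, double z)>;

struct Voxel { double lo[3], hi[3]; double density; };

// Average density of one half of a voxel, estimated by sampling the density
// volume on a small grid (a stand-in for averaging the CT voxels whose
// centers fall inside that half).
double halfMean(const Voxel& v, int axis, int half, const DensityFn& rho) {
    double lo[3] = { v.lo[0], v.lo[1], v.lo[2] };
    double hi[3] = { v.hi[0], v.hi[1], v.hi[2] };
    double mid = 0.5 * (lo[axis] + hi[axis]);
    (half == 0 ? hi[axis] : lo[axis]) = mid;
    const int n = 4;
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < n; ++k)
                sum += rho(lo[0] + (i + 0.5) * (hi[0] - lo[0]) / n,
                           lo[1] + (j + 0.5) * (hi[1] - lo[1]) / n,
                           lo[2] + (k + 0.5) * (hi[2] - lo[2]) / n);
    return sum / (n * n * n);
}

// Split voxels in half along the axis of largest density variation until every
// voxel is below the criterion (eg, 0.5 g/cm^3) or has reached a minimum size
// (in practice, the CT resolution).
void refine(std::vector<Voxel>& mesh, const DensityFn& rho,
            double criterion, double minSize) {
    for (std::size_t i = 0; i < mesh.size(); /* advanced in loop body */) {
        int bestAxis = -1;
        double largest = criterion, means[3][2];
        for (int a = 0; a < 3; ++a) {
            if (mesh[i].hi[a] - mesh[i].lo[a] < 2.0 * minSize) continue;
            means[a][0] = halfMean(mesh[i], a, 0, rho);
            means[a][1] = halfMean(mesh[i], a, 1, rho);
            double dd = std::fabs(means[a][0] - means[a][1]);
            if (dd > largest) { largest = dd; bestAxis = a; }
        }
        if (bestAxis < 0) { ++i; continue; }      // voxel stays intact
        Voxel left = mesh[i], right = mesh[i];
        double mid = 0.5 * (mesh[i].lo[bestAxis] + mesh[i].hi[bestAxis]);
        left.hi[bestAxis] = mid;
        right.lo[bestAxis] = mid;
        left.density = means[bestAxis][0];        // stored averages assigned
        right.density = means[bestAxis][1];       // to the 2 subvoxels
        mesh[i] = left;                           // left half is re-tested next
        mesh.push_back(right);                    // right half is tested later
    }
}
```

Because the loop does not advance past a voxel it has just split, both subvoxels are re-examined, which realizes the repeated scrutiny of step 5.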
Figure 1 illustrates the above MRS procedure. It shows a coarse voxel that is divided in half along the X direction (green cut plane) during the first pass and whose right subvoxel is further divided in half along the Y direction (red cut plane) during the second pass. To refine a coarse voxel of 5 × 5 × 3 mm³ down to voxels of 1.25 × 1.25 × 3 mm³, 4 passes are needed, since each pass splits along only 1 direction. The generated voxels are stored in a file with an unstructured format that includes information such as voxel indices, voxel neighbor connectivity, density, and material.
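For illustration, one record of such an unstructured file might look like the following sketch; the field names and layout are assumptions, not the actual GMC file format:

```cpp
// Illustrative record for one nonuniform voxel in the unstructured file
// (assumed layout; the actual GMC format is not specified here).
struct VoxelRecord {
    int   index;        // voxel index in the mesh
    int   neighbor[6];  // face-neighbor node ids (-X,+X,-Y,+Y,-Z,+Z); a split
                        // neighbor is resolved through the tree at run time
    float density;      // assigned average density (g/cm^3)
    int   material;     // index from the density range-to-material table
};
```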
The above MRS steps are based on the hypothesis that large dose gradients occur in heterogeneous regions such as tissue–bone interfaces. This is a reasonable assumption, since in model-based dose methods the energy deposition is scaled by the relative electron density. The other cause of large dose gradients is the beam penumbra. To account for it, we apply a 5 mm margin to the beam edges and carry out ray tracing from the margin area; voxels that are crossed by the rays and have not been divided previously are further subdivided into smaller voxels.
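As an illustration of this penumbra step, the sketch below marches along a ray from the source through the beam-edge margin and flags the voxels it crosses for later subdivision. The linear-scan findVoxel and the fixed 0.5 mm marching step are illustrative simplifications (the real code can locate voxels through the tree structure):

```cpp
#include <cmath>
#include <vector>
#include <cstddef>

struct Voxel { double lo[3], hi[3]; double density; bool refine; };

// Index of the voxel containing point p (linear scan for simplicity),
// or -1 if p lies outside the mesh.
int findVoxel(const std::vector<Voxel>& mesh, const double p[3]) {
    for (std::size_t i = 0; i < mesh.size(); ++i) {
        const Voxel& v = mesh[i];
        if (p[0] >= v.lo[0] && p[0] < v.hi[0] &&
            p[1] >= v.lo[1] && p[1] < v.hi[1] &&
            p[2] >= v.lo[2] && p[2] < v.hi[2])
            return static_cast<int>(i);
    }
    return -1;
}

// Flag every voxel crossed by a ray from 'src' through 'edge' (a point on
// the 5 mm beam-edge margin) for further subdivision.
void flagPenumbraVoxels(std::vector<Voxel>& mesh,
                        const double src[3], const double edge[3]) {
    double dir[3], len = 0.0;
    for (int a = 0; a < 3; ++a) { dir[a] = edge[a] - src[a]; len += dir[a] * dir[a]; }
    len = std::sqrt(len);
    const double step = 0.05;                      // 0.5 mm marching step (cm)
    for (double t = 0.0; t < 3.0 * len; t += step) {  // continue past the edge
        double p[3] = { src[0] + dir[0] / len * t,
                        src[1] + dir[1] / len * t,
                        src[2] + dir[2] / len * t };
        int i = findVoxel(mesh, p);
        if (i >= 0) mesh[i].refine = true;         // subdivide later if coarse
    }
}
```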
Monte Carlo Tracking Algorithm
To couple with the proposed nonuniform geometry, we rewrote an MC code in C++, called GMC, based on the dose planning method (DPM) MC code,19 a fast MC program for patient dose calculations. As in DPM, Compton scattering, photoelectric ionization, and pair production were considered for photon transport. These 3 processes were simulated in an analog fashion; that is, every interaction was modeled individually until the energy of the particle fell below a cutoff energy or the particle escaped the simulation volume. For other details of the physics implemented, readers may refer to DPM.19 Here we only emphasize the boundary crossing algorithm.
As implemented in DPM, electrons move across voxel boundaries, and the material change in the next voxel is taken into account. The distance to the voxel boundary is calculated along with the distance to the next hard interaction, and whichever event is closer (boundary crossing or hard interaction) is sampled. In a uniform mesh, a voxel is identified by 3 indices (I, J, and K), and the index of the next voxel after a boundary crossing is simply adjusted by ±1 according to the direction of the particle. This no longer holds for a nonuniform voxel mesh, since a voxel cannot be indexed by I, J, and K; the tracking subroutines therefore have to be modified to accommodate this change.
The calculation geometry in our work consists of a number of nonuniform voxels: a hybrid mesh with variable voxel sizes. The data structure for this hybrid mesh conceptually consists of a number of root nodes, each representing a binary tree. Information such as the parent node, the neighboring voxels, and the connectivity of the 2 child voxels is stored in each tree node for quick retrieval, along with a tag indicating whether the voxel is split. To find the next voxel after a boundary crossing, the tag is checked first. If the neighbor is not split, the next voxel is that leaf node; otherwise, its child voxels are searched recursively until the undivided subvoxel containing the particle is found. The block diagram in Figure 2 illustrates this process.
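A minimal sketch of this lookup follows, assuming a binary-tree node with a split tag, a split plane, children, and face-neighbor pointers (the names are ours, not the GMC source):

```cpp
// Illustrative binary-tree node for the hybrid mesh: split nodes carry the
// split axis/plane and 2 children; leaves are undivided voxels.
struct Node {
    bool   split;        // tag: has this voxel been divided?
    int    axis;         // split direction (0=X, 1=Y, 2=Z) if split
    double plane;        // split plane coordinate if split
    Node*  child[2];     // the 2 subvoxels if split
    Node*  neighbor[6];  // face neighbors (-X,+X,-Y,+Y,-Z,+Z); a neighbor
                         // may itself be split and must be descended
};

// After crossing face 'face' of the current leaf, find the undivided
// subvoxel of the neighbor that contains the particle position p.
Node* nextVoxel(Node* leaf, int face, const double p[3]) {
    Node* n = leaf->neighbor[face];
    while (n && n->split) {
        // Descend toward the child on the particle's side of the split plane.
        n = n->child[p[n->axis] < n->plane ? 0 : 1];
    }
    return n;  // undivided voxel, or nullptr if the particle left the mesh
}
```

Because the tree depth is bounded by the number of splitting passes, this descent adds only a few pointer dereferences per boundary crossing compared with the ±1 index update of a uniform mesh.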
Statistics and Dose Evaluation
The efficiency (∊) of an MC algorithm is defined20 as ∊ = (σ²T)⁻¹, where σ² is the variance and T is the time required to obtain this variance. The efficiency expresses how fast an MC algorithm can reach a desired level of statistical accuracy. This metric was introduced to compare the performance of MC packages with different variance reduction techniques, for which the number of simulated histories alone does not reflect the true performance. For the calculation of σ², the average uncertainty proposed by Rogers and Mohan20 is generally employed:

σ² = (1/N₅₀) Σᵢ (ΔDᵢ/Dᵢ)²,

where Dᵢ is the dose in voxel i, ΔDᵢ is its statistical uncertainty, and the sum runs over the N₅₀ voxels with a dose greater than 50% of the maximum. If N is the total number of simulated histories, then σ² scales as 1/N while T scales as N; hence, the efficiency is independent of N but depends on the simulation time per history.
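As a direct transcription of these formulas (not code from the GMC package), the following sketch computes σ² over the voxels above half the maximum dose and the resulting efficiency ∊ = (σ²T)⁻¹:

```cpp
#include <vector>
#include <algorithm>
#include <cstddef>

// Average relative variance over voxels receiving more than 50% of the
// maximum dose, and the resulting efficiency eps = 1 / (sigma^2 * T).
double efficiency(const std::vector<double>& D,   // dose per voxel
                  const std::vector<double>& dD,  // 1-sigma uncertainty per voxel
                  double seconds)                 // CPU time T
{
    double dmax = *std::max_element(D.begin(), D.end());
    double sum = 0.0;
    std::size_t count = 0;                        // N_50 in the text
    for (std::size_t i = 0; i < D.size(); ++i) {
        if (D[i] > 0.5 * dmax) {
            double rel = dD[i] / D[i];
            sum += rel * rel;
            ++count;
        }
    }
    double sigma2 = sum / count;                  // average relative variance
    return 1.0 / (sigma2 * seconds);              // independent of history count
}
```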
For dosimetric evaluation, the γ index21 is widely used to compare dose distributions. The γ index is defined for each reference dose point rᵣ as

γ(rᵣ) = min{Γ(rₑ, rᵣ)} over all rₑ,  with  Γ(rₑ, rᵣ) = √(|rₑ − rᵣ|²/Δd² + [Dₑ(rₑ) − Dᵣ(rᵣ)]²/ΔD²),

where rₑ refers to a dose point in the evaluation distribution and Γ is simply the Euclidean distance in the renormalized dose–distance space. In our work, we used Δd = 3 mm as the acceptance criterion for the distance to agreement and ΔD = 3% as the acceptance criterion for the dose difference.22 Since all dose distributions are evaluated against the fine-grid (1.25 mm) dose distribution as the reference, the acceptance criterion Δd = 3 mm is reasonable. The criterion for an acceptable comparison is γavg ≤ 1, and the passing rate is defined as the ratio of the number of voxels with γ < 1 to the total number of voxels. The evaluation distribution in the γ index calculations consists of the dose points from the unstructured nonuniform mesh (at the voxel centers). All calculations were performed on a personal computer with a 3.3 GHz CPU and 3 GB of memory.
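For illustration, a brute-force evaluation of γ at one reference point follows directly from the definition above; a practical implementation would restrict the search to evaluation points near rᵣ for speed:

```cpp
#include <cmath>
#include <vector>
#include <limits>

// A dose sample point; dose is assumed normalized (% of the maximum dose)
// so that the 3% criterion applies directly.
struct DosePoint { double x, y, z; double dose; };

// Gamma at one reference point: the minimum renormalized dose-distance
// over all evaluation points, with dd = 3 mm and dD = 3% criteria.
double gammaIndex(const DosePoint& ref,
                  const std::vector<DosePoint>& eval,
                  double dd = 3.0, double dD = 3.0)
{
    double best = std::numeric_limits<double>::max();
    for (const DosePoint& e : eval) {
        double r2 = (e.x - ref.x) * (e.x - ref.x)
                  + (e.y - ref.y) * (e.y - ref.y)
                  + (e.z - ref.z) * (e.z - ref.z);   // squared distance (mm^2)
        double d  = e.dose - ref.dose;               // dose difference (%)
        double G2 = r2 / (dd * dd) + (d * d) / (dD * dD);
        best = std::min(best, G2);
    }
    return std::sqrt(best);                          // gamma <= 1 passes
}
```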
Accuracy and Performance of the Algorithm Implementation
First, we illustrate the accuracy of the GMC code using the published International Conference on the Use of Computers in Radiation Therapy (ICCR) simulations as an example.20 The slab phantom measures 30.5 × 39.5 × 30 cm³. For the photon test case, the ICCR phantom consists of water from 0 to 3 cm, aluminum from 3 to 5 cm, lung (ρ = 0.26 g/cm³) from 5 to 12 cm, and water from 12 to 30 cm. For the electron test case, the phantom consists of water from 0 to 2 cm, aluminum from 2 to 3 cm, lung from 3 to 6 cm, and water from 6 to 30 cm. The beam size is 1.5 × 1.5 cm² at 100 cm source to surface distance. The statistical precision was set to 0.3%. The phantom was discretized into voxels of 5 × 5 mm² in the X and Y directions and 2 mm in the Z direction.
To illustrate the potential gain of using nonuniform voxels, we performed a timing study on a simple setup: a single 20 MeV electron beam with a 10 × 10 cm² field size originating from a point source. The 30 × 30 × 30 cm³ water phantom was discretized with voxel sizes of 1, 1.5, 2.5, 3.5, and 5 mm, and 10⁶ particles were simulated for each. This study demonstrates the dependence of calculation time on voxel size.
Tracking particles through a nonuniform mesh is more complicated than through a uniform mesh. With a uniform mesh, the neighboring voxel is accessed directly through simple array index computation. With a nonuniform mesh, the tree-like data structure must be traversed to obtain neighbor and child information, which involves extra steps and computations. To characterize this overhead, several additional runs were performed using the same beam setup as in the timing study, except that the water phantom was represented as tree-structured nonuniform voxels.
Clinical Cases
We use 2 clinical intensity-modulated radiation therapy (IMRT) cases (lung and head and neck [H&N]) to demonstrate that the dose distribution calculated with the MRS approach maintains accuracy similar to that calculated on the fine uniform mesh while reducing the computational time. The plans were created with the Pinnacle TPS, and the DICOM images, treatment field settings, and fluence maps were exported from the TPS. Seven fields at different gantry angles were used for each plan. The cutoff energies for absorption were 50 keV for photons and 200 keV for electrons.
We used the Metropolis sampling algorithm23 to sample the source particles from the field fluence maps. The norm error |p_f − p_f*|₂ was recorded in the simulations, where p_f and p_f* are the probability vectors for the simulated fluence map and the TPS fluence map, respectively. A total of 10⁹ particles were simulated to ensure that the simulated fluence map faithfully matched the TPS fluence map, with a norm error of less than 10⁻³. A realistic clinical source model is under development; in this work, a simple point source was assumed, that is, the multi-leaf collimator and the detailed source model were not considered. Therefore, the calculated dose is not directly comparable with the TPS dose.
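A minimal sketch of one Metropolis step over a flattened fluence map is shown below (illustrative only; our GMC implementation details are omitted). With a symmetric uniform proposal, accepting with probability min(1, p_proposed/p_current) makes the chain visit beamlets in proportion to the fluence, which is what the norm error |p_f − p_f*|₂ checks:

```cpp
#include <random>
#include <vector>

// One Metropolis step over a flattened 2D fluence map: propose a uniformly
// random beamlet and accept with probability
// min(1, fluence[proposed] / fluence[current]).
// The chain must be started on a beamlet with nonzero fluence.
int metropolisStep(const std::vector<double>& fluence, int current,
                   std::mt19937& rng) {
    std::uniform_int_distribution<int> pick(0, static_cast<int>(fluence.size()) - 1);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    int proposed = pick(rng);
    double ratio = fluence[proposed] / fluence[current];
    return (u(rng) < ratio) ? proposed : current;  // accept, or stay and re-emit
}
```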
Discussion and Conclusion
One difficulty in increasing the speed of MC calculations is the significant portion of simulation time spent tracking charged particles across heterogeneity boundaries. For accurate dose calculations, boundary crossings must be computed during electron transport: electrons move in small steps and deposit energy via the continuous slowing down approximation, and the cross-section data must be updated whenever the material type changes. The distance to the boundary has to be calculated at each step of the electron trajectory to determine the maximum permissible energy loss for that step.24,25 There are 2 kinds of approaches to handling boundary crossing. One approach is to force all particles to stop at the boundary, where the particle direction is changed (GEANT4, EGS4/EGSnrc, and PENELOPE). As an example, 2 boundary crossing algorithms are implemented in EGSnrc, namely EXACT and PRESTA-I. Within a user-specified distance from the boundary (the skin depth for the boundary crossing algorithm), either single elastic scattering (EXACT) or a multiple scattering event (PRESTA-I) is applied to calculate the changed direction. David et al13 demonstrated that the computation time can be decreased by a factor of 15 in GEANT4 by using their segmented volume approach to minimize the number of boundary crossings; the idea is to merge voxels of the same material into large volume segments so that fewer boundaries are crossed.
The other approach to handling boundary crossings does not truncate steps at boundaries and is usually adopted in fast MC codes (eg, VMC++ and DPM). The CPU time for this type of MC simulation consists of 2 components: time spent calculating geometry-independent quantities and time spent on voxel-to-voxel boundary crossing.6 The speed of an MC simulation thus has a theoretical upper limit that is reached when the entire time is spent on the geometry-independent calculations. For example, VMC++,6 using the simultaneous transport of particle sets technique, reaches 90% of this maximum possible speed for 1 mm voxels. The DPM19 MC code spends 41% of its time on voxel-to-voxel boundary crossing and energy loss deposition, and the PMC code,26 using the precalculated data technique, spends 56% of its time in ray tracing. Although a patient's CT is represented by millions of small voxels, many large homogeneous regions of the same tissue type exist, such that boundary crossing considerations inside those regions are unnecessary. A natural approach to reducing the number of boundary crossings is therefore to create nonuniform voxels of various sizes, with larger voxels in smooth density regions and smaller voxels in regions with larger density gradients. The rationale is that high dose gradients are often associated with high density gradients, and density is commonly used to differentiate materials. Note that even in a fast dose algorithm that does not truncate steps at boundaries, the distance to the voxel boundary must still be calculated; therefore, as the number of encountered boundaries is reduced, the computing time spent on boundary crossings is also reduced.
In our approach, the particles traverse, and deposit energy on, a single unified nonuniform grid. One might instead use a fine grid for particle transport and scoring and, after the simulation, rebin the scoring grid into a uniform or nonuniform coarse grid to obtain better statistical precision. However, the time spent on geometric transport per history on a fine grid is still greater than that on a nonuniform grid.
Although the overhead of handling nonuniform voxels was about 10% greater than that of handling uniform voxels, the reduction in the number of boundary crossing calculations still makes the simulation per history faster than on the fine grid (1.25 × 1.25 × 3 mm³). In addition, as a result of the nonuniform voxel decomposition, the average voxel is larger. A larger voxel volume helps reduce the statistical uncertainty, since more particles deposit energy in each voxel. For the target region, large voxels may be generated if the PTV region is fairly homogeneous, so fewer particles may be needed to reach a given statistical uncertainty than with the fine grid, without compromising much accuracy for the surrounding organs, as demonstrated in the DVHs. In heterogeneous areas, such as tissue–bone interfaces, small voxels are generated to maintain high spatial resolution where large dose gradients may occur. By using nonuniform voxels in this way, with large voxels in smooth density regions and small voxels in heterogeneous regions, the number of voxels can be reduced by an order of magnitude, and the efficiency improves because of the resulting larger average voxel size. For dose algorithms such as collapsed-cone convolution and superposition convolution,27,28 a regular uniform mesh may be crucial, since these algorithms exploit the regularity of the mesh to increase the calculation speed. For MC simulation, however, the regularity of the mesh is unimportant because of the randomness of the particle trajectories. In addition to common variance reduction techniques, geometric division into nonuniform voxels can thus be a useful technique to further accelerate MC simulations.
Abbreviations
- MC: Monte Carlo
- MRS: multi-resolution subdivision
- AMR: adaptive mesh refinement
- DTA: distance to agreement
- PC: personal computer
- SSD: source to surface distance
- IMRT: intensity-modulated radiation therapy
- H&N: head and neck
- DVH: dose volume histogram
- TPS: treatment planning system
- PTV: planning target volume
- DGT: density gradient threshold
- PDD: percentage depth dose
- MLC: multi-leaf collimator
- CSDA: continuous slowing down approximation
- ICCR: International Conference on the Use of Computers in Radiation Therapy
- CT: computed tomography