GPU-accelerated unstructured mesh tallies for Monte Carlo particle transport
PUMI-Tally accelerates unstructured mesh tallies in Monte Carlo neutral particle transport simulations by exploiting mesh adjacency information on CPUs and GPUs. Built on top of PUMIPic and Kokkos, it provides distributed parallel particle and mesh data structures with Omega_h.
Spack automates the build of PUMI-Tally and all its dependencies.
- Create and activate a Spack environment
spack env create pumi-tally-env
spack env activate pumi-tally-env
- Update the builtin Spack repo (requires at least releases/v2026.02)
spack repo update builtin --branch releases/v2026.02
- Add the PUMI-PIC Spack repository
spack repo add https://github.com/SCOREC/pumi-pic-spack.git --name pumi-pic-spack
- Add packages and install
For OpenMC with PUMI-Tally support:
spack add openmc-pumi ^kokkos+openmp+serial
spack concretize --force
spack install
This installs OpenMC with DAGMC support by default. If you do not need DAGMC, turn it off in the spec.
Tip
For tallying on GPUs, use ^kokkos+cuda cuda_arch=<arch_code> instead. Run spack info pumi-tally to see the supported architectures.
Verify the installation:
# If OpenMC was installed
openmc --help # should show the --ohMesh option
# If only PUMI-Tally was installed
spack find pumi-tally
PUMI-Tally has many dependencies and is complex to build from source. Complete, platform-specific build instructions are provided in the PUMIPic Wiki; use them to install PUMIPic. Be sure to install the make_search_class branch of PUMIPic.
cmake -S . -B build \
-DCMAKE_PREFIX_PATH=$DEPS_DIR \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CXX_COMPILER=mpicxx \
-DCMAKE_C_COMPILER=mpicc \
-DCMAKE_INSTALL_PREFIX=$DEPS_DIR \
-DBUILD_SHARED_LIBS=ON
cmake --build build -j$(nproc) --target install
Tip
Add -DPUMI_USE_KOKKOS_CUDA=ON when building with CUDA.
Add -DPUMITALLYOPENMC_ENABLE_TESTS=ON to build the test suite (requires Catch2 v3.4.0).
PUMI-Tally currently works with a specific fork of OpenMC. HDF5 with MPI and high-level API support is required.
git clone --recurse-submodules --depth 1 --branch decouple_pumi_tally \
https://github.com/Fuad-HH/openmc.git /tmp/openmc
cmake -S /tmp/openmc -B /tmp/openmc/build \
-DCMAKE_INSTALL_PREFIX=$DEPS_DIR \
-DCMAKE_PREFIX_PATH=$DEPS_DIR \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CXX_COMPILER=mpicxx \
-DCMAKE_C_COMPILER=mpicc \
-DCMAKE_INSTALL_LIBDIR=lib \
-DOPENMC_USE_MPI=ON \
-DOPENMC_USE_OPENMP=ON \
-DOPENMC_USE_PUMIPIC=ON
cmake --build /tmp/openmc/build -j$(nproc) --target install
This installs OpenMC without DAGMC support. To enable DAGMC, install DAGMC first by following the instructions on the DAGMC website.
Note
The clone command uses --recurse-submodules because OpenMC vendors some dependencies as git submodules.
Create a first-order (linear) tetrahedral volume mesh of the tally region using any meshing tool (Gmsh, Simmetrix, etc.). Requirements:
- The geometry must be convex (no concavity).
- The mesh must cover the entire OpenMC geometry with no internal holes. If the OpenMC geometry contains holes, add a bounding box around it.
Convert the mesh to Omega_h format (.osh). For example, from Gmsh:
msh2osh input.msh output.osh
Check and adjust the coordinate scale to match the OpenMC model:
describe output.osh # prints coordinate min/max
scale output.osh scaled.osh 10 # scale by 10 in all axes if needed
Run OpenMC with the Omega_h mesh:
openmc --ohMesh mesh.osh
Results are written to fluxresult.vtk.
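The coordinate-scale check can also be reasoned about numerically. The sketch below is not part of PUMI-Tally or Omega_h: the `Box` alias and the `axis_ratios` and `is_uniform` helpers are hypothetical names. It assumes you have read the mesh coordinate min/max from the describe output and know the extents of the OpenMC model, and that the two bounding boxes are meant to coincide (which may not hold if you added a bounding box around the geometry).

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical helper type: {xmin, ymin, zmin, xmax, ymax, zmax}.
using Box = std::array<double, 6>;

// Per-axis ratio of model extent to mesh extent.
std::array<double, 3> axis_ratios(const Box& mesh, const Box& model) {
  std::array<double, 3> r{};
  for (int a = 0; a < 3; ++a) {
    const double mesh_extent = mesh[a + 3] - mesh[a];
    assert(mesh_extent > 0.0);  // a degenerate mesh box has no scale
    r[a] = (model[a + 3] - model[a]) / mesh_extent;
  }
  return r;
}

// True when a single factor fits all three axes to within rel_tol.
bool is_uniform(const std::array<double, 3>& r, double rel_tol = 1e-6) {
  return std::fabs(r[1] - r[0]) <= rel_tol * std::fabs(r[0]) &&
         std::fabs(r[2] - r[0]) <= rel_tol * std::fabs(r[0]);
}
```

If `is_uniform` holds, `r[0]` is a candidate for the single scale factor; if it does not, the mesh and model probably differ by more than a unit conversion and the mesh should be inspected before scaling.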
PUMI-Tally decouples tally operations from OpenMC via the PIMPL idiom, so OpenMC does not need to link against PUMIPic, Kokkos, or any GPU compiler. Transport runs on the CPU or GPU, depending on the physics application that connects to it (for example, OpenMC runs on the CPU); tallies run on the CPU or GPU, depending on how Kokkos is compiled, through a batched interface of three calls:
- PumiTally and CopyInitialPosition: initialize the PUMI-Tally object and localize particles to their parent mesh elements based on the physics application's source sampling strategy.
- MoveToNextLocation: copies particle destinations, weights, and status from OpenMC to the device, then walks each particle through the mesh element by element using adjacency information, accumulating track-length tallies per element via atomics. No dynamic allocations or re-localization trees are needed.
- WriteTallyResults: writes the accumulated tallies to disk (currently VTK only).
High-level API of PUMI-Tally in OpenMC.
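The decoupling and the adjacency walk described above can be illustrated with a deliberately simplified sketch. It is not the PUMI-Tally implementation or API: `TallySketch` and its method names are hypothetical, a uniform 1D grid stands in for the tetrahedral mesh (so the neighbor of cell `c` is simply `c - 1` or `c + 1`), and a serial loop stands in for the device kernel, which is why plain `+=` appears where the real code would need atomics. Writing results to disk is omitted.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <memory>
#include <vector>

// --- Interface: all the transport code sees (no mesh or device types) ---
// Hypothetical class for illustration; not the PUMI-Tally API.
class TallySketch {
public:
  TallySketch(double x_min, double x_max, std::size_t n_cells);
  ~TallySketch();  // defined below, where Impl is a complete type
  void copy_initial_position(const std::vector<double>& x);
  void move_to_next_location(const std::vector<double>& dest,
                             const std::vector<double>& weight);
  const std::vector<double>& tallies() const;

private:
  struct Impl;  // PIMPL: data layout stays out of the interface
  std::unique_ptr<Impl> impl_;
};

// --- Implementation: would live in a separate translation unit ---
struct TallySketch::Impl {
  double x_min;
  double dx;
  std::vector<double> tally;      // one track-length accumulator per cell
  std::vector<double> pos;        // current position of each particle
  std::vector<std::size_t> cell;  // current parent cell of each particle
};

TallySketch::TallySketch(double x_min, double x_max, std::size_t n_cells)
    : impl_(new Impl{x_min, (x_max - x_min) / static_cast<double>(n_cells),
                     std::vector<double>(n_cells, 0.0), {}, {}}) {
  assert(n_cells > 0 && x_max > x_min);
}

TallySketch::~TallySketch() = default;

void TallySketch::copy_initial_position(const std::vector<double>& x) {
  Impl& m = *impl_;
  m.pos = x;
  m.cell.assign(x.size(), 0);
  for (std::size_t i = 0; i < x.size(); ++i) {
    // Localize once; assumes x_min <= x[i] < x_max.
    m.cell[i] = static_cast<std::size_t>((x[i] - m.x_min) / m.dx);
    assert(m.cell[i] < m.tally.size());
  }
}

void TallySketch::move_to_next_location(const std::vector<double>& dest,
                                        const std::vector<double>& weight) {
  Impl& m = *impl_;
  assert(dest.size() == m.pos.size() && weight.size() == m.pos.size());
  const std::size_t n = m.tally.size();
  for (std::size_t i = 0; i < m.pos.size(); ++i) {
    double x = m.pos[i];
    std::size_t c = m.cell[i];
    const bool right = dest[i] >= x;
    while (true) {
      const double lo = m.x_min + static_cast<double>(c) * m.dx;
      const double face = right ? lo + m.dx : lo;  // exit face of cell c
      const bool stops_here = right ? dest[i] <= face : dest[i] >= face;
      const double x_next = stops_here ? dest[i] : face;
      m.tally[c] += weight[i] * std::fabs(x_next - x);  // track length
      x = x_next;
      if (stops_here) break;
      // Step to the adjacent cell; stop at the domain boundary.
      if (right) { if (c + 1 == n) break; ++c; }
      else       { if (c == 0) break; --c; }
    }
    m.pos[i] = x;   // the particle keeps its parent cell between calls,
    m.cell[i] = c;  // so no re-localization search is needed
  }
}

const std::vector<double>& TallySketch::tallies() const {
  return impl_->tally;
}
```

For a grid on [0, 4] with four cells, a particle of weight 2 moving from 0.5 to 2.5 scores 1, 2, and 1 in the first three cells. Because each particle remembers its parent cell, every later move starts the walk where the previous one ended.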
If you use PUMI-Tally in your research, please cite:
@article{hasan2025gpu,
title = {GPU Acceleration of Monte Carlo Tallies on Unstructured Meshes in OpenMC with PUMI-Tally},
author = {Hasan, Fuad and Smith, Cameron W and Shephard, Mark S and Churchill, R Michael
and Wilkie, George J and Romano, Paul K and Shriwise, Patrick C and Merson, Jacob S},
journal = {arXiv preprint arXiv:2504.19048},
year = {2025}
}
See LICENSE for details.