Describe the bug
In warp/native/bvh.cu, compute_total_bounds calls a CUB BlockReduce collective and __syncthreads() from inside a thread-divergent branch if (tid < num_items):
const int blockStart = blockDim.x * blockIdx.x;
const int numValid = ::min(num_items - blockStart, blockDim.x);
const int tid = blockStart + threadIdx.x;
if (tid < num_items) {
vec3 lower = item_lowers[tid];
vec3 upper = item_uppers[tid];
vec3 block_upper = BlockReduce(temp_storage).Reduce(upper, Vec3Max, numValid);
__syncthreads();
vec3 block_lower = BlockReduce(temp_storage).Reduce(lower, Vec3Min, numValid);
if (threadIdx.x == 0) {
atomic_max(total_upper, block_upper);
atomic_min(total_lower, block_lower);
}
}
BlockReduce::Reduce and __syncthreads() are block-wide operations that require every thread in the block to reach them. For the last block, when num_items is not an exact multiple of blockDim.x, the threads with tid >= num_items take the else path and never reach the collective/barrier. That is undefined behavior — it can hang or produce a wrong reduction — and the result feeds the total scene bounds used to build the Morton-code LBVH, so a wrong reduction silently corrupts the acceleration structure.
Location
warp/native/bvh.cu, compute_total_bounds (≈ lines 462–478).
Reachability
The default GPU LBVH build path; triggered whenever the item count is not a multiple of the block size (i.e. most inputs).
Suggested fix
Hoist the loads and both BlockReduce(...).Reduce(...) calls (and the __syncthreads()) out of the if (tid < num_items) so all threads participate; feed neutral sentinels for out-of-range lanes (+inf for the min/lower reduce, -inf for the max/upper reduce) — or have all threads call Reduce(..., numValid) (the valid count is already computed) — and guard only the atomic_* writes with threadIdx.x == 0.
Note on verification
I found this by static analysis. I don't have a Warp build on my machine (single-GPU WSL2 without the native toolchain), so I have not reproduced it under a sanitizer. A maintainer could confirm with compute-sanitizer --tool synccheck on an LBVH built from an item count not divisible by the block size (placing one far-outlier AABB last makes a wrong reduction observable against a CPU reference). Filing in case it's useful; happy to open a PR if the suggested direction looks right.
Describe the bug
In
warp/native/bvh.cu,compute_total_boundscalls a CUBBlockReducecollective and__syncthreads()from inside a thread-divergent branchif (tid < num_items):BlockReduce::Reduceand__syncthreads()are block-wide operations that require every thread in the block to reach them. For the last block, whennum_itemsis not an exact multiple ofblockDim.x, the threads withtid >= num_itemstake the else path and never reach the collective/barrier. That is undefined behavior — it can hang or produce a wrong reduction — and the result feeds the total scene bounds used to build the Morton-code LBVH, so a wrong reduction silently corrupts the acceleration structure.Location
warp/native/bvh.cu,compute_total_bounds(≈ lines 462–478).Reachability
The default GPU LBVH build path; triggered whenever the item count is not a multiple of the block size (i.e. most inputs).
Suggested fix
Hoist the loads and both
BlockReduce(...).Reduce(...)calls (and the__syncthreads()) out of theif (tid < num_items)so all threads participate; feed neutral sentinels for out-of-range lanes (+inffor the min/lower reduce,-inffor the max/upper reduce) — or have all threads callReduce(..., numValid)(the valid count is already computed) — and guard only theatomic_*writes withthreadIdx.x == 0.Note on verification
I found this by static analysis. I don't have a Warp build on my machine (single-GPU WSL2 without the native toolchain), so I have not reproduced it under a sanitizer. A maintainer could confirm with
compute-sanitizer --tool synccheckon an LBVH built from an item count not divisible by the block size (placing one far-outlier AABB last makes a wrong reduction observable against a CPU reference). Filing in case it's useful; happy to open a PR if the suggested direction looks right.