Bug Description
Make environment-first FEM space partitions graph-capturable with max_node_count
Description
Building a warp.fem space partition with environment_first=True is not CUDA graph-capturable for multiple
environments, even when max_node_count is provided.
EnvironmentSpacePartition.rebuild() unconditionally reads the compressed group offsets back to the host:
group_offsets_np = group_offsets.numpy()
This synchronization is used to determine the active node count and construct env_offsets. During CUDA graph
capture, the device-to-host copy fails.
NodePartition already handles the equivalent situation by treating a non-negative max_node_count as a host-known
partition size, avoiding device readback. The environment-first path should follow the same approach.
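The host-known-size approach described above can be illustrated with a minimal, library-free sketch. Everything in it is hypothetical and for illustration only: `FakeDeviceArray` stands in for a device array, `partition_node_count` is not a Warp function, and the assumption that the last group offset equals the active node count is a guess, not something confirmed by the Warp source.

```python
class FakeDeviceArray:
    """Hypothetical stand-in for a device array; counts host readbacks."""

    def __init__(self, values):
        self._values = list(values)
        self.readbacks = 0

    def numpy(self):
        # In a real GPU library this would be a synchronizing
        # device-to-host copy, which is not allowed during graph capture.
        self.readbacks += 1
        return list(self._values)


def partition_node_count(group_offsets, max_node_count=-1):
    """Return the partition size to use on the host (illustrative only)."""
    if max_node_count >= 0:
        # The caller supplied an upper bound: treat it as the host-known
        # partition size and skip the readback entirely.
        return max_node_count
    # Exact-size path: synchronize and read the offsets back.
    # Assumption: the last offset is the total active node count.
    return int(group_offsets.numpy()[-1])


offsets = FakeDeviceArray([0, 4, 8])
bounded = partition_node_count(offsets, max_node_count=8)  # no readback
exact = partition_node_count(offsets)                      # one readback
```

With a non-negative `max_node_count` the function never touches `numpy()`, which is the property that would keep the call capturable; only the exact-size path pays for the synchronization.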
Reproduction
import warp as wp
import warp.fem as fem

wp.init()
device = wp.get_device("cuda:0")

geo = fem.Grid2D(res=wp.vec2i(2), env_count=2)
space = fem.make_polynomial_space(
    geo,
    degree=0,
    discontinuous=True,
)

# Preload generated kernels.
fem.make_space_partition(
    space_topology=space.topology,
    environment_first=True,
    max_node_count=space.node_count(),
    device=device,
)

with wp.ScopedCapture(device=device, force_module_load=False) as capture:
    partition = fem.make_space_partition(
        space_topology=space.topology,
        environment_first=True,
        max_node_count=space.node_count(),
        device=device,
    )
This fails at group_offsets.numpy() with an error similar to:
RuntimeError: Warp copy error: Warp CUDA error 906:
operation would make the legacy stream depend on a capturing blocking stream
The issue affects the general multi-environment path, as well as environment-first partitions built over geometry
partitions that do not cover the whole geometry. The single-environment whole-space fast path does not perform this readback.
Expected behavior
When max_node_count >= 0, environment-first space partition construction and rebuilding should be graph-capturable,
consistent with NodePartition.
The exact-size path where max_node_count < 0 may continue synchronizing to obtain the actual node count.
System Information
No response