Environment-first FEM space partitions are not graph-capturable #1607

@gdaviet

Bug Description

Make environment-first FEM space partitions graph-capturable with max_node_count

Description

Building a warp.fem space partition with environment_first=True is not CUDA graph-capturable for multiple
environments, even when max_node_count is provided.

EnvironmentSpacePartition.rebuild() unconditionally reads the compressed group offsets back to the host:

group_offsets_np = group_offsets.numpy()

This readback forces a synchronization. Its result is used to determine the active node count and to construct
env_offsets. During CUDA graph capture, the device-to-host copy is not permitted and fails.

NodePartition already handles the equivalent situation by treating a non-negative max_node_count as a host-known
partition size, avoiding device readback. The environment-first path should follow the same approach.
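
As a rough illustration of that approach, the sketch below shows the branching in plain Python. It is not the warp.fem implementation: resolve_node_count and FakeDeviceOffsets are hypothetical names, the fake class only stands in for a device array with a .numpy() readback, and treating the last offset as the active node count is an assumption made for the example.

```python
from typing import Callable, Sequence


def resolve_node_count(
    max_node_count: int,
    read_back_offsets: Callable[[], Sequence[int]],
) -> int:
    """Hypothetical helper: pick the partition size, reading back only if needed."""
    if max_node_count >= 0:
        # Host-known size: no device-to-host copy, so nothing here
        # would break an ongoing CUDA graph capture.
        return max_node_count
    # Exact-size path: synchronize and read the offsets back.
    # Assumption: the last offset equals the active node count.
    offsets = read_back_offsets()
    return int(offsets[-1])


class FakeDeviceOffsets:
    """Stand-in for a device array; counts how often it is read back."""

    def __init__(self, values: Sequence[int]) -> None:
        self._values = list(values)
        self.readbacks = 0

    def numpy(self) -> Sequence[int]:
        self.readbacks += 1
        return self._values


offsets = FakeDeviceOffsets([0, 4, 8])
capturable_count = resolve_node_count(8, offsets.numpy)   # no readback
exact_count = resolve_node_count(-1, offsets.numpy)       # one readback
```

With a non-negative max_node_count the readback callable is never invoked, which is the property that would keep the environment-first path capturable. Only the negative (exact-size) case pays for the synchronization.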

Reproduction

import warp as wp
import warp.fem as fem

wp.init()
device = wp.get_device("cuda:0")

geo = fem.Grid2D(res=wp.vec2i(2), env_count=2)
space = fem.make_polynomial_space(
    geo,
    degree=0,
    discontinuous=True,
)

# Preload generated kernels.
fem.make_space_partition(
    space_topology=space.topology,
    environment_first=True,
    max_node_count=space.node_count(),
    device=device,
)

with wp.ScopedCapture(device=device, force_module_load=False) as capture:
    partition = fem.make_space_partition(
        space_topology=space.topology,
        environment_first=True,
        max_node_count=space.node_count(),
        device=device,
    )

This fails at group_offsets.numpy() with an error similar to:

RuntimeError: Warp copy error: Warp CUDA error 906:
operation would make the legacy stream depend on a capturing blocking stream

The issue affects both the general multi-environment path and environment-first partitions built over geometry
partitions that do not cover the whole geometry. The single-environment whole-space fast path does not perform
this readback.

Expected behavior

When max_node_count >= 0, environment-first space partition construction and rebuilding should be graph-capturable,
consistent with NodePartition.

The exact-size path where max_node_count < 0 may continue synchronizing to obtain the actual node count.

System Information

No response

Labels

bug (Something isn't working)
