[Rhi] [bug] Fix the Unified Allocator to no longer return first two allocations as dupes#8705
Conversation
5d334f1 to
9c98bd5
Compare
666dce9 to
5a33be3
Compare
|
(updated with fixed allocator code. Looks like the first two allocations from the allocator always get same address, in master. This PR fixes that. I'll also add some unit tests somehow/somewhere). |
|
(added unit tests for allocator) |
|
Here is the script I used previously to reproduce a similar bug. The PR does indeed fix it. |
|
@YilingQiao Great! Also, nice work on such a compact minimum reproducible example 🙌 Just 6 lines 😮 |
|
Note that whilst there's a failing test, it is not for a 'required' check. I expect it's because I'm allocating 1GB of test memory 😅 My idea is to upgrade the Allocator to be able to be passed a 'default allocator size', so we don't have to leak 1GB memory for the tests. But since this makes the allocator more complicated, and might break stuff in itself, I'd prefer to do that in a second PR, ideally, unless anyone has strong opinions about my breaking the IOS build? (will basically triple the complexity of this PR I feel). |
|
A couple of other options accur to me:
|
|
Great! 🙌 Thank you :) |
Issue: #
Brief Summary
The unified allocator always allocated the first two allocations as dupe memory addresses, which always clobbered each other.
copilot:summary
New walkthrough
Looking at the Unified allocator code, we can see that when we first allocate a memory chunk, we do not add
sizetohead, and thus the next allocation will receive the exact same address too. Thus, there will be two structs or similar, in memory, which clobber each other, leading to plausibly a plethora of hard-to-debug crashes.High level overview of how allocator works
The allocator can work with two types of request:
exclusivenot exclusiveFor exclusive requests:
taichi/taichi/rhi/common/unified_allocator.cpp
Lines 74 to 75 in 562e05f
chunk.datais set to the start of this bufferchunk.headis too, but it's not really used for exclusive accesschunk.tailis set to the end of the buffer, but again not really used for exclusive accessExclusive access requests are thus fairly straightforward
For non-exclusive, it is slightly more complex
taichi/taichi/rhi/common/unified_allocator.cpp
Lines 9 to 10 in 562e05f
chunk.datais set to the start of this bufferchunk.headis set to the start of unused space in this bufferchunk.data + sizechunk.datathoughchunk.head- we add
sizetohead(to within alignment)- we return the old
head(to within alignment)Proposed fix
The proposed fix is to set
headtodata + sizefor newly allocated chunksI don't really have a strong opinion on which fix we prefer. The second approach seems mildly cleaner perhaps, since decouples 'finding/creating a chunk' from 'updating the chunk and returning the requested memory pointer'.
Low level details
In more details, and assuming non exclusive mode:
sizebytesdefault_allocator_sizebytesptrchunkstructure to store information about the chunk we just allocatedWhen we re-use a chunk:
Original Walkthrough
High level summary:
Reproducing the bug
This bug was initially reproduced in #8569 , but knowing what the bug is, we can reproduce it using the following much simpler code:
What this code does:
allocates snode trees 0 through 19, by creating fields indexed 0 through 19, and immediately calling ti.sync, to materialize the snode tree
following the creation of snode trees 0 through 19, we call a kernel twice
[E 04/30/25 19:00:30.022 3136495] Received signal 11 (Segmentation fault: 11)
Detailed walkthrough
taichi/taichi/runtime/llvm/llvm_runtime_executor.cpp
Lines 699 to 700 in 562e05f
taichi/taichi/runtime/llvm/llvm_runtime_executor.cpp
Lines 706 to 711 in 562e05f
taichi/taichi/runtime/llvm/runtime_module/runtime.cpp
Lines 932 to 933 in 562e05f
taichi/taichi/rhi/common/unified_allocator.cpp
Line 32 in 562e05f
taichi/taichi/rhi/common/unified_allocator.cpp
Line 57 in 562e05f
taichi/taichi/rhi/common/unified_allocator.h
Line 31 in 562e05f
Let's first walk through the effects of LLVMRuntime and result_buffer occupying the same space.
taichi/taichi/runtime/llvm/runtime_module/runtime.cpp
Lines 727 to 730 in 562e05f
runtime->set_result(taichi_result_buffer_error_id, runtime->error_code);taichi/taichi/inc/constants.h
Line 21 in 562e05f
constexpr std::size_t taichi_max_num_ret_value = 30;set_result:taichi/taichi/runtime/llvm/runtime_module/runtime.cpp
Lines 600 to 604 in 562e05f
taichi/taichi/runtime/llvm/runtime_module/runtime.cpp
Line 602 in 562e05f
taichi/taichi/runtime/llvm/runtime_module/runtime.cpp
Line 602 in 562e05f
taichi/taichi/runtime/llvm/runtime_module/runtime.cpp
Lines 552 to 562 in 562e05f
taichi/taichi/runtime/llvm/runtime_module/runtime.cpp
Lines 546 to 550 in 562e05f
%10 = call ptr @LLVMRuntime_get_roots(ptr %9, i32 19(exact statement index varies depends on the kernel of course)Proposed fixWe need to ensure that the allocator does not allocate the same memory block twice, both to the LLVMRuntime and to the result_buffermy proposed fix is to expose theexclusiveparameter to the LLVMRuntimeand to set this parameter totrue, when used from the runtimeQuestionsA question in my mind is, why we would ever want exclusive to not be true. And by default, it is in fact set to false. I feel like there is some knowledge or insight that is missing to me.copilot:walkthrough