
[amdgpu] Calculate mcpu_ and compute_capability_ properly and with ROCm 6 compat#8667

Merged
feisuzhu merged 2 commits into taichi-dev:master from GZGavinZhao:amdgpu-gcnarch-fix
Apr 7, 2025


@GZGavinZhao

@GZGavinZhao GZGavinZhao commented Mar 27, 2025

Contributor

Issue: #6434 (part of)

Brief Summary

In the AMDGPU backend, we currently calculate mcpu_ directly from compute_capability_ using integer addition. This is problematic because the mcpu_ name is partially hex-based: for example, compute_capability_ = 910 corresponds to mcpu_ = "gfx90a". The proper and recommended way to get mcpu_ is to read the gcnArchName field. Similarly, compute_capability_ should be calculated from the major and minor fields instead of gcnArch.
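
To illustrate the mismatch, here is a minimal sketch rather than the actual patch. Both helper names are hypothetical, and the naive version only assumes that the old code amounted to formatting the number in decimal. The second helper assumes that gcnArchName may carry target-feature suffixes after a colon (for example "gfx90a:sramecc+:xnack-") and keeps only the part before the first colon.

```cpp
#include <string>

// Hypothetical helper: builds the target name by formatting the numeric
// capability in decimal. This cannot produce hex-style names.
std::string naive_mcpu(int compute_capability) {
  return "gfx" + std::to_string(compute_capability);
}

// Hypothetical helper: takes the target name from a gcnArchName-style
// string, dropping any ":feature" suffixes. If there is no ':' in the
// input, find() returns npos and substr() returns the whole string.
std::string mcpu_from_gcn_arch_name(const std::string &gcn_arch_name) {
  return gcn_arch_name.substr(0, gcn_arch_name.find(':'));
}
```

With these definitions, naive_mcpu(910) yields "gfx910", which is not the name the description gives for that device, while mcpu_from_gcn_arch_name("gfx90a:sramecc+:xnack-") yields "gfx90a".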

Additionally, ROCm 6 complicates calling hipGetDeviceProperties when the function is looked up by its ABI symbol in libamdhip64.so. In ROCm 6, the ABI symbol hipGetDeviceProperties maps (likely incorrectly) to the ROCm 5 version of hipGetDeviceProperties, which is not ABI-compatible. To handle this, we first treat hipGetDeviceProperties as the ROCm 5 version; if the values we get back don't make sense, we then treat it as the ROCm 6 version.
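
The fallback can be sketched as follows. This is only an illustration under stated assumptions: DecodedProps, looks_plausible, and pick_layout are hypothetical names, the struct is a simplified stand-in for the fields read out of the real properties buffer, and the plausibility thresholds are guesses at what "don't make sense" could mean, not the checks used in this PR.

```cpp
#include <string>

// Hypothetical, simplified view of the fields decoded from the
// device-properties buffer under one particular struct layout.
struct DecodedProps {
  int major;
  int minor;
  std::string gcn_arch_name;
};

// Assumed sanity check: small positive version numbers and an
// architecture name that starts with "gfx".
bool looks_plausible(const DecodedProps &p) {
  return p.major > 0 && p.major < 100 &&
         p.minor >= 0 && p.minor < 100 &&
         p.gcn_arch_name.rfind("gfx", 0) == 0;
}

// Prefer the ROCm 5 interpretation; fall back to the ROCm 6
// interpretation when the ROCm 5 values look like garbage.
DecodedProps pick_layout(const DecodedProps &as_rocm5,
                         const DecodedProps &as_rocm6) {
  return looks_plausible(as_rocm5) ? as_rocm5 : as_rocm6;
}
```

Trying the ROCm 5 interpretation first means ROCm 5 installations are unaffected as long as their values pass the check, and the second interpretation is only used when the first one fails it.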


@CLAassistant

CLAassistant commented Mar 27, 2025


CLA assistant check
All committers have signed the CLA.

@GZGavinZhao GZGavinZhao force-pushed the amdgpu-gcnarch-fix branch 2 times, most recently from ee617f8 to bf76688 Compare March 27, 2025 01:28
@galeselee

Contributor

Great!

@feisuzhu

feisuzhu commented Apr 5, 2025

Contributor

/rebase

@feisuzhu feisuzhu changed the title [amdgpu]: calculate mcpu_ and compute_capability_ properly and with ROCm 6 compat Apr 7, 2025
@feisuzhu feisuzhu changed the title [amdgpu] calculate mcpu_ and compute_capability_ properly and with ROCm 6 compat Apr 7, 2025
@feisuzhu

feisuzhu commented Apr 7, 2025

Contributor

The pre-commit failure is unrelated to this change, so I'm force merging.

@feisuzhu feisuzhu merged commit 5106074 into taichi-dev:master Apr 7, 2025


4 participants