
Please try native formats instead of disabling dynamic vram.#14577

Merged
comfyanonymous merged 1 commit into master from comfyanonymous-patch-2 on Jun 23, 2026
Conversation

@comfyanonymous

Member

No description provided.

@coderabbitai

coderabbitai Bot commented Jun 22, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 85361701-c76a-42fc-a232-e10472286b71

📥 Commits

Reviewing files that changed from the base of the PR and between b0f9e32 and 0ab45c6.

📒 Files selected for processing (1)
  • main.py

📝 Walkthrough

In main.py, the startup warning emitted when args.disable_dynamic_vram is set has been expanded from a single-line logging.warning(...) call to a multi-line message. The new message includes information about the dynamic VRAM disablement, guidance to report issues, and recommendations regarding gguf and native ComfyUI model formats, along with a note about the intended deprecation timeline for the flag.
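The change described in the walkthrough can be pictured with a minimal sketch. This is not the code from main.py: the parser, the `--disable-dynamic-vram` spelling (inferred from `args.disable_dynamic_vram`), the helper names, and above all the message wording are assumptions, since the actual text of the warning is not quoted in this thread.

```python
import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for ComfyUI's argument parser."""
    parser = argparse.ArgumentParser()
    # Flag spelling is inferred from `args.disable_dynamic_vram`.
    parser.add_argument("--disable-dynamic-vram", action="store_true")
    return parser


# Placeholder wording only: it mirrors the four topics the walkthrough lists
# (disablement notice, issue reporting, gguf vs. native formats, deprecation).
DYNAMIC_VRAM_WARNING = (
    "Dynamic VRAM is disabled.\n"
    "If you disabled it because of a problem, please report the issue.\n"
    "If you disabled it for gguf models, please try native ComfyUI formats.\n"
    "This flag is intended to be deprecated at some point."
)


def warn_if_dynamic_vram_disabled(args: argparse.Namespace) -> bool:
    """Log the multi-line warning when the flag is set; return True if logged."""
    if args.disable_dynamic_vram:
        # A single logging.warning call carrying a multi-line string.
        logging.warning(DYNAMIC_VRAM_WARNING)
        return True
    return False
```

At startup this would be called once with the parsed arguments, for example `warn_if_dynamic_vram_disabled(build_parser().parse_args())`. Keeping the text in one constant means the message stays a single log record rather than several separate warnings.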

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ❓ Inconclusive | No description was provided by the author, making it impossible to assess relevance to the changeset. | Add a brief description explaining the rationale for the updated warning message and its intended impact on users. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title directly reflects the main change: updating a startup warning message to recommend native formats instead of disabling dynamic VRAM. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@zwukong

zwukong commented Jun 22, 2026


So, GGUF support will still be removed in the future. Really bad news: GGUF takes half the memory and disk space of FP8, and I think the quality is better even at Q4. GGUF is faster than nvfp4 on cards below the 50 series.

@comfyanonymous comfyanonymous merged commit 833bfb5 into master Jun 23, 2026
16 checks passed
@comfyanonymous comfyanonymous deleted the comfyanonymous-patch-2 branch June 23, 2026 04:06
@Heliumrich

Heliumrich commented Jun 23, 2026


nvfp4 runs so badly on my RTX 4080 that even FP8 that overspills (Ideogram at 4.2 MP) runs better, lol.
But if FP8 is better than GGUF, that would be cool; I haven't really tried recently.

@SRStwo

SRStwo commented Jun 27, 2026


"Please try" is a lot different from "too bad, tough luck." So, please try to not use euphemisms when making controversial changes.

@SRStwo

SRStwo commented Jun 27, 2026


> So, GGUF support will still be removed in the future. Really bad news: GGUF takes half the memory and disk space of FP8, and I think the quality is better even at Q4. GGUF is faster than nvfp4 on cards below the 50 series.

I have seen detailed data that shows Q8_0 GGUF beating FP8 soundly in perplexity. I have yet to see anyone produce data that shows otherwise but am very willing to see it if anyone can show it.

@Ph0rk0z

Ph0rk0z commented Jun 27, 2026


Quality issues aside, dynamic VRAM often interferes with compilation. Maybe this will change with native int8, but it's a huge unknown. All dynamic VRAM has done for me is introduce headaches and overhead. I literally lose 0.20s from it shuffling the 20 MB TAE VAE, where normal non-dynamic doesn't have the problem. I am eager to try the cutlass/cublas int8 though.

GGUF support won't be removed; it just has to be modified to work with the system to get back most of the speed. There is an open PR, so you are free to test/apply it.

My experience of dynamic VRAM has been negative overall, sorry to say. It solves a problem I don't have on my system and introduces many others.

@kijai

kijai commented Jun 27, 2026

Collaborator

> Quality issues aside, dynamic VRAM often interferes with compilation. Maybe this will change with native int8, but it's a huge unknown. All dynamic VRAM has done for me is introduce headaches and overhead. I literally lose 0.20s from it shuffling the 20 MB TAE VAE, where normal non-dynamic doesn't have the problem. I am eager to try the cutlass/cublas int8 though.
>
> GGUF support won't be removed; it just has to be modified to work with the system to get back most of the speed. There is an open PR, so you are free to test/apply it.
>
> My experience of dynamic VRAM has been negative overall, sorry to say. It solves a problem I don't have on my system and introduces many others.

Counterpoint: Dynamic VRAM solved my main gripes with ComfyUI memory management and actually works, unlike the previous smart memory, which only worked (for me on Windows; Linux was and still is problem free) by tuning the --reserve-vram flag per workflow or just setting it inefficiently high. I recognize there have been issues, and perhaps still are on specific setups, but overall it's been an improvement for most users.

For compile I have a workaround in KJNodes: the advanced compile node has worked fine on the latest image models such as Ideogram4 and Krea2 with dynamic VRAM enabled. Chasing full graph has always been next to impossible in Comfy, so no change there, but dynamic VRAM without some extra compile guards definitely makes it worse; my node is already testable to address that. Officially supporting compile is a rather complex endeavor, so instead we've been applying targeted custom kernels to the hotspots where the biggest compile gains are. On such models compile doesn't really give any benefit anymore.

Can you elaborate on the TAE though? 0.2 seconds? 20 seconds? As live preview? Because that sounds like a bug that should be addressed in the TAE code.

@SRStwo

SRStwo commented Jun 27, 2026


> GGUF support won't be removed.

That's great to hear.

@Ph0rk0z

Ph0rk0z commented Jun 27, 2026


> Can you elaborate on the TAE though

I use TAE for the final VAE instead of the "proper" one in a few workflows. With dynamic VRAM disabled, it either stays in VRAM or gets moved faster. When I enable aimdo, the workflow slows by 0.20s and I see the [tiny] VAE being shuffled around. These workflows are meant for speed and are well tested, so I can see performance regressions immediately.

> but overall it's been an improvement for most users.

Eh... I see complaints about it all the time on Reddit. For vramlet systems it probably solves OOMs and lets them run models they normally couldn't, albeit slowly. It did hurt a bunch of custom nodes too. I dread the day it can no longer be turned off. I do admit that it's in much better shape than it used to be, but I've yet to get a single benefit. On my system I reserve no VRAM for the OS; it has free run of the GPU and plenty of system RAM, while disk loading is slow. I am 180 degrees from the target audience. At least we finally have the --high-ram flag.

Tried the native int8 in kitchen today too. Both CUDA kernels are slower on my Turing card. Triton is for some reason gated by SM_80, despite working when I commented that bit out. After adding my previous autotune config (triton.Config({'block_m': 128, 'block_n': 128, 'block_k': 64, 'group_size_m': 8}, num_stages=2, num_warps=4)), the performance seems to match int8-fast.

Compile under dynamic VRAM actually works, but again there is a slight penalty. Maybe the CUDA kernels are faster if you use xformers or some other attention besides sage? Triton always auto-compiles partially, so it could be related to that.

I can't say exactly why adding a compile node afterwards drops half a second for the models I'm using, but it demonstrably does. A lot of the "fixes" done to address speeds are targeted at newer cards like Blackwell, and that's not in my budget. Still on Turing/Ampere.
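For readers unfamiliar with the SM_80 gate mentioned in the comment above, here is a rough sketch of what such a check usually looks like. It is an illustration, not the code from the kitchen package: the function names, the `force_triton` override, and the backend labels are hypothetical. The only facts relied on are that Turing cards report compute capability 7.5 and that SM_80 corresponds to 8.0.

```python
from typing import Tuple

# Assumption: the gate compares the CUDA compute capability against SM 8.0.
MIN_TRITON_CAPABILITY: Tuple[int, int] = (8, 0)


def triton_backend_allowed(
    capability: Tuple[int, int],
    minimum: Tuple[int, int] = MIN_TRITON_CAPABILITY,
    force: bool = False,
) -> bool:
    """Return True if the Triton path may be used on this device.

    `force` is a hypothetical override, equivalent to commenting the gate out.
    """
    # Tuples compare lexicographically, so (7, 5) < (8, 0) < (8, 6).
    return force or tuple(capability) >= tuple(minimum)


def pick_int8_backend(capability: Tuple[int, int], force_triton: bool = False) -> str:
    """Choose between two hypothetical backend labels, 'triton' and 'cuda'."""
    if triton_backend_allowed(capability, force=force_triton):
        return "triton"
    return "cuda"
```

With this sketch, a Turing card at (7, 5) falls back to the CUDA kernels unless the override is set, which matches the behavior described in the comment. In a real setup the capability tuple would typically come from `torch.cuda.get_device_capability()`.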

zhangp365 pushed a commit to zhangp365/ComfyUI that referenced this pull request Jun 29, 2026
8 participants