feat: Context Windows sampling with LTX2 models and IC-LoRa guides (CORE-3)#13325
Conversation
…g to context windows
… properly with the retain index list
…ntext windows with guides
…odel. Older LTXV model's guides + context_windows will need to be re-implemented but outside the scope of LTX2 changes
… LTXAV model hooks
…dard execute path
…e variable names, refactor and condense new context window methods to separate execution paths cleanly
…; Condense execution path to treat all latents as potentially multimodal
📝 Walkthrough: Adds multimodal-aware context-windowing: introduces …
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@comfy/context_windows.py`:
- Around line 667-670: The clamp is still running for packed multimodal latents;
change the condition to skip truncation when the handler indicates
packed/multimodal data. Update the if-statement that uses handler.dim,
noise_shape and handler.context_length to require that the handler is NOT a
packed multimodal case (e.g. add "and not getattr(handler, 'packed_multimodal',
False)" or "and not getattr(handler, 'is_packed', False)") before mutating
noise_shape[handler.dim]; this ensures you only min() the value for real
frame-count axes and leave flattened multimodal token axes untouched.
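A minimal sketch of the guarded clamp, assuming a handler that exposes `dim` and `context_length` plus a boolean `packed_multimodal` flag. That flag name is one of the review's suggestions, not a confirmed attribute, and `clamp_noise_shape` is a hypothetical helper rather than the PR's code:

```python
from types import SimpleNamespace


def clamp_noise_shape(noise_shape, handler):
    """Clamp the windowed axis of noise_shape to handler.context_length.

    Packed multimodal latents are skipped: their windowed axis is a flattened
    token axis, not a frame count. `packed_multimodal` is a hypothetical
    attribute name taken from the review suggestion.
    """
    shape = list(noise_shape)  # work on a copy instead of mutating the caller's list
    is_packed = getattr(handler, "packed_multimodal", False)
    if not is_packed and handler.dim < len(shape):
        shape[handler.dim] = min(shape[handler.dim], handler.context_length)
    return shape


video_handler = SimpleNamespace(dim=2, context_length=13)
packed_handler = SimpleNamespace(dim=2, context_length=13, packed_multimodal=True)
print(clamp_noise_shape([1, 128, 40, 16, 16], video_handler))   # [1, 128, 13, 16, 16]
print(clamp_noise_shape([1, 128, 40, 16, 16], packed_handler))  # [1, 128, 40, 16, 16]
```

The sketch returns a copy instead of mutating `noise_shape` in place, which keeps the check easy to test in isolation.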
In `@comfy/model_base.py`:
- Around line 1086-1104: The current map_context_window_to_modalities collapses
wrapped/strided video windows by using only len(primary_indices) and
max(primary_indices); instead, map each element of primary_indices individually
into the secondary modality using its relative position within the video
timeline so wrapped/strided geometry is preserved (e.g., for each idx in
primary_indices compute mapped_idx = clamp(round(idx * mod_total / video_total),
0, mod_total-1) and collect those mapped indices in the same order/stride as
primary_indices), then if you need a contiguous window for downstream code
optionally compute a minimal bounding window but prefer returning the ordered
mapped list; change map_context_window_to_modalities to iterate primary_indices
and compute per-index mappings for each latent_shapes[i] (refer to function name
map_context_window_to_modalities and caller create_windows_uniform_looped)
rather than using video_window_len and max(primary_indices).
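The per-index mapping the review proposes can be sketched as follows. `map_indices_per_frame` is a hypothetical helper; `video_total` and `mod_total` stand for the primary and secondary lengths along the windowed dim:

```python
def map_indices_per_frame(primary_indices, video_total, mod_total):
    """Map each primary (video) index to one secondary-modality index.

    Implements the review's formula
        mapped = clamp(round(idx * mod_total / video_total), 0, mod_total - 1)
    element by element, so the order and stride of primary_indices survive.
    Note that Python's round() rounds halves to the nearest even integer.
    """
    if video_total <= 0 or mod_total <= 0:
        raise ValueError("video_total and mod_total must both be positive")
    mapped = []
    for idx in primary_indices:
        m = round(idx * mod_total / video_total)
        mapped.append(min(max(m, 0), mod_total - 1))
    return mapped


# A wrapped looped window: the last two video frames followed by the first two.
print(map_indices_per_frame([38, 39, 0, 1], video_total=40, mod_total=80))  # [76, 78, 0, 2]
# A strided window keeps its stride.
print(map_indices_per_frame([0, 2, 4], video_total=10, mod_total=25))  # [0, 5, 10]
```

Because each index is mapped on its own, a window that wraps from frame 39 back to frame 0 still wraps in the secondary modality instead of collapsing into a contiguous tail.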
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e8047aba-e470-402b-800d-3b0413df0bd3
📒 Files selected for processing (4)
comfy/context_windows.py
comfy/ldm/lightricks/model.py
comfy/model_base.py
comfy_extras/nodes_lt.py
…imary windows
The previous window-level calculation collapsed wrapped or strided primary windows into a contiguous audio tail, so audio attended to a different temporal region than the video. Replace with per-frame mapping that computes each primary index's audio span independently and concatenates in order.
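A rough sketch of the per-frame span mapping this commit describes, assuming integer floor division for the span boundaries (the PR's actual rounding may differ). `map_window_spans` is a hypothetical name, not the PR's function:

```python
def map_window_spans(primary_indices, video_total, mod_total):
    """Concatenate, in window order, the secondary-modality span of each primary frame.

    Frame idx covers [idx * mod_total // video_total, (idx + 1) * mod_total // video_total),
    so adjacent frames tile the secondary axis with no gaps or overlaps. If
    mod_total < video_total, some frames get an empty span.
    """
    if video_total <= 0:
        # Guard against an empty primary latent instead of dividing by zero.
        raise ValueError("primary (video) dimension must be non-empty")
    out = []
    for idx in primary_indices:
        a_start = idx * mod_total // video_total
        a_end = (idx + 1) * mod_total // video_total
        out.extend(range(a_start, a_end))
    return out


# Wrapped window: last video frame, then the first one.
print(map_window_spans([3, 0], video_total=4, mod_total=10))  # [7, 8, 9, 0, 1]
```

Using integer floor division for both boundaries means the spans of all primary frames together cover the secondary axis exactly once, so a full, unwrapped window maps to the full secondary range.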
…/ComfyUI into 20260322a_ltx_contextwin
🧹 Nitpick comments (1)
comfy/model_base.py (1)
1159-1159: Consider moving import to module level. The `from comfy.ldm.lightricks.symmetric_patchifier import latent_to_pixel_coords` import is inside the method. If this isn't for circular import avoidance, consider moving it to the module-level imports for clarity and to catch import errors at load time.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@comfy/model_base.py`:
- Line 1159: The local import of latent_to_pixel_coords (from
comfy.ldm.lightricks.symmetric_patchifier) should be moved to the module-level
imports: remove the inline "from comfy.ldm.lightricks.symmetric_patchifier
import latent_to_pixel_coords" inside the method and add that import at the top
of comfy/model_base.py so import errors surface at load time and code is
clearer; if this local import was required to avoid a circular import, keep it
but add a short comment explaining why and consider replacing it with a lazy
import via importlib.import_module or catching ImportError with a clear message.
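If the inline import does turn out to be needed to avoid a circular import, the lazy-import alternative mentioned above can be sketched with `importlib.import_module`. `lazy_attr` is a hypothetical helper, and the stdlib `json` module stands in for `comfy.ldm.lightricks.symmetric_patchifier` so the example runs anywhere:

```python
import importlib


def lazy_attr(module_name, attr_name):
    """Import module_name on first use and return attr_name from it.

    Deferring the import to call time can break a circular import at module
    load time; the ImportError is re-raised with a clearer message.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(f"could not import {module_name!r} (needed for {attr_name!r})") from exc
    return getattr(module, attr_name)


# Stand-in usage: fetch json.dumps lazily.
dumps = lazy_attr("json", "dumps")
print(dumps({"a": 1}))  # {"a": 1}
```

The trade-off the review points out still applies: errors in a lazily imported module only surface when the method first runs, not when `comfy/model_base.py` is loaded.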
01cebdb to b348c7f
🧹 Nitpick comments (1)
comfy/model_base.py (1)
1091-1102: Consider guarding against zero-length video dimension. If `latent_shapes[0][dim]` is zero (edge case of an empty video latent), the division on lines 1101-1102 would raise a `ZeroDivisionError`. While this is unlikely in normal operation, a defensive check would improve robustness.
🛡️ Suggested defensive check
video_total = latent_shapes[0][dim]
+ if video_total == 0:
+     return result
for i in range(1, len(latent_shapes)):
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@comfy/model_base.py`:
- Around line 1091-1102: The code computes video_total = latent_shapes[0][dim]
and then divides by video_total when computing a_start/a_end, so add a defensive
guard immediately after computing video_total to handle a zero-length primary
video dimension: if video_total == 0, either raise a clear ValueError (e.g.
referencing video_total/latent_shapes[0]) or skip the modality-mapping loop and
return/emit empty mod_indices as appropriate for the caller; update callers or
document behavior if you choose to raise. Ensure you reference video_total,
latent_shapes, dim, primary_indices, and the loop that computes
mod_total/mod_indices so the fix is applied in the correct spot.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@comfy_extras/nodes_context_windows.py`:
- Around line 90-94: The call in WanContextWindowsManualNode.execute passes an
unexpected kwarg apply_causal_anchor_prefix=False to super().execute(), which
ContextWindowsManualNode.execute and IndexListContextHandler.__init__ don't
accept; remove the apply_causal_anchor_prefix=False argument from the
super().execute(...) call in WanContextWindowsManualNode.execute (or, if the
causal-anchor-prefix behavior is required, add an apply_causal_anchor_prefix
parameter to ContextWindowsManualNode.execute and propagate it into
IndexListContextHandler.__init__ so the flag is accepted and honored).
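The second option (accepting the flag and forwarding it) looks roughly like this. The class names mirror the comment, but their bodies, the single `context_length` argument, and the `True` default are hypothetical simplifications:

```python
class IndexListContextHandler:
    # Hypothetical stand-in; the real handler takes many more arguments.
    def __init__(self, context_length, apply_causal_anchor_prefix=True):
        self.context_length = context_length
        self.apply_causal_anchor_prefix = apply_causal_anchor_prefix


class ContextWindowsManualNode:
    @classmethod
    def execute(cls, context_length, apply_causal_anchor_prefix=True):
        # Accept the flag and propagate it into the handler.
        return IndexListContextHandler(
            context_length, apply_causal_anchor_prefix=apply_causal_anchor_prefix
        )


class WanContextWindowsManualNode(ContextWindowsManualNode):
    @classmethod
    def execute(cls, context_length):
        # Without the parameter on the parent, this call would raise TypeError.
        return super().execute(context_length, apply_causal_anchor_prefix=False)


print(WanContextWindowsManualNode.execute(8).apply_causal_anchor_prefix)  # False
```

Zero-argument `super()` works inside a `classmethod`, so the subclass can forward to the parent's `execute` in the same way the node code does.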
🧹 Nitpick comments (1)
comfy/model_base.py (1)
1136-1150: Falling through when `guide_count == 0` looks intentional, just confirming. When the cond has no guide suffix (`guide_count == 0`), this branch returns `None` and lets `comfy/context_windows.py` handle the slice via the generic `cond_value.cond.size(self.dim) == x_in.size(self.dim)` path (snippet from `context_windows.py:473-520`). That should produce the expected per-window mask. Just flagging because the symmetry isn't obvious at a glance: if a future change ever makes `denoise_mask` larger than `x_in` along `window.dim` without populating `guide_frames_indices`, this fallback would silently slice the wrong region. A small inline comment noting the intentional fall-through would save the next reader a trip through the caller.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@comfy/model_base.py`:
- Around line 1136-1150: Add an inline comment in the denoise_mask branch
explaining that when guide_count == 0 we intentionally fall through and return
None so the caller (comfy/context_windows.py) will handle slicing via the
cond_value.cond.size(self.dim) == x_in.size(self.dim) path; reference the
variables and methods used here (cond_value, guide_count, window.get_tensor,
window.guide_frames_indices, sliced_video) so future readers understand this is
deliberate and not a bug, and warn that if denoise_mask can become larger than
x_in without guide_frames_indices being populated, this behavior would need
revisiting.
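A minimal sketch of the hook-then-fallback contract described here, using plain lists in place of tensors. `resize_cond_for_window` and `slice_cond` are hypothetical stand-ins for the model hook and the caller in `comfy/context_windows.py`, and appending the whole guide suffix to every window is a simplification:

```python
def resize_cond_for_window(cond, window_indices, guide_count):
    """Model-side hook (hypothetical): slice conds that carry a guide suffix."""
    if guide_count == 0:
        # Intentional fall-through: with no guide suffix, return None so the
        # caller's generic size-matching path slices this cond instead.
        return None
    frames, guides = cond[:-guide_count], cond[-guide_count:]
    return [frames[i] for i in window_indices] + guides


def slice_cond(cond, x_len, window_indices, guide_count=0):
    """Caller-side logic (hypothetical): try the model hook, then the generic path."""
    resized = resize_cond_for_window(cond, window_indices, guide_count)
    if resized is not None:
        return resized
    if len(cond) == x_len:
        return [cond[i] for i in window_indices]
    # Length mismatch and no hook result: returned unsliced, so a mask that
    # grew without guide_count being set would not be windowed here.
    return cond


print(slice_cond(list(range(10)) + ["g0", "g1"], 10, [2, 3, 4], guide_count=2))  # [2, 3, 4, 'g0', 'g1']
print(slice_cond(list(range(10)), 10, [2, 3, 4]))  # [2, 3, 4]
```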
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@comfy_extras/nodes_context_windows.py`:
- Around line 125-131: The execute method currently sets both
cond_retain_index_list and latent_retain_index_list to "0" when
retain_first_frame is true, which pins conditioning index 0 across windows;
change it to only set latent_retain_index_list to "0" (leave
cond_retain_index_list empty) so the LTXV/inplace path uses the new latent
retention mechanism—use retain_first_frame to compute latent_retain_index_list =
"0" or "" and pass cond_retain_index_list="" to super().execute while leaving
all other arguments intact.
def execute(cls, model: io.Model.Type, context_length: int, context_overlap: int, context_schedule: str, fuse_method: str, freenoise: bool,
            retain_first_frame: bool=False, split_conds_to_windows: bool=False, context_stride: int=1, closed_loop: bool=False) -> io.Model:
    context_length = max(((context_length - 1) // 8) + 1, 1) # at least length 1
    context_overlap = max(context_overlap // 8, 0) # at least overlap 0
    retain_index_list = "0" if retain_first_frame else ""
    return super().execute(model, context_length, context_overlap, context_schedule, context_stride, closed_loop, fuse_method, dim=2, freenoise=freenoise,
                           cond_retain_index_list=retain_index_list, latent_retain_index_list=retain_index_list, split_conds_to_windows=split_conds_to_windows)
Use only latent_retain_index_list for LTXV retain_first_frame.
Lines 129-131 currently enable both retention paths, but this node is the inplace-style/LTXV convenience path that the new latent_retain_index_list was added for. Keeping cond_retain_index_list="0" here also pins conditioning index 0 in every window, which can leak the first-frame conditioning across prompt/guide changes.
Proposed fix
retain_index_list = "0" if retain_first_frame else ""
return super().execute(model, context_length, context_overlap, context_schedule, context_stride, closed_loop, fuse_method, dim=2, freenoise=freenoise,
- cond_retain_index_list=retain_index_list, latent_retain_index_list=retain_index_list, split_conds_to_windows=split_conds_to_windows)
+ latent_retain_index_list=retain_index_list, split_conds_to_windows=split_conds_to_windows)
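Assuming "retain" means overwriting window-local position k with the frame at absolute position k (consistent with the PR summary's "retains sub-pos 0" wording, but not taken from the PR's code), one helper covers both lists: it would be applied to the concat conditioning for `cond_retain_index_list` and to the noise latent for `latent_retain_index_list`. The function names are hypothetical:

```python
def parse_index_list(spec):
    """Parse a comma-separated index string such as "0" or "0, 4" into ints."""
    return [int(part) for part in spec.split(",") if part.strip()]


def slice_with_retain(frames, window_indices, retain_spec):
    """Slice one window, then overwrite retained window positions with the
    frames at the same absolute positions of the full sequence."""
    window = [frames[i] for i in window_indices]
    for sub_pos in parse_index_list(retain_spec):
        if 0 <= sub_pos < len(window):
            window[sub_pos] = frames[sub_pos]
    return window


frames = [f"f{i}" for i in range(10)]
print(slice_with_retain(frames, [4, 5, 6, 7], "0"))  # ['f0', 'f5', 'f6', 'f7']
print(slice_with_retain(frames, [4, 5, 6, 7], ""))   # ['f4', 'f5', 'f6', 'f7']
```

Under this reading, enabling both lists with "0" pins the first frame in the conditioning and in the latent of every window, which is the combination the thread settles on for the LTXV node.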
@drozbay, understood — thanks for the clarification! I'll note that for the LTXV convenience node, enabling both cond_retain_index_list and latent_retain_index_list when retain_first_frame=True is the intended behavior.
(ノ◕ヮ◕)ノ*:・゚✧
✏️ Learnings added
Learnt from: drozbay
Repo: Comfy-Org/ComfyUI PR: 13325
File: comfy_extras/nodes_context_windows.py:125-131
Timestamp: 2026-05-01T00:39:30.094Z
Learning: In `comfy_extras/nodes_context_windows.py`, `LTXVContextWindowsNode.execute()` intentionally sets BOTH `cond_retain_index_list="0"` AND `latent_retain_index_list="0"` when `retain_first_frame=True`. This is by design for LTXV/LTXAV models — the LTXV workflow requires both the concat-style conditioning retention path AND the inplace/latent retention path to be active simultaneously. Do not flag this as a bug or suggest removing `cond_retain_index_list`.
🧠 Learnings used
Learnt from: drozbay
Repo: Comfy-Org/ComfyUI PR: 13565
File: comfy_extras/nodes_lt.py:350-368
Timestamp: 2026-04-29T23:29:58.457Z
Learning: In `comfy_extras/nodes_lt.py` (`LTXVAddGuide`), the `dilated_mask` produced by `dilate_latent()` uses `-1.0` for unsampled (filler) cells and `1.0` for sampled (grid) positions. This is intentional grid-occupancy metadata for IC-LoRA dilation, NOT a standard [0,1] denoise-strength noise_mask. The downstream `append_keyframe` path that receives `guide_mask` from `dilate_latent` interprets it differently (occupancy) from the regular noise mask path. Do not flag the -1.0 fill value as a bug. This mirrors the reference implementation by Lightricks in their ComfyUI-LTXVideo custom node pack.
Learnt from: pythongosssss
Repo: Comfy-Org/ComfyUI PR: 12555
File: comfy_extras/nodes_glsl.py:719-724
Timestamp: 2026-02-21T14:01:41.482Z
Learning: In PyOpenGL, bare Python scalars can be accepted for 1-element array parameters by NumberHandler. This means you can pass an int/float directly to OpenGL texture deletion (e.g., glDeleteTextures(tex)) without wrapping in a list. Verify function-specific expectations and ensure types match what the OpenGL call expects; use explicit lists only when the API requires an array.
Learnt from: jtydhr88
Repo: Comfy-Org/ComfyUI PR: 12757
File: comfy_extras/nodes_custom_sampler.py:1069-1089
Timestamp: 2026-03-04T14:05:31.426Z
Learning: In the ComfyUI sampling pipeline, treat percent_to_sigma(0.0) as a sentinel value (999999999.9) that means starting from pure noise. This is consistent with BasicScheduler via calculate_sigmas. The SamplingPercentToSigma node’s return_actual_sigma flag differentiates this sentinel from sigma_max. Reviewers should not flag CurveToSigmas or similar nodes that rely on percent_to_sigma as bugs; downstream samplers are expected to handle the sentinel correctly. When reviewing related sampling-related code, assume this sentinel semantics unless there is explicit handling for a real sigma_max.
Learnt from: kijai
Repo: Comfy-Org/ComfyUI PR: 13258
File: comfy_extras/nodes_frame_interpolation.py:151-189
Timestamp: 2026-04-04T13:29:15.653Z
Learning: In this ComfyUI codebase, node `execute()` inference is already run under a global `torch.inference_mode()` context established in the execution engine (e.g., `execution.py` around line ~732). During review, avoid recommending changes that wrap node inference loops in `torch.inference_mode()`—it is already applied, so such suggestions are likely redundant.
… downscale ratio instead of using latent_start key added to attention_guide_entries. - This makes it possible for existing nodes like the ComfyUI-LTXVideo nodes to work with context windows without needing modifications
Summary
Enables LTX-2 (LTXAV) multimodal video+audio models to work with context windows, and reworks IC-LoRA guide handling so guides at arbitrary positions get spliced correctly into each window they overlap.
Context windowing now treats every input as potentially multimodal via a new `WindowingState` that holds per-modality latents and, if present, concatenated guide latents. A new `map_context_window_to_modalities` hook on the LTXAV model derives per-modality windows from the primary (video) window. LTXAV's guide-specific slicing (`denoise_mask`, `keyframe_idxs`, `guide_attention_entries`, audio mask) lives on the model via the `resize_cond_for_context_window` pattern.
Related PRs (all merged)
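A toy model of the windowing state described above, with lists of frame labels instead of tensors. `WindowingStateSketch` and its fields are assumptions, not the PR's `WindowingState`; it also attaches every guide to every window, whereas the PR splices guides only into the windows they overlap:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WindowingStateSketch:
    """Hypothetical stand-in for the PR's WindowingState; field names are assumptions."""
    latents: List[list]                   # one entry per modality, index 0 = primary (video)
    guide_latents: Optional[list] = None  # concatenated guide frames, if present

    def window(self, per_modality_indices: List[List[int]]) -> List[list]:
        """Slice each modality with its own index list; append guides to the primary."""
        out = [[lat[i] for i in idxs] for lat, idxs in zip(self.latents, per_modality_indices)]
        if self.guide_latents:
            out[0] = out[0] + list(self.guide_latents)
        return out


state = WindowingStateSketch(
    latents=[[f"v{i}" for i in range(4)], [f"a{i}" for i in range(10)]],
    guide_latents=["g0"],
)
print(state.window([[0, 1], [0, 1, 2, 3, 4]]))
# [['v0', 'v1', 'g0'], ['a0', 'a1', 'a2', 'a3', 'a4']]
```

Keeping one index list per modality is what lets a video-only model (a single entry in `latents`) and LTXAV (video plus audio) share the same execution path.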
Three PRs were extracted from this work to land independently. None of them is required for this PR to function, but they improve the behavior of context windows for LTX-2, especially with guides. (Update: as of May 16 2026, all three have been merged.)
`causal_window_fix` toggle for context windows. Improves temporal coherence between non-zero-indexed windows when no guides are present.
Nodes
LTXV Context Windows (new)
Convenience node for LTX-2 / LTXV workflows. Inputs are in real (pixel) frames: `context_length` snaps to 8n+1, `context_overlap` snaps to multiples of 8, and the node forwards to the underlying handler with `dim=2`. A single `retain_first_frame` toggle retains sub-pos 0 in both the conditioning and the noise latent, the right combination for inplace-style I2V workflows (LTX-2 first-frame, AnimateDiff-style references). Advanced settings (stride, closed_loop, freenoise) are collapsed by default.
WAN Context Windows (updated)
Renamed from WAN Context Windows (Manual) to match the new LTXV variant. Same simplification: pixel-frame inputs (`context_length` snaps to 4n+1), advanced settings collapsed, and a single `retain_first_frame` toggle that retains sub-pos 0 in conditioning (Wan I2V is concat-style, so the start image lives in c_concat).
[screenshot]
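The pixel-to-latent conversion can be checked with a few lines. The arithmetic matches the LTXV node's `execute` shown in the review thread; the stride of 4 for WAN is an assumption based on the 4n+1 snapping, and the helper names are hypothetical:

```python
def pixel_to_latent_length(pixel_frames, temporal_stride):
    """Pixel-frame window length -> latent frames (at least 1).

    Same arithmetic as the LTXV node's max(((context_length - 1) // 8) + 1, 1);
    temporal_stride=4 for WAN is an assumption.
    """
    return max(((pixel_frames - 1) // temporal_stride) + 1, 1)


def pixel_to_latent_overlap(pixel_overlap, temporal_stride):
    """Pixel-frame overlap -> latent frames (at least 0)."""
    return max(pixel_overlap // temporal_stride, 0)


def snapped_pixel_length(pixel_frames, temporal_stride):
    """The k*n+1 pixel length that the computed latent length corresponds to."""
    return (pixel_to_latent_length(pixel_frames, temporal_stride) - 1) * temporal_stride + 1


print(pixel_to_latent_length(97, 8), pixel_to_latent_overlap(24, 8))  # 13 3
print(snapped_pixel_length(100, 8))  # 97
print(pixel_to_latent_length(81, 4))  # 21
```

A request for 100 pixel frames therefore behaves like 97 (8·12+1): the conversion floors to the nearest valid length instead of rounding up.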
Context Windows (Manual): new `latent_retain_index_list` input
Companion to the existing `cond_retain_index_list`, covering the other I2V flavor:
- `cond_retain_index_list`: for concat-style I2V models (Wan I2V, HunyuanVideo I2V, Cosmos I2V, SVD), where the encoded start image lives in c_concat conditioning channels.
- `latent_retain_index_list`: for inplace-style I2V models (LTXV, LTX-2), where the reference content lives directly in the noise latent.
The two inputs are independent and can be combined. The new LTXV / WAN convenience nodes set the appropriate one(s) automatically when `retain_first_frame` is enabled.
Workflow and examples
All workflows below use native ComfyUI nodes only. Workflows 1-4 use LTX-2.3. Workflow 5 uses WanAnimate.
Note: The results here were created locally with PR #13563 merged in as well. Without it there is some artifacting when blending windows, but the workflows still run fine. Edit: PR #13563 is merged now.
Input assets (images, audio, video) for all workflows are bundled in cw_wf_assets_2.zip: extract into `ComfyUI/input/` before running.
1. Image + audio to video (multimodal LTX-2.3)
Context windowing across video and audio modalities simultaneously.
Workflow: droz_LTX2_context_win_imageaudio_to_video_v2.json
ComfyUI_00222_.mp4
3x upscale variant: chaining a 2x and a 1.5x latent upscale model in series after the base generation produces a 1728x1248 output, which is only possible with windowing because memory use is bounded per window.
ComfyUI_00223_.mp4
2. First-frame / last-frame (LTX-2.3)
Uses a first frame and a last frame, with `retain_first_frame` keeping the first-frame reference throughout the middle windows.
droz_LTX2_context_win_firstframe_lastframe_v2.json
ComfyUI_00225_.mp4
3. First-frame / last-frame with prompt travel (LTX-2.3)
Uses `split_conds_to_windows` to guide a generation temporally through multiple windows with prompts.
droz_LTX2_context_win_firstframe_lastframe_prompt_travel_v2.json
ComfyUI_00226_.mp4
4. Control video input via ltx-2.3-22b-ic-lora-union-control-ref0.5
Uses full video guidance with a start image that is kept as a reference for all segments. Two-stage workflow.
droz_LTX2_context_win_canny_union_control_v2.1.json
Expected output:
LTX23_ContextWinTest_00023_.mp4
5. WanAnimate test
Confirming non-multimodal video models still work.
droz_WanAnimate_ContextWindows_Native_v1.1.json
WanAnimate_00004_.mp4