Releases: unslothai/unsloth
Release list
GLM 5.2 + Model Hub + 3x longer contexts
GLM-5.2 is now supported in Unsloth Studio! All reasoning levels supported. 3x longer context lengths are now achievable with our new auto fit algorithm with MTP, allowing longer chats. Bypass permissions mode, forkable chats, queue-able chats, a new hub for model discovery, parallel modules + HTTPS Cloudflare support and more! Use unsloth studio --secure for secure HTTPS global access! Read our GLM-5.2 guide
To update Unsloth or install a new Unsloth Studio, you must use the below.
Ensure your version is 2026.6.9 or v0.1.471-beta for the latest.
MacOS, Linux, WSL:
curl -fsSL https://unsloth.ai/install.sh | sh
Windows:
irm https://unsloth.ai/install.ps1 | iex
Better context length algorithm
As per #6312 and #6447, we made Unsloth Studio's determination of memory usage and context length much better, achieving 3x longer context overall:
| scenario | KV | before | after |
|---|---|---|---|
| 1x 32GB pipeline (~31 GB free) | f16 | 23,040 | 64,000 |
| q8_0 | 43,520 | 114,944 | |
| q4_0 | 82,432 | 199,680 | |
| 2x 32GB pipeline | any | 262,144 | 262,144 |
| 2x 24GB tensor (~23 GB free) | f16 | 134,049 | 262,144 |
| q8_0 | 252,329 | 262,144 |
Chat Canvas, Forking & Queueing
- Edit assistant messages in place and re-run from any point in the thread.
- Fork a thread to branch a conversation without losing the original.
- Temporary (incognito) chats that leave nothing behind.
- Queue new prompts while a generation is still running instead of waiting.
- Chat "artifacts" are now canvas, with inline HTML canvas cards that auto-render, a Code view, and DiffusionGemma keeps its raw code visible inline instead of collapsing.
- Chat search now covers every message and surfaces your own messages first.
Hub (Redesigned)
- Full-page Hub with a trending feed, search, and custom model paths support.
- README preview in a split-view feed so you can read before you download.
- Downloads default to the faster Xet transport, with automatic HTTP fallback if a transfer stalls.
- New "Load on selection" toggle to set load options before a model loads.
- Google logo shown for DiffusionGemma and future Gemma derivatives.
Models & Inference
- DeepSeek-OCR and more vision models now load and run without errors.
- Fixed fast inference on the latest vLLM (0.22+) so speed-ups work again.
- Tensor parallelism is more reliable: if the faster MTP path fails, it now recovers on its own instead of crashing.
- DiffusionGemma now shows the image forming live as it denoises, with accurate speed stats.
Security & Cloudflare Encrypted Studios
- New
--secureCloudflare-only mode for end-to-end encrypted studios, with server-side tools staying enabled under--secure. Useunsloth studio --secure! - Bypass Permissions mode to skip confirmations and disable the tool sandbox when you want it.
- Auto detect Hugging Face Virus scanning + dangerous files in repos.
Logging and API
- New API server monitor in Studio.
- Faster API calling and less latency
- Much better streamlined logs - now with throughput and latency and removed a lot of bloated logs.
Hardware & Backend
- Better support for Blackwell RTX 50X and 60X GPUs
- Fix silent downgrading to CPU and not GPU
- torchao version is now selected from the installed torch.
- Installer now auto-repairs a broken or CPU-only PyTorch install and warns on silent CPU fallback, across NVIDIA + AMD on Win/Linux/Mac/WSL.
- Frees the chat model's VRAM when training starts, but only when the GPU is actually tight (no needless reloads otherwise).
- If llama-server hard-crashes at startup, Studio now steps through a recovery ladder instead of just failing.
Training & General Fixes & Parallel Modules
- MLX training updates.
- Improved GRPO training reliability with vLLM.
- Training startup made more reliable, with clearer errors for invalid VLM batches.
- Studio now cleans up leftover backend processes more reliably after crashes, restarts, or interrupted shutdowns.
- Export, Chat, Training, Recipes are all individualized / compartmentalized! This means you can do all 4 in parallel now! You can chat / do inference while you wait for a training run or an export!
What's Changed
- Bump install.sh / install.ps1 pin to unsloth>=2026.6.4 by @danielhanchen in #6257
- Studio: account for mmproj VRAM in GGUF fit budget (#5825) by @hoobnn in #5849
- fix(studio): keep local GGUF vision on llama-server by @alkinun in #5770
- install.sh: keep the studio launch from draining the curl | sh script (WSL/dash) by @danielhanchen in #6258
- DiffusionGemma: set UNSLOTH_IS_PRESENT so the shim runs on a clean install by @danielhanchen in #6259
- studio: add keyboard navigation to model picker by @alkinun in #5628
- Bump install.sh / install.ps1 pin to unsloth>=2026.6.5 by @danielhanchen in #6260
- Installer: drop the lemonade ROCm fallback now the fork ships identical per-gfx prebuilts by @oobabooga in #6225
- Studio: keep distinct bpw flavors of the same GGUF quant by @bouclem in #5729
- studio: declare UNSLOTH_IS_PRESENT at backend startup (clean-install + Windows) by @danielhanchen in #6262
- Studio: extend llama.cpp first-token timeout by @Imagineer99 in #5841
- Studio: polish update banner layout and sidebar settings icon by @shimmyshimmer in #6266
- Studio: only advertise a Cloudflare tunnel once it actually serves by @oobabooga in #6264
- Studio: backfill the DiffusionGemma visual-server on a tag-matching update by @danielhanchen in #6267
- Studio: keep llama-server discovery from crashing on an access-denied candidate by @danielhanchen in #6268
- Bump install.sh / install.ps1 pin to unsloth>=2026.6.6 by @danielhanchen in #6270
- Tidy update banner and auth button spacing by @shimmyshimmer in #6279
- Fix llama.cpp prebuilt: skip already-installed same-release fallback by @danielhanchen in #6285
- Installer: drop redundant -WindowStyle Hidden from the Windows launcher VBS by @danielhanchen in #6284
- Fix Responses tool output content arrays by @alkinun in #6287
- Studio: UI polish for sidebar, menus, hub and toasts by @shimmyshimmer in #6288
- Upgrade setuptools and wheel in the auto-install command by @danielhanchen in #6282
- Studio: fix training output dir escaping outputs root for models on another drive by @danielhanchen in #6293
- Studio: offer llama.cpp update for same-base mix builds on source installs by @shimmyshimmer in #6280
- Studio: resolve studio home before the llama-only setup split by @danielhanchen in #6289
- Studio: force-refresh the llama.cpp update check so new builds are not masked by the 24h cache by @shimmyshimmer in #6278
- Studio: decide diffusion routing before the SWA resolver by @danielhanchen in #6299
- Bump install.sh / install.ps1 pin to unsloth>=2026.6.7 by @danielhanchen in #6301
- Studio: don't silently fall back to a CPU prebuilt on NVIDIA Linux GPU hosts by @oobabooga in #6310
- Studio: clarify llama.cpp update banner copy by @danielhanchen in #6313
- MLX Training updates by @mmathew23 in #5656
- Studio: add temporary (incognito) chat by @oobabooga in #5956
- Studio: enable stdio MCP servers on a loopback bind by @oobabooga in #6295
- fix(studio): Windows GGUF cancel hang + CPU spinlock overhead (#5692) by @anmolxlight in #5749
- feat: add Anthropic-compatible thinking parameter by @maattm in #5856
- Studio: Bypass Permissions (skip confirmation, disable tool sandbox) by @danielhanchen in #5895
- Studio: add --secure Cloudflare-only mode and revamp API usage examples by @danielhanchen in #6300
- Studio: arm the VRAM-settle wait after the startup orphan reaper by @danielhanchen in #6315
- Studio: fix Mac IME input-method switch leaving composer Send disabled by @narakai in #5762
- fix: use partial hipinfo output on crash to avoid CPU fallback (RDNA 4 / gfx1200) by @mvanhorn in #6292
- Rename chat artifacts copy to canvas by @wasimysaid in https://github.com/unslothai/u...
DiffusionGemma + Gemma 4 MTP
We've merged over 150 PRs this week so lots of new updates, a new model Hub and look! Ensure you install the latest v0.1.464-beta or 2026.6.7. DiffusionGemma, Gemma 4 MTP and MiniMax-M3 are all now supported.
DiffusionGemma + Gemma 4 MTP + Audio
- Run and train DiffusionGemma via Unsloth Studio. Install the latest
v0.1.462-betaif DiffusionGemma wasn't previously working. - Gemma 4 MTP is here! Run Gemma 4 around 2x faster with MTP - MTP is auto enabled in Unsloth Studio.
- Audio chat is now supported for Gemma 4 (
wav,mp3,m4a,flac,webm). - Preserve Thinking added to Gemma 4.
Hub + Download Manager (Experimental)
- Added a new Hub page for browsing, downloading, and managing Hugging Face models and datasets.
- Unsloth can now detect models and datasets already on your machine and show them alongside downloaded assets.
- Downloaded GGUF models now have direct Run / New Chat actions.
Chat with Files / RAG (Experimental)
- Added Chat with Files in Studio, letting you ask questions over your own documents and knowledge bases.
- Supports hybrid search, citations, PDF previews, per-thread documents, and a built-in
search_knowledge_basetool.
New Update Button + Hardware Support
- Unsloth now uses fresh, up-to-date llama.cpp prebuilts across CUDA, ROCm, Windows, Linux, and macOS.
- Added an in-app Update llama.cpp button so users can update the local backend without reinstalling Studio.
- Improved Windows / WSL AMD support, Strix Halo ROCm support, Blackwell CUDA selection, and clearer installer messages.
Local Chat, Tools & API Compatibility
- Local tool calling is more reliable, with better ordering of tool cards, fewer duplicate tool loops, and support for tool use with GGUF vision models.
- Improved OpenAI-compatible API and Anthropic-compatible API behavior for local Studio servers, including better errors, token usage, stop reasons, and Claude Code compatibility.
Tool Calling, MCP, Encrypted Cloudflare Tunnels
- Bypass Permissions, Tool Call Permissions (Approve, Always Approve, Deny)
- 50% to 90% less tool call nudging issues without any accuracy loss
- MCP, Artifacts are now select-able
- Tensor parallelism is now enabled for GGUFs - get +30% throughput!
- Cloudflare HTTPS free tunnels is now added allowing for end to end encrypted studios!
Training & General Fixes
- Improved MLX support with better model labels, generation speed stats, and fixes for VLM training.
- Fixed several training and dataset edge cases, including non-writable Hugging Face caches and custom dataset mappings.
- Added many UI polish fixes across chat, menus, model picker, dark mode, import/export, and settings.
To update Unsloth or install a new Unsloth Studio, you must use:
macOS, Linux, WSL:
curl -fsSL https://unsloth.ai/install.sh | sh
Windows:
irm https://unsloth.ai/install.ps1 | iex
Warning
DO NOT USE unsloth studio update since packaging will not get the latest updates
What's Changed
- Studio: llama.cpp update banner redesign, About tab license info, UI polish by @shimmyshimmer in #6196
- Bump install.sh / install.ps1 pin to unsloth>=2026.6.3 by @danielhanchen in #6212
- Expose runtime context length for hub models by @alkinun in #6154
- Studio: fix llama.cpp update banner offering a downgrade / sticking on mix releases by @oobabooga in #6219
- Fix kwarg spacing in training files to satisfy pre-commit by @shimmyshimmer in #6209
- Studio: reword the Cloudflare line when the public probe fails by @danielhanchen in #6217
- fix: deduplicate lemonade ROCm prebuilt selection log by @LeoBorcherding in #6021
- Stop false RoPE 'default' warning and fix rope drift gate on transformers 5 by @danielhanchen in #6223
- fix(studio): load run.py by path for editable installs by @jimdawdy-hub in #5909
- fix(studio): inherit llama_extra_args and honor --no-mmproj by @jimdawdy-hub in #5902
- fix(studio): adopt server-loaded model before chat auto-load by @jimdawdy-hub in #5900
- Fix stale sidebar regression test to match the gap-px markup by @danielhanchen in #6232
- Studio: gate the staged prebuilt runtime validation behind a flag (off by default) by @danielhanchen in #6216
- Fix FastModel config passthrough for sequence classification by @alkinun in #6203
- fix: decode subprocess output as UTF-8 in save.py on Windows by @dylanschroers in #6218
- patch: fix EmptyLogits gathering in nested payloads and Accelerate recursively_apply by @MdHussain121 in #6092
- Studio: show Apple GPU temperature and power in the GPU monitor (macOS) by @Ban921 in #6187
- Studio: Add inline confirmation (Allow/Always allow/Deny) for tool calls by @oobabooga in #5869
- Studio: guard Apple GPU power against negative counter-reset readings by @danielhanchen in #6235
- Fix step count mismatch when sequence packing is enabled by @IrakliXYZ in #5967
- fix/uv-bytecode-timeout by @alkinun in #6166
- Studio: tune llama.cpp env for data-center GPUs by @danielhanchen in #6098
- Studio: drop the on-disk freshness cache after a llama.cpp update by @danielhanchen in #6234
- Add missing RAG deps to no-torch Studio runtime requirements by @danielhanchen in #6236
- Studio: rounded rectangle hover states for menu items instead of pills by @shimmyshimmer in #6210
- docs: repository cleanup by @Agnibha007 in #5617
- Run cross-platform parity test on Windows and macOS in CI by @danielhanchen in #6241
- chore(studio/frontend): normalize line endings to LF by @danielhanchen in #6012
- fix: respect absolute export paths to prevent cross-drive copy failures (WinError 112) by @anmolxlight in #6088
- Studio: Add Tensor-Parallel llama.cpp support by @oobabooga in #6040
- Studio: Add custom provider option to Connections by @Imagineer99 in #6112
- Studio: model selector and settings polish by @shimmyshimmer in #6240
- Studio: login card polish and sidebar label alignment by @shimmyshimmer in #6242
- Studio: pinnable plus menu items and saved prompt pins by @shimmyshimmer in #6237
- Studio: bottom update banners, smooth llama.cpp progress, re-prompt after copy by @shimmyshimmer in #6233
- fix(studio/responses): forward chat_template_kwargs enable_thinking to chat request by @Anai-Guo in #6202
- Studio: fix WSL Strix Halo GPU on reinstall (ROCDXG drop-in + system HIP before bundle) by @danielhanchen in #6227
- Studio: fully rounded Hub pills and refreshed menu icons by @shimmyshimmer in #6248
- Studio: use px-2.5 for Hub option menu padding by @shimmyshimmer in #6249
- Studio: fix Downloaded model list disappearing and order it by last download by @danielhanchen in #6247
- Studio: new-chat shortcut, composer draft autosave, archive threads by @NilayYadav in #5771
- Studio: persist speculative decoding preference across restart and model switch by @oobabooga in #6169
- Studio: refine menu chevron, tick icon, and one-line plus-menu shape by @shimmyshimmer in #6251
- Studio: serve D...
Gemma 4 MTP + Bug Fixes
Bug Fixes and more cross platform support
To update Unsloth or install a new Unsloth Studio, you must use:
macOS, Linux, WSL:
curl -fsSL https://unsloth.ai/install.sh | sh
Windows:
irm https://unsloth.ai/install.ps1 | iex
Warning
DO NOT USE unsloth studio update since packaging will not get the latest updates
What's Changed
- Bump install.sh / install.ps1 pin to unsloth>=2026.6.1 by @danielhanchen in #5977
- Port KTO logps truncation guard to TRL 1.x _compute_logps refactor by @danielhanchen in #5996
- CI: track deepseek_ocr2 compile timeout in known-broken list by @danielhanchen in #5995
- fix(studio): disable mlx gc for none by @Lyxot in #5991
- Normalize shell scripts to LF in .gitattributes by @danielhanchen in #5997
- Studio: enable audio input for Gemma 4 GGUFs; default chat model to Qwen3.5-4B-MTP by @danielhanchen in #6000
- Fix chat text cutoff at composer dock and speed up plus icon spin by @shimmyshimmer in #5989
- Studio: refine tool call and reasoning trigger UI by @shimmyshimmer in #5873
- fix: warn when localhost resolves to ::1 but Studio is bound only to 127.0.0.1 by @mvanhorn in #5994
- fix: persist Studio thread synchronously on first runStart so mid-stream refresh keeps the prompt by @mvanhorn in #5814
- Studio: enable GGUF tools with vision inputs by @Imagineer99 in #6009
- Studio: accept system-role messages in Claude Code requests by @Imagineer99 in #6006
- Studio: fix load_freeze audio-type tests for #6000's Gemma 4
<|audio|>probe by @danielhanchen in #6018 - Studio: fix chat preset persistence with fast mode by @Imagineer99 in #5870
- Studio: fix Repo tests (CPU) by stopping the ROCm test from leaking a fake utils into sys.modules by @danielhanchen in #6027
- Studio: stop ROCm amd-smi tests leaking a fake loggers into sys.modules (follow-up to #6027) by @danielhanchen in #6055
- Studio: emit usage and timings for MLX generation speed stats by @shimmyshimmer in #6068
- Studio: tag MLX loaded models as MLX instead of Base in chat by @shimmyshimmer in #6067
- Studio: remove red border on chat error messages by @shimmyshimmer in #6063
- Studio: keep chat in place when composer attachments resize it by @shimmyshimmer in #6070
- CI: allowlist deepseek_ocr2 in compiler full-model-sweep by @danielhanchen in #6085
- Restore KTO logps truncation guard for TRL (re-apply dropped #5996) by @danielhanchen in #6086
- Studio: stop leaking internal exceptions to API clients; harden sandbox path by @danielhanchen in #6072
- Formatting: ruff line-length 100 + drop blank after short local imports by @danielhanchen in #6079
- Fix MoE LoRA target parameter handling by @Datta0 in #5345
- Refactor VLM detection in studio by @Datta0 in #5245
- qwen 3.5 export fixes by @Datta0 in #5992
- [fix] Nvfp4 load by @Datta0 in #6087
- Studio: open the MCP dialog to the server list by @oobabooga in #6100
- Studio: make code comments and docstrings more succinct by @danielhanchen in #6029
- Reduce and tighten code comments and docstrings repo-wide by @danielhanchen in #6095
- Studio frontend: reduce and tighten code comments by @danielhanchen in #6099
- feat(studio): Hub + Download Manager by @Sneakr in #5916
- Studio fix recipe dataset preview by @wasimysaid in #6031
- Studio: make Helper LLM startup pre-cache opt in by @wasimysaid in #6113
- Improve local chat tool call flow by @wasimysaid in #5962
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #6104
- Studio: improve OpenAI- and Anthropic-compatible API spec compliance by @oobabooga in #6010
- Studio: follow-up fix for GGUF developer prompts by @wasimysaid in #6115
- Studio: stop the providers dialog from resetting custom provider form state by @NilayYadav in #6051
- Studio: clean-room compact RAG (knowledge bases, hybrid search, fast indexing) by @danielhanchen in #5910
- feat: support text-only loading of Gemma 3 27B via FastLanguageModel (skip SiglipVisionModel) by @mvanhorn in #5816
- studio(ui): use the --primary brand token for the avatar fallback color by @danielhanchen in #5987
- fix(studio): block arbitrary external image URLs in markdown renderer by @rsd-darshan in #5602
- Studio: center account avatar vertically in sidebar footer pill by @shimmyshimmer in #6026
- fix: clearer Studio setup error when GPU driver is too old for the installed CUDA toolkit by @mvanhorn in #5993
- Studio: npm v12 readiness (allowScripts policy, npmrc cleanup, bun bootstrap fix) by @danielhanchen in #6128
- Studio: faithful conversation export and import round trips (ShareGPT system role, CSV quoted newlines) by @danielhanchen in #6131
- Studio: auto-sync allowScripts pins after dependency bumps by @danielhanchen in #6136
- Studio: unify shadows, backgrounds and dark mode consistency in chat UI by @shimmyshimmer in #6116
- Windows/WSL installer: fix winget msstore cert failure, amd-smi DiskPart prompt, and enable AMD GPU (Strix Halo gfx1151) by @danielhanchen in #5940
- Studio: bulk export and import in Settings Chat Data, MCP pill off switch by @danielhanchen in #6141
- Studio: fix nested dropdown submenus clipped by the menu alignment nudge by @danielhanchen in #6143
- Studio: declutter the chat plus menu and make RAG session-only with pre-select by @danielhanchen in #6140
- fix/validate dataset video paths before training by @LeoBorcherding in #5136
- Restore config use_cache in for_inference after gradient checkpointing prep by @danielhanchen in #6137
- fix: persist Windows ROCm BNB version by @Peter7896 in #6048
- Studio GGUF CI: fix deterministic JSON-mode failure by bumping the quant, keep hard asserts by @danielhanchen in #6138
- Frontend CI: hard-fail unreviewed npm install scripts (--strict-allow-scripts) by @danielhanchen in #6139
- Studio UI polish: search dialog shadow, model picker pills, sidebar spacing, white Hub background by @shimmyshimmer in #6147
- Run the HF cache redirect before import fixes that can freeze Hub constants by @danielhanchen in #6150
- Fix Studio MLX VLM resized image layout by @Lyxot in #6019
- fix(studio): reject unsupported MLX CPT and embedding training by @Lyxot in #6091
- fix(studio): forward custom dataset mappings in MLX training by @Lyxot in #6094
- Tests + CI guard: batched left-padded generation can never silently regress again (#1066, #3699) by @danielhanchen in #6145
- Tests: follow Compare chat into the More submenu in the Playwright chat UI driver by @danielhanchen in #6153
- Fix UnboundLocalError in ROCm version detection dpkg/rpm fallback by @danielhanchen in #6149
- Auto-set BNB_ROCM_VERSION from the installed wheel on Windows + ROCm by @danielhanchen in #5986
- Call patch_compiling_bitsandbytes on the FastLanguageModel path by @danielhanchen in #6144
- Studio: mascot images degrade gracefully instead of showing alt text by @danielhanchen in #6146
- Studio: training survives a non-writable HF datasets cache by @danielhanchen in #6148
- Studio: fix Gemma-4-12B-it not loading by @Imagineer99 in https://github.com/uns...
Gemma 4 12B, New UI, MCP, Projects
Hey everyone, this update focuses mainly on MCP, Projects, Canvas and the new chat UI.
We've also made many improvements across Studio. Next week we'll have an even bigger update.
If Gemma 4 12B isn't working for you, please re-update Unsloth!
To update Unsloth or install a new Unsloth Studio, you must use:
macOS, Linux, WSL:
curl -fsSL https://unsloth.ai/install.sh | sh
Windows:
irm https://unsloth.ai/install.ps1 | iex
Warning
DO NOT USE unsloth studio update since packaging will not get the latest updates
Gemma 4 12B
Google releases Gemma 4 12B, a new model that runs locally on 8GB RAM. GGUF / Guide
Gemma 4 12B Unified supports image, audio and 256K context. Run and train the model via Unsloth Studio.
MCP
- Give your model live tools instead of relying on memory one click in the composer, no API keys for the built-ins
- Built-in presets:
- Context7 current docs & code for thousands of libraries
- Exa live web search
- Hugging Face search models, datasets & papers
- Add your own remote (OAuth/headers) or local (stdio) servers, toggled per chat
New Chat UI
- Projects, Canvas,
MCP, and Compare tuck into one+menu - Search and Code are now one click away
Projects
- Keep related chats together in one workspace
- Create a project from the sidebar, then add new or existing chats to it
Experimental Canvas / Artifacts
- Opens generated HTML in a dedicated canvas panel inside Unsloth Studio
- Supports interactive outputs, including browser based visualizations and CDN-loaded packages
- Lets you switch between rendered preview and source code
Install, Runtime & Hardware
- CUDA / Windows
- CUDA 13.3
llama.cppbinaries now work on Windows (and other non-Linux) and fix the CUDA 13.2 gibberish-output bug while default still pins to CUDA 13.1 for now - On CUDA 13.2, 13.1 and below, Windows falls back to CUDA 12.4 and native 13.1 binaries coming soon
- Windows prebuilt installs no longer block on the early
CUDA Toolkitcheck
- CUDA 13.3
- Linux / GPU
- Linux
llama.cppprebuilts now match your runtime'scudartmajor version - Prebuilt coverage for
Blackwell(with a CUDA 13.0 driver fallback) andB300(sm_103) ARM64Linux now source-builds on GPU hosts, with a CPU prebuilt fallbackROCm: detected AMDgfxarch is forwarded to the prebuilt installer (with asetup.shfallback)
- Linux
- macOS
- Fixed Apple Silicon installs that were resolving
torchagainst x86_64
- Fixed Apple Silicon installs that were resolving
Other Studio improvements
- Connected models now work in Compare mode
- Smoother streaming which now renders batched to one per animation frame
- Larger upload limits for training datasets and recipe
- Window size and maximized state persist across launches
- Chat search hides non-matching threads
- Model loading handles mid-refresh cancellation cleanly
- Cleaner rendering for generated image frames and Python tool code blocks
What's Changed
- Bump install.sh / install.ps1 pin to unsloth>=2026.5.9 by @danielhanchen in #5904
- Update README.md by @danielhanchen in #5905
- Use remove-circle icon for eject model button by @shimmyshimmer in #5906
- Studio: add HTML artifacts to chat by @wasimysaid in #5772
- Use natural generated image frame by @wasimysaid in #5794
- Studio: defer the Windows CUDA Toolkit check so prebuilt users are not blocked by @danielhanchen in #5912
- Studio CI: tolerate transient artifact-upload flakes on diagnostic log steps by @danielhanchen in #5913
- Studio: match the Linux llama.cpp prebuilt to the runtime cudart major by @danielhanchen in #5914
- Studio: extend the pinned Blackwell GPU fallback to CUDA 13.0 drivers by @danielhanchen in #5920
- Studio: source-build arm64 Linux GPU hosts, with a CPU prebuilt fallback by @danielhanchen in #5924
- Studio CI: run the Resolve-CudaToolkit unit test in the Windows GGUF job by @danielhanchen in #5921
- Studio: forward the resolved AMD gfx arch to the prebuilt installer by @danielhanchen in #5923
- Studio CI: tolerate transient runner crashes in the llama-server log collection step by @danielhanchen in #5925
- Studio: forward --has-rocm from setup.sh when gfx resolution fails by @LeoBorcherding in #5927
- Studio: cover B300 (sm_103) with the Linux prebuilt bundles by @danielhanchen in #5930
- Studio: move MCP to a composer button with presets by @danielhanchen in #5926
- Studio: clearer MCP server validation when stdio is disabled by @oobabooga in #5928
- Bump install.sh / install.ps1 pin to unsloth>=2026.5.10 by @danielhanchen in #5931
- Studio: support connected models in compare mode by @Imagineer99 in #5824
- Studio: manage chat history with projects by @Imagineer99 in #5725
- fix(chat): refine artifact panel styling by @wasimysaid in #5922
- studio/frontend: pad Python tool code block to fix corner clipping by @oobabooga in #5938
- Studio: polish model load toast styling by @Imagineer99 in #5648
- Studio: optimize chat streaming by batching renders to one per animation frame by @oobabooga in #5788
- Raise Studio upload limits by @wasimysaid in #5808
- Studio: persist Tauri window size and maximized state across launches by @oobabooga in #5799
- Guard model-load success path against mid-refresh cancellation by @rolandtannous in #5944
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #5935
- feature/dataset_none_detect.py adds empty content detection for conversation datasets by @LeoBorcherding in #4438
- studio/chat: hide non-matching threads in chat search (#5572) by @wtfashwin in #5651
- Update vulnerable dependencies to patched versions by @danielhanchen in #5970
- fix(studio): make the reset-password hint work on Windows, macOS, and Linux by @danielhanchen in #5971
- Document install env vars in README advanced launch options by @danielhanchen in #5972
- Update Install Scripts by @danielhanchen in #5968
- Logging cleanup by @danielhanchen in #5973
- studio: redesign chat composer by @shimmyshimmer in #5891
- fix(studio): don't double-quote the reset-password hint for paths with spaces by @danielhanchen in #5975
- Patch other imports of trainer and config by @Datta0 in #5946
- Fix UnicodeEncodeError when printing emoji on legacy Windows consoles by @danielhanchen in #5948
- Fix macOS Apple Silicon installs resolving torch against x86_64 by @danielhanchen in #5976
Full Changelog: v0.1.43-beta...v0.1.44-beta
CUDA 13.3, Windows, Mac update
To update Unsloth or install a new Unsloth Studio, you must use:
macOS, Linux, WSL:
curl -fsSL https://unsloth.ai/install.sh | sh
Windows:
irm https://unsloth.ai/install.ps1 | iex
Warning
DO NOT USE unsloth studio update since packaging will not get the latest updates
Mac Updates
- Re-enabled
llama.cppprebuilt binaries for Apple Silicon (M1-M4) - Mac OS 14 / 15 / 26 (Tahoe) - Apple Silicon Mac OS 13 (Ventura) is source build
- Intel (x86_64) for Mac OS 13.3 / 14 / 15 / 26 (Tahoe) uses
llama.cppprebuilt binaries - Intel for Max 13.0 - 13.2 is source build
Windows Updates
- CUDA 13.3
llama.cppprebuilt binaries now work for Windows - For CUDA 13.2, CUDA 13.1 and below, Windows devices uses CUDA 12.4 fallback - we'll work on CUDA 13.1 binaries soon.
CUDA 13.3 Update
- CUDA 13.3 non Linux binaries work. We'll still use CUDA 13.1 for now
- CUDA 13.3 solves the CUDA 13.2 gibberish problem - see #4849
Blackwell GPUs Update
- For now Blackwell will have delayed releases of
llama.cppprebuilt binaries sine CUDA 12.4 does not work - we are working to resolve this soon.
What's Changed
- Bump install.sh / install.ps1 pin to unsloth>=2026.5.8 by @danielhanchen in #5791
- Studio: expose --parallel / -np flag on
unsloth studio runby @danielhanchen in #5737 - tests: unblock three stale assertions broken on main (MLX CI + Backend CI) by @danielhanchen in #5803
- ci: install unsloth_zoo from git main in notebooks-ci + studio-backend-ci by @danielhanchen in #5802
- tool mask support by @Datta0 in #5682
- Studio: add frontend i18n support by @penumbrazz in #5765
- Studio: unblock install on Linux ARM64 + Windows ARM64 + Intel Mac by @danielhanchen in #5790
- Harden Linux llama.cpp prebuilt reuse by @mmathew23 in #5796
- Studio: expose image size setting in training UI by @Dariton4000 in #5743
- Studio: add configurable CPU thread pool limit by @alkinun in #5760
- Trim README Advanced launch options blurb by @danielhanchen in #5809
- [RL] make sync weights conditional by @Datta0 in #4925
- Studio: add Gemini provider with web_search, code_execution, prompt caching, and Nano Banana image generation by @danielhanchen in #5720
- Studio: add remote MCP server support by @NilayYadav in #5750
- Clear MRoPE after generation for GRPO by @Datta0 in #5683
- Detect CUDA UMD Version from newer nvidia-smi output (fixes #5812) by @danielhanchen in #5817
- Studio: remove dark mode upload circle by @Imagineer99 in #5813
- fix: honor --ctx-size and other forwarded args from
unsloth studio runin Studio's context-fit logic by @mvanhorn in #5815 - Fix non-streaming GGUF chat completion usage by @alkinun in #5781
- Fix/studio colab proxy and iframe - Unsloth Studio not loading in Colab (iframe "refused to connect" and wrong URL) by @LeoBorcherding in #5844
- Studio: keep web search/code pills off on model load if user disabled them by @rolandtannous in #5851
- studio/setup.sh: cope with fresh CUDA toolkits like 13.3 by @danielhanchen in #5826
- fix/strix halo and windows AMD ROCm support by @LeoBorcherding in #5301
- Version downpairing for Radeon+ROCm PyTorch wheels by @jaeiclee in #5353
- feature/use lemonade-sdk llamacpp-rocm binaries by @LeoBorcherding in #5303
- studio: ROCm cleanups follow-up to #5301 by @danielhanchen in #5874
- fix: wrong code block language for nightly Windows install by @mervivian in #5877
- fix: bump openssl, rand, rustls-webpki, tar in Studio src-tauri for security advisories by @orbisai0security in #5866
- Fix error when context-resizing by @afsuyadi in #5860
- Update Studio sidebar logout icon to logout-05 by @shimmyshimmer in #5878
- studio: pick a macOS llama.cpp prebuilt that loads on the host OS by @danielhanchen in #5883
- studio: make NVIDIA prebuilt selection track CUDA version bumps (Windows + Linux) by @danielhanchen in #5879
- Remove dead top-level bitsandbytes import from models/_utils.py by @Rahul007007 in #5875
- ci(security-audit): make package installs network-resilient by @danielhanchen in #5853
- Studio: add stdio MCP server support by @oobabooga in #5863
- studio/frontend: fix MCP dialog overflow on long URLs by @oobabooga in #5864
- Studio: clearer error for diffusion GGUFs loaded as chat models by @danielhanchen in #5857
- Studio: harden stdio MCP gating and fix transport edge cases by @danielhanchen in #5892
- Studio: fix phantom scroll area below messages with many sources by @danielhanchen in #5822
- Studio: pin the last pre-macOS-26 llama.cpp prebuilt instead of walking back by @danielhanchen in #5896
- fix(install): detect x86_64 Python venv on Apple Silicon and rebuild as arm64 by @ramankrishna in #5187
New Contributors
- @penumbrazz made their first contribution in #5765
- @Dariton4000 made their first contribution in #5743
- @NilayYadav made their first contribution in #5750
- @mvanhorn made their first contribution in #5815
- @jaeiclee made their first contribution in #5353
- @mervivian made their first contribution in #5877
- @orbisai0security made their first contribution in #5866
- @afsuyadi made their first contribution in #5860
- @Rahul007007 made their first contribution in #5875
- @ramankrishna made their first contribution in #5187
Full Changelog: v0.1.42-beta...v0.1.43-beta
An Update before Revamp!
Hey guys, we're doing one more-ish update before a major revamp which is likely coming this week or next week. Our revamp will change a lot of things, especially with new major features and a lot of design changes.
- NEW: API calling support now with image generation + editing, proper web search, code execution, auto prompt caching. Connect OpenAI, Anthropic and more.
unsloth studio updateshould now work properly. For Mac users, use the install curl command instead:curl -fsSL https://unsloth.ai/install.sh | sh- Proper support for non-English languages e.g. Japanese, Chinese, Indian etc.
apicalling.unsloth.mp4
Many of you may have missed our previous release which only lasted for one day. We introduced:
- Connect to external inference backends: vLLM, Ollama, llama-server
- Security improvements
- Auto MTP speculative decoding for MTP GGUFs; get the best settings customized for your hardware.
API provider calling & external connections
- You can now connect Unsloth to any API cloud provider (OpenAI, Anthropic, OpenRouter etc.)
- Built-in web search for OpenAI, Anthropic, OpenRouter and Kimi
- Built-in code execution for OpenAI and Anthropic (Anthropic containers persist and are reused across turns)
- Prompt caching is enabled for OpenAI and Anthropic models saving 50 to 90% of costs.
- Image generation + editing
- API key now optional for local providers (llama.cpp / vLLM / Ollama)
- Auto-load models when adding a cloud provider
Other Unsloth Studio updates
- OpenDocument chat attachments
- o3 reasoning summary payload
- Sending/prompting non-English languages (e.g. Japanese, Chinese) now works properly
- IME composer hardening, RTL
dir="auto", long log-line truncation fix - Tool reasoning trace rendering in UI
- Fully offline support: cached GGUF discovery and offline DNS auto-detect for both inference and training
Unsloth Studio security improvements
- Authentication rate-limiting, proxy-aware so reverse proxies don't bypass it
- Sandboxed worker with a tightened blocklist (bash,
hf upload,NOFILE) - Path containment so workers can't escape their in-flight tmp dirs
- Strict schema validation across the Studio API
- Tightened CSP / security headers (only legitimate favicon hosts allowed)
- Removed the
torch.loadfallback ontraining_args.binso untrusted pickles can never execute on model load - Hardened Tauri desktop release flow
- Frontend auth: singleflight token refresh, current-password input on changes, working logout, shared 422 helper
- Cancel cleanup now scoped strictly to in-flight tmp dirs so it can never delete user state
What's Changed
- install: bump unsloth floor to >=2026.5.5 by @danielhanchen in #5621
- Studio: persist chat toggles and preserve custom sampling by @Imagineer99 in #5587
- Studio: add connections toggle and order hosted providers by @Imagineer99 in #5588
- studio/ci: harden three pre-existing CI flakes by @danielhanchen in #5627
- Studio: expand Connections model picker for local inference server by @Imagineer99 in #5643
- Move uninstall scripts into scripts/ and fix references by @danielhanchen in #5644
- Studio: provider model loading controls by @Imagineer99 in #5645
- Studio: unify connection copy by @Imagineer99 in #5654
- Remove dead chat suggestions wiring by @alkinun in #5665
- Studio: Claude Code Anthropic API tool compatibility by @Imagineer99 in #5390
- studio/frontend: friendlier 404 fallback for unknown routes by @danielhanchen in #5664
- chore(deps): bump the npm-oxc-validator group across 1 directory with 2 updates by @dependabot[bot] in #5667
- studio/chat: Disable send button for empty composer by @harryfrzz in #5647
- studio/frontend: correct Think pill aria-label before model loads by @danielhanchen in #5655
- studio/frontend: fix onboarding CSP violations by @danielhanchen in #5658
- studio/frontend: set per-route document.title by @danielhanchen in #5660
- Fix Windows Tauri build and signing by @wasimysaid in #5694
- Respect GC for GRPO by @Datta0 in #5269
- studio: unblock /load event loop on detect_audio_type (#5642, #5635) by @danielhanchen in #5669
- studio: settle GPU VRAM after killing llama-server before the next reload by @danielhanchen in #5693
- Studio: surface prompt-cache token counts in /v1/chat/completions usage chunk by @danielhanchen in #5670
- Studio: per-model Anthropic server-side tool versions by @danielhanchen in #5679
- Studio: support Anthropic 1h cache TTL via prompt_cache_ttl by @danielhanchen in #5685
- Studio: wire OpenAI image_generation tool by @danielhanchen in #5688
- Studio: per-session cost calculator + /api/providers/pricing endpoint by @danielhanchen in #5690
- Studio: wire Anthropic web_fetch server-side tool by @danielhanchen in #5671
- Studio: persist chat history in backend storage by @Imagineer99 in #5272
- Studio: wire Anthropic server-side context compaction by @danielhanchen in #5686
- Studio: wire OpenAI Responses server-side context compaction by @danielhanchen in #5687
- Studio: PDF / document attachments for Anthropic + OpenAI by @danielhanchen in #5689
- Studio: reconcile external providers across browsers after delete by @danielhanchen in #5698
- Studio: persist external provider selection across page refresh by @danielhanchen in #5697
- Studio: add Anthropic and OpenAI prompt guards for disabled tools by @Imagineer99 in #5674
- Studio: persist external checkpoint when picker uses setParams by @danielhanchen in #5700
- Studio: surface OpenAI image_generation as composer Images pill by @danielhanchen in #5699
- Studio: expose Anthropic 5m vs 1h prompt cache TTL in Configuration by @danielhanchen in #5703
- Fix connected chat model selection after refresh by @wasimysaid in #5702
- Studio: render OpenAI image_generation results inline in chat by @danielhanchen in #5705
- Truncate long code execution tool output by @wasimysaid in #5708
- fix(gpt-oss): prefer flex attention over sdpa by @Datta0 in #5701
- Bump install.sh / install.ps1 pin to unsloth>=2026.5.6 by @danielhanchen in #5716
- ci: unblock Studio Windows + Linux + Mac smoke (supersedes #5733, #5734, #5738) by @danielhanchen in #5741
- ci: broaden Linux + narrow Windows llama.cpp runtime patterns + trim #5741 comments by @danielhanchen in #5746
- Lower default RL weight_decay from 0.01 to 0.001 for LoRA by @danielhanchen in #5747
- Studio: strip orphan tool_call XML leaking into visible content by @danielhanchen in #5735
- Bump install.sh / install.ps1 pin to unsloth>=2026.5.7 by @danielhanchen in #5753
- Fix MLX Studio base model export save method by @Lyxot in #5727
- fix(chat_templates): check find() return value before slicing on placeholders by @Ricardo-M-L in #5763
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #5773
- Studio: stop seeded admin to cross-origin callers by @danielhanchen in #5739
- Studio: surface external-provider cache hits and writes in context bar by @danielhanchen in #5736
- Studio: Anthropic fast_mode toggle and streaming refusal handling by @danielhanchen in #5715
- Studio: rewrite OpenAI Responses citation markers to markdown links by @danielhanchen in #5713
- Studio: standalone Fetch pill for Anthropic web_fetch by @danielhanchen in https://github.com/unslothai/unslo...
MTP + Studio fixes
Lots of bug fixes, UI, UX fixes to Studio!
To get the latest updates do:
macOS, Linux, WSL:
curl -fsSL https://unsloth.ai/install.sh | shWindows:
irm https://unsloth.ai/install.ps1 | iexFixes
- Fix
unsloth studio updatenot working well - Fix getting stuck on
reset-passwordpage - More offline mode support
- Improve MTP not being faster on Macs, CPUs and GPUs - now it's much better!
- Fix Desktop Shortcut not working after update
- Many many UI UX bug fixes
What's Changed in Unsloth
- install scripts: bump unsloth pin to >=2026.5.3 by @danielhanchen in #5557
- studio: engage draft-mtp on vision MTP GGUFs (drop incorrect vision gate) by @danielhanchen in #5560
- install scripts: bump unsloth pin to >=2026.5.4 by @danielhanchen in #5566
- Studio: derive Playwright default model expectation by @Imagineer99 in #5589
- studio: read Playwright default model from defaults.py without importing it by @danielhanchen in #5595
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #5586
- studio: fix toast close-button click and light-mode hover by @shimmyshimmer in #5597
- fix(loader): honour HF_HUB_OFFLINE/TRANSFORMERS_OFFLINE env vars by @xodn348 in #5598
- studio: emit one comma-chained --spec-type for CPU/Mac MTP path by @danielhanchen in #5575
- Fix loss function not patched for Qwen3.5 models by @rycerzes in #5442
- Fix GGUF multi-image chat handling by @alkinun in #5508
- Expose standalone tool healing utilities by @alkinun in #5583
- studio/frontend: reconcile stale must_change_password localStorage flag by @danielhanchen in #5576
- studio/frontend: cap auto-load cascade attempts by @danielhanchen in #5578
- studio: regenerate desktop launcher on
unsloth studio update(macOS + Linux + Windows) by @danielhanchen in #5577 - ci: advisory lockfile supply-chain audit (no install-script changes) by @danielhanchen in #5604
- loader: import FORCE_FLOAT32 from unsloth_zoo (single source of truth) by @danielhanchen in #5610
- fix(peft): expose finetune_last_n_layers for parity with mlx-lm CLI by @danielhanchen in #5564
- studio: add --spec-draft-n-max toggle for MTP speculative decoding by @danielhanchen in #5582
- studio: reserve VRAM headroom for the MTP draft cache in auto-fit by @danielhanchen in #5585
- Studio: tools, thinking blocks, code execution and web search for safetensors by @danielhanchen in #5520
- studio/frontend: widen settings sidebar so 'Connections' label fits by @danielhanchen in #5607
- studio: respect prefers-reduced-motion across animations by @danielhanchen in #5611
- studio/frontend: guard message-timing badge against unphysical tok/s by @danielhanchen in #5570
- studio/web: differentiate offline from backend-down in fetch error by @danielhanchen in #5591
- studio/frontend: keep theme classes mutually exclusive on by @danielhanchen in #5580
- studio/frontend: show Loading fallback instead of blank pane on lazy route navigation by @danielhanchen in #5568
- studio/frontend: include filename in attachment aria-label + img alt by @danielhanchen in #5594
- studio/frontend: compare composer blocks send when no model picked by @danielhanchen in #5574
- studio: restore focus to opener when settings dialog closes by @danielhanchen in #5612
- studio/frontend: add aria-label to Dictate / Stop dictation buttons by @danielhanchen in #5599
- studio/frontend: settings dialog fits viewport at tablet widths by @danielhanchen in #5600
- studio/frontend: show Generation stopped placeholder when cancelled mid-thinking by @danielhanchen in #5565
- studio: tool calling for Llama-3, Mistral, Gemma 4 on safetensors + MLX by @danielhanchen in #5615
- Revert "studio: tool calling for Llama-3, Mistral, Gemma 4 on safetensors + MLX (#5615)" by @danielhanchen in #5619
What's Changed in Unsloth Zoo:
- Honour offline env vars when enabling hf_transfer by @danielhanchen in unslothai/unsloth-zoo#675
- Fix Studio q2_k_l GGUF export and new llama.cpp converter package layout by @danielhanchen in unslothai/unsloth-zoo#667
- fix(mlx): match mlx-lm batch padding rule (1 + pad_to * ceil) by @danielhanchen in unslothai/unsloth-zoo#672
- fix(mlx): expose finetune_last_n_layers for parity with mlx-lm CLI by @danielhanchen in unslothai/unsloth-zoo#669
- fix(mlx): honor max_grad_value=None as a disable signal by @danielhanchen in unslothai/unsloth-zoo#671
- fix(mlx): seed mx.random immediately before linear_to_lora_layers (re-PR of #674) by @danielhanchen in unslothai/unsloth-zoo#678
- fix(mlx): warn on bf16 -> fp16 downcast in FastMLXModel loader by @danielhanchen in unslothai/unsloth-zoo#670
- fix(mlx): make_baseline_loss_fn byte-identical to mlx-lm default_loss when labels=None by @danielhanchen in unslothai/unsloth-zoo#673
- Fix Q2_K_L recipe: q2_k base + ffn_down=Q3_K, output=Q6_K, embed=Q4_K by @danielhanchen in unslothai/unsloth-zoo#677
Full Changelog: v0.1.40-beta...v0.1.41-beta
Qwen3.6 MTP and API / Connections
We've got lots of new updates. Please use the latest Unsloth v0.1.405-beta, not v0.1.40-beta which is older.
- ~2x faster GGUF inference with automatically enabled MTP
- API calling support for OpenAI, Anthropic etc. with auto prompt caching, web search, code execution
- Connect to external inference backends: vLLM, Ollama, llama-server
- Experimental MLX inference
- Proper support for non-English languages
- Security improvements
MTP speculative decoding support 1.4 to 2x faster inference!
- Auto MTP speculative decoding for MTP GGUFs; warn when the bundled llama.cpp prebuilt is stale or too old for MTP
- New pre-built llama.cpp binaries for MTP support!
API provider calling & external connections
- You can now connect Unsloth to any API cloud provider (OpenAI, Anthropic, OpenRouter etc.)
- Built-in web search for OpenAI, Anthropic, OpenRouter and Kimi
- Built-in code execution for OpenAI and Anthropic (Anthropic containers persist and are reused across turns)
- Prompt caching is enabled for OpenAI and Anthropic models saving 50 to 90% of costs.
- API key now optional for local providers (llama.cpp / vLLM / Ollama)
- Auto-load models when adding a cloud provider
MLX inference (Experimental)
- MLX quants and models now can run locally on your Mac machines!
- We'll be adding thinking, tools and web search soon!
Other Unsloth Studio updates
- OpenDocument chat attachments
- o3 reasoning summary payload
- Sending/prompting non-English languages (e.g. Japanese, Chinese) now works properly
- IME composer hardening, RTL
dir="auto", long log-line truncation fix - Tool reasoning trace rendering in UI
- Fully offline support: cached GGUF discovery and offline DNS auto-detect for both inference and training
- Lots of UI/UX polish: dark theme refactor, right sidebar redesign, time-of-day sloth mascot, dismissable copyable toasts, larger chat composer, code-execution config polish, composer action pill styling, narrower Discord button
Training updates
- Gemma attention mask fixes
- Multi Image GRPO
- GRPO hidden-state return experiments
- New Continued Pretraining (CPT) training method as a first-class option
- Gemma-4 MoE LoRA extractor registered to fix
grouped_mmcontraction crash - Opt-in fused
lm_head+ cross-entropy forward, with single-matmul path underUNSLOTH_RETURN_LOGITS=1 - Pass batch size for eval
- Eval/training paths now honour
HF_DATASETS_OFFLINEalongsideHF_HUB_OFFLINE
Unsloth Studio security improvements
- Authentication rate-limiting, proxy-aware so reverse proxies don't bypass it
- Sandboxed worker with a tightened blocklist (bash,
hf upload,NOFILE) - Path containment so workers can't escape their in-flight tmp dirs
- Strict schema validation across the Studio API
- Tightened CSP / security headers (only legitimate favicon hosts allowed)
- Removed the
torch.loadfallback ontraining_args.binso untrusted pickles can never execute on model load - Hardened Tauri desktop release flow
- Frontend auth: singleflight token refresh, current-password input on changes, working logout, shared 422 helper
- Cancel cleanup now scoped strictly to in-flight tmp dirs so it can never delete user state
Bug fixes and correctness
- Layout-aware MoE LoRA merge with loud-fail on fallback (no more silent wrong saves)
num_logits_to_keepregression fixed on transformers >= 4.52- Preserve tokenizer EOS token on merged saves
- Resume PEFT checkpoints under sentence-transformers >= 5.4
- Restore Flash > SDPA > Flex attention priority for non-Gemma3 models
- ORPO text-only tokenization now works with processors
- Embedding matrix size mismatch fix
- Vicuna chat template fix
fast_generateunifies legacy and new logits kwargs (fixes Mistral merge site)higher_precision_softmaxmade idempotent- Patch every
LOSS_MAPPINGkey aliased toForCausalLMLoss(covers transformers 5.x) - GGUF converter sibling imports fixed
- UTF-8 encoding added to all text-mode file operations
- Serialise GGUF reload and inherit
unsloth-runextra args - Fix
/recommended-folders500 on unreadable model directories under Python 3.12+ - Cross-family GGUF projector blocked in flat local dirs (no more wrong-vision-tower loads)
Installer and platform reliability
- Custom install paths via
STUDIO_HOME/UNSLOTH_STUDIO_HOME - CPU-only Linux x86_64 routed to
ggml-org/llama.cppprebuilts - Windows CUDA install fixes: paired
cudartbundle and Torch NVIDIA DLL paths added toPATH - Skip
flash-attninstall on Blackwell GPUs (sm_100+) - Refresh Intel XPU extras for torch 2.7.1 / 2.9.1 / 2.10 / 2.11.0 / 2.12.0; torch upper cap raised to <2.13.0
- HIP source builds on Ubuntu 24.04 now inject
--gcc-install-dir - Linux prebuilt fixes for branch-based llama.cpp releases (mangled symlink repair, top-level dir strip)
- New uninstallers for Linux, macOS (
uninstall.sh) and Windows (uninstall.ps1) - Mac desktop shortcut spawning and lifecycle fixed
unsloth --versionflag- Studio web update banner and release version display
- GPU pinned at 95% headroom, with a warning on silent CPU fallback
- Auto-install flash-linear-attention and tilelang for Qwen3.5 family
What's Changed in Unsloth
- Bump installer floor to 2026.5.2 by @danielhanchen in #5297
- install: support STUDIO_HOME / UNSLOTH_STUDIO_HOME for custom install paths by @danielhanchen in #5190
- Route CPU-only Linux x86_64 to ggml-org/llama.cpp prebuilts by @danielhanchen in #5302
- feat(studio): MLX training tab on Apple Silicon (LoRA / full FT, VLM, export) by @Manan17 in #5265
- feat(studio): add Continued Pretraining (CPT) as a training method by @OnePunchMonk in #4677
- Fix 14 stale tests under tests/studio/install/ that drifted from code by @danielhanchen in #5305
- Add Studio PR-time CI: pin enforcement, frontend, backend, wheel smoke by @danielhanchen in #5298
- Studio: restore Studio API and Help menu UI by @Imagineer99 in #5310
- [studio]: Fix tool reasoning trace in UI by @CodeMan62 in #5314
- fix: 3 patch_* helpers — fast_lora import, sft_trainer Union, openenv OSError by @danielhanchen in #5319
- Studio: API settings overflow with long Colab URLs by @Imagineer99 in #5286
- tests/studio/install: parallel UNSLOTH_STUDIO_HOME smoke test by @danielhanchen in #5306
- Studio: Dark theme refactor, right sidebar redesign, and chat UI polish by @Imagineer99 in #5150
- fix: harden Studio IME composer sends by @Etherll in #5327
- Studio: stop truncating long log lines as suspected base64 by @rolandtannous in #5335
- fix(gh_client): fail fast on 401/403 auth errors instead of retrying forever (#5325) by @Anai-Guo in #5329
- fix: unblock 4 tests deselected/skipped in #5312 (real bugs) by @danielhanchen in #5359
- fix(tests/sh): accept pinned tokenizers line after #5359 by @danielhanchen in #5361
- CI: scope GITHUB_TOKEN permissions, add MLX CI, unblock ~60 skipped tests by @danielhanchen in #5312
- studio/tests: make Playwright model-selector probe best-effort by @danielhanchen in #5371
- Studio: download paired cudart bundle on Windows CUDA installs by @danielhanchen in #5322
- Studio: add torch's pip nvidia DLL dirs to PATH on Windows by @danielhanchen in #5324
- studio: authenticate HF downloads across Studio CI workflows by @danielhanchen in #5370
- dependabot: group security updates and cover /studio/frontend npm advisories by @danielhanchen in #5372
- Add Studio web update banner and release version display by @wasimysaid in #5308
- ci/install: retry transient github.com 5xx on unsloth-zoo git fetches by @danielhanchen in #5389
- studio/ci: pre-install lockfile supply-chain audit (npm + cargo) by @danielhanchen in #5392
- studio/ci: npm tarball content scanner (no-install, hostile-input safe) by @danielhanchen in #5393
- studio/tests: AbortSignal-bound in-page fetches and wall-clock watchdog for Playwright probes by @danielhanchen in #5391
- chore: remove unused .semgrep/unsloth-rules.yml by @danielhanchen in #5395
- studio/ci: sweep actions/cache v5 hardening across sibling smoke workflows by @danielhanchen in #5399
- studio/ci: harden HF_HOME cache against actions/cache v5 silent restore failures by @danielhanchen in https:...
New Unsloth API Inference Endpoint
v0.1.39-beta bug fix
May 5th 2026 Fixes chat history not being shown (existing chat history is not lost) and attachments not attaching correctly. The bug was render-only - use 2026.5.2 or directly call curl -fsSL https://unsloth.ai/install.sh | sh or unsloth studio update to update
Run local LLMs with tools like Claude Code and Codex by connecting them to Unsloth’s API endpoint. This lets you run models like Qwen and Gemma locally, with additional features such as self-healing tool calling, code execution, and web search. Unsloth makes it easy to deploy a fast API inference endpoint that provides:
- Self-healing tool calling, which helps reduce broken or malformed tool calls by 50%
- Code execution support, allowing Bash and Python execution for more accurate code outputs.
- Advanced Web search that visits and actually reads webpages to gather in-depth info.
- Automatic inference settings for GGUF models (temp, top-k etc.)
Models loaded in Unsloth (including GGUFs) are exposed as an authenticated API via llama-server. A long API key is generated for security reasons like how OpenAI provides one. Your local models can then be used directly in your preferred AI agent, SDK, or chat client. Unsloth speaks two dialects on the same port:
- Anthropic-compatible
/v1/messagesfor Claude Code, OpenClaw, the Anthropic SDK, and any client that expects the Messages API. - OpenAI-compatible
/v1/chat/completionsand/v1/responsesfor the OpenAI SDK, OpenCode, Cursor, Continue, Cline, Open WebUI, SillyTavern, and any OpenAI-compatible tool. - Both support streaming, tool calling (OpenAI tools / Anthropic tools), and vision inputs.
New models
We've also got a handful of new models to run including NVIDIA Nemotron 3 Nano Omni, IBM Granite 4.1 and Mistral 3.5 Medium. We helped Mistral solve some issues with implementation in transformers and GGUFs.
Unsloth Updates
- Stopped Studio training runs can now resume from checkpoints.
- Chat threads now autosave and persist more reliably.
- DPO training hangs in multi-process setups were fixed.
- VLM GRPO support improved with MROPE updates.
- Studio’s stop button now properly stops generation.
- Fix chat template disappearing after browser refresh
What's Changed in Unsloth
- Studio: use (gguf) context length before max seq length by @G07cha in #5111
- chore: fix typo cleanup across tests and backend strings by @luojiyin1987 in #5152
- fix: guard resolve_model_class fallback against unresolvable transformers AutoModel entries by @Etherll in #5155
- Studio: kill in-flight llama-server before spawning a new one by @danielhanchen in #5171
- Studio: stop currency escape from breaking inline LaTeX by @danielhanchen in #5170
- Studio: probe AMD GPUs in llama-server VRAM detection by @danielhanchen in #5172
- Studio: make stop button actually stop generation by @danielhanchen in #5069
- Studio: add github_repo seed reader and GitHub Support Bot recipe by @danielhanchen in #5169
- fix(studio): use endswith for mmproj F16 variant selection by @LeoBorcherding in #5184
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #5204
- Fix Windows install when paths contain spaces or Python 3.14 is on PATH by @Etherll in #5201
- Studio: Preserve transparency in uploaded profile avatars by @Imagineer99 in #5200
- UX: single chat header error placement and selector alignment by @Imagineer99 in #5173
- Studio: Refine chat preset and group built-in presets by @Imagineer99 in #5159
- Studio: Fix image-only chat requests failing validation by @Imagineer99 in #5212
- Studio: fix 7 failing studio_unit_tests on main by @danielhanchen in #5216
- Patch checkpoint reload init functions to strip unsupported args by @Datta0 in #5167
- Studio: Fix clipped model selector text descenders by @Imagineer99 in #5210
- Fix DPO trainer multi process hang by @Datta0 in #5199
- Studio: Pin assistant-ui core for fresh installs by @Imagineer99 in #5229
- Fix local model scanner to handle ollama cloud models by @Anish9901 in #5220
- Fix Studio desktop tray installer and titlebar and bux fixes by @wasimysaid in #5179
- MROPE for VLM GRPO by @Datta0 in #5198
- install: overlay unsloth-zoo from git main on --local by @rolandtannous in #5242
- Studio: Fix chat template disappearing after browser refresh by @Imagineer99 in #5209
- studio: add --local to setup.sh + overlay unsloth-zoo from git main by @rolandtannous in #5252
- Fix/windowsprebuilt by @mmathew23 in #5241
- Studio: Add dataset upload dropzone and update preserve think copy by @Imagineer99 in #5253
- Add Qwen3.6 support by @rolandtannous in #5257
- Studio: Chat thread autosave persistence by @Imagineer99 in #5256
- Studio: Enable deleting fine-tuned chat models by @Imagineer99 in #5234
- Studio: Add checkpoint resume for stopped training runs by @Imagineer99 in #5255
- Studio: Polish spacing and profile input radius by @Imagineer99 in #5222
- Fix check for libcurl headers in install.sh by @LFd3v in #5251
- Default Studio host to 127.0.0.1 and prompt before auto-start by @rolandtannous in #5267
- Studio: forward llama-server args from
unsloth studio run, activateunsloth run, and allow passing model:quant to load models by @rolandtannous in #5271 - Studio: Always show API usage examples and docs links by @Imagineer99 in #5270
- Studio: Change API Keys settings to API Access by @Imagineer99 in #5268
- unsloth run: add --enable-tools/--disable-tools server-side tool policy by @rolandtannous in #5277
- fix: use % 8 instead of // 8 in FP8 weight shape check by @Ricardo-M-L in #5243
- Pin Studio GGUF export to llama.cpp's local convert script by @mmathew23 in #5275
- fix KVCache estimates for gemma4 style sliding window models by @Datta0 in #5225
- Update VRAM estimator to cater to broader model configs by @Datta0 in #5175
- Fix FastSentenceTransformer loading with newer sentence-transformers by @Etherll in #5259
- Studio: Preserve chat history during autosave by @Imagineer99 in #5278
What's changed in Unsloth-Zoo
- Fix fused CE grad scaling under DDP by @danielhanchen in unslothai/unsloth-zoo#434
- Fused CE backward: guard scaling=0, drop tensor path, use out-of-place mul by @mmathew23 in unslothai/unsloth-zoo#610
- Fix/gemma4moefix by @mmathew23 in unslothai/unsloth-zoo#612
- MROPE for VLM GRPO by @Datta0 in unslothai/unsloth-zoo#614
- Double-buffer GPU activations for overlapping H2D copy with backward compute by @ruixiang63 in unslothai/unsloth-zoo#534
- fix(temporary_patches/utils): add missing comma in all (raise_error / Unpack) by @Anai-Guo in unslothai/unsloth-zoo#617
- Fix qwen lora extractor for diff peft versions by @Datta0 in unslothai/unsloth-zoo#618
- fix: use backend device type in GGUF merge path by @andomeder in unslothai/unsloth-zoo#615
- Add unsloth_compiled_cache to gitignore by @Datta0 in unslothai/unsloth-zoo#622
- Allow local convert_hf_to_gguf.py via UNSLOTH_LLAMA_CPP_SCRIPTS_DIR by @mmathew23 in https://github.com/unslothai/unsloth-zoo/pull...
New UI Redesign + Qwen3.6
Hey guys, we revamped the entire Unsloth Studio UI and UX experience to put an emphasis on chat and training:
- Added a collapsible sidebar based on community feedback

- You can now delete chats and search past conversations

- New Preserve Thinking toggle for models that support it like Qwen3.6
- Cleaner, more consistent design with easier navigation
- Expanded Settings page with options to change your profile picture, name, and more

- No more entering your Hugging Face token twice
- gpt-oss now has low, medium and high thinking toggles.
- Now uses latest llama.cpp prebuilt, even on Linux CUDA
- Lots of bug, consistency and stability fixes
- Kimi-K2.6 can now be run!
- We also added experimental API support. Guides, announcement etc will come next week.
Qwen3.6 was also also previously already supported in Unsloth Studio for running and training. You can train and run Qwen3.6-27B right now!
What's Changed
- Only run ldconfig CUDA-linking recovery when we have permission by @danielhanchen in #4930
- Fix Mistral DPO/preference training crash on non-xformers platforms (e.g. Intel XPU) by @cheehook in #4889
- Fix raw text paragraph break normalization by @kiankyars in #4884
- Studio: keep chat input visible and fix compare pane clipping by @Imagineer99 in #4924
- fix: check find() return value before adding offset in try_fix_tokenizer by @Ricardo-M-L in #4923
- updated models template mappers. added lfm2.5vl450m to transformers 5… by @rolandtannous in #4939
- Revert "updated models template mappers. added lfm2.5vl450m to transformers 5…" by @rolandtannous in #4945
- Add AMD ROCm/HIP support across installer and hardware detection by @danielhanchen in #4720
- Pin bitsandbytes to continuous-release_main on ROCm (4-bit decode fix) by @danielhanchen in #4954
- Fix Gemma-4 GRPO catastrophic KL divergence with TRL 1.0.0+ by @danielhanchen in #4934
- Add ROCm test suite (companion to #4720) by @danielhanchen in #4824
- updating gemma4 script by @Manan17 in #4992
- Move gemma4 script by @Manan17 in #4994
- studio: fix route transition DOM duplication via AnimatePresence mode="wait" by @AdamPlatin123 in #4987
- Studio: Prompt manager, message deletion, and chat UI improvements by @Imagineer99 in #4938
- Pin kernels==0.12.1 to fix training import failure by @rolandtannous in #5000
- Studio: Expose openai and anthropic compatible external API end points by @danielhanchen in #4956
- studio: skip training status/metrics polling when idle by @AdamPlatin123 in #4988
- studio: fix api-keys access + refresh by @wasimysaid in #5005
- Studio: Polish API key copy button and harden async clipboard fallback by @Imagineer99 in #5006
- fix(studio): default chart view to full training history by @Barath19 in #5007
- [Studio] Show non exported models in chat UI by @Datta0 in #4892
- [Studio] Install flash attn at setup time for linux by @Datta0 in #4979
- fix(studio): remove 300s cap on load_checkpoint (inherits 3600s default) by @TF-MTGE in #4922
- Studio: honor explicit GGUF ctx and default to 4096 when weights exceed VRAM by @danielhanchen in #5011
- Studio: make GGUF disk-space preflight cache-aware by @danielhanchen in #5012
- Studio: anchor ctx-slider warning threshold at 4096 when weights exceed VRAM by @danielhanchen in #5014
- studio: show HF model download progress in training start overlay by @danielhanchen in #4894
- studio: stream export worker output into the export dialog by @danielhanchen in #4897
- Fix num_items_in_batch GA for Gemma4 by @Datta0 in #4998
- studio: pin peft to 0.18.1 to fix export subprocess issues by @rolandtannous in #5015
- Studio: live model-load progress + rate/ETA on download and load by @danielhanchen in #5017
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #5004
- Fix bitsandbytes ROCm install by using pip instead of uv by @edamamez in #4966
- Studio: split model-load progress label across two rows by @danielhanchen in #5020
- Studio: hard-stop at n_ctx with a 'Context limit reached' toast by @danielhanchen in #5021
- [moe][gemma4] Target MoE for gemma4 by @Datta0 in #4913
- Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var by @rolandtannous in #5024
- Studio: support GGUF variant selection for non-suffixed repos by @Imagineer99 in #5023
- fix: prevent offline freeze by fixing stats retry and forwarding local_files_only by @DavidSolanas in #5016
- Respect classification head skip list on pre-quantized 4-bit checkpoints (#5027) by @danielhanchen in #5034
- fix(rocm): tighten gfx regex to ignore generic ISA lines by @danielhanchen in #5033
- Fix grad-accum accepts_loss_kwargs detection for vision wrappers by @danielhanchen in #5036
- grpo_compute_loss_slow called with wrong positional args by @jonahsamost in #4887
- Gate trl disable_gradient_checkpointing warning on UNSLOTH_ENABLE_LOGGING by @danielhanchen in #5038
- Studio: refresh Downloaded GGUF list and recurse into variant subdirs by @danielhanchen in #5032
- feat: Add support for OLMo-3 model by @OnePunchMonk in #4678
- feat: Add cactus QAT scheme support by @OnePunchMonk in #4679
- Re-apply #4939: updated models template mappers by @rolandtannous in #4950
- Studio: add folder browser modal for Custom Folders by @danielhanchen in #5035
- Bump Studio installer minimum to 2026.4.5 by @danielhanchen in #5041
- fix Gemma4 flash attn disable by @mmathew23 in #5045
- BUG: fix _fix_chat_template for ChatML templates missing add_generation_prompt (#4150) by @kimimgo in #4426
- fix: use direct registry API for PATH writes instead of SetEnvironmentVariable by @Etherll in #4961
- Chat-template repair: warn-by-default, AST classification, dict support by @danielhanchen in #5049
- Restrict flash attn to <=256 head dim. Consolidate attn impl checks by @Datta0 in #5051
- Remove legacy venv Scripts entry from User PATH on upgrade by @danielhanchen in #5060
- Fix review findings for chat-template repair (#5049) by @danielhanchen in #5056
- Studio: Ollama support, recommended folders, Custom Folders UX polish by @danielhanchen in #5050
- feat(studio): replace navbar with collapsible sidebar by @wasimysaid in #4936
- fix audio dataset preview and finetuning by @CodeMan62 in #5043
- Chat first onboarding by @wasimysaid in #5063
- Fix onboarding followups by @wasimysaid in #5064
- Studio: Default Gemma fallback for chat + AI assist by @Imagineer99 in #5066
- fix: multi-GPU inference crash for bnb 4-bit/8-bit models by @danielhanchen in #5068
- Add Qwen3.6 inference defaults for Studio by @danielhanchen in #5065
- Add qwen3.6 script by @Manan17 in #5084
- Studio: forward standard OpenAI tools / tool_choice to llama-server by @rolandtannou...

