Releases · unslothai/unsloth

Release list

GLM 5.2 + Model Hub + 3x longer contexts Latest

Latest

danielhanchen released this 18 Jun 17:36

v0.1.471-beta

7ecbf5a

GLM-5.2 is now supported in Unsloth Studio! All reasoning levels supported. 3x longer context lengths are now achievable with our new auto fit algorithm with MTP, allowing longer chats. Bypass permissions mode, forkable chats, queue-able chats, a new hub for model discovery, parallel modules + HTTPS Cloudflare support and more! Use unsloth studio --secure for secure HTTPS global access! Read our GLM-5.2 guide

Screenshot 2026-06-18 at 10-35-59 Chat - Unsloth Studio

To update Unsloth or install a new Unsloth Studio, you must use the below.
Ensure your version is 2026.6.9 or v0.1.471-beta for the latest.

MacOS, Linux, WSL:

curl -fsSL https://unsloth.ai/install.sh | sh

Windows:

irm https://unsloth.ai/install.ps1 | iex

Better context length algorithm

As per #6312 and #6447, we made Unsloth Studio's determination of memory usage and context length much better, achieving 3x longer context overall:

scenario	KV	before	after
1x 32GB pipeline (~31 GB free)	f16	23,040	64,000
	q8_0	43,520	114,944
	q4_0	82,432	199,680
2x 32GB pipeline	any	262,144	262,144
2x 24GB tensor (~23 GB free)	f16	134,049	262,144
	q8_0	252,329	262,144

Chat Canvas, Forking & Queueing

Edit assistant messages in place and re-run from any point in the thread.
Fork a thread to branch a conversation without losing the original.
Temporary (incognito) chats that leave nothing behind.
Queue new prompts while a generation is still running instead of waiting.
Chat "artifacts" are now canvas, with inline HTML canvas cards that auto-render, a Code view, and DiffusionGemma keeps its raw code visible inline instead of collapsing.
Chat search now covers every message and surfaces your own messages first.

Hub (Redesigned)

Full-page Hub with a trending feed, search, and custom model paths support.
README preview in a split-view feed so you can read before you download.
Downloads default to the faster Xet transport, with automatic HTTP fallback if a transfer stalls.
New "Load on selection" toggle to set load options before a model loads.
Google logo shown for DiffusionGemma and future Gemma derivatives.

Models & Inference

DeepSeek-OCR and more vision models now load and run without errors.
Fixed fast inference on the latest vLLM (0.22+) so speed-ups work again.
Tensor parallelism is more reliable: if the faster MTP path fails, it now recovers on its own instead of crashing.
DiffusionGemma now shows the image forming live as it denoises, with accurate speed stats.

Security & Cloudflare Encrypted Studios

New --secure Cloudflare-only mode for end-to-end encrypted studios, with server-side tools staying enabled under --secure. Use unsloth studio --secure!
Bypass Permissions mode to skip confirmations and disable the tool sandbox when you want it.
Auto detect Hugging Face Virus scanning + dangerous files in repos.

Logging and API

New API server monitor in Studio.
Faster API calling and less latency
Much better streamlined logs - now with throughput and latency and removed a lot of bloated logs.

Hardware & Backend

Better support for Blackwell RTX 50X and 60X GPUs
Fix silent downgrading to CPU and not GPU
torchao version is now selected from the installed torch.
Installer now auto-repairs a broken or CPU-only PyTorch install and warns on silent CPU fallback, across NVIDIA + AMD on Win/Linux/Mac/WSL.
Frees the chat model's VRAM when training starts, but only when the GPU is actually tight (no needless reloads otherwise).
If llama-server hard-crashes at startup, Studio now steps through a recovery ladder instead of just failing.

Training & General Fixes & Parallel Modules

MLX training updates.
Improved GRPO training reliability with vLLM.
Training startup made more reliable, with clearer errors for invalid VLM batches.
Studio now cleans up leftover backend processes more reliably after crashes, restarts, or interrupted shutdowns.
Export, Chat, Training, Recipes are all individualized / compartmentalized! This means you can do all 4 in parallel now! You can chat / do inference while you wait for a training run or an export!

What's Changed

Bump install.sh / install.ps1 pin to unsloth>=2026.6.4 by @danielhanchen in #6257
Studio: account for mmproj VRAM in GGUF fit budget (#5825) by @hoobnn in #5849
fix(studio): keep local GGUF vision on llama-server by @alkinun in #5770
install.sh: keep the studio launch from draining the curl | sh script (WSL/dash) by @danielhanchen in #6258
DiffusionGemma: set UNSLOTH_IS_PRESENT so the shim runs on a clean install by @danielhanchen in #6259
studio: add keyboard navigation to model picker by @alkinun in #5628
Bump install.sh / install.ps1 pin to unsloth>=2026.6.5 by @danielhanchen in #6260
Installer: drop the lemonade ROCm fallback now the fork ships identical per-gfx prebuilts by @oobabooga in #6225
Studio: keep distinct bpw flavors of the same GGUF quant by @bouclem in #5729
studio: declare UNSLOTH_IS_PRESENT at backend startup (clean-install + Windows) by @danielhanchen in #6262
Studio: extend llama.cpp first-token timeout by @Imagineer99 in #5841
Studio: polish update banner layout and sidebar settings icon by @shimmyshimmer in #6266
Studio: only advertise a Cloudflare tunnel once it actually serves by @oobabooga in #6264
Studio: backfill the DiffusionGemma visual-server on a tag-matching update by @danielhanchen in #6267
Studio: keep llama-server discovery from crashing on an access-denied candidate by @danielhanchen in #6268
Bump install.sh / install.ps1 pin to unsloth>=2026.6.6 by @danielhanchen in #6270
Tidy update banner and auth button spacing by @shimmyshimmer in #6279
Fix llama.cpp prebuilt: skip already-installed same-release fallback by @danielhanchen in #6285
Installer: drop redundant -WindowStyle Hidden from the Windows launcher VBS by @danielhanchen in #6284
Fix Responses tool output content arrays by @alkinun in #6287
Studio: UI polish for sidebar, menus, hub and toasts by @shimmyshimmer in #6288
Upgrade setuptools and wheel in the auto-install command by @danielhanchen in #6282
Studio: fix training output dir escaping outputs root for models on another drive by @danielhanchen in #6293
Studio: offer llama.cpp update for same-base mix builds on source installs by @shimmyshimmer in #6280
Studio: resolve studio home before the llama-only setup split by @danielhanchen in #6289
Studio: force-refresh the llama.cpp update check so new builds are not masked by the 24h cache by @shimmyshimmer in #6278
Studio: decide diffusion routing before the SWA resolver by @danielhanchen in #6299
Bump install.sh / install.ps1 pin to unsloth>=2026.6.7 by @danielhanchen in #6301
Studio: don't silently fall back to a CPU prebuilt on NVIDIA Linux GPU hosts by @oobabooga in #6310
Studio: clarify llama.cpp update banner copy by @danielhanchen in #6313
MLX Training updates by @mmathew23 in #5656
Studio: add temporary (incognito) chat by @oobabooga in #5956
Studio: enable stdio MCP servers on a loopback bind by @oobabooga in #6295
fix(studio): Windows GGUF cancel hang + CPU spinlock overhead (#5692) by @anmolxlight in #5749
feat: add Anthropic-compatible thinking parameter by @maattm in #5856
Studio: Bypass Permissions (skip confirmation, disable tool sandbox) by @danielhanchen in #5895
Studio: add --secure Cloudflare-only mode and revamp API usage examples by @danielhanchen in #6300
Studio: arm the VRAM-settle wait after the startup orphan reaper by @danielhanchen in #6315
Studio: fix Mac IME input-method switch leaving composer Send disabled by @narakai in #5762
fix: use partial hipinfo output on crash to avoid CPU fallback (RDNA 4 / gfx1200) by @mvanhorn in #6292
Rename chat artifacts copy to canvas by @wasimysaid in https://github.com/unslothai/u...

Contributors

mvanhorn, Erildo, and 29 other contributors

Assets 2

0 Join discussion

DiffusionGemma + Gemma 4 MTP

shimmyshimmer released this 12 Jun 13:57

v0.1.464-beta

b32ef2e

We've merged over 150 PRs this week so lots of new updates, a new model Hub and look! Ensure you install the latest v0.1.464-beta or 2026.6.7. DiffusionGemma, Gemma 4 MTP and MiniMax-M3 are all now supported.

DiffusionGemma + Gemma 4 MTP + Audio

Run and train DiffusionGemma via Unsloth Studio. Install the latest v0.1.462-beta if DiffusionGemma wasn't previously working.
Gemma 4 MTP is here! Run Gemma 4 around 2x faster with MTP - MTP is auto enabled in Unsloth Studio.
Audio chat is now supported for Gemma 4 (wav, mp3, m4a,flac, webm).
Preserve Thinking added to Gemma 4.

Hub + Download Manager (Experimental)

Added a new Hub page for browsing, downloading, and managing Hugging Face models and datasets.
Unsloth can now detect models and datasets already on your machine and show them alongside downloaded assets.
Downloaded GGUF models now have direct Run / New Chat actions.

Chat with Files / RAG (Experimental)

Added Chat with Files in Studio, letting you ask questions over your own documents and knowledge bases.
Supports hybrid search, citations, PDF previews, per-thread documents, and a built-in search_knowledge_base tool.

New Update Button + Hardware Support

Unsloth now uses fresh, up-to-date llama.cpp prebuilts across CUDA, ROCm, Windows, Linux, and macOS.
Added an in-app Update llama.cpp button so users can update the local backend without reinstalling Studio.
Improved Windows / WSL AMD support, Strix Halo ROCm support, Blackwell CUDA selection, and clearer installer messages.

Local Chat, Tools & API Compatibility

Local tool calling is more reliable, with better ordering of tool cards, fewer duplicate tool loops, and support for tool use with GGUF vision models.
Improved OpenAI-compatible API and Anthropic-compatible API behavior for local Studio servers, including better errors, token usage, stop reasons, and Claude Code compatibility.

Tool Calling, MCP, Encrypted Cloudflare Tunnels

Bypass Permissions, Tool Call Permissions (Approve, Always Approve, Deny)
50% to 90% less tool call nudging issues without any accuracy loss
MCP, Artifacts are now select-able
Tensor parallelism is now enabled for GGUFs - get +30% throughput!
Cloudflare HTTPS free tunnels is now added allowing for end to end encrypted studios!

Training & General Fixes

Improved MLX support with better model labels, generation speed stats, and fixes for VLM training.
Fixed several training and dataset edge cases, including non-writable Hugging Face caches and custom dataset mappings.
Added many UI polish fixes across chat, menus, model picker, dark mode, import/export, and settings.

To update Unsloth or install a new Unsloth Studio, you must use:

macOS, Linux, WSL:

curl -fsSL https://unsloth.ai/install.sh | sh

Windows:

irm https://unsloth.ai/install.ps1 | iex

Warning

DO NOT USE unsloth studio update since packaging will not get the latest updates

What's Changed

Studio: llama.cpp update banner redesign, About tab license info, UI polish by @shimmyshimmer in #6196
Bump install.sh / install.ps1 pin to unsloth>=2026.6.3 by @danielhanchen in #6212
Expose runtime context length for hub models by @alkinun in #6154
Studio: fix llama.cpp update banner offering a downgrade / sticking on mix releases by @oobabooga in #6219
Fix kwarg spacing in training files to satisfy pre-commit by @shimmyshimmer in #6209
Studio: reword the Cloudflare line when the public probe fails by @danielhanchen in #6217
fix: deduplicate lemonade ROCm prebuilt selection log by @LeoBorcherding in #6021
Stop false RoPE 'default' warning and fix rope drift gate on transformers 5 by @danielhanchen in #6223
fix(studio): load run.py by path for editable installs by @jimdawdy-hub in #5909
fix(studio): inherit llama_extra_args and honor --no-mmproj by @jimdawdy-hub in #5902
fix(studio): adopt server-loaded model before chat auto-load by @jimdawdy-hub in #5900
Fix stale sidebar regression test to match the gap-px markup by @danielhanchen in #6232
Studio: gate the staged prebuilt runtime validation behind a flag (off by default) by @danielhanchen in #6216
Fix FastModel config passthrough for sequence classification by @alkinun in #6203
fix: decode subprocess output as UTF-8 in save.py on Windows by @dylanschroers in #6218
patch: fix EmptyLogits gathering in nested payloads and Accelerate recursively_apply by @MdHussain121 in #6092
Studio: show Apple GPU temperature and power in the GPU monitor (macOS) by @Ban921 in #6187
Studio: Add inline confirmation (Allow/Always allow/Deny) for tool calls by @oobabooga in #5869
Studio: guard Apple GPU power against negative counter-reset readings by @danielhanchen in #6235
Fix step count mismatch when sequence packing is enabled by @IrakliXYZ in #5967
fix/uv-bytecode-timeout by @alkinun in #6166
Studio: tune llama.cpp env for data-center GPUs by @danielhanchen in #6098
Studio: drop the on-disk freshness cache after a llama.cpp update by @danielhanchen in #6234
Add missing RAG deps to no-torch Studio runtime requirements by @danielhanchen in #6236
Studio: rounded rectangle hover states for menu items instead of pills by @shimmyshimmer in #6210
docs: repository cleanup by @Agnibha007 in #5617
Run cross-platform parity test on Windows and macOS in CI by @danielhanchen in #6241
chore(studio/frontend): normalize line endings to LF by @danielhanchen in #6012
fix: respect absolute export paths to prevent cross-drive copy failures (WinError 112) by @anmolxlight in #6088
Studio: Add Tensor-Parallel llama.cpp support by @oobabooga in #6040
Studio: Add custom provider option to Connections by @Imagineer99 in #6112
Studio: model selector and settings polish by @shimmyshimmer in #6240
Studio: login card polish and sidebar label alignment by @shimmyshimmer in #6242
Studio: pinnable plus menu items and saved prompt pins by @shimmyshimmer in #6237
Studio: bottom update banners, smooth llama.cpp progress, re-prompt after copy by @shimmyshimmer in #6233
fix(studio/responses): forward chat_template_kwargs enable_thinking to chat request by @Anai-Guo in #6202
Studio: fix WSL Strix Halo GPU on reinstall (ROCDXG drop-in + system HIP before bundle) by @danielhanchen in #6227
Studio: fully rounded Hub pills and refreshed menu icons by @shimmyshimmer in #6248
Studio: use px-2.5 for Hub option menu padding by @shimmyshimmer in #6249
Studio: fix Downloaded model list disappearing and order it by last download by @danielhanchen in #6247
Studio: new-chat shortcut, composer draft autosave, archive threads by @NilayYadav in #5771
Studio: persist speculative decoding preference across restart and model switch by @oobabooga in #6169
Studio: refine menu chevron, tick icon, and one-line plus-menu shape by @shimmyshimmer in #6251
Studio: serve D...

Contributors

Ban921, danielhanchen, and 15 other contributors

Assets 2

0 Join discussion

Gemma 4 MTP + Bug Fixes

danielhanchen released this 10 Jun 18:20

v0.1.451-beta

ddaa8a9

Bug Fixes and more cross platform support

To update Unsloth or install a new Unsloth Studio, you must use:

macOS, Linux, WSL:

curl -fsSL https://unsloth.ai/install.sh | sh

Windows:

irm https://unsloth.ai/install.ps1 | iex

Warning

DO NOT USE unsloth studio update since packaging will not get the latest updates

What's Changed

Bump install.sh / install.ps1 pin to unsloth>=2026.6.1 by @danielhanchen in #5977
Port KTO logps truncation guard to TRL 1.x _compute_logps refactor by @danielhanchen in #5996
CI: track deepseek_ocr2 compile timeout in known-broken list by @danielhanchen in #5995
fix(studio): disable mlx gc for none by @Lyxot in #5991
Normalize shell scripts to LF in .gitattributes by @danielhanchen in #5997
Studio: enable audio input for Gemma 4 GGUFs; default chat model to Qwen3.5-4B-MTP by @danielhanchen in #6000
Fix chat text cutoff at composer dock and speed up plus icon spin by @shimmyshimmer in #5989
Studio: refine tool call and reasoning trigger UI by @shimmyshimmer in #5873
fix: warn when localhost resolves to ::1 but Studio is bound only to 127.0.0.1 by @mvanhorn in #5994
fix: persist Studio thread synchronously on first runStart so mid-stream refresh keeps the prompt by @mvanhorn in #5814
Studio: enable GGUF tools with vision inputs by @Imagineer99 in #6009
Studio: accept system-role messages in Claude Code requests by @Imagineer99 in #6006
Studio: fix load_freeze audio-type tests for #6000's Gemma 4 <|audio|> probe by @danielhanchen in #6018
Studio: fix chat preset persistence with fast mode by @Imagineer99 in #5870
Studio: fix Repo tests (CPU) by stopping the ROCm test from leaking a fake utils into sys.modules by @danielhanchen in #6027
Studio: stop ROCm amd-smi tests leaking a fake loggers into sys.modules (follow-up to #6027) by @danielhanchen in #6055
Studio: emit usage and timings for MLX generation speed stats by @shimmyshimmer in #6068
Studio: tag MLX loaded models as MLX instead of Base in chat by @shimmyshimmer in #6067
Studio: remove red border on chat error messages by @shimmyshimmer in #6063
Studio: keep chat in place when composer attachments resize it by @shimmyshimmer in #6070
CI: allowlist deepseek_ocr2 in compiler full-model-sweep by @danielhanchen in #6085
Restore KTO logps truncation guard for TRL (re-apply dropped #5996) by @danielhanchen in #6086
Studio: stop leaking internal exceptions to API clients; harden sandbox path by @danielhanchen in #6072
Formatting: ruff line-length 100 + drop blank after short local imports by @danielhanchen in #6079
Fix MoE LoRA target parameter handling by @Datta0 in #5345
Refactor VLM detection in studio by @Datta0 in #5245
qwen 3.5 export fixes by @Datta0 in #5992
[fix] Nvfp4 load by @Datta0 in #6087
Studio: open the MCP dialog to the server list by @oobabooga in #6100
Studio: make code comments and docstrings more succinct by @danielhanchen in #6029
Reduce and tighten code comments and docstrings repo-wide by @danielhanchen in #6095
Studio frontend: reduce and tighten code comments by @danielhanchen in #6099
feat(studio): Hub + Download Manager by @Sneakr in #5916
Studio fix recipe dataset preview by @wasimysaid in #6031
Studio: make Helper LLM startup pre-cache opt in by @wasimysaid in #6113
Improve local chat tool call flow by @wasimysaid in #5962
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #6104
Studio: improve OpenAI- and Anthropic-compatible API spec compliance by @oobabooga in #6010
Studio: follow-up fix for GGUF developer prompts by @wasimysaid in #6115
Studio: stop the providers dialog from resetting custom provider form state by @NilayYadav in #6051
Studio: clean-room compact RAG (knowledge bases, hybrid search, fast indexing) by @danielhanchen in #5910
feat: support text-only loading of Gemma 3 27B via FastLanguageModel (skip SiglipVisionModel) by @mvanhorn in #5816
studio(ui): use the --primary brand token for the avatar fallback color by @danielhanchen in #5987
fix(studio): block arbitrary external image URLs in markdown renderer by @rsd-darshan in #5602
Studio: center account avatar vertically in sidebar footer pill by @shimmyshimmer in #6026
fix: clearer Studio setup error when GPU driver is too old for the installed CUDA toolkit by @mvanhorn in #5993
Studio: npm v12 readiness (allowScripts policy, npmrc cleanup, bun bootstrap fix) by @danielhanchen in #6128
Studio: faithful conversation export and import round trips (ShareGPT system role, CSV quoted newlines) by @danielhanchen in #6131
Studio: auto-sync allowScripts pins after dependency bumps by @danielhanchen in #6136
Studio: unify shadows, backgrounds and dark mode consistency in chat UI by @shimmyshimmer in #6116
Windows/WSL installer: fix winget msstore cert failure, amd-smi DiskPart prompt, and enable AMD GPU (Strix Halo gfx1151) by @danielhanchen in #5940
Studio: bulk export and import in Settings Chat Data, MCP pill off switch by @danielhanchen in #6141
Studio: fix nested dropdown submenus clipped by the menu alignment nudge by @danielhanchen in #6143
Studio: declutter the chat plus menu and make RAG session-only with pre-select by @danielhanchen in #6140
fix/validate dataset video paths before training by @LeoBorcherding in #5136
Restore config use_cache in for_inference after gradient checkpointing prep by @danielhanchen in #6137
fix: persist Windows ROCm BNB version by @Peter7896 in #6048
Studio GGUF CI: fix deterministic JSON-mode failure by bumping the quant, keep hard asserts by @danielhanchen in #6138
Frontend CI: hard-fail unreviewed npm install scripts (--strict-allow-scripts) by @danielhanchen in #6139
Studio UI polish: search dialog shadow, model picker pills, sidebar spacing, white Hub background by @shimmyshimmer in #6147
Run the HF cache redirect before import fixes that can freeze Hub constants by @danielhanchen in #6150
Fix Studio MLX VLM resized image layout by @Lyxot in #6019
fix(studio): reject unsupported MLX CPT and embedding training by @Lyxot in #6091
fix(studio): forward custom dataset mappings in MLX training by @Lyxot in #6094
Tests + CI guard: batched left-padded generation can never silently regress again (#1066, #3699) by @danielhanchen in #6145
Tests: follow Compare chat into the More submenu in the Playwright chat UI driver by @danielhanchen in #6153
Fix UnboundLocalError in ROCm version detection dpkg/rpm fallback by @danielhanchen in #6149
Auto-set BNB_ROCM_VERSION from the installed wheel on Windows + ROCm by @danielhanchen in #5986
Call patch_compiling_bitsandbytes on the FastLanguageModel path by @danielhanchen in #6144
Studio: mascot images degrade gracefully instead of showing alt text by @danielhanchen in #6146
Studio: training survives a non-writable HF datasets cache by @danielhanchen in #6148
Studio: fix Gemma-4-12B-it not loading by @Imagineer99 in https://github.com/uns...

Contributors

mvanhorn, Sneakr, and 12 other contributors

Assets 2

Gemma 4 12B, New UI, MCP, Projects

danielhanchen released this 03 Jun 14:35

v0.1.44-beta

c7d2ed1

Hey everyone, this update focuses mainly on MCP, Projects, Canvas and the new chat UI.
We've also made many improvements across Studio. Next week we'll have an even bigger update.
If Gemma 4 12B isn't working for you, please re-update Unsloth!

To update Unsloth or install a new Unsloth Studio, you must use:

macOS, Linux, WSL:

curl -fsSL https://unsloth.ai/install.sh | sh

Windows:

irm https://unsloth.ai/install.ps1 | iex

Warning

DO NOT USE unsloth studio update since packaging will not get the latest updates

Gemma 4 12B

Google releases Gemma 4 12B, a new model that runs locally on 8GB RAM. GGUF / Guide
Gemma 4 12B Unified supports image, audio and 256K context. Run and train the model via Unsloth Studio.

MCP

Give your model live tools instead of relying on memory one click in the composer, no API keys for the built-ins
Built-in presets:
- Context7 current docs & code for thousands of libraries
- Exa live web search
- Hugging Face search models, datasets & papers
Add your own remote (OAuth/headers) or local (stdio) servers, toggled per chat

New Chat UI

Projects, Canvas, MCP, and Compare tuck into one + menu
Search and Code are now one click away

Projects

Keep related chats together in one workspace
Create a project from the sidebar, then add new or existing chats to it

Experimental Canvas / Artifacts

Opens generated HTML in a dedicated canvas panel inside Unsloth Studio
Supports interactive outputs, including browser based visualizations and CDN-loaded packages
Lets you switch between rendered preview and source code

Install, Runtime & Hardware

CUDA / Windows
- CUDA 13.3 llama.cpp binaries now work on Windows (and other non-Linux) and fix the CUDA 13.2 gibberish-output bug while default still pins to CUDA 13.1 for now
- On CUDA 13.2, 13.1 and below, Windows falls back to CUDA 12.4 and native 13.1 binaries coming soon
- Windows prebuilt installs no longer block on the early CUDA Toolkit check
Linux / GPU
- Linux llama.cpp prebuilts now match your runtime's cudart major version
- Prebuilt coverage for Blackwell (with a CUDA 13.0 driver fallback) and B300 (sm_103)
- ARM64 Linux now source-builds on GPU hosts, with a CPU prebuilt fallback
- ROCm: detected AMD gfx arch is forwarded to the prebuilt installer (with a setup.sh fallback)
macOS
- Fixed Apple Silicon installs that were resolving torch against x86_64

Other Studio improvements

Connected models now work in Compare mode
Smoother streaming which now renders batched to one per animation frame
Larger upload limits for training datasets and recipe
Window size and maximized state persist across launches
Chat search hides non-matching threads
Model loading handles mid-refresh cancellation cleanly
Cleaner rendering for generated image frames and Python tool code blocks

What's Changed

Bump install.sh / install.ps1 pin to unsloth>=2026.5.9 by @danielhanchen in #5904
Update README.md by @danielhanchen in #5905
Use remove-circle icon for eject model button by @shimmyshimmer in #5906
Studio: add HTML artifacts to chat by @wasimysaid in #5772
Use natural generated image frame by @wasimysaid in #5794
Studio: defer the Windows CUDA Toolkit check so prebuilt users are not blocked by @danielhanchen in #5912
Studio CI: tolerate transient artifact-upload flakes on diagnostic log steps by @danielhanchen in #5913
Studio: match the Linux llama.cpp prebuilt to the runtime cudart major by @danielhanchen in #5914
Studio: extend the pinned Blackwell GPU fallback to CUDA 13.0 drivers by @danielhanchen in #5920
Studio: source-build arm64 Linux GPU hosts, with a CPU prebuilt fallback by @danielhanchen in #5924
Studio CI: run the Resolve-CudaToolkit unit test in the Windows GGUF job by @danielhanchen in #5921
Studio: forward the resolved AMD gfx arch to the prebuilt installer by @danielhanchen in #5923
Studio CI: tolerate transient runner crashes in the llama-server log collection step by @danielhanchen in #5925
Studio: forward --has-rocm from setup.sh when gfx resolution fails by @LeoBorcherding in #5927
Studio: cover B300 (sm_103) with the Linux prebuilt bundles by @danielhanchen in #5930
Studio: move MCP to a composer button with presets by @danielhanchen in #5926
Studio: clearer MCP server validation when stdio is disabled by @oobabooga in #5928
Bump install.sh / install.ps1 pin to unsloth>=2026.5.10 by @danielhanchen in #5931
Studio: support connected models in compare mode by @Imagineer99 in #5824
Studio: manage chat history with projects by @Imagineer99 in #5725
fix(chat): refine artifact panel styling by @wasimysaid in #5922
studio/frontend: pad Python tool code block to fix corner clipping by @oobabooga in #5938
Studio: polish model load toast styling by @Imagineer99 in #5648
Studio: optimize chat streaming by batching renders to one per animation frame by @oobabooga in #5788
Raise Studio upload limits by @wasimysaid in #5808
Studio: persist Tauri window size and maximized state across launches by @oobabooga in #5799
Guard model-load success path against mid-refresh cancellation by @rolandtannous in #5944
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #5935
feature/dataset_none_detect.py adds empty content detection for conversation datasets by @LeoBorcherding in #4438
studio/chat: hide non-matching threads in chat search (#5572) by @wtfashwin in #5651
Update vulnerable dependencies to patched versions by @danielhanchen in #5970
fix(studio): make the reset-password hint work on Windows, macOS, and Linux by @danielhanchen in #5971
Document install env vars in README advanced launch options by @danielhanchen in #5972
Update Install Scripts by @danielhanchen in #5968
Logging cleanup by @danielhanchen in #5973
studio: redesign chat composer by @shimmyshimmer in #5891
fix(studio): don't double-quote the reset-password hint for paths with spaces by @danielhanchen in #5975
Patch other imports of trainer and config by @Datta0 in #5946
Fix UnicodeEncodeError when printing emoji on legacy Windows consoles by @danielhanchen in #5948
Fix macOS Apple Silicon installs resolving torch against x86_64 by @danielhanchen in #5976

Full Changelog: v0.1.43-beta...v0.1.44-beta

Contributors

danielhanchen, Datta0, and 8 other contributors

Assets 2

CUDA 13.3, Windows, Mac update

danielhanchen released this 31 May 14:06

v0.1.43-beta

e04ea33

To update Unsloth or install a new Unsloth Studio, you must use:

macOS, Linux, WSL:

curl -fsSL https://unsloth.ai/install.sh | sh

Windows:

irm https://unsloth.ai/install.ps1 | iex

Warning

DO NOT USE unsloth studio update since packaging will not get the latest updates

Mac Updates

Re-enabled llama.cpp prebuilt binaries for Apple Silicon (M1-M4) - Mac OS 14 / 15 / 26 (Tahoe)
Apple Silicon Mac OS 13 (Ventura) is source build
Intel (x86_64) for Mac OS 13.3 / 14 / 15 / 26 (Tahoe) uses llama.cpp prebuilt binaries
Intel for Max 13.0 - 13.2 is source build

Windows Updates

CUDA 13.3 llama.cpp prebuilt binaries now work for Windows
For CUDA 13.2, CUDA 13.1 and below, Windows devices uses CUDA 12.4 fallback - we'll work on CUDA 13.1 binaries soon.

CUDA 13.3 Update

CUDA 13.3 non Linux binaries work. We'll still use CUDA 13.1 for now
CUDA 13.3 solves the CUDA 13.2 gibberish problem - see #4849

Blackwell GPUs Update

For now Blackwell will have delayed releases of llama.cpp prebuilt binaries sine CUDA 12.4 does not work - we are working to resolve this soon.

What's Changed

Bump install.sh / install.ps1 pin to unsloth>=2026.5.8 by @danielhanchen in #5791
Studio: expose --parallel / -np flag on unsloth studio run by @danielhanchen in #5737
tests: unblock three stale assertions broken on main (MLX CI + Backend CI) by @danielhanchen in #5803
ci: install unsloth_zoo from git main in notebooks-ci + studio-backend-ci by @danielhanchen in #5802
tool mask support by @Datta0 in #5682
Studio: add frontend i18n support by @penumbrazz in #5765
Studio: unblock install on Linux ARM64 + Windows ARM64 + Intel Mac by @danielhanchen in #5790
Harden Linux llama.cpp prebuilt reuse by @mmathew23 in #5796
Studio: expose image size setting in training UI by @Dariton4000 in #5743
Studio: add configurable CPU thread pool limit by @alkinun in #5760
Trim README Advanced launch options blurb by @danielhanchen in #5809
[RL] make sync weights conditional by @Datta0 in #4925
Studio: add Gemini provider with web_search, code_execution, prompt caching, and Nano Banana image generation by @danielhanchen in #5720
Studio: add remote MCP server support by @NilayYadav in #5750
Clear MRoPE after generation for GRPO by @Datta0 in #5683
Detect CUDA UMD Version from newer nvidia-smi output (fixes #5812) by @danielhanchen in #5817
Studio: remove dark mode upload circle by @Imagineer99 in #5813
fix: honor --ctx-size and other forwarded args from unsloth studio run in Studio's context-fit logic by @mvanhorn in #5815
Fix non-streaming GGUF chat completion usage by @alkinun in #5781
Fix/studio colab proxy and iframe - Unsloth Studio not loading in Colab (iframe "refused to connect" and wrong URL) by @LeoBorcherding in #5844
Studio: keep web search/code pills off on model load if user disabled them by @rolandtannous in #5851
studio/setup.sh: cope with fresh CUDA toolkits like 13.3 by @danielhanchen in #5826
fix/strix halo and windows AMD ROCm support by @LeoBorcherding in #5301
Version downpairing for Radeon+ROCm PyTorch wheels by @jaeiclee in #5353
feature/use lemonade-sdk llamacpp-rocm binaries by @LeoBorcherding in #5303
studio: ROCm cleanups follow-up to #5301 by @danielhanchen in #5874
fix: wrong code block language for nightly Windows install by @mervivian in #5877
fix: bump openssl, rand, rustls-webpki, tar in Studio src-tauri for security advisories by @orbisai0security in #5866
Fix error when context-resizing by @afsuyadi in #5860
Update Studio sidebar logout icon to logout-05 by @shimmyshimmer in #5878
studio: pick a macOS llama.cpp prebuilt that loads on the host OS by @danielhanchen in #5883
studio: make NVIDIA prebuilt selection track CUDA version bumps (Windows + Linux) by @danielhanchen in #5879
Remove dead top-level bitsandbytes import from models/_utils.py by @Rahul007007 in #5875
ci(security-audit): make package installs network-resilient by @danielhanchen in #5853
Studio: add stdio MCP server support by @oobabooga in #5863
studio/frontend: fix MCP dialog overflow on long URLs by @oobabooga in #5864
Studio: clearer error for diffusion GGUFs loaded as chat models by @danielhanchen in #5857
Studio: harden stdio MCP gating and fix transport edge cases by @danielhanchen in #5892
Studio: fix phantom scroll area below messages with many sources by @danielhanchen in #5822
Studio: pin the last pre-macOS-26 llama.cpp prebuilt instead of walking back by @danielhanchen in #5896
fix(install): detect x86_64 Python venv on Apple Silicon and rebuild as arm64 by @ramankrishna in #5187

New Contributors

@penumbrazz made their first contribution in #5765
@Dariton4000 made their first contribution in #5743
@NilayYadav made their first contribution in #5750
@mvanhorn made their first contribution in #5815
@jaeiclee made their first contribution in #5353
@mervivian made their first contribution in #5877
@orbisai0security made their first contribution in #5866
@afsuyadi made their first contribution in #5860
@Rahul007007 made their first contribution in #5875
@ramankrishna made their first contribution in #5187

Full Changelog: v0.1.42-beta...v0.1.43-beta

Contributors

mvanhorn, jaeiclee, and 17 other contributors

Assets 2

1 Join discussion

An Update before Revamp!

shimmyshimmer released this 26 May 14:36

v0.1.42-beta

e57a1a7

Hey guys, we're doing one more-ish update before a major revamp which is likely coming this week or next week. Our revamp will change a lot of things, especially with new major features and a lot of design changes.

NEW: API calling support now with image generation + editing, proper web search, code execution, auto prompt caching. Connect OpenAI, Anthropic and more.
unsloth studio update should now work properly. For Mac users, use the install curl command instead: curl -fsSL https://unsloth.ai/install.sh | sh
Proper support for non-English languages e.g. Japanese, Chinese, Indian etc.

apicalling.unsloth.mp4

Many of you may have missed our previous release which only lasted for one day. We introduced:

Connect to external inference backends: vLLM, Ollama, llama-server
Security improvements

Auto MTP speculative decoding for MTP GGUFs; get the best settings customized for your hardware.

API provider calling & external connections

You can now connect Unsloth to any API cloud provider (OpenAI, Anthropic, OpenRouter etc.)
Built-in web search for OpenAI, Anthropic, OpenRouter and Kimi
Built-in code execution for OpenAI and Anthropic (Anthropic containers persist and are reused across turns)
Prompt caching is enabled for OpenAI and Anthropic models saving 50 to 90% of costs.
Image generation + editing
API key now optional for local providers (llama.cpp / vLLM / Ollama)
Auto-load models when adding a cloud provider

Other Unsloth Studio updates

OpenDocument chat attachments
o3 reasoning summary payload
Sending/prompting non-English languages (e.g. Japanese, Chinese) now works properly
IME composer hardening, RTL dir="auto", long log-line truncation fix
Tool reasoning trace rendering in UI
Fully offline support: cached GGUF discovery and offline DNS auto-detect for both inference and training

Unsloth Studio security improvements

Authentication rate-limiting, proxy-aware so reverse proxies don't bypass it
Sandboxed worker with a tightened blocklist (bash, hf upload, NOFILE)
Path containment so workers can't escape their in-flight tmp dirs
Strict schema validation across the Studio API
Tightened CSP / security headers (only legitimate favicon hosts allowed)
Removed the torch.load fallback on training_args.bin so untrusted pickles can never execute on model load
Hardened Tauri desktop release flow
Frontend auth: singleflight token refresh, current-password input on changes, working logout, shared 422 helper
Cancel cleanup now scoped strictly to in-flight tmp dirs so it can never delete user state

What's Changed

install: bump unsloth floor to >=2026.5.5 by @danielhanchen in #5621
Studio: persist chat toggles and preserve custom sampling by @Imagineer99 in #5587
Studio: add connections toggle and order hosted providers by @Imagineer99 in #5588
studio/ci: harden three pre-existing CI flakes by @danielhanchen in #5627
Studio: expand Connections model picker for local inference server by @Imagineer99 in #5643
Move uninstall scripts into scripts/ and fix references by @danielhanchen in #5644
Studio: provider model loading controls by @Imagineer99 in #5645
Studio: unify connection copy by @Imagineer99 in #5654
Remove dead chat suggestions wiring by @alkinun in #5665
Studio: Claude Code Anthropic API tool compatibility by @Imagineer99 in #5390
studio/frontend: friendlier 404 fallback for unknown routes by @danielhanchen in #5664
chore(deps): bump the npm-oxc-validator group across 1 directory with 2 updates by @dependabot[bot] in #5667
studio/chat: Disable send button for empty composer by @harryfrzz in #5647
studio/frontend: correct Think pill aria-label before model loads by @danielhanchen in #5655
studio/frontend: fix onboarding CSP violations by @danielhanchen in #5658
studio/frontend: set per-route document.title by @danielhanchen in #5660
Fix Windows Tauri build and signing by @wasimysaid in #5694
Respect GC for GRPO by @Datta0 in #5269
studio: unblock /load event loop on detect_audio_type (#5642, #5635) by @danielhanchen in #5669
studio: settle GPU VRAM after killing llama-server before the next reload by @danielhanchen in #5693
Studio: surface prompt-cache token counts in /v1/chat/completions usage chunk by @danielhanchen in #5670
Studio: per-model Anthropic server-side tool versions by @danielhanchen in #5679
Studio: support Anthropic 1h cache TTL via prompt_cache_ttl by @danielhanchen in #5685
Studio: wire OpenAI image_generation tool by @danielhanchen in #5688
Studio: per-session cost calculator + /api/providers/pricing endpoint by @danielhanchen in #5690
Studio: wire Anthropic web_fetch server-side tool by @danielhanchen in #5671
Studio: persist chat history in backend storage by @Imagineer99 in #5272
Studio: wire Anthropic server-side context compaction by @danielhanchen in #5686
Studio: wire OpenAI Responses server-side context compaction by @danielhanchen in #5687
Studio: PDF / document attachments for Anthropic + OpenAI by @danielhanchen in #5689
Studio: reconcile external providers across browsers after delete by @danielhanchen in #5698
Studio: persist external provider selection across page refresh by @danielhanchen in #5697
Studio: add Anthropic and OpenAI prompt guards for disabled tools by @Imagineer99 in #5674
Studio: persist external checkpoint when picker uses setParams by @danielhanchen in #5700
Studio: surface OpenAI image_generation as composer Images pill by @danielhanchen in #5699
Studio: expose Anthropic 5m vs 1h prompt cache TTL in Configuration by @danielhanchen in #5703
Fix connected chat model selection after refresh by @wasimysaid in #5702
Studio: render OpenAI image_generation results inline in chat by @danielhanchen in #5705
Truncate long code execution tool output by @wasimysaid in #5708
fix(gpt-oss): prefer flex attention over sdpa by @Datta0 in #5701
Bump install.sh / install.ps1 pin to unsloth>=2026.5.6 by @danielhanchen in #5716
ci: unblock Studio Windows + Linux + Mac smoke (supersedes #5733, #5734, #5738) by @danielhanchen in #5741
ci: broaden Linux + narrow Windows llama.cpp runtime patterns + trim #5741 comments by @danielhanchen in #5746
Lower default RL weight_decay from 0.01 to 0.001 for LoRA by @danielhanchen in #5747
Studio: strip orphan tool_call XML leaking into visible content by @danielhanchen in #5735
Bump install.sh / install.ps1 pin to unsloth>=2026.5.7 by @danielhanchen in #5753
Fix MLX Studio base model export save method by @Lyxot in #5727
fix(chat_templates): check find() return value before slicing on placeholders by @Ricardo-M-L in #5763
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #5773
Studio: stop seeded admin to cross-origin callers by @danielhanchen in #5739
Studio: surface external-provider cache hits and writes in context bar by @danielhanchen in #5736
Studio: Anthropic fast_mode toggle and streaming refusal handling by @danielhanchen in #5715
Studio: rewrite OpenAI Responses citation markers to markdown links by @danielhanchen in #5713
Studio: standalone Fetch pill for Anthropic web_fetch by @danielhanchen in https://github.com/unslothai/unslo...

Contributors

danielhanchen, dependabot, and 9 other contributors

Assets 2

0 Join discussion

MTP + Studio fixes

danielhanchen released this 19 May 14:49

v0.1.41-beta

735d26b

Lots of bug fixes, UI, UX fixes to Studio!

To get the latest updates do:

macOS, Linux, WSL:

curl -fsSL https://unsloth.ai/install.sh | sh

Windows:

irm https://unsloth.ai/install.ps1 | iex

Fixes

Fix unsloth studio update not working well
Fix getting stuck on reset-password page
More offline mode support
Improve MTP not being faster on Macs, CPUs and GPUs - now it's much better!
Fix Desktop Shortcut not working after update
Many many UI UX bug fixes

What's Changed in Unsloth

install scripts: bump unsloth pin to >=2026.5.3 by @danielhanchen in #5557
studio: engage draft-mtp on vision MTP GGUFs (drop incorrect vision gate) by @danielhanchen in #5560
install scripts: bump unsloth pin to >=2026.5.4 by @danielhanchen in #5566
Studio: derive Playwright default model expectation by @Imagineer99 in #5589
studio: read Playwright default model from defaults.py without importing it by @danielhanchen in #5595
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #5586
studio: fix toast close-button click and light-mode hover by @shimmyshimmer in #5597
fix(loader): honour HF_HUB_OFFLINE/TRANSFORMERS_OFFLINE env vars by @xodn348 in #5598
studio: emit one comma-chained --spec-type for CPU/Mac MTP path by @danielhanchen in #5575
Fix loss function not patched for Qwen3.5 models by @rycerzes in #5442
Fix GGUF multi-image chat handling by @alkinun in #5508
Expose standalone tool healing utilities by @alkinun in #5583
studio/frontend: reconcile stale must_change_password localStorage flag by @danielhanchen in #5576
studio/frontend: cap auto-load cascade attempts by @danielhanchen in #5578
studio: regenerate desktop launcher on unsloth studio update (macOS + Linux + Windows) by @danielhanchen in #5577
ci: advisory lockfile supply-chain audit (no install-script changes) by @danielhanchen in #5604
loader: import FORCE_FLOAT32 from unsloth_zoo (single source of truth) by @danielhanchen in #5610
fix(peft): expose finetune_last_n_layers for parity with mlx-lm CLI by @danielhanchen in #5564
studio: add --spec-draft-n-max toggle for MTP speculative decoding by @danielhanchen in #5582
studio: reserve VRAM headroom for the MTP draft cache in auto-fit by @danielhanchen in #5585
Studio: tools, thinking blocks, code execution and web search for safetensors by @danielhanchen in #5520
studio/frontend: widen settings sidebar so 'Connections' label fits by @danielhanchen in #5607
studio: respect prefers-reduced-motion across animations by @danielhanchen in #5611
studio/frontend: guard message-timing badge against unphysical tok/s by @danielhanchen in #5570
studio/web: differentiate offline from backend-down in fetch error by @danielhanchen in #5591
studio/frontend: keep theme classes mutually exclusive on by @danielhanchen in #5580
studio/frontend: show Loading fallback instead of blank pane on lazy route navigation by @danielhanchen in #5568
studio/frontend: include filename in attachment aria-label + img alt by @danielhanchen in #5594
studio/frontend: compare composer blocks send when no model picked by @danielhanchen in #5574
studio: restore focus to opener when settings dialog closes by @danielhanchen in #5612
studio/frontend: add aria-label to Dictate / Stop dictation buttons by @danielhanchen in #5599
studio/frontend: settings dialog fits viewport at tablet widths by @danielhanchen in #5600
studio/frontend: show Generation stopped placeholder when cancelled mid-thinking by @danielhanchen in #5565
studio: tool calling for Llama-3, Mistral, Gemma 4 on safetensors + MLX by @danielhanchen in #5615
Revert "studio: tool calling for Llama-3, Mistral, Gemma 4 on safetensors + MLX (#5615)" by @danielhanchen in #5619

What's Changed in Unsloth Zoo:

Honour offline env vars when enabling hf_transfer by @danielhanchen in unslothai/unsloth-zoo#675
Fix Studio q2_k_l GGUF export and new llama.cpp converter package layout by @danielhanchen in unslothai/unsloth-zoo#667
fix(mlx): match mlx-lm batch padding rule (1 + pad_to * ceil) by @danielhanchen in unslothai/unsloth-zoo#672
fix(mlx): expose finetune_last_n_layers for parity with mlx-lm CLI by @danielhanchen in unslothai/unsloth-zoo#669
fix(mlx): honor max_grad_value=None as a disable signal by @danielhanchen in unslothai/unsloth-zoo#671
fix(mlx): seed mx.random immediately before linear_to_lora_layers (re-PR of #674) by @danielhanchen in unslothai/unsloth-zoo#678
fix(mlx): warn on bf16 -> fp16 downcast in FastMLXModel loader by @danielhanchen in unslothai/unsloth-zoo#670
fix(mlx): make_baseline_loss_fn byte-identical to mlx-lm default_loss when labels=None by @danielhanchen in unslothai/unsloth-zoo#673
Fix Q2_K_L recipe: q2_k base + ffn_down=Q3_K, output=Q6_K, embed=Q4_K by @danielhanchen in unslothai/unsloth-zoo#677

Full Changelog: v0.1.40-beta...v0.1.41-beta

Contributors

danielhanchen, xodn348, and 5 other contributors

Assets 2

Qwen3.6 MTP and API / Connections

shimmyshimmer released this 18 May 13:40

v0.1.405-beta

4699c7e

We've got lots of new updates. Please use the latest Unsloth v0.1.405-beta, not v0.1.40-beta which is older.

~2x faster GGUF inference with automatically enabled MTP
API calling support for OpenAI, Anthropic etc. with auto prompt caching, web search, code execution
Connect to external inference backends: vLLM, Ollama, llama-server
Experimental MLX inference
Proper support for non-English languages
Security improvements

MTP speculative decoding support 1.4 to 2x faster inference!

Auto MTP speculative decoding for MTP GGUFs; warn when the bundled llama.cpp prebuilt is stale or too old for MTP
New pre-built llama.cpp binaries for MTP support!

API provider calling & external connections

You can now connect Unsloth to any API cloud provider (OpenAI, Anthropic, OpenRouter etc.)
Built-in web search for OpenAI, Anthropic, OpenRouter and Kimi
Built-in code execution for OpenAI and Anthropic (Anthropic containers persist and are reused across turns)
Prompt caching is enabled for OpenAI and Anthropic models saving 50 to 90% of costs.
API key now optional for local providers (llama.cpp / vLLM / Ollama)
Auto-load models when adding a cloud provider

MLX inference (Experimental)

MLX quants and models now can run locally on your Mac machines!
We'll be adding thinking, tools and web search soon!

Other Unsloth Studio updates

OpenDocument chat attachments
o3 reasoning summary payload
Sending/prompting non-English languages (e.g. Japanese, Chinese) now works properly
IME composer hardening, RTL dir="auto", long log-line truncation fix
Tool reasoning trace rendering in UI
Fully offline support: cached GGUF discovery and offline DNS auto-detect for both inference and training
Lots of UI/UX polish: dark theme refactor, right sidebar redesign, time-of-day sloth mascot, dismissable copyable toasts, larger chat composer, code-execution config polish, composer action pill styling, narrower Discord button

Training updates

Gemma attention mask fixes
Multi Image GRPO
GRPO hidden-state return experiments
New Continued Pretraining (CPT) training method as a first-class option
Gemma-4 MoE LoRA extractor registered to fix grouped_mm contraction crash
Opt-in fused lm_head + cross-entropy forward, with single-matmul path under UNSLOTH_RETURN_LOGITS=1
Pass batch size for eval
Eval/training paths now honour HF_DATASETS_OFFLINE alongside HF_HUB_OFFLINE

Unsloth Studio security improvements

Authentication rate-limiting, proxy-aware so reverse proxies don't bypass it
Sandboxed worker with a tightened blocklist (bash, hf upload, NOFILE)
Path containment so workers can't escape their in-flight tmp dirs
Strict schema validation across the Studio API
Tightened CSP / security headers (only legitimate favicon hosts allowed)
Removed the torch.load fallback on training_args.bin so untrusted pickles can never execute on model load
Hardened Tauri desktop release flow
Frontend auth: singleflight token refresh, current-password input on changes, working logout, shared 422 helper
Cancel cleanup now scoped strictly to in-flight tmp dirs so it can never delete user state

Bug fixes and correctness

Layout-aware MoE LoRA merge with loud-fail on fallback (no more silent wrong saves)
num_logits_to_keep regression fixed on transformers >= 4.52
Preserve tokenizer EOS token on merged saves
Resume PEFT checkpoints under sentence-transformers >= 5.4
Restore Flash > SDPA > Flex attention priority for non-Gemma3 models
ORPO text-only tokenization now works with processors
Embedding matrix size mismatch fix
Vicuna chat template fix
fast_generate unifies legacy and new logits kwargs (fixes Mistral merge site)
higher_precision_softmax made idempotent
Patch every LOSS_MAPPING key aliased to ForCausalLMLoss (covers transformers 5.x)
GGUF converter sibling imports fixed
UTF-8 encoding added to all text-mode file operations
Serialise GGUF reload and inherit unsloth-run extra args
Fix /recommended-folders 500 on unreadable model directories under Python 3.12+
Cross-family GGUF projector blocked in flat local dirs (no more wrong-vision-tower loads)

Installer and platform reliability

Custom install paths via STUDIO_HOME / UNSLOTH_STUDIO_HOME
CPU-only Linux x86_64 routed to ggml-org/llama.cpp prebuilts
Windows CUDA install fixes: paired cudart bundle and Torch NVIDIA DLL paths added to PATH
Skip flash-attn install on Blackwell GPUs (sm_100+)
Refresh Intel XPU extras for torch 2.7.1 / 2.9.1 / 2.10 / 2.11.0 / 2.12.0; torch upper cap raised to <2.13.0
HIP source builds on Ubuntu 24.04 now inject --gcc-install-dir
Linux prebuilt fixes for branch-based llama.cpp releases (mangled symlink repair, top-level dir strip)
New uninstallers for Linux, macOS (uninstall.sh) and Windows (uninstall.ps1)
Mac desktop shortcut spawning and lifecycle fixed
unsloth --version flag
Studio web update banner and release version display
GPU pinned at 95% headroom, with a warning on silent CPU fallback
Auto-install flash-linear-attention and tilelang for Qwen3.5 family

What's Changed in Unsloth

Bump installer floor to 2026.5.2 by @danielhanchen in #5297
install: support STUDIO_HOME / UNSLOTH_STUDIO_HOME for custom install paths by @danielhanchen in #5190
Route CPU-only Linux x86_64 to ggml-org/llama.cpp prebuilts by @danielhanchen in #5302
feat(studio): MLX training tab on Apple Silicon (LoRA / full FT, VLM, export) by @Manan17 in #5265
feat(studio): add Continued Pretraining (CPT) as a training method by @OnePunchMonk in #4677
Fix 14 stale tests under tests/studio/install/ that drifted from code by @danielhanchen in #5305
Add Studio PR-time CI: pin enforcement, frontend, backend, wheel smoke by @danielhanchen in #5298
Studio: restore Studio API and Help menu UI by @Imagineer99 in #5310
[studio]: Fix tool reasoning trace in UI by @CodeMan62 in #5314
fix: 3 patch_* helpers — fast_lora import, sft_trainer Union, openenv OSError by @danielhanchen in #5319
Studio: API settings overflow with long Colab URLs by @Imagineer99 in #5286
tests/studio/install: parallel UNSLOTH_STUDIO_HOME smoke test by @danielhanchen in #5306
Studio: Dark theme refactor, right sidebar redesign, and chat UI polish by @Imagineer99 in #5150
fix: harden Studio IME composer sends by @Etherll in #5327
Studio: stop truncating long log lines as suspected base64 by @rolandtannous in #5335
fix(gh_client): fail fast on 401/403 auth errors instead of retrying forever (#5325) by @Anai-Guo in #5329
fix: unblock 4 tests deselected/skipped in #5312 (real bugs) by @danielhanchen in #5359
fix(tests/sh): accept pinned tokenizers line after #5359 by @danielhanchen in #5361
CI: scope GITHUB_TOKEN permissions, add MLX CI, unblock ~60 skipped tests by @danielhanchen in #5312
studio/tests: make Playwright model-selector probe best-effort by @danielhanchen in #5371
Studio: download paired cudart bundle on Windows CUDA installs by @danielhanchen in #5322
Studio: add torch's pip nvidia DLL dirs to PATH on Windows by @danielhanchen in #5324
studio: authenticate HF downloads across Studio CI workflows by @danielhanchen in #5370
dependabot: group security updates and cover /studio/frontend npm advisories by @danielhanchen in #5372
Add Studio web update banner and release version display by @wasimysaid in #5308
ci/install: retry transient github.com 5xx on unsloth-zoo git fetches by @danielhanchen in #5389
studio/ci: pre-install lockfile supply-chain audit (npm + cargo) by @danielhanchen in #5392
studio/ci: npm tarball content scanner (no-install, hostile-input safe) by @danielhanchen in #5393
studio/tests: AbortSignal-bound in-page fetches and wall-clock watchdog for Playwright probes by @danielhanchen in #5391
chore: remove unused .semgrep/unsloth-rules.yml by @danielhanchen in #5395
studio/ci: sweep actions/cache v5 hardening across sibling smoke workflows by @danielhanchen in #5399
studio/ci: harden HF_HOME cache against actions/cache v5 silent restore failures by @danielhanchen in https:...

Contributors

melroy89, mmathew23, and 19 other contributors

Assets 2

0 Join discussion

New Unsloth API Inference Endpoint

shimmyshimmer released this 05 May 13:08

v0.1.39-beta

518782b

v0.1.39-beta bug fix
May 5th 2026 Fixes chat history not being shown (existing chat history is not lost) and attachments not attaching correctly. The bug was render-only - use 2026.5.2 or directly call curl -fsSL https://unsloth.ai/install.sh | sh or unsloth studio update to update

Run local LLMs with tools like Claude Code and Codex by connecting them to Unsloth’s API endpoint. This lets you run models like Qwen and Gemma locally, with additional features such as self-healing tool calling, code execution, and web search. Unsloth makes it easy to deploy a fast API inference endpoint that provides:

Self-healing tool calling, which helps reduce broken or malformed tool calls by 50%
Code execution support, allowing Bash and Python execution for more accurate code outputs.
Advanced Web search that visits and actually reads webpages to gather in-depth info.
Automatic inference settings for GGUF models (temp, top-k etc.)

Models loaded in Unsloth (including GGUFs) are exposed as an authenticated API via llama-server. A long API key is generated for security reasons like how OpenAI provides one. Your local models can then be used directly in your preferred AI agent, SDK, or chat client. Unsloth speaks two dialects on the same port:

Anthropic-compatible /v1/messages for Claude Code, OpenClaw, the Anthropic SDK, and any client that expects the Messages API.
OpenAI-compatible /v1/chat/completions and /v1/responses for the OpenAI SDK, OpenCode, Cursor, Continue, Cline, Open WebUI, SillyTavern, and any OpenAI-compatible tool.
Both support streaming, tool calling (OpenAI tools / Anthropic tools), and vision inputs.

New models

We've also got a handful of new models to run including NVIDIA Nemotron 3 Nano Omni, IBM Granite 4.1 and Mistral 3.5 Medium. We helped Mistral solve some issues with implementation in transformers and GGUFs.

Unsloth Updates

Stopped Studio training runs can now resume from checkpoints.
Chat threads now autosave and persist more reliably.
DPO training hangs in multi-process setups were fixed.
VLM GRPO support improved with MROPE updates.
Studio’s stop button now properly stops generation.
Fix chat template disappearing after browser refresh

What's Changed in Unsloth

Studio: use (gguf) context length before max seq length by @G07cha in #5111
chore: fix typo cleanup across tests and backend strings by @luojiyin1987 in #5152
fix: guard resolve_model_class fallback against unresolvable transformers AutoModel entries by @Etherll in #5155
Studio: kill in-flight llama-server before spawning a new one by @danielhanchen in #5171
Studio: stop currency escape from breaking inline LaTeX by @danielhanchen in #5170
Studio: probe AMD GPUs in llama-server VRAM detection by @danielhanchen in #5172
Studio: make stop button actually stop generation by @danielhanchen in #5069
Studio: add github_repo seed reader and GitHub Support Bot recipe by @danielhanchen in #5169
fix(studio): use endswith for mmproj F16 variant selection by @LeoBorcherding in #5184
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #5204
Fix Windows install when paths contain spaces or Python 3.14 is on PATH by @Etherll in #5201
Studio: Preserve transparency in uploaded profile avatars by @Imagineer99 in #5200
UX: single chat header error placement and selector alignment by @Imagineer99 in #5173
Studio: Refine chat preset and group built-in presets by @Imagineer99 in #5159
Studio: Fix image-only chat requests failing validation by @Imagineer99 in #5212
Studio: fix 7 failing studio_unit_tests on main by @danielhanchen in #5216
Patch checkpoint reload init functions to strip unsupported args by @Datta0 in #5167
Studio: Fix clipped model selector text descenders by @Imagineer99 in #5210
Fix DPO trainer multi process hang by @Datta0 in #5199
Studio: Pin assistant-ui core for fresh installs by @Imagineer99 in #5229
Fix local model scanner to handle ollama cloud models by @Anish9901 in #5220
Fix Studio desktop tray installer and titlebar and bux fixes by @wasimysaid in #5179
MROPE for VLM GRPO by @Datta0 in #5198
install: overlay unsloth-zoo from git main on --local by @rolandtannous in #5242
Studio: Fix chat template disappearing after browser refresh by @Imagineer99 in #5209
studio: add --local to setup.sh + overlay unsloth-zoo from git main by @rolandtannous in #5252
Fix/windowsprebuilt by @mmathew23 in #5241
Studio: Add dataset upload dropzone and update preserve think copy by @Imagineer99 in #5253
Add Qwen3.6 support by @rolandtannous in #5257
Studio: Chat thread autosave persistence by @Imagineer99 in #5256
Studio: Enable deleting fine-tuned chat models by @Imagineer99 in #5234
Studio: Add checkpoint resume for stopped training runs by @Imagineer99 in #5255
Studio: Polish spacing and profile input radius by @Imagineer99 in #5222
Fix check for libcurl headers in install.sh by @LFd3v in #5251
Default Studio host to 127.0.0.1 and prompt before auto-start by @rolandtannous in #5267
Studio: forward llama-server args from unsloth studio run , activate unsloth run , and allow passing model:quant to load models by @rolandtannous in #5271
Studio: Always show API usage examples and docs links by @Imagineer99 in #5270
Studio: Change API Keys settings to API Access by @Imagineer99 in #5268
unsloth run: add --enable-tools/--disable-tools server-side tool policy by @rolandtannous in #5277
fix: use % 8 instead of // 8 in FP8 weight shape check by @Ricardo-M-L in #5243
Pin Studio GGUF export to llama.cpp's local convert script by @mmathew23 in #5275
fix KVCache estimates for gemma4 style sliding window models by @Datta0 in #5225
Update VRAM estimator to cater to broader model configs by @Datta0 in #5175
Fix FastSentenceTransformer loading with newer sentence-transformers by @Etherll in #5259
Studio: Preserve chat history during autosave by @Imagineer99 in #5278

What's changed in Unsloth-Zoo

Fix fused CE grad scaling under DDP by @danielhanchen in unslothai/unsloth-zoo#434
Fused CE backward: guard scaling=0, drop tensor path, use out-of-place mul by @mmathew23 in unslothai/unsloth-zoo#610
Fix/gemma4moefix by @mmathew23 in unslothai/unsloth-zoo#612
MROPE for VLM GRPO by @Datta0 in unslothai/unsloth-zoo#614
Double-buffer GPU activations for overlapping H2D copy with backward compute by @ruixiang63 in unslothai/unsloth-zoo#534
fix(temporary_patches/utils): add missing comma in all (raise_error / Unpack) by @Anai-Guo in unslothai/unsloth-zoo#617
Fix qwen lora extractor for diff peft versions by @Datta0 in unslothai/unsloth-zoo#618
fix: use backend device type in GGUF merge path by @andomeder in unslothai/unsloth-zoo#615
Add unsloth_compiled_cache to gitignore by @Datta0 in unslothai/unsloth-zoo#622
Allow local convert_hf_to_gguf.py via UNSLOTH_LLAMA_CPP_SCRIPTS_DIR by @mmathew23 in https://github.com/unslothai/unsloth-zoo/pull...

Contributors

luojiyin1987, G07cha, and 15 other contributors

Assets 2

0 Join discussion

New UI Redesign + Qwen3.6

shimmyshimmer released this 23 Apr 14:33

v0.1.37-beta

5c473fa

Hey guys, we revamped the entire Unsloth Studio UI and UX experience to put an emphasis on chat and training:

Added a collapsible sidebar based on community feedback
You can now delete chats and search past conversations
New Preserve Thinking toggle for models that support it like Qwen3.6
Cleaner, more consistent design with easier navigation
Expanded Settings page with options to change your profile picture, name, and more
No more entering your Hugging Face token twice
gpt-oss now has low, medium and high thinking toggles.
Now uses latest llama.cpp prebuilt, even on Linux CUDA
Lots of bug, consistency and stability fixes
Kimi-K2.6 can now be run!
We also added experimental API support. Guides, announcement etc will come next week.

Qwen3.6 was also also previously already supported in Unsloth Studio for running and training. You can train and run Qwen3.6-27B right now!

What's Changed

Only run ldconfig CUDA-linking recovery when we have permission by @danielhanchen in #4930
Fix Mistral DPO/preference training crash on non-xformers platforms (e.g. Intel XPU) by @cheehook in #4889
Fix raw text paragraph break normalization by @kiankyars in #4884
Studio: keep chat input visible and fix compare pane clipping by @Imagineer99 in #4924
fix: check find() return value before adding offset in try_fix_tokenizer by @Ricardo-M-L in #4923
updated models template mappers. added lfm2.5vl450m to transformers 5… by @rolandtannous in #4939
Revert "updated models template mappers. added lfm2.5vl450m to transformers 5…" by @rolandtannous in #4945
Add AMD ROCm/HIP support across installer and hardware detection by @danielhanchen in #4720
Pin bitsandbytes to continuous-release_main on ROCm (4-bit decode fix) by @danielhanchen in #4954
Fix Gemma-4 GRPO catastrophic KL divergence with TRL 1.0.0+ by @danielhanchen in #4934
Add ROCm test suite (companion to #4720) by @danielhanchen in #4824
updating gemma4 script by @Manan17 in #4992
Move gemma4 script by @Manan17 in #4994
studio: fix route transition DOM duplication via AnimatePresence mode="wait" by @AdamPlatin123 in #4987
Studio: Prompt manager, message deletion, and chat UI improvements by @Imagineer99 in #4938
Pin kernels==0.12.1 to fix training import failure by @rolandtannous in #5000
Studio: Expose openai and anthropic compatible external API end points by @danielhanchen in #4956
studio: skip training status/metrics polling when idle by @AdamPlatin123 in #4988
studio: fix api-keys access + refresh by @wasimysaid in #5005
Studio: Polish API key copy button and harden async clipboard fallback by @Imagineer99 in #5006
fix(studio): default chart view to full training history by @Barath19 in #5007
[Studio] Show non exported models in chat UI by @Datta0 in #4892
[Studio] Install flash attn at setup time for linux by @Datta0 in #4979
fix(studio): remove 300s cap on load_checkpoint (inherits 3600s default) by @TF-MTGE in #4922
Studio: honor explicit GGUF ctx and default to 4096 when weights exceed VRAM by @danielhanchen in #5011
Studio: make GGUF disk-space preflight cache-aware by @danielhanchen in #5012
Studio: anchor ctx-slider warning threshold at 4096 when weights exceed VRAM by @danielhanchen in #5014
studio: show HF model download progress in training start overlay by @danielhanchen in #4894
studio: stream export worker output into the export dialog by @danielhanchen in #4897
Fix num_items_in_batch GA for Gemma4 by @Datta0 in #4998
studio: pin peft to 0.18.1 to fix export subprocess issues by @rolandtannous in #5015
Studio: live model-load progress + rate/ETA on download and load by @danielhanchen in #5017
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #5004
Fix bitsandbytes ROCm install by using pip instead of uv by @edamamez in #4966
Studio: split model-load progress label across two rows by @danielhanchen in #5020
Studio: hard-stop at n_ctx with a 'Context limit reached' toast by @danielhanchen in #5021
[moe][gemma4] Target MoE for gemma4 by @Datta0 in #4913
Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var by @rolandtannous in #5024
Studio: support GGUF variant selection for non-suffixed repos by @Imagineer99 in #5023
fix: prevent offline freeze by fixing stats retry and forwarding local_files_only by @DavidSolanas in #5016
Respect classification head skip list on pre-quantized 4-bit checkpoints (#5027) by @danielhanchen in #5034
fix(rocm): tighten gfx regex to ignore generic ISA lines by @danielhanchen in #5033
Fix grad-accum accepts_loss_kwargs detection for vision wrappers by @danielhanchen in #5036
grpo_compute_loss_slow called with wrong positional args by @jonahsamost in #4887
Gate trl disable_gradient_checkpointing warning on UNSLOTH_ENABLE_LOGGING by @danielhanchen in #5038
Studio: refresh Downloaded GGUF list and recurse into variant subdirs by @danielhanchen in #5032
feat: Add support for OLMo-3 model by @OnePunchMonk in #4678
feat: Add cactus QAT scheme support by @OnePunchMonk in #4679
Re-apply #4939: updated models template mappers by @rolandtannous in #4950
Studio: add folder browser modal for Custom Folders by @danielhanchen in #5035
Bump Studio installer minimum to 2026.4.5 by @danielhanchen in #5041
fix Gemma4 flash attn disable by @mmathew23 in #5045
BUG: fix _fix_chat_template for ChatML templates missing add_generation_prompt (#4150) by @kimimgo in #4426
fix: use direct registry API for PATH writes instead of SetEnvironmentVariable by @Etherll in #4961
Chat-template repair: warn-by-default, AST classification, dict support by @danielhanchen in #5049
Restrict flash attn to <=256 head dim. Consolidate attn impl checks by @Datta0 in #5051
Remove legacy venv Scripts entry from User PATH on upgrade by @danielhanchen in #5060
Fix review findings for chat-template repair (#5049) by @danielhanchen in #5056
Studio: Ollama support, recommended folders, Custom Folders UX polish by @danielhanchen in #5050
feat(studio): replace navbar with collapsible sidebar by @wasimysaid in #4936
fix audio dataset preview and finetuning by @CodeMan62 in #5043
Chat first onboarding by @wasimysaid in #5063
Fix onboarding followups by @wasimysaid in #5064
Studio: Default Gemma fallback for chat + AI assist by @Imagineer99 in #5066
fix: multi-GPU inference crash for bnb 4-bit/8-bit models by @danielhanchen in #5068
Add Qwen3.6 inference defaults for Studio by @danielhanchen in #5065
Add qwen3.6 script by @Manan17 in #5084
Studio: forward standard OpenAI tools / tool_choice to llama-server by @rolandtannou...

Contributors

G07cha, mmathew23, and 21 other contributors

Assets 2

1 Join discussion

Uh oh!

Uh oh!

Releases: unslothai/unsloth

Release list

GLM 5.2 + Model Hub + 3x longer contexts

Better context length algorithm

Chat Canvas, Forking & Queueing

Hub (Redesigned)

Models & Inference

Security & Cloudflare Encrypted Studios

Logging and API

Hardware & Backend

Training & General Fixes & Parallel Modules

What's Changed

Contributors

Uh oh!

DiffusionGemma + Gemma 4 MTP

DiffusionGemma + Gemma 4 MTP + Audio

Hub + Download Manager (Experimental)

Chat with Files / RAG (Experimental)

New Update Button + Hardware Support

Local Chat, Tools & API Compatibility

Tool Calling, MCP, Encrypted Cloudflare Tunnels

Training & General Fixes

To update Unsloth or install a new Unsloth Studio, you must use:

What's Changed

Contributors

Uh oh!

Gemma 4 MTP + Bug Fixes

Bug Fixes and more cross platform support

To update Unsloth or install a new Unsloth Studio, you must use:

What's Changed

Contributors

Uh oh!

Gemma 4 12B, New UI, MCP, Projects

To update Unsloth or install a new Unsloth Studio, you must use:

Gemma 4 12B

MCP

New Chat UI

Projects

Experimental Canvas / Artifacts

Install, Runtime & Hardware

Other Studio improvements

What's Changed

Contributors

Uh oh!

CUDA 13.3, Windows, Mac update

To update Unsloth or install a new Unsloth Studio, you must use:

macOS, Linux, WSL:

Windows:

Mac Updates

Windows Updates

CUDA 13.3 Update

Blackwell GPUs Update

What's Changed

New Contributors

Contributors

Uh oh!

An Update before Revamp!

API provider calling & external connections

Other Unsloth Studio updates

Unsloth Studio security improvements

What's Changed

Contributors

Uh oh!

MTP + Studio fixes

Lots of bug fixes, UI, UX fixes to Studio!

macOS, Linux, WSL:

Windows:

Fixes

What's Changed in Unsloth

What's Changed in Unsloth Zoo:

Contributors

Uh oh!

Qwen3.6 MTP and API / Connections

MTP speculative decoding support 1.4 to 2x faster inference!

API provider calling & external connections

MLX inference (Experimental)

Other Unsloth Studio updates

Training updates