Skip to content

Releases: unslothai/unsloth

GLM 5.2 + Model Hub + 3x longer contexts

Choose a tag to compare

@danielhanchen danielhanchen released this 18 Jun 17:36
7ecbf5a

GLM-5.2 is now supported in Unsloth Studio! All reasoning levels supported. 3x longer context lengths are now achievable with our new auto fit algorithm with MTP, allowing longer chats. Bypass permissions mode, forkable chats, queue-able chats, a new hub for model discovery, parallel modules + HTTPS Cloudflare support and more! Use unsloth studio --secure for secure HTTPS global access! Read our GLM-5.2 guide

Screenshot 2026-06-18 at 10-35-59 Chat - Unsloth Studio

To update Unsloth or install a new Unsloth Studio, you must use the below.
Ensure your version is 2026.6.9 or v0.1.471-beta for the latest.

MacOS, Linux, WSL:

curl -fsSL https://unsloth.ai/install.sh | sh

Windows:

irm https://unsloth.ai/install.ps1 | iex

Better context length algorithm

As per #6312 and #6447, we made Unsloth Studio's determination of memory usage and context length much better, achieving 3x longer context overall:

scenario KV before after
1x 32GB pipeline (~31 GB free) f16 23,040 64,000
  q8_0 43,520 114,944
  q4_0 82,432 199,680
2x 32GB pipeline any 262,144 262,144
2x 24GB tensor (~23 GB free) f16 134,049 262,144
  q8_0 252,329 262,144

Chat Canvas, Forking & Queueing

  • Edit assistant messages in place and re-run from any point in the thread.
  • Fork a thread to branch a conversation without losing the original.
  • Temporary (incognito) chats that leave nothing behind.
  • Queue new prompts while a generation is still running instead of waiting.
  • Chat "artifacts" are now canvas, with inline HTML canvas cards that auto-render, a Code view, and DiffusionGemma keeps its raw code visible inline instead of collapsing.
  • Chat search now covers every message and surfaces your own messages first.

Hub (Redesigned)

  • Full-page Hub with a trending feed, search, and custom model paths support.
  • README preview in a split-view feed so you can read before you download.
  • Downloads default to the faster Xet transport, with automatic HTTP fallback if a transfer stalls.
  • New "Load on selection" toggle to set load options before a model loads.
  • Google logo shown for DiffusionGemma and future Gemma derivatives.

Models & Inference

  • DeepSeek-OCR and more vision models now load and run without errors.
  • Fixed fast inference on the latest vLLM (0.22+) so speed-ups work again.
  • Tensor parallelism is more reliable: if the faster MTP path fails, it now recovers on its own instead of crashing.
  • DiffusionGemma now shows the image forming live as it denoises, with accurate speed stats.

Security & Cloudflare Encrypted Studios

  • New --secure Cloudflare-only mode for end-to-end encrypted studios, with server-side tools staying enabled under --secure. Use unsloth studio --secure!
  • Bypass Permissions mode to skip confirmations and disable the tool sandbox when you want it.
  • Auto detect Hugging Face Virus scanning + dangerous files in repos.

Logging and API

  • New API server monitor in Studio.
  • Faster API calling and less latency
  • Much better streamlined logs - now with throughput and latency and removed a lot of bloated logs.

Hardware & Backend

  • Better support for Blackwell RTX 50X and 60X GPUs
  • Fix silent downgrading to CPU and not GPU
  • torchao version is now selected from the installed torch.
  • Installer now auto-repairs a broken or CPU-only PyTorch install and warns on silent CPU fallback, across NVIDIA + AMD on Win/Linux/Mac/WSL.
  • Frees the chat model's VRAM when training starts, but only when the GPU is actually tight (no needless reloads otherwise).
  • If llama-server hard-crashes at startup, Studio now steps through a recovery ladder instead of just failing.

Training & General Fixes & Parallel Modules

  • MLX training updates.
  • Improved GRPO training reliability with vLLM.
  • Training startup made more reliable, with clearer errors for invalid VLM batches.
  • Studio now cleans up leftover backend processes more reliably after crashes, restarts, or interrupted shutdowns.
  • Export, Chat, Training, Recipes are all individualized / compartmentalized! This means you can do all 4 in parallel now! You can chat / do inference while you wait for a training run or an export!

What's Changed

Read more

DiffusionGemma + Gemma 4 MTP

Choose a tag to compare

@shimmyshimmer shimmyshimmer released this 12 Jun 13:57

We've merged over 150 PRs this week so lots of new updates, a new model Hub and look! Ensure you install the latest v0.1.464-beta or 2026.6.7. DiffusionGemma, Gemma 4 MTP and MiniMax-M3 are all now supported.

DiffusionGemma + Gemma 4 MTP + Audio

  • Run and train DiffusionGemma via Unsloth Studio. Install the latest v0.1.462-beta if DiffusionGemma wasn't previously working.
  • Gemma 4 MTP is here! Run Gemma 4 around 2x faster with MTP - MTP is auto enabled in Unsloth Studio.
  • Audio chat is now supported for Gemma 4 (wav, mp3, m4a,flac, webm).
  • Preserve Thinking added to Gemma 4.
diffusiongemma

Hub + Download Manager (Experimental)

  • Added a new Hub page for browsing, downloading, and managing Hugging Face models and datasets.
  • Unsloth can now detect models and datasets already on your machine and show them alongside downloaded assets.
  • Downloaded GGUF models now have direct Run / New Chat actions.

Chat with Files / RAG (Experimental)

  • Added Chat with Files in Studio, letting you ask questions over your own documents and knowledge bases.
  • Supports hybrid search, citations, PDF previews, per-thread documents, and a built-in search_knowledge_base tool.

New Update Button + Hardware Support

Local Chat, Tools & API Compatibility

  • Local tool calling is more reliable, with better ordering of tool cards, fewer duplicate tool loops, and support for tool use with GGUF vision models.
  • Improved OpenAI-compatible API and Anthropic-compatible API behavior for local Studio servers, including better errors, token usage, stop reasons, and Claude Code compatibility.

Tool Calling, MCP, Encrypted Cloudflare Tunnels

  • Bypass Permissions, Tool Call Permissions (Approve, Always Approve, Deny)
  • 50% to 90% less tool call nudging issues without any accuracy loss
  • MCP, Artifacts are now select-able
  • Tensor parallelism is now enabled for GGUFs - get +30% throughput!
  • Cloudflare HTTPS free tunnels is now added allowing for end to end encrypted studios!

Training & General Fixes

  • Improved MLX support with better model labels, generation speed stats, and fixes for VLM training.
  • Fixed several training and dataset edge cases, including non-writable Hugging Face caches and custom dataset mappings.
  • Added many UI polish fixes across chat, menus, model picker, dark mode, import/export, and settings.

To update Unsloth or install a new Unsloth Studio, you must use:

macOS, Linux, WSL:

curl -fsSL https://unsloth.ai/install.sh | sh

Windows:

irm https://unsloth.ai/install.ps1 | iex

Warning

DO NOT USE unsloth studio update since packaging will not get the latest updates

What's Changed

  • Studio: llama.cpp update banner redesign, About tab license info, UI polish by @shimmyshimmer in #6196
  • Bump install.sh / install.ps1 pin to unsloth>=2026.6.3 by @danielhanchen in #6212
  • Expose runtime context length for hub models by @alkinun in #6154
  • Studio: fix llama.cpp update banner offering a downgrade / sticking on mix releases by @oobabooga in #6219
  • Fix kwarg spacing in training files to satisfy pre-commit by @shimmyshimmer in #6209
  • Studio: reword the Cloudflare line when the public probe fails by @danielhanchen in #6217
  • fix: deduplicate lemonade ROCm prebuilt selection log by @LeoBorcherding in #6021
  • Stop false RoPE 'default' warning and fix rope drift gate on transformers 5 by @danielhanchen in #6223
  • fix(studio): load run.py by path for editable installs by @jimdawdy-hub in #5909
  • fix(studio): inherit llama_extra_args and honor --no-mmproj by @jimdawdy-hub in #5902
  • fix(studio): adopt server-loaded model before chat auto-load by @jimdawdy-hub in #5900
  • Fix stale sidebar regression test to match the gap-px markup by @danielhanchen in #6232
  • Studio: gate the staged prebuilt runtime validation behind a flag (off by default) by @danielhanchen in #6216
  • Fix FastModel config passthrough for sequence classification by @alkinun in #6203
  • fix: decode subprocess output as UTF-8 in save.py on Windows by @dylanschroers in #6218
  • patch: fix EmptyLogits gathering in nested payloads and Accelerate recursively_apply by @MdHussain121 in #6092
  • Studio: show Apple GPU temperature and power in the GPU monitor (macOS) by @Ban921 in #6187
  • Studio: Add inline confirmation (Allow/Always allow/Deny) for tool calls by @oobabooga in #5869
  • Studio: guard Apple GPU power against negative counter-reset readings by @danielhanchen in #6235
  • Fix step count mismatch when sequence packing is enabled by @IrakliXYZ in #5967
  • fix/uv-bytecode-timeout by @alkinun in #6166
  • Studio: tune llama.cpp env for data-center GPUs by @danielhanchen in #6098
  • Studio: drop the on-disk freshness cache after a llama.cpp update by @danielhanchen in #6234
  • Add missing RAG deps to no-torch Studio runtime requirements by @danielhanchen in #6236
  • Studio: rounded rectangle hover states for menu items instead of pills by @shimmyshimmer in #6210
  • docs: repository cleanup by @Agnibha007 in #5617
  • Run cross-platform parity test on Windows and macOS in CI by @danielhanchen in #6241
  • chore(studio/frontend): normalize line endings to LF by @danielhanchen in #6012
  • fix: respect absolute export paths to prevent cross-drive copy failures (WinError 112) by @anmolxlight in #6088
  • Studio: Add Tensor-Parallel llama.cpp support by @oobabooga in #6040
  • Studio: Add custom provider option to Connections by @Imagineer99 in #6112
  • Studio: model selector and settings polish by @shimmyshimmer in #6240
  • Studio: login card polish and sidebar label alignment by @shimmyshimmer in #6242
  • Studio: pinnable plus menu items and saved prompt pins by @shimmyshimmer in #6237
  • Studio: bottom update banners, smooth llama.cpp progress, re-prompt after copy by @shimmyshimmer in #6233
  • fix(studio/responses): forward chat_template_kwargs enable_thinking to chat request by @Anai-Guo in #6202
  • Studio: fix WSL Strix Halo GPU on reinstall (ROCDXG drop-in + system HIP before bundle) by @danielhanchen in #6227
  • Studio: fully rounded Hub pills and refreshed menu icons by @shimmyshimmer in #6248
  • Studio: use px-2.5 for Hub option menu padding by @shimmyshimmer in #6249
  • Studio: fix Downloaded model list disappearing and order it by last download by @danielhanchen in #6247
  • Studio: new-chat shortcut, composer draft autosave, archive threads by @NilayYadav in #5771
  • Studio: persist speculative decoding preference across restart and model switch by @oobabooga in #6169
  • Studio: refine menu chevron, tick icon, and one-line plus-menu shape by @shimmyshimmer in #6251
  • Studio: serve D...
Read more

Gemma 4 MTP + Bug Fixes

Choose a tag to compare

@danielhanchen danielhanchen released this 10 Jun 18:20

Bug Fixes and more cross platform support

To update Unsloth or install a new Unsloth Studio, you must use:

macOS, Linux, WSL:

curl -fsSL https://unsloth.ai/install.sh | sh

Windows:

irm https://unsloth.ai/install.ps1 | iex

Warning

DO NOT USE unsloth studio update since packaging will not get the latest updates

What's Changed

  • Bump install.sh / install.ps1 pin to unsloth>=2026.6.1 by @danielhanchen in #5977
  • Port KTO logps truncation guard to TRL 1.x _compute_logps refactor by @danielhanchen in #5996
  • CI: track deepseek_ocr2 compile timeout in known-broken list by @danielhanchen in #5995
  • fix(studio): disable mlx gc for none by @Lyxot in #5991
  • Normalize shell scripts to LF in .gitattributes by @danielhanchen in #5997
  • Studio: enable audio input for Gemma 4 GGUFs; default chat model to Qwen3.5-4B-MTP by @danielhanchen in #6000
  • Fix chat text cutoff at composer dock and speed up plus icon spin by @shimmyshimmer in #5989
  • Studio: refine tool call and reasoning trigger UI by @shimmyshimmer in #5873
  • fix: warn when localhost resolves to ::1 but Studio is bound only to 127.0.0.1 by @mvanhorn in #5994
  • fix: persist Studio thread synchronously on first runStart so mid-stream refresh keeps the prompt by @mvanhorn in #5814
  • Studio: enable GGUF tools with vision inputs by @Imagineer99 in #6009
  • Studio: accept system-role messages in Claude Code requests by @Imagineer99 in #6006
  • Studio: fix load_freeze audio-type tests for #6000's Gemma 4 <|audio|> probe by @danielhanchen in #6018
  • Studio: fix chat preset persistence with fast mode by @Imagineer99 in #5870
  • Studio: fix Repo tests (CPU) by stopping the ROCm test from leaking a fake utils into sys.modules by @danielhanchen in #6027
  • Studio: stop ROCm amd-smi tests leaking a fake loggers into sys.modules (follow-up to #6027) by @danielhanchen in #6055
  • Studio: emit usage and timings for MLX generation speed stats by @shimmyshimmer in #6068
  • Studio: tag MLX loaded models as MLX instead of Base in chat by @shimmyshimmer in #6067
  • Studio: remove red border on chat error messages by @shimmyshimmer in #6063
  • Studio: keep chat in place when composer attachments resize it by @shimmyshimmer in #6070
  • CI: allowlist deepseek_ocr2 in compiler full-model-sweep by @danielhanchen in #6085
  • Restore KTO logps truncation guard for TRL (re-apply dropped #5996) by @danielhanchen in #6086
  • Studio: stop leaking internal exceptions to API clients; harden sandbox path by @danielhanchen in #6072
  • Formatting: ruff line-length 100 + drop blank after short local imports by @danielhanchen in #6079
  • Fix MoE LoRA target parameter handling by @Datta0 in #5345
  • Refactor VLM detection in studio by @Datta0 in #5245
  • qwen 3.5 export fixes by @Datta0 in #5992
  • [fix] Nvfp4 load by @Datta0 in #6087
  • Studio: open the MCP dialog to the server list by @oobabooga in #6100
  • Studio: make code comments and docstrings more succinct by @danielhanchen in #6029
  • Reduce and tighten code comments and docstrings repo-wide by @danielhanchen in #6095
  • Studio frontend: reduce and tighten code comments by @danielhanchen in #6099
  • feat(studio): Hub + Download Manager by @Sneakr in #5916
  • Studio fix recipe dataset preview by @wasimysaid in #6031
  • Studio: make Helper LLM startup pre-cache opt in by @wasimysaid in #6113
  • Improve local chat tool call flow by @wasimysaid in #5962
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #6104
  • Studio: improve OpenAI- and Anthropic-compatible API spec compliance by @oobabooga in #6010
  • Studio: follow-up fix for GGUF developer prompts by @wasimysaid in #6115
  • Studio: stop the providers dialog from resetting custom provider form state by @NilayYadav in #6051
  • Studio: clean-room compact RAG (knowledge bases, hybrid search, fast indexing) by @danielhanchen in #5910
  • feat: support text-only loading of Gemma 3 27B via FastLanguageModel (skip SiglipVisionModel) by @mvanhorn in #5816
  • studio(ui): use the --primary brand token for the avatar fallback color by @danielhanchen in #5987
  • fix(studio): block arbitrary external image URLs in markdown renderer by @rsd-darshan in #5602
  • Studio: center account avatar vertically in sidebar footer pill by @shimmyshimmer in #6026
  • fix: clearer Studio setup error when GPU driver is too old for the installed CUDA toolkit by @mvanhorn in #5993
  • Studio: npm v12 readiness (allowScripts policy, npmrc cleanup, bun bootstrap fix) by @danielhanchen in #6128
  • Studio: faithful conversation export and import round trips (ShareGPT system role, CSV quoted newlines) by @danielhanchen in #6131
  • Studio: auto-sync allowScripts pins after dependency bumps by @danielhanchen in #6136
  • Studio: unify shadows, backgrounds and dark mode consistency in chat UI by @shimmyshimmer in #6116
  • Windows/WSL installer: fix winget msstore cert failure, amd-smi DiskPart prompt, and enable AMD GPU (Strix Halo gfx1151) by @danielhanchen in #5940
  • Studio: bulk export and import in Settings Chat Data, MCP pill off switch by @danielhanchen in #6141
  • Studio: fix nested dropdown submenus clipped by the menu alignment nudge by @danielhanchen in #6143
  • Studio: declutter the chat plus menu and make RAG session-only with pre-select by @danielhanchen in #6140
  • fix/validate dataset video paths before training by @LeoBorcherding in #5136
  • Restore config use_cache in for_inference after gradient checkpointing prep by @danielhanchen in #6137
  • fix: persist Windows ROCm BNB version by @Peter7896 in #6048
  • Studio GGUF CI: fix deterministic JSON-mode failure by bumping the quant, keep hard asserts by @danielhanchen in #6138
  • Frontend CI: hard-fail unreviewed npm install scripts (--strict-allow-scripts) by @danielhanchen in #6139
  • Studio UI polish: search dialog shadow, model picker pills, sidebar spacing, white Hub background by @shimmyshimmer in #6147
  • Run the HF cache redirect before import fixes that can freeze Hub constants by @danielhanchen in #6150
  • Fix Studio MLX VLM resized image layout by @Lyxot in #6019
  • fix(studio): reject unsupported MLX CPT and embedding training by @Lyxot in #6091
  • fix(studio): forward custom dataset mappings in MLX training by @Lyxot in #6094
  • Tests + CI guard: batched left-padded generation can never silently regress again (#1066, #3699) by @danielhanchen in #6145
  • Tests: follow Compare chat into the More submenu in the Playwright chat UI driver by @danielhanchen in #6153
  • Fix UnboundLocalError in ROCm version detection dpkg/rpm fallback by @danielhanchen in #6149
  • Auto-set BNB_ROCM_VERSION from the installed wheel on Windows + ROCm by @danielhanchen in #5986
  • Call patch_compiling_bitsandbytes on the FastLanguageModel path by @danielhanchen in #6144
  • Studio: mascot images degrade gracefully instead of showing alt text by @danielhanchen in #6146
  • Studio: training survives a non-writable HF datasets cache by @danielhanchen in #6148
  • Studio: fix Gemma-4-12B-it not loading by @Imagineer99 in https://github.com/uns...
Read more

Gemma 4 12B, New UI, MCP, Projects

Choose a tag to compare

@danielhanchen danielhanchen released this 03 Jun 14:35
c7d2ed1

Hey everyone, this update focuses mainly on MCP, Projects, Canvas and the new chat UI.
We've also made many improvements across Studio. Next week we'll have an even bigger update.
If Gemma 4 12B isn't working for you, please re-update Unsloth!

unsloth studio new ui

To update Unsloth or install a new Unsloth Studio, you must use:

macOS, Linux, WSL:

curl -fsSL https://unsloth.ai/install.sh | sh

Windows:

irm https://unsloth.ai/install.ps1 | iex

Warning

DO NOT USE unsloth studio update since packaging will not get the latest updates

Gemma 4 12B

Google releases Gemma 4 12B, a new model that runs locally on 8GB RAM. GGUF / Guide
Gemma 4 12B Unified supports image, audio and 256K context. Run and train the model via Unsloth Studio.

MCP

  • Give your model live tools instead of relying on memory one click in the composer, no API keys for the built-ins
  • Built-in presets:
    • Context7 current docs & code for thousands of libraries
    • Exa live web search
    • Hugging Face search models, datasets & papers
  • Add your own remote (OAuth/headers) or local (stdio) servers, toggled per chat

New Chat UI

  • Projects, Canvas, MCP, and Compare tuck into one + menu
  • Search and Code are now one click away

Projects

  • Keep related chats together in one workspace
  • Create a project from the sidebar, then add new or existing chats to it

Experimental Canvas / Artifacts

  • Opens generated HTML in a dedicated canvas panel inside Unsloth Studio
  • Supports interactive outputs, including browser based visualizations and CDN-loaded packages
  • Lets you switch between rendered preview and source code

Install, Runtime & Hardware

  • CUDA / Windows
    • CUDA 13.3 llama.cpp binaries now work on Windows (and other non-Linux) and fix the CUDA 13.2 gibberish-output bug while default still pins to CUDA 13.1 for now
    • On CUDA 13.2, 13.1 and below, Windows falls back to CUDA 12.4 and native 13.1 binaries coming soon
    • Windows prebuilt installs no longer block on the early CUDA Toolkit check
  • Linux / GPU
    • Linux llama.cpp prebuilts now match your runtime's cudart major version
    • Prebuilt coverage for Blackwell (with a CUDA 13.0 driver fallback) and B300 (sm_103)
    • ARM64 Linux now source-builds on GPU hosts, with a CPU prebuilt fallback
    • ROCm: detected AMD gfx arch is forwarded to the prebuilt installer (with a setup.sh fallback)
  • macOS
    • Fixed Apple Silicon installs that were resolving torch against x86_64

Other Studio improvements

  • Connected models now work in Compare mode
  • Smoother streaming which now renders batched to one per animation frame
  • Larger upload limits for training datasets and recipe
  • Window size and maximized state persist across launches
  • Chat search hides non-matching threads
  • Model loading handles mid-refresh cancellation cleanly
  • Cleaner rendering for generated image frames and Python tool code blocks

What's Changed

Full Changelog: v0.1.43-beta...v0.1.44-beta

CUDA 13.3, Windows, Mac update

Choose a tag to compare

@danielhanchen danielhanchen released this 31 May 14:06

To update Unsloth or install a new Unsloth Studio, you must use:

macOS, Linux, WSL:

curl -fsSL https://unsloth.ai/install.sh | sh

Windows:

irm https://unsloth.ai/install.ps1 | iex

Warning

DO NOT USE unsloth studio update since packaging will not get the latest updates

Mac Updates

  • Re-enabled llama.cpp prebuilt binaries for Apple Silicon (M1-M4) - Mac OS 14 / 15 / 26 (Tahoe)
  • Apple Silicon Mac OS 13 (Ventura) is source build
  • Intel (x86_64) for Mac OS 13.3 / 14 / 15 / 26 (Tahoe) uses llama.cpp prebuilt binaries
  • Intel for Max 13.0 - 13.2 is source build

Windows Updates

  • CUDA 13.3 llama.cpp prebuilt binaries now work for Windows
  • For CUDA 13.2, CUDA 13.1 and below, Windows devices uses CUDA 12.4 fallback - we'll work on CUDA 13.1 binaries soon.

CUDA 13.3 Update

  • CUDA 13.3 non Linux binaries work. We'll still use CUDA 13.1 for now
  • CUDA 13.3 solves the CUDA 13.2 gibberish problem - see #4849

Blackwell GPUs Update

  • For now Blackwell will have delayed releases of llama.cpp prebuilt binaries sine CUDA 12.4 does not work - we are working to resolve this soon.

What's Changed

New Contributors

Full Changelog: v0.1.42-beta...v0.1.43-beta

An Update before Revamp!

Choose a tag to compare

@shimmyshimmer shimmyshimmer released this 26 May 14:36

Hey guys, we're doing one more-ish update before a major revamp which is likely coming this week or next week. Our revamp will change a lot of things, especially with new major features and a lot of design changes.

  • NEW: API calling support now with image generation + editing, proper web search, code execution, auto prompt caching. Connect OpenAI, Anthropic and more.
  • unsloth studio update should now work properly. For Mac users, use the install curl command instead: curl -fsSL https://unsloth.ai/install.sh | sh
  • Proper support for non-English languages e.g. Japanese, Chinese, Indian etc.
apicalling.unsloth.mp4

Many of you may have missed our previous release which only lasted for one day. We introduced:

  • Auto MTP speculative decoding for MTP GGUFs; get the best settings customized for your hardware.

API provider calling & external connections

  • You can now connect Unsloth to any API cloud provider (OpenAI, Anthropic, OpenRouter etc.)
  • Built-in web search for OpenAI, Anthropic, OpenRouter and Kimi
  • Built-in code execution for OpenAI and Anthropic (Anthropic containers persist and are reused across turns)
  • Prompt caching is enabled for OpenAI and Anthropic models saving 50 to 90% of costs.
  • Image generation + editing
  • API key now optional for local providers (llama.cpp / vLLM / Ollama)
  • Auto-load models when adding a cloud provider

Other Unsloth Studio updates

  • OpenDocument chat attachments
  • o3 reasoning summary payload
  • Sending/prompting non-English languages (e.g. Japanese, Chinese) now works properly
  • IME composer hardening, RTL dir="auto", long log-line truncation fix
  • Tool reasoning trace rendering in UI
  • Fully offline support: cached GGUF discovery and offline DNS auto-detect for both inference and training

Unsloth Studio security improvements

  • Authentication rate-limiting, proxy-aware so reverse proxies don't bypass it
  • Sandboxed worker with a tightened blocklist (bash, hf upload, NOFILE)
  • Path containment so workers can't escape their in-flight tmp dirs
  • Strict schema validation across the Studio API
  • Tightened CSP / security headers (only legitimate favicon hosts allowed)
  • Removed the torch.load fallback on training_args.bin so untrusted pickles can never execute on model load
  • Hardened Tauri desktop release flow
  • Frontend auth: singleflight token refresh, current-password input on changes, working logout, shared 422 helper
  • Cancel cleanup now scoped strictly to in-flight tmp dirs so it can never delete user state

What's Changed

Read more

MTP + Studio fixes

Choose a tag to compare

@danielhanchen danielhanchen released this 19 May 14:49
735d26b

Lots of bug fixes, UI, UX fixes to Studio!

To get the latest updates do:

macOS, Linux, WSL:

curl -fsSL https://unsloth.ai/install.sh | sh

Windows:

irm https://unsloth.ai/install.ps1 | iex

Fixes

  1. Fix unsloth studio update not working well
  2. Fix getting stuck on reset-password page
  3. More offline mode support
  4. Improve MTP not being faster on Macs, CPUs and GPUs - now it's much better!
  5. Fix Desktop Shortcut not working after update
  6. Many many UI UX bug fixes

What's Changed in Unsloth

What's Changed in Unsloth Zoo:

Full Changelog: v0.1.40-beta...v0.1.41-beta

Qwen3.6 MTP and API / Connections

Choose a tag to compare

@shimmyshimmer shimmyshimmer released this 18 May 13:40
4699c7e

We've got lots of new updates. Please use the latest Unsloth v0.1.405-beta, not v0.1.40-beta which is older.

  • ~2x faster GGUF inference with automatically enabled MTP
  • API calling support for OpenAI, Anthropic etc. with auto prompt caching, web search, code execution
  • Connect to external inference backends: vLLM, Ollama, llama-server
  • Experimental MLX inference
  • Proper support for non-English languages
  • Security improvements

MTP speculative decoding support 1.4 to 2x faster inference!

  • Auto MTP speculative decoding for MTP GGUFs; warn when the bundled llama.cpp prebuilt is stale or too old for MTP
  • New pre-built llama.cpp binaries for MTP support!

API provider calling & external connections

  • You can now connect Unsloth to any API cloud provider (OpenAI, Anthropic, OpenRouter etc.)
  • Built-in web search for OpenAI, Anthropic, OpenRouter and Kimi
  • Built-in code execution for OpenAI and Anthropic (Anthropic containers persist and are reused across turns)
  • Prompt caching is enabled for OpenAI and Anthropic models saving 50 to 90% of costs.
  • API key now optional for local providers (llama.cpp / vLLM / Ollama)
  • Auto-load models when adding a cloud provider

MLX inference (Experimental)

  • MLX quants and models now can run locally on your Mac machines!
  • We'll be adding thinking, tools and web search soon!

Other Unsloth Studio updates

  • OpenDocument chat attachments
  • o3 reasoning summary payload
  • Sending/prompting non-English languages (e.g. Japanese, Chinese) now works properly
  • IME composer hardening, RTL dir="auto", long log-line truncation fix
  • Tool reasoning trace rendering in UI
  • Fully offline support: cached GGUF discovery and offline DNS auto-detect for both inference and training
  • Lots of UI/UX polish: dark theme refactor, right sidebar redesign, time-of-day sloth mascot, dismissable copyable toasts, larger chat composer, code-execution config polish, composer action pill styling, narrower Discord button

Training updates

  • Gemma attention mask fixes
  • Multi Image GRPO
  • GRPO hidden-state return experiments
  • New Continued Pretraining (CPT) training method as a first-class option
  • Gemma-4 MoE LoRA extractor registered to fix grouped_mm contraction crash
  • Opt-in fused lm_head + cross-entropy forward, with single-matmul path under UNSLOTH_RETURN_LOGITS=1
  • Pass batch size for eval
  • Eval/training paths now honour HF_DATASETS_OFFLINE alongside HF_HUB_OFFLINE

Unsloth Studio security improvements

  • Authentication rate-limiting, proxy-aware so reverse proxies don't bypass it
  • Sandboxed worker with a tightened blocklist (bash, hf upload, NOFILE)
  • Path containment so workers can't escape their in-flight tmp dirs
  • Strict schema validation across the Studio API
  • Tightened CSP / security headers (only legitimate favicon hosts allowed)
  • Removed the torch.load fallback on training_args.bin so untrusted pickles can never execute on model load
  • Hardened Tauri desktop release flow
  • Frontend auth: singleflight token refresh, current-password input on changes, working logout, shared 422 helper
  • Cancel cleanup now scoped strictly to in-flight tmp dirs so it can never delete user state

Bug fixes and correctness

  • Layout-aware MoE LoRA merge with loud-fail on fallback (no more silent wrong saves)
  • num_logits_to_keep regression fixed on transformers >= 4.52
  • Preserve tokenizer EOS token on merged saves
  • Resume PEFT checkpoints under sentence-transformers >= 5.4
  • Restore Flash > SDPA > Flex attention priority for non-Gemma3 models
  • ORPO text-only tokenization now works with processors
  • Embedding matrix size mismatch fix
  • Vicuna chat template fix
  • fast_generate unifies legacy and new logits kwargs (fixes Mistral merge site)
  • higher_precision_softmax made idempotent
  • Patch every LOSS_MAPPING key aliased to ForCausalLMLoss (covers transformers 5.x)
  • GGUF converter sibling imports fixed
  • UTF-8 encoding added to all text-mode file operations
  • Serialise GGUF reload and inherit unsloth-run extra args
  • Fix /recommended-folders 500 on unreadable model directories under Python 3.12+
  • Cross-family GGUF projector blocked in flat local dirs (no more wrong-vision-tower loads)

Installer and platform reliability

  • Custom install paths via STUDIO_HOME / UNSLOTH_STUDIO_HOME
  • CPU-only Linux x86_64 routed to ggml-org/llama.cpp prebuilts
  • Windows CUDA install fixes: paired cudart bundle and Torch NVIDIA DLL paths added to PATH
  • Skip flash-attn install on Blackwell GPUs (sm_100+)
  • Refresh Intel XPU extras for torch 2.7.1 / 2.9.1 / 2.10 / 2.11.0 / 2.12.0; torch upper cap raised to <2.13.0
  • HIP source builds on Ubuntu 24.04 now inject --gcc-install-dir
  • Linux prebuilt fixes for branch-based llama.cpp releases (mangled symlink repair, top-level dir strip)
  • New uninstallers for Linux, macOS (uninstall.sh) and Windows (uninstall.ps1)
  • Mac desktop shortcut spawning and lifecycle fixed
  • unsloth --version flag
  • Studio web update banner and release version display
  • GPU pinned at 95% headroom, with a warning on silent CPU fallback
  • Auto-install flash-linear-attention and tilelang for Qwen3.5 family

What's Changed in Unsloth

Read more

New Unsloth API Inference Endpoint

Choose a tag to compare

@shimmyshimmer shimmyshimmer released this 05 May 13:08

v0.1.39-beta bug fix
May 5th 2026 Fixes chat history not being shown (existing chat history is not lost) and attachments not attaching correctly. The bug was render-only - use 2026.5.2 or directly call curl -fsSL https://unsloth.ai/install.sh | sh or unsloth studio update to update

Run local LLMs with tools like Claude Code and Codex by connecting them to Unsloth’s API endpoint. This lets you run models like Qwen and Gemma locally, with additional features such as self-healing tool calling, code execution, and web search. Unsloth makes it easy to deploy a fast API inference endpoint that provides:

Models loaded in Unsloth (including GGUFs) are exposed as an authenticated API via llama-server. A long API key is generated for security reasons like how OpenAI provides one. Your local models can then be used directly in your preferred AI agent, SDK, or chat client. Unsloth speaks two dialects on the same port:

  • Anthropic-compatible /v1/messages for Claude Code, OpenClaw, the Anthropic SDK, and any client that expects the Messages API.
  • OpenAI-compatible /v1/chat/completions and /v1/responses for the OpenAI SDK, OpenCode, Cursor, Continue, Cline, Open WebUI, SillyTavern, and any OpenAI-compatible tool.
  • Both support streaming, tool calling (OpenAI tools / Anthropic tools), and vision inputs.

New models

We've also got a handful of new models to run including NVIDIA Nemotron 3 Nano Omni, IBM Granite 4.1 and Mistral 3.5 Medium. We helped Mistral solve some issues with implementation in transformers and GGUFs.

Unsloth Updates

  • Stopped Studio training runs can now resume from checkpoints.
  • Chat threads now autosave and persist more reliably.
  • DPO training hangs in multi-process setups were fixed.
  • VLM GRPO support improved with MROPE updates.
  • Studio’s stop button now properly stops generation.
  • Fix chat template disappearing after browser refresh

What's Changed in Unsloth

What's changed in Unsloth-Zoo

Read more

New UI Redesign + Qwen3.6

Choose a tag to compare

@shimmyshimmer shimmyshimmer released this 23 Apr 14:33

Hey guys, we revamped the entire Unsloth Studio UI and UX experience to put an emphasis on chat and training:

  • Added a collapsible sidebar based on community feedback
    new ui
  • You can now delete chats and search past conversations
    Delete messages search chats
  • New Preserve Thinking toggle for models that support it like Qwen3.6
  • Cleaner, more consistent design with easier navigation
  • Expanded Settings page with options to change your profile picture, name, and more
    change profile settings
  • No more entering your Hugging Face token twice
  • gpt-oss now has low, medium and high thinking toggles.
  • Now uses latest llama.cpp prebuilt, even on Linux CUDA
  • Lots of bug, consistency and stability fixes
  • Kimi-K2.6 can now be run!
  • We also added experimental API support. Guides, announcement etc will come next week.

Qwen3.6 was also also previously already supported in Unsloth Studio for running and training. You can train and run Qwen3.6-27B right now!

What's Changed

Read more