Connect Python SDK to Unsloth

Guide to calling Unsloth's local API from Python using the official OpenAI or Anthropic SDKs including streaming, vision, function calling, and Unsloth's built-in server-side tools.

Unsloth serves three OpenAI-compatible dialects at the same base URL. Chat Completions, Responses, and Anthropic Messages, so every mainstream Python SDK works against it. You change only the base_url and api_key on the client; everything else (streaming, tool calling, vision, structured output) behaves the way the SDK documents. This page covers the two SDKs most developers reach for first: the official OpenAI Python SDK and the official Anthropic Python SDK.

If you're not sure what URL / key / model name to use, read the API overview first. It walks you through starting, loading a model, and creating an sk-unsloth-… key.

🔑 Prerequisites

Before you run any of the snippets below you'll need:

Unsloth running locally with a model loaded (note the port: typically 8000 or 8888).
An sk-unsloth-… API key created from Settings → API.
A model name. The name of the GGUF model inside Unsloth (e.g. qwen-local, unsloth/Qwen3.6-27B-GGUF). If you forget it, run:
```
curl http://localhost:8888/v1/models -H "Authorization: Bearer sk-unsloth-…"
```
and copy the id field.

Set the key as an env var so you never paste it into code:

export UNSLOTH_STUDIO_AUTH_TOKEN=sk-unsloth-xxxxxxxxxxxx

🤖 OpenAI SDK

Unsloth's /v1/chat/completions endpoint is a drop-in for the OpenAI Python SDK. The client treats Unsloth like any other OpenAI-compatible provider.

1. Install the SDK:

pip install openai

2. Create a client pointed at Unsloth:

import os
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8888/v1",              # your unsloth port + /v1
    api_key=os.environ["UNSLOTH_STUDIO_AUTH_TOKEN"],     # your sk-unsloth-… key
)

Basic chat completion

Streaming

Set stream=True and iterate over the returned generator:

Images (vision)

Attach an image as an image_url content part. Unsloth accepts either an HTTP(S) URL or a data: base64 URI:

The loaded model must be multimodal. If you load a text-only model, the vision request will succeed structurally but the model won't be able to "see" the image.

Function calling (OpenAI tools)

Pass OpenAI-style tools and (optionally) tool_choice and Unsloth forwards them to the backend. Your client is responsible for executing each tool call and returning the result on the next turn:

Unsloth server-side tools (shorthand)

In addition to OpenAI-style client-side tools, Unsloth can execute Python, bash, and web search server-side and stream the results back automatically. Opt in via the extra_body parameter so the fields pass straight through to Unsloth:

The session_id is optional. Use it to persist tool state (e.g. a Python kernel) across calls.

enabled_tools currently supports "python", "bash", and "web_search". Tool results are streamed back as tool_result events so the model can see them on its next turn.

Listing models

🧠 Anthropic SDK

Unsloth's /v1/messages endpoint is a drop-in for the Anthropic Python SDK.

1. Install the SDK:

2. Create a client pointed at Unsloth:

Basic message

Streaming

The SDK exposes a context manager that yields text deltas:

Images (vision)

Anthropic-style image content uses a source block with base64 data:

Tool calling (Anthropic tools)

Pass Anthropic-style tools with an input_schema and Unsloth forwards them natively:

Unsloth server-side tools (shorthand)

The same enable_tools / enabled_tools / session_id shorthand works against /v1/messages pass it through extra_body:

Unsloth emits custom tool_result SSE events for the model's view of each tool call's output. The Anthropic SDK passes these through its event stream unchanged.

JSON decoding (`response_format`)

Unsloth supports OpenAI-style structured outputs via response_format. Pass a JSON Schema and the model is constrained to produce JSON matching it.

The strict: True flag tells Unsloth to enforce the schema during decoding rather than relying on the model to comply on its own. additionalProperties: False and required work as in standard JSON Schema.

The terminal output should look roughly like this:

🧪 Choosing an SDK

Both SDKs work against Unsloth. The right choice depends on the rest of your stack:

Use the OpenAI SDK if your code already depends on the OpenAI Python package, you want OpenAI-style tools / tool_choice, or you plan to call the Responses API.
Use the Anthropic SDK if your code already depends on the Anthropic package, you prefer Anthropic's input_schema tool format, or you want the Anthropic-native streaming event types.

You can use both in the same project. Unsloth serves them on the same port, so a single sk-unsloth-… key authenticates both.

❔ Troubleshooting

401 Unauthorized The UNSLOTH_STUDIO_AUTH_TOKEN env var isn't set, or the key is wrong. Re-export and confirm with echo $UNSLOTH_STUDIO_AUTH_TOKEN.

404 Not Found from the OpenAI SDK Check that base_url ends in /v1. The OpenAI SDK appends endpoint paths to the base URL as-is.

404 Not Found from the Anthropic SDK Check that base_url does not end in /v1. The Anthropic SDK adds /v1/messages itself.

extra_body fields aren't reaching Unsloth Make sure you're on a recent openai / anthropic SDK. Older versions silently drop unknown fields. Upgrade with pip install -U openai anthropic.

Streaming "hangs" then dumps everything at once Whatever is wrapping your output is buffering. In a script, print(..., flush=True); in a notebook, it's usually fine; behind a proxy, disable response buffering on the proxy.

For endpoint-level issues (wrong port, model not loading, lost connection, etc.) see the API overview page.

Optional: set server defaults

You can configure default server behavior before connecting with the Python SDK, when using the unsloth run command.

Use --reasoning off to turn thinking off, or --reasoning on to turn it on for models that support reasoning.

This starts the server on 0.0.0.0:8888, allowing other devices on your local network to connect.

These settings become the server defaults when requests do not specify their own generation parameters.

Request-level values like temperature, top_p, max_tokens, and stream can still override the defaults for that request.

PreviousHermes Agent NextCurl & HTTP

Last updated 7 days ago

Was this helpful?

🔑 Prerequisites

🤖 OpenAI SDK

Basic chat completion

Streaming

Images (vision)

Function calling (OpenAI tools)

Unsloth server-side tools (shorthand)

🧠 Anthropic SDK

Basic message

Streaming

Images (vision)

Tool calling (Anthropic tools)

Unsloth server-side tools (shorthand)

JSON decoding (response_format)

🧪 Choosing an SDK

❔ Troubleshooting

Optional: set server defaults

JSON decoding (`response_format`)