sglang

Here are 162 public repositories matching this topic...

ai-dynamo / dynamo

A Datacenter Scale Distributed Inference Serving Framework

kubernetes rust routing-engine omni diffusion vllm llm-inference tensorrt-llm sglang disaggregated-serving

Updated Jul 2, 2026
Rust

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

reinforcement-learning inference rdma disaggregation llm vllm sglang kvcache trt-llm tokenspeed

Updated Jul 2, 2026
C++

Gen-Verse / OpenClaw-RL

OpenClaw-RL: Train any agent simply by talking

async gui-application coding slime tinker memory-systems skill-learning rlhf sglang grpo on-policy-distillation openclaw-skills open-claw

Updated May 23, 2026
Python

gpustack / gpustack

A GPU cluster manager for high-performance AI model serving (vLLM, SGLang) and on-demand SSH-accessible GPU instances.

cuda inference openai llama maas rocm ascend llm llm-serving vllm genai llm-inference qwen deepseek sglang distributed-inference high-performance-inference mindie

Updated Jul 2, 2026
Python

intel / auto-round

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

transformers rounding quantization omni int4 diffusers llms vllm gguf vlms sglang mxfp4 nvfp4

Updated Jul 2, 2026
Python

OpenMOSS / MOSS-TTSD

MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flexible speaker control, and multilingual support, and enables zero-shot voice cloning from short audio references.

streaming finetune text-to-speeh large-language-models sglang speech-dialogue-generation

Updated Mar 23, 2026
Python

sybil-solutions / local-studio

Control panel for vLLM, SGLang, llama.cpp, and exllamav3

ai local hosting self llamacpp vllm exllama local-ai sglang

Updated Jul 2, 2026
TypeScript

ModelCloud / GPTQModel

LLM model quantization (compression) toolkit with HW acceleration support for Nvidia, AMD, Intel GPU and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.

transformers quantization optimum peft vllm gptq sglang

Updated Jul 1, 2026
Python

SemiAnalysisAI / InferenceX

Open Source Continuous Inference Benchmark Research Platform — Kimi K2.7-Code, MiniMax M3, DeepSeekv4, GLM5 - GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 & soon™ TPUv6e/v7/Trainium2/3

benchmark ai amd cuda pytorch nvidia glm minimax rocm kimi llm vllm deepseek sglang gb300 gb200 mi355x

Updated Jul 2, 2026
Python

ovg-project / kvcached

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

serverless inference-engine llm llm-serving vllm llm-inference ollama llm-framework sglang kvcache gpu-sharing kvcached gpu-mutiplexing kvcache-optimization elastic-kvcache online-offline-coserve

Updated Jul 1, 2026
Python

OpenMOSS / MOVA

MOVA: Towards Scalable and Synchronized Video–Audio Generation

multimodal diffusion-models sglang video-audio-generation

Updated Jun 18, 2026
Python

sgl-project / SpecForge

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

training eagle pytorch llm fsdp sglang eagle3

Updated Jul 1, 2026
Python

Tencent-Hunyuan / UniRL

UniRL is a Framework for Unified Multimodal Model Reinforcement Learning

reinforcement-learning vllm sglang ai-infrastructure

Updated Jul 1, 2026
Python

HuiResearch / FlashTTS

Provides high-quality Chinese speech synthesis and voice cloning services based on models such as SparkTTS and OrpheusTTS.

vllm sglang llamacpp-python sparktts spark-tts orpheus-tts megatts3 flashtts

Updated May 18, 2025
Python

ome-projects / ome

Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, TensorRT-LLM, and Triton

k8s llama oracle-cloud model-serving model-as-a-service multi-node-kubernetes llm vllm llm-inference qwen deepseek sglang kimi-k2 pd-disaggregation

Updated Jul 2, 2026
Go

redai-infra / Relax

An Asynchronous Reinforcement Learning Engine for Omni-Modal Post-Training at Scale

reinforcement-learning multi-agent vlm distributed-training post-training multimodal megatron-lm llm ray-serve rlhf qwen sglang grpo agentic-rl

Updated Jul 1, 2026
Python

lightseekorg / smg

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across vLLM, TRT-LLM, TokenSpeed, SGLang, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.

chat mcp routing gemini openai claude llm anthropic vllm sglang anthropic-api inference-gateway tokenspeed responses-api tensorrtllm trtllm lightseek

Updated Jul 2, 2026
Rust

spark-arena / sparkrun

sparkrun - launch, manage, and stop LLM inference workloads on NVIDIA DGX Spark systems

inference llama-cpp vllm sglang dgx-spark

Updated Jul 1, 2026
Python

InftyAI / llmaz

☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!

kubernetes inference huggingface llm modelscope llamacpp vllm text-generation-inference ollama sglang inference-platform

Updated Jan 26, 2026
Go

shell-nlp / gpt_server

gpt_server is an open-source framework for production-grade deployment of LLMs, Embedding, Reranker, ASR, TTS, text-to-image, image editing, and text-to-video models.

tts openai llama gpt infinity embedding asr text-moderation llm prompt-injection vllm fastchat function-calling rerank sglang lmdeploy

Updated May 9, 2026
Python
