DEV Community

# llminference

The Cyber Sidekick

Jun 18

AI Inference at the Edge: Running Real-Time LLMs in Kubernetes Without a GPU Farm

#edgeai #kubernetes #llminference #vllm

3 min read

May 18

Qwen 3.6 enable_thinking — The MoE Pitfall That Broke My Agent JSON Parsing

#qwen #mlx #localai #llminference

5 min read

eyanpen

May 3

Multiple Independent Questions: Batch Into One Request or Split Into Many? — An Analysis of LLM Concurrent Processing

#llminference #autoregressivegeneration #parallelrequests #continuousbatching

5 min read
