DEV Community
#llminference
Posts
AI Inference at the Edge: Running Real-Time LLMs in Kubernetes Without a GPU Farm
The Cyber Sidekick
Jun 18
#edgeai #kubernetes #llminference #vllm
3 min read
Qwen 3.6 enable_thinking — The MoE Pitfall That Broke My Agent JSON Parsing
SleepyQuant
May 18
#qwen #mlx #localai #llminference
5 min read
Multiple Independent Questions: Batch Into One Request or Split Into Many? — An Analysis of LLM Concurrent Processing
eyanpen
May 3
#llminference #autoregressivegeneration #parallelrequests #continuousbatching
5 min read