LLAA178

codeKiller LLAA178

LeetGPU-Guidebook Public

A step-by-step guide to mastering GPU programming

Cuda 50 7
vllm-kivi Public

Production-ready 2-bit and 4-bit KV cache quantization for vLLM via Triton; 70% VRAM savings and 1.8x speedup

Python 2
cpp-performance-lab Public

C++ microbenchmark lab for cache, memory, ILP, synchronization, queue, and allocator experiments

C++ 1
qlib-gpu-model Public

GPU-first quant deep learning starter built with PyTorch and Qlib-style data pipelines

Python 1