Customized version of Google's tflite-micro
Updated Sep 17, 2025 - C++
High-performance LLM inference platform built on vLLM continuous batching. Achieves 12.3K+ req/sec at 42 ms P50 / 178 ms P99 latency, supports INT8/INT4 quantization (70% memory savings) and tensor parallelism across 4 GPUs, and includes comprehensive monitoring while serving 1500+ concurrent users.
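For context, a vLLM server exposes an OpenAI-compatible REST API (by default `/v1/completions` on port 8000), so a C++ client can submit requests over HTTP with libcurl. This is a hedged client-side sketch; the host, port, model name, and prompt are assumptions, not details taken from this repository.

```cpp
#include <curl/curl.h>

#include <iostream>
#include <string>

// Append each chunk of the HTTP response body to a std::string.
static size_t WriteCb(char* ptr, size_t size, size_t nmemb, void* userdata) {
  static_cast<std::string*>(userdata)->append(ptr, size * nmemb);
  return size * nmemb;
}

int main() {
  curl_global_init(CURL_GLOBAL_DEFAULT);
  CURL* curl = curl_easy_init();
  if (!curl) return 1;

  // Placeholder model name and prompt; adjust to the deployed model.
  const std::string body =
      R"({"model": "my-model", "prompt": "Hello", "max_tokens": 32})";
  std::string response;

  struct curl_slist* headers = nullptr;
  headers = curl_slist_append(headers, "Content-Type: application/json");

  // Assumed endpoint: vLLM's default OpenAI-compatible server address.
  curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:8000/v1/completions");
  curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
  curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());
  curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, WriteCb);
  curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);

  CURLcode rc = curl_easy_perform(curl);
  if (rc == CURLE_OK) std::cout << response << "\n";

  curl_slist_free_all(headers);
  curl_easy_cleanup(curl);
  curl_global_cleanup();
  return rc == CURLE_OK ? 0 : 1;
}
```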