vLLM
Fast, easy-to-use library for running language models and serving them with high throughput. Optimizes memory usage with PagedAttention, an efficient attention algorithm that manages the KV cache in pages to minimize waste. Reduces serving costs by up to 10x compared to traditional serving approaches. Supports popular models such as Llama, Mistral, Qwen, and many others. Exposes an OpenAI-compatible API for easy integration with existing clients. Used in production by many companies to serve LLMs efficiently. Free and open source.
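Because the server speaks the OpenAI wire format, any OpenAI-style client can talk to it. A minimal sketch of the request body sent to the chat-completions endpoint; the model name and localhost URL are illustrative assumptions, not fixed values:

```python
import json

# Assumed local endpoint of a running vLLM server (e.g. started with
# `vllm serve <model>`); adjust host/port to your deployment.
VLLM_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_request(model: str, prompt: str, max_tokens: int = 64) -> str:
    """Build the JSON body for an OpenAI-style chat completion call."""
    body = {
        "model": model,  # must match the model the server was launched with
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(body)


# Example payload; POST it to VLLM_URL with Content-Type: application/json.
payload = build_chat_request("meta-llama/Llama-3.1-8B-Instruct", "Hello!")
print(payload)
```

The same payload works with the official `openai` Python client by pointing its `base_url` at the vLLM server, which is the usual integration path.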