
ExLlamaV2


Optimized CUDA kernels for extremely fast inference on quantized language models. ExLlamaV2 targets Llama and compatible architectures, delivering significantly faster inference than standard implementations while reducing memory requirements through efficient quantization support. It is used in production systems where speed is critical and is well suited to batch processing and high-throughput inference. Open source and actively maintained.

