2024-10-25 07:03:12

michabbb on Nostr:

šŸŽ‰ Efficient #LLM inference: #AirLLM enables running #Llama3.1 models up to 70B on 4GB VRAM, and up to 405B on 8GB.

šŸ’¾ Memory optimization: Runs without needing quantization or distillation, saving resources.

šŸš€ Compression for speed: optional 4-bit and 8-bit compression provides up to a 3x speed boost with minimal accuracy loss (see the sketch below).

šŸ§  Broad support: also compatible with other model families such as ChatGLM and Qwen.

šŸ”— Platform-ready: runs seamlessly on Linux and macOS, including on low-end GPUs.
https://github.com/lyogavin/airllm
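
For reference, here is a minimal usage sketch adapted from the examples in the AirLLM README. The model repo id is just an illustration, and parameter names may differ between AirLLM versions; treat this as a starting point, not a definitive recipe.

```python
# Minimal sketch based on the usage shown in the AirLLM README;
# exact parameters may vary across AirLLM versions.
from airllm import AutoModel

MAX_LENGTH = 128

# AirLLM streams layers from disk one at a time during inference,
# so the full 70B model never has to fit in VRAM at once.
# The optional `compression` argument ('4bit' or '8bit') enables
# block-wise compression for the claimed ~3x speedup.
model = AutoModel.from_pretrained(
    "meta-llama/Meta-Llama-3.1-70B-Instruct",  # example repo id
    compression="4bit",
)

input_tokens = model.tokenizer(
    ["What is the capital of the United States?"],
    return_tensors="pt",
    return_attention_mask=False,
    truncation=True,
    max_length=MAX_LENGTH,
    padding=False,
)

generation_output = model.generate(
    input_tokens["input_ids"].cuda(),
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True,
)

print(model.tokenizer.decode(generation_output.sequences[0]))
```

Note that layer-by-layer streaming trades speed for memory: each forward pass reloads weights from disk, so generation is far slower than fully in-VRAM inference, which is the trade-off that makes 70B on 4GB possible at all.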
Author Public Key: npub1z20c8zvvwqydxdthrlngrm8e08nhv7ketrz49lu9m6tzhghhwklql84yd9