michabbb on Nostr:
Efficient #LLM inference: #AirLLM enables running #Llama3.1 models up to 70B on 4GB VRAM, and up to 405B on 8GB.
Memory optimization: Runs without quantization or distillation, loading model layers from disk one at a time to save resources.
Compression for speed: 4-bit and 8-bit compression options provide up to a 3x speed boost with minimal accuracy loss.
Broad support: Compatible with various models like ChatGLM, Qwen, and more.
Platform-ready: Runs on Linux, macOS, and low-end GPUs.
https://github.com/lyogavin/airllm
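
For context, basic usage follows the project README closely. Below is a minimal sketch: the Llama 3.1 model ID is illustrative, and the compression argument is the optional 4-bit/8-bit speed trade-off mentioned above.

    from airllm import AutoModel

    # Layer-wise inference: weights are loaded from disk one transformer
    # layer at a time, which is how a 70B model fits in ~4GB of VRAM.
    model = AutoModel.from_pretrained(
        "meta-llama/Meta-Llama-3.1-70B-Instruct",  # illustrative model ID
        compression="4bit",  # optional; "8bit" also documented in the README
    )

    input_tokens = model.tokenizer(
        ["What is the capital of the United States?"],
        return_tensors="pt",
        return_attention_mask=False,
        truncation=True,
        max_length=128,
        padding=False,
    )

    generation_output = model.generate(
        input_tokens["input_ids"].cuda(),
        max_new_tokens=20,
        use_cache=True,
        return_dict_in_generate=True,
    )
    print(model.tokenizer.decode(generation_output.sequences[0]))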
Published at 2024-10-25 07:03:12
Event JSON
{
"id": "77aac224741eb82aea327c4d382b26a3b40f64cff0214572a3618820d92d9b0f",
"pubkey": "129f83898c7008d335771fe681ecf979e7767ad958c552ff85de962ba2f775be",
"created_at": 1729839792,
"kind": 1,
"tags": [
[
"t",
"llm"
],
[
"t",
"airllm"
],
[
"t",
"llama3"
],
[
"proxy",
"https://social.vivaldi.net/users/michabbb/statuses/113366780650970765",
"activitypub"
]
],
"content": "š Efficient #LLM inference: #AirLLM enables running #Llama3.1 models up to 70B on 4GB VRAM, and up to 405B on 8GB.\n\nš¾ Memory optimization: Runs without needing quantization or distillation, saving resources.\n\nš Compression for speed: 4-bit and 8-bit compression options provide up to 3x speed boost with minimal accuracy loss.\n\nš§ Broad support: Compatible with various models like ChatGLM, QWen, and more.\n\nš Platform-ready: Runs seamlessly on Linux, MacOS, and low-end GPUs.\nhttps://github.com/lyogavin/airllm",
"sig": "66f65f43a3fd8bc59e0638cc8ae1b9d242890458879c3a9ded61f51734e2716a53e5ca9bcaf6f2fdc98624f5d64b944d10c84d656f93e3364b8b948ce08a71f4"
}
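
An aside on the fields above: per Nostr's NIP-01, "id" is the SHA-256 hash of the serialized event and "sig" is a Schnorr signature over that id by "pubkey". A minimal sketch of recomputing the id (note: the emoji in the original content were lost to mis-encoding here, so hashing the repaired text will not reproduce the id above):

    from hashlib import sha256
    import json

    def nostr_event_id(event: dict) -> str:
        # NIP-01 serialization: a JSON array with no extra whitespace,
        # hashed with SHA-256 over its UTF-8 bytes.
        serialized = json.dumps(
            [0, event["pubkey"], event["created_at"], event["kind"],
             event["tags"], event["content"]],
            separators=(",", ":"),
            ensure_ascii=False,
        )
        return sha256(serialized.encode("utf-8")).hexdigest()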