jimbocoin on Nostr:
Learning more about running my own LLM models at home. Apparently, the quantization method impacts performance differently on different kinds of hardware.
This is why, if you’re browsing models on Hugging Face, you’ll see files with suffixes like “Q3_K_S” and “IQ2_XXS”. The number after the “Q” tells you roughly how many bits each weight is stored in, and the surrounding letters (K_S, XXS, and so on) name the specific quantization variant. Some will be much slower than others depending on the capabilities of the CPU and GPU in the machine. #llm
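To make the naming scheme concrete, here’s a small sketch (my own hypothetical helper, not part of any library) that pulls the quantization tag out of a GGUF filename and reports the approximate bits per weight it implies:

```python
import re

def parse_quant(filename: str):
    """Extract the quant tag (e.g. 'Q3_K_S', 'IQ2_XXS') from a GGUF filename.

    The digit after Q is the approximate bits per weight; the letter
    suffixes name the quantization variant. Returns None if no tag found.
    """
    m = re.search(r"(I?Q)(\d+)(?:_([A-Za-z0-9_]+))?\.gguf$", filename)
    if not m:
        return None
    return {
        "family": m.group(1),    # "Q" (classic) or "IQ" (i-quants)
        "bits": int(m.group(2)), # approximate bits per weight
        "variant": m.group(3),   # e.g. "K_S", "XXS", or None
    }

print(parse_quant("llama-3-8b.Q3_K_S.gguf"))
print(parse_quant("mistral-7b.IQ2_XXS.gguf"))
```

So a Q3_K_S file stores weights at roughly 3 bits each, and IQ2_XXS squeezes them down to about 2 bits, trading quality for a smaller memory footprint.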
Published at 2024-06-27 10:13:55

Event JSON
{
"id": "b6a6d416b2c49e4bbc2e8da383ac1b9395c5ff87ed9b11f9ebe0f0f77b2fa323",
"pubkey": "6140478c9ae12f1d0b540e7c57806649327a91b040b07f7ba3dedc357cab0da5",
"created_at": 1719483235,
"kind": 1,
"tags": [
[
"t",
"llm"
]
],
"content": "Learning more about running my own LLM models at home. Apparently, the quantization method impacts performance differently on different kinds of hardware.\n\nThis is why, if you’re browsing models on Hugging Face, you’ll see files with suffixes like “Q3_K_S” and “IQ2_XXS”. The number after the “Q” tells you which quantization method the model uses. Some will be much slower than others depending on the capabilities of the CPU and GPU in the machine. #llm",
"sig": "b7d5f0e2d63640b7d1ffcf89f16c655dbed8912b72a161dc2d59c17b78348567268536cb5c53cbca45b05932a5b3cb651cf0d54f0465965e59191c88a4066192"
}