jimbocoin on Nostr:
Learning more about running my own LLM models at home. Apparently, the quantization method impacts performance differently on different kinds of hardware.
This is why, if you’re browsing models on Hugging Face, you’ll see files with suffixes like “Q3_K_S” and “IQ2_XXS”. The number after the “Q” tells you roughly how many bits each weight is stored in, and the surrounding letters (K_S, XXS, and so on) name the specific quantization variant. Some will be much slower than others depending on the capabilities of the CPU and GPU in the machine. #llm
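To make the naming scheme concrete, here’s a small sketch (my own hypothetical helper, not part of any library) that pulls the quantization tag out of a GGUF filename and reports the approximate bits per weight it implies:

```python
import re

def parse_quant(filename: str):
    """Extract the quant tag (e.g. 'Q3_K_S', 'IQ2_XXS') from a GGUF filename.

    The digit after Q is the approximate bits per weight; the letter
    suffixes name the quantization variant. Returns None if no tag found.
    """
    m = re.search(r"(I?Q)(\d+)(?:_([A-Za-z0-9_]+))?\.gguf$", filename)
    if not m:
        return None
    return {
        "family": m.group(1),    # "Q" (classic) or "IQ" (i-quants)
        "bits": int(m.group(2)), # approximate bits per weight
        "variant": m.group(3),   # e.g. "K_S", "XXS", or None
    }

print(parse_quant("llama-3-8b.Q3_K_S.gguf"))
print(parse_quant("mistral-7b.IQ2_XXS.gguf"))
```

So a Q3_K_S file stores weights at roughly 3 bits each, and IQ2_XXS squeezes them down to about 2 bits, trading quality for a smaller memory footprint.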
Published at 2024-06-27 10:13:55

Event JSON
{
"id": "b6a6d416b2c49e4bbc2e8da383ac1b9395c5ff87ed9b11f9ebe0f0f77b2fa323",
"pubkey": "6140478c9ae12f1d0b540e7c57806649327a91b040b07f7ba3dedc357cab0da5",
"created_at": 1719483235,
"kind": 1,
"tags": [
[
"t",
"llm"
]
],
"content": "Learning more about running my own LLM models at home. Apparently, the quantization method impacts performance differently on different kinds of hardware.\n\nThis is why, if you’re browsing models on Hugging Face, you’ll see files with suffixes like “Q3_K_S” and “IQ2_XXS”. The number after the “Q” tells you which quantization method the model uses. Some will be much slower than others depending on the capabilities of the CPU and GPU in the machine. #llm",
"sig": "b7d5f0e2d63640b7d1ffcf89f16c655dbed8912b72a161dc2d59c17b78348567268536cb5c53cbca45b05932a5b3cb651cf0d54f0465965e59191c88a4066192"
}