2024-07-25 03:59:55
John Dee on Nostr:

ollama seems to load as much as it can into VRAM and the rest into RAM. Llama 3.1 70b runs a lot slower than 8b on a 4090, but it's usable. The ollama library has a bunch of different versions that appear to be quantized: https://ollama.com/library/llama3.1
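
In case it helps, here's a minimal sketch of the same workflow using the official ollama Python client (assuming you've done pip install ollama and have the ollama server running locally; the llama3.1:70b tag is one of the versions listed on the library page above):

import ollama

# Pull a 70b tag from the library; the default tags there appear to be
# 4-bit quantized builds. Downloads once, then serves from the local cache.
ollama.pull('llama3.1:70b')

# Run one chat turn. ollama decides the VRAM/RAM split on its own, so the
# same call works whether the model fits on the 4090 or spills into RAM.
response = ollama.chat(
    model='llama3.1:70b',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
)
print(response['message']['content'])

Swapping the tag for llama3.1:8b (or one of the explicit quantization tags on that page) is the easiest way to compare speeds on the same prompt.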
Author Public Key
npub1lceznr3f426wc2g3crdamfy9cpels6w9g38wjtm6ufr76gz3vfjskpw464