John Dee on Nostr: ollama seems to load as much as it can into VRAM, and the rest into RAM. Llama 3.1 ...
ollama seems to load as much as it can into VRAM, and the rest into RAM. Llama 3.1 70b is running a lot slower than 8b on a 4090, but it's usable. The ollama library has a bunch of different versions that appear to be quantized:
https://ollama.com/library/llama3.1
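Not part of the original note, but a minimal sketch of what driving that setup looks like from code, assuming ollama's default local endpoint (localhost:11434) and that the model tag has already been pulled; the tag and prompt here are illustrative, not from the post:

import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # ollama's default local endpoint

payload = {
    # The 70b tag from the ollama library (quantized by default) still
    # overflows a 4090's 24 GB of VRAM, so ollama keeps what fits on the
    # GPU and offloads the remaining layers to system RAM -- slower, but usable.
    "model": "llama3.1:70b",
    "prompt": "In one paragraph, why is partial GPU offloading slower?",
    "stream": False,  # ask for a single JSON response instead of a token stream
}

req = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

print(result["response"])

With "stream" left at the API's default (true), the same endpoint returns tokens incrementally, which is friendlier when the 70b model is generating slowly from partially offloaded layers.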
Published at 2024-07-25 03:59:55
Event JSON
{
  "id": "6d74b5f0ee5eae9b9189ba4fe0c33fafae5d5f3d199ac776a7fa8a203394d22a",
  "pubkey": "fe32298e29aab4ec2911c0dbdda485c073f869c5444ee92f7ae247ed20516265",
  "created_at": 1721879995,
  "kind": 1,
  "tags": [
    [
      "e",
      "00003c9dda7204845a2ef6a0a5a08d7572caf85dc738e34a05648f113a342f49",
      "wss://nostr.wine/",
      "root"
    ],
    [
      "e",
      "00003c9dda7204845a2ef6a0a5a08d7572caf85dc738e34a05648f113a342f49",
      "wss://nostr.wine/",
      "reply"
    ],
    [
      "p",
      "b2d670de53b27691c0c3400225b65c35a26d06093bcc41f48ffc71e0907f9d4a",
      "",
      "mention"
    ]
  ],
  "content": "ollama seems to load as much as it can into VRAM, and the rest into RAM. Llama 3.1 70b is running a lot slower than 8b on a 4090, but it's usable. The ollama library has a bunch different versions that appear to be quantized: https://ollama.com/library/llama3.1",
  "sig": "fcee7e83df9bb9c7df2e2a1bd934254e1d34a307c0a04b24f6f23b2d7bc54a7a5438fd2304e889c4a5ae85ce3769dbd4a9e96afe408e0adb7f6541a7c4694b0e"
}
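For context on the raw event above, a hedged sketch (not part of the page) of how the "id" field is derived under NIP-01: it is the SHA-256 of the compact JSON serialization of [0, pubkey, created_at, kind, tags, content]. This assumes Python's json.dumps with compact separators matches the canonical serialization for this particular note (no tricky escaping involved):

import hashlib
import json

def nostr_event_id(pubkey, created_at, kind, tags, content):
    # NIP-01: id = sha256 of the UTF-8 compact serialization of this array
    serialized = json.dumps(
        [0, pubkey, created_at, kind, tags, content],
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()

# Fields copied verbatim from the event JSON above
tags = [
    ["e", "00003c9dda7204845a2ef6a0a5a08d7572caf85dc738e34a05648f113a342f49",
     "wss://nostr.wine/", "root"],
    ["e", "00003c9dda7204845a2ef6a0a5a08d7572caf85dc738e34a05648f113a342f49",
     "wss://nostr.wine/", "reply"],
    ["p", "b2d670de53b27691c0c3400225b65c35a26d06093bcc41f48ffc71e0907f9d4a",
     "", "mention"],
]
content = ("ollama seems to load as much as it can into VRAM, and the rest "
           "into RAM. Llama 3.1 70b is running a lot slower than 8b on a "
           "4090, but it's usable. The ollama library has a bunch different "
           "versions that appear to be quantized: "
           "https://ollama.com/library/llama3.1")

event_id = nostr_event_id(
    "fe32298e29aab4ec2911c0dbdda485c073f869c5444ee92f7ae247ed20516265",
    1721879995, 1, tags, content,
)
print(event_id)  # should equal the "id" field above if the serialization matches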