2025-05-23 19:13:47

Daniel Wigton on Nostr:

I use Llama 3.3 70B quite regularly. At a 4-bit quant it is 43 GB. I have 24 GB of VRAM, so 19 GB needs to be swapped in from RAM for every token. I get about 3.5 tokens per second, which is the equivalent of someone typing at about 150 wpm.
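For anyone who wants to check the arithmetic, here is a rough sketch in Python. The 0.75 words-per-token ratio is an assumption (a common rule of thumb for English text, not something measured here), and the bandwidth figure only holds if the full 19 GB really is read from RAM once per token. The function names are made up for illustration.

```python
def spilled_gb(model_gb, vram_gb):
    """Part of the model that does not fit in VRAM and has to live in system RAM."""
    return max(model_gb - vram_gb, 0)


def tokens_per_sec_to_wpm(tokens_per_sec, words_per_token=0.75):
    """Convert generation speed to words per minute.

    words_per_token=0.75 is an assumed rule of thumb, not a measured value.
    """
    return tokens_per_sec * 60 * words_per_token


def implied_ram_read_gb_per_sec(spill_gb, tokens_per_sec):
    """RAM read rate implied IF the whole spilled part is read once per token (assumption)."""
    return spill_gb * tokens_per_sec


spill = spilled_gb(43, 24)
print(f"spilled to RAM: {spill} GB")
print(f"typing-speed equivalent: {tokens_per_sec_to_wpm(3.5):.0f} wpm")
print(f"implied RAM read rate: {implied_ram_read_gb_per_sec(spill, 3.5):.1f} GB/s")
```

With the assumed ratio this comes out a little above 150 wpm, so the typing comparison is in the right range.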
Author Public Key
npub1w4jkwspqn9svwnlrw0nfg0u2yx4cj6yfmp53ya4xp7r24k7gly4qaq30zp