Daniel Wigton on Nostr
I use llama3.3 70b quite regularly. At a 4-bit quant it is 43 GB. I have 24 GB of VRAM, so 19 GB needs to be swapped in from RAM for every token. I get about 3.5 tokens per second, which is the equivalent of someone typing about 150 wpm.
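The arithmetic behind those numbers can be checked quickly. A minimal sketch, assuming the common rule of thumb of roughly 0.75 English words per token (the word-per-token ratio is an assumption, not something stated in the post):

```python
# Back-of-the-envelope check of the figures in the post.
model_gb = 43          # llama3.3 70b at 4-bit quantization
vram_gb = 24           # available GPU memory
offloaded_gb = model_gb - vram_gb   # weights left in system RAM per pass
tokens_per_s = 3.5

words_per_token = 0.75              # assumed ratio, typical for English text
wpm = tokens_per_s * words_per_token * 60

print(f"offloaded: {offloaded_gb} GB")   # 19 GB, matching the post
print(f"~{wpm:.0f} words per minute")    # ~158 wpm, close to the quoted 150
```

At 3.5 tokens/s the 19 GB of offloaded weights implies roughly 66 GB/s of effective RAM-to-GPU bandwidth, which is plausible for a modern desktop memory subsystem.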
Published at 2025-05-23 19:13:47

Event JSON
{
  "id": "8186e18768a65804c3f51c1fc0fdea6245b43d4aeffb62b6824b8a3585bc0ca9",
  "pubkey": "75656740209960c74fe373e6943f8a21ab896889d8691276a60f86aadbc8f92a",
  "created_at": 1748027627,
  "kind": 1,
  "tags": [
    [
      "e",
      "d090889dfc19e313ca6d93f90197e1095105ea76fa4c8235e7cb8597f9953694",
      "",
      "root"
    ],
    [
      "p",
      "036533caa872376946d4e4fdea4c1a0441eda38ca2d9d9417bb36006cbaabf58"
    ]
  ],
  "content": "I use llama3.3 70b quite regularly. At a 4bit quant it is 43GB. I have 24GB vram so 19 GB needs to be swapped from RAM every token. I get about 3.5 tokens per second. This is equivalent of someone typing 150wpm.",
  "sig": "e733499c9b46d72a633951fd181ade858274aea35b39fb887ce13b3fd40257ff0178f384f7acdef9b9332d28d3dde856aa05670ea11ed23dbddb64c94580d1a1"
}
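For reference, the `id` field of an event like this is not arbitrary: per NIP-01 it is the SHA-256 of a canonical JSON serialization of the array `[0, pubkey, created_at, kind, tags, content]`. A minimal sketch (note that `json.dumps` approximates NIP-01's escaping rules, which differ for some control characters, so this is an illustration rather than a full validator):

```python
import hashlib
import json

def event_id(pubkey: str, created_at: int, kind: int, tags: list, content: str) -> str:
    """Derive a Nostr event id per NIP-01: sha256 over the canonical
    JSON array [0, pubkey, created_at, kind, tags, content]."""
    payload = json.dumps(
        [0, pubkey, created_at, kind, tags, content],
        separators=(",", ":"),   # no whitespace in the canonical form
        ensure_ascii=False,      # UTF-8 characters are kept as-is
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Running this over the fields of the event above should reproduce the 64-character hex `id`; the `sig` is then a Schnorr signature over that id by the `pubkey`.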