Daniel Wigton on Nostr:
That tracks. If you are RAM/CPU bound, then your GPU is going to be doing a whole lot of waiting around. Ollama will put as much of the model on the GPU as it can and just leave it there, so the weights that fit there run nearly instantaneously. It looks like that memory is 1600 MT/s and perhaps single channel?
Or maybe it is feeding a CPU that can't chew fast enough? 100% utilization hints at that. I am running an i9-13900K.
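
For a rough sense of why 1600 MT/s single channel would hurt, here is a back-of-the-envelope sketch. It assumes a standard 64-bit memory channel and that every generated token has to stream all of the weights left in system RAM once. The 10 GB figure is a made-up example, not a number from this thread, and the function names are just for illustration.

```python
def peak_bandwidth_gbs(mega_transfers_per_s: float,
                       channels: int = 1,
                       bus_width_bits: int = 64) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 10^9 bytes).

    Assumes each channel moves bus_width_bits per transfer (64 bits is an
    assumption that holds for a typical desktop DIMM channel).
    """
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


def max_tokens_per_s(cpu_resident_weights_gb: float, bandwidth_gbs: float) -> float:
    """Upper bound on tokens/s when each token reads all CPU-side weights once.

    Real throughput will be lower; this ignores compute, caches, and overhead.
    """
    return bandwidth_gbs / cpu_resident_weights_gb


single = peak_bandwidth_gbs(1600, channels=1)  # roughly 12.8 GB/s
dual = peak_bandwidth_gbs(1600, channels=2)    # roughly 25.6 GB/s

# Hypothetical: 10 GB of weights did not fit in VRAM and stayed in system RAM.
spilled_gb = 10.0
print(f"single channel: {single:.1f} GB/s, at most {max_tokens_per_s(spilled_gb, single):.2f} tok/s")
print(f"dual channel:   {dual:.1f} GB/s, at most {max_tokens_per_s(spilled_gb, dual):.2f} tok/s")
```

If measured tokens per second sit close to that ceiling, memory is the likely limit. If they sit well below it with the CPU pegged at 100%, the CPU is the more likely suspect.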
Published at 2025-05-23 20:21:16
Event JSON
{
"id": "f156ef85c28e0bbf25faf162b753a97a1b22f757c0655e4af6849352ae9ba802",
"pubkey": "75656740209960c74fe373e6943f8a21ab896889d8691276a60f86aadbc8f92a",
"created_at": 1748031676,
"kind": 1,
"tags": [
[
"e",
"d090889dfc19e313ca6d93f90197e1095105ea76fa4c8235e7cb8597f9953694",
"",
"root"
],
[
"e",
"c33446ffbcf97e79786d5492a228b6c3b06e6e65ff855bddabab73dc650fa6de"
],
[
"e",
"407aa2c59610ce76fc2f6033920c75185218aedcec7bfd25c004472c212dd515",
"",
"reply"
],
[
"p",
"036533caa872376946d4e4fdea4c1a0441eda38ca2d9d9417bb36006cbaabf58"
],
[
"p",
"75656740209960c74fe373e6943f8a21ab896889d8691276a60f86aadbc8f92a"
]
],
"content": "That tracks. If you are ram/cpu bound then you GPU is going to be doing a whole lot of waiting around. Ollama will put as much of the model in the GPU as it can and just leave it there. So those weights run nearly instantaneously. It looks like that memory is 1600MT/s and perhaps single channel?\n\nOr maybe it is feeding a cpu that can't chew fast enough? 100% utilization hints at that. I am running an i9-13900k.",
"sig": "c4690d2a76c08c91b7d253a22da41b46ac016c2d2afaaa5fc998db1bf7b6f511c41a2f842a3a315719563c4efe0b8f444a9e45851e5b1458b8a133a0d5d16209"
}