2025-05-23 20:21:16

That tracks. If you are RAM/CPU bound, then your GPU is going to be doing a whole lot of waiting around. Ollama will put as much of the model in the GPU as it can and just leave it there, so the part of the model sitting in VRAM runs nearly instantaneously and the rest has to come through system memory. It looks like that memory is 1600 MT/s and perhaps single channel?

Or maybe it is feeding a CPU that can't chew fast enough? 100% utilization hints at that. For comparison, I am running an i9-13900K.
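
To put rough numbers on the memory side, here is a back-of-the-envelope sketch. It assumes a standard 64-bit (8-byte) bus per channel and that every generated token has to read all of the CPU-resident weights once, which is a common rule of thumb for dense models and not a measurement. The 10 GB figure for weights left in system RAM is purely hypothetical, and the function names are mine.

```python
def peak_bandwidth_gbs(mt_per_s, channels=1, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s.

    Assumes a 64-bit (8-byte) data bus per channel.
    """
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


def max_tokens_per_s(bandwidth_gbs, cpu_resident_gb):
    """Rough upper bound on tokens/s if each token streams all
    CPU-resident weights from RAM once (ignores caches and compute)."""
    return bandwidth_gbs / cpu_resident_gb


# Hypothetical: 10 GB of weights did not fit in VRAM and stay in system RAM.
cpu_resident_gb = 10.0

for channels in (1, 2):
    bw = peak_bandwidth_gbs(1600, channels=channels)
    tps = max_tokens_per_s(bw, cpu_resident_gb)
    print(f"{channels} channel(s): {bw:.1f} GB/s peak, at most ~{tps:.2f} tokens/s")
```

Real throughput will land below these ceilings, but it shows why single versus dual channel matters so much once part of the model spills out of VRAM.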

Author Public Key

npub1w4jkwspqn9svwnlrw0nfg0u2yx4cj6yfmp53ya4xp7r24k7gly4qaq30zp

Daniel Wigton on Nostr: That tracks. If you are ram/cpu bound then you GPU is going to be doing a whole lot ...