Scott Williams 🐧 on Nostr
I got IBM's 34B Granite instruction model up and running on a box with a bunch of NVIDIA GPUs today, but when I based my setup on their demo code, it oddly claimed memory on the GPUs and then did all the processing on a single CPU thread, which took several minutes for one query. The query results were actually good, but this is about the most compute-expensive way to possibly get them.
Published at
2024-08-06 00:45:17
Event JSON
{
  "id": "ccaa83b05d4d7b92cf2a3ed63eff41ff1afab0171a43f4b37bd000383095b509",
  "pubkey": "beee28c6450eb5d21bf94279d1f3f41e524c87abf86393835b45e6a1e98390e7",
  "created_at": 1722905117,
  "kind": 1,
  "tags": [
    [
      "proxy",
      "https://mastodon.online/users/vwbusguy/statuses/112912309783056023",
      "activitypub"
    ]
  ],
  "content": "I got IBM's 34b granite instruction model up and running in a box with a bunch of nVidia GPUs today, but basing on their demo code it oddly pulled memory from the GPUs and then did all the processing on one cpu thread, which took several minutes for one query. The query results were actually good, but about the most compute expensive way to possibly do it.",
  "sig": "73633d8ab2f296ea568f6572b3554e040d8e0aa30502329bab1fd01c0e87544dc866ec7324f5c7bb34118fad2e90102f6efd92150a5bfac85668bb9e4b249e7d"
}
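
The event above can be read with a few lines of Python. This is a minimal sketch, not the code any particular Nostr client uses: the helper names `published_at` and `compute_event_id` are made up for illustration. It assumes `created_at` is a Unix timestamp in seconds and that the event `id` is the SHA-256 of the NIP-01 serialization `[0, pubkey, created_at, kind, tags, content]`. Whether the recomputed digest matches the `id` shown here has not been checked.

```python
import hashlib
import json
from datetime import datetime, timezone


def published_at(event: dict) -> str:
    """Render the event's created_at (Unix seconds) as a UTC timestamp string."""
    moment = datetime.fromtimestamp(event["created_at"], tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


def compute_event_id(event: dict) -> str:
    """Recompute an event id using the NIP-01 serialization (illustrative helper)."""
    payload = [
        0,
        event["pubkey"],
        event["created_at"],
        event["kind"],
        event["tags"],
        event["content"],
    ]
    # Compact JSON: no whitespace between tokens, non-ASCII left unescaped, UTF-8 bytes.
    serialized = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()
```

For this event, `published_at({"created_at": 1722905117})` returns `"2024-08-06 00:45:17"`, which agrees with the "Published at" line if that line is in UTC. To check the `id`, load the full event JSON with `json.loads` and compare `compute_event_id(event)` against `event["id"]`.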