BourbonicPlague on Nostr:
Running with the trained weights is cheap. Training new models that build on existing weights (fine-tuning) is relatively cheap.
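To make the first point concrete, here's a minimal sketch of the "run with the trained weights" case, using the Hugging Face diffusers library (the checkpoint id and prompt are just examples); inference like this fits on a single consumer GPU:

import torch
from diffusers import StableDiffusionPipeline

# Load published Stable Diffusion weights (example checkpoint id) and generate
# an image on a single GPU -- no training cluster involved.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe("an astronaut riding a horse on the moon").images[0]
image.save("astronaut.png")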
But Stable Diffusion was trained on 4K A100s, and LLaMA used 2,048 A100-80GB cards for training. GPT-4 is rumored to be 225B parameters, and in a stream a few months ago I think George Hotz mentioned how many GPUs he'd heard it took to train, but I can't remember the number.
Ideally the costs will drop even faster than Moore's law thanks to advances in training methods, but it's still very expensive to train new models from scratch.
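For a rough sense of scale on the from-scratch case: the LLaMA paper puts the 65B run at roughly 21 days on those 2,048 A100-80GB cards, and a back-of-envelope calculation at an assumed on-demand cloud rate (the price per GPU-hour below is just an assumption) already lands north of a million dollars for a single run:

# Back-of-envelope cost of one from-scratch training run. The GPU count is
# from the post, ~21 days is the LLaMA-65B figure from the paper, and the
# price per GPU-hour is an assumed cloud rate for illustration only.
gpus = 2048
days = 21
gpu_hours = gpus * days * 24          # ~1.03 million GPU-hours
usd_per_gpu_hour = 1.50               # assumed A100 on-demand rate
print(f"{gpu_hours:,} GPU-hours -> ~${gpu_hours * usd_per_gpu_hour:,.0f}")
# ~1,032,192 GPU-hours -> ~$1.5M, before counting failed runs and experiments.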
Published at 2023-07-12 22:45:18