BourbonicPlague on Nostr:
Running with the trained weights is cheap. Training new models that build on existing weights (fine-tuning) is relatively cheap.
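To make the first point concrete, here's a minimal sketch of the "run with the trained weights" case, using the Hugging Face diffusers library (the checkpoint id and prompt are just examples); inference like this fits on a single consumer GPU:

import torch
from diffusers import StableDiffusionPipeline

# Load published Stable Diffusion weights (example checkpoint id) and generate
# an image on a single GPU -- no training cluster involved.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe("an astronaut riding a horse on the moon").images[0]
image.save("astronaut.png")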
But Stable Diffusion was trained on 4K A100s, and LLaMA used 2,048 A100-80GB cards for training. GPT-4 is rumored to be 225B parameters, and in a stream a few months ago I think George Hotz mentioned how many GPUs he'd heard it took to train, but I can't remember the number.
Ideally the costs will drop even faster than Moore's law thanks to advances in training methods, but it's still very expensive to train new models from scratch.
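For a rough sense of scale on the from-scratch case: the LLaMA paper puts the 65B run at roughly 21 days on those 2,048 A100-80GB cards, and a back-of-envelope calculation at an assumed on-demand cloud rate (the price per GPU-hour below is just an assumption) already lands north of a million dollars for a single run:

# Back-of-envelope cost of one from-scratch training run. The GPU count is
# from the post, ~21 days is the LLaMA-65B figure from the paper, and the
# price per GPU-hour is an assumed cloud rate for illustration only.
gpus = 2048
days = 21
gpu_hours = gpus * days * 24          # ~1.03 million GPU-hours
usd_per_gpu_hour = 1.50               # assumed A100 on-demand rate
print(f"{gpu_hours:,} GPU-hours -> ~${gpu_hours * usd_per_gpu_hour:,.0f}")
# ~1,032,192 GPU-hours -> ~$1.5M, before counting failed runs and experiments.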
Published at 2023-07-12 22:45:18