Why Nostr? What is Njump?
2024-11-17 22:04:32
in reply to

Tom_Drummond on Nostr: nprofile1q…r59k6 - it’s only a factor of 2. You could probably write upper ...

- it’s only a factor of 2. You could probably write upper-triangular-only matrix multiplies to get it back if you cared.
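For what it's worth, here is a minimal NumPy sketch (my illustration, not code from the note) of where that factor of 2 comes from: under a causal mask, query i only scores keys 0..i, so only one triangle of QK^T is ever formed, n(n+1)/2 scores instead of n².

```python
import numpy as np

def causal_attention_rowwise(Q, K, V):
    # Causal attention computed row by row: query i scores only
    # keys 0..i, so just a triangle of QK^T is formed, roughly
    # half the multiply-adds of the dense n x n score matrix.
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.empty((n, V.shape[-1]), dtype=np.float64)
    scored = 0                            # count of scores computed
    for i in range(n):
        s = (Q[i] @ K[:i + 1].T) * scale  # i+1 scores, not n
        scored += i + 1
        p = np.exp(s - s.max())           # numerically stable softmax
        out[i] = (p / p.sum()) @ V[:i + 1]
    # scored == n*(n+1)//2, versus n*n for the dense product
    return out, scored
```

Whether the needed triangle is "upper" or "lower" depends on how you orient QK^T; the point is the same either way.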

There are also ‘fast’ transformer implementations that never materialize the attention matrix in RAM: they compute tiles of it and immediately fold each tile into the final returned value vectors (I think this is done by keeping a running logsumexp of the contributions as they go). These methods might also capture the factor-of-2 saving for causal attention.
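A minimal NumPy sketch of that tiled, online-softmax idea (my illustration, not code from the note): each key/value tile's scores are folded into a running output using a running max and running sum of exponentials (the logsumexp bookkeeping), so the full n×n attention matrix never exists in memory.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Reference implementation: materializes the full score matrix.
    S = (Q @ K.T) / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, tile=4):
    # Online-softmax attention: process K/V in tiles, keeping per-query
    # a running max m and running sum l of exp(score - m). Each new
    # tile rescales the old accumulator before adding its contribution.
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((n, V.shape[-1]), dtype=np.float64)
    m = np.full(n, -np.inf)              # running max of scores
    l = np.zeros(n)                      # running sum of exponentials
    for start in range(0, K.shape[0], tile):
        Kt, Vt = K[start:start + tile], V[start:start + tile]
        S = (Q @ Kt.T) * scale           # scores for this tile only
        m_new = np.maximum(m, S.max(axis=-1))
        alpha = np.exp(m - m_new)        # rescale old accumulator
        P = np.exp(S - m_new[:, None])
        l = l * alpha + P.sum(axis=-1)
        out = out * alpha[:, None] + P @ Vt
        m = m_new
    return out / l[:, None]              # normalize once at the end
```

The pair (m, l) is just a logsumexp carried incrementally: log(sum of exp(scores)) = m + log(l). The result matches the naive version up to floating-point error.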
Author Public Key
npub1g4ss3v8573z4aus0ytsq3ewah9cphk5p48ffnfz5hp8htsc9ndjqphtq97