Tom_Drummond on Nostr: nprofile1q…r59k6 - it’s only a factor of 2. You could probably write upper ...
Published at 2024-11-17 22:04:32

Event JSON
{
"id": "281deba7395c9a2086d294735bf3a7a2a6ecfbf2b98cced356f05b116c98bdf7",
"pubkey": "456108b0f4f4455ef20f22e008e5ddb9701bda81a9d299a454b84f75c3059b64",
"created_at": 1731881072,
"kind": 1,
"tags": [
[
"p",
"c176a55ed8a5f4240dd6154f81df0176998ba869d48bef575c47e33c9207d4b2",
"wss://relay.mostr.pub"
],
[
"p",
"3422fcbc32f333fb2d3481b2e981258af8a0b571869cbfe93c42962410e232ef",
"wss://relay.mostr.pub"
],
[
"e",
"13148e37e4017c9a5a902ffd4b7e39a6cc49a66b080dd5a68763e43be69eb448",
"wss://relay.mostr.pub",
"reply"
],
[
"proxy",
"https://mathstodon.xyz/users/Tom_Drummond/statuses/113500557960731433",
"activitypub"
]
],
"content": "nostr:nprofile1qy2hwumn8ghj7un9d3shjtnddaehgu3wwp6kyqpqc9m22hkc5h6zgrwkz48crhcpw6vch2rf6j97746ugl3neys86jeqcr59k6 - it’s only a factor of 2. You could probably write upper triangular only matrix mults to get it back if you cared.\n\nThere’s also ‘fast’ transformers implementations that never materialize the attention matrix in RAM - they compute tiles and immediately use these to compute contributions to the final returned value vectors (I think this is done by keeping track of the logsumexp of the contributions as they go) - these methods might also make the factor of 2 saving for causual attention.",
"sig": "5ff1d1944d22700ffe89060412dbcaf1f53afa4682324e44aec890d3e6efa99b0d804817012255fc0d86438b67cfaf407577206e416c2a302c72eb0113cc987c"
}
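
For illustration, here is a minimal NumPy sketch of the tiled scheme the note describes: score tiles are folded in one at a time while a running per-row max and log-sum-exp are kept, so the full attention matrix is never materialized, and tiles that lie entirely in the masked-out future are skipped, which is where the roughly factor-of-2 saving for causal attention comes from. The function name, tile size, and shapes are made up for illustration; this is a sketch of the idea, not any particular library's kernel.

import numpy as np

def tiled_causal_attention(Q, K, V, tile=64):
    # Hypothetical sketch of tile-wise causal attention with an online softmax.
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(V, dtype=np.float64)
    m = np.full(n, -np.inf)   # running per-row max of the scores
    l = np.zeros(n)           # running sum of exp(score - m); m + log(l) is the running logsumexp
    for qs in range(0, n, tile):
        qe = min(qs + tile, n)
        for ks in range(0, n, tile):
            ke = min(ks + tile, n)
            if ks >= qe:      # tile is entirely in the masked-out future: skip it
                break
            S = (Q[qs:qe] @ K[ks:ke].T) * scale            # one tile of scores
            qi = np.arange(qs, qe)[:, None]
            kj = np.arange(ks, ke)[None, :]
            S = np.where(kj <= qi, S, -np.inf)             # causal mask inside a diagonal tile
            m_new = np.maximum(m[qs:qe], S.max(axis=1))
            alpha = np.exp(m[qs:qe] - m_new)               # rescale what was accumulated so far
            P = np.exp(S - m_new[:, None])
            l[qs:qe] = l[qs:qe] * alpha + P.sum(axis=1)
            out[qs:qe] = out[qs:qe] * alpha[:, None] + P @ V[ks:ke]
            m[qs:qe] = m_new
    return out / l[:, None]

A quick check against a naive masked-softmax reference (random data, shapes chosen arbitrarily):

rng = np.random.default_rng(0)
n, d = 200, 16
Q, K, V = rng.standard_normal((3, n, d))
S = (Q @ K.T) / np.sqrt(d)
S[np.triu_indices(n, k=1)] = -np.inf
P = np.exp(S - S.max(axis=1, keepdims=True))
ref = (P / P.sum(axis=1, keepdims=True)) @ V
assert np.allclose(tiled_causal_attention(Q, K, V, tile=64), ref)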