**RT @SebastienBubeck:**
We trained a small transformer (100M params) for basic arithmetic. With the right training data it nails 12×12-digit multiplication without CoT (that's 10^24 possibilities, so no, it's not memorization🤣).
Maybe arithmetic is not the LLM kryptonite after all?🤔
https://arxiv.org/abs/2311.14737
https://nitter.moomoo.me/SebastienBubeck/status/1729517609669030071#m
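A quick sanity check on the 10^24 figure (my own back-of-envelope sketch, not code from the paper): each 12-digit operand has 9×10^11 possible values, so there are roughly 8.1×10^23 ≈ 10^24 ordered operand pairs. Comparing that against an assumed, illustrative training-set size shows why memorization is off the table.

```python
# Back-of-envelope check of the "10^24 possibilities" claim.
# TRAIN_EXAMPLES is an assumed, illustrative figure, not from the paper.
TRAIN_EXAMPLES = 10**8

twelve_digit_numbers = 9 * 10**11   # 100000000000 .. 999999999999
pairs = twelve_digit_numbers ** 2   # ordered (a, b) operand pairs

print(f"12-digit operands:  {twelve_digit_numbers:.1e}")        # ~9.0e+11
print(f"operand pairs:      {pairs:.1e}")                       # ~8.1e+23, i.e. ~10^24
print(f"fraction seen in training: {TRAIN_EXAMPLES / pairs:.1e}")  # ~1.2e-16
```

Even a generous training set covers a vanishing fraction of the input space, so correct answers on held-out pairs must come from a learned algorithm rather than lookup.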