**RT @SebastienBubeck:** We trained a small transformer (100M params) for basic ...

2023-11-28 15:08:25

**RT @SebastienBubeck:**

We trained a small transformer (100M params) for basic arithmetic. W. the right training data it nails 12x12 digits multiplication w/o CoT (that's 10^24 possibilities, so no it's not memorization🤣).
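
A quick sanity check of the 10^24 figure, assuming it counts ordered pairs of 12-digit operands (that reading is an interpretation; the post does not spell it out). The helper name `operand_pair_count` is hypothetical and used only for illustration:

```python
def operand_pair_count(digits: int, allow_leading_zeros: bool = True) -> int:
    """Count ordered (a, b) operand pairs where each operand has `digits` digits."""
    if allow_leading_zeros:
        # Every digit position can be 0-9.
        per_operand = 10 ** digits
    else:
        # First digit is 1-9, the remaining positions are 0-9.
        per_operand = 9 * 10 ** (digits - 1)
    return per_operand ** 2


total = operand_pair_count(12)
print(total == 10 ** 24)  # True

# Stricter count that excludes operands with leading zeros.
strict = operand_pair_count(12, allow_leading_zeros=False)
print(strict == 81 * 10 ** 22)  # True
```

Either way the input space is on the order of 10^24 pairs, far more than any training set could enumerate.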

Maybe arithmetic is not the LLM kryptonite after all?🤔

https://arxiv.org/abs/2311.14737

https://nitter.moomoo.me/SebastienBubeck/status/1729517609669030071#m

Author Public Key

npub1dp38ul43a35vhday6vsq9auncaeg03anjjwqyya7jrvg9w07c7tsfzkjg4

Mark Russinovich / @markrussinovich (RSS Feed) on Nostr: **RT @SebastienBubeck:** We trained a small transformer (100M params) for basic ...