npub1hs…m7r3k

2025-03-10 17:46:18

in reply to nevent1q…ms8g

nprofile1qy2hwumn8ghj7un9d3shjtnddaehgu3wwp6kyqpqm3fp0m7fjdsd58dhs39sfvz4d45zsxtp4c362avjw4a869kr453qx8g23r Actually, I think the problem of untrustworthy sources on the internet (particularly on social media) predates generative AI, though certainly AI bots and "deepfake" images exacerbate the issue. With or without generative AI, it has become increasingly important to know how to independently verify information.

In the specific realm of pure mathematics, though, there is a potential solution to this problem: directing generative AI output to pass through a formal proof assistant to obtain a guarantee of correctness. At present, the experiments in this direction are only capable of resolving low-level undergraduate problems (such as computing a definite integral) by this approach, and it is still not clear whether the high-level conceptual component of an LLM-generated answer to a mathematical question can be captured by such formal languages; but I would imagine that requiring the LLM to formally verify at least some of the finer details of its output would significantly increase its broader reliability. (A similar phenomenon has already been observed in LLM-based solutions to Math Olympiad-type challenges, in which models that do not directly attempt to answer the question, but instead create code in a more reliable language such as Python to solve the problem, significantly outperform pure LLM models.)
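As a rough illustration of the "let code settle the finer details" idea, here is a minimal Python sketch that checks a claimed value of a definite integral against an exact rational computation. It is not a formal proof assistant and gives no proof-level guarantee; the function names `integrate_polynomial` and `check_claim`, the polynomial, and the "claimed" answers are all hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def integrate_polynomial(coeffs, a, b):
    """Exactly integrate sum(coeffs[k] * x**k) over [a, b] with rational arithmetic."""
    a, b = Fraction(a), Fraction(b)
    total = Fraction(0)
    for k, c in enumerate(coeffs):
        # The antiderivative of c * x^k is c * x^(k+1) / (k+1).
        total += Fraction(c) * (b ** (k + 1) - a ** (k + 1)) / (k + 1)
    return total


def check_claim(coeffs, a, b, claimed):
    """Return True only if the claimed value equals the exact integral."""
    return integrate_polynomial(coeffs, a, b) == Fraction(claimed)


# Hypothetical claims about the integral of 3x^2 + 2x + 1 over [0, 2].
# Coefficients are listed from the constant term upward.
coeffs = [1, 2, 3]
exact = integrate_polynomial(coeffs, 0, 2)
accepted = check_claim(coeffs, 0, 2, 14)   # a correct claim passes the check
rejected = check_claim(coeffs, 0, 2, 12)   # an incorrect claim is caught
```

The point of the design is that the answer is accepted only when an independent, exact computation agrees with it, so an arithmetic slip in generated text is caught rather than trusted. A proof assistant plays the analogous role with much stronger guarantees.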

Author Public Key

npub1hsf727dlfy55vvm5wuqwyh457uwsc24pxn5f7vxnd4lpvv8phw3sjm7r3k

Published at

2025-03-10 17:46:18

Kind type

1 Short Text Note

Event JSON

{ "id": "1ff747eef520fcced5c8e58130c1d04ed47be8899199af1f05cce39b150f427f", "pubkey": "bc13e579bf49294633747700e25eb4f71d0c2aa134e89f30d36d7e1630e1bba3", "created_at": 1741628778, "kind": 1, "tags": [ [ "p", "dc5217efc99360da1db7844b04b0556d68281961ae23a57592757a7d16c3ad22", "wss://relay.mostr.pub" ], [ "p", "56ac9127866e574201fad1df3807f487841c07b7a9719b69fb4e45154514b86b", "wss://relay.mostr.pub" ], [ "e", "7cd3c301d165c5f377b97ea7860cb6554d2bfddc886088d13891db1d1da4ec8b", "wss://relay.mostr.pub", "reply" ], [ "proxy", "https://mathstodon.xyz/users/tao/statuses/114139383627918505", "activitypub" ] ], "content": "nostr:nprofile1qy2hwumn8ghj7un9d3shjtnddaehgu3wwp6kyqpqm3fp0m7fjdsd58dhs39sfvz4d45zsxtp4c362avjw4a869kr453qx8g23r Actually, I think the problem of untrustworthy sources on the internet (particularly on social media) predates generative AI, though certainly AI bots and \"deepfake\" images exacerbate the issue. With or without generative AI, it has become increasingly important to know how to independently verify information.\n\nIn the specific realm of pure mathematics, though, there is a potential solution to this problem by directing generative AI output to pass through a formal proof assistant to obtain a guarantee of correctness. At present, the experiments in this direction are only capable of resolving low-level undergraduate problems (such as computing a definite integral) by this approach, and it is still not clear whether the high-level conceptual component of a LLM-generated answer to a mathematical question can be captured by such formal languages; but I would imagine that requiring the LLM to formally verify at least some of the finer details of their output would significantly increase their broader reliability. 
(A similar phenomenon has already been observed in LLM-based solutions to Math Olympiad type challenges, in which models which do not directly attempt to answer the question, but instead create code in a more reliable language such as Python to solve the problem, significantly outperform pure LLM models.)", "sig": "9f6cfcb5009459fee32900de3fdfd735c7a8c5c3e87dbfb40db0baf7688312d8a79841b396f8209014ac1461dcc530e838b6cf24a85903b6b00d497a1c6343c4" }