José A. Alonso on Nostr: DeepSeek-Prover-V1.5: Harnessing proof assistant feedback for reinforcement learning ...
DeepSeek-Prover-V1.5: Harnessing proof assistant feedback for reinforcement learning and Monte-Carlo tree search. ~ Huajian Xin et al.
https://www.arxiv.org/abs/2408.08152 #ITP #Lean4
Published at
2024-08-16 14:38:20
Event JSON
{
  "id": "ae2745dea4bf2c1dd7e1d967c834ca8a54450b6d3160d90623cd9bd1759f6b6f",
  "pubkey": "0efb7bc903f4c6716cd4d07830d344d7abe5b607a156de3cde1ac1a5bf22ae1c",
  "created_at": 1723819100,
  "kind": 1,
  "tags": [
    [
      "t",
      "Lean4"
    ],
    [
      "t",
      "itp"
    ],
    [
      "proxy",
      "https://mathstodon.xyz/users/Jose_A_Alonso/statuses/112972208592378503",
      "activitypub"
    ]
  ],
  "content": "DeepSeek-Prover-V1.5: Harnessing proof assistant feedback for reinforcement learning and Monte-Carlo tree search. ~ Huajian Xin et als. https://www.arxiv.org/abs/2408.08152 #ITP #Lean4",
  "sig": "ceae6653e8d6876691251a20a3b54a8cf824e5c8daa34fbc5b37c52b265c0cf9f827514e2ddfd47c2f29088410431ae5254482565e558f557e8c70585c28c173"
}
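
The "created_at" field in the event JSON is a Unix timestamp in seconds, and it appears to be the same instant as the "Published at" time shown above. A minimal Python sketch of that conversion, assuming the page renders the time in UTC (the variable names are illustrative, not part of the event):

```python
from datetime import datetime, timezone

# "created_at" copied from the event JSON (Unix time, in seconds).
created_at = 1723819100

# Interpret the timestamp as UTC; the UTC rendering is an assumption
# about how the page displays "Published at".
published = datetime.fromtimestamp(created_at, tz=timezone.utc)

print(published.strftime("%Y-%m-%d %H:%M:%S"))  # 2024-08-16 14:38:20
```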