2024-07-06 10:07:12

Brandon Rohrer on Nostr

Each time step, it gets a reward based on the height of the pendulum, ranging from zero when it’s at the bottom to two when it’s at the top.
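One reward that behaves this way is the height of the pendulum tip above its lowest point. The sketch below is hypothetical and assumes a unit-length pendulum with the angle `theta` measured in radians from the hanging-down position. Under those assumptions the height is `1 - cos(theta)`, which runs from 0 at the bottom to 2 at the top. The post doesn't give the actual formula, so `pendulum_reward` is an illustrative name, not the author's code.

```python
import math


def pendulum_reward(theta: float, length: float = 1.0) -> float:
    """Hypothetical per-step reward: height of the tip above its lowest point.

    Assumes theta is in radians, measured from the hanging-down position,
    so theta = 0 is the bottom and theta = pi is the top. For a unit-length
    pendulum the result ranges from 0 (bottom) to 2 (top).
    """
    return length * (1.0 - math.cos(theta))


# Evaluate the reward at a few positions around the circle.
for label, angle in [("bottom", 0.0), ("horizontal", math.pi / 2), ("top", math.pi)]:
    print(f"{label:>10}: {pendulum_reward(angle):.3f}")
```

Because the reward grows smoothly with height, it gives partial credit for swinging higher, not only for reaching the top.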

By the time it reaches a thousand episodes, it’s performing near-optimally, with an average reward of 1.96 per time step, a figure that includes spinning up from the bottom.

That represents 1 million time steps of learning at four time steps per second, or about three days at 1x speed.
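Here is the arithmetic behind "about three days." It assumes 1,000 time steps per episode, a figure inferred from 1 million steps spread over a thousand episodes. The helper `training_wall_clock_days` is a hypothetical name used only for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def training_wall_clock_days(episodes: int, steps_per_episode: int,
                             steps_per_second: float) -> tuple[int, float]:
    """Return (total time steps, wall-clock days at real-time speed)."""
    total_steps = episodes * steps_per_episode
    seconds = total_steps / steps_per_second
    return total_steps, seconds / SECONDS_PER_DAY


# Assumed: 1,000 episodes of 1,000 steps each, stepped 4 times per second.
steps, days = training_wall_clock_days(1_000, 1_000, 4)
print(f"{steps:,} steps -> {days:.2f} days at 1x speed")
```

This prints `1,000,000 steps -> 2.89 days at 1x speed`: 250,000 seconds divided by 86,400 seconds per day, which rounds to about three days.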
Author Public Key
npub1jh4qsxnz0nhyfefjsfvcdmxxvgfe6p5vf0dvh6pq4r6ytwwxcp4sl9eag0