COMMENT PAGE FOR:
Deep Reinforcement Learning: Zero to Hero
mode80 wrote 20 hours 49 min ago:
Thanks for making this!
Note: I was carefully reading along and well into the third notebook
before I realized that the code sections marked "TODO" were actual
exercises for the reader to implement! (And the tests which follow are
for the reader to check their work.)
This is a clever approach. It just wasn't obvious to me from the
outset.
(I thought the TODOs were just some fiddly details you didn't want
distracting readers from the big picture. But in fact, those are the
important parts.)
alessiodm wrote 10 hours 42 min ago:
Great feedback, I hadn't even considered that the TODOs could be
confusing! I updated the instructions in the README.md to call
them out explicitly as the coding sections to be completed. Thanks
again!
jezzamon wrote 21 hours 45 min ago:
Awesome, I've been sort of stuck in the limbo of doing courses that
taught me some theory but missing the hands-on knowledge I need to
really use RL. This looks like exactly the type of course I'm looking
for!
alessiodm wrote 18 hours 22 min ago:
Thank you! I'll be curious to hear if / how these notebooks help and
how your experience goes! Any feedback welcome!
wegfawefgawefg wrote 21 hours 49 min ago:
A few years ago I made something similar. It doesn't go all the way to
PPO, and has a different style. [1] I won't claim it is better or
worse, but if anyone here is trying to learn, having the same
information presented in multiple forms is always nice.
[1]: https://learndrl.com/
dukeofdoom wrote 22 hours 5 min ago:
Maybe I can use this in my pygame game
fancyfredbot wrote 22 hours 18 min ago:
This is really nice, great idea. I am going to make a suggestion which
I hope is helpful - I don't mean to be critical of this nice project.
After going through the MDP example, I have one comment on the way you
introduce the non-deterministic transition function. In your example
the non-determinism comes from the agent making "mistakes": it can
mistakenly go left or right when trying to go up or down.
1) You could introduce the mistakes more clearly, as the text doesn't
really explain that the agent makes mistakes, so the comment about
mistakes in the transition() function is initially a bit confusing.
2) I think the way this introduces non-determinism could be more
didactic if the non-determinism came from the environment, not the
agent. For example, the agent might be moving on a rough surface, and
moving its tracks/limbs/whatever might not always produce the intended
outcome. As you present it, the transition is a function from an action
to a random action to a random state, whereas the definition is just a
function from an action to a random state.
alessiodm wrote 10 hours 47 min ago:
Thank you so much for this feedback! Indeed, this is definitely
confusing in the notebook. I pushed a small commit to make it a
little bit clearer that the non-determinism comes from the
probabilistic nature of the environment dynamics (and not because the
agent chooses a different action by mistake).
As a side note, I initially meant to go through it in a video to fill
the gaps in the text with my voice. But given that I didn't have time
for those, I am fixing those gaps first :) Thanks again!
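To sketch the idea (hypothetical code, not the notebook's actual cell):
the slip probability lives in the environment's transition function,
which maps an action directly to a random next state.

    import random

    # Hypothetical sketch: the environment's dynamics, not the agent,
    # inject the randomness. With probability slip_prob the move "slips"
    # to a perpendicular direction, e.g. because the surface is rough.
    ACTIONS = {"up": (0, -1), "down": (0, 1),
               "left": (-1, 0), "right": (1, 0)}
    PERPENDICULAR = {"up": ("left", "right"), "down": ("left", "right"),
                     "left": ("up", "down"), "right": ("up", "down")}

    def transition(state, action, slip_prob=0.2):
        """Map (state, action) directly to a random next state."""
        if random.random() < slip_prob:
            action = random.choice(PERPENDICULAR[action])
        dx, dy = ACTIONS[action]
        return (state[0] + dx, state[1] + dy)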
malomalsky wrote 1 day ago:
Is there anything like that, but for NLP?
spmurrayzzz wrote 20 hours 7 min ago:
There is an NLP section in Jeremy Howard's "Practical Deep Learning
for Coders" course (free): [1] The whole course is fantastic. I
recommend it frequently to folks who want to start with DL basics and
ramp up quickly to more advanced material.
[1]: https://course.fast.ai/Lessons/lesson4.html
alessiodm wrote 22 hours 45 min ago:
I took the Deep Learning course [1] by deeplearning.ai in the past,
and their resources were incredibly good IMHO. Hence, I would
suggest taking a look at their NLP specialization [2].
+1000 to "Neural networks: zero to hero" already mentioned as well.
[1]: https://www.deeplearning.ai/courses/deep-learning-specializa...
[2]: https://www.deeplearning.ai/courses/natural-language-process...
barrenko wrote 1 day ago:
There's the series this material references - "Neural networks: zero
to hero" - which has GPT-related parts.
chaosprint wrote 1 day ago:
Great resources! Thank you for making this.
I'm attaching here a DRL framework I made for music generation, similar
to OpenAI Gym. If anyone wants to test the algorithms OP includes, you
are welcome to use it. Issues and PRs are also welcome.
[1]: https://github.com/chaosprint/RaveForce
levocardia wrote 1 day ago:
This looks great - maybe add a link to the YouTube videos in the
README?
alessiodm wrote 1 day ago:
Thank you so much! Unfortunately, that is a mistake in the README
that I just noticed (thank you for pointing it out!) :( As I
mentioned in the first post, I didn't get to make the YouTube videos
yet. But it seems the community would indeed be interested.
I will try to get to them (and in the meantime fix the README, sorry
about that!)
bluishgreen wrote 1 day ago:
"Shamelessly stole the title from a hero of mine". Your shamelessness
is all fine. But at first I thought this was a post from Andrej
Karpathy. He has one of the best personal brands out there on the
internet; while personal brands can't be enforced, this confused me at
first.
alessiodm wrote 1 day ago:
TL;DR: If more folks feel this way, please upvote this comment: I'll
be happy to take down this post, change the title, and either re-post
it or not - the GitHub repo is out there, and that should be
more than enough. Sorry again for the confusion (I just upvoted it).
I am deeply sorry about the confusion. The last thing I intended
was to grab any attention away from Andrej, and / or to be confused
with him.
I tried to find a way to edit the post title, but I couldn't find
one. Is there just a limited time window to do that? If you know how
to do it, I'd be happy to edit it right away.
I didn't even think this post would get any attention at all - it is
indeed my first post here, and I really did it just because if anybody
could use this project to learn RL, I was happy to share.
ultra_nick wrote 1 day ago:
Didn't "Zero to Hero" come from Disney's Hercules movie before
Karpathy used it?
alessiodm wrote 1 day ago:
Didn't know that, but now I have an excuse to go watch a movie :D
gradascent wrote 1 day ago:
I didn't find it confusing at all. I think it's totally ok to
re-use phrasing made famous by someone else - this is how language
evolves after all.
alessiodm wrote 1 day ago:
Thank you, I appreciate it.
FezzikTheGiant wrote 1 day ago:
This is a great resource nonetheless. Even if you did use the name
to get attention, how does it matter? I still see it as a net
positive. Thanks for sharing this.
alessiodm wrote 1 day ago:
Thank you!
khiner wrote 1 day ago:
Throwing in my vote - I wasn't confused. I saw your GH link and a
"Zero to Hero" course name on RL; seems clear to me, and "Zero
to Hero" is a classic title for a first course. Nice that you
gave props to Andrej too! Multiple people can and should make ML
guides and reference each other. Thanks for putting in the time to
share your learnings and make a fantastic resource out of it!
alessiodm wrote 1 day ago:
Thanks a lot. It makes me feel better to hear that the post is
not completely confusing and appropriating - I really didn't mean
that, or to use it as a trick for attention.
zaptrem wrote 1 day ago:
I spent three semesters in college learning RL only to be massively
disappointed in the end after discovering that the latest and greatest
RL techniques can't even beat a simple heuristic in Tetris.
vineyardlabs wrote 20 hours 17 min ago:
RL seems to be in this weird middle ground right now where nobody
knows how to make it work all that well but almost everybody at the
top levels of ML research agrees it's a vital component of further
advances in AI.
jmward01 wrote 1 day ago:
I modeled part of my company's business problem as a MAB problem and
saved my company 10% off their biggest cost and, just as important,
showcased an automated truth signal that helped us understand what
was, and wasn't, working in several of our features. Like all tools,
finding the right place to use RL concepts is a big deal. I think one
thing that is often missed in a classroom setting is pushing more
real-world examples of where powerful ideas can be used. Talking
about optimal policies is great, but if you don't help people
understand where those ideas can be applied, then it is just a bunch
of fun math (which is often a good enough reason on its own :)
smokel wrote 1 day ago:
For those not in the know, "MAB" is short for Multi-Armed Bandit
[1], which is a decision-making framework that is often discussed
in the broader context of reinforcement learning.
In my limited understanding, MAB problems are simpler than those
tackled by Deep Reinforcement Learning (DRL), because typically
there is no state involved in bandit problems. However, I have no
idea about their scale in practical applications, and would love to
know more about said business problem.
[1]: https://en.wikipedia.org/wiki/Multi-armed_bandit
jmward01 wrote 22 hours 39 min ago:
There are often times when you have n possible providers of
service y, each with strengths and weaknesses. If you have some
ultimate truth signal (like follow-on costs, which are linked to
quality, which was what I used) then you can model the providers
as bandits and use something like UCB1 to choose which to use. If
you then apply this to every individual customer, what you end up
doing is learning the optimal vendor for each customer, which
gives you higher efficiency than if you had picked just one 'best
all around' vendor for all customers. So the pattern here is: if
you have n_service_providers and n_customers and a value signal
to optimize, then maybe MAB is the place to go for some possible
quick gains. Of course, if you have a huge state space to explore
instead of just n_service_providers - for instance, you want to
model combinations of choices - using something like a NN to learn
the state-space value function is also a great way to go.
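To make the pattern concrete, here is a minimal UCB1 sketch. `pull` is
a hypothetical callback that routes one request to a vendor and returns
the observed reward (e.g., negative follow-on cost); in the per-customer
variant you would keep one such bandit per customer.

    import math

    def ucb1(n_vendors, pull, n_rounds=10_000):
        counts = [0] * n_vendors      # times each vendor was chosen
        totals = [0.0] * n_vendors    # cumulative reward per vendor
        for t in range(1, n_rounds + 1):
            if t <= n_vendors:
                arm = t - 1           # play every vendor once first
            else:
                # empirical mean plus an exploration bonus that shrinks
                # as a vendor gets sampled more often
                arm = max(range(n_vendors),
                          key=lambda i: totals[i] / counts[i]
                          + math.sqrt(2 * math.log(t) / counts[i]))
            reward = pull(arm)
            counts[arm] += 1
            totals[arm] += reward
        return max(range(n_vendors), key=lambda i: totals[i] / counts[i])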
alessiodm wrote 1 day ago:
RL can be massively disappointing, indeed. And I agree with you (and
with the amazing post I already referenced [1]) that it is hard to
get it to work at all. Sorry to hear you have been disappointed so
much!
Nonetheless, I would personally recommend learning at least the
basics and fundamentals of RL. Beyond supervised, unsupervised, and
the most recent and well-deservedly hyped semi-supervised learning
(generative AI, LLMs, and so on), reinforcement learning models
the learning problem in a very elegant way: an agent
interacting with an environment and getting feedback. Which is,
arguably, a very intuitive and natural way of modeling it. You could
consider backward error correction / propagation as an implicit
reward signal, but that would be a very limited view.
On a positive note, RL has very practical successful applications
today - even if in niche fields. For example, LLM fine-tuning
techniques like RLHF successfully apply RL to modern AI systems,
companies like Covariant are working on large robotics models which
definitely use RL, and generally as a research field I believe (but I
may be proven wrong!) there is so much more to explore. For example,
check out Nvidia Eureka, which combines LLMs with RL [2]: pretty cool
stuff IMHO!
Far from attempting to convince you of the strength and capabilities
of DRL, I am just recommending that folks not discard it right away
and at least give the basics a chance, even just as an intellectual
exercise :) Thanks again!
[1]: https://www.alexirpan.com/2018/02/14/rl-hard.html
[2]: https://blogs.nvidia.com/blog/eureka-robotics-research/
achandra03 wrote 1 day ago:
This looks really interesting! I tried exploring deep RL myself some
time ago but could never get my agents to make any meaningful progress,
and as someone with very little stats/ML background it was difficult to
debug what was going wrong. Will try following this and seeing what
happens!
barrenko wrote 1 day ago:
I mean, resources like these are great, but RL in itself is quite
dense and topic-heavy, so I'm not sure there is any way to reduce the
inherent difficulty level; any beginner should be made aware of that.
That's my primary gripe with ML topics (especially RL-related ones).
alessiodm wrote 22 hours 31 min ago:
Thank you. It is true, the material does indeed assume some prior
knowledge (which I mention in the introduction). In particular:
being proficient in Python, or at least in one high-level
programming language; being familiar with deep learning and neural
networks; and - to get into the (optional) theory and mathematics -
basic calculus, algebra, statistics, and probability theory.
Nonetheless, especially for RL foundations, I found that a
practical understanding of the algorithms at a basic level, writing
them yourself, and "playing" with them and their results
(especially in small toy settings like the grid world) provided the
best way to start building a basic intuition in the field. Hence,
this resource :)
alessiodm wrote 1 day ago:
Thank you very much! I'd be really interested to know if your agents
eventually make progress, and if these notebooks help - even if
a tiny bit!
If you just want to see whether these algorithms can work at all, feel
free to jump into the `solution` folder, pick any algorithm you
think could work, and just try it out there. If it does, then you can
have all the fun of rewriting it from scratch :) Thanks again!
viraptor wrote 1 day ago:
In case you want to expand to more chapters one day: there are lots of
tutorials on doing the simple things that have been verified to work,
but when I'm struggling it's normally with something people barely ever
mention - what to do when things go wrong. For example, your actions
just consistently get stuck at the maximum. Or the exploration doesn't
kick in, regardless of how noisy you make the off-policy training. Or ...
I wish there were more practical resources for when you've got the
basics mostly working, but suddenly hit issues nobody really talks
about (beyond "just tweak some stuff until it works", anyway).
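To be concrete about the kind of thing I mean, a couple of cheap
diagnostics (a sketch with hypothetical helper names, nothing standard)
that at least make those failure modes show up in the training logs:

    import numpy as np

    def action_saturation(actions, low, high, tol=1e-3):
        """Fraction of continuous actions pinned at the action-space
        bounds. A value near 1.0 means the policy is stuck at the
        limits."""
        a = np.asarray(actions)
        return float(np.mean((a <= low + tol) | (a >= high - tol)))

    def mean_policy_entropy(probs):
        """Average entropy of a batch of discrete action distributions.
        Entropy collapsing toward zero early in training means
        exploration never kicked in."""
        p = np.asarray(probs)
        return float(np.mean(-np.sum(p * np.log(p + 1e-12), axis=-1)))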
alessiodm wrote 1 day ago:
Thanks a lot, and another great suggestion for improvement. I also
found that the common advice is "tweak hyperparameters until you find
the right combination". That can definitely help. But usually issues
hide in different "corners": the problem space and its
formulation, the algorithm itself (e.g., different random seeds
alone can produce big variance in performance), and more.
As you mentioned, in real applications of DRL things tend to go wrong
more often than right: "it doesn't work just yet" [1]. And my short
tutorial definitely falls short in the areas of troubleshooting,
tuning, and "productionisation". If I carve out time for an expansion,
this will likely be at the top of the list. Thanks again.
[1]: https://www.alexirpan.com/2018/02/14/rl-hard.html
ubj wrote 1 day ago:
Thanks for sharing [1], that was a great read. I'd be curious to
see an updated version of that article, since it's about 6 years
old now. For example, Boston Dynamics has transitioned from MPC to
RL for controlling its Spot robots [2]. Davide Scaramuzza, whose
team created autonomous FPV drones that beat expert human pilots,
has also discussed how his team had to transition from MPC to RL
[3].
[2]: https://bostondynamics.com/blog/starting-on-the-right-foot...
[3]: https://www.incontrolpodcast.com/1632769/13775734-ep15-dav...
alessiodm wrote 1 day ago:
Thank you for the amazing links as well! You are right that the
article [1] is 6 years old now, and indeed the field has evolved.
But the algorithms and techniques I share in the GitHub repo are
the "classic" ones (dating back to then too), for which that post is
still relevant - at least from a historical perspective.
You bring up a very good point though: more recent advancements
and assessments should be linked and/or mentioned in the repo
(e.g., in the resources and/or an appendix). I will try to do
that sometime.
alessiodm wrote 1 day ago:
While trying to learn the latest in Deep Reinforcement Learning, I was
able to take advantage of many excellent resources (see credits [1]),
but I couldn't find one that provided the right balance between theory
and practice for my personal experience. So I decided to create
something myself, and open-source it for the community, in case it
might be useful to someone else.
None of this would have been possible without all the resources listed
in [1], but I rewrote all the algorithms in this series of Python
notebooks from scratch, with a "pedagogical approach" in mind. It is a
hands-on, step-by-step tutorial about Deep Reinforcement Learning
techniques (up to ~2018/2019 SoTA), guiding you through theory and
coding exercises on the most utilized algorithms (Q-Learning, DQN,
SAC, PPO, etc.)
I shamelessly stole the title from a hero of mine, Andrej Karpathy, and
his "Neural Networks: Zero to Hero" [2] work. I also meant to work on a
series of YouTube videos, but didn't have the time yet. If this post
gets any type of interest, I might go back to it. Thank you.
P.S.: A friend of mine suggested I post here, so I followed their
advice: this is my first post, and I hope it properly abides by the
rules of the community.
[1]: https://github.com/alessiodm/drl-zh/blob/main/00_Intro.ipynb
[2]: https://karpathy.ai/zero-to-hero.html
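To give a flavor of where the notebooks start, here is a minimal
tabular Q-Learning sketch using the same Gymnasium API (illustrative
hyperparameters, not the notebooks' actual settings):

    import numpy as np
    import gymnasium as gym

    env = gym.make("FrozenLake-v1")
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    alpha, gamma, epsilon = 0.1, 0.99, 0.1  # lr, discount, exploration

    for episode in range(5_000):
        state, _ = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # bootstrap from the best next action; none on terminal states
            target = reward + gamma * np.max(Q[next_state]) * (not terminated)
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state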
fancyfredbot wrote 12 hours 8 min ago:
I've gone through the first three notebooks today and enjoyed them a
lot. It's the first time I've tried the Atari Gymnasium environments,
and that was really satisfying and fun. Thank you.
alessiodm wrote 9 hours 37 min ago:
Really happy to hear you enjoyed the notebooks! And thank you very
much for the patch to simulate_mdp for the cliff world!
tunnuz wrote 1 day ago:
Does it rely heavily on Python, or could someone use a different
language to go through the material?
alessiodm wrote 22 hours 37 min ago:
Yes, the material relies heavily on Python. I intentionally used
popular open-source libraries (such as Gymnasium for RL
environments, and PyTorch for deep learning) and Python itself
given their popularity in the field, so that the content and
learnings could be readily applicable to real-world projects.
The theory and algorithms per se are general: they can be
re-implemented in any language, as long as there are comparable
libraries to use. But the notebooks are primarily in Python; the
(attempted) "frictionless" learning experience would lose a bit
if the setup were in a different language, and it'll likely take a
little more effort to follow along.
verdverm wrote 1 day ago:
Very cool, thanks for putting this together.
It would be great to see a page dedicated to SoTA techniques &
results.
alessiodm wrote 1 day ago:
Thank you so much! And very good advice: I have an extremely brief
and non-descriptive list in the "Next" notebook, initially intended
for that. But it definitely falls short.
I may actually expand it in a second, "more advanced" series of
notebooks, to explore model-based RL, curiosity, and other recent
topics: even if not comprehensive, some hands-on basic coding
exercises on those topics might be of interest nonetheless.