COMMENT PAGE FOR:
Deep Reinforcement Learning: Zero to Hero
mode80 wrote 20 hours 49 min ago:
Thanks for making this!
Note: I was carefully reading along and well into the third notebook
before I realized that the code sections marked "TODO" were actual
exercises for the reader to implement! (And the tests which follow are
for the reader to check their work.)
This is a clever approach. It just wasn't obvious to me from the
outset.
(I thought the TODOs were just some fiddly details you didn't want
distracting readers from the big picture. But in fact, those are the
important parts.)
alessiodm wrote 10 hours 42 min ago:
Great feedback, I hadn't even considered that the TODOs could be
confusing! I updated the instructions in the README.md to call
them out explicitly as the coding sections to be completed. Thanks
again!
jezzamon wrote 21 hours 45 min ago:
Awesome, I've been sort of stuck in the limbo of doing courses that
taught me some theory but missing the hands-on knowledge I need to
really use RL. This looks like exactly the type of course I'm looking
for!
alessiodm wrote 18 hours 22 min ago:
Thank you! I'll be curious to hear if / how these notebooks help and
how your experience goes! Any feedback welcome!
wegfawefgawefg wrote 21 hours 49 min ago:
A few years ago I made something similar. It doesn't go all the way to
PPO, and has a different style. [1] I won't claim it is better or
worse, but if anyone here is trying to learn, having the same
information presented in multiple forms is always nice.
[1]: https://learndrl.com/
dukeofdoom wrote 22 hours 5 min ago:
Maybe I can use this in my pygame game
fancyfredbot wrote 22 hours 18 min ago:
This is really nice, great idea. I am going to make a suggestion which
I hope is helpful - I don't mean to be critical of this nice project.
After going through the MDP example, I have one comment on the way you
introduce the non-deterministic transition function. In your example
the non-determinism comes from the agent making "mistakes": it can
mistakenly go left or right when trying to go up or down.
1) You could introduce the mistakes more clearly, as the text doesn't
really explain that the agent makes mistakes, so the comment about
mistakes in the transition() function is initially a bit confusing.
2) I think the way this introduces non-determinism could be more
didactic if the non-determinism came from the environment, not the
agent. For example, the agent might be moving on a rough surface, and
moving its tracks/limbs/whatever might not always produce the intended
outcome. As you present it, the transition is a function from an action
to a random action to a random state, whereas the definition is just a
function from an action to a random state.
alessiodm wrote 10 hours 47 min ago:
Thank you so much for this feedback! Indeed, this is definitely
confusing in the notebook. I pushed a small commit to make it a
little bit clearer that the non-determinism comes from the
probabilistic nature of the environment dynamics (and not because the
agent chooses a different action by mistake).
As a side note, I initially meant to go through it in a video to fill
the gaps in the text with my voice. But given that I didn't have time
for those, I am fixing those gaps first :) Thanks again!
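To sketch the idea (hypothetical code, not the notebook's actual cell):
the slip probability lives in the environment's transition function,
which maps an action directly to a random next state.

    import random

    # Hypothetical sketch: the environment's dynamics, not the agent,
    # inject the randomness. With probability slip_prob the move "slips"
    # to a perpendicular direction, e.g. because the surface is rough.
    ACTIONS = {"up": (0, -1), "down": (0, 1),
               "left": (-1, 0), "right": (1, 0)}
    PERPENDICULAR = {"up": ("left", "right"), "down": ("left", "right"),
                     "left": ("up", "down"), "right": ("up", "down")}

    def transition(state, action, slip_prob=0.2):
        """Map (state, action) directly to a random next state."""
        if random.random() < slip_prob:
            action = random.choice(PERPENDICULAR[action])
        dx, dy = ACTIONS[action]
        return (state[0] + dx, state[1] + dy)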
malomalsky wrote 1 day ago:
Is there anything like that, but for NLP?
spmurrayzzz wrote 20 hours 7 min ago:
There is an NLP section in Jeremy Howard's "Practical Deep Learning
for Coders" course (free): [1] The whole course is fantastic. I
recommend it frequently to folks who want to start with DL basics and
ramp up quickly to more advanced material.
[1]: https://course.fast.ai/Lessons/lesson4.html
alessiodm wrote 22 hours 45 min ago:
I took the Deep Learning course [1] by deeplearning.ai in the past,
and their resources were incredibly good IMHO. Hence, I would
suggest taking a look at their NLP specialization [2].
+1000 to "Neural networks: zero to hero" already mentioned as well.
[1]: https://www.deeplearning.ai/courses/deep-learning-specializa...
[2]: https://www.deeplearning.ai/courses/natural-language-process...
barrenko wrote 1 day ago:
There's the series this material references - "Neural networks: zero
to hero" - which has GPT-related parts.
chaosprint wrote 1 day ago:
Great resources! Thank you for making this.
I'm attaching here a DRL framework I made for music generation, similar
to OpenAI Gym. If anyone wants to test the algorithms OP includes, you
are welcome to use it. Issues and PRs are also welcome.
[1]: https://github.com/chaosprint/RaveForce
levocardia wrote 1 day ago:
This looks great - maybe add a link to the YouTube videos in the
README?
alessiodm wrote 1 day ago:
Thank you so much! Unfortunately, that is a mistake in the README
that I just noticed (thank you for pointing it out!) :( As I
mentioned in the first post, I didn't get to make the YouTube videos
yet. But it seems the community would indeed be interested.
I will try to get to them (and in the meantime fix the README, sorry
about that!)
bluishgreen wrote 1 day ago:
"Shamelessly stole the title from a hero of mine". Your shamelessness
is all fine. But at first I thought this was a post from Andrej
Karpathy. He has one of the best personal brands out there on the
internet; while personal brands can't be enforced, this confused me at
first.
alessiodm wrote 1 day ago:
TL;DR: If more folks feel this way, please upvote this comment: I'll
be happy to take down this post, change the title, and either re-post
it or not - the GitHub repo is out there, and that should be
more than enough. Sorry again for the confusion (I just upvoted it).
I am deeply sorry about the confusion. The last thing I intended
was to grab any attention away from Andrej, and / or to be confused
with him.
I tried to find a way to edit the post title, but I couldn't find
one. Is there just a limited time window to do that? If you know how
to do it, I'd be happy to edit it right away.
I didn't even think this post would get any attention at all - it is
indeed my first post here, and I really did it just because if anybody
could use this project to learn RL, I was happy to share.
ultra_nick wrote 1 day ago:
Didn't "Zero to Hero" come from Disney's Hercules movie before
Karpathy used it?
alessiodm wrote 1 day ago:
Didn't know that, but now I have an excuse to go watch a movie :D
gradascent wrote 1 day ago:
I didn't find it confusing at all. I think it's totally ok to
re-use phrasing made famous by someone else - this is how language
evolves after all.
alessiodm wrote 1 day ago:
Thank you, I appreciate it.
FezzikTheGiant wrote 1 day ago:
This is a great resource nonetheless. Even if you did use the name
to get attention, how does it matter? I still see it as a net
positive. Thanks for sharing this.
alessiodm wrote 1 day ago:
Thank you!
khiner wrote 1 day ago:
Throwing in my vote - I wasn't confused. I saw your GH link and a
"Zero to Hero" course name on RL; seems clear to me, and "Zero
to Hero" is a classic title for a first course. Nice that you
gave props to Andrej too! Multiple people can and should make ML
guides and reference each other. Thanks for putting in the time to
share your learnings and make a fantastic resource out of it!
alessiodm wrote 1 day ago:
Thanks a lot. It makes me feel better to hear that the post is
not completely confusing and appropriating - I really didn't mean
that, or to use it as a trick for attention.
zaptrem wrote 1 day ago:
I spent three semesters in college learning RL only to be massively
disappointed in the end after discovering that the latest and greatest
RL techniques can't even beat a simple heuristic in Tetris.
vineyardlabs wrote 20 hours 17 min ago:
RL seems to be in this weird middle ground right now where nobody
knows how to make it work all that well but almost everybody at the
top levels of ML research agrees it's a vital component of further
advances in AI.
jmward01 wrote 1 day ago:
I modeled part of my company's business problem as a MAB problem and
saved my company 10% off their biggest cost and, just as important,
showcased an automated truth signal that helped us understand what
was, and wasn't, working in several of our features. Like all tools,
finding the right place to use RL concepts is a big deal. I think one
thing that is often missed in a classroom setting is pushing more
real-world examples of where powerful ideas can be used. Talking
about optimal policies is great, but if you don't help people
understand where those ideas can be applied, then it is just a bunch
of fun math (which is often a good enough reason on its own :)
smokel wrote 1 day ago:
For those not in the know, "MAB" is short for Multi-Armed Bandit
[1], which is a decision-making framework that is often discussed
in the broader context of reinforcement learning.
In my limited understanding, MAB problems are simpler than those
tackled by Deep Reinforcement Learning (DRL), because typically
there is no state involved in bandit problems. However, I have no
idea about their scale in practical applications, and would love to
know more about said business problem.
[1]: https://en.wikipedia.org/wiki/Multi-armed_bandit
jmward01 wrote 22 hours 39 min ago:
There are often times when you have n possible providers of
service y, each with strengths and weaknesses. If you have some
ultimate truth signal (like follow-on costs, which are linked to
quality, which was what I used) then you can model the providers
as bandits and use something like UCB1 to choose which to use. If
you then apply this to every individual customer, what you end up
doing is learning the optimal vendor for each customer, which
gives you higher efficiency than if you had picked just one 'best
all around' vendor for all customers. So the pattern here is: if
you have n_service_providers and n_customers and a value signal
to optimize, then maybe MAB is the place to go for some possible
quick gains. Of course, if you have a huge state space to explore
instead of just n_service_providers - for instance, you want to
model combinations of choices - using something like a NN to learn
the state-space value function is also a great way to go.
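To make the pattern concrete, here is a minimal UCB1 sketch. `pull` is
a hypothetical callback that routes one request to a vendor and returns
the observed reward (e.g., negative follow-on cost); in the per-customer
variant you would keep one such bandit per customer.

    import math

    def ucb1(n_vendors, pull, n_rounds=10_000):
        counts = [0] * n_vendors      # times each vendor was chosen
        totals = [0.0] * n_vendors    # cumulative reward per vendor
        for t in range(1, n_rounds + 1):
            if t <= n_vendors:
                arm = t - 1           # play every vendor once first
            else:
                # empirical mean plus an exploration bonus that shrinks
                # as a vendor gets sampled more often
                arm = max(range(n_vendors),
                          key=lambda i: totals[i] / counts[i]
                          + math.sqrt(2 * math.log(t) / counts[i]))
            reward = pull(arm)
            counts[arm] += 1
            totals[arm] += reward
        return max(range(n_vendors), key=lambda i: totals[i] / counts[i])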
alessiodm wrote 1 day ago:
RL can be massively disappointing, indeed. And I agree with you (and
with the amazing post I already referenced [1]) that it is hard to
get it to work at all. Sorry to hear you have been disappointed so
much!
Nonetheless, I would personally recommend learning at least the
basics and fundamentals of RL. Beyond supervised, unsupervised, and
the most recent and well-deservedly hyped semi-supervised learning
(generative AI, LLMs, and so on), reinforcement learning models
the learning problem in a very elegant way: an agent
interacting with an environment and getting feedback. Which is,
arguably, a very intuitive and natural way of modeling it. You could
consider backward error correction / propagation as an implicit
reward signal, but that would be a very limited view.
On a positive note, RL has very practical successful applications
today - even if in niche fields. For example, LLM fine-tuning
techniques like RLHF successfully apply RL to modern AI systems,
companies like Covariant are working on large robotics models which
definitely use RL, and generally as a research field I believe (but I
may be proven wrong!) there is so much more to explore. For example,
check out Nvidia Eureka, which combines LLMs with RL [2]: pretty cool
stuff IMHO!
Far from attempting to convince you of the strength and capabilities
of DRL, I am just recommending that folks not discard it right away
and at least give the basics a chance, even just as an intellectual
exercise :) Thanks again!
[1]: https://www.alexirpan.com/2018/02/14/rl-hard.html
[2]: https://blogs.nvidia.com/blog/eureka-robotics-research/
achandra03 wrote 1 day ago:
This looks really interesting! I tried exploring deep RL myself some
time ago but could never get my agents to make any meaningful progress,
and as someone with very little stats/ML background it was difficult to
debug what was going wrong. Will try following this and seeing what
happens!
barrenko wrote 1 day ago:
I mean, resources like these are great, but RL in itself is quite
dense and topic-heavy, so I'm not sure there is any way to reduce the
inherent difficulty level; any beginner should be made aware of that.
That's my primary gripe with ML topics (especially RL-related ones).
alessiodm wrote 22 hours 31 min ago:
Thank you. It is true, the material does indeed assume some prior
knowledge (which I mention in the introduction). In particular:
being proficient in Python, or at least in one high-level
programming language; being familiar with deep learning and neural
networks; and - to get into the (optional) theory and mathematics -
basic calculus, algebra, statistics, and probability theory.
Nonetheless, especially for RL foundations, I found that a
practical understanding of the algorithms at a basic level, writing
them yourself, and "playing" with them and their results
(especially in small toy settings like the grid world) provided the
best way to start building a basic intuition in the field. Hence,
this resource :)
alessiodm wrote 1 day ago:
Thank you very much! I'd be really interested to know if your agents
eventually make progress, and if these notebooks help - even if
a tiny bit!
If you just want to see whether these algorithms can work at all, feel
free to jump into the `solution` folder, pick any algorithm you
think could work, and just try it out there. If it does, then you can
have all the fun of rewriting it from scratch :) Thanks again!
viraptor wrote 1 day ago:
In case you want to expand to more chapters one day: there are lots of
tutorials on doing the simple things that have been verified to work,
but when I'm struggling it's normally with something people barely ever
mention - what to do when things go wrong. For example, your actions
just consistently get stuck at the maximum. Or the exploration doesn't
kick in, regardless of how noisy you make the off-policy training. Or ...
I wish there were more practical resources for when you've got the
basics mostly working, but suddenly hit issues nobody really talks
about (beyond "just tweak some stuff until it works", anyway).
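To be concrete about the kind of thing I mean, a couple of cheap
diagnostics (a sketch with hypothetical helper names, nothing standard)
that at least make those failure modes show up in the training logs:

    import numpy as np

    def action_saturation(actions, low, high, tol=1e-3):
        """Fraction of continuous actions pinned at the action-space
        bounds. A value near 1.0 means the policy is stuck at the
        limits."""
        a = np.asarray(actions)
        return float(np.mean((a <= low + tol) | (a >= high - tol)))

    def mean_policy_entropy(probs):
        """Average entropy of a batch of discrete action distributions.
        Entropy collapsing toward zero early in training means
        exploration never kicked in."""
        p = np.asarray(probs)
        return float(np.mean(-np.sum(p * np.log(p + 1e-12), axis=-1)))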
alessiodm wrote 1 day ago:
Thanks a lot, and another great suggestion for improvement. I also
found that the common advice is "tweak hyperparameters until you find
the right combination". That can definitely help. But usually issues
hide in different "corners": the problem space and its
formulation, the algorithm itself (e.g., different random seeds
alone can produce big variance in performance), and more.
As you mentioned, in real applications of DRL things tend to go wrong
more often than right: "it doesn't work just yet" [1]. And my short
tutorial definitely falls short in the areas of troubleshooting,
tuning, and "productionisation". If I carve out time for an expansion,
this will likely be at the top of the list. Thanks again.
[1]: https://www.alexirpan.com/2018/02/14/rl-hard.html
ubj wrote 1 day ago:
Thanks for sharing [1], that was a great read. I'd be curious to
see an updated version of that article, since it's about 6 years
old now. For example, Boston Dynamics has transitioned from MPC to
RL for controlling its Spot robots [2]. Davide Scaramuzza, whose
team created autonomous FPV drones that beat expert human pilots,
has also discussed how his team had to transition from MPC to RL
[3].
[2]: https://bostondynamics.com/blog/starting-on-the-right-foot...
[3]: https://www.incontrolpodcast.com/1632769/13775734-ep15-dav...
alessiodm wrote 1 day ago:
Thank you for the amazing links as well! You are right that the
article [1] is 6 years old now, and indeed the field has evolved.
But the algorithms and techniques I share in the GitHub repo are
the "classic" ones (dating back to then too), for which that post is
still relevant - at least from a historical perspective.
You bring up a very good point though: more recent advancements
and assessments should be linked and/or mentioned in the repo
(e.g., in the resources and/or an appendix). I will try to do
that sometime.
alessiodm wrote 1 day ago:
While trying to learn the latest in Deep Reinforcement Learning, I was
able to take advantage of many excellent resources (see credits [1]),
but I couldn't find one that provided the right balance between theory
and practice for my personal experience. So I decided to create
something myself, and open-source it for the community, in case it
might be useful to someone else.
None of this would have been possible without all the resources listed
in [1], but I rewrote all the algorithms in this series of Python
notebooks from scratch, with a "pedagogical approach" in mind. It is a
hands-on, step-by-step tutorial about Deep Reinforcement Learning
techniques (up to ~2018/2019 SoTA), guiding you through theory and
coding exercises on the most utilized algorithms (Q-Learning, DQN,
SAC, PPO, etc.)
I shamelessly stole the title from a hero of mine, Andrej Karpathy, and
his "Neural Networks: Zero to Hero" [2] work. I also meant to work on a
series of YouTube videos, but didn't have the time yet. If this post
gets any type of interest, I might go back to it. Thank you.
P.S.: A friend of mine suggested I post here, so I followed their
advice: this is my first post, and I hope it properly abides by the
rules of the community.
[1]: https://github.com/alessiodm/drl-zh/blob/main/00_Intro.ipynb
[2]: https://karpathy.ai/zero-to-hero.html
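To give a flavor of where the notebooks start, here is a minimal
tabular Q-Learning sketch using the same Gymnasium API (illustrative
hyperparameters, not the notebooks' actual settings):

    import numpy as np
    import gymnasium as gym

    env = gym.make("FrozenLake-v1")
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    alpha, gamma, epsilon = 0.1, 0.99, 0.1  # lr, discount, exploration

    for episode in range(5_000):
        state, _ = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # bootstrap from the best next action; none on terminal states
            target = reward + gamma * np.max(Q[next_state]) * (not terminated)
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state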
fancyfredbot wrote 12 hours 8 min ago:
I've gone through the first three notebooks today and enjoyed them a
lot. It's the first time I've tried the Atari Gymnasium environments,
and that was really satisfying and fun. Thank you.
alessiodm wrote 9 hours 37 min ago:
Really happy to hear you enjoyed the notebooks! And thank you very
much for the patch to simulate_mdp for the cliff world!
tunnuz wrote 1 day ago:
Does it rely heavily on Python, or could someone use a different
language to go through the material?
alessiodm wrote 22 hours 37 min ago:
Yes, the material relies heavily on Python. I intentionally used
popular open-source libraries (such as Gymnasium for RL
environments, and PyTorch for deep learning) and Python itself
given their popularity in the field, so that the content and
learnings could be readily applicable to real-world projects.
The theory and algorithms per se are general: they can be
re-implemented in any language, as long as there are comparable
libraries to use. But the notebooks are primarily in Python; the
(attempted) "frictionless" learning experience would lose a bit
if the setup were in a different language, and it'll likely take a
little more effort to follow along.
verdverm wrote 1 day ago:
Very cool, thanks for putting this together.
It would be great to see a page dedicated to SoTA techniques &
results.
alessiodm wrote 1 day ago:
Thank you so much! And very good advice: I have an extremely brief
and non-descriptive list in the "Next" notebook, initially intended
for that. But it definitely falls short.
I may actually expand it in a second, "more advanced" series of
notebooks, to explore model-based RL, curiosity, and other recent
topics: even if not comprehensive, some hands-on basic coding
exercises on those topics might be of interest nonetheless.