| _______ __ _______ | |
| | | |.---.-..----.| |--..-----..----. | | |.-----..--.--.--..-----. | |
| | || _ || __|| < | -__|| _| | || -__|| | | ||__ --| | |
| |___|___||___._||____||__|__||_____||__| |__|____||_____||________||_____| | |
| on Gopher (unofficial)
| Visit Hacker News on the Web | |
| COMMENT PAGE FOR: | |
| Ask HN: How to learn CUDA to professional level | |
| FilosofumRex wrote 18 hours 35 min ago: | |
| If you're in it for the money, then forget about HPC and the mathy
| stuff. Unless you have a PhD in the application domain, no one will
| bother with you, even if you write CUDA at 120 wpm.
| The real money is in mastering PTX, nvcc, cuobjdump, Nsight Systems,
| and Nsight Compute. CUTLASS is a good open-source code base to explore -
| start here [1]. Most importantly, stay off HN and get on the GPU MODE
| Discord [2], where the real coders are:
| [1]: https://christianjmills.com/series/notes/cuda-mode-notes.html | |
| [2]: https://discord.com/invite/gpumode | |
| MoonGhost wrote 3 hours 43 min ago: | |
| It may be cool and real, but it sounds like a very niche domain, which
| means there are very few people and places doing it - mostly the gaming
| industry and drivers. Starting from zero and getting there in one step
| will be hard. One has to be really, really smart for this.
| lacker wrote 20 hours 25 min ago: | |
| If you're experienced in C++ you can basically just jump in. I found
| this YouTube series to be really helpful: [1] After watching it, I was
| able to implement a tiled version of a kernel that was the bottleneck
| of our production data analysis pipeline and improve performance by
| over 2x. There's much more to learn, but I found this video series to
| be a great place to start.
| [1]: https://www.youtube.com/playlist?list=PLxNPSjHT5qvtYRVdNN1yDcd... | |
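| (The kernel mentioned above isn't shown; as a rough sketch of what a
| shared-memory "tiled" kernel looks like, a minimal tiled matrix multiply
| might read as follows, assuming square N x N matrices with N a multiple
| of TILE; all names are illustrative.)
|
|   #define TILE 16
|
|   // Each block computes one TILE x TILE tile of C, staging tiles of A
|   // and B in shared memory so each global element is loaded once per tile.
|   __global__ void matmul_tiled(const float* A, const float* B, float* C, int N) {
|       __shared__ float As[TILE][TILE];
|       __shared__ float Bs[TILE][TILE];
|
|       int row = blockIdx.y * TILE + threadIdx.y;
|       int col = blockIdx.x * TILE + threadIdx.x;
|       float acc = 0.0f;
|
|       for (int t = 0; t < N / TILE; ++t) {
|           // Each thread loads one element of the current A and B tiles.
|           As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE + threadIdx.x];
|           Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
|           __syncthreads();
|
|           for (int k = 0; k < TILE; ++k)
|               acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
|           __syncthreads();
|       }
|       C[row * N + col] = acc;
|   }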
| SonOfLilit wrote 21 hours 26 min ago: | |
| Prefix scan is a great intro to GPU programming: [1] After this you | |
| should be able to tell whether you enjoy this kind of work. | |
| If you do, try to do a reasonably optimized GEMM, and then try to | |
| follow the FlashAttention paper and implement a basic version of what | |
| they're doing. | |
| [1]: https://developer.download.nvidia.com/compute/cuda/2_2/sdk/web... | |
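| (Not the linked SDK sample itself - just a minimal sketch of the naive
| Hillis-Steele formulation, for a single block with n <= blockDim.x, to
| give a feel for the pattern.)
|
|   // Naive single-block inclusive scan (Hillis-Steele), n <= blockDim.x.
|   // Launch as: inclusive_scan<<<1, n, n * sizeof(float)>>>(d_in, d_out, n);
|   __global__ void inclusive_scan(const float* in, float* out, int n) {
|       extern __shared__ float temp[];
|       int tid = threadIdx.x;
|       if (tid < n) temp[tid] = in[tid];
|       __syncthreads();
|
|       for (int offset = 1; offset < n; offset *= 2) {
|           float addend = 0.0f;
|           if (tid >= offset && tid < n) addend = temp[tid - offset];
|           __syncthreads();               // all reads must finish before writes
|           if (tid >= offset && tid < n) temp[tid] += addend;
|           __syncthreads();
|       }
|       if (tid < n) out[tid] = temp[tid];
|   }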
| brudgers wrote 1 day ago: | |
| For better or worse, direct professional experience in a professional | |
| setting is the only way to learn anything to a professional level. | |
| That doesn't mean one-eyed-king knowledge can't solve that
| chicken-and-egg problem: you only have to be good enough to get the job.
| But if you haven't done it on the job, you don't have work experience | |
| and you are either lying to others or lying to yourself...and any | |
| sophisticated organization won't fall for it... | |
| ...except of course, knowingly. And the best way to get someone to | |
| knowingly ignore obvious dunning-kruger and/or horseshit is to know | |
| that someone personally or professionally. | |
| Which is to say that the best way to get a good job is to have a good | |
| relationship with someone who can hire you for a good job (nepotism | |
| trumps technical ability, always). And the best way to find a good job | |
| is to know a lot of people who want to work with you. | |
| To put it another way, looking for a job is the only way to find a job,
| and looking for a job is also much, much harder than all the things
| (like studying CUDA) that avoid looking for a job while pretending to
| be preparation...because again, studying CUDA won't ever give you
| professional experience.
| Don't get me wrong, there's nothing wrong with learning CUDA all on | |
| your own. But it is not professional experience and it is not looking | |
| for a job doing CUDA. | |
| Finally, if you want to learn CUDA just learn it for its own sake | |
| without worrying about a job. Learning things for their own sake is the | |
| nature of learning once you get out of school. | |
| Good luck. | |
| alecco wrote 1 day ago: | |
| Ignore everybody else. Start with CUDA Thrust. Study its examples
| carefully. See how other projects use Thrust. After a year or two, go
| deeper into CUB.
| Do not implement algorithms by hand. On recent architectures it is
| extremely hard to reach decent occupancy and the like. Thrust and CUB
| solve 80% of the cases with reasonable trade-offs, and they do most of
| the work for you.
| [1]: https://developer.nvidia.com/thrust | |
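| (A minimal sketch of the kind of Thrust usage meant here - sorting and
| reducing a vector without writing a single kernel; sizes and values are
| arbitrary.)
|
|   #include <thrust/device_vector.h>
|   #include <thrust/host_vector.h>
|   #include <thrust/sort.h>
|   #include <thrust/reduce.h>
|   #include <cstdio>
|   #include <cstdlib>
|
|   int main() {
|       thrust::host_vector<int> h(1 << 20);
|       for (size_t i = 0; i < h.size(); ++i) h[i] = rand() % 1000;
|
|       thrust::device_vector<int> d = h;                 // copy to the GPU
|       thrust::sort(d.begin(), d.end());                 // device-side sort
|       int sum = thrust::reduce(d.begin(), d.end(), 0);  // device-side reduction
|
|       printf("min = %d, sum = %d\n", (int)d[0], sum);
|       return 0;
|   }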
| bee_rider wrote 1 day ago: | |
| It looks quite nice just from skimming the link. | |
| But, I don't understand the comparison to TBB. Do they have a
| version of TBB that runs on the GPU natively? If the TBB
| implementation is on the CPU... that's just comparing two different
| pieces of hardware. Which would be confusing, bordering on dishonest.
| alecco wrote 12 hours 15 min ago: | |
| The TBB comparison is a marketing leftover from 10 years ago when | |
| they were trying to convince people that NVIDIA GPUs were much | |
| faster than Intel CPUs for parallel problems. | |
| matt3210 wrote 1 day ago: | |
| Just make cool stuff. Find people to code review. I learn way more | |
| during code reviews than anything else. | |
| canyp wrote 1 day ago: | |
| My 2 cents: "Learning CUDA" is not the interesting bit. Rather, you want
| to learn two things: 1) GPU hardware architecture, 2) parallelizing
| algorithms. For CUDA specifically, there is the CUDA Programming
| Guide from Nvidia, which will teach you the basics of the language. But
| what these jobs typically require is that you know how to parallelize
| an algorithm and squeeze the most out of the hardware.
| gdubs wrote 1 day ago: | |
| I like to learn through projects, and as a graphics guy I love the GPU | |
| Gems series. Things like: [1] As an Apple platforms developer I | |
| actually worked through those books to figure out how to convert the | |
| CUDA stuff to Metal, which helped the material click even more. | |
| Part of why I did it (and this was some years back) was that I wanted
| to sharpen my thinking around parallel approaches to problem solving,
| given how central those algorithms and ways of thinking are to things
| like ML and not just game development, etc.
| [1]: https://developer.nvidia.com/gpugems/gpugems3/part-v-physics-s... | |
| fifilura wrote 1 day ago: | |
| I am not a CUDA programmer, but when looking at this, I think I can see
| the parallels to Spark and SQL [1] So my tip would be: start getting
| used to programming without for loops.
| [1]: https://gfxcourses.stanford.edu/cs149/fall24/lecture/dataparal... | |
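| (A minimal sketch of what "no for loops" means in CUDA terms: the
| serial loop body becomes the kernel body and each thread handles one
| index; names and sizes are illustrative.)
|
|   // Serial:   for (int i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
|   // Parallel: the loop disappears; one thread per element.
|   __global__ void saxpy(int n, float a, const float* x, float* y) {
|       int i = blockIdx.x * blockDim.x + threadIdx.x;
|       if (i < n) y[i] = a * x[i] + y[i];
|   }
|
|   // Launch (hypothetical sizes):
|   //   saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);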
| sremani wrote 1 day ago: | |
| The book - PMPP - Programming Massively Parallel Processors | |
| The YouTube Channel - CUDA_MODE - it is based on PMPP | |
| I could not find the channel, but here is the playlist [1] Once done,
| you will be on a solid foundation.
| [1]: https://www.youtube.com/watch?v=LuhJEEJQgUM&list=PLVEjdmwEDkgW... | |
| math_dandy wrote 1 day ago: | |
| Are there any GPU emulators you can use to run simple CUDA programs on
| a commodity laptop, just to get comfortable with the mechanics, the
| toolchain, etc.?
| throwaway81523 wrote 22 hours 16 min ago: | |
| You can get a VPS with a GPU these days; not super cheap, but
| affordable for those in the industry.
| corysama wrote 1 day ago: | |
| [1] emulates running simple CUDA programs in a web page with zero
| setup. It's a good way to get your toes wet.
| [1]: https://leetgpu.com/ | |
| gkbrk wrote 1 day ago: | |
| Commodity laptops can just use regular non-emulated CUDA if they have | |
| an Nvidia GPU. It's not just for datacenter GPUs, a ton of regular | |
| consumer GPUs are also supported. | |
| bee_rider wrote 1 day ago: | |
| A commodity laptop doesn't have a discrete GPU these days; iGPUs are
| good enough for basic tasks.
| SoftTalker wrote 1 day ago: | |
| It's 2025. Get with the times, ask Claude to do it, and then ask it to | |
| explain it to you as if you're an engineer who needs to convince a | |
| hiring manager that you understand it. | |
| rakel_rakel wrote 1 day ago: | |
| Might work in 2025, 2026 will demand more. | |
| mekpro wrote 1 day ago: | |
| To professionals in the field, I have a question: what jobs, positions, | |
| and companies are in need of CUDA engineers? My current understanding | |
| is that while many companies use CUDA's by-products (like PyTorch), | |
| direct CUDA development seems less prevalent. I'm therefore seeking to | |
| identify more companies and roles that heavily rely on CUDA. | |
| kloop wrote 1 day ago: | |
| My team uses it for geospatial data. We rasterize slippy map tiles | |
| and then do a raster summary on the gpu. | |
| It's a weird case, but the pixels can be processed independently for | |
| most of it, so it works pretty well. Then the rows can be summarized | |
| in parallel and rolled up at the end. The copy onto the gpu is our | |
| current bottleneck however. | |
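| (Their pipeline isn't shown; a toy sketch in the same spirit - every
| thread handles one pixel independently and the per-row summary is
| rolled up with atomics - might look like this. All names and the
| threshold are made up.)
|
|   // Toy per-pixel classification with a per-row roll-up (illustrative only).
|   __global__ void summarize_rows(const unsigned char* pixels, float* row_sums,
|                                  int width, int height) {
|       int x = blockIdx.x * blockDim.x + threadIdx.x;
|       int y = blockIdx.y * blockDim.y + threadIdx.y;
|       if (x >= width || y >= height) return;
|
|       float value = pixels[y * width + x] > 128 ? 1.0f : 0.0f;  // per-pixel work
|       atomicAdd(&row_sums[y], value);                           // per-row summary
|   }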
| indianmouse wrote 1 day ago: | |
| As a very early CUDA programmer who took part in the NVIDIA CUDA
| contest back in 2008, with what I believe was one of the only entries
| submitted from India (I'm not claiming that, though), and who received
| a Black Edition card as a consolation/participation prize, I can vouch
| for the method I followed.
| - Look up the CUDA Programming Guide from NVidia | |
| - CUDA Programming books from NVidia from | |
| developer.nvidia.com/cuda-books-archive link | |
| - Start creating small programs based on the existing implementations
| (strong C knowledge is required, so brush up if needed)
| - Install the required toolchains and compilers; I am assuming you
| have the necessary hardware to play around with
| - GitHub repositories with CUDA projects. Read the code; you can now use
| an LLM to explain the code in whatever way you need
| - Start creating smaller, yet parallel, programs, etc.
| And in about a month or two, you should have enough to start writing | |
| CUDA programs. | |
| I'm not aware of the skill / experience level you have, but whatever
| it might be, there are many more sources and resources available now
| than there were in 2007/08.
| Create a 6-8 week study plan and you should be flying soon!
| Hope it helps. | |
| Feel free to comment and I can share whatever I could to guide. | |
| edge17 wrote 1 day ago: | |
| What environment do you use? Is it still the case that Windows is the | |
| main development environment for cuda? | |
| hiq wrote 1 day ago: | |
| > I am assuming you have the necessary hardware to play around | |
| Can you expand on that? Is it enough to have an Nvidia graphics card
| that's about 5 years old, or do you need something more specific?
| adrian_b wrote 9 hours 0 min ago: | |
| A 5-year old card, i.e. an NVIDIA Ampere RTX 30xx from 2020, is | |
| perfectly fine. | |
| Even 7-year old cards, i.e. NVIDIA Turing RTX 20xx from 2018, are | |
| still acceptable. | |
| Older GPUs than Turing should be avoided, because they lack many | |
| capabilities of the newer cards, e.g. "tensor cores", and their | |
| support in the newer CUDA toolkits will be deprecated in a not very | |
| distant future, but very slowly, so for now you can still create | |
| programs for Maxwell GPUs from 10 years ago. | |
| Among the newer GPUs, the RTX 40xx SUPER series (i.e. the SUPER | |
| variants, not the original RTX 40xx series) has the best energy | |
| efficiency. The newest RTX 50xx GPUs have worse energy efficiency | |
| than RTX 40xx SUPER, so they achieve a somewhat higher performance | |
| only by consuming a disproportionately greater power. Instead of | |
| that, it is better to use multiple RTX 40xx SUPER. | |
| indianmouse wrote 20 hours 41 min ago: | |
| That is sufficient. | |
| slt2021 wrote 1 day ago: | |
| each nVidia GPU has a certain Compute Capability ( [1] ). | |
| Depending on the model and age of your GPU, it will have a certain | |
| capability that will be the hard ceiling for what you can program | |
| using CUDA | |
| [1]: https://developer.nvidia.com/cuda-gpus | |
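| (A small sketch for checking what you have: the runtime API reports
| each device's compute capability directly.)
|
|   #include <cstdio>
|   #include <cuda_runtime.h>
|
|   int main() {
|       int count = 0;
|       cudaGetDeviceCount(&count);
|       for (int i = 0; i < count; ++i) {
|           cudaDeviceProp prop;
|           cudaGetDeviceProperties(&prop, i);
|           printf("GPU %d: %s, compute capability %d.%d\n",
|                  i, prop.name, prop.major, prop.minor);
|       }
|       return 0;
|   }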
| sanderjd wrote 22 hours 25 min ago: | |
| Recognizing that this won't result in any useful benchmarks, is | |
| there a way to emulate an nvidia gpu? In a docker container, for | |
| instance? | |
| dpe82 wrote 1 day ago: | |
| When you're just getting started and learning that won't matter | |
| though. Any Nvidia card from the last 10 years should be fine. | |
| rahimnathwani wrote 1 day ago: | |
| I'm not a CUDA programmer, but AIUI: | |
| - you will want to install the latest version of CUDA Toolkit | |
| (12.9.1) | |
| - each version of CUDA Toolkit requires the card driver to be above | |
| a certain version (e.g. toolkit depends on driver version 576 or | |
| above) | |
| - older cards often have recent drivers, e.g. the current version | |
| of CUDA Toolkit will work with a GTX 1080, as it has a recent | |
| (576.x) driver | |
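| (If in doubt, the runtime API can report both versions; a minimal
| sketch:)
|
|   #include <cstdio>
|   #include <cuda_runtime.h>
|
|   int main() {
|       int driverVer = 0, runtimeVer = 0;
|       cudaDriverGetVersion(&driverVer);    // highest CUDA version the driver supports
|       cudaRuntimeGetVersion(&runtimeVer);  // CUDA runtime this program was built against
|       printf("driver supports CUDA %d.%d, runtime is %d.%d\n",
|              driverVer / 1000, (driverVer % 100) / 10,
|              runtimeVer / 1000, (runtimeVer % 100) / 10);
|       return 0;
|   }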
| sputknick wrote 1 day ago: | |
| I used this to teach high school students. Probably not sufficient to | |
| get what you want, but it should get you off the ground and you can run | |
| from there. | |
| [1]: https://youtu.be/86FAWCzIe_4?si=buqdqREWASNPbMQy | |
| tkuraku wrote 1 day ago: | |
| I think you just pick a problem you want to solve with gpu programming | |
| and go for it. Learning what you need along the way. Nvidia blog posts | |
| are great for learning things along the way such as | |
| [1]: https://devblogs.nvidia.com/cuda-pro-tip-write-flexible-kernel... | |
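| (The linked pro-tip appears to be the one about grid-stride loops; a
| minimal example of that pattern, which works for any n and any launch
| configuration:)
|
|   __global__ void scale(float* data, float factor, int n) {
|       // Grid-stride loop: each thread strides over the array in steps of
|       // the total number of threads in the grid.
|       for (int i = blockIdx.x * blockDim.x + threadIdx.x;
|            i < n;
|            i += blockDim.x * gridDim.x) {
|           data[i] *= factor;
|       }
|   }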
| majke wrote 1 day ago: | |
| I had a bit of limited exposure to CUDA. It was before the AI boom,
| during Covid.
| I found it easy to start. Then there was a pretty nice learning curve | |
| to get to warps, SMs, and basic concepts. Then I was able to dig deeper
| into the integer opcodes, which was super cool. I was able to optimize
| the compute part pretty well, without many roadblocks.
| However, getting memory loads perfect and then getting closer to hw | |
| (warp groups, divergence, the L2 cache split thing, scheduling), was | |
| pretty hard. | |
| I'd say CUDA is pretty nice/fun to start with, and it's possible to get | |
| quite far for a novice programmer. However getting deeper and achieving | |
| real advantage over CPU is hard. | |
| Additionally, there is a problem with Nvidia segmenting the market -
| some opcodes are present in _old_ GPUs (the CUDA arch is _not_ forwards
| compatible), and some opcodes are reserved for "AI" chips (like the
| H100). So getting code that is fast on both an H100 and an RTX 5090 is
| super hard. Add to that the fact that each card has a different SM
| count, memory capacity, and bandwidth... and you end up with an
| impossible compatibility matrix.
| TLDR: The beginnings are nice and fun, and you can get quite far on the
| compute-optimization part. But getting compatibility across different
| chips, and getting memory access right, is hard. When you start, choose
| a specific problem, a specific chip, and a specific instruction set.
| epirogov wrote 1 day ago: | |
| I bought a P106-90 for $20 and started porting my data apps to parallel
| processing with it.
| izharkhan wrote 1 day ago: | |
| How to do hacking?
| rramadass wrote 1 day ago: | |
| CUDA GPGPU programming was invented to solve certain classes of | |
| parallel problems. So studying these problems will give you greater | |
| insight into CUDA based parallel programming. I suggest reading the | |
| following old book along with your CUDA resources. | |
| Scientific Parallel Computing by L. Ridgway Scott et al. -
| [1]: https://press.princeton.edu/books/hardcover/9780691119359/scie... | |
| weinzierl wrote 1 day ago: | |
| Nvidia itself has a paid course series. It is a bit older, but I believe
| it is still relevant. I have bought it but not yet started it; I intend
| to do so during the summer holidays.
| imjonse wrote 1 day ago: | |
| These should keep you busy for months: [1] resources and discord | |
| community | |
| Book: Programming massively parallel processors | |
| nvidia cuda docs are very comprehensive too | |
| [1]: https://www.gpumode.com/ | |
| [2]: https://github.com/srush/GPU-Puzzles | |
| mdaniel wrote 1 day ago: | |
| Wowzers, the line noise | |
| [1]: https://github.com/HazyResearch/ThunderKittens#:~:text=here%... | |
| amelius wrote 1 day ago: | |
| This follows a "winner takes all" scenario. I see the differences | |
| between the submissions are not so large, often smaller than 1%. Kind | |
| of pointless to work on this, if you ask me. | |
| imjonse wrote 14 hours 5 min ago: | |
| the main site is confusing indeed with all those leaderboards, but | |
| follow the discord and resources links for the actual learning | |
| material. | |
| amelius wrote 9 hours 10 min ago: | |
| Thanks, looks interesting indeed. | |
| elashri wrote 1 day ago: | |
| I will give you personal experience learning CUDA that might be | |
| helpful. | |
| Disclaimer: I don't claim that this is actually a systematic way to
| learn it; it is more geared toward academic work.
| I got assigned to a project that required learning CUDA as part of my
| PhD. There was no one in my research group who had any experience with
| or knew CUDA. I started with the standard NVIDIA courses (Getting
| Started with Accelerated Computing with CUDA C/C++; there is a Python
| version too). This gave me a good introduction to the concepts and
| basic ideas, but after that I did most of my learning by trial and
| error. I tried a couple of online tutorials for specific things, and
| some books, but there was always a deprecated function here or there,
| or an API change that made things obsolete. Or things had simply
| changed for your GPU, and you have to be careful because you might be
| using a GPU version not compatible with what you develop for in
| production, and you need things to work for both.
| I think learning CUDA for me was an endeavor of pain and of going
| through "compute-sanitizer" and Nsight, because you will find that most
| of your time goes into debugging why things are running slower than
| you expect.
| Take things slowly. Take a simple project that you know how to do
| without CUDA, then port it to CUDA, benchmark it against the CPU, and
| try to optimize different aspects of it.
| The one advice that can be helpful is not to think about optimization | |
| at the beginning. Start with correct, then optimize. A working slow | |
| kernel beats a fast kernel that corrupts memory. | |
| korbip wrote 1 day ago: | |
| I can share a similar PhD story (the result being visible here: [1] | |
| ). Back then I didn't find any tutorials that covered anything beyond
| the basics (which are still important).
| Once you have understood the basic working mode and architecture
| of a GPU, I would recommend the following workflow:
| 1. First create an environment so that you can actually test your | |
| kernels against baselines written in a higher-level language. | |
| 2. If you don't have an urgent project already, try to | |
| improve/re-implement existing problems (MatMul being the first | |
| example). Don't get caught up in wanting to handle all size cases.
| Take an example just to learn a certain functionality, rather than
| solving the whole problem, if it's just about learning.
| 3. Write the functionality you want to have in increasing complexity. | |
| Write loops first, then parallelize these loops over the grid. Use | |
| global memory first, then put things into shared memory and | |
| registers. Use plain matrix multiplication first, then use mma | |
| (TensorCore) primitives to speed things up. | |
| 4. Iterate over the CUDA C Programming Guide. It covers all (most) of | |
| the functionality that you want to learn - but it can't just be read
| and memorized. You learn it when you apply it.
| 5. Might depend on your use-case, but also consider using higher-level
| abstractions like CUTLASS or ThunderKittens. Also, if your environment
| is JAX/Torch, use Triton first before going down to the CUDA level.
| Overall, it will be some pain for sure. And mastering it, including
| PTX etc., will take a lot of time.
| [1]: https://github.com/NX-AI/flashrnn | |
| kevmo314 wrote 1 day ago: | |
| > I think learning CUDA for me was an endeavor of pain and of going
| through "compute-sanitizer" and Nsight, because you will find that most
| of your time goes into debugging why things are running slower than
| you expect.
| This is so true it hurts. | |
| ForgotIdAgain wrote 1 day ago: | |
| I have not tried it yet, but seems nice : | |
| [1]: https://leetgpu.com/ | |
| Onavo wrote 1 day ago: | |
| Assuming you are asking this because of the deep learning/ChatGPT hype, | |
| the first question you should ask yourself is, do you really need to? | |
| The skills needed for CUDA are completely unrelated to building machine | |
| learning models. It's like learning to make a TLS library so you can | |
| get a full stack web development job. The skills are completely | |
| orthogonal. CUDA belongs to the domain of game developers, graphics | |
| people, high performance computing and computer engineers (hardware). | |
| From the point of view of machine learning development and research, | |
| it's nothing more than an implementation detail. | |
| Make sure you are very clear on what you want. Most HR departments cast | |
| a wide net (it's like how every junior role requires "3-5 years of | |
| experience" when in reality they don't really care). Similarly when | |
| hiring, most companies pray for the unicorn developer who can | |
| understand the entire stack from the GPU to the end user product domain | |
| when the day to day is mostly in Python. | |
| throwaway81523 wrote 1 day ago: | |
| I looked at the CUDA code for Leela Chess Zero and found it pretty | |
| understandable, though that was back when Leela used a DCNN instead of | |
| transformers. DCNN's are fairly simple and are explained in fast.ai | |
| videos that I watched a few years ago, so navigating the Leela code | |
| wasn't too difficult. Transformers are more complicated and I want to | |
| bone up on them, but I haven't managed to spend any time understanding | |
| them. | |
| CUDA itself is just a minor departure from C++, so the language itself | |
| is no big deal if you've used C++ before. But, if you're trying to get | |
| hired programming CUDA, what that really means is they want you | |
| implementing AI stuff (unless it's game dev). AI programming is a much | |
| wider and deeper subject than CUDA itself, so be ready to spend a bunch | |
| of time studying and hacking to come up to speed in that. But if you | |
| do, you will be in high demand. As mentioned, the fast.ai videos are a | |
| great introduction. | |
| In the case of games, that means 3D graphics which these days is | |
| another rabbit hole. I knew a bit about this back in the day, but it | |
| is fantastically more sophisticated now and I don't have any idea where | |
| to even start. | |
| robotnikman wrote 1 day ago: | |
| >But if you do, you will be in high demand | |
| So I'm guessing that finding a job as a CUDA programmer is nowhere
| near as big a headache as other software engineering jobs right now?
| I'm thinking learning CUDA and more about AI might be a good pivot
| from my current position as a Java middleware developer.
| randomNumber7 wrote 5 hours 18 min ago: | |
| It is likely much more focused on mathematics compared to what a | |
| usual java dev does. | |
| upmind wrote 1 day ago: | |
| This is a great idea! This is the code, right? [1] I have two beginner
| (and probably very dumb) questions: why do they have heavy C++/CUDA
| usage rather than using only PyTorch/TensorFlow? Are those too slow
| for training Leela? Second, why is there TensorFlow code?
| [1]: https://github.com/leela-zero/leela-zero | |
| henrikf wrote 1 day ago: | |
| That's Leela Zero (plays Go instead of Chess). It was good for its | |
| time (~2018) but it's quite outdated now. It also uses OpenCL | |
| instead of Cuda. I wrote a lot of that code including Winograd | |
| convolution routines. | |
| Leela Chess Zero ( [1] ) has a much more optimized CUDA backend
| targeting modern GPU architectures, and it's written by much more
| knowledgeable people than me. That would be a much better source to
| learn from.
| [1]: https://github.com/LeelaChessZero/lc0 | |
| throwaway81523 wrote 1 day ago: | |
| As I remember, the CUDA code was about 3x faster than the | |
| tensorflow code. The tensorflow stuff is there for non-Nvidia | |
| GPU's. This was in the era of the GTX 1080 or 2080. No idea about | |
| now. | |
| upmind wrote 1 day ago: | |
| Ah I see, thanks a lot! | |
| lokimedes wrote 1 day ago: | |
| There are a couple of "concerns" you may separate to make this a
| bit more tractable:
| 1. Learning CUDA - the framework, libraries and high-layer wrappers. | |
| This is something that changes with times and trends. | |
| 2. Learning high-performance computing approaches. While a GPU and the | |
| Nvlink interfaces are Nvidia specific, working in a massively-parallel | |
| distributed computing environment is a general branch of knowledge that | |
| is translatable across HPC architectures. | |
| 3. Application specifics. If your thing is Transformers, you may just | |
| as well start from Torch, Tensorflow, etc. and rely on the current | |
| high-level abstractions, to inspire your learning down to the | |
| fundamentals. | |
| I'm no longer active in any of the above, so I can't be more
| specific, but if you want to master CUDA, I would say that learning how
| massively-parallel programming works is the foundation that may
| translate into transferable skills.
| david-gpu wrote 21 hours 52 min ago: | |
| Former GPU guy here. Yeah, that's exactly what I was going to suggest | |
| too, with emphasis on #2 and #3. What kind of jobs are they trying to | |
| apply for? Is it really CUDA that they need to be familiar with, or | |
| CUDA-based libraries like cuDNN, cuBLAS, cuFFT, etc? | |
| Understanding the fundamentals of parallel programming comes first, | |
| IMO. | |
| chanana wrote 20 hours 1 min ago: | |
| > Understanding the fundamentals of parallel programming comes | |
| first, IMO. | |
| Are there any good resources you'd recommend for that?
| rramadass wrote 15 hours 5 min ago: | |
| I am not the person you asked the question of, but you might find | |
| the following useful (in addition to the ones mentioned in my | |
| other comments); | |
| Foundations of Multithreaded, Parallel, and Distributed | |
| Programming by Gregory Andrews - an old classic, but still offers very
| good explanations of concurrent algorithmic concepts.
| Parallel Programming: Concepts and Practice by Bertil Schmidt | |
| et.al. - A relatively recent book with comprehensive coverage. | |
| rramadass wrote 1 day ago: | |
| This is the right approach. Without (2) trying to learn (1) will just | |
| lead to "confusion worse confounded". I also suggest a book | |
| recommendation here - | |
| [1]: https://news.ycombinator.com/item?id=44216478 | |
| jonas21 wrote 1 day ago: | |
| I think it depends on your learning style. For me, learning | |
| something with a concrete implementation and code that you can play | |
| around with is a lot easier than trying to study the abstract | |
| general concepts first. Once you have some experience with the | |
| code, you start asking why things are done a certain way, and that | |
| naturally leads to the more general concepts. | |
| rramadass wrote 15 hours 19 min ago: | |
| It has got nothing to do with "learning styles". Parallel | |
| Computing needs knowledge of three things; a) Certain crucial | |
| architectural aspects (logical and physical) of the hardware b) | |
| Decomposing a problem correctly to map to that hardware c) | |
| Algorithms using a specific language/framework to combine the | |
| above two. CUDA (and other similar frameworks) only come in the | |
| last step and so a knowledge of the first two is a prerequisite. | |
| lokimedes wrote 1 day ago: | |
| This one was my go-to for HPC, but it may be a bit dated by now: | |
| [1]: https://www.amazon.com/Introduction-Performance-Computing-... | |
| rramadass wrote 1 day ago: | |
| That's a good book too (i have it) but more general than the | |
| Ridgway Scott book which uses examples from Numerical Computation | |
| domains. Here is an overview of the chapters; example domains | |
| start from chapter 10 onwards - [1] These sort of books are only | |
| "dated" when it comes to specific languages/frameworks/libraries. | |
| The methods/techniques are evergreen and often conceptually | |
| better explained in these older books. | |
| For recent up to date works on HPC, the free multi-volume The Art | |
| of High Performance Computing by Victor Eijkhout can't be beat - | |
| [1]: https://www.jstor.org/stable/j.ctv1ddcxfs | |
| [2]: https://news.ycombinator.com/item?id=38815334 | |
| dist-epoch wrote 1 day ago: | |
| As they typically say: Just Do It (tm). | |
| Start writing some CUDA code to sort an array or find the maximum
| element.
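| (For the "find the maximum element" suggestion, a minimal sketch: each
| block reduces its chunk in shared memory and one atomic per block
| combines the partial results; *result must be initialized to INT_MIN
| before the launch, and the block size is assumed to be a power of two.)
|
|   #include <climits>
|
|   // Launch as: max_kernel<<<blocks, threads, threads * sizeof(int)>>>(d_in, d_result, n);
|   __global__ void max_kernel(const int* in, int* result, int n) {
|       extern __shared__ int sdata[];
|       int tid = threadIdx.x;
|       int i = blockIdx.x * blockDim.x + threadIdx.x;
|       sdata[tid] = (i < n) ? in[i] : INT_MIN;
|       __syncthreads();
|
|       // Tree reduction within the block.
|       for (int s = blockDim.x / 2; s > 0; s >>= 1) {
|           if (tid < s) sdata[tid] = max(sdata[tid], sdata[tid + s]);
|           __syncthreads();
|       }
|       if (tid == 0) atomicMax(result, sdata[0]);
|   }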
| the__alchemist wrote 1 day ago: | |
| I concur with this. Then supplement with resources as required.
| Ideally, find some tasks in your programs that are parallelizable
| (learning what these are is important too!) and switch them to CUDA.
| If you don't have any, make a toy case, e.g. an n-body simulation.
| amelius wrote 1 day ago: | |
| I'd rather learn to use a library that works on any brand of GPU. | |
| If that is not an option, I'll wait! | |
| moralestapia wrote 1 day ago: | |
| K, bud. | |
| Perhaps you haven't noticed, but you're in a thread that asked | |
| about CUDA, explicitly. | |
| uecker wrote 1 day ago: | |
| GCC / clang also have support for offloading. | |
| latchkey wrote 1 day ago: | |
| Then learn PyTorch. | |
| The hardware between brands is fundamentally different. There isn't | |
| a standard like x86 for CPUs. | |
| So, while you may use something like HIPIFY to translate your code | |
| between APIs, at least with GPU programming, it makes sense to | |
| learn how they differ from each other or just pick one of them and | |
| work with it knowing that the others will just be some variation of | |
| the same idea. | |
| horsellama wrote 1 day ago: | |
| The jobs requiring CUDA experience exist, most of the time, because
| torch is not good enough.
| Cloudef wrote 1 day ago: | |
| Both Zig and Rust are aiming to compile to GPUs natively. What CUDA
| and HIP provide is a heterogeneous computing runtime, i.e. hiding the
| boilerplate of executing code on the CPU and GPU seamlessly.
| pjmlp wrote 1 day ago: | |
| If only Khronos and the competition cared about the developer | |
| experience.... | |
| the__alchemist wrote 1 day ago: | |
| This is continuously a point of frustration! Vulkan compute is... | |
| suboptimal. I use Cuda because it feels like the only practical | |
| option. I want Vulkan or something else to compete seriously, but | |
| until that happens, I will use Cuda. | |
| corysama wrote 1 day ago: | |
| Is [1] + [2] getting there? | |
| Runs on anything + auto-differentiatation. | |
| [1]: https://github.com/KomputeProject/kompute | |
| [2]: https://shader-slang.org/ | |
| pjmlp wrote 1 day ago: | |
| It took until Vulkanised 2025 to acknowledge that Vulkan had become
| the same mess as OpenGL, and to put an action plan in place to
| try to correct this.
| Had it not been for Apple's initial OpenCL contribution (regardless
| of how it went from there), AMD's Mantle as the starting point for
| Vulkan, and NVidia's Vulkan-Hpp and Slang, the ecosystem of Khronos
| standards would be much worse.
| Also Vulkan isn't as bad as OpenGL tooling, because LunarG | |
| exists, and someone pays them for the whole Vulkan SDK. | |
| The attitude "we put paper standards" and the community should | |
| step in for the implementations and tooling, hardly comes to | |
| the productivity from private APIs tooling. | |
| Also all GPU vendors, including Intel and AMD, also rather push | |
| their own compute APIs, even if based on top of Khronos ones. | |
| david-gpu wrote 21 hours 40 min ago: | |
| > The attitude "we put paper standards" and the community | |
| should step in for the implementations and tooling | |
| Khronos is a consortium financed by its members, who either | |
| implement the standards on their own hardware or otherwise | |
| depend on the ecosystem around them. For example, competing | |
| GPU vendors typically implement the standards in parallel | |
| with the committee meetings. The very people who represent | |
| their company in Khronos are typically leads of the teams who | |
| implement the standards. | |
| Source: used to represent my employers at Khronos. It was a | |
| difficult, thankless job, that required almost as much | |
| diplomacy as technical expertise. | |
| pjmlp wrote 12 hours 58 min ago: | |
| I know, and the way those members implemented Khronos | |
| standards, versus their own proprietary alternatives, shows | |
| how it actually works in practice, regarding developer | |
| tooling and ergonomics. | |
| <- back to front page |