COMMENT PAGE FOR: | |
Ask HN: How to learn CUDA to professional level | |
FilosofumRex wrote 9 hours 35 min ago: | |
If you're in it for the money, then forget about HPC and the mathy
stuff; unless you've a PhD in the application domain, no one will
bother with you, even if you write CUDA at 120 wpm.
The real money is in mastering PTX, nvcc, cuobjdump, Nsight Systems,
and Nsight Compute. CUTLASS is a good open source code base to explore -
start here [1]. Most importantly, stay off HN and get on the GPU MODE
Discord [2], where the real coders are:
[1]: https://christianjmills.com/series/notes/cuda-mode-notes.html | |
[2]: https://discord.com/invite/gpumode | |
lacker wrote 11 hours 25 min ago: | |
If you're experienced in C++ you can basically just jump in. I found | |
this youtube series to be really helpful: [1] After watching this video | |
I was able to implement a tiling version of a kernel that was the | |
bottleneck of our production data analysis pipeline to improve | |
performance by over 2x. There's much more to learn but I found this | |
video series to be a great place to start. | |
[1]: https://www.youtube.com/playlist?list=PLxNPSjHT5qvtYRVdNN1yDcd... | |
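To make the tiling idea concrete, here is a minimal shared-memory tiled
matrix multiply, the canonical example of the technique (a sketch, not the
commenter's actual kernel; it assumes square matrices whose side N is a
multiple of the 16x16 tile):

    #include <cuda_runtime.h>

    constexpr int TILE = 16;

    __global__ void matmul_tiled(const float* A, const float* B, float* C, int N)
    {
        // Each block computes one TILE x TILE patch of C, staging the
        // needed slices of A and B in shared memory so every global value
        // is loaded once per tile instead of once per output element.
        __shared__ float As[TILE][TILE];
        __shared__ float Bs[TILE][TILE];

        int row = blockIdx.y * TILE + threadIdx.y;
        int col = blockIdx.x * TILE + threadIdx.x;
        float acc = 0.0f;

        for (int t = 0; t < N / TILE; ++t) {
            As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE + threadIdx.x];
            Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
            __syncthreads();
            for (int k = 0; k < TILE; ++k)
                acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
            __syncthreads();
        }
        C[row * N + col] = acc;
    }

    // launch: dim3 block(TILE, TILE); dim3 grid(N / TILE, N / TILE);
    //         matmul_tiled<<<grid, block>>>(d_A, d_B, d_C, N);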
SonOfLilit wrote 12 hours 26 min ago: | |
Prefix scan is a great intro to GPU programming: [1] After this you | |
should be able to tell whether you enjoy this kind of work. | |
If you do, try to do a reasonably optimized GEMM, and then try to | |
follow the FlashAttention paper and implement a basic version of what | |
they're doing. | |
[1]: https://developer.download.nvidia.com/compute/cuda/2_2/sdk/web... | |
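As a taste of what the linked scan sample covers, here is a naive
single-block Hillis-Steele inclusive scan (an illustrative sketch, not the
SDK code; it assumes the block is launched with exactly n threads, n a
power of two, and 2 * n * sizeof(int) bytes of dynamic shared memory):

    __global__ void inclusive_scan_block(const int* in, int* out, int n)
    {
        extern __shared__ int tmp[];          // double buffer: 2 * n ints
        int tid = threadIdx.x;
        int pout = 0, pin = 1;

        tmp[tid] = in[tid];
        __syncthreads();

        for (int offset = 1; offset < n; offset *= 2) {
            pout = 1 - pout;                  // ping-pong between the halves
            pin  = 1 - pout;
            if (tid >= offset)
                tmp[pout * n + tid] = tmp[pin * n + tid]
                                    + tmp[pin * n + tid - offset];
            else
                tmp[pout * n + tid] = tmp[pin * n + tid];
            __syncthreads();
        }
        out[tid] = tmp[pout * n + tid];
    }

    // launch: inclusive_scan_block<<<1, n, 2 * n * sizeof(int)>>>(d_in, d_out, n);

The work-efficient, multi-block version in the linked sample builds on the
same idea, and that is where it gets interesting.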
brudgers wrote 16 hours 57 min ago: | |
For better or worse, direct professional experience in a professional | |
setting is the only way to learn anything to a professional level. | |
That doesn't mean one-eyed-king knowledge is never enough to solve that
chicken-and-egg problem. You only have to be good enough to get the job.
But if you haven't done it on the job, you don't have work experience | |
and you are either lying to others or lying to yourself...and any | |
sophisticated organization won't fall for it... | |
...except of course, knowingly. And the best way to get someone to
knowingly ignore obvious Dunning-Kruger and/or horseshit is to know
that someone personally or professionally.
Which is to say that the best way to get a good job is to have a good | |
relationship with someone who can hire you for a good job (nepotism | |
trumps technical ability, always). And the best way to find a good job | |
is to know a lot of people who want to work with you. | |
To put it another way, looking for a job is the only way to find a job,
and looking for a job is also much, much harder than everything that
avoids looking for a job (like studying CUDA) while pretending to be
preparation...because again, studying CUDA won't ever give you
professional experience.
Don't get me wrong, there's nothing wrong with learning CUDA all on | |
your own. But it is not professional experience and it is not looking | |
for a job doing CUDA. | |
Finally, if you want to learn CUDA just learn it for its own sake | |
without worrying about a job. Learning things for their own sake is the | |
nature of learning once you get out of school. | |
Good luck. | |
alecco wrote 17 hours 8 min ago: | |
Ignore everybody else. Start with CUDA Thrust. Study their examples
carefully. See how other projects use Thrust. After a year or two, go
deeper into CUB.
Do not implement algorithms by hand. On recent architectures it is
extremely hard to reach decent occupancy and the like. Thrust and CUB
solve 80% of the cases with reasonable trade-offs, and they do most of
the work for you.
[1]: https://developer.nvidia.com/thrust | |
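To give a feel for the level Thrust operates at, a tiny sketch (assuming
the Thrust bundled with the CUDA toolkit; compile with nvcc):

    #include <thrust/device_vector.h>
    #include <thrust/sequence.h>
    #include <thrust/sort.h>
    #include <thrust/reduce.h>
    #include <thrust/functional.h>
    #include <cstdio>

    int main()
    {
        // Fill, sort and reduce a million integers on the GPU without
        // writing a single kernel or worrying about occupancy.
        thrust::device_vector<int> d(1 << 20);
        thrust::sequence(d.begin(), d.end());                       // 0, 1, 2, ...
        thrust::sort(d.begin(), d.end(), thrust::greater<int>());   // descending
        long long sum = thrust::reduce(d.begin(), d.end(), 0LL);
        std::printf("sum = %lld\n", sum);
        return 0;
    }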
bee_rider wrote 15 hours 11 min ago: | |
It looks quite nice just from skimming the link. | |
But, I don't understand the comparison to TBB. Do they have a
version of TBB that runs on the GPU natively? If the TBB
implementation is on the CPU... that's just comparing two different
pieces of hardware. Which would be confusing, bordering on dishonest.
alecco wrote 3 hours 15 min ago: | |
The TBB comparison is a marketing leftover from 10 years ago when | |
they were trying to convince people that NVIDIA GPUs were much | |
faster than Intel CPUs for parallel problems. | |
matt3210 wrote 17 hours 15 min ago: | |
Just make cool stuff. Find people to code review. I learn way more | |
during code reviews than anything else. | |
canyp wrote 18 hours 51 min ago: | |
My 2 cents: "Learning CUDA" is not the interesting bit. Rather, you want
to learn two things: 1) GPU hardware architecture, 2) parallelizing
algorithms. For CUDA specifically, there is the CUDA Programming
Guide from Nvidia, which will teach you the basics of the language. But
what these jobs typically require is that you know how to parallelize
an algorithm and squeeze the most out of the hardware.
gdubs wrote 18 hours 52 min ago: | |
I like to learn through projects, and as a graphics guy I love the GPU | |
Gems series. Things like: [1] As an Apple platforms developer I | |
actually worked through those books to figure out how to convert the | |
CUDA stuff to Metal, which helped the material click even more. | |
Part of why I did it was - and this was some years back - I wanted
to sharpen my thinking around parallel approaches to problem solving,
given how central those algorithms and ways of thinking are to things
like ML and not just game development, etc.
[1]: https://developer.nvidia.com/gpugems/gpugems3/part-v-physics-s... | |
fifilura wrote 19 hours 4 min ago: | |
I am not a CUDA programmer, but when looking at this I think I can see
the parallels to Spark and SQL [1] So my tip would be: start getting
used to programming without for loops.
[1]: https://gfxcourses.stanford.edu/cs149/fall24/lecture/dataparal... | |
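To make the "no for loops" idea concrete, here is the standard SAXPY
illustration (a sketch of the mental shift, not taken from the linked
lecture):

    // CPU: an explicit loop over every element.
    void saxpy_cpu(int n, float a, const float* x, float* y)
    {
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }

    // GPU: the loop disappears; each thread owns exactly one index.
    __global__ void saxpy_gpu(int n, float a, const float* x, float* y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = a * x[i] + y[i];
    }

    // launch: saxpy_gpu<<<(n + 255) / 256, 256>>>(n, a, d_x, d_y);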
sremani wrote 19 hours 52 min ago: | |
The book - PMPP - Programming Massively Parallel Processors | |
The YouTube Channel - CUDA_MODE - it is based on PMPP | |
I could not find the channel, but here is the playlist [1] Once done,
you will be on a solid foundation.
[1]: https://www.youtube.com/watch?v=LuhJEEJQgUM&list=PLVEjdmwEDkgW... | |
math_dandy wrote 20 hours 10 min ago: | |
Are there any GPU emulators you can use to run simple CUDA programs on
a commodity laptop, just to get comfortable with the mechanics, the
toolchain, etc.?
throwaway81523 wrote 13 hours 16 min ago: | |
You can get a VPS with GPUs these days; not super cheap, but
affordable for those in the industry.
corysama wrote 19 hours 12 min ago: | |
[1] emulates running simple CUDA programs in a web page with zero
setup. It's a good way to get your toes wet.
[1]: https://leetgpu.com/ | |
gkbrk wrote 20 hours 8 min ago: | |
Commodity laptops can just use regular non-emulated CUDA if they have | |
an Nvidia GPU. It's not just for datacenter GPUs, a ton of regular | |
consumer GPUs are also supported. | |
bee_rider wrote 15 hours 9 min ago: | |
A commodity laptop doesn't have a discrete GPU these days; iGPUs are
good enough for basic tasks.
SoftTalker wrote 20 hours 25 min ago: | |
It's 2025. Get with the times, ask Claude to do it, and then ask it to | |
explain it to you as if you're an engineer who needs to convince a | |
hiring manager that you understand it. | |
rakel_rakel wrote 17 hours 50 min ago: | |
Might work in 2025, 2026 will demand more. | |
mekpro wrote 20 hours 27 min ago: | |
To professionals in the field, I have a question: what jobs, positions, | |
and companies are in need of CUDA engineers? My current understanding | |
is that while many companies use CUDA's by-products (like PyTorch), | |
direct CUDA development seems less prevalent. I'm therefore seeking to | |
identify more companies and roles that heavily rely on CUDA. | |
kloop wrote 20 hours 11 min ago: | |
My team uses it for geospatial data. We rasterize slippy map tiles | |
and then do a raster summary on the gpu. | |
It's a weird case, but the pixels can be processed independently for | |
most of it, so it works pretty well. Then the rows can be summarized | |
in parallel and rolled up at the end. The copy onto the gpu is our | |
current bottleneck however. | |
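One standard way to attack that copy bottleneck is pinned host memory plus
cudaMemcpyAsync on a couple of streams, so the transfer of one chunk
overlaps with the kernel working on the previous one. A sketch only, with
made-up sizes and a dummy kernel standing in for the real raster work:

    #include <cuda_runtime.h>

    __global__ void process_chunk(float* p, int n)   // stand-in for real work
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) p[i] *= 2.0f;
    }

    int main()
    {
        const int chunk = 1 << 20, nchunks = 8;
        float *h, *d;
        cudaMallocHost((void**)&h, sizeof(float) * chunk * 2);  // pinned => async copies
        cudaMalloc((void**)&d, sizeof(float) * chunk * 2);
        cudaStream_t st[2];
        cudaStreamCreate(&st[0]);
        cudaStreamCreate(&st[1]);

        for (int i = 0; i < nchunks; ++i) {
            int s = i % 2;                           // ping-pong buffers/streams
            // In real code: cudaStreamSynchronize(st[s]) before refilling
            // h + s * chunk, then write the next chunk of input into it.
            cudaMemcpyAsync(d + s * chunk, h + s * chunk, sizeof(float) * chunk,
                            cudaMemcpyHostToDevice, st[s]);
            process_chunk<<<(chunk + 255) / 256, 256, 0, st[s]>>>(d + s * chunk, chunk);
        }
        cudaDeviceSynchronize();
        cudaFree(d);
        cudaFreeHost(h);
        return 0;
    }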
indianmouse wrote 20 hours 46 min ago: | |
As a very early CUDA programmer who participated in the CUDA contest
from NVidia in 2008 - submitting, I believe, one of the only entries
from India (I'm not claiming that, though) and getting a consolation /
participation prize of a Black Edition card - I can vouch for the
method which I followed.
- Look up the CUDA Programming Guide from NVidia | |
- CUDA Programming books from NVidia from | |
developer.nvidia.com/cuda-books-archive link | |
- Start creating small programs based on the existing implementations | |
(A strong C implementation knowledge is required. So, brush up if | |
needed.) | |
- Install the required Toolchains, compilers, and I am assuming you | |
have the necessary hardware to play around | |
- Github links with CUDA projects. Read the code, and now you could use
an LLM to explain the code in the way you would need
- Start creating smaller, yet parallel, programs etc., etc.
And in about a month or two, you should have enough to start writing | |
CUDA programs. | |
I'm not aware of the skill / experience level you have, but whatever
it might be, there are plenty more sources and resources available now
than there were in 2007/08.
Create a 6-8 week study plan and you should be flying soon!
Hope it helps. | |
Feel free to comment and I can share whatever I could to guide. | |
edge17 wrote 16 hours 6 min ago: | |
What environment do you use? Is it still the case that Windows is the | |
main development environment for cuda? | |
hiq wrote 20 hours 28 min ago: | |
> I am assuming you have the necessary hardware to play around | |
Can you expand on that? Is it enough to have an Nvidia graphics card
that's, like, 5 years old, or do you need something more specific?
indianmouse wrote 11 hours 41 min ago: | |
That is sufficient. | |
slt2021 wrote 18 hours 47 min ago: | |
Each NVIDIA GPU has a certain Compute Capability ( [1] ).
Depending on the model and age of your GPU, it will have a certain
capability that will be the hard ceiling for what you can program
using CUDA.
[1]: https://developer.nvidia.com/cuda-gpus | |
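If you are not sure what your card supports, the runtime can tell you.
A small sketch using the standard device-query calls:

    #include <cuda_runtime.h>
    #include <cstdio>

    int main()
    {
        int count = 0;
        cudaGetDeviceCount(&count);
        for (int i = 0; i < count; ++i) {
            cudaDeviceProp p;
            cudaGetDeviceProperties(&p, i);
            // major.minor is the compute capability, e.g. 8.6 for an RTX 3080
            std::printf("GPU %d: %s, compute capability %d.%d\n",
                        i, p.name, p.major, p.minor);
        }
        return 0;
    }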
sanderjd wrote 13 hours 25 min ago: | |
Recognizing that this won't result in any useful benchmarks, is | |
there a way to emulate an nvidia gpu? In a docker container, for | |
instance? | |
dpe82 wrote 17 hours 55 min ago: | |
When you're just getting started and learning that won't matter | |
though. Any Nvidia card from the last 10 years should be fine. | |
rahimnathwani wrote 20 hours 15 min ago: | |
I'm not a CUDA programmer, but AIUI: | |
- you will want to install the latest version of CUDA Toolkit | |
(12.9.1) | |
- each version of CUDA Toolkit requires the card driver to be above | |
a certain version (e.g. toolkit depends on driver version 576 or | |
above) | |
- older cards often have recent drivers, e.g. the current version | |
of CUDA Toolkit will work with a GTX 1080, as it has a recent | |
(576.x) driver | |
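A quick way to check that toolkit/driver pairing on a given machine, a
small sketch using two CUDA runtime calls:

    #include <cuda_runtime.h>
    #include <cstdio>

    int main()
    {
        int driver = 0, runtime = 0;
        cudaDriverGetVersion(&driver);     // highest CUDA version the driver supports
        cudaRuntimeGetVersion(&runtime);   // CUDA version this binary was built against
        std::printf("driver supports CUDA %d.%d, runtime built for %d.%d\n",
                    driver / 1000, (driver % 1000) / 10,
                    runtime / 1000, (runtime % 1000) / 10);
        return 0;
    }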
sputknick wrote 21 hours 38 min ago: | |
I used this to teach high school students. Probably not sufficient to | |
get what you want, but it should get you off the ground and you can run | |
from there. | |
[1]: https://youtu.be/86FAWCzIe_4?si=buqdqREWASNPbMQy | |
tkuraku wrote 22 hours 2 min ago: | |
I think you just pick a problem you want to solve with GPU programming
and go for it, learning what you need along the way. Nvidia blog posts
are great for picking things up as you go, such as
[1]: https://devblogs.nvidia.com/cuda-pro-tip-write-flexible-kernel... | |
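If that link is the grid-stride loop pro-tip (which the URL suggests), the
core idea fits in a few lines; a sketch:

    __global__ void saxpy(int n, float a, const float* x, float* y)
    {
        // Grid-stride loop: correct for any n under any launch configuration,
        // instead of assuming exactly one thread per element.
        for (int i = blockIdx.x * blockDim.x + threadIdx.x;
             i < n;
             i += blockDim.x * gridDim.x)
            y[i] = a * x[i] + y[i];
    }

    // e.g. saxpy<<<128, 256>>>(n, 2.0f, d_x, d_y); still correct when n >> 128 * 256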
majke wrote 22 hours 47 min ago: | |
I had a bit of limited exposure to CUDA. It was before the AI boom,
during Covid.
I found it easy to start. Then there was a pretty nice learning curve
to get to warps, SMs and basic concepts. Then I was able to dig deeper
into the integer opcodes, which was super cool. I was able to optimize
the compute part pretty well, without many roadblocks.
However, getting memory loads perfect and then getting closer to the hw
(warp groups, divergence, the L2 cache split thing, scheduling) was
pretty hard.
I'd say CUDA is pretty nice/fun to start with, and it's possible to get
quite far as a novice programmer. However, getting deeper and achieving
a real advantage over the CPU is hard.
Additionally there is a problem with Nvidia segmenting the market -
some opcodes are present in _old_ GPUs (the CUDA arch is _not_ forwards
compatible). Some opcodes are reserved for "AI" chips (like the H100).
So getting code that is fast on both an H100 and an RTX 5090 is super
hard. Add to that the fact that each card has a different SM count,
memory capacity and bandwidth... and you end up with an impossible
compatibility matrix.
TLDR: Beginnings are nice and fun. You can get quite far on the
compute-optimization part. But getting compatibility across different
chips and getting memory access right is hard. When you start, choose a
specific problem, a specific chip, a specific instruction set.
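For what it's worth, the usual coping strategy is to build a fat binary
for the architectures you target and branch on __CUDA_ARCH__ only where
the chips genuinely differ. A minimal sketch (the -gencode list is just an
example; adjust it to the cards you care about):

    // Build with, for example:
    //   nvcc -O3 kernel.cu \
    //        -gencode arch=compute_80,code=sm_80 \
    //        -gencode arch=compute_90,code=sm_90
    __global__ void kernel(float* p, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
    #if __CUDA_ARCH__ >= 900
        p[i] *= 2.0f;   // path that may use Hopper-class features
    #else
        p[i] *= 2.0f;   // conservative fallback for older cards
    #endif
    }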
epirogov wrote 23 hours 11 min ago: | |
I bought a P106-90 for $20 and started porting my data apps to parallel
processing with it.
izharkhan wrote 23 hours 34 min ago: | |
How to do hacking?
rramadass wrote 23 hours 51 min ago: | |
CUDA GPGPU programming was invented to solve certain classes of | |
parallel problems. So studying these problems will give you greater | |
insight into CUDA based parallel programming. I suggest reading the | |
following old book along with your CUDA resources. | |
Scientific Parallel Computing by L. Ridgway Scott et al. -
[1]: https://press.princeton.edu/books/hardcover/9780691119359/scie... | |
weinzierl wrote 23 hours 57 min ago: | |
Nvidia itself has a paid course series. It is a bit older but I believe
still relevant. I have bought it but not started it yet. I intend
to do so during the summer holidays.
imjonse wrote 1 day ago: | |
These should keep you busy for months: [1] resources and discord | |
community | |
Book: Programming massively parallel processors | |
nvidia cuda docs are very comprehensive too | |
[1]: https://www.gpumode.com/ | |
[2]: https://github.com/srush/GPU-Puzzles | |
mdaniel wrote 18 hours 2 min ago: | |
Wowzers, the line noise | |
[1]: https://github.com/HazyResearch/ThunderKittens#:~:text=here%... | |
amelius wrote 22 hours 21 min ago: | |
This follows a "winner takes all" scenario. I see the differences | |
between the submissions are not so large, often smaller than 1%. Kind | |
of pointless to work on this, if you ask me. | |
imjonse wrote 5 hours 5 min ago: | |
the main site is confusing indeed with all those leaderboards, but | |
follow the discord and resources links for the actual learning | |
material. | |
amelius wrote 10 min ago: | |
Thanks, looks interesting indeed. | |
elashri wrote 1 day ago: | |
I will give you my personal experience learning CUDA; it might be
helpful.
Disclaimer: I don't claim that this is actually a systematic way to
learn it, and it is more geared towards academic work.
I got assigned to a project that needed CUDA as part of my PhD.
There was no one in my research group who had any experience with or
knew CUDA. I started with the standard NVIDIA courses (Getting Started
with Accelerated Computing with CUDA C/C++, and there is a Python
version too). This gave me a good introduction to the concepts and
basic ideas, but I think after that I did most of the learning by trial
and error. I tried a couple of online tutorials for specific things,
and some books, but there was always a deprecated function here or
there, or a change of API that made things obsolete. Or things simply
changed for your GPU, and now you have to be careful because you might
be using a GPU version that is not compatible with what you develop for
in production, and you need things to work on both.
I think learning CUDA for me is an endeavor of pain and going through
"compute-sanitizer" and Nsight, because you will find that most of your
time will go into debugging why things are running slower than you
think.
Take things slowly. Take a simple project that you know how to do
without CUDA, then port it to CUDA, benchmark against the CPU, and try
to optimize different aspects of it.
One piece of advice that can be helpful: do not think about optimization
at the beginning. Start with correct, then optimize. A working slow
kernel beats a fast kernel that corrupts memory.
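One concrete habit that supports "correct first" (a sketch, not from the
course material): wrap every CUDA call and every launch in an error check,
and run the binary under compute-sanitizer before reaching for Nsight.

    #include <cuda_runtime.h>
    #include <cstdio>
    #include <cstdlib>

    #define CUDA_CHECK(call)                                              \
        do {                                                              \
            cudaError_t err = (call);                                     \
            if (err != cudaSuccess) {                                     \
                std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                             cudaGetErrorString(err));                    \
                std::exit(1);                                             \
            }                                                             \
        } while (0)

    __global__ void fill(float* p, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) p[i] = 1.0f;
    }

    int main()
    {
        float* d = nullptr;
        CUDA_CHECK(cudaMalloc((void**)&d, 1024 * sizeof(float)));
        fill<<<4, 256>>>(d, 1024);
        CUDA_CHECK(cudaGetLastError());        // catches bad launch configurations
        CUDA_CHECK(cudaDeviceSynchronize());   // surfaces errors from inside the kernel
        CUDA_CHECK(cudaFree(d));
        return 0;
    }

Running the same binary as "compute-sanitizer ./a.out" then flags
out-of-bounds and misaligned accesses that a plain run may silently
survive.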
korbip wrote 22 hours 50 min ago: | |
I can share a similar PhD story (the result being visible here: [1]
). Back then I didn't find any tutorials that covered anything beyond
the basics (which are still important).
Once you have understood the principal working mode and architecture
of a GPU, I would recommend the following workflow:
1. First create an environment so that you can actually test your | |
kernels against baselines written in a higher-level language. | |
2. If you don't have an urgent project already, try to | |
improve/re-implement existing problems (MatMul being the first | |
example). Don't get caught by wanting to implement all size cases. | |
Take an example just to learn a certain functionality, rather than | |
solving the whole problem if it's just about learning. | |
3. Write the functionality you want to have in increasing complexity. | |
Write loops first, then parallelize these loops over the grid. Use | |
global memory first, then put things into shared memory and | |
registers. Use plain matrix multiplication first, then use mma | |
(TensorCore) primitives to speed things up. | |
4. Iterate over the CUDA C Programming Guide. It covers all (most) of
the functionality that you want to learn - but it can't just be read and
memorized. You learn it when you apply it.
5. It might depend on your use case, but also consider using higher-level
abstractions like CUTLASS or ThunderKittens. Also, if your environment
is jax/torch, use Triton first before going down to the CUDA level.
Overall, it will be some pain for sure. And to master it including | |
PTX etc. will take a lot of time. | |
[1]: https://github.com/NX-AI/flashrnn | |
kevmo314 wrote 23 hours 34 min ago: | |
> I think learning CUDA for me is an endeavor of pain and going
through "compute-sanitizer" and Nsight because you will find that
most of your time will go into debugging why things are running slower
than you think.
This is so true it hurts. | |
ForgotIdAgain wrote 1 day ago: | |
I have not tried it yet, but it seems nice:
[1]: https://leetgpu.com/ | |
Onavo wrote 1 day ago: | |
Assuming you are asking this because of the deep learning/ChatGPT hype, | |
the first question you should ask yourself is, do you really need to? | |
The skills needed for CUDA are completely unrelated to building machine | |
learning models. It's like learning to make a TLS library so you can | |
get a full stack web development job. The skills are completely | |
orthogonal. CUDA belongs to the domain of game developers, graphics | |
people, high performance computing and computer engineers (hardware). | |
From the point of view of machine learning development and research, | |
it's nothing more than an implementation detail. | |
Make sure you are very clear on what you want. Most HR departments cast | |
a wide net (it's like how every junior role requires "3-5 years of | |
experience" when in reality they don't really care). Similarly when | |
hiring, most companies pray for the unicorn developer who can | |
understand the entire stack from the GPU to the end user product domain | |
when the day to day is mostly in Python. | |
throwaway81523 wrote 1 day ago: | |
I looked at the CUDA code for Leela Chess Zero and found it pretty | |
understandable, though that was back when Leela used a DCNN instead of | |
transformers. DCNNs are fairly simple and are explained in fast.ai
videos that I watched a few years ago, so navigating the Leela code | |
wasn't too difficult. Transformers are more complicated and I want to | |
bone up on them, but I haven't managed to spend any time understanding | |
them. | |
CUDA itself is just a minor departure from C++, so the language itself | |
is no big deal if you've used C++ before. But, if you're trying to get | |
hired programming CUDA, what that really means is they want you | |
implementing AI stuff (unless it's game dev). AI programming is a much | |
wider and deeper subject than CUDA itself, so be ready to spend a bunch | |
of time studying and hacking to come up to speed in that. But if you | |
do, you will be in high demand. As mentioned, the fast.ai videos are a | |
great introduction. | |
In the case of games, that means 3D graphics which these days is | |
another rabbit hole. I knew a bit about this back in the day, but it | |
is fantastically more sophisticated now and I don't have any idea where | |
to even start. | |
robotnikman wrote 15 hours 12 min ago: | |
>But if you do, you will be in high demand | |
So I'm guessing trying to find a job as a CUDA programmer is nowhere
near as big a headache as other software engineering jobs right now?
I'm thinking maybe learning CUDA and more about AI might be a good
pivot from my current position as a Java middleware developer.
upmind wrote 1 day ago: | |
This is a great idea! This is the code, right? [1] I have two beginner
(and probably very dumb) questions: why do they have heavy C++/CUDA
usage rather than using only pytorch/tensorflow? Are they too slow for
training Leela? Second, why is there tensorflow code?
[1]: https://github.com/leela-zero/leela-zero | |
henrikf wrote 20 hours 57 min ago: | |
That's Leela Zero (plays Go instead of Chess). It was good for its | |
time (~2018) but it's quite outdated now. It also uses OpenCL | |
instead of Cuda. I wrote a lot of that code including Winograd | |
convolution routines. | |
Leela Chess Zero ( [1] ) has a much more optimized CUDA backend
targeting modern GPU architectures, and it's written by much more
knowledgeable people than me. That would be a much better source to
learn from.
[1]: https://github.com/LeelaChessZero/lc0 | |
throwaway81523 wrote 23 hours 44 min ago: | |
As I remember, the CUDA code was about 3x faster than the | |
tensorflow code. The tensorflow stuff is there for non-Nvidia | |
GPUs. This was in the era of the GTX 1080 or 2080. No idea about
now. | |
upmind wrote 23 hours 36 min ago: | |
Ah I see, thanks a lot! | |
lokimedes wrote 1 day ago: | |
There's a couple of "concerns" you may separate to make this a
bit more tractable:
1. Learning CUDA - the framework, libraries and high-layer wrappers. | |
This is something that changes with times and trends. | |
2. Learning high-performance computing approaches. While a GPU and the | |
Nvlink interfaces are Nvidia specific, working in a massively-parallel | |
distributed computing environment is a general branch of knowledge that | |
is translatable across HPC architectures. | |
3. Application specifics. If your thing is Transformers, you may just | |
as well start from Torch, Tensorflow, etc. and rely on the current | |
high-level abstractions, to inspire your learning down to the | |
fundamentals. | |
I'm no longer active in any of the above, so I can't be more
specific, but if you want to master CUDA, I would say that learning how
massively-parallel programming works is the foundation that may
translate into transferable skills.
david-gpu wrote 12 hours 52 min ago: | |
Former GPU guy here. Yeah, that's exactly what I was going to suggest | |
too, with emphasis on #2 and #3. What kind of jobs are they trying to | |
apply for? Is it really CUDA that they need to be familiar with, or | |
CUDA-based libraries like cuDNN, cuBLAS, cuFFT, etc? | |
Understanding the fundamentals of parallel programming comes first, | |
IMO. | |
chanana wrote 11 hours 1 min ago: | |
> Understanding the fundamentals of parallel programming comes | |
first, IMO. | |
Are there any good resources you'd recommend for that?
rramadass wrote 6 hours 5 min ago: | |
I am not the person you asked the question of, but you might find | |
the following useful (in addition to the ones mentioned in my | |
other comments); | |
Foundations of Multithreaded, Parallel, and Distributed | |
Programming by Gregory Andrews - An old classic but still very | |
good explanations of concurrent algorithmic concepts. | |
Parallel Programming: Concepts and Practice by Bertil Schmidt | |
et.al. - A relatively recent book with comprehensive coverage. | |
rramadass wrote 23 hours 35 min ago: | |
This is the right approach. Without (2) trying to learn (1) will just | |
lead to "confusion worse confounded". I also suggest a book | |
recommendation here - | |
[1]: https://news.ycombinator.com/item?id=44216478 | |
jonas21 wrote 19 hours 11 min ago: | |
I think it depends on your learning style. For me, learning | |
something with a concrete implementation and code that you can play | |
around with is a lot easier than trying to study the abstract | |
general concepts first. Once you have some experience with the | |
code, you start asking why things are done a certain way, and that | |
naturally leads to the more general concepts. | |
rramadass wrote 6 hours 19 min ago: | |
It has got nothing to do with "learning styles". Parallel | |
Computing needs knowledge of three things; a) Certain crucial | |
architectural aspects (logical and physical) of the hardware b) | |
Decomposing a problem correctly to map to that hardware c) | |
Algorithms using a specific language/framework to combine the | |
above two. CUDA (and other similar frameworks) only come in the | |
last step and so a knowledge of the first two is a prerequisite. | |
lokimedes wrote 23 hours 2 min ago: | |
This one was my go-to for HPC, but it may be a bit dated by now: | |
[1]: https://www.amazon.com/Introduction-Performance-Computing-... | |
rramadass wrote 21 hours 47 min ago: | |
That's a good book too (I have it) but more general than the
Ridgway Scott book, which uses examples from numerical computation
domains. Here is an overview of the chapters; example domains
start from chapter 10 onwards - [1] These sort of books are only | |
"dated" when it comes to specific languages/frameworks/libraries. | |
The methods/techniques are evergreen and often conceptually | |
better explained in these older books. | |
For recent up to date works on HPC, the free multi-volume The Art | |
of High Performance Computing by Victor Eijkhout can't be beat - | |
[1]: https://www.jstor.org/stable/j.ctv1ddcxfs | |
[2]: https://news.ycombinator.com/item?id=38815334 | |
dist-epoch wrote 1 day ago: | |
As they typically say: Just Do It (tm). | |
Start writing some CUDA code to sort an array or find the maximum
element.
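For the maximum-element exercise, a first working version can be as simple
as one atomic per element (a sketch with arbitrary test data; a
shared-memory tree reduction is the natural next step once this works):

    #include <cuda_runtime.h>
    #include <cstdio>
    #include <climits>

    __global__ void max_kernel(const int* data, int n, int* result)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) atomicMax(result, data[i]);   // every thread proposes one element
    }

    int main()
    {
        const int n = 1 << 20;
        int* h = new int[n];
        for (int i = 0; i < n; ++i) h[i] = (i * 2654435761u) % 1000000;  // arbitrary data

        int *d_data, *d_max;
        int init = INT_MIN;
        cudaMalloc((void**)&d_data, n * sizeof(int));
        cudaMalloc((void**)&d_max, sizeof(int));
        cudaMemcpy(d_data, h, n * sizeof(int), cudaMemcpyHostToDevice);
        cudaMemcpy(d_max, &init, sizeof(int), cudaMemcpyHostToDevice);

        max_kernel<<<(n + 255) / 256, 256>>>(d_data, n, d_max);

        int result = 0;
        cudaMemcpy(&result, d_max, sizeof(int), cudaMemcpyDeviceToHost);
        std::printf("max = %d\n", result);
        delete[] h;
        return 0;
    }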
the__alchemist wrote 22 hours 17 min ago: | |
I concur with this. Then supplement with resources A/R. Ideally, find | |
some tasks in your programs that are parallelize. (Learning what | |
these are is important too!), and switch them to Cuda. If you don't | |
have any, make a toy case, e.g. an n-body simulation. | |
amelius wrote 1 day ago: | |
I'd rather learn to use a library that works on any brand of GPU. | |
If that is not an option, I'll wait! | |
moralestapia wrote 15 hours 36 min ago: | |
K, bud. | |
Perhaps you haven't noticed, but you're in a thread that asked | |
about CUDA, explicitly. | |
uecker wrote 19 hours 59 min ago: | |
GCC / clang also have support for offloading. | |
latchkey wrote 20 hours 49 min ago: | |
Then learn PyTorch. | |
The hardware between brands is fundamentally different. There isn't | |
a standard like x86 for CPUs. | |
So, while you may use something like HIPIFY to translate your code | |
between APIs, at least with GPU programming, it makes sense to | |
learn how they differ from each other or just pick one of them and | |
work with it knowing that the others will just be some variation of | |
the same idea. | |
horsellama wrote 18 hours 18 min ago: | |
the jobs requiring cuda experience are most of the times because | |
torch is not good enough | |
Cloudef wrote 21 hours 49 min ago: | |
Both zig and rust are aiming to compile to gpus natively. What cuda | |
and hip provide is heterogeneous computing runtime, aka hiding the | |
boilerplate of executing code on cpu and gpu seamlessly | |
pjmlp wrote 1 day ago: | |
If only Khronos and the competition cared about the developer | |
experience.... | |
the__alchemist wrote 22 hours 17 min ago: | |
This is continuously a point of frustration! Vulkan compute is... | |
suboptimal. I use Cuda because it feels like the only practical | |
option. I want Vulkan or something else to compete seriously, but | |
until that happens, I will use Cuda. | |
corysama wrote 19 hours 8 min ago: | |
Is [1] + [2] getting there? | |
Runs on anything + auto-differentiation.
[1]: https://github.com/KomputeProject/kompute | |
[2]: https://shader-slang.org/ | |
pjmlp wrote 21 hours 7 min ago: | |
It took until Vulkanised 2025 to acknowledge that Vulkan became the
same mess as OpenGL, and to put an action plan in place to
try to correct this.
Had it not been for Apple's initial OpenCL contribution
(regardless of how it went from there), AMD's Mantle as the
starting point for Vulkan, and NVidia's Vulkan-Hpp and Slang,
the ecosystem of Khronos standards would be much worse.
Also, Vulkan isn't as bad as OpenGL tooling-wise, because LunarG
exists, and someone pays them for the whole Vulkan SDK.
The attitude of "we put out paper standards" while the community
should step in for the implementations and tooling hardly matches
the productivity of private API tooling.
Also, all GPU vendors, including Intel and AMD, would rather push
their own compute APIs, even if based on top of Khronos ones.
david-gpu wrote 12 hours 40 min ago: | |
> The attitude "we put paper standards" and the community | |
should step in for the implementations and tooling | |
Khronos is a consortium financed by its members, who either | |
implement the standards on their own hardware or otherwise | |
depend on the ecosystem around them. For example, competing | |
GPU vendors typically implement the standards in parallel | |
with the committee meetings. The very people who represent | |
their company in Khronos are typically leads of the teams who | |
implement the standards. | |
Source: used to represent my employers at Khronos. It was a | |
difficult, thankless job, that required almost as much | |
diplomacy as technical expertise. | |
pjmlp wrote 3 hours 58 min ago: | |
I know, and the way those members implemented Khronos | |
standards, versus their own proprietary alternatives, shows | |
how it actually works in practice, regarding developer | |
tooling and ergonomics. | |