| COMMENT PAGE FOR: | |
| No Graphics API | |
| fngjdflmdflg wrote 12 hours 21 min ago: | |
| I think this almost has to be the future if most compute development | |
| goes to AI in the next decade or so, beyond the fact that the proposed | |
| API is much cleaner. Vendors will stop caring about maintaining complex | |
| fixed function hardware and drivers for increasingly complex graphics | |
| APIs when they can get 3x the return from AI without losing any | |
| potential sales, especially in the current day where compute seems to | |
| be more supply limited. Game engines can (and I assume already do) | |
| benefit from general purpose compute anyway for things like physics, | |
| and even for things where it wouldn't matter in itself for performance, | |
| or would be slower, doing more on the GPU can be faster if your data is | |
| already on the GPU, which becomes more true the more things are done on | |
| the GPU. And as the author says, it would be great to have an open | |
| source equivalent to CUDA's ecosystem that could be leveraged by games | |
| in a cross platform way. | |
| dundarious wrote 13 hours 38 min ago: | |
| I see this as an expression of the same underlying complaint as Casey | |
| Muratori's 30 Million Line Problem: [1] Casey argues for ISAs for | |
| hardware, including GPUs, instead of heavy drivers. TFA argues for a | |
| graphics API surface that is so lean precisely because it fundamentally | |
| boils down to a simple and small set of primitives (mapping memory, | |
| simple barriers, etc.) that are basically equivalent to a simple ISA. | |
| If a stable ISA was a requirement, I believe we would have converged on | |
| these simpler capabilities ahead of time, as a matter of necessity. | |
| However, I am not a graphics programmer, so I just offer this as an | |
| intellectual provocation to drive conversation. | |
| [1]: https://caseymuratori.com/blog_0031 | |
| newpavlov wrote 13 hours 15 min ago: | |
| I generally agree with this opinion and would love to see a proper | |
| well documented low-level API for working with GPU. But it would | |
| probably result in different "GPU ISAs" for different vendors and | |
| maybe even for different GPU generations from one vendor. The bloated | |
| firmware and drivers operating at a higher abstraction level make it | |
| possible to hide a lot of internal implementation details from end users. | |
| In such a world most software would still probably use something | |
| like Vulkan/DX/WebGPU to abstract over such ISAs, just as we use | |
| Java/JavaScript/Python today to "abstract" over CPU ISAs. And we would | |
| also likely end up with an NVIDIA monopoly similar to x86. | |
| loup-vaillant wrote 8 hours 24 min ago: | |
| There's a simple (but radical) solution that would force GPU | |
| vendors to settle on a common, stable ISA: forbid hardware vendors | |
| from distributing software. In practice, stop hardware at the border | |
| if it comes from vendors who still distribute software. | |
| That simple. Now to sell a GPU, the only way is to make an ISA so | |
| simple even third parties can make good drivers for it. And the | |
| first successful ISA will then force everyone else to implement the | |
| same ISA, so the same drivers will work for everyone. | |
| Oh, one other thing that has to go away: patents must no longer | |
| apply to ISAs. That way, anyone who wants to make and sell x86, | |
| ARM, or whatever GPU ISA that emerges, legally can. No more | |
| discussion about which instruction set is open or not, they all | |
| just are. | |
| Not that the US would ever want to submit Intel to such a brutal | |
| competition. | |
| dundarious wrote 11 hours 51 min ago: | |
| I wouldn't be so sure, as if we analogize to x86(_64), the ISA is | |
| stable and used by many vendors, but the underlying | |
| microarchitecture and caching model, etc., are free rein for | |
| impl-specific work. | |
| zbendefy wrote 13 hours 54 min ago: | |
| >The user writes the data to CPU mapped GPU memory first and then | |
| issues a copy command, which transforms the data to optimal compressed | |
| format. | |
| Wouldn't this mean double GPU memory usage for uploading a potentially | |
| large image? (Even if only until the copy is finished.) | |
| Vulkan lets the user copy from CPU (host_visible) memory to GPU | |
| (device_local) memory without an intermediate GPU buffer; afaik there | |
| is no double VRAM usage there, but I might be wrong on that. | |
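| For reference, the usual Vulkan staging path looks roughly like this (a | |
| minimal sketch; the function and parameter names are mine, and it assumes | |
| the host-visible staging buffer was already filled via vkMapMemory and the | |
| image was transitioned to TRANSFER_DST_OPTIMAL): | |
|     #include <vulkan/vulkan.h> | |
|  | |
|     /* The linear staging copy and the optimally-tiled image coexist only | |
|        until this copy completes and the staging buffer is freed, so the | |
|        "double memory" is transient. */ | |
|     void record_image_upload(VkCommandBuffer cmd, VkBuffer staging, | |
|                              VkImage image, uint32_t width, uint32_t height) | |
|     { | |
|         VkBufferImageCopy region = { | |
|             .bufferOffset      = 0, | |
|             .bufferRowLength   = 0, /* tightly packed */ | |
|             .bufferImageHeight = 0, | |
|             .imageSubresource  = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 0, 1 }, | |
|             .imageOffset       = { 0, 0, 0 }, | |
|             .imageExtent       = { width, height, 1 }, | |
|         }; | |
|         vkCmdCopyBufferToImage(cmd, staging, image, | |
|                                VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &region); | |
|     } | |
| Whether that transient duplicate counts against VRAM depends on which | |
| memory heap the staging buffer lives in. | |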
| Great article btw. I hope something comes out of this! | |
| bullen wrote 19 hours 29 min ago: | |
| Personally I'm staying with OpenGL (ES) 3 for eternity. | |
| VAO is the last feature I was missing prior. | |
| Also the other cores will do useful gameplay work so one CPU core for | |
| the GPU is ok. | |
| 4 CPU cores is also enough for eternity. 1GB shared RAM/VRAM too. | |
| Let's build something good on top of the hardware/OSes/APIs/languages | |
| we have now? 3588/linux/OpenGL/C+Java specifically! | |
| Hardware has permanently peaked in many ways, only soft internal | |
| protocols can now evolve, I write mine inside TCP/HTTP. | |
| theandrewbailey wrote 18 hours 7 min ago: | |
| > Also the other cores will do useful gameplay work so one CPU core | |
| for the GPU is ok. | |
| In the before times, upgrading the CPU meant everything ran faster. Who | |
| didn't like that? Today, we need code that infinitely scales CPU | |
| cores for that to remain true. 16 thread CPUs have been around for a | |
| long time; I'd like my software to make the most of them. | |
| When we have 480+Hz monitors, we will probably need more than 1 CPU | |
| core for GPU rendering to make the most of them. | |
| Uh oh | |
| [1]: https://www.amazon.com/ASUS-Swift-Gaming-Monitor-PG27AQDP/dp... | |
| bullen wrote 14 hours 14 min ago: | |
| I'm 60Hz for life. | |
| Maybe 120Hz if they come in 4:3/5:4 with matte low res panel. | |
| But that's enough for VR which needs 2x because two eyes. | |
| So progress ends there. | |
| 16 cores can't share memory well. | |
| Also 15W is peak because more is hard to passively cool in a small | |
| space. So 120Hz x 2 eyes at ~1080p is the limit of what we can do | |
| anyway... with $1/kWh! | |
| The limits are physical. | |
| imdsm wrote 19 hours 48 min ago: | |
| LLMs will eat this up | |
| SunlitCat wrote 1 day ago: | |
| This article already feels like it's on the right track. DirectX 11 | |
| was perfectly fine, and DirectX 12 is great if you really want total | |
| control over the hardware, but I even remember some IHV saying that | |
| this level of control isn't always a good thing. | |
| When you look at the DirectX 12 documentation and best-practice guides, | |
| you're constantly warned that certain techniques may perform well on | |
| one GPU but poorly on another, and vice versa. That alone shows how | |
| fragile this approach can be. | |
| Which makes sense: GPU hardware keeps evolving and has become | |
| incredibly complex. Maybe graphics APIs should actually move further up | |
| the abstraction ladder again, to a point where you mainly upload | |
| models, textures, and a high-level description of what the scene and | |
| objects are supposed to do and how they relate to each other. The | |
| hardware (and its driver) could then decide what's optimal and how to | |
| turn that into pixels on the screen. | |
| Yes, game engines and (to some extent) RHIs already do this, but having | |
| such an approach as a standardized, optional graphics API would be | |
| interesting. It would allow GPU vendors to adapt their drivers closely | |
| to their hardware, because they arguably know best what their hardware | |
| can do and how to do it efficiently. | |
| canyp wrote 1 day ago: | |
| > but I even remember some IHV saying that this level of control | |
| isn't always a good thing. | |
| Because that control is only as good as you can master it, and not | |
| all game developers do well on that front. Just check out enhanced | |
| barriers in DX12 and all of the rules around them as an example. You | |
| almost need to train as a lawyer to digest that clusterfuck. | |
| > The hardware (and its driver) could then decide what's optimal | |
| and how to turn that into pixels on the screen. | |
| We should go in the other direction: have a goddamn ISA you can | |
| target across architectures, like an x86 for GPUs (though ideally not | |
| that encumbered by licenses), and let people write code against it. | |
| Get rid of all the proprietary driver stack while you're at it. | |
| alaingalvan wrote 1 day ago: | |
| If you enjoyed the history-of-GPUs section, there's a great book that goes | |
| into more detail by Jon Peddie titled "The History of the GPU - Steps | |
| to Invention", definitely worth a read. | |
| delifue wrote 1 day ago: | |
| This reminds me of Makimoto's Wave: [1] There is a constant cycle | |
| between domain-specific hardware-hardcoded-algorithm design, and | |
| programmable flexible design. | |
| [1]: https://semiengineering.com/knowledge_centers/standards-laws/l... | |
| pavlov wrote 23 hours 57 min ago: | |
| It's also known as Sutherland's Wheel of Reincarnation: | |
| [1]: http://www.cap-lore.com/Hardware/Wheel.html | |
| awolven wrote 1 day ago: | |
| Is this going to materialize into a "thing"? | |
| qingcharles wrote 1 day ago: | |
| I started my career writing software 3D renderers before switching to | |
| Direct3D in the later 90s. What I wonder is if all of this is going to | |
| just get completely washed away and made totally redundant by the | |
| incoming flood of hallucinated game rendering? | |
| Will it be possible to hallucinate the frame of a game at a similar | |
| speed to rendering it with a mesh and textures? | |
| We're already seeing the hybrid version of this where you render a | |
| lower res mesh and hallucinate the upscaled, more detailed, more | |
| realistic looking skin over the top. | |
| I wouldn't want to be in the game engine business right now :/ | |
| cubefox wrote 6 hours 3 min ago: | |
| It is more likely that machine learning models will be used by the | |
| game artists for asset generation, but not for rendering those assets | |
| at the client side, which would be extremely expensive. | |
| But another upcoming use case of ML on the client side is neural | |
| texture compression, which somehow needs not just less storage but | |
| also less RAM. It comes at a computational (frame time) cost | |
| on the client side, though not as bad as generative AI. | |
| Neural mesh compression could be another potential thing we get in | |
| the future. (All lossy compression seems to go in the ML direction: | |
| currently there is a lot of work going on with next generation neural | |
| audio and video codecs. E.g. [1] ) | |
| [1]: https://arxiv.org/abs/2502.20762 | |
| webdevver wrote 17 hours 46 min ago: | |
| reminds me of this remark made by Carmack on hidden surface removal | |
| [1] > "research from the 70s especially, there was tons of work going | |
| on on hidden surface removal, these clever different algorithmic ways | |
| - today we just kill it with a depth buffer. We just throw megabytes | |
| and megabytes of memory and the problem gets solved much much | |
| easier." | |
| ofcourse "megabytes" of memory was unthinkiable in the 70s. but for | |
| us, its unthinkable to have real-time frame inferencing. I cant help | |
| but draw the parallels between our current-day "clever algorithmic | |
| ways" of drawing pixels to the screen. | |
| I definitely agree with the take that in the grand scheme of things, | |
| all this pixel rasterizing business will be a transient moment that | |
| will be washed away with a much simpler petaflop/exaflop local TPU | |
| that runs at 60W under load, and it simply 'dreams' frames and | |
| textures for you. | |
| [1]: https://www.youtube.com/watch?v=P6UKhR0T6cs&t=2315s | |
| qingcharles wrote 11 hours 24 min ago: | |
| Agree. If you look at the GPU in an iPhone 17 and compare to the | |
| desktop GPU I had in 1998, the difference is startling. | |
| Voodoo in 1998 could render about 3m poly/sec on a Utah teapot, | |
| which was an absurd number at the time; I was coming from | |
| software renderers that were considered amazing at 100K/sec. | |
| A19 Pro GPU could do about 5bn/sec at about 4X the resolution. And | |
| it fits in your pocket. And runs off a tiny battery. Which also | |
| powers the screen. | |
| 25 years from now a 5090 GPU will be laughably bad. I have no idea | |
| how fast we'll be able to hallucinate entire scenes, but my guess | |
| is that it'll be above 60fps. | |
| aj_hackman wrote 13 hours 35 min ago: | |
| What happens when you want to do something very new, or very | |
| specific? | |
| 8n4vidtmkvmk wrote 1 day ago: | |
| I just assumed hallucinated rendering was a stepping stone to | |
| training AGIs or something. No one is actually seriously trying to | |
| build games that way, are they? Seems horribly inefficient at best, | |
| and incoherent at worst. | |
| jsheard wrote 1 day ago: | |
| You can't really do a whole lot of inference in 16ms on consumer | |
| hardware. Not to say that inference isn't useful in realtime | |
| graphics, DLSS has proven itself well enough, but that's a very small | |
| model laser-targetted at one specific problem and even that takes a | |
| few milliseconds to do its thing. Fitting behemoth generative models | |
| into those time constraints seems like an uphill battle. | |
| overgard wrote 1 day ago: | |
| I'm kind of curious about something.. most of my graphics experience | |
| has been OpenGL or WebGL (tiny bit of Vulkan) or big engines like | |
| Unreal or Unity. I've noticed over the years the uptake of DX12 always | |
| seemed marginal though (a lot of things stayed on D3D11 for a really | |
| long time). Is Direct3D 12 super awful to work with or something? I | |
| know it requires more resource management than 11, but so does Vulkan | |
| which doesn't seem to have the same issue.. | |
| flohofwoe wrote 20 hours 46 min ago: | |
| > but so does Vulkan which doesn't seem to have the same issue | |
| Vulkan has the same issues (and more) as D3D12, you just don't hear | |
| much about it because there are hardly any games built directly on | |
| top of Vulkan. Vulkan is mainly useful as Proton backend on Linux. | |
| canyp wrote 1 day ago: | |
| Most AAA titles are on DX12 now. ID is on Vulkan. E-sports titles | |
| remain largely on the DX11 camp. | |
| What the modern APIs give you is less CPU driver overhead and new | |
| functionality like ray tracing. If you're not CPU-bound to begin with | |
| and don't need those new features, then there's not much of a reason | |
| to switch. The modern APIs require way more management than the prior | |
| ones; memory management, CPU-GPU synchronization, avoiding resource | |
| hazards, etc. | |
| Also, many of those AAA games are moving to UE5, which is | |
| basically DX12 under the hood (presumably it should have a Vulkan | |
| backend too, but I don't see it used much?) | |
| kasool wrote 23 hours 53 min ago: | |
| UE5 has a fairly mature Vulkan backend but as you might guess is | |
| second class to DX12. | |
| starkparker wrote 1 day ago: | |
| > GPU hardware started to shift towards a generic SIMD design. SIMD | |
| units were now executing all the different shader types: vertex, pixel, | |
| geometry, hull, domain and compute. Today the framework has 16 | |
| different shader entry points. This adds a lot of API surface and makes | |
| composition difficult. As a result GLSL and HLSL still don't have a | |
| flourishing library ecosystem ... despite 20 years of existence | |
| A lot of this post went over my head, but I've struggled enough with | |
| GLSL for this to be triggering. Learning gets brutal for the lack of | |
| middle ground between reinventing every shader every time and using an | |
| engine that abstracts shaders from the render pipeline. A lot of | |
| open-source projects that use shaders are either allergic to | |
| documenting them or are proud of how obtuse the code is. Shadertoy is | |
| about as good as it gets, and that's not a compliment. | |
| The only way I learned anything about shaders was from someone who | |
| already knew them well. They learned what they knew by spending a solid | |
| 7-8 years of their teenage/young adult years doing nearly nothing but | |
| GPU programming. There's probably something in between that doesn't | |
| involve giving up and using node-based tools, but in a couple decades | |
| of trying and failing to grasp it I've never found it. | |
| canyp wrote 1 day ago: | |
| This page is a good place to start for shader programming: [1] I | |
| agree on the other points. GPU graphics programming is hard in large | |
| part because of terrible or missing documentation. | |
| [1]: https://lettier.github.io/3d-game-shaders-for-beginners/inde... | |
| modeless wrote 1 day ago: | |
| I don't understand this part: | |
| > Meshlet has no clear 1:1 lane to vertex mapping, there's no | |
| straightforward way to run a partial mesh shader wave for selected | |
| triangles. This is the main reason mobile GPU vendors haven't been | |
| keen to adapt the desktop centric mesh shader API designed by Nvidia | |
| and AMD. Vertex shaders are still important for mobile. | |
| I get that there's no mapping from vertex/triangle to tile until after | |
| the mesh shader runs. But even with vertex shaders there's also no | |
| mapping from vertex/triangle to tile until after the vertex shader | |
| runs. The binning of triangles to tiles has to happen after the | |
| vertex/mesh shader stage. So I don't understand why mesh shaders would | |
| be worse for mobile TBDR. | |
| I guess this is suggesting that TBDR implementations split the vertex | |
| shader into two parts, one that runs before binning and only calculates | |
| positions, and one that runs after and computes everything else. I | |
| guess this could be done but it sounds crazy to me, probably | |
| duplicating most of the work. And if that's the case why isn't there an | |
| extension allowing applications to explicitly separate position and | |
| attribute calculations for better efficiency? (Maybe there is?) | |
| Edit: I found docs on Intel's site about this. I think I understand | |
| now. [1] Yes, you have to execute the vertex shader twice, which is | |
| extra work. But if your main constraint is memory bandwidth, not FLOPS, | |
| then I guess it can be better to throw away the entire output of the | |
| vertex shader except the position, rather than save all the output in | |
| memory and read it back later during rasterization. At rasterization | |
| time when the vertex shader is executed again, you only shade the | |
| triangles that actually went into your tile, and the vertex shader | |
| outputs stay in local cache and never hit main memory. And this doesn't | |
| work with mesh shaders because you can't pick a subset of the mesh's | |
| triangles to shade. | |
| It does seem like there ought to be an extension to add separate | |
| position-only and attribute-only vertex shaders. But it wouldn't help | |
| the mesh shader situation. | |
| [1]: https://www.intel.com/content/www/us/en/developer/articles/gui... | |
| yuriks wrote 1 day ago: | |
| I thought that the implication was that the shader compiler produces | |
| a second shader from the same source that went through a dead code | |
| elimination pass which maintains only the code necessary to calculate | |
| the position, ignoring other attributes. | |
| modeless wrote 1 day ago: | |
| Sure, but that only goes so far, especially when users aren't | |
| writing their shaders with knowledge that this transform is going | |
| to be applied or any tools to verify that it's able to eliminate | |
| anything. | |
| hrydgard wrote 19 hours 25 min ago: | |
| Well, it is what is done on several tiler architectures, and it | |
| generally works just fine. Normally your computations of the | |
| position aren't really intertwined with the computation of the | |
| other outputs, so dead code elimination does a good job. | |
| kasool wrote 23 hours 56 min ago: | |
| Why would it be difficult? There are explicit shader semantics to | |
| specify output position. | |
| In fact, Qualcomm's documentation spells this out: | |
| [1]: https://docs.qualcomm.com/nav/home/overview.html?product... | |
| xyzsparetimexyz wrote 1 day ago: | |
| This needs an index and introduction. It's also not super interesting | |
| to people in industry? Like yeah, it'd be nice if bindless textures | |
| were part of the API so you didn't need to create that global | |
| descriptor set. It'd be nice if you just sample from pointers to | |
| textures similar to how dereferencing buffer pointers works. | |
| wg0 wrote 1 day ago: | |
| Very well written but I can't understand much of this article. | |
| What would be one good primer to be able to comprehend all the design | |
| issues raised? | |
| jplusequalt wrote 10 hours 9 min ago: | |
| A working understanding of legacy graphics APIs, GPU hardware, and | |
| some knowledge of Vulkan/DirectX 12/CUDA. | |
| I have all of that but DX12 knowledge, and 50% of this article still | |
| went over my head. | |
| cmovq wrote 1 day ago: | |
| [1]: https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-... | |
| arduinomancer wrote 1 day ago: | |
| To be honest there isn't really one; a lot of these concepts are | |
| advanced even for graphics programmers | |
| adrian17 wrote 1 day ago: | |
| IMO the minimum is to be able to read a "hello world / first | |
| triangle" example for any of the modern graphics APIs (OpenGL/WebGL | |
| doesn't count, WebGPU does), and have a general understanding of | |
| each step performed (resource creation, pipeline setup, passing data | |
| to shaders, draws, synchronization). Also to understand where the | |
| pipeline explosion issue comes from. | |
| Bonus points if you then look at CUDA "hello world" and consider | |
| that it can do nontrivial work on the same hardware (sans fixed | |
| function accelerators) with much less boilerplate (and driver | |
| overhead). | |
| jdashg wrote 1 day ago: | |
| And the GPU API cycle of life and death continues! | |
| I was an only-half-joking champion of ditching vertex attrib bindings | |
| when we were drafting WebGPU and WGSL, because it's a really nice | |
| simplification, but it was felt that would be too much of a departure | |
| from existing APIs. (Spending too many of our "Innovation Tokens" on | |
| something that would cause dev friction in the beginning) | |
| In WGSL we tried (for a while?) to build language features as "sugar" | |
| when we could. You don't have to guess what order or scope a `for` loop | |
| uses when we just spec how it desugars into a simpler, more explicit | |
| (but more verbose) core form/dialect of the language. | |
| That said, this powerpoint-driven-development flex knocks this back a | |
| whole seriousness and earnestness tier and a half: | |
| > My prototype API fits in one screen: 150 lines of code. The blog post | |
| is titled "No Graphics API". That's obviously an impossible goal | |
| today, but we got close enough. WebGPU has a smaller feature set and | |
| features a ~2700 line API (Emscripten C header). | |
| Try to zoom out on the API and fit those *160* lines on one screen! My | |
| browser gives up at 30%, and I am still only seeing 127. This is just | |
| dishonesty, and we do not need more of this kind of puffery in the | |
| world. | |
| And yeah, it's shorter because it is a toy PoC, even if one I enjoyed | |
| seeing someone else's take on it. Among other things, the author pretty | |
| dishonestly elides the number of lines the enums would take up. (A | |
| texture/data format enum on one line? That's one whole additional | |
| Pinocchio right there!) | |
| I took WebGPU.webidl and did a quick pass through removing some of the | |
| biggest misses of this API (queries, timers, device loss, errors in | |
| general, shader introspection, feature detection) and some of the | |
| irrelevant parts (anything touching canvas, external textures), and | |
| immediately got it down to 241 declarations. | |
| This kind of dishonest puffery holds back an otherwise interesting | |
| article. | |
| m-schuetz wrote 1 day ago: | |
| Man, how I wish WebGPU didn't go all-in on legacy Vulkan API model, | |
| and instead find a leaner approach to do the same thing. Even Vulkan | |
| stopped doing pointless boilerplate like bindings and pipelines. | |
| Ditching vertex attrib bindings and going for programmable vertex | |
| fetching would have been nice. | |
| WebGPU could have also introduced Cuda's simple launch model for | |
| graphics APIs. Instead of all that insane binding boilerplate, just | |
| provide the bindings as launch args to the draw call like | |
| draw(numTriangles, args), with args being something like | |
| draw(numTriangles, {uniformBuffer, positions, uvs, samplers}), | |
| depending on whatever the shaders expect. | |
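| A hypothetical sketch of what that launch-style call could look like (all | |
| names below are invented for illustration, not from any existing API): | |
|     #include <stdint.h> | |
|  | |
|     /* Opaque GPU handles; purely illustrative. */ | |
|     typedef uint64_t GpuBuffer; | |
|     typedef uint64_t GpuSampler; | |
|  | |
|     /* Everything the shaders expect, passed per draw instead of being | |
|        baked into descriptor sets / bind groups ahead of time. */ | |
|     typedef struct { | |
|         GpuBuffer  uniforms; | |
|         GpuBuffer  positions; | |
|         GpuBuffer  uvs; | |
|         GpuSampler samplers[4]; | |
|     } DrawArgs; | |
|  | |
|     /* Stub standing in for the hypothetical API entry point. */ | |
|     void draw(uint32_t numTriangles, const DrawArgs *args) | |
|     { (void)numTriangles; (void)args; } | |
|  | |
|     void render_mesh(GpuBuffer ub, GpuBuffer pos, GpuBuffer uv, | |
|                      GpuSampler smp, uint32_t triangleCount) | |
|     { | |
|         draw(triangleCount, &(DrawArgs){ | |
|             .uniforms = ub, .positions = pos, .uvs = uv, .samplers = { smp }, | |
|         }); | |
|     } | |
| The shape of the struct would simply mirror what the shader declares, much | |
| like a CUDA kernel's parameter list. | |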
| pjmlp wrote 11 hours 49 min ago: | |
| My biggest issues with WebGPU are yet another shading language and, | |
| after 15 years, browser developers not caring one second about | |
| debugging tools. | |
| It is either pixel debugging, or trying to replicate in native code | |
| for proper tooling. | |
| m-schuetz wrote 11 hours 41 min ago: | |
| Ironically, WebGPU was way more powerful about 5 years ago before | |
| WGSL was made mandatory. Back then you could just use any SPIR-V | |
| with all sorts of extensions, including stuff like 64bit types | |
| and atomics. | |
| Then wgsl came and crippled WebGPU. | |
| CupricTea wrote 14 hours 1 min ago: | |
| >Man, how I wish WebGPU didn't go all-in on legacy Vulkan API model | |
| WebGPU doesn't talk to the GPU directly. It requires | |
| Vulkan/D3D/Metal underneath to actually implement itself. | |
| >Even Vulkan stopped doing pointless boilerplate like bindings and | |
| pipelines. | |
| Vulkan did no such thing. As of today (Vulkan 1.4) they added | |
| VK_KHR_dynamic_rendering to core and added the VK_EXT_shader_object | |
| extension, which are not required to be supported and must be | |
| queried for before using. The former gets rid of render pass | |
| objects and framebuffer objects in favor of vkCmdBeginRendering(), | |
| and WebGPU already abstracts those two away so you don't see or | |
| deal with them. The latter gets rid of monolithic pipeline objects. | |
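| For context, that call looks roughly like this (a minimal sketch; function | |
| and parameter names are mine, it assumes a command buffer and color image | |
| view already exist, and it omits layout transitions): | |
|     #include <vulkan/vulkan.h> | |
|  | |
|     void begin_frame(VkCommandBuffer cmd, VkImageView colorView, | |
|                      uint32_t width, uint32_t height) | |
|     { | |
|         VkRenderingAttachmentInfo color = { | |
|             .sType       = VK_STRUCTURE_TYPE_RENDERING_ATTACHMENT_INFO, | |
|             .imageView   = colorView, | |
|             .imageLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL, | |
|             .loadOp      = VK_ATTACHMENT_LOAD_OP_CLEAR, | |
|             .storeOp     = VK_ATTACHMENT_STORE_OP_STORE, | |
|         }; | |
|         VkRenderingInfo info = { | |
|             .sType                = VK_STRUCTURE_TYPE_RENDERING_INFO, | |
|             .renderArea           = { { 0, 0 }, { width, height } }, | |
|             .layerCount           = 1, | |
|             .colorAttachmentCount = 1, | |
|             .pColorAttachments    = &color, | |
|         }; | |
|         vkCmdBeginRendering(cmd, &info); /* no VkRenderPass, no VkFramebuffer */ | |
|         /* ... record draws ... */ | |
|         vkCmdEndRendering(cmd); | |
|     } | |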
| Many mobile GPUs still do not support VK_KHR_dynamic_rendering or | |
| VK_EXT_shader_object. Even my very own Samsung Galaxy S24 Ultra[1] | |
| doesn't support shaderObject. | |
| Vulkan did not get rid of pipeline objects, they added extensions | |
| for modern desktop GPUs that didn't need them. Even modern mobile | |
| GPUs still need them, and WebGPU isn't going to fragment their API | |
| to wall off mobile users. | |
| [1]: https://vulkan.gpuinfo.org/displayreport.php?id=44583 | |
| m-schuetz wrote 12 hours 0 min ago: | |
| > WebGPU doesn't talk to the GPU directly. It requires | |
| Vulkan/D3D/Metal underneath to actually implement itself. | |
| So does WebGL and it's doing perfectly fine without pipelines. | |
| They were never necessary. Since WebGL can do without pipelines, | |
| WebGPU can too. Backends can implement via pipelines, or they can | |
| go for the modern route and ignore them. | |
| They are an artificial problem that Vulkan created and WebGPU | |
| mistakenly adopted, and which are now being phased out. Some | |
| devices may refuse to implement pipeline-free drivers, which is | |
| okay. I will happily ignore them. Let's move on into the 21st | |
| century without that design mistake, and let legacy devices and | |
| companies that refuse to adapt die in dignity. But let's not let | |
| them hold back everyone else. | |
| p_l wrote 23 hours 12 min ago: | |
| My understanding is that pipelines in Vulkan still matter if you | |
| target certain GPUs though. | |
| m-schuetz wrote 23 hours 6 min ago: | |
| At some point, we need to let legacy hardware go. Also, WebGL did | |
| just fine without pipelines, despite being mapped to Vulkan and | |
| DirectX code under the hood. Meaning WebGPU could have also | |
| worked without pipelines just fine as well. The backends can then | |
| map to whatever they want, using modern code paths for modern | |
| GPUs. | |
| flohofwoe wrote 20 hours 42 min ago: | |
| > Also, WebGL did just fine without pipelines, despite being | |
| mapped to Vulkan and DirectX code under the hood. | |
| ...at the cost of creating PSOs at random times which is an | |
| expensive operation :/ | |
| m-schuetz wrote 20 hours 20 min ago: | |
| No longer an issue with dynamic rendering and shader objects. | |
| And never was an issue with OpenGL. Static pipelines are an | |
| artificial problem that Vulkan imposed for no good reason, | |
| and which they reverted in recent years. | |
| flohofwoe wrote 19 hours 47 min ago: | |
| Going entirely back to the granular GL-style state soup | |
| would have significant 'usability problems'. It's too easy | |
| to accidentially leak incorrect state from a previous draw | |
| call. | |
| IMHO a small number of immutable state objects is the best | |
| middle ground (similar to D3D11 or Metal, but reshuffled | |
| like described in Seb's post). | |
| m-schuetz wrote 19 hours 37 min ago: | |
| Not using static pipelines does not imply having to use a | |
| global state machine like OpenGL. You could also make an | |
| API that uses a struct for rasterizer configs and pass it | |
| as an argument to a multi draw call. I would have | |
| actually preferred that over all the individual setters | |
| in Vulkan's dynamic rendering approach. | |
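| In the same spirit as the launch-args sketch earlier in the thread, the | |
| fixed-function state could travel with the draw too (hypothetical names, | |
| not an existing API): | |
|     #include <stdbool.h> | |
|     #include <stdint.h> | |
|  | |
|     typedef enum { CULL_NONE, CULL_BACK, CULL_FRONT } CullMode; | |
|  | |
|     typedef struct { | |
|         CullMode cullMode; | |
|         bool     depthTest; | |
|         bool     depthWrite; | |
|         bool     alphaBlend; | |
|     } RasterState; | |
|  | |
|     /* Stub for the hypothetical multi-draw entry point. */ | |
|     void gpuMultiDraw(uint32_t drawCount, const RasterState *state) | |
|     { (void)drawCount; (void)state; } | |
|  | |
|     void draw_opaque_pass(uint32_t drawCount) | |
|     { | |
|         gpuMultiDraw(drawCount, &(RasterState){ | |
|             .cullMode = CULL_BACK, .depthTest = true, .depthWrite = true, | |
|         }); | |
|     } | |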
| p_l wrote 23 hours 3 min ago: | |
| Quoting things I only heard about, because I don't do enough | |
| development in this area, but I recall reading that it impacted | |
| performance on pretty much every mobile chip (discounting | |
| Apple's because there you go through a completely different API | |
| and they got to design the hw together with API). | |
| Among other things, that covers everything running on | |
| non-apple, non-nvidia ARM devices, including freshly bought. | |
| xyzsparetimexyz wrote 1 day ago: | |
| Who cares about dev friction in the beginning? That was a bad choice. | |
| vegabook wrote 1 day ago: | |
| ironically, explaining that "we need a simpler API" takes a dense | |
| 69-page technical missive that would make the Khronos Vulkan tutorial | |
| blush. | |
| Pannoniae wrote 1 day ago: | |
| It's actually not that low-level! It doesn't really get into hardware | |
| specifics that much (other than showing what's possible across | |
| different HW) or stuff like what's optimal where. | |
| And it's quite a bit simpler than what we have in the "modern" GPU | |
| APIs atm. | |
| mkoubaa wrote 1 day ago: | |
| I don't understand why you think this is ironic | |
| klaussilveira wrote 1 day ago: | |
| NVIDIA's NVRHI has been my favorite abstraction layer over the | |
| complexity that modern APIs bring. | |
| In particular, this fork: [1] which adds some niceties and quality of | |
| life improvements. | |
| [1]: https://github.com/RobertBeckebans/nvrhi | |
| greggman65 wrote 1 day ago: | |
| This seems tangentially related? | |
| [1]: https://github.com/google/toucan | |
| Bengalilol wrote 1 day ago: | |
| After reading this article, I feel like I've witnessed a historic | |
| moment. | |
| bogwog wrote 1 day ago: | |
| Most of it went over my head, but there's so much knowledge and | |
| expertise on display here that it makes me proud that this person | |
| I've never met is out there proving that software development isn't | |
| entirely full of clowns. | |
| ehaliewicz2 wrote 1 day ago: | |
| Seb is incredibly passionate about games and graphics programming. | |
| You can find old posts of his on various forums, talking about | |
| tricks for programming the PS2, PS3, Xbox 360, etc etc. He | |
| regularly posts demos he's working on, progress clips of various | |
| engines, etc, on twitter, after staying in the same area for 3 | |
| decades. | |
| I wish I still had this level of motivation :) | |
| ginko wrote 1 day ago: | |
| I mean sure, this should be nice and easy. | |
| But then game/engine devs want to use a vertex shader that produces a uv | |
| coordinate and a normal together with a pixel shader that only reads | |
| the uv coordinate (or neither for shadow mapping) and don't want to pay | |
| for the bandwidth of the unused vertex outputs (or the cost of | |
| calculating them). | |
| Or they want to be able to randomly enable any other pipeline stage | |
| like tessellation or geometry and the same shader should just work | |
| without any performance overhead. | |
| Pannoniae wrote 1 day ago: | |
| A preprocessor step mostly solves this one. No one said that the | |
| shader source has to go into the GPU API 1:1. | |
| Basically do what most engines do - have preprocessor constants and | |
| use different paths based on what attributes you need. | |
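| A minimal host-side sketch of that (illustrative helper, not from any | |
| particular engine): the same source is compiled several times with | |
| different #define prefixes, one per attribute combination actually used. | |
|     #include <stdio.h> | |
|  | |
|     /* Builds one shader variant by prepending #defines to common source. | |
|        The actual compile step (glslang, DXC, the driver) is out of scope. */ | |
|     static void build_variant(char *out, size_t outSize, | |
|                               const char *defines, const char *source) | |
|     { | |
|         snprintf(out, outSize, "#version 450\n%s%s", defines, source); | |
|     } | |
|  | |
|     int main(void) | |
|     { | |
|         const char *source = | |
|             "void main() {\n" | |
|             "#ifdef HAS_UV\n" | |
|             "    /* fetch and write uv output */\n" | |
|             "#endif\n" | |
|             "#ifdef HAS_NORMAL\n" | |
|             "    /* fetch and write normal output */\n" | |
|             "#endif\n" | |
|             "}\n"; | |
|         char depthOnly[4096], full[4096]; | |
|         build_variant(depthOnly, sizeof depthOnly, "", source); | |
|         build_variant(full, sizeof full, | |
|                       "#define HAS_UV 1\n#define HAS_NORMAL 1\n", source); | |
|         printf("%s---\n%s", depthOnly, full); | |
|         return 0; | |
|     } | |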
| I also don't see how separated pipeline stages are against this - you | |
| already have this functionality in existing APIs where you can swap | |
| different stages individually. Some changes might need a fixup from | |
| the driver side, but nothing which can't be added in this proposed | |
| API's `gpuSetPipeline` implementation... | |
| blakepelton wrote 1 day ago: | |
| Great post, it brings back a lot of memories. Two additional factors | |
| that designers of these APIs consider are: | |
| * GPU virtualization (e.g., the D3D residency APIs), to allow many | |
| applications to share GPU resources (e.g., HBM). | |
| * Undefined behavior: how easy is it for applications to accidentally | |
| or intentionally take a dependency on undefined behavior? This can | |
| make it harder to translate this new API to an even newer API in the | |
| future. | |
| aarroyoc wrote 1 day ago: | |
| Impressive post, so many details. I could only understand some parts of | |
| it, but I think this article will probably be a reference for future | |
| graphics APIs. | |
| I think it's fair to say that for most gamers, Vulkan/DX12 hasn't | |
| really been a net positive; the PSO problem affected many popular games, | |
| and while Vulkan has been trying to improve, WebGPU is tricky as it | |
| has its roots in the first versions of Vulkan. | |
| Perhaps it was a bad idea to go all in to a low level API that exposes | |
| many details when the hardware underneath is evolving so fast. Maybe | |
| CUDA, as the post says in some places, with its more generic computing | |
| support is the right way after all. | |
| apitman wrote 9 hours 16 min ago: | |
| The PSO problem is referring to this, right? | |
| [1]: https://therealmjp.github.io/posts/shader-permutations-part1... | |
| qiine wrote 17 hours 44 min ago: | |
| yeah.. let's make nvidia control more things.. | |
| m-schuetz wrote 14 hours 4 min ago: | |
| Problem is that NVIDIA literally makes the only sane | |
| graphics/compute APIs. And part of it is to make the API | |
| accessible, not needlessly overengineered. Either the other vendors | |
| start to step up their game, or they'll continue to lose. | |
| Archit3ch wrote 6 hours 40 min ago: | |
| > Problem is that NVIDIA literally makes the only sane | |
| graphics/compute APIs. | |
| Hot take, Metal is more sane than CUDA. | |
| m-schuetz wrote 2 hours 15 min ago: | |
| I'm having a hard time taking an API seriously that uses atomic | |
| types rather than atomic functions. But at least it seems to be | |
| better than Vulkan/OpenGL/DirectX. | |
| erwincoumans wrote 23 hours 58 min ago: | |
| Yes, an amazing and detailed post, enjoyed all of it. In AI, it is | |
| common to use jit compilers (pytorch, jax, warp, triton, taichi, ...) | |
| that compile to cuda (or rocm, cpu, tpu, ...). | |
| You could write renderers like that, rasterizers or raytracers. | |
| For example: [1] (A new simple raytracer that compiles to cuda, used | |
| for robotics reinforcement learning, renders at up to 1 million fps | |
| at low resolution, 64x64, with textures, shadows) | |
| [1]: https://github.com/StafaH/mujoco_warp/blob/render_context/mu... | |
| pjmlp wrote 1 day ago: | |
| I have followed Sebastian Aaltonen's work for quite a while now, so | |
| maybe I am a bit biased, this is however a great article. | |
| I also think that the way forward is to go back to software rendering, | |
| however this time around those algorithms and data structures are | |
| actually hardware accelerated as he points out. | |
| Note that this is an ongoing trend in the VFX industry already; about 5 | |
| years ago OTOY ported their OctaneRender into CUDA as the main | |
| rendering API. | |
| torginus wrote 10 hours 35 min ago: | |
| I really want to make a game using a software rasterizer sometime - | |
| just to prove it's possible. Back in the good ol' days, I had to get | |
| by on my dad's PC, which had no graphics acceleration, but a fairly | |
| substantial Pentium 3 processor. | |
| Games like the original Half-Life, Unreal Tournament 2004, etc. ran | |
| surprisingly well and at decent resolutions. | |
| With the power of modern hardware, I guess you could do a decent FPS | |
| in pure software with even naively written code, and not having to | |
| deal with the APIs, but having the absolute creative freedom to say | |
| 'this pixel is green' would be liberating. | |
| Fun fact: due to the divergent nature of the computation, many ray | |
| tracers targeting real-time performance were written on the CPU; even | |
| when GPUs were quite powerful, software raytracers were quite good, | |
| until the hardware APIs started popping up. | |
| darzu wrote 9 hours 43 min ago: | |
| You should! And you might enjoy this video about making a CPU | |
| rasterizer: [1] Note that when the parent comment says "software | |
| rendering" they're referring to software (compute shaders) on the | |
| GPU. | |
| [1]: https://www.youtube.com/watch?v=yyJ-hdISgnw | |
| Q6T46nT668w6i3m wrote 1 day ago: | |
| But they still rely on fixed functions for a handful of essential ops | |
| (e.g., intersection). | |
| gmueckl wrote 1 day ago: | |
| There are tons of places within the GPU where dedicated fixed | |
| function hardware provides massive speedups within the relevant | |
| pipelines (rasterization, raytracing). The different shader types are | |
| designed to fit inbetween those stages. Abandoning this hardware | |
| would lead to a massive performance regression. | |
| formerly_proven wrote 1 day ago: | |
| Just consider the sheer number of computations offloaded to TMUs. | |
| Shaders would already do nothing but interpolate texels if you | |
| removed them. | |
| efilife wrote 1 day ago: | |
| Offtop, but sorry, I can't resist. "Inbetween" is not a word. I | |
| started seeing many people having trouble with prepositions lately, | |
| for some unknown reason. | |
| > "Inbetween" is never written as one word. If you have seen it | |
| written in this way before, it is a simple typo or misspelling. You | |
| should not use it in this way because it is not grammatically | |
| correct as the noun phrase or the adjective form. | |
| [1]: https://grammarhow.com/in-between-in-between-or-inbetween/ | |
| cracki wrote 21 hours 8 min ago: | |
| Your entire post does not once mention the form you call correct. | |
| If you intend for people to click the link, then you might just | |
| as well delete all the prose before it. | |
| Antibabelic wrote 22 hours 10 min ago: | |
| "Offtop" is not a word. It's not in any English dictionary I | |
| could find and doesn't appear in any published literature. | |
| Matthew 7:3 "And why beholdest thou the mote that is in thy | |
| brother's eye, but considerest not the beam that is in thine own | |
| eye?" | |
| speed_spread wrote 17 hours 40 min ago: | |
| Language evolves in mysterious ways. FWIW I find offtop to have | |
| high cromulency. | |
| Joker_vD wrote 20 hours 14 min ago: | |
| Oh, it's a transliteration of Russian "оффтоп", which | |
| itself started as a borrowing of "off-topic" from English (but | |
| as a noun instead of an adjective/stative) and then went through some | |
| natural linguistic developments, namely loss of the hyphen and | |
| degemination, surface analysis of the trailing "-ic" as the Russian | |
| suffix "-ик" [0], and its subsequent removal to obtain the | |
| supposed "original, non-derived" form. | |
| [0] | |
| [1]: https://en.wiktionary.org/wiki/-%D0%B8%D0%BA#Russian | |
| fngjdflmdflg wrote 12 hours 35 min ago: | |
| >subsequent removal to obtain the supposed "original, | |
| non-derived" form | |
| Also called a "back-formation". FWIF I don't think the | |
| existence of corrupted words automatically justifies more | |
| corruptions nor does the fact that it is a corruption | |
| automatically invalidate it. When language among a group | |
| evolves, everyone speaking that language is affected, which | |
| is why written language reads pretty differently looking back | |
| every 50 years or so, in both formal and informal writing. | |
| Therefore language changes should have buy-in from all users. | |
| mikestorrent wrote 1 day ago: | |
| Surely you mean "I've started seeing..." rather than "I started | |
| seeing..."? | |
| dragonwriter wrote 13 hours 58 min ago: | |
| Either the present perfect that you suggest or the past perfect | |
| originally presented is correct, and the denotation is | |
| basically identical. The connotation is slightly different, as | |
| the past perfect puts more emphasis on the "started...lately" | |
| and the emergent nature of the phenomenon, and the present | |
| perfect on the ongoing state of what was started, but there's | |
| no giant difference. | |
| dist-epoch wrote 1 day ago: | |
| If enough people use it, it will become correct. This is how | |
| language evolves. BTW, there is no "official English language | |
| specification". | |
| And linguists think it would be a bad idea to have one: | |
| [1]: https://archive.nytimes.com/opinionator.blogs.nytimes.co... | |
| mrec wrote 1 day ago: | |
| Isn't this already happening to some degree? E.g. UE's Nanite uses a | |
| software rasterizer for small triangles, albeit running on the GPU | |
| via a compute shader. | |
| djmips wrote 1 day ago: | |
| Why do you say 'albeit'? I think it's established that 'software | |
| rendering' can mean running on the GPU. That's what Octane is | |
| doing with CUDA in the comment you are replying to. But good | |
| callout on Nanite. | |
| mrec wrote 1 day ago: | |
| No good reason, I'm just very very old. | |
| jsheard wrote 1 day ago: | |
| Things are kind of heading in two opposite directions at the | |
| moment. Early GPU rasterization was all done in fixed-function | |
| hardware, but then we got programmable shading, and then we started | |
| using compute shaders to feed the HW rasterizer, and then we | |
| started replacing the HW rasterizer itself with more compute (as in | |
| Nanite). The flexibility of doing whatever you want in software has | |
| gradually displaced the inflexible hardware units. | |
| Meanwhile GPU raytracing was a purely software affair until quite | |
| recently when fixed-function raytracing hardware arrived. It's fast | |
| but also opaque and inflexible, only exposed through high-level | |
| driver interfaces which hide most of the details, so you have to | |
| let Jensen take the wheel. There's nothing stopping someone from | |
| going back to software RT of course but the performance of hardware | |
| RT is hard to pass up for now, so that's mostly the way things are | |
| going even if it does have annoying limitations. | |
| opminion wrote 1 day ago: | |
| The article is missing this motivation paragraph, taken from the blog | |
| index: | |
| > Graphics APIs and shader languages have significantly increased in | |
| complexity over the past decade. It's time to start discussing how to | |
| strip down the abstractions to simplify development, improve | |
| performance, and prepare for future GPU workloads. | |
| stevage wrote 1 day ago: | |
| Thanks, I had trouble figuring out what the article was about, lost | |
| in all the "here's how I used AI and had the article screened by | |
| industry insiders". | |
| jama211 wrote 12 hours 30 min ago: | |
| You only read two paragraphs in then? | |
| yuriks wrote 1 day ago: | |
| I was lost when it suddenly jumped from a long retrospective on | |
| GPUs to abruptly talking about "my allocator API" on the next | |
| paragraph with no segue or justification. | |
| masspro wrote 1 day ago: | |
| I read that whole (single) paragraph as "I made really, really, | |
| really sure I didn't violate any NDAs by doing these things to | |
| confirm everything had a public source" | |
| beAbU wrote 21 hours 13 min ago: | |
| This is literally the second paragraph in the article. There is | |
| no need for interpretation here. | |
| Unless the link of the article has changed since your comment? | |
| doctorpangloss wrote 1 day ago: | |
| haha, instead of making them read an AI-coauthored blog post, which | |
| obviously, they didn't do, he could have asked them interesting | |
| questions like, "Do better graphics make better games?" or "If you | |
| could change anything about the platforms' technology, what would | |
| it be?" | |
| alberth wrote 1 day ago: | |
| Would this be analogous to NVMe? | |
| Meaning ... SSDs initially reused IDE/SATA interfaces, which had | |
| inherent bottlenecks because those standards were designed for | |
| spinning disks. | |
| To fully realize SSD performance, a new transport had to be built | |
| from the ground up, one that eliminated those legacy assumptions, | |
| constraints and complexities. | |
| rnewme wrote 1 day ago: | |
| ...and introduced new ones. | |
| MaximilianEmel wrote 1 day ago: | |
| I wonder if Valve might put out their own graphics API for SteamOS. | |
| m-schuetz wrote 1 day ago: | |
| Valve seems to be substantially responsible for the mess that is | |
| Vulkan. They were one of its pioneers from what I heard when chatting | |
| with Vulkan people. | |
| pjmlp wrote 1 day ago: | |
| Samsung and Google also have their share, see who does most of | |
| Vulkanised talks. | |
| jsheard wrote 1 day ago: | |
| There's plenty of blame to go around, but if any one faction is | |
| responsible for the Vulkan mess it's the mobile GPU vendors and | |
| Khronos' willingness to compromise for their sake at every turn. | |
| Huge amounts of API surface was dedicated to accommodating | |
| limitations that only existed on mobile architectures, and earlier | |
| versions of Vulkan insisted on doing things the mobile way even if | |
| you knew your software was only ever going to run on desktop. | |
| Thankfully later versions have added escape hatches which bypass | |
| much of that unnecessary bureaucracy, but it was grim for a while, | |
| and all that early API cruft is still there to confuse newcomers. | |
| reactordev wrote 1 day ago: | |
| I miss Mantle. It had its quirks but you felt as if you were literally | |
| programming hardware using a pretty straightforward API. The most fun | |
| I've had programming was for the Xbox 360. | |
| djmips wrote 1 day ago: | |
| You know what else is good like that? The Switch graphics API - | |
| designed by Nvidia and Nintendo. Easily the most straightforward of | |
| the console graphics APIs | |
| reactordev wrote 1 day ago: | |
| Yes but itâs so underpowered. I want RTX 5090 performance with 16 | |
| cores. | |
| yieldcrv wrote 1 day ago: | |
| what level of performance improvements would this represent? | |
| Ono-Sendai wrote 1 day ago: | |
| Relative to what? | |
| Relative to modern OpenGL with good driver support, not much | |
| probably. | |
| The big win is due to the simplified API, which is helpful for | |
| application developers and also driver writers. | |
| Pannoniae wrote 1 day ago: | |
| Most of it has been said by the other replies and they're really | |
| good, adding a few things onto it: | |
| - Would lead to reduced memory usage on the driver side due to | |
| eliminating all the statetracking for "legacy" APIs and all the | |
| PSO/shader duplication for the "modern" APIs (who doesn't like using | |
| less memory? won't show up on a microbenchmark but a reduced working | |
| set leads to globally increased performance in most cases, due to | |
| >cache hit%) | |
| - A much reduced cost per API operation. I don't just mean drawcalls | |
| but everything else too. And allowing more asynchrony without the | |
| "here's 5 types of fences and barriers" kind of mess. As the article | |
| says, you can choose between mostly implicit sync (OpenGL, | |
| DX11) and tracking all your resources yourself (Vulkan), then feeding | |
| all that data into the API which mostly ignores it. | |
| This one wouldn't really have an impact on speeding up existing | |
| applications but more like unlock new possibilities. For example | |
| massively improving scene variety with cheap drawcalls and doing more | |
| procedural objects/materials instead of the standard PBR pipeline. | |
| Yes, drawindirect and friends exist but they aren't exactly | |
| straightforward to use and require you to structure your problem in a | |
| specific way. | |
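| For anyone curious, the core of indirect drawing in Vulkan is small; the | |
| awkward part is restructuring your scene data so the GPU can fill the | |
| command structs. A sketch (helper name is mine; assumes the indirect buffer | |
| was created with VK_BUFFER_USAGE_INDIRECT_BUFFER_BIT and the index/vertex | |
| buffers are already bound): | |
|     #include <vulkan/vulkan.h> | |
|  | |
|     /* One VkDrawIndexedIndirectCommand per draw lives in a GPU buffer, | |
|        typically written by a culling compute shader: | |
|        { indexCount, instanceCount, firstIndex, vertexOffset, firstInstance } | |
|        A single call then issues all of them. */ | |
|     void submit_scene(VkCommandBuffer cmd, VkBuffer indirectBuf, | |
|                       uint32_t drawCount) | |
|     { | |
|         vkCmdDrawIndexedIndirect(cmd, indirectBuf, /*offset*/ 0, drawCount, | |
|                                  sizeof(VkDrawIndexedIndirectCommand)); | |
|     } | |
| With vkCmdDrawIndexedIndirectCount the draw count itself can also come | |
| from a GPU buffer, which is what makes fully GPU-driven culling possible. | |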
| modeless wrote 1 day ago: | |
| It would likely reduce or eliminate the "compiling shaders" step many | |
| games now have on first run after an update, and the stutters many | |
| games have as new objects or effects come on screen for the first | |
| time. | |
| m-schuetz wrote 1 day ago: | |
| Probably mostly about quality of life. Legacy graphics APIs like | |
| Vulkan have abysmal developer UX for no good reason. | |
| vblanco wrote 1 day ago: | |
| There is no implementation of it, but this is how I see it, at least | |
| comparing with how things work with fully extensioned Vulkan, which | |
| uses a few similar mechanics. | |
| Per-drawcall cost goes to nanosecond scale. Assuming you do drawcalls | |
| of course, this makes bindless and indirect rendering a bit easier so | |
| you could drop CPU cost to near-0 in a renderer. | |
| It would also highly mitigate shader compiler hitches due to having a | |
| split pipeline instead of a monolithic one. | |
| The simplification on barriers could improve performance a | |
| significant amount because currently, most engines that deal with | |
| Vulkan and DX12 need to keep track of individual texture layouts and | |
| transitions, and this completely removes such a thing. | |
| flohofwoe wrote 1 day ago: | |
| It's mostly not about performance, but about getting rid of legacy | |
| cruft that still exists in modern 3D APIs to support older GPU | |
| architectures. | |
| wbobeirne wrote 1 day ago: | |
| Getting rid of cruft isn't really a goal in and of itself, it's a | |
| goal in service of other goals. If it's not about performance, what | |
| else would be accomplished? | |
| tonis2 wrote 18 hours 59 min ago: | |
| Getting rid of cruft and simplifying GPU access makes it easier to | |
| develop software that uses GPUs, like AI, games, etc. | |
| Have you taken a look at the codebase of some game engines? It's a | |
| complete cluster fk, because some simple tasks just take 800 lines | |
| of code, and in the end the drivers don't even use the complexity | |
| graphics APIs force upon you. | |
| Improving this is not an accomplishment? | |
| flohofwoe wrote 1 day ago: | |
| A simplified API means higher programmer productivity, higher | |
| robustness, simplified debugging and testing, and also less | |
| internal complexity in the driver. All this together may also | |
| result in slightly higher performance, but it's not the main | |
| goal. You might gain a couple hundred microseconds per frame as a | |
| side effect of the simpler code, but if your use case already | |
| perfectly fits the 'modern subset' of Vulkan or D3D12, the | |
| performance gains will be deep in 'diminishing returns area' and | |
| hardly noticeable in the frame rate. It's mostly about secondary | |
| effects by making the programmer's life easier on both sides of | |
| the API. | |
| The cost/compromise is dropping support for outdated GPUs. | |
| ksec wrote 1 day ago: | |
| I wonder why M$ stopped putting out new DirectX versions. DirectX | |
| Ultimate or 12.1 or 12.2 is largely the same as DirectX 12. | |
| Or has the use of Middleware like Unreal Engine largely made them | |
| irrelevant? Or should EPIC put out a new Graphics API proposal? | |
| djmips wrote 1 day ago: | |
| The frontier of graphics APIs might be the consoles and they don't | |
| get a bump until the hardware gets a bump and the console hardware is | |
| a little bit behind. | |
| pjmlp wrote 1 day ago: | |
| That has always been the case, it is mostly FOSS circles that argue | |
| about APIs. | |
| Game developers create a RHI (rendering hardware interface) like | |
| discussed on the article, and go on with game development. | |
| Because the greatest innovation thus far has been ray tracing and | |
| mesh shaders, and still they are largely ignored, so why keep on | |
| pushing forward? | |
| djmips wrote 1 day ago: | |
| I disagree that ray tracing and mesh shaders are largely ignored - | |
| at least within AAA game engines they are leaned on quite a lot. | |
| Particularly ray tracing. | |
| pjmlp wrote 23 hours 40 min ago: | |
| Game engines aren't games, or sales. | |
| reactordev wrote 1 day ago: | |
| Both-ish. | |
| Yes, the centralization of engines to Unreal, Unity, etc makes it so | |
| there's less interest in pushing the boundaries; they are still | |
| pushed, just on the GPU side. | |
| From a CPU API perspective, it's very close to just plain old | |
| buffer mapping and go. We would need a hardware shift that would add | |
| something more to the pipeline than what we currently do, like when | |
| tessellation shaders came about from geometry shader practices. | |
| thescriptkiddie wrote 1 day ago: | |
| the article talks a lot about PSOs but never defines the term | |
| CrossVR wrote 1 day ago: | |
| PSOs are Pipeline State Objects, they encapsulate the entire state of | |
| the rendering pipeline. | |
| flohofwoe wrote 1 day ago: | |
| "Pipeline State Objects" (immutable state objects which define most | |
| of the rendering state needed for a draw/dispatch call). Tbf, it's a | |
| very common term in rendering since around 2015 when the modern 3D | |
| APIs showed up. | |
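| Concretely, in Vulkan a PSO is a VkPipeline baked from one big create-info | |
| that references all of that state at once. A sketch (helper name is mine; | |
| assumes the sub-state structs and shader stages were filled in elsewhere): | |
|     #include <vulkan/vulkan.h> | |
|  | |
|     /* Everything referenced here is fixed at creation time: change the | |
|        blend state or vertex layout and you need a different VkPipeline, | |
|        which is where the "pipeline explosion" comes from. */ | |
|     VkPipeline create_pso(VkDevice device, | |
|         uint32_t stageCount, const VkPipelineShaderStageCreateInfo *stages, | |
|         const VkPipelineVertexInputStateCreateInfo *vertexInput, | |
|         const VkPipelineInputAssemblyStateCreateInfo *inputAssembly, | |
|         const VkPipelineViewportStateCreateInfo *viewport, | |
|         const VkPipelineRasterizationStateCreateInfo *raster, | |
|         const VkPipelineMultisampleStateCreateInfo *multisample, | |
|         const VkPipelineDepthStencilStateCreateInfo *depthStencil, | |
|         const VkPipelineColorBlendStateCreateInfo *blend, | |
|         VkPipelineLayout layout, VkRenderPass renderPass) | |
|     { | |
|         VkGraphicsPipelineCreateInfo info = { | |
|             .sType               = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO, | |
|             .stageCount          = stageCount, | |
|             .pStages             = stages,        /* compiled shader stages */ | |
|             .pVertexInputState   = vertexInput,   /* vertex attribute layout */ | |
|             .pInputAssemblyState = inputAssembly, /* topology */ | |
|             .pViewportState      = viewport, | |
|             .pRasterizationState = raster,        /* cull, fill, bias */ | |
|             .pMultisampleState   = multisample, | |
|             .pDepthStencilState  = depthStencil, | |
|             .pColorBlendState    = blend, | |
|             .layout              = layout,        /* resource binding layout */ | |
|             .renderPass          = renderPass, | |
|         }; | |
|         VkPipeline pso = VK_NULL_HANDLE; | |
|         vkCreateGraphicsPipelines(device, VK_NULL_HANDLE, 1, &info, NULL, &pso); | |
|         return pso; | |
|     } | |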
| henning wrote 1 day ago: | |
| This looks very similar to the SDL3 GPU API and other RHI libraries | |
| that have been created at first glance. | |
| cyber_kinetist wrote 1 day ago: | |
| If you look at the details you can clearly see SDL3_GPU is wildly | |
| different from this proposal, such as: | |
| - It's not exposing raw GPU addresses, SDL3_GPU has buffer objects | |
| instead. Also you're much more limited with how you use buffers in | |
| SDL3 (ex. no coherent buffers, you're forced to use a transfer buffer | |
| if you want to do a CPU -> GPU upload; see the sketch after this list) | |
| - in SDL3_GPU synchronization is done automatically, without the user | |
| specifying barriers (helped by a technique called cycling: [1] ), | |
| - More modern features such as mesh shading are not exposed in | |
| SDL3_GPU, and keeps the traditional rendering pipeline as the main | |
| way to draw stuff. Also, bindless is a first class citizen in | |
| Aaltonen's proposal (and the main reason for the simplification of | |
| the API), while SDL3_GPU doesn't support it at all and instead opts | |
| for a traditional descriptor binding system. | |
| [1]: https://moonside.games/posts/sdl-gpu-concepts-cycling/ | |
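| The upload sketch referenced above, roughly as I understand the SDL3 GPU | |
| API (upload_buffer is my helper name and exact signatures should be treated | |
| as approximate; assumes the device and destination SDL_GPUBuffer exist): | |
|     #include <SDL3/SDL.h> | |
|     #include <string.h> | |
|  | |
|     /* CPU -> GPU upload goes through an explicit transfer buffer plus a | |
|        copy pass; synchronization is handled by SDL (cycling), not the user. */ | |
|     void upload_buffer(SDL_GPUDevice *dev, SDL_GPUBuffer *dst, | |
|                        const void *data, Uint32 size) | |
|     { | |
|         SDL_GPUTransferBufferCreateInfo tbci = { | |
|             .usage = SDL_GPU_TRANSFERBUFFERUSAGE_UPLOAD, | |
|             .size  = size, | |
|         }; | |
|         SDL_GPUTransferBuffer *tb = SDL_CreateGPUTransferBuffer(dev, &tbci); | |
|         void *mapped = SDL_MapGPUTransferBuffer(dev, tb, false); | |
|         memcpy(mapped, data, size); | |
|         SDL_UnmapGPUTransferBuffer(dev, tb); | |
|         SDL_GPUCommandBuffer *cmd = SDL_AcquireGPUCommandBuffer(dev); | |
|         SDL_GPUCopyPass *copy = SDL_BeginGPUCopyPass(cmd); | |
|         SDL_UploadToGPUBuffer(copy, | |
|             &(SDL_GPUTransferBufferLocation){ .transfer_buffer = tb, .offset = 0 }, | |
|             &(SDL_GPUBufferRegion){ .buffer = dst, .offset = 0, .size = size }, | |
|             false); | |
|         SDL_EndGPUCopyPass(copy); | |
|         SDL_SubmitGPUCommandBuffer(cmd); | |
|         SDL_ReleaseGPUTransferBuffer(dev, tb); | |
|     } | |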
| Scaevolus wrote 1 day ago: | |
| SDL3 is kind of the intersection of features found in DX12/Vulkan | |
| 1.0/Metal: if it's not easily supported in all of them, it's not in | |
| SDL3-- hence the lack of bindless support. That means you can run | |
| on nearly every device in the last 10-15 years. | |
| This "no api" proposal requires hardware from the last 5-10 years | |
| :) | |
| cyber_kinetist wrote 1 day ago: | |
| Yup you've actually pointed out the most important difference: | |
| SDL3 is designed to be compatible with the APIs and devices of | |
| the past (2010s), whereas this proposal is designed to be | |
| compatible with the newer 2020s batch of consumer devices. | |
| vblanco wrote 1 day ago: | |
| This is a fantastic article that demonstrates how many parts of | |
| Vulkan and DX12 are no longer needed. | |
| I hope the IHVs have a look at it, because current DX12 seems | |
| semi-abandoned: it doesn't support buffer pointers even though | |
| every GPU made in the last 10 (or more!) years can do pointers | |
| just fine. Meanwhile Vulkan doesn't do a 2.0 release that cleans | |
| things up, so it carries a lot of baggage and, especially, tons of | |
| drivers that don't implement the extensions that really improve | |
| things. | |
| If this API existed, you could emulate OpenGL on top of it faster | |
| than the current OpenGL-to-Vulkan layers, and something like SDL3 | |
| GPU would get a 3x/4x boost too. | |
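| For context, the buffer pointers in question already exist in | |
| Vulkan (VK_KHR_buffer_device_address, core in 1.2). A minimal | |
| sketch, assuming `buffer` was created with the | |
| SHADER_DEVICE_ADDRESS usage bit, its memory allocated with the | |
| DEVICE_ADDRESS flag, and `cmd`/`pipelineLayout` already exist: | |
| VkBufferDeviceAddressInfo info = {}; | |
| info.sType = VK_STRUCTURE_TYPE_BUFFER_DEVICE_ADDRESS_INFO; | |
| info.buffer = buffer; | |
| VkDeviceAddress addr = vkGetBufferDeviceAddress(device, &info); | |
| // addr is a raw 64-bit GPU pointer; hand it to the shader, e.g. | |
| // via a push constant, and dereference it there with | |
| // GL_EXT_buffer_reference. | |
| vkCmdPushConstants(cmd, pipelineLayout, VK_SHADER_STAGE_COMPUTE_BIT, | |
|                    0, sizeof(addr), &addr); | |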
| torginus wrote 10 hours 42 min ago: | |
| It's weird how the 'next-gen' APIs have turned out to be failures | |
| in many ways, imo. I think a sizeable share of graphics devs are | |
| still stuck on the old way of doing things. I know a couple of | |
| graphics wizards (who work on major AAA titles) who never liked | |
| Vulkan/DX12, and many engines haven't really been rebuilt to | |
| accommodate the 'new' way of doing graphics. | |
| Ironically, a lot of the time these new APIs end up being slower | |
| in practice (something confirmed by gaming benchmarks), probably | |
| exactly because of the issues outlined in the article: having | |
| precompiled 'pipeline states' instead of the good ol' state | |
| machine has forced devs to precompile a truly staggering number of | |
| states, and even then compilation can still happen at runtime, | |
| leading to the well-known stutters. | |
| The other issue is synchronization: as the article mentions, | |
| Vulkan synchronization is unnecessarily heavy, and devs aren't | |
| really experts in it or don't have the time to figure out which | |
| kind of barrier to use when, so they adopt a 'better safe than | |
| sorry' approach, leading to unnecessary flushes and pipeline | |
| stalls that can tank performance in real-life workloads. | |
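| To make that concrete, a rough Vulkan (synchronization2) sketch of | |
| the "safe" barrier people reach for versus a precise one; the | |
| stages are just an illustrative compute-writes-then-vertex-reads | |
| case: | |
| VkMemoryBarrier2 lazy = {}; | |
| lazy.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER_2; | |
| lazy.srcStageMask = VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT; // all | |
| lazy.srcAccessMask = VK_ACCESS_2_MEMORY_WRITE_BIT; | |
| lazy.dstStageMask = VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT; // all | |
| lazy.dstAccessMask = VK_ACCESS_2_MEMORY_READ_BIT; | |
| // The precise version only orders the actual producer/consumer: | |
| VkMemoryBarrier2 precise = lazy; | |
| precise.srcStageMask = VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT; | |
| precise.srcAccessMask = VK_ACCESS_2_SHADER_STORAGE_WRITE_BIT; | |
| precise.dstStageMask = VK_PIPELINE_STAGE_2_VERTEX_SHADER_BIT; | |
| precise.dstAccessMask = VK_ACCESS_2_SHADER_STORAGE_READ_BIT; | |
| VkDependencyInfo dep = {}; | |
| dep.sType = VK_STRUCTURE_TYPE_DEPENDENCY_INFO; | |
| dep.memoryBarrierCount = 1; | |
| dep.pMemoryBarriers = &precise; // &lazy is what stalls the GPU | |
| vkCmdPipelineBarrier2(cmd, &dep); | |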
| Combined with the API complexity this is a huge issue, leading | |
| many devs to use wrappers like the aforementioned SDL3, which is | |
| very conservative when it comes to synchronization. | |
| Old APIs with smart drivers could either figure this out better, | |
| or GPU driver devs would look at the workloads and patch up | |
| rendering manually for popular titles. | |
| Additionally, by the early-to-mid 2010s, when these new APIs | |
| started getting released, a lot of crafty devs, armed with new | |
| shader models and OpenGL extensions, had made it possible to | |
| render tens of thousands of varied and interesting objects, | |
| essentially a whole scene's worth, in a single draw call. The most | |
| sophisticated and complex of these approaches was AZDO, which I'm | |
| not sure ever actually made it into a released game, but even with | |
| much less sophisticated approaches (combined with ideas like PBR | |
| materials and deferred rendering) you could pretty much draw | |
| anything. | |
| This meant much of the perf bottleneck of the old APIs disappeared. | |
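| For the curious, the AZDO-style "whole scene in one call" idea | |
| boils down to multi-draw-indirect. A sketch against OpenGL 4.3+, | |
| where buildSceneDraws() is a hypothetical helper that fills one | |
| record per object (VAO and index buffer assumed bound): | |
| // C++, assumes a GL 4.3+ context and loader; #include <vector> | |
| struct DrawElementsIndirectCommand { | |
|     GLuint count, instanceCount, firstIndex; | |
|     GLint  baseVertex; | |
|     GLuint baseInstance; | |
| }; | |
| std::vector<DrawElementsIndirectCommand> cmds = buildSceneDraws(); | |
| GLuint indirect = 0; | |
| glGenBuffers(1, &indirect); | |
| glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirect); | |
| glBufferData(GL_DRAW_INDIRECT_BUFFER, | |
|              cmds.size() * sizeof(cmds[0]), cmds.data(), | |
|              GL_DYNAMIC_DRAW); | |
| // One call draws every object; per-object data is fetched in the | |
| // shader by indexing big SSBOs with gl_DrawID / baseInstance. | |
| glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, nullptr, | |
|                             (GLsizei)cmds.size(), 0); | |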
| eek2121 wrote 6 hours 8 min ago: | |
| I think the big issue is that there is no 'next-gen API'. Microsoft | |
| has largely abandoned DirectX, Vulkan is restrictive as anything, | |
| Metal isn't changing much beyond matching DX/Vk, and | |
| NVIDIA/AMD/Apple/Qualcomm aren't interested in (re)-inventing the | |
| wheel. | |
| There are some interesting GPU improvements coming down the | |
| pipeline, like a possible out-of-order part from AMD (if certain | |
| credible leaks are valid); however, there are crickets from | |
| Microsoft, and NVIDIA just wants vendor lock-in. | |
| Yes, we need a vastly simpler API. I'd argue even simpler than the | |
| one proposed. | |
| One of my biggest hopes for RT is that it will standardize like 80% | |
| of stuff to the point where it can be abstracted to libraries. It | |
| probably won't happen, but one can wish... | |
| exDM69 wrote 18 hours 40 min ago: | |
| > tons of drivers that don't implement the extensions that really | |
| improve things. | |
| This isn't really the case, at least on the desktop side. | |
| All three desktop GPU vendors support Vulkan 1.4 (or most of its | |
| features via extensions) on all major platforms, even on really | |
| old hardware (e.g. Intel Skylake is 10+ years old and has all the | |
| latest Vulkan features). Even Apple + MoltenVK is pretty good. | |
| Even the mobile GPU vendors have pretty good support in their | |
| latest drivers. | |
| The biggest issue is that Android consumer devices don't get GPU | |
| driver updates, so those drivers never reach the general public. | |
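| As a concrete check, this is roughly how you query what a driver | |
| actually exposes (standard Vulkan calls; `physicalDevice` is | |
| assumed to have been picked already): | |
| VkPhysicalDeviceProperties props = {}; | |
| vkGetPhysicalDeviceProperties(physicalDevice, &props); | |
| uint32_t major = VK_API_VERSION_MAJOR(props.apiVersion); | |
| uint32_t minor = VK_API_VERSION_MINOR(props.apiVersion); // 1.x | |
| // Individual features still need a query, e.g. dynamic rendering: | |
| VkPhysicalDeviceVulkan13Features f13 = {}; | |
| f13.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_VULKAN_1_3_FEATURES; | |
| VkPhysicalDeviceFeatures2 f2 = {}; | |
| f2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2; | |
| f2.pNext = &f13; | |
| vkGetPhysicalDeviceFeatures2(physicalDevice, &f2); | |
| bool hasDynamicRendering = (f13.dynamicRendering == VK_TRUE); | |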
| pjmlp wrote 17 hours 23 min ago: | |
| Neither do laptops, where not using the OEM driver with whatever | |
| custom code they added can lead to interesting experiences, like | |
| power configuration going bad, being unable to handle mixed GPU | |
| setups, and so on. | |
| PeterStuer wrote 22 hours 19 min ago: | |
| Still have some 1080s in gaming machines going strong. But as even | |
| NVIDIA has retired support, I guess it is time to move on. | |
| kllrnohj wrote 1 day ago: | |
| "No longer needed" is a strong statement given how recent the | |
| required GPU support is. It's unlikely anything could accept those | |
| minimum requirements today. | |
| But soon? Hopefully. | |
| jsheard wrote 1 day ago: | |
| Those requirements more or less line up with the introduction of | |
| hardware raytracing, and some major titles are already treating | |
| that as a hard requirement, like the recent Doom and Indiana Jones | |
| games. | |
| tjpnz wrote 1 day ago: | |
| Doom was able to drop it and is now Steam Deck verified. | |
| nicolaslem wrote 21 hours 21 min ago: | |
| Little-known fact: the Steam Deck has hardware ray tracing; it's | |
| just so weak as to be almost non-existent. | |
| kllrnohj wrote 1 day ago: | |
| Only if you're ignoring mobile entirely. One of the things Vulkan | |
| did which would be a shame to lose is it unified desktop and | |
| mobile GPU APIs. | |
| eek2121 wrote 6 hours 3 min ago: | |
| Mobile is getting RT, fyi. Apple already has it (for a few | |
| generations, at least), and I think Qualcomm does as well (I'm | |
| less familiar with their stuff because they've been behind the | |
| game forever, but from what I've last read their latest parts have | |
| it), and things are rapidly improving. | |
| Vulkan is the actual barrier. On Windows, DirectX does an average | |
| job of supporting it. Microsoft doesn't really innovate these | |
| days, so NVIDIA largely drives the market, and sometimes AMD | |
| pitches in. | |
| m-schuetz wrote 9 hours 53 min ago: | |
| On the contrary, I would say this is the main thing Vulkan got | |
| wrong and the main reason why the API is so bad. Desktop and | |
| mobile are way too different for a uniform rendering API. They | |
| should be two different flavours with a common denominator. | |
| OpenGL and OpenGL ES were much better in that regard. | |
| pjmlp wrote 17 hours 22 min ago: | |
| It is not unified when the first thing an application has to do is | |
| find out whether its particular set of extension spaghetti is | |
| available on the device. | |
| flohofwoe wrote 20 hours 55 min ago: | |
| > One of the things Vulkan did which would be a shame to lose | |
| is it unified desktop and mobile GPU APIs. | |
| In hindsight it really would have been better to have a | |
| separate VulkanES which is specialized for mobile GPUs. | |
| pjmlp wrote 11 hours 57 min ago: | |
| Apparently on many Android devices outside the Samsung and Google | |
| brands it is still better to target OpenGL ES than Vulkan, due to | |
| driver quality. | |
| jsheard wrote 1 day ago: | |
| Eh, I think the jury is still out on whether unifying desktop | |
| and mobile graphics APIs is really worth it. In practice Vulkan | |
| written to take full advantage of desktop GPUs is wildly | |
| incompatible with most mobile GPUs, so there's fragmentation | |
| between them regardless. | |
| eek2121 wrote 5 hours 59 min ago: | |
| I definitely disagree here. What matters for mobile is power | |
| consumption. Capabilities can be implemented pretty easily... if | |
| you disagree, ask Apple. They have seemingly nailed it (with a few | |
| unrelated limitations). | |
| Mobile vendors insisting on closed, proprietary drivers that they | |
| don't keep updated is the actual issue. If you have a GPU capable | |
| of cutting-edge graphics, you have to have a top-notch driver | |
| stack. Nobody gets this right except AMD and NVIDIA (and both have | |
| their flaws). Apple doesn't even come close, yet they are ahead of | |
| everyone else except AMD/NVIDIA. AMD seems to do it best, NVIDIA | |
| is a distant second, Apple third, and everyone else tenth. | |
| 01HNNWZ0MV43FF wrote 1 day ago: | |
| If the APIs aren't unified, the engines will be, since VR | |
| games will want to work on both standalone headsets and | |
| streaming headsets | |
| ablob wrote 1 day ago: | |
| I feel like it's a win by default. | |
| I do like to write my own programs every now and then and | |
| recently there's been more and more graphics sprinkled into | |
| them. | |
| Being able to reuse those components and just render onto a | |
| target without changing anything else seems to be very useful | |
| here. | |
| This kind of seamless interoperability between platforms is | |
| very desirable in my book. | |
| I can't think of a better approach to achieve this than the | |
| graphics API itself. | |
| Also, nothing inherently blocks extensions by default. | |
| I feel like a reasonable core that can optionally do more, similar | |
| to CPU extensions (e.g. vector extensions), could be the way to go | |
| here. | |
| kllrnohj wrote 1 day ago: | |
| It's quite useful for things like skia or piet-gpu/vello or | |
| the general category of "things that use the GPU that aren't | |
| games" (image/video editors, effects pipelines, compute, etc | |
| etc etc) | |
| Groxx wrote 1 day ago: | |
| would it also apply to stuff like the Switch, and | |
| relatively high-end "mobile" gaming in general? (I'm not | |
| sure what those chips actually look like tho) | |
| there are also some ARM laptops that just run Qualcomm chips, the | |
| same as some phones (tablets with a keyboard, basically, but a bit | |
| more "PC"-like due to running Windows). | |
| AFAICT the fusion seems likely to be an accurate prediction. | |
| deliciousturkey wrote 20 hours 38 min ago: | |
| Switch has its own API. The GPU also doesn't have | |
| limitations you'd associate with "mobile". In terms of | |
| architecture, it's a full desktop GPU with desktop-class | |
| features. | |
| kllrnohj wrote 17 hours 19 min ago: | |
| Well, it's a desktop GPU with desktop-class features from 2014, | |
| which makes it quite outdated relative to current mobile GPUs. The | |
| just-released Switch 2 uses an Ampere-based GPU, which means it's | |
| desktop-class for 2020 (RTX 3xxx series). That's nothing to scoff | |
| at, but "desktop-class features" is a rapidly moving target, and | |
| the Switch ends up being a lot closer to mobile than to desktop | |
| since it always launches with GPUs that are ~2 generations old. | |
| pjmlp wrote 11 hours 55 min ago: | |
| It still beats the design of all Web 3D APIs, and has much better | |
| development tooling. Let that sink in as a measure of how far | |
| behind they are. | |
| jsheard wrote 1 day ago: | |
| I suppose that's true, yeah. I was focusing too much on | |
| games specifically. | |
| _bohm wrote 1 day ago: | |
| I'm surprised he made no mention of the SDL3 GPU API since his | |
| proposed API has pretty significant overlap with it. | |
| pjmlp wrote 1 day ago: | |
| DirectX documentation is in a bad state currently: you have Frank | |
| Luna's books, which don't cover the latest improvements, and after | |
| that it's hunting through Learn, GitHub samples and reference | |
| docs. | |
| Vulkan is another mess. Even if there were a 2.0, how are devs | |
| supposed to actually use it, especially on Android, the biggest | |
| consumer Vulkan platform? | |
| tadfisher wrote 1 day ago: | |
| Isn't this all because PCI resizable BAR is not required to run | |
| any GPU besides Intel Arc? As in, maybe it mostly comes down to | |
| Microsoft/Intel mandating ReBAR in UEFI so we can start using | |
| stuff like bindless textures without thousands of support tickets | |
| and negative reviews. | |
| I think this puts a floor on supported hardware though, like | |
| NVIDIA 30xx and Radeon 5xxx. And of course motherboard support was | |
| a crapshoot until 2020 or so. | |
| vblanco wrote 1 day ago: | |
| This is not really about resizable BAR; you could do mostly the | |
| same API without it. Resizable BAR simplifies things a little | |
| because you skip manual transfer operations, but it's not strictly | |
| required: you can write things to a CPU-writable buffer and then | |
| begin your frame with a transfer command (roughly as in the sketch | |
| below). | |
| Bindless textures never needed any kind of resizable BAR; you have | |
| been able to use them since the early 2010s in OpenGL through an | |
| extension. Buffer pointers have never needed it either. | |
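| A rough sketch of that non-ReBAR path in Vulkan, assuming | |
| `stagingBuffer`/`stagingMemory` are host-visible and coherent and | |
| already bound, `cpuData`/`size`/`deviceLocalBuffer` exist, and | |
| `cmd` is the frame's command buffer: | |
| // 1. CPU writes into the staging buffer. | |
| void* ptr = nullptr; | |
| vkMapMemory(device, stagingMemory, 0, size, 0, &ptr); | |
| memcpy(ptr, cpuData, size); | |
| vkUnmapMemory(device, stagingMemory); | |
| // 2. The frame begins with a copy into the device-local buffer. | |
| VkBufferCopy region = {}; | |
| region.size = size; | |
| vkCmdCopyBuffer(cmd, stagingBuffer, deviceLocalBuffer, 1, &region); | |
| // With resizable BAR the device-local buffer itself could be | |
| // mapped and written directly, skipping the copy. | |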