| COMMENT PAGE FOR: | |
| No Graphics API | |
| fngjdflmdflg wrote 12 hours 21 min ago: | |
| I think this almost has to be the future if most compute development | |
| goes to AI in the next decade or so, beyond the fact that the proposed | |
| API is much cleaner. Vendors will stop caring about maintaining complex | |
| fixed function hardware and drivers for increasingly complex graphics | |
| APIs when they can get 3x the return from AI without losing any | |
| potential sales, especially in the current day where compute seems to | |
| be more supply limited. Game engines can (and I assume already do) | |
| benefit from general purpose compute anyway for things like physics, | |
| and even for things where it wouldn't matter in itself for performance, | |
| or would be slower, doing more on the GPU can be faster if your data is | |
| already on the GPU, which becomes more true the more things are done on | |
| the GPU. And as the author says, it would be great to have an open | |
| source equivalent to CUDA's ecosystem that could be leveraged by games | |
| in a cross platform way. | |
| dundarious wrote 13 hours 38 min ago: | |
| I see this as an expression of the same underlying complaint as Casey | |
| Muratori's 30 Million Line Problem: [1] Casey argues for ISAs for | |
| hardware, including GPUs, instead of heavy drivers. TFA argues for a | |
| graphics API surface that is so lean precisely because it fundamentally | |
| boils down to a simple and small set of primitives (mapping memory, | |
| simple barriers, etc.) that are basically equivalent to a simple ISA. | |
| If a stable ISA was a requirement, I believe we would have converged on | |
| these simpler capabilities ahead of time, as a matter of necessity. | |
| However, I am not a graphics programmer, so I just offer this as an | |
| intellectual provocation to drive conversation. | |
| [1]: https://caseymuratori.com/blog_0031 | |
| newpavlov wrote 13 hours 15 min ago: | |
| I generally agree with this opinion and would love to see a proper | |
| well documented low-level API for working with GPU. But it would | |
| probably result in different "GPU ISAs" for different vendors and | |
| maybe even for different GPU generations from one vendor. The bloated | |
| firmware and drivers operating at a higher abstraction level make it | |
| possible to hide a lot of internal implementation details from end users. | |
| In such a world most software would still probably use something | |
| like Vulkan/DX/WebGPU to abstract over such ISAs, just as we use | |
| Java/JavaScript/Python today to "abstract" over CPU ISAs. And we would | |
| also likely end up with an NVIDIA monopoly similar to x86. | |
| loup-vaillant wrote 8 hours 24 min ago: | |
| There's a simple (but radical) solution that would force GPU | |
| vendors to settle on a common, stable ISA: forbid hardware vendors | |
| from distributing software. In practice, stop hardware at the border | |
| if it comes from vendors who still distribute software. | |
| That simple. Now to sell a GPU, the only way is to make an ISA so | |
| simple even third parties can make good drivers for it. And the | |
| first successful ISA will then force everyone else to implement the | |
| same ISA, so the same drivers will work for everyone. | |
| Oh, one other thing that has to go away: patents must no longer | |
| apply to ISAs. That way, anyone who wants to make and sell x86, | |
| ARM, or whatever GPU ISA that emerges, legally can. No more | |
| discussion about which instruction set is open or not, they all | |
| just are. | |
| Not that the US would ever want to submit Intel to such a brutal | |
| competition. | |
| dundarious wrote 11 hours 51 min ago: | |
| I wouldn't be so sure, as if we analogize to x86(_64), the ISA is | |
| stable and used by many vendors, but the underlying | |
| microarchitecture and caching model, etc., are free rein for | |
| impl-specific work. | |
| zbendefy wrote 13 hours 54 min ago: | |
| >The user writes the data to CPU mapped GPU memory first and then | |
| issues a copy command, which transforms the data to optimal compressed | |
| format. | |
| Wouldn't this mean double GPU memory usage for uploading a potentially | |
| large image? (Even if only until the copy is finished.) | |
| Vulkan lets the user copy from CPU (host_visible) memory to GPU | |
| (device_local) memory without an intermediate GPU buffer; afaik there | |
| is no double VRAM usage there, but I might be wrong on that. | |
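| For reference, the usual Vulkan staging path looks roughly like this (a | |
| minimal sketch; the function and parameter names are mine, and it assumes | |
| the host-visible staging buffer was already filled via vkMapMemory and the | |
| image was transitioned to TRANSFER_DST_OPTIMAL): | |
|     #include <vulkan/vulkan.h> | |
|  | |
|     /* The linear staging copy and the optimally-tiled image coexist only | |
|        until this copy completes and the staging buffer is freed, so the | |
|        "double memory" is transient. */ | |
|     void record_image_upload(VkCommandBuffer cmd, VkBuffer staging, | |
|                              VkImage image, uint32_t width, uint32_t height) | |
|     { | |
|         VkBufferImageCopy region = { | |
|             .bufferOffset      = 0, | |
|             .bufferRowLength   = 0, /* tightly packed */ | |
|             .bufferImageHeight = 0, | |
|             .imageSubresource  = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 0, 1 }, | |
|             .imageOffset       = { 0, 0, 0 }, | |
|             .imageExtent       = { width, height, 1 }, | |
|         }; | |
|         vkCmdCopyBufferToImage(cmd, staging, image, | |
|                                VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &region); | |
|     } | |
| Whether that transient duplicate counts against VRAM depends on which | |
| memory heap the staging buffer lives in. | |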
| Great article btw. I hope something comes out of this! | |
| bullen wrote 19 hours 29 min ago: | |
| Personally I'm staying with OpenGL (ES) 3 for eternity. | |
| VAO is the last feature I was missing prior. | |
| Also the other cores will do useful gameplay work so one CPU core for | |
| the GPU is ok. | |
| 4 CPU cores is also enough for eternity. 1GB shared RAM/VRAM too. | |
| Let's build something good on top of the hardware/OSes/APIs/languages | |
| we have now? 3588/linux/OpenGL/C+Java specifically! | |
| Hardware has permanently peaked in many ways, only soft internal | |
| protocols can now evolve, I write mine inside TCP/HTTP. | |
| theandrewbailey wrote 18 hours 7 min ago: | |
| > Also the other cores will do useful gameplay work so one CPU core | |
| for the GPU is ok. | |
| In the before times, upgrading the CPU meant everything ran faster. Who | |
| didn't like that? Today, we need code that infinitely scales CPU | |
| cores for that to remain true. 16 thread CPUs have been around for a | |
| long time; I'd like my software to make the most of them. | |
| When we have 480+Hz monitors, we will probably need more than 1 CPU | |
| core for GPU rendering to make the most of them. | |
| Uh oh | |
| [1]: https://www.amazon.com/ASUS-Swift-Gaming-Monitor-PG27AQDP/dp... | |
| bullen wrote 14 hours 14 min ago: | |
| I'm 60Hz for life. | |
| Maybe 120Hz if they come in 4:3/5:4 with matte low res panel. | |
| But that's enough for VR which needs 2x because two eyes. | |
| So progress ends there. | |
| 16 cores can't share memory well. | |
| Also 15W is peak because more is hard to passively cool in a small | |
| space. So 120Hz x 2 eyes at ~1080p is the limit of what we can do | |
| anyway... with $1/kWh! | |
| The limits are physical. | |
| imdsm wrote 19 hours 48 min ago: | |
| LLMs will eat this up | |
| SunlitCat wrote 1 day ago: | |
| This article already feels like it's on the right track. DirectX 11 | |
| was perfectly fine, and DirectX 12 is great if you really want total | |
| control over the hardware, but I even remember some IHV saying that | |
| this level of control isn't always a good thing. | |
| When you look at the DirectX 12 documentation and best-practice guides, | |
| you're constantly warned that certain techniques may perform well on | |
| one GPU but poorly on another, and vice versa. That alone shows how | |
| fragile this approach can be. | |
| Which makes sense: GPU hardware keeps evolving and has become | |
| incredibly complex. Maybe graphics APIs should actually move further up | |
| the abstraction ladder again, to a point where you mainly upload | |
| models, textures, and a high-level description of what the scene and | |
| objects are supposed to do and how they relate to each other. The | |
| hardware (and its driver) could then decide what's optimal and how to | |
| turn that into pixels on the screen. | |
| Yes, game engines and (to some extent) RHIs already do this, but having | |
| such an approach as a standardized, optional graphics API would be | |
| interesting. It would allow GPU vendors to adapt their drivers closely | |
| to their hardware, because they arguably know best what their hardware | |
| can do and how to do it efficiently. | |
| canyp wrote 1 day ago: | |
| > but I even remember some IHV saying that this level of control | |
| isn't always a good thing. | |
| Because that control is only as good as you can master it, and not | |
| all game developers do well on that front. Just check out enhanced | |
| barriers in DX12 and all of the rules around them as an example. You | |
| almost need to train as a lawyer to digest that clusterfuck. | |
| > The hardware (and its driver) could then decide what's optimal | |
| and how to turn that into pixels on the screen. | |
| We should go in the other direction: have a goddamn ISA you can | |
| target across architectures, like an x86 for GPUs (though ideally not | |
| that encumbered by licenses), and let people write code against it. | |
| Get rid of all the proprietary driver stack while you're at it. | |
| alaingalvan wrote 1 day ago: | |
| If you enjoyed the history-of-GPUs section, there's a great book that goes | |
| into more detail by Jon Peddie titled "The History of the GPU - Steps | |
| to Invention", definitely worth a read. | |
| delifue wrote 1 day ago: | |
| This reminds me of Makimoto's Wave: [1] There is a constant cycle | |
| between domain-specific hardware-hardcoded-algorithm design, and | |
| programmable flexible design. | |
| [1]: https://semiengineering.com/knowledge_centers/standards-laws/l... | |
| pavlov wrote 23 hours 57 min ago: | |
| It's also known as Sutherland's Wheel of Reincarnation: | |
| [1]: http://www.cap-lore.com/Hardware/Wheel.html | |
| awolven wrote 1 day ago: | |
| Is this going to materialize into a "thing"? | |
| qingcharles wrote 1 day ago: | |
| I started my career writing software 3D renderers before switching to | |
| Direct3D in the later 90s. What I wonder is if all of this is going to | |
| just get completely washed away and made totally redundant by the | |
| incoming flood of hallucinated game rendering? | |
| Will it be possible to hallucinate the frame of a game at a similar | |
| speed to rendering it with a mesh and textures? | |
| We're already seeing the hybrid version of this where you render a | |
| lower res mesh and hallucinate the upscaled, more detailed, more | |
| realistic looking skin over the top. | |
| I wouldn't want to be in the game engine business right now :/ | |
| cubefox wrote 6 hours 3 min ago: | |
| It is more likely that machine learning models will be used by the | |
| game artists for asset generation, but not for rendering those assets | |
| at the client side, which would be extremely expensive. | |
| But another upcoming use case of ML on the client side is neural | |
| texture compression, which somehow needs not just less storage but | |
| also less RAM. It comes at a computational (frame time) cost | |
| on the client side, though not as bad as generative AI. | |
| Neural mesh compression could be another potential thing we get in | |
| the future. (All lossy compression seems to go in the ML direction: | |
| currently there is a lot of work going on with next generation neural | |
| audio and video codecs. E.g. [1] ) | |
| [1]: https://arxiv.org/abs/2502.20762 | |
| webdevver wrote 17 hours 46 min ago: | |
| reminds me of this remark made by Carmack on hidden surface removal | |
| [1] > "research from the 70s especially, there was tons of work going | |
| on on hidden surface removal, these clever different algorithmic ways | |
| - today we just kill it with a depth buffer. We just throw megabytes | |
| and megabytes of memory and the problem gets solved much much | |
| easier." | |
| ofcourse "megabytes" of memory was unthinkiable in the 70s. but for | |
| us, its unthinkable to have real-time frame inferencing. I cant help | |
| but draw the parallels between our current-day "clever algorithmic | |
| ways" of drawing pixels to the screen. | |
| I definitely agree with the take that in the grand scheme of things, | |
| all this pixel rasterizing business will be a transient moment that | |
| will be washed away with a much simpler petaflop/exaflop local TPU | |
| that runs at 60W under load, and it simply 'dreams' frames and | |
| textures for you. | |
| [1]: https://www.youtube.com/watch?v=P6UKhR0T6cs&t=2315s | |
| qingcharles wrote 11 hours 24 min ago: | |
| Agree. If you look at the GPU in an iPhone 17 and compare to the | |
| desktop GPU I had in 1998, the difference is startling. | |
| Voodoo in 1998 could render about 3m poly/sec on a Utah teapot, | |
| which was an absurd number at the time; I was coming from | |
| software renderers that were considered amazing at 100K/sec. | |
| A19 Pro GPU could do about 5bn/sec at about 4X the resolution. And | |
| it fits in your pocket. And runs off a tiny battery. Which also | |
| powers the screen. | |
| 25 years from now a 5090 GPU will be laughably bad. I have no idea | |
| how fast we'll be able to hallucinate entire scenes, but my guess | |
| is that it'll be above 60fps. | |
| aj_hackman wrote 13 hours 35 min ago: | |
| What happens when you want to do something very new, or very | |
| specific? | |
| 8n4vidtmkvmk wrote 1 day ago: | |
| I just assumed hallucinated rendering was a stepping stone to | |
| training AGIs or something. No one is actually seriously trying to | |
| build games that way, are they? Seems horribly inefficient at best, | |
| and incoherent at worst. | |
| jsheard wrote 1 day ago: | |
| You can't really do a whole lot of inference in 16ms on consumer | |
| hardware. Not to say that inference isn't useful in realtime | |
| graphics, DLSS has proven itself well enough, but that's a very small | |
| model laser-targetted at one specific problem and even that takes a | |
| few milliseconds to do its thing. Fitting behemoth generative models | |
| into those time constraints seems like an uphill battle. | |
| overgard wrote 1 day ago: | |
| I'm kind of curious about something.. most of my graphics experience | |
| has been OpenGL or WebGL (tiny bit of Vulkan) or big engines like | |
| Unreal or Unity. I've noticed over the years the uptake of DX12 always | |
| seemed marginal though (a lot of things stayed on D3D11 for a really | |
| long time). Is Direct3D 12 super awful to work with or something? I | |
| know it requires more resource management than 11, but so does Vulkan | |
| which doesn't seem to have the same issue.. | |
| flohofwoe wrote 20 hours 46 min ago: | |
| > but so does Vulkan which doesn't seem to have the same issue | |
| Vulkan has the same issues (and more) as D3D12, you just don't hear | |
| much about it because there are hardly any games built directly on | |
| top of Vulkan. Vulkan is mainly useful as Proton backend on Linux. | |
| canyp wrote 1 day ago: | |
| Most AAA titles are on DX12 now. ID is on Vulkan. E-sports titles | |
| remain largely on the DX11 camp. | |
| What the modern APIs give you is less CPU driver overhead and new | |
| functionality like ray tracing. If you're not CPU-bound to begin with | |
| and don't need those new features, then there's not much of a reason | |
| to switch. The modern APIs require way more management than the prior | |
| ones; memory management, CPU-GPU synchronization, avoiding resource | |
| hazards, etc. | |
| Also, many of those AAA games are moving to UE5, which is | |
| basically DX12 under the hood (presumably it should have a Vulkan | |
| backend too, but I don't see it used much?) | |
| kasool wrote 23 hours 53 min ago: | |
| UE5 has a fairly mature Vulkan backend but as you might guess is | |
| second class to DX12. | |
| starkparker wrote 1 day ago: | |
| > GPU hardware started to shift towards a generic SIMD design. SIMD | |
| units were now executing all the different shader types: vertex, pixel, | |
| geometry, hull, domain and compute. Today the framework has 16 | |
| different shader entry points. This adds a lot of API surface and makes | |
| composition difficult. As a result GLSL and HLSL still don't have a | |
| flourishing library ecosystem ... despite 20 years of existence | |
| A lot of this post went over my head, but I've struggled enough with | |
| GLSL for this to be triggering. Learning gets brutal for the lack of | |
| middle ground between reinventing every shader every time and using an | |
| engine that abstracts shaders from the render pipeline. A lot of | |
| open-source projects that use shaders are either allergic to | |
| documenting them or are proud of how obtuse the code is. Shadertoy is | |
| about as good as it gets, and that's not a compliment. | |
| The only way I learned anything about shaders was from someone who | |
| already knew them well. They learned what they knew by spending a solid | |
| 7-8 years of their teenage/young adult years doing nearly nothing but | |
| GPU programming. There's probably something in between that doesn't | |
| involve giving up and using node-based tools, but in a couple decades | |
| of trying and failing to grasp it I've never found it. | |
| canyp wrote 1 day ago: | |
| This page is a good place to start for shader programming: [1] I | |
| agree on the other points. GPU graphics programming is hard in large | |
| part because of terrible or missing documentation. | |
| [1]: https://lettier.github.io/3d-game-shaders-for-beginners/inde... | |
| modeless wrote 1 day ago: | |
| I don't understand this part: | |
| > Meshlet has no clear 1:1 lane to vertex mapping, there's no | |
| straightforward way to run a partial mesh shader wave for selected | |
| triangles. This is the main reason mobile GPU vendors haven't been | |
| keen to adapt the desktop centric mesh shader API designed by Nvidia | |
| and AMD. Vertex shaders are still important for mobile. | |
| I get that there's no mapping from vertex/triangle to tile until after | |
| the mesh shader runs. But even with vertex shaders there's also no | |
| mapping from vertex/triangle to tile until after the vertex shader | |
| runs. The binning of triangles to tiles has to happen after the | |
| vertex/mesh shader stage. So I don't understand why mesh shaders would | |
| be worse for mobile TBDR. | |
| I guess this is suggesting that TBDR implementations split the vertex | |
| shader into two parts, one that runs before binning and only calculates | |
| positions, and one that runs after and computes everything else. I | |
| guess this could be done but it sounds crazy to me, probably | |
| duplicating most of the work. And if that's the case why isn't there an | |
| extension allowing applications to explicitly separate position and | |
| attribute calculations for better efficiency? (Maybe there is?) | |
| Edit: I found docs on Intel's site about this. I think I understand | |
| now. [1] Yes, you have to execute the vertex shader twice, which is | |
| extra work. But if your main constraint is memory bandwidth, not FLOPS, | |
| then I guess it can be better to throw away the entire output of the | |
| vertex shader except the position, rather than save all the output in | |
| memory and read it back later during rasterization. At rasterization | |
| time when the vertex shader is executed again, you only shade the | |
| triangles that actually went into your tile, and the vertex shader | |
| outputs stay in local cache and never hit main memory. And this doesn't | |
| work with mesh shaders because you can't pick a subset of the mesh's | |
| triangles to shade. | |
| It does seem like there ought to be an extension to add separate | |
| position-only and attribute-only vertex shaders. But it wouldn't help | |
| the mesh shader situation. | |
| [1]: https://www.intel.com/content/www/us/en/developer/articles/gui... | |
| yuriks wrote 1 day ago: | |
| I thought that the implication was that the shader compiler produces | |
| a second shader from the same source that went through a dead code | |
| elimination pass which maintains only the code necessary to calculate | |
| the position, ignoring other attributes. | |
| modeless wrote 1 day ago: | |
| Sure, but that only goes so far, especially when users aren't | |
| writing their shaders with knowledge that this transform is going | |
| to be applied or any tools to verify that it's able to eliminate | |
| anything. | |
| hrydgard wrote 19 hours 25 min ago: | |
| Well, it is what is done on several tiler architectures, and it | |
| generally works just fine. Normally your computations of the | |
| position aren't really intertwined with the computation of the | |
| other outputs, so dead code elimination does a good job. | |
| kasool wrote 23 hours 56 min ago: | |
| Why would it be difficult? There are explicit shader semantics to | |
| specify output position. | |
| In fact, Qualcomm's documentation spells this out: | |
| [1]: https://docs.qualcomm.com/nav/home/overview.html?product... | |
| xyzsparetimexyz wrote 1 day ago: | |
| This needs an index and introduction. It's also not super interesting | |
| to people in industry? Like yeah, it'd be nice if bindless textures | |
| were part of the API so you didn't need to create that global | |
| descriptor set. It'd be nice if you just sample from pointers to | |
| textures similar to how dereferencing buffer pointers works. | |
| wg0 wrote 1 day ago: | |
| Very well written but I can't understand much of this article. | |
| What would be one good primer to be able to comprehend all the design | |
| issues raised? | |
| jplusequalt wrote 10 hours 9 min ago: | |
| A working understanding of legacy graphics APIs, GPU hardware, and | |
| some knowledge of Vulkan/DirectX 12/CUDA. | |
| I have all of that but DX12 knowledge, and 50% of this article still | |
| went over my head. | |
| cmovq wrote 1 day ago: | |
| [1]: https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-... | |
| arduinomancer wrote 1 day ago: | |
| To be honest there isn't really one; a lot of these concepts are | |
| advanced even for graphics programmers | |
| adrian17 wrote 1 day ago: | |
| IMO the minimum is to be able to read a "hello world / first | |
| triangle" example for any of the modern graphics APIs (OpenGL/WebGL | |
| doesn't count, WebGPU does), and have a general understanding of | |
| each step performed (resource creation, pipeline setup, passing data | |
| to shaders, draws, synchronization). Also to understand where the | |
| pipeline explosion issue comes from. | |
| Bonus points if you then look at CUDA "hello world" and consider | |
| that it can do nontrivial work on the same hardware (sans fixed | |
| function accelerators) with much less boilerplate (and driver | |
| overhead). | |
| jdashg wrote 1 day ago: | |
| And the GPU API cycle of life and death continues! | |
| I was an only-half-joking champion of ditching vertex attrib bindings | |
| when we were drafting WebGPU and WGSL, because it's a really nice | |
| simplification, but it was felt that would be too much of a departure | |
| from existing APIs. (Spending too many of our "Innovation Tokens" on | |
| something that would cause dev friction in the beginning) | |
| In WGSL we tried (for a while?) to build language features as "sugar" | |
| when we could. You don't have to guess what order or scope a `for` loop | |
| uses when we just spec how it desugars into a simpler, more explicit | |
| (but more verbose) core form/dialect of the language. | |
| That said, this powerpoint-driven-development flex knocks this back a | |
| whole seriousness and earnestness tier and a half: | |
| > My prototype API fits in one screen: 150 lines of code. The blog post | |
| is titled "No Graphics API". That's obviously an impossible goal | |
| today, but we got close enough. WebGPU has a smaller feature set and | |
| features a ~2700 line API (Emscripten C header). | |
| Try to zoom out on the API and fit those *160* lines on one screen! My | |
| browser gives up at 30%, and I am still only seeing 127. This is just | |
| dishonesty, and we do not need more of this kind of puffery in the | |
| world. | |
| And yeah, it's shorter because it is a toy PoC, even if one I enjoyed | |
| seeing someone else's take on it. Among other things, the author pretty | |
| dishonestly elides the number of lines the enums would take up. (A | |
| texture/data format enum on one line? That's one whole additional | |
| Pinocchio right there!) | |
| I took WebGPU.webidl and did a quick pass through removing some of the | |
| biggest misses of this API (queries, timers, device loss, errors in | |
| general, shader introspection, feature detection) and some of the | |
| irrelevant parts (anything touching canvas, external textures), and | |
| immediately got it down to 241 declarations. | |
| This kind of dishonest puffery holds back an otherwise interesting | |
| article. | |
| m-schuetz wrote 1 day ago: | |
| Man, how I wish WebGPU didn't go all-in on legacy Vulkan API model, | |
| and instead find a leaner approach to do the same thing. Even Vulkan | |
| stopped doing pointless boilerplate like bindings and pipelines. | |
| Ditching vertex attrib bindings and going for programmable vertex | |
| fetching would have been nice. | |
| WebGPU could have also introduced Cuda's simple launch model for | |
| graphics APIs. Instead of all that insane binding boilerplate, just | |
| provide the bindings as launch args to the draw call like | |
| draw(numTriangles, args), with args being something like | |
| draw(numTriangles, {uniformBuffer, positions, uvs, samplers}), | |
| depending on whatever the shaders expect. | |
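| A hypothetical sketch of what that launch-style call could look like (all | |
| names below are invented for illustration, not from any existing API): | |
|     #include <stdint.h> | |
|  | |
|     /* Opaque GPU handles; purely illustrative. */ | |
|     typedef uint64_t GpuBuffer; | |
|     typedef uint64_t GpuSampler; | |
|  | |
|     /* Everything the shaders expect, passed per draw instead of being | |
|        baked into descriptor sets / bind groups ahead of time. */ | |
|     typedef struct { | |
|         GpuBuffer  uniforms; | |
|         GpuBuffer  positions; | |
|         GpuBuffer  uvs; | |
|         GpuSampler samplers[4]; | |
|     } DrawArgs; | |
|  | |
|     /* Stub standing in for the hypothetical API entry point. */ | |
|     void draw(uint32_t numTriangles, const DrawArgs *args) | |
|     { (void)numTriangles; (void)args; } | |
|  | |
|     void render_mesh(GpuBuffer ub, GpuBuffer pos, GpuBuffer uv, | |
|                      GpuSampler smp, uint32_t triangleCount) | |
|     { | |
|         draw(triangleCount, &(DrawArgs){ | |
|             .uniforms = ub, .positions = pos, .uvs = uv, .samplers = { smp }, | |
|         }); | |
|     } | |
| The shape of the struct would simply mirror what the shader declares, much | |
| like a CUDA kernel's parameter list. | |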
| pjmlp wrote 11 hours 49 min ago: | |
| My biggest issues with WebGPU are yet another shading language and, | |
| after 15 years, browser developers not caring one second about | |
| debugging tools. | |
| It is either pixel debugging, or trying to replicate in native code | |
| for proper tooling. | |
| m-schuetz wrote 11 hours 41 min ago: | |
| Ironically, WebGPU was way more powerful about 5 years ago before | |
| WGSL was made mandatory. Back then you could just use any SPIR-V | |
| with all sorts of extensions, including stuff like 64bit types | |
| and atomics. | |
| Then wgsl came and crippled WebGPU. | |
| CupricTea wrote 14 hours 1 min ago: | |
| >Man, how I wish WebGPU didn't go all-in on legacy Vulkan API model | |
| WebGPU doesn't talk to the GPU directly. It requires | |
| Vulkan/D3D/Metal underneath to actually implement itself. | |
| >Even Vulkan stopped doing pointless boilerplate like bindings and | |
| pipelines. | |
| Vulkan did no such thing. As of today (Vulkan 1.4) they added | |
| VK_KHR_dynamic_rendering to core and added the VK_EXT_shader_object | |
| extension, which are not required to be supported and must be | |
| queried for before using. The former gets rid of render pass | |
| objects and framebuffer objects in favor of vkCmdBeginRendering(), | |
| and WebGPU already abstracts those two away so you don't see or | |
| deal with them. The latter gets rid of monolithic pipeline objects. | |
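| For context, that call looks roughly like this (a minimal sketch; function | |
| and parameter names are mine, it assumes a command buffer and color image | |
| view already exist, and it omits layout transitions): | |
|     #include <vulkan/vulkan.h> | |
|  | |
|     void begin_frame(VkCommandBuffer cmd, VkImageView colorView, | |
|                      uint32_t width, uint32_t height) | |
|     { | |
|         VkRenderingAttachmentInfo color = { | |
|             .sType       = VK_STRUCTURE_TYPE_RENDERING_ATTACHMENT_INFO, | |
|             .imageView   = colorView, | |
|             .imageLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL, | |
|             .loadOp      = VK_ATTACHMENT_LOAD_OP_CLEAR, | |
|             .storeOp     = VK_ATTACHMENT_STORE_OP_STORE, | |
|         }; | |
|         VkRenderingInfo info = { | |
|             .sType                = VK_STRUCTURE_TYPE_RENDERING_INFO, | |
|             .renderArea           = { { 0, 0 }, { width, height } }, | |
|             .layerCount           = 1, | |
|             .colorAttachmentCount = 1, | |
|             .pColorAttachments    = &color, | |
|         }; | |
|         vkCmdBeginRendering(cmd, &info); /* no VkRenderPass, no VkFramebuffer */ | |
|         /* ... record draws ... */ | |
|         vkCmdEndRendering(cmd); | |
|     } | |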
| Many mobile GPUs still do not support VK_KHR_dynamic_rendering or | |
| VK_EXT_shader_object. Even my very own Samsung Galaxy S24 Ultra[1] | |
| doesn't support shaderObject. | |
| Vulkan did not get rid of pipeline objects, they added extensions | |
| for modern desktop GPUs that didn't need them. Even modern mobile | |
| GPUs still need them, and WebGPU isn't going to fragment their API | |
| to wall off mobile users. | |
| [1]: https://vulkan.gpuinfo.org/displayreport.php?id=44583 | |
| m-schuetz wrote 12 hours 0 min ago: | |
| > WebGPU doesn't talk to the GPU directly. It requires | |
| Vulkan/D3D/Metal underneath to actually implement itself. | |
| So does WebGL and it's doing perfectly fine without pipelines. | |
| They were never necessary. Since WebGL can do without pipelines, | |
| WebGPU can too. Backends can implement via pipelines, or they can | |
| go for the modern route and ignore them. | |
| They are an artificial problem that Vulkan created and WebGPU | |
| mistakenly adopted, and which are now being phased out. Some | |
| devices may refuse to implement pipeline-free drivers, which is | |
| okay. I will happily ignore them. Let's move on into the 21st | |
| century without that design mistake, and let legacy devices and | |
| companies that refuse to adapt die in dignity. But let's not let | |
| them hold back everyone else. | |
| p_l wrote 23 hours 12 min ago: | |
| My understanding is that pipelines in Vulkan still matter if you | |
| target certain GPUs though. | |
| m-schuetz wrote 23 hours 6 min ago: | |
| At some point, we need to let legacy hardware go. Also, WebGL did | |
| just fine without pipelines, despite being mapped to Vulkan and | |
| DirectX code under the hood. Meaning WebGPU could have also | |
| worked without pipelines just fine as well. The backends can then | |
| map to whatever they want, using modern code paths for modern | |
| GPUs. | |
| flohofwoe wrote 20 hours 42 min ago: | |
| > Also, WebGL did just fine without pipelines, despite being | |
| mapped to Vulkan and DirectX code under the hood. | |
| ...at the cost of creating PSOs at random times which is an | |
| expensive operation :/ | |
| m-schuetz wrote 20 hours 20 min ago: | |
| No longer an issue with dynamic rendering and shader objects. | |
| And never was an issue with OpenGL. Static pipelines are an | |
| artificial problem that Vulkan imposed for no good reason, | |
| and which they reverted in recent years. | |
| flohofwoe wrote 19 hours 47 min ago: | |
| Going entirely back to the granular GL-style state soup | |
| would have significant 'usability problems'. It's too easy | |
| to accidentially leak incorrect state from a previous draw | |
| call. | |
| IMHO a small number of immutable state objects is the best | |
| middle ground (similar to D3D11 or Metal, but reshuffled | |
| like described in Seb's post). | |
| m-schuetz wrote 19 hours 37 min ago: | |
| Not using static pipelines does not imply having to use a | |
| global state machine like OpenGL. You could also make an | |
| API that uses a struct for rasterizer configs and pass it | |
| as an argument to a multi draw call. I would have | |
| actually preferred that over all the individual setters | |
| in Vulkan's dynamic rendering approach. | |
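| In the same spirit as the launch-args sketch earlier in the thread, the | |
| fixed-function state could travel with the draw too (hypothetical names, | |
| not an existing API): | |
|     #include <stdbool.h> | |
|     #include <stdint.h> | |
|  | |
|     typedef enum { CULL_NONE, CULL_BACK, CULL_FRONT } CullMode; | |
|  | |
|     typedef struct { | |
|         CullMode cullMode; | |
|         bool     depthTest; | |
|         bool     depthWrite; | |
|         bool     alphaBlend; | |
|     } RasterState; | |
|  | |
|     /* Stub for the hypothetical multi-draw entry point. */ | |
|     void gpuMultiDraw(uint32_t drawCount, const RasterState *state) | |
|     { (void)drawCount; (void)state; } | |
|  | |
|     void draw_opaque_pass(uint32_t drawCount) | |
|     { | |
|         gpuMultiDraw(drawCount, &(RasterState){ | |
|             .cullMode = CULL_BACK, .depthTest = true, .depthWrite = true, | |
|         }); | |
|     } | |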
| p_l wrote 23 hours 3 min ago: | |
| Quoting things I only heard about, because I don't do enough | |
| development in this area, but I recall reading that it impacted | |
| performance on pretty much every mobile chip (discounting | |
| Apple's because there you go through a completely different API | |
| and they got to design the hw together with API). | |
| Among other things, that covers everything running on | |
| non-apple, non-nvidia ARM devices, including freshly bought. | |
| xyzsparetimexyz wrote 1 day ago: | |
| Who cares about dev friction in the beginning? That was a bad choice. | |
| vegabook wrote 1 day ago: | |
| ironically, explaining that "we need a simpler API" takes a dense | |
| 69-page technical missive that would make the Khronos Vulkan tutorial | |
| blush. | |
| Pannoniae wrote 1 day ago: | |
| It's actually not that low-level! It doesn't really get into hardware | |
| specifics that much (other than showing what's possible across | |
| different HW) or stuff like what's optimal where. | |
| And it's quite a bit simpler than what we have in the "modern" GPU | |
| APIs atm. | |
| mkoubaa wrote 1 day ago: | |
| I don't understand why you think this is ironic | |
| klaussilveira wrote 1 day ago: | |
| NVIDIA's NVRHI has been my favorite abstraction layer over the | |
| complexity that modern APIs bring. | |
| In particular, this fork: [1] which adds some niceties and quality of | |
| life improvements. | |
| [1]: https://github.com/RobertBeckebans/nvrhi | |
| greggman65 wrote 1 day ago: | |
| This seems tangentially related? | |
| [1]: https://github.com/google/toucan | |
| Bengalilol wrote 1 day ago: | |
| After reading this article, I feel like I've witnessed a historic | |
| moment. | |
| bogwog wrote 1 day ago: | |
| Most of it went over my head, but there's so much knowledge and | |
| expertise on display here that it makes me proud that this person | |
| I've never met is out there proving that software development isn't | |
| entirely full of clowns. | |
| ehaliewicz2 wrote 1 day ago: | |
| Seb is incredibly passionate about games and graphics programming. | |
| You can find old posts of his on various forums, talking about | |
| tricks for programming the PS2, PS3, Xbox 360, etc etc. He | |
| regularly posts demos he's working on, progress clips of various | |
| engines, etc, on twitter, after staying in the same area for 3 | |
| decades. | |
| I wish I still had this level of motivation :) | |
| ginko wrote 1 day ago: | |
| I mean sure, this should be nice and easy. | |
| But then game/engine devs want to use a vertex shader that produces a uv | |
| coordinate and a normal together with a pixel shader that only reads | |
| the uv coordinate (or neither for shadow mapping) and don't want to pay | |
| for the bandwidth of the unused vertex outputs (or the cost of | |
| calculating them). | |
| Or they want to be able to randomly enable any other pipeline stage | |
| like tessellation or geometry and the same shader should just work | |
| without any performance overhead. | |
| Pannoniae wrote 1 day ago: | |
| A preprocessor step mostly solves this one. No one said that the | |
| shader source has to go into the GPU API 1:1. | |
| Basically do what most engines do - have preprocessor constants and | |
| use different paths based on what attributes you need. | |
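| A minimal host-side sketch of that (illustrative helper, not from any | |
| particular engine): the same source is compiled several times with | |
| different #define prefixes, one per attribute combination actually used. | |
|     #include <stdio.h> | |
|  | |
|     /* Builds one shader variant by prepending #defines to common source. | |
|        The actual compile step (glslang, DXC, the driver) is out of scope. */ | |
|     static void build_variant(char *out, size_t outSize, | |
|                               const char *defines, const char *source) | |
|     { | |
|         snprintf(out, outSize, "#version 450\n%s%s", defines, source); | |
|     } | |
|  | |
|     int main(void) | |
|     { | |
|         const char *source = | |
|             "void main() {\n" | |
|             "#ifdef HAS_UV\n" | |
|             "    /* fetch and write uv output */\n" | |
|             "#endif\n" | |
|             "#ifdef HAS_NORMAL\n" | |
|             "    /* fetch and write normal output */\n" | |
|             "#endif\n" | |
|             "}\n"; | |
|         char depthOnly[4096], full[4096]; | |
|         build_variant(depthOnly, sizeof depthOnly, "", source); | |
|         build_variant(full, sizeof full, | |
|                       "#define HAS_UV 1\n#define HAS_NORMAL 1\n", source); | |
|         printf("%s---\n%s", depthOnly, full); | |
|         return 0; | |
|     } | |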
| I also don't see how separated pipeline stages are against this - you | |
| already have this functionality in existing APIs where you can swap | |
| different stages individually. Some changes might need a fixup from | |
| the driver side, but nothing which can't be added in this proposed | |
| API's `gpuSetPipeline` implementation... | |
| blakepelton wrote 1 day ago: | |
| Great post, it brings back a lot of memories. Two additional factors | |
| that designers of these APIs consider are: | |
| * GPU virtualization (e.g., the D3D residency APIs), to allow many | |
| applications to share GPU resources (e.g., HBM). | |
| * Undefined behavior: how easy is it for applications to accidentally | |
| or intentionally take a dependency on undefined behavior? This can | |
| make it harder to translate this new API to an even newer API in the | |
| future. | |
| aarroyoc wrote 1 day ago: | |
| Impressive post, so many details. I could only understand some parts of | |
| it, but I think this article will probably be a reference for future | |
| graphics APIs. | |
| I think it's fair to say that for most gamers, Vulkan/DX12 hasn't | |
| really been a net positive; the PSO problem affected many popular games, | |
| and while Vulkan has been trying to improve, WebGPU is tricky as it | |
| has its roots in the first versions of Vulkan. | |
| Perhaps it was a bad idea to go all in to a low level API that exposes | |
| many details when the hardware underneath is evolving so fast. Maybe | |
| CUDA, as the post says in some places, with its more generic computing | |
| support is the right way after all. | |
| apitman wrote 9 hours 16 min ago: | |
| The PSO problem is referring to this, right? | |
| [1]: https://therealmjp.github.io/posts/shader-permutations-part1... | |
| qiine wrote 17 hours 44 min ago: | |
| yeah.. let's make nvidia control more things.. | |
| m-schuetz wrote 14 hours 4 min ago: | |
| Problem is that NVIDIA literally makes the only sane | |
| graphics/compute APIs. And part of it is to make the API | |
| accessible, not needlessly overengineered. Either the other vendors | |
| start to step up their game, or they'll continue to lose. | |
| Archit3ch wrote 6 hours 40 min ago: | |
| > Problem is that NVIDIA literally makes the only sane | |
| graphics/compute APIs. | |
| Hot take, Metal is more sane than CUDA. | |
| m-schuetz wrote 2 hours 15 min ago: | |
| I'm having a hard time taking an API seriously that uses atomic | |
| types rather than atomic functions. But at least it seems to be | |
| better than Vulkan/OpenGL/DirectX. | |
| erwincoumans wrote 23 hours 58 min ago: | |
| Yes, an amazing and detailed post, enjoyed all of it. In AI, it is | |
| common to use jit compilers (pytorch, jax, warp, triton, taichi, ...) | |
| that compile to cuda (or rocm, cpu, tpu, ...). | |
| You could write renderers like that, rasterizers or raytracers. | |
| For example: [1] (A new simple raytracer that compiles to cuda, used | |
| for robotics reinforcement learning, renders at up to 1 million fps | |
| at low resolution, 64x64, with textures, shadows) | |
| [1]: https://github.com/StafaH/mujoco_warp/blob/render_context/mu... | |
| pjmlp wrote 1 day ago: | |
| I have followed Sebastian Aaltonen's work for quite a while now, so | |
| maybe I am a bit biased, this is however a great article. | |
| I also think that the way forward is to go back to software rendering, | |
| however this time around those algorithms and data structures are | |
| actually hardware accelerated as he points out. | |
| Note that this is an ongoing trend in the VFX industry already; about 5 | |
| years ago OTOY ported their OctaneRender into CUDA as the main | |
| rendering API. | |
| torginus wrote 10 hours 35 min ago: | |
| I really want to make a game using a software rasterizer sometime - | |
| just to prove it's possible. Back in the good ol' days, I had to get | |
| by on my dad's PC, which had no graphics acceleration, but a fairly | |
| substantial Pentium 3 processor. | |
| Games like the original Half-Life, Unreal Tournament 2004, etc. ran | |
| surprisingly well and at decent resolutions. | |
| With the power of modern hardware, I guess you could do a decent FPS | |
| in pure software with even naively written code, and not having to | |
| deal with the APIs, but having the absolute creative freedom to say | |
| 'this pixel is green' would be liberating. | |
| Fun fact: due to the divergent nature of the computation, many ray | |
| tracers targeting real-time performance were written on the CPU; even | |
| when GPUs were quite powerful, software raytracers were quite good, | |
| until the hardware APIs started popping up. | |
| darzu wrote 9 hours 43 min ago: | |
| You should! And you might enjoy this video about making a CPU | |
| rasterizer: [1] Note that when the parent comment says "software | |
| rendering" they're referring to software (compute shaders) on the | |
| GPU. | |
| [1]: https://www.youtube.com/watch?v=yyJ-hdISgnw | |
| Q6T46nT668w6i3m wrote 1 day ago: | |
| But they still rely on fixed functions for a handful of essential ops | |
| (e.g., intersection). | |
| gmueckl wrote 1 day ago: | |
| There are tons of places within the GPU where dedicated fixed | |
| function hardware provides massive speedups within the relevant | |
| pipelines (rasterization, raytracing). The different shader types are | |
| designed to fit inbetween those stages. Abandoning this hardware | |
| would lead to a massive performance regression. | |
| formerly_proven wrote 1 day ago: | |
| Just consider the sheer number of computations offloaded to TMUs. | |
| Shaders would already do nothing but interpolate texels if you | |
| removed them. | |
| efilife wrote 1 day ago: | |
| Offtop, but sorry, I can't resist. "Inbetween" is not a word. I | |
| started seeing many people having trouble with prepositions lately, | |
| for some unknown reason. | |
| > "Inbetween" is never written as one word. If you have seen it | |
| written in this way before, it is a simple typo or misspelling. You | |
| should not use it in this way because it is not grammatically | |
| correct as the noun phrase or the adjective form. | |
| [1]: https://grammarhow.com/in-between-in-between-or-inbetween/ | |
| cracki wrote 21 hours 8 min ago: | |
| Your entire post does not once mention the form you call correct. | |
| If you intend for people to click the link, then you might just | |
| as well delete all the prose before it. | |
| Antibabelic wrote 22 hours 10 min ago: | |
| "Offtop" is not a word. It's not in any English dictionary I | |
| could find and doesn't appear in any published literature. | |
| Matthew 7:3 "And why beholdest thou the mote that is in thy | |
| brother's eye, but considerest not the beam that is in thine own | |
| eye?" | |
| speed_spread wrote 17 hours 40 min ago: | |
| Language evolves in mysterious ways. FWIW I find offtop to have | |
| high cromulency. | |
| Joker_vD wrote 20 hours 14 min ago: | |
| Oh, it's a transliteration of Russian "оффтоп", which | |
| itself started as a borrowing of "off-topic" from English (but | |
| as a noun instead of an adjective/stative) and then went through some | |
| natural linguistic developments, namely loss of the hyphen and | |
| degemination, surface analysis of the trailing "-ic" as the Russian | |
| suffix "-ик" [0], and its subsequent removal to obtain the | |
| supposed "original, non-derived" form. | |
| [0] | |
| [1]: https://en.wiktionary.org/wiki/-%D0%B8%D0%BA#Russian | |
| fngjdflmdflg wrote 12 hours 35 min ago: | |
| >subsequent removal to obtain the supposed "original, | |
| non-derived" form | |
| Also called a "back-formation". FWIF I don't think the | |
| existence of corrupted words automatically justifies more | |
| corruptions nor does the fact that it is a corruption | |
| automatically invalidate it. When language among a group | |
| evolves, everyone speaking that language is affected, which | |
| is why written language reads pretty differently looking back | |
| every 50 years or so, in both formal and informal writing. | |
| Therefore language changes should have buy-in from all users. | |
| mikestorrent wrote 1 day ago: | |
| Surely you mean "I've started seeing..." rather than "I started | |
| seeing..."? | |
| dragonwriter wrote 13 hours 58 min ago: | |
| Either the present perfect that you suggest or the past perfect | |
| originally presented is correct, and the denotation is | |
| basically identical. The connotation is slightly different, as | |
| the past perfect puts more emphasis on the "started...lately" | |
| and the emergent nature of the phenomenon, and the present | |
| perfect on the ongoing state of what was started, but there's | |
| no giant difference. | |
| dist-epoch wrote 1 day ago: | |
| If enough people use it, it will become correct. This is how | |
| language evolves. BTW, there is no "official English language | |
| specification". | |
| And linguists think it would be a bad idea to have one: | |
| [1]: https://archive.nytimes.com/opinionator.blogs.nytimes.co... | |
| mrec wrote 1 day ago: | |
| Isn't this already happening to some degree? E.g. UE's Nanite uses a | |
| software rasterizer for small triangles, albeit running on the GPU | |
| via a compute shader. | |
| djmips wrote 1 day ago: | |
| Why do you say 'albeit'? I think it's established that 'software | |
| rendering' can mean running on the GPU. That's what Octane is | |
| doing with CUDA in the comment you are replying to. But good | |
| callout on Nanite. | |
| mrec wrote 1 day ago: | |
| No good reason, I'm just very very old. | |
| jsheard wrote 1 day ago: | |
| Things are kind of heading in two opposite directions at the | |
| moment. Early GPU rasterization was all done in fixed-function | |
| hardware, but then we got programmable shading, and then we started | |
| using compute shaders to feed the HW rasterizer, and then we | |
| started replacing the HW rasterizer itself with more compute (as in | |
| Nanite). The flexibility of doing whatever you want in software has | |
| gradually displaced the inflexible hardware units. | |
| Meanwhile GPU raytracing was a purely software affair until quite | |
| recently when fixed-function raytracing hardware arrived. It's fast | |
| but also opaque and inflexible, only exposed through high-level | |
| driver interfaces which hide most of the details, so you have to | |
| let Jensen take the wheel. There's nothing stopping someone from | |
| going back to software RT of course but the performance of hardware | |
| RT is hard to pass up for now, so that's mostly the way things are | |
| going even if it does have annoying limitations. | |
| opminion wrote 1 day ago: | |
| The article is missing this motivation paragraph, taken from the blog | |
| index: | |
| > Graphics APIs and shader languages have significantly increased in | |
| complexity over the past decade. It's time to start discussing how to | |
| strip down the abstractions to simplify development, improve | |
| performance, and prepare for future GPU workloads. | |
| stevage wrote 1 day ago: | |
| Thanks, I had trouble figuring out what the article was about, lost | |
| in all the "here's how I used AI and had the article screened by | |
| industry insiders". | |
| jama211 wrote 12 hours 30 min ago: | |
| You only read two paragraphs in then? | |
| yuriks wrote 1 day ago: | |
| I was lost when it suddenly jumped from a long retrospective on | |
| GPUs to abruptly talking about "my allocator API" on the next | |
| paragraph with no segue or justification. | |
| masspro wrote 1 day ago: | |
| I read that whole (single) paragraph as "I made really, really, | |
| really sure I didn't violate any NDAs by doing these things to | |
| confirm everything had a public source" | |
| beAbU wrote 21 hours 13 min ago: | |
| This is literally the second paragraph in the article. There is | |
| no need for interpretation here. | |
| Unless the link of the article has changed since your comment? | |
| doctorpangloss wrote 1 day ago: | |
| haha, instead of making them read an AI-coauthored blog post, which | |
| obviously, they didn't do, he could have asked them interesting | |
| questions like, "Do better graphics make better games?" or "If you | |
| could change anything about the platforms' technology, what would | |
| it be?" | |
| alberth wrote 1 day ago: | |
| Would this be analogous to NVMe? | |
| Meaning ... SSDs initially reused IDE/SATA interfaces, which had | |
| inherent bottlenecks because those standards were designed for | |
| spinning disks. | |
| To fully realize SSD performance, a new transport had to be built | |
| from the ground up, one that eliminated those legacy assumptions, | |
| constraints and complexities. | |
| rnewme wrote 1 day ago: | |
| ...and introduced new ones. | |
| MaximilianEmel wrote 1 day ago: | |
| I wonder if Valve might put out their own graphics API for SteamOS. | |
| m-schuetz wrote 1 day ago: | |
| Valve seems to be substantially responsible for the mess that is | |
| Vulkan. They were one of its pioneers from what I heard when chatting | |
| with Vulkan people. | |
| pjmlp wrote 1 day ago: | |
| Samsung and Google also have their share, see who does most of | |
| Vulkanised talks. | |
| jsheard wrote 1 day ago: | |
| There's plenty of blame to go around, but if any one faction is | |
| responsible for the Vulkan mess it's the mobile GPU vendors and | |
| Khronos' willingness to compromise for their sake at every turn. | |
| Huge amounts of API surface was dedicated to accommodating | |
| limitations that only existed on mobile architectures, and earlier | |
| versions of Vulkan insisted on doing things the mobile way even if | |
| you knew your software was only ever going to run on desktop. | |
| Thankfully later versions have added escape hatches which bypass | |
| much of that unnecessary bureaucracy, but it was grim for a while, | |
| and all that early API cruft is still there to confuse newcomers. | |
| reactordev wrote 1 day ago: | |
| I miss Mantle. It had its quirks but you felt as if you were literally | |
| programming hardware using a pretty straightforward API. The most fun | |
| I've had programming was for the Xbox 360. | |
| djmips wrote 1 day ago: | |
| You know what else is good like that? The Switch graphics API - | |
| designed by Nvidia and Nintendo. Easily the most straightforward of | |
| the console graphics APIs | |
| reactordev wrote 1 day ago: | |
| Yes but itâs so underpowered. I want RTX 5090 performance with 16 | |
| cores. | |
| yieldcrv wrote 1 day ago: | |
| what level of performance improvements would this represent? | |
| Ono-Sendai wrote 1 day ago: | |
| Relative to what? | |
| Relative to modern OpenGL with good driver support, not much | |
| probably. | |
| The big win is due to the simplified API, which is helpful for | |
| application developers and also driver writers. | |
| Pannoniae wrote 1 day ago: | |
| Most of it has been said by the other replies and they're really | |
| good, adding a few things onto it: | |
| - Would lead to reduced memory usage on the driver side due to | |
| eliminating all the statetracking for "legacy" APIs and all the | |
| PSO/shader duplication for the "modern" APIs (who doesn't like using | |
| less memory? won't show up on a microbenchmark but a reduced working | |
| set leads to globally increased performance in most cases, due to | |
| >cache hit%) | |
| - A much reduced cost per API operation. I don't just mean drawcalls | |
| but everything else too. And allowing more asynchrony without the | |
| "here's 5 types of fences and barriers" kind of mess. As the article | |
| says, you can choose between mostly implicit sync (OpenGL, | |
| DX11) and tracking all your resources yourself (Vulkan), then feeding | |
| all that data into the API which mostly ignores it. | |
| This one wouldn't really have an impact on speeding up existing | |
| applications but more like unlock new possibilities. For example | |
| massively improving scene variety with cheap drawcalls and doing more | |
| procedural objects/materials instead of the standard PBR pipeline. | |
| Yes, drawindirect and friends exist but they aren't exactly | |
| straightforward to use and require you to structure your problem in a | |
| specific way. | |
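| For anyone curious, the core of indirect drawing in Vulkan is small; the | |
| awkward part is restructuring your scene data so the GPU can fill the | |
| command structs. A sketch (helper name is mine; assumes the indirect buffer | |
| was created with VK_BUFFER_USAGE_INDIRECT_BUFFER_BIT and the index/vertex | |
| buffers are already bound): | |
|     #include <vulkan/vulkan.h> | |
|  | |
|     /* One VkDrawIndexedIndirectCommand per draw lives in a GPU buffer, | |
|        typically written by a culling compute shader: | |
|        { indexCount, instanceCount, firstIndex, vertexOffset, firstInstance } | |
|        A single call then issues all of them. */ | |
|     void submit_scene(VkCommandBuffer cmd, VkBuffer indirectBuf, | |
|                       uint32_t drawCount) | |
|     { | |
|         vkCmdDrawIndexedIndirect(cmd, indirectBuf, /*offset*/ 0, drawCount, | |
|                                  sizeof(VkDrawIndexedIndirectCommand)); | |
|     } | |
| With vkCmdDrawIndexedIndirectCount the draw count itself can also come | |
| from a GPU buffer, which is what makes fully GPU-driven culling possible. | |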
| modeless wrote 1 day ago: | |
| It would likely reduce or eliminate the "compiling shaders" step many | |
| games now have on first run after an update, and the stutters many | |
| games have as new objects or effects come on screen for the first | |
| time. | |
| m-schuetz wrote 1 day ago: | |
| Probably mostly about quality of life. Legacy graphics APIs like | |
| Vulkan have abysmal developer UX for no good reason. | |
| vblanco wrote 1 day ago: | |
| There is no implementation of it, but this is how I see it, at least | |
| comparing with how things work with fully extensioned Vulkan, which | |
| uses a few similar mechanics. | |
| Per-drawcall cost goes to nanosecond scale. Assuming you do drawcalls | |
| of course, this makes bindless and indirect rendering a bit easier so | |
| you could drop CPU cost to near-0 in a renderer. | |
| It would also highly mitigate shader compiler hitches due to having a | |
| split pipeline instead of a monolithic one. | |
| The simplification on barriers could improve performance a | |
| significant amount because currently, most engines that deal with | |
| Vulkan and DX12 need to keep track of individual texture layouts and | |
| transitions, and this completely removes such a thing. | |
| flohofwoe wrote 1 day ago: | |
| It's mostly not about performance, but about getting rid of legacy | |
| cruft that still exists in modern 3D APIs to support older GPU | |
| architectures. | |
| wbobeirne wrote 1 day ago: | |
| Getting rid of cruft isn't really a goal in and of itself, it's a | |
| goal in service of other goals. If it's not about performance, what | |
| else would be accomplished? | |
| tonis2 wrote 18 hours 59 min ago: | |
| Getting rid of cruft and simplifying GPU access makes it easier to | |
| develop software that uses GPUs, like AI, games, etc. | |
| Have you taken a look at the codebase of some game engines? It's a | |
| complete cluster fk, because some simple tasks just take 800 lines | |
| of code, and in the end the drivers don't even use the complexity | |
| graphics APIs force upon you. | |
| Improving this is not an accomplishment? | |
| flohofwoe wrote 1 day ago: | |
| A simplified API means higher programmer productivity, higher | |
| robustness, simplified debugging and testing, and also less | |
| internal complexity in the driver. All this together may also | |
| result in slightly higher performance, but it's not the main | |
| goal. You might gain a couple hundred microseconds per frame as a | |
| side effect of the simpler code, but if your use case already | |
| perfectly fits the 'modern subset' of Vulkan or D3D12, the | |
| performance gains will be deep in 'diminishing returns area' and | |
| hardly noticeable in the frame rate. It's mostly about secondary | |
| effects by making the programmer's life easier on both sides of | |
| the API. | |
| The cost/compromise is dropping support for outdated GPUs. | |
| ksec wrote 1 day ago: | |
| I wonder why M$ stopped putting out new DirectX versions. DirectX | |
| Ultimate or 12.1 or 12.2 is largely the same as DirectX 12. | |
| Or has the use of Middleware like Unreal Engine largely made them | |
| irrelevant? Or should EPIC put out a new Graphics API proposal? | |
| djmips wrote 1 day ago: | |
| The frontier of graphics APIs might be the consoles and they don't | |
| get a bump until the hardware gets a bump and the console hardware is | |
| a little bit behind. | |
| pjmlp wrote 1 day ago: | |
| That has always been the case, it is mostly FOSS circles that argue | |
| about APIs. | |
| Game developers create a RHI (rendering hardware interface) like | |
| discussed on the article, and go on with game development. | |
| Because the greatest innovation thus far has been ray tracing and | |
| mesh shaders, and still they are largely ignored, so why keep on | |
| pushing forward? | |
| djmips wrote 1 day ago: | |
| I disagree that ray tracing and mesh shaders are largely ignored - | |
| at least within AAA game engines they are leaned on quite a lot. | |
| Particularly ray tracing. | |
| pjmlp wrote 23 hours 40 min ago: | |
| Game engines aren't games, or sales. | |
| reactordev wrote 1 day ago: | |
| Both-ish. | |
| Yes, the centralization of engines to Unreal, Unity, etc makes it so | |
| there's less interest in pushing the boundaries; they are still | |
| pushed, just on the GPU side. | |
| From a CPU API perspective, it's very close to just plain old | |
| buffer mapping and go. We would need a hardware shift that would add | |
| something more to the pipeline than what we currently do, like when | |
| tessellation shaders came about from geometry shader practices. | |
| thescriptkiddie wrote 1 day ago: | |
| the article talks a lot about PSOs but never defines the term | |
| CrossVR wrote 1 day ago: | |
| PSOs are Pipeline State Objects, they encapsulate the entire state of | |
| the rendering pipeline. | |
| flohofwoe wrote 1 day ago: | |
| "Pipeline State Objects" (immutable state objects which define most | |
| of the rendering state needed for a draw/dispatch call). Tbf, it's a | |
| very common term in rendering since around 2015 when the modern 3D | |
| APIs showed up. | |
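| Concretely, in Vulkan a PSO is a VkPipeline baked from one big create-info | |
| that references all of that state at once. A sketch (helper name is mine; | |
| assumes the sub-state structs and shader stages were filled in elsewhere): | |
|     #include <vulkan/vulkan.h> | |
|  | |
|     /* Everything referenced here is fixed at creation time: change the | |
|        blend state or vertex layout and you need a different VkPipeline, | |
|        which is where the "pipeline explosion" comes from. */ | |
|     VkPipeline create_pso(VkDevice device, | |
|         uint32_t stageCount, const VkPipelineShaderStageCreateInfo *stages, | |
|         const VkPipelineVertexInputStateCreateInfo *vertexInput, | |
|         const VkPipelineInputAssemblyStateCreateInfo *inputAssembly, | |
|         const VkPipelineViewportStateCreateInfo *viewport, | |
|         const VkPipelineRasterizationStateCreateInfo *raster, | |
|         const VkPipelineMultisampleStateCreateInfo *multisample, | |
|         const VkPipelineDepthStencilStateCreateInfo *depthStencil, | |
|         const VkPipelineColorBlendStateCreateInfo *blend, | |
|         VkPipelineLayout layout, VkRenderPass renderPass) | |
|     { | |
|         VkGraphicsPipelineCreateInfo info = { | |
|             .sType               = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO, | |
|             .stageCount          = stageCount, | |
|             .pStages             = stages,        /* compiled shader stages */ | |
|             .pVertexInputState   = vertexInput,   /* vertex attribute layout */ | |
|             .pInputAssemblyState = inputAssembly, /* topology */ | |
|             .pViewportState      = viewport, | |
|             .pRasterizationState = raster,        /* cull, fill, bias */ | |
|             .pMultisampleState   = multisample, | |
|             .pDepthStencilState  = depthStencil, | |
|             .pColorBlendState    = blend, | |
|             .layout              = layout,        /* resource binding layout */ | |
|             .renderPass          = renderPass, | |
|         }; | |
|         VkPipeline pso = VK_NULL_HANDLE; | |
|         vkCreateGraphicsPipelines(device, VK_NULL_HANDLE, 1, &info, NULL, &pso); | |
|         return pso; | |
|     } | |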
| henning wrote 1 day ago: | |
| This looks very similar to the SDL3 GPU API and other RHI libraries | |
| that have been created at first glance. | |
| cyber_kinetist wrote 1 day ago: | |
| If you look at the details you can clearly see SDL3_GPU is wildly | |
| different from this proposal, such as: | |
| - It's not exposing raw GPU addresses, SDL3_GPU has buffer objects | |
| instead. Also you're much more limited with how you use buffers in | |
| SDL3 (ex. no coherent buffers, you're forced to use a transfer buffer | |
| if you want to do a CPU -> GPU upload; see the sketch after this list) | |
| - in SDL3_GPU synchronization is done automatically, without the user | |
| specifying barriers (helped by a technique called cycling: [1] ), | |
| - More modern features such as mesh shading are not exposed in | |
| SDL3_GPU, and keeps the traditional rendering pipeline as the main | |
| way to draw stuff. Also, bindless is a first class citizen in | |
| Aaltonen's proposal (and the main reason for the simplification of | |
| the API), while SDL3_GPU doesn't support it at all and instead opts | |
| for a traditional descriptor binding system. | |
| [1]: https://moonside.games/posts/sdl-gpu-concepts-cycling/ | |
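| The upload sketch referenced above, roughly as I understand the SDL3 GPU | |
| API (upload_buffer is my helper name and exact signatures should be treated | |
| as approximate; assumes the device and destination SDL_GPUBuffer exist): | |
|     #include <SDL3/SDL.h> | |
|     #include <string.h> | |
|  | |
|     /* CPU -> GPU upload goes through an explicit transfer buffer plus a | |
|        copy pass; synchronization is handled by SDL (cycling), not the user. */ | |
|     void upload_buffer(SDL_GPUDevice *dev, SDL_GPUBuffer *dst, | |
|                        const void *data, Uint32 size) | |
|     { | |
|         SDL_GPUTransferBufferCreateInfo tbci = { | |
|             .usage = SDL_GPU_TRANSFERBUFFERUSAGE_UPLOAD, | |
|             .size  = size, | |
|         }; | |
|         SDL_GPUTransferBuffer *tb = SDL_CreateGPUTransferBuffer(dev, &tbci); | |
|         void *mapped = SDL_MapGPUTransferBuffer(dev, tb, false); | |
|         memcpy(mapped, data, size); | |
|         SDL_UnmapGPUTransferBuffer(dev, tb); | |
|         SDL_GPUCommandBuffer *cmd = SDL_AcquireGPUCommandBuffer(dev); | |
|         SDL_GPUCopyPass *copy = SDL_BeginGPUCopyPass(cmd); | |
|         SDL_UploadToGPUBuffer(copy, | |
|             &(SDL_GPUTransferBufferLocation){ .transfer_buffer = tb, .offset = 0 }, | |
|             &(SDL_GPUBufferRegion){ .buffer = dst, .offset = 0, .size = size }, | |
|             false); | |
|         SDL_EndGPUCopyPass(copy); | |
|         SDL_SubmitGPUCommandBuffer(cmd); | |
|         SDL_ReleaseGPUTransferBuffer(dev, tb); | |
|     } | |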
| Scaevolus wrote 1 day ago: | |
| SDL3 is kind of the intersection of features found in DX12/Vulkan | |
| 1.0/Metal: if it's not easily supported in all of them, it's not in | |
| SDL3-- hence the lack of bindless support. That means you can run | |
| on nearly every device in the last 10-15 years. | |
| This "no api" proposal requires hardware from the last 5-10 years | |
| :) | |
| cyber_kinetist wrote 1 day ago: | |
| Yup you've actually pointed out the most important difference: | |
| SDL3 is designed to be compatible with the APIs and devices of | |
| the past (2010s), whereas this proposal is designed to be | |
| compatible with the newer 2020s batch of consumer devices. | |
| vblanco wrote 1 day ago: | |
| This is a fantastic article that demonstrates how many parts of | |
| Vulkan and DX12 are no longer needed. | |
| I hope the IHVs have a look at it, because current DX12 seems | |
| semi-abandoned: it doesn't support buffer pointers even though | |
| every GPU made in the last 10 (or more!) years can do pointers | |
| just fine. Meanwhile Vulkan doesn't do a 2.0 release that cleans | |
| things up, so it carries a lot of baggage and, especially, tons of | |
| drivers that don't implement the extensions that really improve | |
| things. | |
| If this API existed, you could emulate OpenGL on top of it faster | |
| than the current OpenGL-to-Vulkan layers, and something like SDL3 | |
| GPU would get a 3x/4x boost too. | |
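| For context, the buffer pointers in question already exist in | |
| Vulkan (VK_KHR_buffer_device_address, core in 1.2). A minimal | |
| sketch, assuming `buffer` was created with the | |
| SHADER_DEVICE_ADDRESS usage bit, its memory allocated with the | |
| DEVICE_ADDRESS flag, and `cmd`/`pipelineLayout` already exist: | |
| VkBufferDeviceAddressInfo info = {}; | |
| info.sType = VK_STRUCTURE_TYPE_BUFFER_DEVICE_ADDRESS_INFO; | |
| info.buffer = buffer; | |
| VkDeviceAddress addr = vkGetBufferDeviceAddress(device, &info); | |
| // addr is a raw 64-bit GPU pointer; hand it to the shader, e.g. | |
| // via a push constant, and dereference it there with | |
| // GL_EXT_buffer_reference. | |
| vkCmdPushConstants(cmd, pipelineLayout, VK_SHADER_STAGE_COMPUTE_BIT, | |
|                    0, sizeof(addr), &addr); | |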
| torginus wrote 10 hours 42 min ago: | |
| It's weird how the 'next-gen' APIs have turned out to be failures | |
| in many ways, imo. I think a sizeable share of graphics devs are | |
| still stuck on the old way of doing things. I know a couple of | |
| graphics wizards (who work on major AAA titles) who never liked | |
| Vulkan/DX12, and many engines haven't really been rebuilt to | |
| accommodate the 'new' way of doing graphics. | |
| Ironically, a lot of the time these new APIs end up being slower | |
| in practice (something confirmed by gaming benchmarks), probably | |
| exactly because of the issues outlined in the article: having | |
| precompiled 'pipeline states' instead of the good ol' state | |
| machine has forced devs to precompile a truly staggering number of | |
| states, and even then compilation can still happen at runtime, | |
| leading to the well-known stutters. | |
| The other issue is synchronization: as the article mentions, | |
| Vulkan synchronization is unnecessarily heavy, and devs aren't | |
| really experts in it or don't have the time to figure out which | |
| kind of barrier to use when, so they adopt a 'better safe than | |
| sorry' approach, leading to unnecessary flushes and pipeline | |
| stalls that can tank performance in real-life workloads. | |
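| To make that concrete, a rough Vulkan (synchronization2) sketch of | |
| the "safe" barrier people reach for versus a precise one; the | |
| stages are just an illustrative compute-writes-then-vertex-reads | |
| case: | |
| VkMemoryBarrier2 lazy = {}; | |
| lazy.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER_2; | |
| lazy.srcStageMask = VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT; // all | |
| lazy.srcAccessMask = VK_ACCESS_2_MEMORY_WRITE_BIT; | |
| lazy.dstStageMask = VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT; // all | |
| lazy.dstAccessMask = VK_ACCESS_2_MEMORY_READ_BIT; | |
| // The precise version only orders the actual producer/consumer: | |
| VkMemoryBarrier2 precise = lazy; | |
| precise.srcStageMask = VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT; | |
| precise.srcAccessMask = VK_ACCESS_2_SHADER_STORAGE_WRITE_BIT; | |
| precise.dstStageMask = VK_PIPELINE_STAGE_2_VERTEX_SHADER_BIT; | |
| precise.dstAccessMask = VK_ACCESS_2_SHADER_STORAGE_READ_BIT; | |
| VkDependencyInfo dep = {}; | |
| dep.sType = VK_STRUCTURE_TYPE_DEPENDENCY_INFO; | |
| dep.memoryBarrierCount = 1; | |
| dep.pMemoryBarriers = &precise; // &lazy is what stalls the GPU | |
| vkCmdPipelineBarrier2(cmd, &dep); | |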
| Combined with the API complexity this is a huge issue, leading | |
| many devs to use wrappers like the aforementioned SDL3, which is | |
| very conservative when it comes to synchronization. | |
| Old APIs with smart drivers could either figure this out better, | |
| or GPU driver devs would look at the workloads and patch up | |
| rendering manually for popular titles. | |
| Additionally, by the early-to-mid 2010s, when these new APIs | |
| started getting released, a lot of crafty devs, armed with new | |
| shader models and OpenGL extensions, had made it possible to | |
| render tens of thousands of varied and interesting objects, | |
| essentially a whole scene's worth, in a single draw call. The most | |
| sophisticated and complex of these approaches was AZDO, which I'm | |
| not sure ever actually made it into a released game, but even with | |
| much less sophisticated approaches (combined with ideas like PBR | |
| materials and deferred rendering) you could pretty much draw | |
| anything. | |
| This meant much of the perf bottleneck of the old APIs disappeared. | |
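| For the curious, the AZDO-style "whole scene in one call" idea | |
| boils down to multi-draw-indirect. A sketch against OpenGL 4.3+, | |
| where buildSceneDraws() is a hypothetical helper that fills one | |
| record per object (VAO and index buffer assumed bound): | |
| // C++, assumes a GL 4.3+ context and loader; #include <vector> | |
| struct DrawElementsIndirectCommand { | |
|     GLuint count, instanceCount, firstIndex; | |
|     GLint  baseVertex; | |
|     GLuint baseInstance; | |
| }; | |
| std::vector<DrawElementsIndirectCommand> cmds = buildSceneDraws(); | |
| GLuint indirect = 0; | |
| glGenBuffers(1, &indirect); | |
| glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirect); | |
| glBufferData(GL_DRAW_INDIRECT_BUFFER, | |
|              cmds.size() * sizeof(cmds[0]), cmds.data(), | |
|              GL_DYNAMIC_DRAW); | |
| // One call draws every object; per-object data is fetched in the | |
| // shader by indexing big SSBOs with gl_DrawID / baseInstance. | |
| glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, nullptr, | |
|                             (GLsizei)cmds.size(), 0); | |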
| eek2121 wrote 6 hours 8 min ago: | |
| I think the big issue is that there is no 'next-gen API'. Microsoft | |
| has largely abandoned DirectX, Vulkan is restrictive as anything, | |
| Metal isn't changing much beyond matching DX/Vk, and | |
| NVIDIA/AMD/Apple/Qualcomm aren't interested in (re)-inventing the | |
| wheel. | |
| There are some interesting GPU improvements coming down the | |
| pipeline, like a possible out-of-order part from AMD (if certain | |
| credible leaks are valid); however, there are crickets from | |
| Microsoft, and NVIDIA just wants vendor lock-in. | |
| Yes, we need a vastly simpler API. I'd argue even simpler than the | |
| one proposed. | |
| One of my biggest hopes for RT is that it will standardize like 80% | |
| of stuff to the point where it can be abstracted to libraries. It | |
| probably won't happen, but one can wish... | |
| exDM69 wrote 18 hours 40 min ago: | |
| > tons of drivers that don't implement the extensions that really | |
| improve things. | |
| This isn't really the case, at least on the desktop side. | |
| All three desktop GPU vendors support Vulkan 1.4 (or most of its | |
| features via extensions) on all major platforms, even on really | |
| old hardware (e.g. Intel Skylake is 10+ years old and has all the | |
| latest Vulkan features). Even Apple + MoltenVK is pretty good. | |
| Even the mobile GPU vendors have pretty good support in their | |
| latest drivers. | |
| The biggest issue is that Android consumer devices don't get GPU | |
| driver updates, so those drivers never reach the general public. | |
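| As a concrete check, this is roughly how you query what a driver | |
| actually exposes (standard Vulkan calls; `physicalDevice` is | |
| assumed to have been picked already): | |
| VkPhysicalDeviceProperties props = {}; | |
| vkGetPhysicalDeviceProperties(physicalDevice, &props); | |
| uint32_t major = VK_API_VERSION_MAJOR(props.apiVersion); | |
| uint32_t minor = VK_API_VERSION_MINOR(props.apiVersion); // 1.x | |
| // Individual features still need a query, e.g. dynamic rendering: | |
| VkPhysicalDeviceVulkan13Features f13 = {}; | |
| f13.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_VULKAN_1_3_FEATURES; | |
| VkPhysicalDeviceFeatures2 f2 = {}; | |
| f2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2; | |
| f2.pNext = &f13; | |
| vkGetPhysicalDeviceFeatures2(physicalDevice, &f2); | |
| bool hasDynamicRendering = (f13.dynamicRendering == VK_TRUE); | |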
| pjmlp wrote 17 hours 23 min ago: | |
| Neither do laptops, where not using the OEM driver with whatever | |
| custom code they added can lead to interesting experiences, like | |
| power configuration going bad, being unable to handle mixed GPU | |
| setups, and so on. | |
| PeterStuer wrote 22 hours 19 min ago: | |
| Still have some 1080s in gaming machines going strong. But as even | |
| NVIDIA has retired support, I guess it is time to move on. | |
| kllrnohj wrote 1 day ago: | |
| "No longer needed" is a strong statement given how recent the | |
| required GPU support is. It's unlikely anything could accept those | |
| minimum requirements today. | |
| But soon? Hopefully. | |
| jsheard wrote 1 day ago: | |
| Those requirements more or less line up with the introduction of | |
| hardware raytracing, and some major titles are already treating | |
| that as a hard requirement, like the recent Doom and Indiana Jones | |
| games. | |
| tjpnz wrote 1 day ago: | |
| Doom was able to drop it and is now Steam Deck verified. | |
| nicolaslem wrote 21 hours 21 min ago: | |
| Little-known fact: the Steam Deck has hardware ray tracing; it's | |
| just so weak as to be almost non-existent. | |
| kllrnohj wrote 1 day ago: | |
| Only if you're ignoring mobile entirely. One of the things Vulkan | |
| did which would be a shame to lose is it unified desktop and | |
| mobile GPU APIs. | |
| eek2121 wrote 6 hours 3 min ago: | |
| Mobile is getting RT, fyi. Apple already has it (for a few | |
| generations, at least), and I think Qualcomm does as well (I'm | |
| less familiar with their stuff because they've been behind the | |
| game forever, but from what I've last read their latest parts have | |
| it), and things are rapidly improving. | |
| Vulkan is the actual barrier. On Windows, DirectX does an average | |
| job of supporting it. Microsoft doesn't really innovate these | |
| days, so NVIDIA largely drives the market, and sometimes AMD | |
| pitches in. | |
| m-schuetz wrote 9 hours 53 min ago: | |
| On the contrary, I would say this is the main thing Vulkan got | |
| wrong and the main reason why the API is so bad. Desktop and | |
| mobile are way too different for a uniform rendering API. They | |
| should be two different flavours with a common denominator. | |
| OpenGL and OpenGL ES were much better in that regard. | |
| pjmlp wrote 17 hours 22 min ago: | |
| It is not unified when the first thing an application has to do is | |
| find out whether its particular set of extension spaghetti is | |
| available on the device. | |
| flohofwoe wrote 20 hours 55 min ago: | |
| > One of the things Vulkan did which would be a shame to lose | |
| is it unified desktop and mobile GPU APIs. | |
| In hindsight it really would have been better to have a | |
| separate VulkanES which is specialized for mobile GPUs. | |
| pjmlp wrote 11 hours 57 min ago: | |
| Apparently on many Android devices outside the Samsung and Google | |
| brands it is still better to target OpenGL ES than Vulkan, due to | |
| driver quality. | |
| jsheard wrote 1 day ago: | |
| Eh, I think the jury is still out on whether unifying desktop | |
| and mobile graphics APIs is really worth it. In practice Vulkan | |
| written to take full advantage of desktop GPUs is wildly | |
| incompatible with most mobile GPUs, so there's fragmentation | |
| between them regardless. | |
| eek2121 wrote 5 hours 59 min ago: | |
| I definitely disagree here. What matters for mobile is power | |
| consumption. Capabilities can be implemented pretty easily... if | |
| you disagree, ask Apple. They have seemingly nailed it (with a few | |
| unrelated limitations). | |
| Mobile vendors insisting on closed, proprietary drivers that they | |
| don't keep updated is the actual issue. If you have a GPU capable | |
| of cutting-edge graphics, you have to have a top-notch driver | |
| stack. Nobody gets this right except AMD and NVIDIA (and both have | |
| their flaws). Apple doesn't even come close, yet they are ahead of | |
| everyone else except AMD/NVIDIA. AMD seems to do it best, NVIDIA | |
| is a distant second, Apple third, and everyone else tenth. | |
| 01HNNWZ0MV43FF wrote 1 day ago: | |
| If the APIs aren't unified, the engines will be, since VR | |
| games will want to work on both standalone headsets and | |
| streaming headsets | |
| ablob wrote 1 day ago: | |
| I feel like it's a win by default. | |
| I do like to write my own programs every now and then and | |
| recently there's been more and more graphics sprinkled into | |
| them. | |
| Being able to reuse those components and just render onto a | |
| target without changing anything else seems to be very useful | |
| here. | |
| This kind of seamless interoperability between platforms is | |
| very desirable in my book. | |
| I can't think of a better approach to achieve this than the | |
| graphics API itself. | |
| Also, nothing inherently blocks extensions by default. | |
| I feel like a reasonable core that can optionally do more, similar | |
| to CPU extensions (e.g. vector extensions), could be the way to go | |
| here. | |
| kllrnohj wrote 1 day ago: | |
| It's quite useful for things like skia or piet-gpu/vello or | |
| the general category of "things that use the GPU that aren't | |
| games" (image/video editors, effects pipelines, compute, etc | |
| etc etc) | |
| Groxx wrote 1 day ago: | |
| would it also apply to stuff like the Switch, and | |
| relatively high-end "mobile" gaming in general? (I'm not | |
| sure what those chips actually look like tho) | |
| there are also some ARM laptops that just run Qualcomm chips, the | |
| same as some phones (tablets with a keyboard, basically, but a bit | |
| more "PC"-like due to running Windows). | |
| AFAICT the fusion seems likely to be an accurate prediction. | |
| deliciousturkey wrote 20 hours 38 min ago: | |
| Switch has its own API. The GPU also doesn't have | |
| limitations you'd associate with "mobile". In terms of | |
| architecture, it's a full desktop GPU with desktop-class | |
| features. | |
| kllrnohj wrote 17 hours 19 min ago: | |
| Well, it's a desktop GPU with desktop-class features from 2014, | |
| which makes it quite outdated relative to current mobile GPUs. The | |
| just-released Switch 2 uses an Ampere-based GPU, which means it's | |
| desktop-class for 2020 (RTX 3xxx series). That's nothing to scoff | |
| at, but "desktop-class features" is a rapidly moving target, and | |
| the Switch ends up being a lot closer to mobile than to desktop | |
| since it always launches with GPUs that are ~2 generations old. | |
| pjmlp wrote 11 hours 55 min ago: | |
| It still beats the design of all Web 3D APIs, and has much better | |
| development tooling. Let that sink in as a measure of how far | |
| behind they are. | |
| jsheard wrote 1 day ago: | |
| I suppose that's true, yeah. I was focusing too much on | |
| games specifically. | |
| _bohm wrote 1 day ago: | |
| I'm surprised he made no mention of the SDL3 GPU API since his | |
| proposed API has pretty significant overlap with it. | |
| pjmlp wrote 1 day ago: | |
| DirectX documentation is in a bad state currently: you have Frank | |
| Luna's books, which don't cover the latest improvements, and after | |
| that it's hunting through Learn, GitHub samples and reference | |
| docs. | |
| Vulkan is another mess. Even if there were a 2.0, how are devs | |
| supposed to actually use it, especially on Android, the biggest | |
| consumer Vulkan platform? | |
| tadfisher wrote 1 day ago: | |
| Isn't this all because PCI resizable BAR is not required to run | |
| any GPU besides Intel Arc? As in, maybe it mostly comes down to | |
| Microsoft/Intel mandating ReBAR in UEFI so we can start using | |
| stuff like bindless textures without thousands of support tickets | |
| and negative reviews. | |
| I think this puts a floor on supported hardware though, like | |
| NVIDIA 30xx and Radeon 5xxx. And of course motherboard support was | |
| a crapshoot until 2020 or so. | |
| vblanco wrote 1 day ago: | |
| This is not really about resizable BAR; you could do mostly the | |
| same API without it. Resizable BAR simplifies things a little | |
| because you skip manual transfer operations, but it's not strictly | |
| required: you can write things to a CPU-writable buffer and then | |
| begin your frame with a transfer command (roughly as in the sketch | |
| below). | |
| Bindless textures never needed any kind of resizable BAR; you have | |
| been able to use them since the early 2010s in OpenGL through an | |
| extension. Buffer pointers have never needed it either. | |
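| A rough sketch of that non-ReBAR path in Vulkan, assuming | |
| `stagingBuffer`/`stagingMemory` are host-visible and coherent and | |
| already bound, `cpuData`/`size`/`deviceLocalBuffer` exist, and | |
| `cmd` is the frame's command buffer: | |
| // 1. CPU writes into the staging buffer. | |
| void* ptr = nullptr; | |
| vkMapMemory(device, stagingMemory, 0, size, 0, &ptr); | |
| memcpy(ptr, cpuData, size); | |
| vkUnmapMemory(device, stagingMemory); | |
| // 2. The frame begins with a copy into the device-local buffer. | |
| VkBufferCopy region = {}; | |
| region.size = size; | |
| vkCmdCopyBuffer(cmd, stagingBuffer, deviceLocalBuffer, 1, &region); | |
| // With resizable BAR the device-local buffer itself could be | |
| // mapped and written directly, skipping the copy. | |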