_______ __ _______
| | |.---.-..----.| |--..-----..----. | | |.-----..--.--.--..-----.
| || _ || __|| < | -__|| _| | || -__|| | | ||__ --|
|___|___||___._||____||__|__||_____||__| |__|____||_____||________||_____|
on Gopher (unofficial)
COMMENT PAGE FOR:
GPT Image 1.5
yuni_aigc wrote 19 hours 24 min ago:
One thing I’ve noticed when comparing these models is that
“quality” and “realism” don’t always move together.
Some models are very strong at sharp details and localized edits, but
they can break global lighting consistency — shadows, reflections, or
overall scene illumination drift in subtle ways. GPT-Image seems to
trade a bit of micro-detail for better global coherence, especially in
lighting, which makes composites feel more believable even if they’re
not pixel-perfect.
It’s hard to capture this in benchmarks, but for real-world editing
workflows it ends up mattering more than I initially expected.
fock wrote 1 day ago:
Good to see that hands are still not solved...
sipsi wrote 1 day ago:
The combination of two images that the last gpt-image (nano banana) generated seems inappropriate.
Garlef wrote 1 day ago:
GPT Image is the new MS Word "Arial + clip art"
chakintosh wrote 1 day ago:
Can't wait to generate fake memories with my grandma who died 20 years ago
bunnybomb2 wrote 7 hours 33 min ago:
Or me and my ex
rw2 wrote 1 day ago:
Having used it compared to Nano Banana:
- The latency is still too high: under 10 seconds for Nano Banana versus around 25 seconds for GPT Image 1.5.
- The quality is higher, but not a jump like the one from previous Google models to Nano Banana Pro. Nano Banana Pro is still at least as good or better, in my opinion.
jdthedisciple wrote 1 day ago:
Why is the emphasis of these promos always on creating fake social media pictures of people and things that didn't happen?
Aren't we plagued enough by all the fake bullshit out there?
Ffs!
/rant
Sorry gotta be honest and blunt every one of those times...
v9v wrote 1 day ago:
Lots of em-dashes in this copy.
sroussey wrote 1 day ago:
“ Photo of a blond male in his 50s with half gray hair “
Still fails. Every photo of a man with half gray hair will have the
other half black.
andai wrote 1 day ago:
Sam Altman Christmas decoration isn't real, he can't hurt me...
thumbsup-_- wrote 1 day ago:
now you can create good memories with your family without meeting them
augustk wrote 1 day ago:
Or create the family
GaryBluto wrote 1 day ago:
God OpenAI are so far behind. Their own example shows that trying to
only change specific parts of the image doesn't work without affecting
the background.
encroach wrote 1 day ago:
This outperforms Gemini 3 Pro Image (Nano Banana Pro) on the Text-to-Image Arena [1] and the Image Edit Arena [2]. I'm surprised they didn't mention these leaderboards in the blog post.
I like this benchmark because it's based on user votes, so overfitting is not as easy (after all, if users prefer your result, you've won).
[1]: https://lmarena.ai/leaderboard/text-to-image
[2]: https://lmarena.ai/leaderboard/image-edit
ygouzerh wrote 1 day ago:
The scores are really, really close; that might be why.
nycdatasci wrote 1 day ago:
The arena concept doesn’t work for image models due to watermarks.
encroach wrote 1 day ago:
There are no watermarks in the arena.
nycdatasci wrote 1 day ago:
There are no visible watermarks, but model makers can use
steganographic codes to identify outputs from their own models.
nycdatasci wrote 21 hours 1 min ago:
Text-to-Image Models Leave Identifiable Signatures:
Implications for Leaderboard Security
[1]: https://arxiv.org/pdf/2510.06525
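As a toy illustration of the idea (not the scheme any provider actually uses), a signature can be hidden in the least-significant bits of pixel values and later recovered:

```python
def embed_signature(pixels, bits):
    """Hide a bit sequence in the least-significant bits of pixel values."""
    out = list(pixels)
    for i, b in enumerate(bits):
        out[i] = (out[i] & ~1) | b  # clear the LSB, then set it to the signature bit
    return out

def extract_signature(pixels, n):
    """Read back the first n least-significant bits."""
    return [p & 1 for p in pixels[:n]]

# A model maker could compare the extracted bits against a known signature
# to recognize its own outputs in anonymized arena battles.
```

Each pixel changes by at most 1, so the mark is invisible to voters but trivially detectable by whoever embedded it.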
encroach wrote 18 hours 51 min ago:
This is true; however, LMArena does employ some methods to mitigate attempts to manipulate the leaderboard, see [1]. They also control for style. [2]
[1]: https://openreview.net/forum?id=zf9zwCRKyP
[2]: https://news.lmarena.ai/sentiment-control/
raw_anon_1111 wrote 1 day ago:
I still can’t get it to draw a “13 hour clock” correctly
fellowniusmonk wrote 17 hours 37 min ago:
All of the latest round of OpenAI models are massively overfit.
randall wrote 1 day ago:
double popped collar ftw
password-app wrote 1 day ago:
Impressive image quality improvements. Meanwhile, AI agents just
crossed a milestone: Simular's Agent S hit 72.6% on OSWorld
(human-level is 72.36%).
We're seeing AI get better at both creative tasks (images) and
operational tasks (clicking through websites).
For anyone building AI agents: the security model is still the hard
part. Prompt injection remains unsolved even with dedicated security
LLMs.
nightshift1 wrote 1 day ago:
What is the endgame? Why is OpenAI throwing that much money on
image/video generation? Is there a profitable market for AI-generated
image slop? Do people choose ChatGPT instead of Gemini/Grok/Claude
because of the image generation capabilities? To me, it looks like a
huge fiery money pit.
BrokenCogs wrote 1 day ago:
The endgame is to make money during the hype and then cash out before
it crashes.
bdangubic wrote 1 day ago:
if that is the endgame openai is doing everything but working
towards that goal :)
BrokenCogs wrote 1 day ago:
Yeah they fumbled big time
eterm wrote 1 day ago:
I have a "go to" prompt for images:
> In the style of a 1970s book sci-fi novel cover: A spacer walks
towards the frame. In the background his spaceship crashed on an icy
remote planet. The sky behind is dark and full of stars.
Nano banana pro via gemini did really well, although still way too
detailed, and it then made a mess of different decades when I asked it
to follow up: [1] It's therefore really disappointing that GPT-image
1.5 did this: [2] Completely generic, not at all like a book cover, it
completely ignored that part of the prompt while it focused on the
other elements.
Did it get the other details right? Sure, maybe even better, but the
important part it just ignored completely.
And it's doing even worse when I try to get it to correct the mistake.
It's just repeating the same thing with more "weathering".
[1]: https://gemini.google.com/share/1902c11fd755
[2]: https://chatgpt.com/share/6941ed28-ed80-8000-b817-b174daa922a7
bongodongobob wrote 1 day ago:
You're just not describing what you want properly. Looks fine to me.
Clearly you have something else in mind, so I think you're just not
describing well. My tip would be to use actual illustration
language. Do you want a wide angle shot? What should depth of field
be? Oil painting print? Ink illustration? What kind of printing
style? Do you want a photo of the book or a pre-print proof? What
kind of color scheme?
A professional artist wouldn't know what you want.
You didn't even specify an art style. 1970s sci-fi novel cover isn't
a style. You'll find vastly different art styles from the 70s. If
you're disappointed, it's because you're doing a shitty job
describing what's in your head. If your prompt isn't at least a
paragraph, you're going to just get random generic results.
eterm wrote 1 day ago:
The killer feature of LLMs is to be able to extrapolate what's
really wanted from short descriptions.
Look again at Gemini's output, it looks like an actual book cover,
it looks like an illustration that could be found on a book.
It takes on board corrections (albeit hilariously literally).
Look at GPT image's output, it doesn't look anything like a book
cover, and when prompted to say it got it wrong, just doubles down
on what it was doing.
bongodongobob wrote 1 day ago:
What you want, and what you think image generation is, is
impossible.
eterm wrote 21 hours 9 min ago:
And yet we can see Gemini do what I wanted, so it's clearly not
impossible.
bongodongobob wrote 19 hours 25 min ago:
What you've found is a prompt that returns what you want on Gemini. That's all.
eterm wrote 18 hours 30 min ago:
It's a prompt I've been using for years. Gemini has been
the best of the bunch, but Nano Banana, Midjourney, etc.,
all did okay to various degrees.
GPT Image bombed notably worse than the others: not the original picture itself, but the complete lack of recognition of my feedback that it hadn't gotten it right. It just doubled down on the image it had generated.
enigma101 wrote 1 day ago:
Really can't stand the image slop suffocating the internet.
adammarples wrote 1 day ago:
Still can't pass my image test
Two women walking in single file
Although it tried very hard and had them staggered slightly
weird-eye-issue wrote 1 day ago:
Interestingly when you Google that literally all of the images have
two women walking side by side
ge96 wrote 1 day ago:
I get that the tech implementation is amazing; I just wonder if it takes away from the genuineness of events, like the astronaut photo. I get it's just a joke/funny too, but it's like a photo of you in a supercar vs. actually buying one. Or fake AI companions vs. real people. Beauty filters/skinny filters vs. actually being healthy.
ge96 wrote 21 hours 13 min ago:
Thinking about this more, it goes two ways for guys/girls: the guys can post pics of themselves doing crazy things on their Tinder, and it's up to the girl to decide if it's real or not.
I'm not saying this as a critique against image generation, since you can manually make these fake images, but yeah.
Ultimately I think it's good; it makes people be real.
onoesworkacct wrote 1 day ago:
the next generation of humans growing up will not even care whether
media is real or not any more. The saturation of AI content and FUD
around real content is going to blur the lines to the extent that
there's no point even caring about it. And it's an intractable
problem.
hopefully this leads to greater importance of seeing things with your
own wetware.
ge96 wrote 1 day ago:
The other issue is the need to show off... if I had a supercar why
do I have to post it on Instagram that kind of thing.
smlavine wrote 1 day ago:
This is terrifying. Truth is dead.
teaearlgraycold wrote 1 day ago:
Eventually phone manufacturers will be forced to become arbiters of
truth with signed images and videos.
WhyOhWhyQ wrote 1 day ago:
Makes you wonder what's really meant when we talk about progress.
mingabunga wrote 1 day ago:
Did an experiment to give a software product a dark theme. Gave both (GPT and Gemini/Nano) a screenshot of the product and an example theme I found on Dribbble.
- Gemini/Nano did a pretty average job, only applying some grey to some
of the panels. I tried a few different examples and got similar output.
- GPT did a great job and themed the whole app and made it look great.
I think I'd still need a designer to finesse some things though.
vunderba wrote 1 day ago:
Okay, results are in for GenAI Showdown with the new gpt-image 1.5 model for the editing portions of the site! [1]
Conclusions:
- OpenAI has always had some of the strongest prompt understanding
alongside the weakest image fidelity. This update goes some way towards
addressing this weakness.
- It's leagues better than gpt-image-1 at making localized edits without altering the entire image's aesthetic, doubling the previous score from 4/12 to 8/12, and it's the only model that legitimately passed the Giraffe prompt.
- It's one of the most steerable models with a 90% compliance rate
Updates to GenAI Showdown
- Added outtakes sections to each model's detailed report in the
Text-to-Image category, showcasing notable failures and unexpected
behaviors.
- New models have been added including REVE and Flux.2 Dev (a new
locally hostable model).
- Finally got around to implementing a weighted scoring mechanism which
considers pass/fail, quality, and compliance for a more holistic model
evaluation (click pass/fail icon to toggle between scoring methods).
If you just want to compare gpt-image-1, gpt-image-1.5, and NB Pro at the same time: [2]
[1]: https://genai-showdown.specr.net/image-editing
[2]: https://genai-showdown.specr.net/image-editing?models=o4,nbp,g...
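The weighted scoring mechanism could be sketched roughly like this (the weights and exact formula here are my guesses, not the site's actual implementation):

```python
def weighted_score(passed: bool, quality: float, compliance: float,
                   w_pass: float = 0.5, w_quality: float = 0.25,
                   w_compliance: float = 0.25) -> float:
    """Fold a binary pass/fail plus 0-1 quality and compliance ratings
    into a single holistic score in [0, 1]."""
    return (w_pass * (1.0 if passed else 0.0)
            + w_quality * quality
            + w_compliance * compliance)
```

The appeal over a bare pass/fail count is that two models with identical pass rates can still be separated by how cleanly and how compliantly they passed.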
Bombthecat wrote 15 hours 1 min ago:
I can't click the compliance info button on mobile. The text shows
for half a second and then vanishes. Long press just marks the text
for copy paste.
vunderba wrote 6 hours 37 min ago:
Hey bombthecat - thanks for pointing this out. I had some poor
mobile browser detection that was causing this issue. It should be
fixed now.
nicpottier wrote 19 hours 15 min ago:
Love this benchmark, always the first place I look. Also seems like
it is time to move the goalposts, not sure we are getting enough
resolution between models anymore.
Out of curiosity why does gemini get gold for the poker example but
gpt-image 1.5 does not? I couldn't see a difference between the two.
leumon wrote 1 day ago:
One other test you could add is generating a chessboard from a FEN. I
was surprised to see NBP able to do that (however, it seems to only
work with fewer pieces, after a certain amount it makes mistakes or
even generates a completely wrong image)
[1]: https://files.catbox.moe/uudsyt.png
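For context, the piece-placement field of a FEN string expands deterministically into an 8x8 grid, which is exactly what the model has to render; a minimal decoder:

```python
def fen_to_board(fen: str) -> list[list[str]]:
    """Expand the piece-placement field of a FEN string into an 8x8 grid.
    Uppercase letters are White pieces, lowercase are Black, '.' is empty."""
    board = []
    for rank in fen.split()[0].split("/"):  # ranks 8 down to 1
        row = []
        for ch in rank:
            if ch.isdigit():
                row.extend(["."] * int(ch))  # a digit means that many empty squares
            else:
                row.append(ch)
        board.append(row)
    return board

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
```

Since the mapping is mechanical, any mistake in the generated image is purely a failure of spatial rendering, not of ambiguity in the prompt.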
quietbritishjim wrote 1 day ago:
Absolutely fabulous work.
Ludicrously unnecessary nitpick for "Remove all the brown pieces of
candy from the glass bowl":
> Gemini 2.5 Flash - 18 attempts - No matter what we tried, Gemini
2.5 Flash always seemed to just generate an entirely new assortment
of candies rather than just removing the brown ones.
The way I read the prompt, it demands that the candies should change
arrangement. You didn't say "change the brown candies to a different
color", you said "remove them". You can infer from the few brown ones
that you can see that there are even more underneath - surely if you
removed them all (even just by magically disappearing them) then the
others would tumble down into a new location? The level of the
candies is lower than before you started, which is what you'd expect
if you remove some. Maybe it's just coincidence, but maybe this
really was its reasoning. (It did unnecessarily remove the red candy
from the hand though.)
I don't think any of the "passes" did as well as this, including
Gemini 3.0 Pro Image. Qwen-Image-Edit did at least literally remove
one of the three visible brown candies, but just recolored the other
two.
vunderba wrote 20 hours 48 min ago:
That is a great point! Since we are moving towards better "world
models" in terms of these multimodal models, you could reasonably
argue that if the directive was to physically remove the candy that
in the process of doing so, gravity/physics could affect the
positioning of other objects.
You will note that the Minimum Passing Criteria allows for a color
change in order to pass the prompt but with the rapid improvements
in generative models, I may revise this test to be stricter, only
allowing "Removal" to be considered as pass as opposed to a simple
color swap.
boredhedgehog wrote 1 day ago:
I disagree with gpt-image-1.5's grade on the worm sign. It moved some
of the marks around to accommodate the enlarged black area, but
retained the overall appearance of the sign.
vunderba wrote 20 hours 50 min ago:
I can see how you'd come to that conclusion. Each prompt is
supposed to illustrate a different type of test criteria. The
ultimate goal of Worm Sign is intended to test a near 100%
retention of the original weathered/dented sign.
If you look at the ones that passed (Flux.2 Pro, Gemini 2.5 Flash,
Reve), you'll see that they did not add/subtract/move any of the
pockmarks from the original image.
KeplerBoy wrote 1 day ago:
"Remove all the trash from the street and sidewalk. Replace the
sleeping person on the ground with a green street bench. Change the
parking meter into a planted tree."
What a prompt and image.
__alexs wrote 1 day ago:
Looking forward to the first AR glasses to include live editing of
the world like this.
nisegami wrote 1 day ago:
How long until this shows up in a YC batch?
imdsm wrote 1 day ago:
A way it could be...
walrus01 wrote 1 day ago:
I've already seen images uploaded to the MLS by real estate agents that look like this. It's the same concept as what they've been doing, generally, to bait people into coming and touring houses.
llmthrow0827 wrote 1 day ago:
It failed my benchmark of a photo of a person touching their elbows
together.
lobochrome wrote 1 day ago:
Stupid Cisco Umbrella is blocking you
mvkel wrote 1 day ago:
This leaderboard feels incredibly accurate given my own experience.
heystefan wrote 1 day ago:
So when you say "X attempts" what does that mean? You just start a
new chat with the same exact prompt and hope for a different result?
vunderba wrote 1 day ago:
All images are generated using independent, separate API calls. See
the FAQ at the bottom under “Why is the number of attempts
seemingly arbitrary?” and “How are the prompts written?” for
more detail, but to quickly summarize:
In addition to giving models multiple attempts to generate an
image, we also write several variations of each prompt. This helps
prevent models from getting stuck on particular keywords or
phrases, which can happen depending on their training data. For
example, while “hippity hop” is a relatively common name for
the ball-riding toy, it’s also known as a “space hopper.” In
some cases, we may even elaborate and provide the model with a
dictionary-style definition of more esoteric terms.
This is why providing an “X Attempts” metric is so important.
It serves as a rough measure of how “steerable” a given model
is - or put another way how much we had to fight with the model in
order for it to consistently follow the prompt’s directives.
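The procedure described above could be sketched like this (a hypothetical harness; `generate` and `judge` stand in for the API call and the pass/fail check):

```python
def attempts_until_pass(generate, judge, prompt_variants, max_attempts=20):
    """Issue one independent generation call per attempt, rotating through
    prompt phrasings (e.g. "hippity hop" vs. "space hopper"), and return
    the attempt number of the first pass, or None if the model never passes."""
    for attempt in range(1, max_attempts + 1):
        prompt = prompt_variants[(attempt - 1) % len(prompt_variants)]
        image = generate(prompt)  # fresh, independent API call each time
        if judge(image):
            return attempt  # this is the "X attempts" metric
    return None
```

A low return value means the model was easy to steer; a high one (or None) means a lot of fighting with keywords was needed.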
singhkays wrote 1 day ago:
GPT Image 1.5 is the first model that gets close to replicating the intricate mosaic of bullets in the "Lord of War" movie poster for me. It also seems to follow prompt instructions more closely than Nano Banana Pro.
I edited the original "Lord of War" poster with a reference image of Jensen and replaced the bullets with GPU dies, silicon wafers, and electronic components.
[1]: https://x.com/singhkays/status/2001080165435113791
smusamashah wrote 1 day ago:
Z-image was released recently and that's what /r/StableDiffusion all
talks about these days. Consider adding that too. It is very good
quality for its size (Requires only 6 or 8 gigs of ram).
vunderba wrote 1 day ago:
I've actually done a bit of preliminary testing with ZiT. I'm
holding off on adding it to the official GenAI site until the base
and edit models have been released since the Turbo model is pretty
heavily distilled.
[1]: https://mordenstar.com/other/z-image-turbo
pierrec wrote 1 day ago:
This showdown benchmark was and still is great, but an enormous grain
of salt should be added to any model that was released after the
showdown benchmark itself.
Maybe everyone has a different dose of skepticism. Personally I'm not
even looking at results for models that were released after the
benchmark, for all this tells us, they might as well be one-trick
ponies that only do well in the benchmark.
It might be too much work, but one possible "correct" approach for this kind of benchmark would be to periodically release new benchmarks with new tests (that are broadly in the same categories) and only include models that predate each benchmark.
somenameforme wrote 1 day ago:
You don't need skepticism, because even if you're acting in 100%
good faith and building a new model, what's the first thing you're
going to do? You're going to go look up as many benchmarks as you
can find and see how it does on them. It gives you some easy
feedback relative to your peers. The fact that your own model may
end up being put up against these exact tests is just icing.
So I don't think there's even a question of whether or not newer
models are going to be maximizing for benchmarks - they 100% are.
The skepticism would be in how it's done. If something's not being
run locally, then there's an endless array of ways to cheat - like
dynamically loading certain LoRAs in response to certain queries,
with some LoRAs trained precisely to maximize benchmark
performance. Basically taking a page out of the car company
playbook in response to emissions testing.
But I think maximizing the general model itself to perform well on
benchmarks isn't really unethical or cheating at all. All you're
really doing there is 'outsourcing' part of your quality control
tests. But it simultaneously greatly devalues any benchmark,
because that benchmark is now the goal.
smusamashah wrote 1 day ago:
I think training image models to pass these very specific tests
correctly will be very difficult for any of these companies. How
would they even do that?
8n4vidtmkvmk wrote 1 day ago:
Hire a professional Photoshop artist to manually create the
"correct" images and then put the before and after photos into
the training data. Or however they've been training these models
thus far, i don't know.
And if that still doesn't get you there, hash the image inputs to detect if it's one of these test photos and then run your special test-passer algo.
smusamashah wrote 3 hours 23 min ago:
I don't think a few images done by any professional will have a measurable impact on training.
vunderba wrote 1 day ago:
Yeah that’s a classic problem, and it's why good tests are such
closely guarded secrets: to keep them from becoming training fodder
for the next generation of models. Regarding the "model date" vs
"benchmark date" - that's an interesting point... I'll definitely
look into it!
I don't have any captcha systems in place, but I wonder if it might
be worth putting up at least a few nominal roadblocks (such as
Anubis [1]) to at least slow down the scrapers.
A few weeks ago I actually added some new, more challenging tests
to the GenAI Text-to-Image section of the site (the “angelic
forge” and “overcrowded flat earth”) just to keep pace with
the latest SOTA models.
In the next few weeks, I'll be adding some new benchmarks to the Image Editing section as well.
[1]: https://anubis.techaro.lol
echelon wrote 1 day ago:
The Blender previz reskin task [1] could be automated! New test
cases could be randomly and procedurally generated (without AI).
Generate a novel previz scene programmatically in Blender or some 3D engine, then task the image model with rendering it in a style (or to style-transfer to a given image, e.g. something novel and unseen from Midjourney). Another test would be to replace stand-in mannequins with the identities of characters in reference images and make sure the poses and set blocking match.
Throw in a 250 object asset pack and some skeletal meshes that
can conform to novel poses, and you've got a fairly robust test
framework.
Furthermore, anything that succeeds from the previz rendering
task can then be fed into another company's model and given a
normal editing task, making it doubly useful for two entirely
separate benchmarks. That is, successful previz generations can
be reused as image edit test cases - and you a priori know the
subject matter without needing to label a bunch of images or run
a VLM, so you can create a large set of unseen tests.
[1]: https://imgur.com/gallery/previz-to-image-gpt-image-1-x8...
irishcoffee wrote 1 day ago:
> the only model that legitimately passed the Giraffe prompt.
10 years ago I would have considered that sentence satire. Now it
allegedly means something.
Somehow it feels like we’re moving backwards.
echelon wrote 1 day ago:
> Somehow it feels like we’re moving backwards.
I don't understand why everyone isn't in awe of this. This is
legitimately magical technology.
We've had 60+ years of being able to express our ideas with
keyboards. Steve Jobs' "bicycle of the mind". But in all this time
we've had a really tough time of visually expressing ourselves.
Only highly trained people can use Blender, Photoshop, Illustrator,
etc. whereas almost everyone on earth can use a keyboard.
Now we're turning the tide and letting everyone visually articulate
themselves. This genuinely feels like computing all over again for
the first time. I'm so unbelievably happy. And it only gets better
from here.
Every human should have the ability to visually articulate
themselves. And it's finally happening. This is a major win for the
world.
I'm not the biggest fan of LLMs, but image and video models are a
creator's dream come true.
In the near future, the exact visions in our head will be
shareable. We'll be able to iterate on concepts visually,
collaboratively. And that's going to be magical.
We're going to look back at pre-AI times as primitive. How did
people ever express themselves?
conradfr wrote 2 hours 39 min ago:
It is amazing and impressive. But also an unlimited source of
trash and slop during my internet use.
concats wrote 1 day ago:
“I've come up with a set of rules that describe our reactions
to technologies:
1. Anything that is in the world when you’re born is normal and
ordinary and is just a natural part of the way the world works.
2. Anything that's invented between when you’re fifteen and
thirty-five is new and exciting and revolutionary and you can
probably get a career in it.
3. Anything invented after you're thirty-five is against the
natural order of things.”
― Douglas Adams
vintermann wrote 1 day ago:
Is that how it works this time, though?
* I'm into genealogy. Naturally, most of my fellow genealogists
are retired, often many years ago, though probably also above
average in mental acuity and tech-savviness for their age. They
LOVE generative AI.
* My nieces, and my cousin's kids of the same age, are deeply
into visual art. Especially animation, and cutesy Pokemon-like
stuff. They take it very seriously. They absolutely DON'T like
AI art.
SchemaLoad wrote 1 day ago:
I'm struggling to see the benefits. All I see people using this
for is generating slop for work presentations, and misleading
people on social media. Misleading might be understating it too.
It's being used to create straight up propaganda and destruction
of the sense of reality.
irishcoffee wrote 1 day ago:
You basically described magic mushrooms, where the description
came from you while high on magic mushrooms.
It’s just a tool. It’s not a world-changing tech. It’s a
tool.
Rodeoclash wrote 1 day ago:
Where is all this wonderful visual self expression that people
are now free to do? As far as I can tell it's mostly being used
on LinkedIn posts.
scrollaway wrote 1 day ago:
It’s a classic issue that you give access to superpowers to
the general population and most will use them in the most
boring ways.
The internet is an amazing technology, yet its biggest
consumption is a mix of ads, porn and brain rot.
We all have cameras in our pockets yet most people use them for
selfies.
But if you look closely enough, the incredible value that comes
from these examples more than makes up for all the people using
them in a “boring” way.
And anyway who’s the arbiter of boring?
BoredPositron wrote 1 day ago:
Nano Banana still has the best VAE we have seen, especially if you are doing high-res production work. Flux.2 comes close, but GPT Image is still miles away.
echelon wrote 1 day ago:
I really love everything you're doing!
Personal request: could you also advocate for "image previz
rendering", which I feel is an extremely compelling use case for
these companies to develop. Basically any 2d/3d compositor that
allows you to visually block out a scene, then rely on the model to
precisely position the set, set pieces, and character poses.
If we got this task onto benchmarks, the companies would absolutely
start training their models to perform well at it.
Here are some examples:
gpt-image-1 absolutely excels at this, though you don't have much
control over the style and aesthetic: [1] Nano Banana (Pro) fails at
this task: [2] Flux Kontext, Qwen, etc. have mixed results.
I'm going to re-run these under gpt-image-1.5 and report back.
Edit:
gpt-image-1.5 : [3] And just as I finish this, Imgur deletes my
original gpt-image-1 post.
Old link (broken): [4] Hopefully imgur doesn't break these. I'll have
to start blogging and keep these somewhere I control.
[1]: https://imgur.com/gallery/previz-to-image-gpt-image-1-x8t1ij...
[2]: https://imgur.com/a/previz-to-image-nano-banana-pro-Q2B8psd
[3]: https://imgur.com/a/previz-to-image-gpt-image-1-5-3fq042U
[4]: https://imgur.com/a/previz-to-image-gpt-image-1-Jq5M2Mh
vunderba wrote 1 day ago:
Thanks! A highly configurable Previz2Image model would be a
fantastic addition. I was literally just thinking about this the
other day (but more in the context of ControlNets and posable
kinematic models). I’m even considering adding an early CG Poser
blocked‑out scene test to see how far the various editor models
can take it.
With additions like structured prompts (introduced in BFL Flux 2),
maybe we'll see something like this in the near future.
ares623 wrote 1 day ago:
My copium is that analog photography makes a come back as a way to
recover some level of trust and authenticity.
famahar wrote 1 day ago:
I was reading a trend report on art and it seems like collage,
squiggly hand drawn text, and lots of intentional imperfections are
becoming popular. I'm not sure how hard it is for AI to recreate
those, but it is nice to see people trying to do more of what AI
struggles with.
Forgeties79 wrote 1 day ago:
Good luck getting it developed unfortunately. I have to ship it off
now, there isn’t a single local spot in my city that will develop
anymore
ares623 wrote 1 day ago:
When the demand is back, the labs should start coming back. There's
a few in my relatively small city which is pretty surprising. But
the costs are still too high to cover the low volume I guess.
Forgeties79 wrote 1 day ago:
The big issue is chemical disposal IIRC (which yes is a cost just
being more specific)
celeryd wrote 1 day ago:
If it can't generate non-sexual content of a woman in a bikini, I am
not interested.
brador wrote 1 day ago:
Every person in every picture in their examples is white except for 1
Asian dude. Like a 46:1 ratio for the page (I counted). Not one Middle
Eastern or Black or Jewish or Indian or South American person.
Not even one. And no one on the team said anything?
Come on Sam, do better.
agentifysh wrote 1 day ago:
I am very impressed. A benchmark I like to run is having it create sprite maps and UV texture maps for an imagined 3D model.
Noticed it captured a Mega Man Legends vibe... [1] and here it generated a texture map from a 3D character [2]. However, I'm not sure if these are true UV maps that are accurate, as I don't have the 3D models themselves.
But I've tried this in Nano Banana when it first came out and it couldn't do it.
[1]: https://x.com/AgentifySH/status/2001037332770615302
[2]: https://x.com/AgentifySH/status/2001038516067672390/photo/1
101008 wrote 1 day ago:
> however im not sure if these are true uv maps that is accurate as
i dont have the 3d models itself
also in the tweet
> GPT Image 1.5 is **ing crazy
and
> holy shit lol
What's impressive about it if you don't know whether it's right or not? (As the other comment pointed out, it is not right.)
gs17 wrote 1 day ago:
> however im not sure if these are true uv maps
I can tell you with 100% certainty they are not. For example, Crash
doesn't have a backside for his torso. You could definitely make a
model that uses these as textures, but you'd really have to force it
and a lot of it would be stretched or look weird. If you want to go
this approach, it would make a lot more sense to make a model, unwrap
it, and use the wireframe UV map as input.
Here's the original Crash model: [1] , its actual texture is nothing
like the generated one, because the real one was designed for
efficiency.
[1]: https://models.spriters-resource.com/pc_computer/crashbandic...
Nition wrote 1 day ago:
That's a remake model in a modern game. The original Crash was even
simpler than that one.
Most of Crash in the first game was not textured; just vertex
colours. Only the fur on his back and his shoelaces were textures
at all.
gs17 wrote 1 day ago:
"Original" as in the original of the one they used in their
tweet.
agentifysh wrote 1 day ago:
Yeah, definitely impressive compared to what Nano Banana outputted.
Tried your suggested approach with an unwrapped wireframe UV as input, and I'm impressed. [1] Obviously it's not going to be accurate 1:1, but with more 3D spatial awareness I think it could definitely improve.
[1]: https://x.com/AgentifySH/status/2001057153235222867
gs17 wrote 1 day ago:
> Still some scientific inaccuracies, but ~70% correct
That's still dangerously bad for the use-case they're proposing. We
don't need better looking but completely wrong infographics.
rcarmo wrote 1 day ago:
We don’t, but most Marketing departments salivate for them.
astrange wrote 1 day ago:
It's pretty common for infographics to be wrong. The people making
them aren't the same people who know the facts.
I'd especially say like 100% of amateur political infographics/memes
are wrong. ("climate change is caused by 100 companies" for instance)
anonfunction wrote 1 day ago:
The announcement said the API works with the new model, so I updated
my Golang SDK grail ( [1] ) to use it, but it returns a 500 server
error when you try. And if you request a completely unknown model
instead, the error response shows the new model isn't among the
supported values:
    POST "https://api.openai.com/v1/responses": 500 Internal Server Error
    {
      "message": "An error occurred while processing your request. You can retry your request, or contact us through our help center at help.openai.com if the error persists. Please include the request ID req_******************* in your message.",
      "type": "server_error",
      "param": null,
      "code": "server_error"
    }

    POST "https://api.openai.com/v1/responses": 400 Bad Request
    {
      "message": "Invalid value: 'blah'. Supported values are: 'gpt-image-1' and 'gpt-image-1-mini'.",
      "type": "invalid_request_error",
      "param": "tools[0].model",
      "code": "invalid_value"
    }
[1]: https://github.com/montanaflynn/grail
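Since the 500 response explicitly says "You can retry your request," the usual client-side mitigation during a staggered rollout is retry with exponential backoff, while treating 4xx errors (like the invalid-model 400 above) as permanent. A minimal sketch in Python; the `APIError` type and function names are illustrative, not part of any real SDK:

```python
import random
import time

class APIError(Exception):
    """Hypothetical error type carrying the HTTP status code."""
    def __init__(self, status, message=""):
        super().__init__(message)
        self.status = status

def retry_with_backoff(call, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Retry `call` on 5xx errors with exponential backoff.

    4xx errors are permanent (bad request, unknown model) and are
    re-raised immediately; only server-side failures are retried.
    """
    for attempt in range(attempts):
        try:
            return call()
        except APIError as exc:
            if exc.status < 500 or attempt == attempts - 1:
                raise  # not transient, or out of retries
            # backoff: 0.5s, 1s, 2s, ... plus jitter to avoid thundering herd
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

The injectable `sleep` parameter keeps the backoff testable; in production you'd leave it at the default.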
aziis98 wrote 1 day ago:
I know this is a bit out of scope for these image editing models but I
always try this experiment [1] of drawing a "random" triangle and then
doing some geometric construction and they mess up in very funny ways.
These models can't "see" very well. I think [2] is still very relevant.
[1]: https://chatgpt.com/share/6941c96c-c160-8005-bea6-c809e58591c1
[2]: https://vlmsareblind.github.io/
zkmon wrote 1 day ago:
AI-generated images will erode all trust in and admiration for human
talent in art, similar to how text generation erodes trust in and
admiration for human talent in writing. Same for coding.
So, let's simulate that future. Since no one trusts your talent in
coding, art, or writing, you wouldn't care to do any of these. But the
economy is built on products and services whose value is based on how
much human talent and effort is required to produce them. So the value
of those products and services goes down as demand and trust go down.
No one knows or cares who is a good programmer on the team, who is a
great thinker and writer, and who is a modern Picasso.
So the motivation disappears for humans. There are no achievements to
target, no way to impress others with your talent. This should lead to
a uniform workforce without much difference in talent. Pretty much a
robot army.
arnz-arnz wrote 1 day ago:
All I can hope for is that a new industry, or a reliable ecosystem of
vetters of real human talent, will emerge. Are you really as good a
writer as you claim to be? Show us the badge. That, or AI firms have
to be forced to watermark all their creative outputs, and anyone
misleading the public/audience should be punishable by law.
zkmon wrote 1 day ago:
Both are just midsummer dreams. There is no global law to enforce
watermarks. There are no badges that can't be forged.
arnz-arnz wrote 1 day ago:
There isn't, but that doesn't mean there won't be. It could even go
as far as banning certain features. There just isn't much hope with
the kind of politics we have right now.
gostsamo wrote 1 day ago:
Alt text is one of the nicest uses for AI, and still OpenAI didn't
bother using it for something so basic. The dogfooding is not strong
with their marketing team.
KaiserPro wrote 1 day ago:
Is there watermarking, or some other way for normal people to tell
if it's fake?
qingcharles wrote 1 day ago:
Not if you strip the EXIF data. Also, it will strip the star
watermark and SynthID from Gemini if you paste a Nano Banana pic in
and tell it to mirror it.
wavemode wrote 1 day ago:
I think society is going to need the opposite - cameras that can
embed cryptographic information in the pixels of a video indicating
the image is real.
laurent123456 wrote 1 day ago:
There are ways to tell if an image is real, if it's been signed
cryptographically by the camera for example, but increasingly it
probably won't be possible to tell if something is fake. Even if
there's some kind of hidden watermark embedded in the pixels, you can
process it with img2img in another tool and get rid of the watermark.
Exif data, etc is irrelevant, you can get rid of it easily or fake
it.
ewoodrich wrote 1 day ago:
Sure, you can always remove it, but an average person posting AI
images on Facebook or whatever probably won't bother. I was
skeptical of Google's SynthID when I first heard about it, but I've
been seeing it used to identify suspected AI images on Reddit
recently (the example I saw today was cropped and lightly edited
with a filter but still got flagged correctly), and it's cool to
have a hard data point when present. It won't help against
bad/manipulative actors, but it's a decent mitigation for the
low-effort slop scenario, since it survives the kind of basic
editing a regular person knows how to do on their phone, plus the
typical compression applied when uploading/serving.
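SynthID's actual scheme is proprietary, and surviving crops and filters is exactly what makes it notable. For contrast, here's a naive least-significant-bit watermark (pure Python, purely illustrative) that a single trivial edit destroys:

```python
def embed_lsb(pixels, bits):
    """Hide `bits` (list of 0/1) in the least significant bit of each
    channel value. `pixels` is a flat list of 0-255 ints."""
    out = list(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit  # clear LSB, then set it to the bit
    return out

def extract_lsb(pixels, n):
    """Read back the first `n` hidden bits."""
    return [p & 1 for p in pixels[:n]]
```

The round trip works on untouched pixels, but any lossy step (re-encode, resize, brightness tweak) rewrites low-order bits and the payload is gone. Robust schemes like SynthID instead spread a statistical signal across many pixels so it survives those transforms.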
mnorris wrote 1 day ago:
I ran exiftool on an image I just generated:
    $ exiftool chatgpt_image.png
    ...
    Actions Software Agent Name   : GPT-4o
    Actions Digital Source Type   : [1]
    Name                          : jumbf manifest
    Alg                           : sha256
    Hash                          : (Binary data 32 bytes, use -b option to extract)
    Pad                           : (Binary data 8 bytes, use -b option to extract)
    Claim Generator Info Name     : ChatGPT
    ...
[1]: http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgori...
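The C2PA manifest exiftool is surfacing there ("jumbf manifest") is embedded in the file container itself; in a PNG, metadata like this sits in its own chunks alongside the image data. A minimal chunk walker (stdlib Python, no CRC validation, just enough to see which chunks a file carries and therefore whether any metadata chunk is present at all):

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_chunks(data):
    """Yield (chunk_type, payload) pairs from raw PNG bytes.

    Each chunk is: 4-byte big-endian length, 4-byte ASCII type,
    payload, 4-byte CRC (skipped here).
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG")
    pos = len(PNG_SIGNATURE)
    while pos < len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8].decode("ascii")
        payload = data[pos + 8:pos + 8 + length]
        yield ctype, payload
        pos += 12 + length  # 4 length + 4 type + payload + 4 CRC
```

Stripping metadata on upload, as another comment in this thread suspects happened on the announcement page, simply means these extra chunks get dropped and only the pixel data survives.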
KaiserPro wrote 1 day ago:
Exif isn't all that robust though.
I suppose I'm going to have to bite the bullet and actually train
an AI detector that works roughly in real time.
mmh0000 wrote 1 day ago:
I know OpenAI watermarks their stuff. But I wish they wouldn't. It's
a "false" trust.
Now it means whoever has access to uncensored/non-watermarking models
can pass off their faked images as real and claim, "Look! There's no
watermark, of course, it's not fake!"
Whereas, if none of the image models did watermarking, then people
(should) inherently know nothing can be trusted by default.
pbmonster wrote 1 day ago:
Yeah, I'd go the other way. Camera manufacturers should have the
camera cryptographically sign the data from the sensor directly in
hardware, and then provide an API to query if a signed image was
taken on one of their cameras.
Add an anonymizing scheme (blind signatures or group signatures),
done.
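The hard part of that proposal is key management, but the core mechanics are simple. A toy sketch using a symmetric HMAC (real hardware would use an asymmetric signature from a secure element so that verifiers never hold the secret; this is illustrative only, and the function names are made up):

```python
import hashlib
import hmac

def sign_capture(sensor_bytes, device_key):
    """Camera side: tag the raw sensor data with the device's key."""
    return hmac.new(device_key, sensor_bytes, hashlib.sha256).hexdigest()

def verify_capture(sensor_bytes, tag, device_key):
    """Manufacturer-API side: check the bytes are unmodified sensor output."""
    expected = hmac.new(device_key, sensor_bytes, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)
```

Any edit, even a one-pixel change, invalidates the tag. That is also the scheme's weakness: legitimate processing (crop, denoise, tone-map) breaks verification too, which is why proposals like C2PA sign a chain of edits rather than only the raw pixels.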
PhilippGille wrote 1 day ago:
[1] It doesn't mention the new model, but it's likely the same or
similar.
[1]: https://help.openai.com/en/articles/8912793-c2pa-in-chatgpt-...
adrian17 wrote 1 day ago:
I just checked several of the files uploaded to the news post, the
"previous" and "new", both the png and webp (&fm=webp in url)
versions - none had the content metadata. So either the internal
version they used to generate them skipped them, or they just
stripped the metadata when uploading.
dzonga wrote 1 day ago:
We seriously can't be burning gigawatts of energy just to have sama in
a GPT-shirt ad generated by AI.
Impressive stuff though, as you can give it a base image + prompt.
astrange wrote 1 day ago:
It's a joke about one of his old fits.
[1]: https://x.com/coldhealing/status/1747270233306644560
drawnwren wrote 1 day ago:
counterpoint: we should make energy abundant enough that it really
doesn't matter if sama wants to generate gpt-shirt ads or not.
we have the capability, we just stopped making power more abundant.
iknowstuff wrote 1 day ago:
I think we can say the pause we took was reasonable once we
realized the environmental impact of dumping greenhouse gases into
the atmosphere. But if we can now ensure further growth won't do
that, let's make sure we restart, just clean this time.
sfmike wrote 1 day ago:
I hope to see more "red alert" status from the AI wars, putting
companies into all-hands-on-deck mode. This only helps token costs and
efficacy. As always, competition helps end users.
surrTurr wrote 1 day ago:
not super impressed. feels like 70% as good as nano banana pro.
oxag3n wrote 1 day ago:
If this were a farm of sweatshop Photoshoppers in 2010 who downloaded
all images from the internet and provided a service of combining them
on request, this would escalate pretty quickly.
Question: with copyright and authorship dead w.r.t. AI, how do I make
(at least) new content protected?
Anecdote: I had a hobby of doing photos in a quite rare style, and
lived in a place you'd find quite a few pictures of. When I asked GPT
to generate a picture of that area in that style, it returned a highly
modified but recognizable copy of a photo I'd published years ago.
pfortuny wrote 1 day ago:
I guess some kind of hard (repetitive) steganography, where the
private-key signature of the original photo is somehow encoded many
times over; also watermarking everything and asking readers for some
kind of verification if they want a non-watermarked copy.
There seems to be no other way (apart from air-gapping everything, as
others say).
999900000999 wrote 1 day ago:
A middle ground would be ChatGPT at least providing attribution.
Back in reality, you can get in line to sue. Since they have more
money than you, you can't really win, though.
So it goes.
ur-whale wrote 1 day ago:
> Question: with copyright and authorship dead wrt AI, how do I make
(at least) new content protected?
Question: Now that the steamboats have been invented, how do I keep
my clipper business afloat ?
Answer: Good riddance to the broken idea of IP, Schumpeter's Gale is
around the corner, time for a new business model.
LudwigNagasena wrote 1 day ago:
Using references is a standard industry practice for digital art and
VFX. The main difference is that you are unable to accidentally copy
a reference too close, while with AI it’s possible.
mortenjorck wrote 1 day ago:
> how do I make (at least) new content protected?
Air gap. If you don’t want content to be used without your
permission, it never leaves your computer. This is the only
protection that works.
If you want others to see your content, however, you have to accept
some degree of trade off with it being misappropriated. Blatant cases
can be addressed the same as they always were, but a model
overfitting to your original work poses an interesting question for
which I’m not aware of any legal precedents having been set yet.
echelon wrote 1 day ago:
Horror scenario:
Big IP holders will go nuclear on IP licensing to an extent we've
never seen before.
Right now, there are thousands of images and videos of Star Wars,
Pokemon, Superman, Sonic, etc. being posted across social media.
All it takes is for the biggest IP conglomerates to turn into
linear tv and sports networks of the past and treat social media
like cable.
Disney: "Gee {Google,Meta,Reddit,TikTok}, we see you have a lot of
Star Wars and Marvel content. We think that's a violation of our
rights. If you want your users to continue to be able to post our
media, you need to pay us $5B/yr."
I would not be surprised if this happens now that every user on the
internet can soon create high-fidelity content.
This could be a new $20-30B/yr business for Disney. Nintendo, WBD,
and lots of other giant IP holders could easily follow suit.
empressplay wrote 1 day ago:
Disney invests $1 billion in OpenAI, licenses 200 characters for
AI video app Sora
[1]: https://arstechnica.com/ai/2025/12/disney-invests-1-bill...
echelon wrote 1 day ago:
One day later, "Google pulls AI-generated videos of Disney
characters from YouTube in response to cease and desist": [1]
The next step is to take this beyond AI generations and to
license rights to characters and IP on social media directly.
The next salvo will be where YouTube has to take down all major
IP-related content if they don't pay a licensing fee.
Regardless of how it was created. Movie reviews, fan
animations, video game let's plays.
I've got a strong feeling that day is coming soon.
[1]: https://www.engadget.com/ai/google-pulls-ai-generated-...
margorczynski wrote 1 day ago:
We are probably entering the post-copyright era. The law will follow
sooner or later.
oblio wrote 4 hours 9 min ago:
Yup, just like the post-copyright era followed the dawn of the
internet and the emergence of Napster.
rafram wrote 1 day ago:
That seems unlikely to me. One side is made up of lots and lots of
entrenched interests with sympathetic figures like authors and
artists on their side, and the other is “big tech,” dominated
by the rather unsympathetic OpenAI and Google.
realharo wrote 1 day ago:
The other side however has the "if you restrict us, China will
win" argument on their side.
panopticon wrote 16 hours 34 min ago:
That argument is easy to politicize and selectively ignore.
See: renewables and EVs.
nobody_r_knows wrote 1 day ago:
My question to your anecdote: who cares? Not being fecicious, but
who cares if someone reproduces your stuff and millions of people see
it? Is it the money you want? Is it the fame? Because fame you will
get, maybe not money... but couldn't there be another way?
whywhywhywhy wrote 14 hours 27 min ago:
The people building the tech are extremely fussy about their work
being cited and extremely protective of their model files, so they
themselves have massive issues with their work being used or
replicated non-consensually.
oxag3n wrote 1 day ago:
To clarify my question: I do not want anything I create to be fed
into their training data. That photo is just an example that I
caught, and it became personal. But in general, I no longer want to
open-source my code, write articles, or put any effort into improving
their training data set.
Forgeties79 wrote 1 day ago:
As a professional cinematographer/photographer I am incredibly
uncomfortable with people using my art without my permission for
unknown ends. Doubly so when it’s venture backed private
companies stealing from millions of people like me as they make
vague promises about the capabilities of their software trained on
my work. It doesn’t take much to understand why that makes me
uncomfortable and why I feel I am entitled to saying “no.”
Legally I am entitled to that in so many cases, yet for some reason
Altman et al get to skip that hurdle. Why?
How do you feel about entities taking your face off of your
personal website and plastering it on billboards smiling happily
next to their product? What if it’s for a gun? Or condoms? Or a
candidate for a party you don’t support? Pick your own example if
none of those bother you. I’m sure there are things you do not
want to be associated with/don’t want to contribute to.
At the end of the day it’s very gross when we are exploited
without our knowledge or permission so rich groups can get richer.
I don’t care if my visual work is only partially contributing to
some mashed up final image. I don’t want to be a part of it.
vintermann wrote 1 day ago:
> How do you feel about entities taking your face off of your
personal website and plastering it on billboards smiling happily
next to their product?
That would be misrepresentation. Even Stallman isn't OK with
that. You can take one of his opinion pieces and publish it as
your own. Or you can attach his name to it.
However, if you're editing it and releasing it under his name,
clearly you're simply lying, and nobody is OK with that. People
have the right to be recognized as authors of things they did
author (if they so desire) and they have a right to NOT be
associated with things they didn't.
> At the end of the day it’s very gross when we are exploited
without our knowledge or permission so rich groups can get
richer.
The second part is the entirety of the problem. If I'm
"exploited" in a way where I can't even notice it, and I'm not
worse off for it, how is it even exploitation? But people
amassing great power is a problem no matter if they do it with
"legitimate" means or not.
Forgeties79 wrote 1 day ago:
If somebody is stealing from your bank account every week and
you just don't notice it, are you not being stolen from? Has
nobody ever stolen your credit card and used it until the moment you
noticed the charges? I don't really think we can go "if a
tree falls in the forest and nobody is around to hear it…"
about this.
Stallman has his opinions on software; I have my opinions on my
visual work. I don't really get how that applies here or why
it settles this matter.
vintermann wrote 1 day ago:
If someone steals from my bank account I certainly CAN notice
it even if I don't immediately, and I'm certainly worse off.
That's such a bad straw man I wonder if you're really
supporting the position you claim to be supporting. Maybe
you're just trying to give it a bad name.
Your opinion isn't on visual work, but visual property. You
don't demand to be paid for your work - your labor. Rather
you traded that for the dream of being paid rent on a capital
object, in perpetuity (or close enough). Artists lost to the
power-mongers when we bit at that bait.
Forgeties79 wrote 20 hours 40 min ago:
If you think that’s a bad example so be it but I’m not
attempting to make a strawman or give anything a bad name.
I don’t really know where all the hostility came from in
this conversation but I think it’s best if we move on.
smileson2 wrote 1 day ago:
You should be proud your work will now be distilled eternally, and
an aspect of it will forever influence the world
Forgeties79 wrote 20 hours 40 min ago:
I’m not
CamperBob2 wrote 1 day ago:
The day after I first heard about the Internet, back in
1990-whatever, it occurred to me that I probably shouldn't upload
anything to the Internet that I didn't want to see on the front
page of tomorrow's newspaper.
Apart from the 'newspaper' anachronism, that's pretty much still
my take.
Sorry, but you'll just have to deal with it and get over it.
Forgeties79 wrote 1 day ago:
> Sorry, but you'll just have to deal with it and get over it.
You were fine until this bit.
onraglanroad wrote 1 day ago:
They're still fine because they're right.
You got to play the copyright game when the big corps were on
your side.
Now they're on the other side. Deal with it and get over it.
Forgeties79 wrote 1 day ago:
You are not entitled to my art. Comparing that to copyright
abuse by large corporations is ridiculous.
CamperBob2 wrote 20 hours 23 min ago:
I get access to inspiration from everybody's art, and so
do you. Seems like a good deal to me.
Meanwhile, the next generation of great artists is
already at work down the street from you. Some kids
you've never heard of, playing around in a basement or
garage you've probably driven past a hundred times.
They're learning to make the most of the tools at hand,
just like the old masters did. Except the tools at hand
this time are little short of godlike.
It's an exciting time. If you wanted things to stay the
same, you shouldn't have gone into technology or art.
Forgeties79 wrote 14 hours 38 min ago:
Inspiring artists =/= involuntarily training privately
owned LLM’s that charge for access.
If you want me to hand some of my work over to artists
so they can learn and grow and experiment, send them my
way. Happy to help.
CamperBob2 wrote 13 hours 40 min ago:
> Inspiring artists =/= involuntarily training privately
> owned LLM’s that charge for access.
Agreed there, which is why it's important to work for
open access to the results. The resulting regime
won't look much like present-day copyright law, but
if we do it right, it will be better for us all.
In other words, instead of insisting that "No one can
have this," or "Only a few can have this," which
(again) will not be options for works that you
release commercially, it's better IMHO to insist that
"Everyone can have this."
Forgeties79 wrote 49 min ago:
> In other words, instead of insisting that "No one
can have this," or "Only a few can have this,
Please show me where I ever said anything remotely
like that. You’re painting my stance as very all
or nothing, which is inaccurate.
You’re trying to make me into some caricature
that you can grind your axe against, when I’m
somebody who doesn’t even agree with modern
copyright law. I think we’re past the point of
productivity, so I’ll just leave it there. Have a
good one
illwrks wrote 1 day ago:
The issue is ownership, not promotion or visibility.
jibal wrote 1 day ago:
facetious
[I won't bother responding to the rest of your appalling comment]
swatcoder wrote 1 day ago:
People have values that go beyond wealth and fame. Some people care
about things like personal agency, respect and deference, etc.
If someone came home from vacation to learn that their neighbor had
allowed some friends to stay in the empty house, we would expect some
kind of outrage regardless of whether there had been specific damage
or wear to the home.
Culturally, people have deeply set ideas about what's theirs, and
feel they deserve some say over how their things are used and by
whom. Even those who are very generous and want their things to be
widely shared usually want some voice in making that come to be.
visarga wrote 1 day ago:
If I were a creative I would avoid seeing any work I am not
legally allowed to get inspired by, why install furniture into my
brain I can't sit on? I see this kind of IP protection as
poisoned grounds, can't do anything on top of it.
netule wrote 1 day ago:
Suddenly, copyright doesn't matter anymore when it's no longer
useful to the narrative.
CamperBob2 wrote 1 day ago:
(Shrug) This is more important. Sorry.
ragequittah wrote 1 day ago:
Copyright has overstepped its initial purpose by leaps and bounds
because corporations make the law. If you're not cynical about
how Copyright currently works you probably haven't been paying
attention. And it doesn't take much to go from cynical to
nihilist in this case.
netule wrote 1 day ago:
There's definitely a case of miscommunication at play if you
didn't read cynicism into my original post. I broadly agree
with you, but I'll leave it at that to prevent further
fruitless arguing about specifics.
BoorishBears wrote 1 day ago:
OpenAI does care about copyright, thankfully China does not: [1]
(to clarify, OpenAI stops refining the image if a classifier
detects your image as potentially violating certain copyrights.
Although the gulf in resolution is not caused by that.)
[1]: https://imgur.com/a/RKxYIyi
blurbleblurble wrote 1 day ago:
It's really weird to see "make images from memories that aren't real"
as a product pitch
impjohn wrote 1 day ago:
This is what struck me as well. I got weird undertones of 'Now you
don't even need to have real memories! Just fabricate them.' They
even prominently showcase edits of placing you with another person,
further deepening disingenuous or parasocial relationships
999900000999 wrote 1 day ago:
I can actually imagine actors selling the rights to make fake images
of them.
In late-stage capitalism you pay for fake photos with someone. You
have ChatGPT write about how you dated for a summer, and have it end
with them leaving for grad school to explain why you aren't together.
Eventually we'll all just pay to live in the Matrix. When your credit
card is declined you'll be logged out, to awaken in a shared studio
apartment. To eat your rations.
oblio wrote 3 hours 26 min ago:
> When your credit card is declined you'll be logged out, to awaken
in a shared studio apartment. To eat your rations.
You're funny. No, you'll awaken in a tent, next to your shopping
cart, under the bridge.
ares623 wrote 1 day ago:
I can see them getting paid, like residuals from TV re-runs.
But after a while it'll hit a saturation point. The novelty will wear
off since everyone has access to it. Who cares if you have a fake
photo with a celebrity if everyone knows it's fake?
nurettin wrote 1 day ago:
It would creep me out if the model produced origami animals for that
prompt.
kingstnap wrote 1 day ago:
It's strange to me too, but they must have done the market research
on what people do with image gen.
My own main use cases are entirely textual: programming, wikis, and
mathematics. I almost never use image generation for anything.
However, it's objectively extremely popular.
This has strong parallels for me to when Snapchat filters became
super popular. I know lots of people loved editing and filtering
pictures, but I always left everything on auto mode; in fact I'd turn
off a lot of the default beauty filters. It just never appealed to
me.
StarterPro wrote 1 day ago:
In the image they showed for the new model, the mechanic was checking
a dipstick... that was still in the vehicle.
I really hope everyone is starting to get disillusioned with OpenAI.
They're just charging you more and more, for what? Shitty images that
are easy to sniff out?
In that case, I have a startup for you to invest in. It's a
bridge-selling app.
czhu12 wrote 1 day ago:
Haven’t their prices stayed at $20/m for a while now?
wahnfrieden wrote 1 day ago:
They've published anticipated price increases over coming years.
Prices will rise dramatically and steadily to meet revenue targets.
cheema33 wrote 1 day ago:
AI doesn’t have much of a moat. People can and will easily
switch providers.
wahnfrieden wrote 1 day ago:
Sure but there are only a couple leading providers worth
considering for coding at least, and there will be
consolidation once investment pulls back. They may find a way
to collude on raising prices.
Where switching will be easier is with casual chat users plus
API consumers that are already using substandard models for
cost efficiency. But there will also always be a market for
state of art quality.
wahnfrieden wrote 19 hours 53 min ago:
Reinforced today:
As Gemini has gained competitiveness (higher confidence in
its output, better reputation), its prices have steadily
risen
0dayman wrote 1 day ago:
nah Nano Banana Pro is much better
alasano wrote 1 day ago:
It's still not available in the API despite them announcing the
availability.
They even linked to their Image Playground, where it's also not
available.
I updated my local playground to support it; for now I'm just
handling the 404 on the model gracefully.
[1]: https://github.com/alasano/gpt-image-1-playground
weird-eye-issue wrote 1 day ago:
My Enterprise account got an email 1.5 hours ago that it is available
in API but my other accounts haven't gotten any email yet
anonfunction wrote 1 day ago:
Yeah I just tried it and got a 500 server error with no details as to
why:
    POST "https://api.openai.com/v1/responses": 500 Internal Server Error
    {
      "message": "An error occurred while processing your request. You can retry your request, or contact us through our help center at help.openai.com if the error persists. Please include the request ID req_******************* in your message.",
      "type": "server_error",
      "param": null,
      "code": "server_error"
    }
Interestingly, if you request a nonexistent model, you get an error
listing the supported values:
    POST "https://api.openai.com/v1/responses": 400 Bad Request
    {
      "message": "Invalid value: 'blah'. Supported values are: 'gpt-image-1' and 'gpt-image-1-mini'.",
      "type": "invalid_request_error",
      "param": "tools[0].model",
      "code": "invalid_value"
    }
minimaxir wrote 1 day ago:
It's a staggered rollout but I am not seeing it on the backend
either.
joshstrange wrote 1 day ago:
> staggered rollout
It's too bad no OpenAI Engineers (or Marketers?) know that term
exists. /s
I do not understand why it's so hard for them to just tell the
truth. So many announcements "Available today for Plus/Pro/etc"
really means "Sometime this week at best, maybe multiple weeks".
I'm not asking for them to roll out faster, just communicate
better.
rvz wrote 1 day ago:
Another bunch of "startups" have been eliminated.
moralestapia wrote 1 day ago:
Among those, Photoshop.
koakuma-chan wrote 1 day ago:
I wish. Even Nano Banana Pro still sucks for even basic operations.
mohsen1 wrote 1 day ago:
Unlike Nano Banana, it allows generating photos of children. It's
always fun to ask AI to imagine the children of a couple, but it's
also kind of concerning that there might be terrible use cases.
BoorishBears wrote 1 day ago:
I haven't seen that, meanwhile gpt-image-1.5 still has zero-tolerance
policing copyright (even via the API) so it's pretty much useless in
production once exposed to consumers.
I'm honestly surprised they're still on this post-Sora 2: let the
consumer of the API determine their risk appetite. If a copyright
holder comes knocking, "the API did it" isn't going to be a defense
either way.
hexage1814 wrote 1 day ago:
If memory serves, Nano Banana allows generating/editing photos of
children. But anything that could be misinterpreted gets blocked,
even absolutely benign and innocent things (especially if you're
asking to modify a photo that you upload). So they allow it, but they
turn the guardrails up to a point that might not be useful in many
situations.
r053bud wrote 1 day ago:
I was able to generate photos of my imagined children via Nano Banana
catigula wrote 1 day ago:
Nano Banana Pro is so good that any other attempt feels 1-2 generations
behind.
Jonovono wrote 1 day ago:
Nano banana pro is almost as good as seedream 4.5!
BoorishBears wrote 1 day ago:
Seedream 4.5 is almost as good as Seedream 4!
(Realistically, Seedream 4 is the best at aesthetically pleasing
generation, Nano Banana Pro is the best at realism and editing, and
Seedream 4.5 is a very strong middleground between the two with
great pricing)
gpt-image-1.5 feels like OpenAI doing the bare minimum to keep
people from switching to Gemini every time they want an image.
pdevr wrote 1 day ago:
>Now remove the two men, just keep the dog, and put them in an OpenAI
livestream that looks like the attached image.
Where is the image given along with the prompt? Unless I missed it,
it would have been nice to show the attached image.
taytus wrote 1 day ago:
On top of the prompt. The page has a weird layout; I had to scroll up
to see it.
xnx wrote 1 day ago:
Great to have continued competition in the different model types.
What angle is there for second tier models? Could the future for OpenAI
be providing a cheaper option when you don't need the best? It seems
like that segment would also be dominated by the leading models.
I would imagine the future shakes out as: first class hosted models,
hosted uncensored models, local models.
sharkjacobs wrote 1 day ago:
Was it ever explained or understood why ChatGPT Images always has
(had?) that yellow cast?
onoesworkacct wrote 1 day ago:
There's definitely an analysis on the net somewhere, can't remember
the details though.
efilife wrote 1 day ago:
I'm guessing that it was intentional all along, as no other models
exhibit this behavior. It was so it could be instantly recognized as
ChatGPT
varjag wrote 1 day ago:
Not always, it started at a very specific point. Studio Ghibli craze
+ reinforcement learning on the likes.
weird-eye-issue wrote 1 day ago:
That's not how it works; the model doesn't just update in real time
based on likes, and besides, it was already yellow upon release
minimaxir wrote 1 day ago:
The Studio Ghibli craze started with the initial release of images
in ChatGPT, and the yellow filter already existed at that time. They
did not make changes to the model as a result of RL (until
potentially today, with a new model)
dvngnt_ wrote 1 day ago:
maybe their version of synth-id? it at least helps me spot gpt images
vs gemini's
KaiserPro wrote 1 day ago:
Meta's codec avatars all have a green cast because they spent
millions on the rig to capture whole bodies, and even more on rolling
it out to get loads of real data.
They forgot to calibrate the cameras, so everything had a green tint.
Meanwhile, all the other teams had a billion Macbeth charts lying
around just in case.
jiggawatts wrote 1 day ago:
Also, you'd be shocked at how few developers know anything at all
about sRGB (or any other gamut/encoding), other than perhaps the
name. Even people working in graphics, writing 3D game engines,
working on colorist or graphics artist tools and libraries.
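As a quick illustration of the kind of detail that gets glossed over: sRGB stores channel values through a nonlinear transfer function, so averaging pixels in encoded space (a very common bug) gives a different result than averaging in linear light. A minimal sketch of the sRGB transfer function, per the IEC 61966-2-1 definition:

```python
def srgb_to_linear(c: float) -> float:
    """Decode an sRGB-encoded channel value (0..1) to linear light."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(c: float) -> float:
    """Encode a linear-light channel value (0..1) back to sRGB."""
    if c <= 0.0031308:
        return c * 12.92
    return 1.055 * c ** (1 / 2.4) - 0.055

# Blending two pixels in encoded space vs. linear light:
a, b = 0.2, 0.8
naive = (a + b) / 2  # average of the encoded values: 0.5
correct = linear_to_srgb((srgb_to_linear(a) + srgb_to_linear(b)) / 2)
# The linear-light average re-encodes to roughly 0.60, visibly
# brighter than the naive 0.5.
```

The gap between the two results is exactly the kind of thing that quietly shifts colors when a pipeline mixes up which space it's operating in.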
viraptor wrote 1 day ago:
My pet theory is that this is the "Mexico filter" from movies leaking
through the training data.
vunderba wrote 1 day ago:
I never heard anything concrete offered. At least it's relatively
easy to work around with tone mapping / LUTs.
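The simplest version of such a workaround is a gray-world white balance: scale each channel so its mean matches the average of all three channel means, which cancels a uniform cast. A pure-Python sketch with hypothetical pixel values (real tools would do this per-image over full bitmaps, often in linear light):

```python
def gray_world_balance(pixels):
    """Remove a global color cast by scaling each RGB channel so its
    mean matches the average of the three channel means (the
    gray-world assumption). `pixels` is a list of (r, g, b) tuples
    with values in 0..255."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3
    gains = [gray / m if m else 1.0 for m in means]
    return [
        tuple(min(255, round(p[c] * gains[c])) for c in range(3))
        for p in pixels
    ]

# A yellow-tinted patch: red and green elevated relative to blue.
tinted = [(210, 200, 160), (190, 185, 150), (220, 210, 170)]
balanced = gray_world_balance(tinted)
```

After balancing, the three channel means come out nearly equal, so the uniform yellow bias is gone; anything beyond a uniform cast (e.g. a tint that varies with luminance) needs a proper LUT instead.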
ACCount37 wrote 1 day ago:
Not really, but there's a number of theories. The simplest one is
that they "style tuned" the AI on human preference data, and this
introduced a subtle bias for yellow.
And I say "subtle" - but because that model would always "regenerate"
an image when editing, it would introduce more and more of this
yellow tint with each tweak or edit. Which has a way of making a
"subtle" bias anything but.
amoursy wrote 1 day ago:
There was also the theory that it was because they scanned a bunch
of actual real books, and book paper has a slight yellow hue.
danielbln wrote 1 day ago:
That seems unlikely, as we didn't see anything like that with
Dall-E, unless the auto regressive nature of gpt-image somehow
was more influenced by it.
minimaxir wrote 1 day ago:
My pet theory is that OpenAI screwed up the image normalization
calculation and was stuck with the mistake since that's something
that can't be worked around.
At the least, it's not present in these new images.
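For context on why a normalization slip would be baked in: if the decoder de-normalizes with per-channel constants that don't match the ones used at training time, every output picks up the same fixed per-channel offset. A toy sketch with hypothetical constants (not OpenAI's actual pipeline):

```python
# Toy model: pixels are normalized with per-channel means at training
# time, then de-normalized at decode time. Mismatched constants give
# every image an identical per-channel shift.
TRAIN_MEAN = (0.48, 0.46, 0.41)   # hypothetical RGB channel means
DECODE_MEAN = (0.48, 0.46, 0.37)  # mismatched blue constant

def normalize(rgb, mean):
    return tuple(c - m for c, m in zip(rgb, mean))

def denormalize(rgb, mean):
    return tuple(c + m for c, m in zip(rgb, mean))

pixel = (0.70, 0.65, 0.60)
roundtrip = denormalize(normalize(pixel, TRAIN_MEAN), DECODE_MEAN)
# Blue comes back lowered by 0.04 for every pixel: a uniform
# blue deficit, which reads as a yellow cast.
shift = tuple(r - p for r, p in zip(roundtrip, pixel))
```

Because the shift is uniform, it survives any retraining-free fix short of a post-hoc color correction, which matches the "stuck with the mistake" framing.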
swyx wrote 1 day ago:
wdym it cant be worked around when there exist literal yellow tint
corrector models/tools haha
minimaxir wrote 19 hours 59 min ago:
There's a possibility that any automatic correction could have
false positives (since the yellow tint doesn't happen 100% of the
time) which creates different problems where a image could have
an even weirder hue.
ineedasername wrote 1 day ago:
Yeah, though I can imagine a conversation like this:
SWE: "Seriously? import PIL \ read file \ == (c + 10%, m = m, y =
y, k = k) \ save file done!"
Exec: "Yeah, and first blogger get's a hold of image #1 they
generate, starts saying 'Hey! This thing's been color corrected
w/o AI! lol lame'"
Or not, no idea. i've not understood the choice either, besides
very intelligent AI-driven auto-touch up for lighting/color
correction has been a thing for a while. It's just, for those I
end up finding an answer for, maybe 25% of head scratcher
decisions do end of having a reasonable, if non intuitive answer
for. Here? haven't been able to figure one yet though, or find a
reason/mention by someone who appears to have an inside line on
it.
BoorishBears wrote 1 day ago:
There's still something off in the grading, and I suspect they
worked around it (although I get what you mean: not easily, since
you already trained).
I'm guessing when they get a clean slate we'll have Image 2 instead
of 1.5. In LMArena it was immediately apparent it was an OpenAI
model based on visuals.
kingkawn wrote 1 day ago:
Colloquially called the urine filter
jebronie wrote 1 day ago:
lets not mince words, its called the "piss filter"
ezero wrote 1 day ago:
Even from their own curated examples, this looks quite a bit worse than
Nano Banana in terms of preserving consistency on image edits.
mortenjorck wrote 1 day ago:
Nano Banana became useless for image edits once the safety training
started rejecting anything as “I can’t edit some public
figures.”
My own profile picture? Can’t edit some public figures. A famous
Norman Rockwell painting from 80 years ago? Can’t edit some public
figures.
Safety’d into oblivion.
almosthere wrote 1 day ago:
I didn't have a good experience with NB. I am half Indian.
Immediately changes my face to a prototypical Indian man every time I
use it.
This tool is keeping my look the same.
gundmc wrote 1 day ago:
I find including "don't change anything else" in the NBP prompt
goes a long way.
almosthere wrote 1 day ago:
I tried all of those types of prompts
neom wrote 1 day ago:
Anyone else have issues verifying with openai? I always get a "congrats
you're done" screen with a green checkmark from Persona, nothing to
click, and my account stays unverified. (Edit, mystically, it's
fixed..!)
minimaxir wrote 1 day ago:
I have a Nano Banana Pro blog post in the works expanding on my
experiments with Nano Banana ( [1] ). Running a few of my test cases
from that post and the upcoming blog post through this new ChatGPT
Image model, this new model is better than Nano Banana but MUCH worse
than Nano Banana Pro which now nails the test cases that previously
showed issues. The pricing is unclear but gpt-image-1.5 appears to be
20% cheaper than the current gpt-image-1 model, which would put a
`high`-quality generation in the same price range as Nano Banana Pro.
One curious case demoed here in the docs is the grid use case. Nano
Banana Pro can also generate grids, but for NBP grid adherence to the
prompt collapses after going higher than 4x4 (there's only a finite
amount of output tokens to correspond to each subimage), so I'm curious
that OpenAI started with a 6x6 case, albeit with a test prompt that
is not that nuanced.
[1]: https://news.ycombinator.com/item?id=45917875
echelon wrote 1 day ago:
I've been a filmmaker for 10+ years. I really want more visual tools
that let you precisely lay out consistent scenes without prompting.
This is important for crafting the keyframes in an image-to-video
style workflow, and is especially important for long form narrative
content.
One thing that gpt-image-1 does exceptionally well that Nano Banana
(Pro) can't is previz-to-render. This is actually an incredibly
useful capability.
The Nano Banana models take the low-fidelity previz
elements/stand-ins and unfortunately keep the elements in place
without attempting to "upscale" them. The model tries to preserve
every mistake and detail verbatim.
Gpt-image-1, on the other hand, understands the layout and blocking
of the scene, the pose of human characters, and will literally repair
and upscale everything.
Here's a few examples:
- 3D + Posing + Blocking: [1]
- Again, but with more set re-use: [2]
- Gaussian splats: [3]
- Gaussians again: [4]
We need models that can do what gpt-image-1 does above, but that
have higher quality, better stylistic control, faster speed, and
that can take style references (e.g. glossy Midjourney images).
Nano Banana team: please grow these capabilities.
Adobe is testing and building some really cool capabilities:
- Relighting scenes: [5]
- Image -> 3D editing: [6] (payoff is at 3:54)
- Image -> Gaussian -> Gaussian editing: [7]
- 3D -> image with semantic tags: [8]
I'm trying to build the exact same things that they are, except as
open-source / source-available local desktop tools that we can own.
Gives me an outlet to write Rust, too.
[1]: https://youtu.be/QYVgNNJP6Vc
[2]: https://youtu.be/QMyueowqfhg
[3]: https://youtu.be/iD999naQq9A
[4]: https://youtu.be/IxmjzRm1xHI
[5]: https://youtu.be/YqAAFX1XXY8?si=DG6ODYZXInb0Ckvc&t=211
[6]: https://youtu.be/BLxFn_BFB5c?si=GJg12gU5gFU9ZpVc&t=185
[7]: https://youtu.be/z3lHAahgpRk?si=XwSouqEJUFhC44TP&t=285
[8]: https://youtu.be/z275i_6jDPc?si=2HaatjXOEk3lHeW-&t=443
pablonaj wrote 1 day ago:
Love the samples of the app you are making, will be testing it!
echelon wrote 1 day ago:
Images make this even easier to see (though predictable and precise
video is what drives the demand) :
gpt-image-1: [1] (fixed link - imgur deleted the last post for some
reason)
gpt-image-1.5: [2]
nano banana / pro: [3]
gpt-image-1 excels in these cases, despite being stylistically
monotone.
I hope that Google, OpenAI, and the various Chinese teams lean in
on this visual editing and blocking use case. It's much better than
text prompting for a lot of workflows, especially if you need to
move the camera and maintain a consistent scene.
While some image editing will be in the form of "remove the
object"-style prompts, a lot will be molding images like clay.
Grabbing arms and legs and moving them into new poses. Picking up
objects and replacing them. Rotating scenes around.
When this gets fast, it's going to be magical. We're already
getting close.
[1]: https://imgur.com/gallery/previz-to-image-gpt-image-1-x8t1...
[2]: https://imgur.com/a/previz-to-image-gpt-image-1-5-3fq042U
[3]: https://imgur.com/a/previz-to-image-nano-banana-pro-Q2B8ps...
qingcharles wrote 1 day ago:
I just tested GPT1.5. I would say the image quality is on par with
NBP in my tests (which is surprising as the images in their trailer
video are bad), but the prompt adherence is way worse, and its "world
model" if you want to call it that is worse. For instance, I asked it
for two people in a row boat and it had two people, but the boat was
more like a coracle and they would barely fit inside it.
Also: SUPER ANNOYING. It seems every time you give it a modification
prompt it erases the whole conversation leading up to the new pic?
Like.. all the old edits vanish??
I added "shaky amateur badly composed crappy smartphone photo of
____" to the start of my prompts to make them look more natural.
Counterpoint from someone on the Musk site:
[1]: https://x.com/flowersslop/status/2001007971292332520
vunderba wrote 1 day ago:
I actually just finished running the Text-to-Image benchmark a few
minutes ago. This matches my own testing as well. GPT-Image 1.5 is
clearly a step up as an editing model, but it performed worse in
purely generative tasks compared to its predecessor - dropping from
11 (out of 14) to 9.
Comparing NB Pro, GPT Image 1, and GPT Image 1.5
[1]: https://genai-showdown.specr.net/?models=o4,nbp,g15
abadar wrote 1 day ago:
I really enjoyed your experiments. Thank you for sharing your
experiences. They've improved my prompting and have tempered my
expectations.
vunderba wrote 1 day ago:
I'll be running gpt-image-1.5 through my GenAI Showdown later today,
but in the meantime if you want to see some legitimately impressive
NB Pro outputs, check out: [1]
In particular, NB Pro successfully assembled a jigsaw puzzle it had
never seen before, generated semi-accurate 3D topographical
extrapolations, and even swapped a window out for a mirror.
[1]: https://mordenstar.com/blog/edits-with-nanobanana
jngiam1 wrote 1 day ago:
The mirror test is cool!
IgorPartola wrote 1 day ago:
Subtle detail but the little table casts a shadow because of the
light in the window and the shadow remains unchanged after the
mirror replaces the window.
dash2 wrote 1 day ago:
More obviously, the objects in the mirror aren't actually
reversed!
vunderba wrote 1 day ago:
That one's on me! It was still using the old NB image.
Updated the mirror test to use the NB Pro version.
niklassheth wrote 1 day ago:
Nice! Your comparison site is probably the best one out there for
image models
abbycurtis33 wrote 1 day ago:
I still use Midjourney, because all of these major players are so bad
at stylistic and creative work. They're singularly focused on
photorealism.
Sohcahtoa82 wrote 20 hours 30 min ago:
In my experience, MidJourney creates the best overall-looking images,
but it's the worst at sticking to your prompt.
empressplay wrote 1 day ago:
That's because it's a two-way street: a multi-modal model that is
highly proficient at real-life image generation is also highly
proficient at interpreting real-life image input, which is something
sorely needed for robotics.
ianbicking wrote 1 day ago:
I haven't really kept up with what Midjourney has been doing the past
year or two. While I liked the stylistic aspects of Midjourney, being
able to use image examples to maintain stylistic consistency and
character consistency is SO useful for creating any meaningful
output. Have they done anything in that respect?
That is, it's nice to make a pretty stand-alone image, but without
tools to maintain consistency and place them in context you can't
make a project that is more than just one image, or one video, or a
scattered and disconnected sequence of pieces.
xnx wrote 1 day ago:
This is surprising. Is there a gallery of images that illustrates
this?
throwthrowuknow wrote 1 day ago:
their explore page is a firehose of examples created by users and
you can see the prompt used so you can compare the results in other
services
[1]: https://www.midjourney.com/explore?tab=video_top
takoid wrote 1 day ago:
Midjourney has a gallery on their website:
[1]: https://www.midjourney.com/explore
kingkawn wrote 1 day ago:
This is a cultural flaw that predates image generation. Even PG has
made statements on HN in the past equating “rendering skill” with
the quality of art works. It’s a stand-in for the much more
difficult task of understanding the work and value of culture making
within the context of the society producing it.
doctorpangloss wrote 1 day ago:
Suppose the deck for Midjourney hit Paul Graham's desk, and the CEO
was just an average Y Combinator CEO - so no previous success
story. He would have never invested in Midjourney at seed stage
(meaning before launch / before there were users) even if he were
given the opportunity.
Better to read that particular story in the context of, "It would
be very difficult to make a seed fund that is an index of all avant
garde culture making because [whatever]."
FergusArgyll wrote 1 day ago:
That's the opinionated vs user choice dynamic. When the opinions are
good, they have a leg up
ChrisArchitect wrote 1 day ago:
Post: [1] ( [2] )
[1]: https://openai.com/index/new-chatgpt-images-is-here/
[2]: https://news.ycombinator.com/item?id=46291827
dang wrote 1 day ago:
We'll merge that thread hither to give some other submitters a
chance.