Hacker News on Gopher (unofficial)
COMMENT PAGE FOR:
StoryDiffusion: Long-range image and video generation
gtoast wrote 22 min ago:
It's really challenging to think of positive, constructive uses for
this technology without thinking of the myriad life- and
society-affecting abuses. Even interpersonally, the use of this
technology is heavily weighted toward destruction and deception. I
don't know where this ends or where the researchers who release this
technology think it will go, but I can't imagine it's going anywhere
good for any of us.
nephanth wrote 2 hours 3 min ago:
Um, the GitHub link is a 404, and the paper link points back to the
webpage itself (the paper is not on arXiv). Probably they put the
website up too soon?
spywaregorilla wrote 5 hours 38 min ago:
How is this conceptually different from tracking an embedding for a
single character or training a LoRA on it?
MisterTea wrote 6 hours 53 min ago:
One day we won't have 3D engines or GPUs but AI chips that generate
the scenes without calculating a single triangle or loading a single
texture. We'll just stream in a scene; IP asset seeds will provide the
characters, plot, and story. But even those can be generated in
real-time. Video games, movies, anything will be on demand. No one will
act. No one will draw. We will just sit and ask for more. Strange
times.
whamlastxmas wrote 6 hours 52 min ago:
I had this same realization when Sora came out.
jerpint wrote 9 hours 35 min ago:
The videos look incredible, but a lot of the captions are riddled with
grammar/syntax mistakes that seem odd for a model of that quality to
make.
speedgoose wrote 10 hours 18 min ago:
Is there a video of Will Smith eating spaghetti with this model?
smusamashah wrote 11 hours 24 min ago:
This is unbelievably good. It seems better even than Sora in terms of
natural look and motion in videos.
The video of the two girls talking seems so natural. There are some
artifacts, but the movement is so natural, and the clothes and other
things around them are not continuously changing.
I hope it becomes open source, though I suspect it won't, because
it's coming from ByteDance.
cchance wrote 11 hours 1 min ago:
I don't know if that's true; there's a massive flicker in the guy's
hair (the one with the mostly black background and black shirt).
Halfway through, it completely loses tracking on his hair and it
snap-changes.
smusamashah wrote 7 hours 51 min ago:
If you compare this with the current state of openly available video
models (assuming this will be open too), this is still a leap. If it
is going to be closed like Sora, then it's merely comparable; Sora has
a different kind of artifact.
These artifacts are an improvement over the current state.
pmontra wrote 14 hours 27 min ago:
The Moon in the sky seen from the surface of the Moon is wrong? Poetic?
Funny? Recursive? A demonstration that these models don't understand
anything? Add to the list.
gbickford wrote 17 hours 24 min ago:
It's always disappointing when people publish things to GitHub without
the intention of collaborating or sharing.
forgingahead wrote 17 hours 39 min ago:
The GitHub link is broken, and I honestly find it frustrating that the
only link to code is the theme source and credits. Is it really that
important to give the static page theme that much real estate instead
of an actual code release for the project?
29athrowaway wrote 17 hours 43 min ago:
Time for Microsoft Chat 2.0, it seems.
topspin wrote 17 hours 48 min ago:
Love how under "Multiple Characters Generation" the white guy is "A
Man," whereas someone else is "An Asian Man." Reminds me of Daryl
Gates and the "normal people" quote, and thence patrol cars being
called "black and normals."
fnordpiglet wrote 17 hours 42 min ago:
A probabilistic regression model's behavior will just reflect the
training data. Don't hate the player, hate the game.
topspin wrote 17 hours 12 min ago:
No hate for any part of this: it's just amusing.
peteradio wrote 17 hours 55 min ago:
There is a video of two girls. One girl seems to be sticking out her
tongue and then blowing a kiss, but the tongue appears again
mid-kiss. Very arousing stuff, I'll say. Keep up the good work,
microsft or goggle or whoever made it.
yard2010 wrote 10 hours 44 min ago:
Worse: ByteDance.
schoen wrote 18 hours 13 min ago:
I looked very closely at the videos for a while and managed to find
some minor continuity errors (like different numbers of buttons on
people's button-down shirts at different times, or different sizes or
styles of earrings, or arguably different interpretations of which
finger is which in an intermittently-obscured hand). I also think that
the cycling woman's shorts appear to cover more of her left leg than
her right leg, although that's not physically impossible, and the bear
seemingly has a differently-sized canine tooth at different times.
But I guess it took me multiple minutes to find these problems,
watching each video clip many times, rather than having any of them
jump out at me. So, it's not literally full, consistent object
persistence, but at a casual viewing it was very persuasive.
Maybe people who shoot or edit video frequently would notice some of
these problems more quickly, because they're more attuned to looking
for continuity problems?
chrsw wrote 5 hours 17 min ago:
There are lots of inconsistencies in these clips of a type you
would never find even in a hastily put-together amateur film. I
wonder how you would even add continuity support to a generative
video model. It's got its training data, its model, its algorithms
for generating data... but could you say "make sure this shirt always
has 6 buttons in this scene"? Does it even understand what a button
is? Or a shirt? Or a thing?
It seems to me that eventually these systems are going to have to be
grounded in some hard truths about our world. Like: there are things
called objects, objects can be distinct, objects can have
relationships with other objects, etc. Then the generative network
would have to generate data around these priors. Or maybe they
already have that; I don't know how they work.
jononor wrote 1 hour 59 min ago:
Hopefully continuity (of relevant features) will be a result of
the training process, eventually. In videos from the wild, the
number of buttons on a shirt basically never changes during a
scene. That kind of information is in the training data already. So
it is theoretically possible for a model to learn that this should
stay consistent, in contrast to other properties that affect the
shirt, like lighting or pose. But we are still in the very early days
of kinda-working video generation, and certainly of temporal
consistency.
whywhywhywhy wrote 6 hours 57 min ago:
You can see this in Sora videos too if you look closely at things
like the leaves of trees; you can tell some sort of temporal
bucketing is going on, even in SOTA models.
IanCal wrote 7 hours 10 min ago:
I think it's fascinating to watch what the issues/complaints are. I'm
in no way saying you're complaining, but I think looking at what
people point out as the issues is a great measure of progress.
Here we're looking at video, with high-quality individual frames,
where the inconsistencies are maybe clear and maybe not. Compared to
Craiyon (around the time of DALL-E) [1], it's wild how that's changed.
And that capability was itself a vast improvement over things before
(at least ones that weren't fixed goals; the GAN approach to faces in
headshots was very lifelike before this).
[1]: https://i.ytimg.com/vi/lcoitxKbw_0/maxresdefault.jpg
grobgambit wrote 8 hours 49 min ago:
I am super picky when it comes to art, and I think these look like
complete shit compared to what I have seen from Sora.
Not even in the same ballpark. Even when things are wrong in Sora,
the imagery still seems very crisp. If I watched these videos
for 5 minutes, I know I would get a headache.
taneq wrote 7 hours 39 min ago:
Weren't the Sora videos heavily edited/post-produced? At least so
I've read; happy to be corrected here.
cchance wrote 11 hours 0 min ago:
I mean, at the end of the day, neither is standard video editing
perfect. How many times have we all found inconsistencies in TV shows,
or random water bottles showing up and disappearing between scenes?
I imagine diffusion video creation will be similar: eventually just
funny anecdotes about what we saw that time in LOTR 10.
vkou wrote 17 hours 33 min ago:
I'm immediately noticing significant issues with mouths
(specifically, when they are open).
It's also telling that most of the shots do their best to hide hands;
whenever they are visible, they are obviously broken.
godelski wrote 17 hours 40 min ago:
Did you miss the fish? [1] You should see the error on first viewing.
What about the woman with glasses? Her face literally "jumps". [2]
Same with this guy's hands. [3]
Interestingly, we notice that [2] has "sora" in the name, though I
think it is a reference to the main image on Sora. [4]
Not sure if the gallery is weird to anyone else, but it doesn't
exactly show new images, and the position indicator is wonky.
The thing that makes me most suspicious is seeing the numbers on
these demos: 1, 2, 4 (terrifying to me), 5, 65, 66, 68, 72, 73, 83,
85, 86 (is this Simone Giertz? Vic Michaelis?). The part that is
tough about evaluating generative models is the cherry-picking for
demonstrations. You have to do it or people tear your work apart, but
in doing so you also give a false impression of what your work can
actually do.
IMO it has gotten out of hand and is not benefiting anyone. It makes
these papers more akin to advertising than to communication of
research. We talk about the integrity of the research community and
why we argue over borderline works, but come on: if you can get a
better review with more samples, you can get better reviews by paying
more, not by doing better work. A pay-to-play system is far worse for
the integrity of ML (or any science) than arguing over borderline
works.
Edit: I think it is also a bit problematic that this was posted BEFORE
the arXiv link or GitHub went live. I'd appeal to the HN community not
to upvote these kinds of works until at least the paper is live.
[1]: https://storydiffusion.github.io/MagicStory_files/longvideo/...
[2]: https://storydiffusion.github.io/MagicStory_files/longvideo/...
[3]: https://storydiffusion.github.io/MagicStory_files/longvideo/...
[4]: https://openai.com/sora
nyokodo wrote 17 hours 54 min ago:
> But I guess it took me multiple minutes to find these problems
I'm no video editor, but I noticed straight away that the
characters' eyes and hair tend to change, sometimes dramatically, as
they turn their heads. Also, the head movement tends to be jerky or
abrupt, especially in the middle of the turn.
justinclift wrote 15 hours 1 min ago:
Eyes and teeth seem like they still need further work. Still,
looks like things are improving. :)
brotherdusk wrote 18 hours 29 min ago:
Sorry, I can't access the repo, and the PDF link doesn't have an
href attribute; is that by design?
hbbio wrote 18 hours 30 min ago:
GitHub link is not public yet?
[1]: https://github.com/HVision-NKU/StoryDiffusion
ActionHank wrote 6 hours 12 min ago:
A lot of these AI-related announcements seem to be doing this sort of
baiting: "I made a new thing", go to the repo, COMING SOON. Or this:
here's the paper, but no, we won't show our work.
smcnally wrote 16 hours 17 min ago:
That repo's not listed:
[1]: https://github.com/orgs/HVision-NKU/repositories
stanislavb wrote 18 hours 4 min ago:
Seems so. I was about to report it, too.
keikobadthebad wrote 18 hours 34 min ago:
It'll be good if the girl and the giant squirrel are ever seen in the
same park at the same time.
freefruit wrote 18 hours 47 min ago:
So is Amazon flooded with hyper-niche e-books yet?
m463 wrote 14 hours 49 min ago:
I went to buy an air fryer. There were several recipe books available
for the specific air fryer model, but they were all garbage
auto-generated stuff.
I complained to Amazon, and they said that since I hadn't purchased
the book they couldn't do anything. So I bought the book, complained,
and returned it.
The chapters devoted to the details of the specific air fryer model
were either very general (almost quoting the product description on
Amazon) or just plain wrong.
What I thought I would get was something like the Magic Lantern books
about specific camera models. Instead it was auto-generated pages of
nonsense.
surfingdino wrote 12 hours 32 min ago:
Your real-life example is a good case against using AI-generated
legal or medical advice.
selalipop wrote 18 hours 32 min ago:
I'm working on a platform for reading hyper-niche e-books: [1] I
don't think this form of generative AI needs to become a source of
spam; carefully designed platforms can let people enjoy their niche
content without making them feel isolated.
[1]: https://tryspellbound.com
surfingdino wrote 12 hours 32 min ago:
Too late, it has become a source of spam.
selalipop wrote 10 hours 9 min ago:
It's not really useful to give up the fight in the infancy of
something with as much surface area as generative AI.
"Is being used to create spam" is not the same as "needs to be spam",
and we mostly just need platforms that leverage generative AI
natively to bridge the gap.
surfingdino wrote 1 hour 22 min ago:
There is literally zero need for tools to generate text. Humans
generate tons of spam already.
samspenc wrote 18 hours 54 min ago:
Normally I don't mind spelling errors, and there are plenty in the
examples, but my question is: did the system really produce "lunch"
when the prompt was "they have launch at restraunt" (verbatim from the
sample)? I would imagine it got "restaurant" right, but I would have
expected it to produce something like a rocket-launch image instead of
figuring out that the author meant "lunch".
ffhhj wrote 15 hours 25 min ago:
Curious what it would produce with: "they have launch a
rockestaurant".
taneq wrote 7 hours 41 min ago:
Bistromathics!
godelski wrote 17 hours 29 min ago:
"He felt very frightened and run", "There is a huge amount of
treasure in the house!"
I suspect that some grammar and spelling issues may come from the
authors themselves. For example "A Asian Man": "a" instead of "an" is
a common mistake for speakers of many Asian languages, since their
languages have no similar forms. So, considering the consistent
article errors, I expect this to be an issue with the authors. Not
sure about the "M" capitalization. Similar things with "The man have
breakfast", "They have launch at restaurant", and "They play in (the)
amusement part."
Considering the comics have similar types of errors (the squirrel one
most clearly), I'd chalk it up to a language barrier rather than the
process. Though LeCun is not wearing gloves on the moon, and well...
neckro23 wrote 18 hours 27 min ago:
And if the model is supposed to be so attentive to context, why
did it show a desert instead of a "dessert"? After all, they just ate
"launch".
yorwba wrote 13 hours 55 min ago:
The model can only attend to context that is part of the input.
Most likely they created the image grid by independently feeding
the model each prompt together with the reference image. (And the
point is to show off that the model output remains consistent
despite this independent generation process.)
dkarras wrote 18 hours 36 min ago:
Transformers / attention are very robust against typos, as they take
the entire context into account, just like we do. Launch any free LLM
and ask it questions with typos that you would notice and
auto-correct, and you'll see that the models just don't care; they
understand them. In fact, they are so resilient that they understand
very garbled text without breaking a sweat.
noneeeed wrote 10 hours 47 min ago:
I often use ChatGPT when learning Spanish; I find it's great for
explaining distinctions between words with similar meanings, where a
dictionary isn't always a lot of help.
I am constantly surprised by how well it copes with my typos,
grammatical errors, and generally poor spelling.
BoorishBears wrote 17 hours 9 min ago:
There's honestly something uncanny about how well they do.
In the "early days" of GPT-4 I tried testing it as a way to get
around poor transcription for an in-car voice assistant. It
managed: "I'm how dew yew say... Freud?" => "Turn up the
temperature"... which is nonsense most people would stare at for a
long time before making any sense of.
LeoPanthera wrote 18 hours 58 min ago:
The rate of progress of generative AI is honestly quite scary.
ed_mercer wrote 18 hours 38 min ago:
Really? Feels like nothing much is happening lately.
newswasboring wrote 10 hours 23 min ago:
What are you talking about? GPT-3 came out less than 4 years ago,
and Stable Diffusion's first version around then too. In less than
4 years we went from nothing to making janky but believable video
clips. This is not fast enough for you?
vouaobrasil wrote 14 hours 35 min ago:
Progress comes in spurts. Due to the negative reactions to AI by
some (artists), the system wants it to appear that nothing is
happening, so that the next wave of AI can be created in relative
peace, at which time it will be too late to stop it.
We have been conditioned to only react to hype and "news", rather
than to analyze reality and see the danger.
thejohnconway wrote 10 hours 10 min ago:
Which "system"?
vouaobrasil wrote 8 hours 32 min ago:
The global capitalist system, or the emergent behaviour that
comes out of a mass of humanity addicted to technological
development through wealth accumulation.
thejohnconway wrote 6 hours 51 min ago:
Such a system can't want anything.
vouaobrasil wrote 5 hours 47 min ago:
It's a term I use for emergent behaviour. And some
philosophers of technology would disagree with you, such as
the panpsychists. We are just a bag of cells, and yet we
speak of "wanting" things even though we might just be
deterministic bags of blood.