

Diffusion models are trained to iteratively convert noise into structured forms, refining the whole sample over many denoising steps. That's in contrast to, say, a GPT, which basically performs autoregressive next-token prediction, one token at a time in sequence. They're just totally different models, both in structure and in mode of operation. Diffusion models are actually pretty incredible imo, and I think we're just beginning to scratch the surface of their power. A very fundamental part of most modes of cognition is converting the noise of unstructured multimodal signal data into something with form and intention, so being able to do this with a model, even if only in very, very narrow domains right now, is a pretty massive leap forward.
I agree. I'm generally pretty indifferent to this new generation of consumer models. The worst thing about it is the incredible number of idiots flooding social media, either witch-hunting it or evangelizing it without any understanding of the tech or the law they're talking about. But seeing people use it so frequently, for so many fundamental things, that it's observably diminishing their basic competencies and health is really unsettling.