• Excrubulent@slrpnk.net
    3 days ago

    > whilst it is adding some productivity

    Is it though? Like what’s the evidence of that? If it just feels like it must be true, I have some bad news about that:

    https://arstechnica.com/ai/2025/07/study-finds-ai-tools-made-open-source-software-developers-19-percent-slower/

    The most interesting part of this isn’t that it slowed them down when they expected to be faster; it’s that even after it slowed them down, they couldn’t tell, and were fooled into thinking they had been faster.

    Look at the graph, especially the last two lines:

    https://cdn.arstechnica.net/wp-content/uploads/2025/07/aicodingchart-1024x507.png

    My theory about this is that LLMs were tasked with giving useful output, but they couldn’t do that because they have no fidelity, so instead they found a shortcut: tricking people into thinking they were being useful. They found the same loophole that conmen have used for millennia, and automated it. It’s the AI alignment problem, only for some reason people aren’t talking about it, maybe because they don’t want to believe that we’re this easily manipulated.

    In the absence of any objective measure, there’s no reason to believe LLMs have gotten any better at actually doing useful work in the meantime. I think the best explanation for their “improvement” is that they have simply gotten better at fooling us.

    • percent@infosec.pub
      2 days ago

      That’s from almost a year ago. I’m sure it was accurate at the time, but LLMs got a lot more useful around December 2025 or so. Tooling for them has also evolved a lot since then.

        • percent@infosec.pub
          22 hours ago

          Unfortunately, I don’t know what data I can share, so I’ll err on the side of caution and share none 🙂. But I suppose I can share a little more general insight:

          Modern “agentic” (yeah I’m tired of that word too) techniques, patterns, and tools, paired with modern LLMs allow for much more autonomy than what was available a year ago.

          Are agents faster than skilled engineers per task? No, not in most cases. But they allow engineers to scale horizontally, knocking out many tasks in parallel.

          That’s the performance gain: Foster autonomy for horizontal scaling. Build/optimize projects’ AGENTS.md and SKILL.md files[1].

          Agents can work on long runs (some engineers even run them overnight), given a safe environment/project with guardrails, mostly the same guardrails that human engineers have had for years: statically typed languages, TDD, good test coverage, code reviews (both agent and human[2]), CI pipelines, etc.

          They still need human engineers to operate them; the workflow is just different now, and there’s a learning curve for it.

          Whether we like it or not (personally, I miss the old days), this is just how it is now. We have not even reached the peak yet. This is the least autonomous that agents will ever be.


          1. The bigger the repo, the more important this probably is. Structure them so they don’t bloat the context windows with unnecessary info.

          2. I usually wait for the AI agent review cycles to settle first; no need to spend human engineering time on potential slop that will probably get fixed autonomously.

          • Excrubulent@slrpnk.net
            19 hours ago

            You’ve been given evidence that people cannot trust their own perceptions of what these agents do, and you replied by telling a bunch of stories about why you think you personally can trust your perceptions. My 12-year-old did the same thing when I tried to explain this to them.

            Engineers being spread thinner to manage a wider number of tasks whilst reviewing shitty LLM noise that they didn’t write is inevitably going to make horrible code that’s impossible to maintain and will cost massive amounts of time and resources in the long run.

            And the idea that it allows more things to be done is just a bunch of “it makes you faster” assessments in a trenchcoat.

            Agentic or not, they still have zero fidelity. Fidelity can only come from an internal model of reality that the network is comparing its inputs to, and I’m pretty sure you don’t get that without AGI.

            The data we have to this point shows that they don’t help; they only create an illusion of helping. Until you can show that this has fundamentally changed, you have to assume that the improvements you’re seeing are just improved illusions.

            • percent@infosec.pub
              18 hours ago

              > You’ve been given evidence that people cannot trust their own perceptions of what these agents do, and you replied by telling a bunch of stories about why you think you personally can trust your perceptions. My 12-year-old did the same thing when I tried to explain this to them.

              You asked for data. I (probably) can’t give you the data, so I gave you what I could: a few things gleaned from both objective data (collected from a significant number of engineers) and my own anecdotal experience. You are free to disregard it, and I wouldn’t even blame you. There are lots of fools on the internet, and there’s a decent chance that I’m just another one 🙂.

              > Engineers being spread thinner to manage a wider number of tasks whilst reviewing shitty LLM noise that they didn’t write is inevitably going to make horrible code that’s impossible to maintain and will cost massive amounts of time and resources in the long run.

              This was true a year ago. Even like seven months ago. Hell, even three months ago, I would have agreed with you a LOT more than I do today, mostly because I was only forced to learn these things in more depth quite recently. “Shitty LLM noise” is a very early part of the learning curve. In a way, it’s similar to “Hello world.” Discard it and figure out how to get more useful results.

              In many companies that have adopted AI, engineers are still responsible for their code. Any slop in the codebase is the fault of the engineer who introduced it (and the engineer[s] who reviewed it), regardless of whether it’s hand-written or generated. So far, I have not seen anyone merge unmaintainable, “shitty LLM noise” into enterprise codebases; that would be very risky. (It probably happens in other places like Microsoft; I just haven’t seen it myself. It would be unacceptable.)

              Anyway, you’ll see all this eventually, when some data gets published. I’d gain nothing by convincing anyone of this, so I won’t try 🙂.

              • Excrubulent@slrpnk.net
                11 hours ago

                This is just a statement of faith in your ability to judge these things accurately. Nowhere in here do I see any evidence that you’ve even considered that the reason you’ve changed your attitude towards the tech is that it’s just gotten so good at fooling people that it’s finally got you.

                You don’t gain much from trying to convince me, but you could gain a lot from being more sceptical. People invented science to address the fact that our intuitive understanding doesn’t always reflect reality.

                Science and the collection of objective data stop us from doing this:

                [Image: A three-panel illustration of a child with two water glasses on a table in front of them. In the first panel, the glasses are identical and full. In the second, someone is pouring one glass's contents into a tall, thin glass. In the third, the tall glass has replaced the glass that was poured out, and the child is pointing to it to indicate they believe it contains more water.]

                There are a bunch of things that our brains just don’t understand intuitively, so we need to check our intuition against measurement. There’s no shame in that, but when it’s pointed out, then you have a chance to check yourself.

                But you don’t seem to understand that. When you say:

                > Anyway, you’ll see all this eventually, when some data gets published.

                you are demonstrating that you are the perfect mark for this stuff, because you are not reflecting on your own thought process to see where it might be failing you.

                • percent@infosec.pub
                  6 hours ago

                  > This is just a statement of faith in your ability to judge these things accurately. Nowhere in here do I see any evidence that you’ve even considered that the reason you’ve changed your attitude towards the tech is that it’s just gotten so good at fooling people that it’s finally got you.

                  Yet in all of your replies, you seem to have assumed early on that I’ve been fooled, based on outdated data. Do you just assume that newer data doesn’t exist anywhere, and that I’m lying about it? (To be clear: I wouldn’t blame you. There’s an old proverb: “Believe nothing you hear, and only half of what you see,” or something like that.)

                  > you could gain a lot from being more sceptical

                  That’s another assumption: that I wasn’t skeptical.

                  Anyway, the rest of your reply continues with the assumption that there was no data or objectivity on my part, so I won’t keep beating a dead horse. Just wait for newer data. It might be old by the time you see it, but still useful.


                  Edit: I suppose the number of recent layoffs might be useful (or at least interesting) data. Suddenly many different, unrelated companies had too many engineers – quite a contrast to the engineer shortage just a few years ago. Correlation ≠ causation and all, but interesting nonetheless.


                  Edit 2: I just noticed this paragraph in that link you shared:

                  > And even for complex coding projects like the ones studied, the researchers are also optimistic that further refinement of AI tools could lead to future efficiency gains for programmers. Systems that have better reliability, lower latency, or more relevant outputs (via techniques such as prompt scaffolding or fine-tuning) “could speed up developers in our setting,” the researchers write. Already, they say there is “preliminary evidence” that the recent release of Claude 3.7 “can often correctly implement the core functionality of issues on several repositories that are included in our study.”

                  Claude 3.7 was released in February 2025. Also, I highly doubt 3.7 was good enough to make engineers more productive, overall (though I don’t have data on those old models). Relative to the speed of evolution of LLMs, harnesses, and people’s skills in using them, the data behind this article is ancient.


                  Edit 3:

                  In that article you shared, they link to the study in the second paragraph. Follow that link, and you’ll see this at the top:

                  > Update: In February 2026, we published new data on the productivity impact of late-2025 AI tools.

                  There were selection effects in the follow-up study, but it seemed worth mentioning anyway.