wordfreq is not just concerned with formal printed words. It also collected more conversational language usage from two sources in particular: Twitter and Reddit.
Now Twitter is gone anyway, and its public APIs have shut down.
Reddit also stopped providing public data archives, and now they sell their archives at a price that only OpenAI will pay.
There’s still the Fediverse.
I mean, that doesn’t solve the LLM pollution problem, but…
Endoscope profile pics are the future.