Internet Archive Study Finds 35% of New Websites Were AI-Generated by Mid-2025
Sonic Intelligence
A study found that by mid-2025, 35% of newly published websites were AI-generated or AI-assisted, up from zero before ChatGPT's launch in late 2022.
Explain Like I'm Five
"Smart computer programs are now making lots of new websites. By 2025, about one out of every three new websites was made with help from a computer! Scientists found that these computer-made websites often sound happier and more alike, but they aren't necessarily full of lies."
Deep Intelligence Analysis
The research, which drew on the Internet Archive's vast data and used the high-accuracy Pangram v3 AI-detection software, offers quantitative evidence for viewing the "Dead Internet Theory" in a new light. Before ChatGPT's release in late 2022, the proportion of AI-generated websites was negligible, which underscores how quickly it has grown since. Crucially, the study systematically tested six common critiques leveled against AI-generated text. Contrary to widespread fears, the researchers did not find that AI-generated content led to a proliferation of factual inaccuracies or a failure to cite sources. Instead, the confirmed effects were a reduction in semantic diversity and a tendency towards a more positive tone.
The implications of this rapid AI integration are multifaceted. While concerns about disinformation may be partially alleviated by these findings, the homogenization of online discourse and the potential for a less semantically rich internet present new challenges. The sheer volume of AI-generated content could fundamentally alter search engine optimization, content discovery, and the perceived authenticity of online information. This transformation necessitates a re-evaluation of content strategies for publishers, a focus on AI literacy for consumers, and continued research into the long-term effects on human creativity and critical thinking in an increasingly AI-permeated digital environment.
Impact Assessment
This research provides quantitative evidence of AI's rapid and significant impact on the internet's content landscape, confirming a substantial shift in how digital information is created. It challenges some prevailing assumptions about AI-generated text, particularly regarding disinformation and source citation, while highlighting new concerns about content homogenization.
Key Details
- Researchers from Stanford, Imperial College London, and the Internet Archive conducted the study.
- 35% of newly published websites by mid-2025 were classified as AI-generated or AI-assisted.
- This figure is up from zero before ChatGPT's launch in late 2022.
- The study sampled websites from August 2022 to May 2025 using the Wayback Machine.
- Pangram v3 AI-detection software, which demonstrated the highest detection rate, was used to classify the sampled websites.
- Only two of six common critiques of AI text were confirmed: less semantic diversity and a more positive tone.
- AI-generated text was not found to spread more falsehoods or to omit source citations.
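
The headline figure is a share: of the websites sampled in a given period, the fraction a detector classified as AI-generated or AI-assisted. The Python sketch below shows that arithmetic only. The function name `ai_share_by_period`, the `(period, is_ai)` record layout, and the sample labels are all hypothetical and made up for illustration; they are not the study's code or data, and the detector itself is not modeled here.

```python
from collections import defaultdict


def ai_share_by_period(records):
    """Return {period: fraction of sites flagged as AI} for (period, is_ai) pairs."""
    totals = defaultdict(int)   # number of sampled sites per period
    flagged = defaultdict(int)  # number of those sites flagged as AI per period
    for period, is_ai in records:
        totals[period] += 1
        if is_ai:
            flagged[period] += 1
    return {period: flagged[period] / totals[period] for period in sorted(totals)}


# Made-up labels for illustration only; NOT data from the study.
sample = [
    ("2022-08", False), ("2022-08", False),
    ("2025-05", True), ("2025-05", False),
    ("2025-05", False), ("2025-05", True),
]

shares = ai_share_by_period(sample)
for period, share in shares.items():
    print(f"{period}: {share:.0%} classified as AI")
```

Grouping by period before dividing is what makes a "before versus after ChatGPT" comparison possible; a single overall ratio would blend the two.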
Optimistic Outlook
The rapid adoption of AI for website generation could democratize content creation, enabling more individuals and small businesses to establish an online presence efficiently. If AI tools improve in diversity and factual accuracy, they could significantly boost productivity and content volume without necessarily degrading overall quality.
Pessimistic Outlook
A web dominated by AI-generated content risks a homogenization of voice and style, potentially leading to a less diverse and engaging internet experience. While the study found no increase in lies, the sheer volume of AI-generated text could still make it harder to discern authoritative human-created content, impacting trust and information discovery.