Science

Internet Archive Study Reveals 35% of New Websites Are AI-Generated Since 2022

Source: 404Media · Original Author: Matthew Gault · 2 min read · Intelligence Analysis by Gemini

Signal Summary

A study found that by mid-2025, 35% of newly published websites were AI-generated or AI-assisted, up from a negligible share before late 2022.

Explain Like I'm Five

"Smart computer programs are now making lots of new websites. Since 2022, so many have appeared that about one out of every three new websites is made by a computer or with a computer's help! Scientists found that these computer-made websites often sound happier and more alike, but they aren't necessarily full of lies."

Deep Intelligence Analysis

A collaborative study by researchers from Stanford, Imperial College London, and the Internet Archive offers quantitative evidence of how quickly artificial intelligence has reshaped online content. By mid-2025, 35% of newly published websites were classified as either entirely AI-generated or significantly AI-assisted. This shift took place in under three years after the public launch of advanced generative AI models, and it marks a fundamental change in how online information is produced: from predominantly human-authored to a substantial hybrid or AI-first model.

The research drew on the Internet Archive's Wayback Machine data and used the high-accuracy Pangram v3 AI-detection software, and it provides data points for reassessing the "Dead Internet Theory." Before ChatGPT's release in late 2022, the proportion of AI-generated websites was negligible, which shows how fast the growth has been. The study also systematically tested six common critiques leveled against AI-generated text. Contrary to widespread fears, the researchers did not find that AI-generated content led to a proliferation of factual inaccuracies or a failure to cite sources. Instead, the two confirmed effects were a reduction in semantic diversity and a more positive tone.

The implications of this rapid AI integration are multifaceted. While concerns about disinformation may be partially alleviated by these findings, the homogenization of online discourse and the potential for a less semantically rich internet present new challenges. The sheer volume of AI-generated content could fundamentally alter search engine optimization, content discovery, and the perceived authenticity of online information. This transformation necessitates a re-evaluation of content strategies for publishers, a focus on AI literacy for consumers, and continued research into the long-term effects on human creativity and critical thinking in an increasingly AI-permeated digital environment.

AI-assisted intelligence report · EU AI Act Art. 50 compliant

Impact Assessment

This research provides quantitative evidence of AI's rapid and significant impact on the internet's content landscape, confirming a substantial shift in how digital information is created. It challenges some prevailing assumptions about AI-generated text, particularly regarding disinformation and source citation, while highlighting new concerns about content homogenization.

Key Details

Researchers from Stanford, Imperial College London, and the Internet Archive conducted the study.
By mid-2025, 35% of newly published websites were classified as AI-generated or AI-assisted.
This figure is up from essentially zero before ChatGPT's launch in late 2022.
The study sampled websites from August 2022 to May 2025 using the Wayback Machine.
Pangram v3 AI-detection software was used, demonstrating the highest detection rate.
Only two of six common critiques of AI text were confirmed: less semantic diversity and a more positive tone.
AI-generated text was not found to proliferate lies or cut out sources.

Optimistic Outlook

The rapid adoption of AI for website generation could democratize content creation, enabling more individuals and small businesses to establish an online presence efficiently. If AI tools improve in diversity and factual accuracy, they could significantly boost productivity and content volume without necessarily degrading overall quality.

Pessimistic Outlook

A web dominated by AI-generated content risks a homogenization of voice and style, potentially leading to a less diverse and engaging internet experience. While the study found no increase in lies, the sheer volume of AI-generated text could still make it harder to discern authoritative human-created content, impacting trust and information discovery.
