Blocking the Internet Archive Won’t Stop AI, But It Will Erase the Web’s Historical Record

Imagine a newspaper publisher announcing it will no longer allow libraries to keep copies of its paper.

That’s effectively what’s begun happening online in the last few months. The Internet Archive—the world’s largest digital library—has preserved newspapers since it went online in the mid-1990s. The Archive’s mission is to preserve the web and make it accessible to the public. To that end, the organization operates the Wayback Machine, which now contains more than one trillion archived web pages and is used daily by journalists, researchers, and courts.

But in recent months The New York Times began blocking the Archive from crawling its website, using technical measures that go beyond the web’s traditional robots.txt rules. That risks cutting off a record that historians and journalists have relied on for decades. Other newspapers, including The Guardian, seem to be following suit.

For nearly three decades, historians, journalists, and the public have relied on the Internet Archive to preserve news sites as they appeared online. Those archived pages are often the only reliable record of how stories were originally published. In many cases, articles are edited or removed after publication, sometimes openly, sometimes not, and the Archive is often the only place to see those changes. When major publishers block the Archive’s crawlers, that historical record starts to disappear.

The Times says the move is driven by concerns about AI companies scraping news content. Publishers seek control over how their work is used, and several—including the Times—are now suing AI companies over whether training models on copyrighted material violates the law. There’s a strong case that such training is fair use.

Whatever the outcome of those lawsuits, blocking nonprofit archivists is the wrong response. Organizations like the Internet Archive are not building commercial AI systems. They are preserving a record of our history. Turning off that preservation in an effort to control AI access could essentially torch decades of historical documentation over a fight that libraries like the Archive didn’t start, and didn’t ask for.

If publishers shut the Archive out, they aren’t just limiting bots. They’re erasing the historical record.

Archiving and Search Are Legal

Making material searchable is a well-established fair use. Courts have long recognized it’s often impossible to build a searchable index without making copies of the underlying material. That’s why when Google copied entire books in order to make a searchable database, courts rightly recognized it as a clear fair use. The copying served a transformative purpose: enabling discovery, research, and new insights about creative works.

The Internet Archive operates on the same principle. Just as physical libraries preserve newspapers for future readers, the Archive preserves the web’s historical record. Researchers and journalists rely on it every day. According to Archive staff, Wikipedia alone links to more than 2.6 million news articles preserved at the Archive, spanning 249 languages. And that’s only one example. Countless bloggers, researchers, and reporters depend on the Archive as a stable, authoritative record of what was published online.

The same legal principles that protect search engines must also protect archives and libraries. Even if courts place limits on AI training, the law protecting search and web archiving is already well established.

The Internet Archive has preserved the web’s historical record for nearly thirty years. If major publishers begin blocking that mission, future researchers may find that huge portions of that historical record have simply vanished. There are real disputes over AI training that must be resolved in courts. But sacrificing the public record to fight those battles would be a profound, and possibly irreversible, mistake.

Related Issues

Artificial Intelligence

Creativity & Innovation

