The Internet Archive’s Fight to Save Itself

If you step into the headquarters of the Internet Archive on a Friday after lunch, when it offers public tours, chances are you’ll be greeted by its founder and merriest cheerleader, Brewster Kahle.

You cannot miss the building; it looks like it was designed for some sort of Grecian-themed Las Vegas attraction and plopped down at random in San Francisco’s foggy, mellow Richmond district. Once you pass the entrance’s white Corinthian columns, Kahle will show you the vintage Prince of Persia arcade game and a gramophone that can play century-old phonograph cylinders on display in the foyer. He’ll lead you into the great room, filled with rows of wooden pews sloping toward a pulpit. Baroque ceiling moldings frame a grand stained glass dome. Before it was the Archive’s headquarters, the building housed a Christian Science church.

I made this pilgrimage on a breezy afternoon last May. Along with around a dozen other visitors, I followed Kahle, 63, clad in a rumpled orange button-down and round wire-rimmed glasses, as he showed us his life’s work. When the afternoon light hits the great hall’s dome, it gives everyone a halo. Especially Kahle, whose silver curls catch the sun and who preaches his gospel with an amiable evangelism, speaking with his hands and laughing easily. “I think people are feeling run over by technology these days,” Kahle says. “We need to rehumanize it.”

In the great room, where the tour ends, hundreds of colorful, handmade clay statues line the walls. They represent the Internet Archive’s employees, Kahle’s quirky way of immortalizing his circle. They are beautiful and weird, but they’re not the grand finale. Against the back wall, where one might find confessionals in a different kind of church, there’s a tower of humming black servers. These servers hold around 10 percent of the Internet Archive’s vast digital holdings, which include 835 billion web pages, 44 million books and texts, and 15 million audio recordings, among other artifacts. Tiny lights on each server blink on and off each time someone opens an old webpage or checks out a book or otherwise uses the Archive’s services. The constant, arrhythmic flickers make for a hypnotic light show. Nobody looks more delighted about this display than Kahle.

Brewster Kahle, the Internet Archive's founder and biggest cheerleader.

Photograph: Gabriela Hasbun

It is no exaggeration to say that digital archiving as we know it would not exist without the Internet Archive—and that, as the world’s knowledge repositories increasingly go online, archiving as we know it would not be as functional. Its most famous project, the Wayback Machine, is a repository of web pages that functions as an unparalleled record of the internet. Zoomed out, the Internet Archive is one of the most important historical-preservation organizations in the world. The Wayback Machine has assumed a default position as a safety valve against digital oblivion. The rhapsodic regard the Internet Archive inspires is earned—without it, the world would lose its best public resource on internet history.

Its employees are some of its most devoted congregants. “It is the best of the old internet, and it's the best of old San Francisco, and neither one of those things really exist in large measures anymore,” says the Internet Archive’s director of library services, Chris Freeland, a longtime staffer who loves cycling and favors black nail polish. “It's a window into the late-’90s web ethos and late-’90s San Francisco culture—the crunchy side, before it got all tech bro. It's utopian, it's idealistic.”

The Internet Archive headquarters houses clay sculptures by artist Nuala Creed. Each sculpture depicts an employee or collaborator; getting one is a rite of passage.

Photograph: Gabriela Hasbun

But the Internet Archive also has its foes. Since 2020, it’s been mired in legal battles. In Hachette v. Internet Archive, book publishers complained that the nonprofit infringed on copyright by loaning out digitized versions of physical books. In UMG Recordings v. Internet Archive, music labels have alleged that the Internet Archive infringed on copyright by digitizing recordings.

In both cases, the Internet Archive has mounted “fair use” defenses, arguing that it is permitted to use copyrighted materials as a noncommercial entity creating archival materials. In both cases, the plaintiffs characterized it as a hub for piracy. In 2023, it lost Hachette. This month, it lost an appeal in the case. The Archive could appeal once more, to the Supreme Court of the United States, but has no immediate plans to do so. (“We have not decided,” Kahle told me the day after the decision.)

A judge rebuffed an attempt to dismiss the music labels’ case earlier this year. Kahle says he’s thinking about settling, if that’s even an option.

The combined weight of these legal cases threatens to crush the Internet Archive. The UMG case could prove existential, with potential fines running into the hundreds of millions. The internet has entrusted its collective memory to this one idiosyncratic institution. It now faces the prospect of losing it all.

Kahle has been obsessed with creating a digital library since he was young, a calling that spurred him to study artificial intelligence at MIT. “I wanted to build the library of everything, and we needed computers that were big enough to be able to deal with it,” he says.

After graduating in 1982, he worked at the supercomputing startup Thinking Machines Corporation. While there, he developed a program called Wide Area Information Server (WAIS), a way to search for data on remote computers. He left to cocreate a startup of the same name, which he sold to AOL in 1995. The next year, he launched a two-headed project from his attic: “AI and IA.”

That “AI” was a for-profit company called Alexa Internet—“Alexa” a nod to the Library of Alexandria—alongside the nonprofit Internet Archive. The two projects were interlinked; Alexa Internet crawled the web, then donated what it collected to the Internet Archive. Kahle couldn’t quite make the business model work. When Amazon made an offer in 1999, it seemed prudent to accept. The Everything Store paid a reported $250 million in stock for Alexa, severing the AI from IA and leaving Kahle a wealthy man.

Kahle stayed on with Alexa for a few years but left in 2002 to focus on the Internet Archive. It has been his vocation ever since. “His entire being is committed to the Archive,” says copyright scholar Pam Samuelson, who has known Kahle since the ’90s. “He lives and breathes it.”

If Silicon Valley has a Mr. Fezziwig, it’s Kahle. He’s not an ascetic; he owns a handsome black sailboat anchored in a slip at a tony yacht club. But his day-to-day life is modest. He ebikes to work and dresses like a guy who doesn’t care about clothes, and while he used to love Burning Man—he and his wife, Mary Austin, got married there in 1992—now he thinks it’s gotten too big. (Their current bougie-hippie pastime is the seasteading gathering Ephemerisle, where boaters hitch themselves together and create temporary islands in the Sacramento River Delta every July.)

What he really loves, above all, is his job.

“The story of Brewster Kahle is that of a guy who wins the lottery,” says longtime archivist Jason Scott. “And he and his wife, Mary, turned around and said, awesome, we get to be librarians now.”

The Internet Archive’s headquarters, a former church. The graffiti van was commissioned by Amir Esfahani, who runs the Archive’s artist-in-residence program.

Photograph: Gabriela Hasbun

Kahle is now the merry custodian to a uniquely comprehensive catalog, spanning all manner of digital and physical media, from classic video games to live recordings of concerts to magazines and newspapers to books from around the world. It recently backed up the island of Aruba’s cultural institutions. It’s an essential tool for everything from legal research—particularly around patent law—to accountability journalism. “There are other online archiving tools,” says ProPublica reporter Craig Silverman, “but none of them touch the Internet Archive.” It is, in short, a proof machine.

What makes the Internet Archive unique is its willingness to push boundaries in ways that traditional libraries do not. The Library of Congress also archives the web—but only after it has notified, and often asked permission from, the websites it scrapes.

“The Internet Archive has always been a little risky,” says University of Waterloo historian Ian Milligan, who has a forthcoming book on web archiving. Its distinctive utility is entwined with its long-standing outré approach to copyright. In fact, Kahle and the Internet Archive sued the government more than two decades ago, challenging the way the Copyright Renewal Act of 1992 and the Copyright Term Extension Act of 1998 had expanded copyright law. He lost that case—but, certainly, not his desire to keep pushing.

One of those pushes came in 2005. At the time, beloved hacker Aaron Swartz was often working on Internet Archive projects, and he cocreated and led the development of a new initiative called the Open Library program along with Kahle. The goal was to create one webpage for every book in the world. Kahle saw it as an alternative to Google Books, one driven not by commercial interests but by loftier and decidedly kumbaya information-wants-to-be-free ambitions.

In addition to its attempt to catalog every book ever, the project sought to make copies available to readers. To that end, it scans physical books, then allows people to check out the digitized versions. For over a decade, it has operated using a framework called controlled digital lending (CDL), where digitized books are treated as old-fashioned physical books rather than ebooks. The books it lends out were either purchased by the Internet Archive or donated by other libraries, organizations, or individuals; according to CDL principles, libraries that own a physical copy of a book should be able to lend it digitally.

An archive employee at work.

Photograph: Gabriela Hasbun

The project appeals primarily to researchers for whom specific books are hard to obtain elsewhere, rather than to casual readers. “Try checking out one of our books and then reading it—it’s tough going,” Kahle says. He’s not lying. A blurry scan of a physical book on a desktop screen compared to a regular ebook on a Kindle is like music from a tinny iPhone speaker versus a Bose surround sound system. Most borrowers read what they check out for less than five minutes.

Like other digital media, ebooks are typically licensed rather than sold outright, at a much higher rate than the cover price. Libraries that license ebooks get a limited number of loans; if they stop paying, the book vanishes. CDL is an attempt to give libraries more control over their inventory, and to expand access to books in a library’s collection that exist only as physical copies.

For years, publishers ignored the Internet Archive’s book-scanning spree. Finally, during the pandemic, after the Internet Archive took one liberty too many with its approach to CDL, they snapped.

In March 2020, schools and libraries abruptly shut down and faced a dilemma. Demand for ebooks far outstripped their ability to loan them out under restrictive licensing deals, and they had no way of lending out books that existed only in physical form. In response, the Internet Archive made a bold decision: It allowed multiple people to check out digital versions of the same book simultaneously. It called this program the National Emergency Library. “We acted at the request of librarians and educators and writers,” says Chris Freeland.

Kahle remembers feeling a vocational tug in that moment for the Internet Archive to do whatever it could to expand access. He thought they had broad support, too. “We got over 100 libraries to sign on and say ‘help us,’” Kahle says. “They stood behind the National Emergency Library and said ‘do this under our names.’”

Dave Hansen, now executive director of the nonprofit Authors Alliance, was a librarian at Duke University at the time. “We had tremendous challenges getting books for our students,” he says. “What they did was a good-faith effort.”

The Internet Archive's collection includes a sprawling array of old newspapers and periodicals from around the world.

Photograph: Gabriela Hasbun

Not everyone agreed. Prominent writers vehemently criticized the project, as did the Authors Guild and the National Writers Union. “They are not a library. Libraries buy books and respect copyright. They are fraudsters posing as saints,” author James Gleick wrote on Twitter. (Today, Gleick maintains that the Internet Archive is not a library, though he says “fraudsters was a little harsh.”)

“They seem to work by fiat,” says Bhamati Viswanathan, a copyright lawyer who signed an amicus brief on behalf of the publishers in the Hachette case. Viswanathan thinks it was arrogant to circumvent the licensing system. “Very much like what the tech companies seem to be doing, which is, ‘we're going to ask forgiveness, not permission.’”

The Internet Archive was in its first full-blown PR crisis. The coalition of publishing houses filed its lawsuit in June 2020, alleging that both the National Emergency Library and the Internet Archive’s broader Open Library program violated copyright. A few weeks later, the Internet Archive scuttled the National Emergency Library and reverted to its traditional, capped loan system, but it made no difference to the publishers.

The publishing houses and their supporters maintain that the Archive’s behavior harmed authors. “Internet Archive is arguing that it is OK to make and publicly distribute unauthorized copies of an author’s work to the global public,” Terrance Hart, the general counsel for the Association of American Publishers, tells WIRED. “Imagine if everyone started doing the same. The only existential threat here is the one posed by Internet Archive to the livelihoods of authors and to the copyright system itself in the digital age.”

After the lawsuit was filed, over a thousand writers, including Naomi Klein and Daniel Ellsberg, signed a letter supporting the right of libraries and the Internet Archive to loan digital books. One supportive author, Chuck Wendig, had very publicly changed his mind after initially tweeting criticism. Even some writers who currently belong to and support the Authors Guild, like Joanne McNeil, were staunch supporters of the Archive. She sometimes reads out-of-print books using the lending service and still sees it as a vital tool. “I hope my books are in the Open Library project,” she says, telling me that she’s already aware that her critically acclaimed but modestly popular books aren’t widely available. “At least I’ll know that way there’s someplace someone can find them.”

The shows of support didn’t matter. The publishers didn’t back down. In March 2023, the Internet Archive lost the case. This September, it lost its appeal. The court rejected the fair use arguments, insisting that the organization had not proved that it wasn’t financially harming publishers. In the meantime, legal bills continue to pile up for the Internet Archive’s next challenge.

After the initial ruling in Hachette v. Internet Archive, the parties agreed upon settlement terms; although those terms are confidential, Kahle has confirmed that the Internet Archive can financially survive them thanks to the help of donors. If the Internet Archive decides not to file a second appeal, it will have to fulfill those settlement terms. A blow, but not a death knell.

The other lawsuit may be far harder to survive. In 2023, several major record labels, including Universal Music Group, Sony, and Capitol, sued the Internet Archive over its Great 78 Project, a niche digital collection of recordings in the obsolete record format known as 78s, which was used from the 1890s to the late 1950s. The complaint alleges that the project “undermines the value of music.” It lists 2,749 recordings as infringed, which means damages could potentially be over $400 million if a court awarded the statutory maximum of $150,000 per work for willful infringement.

“One thing that you can say about the recording industry,” Pam Samuelson says, “is that there are no statutory damages that are too large for them to claim.”

The Internet Archive's basement, the site of many animated discussions about encryption and internet freedom.

Photograph: Gabriela Hasbun

As with the book publishing case, the Internet Archive’s defense hinges on fair use. It argues that preserving obsolete versions of these records, complete with the crackles and pops from the old shellac resin, makes history accessible. Copyright law is notoriously unpredictable, and some find the Internet Archive’s case shaky. “It doesn’t strike me, necessarily, as a winning fair use argument,” says Zvi Rosen, a law professor at Southern Illinois University who focuses on copyright.

James Grimmelmann, a professor of digital and information law at Cornell University, thinks the labels are “vastly exaggerating the commercial harm” from the project. (If there were a sizable audience for extremely low-quality versions of songs, he reasons, why wouldn’t the labels be putting out 78-style releases?) On average, each recording is accessed only once a month. Still, Grimmelmann isn’t convinced that will matter. “They are directly reproducing these works,” he says. “That’s a very hard lift for a judge.”

It may be years before the case is resolved, which means the uncertainty about the Internet Archive’s future is likely to linger, and potentially spread. And if it is resolved through either a settlement or a win for the recording industry, other copyright holders could be inspired to sue. “I'm worried about the blast radius from the music lawsuit,” Grimmelmann says.

In Kahle’s view, the Internet Archive’s legal challenges are part of a larger story about beleaguered libraries in the United States. He likes to frame his plight as a battle against a cadre of nefarious publishers, one piece of a larger struggle to wrest back the right to own books in the digital age. (Get him started on the topic, and he’ll likely point out that both ebook distributor OverDrive and publishing company Simon & Schuster are owned by the global investment firm Kohlberg Kravis Roberts & Co.) He’s keenly aware that everything he has built is in danger. “It’s the time of Orwell but with corporations,” Kahle says. “It’s scary.”

Losing the Archive is, indeed, a frightening prospect. “There is a misperception that things on the web are forever—but they really, really aren't,” says Craig Silverman, who thinks the nonprofit’s demise would make certain types of scholarship and reporting “way more difficult, if not impossible,” in addition to representing a disappearance of a bastion of collective memory.

Just this September, Google and the Internet Archive announced a partnership to allow people to see previous versions of websites surfaced through Google Search by linking to the Wayback Machine. Google previously offered its own cached historical websites; now it leans on a small nonprofit.

The Internet Archive also has challenges beyond its legal woes. For starters, it’s getting harder to archive things. As Mark Graham, director of the Wayback Machine, told me, the rise of apps with functions like livestreaming, especially when they’re limited to certain operating systems, presents a technical challenge. On top of that, paywalls are an obstacle, as is the sheer and ever-increasing amount of content. “There’s just so much material,” he says. “How does one know what to prioritize?”

Then there’s AI, once again. Thus far, the Internet Archive has sidestepped or been exempt from the new scrutiny on web crawling as it relates to AI training data. This June, for example, when Reddit announced that it was updating its scraping policy, it specifically noted that it was still allowing “good faith actors” like the Internet Archive to crawl it. But as opposition to rampant AI data scraping grows, the Internet Archive may yet face a new obstacle: If regulators and lawmakers are clumsy in attempts to curb permissionless AI web scraping, it could kneecap services like the Wayback Machine, which functions precisely because it can trawl and reproduce vast amounts of data.

The rise of AI has already soured some creative types on the Internet Archive’s approach to copyright. While Kahle views his creation as a library on the side of the little guy, opponents strenuously dispute this view. They paint Kahle as a tech-wolf disguised in librarian-sheep clothing, stuck in a mentality better suited for the Napster era. “The Internet Archive is really fighting the battles of 20 years ago, when it was as simple as ‘publishers bad, anything that hurts publishers good,’” says Neil Turkewitz, a former Recording Industry Association of America executive who has criticized the Archive’s copyright stances. “But that’s not the world we live in.”

A portion of the servers holding the Archive's vast data collection. Each time someone accesses a book, website, movie, song, or other file, a light flashes.

Photograph: Gabriela Hasbun

When I talk to Kahle over Zoom this September, shortly after he’d learned that the Internet Archive had lost the appeal, he’s agitated—an internet prophet literally wandering around in the wilderness. He’s perched in front of jagged cliffs while hiking outside of Arles, France, a blue baseball cap pulled over his hair, cheeks extra-ruddy in the sun, his default affability tempered by a sense of despondency. He hadn’t known about the timing of the ruling in advance, so he interrupted a weeklong vacation with Mary to jump back into work crisis mode. “It’s just so depressing,” he says.

As he sits on a rock with his phone in his hand, Kahle says the US legal system is broken. He says he doesn’t think this is the end of the lawsuits. “I think the copyright cartel is on a roll,” he says. He frets that copycat cases could be on the way. He’s the most bummed-out guy I’ve ever seen on vacation in the south of France. But he’s also defiant. There’s no inkling of regret, only a renewed sense that what he’s doing is righteous. “We have such an opportunity here. It’s the dream of the internet,” he says. “It’s ours to lose.” It sounds less like a statement and more like a prayer.
