Understanding the "Hachette v Internet Archive" Case

25 March 2023, by Sherri Mastrangelo

The Internet Archive, a non-profit organization based in San Francisco, has been an immensely valuable resource for genealogy researchers, including myself, for decades. I refer to it for finding local town histories, city directories, and biographies - most of which are out of print and not available elsewhere. I also use their Wayback Machine to follow broken links to older web pages that no longer exist but have been preserved by the Archive. All of these sources are a beacon of truth in a world where it’s getting harder and harder to distinguish fact from fiction, especially online.

For the past three years the Internet Archive has battled several book publishers over licensing fees and copyright issues with its lending library programs, including one launched during the pandemic called the “National Emergency Library”. The case is called Hachette Book Group, Inc. v. Internet Archive, and though litigation started back in 2020, the lawyers did not argue their cases in front of a New York federal judge until a few days ago. A judgement came back on the 24th, which I’ll get to shortly.

(The above graphic was created with Canva’s new text to image AI generator - so fun!)

For almost 27 years, the Internet Archive (“IA”), founded by Brewster Kahle, has had the mission to “provide Universal Access to All Knowledge” by creating a digital history of our society through archiving the internet, literary works, and many forms of multimedia. There are over 735 billion web pages, 41 million books and texts, 14.7 million audio recordings, 8.4 million videos, 4.4 million images, and 890k software programs (archive.org). The Archive works with the Smithsonian, the Library of Congress, and many other libraries and universities around the world to maintain unique digital collections. Their website states they currently “scan 4,300 books per day in 18 locations around the world”, working hard to provide everyone - including those with print disabilities - free access to their digital library.

While books published before 1924 and those in the public domain can be downloaded, newer books can be borrowed. In 2011, the Archive started a lending program through its “Open Library” that allows a single user to digitally check out a scanned book. This “Controlled Digital Lending (CDL) initiative allows one person to check out the digital copy of each scanned book. The idea is that the purchased physical book is being lent in digital form but no extra copies are being lent” (Claburn). In other words, the Internet Archive has in its possession a physical copy of a book, which it has scanned, and allows one person at a time to access it digitally, for free. I believe users can currently have up to 10 checked-out books at a time.

Many libraries today operate in a similar manner with their digital lending and e-book programs, though your local library has licensing restrictions that may limit things like how often a book can be checked out, or how long the library can keep the e-book in circulation. Your local library (unless it worked with the IA) paid publishers for the right to distribute e-books to its patrons.

During the pandemic lockdown, when libraries across the country closed for health reasons, the Archive removed the one-person-at-a-time restriction in the CDL by declaring a temporary “National Emergency Library” and providing digital copies for anyone to borrow from their homes. In addition to the millions of public domain books offered, the Emergency Library program also allowed authors to opt in by donating their books for readers, or to opt out on request. This National Emergency Library program ended in June of 2020 - but it was enough to ruffle the feathers of publishers who seemed to have looked the other way since 2011.

That same summer, four book publishers (Hachette Book Group, HarperCollins Publishers, John Wiley & Sons, and Penguin Random House) filed a lawsuit over this digital lending, claiming copyright infringement and financial harm. Oral arguments in this case, Hachette v. Internet Archive, were heard this past Monday, the 20th of March, nearly three years later.

The Archive did not pay licensing fees to the book publishers for either its Open Library lending program or the National Emergency Library program, which allowed multiple people to check out a book at the same time. The court complaint states “without any license or any payment to authors or publishers, IA scans print books, uploads these illegally scanned books to its servers, and distributes verbatim digital copies of the books in whole via public-facing websites” (see link below for full text). These were books of which the Archive had at least one physical copy in its possession, and which it had scanned into its digital library. “The central question in the case, as summarized during oral arguments by Judge John Koeltl, is: does a library have the right to make a copy of a book that it otherwise owns and then lend the ebook it has made without a license from the publisher to patrons of the library?” (Claburn)



View a transcript of the whole complaint, filed in 2020: https://storage.courtlistener.com/recap/gov.uscourts.nysd.537900/gov.uscourts.nysd.537900.1.0_1.pdf



Defendant’s answer:  https://storage.courtlistener.com/recap/gov.uscourts.nysd.537900/gov.uscourts.nysd.537900.33.0_1.pdf


Publishers claimed a financial loss from the Archive’s lending programs, but the lawyer for the Internet Archive, “Joseph Gratz, argued that the Open Library’s digitization of physical books is fair use, and publishers have yet to show they’ve been harmed by IA’s digital lending” (Belanger). Additionally, a recent Ars Technica article points out that “during this same time, however, the book publishing industry experienced so much demand that revenues rose by 12 percent, amounting to a $3 billion spike in sales by 2021…” (Belanger).

The pandemic was no doubt a profitable time for book publishers - yet they argue they could have made more. The book publishers’ lawyer, Elizabeth McNamara, “seemed to suggest that publishers would have been further enriched if not for IA providing unprecedented free, unlimited e-books access. She also told Koeltl that publishers suing - Hachette, HarperCollins, Penguin Random House, and Wiley - are concerned that there are already some libraries avoiding paying e-book licensing fees by partnering with IA and making their own copies. If the court sanctioned IA’s digitization practices and thousands of libraries started digitizing the books in their collections, the entire e-book licensing market would collapse” (Belanger).

To be fair, I might side with the book publishers’ views on the licensing fee issue - but I’m very worried about what the outcome of this case might mean for the future of the Internet Archive and similar digital libraries. It’s easy to see the book publishers as a bit greedy here, but should they allow libraries to make and distribute copies of their books? Does current copyright law protect digital libraries? What about in the case of a national emergency, like a pandemic?

The Internet Archive feared the worst. It’s not fair to assume that other libraries will take on the IA’s role or repository if there is a judgment against them; as founder Brewster Kahle wrote, “the publishers are now demanding that those millions of digitized books, not only be made inaccessible, but be destroyed. This is horrendous. Let me say it again - the publishers are demanding that millions of digitized books be destroyed…And if they succeed in destroying our books or even making many of them inaccessible, there will be a chilling effect on the hundreds of other libraries that lend digitized books as we do” (Claburn, quoting Kahle).

The court case does state that “the Internet Archive provides a number of services not at issue in this action, including its Wayback Machine and digitization of public domain materials” so we don’t have to worry about that at the moment.

In its answer to the complaint, the Internet Archive states that “all of the works at issue in this case have been removed from the Internet Archive’s websites” back in 2020. I assume this means they have retained the physical copies and digital scans, but have removed them from public display and from the lending program.

It’s not the first such case where digital libraries have come under fire. Past related litigation includes: Authors Guild v. Google; McGraw-Hill v. Google; and Authors Guild v. HathiTrust. If you’re interested in reading more, the following article by Argyri Panezi in the Cornell Journal of Law and Public Policy does a good job of summarizing these past cases and the details of the copyright issues: https://community.lawschool.cornell.edu/wp-content/uploads/2022/07/Panezi-final-1.pdf


The Judgement

The ruling was a loss for the Archive: “after three years of litigation Koeltl easily found for the publishers, holding that the Internet Archive’s scanning and lending clearly constituted a prima facie case of copyright infringement and that the Internet Archive’s fair use defense failed on the facts and the law” (Anderson). This detailed Publishers Weekly article explains Judge Koeltl’s reasoning.

Furthermore, the Judgement says “IA’s wholesale copying and unauthorized lending of digital copies of the publishers’ print books does not transform the use of the books, and IA profits from exploiting the copyrighted material without paying the customary price” (Anderson).

View the judge’s full Opinion and Order, dated 24 March 2023:

https://storage.courtlistener.com/recap/gov.uscourts.nysd.537900/gov.uscourts.nysd.537900.188.0.pdf

So what does all this mean for the future of the Internet Archive?

The order states that “IA remains entitled to scan and distribute the many public domain books in its collection...It also may use its scans of the Works in Suit, or other works in its collection, in a manner consistent with the uses deemed to be fair in Google Books and HathiTrust” (pg 45, link above). While this means the digital scans stored at the Internet Archive likely won’t be destroyed, some of your search results for books not in the public domain may soon be limited to “preview”-only sections, or even less: indexed titles.

A statement from the Internet Archive founder, Brewster Kahle, reads:

“Libraries are more than the customer service departments for corporate database products. For democracy to thrive at global scale, libraries must be able to sustain their historic role in society - owning, preserving, and lending books.

This ruling is a blow for libraries, readers, and authors and we plan to appeal it” (blog.archive.org).


While it is too late to make a difference in this case, there are still some things you can do:


Sources and Further Reading: