
Court filings show Meta paused efforts to license books for AI training



New court filings in an AI copyright case against Meta add credence to earlier reports that the company “paused” discussions with book publishers on licensing deals to supply some of its generative AI models with training data.

The filings are related to the case Kadrey v. Meta Platforms, one of many such cases winding through the U.S. court system that have pitted AI companies against authors and other intellectual property holders. For the most part, the defendants in these cases (AI companies) have claimed that training on copyrighted content is “fair use.” The plaintiffs (copyright holders) have vociferously disagreed.

The new filings submitted to the court Friday, which include partial transcripts of Meta employee depositions taken by attorneys for plaintiffs in the case, suggest that certain Meta staff felt negotiating AI training data licenses for books might not be scalable.

According to one transcript, Sy Choudhury, who leads Meta’s AI partnership initiatives, said that Meta’s outreach to various publishers was met with “very slow uptake in engagement and interest.”

“I don’t recall the entire list, but I remember we had made a long list from initially scouring the Internet of top publishers, et cetera,” Choudhury said, per the transcript, “and we didn’t get contact and feedback from — from a lot of our cold call outreaches to try to establish contact.”

Choudhury added, “There were a few, like, that did, you know, engage, but not many.”

According to the court transcripts, Meta paused certain AI-related book licensing efforts in early April 2023 after encountering “timing” and other logistical setbacks. Choudhury said some publishers, fiction publishers in particular, turned out not to hold the rights to the content that Meta was considering licensing, per a transcript.

“I’d like to point out that the — in the fiction category, we quickly learned from the business development team that most of the publishers we were talking to, they themselves were representing that they did not have, actually, the rights to license the data to us,” Choudhury said. “And so it would take a long time to engage with all their authors.”

Choudhury noted during his deposition that Meta has on at least one other occasion paused licensing efforts related to AI development, according to a transcript.

“I am aware of licensing efforts such, for example, we tried to license 3D worlds from different game engine and game manufacturers for our AI research team,” Choudhury said. “And in the same way that I’m describing here for fiction and textbook data, we got very little engagement to even have a conversation […] We decided to — in that case, we decided to build our own solution.”

Counsel for the plaintiffs, who include bestselling authors Sarah Silverman and Ta-Nehisi Coates, have amended their complaint several times since the case was filed in the U.S. District Court for the Northern District of California, San Francisco Division in 2023. The latest amended complaint submitted by plaintiffs’ counsel alleges that Meta, among other offenses, cross-referenced certain pirated books with copyrighted books available for license to determine whether it made sense to pursue a licensing agreement with a publisher. 

The complaint also accuses Meta of using “shadow libraries” containing pirated e-books to train several of the company’s AI models, including its popular Llama series of “open” models. According to the complaint, Meta may have secured some of the libraries via torrenting. Torrenting, a way of distributing files across the web, requires that torrenters simultaneously “seed,” or upload, the files they’re trying to obtain, which the plaintiffs assert is a form of copyright infringement.




