Researchers get inside the mind of bots, find out what texts they trained on

If you've ever wondered whether that chatbot you're using knows the entire text of a particular book, answers are on the way. Computer scientists have developed a more effective way to coax memorized content from large language models, a development that may address regulatory concerns while helping to clarify copyright infringement claims arising from AI model training and inference.

Researchers affiliated with Carnegie Mellon University, Instituto Superior Técnico/INESC-ID, and AI security platform Hydrox AI describe their approach in a preprint paper titled "RECAP: Reproducing Copyrighted Data from LLMs Training with an Agentic Pipeline."

The authors – André V. Duarte, Xuying Li, Bin Zeng, Arlindo L. Oliveira, Lei Li, and Zhuo Li – argue that the ongoing concerns about AI models being trained on proprietary data and the copyright claims being litigated against AI companies underscore the need for tools that make it easier to understand what AI models have memorized.

Commercial AI vendors generally do not disclose their full training data sets, which makes it difficult for customers, regulators, rights holders – or anyone else, for that matter – to know the ingredients that went into making AI models.

To further complicate matters, the researchers note in their paper that prior techniques for probing AI models like Prefix-Probing have become less reliable because "current models are often overly aligned in their effort to avoid revealing memorized content, and as a result, they tend to refuse such direct requests, sometimes even blocking outputs from public domain sources."

In effect, model alignment, notionally a safety mechanism, ends up keeping model makers safe from scrutiny. Ask a model to quote a passage from a specific book and it may politely decline.

Corresponding author André V. Duarte, a PhD student at CMU and INESC-ID, told The Register in an email about the rationale for the project.

"Although our work frequently uses copyrighted material as a motivating example, the broader scientific goal is to understand how memorization happens in large language models, regardless of whether the underlying data is copyrighted, public-domain, or otherwise," Duarte explained. 

"From a research perspective, any training data is relevant, because the phenomenon we study (verbatim or near-verbatim memorization) can arise across many kinds of sources."

The research isn't exclusively focused on copyrighted material, said Duarte, but that naturally becomes a focal point when explaining the work to the public.

"People are generally less concerned if a model memorizes older books like Pride and Prejudice, and are far more concerned if it can reproduce passages from a book or article for which the model may not have had permission to train on," he explained.

"Copyrighted examples therefore make the real-world stakes of memorization easy to understand. That's why developing better methods to detect such memorization is important: it helps clarify what models may have internalized, supports transparency, and could inform discussions about compliance and responsibility."

RECAP – not to be confused with the Free Law Project's RECAP tools – is a software agent (a loop equipped with tools) that tries to extract specific content from LLMs through iterative feedback. It includes a jailbreaking component that rephrases the prompt when a model refuses to respond.

"The key advantage of RECAP is its agentic feedback loop," Duarte explained. "We know from prior work that language models don't always give their strongest or most complete answer on the first attempt. 

"RECAP takes advantage of this by letting the model iteratively refine its own output: after each extraction attempt, a secondary agent reviews the result and provides high-level guidance about what was missing or inconsistent, while taking special care never to include any verbatim text from the target passage, since that would contaminate the pipeline."

Using a benchmark of their own design called EchoTrace, the authors report that RECAP achieves an average score of 0.46 on ROUGE-L [PDF], a metric originally developed for evaluating text summarization that measures longest-common-subsequence overlap between two texts. That represents a 78 percent improvement over the best prior extraction method.
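For readers unfamiliar with the metric: ROUGE-L scores how much of a reference text an output recovers, via the longest common subsequence (LCS) of their word tokens. A minimal implementation of the standard LCS-based F-measure looks like this:

```python
def lcs_len(a, b):
    # Dynamic-programming length of the longest common subsequence.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(candidate, reference):
    # ROUGE-L F1 over whitespace-tokenized words.
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

A score of 1.0 means a verbatim reproduction; 0.46, averaged over many target passages, indicates substantial partial recovery of the original wording.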

The paper states, "While we acknowledge RECAP to be computationally intensive, across multiple model families, RECAP consistently outperforms all other methods; as an illustration, it extracted about 3,000 passages from the first 'Harry Potter' book with Claude-3.7, compared to the 75 passages identified by the best baseline."

Coincidentally, Claude's maker, Anthropic, agreed in September to pay at least $1.5 billion to settle authors' copyright claims. ®

Source: The Register
