Study Finds AI Models Ignore Warnings and Learn Lies Anyway
New research shows that large language models (LLMs) absorb false statements from training data even when those statements are explicitly labeled as false, a phenomenon the authors call negation neglect. Models fine-tuned on documents containing fabricated claims still internalized and reproduced those falsehoods with high confidence, despite repeated warnings such as “this is false.” The effect persisted when the documents were framed as unreliable or fictional, and when models were later corrected with true information. The study suggests that LLMs prioritize statistical patterns over explicit negation, raising concerns about training data quality and how easily false “beliefs” can be implanted.
Read the full story on Ars Technica →