State Media Control Influences LLM Behavior via Training Data

Research published in Nature finds that government-controlled media shapes AI chatbot responses by flooding training data with biased content.

By NewsNews AI
A rack of servers in a server room · Photo: Kevin Ache on Unsplash

State Media Influence on AI

Government-controlled media influences the output of large language models (LLMs) by shaping the training data used to build them. According to research published in Nature, AI chatbots may provide different responses to the same political question depending on the language used for the query.

Findings indicate that models queried in the native languages of countries with lower media freedom show a higher tendency to produce pro-government responses. Specifically, pro-government responses appeared 75% more frequently when queries were made in those native languages.
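
To make the finding concrete, the comparison can be pictured as asking a chatbot the same political question in English and in the native language, then coding each answer's stance. The sketch below is purely illustrative: query_model fakes the chatbot call and the stance labels are invented, so nothing here reflects the study's actual tooling or results.

```python
# Purely illustrative sketch -- not the study's code. query_model fakes a
# chatbot call and the stance labels are invented placeholders.

QUESTIONS = {
    "en": "<political question, phrased in English>",
    "native": "<the same question, phrased in the native language>",
}

def query_model(prompt: str, lang: str) -> str:
    # Stand-in for a real chatbot API call; replace with the model under test.
    canned = {"en": "a critical answer", "native": "a pro-government answer"}
    return canned[lang]

def is_pro_government(answer: str) -> bool:
    # Crude stand-in for the study's response coding.
    return "pro-government" in answer

stances = {lang: is_pro_government(query_model(q, lang))
           for lang, q in QUESTIONS.items()}
print(stances)  # {'en': False, 'native': True}
```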

Methodology and Data Tracing

The study was conducted by researchers from Princeton University, Purdue University, and the University of California San Diego. To determine how institutional influence persists through the AI training process, the authors first analyzed real training data to measure how frequently state-coordinated media appears in it.
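
A simplified version of that tracing step can be sketched as follows: given web-scraped training documents tagged with their source URLs, count what share of each language's slice traces back to a watch-list of state-coordinated outlets. The domain list, corpus format, and numbers below are invented for illustration; this is not the study's actual code or data.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical watch-list of state-coordinated outlets; the study's real list
# and corpus format are not given in this article, so these are placeholders.
STATE_MEDIA_DOMAINS = {"state-outlet.example", "another-outlet.example"}

def state_media_share(corpus):
    """corpus: iterable of (source_url, language) pairs for training documents."""
    totals, flagged = Counter(), Counter()
    for url, lang in corpus:
        totals[lang] += 1
        if urlparse(url).netloc.lower() in STATE_MEDIA_DOMAINS:
            flagged[lang] += 1
    # Fraction of each language's documents that trace to state-coordinated media.
    return {lang: flagged[lang] / totals[lang] for lang in totals}

sample = [
    ("https://state-outlet.example/story", "native"),
    ("https://independent.example/story", "native"),
    ("https://independent.example/story", "en"),
]
print(state_media_share(sample))  # {'native': 0.5, 'en': 0.0}
```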

By tracing this influence, the researchers demonstrated that governments can shape what AI chatbots say by controlling the web content from which these models learn. The study highlights a correlation between a country's level of media freedom and the bias present in the AI's language-specific outputs.
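
The correlation the study reports can be pictured as pairing each country's media-freedom score with the rate of pro-government responses measured in that country's language, then computing a correlation coefficient. The numbers below are fabricated stand-ins for illustration only, not the study's data.

```python
from statistics import correlation  # Pearson's r, Python 3.10+

# Fabricated stand-in numbers, NOT the study's data. freedom_score: higher
# means more media freedom; bias_rate: share of pro-government responses
# when the model is queried in that country's native language.
freedom_score = [12, 25, 48, 71, 90]
bias_rate = [0.62, 0.55, 0.30, 0.18, 0.09]

# A strongly negative r would match the reported pattern: the less media
# freedom a country has, the more pro-government its language's outputs skew.
print(round(correlation(freedom_score, bias_rate), 2))
```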

Implications for AI Governance

The research suggests that the data used to train LLMs is not neutral, as it often reflects the media environments of the countries where the data originates. Because LLMs ingest vast amounts of web-scraped data, the prevalence of state-coordinated content in certain languages directly shapes the model's internal representations of political facts and narratives.

This mechanism allows governments to influence AI behavior indirectly by flooding the digital ecosystem with biased content, which is then absorbed by the models during the pre-training phase.

Sources (8)

How NewsNews AI made this story

NewsNews AI researched this story across 8 sources, drafted it, and ran the result through an independent editorial pass. It cleared editorial review on first pass.

  • 8 sources cited · linked in full at the bottom of the article
  • Image license verified · Unsplash
  • Independent editorial pass · approved

From the editor

Verified the previous fix landed correctly: keyFact 1 no longer references the University of Oregon, and the body's methodology section accurately lists only Purdue, UC San Diego, and Princeton per source [^7]. All body claims are supported by their cited snippets — the 75% figure is confirmed by [^6], the language-dependent response finding by [^3] and [^4], and the training data tracing methodology by [^7]. No fabricated quotes, no unsupported claims, and no new issues introduced by the revision.

More about our editorial process

Feedback

We want to hear from you, especially when something is wrong. No signup, no email required.
