The Risks of Simulating Insights with LLMs

Table of contents

  • Disconnected from the Cause
  • Heartless
  • Biased Representativeness
  • One Size Fits None
  • Intermittent Consistency

The rise of Large Language Models (LLMs) undoubtedly represents a paradigm shift for the market research industry. The ability of Artificial Intelligence (AI) to streamline research processes and analyze large volumes of human-generated data offers undeniable advantages. However, one of its most debated applications is the fabrication or simulation of insights.

This involves using LLMs to mimic human responses by creating "synthetic respondents": AI agents designed with specific demographic characteristics, preferences, or even personalities, which simulate "human" answers. The result is a new category of Synthetic Data that promises faster and cheaper solutions for market research.
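To make the idea concrete, here is a minimal sketch of what a synthetic respondent can look like in practice. The field names, persona wording, and workflow are purely illustrative assumptions, not a reference to any particular vendor's implementation; the generated text would typically be sent as the system prompt of a chat-style LLM, and the model's reply recorded as if it were a survey answer.

```python
from dataclasses import dataclass

@dataclass
class SyntheticRespondent:
    """Illustrative persona for a simulated survey respondent (all fields are assumptions)."""
    age: int
    country: str
    occupation: str
    traits: str

    def system_prompt(self) -> str:
        # The persona is turned into instructions for a chat-style LLM;
        # the model's reply would then be treated as a "survey answer".
        return (
            f"You are a {self.age}-year-old {self.occupation} from {self.country}. "
            f"Personality: {self.traits}. Answer survey questions in the first person, "
            "as this person would."
        )

respondent = SyntheticRespondent(age=34, country="Spain", occupation="nurse",
                                 traits="pragmatic, price-sensitive")
print(respondent.system_prompt())
```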

However, despite their appeal, these fabricated data carry significant risks. Over-reliance on synthetic responses can lead to flawed insights and poorly founded decisions. These challenges can be summarized in what we call the 5 blind spots of synthetic responses:

  • Disconnected from the cause: Synthetic responses often omit the underlying "why," offering limited interpretations that fail to connect the dots meaningfully.

  • Heartless: They lack authentic human emotion, making it difficult to connect with real human behavior.

  • Biased Representativeness: They offer a partial view that does not accurately reflect the complexity of the real world.

  • One size fits none: They tend to be rigid and standardized, failing to capture the diversity of human interactions.

  • Intermittent Consistency: They can alternate between useful results and notable inaccuracies, undermining their credibility over time.

Let's explore these blind spots further.

1. Disconnected from the Cause

Although LLMs learn in a way that resembles human cognitive development, there are key differences. Humans develop general intelligence through varied experiences; in contrast, Artificial Intelligence needs enormous volumes of specific data to perform "narrow" or more limited tasks.

Research suggests that a child's tendency to ask "why?" is linked to cognitive development, especially the understanding of causality. Humans reject "black box" models and seek to understand the underlying mechanisms, which is not the case with LLM learning.

For example, an LLM might predict that a certain demographic group prefers a product but overlook the cultural reasons driving that choice. Although LLMs and correlation-based models can make effective predictions, they often cannot explain the why behind phenomena. A classic example is the correlation between eating ice cream and getting sunburned: both rise on sunny days, so to identify the real cause one must control for sun exposure. Causality-based predictions are more robust because they identify how one factor directly influences another.
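A quick, hedged simulation makes the point. The numbers below are entirely invented for illustration: sunny weather drives both ice cream consumption and sunburn, so the two correlate strongly until the confounder is controlled for.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Entirely synthetic numbers for illustration: sunny weather drives both outcomes.
sun_hours  = rng.uniform(0, 8, n)
ice_creams = sun_hours + rng.normal(0, 1, n)        # more ice cream on sunny days
sunburn    = 0.5 * sun_hours + rng.normal(0, 1, n)  # sun, not ice cream, causes sunburn

# Naive correlation: ice cream "predicts" sunburn.
print(np.corrcoef(ice_creams, sunburn)[0, 1])       # strongly positive

# Control for the confounder by correlating the residuals after removing sun exposure.
def residuals(y, x):
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

print(np.corrcoef(residuals(ice_creams, sun_hours),
                  residuals(sunburn, sun_hours))[0, 1])  # close to zero
```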

Although newer models such as OpenAI's o1 (code-named "Strawberry") are exploring more advanced reasoning approaches, LLMs still have a long way to go before they fully understand the motivations behind human behavior.

2. Heartless

Synthetic responses, while logical, often lack the emotional depth present in human interactions. This limits the ability to understand consumption contexts where empathy, trust, and personal connection are key.

Emotional elements are deeply rooted in the decision-making processes described by Kahneman's System 1: impulsive, fast, and instinctive. Studies like "Digital Respondents and their Implications for Market Research" or "Using Synthetic Data to Solve Client Problems" have shown that responses generated by LLMs, although rationally correct, fail to resonate emotionally. This is partly because LLMs do not experience emotions and, therefore, cannot replicate them authentically.

Thus, even if they can eventually imitate logical decision-making frameworks, they will remain disconnected from the emotional factors that drive human behavior.

3. Biased Representativeness

LLMs are particularly vulnerable to biases present in their training data. If that data reflects historical prejudices or unequally represents certain groups, the generated insights will also be biased.

A recent study by Yan Tao, Olga Viberg, and other researchers, titled Cultural Bias and Cultural Alignment of Large Language Models, showed that models like ChatGPT tend to reflect the cultural values of Northern Europe and Anglo-Saxon countries. Without specific guidance, their responses to the World Values Survey aligned with those cultures.

However, when ChatGPT was asked to respond as if it were a person from specific countries, its answers reflected local values much more closely. This finding underscores the importance of actively managing biases to generate more inclusive results. In this context, having good seed data is fundamental to producing reliable Synthetic Data.
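As a rough sketch of how such a check might be run (this is not the study authors' code), one can ask the same values-style question with and without a country persona and compare the answers. The example assumes the OpenAI Python SDK and an illustrative model name; any chat-capable LLM client would work the same way.

```python
from openai import OpenAI  # assumes the OpenAI Python SDK; any chat-capable client works

client = OpenAI()  # reads OPENAI_API_KEY from the environment

QUESTION = ("On a scale from 1 (not at all) to 10 (very much), how important is "
            "tradition in your daily life? Answer with a number and one sentence.")

personas = {
    "no persona": None,
    "Mexico": "You are an average adult living in Mexico. Answer as that person would.",
    "Japan":  "You are an average adult living in Japan. Answer as that person would.",
}

for label, system_prompt in personas.items():
    messages = [{"role": "system", "content": system_prompt}] if system_prompt else []
    messages.append({"role": "user", "content": QUESTION})
    # Model name is illustrative, not a recommendation.
    reply = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    print(f"[{label}] {reply.choices[0].message.content}")
```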

4. One Size Fits None

One of the most common arguments for Synthetic Data is its ability to replicate average values similar to human data. But this is misleading. Although the means may match, the dispersion (the variability in responses) is often much smaller.

Dispersion metrics like variance or the interquartile range are crucial for understanding how data is distributed. Two datasets with the same mean can lead to completely different interpretations if their dispersions differ.
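A tiny example with invented numbers shows why reporting only the mean hides the problem: two samples of purchase-intent scores can share a mean while one of them has almost no spread.

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented purchase-intent scores (0-10) with the same mean but different spread.
human_like     = rng.normal(loc=7.0, scale=2.0, size=1_000).clip(0, 10)
synthetic_like = rng.normal(loc=7.0, scale=0.5, size=1_000).clip(0, 10)

for name, data in [("human-like", human_like), ("synthetic-like", synthetic_like)]:
    q1, q3 = np.percentile(data, [25, 75])
    print(f"{name:>15}: mean={data.mean():.2f}  var={data.var():.2f}  IQR={q3 - q1:.2f}")
# Similar means, but the synthetic-like sample has far less variance and a much
# narrower IQR: the outliers where innovation often hides simply are not there.
```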

Synthetic data, while useful, tends to be homogeneous and to lose nuance. This can lead to recycled, unoriginal insights. Clayton Christensen, author of the theory of Disruptive Innovation, had a sign in his Harvard office that read "Anomalies Welcome", because innovation often arises from outliers.

Furthermore, relying solely on synthetic data can create dangerous feedback loops, where a lack of diversity compromises future analyses, strategies, and decisions.

5. Intermittent Consistency

Although some blind spots can be corrected with better prompts and reverse engineering, a key problem remains: inconsistency.

Everything seems obvious once you know the answer, but the variability in the quality of LLM responses undermines their reliability. Consumer decisions are shaped by cultural norms, economic conditions, and individual motivations, and detecting and correcting the corresponding biases in LLMs is an emerging field that still lacks definitive solutions. Moreover, creating and training these models requires large volumes of data and computational resources.

Most concerning is that synthetic data exhibits a form of epistemic instability: it can oscillate between brilliant successes and flagrant errors. This inconsistency is reminiscent of the replicability crisis in psychology, where fundamental findings are questioned due to their lack of methodological consistency.

To trust Artificial Intelligence and Synthetic Data, we need solid standards of consistency and reproducibility. Without them, the models lose credibility, and their value as tools to support strategic decision-making is weakened.
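One pragmatic starting point, sketched below under the assumption that you wrap whatever model you use in a simple `ask(prompt) -> str` function (a hypothetical stand-in, not a real API), is to re-run the identical question many times and report how stable the answers are.

```python
import re
import statistics
from typing import Callable

def consistency_report(ask: Callable[[str], str], prompt: str, n_runs: int = 20) -> dict:
    """Send the same prompt n_runs times and summarise how stable the numeric answers are."""
    scores = []
    for _ in range(n_runs):
        match = re.search(r"\d+", ask(prompt))   # pull the first number out of each reply
        if match:
            scores.append(int(match.group()))
    return {
        "parsed_runs": len(scores),
        "mean": statistics.mean(scores) if scores else None,
        "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
    }

# Usage (hypothetical): pass any function that sends a prompt to the model under test.
# report = consistency_report(my_llm_call,
#                             "On a scale of 1-10, how likely are you to buy X? Reply with a number.")
# A large stdev across identical prompts is exactly the intermittent consistency described above.
```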
