“Synthetic Respondents”? Long live the survey, but with AI assistance

Table of contents

  • What are “synthetic respondents”?
  • Two ways to use AI to generate synthetic respondents
  • Use cases: description or prediction
  • Empirical tests
  • AI helping us with surveys

It’s the latest revolution that promises to change everything. The innovation that, this time, truly seems destined to end the survey: synthetic respondents. While we are still trying to grasp the impact of artificial intelligence (AI) on our lives, new and surprising uses of this technology are emerging across activities and industries. AI is already used to supplement or complement the work of translators, editors, legal advisors, computer programmers, designers, and a long list of others. And the world of market research was not going to be an exception.

 

What are “synthetic respondents”?

The idea is related to the more general concept of synthetic data, which consists of creating artificial data that mimics the statistical characteristics of real data without containing information from real people. This data is usually generated through algorithms, simulations, or statistical models to reproduce the patterns and correlations of real data in contexts where obtaining it would be impossible or too costly. For example, if we want to simulate the volume of patients that a hospital's emergency care system can handle, we can estimate the waiting times for different patient volumes using real data on when patient visits accumulate, how much time their care typically requires, etc.
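To make the idea concrete, here is a minimal, self-contained sketch of this kind of simulation in Python. It assumes Poisson arrivals, exponential service times, and a single point of care, which are simplifying assumptions chosen for illustration; in practice the parameters would be calibrated with the real data mentioned above.

```python
import random

def simulate_waiting_times(arrivals_per_hour, mean_service_min, n_patients=10_000, seed=42):
    """Generate synthetic emergency-care waiting times (single-queue sketch)."""
    rng = random.Random(seed)
    arrival_clock = 0.0    # minutes since the simulation started
    server_free_at = 0.0   # when the single point of care becomes free
    waits = []
    for _ in range(n_patients):
        arrival_clock += rng.expovariate(arrivals_per_hour / 60)  # next arrival
        start_of_care = max(arrival_clock, server_free_at)
        waits.append(start_of_care - arrival_clock)               # time spent waiting
        server_free_at = start_of_care + rng.expovariate(1 / mean_service_min)
    return waits

# Compare expected waits under two hypothetical patient volumes
for rate in (4, 6):  # patients per hour
    waits = simulate_waiting_times(arrivals_per_hour=rate, mean_service_min=8)
    print(f"{rate} patients/hour -> average wait {sum(waits) / len(waits):.1f} min")
```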

The use of synthetic data is not new. Its origins date back to the mid-20th century, when statistical techniques such as the Monte Carlo method were developed to generate simulated data from known probability distributions, for the less-than-noble purpose of developing the atomic bomb.

But could synthetic data be used to replace the data we obtain from consumers through surveys? In other words, can we create “synthetic respondents”? This would allow for statistical analysis and decision-making, just as we currently do with survey data, but at a lower cost, more quickly, and without the problems associated with personal information privacy.

The emergence of Large Language Models (LLMs), such as ChatGPT (OpenAI), Gemini (Google), or Copilot (Microsoft), opens the door to an ambitious idea: generating data without relying on real people. Given these models’ ability to answer questions with coherent, well-structured reasoning, often indistinguishable from a human’s, it is almost inevitable to consider using them to simulate human responses. And, of course, the proposal is as tempting as it is promising.


Two ways to use AI to generate synthetic respondents

How can we use AI to replace the responses of real people in quantitative studies? Essentially, there are two strategies.

The first is to use AI to simulate individual survey respondents. The idea is to define the different population profiles we want to investigate and ask the AI, through detailed instructions provided via “prompts” (textual cues that guide the model on what and how to respond), to generate plausible responses that these profiles might give. For example, in an urban mobility study, I could ask the AI to answer a questionnaire about the use of transport for daily city commutes as if it were a 25-year-old man, or as if it were a 45-year-old woman. By requesting multiple responses for each profile, I would end up with a dataset comparable to what I would get from a traditional survey.
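As a rough illustration of this first strategy, the sketch below uses the OpenAI Python client to ask the model to answer a single closed question as a simulated respondent. The model name, profile wording, and questionnaire item are illustrative assumptions, not a recommended setup.

```python
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

PROFILE = "a 25-year-old man living in a large city who commutes every day"
QUESTION = (
    "Which mode of transport do you mainly use for your daily commute? "
    "Answer with exactly one of: public transport, private car, bicycle, walking."
)

def ask_as_profile(profile: str, question: str, temperature: float = 1.0) -> str:
    """Ask the model to answer a closed survey question as a simulated respondent."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        temperature=temperature,
        messages=[
            {"role": "system", "content": f"Answer as if you were {profile}."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content.strip().lower()

# One simulated response for this profile; repeating the call builds a synthetic sample
print(ask_as_profile(PROFILE, QUESTION))
```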

The second strategy is to skip the simulation of individual responses and ask the model to answer the research questions directly, in aggregated form. Continuing with the example, this would mean asking the AI to estimate the percentage of each population profile that uses each type of transport and the main reasons why. This strategy amounts to receiving a final analysis of data that we never actually observed.
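The second strategy can be sketched just as briefly: instead of simulating individuals, we ask the model for the aggregate distribution in a single call. The prompt, model name, and city description are again illustrative assumptions, and the instruction to reply in JSON is a parsing convenience, not a guarantee that the model will comply.

```python
import json
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "For 25-year-old men who commute daily in a large European city, estimate the "
    "percentage who mainly use each mode of transport (public transport, private car, "
    "bicycle, walking). Reply only with a JSON object mapping each mode to a percentage."
)

raw = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    temperature=0,        # we want the model's single best aggregate estimate
    messages=[{"role": "user", "content": PROMPT}],
).choices[0].message.content

print(json.loads(raw))  # may raise if the model wraps the JSON in extra text
```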

Each strategy has its pros and cons. Generating individual responses via AI offers greater granularity of information. LLMs do not always give the same answer to the same request; they generate possible responses, each with different probabilities of being “correct” or appropriate. By adjusting a parameter known as temperature, we can control the degree of variability in the responses: with a low temperature, the model tends to always give the most probable answer; with a high temperature, it allows for more variation. If we set a high temperature and repeat the same request several times, we will get different responses, whose frequency will roughly reflect their relative probability. We can use this to request multiple responses for the same simulated profile and thus capture some variability of opinions or behaviors within the population we want to study.

In other words, suppose we ask a simulated profile (for example, a 25-year-old man) what mode of transport he usually uses, and the model estimates an 80% probability that he will answer “public transport” and a 20% probability that he will say “private transport.” With a low temperature and 100 repetitions, we would always get the same answer (“public transport”); with a high temperature, we would instead get roughly 80 answers of “public transport” and 20 of “private transport,” reproducing the expected variability within that profile.
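The effect of temperature can be illustrated without calling any model at all. The sketch below simply rescales the 80/20 distribution from the example by a temperature parameter (the standard softmax-with-temperature trick) and draws 100 simulated answers at each setting.

```python
import math
import random
from collections import Counter

OPTIONS = ["public transport", "private transport"]
BASE_PROBS = [0.80, 0.20]  # the model's probabilities at temperature 1

def sample_answers(temperature: float, n: int = 100, seed: int = 0) -> Counter:
    """Rescale the distribution by temperature and draw n simulated answers."""
    logits = [math.log(p) / temperature for p in BASE_PROBS]
    normaliser = sum(math.exp(l) for l in logits)
    probs = [math.exp(l) / normaliser for l in logits]
    rng = random.Random(seed)
    return Counter(rng.choices(OPTIONS, weights=probs, k=n))

for t in (0.1, 1.0, 2.0):
    print(f"temperature={t}: {dict(sample_answers(t))}")
# Near 0, almost every answer is "public transport"; higher values spread the answers out.
```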

In contrast, generating aggregated responses takes us directly to the end of the process. We are asking the model to perform a certain reasoning—if the term is permitted—to directly estimate the expected response distributions for certain questions. The LLM will operate substantially differently here: it will try to estimate these distributions from the data in its training. Following the example, it would look for information it has seen in its training data about public transport use in the studied city and, if it finds no specific data, it might resort to available information from cities with similar characteristics. It would do something similar to what humans would do: combine information from secondary sources with common sense.

Use cases: description or prediction

The principle behind “synthetic respondents” has a certain logic, but does it really work? Here we can distinguish two main use cases.

The first could be described as a descriptive use: asking an LLM to estimate existing behaviors. For example, we could ask it to tell us what percentage of the population consumes energy drinks or, more specifically, a particular brand of these drinks. In these cases, the models usually perform well, although what they do is not very different from what we could achieve ourselves by searching for available consumption reports and combining them with data such as advertising spending or demographic reports, among others. LLMs are effective at finding and combining data to generate coherent answers. However, these types of studies are currently often resolved using secondary data sources, without the need for surveys.

The real problem arises with predictive use: that is, anticipating a future or present but unobservable behavior, including opinions on topics that have not been previously raised. The vast majority of problems in commercial research with primary sources fall into this category: for example, what percentage of the population would buy a new product, why consumers prefer one brand over another, or what proportion remembers seeing a particular advertising campaign.

Can an LLM really answer these types of questions accurately, for which it has no solid evidence in its training data or in sources accessible on the internet?

 

Empirical tests

This past July, ESRA (the European Survey Research Association) held its biennial conference, a benchmark in methodological survey research and in the development of alternative methods of data collection and analysis. The application of AI to supplement or replace survey data generated great interest.

One of the conference tracks was specifically dedicated to the topic of “Synthetic Data Generation and Imputation with LLMs.” Its opening presentation, by Leah von der Heyde (LMU Munich, Munich Center for Machine Learning), reported on an experiment designed to evaluate the ability of LLMs to replace survey respondents in predicting the results of the 2024 European Parliament elections. The key question of the study was: “Can LLMs predict the aggregated outcomes of future elections?”

To answer this question, the researchers used three LLMs to predict the electoral behavior of 26,000 European voters, prompting the models with individual information about each voter's profile (drawn according to the real demographic composition of the population) and comparing the generated responses with the actual results. They also tried to obtain aggregated estimates from the same models.

The results were, in general, disastrous. Significant differences were observed by country and language, and accuracy largely depended on the prompts including not only sociodemographic data but also attitudinal information. The study's authors emphasized the limited applicability of synthetic samples generated by LLMs for predicting public opinion, which casts doubt on other potential uses in market research. As an example: the average voter turnout predicted by the models was 83%, when the reality was 49%.

Source: von der Heyde et al. (2024)
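Evaluating a synthetic sample of this kind is conceptually straightforward: aggregate the simulated individual answers and compare the result with the observed outcome. The sketch below shows that comparison step; the labels and vote shares in the usage example are placeholders, not the study's data.

```python
from collections import Counter

def compare_with_actual(simulated_votes, actual_shares):
    """Aggregate simulated vote choices and compute the mean absolute error
    against the officially observed vote shares."""
    n = len(simulated_votes)
    predicted = {party: count / n for party, count in Counter(simulated_votes).items()}
    errors = [abs(predicted.get(party, 0.0) - share) for party, share in actual_shares.items()]
    return predicted, sum(errors) / len(errors)

# Placeholder usage: four synthetic voters, three hypothetical parties
predicted, mae = compare_with_actual(
    ["party_a", "party_a", "party_b", "party_c"],
    {"party_a": 0.30, "party_b": 0.45, "party_c": 0.25},
)
print(predicted, f"MAE = {mae:.2f}")
```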

But why does synthetic data fail in these types of tasks? Several researchers—including the authors of this study—mention factors such as biases in the training data, the overrepresentation of certain groups, the inherent complexity of social and political dynamics, the digital divide affecting certain segments of the population, and the hallucinations that sometimes occur in LLM responses.

I would go even further: the real question is not why synthetic data fails, but why we would expect it to work. LLMs identify relationships between words (or parts of words, called tokens) in vast training corpora, using architectures with billions of parameters. These relationships concentrate both human knowledge and, to some extent, the logic that structures it, which allows the models to emulate human reasoning in their responses. But how could an LLM faithfully and representatively model behaviors it has never observed?

The described results are devastating. Even so, some providers are already offering solutions based on “synthetic respondents,” especially geared towards conducting qualitative interviews with certain profiles of interest. I don’t think this qualitative focus is a coincidence, for two reasons:

  1. LLMs are terribly convincing in their responses; they usually make sense and are logical, whether they are correct or not.

  2. In qualitative studies, we do not have an objective truth to compare the results against, so no one can easily dispute the apparent value of the information obtained.

The use of “synthetic personas” may have value for the researcher, but that value probably lies in providing a well-informed interlocutor with whom to explore hypotheses or debate ideas, rather than in faithfully representing a typical member of the target group. Such personas could be useful in the early stages of research to identify promising proposals, but they could never completely replace data generated by humans, as an article published in the Harvard Business Review by Brand, Israeli, and Ngwe points out.

In short, as Nik Samoylov (Conjointly) pointed out, synthetic data could be something like the homeopathy of market research: there is no evidence that it works, but many people still believe in it.


AI helping us with surveys

Despite the above, AI seems destined to play a fundamental role in market research. Several presentations at ESRA addressed these possible uses, summarized in Reveilhac's (2025) presentation, which include:

  • questionnaire design,

  • its translation and adaptation,

  • the development of questionnaires capable of adapting to participants' responses,

  • the prediction and prevention of non-response,

  • the interpretation and coding of open-ended responses (see the sketch after this list),

  • data quality control,

  • imputation of missing values,

  • and even interactive analysis through natural language instructions (“talk to data”).
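As an illustration of one of these uses, the coding of open-ended responses mentioned in the list, here is a minimal sketch with the OpenAI Python client. The code frame, model name, and prompt wording are assumptions made for the example, and in practice a sample of the automatic codes should be checked against human coding.

```python
from openai import OpenAI

client = OpenAI()

CODE_FRAME = ["price", "comfort", "speed", "environment", "other"]

def code_open_response(text: str) -> str:
    """Assign one category from the code frame to an open-ended survey answer."""
    prompt = (
        "Classify the following survey answer about why the respondent chose their mode "
        f"of transport into exactly one of these categories: {', '.join(CODE_FRAME)}.\n"
        f'Answer: "{text}"\n'
        "Reply with the category only."
    )
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        temperature=0,        # coding should be deterministic and reproducible
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content.strip().lower()
    return reply if reply in CODE_FRAME else "other"

print(code_open_response("I take the bus because parking downtown is far too expensive."))
```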

In short, the survey—which has been declared dead so many times (with the rise of the Internet, social media, passive data)—is more alive than ever and, paradoxically, could be strengthened by the arrival of AI.

 

References:

Brand, J., Israeli, A., & Ngwe, D. (2025, July 18). Using gen AI for early-stage market research. Harvard Business Review. https://hbr.org/2025/07/using-gen-ai-for-early-stage-market-research

Reveilhac, M. (2025, July 17). Advancing survey research through AI and machine learning: Current applications and future directions [Conference session]. European Survey Research Association (ESRA) Conference 2025, Utrecht, Netherlands. https://www.europeansurveyresearch.org/conf2025/prog.php?sess=137#main

von der Heyde, L., Haensch, A.-C., Wenz, A., & Ma, B. (2024). United in diversity? Contextual biases in LLM-based predictions of the 2024 European Parliament elections (Version 2) [Preprint]. arXiv. https://doi.org/10.48550/arXiv.2409.09045
