Zappi’s perspective on human, AI and synthetic research & how to determine the right approach

AI and synthetic research methods are reshaping how brands gather consumer insights, which makes for an exciting time of change.

New use cases are emerging as more teams gain access to consumer insights at a scale and speed never before possible. But as adoption accelerates, terminology is becoming increasingly blurred, and it’s important to understand that not all “synthetic” approaches are created equal.

These approaches vary dramatically in how they are built, what they are trained on, and most importantly, what types of decisions they can reliably support.

The wrong synthetic approach applied to the wrong business problem can create false confidence at scale. While large language models (LLMs) are exceptionally good at producing fluent, authoritative-sounding answers, confidence is not the same as accuracy.

The critical question is not whether AI can generate an answer. It’s what the model was trained on, how it’s validated and whether it’s fit for the decision being made and the level of risk that decision can tolerate.

In this article, we’ll cover Zappi’s perspective on these types of research approaches, where we believe each delivers the greatest value, where caution is warranted and how we see human, AI and synthetic approaches evolving together.

Zappi’s stance on AI, human and synthetic research

At Zappi, we believe synthetic is best understood as a complement to human research, not a replacement for it.

The most effective decision systems combine human, synthetic, AI and machine learning approaches, using each where it creates the most value. The goal is not to remove people from the process, but to help brands move faster, test more, learn earlier and focus human research where it matters most.

"We as researchers are in an incredibly brilliant position because we can be the data asset that is being used by AI to create new ideas and products. But the critical piece is that we are the ones who manage and worry about that data."
- Steve Phillips, founder and Chief Innovation Officer, Zappi

We believe the greatest opportunity for synthetic research is not simply doing existing research faster and cheaper. It’s unlocking insight generation in places where insight has historically been too slow, too expensive or operationally impossible to obtain.

This becomes particularly powerful in high-volume, low-risk or exploratory use cases, such as:

Narrowing down large volumes of early-stage product ideas
Testing large sets of advertising assets at scale
Exploring secondary or emerging markets where traditional research may not be prioritized
Generating earlier directional feedback before investing in deeper human validation

In these examples, synthetic approaches can dramatically scale the learnings available to marketing and insights teams, enabling more iteration, better prioritization, broader exploration and faster decision-making cycles.

At the same time, it’s important to remember that not every decision carries the same level of risk. High-stakes decisions such as major innovation bets, launch validation or significant brand investments still require human validation.

The key is understanding where synthetic methods create leverage, where human insight remains essential and how the two work together within a connected learning system.

For insights and marketing leaders, this creates both opportunity and responsibility.

Navigating this shift requires a clear understanding of the different approaches, their strengths and limitations and the questions brands should ask when evaluating partners and solutions.

How to determine if an AI-based or synthetic approach is truly useful and reliable

When evaluating AI or synthetic research approaches, there are three core areas that matter most.

The right solution ultimately depends on:

The type of decision and level of acceptable risk
The quality of the data underpinning the model
The ability to validate against real outcomes

Let’s dive into each of these.

1. Fit for purpose

Not every decision requires the same level of rigor, and not every AI or synthetic approach is designed to solve the same problem.

Match the method to the level of risk

The level of risk in the decision should inform the appropriate research approach.

Human research remains critical for high-risk decisions. AI prediction models may be appropriate for lower-risk, high-volume decisions, while synthetic approaches can play a valuable role in early screening, directional feedback and exploration.

The key is to match the depth of research to the level of investment and the reversibility of the decision. A $10 million TV ad and a multi-channel social campaign deserve different methodological approaches, but both still deserve meaningful consumer grounding.
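As a rough illustration of matching method to risk, the idea above could be sketched as a simple routing rule. This is a minimal sketch, not Zappi's method: the `Decision` class, the `suggest_method` function, the dollar thresholds and the returned labels are all hypothetical and would need to be set by each organization.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    """A hypothetical description of a decision that needs consumer input."""
    name: str
    investment_usd: float
    reversible: bool


def suggest_method(decision: Decision, high_stakes_usd: float = 1_000_000) -> str:
    """Suggest a research approach from investment size and reversibility.

    The thresholds here are illustrative assumptions, not industry standards.
    """
    # Large or hard-to-reverse bets are routed to human research.
    if decision.investment_usd >= high_stakes_usd or not decision.reversible:
        return "human research"
    # Mid-sized, reversible decisions: AI prediction, then humans on the finalists.
    if decision.investment_usd >= high_stakes_usd / 10:
        return "AI prediction model, with human validation of finalists"
    # Small, reversible decisions: directional synthetic screening.
    return "synthetic screening for directional feedback"


tv_ad = Decision("national TV ad", 10_000_000, reversible=False)
social_variant = Decision("social post variant", 5_000, reversible=True)
print(suggest_method(tv_ad))
print(suggest_method(social_variant))
```

In practice the inputs would be richer than two fields, but even a simple rule like this makes the risk trade-off explicit and reviewable.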

Remember: prediction without diagnosis is not enough

A score alone is rarely enough to drive action, especially when optimization is the goal.

Reliable AI research approaches should deliver both:

Prediction: What is likely to happen
Diagnosis: Why it is happening and what could improve performance

The “why” still matters.

Understanding what consumers are responding to requires qualitative depth, whether from human open-ended responses or well-grounded synthetic verbatims. Without that diagnostic insight, optimization becomes guesswork.

Ensure insight is embedded in the workflow

Research delivered after a decision has already been made has limited value.

As content volumes grow and production cycles compress, the challenge is not only accuracy. It’s also whether insight is available when decisions are actually being made.

The best approaches embed prediction and guidance directly into existing workflows, rather than positioning research as a disconnected step after product or campaign production.

2. The right underlying data

The quality of any AI or synthetic system is limited by the quality of the data beneath it.

"Data is the underpinning of everything we do, and it’s that old adage of garbage in, garbage out…Frankly, we can start using AI on top of bad data but we’ll get bad outcomes. So you have to get the data right."
- Steve Phillips, founder and Chief Innovation Officer, Zappi

This sounds obvious, but it's one of the most important factors when evaluating vendors and solutions.

The single biggest determinant of whether an AI or synthetic approach is reliable is the data it was trained on.

Generic models trained on broad internet data can generate fluent, plausible-sounding outputs that may have little predictive relationship to real consumer behavior. Without augmentation with specific data, an LLM predicts plausible answers but doesn’t “think” in the messy way humans do, so its responses tend to be generic, overly positive and reflective of stereotypes.

Purpose-built models trained on real consumer responses to real ads, concepts and behaviors against validated norms and consistent methodologies are materially more reliable.

And specificity matters enormously. A model trained on food and beverage advertising will perform very differently from one trained on financial services, even if the underlying LLM is identical.

When evaluating training data, there are three areas that matter most:

1. Breadth of training data

Models trained across diverse categories, brands, markets, audiences and creative styles offer far better insight than narrowly trained systems, because limited or biased datasets inevitably create blind spots. The strongest approaches continuously expand their training coverage over time.

2. Consistency of training data

Accuracy degrades when training data is collected inconsistently. Differences in audience definitions, sampling methods or survey tools can introduce “noise” in the data, making it less able to detect the right signals. Reliable systems require standardized, consistently collected human data.

3. Continuous fueling of fresh data

Consumer behavior, culture and creative constantly evolve. Models trained once and left static will become stale. The most robust systems create a continuous learning loop where every new human study feeds fresh data back into the model, improving performance over time.

3. Always-on, transparent validation

No AI or synthetic approach should be trusted without ongoing validation against real human outcomes.

Accuracy should be transparent and continuously measured, not assessed in a one-off exercise. Vendors should be able to clearly explain how models are validated, how often validation occurs and how model performance is monitored over time.
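One common way to express this kind of validation is the correlation between model predictions and human survey scores for the same set of assets. The sketch below is illustrative only: the `pearson` helper is written out by hand, and the scores are made-up numbers rather than real study results.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for five ads: model prediction vs. human survey result.
predicted = [62, 48, 75, 55, 81]
human = [60, 50, 70, 58, 85]

r = pearson(predicted, human)
print(f"correlation with human results: {r:.2f}")
```

A single correlation figure hides a lot, so it is worth asking vendors how many assets, categories and markets sit behind any number they quote.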

The problem with the term “synthetic”

Before we go on, it’s worth pausing on the term “synthetic” itself.

A big challenge in this space is that “synthetic” has become a catch-all term for very different technologies. These approaches are often discussed as though they are interchangeable.

But they are not.

As discussed above, the right approach depends on the decision being made, the level of acceptable risk, the quality of the underlying data and the ability to validate outputs against real human results.

That is why “synthetic” cannot be treated as a single methodology. It spans at least three fundamentally different approaches, each with different strengths, limitations and appropriate use cases. Conflating them creates real risk, either by over-relying on methods that cannot support high-stakes decisions or dismissing approaches that can create meaningful speed and efficiency gains.

To evaluate these approaches properly, it’s important to first understand what each one actually is and what role it is best suited to play.

Three distinct approaches — not one

The sections below provide a framework for understanding the different approaches, matching methods to decisions and asking the right questions of any vendor.

These approaches may exist independently, in combination or in solutions that loosely span multiple definitions.

1. AI prediction models (Machine learning)

What they are: Models trained on large databases of real human responses to ads, ideas or concepts. They learn patterns from historical data and predict how audiences will respond to new stimuli.

Pros:

Typically 60–85% correlation with human survey results when well-trained
Can evaluate dozens or hundreds of assets quickly and affordably
Grounded in real data, not assumptions
Can identify relative winners and losers across a portfolio of creative

Considerations:

Provide more of the “what” than the “why”
Accuracy degrades significantly if the model is trained on generic, outdated or irrelevant category data

Best suited to: Screening and validating large volumes of digital creative where consumer response insights are currently lacking and human research may not provide sufficient return on investment.

2. Synthetic respondents (LLM-based personas)

What they are: AI-generated personas designed to simulate how specific demographic, attitudinal or behavioral profiles might respond — effectively skilled actors improvising consumer reactions. The more the generic LLM data is augmented, the better the outcome.

Pros:

Useful for early-stage exploration before investing in human research
Helpful for generating qualitative “why” insights, including likely reactions, concerns and directional messaging feedback
Can filter large volumes of early-stage concepts to identify the most promising ideas

Considerations:

Lower accuracy for closed-ended quantitative responses versus a prediction model, particularly outside US and English-language contexts due to significant LLM bias
Based on assumed patterns rather than lived experience, making them less reliable for culturally nuanced ideas. Responses tend to be more generic, stereotypical and positive, without the ‘messiness’ of humans
Should be treated as directional, not definitive — they augment human research, not replace it
Without grounding in real survey data, outputs can appear precise while remaining statistically artificial

Best suited to: Early-stage screening, exploring potential consumer reactions before committing budget and filtering large idea sets prior to human validation.

3. Digital twins (Individual behavioral replicas)

What they are: Virtual replicas of real individuals, built from actual behavioral data such as purchase history, media habits and survey responses that can simulate future actions.

Pros:

Strong individual-level behavioral prediction (70–90% accuracy on behavioral tasks)
Dynamic and able to model scenarios over time
Useful for media optimization, adoption forecasting and long-term scenario planning

Considerations:

Requires deep, high-quality behavioral data that most brands do not have at scale
Less suited to attitudinal or emotional insights or explaining why consumers feel something
Do not connect well to pre-market creative evaluation

Best suited to: Identifying white space and unmet needs, media planning and optimization and behavioral scenario modeling.

Matching the method to the decision

Here’s a visual breakdown to help you better understand the approaches we just covered and where they sit in terms of risk and precision:

[Figure: Chart showing the risk and precision levels across synthetic data approaches]

Key questions to ask any vendor

We just covered a lot in terms of these various approaches. Here are some key questions to keep top of mind when evaluating solutions or conducting vendor research:

1. On the fit for purpose

Do the vendor’s AI and synthetic capabilities align with the decisions and use cases you are trying to support? Depending on the need, you may require fast directional feedback, deeper diagnostic insight or both.
Does the vendor offer both human and AI-based research approaches, allowing methods to be matched to the level of risk and decision type?
Do the human, AI and synthetic offerings share a common framework, allowing results to be meaningfully compared within a broader, holistic research approach?

2. On the right underlying data

What is the model trained on? Is it real consumer responses to real stimuli, or primarily derived from public, online and general LLM training data?
How relevant and broad is the training data for the use case? Does it span the right categories, markets, audiences and creative styles? For some use cases, like product innovation research, depth within a category may matter more than broad horizontal coverage.
How consistently was the training data collected? Differences in audience definitions, sampling methods or survey approaches can introduce the “noise” that models learn and perpetuate. More standardized data generally leads to more reliable outputs.
How frequently is the model updated with new human data? Is there a continuous learning loop or periodic updates?

3. On continuous, transparent validation

How is model accuracy monitored and validated over time? Can the vendor demonstrate ongoing validation against real human outcomes? How does the vendor identify, monitor and correct for model drift?
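To make the drift question concrete, monitoring can be as simple as comparing each new validation wave against an agreed baseline. The sketch below assumes accuracy is tracked as one score per validation wave. The `flag_drift` function, the baseline, the tolerance and the wave scores are all hypothetical.

```python
def flag_drift(wave_scores, baseline, tolerance=0.05):
    """Return the indices of validation waves that fell below baseline - tolerance.

    wave_scores: accuracy per validation wave, oldest first (for example,
    correlation with human results). Baseline and tolerance are assumptions
    that each team would need to agree on.
    """
    threshold = baseline - tolerance
    return [i for i, score in enumerate(wave_scores) if score < threshold]


# Hypothetical accuracy for five consecutive validation waves.
waves = [0.80, 0.79, 0.77, 0.72, 0.70]
drifting = flag_drift(waves, baseline=0.80, tolerance=0.05)
print("waves needing review:", drifting)
```

A flagged wave is a prompt to investigate, for example by refreshing the training data with new human studies, rather than proof that the model has failed.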

Final thoughts: The system matters more than any individual method

AI and synthetic approaches will continue to evolve rapidly. But long term, the advantage will not come from any single model or standalone tool. It will come from the system.

Insights compound when they are connected. Results that sit in disconnected projects, reports or other silos create limited value. Connected insight systems create a continuous learning loop where every test, campaign and optimization helps strengthen future decisions.

This is where we believe the real value of AI and synthetic approaches exists — not simply in generating cheaper and/or faster outputs, but in helping brands build scalable, continuously improving learning systems.

The future is not AI replacing human insight. It is connected systems that combine human insight, AI and synthetic approaches to help organizations test more, learn faster, optimize continuously and build a stronger proprietary data asset over time.