Prova

When AI Workflows Fail In Marketing: What Actually Goes Wrong

Short answer

The most common way AI workflows fail in marketing is inconsistent input data with no human output review before it ships. The fix is not better AI — it is a review checkpoint built into the workflow design.

I am going to tell you about the time Prova's sprint review system broke in a specific and embarrassing way, because I think it is more useful than a list of abstract failure modes.

The sprint review in Prova is the core of how the program works. A learner submits an artifact — a workflow spec, a working prototype — and an AI reviewer evaluates it against criteria for that sprint. The reviewer gives structured feedback. The learner revises or passes. It is the difference between "I read about this" and "I built something that got evaluated."

Here is what happened in an early version: the AI reviewer was receiving inconsistent context about which sprint the user was currently on. The system was passing sprint information into the review prompt, but the data was sometimes stale — a cached value from a previous session, or a session where the user had navigated between sprints without the state updating correctly.

The result was that the reviewer would occasionally evaluate a submission against the wrong sprint's criteria. A user submitting work for Sprint 3 would receive feedback calibrated for Sprint 1 — where the criteria are simpler and the expected artifact is less developed. The feedback sounded confident. It was coherent. It was evaluating the wrong thing.

Users got contradictory signals. Some passed easily on submissions that should have required more work. Some received feedback pointing out missing elements that were correct for a different sprint. The worst part was that the feedback sounded authoritative in both cases.

I did not catch it immediately because the reviews were not obviously broken — they were subtly wrong. The language was fluent. The structure was correct. The criteria being applied were just for a different sprint.

What actually caused it

Not the model. The context.

The AI reviewer was given whatever sprint information was in the prompt at call time. When that information was incorrect — stale, misrouted, or overwritten by a session bug — the reviewer worked with what it had. It had no way to know the context was wrong, because nothing in the prompt told it to validate the sprint ID against the user's current state.

The system trusted the input without verifying it. The input was sometimes wrong. Whenever it was wrong, the output was confidently wrong in the same direction.
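To make "verify before trusting" concrete, here is a minimal sketch of a check that runs before the reviewer is called. It is an illustration, not Prova's actual code: `StaleContextError` and `verify_sprint` are hypothetical names, and the sketch assumes the system can look up the user's current sprint from a source of truth at call time.

```python
class StaleContextError(Exception):
    """Raised when the sprint in the prompt context disagrees with the source of truth."""


def verify_sprint(context_sprint_id, current_sprint_id):
    """Return the sprint ID only if the prompt context matches the user's current state.

    context_sprint_id: the value about to be placed in the prompt (possibly cached).
    current_sprint_id: the value read fresh from the source of truth (assumed available).
    """
    if context_sprint_id is None or current_sprint_id is None:
        raise StaleContextError("sprint ID is missing")
    if context_sprint_id != current_sprint_id:
        raise StaleContextError(
            f"prompt context says {context_sprint_id!r}, "
            f"but the user is on {current_sprint_id!r}"
        )
    return current_sprint_id
```

With a check like this, the Sprint 1 versus Sprint 3 mismatch would raise an error instead of producing a fluent review against the wrong criteria.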

This is the most common way AI workflows fail in marketing, not just in what I built: inconsistent input data with no review step before the output reaches someone who acts on it.

Why the fix is not a better model

When this kind of failure happens, the instinct is to look at the AI. Maybe the model is not smart enough. Maybe a different tool would catch it.

It would not. The model cannot verify context it is not given. A more capable model would produce more fluent wrong answers, which are actually harder to catch.

The real fix is in the workflow design: what context does every invocation of this prompt require, and how do you ensure that context is always current and correct?

In Prova, the fix was adding a structured context block at the start of every review prompt that explicitly included the sprint ID, the current criteria set, and the user's previous submission history for that sprint. Not pulled from cache. Fetched fresh at invocation time. The prompt also included an explicit instruction: "If the sprint ID or criteria set is missing or marked as unknown, do not proceed with the review — return an error requesting the correct context."

That last line is the important one. It gave the reviewer a defined behavior for uncertain input, instead of defaulting to confident output based on bad data.
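The shape of that fix can be sketched as follows. This is a rough illustration under stated assumptions, not the real Prova implementation: `ReviewContext`, `fetch_context`, `build_review_prompt`, and the three loader callables are hypothetical names, the loaders stand in for whatever reads the source of truth directly rather than a cache, and the guard text is the instruction from the post with its punctuation lightly adapted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Guard instruction from the post (punctuation lightly adapted).
GUARD_INSTRUCTION = (
    "If the sprint ID or criteria set is missing or marked as unknown, "
    "do not proceed with the review. Return an error requesting the correct context."
)


@dataclass(frozen=True)
class ReviewContext:
    sprint_id: Optional[str]
    criteria: Optional[List[str]]
    history: List[str]


def fetch_context(
    user_id: str,
    load_sprint_id: Callable[[str], Optional[str]],
    load_criteria: Callable[[str], Optional[List[str]]],
    load_history: Callable[[str, str], List[str]],
) -> ReviewContext:
    """Build the context at invocation time from the (hypothetical) loaders."""
    sprint_id = load_sprint_id(user_id)
    criteria = load_criteria(sprint_id) if sprint_id else None
    history = load_history(user_id, sprint_id) if sprint_id else []
    return ReviewContext(sprint_id, criteria, history)


def build_review_prompt(ctx: ReviewContext, submission: str) -> str:
    """Render the structured context block, marking missing values as 'unknown'."""
    sprint = ctx.sprint_id or "unknown"
    criteria = "\n".join(f"- {c}" for c in ctx.criteria) if ctx.criteria else "unknown"
    history = "\n".join(f"- {h}" for h in ctx.history) if ctx.history else "none"
    return (
        "CONTEXT\n"
        f"sprint_id: {sprint}\n"
        f"criteria:\n{criteria}\n"
        f"previous_submissions:\n{history}\n\n"
        f"{GUARD_INSTRUCTION}\n\n"
        f"SUBMISSION\n{submission}"
    )
```

The design choice worth noting is that missing values are written into the prompt as the literal word "unknown" rather than left blank, so the guard instruction has something explicit to react to.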

The general pattern

Four things, in order, cause AI workflow failures in marketing:

  1. Inconsistent inputs. The workflow receives different data structures at different times, and the prompt does not handle the variation. The model does its best with whatever it gets.
  2. No output review before it ships. Someone automated the generation step but not the review step. The output reaches a customer, a leadership report, or a published page without a human reading it.
  3. Prompt drift. The original prompt worked for the use case as it was designed. Six months later, the inputs have changed slightly and the use case has evolved, but the prompt has not. It still runs. It is no longer calibrated for what it is receiving.
  4. No failure documentation. When something goes wrong, it gets fixed quietly. The team does not document what happened, what caused it, or what the fix was. Three months later, a different person builds the same mistake into a different workflow.
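
For the fourth item, a failure log does not need to be elaborate. A minimal sketch, assuming a shared JSON Lines file is acceptable for the team: the record captures the three things named above (what happened, what caused it, what the fix was). `FailureRecord`, `append_record`, and `load_records` are hypothetical names chosen for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class FailureRecord:
    workflow: str
    what_happened: str
    cause: str
    fix: str


def append_record(path: str, record: FailureRecord) -> None:
    """Append one failure as a single JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")


def load_records(path: str) -> List[FailureRecord]:
    """Read every recorded failure back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [FailureRecord(**json.loads(line)) for line in f if line.strip()]


# Example entry, using the sprint review incident described in this post.
SPRINT_REVIEW_INCIDENT = FailureRecord(
    workflow="sprint review",
    what_happened="Sprint 3 submissions were reviewed against Sprint 1 criteria",
    cause="stale sprint information in the review prompt",
    fix="fetch context fresh at invocation; refuse to review when context is missing",
)
```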

How to build a review checkpoint that works

A review checkpoint is not a human reading every output — that defeats the purpose of automation. It is a defined decision rule for which outputs get reviewed and what happens when something fails.

For any AI workflow that produces output that reaches customers or leadership, define: what does a suspicious output look like? What triggers a human review? What is the fallback if the review fails? These answers belong in the workflow design document before the system goes live.
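
Those three questions can be written down as a small decision rule. The sketch below is one possible shape, not a prescribed implementation: `Rule`, `route_output`, `resolve`, the example rules, and the `hold_and_notify_owner` fallback are all hypothetical, and real rules would come from the workflow design document.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Rule:
    name: str
    is_suspicious: Callable[[str], bool]


# Example rules only; what counts as suspicious depends on the workflow.
EXAMPLE_RULES = [
    Rule("empty_output", lambda text: not text.strip()),
    Rule("unfilled_placeholder", lambda text: "{{" in text),
]


def route_output(output: str, rules: List[Rule]) -> Dict[str, object]:
    """Decide whether an output ships directly or goes to a human, and why."""
    reasons = [rule.name for rule in rules if rule.is_suspicious(output)]
    return {"action": "human_review" if reasons else "ship", "reasons": reasons}


def resolve(
    decision: Dict[str, object],
    reviewer_approved: Optional[bool] = None,
    fallback: str = "hold_and_notify_owner",
) -> str:
    """Apply the review result; use the fallback if the review fails or never happens."""
    if decision["action"] == "ship":
        return "ship"
    if reviewer_approved:
        return "ship"
    return fallback
```

For example, `route_output("Hello {{first_name}}", EXAMPLE_RULES)` routes the output to human review with the reason `unfilled_placeholder`, and `resolve` returns the fallback unless a reviewer approves it.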

In my experience, the checkpoint is triggered about twice as often as there is an actual problem. Most of the time, the output is fine. But the checkpoint being there is what makes the team willing to trust the workflow, because they know the failure mode is covered.

What this means for building

The AI Builder Reality Check post covers the broader pattern of what is harder than it looks when you start building. This post is about one specific mechanism: inconsistent input data flowing into a confident-sounding AI output with no review between them.

The discomfort of writing this is the point. I built the thing. It broke in a specific and predictable way that I should have designed around earlier. The lesson is more credible coming from a concrete failure than from a hypothetical warning.

When you design your first AI workflow for marketing, the review checkpoint is not optional. Build it in from the start. The model will produce confident output regardless of what it receives. The checkpoint is the only part of the system that knows the difference.

Cheers, Chandler

Related reading

AI Builder Reality Check

A reality check for marketers who want to build AI products or internal tools without ignoring cost, compliance, QA, recovery, and users.