Agentic AI

How GAIA Broke our Design-Engineering Handoff (and What We Built Instead) - Part 2

Make Storybook the single source of truth for the design system. Use component stories as acceptance criteria to align design and engineering, preventing drift.

May 18, 2026

Author(s)

Leonardo Lemos

Steven Barber

Todd Gilbert

In Part 1, we laid out the four-layer substrate of an AI-first design system (governance, knowledge, transformation, verification) and the “one artifact, two consumers” framing, where JSDoc and Storybook serve both humans and agents. This post covers what happened when we tried to use it on something the standard handoff couldn’t keep up with.

GAIA Disrupted our Handoff

GAIA (Govern AI Assistant) is Credo AI’s AI-powered governance assistant. The pitch is “Govern every AI system. Not just the ones you have time for,” and it covers guided AI registration, AI-powered questionnaire completion, intelligent risk assessment, and compliance mapping.

We started GAIA differently and didn’t adopt the standard process. The conventional loop is Figma mockups -> engineering review -> screenshot back to designer -> iterate. That loop was too slow for what we were building, and the artifacts it produced were too far from the dynamic behaviour users would actually see.

GAIA needed suggestions surfaced contextually, with inline provenance for why a specific item was suggested. It needed loading, empty, partial, and error states. Fallback chains. UI that streamed. Designers were mocking up things they hadn’t actually seen running. Engineers were shipping ahead of design because they had to. The handoff loop was…broken.

Why: Conversational UX Doesn’t Fit Static Mockups

Four reasons the existing process couldn’t carry GAIA.

1. Too many states. We split our suggestion system into two surfaces precisely because we had too many states to handle. That sprawl reached design, and a single Figma file can’t enumerate them all.

2. Data dependency. Between edge cases, fallback chains, and architectural decisions, design and reality drifted apart often. What looks coherent on a happy-path mock falls apart when you wire it to real responses.

3. Interaction fidelity. A simple hover, a focus ring, a piece of data that hadn’t streamed down to the UI yet. Almost everything changed the rendered output in ways the design wasn’t ready for, and that engineers struggled to translate back into a static mock.

4. Lag time as a UI dimension. This is the one we hadn’t internalised before GAIA. AI features have inherent latency, and that latency isn’t a bug to hide. We didn’t want to apply the “traditional” AI stream process because suggestions are different from chats. But that hurt the very thing we wanted to maximise: the user’s time in governance. The first step takes a moment. The next takes a bit more. The solution was in the problem itself: suggestions stream in over seconds, not all at once. That change reshaped both what we designed and how we built it:

Design. We added loading and status animations after the first step. We changed the AI assist to surface suggestions as they arrived, framed as “1/x ready” and not “1/10 total” because we don’t always know the total at the start.
Build & test. Static prototypes and mock data can’t factor lag in at all. They break the timeline. To get useful feedback on a streaming experience, you need something close enough to the finished product to actually feel the wait. That ruled out anything we couldn’t run.
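
The “1/x ready” framing can be sketched as a small reducer over stream events. This is a minimal illustration, not GAIA’s implementation: the event shapes, the names `reduceProgress` and `progressLabel`, and the placeholder copy are all assumptions. The one idea taken from the text is that the label never invents a denominator it doesn’t have.

```typescript
// Hypothetical event shapes; the post does not describe GAIA's stream protocol.
type StreamEvent =
  | { kind: "suggestion"; id: string }
  | { kind: "total-known"; total: number }
  | { kind: "done" };

interface ProgressState {
  ready: number;
  total: number | undefined; // unknown until the backend reports it
  done: boolean;
}

const initialProgress: ProgressState = { ready: 0, total: undefined, done: false };

// Pure reducer: each incoming event produces the next progress state.
function reduceProgress(state: ProgressState, event: StreamEvent): ProgressState {
  switch (event.kind) {
    case "suggestion":
      return { ...state, ready: state.ready + 1 };
    case "total-known":
      return { ...state, total: event.total };
    case "done":
      // If no total was ever reported, the final ready count becomes the total.
      return { ...state, done: true, total: state.total ?? state.ready };
  }
}

// Builds the status label. "x" stands in for a total we do not know yet.
function progressLabel(state: ProgressState): string {
  if (state.ready === 0) return "Preparing suggestions"; // placeholder copy
  const denominator = state.total === undefined ? "x" : String(state.total);
  return `${state.ready}/${denominator} ready`;
}

// Usage: feed events in arrival order and render the label after each one.
const events: StreamEvent[] = [
  { kind: "suggestion", id: "s1" },
  { kind: "suggestion", id: "s2" },
  { kind: "done" },
];
let progress = initialProgress;
for (const event of events) {
  progress = reduceProgress(progress, event);
  console.log(progressLabel(progress));
}
```

Keeping the reducer pure means the same event sequence can be replayed in a story or a test without a live backend, although replay alone does not reproduce the wait itself.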

Pivot: Storybook as Source of Truth

We moved the “centre of gravity”. Storybook became the place where we now go to change the design system. Not to find docs. Not to find snapshots. Not to compare the built components with the mockups. But to use it as the source of truth itself. A few things made it work:

Real components, real states, based on real data points we use to test and validate.
Stories enumerate states explicitly (e.g. Default, Loading, Error, Empty, Confirmed).
Accessibility addons, viewport configuration, and interaction tests live alongside the component.
The same artifact engineers ship is the same artifact designers review. No screenshot round-trip.
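
As a rough sketch of what “stories enumerate states explicitly” can look like, the snippet below lists one entry per state in the style of Storybook’s Component Story Format. The `SuggestionPanelProps` shape and the sample copy are hypothetical, and the local `Story` type is a stand-in so the snippet runs on its own; in a real stories file the types would come from `@storybook/react` and each entry would be a named export.

```typescript
// Stand-in for Storybook's story object type, so this sketch compiles alone.
type Story<Args> = { args: Args };

// Hypothetical props for a suggestion surface; not GAIA's actual API.
interface SuggestionPanelProps {
  status: "idle" | "loading" | "error" | "confirmed";
  suggestions: string[];
  errorMessage?: string;
}

// One entry per state the design cares about. In a *.stories.tsx file each of
// these would be a named export rather than a key on one object.
const stories: Record<string, Story<SuggestionPanelProps>> = {
  Default: { args: { status: "idle", suggestions: ["Add a risk owner"] } },
  Loading: { args: { status: "loading", suggestions: [] } },
  Error: {
    args: { status: "error", suggestions: [], errorMessage: "Could not load suggestions" },
  },
  Empty: { args: { status: "idle", suggestions: [] } },
  Confirmed: { args: { status: "confirmed", suggestions: ["Add a risk owner"] } },
};

// The list of state names is the part a designer can review at a glance.
console.log(Object.keys(stories));
```

Because every state is a named entry with explicit args, a missing state shows up as a missing name rather than as an unexplored corner of a mock.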

The Enabler: the AI-first Design System

Storybook as the source of truth only works if stories can be authored at the speed of design interaction. Manual story authoring is a bottleneck. The agentic-coding practices from Part 1 of the engineering blog series (rules, commands, JSDoc, story-as-contract) remove that bottleneck. They are grouped into four layers, and each carries distinct weight in the design-engineering handoff:

Layer (Role in the handoff)

Governance - Components follow predictable patterns, naming, a11y, docs.

Knowledge - JSDoc & token schemas give agents enough context to draft a story.

Transformation - Plans & commands let agents scaffold stories, migrate states, refactor props.

Verification - Storybook a11y, visual diff, and interaction tests: the automatic acceptance check.

The verification layer carries extra weight in this model. Storybook becomes a contract: observed, respected, and amended when needs change.
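
One way to picture the contract idea is a coverage check that compares the states a spec requires against the stories that exist. The post does not say how its verification layer is wired, so everything here is an assumption: the `checkStoryContract` name, the report shape, and the choice to treat extra stories as informational rather than failing. The a11y, visual diff, and interaction checks themselves are left to Storybook tooling and are not modelled.

```typescript
// Hypothetical contract: the states signed off for one component.
const requiredStates = ["Default", "Loading", "Error", "Empty", "Confirmed"] as const;

interface ContractReport {
  missing: string[];    // required by the contract, but no story exists
  unexpected: string[]; // stories that exist but were never specified
  ok: boolean;
}

function checkStoryContract(
  required: readonly string[],
  storyNames: readonly string[],
): ContractReport {
  const have = new Set(storyNames);
  const want = new Set(required);
  const missing = required.filter((name) => !have.has(name));
  const unexpected = storyNames.filter((name) => !want.has(name));
  // Extra stories are reported but do not fail the check; only gaps do.
  return { missing, unexpected, ok: missing.length === 0 };
}

// Usage: story names would normally be read from a stories file's exports.
const report = checkStoryContract(requiredStates, [
  "Default",
  "Loading",
  "Error",
  "Streaming",
]);
console.log(report);
```

Amending the contract then means editing the required list in review, which keeps a change of intent visible in the same diff as the stories.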

The New Loop: Designer + Agent + Engineer

A real walk-through: SuggestionButton is a popover that surfaces pending AI suggestions on a use case.

(a) A designer writes the story spec (states, copy, a11y intent):

Hidden when no suggested value
Single suggestion -> “Apply suggestion”
Multiple suggestions -> “Apply suggestions”
With explanation block from the suggestion request
With compaction notice when upstream context was truncated
Popover trigger label; sparkle adornment is decorative; announce apply count on action

(b) An agent drafts the component, the JSDoc, and suggestion-button.stories.tsx against the design-system rules.

(c) The designer manipulates the suggestion list, toggles value types and counts, and requests adjustments directly in the artifact. No screenshot, no spec docs.

(d) An engineer reviews the diff and validates against the verification layer: a11y add-on on the popover, type-check against props, visual regression on the trigger and content surfaces.

(e) Ship. Same artifact. No translation step.
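
To make steps (a) and (b) concrete, here is a minimal sketch of the logic implied by the first three spec lines and the a11y note. It is not the real SuggestionButton: the function names and the announcement wording are assumptions, and only the two button labels are taken from the spec above.

```typescript
/**
 * Label for the popover trigger, derived from the number of pending suggestions.
 * Returns null when there is nothing to suggest, meaning the trigger is hidden.
 * (Hypothetical helper; the post does not show the component's real API.)
 */
function triggerLabel(pendingCount: number): string | null {
  if (pendingCount <= 0) return null; // "Hidden when no suggested value"
  return pendingCount === 1 ? "Apply suggestion" : "Apply suggestions";
}

/**
 * Text for assistive technology after the user applies suggestions.
 * The spec only says to announce the apply count; this wording is a placeholder.
 */
function applyAnnouncement(appliedCount: number): string {
  const noun = appliedCount === 1 ? "suggestion" : "suggestions";
  return `${appliedCount} ${noun} applied`;
}

// Usage: the cases a story set for this trigger would need to cover.
for (const count of [0, 1, 3]) {
  console.log(count, triggerLabel(count));
}
```

Keeping the label rules in plain functions lets the spec lines map one-to-one onto assertions, whichever component renders them.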

Compounding: Primitives Promoted Across Surfaces

GAIA’s bespoke primitives didn’t stay in GAIA. The popover, the pending-acceptance widget, the compaction notice, the some-filled banner: each got a name, a story, and a place in the design system.

Each primitive added means the next surface inherits a vetted, accessible, story-covered component. The loop compounds.

Where Humans Stay

Agents are responsible for scaffolding stories, drafting components, enriching JSDoc, running a11y sweeps, and migrating props.

Humans are responsible for design intent, visual taste, token semantics, edge-case judgment, and deprecation calls.

Here’s how we often think of this interaction:
agent = junior engineer; designer + engineer = judges.

Honest Gaps

A few things we haven’t solved:

Promotion to the design system isn’t automatic. We iterate on what counts as a primitive and whether it belongs in the design system or in a domain-oriented module. That’s a judgment call, not a rule we can codify yet.

Visual taste is still human. Agents don’t reason about whether something feels right, or about cohesion across surfaces; instead, they consistently repeat themselves.

Story-spec authoring is a new designer skill. The learning curve is real, and we’re still figuring out how to support it.

Type-check gaps let some runtime contracts slip past. Engineering review still catches these, and we don’t pretend otherwise.

Context-window limits on large component trees. Agents can’t always see enough to do the right thing in one pass.

Storybook loses fidelity for cross-surface flows. A streaming, multi-page interaction with real network lag isn’t fully reproducible in a story. Figma still earns its keep for route-level prototyping when the flow itself is what’s being designed.

The compounding effect is directional. We feel it, but we haven’t quantified it yet.

What’s next?

We’re expanding this into non-UI engineering areas inside Credo AI. It’s a culture. Engineering teams who build self-documenting systems will prevail because those systems are more architecturally constrained and, therefore, more reliable by design.

We’re also building ways for external builders to connect with and build on top of our platform. We’ll share more soon.

DISCLAIMER. The information we provide here is for informational purposes only and is not intended in any way to represent legal advice or a legal opinion that you can rely on. It is your sole responsibility to consult an attorney to resolve any legal issues related to this information.
