Product Design

Designing a system that designs systems.

The architecture decisions, token design, cost engineering, and real tradeoffs behind an AI tool that writes directly to Figma. How I designed a credit model that stays honest at $19 a month on real usage.

01

The most valuable thing nobody has time to build.

A proper design system with semantic tokens, multi-mode variables, and documented components is the foundation of scalable product work. Building one from scratch in Figma still takes 8 to 12 weeks with a two-person team. For freelancers juggling multiple clients and small agencies running lean, that timeline is the difference between delivering a proper system and delivering a pile of unconnected components.

The loudest pain point right now is the fragility of Figma's variable system. Moving or reorganizing variables between files frequently causes alias connections to break, forcing designers into hours of manual fixes. Even a simple reorganization can collapse an entire token hierarchy with no error, no warning. Just gone.

I mapped every tool in the space and found that while AI can now generate layouts, it can't generate systems. Figma Make is credit-limited and code-first, 3 frames max with absolute positioning. Relume builds wireframes externally. Nobody writes native variables, tokens, and components through conversation.

02

Four decisions that shaped everything.

Product design for AI tools is mostly about deciding where determinism matters more than flexibility. These four choices defined the architecture and the experience.

Decision 01

Structured commands over arbitrary code

Early versions let the AI write arbitrary code against Figma's API. Most impressive demo ever. Completely unreliable in practice. I replaced it with 70+ structured JSON command types. Less flexibility, finite failure modes. The AI authors a plan. The plugin executes it. No eval, no surprises.
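The structured-command idea can be sketched as a discriminated union the plugin dispatches on. This is an illustrative shape, not Intent Studio's actual schema; the command names and fields are assumptions.

```typescript
// Each command is a plain JSON object with a discriminant, so the
// plugin can dispatch deterministically -- no eval, finite failure modes.
type Command =
  | { type: "CREATE_VARIABLE"; collection: string; name: string; value: string }
  | { type: "CREATE_FRAME"; name: string; width: number; height: number }
  | { type: "BIND_TOKEN"; nodeId: string; property: string; variable: string };

// The executor is a closed switch: an unknown command type is a compile
// error, not a runtime surprise.
function execute(cmd: Command): string {
  switch (cmd.type) {
    case "CREATE_VARIABLE":
      return `variable ${cmd.collection}/${cmd.name} = ${cmd.value}`;
    case "CREATE_FRAME":
      return `frame ${cmd.name} (${cmd.width}x${cmd.height})`;
    case "BIND_TOKEN":
      return `bound ${cmd.variable} to ${cmd.nodeId}.${cmd.property}`;
  }
}

// The AI authors a plan; the plugin executes it in order.
const plan: Command[] = [
  { type: "CREATE_VARIABLE", collection: "Primitives", name: "Primary 500", value: "#2563EB" },
  { type: "CREATE_FRAME", name: "Button", width: 120, height: 40 },
];
const log = plan.map(execute);
```

The payoff of the closed union: every failure mode is enumerable, which is what makes the execution layer testable.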

Decision 02

The brain doesn't live in the plugin

Three components: platform (intelligence), relay (transport), plugin (execution). The AI orchestrator lives on my server. The plugin is a thin execution layer with no AI logic. Figma never sees the orchestration code, I can evolve the AI without touching the plugin, and review approval is simpler.

Decision 03

Pattern matching instead of AI classification

When the plugin encounters an unfamiliar design library, it needs to classify components. Could use AI. Chose deterministic pattern matching instead. Free, instant, predictable. Measured the miss rate on real files. Still nowhere near needing an AI fallback. If-statements beat a model call.
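The deterministic classifier amounts to an ordered rule table. The patterns and role names below are hypothetical stand-ins for the plugin's real rule set; the point is the shape, not the rules.

```typescript
// Ordered name-pattern rules instead of a model call: free, instant,
// predictable. First match wins.
type Role = "button" | "input" | "card" | "unknown";

const rules: Array<[RegExp, Role]> = [
  [/\b(btn|button|cta)\b/i, "button"],
  [/\b(input|field|textbox)\b/i, "input"],
  [/\b(card|tile|panel)\b/i, "card"],
];

function classify(componentName: string): Role {
  for (const [pattern, role] of rules) {
    if (pattern.test(componentName)) return role;
  }
  return "unknown"; // the measured miss rate decides if this ever needs an AI fallback
}
```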

Decision 04

Pre-resolved role mappings in the payload

The AI resolves all design system role mappings (which token maps to which component property) during planning, then passes the resolved map to the plugin as part of the execution payload. The plugin never re-derives mappings on its own. This eliminated a class of scoring failures, particularly with Material Design 3.
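A pre-resolved payload looks roughly like this. The field names are illustrative assumptions; what matters is that the executor only does lookups against a map the planner already resolved.

```typescript
// The planner ships the role->token map alongside the commands, so the
// plugin never re-derives mappings on its own.
interface ExecutionPayload {
  roleMap: Record<string, string>; // role -> variable path, resolved at plan time
  commands: Array<{ nodeId: string; property: string; role: string }>;
}

function applyPayload(payload: ExecutionPayload): string[] {
  return payload.commands.map((cmd) => {
    const variable = payload.roleMap[cmd.role]; // pure lookup, no scoring
    if (variable === undefined) throw new Error(`unresolved role: ${cmd.role}`);
    return `${cmd.nodeId}.${cmd.property} <- ${variable}`;
  });
}
```

Because resolution happens once, at plan time, a scoring failure can only occur in one place instead of being re-rolled on every execution.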

The principle
Deterministic-first AI. Use the model for planning and creative decisions where judgment matters. Use structured commands for execution where consistency matters. Never let the AI improvise during the critical path.
03

The hardest part isn't the components.

Everyone starts with buttons and cards. The actual time sink is the variable layer: primitives aliased to semantics, semantics bound to components, everything working across light and dark. That's 2 to 4 weeks of work nobody sees. It's also the layer that breaks most often when files get reorganized.

Intent Studio authors a three-tier token hierarchy. Primitives are the raw values (color scales, spacing, type sizes) that live in a single mode. Semantics are the purpose-mapped layer (primary, surface, destructive) that get light and dark modes, aliased to primitives. Component tokens bind to semantic tokens and give components their visual properties.
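The three tiers form an alias chain that can be modeled as nested lookups. This is a deliberate simplification (the token names come from the examples here; real Figma variables carry modes per collection), but the resolution path is the same.

```typescript
// Walk the chain: component token -> semantic -> primitive -> raw value.
type Mode = "light" | "dark";

const primitives: Record<string, string> = {
  "Color/White": "#FFFFFF",
  "Color/Neutral Darkest": "#000000",
  "Color/Primary 500": "#2563EB",
};

// Semantics get a value per mode, each aliased to a primitive.
const semantics: Record<string, Record<Mode, string>> = {
  Background: { light: "Color/White", dark: "Color/Neutral Darkest" },
  Accent: { light: "Color/Primary 500", dark: "Color/Primary 500" },
};

// Component tokens bind to semantics, never directly to primitives.
const componentTokens: Record<string, string> = {
  "Button/fill": "Accent",
  "Card/fill": "Background",
};

function resolve(componentToken: string, mode: Mode): string {
  return primitives[semantics[componentTokens[componentToken]][mode]];
}
```

Switching modes swaps only the semantic layer's aliases; components and primitives stay untouched, which is the property that breaks when Figma's alias connections go stale.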

[Interactive demo: a live preview of the three tiers. Primitives (raw values, single mode): Color/White #FFFFFF, Color/Neutral Lightest #EEEEEE, Color/Neutral Light #999999, Color/Neutral Darkest #000000, Color/Primary 500 #2563EB. Color Schemes (semantic aliases, light + dark): Background, Text, Foreground, Border, Accent. Components (bound to semantic tokens): Button, Card, Input.]
03b

Twenty-two tokens, not two hundred.

I scoped the semantic color system to 22 core tokens: 14 foundational, 2 surface-inverse, and 6 interactive states. The AI kept trying to invent extras. Fewer, well-defined tokens with clear purposes beat a sprawling set nobody can remember. Every background gets a paired foreground (on-primary, on-destructive, on-muted). That's the convention that prevents white text on white backgrounds.

Spacing lives in Primitives with a single mode. Colors get light and dark. Conflating the two creates unnecessary complexity. Most mature systems (shadcn, Material, Tailwind) follow this pattern. Intent Studio follows it because it works, not because it's novel.

04

A 5-pass engine built for precision.

Generative AI is inherently unpredictable. Design systems require deterministic precision. Every token needs to resolve. Every alias needs to connect. There's no room for “close enough.” The 5-pass pipeline solves this by separating planning from execution and giving each pass a narrow, testable scope.

  • Pass 0
    Manifest
    Intent parsing, industry aesthetics, token architecture mapping, component planning.
    Haiku
  • Pass 1
    Foundation
    Primitive tokens: color scales, type scales, spacing, breakpoints, variable collections.
    Haiku
  • Pass 2
    Atoms & Components
    Buttons, inputs, cards, and atomic UI elements authored with proper token bindings.
    Sonnet
  • Pass 3
    Sections
    Structural section kits hydrated with design tokens rather than regenerated from scratch.
    Haiku
  • Pass 4
    Assembly
    Page composition, component placement on canvas, documentation layer.
    Haiku

Sonnet handles the pass that needs real judgment about component variants and token bindings. Haiku handles the high-volume execution passes at a fraction of the cost. This tiered routing is what lets the $19 monthly credit allotment cover a realistic month of design work without brushing the ceiling.
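Tiered routing reduces to a small pass-to-model table. The pass numbers match the pipeline above; the model identifiers are placeholders for whatever tier is cheapest or smartest at the time.

```typescript
// Map each pass to a model by the kind of judgment it needs.
const routing: Record<number, "haiku" | "sonnet"> = {
  0: "haiku",  // Manifest: parse intent, plan token architecture
  1: "haiku",  // Foundation: emit primitive tokens (high volume, low judgment)
  2: "sonnet", // Atoms & Components: real judgment on variants and bindings
  3: "haiku",  // Sections: hydrate pre-built kits with tokens
  4: "haiku",  // Assembly: place components, write documentation layer
};

function modelFor(pass: number): "haiku" | "sonnet" {
  return routing[pass] ?? "haiku"; // default to the cheap tier
}
```

Defaulting unknown passes to the cheap tier is a cost-safety choice: a routing mistake degrades quality, not economics.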

The WebSocket relay adds under 200 ms of latency between the platform backend and the Figma plugin. Command batching sends 60+ variables in a single round trip. The live building experience feels nearly instantaneous because the user sees elements appearing on their canvas as the commands execute.
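Batching itself is simple chunking. The chunk size of 60 follows the figure above; the envelope shape is an assumption, not the relay's real wire protocol.

```typescript
// Group many variable commands into fixed-size envelopes so one round
// trip carries 60+ operations instead of one.
function batch<T>(commands: T[], size = 60): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < commands.length; i += size) {
    batches.push(commands.slice(i, i + size));
  }
  return batches;
}
```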

05

Credits that don't run out on real work.

Intent Studio runs on a credit model, not an unlimited pitch. A credit model is the honest way to offer a powerful AI tool at a sustainable price, because inference cost is real and it scales with how much you author. The product design problem is sizing the monthly allotment so a realistic month of design work doesn't brush the ceiling, and driving the cost per authored element low enough that each credit goes further than it should.

I approached this as a product design problem, not an engineering afterthought. The question was never “how do we ration the user.” It was “how do we make each credit worth more by the time it reaches the canvas.”

$0.31
Original cost per full system
$0.25–0.35
Optimized full system, target range

Three strategies compounded: structural section kits (don't regenerate layout structure when you can hydrate pre-built templates with tokens), intelligent model routing (Sonnet for thinking, Haiku for doing), and output optimization (compressed command batching that reduces token count without losing information). Individual component and section requests land in the $0.06 to $0.10 range each.

The $19 monthly tier includes a credit allotment sized for the realistic work of a designer or small team: a handful of full systems and dozens of smaller refinements per month. The average user never sees the ceiling. Heavier months are covered by top-up credit packs so power users have a clear path forward without getting throttled mid-project, and the cost per authored element stays low enough that the economics hold at both ends of the distribution.

06

Things that broke and what I learned.

The interesting product decisions aren't the ones that worked on the first try. These are the ones that taught me something about designing for reliability.

I watched my variables disappear from the file in real time.

Mid-execution, the variables panel went empty. Figma's API returned stale data. The code trusted it, created what it thought were new collections, and the originals got silently replaced. No crash. No error. Just gone.

The fix was straightforward once I understood the cause: never trust a snapshot. Always query live state before writing. But the deeper lesson was about defensive design. When you're building on a platform you don't control, the API's contract is a suggestion, not a guarantee.
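The "never trust a snapshot" rule can be sketched as a write-time re-query. The `CollectionStore` interface below is a stand-in for Figma's variables API, not the plugin's actual code.

```typescript
// Query live state immediately before writing, instead of reusing a
// planning-time snapshot that may have gone stale.
interface CollectionStore {
  list(): string[];           // live query against the platform
  create(name: string): void; // the write we must not duplicate
}

function ensureCollection(store: CollectionStore, name: string): "existing" | "created" {
  // A stale snapshot at this point is exactly what silently replaced
  // the original collections.
  if (store.list().includes(name)) return "existing";
  store.create(name);
  return "created";
}
```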

Lesson: Defensive design isn't paranoia. It's the cost of building on a platform you don't control.

One operation failed. 267 were silently skipped.

A break statement bailed on the first error in a 900-action plan. The logs said "1 failure." The output was catastrophically incomplete. The user would have seen a design system that looked fine until they opened the variables panel and found most of it missing.

The fix was switching to continue-on-error with severity-proportional handling. A missing optional token gets logged and skipped. A failed variable collection creation stops the current pass. Error handling should make consequences proportional to severity.
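Severity-proportional handling looks roughly like the loop below. The action and severity shapes are illustrative; the behavior is the point: optional failures are logged and skipped, critical ones stop the pass, and nothing silently discards the rest of the plan.

```typescript
type Severity = "optional" | "critical";
interface Action { name: string; severity: Severity; run(): void }

function executePass(actions: Action[]): { done: number; skipped: string[] } {
  const skipped: string[] = [];
  let done = 0;
  for (const action of actions) {
    try {
      action.run();
      done++;
    } catch (err) {
      if (action.severity === "critical") throw err; // stop the current pass
      skipped.push(action.name);                     // log, skip, keep going
    }
  }
  return { done, skipped }; // the report shows every skip, not just the first
}
```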

Lesson: Error handling is product design. The failure mode matters more than the failure itself.

White text on a white background. No error. No warning.

Some authored components had invisible text. The foreground token was correct for dark surfaces but was being applied to light ones. The token resolution was technically correct. It just produced something unusable.

Contrast isn't a property of one token. It's a relationship between two. The fix was a foreground pairing convention: every background token gets a paired foreground. Primary background gets primary-foreground, not surface-inverse. This is what shadcn and Material do. Most hand-built systems forget it.
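The pairing convention can be encoded so a missing pair fails loudly instead of rendering invisibly. The token names below follow the shadcn-style `*-foreground` convention; the lookup itself is a sketch, not the authoring engine's real code.

```typescript
// Every background token carries its own foreground, so contrast is
// authored as a pair, not derived token by token.
const pairs: Record<string, string> = {
  primary: "primary-foreground",
  destructive: "destructive-foreground",
  muted: "muted-foreground",
};

function foregroundFor(background: string): string {
  const fg = pairs[background];
  if (fg === undefined) {
    // Fail loudly at authoring time, not invisibly on the canvas.
    throw new Error(`no paired foreground for "${background}"`);
  }
  return fg;
}
```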

Lesson: Token architecture is relationship design. Individual tokens don't fail. Pairings do.
07

What changed.

The numbers tell the story more honestly than I can.

A design system that would take a two-person team weeks to months of manual token work was authored and deployed in under 5 minutes. That's roughly a 90% reduction in the mechanical labor of design system creation. Not the creative labor. The mechanical part. The part where you're manually creating variable collections, aliasing primitives to semantics, binding components to tokens, and checking contrast ratios across two modes. That part.

1,000+ native Figma elements in a single session. All WCAG AA compliant. Proper alias chains across light and dark modes. Seven different framework seed files tested (Untitled UI, shadcn, Material Design 3, Tailwind, Tokens Studio, and custom agency patterns). The system detected and worked with each one rather than fighting it.

Multi-turn conversational refinement works the way you'd expect. “Make the primary blue warmer” modifies existing tokens without touching secondary or accent colors. The AI reads the current file state before any modification, extending what's there rather than overwriting.

The creative decisions still belong to the designer. The scaffolding doesn't eat the budget anymore. That was the whole point.

1,000+
Elements authored on-canvas in a single session
< 5 min
First system authored from a blank file
7
Framework seed files tested
AA
WCAG contrast on all semantic colors
08

What building this taught me.

The hardest problems were never the AI. They were the systems thinking: designing a token architecture that scales without collapsing, a command protocol that stays deterministic when the model wants to improvise, and a credit model where the monthly price is an honest promise and not a marketing trick that falls apart at real usage.

Those are product design problems. They just happen to involve an LLM. And they're the parts I'm most proud of working through, because they're the parts where getting it wrong would have meant shipping something that looked impressive in a demo and fell apart in practice.

Building solo forces a certain clarity about what matters. Every feature has to justify itself against one question: does this help someone reach their first “wow” moment faster? If the answer is “not directly,” it waits. That discipline shaped everything from the 5-pass architecture to the decision to hold off on CMS features until the design tool has clearly found its footing.

I don't think the future of design tooling is AI that replaces designers. I think it's AI that handles the steady, mechanical labor so designers can focus on the part they're actually good at: intent, taste, and judgment. That's the bet this product makes.

Deliverables produced: Three-tier token architecture (primitives, semantics, component-level). 70+ structured JSON command types. Five-pass AI orchestration pipeline with tiered model routing. WebSocket relay server with sub-200ms latency. Credit and cost model with per-request economics. Figma plugin (submitted for Community review). Platform backend with OAuth, rate limiting, and cost budgeting. Security audit and ToS compliance documentation.
The other side of this project

Brand and creative strategy

Naming, the competitive audit, the ] mark, the --intent: voice, the Atelier palette, the self-authoring landing page, and the GTM.

Read path 01 →