Christoph Fahlbusch
Native apps, systems, AI workflows, and code
Design Systems and AI
Primer iOS started as a response to a structural problem, not a component exercise. I used an agent-first workflow to turn a visual refresh into real design system infrastructure. The result is a Swift 6 package with shared tokens, SwiftUI and UIKit parity, a demo app, a Figma plugin, and a production path.
Primer iOS: One designer, 10 AI agents, and a design system
Primer iOS started because a visual refresh exposed a deeper problem in GitHub Mobile. The app didn't have one reliable styling system. It had multiple overlapping ways to describe color, spacing, typography, and components.
I could have kept polishing screens, and the app would have looked better for a moment, but the system underneath would have kept recreating the same inconsistency.
So I stopped treating the refresh as the finish line. I used it to build the system underneath: a real design system package, built in code, meant to reduce translation and give the app a more durable path forward.
What exists now
Primer iOS has three layers: shared tokens, SwiftUI components, and UIKit twins. UIKit had to be there because GitHub Mobile for iOS still carries a lot of UIKit. A SwiftUI-only system would have looked cleaner on paper and been much less useful in practice.
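A minimal sketch of how those three layers can share one source of truth. The shape below is illustrative, not the actual package API: the real system exposes hundreds of functional color tokens, and the SwiftUI and UIKit accessors mentioned in the comments would wrap these same values.

```swift
import Foundation

// Layer 1: a shared token -- platform-neutral color components.
// (Token name and fields are illustrative, not the real Primer iOS API.)
struct ColorToken: Equatable {
    let name: String
    let red: Double, green: Double, blue: Double

    // A hex form like "#0969DA", useful for the Figma side as well.
    var hex: String {
        String(format: "#%02X%02X%02X",
               Int((red * 255).rounded()),
               Int((green * 255).rounded()),
               Int((blue * 255).rounded()))
    }
}

let accent = ColorToken(name: "accent.fg", red: 0.035, green: 0.412, blue: 0.855)
// Layer 2 (SwiftUI) would wrap this as Color(red:green:blue:),
// Layer 3 (UIKit) as UIColor(red:green:blue:alpha:) -- same numbers, no drift.
print(accent.hex)  // prints "#0969DA"
```

Because both frameworks resolve from the same stored components, parity is a property of the data, not a discipline anyone has to remember.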
The first version included 597 functional color tokens, 726 bundled Octicons, 18 SwiftUI components, 17 UIKit twins, a demo app, a migration strategy, and a Figma plugin that generates a 1:1 component catalog from the same token definitions.
I built it with Swift 6 strict concurrency and zero third-party dependencies because I wanted the foundation to stay small enough that engineering could trust it.
It wasn't supposed to sit around as a nice demo repo. Engineering had to be able to pick it up and move real work into it.
The agent-first workflow
I wasn't asking Copilot to autocomplete a few files. The repository was built around an agent-first workflow with 10 specialized AI agents, each with a clear job, coordinated by an orchestrator.
The flow was simple on purpose because I wanted it to work with plain language. I described what I needed, the orchestrator classified the request, and an analysis agent challenged overlap with existing patterns. Implementation agents then worked through tokens, SwiftUI, UIKit, Figma, and the demo app. Build checks ran between steps, a reviewer agent checked token usage, accessibility, parity, conventions, and performance, and a commit agent prepared clean history. Nothing was committed without human approval.
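The routing step can be sketched as a small classifier. Everything here is an assumption about shape, not the actual orchestrator: agent names and keyword matching are illustrative stand-ins for the real classification logic.

```swift
import Foundation

// Hedged sketch: the orchestrator turns a plain-language request into an
// ordered plan of agents. Agent names mirror the article; logic is illustrative.
enum Agent: String, CaseIterable {
    case analysis, tokens, swiftUI, uiKit, figma, demo, reviewer, commit
}

func route(_ request: String) -> [Agent] {
    let text = request.lowercased()
    var plan: [Agent] = [.analysis]          // the skeptic always goes first
    if text.contains("color") || text.contains("token") { plan.append(.tokens) }
    if text.contains("component") { plan += [.swiftUI, .uiKit, .figma, .demo] }
    plan += [.reviewer, .commit]             // quality gate, then clean history
    return plan
}

print(route("add a new banner component"))
```

The point of the sketch is the ordering: analysis before implementation, review before commit, and no path that skips the gate.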
AI handled the volume, repetition, and cross-file coordination. I stayed on system boundaries, product judgment, taste, and the call on whether the work deserved to stay.
The agents had real jobs
Each agent owned a narrow part of the system. Analysis acted like a skeptic before new work started. Tokens owned colors, spacing, fonts, durations, haptics, and icons. SwiftUI built modern components, UIKit built parity twins after reading the SwiftUI source, Figma created plugin builders, Demo made sure every component had a reference page, and Reviewer acted as the quality gate.
Generic AI output is usually where consistency goes to die, so the agents were constrained by repeated instructions, scoped responsibilities, and build verification. A new component was not done until it existed in SwiftUI, had a UIKit twin when needed, appeared in the demo app, and had Figma coverage.
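That definition of done is concrete enough to express as a check. A minimal sketch, with an assumed record shape rather than anything from the real repository:

```swift
import Foundation

// The "done" bar from the workflow, as a checkable record (illustrative shape).
struct ComponentStatus {
    let name: String
    var hasSwiftUI = false
    var needsUIKitTwin = true     // some SwiftUI effects have no UIKit equivalent
    var hasUIKitTwin = false
    var inDemoApp = false
    var hasFigmaCoverage = false

    var isDone: Bool {
        hasSwiftUI
            && (!needsUIKitTwin || hasUIKitTwin)
            && inDemoApp
            && hasFigmaCoverage
    }
}

var button = ComponentStatus(name: "PrimerButton")
button.hasSwiftUI = true
button.hasUIKitTwin = true
button.inDemoApp = true
button.hasFigmaCoverage = true
print(button.isDone)  // prints "true"; a component missing any leg reports false
```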
After a while it stopped feeling like code generation. It felt more like running a workflow with clear owners, rules, and checks. The system knew where files lived, which rules applied, how to validate them, and when to send work back.
The instruction layer
The agents are backed by a layered guidance system that contains global project instructions, file-scoped instructions, and agent-specific definitions. In total, the repository had more than 2,600 lines of agent and instruction documentation.
It's a lot, but design systems drift fast when the rules only live in one person's head. "Use PrimerColor tokens, never hardcoded colors" appears in more than one place on purpose. The rule is active during implementation, during review, and when editing matching files.
I repeated those rules on purpose because consistency was the whole point. They had to survive across agents, files, and passes, not live only in my head.
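A rule like "use PrimerColor tokens, never hardcoded colors" is also mechanically checkable, which is part of why repeating it across layers works. A minimal sketch of that kind of check; the banned patterns and function are illustrative, not the real reviewer agent:

```swift
import Foundation

// Hedged sketch of the reviewer rule "use PrimerColor tokens, never hardcoded
// colors": flag source lines that construct raw colors instead of using tokens.
func hardcodedColorLines(in source: String) -> [Int] {
    let banned = ["Color(red:", "UIColor(red:", "#colorLiteral"]
    var flagged: [Int] = []
    let lines = source.components(separatedBy: "\n")
    for (index, line) in lines.enumerated()
        where banned.contains(where: { line.contains($0) }) {
        flagged.append(index + 1)  // report 1-based line numbers
    }
    return flagged
}

let sample = """
let good = PrimerColor.accent
let bad = Color(red: 0.1, green: 0.2, blue: 0.3)
"""
print(hardcodedColorLines(in: sample))  // prints "[2]"
```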
Where AI helped most
AI was most useful where design systems create a lot of necessary surface area. It helped with token generation, component boilerplate, SwiftUI and UIKit parity, demo pages, Figma builders, documentation, and repeated checks that humans can do, but rarely do perfectly at this scale.
The Figma plugin is a good example. It doesn't try to keep Figma and code in sync by asking someone to update both manually. It generates a component catalog from the same token definitions as the Swift package. A designer looking at a PrimerButton in Figma and an engineer looking at PrimerButton in Xcode are looking at the same underlying values.
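One plausible way the "same token definitions" reach the plugin is a generated JSON export that the Figma side reads. This is a sketch under that assumption; the field names and export format are mine, not the actual plugin contract.

```swift
import Foundation

// Illustrative export: Swift-side tokens serialized to JSON for the Figma
// plugin to consume, so both tools resolve from one set of values.
struct TokenExport: Codable {
    let name: String
    let hex: String
}

let tokens = [
    TokenExport(name: "accent.fg", hex: "#0969DA"),
    TokenExport(name: "danger.fg", hex: "#CF222E"),
]

let encoder = JSONEncoder()
encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
let json = String(data: try! encoder.encode(tokens), encoding: .utf8)!
print(json)
```

With a pipeline like this, updating a token once updates the package, the demo app, and the Figma catalog from the same definition.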
What I wanted was less translation, less drift, and less time spent rediscovering what the system had already decided.
Where human judgment stayed in charge
The agents were fast, but they weren't the product designer, architect, or final reviewer. They made confident mistakes; some were caught by the reviewer, some by the build, and some by me.
UIKit was the hardest part. Some SwiftUI effects don't have reliable UIKit equivalents. Instead of forcing a bad port, the workflow documented the gap and moved on. I kept one rule for the rest of the project. When a problem takes too many loops, stop, write down what happened, and avoid turning stubbornness into architecture.
The workflow also had persistent memory. When the system learned that a certain approach didn't work, that lesson became a note, then a rule, then something the Reviewer could enforce later. Every mistake made the next pass a little better.
Why this went beyond the refresh
Traditional design system work still involves too much translation. A designer creates a Figma component, an engineer interprets it, review catches differences, and the Figma component and code component drift over time.
What I wanted instead was infrastructure that designers and engineers could both build on. A designer can describe a component in product language. The system can plan it, build the code, create the Figma builder, add the demo page, and validate the result. The designer still reviews it, but the handoff gets much smaller.
Once tokens, component rules, demo references, and Figma builders point at the same source, the team spends less time reinterpreting what the system already decided.
AI gets interesting for design when it helps build and verify that structure without pretending it can replace judgment.
Closing
Primer iOS wasn't just a design system project; it was infrastructure for better product work.
Primer iOS gave GitHub Mobile a more durable path out of design drift, and the agent workflow made it possible to build across tokens, components, docs, demos, and Figma without giving up human review.
What I care about is that AI broadens what one designer can build, while the quality bar, the system boundaries, and the final judgment still stay human.