Black Flag Design
Working session · May 13, 2026 · 30 min
For Pascal · picking up from 05/12

QA Cases.

The one format question we never closed yesterday. This is the proposal, and where it fits inside the playbook you're already drafting.

Author: Keith Pattison · 1 of 9
Anchor · The frame Pascal pitched on 05/12
01. Your frame is the frame we are running with

Testing is not the pyramid alone. It is a four-phase OS that bookends the pyramid with intent up front and observability after.

Phase 01

Intent

Documentation of what perfect looks like. North Star. PRDs. Design intent. QA cases.

Notion · Figma · qa-cases/*.md
Phase 02

Implementation

Designing and building against the intent. Quality kept up as the code is written, not after.

ESLint · Vitest · Playwright (functional)
Phase 03

Deploy gate

The final stage gate before production. Real services. Real flows. Tag-cut, not every commit.

Playwright (E2E) · axe · visual snapshots
Phase 04

Observe

In-production signal. New regressions surface as new QA cases, feeding back to Phase 01.

Sentry · PostHog · feedback widget
04 loops back to 01
"We have to think beyond the post-deployment. Before the code is written. And also we have design and engineering. How do we find solutions that speak to the documentation of what the North Star is?"
Pascal, 05/12 (paraphrased lightly for slide)
What this deck adds to your frame

QA cases sit in Phase 01. They are the intent document. Phase 02 translates them into running tests. Phase 03 gates on them. Phase 04 generates new ones. They are the single artifact that crosses every phase.

QA Cases · Where they fit · 2 of 9
Where the pyramid lives · Inside Phases 02 + 03
02. The testing pyramid lives inside your frame

The pyramid Reggie and I keep drawing is the middle two phases of your OS. Code quality runs in parallel underneath the whole thing.

Code Quality (parallel)
ESLint · TS · Prettier · A11y · Bundle
Unit tests
Vitest · Happy DOM
Functional (mocked)
Playwright
E2E
Playwright
Answer to your question on 05/12

"Do all these fit in the code quality layer?"

Yes for ESLint, TypeScript, Prettier, A11y, bundle. They are the parallel layer underneath the pyramid. They read the code as you type. They never run it.

Unit / Functional / E2E

Three layers, three confidence levels. Unit tests run alone in seconds. Functional drives the UI with mocks. E2E drives the real services on tag-cut releases.

Reggie's clarification

Functional and E2E are the same Playwright suite in two modes (mocked vs real). Tags (@critical, @ft-only, @e2e-only) decide what runs where.
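
One way to read the tag rule, as a minimal sketch. `RunMode` and `runsIn` are hypothetical names, not part of the TP repo, and treating untagged specs as running in both modes is an assumption this deck does not settle.

```typescript
// Hypothetical helper: decides whether a spec runs in a given mode.
// "mocked" = functional run on PR, "real" = E2E run on tag.
type RunMode = "mocked" | "real";

function runsIn(tags: string[], mode: RunMode): boolean {
  if (tags.includes("@ft-only")) return mode === "mocked";
  if (tags.includes("@e2e-only")) return mode === "real";
  // @critical runs in both modes. Assumption: untagged specs do too.
  return true;
}
```

The same suite is filtered twice; nothing about the spec itself changes between PR and release.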

Do they work together? · Yes. Here is the sequence.
03. "Do all these tools work together?" Yes.

Each tool runs at a different moment in time. Earlier gates are cheaper. Later gates are higher confidence. Nothing is redundant.

T-0 · In-IDE

As you type

  • ESLint (with custom rules)
  • TypeScript
Editor
Cost: zero. Steers agents in real time.
T+1 · On commit

Husky pre-commit

  • ESLint on staged files
  • Prettier auto-format
  • Build smoke
Local · seconds
Cost: 2-5 sec. Stops broken commits.
T+2 · On push

Husky pre-push

  • Full Vitest suite
  • Type-check
Local · 1 minute
Cost: 1 min. Catches unit failures before CI.
T+3 · On PR

GitHub Actions

  • Lint + build (job A)
  • Vitest (job B)
  • Playwright (mocked)
CI · 5-8 minutes
Cost: CI minutes. Required to merge.
T+4 · On tag

Release cut

  • Playwright (real services)
  • Visual snapshots
  • axe a11y
CI · 15 minutes
Cost: full E2E. Gates prod deploy.
T+5 · In production

Observability

  • Sentry (errors)
  • PostHog (usage)
  • BFD feedback widget
Always-on
New issues become new QA cases.
The point of running locally AND in CI

Locally the agent reacts in seconds. CI is the artifact we sign off on. We are not paying CI minutes for things that should have failed at the keyboard.

Where your "do they work together?" question lands

They do, but only because they run at different times on different surfaces. The stack is a timeline, not a checklist. Read it left-to-right.
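
The timeline can also be written down as data, which makes "different times, different surfaces" checkable. This is only an illustrative sketch: the `Gate` shape and `firstGateFor` are hypothetical, and the entries simply restate the columns above.

```typescript
// Hypothetical model of the gate timeline; entries mirror the six columns above.
interface Gate {
  when: string;
  surface: "editor" | "local" | "ci" | "production";
  tools: string[];
}

const timeline: Gate[] = [
  { when: "T-0 · as you type", surface: "editor", tools: ["ESLint", "TypeScript"] },
  { when: "T+1 · on commit", surface: "local", tools: ["ESLint", "Prettier", "Build smoke"] },
  { when: "T+2 · on push", surface: "local", tools: ["Vitest", "Type-check"] },
  { when: "T+3 · on PR", surface: "ci", tools: ["Lint + build", "Vitest", "Playwright (mocked)"] },
  { when: "T+4 · on tag", surface: "ci", tools: ["Playwright (real services)", "Visual snapshots", "axe"] },
  { when: "T+5 · in production", surface: "production", tools: ["Sentry", "PostHog", "BFD feedback widget"] },
];

// Earliest gate at which a tool (matched by name prefix) runs.
function firstGateFor(tool: string): Gate | undefined {
  return timeline.find((gate) => gate.tools.some((t) => t.startsWith(tool)));
}
```

Asking for the first gate of a tool answers "where should this have failed?" without rereading the slide.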

The gap, and the proposal · The part we never finished
04. The one thing we didn't lock yesterday

We agreed on the model, the pilot, the playbook. We never agreed on the format every team writes tests in.

"Most of our confidence in the app, or pretty much all of it right now, comes from people clicking through it."
Keith on the problem, 05/12

"If you wrote them out as when you're on this page, I would click this, I would type this, I would expect it to say this. These are the actual phrases that would mimic the test."
Keith on the proposal, 05/12

Step 1 · Author

Plain-English Markdown QA case.

Anyone can write one. A QA in India, a designer, you, Claude. No code. No selectors. Just intent.

Step 2 · Translate

Claude generates the Playwright spec.

Using the repo's Page Objects, Fixtures, Factories. One translation pattern across BFD.

Step 3 · Run

Same spec, mocked on PR, real on tag.

Tags pin the runner. @critical in both modes. @ft-only on PR. @e2e-only on release.

Source format · tests/qa-cases/<area>/<case>.md
05. Anatomy of a QA case

Frontmatter. Setup. Steps. Assertions. Out of scope. That is the whole schema.

---
id:     web.auth.login.google
title:  Web sign-in shows Google button, starts OAuth
app:    web            # web | admin | api
area:   authentication
tags:   ["@critical"]   # "@critical" | "@ft-only" | "@e2e-only" (quoted: YAML reserves a leading @)
owners: [pascal, keith]
---

## Setup
- The user is signed out.
- Mocks: auth.google.start, auth.google.callback.

## Steps
1. I navigate to /authentication/login.
2. I expect "Sign in with Google" to be visible.
3. I click "Sign in with Google".
4. I wait for the URL to leave /authentication/login.

## Assertions
- The button renders before any click.
- Clicking leaves the login route (mocked OAuth).

## Out of scope
- Real Google OAuth (covered by @e2e-only case).
Why a human can write it

It is product language, not code. A QA in India already writes briefs exactly like this. A designer who knows the flow can. You can.

Why Claude can translate it

Frontmatter pins the file location. Tags pin the runner. "I click label" maps onto a Page Object method. The case is the prompt.
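
To show how regular the step phrases are, here is a minimal sketch that maps the four phrases from the login case onto structured actions. It is not how Claude performs the translation; `Step` and `parseStep` are hypothetical names, and only the phrasings used in this deck are covered.

```typescript
// Hypothetical structured form of a step phrase.
type Step =
  | { kind: "navigate"; path: string }
  | { kind: "expectVisible"; label: string }
  | { kind: "click"; label: string }
  | { kind: "waitForUrlToLeave"; path: string }
  | { kind: "unknown"; text: string };

function parseStep(line: string): Step {
  // Drop the leading "1. " numbering and the trailing period.
  const text = line.replace(/^\s*\d+\.\s*/, "").replace(/\.\s*$/, "");
  let m: RegExpMatchArray | null;
  if ((m = text.match(/^I navigate to (\S+)$/))) return { kind: "navigate", path: m[1] };
  if ((m = text.match(/^I expect "(.+)" to be visible$/))) return { kind: "expectVisible", label: m[1] };
  if ((m = text.match(/^I click "(.+)"$/))) return { kind: "click", label: m[1] };
  if ((m = text.match(/^I wait for the URL to leave (\S+)$/))) return { kind: "waitForUrlToLeave", path: m[1] };
  // Anything else is left for a human or the agent to interpret.
  return { kind: "unknown", text };
}
```

Each `kind` lines up with one Page Object call or assertion, which is why the case can serve as the prompt.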

Why this is not new

Serious QA teams have shipped this brief for years. We are just giving the agent the same one.

The five required sections
  • Frontmatter · id, title, app, area, tags, owners
  • Setup · preconditions + mocks
  • Steps · numbered "I do X" phrases
  • Assertions · observable outcomes
  • Out of scope · what this case does not cover
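
The five-section rule is simple enough to check mechanically before a case reaches review. A minimal sketch, assuming cases are stored as Markdown with `---` frontmatter and `##` headings as in the example above; `validateCase` is a hypothetical helper and deliberately does not parse YAML values.

```typescript
// Hypothetical schema check: returns a list of problems, empty when the case is well-formed.
const REQUIRED_KEYS = ["id", "title", "app", "area", "tags", "owners"];
const REQUIRED_SECTIONS = ["Setup", "Steps", "Assertions", "Out of scope"];

function validateCase(markdown: string): string[] {
  const problems: string[] = [];

  // Frontmatter must open the file and be delimited by "---" lines.
  const frontmatter = markdown.match(/^---\r?\n([\s\S]*?)\r?\n---/);
  if (!frontmatter) {
    problems.push("missing frontmatter");
  } else {
    for (const key of REQUIRED_KEYS) {
      // The key must be present with a non-empty value on the same line.
      const hasKey = new RegExp(`^${key}:[ \\t]*\\S`, "m").test(frontmatter[1]);
      if (!hasKey) problems.push(`missing frontmatter key: ${key}`);
    }
  }

  // Each required section must appear as its own "## " heading.
  for (const section of REQUIRED_SECTIONS) {
    const hasSection = new RegExp(`^## ${section}\\s*$`, "m").test(markdown);
    if (!hasSection) problems.push(`missing section: ${section}`);
  }
  return problems;
}
```

A check like this could run wherever the team prefers (pre-commit or CI); that placement is not decided here.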
The translation · Reggie's TP architecture is the lift target
06. From QA case to Playwright spec

The translation is deterministic. Reggie's NCEE Teaching Partner setup is the architecture we lift into every other repo.

qa-cases/web/auth/login-google.md

## Steps
1. I navigate to /authentication/login.
2. I expect "Sign in with Google" to be visible.
3. I click "Sign in with Google".
4. I wait for the URL to leave /authentication/login.

playwright/specs/web/login.spec.ts (generated)

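import { test, expect } from '../../fixtures/unified';

// "@critical" in the title is what tag filtering matches on, so this spec runs in both modes.
test.describe('Web Login @critical', () => {
  test('shows Google sign-in, starts OAuth', async ({ mockedPage, loginPage }) => {
    // Steps 1-2: open the login route and confirm the button renders before any click.
    await loginPage.goto();
    await expect(loginPage.googleLoginButton).toBeVisible();

    // Steps 3-4: start waiting for the URL change before clicking, so the navigation is not missed.
    await Promise.all([
      mockedPage.waitForURL(
        (url) => !url.toString().includes('/authentication/login'),
      ),
      loginPage.clickGoogleLogin(),
    ]);
  });
});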
// already in NCEE Teaching Partner today
playwright/
├── fixtures/unified.ts              // auto mocks vs real
├── page-objects/web/LoginPage.ts
├── factories/users.ts
├── mocks/{external,internal}.ts
└── specs/web/login.spec.ts          @critical
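
For readers who have not opened TP, a Page Object in this layout might look roughly like the sketch below. It is not the actual `LoginPage.ts`: `PageLike` and `LocatorLike` are hypothetical stand-ins for the parts of Playwright's `Page` and `Locator` used here (so the sketch stays self-contained), and locating the button by role and name is an assumption. The member names match the generated spec.

```typescript
// Hypothetical structural stand-ins for Playwright's Page and Locator.
interface LocatorLike {
  click(): Promise<void>;
}
interface PageLike {
  goto(url: string): Promise<unknown>;
  getByRole(role: "button", options: { name: string }): LocatorLike;
}

// Sketch of a Page Object: selectors and routes live here, never in the QA case.
class LoginPage {
  readonly googleLoginButton: LocatorLike;
  private readonly page: PageLike;

  constructor(page: PageLike) {
    this.page = page;
    // Assumption: the button is found by its accessible name.
    this.googleLoginButton = page.getByRole("button", { name: "Sign in with Google" });
  }

  // "I navigate to /authentication/login." Assumes a base URL is configured.
  async goto(): Promise<void> {
    await this.page.goto("/authentication/login");
  }

  // "I click "Sign in with Google"."
  async clickGoogleLogin(): Promise<void> {
    await this.googleLoginButton.click();
  }
}
```

Because the label and route are defined once here, a copy change on the button touches one file rather than every spec.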
The lift

Templatize once. Share across BFD.

  • bfd-express-base: led by Reggie. Lifted directly from TP.
  • bfd-convex-base: led by Keith, because Keith carries the Convex port.
Both ship as scaffolds with the playbook inside.

Your lane · What's yours, what's ours
07. Your lane

Pascal owns the QA case library and the playbook. Reggie owns the Playwright architecture. Keith owns the first wave + the Convex port.

Pascal · QA Case Editor + Playbook Owner

The library, the bar, the rollout.

  • Own the QA case library. Catalog, coverage map, priorities. The single source of truth for what we test.
  • Set + enforce the schema. Five sections, no exceptions. Every dev case goes through you before it becomes a spec.
  • Write cases yourself for product flows and intent-driven scenarios. You are the SME for the "what should this do."
  • Own the playbook doc. The four-phase OS lives in your draft. This deck plugs into it.
  • Lead Monday's Practice Dev rollout. Keith demos the case-to-spec loop. You frame the OS.
Reggie · Architecture

The translation target.

  • Playwright architecture in TP (already done).
  • Extract bfd-express-base by Friday.
  • Hand-translate 2 of Keith's QA cases to validate the pattern.
  • Backstop questions on Page Object conventions.
Keith · First wave

Cases + Convex port.

  • 10 QA cases for TP auth + meeting flows by Thursday.
  • bfd-convex-base following Reggie's architecture.
  • Demo the case-to-spec translation Monday.
  • "Rip out all my tests" once the playbook is locked.
One thing to settle on Playwright

Playwright is automation infrastructure, not a manual-test workbench.

You asked yesterday whether you could mess around with Playwright in the interim. Reggie's answer: it is not a tool a QA person uses to test a site; it is code you write into the repo so we do not need a person to test the site. If you want to feel the loop yourself, Playwright Codegen is the one surface built for humans. You open a page, click around, and it writes the TypeScript for you. That is the closest thing to a workbench. Everything else runs without a human at the wheel.

Lock it · Three decisions. Five days. Done.
08. Three decisions. The week.

Three calls to make in this meeting. The rest of the week writes itself.

1 · Lock the QA-case Markdown schema.
  • The call: Frontmatter · Setup · Steps · Assertions · Out of scope.
  • Proposal: Yes, lock it. Extend later via tags, not schema changes. Keith ships the schema doc by EOD.
2 · Pascal as QA Case Editor + Playbook Owner.
  • The call: Sets the bar. Reviews dev-authored cases. Writes the playbook.
  • Proposal: Yes. Devs author against the schema. Pascal reviews before translation. Pascal writes cases himself where he is the SME.
  • Owner: Pascal
3 · NCEE TP as reference architecture, templatized into two bases.
  • The call: The thing Claude translates into.
  • Proposal: Yes. Reggie extracts bfd-express-base. Keith builds bfd-convex-base on the same shape.
  • Owner: Reggie + Keith
Wed · today

Lock the 3 calls

  • This meeting.
  • Pascal: board draft of the 4-phase OS.
  • Keith: schema doc by EOD.
Thu

10 cases + 2 specs

  • Keith writes 10 cases.
  • Reggie translates 2 by hand.
  • Pascal reviews both.
Fri

Playbook v1

  • Pascal ships playbook + journey.
  • Reggie extracts express-base.
Sat / Sun

Buffer

  • Async polish.
  • Dry-run the Monday demo.
Mon · Practice Dev

Roll to team

  • Pascal frames the OS.
  • Keith demos case → spec live.
  • Name the first 3 repos to convert.