
Methodology

How flowcraft.systems measures AI agent productivity

Core Philosophy

ROI is measured against a quality baseline, not against a hypothetical rushed output. The question we answer is:

"How long would a skilled human have taken to produce this same artifact at the same level of rigour — evidence-backed discovery, peer-reviewed architecture, 100% passing tests?"

This methodology is deliberately conservative. AI time spent on retries, corrections, or iteration is not counted; only the wall-clock minutes recorded by the SDK observer for a completed session are. Manual hours are sourced from industry-standard estimates for each agent and will be calibrated against real team benchmarks as the sample size grows.

The result is a genuine comparison: AI assistance vs a thoughtful human doing the same quality of work. Not "AI vs someone not thinking."

KPI Definitions

Agent Runs

Total span records in the spans table for the selected org and period. Each span corresponds to one completed or partial agent session.

Hours Saved

Sum of roi.manual_hours_equivalent − (roi.agent_minutes / 60) across all runs with ROI data. Represents real human time reclaimed for higher-value work.

ROI Ratio

total_manual_minutes / total_agent_minutes across all runs with ROI data. A ratio of 10× means the same quality of work took one-tenth of the time.
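A minimal sketch of the two aggregations above (the `RoiRun` shape is an assumption for illustration; its fields mirror the span's ROI attributes):

```typescript
// Sketch of the Hours Saved and ROI Ratio aggregations described above.
// RoiRun is an assumed shape, not an SDK type.
interface RoiRun {
  manualHours: number;  // roi.manual_hours_equivalent
  agentMinutes: number; // roi.agent_minutes
}

// Hours Saved: sum of (manual hours − agent hours) over runs with ROI data.
function hoursSaved(runs: RoiRun[]): number {
  return runs.reduce((sum, r) => sum + (r.manualHours - r.agentMinutes / 60), 0);
}

// ROI Ratio: total manual minutes divided by total agent minutes.
function roiRatio(runs: RoiRun[]): number {
  const manualMinutes = runs.reduce((s, r) => s + r.manualHours * 60, 0);
  const agentMinutes = runs.reduce((s, r) => s + r.agentMinutes, 0);
  return manualMinutes / agentMinutes;
}
```

For example, two runs at 4 h / 25 min and 3 h / 20 min yield 6.25 hours saved and a ratio of 420 / 45 ≈ 9.3×.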

Quant Complete %

Percentage of runs where roi.quant_complete = true. Use this to track data collection coverage — higher is better.

ROI Formula

The primary ROI metric computed per session is roi.hours_saved, written directly by the ROI skill at capture time.

Hours Saved (per session)

roi.hours_saved  ←  computed by the ROI skill; stored directly in the span

The ROI skill assesses the time a skilled practitioner would have taken to produce the same artifact at an equivalent quality bar, accounting for context and delivery conditions. Because this is a subjective professional estimate, it is captured once at session end by the skill — not re-derived by the dashboard.

⚠ Quant Complete Flag

The SDK sets roi.quant_complete = true only when roi.hours_saved and roi.agent_minutes are present and positive. Sessions without this flag are still valuable activity data but are excluded from ROI aggregations to avoid skewing totals.
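The gate described above can be sketched as follows (the span shape is assumed; field names follow the attribute reference later on this page):

```typescript
// Sketch of the quant-complete gate: both ROI fields present and positive.
// RoiAttributes is an assumed shape, not the SDK's AgentSpan type.
interface RoiAttributes {
  "roi.hours_saved"?: number;
  "roi.agent_minutes"?: number;
}

function isQuantComplete(span: RoiAttributes): boolean {
  const saved = span["roi.hours_saved"];
  const minutes = span["roi.agent_minutes"];
  return typeof saved === "number" && saved > 0 &&
         typeof minutes === "number" && minutes > 0;
}
```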

Per-Agent ROI Baselines

Initial estimates; calibration is in progress. Figures are based on industry benchmarks for senior practitioners producing quality-equivalent output and will be updated as real team data accumulates.

Agent Manual hrs AI time Hrs Saved ROI Ratio
nw-product-discoverer 4.0 h 25 min 3.6 h 9.6×
nw-product-owner 3.0 h 20 min 2.7 h 9.0×
nw-solution-architect 8.0 h 30 min 7.5 h 16×
nw-platform-architect 5.0 h 25 min 4.6 h 12×
nw-acceptance-designer 2.0 h 12 min 1.8 h 10×
nw-software-crafter 3.5 h 45 min 2.8 h 4.7×
nw-functional-software-crafter 4.0 h 45 min 3.3 h 5.3×
nw-researcher 3.0 h 20 min 2.7 h 9.0×
nw-troubleshooter 2.5 h 15 min 2.3 h 10×
nw-documentarist 2.0 h 15 min 1.8 h 8.0×
nw-data-engineer 4.0 h 20 min 3.7 h 12×
nw-agent-builder 3.0 h 20 min 2.7 h 9.0×

* Values are initial estimates. Calibrate by logging manual time for 5+ runs per agent and updating roi.manual_hours_equivalent in the SDK configuration.
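One way to run that calibration, sketched as a simple mean over logged manual timings (`calibrateBaseline` is a hypothetical helper, not part of the SDK; the 5-run threshold comes from the note above):

```typescript
// Hypothetical helper: derive a new roi.manual_hours_equivalent baseline
// from manually logged timings. Returns undefined until 5+ samples exist.
function calibrateBaseline(loggedManualHours: number[]): number | undefined {
  if (loggedManualHours.length < 5) return undefined; // sample too small
  const total = loggedManualHours.reduce((sum, h) => sum + h, 0);
  return total / loggedManualHours.length; // mean manual hours
}
```

A team might log timings per agent, then write the returned mean into the SDK configuration once it is defined.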

Weekly Activity Chart

The combo bar + line chart shows run count (bars, left axis) and hours saved (line, right axis) grouped by ISO week. Use it to identify velocity trends and correlate agent adoption with ROI growth. When hours saved grows faster than run count, ROI per run is increasing.
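The week key used for that grouping can be computed with a standard ISO-8601 week calculation; nothing here is dashboard-specific:

```typescript
// Returns the ISO-8601 week key (e.g. "2024-W01") for a date, in UTC.
function isoWeekKey(d: Date): string {
  const date = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
  // ISO weeks start on Monday; the week's Thursday decides the ISO year.
  const day = date.getUTCDay() || 7; // Sunday → 7
  date.setUTCDate(date.getUTCDate() + 4 - day);
  const yearStart = Date.UTC(date.getUTCFullYear(), 0, 1);
  const week = Math.ceil(((date.getTime() - yearStart) / 86400000 + 1) / 7);
  return `${date.getUTCFullYear()}-W${String(week).padStart(2, "0")}`;
}
```

Spans can then be bucketed by this key (assuming each span carries a timestamp), counting runs and summing hours saved per bucket.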

Artifact Type Distribution

Horizontal bar chart showing how agent runs distribute across artifact types (code, architecture, bdd-spec, docs, research, etc.). A heavy skew toward a single artifact type may indicate an imbalanced workflow — for example, generating a lot of code without corresponding architecture or test-design artifacts.

Data Collection

All telemetry is collected by the Flowcraft SDK Observer in packages/sdk. It writes an AgentSpan to the Supabase spans table on every completed agent session.

AgentSpan attribute reference
Attribute                      Type      Description
agent.name                     string    Agent identifier, e.g. nw-software-crafter
agent.framework                string    Agent framework, e.g. flowcraft or custom
agent.artifact.type            string    Output category: code | architecture | bdd-spec | …
agent.artifact.path            string    Workspace-relative path of the primary output file
session.status                 string    complete | partial | cancelled
roi.manual_hours_equivalent    number    Baseline manual effort in hours (from per-agent table)
roi.agent_minutes              number    Actual wall-clock AI time in minutes
roi.quant_complete             boolean   true when both ROI fields are populated
user.name                      string?   Optional: contributor name for personal leaderboard
sdk.version                    string    Flowcraft SDK semver
workspace.org_slug             string    Organisation identifier
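As an illustration of the reference above, a populated attributes object might look like the following (the path, version, and org slug are made-up values):

```typescript
// Hypothetical example of AgentSpan attributes; values are illustrative only.
const exampleAttributes = {
  "agent.name": "nw-software-crafter",
  "agent.framework": "flowcraft",
  "agent.artifact.type": "code",
  "agent.artifact.path": "src/orders/checkout.ts",
  "session.status": "complete",
  "roi.manual_hours_equivalent": 3.5, // baseline from the per-agent table
  "roi.agent_minutes": 45,            // actual wall-clock AI time
  "roi.quant_complete": true,         // both ROI fields present and positive
  "sdk.version": "1.4.0",
  "workspace.org_slug": "acme",
};

// Per-session hours saved derived from the two ROI fields:
const exampleHoursSaved =
  exampleAttributes["roi.manual_hours_equivalent"] -
  exampleAttributes["roi.agent_minutes"] / 60; // 3.5 − 0.75 = 2.75
```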

Enable Personal Leaderboard

The personal leaderboard on the 🏆 page ranks contributors by hours saved. It requires user.name to be present in span attributes.

⚠ Currently not available

The current AgentSpan interface does not include user.name. Add it as an optional field in packages/shared/src/span.ts and populate it from your preferred identity source (VS Code git config, env var, etc.).

How to add user.name (TypeScript SDK)

1. Add the field to packages/shared/src/span.ts:

// In the AgentSpanAttributes interface:
"user.name"?: string;

2. Populate it in the SDK observer when building the span:

// `execGit` is assumed to be a small helper that runs `git <args>`
// and returns its trimmed stdout, or '' if git is unavailable.
const userName = process.env.FLOWCRAFT_USER_NAME
  || (await execGit('config user.name'))
  || undefined;

// then include in attributes:
"user.name": userName

3. Set the environment variable for each team member:

export FLOWCRAFT_USER_NAME="Alice"

Glossary

AgentSpan
A single telemetry record emitted by the Flowcraft SDK observer at the end of one agent session. Stored in the spans table.
Task Category
The type of work an agent session produced: Research, Requirements, Architecture, Platform Ops, Test Design, Implementation, Documentation, etc. Used to label spans in the ROI baselines table.
Quality Baseline
The estimated time a skilled human practitioner would need to produce the same artifact at equivalent rigour, without time pressure. This is the manual-minutes numerator of the ROI ratio.
ROI Ratio
Manual minutes divided by agent minutes. A ratio of 10× means the work was completed ten times faster than the quality baseline.
Hours Saved
The concrete time reclaimed: manual_hours − agent_hours. This is the key metric for engineering leadership.
Quant Complete
A boolean flag indicating a span has both ROI fields populated and can contribute to aggregate ROI statistics.
org slug
Workspace-level identifier for multi-tenant installations. Usually the GitHub organisation slug or company short name.
PostgREST
Supabase's auto-generated REST API over PostgreSQL. Used directly by the analytics HTML pages to query the spans table without a backend server.
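For illustration, a page might build its PostgREST query over the spans table like this (the column names `org_slug` and `created_at` are assumptions about the table schema, not confirmed by this document):

```typescript
// Hypothetical sketch: build a PostgREST URL filtering spans by org.
function spansQueryUrl(baseUrl: string, orgSlug: string): string {
  const params = new URLSearchParams({
    org_slug: `eq.${orgSlug}`,   // PostgREST equality filter
    select: "*",                 // all columns
    order: "created_at.desc",    // newest spans first
  });
  return `${baseUrl}/rest/v1/spans?${params.toString()}`;
}
```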