Core Philosophy
ROI is measured against a quality baseline, not against a hypothetical rushed output. The question we answer is:
"How long would a skilled human have taken to produce this same artifact at the same level of rigour — evidence-backed discovery, peer-reviewed architecture, 100% passing tests?"
This methodology is deliberately conservative. We do not count AI time spent on retries, corrections, or iteration — only the wall-clock minutes recorded by the SDK observer for a completed session. Manual hours are sourced from industry-standard estimates for each agent and will be calibrated against real team benchmarks as sample size grows.
The result is a genuine comparison: AI assistance vs a thoughtful human doing the same quality of work. Not "AI vs someone not thinking."
KPI Definitions
Agent Runs
Total span records in the spans table for the selected org and period.
Each span corresponds to one completed or partial agent session.
Hours Saved
Sum of manual_hours − (agent_minutes / 60) across all runs with ROI data. Represents real human time reclaimed for higher-value work.
ROI Ratio
total_manual_minutes / total_agent_minutes across all runs with ROI data. A ratio of 10× means the same quality work took one-tenth the time.
Quant Complete %
Percentage of runs where roi.quant_complete = true.
Use this to track data collection coverage — higher is better.
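The three KPI formulas above can be sketched as one small aggregation pass. This is a minimal sketch: the `SpanRoi` shape and function names are illustrative, not the real SDK or dashboard API.

```typescript
// Minimal sketch of the dashboard KPI aggregation described above.
// The SpanRoi shape is illustrative; real span records may differ.
interface SpanRoi {
  manual_hours_equivalent?: number; // baseline manual effort, in hours
  agent_minutes?: number;           // actual wall-clock AI time, in minutes
  quant_complete?: boolean;         // both ROI fields populated
}

function aggregateKpis(spans: SpanRoi[]) {
  // Only quant-complete runs contribute to ROI aggregates.
  const complete = spans.filter((s) => s.quant_complete === true);
  const manualMinutes = complete.reduce(
    (sum, s) => sum + (s.manual_hours_equivalent ?? 0) * 60, 0);
  const agentMinutes = complete.reduce(
    (sum, s) => sum + (s.agent_minutes ?? 0), 0);
  return {
    agentRuns: spans.length,                          // total span records
    hoursSaved: (manualMinutes - agentMinutes) / 60,  // Σ manual − agent, hours
    roiRatio: agentMinutes > 0 ? manualMinutes / agentMinutes : 0,
    quantCompletePct:
      spans.length > 0 ? (100 * complete.length) / spans.length : 0,
  };
}
```

Note that runs without ROI data still count toward `agentRuns` but are filtered out of the time totals, matching the Quant Complete rule.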
ROI Formula
The primary ROI metric computed per session is roi.hours_saved, written directly by the ROI skill at capture time.
Hours Saved (per session)
roi.hours_saved ← computed by the ROI skill; stored directly in the span
The ROI skill assesses the time a skilled practitioner would have taken to produce the same artifact at an equivalent quality bar, accounting for context and delivery conditions. Because this is a subjective professional estimate, it is captured once at session end by the skill — not re-derived by the dashboard.
⚠ Quant Complete Flag
The SDK sets roi.quant_complete = true only when roi.hours_saved and roi.agent_minutes are both present and positive. Sessions without this flag are still valuable activity data, but they are excluded from ROI aggregations to avoid skewing totals.
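The flag's condition can be expressed as a small predicate. A sketch only: the field names mirror the span attributes, but the SDK's actual check may differ.

```typescript
// Sketch of the quant-complete rule described above:
// a span qualifies only when both ROI fields are present and positive.
function isQuantComplete(roi: {
  hours_saved?: number;
  agent_minutes?: number;
}): boolean {
  return (
    typeof roi.hours_saved === "number" && roi.hours_saved > 0 &&
    typeof roi.agent_minutes === "number" && roi.agent_minutes > 0
  );
}
```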
Per-Agent ROI Baselines
Initial estimates; calibration is in progress. Based on industry benchmarks for senior practitioners at quality-equivalent output. Figures will be updated as real team data accumulates.
| Agent | Manual hrs | AI time | Hrs Saved | ROI Ratio |
|---|---|---|---|---|
| nw-product-discoverer | 4.0 h | 25 min | 3.6 h | 9.6× |
| nw-product-owner | 3.0 h | 20 min | 2.7 h | 9.0× |
| nw-solution-architect | 8.0 h | 30 min | 7.5 h | 16× |
| nw-platform-architect | 5.0 h | 25 min | 4.6 h | 12× |
| nw-acceptance-designer | 2.0 h | 12 min | 1.8 h | 10× |
| nw-software-crafter | 3.5 h | 45 min | 2.8 h | 4.7× |
| nw-functional-software-crafter | 4.0 h | 45 min | 3.3 h | 5.3× |
| nw-researcher | 3.0 h | 20 min | 2.7 h | 9.0× |
| nw-troubleshooter | 2.5 h | 15 min | 2.3 h | 10× |
| nw-documentarist | 2.0 h | 15 min | 1.8 h | 8.0× |
| nw-data-engineer | 4.0 h | 20 min | 3.7 h | 12× |
| nw-agent-builder | 3.0 h | 20 min | 2.7 h | 9.0× |
* Values are initial estimates. Calibrate by logging manual time for 5+ runs per agent and updating roi.manual_hours_equivalent in the SDK configuration.
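The baselines above could live in a simple lookup table in the SDK configuration. The values below are copied from the table; the structure and names are hypothetical, not the real config layout.

```typescript
// Hypothetical per-agent baselines for roi.manual_hours_equivalent.
// Values come from the baselines table; the real config layout may differ.
const MANUAL_HOURS_EQUIVALENT: Record<string, number> = {
  "nw-product-discoverer": 4.0,
  "nw-product-owner": 3.0,
  "nw-solution-architect": 8.0,
  "nw-platform-architect": 5.0,
  "nw-acceptance-designer": 2.0,
  "nw-software-crafter": 3.5,
  "nw-functional-software-crafter": 4.0,
  "nw-researcher": 3.0,
  "nw-troubleshooter": 2.5,
  "nw-documentarist": 2.0,
  "nw-data-engineer": 4.0,
  "nw-agent-builder": 3.0,
};

// Fall back to a neutral default for agents without a calibrated baseline.
function manualHoursFor(agentName: string, fallback = 2.0): number {
  return MANUAL_HOURS_EQUIVALENT[agentName] ?? fallback;
}
```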
Weekly Activity Chart
The combo bar + line chart shows run count (bars, left axis) and hours saved (line, right axis) grouped by ISO week. Use this to identify velocity trends and correlate agent adoption with ROI growth. A widening gap between runs and hours saved indicates increasing ROI efficiency per run.
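Grouping spans by ISO week, as the chart does, needs an ISO-8601 week key. A self-contained sketch (the dashboard's actual bucketing code is not shown in this doc):

```typescript
// Compute an ISO-8601 week key like "2024-W05" for grouping spans by week.
// UTC-based throughout to avoid timezone drift at week boundaries.
function isoWeekKey(d: Date): string {
  const date = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
  const day = date.getUTCDay() || 7;            // Mon=1 … Sun=7
  date.setUTCDate(date.getUTCDate() + 4 - day); // shift to this week's Thursday,
                                                // which fixes the ISO year
  const yearStart = Date.UTC(date.getUTCFullYear(), 0, 1);
  const week = Math.ceil(((date.getTime() - yearStart) / 86400000 + 1) / 7);
  return `${date.getUTCFullYear()}-W${String(week).padStart(2, "0")}`;
}
```

Note that late-December dates can belong to week 1 of the following ISO year, which is why the key carries the ISO year rather than the calendar year.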
Artifact Type Distribution
Horizontal bar chart showing how agent runs distribute across artifact types (code, architecture, bdd-spec, docs, research, etc.). A heavy skew toward a single artifact type may indicate an imbalanced workflow — for example, generating a lot of code without corresponding architecture or test-design artifacts.
Data Collection
All telemetry is collected by the Flowcraft SDK Observer in packages/sdk. It writes an AgentSpan to the Supabase spans table at the end of every agent session, whether complete or partial.
AgentSpan attribute reference
| Attribute | Type | Description |
|---|---|---|
| agent.name | string | Agent identifier, e.g. nw-software-crafter |
| agent.framework | string | Agent framework, e.g. flowcraft or custom |
| agent.artifact.type | string | Output category: code \| architecture \| bdd-spec \| … |
| agent.artifact.path | string | Workspace-relative path of the primary output file |
| session.status | string | complete \| partial \| cancelled |
| roi.manual_hours_equivalent | number | Baseline manual effort in hours (from per-agent table) |
| roi.agent_minutes | number | Actual wall-clock AI time in minutes |
| roi.quant_complete | boolean | true when both ROI fields are populated |
| user.name | string? | Optional: contributor name for personal leaderboard |
| sdk.version | string | Flowcraft SDK semver |
| workspace.org_slug | string | Organisation identifier |
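The attribute table above maps naturally onto a TypeScript interface. A sketch of what packages/shared/src/span.ts might contain; the shipped interface may differ (in particular, user.name is listed as optional here per the table, but see the leaderboard note below).

```typescript
// Sketch of the AgentSpan attributes from the reference table above.
// The actual interface in packages/shared/src/span.ts may differ.
interface AgentSpanAttributes {
  "agent.name": string;                  // e.g. "nw-software-crafter"
  "agent.framework": string;             // e.g. "flowcraft" or "custom"
  "agent.artifact.type": string;         // "code", "architecture", "bdd-spec", …
  "agent.artifact.path": string;         // workspace-relative output path
  "session.status": "complete" | "partial" | "cancelled";
  "roi.manual_hours_equivalent": number; // baseline manual effort, hours
  "roi.agent_minutes": number;           // actual wall-clock AI time, minutes
  "roi.quant_complete": boolean;         // both ROI fields populated
  "user.name"?: string;                  // optional contributor name
  "sdk.version": string;                 // Flowcraft SDK semver
  "workspace.org_slug": string;          // organisation identifier
}

// A plausible example record, with values drawn from this doc.
const example: AgentSpanAttributes = {
  "agent.name": "nw-software-crafter",
  "agent.framework": "flowcraft",
  "agent.artifact.type": "code",
  "agent.artifact.path": "src/feature.ts",
  "session.status": "complete",
  "roi.manual_hours_equivalent": 3.5,
  "roi.agent_minutes": 45,
  "roi.quant_complete": true,
  "sdk.version": "1.0.0",
  "workspace.org_slug": "acme",
};
```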
Enable Personal Leaderboard
The personal leaderboard on the 🏆 page ranks contributors by hours saved.
It requires user.name to be present in span attributes.
⚠ Currently not available
The current AgentSpan interface does not include user.name. Add it as an optional field in packages/shared/src/span.ts and populate it from your preferred identity source (VS Code git config, env var, etc.).
How to add user.name (TypeScript SDK)
1. Add the field to packages/shared/src/span.ts:

```typescript
// In the AgentSpanAttributes interface:
"user.name"?: string;
```

2. Populate it in the SDK observer when building the span (execGit here is a helper that shells out to `git config`):

```typescript
const userName = process.env.FLOWCRAFT_USER_NAME
  || (await execGit('config user.name'))
  || undefined;
// then include it in the span attributes:
// "user.name": userName
```

3. Set the environment variable for each team member:

```shell
export FLOWCRAFT_USER_NAME="Alice"
```
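The lookup order in step 2 (env var first, then git config, then undefined) can be isolated into a pure, testable helper. A sketch only: the caller passes in the already-read env and git values, and the function name is hypothetical.

```typescript
// Sketch of the user.name resolution order from the steps above.
// envValue would be process.env.FLOWCRAFT_USER_NAME; gitValue would be
// the output of `git config user.name`. Empty strings fall through.
function resolveUserName(
  envValue?: string,
  gitValue?: string,
): string | undefined {
  return envValue || gitValue || undefined;
}
```

Keeping the precedence logic separate from the process/exec plumbing makes it easy to unit-test without touching the environment or spawning git.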
Glossary
- AgentSpan
- A single telemetry record emitted by the Flowcraft SDK observer at the end of one agent session. Stored in the spans table.
- Task Category
- The type of work an agent session produced: Research, Requirements, Architecture, Platform Ops, Test Design, Implementation, Documentation, etc. Used to label spans in the ROI baselines table.
- Quality Baseline
- The estimated time a skilled human practitioner would need to produce the same artifact at equivalent rigour, without time pressure. Used as the denominator in ROI calculations.
- ROI Ratio
- Manual minutes divided by agent minutes. A ratio of 10× means the work was completed ten times faster than the quality baseline.
- Hours Saved
- The concrete time reclaimed: manual_hours − agent_hours. This is the key metric for engineering leadership.
- Quant Complete
- A boolean flag indicating a span has both ROI fields populated and can contribute to aggregate ROI statistics.
- org slug
- Workspace-level identifier for multi-tenant installations. Usually the GitHub organisation slug or company short name.
- PostgREST
- Supabase's auto-generated REST API over PostgreSQL. Used directly by the analytics HTML pages to query the spans table without a backend server.
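A PostgREST query against the spans table is just an HTTP GET with filter parameters. A sketch of building such a URL: the base URL, column names (org_slug, created_at), and filter values are illustrative assumptions, not the documented schema.

```typescript
// Build a PostgREST query URL for the spans table, filtering by org and
// period, roughly as the analytics pages would. Column names are assumed.
function buildSpansUrl(
  baseUrl: string,   // e.g. a Supabase project URL
  orgSlug: string,   // workspace.org_slug value
  sinceIso: string,  // ISO date lower bound for the period
): string {
  const params = new URLSearchParams({
    select: "*",
    org_slug: `eq.${orgSlug}`,     // PostgREST equality filter
    created_at: `gte.${sinceIso}`, // PostgREST greater-or-equal filter
  });
  return `${baseUrl}/rest/v1/spans?${params.toString()}`;
}
```

A real request against Supabase would also need the project's anon key in the `apikey` and `Authorization` headers.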