
Methodology

How flowcraft.systems measures AI agent productivity

Core Philosophy

ROI is measured against a quality baseline, not against a hypothetical rushed output. The question we answer is:

"How long would a skilled human have taken to produce this same artifact at the same level of rigour — evidence-backed discovery, peer-reviewed architecture, 100% passing tests?"

This methodology is deliberately conservative. AI time spent on retries, corrections, or iteration is not counted; only the wall-clock minutes recorded by the SDK observer for a completed session are. Manual hours are sourced from industry-standard estimates for each agent and will be calibrated against real team benchmarks as the sample size grows.

The result is a genuine comparison: AI assistance vs a thoughtful human doing the same quality of work. Not "AI vs someone not thinking."

KPI Definitions

Agent Runs

Total span records in the spans table for the selected org and period. Each span corresponds to one completed or partial agent session.

Hours Saved

Sum of roi.manual_hours_equivalent − (roi.agent_minutes / 60) across all runs with ROI data. Represents real human time reclaimed for higher-value work.

ROI Ratio

total_manual_minutes / total_agent_minutes across all runs with ROI data. A ratio of 10× means the same quality of work took one-tenth of the time.
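A minimal sketch of the two aggregations above (the `RoiRun` shape is an assumption for illustration; its fields mirror the span's ROI attributes):

```typescript
// Sketch of the Hours Saved and ROI Ratio aggregations described above.
// RoiRun is an assumed shape, not an SDK type.
interface RoiRun {
  manualHours: number;  // roi.manual_hours_equivalent
  agentMinutes: number; // roi.agent_minutes
}

// Hours Saved: sum of (manual hours − agent hours) over runs with ROI data.
function hoursSaved(runs: RoiRun[]): number {
  return runs.reduce((sum, r) => sum + (r.manualHours - r.agentMinutes / 60), 0);
}

// ROI Ratio: total manual minutes divided by total agent minutes.
function roiRatio(runs: RoiRun[]): number {
  const manualMinutes = runs.reduce((s, r) => s + r.manualHours * 60, 0);
  const agentMinutes = runs.reduce((s, r) => s + r.agentMinutes, 0);
  return manualMinutes / agentMinutes;
}
```

For example, two runs at 4 h / 25 min and 3 h / 20 min yield 6.25 hours saved and a ratio of 420 / 45 ≈ 9.3×.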

Quant Complete %

Percentage of runs where roi.quant_complete = true. Use this to track data collection coverage — higher is better.

ROI Formula

The primary ROI metric computed per session is roi.hours_saved, written directly by the ROI skill at capture time.

Hours Saved (per session)

roi.hours_saved  ←  computed by the ROI skill; stored directly in the span

The ROI skill assesses the time a skilled practitioner would have taken to produce the same artifact at an equivalent quality bar, accounting for context and delivery conditions. Because this is a subjective professional estimate, it is captured once at session end by the skill — not re-derived by the dashboard.

⚠ Quant Complete Flag

The SDK sets roi.quant_complete = true only when roi.hours_saved and roi.agent_minutes are present and positive. Sessions without this flag are still valuable activity data but are excluded from ROI aggregations to avoid skewing totals.
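The gate described above can be sketched as follows (the span shape is assumed; field names follow the attribute reference later on this page):

```typescript
// Sketch of the quant-complete gate: both ROI fields present and positive.
// RoiAttributes is an assumed shape, not the SDK's AgentSpan type.
interface RoiAttributes {
  "roi.hours_saved"?: number;
  "roi.agent_minutes"?: number;
}

function isQuantComplete(span: RoiAttributes): boolean {
  const saved = span["roi.hours_saved"];
  const minutes = span["roi.agent_minutes"];
  return typeof saved === "number" && saved > 0 &&
         typeof minutes === "number" && minutes > 0;
}
```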

Per-Agent ROI Baselines

Initial estimates; calibration is in progress. Figures are based on industry benchmarks for senior practitioners producing quality-equivalent output and will be updated as real team data accumulates.

Agent Manual hrs AI time Hrs Saved ROI Ratio
nw-product-discoverer 4.0 h 25 min 3.6 h 9.6×
nw-product-owner 3.0 h 20 min 2.7 h 9.0×
nw-solution-architect 8.0 h 30 min 7.5 h 16×
nw-platform-architect 5.0 h 25 min 4.6 h 12×
nw-acceptance-designer 2.0 h 12 min 1.8 h 10×
nw-software-crafter 3.5 h 45 min 2.8 h 4.7×
nw-functional-software-crafter 4.0 h 45 min 3.3 h 5.3×
nw-researcher 3.0 h 20 min 2.7 h 9.0×
nw-troubleshooter 2.5 h 15 min 2.3 h 10×
nw-documentarist 2.0 h 15 min 1.8 h 8.0×
nw-data-engineer 4.0 h 20 min 3.7 h 12×
nw-agent-builder 3.0 h 20 min 2.7 h 9.0×

* Values are initial estimates. Calibrate by logging manual time for 5+ runs per agent and updating roi.manual_hours_equivalent in the SDK configuration.
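One way to run that calibration, sketched as a simple mean over logged manual timings (`calibrateBaseline` is a hypothetical helper, not part of the SDK; the 5-run threshold comes from the note above):

```typescript
// Hypothetical helper: derive a new roi.manual_hours_equivalent baseline
// from manually logged timings. Returns undefined until 5+ samples exist.
function calibrateBaseline(loggedManualHours: number[]): number | undefined {
  if (loggedManualHours.length < 5) return undefined; // sample too small
  const total = loggedManualHours.reduce((sum, h) => sum + h, 0);
  return total / loggedManualHours.length; // mean manual hours
}
```

A team might log timings per agent, then write the returned mean into the SDK configuration once it is defined.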

Weekly Activity Chart

The combo bar + line chart shows run count (bars, left axis) and hours saved (line, right axis) grouped by ISO week. Use it to identify velocity trends and correlate agent adoption with ROI growth. When hours saved grows faster than run count, ROI per run is increasing.
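The week key used for that grouping can be computed with a standard ISO-8601 week calculation; nothing here is dashboard-specific:

```typescript
// Returns the ISO-8601 week key (e.g. "2024-W01") for a date, in UTC.
function isoWeekKey(d: Date): string {
  const date = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
  // ISO weeks start on Monday; the week's Thursday decides the ISO year.
  const day = date.getUTCDay() || 7; // Sunday → 7
  date.setUTCDate(date.getUTCDate() + 4 - day);
  const yearStart = Date.UTC(date.getUTCFullYear(), 0, 1);
  const week = Math.ceil(((date.getTime() - yearStart) / 86400000 + 1) / 7);
  return `${date.getUTCFullYear()}-W${String(week).padStart(2, "0")}`;
}
```

Spans can then be bucketed by this key (assuming each span carries a timestamp), counting runs and summing hours saved per bucket.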

Artifact Type Distribution

Horizontal bar chart showing how agent runs distribute across artifact types (code, architecture, bdd-spec, docs, research, etc.). A heavy skew toward a single artifact type may indicate an imbalanced workflow — for example, generating a lot of code without corresponding architecture or test-design artifacts.

Data Collection

All telemetry is collected by the Flowcraft SDK Observer in packages/sdk. It writes an AgentSpan to the Supabase spans table on every completed agent session.

AgentSpan attribute reference
Attribute                      Type      Description
agent.name                     string    Agent identifier, e.g. nw-software-crafter
agent.framework                string    Agent framework, e.g. flowcraft or custom
agent.artifact.type            string    Output category: code | architecture | bdd-spec | …
agent.artifact.path            string    Workspace-relative path of the primary output file
session.status                 string    complete | partial | cancelled
roi.manual_hours_equivalent    number    Baseline manual effort in hours (from per-agent table)
roi.agent_minutes              number    Actual wall-clock AI time in minutes
roi.quant_complete             boolean   true when both ROI fields are populated
user.name                      string?   Optional: contributor name for personal leaderboard
sdk.version                    string    Flowcraft SDK semver
workspace.org_slug             string    Organisation identifier
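As an illustration of the reference above, a populated attributes object might look like the following (the path, version, and org slug are made-up values):

```typescript
// Hypothetical example of AgentSpan attributes; values are illustrative only.
const exampleAttributes = {
  "agent.name": "nw-software-crafter",
  "agent.framework": "flowcraft",
  "agent.artifact.type": "code",
  "agent.artifact.path": "src/orders/checkout.ts",
  "session.status": "complete",
  "roi.manual_hours_equivalent": 3.5, // baseline from the per-agent table
  "roi.agent_minutes": 45,            // actual wall-clock AI time
  "roi.quant_complete": true,         // both ROI fields present and positive
  "sdk.version": "1.4.0",
  "workspace.org_slug": "acme",
};

// Per-session hours saved derived from the two ROI fields:
const exampleHoursSaved =
  exampleAttributes["roi.manual_hours_equivalent"] -
  exampleAttributes["roi.agent_minutes"] / 60; // 3.5 − 0.75 = 2.75
```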

Enable Personal Leaderboard

The personal leaderboard on the 🏆 page ranks contributors by hours saved. It requires user.name to be present in span attributes.

⚠ Currently not available

The current AgentSpan interface does not include user.name. Add it as an optional field in packages/shared/src/span.ts and populate it from your preferred identity source (VS Code git config, env var, etc.).

How to add user.name (TypeScript SDK)

1. Add the field to packages/shared/src/span.ts:

// In the AgentSpanAttributes interface:
"user.name"?: string;

2. Populate it in the SDK observer when building the span:

// `execGit` is assumed to be a small helper that runs `git <args>`
// and returns its trimmed stdout, or '' if git is unavailable.
const userName = process.env.FLOWCRAFT_USER_NAME
  || (await execGit('config user.name'))
  || undefined;

// then include in attributes:
"user.name": userName

3. Set the environment variable for each team member:

export FLOWCRAFT_USER_NAME="Alice"

Glossary

AgentSpan
A single telemetry record emitted by the Flowcraft SDK observer at the end of one agent session. Stored in the spans table.
Task Category
The type of work an agent session produced: Research, Requirements, Architecture, Platform Ops, Test Design, Implementation, Documentation, etc. Used to label spans in the ROI baselines table.
Quality Baseline
The estimated time a skilled human practitioner would need to produce the same artifact at equivalent rigour, without time pressure. This is the manual-minutes numerator of the ROI ratio.
ROI Ratio
Manual minutes divided by agent minutes. A ratio of 10× means the work was completed ten times faster than the quality baseline.
Hours Saved
The concrete time reclaimed: manual_hours − agent_hours. This is the key metric for engineering leadership.
Quant Complete
A boolean flag indicating a span has both ROI fields populated and can contribute to aggregate ROI statistics.
org slug
Workspace-level identifier for multi-tenant installations. Usually the GitHub organisation slug or company short name.
PostgREST
Supabase's auto-generated REST API over PostgreSQL. Used directly by the analytics HTML pages to query the spans table without a backend server.
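For illustration, a page might build its PostgREST query over the spans table like this (the column names `org_slug` and `created_at` are assumptions about the table schema, not confirmed by this document):

```typescript
// Hypothetical sketch: build a PostgREST URL filtering spans by org.
function spansQueryUrl(baseUrl: string, orgSlug: string): string {
  const params = new URLSearchParams({
    org_slug: `eq.${orgSlug}`,   // PostgREST equality filter
    select: "*",                 // all columns
    order: "created_at.desc",    // newest spans first
  });
  return `${baseUrl}/rest/v1/spans?${params.toString()}`;
}
```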