The Firm as a Learning Loop: Human Capital, Token Capital, and Private Evals

What Is This?

Satya Nadella argues that the AI-native firm will not be defined by which model it uses. It will be defined by whether it owns a learning loop that compounds human expertise into reusable AI capability.

His terms are useful:

human capital = knowledge, judgment, relationships, taste, pattern recognition

token capital = the firm's owned AI capability built from workflows, traces, evals, context, and institutional knowledge

The important sentence is this:

You can offload a task, or even a job, but you can never offload your learning.

That is the core model. A company can rent frontier models, but it cannot rent its own accumulated judgment. The durable asset is the loop that turns work into better future work.

Why Does It Matter?

Most companies are asking the wrong AI question:

Which model should we use?

The better question is:

What system turns our work, failures, decisions, customer edge cases, and expert judgment into reusable capability that survives model churn?

A firm that only calls a frontier model is a consumer of intelligence. A firm that captures its own traces, builds private evals, maintains institutional memory, and improves workflows after every run is building an internal hill-climbing machine.

The advantage is not that the model knows more in general. The advantage is that the organization knows more about its own work, and has a system for converting that into better actions.

The Core Loop

The AI-native firm needs a loop like this:

work happens
  -> traces are captured
  -> outcomes are judged
  -> failures are categorized
  -> private evals are updated
  -> memory/context improves
  -> workflows change
  -> agents perform better next time
  -> humans learn what changed

If that loop is real, every task can become training signal. If it is missing, AI becomes a faster way to forget.
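A minimal sketch of one pass through the loop above, in Python. Every name here (`LoopState`, `run_loop_once`, the `agent` and `judge` callables) is a hypothetical illustration, not something taken from Nadella's article or from any library. It covers capture, judging, and the eval/memory updates; the "workflows change" and "agents perform better" steps depend on the firm's own systems and are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class LoopState:
    """What the firm keeps between runs (all fields are illustrative)."""
    traces: list = field(default_factory=list)      # captured runs
    eval_cases: list = field(default_factory=list)  # tasks that must not fail again
    memory: list = field(default_factory=list)      # notes fed into future context
    changelog: list = field(default_factory=list)   # what humans are told changed


def run_loop_once(task: str,
                  agent: Callable[[str], str],
                  judge: Callable[[str, str], Optional[str]],
                  state: LoopState) -> str:
    output = agent(task)                       # work happens
    state.traces.append((task, output))        # traces are captured
    failure = judge(task, output)              # outcomes are judged; None means pass
    if failure is not None:                    # failures are categorized by the judge
        state.eval_cases.append(task)          # private evals are updated
        state.memory.append(f"{task}: avoid {failure}")            # memory improves
        state.changelog.append(f"added eval for '{task}' after {failure}")  # humans learn
    return output
```

The point of the sketch is that a failed run leaves three durable artifacts behind (an eval case, a memory note, a changelog entry) rather than only a bad answer.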

Human Capital Does Not Disappear

A shallow reading says AI reduces human capital. Nadella's better point is that human capital becomes more important because it directs token capital.

Models do not know which goals matter inside a firm. They do not know which customer edge case is strategically important, which workaround is dangerous, which failure is a one-off, or which judgment reflects ten years of tacit context.

Humans still provide:

goal selection,
taste,
domain judgment,
relationship context,
moral and commercial trade-offs,
recognition of weird cases,
the decision to update or ignore a signal.

Without that direction, compute runs in circles.

Token Capital Is Not Just Data

Token capital is easy to misunderstand as "our documents in a vector database." That is too small.

A useful definition:

token capital = proprietary AI capability that improves from the organization's real work

It includes:

workflow traces,
private eval suites,
examples of good and bad outcomes,
task-specific rubrics,
memory of prior decisions,
tool and system integrations,
domain-specific context packs,
reusable agent procedures,
human feedback on edge cases.

This is why private evals matter. Public benchmarks tell you whether a model is generally capable. Private evals tell you whether the system is getting better at your work.

Private Evals Become IP

Anthropic's agent-eval framing makes this concrete: agents are not evaluated like simple chatbots. They operate over many turns, call tools, modify state, and adapt based on intermediate results. The evaluation object is the trajectory, not just the final answer.

For a company, that means evals should test real business outcomes:

Did the support agent solve the customer problem without inventing policy?
Did the coding agent modify the right files and pass the real tests?
Did the sales-prep agent surface relevant account context instead of generic notes?
Did the finance workflow preserve auditability?
Did the research agent cite sources and avoid presenting an unverified thread as established fact?

A private eval suite becomes a map of what the company cares about. It encodes judgment. It also becomes the portability test Nadella points at:

Can you swap the general model without losing the company veteran?

If yes, the firm owns the loop. If no, the model provider owns the learning.
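To make "the evaluation object is the trajectory" concrete, here is a rough sketch of grading a support-agent run on its tool calls and end state, not only its final text. The `Step` and `Trajectory` types, the `cite_policy` tool name, and the `ticket_status` field are all assumptions invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Step:
    tool: str    # name of the tool the agent called (hypothetical names)
    args: dict   # arguments passed to that tool


@dataclass
class Trajectory:
    steps: list[Step]
    final_answer: str
    final_state: dict   # e.g. the ticket record after the run


def grade_support_run(traj: Trajectory, allowed_policies: set[str]) -> dict[str, bool]:
    """Return one pass/fail flag per business outcome the firm cares about."""
    # Collect every policy id the agent cited anywhere in the run.
    cited = {s.args.get("policy_id") for s in traj.steps if s.tool == "cite_policy"}
    return {
        # Outcome check: did the environment end in the right state?
        "problem_solved": traj.final_state.get("ticket_status") == "resolved",
        # Trajectory check: every cited policy must be a real one.
        "no_invented_policy": cited <= allowed_policies,
        # Final-answer check, the only thing a chatbot-style eval would see.
        "answer_not_empty": bool(traj.final_answer.strip()),
    }
```

A run can produce a fluent final answer and still fail `no_invented_policy`, which is exactly the case a final-answer-only eval would miss.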

Institutional Memory Is More Than a Knowledge Base

Organizational-learning theory has been here before. Nonaka and Takeuchi's SECI model describes knowledge creation as movement between tacit and explicit knowledge:

socialization      = tacit -> tacit
externalization    = tacit -> explicit
combination        = explicit -> explicit
internalization    = explicit -> tacit

AI changes the tooling, not the underlying problem. Firms still need to turn expert intuition into shareable artifacts, combine those artifacts into better systems, and feed them back into human practice.

The mistake is thinking that explicit knowledge alone is enough. Brown and Duguid warned against treating information as if it could be detached from social context. Knowledge lives in practices, communities, norms, and use. An AI system that stores documents but loses context is not institutional memory. It is a searchable archive.

Institutional memory becomes useful when it is tied to action:

what happened,
why it mattered,
what decision was made,
what outcome followed,
what should change next time.
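One way to keep memory tied to action is to store each entry with those five fields rather than as free text. The sketch below is only an illustration; `MemoryRecord`, `lessons_for`, and the tagging scheme are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryRecord:
    what_happened: str
    why_it_mattered: str
    decision: str
    outcome: str
    change_next_time: str
    tags: tuple[str, ...] = ()   # illustrative: used to find relevant records later


def lessons_for(records: list[MemoryRecord], tag: str) -> list[str]:
    """Pull the forward-looking lesson from every record carrying the tag."""
    return [r.change_next_time for r in records if tag in r.tags]
```

Retrieval returns the "what should change next time" field, so what reaches the next run is a lesson and not just a document.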

The Data Flywheel Is Not Automatic

People use "data flywheel" too loosely. More data does not automatically create advantage.

Data becomes a flywheel only when it improves the product or workflow, which attracts more useful usage, which produces better data, which improves the system again. The Harvard Digital Data Design Institute's work on data network effects makes the constraint clear: customer-generated data is defensible only under certain conditions, such as when the data is proprietary, meaningfully improves the product, and cannot be easily replicated by competitors.

For AI firms, the equivalent is:

trace -> eval -> improvement -> better workflow -> more useful trace

Raw logs are not the moat. The moat is the selection, labeling, evaluation, and workflow change that turns logs into learning.

Why Smart People Get This Wrong

They confuse model access with capability

If every competitor can call the same model, model access is not strategy. The strategic layer is what the firm does with its own work traces and judgment.

They build knowledge bases without feedback loops

A document store answers questions. A learning loop changes future behavior. Those are different systems.

They treat evals as QA instead of capital formation

An eval is not only a test. It is a codified preference about what good work means. A good private eval suite is stored judgment.

They over-automate before they understand the work

If the organization has not learned how experts judge outcomes, automating the workflow may only accelerate bad defaults.

They ignore portability

If the system's expertise is trapped inside one model or vendor stack, the firm has not built token capital. It has rented it.

How To Use This

1. Build the eval before scaling the agent

For every recurring workflow, define:

the task,
the starting context,
the desired outcome,
the state that must change,
the failure modes,
the human judgment that matters.

Then run the agent against that repeatedly.
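The six items above map naturally onto a small record, and "repeatedly" onto a pass rate over several runs. This is a sketch under assumptions: `EvalCase`, `pass_rate`, and the convention that an agent takes a task plus context and returns the end state as a dict are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    task: str
    starting_context: dict
    desired_outcome: str
    required_state_change: dict   # key/value pairs that must hold after the run
    failure_modes: list[str]
    judgment_notes: str           # the human judgment that matters, in plain words


def pass_rate(case: EvalCase,
              agent: Callable[[str, dict], dict],
              runs: int = 5) -> float:
    """Run the agent several times and report how often the state check passes."""
    passed = 0
    for _ in range(runs):
        # Copy the context so one run cannot contaminate the next.
        end_state = agent(case.task, dict(case.starting_context))
        if all(end_state.get(k) == v for k, v in case.required_state_change.items()):
            passed += 1
    return passed / runs
```

Agents are not deterministic, so a single passing run says little; a pass rate over repeated runs is a more honest number to track over time.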

2. Capture traces deliberately

Do not keep only the final answer. Preserve the trajectory:

prompt,
tool calls,
retrieved context,
intermediate failures,
final output,
real-world result,
human correction.

That is the raw material of token capital.
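A possible shape for such a record, serialized one JSON object per line so runs can be appended to a log and replayed later. The `RunTrace` type and `to_jsonl_line` helper are hypothetical; the field names simply mirror the list above.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class RunTrace:
    prompt: str
    tool_calls: list[dict] = field(default_factory=list)
    retrieved_context: list[str] = field(default_factory=list)
    intermediate_failures: list[str] = field(default_factory=list)
    final_output: str = ""
    real_world_result: Optional[str] = None   # filled in later, once the outcome is known
    human_correction: Optional[str] = None    # filled in if an expert overrides the run


def to_jsonl_line(trace: RunTrace) -> str:
    """Serialize a trace to a single JSON line with stable key order."""
    return json.dumps(asdict(trace), sort_keys=True)
```

The last two fields start empty on purpose: the real-world result and the human correction usually arrive after the run, and a trace that can never be amended loses the most valuable part of the signal.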

3. Convert expert judgment into reusable rubrics

When a human says "this is bad," extract the rule. Was it wrong tone, wrong fact, wrong priority, missing context, unsafe action, poor timing, or weak commercial judgment?
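As a rough illustration, the categories in that question can be turned into labels that reviewers attach to each rejected output, with recurring labels promoted to rubric items. `FailureKind` and `recurring_failures` are hypothetical names, and the threshold of two is an arbitrary assumption.

```python
from collections import Counter
from enum import Enum


class FailureKind(Enum):
    WRONG_TONE = "wrong tone"
    WRONG_FACT = "wrong fact"
    WRONG_PRIORITY = "wrong priority"
    MISSING_CONTEXT = "missing context"
    UNSAFE_ACTION = "unsafe action"
    POOR_TIMING = "poor timing"
    WEAK_COMMERCIAL_JUDGMENT = "weak commercial judgment"


def recurring_failures(labels: list[FailureKind], min_count: int = 2) -> list[FailureKind]:
    """Return failure kinds seen at least min_count times, most frequent first."""
    counts = Counter(labels)
    return [kind for kind, n in counts.most_common() if n >= min_count]
```

A one-off label may be noise; a label that keeps recurring is a candidate for a written rule and a new eval case.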

4. Keep the model swappable

The control question:

If GPT-6, Claude, Gemini, or an open model replaced today's model, would the firm's learning loop still work?

If no, the architecture is too dependent on the model provider.
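One way to ask the control question in code is to put every model behind the same narrow interface and run the same private evals against each. The sketch below uses stub models only; `Model`, `UpperModel`, `EchoModel`, and `compare_models` are hypothetical, and no real provider API is called.

```python
from typing import Callable, Protocol


class Model(Protocol):
    """The only surface the learning loop is allowed to depend on."""
    name: str

    def complete(self, prompt: str) -> str: ...


class UpperModel:
    name = "upper-stub"   # stand-in for one provider

    def complete(self, prompt: str) -> str:
        return prompt.upper()


class EchoModel:
    name = "echo-stub"    # stand-in for another provider

    def complete(self, prompt: str) -> str:
        return prompt


def compare_models(models: list[Model],
                   eval_cases: list[tuple[str, Callable[[str], bool]]]) -> dict[str, float]:
    """Score each model on the same (prompt, check) pairs."""
    if not eval_cases:
        raise ValueError("need at least one eval case")
    scores = {}
    for model in models:
        passed = sum(1 for prompt, check in eval_cases if check(model.complete(prompt)))
        scores[model.name] = passed / len(eval_cases)
    return scores
```

If the evals, traces, and rubrics live outside the model adapter, a swap becomes a measured comparison rather than a rewrite.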

5. Feed learning back to humans

The goal is not a machine that learns while people stagnate. The loop should also make humans sharper: better rubrics, better examples, better postmortems, better taste.

What This Means For Hermes And Jme-Loop

This is exactly the direction Hermes should move in:

canonical memory as institutional memory,
source cards and reference cards as explicit knowledge conversion,
tool traces and cron outputs as workflow traces,
private evals for recurring agent tasks,
process updates when failures repeat,
model/runtime portability as a design constraint.

The practical next step is not "more agents." It is better loops around the agents we already run.

For Jamie's stack, the test is:

Can Hermes get better at Jamie-specific work without trapping the learning inside one model or one chat thread?

If yes, it is becoming token capital. If no, it is just automation.

Key Terms

Human capital: human knowledge, judgment, relationships, taste, and context.
Token capital: owned AI capability built from the firm's workflows, traces, evals, memory, and context.
Private eval: internal test suite measuring whether an AI system improves on outcomes that matter to the organization.
Trace: the full record of an agent run: inputs, tool calls, retrieved context, intermediate steps, output, and state changes.
Institutional memory: durable knowledge of what happened, why, what was decided, and what should change next time.
Data flywheel: a loop where usage generates data that improves the system, attracting more useful usage and better data.
Tacit knowledge: expert know-how that is hard to fully write down.
Explicit knowledge: codified knowledge that can be documented, shared, and recombined.

Recall Questions

Why is picking the best model not enough to create durable firm advantage?
What is the difference between a knowledge base and a learning loop?
Why can private evals become company IP?
What does it mean to keep the "company veteran" while swapping the general model?
How does human capital direct token capital rather than disappear under it?

Best Resources to Learn More

Read Nadella's article first for the business architecture claim.
Pair it with Anthropic's agent-eval work to understand why private evals need traces and outcome checks.
Use Nonaka and Takeuchi for the old organizational-learning frame: tacit and explicit knowledge conversion.
Use data-network-effects work to avoid vague "data moat" thinking.

Sources

Satya Nadella. "A frontier without an ecosystem is not stable." X Article, 14 Jun 2026. https://x.com/satyanadella/status/2066182223213293753
Microsoft Foundry overview. Enterprise agents, memory, observability, evals, governance, and Model Context Protocol. https://ai.azure.com/
Anthropic Engineering. "Demystifying evals for AI agents." 2026. https://www.anthropic.com/engineering/demystifying-evals-for-ai-agents
Harvard Digital Data Design Institute. "Data, network effects, and competitive advantage." https://d3.harvard.edu/data-network-effects-and-competitive-advantage
ASCN. "SECI Model of Knowledge Creation: Socialization, Externalization, Combination, Internalization." https://ascnhighered.org/ASCN/change_theories/collection/seci.html
Brown JS, Duguid P. "Organizational Learning and Communities-of-Practice: Toward a Unified View of Working, Learning, and Innovation." Organization Science, 1991. Accessible PDF mirror: https://joelchan.me/INST801-FA22%20Readings/Brown_Duguid_1991_Organizational%20Learning%20and%20Communities-of-Practice.pdf
Brown JS, Duguid P. The Social Life of Information. Harvard Business School Press, 2000. Review summary: https://webdoc.sub.gwdg.de/edoc/aw/ucsb/istl/01-winter/review3.html
Existing library article: /library/agent-evals-stateful-systems

