<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Growth Alchemy Lab</title>
        <link>https://growthalchemylab.com</link>
        <description>Practitioner notes on agentic systems, ops automation, and shipping AI.</description>
        <lastBuildDate>Fri, 01 May 2026 08:42:38 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <copyright>All rights reserved 2026, Luis Calderon</copyright>
        <atom:link href="https://growthalchemylab.com/feed.xml" rel="self" type="application/rss+xml"/>
        <item>
            <title><![CDATA[Eval design before agent design]]></title>
            <link>https://growthalchemylab.com/blog/eval-design-before-agent-design</link>
            <guid>https://growthalchemylab.com/blog/eval-design-before-agent-design</guid>
            <pubDate>Thu, 30 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[A practical note on deciding what failure looks like before adding agents to a workflow.]]></description>
            <content:encoded><![CDATA[<p>The cheapest agent improvement usually happens before the agent exists.</p>
<p>Write the eval first. A good eval describes the job, the source material, the allowed action, the required citation, and the conditions that force refusal. That sounds slower than prompting, but it prevents the expensive version of speed: a system that produces plausible work nobody trusts.</p>
<p>For an operating workflow, I start with five examples:</p>
<ul>
<li>A clear pass.</li>
<li>A close-but-wrong answer.</li>
<li>A missing-source refusal.</li>
<li>A risky action that needs a human.</li>
<li>A correct answer with weak evidence.</li>
</ul>
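<p>As a sketch, those five cases can live as an executable spec from day one. Field names here are mine, not a standard:</p>
<pre><code class="language-python"># Hypothetical sketch: the five starter cases as executable checks.
# "expected" is the behavior the agent must exhibit, not the text it must emit.
EVAL_CASES = [
    {"id": "clear-pass",             "expected": "answer"},
    {"id": "close-but-wrong",        "expected": "reject"},
    {"id": "missing-source-refusal", "expected": "refuse"},
    {"id": "risky-action",           "expected": "escalate"},
    {"id": "correct-weak-evidence",  "expected": "flag_evidence"},
]

def grade(case, observed_behavior):
    """An agent that refuses when it should refuse passes the eval."""
    return observed_behavior == case["expected"]
</code></pre>
<p>The grader stays dumb on purpose: the judgment lives in the spec, not the scoring code.</p>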
<p>Those examples become the product spec. The model, prompt, tools, and UI are implementation details underneath the eval.</p>
<p>The moment an agent can say "I do not have source material for that" and still be considered successful, the team starts trusting it for the right reason.</p>]]></content:encoded>
            <category>evals</category>
            <category>agents</category>
        </item>
        <item>
            <title><![CDATA[When not to deploy an agent]]></title>
            <link>https://growthalchemylab.com/blog/when-not-to-deploy-an-agent</link>
            <guid>https://growthalchemylab.com/blog/when-not-to-deploy-an-agent</guid>
            <pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Some workflows need a checklist, a script, or a better queue before they need agent autonomy.]]></description>
            <content:encoded><![CDATA[<p>Not every slow workflow deserves an agent.</p>
<p>If the decision rules are unclear, adding autonomy usually makes the problem harder to inspect. If the source data is unreliable, the agent will package that unreliability in nicer prose. If the team does not know who approves a risky action today, the software will not solve the ownership gap.</p>
<p>I usually pause agent work when one of these is true:</p>
<ul>
<li>The team cannot name the decision the agent is supposed to improve.</li>
<li>The workflow has no owner.</li>
<li>The data changes faster than the review loop.</li>
<li>The cost of a wrong action is higher than the cost of a manual step.</li>
<li>A deterministic script would remove most of the work.</li>
</ul>
<p>The right first move is often a queue, a form, a policy, or a dashboard. Agents earn their place after the workflow has enough shape to evaluate.</p>]]></content:encoded>
            <category>governance</category>
            <category>operations</category>
        </item>
        <item>
            <title><![CDATA[The audit trail is the product]]></title>
            <link>https://growthalchemylab.com/blog/the-audit-trail-is-the-product</link>
            <guid>https://growthalchemylab.com/blog/the-audit-trail-is-the-product</guid>
            <pubDate>Tue, 28 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Agentic systems become useful when leaders can inspect what happened, why, and who approved it.]]></description>
            <content:encoded><![CDATA[<p>The impressive part of an agent demo is usually the output. The useful part in production is the audit trail.</p>
<p>An operating team needs to know what the agent saw, what it ignored, which tool it used, what it was allowed to do, who approved the action, and how the result changed the next run. Without that record, every failure becomes a debate about hidden reasoning.</p>
<p>The audit trail should be designed as a user interface, not a log dump. A good review surface answers five questions quickly:</p>
<ul>
<li>What source material was used?</li>
<li>Which rule or budget applied?</li>
<li>What action was proposed?</li>
<li>Who approved or rejected it?</li>
<li>What changed after the action?</li>
</ul>
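<p>One way to make that concrete, as a hypothetical schema rather than a prescribed one, is a record type with exactly those five fields:</p>
<pre><code class="language-python">from dataclasses import dataclass
from typing import Optional

# Illustrative schema: one record per agent action, mirroring the five
# review questions above. Field names are assumptions, not a standard.
@dataclass
class AuditRecord:
    sources_used: list          # what source material was used
    rule_applied: str           # which rule or budget applied
    proposed_action: str        # what action was proposed
    approver: Optional[str]     # who approved or rejected it (None = pending)
    outcome: Optional[str]      # what changed after the action

    def is_reviewable(self) -> bool:
        """A record answers the five questions only when every field is set."""
        return all([self.sources_used, self.rule_applied,
                    self.proposed_action, self.approver, self.outcome])
</code></pre>
<p>A record that cannot answer all five questions is not reviewable, and the type says so.</p>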
<p>That record is what lets autonomy compound without turning into institutional folklore.</p>]]></content:encoded>
            <category>audit</category>
            <category>product</category>
        </item>
        <item>
            <title><![CDATA[Your Agents Have Amnesia - Stop Using Markdown!]]></title>
            <link>https://growthalchemylab.com/blog/your-agents-have-amnesia-stop-using-markdown</link>
            <guid>https://growthalchemylab.com/blog/your-agents-have-amnesia-stop-using-markdown</guid>
            <pubDate>Thu, 16 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[A production-ready six-layer memory stack for agents: SQLite session search, curated memory, skills, git knowledge, cross-session retrieval, and plugin backends.]]></description>
            <content:encoded><![CDATA[<p>A few weeks ago, I posted about something most AI agents are doing wrong. They are stuffing context into markdown files. <code>memory.md</code>. <code>AGENTS.md</code>. <code>skills.md</code>.</p>
<p>It felt clever. Worked great for about a week.</p>
<p>Then you hit 100 files across a handful of agents. And you ask yourself: what happens when this is 500? What happens next month?</p>
<p>The answer was not good.</p>
<h2>The problem</h2>
<p>Stanford published a paper called "Lost in the Middle: How Language Models Use Long Contexts." It proved something most of us suspected: LLMs are bad at finding important information buried in long text.</p>
<p>Production systems fail to find the right context 20-40% of the time. At the midpoint, that is roughly 1 in 3 queries.</p>
<p>A file system is not a memory system.</p>
<p>So what do you actually build instead?</p>
<h2>What actually works</h2>
<p>I have run production agents across Discord, Telegram, Slack, iMessage, and the CLI, serving multiple platforms simultaneously with persistent sessions. This is the architecture that survived.</p>
<p>It has six layers. Each one solves a different problem.</p>
<ul>
<li>Layer 1: Session memory in SQLite plus FTS5, storing every conversation.</li>
<li>Layer 2: Curated memory in <code>MEMORY.md</code> and <code>USER.md</code>, with hard limits.</li>
<li>Layer 3: Procedural memory with 100+ expert skills, loaded on demand.</li>
<li>Layer 4: Long-term memory in a git repo: <code>AGENTS.md</code> and the knowledge base.</li>
<li>Layer 5: Cross-session search with FTS5 full-text search.</li>
<li>Layer 6: Plugin extensibility with mem0, Neo4j, and vector databases.</li>
</ul>
<p>The repo is the brain. The agents are the hands. The harness is just the wiring: thin, fast, replaceable.</p>
<h2>Layer 1: Session memory, SQLite with FTS5</h2>
<p>Every conversation. Every tool call. Every reasoning chain. Persisted to a single database file. Not JSONL. Not markdown. A real database with transactions, indexes, and concurrent access.</p>
<p>Why SQLite?</p>
<p>One file. No Docker. No API keys. No infrastructure.</p>
<p>Twenty years of production testing. Handles thousands of concurrent reads in WAL mode.</p>
<p>Real transactions. Not eventual consistency. Not "close enough."</p>
<h3>The working part: FTS5</h3>
<p>A content-linked full-text virtual table. Three triggers keep the index in sync on every <code>INSERT</code>, <code>DELETE</code>, and <code>UPDATE</code>. Zero manual maintenance.</p>
<p>Search supports real query syntax. Keywords. Phrases. Boolean operators. Prefix matching. Combined queries like <code>error OR exception AND docker</code>.</p>
<p>WAL mode handles concurrent access. Multiple agents on multiple platforms read freely. Writes serialize with application-level retry and random jitter. Competing writers naturally stagger. The database never blocks for more than 150ms.</p>
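<p>A minimal sketch of this layer fits on one screen. The schema is illustrative, not the exact production one, and a real deployment points the connection at a single <code>.db</code> file:</p>
<pre><code class="language-python">import sqlite3

# Illustrative schema. ":memory:" keeps the sketch self-contained;
# in production this is one database file in WAL mode.
db = sqlite3.connect(":memory:")
db.execute("PRAGMA journal_mode=WAL")   # concurrent readers, serialized writers
db.executescript("""
CREATE TABLE messages(id INTEGER PRIMARY KEY, session TEXT, body TEXT);
CREATE VIRTUAL TABLE messages_fts
    USING fts5(body, content='messages', content_rowid='id');
-- three triggers keep the index in sync; zero manual maintenance
CREATE TRIGGER messages_ai AFTER INSERT ON messages BEGIN
    INSERT INTO messages_fts(rowid, body) VALUES (new.id, new.body);
END;
CREATE TRIGGER messages_ad AFTER DELETE ON messages BEGIN
    INSERT INTO messages_fts(messages_fts, rowid, body) VALUES ('delete', old.id, old.body);
END;
CREATE TRIGGER messages_au AFTER UPDATE ON messages BEGIN
    INSERT INTO messages_fts(messages_fts, rowid, body) VALUES ('delete', old.id, old.body);
    INSERT INTO messages_fts(rowid, body) VALUES (new.id, new.body);
END;
""")

def search(query: str, limit: int = 10) -> list:
    """Full FTS5 query syntax: phrases, AND/OR/NOT, prefix* matching."""
    rows = db.execute(
        "SELECT m.session, m.body FROM messages_fts f "
        "JOIN messages m ON m.id = f.rowid "
        "WHERE messages_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit))
    return rows.fetchall()
</code></pre>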
<h2>Layer 2: Curated memory</h2>
<p>Two files. Hard character limits. Forced curation.</p>
<p><code>MEMORY.md</code>: 2,200 characters. The agent's personal notes.</p>
<p><code>USER.md</code>: 1,375 characters. What the agent knows about you.</p>
<p>When you hit the limit, you replace. You do not add. The agent cannot hoard. It has to choose what matters.</p>
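<p>The enforcement is one guard at write time. This is a sketch of the policy, not the production code; the 2,200 and 1,375 figures are the budgets above:</p>
<pre><code class="language-python"># Hard limits force curation: a write over budget is rejected outright,
# never silently truncated. The agent must choose what to keep.
LIMITS = {"MEMORY.md": 2200, "USER.md": 1375}

def write_memory(path: str, new_content: str) -> str:
    """Replace the whole file; reject writes over the budget instead of trimming."""
    limit = LIMITS[path]
    if len(new_content) > limit:
        raise ValueError(f"{path}: {len(new_content)} chars exceeds the {limit}-char budget")
    return new_content   # caller persists this as the file's full contents
</code></pre>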
<h3>Frozen snapshot pattern</h3>
<p>Both files load at session start. They inject into the system prompt as a frozen snapshot. Mid-session writes update the files immediately. But the system prompt does not change.</p>
<p>This preserves the prefix cache for the entire session. No invalidation. No re-computation. The snapshot refreshes on the next session start.</p>
<p>Threat scanning protects this layer. Prompt injection patterns. Role hijacking. Deception attempts. Exfiltration with <code>curl</code> and secrets. Suspicious entries get rejected.</p>
<p>The curation is not a limitation. It is the point. You cannot remember everything. Neither can your agent.</p>
<h2>Layer 3: Procedural memory, skills</h2>
<p>Session memory is what happened. Curated memory is what matters. Skills are how to do it.</p>
<p>One hundred skills. Each one encodes an expert workflow.</p>
<p>Skill files are markdown with YAML frontmatter. They define when to use a skill, the step-by-step procedure, pitfalls to avoid, and verification steps.</p>
<h3>The skill router</h3>
<pre><code class="language-bash">python3 skill-router.py "optimize landing page"
# Returns: top 5 matching skills with relevance scores
</code></pre>
<p>The agent loads the most relevant skill, cites it in the response, and follows its framework. Not default behavior. The skill's framework.</p>
<p>Skills load on demand. They do not all live in memory. Load the best match. Execute. Unload.</p>
<p>Categories span development, marketing, social media, research, productivity, and MLOps. The agent has procedural memory for tasks it has never encountered.</p>
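<p>The real <code>skill-router.py</code> is not shown here, so treat this as a stand-in: a plain keyword-overlap scorer that illustrates the shape of on-demand routing, with a toy skill registry:</p>
<pre><code class="language-python"># Hypothetical router sketch; skill names and descriptions are invented.
SKILLS = {
    "landing-page-optimization": "optimize landing page conversion copy cta",
    "docker-debugging": "debug docker container build error logs",
    "git-workflows": "git branch rebase merge conflict",
}

def route(task: str, top_k: int = 5) -> list:
    """Return up to top_k skill names ranked by keyword overlap with the task."""
    words = set(task.lower().split())
    scored = [(len(words & set(desc.split())), name)
              for name, desc in SKILLS.items()]
    return [name for score, name in sorted(scored, reverse=True)[:top_k] if score > 0]
</code></pre>
<p>Load the best match, execute, unload; the registry can grow without anything living in memory by default.</p>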
<h2>Layer 4: Long-term memory, the git repo</h2>
<p>Everything above is agent-local. The git repo is shared. Version-controlled. Human-readable.</p>
<p><code>AGENTS.md</code> defines the agent's identity, rules, and operating procedures. Who the agent is. What files to read every session. Safety rules. Tool conventions. Memory structure.</p>
<p>The knowledge base is a git repository that every agent pulls at session start.</p>
<pre><code class="language-bash">git pull --rebase
</code></pre>
<p>Shared context lives there. <code>CURRENT_PRIORITIES.md</code>. <code>COMPANY_FACTS.md</code>. <code>BRAND_VOICE.md</code>. The agent's morning briefing. It reads the latest updates before doing any work.</p>
<p>Why git?</p>
<p>Version control. Every change is tracked and reversible.</p>
<p>Human-readable. Anyone can read and edit.</p>
<p>Branchable. Experiment without breaking production.</p>
<p>Distributed. No central server required.</p>
<p>The repo is the brain. Not a database. Not a vector store. Text files in git.</p>
<h2>Layer 5: Cross-session search</h2>
<p>The agent has thousands of past conversations. It needs to find the right one.</p>
<h3>FTS5 search, no embeddings required</h3>
<p>The session search tool uses the same FTS5 index. No embeddings. No preprocessing. No vector database. Just full-text search.</p>
<p>Query syntax is rich. Broad recall with <code>OR</code>. Exact phrase matching. Exclusion with <code>NOT</code>. Prefix matching with wildcards.</p>
<p>When a match is found, the system pulls the message before and after. The agent gets the match plus context. Full conversation flow around the relevant part.</p>
<p>The agent searches its own history before asking the user to repeat themselves.</p>
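<p>The match-plus-context behavior is a small amount of SQL. This self-contained sketch uses an illustrative schema and an in-memory database:</p>
<pre><code class="language-python">import sqlite3

# Illustrative schema: an ordered per-session message log with an FTS5 index.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE messages(id INTEGER PRIMARY KEY, session TEXT, body TEXT);
CREATE VIRTUAL TABLE messages_fts USING fts5(body, content='messages', content_rowid='id');
""")
for body in ["set up the cron job", "the cron job threw an error",
             "fixed by escaping the percent sign"]:
    db.execute("INSERT INTO messages(session, body) VALUES ('s1', ?)", (body,))
db.execute("INSERT INTO messages_fts(rowid, body) SELECT id, body FROM messages")

def search_with_context(query: str) -> list:
    """For each hit, also return the message before and after it."""
    hits = db.execute(
        "SELECT rowid FROM messages_fts WHERE messages_fts MATCH ?", (query,)).fetchall()
    results = []
    for (rowid,) in hits:
        window = db.execute(
            "SELECT body FROM messages WHERE id BETWEEN ? AND ? ORDER BY id",
            (rowid - 1, rowid + 1)).fetchall()
        results.append([b for (b,) in window])
    return results
</code></pre>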
<h2>Layer 6: Plugin extensibility</h2>
<p>SQLite handles 95% of cases. But some agents need more.</p>
<h3>Memory provider interface</h3>
<p>A pluggable architecture for alternative memory backends:</p>
<ul>
<li><code>mem0</code>: vector memory with session-based retrieval.</li>
<li>Neo4j: graph database for relationship-based memory.</li>
<li>Honcho: dialectic memory with peer-based reasoning.</li>
<li>Holographic: holographic memory encoding.</li>
<li>RetainDB: persistent retrieval database.</li>
<li>Hindsight: retrospective memory analysis.</li>
</ul>
<p>Providers combine in hybrid mode. <code>mem0</code> plus Neo4j for vector and graph memory. SQLite plus Honcho for session and dialectic memory. Same interface. Different backends.</p>
<p>When to use vector memory: semantic similarity search, unstructured and diverse memory, RAG over large document collections, and cross-lingual search.</p>
<p>But for most agent workflows (finding a past conversation, remembering a user preference, recalling a tool configuration), keyword search is faster, cheaper, and more predictable.</p>
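<p>The interface itself can stay small. This is a hypothetical sketch of the provider contract and hybrid mode, not the actual plugin API; method names are assumptions:</p>
<pre><code class="language-python">from typing import Protocol

class MemoryProvider(Protocol):
    def store(self, session: str, text: str) -> None: ...
    def recall(self, query: str, limit: int = 5) -> list: ...

class KeywordProvider:
    """Default backend: plain keyword matching over stored text."""
    def __init__(self):
        self._items = []
    def store(self, session, text):
        self._items.append((session, text))
    def recall(self, query, limit=5):
        words = set(query.lower().split())
        return [t for _, t in self._items if words & set(t.lower().split())][:limit]

class HybridProvider:
    """Hybrid mode: fan writes out to every backend, merge and dedupe reads."""
    def __init__(self, *backends):
        self.backends = backends
    def store(self, session, text):
        for b in self.backends:
            b.store(session, text)
    def recall(self, query, limit=5):
        merged = []
        for b in self.backends:
            for hit in b.recall(query, limit):
                if hit not in merged:
                    merged.append(hit)
        return merged[:limit]
</code></pre>
<p>Same interface, different backends: swap <code>KeywordProvider</code> for a vector or graph store without touching the caller.</p>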
<h2>The philosophy</h2>
<p>Here is what most people get wrong about agent memory.</p>
<p>They think more context equals better memory. It does not. More context equals more noise. The Stanford paper proved it. LLMs lose information in the middle of long contexts. Stuffing everything into markdown files is just a file attachment, not memory.</p>
<p>Real memory has layers. Each layer has a purpose, a format, and a size limit.</p>
<p>Session memory for everything. Curated memory for what matters. Skills for how to do things. Git for shared knowledge. Search for finding it all.</p>
<p>Memory is not about storing everything. It is about finding the right thing.</p>
<p>You are either building a folder or building a system.</p>
<p>Pick one.</p>
<h2>Getting started</h2>
<p>Want to implement this? Here is the minimum viable stack:</p>
<ul>
<li>SQLite session store: one file, WAL mode, FTS5 for search.</li>
<li>Curated memory: two files, hard limits, forced curation.</li>
<li>Session search: FTS5 with <code>MATCH</code> queries, snippet extraction, and surrounding context.</li>
<li>Skills registry: skill files with YAML frontmatter and a router for on-demand loading.</li>
</ul>
<p>This scales. From one agent to hundreds. From one platform to all of them.</p>
<h2>Final thought</h2>
<p>The agents that win will not be the ones with the biggest context windows. They will be the ones with the best memory architecture.</p>
<p>Layered. Bounded. Searchable. Version-controlled. Extensible.</p>
<p>That is not a folder. That is a system. And the system scales.</p>]]></content:encoded>
            <category>agent-memory</category>
            <category>sqlite</category>
            <category>skills</category>
            <category>architecture</category>
        </item>
        <item>
            <title><![CDATA[Most PMs Won't Become Agent Orchestrators. They'll Become Obsolete.]]></title>
            <link>https://growthalchemylab.com/blog/most-pms-wont-become-agent-orchestrators</link>
            <guid>https://growthalchemylab.com/blog/most-pms-wont-become-agent-orchestrators</guid>
            <pubDate>Tue, 30 Dec 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[A contrarian view of product management in the agentic era: orchestration is temporary, and surviving PM roles become deeply technical or deeply domain-specific.]]></description>
            <content:encoded><![CDATA[<p>I am going to make an argument that most people in product management will not like. I might be wrong. But I think the counterargument needs to be made, because the current narrative feels too comfortable, and comfortable narratives are often the ones that age worst.</p>
<p>Everyone is telling product managers the same story: agentic AI will transform you from a backlog manager into an "agent orchestrator." You will coordinate fleets of autonomous AI agents. You will be more strategic than ever. Your job is safe. It is just evolving.</p>
<p>I do not buy it.</p>
<p>Here is my contrarian position: 89% of agentic AI pilots never make it to production. Not because the technology fails. Because organizations realize they do not need a human in the loop at all.</p>
<p>I want to stress upfront: this is one perspective. The optimists have real arguments too. But I think someone needs to articulate the pessimistic case clearly, so PMs can stress-test their own assumptions.</p>
<h2>The orchestrator fantasy</h2>
<p>McKinsey, Deloitte, and every AI course vendor are pushing the same narrative: PMs will "orchestrate intelligent ecosystems" instead of managing Jira tickets.</p>
<p>To be fair, these are smart people with real data. They are not making things up. But I think they are describing a transitional state, not an end state.</p>
<p>Here is what I believe they are missing:</p>
<p>Orchestration is a temporary job.</p>
<p>When AI agents first deploy, yes, someone needs to coordinate them, set guardrails, and handle edge cases. That someone might be a PM.</p>
<p>But agents learn. They adapt. The orchestration layer gets automated too.</p>
<p>Andrew Ng's agentic AI patterns (reflection, tool use, planning, and multi-agent collaboration) are not designed to keep humans in charge. They are designed to remove the need for human coordination entirely.</p>
<p>The 23% of companies scaling agentic AI are not hiring more PMs. They are hiring fewer.</p>
<h2>What actually happens when agents ship</h2>
<p>Here is where I will admit my sample size is small. I talked to three PMs at companies with production agentic AI systems. Three people is not a trend. But their stories were consistent enough that I think they are worth sharing.</p>
<p>Here is what they described:</p>
<ul>
<li>Week 1-4: PM defines agent goals, sets constraints, reviews outputs. Feels strategic.</li>
<li>Month 2-3: Agent handles 80% of decisions autonomously. PM reviews exceptions. Starts feeling like QA.</li>
<li>Month 4+: Exception rate drops below 5%. PM role becomes "agent babysitter": checking dashboards, escalating the rare failure. Leadership asks: "Why do we need a full-time person for this?"</li>
</ul>
<p>One PM put it bluntly: "I went from product manager to product janitor."</p>
<h2>The skills gap nobody talks about</h2>
<p>Now here is where reasonable people might disagree with me.</p>
<p>The industry says PMs need new skills:</p>
<ul>
<li>AI literacy.</li>
<li>Prompt engineering.</li>
<li>Systems thinking.</li>
<li>Guardrail design.</li>
</ul>
<p>These are real skills. The courses teaching them are not scams. But here is where my contrarian view gets sharper: I think these are fundamentally engineering skills, not PM skills.</p>
<p>The PM who can design agent architectures, write evaluation frameworks, and debug multi-agent coordination? That is not a PM who "evolved." That is a PM who became an engineer.</p>
<p>And if you are competing for that role, you are competing against actual engineers who already have those skills, plus computer science degrees and 10 years of technical depth.</p>
<p>The "upskilling" path leads to a job you are underqualified for.</p>
<h2>The two PMs who will survive</h2>
<p>I do not think the PM role goes to zero. That would be an overclaim, and I am trying to be honest about what I actually believe versus what makes for a spicy take.</p>
<p>Here is my actual view: most PM jobs disappear, but some survive. The survivors will not be "orchestrators." They will be one of two things.</p>
<h3>1. Domain experts who happen to do product</h3>
<p>Agentic AI can coordinate tasks. It cannot understand why a healthcare compliance workflow matters more than a marketing automation workflow.</p>
<p>The PM who survives is the one with 15 years in fintech regulation, or clinical trial management, or supply chain logistics. Someone whose expertise cannot be scraped from the internet and fed to an LLM.</p>
<p>If your domain knowledge can fit in a prompt, you are replaceable.</p>
<h3>2. The last-mile human</h3>
<p>Some decisions require a human for legal, ethical, or trust reasons. Not because humans decide better, but because regulations or customers demand it.</p>
<p>These PMs are not orchestrating agents. They are the mandatory human checkpoint before an agent-generated decision goes live. It is a compliance role, not a strategic one.</p>
<p>Neither of these is the glamorous "agent orchestrator" future the industry is selling.</p>
<h2>The real numbers</h2>
<p>Let us be specific:</p>
<ul>
<li>11% of organizations have agentic AI in production, according to Deloitte 2025.</li>
<li>35% have no formal agentic AI strategy at all.</li>
<li>94% say process orchestration is essential, but they mean automated orchestration, not human orchestration.</li>
<li>15% of day-to-day work decisions will be made autonomously by 2028, according to Gartner.</li>
</ul>
<p>That last number is the killer. "Day-to-day work decisions" is literally the PM job description. Prioritization. Tradeoffs. Stakeholder alignment. Resource allocation.</p>
<p>If 15% of those decisions are autonomous by 2028, what is the number by 2032? By 2035?</p>
<p>The trend line does not end at "PM as orchestrator." It ends at "PM as historical artifact."</p>
<h2>What to actually do</h2>
<p>I am not saying quit your PM job tomorrow. I am saying stop believing the comfortable narrative and start making real moves.</p>
<h3>Option 1: Go deep on domain</h3>
<p>Pick an industry vertical. Become the expert who understands not just the product, but the regulatory environment, the customer psychology, the competitive dynamics, and the operational constraints.</p>
<p>This takes years. Start now.</p>
<h3>Option 2: Go technical for real</h3>
<p>Do not learn "prompt engineering" from a weekend course. Learn ML fundamentals. Learn distributed systems. Learn how to actually build and evaluate AI systems.</p>
<p>This means competing with engineers on their turf. It is hard. But at least it is honest.</p>
<h3>Option 3: Move to strategy</h3>
<p>The decisions agents cannot make are the ones with incomplete information, high uncertainty, and existential stakes. M&amp;A. Market entry. Pivots.</p>
<p>These are executive decisions, not PM decisions. If you want this path, you need to get out of product and into business strategy.</p>
<h3>Option 4: Accept the timeline</h3>
<p>Maybe you are 15 years from retirement. Maybe agentic AI's impact takes longer than predicted. Maybe you ride it out.</p>
<p>This is a valid choice. Just make it consciously, not by default.</p>
<h2>The contrarian bet</h2>
<p>Here is my actual prediction:</p>
<p>By 2030, the PM role as we know it will not exist at most tech companies.</p>
<p>Not because PMs failed. Because the job (translating customer needs into prioritized requirements and coordinating cross-functional execution) is exactly what agentic AI does best.</p>
<p>The surviving "product" roles will look nothing like today's PM job. They will either be deeply technical, as AI/ML product engineers, or deeply domain-specific, as regulated industry specialists.</p>
<p>The generalist PM who is "technical enough" and "strategic enough" and "customer-focused enough" is the profile most exposed to automation.</p>
<p>I know this is not what you want to hear. The "agent orchestrator" story is much nicer.</p>
<p>But if I were a PM today, I would rather know the uncomfortable truth than optimize for a future that is not coming.</p>
<h2>The bottom line</h2>
<p>The agentic AI narrative for PMs has three versions:</p>
<ul>
<li>Version 1, industry marketing: "You will become an agent orchestrator. More strategic than ever."</li>
<li>Version 2, honest assessment: "Some PMs will adapt. Many will not. The job changes fundamentally."</li>
<li>Version 3, what I actually believe: "The PM role is a transitional artifact of the pre-AI era. Its core functions are being automated. Plan accordingly."</li>
</ul>
<p>I am betting on version 3.</p>
<p>You do not have to agree. But you should at least consider it before investing your career in the orchestrator fantasy.</p>
<p>What is your take? Are PMs positioned to thrive in the agentic era, or is the industry selling false hope? I would genuinely like to hear the counterargument.</p>
<h2>Sources</h2>
<ul>
<li>Deloitte: Agentic AI Strategy, 2025.</li>
<li>McKinsey: The Agentic Organization.</li>
<li>Gartner: Top Tech Trends 2025.</li>
<li>Andrew Ng on Agentic AI Patterns.</li>
</ul>]]></content:encoded>
            <category>product-management</category>
            <category>agentic-ai</category>
            <category>careers</category>
            <category>strategy</category>
        </item>
    </channel>
</rss>