2026

2026/07/08
in fde
4 min read

Is FDE a Bubble?

When I wrote If LEGO Had Forward Deployed Engineers, I opened by saying the role was having a moment. That was an understatement. Since then the title has spread everywhere — every AI startup seems to be hiring FDEs, investors ask founders about their "FDE motion" the way they used to ask about their PLG motion, and the job ads have long since outrun any shared understanding of what the job is.

When a title spreads faster than the understanding behind it, it's worth asking the uncomfortable question: is FDE a bubble?

I should declare my interest upfront. I spent seven years as an FDE at Palantir and I now make my living helping companies build FDE organisations. I have every incentive to tell you the answer is no. Bear that in mind as you read.

Palantir never chose FDE

The origin story matters here, because it's the part everyone skips.

In The FDE Fork I described how Foundry was an accident — nobody sat in a room and decided to build it. FDEs were solving customer problems, building tools to bootstrap themselves, and the product fell out of the residue. But push that one step further back: the FDE role itself was an accident too. Nobody at Palantir designed "forward deployed engineering" as an organisational innovation. In the beginning there was no platform to deploy. FDEs worked the way founding engineers work: embedded in a customer's mess, doing discovery, building custom things end to end. Out of the learnings from those bespoke builds came the beginnings of Foundry. And then the same motion kept repeating — pointed at a different part of the platform each time, always at the envelope of what didn't exist yet. Plenty of those deployments ran at a loss. The economics were never the point; the discovery was. The tooling fell out of the delivery, the product fell out of the tooling, and the role got a name only after it already existed.

FDE, in other words, is residue. It's what was left over after one company spent a decade solving one specific problem from first principles. The structure was downstream of the reasoning, never the other way around.

Copying the residue

What's spreading right now, mostly, isn't the reasoning. It's the residue.

A founder reads about the Palantir model, or watches an AI lab they admire hire FDEs, and concludes that the title is the strategy. They hire smart generalists, call them FDEs, send them to customers, and wait for the magic. At no point does anyone ask the first-principles question — what is our deployment problem, and what is the cheapest structure that solves it?

Sometimes the honest answer to that question is an FDE team. Often it's a solutions engineering function, or better documentation, or a smaller product surface, or the admission that the product isn't ready to sell yet. But those answers don't sound like Palantir, so they don't get considered.

The LEGO test applies directly. If your FDEs are building dragons but nobody is walking back to product with the bag of weird bricks — if the field signal never changes what you build — then you haven't built an FDE org. You've built a services team with a fashionable title. That can be a perfectly good business. It just isn't the thing you thought you were copying, and it won't produce the thing Palantir's version produced.

The part that isn't a bubble

A lot of AI companies genuinely do have Palantir's problem shape. Capable technology, bespoke deployments, a chasm between "demo works" and "production works", customers who can't self-serve their way to value, and a product whose shape nobody knows yet — the kind you only discover by embedding engineers in a customer's mess and watching what they're forced to build. A founder who reasons honestly from those constraints will land on something that looks like forward deployment — not because Palantir did it, but because the same conditions produce the same conclusion.

And as I argued in The FDE Fork, AI has made a second version of the model coherent — the outcomes path, where the internal tooling compounds and the FDE never stops being the product. The model has more legitimate configurations today than it did when Palantir ran it, not fewer.

So the model isn't hollow. The bubble, to the extent there is one, is the gap between the number of companies using the title and the number that did the reasoning.

What happens when it pops

That gap will close the way these gaps always close. The companies that hired FDEs as an identity will quietly discover they built expensive support engineering, wind it down, and conclude that "FDE didn't work for us" — when the truth is they never had the problem FDE solves. The title will absorb some reputational damage on the way down, and the discourse will move on to whatever gets named next.

The companies that derived their structure from their own problem will be fine. Some of them will be running FDE orgs. Some of them will be running something else that they arrived at the same way Palantir arrived at FDE. What they call it won't matter much.

The wrong question

Which is why "is FDE a bubble?" is ultimately the wrong question. Whether the title is over-adopted matters far less than whether you derived your structure from your problem.

Palantir's actual insight was never "put engineers in the field." It was the willingness to look hard at the problem in front of it, ignore what everyone else's org chart looked like, and build whatever the problem demanded. FDE is what that process produced: once, for one company, in one problem space. It happened to be immensely successful, so the output got a name and the name got copied. But the output was never the insight; the process was.

If you do that reasoning and land on FDE, great. If you do that reasoning and land somewhere else, also great. The only losing move is skipping the reasoning.

FDE is not the path to success. Solving problems from first principles is.

2026/05/29
in physics
12 min read

AI as the Undergrad Researcher: A Real Physics Result, Two Months, One Person

Writing code with AI is no longer surprising. I rebuilt my PhD-era gyrokinetics code in JAX with Claude in 30 days last year (Building a Gyrokinetics Code Without Reading a Single Line). What I actually wanted to test was the next step: can AI do the unglamorous, undergraduate-level research motions that turn a working tool into a publishable result — running simulations on a cluster, iterating on the configs when the physics looks off, generating diagnostic plots, catching numerical bugs, and drafting the paper?

Two months later, the answer is yes. The paper is merged (repo, currently with Alex Schekochihin for review). It's a small but clean physics result: across a 50× scan in collisionality, the phase-space dissipation rate stays flat to under 1%. It is the kinetic analogue of Onsager's classical 1949 dissipative anomaly, demonstrated directly in nonlinear KRMHD simulation.

In this article I write about what those two months actually looked like — what Claude did, where I stepped in, and the broader claim I want to make: computational physics researchers today have undergraduate-level research assistants on tap, and using them well is a systems problem more than a model problem.

The Setup

In plasma turbulence, energy cascades from large scales down to small scales until it dissipates. The dissipation rate ε_ν depends on collisionality ν. Onsager's 1949 result for incompressible fluids (the dissipative anomaly, sometimes called Onsager's conjecture) says ε_ν becomes independent of viscosity in the inviscid limit — the cascade self-organises to maintain constant flux. The kinetic analogue of this statement has been argued for in the literature (Schekochihin 2016, Adkins 2018, Eyink 2018, Nastac 2024) but hadn't been directly demonstrated in nonlinear KRMHD simulation. That was the question: can we see the ν-independent plateau?

The reason to pick this question was pragmatic. The setup is tractable: all you need is one diagnostic per run. If the simulations behaved, the result would be unambiguous, and checking its validity would be easy.
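To make "one diagnostic per run" concrete, here is a minimal sketch of a plateau check. It assumes flatness is measured as (max - min) / mean of the per-run dissipation rates; the paper may define "flat to under 1%" differently. The function names are hypothetical and the numbers are placeholders, not results from the paper.

```python
def relative_spread(values):
    """Return (max - min) / mean for a sequence of positive numbers."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


def is_plateau(eps_by_nu, tolerance=0.01):
    """True if the dissipation rates vary by less than `tolerance` (fractional)."""
    return relative_spread(list(eps_by_nu.values())) < tolerance


# Placeholder values keyed by collisionality nu. NOT data from the paper.
eps_by_nu = {1.0: 1.000, 5.0: 1.003, 10.0: 0.998, 25.0: 1.002, 50.0: 0.999}
flat = is_plateau(eps_by_nu)  # spread here is about 0.5%, inside the 1% tolerance
```

A check this small is what makes the scope easy to validate: one number per run goes in, and a yes or no comes out.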

Figure 1: The collisional dissipation rate ε_ν stays flat to under 1% across a 50× scan in collisionality ν. This plateau is the central result of the paper.

Dissipative anomaly plateau: epsilon_nu vs nu

Figure 2: What a healthy run looks like (ν=3). Left: the time-averaged perpendicular spectrum E(k_⊥) shows a clean inertial-range cascade. Right: the Hermite-moment spectrum W(m,t) as a heatmap — energy stays confined to low m, with no pileup at the m=128 truncation. This is the "physics makes sense" check I was doing on every run.

Base-state spectrum and W(m,t) heatmap for the nu=3 run

The Workflow: Claude Does the Undergrad Work

For two months, almost every day, the rhythm looked like this:

I state the next physics goal in a sentence or two.
Claude proposes a config — YAML parameters, a Modal runner, sometimes a new diagnostic.
I push back on the physics where it looks wrong, or say "go for it."
Claude submits the run to an A100 cloud GPU.
Run finishes, Claude pulls the data and generates plots.
I look at the plots and decide whether the physics makes sense.
If yes, Claude drafts the paper section. If no, we iterate.
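The config step in the loop above can be sketched as a small scan generator. This is an illustration only. The dictionary keys and function names are hypothetical and do not reflect the actual YAML schema. The values M=128, Λ=√5 and scheme="imex_rk222" are the ones mentioned elsewhere in this post. The six log-spaced ν values between 1 and 50 are an assumption: the post states only that the scan spans 50× and that the main scan had six runs.

```python
import math


def log_spaced(start, stop, count):
    """Return `count` values from start to stop, evenly spaced in log."""
    ratio = (stop / start) ** (1.0 / (count - 1))
    return [start * ratio**i for i in range(count)]


def make_scan_configs(nu_values, M=128, Lambda=math.sqrt(5.0), scheme="imex_rk222"):
    """Build one config dict per collisionality value (hypothetical keys)."""
    return [
        {"run_name": f"nu_{nu:.3g}", "nu": nu, "M": M, "Lambda": Lambda, "scheme": scheme}
        for nu in nu_values
    ]


# Assumed scan endpoints and spacing; only the 50x range and run count come from the post.
nu_values = log_spaced(1.0, 50.0, 6)
configs = make_scan_configs(nu_values)
```

Keeping the scan as generated data rather than six hand-edited files means that a correction such as the Λ value only has to be made in one place.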

The bulk of the labour — writing the Modal runner, parsing diagnostic outputs, fitting power laws, generating figures, writing LaTeX, managing the BibTeX — was Claude. I never touched the cloud orchestration code. I never wrote a matplotlib script. I never edited the LaTeX preamble. I looked at outputs and either nodded or asked for something different.
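Of the tasks listed above, the power-law fit is the easiest to show in a few lines. The sketch below is a generic least-squares fit in log-log space, not the project's actual analysis code. The function name is hypothetical, and the synthetic spectrum uses a -5/3 exponent purely to check that the fit recovers a known answer.

```python
import math


def fit_power_law(k, E):
    """Least-squares fit of E = A * k**alpha in log-log space; returns (A, alpha)."""
    xs = [math.log(v) for v in k]
    ys = [math.log(v) for v in E]
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    A = math.exp(ybar - alpha * xbar)
    return A, alpha


# Synthetic spectrum with a known exponent, used only as a self-check.
k = [2.0, 4.0, 8.0, 16.0, 32.0]
E = [3.0 * kk ** (-5.0 / 3.0) for kk in k]
A, alpha = fit_power_law(k, E)
```

On real spectra the fit range matters more than the fit itself, so the inertial range should be chosen by looking at the plot rather than by fitting every available wavenumber.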

As things progressed, we got into a rhythm. A normal day involved me sending 5-10 short messages, mostly along the lines of:

"go for it"
"this looks great! Let's get it merged."
"ok submit the M=256 run too"

This is the texture of the collaboration most of the time: short, direct, advisory messages, the kind you would send a capable grad student.

Where I Stepped In

The interesting moments are the five places where I had to nudge. None of them were heroic. Each was the kind of intervention an advisor makes when a student is going down the wrong path.

Nudge 1: Lambda

Two weeks in, the Hermite cascade wasn't cascading. The setup was right by every code-level check, but energy was just sitting in the lowest moment. I asked: isn't there supposed to be a coupling Λ between g₀ and g₁? The Alfvénic checkpoint we'd restarted from had Λ=1; for the Hermite problem at β_i=1, it should be √5. One physical constant. The cascade lit up on the next run.

The AI executes correctly inside the frame it is given. The frame — including the specific physical values that distinguish your problem from the adjacent one — is yours to set.

Nudge 2: Three Weeks Chasing a Numerical Ghost

This was the longest detour, and the one I keep thinking about. Every nonlinear run at high ν blew up. The blowup times scaled with ν cleanly — ν=1 died at 80 τ_A after the restart, ν=3 at 122 τ_A, ν=5 at 167 τ_A — and the scaling matched what the canonical "pileup at the Hermite truncation" failure mode predicts. The literature warns about it. I went deep into that hypothesis.
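One way to quantify "scaled with ν cleanly" is a straight-line fit to the three blowup times quoted above. The linear form is an assumption made for this sketch; the post does not say which functional form the pileup picture predicts. With three points and two parameters, a small residual is weak evidence at best.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, ybar - slope * xbar


# Blowup times after the restart, in units of tau_A, as quoted in the text.
nu = [1.0, 3.0, 5.0]
t_blowup = [80.0, 122.0, 167.0]

slope, intercept = linear_fit(nu, t_blowup)
residuals = [t - (slope * x + intercept) for x, t in zip(nu, t_blowup)]
```

The fit gives a slope of 21.75 τ_A per unit ν with residuals of at most 1 τ_A. A trend this tidy says nothing about its cause, which is how it could sit comfortably inside the wrong hypothesis.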

For three weeks Claude helped me investigate inside that frame: adjusting hyper-collisional dissipation, varying M, looking at high-m energy budgets. The frame was wrong, but it was an internally consistent wrong frame, which is what made it so hard. Every diagnostic could be interpreted as consistent with the pileup story if you squinted.

What broke it was finally asking Claude to extract the three diagnostics the pileup story actually predicts: the energy at the truncation moment, the parallel-wavenumber spectrum at the highest m, and how localised the blowup was across m. The pileup story predicted a clear signal in all three, and the data showed none of them. The energy at the truncation sat at the noise floor, the k_z spectrum at high m was down at 10⁻¹⁴, and the blowup happened simultaneously across every moment, independent of ν. That is not a physical cascade failing; that is the time integrator.
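Two of the three falsifying diagnostics can be sketched on a W(m,t) array stored as a list of rows, one row per time step. The function names, the onset threshold and the synthetic data are assumptions for illustration, not the project's code or data. The k_z spectrum check is omitted because it needs the full field data.

```python
def truncation_fraction(row):
    """Fraction of the total energy that sits in the last retained moment."""
    return row[-1] / sum(row)


def onset_indices(W, factor=10.0):
    """For each moment m, the first time index where W[t][m] exceeds factor * W[0][m]."""
    onsets = []
    for m in range(len(W[0])):
        base = W[0][m]
        onset = next((t for t, row in enumerate(W) if row[m] > factor * base), None)
        onsets.append(onset)
    return onsets


# Synthetic example: steady for three steps, then every moment grows at once.
base_row = [1.0, 0.1, 0.01, 1e-6]
W = [list(base_row) for _ in range(3)]
W.append([100.0 * w for w in base_row])
W.append([1.0e4 * w for w in base_row])

frac_before = truncation_fraction(W[2])   # energy share at the truncation before blowup
onsets = onset_indices(W)                 # blowup onset per moment
simultaneous = len(set(onsets)) == 1      # True if all moments blow up together
```

A pileup would show up as a large `frac_before` and an onset that starts at the highest m and works downwards. A negligible truncation fraction with identical onsets across m points away from the cascade and towards the time stepping.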

Figure 3: What the wrong story looked like. W(m,t) heatmaps for the runs that blew up, with the divergence in ΣW(m), W(m=M), and ε_ν(t) on the right. It reads as a physical pileup at the truncation, but it was actually a hidden CFL leak in the Lawson-RK4 integrator. The wrong frame produced fingerprints that looked entirely physical.

Hermite blowup W(m,t) heatmaps for the Lawson-RK4 runs

I reframed for Claude: this isn't a physics bug, it's a code-path bug. Look at the Lawson-RK4 integrator. Within an afternoon Claude had identified a hidden CFL leak in the integrator. A fix shipped as GANDALF #138 the next day.

The honest version is that the AI did not catch this. It spent three weeks helping me debug inside a confidently wrong hypothesis. What broke the frame was me coming back to first principles and asking "what would the data look like if I were wrong?" before "what would the data look like if I were right?"

This is the hardest discipline to maintain in a long project. The frame is yours to set, and questioning it is yours too.

Nudge 3: Dropping the Π(m) Panel

Late in the project Claude proposed a second panel for Figure 1 showing the constant-flux Hermite cascade, Π(m) versus m. The diagnostic worked and the figure was real, but the ν=1 curve had a clear numerical artifact left over from low-ν pileup. Claude was happy to keep the panel with explanatory caveats. My message at the time was: "hmmm the flux plot is not great honestly. i think we should just remove it." The figure that survived was cleaner, and the headline result stronger.

The instinct to publish less rather than over-claim is a human one. The AI will defend a marginal figure indefinitely if you let it.

Nudge 4: Checking the Novelty Claim

The draft asserted that no prior numerical work had cleanly demonstrated a ν-independent ε_ν. I have been out of active physics for over a decade, so I am not current on the literature and couldn't vouch for that claim. What I could do was flag it. I asked Claude whether we actually had a citation for the sentence.

Claude pulled the relevant prior work, including Nastac 2024 — titled "Phase-space entropy cascade and dissipative anomaly," recent, directly adjacent to our claim, and a paper I had never seen. We turned the bare assertion into a short prior-work paragraph and cited it properly.

This nudge cuts the other way from the rest. Here the AI knew the literature better than I did. My contribution was the editorial reflex of not asserting novelty I hadn't checked; the retrieval was Claude's. The lesson is to interrogate your own strong claims and let the AI do the literature work it is genuinely good at.

Nudge 5: The Convergence Study

This is the least dramatic of the five, included because it is the routine mode. Late in the project I said: "we should do the M convergence as a part of this study." Claude wrote the Modal runner, submitted two new sims (M=64 and M=256 at ν=3), did the analysis, and wrote the new subsection. The whole thing took two days, with almost no input from me beyond approving the configs. Most of the collaboration looks like this. The dramatic episodes are easy to write about, but the long calm stretches where the work just gets done are the actual story.
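The analysis step of a convergence study like this one reduces to comparing the headline diagnostic across resolutions. Here is a minimal sketch, assuming the quantity compared is the dissipation rate at fixed ν. The function name is hypothetical and the values are placeholders, not the paper's convergence table.

```python
def successive_relative_changes(values_by_M):
    """Relative change of a diagnostic between consecutive resolutions M."""
    Ms = sorted(values_by_M)
    return {
        (lo, hi): abs(values_by_M[hi] - values_by_M[lo]) / abs(values_by_M[hi])
        for lo, hi in zip(Ms, Ms[1:])
    }


# Placeholder dissipation rates at M = 64, 128, 256. NOT data from the paper.
eps_by_M = {64: 1.010, 128: 1.002, 256: 1.000}
changes = successive_relative_changes(eps_by_M)
converging = changes[(128, 256)] < changes[(64, 128)]
```

If the change between the two highest resolutions is smaller than the previous one and below the tolerance claimed for the headline result, the truncation is not what sets the answer.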

Memory: The New Piece of Infrastructure

For the GANDALF sprint, persistent memory didn't matter. It was thirty days in one rhythm, and the whole project fit in active context.

For this project it did matter. At sixty days you can't hold everything in working memory, because sessions interrupt each other and knowledge from Tuesday gets lost by Thursday unless you write it down. The auto-memory system in Claude Code — files that persist across sessions — turned out to be the most important piece of infrastructure for the long arc.

There are six memory files for this project right now. Two of them, verbatim:

Lambda parameter physics Lambda=1 kills Hermite cascade; use √5 for standard β_i=1

M=128 Hermite — resolved by GANDALF v0.5.0 IMEX Lawson blowup was numerical; pin scheme="imex_rk222"; ν=3 acceptance passed

These are corrections I won't have to relearn three weeks later. The "I already explained this to you" cost is the silent productivity killer in any long-running AI collaboration, and persistent memory is what prevents it.
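The underlying idea, one short file per hard-won correction, is easy to reproduce outside any particular tool. The sketch below is not the Claude Code memory format, which this post does not describe. It is a generic illustration with hypothetical function names and an assumed one-Markdown-file-per-note layout.

```python
import re
from pathlib import Path


def slugify(title):
    """Turn a note title into a lowercase, hyphenated file stem."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def save_note(directory, title, body):
    """Write one correction as a small Markdown file and return its path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{slugify(title)}.md"
    path.write_text(f"# {title}\n\n{body}\n", encoding="utf-8")
    return path


def load_notes(directory):
    """Read every note back, sorted by file name, for pasting into a new session."""
    return [p.read_text(encoding="utf-8") for p in sorted(Path(directory).glob("*.md"))]
```

Calling `save_note("memory", "Lambda parameter physics", "Lambda=1 kills Hermite cascade; use √5 for standard β_i=1")` would write `memory/lambda-parameter-physics.md`. The useful property is that each note is small enough to read in full at the start of every session.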

It's a Systems Problem

The most important thing I learned from this project is that using AI effectively at the two-month scale is not a model problem but a systems problem.

The model is a given. What mattered was the system around the model: persistent memory, the choice of a tractable physics question, Modal as the cloud orchestration layer, a clean cross-repo handoff between GANDALF (the upstream library, where bugs got filed) and krmhd-research (the science repo, where the paper lives), a paper repo with curated BibTeX, and the ability to look at a generated plot and decide in seconds.

The autonomy gradient framing I wrote about a few months ago — about choosing where on the spectrum from "I drive every step" to "AI runs autonomously" — already feels dated. The model capability has moved faster than the framework was ready for. The real question now isn't "how much autonomy do I give it?" but "what's the system I need around it so that the autonomy is productive?"

By "system" I mean the set of practical questions around the model. Where do corrections persist? Where do artifacts live? What is the cross-tool rhythm? Where do you intervene without breaking flow? What is the smallest unit of trustable output — a plot, a config, a paragraph? And how do you make decision-points cheap enough that you actually make them?

I don't have clean answers. What I have is the working version of one specific instance: a six-memory-file, two-repo, Modal-backed, advisor-mode rhythm that produced a paper. The recommendations below are what I've extracted from it.

What This Enables: Independent Computational Research

The category I keep thinking about is "recovering physicists" — people who left active research years ago but kept the training and the taste. The cost of returning is now genuinely low. A side project, a few hours a week, can produce real work.

Computational physics researchers today, in my reading, have undergraduate-level research assistants on tap. They are not as good as a great grad student, but they are far better than no help at all: reliable, never tired, comfortable at 2am, and never frustrated when you ask them to redo a figure for the fifth time.

This isn't an "intelligence explosion" claim. The model isn't replacing the PI. What it's replacing is the bottleneck of needing collaborators and students just to make a tractable problem possible to attack. A senior researcher who can recognise good physics and bad physics now has access to a pool of execution labour they didn't have before.

I am over a decade out of an active research role, and I am running a real physics investigation as a side project. That would not have been possible 18 months ago.

Recommendations for Long-Running AI-Assisted Research

These are opinionated, and some will age badly given how fast the model and the tooling are moving.

Choose a scope you can validate at a glance. Pick a question where the headline diagnostic is a number or a single curve. If you can't tell instantly whether the result is right, neither can the AI.
Build persistent memory aggressively. Every hard-won correction goes in, whether it is a specific physical value, an algorithmic gotcha, or something the AI got wrong twice. Re-learning is the dominant hidden cost.
For any result you can't independently verify, ask "what would the data look like if I were wrong?" before "what would the data look like if I were right?" The Lawson-RK4 misdiagnosis cost three weeks because I never ran the falsifying diagnostic first.
Default to dropping marginal results. The AI will defend a borderline figure indefinitely. The discipline to publish less is yours.
Interrogate your own novelty claims. Don't let "no one has done this before" stand on faith, especially if you're not current on the literature. Ask the AI to find the prior work — retrieval is something it does well, particularly if you've had it build up a reading list you can point it back to.
Run experiments on git branches and tell Claude to commit relentlessly. I leaned on git heavily. A branch lets the AI try a parameter change or a refactor in a sandbox without touching the version that works, and frequent commits turn a bad run into a one-command rollback rather than a reconstruction job. When the AI is generating most of the code and configs, cheap rollback is what lets you give it room to run.
Build the cross-tool rhythm explicitly. If your work spans multiple repos or services (an upstream library + your project + a cloud compute provider + a paper repo), be explicit about the handoffs. File upstream issues. Plan for context-switching cost.
Treat the AI like a smart undergrad. Validate through physics outputs, not code review. Don't read the code; look at the plot. If the plot looks right and the diagnostics check out, the code is probably right. If the plot looks wrong, no amount of code reading will tell you why.
Question your framing every 1-2 weeks. If you've been working inside a hypothesis for two weeks without pushback, force a first-principles review. The AI won't supply that question.

The Numbers

Calendar time: ~9 weeks, including the extended detour
Production runs: 9 (6 in the main ν-scan, 3 in the M-convergence)
Modal GPU-hours: ~52 (mostly A100, mostly overnight)
Failed Lawson-RK4 runs along the way: ~12 (the three-week wrong frame cost compute too)
Memory files created: 6
Paper: 8 pages, 11 references, 3 figures, 1 results table, 1 convergence table

I don't have a clean dollar figure for Claude API usage on this project — the work spanned Claude Code sessions across multiple repos, various model tiers, and some agentic runs. Honest estimate: small compared to the GPU compute, which itself is small compared to a postdoc-month.

What's Next

The original experiment plan listed multi-ion turbulent heating as the actual physics target. The dissipative anomaly was a stepping-stone — a question I picked precisely because I could tell at a glance whether the answer was right. The next thing is harder. The physics is less clean, the literature is more contested, and the diagnostics don't reduce to a single number.

The thing I most want to find out is whether the rhythm scales — whether the system holds when the AI doesn't have a clear template from existing literature, when "what's the right plot?" is itself a research question, and when the answer isn't a flat line but something with structure I'll have to interpret. I'll know in a few months.

The intelligence-on-tap claim looks more real than it did a year ago. What I now think is the real constraint is the speed at which you can validate what the AI produces. The AI does the undergraduate work cleanly, and the bottleneck is the advisor looking at the outputs and deciding whether they are right.

Acknowledgements

Alex Schekochihin for reviewing the draft and continuing to point me at the right physics. The Anthropic team for the tools.

2026/05/19
in fde
9 min read

The FDE Fork: Platform or Outcomes

In If LEGO Had Forward Deployed Engineers, I ended with a wrinkle I promised to write up properly: AI keeps handing FDEs new bricks, the line between "forward deployed engineer" and "software engineer" is blurring, and at some point you have to ask whether you even need to productise the dragon at all.

This is that piece. It's now a real choice. There are two coherent ways to run a forward deployed company, AI made the second one viable, and a founder who hasn't consciously picked one is going to build a confused org with a confused FDE role. The nature of the FDE role isn't a fixed thing you can look up. It's downstream of a strategic decision most founders don't realise they're making.

Engineers building messy tools, a polished platform emerging behind them

Foundry was an accident

Start with where the role comes from, because the origin explains everything that follows.

Nobody at Palantir sat in a room and decided to build Foundry. Here's how I described it on a call last year:

The way Foundry as a product actually happened is very interesting. No one said, "Oh, let's build Foundry." It was literally forward deployed engineers working with customers, almost a consulting shop. And the difference between an FDE and a consultant is the alignment: we get paid to solve problems, not to spend hours solving them. Given they were engineers, what they would do is build tools to bootstrap themselves — mainly for the data integration piece. And some customers noticed this and said, "If you just license the tools, we'll pay licence fees for them." That's how Foundry happened. A couple of FDEs went away and said, "We're going to take a few months and build Foundry."

So the product and the FDE motion were entangled from day one. The FDEs weren't there to deliver Foundry. They were there to solve customer problems, and Foundry fell out of the residue — the tools they kept rebuilding to make themselves faster.

That's worth holding onto, because it means the FDE role was never defined by the platform. The platform was defined by the FDEs.

The 2015–2020 thesis: forward deployment in service of a platform

For most of the decade that followed, though, the relationship ran the other way. Once Foundry existed, the FDE motion had a job: feed the platform.

This is the role I described with the LEGO dragon. The customer wants a dragon, there's no brick for the curve of its neck, so the FDE drills holes and glues bricks and builds a Frankenstein scaffold and then builds the dragon. The deliverable to the customer is the dragon. The deliverable to your own company is the bag of weird bricks — the custom hacks you walk back to the product team so they can decide what to manufacture. That loop, customer problem → bespoke build → product signal → real product, was the whole game. The FDE was a product R&D function dressed up as a delivery function.

And the thing that made that motion sustainable — that justified years of low-margin, labour-heavy services work — was the ambition behind it. Palantir wasn't running a consultancy that happened to write software. It was building the operating system for the world's largest and most important institutions, and the services were the cost of discovering the shape of that operating system. You can't find the shape of a platform from a conference room. You find it by embedding engineers in twenty messy customers and seeing which weird bricks keep showing up.

You can see the platform doing its job in the staffing numbers. Early on, a single use case took something like three to five FDEs. By a few years in, the ratio had inverted — one FDE could carry two or three customers, because the platform had absorbed enough of the weird bricks that each new deployment needed less hand-building. The role was, by design, bending towards its own obsolescence. Every brick you productised was a brick the next FDE didn't have to drill.

That's the tell. In the platform model, a healthy FDE org is one that slowly needs fewer FDEs per dollar. The role is a scaffold. The building is the product.

A LEGO tower with its scaffolding being removed

What AI changes

Now the wrinkle.

The reason the platform model made sense wasn't just ambition, it was economics. Services scale linearly with headcount and carry consultancy margins. The only way to escape that gravity was to productise — to convert hand-built dragons into licensable bricks so that, eventually, the customer's own engineers build dragons on top of your platform while you collect licence fees. Productisation was the only exit from the margin trap.

AI weakens that constraint. When a single engineer with Claude Code, Skills, MCP servers, and a stack of internal agents can build a credible dragon in an afternoon, the cost of bespoke delivery collapses. And once bespoke delivery is cheap enough, you can run a services business at margins that used to require a product.

One engineer building a dragon, helped by automated brick machines

This is the "AI-powered services" thesis that Sequoia and YC have been talking about — services companies with software economics. The mechanism is exactly the brick library getting powerful enough that you no longer need to manufacture and sell bricks to make the unit economics work. You just keep the brick-shaping machine internal, point your FDEs at it, and sell the dragons.

So a second model becomes available, and notice it's not new: it's the original model with the economics fixed. Palantir started as "essentially a consulting shop." The reason it couldn't stay one was margins. AI is, in effect, an offer to remove that reason.

The fork

Which gives founders a genuine fork. Two coherent strategies, and you have to pick.

A LEGO road forking towards a platform on one side and a delivery workshop on the other

Path A — Platform. You productise. The FDE is a product scout. The compounding asset is the product, and you sell licences. The bet is that the weird bricks generalise — that the abstraction you extract from twenty customers is good enough that the twenty-first buys the platform instead of the service. You also have to be able to survive the valley: the years of unprofitable services before the licence revenue compounds. This is the Palantir-to-Foundry path, and it works when the abstraction is real and the market is large enough to be worth the wait.

Path B — Outcomes. You don't productise — or rather, you productise internally and only internally. You build the brick-shaping machine, the agents, the deployment tooling, and you never sell any of it. The FDE is a delivery superpower wielding private tooling no competitor can buy. The compounding asset is that internal toolchain plus the accumulated muscle memory of having deployed into a hundred messy environments. You sell outcomes, priced as outcomes. The bet is that AI keeps your margins healthy enough that you never need the licence-fee exit at all.

The honest tension between them is the one I raised in the LEGO piece. I argued there that walking the weird bricks back to product is "the step that makes it engineering" — skip it and you're just a very expensive consultancy in a t-shirt. Path B looks, from the Path A vantage point, exactly like collapsing that tension and building Accenture.

But I don't think that's quite right anymore. On Path B you still walk the weird bricks back; you just walk them back to your own internal platform team instead of to a product you'll sell. The loop still runs. It's just that the flywheel is your internal capability, and it compounds without ever being packaged, priced, documented, or supported for an external buyer. Whether that's a worse flywheel or a better one is genuinely open. It's worse because you forgo licence-fee leverage and the discipline that selling a product imposes. It's better because you skip the brutal productisation tax (the years spent making a thing general, supportable, and sellable) and you keep your best tricks proprietary.

That's the fork, two different theories of where the compounding asset lives: in a product you sell, or in a capability you hoard.

What the fork does to the FDE role

Here's why this matters for the role specifically, and not just the cap table.

On Path A, the FDE is a product scout, and that has hard consequences. The incentives have to live at the company level — revenue per forward-deployed person across the whole company, never revenue per engagement. Measure engagements and your FDEs quietly become account managers: they optimise for charging more for each dragon and stop bringing back the bricks. And the role bends towards obsolescence, on purpose. The honest thing I'll say here: I left Palantir partly because I couldn't find an FDE role I'd still enjoy. The platform had matured enough that the discovery work — the actual reason I liked the job — had thinned out. That's not a failure of the model. That's the model working. On Path A, the role is supposed to eat itself.

On Path B, the opposite. There's a line I keep coming back to: when you don't have a product, the FDE is the product. On Path A that's a phase — true in the early days, less true every year. On Path B it's the steady state. The FDE never becomes a scaffold for something else, because there is nothing else; the forward deployed engineer, augmented by internal tooling, is the entire company. The role doesn't sunset. The risk is different and real: without the discipline of an external product to feed, FDEs can drift into pure delivery, and "we shape bricks internally" decays into "we don't shape bricks, we just bill." Path B without a strong internal platform culture really does become Accenture-with-better-margins.

This, by the way, is why "FDE" has become such a confused title. It's become a big fat umbrella — solutions engineer, solutions architect, the Palantir thing, all crammed under one acronym — and people complain it means different things to different people. It does. But a lot of that confusion isn't sloppy language. It's that the companies using the title haven't decided which fork they're on. A Path A FDE and a Path B FDE genuinely are different jobs, with different incentives, different career arcs, and different definitions of success. Of course the word means different things. The companies do.

The choice founders have to make

So the instruction is simple, even if the decision is hard: pick.

The Palantir FDE motion was sustainable because the ambition carried it — the operating system for the world's largest institutions was a vision big enough to justify a decade of unprofitable services. If you're running Path B, you can't borrow that vision, because you're explicitly choosing not to build the sellable operating system. You need your own sustaining story and your own scoreboard: outcomes delivered, margin per FDE, the rate at which your internal tooling makes the next deployment cheaper. Those are different KPIs than "licence revenue" and they reward different behaviour.

What you cannot do is stay ambiguous. An org that hires Path A product scouts, measures them on Path B engagement outcomes, and tells investors a platform story while running a services business will tear its FDE role apart. The FDEs will feel the contradiction first — they always do — and the best ones will leave, because the role they were sold isn't the role the incentives are paying for.

The FDE role was never one fixed thing. It's a function of strategy. In 2015 the strategy was "find the shape of the platform," and the role was a product scout. The strategy could now just as legitimately be "sell outcomes forever, keep the machine internal," and then the role is a permanent, AI-amplified delivery superpower. Both are real companies. Both can be great companies.

AI is what made the second one viable. It didn't make the choice for you. Pick on purpose.

2026/05/10
in fde
5 min read

If LEGO Had Forward Deployed Engineers

Forward Deployed Engineer is having a moment. Anthropic and OpenAI are pouring billions into the model. Lots of articles are getting written about it. Founders are spinning up "FDE" titles before they've really worked out what the role does. And in nearly every conversation I have, whether with founders, recruiters, or curious engineers, the same question comes up: what actually makes an FDE different from a really good consultant?

Here's my attempt at explaining the role with a thought experiment.

Messy Lego

The customer wants a dragon

Imagine LEGO decides to spin up a Forward Deployed Engineering org tomorrow. The first customer walks in and says: I want a dragon.

You can solve that request two ways.

The solutions engineer path. A solutions engineer at LEGO has a beautifully organised inventory of every brick the company makes. They read the dragon brief, pick the right 287 pieces, write a clean instruction booklet, maybe even build the model themselves, and hand it over. The customer looks at it and goes, "Hmm — kind of looks like a parrot, but yeah, I can see the dragon. Thanks." Successful delivery. On to the next request.

The forward deployed engineer path. An FDE starts the same way, picking through the existing brick library — but they get stuck. There's no brick that gives them the right curve for the dragon's neck. The wing pieces don't articulate the way they need to. So they grab a brick, drill a hole through it, glue two together, sand a third one flat. They build a Frankenstein scaffolding of custom-modified pieces, and then they build the dragon.

Solutions Engineer vs FDE

The customer looks at the FDE's dragon and says, "Actually, I wanted a Lord of the Rings dragon, not a Zog from Julia Donaldson." Fine. The FDE iterates. More custom hacks. Eventually they hand over a dragon the customer actually loves.

The end state for the two paths looks identical: both delivered a dragon. But the FDE has one more step, and that step is the entire point of the role.

The step that makes it engineering

The FDE walks back to the LEGO brick R&D team holding a bag of weird, hacked-together bricks and says: "These are the pieces I had to invent to build that dragon. We don't make any of them. Should we?"

The product team looks at those custom bricks and decides what to do. Maybe they manufacture one of them as a new SKU. Maybe they don't manufacture any specific brick, but the patterns suggest they should build a machine that lets customers shape their own bricks. Maybe they conclude the dragon use case itself isn't worth investing in, but the technique unlocked something else entirely.

That loop — customer problem → bespoke build → product signal → real product — is the whole game. Without it, you are simply a very expensive consultancy in a t-shirt.

This is where the "E" in FDE earns its keep. A forward deployed engineer has to be technical enough to actually drill the hole, glue the bricks, build the thing. Otherwise the signal that comes back is mush. "The customer wanted a dragon and we couldn't build one" is useless. "The customer wanted a dragon, and the only way I could approximate it was by violating the structural integrity of these four standard bricks in this very specific way" is a product roadmap.

Weird Brick

Misaligned incentives, by design

The dirty secret of running a healthy FDE org is that the incentives between FDEs and product teams are deliberately misaligned.

The FDE wakes up every morning thinking: how do I win this customer? What do I have to violate, hack, or hand-build to ship the dragon they want? The product team wakes up every morning thinking: what's the general abstraction that lets us serve a thousand customers without hand-building anything?

Those two incentives pull against each other constantly, and the tension is genuinely uncomfortable. It's also where good product comes from. Collapse the tension by making everyone think like a product manager, and you stop getting customer signal. Collapse it the other way, by making everyone think like a delivery engineer, and you build Accenture. The job of leadership is to hold the rope taut.

One important corollary: FDE KPIs have to live at the company level, not the engagement level. The moment you start measuring revenue per engagement, your FDEs stop being product scouts and start being account managers. They optimise for charging more for the dragon. They stop bringing back the weird bricks. The healthier metric is something like revenue per forward-deployed person, company-wide — because the win isn't the FDE working harder on each engagement, it's the FDE identifying which bricks are worth productising, so that eventually customers' own engineers end up building dragons on top of those bricks while you collect licence fees. That's the flywheel. Engagement-level metrics short-circuit it.
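To make the difference between the two scoreboards concrete, here is a small Python sketch with invented numbers. The `Engagement` record, its fields, and every figure are hypothetical; the only thing taken from the argument above is the contrast between revenue per engagement and revenue per forward-deployed person, company-wide.

```python
from dataclasses import dataclass


@dataclass
class Engagement:
    customer: str
    services_revenue: float  # what the customer paid for the bespoke build
    fde_count: int           # FDEs staffed on the engagement


def revenue_per_engagement(engagements):
    """Engagement-level metric: rewards charging more for each dragon."""
    return sum(e.services_revenue for e in engagements) / len(engagements)


def revenue_per_fde(engagements, licence_revenue):
    """Company-level metric: services plus licence revenue from productised
    bricks, divided by every forward-deployed person."""
    services = sum(e.services_revenue for e in engagements)
    total_fdes = sum(e.fde_count for e in engagements)
    return (services + licence_revenue) / total_fdes


# Hypothetical numbers, for illustration only.
engagements = [
    Engagement("castle", 400_000, 2),
    Engagement("dragon", 600_000, 3),
]

print(revenue_per_engagement(engagements))      # same before and after productising
print(revenue_per_fde(engagements, 0))          # no bricks productised yet
print(revenue_per_fde(engagements, 1_500_000))  # licence fees start to arrive
```

In this toy setup the engagement-level number does not move when licence revenue appears, which is the point: it cannot see the flywheel.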

So what is an FDE?

If you take only one thing away: a Forward Deployed Engineer is a product R&D function dressed up as a delivery function. The deliverable to the customer is a dragon. The deliverable to your own company is the bag of weird bricks you had to invent to build it.

If no one is walking back to product with that bag of bricks, you don't have an FDE org. You have a really good services team. Both are valuable, both can be lucrative — but they're different jobs, and conflating them is how organisations end up confused about why their "FDEs" don't seem to be moving the product forward.

There's a wrinkle worth flagging, which I'll write up properly another time: AI keeps handing you new bricks. Claude Code, Skills, MCP servers, agent frameworks — the brick library itself is now powerful enough that a single engineer can build a credible dragon in an afternoon. As the bricks get more capable, the line between "forward deployed engineer" and "software engineer" starts to blur, and you have to ask whether you even need to productise the dragon at all, or whether the right move is to just keep delivering bespoke ones forever. That's a whole other topic.

For now, the LEGO test is enough: drill the brick, build the dragon, bring back the brick. That's the job.

2026/02/27
in fde
5 min read

What Claude Code's Creator Validated About Forward Deployed Engineering

Last week, Boris Cherny appeared on Lenny's Podcast to talk about Claude Code — the AI coding assistant that now generates 4% of all GitHub commits and helped Anthropic achieve a 200% productivity increase.

Every major insight Boris described — from building for future capability rather than current constraints to finding product-market fit through "latent demand" — perfectly captured what we'd been doing as Forward Deployed Engineers at Palantir for over a decade. It was like listening to someone independently discover gravity.

For seven years at Palantir, I lived the FDE model: small, elite teams embedded directly with customers, building systems ahead of their current capability, discovering requirements by watching how people actually worked rather than what they said they needed. Boris validated that this approach isn't legacy — it's the future of how we'll build AI products.

Building Six Months Ahead

"Build for the model 6 months from now, not today," Boris said. The Claude Code team deliberately avoided over-scaffolding. They gave the model tools and goals and got out of the way, betting on rapid capability improvements rather than constraining the system to current limitations.

This hit me because it's exactly how FDEs approach customer deployments. We don't build systems for where an organization is today; we build for where they'll be in six months. When I was deploying Palantir at large enterprises, the worst mistake was to over-constrain the system to current workflows. The org would evolve, their data would grow, their processes would mature, and suddenly the system we'd carefully tailored to their "requirements" became a straitjacket.

The best FDE deployments gave customers slightly more capability than they could immediately use. We'd build data pipelines that could handle 10x their current volume. We'd create analysis workflows that assumed they'd eventually want to ask more sophisticated questions. We'd design user interfaces that didn't require retraining when their team grew from 5 to 50 people.

This wasn't over-engineering — it was under-constraining. Just like Claude Code bet on the model getting smarter, we bet on the customer getting more sophisticated.

Latent Demand as Product Compass

One of Boris's revelations was that Claude Code's breakthroughs came from watching how people used it in creative ways. Data scientists running SQL queries in terminal windows. Non-technical users asking it to help grow tomatoes. People recovering corrupted wedding photos. This "latent demand" led to Cowork, built in just 10 days because they could see exactly what users were trying to do.

"Latent demand" is just Boris's term for what FDEs do every single day: sit with users and watch how they actually work.

The best features I ever built came from observing workarounds. An analyst who'd manually copy-paste data between systems every morning because the official integration was too rigid. An operations team that kept a separate spreadsheet to track what the official dashboard couldn't show them. An investigator who'd screenshot charts to paste into Word documents because the export function didn't capture what they needed to communicate.

These weren't feature requests — they were organizational antibodies. Users working around the system to get their real job done. Standard product management would survey these users and ask what features they wanted. They'd probably say "better export" or "more integration options." But that's not what they actually needed.

FDEs watch the workflow, not the words. We see the person taking screenshots and realize they don't need better export — they need a way to tell stories with data. We see the manual copy-pasting and realize the issue isn't integration — it's that the official process doesn't match how the work actually flows.

The most successful FDE engagements came from finding latent demand the customer couldn't articulate themselves. Not because they were dumb, but because they were so close to their daily work they couldn't see the pattern.

Ideas Over Engineering Capacity

"Coding is largely solved," Boris said. "The bottleneck is ideas and prioritisation." In the AI era, the new scarcity isn't engineering capacity — it's knowing what to build.

This validates everything FDEs were designed to solve. We were never primarily about coding. Yes, we could write software — often quite quickly — but our real value was understanding the problem deeply enough to know what to build.

The best FDE I worked with at Palantir wasn't the best coder on the team. They were the person who could sit in a room with a counter-terrorism unit and figure out what actually mattered. Who could distinguish between what the organization said it needed and what would actually make their mission successful. Who could see patterns across different deployments and recognize when a specific customer's problem was actually a general case of something we'd solved before.

This person would often write less code than junior engineers on the team. But they'd save months of work by solving the right problem the first time.

Now that AI can generate most code, this skill becomes even more critical. If Claude can write your functions, your value is knowing which functions need to exist.

The Death of "Software Engineer"

"The title 'software engineer' is dying," Boris observed. "Builder is the new reality." As AI democratizes coding, roles blur between engineering, product, design, and deployment.

FDEs were always "builders." Our job description was impossible to write because we did everything: product research, technical architecture, user interface design, data engineering, deployment operations, user training, and ongoing support. We didn't fit neatly into any org chart because the role was defined by outcome, not function.

Traditional software engineering was about implementing specifications. Someone else figured out what to build; engineers built it. But FDE work was always end-to-end ownership. We figured out what to build, built it, deployed it, and lived with the consequences.

This frustrated a lot of people who wanted clean role boundaries. But it was extraordinarily effective for solving complex, novel problems where the solution space wasn't well-defined.

The industry is catching up to what Palantir figured out years ago: when you're building something truly new, you need people who can think across the entire stack — technical, product, and operational. "Builder" is a better word than "engineer" because it captures the full scope.

What This Means for AI Companies

If you're building AI products today, the FDE model isn't legacy — it's your playbook.

Stop organizing around functional silos. Find people who can think across the whole problem, give them access to the best tools, and embed them directly with customers. Build slightly ahead of current capability rather than constraining to current workflows. Watch how people actually use your product, especially the ways they "misuse" it.

Most importantly: understand that your bottleneck isn't engineering capacity anymore. It's product judgment. It's knowing what to build. It's finding the latent demand that customers can't articulate themselves.

The companies that figure this out first will eat everyone else's lunch. Not because they have better AI models, but because they'll build the right things.

Boris Cherny just validated fifteen years of FDE practice. The future belongs to builders who can think end-to-end, work embedded with customers, and see patterns others miss.

The tooling has changed. The principles haven't.

2026/02/22
in hiring
7 min read

Interviewing in the age of AI

Interviews have always been a bad proxy. You get maybe an hour with someone and you're supposed to figure out whether they'll be effective in a role that plays out over months and years. You can't replicate real working conditions—the codebase they'd actually work in, the team dynamics, the ambiguity of real problems. So you construct artificial scenarios and hope the signal transfers.

That fundamental challenge hasn't changed. However, AI has made the gaps in our proxies impossible to ignore.

The signal problem

The core question in any interview is: can this person actually do the job? Everything else—the whiteboard problems, the take-homes, the system design rounds—is just scaffolding to get at that question indirectly.

But with AI, it's now possible to offload much of the thinking and problem-solving itself, making the assessment even harder.

When someone submits a clean take-home with sensible architecture and thorough tests, one used to be able to assume a baseline of understanding behind it. This is no longer true. Not because the code is bad—it's often excellent. But GitHub Copilot, Claude, and ChatGPT have converged on identical patterns. A few years ago, messy but functional code suggested a real engineer working under pressure. Now, too-perfect code could be the tell, but penalising clean code is obviously absurd.

At the same time, banning AI isn't the answer. I want engineers using AI. It's the most significant productivity tool to hit software engineering in decades, and anyone not using it is leaving value on the table. The question I'm actually trying to answer in an interview is "can you think with AI, or are you just deferring to it?"

Old formats, honest limitations

These interview formats didn't suddenly break. They always had limitations as proxies for real work. AI just made those limitations undeniable.

Long take-home exercises were always a noisy signal. A four-hour project tells you someone can deliver polished work with unlimited resources and no time pressure—which is rarely what the actual job looks like. AI turned the noise up to eleven: now the output mostly tells me the candidate has access to coding tools. Table stakes.

LeetCode-style problems were always testing a narrow skill—pattern recognition and algorithmic recall—that correlates weakly with day-to-day engineering. AI happens to be exceptionally good at exactly this narrow skill, so now I can't even get the weak signal I used to.

Anything with a "correct answer" has this problem. The clearer the specification, the easier it is for AI to solve. Which is ironic—we used to think clear specs made for fair interviews, which they did. They also made for easy prompts.

I'm not saying these formats are worthless. But the signal they produce has shifted from "can this person solve problems?" to something murkier. And rather than trying to salvage them, I'd ask: what formats actually test the thing I care about?

What I'm looking for now

The formats I've been experimenting with share a common thread: they test whether someone can think, not whether they can produce output. AI is great at producing output, but thinking, judging, and validating are still human work.

Shorter exercises + longer conversations

Instead of a four-hour take-home, a thirty-minute exercise followed by forty-five minutes of discussion. The code is a starting point, not the deliverable.

Why did you structure it this way? What would you change if the requirements shifted to X? Where would this break at scale? What's the ugliest part of this code?

AI can generate code. It can't explain the tradeoffs you considered and rejected. It can't tell me about the moment you started down one path, realised it was wrong, and backed out. And I also just ask directly: how did you use AI? That question alone is surprisingly revealing. Someone who used AI well can articulate what they delegated, what they modified, and what they rejected. Someone who deferred to it entirely tends to get vague.

Those conversations reveal thinking—including whether someone used AI effectively as a tool versus blindly accepting its first suggestion.

Live investigation instead of live coding

Writing code from scratch under interview pressure was always a weird skill to test. It didn't map well to real work even before AI.

Investigation is different. I give candidates a system that's misbehaving—not a syntax error, something behavioural. A race condition. A caching issue. A misunderstood API contract. And yes, they can use whatever tools they want, including AI.

What I'm watching isn't whether they can find the bug. Claude Code can find bugs. What I'm watching is everything around the finding: how they scope the problem, what questions they ask before touching the code, which hypotheses they form first, what they choose to validate versus take on faith.

The person directing the investigation matters more than the investigation itself. A strong engineer will use AI to speed up the search but still decide where to search. They'll sanity-check the AI's suggestion against their own understanding rather than blindly applying a fix. They'll know when the tool is confidently wrong.

Someone who's genuinely thinking will say things like "that can't be the issue because X" or "let me verify this assumption first." Someone who's outsourced the thinking will paste the error into a chat window and accept whatever comes back.

Watching someone use AI well during an interview is actually one of the strongest positive signals I've found. When a candidate uses AI to quickly test a hypothesis, then critically evaluates the result and adjusts course — that's exactly the workflow I want to see on the job.

System design with real constraints

AI is great at generating architecture diagrams and textbook answers. It's less great at navigating the messy reality of your specific situation.

When I ask about system design, I focus on constraints. What if we need to support 10x the traffic? What if the team is two people? What if we need to ship in four weeks? What if this has to run in air-gapped environments?

Good engineers make different choices in different contexts. They can explain why this context changes the answer. Someone who's genuinely thinking—whether or not they used AI to explore options—will navigate these pivots fluidly. Someone who's outsourced the thinking will flounder when the constraints shift.

Roleplay scenarios

This is the most experimental—but possibly the most promising.

FDE and customer-facing engineering roles need skills that are fundamentally about human judgment: real-time conversation, reading the room, managing frustrated stakeholders, diagnosing problems under pressure.

I've started using roleplay scenarios where I play a customer with a problem, and the candidate has to figure out what's actually wrong—not what I say is wrong.

Concrete example: The Broken Dashboard

Here's the shape of an interview I've been running lately.

I play a frustrated stakeholder—a senior executive at a large organisation. Something is wrong with a dashboard that feeds into a critical business process. The numbers don't match what another team is reporting. There's time pressure. The candidate's job is to help me figure out what's going on.

The scenario is designed so that the obvious explanation ("the dashboard is broken, fix the code") is wrong. The real root causes are subtler—the kind of thing you'd only uncover by asking careful questions about data sources, definitions, and upstream processes. There's no bug to patch. The discrepancy is fully explainable, but only if you resist the urge to jump to conclusions.

I won't give away more than that—I plan to continue using this interview.

What this tests

Problem diagnosis under pressure. Does the candidate immediately promise to "fix" the thing, or do they slow down and figure out what's actually happening?

Customer communication. Can they manage a frustrated stakeholder while still asking clarifying questions? Do they resist the urge to commit to solutions before understanding the problem?

Data literacy. Do they think to ask about how the numbers are generated? Or do they assume the system is broken because that's what the customer said?

Ownership. When they figure out the root cause, do they offer a path forward—both for the immediate crisis and the longer-term fix?

Why this works

This isn't a prompt you can give to ChatGPT. There's no code to generate. The "answer" emerges through conversation—through noticing that the stakeholder doesn't actually understand how the system works, through asking the right diagnostic questions, through realising that different teams might be measuring different things.

It tests the thing that actually matters in deployment roles: can you figure out what's really going on, communicate clearly under pressure, and move towards a resolution? Those skills don't change regardless of what tools you're using—and they're the hardest to fake.

What I'm still figuring out

I won't pretend I've cracked this. Some open questions:

Consistency. Roleplay scenarios are harder to evaluate objectively than coding tests. Different interviewers might reach different conclusions about the same conversation.

Fairness. These approaches favour candidates who are comfortable thinking out loud, explaining their reasoning, and engaging in back-and-forth. That might disadvantage candidates who are brilliant but less verbally fluent.

Scalability. A forty-five-minute investigation exercise with live observation doesn't scale like a take-home. You need more interviewers, more coordination.

AI will keep improving. Maybe next year there's an AI that can roleplay its way through a customer scenario. Maybe debugging exercises become as compromised as LeetCode. I expect to keep iterating on this—the target is always moving.

The goal hasn't changed

I want to hire people who can do the job. The job now includes using AI effectively—so I'm not designing interviews to exclude AI. Instead, I'm designing interviews that reveal whether someone can think, whether or not they have AI in the room.

The best engineers I work with use AI constantly. They also know when to override it, when to dig deeper, when the AI's confident answer is confidently wrong. That judgment is what I'm interviewing for.

I think interviews are heading somewhere interesting. The formats that survive will be the ones that test what AI can't fake: genuine understanding, real-time judgment, and the ability to navigate ambiguity with another human. The interviews of five years from now will look less like exams and more like working sessions — because that's what they should have been all along.

2026/02/11
in ai
4 min read

Setting Up Jeeves: What Actually Made My AI Assistant Useful

I've been running a persistent AI assistant for a few weeks now. Named it Jeeves. Here's what I learned getting it to be genuinely useful rather than just a novelty.

The Baseline

I'm using OpenClaw — an open-source framework for persistent AI agents. Out of the box, it gives you a workspace with markdown files for memory, personality, and proactive task scheduling. The agent reads these files each session, so it has continuity. I won't explain the whole architecture here (their docs do that), but the key point is: the foundation is just text files. What you put in them is what makes it work.

GitHub as External Memory

My assistant's workspace lives at ~/.openclaw/workspace. That's fine for running locally, but I wanted:

Version control — if something breaks, I can roll back
Access from anywhere — especially my phone when I'm out

Solution: Jeeves syncs its workspace to a private GitHub repo daily. The morning heartbeat includes:

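# Assumes private-notes is a local clone of the private repo
cp -R ~/.openclaw/workspace/. private-notes/jeeves/
cd private-notes && git add -A && git commit -m "Daily Jeeves sync" && git push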

Now when I'm at a conference and want to check my assistant's notes on someone I'm about to meet, I just open GitHub on my phone.

Meeting Transcripts with Granola

Most of my work meetings happen over Zoom or Google Meet. I use Granola to capture transcripts — it runs locally and records what's said without needing bot attendees.

The challenge: Granola stores everything locally on my laptop. Jeeves runs separately and needs access to that context.

I built two tools to bridge this:

granola-py-client: A Python client for the Granola API. Uses async httpx, Pydantic validation, and can authenticate using the local Granola app's token.
granola-archiver: Automated system that polls for new transcripts, formats them as markdown with YAML frontmatter, and commits them to a GitHub repo organised by date (YYYY/MM/YYYY-MM-DD-title.md). Runs on a schedule via launchd.
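The date-organised layout is easy to sketch. What follows is not the archiver's actual code, just a minimal Python illustration of the YYYY/MM/YYYY-MM-DD-title.md convention and a frontmatter header. The function names, the slug rules, and the two frontmatter fields (`title`, `date`) are assumptions made for the example.

```python
import re
from datetime import date


def slugify(title: str) -> str:
    """Lowercase the title and collapse anything non-alphanumeric into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def archive_path(meeting_date: date, title: str) -> str:
    """Build a repo-relative path in the YYYY/MM/YYYY-MM-DD-title.md layout."""
    return f"{meeting_date:%Y/%m}/{meeting_date:%Y-%m-%d}-{slugify(title)}.md"


def render_markdown(meeting_date: date, title: str, transcript: str) -> str:
    """Prefix the transcript with a minimal YAML frontmatter block."""
    # Escape backslashes and quotes so the title stays a valid quoted YAML string.
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "---\n"
        f'title: "{safe_title}"\n'
        f"date: {meeting_date.isoformat()}\n"
        "---\n\n"
        f"{transcript.strip()}\n"
    )


# Hypothetical meeting, for illustration only.
print(archive_path(date(2026, 2, 3), "Pricing sync: Q1"))
print(render_markdown(date(2026, 2, 3), "Pricing sync: Q1", "Alice: hello\n"))
```

Keeping the path builder a pure function of date and title means re-running the archiver over the same meeting lands on the same file, so git shows a diff rather than a duplicate.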

Now I have months of meeting transcripts in a git repo. When I need context from a past conversation, Jeeves can grep through them in seconds. "What did we discuss with <redacted client> about pricing?" Answered in a moment.

Integrations That Matter

The assistant is only as useful as what it can touch. Here's what actually gets used daily:

Email + Calendar (via gog CLI):
- Morning briefing pulls calendar and flags urgent unread emails
- School emails get parsed automatically → calendar events created with both me and my wife as attendees
- I have notifications off; Jeeves is my notification layer

GitHub Issues as Daily Planner:
- Each day gets an issue in a daily-planner repo
- Jeeves drafts tomorrow's plan based on calendar and pending items
- I check things off throughout the day

Telegram:
- Primary interface: I message Jeeves like a person
- It can reach out proactively (heartbeat system)
- When I'm at an event and need a quick lookup on someone, I just ask

Time Tracking:
- After client meetings, Jeeves prompts me to log time
- Points me to the right spreadsheet for each client

A note on access: For anything sensitive, Jeeves has read-only access. It can check my email and calendar. This is intentional.

None of these integrations are complex. They're just the right hooks into how I already work.

What We Changed After a Few Weeks

After running Jeeves for a bit, I did a retro. Some adjustments:

Tone calibration: Early on, it was too enthusiastic. Lots of "Great question!" and affirmation. I updated the personality file to be drier, more direct. I want a sparring partner, not a cheerleader. When I told it "be more like the Wodehousian Jeeves," it briefly started calling me "sir" — had to walk that back.

Proactive but not annoying: The heartbeat system can easily become spam. Key principle: only surface things that need attention. If nothing's urgent, stay quiet. Late night? Stay quiet. I'm clearly busy? Stay quiet. The goal is an assistant that helps when needed and disappears otherwise.
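Those rules boil down to a small gate in front of every proactive message. Here is a minimal Python sketch of that idea. It is not OpenClaw's heartbeat API; the function names, the 22:00 to 07:00 quiet window, and the `user_busy` flag are all assumptions made for illustration.

```python
from datetime import datetime, time

QUIET_START = time(22, 0)  # assumed start of "late night"
QUIET_END = time(7, 0)     # assumed end of quiet hours


def in_quiet_hours(now: datetime) -> bool:
    """True when the clock time falls inside the overnight quiet window."""
    t = now.time()
    return t >= QUIET_START or t < QUIET_END


def should_notify(now: datetime, urgent_items: list, user_busy: bool) -> bool:
    """Speak up only when something needs attention and the moment is right."""
    if not urgent_items:
        return False  # nothing urgent: stay quiet
    if in_quiet_hours(now):
        return False  # late night: stay quiet
    if user_busy:
        return False  # clearly busy: stay quiet
    return True


# Hypothetical heartbeat tick at 09:00 with one urgent email.
print(should_notify(datetime(2026, 2, 11, 9, 0), ["school email"], user_busy=False))
```

The default is silence: every branch except the last returns False, so a new rule added later can only make the assistant quieter, never noisier.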

School calendar automation: Initially I was manually adding school events. Now any email from the school gets parsed, events extracted, and both parents added to the calendar invite automatically. Small thing, but it removed recurring friction.

Memory hygiene: Daily notes accumulate fast. I added a periodic task for Jeeves to review recent daily files and distill anything important into long-term memory. Raw notes are a journal; curated memory is what matters for continuity.

Explicit boundaries: Jeeves has access to a lot (email, calendar, files). But in group contexts, it shouldn't speak as me or share my private context freely. I added explicit rules: monitor but don't respond without checking with me first. Ask before sending anything external.

What's Next

Things I'm still figuring out:

Multi-account support: I have separate Google accounts for personal and work. Currently only personal is connected.
Voice interface: For quick capture when I'm walking or driving.
Better handoff to sub-agents: Some tasks should spawn a background worker and report back. The plumbing exists but I'm not using it much yet.

The Actual Value

The surprising thing isn't any single capability. It's the compound effect of continuity.

Jeeves knows my kids' school schedule, my client pricing, the context of conversations I had months ago, and what I'm trying to get done today. It doesn't ask me to re-explain things. It builds on what it already knows.

That's the difference between a chatbot and an assistant.

Jeeves runs on OpenClaw + Claude. Named after the original gentleman's gentleman, though at its request, I've stopped calling it "sir."