
What Claude Code's Creator Validated About Forward Deployed Engineering

Last week, Boris Cherny appeared on Lenny's Podcast to talk about Claude Code — the AI coding assistant that now generates 4% of all GitHub commits and helped Anthropic achieve a 200% productivity increase.

Every major insight Boris described — from building for future capability rather than current constraints to finding product-market fit through "latent demand" — perfectly captured what we'd been doing as Forward Deployed Engineers at Palantir for over a decade. It was like listening to someone independently discover gravity.

For seven years at Palantir, I lived the FDE model: small, elite teams embedded directly with customers, building systems ahead of their current capability, discovering requirements by watching how people actually worked rather than what they said they needed. Boris validated that this approach isn't legacy — it's the future of how we'll build AI products.

Building Six Months Ahead

"Build for the model 6 months from now, not today," Boris said. The Claude Code team deliberately avoided over-scaffolding. They gave the model tools and goals and got out of the way, betting on rapid capability improvements rather than constraining the system to current limitations.

This hit me because it's exactly how FDEs approach customer deployments. We don't build systems for where an organization is today — we build for where they'll be in six months. When I was deploying Palantir at large enterprises, the worst mistake was to over-constrain the system to current workflows. The org would evolve, their data would grow, their processes would mature, and suddenly the system we'd carefully tailored to their "requirements" became a straitjacket.

The best FDE deployments gave customers slightly more capability than they could immediately use. We'd build data pipelines that could handle 10x their current volume. We'd create analysis workflows that assumed they'd eventually want to ask more sophisticated questions. We'd design user interfaces that didn't require retraining when their team grew from 5 to 50 people.

This wasn't over-engineering — it was under-constraining. Just like Claude Code bet on the model getting smarter, we bet on the customer getting more sophisticated.

Latent Demand as Product Compass

One of Boris's revelations was that Claude Code's breakthroughs came from watching how people used it in creative ways. Data scientists running SQL queries in terminal windows. Non-technical users asking it to help grow tomatoes. People recovering corrupted wedding photos. This "latent demand" led to Cowork, built in just 10 days because they could see exactly what users were trying to do.

"Latent demand" is just Boris's term for what FDEs do every single day: sit with users and watch how they actually work.

The best features I ever built came from observing workarounds. An analyst who'd manually copy-paste data between systems every morning because the official integration was too rigid. An operations team that kept a separate spreadsheet to track what the official dashboard couldn't show them. An investigator who'd screenshot charts to paste into Word documents because the export function didn't capture what they needed to communicate.

These weren't feature requests — they were organizational antibodies. Users working around the system to get their real job done. Standard product management would survey these users and ask what features they wanted. They'd probably say "better export" or "more integration options." But that's not what they actually needed.

FDEs watch the workflow, not the words. We see the person taking screenshots and realize they don't need better export — they need a way to tell stories with data. We see the manual copy-pasting and realize the issue isn't integration — it's that the official process doesn't match how the work actually flows.

The most successful FDE engagements came from finding latent demand the customer couldn't articulate themselves. Not because they were dumb, but because they were so close to their daily work they couldn't see the pattern.

Ideas Over Engineering Capacity

"Coding is largely solved," Boris said. "The bottleneck is ideas and prioritisation." In the AI era, the new scarcity isn't engineering capacity — it's knowing what to build.

This validates everything FDEs were designed to solve. We were never primarily about coding. Yes, we could write software — often quite quickly — but our real value was understanding the problem deeply enough to know what to build.

The best FDE I worked with at Palantir wasn't the best coder on the team. They were the person who could sit in a room with a counter-terrorism unit and figure out what actually mattered. Who could distinguish between what the organization said it needed and what would actually make their mission successful. Who could see patterns across different deployments and recognize when a specific customer's problem was actually a general case of something we'd solved before.

This person would often write less code than junior engineers on the team. But they'd save months of work by solving the right problem the first time.

Now that AI can generate most code, this skill becomes even more critical. If Claude can write your functions, your value is knowing which functions need to exist.

The Death of "Software Engineer"

"The title 'software engineer' is dying," Boris observed. "Builder is the new reality." As AI democratizes coding, roles blur between engineering, product, design, and deployment.

FDEs were always "builders." Our job description was impossible to write because we did everything: product research, technical architecture, user interface design, data engineering, deployment operations, user training, and ongoing support. We didn't fit neatly into any org chart because the role was defined by outcome, not function.

Traditional software engineering was about implementing specifications. Someone else figured out what to build; engineers built it. But FDE work was always end-to-end ownership. We figured out what to build, built it, deployed it, and lived with the consequences.

This frustrated a lot of people who wanted clean role boundaries. But it was extraordinarily effective for solving complex, novel problems where the solution space wasn't well-defined.

The industry is catching up to what Palantir figured out years ago: when you're building something truly new, you need people who can think across the entire stack — technical, product, and operational. "Builder" is a better word than "engineer" because it captures the full scope.

What This Means for AI Companies

If you're building AI products today, the FDE model isn't legacy — it's your playbook.

Stop organizing around functional silos. Find people who can think across the whole problem, give them access to the best tools, and embed them directly with customers. Build slightly ahead of current capability rather than constraining to current workflows. Watch how people actually use your product, especially the ways they "misuse" it.

Most importantly: understand that your bottleneck isn't engineering capacity anymore. It's product judgment. It's knowing what to build. It's finding the latent demand that customers can't articulate themselves.

The companies that figure this out first will eat everyone else's lunch. Not because they have better AI models, but because they'll build the right things.

Boris Cherny just validated fifteen years of FDE practice. The future belongs to builders who can think end-to-end, work embedded with customers, and see patterns others miss.

The tooling has changed. The principles haven't.

Interviewing in the age of AI

Interviews have always been a bad proxy. You get maybe an hour with someone and you're supposed to figure out whether they'll be effective in a role that plays out over months and years. You can't replicate real working conditions—the codebase they'd actually work in, the team dynamics, the ambiguity of real problems. So you construct artificial scenarios and hope the signal transfers.

That fundamental challenge hasn't changed. However, AI has made the gaps in our proxies impossible to ignore.

The signal problem

The core question in any interview is: can this person actually do the job? Everything else—the whiteboard problems, the take-homes, the system design rounds—is just scaffolding to get at that question indirectly.

But with AI, it's now possible to offload much of the thinking and problem-solving itself, making the assessment even harder.

When someone submits a clean take-home with sensible architecture and thorough tests, you could once assume a baseline of understanding behind it. That assumption no longer holds. Not because the code is bad—it's often excellent. But GitHub Copilot, Claude, and ChatGPT have converged on identical patterns. A few years ago, messy but functional code suggested a real engineer working under pressure. Now too-perfect code could be the tell, but penalising clean code is obviously absurd.

At the same time, banning AI isn't the answer. I want engineers using AI. It's the most significant productivity tool to hit software engineering in decades, and anyone not using it is leaving value on the table. The question I'm actually trying to answer in an interview is "can you think with AI, or are you just deferring to it?"

Old formats, honest limitations

These interview formats didn't suddenly break. They always had limitations as proxies for real work. AI just made those limitations undeniable.

Long take-home exercises were always a noisy signal. A four-hour project tells me someone can deliver polished work with unlimited resources and no time pressure—which is rarely what the actual job looks like. AI turned the noise up to eleven: now the output mostly tells me the candidate has access to coding tools. Table stakes.

LeetCode-style problems were always testing a narrow skill—pattern recognition and algorithmic recall—that correlates weakly with day-to-day engineering. AI happens to be exceptionally good at exactly this narrow skill, so now I can't even get the weak signal I used to.

Anything with a "correct answer" has this problem. The clearer the specification, the easier it is for AI to solve. Which is ironic—we used to think clear specs made for fair interviews, which they did. They also made for easy prompts.

I'm not saying these formats are worthless. But the signal they produce has shifted from "can this person solve problems?" to something murkier. And rather than trying to salvage them, I'd ask: what formats actually test the thing I care about?

What I'm looking for now

The formats I've been experimenting with share a common thread: they test whether someone can think, not whether they can produce output. AI is great at producing output, but thinking, judging and validating are still human work.

Shorter exercises + longer conversations

Instead of a four-hour take-home, a thirty-minute exercise followed by forty-five minutes of discussion. The code is a starting point, not the deliverable.

Why did you structure it this way? What would you change if the requirements shifted to X? Where would this break at scale? What's the ugliest part of this code?

AI can generate code. It can't explain the tradeoffs you considered and rejected. It can't tell me about the moment you started down one path, realised it was wrong, and backed out. And I also just ask directly: how did you use AI? That question alone is surprisingly revealing. Someone who used AI well can articulate what they delegated, what they modified, and what they rejected. Someone who deferred to it entirely tends to get vague.

Those conversations reveal thinking—including whether someone used AI effectively as a tool versus blindly accepting its first suggestion.

Live investigation instead of live coding

Writing code from scratch under interview pressure was always a weird skill to test. It didn't map well to real work even before AI.

Investigation is different. I give candidates a system that's misbehaving—not a syntax error, something behavioural. A race condition. A caching issue. A misunderstood API contract. And yes, they can use whatever tools they want, including AI.
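The shape of bug I mean is behavioural, not syntactic. Here's a minimal sketch of the kind of thing (illustrative only, not one of my actual exercises): a lost-update race where every line looks correct in isolation.

```python
import threading

class Counter:
    """A counter with a behavioural bug: no lock around the read-modify-write."""

    def __init__(self):
        self.value = 0

    def increment(self):
        # Looks atomic, but a thread switch between the read and the
        # write silently discards another thread's update.
        current = self.value
        self.value = current + 1

def run(counter, n):
    for _ in range(n):
        counter.increment()

counter = Counter()
threads = [threading.Thread(target=run, args=(counter, 50_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Expected 200000; lost updates can make the printed total come up short,
# and the shortfall varies run to run, which is what makes it a good exercise.
print(counter.value)
```

The interesting conversation isn't the fix (a `threading.Lock` around the increment); it's whether the candidate forms the hypothesis "non-deterministic shortfall, so probably interleaving" before reaching for a tool.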

What I'm watching isn't whether they can find the bug. Claude Code can find bugs. What I'm watching is everything around the finding: how they scope the problem, what questions they ask before touching the code, which hypotheses they form first, what they choose to validate versus take on faith.

The person directing the investigation matters more than the investigation itself. A strong engineer will use AI to speed up the search but still decide where to search. They'll sanity-check the AI's suggestion against their own understanding rather than blindly applying a fix. They'll know when the tool is confidently wrong.

Someone who's genuinely thinking will say things like "that can't be the issue because X" or "let me verify this assumption first." Someone who's outsourced the thinking will paste the error into a chat window and accept whatever comes back.

Watching someone use AI well during an interview is actually one of the strongest positive signals I've found. When a candidate uses AI to quickly test a hypothesis, then critically evaluates the result and adjusts course — that's exactly the workflow I want to see on the job.

System design with real constraints

AI is great at generating architecture diagrams and textbook answers. It's less great at navigating the messy reality of your specific situation.

When I ask about system design, I focus on constraints. What if we need to support 10x the traffic? What if the team is two people? What if we need to ship in four weeks? What if this has to run in air-gapped environments?

Good engineers make different choices in different contexts. They can explain why this context changes the answer. Someone who's genuinely thinking—whether or not they used AI to explore options—will navigate these pivots fluidly. Someone who's outsourced the thinking will flounder when the constraints shift.

Roleplay scenarios

This is the most experimental—but possibly the most promising.

FDE and customer-facing engineering roles need skills that are fundamentally about human judgment: real-time conversation, reading the room, managing frustrated stakeholders, diagnosing problems under pressure.

I've started using roleplay scenarios where I play a customer with a problem, and the candidate has to figure out what's actually wrong—not what I say is wrong.

Concrete example: The Broken Dashboard

Here's the shape of an interview I've been running lately.

I play a frustrated stakeholder—a senior executive at a large organisation. Something is wrong with a dashboard that feeds into a critical business process. The numbers don't match what another team is reporting. There's time pressure. The candidate's job is to help me figure out what's going on.

The scenario is designed so that the obvious explanation ("the dashboard is broken, fix the code") is wrong. The real root causes are subtler—the kind of thing you'd only uncover by asking careful questions about data sources, definitions, and upstream processes. There's no bug to patch. The discrepancy is fully explainable, but only if you resist the urge to jump to conclusions.

I won't give away more than that—I plan to continue using this interview.

What this tests

Problem diagnosis under pressure. Does the candidate immediately promise to "fix" the thing, or do they slow down and figure out what's actually happening?

Customer communication. Can they manage a frustrated stakeholder while still asking clarifying questions? Do they resist the urge to commit to solutions before understanding the problem?

Data literacy. Do they think to ask about how the numbers are generated? Or do they assume the system is broken because that's what the customer said?

Ownership. When they figure out the root cause, do they offer a path forward—both for the immediate crisis and the longer-term fix?

Why this works

This isn't a prompt you can give to ChatGPT. There's no code to generate. The "answer" emerges through conversation—through noticing that the stakeholder doesn't actually understand how the system works, through asking the right diagnostic questions, through realising that different teams might be measuring different things.

It tests the thing that actually matters in deployment roles: can you figure out what's really going on, communicate clearly under pressure, and move towards a resolution? Those skills don't change regardless of what tools you're using—and they're the hardest to fake.

What I'm still figuring out

I won't pretend I've cracked this. Some open questions:

Consistency. Roleplay scenarios are harder to evaluate objectively than coding tests. Different interviewers might reach different conclusions about the same conversation.

Fairness. These approaches favour candidates who are comfortable thinking out loud, explaining their reasoning, engaging in back-and-forth. That might disadvantage candidates who are brilliant but less verbally fluent.

Scalability. A forty-five-minute investigation exercise with live observation doesn't scale like a take-home. You need more interviewers, more coordination.

AI will keep improving. Maybe next year there's an AI that can roleplay its way through a customer scenario. Maybe debugging exercises become as compromised as LeetCode. I expect to keep iterating on this—the target is always moving.

The goal hasn't changed

I want to hire people who can do the job. The job now includes using AI effectively—so I'm not designing interviews to exclude AI. Instead, I'm designing interviews that reveal whether someone can think, whether or not they have AI in the room.

The best engineers I work with use AI constantly. They also know when to override it, when to dig deeper, when the AI's confident answer is confidently wrong. That judgment is what I'm interviewing for.

I think interviews are heading somewhere interesting. The formats that survive will be the ones that test what AI can't fake: genuine understanding, real-time judgment, and the ability to navigate ambiguity with another human. The interviews of five years from now will look less like exams and more like working sessions — because that's what they should have been all along.

Setting Up Jeeves: What Actually Made My AI Assistant Useful

I've been running a persistent AI assistant for a few weeks now. Named it Jeeves. Here's what I learned getting it to be genuinely useful rather than just a novelty.

The Baseline

I'm using OpenClaw — an open-source framework for persistent AI agents. Out of the box, it gives you a workspace with markdown files for memory, personality, and proactive task scheduling. The agent reads these files each session, so it has continuity. I won't explain the whole architecture here (their docs do that), but the key point is: the foundation is just text files. What you put in them is what makes it work.
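To make "just text files" concrete, a workspace looks something like this (the file names here are illustrative rather than OpenClaw's exact schema; their docs define the real layout):

```text
~/.openclaw/workspace/
├── MEMORY.md          # curated long-term memory, read every session
├── SOUL.md            # personality and tone instructions
├── HEARTBEAT.md       # rules for proactive check-ins
└── memory/
    └── YYYY-MM-DD.md  # raw daily notes, one file per day
```

Everything that follows in this post is really just about what goes into these files and what feeds them.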

GitHub as External Memory

My assistant's workspace lives at ~/.openclaw/workspace. That's fine for running locally, but I wanted:

  1. Version control — if something breaks, I can roll back
  2. Access from anywhere — especially my phone when I'm out

Solution: Jeeves syncs its workspace to a private GitHub repo daily. The morning heartbeat includes:

cp -R ~/.openclaw/workspace/. ~/private-notes/jeeves/
cd ~/private-notes && git add jeeves && git commit -m "jeeves: daily sync" && git push

Now when I'm at a conference and want to check my assistant's notes on someone I'm about to meet, I just open GitHub on my phone.

Meeting Transcripts with Granola

Most of my work meetings happen over Zoom or Google Meet. I use Granola to capture transcripts — it runs locally and records what's said without needing bot attendees.

The challenge: Granola stores everything locally on my laptop. Jeeves runs separately and needs access to that context.

I built two tools to bridge this:

  • granola-py-client: A Python client for the Granola API. Uses async httpx, Pydantic validation, and can authenticate using the local Granola app's token.

  • granola-archiver: Automated system that polls for new transcripts, formats them as markdown with YAML frontmatter, and commits them to a GitHub repo organised by date (YYYY/MM/YYYY-MM-DD-title.md). Runs on a schedule via launchd.
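The archiver's core is roughly this shape. A sketch only: the function names and frontmatter fields are mine for illustration, and the polling and Granola client calls are elided, but the date-based layout matches what the repo actually stores.

```python
from datetime import date
from pathlib import Path
import re
import subprocess

# Local clone of the private archive repo (path illustrative).
ARCHIVE = Path.home() / "meeting-archive"

def slugify(title: str) -> str:
    # "Pricing sync w/ ACME" -> "pricing-sync-w-acme"
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def archive_transcript(title: str, day: date, transcript: str) -> Path:
    """Write one transcript as markdown with YAML frontmatter, filed by date."""
    out = ARCHIVE / f"{day:%Y/%m}" / f"{day:%Y-%m-%d}-{slugify(title)}.md"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(
        f"---\ntitle: {title}\ndate: {day.isoformat()}\n---\n\n{transcript}\n"
    )
    return out

def commit_and_push() -> None:
    subprocess.run(["git", "-C", str(ARCHIVE), "add", "-A"], check=True)
    # `git commit` exits non-zero when there's nothing new, so no check=True here.
    subprocess.run(["git", "-C", str(ARCHIVE), "commit", "-m", "archive transcripts"])
    subprocess.run(["git", "-C", str(ARCHIVE), "push"], check=True)
```

The date-partitioned filenames are what make the later grep step fast and the git history readable.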

Now I have months of meeting transcripts in a git repo. When I need context from a past conversation, Jeeves can grep through them in seconds. "What did we discuss with <redacted client> about pricing?" — answered in a moment.

Integrations That Matter

The assistant is only as useful as what it can touch. Here's what actually gets used daily:

Email + Calendar (via gog CLI):

  • Morning briefing pulls calendar and flags urgent unread emails
  • School emails get parsed automatically → calendar events created with both me and my wife as attendees
  • I have notifications off; Jeeves is my notification layer

GitHub Issues as Daily Planner:

  • Each day gets an issue in a daily-planner repo
  • Jeeves drafts tomorrow's plan based on calendar and pending items
  • I check things off throughout the day

Telegram:

  • Primary interface — I message Jeeves like a person
  • It can reach out proactively (heartbeat system)
  • When I'm at an event and need a quick lookup on someone, I just ask

Time Tracking:

  • After client meetings, Jeeves prompts me to log time
  • Points me to the right spreadsheet for each client

A note on access: for anything sensitive, Jeeves has read-only access. It can check my email and calendar, but it can't send or delete anything. This is intentional.

None of these integrations are complex. They're just the right hooks into how I already work.

What We Changed After a Few Weeks

After running Jeeves for a bit, I did a retro. Some adjustments:

Tone calibration: Early on, it was too enthusiastic. Lots of "Great question!" and affirmation. I updated the personality file to be drier, more direct. I want a sparring partner, not a cheerleader. When I told it "be more like the Wodehousian Jeeves," it briefly started calling me "sir" — had to walk that back.

Proactive but not annoying: The heartbeat system can easily become spam. Key principle: only surface things that need attention. If nothing's urgent, stay quiet. Late night? Stay quiet. I'm clearly busy? Stay quiet. The goal is an assistant that helps when needed and disappears otherwise.
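Those rules live as a few plain lines in the heartbeat instructions. Paraphrased, mine amount to:

```markdown
## Heartbeat rules
- Only message me when something actually needs attention.
- Nothing urgent? End the heartbeat silently.
- Late at night, stay quiet unless it's an emergency.
- If the calendar says I'm in a meeting, hold non-urgent items for later.
```

Nothing clever, but writing the quiet cases down explicitly is what stopped the spam.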

School calendar automation: Initially I was manually adding school events. Now any email from the school gets parsed, events extracted, and both parents added to the calendar invite automatically. Small thing, but it removed recurring friction.

Memory hygiene: Daily notes accumulate fast. I added a periodic task for Jeeves to review recent daily files and distill anything important into long-term memory. Raw notes are a journal; curated memory is what matters for continuity.

Explicit boundaries: I have access to a lot through Jeeves — email, calendar, files. But in group contexts, it shouldn't speak as me or share my private context freely. I added explicit rules: monitor but don't respond without checking with me first. Ask before sending anything external.

What's Next

Things I'm still figuring out:

  • Multi-account support: I have separate Google accounts for personal and work. Currently only personal is connected.
  • Voice interface: For quick capture when I'm walking or driving.
  • Better handoff to sub-agents: Some tasks should spawn a background worker and report back. The plumbing exists but I'm not using it much yet.

The Actual Value

The surprising thing isn't any single capability. It's the compound effect of continuity.

Jeeves knows my kids' school schedule, my client pricing, the context of conversations I had months ago, and what I'm trying to get done today. It doesn't ask me to re-explain things. It builds on what it already knows.

That's the difference between a chatbot and an assistant.


Jeeves runs on OpenClaw + Claude. Named after the original gentleman's gentleman, though at its request, I've stopped calling it "sir."