Nobody calls me to tell me their ML team is happy. They call me when a second engineer has quit in a quarter, when a senior lead gave notice, when an exit interview revealed something the VP did not know. After fifteen years of placing ML engineers who left their last job for reasons both good and bad, I can tell you the retention patterns are not mysterious. Most departures are predictable. Most are preventable. Most are cheaper to address at month nine than at month eighteen, when the engineer has already accepted a competing offer.
Why ML engineers leave (it is rarely just money)
In exit interviews and follow-up conversations with candidates, these are the reasons I hear most often for ML departures, in rough order of frequency:
- Frustration with compute or tooling that slows daily work
- Unclear or shifting success metrics
- A feeling that their work does not ship
- A career ladder that exists on paper but not in practice
- Compensation drift relative to market
- Management that listens but does not act
Money is on the list. It is not at the top. The engineer who tells their manager “I got a competing offer for twenty-five percent more” has usually been quietly unhappy about one of the first four reasons for months, and the competing offer is the excuse to act.
The compute and tooling frustrations that push engineers out
The day-to-day experience of an ML engineer is shaped by compute access, data pipelines, and development tooling. When those are frictionless, the engineer spends their time on problems. When they are broken, the engineer spends their time on workarounds. I have watched senior engineers give notice specifically over GPU queue times, quarter-long ticket backlogs to get basic dev access, and feature engineering pipelines that fail silently. The fix is rarely dramatic. It is usually a budget line and a product owner assigned to developer experience. Teams that ignore it pay for it in attrition.
Unclear success metrics: the silent retention killer
An ML engineer who does not know how their work will be judged is an ML engineer who is quietly updating their resume. Good teams define the model’s business metric, its evaluation set, and the bar for shipping before work starts. Struggling teams define the metric retroactively, in a review that surprises the engineer. Over the course of two years, this pattern compounds into a belief that the work does not matter, which becomes the reason the engineer leaves.
The research-vs-production tension and how to resolve it
ML engineers sit on a spectrum between research and production. Most organizations pull them toward whichever end is currently under-resourced, which creates whiplash. The retention fix is to be deliberate: assign each engineer a primary mode (research, applied, or platform) and a secondary mode, and hold those assignments stable across at least two review cycles. Engineers who understand which hat they are wearing on a given day do better work and stay longer than engineers who are asked to context-switch weekly.
Career ladders that exist in practice, not just on paper
“We have a dual-track career ladder” is one of the most common recruiting lines I hear, and one of the most commonly false. A real dual-track ladder has distinct promotion criteria for IC and management, actual promotions into senior IC roles (staff, principal, distinguished), and compensation at the top IC levels that matches or exceeds the equivalent management track. If your staff engineer makes less than an engineering manager at the equivalent level, your dual-track ladder is marketing, not a retention tool.
Compensation benchmarking and pre-emptive raises
Compensation drift is real, and ML compensation moves faster than most HR cycles. Teams that retain ML engineers at high rates do two things. They benchmark comp against the actual market (Levels.fyi cuts, recruiter data, peer-level information) at least twice a year. And they make pre-emptive raises: adjustments made before the engineer brings a competing offer to the table. A pre-emptive ten-thousand-dollar raise at month fifteen costs roughly a ninth of what you spend when a senior engineer leaves at month eighteen and replacing them runs ninety thousand dollars.
When a counter-offer is the right play (and when it is not)
An engineer who brings you a competing offer has usually already decided, in their head, to leave. Counter-offers work about thirty percent of the time in my data, and only when the underlying dissatisfaction was purely about compensation. If the engineer’s frustration was about tooling, metrics, management, or work they do not want to do, a counter-offer buys you six months of delayed departure and nothing else. When you do counter, counter to the specific problem: not just a fifteen-percent raise, but a fifteen-percent raise plus the tooling investment, plus the project reassignment, plus a written commitment to a promotion review in six months.
The stay interview every ML manager should be running
The single highest-ROI retention practice I recommend is a quarterly stay interview. Thirty minutes, one-on-one, with four questions:
- What has energized you in the last quarter?
- What has frustrated you?
- What would make you more likely to stay two more years?
- If a friend at another company reached out about a role, what would make you pick up the call?
The fourth question is the one that surfaces the real answer. Follow up on every stay interview with at least one visible action in the next sixty days, or the conversation becomes performative and the engineer stops giving you real answers.
Keeping your best engineers for the long haul
Retention is not a department. It is a management discipline built from dozens of small practices: compute that works, metrics that are clear, ladders that are real, compensation that tracks the market, managers who listen and then act. The teams that get this right spare themselves the catastrophic cost of replacing senior ML talent every eighteen months. The ones that do not will keep calling me when the exit interviews roll in. Thoughtful machine learning recruitment can bring great engineers through the door; only deliberate retention keeps them there.