Most companies overestimate their AI maturity. They have a few models in production and assume that means they have an ML capability. What they usually have is a handful of talented people holding a fragile process together by hand.

Google's MLOps maturity model uses three levels, numbered 0 to 2. It is a good starting point, but in practice the distance between a team running notebooks manually and a team with automated retraining and monitoring is too wide for three buckets. Our team uses five. Here is what each one looks like, and roughly where the companies we assess actually land.

Level 1: Manual and ad-hoc

Data scientists run experiments in notebooks on their own machines. Models get to production through manual handoffs, often a script someone runs when they remember to. There is no version control on data, no reproducibility, and when the person who built the model leaves, the knowledge leaves with them. This is more common than anyone admits.

Level 2: Repeatable but manual

The team has standardised some of the workflow. Models are version-controlled. There is a documented path to production, even if a human still walks each step. Retraining happens, but on a "when we notice a problem" basis rather than a schedule. Roughly 65 to 70 percent of the organisations we assess sit at Level 1 or Level 2.

Level 3: Automated pipelines

This is where real MLOps begins. Training and deployment run through automated pipelines. CI/CD exists for models, not just application code. The team can rebuild and redeploy a model reliably without heroics. About 20 percent of the companies we see are here. The jump from Level 2 to Level 3 is the one most worth making, and the one most companies stall on because they buy a platform before fixing the workflow.
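To make "CI/CD for models" concrete, here is a minimal sketch of the kind of promotion gate an automated pipeline might run before a retrained model replaces the one in production. It is an illustration only: the names EvalResult and should_promote, the single accuracy metric, and the 0.80 floor are all assumptions for the example, not a description of any particular platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Evaluation summary for one model version (hypothetical structure)."""
    name: str
    accuracy: float


def should_promote(candidate: EvalResult,
                   production: EvalResult,
                   floor: float = 0.80,
                   min_gain: float = 0.0) -> bool:
    """Return True only if the candidate clears an absolute quality floor
    and is at least `min_gain` better than the current production model.

    Both thresholds are illustrative assumptions; a real pipeline would
    pick metrics and limits that fit the use case.
    """
    if candidate.accuracy < floor:
        return False
    return candidate.accuracy - production.accuracy >= min_gain


if __name__ == "__main__":
    prod = EvalResult("model-v1", accuracy=0.88)
    cand = EvalResult("model-v2", accuracy=0.90)
    print("promote" if should_promote(cand, prod) else "keep current model")
```

The point of a gate like this is that the decision to deploy is written down and runs the same way every time, rather than living in someone's head.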

Level 4: Monitored and managed

The pipeline is automated and the production system is watched. Model performance, data drift, and prediction quality are monitored continuously, and degradation triggers alerts. The organisation knows when a model is quietly getting worse, which most organisations do not. Around 8 to 10 percent reach this.
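As one illustration of what a data drift check can look like, the sketch below computes the population stability index (PSI) between a reference sample of a feature and a recent production sample. PSI is just one common choice, swapped in here for the example. The function names, the ten equal-width bins, and the 0.2 alert threshold (a widely quoted rule of thumb) are assumptions, not a recommendation for any specific system.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """PSI between a reference sample and a recent sample of one feature.

    Bins are equal-width over the range of the reference sample; values
    outside that range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor keeps empty bins from producing log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_alert(psi, threshold=0.2):
    """True when drift exceeds the (assumed) alerting threshold."""
    return psi >= threshold


if __name__ == "__main__":
    reference = [i / 100 for i in range(100)]     # training-time values
    recent = [0.5 + i / 200 for i in range(100)]  # shifted production values
    psi = population_stability_index(reference, recent)
    print(f"PSI = {psi:.2f}, alert = {drift_alert(psi)}")
```

A check of this kind would typically run on a schedule per feature, with the alert routed to whoever owns the model.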

Level 5: Fully automated and governed

Retraining triggers automatically on drift or schedule. Governance, audit trails, and rollback are built into the pipeline rather than bolted on. Fewer than 3 percent of the organisations we assess operate here, and most do not need to. Level 5 is right for a bank running credit models at scale. It is overkill for a company with three models and five users.
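The "drift or schedule" trigger can be sketched in a few lines. The example below is a simplified assumption of how such a rule might be expressed: the function name should_retrain, the 0.2 drift threshold, and the 30-day maximum model age are hypothetical values chosen for illustration. Returning the reasons, rather than a bare yes or no, gives an audit trail something to record.

```python
from datetime import datetime, timedelta


def should_retrain(drift_score: float,
                   last_trained: datetime,
                   now: datetime,
                   drift_threshold: float = 0.2,
                   max_age: timedelta = timedelta(days=30)) -> list:
    """Return the reasons a retraining run should start.

    An empty list means no retraining is needed. Thresholds are
    illustrative assumptions, not recommended values.
    """
    reasons = []
    if drift_score >= drift_threshold:
        reasons.append("drift")
    if now - last_trained >= max_age:
        reasons.append("schedule")
    return reasons


if __name__ == "__main__":
    reasons = should_retrain(
        drift_score=0.35,
        last_trained=datetime(2024, 1, 1),
        now=datetime(2024, 1, 10),
    )
    print(reasons or "no retraining needed")
```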

Why the level matters more than the headcount

The reason we assess maturity at the start of a transformation engagement is simple. A company at Level 1 trying to act like it is at Level 4 will fail expensively. It will buy monitoring tools for pipelines that do not exist yet. The right next step is always one level up, not a leap to the top. Knowing which rung you are actually standing on is the difference between a roadmap that works and a budget that disappears.
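The "one rung at a time" logic can be expressed as a simple cumulative checklist. The sketch below is a toy illustration, not our assessment method: the capability names are hypothetical labels derived from the level descriptions above. The key property is that a capability only counts once every capability below it is in place, so buying monitoring without pipelines does not raise the level.

```python
from typing import Iterable

# Hypothetical capability labels, one per level above Level 1, in order.
CAPABILITIES = [
    (2, "versioned_models_and_documented_release"),
    (3, "automated_training_and_deployment"),
    (4, "continuous_monitoring_with_alerts"),
    (5, "automatic_retraining_and_governance"),
]


def assess_level(capabilities: Iterable[str]) -> int:
    """Return the highest level whose prerequisites are all met."""
    present = set(capabilities)
    level = 1
    for target, capability in CAPABILITIES:
        if capability not in present:
            break  # a gap here caps the level, whatever else exists above
        level = target
    return level


def next_target(capabilities: Iterable[str]) -> int:
    """The next sensible goal is one level up, capped at Level 5."""
    return min(assess_level(capabilities) + 1, 5)


if __name__ == "__main__":
    # Monitoring tools bought, but no versioning or pipelines yet.
    company = {"continuous_monitoring_with_alerts"}
    print(assess_level(company), next_target(company))
```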


If you are weighing an AI investment, acquisition, vendor selection, or training programme, our team is happy to start with a conversation about scope and approach.

The views and findings in this article are shared for general information only. They are high-level perspectives, not legal, financial, regulatory, or other professional advice, and should not be relied upon for any specific decision or circumstance. For guidance tailored to your situation, please consult a qualified adviser.