If you’re a Data Scientist, this probably comes as no surprise: you have seen many projects that start with great promise and then fade away, or worse, hit production and are then rolled back due to poor performance and customer complaints. Some treat this as just “part of life” in ML teams. But have you ever wondered whether there’s a common denominator behind those failures?
87% of ML projects never make it to production (VB). This article from 2019 is cited in almost every MLOps startup pitch deck, and the number is well established in the ML discourse. To be totally honest, I have tried to trace this number back and figure out how it was derived, and I didn’t find any reliable source or research to support it. However, the number seems quite reasonable if you also count projects that were stopped at an early PoC stage. The more painful number is the share of projects that had already been committed to management or even to customers, and in which significant effort had already been invested, that were terminated before (or after) hitting production. In my previous post, “how to run due diligence for ML teams”, I gave a high-level overview of the ingredients of successful ML teams. Here, you can find some practical advice on how to build high-impact ML teams.
Companies invest millions in ML teams, employing top talent and paying accordingly, and the question hanging over this is: “is it worth it?” As an ML leader and enthusiast, I know it is. ML, when productionized correctly, provides value that other types of algorithms can’t, and may turn out to be a key differentiator for companies that embrace it. At the same time, I believe that we, as Data Scientists, should be aware of the pitfalls that prevent us from delivering. As I always tend to do, I divide the pitfalls into 3 main groups:
- ML team mindset
- Company readiness towards ML
- ML infrastructure & ops
In this blog post, I share my experience with these pitfalls (yes, I have encountered them all) and suggest practical ways to avoid them.
Pitfall #1 — ML team mindset
It is no secret that Data Scientists are often perceived as disconnected from the business. They sit in an ivory tower, discussing complex scientific ideas and developing things that no one understands or eventually uses. This is no exaggeration: I have seen many companies where this is exactly the case. The people in ML teams are highly qualified individuals, often with M.Sc. or Ph.D. degrees, but their mindset favors science over impact.
Here are some situations that exemplify the above:
- Given a business problem that requires text or tabular data classification, pushing for the latest and greatest Deep Learning model instead of first taking a simple baseline model all the way to production.
This great paper by Intel’s AI group (2021) shows that for tabular data, ensemble models such as XGBoost perform on par with deep learning models. This beautiful paper by IBM (2017) claims that for text classification tasks, linear models on top of Word2Vec representations perform similarly to deep learning solutions.
I am not arguing against Deep Learning. I’m just saying that starting with a baseline model that yields pretty good results is a better strategy, because it allows much quicker delivery.
- Planning a 1-year project without any intermediate deliverables to customers or production. This mindset is very academic (“don’t disturb me till I’m done”), but it is not suited to the pace of industry, where requirements may change along the way and there should be frequent interactions between the ML team and the customers, even during the development phase.
It depends on the project, of course, but we should think outside the box and break a long-term project down into smaller pieces that can each bring value in production.
A recent example I encountered is a 1-year project to build a complex classification model. We suggested adding a milestone: presenting the selected features in a dedicated dashboard. Admittedly, this does not carry the full value of the prediction itself, but releasing the interim results of the feature selection still delivers value to our customers. It also lets us practice delivering to production much sooner, build trust inside our organization, and gain our customers’ trust as well.
When I interview candidates for Data Science positions, I place great emphasis on understanding what is more important to them: impact or science. While it is important to me to work with Data Scientists who are familiar with state-of-the-art models and have a passion for science, their priority has to be impact first in order for the team to succeed.
Pitfall #2 — Company readiness towards ML
Here is a collection of tips. All of them are external to the ML team and relate to how the company as a whole perceives ML, but without them it will be hard or even impossible to succeed in delivering ML to production.
- Beware of moonshots: setting ambitious, even unrealistic, goals for the ML team is a very common mistake that companies make. As ML leaders, we should set expectations straight, and the company must acknowledge that the ML team cannot solve all possible problems at once. It is especially important not to commit to huge projects at the beginning.
Start small, deliver value gradually, and build trust.
- Hire an ML Product Manager: this should be your first hire. In my experience, a PM who is business-oriented on the one hand, and understands the flow and complexities of ML development on the other, is priceless. Owning the ML-based product, discovering the customers’ needs and the business value, and designing the nitty-gritty requirements for the ML project should all be done by a person who knows both sides of the ML coin well: the customer side and the Data Science side.
- Develop trusted customer relations: when working on ML projects, it is important to set the customers’ expectations regarding the outcome and to create a process for getting quick feedback from them. I recommend developing relationships with customers who are open to the ML idea and making them design partners. Design partners allow us to use their data during the development phase, discuss our interim results, and get their feedback, so that as the project evolves we can make sure we have product-market fit and are heading in the right direction.
- Work in squads: in software development, it is well established that scrum teams should contain all the roles needed to deliver a feature (e.g. backend, frontend, and QA automation), so that they can bring business value fast. Somehow, in many organizations, ML teams don’t work in a similar model: they are isolated and follow a waterfall approach, in which only after the model is ready is it handed to the engineering team, which translates it into scalable code. This process is counter-productive and contradicts modern software development practice. ML teams should operate in squads as well, with Data Scientists and Engineers working together on the ML component end-to-end. This makes experimenting on real-world production data before a release much simpler and contributes to the velocity of the team.
If your model’s release depends on another team, you should take into account that the other team’s priorities may change, which puts your delivery at risk.
- Invest in ML infrastructure: the company should be aware that a scalable ML system requires investing in proper infrastructure. That is, we are not just working on ad-hoc projects, but building a unified system to develop, deploy, and monitor ML models. This leads us to the next section, but it’s important to have management’s buy-in on this.
Pitfall #3 — ML infrastructure & Ops
One of the most important principles in Software Engineering is “Fail-Fast”. In order to fail fast, one needs a good infrastructure to test, deploy, monitor, and alert when something fails. Investing time, thought and effort in such a system is worthwhile long-term, as it accelerates software development.
The problem is that ML components are different from traditional software components. Take, for example, a web service that predicts whether there is a traffic light in an image taken by an automotive image sensor. This service worked well, but suddenly started returning wrong predictions: it predicts traffic lights when they’re not there, and vice versa. “What happened?” you probably ask. One possible reason is that it started raining and the ML model was not trained on images taken in the rain. Another explanation could be that a developer deployed new code that compresses the image files and saves millions in storage costs, but the ML model was not trained on compressed images. There could be other explanations, but in every scenario the model yields bad output because it was not trained on this input population: “garbage in, garbage out”. Meanwhile, the web service’s responses would still be successful (return code: 2XX), so the service is considered healthy even though it returns completely wrong predictions. The bad news is that the first person to notice the failure will be our customer, and that’s, well, let’s just say that’s unpleasant, unless we find a way to continuously monitor and test the input population.
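To make “monitoring the input population” a bit more concrete, here is a minimal sketch of one common approach: comparing the distribution of a single input feature in live traffic against the training data using the Population Stability Index (PSI). This is only an illustration under assumptions, not a recommendation of a specific tool. The function names, the brightness feature, and the 0.25 alert threshold (a widely used rule of thumb that should be tuned per use case) are all hypothetical.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    live: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference (training) sample and a live sample of one feature."""
    if not reference or not live:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has no spread to bin")
    width = (hi - lo) / n_bins  # equal-width bins over the reference range

    def bin_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            # Values outside the reference range fall into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or a division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref_p, live_p = bin_shares(reference), bin_shares(live)
    return sum((l - r) * math.log(l / r) for r, l in zip(ref_p, live_p))


def input_drift_detected(
    reference: Sequence[float],
    live: Sequence[float],
    threshold: float = 0.25,  # assumed rule-of-thumb threshold, tune per feature
) -> bool:
    return population_stability_index(reference, live) > threshold


# Hypothetical feature: mean brightness per image, scaled to [0, 1].
training_brightness = [i / 100 for i in range(100)]
live_brightness = [i / 500 for i in range(100)]  # much darker images, e.g. rain
if input_drift_detected(training_brightness, live_brightness):
    print("ALERT: live inputs no longer look like the training data")
```

A check like this can run on a schedule against recent requests, so that the alert fires on the shifted inputs themselves rather than on the HTTP status codes, which stay healthy in the scenario above.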
ML requires a different set of tools to establish a “fail-fast” system, and trusted CI/CD pipelines for ML are a relatively new concept. These tools usually fall under the definition of “MLOps” (ML operations), and the good news is that this field constantly develops and improves.
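As a rough illustration of what a “fail-fast” step in a CI/CD pipeline for ML might look like, the sketch below gates a candidate model on a held-out evaluation set before it can be deployed. It is a simplified example under assumptions: the function names, the accuracy metric, and both thresholds are hypothetical, and a real pipeline would typically check several metrics and data slices.

```python
from typing import Callable, List, Sequence, Tuple


def accuracy(
    predict: Callable[[float], bool],
    examples: Sequence[Tuple[float, bool]],
) -> float:
    """Share of (features, label) examples that the model predicts correctly."""
    correct = sum(1 for features, label in examples if predict(features) == label)
    return correct / len(examples)


def quality_gate(
    candidate_accuracy: float,
    production_accuracy: float,
    min_accuracy: float = 0.90,    # assumed absolute bar
    max_regression: float = 0.01,  # assumed tolerated drop vs. production
) -> List[str]:
    """Return the list of reasons to block the release (empty means pass)."""
    failures = []
    if candidate_accuracy < min_accuracy:
        failures.append(
            f"accuracy {candidate_accuracy:.3f} is below the minimum {min_accuracy:.3f}"
        )
    if candidate_accuracy < production_accuracy - max_regression:
        failures.append(
            f"accuracy {candidate_accuracy:.3f} regresses from production "
            f"{production_accuracy:.3f} by more than {max_regression:.3f}"
        )
    return failures


# Toy usage: the candidate passes, so nothing is printed.
for reason in quality_gate(candidate_accuracy=0.95, production_accuracy=0.94):
    print("BLOCKED:", reason)
```

In a CI job, a non-empty result would translate into a non-zero exit code, so a weak model fails the build the same way a failing unit test does.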