Introduction
When machine learning initially emerged, many speculated that it would spark another industrial revolution. Fast forward to today and many would say that it’s nothing more than a buzzword.
Don’t get me wrong. Machine learning is a useful tool, but it’s nothing more than that. And it’s a stretch to call it a Swiss Army knife; I’d think of it more like a water jet (something rather niche).
In my experience, there are certainly applications where machine learning shines. For example, Amazon’s recommendation system reportedly increased sales by over 30%. However, there are far more applications where machine learning is a suboptimal solution.
In this article, we’re going to go over four reasons why you shouldn’t use machine learning.
With that said, let’s dive into it!
1. Data-Related Issues
As seen in the AI hierarchy of needs, machine learning relies on several other layers that serve as its foundation. This foundation covers everything from collecting and storing data to moving and transforming it. Without a robust process for these preliminary steps, you’re unlikely to end up with reliable data.
Why is this so important? You’ve heard the saying “garbage in, garbage out”: the performance of your machine learning models is limited by the quality of your data, so you need reliable data to start with.
Not only does your data need to be reliable, but you also need enough of it to leverage the power of machine learning. Unless both criteria are met, you won’t be able to get the full power of ML.
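To make those two criteria concrete, here is a minimal sketch of a readiness check you could run before reaching for a model. The function name, the column names, and the thresholds (min_rows, max_missing_rate) are all assumptions made up for illustration; what counts as "enough" and "reliable" depends entirely on your problem.

```python
def data_readiness_report(rows, required_columns, min_rows=1000, max_missing_rate=0.05):
    """Check two things: is there enough data, and is it mostly complete?

    rows is a list of dicts (one per record). The thresholds are
    illustrative defaults, not recommendations.
    """
    n_rows = len(rows)
    missing_rate = {}
    for column in required_columns:
        n_missing = sum(1 for row in rows if row.get(column) is None)
        # With no rows at all, treat every column as fully missing.
        missing_rate[column] = n_missing / n_rows if n_rows else 1.0

    enough_data = n_rows >= min_rows
    reliable = all(rate <= max_missing_rate for rate in missing_rate.values())
    return {
        "n_rows": n_rows,
        "missing_rate": missing_rate,
        "enough_data": enough_data,
        "reliable": reliable,
        "ready": enough_data and reliable,
    }


# Made-up toy records, with deliberately small thresholds so the example runs.
rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 29, "income": None},
    {"age": 41, "income": 48000},
]
report = data_readiness_report(
    rows, required_columns=["age", "income"], min_rows=3, max_missing_rate=0.3
)
print(report)
```

Missing values are only one aspect of reliability, so treat a check like this as a starting point and not as proof that your data is good.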
2. Interpretability
There are two general categories of models, predictive and explanatory:
- Predictive models solely focus on the model’s ability to produce accurate predictions.
- Explanatory models focus more on understanding the relationships between the variables in the data.
Machine learning models, particularly ensemble models and neural networks, are predictive models: they are excellent at producing predictions and often far exceed the predictive power of traditional models like linear and logistic regression.
That being said, when it comes to understanding the relationships between the predictor variables and the target variable, these models are a black box. While you may understand the underlying mechanics behind these models, it’s still not very clear how they arrive at their final results.
And while techniques like feature importance and correlation matrices exist, they offer only a limited view of the relationships in your data. Overall, ML and deep learning are great for prediction but lack explainability.
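For contrast, here is a rough sketch of what an explanatory model gives you. It fits a one-variable linear regression by ordinary least squares in plain Python. The ad_spend and sales numbers are invented for illustration, and real data would rarely fit this cleanly.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data: ad spend and sales, both in thousands of dollars.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_simple_linear_regression(ad_spend, sales)
print(f"sales = {intercept:.2f} + {slope:.2f} * ad_spend")
```

The fitted slope reads directly as a statement about the relationship: in this toy data, each extra thousand dollars of ad spend goes with two thousand dollars more in sales. A gradient-boosted ensemble or a neural network might predict better on real data, but it won’t hand you a sentence like that.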
3. Technical Debt
Maintaining machine learning models over time is challenging and expensive. In particular, there are several types of “debt” to consider:
- Dependency debt: The debt incurred from unstable and underutilized data dependencies. In simpler terms, this is the cost of maintaining multiple versions of the same model, legacy features, and underutilized packages.
- Analysis debt: ML systems often end up influencing their own behavior if they update over time, which creates both direct and hidden feedback loops.
- Configuration debt: The configuration of a machine learning system incurs debt just like that of any other software system. It should be easy to make small configuration changes, hard to make manual errors, and easy to see the difference between two models.
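As an illustration of that last point about configuration, here is a minimal sketch of a helper that shows exactly what changed between two model configurations. The function name and the hyperparameter names are hypothetical; any flat dictionary of settings would work the same way.

```python
def diff_configs(old, new):
    """Return {key: (old_value, new_value)} for every setting that differs."""
    unset = "<unset>"  # marker for a key present in only one config
    changes = {}
    for key in sorted(set(old) | set(new)):
        before = old.get(key, unset)
        after = new.get(key, unset)
        if before != after:
            changes[key] = (before, after)
    return changes


# Hypothetical configurations for two versions of the same model.
model_v1 = {"learning_rate": 0.1, "max_depth": 6, "n_estimators": 200}
model_v2 = {"learning_rate": 0.05, "max_depth": 6, "n_estimators": 400, "subsample": 0.8}

changes = diff_configs(model_v1, model_v2)
for key, (before, after) in changes.items():
    print(f"{key}: {before} -> {after}")
```

This only covers the "easy to see the difference" part. Guarding against manual errors would need validation on top, for example rejecting unknown keys or out-of-range values.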
4. Better Alternatives
Lastly, machine learning shouldn’t be used when simpler alternatives exist that are just as effective. In my previous article, “Want to be a Data Scientist, Don’t Start with Machine Learning,” I emphasized the point that machine learning is not the answer to every problem.
A simple solution that takes 1 week to build and is 90% accurate will almost always be chosen over a machine learning model that takes 3 months to build and is 95% accurate.
Ideally, you should start with the simplest solution that you can implement and iteratively determine whether the marginal benefits of the next best alternative outweigh its marginal costs.
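Here is that comparison as back-of-the-envelope arithmetic, using the 90% versus 95% example from above. The dollar figures are pure assumptions for illustration: a hypothetical $4,000 of value per accuracy point and $5,000 per week of build time, with 3 months rounded to 13 weeks.

```python
def marginal_net_benefit(baseline_acc_pct, candidate_acc_pct, value_per_point,
                         baseline_cost, candidate_cost):
    """Marginal benefit minus marginal cost of moving to the more complex option.

    Accuracies are given in percentage points (e.g. 90 for 90%).
    """
    marginal_benefit = (candidate_acc_pct - baseline_acc_pct) * value_per_point
    marginal_cost = candidate_cost - baseline_cost
    return marginal_benefit - marginal_cost


WEEKLY_COST = 5000  # assumed cost of one week of work

net = marginal_net_benefit(
    baseline_acc_pct=90,
    candidate_acc_pct=95,
    value_per_point=4000,            # assumed value of one accuracy point
    baseline_cost=1 * WEEKLY_COST,   # simple solution: 1 week
    candidate_cost=13 * WEEKLY_COST, # ML model: roughly 3 months
)
print(f"Net benefit of upgrading: {net}")
```

With these made-up numbers, the five extra points are worth $20,000 but cost an extra $60,000 to build, so the upgrade doesn’t pay off. Change the assumptions and the answer can flip, which is exactly why it’s worth writing the numbers down.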
If you can solve your problem with a Python script or a SQL query, you should do that first. If you can solve your problem with a decision tree, you should do that first. If you can solve your problem with a linear regression model, you should do that first.
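To show what "do that first" can look like, here is a minimal sketch of a hand-written rule used as a baseline. The fraud-flagging scenario, the $500 threshold, and the data are all hypothetical. The point is only that a one-line rule gives you a number that any later model has to beat.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


def rule_baseline(order_total):
    """Hypothetical hand-written rule: flag any order over $500 for review."""
    return order_total > 500


# Made-up order totals and whether each order turned out to be fraudulent.
order_totals = [120, 950, 40, 700, 510, 60, 30, 800, 450, 20]
was_fraud = [False, True, False, True, False, False, False, True, True, False]

baseline_predictions = [rule_baseline(total) for total in order_totals]
baseline_acc = accuracy(baseline_predictions, was_fraud)
print(f"Rule baseline accuracy: {baseline_acc:.0%}")
```

If a rule like this already gets you most of the way there, a decision tree or a regression model has to justify its extra cost against that baseline, not against nothing.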