Data Science in the Wild
If you’re reading this, you’re probably looking to break into data science. Let me start by telling you that data science in the wild is a different beast from the data science you might know from your degree or online course.
As a practitioner, I’d like to let you in on a few trade secrets I wish I had known before starting. I’ve learned the hard way and I’m graciously dropping these pearls so you don’t have to mess up like I have done. Thank me by subscribing (if you want) …please.
Here are some things people don’t tell you about being a data scientist. There are many more but I will give you my top four.
You need buy-in at the top.
You cannot make an impact in an organisation as a lone-wolf data scientist. You need buy-in from those at the top. This is a hard pill for a lot of us to swallow. We want to be Elon Musk, saving the organisation with math, science, and supreme logic. Elon is 1 in 7 billion, whereas outside of your family and friends nobody knows you or cares about you. You need somebody credible, who has the ear of your senior leadership team, board and executive committee (ExCo), to vouch for you. This is the only way to create the culture change required to do impactful machine learning and save those millions that you desperately want to save with your magic machine learning models. Heed this warning: you’ll suffer from tissue rejection if you do not get buy-in at the top. Sure, you’ll build some great prototypes, but nobody will want to put their budget towards productionising and scaling them.
Solution: Machine learning has to be the “in thing” in your organisation, and the only person who can really push this as an agenda is a senior stakeholder, somebody like your CTO or even your CEO (or their equivalent). If you’re able to, get an audience with them as soon as you possibly can. Come prepared with a strategy and think big! Your strategy should cover why AI is of any use at all, your plans for warming the organisation up, your plans for delivery, and the potential costs and resourcing needs. A 10 to 16 page strategy deck should do.
Almost nobody knows what machine learning is.
This is quite self-explanatory: most people you will meet in practice don’t really know what machine learning is, AND you’ll have a tough time convincing them it’s a worthwhile pursuit. Who are the people I’m referring to here? Your key stakeholders, the ones who hold the pen on how much the business can spend on productionising and supporting your models. I want you to take off your data science hat for a minute and put on your investor/key business stakeholder hat. Now be very pragmatic about this: why should you, the investor, support some model that only works in theory? You probably don’t understand what you’re looking at, and you have most certainly already anticipated the technical debt you’ll be left with should this eccentric data scientist decide to up and leave your business.
The thing to get here is that a lot of business is about perception. I know you’ve created a brilliant model that will save the company millions (on paper), but you need to change their perception of machine learning away from some overly complex and clever stuff that they don’t get. To do this, you’ll need to speak their language. Ask yourself this: what would make you invest in a model?
Solution: You’ll need to warm them up with tangible examples. Something that has worked for me in the past is collaborating with external partners who have delivered solutions that have brought in quantifiable business value. I’ve asked my partners to work with me to deliver workshops to key stakeholders. The aim is to get people inspired and coming up with ideas about how machine learning could help them. You need to plant the seed.
Data is very difficult to get access to.
If you’re familiar with Kaggle, you’ll know it’s a great way to get to grips with machine learning and test your skills against thousands of other budding data scientists and ML practitioners. The problem with Kaggle is that the data is too readily available. That’s right, it’s too easy to get data. In practice it is not like this at all. Data can be very difficult to get hold of, even data belonging to your own company. In some organisations you might be stuck in a loop of bureaucracy for months before you get hold of data. Have you planned for this?
Solution: Do not jump into projects! I know you got top 5% in your advanced regression challenge on Kaggle, but don’t start promising to deliver this yet. Take time to assess many problem statements and scope your solutions out carefully. As part of this initial assessment, you’ll want to map out what data you think you might need AND score how difficult that data is to get access to. By the way, simply assuming poor data quality from the start will ease the mental burden here.
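To make that assessment a bit more concrete, here is a minimal sketch of how you might score it in Python. Everything in it is made up for illustration: the problem statements, the data sources, the 1 to 5 difficulty scale, and the rule that a project is only as easy to start as its hardest-to-get data source. Swap in whatever scale and rule make sense in your organisation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataSource:
    name: str
    # Hypothetical scale: 1 = you already have it, 5 = months of bureaucracy.
    access_difficulty: int


@dataclass
class ProblemStatement:
    title: str
    sources: List[DataSource]

    def access_score(self) -> int:
        # Assumption: the hardest-to-get source is the bottleneck,
        # so the project's score is the maximum difficulty across its sources.
        return max((s.access_difficulty for s in self.sources), default=0)


def rank_by_access(problems: List[ProblemStatement]) -> List[ProblemStatement]:
    # Easiest data access first, so likely quick wins float to the top.
    return sorted(problems, key=lambda p: p.access_score())


# Illustrative entries only, not real projects.
problems = [
    ProblemStatement("Churn prediction", [
        DataSource("CRM exports", 2),
        DataSource("Billing records", 4),
    ]),
    ProblemStatement("Demand forecasting", [
        DataSource("Sales history", 2),
    ]),
    ProblemStatement("Fraud detection", [
        DataSource("Transaction logs", 5),
        DataSource("Customer complaints", 3),
    ]),
]

ranked = rank_by_access(problems)
for p in ranked:
    print(f"{p.title}: access difficulty {p.access_score()}")
```

A spreadsheet does the same job, of course. The point is to write the scores down before you promise anything, so the months of waiting for data show up in your plan rather than halfway through your project.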
It takes a village to do machine learning at scale, a really, really expensive village.
By village I mean people, infrastructure and data. You’ll probably need a cloud platform, on-premises infrastructure, labs, expertise, the right team, and data. This stuff is expensive, and you’ll have a big job on your hands convincing your organisation to invest in it if they haven’t already.
Solution: You need to prove value to your organisation. Is there any low-hanging fruit? Can you build something at a smaller scale that has a big impact on one team? Never forget that business is all about pragmatism: something that has worked well for one team is an easier sell than something that works well on paper but hasn’t yet been used. Find a problem that can be solved simply, focus on this, and do not try to boil the ocean. Remember as well that data science isn’t just about machine learning. Can you provide analytics, statistical modelling or anything else that can be of use to your business stakeholders? Have a think. You’re the data scientist now, so you can come up with something. I believe in you.