Transitioning from Social Science to Data Science

As someone with limited formal training in computer science, engineering, or math, it can be daunting to try to make the leap into data science. It seems like almost every job description calls for someone with a degree in some hardcore field that, while interesting and important, never caught the fancy of those of us in the social sciences who prefer the study of people and societies over the study of cells, chemicals, energy, numbers, or whatnot.

It certainly felt that way to me when I was getting started in data science. After wading through countless job descriptions looking for someone with a degree in something I never studied, it was difficult not to wonder, “Is there a place for people with a social science background in data science?” or, “I didn’t major in a physical science or engineering; can I really make it as a data scientist?”

Well, having worked as a data scientist now, I’m happy to say that there is, and you can! That said, it can be difficult to grasp at first how your social science background can be useful in data science, and there are definitely some key things worth knowing that you may never have learned in school. I hope this article gives you insight into how your social science background can help you be a strong data scientist, and helps you determine which skills that school never taught you are worth picking up to smooth the transition.

Social Science Background

My own social science training comes through the field of communication, specifically the study of mass communication formats (broadcast media, social media, and so on; entertainment media in my case) and people’s interactions with and responses to them. But the label of social science can be applied to all manner of fields beyond my own, e.g. psychology, sociology, criminology, economics, and political science. In fact, graduates of such fields make up a big portion of bachelor’s graduates [1] each year. And among them, you are not the only one looking to break into data science, nor will you be the first to make it, so don’t despair!

Relevance of Social Science in Data Science

It actually shouldn’t be surprising that social science provides an exceptional background for moving into data science. At the core of any social science is investigating how various stimuli or elements of people’s (or groups’) background, thinking, or behavior are associated with their other thoughts, behaviors, outcomes, or characteristics. Social scientists’ significant experience with such analyses is what gives them their edge: we may take practice in considering the effects of people’s backgrounds, thoughts, and behaviors for granted, but it’s not something most people spend a lot of time doing! Of course, the particulars of the data you work with will depend on the position, but I feel comfortable saying that in many jobs out there, much of what you do will entail investigating and predicting what some type of people do (be it buyers, sellers, viewers, users, interactors, drivers, riders, whomever) and how it varies depending on who they are (e.g. demographics, psychometrics), what they’ve done (e.g. past behavior features), and what they think.

More specifically, experience with data about people, as commonly gained in the social sciences, comes in handy in three particular regards. First and most directly, having thought a lot about demographic and psychological differences in outcomes really helps with designing methodologically rigorous analyses that investigate such differences. This is particularly true of those who received formal training in research design during their schooling and can fluently identify and explain mediators; moderators; threats to internal, external, construct, and criterion validity; and so on, and who may even be able to design and analyze solid randomized experiments. Your experience considering and accounting for such elements will help you design rigorous analyses that cleanly measure what they claim to measure.

Building directly on this, secondly, coming from social science can be a great help in modeling. Having a sense of the range of demographic and psychometric factors that can affect a particular variable, as well as how such factors can interact with each other, is a valuable skill when it comes to constructing regression and classification models. Such intuition can be of use in feature transformation (e.g. knowing that certain variables, like income, are usually skewed) and feature selection (when more systematic methods aren’t ideal or feasible), and can also help with the specification of more complex models (e.g. hierarchical linear models, structural equation models) as the structure of the data requires.
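To make the feature transformation point concrete, here is a minimal Python sketch (standard library only) of log-transforming a skewed income variable. The income values are made up for illustration, and the skewness helper is a hypothetical function written for this example rather than part of any library.

```python
import math
import statistics


def skewness(values):
    """Population skewness: the mean of the cubed standardized values."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical annual incomes: most are modest, one is very large.
incomes = [28_000, 31_000, 35_000, 40_000, 42_000,
           48_000, 55_000, 61_000, 90_000, 450_000]

# log1p(x) = log(1 + x), which also tolerates an income of zero.
log_incomes = [math.log1p(v) for v in incomes]

raw_skew = skewness(incomes)
log_skew = skewness(log_incomes)

print(f"skewness of raw incomes: {raw_skew:.2f}")
print(f"skewness of log incomes: {log_skew:.2f}")
```

The log transform pulls in the long right tail, so the transformed variable is less skewed than the raw one. Whether a log, a square root, or no transform at all is appropriate depends on the model and the data at hand.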

Lastly, a less obvious way a social science background helps in data science is with regard to feature engineering. Those from social science have no doubt encountered various theories that aim to explain particular phenomena or behavior, often with an accompanying diagram showing how certain constructs directly or indirectly lead to an outcome. Such theories can be useful frameworks for feature engineering, especially if one has experience coming up with clever proxy variables for otherwise difficult-to-measure variables. For example, the integrative model of behavioral prediction [2] posits that intent to perform a behavior is predicted by attitudes (how one feels about the behavior), perceived norms (perceptions of how ‘normal’ a behavior is), and self-efficacy (whether one feels they can perform the behavior). In the context of, say, predicting whether user A will listen to song X, existing consumption data and relevant metadata could be used to generate features that approximate such constructs. For example, user A’s attitude toward song X could be represented by user A’s consumption of songs with high metadata similarity to song X; user A’s norms about listening to song X could be represented by the extent to which users with listening histories similar to user A’s have listened to song X; and self-efficacy could be represented by the variety of different artists, genres, and nationalities of songs user A has listened to.
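The song example above could be sketched roughly as follows. This is only an illustration under loud assumptions: the toy data, the tag sets, the function names, and the choice of Jaccard similarity as the similarity measure are all hypothetical, picked to keep the example small rather than taken from the model itself or from any production system.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical song metadata: each song is described by a set of tags.
song_tags = {
    "song_x": {"pop", "korean", "upbeat"},
    "song_a": {"pop", "korean", "ballad"},
    "song_b": {"rock", "english", "upbeat"},
    "song_c": {"jazz", "english", "mellow"},
}

# Hypothetical listening histories: the set of songs each user has played.
histories = {
    "user_a": {"song_a", "song_b"},
    "user_b": {"song_a", "song_b", "song_x"},
    "user_c": {"song_c"},
}


def attitude(user, target):
    """Proxy for attitude: mean metadata similarity between the target
    song and the other songs the user has listened to."""
    listened = histories[user] - {target}
    if not listened:
        return 0.0
    sims = [jaccard(song_tags[s], song_tags[target]) for s in listened]
    return sum(sims) / len(sims)


def perceived_norm(user, target):
    """Proxy for perceived norms: similarity-weighted share of other
    users who have listened to the target song."""
    own = histories[user] - {target}
    weighted_listens = total_weight = 0.0
    for other, history in histories.items():
        if other == user:
            continue
        weight = jaccard(own, history - {target})
        weighted_listens += weight * (target in history)
        total_weight += weight
    return weighted_listens / total_weight if total_weight else 0.0


def self_efficacy(user):
    """Proxy for self-efficacy: number of distinct tags across the
    user's history, i.e. the variety of what they listen to."""
    return len(set().union(*(song_tags[s] for s in histories[user])))


features = {
    "attitude": attitude("user_a", "song_x"),
    "perceived_norm": perceived_norm("user_a", "song_x"),
    "self_efficacy": self_efficacy("user_a"),
}
print(features)
```

Each function returns one number per user and song pair, so the three values can sit alongside other columns in a feature table for a classification model. The theory guides which features to build; the particular proxies remain a judgment call.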

All of the above points are especially true if you’re fortunate enough to work in a field where the research is directly applicable to what you do, as is the case for me. Having conducted significant research on media preferences and effects, I feel exceptionally comfortable developing models aiming to predict media consumption at various levels. But even if your field of study doesn’t perfectly project onto your industry, you will over time observe parallels in methodological approaches that you may wish to look into. For example, you might have a lot of experience applying text mining techniques to documents, but then you might realize that such techniques can be reasonably applied in any context where the data can be considered a body of text, not just documents: user metadata, song metadata, consumption history, or whatnot. Keeping an open mind about how certain frameworks from a field you’re familiar with could be applied elsewhere is critical.
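As a rough sketch of the text mining parallel, each user’s consumption history can be treated as a ‘document’ whose ‘words’ are song IDs, and a standard text mining weighting such as TF-IDF applied to it. The histories below are made up, the tf_idf function is a hypothetical helper, and it uses the plain unsmoothed form of TF-IDF (term frequency times the log of inverse document frequency), which is only one of several common variants.

```python
import math
from collections import Counter

# Hypothetical consumption histories: one "document" of song IDs per user.
histories = {
    "user_a": ["song_1", "song_1", "song_2"],
    "user_b": ["song_1", "song_3"],
    "user_c": ["song_3", "song_3", "song_4"],
}


def tf_idf(docs):
    """Return {doc_name: {token: weight}} using unsmoothed TF-IDF."""
    n_docs = len(docs)
    # Document frequency: in how many histories does each song appear?
    doc_freq = Counter(tok for tokens in docs.values() for tok in set(tokens))
    weights = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        weights[name] = {
            tok: (count / len(tokens)) * math.log(n_docs / doc_freq[tok])
            for tok, count in counts.items()
        }
    return weights


weights = tf_idf(histories)
for user, song_weights in weights.items():
    print(user, {song: round(w, 3) for song, w in song_weights.items()})
```

For user_a, song_2 ends up with a higher weight than song_1 even though song_1 was played more often, because song_2 appears in no one else’s history. In text mining terms, it is the more distinctive ‘word’.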

The Nuts and Bolts: Key Technical Skills

But all your schooling and wit won’t take you anywhere in data science if you don’t have a certain set of key technical skills. Especially with all the different ‘specializations’ you see touted on various jobs pages (experimentation, inference, machine learning, visualization, and so on), it can be difficult to get a sense of which tools you really want to have a good grasp of. Certain jobs will definitely call for heavier application of certain techniques, but I do feel there is a ‘core’ toolkit that all data scientists are expected to be familiar with.

Spoiler alert: Practically all such skills involve some element of coding. I understand the idea of coding can be daunting for some, but the bottom line is that if you want to be in data science, you have to get comfortable with coding and the trials and tribulations that come with it. Take a course, get a certificate, do whatever you need to do to get comfortable with the basics. A good place to start might be to Google for a tutorial on a data project that sounds interesting and go from there. No matter where you start, just know that Googling problems and troubleshooting with the help of StackOverflow et al., where other people have run into the same issue, is a timeless tradition that even professionals take part in all the time, and nothing to be frustrated by. So let’s dig into what I think are the key skills and concepts you should be familiar with.

Book Recs

A big part of being a data scientist is constantly learning new concepts and methods. You’d quickly go broke if you took a paid course for everything you have to learn, so oftentimes your go-to resources tend to be online tutorials and the like. I find books to be a happy medium between courses and tutorials, effectively a set of tutorials structured like a course. Here are a few books relating to the above that I found useful as I prepared to enter data science, both to pick up new material and to brush up on concepts:

Some General Parting Advice

Okay, so far, I’ve given you an idea of how your social science background can be valuable in data science, outlined the tools and concepts you want to be familiar with to help make a successful transition to data science, and listed a few books you might find useful as you get started. In closing, I have some general advice for social scientists aspiring to enter data science that felt a little too broad to include in the previous sections.

First, no matter what field you choose to go into as a data scientist, make sure you understand the field as best you can. It can be assumed that many of the candidates for any given data science position have hugely overlapping technical skillsets, and clear domain knowledge pertaining to the industry at hand can push you over the top compared to other candidates. Think about it from the manager’s perspective: when everyone has a roughly similar skillset, wouldn’t you rather hire someone who understands the concepts and metrics to which those skills would be applied?

Second, never, ever be afraid to think ‘weirdly’. This is another area where I think coming from social science really helps, because there are often so many theories that aim to explain the mechanism driving various outcomes, and the process of innovating often entails making a ‘weird’ connection between two things that hasn’t been made before. If you feel like there’s a connection between two things, or a method from one context that you think could be applied in another, don’t hesitate to look into it or bounce the thought off of others. I don’t mind airing the occasional thought that makes me sound like a bit of a lunatic, because I know that every once in a while, one of those comments will get at something really insightful or novel.

And lastly, no matter where you go and what you do, pursue your passions. There’s a Ralph Waldo Emerson quote that’s long been a favorite of mine:

The voyage of the best ship is a zigzag line of a hundred tacks. See the line from a sufficient distance, and it straightens itself to the average tendency. Your genuine action will explain itself, and will explain your other genuine actions. Your conformity explains nothing. Act singly, and what you have already done singly will justify you now. [5]

I never grew up wanting to be a data scientist, but at every step I pursued my passions, drawing the zigs and zags of my ship through my genuine actions. Looking back now, I can see them straightening to their average tendency, enabling me to see how everything I’ve done has brought me to where I am now. I hope that as you progress in your career, you pursue your passions and draw the zigzags of your own best ship.

Danny Kim Movie/TV & data science expert. Penn, Wharton, USC film alum. All views my own, not affiliated with employers/clients.
