Data Science is Cool
Let’s face it, data science is cool. And since it’s cool, there are lots of great jobs out there for data people. Unfortunately, lots of people who would love data science just don’t know how to land their first data science job. In this post, I will tell you the three critical steps you need to know to get started.
Two years ago, I was in a PhD program studying math. I really loved learning math (and still do), but I wasn’t sure that I wanted to become a professor. At one point, I had a project fall apart completely after a year of work. I went through all five stages of grief in about 3 hours, and then I decided:
I Need to Do Something Else!
I knew I needed to do something other than becoming a professor, but I still wasn’t sure what to do. There were a few options I could think of. I could become a teacher, maybe in a high school, but that came with a whole host of difficulties (not least of which was the low pay). I could become a software developer, but software development wouldn’t use any of the math or stats skills I learned in school (and besides, I was not a great programmer).
Enter Data Science
It was at this point that I just started googling randomly. Eventually, I heard about a field called “Marketing Analytics,” where you would use statistical modeling to design targeted marketing programs. Now this was cool! I found the site Meetup.com and started to go to local data-focused meetups. It was here that I first learned about Big Data, machine learning, and “Data Science” (which didn’t sound very science-y at all, but did sound cool). I decided that I wanted a job doing Data Science, and soon!
Okay, so how do I actually get a job?
For me, getting a job in data science was 80 percent networking, 20 percent skills, and not the other way around. I think many people make the mistake of thinking that they have to be world-class experts at convolutional neural networks and Hadoop Map-Reduce before they even talk to a single employer. On the contrary, I think that for getting into the field of data science, your professional connections and your credibility are even more important than your skills. Here’s why: data science is such a huge and poorly-defined field that there is not one well-defined technical skill set that every “data scientist” shares. For most jobs, it is much more important to demonstrate to employers that you have general, transferable data-related technical skills than to be an expert in the specific technologies they mention in the job description (hint: the “requirements” on the job description usually go well beyond what is actually needed to do the job; “they are more like guidelines”).
Step 1: The right kind of networking
You always hear, “It’s not what you know, it’s who you know.” People often say that in a cynical or resigned way, and they are really trying to imply that you can’t get a good job without already being a member of the local yacht club. On the contrary, I think there’s a more positive, proactive way to manage the “who you know” part of getting a data science job. Ramit Sethi calls it natural networking. When many people network, they are just there to use the people in their network for a job referral or to make a sale (I’ve been a victim of a sales call disguised as “networking” several times). The key to natural networking is to build up and maintain a network of professional contacts with whom you are building an authentic, two-way relationship. This means, for example, that your network should consist of people that you can help, not just people who you think can help you. For example, I was referred to my current job by some people that I met at a Meetup, and now I have paid it forward a few times by referring others to jobs and offering career advice. It’s a pretty simple and fun way to contribute, and it is an essential part of building an authentic professional network.
So how should you get started building your network? There are three strategies that I found useful in my own job search:
- Go to local data science Meetups at least once a month
- Systematically reach out on email and LinkedIn to at least 1 person a week
- Try to have at least one coffee meeting/informational interview a month (preferably one a week)
Step 2: What skills do I really need?
The skills you need really depend on the job you’re looking for. If you’re looking to be a high-performance data software engineer, you probably need to know Java and to really understand Hadoop Map-Reduce. If you want to do marketing analytics for an ad agency, you might need to know SAS. But what if you aren’t sure where to start? My recommendation is to learn R and a little bit about Hadoop. I talk about this at length, including the best R packages to learn, in my article Becoming a Data Hacker. The key point of the article is to pick one technology stack to focus on, and to do at least one project in that technology stack.
Step 3: The interview
From the company’s perspective, most job interviews are all about establishing two things:
- Can the candidate do the job?
- Is the candidate a good fit for the team?
The first question mainly comes down to technical skills and subject-matter knowledge, and the second one mostly comes down to personality and social skills. The best way to nail the first question is to have done real, independent data science projects, so you can talk about data science in convincing detail. It helps to have also read widely and gone to data science meetups, so that you can talk about data science in a general context.
The best way to nail the second question is to know how to answer the innocuous-sounding “tell us about yourself” questions appropriately. This is harder than it sounds, and it’s where interview practice is critical. For example, a typical interview question is: “What is your biggest weakness?” A naive person might just answer the question directly: “I procrastinate too much.” An inexperienced cynic might try to give a weakness that’s really a strength: “Sometimes I work too hard.” But a master interviewer would say something like: “Well, I am pretty good at managing projects, but I struggle sometimes with managing people. To improve this, I am taking an online management course, and I’ve taken on small leadership roles within my team lately.” The master interviewer gives an honest answer, but they also assuage the fears of the interview team, and they demonstrate self-awareness by showing that they are aware of the problem and trying to fix it. Always think about what the interviewer is really asking, and answer the real question, not just the surface question.
There’s more to come on this topic. But this short post should be enough to get you started. For now, here are a few action steps:
- Sign up for your first meetup. Go to meetup.com and search for data-related meetup groups in your local area. Here are some keywords to search: “big data,” “hadoop,” “analytics,” “R,” “data science,” “noSQL”
- Pick a small project to try for learning purposes (some ideas at the bottom of this post). It doesn’t have to be particularly serious or hard, and it will be an awesome way to learn some new skills and beef up your credentials.
Ben McKown says
Great thoughts! Really useful for people looking to get into data science.
Thanks, Ben! I appreciate the positive feedback.
Ryan Dansie says
Aspiring data scientist here, thanks for this. I was just wondering what would be the best way to display the skills learnt on a resume/cv?
Hi Ryan, good question about resumes. The most important thing to remember with resumes is: a resume is not a list of accomplishments or skills. Instead, you want to tell a *story* with your resume. What do you want the company to think about you when they read your resume? Also, you should keep the resume *short*: one page is best. I will put a resume post in my backlog for future posts.
Maxim K. says
I happen to find myself in a similar position: currently a PhD researcher with a statistical background, but aspiring more to Data Science than to academia. Thank you for the wonderful advice. I believe there are a lot of resources these days, enabling one to pick up the skills necessary. There is Coursera, featuring very accessible courses on R, Machine Learning and even Data Science itself. All in a nutshell, but a solid place to start. Then there is bigdatauniversity.com, offering more advanced and more technical courses. IT books are easy to find online as well, some even for free.
What is missing the most, in my opinion, is practice. Sure, one can manage to find a dataset and tinker with it, but this is rather unstructured and therefore not the best in terms of learning efficiency. It would be lovely to have something like Kaggle, without the competitive component though, oriented at budding Data Scientists; so that people learn together, not separately. Showing each other’s code, answering each other’s questions and so forth. A sort of sandbox to tinker with the technology in a more controlled environment.
Will Stanton says
Hi Maxim, thanks for sharing your thoughts. I agree that good practice materials are a missing piece in data science training. I love your idea of a collaborative version of Kaggle. I plan to continue writing about practice resources for data science. I would love it if people like you were able to share ideas and code, as well.
Maxim K. says
Well, since I am about to sketch a more concrete learning trajectory for myself, it seems that sharing the specifics of it would be a logical next step. Maybe we should discuss this further, see if this turns into a hobby project of some kind, useful for many people I imagine. Should you be interested (with no strings attached), you have my e-mail, I believe.