The annoying parts of data science
Data science is so hot right now.
Many of you are thinking of joining the world of data science. And I am all for it! Data science is an exciting, growing field, and there are tons of great jobs out there. But data science is not for everyone. In this post, I will help you figure out if data science is right for you. If you are thinking of becoming a data scientist, be sure to ask the following questions first.
Will I enjoy (or tolerate) the boring parts of data science?
Most data scientists spend a lot of time in the weeds, doing very manual data work: building training sets, writing regular expressions, running basic queries to check assumptions, etc. This kind of work can add up to more than half of a data scientist’s job. If you don’t actually enjoy looking at real data, then you will not enjoy being a data scientist!
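To give you a taste of what "in the weeds" looks like, here is a minimal sketch of the regex-and-sanity-check kind of chore. Everything in it is made up for illustration: the sample values, the `PHONE_RE` pattern, and the `normalize_phone` helper are my assumptions, not something from a real project.

```python
import re
from collections import Counter

# Hypothetical raw values, as they might arrive from a free-text form field.
raw_phones = [
    "(555) 123-4567",
    "555.123.4567",
    "5551234567",
    "555-1234",        # too short: missing area code
    "call me maybe",   # not a phone number at all
]

# Assumed format: ten digits, with optional punctuation between the groups.
PHONE_RE = re.compile(r"^\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})$")


def normalize_phone(value):
    """Return the number as 'XXX-XXX-XXXX', or None if it does not match."""
    match = PHONE_RE.match(value.strip())
    if match is None:
        return None
    return "-".join(match.groups())


normalized = [normalize_phone(v) for v in raw_phones]

# Basic check of the assumption "every row has a usable phone number".
outcome = Counter("ok" if n is not None else "bad" for n in normalized)
print(outcome)

# Eyeball the rows that failed, because the failures are where the surprises are.
for raw, clean in zip(raw_phones, normalized):
    if clean is None:
        print("could not parse:", repr(raw))
```

If reading through the rows that failed sounds tedious rather than oddly satisfying, that is worth knowing before you commit.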
Am I doing it (just) for the money?
Data scientists tend to get paid very well, and that’s a big reason that people want to join the field. But for most people, it’s easier and more straightforward to become a mediocre software developer (just spend a little time learning to write mediocre JavaScript and CSS). They make good money, too, because the demand for software developers is so ridiculously high in most places (usually much higher than for entry-level data analysts). If you can become a good software developer (this part is really hard), the sky’s the limit salary-wise. And if you become a great software developer, you can always come back to data science later.
Do I cherish being wrong?
Data scientists constantly make hypotheses and test them with data. Usually, these hypotheses are wrong, but it can be pretty hard to figure that out, especially if you badly want the hypothesis to be right. As a data scientist, you have to approach every hypothesis with suspicion, and you can’t get too attached to any one idea. Otherwise, you will fall victim to confirmation bias. It is said that if your hypotheses are right all the time, then you are not being bold enough with your predictions.
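One way to keep yourself honest is to ask how often pure chance would produce a difference as large as the one you are excited about. Here is a rough sketch of a permutation test for that, using only the standard library. The numbers are invented, and the function name `permutation_p_value` and its parameters are mine, so treat it as an illustration rather than a recipe.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Estimate how often a random relabeling of the pooled data gives a
    difference in means at least as large as the observed one."""
    rng = random.Random(seed)
    n_a = len(group_a)
    n_b = len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


# Hypothetical data: minutes per session without and with a new feature.
without_feature = [12, 15, 11, 14, 13, 16, 12, 15]
with_feature = [13, 14, 12, 16, 13, 15, 14, 15]

p_value = permutation_p_value(without_feature, with_feature)
print(f"estimated p-value: {p_value:.3f}")
```

A large p-value here means the shuffled data looks a lot like the real data, which is the code politely telling you that your favorite hypothesis may not hold up.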
Am I willing to speak truth to power?
Even in the best places to work, there is a lot of pressure to support the party line. Executives, product managers, or salespeople will make their own hypotheses, and will ask you to prove them. But what happens if they are wrong? It can be pretty hard to explain to your boss’s boss that he or she is totally wrong about the value of a new product feature. And as Upton Sinclair said, “It is difficult to get a man to understand something, when his salary depends upon his not understanding it!”
Do I like thinking about the big picture (i.e., the business)?
Why would a company hire a data scientist? Is it because they want their machine-learning models to have the highest ROC AUC scores in the industry? NO! They want to have a more successful business, and machine-learning models are just a means to an end. There are some data scientists in the world who really are spending all of their time doing hardcore technical work (mostly in academia or in research labs at places like Google), but the vast majority of data scientists are expected to make direct contributions to the business. That means that a data scientist should be just as comfortable talking about client retention and marketing materials as they are talking about Apache Mahout. If you hate thinking about business, you might not like being a data scientist.
What does all this mean for me?
If you are now doubting your decision to become a data scientist, GOOD! But I don’t want you to quit just yet. Despite all of the annoying parts, data science is still ridiculously fun. At least now you have a better understanding of what data science is like in the real world, not just in a Kaggle competition.
Tell me what you think
Would all of the manual work in data science scare you away? Are you not-so-thrilled to have to think about marketing materials? Or did I miss a huge negative of data science in this post? Please let me know in the comments.