“The sexiest job of the 21st Century” is the phrase that has defined the rise of data science. The past few years have seen growing discussion of the subject, both by producers (the practitioners of data science) and consumers (the stakeholders and businesses who stand to gain from it).
A quick look at Google Trends confirms this: mentions of this and similar terms (Big Data, Analytics…) have risen since around 2012, which, as far as I know, is when the phrase originated in a Harvard Business Review article.
In a previous post I mentioned the debates over the difference between Operational Research (OR) and Data Science. There is obvious overlap, so surely OR must also be gaining interest? Maybe not; another cursory glance at Google Trends shows no increase in talk about the subject. What gives? Is OR as a subject dying, to be replaced by the superior data scientists and their sexy job title? Or is it losing a PR war with the more modern-sounding disciplines, with OR sounding like a relic of older times that is not relevant in the digital age?
Data Science?
“A data scientist is a data analyst working in San Francisco”
“Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.”
There are plenty of quotable (and just slightly sarcastic) definitions of data science. The problem with defining the subject may be related to the talk of “unicorns” and “data rock stars” that overzealous recruiters have thrown around (usually in their search for new graduates, with ten years’ experience, in a technology that’s five years old). Still, the jokes are close to the truth: data science is driven by a mixture of statistical analysis, programming, algorithm design and an understanding of the business problem at hand. In a nutshell, data science is about using data to predict, inform and direct.
Is that new?
If it sounds familiar, that’s because it is. There are many reasons why the Data Science term has exploded; some are due to changes in the business environment we find ourselves in and the technologies that are available, whilst others are perhaps more superficial. However, businesses have always had a need to make better decisions, make more accurate predictions and simply know more about their problems. The reasoning is simple: if I know more than my competitor, I’ve got a better chance of beating them. The demand certainly isn’t new.
What has changed is the volume. Data grows exponentially; as we design more complicated systems we collect more data, and getting information out of that data becomes more difficult. In this way, data science is a reaction to an increase in volume that demands new tools and techniques.
We shouldn’t, however, underestimate the bandwagon effect. Much of the talk about the subject has been self-fulfilling, a reinforcing feedback loop that has led to Data Science becoming an over-used phrase. The subject is certainly marketing itself effectively; Data Science and Big Data have often been branded as products or tools, rather than methods, and this has helped adoption. Many big-name consultancies have followed suit, rebranding services under the names Data Science, Big Data or Advanced Analytics.
So what is the difference?
In an earlier post I attempted to define Operational Research, coming up with the following:
“the use of scientific methods, analysis and reasoning to aid decision making and strategy development”
The only difference between this definition and that of Data Science is that it doesn’t explicitly mention data. Is this a meaningful gap between the disciplines? Can Operational Research happen without data? I would argue that any OR activity involves turning data into action.
What we think of as OR may not always involve masses of data, and may not involve data in a machine-readable format. In these cases the data is still there, but instead of sitting in a nice electronic form, it is in the messy state of being locked away inside these horrible squishy things called people. We often refer to this as “soft” data: it is necessarily interwoven with experiences, attitudes and other uncontrollable influences, and so it must be interpreted. Incredibly effective OR techniques, such as multi-criteria decision analysis (MCDA), are used to get this interpreted “soft” data out of people so that we can then apply much easier, nicer, less “fuzzy” computational techniques.
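To make that idea a little more concrete, here is a minimal sketch of one common MCDA approach, a weighted-sum model. It is only an illustration: the post does not say which MCDA method is meant, and the options, criteria, scores and weights below are all hypothetical stand-ins for judgements that would be elicited from stakeholders.

```python
def weighted_sum_ranking(scores, weights):
    """Rank options by the weighted sum of their criterion scores."""
    total_weight = sum(weights.values())
    # Normalise the weights so that they sum to 1.
    normalised = {criterion: w / total_weight for criterion, w in weights.items()}
    totals = {
        option: sum(normalised[c] * s for c, s in criteria.items())
        for option, criteria in scores.items()
    }
    # Highest overall score first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical "soft" data: stakeholder judgements on a 0-10 scale,
# where a higher score is better on every criterion.
scores = {
    "Supplier A": {"cost": 7, "quality": 5, "reliability": 8},
    "Supplier B": {"cost": 4, "quality": 9, "reliability": 6},
    "Supplier C": {"cost": 8, "quality": 4, "reliability": 5},
}
# Hypothetical relative importance of each criterion.
weights = {"cost": 3, "quality": 5, "reliability": 2}

ranking = weighted_sum_ranking(scores, weights)
for option, total in ranking:
    print(f"{option}: {total:.2f}")
```

With these made-up numbers Supplier B comes out on top, because quality carries the most weight. The hard part in practice is not the arithmetic but agreeing the scores and weights with people in the first place.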
So we’re still doing the same thing: turning data into action. The “rise” of data science has led to some ground-breaking developments in machine learning, artificial intelligence, data processing, and I’m sure many other areas that I know far too little about. These techniques are making our jobs a hell of a lot easier, letting us analyse problems that were previously intractable, and pointing out problems we didn’t even realise we had.
The conclusion from this? What we call ourselves doesn’t matter. Data Scientist, Operational Researcher, Analyst… our business is solving problems and answering questions. As analysts, we love to learn about new tools, techniques and methods for solving fascinating problems; but for our stakeholders it is about accurate analysis providing effective solutions. If we deliver this, then we can call ourselves whatever we want.