29.03.19
Data Science is a young field and it means different things to different people. Here I'll give my take on what it is and isn't. My working definition that I'll try out is:
Data Science is the practice of using data and algorithms to make predictions or prescriptions in an organisation
I'll flesh that out a bit and give a bit of background.
First I'll start off with a well-worn Venn diagram and an early attempt to define the field.
This diagram was made by New York-based data scientist Drew Conway in 2010. It is supposed to represent the three core sets of skills that make up the discipline:
According to Conway, and others since, somebody with some combination of all three of these skills could be a data scientist.
At the time this diagram was made there were only a handful of people in the world with the job title 'Data Scientist'; they mostly worked at the big internet companies in the USA. Over the past eight years their number has increased and the job has evolved.
I have some issues with that description. It is more a description of a set of skills than a definition of a profession. We don't describe astrophysics as 'the intersection of maths skills, telescopes and physics'; it is 'the branch of astronomy concerned with the physical nature of stars and other celestial bodies'. Likewise, medicine is not 'the intersection of biology, chemistry and people' or whatever, but 'the science or practice of the diagnosis, treatment, and prevention of disease'. A better description of data science would describe the scope and nature of its pursuit.
Here is a rough list of some of the things that I think make up data science:
So a key element is the context - we are working in an applied setting, using data and maths to influence decisions and solve problems. An important aspect is that it takes place in a (loosely defined) business context. Data scientists do not generally study natural phenomena or do research purely for its own sake. Many other sciences already exist that do this. The practice is applied - it is directed toward solving business problems. Another key part here is that the outputs of data science should not be passive - they should support a course of action. And it tends to be prospective - it is about solving future problems rather than understanding what happened in the past. The data tends to be big and messy, meaning that data scientists must be skilled in storing, cleaning and handling data.
Very few of these elements are hard and fast, and there is a lot of overlap with existing fields. However, a useful way to sharpen the definition is to contrast it with other fields:
Data science uses elements of all of these fields but is distinct from them. Which leads to my definition:
Data Science is the practice of using data and algorithms to make predictions or prescriptions in an organisation
I think that the key part is around making predictions or prescriptions. Data science is forward-looking and its outputs ought to be data 'products' that combine data in novel ways to automatically suggest optimal courses of action. Producing simple retrospective analysis is business intelligence.
What I've jotted down here is really my working definition, and it probably reflects my biases and background. Please leave comments below with any thoughts on what I might have missed or got wrong.