Data science is the study of extracting value from data. The three key words in this definition are “value,” “extracting,” and “study.”
The word “value” leaves it to the end user, a domain expert, to determine what value is. For example, for a large technology company, value can be pegged to revenue, which might depend on quantifiable measures such as the number of clicks on ads, the time a user spends on a service (“user engagement”), or how much a user is willing to pay for a service or product. For a policymaker, value can mean a justification for a policy change. For a scholar, value can simply mean the discovery of knowledge: a scientific breakthrough, an insight about human behavior, or a new interpretation of the world around us.
The word “extracting” emphasizes action on data. The raw data may need to undergo one or more transformations before any value is gained. (See The Data Life Cycle post.) The word “extracting” also implies that a lot of hard work may be needed to mine the data for its worth.
Finally, the word “study” includes both the art and the science that guide any field of scientific pursuit. It also spans both theory and practice. Today, data science shows great applicability across many domains, and the demand for practicing data scientists far exceeds the supply. However, for data science to be considered a field of study, we must also lay out its theoretical foundations, grounded in mathematics and scientific methods.
Jeannette M. Wing is Avanessians Director of the Data Science Institute and professor of computer science at Columbia University.