The Data Science Institute at Columbia has a three-part mission that encapsulates the great promise this new field has to improve the quality of life for all.  Our mission is:

  • To advance the state-of-the-art in data science;
  • To transform all fields, professions, and sectors through the application of data science;
  • To ensure the responsible use of data to benefit society.

Advancing the state of the art means pushing the frontiers of the field through basic research.  As data science is a new, emerging field, it also means defining what data science is.  (See the posts What is Data Science? and How Does Data Science Differ from Computer Science and from Statistics?)  Data science draws on computational and statistical techniques: computational power enables large-scale analysis of statistical models, which are inherently approximations of real-world problems.  What new techniques do we need to invent as new domain problems expose the limitations of current techniques?  Because a university has an educational mission, we should also be pushing the state of the art in our educational activities.  Not only should we look at what content should be taught in data science programs, but we should also look at how to use data science in novel ways to help people learn data science.  We should also be exploring educational activities that complement traditional degree and certificate programs.

Given that all fields have data, the potential to use data science to explore this data, to make new discoveries, and in the end to impact lives is unbounded.  Columbia University is a full-fledged university with strong disciplinary departments and schools in the arts, humanities, sciences, and engineering, and with strong professional schools in architecture, business, dentistry, journalism, law, medicine, nursing, public health, and social work.  Columbia is the perfect laboratory to explore the transformation of all fields, professions, and sectors through data science.  Fields are at different stages in their use of data, but over time all fields will be able to benefit from data science.  I see this transformation already happening in the obvious fields, such as biology and business, but also in fields that have surprised me, such as history, dentistry, and social work.  Note that I expect data science to add to and enhance existing methods used by other fields, not to supplant them.  For this transformation to spread widely, data scientists and domain experts need to work together.

The last part of the mission statement makes two points.  First, through our research and education in data science, we aim to have a positive impact on society.  We should be tackling societal grand challenges, such as climate change, education, energy, environment, healthcare, inequality, and social justice.  No single discipline can tackle such challenges alone, and given the kinds and amounts of data amassed in these sectors, data science will be at the heart of addressing them.  Data scientists can use their skills and methods to confront these societal issues.  At Columbia, we are fortunate to have world-class entities, such as the Earth Institute, the Center on Global Energy Policy, the Herbert Irving Comprehensive Cancer Center, the Precision Medicine Initiative, the Zuckerman Mind Brain Behavior Institute, and the Institute for Social and Economic Research and Policy, which provide on-campus expertise in these topics and are natural partners for collaboration with the Data Science Institute.

Second, by “responsible use,” I mean the fair and ethical use of data, transparency and accountability of our data science techniques and processes, and the safety and security of the systems we build that rely on these techniques and models.  I use the acronym “FATES” as shorthand for “responsible use.”  (See the post Data for Good: FATES, Elaborated.)  Here, the strengths of the humanities and social sciences at Columbia come to the fore.  The Data Science Institute looks to scholars in these fields to help frame the kinds of questions technologists should ask as they are inventing new technology, not after deploying it.  We also need to understand the ethical issues and implications of our technology.  Drawing on how ethics is taught in our schools of business, journalism, law, and medicine gives us an excellent basis for teaching ethics in data science at Columbia.

In short, the Data Science Institute at Columbia has a broad and profound mission: establishing the scientific foundations of the field of data science, touching every other field, and drawing on the insights of social scientists and ethicists, all for the benefit of society.

Jeannette M. Wing is Avanessians Director of the Data Science Institute and professor of computer science at Columbia University.