by Steve Pierson, ASA Director of Science Policy

The proliferation of master’s and doctoral programs in data science and analytics continues, seemingly due to the insatiable demand of employers for data scientists. Amstat News started reaching out two years ago to those in the statistical community who are involved in such programs to find out more. Given the programs’ interdisciplinary nature, we identified programs involving faculty with expertise in different disciplines (including statistics, given its foundational role in data science) to jointly reply to our questions. We profiled many universities in our April, June, and December 2017 issues and January and April 2018 issues; here are three more.

COLUMBIA

Tian Zheng is a professor of statistics and associate director for education for the Data Science Institute at Columbia. She develops novel methods for studying complex data from different application domains and is currently the chair-elect for the ASA’s Statistical Learning and Data Science Section.

Jeannette Wing is Avanessians Director of the Data Science Institute and professor of computer science at Columbia University. Before Columbia, she was corporate vice president of Microsoft Research. She is widely recognized for her intellectual leadership in trustworthy computing.


Cliff Stein is a professor of industrial engineering and operations research and computer science at Columbia and chair of the curriculum subcommittee of the Data Science Institute’s education committee. He has been conducting research in combinatorial optimization, scheduling, and algorithms for large data.


Daniel Hsu is an associate professor in the Computer Science Department and a member of the Data Science Institute, both at Columbia University. His research interests are in algorithmic statistics and machine learning.

Degree name: Master of Science in Data Science
Year in which first students graduated/expected to graduate: December 2015
Number of students currently enrolled: 327 (two cohorts)
Partnering departments: Data Science Institute (lead), Computer Science, Statistics, Industrial Engineering and Operations Research
Program format: In-person; 30 credit hours required; a capstone project at the end of the program
Full-time/Part-time: We have both full-time and part-time students from a wide range of backgrounds (e.g., arts, humanities, business, science and engineering). Our students are at different career stages, from recent college graduates to mid-career managers.

What are the basic elements of your data science/analytics curriculum, and how was the curriculum developed?

An interdisciplinary education committee has been an important part of Columbia’s Data Science Institute (DSI) since the beginning, with members from computer science (CS), statistics, industrial engineering and operations research (IEOR), and other departments. This education committee discussed and developed the curriculum for the MS in data science program. Twenty-one credits of the program are core required classes, and nine credits are electives. The core required classes include three courses from statistics (two foundational courses in probability and statistics and one course on exploratory data analysis and visualization), three courses from CS (algorithms, machine learning, and computer systems), and a capstone course with a curriculum component in data ethics.

Prerequisites for admission include mathematical preparation and some familiarity with computing. Prior industry experience is valued during the admission process, but not required. During the program, students with prior coursework in statistics or CS can be granted waivers for some of the core required courses to provide the flexibility to take more electives.

As the DSI emphasizes interdisciplinary research and collaboration, we provide students with the flexibility to look across campus at domain areas to fulfill elective requirements. In addition to taking advanced coursework in CS, statistics, and math, DSI students have taken technical classes in business, law, journalism, architecture, bioinformatics, and various other departments throughout the university.

Students often take advantage of the many research opportunities across campus to gain additional hands-on experience, which can be used for elective credit. Many students will intern during the summer. DSI offers career support to obtain internships, including hosting a DSI internship fair in the spring.

MS students are required to complete a capstone project during their final semester. This course provides a unique opportunity for students in the MS in data science program to apply their knowledge of the foundations, theory, and methods of data science to address data science problems in industry, government, and the nonprofit sector. The course activities focus on a semester-length data science project sponsored by a faculty member, nonprofit organization, or industry affiliate of DSI. The project synthesizes the statistical, computational, and engineering challenges and the social issues involved in solving complex real-world problems.

Data ethics is embedded in our curriculum as discussions in individual courses and a more focused mini-curriculum in the capstone course.

What was your primary motivation(s) for developing a master’s data science/analytics program? What’s been the reaction from students so far?

Data science is emerging as a vital intellectual discipline driven by the increasing demand in all sectors for skilled practitioners who can extract value from today’s data. Because the field is highly interdisciplinary, aspiring students need training in computer science, statistics, and optimization algorithms to become data scientists who can solve applied problems around understanding, exploring, and forming predictions from data.

The Columbia University MS program in data science aims to prepare a workforce of data scientists for careers in this rising field. Graduates of this program will pursue careers as data scientists, analysts, and researchers across all sectors.

Our program attracts students from a diverse pool. While the majority of applicants have an engineering or technical background, about 21 percent of the fall 2018 applicants earned a degree in math or statistics and 19 percent hold degrees in nontechnical disciplines, including biology, business, economics, law, medicine, philosophy, physics, psychology, religious studies, and urban planning.

The fall 2018 admissions cycle had 1,624 applications with a 17 percent acceptance rate. Of the 174 MS students who enrolled this fall, 24 percent are US citizens or permanent residents and 34 percent are female. Our international students come from 16 countries, including China, India, France, South Korea, Mexico, and Thailand.

How do you view the relationship between statistics and data science/analytics?

Statistics is a foundational area for data science that provides theory and methods for understanding variation and trends in observed data and deriving inferential insights about the mechanism that generated the data.

It is especially essential for drawing interpretable inferences and predictions based on statistical models and machine learning methods and addressing the biases and uncertainty in a data science application.

Statistics complements other areas of data science, such as machine learning and optimization, which provide the algorithmic and mathematical tools that enable the statistical methodologies, as well as nonstatistical models, for data science applications.

Subjects that may not have traditionally been in the purview of classical statistics, such as computational complexity, have become active research areas of statistics, in part due to increased interactions with other data science disciplines.

What types of jobs are you preparing your graduates for?

The Data Science Institute programs prepare graduates for roles throughout the data science lifecycle of a company. Our graduates have placed in roles such as data scientist, data engineer, data strategist, software engineer on a machine learning team, machine learning engineer, strategic consultant, and quant analyst.

The advanced technical and statistical training our students receive prepares them well for companies in need of big data support in every industry and throughout the world.

DSI grads are contributing to recommendation engines at large tech companies, detecting fraud and inappropriate content at social media companies, mapping the needs of underserved neighborhoods using Twitter data, creating new investing strategies at finance firms, managing algorithms for post-disaster response for large cities, creating fraud detection software, and solving many other corporate and societal challenges.

What advice do you have for students considering a data science/analytics degree?

At Columbia, we believe data science should touch all fields, professions, and sectors. We consider applications from students of all academic backgrounds as long as they are motivated to learn data science and well-prepared in math and computing, which can be demonstrated in one’s application through nontraditional preparation such as non-degree courses, work experience, or research experience. Our program’s core ensures rigorous training in data science, while our personalized advising model provides flexibility to support different learning trajectories. Students from fields other than CS, statistics, and IEOR are all welcome to inquire and apply.

Our program’s core provides students with a set of skills that overlaps with programs such as CS, statistics, or IEOR but has its own distinct flavor. Every year, there are more jobs in a variety of areas that require the distinct blend of skills emphasized in our data science program.

For future data scientists who are considering a data science degree, our advice is to look for programs that are well grounded in the foundations of data science (including statistics), provide experiences with real-world data science applications, and have data ethics embedded in the curriculum.

Describe the employer demand for your graduates/students.

Employer demand for data science graduates is high, and these graduates are critical to the success of evolving businesses. Every industry, including finance, tech, health care, media, government, and nonprofits, is growing its data science talent pool.

DSI graduates fill roles that fall within the data lifecycle of a company. In the more than four years since our academic programs launched, more than 500 companies have recruited directly from them, and 98 percent of our students have placed in the field, demonstrating the high demand for our graduates.

Do you have advice for institutions considering the establishment of such a degree?

Data science is a “team sport.” It takes substantial collaboration to create an interdisciplinary program in data science. Institutions should create incentives for departments and individual faculty and provide resources for the program to support such a collaboration.

For example, administratively, academic programs may need to be hosted in an academic school or department. Having an interdepartmental program housed in a single-discipline department adds a burden to the host department and creates different “classes” of students within the same department or shared space that compete for limited resources.

At Columbia, although the MS in data science program is administratively hosted in the CS department, the DSI serves as the primary operating unit. This provides our students undivided support for their academic life on campus, ranging from advising and collaborative space to career development.
