by Steve Pierson, ASA Director of Science Policy
Tian Zheng is a professor of statistics and associate director for education for the Data Science Institute at Columbia. She develops novel methods for studying complex data from different application domains and is currently the chair-elect for the ASA’s Statistical Learning and Data Science Section.
Jeannette Wing is Avanessians Director of the Data Science Institute and professor of computer science at Columbia University. Before Columbia, she was corporate vice president of Microsoft Research. She is widely recognized for her intellectual leadership in trustworthy computing.
Cliff Stein is a professor of industrial engineering and operations research and computer science at Columbia and chair of the curriculum subcommittee of the Data Science Institute’s education committee. He has been conducting research in combinatorial optimization, scheduling, and algorithms for large data.
Daniel Hsu is an associate professor in the Computer Science Department and a member of the Data Science Institute, both at Columbia University. His research interests are in algorithmic statistics and machine learning.
Degree name: Master of Science in Data Science
Year in which first students graduated/expected to graduate: December 2015
Number of students currently enrolled: 327 (two cohorts)
Partnering departments: Data Science Institute (lead), Computer Science, Statistics, Industrial Engineering and Operations Research
Program format: In-person; 30 credit hours required; a capstone project at the end of the program
Full-time/Part-time: We have both full-time and part-time students from a wide range of backgrounds (e.g., arts, humanities, business, science and engineering). Our students are at different career stages, from recent college graduates to mid-career managers.
An interdisciplinary education committee has been an important part of Columbia’s Data Science Institute (DSI) since the beginning, with members from computer science (CS), statistics, industrial engineering and operations research (IEOR), and other departments. This education committee discussed and developed the curriculum for the MS in data science program. Twenty-one credits of the program are core required classes, and nine credits are electives. The core required classes include three courses from statistics (two foundational courses in probability and statistics, one course on exploratory data analysis and visualization), three courses from CS (algorithms, machine learning, and computer systems), and a capstone course that includes a curriculum component in data ethics.
Prerequisites for admission include mathematical preparation and some familiarity with computing. Prior industry experience is valued during the admission process, but not required. During the program, students with prior coursework in statistics or CS can be granted waivers for some of the core required courses to provide the flexibility to take more electives.
As the DSI emphasizes interdisciplinary research and collaboration, we provide students with the flexibility to look across campus at domain areas to fulfill elective requirements. In addition to taking advanced coursework in CS, statistics, and math, DSI students have taken technical classes in business, law, journalism, architecture, bioinformatics, and various departments throughout the university.
Students often take advantage of the many research opportunities across campus to gain additional hands-on experience, which can be used for elective credit. Many students will intern during the summer. DSI offers career support to obtain internships, including hosting a DSI internship fair in the spring.
MS students are required to complete a capstone project during their final semester. This course provides a unique opportunity for students in the MS in data science program to apply their knowledge of the foundations, theory, and methods of data science to address data science problems in industry, government, and the nonprofit sector. The course activities focus on a semester-length data science project sponsored by a faculty member, nonprofit organization, or industry affiliate of DSI. The project synthesizes the statistical, computational, and engineering challenges and the social issues involved in solving complex real-world problems.
Data ethics is embedded in our curriculum as discussions in individual courses and a more focused mini-curriculum in the capstone course.
Data science is emerging as a vital intellectual discipline driven by the increasing demand in all sectors for skilled practitioners who can extract value from today’s data. Because the field is highly interdisciplinary, aspiring students need training in computer science, statistics, and optimization algorithms to become data scientists who can solve applied problems around understanding, exploring, and forming predictions from data.
The Columbia University MS program in data science aims to prepare a workforce of data scientists for careers in this rising field. Graduates of this program will pursue careers as data scientists, analysts, and researchers across all sectors.
Our program attracts students from a diverse pool. While the majority of applicants have an engineering or technical background, about 21 percent of the fall 2018 applicants earned a degree in math or statistics and 19 percent hold degrees in nontechnical disciplines, including biology, business, economics, law, medicine, philosophy, physics, psychology, religious studies, and urban planning.
The fall 2018 admissions cycle had 1,624 applications with a 17 percent acceptance rate. Of the 174 MS students who enrolled this fall, 24 percent are US citizens or permanent residents and 34 percent are female. Our international students come from 16 countries, including China, India, France, South Korea, Mexico, and Thailand.
Statistics is a foundational area for data science that provides theory and methods for understanding variation and trends in observed data and deriving inferential insights about the data-generating mechanism behind the data.
It is essential for drawing interpretable inferences and predictions based on statistical models and machine learning methods and for addressing the biases and uncertainty in a data science application.
Statistics complements other areas of data science, such as machine learning and optimization, which provide the algorithmic and mathematical tools that enable the statistical methodologies, as well as nonstatistical models, for data science applications.
Subjects that may not have traditionally been in the purview of classical statistics, such as computational complexity, have become active research areas of statistics, in part due to increased interactions with other data science disciplines.
The Data Science Institute programs prepare graduates for roles throughout the data science lifecycle of a company. Our graduates have placed in roles such as data scientist, data engineer, data strategist, software engineer on a machine learning team, machine learning engineer, strategic consultant, and quant analyst.
The advanced technical and statistical training our students receive prepares them well for companies in need of big data support in every industry and throughout the world.
DSI grads are contributing to recommendation engines at large tech companies, detecting fraud and inappropriate content at social media companies, mapping the needs of underserved neighborhoods using Twitter data, creating new investing strategies at finance firms, managing algorithms for post-disaster response for large cities, creating fraud detection software, and solving many other corporate and societal challenges.
At Columbia, we believe data science should touch all fields, professions, and sectors. We consider applications from students of all academic backgrounds as long as they are motivated to learn data science and well-prepared in math and computing. Applicants can demonstrate this preparation through non-traditional routes such as non-degree courses, work experience, or research experience. Our program’s core ensures rigorous training in data science, while our personalized advising model provides flexibility to support different learning trajectories. Students from fields other than CS, statistics, and IEOR are all welcome to inquire and apply.
Our program’s core provides students with a set of skills that overlaps with programs such as CS, statistics, or IEOR but has its own distinct flavor. Every year, there are more jobs in a variety of areas that require the distinct blend of skills emphasized in our data science program.
For future data scientists who are considering a data science degree, our advice is to look for programs that are well grounded in the foundations of data science (including statistics), provide experiences with real-world data science applications, and have data ethics embedded in the curriculum.
Employer demand for data science graduates is high, and these graduates are critical to the success of evolving businesses. Every industry, including finance, tech, health care, media, government, and nonprofits, is growing its data science talent pool.
DSI graduates fill roles that fall within the data lifecycle of a company. In the more than four years since our academic programs launched, more than 500 companies have recruited directly from them, and 98 percent of our students have placed in the field, demonstrating the high demand for our graduates.
Data science is a “team sport.” It takes substantial collaboration to create an interdisciplinary program in data science. Institutions should create incentives for departments and individual faculty and provide resources for the program to support such a collaboration.
For example, administratively, academic programs may need to be hosted in an academic school or department. Having an interdepartmental program housed in a single-discipline department places an additional burden on the host department and creates different “classes” of students within the same department or shared space that compete for limited resources.
At Columbia, although the MS in data science program is administratively hosted in the CS department, the DSI serves as the primary operating unit. This provides our students undivided support for their academic life on campus, ranging from advising and collaborative space to career development.