Master of Science in Data Science

The Master of Science in Data Science allows students to apply data science techniques to their field of interest, building on four foundational courses offered in our Certification of Professional Achievement in Data Sciences program. Our students have the opportunity to conduct original research as part of a capstone project and to interact with our industry partners and faculty. Students may also choose an elective track focused on entrepreneurship or a subject area covered by one of our six centers.

ELIGIBILITY REQUIREMENTS

  • Undergraduate degree
  • Prior quantitative coursework (calculus, linear algebra, etc.)
  • Prior introductory computer programming coursework

WHO SHOULD APPLY?

Individuals looking to strengthen their career prospects or make a career change by developing in-depth expertise in data science.

APPLICATION REQUIREMENTS

We routinely offer online information sessions and other recruiting events; for details, please [Click Here]. To learn more about the admissions application requirements and to submit your application, please visit the Office of Graduate Student Affairs.

DEADLINE

Applications are currently accepted for fall admission only. (We do not have a spring admission cycle.)
The priority deadline for Fall 2016 application submission is February 15th.  [Apply Here]

CURRICULUM

Candidates for the Master of Science in Data Science are required to complete a minimum of 30 credits, including 21 credits of required/core courses and 9 credits of electives. This program may be pursued part-time or full-time.

For the most up-to-date course offerings and schedule information, refer to COURSES.

REQUIRED/CORE COURSES:

STAT W4105 PROBABILITY
Prerequisites: MATH V1101 Calculus I and V1102 Calculus II or the equivalent.
A calculus-based introduction to probability theory. Topics covered include random variables, conditional probability, expectation, independence, Bayes' rule, important distributions, joint distributions, moment generating functions, central limit theorem, laws of large numbers and Markov's inequality.

CSOR W4246 ALGORITHMS FOR DATA SCIENCE
Prerequisites: basic knowledge in programming (e.g., at the level of COMS W1007), a basic grounding in calculus and linear algebra.
Methods for organizing data, e.g. hashing, trees, queues, lists, priority queues. Streaming algorithms for computing statistics on the data. Sorting and searching. Basic graph models and algorithms for searching, shortest paths, and matching. Dynamic programming. Linear and convex programming. Floating-point arithmetic, stability of numerical algorithms, eigenvalues, singular values, PCA, gradient descent, stochastic gradient descent, and block coordinate descent. Conjugate gradient, Newton and quasi-Newton methods. Large-scale applications from signal processing, collaborative filtering, recommendation systems, etc.

STAT W4702 STATISTICAL INFERENCE AND MODELING
Prerequisites: Working knowledge of calculus and linear algebra (vectors and matrices), and STAT W4105 Probability or equivalent.
In this course, we will systematically cover the fundamentals of statistical inference and testing, and give an introduction to statistical modeling. The first half of the course will focus on inference and testing, covering topics such as maximum likelihood estimates, hypothesis testing, likelihood ratio tests, Bayesian inference, etc. The second half of the course will provide an introduction to statistical modeling via introductory lectures on linear regression models, generalized linear regression models, nonparametric regression, and statistical computing. Throughout the course, real-data examples will be used in lecture discussion and homework problems. This course lays the foundation, preparing M.S. in Data Science students for other courses in machine learning, data mining and visualization.

COMS W4121 COMPUTER SYSTEMS FOR DATA SCIENCE
Prerequisites: Background in Computer System Organization and good working knowledge of C/C++. Corequisites: CSOR W4246 Algorithms for Data Science, STAT W4105 Probability, or equivalent as approved by faculty advisor.
An introduction to computer architecture and distributed systems with an emphasis on warehouse scale computing systems. Topics will include fundamental tradeoffs in computer systems, hardware and software techniques for exploiting instruction-level parallelism, data-level parallelism and task level parallelism, scheduling, caching, prefetching, network and memory architecture, latency and throughput optimizations, specialization, and an introduction to programming data center computers.

COMS W4721 MACHINE LEARNING FOR DATA SCIENCE
Prerequisites: Background in linear algebra and probability and statistics.
An introduction to machine learning, with an emphasis on data science. Topics will include least squares methods, Gaussian distributions, linear classification, linear regression, maximum likelihood, exponential family distributions, Bayesian networks, Bayesian inference, mixture models, the EM algorithm, graphical models, hidden Markov models, support vector machines, and kernel methods. Part of the course will focus on methods and problems relevant to big data.

STAT W4701 EXPLORATORY DATA ANALYSIS AND VISUALIZATION
Prerequisite: programming.
Fundamentals of data visualization, layered grammar of graphics, perception of discrete and continuous variables, introduction to Mondrian, mosaic plots, parallel coordinate plots, introduction to GGobi, linked plots, brushing, dynamic graphics, model visualization, clustering and classification.

ENGI E4800 DATA SCIENCE CAPSTONE AND ETHICS
Prerequisites: CSOR W4246 Algorithms for Data Science, STAT W4105 Probability, COMS W4121 Computer Systems for Data Science, or equivalent as approved by faculty advisor. Corequisites, to be completed alongside or after: STAT W4702 Statistical Inference and Modeling, COMS W4721 Machine Learning for Data Science, STAT W4701 Exploratory Data Analysis and Visualization, or equivalent as approved by faculty advisor.
This course provides a unique opportunity for students in the M.S. in Data Science program to apply their knowledge of the foundations, theory and methods of data science to address data science problems in industry, government and the non-profit sector. The course activities focus on a semester-length data science project sponsored by a faculty member or local organization. The project synthesizes the statistical, computational, and engineering challenges and the social issues involved in solving complex real-world problems.

ELECTIVES:

Nine (9) credits of elective courses should be drawn from existing graduate-level courses at Columbia University. In addition to advisor approval, elective course selection will be subject to course prerequisites, course availability, and the cross-registration procedures of the school/department offering the requested courses.

COMS E6910x and y FIELDWORK
1 pt. Members of the faculty.
Prerequisites: Obtained internship and approval from Professor Eleni Drinea. Only for M.S. students in the Computer Science Department (and Data Science Institute) who need relevant work experience as part of their program of study. Final report required. This course may not be taken for pass/fail credit or audited. For more information visit http://www.cs.columbia.edu/education/ms/cpt.

SUMA K4360 SUSTAINABILITY TECHNOLOGY AND THE EVOLUTION OF SMART CITIES
3 pts. Professor Gregory Falco. Syllabus.
This course is offered through the School of Continuing Education. The progress of sustainability in recent years has almost entirely been a result of the evolution of smart, sustainable technology solutions. This course examines opportunities to drive sustainability through technology applications, with the end goal of fitting all of the pieces together to envision an intelligent city. Companies are increasingly turning to technology to fulfill their sustainability goals, considering that many technologies provide off-the-shelf, cost-effective and immediate savings compared to operationally invasive, resource-heavy sustainability transformation programs. Sustainability technology ranges from intelligent infrastructure to mobile applications that help to drive the "sharing economy". The course will provide an overview of the sustainability technologies that large corporations are actively pursuing and delve into the project management and integration strategies required to implement these solutions. Successful sustainability practitioners must have a strong understanding not only of the values and methodologies of sustainable operations, but also of the tools and technologies available to drive sustainability throughout their organization. Upon completion of the class, students will have a sufficient level of understanding to discuss these solutions and relevant case studies with potential employers. This course will benefit anyone interested in a career in sustainability or in smart cities, as it will provide them with the skills and analytical capabilities to determine which sustainability technologies are a good fit for their company's sustainability and growth strategy.

TUITION AND FEES

Students enrolled in the Master of Science program pay Columbia Engineering's rate of tuition, $1,782 per credit for the 2015-2016 academic year. Tuition and fees are prescribed by statute and are subject to change at the discretion of the Trustees. For more information on rates of tuition and other applicable fees, refer to Student Financial Services.

QUESTIONS

If you would like to learn more, please refer to our Frequently Asked Questions or sign up for one of our regularly scheduled online information sessions.


550 W. 120th St., Northwest Corner 1401, New York, NY 10027    212-854-5660
©2016 Columbia University