The Certification of Professional Achievement in Data Sciences prepares students to expand their career prospects or change career paths by developing foundational data science skills.
Admission requirements:
- Undergraduate degree
- Prior quantitative coursework (e.g., calculus, linear algebra)
- Prior introductory computer programming coursework
- Uploaded transcripts from every post-secondary institution attended
- Three recommendation letters
- Personal statement
- Curriculum vitae / resumé
- $85 non-refundable application fee
To learn more about the admissions application requirements, please visit the Office of Graduate Student Affairs.
Applications are currently accepted for fall admission only. The priority deadline for Fall 2015 application submissions is February 15.
Candidates for the Certification of Professional Achievement in Data Sciences, a non-degree part-time program, are required to complete a minimum of 12 credits, including four required courses:
For the most up-to-date course offerings and schedule information, refer to COURSES.
Three of the four required Certification courses (all except STAT W4700) may be eligible for advanced standing toward the Master of Science in Data Science program upon admission to that program. Because Columbia University's policy prohibits double counting coursework between programs, Certification students admitted to and enrolled in the Master of Science program will forgo their Certification so that these courses can count toward their Master of Science.
CSOR W4246 ALGORITHMS FOR DATA SCIENCE
Prerequisites: basic knowledge in programming (e.g., at the level of COMS W1007), a basic grounding in calculus and linear algebra.
Methods for organizing data, e.g., hashing, trees, queues, lists, priority queues. Streaming algorithms for computing statistics on the data. Sorting and searching. Basic graph models and algorithms for searching, shortest paths, and matching. Dynamic programming. Linear and convex programming. Floating-point arithmetic, stability of numerical algorithms, eigenvalues, singular values, PCA, gradient descent, stochastic gradient descent, and block coordinate descent. Conjugate gradient, Newton and quasi-Newton methods. Large-scale applications from signal processing, collaborative filtering, recommendation systems, etc.
STAT W4700 PROBABILITY AND STATISTICS
This course covers the following topics: fundamentals of probability theory and statistical inference used in data science; probabilistic models, random variables, useful distributions, expectations, the law of large numbers, and the central limit theorem; statistical inference: point and confidence interval estimation, hypothesis tests, and linear regression.
COMS W4721 MACHINE LEARNING FOR DATA SCIENCE
Prerequisites: Background in linear algebra and probability and statistics.
An introduction to machine learning, with an emphasis on data science. Topics will include least squares methods, Gaussian distributions, linear classification, linear regression, maximum likelihood, exponential family distributions, Bayesian networks, Bayesian inference, mixture models, the EM algorithm, graphical models, hidden Markov models, support vector machines, and kernel methods. Part of the course will focus on methods relevant to big data problems.
STAT W4701 EXPLORATORY DATA ANALYSIS AND VISUALIZATION
Fundamentals of data visualization, layered grammar of graphics, perception of discrete and continuous variables, introduction to Mondrian, mosaic plots, parallel coordinate plots, introduction to GGobi, linked plots, brushing, dynamic graphics, model visualization, clustering and classification.