The following is a list of data science-related courses. Please refer to the Directory of Courses for the most current course offerings and information.

Statistics & Computer Science

STCS GR5705 (formerly STAT W4242)

Introduction to Data Science

Professor Tian Zheng (Syllabus)

Data Science is a dynamic and fast-growing field at the interface of Statistics and Computer Science. The emergence of massive datasets containing millions or even billions of observations provides the primary impetus for the field. Such datasets arise, for instance, in large-scale retailing, telecommunications, astronomy, and internet social media. This course will emphasize practical techniques for working with large-scale data. Specific topics covered will include statistical modeling and machine learning, data pipelines, programming languages, "big data" tools, and real-world topics and case studies. The use of statistical and data manipulation software will be required. This course is intended for students in non-quantitative graduate-level disciplines. It will not count towards degree requirements for graduate programs such as Statistics, Computer Science, or Data Science. Students should inquire with their respective programs to determine whether the course counts towards minimum degree requirements. This course does not fulfill any major requirements for undergraduate degree programs offered by Computer Science.

Fall 2016 Semester: 3 credits
Sec 001 Tue 4:10PM-6:40PM; Thu 2:40PM-3:55PM Call #60780

Computer Science

COMS W4121

Computer Systems for Data Science

Professors Roxana Geambasu, Eugene Wu and Sambit Sahu.

Prerequisites: Background in Computer System Organization and good working knowledge of C/C++. Corequisites: CSOR W4246 Algorithms for Data Science, STAT W4203 Probability Theory, or equivalent as approved by faculty advisor.

An introduction to computer architecture and distributed systems with an emphasis on warehouse-scale computing systems. Topics will include fundamental tradeoffs in computer systems, hardware and software techniques for exploiting instruction-level parallelism, data-level parallelism, and task-level parallelism, scheduling, caching, prefetching, network and memory architecture, latency and throughput optimizations, specialization, and an introduction to programming data center computers.

Spring Semester: 3 credits
Sec 001 Mon/Wed 7:40PM-8:55PM Call #19266

COMS W4776

Machine Learning for Data Science

Professor Daniel Hsu (Archived Syllabus)

Prerequisites: Background in linear algebra and probability and statistics.

An introduction to machine learning, with an emphasis on data science. Topics will include least squares methods, Gaussian distributions, linear classification, linear regression, maximum likelihood, exponential family distributions, Bayesian networks, Bayesian inference, mixture models, the EM algorithm, graphical models, hidden Markov models, support vector machines, and kernel methods. Part of the course will focus on methods and problems relevant to big data. Students may not receive credit for both COMS W4771 and W4776.

Spring Semester: 3 credits
Sec 001 Tue/Thu 6:10PM-7:25PM Call #73980

CSOR W4246

Algorithms for Data Science

Professor Eleni Drinea 

Prerequisites: Basic knowledge of programming (e.g., at the level of COMS W1007) and a basic grounding in calculus and linear algebra.

Methods for organizing data, e.g. hashing, trees, queues, lists, priority queues. Streaming algorithms for computing statistics on the data. Sorting and searching. Basic graph models and algorithms for searching, shortest paths, and matching. Dynamic programming. Linear and convex programming. Floating-point arithmetic, stability of numerical algorithms, eigenvalues, singular values, PCA, gradient descent, stochastic gradient descent, and block coordinate descent. Conjugate gradient, Newton and quasi-Newton methods. Large-scale applications from signal processing, collaborative filtering, recommendation systems, etc.

Fall 2016 Semester: 3 credits
Sec 001 Tue/Thu 6:10PM-7:25PM Call #23325
Sec 002 Tue/Thu 7:40PM-8:55PM Call #20955



Probability Theory

Professor Banu Baydil (Syllabus)

Prerequisites: MATH V1101 Calculus I and V1102 Calculus II or the equivalent.

A calculus-based introduction to probability theory. Topics covered include random variables, conditional probability, expectation, independence, Bayes' rule, important distributions, joint distributions, moment generating functions, central limit theorem, laws of large numbers and Markov's inequality.

Fall 2016 Semester: 3 credits
Sec 006 Mon/Wed 6:10PM-8:55PM Call #87194
*Meets 9/6-10/24


Probability & Statistics for Data Science

Professor Banu Baydil

Prerequisite: Calculus

This course covers the fundamentals of probability theory and statistical inference used in data science: probabilistic models, random variables, useful distributions, expectations, the law of large numbers, and the central limit theorem; and statistical inference, including point and confidence interval estimation, hypothesis tests, and linear regression.

Fall 2016 Semester: 3 credits
Sec 001 Tue/Thu 6:10PM-7:25PM Call #18737

STAT W4701

Exploratory Data Analysis & Visualization

Instructor to be announced.

Prerequisite: programming.

Fundamentals of data visualization, layered grammar of graphics, perception of discrete and continuous variables, introduction to Mondrian, mosaic plots, parallel coordinate plots, introduction to GGobi, linked plots, brushing, dynamic graphics, model visualization, clustering and classification.

Spring 2016 Semester: 3 credits
Sec 001 Tue/Thu 7:40PM-8:55PM Call #63823


Statistical Inference & Modeling

Professor Banu Baydil

Prerequisites: Working knowledge of calculus and linear algebra (vectors and matrices), and STAT GR5203 or equivalent.

This course covers the fundamentals of statistical inference and testing, and gives an introduction to statistical modeling. The first half of the course will focus on inference and testing, covering topics such as maximum likelihood estimates, hypothesis testing, likelihood ratio tests, and Bayesian inference. The second half of the course will provide an introduction to statistical modeling via introductory lectures on linear regression models, generalized linear regression models, nonparametric regression, and statistical computing. Throughout the course, real-data examples will be used in lecture discussion and homework problems.

Fall 2016 Semester: 3 credits
Sec 001 Mon/Wed 6:10PM-8:55PM Call #13298
*Meets 10/26-12/12


BINF G4006

Translational Bioinformatics

Professor Nicholas P. Tatonetti (Syllabus)

Prerequisites: Familiarity with programming in either Python or R. Basic Probability.

Methods in biomedical data science (i.e., translational bioinformatics) for graduate students and upperclassmen. Students study the statistical and computational algorithms used to evaluate large biomedical data, including sequence analysis, application of supervised and unsupervised machine learning, graph-theoretic models and network analysis, and chemical informatics. They study how to apply these algorithms to biomedical domains in non-human genetics, human genetics, pharmacology, and public health. Successful completion of the course readies the student for graduate-level research in translational bioinformatics.

Fall 2016 Semester: 3 credits
Sec 001 Mon/Wed 2:00PM-3:30PM Call #11004

EECS E6893

Topics in Information Processing: Big Data Analytics

Professor Ching-Yung Lin (Syllabus)

Prerequisites: Knowledge of one or more programming languages: C, C++, Java, Perl, Python, and/or JavaScript.

With the advance of IT storage, processing, computation, and sensing technologies, Big Data has become a new norm of life. Only recently have computers become able to capture and analyze all sorts of large-scale data from all kinds of fields: people, behavior, information, devices, sensors, biological signals, finance, vehicles, astronomy, neurology, etc. Almost all industries are bracing for the challenge of Big Data and want to extract valuable information and gain insight to solve their problems. This course provides the fundamental knowledge to equip students to handle those challenges. The discipline inherently involves many fields. Because of its importance and broad impact, new software and hardware tools and algorithms are quickly emerging. A data scientist needs to keep up with these ever-changing trends to be able to create state-of-the-art solutions for real-world challenges.

Fall 2016 Semester: 3 credits
Sec 001 Thu 7:00PM-9:30PM Call #12527

SUMA K4360

Sustainability Technology and the Evolution of Smart Cities

Professor Gregory Falco (Syllabus)

This course is offered through the School of Continuing Education. The progress of sustainability in recent years has almost entirely been a result of the evolution of smart, sustainable technology solutions. This course examines opportunities to drive sustainability through technology applications, with the end goal of bringing all of the pieces together to envision an intelligent city. Companies are increasingly turning to technology to fulfill their sustainability goals because many technologies provide off-the-shelf, cost-effective and immediate savings compared to operationally invasive, resource-heavy sustainability transformation programs. Sustainability technology ranges from intelligent infrastructure to mobile applications that help to drive the "sharing economy". The course will provide an overview of the sustainability technologies that large corporations are actively pursuing and delve into the project management and integration strategies required to implement these solutions. Successful sustainability practitioners must have a strong understanding not only of the values and methodologies of sustainable operations, but also of the tools and technologies available to drive sustainability throughout their organization. Upon completion of the class, students will have a sufficient level of understanding to discuss these solutions and relevant case studies with potential employers. This course will benefit anyone interested in a career in sustainability or in smart cities, as it will provide them with the skills and analytical capabilities to determine which sustainability technologies are a good fit for their company's sustainability and growth strategy.

Summer 2016 Semester: 3 credits
Sec 001 Thu 6:10PM-8:00PM Call #96246

Capstone Project

ENGI E4800

Data Science Capstone & Ethics

Professors Eleni Drinea and Owen Rambow

Prerequisites: CSOR W4246 Algorithms for Data Science, STAT W4105 Probability, COMS W4121 Computer Systems for Data Science, or equivalent as approved by faculty advisor. Corequisites (to be completed alongside or after): STAT W4702 Statistical Inference and Modeling, COMS W4721 Machine Learning for Data Science, STAT W4701 Exploratory Data Analysis and Visualization, or equivalent as approved by faculty advisor.

This course provides a unique opportunity for students in the M.S. in Data Science program to apply their knowledge of the foundations, theory and methods of data science to address data science problems in industry, government and the non-profit sector. The course activities focus on a semester-length data science project sponsored by a faculty member or local organization. The project synthesizes the statistical, computational, and engineering challenges and the social issues involved in solving complex real-world problems.

Fall 2016 Semester: 3 credits
Sec 001 Mon 5:40PM-8:30PM Call #76011

550 W. 120th St., Northwest Corner 1401, New York, NY 10027    212-854-5660
©2017 Columbia University