Course Info
Below are some common required courses for students in our Master’s in Data Science (MSDS) and PhD in Data Science. To confirm which specific courses are required for each degree, be sure to review our MSDS and PhD program overview pages.
DATA 30100: Introduction to Data Science
The course will focus on the analysis of real-life data and on statistical and machine learning methods to perform inference and to predict future outcomes. It will cover topics from the whole data life cycle, ranging from data collection (including wrangling, cleaning, and sampling) to summarizing results through visualization and interpretable summaries, with a focus on extracting meaning, value, and information from data. Important considerations in building algorithms and predictive models, such as bias, fairness, and privacy, will also be explored.
DATA 31500: Data Interaction
This course provides core knowledge and technical skills around data interfaces, with an emphasis on visualization and front-end software development. Graduate students in Data Science and Computer Science will engage in project-based learning to become fluent with visualization APIs, computational notebooks, web development, technical writing, and presentation. Topics of interest include data visualization design, spatial and visual reasoning, cartography, interactive articles, data storytelling, data-driven persuasion, uncertainty communication, and model interpretability.
DATA 34100: Introduction to Data Systems & Data Design
The goal of this course is to teach students: (1) how to think about data, its logical semantics, and what a query is; (2) how to practically handle data, both in relational databases and in other, more flexible data processing frameworks (e.g., Spark); (3) practical design principles for schemas, integrity constraints, etc.; and (4) an introduction to systems that allows students to understand performance and helps them become better users.
DATA 34200: Data Engineering & Scalable Computing
This course covers the principles and practices of managing and processing data at scale. Students will learn about distributed systems, cloud computing, and big data technologies. Topics include data storage architectures, data catalogs and governance, distributed computing frameworks like Apache Spark, streaming data processing, and data transformation pipelines. The course will provide hands-on experience with state-of-the-art tools and techniques for building end-to-end data engineering solutions to support large-scale data science, analytics and AI applications.
DATA 35900: Responsible Use of Data & Algorithms
The goal of this course is to cultivate a societally oriented mindset and to train students to think critically about the contexts into which data science is deployed. It will be organized around a series of modules consisting of three components: (i) a broad challenge, (ii) mathematical and technical approaches that have been used to address that challenge, and (iii) a real-world case study. The modules will cover a diverse set of topics, including, for example: disclosure avoidance (i.e., privacy as in differential privacy); algorithmic fairness; decision making in dynamic and strategic settings; biases in machine learning (e.g., word embeddings or facial recognition); data-driven policymaking; explainable and interpretable AI; and robustness to adversarial behavior.
DATA 37000: Introduction to Machine Learning & Neural Networks
This course is an introduction to machine learning (ML) for students to build a solid foundation in modeling and data science. It will cover both unsupervised and supervised ML algorithms, with the latter focusing on both regression and classification models. Python is the programming language of choice for implementing various models to solve complex problems across multiple domains. The course will also introduce basic neural network architectures, including Single-Layer Perceptrons (SLPs), Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs). Students will apply these techniques in contexts where they are most effective. A strong understanding of linear algebra, multivariable calculus, and statistics/probability theory is expected. Python coding assignments and projects will be integral to the course.
DATA 37711: Foundations of Machine Learning & AI – Part I
This course is an introduction to machine learning targeted at students who want a deep understanding of the subject. Topics include modern approaches to supervised learning, unsupervised learning, and the use of machine learning in estimating real-world effects. In principle, no previous exposure to machine learning is required. However, students are expected to have mathematical maturity at the level of an advanced undergraduate, including being comfortable with linear algebra, multivariate calculus, and (non-measure-theoretic) statistics and probability. Assignments include programming in Python (and PyTorch).
DATA 37712: Foundations of Machine Learning & AI – Part II
Deep generative models have become a staple of modern machine learning research. This course is meant as an introduction to the way generative models are structured and trained: students will learn the mechanics of generative models and get their hands dirty building them. We will discuss open questions for which we lack complete theoretical or empirical answers, with importance placed on analyzing, interpreting, and making arguments from necessarily incomplete empirical evidence. We will have a specific focus on Autoregressive Transformers and their use as Large Language Models (LLMs), but will also touch on Diffusion Models and Reinforcement Learning. The goal of this course is to make students proficient enough with the inner workings of deep generative models, along with the theoretical and empirical support for their design, to understand and reason about cutting-edge research. This is an advanced machine learning course and assumes familiarity with basic machine learning concepts (generalization, overfitting, etc.) and techniques (regularization, stochastic gradient descent, etc.).
To view additional course options within our Data Science catalog as well as other adjacent program catalogs, click on the individual links below: