Courses

The M.S. in Data Science & Analytics program curriculum provides data science and analytics fundamentals, with a robust selection of elective courses so you can create your own path.

Current course offerings

Course schedules are subject to change before each semester begins. Please review the Academic Calendar for key dates.

Fall 2026

DSAN 5000: Introduction to Data Analytics (Core)
DSAN 5100: Probabilistic Modeling and Statistical Computing (Core)
DSAN 5400: Computational Linguistics – Advanced Python (Elective)
DSAN 5550: Data Science and Climate Change (Elective)
DSAN 6000: Big Data and Cloud Computing (Core)
DSAN 6150: Biological and Biomedical Data Science (Elective)
DSAN 6300: Database Systems and SQL (Elective)
DSAN 6600: Neural Nets and Deep Learning (Elective)
DSAN 6650: Reinforcement Learning (Elective)
DSAN 6700: Machine Learning App Deployment (Elective)
DSAN 7000: Capstone Project (Elective)

Spring 2027

DSAN 5200: Advanced Data Visualization (Core)
DSAN 5300: Statistical Learning (Core)
DSAN 5400: Computational Linguistics – Advanced Python (Elective)
DSAN 5450: Data Ethics and Policy (Elective)
DSAN 5500: Data Structures, Objects, and Algorithms in Python (Elective)
DSAN 5550: Data Science and Climate Change (Elective)
DSAN 5900: Digital Storytelling (Elective)
DSAN 6500: Computer Vision Analytics & Generative Image Modeling (Elective)
DSAN 6550: Adaptive Measurement (Elective)
DSAN 6600: Neural Networks and Deep Learning (Elective)
DSAN 6725: Applied Generative AI for AI Developers (Elective)

Summer 2027

DSAN 6400: Network Analytics (Elective)
DSAN 5650: Causal Inference for Computational Social Science (Elective)

Core courses

Pre-Program Bootcamp: Programming in R and Python

The Georgetown Data Science & Analytics program offers an online course in programming preparation that covers R, Python and command-line use in the summer prior to matriculation. The course is:

Required for all incoming students
Equivalent to three credits
Designed for matriculating M.S. Data Science & Analytics students
Offered free of charge

This online course will run during the Georgetown Summer Session (May – August). It is tailored to prepare you for your first-semester courses, and the practice you gain is essential for success.

DSAN 5000: Data Science and Analytics

3 credits | Offered in the Fall semester

This course introduces you to several core data science concepts. It teaches you how to synthesize disparate, possibly unstructured data to better understand and characterize the world, and in some cases, to draw meaningful inferences. Topics covered include:

The history of data science
Successes and failures in data analytics
The data analytics life cycle
Data/web scraping and APIs
Data wrangling
Data characterization (correlations, identifying clusters and associations)
Data inference
Basic machine learning, network analysis, data ethics and visual analytics

You will work on a semester-long data science project that starts with question formulation and data collection, and goes through all the stages of the life cycle, culminating in data storytelling. The course also maps data science case studies to topics presented throughout the semester.

Prerequisites: Intermediate coding experience in Python and knowledge of introductory statistics

DSAN 5100: Probabilistic Modeling and Statistical Computing

3 credits | Offered in the Fall semester

This course introduces the fundamentals of probabilistic modeling and statistical inference for data science. Students begin with probability, conditional probability, and common discrete and continuous distributions, then extend to joint, marginal, and conditional distributions in the multivariate setting, including the multinomial, bivariate normal, and multivariate normal. The course then explores stochastic processes (Markov chains and hidden Markov models), the laws of large numbers and central limit theorem, and estimation techniques such as the method of moments, maximum likelihood, and Bayesian approaches. Additional topics include interval estimation, hypothesis testing, ANOVA and MANOVA, non-parametric methods, and an introduction to time series analysis with stationarity, model selection, and forecasting.

As a capstone to the course, you will complete a final project that applies probabilistic modeling methods to a real-world research question using authentic data. The project emphasizes model selection, inference, and interpretation, and gives you the opportunity to connect course concepts to practical analysis.

Prerequisites: Introductory statistics, some coding experience (e.g., R)

DSAN 5200: Advanced Data Visualization

3 credits | Offered in the Spring semester

Presenting quantitative information in visual form is a core skill for data professionals, enabling clear communication of patterns, relationships, and insights in complex datasets. This course introduces principles and practices of data representation and visualization, drawing on insights from cognitive science and graphic design. Students will begin with an overview of the human visual system and perceptual processes, then learn to apply models for both data and images alongside design frameworks such as the “grammar of graphics.” Emphasis is placed on developing good design habits that balance accuracy, clarity, and aesthetic appeal.

The course explores both static and dynamic visualization techniques. Students will learn the fundamentals of static visualization for creating clear, publication-quality figures and statistical graphics, while also examining dynamic and interactive methods that allow users to explore data in real time. Dynamic approaches, typically implemented with JavaScript libraries, provide opportunities for storytelling, dashboards, and immersive experiences beyond the limitations of static graphics. By comparing and contrasting static and dynamic methods, students will gain an understanding of when each approach is most effective.

Practical skills are developed through hands-on work with common visualization tools across several ecosystems: static statistical graphics in Python (Matplotlib, Seaborn), interactive visualization with libraries such as Bokeh, Leaflet, and NetworkD3, the R package ggplot2, and industry tools like Tableau. By the end of the course, students will be able to design and implement effective visualizations for diverse audiences and contexts, ranging from academic publications to interactive data products and professional presentations.

Prerequisites: DSAN 5000

DSAN 5300: Statistical Learning

3 credits | Offered in the Spring semester

Statistical Learning provides a modern introduction to methods for both supervised and unsupervised learning. Core supervised techniques include linear and logistic regression, resampling and cross-validation, model selection, and regularization methods such as ridge and lasso. The course also covers support vector machines, discriminant analysis, and introductory neural networks with practical deep learning workflows. On the unsupervised side, students learn clustering methods and dimension reduction techniques such as Principal Component Analysis (PCA), which are especially valuable in the analysis of high-dimensional data. Additional advanced topics include survival analysis, symbolic regression, Gaussian processes, and causal inference.

The course culminates in a final poster project where students work on a real-world research question using real data. This project provides hands-on experience in model building, evaluation, and interpretation, giving students the opportunity to demonstrate their ability to analyze complex, high-dimensional data while showcasing their skills in predictive modeling and data-driven storytelling.

Prerequisites: DSAN 5100

DSAN 6000: Big Data and Cloud Computing

3 credits | Offered in the Fall semester

Data is everywhere, and often it’s simply too large or complex for traditional tools to handle. This hands-on, workshop-style course introduces the principles and practice of big data analytics and cloud computing. Students learn how to work with distributed computing frameworks such as Apache Spark, along with modern tools like DuckDB, Polars, and vector databases, and cloud platforms including AWS. Key topics include parallelization and concurrency, data warehousing, scalable machine learning with Spark MLlib, streaming analytics, and serverless data engineering. The course emphasizes end-to-end workflows—from data ingestion and cleaning to analysis, modeling, and presentation—using Python (PySpark), SQL, and Git/GitHub.

A major focus is on practical application: by the end of the course, students will be able to set up and manage cloud environments, process massive datasets, build scalable pipelines, and apply distributed machine learning methods. The course culminates in a final project where students execute a full big data workflow on real-world datasets using cloud resources.

Prerequisites: DSAN 5000, working knowledge of Python and the Unix command line, and some knowledge of data structures

Elective courses

Please note that electives offered are subject to change.

DSAN 5400: Computational Linguistics – Advanced Python

3 credits | Offered in both Fall and Spring semesters

This course presents topics in Natural Language Processing (NLP) and Python programming. The goal of this class is to explore techniques in NLP, with a strong emphasis on hands-on instruction that progressively matures basic Python users into expert Python developers. We will examine topics such as text classification, model evaluation, machine translation and distributed representations. Throughout the semester, you will select and read a book on AI ethics to motivate discussions on the social impact of modern NLP technologies. Applications include authorship identification, retrieval and textual similarity, to name a few.

About half of the total class time is devoted to addressing an essential but often neglected piece in software development education: moving from typical data science programming workflows (such as writing basic scripts) to developing sophisticated Python projects. In other words, you will learn to design professional-grade software that you and others will be proud to contribute to together. Programming topics are explored in great depth, including Python best practices, object-oriented design, project structuring and more. This class will give you the skills you need to contribute to the professional software repositories you work with already and even develop your own.

DSAN 5450: Data Ethics and Policy

3 credits | Offered in the Spring semester

This graduate-level course will train you to navigate the landscape of ethical issues which inevitably arise across a variety of fields and industries, in each step of the data science process. You will explore and critically evaluate a range of data-related issues in contemporary society, such as responsible data collection, algorithmic bias, privacy, transparency, accountability, democratic participation in data usage and data-driven decisions and the ethical implications of emerging technologies like artificial intelligence and machine learning (self-driving cars, ChatGPT, crowd-sourced training data, etc.).

Through a combination of theoretical discussions and real-world case studies, you will examine the profound public policy and social issues associated with data ethics. This course will empower you by introducing a set of general ethical frameworks (consequentialism, deontological ethics and virtue ethics) and discussing their relative strengths and weaknesses in terms of their ability to address modern ethical dilemmas and guide ethical decision-making processes in business, healthcare, government and academia. These theoretical frameworks will then be discussed in light of more practical regulatory and policy considerations, so that you will have the tools you need to draw conclusions (in the form of your final projects) about best practices for data handling within a particular field or topic of interest to you.

The course will thus equip you with a robust ethical “toolbox” for conscientiously gathering, interpreting and extracting meaning from data while respecting privacy, fairness, transparency, democratic accountability and other social concerns.

DSAN 5500: Data Structures, Objects, and Algorithms in Python

3 credits | Offered in the Spring semester

The Data Structures, Objects, and Algorithms in Python course will look at built-in data structures, such as dictionaries, lists, tuples, sets, strings and frozen sets. The course will also cover objects and classes in Python, as well as building new structures and objects. The class will cover algorithms including runtime, recurrence and development. Applications will include data science problems.

Prerequisite: A working or intermediate knowledge of Python

DSAN 5550: Data Science and Climate Change

3 credits | Offered in the Fall semester

Data science, as a key component of Artificial Intelligence, is helping shape profound changes to society. An equally forceful phenomenon affecting our world is climate change. This course will investigate the myriad ways data science can be used to address climate change. This will include aspects of climate change which data science is already beginning to tackle, such as mitigating emissions from the five most carbon-intensive societal activities – energy, manufacturing, agriculture/land use, transportation and buildings/infrastructure. We will also look at data science’s emerging role in areas such as climate modeling, biodiversity conservation, carbon capture, climate mitigation finance, geoengineering, climate ethics and reducing the carbon footprint of data science itself. We will see how the following data science and machine learning topics can be used to address climate change:

Regression
Gradient boosting
Causal inference
Interpretability
Optimization
Image processing
Natural language processing
Reinforcement learning
Time-series analysis
Several neural network architectures

Using a variety of existing data sources, you will undertake a final project of your choosing to apply a data science technique to an aspect of climate change.

DSAN 5600: Applied Time Series for Data Science

3 credits | Offered in the Fall semester

The analysis of data observed across time presents unique challenges due to temporal dependence, which violates the independence assumptions of many classical statistical and machine learning techniques. This course provides a comprehensive introduction to Time Series Analysis, equipping students with the theory and tools needed to model, interpret, and forecast temporal data.

The course begins with foundational concepts, including stationarity, autocorrelation, decomposition, and visualization techniques (Plotly, ggplotly, Tableau). Students then study classical univariate models such as AR, MA, ARMA, ARIMA, and Seasonal ARIMA (SARIMA), along with exponential smoothing and time-series cross-validation. Building on this, the course covers multivariate and exogenous-input models (VAR, ARIMAX, SARIMAX), Granger causality, and advanced methods for financial time series such as ARCH, GARCH, GARCH-M, and EGARCH. Additional topics include long-memory processes (ARFIMA), interrupted time series, and Bayesian structural time series (BSTS).

Modern machine learning approaches are also introduced, with emphasis on deep learning architectures for sequence modeling (RNN, LSTM, GRU, bi-directional models, transformers), implemented in PyTorch, TensorFlow, and Keras. Students gain exposure to structural and probabilistic models, as well as spectral methods such as Fourier analysis and power spectral density estimation.

Practical Applications & Project

Applications span a wide range of domains, including stock market and financial forecasting, econometrics and macroeconomic indicators, climate and environmental data, and public health studies such as COVID-19 impact analysis. Students will also undertake a self-directed mini capstone project using real-world time series data, applying classical and modern methods to their own area of interest. This project culminates in a portfolio-ready website that demonstrates their ability to model, forecast, and communicate insights from complex temporal data.

Prerequisites: DSAN 5000 and DSAN 5100

DSAN 5650: Causal Inference for Computational Social Science

3 credits | Offered in the Summer semester

This course provides you with the opportunity to take the analytical skills, machine learning algorithms and statistical methods learned throughout your first year in the program and explore how they can be employed towards carrying out rigorous, original research in the behavioral and social sciences. With a particular emphasis on tackling the additional challenges which arise when moving from associational to causal inference, particularly when only observational (as opposed to experimental) data is available, you will become proficient in cutting-edge causal Machine Learning techniques such as propensity score matching, synthetic controls, causal program evaluation, inverse social welfare function estimation from panel data and Double-Debiased Machine Learning.

In-class examples will cover continuous, discrete-choice and textual data from a wide swath of social and behavioral sciences: economics, political science, sociology, anthropology, quantitative history and digital humanities. After gaining experience through in-class labs and homework assignments focused on reproducing key findings from recent journal articles in each of these disciplines, you will spend the final weeks of the course on a final project demonstrating your ability to develop, evaluate and test the robustness of a causal hypothesis.

Prerequisites: DSAN 5000 and DSAN 5100

DSAN 5700: Blockchain Technologies for Data Science

3 credits | Offered in the Spring semester

This course is designed to provide hands-on experience in building public and private blockchains. You will gain the critical insight, practical knowledge and technical skills required to design and integrate successful blockchain technologies into a business domain. The course covers the basics of blockchain technologies, including decentralized ledgers, consensus mechanisms, public and private key cryptography and smart contracts. The class also examines the public policy and social issues addressed by, and arising from, the adoption of blockchain technology in finance, supply chain and healthcare. The course uses blockchain services (such as Azure) to teach blockchain development on platforms such as Ethereum (or others) and provides the necessary path for learning to build blockchain networks at scale. Blockchain technologies (distributed ledgers, smart contracts, etc.) are a new paradigm in data management and data sharing. The Gartner technology hype cycle predicts that blockchain will reach the industry-ready phase within the next five years. You will take a deep dive into organizations that have integrated blockchain technologies into their business strategy. Blockchain’s mainstream use by firms like Walmart, Merck and BlackRock means that graduates of our Data Analytics program will interact with the technology often and will need expertise in handling, extracting and converting data to and from the blockchain. This course will provide an early introduction, making you blockchain-ready.

DSAN 5800: Advanced Natural Language Processing

3 credits | Offered in the Fall semester

This course provides a formalism for understanding the statistical machine learning methods that have come to dominate natural language processing. Divided into three core modules, the course explores:

(i) how language understanding is framed as a tractable statistical inference problem.
(ii) a formal yet practical treatment of the DNN architectures and learning algorithms used in NLP.
(iii) how these components are leveraged in modern AI systems such as information retrieval, recommender systems and conversational agents.

In exploring these topics, the course exposes you to the foundational math, practical applications, current research directions and software design that are critical to gaining proficiency as an NLP/ML practitioner. The course culminates in a capstone project, conducted over its final six weeks, in which you apply NLP to an interesting problem of your choosing. In past semesters, students have built chatbots, code completion tools and stock trading algorithms, to name a few. This course assumes a basic understanding of linear algebra, probability theory and first-order optimization methods, as well as proficiency in Python.

This is an advanced course. Suggested prerequisites are DSAN 5000, DSAN 5100 and DSAN 5400. However, first-year students with the necessary math, statistics and deep learning background will be considered.

DSAN 5900: Digital Storytelling

3 credits | Offered in the Spring semester

To be successful, a data scientist needs to have many skills beyond coding. This course will teach you how to communicate your findings and data to users, clients and stakeholders to make the biggest impact on your organization and your career. The course consists of two types of activities: lectures and exercises. Lectures provide a theoretical foundation of storytelling; the exercises are designed to help you learn practical skills that work best for different audiences. You will learn what fits your personal style and practice the power of storytelling that can inform and influence decision-makers.

Writing topics will include technical writing, writing for action and writing to communicate data science to non-technical readers. Storytelling and visualization topics will include methods for presenting results as conclusions, presenting results as actionable items, and creating visual narratives. Work on interactive visualizations will focus not only on clearly illustrating information within data for use in decision-making, but also on supporting discovery, exploration and question generation. Topics in expression, color use and style will be included. You will learn how to make a point with your charts and how to make your charts clear to the audience. Further topics in the area of information presentation will include speaking with clarity to a team, group or large audience.

All parts of the course will contain applications that focus on the utilization of the results of data science and analytics to promote public good, to encourage social concerns and equality and to support change in areas such as business, public health and public policy.

DSAN 5925: Internship

0.25 credits | Offered every semester

The program does not require you to complete an internship, but an internship can provide you with practical work experience. This course enables international students to do internships at U.S. companies. You must obtain approval from the program director and submit a Curricular Practical Training (CPT) form to proceed with an internship under this course. To be approved, an internship must be aligned with the Data Science and Analytics program goals and provide a significant learning experience for you. At the end of the internship, you must submit a deliverable to the course instructor.

DSAN 6150: Biological and Biomedical Data Science

3 credits | Offered in the Fall semester

We are bombarded every day with multiple claims of health risks (doing this will ruin your health), new treatments and cures (just take this for 30 days for a new you) and better lifestyle choices. How are these claims made, evaluated and validated using data science? Data drives our knowledge of biology, disease and effective treatments. This data is diverse, complex, large and in many respects unique. It drives our understanding of whether risk factors or treatments causally change our health outcomes and whether our genes or our environment affect our health, and it informs decisions about drugs, protocols and public health that affect all of us every day. In this class, we explore this rich, diverse data landscape and the specialized methods needed to make sense of it, leveraging the instructor’s decades-long experience in collaborative epidemiological and biomedical research across academia, government and industry. We will explore designing good experiments to extract causal relationships and how we might still make valid decisions even in non-ideal settings. We will explore high-dimensional multivariate data and evaluate the validity of finding a “needle in a haystack” biomarker that can be targeted for treatment. We will see how statistical modeling (survival analysis in particular), machine learning, AI and explainable AI have made an impact in helping us understand this world within. We will see how data-driven decision-making using Bayesian analysis works. This journey will take us through real-life applications in bioinformatics (understanding how genes, proteins and other molecular markers affect disease), epidemiology (how diseases spread and how interventions can prevent that spread) and clinical research (clinical trials, observational studies, case-control studies).

Prerequisite: DSAN 5100; first-year DSAN students with the necessary statistics background will also be considered

DSAN 6300: Database Systems and SQL

3 credits | Offered in the Fall semester

This course will explore several aspects of modern database management systems, database programming, relational databases, semi-structured databases and SQL. The course will begin with an introduction to relational models, normal forms and schema design, relational algebra and SQL programming. The course will focus on application development using relational databases and will introduce Big Data concepts and discuss Big Data processing. Both structured and semi-structured data will be considered, such as XML, JSON and record-style. Query processing methods will be applied and evaluated. Topics will also include recursion in SQL, constraints and triggers, indices and transactions, data storage including column-oriented and distributed storage, NoSQL and different types of databases, such as non-relational, scientific, parallel and streaming. The course will discuss types of database-system architectures, including cloud-based services. Applications will coincide with data science and analytics, as well as public policy, intelligence generation and narratives. Tools may also include cloud-based DBMS.

DSAN 6400: Network Analytics

3 credits | Offered in the Summer semester

The design and analysis of networks to represent interactions between and within data is a quickly emerging discipline of significant importance. Network analytics combines graph theory, optimization, data science, data visualization, community and cluster analysis and more. Topics in this course will help answer intriguing questions such as, “How can we make sense of large, highly-associated data sets, ranging from social networks to the smart power grid?” or “Which models are more accurate for predicting popularity on Twitter?” or “How can we estimate the spread of a contagion or of information?” The course will begin with a discussion of applications, specifically to data science and analytics. From there, a formal framework for analysis of graphs and trees will be introduced. This will include graph theory and representation, optimization and graph-based algorithms. Next, packages in Python and/or R will be investigated for the purposes of exploring and visualizing data that contain relationships. These packages will then be used to model and analyze complex data sets for the purposes of community detection, path analysis, influencer assessment, logistics analytics, contagion or information spread (such as rumor spreading), web page ranking and more. Examples of data science applications are provided with real-world data sets, including social network data, web-based data, attributed data, flow data, biological data and more.

DSAN 6500: Computer Vision Analytics & Generative Image Modeling

3 credits | Offered in the Spring semester

Computer Vision Analytics & Generative Image Modeling offers a comprehensive introduction to image mining and computer vision. The course covers image acquisition, representation and processing, including convolution, Fourier transforms, filters and feature generation. Advanced topics include classification, segmentation, spatial relations, deepfake detection, object tracking and image sentiment analysis. You will explore cutting-edge techniques such as diffusion models, Vision Transformers, advanced GANs (StyleGAN, CycleGAN), Neural Radiance Fields (NeRFs), zero-shot and few-shot learning, image-to-image translation, self-supervised learning, 3D generative models, cross-modal generation and adversarial robustness. Practical applications include facial recognition, OpenCV in Python, gesture analysis and object/scene categorization. This course equips you with the skills to develop sophisticated computer vision and generative image solutions.

DSAN 6550: Adaptive Measurement with AI

3 credits | Offered in the Spring semester

This course provides an opportunity for you to learn new algorithms and data science methods in measurement, which is applied across all research fields. The major theme of the course is how to move beyond traditional one-size-fits-all assessments and build adaptive tests, surveys, scales or games that are tailored to each individual’s ability, interests, behavior, health status and learning requirements. Topics will include, but are not limited to, fundamental psychometric modeling, item response theory, item banks, item information, equating and differential item functioning, cognitive diagnostic modeling, adaptive testing, game-based assessment, sequence mining on process data, generalized models in large-scale assessment, automated item scoring with NLP, personalized assessment design and generative AI for automated item generation. In addition, one or two experts from industry will be invited to share fresh ideas and the latest products in measurement with the class. The knowledge taught in this course is emerging and has attracted rapidly growing attention in recent years. It is in especially high demand at organizations and government agencies that manage large-scale assessments (e.g., World Bank, USAID, OECD, AIR, NCES), high-tech learning, education and game companies (e.g., Pearson, ETS, Duolingo, Roblox), e-commerce and high-tech platforms that incorporate behavioral science (e.g., Amazon, Meta, Google), public health organizations (e.g., NIH) and medical recovery services (e.g., hospitals, mental health centers), to name a few, and it extends to general needs in interdisciplinary research and survey design.

Prerequisite: Basic statistical knowledge and programming skills in R or Python

DSAN 6600: Neural Networks and Advanced Deep Learning

3 credits | Offered in both Fall and Spring semesters

Neural Networks and Advanced Deep Learning explores both foundational and cutting-edge deep learning techniques. You start with a short review of core concepts such as feed-forward networks, activation functions, backpropagation and optimization using TensorFlow and Keras. The course reviews convolutional and recurrent neural networks, auto-encoders and methods to prevent overfitting. Advanced topics include Transformers and attention mechanisms, Graph Neural Networks, self-supervised and contrastive learning, neural architecture search, adversarial robustness, energy-based models and neural ODEs. Additionally, the curriculum delves into advanced optimization, neuro-symbolic AI and bio-inspired deep learning. Various practical applications are covered to equip you with the skills to address complex deep learning challenges across diverse domains.

DSAN 6650: Reinforcement Learning

3 credits | Offered in the Fall semester

The field of machine learning is typically divided into three fundamental paradigms: supervised learning, unsupervised learning, and reinforcement learning (RL). Reinforcement learning focuses on how intelligent agents learn to act within an environment to maximize a cumulative reward function. Over the past several decades, concepts from deep learning have been increasingly integrated into RL, giving rise to the field of deep reinforcement learning, which has produced remarkable results across applications such as self-driving cars, autonomous gameplay, robotics, trading and finance, and natural language processing. This course begins with the fundamentals of traditional (non-deep) reinforcement learning, then reviews key deep learning concepts before transitioning to deep RL through the incorporation of artificial neural networks into RL models. With a strong coding emphasis in Python, students will explore topics including Markov decision processes, solving tabular RL problems, function approximation with neural networks, multi-armed and contextual bandits, Monte Carlo methods, temporal difference learning, deep Q-learning, actor-critic methods, and policy gradient methods. Prerequisites include intermediate Python programming skills as well as knowledge of introductory statistics and multivariable calculus.

Prerequisite: DSAN 5300

DSAN 6700: Machine Learning App Deployment

3 credits | Offered in the Fall semester

Machine learning application deployment bridges the gap between training a machine learning model and running it reliably in production. Many data scientists can build a model that functions well in a notebook. But far more interesting in industry is the question that immediately follows: how do I make this available to users, at scale, without it silently breaking over time? This class answers that question concretely, using a single semester-long project as the vehicle. By the end of the course, students will have built, deployed, and monitored a full ML-powered pipeline.

The course is organized around a deliberate progression. The first part of the class establishes the engineering foundations that every subsequent week builds on: modern Python packaging, automated quality gates, probabilistic data structures, system architecture, and containerization. In the second part, we build the running service: a REST API, experiment tracking, distributed caching, and cloud deployment. The final part of the course focuses on making the service production-grade: reliability patterns, security, orchestration, and observability. Each week adds a concrete component to the project.

The culminating activity of the course is the final project, in which students build and deploy their own sophisticated application, drawing both on the skills learned in class and on the running example studied throughout the semester. The strategies discussed here are often components of state-of-the-art production systems. Working incrementally, students will construct robust, production-grade, cloud-deployed AI systems using cutting-edge tools from today’s AI ecosystems. Students are expected to program comfortably in Python at an intermediate level and to have strong prior exposure to deep learning or machine learning.

Prerequisite: DSAN 5000

DSAN 6725: Applied Generative AI for AI Developers

3 credits | Offered in the Spring semester

This course is designed for AI developers aiming to build cutting-edge Generative AI (GenAI) applications. Focusing on the applied side of AI, you will explore key techniques such as in-context learning (ICL), retrieval-augmented generation (RAG), AI agents and responsible AI principles. The course covers advanced tools and methods, including embedding models, inference optimizations (e.g., quantization, multi-adapter swapping), fine-tuning of pre-trained models and benchmarking LLMs. You will gain hands-on experience with open-source tools like LangChain, LlamaIndex and platforms such as AWS, applying your skills in practical GenAI applications. The course culminates in a capstone project, preparing you to deploy scalable, optimized AI systems in real-world scenarios. This course bridges the gap between data science knowledge and applied AI development, empowering you to solve industry-level challenges. This is an advanced course open to second-year students. However, first-year students may register with permission from the instructor.

DSAN 6750: Geographic Information Systems (GIS) and Applications

3 credits | Offered in the Fall semester

Geographic Information Systems (GIS) are used as tools for describing, analyzing, managing and presenting information about the relationships between geographical and spatial locations, sizes and shapes. This is known as attribute data. GIS uses techniques that can represent social and environmental data as a map, with a significant number of applications including those in engineering, architecture, public health, environmental science and business. GIS data will be created through a variety of methods, including those offered by global positioning system (GPS) technologies.

Prerequisite: Knowledge of R and Python

DSAN 6850: NLP with Large Language Models

3 credits | Offered in the Spring semester

In recent times, Large Language Models (LLMs) have earned the attention of the world. OpenAI’s well-known generative LLM, ChatGPT, became the fastest-growing consumer application in history in only two months, and the feverish interest around LLMs continues to grow. This course is concerned with applying LLMs to natural language processing (NLP) problems in real-life settings. This is a seminar-based course, so you will spend the majority of time outside of class reading, with in-class time dedicated to presenting and discussing recent research developments in NLP. The course will begin with a review of the transformer architecture that underlies LLMs and describe its prominent role in modern NLP. Then, we will discuss current issues in using transformers, including the training and scaling of transformer-based models, variations on the classic transformer, transfer learning in low-resource settings, model deployment, distributed systems and more. Meta-learning, multimodal learning and societal impact will also be covered. You will work on applications such as cross-language information retrieval, machine translation, prompt engineering and select tasks outside of NLP. By the end of the course, you will have mastered transformer-based models and will be poised to use them at the cutting edge of NLP practice today.

DSAN 7000: Advanced Research Methodologies (Capstone Project)

3 credits | Offered in the Fall semester

The Capstone Project course is designed to equip you with advanced research methodologies for developing impactful data science projects, alongside the principles of effective scientific writing. You will engage in real-world, collaborative projects under the joint supervision of internal faculty members and external industry mentors. Through an immersive curriculum, you will learn the principles of research question development, research methods selection, effective results interpretation, scientific writing, journal paper submission, peer review, authorship and communicating scientific ideas to academic and non-academic audiences. In addition, this course will provide guidance on paper publishing, including how to identify an appropriate journal, navigate the selection process, edit and measure impact. Ideal outcomes include improved self-editing, development of effective strategies for offering and receiving concise editorial recommendations among peers and finalizing a research paper to be submitted to academic publications (e.g., journals, conferences or research reports). Capstone projects are designed to bridge theory and practice and foster meaningful collaboration with academic researchers and industry mentors. These projects are expected to generate substantive contributions to your professional development and future career trajectories. This course is particularly well-suited for students interested in academic research practices and/or pursuing doctoral studies. While publication cannot be guaranteed, you will fully engage in the scholarly experience of preparing a high-quality research paper for academic review.

Prerequisites: DSAN 5000, DSAN 5100, DSAN 5300; knowledge of data science, introductory statistics and statistical learning, and coding experience in R and/or Python
