Elective Courses

Data Science & Analytics Electives

DSAN-5400: Computational Linguistics – Advanced Python

This course presents advanced topics in Natural Language Processing (NLP) and Python programming for both text processing and analysis. The goal of this class is to explore both classical and modern techniques in NLP, with emphasis on hands-on application. We will examine topics such as text classification, model evaluation, nearest neighbors, and distributed representations. Applications include authorship identification, information retrieval, and semantic textual similarity, to name a few.

Programming topics include Python best practices, scientific computing libraries (e.g., NumPy and sklearn), exception handling, object-oriented programming, and more. By the end of this course, students will be able to program proficiently in Python, with enough comfort to reference software documentation and pseudocode to write sophisticated programs from scratch. 3 credits.

DSAN-5500: Data Structures, Objects, and Algorithms in Python

This course examines Python's built-in data structures, such as dictionaries, lists, tuples, sets, strings, and frozen sets. It also covers objects and classes in Python, as well as building new structures and objects. Algorithm topics include runtime analysis, recurrence, and algorithm development. Applications will include data science problems. Prerequisite: A working or intermediate knowledge of Python. 3 credits.

DSAN-5600: Time Series

The analysis of experimental data that have been observed at different points in time leads to new and unique problems in statistical modeling and inference. The obvious correlation introduced by the sampling of adjacent points in time can severely restrict the applicability of the many conventional statistical and machine learning methods that traditionally depend on the assumption that these adjacent observations are independent and identically distributed. The systematic approach by which one goes about answering the mathematical and statistical questions posed by these time correlations is commonly referred to as Time Series Analysis. This class will cover, but not be limited to, traditional time series modeling, including ARIMA, SARIMA, ARIMAX, SARIMAX, and VAR models; financial time series modeling, including ARCH and GARCH models, and nowcasting; Bayesian structural time series (BSTS) models; spectral analysis; and deep learning techniques for time series.

Analytics techniques include model fitting, statistical methods, visualization, and data storytelling. This course will include structured programming with the R language, statistical computing, the use of models to make forecasts, data formatting, cleaning and manipulation of data, solving statistical and time series equations, building predictive models, utilizing graphical applications, and applying applicable machine learning methods and models. It is recommended that students know multivariate calculus, linear algebra, probability, and statistics at the undergraduate level. 3 credits.

DSAN-5700: Blockchain Technologies for Data Science

This course is designed to provide hands-on experience in building public and private blockchains. Students will gain the critical insight, practical knowledge, and technical skills required to design and integrate successful blockchain technologies into a business domain. The course covers the basics of blockchain technologies, including decentralized ledgers, consensus mechanisms, public and private key cryptography, and smart contracts. The class also examines the public policy and social issues addressed by and arising from the adoption of blockchain technology in finance, supply chain, and healthcare. The course utilizes blockchain services (such as Azure) to teach blockchain development on platforms such as Ethereum (or others) and provides the necessary path for learning to build blockchain networks at scale. Blockchain technologies (distributed ledgers, smart contracts, etc.) are a new paradigm in data management and sharing technologies. The Gartner hype cycle predicts that blockchain will reach the industry-ready phase within the next five years. Students will take a deep dive into organizations that have integrated blockchain technologies as part of their business strategy. Blockchain’s mainstream use by firms like Walmart, Merck, and BlackRock will mean that our Data Analytics program graduates will interact with the technology often and will require expertise in handling, extracting, and converting data to and from the blockchain. This course will provide an early introduction to the technology, making graduates blockchain-ready. 3 credits.

DSAN-5800: Natural Language Processing for Data Analytics

This course will cover the major techniques for mining and analyzing textual data to extract interesting patterns, discover knowledge, and support decision-making. Students will learn the main concepts and algorithms in Natural Language Processing and their applications in data science. These include search and information retrieval, document clustering and classification, topic modeling, sentiment analysis, and deriving meaning from unstructured narratives. In addition to traditional techniques in machine learning such as regression, decision trees, and Naive Bayes algorithms, the course will also examine the latest approaches in Deep Learning. Students will be given the opportunity to develop hands-on experience in building foundational tools and machine learning algorithms that can be applied to real analytics problems. The data obtained from textual content can be used to augment numerical data for the purposes of building predictive models, identifying emerging issues, detecting opinion, and determining important relationships. Prerequisite: Working knowledge of Python. 3 credits.

DSAN-5900: Digital Storytelling

This course will offer a strong foundation in data communication, storytelling for data science, and decision science in data science. Writing topics will include technical writing, writing for action, specifications documentation, and writing for communication in data science with respect to non-technical readers. Storytelling and visualization topics will include methods for presenting results as conclusions, presenting results as actionable items, and creating and presenting visual narratives so as to clarify the information contained within data for those making decisions. Visualization topics will include coding tools, such as ggplot and matplotlib, interactive tools, such as plotly and Tableau, as well as advanced interactive visualization methods such as NetworkD3, leaflet, and client-server-based Shiny/R. Interactive visualizations will focus not only on clearly illustrating information within data for use in decision-making, but also on supporting discovery, exploration, and question generation. Topics in expression, color use, and style will be included. Further topics in the area of information presentation will include speaking with clarity to a team, group, or large audience, understanding and focusing explanations toward an audience appropriately, presenting conclusions that any audience can understand and utilize, as well as flow, clarity, and methods for improving communication. All parts of the course will contain applications in decision science and will focus on the utilization of the results of data science and analytics to promote public good, to encourage social concerns and equality, and to support change in areas such as business, public health, and public policy. 3 credits.

DSAN-5925: Internship

An internship provides the student with practical work experience. This course enables international students to do internships at US companies. Students must obtain approval from the Program Director and submit a Curricular Practical Training (CPT) form to proceed with an internship under this course. To be approved, an internship must be aligned with the Analytics program goals and provide a significant learning experience for the student. At the end of the internship, the student must submit a deliverable to the course instructor. 0.25 credits. (The program does not require students to complete an internship.)

DSAN-6100: Optimization

Optimization is concerned with the general task of finding a set of parameters such that a given target function is made as small as possible or such that the fit with a desired goal is as close as possible. Such parameters can be numbers, but also character strings, geometric shapes, or paths in a network. These problems are ubiquitous in data science. Topics of this course include common mathematical optimization paradigms, efficient algorithmic techniques, and important Data Science applications of optimization over Euclidean spaces. The primary paradigms covered are Linear Programming, Convex Programming, and Semidefinite Programming. Algorithmic techniques include Line Searches, Gradient Descent, Newton’s method, the Simplex Method, and Interior Point Methods. Various formulations of the least-squares problem are used to motivate theory and techniques throughout the course, and the course concludes with a selection of applications of optimization in Data Science (which may include Clustering, Community Detection, Dimension Reduction, Expectation Maximization, Latent Semantic Indexing, Neural Networks, Search, Spectral Embeddings, Stochastic Gradient Descent, Support Vector Machines, or Visualization, depending upon student interest). 3 credits.

DSAN-6200: Advanced Analytics and Applied Math for Streaming and High Dimension Data and Applications

This course covers sparse, low-rank, non-linear, and randomized techniques for analyzing high-dimensional data in unsupervised or supervised settings. Techniques are motivated by real-world applications and datasets. Applications include analysis of financial data, gene expression data, and image and video datasets. Streaming data, Internet-of-Things data, medical image and video data, gene expression data, and financial datasets are all examples of high-dimensional datasets where the number of variables describing the dataset can overwhelm traditional estimation and computational approaches. Sparse, low-rank, non-linear, and randomized approaches have recently emerged to handle these kinds of datasets. The success of these methods demonstrates that high-dimensional data often exhibit a low-dimensional structure that can be exploited to achieve robust, efficient data analysis. This class includes coverage of LASSO, CART, high-dimensional logistic regression, sparse dictionary learning, alternating minimization, applications to image compression, PCA, kernel PCA, Johnson-Lindenstrauss, metric graphs and spectral embeddings, spectral graph theory, manifold learning, K-SVD, geometric multiresolution analysis, kernel density estimation, and others. 3 credits.

DSAN-6300: Database Systems and SQL

This course will explore several aspects of modern database management systems, database programming, relational databases, semi-structured databases, and SQL. The course will begin with an introduction to relational models, normal forms and schema design, relational algebra, and SQL programming. The course will focus on application development using relational databases and will introduce Big Data concepts and discuss Big Data processing. Both structured and semi-structured data will be considered, such as XML, JSON, and record-style data. Query processing methods will be applied and evaluated. Topics will also include recursion in SQL, constraints and triggers, indices and transactions, data storage including column-oriented and distributed storage, NoSQL, and different types of databases, such as non-relational, scientific, parallel, and streaming. The course will discuss types of database-system architectures, including cloud-based services. Applications will coincide with data science and analytics, as well as public policy, intelligence generation, and narratives. Tools may also include cloud-based DBMS. 3 credits.

DSAN-6400: Network Analytics

The design and analysis of networks to represent interactions between and within data is a quickly emerging discipline of significant importance. Network analytics combines graph theory, optimization, data science, data visualization, community and cluster analysis, and more. Topics in this course will help answer intriguing questions such as, “How can we make sense of large, highly associated data sets, ranging from social networks to the smart power grid?” or “Which models are more accurate for predicting popularity on Twitter?” or “How can we estimate the spread of a contagion or of information?” The course will begin with a discussion of applications, specifically to data science and analytics. From there, a formal framework for analysis of graphs and trees will be introduced. This will include graph theory and representation, optimization, and graph-based algorithms. Next, packages in Python and/or R will be investigated for the purposes of exploring and visualizing data that contain relationships. These packages will then be used to model and analyze complex data sets for the purposes of community detection, path analysis, influencer assessment, logistics analytics, contagion or information spread (such as rumor spreading), web page ranking, and more. Examples of data science applications are provided with real-world data sets including social network data, web-based data, attributed data, flow data, biological data, and more. 3 credits.

DSAN-6500: Image Mining and Computer Vision Analytics

This course provides a comprehensive introduction to image mining and computer vision analytics, including image acquisition and representation, low-level vision, and high-level analytics. Image mining topics may include image representation, image processing, and image retrieval. Low-level vision topics may include convolution, the Fourier transform, filters, operators, and feature generation. High-level vision topics may include classification, segmentation, spatial relations, deepfake (fake image) detection, object tracking, deformable models, image sentiment analysis, and graph-based models. Applications will include facial recognition in Python, OpenCV in Python, deepfake detection, image sentiment and gesture analysis, and object/scene detection and categorization. 3 credits.

DSAN-6600: Neural Networks and Deep Learning

This course will explore the fundamentals of artificial neural networks (ANNs) and deep learning. The following topics will be covered: feed-forward ANNs, activation functions, output transfer functions for regression and classification, cost functions and related likelihood functions, backpropagation and optimization (including stochastic gradient descent and conjugate gradient), auto-encoders for manifold learning and dimensionality reduction, convolutional neural networks, and recurrent neural networks. Overfitting and regularization will be discussed from both theoretical and practical viewpoints. Concepts and techniques will be applied to several domains including image processing, time series analysis, natural language processing, and more. Students will gain mastery of popular deep learning frameworks in the Python ecosystem including TensorFlow and Keras. Prerequisite: DSAN-5100. 3 credits.

DSAN-6650: Reinforcement Learning

The field of machine learning is typically divided into three fundamental sub-paradigms: supervised learning, unsupervised learning, and reinforcement learning (RL). The discipline of reinforcement learning focuses on how intelligent agents learn to perform actions inside a specified environment to maximize a cumulative reward function. Over the past several decades, there has been a push to incorporate concepts from the field of deep learning into the agents used in RL algorithms. This has spawned the field of deep reinforcement learning. To date, the field of deep RL has yielded stunning results in a wide range of technological applications. These include, but are not limited to, self-driving cars, autonomous game play, robotics, trading and finance, and Natural Language Processing. This course will begin with an introduction to the fundamentals of traditional, i.e., non-deep, reinforcement learning. After reviewing fundamental deep learning topics, the course will transition to deep RL by incorporating artificial neural networks into the models. The course includes a coding emphasis to showcase applied implementations of RL within the Python ecosystem. Topics include Markov Decision Processes, Multi-armed Bandits, Monte Carlo Methods, Temporal Difference Learning, Function Approximation, Deep Neural Networks, Actor-Critic, Deep Q-Learning, Policy Gradient Methods, and connections to Psychology and to Neuroscience. Students must have intermediate coding experience in Python as well as knowledge of introductory statistics and multivariable calculus. Prerequisite: DSAN-6600. 3 credits.

DSAN-6700: Machine Learning App Deployment

This course will focus on gathering data to build an ensemble of machine learning methods that can predict, classify, and explore data entered by a client. This is an applied class that will work through the steps required to create an ML application and deploy it online for use by a client. Topics will include a very brief review of common ML techniques including NB, SVM, DT, RF, ARM, NN, and clustering. APIs will be used to gather, explore, clean, and prepare data for training an ML model ensemble. The model will be deployed via a client-server web application to enable clients to enter data and receive a classification or prediction. Prerequisite: DSAN-5000. 3 credits.

DSAN-6750: Geographic Information Systems (GIS) and Applications

Geographic Information Systems (GIS) are used as tools for describing, analyzing, managing, and presenting information about the relationships between geographical and spatial locations, sizes, and shapes. This is known as attribute data. GIS uses techniques that can represent social and environmental data as a map, with a significant number of applications including those in engineering, architecture, public health, environmental science, and business. GIS data will be created through a variety of methods including those offered by global positioning system (GPS) technologies. This course will assume knowledge of R and Python. 3 credits.

DSAN-6800: Principles of Cybersecurity

This course explores several aspects of modern security systems, risk management, and security policies, and provides an overview of digital forensics. The course begins with a definition of information security and the need for security policies and controls. It then moves into security management, risk management, incident response and planning, and the ethical and legal issues surrounding security, followed by technical aspects of security, including authentication, authorization, security appliances, and cryptography. It ends with an overview of digital forensics along with security maintenance and auditing. 3 credits.

DSAN-7000: Advanced Research Methodologies

The aim of this course is to teach students advanced research methodologies for developing data science research projects, as well as effective scientific writing. Topics will include the principles of research question development, research method selection, effective interpretation of results, effective writing, appropriate journal paper writing styles, peer review, authorship, and communicating scientific ideas to academic and non-academic audiences. In addition, this course will provide guidance on paper publishing, including how to identify an appropriate journal, navigate the selection process, edit, and measure impact. Ideal outcomes include improved self-editing, development of effective strategies for offering and receiving concise editorial recommendations among peers, and finalizing a research paper to be submitted to academic publications (e.g., journals, conferences, or research reports). This course specifically targets students who are interested in learning academic research practices and/or pursuing further education opportunities in a PhD program. The ideal outcome is to make a submission for academic publication, and students are encouraged to submit high-quality papers to academic journals. Although publication cannot be guaranteed, students will experience the learning process of submitting a research paper. Prerequisites: DSAN-5000, DSAN-5100, DSAN-5300; knowledge of data science, introductory statistics, and statistical learning, and coding experience in R and/or Python. 3 credits.


Other Department Electives

Please be aware that courses that are part of other departments’ curricula may give seating priority to their own students or have prerequisite restrictions. You should speak directly with the course instructor to see if you are eligible to enroll.

Machine Learning for Bioinform (BIST-532)

Offered by the Department of Biostatistics, Bioinformatics, and Biomathematics

Clinical Trials and Experimental Design (BIST-540)

Offered by the Department of Biostatistics, Bioinformatics, and Biomathematics

Social Network Analysis (CCTP-696)

Offered by the Communication, Culture and Technology (CCT) Master of Arts Program.

Regression Analysis (OPIM-573)

Offered by the Department of Operations & Information Management.

Database Development and Management (OPIM-654)

Offered by the Department of Operations & Information Management.

GIS and Spatial Data Modeling for Public Policy (PPOL-683)

Offered by the Department of Public Policy.

The Policy Issues of Big Data and Artificial Intelligence (PPOL-762)

Offered by the Department of Public Policy.
