Students in the Master of Science in Analytics program, concentration in Data Science, must successfully complete 30 credits and maintain a cumulative GPA of at least 3.00.
Concentration in Data Science COURSEWORK
There are five three-credit core courses, designed to provide students with a solid foundation in data science. We also offer a zero-credit programming course in the summer prior to matriculation. The five additional three-credit elective courses allow students to learn tailored skills, helping them apply data analysis to their fields of interest. Coursework may be taken in any order allowed by the prerequisites.
Our five-course core (plus zero-credit summer course) is designed to give students an overview of the massive data landscape.
- Advanced Programming Topics - ONLINE (no credit)
The Analytics program offers an asynchronous, online programming preparation course covering R, Python, and command line use in the summer prior to matriculation. Although it carries no credit, the course is equivalent to three credits, is designed for matriculating MS Analytics students, and is offered free of charge. It is required for incoming students who do not have a computer science degree and adequate preparation. The requirement is waived only with approval from the Program Director (Todd Leen) or Program Coordinator (Heather Connor). The course runs during Georgetown Summer Session II (July 10 - August 11). Students must complete this course to matriculate in the fall unless granted a waiver by the program.
- Introduction to Data Analytics (ANLY-501)
This course introduces students to several core data science concepts. It teaches students how to synthesize disparate, possibly unstructured data to better understand and characterize the world, and in some cases, to draw meaningful inferences. Topics covered include: the history of data science, successes and failures in data analytics, the data analytics life cycle, data/web scraping and APIs, data wrangling, data characterization (correlations, identifying clusters and associations), data inference and basic machine learning, network analysis, data ethics, and visual analytics. Students will work on a semester-long data science project that starts with question formulation and data collection, and goes through all the stages of the life cycle, culminating in data storytelling. The course also maps data science case studies to topics presented throughout the semester. Prerequisites: intermediate coding experience in Python 3 and knowledge of introductory statistics. 3 credits.
- Massive Data Fundamentals (ANLY-502)
Today's data scientists are commonly faced with huge data sets (Big Data) that may arrive at fantastic rates and in a broad variety of formats. This core course addresses the resulting challenges. The course will introduce students to the advantages and limitations of distributed computing and to methods of assessing its impact. Techniques for parallel processing (MapReduce) and their implementation (Hadoop) will be covered, as well as techniques for accessing unstructured data and for handling streaming data. These techniques will be applied to real-world examples, using clusters of computational cores and cloud computing. Prerequisites: working knowledge of Python and the Unix command line, some knowledge of data structures, and ANLY-501. 3 credits.
- Scientific and Analytical Visualization (ANLY-503)
Presenting quantitative information in visual form is an essential communication skill for data professionals. This course introduces representation methods and visualization techniques for complex data, drawing on insights from cognitive science and graphic design. Students will obtain an overview of the human visual system, learn to use models for data and for images, and acquire good design practices, such as those using the “grammar of graphics.” Students will use common statistical design tools such as graphic methods in Python 3, interactive graphic methods such as Bokeh, Leaflet, and NetworkD3, the R package ggplot2, and Tableau. Prerequisites: ANLY-501, ANLY-511, and ANLY-512. 3 credits.
- Probabilistic Modeling and Statistical Computing (ANLY-511)
Probabilistic models are essential for understanding data that are affected by uncertainty. This course introduces students to the fundamentals of probabilistic modeling and then covers computational techniques for the analysis of such data. After introducing basic concepts and approaches such as probability distributions, random variables, and conditioning, the course covers basic probability distributions that are frequently used in practice and some of their properties, such as the laws of large numbers. In the second half, students will learn about computational techniques for the use of probabilistic models. These include methods for faithful simulation of random variables (Monte Carlo), the extraction of condensed models from observed data (maximum likelihood, Bayesian models), methods for models with hidden or partially observed variables (latent variables, expectation-maximization, hidden Markov models), and some general data science techniques that incorporate probabilistic models (graphical models, stochastic optimization). Prerequisites: introductory statistics and some coding experience (e.g. R). 3 credits.
- Statistical Learning for Analytics (ANLY-512)
Statistical learning is concerned with algorithms that use statistical techniques to find structure or patterns in given data (unsupervised learning) or use given instances of data to predict outcomes in new cases (supervised learning). A well-known method of this type is linear regression, which will be covered early in the course. Statistical methods for making discrete predictions (classification), such as logistic regression, will also be covered. Special emphasis will be placed on techniques for handling high-dimensional data (i.e. instances with many attributes), including variable selection and dimension reduction. The course will also cover ensemble methods such as bagging and boosting that are often used to improve the results of given classification methods. Unsupervised methods covered in this course include model-based and hierarchical clustering. Prerequisites: ANLY-511. 3 credits.
- Effective Presentation for Technology & Science (ANLY-520)
Clearly communicating problems, ideas, data, analysis approaches, results, and recommendations for action is vital for career success in technology and science. Strong technical writing is clear and unambiguous, easy to read, and concise. This course improves students’ writing, presentation, and critique skills. They will learn to communicate material to technical and non-technical audiences. Students will learn to write well by improving text clarity, simplicity, and conciseness, and by incorporating high-quality graphics (LaTeX will be used for paper preparation). Students will learn to craft oral presentations that are clear, easy to follow, informative, and compelling, and will develop delivery skills that improve comprehension, audibility, comfort, and audience engagement. 3 credits.
- Computational Linguistics - Advanced Python (ANLY-521)
This course teaches advanced topics in programming for linguistic data analysis and processing using the Python language. A series of assignments will give students hands-on practice implementing core algorithms for linguistic tasks. By the end of the course, students will be able to transform pseudocode into well-written code for algorithms that make sense of textual data, and to evaluate the algorithms quantitatively and qualitatively. Linguistic tasks will include edit distance, semantic similarity, authorship detection, and named entity recognition. Python topics will include the appropriate use of data structures; mathematical objects in numpy; exception handling; object-oriented programming; and software development practices such as code documentation and version control.
- Data Ethics Privacy & Security (ANLY-530)
This course introduces and examines a broad set of issues in data ethics, privacy, and security. Topics will include the ethical collection, storage, and use of data, as well as best practices for managing and sharing massive data. Concerns related to security and data privacy, as well as related consequences such as discrimination, inappropriate monetary gain, and other potential abuses, will be explored. This class will also examine issues relating to fairness, transparency, and the privacy of algorithmic systems that utilize data, as well as the fairness and transparency of decisions made by machines. Topics will include how the use, storage, manipulation, and exchange of massive data can affect the individual as well as society. Concepts such as data encryption, differential privacy, and secure multiparty computation will be addressed. 3 credits.
- Structures and Algorithms for Analytics (ANLY-550)
This course covers algorithmic techniques for solving different types of data science problems. It will cover Big O notation, data structures (arrays, stacks, queues, lists, trees, heaps, graphs), sorting and searching (binary search trees, hash tables), and algorithmic paradigms for efficient problem solving (divide and conquer, recursion, greedy algorithms, dynamic programming, etc.). It will include both theory and practice. Students will learn to design, analyze, and implement fundamental data structures and algorithms. This course will provide the algorithmic background essential for further study of computer science topics. Prerequisites: ANLY-501 and ANLY-511. 3 credits.
- Optimization (ANLY-561)
Optimization is concerned with the general task of finding a set of parameters such that a given target function is made as small as possible or such that the fit with a desired goal is as close as possible. Such parameters can be numbers, but also character strings, geometric shapes, or paths in a network. These problems are ubiquitous in data science. Topics of this course include common mathematical optimization paradigms, efficient algorithmic techniques, and important data science applications of optimization over Euclidean spaces. The primary paradigms covered are linear programming, convex programming, and semidefinite programming. Algorithmic techniques include line searches, gradient descent, Newton's method, the simplex method, and interior point methods. Various formulations of the least-squares problem are used to motivate theory and techniques throughout the course, and the course concludes with a selection of applications of optimization in data science (which may include clustering, community detection, dimension reduction, expectation maximization, latent semantic indexing, neural networks, search, spectral embeddings, stochastic gradient descent, support vector machines, or visualization, depending upon student interest). 3 credits.
- Natural Language Processing for Data Analytics (ANLY-580)
This course will cover the major techniques for mining and analyzing textual data to extract interesting patterns, discover knowledge, and support decision-making. Students will learn the main concepts and algorithms in Natural Language Processing and their applications in data science. These include search and information retrieval, document clustering and classification, topic modeling, sentiment analysis, and deriving meaning from unstructured narratives. In addition to traditional techniques in machine learning such as regression, decision trees, and Naive Bayes algorithms, the course will also examine the latest approaches in Deep Learning. Students will be given the opportunity to develop hands-on experience in building foundational tools and machine learning algorithms that can be applied to real analytics problems. The data obtained from textual content can be used to augment numerical data for the purposes of building predictive models, identifying emerging issues, detecting opinion, and determining important relationships. Prerequisites: working knowledge of Python, and ANLY-511 and ANLY-512 or their equivalent. 3 credits.
- Neural Networks and Deep Learning (ANLY-590)
This course will explore the fundamentals of artificial neural networks (ANNs) and deep learning. The following topics will be covered: feed-forward ANNs, activation functions, output transfer functions for regression and classification, cost functions and related likelihood functions, backpropagation and optimization (including stochastic gradient descent and conjugate gradient), auto-encoders for manifold learning and dimensionality reduction, convolutional neural networks, and recurrent neural networks. Overfitting and regularization will be discussed from both theoretical and practical viewpoints. Concepts and techniques will be applied to several domains including image processing, time series analysis, natural language processing, and more. Students will gain mastery of popular deep learning frameworks in the Python ecosystem, including TensorFlow and Keras. Prerequisites: ANLY-511 and ANLY-512, and fluency with Python. 3 credits.
- Advanced Machine Learning (ANLY-601)
The course covers the theory and practice of pattern recognition, including advanced topics in machine learning. The course builds on fundamentals from ANLY-511 and ANLY-512, extending theoretical and practical depth, and introducing advanced machine learning methods. The techniques discussed have applications in statistical signal processing, pattern recognition, anomaly and fault detection, speech and image processing and recognition, and decision science. Topics will include some mix of detection and ROC curves, parametric and non-parametric density estimation, empirical and theoretical error bounds, Bayesian methods, sparse models, nonlinear dimensionality reduction and manifold learning, mixture models and EM, non-parametric regression (Gaussian processes), elements of information theory, and neural networks. Prerequisites: ANLY-511 and ANLY-512. 3 credits.
- Relational and Semi-Structured Databases and SQL Programming (ANLY-640)
This course will explore several aspects of modern database management systems, database programming, relational databases, semi-structured databases, and SQL. The course will begin with an introduction to relational models, normal forms and schema design, relational algebra, and SQL programming. The course will focus on application development using relational databases and will introduce Big Data concepts and discuss Big Data processing. Both structured and semi-structured data will be considered, such as XML, JSON, and record-style. Query processing methods will be applied and evaluated. Topics will also include recursion in SQL, constraints and triggers, indices and transactions, data storage including column-oriented and distributed storage, NoSQL, and different types of databases, such as non-relational, scientific, parallel, and streaming. The course will discuss types of database-system architectures, including cloud-based services. Applications will coincide with data science and analytics, as well as public policy, intelligence generation, and narratives. Tools may also include cloud-based DBMS.
- ANLY Internship (ANLY-905)
The ANLY Internship course permits the student to gain practical work experience in data analysis. Internships must be directly related to the student’s academic program goals and further both their practical and academic skills. Students must obtain the approval of the ANLY Program Director to register. Approved internships must be aligned with the Analytics program and provide a significant learning experience for the student. At the end of the internship, the student must submit a deliverable to the course instructor. 0.25 credits.
Please see the Course Descriptions page for elective courses in other departments that have been pre-approved by the program and will satisfy elective requirements. Additional coursework may be approved upon request and at the discretion of the program.