Course Descriptions

Core Courses

Pre-Program Bootcamp: Programming in R and Python

The Georgetown Data Science & Analytics program offers an online programming preparation course covering R, Python, and command-line use in the summer prior to matriculation. The course is equivalent to three credits, is designed for matriculating MS Data Science & Analytics students, and is offered free of charge. It is required for all incoming students and runs during the Georgetown Summer Session (May – August). Students must complete this course to matriculate in the fall.

DSAN 5000: Data Science and Analytics

This course introduces students to several core data science concepts. It teaches students how to synthesize disparate, possibly unstructured data to better understand and characterize the world, and in some cases, to draw meaningful inferences. Topics covered include: the history of data science, successes and failures in data analytics, the data analytics life cycle, data/web scraping and APIs, data wrangling, data characterization (correlations, identifying clusters and associations), data inference and basic machine learning, network analysis, data ethics, and visual analytics. Students will work on a semester-long data science project that starts with question formulation and data collection, and goes through all the stages of the life cycle, culminating in data storytelling. The course also maps data science case studies to topics presented throughout the semester. Prerequisites: Intermediate coding experience in Python3, and knowledge of introductory statistics. 3 credits. Offered in the Fall semester.

DSAN 5100: Probabilistic Modeling and Statistical Computing

Probabilistic models are essential for the understanding of data that are affected by uncertainty. This course introduces students to the fundamentals of probabilistic modeling and then covers computational techniques for the analysis of such data. After introducing basic concepts and approaches such as probability distributions, random variables, and conditioning, the course covers basic probability distributions that are frequently used in practice and some of their properties, such as Laws of Large Numbers. In the second half, students will learn about computational techniques for the use of probabilistic models. This includes methods for faithful simulation of random variables (Monte Carlo), the extraction of condensed models from observed data (maximum likelihood, Bayesian models), methods for models with hidden or partially observed variables (latent variables, expectation-maximization, hidden Markov models), and some general data science techniques that incorporate probabilistic models (graphical models, stochastic optimization). Prerequisites: Introductory statistics, some coding experience (e.g. R). 3 credits. Offered in the Fall semester.

DSAN 5200: Advanced Data Visualization

Presenting quantitative information in visual form is an essential communication skill for data professionals. This course introduces representation methods and visualization techniques for complex data, drawing on insights from cognitive science and graphic design. Students will obtain an overview of the human visual system, learn to use models for data and for images, and acquire good design practices, such as those using the “grammar of graphics.” Students will use common statistical design tools such as graphic methods in Python3, interactive graphic methods such as Bokeh, Leaflet, and NetworkD3, the R package ggplot2, and Tableau. Prerequisites: DSAN 5000. 3 credits. Offered in the Spring semester.

DSAN 5300: Statistical Learning

Statistical Learning is concerned with algorithms that use statistical techniques to find structure or patterns in given data (unsupervised learning) or use given instances of data to predict outcomes in new cases (supervised learning). A well-known method of this type is linear regression, and this will be covered early in the course. Statistical methods for making discrete predictions (classification) such as logistic regression will also be covered. Special emphasis will be placed on techniques for handling high-dimensional data (i.e. instances with many attributes), including variable selection and dimension reduction. The course will also cover ensemble methods such as bagging and boosting that are often used to improve the results of given classification methods. Unsupervised methods covered in this course include model-based and hierarchical clustering. Prerequisites: DSAN 5100. 3 credits. Offered in the Spring semester.

DSAN 6000: Big Data and Cloud Computing

Today’s data scientists are commonly faced with huge data sets (Big Data) that may arrive at very high rates and in a broad variety of formats. This core course addresses the resulting challenges. The course will introduce students to the advantages and limitations of distributed computing and to methods of assessing its impact. Techniques for parallel processing (MapReduce) and their implementation (Hadoop) will be covered, as well as techniques for accessing unstructured data and for handling streaming data. These techniques will be applied to real-world examples, using clusters of computational cores and cloud computing. Prerequisites: DSAN 5000, working knowledge of Python and the Unix command line, and some knowledge of data structures. 3 credits. Offered in the Fall semester.

Elective Courses

DSAN 5400: Computational Linguistics – Advanced Python

This course presents topics in Natural Language Processing (NLP) and Python programming. The goal of this class is to explore techniques in NLP, with a strong emphasis on hands-on instruction that progressively matures basic Python users into expert Python developers. We will examine topics such as text classification, model evaluation, machine translation, and distributed representations. Throughout the semester, students will select and read a book on AI ethics to motivate discussions on the social impact of modern NLP technologies. Applications include authorship identification, retrieval, and textual similarity, to name a few.

About half of the total class time is devoted to addressing an essential but often neglected piece in software development education: moving from typical data science programming workflows (such as writing basic scripts) to developing sophisticated Python projects. In other words, students will learn to design professional-grade software that they and others will be proud to contribute to together. Programming topics are explored in great depth, including Python best practices, object-oriented design, project structuring, and more. This class will give students the skills they need to contribute to the professional software repositories they work with already and even develop their own. 3 credits. Offered in the Fall and Spring semesters.

DSAN 5450: Data Ethics and Policy

This graduate-level course will train students to navigate the landscape of ethical issues which inevitably arise, across a variety of fields and industries, in each step of the data science process. Students will explore and critically evaluate a range of data-related issues in contemporary society, such as responsible data collection, algorithmic bias, privacy, transparency, accountability, democratic participation in data usage and data-driven decisions, and the ethical implications of emerging technologies like artificial intelligence and machine learning (self-driving cars, ChatGPT, crowd-sourced training data, etc.).

Through a combination of theoretical discussions and real-world case studies, students will examine the profound public policy and social issues associated with data ethics. This course will empower students by introducing a set of general ethical frameworks (consequentialism, deontological ethics, and virtue ethics) and discussing their relative strengths and weaknesses in terms of their ability to address modern ethical dilemmas and guide ethical decision-making processes in business, healthcare, government, and academia. These theoretical frameworks will then be discussed in light of more practical regulatory and policy considerations, so that students will have the tools they need to draw conclusions (in the form of their final projects) about best practices for data handling within a particular field or topic of interest to them.

The course will thus equip students with a robust ethical “toolbox” for conscientiously gathering, interpreting, and extracting meaning from data while respecting privacy, fairness, transparency, democratic accountability, and other social concerns. 3 credits. Offered in the Spring semester.

DSAN 5500: Data Structures, Objects, and Algorithms in Python

This course will look at built-in Python data structures, such as dictionaries, lists, tuples, sets, strings, and frozen sets. The course will also cover objects and classes in Python, as well as building new structures and objects. Algorithm topics will include runtime analysis, recurrence, and algorithm development. Applications will include data science problems. Prerequisite: A working or intermediate knowledge of Python. 3 credits. Offered in the Spring semester.

DSAN 5550: Data Science and Climate Change

Data Science, as a key component of Artificial Intelligence, is helping shape profound changes to society.  An equally forceful phenomenon affecting our world is climate change.  This course will investigate the myriad ways Data Science can be used to address climate change.  This will include aspects of climate change which Data Science is already beginning to tackle, such as mitigating emissions from the five most carbon-intensive societal activities – energy, manufacturing, agriculture / land use, transportation, and buildings / infrastructure.  We will also look at Data Science’s emerging role in areas such as climate modeling, biodiversity conservation, carbon capture, climate mitigation finance, geoengineering, climate ethics, and reducing the carbon footprint of Data Science itself.  We will see how the following Data Science and Machine Learning topics can be used to address climate change: regression, gradient boosting, causal inference, interpretability, optimization, image processing, natural language processing, reinforcement learning, time-series analysis, and several neural network architectures.  Using a variety of existing data sources, students will undertake a final project of their choosing to apply a Data Science technique to an aspect of climate change. Prerequisite: DSAN 5000. 3 credits. Offered in the Fall semester.

DSAN 5600: Applied Time Series for Data Science

The analysis of experimental data collected at different time points presents a fascinating challenge in the realm of statistical modeling and inference. The inherent correlations in time series introduced by the sampling of adjacent time points can significantly constrain the direct applicability of traditional statistical and machine learning techniques, which often rely on the assumption of independent and identically distributed observations. This course offers a systematic exploration of these challenges through the lens of time series analysis, providing students with the tools to unravel the mathematical and statistical complexities associated with temporal dependencies.

The class ventures into multifaceted aspects of time series analysis, encompassing traditional models such as ARIMA, SARIMA, and ARFIMA; multivariate time series models such as VAR, ARIMAX, SARIMAX, and VARMAX; financial time series models such as ARCH, GARCH, GJR-GARCH, E-GARCH, and M-GARCH; Bayesian structural time series (BSTS) models; and spectral analysis. It also harnesses deep learning techniques for time series data, including fully recurrent neural networks (RNN), long short-term memory (LSTM), gated recurrent units (GRU), bi-directional and continuous-time variants, and Transformers, among others.

In this course, students will acquire a robust analytical skill set, encompassing model fitting, statistical methods, data visualization, and data storytelling. This course will include structured programming in R and Python.

One of the distinctive features of this course is its emphasis on practical applications. Students will have the opportunity to delve into real-world analysis across diverse domains, including but not limited to stock market analysis, where students explore intricate financial time series; econometrics, involving the application of time series analysis to economic data for forecasting macroeconomic indicators and market trends; COVID-19 impact analysis; and climate data analysis. Additionally, students will undertake a self-directed time series analysis project (a mini capstone), allowing them to apply their acquired skills to areas of personal interest. The project culminates in a comprehensive portfolio website that demonstrates their proficiency in handling complex temporal data across multiple domains.

This course equips students not only with theoretical knowledge but also practical skills that are invaluable in tackling real-world data challenges across a wide spectrum of applications. It is recommended that students know multivariate calculus, linear algebra, probability, and statistics at the undergraduate level. 3 credits. Offered in the Spring semester.

DSAN 5700: Blockchain Technologies for Data Science

This course is designed to provide hands-on experience in building public and private blockchains. Students will gain the critical insight, practical knowledge, and technical skills required to design and integrate successful blockchain technologies into a business domain. The course covers the basics of blockchain technologies, including decentralized ledgers, consensus mechanisms, public and private key cryptography, and smart contracts. The class also examines the public policy and social issues addressed by, and arising from, the adoption of blockchain technology in finance, supply chain, and healthcare. The course utilizes blockchain services (such as Azure) to teach blockchain development on platforms such as Ethereum and provides the necessary path for learning to build blockchain networks at scale. Blockchain technologies (distributed ledgers, smart contracts, etc.) are a new paradigm in data management and data sharing. The Gartner Hype Cycle predicts that blockchain will reach the industry-ready phase within the next five years. Students will take a deep dive into organizations that have integrated blockchain technologies into their business strategy. Blockchain’s mainstream use by firms like Walmart, Merck, and BlackRock means that graduates of our program will interact with the technology often and will need expertise in handling, extracting, and converting data to and from the blockchain. This course will provide an early introduction for graduates, making them blockchain-ready. 3 credits. Offered in the Spring semester.

DSAN 5800: Advanced Natural Language Processing

This course provides a formalism for understanding the statistical machine learning methods that have come to dominate natural language processing. Divided into three core modules, the course explores (i) how language understanding is framed as a tractable statistical inference problem, (ii) a formal yet practical treatment of the DNN architectures and learning algorithms used in NLP, and (iii) how these components are leveraged in modern AI systems such as information retrieval, recommender systems, and conversational agents. In exploring these topics, the course exposes students to the foundational math, practical applications, current research directions, and software design that are critical to gaining proficiency as an NLP/ML practitioner. The course culminates in a capstone project, conducted over its final six weeks, in which students apply NLP to an interesting problem of their choosing. In past semesters, students have built chatbots, code completion tools, and stock trading algorithms, to name a few. This course assumes a basic understanding of linear algebra, probability theory, and first-order optimization methods, as well as proficiency in Python.

This is an advanced course. Suggested prerequisites are DSAN 5000, DSAN 5100, and DSAN 5400. However, first-year students with the necessary math, statistics, and deep learning background will be considered. 3 credits. Offered in the Fall semester.

DSAN 5900: Digital Storytelling

To be successful, a data scientist needs to have many skills beyond coding. This course will teach you how to communicate your findings and data to users, clients, and stakeholders to make the biggest impact on your organization and your career. The course consists of two types of activities: lectures and exercises. Lectures provide a theoretical foundation of storytelling; the exercises are designed to help you learn practical skills that work best for different audiences. You will learn what fits your personal style and practice the power of storytelling that can inform and influence decision-makers.

Writing topics will include technical writing, writing for action, and writing for communication in data science with respect to non-technical readers. Storytelling and visualization topics will include methods for presenting results as conclusions, presenting results as actionable items, and creating visual narratives. Interactive visualizations will focus not only on clearly illustrating information within data for use in decision-making, but also on supporting discovery, exploration, and question generation. Topics in expression, color use, and style will be included. You will learn how to make a point with your charts, and how to make your charts clear to the audience. Further topics in the area of information presentation will include speaking with clarity to a team, group, or large audience.

All parts of the course will contain applications that focus on the utilization of the results of data science and analytics to promote public good, to encourage social concerns and equality, and to support change in areas such as business, public health, and public policy. 3 credits. Offered in the Spring semester.

DSAN 5925: Internship

An internship provides the student with practical work experience. This course enables international students to do internships at US companies. Students must obtain approval from the Program Director and submit a Curricular Practical Training (CPT) form to proceed with an internship under this course. To be approved, an internship must be aligned with the Data Science and Analytics program goals and provide a significant learning experience for the student. At the end of the internship, the student must submit a deliverable to the course instructor. (The program does not require students to complete an internship.) 0.25 credits. Offered every semester.

DSAN 6150: Biological and Biomedical Data Science

We are bombarded every day with multiple claims of health risks (doing this will ruin your health), new treatments and cures (just take this for 30 days for a new you), and better lifestyle choices. How are these claims made, evaluated, and validated using data science? Data drives our knowledge of biology, disease, and effective treatments. This data is diverse, complex, large, and in many respects unique. This data drives our understanding of whether risk factors or treatments causally change our health outcomes, whether our genes or our environment affects our health, and decisions about drugs, protocols, and public health that affect all of us every day. In this class we explore this rich, diverse data landscape and the specialized methods needed to make sense of it, leveraging the instructor’s decades-long experience in collaborative epidemiological and biomedical research across academia, government, and industry. We will explore designing good experiments to extract causal relationships, and how we might still make valid decisions even in non-ideal settings. We will explore high-dimensional multivariate data and evaluate the validity of finding a “needle in a haystack” biomarker that can be targeted for treatment. We will see how statistical modeling (survival analysis in particular), machine learning, AI, and explainable AI have made an impact in helping us understand this world within. We will see how data-driven decision making using Bayesian analysis works. This journey will take us through real-life applications in bioinformatics (understanding how genes, proteins, and other molecular markers affect disease), epidemiology (how diseases spread and how interventions can prevent that spread), and clinical research (clinical trials, observational studies, case-control studies). DSAN 5100 is a prerequisite. However, first-year DSAN students with the necessary statistics background will be considered. 3 credits. Offered in the Fall semester.

DSAN 6300: Database Systems and SQL

This course will explore several aspects of modern database management systems, database programming, relational databases, semi-structured databases, and SQL. The course will begin with an introduction to relational models, normal forms and schema design, relational algebra, and SQL Programming. The course will focus on application development using relational databases and will introduce Big Data concepts and discuss Big Data Processing. Both structured and semi-structured data will be considered, such as XML, JSON, and record-style. Query processing methods will be applied and evaluated. Topics will also include recursion in SQL, constraints and triggers, indices and transactions, data storage including column-oriented and distributed storage, noSQL, and different types of databases, such as non-relational, scientific, parallel, and streaming. The course will discuss types of database-system architectures, including cloud-based services. Applications will coincide with data science and analytics, as well as public policy, intelligence generation, and narratives. Tools may also include cloud-based DBMS. 3 credits. Offered in the Fall semester.

DSAN 6400: Network Analytics

The design and analysis of networks to represent interactions between and within data is a quickly emerging discipline of significant importance. Network Analytics combines graph theory, optimization, data science, data visualization, community and cluster analysis, and more. Topics in this course will help answer intriguing questions such as, “How can we make sense of large, highly-associated data sets, ranging from social networks to the smart power grid?” or “Which models are more accurate for predicting popularity on Twitter?” or “How can we estimate the spread of a contagion or of information?” The course will begin with a discussion of applications, specifically to data science and analytics. From there, a formal framework for analysis of graphs and trees will be introduced. This will include graph theory and representation, optimization, and graph-based algorithms. Next, packages in Python and/or R will be investigated for the purposes of exploring and visualizing data that contain relationships. These packages will then be used to model and analyze complex data sets for the purposes of community detection, path analysis, influencer assessment, logistics analytics, contagion or information spread (such as rumor spreading), web page ranking, and more. Examples of data science applications are provided with real-world data sets including social network data, web-based data, attributed data, flow data, biological data, and more. 3 credits. Offered in the Summer semester.

DSAN 6500: Computer Vision & Generative Image Modeling

Computer Vision Analytics & Generative Image Modeling offers a comprehensive introduction to image mining and computer vision. The course covers image acquisition, representation, and processing, including convolution, Fourier transforms, filters, and feature generation. Advanced topics include classification, segmentation, spatial relations, deep-fake detection, object tracking, and image sentiment analysis. Students will explore cutting-edge techniques such as diffusion models, Vision Transformers, advanced GANs (StyleGAN, CycleGAN), Neural Radiance Fields (NeRFs), zero-shot and few-shot learning, image-to-image translation, self-supervised learning, 3D generative models, cross-modal generation, and adversarial robustness. Practical applications include facial recognition, OpenCV in Python, deep fake detection, gesture analysis, and object/scene categorization. This course equips students with the skills to develop sophisticated computer vision and generative image solutions. 3 credits. Offered in the Spring semester.

DSAN 6550: Adaptive Measurement with AI

This course provides an opportunity for students to learn new algorithms and data science methods in measurement, which is applied across all research fields. Unlike traditional one-size-fits-all assessments, adaptive tests, surveys, scales, and games are tailored to individuals by their ability, interests, behavior, health status, and learning requirements; how to build them is the major theme explored in this course. Topics will include, but are not limited to, fundamental psychometric modeling, item response theory, item banks, item information, equating and differential item functioning, cognitive diagnostic modeling, adaptive testing, game-based assessment, sequence mining on process data, generalized models in large-scale assessment, automated item scoring with NLP, personalized assessment design, and generative AI for automated item generation. In addition, this course will invite 1-2 experts from industry to share fresh ideas and the latest products in measurement with the class. The knowledge taught in this course is emerging and has attracted rapidly increasing attention in recent years. It is in especially high demand at organizations and government agencies that manage large-scale assessments (e.g., World Bank, USAID, OECD, AIR, NCES), high-tech learning, education, and game companies (e.g., Pearson, ETS, Duolingo, Roblox), e-commerce and high-tech platforms that embed behavioral science (e.g., Amazon, Meta, Google), public health (e.g., NIH), and medical recovery services (e.g., hospitals, mental health centers), to name a few, and it extends to general needs in interdisciplinary research and survey design. Prerequisite: Basic statistical knowledge and programming skills in R or Python. 3 credits. Offered in the Spring semester.

DSAN 6600: Neural Networks and Advanced Deep Learning

Neural Networks and Advanced Deep Learning explores both foundational and cutting-edge deep learning techniques. Students start with a short review of core concepts such as feed-forward networks, activation functions, backpropagation, and optimization using TensorFlow and Keras. The course reviews convolutional and recurrent neural networks, auto-encoders, and methods to prevent overfitting. Advanced topics include Transformers and attention mechanisms, Graph Neural Networks, self-supervised and contrastive learning, neural architecture search, adversarial robustness, energy-based models, and neural ODEs. Additionally, the curriculum delves into advanced optimization, neuro-symbolic AI, and bio-inspired deep learning. Various practical applications are covered to equip students with the skills to address complex deep learning challenges across diverse domains.  Prerequisite: DSAN 5100. 3 credits. Offered in the Fall and Spring semesters.

DSAN 6650: Reinforcement Learning

The field of machine learning is typically divided into three fundamental sub-paradigms: supervised learning, unsupervised learning, and reinforcement learning (RL). The discipline of reinforcement learning focuses on how intelligent agents learn to perform actions, inside a specified environment, to maximize a cumulative reward function. Over the past several decades, there has been a push to incorporate concepts from the field of deep learning into the agents used in RL algorithms. This has spawned the field of deep reinforcement learning. To date, the field of deep RL has yielded stunning results in a wide range of technological applications. These include, but are not limited to, self-driving cars, autonomous game play, robotics, trading and finance, and natural language processing. This course will begin with an introduction to the fundamentals of traditional, i.e. non-deep, reinforcement learning. After reviewing fundamental deep learning topics, the course will transition to deep RL by incorporating artificial neural networks into the models. The course includes a coding emphasis to showcase applied implementations of RL within the Python ecosystem. Topics include multi-armed bandits, contextual bandits, function approximation via neural networks, Markov decision processes, Monte Carlo methods, temporal difference learning, deep Q-learning, actor-critic methods, and policy gradient methods. Students must have intermediate coding experience in Python as well as knowledge of introductory statistics and multivariable calculus. 3 credits. Offered in the Fall semester.

DSAN 6700: Machine Learning App Deployment

This course will focus on gathering data to build an ensemble of machine learning methods that can predict, classify, and explore data entered by a client. This is an applied class that will work through the steps required to create and deploy online an ML application that can be utilized by a client. Topics will include a very brief review of common ML techniques including NB, SVM, DT, RF, ARM, NN, and clustering. APIs will be used to gather, explore, clean, and prepare data for training an ML model ensemble. The model will be deployed via a client-server web application to enable clients to enter data and receive a classification or prediction. Prerequisite: DSAN 5000. 3 credits. Offered in the Fall semester.

DSAN 6725: Applied Generative AI for AI Developers

This course is designed for AI developers aiming to build cutting-edge Generative AI (GenAI) applications. Focusing on the applied side of AI, students will explore key techniques such as in-context learning (ICL), retrieval-augmented generation (RAG), AI agents, and responsible AI principles. The course covers advanced tools and methods, including embedding models, inference optimizations (e.g., quantization, multi-adapter swapping), fine-tuning of pre-trained models and benchmarking LLMs. Students will gain hands-on experience with open-source tools like LangChain, LlamaIndex, and platforms such as AWS, applying their skills in practical GenAI applications. The course culminates in a capstone project, preparing participants to deploy scalable, optimized AI systems in real-world scenarios. This course bridges the gap between data science knowledge and applied AI development, empowering students to solve industry-level challenges. This is an advanced course open to second-year students. However, first-year students may register with permission from the instructor. 3 credits. Offered in the Spring semester.

DSAN 6750: Geographic Information Systems (GIS) and Applications

Geographic Information Systems (GIS) are used as tools for describing, analyzing, managing, and presenting information about the relationships between geographical and spatial locations, sizes, and shapes. This is known as attribute data. GIS uses techniques that can represent social and environmental data as a map, with a significant number of applications including those in engineering, architecture, public health, environmental science, and business. GIS data will be created through a variety of methods including those offered by global positioning system (GPS) technologies. This course will assume knowledge of R and Python. 3 credits. Offered in the Fall semester.

DSAN 6800: Principles of Cybersecurity

This course explores several aspects of modern security systems, risk management, and security policies, and provides an overview of digital forensics. The course begins by defining information security and the need for security policies and controls. It then moves into security management, risk management, incident response and planning, and the ethical and legal issues surrounding security. Technical topics include authentication, authorization, security appliances, and cryptography. The course ends with an overview of digital forensics along with security maintenance and auditing. 3 credits. Offered in the Fall and Spring semesters.

DSAN 6850: NLP with Large Language Models

In recent times, Large Language Models (LLMs) have earned the attention of the world. OpenAI’s well-known generative LLM, ChatGPT, became the fastest-growing consumer application in history in only two months, and the feverish interest around LLMs continues to grow. This course is concerned with applying LLMs to natural language processing (NLP) problems in real-life settings. This is a seminar-based course, so students will spend the majority of time outside of class reading, with in-class time dedicated to presenting and discussing recent research developments in NLP. The course will begin with a review of the transformer architecture that underlies LLMs and describe its prominent role in modern NLP. Then, we will discuss modern issues in using transformers, including the training and scaling of transformer-based models, variations on the classic transformer, transfer learning in low-resource settings, model deployment, distributed systems, and more. Meta-learning, multimodal learning, and societal impact will also be covered. Students will work on applications such as cross-language information retrieval, machine translation, prompt engineering, and select tasks outside of NLP. By the end of the course, students will have mastered transformer-based models and will be poised to use them at the cutting edge of NLP practice today. 3 credits. Offered in the Spring semester.

DSAN 7000: Advanced Research Methodologies (Capstone Project)

The aim of this course is to teach students advanced research methodologies for developing data science research projects, as well as effective scientific writing. Topics will include the principles of research question development, research method selection, effective interpretation of results, effective writing, appropriate journal paper writing styles, peer review, authorship, and communicating scientific ideas to academic and non-academic audiences. In addition, this course will provide guidance on paper publishing, including how to identify an appropriate journal, navigate the selection process, edit, and measure impact. Ideal outcomes include improved self-editing, development of effective strategies for offering and receiving concise editorial recommendations among peers, and finalizing a research paper to be submitted to academic publications (e.g., journals, conferences, and research reports). This course specifically targets students who are interested in learning academic research practices and/or pursuing further education opportunities in a PhD program. Students are encouraged to submit high-quality papers to academic journals. Although publication cannot be guaranteed, students will experience the learning process of submitting a research paper. Prerequisites: DSAN 5000, DSAN 5100, DSAN 5300; knowledge of data science, introductory statistics, and statistical learning; and coding experience in R and/or Python. 3 credits. Offered in the Fall semester.
