Announcements

Faculty Speaker Abstracts

Dr. Feifang Hu | Chair, Department of Statistics, George Washington University

Talk: AI Statisticians in Clinical Trials

Statisticians play a central role in clinical trials, contributing to study design, sample size determination, randomization, interim monitoring, data analysis, and regulatory reporting. As clinical trials become increasingly complex and data-rich, there is growing interest in leveraging artificial intelligence to augment and partially automate statistical workflows. This talk provides an overview of the traditional responsibilities of statisticians in clinical trials and examines how recent advances in AI can be used to develop “AI statisticians,” intelligent agents capable of assisting with key tasks such as protocol design, adaptive randomization, model selection, interim analysis, and reporting. We will discuss the conceptual framework for building such AI agents, including the integration of statistical principles with large language models, domain knowledge, and decision-making algorithms. Through concrete demonstrations, the talk will illustrate how AI statisticians can support or partially replace routine statistical functions while maintaining rigor and reproducibility. The presentation will also highlight current limitations, ethical considerations, and regulatory challenges, and outline future directions for the role of AI in transforming statistical practice in clinical trials.
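Of the tasks listed above, adaptive randomization is perhaps the easiest to make concrete. The sketch below is a toy illustration, not taken from the talk: it uses Efron's classical biased coin, which assigns the under-enrolled arm with probability p to keep the trial balanced, as a simple stand-in for the adaptive designs an AI agent might manage.

    import random

    def efron_biased_coin(n_subjects, p=2/3, seed=0):
        """Efron's biased coin: when the arms are unbalanced, assign the
        under-enrolled arm with probability p (> 1/2) to restore balance."""
        rng = random.Random(seed)
        counts = {"treatment": 0, "control": 0}
        assignments = []
        for _ in range(n_subjects):
            if counts["treatment"] == counts["control"]:
                arm = rng.choice(["treatment", "control"])  # tied: fair coin
            else:
                lagging = min(counts, key=counts.get)       # under-enrolled arm
                leading = max(counts, key=counts.get)
                arm = lagging if rng.random() < p else leading
            counts[arm] += 1
            assignments.append(arm)
        return assignments, counts

    _, counts = efron_biased_coin(21)
    print(counts)  # arm totals stay close to balanced throughout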

Dr. Qiwei Britt He | Provost’s Distinguished Associate Professor in Data Science and Analytics, Georgetown University

Talk: Reimagining Future Educational Assessments Through Multimodal Data and AI Integration

Computer-based assessments create new opportunities to capture rich, fine-grained log data from human-machine interactions, offering deeper insights into test-takers’ strategies and behaviors. These multidimensional sequential data, spanning actions, timing information, eye-tracking, and other behavioral traces, pose significant challenges for traditional unidimensional sequence models. This presentation provides an overview of sequence mining techniques for analyzing unstructured process data, with a focus on how these methods can inform item design and strengthen measurement frameworks. It will also discuss the broader potential of integrating generative AI techniques for test design in large-scale assessments, highlighting emerging opportunities for more adaptive, inclusive, and behaviorally grounded measurement systems.
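To give a flavor of what sequence mining over process data can look like, here is a toy sketch (the action names and logs are invented): counting frequent n-grams of actions, one of the simplest ways to surface common response strategies from log files.

    from collections import Counter

    # Toy process-data logs: ordered actions recorded for three test-takers.
    logs = [
        ["start", "read_item", "drag", "drag", "check", "submit"],
        ["start", "read_item", "check", "drag", "check", "submit"],
        ["start", "drag", "drag", "drag", "submit"],
    ]

    def ngram_counts(sequences, n=2):
        """Count every length-n run of consecutive actions across all logs."""
        counts = Counter()
        for seq in sequences:
            for i in range(len(seq) - n + 1):
                counts[tuple(seq[i:i + n])] += 1
        return counts

    # Frequent bigrams hint at shared strategies (e.g., checking work before
    # submitting); richer models add timing and other modalities on top.
    for gram, count in ngram_counts(logs).most_common(3):
        print(" -> ".join(gram), count)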

Dr. Michael Baron | Professor of Statistics, American University

Talk: Multiple Testing and Minimax Error Spending

Multiple hypothesis testing is one of the central problems in modern clinical trials and many other applications. Such studies require statistical decisions for each individual hypothesis while controlling the familywise Type I and Type II error rates. By introducing asymmetry into error spending across hypotheses, we formulate, justify, and solve the associated minimax optimization problems, optimizing sample size, cost, or risk while maintaining rigorous control of the familywise error rates.
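To see why asymmetric error spending can help, consider a back-of-the-envelope Bonferroni-style split of the familywise error budgets between two hypotheses with different effect sizes (a toy sketch, not the minimax solution developed in the talk; the effect sizes and splits are invented).

    from scipy.stats import norm

    def n_per_arm(alpha, beta, delta, sigma=1.0):
        """Per-arm sample size for a two-sided two-sample z-test with
        Type I error alpha, Type II error beta, and effect size delta."""
        z_a = norm.ppf(1 - alpha / 2)
        z_b = norm.ppf(1 - beta)
        return 2 * (sigma * (z_a + z_b) / delta) ** 2

    # Familywise budgets alpha = 0.05, beta = 0.20 across two hypotheses.
    # Hypothesis 1 has a large effect (easy); hypothesis 2 a small one (hard).
    deltas = [0.8, 0.3]

    # Symmetric spending: split both budgets evenly (0.025 and 0.10 each).
    n_sym = n_per_arm(0.025, 0.10, deltas[0]) + n_per_arm(0.025, 0.10, deltas[1])

    # Asymmetric spending: be strict on the easy hypothesis, where strictness
    # is cheap, and spend the saved budget on the hard one.
    n_asym = n_per_arm(0.005, 0.05, deltas[0]) + n_per_arm(0.045, 0.15, deltas[1])

    print(f"equal split: {n_sym:.0f}/arm, uneven split: {n_asym:.0f}/arm")

The uneven split needs noticeably fewer subjects in total while spending the same familywise budgets; the minimax formulations in the talk make such trade-offs systematic.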

Dr. Takumi Saegusa | Associate Professor of Statistics, University of Maryland

Talk: Data Integration: From Classical Sampling to Big Data

The rapid expansion of AI-driven data environments has transformed the landscape of data collection and integration. Data integration itself is not new in statistical methodology; for example, multiple-frame surveys combine heterogeneous datasets obtained through well-designed surveys. However, methods originally developed for multiple-frame surveys must now be reconsidered in a broader setting in which non-probability samples, collected without known sampling designs, play an increasingly prominent role. We discuss what has been developed so far and what remains to be done in the context of estimating a survival function. We also consider ways in which AI may help address challenges arising from overlapping datasets and dependence among observations.
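For readers unfamiliar with multiple-frame estimation, the following toy sketch shows the classical dual-frame idea on synthetic data (estimating a population total rather than the survival function treated in the talk; all numbers are invented). Units in the overlap of the two frames can be reached by either sample, so their two estimates are blended with a composite weight, as in Hartley's estimator.

    import numpy as np

    rng = np.random.default_rng(1)

    # Synthetic population in three domains: 'a' (frame A only),
    # 'ab' (covered by both frames), and 'b' (frame B only).
    y_a, y_ab, y_b = (rng.normal(10, 2, 400), rng.normal(12, 2, 300),
                      rng.normal(15, 2, 300))
    true_total = y_a.sum() + y_ab.sum() + y_b.sum()

    # Frame A lists domains a and ab; frame B lists ab and b.
    frame_A = np.concatenate([y_a, y_ab])
    overlap_A = np.arange(frame_A.size) >= y_a.size
    frame_B = np.concatenate([y_ab, y_b])
    overlap_B = np.arange(frame_B.size) < y_ab.size

    def ht_domain_totals(frame, overlap, n):
        """Horvitz-Thompson totals of the non-overlap and overlap parts
        of a simple random sample of size n from one frame."""
        idx = rng.choice(frame.size, size=n, replace=False)
        w = frame.size / n                    # SRS design weight N/n
        s, m = frame[idx], overlap[idx]
        return w * s[~m].sum(), w * s[m].sum()

    tA_a, tA_ab = ht_domain_totals(frame_A, overlap_A, 150)
    tB_b, tB_ab = ht_domain_totals(frame_B, overlap_B, 150)

    theta = 0.5  # composite weight for the doubly covered domain
    estimate = tA_a + theta * tA_ab + (1 - theta) * tB_ab + tB_b
    print(f"estimate {estimate:.0f} vs. true total {true_total:.0f}")

Non-probability samples break this recipe because the design weights N/n are unknown, which is exactly the broader setting the talk addresses.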

Dr. Wanli Qiao | Associate Professor of Statistics, George Mason University

Talk: From Modes to Ridges: Consistency of Mean Shift Algorithms

The Mean Shift algorithm is a non-parametric technique used to locate the local maxima, or modes, of a density function. By iteratively shifting points toward regions of higher density, the procedure offers a natural way to identify cluster centers. While it is a classical algorithm for modal clustering, its extension, the Subspace Constrained Mean Shift (SCMS) algorithm, can be used to capture more complex structures called density ridges, which serve as a geometric skeleton for the underlying data distribution. We present a comprehensive framework for the statistical consistency of these procedures. First, we discuss recent results providing rigorous guarantees for the recovery of cluster structures via the hill-climbing Mean Shift algorithm. We then extend these principles to the SCMS algorithm, establishing its consistency in ridge estimation and exploring the geometric properties that enable the recovery of manifold-like structures. Together, these results provide a formal basis for understanding how Mean Shift-type algorithms navigate the geometric landscapes of complex distributions.
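For readers who want to see the hill-climbing step concretely, here is a minimal NumPy sketch of the Mean Shift iteration with a Gaussian kernel (the bandwidth and data are illustrative, not from the talk): each point is repeatedly replaced by a kernel-weighted average of the data, moving uphill in estimated density until it settles at a mode.

    import numpy as np

    def mean_shift(X, bandwidth=0.5, n_iter=500, tol=1e-8):
        """Shift each point toward higher density under a Gaussian-kernel
        density estimate; converged points sit at local modes."""
        Y = X.copy()
        for _ in range(n_iter):
            # Squared distances from current points Y to the (fixed) data X
            d2 = ((Y[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
            W = np.exp(-d2 / (2 * bandwidth ** 2))          # kernel weights
            Y_new = (W @ X) / W.sum(axis=1, keepdims=True)  # mean shift step
            if np.abs(Y_new - Y).max() < tol:
                break
            Y = Y_new
        return Y_new

    # Two well-separated blobs: every point converges to one of two modes,
    # and points sharing a mode form one cluster.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
    print(np.unique(mean_shift(X).round(1), axis=0))

The SCMS algorithm discussed in the talk modifies this step by projecting each shift onto the subspace spanned by the minor eigenvectors of the local Hessian, so points settle on density ridges rather than at modes.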