The n-dimensional array (ndarray) is a ubiquitous data structure in scientific computing, used to represent time-varying movies of neural activity, collections of satellite images, sensor time series, and more. The ndarray generalizes the two-dimensional matrix to an arbitrary number of dimensions, and many applications could benefit from efficient distributed implementations. While Spark's distributed DataFrame provides rich support for large tabular data, handling ndarrays remains a challenge. This talk introduces Bolt, an open-source implementation of an ndarray built on PySpark. Bolt provides a familiar API that enables distributed computations across one or more array dimensions at a time. It also implements an efficient chunking scheme to minimize shuffle complexity, and it analyzes shape information to simplify error handling. In this talk, we look at Bolt's design and implementation before taking a deep dive into a particular use case: probing how decision making works in the brain by analyzing whole-brain neuroimaging and behavioral data from an animal engaged in a virtual reality environment.
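
To make the idea of computing "across one or more array dimensions" concrete, here is a minimal single-machine sketch in plain NumPy. It is not Bolt's API or implementation: the helper names `to_records` and `map_records` are hypothetical, and the list of records only stands in for what would be a distributed collection in a Spark setting. The array is split into (key, value) records along chosen key axes, a function is applied to each value block independently, and the results are reassembled.

```python
import numpy as np
from itertools import product


def to_records(arr, key_axes):
    """Split `arr` into (key, value) records indexed along `key_axes`.

    Hypothetical helper for illustration; not part of Bolt.
    """
    key_axes = tuple(key_axes)
    value_axes = tuple(ax for ax in range(arr.ndim) if ax not in key_axes)
    # Move the key axes to the front so that indexing with a key tuple
    # returns the block spanning the remaining (value) axes.
    moved = np.transpose(arr, key_axes + value_axes)
    key_shape = moved.shape[:len(key_axes)]
    keys = product(*(range(n) for n in key_shape))
    return [(key, moved[key]) for key in keys], key_shape


def map_records(records, key_shape, func):
    """Apply `func` to every value block and reassemble an ndarray."""
    # Each record is processed independently, which is what would make
    # this step easy to parallelize in a distributed setting.
    results = [(key, np.asarray(func(value))) for key, value in records]
    value_shape = results[0][1].shape
    out = np.empty(key_shape + value_shape, dtype=results[0][1].dtype)
    for key, value in results:
        out[key] = value
    return out


# Toy "movie" with axes (time, x, y): average over time for each pixel.
movie = np.arange(24, dtype=float).reshape(2, 3, 4)
records, key_shape = to_records(movie, key_axes=(1, 2))
mean_image = map_records(records, key_shape, np.mean)
```

Here the pixel axes act as keys, so each record holds one pixel's time series, and `mean_image` has shape `(3, 4)`. Choosing different key axes changes which dimensions are processed independently without changing the function being applied.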


Jason Wittenbach is a Senior Machine Learning Engineer in the "Center for Machine Learning" at Capital One. He has a BS in Mathematics and Physics from the University of Notre Dame and a PhD in Physics from The Pennsylvania State University. During his doctoral work, he studied mathematical models of neural circuits to understand how the brain uses sensory-motor feedback in decision making, using the songbird as a model system. As a postdoc at the Janelia Research Campus (Howard Hughes Medical Institute), he built computational tools for processing large neuroimaging datasets and employed machine learning techniques to uncover links between neural activity and animal behavior. At Capital One, he has worked on projects in market intelligence and anti-money laundering, and has developed an automated hyperparameter tuning platform. Jason is interested in designing and leveraging cutting-edge machine learning techniques to solve important domain-specific problems, as well as architecting and building software tools that support these analytic methods.
