Scalable nd-arrays for Neuroimaging and Beyond


Jason Wittenbach, Capital One


The n-dimensional array (ndarray) is a ubiquitous data structure in scientific computing, whether the task is analyzing time-varying movies of neural activity, collections of satellite images, or sensor time series. The ndarray generalizes the two-dimensional matrix to data spanning any number of dimensions, and many applications could benefit from efficient distributed implementations. While Spark’s distributed DataFrame provides rich support for large tabular data, handling ndarrays remains a challenge. This talk introduces Bolt, an open-source implementation of an ndarray built on PySpark. Bolt provides a familiar API that enables distributed computations across one or more array dimensions at a time. It also implements an efficient chunking scheme to minimize shuffle complexity, and it analyzes shape information to simplify error handling. In this talk, we look at Bolt’s design and implementation before taking a deep dive into a particular use case: probing how decision making works in the brain by analyzing whole-brain neuroimaging and behavioral data from an animal engaged in a virtual reality environment.
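To make the idea of computing across one or more array dimensions concrete, here is a minimal in-memory sketch in plain NumPy. It is not Bolt's API and does not use Spark: the helper names (`to_records`, `map_values`, `from_records`) are hypothetical, and a Python list of (key, value) pairs stands in for a distributed collection. The assumption is that the leading axes of the array serve as keys and the remaining axes form the value that each worker would process.

```python
import numpy as np

def to_records(arr, n_key_axes):
    """Split an array into (key, value) records along its leading axes.

    Each key is an index tuple over the first n_key_axes dimensions;
    each value is the sub-array holding the remaining dimensions.
    """
    key_shape = arr.shape[:n_key_axes]
    return [(key, arr[key]) for key in np.ndindex(*key_shape)]

def map_values(records, func):
    """Apply func to every value independently, keeping the keys."""
    return [(key, np.asarray(func(value))) for key, value in records]

def from_records(records, key_shape):
    """Reassemble records into one array of shape key_shape + value_shape."""
    first_value = records[0][1]
    out = np.empty(tuple(key_shape) + first_value.shape, dtype=first_value.dtype)
    for key, value in records:
        out[key] = value
    return out

# Toy "movie": 4 time points, each a 3 x 5 image (hypothetical data).
movie = np.arange(4 * 3 * 5, dtype=float).reshape(4, 3, 5)

# Key on the time axis and compute the mean of each frame.
records = to_records(movie, n_key_axes=1)
frame_means = from_records(map_values(records, np.mean), key_shape=(4,))

# Key on (time, row) and normalize each row of pixels independently.
row_records = to_records(movie, n_key_axes=2)
centered = from_records(
    map_values(row_records, lambda v: v - v.mean()), key_shape=(4, 3)
)
```

Because every record is processed independently, the mapping step is the part that could be spread across workers. Choosing which axes act as keys determines the granularity of that parallelism.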


Jason Wittenbach is a Senior Machine Learning Engineer in the “Center for Machine Learning” at Capital One. He has a BS in Mathematics and Physics from the University of Notre Dame and a PhD in Physics from The Pennsylvania State University. During his doctoral work, he studied mathematical models of neural circuits to understand how the brain uses sensory-motor feedback in decision making, using the songbird as a model system. As a postdoc at the Janelia Research Campus (Howard Hughes Medical Institute), he built computational tools for processing large neuroimaging datasets and employed machine learning techniques to uncover links between neural activity and animal behavior. At Capital One, he has worked on projects in market intelligence and anti-money laundering, and has developed an automated hyperparameter tuning platform. Jason is interested in designing and leveraging cutting-edge machine learning techniques to solve important domain-specific problems, as well as architecting and building software tools that support these analytic methods.
