Recent Research and Publication Summary
* Strawn, N. (2021). Filament Plots for Data Visualization. arXiv preprint arXiv:2107.10869.
The efficiency of modern computer graphics allows us to explore collections of space curves simultaneously with “drag-to-rotate” interfaces. This inspires us to replace “scatterplots of points” with “scatterplots of curves” to simultaneously visualize relationships across an entire dataset. Since spaces of curves are infinite dimensional, scatterplots of curves avoid the “lossy” nature of scatterplots of points. In particular, if two points are close in a scatterplot of points derived from high-dimensional data, it does not generally follow that the two associated data points are close in the data space. Standard Andrews plots provide scatterplots of curves that perfectly preserve Euclidean distances, but simultaneous visualization of these graphs over an entire dataset produces visual clutter because graphs of functions generally overlap in 2D. We mitigate this visual clutter issue by constructing computationally inexpensive 3D extensions of Andrews plots. First, we construct optimally smooth 3D Andrews plots by considering linear isometries from Euclidean data spaces to spaces of planar parametric curves. We rigorously parametrize the linear isometries that produce (on average) optimally smooth curves over a given dataset. This parameterization of optimal isometries reveals many degrees of freedom, and (using recent results on generalized Gauss sums) we identify a particular member of this set which admits an asymptotic “tour” property that avoids certain local degeneracies as well. Finally, we construct unit-length 3D curves (filaments) by numerically solving Frenet-Serret systems given data from these 3D Andrews plots. We conclude with examples of filament plots for several standard datasets, illustrating how filament plots avoid visual clutter. Code and examples are available at https://github.com/n8epi/filaments. Below is an example of filament plots for the Wisconsin breast cancer dataset.
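As a side check on the distance-preservation property of standard (2D) Andrews plots mentioned in the abstract above, the following minimal sketch builds classical Andrews curves and compares curve distances with data distances. It is not taken from the filaments repository and does not implement the 3D construction. The function names `andrews_curves` and `curve_l2_distance` are hypothetical, and the basis ordering (1/√2, sin t, cos t, sin 2t, cos 2t, ...) is the textbook convention, assumed here.

```python
import numpy as np


def andrews_curves(X, t):
    """Evaluate classical Andrews curves for each row of X on the grid t.

    Row x maps to f_x(t) = x_1/sqrt(2) + x_2 sin t + x_3 cos t + x_4 sin 2t + ...
    """
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    basis = np.empty((d, t.size))
    for k in range(d):
        if k == 0:
            basis[k] = 1.0 / np.sqrt(2.0)
        else:
            freq = (k + 1) // 2  # frequencies 1, 1, 2, 2, 3, 3, ...
            basis[k] = np.sin(freq * t) if k % 2 == 1 else np.cos(freq * t)
    return X @ basis


def curve_l2_distance(f, g, t):
    """L2 distance between two sampled curves.

    Assumes t is a uniform grid over one full period with the endpoint
    excluded, so a plain Riemann sum integrates trigonometric polynomials
    accurately.
    """
    dt = t[1] - t[0]
    return np.sqrt(np.sum((f - g) ** 2) * dt)


rng = np.random.default_rng(0)
X = rng.normal(size=(2, 6))  # two synthetic data points in R^6
t = np.linspace(-np.pi, np.pi, 512, endpoint=False)
F = andrews_curves(X, t)

curve_dist = curve_l2_distance(F[0], F[1], t)
data_dist = np.linalg.norm(X[0] - X[1])
# With this basis the two distances agree up to the constant factor sqrt(pi).
print(curve_dist, np.sqrt(np.pi) * data_dist)
```

The constant factor appears because each basis function has squared L2 norm π on [-π, π] and the basis functions are mutually orthogonal, so the map from data points to curves is an isometry up to scaling.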
* George, T. B., Strawn, N. K., & Leviyang, S. (2021). Tree-Based Co-Clustering Identifies Chromatin Accessibility Patterns Associated With Hematopoietic Lineage Structure. Frontiers in Genetics, 1892.
Using an ATACseq dataset recently published by the ImmGen consortium, we construct associations between chromatin accessibility and hematopoietic cell types using a novel co-clustering approach that accounts for the structure of the hematopoietic differentiation tree. Under a model in which all loci and cell types within a co-cluster have a shared accessibility state, we show that roughly 80% of cell-type-associated accessibility variation can be captured through 12 cell type clusters and 20 genomic locus clusters, with the cell type clusters reflecting coherent components of the differentiation tree. Using publicly available ChIPseq datasets, we show that our clustering reflects transcription factor binding patterns with implications for regulation across cell types. We show that traditional methods such as hierarchical and k-means clustering lead to cell type clusters that are more dispersed on the tree than those from our tree-based algorithm. We provide a Python package, chromcocluster, that implements the algorithms presented.
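To make the "shared accessibility state" model concrete, here is a minimal sketch of one generic way to measure how much variation a co-clustering captures: fit each (locus cluster, cell type cluster) block by its mean and compare residual to total variation. This is an illustration only. It does not use the chromcocluster API, the function name `cocluster_variance_explained` is hypothetical, and the paper's exact definition of captured variation may differ.

```python
import numpy as np


def cocluster_variance_explained(A, row_labels, col_labels):
    """Fraction of the variation in A captured by a block-constant model.

    A          : 2D array (e.g. loci x cell types), hypothetical input layout
    row_labels : cluster label for each row
    col_labels : cluster label for each column
    """
    A = np.asarray(A, dtype=float)
    row_labels = np.asarray(row_labels)
    col_labels = np.asarray(col_labels)

    fitted = np.empty_like(A)
    for r in np.unique(row_labels):
        rows = row_labels == r
        for c in np.unique(col_labels):
            cols = col_labels == c
            block = np.ix_(rows, cols)
            # Every entry of a co-cluster shares one state: the block mean.
            fitted[block] = A[block].mean()

    total = ((A - A.mean()) ** 2).sum()
    residual = ((A - fitted) ** 2).sum()
    return 1.0 - residual / total


# Toy matrix that is exactly block-constant under the given labels.
A = np.array([[1.0, 1.0, 5.0],
              [1.0, 1.0, 5.0],
              [3.0, 3.0, 7.0]])
print(cocluster_variance_explained(A, [0, 0, 1], [0, 0, 1]))
```

A value near 1 means the block means reproduce the matrix almost exactly. Putting all rows and all columns into a single cluster gives 0, since the fit is then just the overall mean.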
* Strawn, N. (2017, August). Framed frames for data frames. In Wavelets and Sparsity XVII (Vol. 10394, p. 103941A). International Society for Optics and Photonics.
This work considers mapping datasets to sets of images via “image space embeddings”. Such embeddings induce linear dictionaries for data via standard dictionaries for image processing (e.g. wavelets, shearlets, etc). The images obtained via these embeddings may also be used to visualize data in a “lossless” manner. Below, the first image is 100 image space mappings of “benign” examples from the Wisconsin breast cancer dataset. The second image is 100 “malignant” examples from the same dataset.
* Minsker, S., & Strawn, N. (2017). Distributed Statistical Estimation and Rates of Convergence in Normal Approximation. arXiv preprint arXiv:1704.02658.
In this work, we consider finite-sample bounds for geometric medians, and we show that approximate minimizers of the mean norm difference functional also approximate the geometric median in an asymptotically stable manner.
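For readers unfamiliar with the geometric median, it is the point m that minimizes the mean norm difference (1/n) Σ ||x_i - m|| over a set of points x_i. The sketch below computes an approximate minimizer with the classical Weiszfeld iteration. This is a standard textbook algorithm, swapped in for illustration, and not necessarily the procedure analyzed in the paper. The function name, tolerance, and iteration cap are assumptions.

```python
import numpy as np


def geometric_median(points, tol=1e-9, max_iter=1000):
    """Approximate the geometric median of the rows of `points` (Weiszfeld)."""
    points = np.asarray(points, dtype=float)
    estimate = points.mean(axis=0)  # start from the ordinary mean
    for _ in range(max_iter):
        dists = np.linalg.norm(points - estimate, axis=1)
        # Guard against division by zero if the iterate hits a data point.
        weights = 1.0 / np.maximum(dists, 1e-12)
        new_estimate = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_estimate - estimate) < tol:
            return new_estimate
        estimate = new_estimate
    return estimate


# Three points at the origin and one far outlier: the mean is pulled to
# (2.5, 0), while the geometric median stays at the majority location.
pts = np.array([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [10.0, 0.0]])
print(pts.mean(axis=0), geometric_median(pts))
```

The example illustrates the robustness that motivates the geometric median: a single distant point shifts the mean substantially but leaves the median essentially unchanged.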