Title:

Learning & Exploiting Low-Dimensional Structure in High-Dimensional Data

Abstract:

This talk will focus on the problem of learning low-dimensional geometric structure in high-dimensional data. We allow this lower-dimensional structure to be non-linear. A variety of algorithms are available for “manifold learning” and non-linear dimensionality reduction, but most rely on locally linear approximations and do not provide a likelihood-based framework for inference. We propose a new class of simple geometric dictionaries for characterizing the subspace, along with a simple optimization algorithm and a model-based approach to inference. We provide strong theoretical support, in the form of tight bounds on covering numbers, showing the advantages of our approach relative to locally linear dictionaries. These advantages are shown to carry over to practical performance in a variety of settings, including manifold learning, manifold de-noising, data visualization, classification (providing a competitor to deep neural networks that requires fewer training examples), and geodesic distance estimation. We additionally provide a Bayesian nonparametric methodology for inference, using a new class of kernels, which is shown to outperform current methods such as mixtures of multivariate Gaussians.
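For context, a minimal sketch of the locally linear dictionaries that the abstract contrasts with: each local neighborhood of the data is approximated by a low-dimensional affine (PCA) element. The noisy-circle data, the k-means partitioning, and the scikit-learn calls below are illustrative assumptions, not the speaker's method.

```python
# Sketch: approximate a noisy circle (a 1-D manifold in R^2) with a
# locally linear dictionary built from per-neighborhood PCA.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Noisy samples from the unit circle (intrinsic dimension 1, ambient dimension 2).
theta = rng.uniform(0.0, 2.0 * np.pi, size=2000)
X = np.column_stack([np.cos(theta), np.sin(theta)])
X += 0.02 * rng.standard_normal(X.shape)

# Partition the data into local neighborhoods (here via k-means).
n_pieces = 10
labels = KMeans(n_clusters=n_pieces, n_init=10, random_state=0).fit_predict(X)

# Fit a 1-D PCA element in each piece and reconstruct each point
# from its local linear approximation.
X_hat = np.empty_like(X)
for k in range(n_pieces):
    idx = labels == k
    pca = PCA(n_components=1).fit(X[idx])
    X_hat[idx] = pca.inverse_transform(pca.transform(X[idx]))

mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))
print(f"mean squared reconstruction error with {n_pieces} linear pieces: {mse:.5f}")
```

Such piecewise-linear approximations need many elements to capture curvature; the covering-number bounds mentioned in the abstract quantify how a curvature-aware geometric dictionary can do better with fewer elements.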

Short Bio:

David Dunson is Arts & Sciences Distinguished Professor of Statistical Science and Mathematics at Duke University. His research focuses on developing methodology for the analysis and interpretation of complex and high-dimensional data, with a particular emphasis on Bayesian and probability modeling approaches. He is particularly interested in work at the intersection of statistics, differential geometry, and computer science. Methods development and theory are directly motivated by applications in neuroscience, genomics, environmental health, and ecology, among others. In these settings, it is common for data to have a structured form, consisting of replicated networks/graphs, trees, functions, tensors, etc. A focus is on developing fundamentally new frameworks for statistical inference in challenging settings, including improving robustness to modeling assumptions and scalability to large datasets. He has won numerous awards, including the 2010 COPSS Presidents’ Award, widely viewed as the most prestigious award in statistics and often described as the field’s equivalent of the Fields Medal, awarded annually to one outstanding researcher under the age of 41 internationally. His work has had substantial impact, with ~48,000 citations on Google Scholar and an h-index of 75.