Sohail Bahmani, Georgia Institute of Technology

Nonlinear regression via convex programming

We consider a class of parametric regression problems where the signal is observed through random nonlinear functions with a Difference of Convex (DC) form. This model describes a broad subset of nonlinear regression problems that includes familiar special cases such as phase retrieval/quadratic regression and blind deconvolution/bilinear regression. Given the DC decomposition of the observation functions as well as an approximate solution, we formulate a convex program as an estimator that operates in the natural space of the signal. Our approach is computationally superior to the methods based on semidefinite/sum-of-squares relaxation---tailored for polynomial observation functions---and can compete with the non-convex methods studied in special regression problems. Furthermore, under mild moment assumptions, we derive the sample complexity of the proposed convex estimator using a PAC-Bayesian argument. We instantiate our results with bilinear regression with Gaussian factors and provide a method for constructing the required initial approximate solution.

Bernhard Bodmann, University of Houston

Random tessellations of projective space and accurate localization with yes/no questions

The problem considered in this talk is the approximate recovery of a one-dimensional subspace in $R^n$ or $C^n$ from qualitative information. To this end, we choose subspaces $\{V_j\}_{j=1}^m$.
For each vector $x$, we then obtain information whether $x$ is closer to $V_j$ or to the orthogonal complement $V_j^\perp$. Based on the answers to these yes/no questions, we estimate the span of $\{x\}$.
Our choice of the subspaces is uniform at random with respect to the measure induced by the Haar measure on the unitaries. When the dimension $n$ is even, it is advantageous to choose the subspaces to be of dimension $n/2$. The results we obtain are related to a 1-bit compressed sensing recovery strategy by Plan and Vershynin, which is based on measuring with random Gaussian quadratic forms. For fixed $n$,
the error bound based on projections has asymptotically better accuracy than the bound obtained with Gaussian quadratic forms. The main result is as follows: For any $\alpha>0$, there is a universal constant $C_\alpha > 0$ such that for any even, sufficiently large $n$, $\delta > 0$, and $P = \{P_j\}_{j=1}^m$ independent,
uniformly distributed rank $n/2$-projections, with $m ≥ C_\alpha n^2\delta^{-2} log(n\delta^{-1}) the following property
holds with probability greater than $1-n^{-\alpha}$: For each $W = x \otimes x^*$, let $W$ be the projection onto the eigenspace corresponding to the maximal eigenvalue of
$$
H = \sum_{j=1}^m sgn(tr[W P_j ] − 1/2)P_j,
$$
then the operator norm bound
$$
\|W − x \otimes x^*\| ≤ \delta
$$
holds for all $x$, $\|x\| = 1$.

This is joint work with Dylan Domel-White.

Javier Alejandro Chavez-Dominguez, University of Oklahoma

Frame potential for finite-dimensional Banach spaces

Frames for Hilbert spaces, as overcomplete versions of bases, are quite useful in applications because they provide decompositions that are more robust. Those frames that consist of vectors of norm one and are additionally tight have even more computational advantages, e.g. they provide fast convergence for said decompositions.
Benedetto and Fickus have defined a \emph{frame potential}, a numerical quantity that can be assigned to a collection of finitely many vectors in a Hilbert space, which characterizes unit norm tight frames: a sequence of $k$ norm-one vectors in an $n$-dimensional Hilbert space (where $k \ge n$) has frame potential at least $k^2/n$, with equality if and only if the sequence forms a tight frame (and there always exist frames achieving this bound).
The main result of this paper is a generalization of the aforementioned result to the context of finite-dimensional Banach spaces. We use the $2$-summing norm to define a frame potential for a sequence of $k$ norm-one vectors in an $n$-dimensional Banach space (where $k \ge n$), which generalizes the Hilbert-space notion. This generalized potential is also bounded below by $k^2/n$, with equality if and only if the sequence is a tight frame. The arguments rely on the geometry of the space of $2$-summing maps from a finite-dimensional Banach space to itself. For a wide class of spaces, in particular complex $n$-dimensional Banach spaces with a $1$-unconditional basis, we show that the equality case does occur.

This is joint work with Dan Freeman and Keri Kornelson.

Bosu Choi, University of Texas at Austin

Sparse harmonic transforms:
a new class of sublinear-time algorithms for approximating functions of many variables

In this talk we will discuss fast and memory efficient numerical methods for learning the best $s$ term approximation of functions of many variables in terms of bounded orthonormal tensor product bases. Such functions appear in many applications including, e.g., various Uncertainty Quantification (UQ) problems involving the solution of parametric PDE that are approximately sparse in Chebyshev or Legendre product bases.

More concretely, let $\mathcal{B}$ be a finite Bounded Orthonormal Product Basis (BOPB) of cardinality $|\mathcal{B}| = N$. Herein we will develop methods that rapidly approximate any function $f$ that is nearly sparse in the BOPB, that is, $f$ of the form
\[f(\vect{x}) \approx \sum_{b \in \mathcal{S}} c_b \cdot b(\vect{x}) \]
with $\mathcal{S} \subset \mathcal{B}$ of $|\mathcal{S}| = s$ is much less than $N$.
Our method has a runtime of just $(s \log N)^{O(1)}$, uses only $(s \log N)^{O(1)}$ function evaluations on a fixed and nonadaptive grid, and not more than $(s \log N)^{O(1)}$ bits of memory. We emphasize that nothing about $\mathcal{S}$ or any of the coefficients $c_b$ is assumed in advance other than that $\mathcal{S} \subset \mathcal{B}$ has $|\mathcal{S}| = s$. Both $\mathcal{S}$ and its related coefficients $c_b$ will be learned from the given function evaluations by the developed method.

For $s \ll N$, the runtime $(s \log N)^{O(1)}$ will be less than what is required to simply enumerate the elements of the basis $\mathcal{B}$. This and the similarly reduced sample and memory requirements set our algorithm apart from previous works based on standard compressive sensing algorithms such as basis pursuit which typically store and utilize full intermediate basis representations of size $\Omega(N)$ during the solution process.

Besides these improved theoretical guarantees, also the empirical performance of our method improves over previous approaches. In particular, the enhanced memory efficiency allows us to tackle much larger problem sizes than in previous works.

This is joint work with Mark Iwen (Michigan State University) and Toni Volkmer (Chemnitz University of Technology).

Sinan Gunturk, Courant Institute

Quasirandom sequences in quantization and analog-to-digital conversion problems

Quasirandom (or low-discrepancy) sequences have a rich history
in numerical analysis. While they are typically employed in
high-dimensional numerical integration problems and Monte Carlo
simulation, they also arise in other contexts where purely random (or
pseudorandom) sequences may not be the optimal choice. In this talk, we
will present examples of quasirandom sequences that appear naturally in
the setting of quantization and analog-to-digital conversion problems.

Jamie Haddock, University of California, Los Angeles

Analyzing hybrid randomized and greedy projection methods

Stochastic iterative algorithms have gained recent interest for solving large-scale systems of equations, Ax=y. One such example is the Randomized Kaczmarz (RK) algorithm, which acts only on single rows of the matrix A at a time. While RK randomly selects a row, Motzkin's algorithm employs a greedy row selection; the Sampling Kaczmarz-Motzkin (SKM) algorithm combines these two strategies. In this talk, we present a convergence analysis for SKM which interpolates between RK and Motzkin's algorithm.

Paul Hand, Northeastern University

Compressive sensing and compressive phase retrieval under a learned generative prior

Deep generative modeling has led to new and state of the art approaches for enforcing structural priors in a variety of inverse problems. In contrast to priors given by sparsity, deep models can provide direct low-dimensional parameterizations of the manifold of images or signals belonging to a particular natural class, allowing for recovery algorithms to be posed in a low-dimensional space. This dimensionality may even be lower than the sparsity level of the same signals when viewed in a fixed basis. In this talk, we will show rigorous recovery guarantees for solving inverse problems under a learned generative prior. First, we will discuss convergence guarantees for compressive sensing under random neural network priors. Then, we will show that generative priors allow for a significant advance to be made in the problem of compressive phase retrieval. To date, no known computationally efficient algorithm exists for solving phase retrieval under a sparsity prior at sample complexity proportional to the signal complexity. With generative priors, we establish a new approach for compressive phase retrieval and establish rigorous guarantees with sample complexity proportional to the signal complexity.

Mark Iwen, Michigan State University

Phase retrieval from windowed Fourier measurements via Wigner deconvolution and angular synchronization with associated lower bounds

We will discuss phase retrieval from locally supported STFT magnitude measurements of a vector x based on a two-step approach: First, an aliased Wigner distribution deconvolution approach is used to solve for a portion of the lifted rank-one signal xx^*. Second, an angular synchronization approach is used to recover x from the known portion of xx^*. We will also discuss lower bounds for the Lipschitz continuity of these measurements based off of the size of the support of our measurement masks. These lower bounds are independent of our reconstruction algorithm and give insight into the best possible performance of any such method.

This is a joint work with Mike Perlmutter, Sami Merhi, and Aditya Viswanathan.

Nicolas Keriven, ENS Paris

Fisher metric, support stability and optimal number of measurements in compressive off-the-grid recovery

Many problems in machine learning and imaging can be framed as an infinite dimensional Lasso problem to estimate a sparse measure. This includes for instance regression using a continuously parameterized dictionary, mixture model estimation and super-resolution of images. To make the problem tractable, one typically sketches the observations (often called compressive-sensing in imaging) using randomized projections. In this work, we provide a comprehensive treatment of the recovery performances of this class of approaches. We show that for a large class of operators, the Fisher-Rao distance induced by the measurement process is the natural way to enforce and generalize the classical minimal separation condition appearing in the literature. We then prove that (up to log factors) a number of sketches proportional to the sparsity is enough to identify the sought after measure with robustness to noise. Finally, we show that, under additional hypothesis, exact support stability holds (the number of recovered atoms matches that of the measure of interest) when the level of noise is smaller than a specified value.

This is a joint work with Clarice Poon (Bath Uni.) and Gabriel Peyré (ENS).

Felix Krahmer, Technical University of Munich

On the convex geometry of blind deconvolution

Low-rank matrix recovery from structured measurements has been a topic of intense study in the last decade and many important problems such as blind deconvolution have been formulated in this framework. An important benchmark method to solve these problems is to minimize the nuclear norm, a convex proxy for the rank. A common approach to establish recovery guarantees for this convex program relies on the construction of a so-called approximate dual certificate. However, this approach provides only limited insight in various respects. Most prominently, the noise bounds exhibit seemingly suboptimal dimension factors. In this paper we take a novel, more geometric viewpoint to analyze both the matrix completion and the blind deconvolution scenario. We find that for both these applications the dimension factors in the noise bounds are not an artifact of the proof, but the problems are intrinsically badly conditioned. We show, however, that bad conditioning only arises for very small noise levels: Under mild assumptions that include many realistic noise levels we derive near-optimal error estimates for blind deconvolution under adversarial noise.

This is joint work with Dominik Stöger (TUM).

Richard Kueng, California Institute of Technology

Binary component decomposition of matrices

We study the problem of decomposing a low-rank matrix into a factor with binary entries, either from {±1} or from {0,1}, and an unconstrained factor. This research answers fundamental questions about the existence and uniqueness of these decompositions. It also leads to tractable factorization algorithms that succeed under a mild deterministic condition.

This is joint work with Joel Tropp (Caltech).

Weilin Li, Courant Institute

Super-resolution, subspace methods, and non-harmonic Fourier matrices

This talk is concerned with the super-resolution problem of recovering a discrete measure on the torus consisting of S atoms, given M consecutive noisy Fourier coefficients. Super-resolution is sensitive to noise when the distance between two atoms is less than 1/M. We connect this problem to the minimum singular value of non-harmonic Fourier matrices. New results for the latter are presented, and as consequences, we derive results regarding the information theoretic limit of super-resolution and the resolution limit of subspace methods (namely, MUSIC and ESPRIT). These results rigorously establish the super-resolution phenomena of these algorithms that were empirically discovered long ago, and numerical results indicate that our bounds are sharp or nearly sharp.

This is a joint work with Albert Fannjiang and Wenjing Liao.

Dustin Mixon, The Ohio State University

SqueezeFit: Label-aware dimensionality reduction by semidefinite programming

Given labeled points in a high-dimensional vector space, we seek a low-dimensional subspace such that projecting onto this subspace maintains some prescribed distance between points of differing labels. Intended applications include compressive classification. This talk will introduce a semidefinite relaxation of this problem, along with various performance guarantees.

This is a joint work with Culver McWhirter (OSU) and Soledad Villar (NYU).

Krishna Narayanan, Texas A&M University

Sub-string matching in sub-linear time using sparse Fourier transforms

We consider the problem of querying a string (or a database) of length $N$ bits to determine all the locations where a substring (query) of length $M$ appears either exactly or is within a Hamming distance of $K$ from the query. We assume that sketches of the original signal can be computed offline and stored. Using the sparse Fourier transform computation-based approach introduced by Pawar and Ramchandran, we show that all such matches can be determined with high probability in sub-linear time. Specifically, if the query length $M = O(N^\mu)$ and the number of matches $L=O(N^\lambda)$, we show that for $\lambda < 1-\mu$ all the matching positions can be determined with a probability that approaches 1 as $N \rightarrow \infty$ for $K \leq \frac{1}{6}M$. More importantly our scheme has a worst-case computational complexity that is only $O\left(\max\{N^{1-\mu}\log^2 N, N^{\mu+\lambda}\log N \}\right)$, which means we can recover all the matching positions in {\it sub-linear} time for $\lambda<1-\mu$. This is a substantial improvement over the best known computational complexity of $O\left(N^{1-0.359 \mu} \right)$ for recovering one matching position by Andoni {\em et al.}. Further, the number of Fourier transform coefficients that need to be computed, stored and accessed, i.e., the sketching complexity of this algorithm is only $O\left(N^{1-\mu}\log N\right)$. Several extensions of the main theme are also discussed.
Potential applications of this work include text matching, audio/image matching, DNA matching in genomics, metabolomics, radio astronomy, searching for signatures of events within large databases, detecting viruses within binary executable files.

Jelani Nelson, University of California, Berkeley

Optimal terminal dimensionality reduction in Euclidean space

The Johnson-Lindenstrauss lemma states that for any X a subset of R^d
with |X| = n and for any epsilon, there exists a map f:X-->R^m for m =
O(log n / epsilon^2) such that:
for all x in X, for all y in X, (1-epsilon)|x - y|_2 <= |f(x) -
f(y)|_2 <= (1+epsilon)|x - y|_2
We show that this statement can be strengthened. In particular, the
above claim holds true even if "for all y in X" is replaced with "for
all y in R^d".

Nazanin Rahnavard , University of Central Florida

Beyond uniform compressive sensing: cross-layer design approaches for efficient data acquisition in wireless sensor networks

Compressive sensing (CS) is an elegant technique that enables under sampling of signals with a sparse representation in a proper basis. However, conventional CS approaches mostly consider a simple signal model with uniform sparsity and equal importance of all data, which may not be the case for many practical signals. Moreover, CS techniques have mainly focused on minimizing the number of measurements, as opposed to minimizing the overall cost of collecting measurements.
In this talk, compressive sensing techniques for energy-efficient data acquisition in large-scale wireless sensor networks (WSNs) are discussed. I will present how the signal and system properties can be integrated into the design of novel compressive sensing techniques for enhanced performances. Our schemes exploit the spatio-temporal correlation of sensor readings and integrate CS with the underlying networking protocols for a cross-layer design. Our results prove the efficiency of the proposed schemes in reducing the communication cost and enhancing the reliability and life time of WSNs. Such designs significantly enhance CS performance and can facilitate its application in a variety of fields.

Justin Romberg, Georgia Institute of Technology

Block sketching for distributed learning

We study how two fundamental problems in approximation and learning can be solved in a distributed setting.
In the first, we show how the ridge regression problem can be solved from a localized sketch, where different subsets of the input data are compressed independently from one another, corresponding to a block diagonal structure on the sketching matrix. We show that under mild conditions, block sketching performs as well as centralized sketching with a dense matrix.

For the second problem, we consider the problem of sketching a low rank matrix when the columns are distributed across multiple computational nodes. We show that if each subset of columns is sketched independently, then the matrix can be recovered at a central location by solving low-rank recovery problem. We show that a novel convex relaxation of this problem results in optimal sample complexity bounds. This relaxation is tailored to the form of the matrix compression scheme being considered and also provides some general guidelines for relaxing the problem for other compression schemes.

Rayan Saab, University of California, San Diego

New algorithms and improved bounds for one-bit compressed sensing on manifolds

We study the problem of approximately recovering signals on a manifold from one-bit linear measurements drawn from either a Gaussian ensemble, partial circulant ensemble, or bounded orthonormal ensemble and quantized using noise shaping schemes. We assume we are given a Geometric Multi-Resolution Analysis, which approximates the manifold, and we propose a convex optimization algorithm for signal recovery. We prove an upper bound on the recovery error which outperforms prior works that use memoryless scalar quantization. Additionally, our result requires a simpler analysis, and extends the class of measurements beyond Gaussians.

Giang Tran, University of Waterloo

Learning functions from weakly dependent data: a compressed sensing approach

We study the problem of learning nonlinear functions from identically distributed but not independent data that is corrupted by outliers and noise. The learning problem is written as a parameter estimation problem where we incorporate both the unknown coefficients and the corruptions in a basis pursuit framework. The main contribution of our paper is to provide a reconstruction guarantee for the associated l1-optimization problem where the sampling matrix is formed from dependent data. We prove that the sampling matrix satisfies the null space property, provided that the data is compact and satisfies a suitable concentration inequality.

This is joint work with Rachel Ward (UT Austin), Hayden Schaeffer (CMU), and Lam Ho (Dalhousie University).

Soledad Villar, New York University

Graph isomorphism testing and function approximation with GNNs

Graph neural networks (GNNs) have achieved lots of success on graph-structured data. In the light of this, there has been increasing interest in studying their representation power. One line of work focuses on the universal approximation of permutation-invariant functions by certain classes of GNNs, and another demonstrates the limitation of GNNs via graph isomorphism tests.
Our work connects these two perspectives and proves their equivalence. We further develop a framework of the representation power of GNNs with the language of sigma-algebra, which incorporates both viewpoints. Using this framework, we compare the expressive power of different classes of GNNs as well as other methods on graphs. In particular, we prove that order-2 Graph G-invariant networks fail to distinguish non-isomorphic regular graphs with the same degree. We then extend them to a new architecture, Ring-GNNs, which succeeds on distinguishing these graphs and provides improvements on real-world social network datasets.

This is joint work with Joan Bruna, Lei Chen and Zhengdao Chen.

Anru Zhang, UW-Madison

High-dimensional tensor regression via importance sketching

The past decades have seen a large body of work on high-dimensional tensors or multiway arrays that arise in numerous applications and have been applied to many statistical problems. In many of these settings, the tensor of interest is high-dimensional in that the ambient dimension is substantially larger than the sample size. Oftentimes, however, the tensor comes with natural low-rank or sparsity structure. How to exploit such structure for tensors poses new statistical and computational challenges.

In this talk, we discuss a novel procedure for low-rank tensor regression, namely Importance Sketching Low-rank Estimation for Tensors (ISLET). The central idea behind ISLET is the novel importance sketching, i.e., carefully designed sketches based on both the responses and the structures of the parameter of interest. We show that the proposed method is sharply minimax optimal in terms of the mean-squared error under low-rank Tucker assumptions. In addition, if a tensor is low-rank with group sparsity, our procedure also achieves minimax optimality. Further, we show through numerical study that ISLET achieves comparable or better mean-squared error performance to existing state-of-the-art methods whilst having substantial storage and run-time advantages. In particular, our procedure performs reliable estimation with tensors of dimension $p = O(10^8)$ and is $1$ or $2$ orders of magnitude faster than baseline methods.