2019-03-08: Rates of Convergence for Sparse Variational Gaussian Process Regression https://arxiv.org/abs/1903.03571v1

Note: extensions to models with non-conjugate likelihoods, especially bounding the additional error introduced by sparsity in the framework of Hensman et al.

Excellent variational approximations to Gaussian process posteriors have been
developed which avoid the $\mathcal{O}\left(N^3\right)$ scaling with dataset
size $N$. They reduce the computational cost to $\mathcal{O}\left(NM^2\right)$,
with $M\ll N$ being the number of inducing variables, which summarise the
process. While the computational cost seems to be linear in $N$, the true
complexity of the algorithm depends on how $M$ must increase to ensure a
certain quality of approximation. We address this by characterising the
behaviour of an upper bound on the KL divergence to the posterior. We show that
with high probability the KL divergence can be made arbitrarily small by
growing $M$ more slowly than $N$. A particular case of interest is that for
regression with normally distributed inputs in $D$ dimensions with the popular
Squared Exponential kernel, $M=\mathcal{O}(\log^D N)$ is sufficient. Our
results show that as datasets grow, Gaussian process posteriors can truly be
approximated cheaply, and provide a concrete rule for how to increase $M$ in
continual learning scenarios.
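The scaling claim can be made concrete with a small sketch: if $M$ grows like $\log^D N$, the sparse cost $\mathcal{O}(NM^2)$ stays near-linear in $N$ while exact GP regression costs $\mathcal{O}(N^3)$. The schedule below assumes a constant factor `c` that the abstract does not specify; the function name and constant are illustrative, not from the paper.

```python
import math

def num_inducing(N, D, c=1.0):
    """Hypothetical inducing-point schedule M = ceil(c * log(N)^D),
    following the paper's M = O(log^D N) sufficiency result for the
    Squared Exponential kernel with normally distributed inputs.
    The constant c is an assumption, not given in the abstract."""
    return max(1, math.ceil(c * math.log(N) ** D))

# Compare the nominal costs of exact vs. sparse GP regression as N grows.
for N in (10**3, 10**4, 10**5, 10**6):
    M = num_inducing(N, D=2)
    exact_cost = N**3        # O(N^3) exact inference
    sparse_cost = N * M**2   # O(N M^2) sparse variational inference
    print(f"N={N:>7}  M={M:>4}  exact={exact_cost:.1e}  sparse={sparse_cost:.1e}")
```

In a continual-learning setting, re-evaluating such a schedule as data arrive gives a concrete rule for when to add inducing variables, which is the practical upshot the abstract highlights.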