Streaming Media

Document Type


Publication Date



Algorithms, Gaussian linear model, least squares, dense matrices, Kaczmarz


The emergence of massive data sets over the past twenty or so years has led to the development of Randomized Numerical Linear Algebra. Randomized matrix algorithms perform random sketching and sampling of rows or columns in order to reduce the problem dimension or compute low-rank approximations. We review randomized algorithms for the solution of least squares/regression problems, based on row sketching from the left or column sketching from the right. These algorithms tend to be efficient and accurate on matrices that have many more rows than columns. We present probabilistic bounds for the amount of sampling required to achieve a user-specified error tolerance. Along the way we illustrate important concepts from numerical analysis (conditioning and preconditioning), probability (coherence, concentration inequalities), and statistics (sampling and leverage scores). Numerical experiments illustrate that the bounds are informative even for small problem dimensions and stringent success probabilities. To stress-test the bounds, we present algorithms that generate 'adversarial' matrices for user-specified coherence and leverage scores. If time permits, we discuss the additional effect of uncertainties from the underlying Gaussian linear model in a regression problem.
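
As a rough illustration of the ideas named in the abstract, the following Python sketch computes leverage scores and coherence for a tall matrix and solves a least squares problem after row sketching from the left. It is a minimal sketch under stated assumptions, not the algorithms presented in the talk: the function names (`leverage_scores`, `coherence`, `sketched_lstsq`), the choice of uniform row sampling with replacement, the test matrix, and the sample size `c` are all hypothetical choices made for illustration.

```python
import numpy as np

def leverage_scores(A):
    # Row leverage scores: squared row norms of an orthonormal basis Q of range(A).
    # Assumes A has full column rank.
    Q, _ = np.linalg.qr(A, mode="reduced")
    return np.sum(Q**2, axis=1)

def coherence(A):
    # Coherence: the largest leverage score.
    return float(np.max(leverage_scores(A)))

def sketched_lstsq(A, b, c, rng):
    # Row sketching from the left (illustrative choice: uniform sampling
    # with replacement). Sampled rows are rescaled by sqrt(m / c) so that
    # (SA)^T (SA) equals A^T A in expectation.
    m, _ = A.shape
    idx = rng.choice(m, size=c, replace=True)
    scale = np.sqrt(m / c)
    SA = scale * A[idx]
    Sb = scale * b[idx]
    x, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
    return x

# Hypothetical test problem: many more rows than columns, Gaussian entries.
rng = np.random.default_rng(0)
m, n = 2000, 10
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true + 0.01 * rng.standard_normal(m)

scores = leverage_scores(A)
mu = coherence(A)

x_full, *_ = np.linalg.lstsq(A, b, rcond=None)   # full problem
x_sk = sketched_lstsq(A, b, c=400, rng=rng)      # sketched problem
rel_err = np.linalg.norm(x_sk - x_full) / np.linalg.norm(x_full)
```

Uniform sampling is expected to work well here because a Gaussian matrix tends to have low coherence (all leverage scores close to n/m). For a matrix with a few dominant leverage scores, uniform sampling can miss important rows, which is the situation that 'adversarial' test matrices are meant to expose.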


The captions accompanying these videos were generated automatically by Kaltura software which may not accurately transcribe scientific, medical, and technical terms.