2024 Seminar

We characterise the exact asymptotic performance of high-dimensional classification and robust regression estimators under convex loss and regularisation assumptions. Our analysis, based on tools from replica theory, covers a large family of data distributions, including those with arbitrary power-law tails, and allows us to determine cases where Gaussian data universality breaks down. For classification, we characterise the learning of a mixture of data clouds by studying the generalisation performance of the resulting estimator, analyse the role of regularisation, and analytically derive the data separability transition. For robust regression, we provide an exact asymptotic characterisation of the recovery of a planted estimator under heavy-tailed contamination of covariates and label noise. We show that, unlike in the classical regime of small dimension-to-sample-size ratio, regularisation becomes necessary for the Huber loss estimator to achieve optimality under heavy-tailed contamination in the modern high-dimensional regime, and we derive decay rates for the estimation error of ridge regression.
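For concreteness, the estimators discussed above can be framed as regularised empirical risk minimisers in the proportional high-dimensional regime. The following display is an illustrative sketch only; the specific loss, regulariser, and scaling conventions shown here are assumptions and may differ from the talk's exact setup:

\[
\hat{\boldsymbol{w}} = \operatorname*{arg\,min}_{\boldsymbol{w} \in \mathbb{R}^d} \; \sum_{i=1}^{n} \ell\!\left(y_i, \frac{\boldsymbol{x}_i^{\top}\boldsymbol{w}}{\sqrt{d}}\right) + \lambda\, r(\boldsymbol{w}),
\qquad \frac{n}{d} \to \alpha \in (0,\infty),
\]

where, in the robust regression setting, a standard choice is the Huber loss with ridge regularisation \( r(\boldsymbol{w}) = \tfrac{1}{2}\lVert \boldsymbol{w} \rVert_2^2 \):

\[
\ell_{\delta}(y, z) =
\begin{cases}
\tfrac{1}{2}(y - z)^2, & |y - z| \le \delta, \\[2pt]
\delta\,|y - z| - \tfrac{\delta^2}{2}, & |y - z| > \delta.
\end{cases}
\]

The fixed ratio \( \alpha = n/d \) is what distinguishes the modern high-dimensional regime from the classical regime (\( d \ll n \)) referenced in the abstract; the claim is that at finite \( \alpha \), a nonzero \( \lambda \) is needed for Huber-loss optimality under heavy-tailed contamination.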
