Conference
“ROCCELLA CONFERENCE ON INFERENCE AND AI - ROCKIN' AI 2025”
The bond between Physics and Artificial Intelligence has never been stronger, as underscored by last year's Nobel Prize, awarded to Hinton and Hopfield, whose discoveries were central topics throughout the conference.
Only a few months have passed since our first edition, yet this short time has been enough for major developments to emerge, drastically reshaping our world from political, social, and scientific perspectives.
This rapid progress demands that the scientific community work to explain a wide range of phenomena that lie at the heart of AI’s functioning but remain largely understood only at an empirical level. Despite this, AI's success is so remarkable that it is now embedded in numerous devices, many of which are used daily by most of us. Addressing fundamental theoretical questions and achieving a deeper understanding of the basic building blocks of machine learning models and training algorithms remains crucial.
Given the high-dimensional nature of real-world data and the vast number of tunable parameters in machine learning models, statistical physics and high-dimensional inference provide a natural framework. Since Gardner and Derrida’s seminal works on the perceptron in the 1980s, the interplay between the two disciplines has led to a wealth of innovative ideas and insights into the functioning of neural networks.
This multidisciplinary conference aims to bring together researchers from statistical physics, mathematical physics, and machine learning. Our goal is to provide diverse perspectives on key topics in contemporary machine learning, including:
- Associative memories, diffusion models, energy-based models.
- Representation learning and structured data modeling.
- Language modeling, self-supervised learning, reasoning, and alternative learning paradigms.
- Mathematical physics approaches to high-dimensional probability, spin glasses, and Boltzmann machines.
- Theoretical aspects of neural networks.
The event will take place from September 1st to 5th, 2025, in Roccella Jonica, Calabria (Italy).
Organized by: Federica Gerace
List of seminars
September 1–5, 2025
Giulio Biroli
Memorization-Generalization Transition in Diffusion Models
Mathematical physics seminar
Diffusion models have achieved remarkable success across a wide range of generative tasks. A key challenge is understanding the mechanisms that prevent their memorization of training data and allow generalization. In this seminar I will discuss the role of the training dynamics in the transition from generalization to memorization. I will show the emergence of two distinct timescales: an early time $\tau_\mathrm{gen}$ at which models begin to generate high-quality samples, and a later time $\tau_\mathrm{mem}$ beyond which memorization emerges. Crucially, we find that $\tau_\mathrm{mem}$ increases linearly with the training set size $n$, while $\tau_\mathrm{gen}$ remains constant. This creates a growing window of training times with $n$ where models generalize effectively, despite showing strong memorization if training continues beyond it. It is only when $n$ becomes larger than a model-dependent threshold that overfitting disappears at infinite training times.
These findings reveal a form of implicit dynamical regularization in the training dynamics, which allows models to avoid memorization even in highly overparameterized settings. Our results are supported by numerical experiments with standard U-Net architectures on realistic and synthetic datasets, and by a theoretical analysis using a tractable random features model studied in the high-dimensional limit.
Marylou Gabrié
Sampling assisted by generative modeling
Mathematical physics seminar
Deep generative models parametrize very flexible families of distributions able to fit complicated datasets of images or text. These models provide independent samples from complex high-dimensional distributions at negligible cost. On the other hand, sampling exactly from a target distribution, such as the Boltzmann distribution of a physical system, is typically challenging: whether because of dimensionality, multi-modality, ill-conditioning, or a combination of these. In this talk, I will discuss opportunities and challenges in enhancing traditional Monte Carlo methods with learning.
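As a minimal sketch of the general idea (not the speaker's method), the toy below uses a fixed broad Gaussian as a stand-in for a trained generative model inside an independence Metropolis-Hastings scheme; all names and distributions are illustrative assumptions. The point is that model-driven proposals are global jumps rather than local random-walk moves, while the accept/reject step keeps the chain exact:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # Bimodal 1-D "Boltzmann" density (unnormalized): Gaussians at +/- 2.
    return np.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)

def log_proposal(x):
    # Stand-in for a trained generative model: a broad Gaussian covering both modes.
    return -0.5 * (x / 3.0) ** 2

def sample_proposal(n):
    return 3.0 * rng.standard_normal(n)

def independence_metropolis(n_steps):
    x, chain = 0.0, []
    for xp in sample_proposal(n_steps):
        # Accept with ratio p(x') q(x) / (p(x) q(x')): global, mode-hopping moves.
        log_alpha = (log_target(xp) + log_proposal(x)) - (log_target(x) + log_proposal(xp))
        if np.log(rng.uniform()) < log_alpha:
            x = xp
        chain.append(x)
    return np.array(chain)

chain = independence_metropolis(20_000)
print(abs(chain.mean()), (chain > 0).mean())  # both modes visited roughly equally
```

In practice the proposal would be a trained model (e.g. a normalizing flow) adapted to the target; the Metropolis correction guarantees exactness even when the model is imperfect.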
Nicolas Macris
Denoising Score Matching with Random Features: Insights on Diffusion Models from Precise Learning Curves
Mathematical physics seminar
We theoretically investigate the phenomena of generalization and memorization in diffusion models. Empirical studies suggest that these phenomena are influenced by model complexity and the size of the training dataset. In our experiments, we further observe that the number of noise samples per data sample (m) used during Denoising Score Matching (DSM) plays a significant and non-trivial role. We capture these behaviors and offer insights into their mechanisms by deriving asymptotically precise expressions for the test and train errors of DSM in a simple theoretical setting. The score function is parameterized by a random features neural network, and the target distribution is a d-dimensional Gaussian. We operate in a regime where the dimension d, the number of data samples n, and the number of features p tend to infinity while the ratios n/d and p/d stay fixed. By characterizing the test and train errors, we identify regimes of generalization and memorization as a function of n/d, p/d, and m. Our theoretical findings are consistent with the empirical observations.
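To make the DSM-with-random-features setup concrete, here is a small self-contained toy; the sizes, the tanh features, and the ridge solver are my own illustrative choices, not the talk's exact model. With m noise samples per data point, the features of the noised data are regressed onto the DSM target -eps/sigma, and the learned score is compared with the exact score of the noisy Gaussian marginal:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, p, m, sigma = 4, 400, 200, 32, 0.5   # toy scales for the (d, n, p, m) regime

X = rng.standard_normal((n, d))                # data from a standard Gaussian target
F = rng.standard_normal((p, d)) / np.sqrt(d)   # frozen random-feature projections

def score_features(x):
    return np.tanh(x @ F.T)                    # nonlinear random features, shape (..., p)

# Denoising score matching: regress phi(x + sigma*eps) onto -eps/sigma,
# using m independent noise samples per data point.
eps = rng.standard_normal((n, m, d))
Phi = score_features(X[:, None, :] + sigma * eps).reshape(n * m, p)
Y = (-eps / sigma).reshape(n * m, d)
W = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(p), Phi.T @ Y)   # ridge solution

# For x ~ N(0, I) and x_t = x + sigma*eps, the noisy marginal is
# N(0, (1 + sigma^2) I), whose exact score is -x_t / (1 + sigma^2).
Xtest = rng.standard_normal((500, d)) * np.sqrt(1 + sigma**2)
pred = score_features(Xtest) @ W
exact = -Xtest / (1 + sigma**2)
rel_err = np.linalg.norm(pred - exact) / np.linalg.norm(exact)
print(rel_err)   # well below 1 when n*m is large relative to p
```

Shrinking m at fixed n makes the regression targets effectively noisier, which is one concrete way the number of noise samples per data point can enter the learning curves.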
Matteo Marsili
How does abstraction emerge in deep neural networks?
Mathematical physics seminar
Abstraction is the process of extracting the essential features from raw data while ignoring irrelevant details. This is similar to the process, implemented by the renormalisation group of statistical physics, of focusing on large-scale properties while systematically removing irrelevant small-scale details. The analogy is suggestive because the fixed points of the renormalisation group offer an ideal candidate for a truly abstract -- i.e. data-independent -- representation. It has been observed that abstraction emerges with depth in neural networks. Deep layers of neural networks capture abstract characteristics of data, such as "cat-ness" or "dog-ness" in images, by combining the lower-level features encoded in shallow layers (e.g. edges). Yet we argue that depth alone is not enough to develop truly abstract representations. We advocate that the level of abstraction crucially depends on how broad the training set is. We address the issue within a renormalisation group approach in which a representation is expanded to encompass a broader set of data. We take the unique fixed point of this transformation -- the Hierarchical Feature Model -- as a candidate for an abstract representation. This theoretical picture is tested in numerical experiments based on Deep Belief Networks trained on data of different breadth. These show that representations in deep layers of neural networks approach the Hierarchical Feature Model as the data gets broader, in agreement with theoretical predictions.
Manfred Opper
Variational inference for stochastic differential equations driven by fractional Brownian motion
Mathematical physics seminar
Stochastic differential equations (SDEs) driven by white noise are important models for stochastic dynamical systems in natural science and engineering. The statistical inference of the parameters of such models from noisy observations has also attracted considerable interest in the machine learning community. Using Girsanov's change-of-measure approach, one can apply powerful variational techniques to solve the inference problem.
A limitation of standard SDE models is that they typically show a fast, exponential decay of correlation functions. If one is interested in stochastic processes with long-time memory, a well-known possibility is to replace the Brownian motion in the SDE by the so-called fractional Brownian motion (fBM), which is no longer a Markov process. Unfortunately, variational inference for this case is much less straightforward.
Our approach to this problem utilises a somewhat overlooked idea by Carmona and Coutin (1998), who showed that fBM can be exactly represented as an infinite-dimensional linear combination of Ornstein-Uhlenbeck processes with different time constants. Using an appropriate discretisation, we arrive at a finite-dimensional approximation which is an 'ordinary' SDE model in an augmented space. For this new model we can apply (more or less) off-the-shelf variational inference approaches. We also discuss the application of this approach to generative diffusion models.
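A crude numerical illustration of the superposition idea (the grid, rates, and weights below are my own rough quadrature, not the authors' discretisation): for Hurst index H < 1/2, the Riemann-Liouville kernel u^(H-1/2)/Gamma(H+1/2) is a Laplace transform of xi^(-(H+1/2)), so fractional noise can be approximated by a weighted sum of Ornstein-Uhlenbeck modes all driven by one shared white noise, and the variance of the resulting process grows roughly like t^(2H):

```python
import numpy as np
from math import gamma

rng = np.random.default_rng(2)
H = 0.3                        # Hurst index < 1/2: sub-diffusive scaling t^(2H)
K, n_paths = 40, 500
dt, n_steps = 1e-4, 10_000     # simulate up to t = 1

# Geometric quadrature of the OU relaxation rates and of the kernel weights.
log_xi = np.linspace(np.log(1e-2), np.log(1e3), K)
xi = np.exp(log_xi)
dlog = log_xi[1] - log_xi[0]
w = xi ** (0.5 - H) * dlog / (gamma(H + 0.5) * gamma(0.5 - H))

Z = np.zeros((n_paths, K))     # one OU mode per rate, shared noise across modes
t_grid, var_series = [], []
for i in range(1, n_steps + 1):
    dW = rng.standard_normal((n_paths, 1)) * np.sqrt(dt)
    Z += -xi * Z * dt + dW     # Euler step of the augmented 'ordinary' SDE
    if i % 500 == 0:
        B = Z @ w              # approximate fBM value at t = i*dt
        t_grid.append(i * dt)
        var_series.append(B.var())

t_grid, var_series = np.array(t_grid), np.array(var_series)
slope = np.polyfit(np.log(t_grid), np.log(var_series), 1)[0]
print(slope)   # log-log slope of Var B(t), roughly 2H = 0.6 here
```

The augmented state (the K OU modes) is Markovian, which is exactly what makes standard variational machinery applicable again.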
Federico Ricci-Tersenghi
Computing algorithmic thresholds for hard combinatorial problems
Mathematical physics seminar
Understanding the limits of algorithms in solving hard combinatorial optimization problems is a fundamental question both in basic research and in real-world applications.
However, results are scarce, especially for sparse models, which are the most realistic ones. I will summarize our current understanding of algorithmic thresholds in well-known problems, such as satisfiability and coloring, focusing mainly on the dependence of algorithmic thresholds on the time scaling and on the analytical attempts to estimate these thresholds.
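As a deliberately simple example of the kind of local heuristic whose thresholds are studied, here is a minimal WalkSAT-style solver on a random 3-SAT instance at clause density alpha = 2.5, well below the satisfiability threshold (around 4.267); the function names and parameter values are illustrative choices:

```python
import random

random.seed(3)

def random_ksat(n_vars, n_clauses, k=3):
    # Each clause: k distinct variables, each literal negated with prob 1/2.
    clauses = []
    for _ in range(n_clauses):
        vs = random.sample(range(n_vars), k)
        clauses.append([(v, random.random() < 0.5) for v in vs])
    return clauses

def satisfied(clause, a):
    return any(a[v] == sign for v, sign in clause)

def walksat(clauses, n_vars, max_flips=20_000, p_noise=0.5):
    a = [random.random() < 0.5 for _ in range(n_vars)]
    for _ in range(max_flips):
        unsat = [c for c in clauses if not satisfied(c, a)]
        if not unsat:
            return a                        # all clauses satisfied
        clause = random.choice(unsat)
        if random.random() < p_noise:
            v = random.choice(clause)[0]    # noise move: random variable in the clause
        else:
            def broken(v):                  # greedy move: minimize unsatisfied clauses
                a[v] = not a[v]
                b = sum(not satisfied(c, a) for c in clauses)
                a[v] = not a[v]
                return b
            v = min((v for v, _ in clause), key=broken)
        a[v] = not a[v]
    return None

n = 50
clauses = random_ksat(n, int(2.5 * n))   # alpha = 2.5: easy regime
sol = walksat(clauses, n)
print(sol is not None)
```

As alpha approaches the algorithmic threshold of such heuristics, the number of flips required blows up long before the satisfiability threshold itself, which is precisely the gap the analytical approaches aim to quantify.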
Guilhem Semerjian
Learning with low-degree polynomials
Mathematical physics seminar
Low-degree polynomials provide a versatile methodology to build systematic approximations of high-dimensional inference problems. In this talk I will present some recent results obtained by applying this framework to problems of supervised learning, focusing in particular on two-layer neural networks of extensive width.
Beatriz Seoane
A theoretical framework for overfitting in simple generative energy based models
Mathematical physics seminar
We investigate the impact of limited data on the training of pairwise energy-based models for inverse problems aimed at identifying interaction networks. Utilizing the Gaussian model as a testbed, we dissect training trajectories across the eigenbasis of the coupling matrix, exploiting the independent evolution of eigenmodes and revealing that the learning timescales are tied to the spectral decomposition of the empirical covariance matrix. We see that optimal points for early stopping arise from the interplay between these timescales and the initial conditions of training. Moreover, we show that finite-data corrections can be accurately modeled through asymptotic random matrix theory calculations, and we provide the counterpart of generalized cross-validation in the energy-based model context. Our analytical framework extends to binary-variable maximum-entropy pairwise models with minimal variations. These findings offer strategies to control overfitting in discrete-variable models through empirical shrinkage corrections, improving the management of overfitting in energy-based generative models. Finally, we propose a generalization to arbitrary energy-based models by deriving the neural tangent kernel dynamics of the score function under the score-matching algorithm.
Pierfrancesco Urbani
Generalization and overfitting in overparametrized two-layer neural networks
Mathematical physics seminar
Understanding the generalization properties of large, overparametrized neural networks is a central problem in theoretical machine learning. Several insightful ideas have been proposed in this regard, among them the implicit regularization hypothesis, the possibility of benign overfitting, and the existence of feature-learning regimes where neural networks learn the latent structure of the data. However, a precise understanding of the emergence and validity of these behaviors cannot be disentangled from the study of the non-linear training dynamics. We use a technique from statistical physics, dynamical mean field theory, to study the training dynamics and obtain a rich picture of how generalization and overfitting arise in large overparametrized models. In particular, focusing on large two-layer neural networks, we point out: (i) the emergence of a separation of timescales controlling feature learning and overfitting, (ii) a non-monotone behavior of the test error and, correspondingly, a 'feature unlearning' phase at large times, and (iii) the emergence of an algorithmic inductive bias towards small complexity. Joint work with Andrea Montanari.
Andrey Lokhov
Learning and Sampling with Markov Random Fields
Mathematical physics seminar
Boltzmann distribution in physics, Gibbs measure in mathematics, exponential family distributions in statistics, undirected graphical models in computer science, or energy-based models in machine learning — all these notions refer to the same general concept, also known as Markov Random Fields (MRFs). The recurring interest in MRFs across many different branches of science is explained by the fact that they serve as a natural and interpretable modeling foundation for many scientific applications: MRFs have been used for modeling natural systems at equilibrium since the creation of statistical physics! Yet the absence of suitable training algorithms, as well as the lack of tools for generating predictions from these models, has been the main barrier to the widespread use of MRFs in Scientific Machine Learning. In this talk, we review the state of the art in learning MRFs from data, and in constructing MRFs in forms that allow for efficient generation of predictions and sampling. We illustrate the wide applicability of this concept in several distinct scientific areas: random graph models, statistical and quantum mechanical models, and field theories.
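As a minimal concrete instance of an MRF and of sampling from it (the sizes and coupling scales are arbitrary toy choices), the snippet below draws from a 4-spin Ising model with a Gibbs sampler and checks the estimated magnetizations against exact enumeration of all states:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(5)
d = 4
J = np.triu(rng.standard_normal((d, d)) * 0.5, 1)
J = J + J.T                               # symmetric couplings, zero diagonal
h = rng.standard_normal(d) * 0.3          # local fields

def energy(s):
    return -0.5 * s @ J @ s - h @ s

# Exact moments of p(s) ~ exp(-E(s)) by enumerating all 2^d spin states.
states = np.array(list(product([-1, 1], repeat=d)))
w = np.exp([-energy(s) for s in states])
w /= w.sum()
m_exact = w @ states

# Gibbs sampling: resample one spin at a time from its conditional,
# p(s_i = +1 | rest) = sigmoid(2 * local field).
s = np.ones(d)
m_est = np.zeros(d)
n_sweeps, burn = 20_000, 1_000
for t in range(n_sweeps):
    for i in range(d):
        field = J[i] @ s + h[i]           # J[i, i] = 0, so s[i] drops out
        p_up = 1.0 / (1.0 + np.exp(-2.0 * field))
        s[i] = 1.0 if rng.uniform() < p_up else -1.0
    if t >= burn:
        m_est += s
m_est /= (n_sweeps - burn)

print(np.abs(m_est - m_exact).max())      # small Monte Carlo error
```

The same two ingredients, a learning rule for (J, h) and an efficient sampler, are what scale poorly in the naive form and motivate the specially structured MRFs discussed in the talk.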
Luca Biggio
On the Bias of Next-Token Predictors Toward Systematically Inefficient Reasoning: A Shortest-Path Case Study
Mathematical physics seminar
Recent advances in natural language processing highlight two key factors for improving reasoning in large language models (LLMs): (i) allocating more test-time compute tends to help on harder problems but often introduces redundancy in the reasoning trace, and (ii) compute is most effective when reasoning is systematic and incremental, forming structured chains of thought (CoTs) akin to human problem-solving. To study these factors in isolation, we introduce a controlled setting based on shortest-path tasks in layered graphs. We train decoder-only transformers on question–trace–answer triples using a custom tokenizer, comparing models trained on optimal bottom-up dynamic programming traces with those trained on longer, valid traces involving backtracking. Surprisingly, with the same training-token budget, models trained on inefficient traces generalize better to unseen graphs. This benefit is not due to length alone—injecting arbitrary redundancy into reasoning traces fails to help and can even hurt performance. Instead, we find that generalization correlates with the model's confidence in next-token prediction, suggesting that long, coherent, and locally incremental traces make the training signal easier to optimize.
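The underlying shortest-path task is easy to make concrete; a bottom-up dynamic-programming solver for random layered graphs (sizes and weights here are illustrative, and this sketch ignores the paper's tokenization and trace format) looks like:

```python
import numpy as np

rng = np.random.default_rng(6)

def layered_graph(n_layers, width):
    # One positive weight matrix per pair of consecutive, fully connected layers.
    return [rng.uniform(1.0, 10.0, size=(width, width)) for _ in range(n_layers - 1)]

def shortest_path(edges, width):
    # Bottom-up DP: dist[v] = cheapest cost from layer 0 to node v of the current layer.
    dist = np.zeros(width)
    parents = []
    for W in edges:
        total = dist[:, None] + W             # total[u, v]: reach v via predecessor u
        parents.append(total.argmin(axis=0))  # best predecessor of each v
        dist = total.min(axis=0)
    # Backtrack from the cheapest node of the last layer.
    path = [int(dist.argmin())]
    for par in reversed(parents):
        path.append(int(par[path[-1]]))
    path.reverse()
    return path, float(dist.min())

edges = layered_graph(5, 4)
path, cost = shortest_path(edges, 4)
check = sum(edges[i][path[i], path[i + 1]] for i in range(len(edges)))
print(path, np.isclose(cost, check))   # one node index per layer; cost matches the trace
```

The sequence of relaxation steps above is exactly the kind of optimal bottom-up trace the models are trained on, while backtracking-style traces revisit and discard partial paths before committing.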
Enrico Malatesta
Are Neural Networks Collision Resistant?
Mathematical physics seminar
We study the computational hardness of finding collisions in neural networks, i.e. any two sets of weights that give the same labels to a random dataset. When the number of output neurons is sufficiently large, we establish the emergence of an overlap gap in the space of collisions, which is believed to indicate that efficient algorithms will not be able to find collisions. This claim is supported by numerical experiments using approximate message passing algorithms, which stop working below the threshold predicted by the analysis. Our results also show that, by composing such a collision-resistant neural network with an error-correcting code, one can obtain a hash function. Beyond its relevance to cryptography for designing collision-resistant one-way functions, our work uncovers new forms of computational hardness emerging in large neural networks.
Jacob Zavatone-Veth
Talk at the conference: ROccella Conference on INference and AI - ROCKIN' AI 2025
Mathematical physics seminar
Recent years have seen substantial advances in our understanding of high-dimensional ridge regression, but existing theories assume that training examples are independent. By leveraging techniques from random matrix theory and free probability, we provide sharp asymptotics for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations. We demonstrate that in this setting, the generalized cross validation estimator (GCV) fails to correctly predict the out-of-sample risk. However, in the case where the noise residuals have the same correlations as the data points, one can modify the GCV to yield an efficiently-computable unbiased estimator that concentrates in the high-dimensional limit, which we dub CorrGCV. We further extend our asymptotic analysis to the case where the test point has nontrivial correlations with the training set, a setting often encountered in time series forecasting. Assuming knowledge of the correlation structure of the time series, this again yields an extension of the GCV estimator, and sharply characterizes the degree to which such test points yield an overly optimistic prediction of long-time risk.
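For orientation, the classical i.i.d. baseline that the talk generalizes can be checked numerically in a few lines (the sizes, noise level, and ridge penalty below are arbitrary illustrative choices): the GCV rescaling of the ridge training error tracks the out-of-sample risk when train and test points are independent:

```python
import numpy as np

rng = np.random.default_rng(7)
n, d, lam, sig = 400, 100, 1.0, 0.5
beta = rng.standard_normal(d) / np.sqrt(d)

X = rng.standard_normal((n, d))
y = X @ beta + sig * rng.standard_normal(n)

# Ridge fit and its "hat" matrix H, with fitted values y_hat = H y.
G = X.T @ X + lam * np.eye(d)
beta_hat = np.linalg.solve(G, X.T @ y)
H = X @ np.linalg.solve(G, X.T)

# Generalized cross-validation: training error rescaled by (1 - tr(H)/n)^2.
train_mse = np.mean((y - H @ y) ** 2)
gcv = train_mse / (1.0 - np.trace(H) / n) ** 2

# Out-of-sample risk on fresh i.i.d. data.
Xt = rng.standard_normal((2000, d))
yt = Xt @ beta + sig * rng.standard_normal(2000)
test_mse = np.mean((yt - Xt @ beta_hat) ** 2)

print(gcv, test_mse)   # close to each other in the i.i.d. setting
```

With correlated rows this estimator becomes biased, which is the failure mode that the CorrGCV construction described in the abstract is designed to repair.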
Francesco Zamponi
Talk at the conference: ROccella Conference on INference and AI - ROCKIN' AI 2025
Mathematical physics seminar
The task of efficiently sampling the Gibbs-Boltzmann distribution of disordered systems is important both for the theoretical understanding of these models and for the solution of practical optimization problems. Unfortunately, this task is known to be hard, especially for spin glasses at low temperatures. Recently, many attempts have been made to tackle the problem by mixing classical Monte Carlo schemes with newly devised neural networks that learn to propose smart moves. In this talk I will review a few physically interpretable deep architectures, in particular one whose number of parameters scales linearly with the size of the system and which can be applied to a large variety of topologies. I will show that these architectures can accurately learn the Gibbs-Boltzmann distribution for the two-dimensional and three-dimensional Edwards-Anderson models, including some of their most difficult instances. I will show that the performance increases with the number of layers, in a way that clearly connects to the correlation length of the system, thus providing a simple and interpretable criterion for choosing the optimal depth. Finally, I will discuss the performance of these architectures in proposing smart Monte Carlo moves and compare them with state-of-the-art algorithms.