List of seminars in the seminar series
“SEMINARS IN MATHEMATICAL PHYSICS AND BEYOND”

Note: this is the second part of a two-part seminar. AI is progressing at a remarkable speed, but we still lack a clear understanding of basic concepts in cognition (e.g. learning, understanding, abstraction, awareness, intelligence). I shall argue that research focused on understanding how learning machines, such as LLMs or deep neural networks, do what they do sidesteps the key issue by defining these concepts from the outset. For example, statistical learning is based on a classification of problems (supervised/unsupervised, classification, regression, etc.) and addresses the resulting optimisation problem (maximisation of the likelihood, minimisation of errors, etc.). Learning entails, first of all, detecting what is worth learning, 1) from very few samples, and 2) without knowing a priori why that data makes sense. This requires a quantitative notion of relevance that can distinguish data that make sense from meaningless noise. I will first introduce and discuss the notion of relevance. Next, I will argue that learning differs from understanding, where the latter implies integrating data that make sense into a pre-existing representation. This representation should be abstract, i.e. its properties should be independent of the data, precisely because it needs to represent data from widely different domains. This is what enables higher cognitive functions that we perform all the time, such as drawing analogies and relating data learned independently in widely different domains. Such a representation should be flexible and continuously adaptable as more data or more resources become available. I will show that such an abstract representation can be defined as the fixed point of a renormalisation group transformation, and that it coincides with a model that can be derived from the principle of maximal relevance.
I will provide empirical evidence that the representations of simple neural networks approach this universal model as the network is trained on a broader and broader domain of data. Overall, the aim of the seminar is to support the idea that central issues in cognition can also be approached by studying very simple models, without necessarily understanding large machine-learning models.
Note: this is the first part of a two-part seminar. AI is progressing at a remarkable speed, but we still lack a clear understanding of basic concepts in cognition (e.g. learning, understanding, abstraction, awareness, intelligence). I shall argue that research focused on understanding how learning machines, such as LLMs or deep neural networks, do what they do sidesteps the key issue by defining these concepts from the outset. For example, statistical learning is based on a classification of problems (supervised/unsupervised, classification, regression, etc.) and addresses the resulting optimisation problem (maximisation of the likelihood, minimisation of errors, etc.). Learning entails, first of all, detecting what is worth learning, 1) from very few samples, and 2) without knowing a priori why that data makes sense. This requires a quantitative notion of relevance that can distinguish data that make sense from meaningless noise. I will first introduce and discuss the notion of relevance. Next, I will argue that learning differs from understanding, where the latter implies integrating data that make sense into a pre-existing representation. This representation should be abstract, i.e. its properties should be independent of the data, precisely because it needs to represent data from widely different domains. This is what enables higher cognitive functions that we perform all the time, such as drawing analogies and relating data learned independently in widely different domains. Such a representation should be flexible and continuously adaptable as more data or more resources become available. I will show that such an abstract representation can be defined as the fixed point of a renormalisation group transformation, and that it coincides with a model that can be derived from the principle of maximal relevance.
I will provide empirical evidence that the representations of simple neural networks approach this universal model as the network is trained on a broader and broader domain of data. Overall, the aim of the seminar is to support the idea that central issues in cognition can also be approached by studying very simple models, without necessarily understanding large machine-learning models.
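The quantitative notion of relevance invoked in these abstracts can be illustrated with a toy computation. The definition sketched below, the entropy of the frequency-of-frequencies of a sample, is an assumption drawn from the related literature on maximal relevance, not necessarily the exact definition used in the seminar.

```python
# Hedged sketch (assumed definition): relevance as the entropy of the
# frequency distribution of a discrete sample,
#   H[K] = -sum_k p_k * log(p_k),  with  p_k = k * m_k / N,
# where m_k is the number of distinct states observed exactly k times
# and N is the sample size.
from collections import Counter
from math import log

def relevance(sample):
    """Entropy of the frequency-of-frequencies of a discrete sample."""
    N = len(sample)
    k_of_state = Counter(sample)           # k_s: occurrences of each state
    m_of_k = Counter(k_of_state.values())  # m_k: number of states seen k times
    return -sum((k * m / N) * log(k * m / N) for k, m in m_of_k.items())

# A featureless sample, where every state occurs once, carries zero
# relevance: nothing stands out from noise. A sample with a broad range
# of frequencies scores higher.
noise = list(range(8))               # all states distinct
structured = [0, 0, 0, 0, 1, 1, 2, 3]
print(relevance(noise) == 0.0)
print(relevance(structured) > relevance(noise))
```

In this picture, data "make sense" when their frequency spectrum is broad, which is what the principle of maximal relevance rewards.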
Matrix denoising is central to signal processing and machine learning. Its analysis, when the matrix to be inferred has a factorised structure with rank growing proportionally to its dimension, remains a challenge, except when the matrix is rotationally invariant. In that case the information-theoretically optimal estimator, called the rotational invariant estimator, is known and its performance is rigorously controlled. Beyond this setting, few results are available. The reason is that the model is not a usual spin system, because of the growing rank, nor a matrix model, because of the lack of rotation symmetry; it is rather a hybrid between the two, a "matrix glass". In this talk I shall illustrate our progress towards understanding Bayesian matrix denoising when the hidden signal is a factorised matrix XX⊺ that is not rotationally invariant. Monte Carlo simulations suggest the existence of a denoising-factorisation transition separating a phase where denoising with the rotational invariant estimator remains optimal, owing to universality properties of the same nature as in random matrix theory, from one where universality breaks down and better denoising is possible, though algorithmically hard, by exploiting the signal's prior and factorised structure. We also argue that it is only beyond the transition that factorisation, i.e. estimating X itself, becomes possible, up to sign and permutation ambiguities. On the theoretical side, we combine different mean-field techniques in order to access the minimum mean-square error and the mutual information. Interestingly, our alternative method yields equations that can be reproduced using the replica approach of Sakata and Kabashima, which was long deemed wrong. Using numerical insights, we then delimit the portion of the phase diagram where this mean-field theory is reliable, and correct it using universality where it is not. Our ansatz matches the numerics well once finite-size effects are accounted for.
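The rotational invariant estimator mentioned above keeps the eigenvectors of the noisy observation and only adjusts its eigenvalues. The following is a minimal sketch of that idea for a factorised signal XX⊺ plus symmetric Gaussian noise; the crude hard-thresholding rule at the semicircle bulk edge is an illustrative assumption, not the optimal shrinkage function discussed in the talk.

```python
# Hedged sketch of the rotational-invariant-estimator idea: diagonalise the
# noisy observation Y, keep its eigenvectors, and shrink its eigenvalues.
import numpy as np

rng = np.random.default_rng(0)
n, r, sigma = 200, 5, 1.0

# Factorised signal S = XX^T (rescaled so its top eigenvalues are O(sqrt(n)))
X = rng.standard_normal((n, r))
S = X @ X.T / np.sqrt(n)

# Symmetric (GOE-like) noise with bulk spectrum supported on [-2*sigma, 2*sigma]
W = rng.standard_normal((n, n))
Y = S + sigma * (W + W.T) / np.sqrt(2 * n)

# Rotationally invariant denoising: eigenvalue shrinkage in Y's eigenbasis.
# Here: crude hard thresholding at the bulk edge (illustrative choice only).
lam, V = np.linalg.eigh(Y)
xi = np.where(np.abs(lam) > 2.0 * sigma, lam, 0.0)
S_hat = (V * xi) @ V.T       # sum_i xi_i v_i v_i^T

mse_raw = np.mean((Y - S) ** 2)
mse_rie = np.mean((S_hat - S) ** 2)
print(mse_rie < mse_raw)     # shrinkage should beat the raw observation
```

Note that any estimator of this form is invariant under rotations of the data, which is exactly why, in the phase where universality breaks down, it can be beaten by estimators that exploit the prior and the factorised structure of XX⊺.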