I want to review some key concepts from statistical learning theory. My understanding of it is very much influenced by the regularization viewpoint (due to the MIT class with Poggio).
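Let $S = \{(x_1, y_1), \dots, (x_n, y_n)\}$ be a training set of $n$ labeled examples, drawn i.i.d. from an unknown distribution. We want to learn a function $f$ that predicts the label $y$ of unseen data $x$. A learning algorithm is a map $S \mapsto f_S$ from training sets to functions. We are also given a real-valued loss function $V(f(x), y)$ that measures the cost of predicting $f(x)$ when the label is $y$.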
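The empirical loss of $f$ on $S$ is the average of the loss $V$ over the $n$ training examples, $I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f(x_i), y_i)$. This is the training loss that we observe. What we really care about is the true loss $I[f] = \mathbb{E}_{(x, y)}\, V(f(x), y)$, the expected loss on a fresh example from the same distribution. We don’t observe $I[f]$, but if we have a bound on the generalization error $|I[f_S] - I_S[f_S]|$, then the observed training loss gives us an idea of how good the predictor is. The goal of regularization and the following analysis is to bound the generalization error.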
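To make the gap between training loss and true loss concrete, here is a minimal sketch in Python. Everything specific in it is an assumption made for illustration: the square loss, the synthetic data distribution in `sample`, and the polynomial fit used as the learning algorithm. The true loss cannot be computed exactly, so it is approximated by averaging over a large fresh sample.

```python
import numpy as np

def square_loss(pred, y):
    """Pointwise square loss (an assumed choice of loss function)."""
    return (pred - y) ** 2

def empirical_loss(f, X, y, loss=square_loss):
    """Average loss of the predictor f over the sample (X, y)."""
    return float(np.mean(loss(f(X), y)))

def sample(n, rng, noise=0.5):
    """Hypothetical data distribution: y = sin(3x) + Gaussian noise."""
    X = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(3.0 * X) + noise * rng.normal(size=n)
    return X, y

def fit_polynomial(X, y, degree):
    """A simple learning algorithm: least-squares polynomial fit.

    Maps a training set to a predictor f_S.
    """
    coeffs = np.polyfit(X, y, degree)
    return lambda x: np.polyval(coeffs, x)

rng = np.random.default_rng(0)
X_train, y_train = sample(20, rng)
# A large fresh sample stands in for the unknown distribution.
X_fresh, y_fresh = sample(100_000, rng)

f_S = fit_polynomial(X_train, y_train, degree=9)
train_loss = empirical_loss(f_S, X_train, y_train)
true_loss_estimate = empirical_loss(f_S, X_fresh, y_fresh)
generalization_gap = true_loss_estimate - train_loss

print(f"training loss:       {train_loss:.3f}")
print(f"estimated true loss: {true_loss_estimate:.3f}")
print(f"generalization gap:  {generalization_gap:.3f}")
```

A degree-9 polynomial on 20 points is deliberately flexible, so the training loss is expected to understate the true loss; lowering `degree` or raising the training set size should shrink the gap.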
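One way to achieve good generalization is for the learning algorithm to be stable: its output $f_S$ should not change much if the training set $S$ changes a bit. An extreme case is when $f_S$ does not depend on $S$ at all, in which case the empirical loss $I_S[f_S]$ approaches the true loss $I[f_S]$ as the number of examples grows, by the law of large numbers (the Central Limit Theorem gives the rate, $O(1/\sqrt{n})$ for a loss with finite variance). But this algorithm does not do any “learning” and would have poor empirical error. At the other extreme, an algorithm that only seeks to minimize empirical error could be very fragile: its output can change significantly if the training samples change. Regularization is one way to balance low empirical error against stability.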
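The trade-off between fitting the training data and being insensitive to changes in it can be sketched with ridge regression (Tikhonov-regularized least squares), used here as one assumed example of a regularized algorithm. The synthetic data, the helper `ridge_fit`, and the choice to measure sensitivity as the largest change in prediction over a set of probe points are all illustrative assumptions, not a formal stability definition.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize (1/n) * ||X w - y||^2 + lam * ||w||^2 in closed form."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

rng = np.random.default_rng(0)
n, d = 30, 20  # few examples relative to dimension, so unregularized fits are fragile
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)

# Perturbed training set: replace a single example with a new random one.
X2, y2 = X.copy(), y.copy()
X2[0] = rng.normal(size=d)
y2[0] = rng.normal()

# Points at which the two learned predictors are compared.
X_probe = rng.normal(size=(1000, d))

results = {}
for lam in [1e-6, 1e-2, 1.0, 100.0]:
    w = ridge_fit(X, y, lam)
    w2 = ridge_fit(X2, y2, lam)
    change = float(np.max(np.abs(X_probe @ (w - w2))))  # sensitivity to one swap
    train_err = float(np.mean((X @ w - y) ** 2))         # empirical square loss
    results[lam] = (change, train_err)
    print(f"lam={lam:g}  prediction change={change:.4f}  training error={train_err:.4f}")
```

With almost no regularization the training error is small but swapping one example moves the predictor noticeably; with heavy regularization the predictor barely moves but the training error is large. Intermediate values of `lam` sit between the two extremes.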