Statistical learning theory

I want to review some key concepts from statistical learning theory. My knowledge of it is very much influenced by the regularization viewpoint (from the MIT class with Poggio).

Let S = (x_1,y_1), \ldots, (x_n,y_n) be a training set of labeled examples. We want to learn a function f_S: X \rightarrow Y that predicts well on unseen data. A learning algorithm is a map A: S \rightarrow f_S. We are also given a real-valued loss function V(f_S(x),y) measuring the cost of predicting f_S(x) when the true label is y.
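To make this concrete, here is a minimal Python/NumPy sketch (the names squared_loss and least_squares_algorithm are my own, purely illustrative): the loss is the squared loss V(f(x),y) = (f(x) - y)^2, and the learning algorithm is ordinary least squares, i.e. a map that takes S and returns a linear predictor f_S.

    import numpy as np

    def squared_loss(y_pred, y):
        # V(f(x), y) = (f(x) - y)^2, applied elementwise
        return (y_pred - y) ** 2

    def least_squares_algorithm(S):
        # A learning algorithm A: S -> f_S. Here: ordinary least squares.
        X, y = S                                   # X: (n, d) features, y: (n,) labels
        w, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimize the empirical squared loss
        return lambda x: x @ w                     # the learned predictor f_S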

The empirical loss of f_S on S is I_S[f_S] = \frac{1}{n}\sum_{i=1}^{n} V(f_S(x_i),y_i). This is the training loss that we actually observe. What we really care about is the true (expected) loss I[f_S] = E_{(x,y)}[V(f_S(x),y)]. We cannot observe I[f_S] directly, but if we can control the generalization error D[f_S] = I[f_S] - I_S[f_S], then the observable I_S[f_S] tells us how good the predictor really is. The goal of regularization, and of the analysis that follows, is to bound this generalization error.
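Continuing the sketch above (with a made-up data distribution sample_data, just for illustration), we can compute I_S[f_S] on the training set and estimate I[f_S] by Monte Carlo on a large fresh sample from the same distribution; their difference is an estimate of D[f_S].

    def empirical_loss(f, S, V):
        # I_S[f] = (1/n) * sum_i V(f(x_i), y_i)
        X, y = S
        return np.mean(V(f(X), y))

    def sample_data(n, rng):
        # Hypothetical distribution over (x, y): y = 2x + Gaussian noise.
        X = rng.normal(size=(n, 1))
        y = 2 * X[:, 0] + 0.5 * rng.normal(size=n)
        return X, y

    rng = np.random.default_rng(0)
    S = sample_data(20, rng)                    # training set
    f_S = least_squares_algorithm(S)            # f_S = A(S)
    I_S = empirical_loss(f_S, S, squared_loss)  # empirical loss I_S[f_S]
    I = empirical_loss(f_S, sample_data(10**5, rng), squared_loss)  # Monte Carlo estimate of I[f_S]
    print(I_S, I, I - I_S)                      # the last number estimates D[f_S]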

One way to achieve good generalization is for the learning algorithm to be stable: the learned predictor should not change much when the training set S changes a little. An extreme case is when $f_S$ does not depend on S at all; then I_S[f_S] converges to I[f_S] by the law of large numbers. But such an algorithm does no “learning” and would have poor empirical error. At the other extreme, an algorithm that only seeks to minimize empirical error can be fragile: the predictor may change significantly when the training samples change. Regularization is one way to balance fitting the training data against this kind of stability.
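For a rough feel for the stability story, the sketch below (continuing the snippets above; the choice of λ and of the perturbation is arbitrary, and ridge is just one common example of regularization) compares plain least squares with Tikhonov (ridge) regularization: refit after swapping out one training point and see how much the learned predictor moves.

    def ridge_algorithm(lam):
        # Tikhonov regularization: minimize I_S[f] + lam * ||w||^2 over linear f,
        # i.e. solve (X^T X + n*lam*I) w = X^T y.
        def A(S):
            X, y = S
            n, d = X.shape
            w = np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)
            return lambda x: x @ w
        return A

    # Perturb S: replace one training point with a fresh draw from the same distribution.
    X, y = S
    X2, y2 = X.copy(), y.copy()
    X_new, y_new = sample_data(1, rng)
    X2[0], y2[0] = X_new[0], y_new[0]
    S_perturbed = (X2, y2)

    probe = np.linspace(-3, 3, 7).reshape(-1, 1)    # points at which to compare predictors
    for A in (least_squares_algorithm, ridge_algorithm(1.0)):
        f1, f2 = A(S), A(S_perturbed)
        # How much does the predictor change when a single sample changes?
        print(np.max(np.abs(f1(probe) - f2(probe))))  # typically smaller for the regularized fit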
