Excess Risk Decomposition

Rahul Deora
2 min read · Jul 27, 2018

I have started the new Foundations of Machine Learning course by Bloomberg (https://bloomberg.github.io/foml/#about) and am loving it so far. Here are some notes on a very interesting lecture. I highly recommend this course.

Lecture 5: Excess Risk Decomposition

The main goal of ML is to find a function of the inputs that minimises the expected loss (the risk). The risk minimiser over all possible functions is called the Bayes decision function. In practice we restrict our search to a hypothesis space: this prevents overfitting and makes training much easier, and most well-known ML methods rely on some such restriction. The difference between the risk of the Bayes decision function and the risk of the best function within the hypothesis space is called the Approximation Error.

Excess Risk = Approximation Error
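To make this concrete, here is a minimal Python sketch (my own toy setup, not from the lecture) where the true relationship is sin(3x) plus Gaussian noise, the loss is squared error, and the hypothesis space is restricted to linear functions. Under squared loss the Bayes decision function is the conditional mean E[Y|X], so we can compare its risk to the risk of the best linear fit:

```python
# Toy illustration of approximation error (illustrative numbers, not from the
# lecture). Under squared loss the Bayes decision function is
# f*(x) = E[Y | X = x]; we compare its risk with that of the best function in
# the restricted hypothesis space F = {a*x + b}, using a very large sample as
# a stand-in for the true distribution.
import numpy as np

rng = np.random.default_rng(0)

n = 200_000                                          # large sample ~ true risk
x = rng.uniform(-1, 1, size=n)
y = np.sin(3 * x) + rng.normal(scale=0.3, size=n)    # E[Y|X=x] = sin(3x)

bayes_pred = np.sin(3 * x)                           # Bayes decision function
bayes_risk = np.mean((y - bayes_pred) ** 2)          # ~ noise variance (0.09)

a, b = np.polyfit(x, y, deg=1)                       # best linear fit on the large sample
linear_risk = np.mean((y - (a * x + b)) ** 2)

print(f"Bayes risk:           {bayes_risk:.3f}")
print(f"Best linear risk:     {linear_risk:.3f}")
print(f"Approximation error:  {linear_risk - bayes_risk:.3f}")
```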

However, we cannot actually find this best function within the space, because we only have a finite sample of data rather than the full data distribution. This creates another error: the Estimation Error (Diagram 1). Generally, as our hypothesis space grows, we expect the estimation error to increase, since there are now more ways in which the function we fit can end up far from the best function in the space.

Excess Risk = Approximation Error + Estimation Error
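Continuing the same toy setup, here is a sketch of the estimation error (again an illustrative example, not from the lecture): we fit the same linear hypothesis space on only 20 training points and compare its risk, measured on a large held-out set, with the risk of a near-best linear function fitted on a huge sample:

```python
# Toy illustration of estimation error (illustrative numbers). We fit the
# linear hypothesis space on a small sample and compare its risk to that of
# the (approximately) best linear function, both evaluated on a large set.
import numpy as np

rng = np.random.default_rng(1)

def sample(n):
    x = rng.uniform(-1, 1, size=n)
    y = np.sin(3 * x) + rng.normal(scale=0.3, size=n)
    return x, y

x_test, y_test = sample(200_000)                 # large set ~ true risk

# "Best in F": linear fit on a huge sample, standing in for the population optimum
x_big, y_big = sample(200_000)
a_star, b_star = np.polyfit(x_big, y_big, deg=1)
best_in_F_risk = np.mean((y_test - (a_star * x_test + b_star)) ** 2)

# Our estimate: linear fit on only 20 training points
x_small, y_small = sample(20)
a_hat, b_hat = np.polyfit(x_small, y_small, deg=1)
estimated_risk = np.mean((y_test - (a_hat * x_test + b_hat)) ** 2)

print(f"Risk of best linear fn:   {best_in_F_risk:.3f}")
print(f"Risk of fit on 20 points: {estimated_risk:.3f}")
print(f"Estimation error:         {estimated_risk - best_in_F_risk:.3f}")
```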

Notice the trade-off here. Generally, as our hypothesis space F gets larger, the approximation error decreases, since the space is more likely to contain a function that models the data well, while the estimation error increases for the reasons given above.

Now, when we train our model (for example in mini-batches), we start from some poorly optimised function and slowly move towards the best function our data can support, i.e. the empirical risk minimiser (Diagram 2). The difference in risk between the function we actually return and that fully optimised function is called the Optimisation Error.

We define this error because, even though we could in principle reach the fully optimised solution on our data, squeezing out the last 1% or 0.01% of accuracy requires huge amounts of training and tricks like second-order optimisers. So we forgo some accuracy for practical purposes.

So finally our total error is:

Excess Risk = Approximation Error + Estimation Error + Optimisation Error
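Here is a last sketch, of the optimisation error, on the same kind of toy data (illustrative setup, not from the lecture): within the linear hypothesis space we compare the empirical risk of the exact least-squares solution with that of a gradient-descent run that we deliberately stop after only a few steps:

```python
# Toy illustration of optimisation error (illustrative numbers). On a fixed
# training set, compare the empirical risk of the exact empirical risk
# minimiser (closed-form least squares) with an under-trained gradient-descent
# solution in the same linear hypothesis space.
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(-1, 1, size=n)
y = np.sin(3 * x) + rng.normal(scale=0.3, size=n)
X = np.column_stack([x, np.ones(n)])          # design matrix for f(x) = a*x + b

def empirical_risk(w):
    return np.mean((X @ w - y) ** 2)

# Exact empirical risk minimiser within the linear hypothesis space
w_erm, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gradient descent stopped deliberately early
w = np.zeros(2)
lr = 0.1
for _ in range(10):                            # too few iterations on purpose
    grad = 2 * X.T @ (X @ w - y) / n
    w -= lr * grad

print(f"Empirical risk of ERM:         {empirical_risk(w_erm):.4f}")
print(f"Empirical risk after 10 steps: {empirical_risk(w):.4f}")
print(f"Optimisation error:            {empirical_risk(w) - empirical_risk(w_erm):.4f}")
```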

#100daysofML
