R-Squared: Formula Explanation

Saurabh Gupta
Published in Analytics Vidhya
3 min read, Jan 25, 2021


We have all seen R-squared reported whenever we work on a regression model. We know the definition, we know the usage, and we know the formula, but we rarely stop to think about how the formula actually works.

Today we are going to talk about that formula: what it means, how it works, and why a value closer to 1 is better.

We are going to break the formula down and try to understand how each term affects the value of R-squared.

The formula for R-Squared:

R-Squared = 1 - (SS RES / SS TOT)
Best-fit line model and Average line model

The SS RES term (the residual sum of squares) is illustrated in the best-fit line graph. It is the sum of the squared distances between each actual point and the corresponding predicted point on the best-fit line.

The SS TOT term (the total sum of squares) is illustrated in the mean line graph. It is the sum of the squared distances between each actual point and the mean of all the points, which is drawn as the mean line.

Now think about a good best-fit line: most of the actual points lie on the line or close to it, so the residuals are small and hence SS RES is a small term. SS TOT will usually be large compared to SS RES, because it measures the distance of the actual points from the mean (the mean line), which ignores the trend in the data.
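To make the two terms concrete, here is a minimal sketch that computes both for a small dataset. The data points are made up for illustration (they are not from the graphs above), and NumPy's `polyfit` is used as one possible way to obtain a least-squares best-fit line.

```python
import numpy as np

# Hypothetical toy data (an assumption for illustration): y is roughly 2 * x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares best-fit line: y_hat = slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

# SS RES: sum of squared distances between actual and predicted points.
ss_res = np.sum((y - y_hat) ** 2)

# SS TOT: sum of squared distances between actual points and their mean.
ss_tot = np.sum((y - y.mean()) ** 2)

print(f"SS RES = {ss_res:.4f}")
print(f"SS TOT = {ss_tot:.4f}")
```

For data with a clear linear trend like this, SS RES should come out far smaller than SS TOT.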

Before subtracting from 1

When the numerator is a small term and the denominator is a large term, the ratio SS RES / SS TOT is a small value, well below 1. When you subtract this small value from 1, you get a value close to 1.

Hence, as the best-fit model improves, the residuals decrease, SS RES becomes smaller, and R-squared gets closer to 1.
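The effect of shrinking residuals can be sketched in a few lines. The helper `r_squared` and both sets of predictions below are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Return 1 - SS_RES / SS_TOT for the given actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # numerator
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # denominator
    return 1.0 - ss_res / ss_tot

# Hypothetical actual values; SS TOT works out to 40.
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# Two hypothetical sets of predictions: every residual is 1.0 in the
# first set and 0.1 in the second.
rough_pred = np.array([3.0, 3.0, 7.0, 7.0, 11.0])
close_pred = np.array([2.1, 3.9, 6.1, 7.9, 10.1])

r2_rough = r_squared(y, rough_pred)   # SS RES = 5    -> 1 - 5/40
r2_close = r_squared(y, close_pred)   # SS RES = 0.05 -> 1 - 0.05/40

print(f"rough fit: R-squared = {r2_rough:.5f}")
print(f"close fit: R-squared = {r2_close:.5f}")
```

With the same SS TOT in both cases, the only thing that changes is SS RES, and the smaller it gets, the closer the result is to 1.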

If R-squared is 0 or less than 0

For this condition to be true, the numerator (SS RES) has to be equal to or greater than the denominator (SS TOT). If SS RES equals SS TOT, R-squared is 0 and the model is doing no better than the average (mean) line. If SS RES is greater than SS TOT, R-squared is negative and the model is performing worse than the mean line.
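Both cases can be reproduced with a short sketch. The values are hypothetical: the first "model" simply predicts the mean everywhere, and the second is a deliberately bad model whose trend runs in the wrong direction.

```python
import numpy as np

# Hypothetical actual values; their mean is 6 and SS TOT is 40.
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
ss_tot = np.sum((y - y.mean()) ** 2)

# Case 1: predict the mean for every point, so SS RES equals SS TOT.
mean_pred = np.full_like(y, y.mean())
r2_mean = 1.0 - np.sum((y - mean_pred) ** 2) / ss_tot

# Case 2: a deliberately bad model with a reversed trend, so SS RES
# (160 here) is larger than SS TOT.
bad_pred = np.array([10.0, 8.0, 6.0, 4.0, 2.0])
r2_bad = 1.0 - np.sum((y - bad_pred) ** 2) / ss_tot

print(f"mean line: R-squared = {r2_mean}")
print(f"bad model: R-squared = {r2_bad}")
```

The mean-line case lands exactly on 0, and the reversed model lands below 0 because its residuals are larger than the spread of the data around the mean.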

Final Thoughts

There are many factors we should consider when judging a model, and R-squared is one of them. Its formula is easy to understand once we break down the terms and study the impact of each one separately. We have used a simple linear regression graph with only a single factor affecting the outcome. For multiple linear regression, plain R-squared is less reliable, because it never decreases when more predictors are added, so adjusted R-squared is commonly used instead.
