Let’s start by discussing the terminology used in the image.

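Bias- The error that comes from a model being too simple to capture the pattern in the data. It shows up as a high error on the training data.

Variance- How much the model's predictions change when the training data changes. It shows up as a large gap between the training error and the test error.

Over-Fitting- The algorithm shows a good fit on the training data but not on the test data, i.e. low bias and high variance.

Under-Fitting- The algorithm shows a good fit on neither the training data nor the test data, i.e. high bias and low variance.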
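The definitions above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the synthetic data (y = 2x plus noise), the three toy models, and the names `underfit`, `line` and `overfit`. It compares the training and test error of a model that is too simple, a straight-line fit, and a model that memorises the training set.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable


def make_data(n):
    # Hypothetical data: y = 2x plus Gaussian noise
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + random.gauss(0, 2) for x in xs]
    return xs, ys


def mse(model, xs, ys):
    # Mean squared error of a prediction function on a dataset
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


train_x, train_y = make_data(30)
test_x, test_y = make_data(30)

# Least-squares line fitted on the training data only
mean_x = sum(train_x) / len(train_x)
mean_y = sum(train_y) / len(train_y)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
         / sum((x - mean_x) ** 2 for x in train_x))
intercept = mean_y - slope * mean_x


def underfit(x):
    # Too simple: ignores x and always predicts the training mean
    return mean_y


def line(x):
    # Least-squares straight line
    return intercept + slope * x


def overfit(x):
    # Memorises the training set: returns the y of the nearest training x
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


results = {}
for name, model in [("under-fit", underfit), ("line", line), ("over-fit", overfit)]:
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(f"{name:10s} train MSE={results[name][0]:.2f}  test MSE={results[name][1]:.2f}")
```

The memorising model has zero training error by construction, so whatever error it shows on the test set is the over-fitting gap. The mean-only model has a high error on both sets, which is the under-fitting case.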

Now we know what Over-Fitting and Under-Fitting are. Let’s discuss what we should do when we have this problem.


  1. Try a regularized model

Regularized regression is a type of regression where the coefficient…
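As a rough illustration of what regularization does to a coefficient, here is a minimal sketch that assumes ridge (L2) regularization with a single feature and no intercept. The choice of ridge, the function name `ridge_slope` and the toy data are assumptions for illustration, not necessarily the setup this article goes on to use.

```python
def ridge_slope(xs, ys, lam):
    # Minimises sum((y - w*x)^2) + lam * w^2 for one feature, no intercept.
    # Setting the derivative to zero gives w = sum(x*y) / (sum(x^2) + lam).
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]  # exactly y = 2x

for lam in (0, 10, 30):
    print(lam, ridge_slope(xs, ys, lam))
```

With no penalty the slope is the ordinary least-squares value of 2.0. As the penalty grows the slope is pulled towards zero, which is the mechanism that trades a little bias for lower variance.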

Metrics are used to evaluate the performance of machine learning algorithms, for both classification and regression. We must choose the metric carefully, because our assessment of a model's performance depends entirely on the metric we choose.

Performance Metrics for Classification

Confusion Matrix

It is used in classification problems to relate predicted values to actual values. It shows how many values are predicted correctly and how many are predicted incorrectly for each class. From it we can derive different metrics that show how well our model fits.
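To make this concrete, here is a minimal sketch of a binary confusion matrix built by counting (actual, predicted) pairs. The labels below are made-up data, and the variable names are illustrative.

```python
from collections import Counter

# Hypothetical labels: 1 = positive class, 0 = negative class
actual    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# Count how often each (actual, predicted) combination occurs
counts = Counter(zip(actual, predicted))
tp = counts[(1, 1)]  # true positives
tn = counts[(0, 0)]  # true negatives
fp = counts[(0, 1)]  # false positives
fn = counts[(1, 0)]  # false negatives

print("              predicted 0  predicted 1")
print(f"actual 0      {tn:11d}  {fp:11d}")
print(f"actual 1      {fn:11d}  {tp:11d}")

# Metrics derived from the four cells
accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(accuracy, precision, recall)
```

Here 8 of the 10 predictions are correct, so the accuracy is 0.8. Precision and recall are computed from the same four counts.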

For a balanced dataset

1. Accuracy

When the dataset is balanced we…

We are going to discuss some distribution functions. We will see their properties and try to understand them with basic examples.

Binomial Distribution

Points to remember

  1. It’s a discrete distribution.
  2. Each trial has two potential outcomes.
  3. The probability of success is the same in every trial.
  4. The number of trials is fixed.
  5. Trials are independent of each other.

The first thing we always wonder is why the formula uses a combination. To answer that, let’s look at this example.


Car accident data show that 8% of people die in a car accident. Take a random sample of 10 people and let’s look at different scenarios.

  1. All people die in a…
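The car accident example can be sketched in code. This is a minimal illustration that treats the 8% figure as an independent probability p = 0.08 for each of the n = 10 people. The function name `binomial_pmf` is an assumption, and `math.comb` requires Python 3.8 or later.

```python
import math

n = 10    # people in the sample
p = 0.08  # assumed probability that any one person dies


def binomial_pmf(k, n, p):
    # math.comb(n, k) counts the different ways to choose which k of the
    # n people die; each such arrangement has probability p^k * (1-p)^(n-k)
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


for k in range(n + 1):
    print(k, math.comb(n, k), binomial_pmf(k, n, p))
```

The combination is what accounts for ordering. There is only one way for all 10 people to die, but there are 10 different ways for exactly one person to die, so the probability of a single arrangement has to be multiplied by the number of arrangements.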

This article is all about “relationship”: the relationship between two variables, how those variables are related and how strongly they are related. We will go step by step and discuss each aspect of correlation.

First, let’s discuss a very important term: covariance. Covariance tells us how the variables are related. If the covariance is positive, the variables tend to increase together; if it is negative, one variable tends to decrease as the other increases.
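The sign behaviour of covariance can be checked with a minimal sketch. The data are made up, the helper names `covariance` and `correlation` are illustrative, and the sample (n - 1) form of covariance is assumed.

```python
import math


def covariance(xs, ys):
    # Sample covariance: average product of deviations from the means
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    # Pearson correlation: covariance scaled by both standard deviations,
    # which removes the units and bounds the value between -1 and 1
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # increases with x
y_down = [10, 8, 6, 4, 2]  # decreases as x increases

print(covariance(x, y_up), correlation(x, y_up))
print(covariance(x, y_down), correlation(x, y_down))
```

Covariance only tells us the direction of the relationship, because its size depends on the units of the variables. Dividing by the standard deviations gives the correlation, which also tells us how strong the relationship is.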

As the name suggests, Adjusted R-Squared is an adjusted version of R-Squared. The question arises: why do we need to adjust R-Squared?

So in this article, we are going to see why Adjusted R-Squared is needed, and we will break down its formula and try to understand the impact of each term on the value of Adjusted R-Squared.

I encourage you to read my article R-Squared: Formula Explanation. It will help you understand R-Squared.

Let’s start by answering our very first question, “why do we need to adjust R-Squared?”. For that, we need to discuss the drawbacks of R-Squared.
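For reference, here is a minimal sketch of the commonly used Adjusted R-Squared formula, 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p is the number of predictors. The function name `adjusted_r2` and the example numbers are assumptions for illustration.

```python
def adjusted_r2(r2, n, p):
    # r2: ordinary R-Squared, n: number of observations, p: number of predictors
    # The (n - 1) / (n - p - 1) factor grows as predictors are added,
    # so extra predictors are penalised unless they raise r2 enough
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Same R-Squared and sample size, different numbers of predictors
print(adjusted_r2(0.9, 50, 5))
print(adjusted_r2(0.9, 50, 10))
```

With the same R-Squared, the model that uses more predictors gets the lower adjusted value.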

Simple and Multiple Linear Regression


We must all have seen these terms while working on a regression model. We all know the definition, the usage and the formula, but what we rarely think about is how the formula works.

Today we are going to talk about the formula: what it means, how it works, and why a value closer to 1 is better.

We are going to break down the formula and try to understand how each term impacts the value of R-Squared.

The formula for R-Squared
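A minimal sketch of the standard definition, R² = 1 - SS_res / SS_tot, is given here for reference. The function name `r_squared` and the sample values are assumptions for illustration.

```python
def r_squared(actual, predicted):
    mean_actual = sum(actual) / len(actual)
    # SS_res: squared error left over after using the model's predictions
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # SS_tot: squared error of the baseline that always predicts the mean
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [3, 5, 7, 9]
predicted = [2.5, 5.5, 6.5, 9.5]

print(r_squared(actual, predicted))        # close fit, value near 1
print(r_squared(actual, actual))           # perfect fit
print(r_squared(actual, [6, 6, 6, 6]))     # always predicting the mean
```

A perfect fit leaves no residual error and gives 1, while a model that does no better than predicting the mean gives 0. That is why a value closer to 1 indicates a better fit.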

Saurabh Gupta
