We often hear about multicollinearity whenever we talk about regression models, but we rarely stop to ask how to check for it.

I know we can plot a correlation matrix using **“Matplotlib”**.

We are going to discuss the Variance Inflation Factor (VIF), but before that, let’s have a quick discussion of **Multicollinearity**.

- Multicollinearity means that the independent variables in a model are correlated with one another.
- Multicollinearity among independent variables can reduce the performance of the model.
- Multicollinearity can be a problem in multiple regression because the input variables are all influencing each other. Therefore, they are not actually independent, and…
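To make the idea concrete before the VIF discussion, here is a minimal sketch assuming NumPy and the usual definition of VIF: regress each feature on all the other features, take that regression's R², and compute VIF = 1 / (1 - R²). A feature that the others predict almost perfectly gets a large VIF. The `vif` helper and the synthetic data are hypothetical, for illustration only.

```python
import numpy as np

def vif(X):
    """Return the Variance Inflation Factor of each column of X.

    For column j, regress X[:, j] on the other columns (plus an intercept)
    with ordinary least squares, take that fit's R^2, and return 1 / (1 - R^2).
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # add an intercept column
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        residuals = target - A @ coef
        ss_res = residuals @ residuals
        ss_tot = ((target - target.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Hypothetical data: x2 is almost a copy of x1, x3 is unrelated to both
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])
print(vif(X).round(1))
```

Here `x1` and `x2` should both get a VIF far above 1, because each is nearly a copy of the other, while `x3` should stay close to 1. Cut-offs such as 5 or 10 are commonly quoted rules of thumb, not hard limits.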

Let’s start by discussing the terminology used in the image.

**Bias-** Represents the error on the training data.

**Variance-** Represents the error on the test data.

**Over-Fitting-** The algorithm fits the training data well but not the test data, i.e., low bias and high variance.

**Under-Fitting-** The algorithm fits neither the training data nor the test data well, i.e., high bias and high variance.
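The definitions above can be seen in a small experiment. This is a sketch on hypothetical data (a noisy sine curve), assuming NumPy: it fits polynomials of increasing degree to 15 training points and compares training error with test error, the two quantities the definitions of bias and variance above refer to. The data, the degrees, and the sample sizes are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def make_data(n):
    # Hypothetical ground truth: a sine curve plus Gaussian noise
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = make_data(15)
x_test, y_test = make_data(200)

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

results = {}
for degree in (1, 3, 12):
    coefs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    train_err = mse(y_train, np.polyval(coefs, x_train))
    test_err = mse(y_test, np.polyval(coefs, x_test))
    results[degree] = (train_err, test_err)
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

Typically the degree-1 line has high error on both sets (under-fitting), while the degree-12 polynomial drives training error close to zero but does noticeably worse on the test set (over-fitting); the exact numbers depend on the random seed.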

Now that we know what over-fitting and under-fitting are, let’s discuss what we should do when we run into these problems.

- Try a regularized model

Regularized regression is a type of regression where the coefficient…
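The excerpt breaks off before naming a specific regularized model. As one common example, and an assumption here rather than necessarily the model the article goes on to cover, the sketch below uses ridge (L2) regression, which adds a penalty `alpha * ||b||^2` to the least-squares loss and has a closed-form solution. It assumes NumPy and centred data so that no intercept is needed; the data and the `alpha` values are hypothetical.

```python
import numpy as np

def ridge(X, y, alpha):
    """Closed-form ridge regression: minimise ||y - Xb||^2 + alpha * ||b||^2.

    Assumes X and y are already centred, so no intercept is fitted.
    """
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# Hypothetical data with known coefficients plus noise
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, -2.0, 0.0, 1.0, 0.5]) + rng.normal(scale=0.5, size=50)
X -= X.mean(axis=0)  # centre the features
y -= y.mean()        # centre the target

for alpha in (0.0, 10.0, 100.0):
    b = ridge(X, y, alpha)
    print(f"alpha={alpha:6.1f}  coefficients={np.round(b, 2)}  norm={np.linalg.norm(b):.2f}")
```

As `alpha` grows, the coefficients shrink toward zero, trading a little extra bias for lower variance; `alpha=0.0` reproduces ordinary least squares.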

Metrics are used to evaluate the performance of machine learning algorithms, both classification and regression algorithms. We must choose metrics carefully, because how we measure a machine learning algorithm’s performance depends entirely on the metric we choose.

The confusion matrix is used in classification problems to establish the relationship between predicted values and actual values. It shows how many values are predicted correctly and how many are predicted wrongly for each class. From it, we can derive different metrics that show how well our model fits.
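A minimal sketch of building a binary confusion matrix in plain Python and deriving a few common metrics from it (accuracy, precision, recall). The labels are made up, class `1` is assumed to be the positive class, and the helper name `confusion_matrix` is hypothetical (scikit-learn has a function of the same name with its own conventions).

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes, both in `labels` order."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

actual    = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

cm = confusion_matrix(actual, predicted, labels=[1, 0])
(tp, fn), (fp, tn) = cm  # positive class first

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # of predicted positives, how many are real
recall = tp / (tp + fn)                     # of real positives, how many were found
print(cm)  # [[3, 2], [1, 4]]
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
# accuracy=0.70 precision=0.75 recall=0.60
```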

When the dataset is balanced, we…

We are going to discuss some distribution functions. We will see their properties and try to understand them with basic examples.

Points to remember

- It’s a discrete distribution.
- Each trial has two possible outcomes.
- The probability of success is the same on every trial.
- The number of trials is fixed.
- Trials are independent of each other.

The first thing we always wonder is why the formula uses a **combination**. To answer that, let’s look at this example:

*Example:*

Car accident data show that 8% of people die in a car accident. Take a random sample of 10 people. Let’s look at different scenarios.

- All people die in a…
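To see why the combination appears, here is a sketch in plain Python using the numbers from the example (n = 10, p = 0.08). Because the trials are independent, every particular sequence with exactly k deaths has probability p^k (1 - p)^(n - k), so the total probability is that value times the number of such sequences. The brute-force function enumerates all 2^10 sequences and adds them up, which gives the same result as multiplying by `math.comb(n, k)`. The function names are hypothetical.

```python
from itertools import product
from math import comb

n, p = 10, 0.08  # sample size and death probability from the example

def binomial_pmf(k, n, p):
    # comb(n, k) counts how many orderings of the n people contain exactly k deaths
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def brute_force(k, n, p):
    # Enumerate every one of the 2**n outcome sequences (1 = dies, 0 = survives)
    # and add up the probability of each sequence with exactly k deaths.
    total = 0.0
    for outcome in product((0, 1), repeat=n):
        if sum(outcome) == k:
            total += p**k * (1 - p) ** (n - k)
    return total

for k in (0, 1, 2, 10):
    print(k, comb(n, k), round(binomial_pmf(k, n, p), 6), round(brute_force(k, n, p), 6))
```

For example, `comb(10, 2)` is 45: there are 45 different pairs of people who could be the two who die, and each pair corresponds to a sequence with the same probability.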

This article is all about **“relationship”**: the relationship between two independent variables, how those variables are related, and how strongly they are related. We will go step by step and discuss each aspect of correlation.

First, let’s discuss a very important term: **Covariance**. Covariance tells us how two variables vary together. A positive covariance means that when one variable increases, the other tends to increase; a negative covariance means that when one variable increases, the other tends to decrease.
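A small plain-Python sketch of the sample covariance (dividing by n - 1) and of Pearson correlation, which divides the covariance by both standard deviations. Covariance gives the direction of the relationship, but its size depends on the units of the data; correlation rescales it to lie between -1 and 1, which is what lets it measure how strongly the variables are related. The data below is made up.

```python
from math import sqrt

def covariance(x, y):
    """Sample covariance: sum of products of deviations from the means, divided by n - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))

hours  = [1, 2, 3, 4, 5]
score  = [52, 55, 61, 70, 72]  # rises as hours rise
errors = [9, 8, 6, 4, 1]       # falls as hours rise

print(covariance(hours, score))             # 13.75
print(round(covariance(hours, errors), 2))  # -5.0
print(round(correlation(hours, score), 3))  # 0.982
print(round(correlation(hours, errors), 3)) # -0.985
```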

As the name suggests, Adjusted R-Squared is an adjusted version of R-Squared. The question is: why do we need to adjust R-Squared at all?

So in this article, we are going to see why Adjusted R-Squared is needed, break down its formula, and try to understand the impact of each term on the value of Adjusted R-Squared.

I encourage you to read my article R-Squared: Formula Explanation. It will help you understand R-Squared.

Let’s start by answering our very first question: **“Why do we need to adjust R-Squared?”** For that, we need to discuss the drawbacks of R-Squared.

R-Squared…
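The excerpt stops before the formula. The standard definition, for n observations and p predictors (not counting the intercept), is Adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1). The drawback usually cited for plain R² is that it never decreases when predictors are added to an ordinary least-squares model, even useless ones; the adjustment penalizes each extra predictor. A sketch with hypothetical numbers:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical case: adding 8 extra predictors nudges R^2 from 0.80 to 0.81
print(round(adjusted_r2(0.80, n=50, p=2), 4))   # 0.7915
print(round(adjusted_r2(0.81, n=50, p=10), 4))  # 0.7613
```

R² rose from 0.80 to 0.81, but adjusted R² fell, because the small gain did not justify eight extra predictors.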

We have all seen these terms whenever we work on a regression model. We know the definition, we know the usage, and we know the formula, but what we rarely think about is how the formula works.

Today we are going to talk about the formula: what it means, how it works, and why a value closer to 1 is better.

We are going to break down the formula and try to understand how each term impacts the value of R-Squared.
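A plain-Python sketch of the usual formula R² = 1 - SS_res / SS_tot, where SS_res is the squared error left by the model and SS_tot is the squared error of a baseline that always predicts the mean. The values are made up. It shows the intuition behind "closer to 1 is better": as the model's errors shrink, SS_res goes to 0 and R² approaches 1, while a model no better than the mean scores 0.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))  # error left by the model
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)              # error of always predicting the mean
    return 1 - ss_res / ss_tot

y_true    = [3.0, 5.0, 7.0, 9.0]
good      = [3.1, 4.9, 7.2, 8.8]  # close to the actual values
mean_only = [6.0, 6.0, 6.0, 6.0]  # always predicts the mean of y_true

print(round(r_squared(y_true, good), 3))  # 0.995
print(r_squared(y_true, mean_only))       # 0.0
```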