Dustin Newman

Linear Regression Bias

May 24, 2024

This post aims to show the impact and importance of the bias term in linear regression models. Linear regression models take the form

$$ \hat{y} = \beta_{0} + \sum_{j=1}^{p} \beta_{j} X_{j} $$

for inputs $X_{1}, \ldots, X_{p}$ (i.e. there are $p$ features), where $\beta_{0}$ is the bias (intercept) term.
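Throughout, it is convenient to fold the bias into the coefficient vector. If we append a constant column of ones to the design matrix (which is exactly what the code below does), the model becomes simply

$$ \hat{y} = X \beta $$

with $\beta_{0}$ riding along as the coefficient of the all-ones column.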

I will be using the Kaggle housing prices dataset as a simple example, despite the collinearity among its features. We’ll start by loading the data:

import numpy as np
import pandas as pd

# DATA_PATH is the path to the downloaded Kaggle housing CSV
data = pd.read_csv(DATA_PATH)

Some basic preprocessing:

# Transform categorical data
data = pd.get_dummies(data, drop_first=True, dtype=int)
Y = data['price']
X = data.drop('price', axis=1)

# Standardize
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)
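As a quick illustration of what get_dummies with drop_first=True does, here is a toy single-column example (the column name and values are just for illustration):

import pandas as pd

# Toy categorical column
toy = pd.DataFrame({'furnishing': ['furnished', 'semi-furnished', 'unfurnished']})

# drop_first=True keeps k-1 indicator columns for a k-level category,
# which avoids an exactly collinear (redundant) column
print(pd.get_dummies(toy, drop_first=True, dtype=int))
#    furnishing_semi-furnished  furnishing_unfurnished
# 0                          0                       0
# 1                          1                       0
# 2                          0                       1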

And then we add the bias term to $X$ as a column of ones:

X['bias'] = 1
X, Y = X.to_numpy(), Y.to_numpy()

Now we will compute the coefficients $\beta$ analytically using the Normal Equation

$$ \beta = (X^{T}X)^{-1} X^{T} y $$

and then use the coefficient vector to calculate $\hat{y}$ and the mean squared error:

# Solve the normal equation: beta = (X^T X)^{-1} X^T y
beta = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(Y)

# Predictions and mean squared error
Yhat = X.dot(beta)
mse = np.mean((Yhat - Y) ** 2)
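As an aside, explicitly inverting $X^{T}X$ can be numerically fragile when the features are nearly collinear (as they are here); np.linalg.lstsq solves the same least-squares problem without forming the inverse and should give essentially the same coefficients:

# More numerically stable alternative: solve the least-squares problem directly
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)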

Now, we will see what happens when we follow the same procedure but without adding the bias term. All other steps remain the same.
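Concretely, that just means refitting on the design matrix without the column of ones; a minimal sketch, assuming the bias column was appended last as above:

# Drop the constant column and repeat the normal-equation fit
X_nobias = X[:, :-1]
beta_nobias = np.linalg.inv(X_nobias.T.dot(X_nobias)).dot(X_nobias.T).dot(Y)
Yhat_nobias = X_nobias.dot(beta_nobias)
mse_nobias = np.mean((Yhat_nobias - Y) ** 2)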

Coefficient        With bias            Without bias
Area               529330.60363747      529330.60363747
Bedrooms           84642.78885355       84642.78885355
Bathrooms          495817.70908537      495817.70908537
Stories            390748.2656748       390748.2656748
Parking            238532.39119855      238532.39119855
Main road          146735.42974425      146735.42974425
Guest room         114950.32935761      114950.32935761
Basement           167040.77163767      167040.77163767
Hot water          178965.10323907      178965.10323907
A/C                401991.91011608      401991.91011608
Preferred area     276197.74367233      276197.74367233
Semi-furnished     -22847.00683726      -22847.00683726
Unfurnished        -192857.24250658     -192857.24250658
Bias               4766729.24770643     N/A

Note that the feature coefficients are identical in both fits. This is no accident: standardizing the features gives every column mean zero, so each one is orthogonal to the column of ones, and dropping the bias column leaves the remaining least-squares coefficients unchanged. The only thing lost is the intercept itself, which with centered features is simply the mean of the target prices.
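We can verify both claims directly; a quick sketch, reusing X, Y, and beta from the with-bias fit above (with the bias column appended last):

# Every standardized feature column has (numerically) zero mean, so each is
# orthogonal to the constant column of ones; dropping the bias column
# therefore cannot change the other least-squares coefficients
print(np.abs(X[:, :-1].mean(axis=0)).max())   # ~0, up to floating-point error

# With centered features, the fitted intercept is just the mean of the target
print(beta[-1], Y.mean())                     # these two values agree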

Where the bias term really shows up is in the mean squared error:

                     With bias            Without bias
Mean squared error   1111187722284.4001   23832895443224.234

A 21x improvement! The plot comparing the two is even more enlightening.

[Figure: predictions with and without the bias term across the dataset]
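A comparable plot can be produced with matplotlib; a minimal sketch, assuming Yhat from the with-bias fit and Yhat_nobias from the sketch above:

import matplotlib.pyplot as plt

# Sort by actual price so the curves are easy to compare
idx = np.argsort(Y)
plt.plot(Y[idx], label='Actual price')
plt.plot(Yhat[idx], label='Predicted (with bias)')
plt.plot(Yhat_nobias[idx], label='Predicted (without bias)')
plt.xlabel('Houses (sorted by actual price)')
plt.ylabel('Price')
plt.legend()
plt.show()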

You can see the intercept term at work, “lifting” the predictions up to the right scale, which is why such a simple addition makes such a large difference in the model’s performance.