
50 Senz of Sith

Bayesian Regression

Last updated 2025-06-02

Classical Linear Regression

Data

Here’s a simple dataset with a nearly perfect linear relationship between X and Y:

       X      Y
1  0.500  0.500
2  2.372  2.376
3  4.247  4.248
4  6.118  6.131
5  7.996  7.990

Linear Regression Model

Using ordinary least squares (OLS) via statsmodels (code not shown), we obtain:

\[Y = 0.0031 + 0.9998X\]

With 95% confidence intervals:

This frequentist approach assumes that the only uncertainty lies in the dependent variable Y, and that the residuals are independent, normally distributed, and homoscedastic.
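The OLS fit itself is omitted above; as a minimal stand-in (using numpy.polyfit rather than statsmodels, but the estimates coincide since a degree-1 polynomial fit is OLS), the reported coefficients can be reproduced like this:

```python
import numpy as np

# Data from the table above
X = np.array([0.500, 2.372, 4.247, 6.118, 7.996])
Y = np.array([0.500, 2.376, 4.248, 6.131, 7.990])

# A degree-1 polynomial fit minimizes the same squared vertical residuals as OLS
slope, intercept = np.polyfit(X, Y, 1)
print(f"Y = {intercept:.4f} + {slope:.4f}X")
```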

Bayesian Regression (Simple)

Let’s now reframe the same linear model in a Bayesian framework using PyMC.

Model in PyMC

Imports (used throughout this article):

import arviz as az
import numpy as np
import pymc as pm

Model definition:

with pm.Model() as linmodel:
    # Priors
    intercept = pm.Normal("intercept", mu=0, sigma=1)
    slope = pm.Normal("slope", mu=1, sigma=1)
    sigma = pm.HalfNormal("sigma", sigma=1)

    # Linear model (X and Y are NumPy arrays holding the data above)
    mu = pm.Deterministic("model", intercept + slope * X)

    # Likelihood
    pm.Normal("y_pred", mu=mu, sigma=sigma, observed=Y)

    # Inference
    idata = pm.sample(2000, cores=4, target_accept=0.95)

ArviZ is used for model diagnostics and summaries.
Set hdi_prob = 0.95 to compute the 95% highest density intervals (HDIs).
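ArviZ computes these HDIs from the posterior draws. As a toy illustration of what an HDI is – the narrowest interval containing the requested probability mass – a minimal version over raw samples might look like this (not ArviZ's actual implementation):

```python
import numpy as np

def hdi(samples, prob=0.95):
    """Narrowest interval containing `prob` of the samples."""
    s = np.sort(samples)
    n_in = int(np.ceil(prob * len(s)))             # points inside each candidate window
    widths = s[n_in - 1:] - s[:len(s) - n_in + 1]  # width of every candidate window
    i = np.argmin(widths)                          # narrowest window wins
    return s[i], s[i + n_in - 1]

rng = np.random.default_rng(42)
lo, hi = hdi(rng.normal(0.0, 1.0, 100_000))
# For a standard normal, this is close to the familiar (-1.96, 1.96)
```

For symmetric posteriors the HDI coincides with the equal-tailed credible interval; for skewed ones it is shifted toward the mode.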

Posterior Summary

The estimated regression line is:

\[Y = 0.003 + 1.000X\]

With 95% credible intervals:

The Bayesian approach treats the model parameters as distributions, not fixed values. The result is a full posterior distribution over all parameters, giving us more than just point estimates – it gives us uncertainty.

Heteroscedasticity in Bayesian

What is Heteroscedasticity?

Heteroscedasticity occurs when the variability of the residuals (errors) depends on the value of X. In simpler terms, the spread of Y increases (or decreases) along the range of X. This violates a key assumption of classical OLS regression: constant variance (homoscedasticity).

OLS is not well-equipped for heteroscedastic data. In frequentist settings, one workaround is weighted least squares (WLS) – a method that can get quite nuanced and brittle.

In Bayesian modelling, heteroscedasticity is straightforward to implement.

Data

In this example, each Y value is the average of three repeated measurements (y1, y2, y3), and the standard deviation (sdy) at each point reflects measurement error that increases with X:

       X     y1     y2     y3      Y    sdy
1  0.500  0.549  0.549  0.550  0.549  0.001
2  2.372  2.614  2.616  2.615  2.615  0.001
3  4.251  4.667  4.691  4.680  4.680  0.010
4  6.134  6.751  6.735  6.736  6.741  0.007
5  7.998  8.791  8.784  8.813  8.796  0.013

Frequentist WLS (Weighted Least Squares)

Using WLS in statsmodels (code not shown):

\[Y = 0.0057 + 1.0989X\]

With 95% confidence intervals:
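The WLS code is likewise omitted; a hand-rolled numpy sketch, assuming the usual inverse-variance weights 1/sdy², looks like this (exact estimates depend on the weighting chosen, so they may differ slightly from the statsmodels figures above):

```python
import numpy as np

# Data from the table above
X = np.array([0.500, 2.372, 4.251, 6.134, 7.998])
Y = np.array([0.549, 2.615, 4.680, 6.741, 8.796])
sdy = np.array([0.001, 0.001, 0.010, 0.007, 0.013])

w = 1.0 / sdy**2  # inverse-variance weights: precise points count more

# Weighted means, then the weighted normal-equation solution
xw, yw = np.average(X, weights=w), np.average(Y, weights=w)
slope = np.sum(w * (X - xw) * (Y - yw)) / np.sum(w * (X - xw) ** 2)
intercept = yw - slope * xw
print(f"Y = {intercept:.4f} + {slope:.4f}X")
```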

Bayesian Modelling with Heteroscedasticity

All we need to do in PyMC is replace the constant error term sigma with a known, data-derived sdy:

with pm.Model() as hetmodel:
    # Priors
    intercept = pm.Normal("intercept", mu=0, sigma=1)
    slope = pm.Normal("slope", mu=1, sigma=1)

    # Linear model
    mu = pm.Deterministic("model", intercept + slope * X)

    # Likelihood using the observed heteroscedastic error
    pm.Normal("y_pred", mu=mu, sigma=sdy, observed=Y)

    # Inference
    idata = pm.sample(2000, cores=4, target_accept=0.95)

Posterior Summary

Bayesian model estimates:

\[Y = -0.001 + 1.102X\]

With 95% credible intervals:

Bayesian models let you handle heteroscedasticity transparently by encoding the varying uncertainty directly in the likelihood – no special workaround needed.

Error-in-variables (Uncertainty in Both X and Y)

Now let’s address a more challenging scenario: both X and Y have measurement error. This situation frequently arises in real-world experiments.

Why Frequentist Methods Struggle

In the frequentist framework, accounting for measurement error in the independent variable X leads to nontrivial models. One commonly used solution is Deming regression, which assumes known variances in both X and Y and minimizes orthogonal distances rather than vertical residuals.

Even with Deming regression, generalizing to weighted, multivariable, or non-Gaussian cases becomes increasingly messy.

Data

Each X and Y value below is the average of three measurements, with associated standard deviations:

      x1     x2     x3     y1     y2     y3      X    sdx      Y    sdy
1  0.701  0.701  0.699  0.746  0.750  0.749  0.700  0.001  0.749  0.002
2  2.575  2.577  2.577  2.814  2.809  2.814  2.576  0.001  2.812  0.002
3  4.449  4.440  4.452  4.876  4.881  4.878  4.447  0.005  4.878  0.002
4  6.334  6.328  6.318  6.946  6.939  6.932  6.327  0.007  6.939  0.006
5  8.198  8.197  8.206  9.004  9.001  9.005  8.200  0.004  9.003  0.002

Weighted Deming Regression (Frequentist)

Estimated model:

\[Y = -0.0227 + 1.1008X\]

95% confidence intervals:
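The unweighted Deming estimator (error-variance ratio δ = 1, i.e. orthogonal regression) has a closed form; as a rough cross-check, a minimal sketch on the averaged data above yields a very similar slope to the weighted fit:

```python
import numpy as np

# Averaged measurements from the table above
X = np.array([0.700, 2.576, 4.447, 6.327, 8.200])
Y = np.array([0.749, 2.812, 4.878, 6.939, 9.003])

delta = 1.0  # assumed ratio of Y-error variance to X-error variance
dx, dy = X - X.mean(), Y - Y.mean()
sxx, syy, sxy = np.sum(dx * dx), np.sum(dy * dy), np.sum(dx * dy)

# Closed-form Deming slope (minimizes orthogonal distances when delta = 1)
slope = ((syy - delta * sxx)
         + np.sqrt((syy - delta * sxx) ** 2 + 4 * delta * sxy**2)) / (2 * sxy)
intercept = Y.mean() - slope * X.mean()
print(f"Y = {intercept:.4f} + {slope:.4f}X")
```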

Bayesian Error-in-Variables Model

Here’s where Bayesian methods shine. We simply treat the true X values as latent variables, with priors centred on the measured means:

with pm.Model() as errmodel:
    # Priors
    intercept = pm.Normal("intercept", mu=0, sigma=5)
    slope = pm.Normal("slope", mu=1, sigma=5)

    # Latent variable: true X values, centred on the measured means
    x_true = pm.Normal("x_mean", mu=X, sigma=sdx)

    # Linear model
    mu = pm.Deterministic("model", intercept + slope * x_true)

    # Likelihood with observed error in Y
    pm.Normal("y_pred", mu=mu, sigma=sdy, observed=Y)

    # Inference
    idata = pm.sample(2000, cores=4, target_accept=0.95)

Posterior Summary

Estimated model:

\[Y = -0.022 + 1.101X\]

95% credible intervals:

With PyMC, extending your model to account for measurement uncertainty in both X and Y is seamless. This is a powerful advantage over frequentist approaches that require special-case solutions.

Summary

Bayesian regression provides a natural and extensible framework to handle various forms of uncertainty: constant noise in Y, heteroscedastic noise in Y, and measurement error in both X and Y.

Unlike frequentist solutions that often require entirely new techniques or approximations, Bayesian modelling with PyMC evolves incrementally, retaining model transparency and interpretability.
