Relevance Vector Machines Explained

A Step-by-Step Introduction to Relevance Vector Machines

[Figure: Relevance Vector Machine regression. Left: clean sinc-function fit with relevance vectors marked as circles. Right: robust RVM prediction despite noisy, outlier-contaminated data.]

This tutorial paper has been written to make Tipping's Relevance Vector Machines (RVMs) as easy to understand as possible for those with minimal experience of machine learning. It assumes knowledge of Bayes' theorem and of Gaussian distributions, including their marginal and conditional forms, as well as familiarity with matrix differentiation, the vector representation of regression, and kernel (basis) functions.

What Is a Relevance Vector Machine?

A Relevance Vector Machine (RVM) is a Bayesian sparse kernel method introduced by Michael Tipping in 2001. Like Support Vector Machines, RVMs use kernel functions to model non-linear relationships, but they take a fundamentally different approach: instead of finding maximum-margin hyperplanes, RVMs place a prior over the model weights and use Bayesian inference to determine which data points (the "relevance vectors") are most important for prediction.

The key advantage of RVMs over SVMs is sparsity — they typically use far fewer basis functions, producing faster predictions at test time. They also provide probabilistic outputs (calibrated uncertainty estimates), which SVMs do not naturally offer. The trade-off is that training an RVM can be more computationally expensive than training an SVM, and the solution is not guaranteed to be globally optimal.
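
As a concrete illustration (a minimal sketch, not part of the tutorial itself), the snippet below approximates an RVM on the classic noisy-sinc problem using scikit-learn's ARDRegression as a stand-in for the full RVM evidence machinery: an RBF basis function is centred on each training point, ARD prunes most of the resulting weights, and `return_std=True` supplies the probabilistic outputs mentioned above. The kernel width, noise level, and pruning threshold are illustrative assumptions, not tuned values.

```python
import numpy as np
from sklearn.linear_model import ARDRegression
from sklearn.metrics.pairwise import rbf_kernel

# Toy 1-D problem: the noisy sinc function used in Tipping's original paper.
rng = np.random.default_rng(0)
X = rng.uniform(-10, 10, size=(100, 1))
y = np.sinc(X.ravel() / np.pi) + 0.1 * rng.standard_normal(100)  # sin(x)/x + noise

# RVM-style design matrix: one RBF basis function centred on each training point.
Phi = rbf_kernel(X, X, gamma=0.1)

# ARD gives every weight its own precision and prunes those driven towards zero,
# the same sparsity mechanism an RVM relies on.
model = ARDRegression(threshold_lambda=1e4)
model.fit(Phi, y)

# Surviving (non-zero) weights correspond to the relevance vectors.
relevant = np.flatnonzero(model.coef_)
print(f"{len(relevant)} relevance vectors out of {len(X)} training points")

# Probabilistic prediction: posterior mean and standard deviation at new inputs.
X_new = np.linspace(-10, 10, 5).reshape(-1, 1)
mean, std = model.predict(rbf_kernel(X_new, X, gamma=0.1), return_std=True)
```

On this toy problem, typically only a small fraction of the weights survive, and each survivor marks one relevance vector in the training set.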

What the Tutorial Covers

  • Bayesian inference and the evidence framework
  • How RVMs achieve sparsity compared to SVMs
  • The relevance vector and automatic relevance determination
  • Kernel functions and basis function selection
  • Practical implementation considerations

Relevance Vector Machine vs Support Vector Machine

Both RVMs and SVMs are kernel-based methods for classification and regression, but they differ in important ways. SVMs minimise a regularised empirical risk and produce solutions defined by support vectors — data points that lie on or within the margin. RVMs instead maximise the marginal likelihood (type-II maximum likelihood) and prune irrelevant basis functions during training, yielding a much sparser model. Where an SVM might retain 30-50% of training points as support vectors, an RVM will typically use fewer than 5% as relevance vectors.
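
To see that sparsity gap on the same toy data, a quick SVR fit (with illustrative, untuned hyperparameters) reports how many training points it retains as support vectors; compare its count with the relevance-vector count printed by the sketch above.

```python
import numpy as np
from sklearn.svm import SVR

# Same noisy-sinc data as the RVM sketch above.
rng = np.random.default_rng(0)
X = rng.uniform(-10, 10, size=(100, 1))
y = np.sinc(X.ravel() / np.pi) + 0.1 * rng.standard_normal(100)

svm = SVR(kernel="rbf", gamma=0.1, C=10.0, epsilon=0.05)
svm.fit(X, y)

# support_ holds the indices of the retained training points.
print(f"SVR keeps {len(svm.support_)} of {len(X)} points as support vectors")
```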

For a full introduction to SVMs, see the companion tutorial on Support Vector Machines Explained.

Relevance Vector Machines vs Gaussian Processes

Relevance Vector Machines and Gaussian Processes (GPs) are both Bayesian approaches to regression and classification that provide calibrated uncertainty estimates with each prediction. However, they differ significantly in how they achieve this. A Gaussian Process defines a distribution directly over functions and makes predictions by conditioning on the observed data, with computational cost that scales as O(n³) in the number of training points due to matrix inversion. RVMs, by contrast, place a prior over the model weights and use automatic relevance determination to prune the vast majority of basis functions during training — producing a sparse model that is much faster at test time.
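
In symbols (standard GP and RVM notation, not taken from the tutorial), the GP predictive mean at a test input x* requires working with the full n × n kernel matrix, while the RVM's is a weighted sum over only its M relevance vectors:

```latex
% GP predictive mean: needs (K + sigma^2 I)^{-1}, an O(n^3) inversion,
% then an O(n) sum over all training points for every test input.
\mu_{\mathrm{GP}}(x_*) = k(x_*, X)\,(K + \sigma^2 I)^{-1}\,\mathbf{t}

% RVM predictive mean: a sum over only the M << n relevance vectors.
\mu_{\mathrm{RVM}}(x_*) = \sum_{m=1}^{M} w_m\, k(x_*, x_m)
```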

In practice, GPs tend to give slightly better-calibrated uncertainty estimates on smooth problems, while RVMs excel where sparsity and fast prediction are valued — for instance in real-time applications or when the training set is large enough that full GP inference becomes prohibitive. Both methods require choosing a kernel function, though RVMs additionally learn which training points are "relevant" and discard the rest.

Download the full tutorial (PDF)

Frequently Asked Questions about Relevance Vector Machines

What is the difference between a relevance vector machine and a support vector machine?

Both are kernel-based methods, but SVMs find a maximum-margin separating hyperplane via a non-probabilistic optimisation, while RVMs use Bayesian inference to determine which data points (relevance vectors) contribute to the model. RVMs typically produce much sparser solutions and provide probabilistic predictions, whereas SVMs give point predictions with strong generalisation guarantees. See the full comparison in Support Vector Machines Explained.

When should I use a relevance vector machine instead of an SVM?

RVMs are preferred when you need probabilistic outputs (confidence intervals on predictions), when test-time speed is critical (RVMs use far fewer basis functions), or when you want an automatic method for selecting model complexity. SVMs may be preferred when you need guaranteed convex optimisation or when training speed is the bottleneck.

Are relevance vector machines used in practice?

Yes, though less commonly than SVMs or Gaussian Processes. RVMs have found applications in signal processing, geostatistics, medical image analysis and financial prediction. Their sparsity makes them particularly attractive for embedded systems or real-time applications where prediction latency matters.

What are the disadvantages of relevance vector machines?

The main drawbacks are: (1) training can be slower than for an SVM because the evidence framework involves iterative re-estimation of hyperparameters; (2) the marginal-likelihood optimisation is non-convex, so the solution is not guaranteed to be globally optimal; and (3) the model can be sensitive to the choice of kernel and initialisation. Despite these limitations, RVMs remain a valuable tool in the Bayesian machine learning toolkit.

What is automatic relevance determination in RVMs?

Automatic relevance determination (ARD) is the mechanism by which an RVM decides which basis functions (and therefore which training points) are important. Each weight in the model has an individual precision hyperparameter. During training, the evidence framework drives many of these precisions to infinity, effectively setting the corresponding weights to zero and removing the associated data points from the model. The surviving points are the "relevance vectors".
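
For readers who want to see the mechanics, below is a compact NumPy sketch of that evidence re-estimation loop, following the standard updates from Tipping (2001); the function name, initialisation, iteration count, and finite pruning threshold (standing in for "infinity") are illustrative choices, not prescribed by the tutorial.

```python
import numpy as np

def rvm_ard(Phi, t, n_iter=200, prune_at=1e6):
    """Sketch of RVM evidence re-estimation (Tipping, 2001) with ARD pruning.

    Phi: (n, M) design matrix; t: (n,) targets.
    Returns posterior mean weights, their precisions, and the indices of
    the basis functions that survive pruning.
    """
    n = len(t)
    keep = np.arange(Phi.shape[1])   # active basis-function indices
    alpha = np.ones(len(keep))       # one precision hyperparameter per weight
    beta = 1.0 / np.var(t)           # noise precision, rough initial guess

    for _ in range(n_iter):
        P = Phi[:, keep]
        # Posterior over the active weights given current hyperparameters.
        Sigma = np.linalg.inv(beta * P.T @ P + np.diag(alpha))
        mu = beta * Sigma @ P.T @ t
        # gamma_i in [0, 1]: how well the data determine each weight.
        gamma = 1.0 - alpha * np.diag(Sigma)
        # Evidence (type-II maximum likelihood) hyperparameter updates.
        alpha = gamma / (mu ** 2 + 1e-12)
        beta = (n - gamma.sum()) / np.sum((t - P @ mu) ** 2)
        # A precision heading to infinity pins its weight at zero: prune it.
        active = alpha < prune_at
        keep, alpha, mu = keep[active], alpha[active], mu[active]

    return mu, alpha, keep
```

Run against the sinc design matrix from the first sketch, rvm_ard(Phi, y) typically ends with only a handful of indices left in keep; the corresponding training points are the relevance vectors.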

Related Tutorials

  • Support Vector Machines Explained (the companion tutorial referenced above)

Written by Dr Tristan Fletcher. Browse all ML tutorials.