AI Bias Examples

How Useful Systems Can Still Be Unfair

Part of A Sceptic's Guide to AI.

Decisions wrapped in math look objective but can quietly scale old prejudices. That is the core problem with AI bias — and it is poorly served by both the panic narrative ("AI is irredeemably biased") and the engineering narrative ("bias is a solvable problem that smart people are already fixing").

AI systems can be biased. They can also be less biased than the humans they replace. The critical difference is not whether bias exists, but how it scales, whether it is detectable, and who bears the consequences.

Why Bias Appears in AI Systems

AI systems learn from data. If the data reflects biased decisions, the system learns those biases. This is not a flaw in the algorithm — it is the algorithm working exactly as designed. The problem is upstream.

There are several distinct mechanisms through which bias enters AI systems:

Historical bias in training data

The most common source. A hiring model trained on ten years of hiring decisions inherits a decade of human preferences, including discriminatory ones. A lending model trained on historical approvals learns the patterns of historical lending practices, including redlining-era effects that persist in the data long after the policies were officially abandoned.

The insidious part: even if you remove protected attributes (gender, race, age) from the input data, the model can reconstruct proxies from correlated features. Postcode correlates with ethnicity. Name correlates with gender. University correlates with socioeconomic background. Removing the label does not remove the signal.
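
As a toy illustration, one way to check for proxies is to ask how accurately a remaining feature predicts the attribute you dropped. The sketch below uses invented records and a simple majority-class guess; in a real audit you would use your actual features and a proper model:

```python
from collections import Counter, defaultdict

# Invented records: the protected attribute has been removed from the
# model's inputs, but postcode remains as a feature.
rows = [
    ("N1", "A"), ("N1", "A"), ("N1", "A"), ("N1", "B"),
    ("E5", "B"), ("E5", "B"), ("E5", "B"), ("E5", "A"),
]

# How often does a majority-class guess from postcode alone recover
# the attribute we supposedly removed?
by_postcode = defaultdict(Counter)
for postcode, group in rows:
    by_postcode[postcode][group] += 1

correct = sum(c.most_common(1)[0][1] for c in by_postcode.values())
proxy_accuracy = correct / len(rows)
print(f"postcode recovers the removed attribute {proxy_accuracy:.0%} of the time")
```

If that recovery rate is well above chance, the "removed" attribute is still effectively present in the inputs.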

Selection bias

You can only train on data you have. If your training data is not representative of the population the system will serve, the model will perform differently across groups. A facial recognition system trained predominantly on lighter-skinned faces will perform worse on darker-skinned faces — not because of any intentional bias, but because the training set was unrepresentative.

This is particularly dangerous because performance metrics averaged across the whole population can look acceptable while masking severe underperformance for specific subgroups.
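
A minimal sketch of why disaggregation matters, using made-up evaluation counts: the aggregate figure looks acceptable while one subgroup is served badly.

```python
# Made-up evaluation counts: (group, n_samples, n_correct).
results = [("lighter-skinned", 90, 86), ("darker-skinned", 10, 4)]

total = sum(n for _, n, _ in results)
correct = sum(c for _, _, c in results)
aggregate_accuracy = correct / total
print(f"aggregate accuracy: {aggregate_accuracy:.0%}")  # 90%, looks fine

# Disaggregating the same evaluation exposes the gap.
per_group = {g: c / n for g, n, c in results}
for g, acc in per_group.items():
    print(f"  {g}: {acc:.0%}")
```

The same evaluation data, sliced by group, tells a very different story: 96% for the majority group and 40% for the minority group, hidden inside a 90% average.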

Label bias

Many AI systems are trained on human-generated labels, and those labels carry human judgement. If radiologists from one institution systematically under-diagnose a condition in a particular demographic, a model trained on their labels will learn that under-diagnosis as ground truth. The model does not distinguish between "this condition is less common in this group" and "this condition is less frequently detected in this group."

Feedback loops

Once deployed, biased systems can reinforce their own biases. A predictive policing system that directs more officers to a neighbourhood will produce more arrests in that neighbourhood, which generates more data suggesting that neighbourhood is high-crime, which directs even more officers there. The system creates the evidence that justifies its own predictions.
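
The dynamic can be sketched in a few lines. In this invented simulation both neighbourhoods have the same true crime rate; the only difference is one extra recorded arrest at the start, yet the allocation rule steadily manufactures a large gap in the data:

```python
TRUE_RATE = 0.1          # identical underlying crime rate in both areas
arrests = [11.0, 10.0]   # historical record with a small, arbitrary gap

for week in range(20):
    # Policy: send most officers wherever the record shows more arrests.
    officers = [80, 20] if arrests[0] >= arrests[1] else [20, 80]
    # Recorded arrests scale with officers present, not with actual crime.
    for i in (0, 1):
        arrests[i] += officers[i] * TRUE_RATE

print(arrests)  # the one-arrest gap has grown to over a hundred
```

After twenty rounds the record shows one neighbourhood with more than three times the arrests of the other, despite identical underlying crime. The data now "proves" the allocation was justified.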

Real-World Examples

  • Hiring: A major technology company's experimental hiring tool was found to systematically downgrade CVs containing indicators associated with women (e.g. "women's chess club captain"). The system learned from a decade of hiring data in which men were disproportionately hired for technical roles. Mechanism: historical bias.
  • Healthcare: A widely used algorithm for allocating healthcare resources was found to systematically under-refer Black patients. The system used healthcare spending as a proxy for medical need — but because Black patients historically had less access to healthcare, they had lower spending even when sicker. Mechanisms: label bias, historical bias.
  • Criminal justice: Risk assessment tools used in sentencing and bail decisions have been shown to produce higher risk scores for Black defendants, even when controlling for criminal history. Evaluations found the tools roughly equally accurate across races but with different error patterns: they were more likely to falsely flag Black defendants as high-risk. Mechanisms: historical bias, feedback loops.
  • Facial recognition: Commercial facial recognition systems showed error rates of up to 34% for darker-skinned women, compared with under 1% for lighter-skinned men. The disparity was directly linked to the composition of the training datasets. Mechanism: selection bias.
  • Lending: Algorithmic lending systems have been shown to charge higher interest rates to minority borrowers, even after controlling for creditworthiness. The systems used features that served as proxies for race: neighbourhood, spending patterns, bank balance volatility. Mechanisms: historical bias, proxy discrimination.

In each case, the system was performing a useful function — screening candidates, allocating resources, assessing risk. The bias did not make the systems useless. It made them unfair in specific, measurable, and consequential ways.

The Uncomfortable Comparison: Humans vs Algorithms

Counterpoint. People are often more biased than algorithms — and we tend to hold AI to a higher standard than people.

A human hiring manager making snap judgements is influenced by a candidate's accent, appearance, name, and institution in ways that are well-documented and essentially unauditable. A human judge's bail decisions correlate with whether they have eaten recently. A doctor's diagnostic accuracy varies with fatigue, caseload, and unconscious associations between patient demographics and disease probability.

Algorithms, for all their flaws, are consistent. They apply the same criteria to every case. They can be audited. Their biases, once identified, can be measured and addressed systematically. None of this is true for individual human decision-makers.

The critical distinction. The difference is not that algorithms are more biased than humans — often they are less biased. The difference is that algorithmic bias operates at a scale and speed that human bias cannot match. A biased hiring manager might affect fifty candidates a year. A biased algorithm can affect fifty thousand in a day. The individual harm may be smaller, but the aggregate harm can be vastly larger.

This means the right comparison is not "biased algorithm vs unbiased human" (the unbiased human does not exist). It is "biased algorithm at scale vs biased human at unit scale." Both are problems. They require different solutions.

What to Do About It

The practical response to AI bias is not to avoid AI in consequential decisions. It is to adopt a systematic approach to detection, measurement, and mitigation.

Before deployment

  • Audit training data for representation. Who is in the dataset? Who is missing? What historical decisions does the data encode?
  • Test for disparate impact across protected groups. Do not rely on aggregate accuracy — disaggregate performance metrics by demographic.
  • Examine proxy features. If you have removed protected attributes, check whether the model has found proxies. Techniques like permutation importance and SHAP values can reveal which features drive predictions.
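
One widely used disparate-impact check compares selection rates between groups against the informal "four-fifths" threshold from US employment guidance. A sketch with invented numbers:

```python
# Invented screening outcomes: applicants per group and how many
# the model selected.
applied  = {"group_a": 100, "group_b": 100}
selected = {"group_a": 30,  "group_b": 12}

rates = {g: selected[g] / applied[g] for g in applied}
impact_ratio = min(rates.values()) / max(rates.values())
print(f"selection rates: {rates}")
print(f"disparate impact ratio: {impact_ratio:.2f}")  # below 0.8 warrants scrutiny
```

A ratio this far below 0.8 does not prove discrimination on its own, but it is exactly the kind of signal that should trigger a deeper investigation before deployment.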

During deployment

  • Monitor outcomes by group. Set up dashboards that track decision rates, error rates, and outcome distributions across demographics.
  • Maintain human oversight for high-stakes decisions. AI should inform but not replace human judgement where the stakes are high and verification is hard.
  • Create feedback channels. People affected by AI decisions need a way to challenge those decisions and have them reviewed by a human.
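
A deployment monitor need not be elaborate; even a periodic check comparing decision rates across groups will catch drift. A minimal sketch (the function name and 0.8 threshold are my own illustrative choices, not a standard):

```python
from collections import defaultdict

def group_rate_alerts(decisions, threshold=0.8):
    """Return groups whose approval rate falls below `threshold` times
    the best-performing group's rate. `decisions` is (group, approved)."""
    counts = defaultdict(lambda: [0, 0])    # group -> [approved, total]
    for group, approved in decisions:
        counts[group][0] += int(approved)
        counts[group][1] += 1
    rates = {g: a / t for g, (a, t) in counts.items()}
    best = max(rates.values())
    return {g: r for g, r in rates.items() if r < threshold * best}

# Invented week of decisions: group "y" is approved far less often.
week = ([("x", True)] * 8 + [("x", False)] * 2
        + [("y", True)] * 4 + [("y", False)] * 6)
print(group_rate_alerts(week))  # flags group "y"
```

Run on a schedule and wired to an alert, a check like this turns "monitor outcomes by group" from an aspiration into a mechanism.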

Ongoing

  • Re-audit regularly. Bias can emerge over time as populations shift, feedback loops accumulate, or the relationship between features and outcomes changes.
  • Be honest about trade-offs. In many cases, reducing bias involves accepting some reduction in aggregate accuracy. This is a policy decision, not a technical one. Make that decision explicitly.
  • Document everything. Bias audits, mitigation decisions, and trade-off analyses should be recorded. If you cannot explain why your system treats different groups differently, you cannot defend it.

Key takeaways:

  • AI bias is a data problem more than an algorithm problem — systems learn the biases present in their training data
  • Removing protected attributes does not prevent bias — models find proxies
  • Humans are often more biased than algorithms, but algorithmic bias scales to affect millions
  • Aggregate accuracy metrics can hide severe underperformance for specific groups
  • The response is not to avoid AI but to audit, monitor, and maintain human oversight where it matters

I advise organisations on responsible AI deployment, including bias auditing and fairness assessment. Get in touch.

Written by Dr Tristan Fletcher.