The Confidence Gap: Why Genomic Predictions Sometimes Get It Wrong

Exploring the uncertainty in genomic sequence-to-activity models and how AI sometimes gets DNA predictions wrong

Tags: Genomics, AI, Prediction

The Language of Life

Imagine being able to read DNA like a blueprint, predicting exactly how each gene will behave—and what happens when that blueprint is misinterpreted. This is the challenge scientists face with genomic sequence-to-activity models, sophisticated artificial intelligence systems designed to predict how DNA sequences control gene activity. These models represent a revolutionary tool for understanding the genetic syntax that dictates everything from our eye color to our susceptibility to diseases.

Researchers have discovered that AI models display remarkable overconfidence when analyzing standard reference genomes, yet become surprisingly uncertain when predicting the effects of genetic variations between individuals [1, 3].

Overconfidence

Models show high confidence on reference sequences even when their predictions are incorrect.

Uncertainty

Models become uncertain when predicting effects of genetic variations between individuals.

When AI Studies DNA

What Are Sequence-to-Activity Models?

Genomic sequence-to-activity models are deep learning systems—complex algorithms inspired by the human brain—that take DNA sequences as input and predict various molecular outputs.

Gene Expression Levels

How actively a gene is being used

Transcription Factor Binding

Where proteins that control genes attach to DNA

Chromatin Accessibility

How open or closed different DNA regions are

Histone Modifications

Chemical tags that influence gene activity
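
To make the input-output relationship concrete, here is a minimal, purely illustrative sketch (toy function names and a random stand-in for a trained network, not actual Basenji2 code) of how such a model consumes a one-hot-encoded DNA window and emits binned predictions for assay tracks like those listed above.

```python
# Illustrative sketch only: a stand-in "model" with random weights that
# mimics the shape of a sequence-to-activity predictor's inputs and outputs.
import numpy as np

BASES = {"A": 0, "C": 1, "G": 2, "T": 3}

def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (length, 4) one-hot matrix."""
    x = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        if base in BASES:              # unknown bases (e.g. 'N') stay all-zero
            x[i, BASES[base]] = 1.0
    return x

def toy_model(x: np.ndarray, n_tracks: int = 4, bin_size: int = 8) -> np.ndarray:
    """Stand-in predictor: pools the sequence into bins and emits one
    non-negative value per (bin, track), mimicking coverage-style outputs
    such as CAGE, DNase/ATAC-seq, TF ChIP-seq, and histone ChIP-seq."""
    rng = np.random.default_rng(0)
    n_bins = x.shape[0] // bin_size
    pooled = x[: n_bins * bin_size].reshape(n_bins, bin_size, 4).mean(axis=(1, 2))
    weights = rng.random((1, n_tracks))
    return np.exp(pooled[:, None] @ weights)   # (n_bins, n_tracks), all positive

seq = "ACGT" * 32                              # a 128-bp toy input window
preds = toy_model(one_hot(seq))
print(preds.shape)                             # (16, 4): 16 bins x 4 assay tracks
```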

The Uncertainty Problem

Despite their sophistication, these models show a troubling pattern: they make excellent predictions for standard reference genomes but struggle with the genetic variations that make each of us unique [1, 3].

Data Uncertainty

Noise inherent in the experimental measurements the models are trained and evaluated on.

Model Uncertainty

Uncertainty arising from the model itself, reflected in disagreement between replicates trained with different random seeds.

As Ayesha Bajwa and colleagues noted in their 2024 study, "Models tend to make high-confidence predictions on reference sequences, even when incorrect, and low-confidence predictions on sequences with variants" [3].

A Deep Dive into the Uncertainty Experiment

The Ensemble Approach

To investigate this uncertainty problem, researchers from Stanford University and other institutions conducted a clever experiment using an ensemble of Basenji2 models—a representative state-of-the-art architecture for genomic prediction [1].

Rather than relying on a single model, they trained five replicates of the Basenji2 model, each with the same architecture and training data but differing only in their random initial parameters and the random sampling of training examples [1].
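
As a rough illustration of that protocol (not the authors' training pipeline, and with a trivial linear model standing in for Basenji2), the sketch below builds five replicates that differ only in their random seed—here the seed simply picks a random subsample of the training data, standing in for both random initialization and random sampling of training examples—and then measures how much their predictions disagree.

```python
# Toy ensemble sketch: same "architecture" and dataset, different random seeds.
import numpy as np

def train_replicate(seed: int, x_train: np.ndarray, y_train: np.ndarray):
    """Fit one linear stand-in model whose result depends on the seed via a
    random subsample of the training data."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(x_train), size=len(x_train) // 2, replace=False)
    w = np.linalg.lstsq(x_train[idx], y_train[idx], rcond=None)[0]
    return lambda x: x @ w

rng = np.random.default_rng(0)
x = rng.random((500, 8))                       # toy "sequence features"
y = x @ rng.random(8) + 0.1 * rng.standard_normal(500)

ensemble = [train_replicate(seed, x, y) for seed in range(5)]   # five replicates
preds = np.stack([model(x[:10]) for model in ensemble])         # (5, 10)
print(preds.std(axis=0))                       # replicate disagreement per example
```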


Key Findings: The Consistency Spectrum

The results revealed striking patterns in when these models agree—and when they don't.

Reference Sequences

High model consistency, with median correlation scores exceeding 0.9.

Consistently Correct (~60%)

Models agree with one another and match experimental data on reference sequences.

Variant Inconsistency (>50%)

Model replicates made inconsistent predictions for more than half of eQTLs and personal genome sequences.

Prediction Consistency Across Sequence Types

Sequence Type      | Model Consistency
Reference Genome   | High (median correlation >0.9)
eQTLs              | Low (>50% inconsistent predictions)
Personal Genomes   | Low (>50% inconsistent predictions)

Prediction Consistency Across Assay Types

Assay Type         | Median Consistency
CAGE               | Slightly lower than the other assay types
DNase/ATAC-seq     | High
TF ChIP-seq        | High
Histone ChIP-seq   | High
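
The consistency scores above can be read as correlations between replicate predictions. The sketch below shows one assumed way to compute such a score—the median pairwise Pearson correlation across replicates, which may differ from the study's exact metric—and why agreeing replicates score near 1 while noisy, disagreeing ones score near 0.

```python
# Assumed consistency metric: median pairwise Pearson correlation
# between the predictions of ensemble replicates over the same positions.
import numpy as np
from itertools import combinations

def median_pairwise_correlation(preds: np.ndarray) -> float:
    """preds: (n_replicates, n_positions) predictions from an ensemble."""
    corrs = [
        np.corrcoef(preds[i], preds[j])[0, 1]
        for i, j in combinations(range(preds.shape[0]), 2)
    ]
    return float(np.median(corrs))

rng = np.random.default_rng(42)
signal = rng.random(200)                                         # shared "true" profile
reference_preds = signal + 0.05 * rng.standard_normal((5, 200))  # replicates agree
variant_effects = 0.05 * rng.standard_normal((5, 200))           # mostly noise

print(median_pairwise_correlation(reference_preds))  # high, near 1
print(median_pairwise_correlation(variant_effects))  # low, near 0
```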

The Scientist's Toolkit

Essential Resources for Genomic Prediction Research

Basenji2 Framework

Deep learning architecture for genomic prediction

ENCODE/FANTOM Data

Large collections of functional genomics profiles

Model Ensembles

Multiple models with different random seeds

Poisson Mixture Model

Statistical approach for combining predictions
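
The Poisson mixture idea can be sketched as follows, assuming each replicate's predicted count is treated as the rate of one equally weighted Poisson component (the study's exact formulation may differ): the combined mean is the average rate, while the mixture's variance splits into a data-uncertainty term (the average Poisson variance) and a model-uncertainty term (the spread of rates across replicates).

```python
# Hedged sketch: equal-weight Poisson mixture over ensemble predictions,
# decomposing total variance into data and model uncertainty.
import numpy as np

def poisson_mixture_summary(rates: np.ndarray) -> dict:
    """rates: (n_replicates,) predicted Poisson rates for one genomic bin."""
    mean = rates.mean()                      # mixture mean
    data_uncertainty = rates.mean()          # average Poisson variance (= average rate)
    model_uncertainty = rates.var()          # disagreement between replicates
    return {
        "mean": mean,
        "data_uncertainty": data_uncertainty,
        "model_uncertainty": model_uncertainty,
        "total_variance": data_uncertainty + model_uncertainty,
    }

# Five replicates predicting read counts for the same bin:
print(poisson_mixture_summary(np.array([11.0, 12.5, 10.8, 13.1, 11.6])))
```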


Beyond the Confidence Gap

The discovery of systematic uncertainty patterns in genomic prediction models has far-reaching implications for both basic research and medical applications.

Key Insight

The inconsistency in predicting variant effects suggests we should be cautious when interpreting individual results from these models [1, 3].

Rather than taking any single prediction at face value, researchers can use ensemble approaches to gauge confidence levels, as the research team did in this study [1, 3].
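
In practice, gauging confidence can be as simple as thresholding how much the replicates disagree. The snippet below (with an arbitrary, illustrative threshold) flags variant-effect predictions whose ensemble spread is too large to take at face value.

```python
# Illustrative confidence filter: flag variants where replicates disagree.
import numpy as np

def flag_low_confidence(effects: np.ndarray, max_std: float = 0.5) -> np.ndarray:
    """effects: (n_replicates, n_variants) predicted effect sizes.
    Returns a boolean mask marking variants with high replicate disagreement."""
    return effects.std(axis=0) > max_std

effects = np.array([
    [0.9,  0.1, -0.2],
    [1.0,  0.8, -0.1],
    [0.8, -0.6, -0.3],
    [1.1,  0.4, -0.2],
    [0.9, -0.9, -0.1],
])
print(flag_low_confidence(effects))   # only the second variant is flagged
```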

This uncertainty characterization also points toward concrete strategies for improvement. Training models on more diverse genetic sequences—including data from multiple species—has already shown promise in boosting performance [6].


Embracing Uncertainty

The journey to perfectly decode genomic regulation continues, but acknowledging the confidence gap represents crucial progress.

By understanding where these powerful models fail—and where they succeed—we can use them more wisely.

The path forward isn't about eliminating uncertainty but about mapping its contours—knowing when to trust our genomic AI guides, and when to proceed with caution.

References