Decoding Tuberculosis

How Whole-Genome Sequencing is Revolutionizing Outbreak Tracking

"The Genomic Detective Story Unfolding Inside Our Lungs"

Tuberculosis (TB) remains the world's deadliest infectious disease, claiming 1.5 million lives annually. For decades, scientists tracked its spread using crude genetic "fingerprints" that revealed only fragments of Mycobacterium tuberculosis's secrets. Now, whole-genome sequencing (WGS) has transformed TB investigation into a high-resolution science, uncovering transmission chains invisible to traditional methods. By reading all 4.4 million letters of the bacterium's DNA code, researchers are exposing hidden outbreaks, predicting drug resistance, and ultimately saving lives 2 6.


The Genomic Revolution

From Blurry Snapshots to 4K Resolution

Traditional TB genotyping methods like spoligotyping and MIRU-VNTR examine 0.01% of the genome or less, the equivalent of judging a library by its front desk. WGS examines more than 90%, transforming our view of TB diversity:

Evolution of TB Genotyping Methods
Method | Genome Coverage | Key Limitation | Cluster Detection Power
Spoligotyping | 0.001% | Misses subtle transmission links | Low
MIRU-VNTR (24-loci) | 0.01% | Limited discrimination in outbreaks | Moderate
Whole-genome sequencing | >90% | Computational complexity | High (detects 2-5 SNP clusters)

5 8

Lineages Tell a Global Story

WGS reveals TB's family tree with unprecedented clarity. Nine human-adapted lineages exist, each with distinct geographies:

Ancient Lineages (L1, L5-L9)

Predominantly African and Asian strains

Modern Lineages (L2-L4)

Cause 90% of global TB, including:

  • L2 (Beijing): Hypervirulent Asian strains
  • L4 (Euro-American): Dominant in the Americas and Europe, including outbreak-prone subtypes like LAM and Haarlem 2 6

Brazil's Santa Catarina state exemplifies this distribution: 60% of strains belong to the LAM sublineage, with 44% clustered in ongoing transmission chains 6.


Anatomy of a Genomic Breakthrough: The Arctic Outbreak

Unmasking Superspreaders in Iqaluit

When TB rates surged among Inuit communities in Arctic Canada, WGS exposed a transmission nightmare invisible to conventional epidemiology:

Methodology: Connecting the Genomic Dots

Sample Collection

140 M. tuberculosis isolates from 135 patients (2009-2015)

Sequencing

Illumina platforms sequenced entire genomes

Variant Calling

Single nucleotide polymorphisms (SNPs) identified using H37Rv reference

Transmission Threshold

≤3 SNPs defined genomic clusters

Epidemiological Integration

Overlaid social network data 1
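To make the thresholding step concrete, here is a minimal Python sketch of how a ≤3 SNP cutoff can turn variant calls into clusters. It assumes each isolate has been reduced to a dictionary of positions that differ from the reference, and that isolates are chained by single linkage (if A links to B and B links to C, all three share a cluster). Both choices, along with the function names and the toy isolates, are illustrative assumptions and not the study's published algorithm.

```python
from __future__ import annotations

from itertools import combinations


def snp_distance(a: dict[int, str], b: dict[int, str]) -> int:
    """Count positions where two isolates carry different bases.

    Each dict maps a genome position to a non-reference base; a position
    missing from a dict is assumed to match the reference.
    """
    return sum(a.get(pos) != b.get(pos) for pos in a.keys() | b.keys())


def genomic_clusters(isolates: dict[str, dict[int, str]], threshold: int = 3) -> list[set[str]]:
    """Group isolates by single linkage: any pair within `threshold` SNPs is joined."""
    parent = {name: name for name in isolates}

    def find(x: str) -> str:
        # Follow parent pointers to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in combinations(isolates, 2):
        if snp_distance(isolates[u], isolates[v]) <= threshold:
            parent[find(u)] = find(v)

    groups: dict[str, set[str]] = {}
    for name in isolates:
        groups.setdefault(find(name), set()).add(name)
    # Isolates with no close neighbour are left out, since a cluster needs at least two members.
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical isolates, invented for illustration only.
toy = {
    "A": {100: "T", 200: "G"},
    "B": {100: "T", 200: "G", 300: "C"},
    "C": {100: "T", 300: "C", 400: "A", 500: "G"},
    "D": {pos: "A" for pos in range(1000, 1010)},
}
print([sorted(c) for c in genomic_clusters(toy)])  # [['A', 'B', 'C']]
```

Note that A and C sit 4 SNPs apart, above the cutoff, yet land in the same cluster because each is close to B. That chaining behaviour is one reason the choice of threshold and linkage rule matters so much when interpreting a cluster.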

Results: The Hidden Network

  • One massive cluster contained 62% of sequenced cases
  • Three superspreading events traced to a homeless shelter
  • Shelter-to-community spread confirmed
  • A nonsanctioned gambling house amplified transmission

Impact: The study exposed how socioeconomic factors fuel TB. The algorithm it developed showed near-perfect reproducibility (κ=0.98), supporting WGS's reliability for outbreak mapping 1.

Drug Resistance Prediction via WGS (Shanghai/Russia Study)
Drug | Sensitivity | Key Resistance Mutations
Rifampicin | 79.7% | rpoB S450L, H445A/P
Isoniazid | 86.3% | katG S315T, inhA promoter
Streptomycin | 88.4% | rpsL K43R
MDR-TB Prediction | 92.2% | rpoB S450L + katG S315T combo

3
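As a rough illustration of how a table like this becomes a prediction, the Python sketch below checks an isolate's mutation list against the markers named above. The marker strings, the expansion of "H445A/P" into two separate entries, the generic "inhA promoter" label, and the function name are all assumptions made for the example; a real pipeline would draw on a curated mutation catalogue.

```python
# Markers taken from the table above. The string format is an assumption for this sketch.
RESISTANCE_MARKERS = {
    "rifampicin": {"rpoB S450L", "rpoB H445A", "rpoB H445P"},
    "isoniazid": {"katG S315T", "inhA promoter"},
    "streptomycin": {"rpsL K43R"},
}


def predict_resistance(mutations: set[str]) -> dict[str, bool]:
    """Flag a drug as 'predicted resistant' if any of its markers is present."""
    calls = {
        drug: bool(markers & mutations)
        for drug, markers in RESISTANCE_MARKERS.items()
    }
    # MDR-TB means resistance to both rifampicin and isoniazid.
    calls["MDR"] = calls["rifampicin"] and calls["isoniazid"]
    return calls


print(predict_resistance({"rpoB S450L", "katG S315T"}))
# {'rifampicin': True, 'isoniazid': True, 'streptomycin': False, 'MDR': True}
```

A False here only means that no listed marker was found. The sensitivities in the table, all below 100%, are a reminder that some resistant isolates carry mutations a short list like this will miss.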


The Scientist's Toolkit: WGS Essentials

Research Reagent Solutions for TB Genomics

Core Components of a TB WGS Pipeline
Component | Example Products/Tools | Function
DNA Extraction | CTAB method, Mag-MK kits | Breaks tough mycobacterial cell walls
Library Prep | Illumina TruSeq, Nextera kits | Fragments DNA for sequencing
Sequencing | Illumina HiSeq, iSeq100 | High-throughput base calling
Alignment | Burrows-Wheeler Aligner (BWA) | Maps reads to reference genome
Variant Calling | SAMtools, VarScan | Identifies SNPs/indels
Lineage Assignment | SnpEff, PhyResSE | Classifies strains into lineages
Transmission Inference | 5-12 SNP threshold | Defines recent transmission clusters

3 4 9


Behind the Scenes: Challenges in the Code

The Variant-Calling Conundrum

Not all WGS pipelines agree. When five research groups analyzed the same German outbreak:

  • Pipelines identified 63–416 SNPs versus 85 validated SNPs
  • Only 55 SNPs were universally detected
  • Transmission links changed drastically: 0–80% of pairs appeared linked depending on bioinformatic choices 8
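One way to quantify this kind of disagreement is to compare each pipeline's SNP calls with a validated set and to find the calls every pipeline shares. The Python sketch below does that with toy position sets. The pipeline names, the positions, and the choice of recall and precision as summary metrics are assumptions for illustration, not the method used in the German comparison.

```python
def compare_pipelines(calls: dict[str, set[int]], validated: set[int]):
    """Return the SNPs every pipeline agrees on, plus per-pipeline accuracy figures."""
    core = set.intersection(*calls.values())  # the universally detected SNPs
    report = {}
    for name, snps in calls.items():
        true_positives = len(snps & validated)
        report[name] = {
            "called": len(snps),
            "recall": true_positives / len(validated),  # share of validated SNPs found
            "precision": true_positives / len(snps) if snps else 0.0,  # share of calls that are real
        }
    return core, report


# Hypothetical SNP positions, invented for illustration only.
validated = {10, 20, 30, 40}
calls = {
    "strict": {10, 20, 30},
    "lenient": {10, 20, 30, 40, 50, 60, 70, 80},
}
core, report = compare_pipelines(calls, validated)
print(sorted(core))       # [10, 20, 30]
print(report["strict"])   # {'called': 3, 'recall': 0.75, 'precision': 1.0}
print(report["lenient"])  # {'called': 8, 'recall': 1.0, 'precision': 0.5}
```

The toy numbers mirror the pattern in the bullets above: a strict pipeline misses real SNPs, a lenient one adds spurious ones, and the shared core is smaller than the validated set.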

How Technical Choices Alter Transmission Inferences
Pipeline Variable | Impact on Results | Recommendation
Reference genome choice | Lineage-specific biases | Use lineage-appropriate references
SNP filters (stringency) | More stringent = fewer SNPs/clusters | Optimize for local diversity
Excluded regions (PE/PPE) | Misses diversity in hypervariable genes | Include with caution
Sequencing depth | <20x coverage reduces sensitivity | Maintain >50x coverage

4 8
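The depth recommendation can be turned into a quick planning calculation: average depth is roughly the number of reads times the read length, divided by the genome size. The Python sketch below applies that to the 4.4 million base pair genome mentioned earlier. The 150 bp read length and the function names are assumptions for the example, and the estimate treats every read as usable, so real mapped depth will come out lower.

```python
import math

GENOME_SIZE_BP = 4_400_000  # approximate M. tuberculosis genome size


def mean_depth(n_reads: int, read_length_bp: int, genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Average depth if all reads map and spread evenly across the genome."""
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(target_depth: float, read_length_bp: int, genome_size_bp: int = GENOME_SIZE_BP) -> int:
    """Smallest read count that reaches the target average depth."""
    return math.ceil(target_depth * genome_size_bp / read_length_bp)


# Assumed read length of 150 bp, for illustration only.
print(round(mean_depth(1_000_000, 150), 1))  # 34.1
print(reads_needed(50, 150))                 # 1466667
```

Under these assumptions, one million 150 bp reads clear the 20x floor but fall short of the 50x recommendation, which takes about 1.47 million reads.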

Direct-from-Sputum Sequencing: The New Frontier

Bypassing culture accelerates diagnosis. In Spain:

  • 85% of smear-positive samples yielded sequences within a week
  • 57% were immediately linked to transmission clusters
  • Cost: €217/sample—potentially game-changing for resource-limited settings 9

The Future: Pathogen Genomics as Public Health Infrastructure

Beyond Outbreaks: Evolution in Action

WGS reveals how TB adapts to human populations. In New York City:

  • Endemic "C-strain" isolates acquired mutations in virulence genes (mmpL, esx)
  • Minimal drug resistance mutations—success came from improved transmission fitness, not drug evasion

Real-Time Surveillance is Coming

CDC's National TB Molecular Surveillance Center now sequences every U.S. isolate. Their approach:

wgMLST

2,672-locus genotyping scheme replacing older methods

Drug resistance surveillance

Screening for 300+ resistance mutations

Phylogenetic alerts

Automatic cluster detection

5
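To give a feel for how locus-based genotyping can drive automatic alerts, here is a minimal Python sketch that compares allele profiles and flags existing isolates close to a new one. The locus names, the allele numbers, the handling of missing loci, and the distance cutoff are all hypothetical. A real scheme would span all 2,672 loci, and the source does not describe CDC's actual alerting rules.

```python
def allelic_distance(p: dict, q: dict) -> int:
    """Count loci where both profiles have a called allele and the alleles differ.

    Profiles map locus name -> allele number, with None marking a locus
    that could not be called. Uncalled loci are skipped, not counted.
    """
    return sum(
        1
        for locus in p.keys() & q.keys()
        if p[locus] is not None and q[locus] is not None and p[locus] != q[locus]
    )


def flag_matches(new_profile: dict, database: dict, max_distance: int) -> list:
    """Return IDs of stored isolates within `max_distance` alleles of the new one."""
    return sorted(
        isolate_id
        for isolate_id, profile in database.items()
        if allelic_distance(new_profile, profile) <= max_distance
    )


# Hypothetical three-locus profiles, invented for illustration only.
database = {
    "iso1": {"locus1": 1, "locus2": 4, "locus3": 2},
    "iso2": {"locus1": 3, "locus2": 9, "locus3": 7},
}
new_isolate = {"locus1": 1, "locus2": 4, "locus3": None}
print(flag_matches(new_isolate, database, max_distance=1))  # ['iso1']
```

Skipping uncalled loci keeps a low-quality sample from looking artificially distant, though it can also make it look artificially close. Production systems typically require a minimum number of shared loci before trusting a match.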


Conclusion: The Genomic Crystal Ball

Whole-genome sequencing has transformed TB from a ghost in the shadows to a pathogen we can track in real time. As one Inuit community study starkly revealed, outbreaks thrive where social vulnerability and pathogen evolution intersect. With costs plummeting and pipelines improving, WGS promises not just to describe outbreaks, but to prevent them—turning the tide on humanity's oldest plague.

In the Inuit study, three superspreading events were traced to a single shelter, and one cluster held 62% of sequenced cases. Genomics exposed what interviews could not: the shelter walls and gambling dens where TB found its opportunity. Sometimes, the genome is the only whistleblower.

References