How Missing DNA and Protein Sequences Shape Life
Imagine reading a novel filled with thousands of words and noticing that certain simple combinations, such as "xpq" or "zbk", never appear. They are missing not just from this book but from every book ever written. What if these "forbidden words" were absent because they caused some fundamental problem with the very fabric of language? Scientists in genomics are investigating a similar mystery through the study of nullomers and nullpeptides: short DNA and protein sequences that are completely absent from genomes and proteomes even though they are theoretically possible.
The groundbreaking discovery that our genetic blueprint contains conspicuous absences has opened an entirely new window into understanding evolution, disease, and the fundamental rules of biology.
Research led by Ilias Georgakopoulos-Soares and colleagues, published in Genome Biology in 2021, systematically identified these missing sequences across thirty different species. The team made the startling finding that their absence is not random: many appear to have been eliminated by natural selection 1 . This finding helps explain what makes certain genetic sequences potentially harmful. It has also led to innovative applications in cancer diagnosis and treatment that were unimaginable just a decade ago.
Nullomers are short DNA sequences that do not appear in a particular genome, despite being possible given the four-letter alphabet of DNA (A, T, C, G). Similarly, nullpeptides are absent amino acid sequences in an organism's complete set of proteins. Think of them as the genetic equivalent of "forbidden words" that evolution has seemingly banned from life's vocabulary.
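To make the definition concrete, here is a minimal Python sketch that finds nullomers of a fixed length k. It lists all 4^k possible DNA words and removes every word that occurs in the input. Two details are our own assumptions rather than necessarily the study's exact pipeline. First, a word counts as present if it appears on either strand. Second, windows containing ambiguous bases such as N are skipped. The function names are hypothetical. Brute-force enumeration like this is only practical for small k and small inputs, and genome-scale work up to 15 bp needs memory-efficient k-mer counting.

```python
from itertools import product

DNA = "ACGT"
# Translation table used to complement each base
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an uppercase DNA string (N and other codes pass through unchanged)."""
    return seq.translate(COMPLEMENT)[::-1]


def observed_kmers(sequences, k, both_strands=True):
    """Collect every length-k word over A, C, G, T that occurs in the sequences."""
    seen = set()
    for seq in sequences:
        seq = seq.upper()
        strands = (seq, reverse_complement(seq)) if both_strands else (seq,)
        for strand in strands:
            for i in range(len(strand) - k + 1):
                kmer = strand[i:i + k]
                # Skip windows containing N or any other non-ACGT character
                if all(base in DNA for base in kmer):
                    seen.add(kmer)
    return seen


def find_nullomers(sequences, k, both_strands=True):
    """Return, sorted, all 4**k possible k-mers that never occur in the sequences."""
    seen = observed_kmers(sequences, k, both_strands)
    candidates = ("".join(p) for p in product(DNA, repeat=k))
    return sorted(kmer for kmer in candidates if kmer not in seen)


print(find_nullomers(["AAAA"], 2))
```

For example, `find_nullomers(["AAAA"], 2)` returns the 14 dinucleotides other than AA and TT, because TT is present on the opposite strand. The same idea applies to nullpeptides, using the 20-letter amino-acid alphabet and no reverse strand.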
The shortest nullomers in the human genome are 11 base pairs long, with only 104 such sequences missing 1 .
The number of possible sequences quadruples with each additional base, and the number of nullomers rises steeply with length. It reaches approximately 40 million at 14 base pairs and roughly 400 million at 15 base pairs 1 .
What makes these absent sequences particularly fascinating is that they are not simply missing at random. Research indicates that a significant proportion are under negative selection. In other words, when such sequences arise they tend to be eliminated, presumably because they are harmful 1 7 . The most compelling evidence comes from examining where these sequences are missing. Coding sequences and promoters show the strongest selection against nullomers, which suggests that these sequences might disrupt critical genetic functions in those regions 1 .
Why would nature go to the trouble of eliminating particular short DNA or protein sequences? Several compelling theories have emerged:
Some nullpeptides, if they were produced, might interfere with essential cellular processes. When synthesized artificially and introduced into cells, certain nullomer-derived peptides have proven lethal to cancer cells 8 , suggesting that they disrupt fundamental biological pathways.
If they were present, these sequences might cause problems with DNA folding or protein shape, preventing molecules from functioning properly.
Nullomers arising in promoter regions could disrupt the careful control of gene expression 1 .
Our immune systems might recognize these sequences as "non-self," triggering autoimmune reactions if they were produced 8 .
The strongest evidence supporting the functional significance of nullomers comes from their non-random distribution across the genome. They're significantly underrepresented in functionally important regions, and mathematical models show there are more nullomers than expected by chance alone 1 .
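To see why "expected by chance" needs a careful model, consider the simplest possible null model in a worked Python sketch. It assumes a uniformly random genome of about 3.1 billion bases, an approximate haploid human length chosen here for illustration. Each base is assumed to be independently A, C, G or T, and both strands are scanned. Under a Poisson approximation, a given k-mer is absent with probability exp(-λ), where λ = 2N / 4^k is its expected number of occurrences. This naive model is our own illustration, not the model used in reference 1.

```python
import math

# Assumed approximate length of the haploid human genome in bp (illustrative only)
GENOME_LENGTH = 3_100_000_000


def expected_nullomers(k: int, genome_length: int = GENOME_LENGTH) -> float:
    """Expected number of absent k-mers in a uniformly random genome.

    Naive null model: independent, equiprobable bases; both strands scanned,
    giving about 2 * genome_length chances to observe any particular k-mer.
    Occurrence counts are treated as Poisson, so P(absent) = exp(-lam).
    """
    total = 4 ** k
    lam = 2 * genome_length / total  # expected occurrences of one specific k-mer
    return total * math.exp(-lam)


for k in range(11, 16):
    print(f"k={k}: about {expected_nullomers(k):,.0f} expected nullomers")
```

Under this model essentially no nullomers are expected up to 14 bp, and only about 3.3 million at 15 bp. Those figures are far below the roughly 40 million and 400 million observed. Much of that gap likely reflects compositional bias rather than selection, such as the well-documented depletion of CpG dinucleotides in mammalian genomes. That is why realistic genomic simulations are needed before an absence can be attributed to selection.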
The 2021 study published in Genome Biology represented a watershed moment in nullomer research by conducting the most comprehensive analysis of these absent sequences to date 1 7 . The research team employed a multi-faceted approach to distinguish meaningful absences from random ones.
The researchers analyzed thirty eukaryotic species, identifying all nullomers up to 15 base pairs in length and nullpeptides up to seven amino acids. They examined these absences across different functional genomic categories: coding sequences, exons, introns, 5'UTRs, 3'UTRs, promoters, enhancers, and open chromatin regions 1 . To determine whether nullomers were absent by chance or selection, they developed a sophisticated scoring system called φN that combined three different metrics: single-base substitution frequency, genomic simulations, and evolutionary conservation across species 1 .
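One of the three ingredients, single-base substitution frequency, rests on a simple intuition. If a missing sequence is only one point mutation away from many sequences that do occur, mutation should keep creating it, so its persistent absence is more suspicious. The Python sketch below captures only that intuition by measuring how many of a nullomer's one-mismatch neighbours are present. The function names and the scoring rule are our own illustrative assumptions. This is not the published φN formula, which also incorporates genomic simulations and cross-species conservation.

```python
# Hypothetical illustration of the substitution-neighbour idea; NOT the published φN score.

def substitution_neighbours(kmer: str):
    """Yield every DNA word that differs from kmer at exactly one position."""
    for i, base in enumerate(kmer):
        for alt in "ACGT":
            if alt != base:
                yield kmer[:i] + alt + kmer[i + 1:]


def present_neighbour_fraction(nullomer: str, present: set) -> float:
    """Fraction of a nullomer's 3k one-mismatch neighbours that occur in the genome.

    A high value means many single point mutations could create the nullomer,
    so its continued absence is harder to explain by chance alone.
    """
    neighbours = list(substitution_neighbours(nullomer))
    hits = sum(1 for n in neighbours if n in present)
    return hits / len(neighbours)


print(present_neighbour_fraction("AC", {"CC", "AA", "GG"}))
```

Here "present" stands for the set of k-mers observed in a genome, for instance as collected by a k-mer scan. In the usage line, two of the six neighbours of AC (CC and AA) are present, giving one third.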
The investigation revealed that coding sequences and promoters contain the highest proportion of selected-against nullomers, underscoring their functional importance 1 . The researchers also identified 36,081 peptides of up to six amino acids that do not exist in any known organism. These "primes" represent the most universally forbidden biological sequences 1 7 .
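Because a prime must be missing from every organism, the set of primes is the intersection of each organism's nullpeptide set. A minimal Python sketch makes this concrete. It assumes each proteome is given as a list of uppercase one-letter protein strings, and the function names are hypothetical. Enumerating all 20^k candidate peptides is only feasible for small k in pure Python, since k = 6 already gives 64 million candidates.

```python
from itertools import product

# The 20 standard amino acids, one-letter codes
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def nullpeptides(proteome, k):
    """Return the k-residue peptides over the 20 standard amino acids absent from a proteome."""
    seen = {
        protein[i:i + k]
        for protein in proteome
        for i in range(len(protein) - k + 1)
    }
    candidates = {"".join(p) for p in product(AMINO_ACIDS, repeat=k)}
    return candidates - seen


def primes(proteomes, k):
    """Peptides absent from every proteome supplied: the intersection of nullpeptide sets."""
    absent_sets = [nullpeptides(proteome, k) for proteome in proteomes]
    return set.intersection(*absent_sets)


toy_proteomes = [["ACD"], ["CDE"]]  # two tiny hypothetical "organisms"
print(len(primes(toy_proteomes, 2)))
```

With these two toy proteomes, only AC, CD and DE occur, so 397 of the 400 possible dipeptides are primes. Adding more organisms can only shrink the set, which is why primes computed across all sequenced life are so rare.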
| Sequence Length (base pairs) | Number of Nullomers | Percentage of Possible Sequences |
|---|---|---|
| 11 | 104 | Not specified |
| 12 | Not specified | 0.26% |
| 14 | ~40 million | Not specified |
| 15 | ~400 million | 37.8% |
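The table's counts and percentages can be cross-checked with simple arithmetic, because there are 4^k possible DNA words of length k. The Python sketch below assumes that the percentages are taken over all 4^k strand-specific words. That assumption is supported by the 15 bp row, where 37.8% of about 1.07 billion is roughly 406 million, consistent with the ~400 million count. The function names are hypothetical.

```python
def possible_kmers(k: int) -> int:
    """Number of distinct DNA words of length k over A, C, G, T."""
    return 4 ** k


def nullomer_percentage(count: int, k: int) -> float:
    """Percentage of all possible k-mers that are nullomers."""
    return 100 * count / possible_kmers(k)


# 104 nullomers of length 11 out of 4**11 = 4,194,304 possible words
print(f"{nullomer_percentage(104, 11):.5f}%")
# 37.8% of all 15-mers, converted back to a count
print(f"{0.378 * possible_kmers(15):,.0f}")
```

Under the same assumption, the 104 nullomers at 11 bp amount to only about 0.0025% of possible sequences. The 0.26% at 12 bp would correspond to roughly 44,000 nullomers. These are derived figures, not values reported in the study.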
Identifying and studying these absent sequences requires specialized bioinformatics tools and approaches. Researchers in this field rely on a sophisticated array of computational resources:
| Tool/Resource | Function | Application in Nullomer Research |
|---|---|---|
| BLAST | Sequence similarity searching 6 | Identifying absent sequences by failure to match |
| DRAGEN | Secondary analysis of NGS data | Processing whole-genome sequencing data for nullomer detection |
| MEME Suite | DNA motif discovery and analysis 6 | Identifying patterns in nullomer distribution |
| Cloud Computing Platforms | Handling massive genomic datasets 9 | Storing and processing terabytes of sequence data |
| Human Pangenome Reference | Comprehensive human genomic variation | Distinguishing true nullomers from rare variants |
Next-generation sequencing technologies have been instrumental in nullomer research, with platforms like Illumina's NovaSeq X and Oxford Nanopore providing the comprehensive genomic data needed to confidently identify absent sequences 9 . The enormous computational demands of analyzing hundreds of millions of potential sequences have made cloud computing platforms like Amazon Web Services and Google Cloud Genomics essential for modern nullomer research 9 .
Artificial intelligence has also begun to play a crucial role in genomic analysis. Tools such as Google's DeepVariant use machine learning to identify genetic variants with remarkable accuracy 9 . This helps researchers distinguish true nullomers from sequencing artifacts, a critical distinction when studying sequences that, by definition, do not exist in reference genomes.
One of the most promising applications of nullomer research has emerged in cancer diagnostics. Since nullomers are generally absent from healthy human genomes, their appearance in tumors can serve as a powerful biomarker for disease. Researchers have dubbed these cancer-associated nullomers "neomers" 5 .
The diagnostic approach works because cancer cells accumulate somatic mutations: genetic changes that are acquired during a person's lifetime rather than inherited. When these mutations create sequences that match known nullomers, they provide a distinctive cancer signature. In a landmark 2025 study published in Communications Medicine, researchers analyzed 2,577 cancer genomes across 21 cancer types. They showed that neomer-based classifiers could distinguish tumor types more accurately than state-of-the-art methods 5 .
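The core matching step can be sketched in a few lines of Python. The sketch applies a somatic single-base substitution to the reference sequence, lists every k-mer overlapping the changed base, and keeps the k-mers found in a precomputed nullomer set. This is a simplified illustration under our own assumptions: one substitution at a time, a nullomer set computed over both strands, and hypothetical function names. The published classifiers are considerably more involved.

```python
def kmers_spanning(seq: str, pos: int, k: int):
    """All k-mers of seq that include the 0-based position pos."""
    start = max(0, pos - k + 1)
    end = min(pos, len(seq) - k)
    return [seq[i:i + k] for i in range(start, end + 1)]


def neomers_from_snv(reference: str, pos: int, alt: str, nullomers: set, k: int):
    """Return the nullomers created by a single-base substitution.

    reference: reference DNA surrounding the variant
    pos:       0-based index of the substituted base
    alt:       the mutant base
    nullomers: k-mers absent from the reference genome. If that set was built
               from both strands it is closed under reverse complement, so
               checking the forward strand alone is sufficient.
    """
    mutated = reference[:pos] + alt + reference[pos + 1:]
    return sorted({kmer for kmer in kmers_spanning(mutated, pos, k) if kmer in nullomers})


toy_nullomers = {"AAC", "ACA", "CAA", "GGG"}  # hypothetical nullomer set
print(neomers_from_snv("AAAAA", 2, "C", toy_nullomers, 3))
```

In the toy example, an A-to-C change in the middle of AAAAA creates three 3-mers (AAC, ACA and CAA), all of which are in the hypothetical nullomer set. Each is therefore reported as a neomer.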
The same study sequenced cell-free DNA from 465 individuals and found that neomers could detect lung and ovarian cancers with an area under the ROC curve (AUC) of 0.89 to 0.94 5 .
This approach is particularly valuable for early-stage detection. At that stage tumors are small and release very little circulating tumor DNA, a scenario that has historically challenged conventional detection methods 5 .
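The AUC values of 0.89 to 0.94 quoted for the cell-free DNA results summarize how well a score separates people with cancer from those without. A value of 1.0 means perfect separation, and 0.5 is no better than chance. Equivalently, AUC is the probability that a randomly chosen cancer sample scores higher than a randomly chosen control, which the short Python sketch below computes directly. The scores stand in for any per-sample neomer-based score. This is a generic illustration of the metric, not the study's evaluation code.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via the Mann-Whitney formulation.

    Counts, over all case/control pairs, how often the case scores higher;
    ties count as one half. O(n*m), which is fine for small examples.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


print(auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]))
```

An AUC of about 0.9 therefore means that, in roughly 90% of case and control pairs, the cancer sample receives the higher score.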
If nullomers are potentially harmful sequences that evolution has eliminated, could they be harnessed as therapeutics? Surprisingly, research suggests the answer is yes—particularly in oncology, where disrupting cellular processes is precisely the goal.
In an innovative approach, researchers have begun testing nullomer-derived peptides as cancer-specific toxins. One such peptide, called 9S1R, is a scrambled version of a prime sequence (a peptide absent from all known organisms) that has been modified with five arginine amino acids to enhance its ability to enter cells 8 .
According to this work, the peptide attacks cancer cells on more than one front, disrupting energy production while also stimulating immune recognition 8 . Such multi-pronged action suggests that nullomer-based therapies might offer advantages over conventional treatments that target a single pathway. As the researchers noted, "Peptide based drugs are showing great promise in clinical studies" owing to their "small size, specificity, effects on a broad range of cancers, low toxicity and low manufacturing cost" 8 .
The study of what's absent from our genome has evolved from a biological curiosity into a rich field with profound implications for understanding evolution and for developing novel medical diagnostics and therapies. As the authors of the neomer study put it, nullomers provide "a sensitive, specific, and simple cancer diagnostic tool" while also helping "identify cancer-associated mutations in gene regulatory elements" 5 .
Their ability to distinguish between human populations suggests uses in population genetics 1 .
Their conservation across species makes them valuable for phylogenetic classification 1 .
Their potential to stimulate immune responses opens possibilities for immune-based therapies 8 .
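The phylogenetic use mentioned in the list above can be illustrated with one simple approach, offered as an assumption rather than the method of reference 1. Species are compared by the overlap of their nullomer sets, so species that lack many of the same words get a small distance. The Python sketch below computes pairwise Jaccard distances. The species names and sets are hypothetical, and the resulting matrix could feed a standard tree-building method such as neighbour joining.

```python
def jaccard_distance(a: set, b: set) -> float:
    """1 - |A & B| / |A | B|; 0 means identical sets, 1 means no overlap."""
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)


def distance_matrix(nullomer_sets: dict):
    """Pairwise Jaccard distances between species' nullomer sets."""
    names = sorted(nullomer_sets)
    return {
        (x, y): jaccard_distance(nullomer_sets[x], nullomer_sets[y])
        for x in names
        for y in names
    }


# Hypothetical toy nullomer sets for three species
toy = {
    "species_a": {"ACGCG", "CGCGA", "TCGCG"},
    "species_b": {"ACGCG", "CGCGA", "GCGCG"},
    "species_c": {"TTTAA", "GCGCG"},
}
for pair, dist in distance_matrix(toy).items():
    print(pair, round(dist, 3))
```

In this toy example, species_a and species_b share two of their four distinct nullomers, giving a distance of 0.5. Species_a and species_c share none, giving a distance of 1.0.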
Initial discovery of nullomers as a biological curiosity
Development of computational tools for systematic identification
Comprehensive 30-species study published in Genome Biology 1
First therapeutic applications in cancer models 8
Neomer-based cancer diagnostics demonstrated in clinical samples 5
As sequencing technologies continue to advance and more genomes are decoded, our catalog of nullomers will become increasingly complete, potentially revealing new biological principles and therapeutic opportunities. The systematic investigation of what nature has excluded from life's blueprint is reminding us that sometimes, what's missing can be just as important as what's present.