Knowing that a number of proteins have a similar content of a given amino acid (AA) makes it possible to predict that their expression will depend on the same environmental conditions that control the availability of that AA. This consideration motivates the need for a tool capable of quickly calculating the AA content of proteins and comparing groups of proteins on the basis of their AA content. The information gleaned from such comparisons may help to answer many biological and biochemical questions.
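A minimal sketch of such a tool, written in Python, could look like this. It assumes proteins are supplied as one-letter sequences (for example, already parsed from a FASTA file), counts only the 20 standard amino acids, and compares two groups by their mean percentage of a single AA. The function names `aa_content` and `compare_groups` are hypothetical and do not correspond to any published software.

```python
from collections import Counter
from statistics import mean

# One-letter codes of the 20 standard amino acids; other symbols (X, U, *) are ignored.
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def aa_content(sequence: str) -> dict[str, float]:
    """Return the percentage of each standard amino acid in one protein sequence."""
    counts = Counter(aa for aa in sequence.upper() if aa in STANDARD_AA)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: 100.0 * counts[aa] / total for aa in STANDARD_AA}


def compare_groups(group_a: dict[str, str], group_b: dict[str, str], aa: str) -> tuple[float, float]:
    """Mean percentage of one amino acid in two groups of proteins (name -> sequence)."""
    mean_a = mean(aa_content(seq)[aa] for seq in group_a.values())
    mean_b = mean(aa_content(seq)[aa] for seq in group_b.values())
    return mean_a, mean_b


if __name__ == "__main__":
    # Toy sequences, purely illustrative.
    group_a = {"p1": "EEEQ", "p2": "EEAA"}
    group_b = {"p3": "QQQE", "p4": "AAAA"}
    print(compare_groups(group_a, group_b, "E"))  # (62.5, 12.5)
```

Comparing means is only the simplest possible summary; a real comparison between protein groups would add a statistical test rather than inspecting the two averages alone.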
Over the past decade, the body of work reviewed here has progressively developed a composition-based view of the human proteome. It demonstrates that amino acid content, far from being a passive consequence of protein sequence, encodes biologically and pathophysiologically relevant information linked to metabolism, tissue environment, and disease.
The 2013 PLOS ONE study (Human Protein Cluster Analysis Using Amino Acid Frequencies, ISSN: 1932-6203, doi: 10.1371/journal.pone.0060220) establishes the conceptual and methodological foundation of this research line. By representing human proteins as vectors of amino acid frequencies and applying hierarchical clustering to the entire proteome, the study demonstrates that amino acid composition alone is sufficient to generate biologically meaningful groupings.
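The frequency-vector representation can be illustrated with a small pure-Python sketch: each protein becomes a 20-element vector of relative AA frequencies, and a naive agglomerative procedure repeatedly merges the two closest clusters. Euclidean distance and average linkage are assumptions made for illustration only; the distance measure and linkage method used in the 2013 study may differ, and the protein names below are invented toy examples.

```python
import math
from collections import Counter

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def frequency_vector(sequence: str) -> list[float]:
    """Relative frequency (0-1) of each standard amino acid, in a fixed order."""
    counts = Counter(aa for aa in sequence.upper() if aa in STANDARD_AA)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in STANDARD_AA]


def average_linkage_clusters(vectors: dict[str, list[float]], n_clusters: int) -> list[set[str]]:
    """Naive agglomerative clustering with average linkage, stopped at n_clusters."""
    clusters = [{name} for name in vectors]

    def linkage(c1: set[str], c2: set[str]) -> float:
        # Average Euclidean distance over all member pairs of the two clusters.
        total = sum(math.dist(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (i < j), merge j into i, then drop j.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    # Two Gly/Pro-rich and two acidic toy sequences.
    proteins = {"gly1": "GGGGPPGA", "gly2": "GGGPPGGA", "acid1": "EEEDDEKK", "acid2": "EEDDEEKR"}
    vectors = {name: frequency_vector(seq) for name, seq in proteins.items()}
    print(average_linkage_clusters(vectors, 2))
```

The naive loop scales cubically with the number of proteins and only suits toy inputs; at proteome scale, `scipy.cluster.hierarchy.linkage` (for example with `method="average"`) followed by `fcluster` is the practical route.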
Proteins with strong structural or extracellular roles cluster robustly, while multi-domain or functionally related proteins often diverge compositionally. This observation leads to a key hypothesis: amino acid composition reflects long-term metabolic and environmental constraints acting during protein synthesis, rather than merely functional similarity. This work introduces amino acid composition as an informational layer orthogonal to sequence, structure, and expression.
Building on this framework, the 2019 Royal Society Open Science study (ISSN: 2054-5703, doi: 10.1098/rsos.181891) introduces a specific mechanistic interpretation, focusing on the glutamate/glutamine (Glu/Gln) ratio as a marker of tissue oxygenation. Grounded in known metabolic pathways linking hypoxia to glutamine synthesis, the study demonstrates that proteins expressed in differently oxygenated cell populations exhibit systematic differences in Glu/Gln content.
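As an illustration only, a per-protein Glu/Gln ratio can be computed from the counts of E and Q in the one-letter sequence; within a single protein this equals the ratio of their percentages. The pooled group ratio, and the convention of returning infinity for proteins lacking Gln and NaN for proteins lacking both residues, are choices made for this sketch and are not taken from the 2019 study.

```python
import math


def _ratio(glu: int, gln: int) -> float:
    # Convention for this sketch: no Gln but some Glu -> inf; neither residue -> nan.
    if gln == 0:
        return math.inf if glu > 0 else math.nan
    return glu / gln


def glu_gln_ratio(sequence: str) -> float:
    """Glu/Gln ratio of one protein from E and Q counts (equal to the ratio of their percentages)."""
    seq = sequence.upper()
    return _ratio(seq.count("E"), seq.count("Q"))


def pooled_glu_gln_ratio(sequences: list[str]) -> float:
    """Group-level ratio: total Glu over total Gln across all sequences in the group."""
    glu = sum(s.upper().count("E") for s in sequences)
    gln = sum(s.upper().count("Q") for s in sequences)
    return _ratio(glu, gln)


if __name__ == "__main__":
    print(glu_gln_ratio("MEEEQK"))                 # 3.0
    print(pooled_glu_gln_ratio(["EEEQ", "EQQQ"]))  # 1.0
```

Pooling counts across a group gives more weight to long proteins than averaging per-protein ratios would; which summary is appropriate depends on the question being asked.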
Importantly, the analysis extends beyond individual proteins to chromosomal regions, revealing gene clusters encoding proteins with shared compositional biases consistent with hypoxic environments. This work marks a decisive transition from exploratory clustering to functional, physiology-linked interpretation, positioning amino acid composition as a potential biochemical indicator of tissue state.
The 2020 Antioxidants article (Silvagno, Vernone & Pescarmona, The role of glutathione in protecting against the severe inflammatory response triggered by COVID-19, Antioxidants, ISSN: 2076-3921, doi: 10.3390/antiox9070624) extends the composition-based perspective into redox biology and acute inflammatory responses. Focusing on glutathione, a central cellular antioxidant whose level depends on both its synthesis and amino acid availability, the study explores its protective role against the severe inflammatory response triggered by COVID-19. This work reinforces the broader conceptual theme in two ways. First, biochemical resource allocation matters: glutathione synthesis consumes cysteine, glutamate, and glycine, and under conditions of stress or infection, competition for these amino acids can influence redox balance and cellular resilience. Second, compositional constraints have functional consequences: variables such as amino acid availability are linked to clinically relevant outcomes (e.g., inflammatory severity in viral disease), highlighting that metabolic environment and proteome composition are integrally connected to pathology.
Thus, the 2020 study bridges metabolic theory and immunopathology, showing how amino-acid-dependent antioxidant capacity modulates disease course.
The chromosome walking methodology (Chromosome walking: A novel approach to analyse amino acid content of human proteins ordered by gene position, Applied Sciences, ISSN: 2076-3417, doi: 10.3390/app11083511) further expands the framework by incorporating genomic spatial context. By ordering proteins according to chromosomal gene position and applying a sliding-window enrichment analysis, this approach identifies local genomic regions encoding proteins enriched in specific amino acids.
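A simplified chromosome walk might look like the sketch below. It assumes that all genes lie on a single chromosome, that each is represented by a hypothetical (name, start position, protein sequence) tuple with a non-empty sequence, and that a window counts as "enriched" when its mean percentage of the target AA reaches a fixed multiple (`fold`) of the chromosome-wide mean. The window size, the fold threshold, and this enrichment rule are all assumptions; the published method may define windows and test enrichment differently.

```python
from statistics import mean


def chromosome_walk(genes, aa, window=5, fold=1.5):
    """Sliding-window enrichment of one amino acid along a single chromosome.

    genes:  list of (gene_name, start_position, protein_sequence) tuples.
    aa:     one-letter code of the amino acid of interest, e.g. "E".
    Returns (first_gene, last_gene, window_mean) for every window of `window`
    consecutive genes whose mean percentage of `aa` is at least `fold` times
    the mean over all genes on the chromosome.
    """
    if window < 1 or not genes:
        raise ValueError("need window >= 1 and at least one gene")
    ordered = sorted(genes, key=lambda g: g[1])  # order proteins by gene position
    percents = [100.0 * seq.upper().count(aa) / len(seq) for _, _, seq in ordered]
    background = mean(percents)
    if background == 0:
        return []  # the amino acid never occurs, so nothing can be enriched
    hits = []
    for start in range(len(ordered) - window + 1):
        window_mean = mean(percents[start:start + window])
        if window_mean >= fold * background:
            hits.append((ordered[start][0], ordered[start + window - 1][0], window_mean))
    return hits


if __name__ == "__main__":
    # Toy genes listed out of order; sorting by position restores g1..g6.
    genes = [("g4", 400, "EEEA"), ("g1", 100, "AAAA"), ("g3", 300, "EEEE"),
             ("g2", 200, "AAAA"), ("g6", 600, "AAAA"), ("g5", 500, "AAAA")]
    print(chromosome_walk(genes, "E", window=2, fold=2.0))  # [('g3', 'g4', 87.5)]
```

Overlapping windows are reported separately here; in practice adjacent hits would be merged into regions, and the fixed threshold would be replaced by a statistical test with correction for the many windows examined.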
Rather than clustering proteins globally, chromosome walking reveals regional coherence along chromosomes, suggesting that groups of neighboring genes may share metabolic or compositional constraints. The identification of glutamate-enriched regions, including loci associated with neurodevelopmental disorders, underscores the relevance of this approach for linking amino acid composition to genome organization and disease susceptibility.
The 2023 Antioxidants paper (How the Competition for Cysteine May Promote Infection of SARS-CoV-2 by Triggering Oxidative Stress, ISSN: 2076-3921, doi: 10.3390/antiox12020483) returns to redox biology, this time in the context of infection. Focusing on cysteine availability, the study highlights how competition for sulfur-containing amino acids can influence oxidative stress and viral replication.
Here, amino acid composition and availability are explicitly connected to cellular vulnerability under stress conditions, reinforcing the central theme that metabolic constraints on amino acid pools have direct functional and pathological consequences.
The 2024 IJMS study on iron overload in the brain (Iron Overload in Brain: Transport Mismatches, Microbleeding Events, and How Nanochelating Therapies May Counteract Their Effects, International Journal of Molecular Sciences, ISSN: 1422-0067, doi: 10.3390/ijms25042337) further broadens the scope to metal metabolism and neurodegeneration. While not centered exclusively on amino acid frequencies, the work is conceptually aligned with the broader framework: it emphasizes how biochemical imbalances (iron transport mismatches, oxidative stress, microbleeding) shape tissue vulnerability and disease progression.
This study reinforces the idea that molecular composition and biochemical environment—including amino acids, metals, and redox state—must be considered jointly to understand complex pathologies such as neurodegeneration.
The most recent 2025 IJMS paper (Data Mining and Biochemical Profiling Reveal Novel Biomarker Candidates in Alzheimer’s Disease, ISSN: 1422-0067, doi: 10.3390/ijms26157536) represents a methodological and conceptual convergence of the entire research trajectory. Advanced data mining techniques are applied to biochemical and molecular profiles to identify novel biomarker candidates in Alzheimer’s disease.
This work translates the earlier composition-based and metabolism-oriented insights into a biomarker discovery framework, pointing to their potential clinical relevance. It reflects a shift from hypothesis generation to applied translational research, while remaining grounded in the central idea that biochemical composition carries diagnostic and prognostic information.