In 2023, I started working on the NSF Grant “Towards a unified theory of regulatory functions and networks across biological and social systems” (Award #2133863). Here, we developed a theory to explain the optimal regulation of complex systems such as genomes, cells, or societies. More recently, I have joined the Department of Environmental Studies at NYU as a part-time remote postdoc to work with Mingzhen Lu. We are investigating scaling phenomena in urban ecosystems. All these projects have in common not only the search for general quantitative principles but also the development of databases and applications that could be useful for the scientific community. 

 In the future, I plan to continue developing the following research areas:

i) Developing general theories from first principles, from genomes to societies. This includes developing integrated theories for phenomena at a specific level of organization (for example, cells), as well as finding common principles to build a theory that applies across levels, from genomes to ecosystems.

ii) Building databases by reusing and integrating data. This is a byproduct of compiling data to find general statistical patterns or to test our theories. We often compile data for specific predictions, but a generic database can also serve the scientific community in ways we did not anticipate. Our databases usually take the form of a set of tables that can be explored in a programming environment or a database management system.

iii) Building algorithms and apps to apply theories and compute statistics. Another byproduct of developing models and theories is code. We develop algorithms to test our theories and to make computations that integrate them. For example, we have recently developed an algorithm to compute per-cell proteomic properties. Proteomic studies have traditionally focused on population-level analyses, with an emphasis on the relative abundance of various proteins. Such studies have been useful in uncovering physiological differences across diverse species, the physiological response of individual species to distinct environmental conditions, and the function of individual proteins in the context of cellular networks. However, the absolute abundance of proteins in a cell is important for understanding single-cell physiology, for detailed biophysical considerations, for connecting diverse quantitative data, and for comparisons across species. Such detailed quantification will naturally occur as single-cell proteomics becomes more prevalent, but it is also of current interest to leverage population studies. There are several challenges here. First, most population studies do not measure the number of cells associated with a proteome. Second, recent work has shown that cell physiology shifts radically with cell size, and these effects need to be accounted for when going from population to single-cell estimates. We developed and implemented a method to estimate the basic properties of proteomes, based on well-established scaling relationships among cell components, including genome size, cell size, and proteome volume. Our method yields estimates of total proteins per cell that are similar to, but higher than, previous theoretical and empirical estimates. The algorithm has applications for interpreting proteomes, analyzing environmental samples, and designing artificial cells. Although the method focuses on prokaryotes, we discuss how it can be extended to unicellular and multicellular eukaryotes.
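As a rough illustration of the kind of computation described in iii), the following Python sketch chains a power-law scaling relationship (total protein volume as a function of cell volume) with a mean volume per protein to obtain a per-cell protein count. It is a minimal sketch under stated assumptions, not our published algorithm: the names PowerLaw and proteins_per_cell are hypothetical, and the numerical values are placeholders chosen only to make the arithmetic easy to follow, not measured parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerLaw:
    """Generic scaling relationship y = prefactor * x ** exponent."""

    prefactor: float
    exponent: float

    def __call__(self, x: float) -> float:
        # Power laws of this kind are only meaningful for positive sizes.
        if x <= 0:
            raise ValueError("x must be positive")
        return self.prefactor * x ** self.exponent


def proteins_per_cell(
    cell_volume: float,
    protein_volume_law: PowerLaw,
    volume_per_protein: float,
) -> float:
    """Estimate the number of proteins in one cell.

    cell_volume and volume_per_protein must use the same volume units.
    The total protein volume is predicted from cell volume by the
    scaling law, then divided by the mean volume of a single protein.
    """
    if volume_per_protein <= 0:
        raise ValueError("volume_per_protein must be positive")
    total_protein_volume = protein_volume_law(cell_volume)
    return total_protein_volume / volume_per_protein


if __name__ == "__main__":
    # Placeholder parameters (assumptions for illustration only).
    law = PowerLaw(prefactor=0.5, exponent=0.5)
    estimate = proteins_per_cell(
        cell_volume=4.0,
        protein_volume_law=law,
        volume_per_protein=1e-6,
    )
    print(f"Estimated proteins per cell: {estimate:.3g}")
```

In practice, the prefactor and exponent would come from fitted scaling relationships, and a similar chain could start from genome size instead of cell size.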
 

Some of the scientific articles in progress that have emerged from these projects are listed below.


Holehouse, J., Redner, S., Yang, V.C., Krapivsky, L., Arroyo, J.I., West, G., Youn, H., Kempes, C. (2025). Unifying function diversity patterns across biological and social systems.

 

Arroyo, J.I., Maass, A., Kempes, C., Marquet, P., West, G. (2025). A general model for genomic traits evolution.

 

Arroyo, J.I., Lopez, A., Maass, A., Kempes, C., West, G., Marquet, P. (2025). Scaling patterns in thermal performances.

 

Lopez, A., Arroyo, J.I., Kempes, C., West, G., Marquet, P. (2025). A database of thermal performances.

 

Arroyo, J.I., Holehouse, J., Kempes, C., West, G. (2025). Scaling in the properties of distributions of cellular protein abundances in prokaryotes.

 

Arroyo, J.I., Kempes, C., West, G. (2025). Scaling in the regulatory composition of proteomes.

 

Arroyo, J.I., Kempes, C., Hernandez, J., West, G. (2025). Scaling of glia cells in nervous systems.

 

Arroyo, J.I., Kempes, C., West, G. (2025). The minimal cost of rules in societies.

  

Hernandez, J., Arroyo, J.I., Caron, S., Kempes, C., Lu, M. (2025). A database of elementomes.

 

Arroyo, J.I., Caron, S., Hernandez, J., Kempes, C., Lu, M. (2025). Statistical properties of the elemental chemical composition of organisms and their environments.

 

Arroyo, J.I. (2025). Diversification and genome length in endotherms.