Protein Disorder, Gene Regulation and Omics Lab
Come join our lab at DGIST as a graduate student
Gene regulation is the remarkable ability of cells to control which genes are active and how strongly they are expressed in response to internal and external cues. Central to this process are transcription factors (TFs)—proteins that recognize and bind specific DNA sequences to control the expression of target genes.
Our research aims to uncover how TFs find their correct binding sites among thousands of possibilities and how their activity is fine-tuned to ensure accurate transcriptional control.
We are also fascinated by the evolution of transcription factors, exploring how subtle changes in their sequences or mechanisms have contributed to the emergence of new cellular traits and even new species.
Recently, we discovered that intrinsically disordered regions (IDRs) within TFs play a key role in determining their specificity, following a distinct amino-acid “grammar” that guides target recognition.
Most proteins perform their functions by folding into well-defined three-dimensional structures. Yet many eukaryotic proteins—especially transcription factors—contain long intrinsically disordered regions (IDRs) that remain flexible but still perform highly specific functions. These regions challenge the traditional “structure-equals-function” paradigm of molecular biology.
Our goal is to decode the sequence logic that allows these flexible regions to carry out precise biological roles. We also study the advantages and vulnerabilities of IDR-based mechanisms compared with folded domains, particularly in contexts such as cellular stress and disease.
Recently, we showed that TF IDRs use dispersed hydrophobic residues within their otherwise hydrophilic sequences, as well as short linear motifs, to achieve genome specificity.
Modern biology is being transformed by an explosion of data, driven by advances in omics technologies such as next-generation sequencing (NGS), CRISPR-Cas9 genome editing, and high-throughput computational analysis. These tools have made it possible to explore biological systems at an unprecedented scale and resolution.
In our lab, we actively use, optimize, and develop such technologies to tackle fundamental questions in gene regulation and protein function.
For example, we have developed a molecular timer that tracks biological processes on a minute scale, upscaled genome-wide profiling methods to increase their convenience and throughput, and improved a technique for identifying RNA-protein interactions transcriptome-wide.
To fulfill their regulatory function, transcription factors (TFs) must bind specific genomic targets. While the DNA-binding domain (DBD) defines motif preference, this alone is insufficient, especially in large eukaryotic genomes, where TFs bind only a small fraction of the thousands of motif occurrences. Any additional mechanism must also operate efficiently to support the rapid transcriptional responses observed in cells.
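To make the notion of "motif occurrences" concrete, the following minimal Python sketch scans a DNA sequence for a motif on both strands and computes which fraction of the occurrences is bound. The function names, the toy sequence, the motif, and the set of bound positions are all hypothetical and chosen only for illustration; this is not the lab's analysis pipeline.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def motif_positions(sequence: str, motif: str) -> list[int]:
    """Return 0-based start positions where the motif, or its reverse
    complement, occurs in the sequence (overlapping matches included)."""
    sequence, motif = sequence.upper(), motif.upper()
    targets = {motif, reverse_complement(motif)}
    k = len(motif)
    return [i for i in range(len(sequence) - k + 1) if sequence[i:i + k] in targets]


def bound_fraction(positions: list[int], bound: set[int]) -> float:
    """Fraction of motif occurrences that overlap a (hypothetical) set of bound positions."""
    if not positions:
        raise ValueError("no motif occurrences found")
    return len(set(positions) & bound) / len(positions)


# Toy example: a made-up sequence and motif, not real genomic data.
toy_sequence = "AAAGGGGTTTCCCCTAA"
toy_motif = "AGGGG"
occurrences = motif_positions(toy_sequence, toy_motif)
print(occurrences, bound_fraction(occurrences, bound={2}))
```

In a real genome, the list of occurrences would run into the thousands, and the bound set would come from an in-vivo binding profile rather than being written by hand.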
We aim to uncover the molecular basis of TF specificity and how it contributes to biological processes such as gene regulation and evolution. Using budding yeast as a model, we combine ChEC-seq, which profiles TF binding at base-pair resolution, CRISPR-Cas9–mediated gene editing for targeted manipulations, and bioinformatic data analysis to derive novel insights at genome-wide scale and at nucleotide or amino-acid resolution.
TF specificity was long ascribed to DBDs, which reproduce full-length binding preferences in vitro. Yet in vivo, most motif occurrences are unbound, indicating additional determinants of specificity.
During my postdoc, we created >100 endogenous truncation variants of >50 yeast TFs and mapped their binding using ChEC-seq. In most cases, binding similarity to the full-length protein decreased gradually with increasing truncation of the intrinsically disordered region (IDR), while the IDR alone often retained partial similarity. These results suggest that specificity determinants distributed throughout the IDR cooperate with the DBD to define in-vivo binding. Comparing promoters bound by IDR-only versus DBD-only variants further showed an IDR preference for highly regulated promoters. (Kumar, Jonas et al., 2023)
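One simple way to quantify "binding similarity" between a variant and the full-length protein is to correlate their per-promoter binding signals. The sketch below uses the Pearson correlation as an assumed metric; the text above does not state which measure was used, and the signal values are made-up toy numbers, not experimental data.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equally long signal vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical binding signal over five promoters (toy values only).
full_length = [10.0, 2.0, 8.0, 1.0, 0.5]
variants = {
    "short IDR truncation": [9.0, 2.5, 7.0, 1.5, 1.0],
    "DBD only": [3.0, 6.0, 2.0, 5.0, 4.0],
}

for name, signal in variants.items():
    print(f"{name}: similarity to full length = {pearson(full_length, signal):.2f}")
```

Computing this score for a series of progressively longer truncations would give the kind of similarity curve described above, with each variant compared against the same full-length reference.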
To identify the sequence grammar underlying this specificity, we analyzed two TFs, Msn2 and Gln3, by generating dozens of variants that altered amino-acid order (reshuffling) or composition (replacement/removal). Binding similarity was preserved upon reshuffling or substitution of hydrophilic or charged residues, but removing hydrophobic residues—notably L and F—reduced similarity to the wild type and increased resemblance to DBD-only variants. Together with complementary experiments, these findings support an IDR grammar in which dispersed hydrophobic “sticker” residues within a hydrophilic environment mediate specific TF–DNA interactions. We also identified a short linear motif (SLiM) that supplements this grammar by mediating interaction with another TF. (Jonas, Carmi, Krupkin et al., 2023; Hurieva, Kumar et al., 2024)
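The two kinds of sequence variants described above, altered order and altered composition, can be illustrated with a minimal Python sketch. The helper names, the toy sequence, and the choice of serine as the substitute residue are assumptions made for illustration; they are not the actual variant designs used in the cited studies.

```python
import random

# Hydrophobic residues highlighted in the text (L and F); the set can be extended.
HYDROPHOBIC = frozenset("LF")


def reshuffle(seq: str, seed: int = 0) -> str:
    """Randomize residue order while keeping amino-acid composition unchanged."""
    residues = list(seq)
    random.Random(seed).shuffle(residues)
    return "".join(residues)


def replace_residues(seq: str, targets=HYDROPHOBIC, substitute: str = "S") -> str:
    """Replace every target residue with a substitute (serine is an assumed choice)."""
    return "".join(substitute if aa in targets else aa for aa in seq)


def remove_residues(seq: str, targets=HYDROPHOBIC) -> str:
    """Delete every target residue, which shortens the sequence."""
    return "".join(aa for aa in seq if aa not in targets)


# Made-up toy sequence, not a real IDR.
toy_idr = "SNLQDSFTNEKLSPQ"
print("reshuffled:", reshuffle(toy_idr))
print("replaced:  ", replace_residues(toy_idr))
print("removed:   ", remove_residues(toy_idr))
```

Reshuffling tests whether residue order matters, while replacement and removal test whether composition matters; passing a fixed seed keeps each reshuffled variant reproducible.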
Building on these results, we are now engineering synthetic TF IDRs with tailored binding specificities and exploring the molecular mechanisms enabling IDRs to guide TFs to their genomic targets.
Most evolutionary changes in regulation arise from adjusting gene expression rather than inventing new proteins. While changes in TF binding sites can tune transcriptional networks, direct TF alterations are constrained. Gene duplication provides an exception, freeing one copy to evolve new functions.
By examining ~30 TF paralog pairs in S. cerevisiae and their non-duplicated orthologs, we observed a wide range of divergence. Typically, one paralog resembled the ortholog while the other diverged, suggesting neo-functionalization dominates over sub-functionalization. Additional experiments showed that changes in TF IDRs, not paralog competition, drive divergence. (Gera, Jonas et al., 2022)
We are now identifying the determinants of evolutionary capacity and how TF duplicates contributed to phenotypic diversification across the yeast lineage after whole-genome duplication.
Intrinsically disordered regions as facilitators of the transcription factor target search
F Jonas, Y Navon, N Barkai; Nature Reviews Genetics (2025) [pdf]
Disordered sequences of transcription factors regulate genomic binding by integrating diverse sequence grammars and interaction types
B Hurieva, DK Kumar, R Morag, O Lupo, M Carmi, N Barkai, F Jonas; Nucleic Acids Research (2024) [pdf]
The molecular grammar of protein disorder guiding genome-binding locations
F Jonas, M Carmi, B Krupkin, J Steinberger, S Brodsky, T Jana, N Barkai; Nucleic Acids Research (2023) [pdf]
Evolution of binding preferences among whole-genome duplicated transcription factors
T Gera, F Jonas, R More, N Barkai; eLife (2022) [pdf]
Most cellular functions are carried out by proteins that fold into defined unique 3D structures, which determine their molecular activity—hence the major impact of structure prediction tools such as AlphaFold on biological understanding. However, a substantial fraction of eukaryotic proteins, including many transcription factors and DNA repair components, contain long segments that remain intrinsically disordered. Despite lacking stable structure, these regions are essential for protein function, yet they fall outside the classical structure–function paradigm.
Although mechanisms such as liquid–liquid phase separation and induced folding have been proposed from individual examples, a general framework explaining how disordered regions operate is still missing.
Our goal is to develop a general understanding of IDR function that allows us to identify functional sequence elements and predict the roles of uncharacterized IDRs. To this end, we study IDRs across diverse protein families using high-throughput generation and phenotyping of sequence variants, combined with proteome-wide experiments and computational analyses, including both classical sequence comparison and machine-learning approaches.
After deciphering one such sequence grammar for transcription factors (see above), we are now extending these studies to other DNA-binding proteins and metabolic enzymes, aiming to reveal generalizable principles of IDR function across the proteome.
Intrinsically disordered regions as facilitators of the transcription factor target search
F Jonas, Y Navon, N Barkai; Nature Reviews Genetics (2025) [pdf]
Beyond RNA-binding domains: determinants of protein–RNA binding
I Zigdon, M Carmi, S Brodsky, Z Rosenwaser, N Barkai, F Jonas; RNA (2024) [pdf]
Disordered sequences of transcription factors regulate genomic binding by integrating diverse sequence grammars and interaction types
B Hurieva, DK Kumar, R Morag, O Lupo, M Carmi, N Barkai, F Jonas; Nucleic Acids Research (2024) [pdf]
The molecular grammar of protein disorder guiding genome-binding locations
F Jonas, M Carmi, B Krupkin, J Steinberger, S Brodsky, T Jana, N Barkai; Nucleic Acids Research (2023) [pdf]
Complementary strategies for directing in vivo transcription factor binding through DNA binding domains and intrinsically disordered regions
DK Kumar, F Jonas, T Jana, S Brodsky, M Carmi, N Barkai; Molecular Cell (2023) [pdf]
ChEC-Seq: A Comprehensive Guide for Scalable and Cost-Efficient Genome-Wide Profiling in Saccharomyces cerevisiae
T Gera, DK Kumar, G Yaakov, N Barkai, F Jonas; Methods in Molecular Biology (2024) [pdf]
Measurement of histone replacement dynamics with genetically encoded exchange timers in yeast
G Yaakov, F Jonas, N Barkai; Nature Biotechnology (2021) [pdf]