Recca Holzel, Shelby Rheinschmidt, Zoya Ansari
The bases found in DNA can interact in non-traditional ways. When the sequence of As, Ts, Gs, and Cs is just right, a structure known as a G-quadruplex can form. Our goal is to write code that scans a given DNA sequence and identifies the locations of subsequences that may give rise to G-quadruplexes.
The G-quadruplex is a distinctive secondary structure of DNA that forms when guanines participate in Hoogsteen base pairing, a type of non-canonical base pairing distinct from normal Watson-Crick base pairing. These structures usually need to be stabilized by a central metal cation [1].
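The scanning goal described above could be sketched with a regular expression. The text does not specify a motif, so the pattern below is an assumption: it uses the commonly cited form of four runs of at least three guanines separated by loops of 1 to 7 bases. The names `G4_PATTERN` and `find_g4_candidates` are hypothetical, and the sketch only checks the strand it is given.

```python
import re

# Assumed motif (not taken from the text): four runs of >= 3 guanines,
# separated by three loops of 1-7 bases each.
G4_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")


def find_g4_candidates(sequence):
    """Return (start, end, matched_text) for each non-overlapping candidate.

    Coordinates are 0-based and end-exclusive. Only the given strand is
    scanned; input is upper-cased so soft-masked (lowercase) bases count.
    """
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]


# Toy input: a telomere-like repeat flanked by a few extra bases.
example = "ATCGGGTTAGGGTTAGGGTTAGGGATC"
for start, end, text in find_g4_candidates(example):
    print(start, end, text)
```

A motif on the opposite strand appears as runs of Cs on the given strand, so a fuller scan would also search the reverse complement. Because `finditer` reports non-overlapping matches, overlapping candidates are counted once here; whether that is the desired behavior depends on how G4 abundance is defined for the analysis.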
G-quadruplexes (G4s) are known to be involved in transcriptional dysregulation and DNA damage, and they are associated with cancer and age-related diseases. G4 formation can stall replication forks, which can cause DNA strand breaks that in turn lead to mutations and signaling dysregulation [20]. However, G-quadruplexes do not always have deleterious effects. They are often found in DNA promoter regions, suggesting they may play a role in regulating gene expression [3]. Thus, identifying G-quadruplexes is important not just for human health, but also for understanding how different organisms regulate protein production. We seek to identify how G4 formation varies among different organisms and how G4s are distributed within organisms.
Do some organisms have more G-quadruplexes in their genomes than others?
For organisms that have multiple chromosomes, like humans and yeast, are G-quadruplexes evenly distributed among chromosomes?
Does GC content predict G-quadruplex abundance?
Are G-quadruplexes mostly found within genes or outside genes?
Are there trends in which genes are most enriched for G-quadruplexes?
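As one possible starting point for the GC-content and per-chromosome questions above, the sketch below computes GC fraction and candidate density for a set of named sequences. It is a minimal illustration under stated assumptions: the motif is the commonly cited four-run pattern (not specified in the text), the function names are hypothetical, and the input sequences are toy strings rather than real chromosomes.

```python
import re

# Assumed motif (not taken from the text): four runs of >= 3 guanines,
# separated by three loops of 1-7 bases each.
G4_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")


def gc_fraction(sequence):
    """Fraction of called bases (A, C, G, T) that are G or C."""
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def g4_per_kb(sequence):
    """Non-overlapping candidate motifs per 1000 bases on the given strand."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    hits = sum(1 for _ in G4_PATTERN.finditer(seq))
    return 1000 * hits / len(seq)


def summarize(sequences):
    """Map each sequence name to (GC fraction, candidates per kb)."""
    return {name: (gc_fraction(s), g4_per_kb(s)) for name, s in sequences.items()}


# Hypothetical toy data standing in for chromosomes.
toy = {
    "chrA": "GGGAGGGAGGGAGGG" + "AT" * 50,
    "chrB": "ATGCATGC" * 20,
}
for name, (gc, density) in summarize(toy).items():
    print(f"{name}\tGC={gc:.2f}\tG4/kb={density:.2f}")
```

Normalizing counts by sequence length keeps chromosomes of different sizes comparable. Plotting GC fraction against density across chromosomes or organisms would then give a first look at whether the two are related.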