With Christine Heitsch. Preprint available here.
Recent research on RNA has repeatedly revealed the same phenomenon: after RNA folds into a three-dimensional structure, the ends of the strand are close to each other. We place these analyses into a combinatorial framework, and conclude that for several different models of branching structures the ends of the structures are almost certainly proximal. Ultimately, this implies that end proximity is a property of branching structures in general, as opposed to a special property of RNA. Using analytic combinatorics, our results are framed in terms of Dyck paths, Motzkin paths, and context-free grammars. We then give implications for pattern-avoiding permutations.
With Samuel Simon. Published in LIPIcs as part of the Proceedings for Analysis of Algorithms 2024. Journal. arXiv.
Lattice walks are used to model various physical phenomena. In particular, walks within Weyl chambers connect directly to representation theory via the Littelmann path model. We derive asymptotics for centrally weighted lattice walks within the Weyl chamber corresponding to A2 by using tools from analytic combinatorics in several variables (ACSV). We find universality classes depending on the weights of the walks, in line with prior results on the weighted Gouyou-Beauchamps model. Along the way, we identify a type of singularity within a multivariate rational generating function that is not yet covered by the theory of ACSV. We conjecture asymptotics for this type of singularity.
With Tristan Larson. Published in Séminaire Lotharingien de Combinatoire as part of the Proceedings for FPSAC 2024. Journal. arXiv.
We derive asymptotic formulae for the coefficients of bivariate generating functions with algebraic and logarithmic factors. Logarithms appear when encoding cycles of combinatorial objects, and also implicitly when objects can be broken into indecomposable parts. Asymptotics are quickly computable and can verify combinatorial properties of sequences and assist in randomly generating objects. While multiple approaches for algebraic asymptotics have recently emerged, we find that the contour manipulation approach can be extended to these D-finite generating functions.
With Jonathan Kariv and Noah Williams. Published in Mathematics of Computation, 2024. Journal. arXiv.
Consider coloring the integers {1, 2, ..., n} red and blue while trying to avoid 3 equally-spaced integers being colored the same. Van der Waerden's Theorem is a classic result in Ramsey Theory stating that for a fixed k and large enough n, you cannot avoid coloring k equally-spaced dots the same. But, how can you minimize the number of these monochromatic arithmetic progressions?
We prove a known 2-coloring of the integers [N]:={1,2,3,…,N} minimizes the number of monochromatic arithmetic 3-progressions under certain restrictions. A monochromatic arithmetic progression is a set of equally-spaced integers that are all the same color. Previous work by Parrilo, Robertson and Saracino conjectured an optimal coloring for large N that involves 12 colored blocks. Here, we prove that the conjecture is optimal among anti-symmetric colorings with 12 or fewer colored blocks. We leverage a connection to the coloring of the continuous interval [0,1] used by Parrilo, Robertson, and Saracino as well as by Butler, Costello and Graham, using BCG diagrams like the one on the left. Our proof identifies classes of colorings with permutations, then counts the permutations using mixed integer linear programming.
With Christine Heistch. Book Chapter, published in RNA Folding, Springer Protocals, ed. Ronny Lorenz, 2024. Springer. Direct download.
The structure of an RNA sequence encodes information about its biological function. Dynamic programming algorithms are often used to predict the conformation of an RNA molecule from its sequence alone, and adding experimental data as auxiliary information improves prediction accuracy. This auxiliary data is typically incorporated into the nearest neighbor thermodynamic model by converting the data into pseudoenergies. Here, we look at how much of the space of possible structures auxiliary data allows prediction methods to explore. We find that for a large class of RNA sequences, auxiliary data shifts the predictions significantly. Additionally, we find that predictions are highly sensitive to the parameters which define the auxiliary data pseudoenergies. In fact, the parameter space can typically be partitioned into regions where different structural predictions predominate.
With Stephen Melczer, Tiadora Ruza, and Mark C. Wilson. Published in Séminaire Lotharingien de Combinatoire as part of the Proceedings for FPSAC 2022. Journal. arXiv.
With the tools from Analytic Combinatorics in Several Variables, it is well-understood when multivariate arrays can be encoded in multivariate rational generating functions (GFs), and how to extract asymptotics from these generating functions. However, not much is known about how to extract asymptotics for more general GFs. Algebraic GFs are one step broader than rational GFs, and they encode arrays defined by recursive structures described by context-free languages, such as binary trees and constrained (random) walks; RNA secondary structures; applications of the kernel method, the quadratic method for maps, and generalizations.
We present a strategy for computing asymptotics of coefficients of d-variate algebraic generating functions. Using known constructions, we embed the coefficient array into an array represented by a rational generating functions in d+1 variables, and then apply ACSV theory to analyse the latter. This method allows us to give systematic results in the multivariate case, seems more promising than trying to derive analogs of the rational ACSV theory for algebraic GFs, and gives the prospect of further improvements as embedding methods are studied in more detail.
With Megan Bernstein and Michael Damron. Published in Stochastic Processes and their Applications, 2022. Journal. arXiv.
The Euclidean first-passage percolation model of Howard and Newman is a rotationally invariant percolation model built on a Poisson point process. It is known that the passage time between 0 and ne_1 obeys a diffusive upper bound: Var T(0, ne_1) <= Cn, and in this paper we improve this inequality to Cn/log n.
The methods follow the strategy used for sublinear variance proofs on the lattice, using the Falik-Samorodnitsky inequality and a Bernoulli encoding, but with substantial technical difficulties. To deal with the different setup of the Euclidean model, we represent the passage time as a function of Bernoulli sequences and uniform sequences, and develop several "greedy lattice animal" arguments.
With Christine E. Heitsch. Published in Bulletin of Mathematical Biology, 2020. Journal. arXiv.
A growing number of RNA sequences are now known to have distributions of multiple stable sequences. Recent algorithms use the list of nucleotides in a sequence and auxiliary experimental data to predict such distributions. Although the algorithms are largely successful in identifying a distribution's constituent structures, it remains challenging to recover their relative weightings. In this paper, we quantify this issue using a total variation distance. Then, we prove under a Nussinov-Jacobson model that a large proportion of RNA structure pairs cannot be jointly reconstructed with low total variation distance. Finally, we characterize the uncertainty in predicting conformational ratios by analyzing the amount of information in the auxiliary data.