This is a little column I have thought up, where I will share some highlights of what my mind has lately been occupied by. I am planning to post here every couple of weeks or so. The idea for this page was loosely inspired by John Baez's This Week's Finds.
The posts below are intended to be sketches of the ideas, and as such may be imprecise in places. Either way, I am happy to receive any comments or corrections regarding any of them.
I've decided to use LaTeX code on this page, even though, due to limitations of Google Sites, I cannot make it render automatically. To have it rendered you have to use a browser extension, e.g. TeX All the Things for Chrome.
I now write up the posts in LaTeX, which you can access by clicking the title of each section. The old posts have been ported to LaTeX too.
Subtitle: Stacky cohomology
Recent work on prismatization and related geometrizations has offered new perspectives on realizing cohomology theories in terms of various geometric objects (typically stacks) which in a suitable way capture this cohomology. Part of this story is quite classical, namely that of de Rham spaces and stacks, and last week I had the pleasure of presenting this topic at my friend's study group. I will provide the notes and recording of the talk on the relevant page later; for now, enjoy my more informal write-up of these ideas.
13/04/2025
Subtitle: Limits where there's no limits
Last year I took a break from my studies, and in that time decided to dedicate some time to learning bits and pieces of other areas, which resulted in most of the TWIL posts below. The topic I spent the most time on, however, is trying to understand some of the details of Wiles's proof of Fermat's Last Theorem.
By happenstance, learning those things has turned out to be useful for the work I've been doing as part of my PhD in two places - first to give a talk at a study group a friend of mine has been running on automorphic lifting theorems, and now for the research project I am undertaking, where the Taylor-Wiles method played an instrumental role.
This proof is a wonderful piece of mathematics. I have been wanting to write something on the topic for a while - I have half-finished notes which go into a fair amount of technical detail, except for the part about modularity lifting (which is the most technical). Instead, I've decided to write that part up as a post here, focusing on the ideas while highlighting some of the important technical bits.
Since the TeX plugin I mention above is no longer functional, I have decided to write the post up in LaTeX and put it here. Over the coming weeks I may turn the other TWIL posts into PDFs too.
28/03/2025
Subtitle: Quantum certainty
[This is based on a post I wrote back in December]
Towards the end of the last post I mentioned a class of problems related to probabilistically checkable proofs, known as interactive protocols - one considers an exchange between a verifier, usually restricted to be a (probabilistic) polynomial-time machine, and one or more provers, which are computationally unlimited and are trying to convince the verifier of an answer to some question. The exchange proceeds by the verifier generating and sending some messages to the provers, who then respond with their answers. The verifier then decides to either accept or reject the input.
We think of the provers as trying to convince the verifier that the input is a valid instance of some decision problem. If this is indeed the case, then we require the verifier to always accept the input - the provers can be "honest" and provide true responses, which the verifier will be able to confirm. On the other hand, if the instance is not valid, then we have to make sure that the verifier can catch the provers if they try to "cheat" and provide invalid responses, at least with high probability.
We define the complexity class IP to consist of those decision problems which admit an interactive proof protocol with one prover, and the class MIP to consist of problems which admit such a protocol with multiple provers. We will refrain from defining these more formally, instead settling on some examples.
It should be fairly clear that the class NP is contained in IP - the prover can just provide the required witness which the verifier can check without any randomness or further exchanges. It is also not too difficult to see that if we do not permit randomness, then IP and MIP would in fact coincide with NP (we can simply supply the provers' responses ahead of time, since the queries will never change).
Let us now look at an example of a problem which is not known (and believed not) to be in NP: the graph non-isomorphism problem. Given two graphs $G_1,G_2$, we wish to be convinced that they are not isomorphic. This can be achieved via an interactive protocol as follows: first, the verifier secretly chooses a random $i\in\{1,2\}$ and randomly permutes the vertices of $G_i$. They then send the resulting graph $G$ to the prover, who is to respond with which $i$ the verifier has chosen. If the graphs are not isomorphic, then the prover will always be able to unambiguously determine $i$ from $G$, as $G$ can't be isomorphic to both $G_1$ and $G_2$. But if they are isomorphic, then the prover cannot determine $i$ from $G$, and so cannot convince the verifier with better than a 50/50 chance. By repeating this scheme multiple times, a dishonest prover will have negligible chance of getting lucky with their answers every time.
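To make this concrete, here is a toy simulation of a single round of this protocol - the two example graphs, the encoding of graphs as edge sets, and the brute-force prover are of course just illustrative:

```python
import random
from itertools import permutations

# Toy one-round simulation of the graph non-isomorphism protocol.
# Graphs on vertices 0..2, encoded as frozensets of sorted edge tuples.
G1 = frozenset({(0, 1), (1, 2)})          # a path
G2 = frozenset({(0, 1), (1, 2), (2, 0)})  # a triangle, not isomorphic to G1

def permute(graph, perm):
    # Relabel the vertices of a graph according to the permutation perm.
    return frozenset(tuple(sorted((perm[u], perm[v]))) for u, v in graph)

def prover(H):
    # A computationally unbounded prover: brute-force which graph H came from.
    if any(permute(G1, perm) == H for perm in permutations(range(3))):
        return 1
    return 2

def verifier_round():
    i = random.choice([1, 2])                 # verifier's secret coin
    perm = tuple(random.sample(range(3), 3))  # random relabeling of vertices
    G = permute(G1 if i == 1 else G2, perm)
    return prover(G) == i                     # accept iff the prover recovers i

# Since G1 and G2 are not isomorphic, the honest prover succeeds every time;
# were they isomorphic, no prover could do better than guessing.
assert all(verifier_round() for _ in range(200))
```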
It turns out that the class IP is quite a bit larger than NP, and in fact coincides with the complexity class PSPACE. The proof is quite elegant, involving arithmetization of Boolean functions by polynomials of high degree, and then using the interactive protocols to evaluate them. The details can be found on my old blog.
Let us now consider the class MIP. It clearly contains IP, but having multiple provers who cannot communicate with each other gives us more ways of checking whether they may be cheating - we can cross-check their answers against each other. For a simple example, suppose that we are trying to check that some graph $G$ is 3-colorable. At each step of the protocol, we randomly choose two vertices $v,w$ of $G$ which are either equal or connected by an edge, and then send $v$ to one of the provers and $w$ to the other. If $v,w$ are equal, we expect the provers to return the same color as an answer; otherwise we expect different colors. Since the provers don't know which case they are in, the first check forces them to consistently describe the same coloring, and the second lets us verify it is a valid coloring. This doesn't quite show the problem in question is in MIP - for invalid instances, the provers' probability of success is smaller than $1$, but still very close to it, and we cannot easily make it negligible. However, it hopefully still gets across the idea behind the utility of multiple provers.
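As a small illustration, here is one round of this cross-checking scheme, simulated with honest provers who answer according to a shared valid coloring (the graph and the coloring are again just toy choices):

```python
import random

# Toy one-round simulation of the two-prover 3-coloring check.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
coloring = {0: 0, 1: 1, 2: 2, 3: 0}        # a valid 3-coloring of the graph above

def prover(vertex):
    # Honest provers share a strategy: report the color of the queried vertex.
    return coloring[vertex]

def verifier_round():
    if random.random() < 0.5:
        v = w = random.choice(range(4))    # consistency check: equal vertices
        return prover(v) == prover(w)      # expect the same color
    else:
        v, w = random.choice(edges)        # validity check: an edge
        return prover(v) != prover(w)      # expect different colors

# Honest provers describing a valid coloring pass every round; cheating ones
# get caught with some small, but positive, probability per round.
assert all(verifier_round() for _ in range(1000))
```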
We can show that MIP actually equals the class NEXP of problems which can be solved in nondeterministic exponential time. This is notable since, as mentioned above, if we remove randomness, we simply get the class NP, which we know is strictly smaller than NEXP - so this gives us a rare instance of a complexity class which randomness provably enlarges.
It turns out that something very interesting happens if we assume that the provers in MIP are not completely independent. Allowing arbitrary communication between them makes them effectively act as one, returning us to the strength of IP. But if we allow them to share information only in some specific ways, then this can actually increase the power of possible protocols.
The specific class of interest is MIP*, where we have multiple provers which cannot communicate directly, but share some number of entangled qubits which they can measure, and use them to produce answers which are correlated with each other.
Now, at first it may seem counterintuitive that this could ever increase the strength - any shared information introduces more potential for cheating provers. However, the trick is that we can design more general protocols in which honest provers need to use this correlation to produce convincing responses. There are some fairly well-known instances of such setups in other contexts, like the magic square game, which cannot be consistently won by uncorrelated players, but in which quantum correlations can be used to devise winning strategies.
All that said, when determining the power of MIP*, we have two opposing factors: more general protocols, but ones in which dishonest provers have more opportunity to cheat. It turns out that with quantum entanglement the former wins out, and incredibly so - MIP* turns out to be as large as it could possibly be, giving the titular result that MIP* equals RE, the class of all recursively enumerable problems - those verifiable by Turing machines without any bound on their running time.
The crucial idea in the proof is compression, which lets us take an interactive protocol and then produce a new one in which one verifies exponentially large instances of the original protocol. A little more precisely, we shall consider an interactive protocol $P_n$, which decides some problem depending on some parameter $n\in\mathbb N$, e.g. "does a given Turing machine halt in $n$ steps?". Then (under appropriate assumptions) one can produce a new interactive protocol $Q_n$ in which the provers convince the verifier that the protocol $P_N$ succeeds, where $N = 2^n$.
The general idea for $Q_n$ is to exploit the shared quantum states between the provers to force them to perform the computation of $P_N$, and then certify the result of that computation to the verifier. Recall that in an interactive protocol, the interaction consists of the verifier randomly generating some questions for the provers, provers responding with their answers, and then the verifier concluding based on their responses. The questions and answers are themselves messages of length polynomial in the parameter $N$, so to perform the compression we need to exponentially reduce both the question size and the answer size.
For question reduction, instead of sending the provers exponentially long questions for $P_N$, in the protocol $Q_n$ we force the provers to themselves generate a suitable pair of questions, using the shared quantum bits. This is where entanglement plays the key role - the two questions that the provers generate will depend on each other, and so using the correlation of the qubits we can make sure they produce questions which follow the same distribution as the ones given by the verifier in $P_N$. On the other hand, quantum uncertainty can be exploited to make sure the provers cannot deduce each other's questions (to prevent them from conspiring). The quantum correlations turn out to be precisely enough to balance these two requirements.
Of course, since the provers are not trusted, we need to somehow make sure that they have generated their questions in the required manner. This is done through a Pauli basis test - we require the provers to return some data based on the qubits they hold, and we check for correlations between their responses. These checks are not unlike the original Bell tests, and the required correlations can be shown to be possible iff the qubits were measured in suitable bases, which guarantees both that the provers have generated the right questions, and that they have collapsed any information which could have allowed them to find out each other's question.
Now the provers have their exponentially long questions, and can figure out their exponentially long answers. But they can't send them back to the verifier, so we need to perform the answer reduction - they need to convince the verifier that they have their answers, and that a verifier for $P_N$ would accept given these answers. This is more or less precisely what PCP theorems discussed in the previous post do - they provide ways in which one can certify an exponentially long computation, and that certificate can be probabilistically verified in polynomial time.
There is just one interesting wrinkle in this - since the verifier in $P_N$ needs both answers to perform their computation, and neither prover has access to them both, they can't actually generate the certificate. To get around this, one uses a process called oracularization - we send one of the provers both questions and ask them to simulate both provers of $P_N$ and produce both answers, which gives them the data needed to produce the proof certificate for us to verify. In order to guarantee that this first prover is not cheating, we randomly select one of the questions and send it to the second prover, and see if their response matches the first prover's corresponding one. Since the second prover cannot cheat knowing just one question, this lets us verify the provers' honesty. (The name "oracularization" refers to the fact that this lets us treat one of the provers as a fixed oracle, with responses independent of the other prover.) This process works basically as written for classical interactive protocols, and in suitable settings can also be used for quantum provers.
Combining all of these ingredients gives a compression algorithm, at least under a number of technical assumptions. For instance, we need to restrict what kind of probability distribution $P_N$ uses for questions (so that the verifier in $Q_n$ can force provers to follow it), and the oracularization step requires there to be an optimal strategy for $P_N$ which doesn't destroy too much information (so the second prover can actually generate the same response as the first prover). Luckily, one can pick those properties so that they are preserved by compression, which means this process can then be iterated.
In 2012, an MIP* protocol for NEXP was found (by modifying the MIP protocol so that provers cannot break it using entanglement). In 2019, this compression was first implemented to prove that MIP* contains NEEXP; however, these constructions were not general enough to permit iteration. Finally, the new results from 2020 let one show that, for any $k$, problems solvable in $k$-times iterated exponential time are in MIP*. This is still a far cry from all of RE, but it turns out that compression is enough for this general result, using recursion.
Specifically, the construction taking protocols $P_n$ to $Q_n$ is computable, so by Kleene's recursion theorem, given a Turing machine $M$, we can construct a protocol $P_n$ which works as follows: first, simulate $M$ for $n$ steps. If $M$ halts, then accept, and otherwise, run the compressed protocol $Q_n$ associated to $P_n$. (More precisely, what we get is a protocol $P_n$ which is equivalent, in some sense, to this simulated one.)
We can now inductively show that if $M$ halts, then all of the $P_n$ have successful provers. On the other hand, if $M$ doesn't halt, then for all $n$, $P_n$ has them iff $Q_n$ does. However, analyzing the compression algorithm shows that $Q_n$ requires exponentially more entanglement than $P_n$, so in this case we can deduce that it is impossible for provers using a finite amount of entanglement to produce responses which convince the verifier. That is, we have produced an MIP* protocol for checking whether a given Turing machine halts.
This gives an MIP* protocol for the halting problem, and hence for any problem in RE. $\square$
This result has turned out to have some quite profound consequences even outside of quantum information theory, most notably the disproof of Connes' embedding conjecture in the theory of von Neumann algebras. The general idea is that the conjecture, if true, would give one a way to compute the maximum probability of provers being able to convince a verifier in an MIP* protocol, which would imply all MIP* problems are decidable - contradicting MIP* = RE.
28/01/2025
Subtitle: Probably convincing evidence
[This is based on a couple posts I wrote back in November]
Complexity theory provides a theoretical framework for studying the idea of feasibility in solving problems. The basic idea is that we consider problems which can be solved by some computational agent with limited resources, typically with running time restricted to a polynomial in the size of the input.
One of the most important classes of decision problems is NP, which can be defined as follows: given an instance of a problem (e.g. some graph which we want to verify is 3-colorable), if the instance is valid, then it is possible to give some sort of evidence (e.g. the coloring itself) which can be verified and confirmed in polynomial time, while if the instance is not valid, then no such convincing evidence can be provided.
In various contexts, it may be of interest to modify exactly how such "evidence" is provided (giving what a priori may be a completely different complexity class). For instance, we may want to permit evidence which is exponentially long, and hence cannot be fully read in polynomial time, and instead demand that analyzing a suitable part of it can convince us of the answer. For this to actually gain us anything, we allow the algorithm to be randomized, so that different runs analyze different pieces of the evidence, and we demand that it leads to the correct conclusion with high probability.
These are called probabilistically checkable proofs, or PCPs. It turns out that we can impose quite severe restrictions on the algorithms, and still have those exist. Specifically, we have what is sometimes called the "exponential PCP theorem", which asserts that any NP problem admits PCPs from which only boundedly many bits are ever queried. That is, even though the proof is exponentially long, then by suitably picking very few of its entries to examine, we can ascertain its validity.
Let us make the statement of this theorem more precise: for any NP-complete problem there is a probabilistic polynomial time algorithm $V$ (called a verifier) which takes in two inputs $A,S$ (the instance of the problem and the evidence) with the following properties:
The number of entries of $S$ which $V$ will read is bounded independently of the inputs (but which elements we read can depend on the results of randomness).
If $A$ is a valid instance of the problem, then there is some bitstring $S$ such that $V$ will accept on input $A,S$ with probability $1$.
If $A$ is not a valid instance, then no matter what $S$ is, $V$ will reject the input with probability at least $\frac{1}{2}$ (we could replace this value with any positive constant smaller than $1$).
Note that the first point is in stark contrast to how NP is defined, where we allow the entire string to be read. For probabilistically checkable proofs, we need to find some ways to introduce lots of redundancy such that while checking only a small fragment of this proof we can convince ourselves that it is valid.
To see how something like this is possible by modifying how we present our data, let us consider a toy example of trying to check that some two given strings $P,Q$ are equal. If we are given them as inputs directly, then it is impossible to do with high probability without checking all the bits - if $P,Q$ happen to only differ at one value, and ahead of time we don't know which, then unless we check all the bits, we can't do better than a random guess with any reasonable probability.
So instead of being given the strings $P,Q$ directly, we can let our evidence string $S$ consist of their Walsh-Hadamard codes - if $P$ has length $n$, we let $\mathrm{WH}(P)$ be the string of length $2^n$ which consists of all possible dot products of $P$ with strings of length $n$ (over $\mathbb F_2$), and similarly for $\mathrm{WH}(Q)$. Now, the point is that if $P$ and $Q$ are at all different, even at just one entry, then $\mathrm{WH}(P)$ and $\mathrm{WH}(Q)$ will differ in exactly half of their entries. So by querying a random entry, we will be able to tell them apart with probability $\frac{1}{2}$, and by repeating this a number of times, we can get the probability arbitrarily high.
Of course, in a PCP context, we cannot just trust that $S$ will consist of valid WH codes of whatever strings are of interest. However, we can check whether it encodes some valid WH code - interpreting strings of length $2^n$ as functions $\mathbb F_2^n \to \mathbb F_2$, WH codes are precisely the linear functionals, and given such a function $f$, by querying random pairs $x,y$ and checking whether $f(x+y)=f(x)+f(y)$, we can with high probability check that $f$ is linear, or at least agrees with a linear function on most of its inputs. Thus in our PCP algorithms, we can more robustly encode strings using their WH codes. It then just remains to further verify that the encoded strings provide evidence for our NP problem. With this said, we can get into more specifics.
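But first, here is a minimal Python sketch of the two checks just described - the WH code is represented lazily as a function answering single queries, and the parameters are arbitrary:

```python
import random

n = 8                                  # length of the encoded strings

def dot(a, b):
    # Dot product over F_2.
    return sum(s * t for s, t in zip(a, b)) % 2

def WH(s):
    # The WH code of s has 2^n entries, one per possible query x;
    # we represent it as a query function rather than a full table.
    return lambda x: dot(s, x)

def rand_string(k=n):
    return tuple(random.randrange(2) for _ in range(k))

def differ(f, g, trials=40):
    # Equality test: WH codes of distinct strings differ on half the entries,
    # so a few random queries tell them apart with high probability.
    return any(f(x) != g(x) for x in (rand_string() for _ in range(trials)))

def looks_linear(f, trials=40):
    # The linearity test: f(x+y) = f(x) + f(y) for random pairs x, y.
    for _ in range(trials):
        x, y = rand_string(), rand_string()
        s = tuple((a + b) % 2 for a, b in zip(x, y))
        if f(s) != (f(x) + f(y)) % 2:
            return False
    return True

P, Q = rand_string(), rand_string()
assert looks_linear(WH(P))             # a genuine WH code is exactly linear
if P != Q:
    assert differ(WH(P), WH(Q))        # distinct strings are detected (whp)
```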
It is enough to handle one specific NP-complete problem, and our choice is the problem of determining whether a system of quadratic equations over $\mathbb F_2$ is solvable (one can easily reduce 3-SAT to it). Given an instance $A$ of this problem, that is, a system of quadratic equations in some number $n$ of variables, the evidence will, unsurprisingly, contain a WH code of a tuple $u=(u_1,\dots,u_n)$ which is claimed to solve the system. In order to check its validity, we will also include the WH code of $u\otimes u$, which is the length-$n^2$ string whose entries are the products $u_iu_j$ for $1\leq i,j\leq n$.
We have already explained how we can, with a few queries, verify that these strings are valid WH codes of some strings $u,v$. We can verify that $v=u\otimes u$ using the WH codes as well - given two random length-$n$ strings $x,y$, the dot product $(u\otimes u)\cdot(x\otimes y)$ is the product of the dot products $u\cdot x$ and $u\cdot y$, so using three queries we can check whether $\mathrm{WH}(v)_{x\otimes y}=\mathrm{WH}(u)_x\cdot\mathrm{WH}(u)_y$. As before, this lets us check $v=u\otimes u$ with high probability.
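In the same toy setup, the three-query tensor test looks as follows (the helpers from the previous sketch are repeated so the snippet runs on its own):

```python
import random

n = 8

def dot(a, b): return sum(s * t for s, t in zip(a, b)) % 2
def WH(s): return lambda x: dot(s, x)
def rand_string(k=n): return tuple(random.randrange(2) for _ in range(k))

def tensor(x, y):
    # x (tensor) y as a length n^2 string with entries x_i * y_j.
    return tuple(a * b for a in x for b in y)

def tensor_test(f, g, trials=40):
    # Three queries per round: f(x), f(y) and g(x (tensor) y); for f = WH(u) and
    # g = WH(u (tensor) u), the identity (u.x)(u.y) = (u(tensor)u).(x(tensor)y)
    # makes every round pass.
    for _ in range(trials):
        x, y = rand_string(), rand_string()
        if g(tensor(x, y)) != f(x) * f(y) % 2:
            return False
    return True

u = rand_string()
assert tensor_test(WH(u), WH(tensor(u, u)))   # an honest encoding always passes
```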
Lastly, it remains to check whether $u$ is indeed a valid solution to $A$. Checking any one equation can be done with a single query to $\mathrm{WH}(u\otimes u)$. This alone doesn't suffice if we want to keep the number of queries bounded, but here we can bring back the idea which led us to consider WH codes - instead of just looking at one equation, we will look at a random linear combination of all the equations. If $u$ is a valid solution, then it will also solve this combined equation, while if not, then we will notice that with probability $\frac{1}{2}$. Combining all of these steps, we get a probabilistic algorithm which, with high confidence, tells us whether $S$ encodes a valid solution to the system $A$. That is, solvability of systems of quadratic equations admits PCPs, and hence so does any NP problem. $\square$
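As a last snippet, here is this final test as code - a random linear combination of the equations, checked with a single query to $\mathrm{WH}(u\otimes u)$. The system below is generated so as to be solved by $u$, purely for illustration:

```python
import random

n = 8

def dot(a, b): return sum(s * t for s, t in zip(a, b)) % 2

u = tuple(random.randrange(2) for _ in range(n))     # claimed solution
uu = tuple(a * b for a in u for b in u)              # the string u (tensor) u

# Equations sum_{i,j} c_ij x_i x_j = r, stored as pairs (c, r) with c of
# length n^2; right-hand sides are chosen consistently with u.
system = []
for _ in range(6):
    c = tuple(random.randrange(2) for _ in range(n * n))
    system.append((c, dot(c, uu)))

def test_round():
    picks = [random.randrange(2) for _ in system]    # random subset of equations
    comb_c = tuple(sum(s * c[k] for s, (c, _) in zip(picks, system)) % 2
                   for k in range(n * n))
    comb_r = sum(s * r for s, (_, r) in zip(picks, system)) % 2
    return dot(uu, comb_c) == comb_r   # one query to (the table of) WH(u ⊗ u)

assert all(test_round() for _ in range(100))         # a true solution always passes
```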
Towards the start I called the result we have just proven the "exponential PCP theorem". This is because it is an easy version of what is now a well-known result called the PCP theorem, which asserts that any NP problem admits PCPs as above which work with one additional restriction - the number of random bits they use is logarithmic in the size of the input. This way, we may safely assume $S$ is polynomially sized (as all computation paths together will only ever query polynomially many bits). Furthermore, this restriction also lets us deterministically check in polynomial time whether $S$ is a valid proof (by enumerating all random choices), so we get that the problems admitting PCPs with this property are precisely the problems in NP.
The proof of the full PCP theorem is much longer, and while it reuses some of the ideas above, it also involves a lot of new and interesting, though very technical, ones. I don't think I want to summarize them too thoroughly, so I will only give the vaguest of sketches.
The existence of PCPs is related to an approximation problem known as the gap constraint satisfaction problem ($r$-GAP CSP, where $r<1$ is some fixed value), in which we consider strings satisfying some constraints, each of which depends on a bounded number of characters in the string, and we are trying to tell apart two possibilities: either there is a string which satisfies all the constraints, or for every string, the fraction of constraints it satisfies is bounded above by $r$. It turns out that the PCP theorem is equivalent to the statement that, for some constant $r<1$, $r$-GAP CSP is NP-hard.
Note that if $r$ were allowed to vary with the size of the input, this would be trivial - we can encode a 3-SAT instance as a CSP, and if the number of clauses is $k$, then any assignment of values either satisfies all the clauses, or at most $\frac{k-1}{k}=1-\frac{1}{k}$ of them. Therefore we "just" need a way to increase this gap of size $\frac{1}{k}$ into one of constant size.
This process is called gap amplification, and there is a general method through which we can turn a CSP with a gap $1-c$ (for small enough $c$) into one with a gap $1-2c$, with only a linear increase in the instance size. Starting with $c=\frac{1}{k}$, we can then bring the gap to a constant size in logarithmically many steps.
The gap amplification step is by far the most technical, and proceeds by applying a powering operation which keeps the variables the same but introduces new constraints depending on paths in a certain graph representing the original constraints. The idea is that if some constraint were not satisfied in the original CSP, then there will be lots of paths witnessing that. The formal proof depends on making the graph in question quite regular, namely an expander, and that alone should indicate the intricacy of formally proving the relevant properties. With gap amplification in place, though, the rest of the proof comes together without much issue.
With the PCP theorem proven, we can take this in a number of other directions, the most basic being figuring out how many queried bits are necessary (I believe it is known to be 3, but I'm not certain). One similar result asserts that if we drop the restriction on the number of random bits, then we can in fact provide PCPs for all NEXPTIME problems (the analogue of NP with exponential time). It's easy to see the converse inclusion even if we do not restrict the number of queries, so we get a characterization of NEXPTIME this way. One can also consider an "adaptive" version known as interactive protocols, in which rather than consulting a fixed string, the verifier can ask questions of some computationally unbounded provers and try to be convinced by their answers. If we use multiple independent provers, then one can deduce from the above result that the resulting class also equals NEXPTIME (we can query the provers about the contents of a potential string $S$, and we need multiple of them to be sure no cheating occurs), while with one prover the class equals PSPACE - a result I once wrote about on my now-defunct blog.
Except for the last paragraph, I have learned these results, and a lot of other interesting complexity theory, primarily from Arora and Barak's book, where all the details are spelled out. There were several other posts on complexity theory I was planning to make, but I am not sure I will get around to them. One, however, which uses PCP theorems to ultimately give a very intriguing result, should be coming up in the next few days... things will get quantum.
26/01/2025
Subtitle: Choiceless pathologies
[This post is based on a couple of write ups I started back in November]
The axiom of choice is something most mathematicians will have heard of, but its importance and prevalence throughout all of mathematics is often lost. There are many statements that are taken for granted which use choice in ways that are not entirely clear. For instance, let us consider the following statements, familiar from abstract algebra:
Fact 1: Every field is contained in an algebraically closed field.
Fact 2: Any two algebraic closures of a field are isomorphic.
Fact 3: Every principal ideal domain is either a field or contains irreducible elements.
It turns out that all three of these statements need the axiom of choice in their proofs, in the sense that they are not provable in Zermelo-Fraenkel set theory (ZF). This means that, as far as ZF is concerned, it is possible to have a field with no algebraic closure, one with multiple distinct closures (in fact, $\mathbb Q$ can be such a field), or a ring which is a nontrivial PID but has no irreducible elements - in particular, it cannot be a unique factorization domain!
The way such unprovability results are established is by constructing models of ZF in which these statements are false. However, unlike for algebraic structures like groups, constructing such models outright is difficult (and in some sense, impossible). Instead, what we have to do is begin with some existing model and modify it to gain the properties we wish it to have. These methods are very technical and I shall not describe them in much detail, but they consist of two steps: forcing, which enlarges the model by adding new "generic" elements (in some ways not unlike adjoining indeterminates to a ring), and then a restriction to a submodel. I don't know a good basic reference which would cover what is needed to follow this post (all the necessary details are covered in Jech's Set Theory, but it's not a very friendly introduction), but I will try to give the gist of the ideas involved, sweeping all the technical details under the rug. Full proofs are given in two papers by Hodges (Sections 3-4 in this one, and the entirety of this one).
Let us now describe the framework of the constructions. We start with some model $M$ of ZFC, and some countable algebraic structure $A$ inside $M$. The idea is to take $A$ with a lot of automorphisms; our goal is to then use them to construct a larger model $M[G]$ containing a structure $A'$ isomorphic to $A$, and then find a smaller model $N(A)$ which contains $A'$ but in which its automorphisms are no longer present.
As mentioned, $M[G]$ is constructed using a method of forcing. For each element $x$ of $A$, one adjoins a new set $g_x$. The collection $G$ of all these $g_x$ is "generic" which, among other things, means that they are "indistinguishable", more formally meaning that any permutation of the elements $g_x$ comes from an automorphism of the entire model $M[G]$. We can now let $A' = \{g_x | x \in A\}$ and equip it with an algebraic structure such that the map $A\to A',x\mapsto g_x$ is an isomorphism. By genericity, any automorphism of $A'$ then arises from some automorphism of $M[G]$.
Now we want to restrict our model. Roughly, we take $N(A)$ to be the subset of $M[G]$ which consists of those sets which are definable using $A'$ and some finite set of its elements as parameters. We call this finite set the support of a set in $N(A)$. In particular, we will have $A'$ in $N(A)$ but, typically, the isomorphism $A\to A'$ will not be there. In fact, they can be structurally quite different - even if the $A$ we started with had many automorphisms, $A'$ will usually have only a few. Specifically, $A'$ has the following property, which Hodges calls $N(A)$-symmetry: given a subset $R\subseteq A'^n$ which is an element of $N(A)$, there is some finite subset $T\subseteq A'$ such that any automorphism of $A'$ which fixes all elements of $T$ will preserve the subset $R$. In essence, $T$ is just the support of $R$ in the sense above - any automorphism of $A'$ fixing $T$ will extend to an automorphism of $M[G]$ fixing all the parameters in the definition of $R$, which one can argue implies $R$ will be fixed.
To summarize, given a countable structure $A$ in the model $M$, we have constructed a model $M[G]$ with a structure $A'$ isomorphic to $A$ and which is $N(A)$-symmetric for some submodel $N(A)$. Knowing this is sufficient to reach the desired conclusions. Let us begin by explaining how this can be used to find a model in which Fact 2 fails.
We take $A=\overline{\mathbb Q}$, the standard algebraic closure of $\mathbb Q$ (which can be constructed, for instance, as the set of algebraic numbers in $\mathbb C$). In $M[G]$, $A'$ is isomorphic to $A$, hence is also an algebraic closure of $\mathbb Q$, and the same remains true in $N(A)$ (many properties, like being algebraically closed, are absolute, meaning they do not change when we pass to a smaller or larger model). We just need to show that in $N(A)$, $A$ and $A'$ are not isomorphic. To see that, we just need to note that $A$ itself is not $N(A)$-symmetric - because it is countable, we can still construct a lot of its automorphisms, and in particular, if we consider the subfield $K=A\cap\mathbb R$, then for any finite subset we will be able to find an automorphism of $A$ fixing it but not preserving $K$. In fact, it turns out that $A'$ is rigid in $N(A)$ ($N(A)$-symmetry implies that if it had a nontrivial automorphism, then $A'$ would be an abelian extension of a normal extension of a number field, which one can show is impossible). Thus in $N(A)$, $\mathbb Q$ has two non-isomorphic algebraic closures.
Now let us show a failure of Fact 1, by giving a field with no algebraic closure at all: we begin with $A = \mathbb Q(X_1,X_2,...)$, the field of rational functions in infinitely many indeterminates. Then in $N(A)$, there is no field $F$ containing $A'$ in which each $g_{X_i}$ is a square. Indeed, any such field has a support which involves only finitely many of the $g_{X_i}$, so we can take some $g_{X_a}$ which isn't among them. If $y$ were some square root of $g_{X_a}$ in $F$, then there again is some other $g_{X_b}$ which doesn't appear in its support. Now consider an automorphism of $A$ which maps $X_a$ to $X_b^2 X_a$ and fixes all other $X_i$. The image $z$ of $y$ under the induced automorphism of $M[G]$ is either $g_{X_b} y$ or $-g_{X_b} y$. But now we can consider an automorphism of $M[G]$ induced by mapping $X_b$ to $-X_b$, which should preserve $y$ and $z$ (since it fixes their supports - the support of $z$ is given by replacing $X_a$ by $X_b^2X_a$, which is also fixed), but it takes $g_{X_b}$ to $-g_{X_b}$, which is a contradiction. Thus we get that $A'$ is not contained in any algebraically closed field.
Finally, we are left with failure of Fact 3, which is arguably the most interesting one - both for the statement itself, but also how it is constructed, which involves some very nontrivial algebraic number theory.
Let us begin with some number field $K_0$ with its ring of integers $R_0=O_{K_0}$. Let $K_1$ be the Hilbert class field of $K_0$ - the maximal unramified abelian extension of $K_0$. Any principal prime ideal in $R_0$ splits completely in $K_1$, which means in particular that if $K_1\neq K_0$, no element of $R_0$ is prime in $R_1$. Further, Hilbert's principal ideal theorem asserts that any ideal of $R_0$ becomes principal in $R_1$ (this does not mean that $R_1$ is a PID - it will have other ideals which need not be principal). Therefore in some sense, passing from $K_0$ to $K_1$ brings us closer to our goal.
The key is then to iterate this construction - recursively define $K_{n+1}$ to be the Hilbert class field of $K_n$. By the Golod-Shafarevich theorem, there exist number fields $K_0$, for instance $\mathbb Q(\sqrt{-30030})$, for which this sequence never stabilizes. Letting $R_n=O_{K_n}$ and $R=\bigcup_n R_n$, we then get a ring which has no irreducible elements, and in which any finitely generated ideal is principal (as its generators lie in some $R_n$; letting $I$ be the ideal in $R_n$ they generate, we then have $IR_{n+1}$, and hence $IR$, principal). $R$ will also have some infinitely generated ideals; however, by now applying the constructions discussed above with $A=R$, it turns out that we get a ring $A'$ which by absoluteness has the same properties just mentioned, but which $N(A)$-symmetry implies is in fact Noetherian, hence a principal ideal domain. We really just need to explain this last claim.
The statement follows from the following purely algebraic one: if $I$ is an ideal of $R$ which has a support (which, recall, means a finite subset of $R$ such that any automorphism fixing it fixes $I$ as a set), say contained in $R_i$, then $I$ is generated by $I \cap R_{i+1}$. Since $R_{i+1}$ is Noetherian, we indeed get that $I$ is finitely generated.
In fact, we will prove something stronger: for any $x \in I$, say $x \in R_j$ with $j>i$, there is some $y \in I \cap R_{i+1}$ such that $x\in yR$. This we prove by contradiction - assuming otherwise, we will recursively find sequences $x_n \in R_j$, $a_n \in R_{i+1}$ such that $x_0=x$, $a_{n+1} \mid x_n$ in $R_j$, and $x_{n+1} = x_n/a_{n+1}$. If such sequences were to exist, then the increasing chain of ideals $(x_n)$ would contradict Noetherianness of $R_j$.
Let $I_n$ be the colon ideal $I : y=\{z\in R\mid yz\in I\}$ for $y=x/x_n=a_1\dots a_n$. If $I_n$ were all of $R$, this would mean $y \in I \cap R_{i+1}$, and then $x = x_n y \in yR$, which we assumed doesn't happen. So $I_n$ is a proper ideal, and $I_n \cap R_j$ is contained in some prime ideal $P$ of $R_j$. Let $Q = P \cap R_i$. It is standard that any other prime $P'$ of $R_j$ containing $Q$ is related to $P$ by some automorphism of $R$ over $R_i$. Now, since $I_n$ has support in $R_i$ (because $I$ does, and one can check by induction the same holds for $I_n$), it is preserved by these automorphisms, so $I_n \cap R_j$ is also contained in $P'$. Now, because $K_j/K_i$ is unramified, the intersection of all these primes is precisely $R_j Q$, which by the principal ideal theorem is generated by some $a_{n+1}$ in $R_{i+1}$. Therefore $a_{n+1}$ divides $x_n$, so we can define $x_{n+1} = x_n/a_{n+1}$ and the construction is complete.
What I find most interesting about this construction is quite how much of the nontrivial theory of Hilbert class fields it requires - it's not enough to have some random infinite tower in which all ideals become principal and no element stays prime, but we have truly used the fact that the tower is unramified in order to show that the ideals with support are finitely generated. I would be curious to what extent this can be weakened - perhaps it is enough that no ideal in the tower stays totally ramified (which would otherwise produce an easily definable infinitely generated ideal, like the one generated by all $2^{1/n}$ in $\mathbb Q[2^{1/n}\mid n\in\mathbb N]$).
12/01/2025
Subtitle: Variational order
[This post is based on a short post I wrote back in August. In case anyone was curious, yes I was reading Cixin's The Three-Body Problem at the time. This is the first time I have seen a scientific article referenced in a footnote of a fiction book. Merry Christmas everyone!]
The two-body problem in Newtonian gravity is perfectly well understood. If we "normalize" a configuration so that the center of mass is fixed, the orbits are given precisely by conics, and in particular you get a large variety of elliptical periodic orbits.
The three-body problem, on the other hand, is a famous example of a chaotic dynamical system, where small perturbations of orbits can cause large deviations with time, even in the simplest case of equally massive bodies. There are obvious examples of stable orbits, like three bodies moving in a circle about a fixed point, but existence of "nontrivial" such orbits has been pretty mysterious, especially when it comes to rigorously establishing their existence.
One of the most notable proofs of this sort was given by Chenciner and Montgomery, who construct an elegant orbit in which the three bodies all trace out the same figure-eight-shaped curve in a plane. Like many arguments of this sort, the proof uses variational methods. It is rather technical but uses some quite clever ideas, which is a good excuse to write a post exposing some of them. I highly recommend checking out the original papers, which contain some useful figures that I do not reproduce here.
First, thinking about what such a figure-eight orbit ought to look like, we distinguish two types of configurations: Euler configurations, in which one of the bodies passes through the center of mass, making the positions centrally symmetric, and isosceles configurations, which are symmetric with respect to one of the symmetry axes of the figure eight. Furthermore, the entire orbit is going to be quite symmetric in both space and time - starting from an Euler configuration, it will after some time reach an isosceles configuration, and then after an equal length of time it will reach a symmetric Euler configuration. After doing so again, it will reach the starting configuration, just with the bodies cyclically permuted. This way, we see that the whole periodic orbit is determined by a twelfth of it, going from an Euler configuration to an isosceles configuration.
We now introduce the variational machinery. Let $\mathcal X$ be the set of all possible configurations of three bodies in $\mathbb R^2$ with center of mass at the origin. Now, for some fixed $T$, let $\Lambda$ be the space of all possible paths $q:[0,T]\to\mathcal X$ with the property that $q(0)$ is an Euler configuration and $q(T)$ is an isosceles one. There is no constraint on these paths besides continuity, and we wish to show that there is such a path along which the configuration evolves following the laws of Newtonian gravity.
The Euler-Lagrange equations, a cornerstone of the calculus of variations, imply that this condition holds at critical points - in particular at minimizers - of the action, which is given by an integral of the form
$$\mathcal A=\int_0^T L\,dt$$
where $L=L(q(t),q'(t))$ is a function called the Lagrangian, which in this case is given in terms of the kinetic energy and the gravitational potential energy. Standard results imply that a minimizer of this action indeed exists. The main issue is that we have to guarantee this minimizer is a collision-free trajectory.
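Concretely (and this part is standard), normalizing the units so that the gravitational constant and all three masses are $1$, the Lagrangian is the kinetic energy plus the negative of the gravitational potential energy,

$$L=\frac{1}{2}\sum_{i=1}^3|\dot q_i(t)|^2+\sum_{1\leq i<j\leq 3}\frac{1}{|q_i(t)-q_j(t)|},$$

where $q_i(t)$ denotes the position of the $i$-th body at time $t$.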
Collisions are ruled out by establishing a lower bound on the action of any trajectory which does admit a collision: such an action has to be at least $A_2$, the action of a two-body collision trajectory in which two bodies start at rest and collide at time $T$. Indeed, by first moving one of the masses towards infinity, we reduce the action while keeping the collision, so we can assume we have just two bodies. One can then further "simplify" the collision until reaching the one described above, exhibiting $A_2$ as the minimum.
Therefore, to show that the action-minimizing path is collision-free, it is enough to exhibit some path of action smaller than $A_2$. Such paths are sought among equipotential test paths, which are characterized by travelling at a constant speed in the configuration space within a locus of constant potential. It turns out that the existence of a suitable one is equivalent to an inequality $\ell_0<\frac{\pi}{5}$ on the length of these paths. This is by far the most technical part: one writes the length as an explicit integral, using coordinates on a certain reduced configuration space formed by quotienting by rotations, which is then numerically estimated.
We therefore have a collision-free trajectory following Newton's laws, going from an Euler configuration to an isosceles one. As mentioned towards the start, this is what we expect a twelfth of a full periodic solution to look like, so we can stitch together 12 symmetric copies of this path to get a supposed solution. This is easily seen to trace out a closed path in the reduced configuration space. The only remaining wrinkle is that, since this is only taken modulo rotations, the actual trajectory might not be periodic, but might instead return to a rotated version of itself, exhibiting some sort of precession. Using something called the area rule, this rotation can be computed in terms of the behavior in the reduced space, and can be shown not to occur. We thus conclude the existence of the periodic figure-eight orbit. $\square$
Calculus of variations is a topic I never properly learned, but through my passing interest in (mathematical) physics I have gained a lot of appreciation for it, so I am always happy to discover tidbits like this which showcase its use.
25/12/2024
Subtitle: Chaos among primes
[This is based on a short write up I made all the way back in May. I wasn't planning to expand it originally, but it's a nice and ultimately quite simple idea so decided to put it here.]
One of the founding results of analytic number theory is the prime number theorem (PNT), which asserts that the number of primes up to some bound $x$, denoted $\pi(x)$, is asymptotic to $\frac{x}{\log x}$. A heuristic captured by this result is that if you pick a random number of size around $x$, the probability of it being prime is approximately $\frac{1}{\log x}$.
This is the foundation of Cramer's model which, together with some refinements, has been useful in forming conjectural predictions for the finer structure of prime numbers, especially patterns like prime tuples. One thing in particular that comes easily from this model is the distribution of primes in short intervals. Specifically, for some $x\geq y\geq 0$, how many primes are there between $x$ and $x+y$? The heuristic suggests that this ought to be about $\frac{y}{\log x}$, but the question is under what assumptions on $y$ this asymptotic is actually valid, and not merely on average.
Of course, $y$ cannot be too small. If we take $y=1$, then $\pi(x+y)-\pi(x)$ is either $0$ or $1$ for any $x$, so while it is $\frac{1}{\log x}$ on average, we don't have an asymptotic $\pi(x+y)-\pi(x)\sim\frac{y}{\log x}$. The existence of long prime gaps also implies that this asymptotic fails for $y=\log x$ (the average gap size, by PNT) or even slightly larger.
On the other hand, again by PNT, the asymptotic $\pi(x+y)-\pi(x)\sim\frac{y}{\log x}$ does hold if we assume $y\geq cx$ for some constant $c>0$. A lot of later work has shown that we can even take $y\geq x^c$ for certain constants $c<1$, with the exponent since pushed down to $0.525$ by Baker, Harman and Pintz. One is naturally led to ask how far down this can be pushed.
Cramer's model suggests an answer: probabilistically, with probability $1$ the asymptotic should hold whenever $y$ is asymptotically larger than $(\log x)^2$. However, to the major surprise of the community at the time, Maier showed this is not the case, even for $y$ as large as $(\log x)^A$ for any $A$. More precisely, he showed the following:
Theorem: For any $A>1$, setting $y=(\log x)^A$, we have
$$\liminf_{x\to\infty}\frac{\pi(x+y)-\pi(x)}{y/\log x}<1<\limsup_{x\to\infty}\frac{\pi(x+y)-\pi(x)}{y/\log x}.$$
The method of proof is what has since become known as Maier's matrix method. Instead of looking at one short interval, we consider the array of numbers $Qa+b$ for $x<a<2x$ and $0<b<y$, where $Q$ is some integer to be chosen later. On the one hand, this is a union of $x$ intervals of length $y$, so in order to show that there is a bias in how many primes are contained in one of them, it is enough to exhibit a bias in the entire matrix.
On the other hand, this array is a union of $y$ arithmetic progressions of length $x$ and common difference $Q$. For $x$ large relative to $Q$, explicit versions of PNT in arithmetic progressions say that each of these contains the expected number of primes - roughly $\frac{Qx}{\varphi(Q)\log x}$, its elements having size around $Qx$, whose logarithm is comparable to $\log x$ - as long as $b$ is coprime to $Q$. It turns out that the ultimate bias will come from counting such $b$.
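Here is a toy version of such a matrix in Python. The parameters are far too small for the bias in Maier's theorem to be visible - it only emerges at vastly larger scales - but the mechanism is already on display: the columns $b$ sharing a factor with $Q$ contain no primes at all, so all the primes crowd into the remaining ones:

```python
import math

# A tiny Maier-style matrix Q*a + b for x < a <= 2x, 0 < b < y,
# with Q the product of primes below B. All parameters are illustrative.
def is_prime(m):
    if m < 2:
        return False
    return all(m % d for d in range(2, math.isqrt(m) + 1))

B = 11
Q = 2 * 3 * 5 * 7                        # primes below B
x, y = 500, 60
columns = {b: sum(is_prime(Q * a + b) for a in range(x + 1, 2 * x + 1))
           for b in range(1, y)}

rough = [b for b in columns if math.gcd(b, Q) == 1]
print("primes in rough columns:    ", sum(columns[b] for b in rough))
print("primes in the other columns:", sum(columns[b] for b in columns if b not in rough))
```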
Let us elucidate what this bias looks like. The $Q$ we take will be a product of all primes up to some bound $B$. We are thus led to count the number $\Phi(y,B)$ of integers $0<b<y$ with no prime factors below $B$ - such numbers are commonly called $B$-rough, or $B$-quasiprime. When $x$ is large enough compared to $B$, namely superlinear in $Q\approx e^B$, we have the asymptotic
$$\Phi(x,B)\sim\frac{\varphi(Q)}{Q}x=x\prod_{p<B}\left(1-\frac{1}{p}\right).$$
Further, as $B$ grows, Mertens' third theorem asserts that this product decays like $\frac{e^{-\gamma}}{\log B}$.
When $x$ is not as large, namely on the order of a power of $B$, we have the following intriguing result:
Theorem (Buchstab): For any $c>1$ we have
$$\Phi(x,x^{1/c})\sim\omega(c)\frac{x}{\log x^{1/c}},$$
where $\omega$ is the Buchstab function.
I will not give the precise definition of $\omega$ here, but we will outline the proof of this result at the end of the post. The crucial property is that $\omega(c)$ tends to $e^{-\gamma}$ as $c\to\infty$ (as can be deduced with some work from the asymptotic mentioned before) but not monotonically - indeed, for any $a>1$, $\omega(c)-e^{-\gamma}$ takes both positive and negative values on the interval $[a,a+1]$. This oscillatory behavior is precisely what Maier's theorem exploits.
Let us now put all these pieces together. First we shall only show that the $\limsup$ in question is greater than $1$. Let us pick some $A'>A$ such that $\omega(A')>e^{-\gamma}$.
One uniform version of PNT in arithmetic progressions asserts that a progression of the form $Qa+b$ for given coprime $Q,b$ will contain an expected number of primes as long as $x>Q^D$ for some absolute constant $D$, provided $Q$ is a good modulus, meaning there are no Siegel zeros of $L$-functions of conductor $Q$. Though not all $Q$ are known to be good, we know "most" of them are. We will not elaborate on this much further.
Now, finally, let $B$ be arbitrary, $Q=\prod_{p<B}p$, $x=Q^D$, and $y=B^{A'}$; since $\log Q$ is comparable to $B$, we have $B\approx\frac{\log x}{D}$, so $y$ is of size about $(\log x/D)^{A'}$. The entirety of the above discussion implies that the number of primes in the matrix $Qa+b$, $x<a<2x$, $0<b<y$, is asymptotic to
$$\frac{Qx}{\varphi(Q)\log x}\Phi(y,B)\sim\frac{Qx}{\log x}\cdot\frac{y\,\omega(A')}{\varphi(Q)\log B}\sim x\cdot\frac{y}{\log x}\cdot\frac{\omega(A')}{e^{-\gamma}}.$$
It follows that one of the $x$ intervals of length $y$ contains more than the expected number of primes, by a factor of at least $\frac{\omega(A')}{e^{-\gamma}}>1$, and since $y>(\log x)^A$, the same ought to be true for some subinterval of length $(\log x)^A$. This, together with analogous reasoning using a value $A'$ satisfying $\omega(A')<e^{-\gamma}$, concludes the proof of Maier's theorem. $\square$
This failure of the probabilistic model has put into question a lot of prior assumptions about the "pseudorandom" distribution of primes, and various ideas have been put forward for how to further refine this model to give more accurate predictions. I will not elaborate on this much more, instead referring to a short article by Pintz which discusses this and some other aspects of Cramer's model.
To finish off the post, let me outline the proof of Buchstab's theorem, as I have found that it is not very commonly presented, and try to indicate why the function has the oscillatory behavior we needed. The starting point is Buchstab's formula: for any $B\leq C\leq x$, we have
$$\Phi(x,B)=\Phi(x,C)+\sum_{B\leq p<C}\sum_{k\geq 1}\Phi(x/p^k,p).$$
This is immediate by grouping numbers according to their least prime factor $p$ and the exact power of $p$ dividing them (strictly speaking, for the identity to be exact, the inner count should be restricted to $b$ coprime to $p$; like the terms with $k\geq 2$, this discrepancy only contributes $O(x/B)$). Indeed, since $\Phi(x/p^k,p)\leq x/p^k$, the terms with $k\geq 2$ contribute $O(x/B)$ in total. This opens the way to providing an estimate recursively.
First, when $B\geq\sqrt{x}$, $\Phi(x,B)$ simply counts the primes between $B$ and $x$ (together with $b=1$), which gives $\sim\frac{x}{\log x}$ as soon as $B=o(x)$. For $B=x^{1/c}$ with $1<c\leq 2$, this can be written as $\omega(c)\frac{x}{\log B}$ with $\omega(c)=\frac{1}{c}$. Otherwise, letting $C=\sqrt{x}$, we have
$$\Phi(x,B)\sim\Phi(x,\sqrt{x})+\sum_{B\leq p<\sqrt{x}}\Phi(x/p,p).$$
If now $B=x^{1/c}$ for $c>2$, then for any $B\leq p<\sqrt{x}$ we have $p=(x/p)^{1/c'}$ for $c'=\frac{\log x}{\log p}-1\leq c-1$. Assuming we have established Buchstab's theorem for such $c'$ (in a suitably uniform manner), this gives
$$\Phi(x,B)\sim\frac{x}{\log x}+\sum_{B\leq p<\sqrt{x}}\frac{x/p}{\log p}\omega\left(\frac{\log x}{\log p}-1\right)=\frac{x}{c\log B}\left(1+\sum_{B\leq p<\sqrt{x}}\frac{\log x}{p\log p}\omega\left(\frac{\log x}{\log p}-1\right)\right).$$
Knowing the distribution of primes, with little work one can estimate the last sum to be asymptotic to $\int_2^c\omega(t-1)dt=\int_1^{c-1}\omega(t)dt$. Ultimately this gives Buchstab's asymptotic again, by taking the value
$$\omega(c)=\frac{1}{c}\left(1+\int_1^{c-1}\omega(t)dt\right)$$
which we can take as the recursive definition of this function (which makes sense by induction on $\lfloor c\rfloor$, with the base case $\omega(c)=\frac{1}{c}$ for $1<c\leq 2$).
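This recursion is easy to iterate numerically; here is a crude Python sketch (the grid resolution and the range are arbitrary choices), whose output can be compared against the limiting value $e^{-\gamma}\approx 0.5615$:

```python
import math

# Numerically iterate omega(c) = (1 + int_1^{c-1} omega(t) dt) / c,
# with omega(c) = 1/c on (1, 2], using a simple Riemann sum on a uniform grid.
steps = 10000                        # grid points per unit interval
h = 1.0 / steps
N = 9 * steps                        # covers c in (1, 10]
omega = [0.0] * (N + 1)              # omega[k] ~ omega(1 + k*h)
integral = [0.0] * (N + 1)           # integral[k] ~ int_1^{1+k*h} omega(t) dt

for k in range(1, N + 1):
    c = 1 + k * h
    if c <= 2:
        omega[k] = 1 / c
    else:
        omega[k] = (1 + integral[k - steps]) / c     # integral up to c - 1
    integral[k] = integral[k - 1] + omega[k] * h

print("e^-gamma:", math.exp(-0.5772156649))
for c in (2, 3, 4, 5):
    # values hover around e^-gamma, approaching it as c grows
    print("omega(%d) ~ %.6f" % (c, omega[(c - 1) * steps]))
```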
This formula shows that $\omega$ is in some sense "self-averaging", so it makes sense that it converges - and indeed we know it tends to $e^{-\gamma}$, by Mertens' theorem, as stated before. Heuristically, we can also see why this indicates oscillation: differentiating the recursion gives $\omega'(c)=\frac{\omega(c-1)-\omega(c)}{c}$, so $\omega$ increases at $c$ precisely when its value at $c-1$ was larger than the current one. On $[1,2]$, $\omega$ begins above the limiting value and then falls below it; from there on, it is pulled towards the (larger) values near $1$, so it will increase until the value at $c$ exceeds that at $c-1$, then start decreasing, and it will keep alternating like that.
This is not a formal argument, and I'm not sure how easy it would be to turn it into one (the $1+$ in front of the integral complicates things). There is a short proof of the oscillatory behavior in Maier's article, but it doesn't provide much in the way of intuition.
23/12/2024
Subtitle: Diophantine approximations, exactly as they should be
[There are a few topics I've read into over the past few months that I have made some partial write-ups of, intending to turn them into posts here. For this topic, the original write up dates back to early October. I am pretending to be productive through the holiday season by properly writing these topics up, however delayed.]
One of the most basic questions in Diophantine approximation is: given an arbitrary real $\alpha$, how closely can we approximate it by rationals $\frac{p}{q}$, relative to the size of the denominator $q$? It is trivial that for any $q$, we can pick $p$ such that $|\alpha-\frac{p}{q}|<\frac{1}{q}$. Dirichlet's theorem says that if we are free to choose $q$, we can do much better - there are infinitely many fractions $\frac{p}{q}$ such that $|\alpha-\frac{p}{q}|<\frac{1}{q^2}$. Up to a constant, this is optimal (see Hurwitz's theorem); however, it turns out to be unimprovable only for a small (in fact countable) set of numbers, and for almost all reals we can get approximations to within $\frac{c}{q^2}$ for any $c>0$.
One can ask how much this can be improved by letting $c$ depend on $q$. We are led to the following problem: given a function $\psi:\mathbb N\to[0,1]$, when is it true that for almost all real numbers $\alpha$, there are infinitely many reduced fractions $\frac{p}{q}$ such that $|\alpha-\frac{p}{q}|<\frac{\psi(q)}{q}$? (We shall comment on the restriction to reduced fractions later.)
There is a fairly easy measure-theoretic argument giving a necessary condition: let us fix $\psi$. For any given $q$, the set $A_q$ of $\alpha\in[0,1]$ for which $|\alpha-\frac{p}{q}|<\frac{\psi(q)}{q}$ for some $p$ coprime to $q$ is a union of at most $\varphi(q)$ intervals of length $\frac{2\psi(q)}{q}$ each (for $\varphi$ the Euler totient function), so it has measure at most $2\varphi(q)\frac{\psi(q)}{q}$. We are then asking for the set of $\alpha$ which belong to infinitely many of the $A_q$. A partial answer is given by the first Borel-Cantelli lemma, viewing $[0,1]$ as a probability space: if the sum of the measures $\mu(A_q)$ converges, then the measure of the set $\{\alpha\in[0,1]\mid\alpha\text{ in infinitely many of }A_q\}$ is zero. We thus get the following:
Proposition: If $\psi$ is such that $\sum_{q=1}^\infty\frac{\varphi(q)\psi(q)}{q}$ is finite, then for almost every $\alpha\in\mathbb R$, there are only finitely many coprime solutions to $|\alpha-\frac{p}{q}|<\frac{\psi(q)}{q}$.
What about the converse? The second Borel-Cantelli lemma would provide one if the events $A_q$ were independent. Heuristically, this feels like it should be vaguely true - for different $q,q'$, the reduced fractions with these denominators are distinct, so any correlation between these events should be small. While far from rigorous, this reasoning led Duffin and Schaeffer to conjecture the following statement, which they proved under some additional technical assumptions:
Conjecture: If $\psi$ is such that $\sum_{q=1}^\infty\frac{\varphi(q)\psi(q)}{q}$ is infinite, then for almost every $\alpha\in\mathbb R$, there are infinitely many coprime solutions to $|\alpha-\frac{p}{q}|<\frac{\psi(q)}{q}$.
This conjecture was proven in 2019 by Maynard and Koukoulopoulos. Before explaining how the proof works, let us comment on the case of non-reduced fractions. We can always replace a reduced fraction $\frac{p}{q}$ by a non-reduced representation $\frac{p'}{q'}$, and since we put no restrictions on $\psi$, the corresponding bound $\frac{\psi(q')}{q'}$ may well be weaker or stronger. It turns out that this is the only correction one has to make - the above conjecture is equivalent to the following version due to Catlin:
Conjecture (equivalent version): Given $\psi$, let $\psi^*(q)=\varphi(q)\sup\{\psi(n)/n\mid q|n\}$. Then for almost all $\alpha$ there are infinitely many solutions $(p,q)\in\mathbb Z^2$ to $|\alpha-\frac{p}{q}|<\frac{\psi(q)}{q}$ iff $\sum_{q=1}^\infty\psi^*(q)$ is infinite.
We will now outline the proof of the conjecture, following Maynard-Koukoulopoulos.
The proof roughly follows the idea of the second Borel-Cantelli lemma. The events $A_q$ are not independent. However, it turns out that we can still reach the desired conclusion if we prove that, in a suitable sense, these events are "approximately independent on average". More precisely, a standard technique known as the second moment method lets us conclude if we know that for most pairs $q,r$ we have $\mu(A_q\cap A_r)$ not much larger than $\mu(A_q)\mu(A_r)$.
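To state one standard form of this principle (my phrasing, via the Kochen-Stone refinement of Borel-Cantelli; the variant used in the paper differs in details): if $\sum_q\mu(A_q)=\infty$, then$$\mu\left(\limsup_{q\to\infty}A_q\right)\geq\limsup_{Q\to\infty}\frac{\left(\sum_{q\leq Q}\mu(A_q)\right)^2}{\sum_{q,r\leq Q}\mu(A_q\cap A_r)},$$so controlling the pairwise correlations on average bounds the measure of the limsup set away from zero; a zero-one law (in this setting due to Gallagher) then upgrades positive measure to full measure.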
Through a fairly standard argument, $\mu(A_q\cap A_r)$ is bounded by $P(q,r)\mu(A_q)\mu(A_r)$, where $P(q,r)$ is a certain product over primes dividing $qr$ (and depending on $\psi$). We are thus reduced to showing that $P(q,r)$ is typically small, more specifically that there are few pairs $q,r$ for which there are many such primes. This is all the measure theory one needs; what remains is essentially a purely arithmetic problem.
This problem is reduced to questions of the following form: suppose we have a set $S$ of integers such that a significant fraction of pairs of elements of $S$ have a large gcd. Is there necessarily one large integer which divides most elements of $S$? The answer turns out to be no, but a version of it holds if we weight elements suitably, namely by $\frac{\varphi(q)}{q}$.
The proof proceeds by a "compression method", where one repeatedly restricts $S$ to a smaller subset which is still relatively large, but over which one gains increasingly much control over the primes occurring in its elements. More precisely, one iteratively finds two large subsets $V,W\subseteq S$ such that $\gcd(v,w)$ is large for most $v\in V,w\in W$, but such that increasingly many primes dividing their elements are controlled, for instance a given prime divides all elements of $V$ but none of $W$. When repeated carefully, we are left with sets such that all pairs $(v,w)$ have the same, large gcd.
In the context of the conjecture, the procedure is given in terms of structures called GCD graphs, which are essentially appropriately weighted bipartite graphs, encoding for which pairs of elements the gcds are large. We begin with such a GCD graph $G$, and we wish to show that its (suitably defined) size is small. The compression method lets us reduce the graph in such a way that its size can be controlled (and in fact increases at every step). After reduction, it is possible to estimate the size of the resulting graph $G'$, and hence of the original $G$, proving the required estimates. This completes the proof.
On the whole, the resulting argument is quite long and involved, with a lot of moving pieces, but ultimately pretty much completely elementary. In particular, it uses hardly any analytic number theory, nothing more difficult than Mertens's theorems.
Similar ideas have since been used to show quantitative versions of the conjecture. Indeed, Aistleitner, Borda and Hauke show that for almost all $\alpha$, the number of approximations satisfying $|\alpha-\frac{p}{q}|<\frac{\psi(q)}{q}$ with $q\leq Q$ grows, as a function of $Q$, like $2\sum_{q\leq Q}\frac{\varphi(q)\psi(q)}{q}$, precisely in line with the heuristics one may write down, and in fact, at least under some mild assumptions, satisfies a version of the Central Limit Theorem. The proofs all follow the same general idea of the second moment method, but with increasingly refined estimates on the correlations involved.
To end, let me just mention that in the past year, an alternative approach due to Hauke, Vazquez-Saez and Walker has been developed which doesn't use GCD graphs. It establishes the same technical bounds as needed in the original approach, but puts them in a more general setting in which the proofs proceed by reduction to a minimal counterexample. This method can be used to establish the quantitative results too.
21/12/2024
Subtitle: Smoothness for free!
So it's been a few weeks eh? Lack of motivation plus getting distracted with some things, both math-wise and outside. I was recently prompted by a friend to look at some things related to Hilbert's fifth problem. This is in almost its entirety based on Terence Tao's book on the topic.
The basic question motivating this study is as follows: given a topological group structure on a manifold, is it automatically isomorphic to a Lie group? That is, is there a smooth structure on the manifold which makes the operations smooth? The answer turns out to be yes, but the tools necessary for resolving it are closely related to understanding the general structure of locally compact groups. These turn out to be essentially built out of Lie groups, in a manner described by the Gleason-Yamabe theorem which we state towards the end.
The full resolution of the problem proceeds in several steps, which can all be loosely described as improving the regularity of the group structure. This regularity is local in nature, and while a group structure is rather global, there is a local notion which is useful for many purposes: a local group consists of a space equipped with partially defined multiplication and inversion operations in a neighbourhood of an identity, which satisfy the group axioms for those terms which are defined. For instance, any neighbourhood of the identity in any topological group can be equipped with a local group structure in an obvious way.
A basic utility of local groups, especially in the Lie group context, is that we can pull back the group structure to (an open set in) $\mathbb R^n$ and study it there using classical tools of analysis. Using the Taylor expansion we see that the resulting group is a $C^{1,1}$ local group, meaning it is defined in a neighbourhood of $0\in\mathbb R^n$ and its operation satisfies the estimate
$$x*y=x+y+O(|x||y|).$$
It turns out that this estimate alone is sufficient to get a lot of analysis going. In particular, up to using the exponential map we may additionally assume this group is *radially homogeneous*, meaning $sx*tx=(s+t)x$ for any scalars $s,t$, and under this assumption the estimate implies the Baker-Campbell-Hausdorff formula for $C^{1,1}$ local groups:
$$x*y=x+\int_0^1F(\mathrm{Ad}_x\mathrm{Ad}_{ty})y\,dt,$$
where $F(z)=\frac{z\log z}{z-1}$, and $\mathrm{Ad}_x=\exp(\mathrm{ad}_x)$ for some linear map $\mathrm{ad}:\mathbb R^n\to M_n(\mathbb R)$. The expression on the right-hand side is analytic, which in particular implies that any $C^{1,1}$ local group is locally isomorphic to a Lie group.
To continue the classification efforts, we are therefore reduced to finding $C^{1,1}$ local groups. A useful intermediate between topology and analysis is provided by metrics, and here too they provide a useful step, in particular Gleason metrics: functions $\left|\cdot\right| : G \to \mathbb R$ with the following properties for small enough $g,h$: $|g^n| \gg n|g|$ (escape property) and $|[g,h]| \ll |g| |h|$ (commutator estimate). We get the following:
Theorem: suppose that $G$ is a locally compact group which admits a Gleason metric. Then $G$ is locally isomorphic to a $C^{1,1}$ local group, and hence is a Lie group.
For the proof, and for all of the theory, a crucial object is the set $L(G)$ of one-parameter subgroups of $G$, that is, continuous homomorphisms $\mathbb R \to G$. There is an obvious scalar action of $\mathbb R$, and we can define addition on it by the formula
$$(f+g)(x) = \lim_{n\to\infty} (f(x/n)g(x/n))^n.$$
This gives us the structure of a vector space over $\mathbb R$. Moreover, equipped with the compact-open topology, it turns out to be a locally compact, metrizable topological vector space, and hence is finite-dimensional.
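As a sanity check on the addition formula (my own illustration, not from Tao's book): for a matrix group it recovers addition in the Lie algebra, which is the classical Lie product formula $(e^{A/n}e^{B/n})^n\to e^{A+B}$. One can watch the convergence numerically:

import numpy as np
from scipy.linalg import expm

# two arbitrary 3x3 matrices; f(t) = expm(t*A), g(t) = expm(t*B) are one-parameter subgroups of GL_3(R)
rng = np.random.default_rng(0)
A, B = rng.normal(size=(2, 3, 3))

for n in [10, 100, 1000]:
    # the defining limit (f+g)(1) = lim (f(1/n) g(1/n))^n
    approx = np.linalg.matrix_power(expm(A / n) @ expm(B / n), n)
    print(n, np.linalg.norm(approx - expm(A + B)))  # the error decays like O(1/n)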
This alone, while interesting, is not too useful, since a priori it is unclear that $L(G)$ is large or even nontrivial. However, using the escape property we can build a lot of "approximate one-parameter subgroups", which using Arzela-Ascoli can then be assembled into genuine one-parameter subgroups. We can in fact prove that there are "enough" of them for the exponential map $L(G) \to G, f \mapsto f(1)$ to be a local homeomorphism. This alone gives us a manifold structure on $G$, and using the commutator estimate we can eventually show that pulling back the group structure along the exponential map gives a $C^{1,1}$ estimate.
We are now led to the task of producing Gleason metrics. One source of metrics on locally compact groups is by considering functions on the group. Specifically, for a continuous function $\psi:G\to[0,\infty)$, we can define a pseudonorm $|x|_\psi=\|\tau_x\psi-\psi\|$, where $\tau_x\psi(y)=\psi(xy)$ and the last norm is some norm on a suitable function space, like the sup norm or the $L^2$ norm. How nice this metric is depends on how "smooth" $\psi$ is, and in particular this can be improved with the help of convolutions with respect to the Haar measure.
One example where this convolution argument is useful is in showing that the escape property alone implies that a metric is Gleason. This is done by comparing the metric with ones defined in terms of functions above, and by convolving we can establish that some of them have the commutator property.
We can also use convolutions in a different way to essentially complete the program we are after in the case of compact groups. Specifically, the convolution maps $T_g:f\mapsto f*g$ are compact self-adjoint operators on $L^2(G)$ (for suitable $g$), whose spectral theory is easily understood; in particular they have finite-dimensional nonzero eigenspaces. With some work this can be used to show a very basic version of the Peter-Weyl theorem: for any nonidentity $y\in G$ there is some finite-dimensional invariant subspace of $L^2(G)$ on which $y$ acts nontrivially. We can then establish the following result, which is a special case of our final goal and which we will need as an intermediate step:
Gleason-Yamabe theorem (compact version): if $G$ is compact, then for any neighbourhood $U$ of the identity, there is a compact normal subgroup $H\trianglelefteq G$ contained in $U$ such that $G/H$ is a Lie group.
To see this, using compactness and the above result we can produce a finite-dimensional representation $G\to GL_n(\mathbb C)$ which is nontrivial on all elements of $G\setminus U$. Letting $H$ be its kernel, we get that $G/H$ is a linear group, hence a Lie group.
To proceed further, we introduce another property which will simplify the study: a topological group is NSS ("no small subgroups") if some neighbourhood of the identity contains no nontrivial subgroups. The escape property easily implies that a group with a Gleason metric is NSS. It turns out the converse holds too - any locally compact NSS group admits a Gleason metric. This gives a "purely topological" characterization of Lie groups, which we now record:
Theorem: A topological group is a Lie group iff it is locally compact and NSS.
To see the result, we note that NSS itself implies a sort of escape property for neighbourhoods - for any (small enough) neighbourhoods $V \subseteq U$ of the identity, there is some $n$ such that if $g,g^2,...,g^n \in U$ for some $g$, then $g \in V$. Using this we can define "approximate metrics" which, using again various convolution tricks, satisfy a triangle inequality up to some constant. These can then be refined to genuine metrics.
To tackle more general groups, we consider a weakening of NSS to the subgroup trapping property - any neighbourhood $U$ of the identity contains a sub-neighbourhood $V$ such that all the subgroups contained in $V$ jointly generate a subgroup contained in $U$. Any group $G$ with this property is "close" to NSS - up to passing to an open subgroup, $G$ has arbitrarily small normal compact subgroups $K$ with $G/K$ NSS. In proving this we use the structure result for compact groups mentioned before - after picking a (not necessarily normal) compact subgroup, we can find some quotient of it which is a Lie group, hence NSS, with the help of which we can then produce an NSS quotient of $G$.
Now comes the most technical part of the proof: it turns out that every locally compact group automatically has the subgroup trapping property. We will not explain the proof, which uses a lot of the techniques already discussed, like the structure of compact groups and convolution tricks, along with others like the theory of Hausdorff distances. From there, we can get what is considered a satisfactory resolution of Hilbert's fifth problem, a structure result for locally compact groups:
Gleason-Yamabe theorem: If $G$ is a locally compact topological group, then for any neighbourhood $U$ of the identity there is an open subgroup $G'$ and a compact normal subgroup $K$ contained in $U$ such that $G'/K$ is isomorphic to a Lie group.
In particular, if $G$ is Hausdorff and connected, it is an inverse limit of Lie groups.
Actually, $G'$ can be chosen independently of $U$, via an independent argument using van Dantzig's theorem: a totally disconnected locally compact group contains a compact open subgroup. This gives us control over the "non-Lie" part of the locally compact group. The proof of this theorem is itself pretty nice - it relies on finding a compact clopen neighbourhood of $1$, from which the conclusion is quite easy.
Finally, let us return to the first version of Hilbert's fifth problem we mentioned: that any topological group structure on a manifold $G$ is isomorphic to a Lie group. It follows from the above results, but the deduction is still quite tricky - what we immediately get is that $G$ is an inverse limit of Lie groups $G_n$ such that the kernels $K_n$ of $G\to G_n$ are compact. Using some Lie algebra theory, we can locally invert these projections, and in particular we get $\dim(G_n) \leq \dim(G)$. We may assume all the $G_n$ have equal dimensions, and then the kernels of the maps $G_m\to G_n$ are discrete, hence finite by compactness. Thus $K_n = \varprojlim_m \ker(G_m\to G_n)$ is profinite, in particular totally disconnected. By playing this against local connectedness of $G_n$, we find that $G$ is locally isomorphic to $G_n \times K_n$, and considering connectedness again we get that $K_n$ must be discrete. Hence $G$ is locally isomorphic to a Lie group $G_n$, and must itself be a Lie group.
While, in my eyes, a fairly satisfying resolution, this is far from the end of the story. One direction which remains, and in fact is still open, is the question of conditions under which a locally compact group acting on a connected manifold must be a Lie group. This is known if the action is faithful and transitive, by methods similar to the above, but remains open if we drop transitivity (this is the Hilbert-Smith conjecture). Interestingly, it has been reduced to the case of $G=\mathbb Z_p$ - we only need to show that $p$-adic groups do not have faithful actions on manifolds.
I find this work interesting, but unfortunately it is quite far from my research area, so I'm unlikely to pursue this excursion much further. It definitely was fun while it lasted though!
9/08/2024
Subtitle: Norm residue isomorphism theorem: it's "easy"!
The norm residue isomorphism theorem, formerly known as the Bloch-Kato conjecture, is a statement which describes in a rather explicit way the etale cohomology of fields with torsion coefficients. Specifically, it states that the resulting cohomology rings are isomorphic to (torsion quotients of) Milnor K-theory, which is given by an explicit presentation in terms of the arithmetic of the field. This isomorphism has some notable consequences, the most basic being the fact that the cohomology ring is generated in degree 1, so that all cohomology classes can be written in terms of cup products of classes in degree 1.
This conjecture was originally proven by Voevodsky. To put it mildly, the proof was hard, being based on his newly developed motivic homotopy theory. In recent years however, De Clercq and Florence have worked on a new approach, a work which has recently been completed. A friend of mine ran a study group on the topic, and I had the pleasure to give the final talk, which gathers up the results and concludes the proof of the norm residue theorem. I would like to summarize the argument here.
The premise of this whole work rests on an equivalent reformulation: for any field $k$ and any prime $p$ invertible in $k$, the map $H^n(G_k,\mu_{p^2}^{\otimes n})\to H^n(G_k,\mu_p^{\otimes n})$ induced by the $p$-power map is surjective, where $\mu_n$ is the group of $n$-th roots of unity viewed as a module with an action of the absolute Galois group $G_k$ of $k$. That is, the result can be reformulated in terms of lifting certain cohomology classes for this Galois group. This is the point of view undertaken in this work, and it has inspired the basic notion studied in the papers: that of smooth profinite groups.
First let us introduce a general notion capturing what we are trying to prove: let $G$ be a profinite group and $T$ a $\mathbb Z/p^{e+1}$-module for some $e\in\mathbb N$, free of rank $1$ over that ring. We say the pair $(G,T)$ is $(n,e)$-cyclotomic if the map $H^n(G,T^{\otimes n})\to H^n(G,(T/p)^{\otimes n})$ is surjective. For a $\mathbb Z_p$-module $T$ we also say that $(G,T)$ is $(n,e)$-cyclotomic if $(G,T/p^e)$ is, and it is $(n,\infty)$-cyclotomic if this holds for all $e\in\mathbb N$. We can then state our goal by saying that $(G_k,\mathbb Z_p(1))$ is an $(n,1)$-cyclotomic pair for all $n$.
For $n=1$ this is easy - by Kummer theory, we have an isomorphism $H^1(G_k,\mu_{p^m})\cong k^\times/(k^\times)^{p^m}$. This makes it obvious that the $p$-th power maps are surjective, which in the above notation means that $(G_k,\mathbb Z_p(1))$ is $(1,\infty)$-cyclotomic. The result we are after then follows from the following insanely general statement: any $(1,\infty)$-cyclotomic pair is automatically $(n,1)$-cyclotomic for all $n$.
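To unwind what this surjectivity means in the simplest case (my own unpacking, with the standard normalizations): under the Kummer isomorphisms the lifting problem becomes completely concrete, as the square$$\begin{array}{ccc}H^1(G_k,\mu_{p^2})&\longrightarrow&H^1(G_k,\mu_p)\\\downarrow\cong&&\downarrow\cong\\k^\times/(k^\times)^{p^2}&\longrightarrow&k^\times/(k^\times)^{p}\end{array}$$commutes, with the bottom map the evident projection, which is visibly surjective.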
To prove this implication, one shows a slightly different result. It turns out that one gets a more robust notion if, instead of considering lifting for a single $G$-module, one only asks for the existence of a lift of the class over some lift of the module. More precisely, we will say that a profinite group $G$ is $(n,e)$-smooth if for any (perfect) $\mathbb Z/p$-module $A$ with an action of $G$ and a class $c\in H^n(G,A)$, there is some lift $A[c]$ of $A$ to $\mathbb Z/p^{e+1}$ such that the image of the map $H^n(G,A[c])\to H^n(G,A)$ contains $c$. That is, every cohomology class can be lifted, but exactly where its lift lives may depend on the class.
The main theorem of the theory of smooth profinite groups is then the following Smoothness Theorem: if $G$ is $(1,1)$-smooth, then it is $(n,1)$-smooth for all $n$. This turns out to imply the aforementioned fact about cyclotomic pairs, although the implication is quite technical and I would rather not explain it.
So, why does the smoothness theorem hold? This is best understood in terms of extensions - cohomology groups are Ext groups in the category of $G$-modules, and those can be described as "Yoneda Ext groups", classifying certain exact sequences of modules. $(1,1)$-smoothness tells us that we can lift short exact sequences which start and end with a rank $1$ module. However, iterated extensions can easily contain terms of arbitrarily high rank. We have, however, the Uplifting Theorem, which says we can lift modules equipped with complete flags, that is, filtrations with rank $1$ graded pieces. The details are quite difficult and involve a fair bit of geometric theory, but the intuition does come down to iterating such rank $1$ extensions.
Given the Uplifting Theorem, lifting of Yoneda extensions turns out to be quite straightforward once we know they can be filtered in a manner compatible with the group action. This is done by a rather pretty argument involving reduction to $p$-groups, and using that actions of $p$-groups on $\mathbb F_p$-vector spaces admit such filtrations. All things combined this finishes the proof of the Smoothness Theorem, and hence of the norm residue isomorphism theorem.
This work is quite remarkable in how elementary it is, compared to Voevodsky's original proof. It also has other consequences, thanks to its generality - the Uplifting Theorem directly implies that mod $p$ Galois representations always lift to mod $p^2$ Galois representations. There is still certainly more to be done there, and I'm looking forward to what will come next.
15/12/2022
Subtitle: Iwasawa theory strikes again!
This was a long and interesting summer filled with various conferences and seminars. One (actually two) was a conference organized on the occasion of Bertolini's 60th (+1, because of COVID) birthday, and was probably the most interesting conference I've been to - never before was I so uniformly interested in all of the topics there. Today I would like to summarize what I've learned from one of the talks, namely Giada Grossi's talk on Mazur's main conjecture and applications to the Birch and Swinnerton-Dyer conjecture.
In my Week 17 post (around 10 months ago!) I have discussed the basic principles of Iwasawa theory. One thing I did not get around to explaining is the Iwasawa main conjecture (which is now a theorem of Mazur-Wiles) - as it is relevant to this post, let me briefly state it here. I refer to the notation defined in that Week 17 post.
Recall that out of the class groups of cyclotomic fields we have produced a finitely-generated $\Lambda$-module $Y_\infty$. By the structure theorem, it is pseudoisomorphic to a module of the form $\Lambda^r\oplus\bigoplus_i \Lambda/(p^{k_i})\oplus\bigoplus_j \Lambda/(f_j^{m_j})$. In fact, in this case the module turns out to further be a torsion module, meaning $r=0$. For such a module we can define the characteristic ideal $Ch(Y_\infty)=\prod_i(p^{k_i})\prod_j(f_j^{m_j})$, which is in some ways a measure of "size" of this torsion module.
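For instance (a toy illustration of the definition, of my own making): the torsion module $\Lambda/(p)\oplus\Lambda/(T^2)$ has characteristic ideal $(p)\cdot(T^2)=(pT^2)$.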
The Iwasawa main conjecture predicts a surprising connection between this $\Lambda$-module and $p$-adic $L$-functions - specifically, the characteristic ideal should be generated (up to the augmentation ideal, to account for the pole) by the Kubota-Leopoldt $p$-adic zeta function, which is a $p$-adic analytic counterpart of the classical Riemann zeta function. I shall not even begin to explain how the proof proceeds, only mentioning the intriguing fact that one only needs to prove one of the inclusions between the two ideals, as equality will then follow for "size" reasons related to the analytic class number formula.
This study of class groups is what could be considered the "classical" Iwasawa theory. In the recent decades these ideas have been developed quite substantially, most prominently in the direction of studying Selmer groups of an elliptic curve $E$ over $\mathbb Q$. We can consider the $\mathbb Z_p$-extension $\mathbb Q_\infty=\bigcup_n\mathbb Q_n$. Now we can consider the $p$-Selmer groups of $E$ over those fields, and similarly to the class group case, we can form a $\Lambda$-module $X_E$ by considering (the Pontryagin dual of) their inverse limit. Mazur's main conjecture then predicts that this is a torsion $\Lambda$-module, and that its characteristic ideal is generated by the $p$-adic $L$-function attached to $E$, as constructed by Bertolini-Darmon-Prasanna.
To me, more interesting than the proof of the statement were some applications that Giada was talking about, related to the Birch and Swinnerton-Dyer conjecture. The two main directions of these applications are the "converse to Kolyvagin" theorems, and (the $p$-part of) the BSD formula. Let me elaborate on the former: Kolyvagin's celebrated theorem establishes that if $E$ has analytic rank $0$ or $1$, then the algebraic rank of $E$ is the same, and further the Tate-Shafarevich group of $E$ is finite. The converse theorems go the other way and establish, under the assumption that (the $p$-part of) the Tate-Shafarevich group is finite and the algebraic rank is $0$ or $1$, that the same is true of the analytic rank.
In the algebraic rank $0$ case, the proof follows from a control theorem, which asserts that if the Tate-Shafarevich group is finite, then its order is related to a value of the generator of the characteristic ideal. From Mazur's main conjecture, we know that the $p$-adic $L$-function is going to be such a generator, thus providing us with information about its value. Thanks to relations between $p$-adic and classical $L$-functions, we can deduce the analytic rank is $0$ in this case too, and get the $p$-part of the BSD formula in this case.
The rank $1$ case is more interesting. By Kolyvagin's theorem, we know $E$ can't have analytic rank $0$, i.e. $L(E/\mathbb Q,1)=0$. For fairly general reasons, we can find a ("nice") imaginary quadratic field $K$ such that the twist $E^K$ has analytic rank $0$ over $\mathbb Q$. Consider now the factorization $L(E/K,s)=L(E/\mathbb Q,s)L(E^K/\mathbb Q,s)$. Taking derivatives and evaluating at $1$, the product rule together with $L(E/\mathbb Q,1)=0$ gives $L'(E/K,1)=L'(E/\mathbb Q,1)L(E^K/\mathbb Q,1)$. We know the last factor is nonzero, and by the above we can understand its relation to the order of the Tate-Shafarevich group. To get our hands on the other two factors, we need a new idea: anticyclotomic Iwasawa theory.
Up to now, we were only working with cyclotomic extensions, as over $\mathbb Q$ they exhaust all abelian extensions. Over other number fields this is no longer the case, and in particular imaginary quadratic fields have another distinguished tower of extensions, the anticyclotomic $\mathbb Z_p$-extension $K_\infty^-$, which is characterized by the fact that $K_\infty^-/\mathbb Q$ is Galois, and conjugation by the nontrivial element of $\mathrm{Gal}(K/\mathbb Q)$ acts on $\mathrm{Gal}(K_\infty^-/K)$ as inversion (while for a cyclotomic tower, it acts as the identity, hence the name "anticyclotomic"). Just as above, one can consider a tower of Selmer groups of $E/K_n^-$ which gives rise to a $\Lambda$-module. If $E$ has rank $1$ over $K$, this module satisfies a version of the control theorem which says that the value of the generator of its characteristic ideal is related to the order of the Tate-Shafarevich group, together with the "size" (more precisely, regulator) of the generator of $E(K)$.
On the other hand, there is also a construction of an "anticyclotomic $p$-adic $L$-function", and we have an anticyclotomic version of the main conjecture, asserting that it generates the characteristic ideal above. To complete the picture, we need a result which gives us an interpretation of the value of this $p$-adic $L$-function. It turns out to be most directly related to the Heegner point on $E$, which is a certain distinguished element of $E(K)$ arising from modularity of elliptic curves. This is an analogue of the classical Gross-Zagier formula, which establishes a relation between those Heegner points and the value of the derivative of $L(E/K,s)$ at $1$. Combining those, we get exactly what we want: $L'(E/K,1)$ is related to the value of the anticyclotomic $p$-adic $L$-function, which in turn by the main conjecture is related to the order of the Tate-Shafarevich group. Looking back at the equation $L'(E/K,1)=L'(E/\mathbb Q,1)L(E^K/\mathbb Q,1)$ (and recalling that since $E^K/\mathbb Q$ has rank $0$, $E/K$ must have rank $1$, so the results are applicable), all of those results combine to give us information about $L'(E/\mathbb Q,1)$ - it is nonzero, and we can relate it to the order of the Tate-Shafarevich group of $E$.
The short summary is that in rank $0$ we get the results from Mazur's main conjecture together with a control theorem, while in rank $1$ case we also need to employ the anticyclotomic main conjecture along with the appropriate control theorem and Gross-Zagier formula (together with its $p$-adic analogue). How universally applicable those results are obviously depends on how much is known of the main conjectures, and this is a topic of many recent investigations, including Giada's paper which was the basis of that talk of hers. As someone who so far had little interest in those Iwasawa-type main conjectures, this work is a great way of showcasing some of its external utility, making me even more excited for those developments!
28/09/2022
Subtitle: Linear algebra is always easier
The structure of Shimura varieties over the complex numbers, and also over number fields, can be said to be quite simple - they are smooth quasiprojective varieties with many possible compactifications satisfying various functoriality properties. However, many arithmetic applications require one to study not just the Shimura varieties themselves, but rather their integral models, which also include reductions of those varieties modulo primes. In some cases, those reductions are similarly nice - for instance, when we consider a Shimura variety defined by an adelic subgroup of level not divisible by $p$ (in the moduli interpretation, this corresponds to specifying torsion subgroups of order coprime to $p$), its reduction modulo $p$ is once again smooth. But in other cases, which are generally the ones of main interest, the variety has bad reduction, which can be quite nasty (although I will not discuss it here, a good worked out example is Figure 4.1 in this paper).
It turns out that there is a type of geometric gadget which can be immensely helpful in understanding the structure of those special fibers, called a local model. As I will briefly explain below, those local models are moduli spaces of certain linear-algebraic data which can be seen as occurring in the structure of the abelian varieties parametrized by the Shimura varieties. The two spaces are not isomorphic, but they have the very useful property that they admit exactly the same types of singularities, giving local structure from which one can often patch together information about the global shape. I will follow a paper of Tilouine for the outline here, and I urge you to consult it if you are interested in details.
The Shimura varieties considered there are Siegel threefolds, which parametrize abelian surfaces $A$ with extra structure. Suppose we want to consider ones which include the data of a subgroup $H$ of order $p$ in $A[p]$ (this is the $*=P$ case in Tilouine's paper). By considering the quotient $A/H$, this is equivalent to the data of an order $p$ isogeny $A\to A'$ for some other abelian surface $A'$. Understanding the local structure of the Shimura variety amounts to understanding the deformation theory of such an object. By the theory of Serre-Tate, this is equivalent to understanding deformations of the associated $p$-divisible groups $A[p^\infty]\to A'[p^\infty]$.
An immensely useful tool for studying $p$-divisible groups is given by Dieudonné modules. They are certain $\mathbb Z_p$-modules which can be functorially assigned to $p$-divisible groups and recover many of their properties. For the diagram $A[p^\infty]\to A'[p^\infty]$ above, one can show that the associated diagram is isomorphic to one which we denote $St$, which consists of $\mathbb Z_p^4\xrightarrow{f}\mathbb Z_p^4$ with the map given by $(a,b,c,d)\mapsto(pa,b,c,d)$.
There is one more piece of data differentiating those structures, namely direct summands $\omega_1,\omega_2\subseteq\mathbb Z_p^4$ such that $f(\omega_2)\subseteq \omega_1$, arising from the Hodge filtration on $p$-divisible groups. The local model is then a moduli space of such pairs. The reasoning above should not be considered a proof, but hopefully it suggests why it should be true that the types of singularities of the Siegel threefold are the same as those on the local model (more formally: the completed local rings we can see on the two varieties are the same).
This local model is much easier to study, as it is a subvariety of a Grassmannian cut out by equations which are easy to make explicit. In this case, we find an affine chart isomorphic to $\mathbb Z_p[x,y,z,t]/(xy-p)$, whose special fiber is isomorphic to a pair of 3-dimensional planes intersecting transversely. We thus deduce the same singularities occur in the Siegel threefold, which means that the integral models of Siegel threefolds are semistable - an important structural property.
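Explicitly (my own unpacking of the claim), the special fiber of this chart is$$\mathrm{Spec}\,\mathbb F_p[x,y,z,t]/(xy)=\{x=0\}\cup\{y=0\},$$two copies of affine $3$-space meeting transversely along the plane $\{x=y=0\}$.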
Finer analysis of the local model can also reveal some other properties of the Shimura varieties, like the stratification according to the structure of the subgroup $H$. I am currently employing these methods in trying to compute the analogous structure for Picard modular surfaces, parametrizing abelian threefolds with extra structure. When complete, I might post the resulting computation here.
23/06/2022
Subtitle: When even algebraic stacks are not enough
One of the most important classes of objects in modern number theory are Galois representations. In the spirit of many modern developments in algebraic geometry, it is desirable to find a suitably well-behaved moduli space of those. However, it turns out that (for some rather subtle reasons) the "obvious" approaches to developing those have various drawbacks, essentially because they don't capture enough of the families of Galois representations which we would like to consider.
Recently, Emerton and Gee have developed an approach to this problem by instead considering stacks of $(\varphi,\Gamma)$-modules. By some classical work of Fontaine, at the level of points these are in correspondence with Galois representations, but the corresponding stack has very different geometry.
Two of my friends are currently running a study group on the geometry of those Emerton-Gee stacks. Last week I gave a talk there about intermediate results, concerning the stacks of $\varphi$-modules. However, it turns out that already in this simpler case, one deals with objects which are too complicated to be covered by the "classical" theory of algebraic stacks. One has to go deeper, into the world of formal and Ind-algebraic stacks...
If we strip away all the technical detail, the ideas behind those objects are relatively straightforward: algebraic stacks (also known as Artin stacks) are objects which locally can be described as quotients of schemes by actions of algebraic groups. Formal algebraic stacks are analogously objects which are locally quotients of formal schemes, while Ind-algebraic stacks are Ind-objects (essentially colimits) in the category of algebraic stacks. Formal algebraic stacks tend to also be Ind-algebraic, while under some rather mild conditions, Ind-algebraic stacks are also formal.
Now let us define the objects of interest, the $\varphi$-modules: we are interested in the ring $\mathbb A^+=W(k)[[T]]$ of power series over the ($p$-typical) Witt vectors of some finite field $k$ of characteristic $p$, and we equip it with an endomorphism $\varphi$ which coincides with the $p$-power Frobenius modulo $p$. We also consider the ring $\mathbb A$ which is the $p$-adic completion of $\mathbb A^+[T^{-1}]$, to which $\varphi$ naturally extends. (One also wants to consider "relative" versions of those rings, but let me ignore those for simplicity.) A $\varphi$-module over $\mathbb A$ is a finite projective module $M$ equipped with a semilinear endomorphism $\varphi_M$, and we say it is etale if the induced linear map $$\Phi_M=1\otimes\varphi_M:\mathbb A\otimes_{\mathbb A,\varphi}M\to M$$ is an isomorphism. We wish to understand the moduli space of these objects.
The key to understanding them is to understand how such modules arise from modules over $\mathbb A^+$. The analogous map $\Phi_M$ no longer has to be an isomorphism, but we can arrange for it to be injective, and the cokernel will have to be torsion, as it is killed after base changing to $\mathbb A$. We quantify this by saying such a $\varphi$-module over $\mathbb A^+$ has height at most $h$ if the cokernel of $\Phi_M$ is killed by $T^h$ (it is common, and in the proper development of the Emerton-Gee stack crucial, to replace $T$ by some other polynomial $F(T)$ in this definition). This informally justifies a statement along the following lines:
Let $R$ be the stack of etale $\varphi$-modules over $\mathbb A$, and for each $h$, let $C_h$ be the stack of $\varphi$-modules over $\mathbb A^+$ of height at most $h$. Let $R_h$ be the image of $C_h$ in $R$ under the base change map, then $R$ is the colimit of $R_h$.
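As a toy example of the objects involved (my own, in the simplest rank $1$ case): the module $M=\mathbb A^+e$ with $\varphi_M(e)=T^he$ has $\Phi_M$ injective with cokernel $M/T^hM$, so it is of height at most $h$; after base change to $\mathbb A$, where $T$ becomes invertible, $\Phi_M$ becomes an isomorphism, so the result is etale and defines a point of $R_h$.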
The stacks $C_h$ are much simpler to study. It is still far from obvious, but using the theory of affine Grassmannians one can show that each $C_h$ is a formal algebraic stack (indeed, by restricting to $\varphi$-modules which are defined over $\mathbb Z/p^a$ for a fixed $a$, we get algebraic stacks which combine into that formal stack). However, dealing with $R_h$, or even making sense of what it is, is surprisingly difficult - it requires developing the notion of "scheme-theoretic image" for algebraic stacks. Once this is done, the statement above does formally hold, and establishes that $R$ is Ind-algebraic.
This is about what I covered in my talk. Of course this is only the first part in developing the Emerton-Gee stack. More technical, but similar in spirit, ideas underlie the construction of the full stack of $(\varphi,\Gamma)$-modules, of which one can then establish all the fundamental properties. These ideas have already proven their worth, as Emerton and Gee have used them to prove the existence of certain crystalline lifts. I'd love to report on that another time, but unfortunately I will not be present in the study group when those are presented... perhaps it will happen anyway!
29/05/2022
Subtitle: Complex multiplication over function fields
In the Week 10 post (which was written significantly more than 13 weeks ago!) I explored Lubin-Tate theory, which can be viewed as a version of the theory of complex multiplication which uses formal group laws, and their associated groups, to construct abelian extensions of local fields. Recently I had an opportunity to learn another version of this theory, this time applicable to function fields (of dimension 1 over a finite field): the theory of Drinfeld modules. Those come in many incarnations, one of which, shtukas, is particularly important in many modern developments. My main resource for this topic were notes by Bjorn Poonen on Drinfeld modules.
For simplicity, we shall work over the ring $A=\mathbb F_q[T]$, which is to be thought of as the analogue of the integers $\mathbb Z$ (but with minor modifications we can replace it with any finite extension of $\mathbb F_q[T]$). We then have the fraction field $K=\mathbb F_q(T)$, its completion at the infinite place $K_\infty=\mathbb F_q((1/T))$, and its (completed) algebraic closure $C$, which are analogues of $\mathbb Q,\mathbb R$ and $\mathbb C$ respectively. We then consider $A$-lattices in $C$: discrete submodules spanned by a finite subset linearly independent over $K_\infty$. Note that since $C/K_\infty$ is an infinite extension, we can find such submodules of arbitrarily high rank. This is in contrast with $\mathbb C/\mathbb R$, which has dimension $2$, so the $\mathbb Z$-sublattices have rank at most $2$. Now, one definition of a Drinfeld module is as the quotient $C/\Lambda$ for some $A$-lattice $\Lambda$. This is analogous to the definition of complex tori in complex geometry.
Now here is a surprising fact: for any $A$-lattice $\Lambda$, the quotient $C/\Lambda$ is $C$-analytically isomorphic to $C$ itself! This way, we can think of a Drinfeld module as the field $C$ equipped with some "nonstandard" structure of an $A$-module. One can check that under this identification, each element of $A$ acts on $C$ via a polynomial, and this action is $\mathbb F_q$-linear. Such polynomials are easy to classify: they are of the form $\sum_{i=0}^na_ix^{q^i}$. Since these act by composition, we view them as a ring under composition, denoted $C\{\tau\}$ with $\tau=x^q$. This ring is noncommutative: $\tau c=c^q\tau$ for $c\in C$. Therefore we can view a Drinfeld module as above as an $\mathbb F_q$-linear ring map $A\to C\{\tau\}$. With this in mind, we can define Drinfeld modules algebraically, and this makes sense over fields other than $C$: for any field $L$ containing $A$ (or more generally equipped with a map from $A$), a Drinfeld module over $L$ is an $\mathbb F_q$-linear ring map $A\to L\{\tau\}$ (plus one condition on the derivatives which I ignore).
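To make the twisted multiplication concrete, here is a minimal computational sketch (my own illustration; the choices $q=3$, $L=\mathbb F_9$ and $\theta=u$ below are arbitrary, made just to see the rule $\tau c=c^q\tau$ in action):

# We realize F_9 = F_3[u]/(u^2 + 1), storing a + b*u as the pair (a, b).
q = 3

def add(x, y):
    return ((x[0] + y[0]) % q, (x[1] + y[1]) % q)

def mul(x, y):
    # (a + bu)(c + du) = (ac - bd) + (ad + bc)u, using u^2 = -1
    a, b = x
    c, d = y
    return ((a * c - b * d) % q, (a * d + b * c) % q)

def frob(x, n=1):
    # the q-power Frobenius on F_9: (a + bu)^3 = a - bu, since u^3 = -u; iterated n times
    for _ in range(n):
        x = (x[0], -x[1] % q)
    return x

ZERO, ONE, U = (0, 0), (1, 0), (0, 1)

def tmul(f, g):
    # product of twisted polynomials, f[i] being the coefficient of tau^i;
    # moving tau^i past a coefficient b twists it: tau^i * b = b^(q^i) * tau^i
    out = [ZERO] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = add(out[i + j], mul(a, frob(b, i)))
    return out

tau = [ZERO, ONE]
print(tmul(tau, [U]), tmul([U], tau))  # differ: tau*u = -u*tau, so L{tau} is noncommutative

phi_T = [U, ONE]  # a sample Drinfeld module value phi_T = theta + tau, with theta = u
print(tmul(phi_T, phi_T))  # phi_{T^2} = theta^2 + (theta + theta^q)tau + tau^2; here theta + theta^3 = 0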
Now, as with all variants of explicit class field theory, like complex multiplication, one then considers torsion. Indeed, undoing the analysis of the previous paragraph, from a Drinfeld module we get an action of $A$ on $C$, so we can consider the torsion points under this action, and take extensions of $K$ generated by them. Choosing the modules appropriately (essentially we want $\Lambda$ to be of rank $1$ over $A$; for $A=\mathbb F_q[T]$ we can take the Carlitz module determined by $A\to L\{\tau\},T\mapsto T+\tau$), we can show that torsion gives rise to all abelian extensions of $K$. (Actually, all the ones unramified at infinity, but picking a different "point at infinity" we can account for ramification there too.)
Now, explicit class field theory is not the only application of Drinfeld modules. Another one is given by the moduli spaces of such modules, which are analogues of classical modular curves, and as such are simply called Drinfeld modular varieties. They play a role similar to Shimura varieties, and let one attack the Langlands correspondence over function fields. However, to actually complete this proof, which is nicely outlined in section 3 of this paper, one needs to enhance the moduli problem from Drinfeld modules to shtukas. We explain how to construct this other structure below, following the What Is... notice on shtukas.
The key is to consider the graded module associated to $L\{\tau\}$, viewed as an $L[T]=L\otimes_{\mathbb F_q}A$-module. It is of finite rank over $L[T]$, so it gives rise to a vector bundle $V$ over the projective line. Multiplication by $\tau$ is a semilinear map taking us one step up in the gradation, so we get a diagram of the form $V'\hookrightarrow V\xleftarrow{\tau}V'$ with $V'$ a modification of $V$ of degree $1$ (the cokernel of the inclusion is $1$-dimensional). This is precisely the data of a shtuka on $\mathbb P^1$. Not all such diagrams arise from Drinfeld modules, and it is this enlarged moduli that is necessary in studying Langlands over function fields.
That was a long post, but it is still merely the beginning of the story. We can extend this all in many directions: we have Anderson t-motives (analogues of lattices in $\mathbb C^n$), shtukas with more paws (the above is the case of two paws, which are the points over which the two maps in the diagram are not isomorphisms), and even the recent theory of mixed-characteristic shtukas (depending on Scholze's theory of diamonds; this is quite far afield from the rest, but I highly recommend these notes by Scholze and Weinstein). There are exciting developments on the horizon, related for instance to the recent geometrization of local Langlands program, which might at some point be featured to some extent here!
15/04/2022
Subtitle: Arizona Winter School recap
Over the past week or so I have attended the Arizona Winter School for the first time. It was a wonderful experience and an opportunity to meet many new people with closely related interests (and, in my time off, to explore the US for the first time!). All the lectures were great and are freely available online, in particular I recommend Akshay Venkatesh's pair of lectures on some novel perspectives in the primes-and-knots analogy.
Here I wanted to highlight the bit of math which has been the topic of Ellen Eischen's research project group which I was a part of. The goal of the project was to understand and generalize the method by Cléry and van der Geer of producing Siegel modular forms to the setting of automorphic forms on unitary groups. For simplicity let me focus on the Siegel setting.
Siegel modular forms are defined as functions on the Siegel upper half-plane, the set $\mathfrak h_g$ of symmetric complex $g\times g$ matrices with positive-definite imaginary part, which have appropriate transformation properties under the action of a symplectic group. Using block-diagonal matrices, we have an embedding $\mathfrak h_j\times\mathfrak h_{g-j}\to\mathfrak h_g$ for any $0\leq j\leq g$. Given a Siegel modular form $f:\mathfrak h_g\to\mathbb C$, we can consider its restriction to the image of this embedding, which turns out to always be a product of modular forms on the individual factors.
Unfortunately, in many cases this restriction vanishes, being forced to for instance by the fact that in low degrees there aren't any (Siegel) modular forms of small weight. Cléry and van der Geer realized one can extend this construction to account for this vanishing by considering derivatives of the function in the directions perpendicular to the image. Specifically, let us write a general element of $\mathfrak h_g$ as $\begin{pmatrix}\tau'&z\\z^T&\tau''\end{pmatrix}$ with $\tau',\tau'',z$ of size $j\times j,(g-j)\times(g-j),j\times(g-j)$ respectively. The restriction to the image of the embedding above corresponds to setting $z=0$. If this restriction vanishes, then instead we can consider the vector given by the derivatives of this function in the directions given by the coordinates of $z$, and then the result will (perhaps after an appropriate projection if $j,g-j>1$) be a vector-valued Siegel modular form, of weight directly related to that of the original function (in the simplest nontrivial case $g=2,j=1$, the weight increases by precisely $1$).
Now, we have no guarantee that this vector of derivatives itself is nonzero (indeed, it often vanishes, for parity-of-weight reasons). In that case though we can consider the vector formed by the second derivatives, which will then give us a modular form of yet higher weight. Similarly we can apply this construction to any Siegel modular form, taking the derivatives of the lowest order for which not all of them vanish.
There is a variety of ways to approach this result. One can do it via an explicit computation, as we have done in our project - for first derivatives this is very manageable, but gets tricky for higher orders. In the original paper the proof is presented in geometric terms, viewing modular forms as sections of a line bundle on a Shimura variety, and derivatives as sections of (symmetric powers of) the conormal bundle to the image of the embedding. Either of these appears to be quite amenable to generalization to the setting of unitary groups, and we shall see what other generalizations lie ahead!
13/03/2022
Subtitle: Improving topological algebra
Condensed mathematics is a recent invention of Dustin Clausen and Peter Scholze, intended to "fix" some issues which topological algebra exhibits. It turns out that in nearly all cases, trying to equip algebraic structures with any kind of topology causes many of the nice categorical properties to fail. For instance, the category of topological abelian groups is not abelian, as there exist bijective morphisms which are not isomorphisms due to mismatches in topology.
Clausen and Scholze fix this by realizing (nice) topological spaces as condensed sets, sheaves on the site of profinite sets. This larger category of condensed sets behaves very much like a topos (the only difference lying in some set-theoretic technicalities), and as such the abelian group objects in it form a very nice category. By considering various derived categories of objects, they have in fact successfully embedded into this theory the theory of adic spaces, and in fact improved it - unlike classical adic spaces whose structure presheaves may fail to be sheaves, the derived structure presheaves are always sheaves (of "analytic animated commutative rings").
Perhaps more surprisingly, this theory, built out of profinite sets, can also accommodate archimedean theories over real or complex numbers, via what they call the p-liquid structures.
I discuss all this, and more, in the notes I have written up, available here.
25/02/2022
Subtitle: Why is everything "reciprocity"?
One of the main ingredients in the recent progress on the Bloch-Kato conjecture was establishing a result known as the explicit reciprocity law, which in the present case asserts a relation between a certain Euler system and a value of a $p$-adic $L$-function. Being familiar with many other results in number theory dubbed "reciprocity", I was naturally intrigued as to why this statement has been called so as well. While trying to uncover this I went through many references, but none of them seem to explain the whole story. And while I can't claim I grasp the entire logical chain leading from one point to the other, here is my attempt at outlining it.
The more "classical" part of this reciprocity picture is the one beginning with Gauss's law of quadratic reciprocity and leading up to the Artin reciprocity law - the latter was given this name as from it one can deduce all the older reciprocity laws. An easy corollary to Artin reciprocity which we will look at is given by various generalized versions of Hilbert reciprocity: for any $n\in\mathbb N$ and a number field $K$ containing the $n$-th roots of unity, we have $\prod_v(a,b)_{v,n}=1$ for all $a,b\in K^\times$, where $v$ ranges over all places of $K$. Here $(a,b)_{v,n}$ is the $n$-th power Hilbert symbol, which can be defined by $(a,b)_{v,n}=\theta_v(a)(\sqrt[n]{b})/\sqrt[n]{b}$, where $\theta_v$ is the local Artin map, taking values in $\mathrm{Gal}(K_v^{ab}/K_v)$.
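To see how this recovers the classical statements (a standard computation, recalled here from memory): take $n=2$, $K=\mathbb Q$ and $a=p$, $b=q$ distinct odd primes. Then $(p,q)_{v,2}=1$ for $v\notin\{2,p,q\}$, while $(p,q)_{p,2}=\left(\frac{q}{p}\right)$, $(p,q)_{q,2}=\left(\frac{p}{q}\right)$ and $(p,q)_{2,2}=(-1)^{\frac{p-1}{2}\cdot\frac{q-1}{2}}$, so the product formula $\prod_v(p,q)_{v,2}=1$ is precisely Gauss's quadratic reciprocity.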
This is where the explicit reciprocity laws enter the picture: rather than providing variants of the statement of the reciprocity law, they give relatively explicit ways to evaluate the Hilbert symbols involved. From here on we work over a single completed field $K_v$, so we shall drop the index $v$.
In some cases this is easy: for instance if $K(\sqrt[n]{b})/K$ is unramified, then $\theta(a)$ acts on this extension as a power of the Frobenius, which is easy to compute. One other simple case is that of cyclotomic fields: for instance, consider the extension $\mathbb Q_p(\zeta_p)/\mathbb Q_p$. For any $a=p^k\cdot u$ with $u\in\mathbb Z_p^\times$, the action of $\theta_p(a)$ is determined by $\zeta_p\mapsto\zeta_p^{u^{-1}}$.
Artin and Hasse provided the first of what are nowadays called explicit reciprocity laws: the expressions they provide are of the form$$(\zeta_{p^k},a)_{p^k}=\zeta_{p^k}^{T_k(\log a)/p^k},$$$$(\pi_{p^k},a)_{p^k}=\zeta_{p^k}^{T_k(\zeta_{p^k}\log\frac{a}{\pi_{p^k}})/p^k}$$for $a\in\mathbb Q_p$, where $T_k$ denotes the trace map of $\mathbb Q_p(\zeta_{p^k})/\mathbb Q_p$ and $\pi_{p^k}=1-\zeta_{p^k}$. Iwasawa has generalized this formula to the following one:$$(a,b)_{p^k}=\zeta_{p^k}^{[a,b]_k},$$where$$[a,b]_k=\frac{1}{p^m}T_m\left(\zeta_{p^m}\frac{d\log b'}{d\pi_m}\log a\right)$$for some $m\geq 2k+1$, $b'$ an element in $\mathbb Q_p(\zeta_{p^m})$ of norm $b$, and $\frac{d}{d\pi_m}$ denotes formal differentiation with respect to a uniformizer. For those, see Iwasawa's article.
(A big achievement in explicit class field theory was the discovery of Lubin and Tate that the theory of complex multiplication has an analogue for any local field, using formal group laws. The above results have been generalized by Wiles. Those generalizations are relevant to the more general explicit reciprocity laws discussed below, but let us not get into them in detail.)
In several places (e.g. in Wiles's article above) I have encountered a claim that Iwasawa's interest in those explicit reciprocity laws came out of studying a relation between Coleman's systems of cyclotomic units and Kubota-Leopoldt zeta functions. Such results are explained for instance in Section 6.6 of these notes by Jacinto-Williams, in this survey by Bertolini et al., or in this article of Perrin-Riou. None of them indicate however how the Artin-Hasse-type formulas are useful in deriving them, and I haven't managed to track them down in Iwasawa's writing, so how they are related is unclear to me (if anyone can clear the situation up, feel free to contact me!)
These results have a form reminiscent of the explicit reciprocity laws mentioned at the beginning. Indeed, Coleman's cyclotomic units satisfy various norm relations which turn them into the simplest nontrivial example of an Euler system. More generally, Euler systems are various collections of cohomology classes of a Galois representation which are compatible under corestriction maps - the relation here comes from the Kummer map, which identifies Galois cohomology $H^1(K,\mathbb Q_p(1))$ with the group $K^\times\otimes\mathbb Q_p$. Abstractly, the system of cyclotomic units can be viewed as an element of an Iwasawa cohomology group$$H^1_{\mathrm{Iw}}(\mathbb Q_p,\mathbb Q_p(1))=\varprojlim_n H^1(\mathbb Q_p(\zeta_{p^n}),\mathbb Q_p(1)),$$and the relation is provided by the Coleman map from this cohomology to an appropriate space of measures.
This setup generalizes in a variety of ways. Firstly, one can consider Euler systems for general Galois representations, which similarly give classes in Iwasawa cohomology groups. The Coleman map has been generalized by Perrin-Riou to the "big logarithm" map, which gives $p$-adic analytic functions interpolating various Bloch-Kato logarithms (these generalize the Kummer map composed with the classical logarithm map). The modern formulations of explicit reciprocity laws postulate that this analytic function coincides with a $p$-adic $L$-function associated to the representation.
(While this itself isn't part of the "reciprocity" story, we ought to at least mention what the purpose of these results is: a special case of Bloch-Kato type results asserts that if the (complex) $L$-function of the representation does not vanish at the relevant point, then the corresponding Selmer group is trivial. Modulo technical details, the "Euler system machine" of Rubin implies that if we show the Euler system is nonzero, triviality of the Selmer group will follow. Using the explicit reciprocity law we compare the Euler system with the value of the $p$-adic $L$-function, which itself interpolates values of the complex $L$-function, providing the desired relation.)
31/01/2022
Update 11/02/2022: In Kato's article "Lectures on the approach to Iwasawa theory for Hasse-Weil L-functions via $B_{dR}$" (part of this volume) he explains some relation between Artin-Hasse-type explicit reciprocity laws and those more involved arithmetical objects: firstly, in Chapter II, Theorem 2.1.7, he gives a formula for the value of the dual exponential function on tensor powers of Tate modules (of either the multiplicative group or an elliptic curve, for instance), which specializes to the formulas of Artin-Hasse and Wiles mentioned above. In Chapter III, Theorem 1.2.6 he deduces from it a relation between certain elements of cohomology made up from cyclotomic units on one hand (which I'm assuming are related to the Iwasawa cohomology classes above) and values of a certain $L$-function expressible in terms of logarithms of cyclotomic units. The whole setup is a bit too complicated for me to reproduce here, but it does appear to be the kind of tie between the two types of reciprocity I wanted to find!
Subtitle: Controlling points with monodromy
I was thinking of making a post on the theory of automorphic forms and representations I've been learning recently, but I've decided to for now replace it with a smaller topic which I actually have, largely, learned over the past week (the automorphic theory might come in a long overdue post like higher Hida theory below did :) ) The topic of today's post is Lawrence-Venkatesh's proof of Faltings's theorem on finiteness of points on curves over number fields. There is currently a seminar running which studies the original paper (following this schedule) and I was assigned to give the two talks which deduce Faltings's theorem from the rest of the theory (which can be treated as a black box). I have not studied the rest of the setup all that deeply, but I will try to briefly sketch how it works to the best of my understanding.
Let $Y$ be a curve over a number field of genus $g\geq 2$. The key idea, present already in older proofs, including that of Faltings, is to take some family $X\to Y$ and study Galois representations attached to the fibers $X_y$ for $y$ in $Y(K)$. There are some subtleties regarding semisimplicity of those representations - for present purposes this result is known by Faltings's work, but they try to make the work independent of it, partly to generalize to situations where analogous results are not known. Let me gloss over it here however. It is then easy to show that there are only finitely many possibilities for this representation (for ramification reasons).
Using methods of $p$-adic Hodge theory, specifically comparisons between etale and crystalline cohomology, one can view these Galois representations as filtered $\phi$-modules, $\phi$ referring to the action of Frobenius (over some auxiliary finite place $v$). We can see how the filtration varies in the family (comparing it between fibers using the Gauss-Manin connection), which gives rise to a period map. This map is well-defined either locally or on the universal cover - the global incompatibility is measured by monodromy. The period map is essentially injective, which would show that the Galois representations are generally distinct.
Unfortunately, the situation is a little more involved, because filtered $\phi$-modules with different values of the period map can still be isomorphic - this is essentially due to the possibility of the centralizer of Frobenius being large. The way to get around this is to show that the monodromy (or more precisely the Zariski closure of the image of the appropriate map) must be large, and hence so must be the image of the period map. If we show the intersection with the centralizer of Frobenius is small enough, we can still pull off the desired finiteness.
(The above arguments are made less trivial by making use of an extensive and somewhat subtle interplay between $p$-adic and complex period maps.)
The above method can be turned into some fairly general concrete statements. The result of this sort which implies Faltings's theorem is Proposition 5.3: it begins with a curve $Y$ which has a finite étale cover $Y'$ over which a family $X$ of abelian varieties is defined. The fact that we ought to pass to families over $Y'$, as opposed to the (vaguely more natural) families over $Y$, is an intriguing one, and has to do with families over $Y$ generally having too large centralizers. Under some technical conditions, one of which asks for large monodromy, we get a finiteness result on the subset of points $y\in Y(K)$ such that the fiber of $Y'\to Y$ over $y$ is "small" in an appropriate sense. This "smallness" condition essentially amounts to saying that the fiber should have very few points which are defined over a small degree extension of $K$ (or more precisely, of its completion at a place $v$). The smallness implies that the centralizers must be correspondingly small, which is why the theorem applies.
The key to Faltings's theorem is to pick a family $X\to Y'\to Y$ and a place $v$ in such a way that all points in $Y(K)$ have small fibers. One way to do this, picked by Lawrence and Venkatesh, is to take a Kodaira-Parshin family, in which $Y'\to Y$ parametrizes covers of $Y$ ramified at exactly one point and with prescribed Galois group. The monodromy condition is verified for them through a direct topological argument. The smallness argument, on the other hand, is very algebraic, necessarily so as it deals with the Galois action on the fibers, although it still makes fair use of comparison with the topological situation. In the end, as the authors note, there doesn't seem to be anything intrinsically special about the families they consider, except for being explicit enough to do all the computations with them.
The method has already had successes beyond Faltings's theorem. Apart from some toy examples (like the S-unit theorem they present, or Siegel's theorem on integral points of elliptic curves as done here), in the very same paper Lawrence and Venkatesh give an application to studying integral points on hypersurfaces, showing that ("generically") they must lie in a proper Zariski closed subspace. This method shows promise of being applicable well beyond the few cases considered so far, and while not many works have been published yet, many are likely to come soon.
16/01/2022
Subtitle: Interpolate everything!
This post is long overdue, and it represents a summary of ideas which I've been working with for a couple months now. This week I had an opportunity to speak briefly about classical Hida theory, so I think this might be as good a time as any for me to write here about what it is, why it isn't sufficient for certain purposes, and how a "higher" version of it was used to get around these problems.
The general idea behind this theory is that of p-adic interpolation of various data, like modular forms. Specifically, given a modular form $f$ of some weight $k_0$, we are looking for a sequence of modular forms $f_k$ of weight $k$, for $k$ ranging over some ("large", preferably cofinite) subset of $\mathbb N$ with the property that these forms are all specializations of a family ranging over a $p$-adic parameter. In this case this can be made quite explicit, since modular forms can be represented by their $q$-expansions. Thus writing $f(q)=\sum_{n=0}^\infty a_nq^n$, what we would like to have is some $p$-adic analytic functions (in a sense I'm not going to make precise) $A_n$ such that $A_n(k_0)=a_n$ and, for each $k$ in the appropriate subset of $\mathbb N$, we have that $\sum_{n=0}^\infty A_n(k)q^n$ is a modular form of weight $k$. The simplest example is given by (a modification of) the classical Eisenstein series: for $k\in\mathbb N$ even and greater than $2$, we have a modular form given by $E_k(q)=\frac{\zeta(1-k)}{2}+\sum_{n\geq 1}\sigma_{k-1}(n)q^n$. For a fixed $n$, the function $k\mapsto\sigma_{k-1}(n)=\sum_{d\mid n}d^{k-1}$ doesn't interpolate well, because of the terms with $d$ divisible by $p$. However, its variant $\sigma_{k-1}^*(n)$ summing only over $d$ coprime to $p$ does interpolate, thanks to Euler's theorem $d^{p^{m-1}(p-1)}\equiv 1\pmod{p^m}$ (technically we should also restrict $k$ to some congruence class modulo $p-1$ but let me ignore that). After also modifying the $\zeta$ term in tune with Kummer congruences, we get modified Eisenstein series $$E_k^*(q)=\frac{(1-p^{k-1})\zeta(1-k)}{2}+\sum_{n\geq 1}\sigma_{k-1}^*(n)q^n,$$ where now each coefficient can be viewed as a $p$-adic analytic function of $k$.
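To see this interpolation in action, here is a quick numerical check (a minimal sketch in Python; the choice $p=5$ and the test weights and levels are mine): if $k\equiv k'\pmod{(p-1)p^m}$, then $\sigma^*_{k-1}(n)\equiv\sigma^*_{k'-1}(n)\pmod{p^{m+1}}$, simply because the group $(\mathbb Z/p^{m+1})^\times$ has order $(p-1)p^m$.

```python
# Numerical check of the p-adic interpolation of sigma*_{k-1}(n):
# if k == k' mod (p-1)p^m, then sigma*_{k-1}(n) == sigma*_{k'-1}(n) mod p^(m+1),
# since d^((p-1)p^m) == 1 mod p^(m+1) for every d coprime to p.
p = 5

def sigma_star(k, n):
    """Sum of d^(k-1) over divisors d of n coprime to p."""
    return sum(d**(k - 1) for d in range(1, n + 1) if n % d == 0 and d % p != 0)

for m in (1, 2, 3):
    step = (p - 1) * p**m           # weights congruent mod (p-1)p^m
    for n in (6, 28, 100):
        diff = sigma_star(4, n) - sigma_star(4 + step, n)
        assert diff % p**(m + 1) == 0
print("congruences hold")
```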
Families like this are commonly called $\Lambda$-adic modular forms, where $\Lambda$ represents the ring of analytic functions on the parameter space. However, outside explicit examples like the one above, it is far from clear whether other examples exist. A body of work which has established that they do is due to Hida. The main insight is to restrict attention to modular forms which are ordinary - under the action of a certain operator $U_p$, related to the $p$-th Hecke operator, the form should be an eigenvector with eigenvalue a $p$-adic unit (this was later relaxed by Coleman to consider forms of "finite slope", which amounts to the eigenvalue being nonzero). Hida has then shown that the space of ordinary $\Lambda$-adic cusp forms interpolates the spaces of classical ordinary cusp forms, and in particular any ordinary cusp form belongs to a $p$-adic family.
Taking a more abstract viewpoint, modular forms that we are considering here are global sections of certain line bundles on modular curves, and indeed one possible approach to this theory is via certain towers of modular curves, known as Igusa towers (this is not Hida's original approach, but it is one that generalizes most easily.)
The importance of these and related results of Hida lies mostly in the capability to interpolate Galois representations associated to modular forms. These representations can be viewed as living inside the first cohomology groups of the modular curves, and as such it is in our interest to interpolate these groups. In the case of modular curves we are lucky, because by Serre duality we can relate $H^1$ to $H^0$, and proceed with the interpolation that way.
This approach, however, fails if we try to apply it to higher-dimensional situations, specifically automorphic forms on Shimura varieties (attached to some algebraic groups) which are not curves. The automorphic forms of interest, and associated Galois representations, live inside $H^1$ of the variety, which is neither the bottom nor the top cohomology group anymore. Briefly, the reason these methods cannot "access" higher cohomology is that we proceed by studying the aforementioned Igusa towers over the ordinary locus of the modular curves, which is affine, and hence has no higher cohomology.
A resolution of this problem came from Pilloni, who has introduced higher Hida theory (and higher Coleman theory) for Shimura varieties for the group $\mathrm{GSp}_4$ by studying in finer detail stratifications of Shimura varieties over finite fields according to how far from ordinary the points are. In particular he takes Igusa towers over a locus which is not affine but is a union of two affines, and thus can "see" both $H^0$ and $H^1$. These resulting interpolation methods were then employed by Loeffler-Pilloni-Skinner-Zerbes to produce $p$-adic $L$-functions for Galois representations attached to $\mathrm{GSp}_4$, which eventually led to a proof of new cases of the Bloch-Kato conjecture for these representations.
We are currently at an exciting point, seeing many new developments in the direction of generalizing these methods. Last year Nguyen has completed a development of higher Hida and Coleman theories for the group $\mathrm{GU}(2,1)$, and recently Boxer and Pilloni have extended these methods to wide generality. Around the same time, Gyujin Oh has announced a construction of the $p$-adic $L$-functions for this group. A goal of my PhD project is to take these inputs and apply them using the methods of Loeffler-Zerbes to produce yet new cases of the Bloch-Kato conjecture for the corresponding representations (and perhaps, eventually, new cases of BSD for abelian varieties.)
3/12/2021
Subtitle: The power of towers
Class groups of number fields are objects of paramount importance in algebraic number theory, of great relevance both in classical and modern developments of the subject. It is then not surprising that they turn out to be notoriously difficult to study. One of the approaches to this problem is the one taken by Iwasawa, whose basic idea was nicely summarized by Hunter Brooks in a MathOverflow post:
Iwasawa theory has its origins in the following counterintuitive insight of Iwasawa: instead of trying to describe the structure of any particular Galois module, it is often easier to describe every Galois module in an infinite tower of fields at once.
Let us set the scene for this theory. Fix a prime $p$. For any number field $K$, we can consider the infinite cyclotomic extension $K(\zeta_{p^\infty})$, where $\zeta_{p^\infty}$ denotes the set of roots of unity of order a power of $p$. Inside this extension we can pick out a unique tower of fields $K=K_0\subseteq K_1\subseteq K_2\subseteq\dots$ with the property that $K_n/K$ is a Galois extension with Galois group $\mathbb Z/p^n$. The Galois group of the union $K_\infty=\bigcup_n K_n$ over $K$ is isomorphic to $\mathbb Z_p$, and we thus call $K_\infty$ the cyclotomic $\mathbb Z_p$-extension of $K$. A lot of what follows is true of other $\mathbb Z_p$-extensions, but even this single case is of interest. As an example, for $K=\mathbb Q(\zeta_p)$ with $p>2$ we have $K_n=\mathbb Q(\zeta_{p^{n+1}})$. As we will have two "copies" of $\mathbb Z_p$ floating around later, we denote this Galois group by $\Gamma$.
Consider now the class groups of the fields $K_n$. Fixing the prime $p$, let $Y_n$ denote the $p$-part of this group. Each of the $Y_n$ is then a module over $\mathbb Z_p$ and admits an action of the Galois group of $K_n/K$. They further form an inverse system (given by norm maps) compatible with all these actions. The inverse limit $Y_\infty$ is then a module over the completed group ring $\Lambda=\mathbb Z_p[[\Gamma]]$, which is isomorphic to the formal power series ring $\mathbb Z_p[[T]]$. This ring is not a principal ideal domain, but in many regards it is close to one, and in particular we have an almost-classification of its finitely generated modules. Specifically, for any finitely generated module there is a pseudoisomorphism, meaning a morphism with finite kernel and cokernel, from the module to a direct sum of cyclic modules of the form $\Lambda^r\oplus\bigoplus_i \Lambda/(p^{k_i})\oplus\bigoplus_j \Lambda/(f_j^{m_j})$, where the $f_j\in\Lambda\cong\mathbb Z_p[[T]]$ are represented by irreducible distinguished polynomials - monic polynomials in which every coefficient except the top one is divisible by $p$. This result can be largely viewed as the reason why viewing these objects in towers simplifies their study - the ring $\mathbb Z_p[[\Gamma]]$ is much better behaved than its quotients $\mathbb Z_p[\mathbb Z/p^n]$ we would have to study otherwise.
It turns out, though it is far from obvious, that $Y_\infty$ is such a finitely generated module, and moreover the modules $Y_n$ can be recovered from it as quotients by some pretty explicit elements. These results require a fair bit of input from class field theory, to construct extensions of the $K_n$ which encode these class groups. By performing some calculations of sizes of cyclic modules, and accounting for some variation coming from the pseudoisomorphism, we can recover the following result of Iwasawa, which in the course I'm following was called the Iwasawa control theorem:
Theorem: There exist integers $\mu\geq 0,\lambda\geq 0$ and $\nu$ such that, for all large enough $n$, the $p$-part of the class group of $K_n$ has order $p^{\mu p^n+\lambda n+\nu}$.
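To get a feel for the shape of this formula, one can play with a toy module (a sketch of the mechanism only, not touching actual class groups): for $M=\Lambda/(T-p)$, so $\mu=0$ and $\lambda=1$, the control-theorem-style quotients $M/\omega_nM$ with $\omega_n=(1+T)^{p^n}-1$ are $\mathbb Z_p/((1+p)^{p^n}-1)$, and computing valuations recovers growth of the form $\lambda n+\nu$.

```python
# Toy illustration of Iwasawa-type growth: for M = Lambda/(T - p), the quotient
# M/omega_n M is Z_p/((1+p)^(p^n) - 1), so its order is p^(v_p((1+p)^(p^n) - 1)).
# The printed valuations are n + 1, i.e. mu = 0, lambda = 1, nu = 1.
p = 3

def vp(x):
    """p-adic valuation of a nonzero integer x."""
    v = 0
    while x % p == 0:
        x //= p
        v += 1
    return v

for n in range(6):
    omega_at_p = pow(1 + p, p**n) - 1   # omega_n evaluated at T = p
    print(n, vp(omega_at_p))            # prints n, n + 1
```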
7/11/2021
Subtitle: Classifying lifts
The topic of considering deformation rings of (Galois) representations has its origins in some of Hida's work on interpolation of modular forms (which will probably be featured in a post here rather soon - stay tuned!), but it has only really picked up in pace thanks to Wiles's work on the modularity conjecture, where an isomorphism between deformation rings and Hecke algebras (known as the "R=T" theorem) was a crucial tool in producing the required modularity lifting theorems.
Starting soon there will be a study group whose goal is to cover a recent paper which studies the abstract structure of these deformation rings for representations of absolute Galois groups of $\mathbb Q_p$ in quite considerable generality. I have agreed to give the first talk in this study group (and have prepared some notes for it) in which I explain the broadest of generalities of the topic. I would like to do the same (in perhaps even broader generalities) here.
To start off, we fix some profinite group (usually a Galois group of some (infinite) Galois extension) $G$, and a representation $V$ of $G$ over some finite field $F$ of characteristic $p$. Our goal is to understand the possible lifts of this representation to local rings $A$ with residue field $F$, which we call deformations of $V$ to $A$. We introduce the deformation functor which to any (Artinian, or complete Noetherian, depending on our purposes) local ring $A$ associates the set of isomorphism classes of deformations of $V$. In favorable cases, this functor turns out to be (pro-)representable, and the representing object is called the universal deformation ring $R_V$, and over it lives the universal deformation of $V$.
If $G$ is given by an explicit presentation (as a quotient of a free profinite group), the deformation ring is reasonably easy to construct. In the notes above I explain it for a trivial representation of a certain quotient of the absolute Galois group of a local field. Unfortunately, these explicit presentations rarely give us good general results about the structure of the deformation rings. One tool more helpful in these considerations is the tangent space, which I introduce in this talk, as well as higher cohomology groups. The applications to studying presentations of deformation rings will be the topic of later talks.
11/10/2021
Subtitle: Alpbach 2021 recap
Over the last week I have attended the Alpbach 2021 workshop, my first in-person event in almost a year and a half. I met a lot of new people, learned a lot of new math, and had otherwise a great time! The three main lecture courses were on Euler systems, infinite-dimensional geometry of numbers, and the uniform Mordell-Lang conjecture. All of them were really interesting, but it is the one on infinite-dimensional geometry of numbers that I found the most intriguing, largely because I was not familiar with it to any extent before, and it is the one I would like to briefly summarize here. As for the other two, let me link the lecture notes for the first of the courses, and the paper on which the last one was based.
Infinite-dimensional geometry of numbers of Bost and Charles has its roots in Arakelov geometry, and is an attempt at developing some kind of theory of quasicoherent sheaves "over $\overline{\operatorname{Spec}\mathbb Z}$", the integer spectrum completed by adjoining a point at infinity corresponding to the real place (something could be said about geometry over $\mathbb F_1$ here, but I'm not sure what.) For finite free modules, the objects of interest are pairs $\overline E=(E,\|\cdot\|)$, consisting of a free abelian group of finite rank together with a Euclidean norm on $E_{\mathbb R}$. We would then be interested in the number of "global sections" of $\overline E$. A natural approach, employed originally by Arakelov, would be to count the elements of $E$ which have norm bounded by $1$, an archimedean condition similar to the one defining integral elements at nonarchimedean places. This, however, turns out to be inadequate for trying to develop infinite-dimensional analogues, due to functorial properties holding only up to error terms depending on the dimension.
A better measure of how many "small" vectors there are is given by the theta invariant, which here is given by $$h^0_\theta(\overline E)=\log\sum_{v\in E}e^{-\pi\|v\|^2}.$$ This one works much better, for instance it is additive: $h^0_\theta(\overline E\oplus\overline F)=h^0_\theta(\overline E)+h^0_\theta(\overline F)$. It also satisfies a Riemann-Roch type formula, which follows easily from Poisson summation: $h^0_\theta(\overline E)-h^0_\theta(\overline E^\vee)=\widehat\deg\overline E$, where the degree is minus the logarithm of the covolume of $E$. Given $h^0$, we also define $h^1$, simply by $h^1_\theta(\overline E)=h^0_\theta(\overline E^\vee)$, compatibly with some version of Serre duality, if we admit that $\overline{\operatorname{Spec}\mathbb Z}$ is a genus $1$ curve. It turns out to measure how well elements of $E_{\mathbb R}$ can be approximated by elements of $E$.
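These identities are easy to test numerically. Here is a minimal sketch (plain Python; the rank-one example is my own choice): take $E=\mathbb Z$ with the norm $\|x\|=t|x|$, so that $\widehat\deg\overline E=-\log t$ and the dual lattice carries the norm $|x|/t$; Poisson summation then makes the Riemann-Roch identity visible to machine precision.

```python
# Riemann-Roch for theta invariants of the rank-one lattice (Z, t|x|):
# h^0_theta(E) - h^0_theta(E dual) should equal deg(E) = -log(covolume) = -log t.
import math

def h0_theta(t, N=200):
    """h^0_theta of Z equipped with the norm ||x|| = t|x| (truncated theta sum)."""
    return math.log(sum(math.exp(-math.pi * (t * n)**2) for n in range(-N, N + 1)))

for t in (0.5, 1.0, 2.0, 3.0):
    lhs = h0_theta(t) - h0_theta(1 / t)   # h^0 - h^1, the dual having norm |x|/t
    print(t, round(lhs, 10), round(-math.log(t), 10))   # the two columns agree
```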
To pass to the infinite-dimensional setting, we weaken the condition on $E$ to it merely being a countable abelian group, and we ask for $\|\cdot\|$ to be a seminorm on $E_{\mathbb R}$. The definition of $h^0_\theta$ works just fine (except it might be infinite - that's a feature, not a bug!), though generalizing $h^1_\theta$ is not as straightforward, and it has "upper limit" and "lower limit" versions, which coincide only in some favorable cases.
A big source of such lattices comes from Hermitian vector bundles: let $X/\mathbb Z$ be a (separated, finite type) scheme and $(\mathcal E,\|\cdot\|)$ a Hermitian vector bundle on $X(\mathbb C)$. For any compact subset $K\subseteq X(\mathbb C)$ we can consider $H^0(X,\mathcal E)$ equipped with an $L^2$-norm coming from some finite measure $\mu$ on $K$. This construction, however, depends on the auxiliary choices of $K$ and $\mu$. An elegant way around these problems is to use some notions from functional analysis of topological vector spaces, namely the fact these spaces can be equipped with a structure of "dual of nuclear Frechet space" (tangentially, apparently that fact essentially generalizes finiteness of sections of coherent sheaves on compact analytic spaces.)
There have been many other pieces of theory introduced, one of them being A-schemes (arithmetic/absolute schemes), a possible approach to geometry over $\overline{\operatorname{Spec}\mathbb Z}$. Instead of delving into these, let me mention one application of the theory, which led to a new proof of the Schneider-Lang theorem. The theorem can be stated as saying that if the image of some meromorphic map satisfying certain conditions contains enough algebraic points, then this image is an algebraic variety. The new proof uses some numerical bounds on the invariants $h^0_\theta$ of some (pro-)Euclidean lattices, and appeals to the method used to prove Chow's theorem. So while the transcendence theory result per se is not new, this new method gives a new perspective on "why" it should be true, and shows promise for future applications.
12/09/2021
Subtitle: Moduli and all that
The last month has been pretty busy for me, both mathematically and non-mathematically, mostly filled with me trying to (finally) understand some of the theory of Shimura varieties, something that I've been meaning to do for a long time, but circumstances have only now pushed me to do it. I've written some notes of my own while reading Milne's notes, which I offer as a substitute for a write-up here.
Shimura varieties have a deep significance in arithmetic which is far from clear just from the definitions. Hopefully in the coming weeks and months I will be able to post about some of their applications as I go through them.
02/08/2021
Subtitle: The plethora of reductions
Recently I was asked to speak at the étale cohomology learning seminar organized by some of my friends at Oxford. I was assigned the topic of explaining the proof and consequences of the proper base change theorem. This lined up pretty well, as this proof is one of the things I intended to look into myself, and working through it was a very pleasant experience - seeing how all the pieces slowly fit together to give the general statement. The proof is outlined in a reasonable amount of detail in the slides I have prepared for the talks (1,2), but for the sake of this post (and a final review for myself) let me outline the main steps going into the proof. This treatment is based on the proof presented on the Stacks project.
The proper base change theorem asserts that for any proper morphism $f:X\to Y$ and a torsion étale sheaf $F$ on $X$, the higher direct images $R^nf_*F$ are compatible with arbitrary base change: the base change map relating them to the higher direct images of the pullback of $F$ along any morphism of schemes is an isomorphism. I will refrain from explaining the definition of this map, as one of the first steps in the proof is the reduction to the following more straightforward case: for a strictly henselian local ring $A$ and a proper $A$-scheme $X$, the map $H^n(X,F)\to H^n(X_0,F|_{X_0})$, where $X_0$ is the special fiber of $X$, is an isomorphism. The reduction simply amounts to passing to the (geometric) stalks and comparing the fibers of a morphism and of the fiber product.
Having performed this reduction, we first tackle the case of $n=0$, i.e. seeing how things work for global sections. If we work with Zariski sheaves, the corresponding statement is rather straightforward from general topological considerations, the key point being that, due to the henselian property, irreducible components of $X$ have connected fibers, nonempty thanks to properness. For étale sheaves this requires an extra argument, to see that sections over étale neighbourhoods can be glued over some scheme finite over $X$.
Having understood the global sections, to generalize the result to $n>0$ it is enough to understand how injective sheaves behave under base change. Specifically, in the above situation, if $F$ is injective, we ought to show that $F|_{X_0}$ is acyclic. Using this criterion and some routine reductions, it turns out to be enough to show the result when $X$ is the projective line over $A$. The case of $n=1$ can be done directly, by reducing to the case of a constant sheaf and using an interpretation of $H^1$ in terms of étale torsors. So the case of $n>1$ remains.
This final case is a corollary of a rather general result applicable to all separated schemes which are a union of some number of affine opens. Through the Mayer-Vietoris exact sequence we are left to study vanishing of cohomology on affine schemes. In the case of henselian rings this is the content of Gabber's "affine analog of proper base change", established using some clever diagram chasing. In general one simply applies henselization and the result follows.
Next up in the seminar is the finiteness theorem for étale cohomology. Thankfully I am not the one presenting that topic, though I am looking forward to hearing about it!
02/07/2021
Subtitle: A point-like diamond
In algebraic geometry over a field, the tools available very often vary depending on the nature of the base field. When the base field is taken algebraically or separably closed, one often says that one is in the geometric situation; otherwise, when the base field is not separably closed and so one has nontrivial Galois actions in place, one is in the arithmetic situation. The latter tends to involve more complicated objects. One good example of that is the étale fundamental group - for a variety over an algebraically closed field of characteristic zero, we have a comparison between it and the classical fundamental group of a complexification, while in the arithmetic situation one has an additional Galois action. It is generally understood that things in the arithmetic context cannot be reduced to those in the geometric context. However, this turns out to not always be the case.
The simplest example of an arithmetic fundamental group is the absolute Galois group of a field, which is the fundamental group of a point. Some of the most important ones are those of the $p$-adic fields $\newcommand{\Q}{\mathbb Q}\Q_p$. It turns out that there is a geometric object, defined over an algebraically closed field, whose étale fundamental group is isomorphic to the group $\mathrm{Gal}(\overline{\Q_p}/\Q_p)$. It appears that the idea of such a construction was first put forward by Peter Scholze, and then expanded by Jared Weinstein here. I am not aware of any deep insights coming from it, although it is definitely of interest in its own right, as a fair few nontrivial tools come into the construction, which I now wish to outline.
Fix an algebraically closed field $C$ which is nonarchimedean of residue characteristic $p$. Let $D_C$ be the open unit disk in $C$ centered at $1$ (viewed as a rigid or an adic space). This disk has a structure of a $\mathbb Z_p$-module, with the group operation given by multiplication and the action of $z\in\mathbb Z_p$ given by $1+x\mapsto(1+x)^z$. The $p$-th root map doesn't induce a morphism of this space, but we can change that by introducing a new space, an inverse limit under the Frobenius map $\widetilde D_C=\varprojlim_{x\mapsto x^p}D_C$, which we can now view as a perfectoid space in the sense of Scholze. It now admits an action of $\mathbb Q_p$, and the punctured disk $\widetilde D_C^*$ has an action of the group $\mathbb Q_p^\times$. The object in question is the quotient $\widetilde D_C^*/\Q_p^\times$, though one has to be careful about what one means by it, as we discuss below.
Whatever kind of object $\widetilde D_C^*/\Q_p^\times$ is, its finite étale covers should correspond to $\Q_p^\times$-equivariant finite étale covers of $\widetilde D_C^*$. The first key step now comes from Scholze's tilting correspondence, which identifies the covers of this perfectoid space over $C$ with covers of its tilt, $\widetilde D_{C^\flat}^*$ defined over $C^\flat$. The clever part comes in observing now that $\widetilde D_{C^\flat}^*$ can also be viewed as a perfectoid space over $\mathbb F_p((t^{1/p^\infty}))$, where $t$ is the coordinate of our unit disk. This field is a tilt of the cyclotomic field $\Q_p(\zeta_{p^\infty})$, so we can untilt $\widetilde D_{C^\flat}^*$ to a space over this last field. One can verify that this untilt is a space closely related to the adic Fargues-Fontaine curve. The remaining step comes from the fact that the Fargues-Fontaine curve is geometrically simply connected, meaning that its finite étale covers come only from the base extensions of the field, so the relevant étale fundamental group works out to be isomorphic to the absolute Galois group of $\Q_p(\zeta_{p^\infty})$. To get the absolute Galois group of $\Q_p$, we have to involve the Galois group $\mathrm{Gal}(\Q_p(\zeta_{p^\infty})/\Q_p)\cong\mathbb Z_p^\times$. The missing action of $p^{\mathbb Z}$ comes from the way the Fargues-Fontaine curve is constructed; more details are given in Weinstein's paper linked above (a more detailed sketch is given in the introduction). Combining all of these comparisons does give the desired equivalence between $\Q_p^\times$-equivariant covers of $\widetilde D_C^*$ and algebraic extensions of $\Q_p$.
To actually construct the quotient described above, we have to pass yet beyond the world of perfectoid spaces and into the world of diamonds as defined by Scholze, which are essentially a perfectoid analogue of algebraic spaces, described as sheaves on the (pro-)étale site. Relating the equivariant covers of $\widetilde D_C^*$ to the covers of the quotient takes a considerable amount of work, and working it out in detail takes up the bulk of Weinstein's paper.
The amount of work and nontrivial mathematics which comes into constructing this object and describing its properties is quite astonishing, and to me really highlights the subtlety of the situation. I have yet to really learn this mathematics, but this makes me even more excited to do so one day!
24/06/2021
Subtitle: When the position of the indices matters
This post is about a month overdue, so what I have learned actually happened quite a while back. This is more or less a follow-up to the previous post, and as (not) promised I will talk briefly about how the Hasse-Arf theorem lets one conclude the proof of the local Kronecker-Weber theorem, classifying the maximal abelian extension of a local field.
Rather than spell out the idea in a post here, I have decided to write it up into a note, in a way which hopefully motivates the way in which these considerations, like the upper numbering of ramification groups, arise. The note can be found here.
01/06/2021
Subtitle: Poor man's complex multiplication
It has been a while since I learned class field theory, and at the time I never really bothered to learn Lubin-Tate theory (in part because of a bad experience on my first attempt, having been drowned in technical details.) This term however I am attending Johannes Anschütz's course on Lubin-Tate spaces, and one of the first topics discussed was Lubin-Tate theory. Looking past some of the technicalities, I grew to believe it is a very pretty theory. Let me summarize some of its elements.
The goal of the theory is to give an explicit description of maximal abelian extension $K^{ab}$ of a local field $K$. The unramified part $K^{ur}$ of this extension is easy to describe, being generated by roots of unity of order coprime to the residue characteristic. The "totally ramified part" would be a totally ramified extension $K'$ satisfying $K^{ab}=K^{ur}K'$. It is actually not uniquely determined, and our constructions will depend on a uniformizer $\pi\in K$.
One motivation (an ahistorical one, see the discussion in Milne's CFT notes, end of section I.4) behind this construction is the theory of complex multiplication, which was greatly successful in providing a description of maximal abelian extensions of imaginary quadratic fields (or, more generally, CM fields) in terms of torsion points on elliptic curves (or abelian varieties) with complex multiplication by integers of that field. When dealing with local fields in place of global fields, we seek a "local" analogue of abelian varieties. These are provided by formal group laws.
A formal group law over the ring of integers $O_K$ of $K$ is a formal power series $F(X,Y)\in O_K[[X,Y]]$ which satisfies certain identities reminiscent of the identity and associativity laws for groups. If we denote by $m_K$ the maximal ideal in $O_K$, then $F$ defines an actual group law on $m_K$, and in fact even on $m_{\bar K}$, the maximal ideal in the ring of integers of an algebraic closure, which we will need later.
Now fix a uniformizer $\pi\in K$ and consider the polynomial $f(X)=X^q+\pi X$, where $q$ is the size of the residue field (we could pick $f$ differently for a cleaner theory, but it is not necessary). The first crucial result of the theory is the existence of a unique formal group law, the Lubin-Tate group law $F_f$, such that $f$ defines an endomorphism of this group law (that notion being defined in the obvious way). Furthermore, there is a natural way to associate an action of $O_K$ on this group law, giving an analogue of the complex multiplication. In particular the action of $\pi\in O_K$ is given by $f$ itself.
All of the above implies that $F_f$ defines an $O_K$-module structure on $m_{\bar K}$ (importantly, different from the obvious one.) What we are interested in is the torsion of this module: for $n\geq 1$ we define $\Lambda_{f,n}$ to be the $\pi^n$-torsion of this action and $K_{\pi,n}=K(\Lambda_{f,n})$. Explicitly, this field can be described as the splitting field of the $n$-fold iterate $f^n(X)=f(f(\dots(X)\dots))$. These fields have some quite marvelous properties which follow rather quickly from the module structure: $K_{\pi,n}$ is a totally ramified Galois extension and its Galois group preserves $\Lambda_{f,n}$. The latter turns out to be a cyclic module isomorphic to $O_K/\pi^n$, which provides an isomorphism $\mathrm{Gal}(K_{\pi,n}/K)\cong(O_K/\pi^n)^\times$. The compositum of all these fields, $K_\pi$, is a totally ramified abelian extension with Galois group isomorphic to $O_K^\times$.
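For $K=\mathbb Q_p$ and $\pi=p$, everything here can be made completely explicit if we use the (equally legitimate) choice $f(X)=(1+X)^p-1$: the Lubin-Tate group law is then the multiplicative one, $F_f(X,Y)=X+Y+XY$, the $O_K$-action is $[a](X)=(1+X)^a-1$, the torsion points are $\zeta_{p^n}-1$, and $K_{p,n}=\mathbb Q_p(\zeta_{p^n})$, recovering the cyclotomic theory. Here is a quick symbolic sanity check of the endomorphism property (a sketch using sympy):

```python
# Check that f(X) = (1+X)^p - 1 is an endomorphism of the multiplicative formal
# group law F(X, Y) = X + Y + XY, i.e. that f(F(X, Y)) = F(f(X), f(Y)).
from sympy import symbols, expand

X, Y = symbols('X Y')
p = 5

F = X + Y + X * Y
f = lambda T: (1 + T)**p - 1

lhs = expand(f(F))                        # f applied after the group law
rhs = expand(F.subs({X: f(X), Y: f(Y)}))  # the group law applied to f(X), f(Y)
assert lhs == rhs                         # both equal (1+X)^p (1+Y)^p - 1
print("f is an endomorphism of F")
```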
What, unfortunately, is not clear from these considerations is the fact that $K^{ab}=K^{ur}K_\pi$. This is true, but requires a more involved argument, which in Milne's notes and the course I'm taking is carried out via the Hasse-Arf theorem, which I might (but probably won't) cover in another post. This quest is, however, worth the effort - with the abelian extension described this explicitly, one can also very explicitly describe the (local) Artin reciprocity map, paving the way to explicit (local) class field theory, useful also in global class field theory.
29/04/2021
Subtitle: Perfect deperfections
Currently I am attending Kiran Kedlaya's course on prismatic cohomology. One of the topics which we discussed this week and which I found particularly pretty is that of perfect prisms, which I would like to explain here.
The basic objects involved are $\delta$-rings, which are simply (commutative, unital) rings $A$ equipped with an operation $\delta:A\to A$ which satisfy a certain pair of identities. These identities imply that a map $\phi:A\to A$ defined by $\phi(x)=x^p+p\delta(x)$ is a ring endomorphism, a lift of Frobenius from $A/p$ to $A$. A $\delta$-pair is a pair $(A,I)$ consisting of a $\delta$-ring and an ideal in this ring.
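To make this concrete: on $\mathbb Z$ there is a unique $\delta$-structure, $\delta(x)=(x-x^p)/p$, and the pair of identities in question reads $\delta(xy)=x^p\delta(y)+y^p\delta(x)+p\delta(x)\delta(y)$ and $\delta(x+y)=\delta(x)+\delta(y)+\frac{x^p+y^p-(x+y)^p}{p}$. A small numerical sketch (the choice $p=5$ is arbitrary; note that on $\mathbb Z$ the associated Frobenius lift $\phi$ is just the identity):

```python
# The unique delta-structure on Z: delta(x) = (x - x^p)/p (integral by Fermat's
# little theorem).  Verify the delta-ring identities on random inputs.
import random

p = 5
delta = lambda x: (x - x**p) // p

for _ in range(100):
    x, y = random.randint(-50, 50), random.randint(-50, 50)
    assert delta(x * y) == x**p * delta(y) + y**p * delta(x) + p * delta(x) * delta(y)
    assert delta(x + y) == delta(x) + delta(y) + (x**p + y**p - (x + y)**p) // p
    assert x**p + p * delta(x) == x   # phi(x) = x^p + p*delta(x): the identity on Z
print("delta-ring identities hold")
```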
A prism is a special kind of $\delta$-pair $(A,I)$ satisfying the following properties: $I$ is, locally on $\operatorname{Spec}A$, principal and generated by a non-zero-divisor; $A$ is (derived) $(p,I)$-complete; and $p\in I+\phi(I)A$. A prism is called perfect if additionally the morphism $\phi$ is bijective. Perfectness has somewhat surprising consequences for the structure of the prism, for instance it implies that $I$ is globally principal. In fact, we have a classification of such prisms: if $(A,I)$ is a perfect prism, then $A$ is isomorphic to the Witt vector ring $W(A/p)$, and $I$ is generated by a so-called distinguished element.
Perhaps the most surprising is the fact that perfect prisms form a category equivalent to that of (integral) perfectoid rings, as defined for instance in Bhatt-Morrow-Scholze. More specifically, if $(A,I)$ is a perfect prism, then $A/I$ (what Kedlaya calls the slice of the prism in his course) is perfectoid in the above sense. Conversely, if $R$ is perfectoid, we can consider its tilt $R^\flat$. The Witt ring $W(R^\flat)$ is equipped with a surjection $\theta:W(R^\flat)\to R$, and $(W(R^\flat),\ker\theta)$ is a perfect prism. This formalism has allowed Bhatt and Scholze to reprove some of the theorems about perfectoid rings.
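To demystify the appearance of $W(A/p)$ a little, here is a minimal sketch of $p$-typical Witt vectors of length $2$ (my own toy rendition, over $\mathbb Z$ for simplicity): the ring operations are the unique polynomial formulas making the ghost map $(a_0,a_1)\mapsto(a_0,\,a_0^p+pa_1)$ a ring homomorphism.

```python
# Length-2 p-typical Witt vectors over Z, with operations determined by the
# requirement that the ghost map (a0, a1) -> (a0, a0^p + p*a1) be a ring map.
import random

p = 3
ghost = lambda a: (a[0], a[0]**p + p * a[1])

def wadd(a, b):
    return (a[0] + b[0],
            a[1] + b[1] + (a[0]**p + b[0]**p - (a[0] + b[0])**p) // p)

def wmul(a, b):
    return (a[0] * b[0],
            a[0]**p * b[1] + b[0]**p * a[1] + p * a[1] * b[1])

for _ in range(100):
    a = (random.randint(-9, 9), random.randint(-9, 9))
    b = (random.randint(-9, 9), random.randint(-9, 9))
    # componentwise operations on ghost coordinates match the Witt operations
    assert ghost(wadd(a, b)) == tuple(u + v for u, v in zip(ghost(a), ghost(b)))
    assert ghost(wmul(a, b)) == tuple(u * v for u, v in zip(ghost(a), ghost(b)))
print("ghost map is a ring homomorphism")
```

Reducing the entries mod $p$ gives $W_2(\mathbb F_p)\cong\mathbb Z/p^2$, and iterating the construction to infinite length over $A/p$ produces the ring $W(A/p)$ appearing in the classification above.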
Taking that as a motivation for studying perfect prisms, we can then consider general prisms as some kind of "deperfection" of the notion of a perfectoid ring (an assertion for which, I admit, I have never seen a good explanation besides the above) which, from the categorical point of view, is much better behaved. This category can also be used to devise the prismatic site and prismatic cohomology, which I hope to learn more about over the next week.
18/04/2021
Subtitle: Arithmetic in unexpected places
I have recently been informed of the existence of this paper which resolves a question in the theory of hyperbolic 3-manifolds... under the assumption of Langlands reciprocity and the generalized Riemann hypothesis. Having learned about this I figured this is something I have to document here! This post is mostly based on a summary due to Matt Emerton, which I recommend if anyone is interested in more background or a more precise outline. Here I will just give a sketch of the result.
The question essentially asks the following: assume $M$ is a closed hyperbolic (= admitting a metric of constant curvature $-1$) 3-manifold. Can there exist finite covers $M'\to M$ of arbitrarily large injectivity radii (= radii of balls in tangent spaces on which the exponential maps are injective) which are all homology 3-spheres? This last condition is the crucial one: it means that $M'$ has the same homology groups as the 3-sphere, and boils down to vanishing of the first (co)homology group. Calegari and Dunfield have established that the answer to this question is yes. Let me describe their construction.
The construction shares many similarities with the construction of classical modular curves, but with some necessary adjustments. Firstly, in order to get a threefold one has to quotient not the usual upper half-plane in $\mathbb C$, but rather the 3-dimensional half-space $h^3$ in $\mathbb R^3$. By passing to universal covers, all hyperbolic threefolds arise as its quotients. There is a natural action of the group $\newcommand{\SL}{\mathrm{SL}}\SL(2,\mathbb C)$ on this half-space, though I will omit its description. This group has many discrete subgroups we could quotient by. The most obvious ones are congruence subgroups of $\SL(2,\mathbb Z[i])$, or ones coming from other number fields, but these turn out to be unsuitable since the quotients will not be compact. This is just like how classical modular curves are noncompact, at least until we add cusps.
The way around that, it turns out, is to replace the usual matrix algebras by quaternion algebras. Here specifically we take a certain quaternion algebra $D$ over $\mathbb Q(\sqrt{-2})$ and a maximal order $B$ (a subring behaving like the ring of integers of a number field). We have an isomorphism $D\otimes_{\mathbb Q(\sqrt{-2})}\mathbb C\cong M(2,\mathbb C)$. Identifying $B$ with a subring of $M(2,\mathbb C)$ and taking its intersection with $\SL(2,\mathbb C)$ gives rise to a group $\Gamma$. We can then define its congruence subgroups $\Gamma_n$ and, finally, the quotients $M_n=\Gamma_n\backslash h^3$. Choosing $\Gamma_n$ appropriately, these manifolds form a tower of finite covers of threefolds.
These threefolds are compact (thanks to the fact that the ring $B$ has no nontrivial unipotents) and it's not hard to see that the injectivity radii tend to infinity. The remaining part is to show they are homology spheres, and this is where we have to bring in the big guns. Assuming one of the $M_n$ was not a homology sphere, it would have a nontrivial first cohomology group, which by Hodge theory would be represented by a harmonic differential form. By some "general nonsense" one can additionally pick this form to be an eigenvector for all the Hecke operators, which together with harmonicity means it gives rise to an automorphic eigenform on $\Gamma\backslash\SL(2,\mathbb C)$. Under a conjectural version of Langlands reciprocity (which has been proven under some technical conditions, which however are not satisfied here), such an eigenform gives rise to a Galois representation satisfying conditions (most notably strong unramifiedness and irreducibility conditions) which turn out to be incompatible. This incompatibility requires at one point a calculation involving Odlyzko's bounds which rely on GRH, though the authors note this use is not really essential and GRH is avoidable.
So there you have it, a conjecture in geometry established using deep number-theoretic techniques. One could go as far as to say that this is the most ambitious crossover event in history. This goes to show that number theory is far from being the isolated field of mathematics it was once thought to be, and I'm excited to see more such connections in the future.
(As a final note, it would be unfair not to mention that (unfortunately?) Boston and Ellenberg have presented a method to prove that the manifolds above are homology spheres which avoids the use of Langlands conjecture (and GRH), instead employing the theory of pro-p groups.)
17/03/2021
Subtitle: Shouting at non-invariance so it goes away
For the past few weeks I have been reading some symplectic geometry in order to approach one of my miniprojects, which concerns the paper A symplectic look at the Fargues-Fontaine curve. One of the tools introduced there, which I have recently learned about, is known as loud Floer cochains. The whole background is difficult to describe concisely, but the idea is to develop a certain variant of Floer cohomology. It is a cohomology theory for pairs of Lagrangian submanifolds of a symplectic manifold, and one of its crucial properties is that it is invariant under Hamiltonian isotopies - in fact, this is what enables us to define it in general, instead of just in the case of transversal submanifolds; in particular when the two Lagrangians coincide, which is usually the case of main interest.
(There is a plethora of technical issues when developing this theory, and in fact it is not always well-defined. I am not going to get into all this, in part because I know little about it myself.)
Usually this cohomology theory is developed with coefficients in a ring known as the Novikov ring, which is essentially a ring of formal power series in which we admit arbitrary nonnegative real exponents. This way various infinite sums can be said to converge formally. One of the concerns of the above paper though is the investigation of what happens when we set the value of the Novikov parameter (the formal variable) to be 1 - as expected, issues arise from infinite sums, but those are circumvented by working with coefficients in a suitable non-Archimedean field (as well as working with an "F-field"). A particularly notable issue is that the invariance under Hamiltonian isotopies now fails - this is because certain sums encoding quasi-isomorphisms no longer converge.
This is where loud Floer cochains come in: it turns out that we can recover invariance under Hamiltonian isotopies by taking a Lagrangian and applying an isotopy to it for an infinitely long time. The nature of Hamiltonian isotopies makes the Lagrangian deform essentially by making it "twist" into waves of unbounded amplitude. This twisting makes all the annuli between the two Lagrangians break (as the increasing amplitude will introduce more and more intersections). Removing these annuli turns out to remove all the problematic terms which arise when trying to define cohomology invariant under the isotopies (at least in the case of a torus treated in the paper.) This is summarized by the following slogan in the paper:
By shouting infinitely loud, all of the annuli break, along with whatever problems they posed for noninvariance.
22/02/2021
Subtitle: Baby version of modularity
I have recently stumbled upon Barry Mazur's article Number Theory as Gadfly. While my interest in it came from just the technical appendix, I have decided to check out the entirety of the article, and from it I came to gain some new insight into the idea of modularity theorems/conjectures which I wish to share here. Specifically, I shall explain one version of "modularity" which gives a result holding for all algebraic curves defined over number fields. My understanding comes from several papers online, none of which explained the result in quite the way I do below, so there may be some inaccuracies.
The starting point is a theorem of Belyi: for any smooth algebraic curve $X$ defined over some number field there is a nonconstant morphism $X\to\mathbb P^1$ all of whose ramification points lie over $0,1,$ or $\infty$. Let $P'=\mathbb P^1\setminus\{0,1,\infty\}$ and let $X'\subseteq X$ be the complement of preimages of $0,1,\infty$. If we assume $X$ is projective, then the resulting map $X'\to P'$ is finite and unramified, and so if we look at its complex points, we get a finite covering map $\newcommand{\C}{\mathbb C}X'(\C)\to P'(\C)$.
Now, $P'(\C)$ is simply the twice-punctured complex plane, and it can be given an explicit universal cover in terms of modular functions: the function $f(z)=\frac{\eta(z/2)^8}{\eta(2z)^8}$ realizes the complex upper half-plane $H$ as the universal cover of $P'(\C)$ (according to this MO post) and its deck transformation group is (the image in $PSL_2(\mathbb Z)$ of) the congruence group $\Gamma(2)$. This covering map $H\to P'(\C)$ factors through any other covering map; in particular we get a covering $H\to X'(\C)$, which thus realizes $X'(\C)$ as a quotient of $H$ by a finite-index subgroup $\Gamma$ of $\Gamma(2)$. By taking a suitable compactification at cusps, we can realize the whole $X(\C)$ as such a quotient too. This is suspiciously close to modularity!
The difference between this result and actual modularity is that $\Gamma$ here can be essentially arbitrary, while in modularity we demand that $\Gamma$ be a congruence subgroup - it should contain the subgroup $\Gamma(N)$ for some $N\in\mathbb N$. This distinction may seem rather insignificant, especially in light of the fact that it being nontrivial is in a certain sense a coincidence - had we been working in $SL_n(\mathbb Z)$ for some $n>2$ instead, every finite-index subgroup would be a congruence subgroup. And yet for $n=2$ this difference exists, and modularity turns out to be unbelievably more subtle.
09/02/2021
Subtitle: Equidistribution and automorphy
The first thing I learned this week is that it is hard to keep a blog like that going while taking as many classes as I am taking. Either way, here goes something I have stumbled upon today.
In the past I have many times come across the Sato-Tate conjecture, which reads as follows: Let $E/\mathbb Q$ be an elliptic curve with no complex multiplication. For any prime $p$ of good reduction let $a_p=p+1-|E_p(\mathbb F_p)|$, where $E_p$ is the reduction of $E$. Then the values $\frac{a_p}{\sqrt{p}}$ follow a distribution which on the interval $[-2,2]$ is given by the density function $\frac{1}{2\pi}\sqrt{4-t^2}$. There is a different, "better" formulation given by writing $\frac{a_p}{\sqrt{p}}=2\cos\theta_p$, and then asserting that the $\theta_p$ are distributed in $[0,\pi]$ according to the density $\frac{2}{\pi}\sin^2\theta$.
I have always been mystified by this statement - it is far from clear why it's these distributions that the considered values should follow. I have never bothered to look into this topic further, knowing that this is some very deep mathematics we are talking about here. However this time I decided to give it a go and see what I can find.
Quickly I've stumbled upon some notes by Sutherland for the Arizona Winter School 2016. While they do not contain anywhere near a full proof, they provide some intuition, one which I do find satisfactory - the distributions above come from the distribution of traces in the group $SU(2)$. To see the relevance, we have to recall that the numbers $a_p$ are given by traces of the actions of the Frobenius elements on some Tate module (or cohomology if you wish). By the Riemann hypothesis for elliptic curves over finite fields, $a_p$ is the sum of two eigenvalues of this operator which, upon taking some embedding in $\mathbb C$, are complex conjugates of absolute value $\sqrt{p}$. This means that dividing the operator itself by $\sqrt{p}$, we get an element of $SU(2)$. This element depends on the many choices which are to be made, but its conjugacy class does not.
Finally, $SU(2)$ is a compact topological group, so it possesses a Haar measure. We can then ask whether the elements constructed above are equidistributed. More precisely, we have to consider the measure induced on the space of conjugacy classes. The latter can be parametrized by either the trace or the argument of one of the eigenvalues, and a short computation reveals the induced distributions are precisely the ones predicted by the Sato-Tate conjecture!
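This is also easy to test by simulation (a sketch; the Monte Carlo setup below is mine): Haar-random elements of $SU(2)$ correspond to uniformly random unit quaternions, the trace being twice the first coordinate, and the $2k$-th moments of the semicircle density $\frac{1}{2\pi}\sqrt{4-t^2}$ are the Catalan numbers.

```python
# Traces of Haar-random SU(2) elements follow the semicircle law on [-2, 2];
# the 2k-th moment of the trace should approach the k-th Catalan number.
import math
import random

def su2_trace():
    """Trace of a Haar-random SU(2) element: 2 times the first coordinate of a
    uniform point on the 3-sphere (obtained from a normalized Gaussian vector)."""
    q = [random.gauss(0, 1) for _ in range(4)]
    r = math.sqrt(sum(c * c for c in q))
    return 2 * q[0] / r

N = 200_000
traces = [su2_trace() for _ in range(N)]
for k in (1, 2, 3):
    moment = sum(t**(2 * k) for t in traces) / N
    catalan = math.comb(2 * k, k) // (k + 1)
    print(2 * k, round(moment, 2), catalan)   # empirical vs exact: 1, 2, 5
```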
This is just one part of the notes above. Another part that caught my eye there was an outline of Tate's argument justifying the conjecture (Sato arrived at it independently through numerical evidence). Like in probability theory, a distribution of the sort considered is determined by its moments (expectations/averages of the values of powers). The moments can be interpreted in terms of symmetric powers of representations, and Tate has explained how we can get the desired distribution if we knew the resulting L-functions to be holomorphic and nonvanishing in a suitable domain. Those facts later turned out to be provable for automorphic L-functions in the sense of Langlands. This way the Sato-Tate conjecture has been reduced to proving (potential) automorphy of these symmetric powers, a feat which was achieved in 2011.
I really need to learn more about these things some day...
25/01/2021
Subtitle: If you don't want to make a choice, make all of them.
Firstly, happy new year to everyone out there! Hope this year treats everyone better than the last one did. I took a break from documenting my progress here over Christmas, but that most certainly didn't stop me from doing maths, so you can take this week's post as a summary of the entire last month.
I have managed to (more or less) finish Le Stum's book which I have discussed in two of the previous posts, and with its end I have managed to clarify one of the things I had been wondering about for a while. In the week 1 post I have explained the several approaches to calculus on rigid spaces (it's essentially the same as for schemes), and asserted that they are all equivalent. At the time however the author almost immediately switched to working almost exclusively with the approach via integrable connections, and it was not clear what the point was of introducing the other viewpoints, especially the most abstract approach via crystals. This has been clarified in the last couple of chapters though.
Most of the considerations in the book were performed using frames: given a variety of interest $X$, we embed it as an open subset in a (usually proper) variety $Y$, which we then embed as a closed subscheme in a (usually smooth) formal scheme $P$, after which we proceed to perform reasoning on the associated rigid space $P_K$. The first definition of rigid cohomology is given in terms of such a frame $X\subset Y\subset P$, as well as a module with integrable connection on (a subset of) $P_K$.
However, in the end, it is just $X$ that we are interested in, and choices of $Y$ and $P$ in general will be arbitrary. The way to deal with this is rather intriguing: rather than choosing a single frame and integrable connection on it, we instead consider the collection of all possible frames, and we take a family of modules on all resulting $P_K$, which amounts to what came to be called an (iso)crystal on $X$. Invariance properties of rigid cohomology then guarantee we can compute the cohomology using the module on any one of those $P_K$.
Unfortunately, as Le Stum notes, this definition is not quite satisfactory, as even though we have made our "coefficient objects" depend only on $X$, defining the cohomology still depends on the choice of $P$ (albeit only up to a canonical isomorphism). A "better" definition would be one involving cohomology on a suitable site, similarly to how crystalline cohomology is defined. It is unclear to me at this point whether such a definition exists, but I would definitely be interested in finding out more about that in the future!
06/01/2021
Subtitle: Classifying untilts
This topic is distinct from the one discussed in the previous two weeks. Over the coming months I will be working on two miniprojects, the first of which is loosely tied to what I talked about previously, and the second to what I shall talk about today.
The setup consists of a single perfectoid field $C$ of characteristic $p$. We can then consider untilts of $C$ in characteristic $0$ - those are perfectoid fields $K$ equipped with an isomorphism between the tilt $K^\flat$ of $K$ and $C$. It turns out that the collection of isomorphism classes of those untilts can be given a natural geometric structure. More precisely, we can identify those classes with certain points in an adic spectrum $\mathrm{Spa}(A_{inf},A_{inf})$, where $A_{inf}$ is a certain "period ring" of Fontaine. Specifically, we consider the subspace $\mathcal Y$ obtained by removing a single point (corresponding to a characteristic $p$ untilt).
However, in a sense we have a bit of redundancy going on here. Indeed, since we consider untilts to consist not just of $K$ but also an isomorphism, there are many untilts with the same underlying field, and they all differ by an automorphism of $C$. I don't believe quotienting out by all automorphisms of $C$ gives a nice geometric space, but, thanks to the fact we are in the world of adic spaces, we may quotient out by the action of (powers of) the Frobenius automorphism $\varphi$. The corresponding action of $\varphi^{\mathbb Z}$ on $\mathcal Y$ is properly discontinuous, so the resulting quotient is itself an adic space - this is the adic Fargues-Fontaine curve $\mathcal X^{FF}=\mathcal Y/\varphi^{\mathbb Z}$.
It might be desirable to construct a "simpler" space which classifies untilts like that, for instance a scheme. The quotient construction above doesn't work - because of how few functions on schemes there are, the quotient is not really well-behaved, in fact I believe it is just a single point (similarly to trying to quotient $\mathbb A^1$ by an action of $\mathbb Z$ by translations - the quotient is a point essentially because $\mathbb A^1$ admits no nonconstant periodic functions). However, once we have constructed $\mathcal X^{FF}$, it turns out to be possible to "schemify" it, by taking sections of powers of a certain line bundle on it and forming a Proj scheme out of the resulting graded algebra. The resulting space is very "large" in the algebraic sense, but in many regards still behaves like a projective curve over a field, in fact more specifically like the projective line. More on that may come in a later post; for now I shall settle for just outlining this definition.
Some references:
Morrow's Bourbaki seminar notes, which explain in greater detail all of the above and discuss its relation with other topics, including Scholze's diamonds (which might appear in a post here one day!)
Anschütz's Lectures on the FF curve, a course which covers many of the topics in the theory with complete proofs.
Lurie's course on the FF curve, as above. Notably he discusses in what ways the FF curve is analogous to a quotient of a punctured unit disk by a certain action. Although Lurie does not discuss the adic side of things, comparing the two helped me gain the intuition for many of the definitions.
13/12/2020
Subtitle: Call for overconvergence
As far as I know the following is a phenomenon shared by just about every affinoid variety. For simplicity though I focus on the case of the closed unit disk.
Let $D$ be the closed disk, viewed as an affinoid subvariety of the affine line over some non-Archimedean field $K$. We are interested in the de Rham cohomology of this space. Drawing an analogy to the complex-analytic situation, where the closed disk is contractible, we would expect this cohomology to be trivial in positive degrees. However, as we will see in a second, this is not the case.
$D$ is one-dimensional, so the only nonzero differential forms on $D$ are the $0$-forms, which are represented by power series $\sum_{i=0}^\infty a_ix^i$ with $a_i\to 0$ (so that the power series converges on all of $D$), and the $1$-forms, which can be written as $\left(\sum_{i=0}^\infty a_ix^i\right)dx$, again with $a_i\to 0$. The differential is given by
$$d\left(\sum_{i=0}^\infty a_ix^i\right)=\left(\sum_{i=1}^\infty ia_ix^{i-1}\right)dx.$$
Finally, the only possibly nontrivial cohomology groups $H_{dR}^0(D),H_{dR}^1(D)$ are the kernel and the cokernel of this differential. The kernel is what we expect: $d$ kills precisely the constant functions, so $H_{dR}^0(D)\cong K$. For the cokernel, we would think that the usual termwise integration would do the trick: $\left(\sum_{i=0}^\infty a_ix^i\right)dx$ is the image of the power series
$$\sum_{i=0}^\infty\frac{a_i}{i+1}x^{i+1}$$
The issue is that the coefficients of this power series need not converge to zero, as in the non-Archimedean world reciprocals of integers are unbounded. Explicitly we can consider power series of the form
$$\sum_{i\in A} p^ix^{p^i-1}$$
for any infinite subset $A$ of $\mathbb N$. With those we deduce the cokernel $H_{dR}^1(D)$ is infinite-dimensional.
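One can watch the failure happen degree by degree: in the series above the coefficient $p^i$ of $x^{p^i-1}$ tends to $0$, but integration divides it by exactly $p^i$, leaving coefficients of absolute value $1$. A small sketch tracking the $p$-adic valuations:

```python
# In the series sum_i p^i x^(p^i - 1), the coefficients have valuation i -> infinity,
# but after termwise integration the coefficient of x^(p^i) is p^i/p^i = 1:
# the integrated series does not converge on the closed disk.
from fractions import Fraction

p = 2

def vp(q):
    """p-adic valuation of a nonzero rational q."""
    v, num, den = 0, q.numerator, q.denominator
    while num % p == 0:
        num //= p; v += 1
    while den % p == 0:
        den //= p; v -= 1
    return v

for i in range(1, 8):
    a = Fraction(p**i)          # coefficient of x^(p^i - 1); valuation i
    b = a / Fraction(p**i)      # coefficient of x^(p^i) after integration
    print(i, vp(a), vp(b))      # prints i, i, 0 - no decay survives integration
```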
This is a somewhat unsettling situation which deserves a salvage. One idea is to restrict one's attention to only the overconvergent sections of the sheaves of differential forms. In the simplest terms, those are the differential forms represented by power series which converge not just on $D$, but also on some open disk containing $D$. This way any overconvergent $1$-form is indeed a differential of some overconvergent $0$-form, which means that $H^1$ computed in this way gives the expected answer, which is zero.
Using overconvergent sections is certainly not the most obvious way to give a better behaved cohomology, but it turns out to be one which is incredibly fruitful to generalize.
The computation is taken from section 4.3.1 of Le Stum's Rigid Cohomology.
05/12/2020
Subtitle: The many face(t)s of crystals
I am currently reading Le Stum's book Rigid Cohomology and have just reached chapter 4, simply titled "Calculus". It discusses several different kinds of "differential" objects in algebraic geometry which can be considered over some fixed space $V$ (Le Stum focuses on rigid analytic spaces, but the theory is very similar over schemes; I also omit reference to the base scheme):
Crystals: in fancy terminology, they are certain "sheaves on the infinitesimal site of $V$". Such an object consists of a sheaf $\mathcal E_{V'}$ on every "infinitesimal thickening" $V'$ of a space $V_0'$ over $V$, together with isomorphisms $\mathcal E_{V''}\cong u^*\mathcal E_{V'}$ for morphisms $u:V''\to V'$, satisfying suitable cocycle conditions.
Stratifications: by embedding $V$ diagonally in $V\times V$, we have certain canonical infinitesimal neighbourhoods $V^{(n)}$ of $V$. Each of them can be viewed as a space over $V$ in two ways, via two projections $p_1^{(n)},p_2^{(n)}:V^{(n)}\hookrightarrow V\times V\to V$. This way given a sheaf $\mathcal E$ on $V$ we can consider its pullbacks to $V^{(n)}$ via those two projections, giving sheaves $p_1^{(n)*}\mathcal E,p_2^{(n)*}\mathcal E$. A stratification on $\mathcal E$ is then a choice of compatible isomorphisms $p_1^{(n)*}\mathcal E\to p_2^{(n)*}\mathcal E$, again satisfying suitable cocycle conditions.
D-modules: at least when $V$ is smooth, we can define on it the (noncommutative) algebra of differential operators $\mathcal D_V$. A D-module on $V$ is then a sheaf with a structure of $\mathcal D_V$-module.
Modules with (integrable) connections: let $\Omega_V^1$ be the sheaf of differentials on $V$. A connection on an $\mathcal O_V$-module $\mathcal E$ is a map $\nabla: \mathcal E\to\mathcal E\otimes\Omega_V^1$ which satisfies the Leibniz rule: $\nabla(fs)=f\nabla(s)+s\otimes df$. Such a connection can be extended to maps $\nabla_k: \mathcal E\otimes\Omega_V^k\to\mathcal E\otimes\Omega_V^{k+1}$ for all $k$, and the connection is called integrable (or flat) if $\nabla_{k+1}\circ\nabla_k=0$ for all $k$ (in fact, it suffices to assume this for $k=0$). A first hint of how these notions relate to one another is sketched right after this list.
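To give that first hint (a sketch, suppressing the sheaf-theoretic details; the precise statements are in the Berthelot-Ogus reference mentioned below): the first infinitesimal neighbourhood satisfies $\mathcal O_{V^{(1)}}\cong\mathcal O_V\oplus\Omega_V^1$, and the degree $1$ part $\epsilon_1:p_1^{(1)*}\mathcal E\to p_2^{(1)*}\mathcal E$ of a stratification reduces to the identity on $V$, so for a section $s$ of $\mathcal E$ the difference
$$\nabla(s)=\epsilon_1(p_1^{(1)*}s)-p_2^{(1)*}s$$
lies in the subsheaf $\mathcal E\otimes\Omega_V^1\subseteq p_2^{(1)*}\mathcal E$. One checks that the Leibniz rule for this $\nabla$ comes out of the fact that $p_1^{(1)*}f$ and $p_2^{(1)*}f$ differ exactly by $df$ inside $\mathcal O_{V^{(1)}}$, and that the cocycle conditions give integrability.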
Despite the rather different appearances, it turns out that all four notions are equivalent (in the sense of equivalences of suitable categories), at least under the assumption that $V$ is smooth of characteristic $0$ (in which case we can work with coordinates "etale locally"). Unfortunately, Le Stum's book doesn't discuss many details of those equivalences, choosing to focus on just the last of the four notions. Thankfully there is a detailed reference for this topic, namely section 2 of Berthelot-Ogus's Notes on Crystalline Cohomology. If you can look past the old-school typesetting, it is a book definitely worth checking out. Although for my present purposes section 2 was enough, the rest of the book develops the extension of this theory to the characteristic $p$ situation (using so-called divided power structures) in order to set up crystalline cohomology, something I would like to delve into in the future.
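As a toy illustration of one of these equivalences, in the simplest possible case (my own example, not taken from either reference): let $V=\mathbb A^1$ with coordinate $x$, so that $\mathcal D_V$ is generated by $x$ and $\partial=\frac{d}{dx}$ subject to the relation $[\partial,x]=1$. Given a connection $\nabla$ on $\mathcal E$, write $\nabla(s)=(\nabla_\partial s)\otimes dx$ and let $\partial$ act on $\mathcal E$ as $\nabla_\partial$. The Leibniz rule then translates precisely into the commutation relations of $\mathcal D_V$:
$$\nabla_\partial(fs)=f\,\nabla_\partial(s)+\frac{df}{dx}\,s\quad\longleftrightarrow\quad[\partial,f]=\frac{df}{dx}\ \text{in }\mathcal D_V,$$
so $\mathcal E$ becomes a $\mathcal D_V$-module, and one can run the dictionary backwards as well. In one variable there is no integrability condition to impose; in higher dimensions, integrability is exactly what makes the actions of the various $\partial_i$'s commute, as they do in $\mathcal D_V$.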
Some other references relating to this topic which I have looked at and want to share:
Some online notes on calculus on schemes, which nicely explain some of the notions above in the concrete case of an affine space,
Another set of online notes on calculus on schemes, treating the topic in a different way, which I have used mostly as a reference for the Gauss-Manin connection,
Kedlaya's notes on p-adic cohomology, which I also used as a reference for the Gauss-Manin connection (and as a refresher on spectral sequences).
28/11/2020