Galois Theory

Galois representations and elliptic curves

For a quick definition of many of the terms used here, you may refer to the Glossary.

External references for this section: [Gou]


Introduction and motivation

So far we have seen two essentially different ways to formulate the notion of "modularity" for an elliptic curve E over Q. The first is that E is modular if it has a parameterization by modular functions, i. e. a map X_0(N) -> E. The second is that E is modular if there is a modular form f of weight 2 for Γ_0(N) such that there is an equality of L-functions, L(E,s) = L(f,s). The Taniyama-Shimura conjecture says that both of these conditions are true for any E.

In practice, it has proven difficult to use either of these properties, either to develop a proof of the conjecture or to apply the conjecture to other problems (like Fermat's Last Theorem). The difficulty, perhaps, lies in the disparity between the essentially analytic nature of the properties and the algebraic nature of an elliptic curve and the kind of problems to which we want to apply the theory.

We seem to need some more algebraic formulation of what it means for an elliptic curve to be modular. It now appears that work during the last 10 or 15 years by Wiles and others (especially Ribet, Mazur, and Serre) has provided just what we need, in the form of a definition of modularity involving the theory of representations of a Galois group.

In order to approach this, we need to review several standard areas of abstract algebra.

Galois theory

Galois theory is essentially the "complete" theory of the roots of polynomial equations in one variable. That is, it presents as complete a picture as possible in the general case of the solutions of polynomial equations. The study of such equations is one of the oldest parts of mathematics, of course. The explicit formula for the solutions of a quadratic equation is taught in secondary schools, and sometimes the one for a cubic equation as well. There are also explicit formulas for quartic equations, but there is no general formula in radicals for quintics and equations of higher degree. Classic geometric problems like construction with ruler and compass of regular polygons and trisection of angles can be interpreted in terms of Galois theory (and thereby classified as solvable or not).

Galois theory is hardly a new part of mathematics. It is, like most things, the work of many people, but the most important ideas and results were conceived by Evariste Galois in 1832. In modern terminology, it is formulated using the concept of an algebraic structure called a field. A field is a set, which may be finite or infinite, that has two distinct but closely related commutative group structures on it. The most common examples are the rational numbers, Q, the real numbers, R, and the complex numbers C. Each of these has group structures corresponding to the operations of addition and multiplication (0 is left out of the multiplicative group, since it has no inverse). The two operations are related in that multiplication is required to be "distributive" with respect to addition, i. e. a(b + c) = ab + ac. Everything else follows from the group axioms and the distributive rule.

In Galois theory, the primary object of interest is the polynomial equation in one variable,

a_n x^n + a_(n-1) x^(n-1) + ... + a_1 x + a_0 = 0,

where the coefficients {a_i} are all in some specific "base" field. The goal of the theory is to say as much as possible about the roots of such equations, that is, values of x for which the equation is true. In general, the roots of the equation will not be members of the same base field as the coefficients. One may think of the roots simply as abstract objects which can be "adjoined" to the base field to provide solutions of the equation.

We can introduce symbols for roots of certain equations, e. g. √2 (a root of x^2 - 2 = 0) and i (a root of x^2 + 1 = 0), and then express other roots in terms of those symbols. It turns out that when one adds such symbols to a field (i. e. "adjoins" them) and uses the equation they satisfy as an additional axiom, then the enlarged set also satisfies all the axioms for a field, and it is called an extension field.

Looking again at any polynomial equation, one finds that it can have at most n roots in any extension field, where n is the degree of the polynomial. It may have fewer distinct roots if some are repeated: x^2 + 2x + 1 = (x+1)^2 = 0 has just a single root (x = -1).

Galois' brilliant insight was that one can know essentially "everything" there is to know about the roots of polynomial equations by considering a new object, a group, namely the group of all "reasonable" permutations of those roots. Here, "reasonable" is not a technical term, but can be explained as follows.

A permutation of the roots determines an "automorphism" of the extension field that contains those roots, that is, a map T of the extension field to itself which preserves the field structure. In particular, T(ab) = T(a)T(b) and T(a + b) = T(a) + T(b). Furthermore, if a number c is in the base field (the field of the coefficients of the polynomial), then we require T(c) = c, i. e. T leaves the base field fixed.

Given this, not all permutations of the roots of a polynomial may be reasonable, because they don't induce an automorphism of the extension field which leaves the base field fixed. This may happen if there are polynomial relationships among the roots with coefficients in the base field.

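For instance, in the polynomial

f(x) = (x-i)(x+i)(x-2i)(x+2i) = x^4 + 5x^2 + 4

the roots are x_1 = i, x_2 = -i, x_3 = 2i, x_4 = -2i. We have the relations x_3 = 2x_1 and x_4 = 2x_2. The only non-trivial permutation we can allow is the one that exchanges x_1 and x_2 and, at the same time, x_3 and x_4 (it is induced by complex conjugation); the relations force the two exchanges to go together. We certainly can't allow a permutation that exchanges x_1 and x_3, because the resulting field automorphism T would require T(x_1) = x_3 = 2x_1 = 2T(x_3) = T(2x_3) = T(4x_1) = 4T(x_1). That forces T(x_1) = 0, whereas T(x_1) = x_3 is not 0.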
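The example can be checked by brute force. The following is a minimal sketch in Python, not part of the original exposition: it lists all 24 permutations of the four roots and keeps those that respect the two relations x_3 = 2x_1 and x_4 = 2x_2. The names ROOTS, respects_relations and reasonable_permutations are hypothetical, and each root is stored as the integer k for which the root equals k*i, so that the arithmetic is exact.

```python
from itertools import permutations

# Roots of f(x) = (x - i)(x + i)(x - 2i)(x + 2i), stored as the integer k
# such that the root equals k*i (an encoding chosen only for this sketch).
ROOTS = {"x1": 1, "x2": -1, "x3": 2, "x4": -2}
NAMES = list(ROOTS)

def respects_relations(image):
    """image maps each root name to the name of the root it is sent to."""
    def val(name):
        return ROOTS[image[name]]
    # The relations from the text: x3 = 2*x1 and x4 = 2*x2.
    return val("x3") == 2 * val("x1") and val("x4") == 2 * val("x2")

def reasonable_permutations():
    """All permutations of the roots that preserve both relations."""
    result = []
    for perm in permutations(NAMES):
        image = dict(zip(NAMES, perm))
        if respects_relations(image):
            result.append(image)
    return result

for image in reasonable_permutations():
    print(image)
```

Only two permutations survive: the identity, and the one that exchanges x_1 with x_2 and x_3 with x_4 together. This agrees with the fact that all four roots lie in the field obtained by adjoining i to Q, whose only automorphisms are the identity and complex conjugation.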

The set of "reasonable" permutations thus generates a set of automorphisms of the extension field that leaves the base field fixed. This set of automorphisms is actually a group, and it is called the Galois group of the extension. The Galois group is a way of encoding all available information about the relationships of the roots of polynomials with coefficients in the base field that factor completely in the extension field. So in order to study all roots of a given polynomial, it is sufficient to find an extension field that contains all of the roots and examine the Galois group.

Notice that we have managed to express one kind of mathematical problem - description of the roots of a polynomial equation - in terms of a symmetry group, where the symmetry in question involves permutations among the roots. Here again, symmetry operations can be used to express a concept of "similarity" or "likeness". In this case, certain roots of an equation are "like" others because they satisfy the same algebraic relations, even though they are not numerically the same. But for all algebraic purposes they are interchangeable.

For future reference, we will simply state the fundamental facts of Galois theory. We say that a (finite) field extension E ⊇ F is Galois if E is the field obtained by adjoining to F all roots of some irreducible polynomial with coefficients in F (one without repeated roots, which is automatic when F contains Q). The Galois group of E over F, Gal(E/F), is the group of automorphisms of E that leave F fixed (i. e., that map all elements of F to themselves). The fundamental theorem says that there is a 1:1 correspondence between intermediate fields E' such that E ⊇ E' ⊇ F, and subgroups H of Gal(E/F), where E' is the field left fixed by H. Further, H is a normal subgroup of Gal(E/F) if and only if the corresponding extension E' is Galois over F, in which case Gal(E'/F) is isomorphic to the quotient group Gal(E/F)/H.

Group representations

We have had ample evidence that groups are very useful mathematical objects. We have seen them used to describe such diverse phenomena as geometric transformations of a shape and the roots of an algebraic equation - to say nothing of their many applications outside of pure mathematics itself.

But there is one problem in working with abstract groups, in that it is often not easy to do computations with them. Group elements hardly ever involve ordinary numbers such as people, or computers, are accustomed to computing with. They are abstract objects like geometric transformations or permutations of a set. Sometimes they are just symbols related by certain equations.

However, it was discovered long ago that it is always possible to "represent" an abstract group in terms of objects that can easily be computed with, namely matrices over a field such as R or C. In fact, it can be done with matrices whose entries are members of a ring, such as the integers Z, rather than a field. (A ring is like a field, except that nonzero elements don't necessarily have multiplicative inverses.)

The construction is very straightforward. Given an arbitrary group G and a ring R, one first constructs the "group ring" R(G) which consists of finite formal "sums" of "products" of elements of R and elements of G, e. g.

r_1 g_1 + r_2 g_2 + ... + r_k g_k, with each r_i in R and each g_i in G.

The sums here are in a formal sense, unrelated to the addition operation in the ring. Addition of such sums is done in the "obvious" way, and multiplication is done using the addition and multiplication laws of the ring and the group law. For instance,

(r_1 g_1 + r_2 g_2)(s_1 h_1 + s_2 h_2) = r_1 s_1 g_1 h_1 + r_1 s_2 g_1 h_2 + r_2 s_1 g_2 h_1 + r_2 s_2 g_2 h_2.

Note that multiplication doesn't have to be commutative in either the ring or the group, though most often the ring is commutative.
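To make the multiplication rule concrete, here is a small sketch in Python. It is an illustration under assumptions of its own: a group-ring element is stored as a dictionary from group elements to integer coefficients (so R is Z), and the example group is the cyclic group of order 3 written additively as {0, 1, 2}. The function names are hypothetical.

```python
from collections import defaultdict

def group_ring_multiply(a, b, group_op):
    """Multiply two group-ring elements given as {group_element: coefficient}.

    Each pair of terms r*g and s*h contributes (r*s)*(g*h), exactly as in
    the expansion of a product of two formal sums.
    """
    product = defaultdict(int)
    for g, r in a.items():
        for h, s in b.items():
            product[group_op(g, h)] += r * s
    # Drop terms whose coefficients cancelled to zero.
    return {g: c for g, c in product.items() if c != 0}

def c3_op(g, h):
    """Group law of the cyclic group of order 3, elements 0, 1, 2."""
    return (g + h) % 3

# Write e for the identity (element 0) and g for a generator (element 1).
a = {0: 1, 1: 2}     # e + 2g
b = {1: 3, 2: -1}    # 3g - g^2
result = group_ring_multiply(a, b, c3_op)
print(result)        # the coefficients of -2e + 3g + 5g^2
```

Expanding by hand gives (e + 2g)(3g - g^2) = 3g - g^2 + 6g^2 - 2g^3 = -2e + 3g + 5g^2, since g^3 = e.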

This group ring is an "R-module" because it also allows for multiplication by elements of R in the obvious way. In case R is a field, it is simply a "vector space" over R, and all the usual concepts of linear algebra apply. In particular, linear maps from the group ring to itself, called endomorphisms, can be expressed as matrices with entries lying in R.

Now we just have to observe that every element g of G induces a natural endomorphism of R(G) given by multiplication (on the right, in the noncommutative case). Specifically, if g ∈ G we define

φ_g(x) = xg for all x ∈ R(G).

Endomorphisms have a natural operation, namely composition, where (φ_h φ_g)(x) = φ_h(φ_g(x)) = φ_h(xg) = xgh = φ_gh(x) if x ∈ R(G). Further, since G is a group, φ_g has an inverse, namely φ_(g^-1). In particular, any φ_g must be 1-to-1 and onto, and in that case it is called an automorphism. The relationship φ_h φ_g = φ_gh says that the group law for endomorphisms is the "same" as that of G itself (up to the order of the factors, which is reversed because we multiply on the right; multiplying by g on the left instead removes the reversal), in the sense that the map g -> φ_g is a group "homomorphism", in other words, a map that preserves the group structure.

Any homomorphism from G to a group of automorphisms of an R-module is called a linear group representation, or simply a representation. When the R-module is finitely generated (as the group ring is when G is finite) and R is a field, then we just have a finite dimensional vector space V, and we choose a basis. Given a basis, we can then express any endomorphism of V as a matrix. Hence a group representation gives us a homomorphism from G to a group of matrices. If the homomorphism is "injective" (1 to 1), then we can treat G as a subgroup of a matrix group. This is the case, in particular, for the representation using the group ring. (If multiplication by g is the identity automorphism, g must be the identity of the group.) So computations become straightforward (though possibly tedious), for either humans or computers, since they are just matrix operations.
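As an illustration, the sketch below builds the matrices of this representation for the symmetric group S_3 and checks by brute force that they multiply the way the group elements do, and that distinct elements get distinct matrices. It is written under assumptions of its own: group elements are stored as tuples, the basis of the group ring is the list of group elements, and multiplication is taken on the left (x -> g*x) so that the matrix of a product is the product of the matrices in the same order. All names are hypothetical.

```python
from itertools import permutations

ELEMENTS = list(permutations(range(3)))          # the 6 elements of S_3
INDEX = {g: i for i, g in enumerate(ELEMENTS)}   # position of each basis element

def compose(p, q):
    """Group law: (p*q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(3))

def regular_matrix(g):
    """6-by-6 permutation matrix of the map x -> g*x on the basis ELEMENTS."""
    n = len(ELEMENTS)
    m = [[0] * n for _ in range(n)]
    for j, x in enumerate(ELEMENTS):
        m[INDEX[compose(g, x)]][j] = 1
    return m

def matmul(a, b):
    """Plain matrix product of two square integer matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_homomorphism():
    """Check M(g) * M(h) == M(g*h) for every pair of group elements."""
    return all(
        matmul(regular_matrix(g), regular_matrix(h)) == regular_matrix(compose(g, h))
        for g in ELEMENTS for h in ELEMENTS
    )

def is_injective():
    """Check that distinct group elements give distinct matrices."""
    mats = {tuple(map(tuple, regular_matrix(g))) for g in ELEMENTS}
    return len(mats) == len(ELEMENTS)

print(is_homomorphism(), is_injective())
```

Once the matrices are in hand, any computation in S_3 can be carried out with ordinary matrix arithmetic, which is the point of the construction.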

p-adic numbers

At this point we need to make just one more detour. In number theory it happens that it is often easier to approach problems in terms of one prime number at a time. For example, this is why we considered the points of an elliptic curve over a field F_p rather than over Q. This is such a common occurrence that we need more powerful tools to work with. One of the most useful of such tools is the "p-adic numbers".

So pick some prime p. Whenever n ≥ m, there is a natural map of rings Z/p^nZ -> Z/p^mZ given by reduction modulo p^m. Using an abstract algebraic construction called the projective limit, one can define a "universal" object which is a ring that in some sense encompasses all the Z/p^nZ simultaneously. This object is called the ring of "p-adic integers" and denoted by Z_p. In particular, the ordinary integers Z can be viewed as a subring of Z_p, since any two distinct integers have distinct images in Z/p^nZ for all n sufficiently large. The quotient field of Z_p is denoted by Q_p, and Q may be viewed as a subfield of Q_p. Elements of Q_p are called p-adic numbers.

There are various other ways of defining Z_p. In particular, a topology can be defined on Z in which "closeness" of two elements n_1 and n_2 is measured by the power of p that divides n_1 - n_2. (The points are closer together the larger that power of p is.) This actually yields a metric space, and Z_p is just the completion of Z in this topology.
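A short sketch in Python may help fix the idea of p-adic closeness. It assumes the usual normalization, which the text above does not spell out: if p^k is the exact power of p dividing the difference of two integers, their distance is taken to be p^(-k). The function names are hypothetical.

```python
from fractions import Fraction

def p_adic_valuation(n, p):
    """Largest k such that p**k divides the nonzero integer n."""
    if n == 0:
        raise ValueError("0 is divisible by every power of p")
    n = abs(n)
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k

def p_adic_distance(a, b, p):
    """Distance p**(-k), where p**k exactly divides a - b (assumed normalization)."""
    if a == b:
        return Fraction(0)
    return Fraction(1, p ** p_adic_valuation(a - b, p))

print(p_adic_distance(1, 26, 5))   # 26 - 1 = 25 = 5^2, so the distance is 1/25
print(p_adic_distance(3, 4, 5))    # 4 - 3 = 1 is not divisible by 5: distance 1
```

With p = 5, the integers 1 and 26 are therefore much closer together than 3 and 4 are.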

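Z_p can also be defined in terms of formal power series

a_0 + a_1 p + a_2 p^2 + a_3 p^3 + ...

where 0 ≤ a_i < p. Such series actually converge in the p-adic topology. Any non-negative integer can be uniquely represented as a polynomial in powers of p with non-negative integral coefficients < p, so again Z is naturally included in Z_p (a negative integer corresponds to an infinite series; for example, -1 = (p-1) + (p-1)p + (p-1)p^2 + ...).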
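The coefficients of such a series can be computed one at a time by working modulo increasing powers of p. The sketch below, with a hypothetical function name, returns the first n coefficients a_0, ..., a_(n-1) for an ordinary integer, which is the same as writing its residue modulo p^n in base p.

```python
def p_adic_digits(x, p, n):
    """First n coefficients a_0, ..., a_(n-1) of the integer x in powers of p.

    Only the residue of x modulo p**n matters, so negative integers are
    handled too: Python's % always returns a value in the range [0, p**n).
    """
    x %= p ** n
    digits = []
    for _ in range(n):
        digits.append(x % p)
        x //= p
    return digits

print(p_adic_digits(23, 5, 4))   # 23 = 3 + 4*5, so [3, 4, 0, 0]
print(p_adic_digits(-1, 5, 4))   # [4, 4, 4, 4]
```

For a non-negative integer the coefficients are eventually all 0. For -1 every coefficient is p - 1, however many terms are requested, because (p-1)(1 + p + ... + p^(n-1)) = p^n - 1, which is congruent to -1 modulo p^n.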


Use of p-adic numbers is ubiquitous in modern number theory. There is a well-developed theory of p-adic analysis which is analogous to classical analysis on the topologically complete field C. (But note that Q_p isn't algebraically closed like C is.) There are even p-adic analogues of zeta and L-functions.

Galois representations and elliptic curves

We have apparently strayed quite far from the topic of elliptic curves, to say nothing of Fermat's Last Theorem. What's the connection? It is that given an elliptic curve, we can define, in a fairly straightforward way, a family of representations, one for all but a finite number of primes, of an important "universal" Galois group on a group of 2-by-2 matrices over the p-adic numbers Q_p.

Although linear group representations can be constructed by means of the group ring, as above, they can also arise naturally in many other ways. They need not be injective, either. In the abstract, a representation of a group G is just a group homomorphism ρ: G -> GL_n(R), for some n > 0 and some ring R. (GL_n(R) is the group of invertible n-by-n matrices with entries in R.)

The group representation we are about to look at encodes a lot of information about the particular elliptic curve E on which it is based. In this case, its purpose is as a tool for studying E rather than for the information it carries about the group.

The group in question is the Galois group of an infinite field extension. We start with the rational numbers Q. There is a field, called the algebraic closure of Q and denoted Q̄, which is the smallest subfield of C that contains all finite extensions of Q. Essentially, Q̄ is the field generated by all algebraic numbers, i. e. all roots of polynomial equations whose coefficients lie in a finite extension of Q. It is a much larger field than Q, though it is a small subfield of C. With a little bit of work, one can define the Galois group of the extension Q̄/Q. Needless to say, it is not a finite group. From now on, G will denote this group.

Let E be a particular elliptic curve. Recall from the introductory material on elliptic curves that there are groups E[m] of "m-division points" of E, i. e. the subgroup of points of E that have orders dividing m. Furthermore, E[m] is isomorphic to (Z/mZ)^2. The coordinates of points in E[m] are actually algebraic numbers, so if g ∈ G, g "acts" on points of E[m]. A priori the result of this action is just another point on the elliptic curve, but it isn't hard to show it is actually in E[m] too. In fact, this action of G respects the group structure of E[m]. Hence g corresponds to an (invertible) endomorphism of the Z/mZ-module (Z/mZ)^2. It's not hard to check this means we have a representation of G in GL_2(Z/mZ), one for each m.

If p is any prime, an important special case is m = p^n for any positive integer n. We get representations of G in GL_2(Z/p^nZ) for all n. Using the same kind of abstract algebra as was used to construct Z_p itself, we can piece together the representations of G for each n, and the result is a single representation of G in GL_2(Z_p), and hence in GL_2(Q_p). This representation incorporates a great deal of information about the elliptic curve E.

What kind of information in particular? Let N be the conductor of E. Consider first the representation ρ(E,m): G -> GL_2(Z/mZ) for arbitrary m. Then for any prime number q that is prime to mN there is a simple expression for the value of a_q = q + 1 - #(E(Z/qZ)), the qth coefficient of the Dirichlet series of the L-function L(E,s), modulo m. The p-adic representation ρ(E,p): G -> GL_2(Q_p) is even better. For any prime q that doesn't divide pN, we get a congruence for a_q modulo p^n for any n. Since a_q is actually an integer, that is enough to determine it exactly. The best part of this is that from just this one representation at a single prime, we can recover the actual value of a_q for almost all q.
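The numbers a_q themselves are easy to compute directly for small q, simply by counting points. Here is a minimal sketch in Python, assuming the curve is given in the form y^2 = x^3 + ax + b and that q is an odd prime at which the curve has good reduction. The function names and the choice of the curve y^2 = x^3 - x (whose only bad prime is 2) are illustrative.

```python
def count_points(a, b, q):
    """Number of points on y^2 = x^3 + a*x + b over Z/qZ, including the point at infinity.

    Brute force, intended only for small odd primes q of good reduction.
    """
    # For each residue s, record how many y in Z/qZ satisfy y^2 = s.
    square_counts = {}
    for y in range(q):
        s = y * y % q
        square_counts[s] = square_counts.get(s, 0) + 1
    affine = sum(square_counts.get((x ** 3 + a * x + b) % q, 0) for x in range(q))
    return affine + 1  # add the point at infinity

def a_q(a, b, q):
    """The coefficient a_q = q + 1 - #(E(Z/qZ))."""
    return q + 1 - count_points(a, b, q)

# Example: E given by y^2 = x^3 - x, i.e. a = -1, b = 0.
for q in (3, 5, 7):
    print(q, a_q(-1, 0, q))
```

For this curve the loop prints a_3 = 0, a_5 = -2 and a_7 = 0. For example, over Z/5Z there are 7 affine points and one point at infinity, so a_5 = 5 + 1 - 8 = -2.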

In more detail, here's how this works for a given m. The kernel of the representation ρ(E,m) (all elements that map to the identity) is an infinite subgroup of finite index in G, so by Galois theory there is a finite extension K_m of Q in Q̄, and G modulo the kernel is isomorphic to the (finite) Galois group G_m = Gal(K_m/Q) of K_m over Q as well as to a subgroup of GL_2(Z/mZ). (K_m is just the field generated over Q by adjoining the coordinates of all points in E[m].) So we can regard G_m as a subgroup of GL_2(Z/mZ). In fact, Serre has shown that (except for the rare case where E has the property known as "complex multiplication") G_p = GL_2(Z/pZ) for almost all p.

In algebraic number theory (specifically, in "class field theory"), a special element of G_m can be identified for each prime p that does not divide m or the conductor of E, called the "Frobenius automorphism" at p. It is actually well-defined only as a member of a certain (conjugacy) class. However, viewing it now as an element of GL_2(Z/mZ), its trace is well-defined (conjugate matrices have the same trace) and, miraculously, this value is none other (modulo m) than the coefficient a_p = p + 1 - #(E(Z/pZ)) that occurs in the L-function of E.
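One case of this statement is simple enough to check by machine. Take the curve y^2 = x^3 - x and m = 2 (an example chosen for this sketch, not taken from the text). Its points of order 2 are (0,0), (1,0) and (-1,0), all with rational coordinates, so G acts trivially on E[2], and every Frobenius automorphism becomes the identity matrix in GL_2(Z/2Z). The trace of the identity is 2, which is 0 modulo 2, so a_p should be even for every odd prime p (the only bad prime of this curve is 2). The function names below are hypothetical.

```python
def a_p(p):
    """a_p = p + 1 - #(E(Z/pZ)) for E: y^2 = x^3 - x, by brute-force point counting."""
    affine = sum(
        1
        for x in range(p)
        for y in range(p)
        if (y * y - (x ** 3 - x)) % p == 0
    )
    return p + 1 - (affine + 1)   # the extra 1 is the point at infinity

def trace_check(primes):
    """True if a_p is congruent to 0 = trace(identity) modulo 2 for every given prime."""
    return all(a_p(p) % 2 == 0 for p in primes)

print(trace_check([3, 5, 7, 11, 13, 17, 19]))
```

This is only the most degenerate instance of the congruence, since the mod 2 representation of this particular curve is trivial, but it shows the shape of the statement: a fact about matrices in GL_2(Z/mZ) turns into a congruence for point counts modulo m.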

Galois representations and modular forms

Following the line of thinking we pursued with L-functions, it seems that there ought to be some way to define a Galois representation corresponding to a modular form f. We would expect, further, that whenever f is related to an elliptic curve E (either because it has the same L-function or because it arises from a covering X_0(N) -> E), the Galois representation corresponding to f should be the same as the one corresponding to E. Also, this notion of modularity should be equivalent to the others. And, lastly, every elliptic curve should be modular in this sense, if the Taniyama-Shimura conjecture is true.

Unfortunately, this part of the theory seems to involve the largest number of technicalities, so that it is especially difficult to explain the constructions and the reasoning involved. On the other hand, this is the form of the theory where it has actually been possible to prove every elliptic curve is modular (in the semistable case) and apply the result to problems like Fermat's Last Theorem.

So we're going to cop out now on explaining the gory details, and instead move on to getting an overview of the theory in action, as applied to FLT.