842: Applied Algebra [Algebraic Statistics]
Instructor: Rodriguez ; Time: MWF 8:50-9:40 ; Location: Van Vleck B309; Class #: 80539 or 80670 ; Office Hours: After class for ~40min;
Instructor: Rodriguez ; Time: MWF 8:50-9:40 ; Location: Van Vleck B309; Class #: 80539 or 80670 ; Office Hours: After class for ~40min;
This course will cover several topics in statistics and data science from the perspective of applied algebra. The course is designed to be inclusive by introducing a new topic from statistics paired with a new technique from applied algebra. This way, those who have a background from either will be well prepared. Examples of topics include maximum likelihood estimation, the method of moments, graphical models, and more.
Suggested references and a sample of topics
Many of the lectures will be based on material from Seth Sullivant's textbook Algebraic Statistics
Coming from statistics, this is a method of estimation of population parameters. We will use this example to motivate some techniques from computational algebraic geometry.
(Chapter 7) A method of estimating the parameters of an assumed probability distribution, given some observed data.
(Chapter 13 and 14) An important family of statistial models that are used to build complicated interaction structures between many random variables.
Monodromy is a technique in numerical nonlinear algebra. We see how it can be applied to problems in likelihood geometry.
The award winning textbook Ideals, Varieties, and Algorithms by Cox, Little, and O'Shea is a great introduction to algebraic geometry
Chapter 4: This will be paired with the algebraic geometry topic of irreducible decomposition and primary decomposition.
After introducing discrete exponential families and toric varieties in Chapter 5, we discuss this Chapter 9 material.
A list of project topics and potential start points is found here.
This will be continuously updated.
The Applied Algebra Seminar usually meets Thursday's at 11am over Zoom.
The long program Algebraic statistics and our changing world is happening at IMSI (Chicago) Fall 2023.
AlToGeLiS usually meets Thursday's at 10am Central over Zoom.
MRC Program (Summer 2022): Data Science at the Crossroads of Analysis, Geometry, and Topology
Algebraic Statistics Hawaii Conference
Student Lectures and Projects (PDFs and Slides link)
5/6: K.S.
5/4: Y.S. +J.X.
5/2: M.B. + D.M.
4/29: C.P + H.C.
4/27:O.G. + Zeyu W.
4/22: S.Y.
4/11: I.S.
4/25: We discussed identifiability and some of the history of the method of moments. I suggest checking out this public lecture by Anna Seigal.
4/22: S.Y. + more stat's.
4/20: u-generation finishing up and then EM Algorithm in Ch 14.3.
4/18: Computational Algebra Day -- u-generation intro.
4/15: (Good Friday) Open Problem day. Using 4x4 matrices as our start point, we went in three different directions for open problems.
4/13: Talked a bit about Chapter 9. (More will be seen in student presentations)
4/11: I.S. gave a presentation on Binomial Edge Ideal and Conditional Independence.
4/8: We reviewed convex polytopes (Ch 8.1) and mixed volumes to count solutions to polynomial systems using polyhedral geometry. See Chapter 3
4/6: We discussed the likelihood ratio test and computed tangent cones and approximated the distribution of the likelihood ratio test statistic.
3/30: We discussed MLE for Gaussian models and how the ED degree arises in a special case.
We only discussed ED degrees for simple examples, but there is a deep and rich theory with lots of ongoing work
A fundamental example comes from low rank approximation and leads into low rank approximation of tensors.
In the tidbit we reviewed tangent cones and the algorithm in the textbook that says how to compute tangent cones using Grobner basis.
3/28: We discussed conditional independence models in regards to Gaussian models.
3/25: Connected traditional presentations of hypothesis testing to what was discussed in the previous class. Reference: Chapter 10 of Mathematical statistics with applications (pdf coming soon).
3/23: Hypothesis testing introduction and tangent cones from Chapter 6.
3/21: Monodromy for solving systems.
Reference: Trace Test and decomposable sparse systems
Software guide to checkout
3/11: We will have a software day. Bring laptops or a friend with a laptop.
Installing M2: http://www2.macaulay2.com/Macaulay2/Downloads/
It's pretty straightforward for a Mac.
Exercise: Pick an A-matrix and find the generators of the toric ideal given by A.
Using the generators of your toric ideal, find the likelihood equations for generic data, e.g., (2,3,5,7,11,..).
Hint: needsPackage"Binomials"
help "matrix"
help "kernel(Matrix)"
help "generators(Module)"
"help "latticeBasisIdeal"
help "saturate"
What is the degree of your likelihood equations?
Try it for a different data vector and compare with a friend.
Exercise: Try out the exercise from 2/9. What can you say about mixtures? See the reference on 2/7.
Exercise: Try this out.
needsPackage"NumericalAlgebraicGeometry"
help "NumericalAlgebraicGeometry"
help "solveSystem"
help "irreducibleDecomposition"
Pick your favorite function f and let ell be a linear function with generic coefficients.
How many critical points does the function f+ell have?
Choose different coefficients for ell to find the most real solutions.
3/9: We went over Prop 6.2.4 and gave some background to why it is useful.
Further details can be found Chapter 12 of Sturmfels' GB and Convex polytopes book (UW Madison Login needed)
3/7: We reviewed Likelihood Geometry, namely the applications of the algebra ideas. We also covered Chapter 6.2 Discrete Regular Exponential Families.
Suggested exercise: Work through Example 6.2.7 [AS] and run 4ti2.
Compute the ML degree of the model in Example 6.2.5 [AS].
[AS] Exercise 6.2.
[AS] Exercise 6.3.
Bonus: For those wanting to dive further into Toric Geometry checkout these recent lecture notes on the subject https://arxiv.org/abs/2203.01690 by Simon Telen.
3/4: We covered Chapter 7.2 in particular a version of Algorithm 7.2.4.
Check out the Ex 1.14 and Ex 1.17 in Likelihood Geometry.
Some of you might find the references to resonance loci (page 12 of 55) interesting.
3/2: We covered Chapter 3.6 and the Random Censoring Model in Example 7.1.5.
2/28: We went over what generic means. We also defined colon ideals and saturation.
The motivation came from solving the score equations.
For more details on colon ideals and saturation see Chapter 4.4 in [IVA].In particular, Prop 9 and Theorem 14.
Participation surveys and Project Ideas are due today.
2/25: We covered statistical models and likelihood.
See Chapter 5.1-5.3 in [AS] for more details.
2/23: We discussed homotopy continuation and the Bezout (and multihomogeneous Bezout homotopy).
Suggested: Read this crash course and/or this tutorial
If you haven't already, please check out Macaulay2.
For numerical a.g., check out these software references:
NumericalAlgebraicGeometry.m2 (Macaulay2)
HomotopyContinuation.jl
Bertini
For random 3x3 matrices A,B,C use continuation software to solve the polynomial eigenvalue problem (λ^2A+λB+C)X=0 where X is the vector (1,x1,x2). This system has three unknowns and three equations.
How many solutions did you find? How many start points are there?
At the start of class there was a blurb on random monomial ideals. I should also mention the UW Madison contingent on this includes Caitlyn Booms.
2/21: We had an intro to numerical algebraic geometry. Our discussion was motivated by Puiseux series.
[CBMS, Sturmfels] Chapter 1 and Chapter 3 can be used as references.
2/17: More on Conditional Independence models for discrete random variables
Theorem 4.3.5. Try working through the proof in the textbook, or give a computational proof for the binary variable case.
2/15: Conditional Independence models and binomial ideals
We went over Def 4.1.2 for the discrete random variable case.
Exercise: Read Prop 4.1.6. Determine generator(s) for the conditional independence ideal given by the statement X_{1} || X_{2} | X_{3} where X_1, X_2, X_3 are discrete random variables. Hint: see the top of page 75 of the textbook [AS].
We went over Def 4.1.7. What is your guess: is the conditional independence ideal I_{A || B | C} a prime ideal?
From the statistics/probability side: I didn't get to go over Prop 4.1.4 today. Here is the idea. It is known that it is impossible to find a finite set of axioms (rules) from which all conditional independence relations can be deduced [Stu92]. But exist a few easy conditional independence implications which follow directly from the definitions. These are called the ``conditional independence axioms" or ``conditional independence inference rules" and are stated in Prop 4.1.4.
Use Macaulay2 to determine the primary decomposition of the sum of the two ideals we found in class. As a hint type in: help "primaryDecomposition(Ideal)"
2/13: Motivating Irreducible Decomposition and Primary Decomposition. Here are the items we talked about in class.
Prop 4.2.2. It would be nice if someone could present a proof of this as a Tidbit.
Prop 4.2.3. Do Exercise 3.5 first, and then prove the proposition.
Def 4.2.4. Exercise: which monomial ideals are primary ideals?
Prop 4.2.5. Prove part 2 and part 3. Hint: See the textbook.
Bonus: Def 4.2.6: Primary Decomposition. Read the definition and conjecture an algorithm for finding the primary decomposition for monomial ideals in 2 variables.
2/11: Multivariate Normal Distributions
We discussed covariance in class today.
Prove Cov[X,Y]=E[XY]-E[X]E[Y]
Do Ex 2.9.
Suggestion: Read Def 2.3.13 of correlation and go through Example 2.3.18 in AS.
Review Def 2.3.19 of u-moments which was introduced quickly at the end of lecture.
Ex 2.8 to review conditional distributions.
2/9: Icebreaker day and GB Review
Checkout Macaulay2 and use it to compute a GB for the ideal <h1,h2,h3> we studied by hand today. Hint:
R = QQ[mu,sig, m1,m2,m3,MonomialOrder=>Lex];
h1 = mu-m1
h2 = mu^2+sig^2-m2
h3 = mu^3+3*mu*sig^2-m3
I = ideal(h1,h2,h3)
G = gb I
gens G
2/7 [Ch 2, AS], [Moment varieties of Gaussian mixtures, J.Alg.Stat]
Suggested Exercise: Work through the Binomial expectation running example is AS (Flipping n coins with probability of heads p): Examples 2.2.2; 2.2.7; and 2.3.2;
Challenges from Moment varieties of Gaussian mixtures:
A Solution to Remark 8.
A proposition for G_{2,5} analogous to Proposition 7.
2/4: [Ch 3.4, AS]; [What is.. a Grobner Basis]
Exercise: Find a GB for <x+3y-5, 2x+6y+z-17> using Buchberger's Algorithm.
Exercise: Given a linear a linear system of equations, explain how Reduced Row Echelon Form gives a Grobner Basis.
Fun Friday video: whales
2/2: [Ch 3.3, AS]; [p72, Dixon's Lemma, IVA]; Supplement: [Chapter 2, IVA];
We disagreed on the example because I thought I wrote this example on the board (For clarity, here I use x,y,z instead of p1,p2,p3): -5x^3+7 x^2 z^2+4 x y^2 z+11 z^2
Prove [Theorem 3.3.9, AS]
Sign-up sheet: for Mathematical-Tidbits-In-Five-Minutes. Starting 2/18, each class has an assigned person who has up to five minutes at the end of class to share a tidbit. This can be a proof of a theorem, a neat example, or a solution to an exercise. Since you aren't required to present, there is a backup speaker for that day who can informally say what they are doing for their project or something about their research interest (like an elevator pitch).
1/31: [Ch 3.2, AS]; [Chapter 1.5, IVA]
Prove [Prop 3.2.12, AS]
Prove [Prop 3.2.13, AS]
1/28: [Section 2.1 and 2.2, AS]
The course grade is determined by a project (80%) and participation (20%). Algebraic Statistics is a diverse field using techniques from combinatorics, commutative algebra, algebraic geometry, and more. As a result there is wide range of applications in statistics. The project is an opportunity to dive into a new topic that interests you.
If you have worries about the final project, then please see me in office hours.
See the syllabus on Canvas for details