This site describes how to download, install, and run the HOList proof assistant and benchmark, including the DeepHOL automated theorem prover. For details, please check out our ICML paper.

Another document on autoformalization gives a higher-level description of how HOList could fit into a general learning framework for strong automated reasoning.


The official HOList benchmark for automated theorem proving can be found as the file theorem_database_v1.1.textpb in the deephol data release. It consists of all theorem statements in the core, complex, and flyspeck corpora described in our paper. The goal of the benchmark is to prove as many theorems as possible in the HOList environment, in the order they appear in the database. That is, only theorems that occur before the current theorem may be used as premises (lemmata) in its proof. Each theorem in the database carries a train/validation/test split label, which we used in our experiments.
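The ordering constraint can be illustrated with a short sketch (the helper name and the toy theorem list below are illustrative only; the real database is a protobuf text file):

```python
# Sketch of the premise-ordering rule: when proving the theorem at index i,
# only theorems that appear strictly earlier in the database may be used
# as premises. Names below are illustrative, not the real API.

def allowed_premises(theorems, i):
    """Return the premises that may be used when proving theorems[i]."""
    return theorems[:i]

# Toy database: three theorem names in database order.
db = ["REAL_ADD_SYM", "REAL_ADD_ASSOC", "REAL_ADD_AC"]

print(allowed_premises(db, 2))  # the first two theorems may serve as premises
print(allowed_premises(db, 0))  # the first theorem gets no premises at all
```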

Versions: We update the benchmark when required to keep it in sync with the proof assistant. We are currently at version 1.1, which differs from the first release only in the addition of two theorems in the flyspeck section of the theorem database. Evaluations on the core and complex sections are unaffected. The previous version of the theorem database can be found here.


Imitation Learning

Training only makes use of human proofs.

Reinforcement + Imitation

Trains on both human proofs and synthesized proofs.

Reinforcement Learning

Training does not use any human proofs.

Proof Assistant

The HOList proof assistant allows machine learning algorithms to interact with the HOL Light interactive theorem prover. Our HOL Light repository includes a Dockerfile that makes it easy to install and use the environment. As installation from source may take some time, we also offer a pre-built Docker container. First, create a network for communication with the HOList environment:

docker network create holist_net

Then run HOL Light:

docker run -d --network=holist_net --name=holist gcr.io/deepmath/hol-light

For an example of how to communicate with the container, see the DeepHOL instructions below. You can stop the container as follows:

docker stop holist && docker rm holist && docker network rm holist_net

DeepHOL Neural Prover

DeepHOL is an automated tactic-based neural theorem prover. Its code is located on GitHub. Note that our installation instructions were tested on Ubuntu Linux only.

First, download the model checkpoints, the sample configuration files, and the benchmark theorem database:

wget https://storage.googleapis.com/deepmath/deephol.zip -O /tmp/deephol.zip
unzip /tmp/deephol.zip -d /tmp

After this, the theorem database can be found in: /tmp/deephol/theorem_database_v1.1.textpb

Make sure the HOList environment is running, as described above. Then run the DeepHOL prover:

docker run -it --network=holist_net -v /tmp/deephol:/data gcr.io/deepmath/deephol \
  --prover_options=/data/configuration/prover_options.textpb \
  --output=/data/prooflog.textpbs

This generates /tmp/deephol/prooflog.textpbs, which contains the proof log. See the proof checker section below to verify the proofs.


When you are done, stop the container and remove the network:

docker stop holist && docker rm holist && docker network rm holist_net


The previously downloaded file deephol.zip also contains example models for DeepHOL. These can be specified using path_model_prefix in the prover options. In addition, one should set model_architecture to the architecture matching the checkpoint at path_model_prefix. In our current release there are two kinds: PAIR_DEFAULT (the premises selected do not depend on the tactic chosen) and PARAMETERS_CONDITIONED_ON_TAC (premise prediction is per-tactic, i.e. "tactic-dependent").

  • /tmp/deephol/checkpoints/holparam_3330000 is a model checkpoint for the model trained with imitation learning only. model_architecture: PAIR_DEFAULT.
  • /tmp/deephol/checkpoints/loop_tac_depend_5210000 is a model checkpoint trained with the reinforcement learning loop. model_architecture: PARAMETERS_CONDITIONED_ON_TAC.
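For instance, to run the imitation-learning checkpoint, the prover options might contain lines like the following. The field names path_model_prefix and model_architecture come from this release, but the exact surrounding structure of prover_options.textpb is best taken from the sample configuration shipped in deephol.zip:

```
# Hypothetical fragment of prover_options.textpb; consult the shipped
# sample configuration for the authoritative layout.
path_model_prefix: "/data/checkpoints/holparam_3330000"
model_architecture: PAIR_DEFAULT
```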

Proofs and Training Data

You will need gsutil to download the datasets (sudo pip install gsutil, or see here for instructions):

mkdir $HOME/deephol-data
gsutil -m cp -R gs://deepmath $HOME/deephol-data

The data contains human proofs and synthetic proofs, which are available as "proof logs". A proof log consists of a list of proof nodes and some metadata. The first proof node of each proof log is the theorem statement we are trying to prove; the other proof nodes are subgoals encountered later in the proof. Each proof node includes a list of tactic applications, each of which can produce a list of subgoals.

  • Human proofs: We instrumented HOL Light to produce proof logs for each theorem proven using the prove function. This data set includes proofs for both "core" and "complex" benchmarks.
  • Synthetic proofs: This data set includes all proofs discovered during two recent runs of our reinforcement learning loop. Therefore, it can include multiple proofs for a theorem.
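The nested proof-log structure described above can be sketched with plain Python data classes. The class and field names here are illustrative stand-ins; the real proof logs are protobufs with their own schema:

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative mirror of the proof-log structure: a log holds proof nodes,
# each node records the tactic applications tried on it, and each tactic
# application may yield further subgoals (referenced by node index).

@dataclass
class TacticApplication:
    tactic: str
    subgoals: List[int] = field(default_factory=list)  # indices of resulting nodes

@dataclass
class ProofNode:
    goal: str  # the (sub)goal term
    tactic_applications: List[TacticApplication] = field(default_factory=list)

@dataclass
class ProofLog:
    nodes: List[ProofNode] = field(default_factory=list)  # nodes[0] is the theorem

log = ProofLog(nodes=[
    ProofNode(goal="!x. x + 0 = x",
              tactic_applications=[TacticApplication(tactic="ARITH_TAC")]),
])
print(log.nodes[0].goal)
```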

Proof logs are normalized: By default, HOL Light often creates fresh types and fresh variables during parsing and proving. In a naive (token-based) machine learning approach, these types and variables create a lot of tokens that occur in exactly one term. This poses a risk for learning approaches, as it could allow a trained model to identify specific terms by these tokens. We therefore normalized all expressions, replacing these types and variables with canonical names (e.g. ?0, ?1, and GEN%PVAR%0).
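As a rough illustration of the idea, the sketch below canonicalizes machine-generated variable names in order of first occurrence. The renaming scheme (underscore-digit names mapped to ?0, ?1, ...) is a simplification; the actual normalization is performed inside the instrumented HOL Light and also covers fresh types:

```python
import re

# Simplified sketch of variable normalization: replace each distinct
# machine-generated variable name (here, anything matching an
# underscore-digits pattern) with a canonical token ?0, ?1, ... in
# order of first occurrence. The real normalization happens inside the
# instrumented HOL Light and handles fresh types as well.

def normalize(term: str) -> str:
    mapping = {}
    def rename(match):
        name = match.group(0)
        if name not in mapping:
            mapping[name] = "?%d" % len(mapping)
        return mapping[name]
    return re.sub(r"_\d+", rename, term)

print(normalize("!_73301. _73301 + _73302 = _73302 + _73301"))
# each fresh variable gets a stable canonical name across the whole term
```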

Training data: To enable researchers to quickly start training their own models, we also provide our training examples (tfexamples). They are stored, together with the proof logs, in tfrecordio format.

Proof Checker

The proof checker allows you to verify, with an extremely high level of certainty, that your proofs are correct. It operates in two steps: it first translates proof logs into OCaml code, and then compiles and runs the entire HOL Light library, replacing the original human proofs with the proofs from the proof logs. Don't forget to check the final report at the end of step two.

First, put the proof logs you want to check in /tmp/deephol/prooflog.textpbs (the prover run above already wrote its logs there). The expected format is one protobuf in text format per line of the file. Now translate the proof logs to OCaml code as follows:

docker run -it -v /tmp/deephol:/data --entrypoint "/home/proof_checker" gcr.io/deepmath/deephol \
  --theorem_database=/data/theorem_database_v1.1.textpb \
  --proof_logs=/data/prooflog.textpbs

This generates a file /tmp/deephol/synthetic_proofs.ml with the OCaml code of your proofs.

To check these proofs, you will need our modified version of HOL Light.

Copy the OCaml proofs you generated into the repository:

cp /tmp/deephol/synthetic_proofs.ml hol-light/

Now compile HOL Light including the synthetic proofs with the following command:

docker build -f hol-light/Dockerfile_check_proofs --ulimit stack=1000000000 --tag check_proofs hol-light/

Finally, run HOL Light to check your proofs in proof checking mode:

docker run check_proofs

Expect a 1% failure rate: Around 1% of the proofs found by DeepHOL fail in the proof checker. Manual inspection suggests that these failures are due to subtle differences in how the proof assistant behaves during stateless proving versus during proof checking. We will fix this in future releases.

Integrating with custom checkpoint

If you would like to train your own model (rather than using our training code or our released checkpoints), this section gives a high-level overview of the changes you'll need to make.

The API for interfacing with a trained model is implemented as an abstract base class in predictions.py. Subclass the Predictions class and implement every method marked @abc.abstractmethod. For examples of concrete subclasses, see HolparamPredictor and TacDependentPredictor in holparam_predictor.py.

You'll also need to add your new predictor to the get_predictor function in prover.py (this is where the path_model_prefix gets passed through, along with any other prover options that are required for initializing your predictor).
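The subclassing pattern can be sketched as follows. The method names embed_goal and tactic_scores are HYPOTHETICAL stand-ins, not the actual abstract interface; consult predictions.py for the authoritative method signatures your subclass must implement:

```python
import abc
from typing import List

# Sketch of the subclassing pattern described above. The Predictions
# stand-in and its method names are hypothetical; the real abstract
# methods are defined in predictions.py and must be matched exactly.

class Predictions(abc.ABC):
    @abc.abstractmethod
    def embed_goal(self, goal: str) -> List[float]:
        """Map a goal term to an embedding vector."""

    @abc.abstractmethod
    def tactic_scores(self, goal_embedding: List[float]) -> List[float]:
        """Score each tactic given a goal embedding."""

class MyPredictor(Predictions):
    """Toy predictor: length-based embeddings and uniform tactic scores."""

    def __init__(self, num_tactics: int = 3):
        self.num_tactics = num_tactics

    def embed_goal(self, goal: str) -> List[float]:
        return [float(len(goal))]  # trivially embed a goal by its length

    def tactic_scores(self, goal_embedding: List[float]) -> List[float]:
        return [1.0 / self.num_tactics] * self.num_tactics

predictor = MyPredictor()
print(predictor.tactic_scores(predictor.embed_goal("!x. x = x")))
```

Once such a subclass exists, wiring it into get_predictor in prover.py is what lets the prover options (path_model_prefix and friends) reach your constructor.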