Projects | Python Type Inference

        A static type inference tool for catching errors in Python.

 

Overview 

Python is a popular dynamically-typed object-oriented language. However, programmers used to strongly-typed languages like Haskell or ML (or even just compiled languages like C++ and Java) may find Python's lack of type annotations and very minimal static error checking to be very troubling. One has no idea whether or not a block of code has any errors -- even simple errors such as passing arguments to a function in the wrong order -- without executing that block of code.

This page describes a project to develop a static type-inference and type-checking tool for Python to catch errors statically. There have been a number of attempts to add static type-inference and type-checking systems to Python, including Starkiller, Psyco, PyPy, and the works of Brett Cannon and of John Aycock. These attempts tend to focus on increasing the performance on Python code, and often trade supporting all legal Python for improved performance on the more commonly used parts of Python, while the goal of this project is to focus on finding errors and supporting as large a subset of legal Python as is possible.

Status

A simple flow-insensitive type-inference tool has been written (in Haskell) that does a reasonable job inferring and checking types on a somewhat restricted subset of Python. Currently, no classes or modules are supported (no support for qualified names). Also not supported are a some complicated but non-essential syntactic constructs (list comprehension, slicing, default function parameters), as well as generators and heterogeneous lists.

The type system currently supports basic built-in, function, list, and tuple types, in addition to universal and restricted quantification in type schemas:

    type ::=   number | string 
| list{type}
| tuple{type_1, type_2, ..., type_n}
| type -> type

schema ::= for_all t. type
| for_all t in {type_1, ..., type_n}. type

Polymorphic types are inferred for non-recursive functions, but not for recursive functions at the moment. 

Implementation

The type inference tool consists of three main stages: lexing and parsing, scoping analysis, and actual type inference.

The lexer and parser appear in files python-lexer.hs and python-parser.hs. Although these two are split into different files, lexing and parsing are done in a single pass, using Haskell's parser combinator library Parsec.

Scoping analysis code is in python-annotate.hs. This code annotates the Python AST so that the type inference code can ignore all Python scoping issues. It mainly annotates all variables in the AST with unique identifiers so that distinct variables in different scopes but sharing the same name are given different identifiers. The code liberally uses "Scrap Your Boilerplate"-stlye lightweight generic Haskell.

The type inference code all appears in infer-type.hs. It uses the standard Algorithm W/Unification approach, except that it adds tuples and homogeneous lists in the natural way, and modifies the algorithm slightly to support constrained quantification. The handling of let-polymorphism is modeled on some notes from the University of Chicago's CMSC 22610.

The Python AST data structure is defined in python-ast.hs, the AST pretty-printer is in python-pprint.hs, and the code for the tool which actual applies the parser, does type inference, and outputs the result is in test-tool.hs.

Examples

  • Example 1 (recursion, lists, for statements, tuples, tuple assignment):
        def range(start, end): 
    if start == end: return [end]
    else: return [start] + range(start+1, end)

    def fib(n):
    (old, cur) = (0, 1)
    for i in range(1, n-1):
    (old, cur) = (cur, old+cur)
    return cur
    has inferred types:
        range :: num -> num -> [num]
    fib :: num -> num
  • Example 2 (nested functions, let-polymorphism, constrained quantification):
        def f(x, y):
    def g(x): return x
    return g(x)*5

    a = f("hello", 17)
    b = f(4, (3, " world"))
    has inferred types:
        f :: all x in {num, string}. all y. x -> y -> x
    g :: all x. x -> x

    a :: string
    b :: string

Files

  • cs136c-python-type-infer.tgz This file contains all of the code for the final version of the project. It requires a recent version of GHC to build. The Makefile produces a single target, test-tool, which takes a Python file as a command-line argument or attempts to read a Python program from stdin. It outputs a dump of the abstract syntax tree (AST), a pretty-printed version of the AST, and then the types it infers for the code (or an error of some kind).  

About

This project is the work of Jacob Burnim, Mike Kocurek, and Benji Stein. It was undertaken as a final project for the the third term of Jason Hickey's Caltech course CS 136, Programming Languages Laboratory.

Several presentations about the project were given over the course of term: