Correction & Update (12/25/2022):
(1) We use our latest pre-trained model checkpoint and re-evaluate the results on the four meta-types, which yields better results for RQ1 and RQ2.
(2) For a fairer comparison on RQ3, we fix erroneous data preprocessing on the raw dataset and keep the dataset size from both language sources at 5k (we also experiment with 10k). For all methods, we select the checkpoint that achieves the best EM score on the validation set.
(3) The comparison results (Tables 4 and 5) against the LambdaNet and Typilus baselines were unfair due to an inconsistent prediction space and sampling bias in the implementation. The updated results are reported below.
We report the results of the baselines from the original LambdaNet, DeepTyper, and Typilus papers and GitHub repositories. Specifically, for TypeScript (inter-project), the LambdaNet (lib only) and DeepTyper (lib only) results are evaluated on library type annotations only, using the LambdaNet dataset, while DeepTyper (all), PLATO-seq (all), and PLATO (all) are evaluated on the DeepTyper dataset with a much larger prediction space that contains both user-defined and library type annotations. For Python (intra-project), PLATO w/o pytype achieves better performance than all compared baselines under the same evaluation setting.
(4) We update all the corresponding case studies and conduct an additional analysis of the source of improvement, as shown in Table 5 (RQ3).
The updated version of our paper can be found here.
Hitherto, statistical type inference systems rely thoroughly on supervised learning approaches, which require laborious manual effort to collect and label large amounts of data. Most Turing-complete imperative languages share similar control- and data-flow structures, which makes it possible to transfer knowledge learned from one language to another. In this paper, we propose a cross-lingual transfer learning framework, PLATO, for statistical type inference, which allows us to leverage prior knowledge learned from the labelled dataset of one language and transfer it to others, e.g., Python to JavaScript, Java to JavaScript, etc. PLATO is powered by a novel kernelized attention mechanism that constrains the attention scope of the backbone BERT model so that the model is forced to base its predictions on correct features instead of spurious bias. Furthermore, we apply syntax enhancement by leveraging the srcML meta-grammar representation to increase feature overlap among language domains. We evaluated PLATO under two settings: (1) no labelled target language data, and (2) partially labelled target language data. Experimental results show that PLATO outperforms the baseline methods by a large margin under both settings.
Figure 1 displays an overview of the PLATO framework.
PLATO consists of four major parts: (1) variable type closeness matrix extraction, (2) syntax enhancement, (3) training, and (4) ensemble-based inference. The inputs to our system are the source code sequence, its corresponding srcML meta-grammar sequence, and the variable type closeness matrix. The output is a trained model that predicts the corresponding type annotation for each token in the given code sequence. The key idea for making use of cross-lingual knowledge is to increase feature overlap among different language domains, so that cross-lingual data can further boost the performance of a statistical type inference tool for another language. In this work, we achieve this via two major methods, each sketched below:
VTC-based kernelized attention
Syntax Enhancement
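To illustrate the syntax-enhancement step, the sketch below pairs each source-code token with the srcML meta-grammar element that encloses it to form the augmented sequence c'. The tag names and the token@tag fusion scheme are assumptions for illustration only, not PLATO's exact encoding.

```python
# Minimal sketch of syntax enhancement (assumed fusion scheme): every code
# token is paired with its enclosing srcML meta-grammar element so that
# code from different languages shares more surface features.

def augment_with_srcml(code_tokens, srcml_tags, sep="@"):
    """Fuse code tokens with srcML meta-grammar tags into the augmented c'."""
    assert len(code_tokens) == len(srcml_tags)
    return [f"{tok}{sep}{tag}" for tok, tag in zip(code_tokens, srcml_tags)]


# Hypothetical example: similar meta-grammar tags appear for both Python and
# JavaScript, which is what increases cross-language feature overlap.
py_tokens = ["def", "add", "(", "x", ")", ":"]
py_tags = ["function", "name", "parameter_list", "decl", "parameter_list", "block"]
print(augment_with_srcml(py_tokens, py_tags))
# ['def@function', 'add@name', '(@parameter_list', 'x@decl', ')@parameter_list', ':@block']
```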
Figure 2 shows the model architecture used in this work. We adopt a two-stage training paradigm: 1) unsupervised cross-programming language model (XPLM) pre-training, and 2) supervised type inference fine-tuning. The inputs to the model are the same for both training stages; the only difference is the training loss. Concisely, as shown in Figure 2, the XPLM model receives two inputs, namely the augmented code vector c' and its corresponding variable type-closeness (VTC) adjacency matrix A_{K}^{Q}. For c', instead of using the source code representation alone as input, we augment it with the srcML meta-grammar representation so that there is more feature overlap among different language domains, which we empirically find to significantly improve performance. As for the variable type-closeness matrix, it is used to constrain the attention scope of the model at each layer, so that each token is forced to leverage only the relevant tokens within the attention scope defined by the VTC graph kernel, as sketched below.
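To make the constrained attention concrete, the following is a minimal masked self-attention sketch in which the VTC adjacency matrix restricts which positions each token may attend to. It assumes the matrix contains self-loops (so every row has at least one admissible position); the function name and tensor shapes are illustrative, not PLATO's exact kernel.

```python
import torch
import torch.nn.functional as F

def vtc_kernelized_attention(Q, K, V, vtc_adj):
    """Sketch of VTC-constrained self-attention (assumed formulation).

    Q, K, V:  (batch, seq_len, d) query/key/value projections of c'.
    vtc_adj:  (batch, seq_len, seq_len) variable type-closeness adjacency
              matrix; entry (i, j) > 0 iff token j lies inside token i's
              VTC-defined attention scope (self-loops assumed present).
    """
    d = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d ** 0.5               # raw attention logits
    scores = scores.masked_fill(vtc_adj == 0, float("-inf"))  # restrict attention scope
    weights = F.softmax(scores, dim=-1)                       # attend only within scope
    return weights @ V

# Example usage with random projections of the augmented sequence c':
# Q = K = V = torch.randn(1, 8, 64)
# A = torch.eye(8).unsqueeze(0)   # hypothetical VTC graph with self-loops only
# out = vtc_kernelized_attention(Q, K, V, A)
```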