The above table reports results when no labelled target-language data is available, i.e., transfer across overlapping types of different languages. We compare our method with five baselines: three popular domain adaptation baselines (TAPT, MMD, and ADV) and two supervised learning baselines trained on source- and target-language data, respectively. Under the setting without any target-language data, our method improves significantly over the domain adaptation baselines and achieves state-of-the-art performance whether an optionally-typed language (i.e., TypeScript/Python) or a strongly-typed language (i.e., Java) is used as the source language.
In the real world, during the early stages of a weakly-typed programming language, the type hint annotations provided by developers are scarce, especially for primitive types. It would therefore be extremely valuable to quickly build a functional type inference tool by leveraging existing cross-lingual labelled datasets to augment the model's training data. We thus also evaluate the setting in which partial target-language data is available. The results show that by using the cross-lingual dataset together with the syntax enhancement and kernelized attention methods we propose, PLATO significantly improves over all baseline methods, both when the target-language dataset is scarce and when training with the full target-language dataset.
We perform an ablation study to measure the contribution of each component of our method, building the following baselines: PLATO without syntax enhancement, and PLATO without VTC-based kernelized attention and ensemble inference. As shown in the above table, every component of our framework is effective and contributes to the final performance. To further investigate how the VTC-based kernelized attention affects the behavior of BERT, we perform a case study and visualize the model's attention (shown in the above figure). For example, in the code snippet of Example 1, we show the attention of the variable "showcolors" over each token in the code sequence. With VTC-based kernelized attention, PLATO assigns "showcolors" a very high attention score on "true", and it is thus correctly predicted as a boolean type. Without VTC, the tokens "timeoutinterval" and "0" receive high attention scores instead, and "showcolors" is incorrectly predicted as a Timer type.
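To make the case study concrete, the sketch below illustrates one plausible way a VTC-derived signal could bias attention; the exact formulation used by PLATO is not spelled out in this section, so the function name, the `vtc_scores` input, and the `gamma` scaling parameter are illustrative assumptions rather than the actual implementation.

```python
import torch
import torch.nn.functional as F


def kernelized_attention(q, k, v, vtc_scores, gamma=1.0):
    """Scaled dot-product attention whose logits are biased by VTC-derived scores.

    q, k, v:      (seq_len, dim) query/key/value vectors for the code tokens.
    vtc_scores:   (seq_len,) hypothetical per-token relevance of each code token
                  to the queried variable (e.g., high for the literal "true"
                  when asking about the variable "showcolors").
    gamma:        illustrative weight controlling how strongly the bias is applied.
    """
    dim = q.size(-1)
    logits = q @ k.transpose(0, 1) / dim ** 0.5          # standard attention scores
    logits = logits + gamma * vtc_scores.unsqueeze(0)    # additive bias per key token
    weights = F.softmax(logits, dim=-1)                  # rows sum to 1 over key tokens
    return weights @ v, weights
```

Under this sketch, raising the score of value-bearing tokens such as "true" shifts the attention mass of "showcolors" toward boolean-indicative context, which is consistent with the behavior observed in the case study above.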