Figure 1 shows a screenshot of the submissions to AtCoder Beginner Contest 044. The task, time, and programmer distribution shifts are constructed from the "Task", "Submission Time", and "User" tags, respectively. Note that all users are anonymous, and we do not extract private information such as country or birth year.
Figure 1. Screenshot of the submission results at AtCoder. The tags considered when crawling data are highlighted with a blue frame; those highlighted in red are used for the distribution shifts.
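To make the tag-based grouping concrete, the following is a minimal sketch of how crawled records could be partitioned by one of the three tags. The field names (`task`, `submission_time`, `user`), the sample records, and the rule of holding out selected tag values or a time cutoff are illustrative assumptions, not the exact procedure used to build the benchmark.

```python
def split_by_tag(submissions, tag, ood_values):
    """Send records whose `tag` value is in `ood_values` to the OOD set."""
    id_set, ood_set = [], []
    for record in submissions:
        target = ood_set if record[tag] in ood_values else id_set
        target.append(record)
    return id_set, ood_set


def split_by_time(submissions, cutoff):
    """Treat submissions made on or after `cutoff` (ISO date string) as OOD."""
    id_set = [r for r in submissions if r["submission_time"] < cutoff]
    ood_set = [r for r in submissions if r["submission_time"] >= cutoff]
    return id_set, ood_set


# Hypothetical records for illustration only (not real crawled data).
submissions = [
    {"task": "A", "submission_time": "2016-08-27", "user": "user_1"},
    {"task": "B", "submission_time": "2016-08-27", "user": "user_2"},
    {"task": "A", "submission_time": "2017-03-04", "user": "user_2"},
]

id_task, ood_task = split_by_tag(submissions, "task", {"B"})
id_user, ood_user = split_by_tag(submissions, "user", {"user_2"})
id_time, ood_time = split_by_time(submissions, "2017-01-01")
```

ISO-formatted dates compare correctly as strings, which is why the time split needs no date parsing in this sketch.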
The table below shows the details of the crawled data (the raw data of Python75) from AtCoder.
Table 1. Raw data crawled from AtCoder beginner contests.
The two figures below show code examples under different distribution shifts.
Figure 2. An example of task distribution shift. Two programs target different tasks using Python.
Figure 3. An example of programmer distribution shift. Both users submit programs to AtCoder Beginner Contest: 058, Task A – A - ι⊥l. Programming language: Python.
The figure below shows examples of splitting by token frequency, visualized as density distributions. The comparison shows that the ID and OOD code have very different token distributions.
Figure 4. Density distribution of ID and OOD code tokens. Each subfigure corresponds to token representations of two code files in a given task (caption of the subfigure). x-axis: ID code tokens. y-axis: OOD code tokens. First row: Python75. Second row: Java250-S. Last row: Python800-S.
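As a rough illustration of what "different token distributions" means, the sketch below computes relative token frequencies for two Python snippets and measures how far apart they are. The use of Python's standard `tokenize` module and of total variation distance are assumptions made for this example; the tokenizer and the comparison used to produce Figure 4 may differ.

```python
import io
import tokenize
from collections import Counter

# Layout-only token types that carry no lexical content.
SKIP = {
    tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
    tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT,
}


def token_frequencies(code):
    """Return the relative frequency of each token string in `code`."""
    readline = io.StringIO(code).readline
    tokens = [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type not in SKIP
    ]
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two frequency dicts (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical snippets standing in for an ID and an OOD submission.
id_code = "a, b = map(int, input().split())\nprint(a + b)\n"
ood_code = "import sys\nfor line in sys.stdin:\n    print(len(line))\n"

distance = total_variation(token_frequencies(id_code), token_frequencies(ood_code))
```

A value near 0 indicates nearly identical token usage, while a value near 1 indicates that the two files share almost no tokens.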
The figures below show the CST representations of four submissions. The distances between the first two and between the last two are 121 and 90, respectively.
Figure 5. CST representation of submission 1.
Figure 6. CST representation of submission 2.
Figure 7. CST representation of submission 3.
Figure 8. CST representation of submission 4.
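The text above does not state which tree distance yields the values 121 and 90. Purely as a stand-in, the sketch below compares two Python programs by the multiset of node types in their syntax trees. It uses Python's built-in `ast` module, which produces an abstract rather than a concrete syntax tree, and a simple bag-of-node-types distance, so it will not reproduce the reported numbers.

```python
import ast
from collections import Counter


def node_type_counts(code):
    """Count how often each syntax-tree node type occurs in `code`."""
    tree = ast.parse(code)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def bag_distance(code_a, code_b):
    """Sum of absolute differences in node-type counts (assumed stand-in metric)."""
    counts_a = node_type_counts(code_a)
    counts_b = node_type_counts(code_b)
    keys = set(counts_a) | set(counts_b)
    return sum(abs(counts_a[k] - counts_b[k]) for k in keys)


# Hypothetical submissions for illustration only.
sub_1 = "n = int(input())\nprint(n * 2)\n"
sub_2 = "n = int(input())\nfor i in range(n):\n    print(i)\n"

d = bag_distance(sub_1, sub_2)
```

A structure-aware measure such as tree edit distance would also account for where nodes sit in the tree, which this count-based sketch ignores.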
Five more examples from CodeS+ under node/edge distribution shifts (dataset: Java250-S). In each example below, the two code examples target the same task.
Distribution Type: Node-AST
Distribution Type: Node-CFG
Distribution Type: Edge-Dataflow
Distribution Type: Edge-PDG
Distribution Type: Edge-Reftype