For Python800 and Java250, we reuse existing datasets from CodeNet licensed under the Apache License 2.0, which allows for modification, distribution, and private use.
As for Python75, we scrape only public-facing data and respect the Privacy Policy and Copyright terms declared by AtCoder. Regarding the Privacy Policy, we avoid scraping personal information that could identify a particular individual, such as country/region, birth year, and affiliation. In addition, all users remain anonymous. Regarding the Copyright, AtCoder states that ``The rights and obligations of Users arising in connection with these Terms shall be governed by and construed in accordance with the laws of Japan''. Article 32 of the Copyright Law of Japan states: ``(1) It is permissible to quote and thereby exploit a work that has been made public. In such a case, the work must be quoted consistent with fair practices and within a scope that is justified for the purpose of news reporting, critique, study, or other place in which the work is quoted.'' The CodeS dataset is noncommercial and intended solely for nonprofit research and educational purposes.
For PrimeVul, we reuse the existing PrimeVul dataset, licensed under the MIT License, which allows for modification, distribution, and private use.
Therefore, we believe that our collection and use of these datasets raise no ethical concerns.