In this study, we show that ChatGPT-User does not adhere to the robots.txt policy: it accesses pages under a directory that is disallowed in the robots.txt of our self-hosted websites. Specifically, as shown below, we self-host a domain and disallow ChatGPT-User from accessing pages under a certain directory. Our access log nevertheless shows that ChatGPT-User requests these pages, and the returned answer contains content from them.
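A check of this kind can be sketched in a few lines of Python. The snippet below is only an illustration under stated assumptions, not the configuration or log from our server: the directory `/private/`, the helper `find_violations`, and the sample log lines are hypothetical, and the log is assumed to be in the common "combined" format used by Apache and nginx. It parses a robots.txt policy with the standard-library `urllib.robotparser` and reports every logged request whose user agent contains `ChatGPT-User` and whose path the policy disallows.

```python
import re
from urllib.robotparser import RobotFileParser

# Hypothetical policy: the directory name is illustrative, not the one we used.
ROBOTS_TXT = """\
User-agent: ChatGPT-User
Disallow: /private/
"""

# Assumed "combined" log format:
# ip ident user [time] "METHOD path protocol" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)

# Made-up log lines for demonstration only (documentation IP range 203.0.113.0/24).
SAMPLE_LOG = [
    '203.0.113.7 - - [01/Jan/2024:00:00:00 +0000] "GET /private/page.html HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; '
    'ChatGPT-User/1.0; +https://openai.com/bot"',
    '203.0.113.7 - - [01/Jan/2024:00:00:05 +0000] "GET /public/index.html HTTP/1.1" '
    '200 1024 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; '
    'ChatGPT-User/1.0; +https://openai.com/bot"',
    '203.0.113.9 - - [01/Jan/2024:00:00:09 +0000] "GET /private/page.html HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]


def find_violations(robots_txt, log_lines, agent_token="ChatGPT-User"):
    """Return (time, path) pairs for requests by agent_token to disallowed paths."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    violations = []
    for line in log_lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the assumed format
        if agent_token.lower() not in match["agent"].lower():
            continue  # request came from a different client
        if not parser.can_fetch(agent_token, match["path"]):
            violations.append((match["time"], match["path"]))
    return violations


if __name__ == "__main__":
    for time, path in find_violations(ROBOTS_TXT, SAMPLE_LOG):
        print(f"{time}  disallowed path requested: {path}")
```

Matching on the user-agent string alone is a simplification, since that header can be spoofed; a stricter analysis would also compare the client IP against the address ranges published by the crawler's operator.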
The access log obtained from our self-hosted server is shown below.