Next, we convert the UTC timestamps in the column 'created_utc' to US Eastern time (EST/EDT, i.e. UTC-5/UTC-4) using the zoneinfo module from the Python standard library. There are three main reasons for this choice: 1) more than half of Reddit users are located in North or South America; in particular, redditors from the United States (~43%), Canada (~5%), Brazil (~2.6%), and Mexico (~1.8%) together make up ~52.4% of redditors, as per World Population Review using 2023 data, 2) the hourly distribution graph is consistent with the wake and sleep hours of North and South America, and 3) the majority of posts are in English. Limited resources prevent us from separating posts by geographical region, so we apply this single time zone to all posts.
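The conversion described above can be sketched as follows. This is a minimal illustration, not necessarily the exact code used here: it assumes that 'created_utc' holds Unix epoch seconds (as in Reddit's API), and that "America/New_York" is the IANA zone used for EST/EDT. The helper name to_eastern is hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: US Eastern time is represented by the IANA zone "America/New_York".
EASTERN = ZoneInfo("America/New_York")


def to_eastern(created_utc):
    """Convert a Unix timestamp in seconds (UTC) to an aware Eastern-time datetime.

    Hypothetical helper for illustration only.
    """
    # Build a timezone-aware UTC datetime first, so the conversion is unambiguous.
    utc_dt = datetime.fromtimestamp(created_utc, tz=timezone.utc)
    # astimezone() picks EST or EDT according to the daylight saving rules
    # in effect on that date.
    return utc_dt.astimezone(EASTERN)


if __name__ == "__main__":
    # 2023-01-01 00:00:00 UTC (winter) and 2023-07-01 00:00:00 UTC (summer)
    for ts in (1672531200, 1688169600):
        print(ts, "->", to_eastern(ts).isoformat())
```

Using a named zone rather than a fixed offset means the EST/EDT switch is handled automatically. On systems without a bundled time zone database (for example Windows), zoneinfo needs the tzdata package to be installed.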
These slices were then saved separately to data/partitioned.