Dataset Descriptions

VkusVill Dataset

VkusVill is a large Russian grocery chain specializing in local suppliers and healthy food. It carries a relatively limited assortment of products, all sold under its own brand name. The chain runs an active loyalty program: over 70% of transactions are made with a loyalty card, each of which identifies an individual customer or household. This dataset contains all transactions made with a loyalty card.

Loyalty cards are free, and anyone can apply for one. They allow customers to collect bonus points, which can later be used toward purchases. One “point” is equivalent to one ruble.

One of VkusVill’s promotions for loyalty program members is the “Favorite Product” (Любимый Продукт) promotion. After spending 500 rubles with her loyalty card, a customer can select a single product from the entire assortment as her “favorite” product and receive a 20% discount on it. The customer may change the Favorite Product once a week. If the Favorite Product is out of stock on a given purchase occasion, the customer can select any product as her “favorite product for the day” and receive a 20% discount on it until the end of that day.
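The promotion rules above can be expressed as a short Python sketch. This is an illustration only, not VkusVill's implementation: the function and parameter names are hypothetical, the 500-ruble condition is assumed to apply to cumulative loyalty-card spending, and "once a week" is assumed to mean at least seven days since the previous change.

```python
from datetime import date, timedelta
from decimal import Decimal

FAVORITE_DISCOUNT = Decimal("0.20")  # 20% discount, as stated in the description
SPEND_THRESHOLD = Decimal("500")     # rubles spent with the loyalty card


def can_select_favorite(total_spent):
    """Assumption: eligibility depends on cumulative loyalty-card spending."""
    return total_spent >= SPEND_THRESHOLD


def can_change_favorite(last_change, today):
    """Assumption: 'once a week' means at least 7 days since the last change."""
    return today - last_change >= timedelta(days=7)


def price_paid(shelf_price, product_id, favorite_id, day_favorite_id=None):
    """Return the price after applying the Favorite Product discount.

    day_favorite_id is the 'favorite product for the day', which is set
    only when the regular Favorite Product is out of stock.
    """
    is_discounted = product_id == favorite_id or (
        day_favorite_id is not None and product_id == day_favorite_id
    )
    if is_discounted:
        return shelf_price * (1 - FAVORITE_DISCOUNT)
    return shelf_price
```

Under these assumptions, a product with a shelf price of 100 rubles costs 80 rubles when it is the customer's Favorite Product (or her favorite product for the day), and 100 rubles otherwise.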

Uchi.ru Dataset

Uchi.ru is a Russian educational platform on which school children study multiple academic subjects (math, Russian language, English, and science, among others) in an entertaining way. The platform is extremely popular in Russia, with a user base of over 3 million students. Its educational content is presented in the form of gamified cards, where each card contains a set of interactive exercises. The cards are grouped by topic, e.g., “addition and subtraction” for math. Uchi.ru also regularly holds Olympiads in the same subjects, in which students solve more creative problems that go beyond the school curriculum. Students with high scores on the Olympiads receive awards and certificates.

This dataset describes platform usage and performance for a random sample of second-grade students across multiple subjects. The data are individual-level but anonymized. For each student in the sample, the dataset contains the sequence of cards solved by that student, along with usage and performance variables for each card, such as the time spent on it and the percentage of problems solved correctly. An additional dataset records the performance of the same set of students on Olympiads.
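As a rough illustration of the structure described above, the following Python sketch models one row per student and card. The class, field, and function names are hypothetical and do not reflect the actual column names in the dataset; records are assumed to arrive in the order the cards were solved.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CardRecord:
    """One card solved by one student (all field names are assumptions)."""
    student_id: str          # anonymized identifier
    subject: str             # e.g. "math"
    card_id: str
    seconds_spent: float     # time spent on the card
    problems_attempted: int
    problems_correct: int

    @property
    def percent_correct(self) -> float:
        """Percentage of problems on this card solved correctly."""
        if self.problems_attempted == 0:
            return 0.0
        return 100.0 * self.problems_correct / self.problems_attempted


def cards_by_student(records):
    """Group records into per-student sequences, preserving input order."""
    sequences = defaultdict(list)
    for record in records:
        sequences[record.student_id].append(record)
    return dict(sequences)


records = [
    CardRecord("s1", "math", "c1", 120.0, 4, 3),
    CardRecord("s2", "math", "c1", 95.0, 4, 4),
    CardRecord("s1", "math", "c2", 200.0, 5, 5),
]
sequences = cards_by_student(records)
```

Here `sequences["s1"]` holds the two cards solved by student `s1` in order, and each record exposes its share of correctly solved problems through `percent_correct`.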