The Data
The "Google Play Store Apps" dataset was created by Gautham Prakash and contains data on more than 1.1 million apps. The dataset can be found on Kaggle.
As of this site's creation, the dataset was last updated on December 20.
To optimize run time, we preprocessed the dataset so that it contains the following columns:
App_Name
Category: associating an app with a category such as Travel, Social, Lifestyle, etc.
Rating: 0.0 to 5.0, mean of user ratings.
Rating_Count: number of ratings submitted by users.
Installs: number of installs rounded down to a bucket, e.g. 10,000+, 100,000+, etc.
Maximum_Installs: Actual number of installs (=downloads).
Price: USD price of an app, includes 0.0 for free apps.
Size: in MB.
Released: release date of the app in the Play Store, in yyyy-mm-dd format.
Last_updated: date of the app's last update, in yyyy-mm-dd format.
Note: The data contains only apps that were last updated in 2020.
Ad_supported: True if the app supports advertisements, False otherwise.
In_App_Purchases: True if the app supports in-app purchases, False otherwise.
Ad_bool: the Ad_supported column encoded as 0 (False) and 1 (True).
14 columns, 112,087 rows.
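The column definitions above imply a few consistency rules that can be checked row by row. The following is a minimal sketch, not part of the original preprocessing: the helper names `parse_installs` and `row_problems` and the example row are hypothetical, and the rule that the Installs bucket never exceeds Maximum_Installs is an assumption read from the "rounded down" wording.

```python
from datetime import date


def parse_installs(label: str) -> int:
    """Turn a bucket label such as '10,000+' into its lower bound (10000)."""
    return int(label.rstrip("+").replace(",", ""))


def row_problems(row: dict) -> list:
    """Return a list of rule violations for one row (empty if none)."""
    problems = []
    if not 0.0 <= row["Rating"] <= 5.0:
        problems.append("Rating outside 0.0-5.0")
    # Assumption: the rounded-down bucket cannot exceed the actual count.
    if parse_installs(row["Installs"]) > row["Maximum_Installs"]:
        problems.append("Installs bucket exceeds Maximum_Installs")
    if date.fromisoformat(row["Last_updated"]).year != 2020:
        problems.append("Last_updated not in 2020")
    if row["Ad_bool"] != int(row["Ad_supported"]):
        problems.append("Ad_bool disagrees with Ad_supported")
    return problems


# Hypothetical row, made up for illustration only (not taken from the dataset).
example_row = {
    "App_Name": "Example App",
    "Rating": 4.2,
    "Installs": "10,000+",
    "Maximum_Installs": 12345,
    "Last_updated": "2020-06-15",
    "Ad_supported": True,
    "Ad_bool": 1,
}

print(row_problems(example_row))
```

A check like this is cheap to run over every row and makes it easy to spot records that break the documented ranges and formats.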
The ETL preprocessing was run in Google Colab and can be found on GitHub. The optimized data is available for download below.
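For readers who want a feel for this kind of preprocessing, here is a rough sketch of two of the steps described on this page: keeping only apps last updated in 2020 and deriving Ad_bool. It is an illustration under assumptions, not the actual notebook: the function `preprocess`, the in-memory sample text, and the assumption that the input already uses the column names listed above are all hypothetical. It uses only the Python standard library to stay dependency-free.

```python
import csv
import io

# Hypothetical sample input, made up for illustration only.
SAMPLE_CSV = """App_Name,Last_updated,Ad_supported
Example A,2020-06-15,True
Example B,2019-03-02,False
Example C,2020-12-01,False
"""


def preprocess(csv_text: str) -> list:
    """Keep rows last updated in 2020 and add a 0/1 Ad_bool field."""
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Dates are assumed to be yyyy-mm-dd strings, as documented above.
        if not row["Last_updated"].startswith("2020-"):
            continue
        # CSV values are text, so compare against the string "True".
        row["Ad_bool"] = 1 if row["Ad_supported"] == "True" else 0
        kept.append(row)
    return kept


rows = preprocess(SAMPLE_CSV)
for row in rows:
    print(row["App_Name"], row["Last_updated"], row["Ad_bool"])
```

The real pipeline likely also renames columns, converts sizes to MB, and reformats dates; those steps depend on the raw file's layout and are left out here.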
A sample of the data frame is shown below: