Unsupervised and supervised methods for the detection of hurriedly created profiles in recommender systems

Introduction

We propose a framework for identifying anomalous rating profiles. Each attacker (outlier) hurriedly creates profiles that inject an unspecified combination of random and specific ratings into the system, without any prior knowledge of the existing ratings (see Figure 1).

We study the Hurry attack, in which malicious users (outliers) create profiles without any prior knowledge of the system’s ratings and without a specific goal of affecting item ratings, e.g. promoting or demoting specific items. The aim is to detect users who hurriedly create a profile of abnormal ratings, i.e. an unspecified combination of random and specific ratings that is inconsistent with normal user behavior [1].


Figure 1. An example of user-item preferences. The preferences of the user marked with the red bug icon appear abnormal compared with the preferences of the other users.

Methodology

Figure 2. The schema of the proposed unsupervised method.

The proposed detection system consists of the following three stages [1]:

  • In the first stage, we eliminate users and items with sparse ratings.
  • In the second stage, five attributes that discriminate well between abnormal and normal profiles are computed from the user-item rating matrix and the synthetic coordinates of the SCoR system.
  • In the last stage, a decision-making process is applied to automatically detect abnormal profiles, e.g. the proposed probabilistic framework or a k-means clustering-based decision method. When labeled sample data are available, we instead train a random forest classifier, which outperforms the unsupervised methods in the experiments reported in [1].
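
As an illustration of the first stage only, the following is a minimal Python sketch of removing sparse users and items. The thresholds `min_user_ratings` and `min_item_ratings`, the dictionary representation of the rating matrix and the repeat-until-stable loop are all assumptions made for this example; the page does not specify them, and the released MATLAB code may work differently.

```python
from collections import Counter


def filter_sparse(ratings, min_user_ratings=2, min_item_ratings=2):
    """Drop ratings that belong to sparse users or sparse items.

    ratings: dict mapping (user, item) -> rating value.
    Both thresholds are hypothetical; the source does not state them.
    """
    kept = dict(ratings)
    while True:
        # Count how many ratings each user and each item still has.
        user_counts = Counter(user for user, _ in kept)
        item_counts = Counter(item for _, item in kept)
        pruned = {
            (user, item): value
            for (user, item), value in kept.items()
            if user_counts[user] >= min_user_ratings
            and item_counts[item] >= min_item_ratings
        }
        # Removing an item can make a user sparse (and vice versa),
        # so repeat until a pass removes nothing.
        if len(pruned) == len(kept):
            return pruned
        kept = pruned


ratings = {
    ("u1", "a"): 5, ("u1", "b"): 3,
    ("u2", "a"): 4, ("u2", "b"): 2,
    ("u3", "c"): 1,  # u3 and item c each have a single rating
}
dense = filter_sparse(ratings)
```

In this toy example only the rating by `u3` on item `c` is removed, because both that user and that item fall below the assumed thresholds.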

Experimental Results

Figure 3. The F1 score of the methods RF, PROB, W-KMEANS and W-KMEANS4 for different values of filler size (left) and attack size (right) on the (a) ML100k and (b) ML datasets [1].


For the abnormal profiles, the filler size takes values in {30, 60, 90, 120, 150} and the attack size takes values in {3%, 6%, 9%, 12%, 15%}. The synthetic data containing the abnormal profiles are inserted into the authentic data to construct the final experimental datasets. This yields 75 (3 × 5 × 5) experimental datasets, resulting from three real datasets (ML100k, ML and SN), five attack sizes and five filler sizes [1].
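
The experimental grid can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the generation procedure of [1]: it assumes the filler size is the number of items rated by each abnormal profile, that the attack size is a fraction of the number of genuine users, and that ratings are drawn uniformly from 1 to 5. Only the random part of the Hurry attack is simulated (the specific ratings are omitted), and the user and item counts in the example call are hypothetical.

```python
import itertools
import random

DATASETS = ["ML100k", "ML", "SN"]
FILLER_SIZES = [30, 60, 90, 120, 150]
ATTACK_SIZES = [0.03, 0.06, 0.09, 0.12, 0.15]


def make_abnormal_profiles(n_genuine_users, n_items, filler_size, attack_size, rng):
    """Generate random rating profiles (assumed simplification of the attack).

    Returns a list of dicts mapping item index -> rating in 1..5.
    """
    # Assumption: attack size is relative to the number of genuine users.
    n_profiles = round(attack_size * n_genuine_users)
    profiles = []
    for _ in range(n_profiles):
        # Assumption: each profile rates `filler_size` distinct random items.
        items = rng.sample(range(n_items), filler_size)
        profiles.append({item: rng.randint(1, 5) for item in items})
    return profiles


# One configuration per (dataset, filler size, attack size) combination.
configs = list(itertools.product(DATASETS, FILLER_SIZES, ATTACK_SIZES))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
example = make_abnormal_profiles(
    n_genuine_users=1000,  # hypothetical dataset size
    n_items=1700,          # hypothetical catalogue size
    filler_size=30,
    attack_size=0.03,
    rng=rng,
)
```

Enumerating the grid with `itertools.product` gives the 3 × 5 × 5 = 75 configurations mentioned above; in the actual experiments the generated profiles would then be appended to each authentic rating matrix.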

Downloads

    • You can download the MATLAB code of the method proposed in [1] from URL
    • You can download the datasets of the method proposed in [1] from URL
    • See the corresponding readme files for more details.

Publications

[1] C. Panagiotakis, H. Papadakis, and P. Fragopoulou, Unsupervised and Supervised Methods for the Detection of Hurriedly Created Profiles in Recommender Systems, International Journal of Machine Learning and Cybernetics, 2020.

[2] C. Panagiotakis, H. Papadakis and P. Fragopoulou, Detection of Hurriedly Created Abnormal Profiles in Recommender Systems, International Conference on Intelligent Systems, 2018.