We plan to clarify the relationships and differences between our work and these additional references. In particular,
[0] does consider changes in item attributes or group labels due to newly observed user feedback over time (i.e., an item's popularity changes and, with it, the popularity group to which the item belongs). However, they focus on ensuring item-side exposure fairness between item groups with different levels of popularity. Thus, their method is not directly comparable with ours.
[1] does study the long-term effects of utility-fairness interventions between different demographic groups in a social network. However, the differences between our study and theirs are fundamental:
They study the impact of adding fairness constraints on the state of a social network, i.e., the difference in average network size between user groups, unlike our focus on performance disparity (PD; see the schematic definition after this list).
Their recommendation problem is link recommendation in a unipartite social network (e.g., friend recommendation on Facebook), while ours is item recommendation in a user-item bipartite graph.
Their relevance scores between users are not based on learned model parameters but on a linear combination of several unipartite network properties, such as the size of a user's current network and the number of common neighbors between users (i.e., triadic closure), which cannot be inferred from bipartite networks. For these reasons, this work is not directly comparable with ours.
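For clarity, by performance disparity (PD) we mean the gap in average recommendation utility between user groups; schematically (with illustrative notation, where $G_1$ and $G_2$ denote the two user groups and $u(\cdot)$ is a per-user utility such as NDCG or hit ratio):

$$\mathrm{PD} = \left| \frac{1}{|G_1|} \sum_{v \in G_1} u(v) \;-\; \frac{1}{|G_2|} \sum_{v \in G_2} u(v) \right|$$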
[2] aims to balance the trade-off between item-side exposure fairness and user-side individual utility fairness, which is different from our user-side group fairness (i.e., performance disparity). Specifically, while ensuring exposure fairness, they want the resulting reduction in recommendation utility to be allocated equally to every user.
[3] is the literature closest to our work, as they address user-side performance disparity in the training phase and use IPS-based ideas to ensure that the learned fairness remains consistent over all items in the test phase. However, their model is designed for static recommendation settings and is tested on a single random training/test split. In other words, they do not consider incremental update settings.
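(Here, "IPS-based" refers to inverse-propensity-scoring-style reweighting: each observed user-item pair is weighted by the inverse of its estimated observation propensity so that the training objective approximates an unbiased estimate over all user-item pairs. Schematically, with illustrative notation,

$$\hat{\mathcal{L}}_{\mathrm{IPS}} = \frac{1}{|\mathcal{U}||\mathcal{I}|} \sum_{(u,i):\,O_{u,i}=1} \frac{\ell(u,i)}{\hat{p}_{u,i}},$$

where $O_{u,i}=1$ indicates that the pair was observed, $\ell(u,i)$ is the per-pair loss, and $\hat{p}_{u,i}$ is its estimated propensity.)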
We have conducted additional experiments by adapting the method in [3] to our setting, i.e., we periodically retrain their model at each time period. Both this method and FADE use the matrix factorization backbone recommendation model and are tested on Task-R with the MovieLens and ModCloth datasets.
The results reveal that FADE is more effective at reducing PD in all cases, while the overall recommendation performance of the two methods is comparable. We suspect that the sub-optimal PD reduction of the baseline is due to its naive absolute-DPD loss design and its retraining-based model update strategy.
Moreover, since their method relies on retraining over all historical data, it requires significantly more runtime (FADE: 4.08s vs. [3]: 7232.23s), measured in the same setting as Table 2 in our paper. Even under the same retraining update setting, their per-epoch computation is more expensive than ours because of the additional IPS-related computation (Retrain-Fair: 1401.18s vs. [3]: 7232.23s). They also have significantly more learnable parameters: additional user/item embeddings for IPS and two neural networks for the two user groups.
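To make the runtime gap concrete, the following minimal sketch (illustrative only, with a hypothetical helper rather than our actual code) counts how many training interactions each per-period update strategy touches: retraining over the full history grows quadratically with the number of periods, while incremental updates stay linear in the newly arrived data.

```python
# Illustrative sketch only (hypothetical helper, not our actual implementation):
# compares the number of training interactions touched by per-period
# full-history retraining (adapted [3]) vs. incremental updates (FADE-style).

def interactions_processed(period_sizes, incremental):
    """Total training interactions processed across all time periods.

    period_sizes: list of the number of new interactions arriving per period.
    incremental:  True  -> each update trains only on the new interactions;
                  False -> each update retrains on the entire accumulated history.
    """
    total, history = 0, 0
    for n_new in period_sizes:
        history += n_new
        total += n_new if incremental else history
    return total

# Example: 10 periods, 10,000 new interactions each.
sizes = [10_000] * 10
print(interactions_processed(sizes, incremental=True))   # 100000  (linear)
print(interactions_processed(sizes, incremental=False))  # 550000  (quadratic growth)
```

This only illustrates the asymptotic training cost; the measured gap above is further widened by the additional IPS-related computation in [3].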
References
[0] Ge et al. 2021. Towards Long-term Fairness in Recommendation. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining (WSDM '21). ACM, 445–453.
[1] Nil-Jana Akpinar et al. 2022. Long-term Dynamics of Fairness Intervention in Connection Recommender Systems. In Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society (AIES '22). ACM, 22–35.
[2] Wu et al. 2021. TFROM: A Two-sided Fairness-Aware Recommendation Model for Both Customers and Providers. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21). ACM, 1013–1022.
[3] Jiakai Tang et al. 2023. When Fairness meets Bias: a Debiased Framework for Fairness-aware Top-N Recommendation. In Proceedings of the 17th ACM Conference on Recommender Systems (RecSys '23). ACM, 200–210.