Neural Collaborative Filtering Recommendation Model for De-Exposure Bias Based on Fused Rewards
Introduction
Recommendation systems have become indispensable in the digital age, serving as critical bridges between users and products or services. These systems analyze user behavior and preferences to deliver personalized suggestions, enhancing user experience and engagement. However, despite their widespread adoption, recommendation systems face significant challenges, one of which is exposure bias. Exposure bias occurs when a system disproportionately recommends highly exposed items while neglecting those with lower exposure, even if the latter may align with user preferences. This bias stems from sparse interaction data and uneven exposure patterns, leading to a feedback loop where popular items gain more visibility, and less-exposed items remain obscure.
Traditional recommendation algorithms, including collaborative filtering and matrix factorization, often struggle to address exposure bias effectively. Neural Collaborative Filtering (NCF) has emerged as a promising approach, leveraging deep learning to model complex user-item interactions. However, NCF still relies heavily on historical interaction data, which tends to be biased toward frequently exposed items. To mitigate this limitation, researchers have explored hybrid models that combine NCF with exploration-focused techniques, such as multi-armed bandit algorithms.
This paper introduces a novel approach called the Neural Collaborative Filtering Recommendation Model for De-Exposure Bias Based on Fused Rewards (NCF_Reward). The model integrates NCF with the Linear Upper Confidence Bound (LinUCB) algorithm to balance exploration and exploitation in recommendations. By embedding reward features derived from LinUCB into the NCF framework, the model enhances its ability to identify and promote low-exposure items with high potential value.
Background and Related Work
Exposure Bias in Recommendation Systems
Exposure bias arises from the implicit assumption that users can only interact with items they are exposed to. Consequently, unobserved interactions do not necessarily indicate disinterest but rather a lack of exposure. This bias skews recommendation systems toward over-recommending popular items, creating a feedback loop that further marginalizes less-exposed items. Several studies have attempted to address this issue.
For instance, Liang et al. proposed ExposureMF, a probabilistic model that treats exposure as a latent variable. While effective in certain scenarios, ExposureMF struggles with dynamic user behavior and lacks validation in real-time environments. Other approaches, such as adversarial regularization and inverse propensity scoring, have shown promise but face challenges in handling sensitive data attributes or adapting to dynamic environments.
Neural Collaborative Filtering
NCF represents a significant advancement over traditional collaborative filtering by using neural networks to model user-item interactions. Unlike matrix factorization, which relies on linear dot products, NCF employs multi-layer perceptrons (MLPs) to capture non-linear relationships. The model consists of an embedding layer that maps user and item IDs into dense vectors, followed by neural layers that learn interaction patterns. The output layer predicts user preferences, typically optimized using binary cross-entropy loss for implicit feedback data.
Despite its strengths, NCF suffers from exposure bias because it primarily learns from observed interactions, which are skewed toward high-exposure items. This limitation motivates the integration of exploration mechanisms to uncover hidden user preferences.
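The MLP branch described above can be sketched as follows. This is a minimal illustration, assuming PyTorch and binary cross-entropy training on implicit feedback; the layer sizes and vocabulary sizes are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class NCF(nn.Module):
    """Minimal NCF sketch: ID embeddings feed an MLP that scores a user-item pair."""
    def __init__(self, n_users, n_items, dim=64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)   # maps user IDs to dense vectors
        self.item_emb = nn.Embedding(n_items, dim)   # maps item IDs to dense vectors
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(),
            nn.Linear(dim, dim // 2), nn.ReLU(),
            nn.Linear(dim // 2, 1),                  # interaction score
        )

    def forward(self, users, items):
        x = torch.cat([self.user_emb(users), self.item_emb(items)], dim=-1)
        return torch.sigmoid(self.mlp(x)).squeeze(-1)  # preference probability

model = NCF(n_users=100, n_items=50)
scores = model(torch.tensor([0, 1]), torch.tensor([3, 7]))
```

The sigmoid output pairs naturally with `nn.BCELoss` for the implicit-feedback objective mentioned above.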
Multi-Armed Bandit Algorithms
Multi-armed bandit (MAB) algorithms, particularly contextual bandits like LinUCB, are designed to balance exploration and exploitation. LinUCB leverages contextual information to estimate the potential reward of each item and selects items with the highest upper confidence bounds. This approach ensures that the system explores less-exposed items while exploiting known high-reward items.
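A compact sketch of the disjoint LinUCB variant described above, assuming NumPy; one ridge-regression model is kept per arm (item), and the selection rule adds an exploration bonus proportional to the model's uncertainty about the context. Class and method names are ours, not from the paper.

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB: per-arm ridge regression with an upper-confidence bonus."""
    def __init__(self, n_arms, d, alpha=1.0):
        self.alpha = alpha                             # exploration strength
        self.A = [np.eye(d) for _ in range(n_arms)]    # d x d design matrices
        self.b = [np.zeros(d) for _ in range(n_arms)]  # reward-weighted contexts

    def ucb(self, arm, x):
        A_inv = np.linalg.inv(self.A[arm])
        theta = A_inv @ self.b[arm]                    # ridge estimate of arm weights
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def select(self, x):
        return int(np.argmax([self.ucb(a, x) for a in range(len(self.A))]))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

bandit = LinUCB(n_arms=3, d=4)
context = np.ones(4) / 2
arm = bandit.select(context)
bandit.update(arm, context, reward=1.0)
```

Rarely played arms keep a large `A_inv`, so their confidence bonus stays high, which is exactly the mechanism that surfaces low-exposure items.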
Recent work has explored combining neural networks with bandit algorithms. For example, DeepLinUCB uses deep learning to enhance reward prediction, while ENR integrates user behavior and contextual data for improved recommendations. However, these methods often focus on single behaviors (e.g., clicks) and lack comprehensive evaluation across diverse recommendation scenarios.
Proposed Model: NCF_Reward
The NCF_Reward model addresses exposure bias by fusing NCF with LinUCB-generated reward features. The model architecture consists of three main components:
User and Item Feature Extraction
The model begins by extracting user and item features from the dataset. For users, attributes such as ID, age, gender, and occupation are encoded into dense vectors. Similarly, item features include ID and genre information. These features are processed through embedding layers and fully connected networks to generate compact representations.
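The user-side encoding might look like the following sketch, assuming PyTorch. The vocabulary sizes (age buckets, occupation codes) and embedding dimensions are hypothetical stand-ins, not values from the paper.

```python
import torch
import torch.nn as nn

class UserTower(nn.Module):
    """Illustrative user-feature encoder: each categorical attribute is embedded,
    the embeddings are concatenated, and a fully connected layer compresses them."""
    def __init__(self, n_users, n_ages=7, n_genders=2, n_jobs=21, dim=32):
        super().__init__()
        self.id_emb = nn.Embedding(n_users, dim)
        self.age_emb = nn.Embedding(n_ages, 8)
        self.gender_emb = nn.Embedding(n_genders, 4)
        self.job_emb = nn.Embedding(n_jobs, 8)
        self.fc = nn.Linear(dim + 8 + 4 + 8, dim)  # compact user representation

    def forward(self, uid, age, gender, job):
        x = torch.cat([self.id_emb(uid), self.age_emb(age),
                       self.gender_emb(gender), self.job_emb(job)], dim=-1)
        return torch.relu(self.fc(x))

tower = UserTower(n_users=943)  # 943 users, as in MovieLens-100K
u = tower(torch.tensor([0, 1]), torch.tensor([2, 3]),
          torch.tensor([0, 1]), torch.tensor([5, 10]))
```

An analogous item tower would embed the item ID and a multi-hot genre vector.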
Reward Value Feature Extraction
LinUCB is employed to compute reward values for each user-item pair. The algorithm estimates the expected reward based on historical interactions and contextual features. Items with high uncertainty (i.e., low exposure but high potential) are assigned higher exploration rewards. These reward values are then embedded into the NCF framework to guide the recommendation process.
Fused NCF Architecture
The final model combines the user and item embeddings with the reward features. The matrix factorization (MF) component computes element-wise products of user and item vectors, while the MLP component learns non-linear interactions. The outputs of both components are concatenated and passed through a final layer to generate recommendation scores.
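The fusion step can be sketched as below, assuming PyTorch. The treatment of the LinUCB reward as a scalar projected into an embedding, and all layer sizes, are our illustrative assumptions about how the three signals are combined.

```python
import torch
import torch.nn as nn

class NCFReward(nn.Module):
    """Fusion sketch: GMF element-wise product + MLP interaction + reward feature,
    concatenated and scored by a final linear layer."""
    def __init__(self, n_users, n_items, dim=64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        self.reward_proj = nn.Linear(1, dim)   # embed the scalar LinUCB reward
        self.out = nn.Linear(3 * dim, 1)       # fuse all three branches

    def forward(self, users, items, rewards):
        u, v = self.user_emb(users), self.item_emb(items)
        gmf = u * v                                        # MF branch
        mlp = self.mlp(torch.cat([u, v], dim=-1))          # MLP branch
        r = self.reward_proj(rewards.unsqueeze(-1))        # reward branch
        fused = torch.cat([gmf, mlp, r], dim=-1)
        return torch.sigmoid(self.out(fused)).squeeze(-1)

model = NCFReward(n_users=100, n_items=50)
scores = model(torch.tensor([0, 1]), torch.tensor([3, 7]),
               torch.tensor([0.8, 0.2]))
```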
Experimental Evaluation
Datasets and Setup
The model was evaluated on two widely used datasets: MovieLens-100K and MovieLens-1M. These datasets contain user ratings for movies, with significant disparities in item exposure. The data was split into training (60%), validation (20%), and test (20%) sets.
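A 60/20/20 split over shuffled interaction indices can be produced as follows; the record count and seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000                                    # number of interaction records (illustrative)
idx = rng.permutation(n)                    # shuffle before splitting
train, val, test = np.split(idx, [int(0.6 * n), int(0.8 * n)])
```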
Evaluation Metrics
The performance was assessed using both fairness and accuracy metrics:
• Fairness Metrics: Exposure and the Gini coefficient measure how evenly recommendations are distributed across items.
• Accuracy Metrics: Hit Rate (HR), Normalized Discounted Cumulative Gain (NDCG), precision, recall, and coverage evaluate recommendation quality.
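The ranking and fairness metrics above can be computed as in this sketch, using the single-relevant-item formulations common in top-K evaluation; the helper names are ours, assuming NumPy.

```python
import numpy as np

def hit_rate_at_k(ranked, target, k=10):
    """1 if the held-out target item appears in the top-k ranked list, else 0."""
    return 1.0 if target in ranked[:k] else 0.0

def ndcg_at_k(ranked, target, k=10):
    """Single-relevant-item NDCG: discounted by the log of the target's rank."""
    if target in ranked[:k]:
        return 1.0 / np.log2(ranked.index(target) + 2)
    return 0.0

def gini(exposures):
    """Gini coefficient of item exposure counts: 0 = perfectly even, ~1 = concentrated."""
    x = np.sort(np.asarray(exposures, dtype=float))
    n = len(x)
    cum = np.cumsum(x)
    return (n + 1 - 2 * (cum / cum[-1]).sum()) / n
```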
Results
Ablation Study
Removing either the NCF component or the LinUCB reward component degraded performance, confirming that the two contribute synergistically. The full NCF_Reward model achieved the highest HR@10 and NDCG@10 scores.
Comparison with Baselines
NCF_Reward outperformed traditional NCF and state-of-the-art debiasing models (EBPR, FaiRIR_RL, FaiRIR_Sim). Notably, it increased exposure by approximately 60%, demonstrating superior fairness.
Hyperparameter Analysis
Experiments revealed that a learning rate of 0.0001 and an embedding dimension of 64 yielded optimal results. Higher learning rates led to unstable training, while lower dimensions reduced model capacity.
Case Study
For a sample user, NCF_Reward significantly boosted the rankings of low-exposure items (e.g., from 265th to 14th), while high-exposure items saw moderate declines. This illustrates the model’s ability to rebalance recommendations.
Conclusion
The NCF_Reward model effectively mitigates exposure bias by integrating LinUCB-derived rewards into the NCF framework. Experimental results demonstrate substantial improvements in both fairness and accuracy, with a 60% increase in exposure for low-exposure items. The model’s success lies in its ability to balance exploration and exploitation, ensuring diverse and relevant recommendations.
Future work could explore dynamic reward adjustment and the integration of additional contextual features (e.g., time, location) to further enhance performance. Extending the model to other domains, such as e-commerce or news recommendation, would also validate its generalizability.
doi.org/10.19734/j.issn.1001-3695.2024.05.0184