Community Discovery in Public Opinion Social Networks Based on Improved Label Propagation Algorithm
Introduction
The rapid development of computer science and information technology has ushered in an era dominated by user-generated content on the internet. Social media platforms enable information to spread at unprecedented speeds, making public opinion on social networks a significant societal phenomenon. In 2023 alone, numerous high-impact events—ranging from social issues to international affairs—demonstrated the pervasive influence of social media. Understanding and analyzing these public opinion trends require identifying the key discussion topics within social networks. Community detection, a fundamental task in network analysis, plays a crucial role in uncovering these thematic clusters.
Traditional community detection methods can be broadly categorized into three types: similarity-based clustering, modularity optimization, and label propagation. While each has its strengths, label propagation algorithms (LPA) are particularly suitable for modeling dynamic opinion interactions in social networks. However, conventional LPA suffers from limitations such as susceptibility to local optima and randomness in label updates. This paper proposes an improved LPA that addresses these issues by incorporating node similarity and opinion dynamics, resulting in more stable and accurate community detection in public opinion networks.
Background and Related Work
Community Detection in Social Networks
Community detection aims to identify groups of nodes with dense internal connections and sparse external links. In early work, Girvan and Newman detected such groups by repeatedly removing the edges with the highest betweenness, and many algorithms have followed. Three primary approaches have emerged:
- Similarity-Based Clustering: These methods group nodes based on similarity metrics, often using techniques like hierarchical clustering or spectral clustering. While effective, they may overlook node attributes and interactions.
- Modularity Optimization: These methods maximize modularity, a measure of community structure quality. However, they struggle with resolution limits and computational complexity in large networks.
- Label Propagation: LPA, introduced by Raghavan et al., is efficient and scalable, making it ideal for large networks. Yet, its reliance on random label updates and uniform neighbor interactions limits its accuracy.
Challenges in Public Opinion Networks
Public opinion networks differ from general social networks in several ways:
• Semantic Relationships: Traditional networks built on follower or retweet relationships may not capture deep semantic connections. A semantic social network, constructed based on content similarity, better reflects thematic communities.
• Selective Exposure: Individuals tend to interact with like-minded peers, reinforcing their beliefs. However, the open nature of social platforms exposes them to diverse opinions, influencing their views.
• Node Influence: Different users (e.g., government accounts, influencers, or ordinary individuals) have varying levels of influence, which affects opinion dynamics.
These nuances necessitate improvements to LPA to better model real-world opinion interactions.
Limitations of Traditional LPA
Issues in Node Selection
Traditional LPA lets all of a node's neighbors vote on its new label, ignoring selective exposure. In reality, individuals are more likely to engage with peers who share their views, though they may still encounter differing opinions. Treating every neighbor alike therefore leads to suboptimal community detection.
Randomness in Label Updates
When several labels are equally frequent among a node's neighbors, LPA breaks the tie at random, which makes results unstable. The order in which nodes are updated is also arbitrary and disregards node influence. Because an early random choice can spread through the network, repeated runs on the same graph may yield inconsistent results.
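Both sources of randomness are visible in a minimal sketch of classic asynchronous LPA. It follows a common variant in which a node keeps its current label when that label is already among the most frequent, which guarantees termination; the function name `classic_lpa` and the adjacency-dict input format are illustrative assumptions, not taken from the paper.

```python
import random
from collections import Counter


def classic_lpa(adj, max_iter=100, seed=None):
    """Asynchronous label propagation in the style of Raghavan et al.

    `adj` maps each node to a list of neighbors (undirected graph).
    Returns a dict mapping each node to its final label.
    """
    rng = random.Random(seed)
    label = {v: v for v in adj}  # every node starts with a unique label
    nodes = list(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)  # randomness 1: arbitrary update order
        changed = False
        for v in nodes:
            if not adj[v]:
                continue  # isolated nodes keep their own label
            counts = Counter(label[u] for u in adj[v])
            top = max(counts.values())
            best = [lab for lab, c in counts.items() if c == top]
            if label[v] not in best:
                # randomness 2: ties between equally frequent labels are broken at random
                label[v] = rng.choice(best)
                changed = True
        if not changed:
            break  # every node already holds one of its most frequent neighbor labels
    return label
```

Running it with different `seed` values on the same graph can return different partitions, which is the instability described above.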
Proposed Improvements
Enhanced Node Selection
To address selective exposure, the improved LPA chooses which neighbors take part in label propagation according to content similarity. Each node's text is vectorized with doc2vec, and the cosine similarity between two vectors measures the semantic closeness of the corresponding nodes. The probability of selecting a neighbor is proportional to its similarity to the updating node. A parameter k controls the fraction of neighbors considered, and the similarity-weighted sampling introduces controlled randomness that helps avoid local optima.
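The selection step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the doc2vec vectors are already available as plain lists of floats, draws ceil(k × degree) neighbors without replacement, and clips non-positive cosine similarities to a tiny floor so that no neighbor is excluded outright. The names `cosine_similarity` and `select_neighbors` are hypothetical.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (e.g. doc2vec embeddings)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_neighbors(node, neighbors, vectors, k, rng=random):
    """Draw ceil(k * degree) distinct neighbors of `node`, without replacement,
    with each draw's probability proportional to cosine similarity."""
    pool = list(neighbors)
    if not pool:
        return []
    count = min(len(pool), max(1, math.ceil(k * len(pool))))
    # Non-positive similarities get a tiny floor so no neighbor is excluded outright.
    weights = [max(cosine_similarity(vectors[node], vectors[j]), 1e-9) for j in pool]
    chosen = []
    for _ in range(count):
        r = rng.random() * sum(weights)
        acc = 0.0
        for idx, w in enumerate(weights):
            acc += w
            if r < acc:
                break
        # If rounding leaves r >= acc, the loop ends on the last index, which is still valid.
        chosen.append(pool.pop(idx))
        weights.pop(idx)
    return chosen
```

With k = 1 every neighbor is returned, recovering the all-neighbor behavior of traditional LPA; with k < 1 the sample is biased toward semantically similar neighbors but remains random.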
Experiments show that an optimal k (e.g., 0.85 or 0.9) yields higher modularity than traditional LPA, which corresponds to k = 1 (all neighbors): 0.629 versus 0.602. This confirms that selective interaction enhances community detection quality.
Opinion-Driven Label Updates
To mitigate randomness, the algorithm incorporates the Hegselmann-Krause (HK) opinion dynamics model. Key enhancements include:
- Node Influence Ranking: Nodes are ranked by influence, calculated using attributes (e.g., follower count, engagement metrics) and network topology. High-influence nodes update first, preventing “reverse flow” errors.
- Opinion Integration: Nodes update their labels based on the average opinion of selected neighbors, weighted by influence. This ensures labels align with prevailing opinions in the community.
These modifications stabilize the algorithm, reducing variability in results.
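Node influence ranking can be sketched with the usual topological-potential formula, φ(i) = Σ_j m_j · exp(−(d_ij / σ)²), where d_ij is the hop distance between nodes i and j, m_j a node mass, and σ an impact factor. How the paper combines this with user attributes is not specified here, so the linear blend weighted by `alpha`, the per-attribute min-max normalization, and the attribute names in the test are assumptions; all function names are hypothetical.

```python
import math
from collections import deque


def hop_distances(adj, source):
    """Breadth-first search: hop count from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def topological_potential(adj, sigma=1.0, mass=None):
    """phi(i) = sum_j m_j * exp(-(d_ij / sigma)**2) over nodes j reachable from i,
    including i itself (d_ii = 0). Masses default to 1."""
    mass = mass or {v: 1.0 for v in adj}
    return {
        i: sum(mass[j] * math.exp(-((d / sigma) ** 2)) for j, d in hop_distances(adj, i).items())
        for i in adj
    }


def min_max(scores):
    """Rescale a {node: value} dict to [0, 1]; constant inputs map to 0."""
    low, high = min(scores.values()), max(scores.values())
    span = (high - low) or 1.0
    return {v: (s - low) / span for v, s in scores.items()}


def influence_ranking(adj, attributes, attr_weights, alpha=0.5, sigma=1.0):
    """Blend normalized topological potential with weighted, normalized attributes
    (hypothetical scheme) and return nodes from most to least influential."""
    phi = min_max(topological_potential(adj, sigma))
    normed = {name: min_max({v: attributes[v][name] for v in adj}) for name in attr_weights}
    attr = {v: sum(w * normed[name][v] for name, w in attr_weights.items()) for v in adj}
    influence = {v: alpha * phi[v] + (1 - alpha) * attr[v] for v in adj}
    order = sorted(adj, key=influence.get, reverse=True)
    return order, influence
```

On a star graph the hub has the largest potential and therefore updates first, which is the ordering the ranking step is meant to enforce.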
Algorithm Overview
The improved LPA operates as follows:
- Initialization: Assign unique labels and opinion values to nodes.
- Influence Calculation: Compute node influence using topological potential and attribute weights.
- Label Propagation: For each node, select a fraction (k) of neighbors based on similarity and update opinions using the enhanced HK model.
- Label Update: Assign the label closest to the average opinion of selected neighbors.
- Termination: Repeat until labels and opinions stabilize.
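The propagation, label-update, and termination steps can be sketched with a bounded-confidence update in the Hegselmann-Krause style: a node averages only those selected neighbors whose opinions lie within a confidence bound ε of its own. Several details here are assumptions rather than the paper's specification: the influence weighting, the default ε = 0.2, the tolerance, and reading "the label closest to the average opinion" as the label of the confident neighbor whose opinion is nearest the new average. Neighbor selection is passed in as a callable so the sketch stays self-contained; `hk_update` and `propagate` are hypothetical names.

```python
def hk_update(node, selected, opinion, label, influence, epsilon=0.2):
    """One bounded-confidence (Hegselmann-Krause style) update for `node`.

    Selected neighbors whose opinion lies within `epsilon` of the node's own
    opinion are "confident"; their opinions and the node's own are averaged
    with influence weights. The node then takes the label of the confident
    neighbor whose opinion is nearest that average, or keeps its label if no
    neighbor is confident.
    """
    x = opinion[node]
    confident = [j for j in selected if abs(opinion[j] - x) <= epsilon]
    group = [node] + confident
    # A tiny offset keeps the weights positive when some influences are 0.
    weights = [influence[j] + 1e-9 for j in group]
    new_opinion = sum(w * opinion[j] for w, j in zip(weights, group)) / sum(weights)
    if not confident:
        return new_opinion, label[node]
    nearest = min(confident, key=lambda j: abs(opinion[j] - new_opinion))
    return new_opinion, label[nearest]


def propagate(order, selected_neighbors, opinion, label, influence,
              epsilon=0.2, max_iter=100, tol=1e-6):
    """Sweep nodes in the given (descending-influence) order, updating opinions
    and labels in place, until no label changes and no opinion moves by `tol`
    or more. Returns the number of sweeps performed."""
    for sweep in range(1, max_iter + 1):
        changed, max_shift = False, 0.0
        for v in order:
            new_x, new_label = hk_update(v, selected_neighbors(v), opinion, label,
                                         influence, epsilon)
            max_shift = max(max_shift, abs(new_x - opinion[v]))
            changed = changed or new_label != label[v]
            opinion[v], label[v] = new_x, new_label
        if not changed and max_shift < tol:
            return sweep
    return max_iter
```

Updates are applied in place, so within each sweep the high-influence nodes act first and later nodes see their already-updated opinions and labels.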
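The algorithm’s time complexity is dominated by the influence calculation, O(n²), and the iterative updates, O(Lnkd), where n is the number of nodes, L the number of iterations, and d the average degree, so each node examines about kd neighbors per iteration. The update phase therefore grows linearly with n, while the one-off quadratic influence step is the main cost on very large networks.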
Experimental Validation
Dataset and Metrics
The study uses a dataset of 1,526 high-engagement posts from Sina Weibo discussing a 2022 violent incident in China. The network is constructed using semantic similarity. Performance is evaluated using:
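• Modularity (Q): Measures community structure quality as the fraction of edges that fall inside communities minus the fraction expected if edges were placed at random with the same node degrees.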
• Normalized Mutual Information (NMI): Compares detected communities to a reference (Louvain algorithm results).
• Adjusted Rand Index (ARI): Measures agreement between the detected and reference partitions, corrected for chance.
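For reference, the three metrics can be computed from scratch as below. Modularity follows Newman's formula for an undirected, unweighted graph, NMI uses arithmetic-mean normalization, and ARI uses the Hubert-Arabie adjustment; the function names are illustrative, and the paper may have used a library implementation instead.

```python
import math
from collections import Counter


def modularity(edges, membership):
    """Newman modularity of an undirected, unweighted graph:
    Q = sum over communities c of [L_c / m - (d_c / (2m))**2],
    where m is the edge count, L_c the edges inside c and d_c the total degree of c."""
    m = len(edges)
    inside, degree = Counter(), Counter()
    for u, v in edges:
        degree[membership[u]] += 1
        degree[membership[v]] += 1
        if membership[u] == membership[v]:
            inside[membership[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def _entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(labels_a, labels_b):
    """Normalized mutual information, I(A;B) / ((H(A) + H(B)) / 2)."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    mutual = sum((nab / n) * math.log(n * nab / (count_a[a] * count_b[b]))
                 for (a, b), nab in Counter(zip(labels_a, labels_b)).items())
    mean_entropy = (_entropy(labels_a) + _entropy(labels_b)) / 2
    return 1.0 if mean_entropy == 0 else mutual / mean_entropy


def _pairs(x):
    return x * (x - 1) / 2


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index (Hubert-Arabie): pair agreement corrected for chance."""
    index = sum(_pairs(c) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(_pairs(c) for c in Counter(labels_a).values())
    sum_b = sum(_pairs(c) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / _pairs(len(labels_a))
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions trivial
    return (index - expected) / (max_index - expected)
```

In practice, `networkx.algorithms.community.modularity`, `sklearn.metrics.normalized_mutual_info_score`, and `sklearn.metrics.adjusted_rand_score` compute the same quantities.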
Results
- Sensitivity to k: Modularity and NMI improve as k increases, plateauing around k = 0.7. Optimal values (k = 0.85, 0.9) yield the highest modularity (0.629 vs. 0.602 for k = 1).
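To put the reported modularity values in proportion, the relative gain of the best k (0.85 or 0.9) over traditional LPA (k = 1) works out as:

```latex
\frac{Q_{\text{best}} - Q_{k=1}}{Q_{k=1}}
  = \frac{0.629 - 0.602}{0.602}
  = \frac{0.027}{0.602}
  \approx 0.045 \quad (\approx 4.5\%)
```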
- Comparison with Baselines: The improved LPA outperforms traditional LPA and other variants (ITSLR, WILPAS, TS) in stability and accuracy.
- Community-Opinion Alignment: At convergence, nodes in the same community share similar opinions, validating the algorithm’s ability to reflect real-world opinion clusters.
Case Study
Four dominant themes emerge from the Weibo dataset:
- Demands for Justice: Calls to punish perpetrators (29% of nodes, opinion ~0.17).
- Distrust in Authorities: Skepticism over police reports (24%, opinion ~0.32).
- Criticism of Local Governance: Concerns about corruption (26%, opinion ~0.21).
- Women’s Safety: Discussions on self-protection (21%, opinion ~0.43).
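Per-community summaries like the node shares and approximate opinions above can be obtained by grouping the converged nodes by label. This sketch assumes the final labels and opinions are stored in dicts keyed by node; `community_summary` is a hypothetical helper, and the toy values in the test are made up and do not reproduce the Weibo results.

```python
from collections import defaultdict


def community_summary(label, opinion):
    """Map each community to (share of nodes, mean opinion), largest first."""
    groups = defaultdict(list)
    for node, community in label.items():
        groups[community].append(opinion[node])
    n = len(label)
    summary = {c: (len(ops) / n, sum(ops) / len(ops)) for c, ops in groups.items()}
    return dict(sorted(summary.items(), key=lambda item: item[1][0], reverse=True))
```

For example, three of four nodes labeled "a" with opinions 0.1, 0.2 and 0.3 give "a" a share of 0.75 and a mean opinion of about 0.2.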
These findings highlight the algorithm’s utility in identifying and analyzing public sentiment.
Practical Implications
- Identifying Key Themes: Authorities can pinpoint critical issues (e.g., distrust in officials) and tailor responses.
- Reducing Information Asymmetry: Timely clarifications (e.g., on police reports) can mitigate misinformation and foster consensus.
- Policy Formulation: Insights into community structure and opinions aid in designing targeted interventions.
Limitations and Future Work
While the improved LPA advances community detection, challenges remain:
• Overlapping Communities: The algorithm does not identify nodes belonging to multiple themes.
• Residual Instability: Controlled randomness improves exploration but introduces minor variability.
Future work could integrate overlapping community detection and further refine opinion dynamics modeling.
Conclusion
This paper presents an enhanced LPA for community detection in public opinion networks. By incorporating selective neighbor interaction and opinion-driven label updates, the algorithm achieves higher modularity and stability than traditional methods. Empirical results demonstrate its effectiveness in uncovering thematic communities and mapping opinion distributions, offering valuable tools for social media analysis and public sentiment monitoring.
doi.org/10.19734/j.issn.1001-3695.2024.06.0194