Multiple Local Community Detection with Integrated Node Attributes

Multiple Local Community Detection with Integrated Node Attributes

Introduction

Local multiple community detection is a critical task in social network analysis, aiming to uncover the complex affiliations and relationships among users within networks. Traditional approaches primarily rely on network topology, often neglecting the rich attribute information associated with nodes. This limitation restricts their ability to accurately identify overlapping communities, particularly in social networks where users exhibit diverse attributes and belong to multiple groups. To address this gap, the proposed algorithm, Multiple Local Community Detection with Integrated Node Attributes (MLCDINA), integrates both structural and attribute information to enhance community detection accuracy.

MLCDINA introduces several innovations to improve local community discovery. First, it combines structural and attribute similarities to compute edge weights, providing a more comprehensive measure of node relationships. Second, it introduces a weighted local clustering coefficient that accounts for both structural density and attribute homogeneity within neighborhoods. Finally, the algorithm employs a novel Intimacy Random Walk (IRW) to assess node importance, reducing the influence of peripheral nodes while emphasizing core community members.

Background and Motivation

Social networks are inherently complex, with users participating in multiple communities simultaneously. Traditional community detection methods often focus on non-overlapping partitions, failing to capture the nuanced relationships in real-world networks. Local multiple community detection extends these methods by identifying overlapping communities centered around seed nodes. However, existing techniques predominantly rely on topological features, ignoring valuable node attributes such as user profiles, interests, or behavioral data.

The integration of node attributes can significantly enhance community detection. For instance, in a social network, users sharing similar interests or affiliations may form tight-knit communities even if their direct connections are sparse. By leveraging both structural and attribute data, MLCDINA provides a more holistic view of community structures, enabling more accurate identification of overlapping groups.

Algorithm Overview

MLCDINA consists of three main stages: subgraph sampling, core member identification, and local community discovery. Each stage is designed to leverage both structural and attribute information, ensuring robust community detection.

Subgraph Sampling

The algorithm begins by sampling a subgraph around a given seed node. This subgraph should be sufficiently large to encompass all potential communities containing the seed while remaining computationally tractable. To achieve this, MLCDINA performs a breadth-first search (BFS) to gather an initial subgraph. Edge weights within this subgraph are computed by combining structural similarity (using Jaccard similarity) and attribute similarity (using cosine similarity). A balance parameter α controls the relative influence of structural and attribute information.

Next, the algorithm iteratively expands the subgraph by incorporating neighboring nodes with high Importance of Integration of Structure and Attributes (IISA) scores. These scores reflect the combined structural and attribute relevance of nodes to the seed. The expansion process continues until a predefined node limit is reached, ensuring the subgraph remains manageable in size.

Core Member Identification

Core members are nodes that exhibit high IISA scores, indicating their central role within potential communities. To identify these members, MLCDINA computes IISA scores for all nodes in the sampled subgraph. The IISA score for a node is derived from the aggregated probabilities of random walks originating from its neighbors, weighted by both structural and attribute similarities.

Nodes with IISA scores exceeding that of the seed node are designated as core members. These members are further grouped into disjoint sets based on their connectivity within the subgraph, each representing the core of a distinct community.

Local Community Discovery

Using the identified core members and the seed node, MLCDINA discovers local communities through a personalized PageRank (PPR) approach adapted for weighted networks. The PPR process prioritizes nodes with strong structural and attribute ties to the core members, ensuring cohesive community formation.

To refine the detected communities, the algorithm performs addition and removal operations. Nodes outside the community with high IISA scores relative to the community are added, while internal nodes with low IISA scores are removed. This refinement enhances community quality by ensuring members are both structurally and attritively coherent.

Key Innovations

Edge Weight Computation

MLCDINA computes edge weights by fusing structural and attribute similarities. Structural similarity is measured using Jaccard similarity, which evaluates the overlap in neighbors between two nodes. Attribute similarity is measured using cosine similarity, which compares node attribute vectors. The combined edge weight provides a balanced measure of node relationships, capturing both topological proximity and attribute alignment.

Weighted Local Clustering Coefficient

Traditional local clustering coefficients measure the density of connections among a node’s neighbors but ignore attribute homogeneity. MLCDINA introduces a weighted variant that incorporates edge weights, reflecting both structural density and attribute similarity. This enhancement ensures that sampled subgraphs are not only densely connected but also exhibit high attribute consistency, improving community detection accuracy.

Intimacy Random Walk (IRW)

Standard random walks often overemphasize peripheral nodes due to their direct connections to central nodes. IRW addresses this by considering the intimacy between nodes, defined as the product of their mutual edge weights. This approach reduces the influence of weakly connected nodes while amplifying the importance of nodes with strong structural and attribute ties. As a result, IRW more accurately identifies core community members.

Experimental Evaluation

MLCDINA was evaluated on five real-world attributed networks: Politics-UK, Politics-IE, Football, Olympics, and Rugby. These datasets represent diverse social networks with varying community structures and attribute distributions.

Performance of IRW

Comparisons between IRW and the Active Random Walk (ARW) used in prior work demonstrated IRW’s superiority. IRW achieved higher F1-scores across all datasets, particularly in networks with larger and more attribute-cohesive communities, such as Politics-UK and Politics-IE. In networks with smaller, more heterogeneous communities (e.g., Football and Olympics), IRW still outperformed ARW, albeit by a smaller margin.

Comparison with Existing Algorithms

MLCDINA was compared against three state-of-the-art local multiple community detection algorithms: Multicom, HoSIM, and MLC. The results showed that MLCDINA consistently achieved the highest F1-scores across all datasets. Notably, it performed well in both large-community networks (e.g., Politics-UK) and small-community networks (e.g., Olympics), demonstrating its robustness to varying community structures.

Parameter Sensitivity

The balance parameter α, which controls the trade-off between structural and attribute information, was tested across a range of values. The optimal performance was observed at α = 0.5, indicating that equal weighting of structural and attribute data generally yields the best results. However, performance was less sensitive to α in networks with clearly defined communities (e.g., Politics-UK), suggesting that structural cues alone may suffice in such cases.

Conclusion

MLCDINA represents a significant advancement in local multiple community detection by effectively integrating node attributes with structural information. Its innovations—edge weight fusion, weighted local clustering coefficients, and Intimacy Random Walk—collectively enhance the accuracy and robustness of community detection. Experimental results confirm its superiority over existing methods, particularly in attributed social networks.

Future work could extend MLCDINA to directed networks, where asymmetric relationships introduce additional complexity. Moreover, exploring dynamic attributed networks, where both structure and attributes evolve over time, presents another promising direction.

doi.org/10.19734/j.issn.1001-3695.2024.06.0190

Was this helpful?

0 / 0