
Note | Robust detection of link communities in large social networks by exploiting link semantics - PART 2

Experiments

Dataset

We selected two datasets: the internal emails of the US energy company Enron (known from the Enron scandal and the California energy crisis), and the posts from three forums of the news site Reddit collected over three days. If user A comments on user B's post, a directed link from A to B is created, and that link carries the text of the comment.

Figure 2.1
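The link construction described above can be sketched in a few lines of Python. This is a minimal sketch that assumes a hypothetical input format: a `posts` dictionary mapping post IDs to their authors and a list of `(commenter, post_id, text)` comment tuples. The names `Link` and `build_links` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A directed link from a commenter to the author of the post they replied to."""
    source: str  # commenting user (A)
    target: str  # author of the post being commented on (B)
    text: str    # comment content attached to the link


def build_links(posts, comments):
    """Turn comments into text-carrying directed links.

    posts:    dict mapping post_id -> author (hypothetical format)
    comments: iterable of (commenter, post_id, text) tuples (hypothetical format)
    """
    links = []
    for commenter, post_id, text in comments:
        author = posts.get(post_id)
        if author is None:
            continue  # skip comments whose parent post is not in the data
        links.append(Link(source=commenter, target=author, text=text))
    return links


posts = {"p1": "B", "p2": "C"}
comments = [
    ("A", "p1", "Great point about energy prices"),
    ("C", "p1", "I disagree"),
    ("A", "p2", "Source?"),
    ("D", "p9", "orphan comment"),  # parent post missing, so it is dropped
]
for link in build_links(posts, comments):
    print(link.source, "->", link.target, ":", link.text)
```

Each resulting link keeps both its endpoints and its text, which is what allows a method to use the topology and the link semantics side by side.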

So how can we tell whether the detected communities are correct?

For the Enron dataset, Berkeley students have divided the user nodes into 11 communities, so the detected communities can be compared directly against these 11 ground-truth communities. For the Reddit dataset, the three forums serve as the ground truth, so the detected communities can be compared directly with them.

Figure 2.2

Baseline methods

We compared our method against eight state-of-the-art community detection algorithms. They cover methods based on network topology, on node content, and on link content, as well as both overlapping and non-overlapping methods (overlapping means that a user node can be placed into multiple communities), as shown in the figure.

Figure 2.3
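For methods that detect link communities, overlap arises naturally. A common convention, assumed here rather than taken from the paper, is to place a node in every community that contains at least one of its links. The sketch below uses a hypothetical dictionary of link communities, and `node_memberships` is an illustrative name.

```python
from collections import defaultdict


def node_memberships(link_communities):
    """Map each node to every community that contains at least one of its links.

    link_communities: dict community_id -> list of (source, target) links.
    A node whose links fall into several communities ends up in several
    communities, which yields an overlapping node partition.
    """
    membership = defaultdict(set)
    for cid, links in link_communities.items():
        for source, target in links:
            membership[source].add(cid)
            membership[target].add(cid)
    return dict(membership)


link_communities = {
    "energy": [("A", "B"), ("C", "B")],
    "politics": [("A", "D"), ("D", "E")],
}
for node, cids in sorted(node_memberships(link_communities).items()):
    print(node, sorted(cids))
# A ['energy', 'politics']
# B ['energy']
# C ['energy']
# D ['politics']
# E ['politics']
```

User A belongs to both communities because one of A's links is about energy and another is about politics, even though each link belongs to exactly one community.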

Evaluation metrics

F-score and Jaccard similarity are two metrics that measure how similar the detected communities are to the ground-truth communities. Both range from 0 to 1, and the higher they are, the better the community detection result.

Figure 2.4
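Both metrics compare a detected community C with a ground-truth community G as sets. F-score is 2|C ∩ G| / (|C| + |G|), the harmonic mean of precision |C ∩ G| / |C| and recall |C ∩ G| / |G|, and Jaccard similarity is |C ∩ G| / |C ∪ G|. How the scores are aggregated over many communities is not stated here, so the sketch below assumes a common best-match scheme that averages in both directions. The function names and the toy communities are hypothetical.

```python
def f1(a, b):
    """F-score between two member sets: 2|a & b| / (|a| + |b|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a, b):
    """Jaccard similarity between two member sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_match_score(detected, truth, sim):
    """Average best-match similarity, computed in both directions (assumed scheme).

    detected, truth: lists of sets (e.g. sets of link ids or node ids).
    sim: a set-similarity function such as f1 or jaccard.
    """
    def one_way(xs, ys):
        # For each community in xs, keep its best score against any community in ys.
        return sum(max(sim(x, y) for y in ys) for x in xs) / len(xs)

    return (one_way(truth, detected) + one_way(detected, truth)) / 2


truth = [{1, 2, 3}, {4, 5, 6}]
detected = [{1, 2}, {3, 4, 5, 6}]
print(round(best_match_score(detected, truth, f1), 3))       # 0.829
print(round(best_match_score(detected, truth, jaccard), 3))  # 0.708
```

Averaging in both directions penalizes a method both for missing ground-truth communities and for producing spurious ones.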

Results

Figure 2.5

Figure 2.6

Case study

For the case study, we selected the Reddit data from August 27, 2012 and compared our method with the SCI method.

The results of the SCI method are as follows:

Figure 2.7

The results of our method are as follows:
Figure 2.8

Figure 2.9

Another benefit of our approach is that it can produce a word cloud for each community through and :

Figure 2.10
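One simple way to turn a community's link texts into word-cloud weights is to count word frequencies. This is a swapped-in technique for illustration only: the paper builds its word clouds through its own method, and this plain frequency count is not the authors' procedure. The stop-word list and the function name `community_word_weights` are hypothetical.

```python
import re
from collections import Counter

# Tiny hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "of", "and", "to", "is", "in", "on", "about"}


def community_word_weights(link_texts, top_k=5):
    """Return the top_k most frequent words across one community's link texts.

    Simple frequency count (a stand-in, not the paper's method). The returned
    (word, count) pairs can be passed to a word-cloud renderer as weights.
    """
    counts = Counter()
    for text in link_texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    # Ties keep the order in which words were first encountered.
    return counts.most_common(top_k)


texts = [
    "Energy prices in California",
    "California energy crisis",
    "The crisis is about energy",
]
print(community_word_weights(texts, top_k=3))
# [('energy', 3), ('california', 2), ('crisis', 2)]
```

Because every link already carries its text, a community's vocabulary can be read straight off its links without any extra node attributes.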

Some suggestions:

Figure 2.11

Conclusion

Figure 2.12

The figure above shows the title of the article. Let's close by discussing and summarizing its keywords.

Robust: In traditional methods, when the network topology and the topic clusters do not coincide, performance drops sharply. Our method handles network topology and topic clustering separately, which gives it a degree of robustness to such mismatches.

Detection of Link Communities: The core task of the paper is community detection, specifically finding communities of links rather than communities of nodes.

Exploiting Link Semantics: The method relies on link semantics, that is, the text content carried by each link, such as the text of a comment.