Bayesian clustering of huge friendship networks

Janne Aukia, 2007

Master's Thesis

Image of a network clustered using M0

Social networks have typically structure: there are dense groups of nodes and some nodes have disproportionately many connections. The structure emerges, because friendships are not formed randomly. Instead, people tend to become friends with those who are similar to themselves. This is called homophily.

The M0 algorithm finds clustering structure in networks with homophily by Bayesian statistical inference. The algorithm is based on a generative model for creating the edges of a network based on a mixture of latent components. The model parameters are inferred using Gibbs sampling. Because of homophily, the nodes that belong to the same cluster are likely to have similar traits.

In my master's thesis, an effective implementation of the M0 algorithm is introduced, which uses a balanced binary tree for storing the component probabilities. The implementation can be used on networks with even millions of nodes. The algorithm is tested on a range of well studied small networks and on a friendship network with over 600 000 users crawled from the Last.fm service.

Icon for a PDF file Bayesian clustering of huge friendship networks (4.5 MB)