CN10 Statistical Inference


Objectives

  • use ensemble perspective on complex networks
  • focus on community detection.

Questions

  • what is the difference between supervised and unsupervised machine learning?
    • whether we have the labelled y
  • what is the goal of statistical inference?
    • study properties of our observed network based on already studied statistical ensemble.
  • what is a probabilistic generative model?
    • generate the new objects according to the probability.
  • what is the likelihood function?
    • $P(A|B)$
  • why is the maximization of the likelihood function equivalent to the maximization of the log-likelihood function?
    • because the log function is monotonic.
  • Can you explain the parameters in the stochastic block model?
    • $E_{kl}$: the actual number of edges between block $k$​​ and block $l$​
    • $N_{kl}$: the maximal number of edges between block $k$​ and block $l$​.
    • $M_{kl}$: the probability of an edge in block $k$​ linked to one in block $l$​.
  • How can we calculate the maximum likelihood of a stochastic block model for a given block assignment vector $\vec{z}$?
    • first, find the most likely block matrix.
    • second, calculate the likelihood on that matrix.
  • Explain how we can use the stochastic block model to infer community structures in networks
    • for each block number, find the optimal block assignment
    • compare the likelihood under different block number and choose the one best balances the complexity and fitting.
  • Give an example for overfitting in the context of the stochastic block model.
    • each node has its own block.
  • What is “Occam’s razor” and what does it tell us about “overfitting”?
    • More things should not be used than are necessary.
    • try to find the balance between complexity and explanatory power.

Keywords

  • community detection
  • statistical inference
  • stochastic block model

Statistical

Statistical ensemble

Probabilistic generative model

A stochastic model that generates networks G with probability $P(G)$

  1. consider aggregate statistics (macrostates)
  2. define statistical ensemble of microstates
  3. study expected properties of microstates based on macrostate.

We start from abstract to numerous concrete cases and back to abstract.

Our objective is to study statistical ensemble.

Reverse way

  1. consider empirical network $G_e$
  2. from numerous statistical ensembles, select the most plausible one
  3. using our knowledge of that statistical model to make some conclusion of our network.

We start from one empirical observation, and go to abstract models and back to the empirical case finally.

Our objective is to study properties of our observed network based on already studied statistical ensemble.

In some way, the probabilistic generative model should be studied first.

Liklihood

the probability of a model to generate this given observation $G_e$
$$
L(\theta|G_e)=P_{\theta}(G_e)
$$

Maximum likelihood estimator

the estimation of parameters that maximize the likelihood of our observation.

log-likelihood function

turn the exponents into factors. A simplified calculation method.

  • what decides the likelihood?
    • observation (object)
    • generative model
    • studied statistical index

Community Detection

Methods

  • optimization of partition quality/modularity $Q$​.

  • spectral partitioning based on Fiedler vector of Laplacian matrix

    • This is hard partition right? means we can only partition on disconnected network, instead of finding subclusters in any network.
  • statistical inference

    • we define a space of statistical ensembles, each encoding different community structures.
    • choose the ensemble which maximizes our empirical network
    • decide our community structure based on the chosen ensemble.

    Question: what kind of statistical ensembles do we have?

Stochastic Block Model

idea

using latent variable: assign a node to a block.

  • stochastic: the block assignment is stochastic
  • block: the final matrix is based on blocks, not nodes as common seen in network. In other way, the basic entry is block.
  • it is a generalization of $G(n,p)$​ model
  • we can
    • use the stochastic block matrix and block assignment vector to generate a network,
    • use the block assignment and a network to find the maximum likelihood stochastic block matrix.

Comparison with other statistical ensembles

  • the ensembles what we have learnt:
    • given some statistical index, e.g. $p$ in $G(n,p)$ model
    • generates new networks
    • For each network, it will have a corresponding likelihood.
    • here, network is a variable.
  • stochastic block:
    • given an observed network, and block number
    • generates new cluster assignments
    • For each assignment, it will have a corresponding likelihood.
    • here, network is a parameter.

Calculation

For given number of blocks B, the likelihood for a block assignment vector $\vec{z}$:
$$
L(M,\vec{z})=\Pi_{(i,j)\in E}M_{z_iz_j}\cdot\Pi_{(i,j)\notin E}(1-M_{z_iz_j})
$$

$$
=\Pi_{1\leq k\leq l\leq B}M_{kl}^{E_{kl}}\cdot (1-M_{kl})^{N_{kl}-E{kl}}
$$

  • the matrix $M$: is an independent parameter given in ahead, not relying on others, e.g. block assignment. Different $M$ will give different likelihood. However, given $z$ ,we can find a $M_z$ returning the maximum likelihood.
  • $N_{kl}$ are decided by the vector $z$, the block assignment vector.
  • $E_{kl}$ are decided by the vector $z$ and the topology of the network.

We redefine

$$
\hat{L}(\vec{z})=L(\hat{M}, \vec{z})=\Pi_{1\leq k\leq l\leq B}\frac{E_{kl}}{N_{kl}}^{E_{kl}}(1-\frac{E_{kl}}{N_{kl}})^{N_{kl}-E_{kl}}
$$

optimal block assignment definition:
$$
\hat{\vec{z}}=\arg_{\vec{z}}\max \hat{L}(\vec{z})
$$
This block detection method is still based on the edge relationship.

Optimal number of communities

The above likelihood is calculated given the number of blocks. But what is the optimal number of blocks?

Special case

Each node is in its own cluster. In this situation, the block number is the same as the node number. The likelihood is 1. It can be interpreted that the assignment is the only possible one given the block number.

Conclusion

We need to find a balance between the model complexity and fitting level.


Author: Fululu
Reprint policy: All articles in this blog are used except for special statements CC BY 4.0 reprint policy. If reproduced, please indicate source Fululu !
  TOC