CN8 Random Walks and Diffusion


Objectives

  • topology’s influence on dynamical process
  • model and analyze diffusion process in complex networks.
  • random walk model for diffusion
  • Markov chain convergence theorem
  • Feedback centrality measures

Questions

  • how is the transition matrix of a random walk defined for weighted and unweighted networks?
    • what is the entry in the transition matrix
    • what is the effect of weight?
    • $T_{ij}=\frac{A_{ij}}{\sum_k A_{ik}}$
  • When does a random walk in a network converge to a unique stationary distribution?
    • the transition matrix: irreducible, aperiodic
  • How can we compute the stationary distribution of a random walk?
    • multiply the transition matrix many times enough.
    • calculate the eigenvector corresponding to the eigenvalue1 of the transition matrix.
  • How is total variation distance defined? What is its maximum value?
    • the half of the sum of absolute difference in each dimension.
    • 1
  • For a right-stochastic transition matrix T and an initial distribution $\pi^{(0)}$, how can we compute the visitation probabilities after t steps of a random walk?
    • $\pi^{(0)}T^t$
  • Can you give an example for a network in which a random walk never reaches a stationary distribution?
    • $\pi^{(0)}=(1,0,0)$
    • $P(1\to2)=1,P(2\to3)=1,P(3\to1)=1$
  • Can you give an example for a network in which a random walk reaches multiple stationary distributions for different initial distributions?
    • separated network with many clusters, within each cluster it will reach a stable distribution.
  • How can a periodic transition matrix be made aperiodic?
    • add small random values on the diagonal entries.

Keywords

dynamic process, total variation distance, transition matrix

Dynamic Process

Definition

  • Dynamic: means the nodes’ states changing with time. The network topology is static.

Classification

  • Four classes:
    • synchronization: the purpose of the common wave length or rhythm is to strengthen the group bond, like homophily. Property’s effect on network evolution
      • e.g: people sharing the same culture work together
    • consensus: nodes adopt state of neighbors. Network evolution’s effect on property change.
      • e.g.: opinion formation
    • propagation: a process by which some initial quantities multiply or proliferate throughout the network. Non-conserved quantity: spreading to others doesn’t reduce the original quantity.
      • e.g.: disease broadcast, rumours proliferation.
    • diffusion: net movement of anything (e.g., ideas, ions, molecules) from a region of higher concentration to a region of lower concentration. Conserved quantity.
      • e.g.: water, gas, electricity transportation.

Diffusion process

  • the topology’s influence
  • the role of nodes

Random Walk

Idea

using probability to model lack of knowledge.

generate random paths through a state space.

Markov Chains

If we use the Markov property (memoryless) to build random walk, we get a kind of random walk, memoryless Markov process.
$$
P(X_{t+1=s_{t+1}}|X_t=s_t)
$$

Transition Matrix

$A$​ is the adjacency matrix of the network. We can define transition matrix as
$$
T_{ij}=\frac{A_{ij}}{\sum_k A_{ik}}
$$

  • weighted/unweighted is decided by $A$.
  • the $T_{ij}$​​ is normalized, so we can scale out all the outgoing weights of a node without any change of $T$.

Row-stochastic/right-stochastic

the row sum of stochastic vector is 1.

Right means we multiply $T$ at the right side of a vector $vT$​. Here $v\in\mathbb{R}^{1\times N}$

Aperiodicity

State i is said to be periodic with period t iff

  • $T_{ii}^{(n)} = 0, \forall n\neq t, 2t, 3t, \cdots$
  • $T_{ii}^{(n)} \neq 0, \forall n= t, 2t, 3t, \cdots$

Write in another way
$$
r(s_i)= \text{gcd}{t\geq 1: (T^t)_{ii}>0}
$$

  • aperiodic: $r(s_i)=1,\forall i\in V$
  • periodic: otherwise

How to make a matrix aperiodic?

  • add small random value on the diagonal items.
  • $T_{ii}=\epsilon,\forall v\in V$
  • a random walk with such a transition matrix is called lazy.
  • a network with self-loop

Irreducibility

Reducible:
$$
\exists (i,j), \delta,\forall t>\delta, (T^t)_{ij}=0
$$
Means

  • strong connectivity of directed networks.
  • connectivity of undirected networks.

Markov Chain Convergence theorem

$T$: irreducible and aperiodic

for any initial distribution $\pi^{(0)}$​
$$
\exists| \pi,\delta(\pi^{(t)},\pi)\to 0 (t\to \infty)
$$
the stable distribution is independent from the initial distribution.

Existence

if there is a periodic behavior, the network doesn’t have a stationary distribution.

Lazy

Uniqueness

means for all possible initial distribution, have only one final stable distribution. Requires irreducibility.

Initial Distribution
$$
\pi^{(0)}\in\mathbb{R}^{1\times N}
$$

  • the sum is 1.
  • all entries are non-negative.

$$
\pi^{(t)}=\pi^{(0)}\cdot T^t
$$

Stationary distribution

$$
\pi^{(t)}= \pi^{(t+1)}\cdot 1=\pi^{(t)}\cdot T
$$

  • the left eigenvector of $T$ corresponding to eigenvalue $v=1$.
  • gives a notion of node centrality, feedback centrality.

View from another point:

$T^{t}_{ij}$: the probability of node $i$​​​ goes to node $j$​​​​​ after $t$​​​​ steps.

It should be unchanged when $t$​​ is large enough.

That is, there is a $t_0$​​​,

$$
\forall t>t_0 \forall k>0 T^{t+k}{ij}=T^t{ij}
$$

  • That means, $T^{t+1} = T^t =T^t\cdot T$ Each row of $T^t$ is the eigenvector of $T$, with the eigenvalue 1.
  • The stationary distribution is also the eigenvector of $T$ with the eigenvalue 1.
  • Each row of $T^t$ is the same, which is exactly the stationary distribution $\pi$.

To network

  • undirected network: $\pi_i=\frac{d_i}{2m}$ directly related to the node degrees
  • directed network:
    • the number of nodes pointing to a node
    • the importance of these pointing nodes

Total Variation distance

$$
\delta(\pi_1,\pi_2)=\frac{1}{2}\sum_i |\pi_{1,i}-\pi_{2,i}|
$$

  • normalized by 2: to restrict the value less than or equal to 1. The biggest value of the sum of differences for each dimension is 2, with the 1-sum restriction.
  • used to measure if a sequences of vectors converges.

Others

Connectivity of directed networks

  • strongly connected: all pairs of nodes $i,j$​ have a path from $i$​ to $j$​ and from $j$​ to $i$.
  • weakly connected: if the corresponding undirected network is connected.

Feedback Centrality

$$
\pi = \pi \cdot T
$$

$$
\pi_j = \sum_i \pi_i T_{ij}
$$

  • feedback: the centrality of those nodes that point to a node influence the node’s centrality.

  • a node that is pointed to by “important” nodes should be “intuitively” more important than those pointed by “unimportant” nodes.

  • a node’s importance transition on others will be reduced with more outgoing links.

Zero-indegree

Since a node’s eigenvector, feedback centrality, is determined by its in-going nodes, a node with indegree 0 has 0 centrality.

Useless for directed acyclic graphs.

Solution: PageRank

assign constant amount of one unit of centrality “for free”
$$
\alpha x= x\cdot T+\vec{1}
$$

$$
x_j =\frac{1}{\alpha}\sum_i x_i\frac{A_{ij}}{d_{out}(i)}+1
$$

Eigenvector Centrality

$$
\alpha x= x\cdot A
$$

$$
x_j=\frac{1}{\alpha}\sum_i x_iA_{ij}
$$

  • $\alpha$ is the largest eigenvalue of $A$
    • usually, we take the largest one, but not necessarily.
  • $T$ is replaced by the adjacency matrix $A$. The difference is whether to normalize the degrees. In another way, whether “conserve quantity”.
    • $T$: conserve
    • $A$: not conserve
    • The importance of a node will keep even with more outgoing links.

Centrality comparison

  • degree: local measure
  • Betweenness, closeness, eigenvector, and PageRank: global characteristics


Author: Fululu
Reprint policy: All articles in this blog are used except for special statements CC BY 4.0 reprint policy. If reproduced, please indicate source Fululu !
  TOC