Objectives
- topology’s influence on dynamical process
- model and analyze diffusion process in complex networks.
- random walk model for diffusion
- Markov chain convergence theorem
- Feedback centrality measures
Questions
- how is the transition matrix of a random walk defined for weighted and unweighted networks?
- what is the entry in the transition matrix
- what is the effect of weight?
- $T_{ij}=\frac{A_{ij}}{\sum_k A_{ik}}$
- When does a random walk in a network converge to a unique stationary distribution?
- the transition matrix: irreducible, aperiodic
- How can we compute the stationary distribution of a random walk?
- multiply the transition matrix many times enough.
- calculate the eigenvector corresponding to the eigenvalue1 of the transition matrix.
- How is total variation distance defined? What is its maximum value?
- the half of the sum of absolute difference in each dimension.
- 1
- For a right-stochastic transition matrix T and an initial distribution $\pi^{(0)}$, how can we compute the visitation probabilities after t steps of a random walk?
- $\pi^{(0)}T^t$
- Can you give an example for a network in which a random walk never reaches a stationary distribution?
- $\pi^{(0)}=(1,0,0)$
- $P(1\to2)=1,P(2\to3)=1,P(3\to1)=1$
- Can you give an example for a network in which a random walk reaches multiple stationary distributions for different initial distributions?
- separated network with many clusters, within each cluster it will reach a stable distribution.
- How can a periodic transition matrix be made aperiodic?
- add small random values on the diagonal entries.
Keywords
dynamic process, total variation distance, transition matrix
Dynamic Process
Definition
- Dynamic: means the nodes’ states changing with time. The network topology is static.
Classification
- Four classes:
- synchronization: the purpose of the common wave length or rhythm is to strengthen the group bond, like homophily. Property’s effect on network evolution
- e.g: people sharing the same culture work together
- consensus: nodes adopt state of neighbors. Network evolution’s effect on property change.
- e.g.: opinion formation
- propagation: a process by which some initial quantities multiply or proliferate throughout the network. Non-conserved quantity: spreading to others doesn’t reduce the original quantity.
- e.g.: disease broadcast, rumours proliferation.
- diffusion: net movement of anything (e.g., ideas, ions, molecules) from a region of higher concentration to a region of lower concentration. Conserved quantity.
- e.g.: water, gas, electricity transportation.
- synchronization: the purpose of the common wave length or rhythm is to strengthen the group bond, like homophily. Property’s effect on network evolution
Diffusion process
- the topology’s influence
- the role of nodes
Random Walk
Idea
using probability to model lack of knowledge.
generate random paths through a state space.
Markov Chains
If we use the Markov property (memoryless) to build random walk, we get a kind of random walk, memoryless Markov process.
$$
P(X_{t+1=s_{t+1}}|X_t=s_t)
$$
Transition Matrix
$A$ is the adjacency matrix of the network. We can define transition matrix as
$$
T_{ij}=\frac{A_{ij}}{\sum_k A_{ik}}
$$
- weighted/unweighted is decided by $A$.
- the $T_{ij}$ is normalized, so we can scale out all the outgoing weights of a node without any change of $T$.
Row-stochastic/right-stochastic
the row sum of stochastic vector is 1.
Right means we multiply $T$ at the right side of a vector $vT$. Here $v\in\mathbb{R}^{1\times N}$
Aperiodicity
State i is said to be periodic with period t iff
- $T_{ii}^{(n)} = 0, \forall n\neq t, 2t, 3t, \cdots$
- $T_{ii}^{(n)} \neq 0, \forall n= t, 2t, 3t, \cdots$
Write in another way
$$
r(s_i)= \text{gcd}{t\geq 1: (T^t)_{ii}>0}
$$
- aperiodic: $r(s_i)=1,\forall i\in V$
- periodic: otherwise
How to make a matrix aperiodic?
- add small random value on the diagonal items.
- $T_{ii}=\epsilon,\forall v\in V$
- a random walk with such a transition matrix is called lazy.
- a network with self-loop
Irreducibility
Reducible:
$$
\exists (i,j), \delta,\forall t>\delta, (T^t)_{ij}=0
$$
Means
- strong connectivity of directed networks.
- connectivity of undirected networks.
Markov Chain Convergence theorem
$T$: irreducible and aperiodic
for any initial distribution $\pi^{(0)}$
$$
\exists| \pi,\delta(\pi^{(t)},\pi)\to 0 (t\to \infty)
$$
the stable distribution is independent from the initial distribution.
Existence
if there is a periodic behavior, the network doesn’t have a stationary distribution.
Lazy
Uniqueness
means for all possible initial distribution, have only one final stable distribution. Requires irreducibility.
Initial Distribution
$$
\pi^{(0)}\in\mathbb{R}^{1\times N}
$$
- the sum is 1.
- all entries are non-negative.
$$
\pi^{(t)}=\pi^{(0)}\cdot T^t
$$
Stationary distribution
$$
\pi^{(t)}= \pi^{(t+1)}\cdot 1=\pi^{(t)}\cdot T
$$
- the left eigenvector of $T$ corresponding to eigenvalue $v=1$.
- gives a notion of node centrality, feedback centrality.
View from another point:
$T^{t}_{ij}$: the probability of node $i$ goes to node $j$ after $t$ steps.
It should be unchanged when $t$ is large enough.
That is, there is a $t_0$,
$$
\forall t>t_0 \forall k>0 T^{t+k}{ij}=T^t{ij}
$$
- That means, $T^{t+1} = T^t =T^t\cdot T$ Each row of $T^t$ is the eigenvector of $T$, with the eigenvalue 1.
- The stationary distribution is also the eigenvector of $T$ with the eigenvalue 1.
- Each row of $T^t$ is the same, which is exactly the stationary distribution $\pi$.
To network
- undirected network: $\pi_i=\frac{d_i}{2m}$ directly related to the node degrees
- directed network:
- the number of nodes pointing to a node
- the importance of these pointing nodes
Total Variation distance
$$
\delta(\pi_1,\pi_2)=\frac{1}{2}\sum_i |\pi_{1,i}-\pi_{2,i}|
$$
- normalized by 2: to restrict the value less than or equal to 1. The biggest value of the sum of differences for each dimension is 2, with the 1-sum restriction.
- used to measure if a sequences of vectors converges.
Others
Connectivity of directed networks
- strongly connected: all pairs of nodes $i,j$ have a path from $i$ to $j$ and from $j$ to $i$.
- weakly connected: if the corresponding undirected network is connected.
Feedback Centrality
$$
\pi = \pi \cdot T
$$
$$
\pi_j = \sum_i \pi_i T_{ij}
$$
feedback: the centrality of those nodes that point to a node influence the node’s centrality.
a node that is pointed to by “important” nodes should be “intuitively” more important than those pointed by “unimportant” nodes.
a node’s importance transition on others will be reduced with more outgoing links.
Zero-indegree
Since a node’s eigenvector, feedback centrality, is determined by its in-going nodes, a node with indegree 0 has 0 centrality.
Useless for directed acyclic graphs.
Solution: PageRank
assign constant amount of one unit of centrality “for free”
$$
\alpha x= x\cdot T+\vec{1}
$$
$$
x_j =\frac{1}{\alpha}\sum_i x_i\frac{A_{ij}}{d_{out}(i)}+1
$$
Eigenvector Centrality
$$
\alpha x= x\cdot A
$$
$$
x_j=\frac{1}{\alpha}\sum_i x_iA_{ij}
$$
- $\alpha$ is the largest eigenvalue of $A$
- usually, we take the largest one, but not necessarily.
- $T$ is replaced by the adjacency matrix $A$. The difference is whether to normalize the degrees. In another way, whether “conserve quantity”.
- $T$: conserve
- $A$: not conserve
- The importance of a node will keep even with more outgoing links.
Centrality comparison
- degree: local measure
- Betweenness, closeness, eigenvector, and PageRank: global characteristics