Aims
- friendship paradox and sampling in networks
- emergence of a giant connected component
- molloy-reed criterion in random networks
Questions
- what is special about the generating functions $G_0$ and $G_1$ in sparse Erdos-Renyi networks?
- In sparse Erdos-Renyi networks, $np$ is a constant $\lambda$. $P(X=m)=\frac{\lambda^m e^{-\lambda}}{m!}$
- what is $G_0$ and what is $G_1$
- $G_0$: the generating function of the degree of a node
- $G_1$: the generating function of the degree of a neighbor of a node, not including the link degree 1 between them.
- Speciality: $G_0=G_1$
- In which case can we calculate the number of nodes at distance $l$ of a randomly chosen node $i$ as $\langle k\rangle ^l$ ?
- when the degree follows Poisson distribution.
- Under which conditions can we use generating functions to study connected components?
- we use $H_1$ to count the component size of a node in forward direction. We don’t count the node that we arrived from.
- what is the Molloy-Reed criterion?
- when the ratio of 2-nd raw moment and 1-st raw moment bigger than 2, the gcc will emerge
- what is the condition for the existence of a giant connected component in the $G(n,p)$ model?
- $np>1$
- what is the condition for the existence of a giant connected component in the $G(n,m)$ model?
- $p = 2m/(n-1)/n$. So, $2m/(n-1)>1$
- In an Erdos-Renyi network, what does the Molloy-Reed criterion imply for $G_0’(1)=\langle k\rangle$ and $G_1’(1)=\langle k_n\rangle-1$
- $G_1’(1)>1$
Keys
giant connected components, $G_0$, $G_1$, Molloy-Reed criterion, self-consistency condition
Sparse Erdos-Renyi network: Poisson Distribution Degree
Friendship paradox can be interpreted as a sampling bias.
We define $G_1(x)=\frac{G_0’(x)}{G_0’(1)}$
In sparse Erdos-Renyi network,
$$
P(X=m)=\frac{\lambda^m e^{-\lambda}}{m!}
$$
$$
G_0(x)=\sum_{i=1} p(i)x^i=\sum_{i=1} \frac{(x\lambda)^i e^{-\lambda}}{i!}
$$
$$
=e^{-\lambda}\sum_{i=1} \frac{(x\lambda)^i }{i!}=e^{-\lambda}e^{x\lambda}
$$
$$
G_0’(x)=e^{-\lambda}\lambda e^{x\lambda}
$$
$$
G_1(x)=\frac{G_0’(x)}{G_0’(1)} = \frac{e^{x\lambda}}{e^\lambda}=G_0(x)
$$
$$
\langle k_n\rangle = G_1’(1)+1=G_0’(1)+1=\langle k\rangle + 1
$$
- $G_0$ and $G_1$ are identical.
- if we don’t count the edge between the node and its neighbor, they have the same average degree.
- All of the assumption is the degree follows Poisson Distribution.
- the generating function $G_0$ contains all information about the random topology, which makes them particularly easy to study
Component size
Question
when does a giant connected component arise?
Exists giant connected component $\Leftrightarrow$ The average component size $\langle s\rangle$ diverges with $n\to\infty$
Speaking of something’s average, of course, we can use the generating function.
How to find the generating function describing $P(C=s)$ , the probability of a component with size $s$
Here, the definition of “component” involves “direction”. That is a node’s component doesn’t include its father node.
Generating Functions
$H_1$
We define $H_1$ as the generating function of the component size that includes a node as some other node’s neighbor.
$H_1$ satisfies: the sum of its children’s component size add 1, count itself, is this node’s component size. Since its children also satisfy themselves as some node’s neighbor. Here, the neighbor is this node. It can also be calculated by $H_1$. We have
$$
H_1(x)=xG_1(H_1(x))
$$
- Here, we use $G_1$ instead of $G_0$. Because we only count the node’s children, not including its father. The difference of $G_0$ and $G_1$ is direction.
- $G_0$ doesn’t take direction as a count. All the neighbors of a node are counted.
- $G_1$ indeed considers direction. Its father node won’t be counted, but only its children.
- This is also called self-consistency condition.
$H_0$
We define $H_0$ as the generating function of the component size of a node.
Based on $H_1$, it is easy to calculate
$$
H_0=xG_0(H_1(x))
$$
Hence,
$$
\langle s\rangle = H_0’(1)
$$
$\langle s\rangle$
We cannot get the exact formula of $H_0$ and $H_1$, but we can get the value of $\langle x\rangle$.
$$
\langle s\rangle = H_0’(1)=[G_0(H_1(x))+xG_0’(H_1(x))H_1’(x)]_{x=1}
$$
$$
=G_0(H_1(1))+G_0’(H_1(1))H_1’(1)
$$
It is good to know all the generating function is 1 at $x=1$
So,
$$
\langle s\rangle =G_0(1)+G_0’(1)H_1’(1)
$$
We only need to calculate $H_1’(1)$
$$
H_1’(x)=G_1(H_1(x))+xG_1’(H_1(x))H_1’(x)
$$
$$
H_1’(1)=G_1(H_1(1))+G_1’(H_1(1))H_1’(1) = 1+G_1’(1)H_1’(1)
$$
We can get
$$
H_1’(1)=\frac{1}{1-G_1’(1)}
$$
Therefore,
$$
\langle s\rangle = 1+\frac{G_0’(1)}{1-G_1’(1)}=\frac{\langle k\rangle}{2-\langle k_n\rangle} = \frac{\langle k\rangle}{2-\frac{\langle k^2\rangle}{\langle k\rangle }}
$$
- $G_0’(1)$: the average degree $\langle k\rangle$
- $G_0’(1)$: the mean neighbor degree $\langle k_n\rangle -1$
Emergency of the giant connected component
$\langle k\rangle \to\infty$, very dense network
$\langle k_n\rangle - 1\to 1\Rightarrow \langle k_n\rangle \to 2$, sparse
In sparse Erdos-Renyi model, we have $np\to\infty$ as a fixed number.
$$
\langle k_n\rangle = \frac{\langle k^2\rangle}{\langle k\rangle}=\frac{np(1+np)}{np}=1+np
$$
when $np>1$, it has a giant connected component.The interpretation: each node needs to be connected-on-average- to at least one other node.
Question??? If $np>1$, according to this calculation, the average component size will be less than 0. How to explain???? The definition of component is wired here.
Molloy-Reed Criterion
There must be a gcc for
$$
\frac{\langle k^2\rangle}{\langle k\rangle}>2
$$
Phase transition phenomena
At some point, the small change of the parameter will cause an abrupt change of some system’s properties.