CN2 Network Essentials


Basic Knowledge of network

Questions

  • Define the average shortest path length of a network
    • the sum of the shortest path between every two nodes,
    • divided by the number of pairs
      • $n(n-1)$ for directed
      • or $\frac{n(n-1)}{2}$ for undirected
  • Why does the k-th power of an adjacency matrix count paths of length $k$?
  • according to the definition of matrix power.
  • Can you define the betweenness and closeness centrality of a node.
    • betweenness: the number of shortest path through this node.
      • if normalized, divided the number of shortest paths between node $s$ and $t$.
    • closeness: the average of the its distance to other nodes. Then, take the inverse.
  • Can you construct a network in which the node with highest betweenness centrality is the one with the smallest degree centrality?
    • the node between two fully-connected communities.
  • Can you explain how we can find the modularity $Q_{opt}$ of a network?
    • searching all the possible community divisions.
  • How is $Q_{max}$ different from $Q_{opt}$?
    • $Q_{max}$​ is a calculation method working on $Q$ and $C$, but $Q_{opt}$ is a maximize operation on all possible $Q(G,C)$
  • For a fully connected network with $n$ nodes and no self-loops and a single community $C_1$, can you show that $Q\to 0, n\to\infty$?
    • with a single community, we know that
      $$
      \delta(c_i, c_j) = 1, Q(G,C)=\frac{1}{2m}\sum_{i,j}(A_{ij}-\frac{d_id_j}{2m})
      $$

    • the fully connected means $d_i=n-1, m=\frac{n(n-1)}{2}, A_{ij}=1,\forall i\neq j$​

    • bring all these values we can get
      $$
      Q(G, C)= 1 -\frac{1}{2m}\sum_{i,j}\frac{(n-1)(n-1)}{n(n-1)}=1-\frac{1}{n(n-1)}nn\frac{(n-1)}{n}
      $$

    • Here, there may be some different ideas on if $\sum_{ij}$ include the diagonal objects, that is whether there are totally $n\times n$ elements or $n\times (n-1)$ elements. This is not a big influence.

  • For a network with n nodes, only connected by self-loops (represented by 1-elements on the diagonal) and each node i assigned to its own community $C_i$ , can you show that $Q\to\frac{1}{2}, n\to\infty$ ?
    • with its own community, we know that $\delta(c_i, c_j)=0,\forall i\neq j$​, $Q(G,C)=\frac{1}{2m}\sum_i(A_{ii}-\frac{d_id_i}{2m})$​
    • only self-loops means $d_i=2, m=n, A_{ii}=1$
    • bring all these values we can get $Q(G, C)=\frac{1}{2n}\sum_i(1-\frac{4}{2n})=\frac{1}{2}-\frac{2}{n}$
    • Note that the degree of each node is 2, not 1.

Keywords

modularity

Path

length of path

$$
\text{len}(n_0, n_1,\cdots, n_l)=l
$$

powers of adjacency matrix

elements $A_{ij}^k$​ counts paths of (exactly) length $k$​ between $i$ and $j$.

Distance of two nodes

the shortest length of path between these two nodes.

Diameter of network

the longest shortest path.

Use the symbol $\text{diam}(G)$

Average Shortest path length of network

“the typical shortest path”

use the symbol $\langle l\rangle$

Connected Components

maximally (means locally) connected subgraphs

  • size: the number of nodes
  • can be shown by the infinite power of adjacency matrix $A^k, k\to\infty$

    largest connected component

    the largest subgraph

    Giant connected component

    if the largest connected component contains almost all the nodes $\frac{|V’|}{|V|}\simeq 1$

Community

partitions of nodes, can be overlapping. But we only consider non-overlapping partition.

intuitively, the following two conditions should hold:

  • nodes within the same community should be “well-connected” to each other
  • nodes in different communities should have few connections.

Partition Quality: $Q$​

$$
Q(G,C)=\frac{1}{2m}\sum_{i,j}(A_{ij}-\frac{d_id_j}{2m})\delta(c_i, c_j)
$$

two special cases:

  • all nodes in a single community: $Q=0$
  • each community contains only one node: $Q=0$
  • theoretically maximum value $1$.

$Q_{opt}$​

$$
Q_{opt}=\max_C Q(G,C)
$$

the optimal partition quality. This is a property of a given network $G$. It is called modularity

  • object: $G$
  • less than 1, due to two reasons:
    • there is a single link $(b, d)$ that connects nodes in different communities
    • there are missing links within both communities (simply because the graph does not have enough links)

$Q_{max}$​

$$
Q_{\max}(G,C)=1-\frac{1}{2m}\sum_{i,j}(\frac{d_id_j}{2m})\delta(c_i, c_j)
$$

“max” here means a calculation method, guided by some max idea for given $G$ and $C$, not meaning any maximize process.

  • object: $G$ and $C$

Community assortativity coefficient

the ratio of the $Q_{opt}$ with $Q_{\max}$ got by this optimal $C$.

  • object: $G$.

  • quantify how close the topology is to the strongest possible partitioning, to a range between 0 and 1. This allow us to judge whether a network exhibits community structure or not.
    $$
    \frac{Q_{opt}(G)}{Q_{\max}(G,\tilde{C})}
    $$


Author: Fululu
Reprint policy: All articles in this blog are used except for special statements CC BY 4.0 reprint policy. If reproduced, please indicate source Fululu !
  TOC