Basic Knowledge of network
Questions
- Define the average shortest path length of a network
- the sum of the shortest path between every two nodes,
- divided by the number of pairs
- $n(n-1)$ for directed
- or $\frac{n(n-1)}{2}$ for undirected
- Why does the k-th power of an adjacency matrix count paths of length $k$?
- according to the definition of matrix power.
- Can you define the betweenness and closeness centrality of a node.
- betweenness: the number of shortest path through this node.
- if normalized, divided the number of shortest paths between node $s$ and $t$.
- closeness: the average of the its distance to other nodes. Then, take the inverse.
- betweenness: the number of shortest path through this node.
- Can you construct a network in which the node with highest betweenness centrality is the one with the smallest degree centrality?
- the node between two fully-connected communities.
- Can you explain how we can find the modularity $Q_{opt}$ of a network?
- searching all the possible community divisions.
- How is $Q_{max}$ different from $Q_{opt}$?
- $Q_{max}$ is a calculation method working on $Q$ and $C$, but $Q_{opt}$ is a maximize operation on all possible $Q(G,C)$
- For a fully connected network with $n$ nodes and no self-loops and a single community $C_1$, can you show that $Q\to 0, n\to\infty$?
with a single community, we know that
$$
\delta(c_i, c_j) = 1, Q(G,C)=\frac{1}{2m}\sum_{i,j}(A_{ij}-\frac{d_id_j}{2m})
$$the fully connected means $d_i=n-1, m=\frac{n(n-1)}{2}, A_{ij}=1,\forall i\neq j$
bring all these values we can get
$$
Q(G, C)= 1 -\frac{1}{2m}\sum_{i,j}\frac{(n-1)(n-1)}{n(n-1)}=1-\frac{1}{n(n-1)}nn\frac{(n-1)}{n}
$$Here, there may be some different ideas on if $\sum_{ij}$ include the diagonal objects, that is whether there are totally $n\times n$ elements or $n\times (n-1)$ elements. This is not a big influence.
- For a network with n nodes, only connected by self-loops (represented by 1-elements on the diagonal) and each node i assigned to its own community $C_i$ , can you show that $Q\to\frac{1}{2}, n\to\infty$ ?
- with its own community, we know that $\delta(c_i, c_j)=0,\forall i\neq j$, $Q(G,C)=\frac{1}{2m}\sum_i(A_{ii}-\frac{d_id_i}{2m})$
- only self-loops means $d_i=2, m=n, A_{ii}=1$
- bring all these values we can get $Q(G, C)=\frac{1}{2n}\sum_i(1-\frac{4}{2n})=\frac{1}{2}-\frac{2}{n}$
- Note that the degree of each node is 2, not 1.
Keywords
modularity
Path
length of path
$$
\text{len}(n_0, n_1,\cdots, n_l)=l
$$
powers of adjacency matrix
elements $A_{ij}^k$ counts paths of (exactly) length $k$ between $i$ and $j$.
Distance of two nodes
the shortest length of path between these two nodes.
Diameter of network
the longest shortest path.
Use the symbol $\text{diam}(G)$
Average Shortest path length of network
“the typical shortest path”
use the symbol $\langle l\rangle$
Connected Components
maximally (means locally) connected subgraphs
- size: the number of nodes
- can be shown by the infinite power of adjacency matrix $A^k, k\to\infty$
largest connected component
the largest subgraphGiant connected component
if the largest connected component contains almost all the nodes $\frac{|V’|}{|V|}\simeq 1$
Community
partitions of nodes, can be overlapping. But we only consider non-overlapping partition.
intuitively, the following two conditions should hold:
- nodes within the same community should be “well-connected” to each other
- nodes in different communities should have few connections.
Partition Quality: $Q$
$$
Q(G,C)=\frac{1}{2m}\sum_{i,j}(A_{ij}-\frac{d_id_j}{2m})\delta(c_i, c_j)
$$
two special cases:
- all nodes in a single community: $Q=0$
- each community contains only one node: $Q=0$
- theoretically maximum value $1$.
$Q_{opt}$
$$
Q_{opt}=\max_C Q(G,C)
$$
the optimal partition quality. This is a property of a given network $G$. It is called modularity
- object: $G$
- less than 1, due to two reasons:
- there is a single link $(b, d)$ that connects nodes in different communities
- there are missing links within both communities (simply because the graph does not have enough links)
$Q_{max}$
$$
Q_{\max}(G,C)=1-\frac{1}{2m}\sum_{i,j}(\frac{d_id_j}{2m})\delta(c_i, c_j)
$$
“max” here means a calculation method, guided by some max idea for given $G$ and $C$, not meaning any maximize process.
- object: $G$ and $C$
Community assortativity coefficient
the ratio of the $Q_{opt}$ with $Q_{\max}$ got by this optimal $C$.
object: $G$.
quantify how close the topology is to the strongest possible partitioning, to a range between 0 and 1. This allow us to judge whether a network exhibits community structure or not.
$$
\frac{Q_{opt}(G)}{Q_{\max}(G,\tilde{C})}
$$