# The Fitness-Corrected Block Model, or how to create maximum-entropy data-driven spatial social networks

### Formalism

Let \(\mathcal{G}\) be the ensemble of all simple graphs of N vertices. If P is a probability distribution over \(\mathcal{G}\), P(G) is the probability of graph \(G\in\mathcal{G}\), and \(\langle\cdot\rangle_P\) denotes the expectation with respect to P. The vertex set \(V=\{v_i\}_{i=0}^{N-1}\) is partitioned into n blocks \(\{B_I\}_{I=0}^{n-1}\). The size of block I is \(N_I=|B_I|\) and, for every \(v_i\in V\), \(I_i\) denotes the index of the block to which \(v_i\) belongs. For all pairs IJ, \(N_{IJ}\) denotes the number of possible pairs (i,j) with \(v_i\in B_I\), \(v_j\in B_J\) and \(i\ne j\), i.e., \(N_{II}=N_I(N_I-1)\) and \(N_{IJ}=N_IN_J\) if \(I\ne J\). Note that \(N_{II}\) counts each pair of vertices twice; the same convention is used below for the edge counts.

Each \(G\in\mathcal{G}\) is uniquely determined by its adjacency matrix \(A(G)=\{a_{ij}(G)\}_{i,j=0}^{N-1}\), where \(a_{ij}(G)=1\) if edge \((i,j)\in E(G)\) and \(a_{ij}(G)=0\) otherwise. The degree of vertex \(v_i\) in G is \(\deg_i(G)=\sum_j a_{ij}(G)\). The total degree of block I is \(\deg_I(G)=\sum_{i\in I}\deg_i(G)\) and, for all IJ, \(L_{IJ}(G)=\sum_{i\in I}\sum_{j\in J}a_{ij}(G)\) is the number of edges between \(B_I\) and \(B_J\) in G, or, if \(I=J\), twice that number. Hence, using the definitions, \(\deg_I(G)=\sum_J L_{IJ}(G)\). For the sake of simplicity, the dependence of these quantities on the specific graph G will generally be omitted in the following.
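The block-level quantities defined above can be checked numerically on a toy graph. The following is a minimal sketch, assuming NumPy; the helper name `block_edge_counts` and the four-vertex example are illustrative and not part of the original formalism.

```python
import numpy as np

def block_edge_counts(A, block_of):
    """Return L[I, J] = sum_{i in I} sum_{j in J} a_ij for a symmetric 0/1 matrix A."""
    A = np.asarray(A)
    block_of = np.asarray(block_of)
    n = block_of.max() + 1
    # Z[i, I] = 1 if vertex i belongs to block I (one-hot membership matrix).
    Z = np.eye(n, dtype=A.dtype)[block_of]
    return Z.T @ A @ Z

# Toy graph: 4 vertices, blocks {0, 1} and {2, 3}; edges (0,1), (1,2), (2,3).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
block_of = np.array([0, 0, 1, 1])

L = block_edge_counts(A, block_of)               # diagonal entries count edges twice
deg = A.sum(axis=1)                              # deg_i
deg_block = np.bincount(block_of, weights=deg)   # deg_I
```

In this example each diagonal entry of `L` equals 2 (one within-block edge, counted twice) and the row sums of `L` coincide with `deg_block`, as stated by the identity \(\deg_I(G)=\sum_J L_{IJ}(G)\).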

### Maximum entropy Degree-Corrected Block Model [4]

The maximum entropy Degree-Corrected Block Model (DCBM) is defined as the maximum entropy probability distribution P over \(\mathcal{G}\) in which the number of links per block and the degree sequence are constrained on average, i.e.

$$\begin{aligned} \langle L_{IJ}\rangle_P &= K_{IJ} \quad \text{for all } I,J \end{aligned}\tag{7}$$

$$\begin{aligned} \langle \deg_i\rangle_P &= k_i \quad \text{for all } i, \end{aligned}\tag{8}$$

with \(\sum_J K_{IJ}=\sum_{i\in I}k_i\), for all I. If H(P) denotes the Shannon entropy of P, the sought P can be obtained by finding the stationary points of

$$\begin{aligned} H'(P,\vec{\eta},\vec{\theta}) &= H(P)-C(P,\vec{\eta},\vec{\theta}) \\ &= -\langle \ln P(G)\rangle_P - \left( \sum_{I\le J}\eta_{IJ}\left(\langle L_{IJ}\rangle_P-K_{IJ}\right) + \sum_i\theta_i\left(\langle \deg_i\rangle_P-k_i\right) + \alpha\Big(\sum_G P(G)-1\Big)\right) \end{aligned}\tag{9}$$

where \(\eta_{IJ}\), \(\theta_i\) and \(\alpha\) are Lagrange multipliers: while \(\eta_{IJ}\) and \(\theta_i\) control the conditions (7) and (8), respectively, \(\alpha\) is necessary for the normalization of the probability P(G). Since the functional derivatives with respect to P(G) are \(\frac{\mathrm{d}\langle L_{IJ}\rangle_P}{\mathrm{d}P(G)}=\sum_{i\in I}\sum_{j\in J}a_{ij}(G)\) and \(\frac{\mathrm{d}\langle \deg_i\rangle_P}{\mathrm{d}P(G)}=\sum_j a_{ij}(G)\), then

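$$\begin{aligned} \frac{\mathrm{d}C(P,\vec{\eta},\vec{\theta})}{\mathrm{d}P(G)} &= \sum_{I\le J}\eta_{IJ}\sum_{i\in I}\sum_{j\in J}a_{ij}(G) + \sum_i\theta_i\sum_j a_{ij}(G) + \alpha \\ &= \sum_{i<j}\left(\eta_{I_iJ_j}+\theta_i+\theta_j\right)a_{ij}(G) + \alpha, \end{aligned}$$

where, for pairs of vertices in the same block, the factor 2 arising from the double counting in \(L_{II}\) is absorbed into \(\eta_{II}\). This leads to

$$\begin{aligned} P(G)\propto \exp\left[-\sum_{i<j}\left(\eta_{I_iJ_j}+\theta_i+\theta_j\right)a_{ij}(G)\right] \end{aligned}$$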

and the probability per graph factorises in terms of probabilities per link as

$$\begin{aligned} p_{ij}=\begin{cases} \dfrac{x_ix_jy_{I_iJ_j}}{1+x_ix_jy_{I_iJ_j}} & \text{if } i\ne j \\ 0 & \text{otherwise} \end{cases} \end{aligned}\tag{10}$$

where \(x_i=e^{-\theta_i}\) and \(y_{IJ}=e^{-\eta_{IJ}}\). For all i and all \(I\le J\), \(x_i\) and \(y_{IJ}\) can be found by solving the system of equations

$$\begin{aligned} &\sum_{i\in I}\sum_{j\in J}p_{ij} = K_{IJ} \quad \text{for all } I\le J \end{aligned}\tag{11}$$

$$\begin{aligned} &\sum_j p_{ij} = k_i \quad \text{for all } i \end{aligned}\tag{12}$$

#### Sparse DCBM

When the average degree \(\langle k\rangle\) is small, i.e., when the network is sparse, the edge probability of the DCBM can be approximated as \(p_{ij}\approx x_ix_jy_{I_iJ_j}\). This allows us to rewrite (11) and (12) as

$$\begin{aligned} y_{IJ}\sum_{i\in I}x_i\sum_{\substack{j\in J \\ j\ne i}}x_j &= K_{IJ} \quad \text{for all } I\le J \end{aligned}\tag{13}$$

$$\begin{aligned} x_i\sum_J y_{I_iJ}\sum_{\substack{j\in J \\ j\ne i}}x_j &= k_i \quad \text{for all } i \end{aligned}\tag{14}$$

Since the constants K and \(\vec{k}\) are, by construction, bound by the relation \(\sum_J K_{IJ}=\langle \deg_I\rangle=\sum_{i\in I}k_i\), (13) and (14) admit the following solution

$$\begin{aligned} x_i &= k_i \quad \text{for all } i \\ y_{IJ} &= \frac{K_{IJ}}{\left(\sum_{i\in I}k_i\right)\left(\sum_{\substack{j\in J \\ j\ne i}}k_j\right)} \quad \text{for all } I\le J \end{aligned}$$
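The sparse solution can be turned into edge probabilities with a few array operations. The sketch below assumes NumPy and, as a simplification not made explicit in the text, ignores the \(j\ne i\) restriction in the denominator of \(y_{IJ}\) (a negligible correction for large blocks); the function name `sparse_dcbm_probabilities` and the example targets are hypothetical.

```python
import numpy as np

def sparse_dcbm_probabilities(k, K, block_of):
    """Edge probabilities p_ij ~ x_i x_j y_IJ with x_i = k_i (sparse DCBM)."""
    k = np.asarray(k, dtype=float)
    K = np.asarray(K, dtype=float)
    block_of = np.asarray(block_of)
    # kappa[I] = sum_{i in I} k_i, the target total degree of block I.
    kappa = np.bincount(block_of, weights=k, minlength=K.shape[0])
    # Assumption: the j != i exclusion in the denominator is dropped.
    y = K / np.outer(kappa, kappa)
    p = np.outer(k, k) * y[np.ix_(block_of, block_of)]
    np.fill_diagonal(p, 0.0)   # simple graphs: no self-loops
    return p

# Two blocks of 50 vertices; target degrees alternate between 2 and 6.
block_of = np.repeat([0, 1], 50)
k = np.tile([2.0, 6.0], 50)
# Block targets satisfy sum_J K_IJ = sum_{i in I} k_i = 200 (K_II counts edges twice).
K = np.array([[150.0, 50.0],
              [50.0, 150.0]])

p_dcbm = sparse_dcbm_probabilities(k, K, block_of)
expected_degree = p_dcbm.sum(axis=1)   # close to k when the graph is sparse
```

The expected degrees fall slightly below the targets because the diagonal is removed after the normalization; the gap shrinks as the blocks grow. If some \(p_{ij}\) exceeds 1, the sparse approximation is not applicable and the full system (11) and (12) should be solved instead.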

### Maximum entropy Fitness-Corrected Block Model

Given a scalar \(p\in(0,1)\), a symmetric matrix \(\Delta\) such that \(\sum_{I,J}\Delta_{IJ}=2\), and a fitness sequence \(\vec{f}\), we define the Fitness-Corrected Block Model (FCBM) as the maximum entropy model satisfying the following two conditions:

$$\begin{aligned} \langle L_{IJ}\rangle_P &= p\binom{N}{2}\Delta_{IJ} \quad \text{for all } I,J \end{aligned}\tag{15}$$

$$\begin{aligned} \langle \deg_i\rangle_P &= p\binom{N}{2}\frac{f_i}{\sum_{u\in I_i}f_u}\sum_J\Delta_{I_iJ} \quad \text{for all } i \end{aligned}\tag{16}$$

The FCBM is a variation of the DCBM where the network density p is a configuration parameter. As in the original stochastic block model, the block-level mixing is specified in terms of a set of edge-densities \(\Delta_{IJ}\), rather than a set of edge-counts \(K_{IJ}\). Similarly, the degree sequence \(\vec{k}\) is replaced by a vertex intrinsic fitness sequence \(\vec{f}\), in line with previous models available in the literature [5,6]. \(f_i\) measures \(v_i\)'s propensity to establish links, and \(\langle \deg_i\rangle_P\) is set proportional to \(f_i\) by a constant that depends on \(I_i\), in addition to p. By design, (15) and (16) imply \(\sum_J\langle L_{IJ}\rangle_P=\langle \deg_I\rangle_P=\sum_{i\in I}\langle \deg_i\rangle_P\), and (16) can be rewritten as

$$\begin{aligned} \langle \deg_i\rangle_P = \frac{f_i}{\sum_{u\in I_i}f_u}\langle \deg_I\rangle_P \end{aligned}$$

which clarifies the role of \(\vec{f}\) as an element of intra-block heterogeneity.

For fixed \(\Delta\) and \(\vec{f}\), the derivation of the maximum entropy FCBM is identical to that of the DCBM: the maximum entropy probability per graph factorizes into the probability per edge given by (10), and the vectors of constants \(\vec{x}\) and \(\vec{y}\) can be obtained by solving the analogue of (11) and (12), i.e.

$$\begin{aligned} &\sum_{i\in I}\sum_{j\in J}p_{ij} = p\binom{N}{2}\Delta_{IJ} \quad \text{for all } I\le J \end{aligned}\tag{17}$$

$$\begin{aligned} &\sum_j p_{ij} = p\binom{N}{2}\frac{f_i}{\sum_{u\in I_i}f_u}\sum_J\Delta_{I_iJ} \quad \text{for all } i \end{aligned}\tag{18}$$

#### Sparse FCBM

In the sparse regime, i.e., when \(p\ll 1\), using \(p_{ij}\approx x_ix_jy_{I_iJ_j}\) yields the following approximate solution to (17) and (18):

$$\begin{aligned} x_i &= p\binom{N}{2}\frac{f_i}{\sum_{u\in I_i}f_u}\sum_J\Delta_{I_iJ} = \frac{f_i}{\sum_{u\in I_i}f_u}\langle \deg_I\rangle_P \quad \text{for all } i \\ y_{IJ} &= \frac{p\binom{N}{2}\Delta_{IJ}}{p\binom{N}{2}\left(\sum_O\Delta_{IO}\right)\, p\binom{N}{2}\left(\sum_O\Delta_{OJ}\right)} = \frac{p\binom{N}{2}\Delta_{IJ}}{\langle \deg_I\rangle_P\,\langle \deg_J\rangle_P} \quad \text{for all } I\le J \end{aligned}$$

In this regime, the probability per edge can thus be rewritten as

$$\begin{aligned} p_{ij}\approx p\binom{N}{2}\frac{f_i}{\sum_{u\in I_i}f_u}\,\frac{f_j}{\sum_{w\in J_j}f_w}\,\Delta_{I_iJ_j}. \end{aligned}\tag{19}$$
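Expression (19) translates directly into a sampling procedure, since edges are independent Bernoulli variables with probability \(p_{ij}\). The following is a minimal sketch, assuming NumPy; the function names, the uniform fitness values and the \(2\times 2\) matrix \(\Delta\) are illustrative choices, not taken from the original.

```python
import numpy as np

def sparse_fcbm_probabilities(f, block_of, Delta, p):
    """Sparse-regime FCBM edge probabilities, following expression (19)."""
    f = np.asarray(f, dtype=float)
    block_of = np.asarray(block_of)
    Delta = np.asarray(Delta, dtype=float)
    N = f.size
    pairs = N * (N - 1) / 2                       # binomial(N, 2)
    # F[I] = sum of the fitness values in block I.
    F = np.bincount(block_of, weights=f, minlength=Delta.shape[0])
    w = f / F[block_of]                           # f_i / sum_{u in I_i} f_u
    P = p * pairs * np.outer(w, w) * Delta[np.ix_(block_of, block_of)]
    np.fill_diagonal(P, 0.0)                      # no self-loops
    return P

def sample_graph(P, rng):
    """Draw a symmetric 0/1 adjacency matrix with independent edges."""
    N = P.shape[0]
    iu = np.triu_indices(N, k=1)                  # each pair i < j once
    A = np.zeros((N, N), dtype=int)
    A[iu] = rng.random(iu[0].size) < P[iu]
    return A + A.T

rng = np.random.default_rng(0)
block_of = np.repeat([0, 1], 100)                 # two blocks of 100 vertices
f = rng.uniform(0.5, 1.5, size=200)               # hypothetical fitness values
Delta = np.array([[0.75, 0.25],
                  [0.25, 0.75]])                  # symmetric, entries sum to 2
P = sparse_fcbm_probabilities(f, block_of, Delta, p=0.02)
A = sample_graph(P, rng)
```

With these settings the expected number of edges, `P.sum() / 2`, is close to \(p\binom{N}{2}=398\), and the row sums of `P` approximate the target degrees in (16). The approximation is only meaningful while every entry of `P` stays well below 1.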

#### Degree distribution for the sparse FCBM

Let us assume that each fitness value \(f_i\) is drawn from an appropriate distribution \(p_f\). The sparse-regime approximation allows us to estimate the expected degree distribution of the sampled graph G with respect to both \(p_f\) and the graph sampling probability P. If \(N_{I_i}\) is large enough, we have \(\sum_{u\in I_i}f_u\approx\langle f\rangle_{p_f}N_{I_i}\). Now, following the approach used in Ref. [5], we have

$$\begin{aligned} k(f,I) = \langle \deg_i \bigm\vert f_i=f, I_i=I\rangle_{p_f,P} \approx \dfrac{f}{\langle f\rangle_{p_f}N_I}\langle \deg_I\rangle_P = \dfrac{f}{\langle f\rangle_{p_f}}\mu_I \end{aligned}\tag{20}$$

where \(\mu_I=\frac{\langle \deg_I\rangle_P}{N_I}=\langle \deg_i \bigm\vert I_i=I\rangle_P\) is the expected average degree of the vertices in I. Equation (20) can be inverted, leading to the estimate

$$\begin{aligned} f(k,I)\approx k\frac{\langle f\rangle_{p_f}}{\mu_I} \end{aligned}$$

so that

$$\begin{aligned} p_k(k,I) = \Pr[k(f)=k \bigm\vert I] = \Pr[f(k)=f \bigm\vert I]\frac{\mathrm{d}f(k)}{\mathrm{d}k} \approx p_f\left(k\frac{\langle f\rangle_{p_f}}{\mu_I}\right)\frac{\langle f\rangle_{p_f}}{\mu_I} \end{aligned}\tag{21}$$

and, hence,

$$\begin{aligned} p_k(k) = \Pr[k(f)=k] = \sum_I p_k(k,I)\frac{N_I}{N} \approx \sum_I p_f\left(k\frac{\langle f\rangle_{p_f}}{\mu_I}\right)\frac{\langle f\rangle_{p_f}}{\mu_I}\frac{N_I}{N} = \frac{\langle f\rangle_{p_f}}{N}\sum_I p_f\left(k\frac{\langle f\rangle_{p_f}}{\mu_I}\right)\frac{N_I}{\mu_I} \end{aligned}\tag{22}$$

A slightly less accurate, yet much simpler, approximation can be obtained by first computing

$$\begin{aligned} k(f) = \langle \deg_i \bigm\vert f_i=f\rangle_{p_f,P} = \sum_I k(f,I)\frac{N_I}{N} \approx \frac{f}{\langle f\rangle_{p_f}}\sum_I\mu_I\frac{N_I}{N} = \frac{f}{\langle f\rangle_{p_f}}\mu \end{aligned}\tag{23}$$

where \(\mu=\sum_I\mu_I\frac{N_I}{N}\) is the average degree of the network. Then, (23) can be inverted as

$$\begin{aligned} f(k)\approx k\frac{\langle f\rangle_{p_f}}{\mu} \end{aligned}$$

yielding

$$\begin{aligned} p_k(k) = \Pr[k(f)=k] = \Pr[f(k)=f]\frac{\mathrm{d}f(k)}{\mathrm{d}k} \approx p_f\left(k\frac{\langle f\rangle_{p_f}}{\mu}\right)\frac{\langle f\rangle_{p_f}}{\mu} \end{aligned}\tag{24}$$

In many cases, \(p_k\) belongs to the same family of probability distributions as \(p_f\): e.g., if \(p_f\) follows a power-law, lognormal or exponential distribution, then the same holds for \(p_k\).
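As a worked instance of (24), consider an exponential fitness distribution. The rate \(\lambda\) is a symbol introduced here purely for illustration; it does not appear in the original derivation.

```latex
\begin{aligned}
p_f(f) &= \lambda e^{-\lambda f}, \qquad \langle f\rangle_{p_f} = \frac{1}{\lambda}, \\
p_k(k) &\approx p_f\!\left(\frac{k}{\lambda\mu}\right)\frac{1}{\lambda\mu}
        = \lambda e^{-k/\mu}\,\frac{1}{\lambda\mu}
        = \frac{1}{\mu}\,e^{-k/\mu}.
\end{aligned}
```

Under this assumption the degree distribution is again exponential, with mean \(\mu\), and the rate \(\lambda\) cancels out: only the shape of \(p_f\) matters, while the scale is fixed by the average degree.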

### Data-driven FCBM for spatial social networks

Our FCBM can be easily tuned on real data and empirical findings to produce instances of a maximum entropy spatial social network. On the one hand, the local density and demographic profile of the population are generally available in the form of discrete, geographically located (e.g., residents in 500 m \(\times\) 500 m tiles) and/or age-stratified (e.g., 0–5 years old) population segments. These data naturally induce a partition of the population into blocks. On the other hand, intra-block population heterogeneity can be controlled by a vertex-related social fitness, possibly modelled upon measurable features such as wealth, employment or mobility.

Formally, let the vertex set V describe an age-stratified population of N individuals living in a territory tessellated into square tiles of side l. Each \(v_i\) is characterised by two data-driven discrete attributes: its tile of residence \(t_i\in T\), that is, the discretized position of \(v_i\) in the territory, and its age-group \(g_i\in\Gamma\). \(t_i\) and \(g_i\) may either be directly available, in the case of a real population, or be drawn, respectively, from a given spatial density \(p_t\) and age-distribution \(p_g\), in the case of a synthetic population. These two attributes induce a partition of the population into \(n=|T|\cdot|\Gamma|\) blocks \(\{B_I\}_{I=0}^{n-1}\), with \(v_i\in B_I=(t_I,g_I)\) if and only if \(t_i=t_I\) and \(g_i=g_I\). We embrace the widely acknowledged assumption that an inverse-power-law relation binds the distance \(d_{IJ}\) and the frequency of social relations between individuals living in \(t_I\) and \(t_J\) [12,13]. For all pairs of blocks IJ, we thus define the edge-density

$$\begin{aligned} \Delta_{IJ} = \frac{d_{IJ}^{-\beta}s_{IJ}}{\sum_{O\le Q}d_{OQ}^{-\beta}s_{OQ}} \end{aligned}$$

where \(d_{IJ}\) is the normalized (geographic or Euclidean) distance between tiles \(t_I\) and \(t_J\); the normalization is obtained through a division by \(\frac{l}{2}\). We set \(d_{II}=1\), so that the distance between individuals in the same tile is half the distance between individuals living in neighboring tiles. \(\beta>0\) is a configuration parameter. \(s_{IJ}\) measures the tendency of age groups \(g_I\) and \(g_J\) to socialize with each other; such a \(|\Gamma|\times|\Gamma|\) symmetric age-based social mixing matrix S can be obtained, by imposing reciprocity and normalizing, from a suitable data-driven contact matrix [9].
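The edge-density matrix can be assembled from the tile geometry and the mixing matrix with a Kronecker product. The sketch below assumes NumPy, Euclidean distances between tile centres, and a block ordering \(I = t\cdot|\Gamma| + g\); the function name `edge_density`, the ordering and the toy inputs are assumptions made for illustration.

```python
import numpy as np

def edge_density(tile_xy, S, l, beta):
    """Delta_IJ proportional to d_IJ^(-beta) * s_IJ, normalized over I <= J.

    Blocks are ordered as I = tile_index * n_groups + group_index.
    """
    tile_xy = np.asarray(tile_xy, dtype=float)   # tile centres, same units as l
    S = np.asarray(S, dtype=float)               # symmetric age-mixing matrix
    # Euclidean distance between tile centres, normalized by l / 2.
    diff = tile_xy[:, None, :] - tile_xy[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1)) / (l / 2)
    np.fill_diagonal(d, 1.0)                     # d_II = 1 within a tile
    # Distance depends on the tiles only, mixing on the age groups only.
    W = np.kron(d ** (-beta), S)
    return W / np.triu(W).sum()                  # normalize over O <= Q

# Two adjacent 500 m tiles and two age groups (hypothetical values).
tile_xy = [[0.0, 0.0], [500.0, 0.0]]
S = np.array([[1.0, 0.5],
              [0.5, 1.0]])
Delta_spatial = edge_density(tile_xy, S, l=500.0, beta=2.0)
```

Neighboring tile centres are at distance l, hence at normalized distance 2, so with \(\beta=2\) the density between the same age group in adjacent tiles is one quarter of the within-tile density.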

Finally, we extract the fitness vector \(\vec{f}\), set the density parameter \(p\in(0,1)\) and, for all pairs ij, compute the edge probability \(p_{ij}\) as described for the FCBM. As a result, the graph sampling probability P guarantees that: (1) the expected number of links between \(B_I\) and \(B_J\) is proportional to \(s_{IJ}\) and decays as \(d_{IJ}^{-\beta}\); (2) the expected degree of \(v_i\) is proportional to \(f_i\) and to the expected total degree of block \(I_i\).

In this case, the sparse-regime approximation yields

$$\begin{aligned} p_{ij}\approx p\binom{N}{2}\frac{f_i}{\sum_{u\in I_i}f_u}\,\frac{f_j}{\sum_{w\in J_j}f_w}\,\frac{d_{I_iJ_j}^{-\beta}s_{I_iJ_j}}{\sum_{I\le J}d_{IJ}^{-\beta}s_{IJ}} \end{aligned}\tag{25}$$

Expression (25) defines a phenomenological model, where the probability of two individuals being connected is proportional to their sociability and to the cohesion of their age-groups, while decaying as a power of their distance. Clearly, the estimates obtained in (22) and (24) for the degree distribution remain valid in the data-driven model. If the network is sufficiently sparse and the population of all tiles/groups is sufficiently large, the degree distribution of the sampled graph G is controlled by the available data and by the chosen fitness distribution \(p_f\).