## Abstract

Social network structure has often been attributed to two network evolution mechanisms, triadic closure and choice homophily, which are commonly considered independently or with static models. However, empirical studies suggest that their dynamic interplay generates the observed homophily of real-world social networks. By combining these mechanisms in a dynamic model, we confirm the long-held hypothesis that choice homophily and triadic closure cause induced homophily. We estimate how much observed homophily in friendship and communication networks is amplified due to triadic closure. We find that cumulative effects of homophily amplification can also lead to the widely documented core-periphery structure of networks, and to memory of homophilic constraints (equivalent to hysteresis in physics). The model shows that even small individual bias may prompt network-level changes such as segregation or core group dominance. Our results highlight that individual-level mechanisms should not be analyzed separately without considering the dynamics of society as a whole.

## INTRODUCTION

One of the most important traits of human sociality is homophily (*1*), the tendency of similar people to be connected to each other due to their shared biological and cultural attributes such as gender, occupation, or political affiliation. Homophily has been observed across various social networks (*1*–*5*), and it is a major force behind several pressing social issues including inequality, segregation, and online echo chambers (*6*–*8*). Thus, a thorough quantitative understanding of the network mechanisms leading to homophily (*9*–*12*) is essential for promoting a sufficient flow of information (*1*, *13*) and equal opportunity in social networks of individuals with diverse personal preferences.

The homophily observed in social networks is often attributed either to choice homophily, defined by people’s preference when choosing whom to connect with, or to induced homophily, arising from constraints in the opportunities of individuals to make connections (*2*). These two mechanisms and their relative importance have long been a subject of study in social network research (*14*, *15*). However, as suggested by longitudinal empirical results (*16*, *17*), the two mechanisms of homophily generation cannot be separated without considering the cumulative advantage-like dynamics (*10*) driving the evolution of social networks: Choice homophily creates circumstances for induced homophily, such as groups of similar people interacting, which are then further reinforced in cycles of choice and induced homophily. While the dynamics of homophily is well understood in the case of tipping point models of residential segregation (*18*, *19*), a similar understanding of the dynamics in social networks is still needed to validate and measure homophily amplification (*17*).

Here, we introduce a minimal model of social network evolution to analyze to what extent the structural constraints caused by triadic closure and choice homophily interact. The triadic closure mechanism uses the existing social network to create new connections between people who share common friends, acquaintances, or other connections. This mechanism has been reported as the most common structural constraint (*16*) and can explain many salient features of empirical social networks. These include a high number of closed triangles between acquaintances and fat-tailed degree distributions (*20*–*23*). Thus, triadic closure should be considered as the main mechanism in most minimal dynamic social network models (*20*, *24*, *25*). In our approach, individuals are considered to belong to either of two groups (*a* and *b*) representing the values of a static attribute of interest (gender, class, party, etc.), and they rewire their connections by two mechanisms: triadic closure (modeling the creation of edges via current contacts) and random rewiring [emulating any unknown mechanisms beyond triadic closure, such as focal closure in large foci (*16*, *26*)]. Choice homophily/heterophily is implemented by accepting new links with a bias probability dependent on the similarity of attributes between individuals.

In this study, we characterize the rich tapestry of emergent behavior captured by the model with a mean-field bifurcation analysis for varying relative group sizes, triadic closure probabilities, and choice homophily rates. We measure the amount of observed homophily in the network, which we interpret as the sum of choice homophily (a parameter in the model) plus the induced homophily caused by triadic closure acting on a homophilic social network. By tuning the parameters of the system with empirical data on friendship and communication networks, we find that, under the right circumstances, even a small amount of choice homophily may be greatly amplified by triadic closure to produce large amounts of observed homophily. Further, we find that the interplay of triadic closure and homophily can explain the emergence of core-periphery structures. These findings suggest that the observations of homophilous patterns of association in society should not be explained solely on the basis of a human preference for similarity, but as a constantly evolving interplay between structural constraints and homophily, one that requires computational simulation as a central part of the analysis.

### Model definition and parameters

We introduce a model of social network evolution with a simultaneous interplay of triadic closure and choice homophily (see Fig. 1, A to E for an illustration and further model details). The model is stylized such that it contains a minimal set of simple mechanisms on how social relationships are made and forgotten; details beyond these core mechanisms are modeled by uniform randomness to assume the least amount of information on them. The initial social structure is a random network with static attribute groups *a* and *b* (of relative sizes *n _{a}* and *n _{b}*, with *n _{a}* + *n _{b}* = 1) distributed among nodes uniformly at random and independently of the initial network structure, such that there is a fraction *P _{ab}* = *P _{ba}* of edges between groups, and fractions *P _{aa}*, *P _{bb}* within each group (*P _{ab}* + *P _{aa}* + *P _{bb}* = 1).

From its initial state, the network evolves with nodes updating their connections. At each time step, we select a focal node uniformly at random and a candidate neighbor, representing a social encounter that might lead to a new social relationship. The candidate neighbor is chosen by triadic closure with probability *c* or uniformly at random otherwise [emulating any other mechanisms for edge creation beyond the triadic closure; (*23*, *27*–*30*)]. The triadic closure mechanism selects a candidate neighbor by randomly sampling a neighbor of the focal node and then a neighbor of the sampled neighbor. If the edge between the focal node and the candidate node cannot be created (because it would create a multi-edge or a self-loop, or because the degree of the focal node is zero), then no updates are made for the focal node at this step.

The edge between the focal node and candidate neighbor is created with probability *S _{ab}* if the focal node is in group *a* and the candidate neighbor is in group *b*. The elements *S _{ab}* form a 2 × 2 bias matrix specifying the amount of choice homophily/heterophily in the social network. For simplicity, we parameterize the bias matrix as *S _{aa}* = *s _{a}*, *S _{ab}* = 1 − *s _{a}*, *S _{bb}* = *s _{b}*, and *S _{ba}* = 1 − *s _{b}*, where *s _{a}* (*s _{b}*) is the choice homophily for group *a* (*b*). In this way, when *s _{a}* = *s _{b}* = 1/2, all of the elements of the bias matrix are also 1/2, i.e., there is no choice homophily bias. Multiplying the bias matrix by a constant changes the speed of network evolution, but not the fixed points of the dynamics. Note that the bias depends only on the groups the two individuals belong to, meaning that individuals have homogeneous choice homophily preferences (*31*, *32*).

Last, as maintaining social connections requires mental capacity and time investment, creating new connections implies forgetting some of the old ones (*33*). We model this process by randomly removing an edge of the focal node after a successful edge creation. This keeps the degree of the focal node unchanged (the individual making the time investment) but alters the degrees of the other two affected nodes, thus changing the degree distribution of the network. Note that random link removal may open triangles, while in reality, links that are surrounded by triangles are more likely to be strong links (*24*) and thus less likely to be removed. However, here we opt for random link removal (*29*), which does not involve these additional assumptions of social behavior and is a typical choice in this type of social network model [along with random node deletion; (*26*, *27*)].
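The update rules described above can be sketched as a single rewiring attempt. This is a minimal illustration rather than the simulation code behind the reported results: the function name `rewire_step`, the adjacency-dictionary representation, and the dictionary form of the bias matrix `S` are hypothetical choices. Two details are assumptions on our part: a zero-degree focal node is skipped regardless of how the candidate would be chosen, and the removed edge is drawn among the edges the focal node had before the new edge is added.

```python
import random

def rewire_step(adj, group, S, c, rng=random):
    """Attempt one model update; return True if an edge was rewired.

    adj   : dict node -> set of neighbors (undirected simple graph)
    group : dict node -> "a" or "b"
    S     : dict (focal_group, candidate_group) -> acceptance probability
    c     : probability of choosing the candidate by triadic closure
    """
    nodes = list(adj)  # O(N) per call; acceptable for a sketch
    focal = rng.choice(nodes)
    if not adj[focal]:
        return False  # assumption: zero-degree focal nodes are never updated
    if rng.random() < c:
        # Triadic closure: a random neighbor of a random neighbor.
        middle = rng.choice(list(adj[focal]))
        candidate = rng.choice(list(adj[middle]))
    else:
        # Random rewiring: any node, chosen uniformly.
        candidate = rng.choice(nodes)
    if candidate == focal or candidate in adj[focal]:
        return False  # would create a self-loop or a multi-edge
    if rng.random() >= S[(group[focal], group[candidate])]:
        return False  # rejected by the choice homophily bias
    # Successful creation: drop one old edge of the focal node, add the new one.
    old = rng.choice(list(adj[focal]))
    adj[focal].remove(old)
    adj[old].remove(focal)
    adj[focal].add(candidate)
    adj[candidate].add(focal)
    return True
```

Because every successful step removes one edge and adds one, the total number of edges and the degree of the focal node are conserved, while the degrees of the dropped and the newly linked node change by one.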

## RESULTS

In our approach, the interplay between homophily and triadic closure (in a social network with two attribute groups) forms a dynamical system in which the evolution from an arbitrary initial network depends on the parameters that regulate choice homophily (*s _{a}*, *s _{b}*) and triadic closure (*c*). As with any other network model, the associated stochastic process exhibits random fluctuations, but the average dynamics of key behavioral quantities such as the degree distribution, clustering coefficient, and, centrally, the observed homophily depend deterministically on the model parameters. We characterize the behavior of the network following our model dynamics using a mean-field approximation and confirm our results with numerical simulations (see Materials and Methods and the Supplementary Materials).

The amount of homophily that can be directly observed in the model network is not necessarily the same as the choice homophily parameter in the model. In a network with two groups of equal size (*n _{a}* = *n _{b}*), the observed homophily within groups *a* and *b* (denoted by *o _{a}* and *o _{b}*) is equal to the transition probability that following a link from a group leads to the same group (*T _{aa}* and *T _{bb}*). Thus, when there is no triadic closure (*c* = 0), the choice homophily equals the expected fraction of neighboring nodes in the same group (*s _{a}* = *o _{a}* = *T _{aa}* and *s _{b}* = *o _{b}* = *T _{bb}*). If one of the groups is larger (*n _{a}* > *n _{b}*) and there is no triadic closure (*c* = 0), the fraction of neighbors *T _{aa}* in group *a* is larger than the choice homophily bias (*s _{a}*). This is why we define a group-size correction in the observed homophily as

$$o_a = \frac{T_{aa}}{T_{aa} + (1 - T_{aa})\, n_a / n_b}, \tag{1}$$

and similarly for group *b*. Note that this relation simplifies to *o _{a}* = *T _{aa}* and *o _{b}* = *T _{bb}* when the attribute groups are of equal size (see Fig. 1F). The size-corrected observed homophily equals choice homophily (*o _{a}* = *s _{a}* and *o _{b}* = *s _{b}*) when there is no triadic closure (*c* = 0; see Materials and Methods). In the case of arbitrary triadic closure (*c* ≥ 0), we further define induced homophily *i _{a}* such that the observed homophily is the sum of induced homophily and choice homophily,

$$o_a = s_a + i_a. \tag{2}$$

Consistently, for *c* = 0, we have *i _{a}* = 0, and all observed homophily in the social network is choice homophily.
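The group-size correction can be checked numerically. The sketch below assumes that, without triadic closure, a focal node in a group of relative size *n* meets own-group candidates with probability *n* and accepts them with probability *s*, so that the stationary fraction of same-group neighbors is *ns*/[*ns* + (1 − *n*)(1 − *s*)]; the corrected observed homophily is then taken to be the inverse of this relation. The function names are hypothetical.

```python
def neighbor_fraction_no_closure(s, n_own):
    """Stationary same-group neighbor fraction for c = 0 (assumed balance)."""
    n_other = 1.0 - n_own
    return n_own * s / (n_own * s + n_other * (1.0 - s))

def observed_homophily(T_same, n_own):
    """Size-corrected observed homophily of a group with relative size n_own."""
    n_other = 1.0 - n_own
    return T_same / (T_same + (1.0 - T_same) * n_own / n_other)

def induced_homophily(T_same, n_own, s):
    """Induced homophily: observed homophily minus choice homophily."""
    return observed_homophily(T_same, n_own) - s

# A majority group (70% of nodes) with choice homophily 0.6 and no closure:
T_aa = neighbor_fraction_no_closure(0.6, 0.7)   # raw neighbor fraction, above 0.6
o_a = observed_homophily(T_aa, 0.7)             # the correction recovers 0.6
i_a = induced_homophily(T_aa, 0.7, 0.6)         # vanishes without closure
```

With equal group sizes the ratio of group sizes is 1 and `observed_homophily` returns the raw neighbor fraction unchanged.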

When individuals form new edges using the triadic closure mechanism in a network with existing homophilous patterns of connectivity, they link to their own group even without having an explicit choice homophily bias (Fig. 1F). This process increases the observed homophily (*o _{a}*) in the network beyond that due to choice homophily (*s _{a}*), which in turn increases the likelihood for homophilic connections in upcoming triadic closure events. That is, the existing observed homophily originally due to choice homophily creates induced homophily (more opportunities for similar people to meet), which in turn creates even more induced homophily. We call the result of this cumulative advantage-like cycle homophily amplification, since the amount of observed homophily (*o _{a}*) is larger than the amount of choice homophily (*s _{a}*) (see Fig. 2A for an illustration). The results of this cumulative homophily amplification are shown for equally sized groups and symmetric choice homophily (*s* = *s _{a}* = *s _{b}*) in Fig. 2B. In the extreme case of no random rewiring (lack of other mechanisms of edge creation), even a moderate choice homophily bias (*s* ≥ 2/3) will segregate the social network into fully disconnected groups (see Fig. 2B for the mean-field solution and the Supplementary Materials for a derivation of this result).
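The 2/3 threshold can be illustrated with a short calculation. The sketch below is not the derivation of the Supplementary Materials; it assumes equal group sizes, symmetric choice homophily *s*, pure triadic closure (*c* = 1), and a mean-field balance in which the fraction *T* of same-group neighbors equals the fraction of accepted new edges that stay within the group. A two-step walk ends in the own group with probability *T*² + (1 − *T*)² and in the other group with probability 2*T*(1 − *T*), so besides the segregated solution *T* = 1 the balance reduces to (2 − 4*s*)*T*² + 2*sT* − *s* = 0. The function name is hypothetical.

```python
import math

def interior_fixed_point(s):
    """Non-segregated fixed point 0 < T < 1 for c = 1, or None if there is none."""
    if abs(s - 0.5) < 1e-12:
        return 0.5  # the quadratic degenerates to 2*s*T - s = 0
    disc = 2.0 * s - 3.0 * s * s  # non-negative only for s <= 2/3
    if disc < 0.0:
        return None
    for sign in (1.0, -1.0):
        T = (-s + sign * math.sqrt(disc)) / (2.0 - 4.0 * s)
        if 0.0 < T < 1.0:
            return T
    return None

for s in (0.5, 0.55, 0.6, 0.65, 0.7):
    print(s, interior_fixed_point(s))
```

Under these assumptions the interior solution lies above *s* for 1/2 < *s* < 2/3, approaches 1 as *s* approaches 2/3, and does not exist beyond that value, leaving only the segregated state.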

In addition to homophily amplification, the triadic closure mechanism and choice homophily may also lead to a core-periphery (*34*–*36*) social structure where the core group mostly connects with itself, while the periphery group almost exclusively connects with the core group even in the presence of choice homophily (see Fig. 2A for an illustration of the core-periphery structure and Fig. 2B inset for the analytic and simulation results). This effect, seemingly opposed to the drive of individuals to find homophilous connections in the periphery group, is due to the large likelihood of finding a candidate neighbor in the core group while attempting to close triangles (Fig. 1D). This likelihood is larger when the connectivity between the two groups is weak, which explains the role of choice homophily in the formation of the core-periphery structure (see the Supplementary Materials for a schematic similar to Fig. 1F). Homophily amplification and core-periphery are two competing results of the interaction between triadic closure and choice homophily. A core-periphery social structure is possible within otherwise symmetric networks with high triadic closure when there is enough choice homophily to boost the core-periphery structure, but not enough to tip the balance toward two tightly interconnected groups. Alternatively, a core-periphery structure can appear when there is enough asymmetry in the social network due to unequal group sizes or choice homophily biases (Fig. 3).

The rise of homophily amplification and core-periphery social structures depends not only on the parameters regulating triadic closure and choice homophily but also on the initial conditions and random fluctuations of network evolution, meaning that the system exhibits memory of previous structural constraints, or homophily hysteresis (see Fig. 2B for an example where either of the two groups can become the core, and Fig. 3 for a more systematic analysis). In other words, if the system parameters change (i.e., choice homophily and probability of triadic closure), the social network can experience dramatic, nonreversible changes such that returning to the previous parameters does not return the system to the same final state (i.e., the same fixed point). This suggests that social networks may have persistent memory of homophily, with a structure dependent both on current choice homophily biases and their history. Therefore, we speculate that the timing of interventions aiming to reduce observed homophily or the formation of core groups in, say, an online social network can be critical. Once the network has reached a stable point of its dynamics or is close to one, it can be much more difficult to drive the system to another stable point by attempting to change the choice homophily of individuals or other parameters.

The time scales at which the social network is driven toward homophily amplification or a core-periphery structure vary greatly (Fig. 4A). Homophily amplification is generally a fast process, requiring only a few rewiring events per edge for the social network to reach a stable point. On the other hand, a core-periphery social structure evolves slowly toward equilibrium, and even if a network would eventually stabilize to a core-periphery structure, it may exhibit fast homophily amplification first (Fig. 4B). This result suggests that even if a real social network (in society or in online platforms) would follow our stylized model accurately, it might not show a stable, fully realized core-periphery structure but a transient state slowly drifting toward the structural dominance of one group over the other. If the network is first driven toward homophily amplification, the group that eventually becomes the core can depend purely on random chance (Fig. 4A).

To estimate how much observed homophily differs from choice homophily in real-world social networks, as well as to find the stable point that best corresponds to their structure, we fit several empirical datasets of off- and online social interactions to our model of triadic closure and choice homophily (Fig. 5). We use two approaches for fitting the data: (i) solving for the choice homophily parameters *s _{a}* and *s _{b}* from the fixed points of the mean-field equations, given the *T* matrix in data and a triadic closure probability *c* [we denote these solutions *s _{a}*(*c*) and *s _{b}*(*c*)], and (ii) using an approximate Bayesian computation (ABC) method (*37*) to find estimates for the model parameters (denoted by […]). […] *A* as the relative difference between the normalized choice and observed homophilies […] *i* follows Eq. 2. Note that the observed homophily […] *c* = 0 (see Materials and Methods for details).
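Approach (i) can be illustrated with a closed-form inversion. The sketch below is not the fitting code used for Fig. 5; it assumes the mean-field balance in which, for each group, the fraction of same-group neighbors equals the fraction of accepted new edges that stay within the group, with candidates proposed by a two-step walk (probability *c*) or uniformly at random (probability 1 − *c*). The function name and argument order are hypothetical.

```python
def choice_homophily_from_T(T_aa, T_bb, n_a, c):
    """Solve the assumed fixed-point balance for (s_a, s_b) given T and c."""
    n_b = 1.0 - n_a
    T_ab, T_ba = 1.0 - T_aa, 1.0 - T_bb
    # Probabilities of proposing a candidate in each group, before acceptance.
    q_aa = c * (T_aa * T_aa + T_ab * T_ba) + (1.0 - c) * n_a
    q_ab = c * (T_aa * T_ab + T_ab * T_bb) + (1.0 - c) * n_b
    q_bb = c * (T_bb * T_bb + T_ba * T_ab) + (1.0 - c) * n_b
    q_ba = c * (T_bb * T_ba + T_ba * T_aa) + (1.0 - c) * n_a
    # Balance T_aa = s_a*q_aa / (s_a*q_aa + (1 - s_a)*q_ab), solved for s_a.
    s_a = T_aa * q_ab / (T_aa * q_ab + T_ab * q_aa)
    s_b = T_bb * q_ba / (T_bb * q_ba + T_ba * q_bb)
    return s_a, s_b

# Equal groups with 70% same-group neighbors: the inferred choice homophily
# is 0.7 without closure and smaller when all candidates come from closure.
s_no_closure, _ = choice_homophily_from_T(0.7, 0.7, 0.5, 0.0)
s_full_closure, _ = choice_homophily_from_T(0.7, 0.7, 0.5, 1.0)
```

In this example the gap between the observed value and the inferred choice homophily grows with *c*, which is the sense in which the same data imply more amplification when triadic closure is assumed to be more frequent.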

In terms of fitting the mean-field behavior of the model to data, three networks show homophily amplification in both groups: a Facebook friendship network consisting of two classes in a U.S. university (*38*), a 1-day contact network of primary school students divided by gender (*39*), and a network of political blogs divided by party affiliation (*40*). The remaining two, a friendship network in a website for sharing music listening habits (Last.fm) and a network of company directors (*41*), both divided by gender, display a pattern where part of the observed homophily within one of the groups could be explained by homophily amplification, but the choice homophily in the other group could be underestimated because of the triadic closure mechanism. In the case of the board of directors, we observe choice heterophily within males and choice homophily within females, which together with the triadic closure mechanism explain the core of female directors observed in the study where this network was introduced (*41*). In the mean-field fitting procedure, the maximum homophily amplification possible [*A*(*c* = 1)] exceeds 50% for the social network of political blogs, the largest Facebook network, and the board of directors. The exact estimate of choice homophily depends on the latent tendency for triadic closure in the network (*c*). However, the parity of the amplification [*A*(*c*)] is independent of this estimate, and the growth of amplification is monotonic as a function of *c*.

The qualitative picture we get for the amplification estimates using the ABC method is mostly similar to the mean-field fitting process. The changes are mostly in the scales of the effects for some datasets, and in the primary school data, the almost nonexistent amplification is now estimated as negative amplification. In addition, the ABC method gives us an estimate for the triadic closure probability (〈*c*^{ABC}〉), which ranges from very high (for the Facebook and political blogs networks) to medium (for the other datasets). Full posterior distributions for the choice homophily variables have a fair amount of variance, which means that the point estimates we give are indicative of expected behavior only (see Fig. 5 and the Supplementary Materials for posterior distributions for other parameter values). The ABC method also gives us a distribution of possible initial *T* matrices or, equivalently, *P* matrices (see the Supplementary Materials for related distributions). Although for most networks the initial condition is largely irrelevant, for the political blogs network, the observed homophily needs to be high in the initial condition. This is because the political blogs network is located in a part of the parameter space with multiple fixed points (i.e., homophily hysteresis). Our model suggests that the network could have been in a core-periphery fixed point if the network had not evolved from a structurally polarized situation historically (see the Supplementary Materials for details).

A systematic analysis of 100 Facebook networks (*38*) reveals that the bulk of ABC estimated amplification values are positive, ranging up to the 60% value observed for the largest of these networks (and above in a few extreme cases). Note that there are large differences between the choice homophily estimates (*r* = 0.88), which may indicate that the amount of choice homophily we observe in each university is a feature of the university and not of the student class. Results on both of our fitting procedures suggest that using observed homophily as a naive estimator for choice homophily can lead to a serious overestimation or underestimation of the intensity of homophily (even for a moderate amount of triadic closure) in several real-world social networks, both in society and online platforms.

## DISCUSSION

Our findings show that the homophilous patterns of association typically seen in empirical social networks not only arise because of an individual preference for similarity but are also the result of a cumulative advantage-like process that has the tendency to amplify this intrinsic bias for choice homophily due to triadic closure. By means of a minimal model of social network evolution, we find bounds on the amounts of triadic closure and choice homophily necessary for such amplification of homophily to arise. This corroborates theoretically previous observations in organizational (*42*) and communication (*16*, *17*) networks. In the generic case of a moderate amount of triadic closure events and similarly sized attribute groups, choice homophily is amplified by triadic closure through a tipping point mechanism analogous to the one responsible for residential segregation in the Schelling model (*18*), in which segregation takes place in the social network topology rather than in physical space.

In addition to homophily amplification, our results suggest that the interplay between triadic closure and choice homophily is a plausible explanation for the emergence of the core-periphery structure found in social, communication, academic, trade, and financial networks (*34*–*36*). In such structures, the core group of individuals is so well connected that following edges via triadic closure almost always leads to the same group, making the core even more connected. While triadic closure and homophily are already considered as contributing factors in the formation of communities (cohesive and assortative groups densely connected within), the impact of node attributes on the core-periphery structure is less studied. Our model implies that the dynamic transition to core-periphery networks is slow and often preceded by fast but temporary homophily amplification. This may partly explain why the social network literature has focused on clustered networks rather than other, rarer types of intermediate-scale structures.

The coupled effects of triadic closure and choice homophily also include the memory of homophilic constraints, i.e., systems with multiple, coexisting stable points for a wide range of relative group sizes and amounts of triadic closure and choice homophily. In other words, even if choice homophily or triadic closure tendencies are changed, a social network may preserve memory of its current structural configuration. This makes it difficult to alter the shape of a stable network, for example, by varying the typical choice homophily of its individuals. On the basis of these findings, we expect that, when planning external interventions to reduce homophily-induced social segregation, measures of action should be taken sooner rather than later, since the scale of interventions with meaningful effect on the structure of the social network increases with time. Still, more research is needed before we can deliver concrete suggestions on intervention strategies.

Because choice homophily is not directly observable from static social network data, one needs to infer it from the available information. Such inference is always subject to assuming a model for data creation, and the exact estimates for choice homophily should always be interpreted with this in mind. Fitting the mean-field solution of the model to data assumes that the real-world network is in a stable state, since it is simply based on matching the linking probabilities (*T*) of the data with the stable states of the model. The more elaborate ABC method involves more observables and a finite number of evolution steps. One could also include the number of steps *t* as a fitting parameter to investigate its effect in the convergence of the method. The results from the mean-field and ABC fitting procedures match each other qualitatively in the sense that the overall conclusions drawn from them are the same. However, the point estimate values are different in many cases, and there is a notable amount of variance in the posterior distributions of the parameter estimates for the ABC method. This variance could potentially be reduced by allowing for larger network sizes (which are limited by computational constraints), by including different summary statistics in the discrepancy function, by further tuning the ABC method parameters, or by including additional mechanisms in the model. Applying these methods to mechanistic network models is a relatively new approach (*43*), and new developments in this area can be expected as these methods mature.

In contrast to our approach, stationary, nonmechanistic models, such as exponential random graph models (ERGMs), can also be used to study the interplay between triadic closure and homophily (*44*–*46*). The key conceptual difference from our approach is that ERGMs are static network models, which can be used to balance between the tendency toward triangles, homophilic edges, homophilic triangles, and many other network features as factors explaining network structure. Our approach is rather a model of cumulative interplay of two explicit and microscopic network evolution mechanisms [note, however, that some carefully crafted microscopic network evolution models can converge, under certain assumptions, to ERGMs as stable states (*46*, *47*), and ERGMs are often sampled with Markov chain Monte Carlo methods in which networks are rewired (*48*)]. This means that ERGMs do not explicitly model cumulative processes or say anything about multiple time scales or metastable states, which we find as a consequence of combining triadic closure with a choice homophily bias.

Our approach assumes that the biological and cultural attributes underlying homophily are constant in time. While this assumption is mostly true for long-term individual characteristics such as gender or religion, it is less so for traits like political affiliation, occupation, and opinions. Networks where both edges and attributes change adaptively to each other (i.e., following adaptive coevolutionary dynamics) have been studied extensively for biological, economic, and social phenomena (*49*–*51*). When edges between nodes with similar attributes are favored, the adaptive dynamics self-organizes into heterogeneous networks where groups of individuals sharing attributes are structurally distinguishable from each other (*52*–*56*). Such a generic feature of adaptive networks makes it likely that our observations of the cumulative effects of triadic closure and homophily will hold even in the case of time-dependent individual attributes (*57*, *58*). The study of an adaptive interplay between triadic closure and homophily is a worthy line of future research that may reveal additional, complex feedback loops between social structure and attribute evolution.

The simplicity of our framework suggests that the presence of triadic closure and choice homophily for a given attribute value is enough to explain some salient features of empirical social networks like homophily amplification and core-periphery structures. Yet, the effects of more realistic features of society, such as the existence of more than two values for a single attribute, structural constraints beyond triadic closure, and the coexistence of several attributes in a population [in the spirit of the Axelrod model of cultural dissemination (*59*)], remain to be studied. We anticipate that our results promote even more interest in the data-driven computational simulation of social interactions and shed further light on the relationship between triadic closure and homophily. This insight will help researchers and policy-makers in devising intervention strategies to decrease the most adverse effects of homophilic decision making, including segregated social structures such as gender-specific workplaces and partisan political systems.

## MATERIALS AND METHODS

### Mean-field bifurcation analysis

We derive approximate analytical expressions for the temporal evolution of the amount of observed homophily in a social network based on a mean-field bifurcation analysis of our model. The key assumption in this approximation is that all nodes within both groups are statistically equivalent, in the sense that we only track the relative number of connections within and between groups to determine the state of the system. Otherwise, the network is considered fully random (i.e., following a stochastic block model) and of infinite size. The validity of this approximation, which omits all higher-order structure such as local clustering or degree distributions, is confirmed by comparing it to extensive numerical simulations (see section S2).

In the case of two attribute groups, the state of the system at time *t* can be tracked by a 2 × 2 matrix *P*, where the element *P _{ab}* is the probability that an edge chosen uniformly at random lies between groups *a* and *b*. Equivalently, we may follow the dynamics of a 2 × 2 transition matrix *T*, where element *T _{ab}* is the probability that following a random edge from a node in group *a* leads to a node in group *b*. We can switch from the *P* to the *T* matrix via

$$T_{aa} = \frac{2P_{aa}}{2P_{aa} + P_{ab}}, \tag{4}$$

with the analogous expression for *T _{bb}*, and noting that *T _{ab}* = 1 − *T _{aa}* and *T _{ba}* = 1 − *T _{bb}*. A similar bijective transformation exists from *T* to *P*.
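The change of variables between the two matrices can be sketched as follows. The code assumes that *P _{ab}* counts all between-group edges (so that *P _{aa}* + *P _{ab}* + *P _{bb}* = 1) and that each within-group edge contributes two edge ends to its group while each between-group edge contributes one; the function names are hypothetical.

```python
def P_to_T(P_aa, P_bb):
    """Edge fractions -> same-group transition probabilities (T_aa, T_bb)."""
    P_ab = 1.0 - P_aa - P_bb
    T_aa = 2.0 * P_aa / (2.0 * P_aa + P_ab)
    T_bb = 2.0 * P_bb / (2.0 * P_bb + P_ab)
    return T_aa, T_bb

def T_to_P(T_aa, T_bb):
    """Inverse map; needs T_aa < 1 and T_bb < 1 (some between-group edges)."""
    r_a = T_aa / (2.0 * (1.0 - T_aa))  # ratio P_aa / P_ab
    r_b = T_bb / (2.0 * (1.0 - T_bb))  # ratio P_bb / P_ab
    P_ab = 1.0 / (1.0 + r_a + r_b)
    return r_a * P_ab, r_b * P_ab, P_ab

T_aa, T_bb = P_to_T(0.3, 0.2)          # half of the edges run between groups
P_aa, P_bb, P_ab = T_to_P(T_aa, T_bb)  # round trip back to (0.3, 0.2, 0.5)
```

The inverse is undefined in the fully segregated limit (*T _{aa}* = *T _{bb}* = 1), where the split of edges between the two groups is no longer determined by *T* alone.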

The evolution rules of choice homophily and triadic closure in the mean field are captured in the model matrix *M _{ab}*, defined as the probability that, in a single time step of the dynamics, we create an edge between nodes in groups *a* and *b*, respectively, when a node from group *a* has been selected as the focal node. It can be written as

$$M_{ab} = S_{ab}\left( c \sum_{g \in \{a,b\}} T_{ag} T_{gb} + (1 - c)\, n_b \right). \tag{5}$$

The first term in the parentheses on the right side of the equation is the probability that triadic closure chooses a candidate neighbor between groups *a* and *b*, and the second term gives this probability for random choice of the neighbor. This sum is then multiplied by the choice homophily probability that the link is accepted (*S _{ab}*).
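The verbal description of the model matrix translates directly into code. The sketch below assumes that the triadic closure term is the probability of a two-step walk from the focal group to the candidate group, weighted by *c*, and that the random term is the relative size of the candidate group, weighted by 1 − *c*; the function name and the dictionary layout are hypothetical.

```python
def model_matrix(T_aa, T_bb, n_a, s_a, s_b, c):
    """Return M[(x, y)]: probability that a focal node in x creates an edge to y."""
    n = {"a": n_a, "b": 1.0 - n_a}
    T = {("a", "a"): T_aa, ("a", "b"): 1.0 - T_aa,
         ("b", "b"): T_bb, ("b", "a"): 1.0 - T_bb}
    S = {("a", "a"): s_a, ("a", "b"): 1.0 - s_a,
         ("b", "b"): s_b, ("b", "a"): 1.0 - s_b}
    M = {}
    for x in "ab":
        for y in "ab":
            # Two-step walk x -> g -> y over both intermediate groups g.
            two_step = sum(T[(x, g)] * T[(g, y)] for g in "ab")
            M[(x, y)] = S[(x, y)] * (c * two_step + (1.0 - c) * n[y])
    return M

M = model_matrix(T_aa=0.6, T_bb=0.4, n_a=0.7, s_a=0.8, s_b=0.3, c=0.0)
```

Without a choice bias (*s _{a}* = *s _{b}* = 1/2) each row of the matrix sums to 1/2 for any *c*, since the candidate probabilities of a row sum to one and every candidate is accepted with probability 1/2.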

Using *M _{ab}*, we then write a rate equation describing the change in the fraction of edges within group *a*,

$$\frac{dP_{aa}}{dt} = n_a M_{aa} - n_a \left( M_{aa} + M_{ab} \right) T_{aa}, \tag{6}$$

and analogously for *P _{bb}* (or a similar equation for *P _{ab}*). Here, the first term on the right side of the equation is the rate at which the edges are created, and the second term is the rate at which they are deleted because of successful rewiring. We determine the fixed points of the rate equations and their stability, and therefore those of the mean-field dynamics, through linear stability analysis (see the Supplementary Materials for details of the analytical solution of the model and Figs. 2 to 4 for a summary of the analytical results).
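The mean-field dynamics can also be explored by direct numerical integration. The sketch below is an illustration under stated assumptions, not the analysis of the Supplementary Materials: within-group edges of group *a* are created at rate *n _{a}M _{aa}* and removed at rate *n _{a}*(*M _{aa}* + *M _{ab}*)*T _{aa}* (a focal node in *a* drops a same-group edge with probability *T _{aa}* after any successful creation), and likewise for group *b*. The explicit Euler scheme, step size, number of steps, and function name are arbitrary choices made for the example.

```python
def mean_field_fixed_point(n_a, s_a, s_b, c, T0=(0.5, 0.5), dt=0.05, steps=20000):
    """Euler-integrate the assumed rate equations; return the final (T_aa, T_bb)."""
    n_b = 1.0 - n_a
    # Initial transition probabilities -> edge fractions (requires T0 < 1).
    r_a = T0[0] / (2.0 * (1.0 - T0[0]))  # P_aa / P_ab
    r_b = T0[1] / (2.0 * (1.0 - T0[1]))  # P_bb / P_ab
    P_ab = 1.0 / (1.0 + r_a + r_b)
    P_aa, P_bb = r_a * P_ab, r_b * P_ab
    for _ in range(steps):
        P_ab = 1.0 - P_aa - P_bb
        T_aa = 2.0 * P_aa / (2.0 * P_aa + P_ab)
        T_bb = 2.0 * P_bb / (2.0 * P_bb + P_ab)
        T_ab, T_ba = 1.0 - T_aa, 1.0 - T_bb
        # Model matrix: acceptance bias times candidate-selection probability.
        M_aa = s_a * (c * (T_aa * T_aa + T_ab * T_ba) + (1.0 - c) * n_a)
        M_ab = (1.0 - s_a) * (c * (T_aa * T_ab + T_ab * T_bb) + (1.0 - c) * n_b)
        M_bb = s_b * (c * (T_bb * T_bb + T_ba * T_ab) + (1.0 - c) * n_b)
        M_ba = (1.0 - s_b) * (c * (T_bb * T_ba + T_ba * T_aa) + (1.0 - c) * n_a)
        # Creation minus deletion of within-group edges.
        P_aa += dt * n_a * (M_aa - (M_aa + M_ab) * T_aa)
        P_bb += dt * n_b * (M_bb - (M_bb + M_ba) * T_bb)
    P_ab = 1.0 - P_aa - P_bb
    return 2.0 * P_aa / (2.0 * P_aa + P_ab), 2.0 * P_bb / (2.0 * P_bb + P_ab)

# Equal groups, pure triadic closure, choice homophily above 2/3.
print(mean_field_fixed_point(0.5, 0.7, 0.7, 1.0))
```

Under these assumptions the example run drifts toward full segregation (*T _{aa}* and *T _{bb}* close to 1). Because the system can have several coexisting stable points, the outcome of such an integration generally depends on the initial condition `T0`.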

### Numerical simulations

We use numerical simulations to verify the accuracy of the mean-field approximation of Eq. 6 (Fig. 2B). We first construct a random network with *N* = 10^{5} nodes and average degree 〈*k*〉 = 50. To create networks with different initial conditions in terms of in- and out-group edges, we choose values for the fractions of same-group neighbors *T _{aa}* and *T _{bb}*. For simulations in Fig. 2B, the initial networks have (*T _{aa}*, *T _{bb}*) = (0.5, 0.5). For the inset, we use two initial conditions, i.e., (*T _{aa}*, *T _{bb}*) = (0.1, 0.9) and (*T _{aa}*, *T _{bb}*) = (0.9, 0.1). We then create two random networks so that the number of edges in each network corresponds to the desired number of in-group edges. Last, we place the remaining edges randomly between the two groups, so that the final network has *L* = *N*〈*k*〉/2 edges.
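The construction of the initial network can be sketched as follows. This is an illustrative version under stated assumptions, not the simulation code used for Fig. 2B: the two groups are taken to be of equal size, within-group edge counts are obtained from (*T _{aa}*, *T _{bb}*) using the relation *T _{aa}* = 2*P _{aa}*/(2*P _{aa}* + *P _{ab}*), and edges are drawn uniformly at random without duplicates. The function name `initial_network` and its parameters are hypothetical.

```python
import random


def initial_network(n_nodes, avg_degree, t_aa, t_bb, seed=0):
    """Return (edges, half): a set of undirected edges and the size of group a.

    Nodes 0..half-1 form group a, the rest form group b (equal sizes assumed).
    Requires t_aa < 1 and t_bb < 1 and a sparse network.
    """
    rng = random.Random(seed)
    half = n_nodes // 2
    group_a = list(range(half))
    group_b = list(range(half, n_nodes))
    n_edges = n_nodes * avg_degree // 2  # L = N<k>/2

    # Convert the target same-group fractions into edge counts.
    r_a = t_aa / (2 * (1 - t_aa))
    r_b = t_bb / (2 * (1 - t_bb))
    p_ab = 1 / (1 + r_a + r_b)
    m_aa = round(r_a * p_ab * n_edges)
    m_bb = round(r_b * p_ab * n_edges)
    m_ab = n_edges - m_aa - m_bb  # remaining edges go between the groups

    edges = set()

    def add_random_edges(src, dst, count):
        # Draw distinct edges with one end in src and the other in dst.
        target = len(edges) + count
        while len(edges) < target:
            u, v = rng.choice(src), rng.choice(dst)
            if u != v:
                edges.add((min(u, v), max(u, v)))

    add_random_edges(group_a, group_a, m_aa)  # random network within group a
    add_random_edges(group_b, group_b, m_bb)  # random network within group b
    add_random_edges(group_a, group_b, m_ab)  # edges between the two groups
    return edges, half
```

Calling `initial_network(200, 10, 0.1, 0.9)` yields 1000 edges, split into 10 within group a, 810 within group b, and 180 between the groups. The rejection loop is only practical for sparse networks, where duplicate draws are rare.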

Simulations follow the model definition described above. Between times *t* and *t* + 1, we attempt to rewire *L* edges, so that on average, each edge in the network is chosen once. For the parameters in Fig. 2B, *t* = 10^{2} is enough for getting convergence to a fixed point, while for the parameters in the inset, we need *t* = 10^{3}. Each point in Fig. 2B is averaged over 10^{2} realizations, with the SD smaller than the marker size (see the Supplementary Materials for a more detailed analysis of model parameters).

### Social network data

We use several large-scale social network datasets to determine empirically the possible effects of triadic closure on the observed homophily. The first one is Facebook, a friendship network of two classes at the University of Pennsylvania in the United States. The dataset includes friendships and metadata for 100 universities during 2005 (*38*). For each university, we use the subnetwork of the two largest classes. The second one is Polblogs, a network of political blogs collected in 2005 (*40*), with an edge between two nodes if at least one of the blogs links to the other. Blogs are split into two groups using the classification of liberal and conservative blogs provided by the original study. The third one is School, a network between students collected by automatically sensing proximity between individuals. The original data have a 20-s time resolution for 2 days, which we aggregate into edges by considering two nodes connected if they have been in each other’s proximity for at least 20 min during the observation period. Nodes are split into two groups according to gender (*39*). The fourth one is Last.fm, a snapshot of a self-reported friendship network in a music-listening website. The network is split into two groups according to gender, and it includes only users for whom this information is available (see table S1 for a summary of dataset features and the Supplementary Materials for more details). The fifth one is Directors (*41*), a network of board directors for publicly listed companies in Norway, with a link established if two people belong to the same board of directors and with groups determined by gender. This dataset was collected to analyze the effect of an affirmative action law enforced in 2008, which required each gender to have at least 40% representation in any position.
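The aggregation rule used for the School data can be illustrated with a short sketch. It assumes that each input record is one 20-s detection of a pair of individuals, that a pair appears at most once per interval, and that the 20-min threshold applies to cumulative rather than contiguous proximity time. The function name `aggregate_contacts` and the record layout `(timestamp, i, j)` are hypothetical.

```python
from collections import Counter


def aggregate_contacts(records, resolution_s=20, min_minutes=20):
    """Turn time-resolved proximity records into a static set of edges.

    records: iterable of (timestamp, i, j), one entry per detected interval.
    A pair becomes an edge if its total proximity time reaches min_minutes.
    """
    counts = Counter()
    for _timestamp, i, j in records:
        if i != j:
            counts[(min(i, j), max(i, j))] += 1  # undirected pair
    # 20 min at 20-s resolution corresponds to 60 intervals.
    min_intervals = min_minutes * 60 // resolution_s
    return {pair for pair, k in counts.items() if k >= min_intervals}
```

Under these assumptions, a pair detected in 60 intervals is connected, while a pair detected in 59 intervals is not.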

### Homophily measures and model fitting

We have three ways of estimating choice homophily in the data: naive choice homophily estimation, choice homophily estimation with the mean-field approximation, and choice homophily estimation with ABC.

*Estimation with the mean-field approximation*. To estimate the amount of choice homophily in both groups of an empirical network, we solve the following inverse problem: Given a certain value *c* of triadic closure, we find the choice homophily parameters *s _{a}* and *s _{b}* in our mean-field solution of the model that lead to the observed edge fractions between and within groups, *P _{aa}*, *P _{bb}*, and *P _{ab}*. We solve this inverse problem by setting *dP _{aa}*/*dt* = 0 in Eq. 6 and solving for *s _{a}*(*c*) and *s _{b}*(*c*) given the matrix *P* or equivalently the transition matrix *T* (see the Supplementary Materials for a closed-form formula).

*Estimation with ABC*. ABC methods allow us to infer model parameters without knowing the explicit functional form of the likelihood function of the model. We use a recently developed method called Bayesian Optimization for Likelihood-Free Inference (*37*). The parameters we fit are the two choice homophily parameters *s _{a}* and *s _{b}*, the triadic closure probability *c*, and the transition matrix at the initial condition (*T _{0}*). We use uniform priors for all of these variables. The relative group size *n _{a}* and average degree 〈*k*〉 are set to the same values as in the data. The number of time evolution steps is set to *t* = 200. The expected model results are not sensitive to network size, and for computational reasons, network size is limited to *N* = 1000 (this limitation might affect the variance of the posterior estimates). The method also requires defining the statistics we want to fit, and for this, we use 14 different statistics related to connectivity of the groups, clustering coefficients, and core-periphery structure. As a consistency check, we find that when synthetic networks are created with our model, the ABC fitting method is able to recover all parameter values (with some issues mostly concentrated around the unstable fixed points) (see the Supplementary Materials for details).
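The general idea of likelihood-free inference can be illustrated with plain rejection ABC, which is a much simpler technique than the Bayesian optimization method used here and is shown only to make the logic concrete. The simulator below is a toy stand-in, not the network model: it draws a fixed number of edges that are each within-group with probability *s* and returns the within-group fraction as the single summary statistic. All names (`rejection_abc`, `toy_simulator`) and the tolerance value are hypothetical.

```python
import random


def rejection_abc(simulate, observed_stat, n_draws, eps, rng):
    """Keep prior draws whose simulated summary statistic is within eps of the data."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.random()  # uniform prior on [0, 1]
        if abs(simulate(theta, rng) - observed_stat) <= eps:
            accepted.append(theta)
    return accepted  # samples from the approximate posterior


def toy_simulator(s, rng, n_edges=200):
    """Toy model: fraction of edges that are within-group when each is so with prob. s."""
    return sum(rng.random() < s for _ in range(n_edges)) / n_edges


rng = random.Random(42)
posterior = rejection_abc(toy_simulator, observed_stat=0.7, n_draws=2000, eps=0.05, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
```

The accepted parameter values concentrate around the value that reproduces the observed statistic. No likelihood is ever evaluated; only forward simulations and a distance between summary statistics are needed, which is the property that makes ABC applicable to the network model.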

*Naive estimation*. In the naive estimation of choice homophily, we assume that all of the observed homophily is due to choice homophily. For groups of equal size, these estimates are given by *T _{aa}* and *T _{bb}* (i.e., elements of the transition matrix). Since group sizes will have an effect on the *T* matrix, we normalize the estimates separately for each group size (to keep choice homophily estimates comparable to each other). If we set *c* = 0 in the mean-field estimation, we get a naive estimate of the biases that does not consider any triadic closure but, for example, corrects for a disproportionate amount of links observed within large groups as compared with small groups even if there is no intrinsic bias.

Note, however, that this feature of our estimation process leads to a different size correction than the Coleman homophily index (see Table 1) (*60*).
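For groups of equal size, the naive estimates are simply the observed same-group fractions *T _{aa}* and *T _{bb}*, which can be read off a labeled edge list. The sketch below computes only these raw fractions and does not reproduce the group-size normalization. The function name `transition_fractions` and the input layout (an edge list plus a node-to-group dictionary with labels `"a"` and `"b"`) are hypothetical.

```python
def transition_fractions(edges, group):
    """Return (T_aa, T_bb) for an undirected edge list.

    edges: iterable of (u, v); group: dict mapping node -> "a" or "b".
    T_aa is the fraction of edge ends in group a whose other end is also in a.
    """
    ends = {"a": 0, "b": 0}  # edge ends located in each group
    same = {"a": 0, "b": 0}  # edge ends whose opposite end is in the same group
    for u, v in edges:
        gu, gv = group[u], group[v]
        ends[gu] += 1
        ends[gv] += 1
        if gu == gv:
            same[gu] += 2  # both ends of a within-group edge count
    return same["a"] / ends["a"], same["b"] / ends["b"]
```

For instance, with edges (1, 2), (1, 3), and (3, 4), where nodes 1 and 2 belong to group a and nodes 3 and 4 to group b, each group has three edge ends, two of which lead to the same group.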

## SUPPLEMENTARY MATERIALS

Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/6/19/eaax7310/DC1

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is **not** for commercial advantage and provided the original work is properly cited.

## REFERENCES AND NOTES

**Acknowledgments:** Numerical simulations were performed using computer resources within the Aalto University School of Science “Science-IT” project. J.U.-C. and M.K. acknowledge support from the Academy of Finland through the ECANET-project (No. 32779). We thank R. Dunbar, J. Saramäki, R. Hari, and M. San Miguel for helpful feedback.

**Author contributions:** All authors contributed to the writing of the article and interpretation of the results. A.A., G.I., and M.K. contributed to the simulation and analysis software and deriving the mathematical results. A.A., J.U.-C., and M.K. prepared and analyzed the data.

**Competing interests:** The authors declare that they have no competing interests.

**Data and materials availability:** The Last.fm network dataset used in this study has been uploaded to Zenodo and is available under the accession number 3726824. The other datasets have been downloaded from publicly available data repositories. Key parts of the code used in this study are available from M.K. upon reasonable request.

- Copyright © 2020 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC).