Introduction

Social network analysis has been widely used to study marital alliances in a given polity or historical milieu and to obtain insights on economic, political, and sociological issues.Footnote 1 Marriage (or marital) networks are usually built at the family level and are represented by a graph where the nodes (vertices) are families, and an edge (arc) linking two nodes indicates the number of marriages between members of the two families (weighted network) or, less often, whether there is at least one matrimonial link between them (unweighted network). More precisely, starting from a set of individuals and their matrimonial ties, the set of individuals is partitioned into families along the patriarchal line, so a family is a group of relatives with a common surname. This is the usual simplification to a unipartite network of what is, to start with, a bipartite graph where the two “parties” are the female and the male partitions of the family, and an edge joins the node of the groom’s family to that of the bride’s family. A consequence of this simplification is that an intra-family marriage, for example a marriage between two cousins with the same last name, is represented by an edge between two separate nodes in the bipartite graph but is transformed into a self-loop, i.e., an edge that originates from and terminates in the same node, in the unipartite network, because there the groom’s and the bride’s family nodes are the same.
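The collapse from marriage records to a family-level network can be sketched in a few lines. This is only an illustration under our own assumptions: the records, the family labels A to D, and the function name project_to_families are hypothetical and are not taken from any dataset or library.

```python
from collections import Counter

# Hypothetical records: (groom's family, bride's family), one tuple per marriage.
marriages = [
    ("A", "B"),
    ("B", "A"),
    ("A", "A"),  # intra-family marriage
    ("D", "B"),
]


def project_to_families(records):
    """Collapse groom/bride records into undirected, weighted family edges."""
    weights = Counter()
    for groom_family, bride_family in records:
        # Sorting the pair makes the edge undirected: (A, B) and (B, A) coincide.
        weights[tuple(sorted((groom_family, bride_family)))] += 1
    return weights


edges = project_to_families(marriages)
# Edges whose two endpoints are the same family are the self-loops.
self_loops = {edge: w for edge, w in edges.items() if edge[0] == edge[1]}
```

In this toy input, the two marriages between A and B merge into a single edge of weight 2, while the marriage inside family A survives only as a self-loop.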

The emergence of self-loops has disadvantages: since self-loops are often dropped in the analysis, useful information about the network of alliances is lost. Clearly, the loss of information depends on the number of self-loops which, in turn, is linked to the size of the families and their willingness to arrange internal marriages and still be considered a single family. Moreover, the existence of self-loops is also a cultural phenomenon. For example, in much of Africa marriage is exogamic and used as a way to create bonds between different clans, whereas in South Asian countries or in the Jewish tradition, endogamic marriage is socially accepted and even encouraged [1, 2].

In this paper, we are interested in how self-loops can affect centrality measures [3], such as betweenness centrality, eigenvector centrality, and PageRank. Centrality measures fall roughly into two groups: those using the adjacency matrix, like eigenvector centrality, and those using paths over the network. Since self-loops are represented by non-zero values on the diagonal of the adjacency matrix, the first type of measure could take self-loops into account and use them to compute the matrix eigenvectors (eigenvector centrality) [4]. On the other hand, path-based measures typically work by traveling or measuring the connections between one vertex and another, and such paths generally exclude self-loops. However, even when self-loops could be used in the analysis, they are usually dropped before creating the graph from which the computations are made, so they are not taken into account in any meaningful way.
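To illustrate how a non-zero diagonal entry can enter a matrix-based measure, the following sketch computes eigenvector centrality by plain power iteration on a toy triangle network. It is an assumption-laden example, not the computation used in this paper: the function name, the toy matrices, and the choice to record a self-loop of weight 2 as a 2 on the diagonal are all ours, and libraries differ in how they count loops on the diagonal.

```python
def eigenvector_centrality(matrix, iterations=200):
    """Power iteration on a symmetric, non-negative adjacency matrix."""
    n = len(matrix)
    scores = [1.0] * n
    for _ in range(iterations):
        new = [sum(matrix[i][j] * scores[j] for j in range(n)) for i in range(n)]
        top = max(new)
        scores = [value / top for value in new]  # rescale so the maximum is 1
    return scores


# Toy triangle A - B - C - A (hypothetical); all three nodes are equivalent.
without_loop = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
# Same network with two intra-family marriages of A placed on the diagonal.
with_loop = [row[:] for row in without_loop]
with_loop[0][0] = 2

base = eigenvector_centrality(without_loop)
looped = eigenvector_centrality(with_loop)
```

Without the loop the three nodes obtain identical scores; with the diagonal entry, node A pulls ahead of B and C, which is the kind of effect that is lost when loops are dropped before the graph is built.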

Technically, networks that include self-loops are pseudographs or multigraphs [5],Footnote 2 whose main difference from regular graphs is that there can be several edges joining two nodes or, as in this case, a node with itselfFootnote 3; this is why it could be argued that an effective treatment of networks that include self-loops should take this fact into account. Unfortunately, and despite the pioneering efforts of [7] and [8], for the time being it is impossible to perform comprehensive social network analysis, including node-level measurements as well as meso-level structures (modular structures such as communities), if marital networks are represented using multigraphs.Footnote 4

Due to the lack of either social analysis algorithms or software capable of handling self-loops, most recent papers [9, 10] keep applying the usual methodology of collapsing these graphs to single-mode graphs and then employ off-the-shelf social network analysis tools (such as the igraph R or Python library [11] or Gephi [12]). The use of such tools is essential for the democratization of social network analysis in historical, anthropological, and sociological contexts, and being able to use them is set as the foundation of the proposals we present in this paper.

To study the influence of self-loops in the analysis of marital networks, we propose a selection of methods for converting bipartite networks that include self-loops into single-mode networks. Before presenting our proposals, we note that since centrality measures use paths over edges for their computation, any method to incorporate self-loops involves the creation of new nodes and new edges connecting these “artificial” nodes to the old ones. Therefore, our alternative methods will differ in the way nodes and edges representing intra-marriages are incorporated into the graph.

After describing three alternative methods for incorporating self-loops, we apply them to a dataset of marriages from the Republic of Venice and discuss their relative merits. Note that it is complicated or even impossible to use an external measure to prove that the centrality values obtained by including self-loops are better than those obtained by excluding them. Therefore, we will validate our method empirically, first by testing whether it is able to meaningfully incorporate self-loops into the calculation of centrality measures, without introducing uninterpretable artifacts that would obscure the analysis; and, second, whether the resulting ranking of nodes is able to better represent the status and position of families, as described in the original article in which the dataset was first introduced.

Once we have chosen the method that best satisfies the above requirements, we apply it to a second marriage network of Taiwanese elite families; this network has very different characteristics in terms of both the number of intra-marriages to be included and the structure of the graph. This will allow us to check the robustness of our method.

This paper focuses on marital networks, but the importance of self-loops is also acknowledged in other contexts. For example, [13] proposes a method to analyze private vehicle commuting traffic networks in cases where intra-county traffic connections are significant. Their model, called CCME-SL, is able to account for self-loops in community detection algorithms; [14] and [15], instead, discuss the importance of self-connections between nodes when studying the persistence of metapopulations in geo-ecological networks and suggest new network metrics to account for them. Different contexts interpret self-loops differently: in an online social network context [16], for instance, a self-loop refers to a re-post of former content, and the paper studies the influence of such re-posts on information diffusion among support groups. In an epidemiological setting [17], a self-loop would be equivalent to self-contagion or contagion among members of the same community, although the cited paper analyzes commercial networks and their influence on the spread of swine fever. In general, providing a tool that is able to use self-loops beyond high-level measures (like their number or their existence in certain nodes) will contribute to a deeper understanding of social network dynamics in many different contexts.

The rest of the paper is organized as follows: the next section is a brief survey of analyses of marital networks; next, “Datasets” presents the datasets we will be using; “Methods and results” describes the steps of each proposed method and applies them to a specially selected social network, the Venetian Republic marital network; we then validate the method that meets our requirements on a second network, the Taiwanese elite families network. A brief discussion follows in “Discussion”. Finally, our conclusions are presented in “Conclusion”.

This paper has been developed in an open science environment, following the principles of Agile Science [18]. This guarantees in-time delivery, as well as a clear problem-solving orientation from the beginning. Milestones in the development of the paper can be checked in its repository.

A survey of marriage networks

Marriage and kinship networks are an interesting source of insights into the social, economic, and political dynamics of polities below a certain size, where families are strongly linked by mechanisms of economic or social inheritance. After Padgett and Ansell’s pioneering analysis of marriage networks in the Grand Duchy of Florence [19], they have been explored in many different cultures and historical periods: marriages in medieval Venice have been studied to shed light on the pattern of long-distance trade [20] and on the careers of politicians [21]. In the Republic of Venice, access to power was restricted to aristocrats and nobility was hereditary. This is why these two papers include all available marriages between noble families; [22], on the other hand, limits its attention to the families of doges, the heads of the republic. Marriages in the Venetian Republic territory of Ragusa (present-day Dubrovnik) in the sixteenth, eighteenth, and nineteenth centuries are the focus of [23].

Marital networks in East Asian countries have also been extensively investigated: [9] examines Taiwanese elite families (1895–1996), [24] the Joseon Dynasty in Korea (1476–1910), and [25] the Tang aristocracy in China (618–906). Moving to Southeast Asia, namely the Philippines, [26] uses family network centrality to explain mayoral election results in the first decade of the 2000s and [27] analyzes the family networks of bureaucrats and their relationship with the effectiveness of public service delivery. To conclude this long list, Haitian elites are the subject of [28], and [10] examines marriages among ’Ndrangheta families in the South of Italy. For more references, see also [29].

Centrality measures are at the heart of most of the papers listed above, but self-loops are not. This article attempts to make a contribution in this area by discussing different ways of including intra-family connections and empirically testing them to find the most appropriate. We will do this using two datasets that apply the usual approach of turning the bipartite marriage network to a unipartite network and that include a significant number of intra-family connections overlooked in the original study. They are the Republic of Venice network by Puga and Trefler [20] and the Taiwanese elite family network by Dluhošová [9] that we present in the next section.

Datasets

Table 1 Summary of the original datasets used in this paper, which shows the differences between them in almost all aspects

The dataset of marriages involving a noble husband in the Republic of Venice, from 1348 to 1887,Footnote 5 is based on records from the Archivio di Stato di Venezia and was digitized by Puga and Trefler [20]. In the process, family names were normalized to the most common spelling.Footnote 6 We have also eliminated all marriages that include a non-patrician wife.Footnote 7 The unipartite, undirected, and weighted network thus obtained includes 348 nodes and 12227 arcs. The total number of intra-marriages in this network is 385, that is, 3.15% of all marriages.

The next dataset we are going to use is Dluhošová’s marital network of Taiwanese elite families [9]. The original database includes several types of kin relationships, from which we extracted the marriages to obtain an undirected weighted unipartite network as before. The original dataset includes family names in Chinese characters. We processed these names using machine translation and some manual corrections, so that family names in this paper match those in the original one. In the resulting network, there are 1243 nodes and 1365 edges, making it much sparser than the previous one. Out of the total number of edges, only 18 are self-loops, that is, 1.32% of the total number of marriages. This percentage is lower than the 3.15% of the Venetian network, but not of a totally different order of magnitude.
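The two self-loop shares quoted in this section follow directly from the counts given in the text; the short check below reproduces the arithmetic. The function name self_loop_share is ours, introduced only for this illustration.

```python
def self_loop_share(self_loops, total_marriages):
    """Percentage of marriages that are intra-family."""
    return 100 * self_loops / total_marriages


# Counts as reported in the text for each network.
venice_share = self_loop_share(385, 12227)
taiwan_share = self_loop_share(18, 1365)

# Ratio between the two shares; a value below 10 means the same order of magnitude.
ratio = venice_share / taiwan_share
```

Rounded to two decimals, the shares are 3.15% and 1.32%, so the Venetian share is a little more than twice the Taiwanese one.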

A summary of the two datasets is shown in Table 1; one can see that they are quite different from most points of view, the main one being the clustering coefficient. Both, however, correspond to cultures where the concept of “family” spans several generations, and, most importantly, include self-loops, so they are both adequate for our purposes. Please check the Declarations section for data and code availability.

Methods and results

In this section, we propose three different methods to include self-loops in the analysis of marriage networks, and apply them to the Venetian dataset; as this network has the largest number of self-loops, it can give us a better idea of the usefulness of our proposals. The results obtained on this network will allow us to choose what looks like the most appropriate method among the three proposed. To verify the quality of our selected method, we apply it again to the Taiwanese dataset, a marital network very different from the Venetian one.

For each method, we will first check whether it is able to meaningfully incorporate self-loops into the calculation of centrality measures, and second, whether the resulting ranking of nodes is able to better represent the status and position of families described in the original papers where the datasets were first introduced.

We will consider three of the measures most commonly used in social network analysis: betweenness, eigenvector, and PageRank centrality. Betweenness centrality [30] is a measure of brokerage and bridging, that is, of how well one family is able to intermediate between the others; it is a good first approximation of a family’s power, reputation, or influence. Eigenvector (EV) centrality [31, 32] has been used extensively in social network analysis and takes into account not only how well connected a node is but also the importance of such connections; PageRank centrality [33], like EV centrality, is defined recursively, but is based on the importance of all the incoming ties. Other centrality measures would either be unrelated to or unaffected by self-loops, such as closeness centrality, or would be affected in a trivial way, for example degree centrality.

Fig. 1 Venetian marital networks with self-loops eliminated; this will be our baseline graph

The baseline network we will be working with is shown in Fig. 1; self-loops have obviously been dropped. For the following discussion, values of centrality measures calculated on this dataset will be used as a benchmark. The rankings of the top ten Venetian families for the three centrality measures we consider in this paper are shown in Table 2.

Table 2 Top ten families in the Venetian dataset, with self-loops excluded, according to the three centrality measures: betweenness (left), EV centrality (middle), and PageRank (right)
Table 3 Top ten families by number of intra-family marriages

Table 3, instead, shows the top ten families in terms of the number of intra-marriages. As we can see from the table, the Contarini family has the highest number of internal marriages, accounting for 5% of the total number of marriages.

Before presenting our proposals, notice that to incorporate self-loops, any method must create new nodes and new edges linking the new, “artificial” nodes to the old ones, since methods such as PageRank and betweenness centrality use paths over edges for their computation; the new edges will have to match, somehow, the intra-links we want to incorporate into the graph modeling the social network. Therefore, the proposed methods will differ in how they create new nodes.

Method 1: “New nodes”

Fig. 2 Illustration of the “new node” method, with the original network with a self-loop on the left and the modified network on the right. The original nodes are light blue; the “replica” ones gold

The first method tested is relatively simple and straightforward: for each node with n self-loops, add a new node connected only to the original one, with a weight equivalent to n. This is illustrated in Fig. 2. In other words, we convert a self-loop into an edge between two nodes for the same family: the original node and the new one.
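The conversion described above can be sketched as follows. This is a minimal illustration under our own assumptions: the network is stored as a dictionary from node pairs to weights, the function name add_new_nodes and the prime suffix for the new node are hypothetical, and the toy weights are invented for the example.

```python
def add_new_nodes(edges, suffix="'"):
    """Method 1 sketch: turn each self-loop (A, A) of weight n into an edge A - A'."""
    result = {}
    for (u, v), weight in edges.items():
        if u == v:
            # The new node is attached only to its original, with the loop's weight.
            result[(u, u + suffix)] = weight
        else:
            result[(u, v)] = weight
    return result


# Toy four-node network with one self-loop on A (hypothetical weights).
toy = {("A", "A"): 2, ("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 1}
converted = add_new_nodes(toy)

# Neighbors of the new node: by construction, only the original node.
replica_neighbors = [pair for pair in converted if "A'" in pair]
```

Every edge other than the self-loop is left untouched, and the new node has a single neighbor.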

This method, trivial as it is, does not change the overall shape of the network, so the structural influence of these “new” nodes (and their edges) is lost. Once rendered, it might help visualize the placement of the families with some degree of intra-marriages, but little else; this could be achieved in other ways that do not involve changes in the network, such as the use of size or color in the visualization of nodes. Thus, we discard this method altogether.

Method 2: “Split families”

Fig. 3 Illustration of the “split families” method, with the original network with a self-loop on the left and the modified network (via the “split families” method) on the right. Female nodes have been painted gold. We have assumed that there is a single marriage between nodes, and thus, C and B need to be either male or female each, while D is split in two different (and unconnected) nodes, since they do not have a self-loop

A different approach for taking intra-family ties into account is to consider husbands and wives as different vertices of the graph; this would convert the “raw” bipartite graph (with the two parties being “bride” and “groom” nodes) into a single-party graph by simply relabeling it as a single-mode graph and analyzing it as such; this is illustrated in Fig. 3. From a historical perspective, this makes sense only in contexts where marriages are not egalitarian, and the female and male parts of a family are separate actors, belonging to different “classes”; but, from a strictly pragmatic point of view, it is a simple way of treating a family’s marriages to itself and its marriages to other families equally. The side effect is that the female and male nodes of the same family will be separated by at least one other node, unless there are intra-family marriages, of course; this is illustrated by node D in the figure, which has been separated into nodes \(D-F\) and \(D-M\), which are not directly connected. Please note that, in this method, all nodes and edges are changed, since “M” nodes can be connected only to “F” nodes.
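A sketch of this relabeling is given below. It assumes that each marriage is available as a (groom's family, bride's family) pair; the records, the "-M" and "-F" suffixes used as node labels, and the function name split_families are hypothetical choices made for this example.

```python
from collections import Counter


def split_families(records):
    """Method 2 sketch: one node per (family, sex); edges join an M node to an F node."""
    weights = Counter()
    for groom_family, bride_family in records:
        weights[(groom_family + "-M", bride_family + "-F")] += 1
    return weights


# Hypothetical records: A has one intra-family marriage, D has none.
records = [("A", "A"), ("A", "B"), ("C", "A"), ("D", "B"), ("C", "D")]
split = split_families(records)
```

Here the intra-family marriage of A becomes an ordinary edge between A-M and A-F, whereas D-M and D-F exist as separate nodes with no edge between them.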

Fig. 4 Graph representation of data processed using the “split families” method, that considers separately husbands and wives in the marital network. The representation uses the default rendering algorithm in the igraph R package. “Husband” nodes are colored in blue and “Wife” nodes in gold

Table 4 Top ten nodes in the Venetian dataset processed with the “split families” method according to the three centrality measures: betweenness (left), EV centrality (middle), and PageRank (right)

To get some insight into the structure of the network with split families, the latter is rendered in Fig. 4, where male nodes are in blue and female nodes are in gold.Footnote 8 The figure shows how some families seem to occupy the center through their “husband” nodes, while others do so through their “wife” nodes, implying that some families achieve centrality by marrying their daughters (and providing a dowry for it), while others, possibly more successful families, are sought for their position. Another interesting feature is that two small sub-networks, connecting a female of one family to the male of another, have been created; these sub-networks were originally connected through their “other” parts, so this is an artifact of this representation: we cannot pretend that the female and male parts of a family are disconnected even if there are no intra-marriages. However, trying to correct this effect would lead to additional artifacts.

Since differences in centrality values calculated on different networks are meaningless, to see how this method affects centrality measures we will compare family rankings. Top ten rankings for the split families network are shown in Table 4. Comparing them with the benchmark of Table 2 is not an easy task, because the original nodes have been split and converted into others, but some facts emerge clearly nonetheless. The first, not surprisingly, is that the Contarinis, originally the most central family and also the one with the most intra-marriages, remain at the top of the ranking for both male and female nodes. The second is that the family ranked fourth in the benchmark case, the Donatos, has now fallen (for both nodes) below the Morosinis, who were originally only in fifth place. According to Table 3, the Donatos only had 11 intra-marriages, while the Morosinis had 23; this is why the Donatos had to give way.

Other comparisons, however, are more difficult, because for the three centrality measures of Table 4, the male and female nodes, once split, end up in different positions in the rankings, making it difficult to measure the centrality of the family as a whole. Moreover, from a historical or sociological point of view, it is very difficult to interpret the “male” and “female” members of the family as different actors of the social network. Therefore, although this methodology for taking into account intra-family marriages might open some interesting angles of research, we discard it.

Method 3: “Duplicated node”

The third method we propose is similar to the first one, because it creates a new node for each family with intra-family connections, which is why we will call it duplicated nodes. As with Method 1, this new node is connected to the original one by an edge of weight equivalent to the number of intra-family marriages; however, it is now also linked to all the nodes to which the original node was connected; thus, the “original” node and its “replica” have all the same connections (possibly including connections with “replica” nodes, of course), and are also connected to each other.

Fig. 5 Illustration of the “duplication” method, with the original network with a self-loop on the left and the modified network on the right. The original nodes are light blue and the “replica” ones gold

To better understand how this method works, Fig. 5 illustrates its effects on the usual simplified network with four nodes and one self-loop for node A: in practice, when A is duplicated to its “replica” A’, if originally there was a registered marriage between, let us say, the Contarini (A) and the Morosini (B), an additional “fictional marriage” (that is, an edge) will be created between the other Contarini (A’) and the Morosini, as well as, of course, the Contarini–Contarini (A–A’) weighted edge. The justification is that, to account for intra-family marriages, we must consider that large families have two (undistinguished) parts; this would account for the links that have been created between the different parts of the two (large) families. Then, of course, intra-family marriages will link these two parts of the family, which again are undistinguished.
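The duplication step can be sketched as follows on a toy four-node network. This is an illustration rather than the implementation used for the results: the dictionary representation, the helper names edge_key and duplicate_nodes, and the prime suffix are hypothetical, and we assume that each “fictional” edge simply copies the weight of the edge it mirrors, which the description above does not specify.

```python
def edge_key(a, b):
    """Undirected edge key: the same tuple for (a, b) and (b, a)."""
    return tuple(sorted((a, b)))


def duplicate_nodes(edges, suffix="'"):
    """Method 3 sketch: each family with a self-loop gets a replica with the same ties."""
    looped = {u for (u, v) in edges if u == v}
    result = {}
    for (u, v), weight in edges.items():
        if u == v:
            # The intra-family marriages become the original-replica edge.
            result[edge_key(u, u + suffix)] = weight
            continue
        # Every copy of u is tied to every copy of v (assumption: same weight).
        copies_u = [u, u + suffix] if u in looped else [u]
        copies_v = [v, v + suffix] if v in looped else [v]
        for a in copies_u:
            for b in copies_v:
                result[edge_key(a, b)] = weight
    return result


# Toy network with one self-loop on A (hypothetical weights).
toy = {("A", "A"): 1, ("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 1}
duplicated = duplicate_nodes(toy)
```

In the output, A and A' share the neighbors B and C and are joined to each other, while the B–D edge is unchanged because neither family has a self-loop.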

Fig. 6 Graph representation of data processed using the “duplicated nodes” method, that duplicates the node for families that have intra-connections; “original” nodes are colored in blue, “replicated” nodes in gold. The default rendering method in the igraph package has been used to place the nodes

The Venetian network treated with the duplicated node method is shown in Fig. 6, with the “replica” nodes in gold; what we can observe in this image is that the nodes in gold, which are fewer in number than those in blue, are mainly located in the center of the graph. This tells us that families with intra-family marriages are more central than the others, i.e., they have higher centrality measures and are thus placed in the center by the layout algorithm.

Table 5 Top ten nodes in the Venetian dataset processed with the duplicated nodes method according to the three centrality measures: betweenness (left), EV centrality (middle), and PageRank (right)

We can now compare the centrality measures obtained by the duplicated node method with the benchmark. The new values are shown in Table 5, where the “replica” nodes have been eliminated from the ranking, because they have exactly the same values as the “original” ones by design. A comparison of Tables 2 (left) and 5 (left) shows that the addition of the “replica” nodes decreased betweenness centrality for every node in the ranking. To understand why, consider that betweenness centrality measures how much a certain node is “in-between”, that is, how often it is found when going from one random node to another using the shortest path. Our procedure has increased the number of nodes and edges, and thus, the measures for specific nodes are bound to be affected; in particular, what decreases per-node betweenness for the families shown in this ranking is the fact that other families have also duplicated their nodes, and this creates new nodes that have exactly the same shortest paths passing through them; thus, the decrease in betweenness will be due mainly to the number of families with intra-marriages (duplicated nodes) that will still need to go through the node to get to other nodes.
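The way a replica shares shortest paths with its original can be seen on a very small example. The sketch below implements Brandes' algorithm for unweighted, undirected graphs and applies it to a hypothetical three-node path before and after duplicating its middle node. It is a simplification under our own assumptions: edge weights are ignored, the values are unnormalized, and the function and variable names are ours, so it should not be read as the exact computation behind Tables 2 and 5.

```python
from collections import deque


def betweenness(adjacency):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized)."""
    scores = dict.fromkeys(adjacency, 0.0)
    for source in adjacency:
        order = []                           # nodes in order of discovery
        preds = {v: [] for v in adjacency}   # predecessors on shortest paths
        sigma = dict.fromkeys(adjacency, 0)  # number of shortest paths from source
        dist = dict.fromkeys(adjacency, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adjacency, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                scores[w] += delta[w]
    # Each unordered pair of endpoints was counted from both ends.
    return {v: s / 2 for v, s in scores.items()}


# Hypothetical path B - A - C, before and after duplicating A.
baseline = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
duplicated = {
    "A": ["A'", "B", "C"],
    "A'": ["A", "B", "C"],
    "B": ["A", "A'"],
    "C": ["A", "A'"],
}
before = betweenness(baseline)
after = betweenness(duplicated)
```

In the baseline, A lies on the only shortest path between B and C; after duplication there are two equally short paths, one through A and one through A', so each of the two nodes receives half of the original value.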

However, a more proper way to compare two networks with different structures is to look at rankings and how they are affected by the new method. We see that the top six families in terms of betweenness centrality retain their position, but there are changes in the bottom four and, in particular, the Venier family, who did not belong to the original ranking, is now in seventh place. Overall, however, there are no drastic changes, which is good in this particular case, because this would have been at odds with the other family status indicators presented in the original paper.
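A comparison of this kind between two top-ten lists can be sketched with a small helper. The function name compare_rankings and the family labels F1 to F11 are hypothetical placeholders, chosen only to mimic the pattern described above (top six unchanged, one newcomer in seventh place); they are not the actual entries of Tables 2 and 5.

```python
def compare_rankings(before, after):
    """Return families keeping their position, newcomers, and dropouts."""
    kept = [name for i, name in enumerate(before) if i < len(after) and after[i] == name]
    entered = [name for name in after if name not in before]
    left = [name for name in before if name not in after]
    return kept, entered, left


# Placeholder rankings, not real data.
benchmark = ["F1", "F2", "F3", "F4", "F5", "F6", "F7", "F8", "F9", "F10"]
with_loops = ["F1", "F2", "F3", "F4", "F5", "F6", "F11", "F7", "F8", "F9"]

kept, entered, left = compare_rankings(benchmark, with_loops)
```

Comparing positions rather than raw values sidesteps the fact that centrality values computed on networks of different size are not directly comparable.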

The effect on eigenvector centrality observed when comparing Tables 2 (center) and 5 (center) is also very small. The Contarini family, which we know has the highest number of intra-family marriages, is still first in the ranking, and the other families following the Contarini also maintain their position, with the exception of the tenth place, which is now held by the Pisani instead of the Loredan. As we mentioned above for betweenness centrality, the small variations are consistent with the other status indicators for Venetian families. Notice also that the swap in places of the Pisani and the Loredan can be traced to the difference in internal marriages of the two families. From Table 3 we see that the Pisani family has eight self-marriages, while the Loredan only has four. Therefore, intra-marriages increase the (relative) centrality of the families who have them.

Finally, we turn our attention to PageRank centrality. This is a measure designed primarily for directed graphs; when used on undirected graphs, as we are doing here, it gives a recursive measure of influence that is an alternative to eigenvector centrality. As can be seen by comparing the rightmost columns of Tables 2 and 5, for well-connected families, which are also those with a high number of intra-marriages, the changes are so negligible that the ranking of the first ten families remains the same.
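To make the use of PageRank on an undirected network concrete, the sketch below runs a plain power iteration in which every undirected edge is walked in both directions, in proportion to its weight. It is a generic textbook formulation under our own assumptions (damping factor 0.85, no self-loops, no isolated nodes, a hypothetical star-shaped toy network), not necessarily identical to what a given library computes.

```python
def pagerank(weights, damping=0.85, iterations=100):
    """Power-iteration PageRank on an undirected, weighted, loop-free graph."""
    nodes = sorted({n for edge in weights for n in edge})
    # Strength: total weight of the edges incident to each node.
    strength = dict.fromkeys(nodes, 0.0)
    for (u, v), w in weights.items():
        strength[u] += w
        strength[v] += w
    rank = dict.fromkeys(nodes, 1.0 / len(nodes))
    for _ in range(iterations):
        new = dict.fromkeys(nodes, (1 - damping) / len(nodes))
        for (u, v), w in weights.items():
            # Each undirected edge passes rank in both directions.
            new[v] += damping * rank[u] * w / strength[u]
            new[u] += damping * rank[v] * w / strength[v]
        rank = new
    return rank


# Hypothetical star: A is tied to B (weight 2), C and D (weight 1 each).
toy = {("A", "B"): 2, ("A", "C"): 1, ("A", "D"): 1}
ranks = pagerank(toy)
```

In this toy network the hub A receives the highest score, and B ranks above C and D because its tie with A is heavier.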

Beyond changes in rankings, the effects of the duplicated nodes method on a family’s influence can be illustrated using the simple network of Fig. 5. If the “replica” node (A’) is interpreted as “another” part of the family that is not directly related (or not directly enough to prevent internal marriages) to “the original” one, when the new edges are added, an alternative way of getting to any part of the family, either the original node (A) or the other, separate, part of the family (A’), is created. The split families method, which divides the family by gender, was also a way of introducing “another” part of the family, but the division it introduced was fixed, whereas the duplicated node method simply indicates that there are different parts of the family, independent enough to allow internal marriages, but still externally recognized as belonging to the same casata or dynastic house. Therefore, unlike the other methods studied so far, the duplicated nodes method does have a straightforward interpretation in social terms.

The duplicated node method and the Taiwanese network

Fig. 7 The Taiwanese marriage network, using graphopt method for layout; this method is optimized for graphs with a large number of nodes

We now apply the duplicated node method to the elite families marital network in Taiwan. The network is rendered in Fig. 7.

Table 6 Intra-family marriages by family in the Taiwanese dataset

Table 6 lists all families with an intra-family marriage. The first two families in the table have two, the others only one. As indicated in “Datasets”, this network has been extracted from Dluhošová [9], who classifies the most prominent families into “old” and “new” ones, as well as so-called “Mainland ruling elite” families. Additionally, using community analysis, the paper finds 11 communities, designated with letters from A to K, where letters are assigned in descending order of number of nodes; the “A” community, thus, is the largest with 4.33% of the nodes.Footnote 9 The main families in each community (referred to as “networks” in the paper) are also indicated by a number in descending order of centrality: for instance, the Ling Xiantang family, labeled A1, is the most prominent family in the “A” community, the “Wufeng Ling family network”.Footnote 10

Table 7 Top ten nodes in the Taiwanese elite families dataset with self-loops dropped according to the three centrality measures: betweenness (left), EV centrality (middle), and PageRank (right)

As we did before, we first compute benchmark centrality measures when self-loops are eliminated. The resulting rankings for the top ten families are shown in Table 7 for betweenness centrality, eigenvector centrality, and PageRank, respectively. Looking at the three rankings, we immediately see a striking difference between the Taiwanese network and the Venetian one: in the benchmark measures for Venice (Table 2), the most central family is the Contarini, the same for all three rankings. Furthermore, seven out of ten families appear in all three rankings and a total of only 13 families are found across the three rankings, i.e., all measures of centrality identify the same small group of families. In the Taiwanese dataset, instead, the first spot in the ranking is taken by a different family for each measure, and a family ranked first for one measure is not even in the top ten for another, as in the case of the Yan Fu family, which is first in eigenvector centrality but not in the top ten for betweenness centrality. Looking at the data, one notices that the structure and composition of the network are totally different, and the network is much sparser, which causes different measures to increase the centrality of specific families depending not so much on their degree, but on the position they have in the network; this is also noticeable in the betweenness centrality values, which in this case are two orders of magnitude greater than in the Venetian network.
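The overlap figures quoted above (families present in all three rankings, and distinct families across them) are simple set operations. The sketch below shows the computation on placeholder top-three lists; the labels F1 to F4 and the function name ranking_overlap are hypothetical and do not reproduce the contents of Tables 2 or 7.

```python
def ranking_overlap(*rankings):
    """Families present in every ranking, and families present in at least one."""
    sets = [set(ranking) for ranking in rankings]
    return set.intersection(*sets), set.union(*sets)


# Placeholder top-three lists, not real data.
by_betweenness = ["F1", "F2", "F3"]
by_eigenvector = ["F1", "F3", "F4"]
by_pagerank = ["F1", "F2", "F4"]

in_all, in_any = ranking_overlap(by_betweenness, by_eigenvector, by_pagerank)
```

A large intersection relative to the union, as reported for Venice, indicates that the measures single out the same core group; a small one, as in the Taiwanese case, indicates that each measure rewards a different structural position.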

Having established in the previous section that the duplicated node method can provide us with a way to introduce self-loops in any of these centrality measures, let us then apply it to this network. As we can see from the rendering of the resulting network, shown in Fig. 8, duplicating nodes only adds 16 new nodes in a network with numerous families; this is in sharp contrast with the Venetian network of Fig. 6 which, as we said, had fewer nodes and proportionally many more self-loops. Therefore, this is obviously an extreme case of a network with intra-family ties: a very small percentage of families are big enough to allow internal marriages. As the following analysis shows, they do, however, have an impact on centrality measures.

Fig. 8 The Taiwanese network using the “duplication” method; “original” nodes are colored in blue and “replicated” nodes in gold. The graphopt method has been used to place the nodes, as in the previous figure

Table 8 Top ten nodes in the Taiwanese dataset processed with the duplicated nodes method according to the three centrality measures: betweenness (left), EV centrality (middle), and PageRank (right)

Let us see how the introduction of these new ties impacts the rankings of the three centrality measures considered. The results are shown in Table 8. A first look indicates that even though we have only introduced 16 new nodes with their edges, there is a considerable effect in all the rankings. However, the impact varies from one ranking to the next. We compare the benchmark and the new ranking in turn.

Starting with betweenness centrality and comparing Tables 8 (left) and 7 (left), we see that only the top two families and the sixth keep their position in the ranking. In addition, one family, Tainan Liu, which was in fourth place in the benchmark top ten, drops out of the new ranking. The fact that there are small corrections in the families included in the top ten matches what happens with the Venetian marital network. Another common feature is the decrease in betweenness centrality values, although the change is now on a different scale: in the Venetian network the value was almost halved, while in this network there is only a small correction. This can be explained by the small number of families with self-loops: the information we are adding to the network should not have a large numeric impact, although, as we have seen, it does affect the rankings.
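This kind of before-and-after comparison (which families keep their rank, which drop out, which enter) can be made mechanical. The sketch below is one possible way to do it in Python; `compare_rankings` and the single-letter labels are illustrative assumptions, not part of the analysis reported here.

```python
def compare_rankings(benchmark, updated):
    """Compare two ordered top-k lists of family names.

    Returns the (rank, family) pairs that are unchanged, the families that
    dropped out of the list, and the families that entered it.
    """
    same_position = [
        (rank, family)
        for rank, (family, other) in enumerate(zip(benchmark, updated), start=1)
        if family == other
    ]
    dropped = [family for family in benchmark if family not in updated]
    entered = [family for family in updated if family not in benchmark]
    return same_position, dropped, entered


# Hypothetical top-5 lists before and after adding replica nodes.
benchmark = ["P", "Q", "R", "S", "T"]
updated = ["P", "Q", "T", "R", "U"]

same_position, dropped, entered = compare_rankings(benchmark, updated)
print(same_position)  # ranks held by the same family in both lists
print(dropped)        # families only in the benchmark list
print(entered)        # families only in the updated list
```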

What really sets this case apart is that the changes are not induced directly by the introduction of new nodes and edges, because the families that change their position in the ranking are generally not those with self-loops. In fact, among the families in the top ten for betweenness centrality, only two have an intra-family marriage (a single one each): Lin Dingbang and Tainan Liu. For both, betweenness centrality decreases, and for the Tainan Liu family it decreases to the point that the family drops out of the ranking.

Since the Tainan Liu family also tops one of the other two rankings, it is worth analyzing its position in the network to illustrate how it achieves this and how it is affected by the addition of new nodes. Its ego network, that is, the sub-network that includes all nodes connected to Tainan Liu, is shown in Fig. 9. As can be seen, it is (mostly) a star-type network: the Tainan Liu family serves as the connection for a good number of nodes, and most of them can only reach the rest of the network through it. The ego network also includes 2 of the 16 nodes that have been added; in such a sparsely connected network, the addition of new nodes and their corresponding edges is bound to have a great local impact. The fact that many nodes are only connected through the Tainan Liu family explains its high eigenvector and PageRank centrality. On the other hand, the additional node introduced in this version of the network provides alternative paths to the many nodes connected to the family, thus decreasing its betweenness centrality, critically in this case, since it makes the family drop out of the betweenness centrality ranking. This is an entirely intended effect of the introduction of new nodes and edges for families with intra-family marriages.
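An ego network of this kind, together with the list of neighbours that depend entirely on the ego for their connection to the rest of the graph, can be extracted from a weighted edge list in a few lines. The Python sketch below assumes edges are stored as a dictionary keyed by node pairs; the helper names and the toy network (with a hypothetical node called "Hub") are illustrative only.

```python
def neighbours(edges, node):
    """Nodes sharing at least one edge with `node` (the node itself excluded)."""
    found = set()
    for u, v in edges:
        if u == node and v != node:
            found.add(v)
        elif v == node and u != node:
            found.add(u)
    return found


def ego_network(edges, ego):
    """Edges among `ego` and its direct neighbours (first-order ego network)."""
    members = neighbours(edges, ego) | {ego}
    return {
        pair: weight
        for pair, weight in edges.items()
        if pair[0] in members and pair[1] in members
    }


def dependent_neighbours(edges, ego):
    """Neighbours whose only link to the rest of the network goes through `ego`."""
    return sorted(n for n in neighbours(edges, ego) if neighbours(edges, n) == {ego})


# Toy undirected network: (family_a, family_b) -> number of marriages.
edges = {("Hub", "a"): 1, ("Hub", "b"): 2, ("Hub", "c"): 1, ("c", "d"): 1, ("d", "e"): 3}

ego = ego_network(edges, "Hub")
leaves = dependent_neighbours(edges, "Hub")
print(ego)     # edges inside the ego network of "Hub"
print(leaves)  # neighbours reachable only through "Hub"
```

The more neighbours fall in the second list, the closer the ego network is to a pure star, and the more a single added node with the same neighbours can redistribute shortest paths.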

Fig. 9

Ego network for the Tainan Liu family

If we compare Tables 7 (center) and 8 (center), which show the rankings in terms of eigenvector centrality, the situation is different: the Tainan Liu family, which was not among the top ten families in the benchmark case, rises to first place in the duplicated nodes ranking; moreover, none of the top ten families in the new ranking were in the benchmark top ten. Also notice that four out of the top five families in the new ranking have self-loops (all of them except Lin Xiantang).

Since the ranking has totally changed, we need to ground these results in what was published in the original paper. This is not straightforward because, as explained at the beginning of this subsection, families were classified along two dimensions, size and centrality, represented as letters and numbers, respectively. In what follows we will try to use this classification to understand what type of families are raised to the top in the duplicated node ranking. Looking at the top five families for eigenvector centrality in the new ranking (center column of Table 8), we find, in descending order, Tainan Liu (A3), Qingshui Cai (A7), Lin Xiantang (A1), Lin Dingbang (A2), and Lin Weiyuan (F1). In the benchmark ranking without self-loops, on the other hand, the top two positions are taken, in descending order, by Yan Fu (F3) and Lin Weirang (F2), followed by Wan Shao Mou (which is not even listed among the largest or most prominent families),Footnote 11 then Yan Yunnian (B1), and another unlisted family, Jia Dehuai.

Since the most prominent families by size are identified with the first letters of the alphabet and the most central ones with smaller numbers, the duplicated node method seems better able to identify the most prominent families. In other words, looking at eigenvector centrality, the ranking without self-loops puts seemingly less relevant families (either from smaller communities or less central) at the top, while the new ranking picks families from the A cluster and with small numbers.

Finally, the PR rankings, shown in Tables 7 (right) and 8 (right), do not change much when self-loops are included by way of duplicated nodes (as was the case with the Venetian marriage network). The top four families in the ranking do not vary: Chen Zhonghe (J1, unnamed in the original paper), Yan Fu (F3), Yan Yunnian (B1), and Cheng Zhong Mo (A12), scattered over different networks. Wu Xiuji (B10) has replaced Tainan Liu (A3) in fifth place, and Lin Weirang (F1) takes the next one. The B group corresponds to the Jilong Yan network, also composed of old families. This change in ranking is certainly more difficult to interpret, since both A and B families are old. However, it is interesting to note that the Lin Weirang family, which is mentioned in Dluhošová’s paper, section 5.1, as the link of the F network (Banqiao Lin, old) with the C network (ROC Mainland ruling family, mainland elite families) and the J network (Gaoxiang Chen, old), has been boosted with respect to the benchmark ranking, where it was in eighth place. Although this family has no self-loops itself, its rise shows how the structural changes induced by the duplicated nodes method can make the resulting PageRank better reflect the actual status of a single node within the network.

Discussion

In this section, we discuss the soundness of the three methods and how they affect various dimensions of the graph to which they are applied. In particular, we consider four aspects: first, whether the method introduces structural changes at the node level; second, whether and how the new edges bring changes at the community level; third, whether it produces artifacts that have no plausible explanation; and, fourth, how it accounts for the importance of intra-family marriages. Table 9 contains a summary of our conclusions, based on the results of “Methods and results”.

Table 9 Summary of features of the three methods we have introduced for accounting for intra-family marriages

Looking at the checklist shown in Table 9, the “duplicated nodes” method of “Method 3: “Duplicated node”” stands out above the other two. First, it does not introduce structural changes at the node level, since the “replica” nodes are structurally equivalent to the original ones, whereas the new nodes in the “split family method” are not. Second, it does introduce changes in the community, due to the new edges, but this effect is intentional and obviously introduces global structural changes, as we have seen in the previous section. This result is also achieved by the “split family method”, but not by the “new nodes” method which, by adding a single weighted edge for each family with intra-marriages, fails in this respect. Finally, what really distinguishes the “duplicated nodes” method from the other two is that it does not create unexplained artifacts: the new nodes it introduces can be interpreted as other representatives of the family. In practice, the duplicated nodes method takes into account the fact that families with some amount of intra-family marriages are, by custom or law, large enough to have more than one “actor” or to have agency in several directions. However, the two nodes of the same family will be, from the analytic point of view, indistinguishable from each other, and externally considered the same, which is why they have exactly the same values for all centrality measures. The fact that they are linked to each other also accounts for their intra-family marriages, and explains how the latter contribute to family cohesion.
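The claim that a replica and its original obtain identical centrality values can be checked numerically on a toy graph. The sketch below is an illustration in Python, not the computation used for the tables: it implements eigenvector centrality by plain power iteration on the adjacency matrix plus the identity (a common shift that avoids oscillation), and the node names, weights, and the `_replica` suffix are assumptions made for the example.

```python
import math


def eigenvector_centrality(adjacency, iterations=200):
    """Power iteration on A + I for an undirected weighted graph.

    `adjacency` maps node -> {neighbour: weight}. Scores are normalised to
    unit Euclidean length after every step.
    """
    scores = {node: 1.0 for node in adjacency}
    for _ in range(iterations):
        updated = {
            node: scores[node] + sum(w * scores[nb] for nb, w in nbrs.items())
            for node, nbrs in adjacency.items()
        }
        norm = math.sqrt(sum(value * value for value in updated.values()))
        scores = {node: value / norm for node, value in updated.items()}
    return scores


# Toy duplicated network: family "X" has 2 internal marriages, so "X" and its
# hypothetical replica share the same neighbours (same weights) and are joined
# to each other by an edge of weight 2.
adjacency = {
    "X": {"Y": 1, "Z": 3, "X_replica": 2},
    "X_replica": {"Y": 1, "Z": 3, "X": 2},
    "Y": {"X": 1, "X_replica": 1, "W": 1},
    "Z": {"X": 3, "X_replica": 3},
    "W": {"Y": 1},
}

scores = eigenvector_centrality(adjacency)
print(scores["X"], scores["X_replica"])  # equal up to floating-point error
```

Because swapping the original and the replica leaves the graph unchanged, any measure that depends only on graph structure assigns them the same value; the example simply makes that symmetry visible for one measure.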

The “new nodes” method (“Method 1: “New nodes””), by contrast, introduces additional nodes that are linked only to the original ones and thus have no real interpretation. The “split family” method (“Method 2: “Split families””), by dividing every family along gender lines, treats the male and female parts of the family as separate agents and introduces the undesirable artifact that two nodes of the same family may not be linked to each other, since only families with self-loops have such a link. Moreover, with the “split family” method, measures of centrality are structurally different for the female and male nodes of the same family, and thus impossible to ground in the family’s social reality, as explained above.

All three methods proposed are heuristic, and their validity can only be ascertained post hoc, by empirically verifying whether they better represent the social or historical status and position of the families. Of course, self-loops could also be investigated in other ways outside the field of social network analysis; for instance, we could simply look at the percentage of intra-family marriages versus inter-family marriages. However, if we acknowledge that marriages form social links, social network analysis offers the researcher quantitative insights at the actor (family) and meso (community) level that would otherwise not be available, and excluding intra-family marriages from it could lead to quantitative errors that are difficult to overcome.
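The simple non-network alternative mentioned above, the share of intra-family marriages, can be computed directly from a list of marriages. The Python sketch below assumes each marriage is recorded as a (groom family, bride family) pair; that input format, the function name, and the sample data are hypothetical.

```python
def intra_family_shares(marriages):
    """Share of intra-family marriages, by marriages and by families.

    `marriages` is a list of (groom_family, bride_family) pairs; a pair with
    two equal names is an intra-family marriage.
    """
    intra = [pair for pair in marriages if pair[0] == pair[1]]
    families = {family for pair in marriages for family in pair}
    families_with_intra = {groom for groom, _ in intra}
    return {
        "share_of_marriages": len(intra) / len(marriages),
        "share_of_families": len(families_with_intra) / len(families),
    }


# Made-up data: five marriages among four families, two of them inside "A".
marriages = [("A", "B"), ("A", "A"), ("B", "C"), ("C", "D"), ("A", "A")]

shares = intra_family_shares(marriages)
print(shares)
```

These two ratios describe how common internal marriages are, but not where the families involved sit in the network, which is the information the centrality-based approach adds.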

Conclusion

Intra-family ties are an important part of the dynamics of marital networks; however, they have so far rarely been taken into account when computing centrality measures. In this paper, we propose alternative ways to incorporate these intra-family ties into the graph representing the marriage network; all of the methods suggested are compatible with the use of existing social network analysis software.

After empirically testing three alternative methods with two (very different) marriage datasets, one from the Republic of Venice and the other from Taiwanese elite families across the 19th–20th centuries, we conclude that the most meaningful way to introduce self-loops is the “duplicated nodes” method. This creates a “replica” node for each family with non-zero internal marriages and connects it to the original node with an edge whose weight is equal to the number of internal marriages in the family; it also connects the “replica” node to all other nodes to which the original node was connected. The advantage of this method is that it makes it possible to estimate the influence of intra-family marriages on the structure of the whole network, without producing artifacts. These “replica” nodes share the same actor-level measures with the node they mirror and have a well-founded meaning, as they can be interpreted as “another representation” of the family.
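The construction just described can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions rather than a reference implementation: the input format, the function name `duplicate_nodes`, and the `_replica` suffix are hypothetical; the replica is assumed to inherit each external edge with the same weight as the original; and when two linked families both have internal marriages, their replicas are assumed to be linked to each other as well, so that each replica stays structurally equivalent to its original.

```python
from collections import defaultdict
from itertools import product

REPLICA_SUFFIX = "_replica"  # hypothetical naming convention for replica nodes


def duplicate_nodes(marriage_counts):
    """Replace self-loops by replica nodes in a weighted undirected network.

    `marriage_counts` maps (family_a, family_b) -> number of marriages; a pair
    with two equal names is a self-loop. Returns {frozenset({u, v}): weight}.
    """
    with_loops = {a for (a, b), n in marriage_counts.items() if a == b and n > 0}

    def copies(family):
        # A family with internal marriages is represented by two nodes.
        if family in with_loops:
            return [family, family + REPLICA_SUFFIX]
        return [family]

    edges = defaultdict(int)
    for (a, b), n in marriage_counts.items():
        if n <= 0:
            continue
        if a == b:
            # Original and replica are joined with the number of internal marriages.
            edges[frozenset((a, a + REPLICA_SUFFIX))] += n
        else:
            # Assumption: every copy of a is linked to every copy of b, same weight.
            for u, v in product(copies(a), copies(b)):
                edges[frozenset((u, v))] += n
    return dict(edges)


counts = {("A", "A"): 2, ("A", "B"): 1, ("B", "C"): 3}
duplicated = duplicate_nodes(counts)
for pair, weight in duplicated.items():
    print(sorted(pair), weight)
```

The resulting edge list contains no self-loops, so it can be passed unchanged to standard social network analysis software for computing centrality measures.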

We show that, for the three main centrality measures we consider, the “duplicated nodes” method is able to create family rankings that better reflect the families’ social status, as reported by the original papers in which the two networks were first introduced. This holds even with a very small percentage of self-loops, whether calculated with respect to the total number of marriages or the total number of families.

Another advantage of the method is that it is not computationally intensive and can be performed either manually, by manipulating the spreadsheet, or using any data-oriented scripting language such as R or Python. In the near future, we plan to publish a library, using the language R and possibly other languages, to allow this method to be easily used, so that as more marital network datasets become available, it will be possible to study their general dynamics even in the presence of extensive intra-family ties.

An additional improvement to this method would be to adjust the weights of the new edges, so that we are able to approximate, in the case of EV centrality, its value computed using self-loops. That way we could validate the results obtained numerically as well as qualitatively. Additional validation can be obtained by applying the duplicated nodes method to other networks with self-loops, such as the ones mentioned in the “Introduction” and “A survey of marriage networks”.