Blockchain Transaction Analysis Using Dominant Sets

Awan, Malik Khurram; Cortesi, Agostino

doi:10.1007/978-3-319-59105-6_20

Malik Khurram Awan¹⁶ &
Agostino Cortesi¹⁶

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 10244))

Included in the following conference series:

IFIP International Conference on Computer Information Systems and Industrial Management

2762 Accesses
9 Citations

Abstract

Blockchain is an emerging backbone technology behind different crypto-currencies. It can also be used for other purposes and areas. There are different scalability issues associated with blockchain. It is important to know the in depth structure of blockchain by identifying common behaviors of the transactions and the effect of these behaviors on the nodes of the network. Dominant set approach can categorize the blockchain transactions into different clusters without mentioning number of clusters in advance. The experimental evaluation of blockchain transactions shows better clustering accuracy of dominant set approach than existing method of central clustering approach.

You have full access to this open access chapter, Download conference paper PDF

Behavior pattern clustering in blockchain networks

Article 26 January 2017

Butian Huang, Zhenguang Liu, … Qinming He

Clustering Blockchain Data

Exploring Blockchain Mining Mechanism Limitations

Keywords

1 Introduction

Blockchain is a technology that is working behind bitcoin. The rising adoption and promising security of blockchain technology has the potential to reshape the current infrastructure of different business areas. In bitcoin, blockchain is a decentralized ledger containing the complete transaction history which is public at each node. All transactions are locked with the information of time, date, nodes and the amount. Nodes generate transactions and broadcast them to the network. Miners generate new blocks by solving proof-of-work and broadcast them to the network.

Blockchain analysis under different clustering heuristics can help to study its in-depth structure and to know the behavior of nodes and associated transactions. The clustering results obtained from these heuristics can also be useful for backend designers of the blockchain technology. These days achieving scalability in blockchain is a heated topic. Many researchers have presented different approaches and ideas for the scalability of blockchain to get better throughput in minimum latency. The recent research work [3, 5, 7, 10, 12, 15, 20, 24] of past two years about scalability issues of blockchain model it is shown that blockchain incur storage, latency, security, processing and bandwidth problems [5, 12, 15, 20, 24]. Different researchers have carried out research activities by introducing some new proposals and improving the existing mechanism of blockchain in different crypto-curriencies [1, 4, 8, 9, 11, 13, 14, 16, 19, 25]. In [18] the authors discovered that by making the clustering of non-real account names of users according to their shared ownership and associated real names can give the clearer picture of the cryptocurrency. In [17] transaction volume per node is presented with clustering coefficients per node in the bitcoin network.

Inside blockchain network, there are thousands of transactions which are increasing over time. To manage and organize this increasing number of transactions it is important to study their common behavior. Clustering these increasing number of transactions can help to find out the transactions with common behavior and to identify and trace anomalous transactions.

This paper is a step towards it by using the dominant set approach for blockchain transactions that automatically categorize the blockchain transactions into different clusters without specifying number of clusters in advance. The percentage accuracy of experimental results show better performance of dominant set approach than existing method of central clustering approach.

In next Sect. 2 we highlight the basic terminologies and algorithmic structure of dominant set approach. Section 3 shows experimental results, evaluation measures and the results summary. Conclusion is discussed in Sect. 4.

2 Background

Generally clustering can be categorized into two types: central and pairwise. Central clustering is a feature based clustering. K-means algorithm is a type of feature based clustering. Pairwise clustering is a graph based clustering and a more general approach and adaptable in the sense that the algorithm works fine with wider range of input forms.

Dominant set clustering framework has been defined and introduced in [22, 23]. The clustering method of choice is dominant sets, a pairwise clustering method that generalizes the idea of maximal clique to weighted graphs. It works with pairwise similarities i.e. metric embedding is not needed. For an undirected unweighted graph G = (V, E), clique can be defined as a subset of mutually adjacent nodes where $V=\{1,2 \ldots ,n\}$ denotes vertex set and $E\subseteq V\times V$ represents the edge set. The key notions and definitions of the dominant set approach are discussed below:

For a non-empty set of vertices $S\subseteq V$ and $i\in V$. The average weighted degree of node i w.r.t set S is defined as:

$$\begin{aligned} AWDeg_S(i)= \frac{1}{|S|} \sum \limits _{j\in S} w_{i,j} \end{aligned}$$

(1)

In case of relative similarity $\phi _S(i,j)$, when node j does not belong to set S as shown in Fig. 1. We can measure the similarity between nodes i and j w.r.t the average similarity between node i and its neighbors in S which is expressed as:

$$\begin{aligned} \phi _S(i,j) = w_{i,j} - AWDeg_S(i) \end{aligned}$$

(2)

For set $S \subseteq V$ and $i \in S$ we can compute the weight of node i w.r.t S:

$$\begin{aligned} W_S(i)=\left\{ \begin{array}{ll} 1 &{} if \, |S| \, = \, 1 \\ \sum \limits _{j\in S\backslash \{i\}} \phi _{S\backslash \{i\}}(j,i)W_{S\backslash \{i\}}(j) &{} \text {otherwise} \end{array} \right. \end{aligned}$$

(3)

Total weight of the set S can be calculated by adding up all weights of $W_S(i)$ of the above mentioned recursive function.

$$\begin{aligned} W(S)=\sum \limits _{i\in S} W_S(i) \end{aligned}$$

(4)

Further in [22, 23] set S is specified as dominant set by defining the internal and external criteria of clustering as:

I. $W_S(i)>0$ for all $i\in S$

II. $W_{S\cup \{i\}}(i)<0$ for all $i\notin S$

The authors in [22, 23] transformed the combinatorial problem of identifying dominant set in graph into quadratic optimization problem and the use of dynamic system of evolutionary game theory to solve it. Which is a general-form of a problem in graph theory known as Motzkin-Straus problem [21]. The findings in [22, 23] reveal that there is a one-to-one correspondence between dominant sets and the strict local maximizer of the problem.

$$\begin{aligned} max \quad x^TWx \qquad subject.to \quad x \in \varDelta \end{aligned}$$

(5)

where $\varDelta _n = \{\bar{x}\in R_+^n :\bar{x}\ge \bar{0} , \bar{e}^T\bar{x}=1 \}$ is the standard simplex. The weighted characteristic vector of a dominant set S, which is a strict local solution of the problem (5) is defined as:

$$ x^S = \left\{ \begin{array}{ll} \frac{W_S(i)}{W(S)} &{}\ if\, i \in \, S;\\ 0 &{}\ {otherwise}.\end{array} \right. $$

The Motzkin-Straus problem is generalized by creating 1-to-1 correspondence between dominant set and strict local solution of the problem (5) in a conversely way by explaining that if $x^*$ is a strict local solution of the problem (5) then its support $ \sigma (x^*)=\{i \in V : x_i^*\ne 0\}$ is a dominant set provided that $w_{\sigma \cup \{i\}}\ne 0 \forall i\ne \sigma $.

For finding the local solution of the above mentioned quadratic problem (5) we use replicator dynamics that originates from evolutionary game theory. The following evolutionary step is adopted.

$$\begin{aligned} x_{i}(t+1) = x_{i}(t) \frac{(AX)_{i}}{X(t)^{T}AX(t)} \end{aligned}$$

(6)

For $i = 1\ldots n$, all trajectories that start within standard simplex $\triangle $ will remain in the simplex $\triangle $ for any number or iterations of (6). Let A be a non-negative symmetric matrix so the objective function will strictly increase along any non-constant trajectory of (6). The asymptotically stable points correspond to dominant set for the similarity matrix A. With an affinity matrix A of graph G the abstract algorithmic structure of dominant set clustering approach is:

3 Results and Evaluation

This section highlights the detail description about performed experiments mentioned in Sect. 3.1 and the results comparison with different evaluation measures in Sect. 3.3. The experimental evaluation summary is mentioned in Sect. 3.4.

3.1 Experimental Settings

We have selected a sample of 2,048 vectors of blockchain transaction data. The pairwise distance between two set of observations has been computed by using standardized euclidean distance similarity function. The transaction data has been experimented in matlab with different parameters and settings. Some graph structures have been studied with gephi graph visualization tool. The machine used to perform experiments and compute the clustering results is a standard computer with 2.3 GHz Intel Dual-core with 3GB RAM.

3.2 Experimental Results

Dominant set approach and central clustering approach has been applied on standard blockchain transactions data. With K as number of clusters, the sample output of first 2,048 vectors of transaction edges are shown in Fig. 2. With dominant set approach, only 2 required clusters of transaction edges are extracted out of the whole graph. Transactions edges that show ‘similar behavior’ are in one cluster(blue) and the transaction edges that show ‘different behavior’ are in another cluster(green) as shown in Fig. 3. With K-means approach, we have extracted 2 clusters(red and blue) from the whole transaction graph which is actually a partitioning of the whole graph as shown in Fig. 4.

3.3 Evaluation Measures and Comparison

There are different measures for comparing clustering results. Clustering results of blockchain transactions are evaluated by computing silhouette value and rand index value. Silhouette values of results are computed to know about well-separatedness of clusters and their average is calculated to get better quantitative analysis view. Silhouette value plots are shown in Figs. 5 and 6.

Table 1 shows average silhouette value computed for range of values of K for both approaches as shown in Fig. 7

Table 1. Average silhouette values for dominant set and K-means clustering from 3 to 7

Full size table

Rand index is usually used to see the accuracy or measure of similarity between two clusterings. Mathematically rand index is defined as:

For analysis and evaluation 6 clusters have been selected for the calculation of percentage accuracy and rand index for both approaches as shown in Tables 2 and 3. The plotted results are shown in Figs. 8 and 9.

Table 2. Rand index and accuracy for dominant set clustering from 1 to 6

Full size table

Table 3. Rand index and accuracy for K-means clustering from 1 to 6

Full size table

From Fig. 8 rand index plot shows that at around cluster 3 the slope is getting lower in dominant set approach. The result accuracy plot as shown in Fig. 9 shows that the percentage result accuracy of clustering results of dominant set is better than K-means approach that shows less spike at around cluster 4.

3.4 Summary of the Results

The experimental evaluation explains that: (1) dominant set method performs well for 2,048 vectors of blockchain transaction data (2) transactions with ‘similar behavior’ are clustered without specifying number of clusters in advance (3) with dominant set approach transactions subgraph with ‘similar behavior’ is extracted out of the main graph, and (4) percentage result accuracy shows better clustering results in dominant set method than feature based approach when number of clusters increases.

3.5 Constraints and Complexity

Some constraints related to experimental results are: (1) smooth running of experiments mainly depend on size of the blockchain transactions selected, and (2) hardware specification. The standard approach used here to find dominant set is replicator dynamics. From theoretical point of view, replicator dynamics has quadratic $O(n^2)$ computational complexity in every step of the dynamics for a dataset with n points.

4 Conclusion

We analyzed blockchain transaction data with pairwise dominant set and central clustering approaches by testing and evaluating it with different measures and settings. The experimental and evaluation results show better clustering accuracy of dominant set approach than existing method of central clustering approach. The in-depth information coming out of the dominant set clustering analysis can be useful for blockchain maintainers and to accurately identify anomalous transactions in the blockchain.

From scalability point of view, further study is needed for the concept of centralized blockchain with distributed chain of dominant set labels of the transactions.

Finally, in would be interesting to investigate the impact of alternative approaches to the analysis of blockchain systems, like for instance semantics-based static analysis techniques, that already provided interesting results in security analysis of software systems [2, 6, 26].

References

Back, A., Corallo, M., Dashjr, L., Friedenbach, M., Maxwell, G., Miller, A., Poelstra, A., Timon, J., Wuille, P.: Enabling Blockchain Innovations with Pegged Sidechains (2014)
Google Scholar
Barbon, G., Cortesi, A., Ferrara, P., Pistoia, M., Tripp, O.: Privacy analysis of android apps: implicit flows and quantitative analysis. In: Saeed, K., Homenda, W. (eds.) CISIM 2015. LNCS, vol. 9339, pp. 3–23. Springer, Cham (2015). doi:10.1007/978-3-319-24369-6_1
Chapter Google Scholar
Beck, R., Czepluch, J.S., Lollike, N., Malone, S.: Blockchain-The Gateway to Trust-Free Cryptographic Transactions. In: ECIS, Turkey (2016)
Google Scholar
Bamert, T., Decker, C., Elsen, L., Wattenhofer, R., Welten, S.: Have a snack, pay with bitcoins. In: IEEE 13th International Conference on Peer-to-Peer Computing, Italy, pp. 1–5 (2013)
Google Scholar
Bonneau, J., Miller, A., Clark, J., Narayanan, A., Kroll, J.A., Felten, E.W.: SoK: research perspectives and challenges for bitcoin and cryptocurriencies. In: IEEE Symposium on Security and Privacy (2015)
Google Scholar
Cortesi, A., Ferrara, P., Pistoia, M., Tripp, O.: Datacentric Semantics for Verification of Privacy Policy Compliance by Mobile Applications. In: D’Souza, D., Lal, A., Larsen, K.G. (eds.) VMCAI 2015. LNCS, vol. 8931, pp. 61–79. Springer, Heidelberg (2015). doi:10.1007/978-3-662-46081-8_4
Google Scholar
Croman, K., Decker, C., Eyal, I., Gencer, A.E., Juel, A., Koshba, A., Miller, A., Saxena, P., Shi, E., Sirer, E.G., Song, D., Wattenhofer, R.: On Scaling Decentralized Blockchains (2016)
Google Scholar
Decker, C., Wattenhofer, R.: Information Propagation in the Bitcoin Networks. In: IEEE Thirteenth International Conference on Peer-to-Peer Computing, Italy (2013)
Google Scholar
Decker, C., Wattenhofer, R.: Bitcoin transaction Malleability and MtGox. In: Kutyłowski, M., Vaidya, J. (eds.) ESORICS 2014. LNCS, vol. 8713, pp. 313–326. Springer, Cham (2014). doi:10.1007/978-3-319-11212-1_18
Google Scholar
Decker, C., Wattenhofer, R.: A Fast and Scalable Payment Network with Bitcoin Duplex Micropayment Channels. In: 17th International Symposium, Canada, pp. 3–18 (2015)
Google Scholar
Doll, A., Chagani, S., Kranch, M., Murti, V.: Btctrackr: finding and displaying clusters in bitcoin. Princeton University, USA (2014)
Google Scholar
Eyal, I., Gencer, A.E., Sirer, E.G., Renesse, R.V.: Bitcoin-NG: a scalable blockchain protocol. In: 13th USENIX Symposium on Networked Systems Design and Implementation, USA (2016)
Google Scholar
Eyal, I., Sirer, E.G.: Majority is not Enough: Bitcoin Mining is Vulnerable. Cornell University, USA (2014)
Google Scholar
King, S., Nadal, S.: PPCoin: Peer-to-peer Crypto-Currency with Proof-of-Stake (2012)
Google Scholar
Kosba, A., Miller, A., Shi, E., Wen, Z.: Hawk: The Blockchain Model of Cryptography and Privacy-Preserving Smart Contracts. In: Cryptology ePrint Archive (2015)
Google Scholar
Kroll, J.A., Davey, I.C., Felten, E.W.: The economics of bitcoin minning or, bitcoin in the presence of adversaries. In: 12th Workshop on the Economics of Information Security (2013)
Google Scholar
Lischke, M., Fabians, B.: Analyzing the Bitcoin Network: The First Four Years (2016)
Google Scholar
Meiklejohn, S., Pomarole, M., Jordan, G., Levchenko, K., McCoy, D., Voelker, G.M., Savage, S.: A fistful of bitcoins: characterizing payments among men with no names. In: Proceedings of Internet Measurement Conference, Spain, pp. 127–140 (2013)
Google Scholar
Miers, I., Garman, C., Green, M., Rubin, A.D.: Zerocoin: Anonymous Distributed e-Cash from Bitcoin. In: IEEE Symposium on Security and Privacy (2013)
Google Scholar
Miller, A., Jansen, R.: Shadow-bitcoin: scalable simulation via direct execution of multi-threaded applications. In: IARC Cryptology ePrint Archive (2015)
Google Scholar
Motzkin, T.S., Straus, E.G.: Maxima for Graphs and a new Proof of a Theorem of Turan (1965)
Google Scholar
Pavan, M., Pelillo, M.: A New Graph-theoretic Approach to Clustering and Segmentation. In: CVPR (2003)
Google Scholar
Pavan, M., Pelillo, M.: Dominant Sets and Pairwise Clustering. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 167–172 (2007)
Google Scholar
Pazmino, J.E., Rodrigues, S.: Simply dividing a bitcoin network node may reduce transaction verification time. In: SIJ Transactions on Computer Networks and Communication Engineering, pp. 17–21 (2015)
Google Scholar
Ried, F., Harrigan, M.: An Analysis of Anonymity in the Bitcoin System. In: Security and Privacy in Social Networks, pp. 197–223 (2013)
Google Scholar
Zanioli, M., Ferrara, P., Cortesi, A.: SAILS: static analysis of information leakage with sample. In: Proceedings of the ACM Symposium on Applied Computing, SAC 2012, pp. 1308–1313 (2012)
Google Scholar

Download references

Author information

Authors and Affiliations

DAIS - Ca’ Foscari University, Venice, Italy
Malik Khurram Awan & Agostino Cortesi

Authors

Malik Khurram Awan
View author publications
You can also search for this author in PubMed Google Scholar
Agostino Cortesi
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Agostino Cortesi .

Editor information

Editors and Affiliations

Bialystok University of Technology, Bialystock, Poland
Khalid Saeed
Warsaw University of Technology, Warsaw, Poland
Władysław Homenda
A.K. Choudhury School of Information Technology, University of Calcutta, Kolkata, West Bengal, India
Rituparna Chaki

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Awan, M.K., Cortesi, A. (2017). Blockchain Transaction Analysis Using Dominant Sets. In: Saeed, K., Homenda, W., Chaki, R. (eds) Computer Information Systems and Industrial Management. CISIM 2017. Lecture Notes in Computer Science(), vol 10244. Springer, Cham. https://doi.org/10.1007/978-3-319-59105-6_20

Download citation

DOI: https://doi.org/10.1007/978-3-319-59105-6_20
Published: 17 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59104-9
Online ISBN: 978-3-319-59105-6
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Blockchain Transaction Analysis Using Dominant Sets

Abstract

Similar content being viewed by others