Fair workload distribution for multi-server systems with pulling strategies

doi:10.1016/j.peva.2017.04.005

Performance Evaluation

Volume 113, August 2017, Pages 26-41

https://doi.org/10.1016/j.peva.2017.04.005 Get rights and content

Abstract

We consider systems with a single queue and multiple parallel servers. Each server fetches a job from the queue immediately after completing its current work. We propose a pulling strategy that aims at achieving a fair distribution of the number of processed jobs among the servers. We show that if the service times are exponentially distributed then our strategy ensures that in the long run the expected difference among the processed jobs at each server is finite while maintaining a reasonable throughput. We give the analytical expressions for the stationary distribution and the relevant stationary performance indices like the throughput and the system’s balance. Interestingly, the proposed strategy can be used to control the join-queue length in fork-join queues and the analytical model gives the closed form expression of the performance indices in saturation.

Introduction

The problem of parallel job scheduling has been widely studied in the literature with the aim of improving some performance indices, such as the throughput, the response time, the fairness or a combination of these indices. One can distinguish two basic approaches to the problem depending on the phase at which the dispatcher is placed. In the model depicted in Fig. 1(A), the dispatcher decides how to assign a job to a server according to some scheduling discipline which takes into account the queueing state and other information about the servers. In contrast, the model depicted in Fig. 1(B) stores the jobs in a shared queue and these are assigned by the dispatcher to a server as soon as it becomes available. The scheduling policy can rely on a pushing strategy, i.e., the dispatcher takes the initiative to send a job to a specific server or on a pulling strategy, i.e., the servers decide autonomously to fetch a job from the shared queue. In queueing theory, these models have been widely investigated and the impact of the scheduling discipline on the performance indices is well-known.

In this paper we study the model depicted in Fig. 1(B) relying on a pulling strategy. In contrast to prior works, we focus on the balance of the total number of jobs served by a set of $K$ identical servers. Informally, we propose a stateless scheduling discipline implemented by the servers so that the difference between the number of jobs served by each unity is finite in steady-state. We stress on the fact that we aim at maintaining finite the absolute difference among the jobs served by each server and not the difference of proportions of the total number of served jobs (which would be easily achieved by simple strategies like the random one). The proposed rate adaptation policy may find application in the regulation of the join-queue length in fork-join systems as discussed later on.

The contributions of the paper can be summarised as follows: (a) We propose a scheduling discipline based on a server rate-adaptation algorithm. Informally, each server maintains a variable to store the difference between the total number of jobs served by itself and a neighbour. Given just this piece of information, the server may decide to slow down its maximum service speed in order to reduce this difference. As soon as the server finishes the service of its job, it fetches a new one from the queue. (b) We study two rate-adaptation strategies. The first one, named bimodal strategy, uses only two distinct service rates: the highest is used when a server has served less jobs than its neighbour, while the slowest is used otherwise. The second rate-adaptation policy, named proportional strategy, requires a server to reduce a fixed maximum service rate in proportion to the number of extra jobs it has served with respect to its neighbour. (c) We propose a Markovian model for such a scheduling discipline and for both the rate-adaptation strategies described above. We show that, despite the little knowledge that each server has about the state of the system, we can derive a necessary and sufficient condition for the job balance index to have finite expectation in the bimodal strategy whereas in the proportional strategy the job balance is unconditionally finite. Finally, we compare the performance indices obtained by the application of the two rate-adaptation policies. (d) We derive the exact expressions for two relevant performance indices: the system’s throughput and the balance index. The latter measures the differences among the total number of jobs processed by each server, hence low values imply a well balanced system. Although the Markov process underlying the models has an infinite state space, these expressions involve finite sums derived from the evaluation of hypergeometric functions.

Our findings show that maintaining reasonable low values for the balance index reduces the throughput at around 70% of the maximum. More interestingly, numerical evidences show that this value scales slowly with the number of servers, which means that the rate adaptation policy scales well with the system’s size. From a theoretical point of view, to analyse our model we resort to the notion of $ρ$ -reversibility. Indeed, we show that although the model is in general not reversible, it satisfies the Kolmogorov’s criteria for the $ρ$ -reversibility [1] (and also the dynamic reversibility [2]) allowing us to derive an analytical product-form expression for the invariant measure. This expression differs from the common product-form since the associated process is not obtained by composition of simpler components as, e.g., in [3], [4], [5].

Related work. There are several recent papers addressing the problem of parallel job scheduling which can be classified according to the system structure that they consider and the performance indices that they aim to optimise. In many cases, the scheduling problem aims at minimising the response or queueing time such as in [6], [7], [8] or optimising more complicated indices which may involve the notion of job-value as in [9]. In contrast, we propose a solution which aims at minimising the difference between the total number of jobs served by each server in steady-state. This result can be used to regulate the join-queue length in fork-join queues. Fork-join queueing systems have been widely studied for the performance evaluation of distributed systems and operating systems [10], [11], [12], [13], [14]. Models with numerically tractable stationary distributions are mostly based on product-form stochastic Petri nets (see, e.g., [15], [16], [17]) while several other works have addressed the problem of providing approximations or bounds for these queueing models (see, e.g., [11], [17]). In fork-join stations a job is split into several tasks which are served by independent servers. The job is considered served once all the parallel servers have completed their tasks and their resulting computations are joined. Examples of such a computational approach include the parallel processing of Big Data within the MapReduce framework [18], RAID disks, parallel processing systems with horizontal decomposition [19]. Moreover, differently from [20], [8], our results are not based on an asymptotic analysis of the model but they are exact for any finite number $K \geq 2$ of servers. In [21] we proposed a rate adaptation policy for fork-join queues that corresponds to the proportional strategy illustrated here. For the sake of comparing the bimodal and the proportional models in Section 5 we just state the performance indices of the proportional strategy. Notice that we also give an expression for the invariant measure of the underlying Markov process for any rate adaptation policy, which is a novel contribution.

Structure of the paper. In Section 2 we describe the scheduling algorithm and in Section 3 we propose a Markovian model for its analysis. In Sections 4 The bimodal model, 5 The proportional model we study two possible implementations of the proposed scheduling strategy and derive their performance indices. In Section 6 we compare the performance indices obtained from the analytical models with the simulation outcomes obtained by relaxing some of the hypothesis. Section 7 concludes the paper.

Section snippets

Algorithm description

We consider the system depicted in Fig. 1(B) consisting of $K$ identical servers fetching jobs from a shared unbounded queue. Our goal is to define an algorithm to dynamically regulate the service rate of each server in order to balance the total number of jobs served by each unit.

The algorithm we propose is stateless, in the sense that each node has a very limited information about the global system. Let us label each of the $K$ servers by a positive natural number $k \in [1, K]$ and define the successor

Markovian model

In this section we propose a Markovian model for the algorithm described in Section 2 which is based on two assumptions: (1) The system has always waiting jobs, i.e., the throughput depends only on the service rates of the servers; (2) The job sizes are modelled by independent and identically distributed exponential random variables. As a consequence, since each server can dynamically adjust its own service rate depending on its state $n_{k}$ , the service times at each server $k$ are independent and

The bimodal model

In this section we study the model whose underlying CTMC is $X_{K} (t)$ and the transition rates are defined as: $λ (n_{k}) = {\begin{cases} η & if n_{k} \geq 0 \\ μ & if n_{k} < 0 \end{cases}$ where $k \in {1, \dots, K}$ . Intuitively, each server $k$ can work at two different rates corresponding to the cases in which it has served less or more jobs than the server $k^{+}$ . We prove that for all finite $K$ a necessary and sufficient condition for the ergodicity of $X_{K} (t)$ is that $η < μ$ , i.e., any server $k$ has to work faster if it has processed less jobs than $k^{+}$ .

The proportional model

We consider the case in which each server $k \in {1, \dots, K}$ can fetch a job from the queue with a maximum rate $ζ$ but it may decide to slow down its service rate according to its internal state, i.e., the value of $n_{k}$ . We study the case in which function $λ (n_{k})$ is defined as: $λ (n_{k}) = {\begin{cases} \frac{ζ}{(n_{k} + 1)} & if n_{k} \geq 0 \\ ζ & if n_{k} < 0 . \end{cases}$ The Markov chain underlying this model has been previously considered in [21], hence we omit the proofs and just state the main results for the sake of comparing the performance indices of the

Sensitivity analysis

In this section we compare the performance measures obtained from the analytical models based on the Flatto–Hahn–Wright assumptions (i.e., the arrivals follow a Poisson distribution and the service times are i.i.d. exponential random variable) with the simulation outcomes obtained by relaxing some of the hypothesis. Specifically, we are interested in studying the impact of the service time distribution on the performance indices. All the simulations consist of 15 independent runs and the warm

Conclusion

In this paper we have proposed an algorithm for balancing the total number of jobs performed by each of a set of $K$ identical servers. The servers use a small amount of information to adapt their service rates in order to maintain the difference between the total number of served jobs small. We have defined a $ρ$ -reversible CTMC to study the stationary distribution and the performance indices for two rate-adaptation policies, named bimodal and proportional. For both these strategies we have

Andrea Marin is an assistant professor of Computer Science at the University Ca’ Foscari of Venice since 2011. He received his Ph.D. degree in Computer Science in 2007 from the same university. His main research interests include stochastic modelling of computer and communication systems for performance and reliability analysis, queueing theory, and models with product-form solutions. He has contributed in developing a probabilistic calculus for the formal analysis of wireless ad-hoc networks.

References (27)

A. Marin et al.
On the relations between Markov chain lumpability and reversibility
Acta Inf.
(2016)
F. Kelly
Reversibility and Stochastic Networks
(1979)
F. Baskett et al.
Open, closed, and mixed networks of queues with different classes of customers
J. ACM
(1975)
E. Barbierato et al.
Exploiting product forms solution techniques in multiformalism modeling
Electron. Notes Theor. Comput. Sci.
(2013)
J. Hillston et al.
Contextual lumpability
M. Harcol-Balter
Task assignment with unknown duration
J. ACM
(2002)
E. Hyytiä et al.
M/M/1-PS queue and size-aware task assignment
Perform. Eval.
(2011)
Y. Lu et al.
Join-idle-queue: A novel load balancing algorithm for dynamically scalable web services
Perform. Eval.
(2011)
S. Doroudi et al.
Value driven load balancing
Perform. Eval.
(2014)
F. Alomari et al.
Efficient response time approximations for multiclass fork and join queues in open and closed queuing networks
IEEE Trans. Parallel Distrib. Syst.
(2014)

R. Chen

An upper bound solution for homogeneous fork/join queueing systems

IEEE Trans. Parallel Distrib. Syst.

(2011)

R.J. Chen

A hybrid solution of fork/join synchronization in parallel queues

IEEE Trans. Parallel Distrib. Syst.

(2001)

J.C.S. Lui et al.

Computing performance bounds of fork-join parallel programs under a multiprocessing environment

IEEE Trans. Parallel Distrib. Syst.

(1998)

Cited by (0)

Sabina Rossi received her Ph.D. in Computational Mathematics and Informatics from the University of Padova in 1994. She is Associate Professor of Computer Science at the University Ca’ Foscari of Venice since Nov. 2012. Formerly she has been Assistant Professor at Ca’ Foscari (2000–2012), visiting professor at Universitè Paris 7 (2007) and research fellow at the Universitè Catholique de Louvain-la-Neuve, Belgium (1997). She is the (co-)author of over 50 technical papers in refereed international journals and conference proceedings. Her current research focuses on the development of formal tools for the analysis and verification based on process algebraic techniques and, specifically, on stochastic process algebras. Sabina Rossi has been an invited speaker at IFIP Theoretical Computer Science TCS 2010 and has been in the program committees of various international conferences and workshops.

View full text

Performance Evaluation

Fair workload distribution for multi-server systems with pulling strategies

Abstract

Introduction

Section snippets

Algorithm description

Markovian model

The bimodal model

The proportional model

Sensitivity analysis

Conclusion

On the relations between Markov chain lumpability and reversibility

Acta Inf.

Reversibility and Stochastic Networks

Open, closed, and mixed networks of queues with different classes of customers

J. ACM

Exploiting product forms solution techniques in multiformalism modeling

Electron. Notes Theor. Comput. Sci.

Contextual lumpability

Task assignment with unknown duration

J. ACM

M/M/1-PS queue and size-aware task assignment

Perform. Eval.

Join-idle-queue: A novel load balancing algorithm for dynamically scalable web services

Perform. Eval.

Value driven load balancing

Perform. Eval.

Efficient response time approximations for multiclass fork and join queues in open and closed queuing networks

IEEE Trans. Parallel Distrib. Syst.

An upper bound solution for homogeneous fork/join queueing systems

IEEE Trans. Parallel Distrib. Syst.

A hybrid solution of fork/join synchronization in parallel queues

IEEE Trans. Parallel Distrib. Syst.

Computing performance bounds of fork-join parallel programs under a multiprocessing environment

IEEE Trans. Parallel Distrib. Syst.