Performance Evaluation
Volume 116, November 2017, Pages 101–118

Power control in saturated fork-join queueing systems

https://doi.org/10.1016/j.peva.2017.08.008

Abstract

The analysis of fork-join queueing systems has played an important role in the performance evaluation of distributed systems where parallel computations associated with the same job are carried out and a job is considered served only when all the parallel tasks it consists of are served and then joined. The fork-join nodes that we consider consist of K ≥ 2 parallel servers, each of which is equipped with two First Come First Served queues, namely the service-queue and the join-queue. The former stores the tasks waiting to be served while the latter stores the served tasks waiting to be joined. Under heavy load conditions, the variance of the service times associated with the tasks tends to cause long join-queue lengths. In this work, we propose an algorithm to dynamically control the servers’ speeds (e.g., via frequency scaling) that aims at reducing the power consumption of the servers whose join-queue lengths are longer than the others’. Under Markovian assumptions, we provide a model for the performance evaluation of the system in saturation that allows us to derive the expression for the steady-state distribution, the system’s throughput and the balance index. Finally, we derive the analytical expression for the marginal state probabilities of each server and provide upper and lower bounds for the expected power consumption.

Introduction

Fork-join queueing stations have been extensively studied in the literature because of their wide applications in the context of distributed and parallel systems. Such queueing stations behave as follows: jobs arrive according to a certain arrival process and are forked into K tasks that are enqueued in the service-queues and then served by independent servers. Once a task is served, it is enqueued in the join-queue waiting for the service completions of all the other tasks of the job it belongs to. Once all the tasks of a job are served, the join operation is performed and the job leaves the system. In this work we assume that all the queues implement a First Come First Served (FCFS) discipline.
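To make the fork and join semantics concrete, the following is a minimal Python sketch of the bookkeeping for such a station (illustrative code, not the authors’; class and method names are hypothetical). A job is forked into K tasks, served tasks accumulate in the join-queues, and the job leaves once all K siblings have been served.

```python
from collections import deque

class ForkJoinStation:
    """Bookkeeping for a K-server fork-join node with FCFS queues.
    A minimal sketch of the behaviour described above, not the authors' code."""

    def __init__(self, K):
        self.K = K
        self.service_q = [deque() for _ in range(K)]   # tasks waiting to be served
        self.join_q = [deque() for _ in range(K)]      # served tasks waiting to be joined
        self.remaining = {}                            # job id -> tasks not yet served

    def fork(self, job_id):
        # An arriving job is split into K tasks, one per server, in FCFS order.
        self.remaining[job_id] = self.K
        for q in self.service_q:
            q.append(job_id)

    def complete(self, server):
        # The head-of-line task at `server` finishes service and moves to the join-queue.
        job_id = self.service_q[server].popleft()
        self.join_q[server].append(job_id)
        self.remaining[job_id] -= 1
        if self.remaining[job_id] == 0:
            # All K sibling tasks are served: the join fires and the job leaves.
            # With FCFS queues jobs complete in arrival order, so the joined job
            # is at the head of every join-queue.
            for q in self.join_q:
                q.popleft()
            del self.remaining[job_id]
            return job_id
        return None
```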

Fork-join queues have found applications in a wide variety of domains in computer science and telecommunication networks. For instance, in [1] the authors study the response times of multiprocessor systems by means of fork-join networks, in [2] the authors consider parallel communication systems and in [3] a RAID system is studied by simulating a fork-join station.

Unfortunately, despite their importance, few analytical results are known for this class of queueing systems. One of the reasons is the complexity of the model, which consists of two sets of queues, the service-queues and the join-queues, and for which no general decomposition result is available in the state of the art [4]. Many works have considered the fork-join station under heavy traffic (see, e.g., [5]) and provided approximations of the expected response time based on the analysis of the associated reflecting Brownian motion [6]. In this scenario we observe that when K ≥ 2 the join-queues tend to be very long because each served task has to wait for the completion of the slowest of its siblings (which may also be enqueued at their servers). In [7] the authors observe that such a system can be highly inefficient both because it handles long join-queues and because the servers work at maximum speed even if their join-queue lengths are very long. Therefore, significant energy savings can be obtained by slowing down the servers that have already served more tasks than others.

In this work we introduce a rate-control mechanism for the station’s servers that allows us to control the join-queue lengths and to reduce the system’s power consumption. The importance of containing the size of the output buffer and of reducing the energy consumption is well known in the literature, e.g., [7], [8], [9]. In contrast with [7], we do not require an estimate of the amount of work needed by a task, but we base our algorithm on a single state variable associated with each server. We assume that each server has a neighbour, defined so as to form a circular dependency: in our model, the neighbour of server i is server (i mod K) + 1. If a server has completed a number of tasks lower than or equal to that of its neighbour, then it works at its maximum speed; otherwise, it reduces its speed by a certain factor. Therefore, each server has to maintain a single variable that is incremented by 1 at each local task completion and decremented by 1 when a task completion occurs at its neighbour. In contrast with centralized approaches (see, e.g., MaxWeight scheduling in [10]), the allocation of the resources does not require knowledge of the queue-lengths of the whole system; rather, each server auto-adapts its working speed based on the knowledge of the state of its neighbour. In practice, the reduction of the server speed can be obtained via frequency scaling, and this can lead to a significant reduction in the power consumption (see, e.g., [11], [12], [13]).

Our contribution includes an analytically tractable model of such a rate-control mechanism under Markovian assumptions. We start by considering the Flatto–Hahn–Wright (FHW) model [14], [15] in saturation, i.e., the service times are modelled by independent and identically distributed (i.i.d.) exponential random variables, the join operation is instantaneous, and the service-queues are never empty. In [16] Latouche studies a system of two queues with paired customers which resembles our model for K=2. He shows that the stochastic process modelling the join-queue lengths is not positive recurrent because of the variance in the service times. Conversely, with the introduction of our rate-control mechanism we show that, for any K ≥ 2, the process underlying the join-queue lengths becomes positive recurrent and their expectation is finite. It is worth noticing that the saturation assumption plays an important role in the analysis of the behaviour of systems under stress. For instance, the saturation assumption has been widely used for the performance analysis of wireless networks (see, e.g., [17], [18] and the references therein) since it both provides analytical tractability of the model and yields results for the worst-case scenario. For the same reasons, in the analysis of fork-join systems, the heavy traffic approximation is at the basis of many relevant results (see, e.g., [5], [19] and the references therein). In our case, thanks to a particular encoding of the state space, we observe that the continuous-time Markov chain underlying our model is dynamically reversible [20], [21], [22], [23], and this allows us to derive an elegant expression for the invariant measure of the process based on the product of the chain’s transition rates. Notice that this approach is different from that based on the model’s decomposition adopted by the product-form theory (see, e.g., [24], [25]), although they share the property of admitting a closed-form expression for the invariant measure.
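To fix ideas, the following is a minimal Python sketch of the counter bookkeeping and speed rule described above; the slowdown factor and the indexing convention are illustrative assumptions rather than the paper’s actual parameters.

```python
class RateController:
    """Per-server counters for the neighbour-based speed rule sketched above.
    n[i] = (tasks completed by server i) - (tasks completed by its neighbour);
    the slowdown factor is an illustrative assumption."""

    def __init__(self, K, slowdown=0.5):
        self.K = K
        self.slowdown = slowdown
        self.n = [0] * (K + 1)          # 1-based indexing; n[0] is unused

    def neighbour(self, i):
        # Circular dependency: the neighbour of server i is (i mod K) + 1.
        return (i % self.K) + 1

    def speed(self, i, max_speed=1.0):
        # Full speed while the server is not ahead of its neighbour,
        # reduced speed (e.g., via frequency scaling) otherwise.
        return max_speed if self.n[i] <= 0 else max_speed * self.slowdown

    def on_completion(self, i):
        # A completion at server i increments its own counter and decrements
        # the counter of the server whose neighbour is i (its predecessor).
        pred = self.K if i == 1 else i - 1
        self.n[i] += 1
        self.n[pred] -= 1
```

Note that the counters always sum to zero, since every completion raises one counter and lowers another; this is the invariant used by the state-space encoding of Section 3.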
Moreover, we are able to derive an analytical expression for the system’s throughput and an index, called the balance index, which measures how well-balanced the join-queues are. The stationary probabilities, the marginal stationary probabilities, the throughput and the balance index are expressed in terms of Kummer’s confluent hypergeometric functions. In general, such functions must be evaluated by numerical approximations, but in our case the evaluation points are such that a closed-form expression is known. The marginal stationary probabilities allow us to provide tight bounds for the system’s power consumption and hence to study the trade-off between throughput and energy saving.
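The closed-form expressions themselves are not reproduced in this excerpt; as an illustration only, the sketch below shows how Kummer’s function M(a, b, z), in terms of which the indices are expressed, can be evaluated numerically, both through SciPy and through its defining series.

```python
import math
from scipy.special import hyp1f1   # Kummer's confluent hypergeometric function M(a, b, z)

def kummer_series(a, b, z, terms=60):
    """Evaluate M(a, b, z) = sum_{n>=0} (a)_n z^n / ((b)_n n!) by its defining series."""
    total, poch_a, poch_b = 0.0, 1.0, 1.0
    for n in range(terms):
        total += (poch_a / poch_b) * z ** n / math.factorial(n)
        poch_a *= a + n                  # update the Pochhammer symbols (a)_{n+1}, (b)_{n+1}
        poch_b *= b + n
    return total

# Both evaluations should agree, up to truncation of the series.
print(hyp1f1(1.0, 3.0, 2.0), kummer_series(1.0, 3.0, 2.0))
```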

Finally, we study by simulation the behaviour of our algorithm when the service times are not exponentially distributed and show the impact of the service times’ coefficients of variation (CV) on the performance indices.

The paper is an extended and revised version of the work [26]. With respect to the conference version, we have extended the paper by including all the proofs of the theorems and lemmas and by introducing a new performance index, called the balance index, whose closed-form expression is obtained for any number of servers K. The experimental section has been extended to include a sensitivity analysis of the balance index with respect to the service time distribution.

In [16] the author studies a system consisting of two queues whose arrivals are modelled according to independent Poisson processes and whose service is paired, meaning that the server works only if both queues are non-empty. It is proven that, if the arrival rates are state-independent, then the Markov chain underlying the model is not positive recurrent. In [27], [28] the authors propose an approximate analysis of such a system with an arbitrary number of queues and phase-type service times. In [29] the authors extend their previous work on fork-join queueing networks in order to include join nodes and apply an approximate analysis to study their stationary performance indices based on a decomposition technique or an iterative solution of tractable models. In [14], [15] the authors introduce the so-called FHW model consisting of only two exponential servers. They derive the stability conditions and propose an approximate analysis as well as some exact results on the conditional join-queue lengths. In [30] the authors provide the exact expression of the mean response time for the FHW model when K=2 and the service times are i.i.d. exponential random variables. They also give an approximation technique to study the models with K>2. In [31], [32] the authors study the stability conditions for a set of fork-join queueing networks. In [6] the author applies a method based on the heavy traffic assumption to characterize the total job count process as a multidimensional reflected Brownian motion in an orthant. In this way, numerical procedures to evaluate the performance indices of fork-join queueing nodes in stationarity can be formulated. Order statistics have been used to solve a class of fork-join queues with block-regular structure in [33].

The work that is probably closest to ours is [7], where the authors propose to reduce the energy consumption of a fork-join station by slowing down the servers that work on tasks with lower service demands. They devise a scheduling algorithm and prove an optimality property. However, in contrast with what we propose here, the method requires the estimation of the tasks’ service demands, which is not always possible. In [8], [9] the authors propose an approach based on order statistics that introduces deterministic delays at the servers aiming at reducing the task dispersion. The delays are determined so that the 100αth percentile of the variability of the distributions obtained once the delays are inserted is minimized. In [34] we study a method for the fair distribution of workload among a set of servers. However, the model proposed in [34] differs from the one we study here in the implementation of the rate-adaptation mechanism.

The paper is structured as follows. In Section 2 we introduce the problem that we aim to address and describe the algorithm that we propose for its solution. In Section 3 we provide an analytical model for the performance evaluation of the algorithm under the assumptions of a saturated station and independent exponential service time distributions. Section 4 studies the performance of the rate-control algorithm by using the results of the previous section and stochastic simulations. Finally, Section 5 gives some concluding remarks. The proofs of lemmas and theorems that were not included in the conference version are reported in the Appendix.

Rate-control algorithm

In this section we formally introduce the problem that we are studying and the rate-control algorithm that we propose. In the following sections we study the performance of such an algorithm in terms of throughput, system balance and energy saving.

Analytical model for the rate-control mechanism

In this section we consider the FHW model equipped with our rate-control mechanism, with i.i.d. exponentially distributed service times, immediate join, and in saturation. Let us consider the vector n = (n_1, …, n_K) of the state variables of the servers, and observe that at each time epoch we have ∑_{k=1}^{K} n_k = 0. We aim at studying the stochastic process n(t) on the state space S_K = {n = (n_1, …, n_K) : n_k ∈ ℤ, ∑_{k=1}^{K} n_k = 0}. Since the service completions are the only events that cause a state change, from the fact that they
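The excerpt is truncated here; as a hedged illustration of the dynamics it describes, the sketch below simulates jumps of a chain on S_K in which a completion at server i increases n_i by one and decreases by one the counter of the server whose neighbour is i, with the completion rate given by the speed rule of the Introduction. The rate MU and the slowdown factor R are arbitrary illustrative values, not the paper’s.

```python
import random

MU, R = 1.0, 0.5   # full service rate and slowdown factor: illustrative values

def rates(n):
    """Completion rate of each server (0-based) in state n, following the
    speed rule of the Introduction: full rate while n[i] <= 0, reduced otherwise."""
    return [MU if ni <= 0 else R * MU for ni in n]

def step(n, rng=random):
    """Simulate one jump of the saturated chain: draw the sojourn time,
    pick the completing server, and update the counters."""
    r = rates(n)
    total = sum(r)
    sojourn = rng.expovariate(total)
    u, acc, i = rng.random() * total, 0.0, 0
    while acc + r[i] < u:
        acc += r[i]
        i += 1
    m = list(n)
    m[i] += 1
    m[(i - 1) % len(n)] -= 1      # the server whose neighbour is i loses one unit
    return tuple(m), sojourn

# Example: estimate the aggregate task completion rate for K = 3 from a long sample path.
state, clock, completions = (0, 0, 0), 0.0, 0
for _ in range(100_000):
    state, dt = step(state)
    clock += dt
    completions += 1
print("estimated aggregate task completion rate:", completions / clock)
```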

Numerical evaluation

In this section we study the sensitivity of the performance indices with respect to the distribution of the service times. Then, we study the system’s performance in terms of throughput, balancing and energy consumption of the model implementing the rate-control algorithm under the assumptions introduced in Section 3. We consider four important performance indices: the system throughput, the balance index, the expected join-queue lengths and the power consumption. While for the first two
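The excerpt does not state which non-exponential service time distributions are used in the sensitivity study; as one possible assumption, the sketch below draws service times from a lognormal law with a prescribed mean and coefficient of variation, which can replace exponential draws in a discrete-event simulation of the station.

```python
import math, random

def lognormal_service(mean, cv, rng=random):
    """Draw a service time with the given mean and coefficient of variation,
    using a lognormal law (an assumption: the excerpt does not fix the family).
    For a lognormal, CV^2 = exp(sigma^2) - 1 and mean = exp(mu + sigma^2 / 2)."""
    sigma2 = math.log(1.0 + cv * cv)
    mu = math.log(mean) - 0.5 * sigma2
    return rng.lognormvariate(mu, math.sqrt(sigma2))

# Example: service times with unit mean and CV = 2, to be plugged into a
# simulation in place of exponential draws.
samples = [lognormal_service(1.0, 2.0) for _ in range(100_000)]
print("empirical mean ≈", sum(samples) / len(samples))
```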

Conclusion

In this paper we have proposed a rate-control mechanism for fork-join stations designed to keep the join-queue lengths finite in the long run, even when the station is saturated. We observed that the variance in the service time distribution causes an unbounded increase of the join-queue lengths. Informally, the idea behind our rate-control mechanism is to reduce the operating speed of the servers that have served more customers while maintaining at the maximum level the speed of the other

References (40)

  • V. Nguyen, Processing networks with parallel and sequential tasks: heavy traffic analysis and Brownian limits, Ann. Appl. Probab. (1993).
  • T. Rauber, G. Rünger, Energy-aware execution of fork-join-based task parallelism, in: Proc. of the 20th IEEE...
  • I. Tsimashenka, W. Knottenbelt, P.G. Harrison, Controlling variability in split-merge systems, in: Proc. of Analytical...
  • I. Tsimashenka et al., Controlling variability in split-merge systems and its impact on performance, Ann. Oper. Res. (2014).
  • R. Srikant et al., Communication Networks: An Optimization, Control, and Stochastic Networks Perspective (2014).
  • B.M. Elahi, C.L. Williamson, Autoscaling effects in speed scaling systems, in: 24th IEEE International Symposium on...
  • B.M. Elahi et al., Decoupled speed scaling: Analysis and evaluation, Perform. Eval. (2014).
  • L. Flatto et al., Two parallel queues created by arrivals with two demands, SIAM J. Appl. Math. (1984).
  • P.E. Wright, Two parallel processors with coupled inputs, Adv. Appl. Probab. (1992).
  • G. Latouche, Queues with paired customers, J. Appl. Probab. (1981).