Activity pre-scheduling for run-time optimization of grid workflows

https://doi.org/10.1016/j.sysarc.2008.01.009

Abstract

Support for resource sharing across organizations and high performance are two noteworthy features of grid computing. Grid applications require significant design effort and complex coordination to define, deploy and execute components on heterogeneous and often unknown resources. A common trend today is the adoption of workflow management techniques, which reduce the complexity of grid systems through model-driven approaches: applications are designed by composing distributed services that often belong to different organizations. With this approach, efficient workflow enactors become key to improving efficiency through run-time optimizations. This reduces the burden on the developer, who is responsible only for the functional aspects of a complex application, that is, for identifying its activities and the causal relationships among them. This paper focuses on improving the performance of grid workflows by presenting a new workflow design pattern that enables activity pre-scheduling at run-time. The underlying technique generates fine-grained concurrency from two concepts: asynchronous invocation of services and continuation of execution. The technique is implemented in a workflow enactment service that dynamically optimizes process execution with very limited effort from the application developer.

Introduction

Grid computing is widely adopted for a variety of applications involving intensive computation and/or massive data manipulation. These applications benefit from the execution on large-scale network computing systems, typically distributed across the Internet and composed of heterogeneous resources [1], able to share computing power and data storage capacity [4]. This potential can be exploited in different research and industrial areas, such as scientific communities that can gain great benefits from collaboration among different teams. Biologists, physicists, and earth scientists use complex applications that need a massive amount of data [2] to achieve scientific application goals [19]. However, not only scientists are interested in grid computing: business processes, engineering, government and multimedia services are all examples of potential grid applications.

Most of these applications are based on the coordination and management of resources and tasks to achieve the application goal. Programming such coordination can be tedious and difficult because of the scale, variability and heterogeneity of a grid system. Workflow programming, which comes from the field of business process management (BPM), is emerging as an effective paradigm to define and manage processes in distributed environments spanning multiple organizations. The paradigm separates control logic, which defines the steps of the process to be fulfilled to reach a goal, from application logic, which manages the resources and executes the tasks on them. This separation makes it easier to coordinate resources in a grid environment, as demonstrated by the increasing research focus on this topic and by the existence of the grid workflow forum [19], an open forum about workflow in grid computing.

However, unlike BPM, grid computing is more concerned with optimizing the performance of applications executed as workflows. Optimization can act at different abstraction layers of the workflow management system (WfMS): at design-time or at run-time. Design-time optimization is performed during process definition. Run-time optimization may be further divided into three types according to the phase in which it occurs: at enactment time, when activities are scheduled for assignment to adequate resources; at binding time, when resources are selected and activities are assigned to them for execution; and at execution time, when the resources are engaged in executing the scheduled activities [19].

While much research in grid computing has focused on optimizing resource selection and local scheduling triggered by the availability of the selected resources, little or no effort has been devoted to optimization at enactment time. At this level, the sequential constraints between workflow activities can be relaxed to increase concurrency and, consequently, the degree of parallelism when the workflow is executed in the grid.

Even though human intervention in the process definition can help to increase concurrency at design-time, automatic optimization is desirable because it keeps the process description simple. Such optimization should be driven by declarative labels that designers provide during the inception and design phases, thus limiting the additional effort. To accomplish this objective, we propose a declarative mechanism that lets designers specify that the execution of some workflow tasks may be anticipated with respect to the causal order defined by a regular transition. The actual task-scheduling choices are thereby moved to run-time, where they can improve workflow concurrency. The mechanism relies on two programming concepts that the execution environment must support: anticipation and continuation. They could be implemented directly in the workflow engine or provided by a grid middleware exploited by the engine.
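
As a rough illustration of anticipation and continuation, the following Python sketch enacts a two-activity workflow in which each activity carries a designer-supplied "early" flag. It is not the enactment service described in the paper: the names `enact`, `resolve`, `produce`, `consume` and `timed`, the use of a local thread pool in place of grid services, and the sleep durations are all assumptions made for this example. An early activity is scheduled immediately and receives its predecessor's future as a symbolic reference, blocking only at the point where it needs the actual value.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def resolve(value):
    """Continuation point: block only when the real value is needed."""
    return value.result() if isinstance(value, Future) else value


def produce(_upstream):
    time.sleep(0.2)  # stands in for a remote service invocation (assumed)
    return 21


def consume(upstream):
    time.sleep(0.2)  # preparatory work that does not need the upstream result
    return resolve(upstream) * 2


def enact(activities, pool):
    """Enact a linear workflow given as (callable, early) pairs."""
    previous = None
    for run, early in activities:
        if early:
            # Anticipation: schedule now and hand over the predecessor's
            # future as a symbolic reference to its pending result.
            previous = pool.submit(run, previous)
        else:
            # Regular transition: wait for the predecessor to complete.
            previous = pool.submit(run, resolve(previous))
    return previous.result()


def timed(activities):
    """Run a workflow on a fresh pool; return (result, elapsed seconds)."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        start = time.perf_counter()
        result = enact(activities, pool)
        return result, time.perf_counter() - start


if __name__ == "__main__":
    print(timed([(produce, False), (consume, False)]))  # regular transition
    print(timed([(produce, False), (consume, True)]))   # consume labelled early
```

In this sketch the only difference between the two runs is the flag on `consume`. Both produce the same result, but the labelled run overlaps the preparatory part of `consume` with `produce`, which is the kind of concurrency the declarative label is meant to expose without changing the activity code.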

This paper discusses the adoption of such a mechanism, presents the architecture of a WfMS that supports these features and analyzes the consequent impact on computation and data transfer for some micro-benchmarks and typical grid workflows. To this end, the paper is organized as follows. In Section 2, a reference architecture of a WfMS is presented with the aim of identifying the aspects of a workflow that can be optimized. Section 3 discusses fine-grained concurrency, the way to achieve it in workflows, the language adopted to describe workflows and activity anticipation, and its impact on the implementation of a prototypical workflow engine. Section 4 evaluates experimental results and provides a theoretical interpretation of the observed performance. Section 5 analyzes related work in grid workflow management. Section 6 concludes the paper and outlines future activities.

Background

According to the reference model proposed by the Workflow Management Coalition (WfMC) [17], the operations of a WfMS are divided into two main areas: build-time and run-time. The former concerns the design and definition of the process and the abstract definition of the tasks. The latter concerns the enactment of the activities, the selection of the most suitable resource for each activity, and the execution and monitoring of the process. This feature usefully separates the conceptual problem of the

Workflow fine-grained concurrency

Considering the potential optimizations along the scheduling dimension, this section introduces the notion of fine-grained concurrency in workflow execution, the advantages of implementing it in workflow enactment services, and its overall impact on a workflow management system.

Performance analysis

To characterize the potential performance gain both quantitatively and qualitatively, we conducted some experiments. First, we aimed at identifying a theoretical performance framework; then we observed the behaviour of real and more complex applications.

Related work

Bonita [15] is one of the first examples of a design-time approach to implementing activity anticipation. It is a cooperative workflow system that enacts processes managing human collaboration. Bonita can execute processes in two ways: traditional or flexible. The former executes the process according to the WfMC reference model definition [17], whereas the flexible execution is able to perform activity anticipation by specifying this through a boolean attribute of the

Conclusions and future work

This paper addressed the problem of improving workflow process scheduling to reach high performance in grid environments at a low cost for workflow designers. The proposed system implements techniques derived from concurrent languages: asynchronous calls and symbolic references. In particular, we defined a new workflow pattern, Early Start, that allows the designer to label a set of activities as “early”, without the need to further analyse the implementation, easing design and simplifying

Acknowledgements

The work described in this paper is framed within the CoreGRID Network of Excellence funded by the European Commission and the ArtDeco Project funded by the Italian Ministry of Research (MIUR). We also thank Ester Giallonardo and Franca Perrina for their support in the implementation of the system and for the fruitful discussions on the topic.

Giancarlo Tretola graduated in Computer Engineering in 2003. He received a Ph.D. in Computer Engineering from the Department of Engineering of the University of Sannio in 2007. His current research interests include Service Oriented Computing, Workflow Management, and Grid Computing.

References (30)

  • K. Krauter et al., A taxonomy and survey of grid resource management systems for distributed computing, Software: Practice and Experience (SPE) (2002)
  • J. Yu et al., A taxonomy of workflow management systems for grid computing, Journal of Grid Computing (2005)
  • R. Buyya, M. Murshed, D. Abramson, A deadline and budget constrained cost-time optimization algorithm for scheduling...
  • I.T. Foster, C. Kesselman, Steven Tuecke, The anatomy of the grid – enabling scalable virtual organizations, in:...
  • Qifeng Huang, Yan Huang, Workflow engine with multi-level parallelism supports, in: Fourth All Hands Meeting, 19–22nd...
  • Dragos-Anton Manolescu, Workflow enactment with continuation and future objects, in: OOPSLA 2002, 2002, pp....
  • N. Ranaldo, E. Zimeo, An Economy-driven mapping heuristic for hierarchical master-slave applications in grid systems,...
  • R. Kowalski, Algorithm = logic + control, Communications of the ACM (1979)
  • M. Snir et al. (1998)
  • Laurent Baduel, Francois Baude, Denis Caromel, Object-Oriented SPMD, INRIA, CNRS, University of Nice Sophia-Antipolis,...
  • G. Tretola, E. Zimeo, Workflow fine-grained concurrency with automatic continuations, in: Proceedings of the IEEE IPDPS...
  • G. Tretola, E. Zimeo, Activity pre-scheduling in grid workflow, in: Proceedings of the 15th Euromicro International...
  • G. Tretola, E. Zimeo, Client-side implementation of dynamic asynchronous invocations for web services, in: Proceedings...
  • G. Tretola, E. Zimeo, Extending web services semantics to support asynchronous invocations and continuation, in:...
  • Bonita, Workflow Cooperative System....
Eugenio Zimeo graduated in Electronic Engineering at the University of Salerno (Italy) and received a Ph.D. in Computer Science from the University of Naples in 1999. He is currently an assistant professor at the University of Sannio in Benevento (Italy), where he teaches courses on Computer Networks, Web Technology and Distributed Systems. His primary research interests are in the areas of software architectures and frameworks for distributed systems, high performance middleware, service oriented computing and grid computing, wireless sensor networks and mobile computing. He has published about 60 scientific papers in journals and conferences of the field and leads many large research projects. He is a member of the IEEE Computer Society.