Task Scheduling And Processor Assignment


An important factor in the performance of a parallel system is how the computational load is mapped onto the compute nodes in the system. Ideally, to achieve maximum parallelism, the load should be evenly distributed across the compute nodes. The problem of statically mapping the workload of a parallel algorithm onto the compute nodes of a distributed-memory system has been studied under various problem models [1, 2]. These static mapping policies, however, do not model applications consisting of a sequence of tasks (algorithms), where the output of one task becomes the input to the next task in the sequence.

Optimal use of resources is particularly important in high-performance embedded applications because of limited resources and other constraints, such as a desired latency or throughput [3]. When several parallel tasks must be executed in a pipelined fashion, a tradeoff exists between assigning compute nodes to maximize the overall throughput and assigning compute nodes to minimize the response time (or latency) of a single data set. The throughput requirement demands that, when allocating processors to tasks, all input data sets are guaranteed to be handled in a timely manner; that is, the processing rate must not fall behind the input data rate. The response-time criterion, on the other hand, requires minimizing the latency of the computation on a particular input data set.
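The throughput/latency tradeoff above can be made concrete with a small numerical sketch. The two-stage pipeline, the per-stage work values, and the linear-speedup service-time model (stage time = work / processors) below are illustrative assumptions, not the models of the cited papers: throughput is limited by the slowest (bottleneck) stage, while latency is the sum of all stage times, so the allocation that maximizes one need not minimize the other.

```python
# Hypothetical two-stage pipeline; work values and the linear-speedup
# model (time = work / processors) are illustrative assumptions only.

def stage_times(work, procs):
    """Per-stage service time under the assumed linear-speedup model."""
    return [w / p for w, p in zip(work, procs)]

def latency(work, procs):
    """Response time for one data set: the sum of all stage times."""
    return sum(stage_times(work, procs))

def throughput(work, procs):
    """Pipeline rate: the reciprocal of the slowest stage's time."""
    return 1.0 / max(stage_times(work, procs))

work = [1.0, 16.0]            # serial work per stage (arbitrary units)

throughput_alloc = [1, 9]     # 10 processors total, favoring the bottleneck
latency_alloc    = [2, 8]     # 10 processors total, minimizing the sum

# Giving 9 processors to the heavy stage yields a faster pipeline rate,
# but a worse single-data-set response time, than the [2, 8] split.
print(throughput(work, throughput_alloc), latency(work, throughput_alloc))
print(throughput(work, latency_alloc),    latency(work, latency_alloc))

# The throughput requirement: the processing rate must keep pace with
# the (assumed) input data rate.
input_rate = 0.5
assert throughput(work, throughput_alloc) >= input_rate
```

Under these assumed numbers, the bottleneck-oriented allocation sustains a higher input rate, while the other allocation returns each individual result sooner, which is exactly the tension between the two criteria described above.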

References:

[1] M. Berger and S. Bokhari, ``A Partitioning Strategy for Nonuniform Problems on Multiprocessors,'' IEEE Trans. on Computers, 36(5):570-580, May 1987.

[2] F. Berman and L. Snyder, ``On Mapping Parallel Algorithms into Parallel Architectures,'' Journal of Parallel and Distributed Computing, 4:439-458, 1987.

[3] A. Choudhary, B. Narahari, D. Nicol, and R. Simha, ``Optimal Processor Assignment for Pipeline Computations,'' IEEE Trans. on Parallel and Distributed Systems, April 1994.