The system model for the type of STAP applications considered in this work is shown in Figure 1. This model is suitable for the computational characteristics found in these applications. A pipeline is a collection of tasks that are executed sequentially. The input to the first task is normally obtained from sensors or other input devices, and the input to each remaining task is the output of the task preceding it. The set of pipelines shown in the figure indicates that the same pipeline is repeated on subsequent input data sets. Each block in a pipeline represents one parallel task, which is itself parallelized across multiple compute nodes; the number of nodes may differ from task to task.
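The pipeline model described above can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes that each task is a plain Python function running in its own thread and that tasks are connected by FIFO queues, whereas in the actual system each task is spread over a group of compute nodes. The names `run_stage`, `run_pipeline`, and the three arithmetic stages are hypothetical and chosen only for illustration.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the input stream


def run_stage(func, inbox, outbox):
    """Repeatedly take a data set from inbox, apply func, and pass the result on."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next task
            return
        outbox.put(func(item))


def run_pipeline(stage_funcs, data_sets):
    """Push every data set through all tasks in order and collect the outputs."""
    # One queue in front of each task, plus one after the last task.
    queues = [queue.Queue() for _ in range(len(stage_funcs) + 1)]
    workers = [
        threading.Thread(target=run_stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(stage_funcs)
    ]
    for w in workers:
        w.start()
    # The first task is fed from the input source (sensors, in the paper's model).
    for d in data_sets:
        queues[0].put(d)
    queues[0].put(SENTINEL)
    results = []
    while True:
        out = queues[-1].get()
        if out is SENTINEL:
            break
        results.append(out)
    for w in workers:
        w.join()
    return results


# Hypothetical three-task pipeline standing in for real STAP tasks.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
outputs = run_pipeline(stages, [1, 2, 3])
```

Because every task runs concurrently, a later task can process one data set while an earlier task already works on the next one, which is the repetition of the pipeline over subsequent data sets that the figure depicts.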
From the point of view of a single task, the execution flow consists of three phases: receive, compute, and send, as shown in Figure 1. In the receive and send phases, communication involves data transfer between two different groups of compute nodes. It also involves message packing in the send phase and unpacking in the receive phase. The data redistribution strategy plays an important role in determining communication performance. In the compute phase, the workload is evenly partitioned among all compute nodes assigned to the task to achieve maximum efficiency. On parallel systems with multiple processors in each compute node, multithreading can be employed to further improve computation performance. We discuss the implementation of multiple threads in our parallel pipeline system later in this paper.
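As an illustration of even workload partitioning and of data transfer between two groups of different sizes, the following sketch may help. It assumes a simple one-dimensional block distribution, which is an assumption made for this example and not necessarily the redistribution strategy used in this work. The function names `block_partition` and `redistribution_plan` are hypothetical.

```python
def block_partition(n_items, n_nodes):
    """Split range(n_items) into n_nodes contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(n_items, n_nodes)
    bounds = []
    start = 0
    for rank in range(n_nodes):
        # The first `extra` nodes each take one additional item.
        size = base + (1 if rank < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def redistribution_plan(n_items, n_senders, n_receivers):
    """List the (sender, receiver, start, stop) messages needed to move
    block-distributed data from a sending group to a receiving group."""
    send_blocks = block_partition(n_items, n_senders)
    recv_blocks = block_partition(n_items, n_receivers)
    plan = []
    for s, (s0, s1) in enumerate(send_blocks):
        for r, (r0, r1) in enumerate(recv_blocks):
            # A message is needed only where the two blocks overlap.
            lo, hi = max(s0, r0), min(s1, r1)
            if lo < hi:
                plan.append((s, r, lo, hi))
    return plan


# 10 work items over 4 compute nodes in the compute phase.
blocks = block_partition(10, 4)
# Messages from a 2-node task to a 3-node task holding the same 10 items.
plan = redistribution_plan(10, 2, 3)
```

Each entry in `plan` corresponds to one message that the sender would pack in its send phase and the receiver would unpack in its receive phase. Under this assumed distribution, the number and size of such messages depend on how the two group sizes relate, which is one reason the choice of redistribution strategy affects communication performance.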