Introduction to parallel computing for scientists and engineers. Shared memory parallel architectures and programming: concepts using a shared address space, locks, events, barriers, loop scheduling, compiler directives such as DOALL, and portable parallel libraries such as PTHREADS. Distributed memory message-passing parallel architectures and programming: concepts including message sends and receives, global communication primitives, single-program multiple-data (SPMD) programs, and portable message-passing programming using MPI. Data parallel architectures and programming: concepts such as array sections and array operations, data distribution and alignment, and languages such as High Performance Fortran (HPF). Parallel algorithms for engineering applications.
ECE 361 (Computer Architecture) and ECE 230 (Programming for Computer Engineers) or equivalent.
Class notes (copies of lecture transparencies) to be handed out to students.
1. V. Kumar et al., Introduction to Parallel Computing, Benjamin/Cummings, 1994.
2. I. Foster, Designing and Building Parallel Programs, Addison-Wesley, 1994.
3. B. Bauer, Practical Parallel Programming, Academic Press, 1992. Log in to "origin.cpdc.ece.nwu.edu" and type "insight" for an online version.
4. W. Gropp et al., Using MPI: Portable Parallel Programming with the Message Passing Interface, MIT Press, 1994.
5. C. Koelbel et al., The High-Performance Fortran Handbook, MIT Press, 1994.
Northwestern University Center for Parallel and Distributed Computing Lab; accounts on various supercomputer centers to be arranged.
nwu.school.meas.class.ece-358
Four homework and programming assignments (10% each) - 40% total
Two exams (30% each) - 60% total
The following is an approximate breakdown of the topics covered in this course (assuming each lecture lasts 1.5 hours).
Lecture 1: Introduction to parallel computing: motivation for parallel computing, options for parallel computing, economics of parallel computing, basic concepts of parallel algorithms.
Lecture 2: Introduction to parallel programming: data and task parallelism, coarse- and fine-grain parallelism, performance of parallel programs, load balancing and scheduling, analysis of simple parallel programs.
Lecture 3: Overview of shared memory parallel architectures: memory organization, interconnect organization, cache coherence, case studies of machines such as SGI Challenge, IBM J-30, HP/Convex Exemplar.
Lecture 4: Introduction to shared memory parallel programming: shared memory model, process creation and destruction, mutual exclusion, locks, barriers.
Lecture 5: Explicit shared memory programming: loop scheduling, static and dynamic, loop parallelization strategies.
Lecture 6: Shared memory parallel programming: use of PTHREADS libraries, case studies of explicit parallel programming, computation of PI, matrix multiplication, solution of partial differential equations.
Download PTHREAD DOCUMENTATION
Download Example PTHREAD program hello.c
Download Makefile for compiling PTHREADS Programs
Lecture 7: Implicit shared memory parallel programming: use of compiler directives for parallel programming, DOALL, DOACROSS, and PRAGMA directives for loop-level parallelism, parallel programming examples using directives.
Lecture 8: Distributed memory multicomputer architectures: overview of distributed memory parallel machines, message passing schemes, store-and-forward versus wormhole routing, interconnection networks, case studies of parallel machines such as the Intel Paragon, IBM SP-2, and Thinking Machines CM-5.
Lecture 9: Global communication operations in distributed memory machines: one-to-all broadcast, reduction, shift, scatter, and gather operations, analysis of the performance of these operations on various parallel architectures.
Lecture 10: Introduction to message-passing programming: basics of message passing, global and local addresses, single-program multiple-data (SPMD) programs, introduction to the Message Passing Interface (MPI).
Lecture 11: Intermediate concepts in message-passing programming: global and local addresses, loop scheduling for parallel loops.
Lecture 12: Advanced message-passing concepts: topologies and decompositions, case studies of example applications, matrix multiplication, solution of partial differential equations.
Lecture 13: Introduction to SIMD parallel architectures: single-instruction multiple-data stream architectures, control and data units, interconnection networks, case studies of machines such as the Thinking Machines CM-2 and CM-5 and the MasPar MP-2.
Lecture 14: Introduction to data parallel programming: Fortran-90, array sections, array operations, array intrinsic operations.
Lecture 15: Introduction to High Performance Fortran (HPF): FORALL directives, INDEPENDENT directives, simple parallel programs.
Lecture 16: High Performance Fortran: data distribution and alignment directives, simple parallel programming examples, matrix multiplication, solution of partial differential equations.
Lecture 17: Methodology for parallel algorithm design: concurrency, locality, scalability, modularity; partitioning, agglomeration, communication, mapping; performance analysis of parallel algorithms.
Lecture 18: Parallel Matrix Algorithms: matrix representations, parallel dense matrix operations, matrix-vector and matrix-matrix multiplication, solution of linear systems of equations.
Lecture 19: Parallel Sparse Matrix Solvers: sparse matrix representations, parallel iterative methods, parallel direct methods.