Comparison of MPI, OpenMP, and Stream Processing

MPI

MPI (Message Passing Interface) is a language-independent communications protocol used to program parallel computers. It supports both point-to-point and collective communication. MPI "is a message-passing application programmer interface, together with protocol and semantic specifications for how its features must behave in any implementation." [Gropp et al. 96, p.3] MPI is therefore a specification, not an implementation.
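As a minimal sketch of the two communication styles named above, assume an MPI implementation is installed, the program is built with its compiler wrapper (commonly mpicc), and it is launched with several processes (for example, mpirun -np 4 ./a.out). Rank 0 sends one integer to rank 1 (point-to-point), then every rank contributes its rank number to a sum collected on rank 0 (collective). The payload value and the tag are arbitrary choices made for this illustration.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* id of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

    /* Point-to-point: rank 0 sends one int to rank 1.
     * Skipped when the program is started with a single process. */
    const int tag = 0;  /* arbitrary tag chosen for this example */
    if (size >= 2) {
        if (rank == 0) {
            int payload = 42;  /* arbitrary value chosen for this example */
            MPI_Send(&payload, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int received;
            MPI_Recv(&received, 1, MPI_INT, 0, tag, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d from rank 0\n", received);
        }
    }

    /* Collective: every rank takes part in the call;
     * the sum of all rank numbers is delivered to rank 0. */
    int total = 0;
    MPI_Reduce(&rank, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum of ranks 0..%d = %d\n", size - 1, total);

    MPI_Finalize();
    return 0;
}
```

Every process runs the same program; the rank returned by MPI_Comm_rank is what lets each one take a different role.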

MPI is not sanctioned by any major standards body; nevertheless, it has become the de facto standard for communication among processes that model a parallel program running on a distributed memory system. Such programs are often run on actual distributed memory supercomputers such as computer clusters. The principal MPI-1 model has no shared memory concept, and MPI-2 has only a limited distributed shared memory concept. Nonetheless, MPI programs are regularly run on shared memory computers.

Designing programs around the MPI model (as opposed to explicit shared memory models) has advantages on NUMA architectures, because programming for MPI encourages memory locality.

Most MPI implementations consist of a specific set of routines (API) callable from Fortran, C, or C++ and from any language capable of interfacing with such routine libraries. The advantages of MPI over older message passing libraries are portability (because MPI has been implemented for almost every distributed memory architecture) and speed (because each implementation is in principle optimized for the hardware on which it runs).

MPI is often compared with PVM, which is a popular distributed environment and message passing system developed in 1989, and which was one of the systems that motivated the need for standard parallel message passing systems.

Threaded shared memory programming models (such as Pthreads and OpenMP) and message passing programming (MPI/PVM) can be considered as complementary programming approaches.

OpenMP

OpenMP is an implementation of "multithreading", a method of parallelization whereby the master "thread" (a series of instructions executed consecutively) "forks" a specified number of slave "threads" and a task is divided among them. The threads then run concurrently, with the runtime environment allocating threads to different processors.

The section of code that is meant to run in parallel is marked with a compiler directive (a #pragma in C/C++) that causes the threads to form before the section is executed. Each thread has an id, which can be obtained by calling a function (omp_get_thread_num() in C/C++ and OMP_GET_THREAD_NUM() in Fortran). The thread id is an integer, and the master thread has an id of 0. After the parallelized code has executed, the threads join back into the master thread, which continues onward to the end of the program. The number of threads can be determined either statically (by environment variables) or dynamically (by a function call).
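The fork-join behaviour described above can be sketched in C as follows. This is an illustration, not a canonical pattern: the function name fork_join_demo and its parameters are hypothetical. The code assumes a compiler with OpenMP enabled (with GCC or Clang, the -fopenmp flag); without it the pragma is ignored and the guarded fallback runs the region once on a single thread. The team size can be set with the OMP_NUM_THREADS environment variable or by calling omp_set_num_threads() before the region.

```c
#ifdef _OPENMP
#include <omp.h>   /* only present when OpenMP is enabled */
#endif

/* Fork a team of threads; each thread stores its own id in seen[id].
 * Returns the number of threads in the team (1 without OpenMP). */
int fork_join_demo(int *seen, int capacity)
{
    int team_size = 1;

    #pragma omp parallel
    {
#ifdef _OPENMP
        int id = omp_get_thread_num();   /* 0 for the master thread */
        int n  = omp_get_num_threads();  /* size of the current team */
#else
        int id = 0, n = 1;               /* serial fallback: master only */
#endif
        if (id < capacity)
            seen[id] = id;               /* each thread writes only its own slot */
        if (id == 0)
            team_size = n;               /* only the master writes this variable */
    }   /* implicit join: only the master thread continues past here */

    return team_size;
}
```

Calling fork_join_demo(seen, 256) on an array of 256 ints fills one slot per thread and returns the team size, so the caller can see how many threads actually took part.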

By default, each thread executes the parallelized section of code independently. Work-sharing constructs can be used to divide a task among the threads so that each thread executes its allocated part of the code. Both task parallelism and data parallelism can be achieved using OpenMP in this way.
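As a small data-parallel sketch of a work-sharing construct, the loop below is divided among the threads with the combined parallel for directive. The function name sum_of_squares is hypothetical and chosen only for this example. Compiled without OpenMP support, the pragma is ignored and the same source runs serially with the same result.

```c
/* Returns 1^2 + 2^2 + ... + n^2, with the loop iterations shared
 * among the threads of the team. */
long sum_of_squares(int n)
{
    long sum = 0;

    /* "for" is the work-sharing construct: the iterations are split
     * among the threads. reduction(+:sum) gives every thread a private
     * partial sum and adds the partial sums together at the end, which
     * avoids a data race on sum. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= n; i++)
        sum += (long)i * i;

    return sum;
}
```

Without the reduction clause, all threads would update the shared variable sum concurrently and the result would be unreliable.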

Stream Processing

Stream processing is a parallel programming paradigm that imposes many additional restrictions on programs, which in turn streamline the hardware. The term comes from the concept of streaming data in and out of an execution core without utilizing inter-thread communication, scattered (i.e., random) writes or even reads, or local memory. Branching is also often disallowed or limited (hence, streaming is strongly related to SIMD). The paradigm best describes real-time audio/video processing and characterizes early GPU as well as many DSP efforts. Modern (DX10) GPUs, however, remove many of these limitations and are essentially multithreaded, although they still retain many peculiarities compared to ordinary multi-core CPUs. (Nevertheless, marketing continues to erroneously characterize modern GPGPU programming as 'stream processing.')
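The restrictions can be illustrated in plain C, as a sketch of the programming style rather than of any real stream hardware or API. The names gain_and_clip and stream_map, and the gain of 2, are hypothetical choices for this example. The kernel sees one input element and produces one output element; it does not communicate with other elements, and element i of the output depends only on element i of the input.

```c
#include <stddef.h>

/* Kernel: takes exactly one input sample and returns one output sample.
 * It has no access to neighbouring samples or to shared state. */
static float gain_and_clip(float sample)
{
    float y = 2.0f * sample;        /* fixed gain, chosen for illustration */
    y = (y >  1.0f) ?  1.0f : y;    /* clamp to the range [-1, 1] */
    y = (y < -1.0f) ? -1.0f : y;
    return y;
}

/* Streams the input through the kernel. Reads and writes are strictly
 * sequential: out[i] is computed from in[i] and nothing else. */
void stream_map(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = gain_and_clip(in[i]);
}
```

Because no iteration depends on any other, the elements could be processed in any order or all at once, which is the property that simplified stream hardware relies on.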

The stream processing paradigm, in its pure form, is highly efficient. Algorithms that don't require the missing features can be written quickly and run on optimized hardware. The runtime can also automate certain tasks, such as DMA management, thread launching, and resource management. The hardware is drastically simplified and hence can be made much more powerful for the same die area. Stream processing is the means by which specialized audio and video chips were able to process vast amounts of data in real time on workstations and personal computers long before general central processing units could handle the feat.

Pros and Cons of MPI

* Pros of MPI
**does not require shared memory architectures, which are more expensive than distributed memory architectures
**can be used on a wider range of problems, since it exploits both task parallelism and data parallelism
**highly portable, with implementations optimized for most hardware

* Cons of MPI
**requires more programming changes to go from serial to parallel version
**can be harder to debug
**performance is limited by the communication network between the nodes
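One example of the extra programming work listed above is that a message-passing program usually decides by hand which part of the data each process owns. The sketch below shows one common way to do that index arithmetic. It contains no MPI calls and compiles anywhere: the function name block_range is hypothetical, and its rank and size parameters stand in for the values a real program would obtain from MPI_Comm_rank and MPI_Comm_size.

```c
/* Computes the half-open range [*lo, *hi) of the n items owned by
 * process `rank` out of `size` processes. Items are split into
 * contiguous blocks; the first (n % size) ranks get one extra item. */
void block_range(int n, int rank, int size, int *lo, int *hi)
{
    int base  = n / size;   /* items every rank gets */
    int extra = n % size;   /* leftover items, one each to the lowest ranks */

    *lo = rank * base + (rank < extra ? rank : extra);
    *hi = *lo + base + (rank < extra ? 1 : 0);
}
```

For 10 items on 4 processes this assigns the ranges [0, 3), [3, 6), [6, 8) and [8, 10) to ranks 0 through 3.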

Pros and Cons of OpenMP

*Pros
**easier to program and debug (compared to MPI)
**data layout and decomposition are handled automatically by directives
**gradual parallelism: directives can be added incrementally, so the program can be parallelized one portion at a time without dramatic changes to the code
**unified code for both serial and parallel applications: OpenMP constructs are treated as comments (in Fortran) or as ignored pragmas (in C/C++) when a compiler without OpenMP support is used
**original (serial) code statements need not, in general, be modified when parallelized with OpenMP, which reduces the chance of inadvertently introducing bugs and helps maintenance
**both coarse-grained and fine-grained parallelism are possible

*Cons
**currently only runs efficiently on shared-memory multiprocessor platforms
**requires a compiler that supports OpenMP
**scalability is limited by memory architecture
**reliable error handling is missing
**lacks fine-grained mechanisms to control thread-processor mapping
**synchronization between subsets of threads is not allowed
**mostly used for loop parallelization

References

* Hillis, W. Daniel and Steele, Guy L. Data Parallel Algorithms. Communications of the ACM, December 1986.
* Blelloch, Guy E. Vector Models for Data-Parallel Computing. MIT Press, 1990. ISBN 0-262-02313-X
* Pros and Cons of MPI and OpenMP. http://www.dartmouth.edu/~rc/classes/intro_mpi/parallel_prog_compare.html

See also

*SIMD
*Data parallelism

Wikimedia Foundation. 2010.
