GPGPU

General-purpose computing on graphics processing units (GPGPU, also referred to as GPGP and to a lesser extent GP²) is the technique of using a GPU, which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the CPU. It is made possible by the addition of programmable stages and higher precision arithmetic to the rendering pipelines, which allows software developers to use stream processing on non-graphics data.

GPU improvements

GPU functionality has, traditionally, been very limited. In fact, for many years the GPU was only used to accelerate certain parts of the graphics pipeline. Some improvements were needed before GPGPU became feasible.

Programmability

Programmable vertex and fragment shaders were added to the graphics pipeline to enable game programmers to generate even more realistic effects. Vertex shaders allow the programmer to alter per-vertex attributes, such as position, color, texture coordinates, and normal vector. Fragment shaders are used to calculate the color of a fragment, i.e. the per-pixel result. Programmable fragment shaders allow the programmer to substitute, for example, a lighting model other than those provided by default by the graphics card, typically simple Gouraud shading. Shaders have enabled graphics programmers to create lens effects, displacement mapping, and depth of field.

The programmability of the pipelines has tracked Microsoft's DirectX specification, with DirectX 8 introducing Shader Model 1.1, DirectX 8.1 introducing Pixel Shader Models 1.2, 1.3 and 1.4, and DirectX 9 defining Shader Model 2.x and 3.0. Each shader model increased the flexibility and capability of the programming model, ensuring that conforming hardware followed suit. The DirectX 10 specification introduces Shader Model 4.0, which unifies the programming specification for vertex, geometry ("geometry shaders" are new to DirectX 10) and fragment processing, allowing a better fit for unified shader hardware and thus providing a single pool of programmable computational resources.

Data types

Pre-DirectX 9 graphics cards only supported paletted or integral color types. Various formats are available, each containing a red element, a green element, and a blue element. Sometimes an additional alpha value is added, to be used for transparency. Common formats are:
*8 bits per pixel – Palette mode, where each value is an index into a table with the real color value specified in one of the other formats; alternatively, 2 bits for red, 3 bits for green, and 3 bits for blue.
*16 bits per pixel – Usually allocated as 5 bits for red, 6 bits for green, and 5 bits for blue.
*24 bits per pixel – 8 bits for each of red, green, and blue.
*32 bits per pixel – 8 bits for each of red, green, blue, and alpha (a small packing sketch follows this list).
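
To make the bit layouts above concrete, here is a minimal sketch in plain C (the helper name pack_rgba8 is made up, and the byte order shown is just one of several conventions used by real formats):

/* Sketch: packing an 8-bit-per-channel RGBA color into one 32-bit word.
   The channel order used here (alpha in the high byte) is an assumption;
   real formats differ between APIs and platforms. */
#include <stdint.h>

uint32_t pack_rgba8(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  |  (uint32_t)b;
}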

For early fixed-function or limited-programmability graphics (i.e. up to and including DirectX 8.1 compliant GPUs) this was sufficient because it is also the representation used in displays. This representation does have certain limitations, however. Given sufficient graphics processing power, even graphics programmers would like to use better formats, such as floating point data formats, in order to obtain effects such as high dynamic range imaging. Many GPGPU applications require floating point accuracy, which came with graphics cards conforming to the DirectX 9 specification.

DirectX 9 Shader Model 2.x suggested the support of two precision types: full and partial precision. Full precision support could be either FP32 (32-bit floating point per component) or FP24 (24-bit floating point per component) or greater, while partial precision was FP16. ATI's R300 series of GPUs supported FP24 precision only in the programmable fragment pipeline (although FP32 was supported in the vertex processors), while Nvidia's NV30 series supported both FP16 and FP32; other vendors such as S3 Graphics and XGI supported a mixture of formats up to FP24.

Shader Model 3.0 altered the specification, increasing full precision requirements to a minimum of FP32 support in the fragment pipeline. ATI's Shader Model 3.0 compliant R5xx generation (Radeon X1000 series) supports just FP32 throughout the pipeline, while Nvidia's NV4x and G7x series continued to support both FP32 full precision and FP16 partial precision. Although not stipulated by Shader Model 3.0, both ATI's and Nvidia's Shader Model 3.0 GPUs introduced support for blendable FP16 render targets, making it easier to support high dynamic range rendering.

The implementations of floating point on Nvidia GPUs are mostly IEEE compliant; however, this is not true across all vendors. [Mark Harris. "Mapping computational concepts to GPUs." In ACM SIGGRAPH 2005 Courses (Los Angeles, California, July 31 – August 4, 2005). J. Fujii, Ed. SIGGRAPH '05. ACM Press, New York, NY, 50. http://doi.acm.org/10.1145/1198555.1198768] This has implications for correctness which are considered important to some scientific applications. While 64-bit floating point values (double precision floats) are commonly available on CPUs, these are not universally supported on GPUs; some GPU architectures sacrifice IEEE compliance while others lack double precision altogether. There have been efforts to emulate double-precision floating point values on GPUs; however, the speed tradeoff negates any benefit of offloading the computation onto the GPU in the first place. [Dominik Goddeke, Robert Strzodka, and Stefan Turek. "Accelerating Double Precision (FEM) Simulations with (GPUs)." Proceedings of ASIM 2005 – 18th Symposium on Simulation Technique, 2005. http://numod.ins.uni-bonn.de/research/papers/public/GoStTu05double.pdf]

Most operations on the GPU operate in a vectorized fashion: a single operation can be performed on up to four values at once. For instance, if one color (r1, g1, b1) is to be modulated by another color (r2, g2, b2), the GPU can produce the resulting color (r1r2, g1g2, b1b2) in a single operation. This functionality is useful in graphics because almost every basic data type is a vector (either 2-, 3-, or 4-dimensional). Examples include vertices, colors, normal vectors, and texture coordinates. Many other applications can put this to good use, and because of their higher performance, vector instructions (SIMD) have long been available on CPUs as well.
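
As a sketch of what such a four-wide operation looks like in CUDA C terms (the function name modulate is illustrative, not a standard API), using the built-in float4 type:

/* Sketch: component-wise modulation of two colors, four channels per call. */
__device__ float4 modulate(float4 c1, float4 c2)
{
    return make_float4(c1.x * c2.x, c1.y * c2.y, c1.z * c2.z, c1.w * c2.w);
}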

In November 2006 Nvidia launched CUDA, an SDK and API that allows a programmer to use the C programming language to code algorithms for execution on GeForce 8 series GPUs. AMD offers a similar SDK and technology for its ATI-based GPUs, called CTM (Close to Metal), designed to compete directly with Nvidia's CUDA; CTM provides a thin hardware interface. AMD has also announced the AMD Stream Processor product line (combining CPU and GPU technology on one chip). Compared, for example, to traditional floating point accelerators such as the 64-bit CSX600 boards from ClearSpeed that are used in today's supercomputers, current top-end GPUs from Nvidia and AMD emphasize single-precision (32-bit) computation; double-precision (64-bit) computation executes much slower.
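
The following is a minimal sketch of the CUDA programming pattern just described, with hypothetical names (scale_kernel, run_on_gpu) rather than anything taken from Nvidia's samples: allocate device memory, copy the input over, launch a kernel written in C, and copy the result back.

/* Sketch of the host/device round trip in CUDA C. */
#include <cuda_runtime.h>

__global__ void scale_kernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   /* one thread per element */
    if (i < n)
        out[i] = 2.0f * in[i];
}

void run_on_gpu(const float *host_in, float *host_out, int n)
{
    float *d_in, *d_out;
    cudaMalloc((void **)&d_in,  n * sizeof(float));
    cudaMalloc((void **)&d_out, n * sizeof(float));
    cudaMemcpy(d_in, host_in, n * sizeof(float), cudaMemcpyHostToDevice);
    scale_kernel<<<(n + 255) / 256, 256>>>(d_in, d_out, n);   /* enough 256-thread blocks to cover n */
    cudaMemcpy(host_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
}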

GPGPU programming concepts

GPUs are designed specifically for graphics and are therefore very restrictive in terms of operations and programming. Because of their nature, GPUs are only effective at tackling problems that can be solved using stream processing, and the hardware can only be used in certain ways.

Stream processing

GPUs can only process independent vertices and fragments, but can process many of them in parallel. This is especially effective when the programmer wants to process many vertices or fragments in the same way. In this sense, GPUs are stream processors – processors that can operate in parallel by running a single kernel on many records in a stream at once.

A stream is simply a set of records that require similar computation. Streams provide data parallelism. Kernels are the functions that are applied to each element in the stream. In GPUs, vertices and fragments are the elements in streams, and vertex and fragment shaders are the kernels run on them. Since GPUs process elements independently, there is no way to have shared or static data. For each element we can only read from the input, perform operations on it, and write to the output. It is permissible to have multiple inputs and multiple outputs, but never a piece of memory that is both readable and writable.

Arithmetic intensity is defined as the number of arithmetic operations performed per word of memory transferred. It is important for GPGPU applications to have high arithmetic intensity, or else memory access latency will limit computational speed. For example, adding two large vectors performs roughly one operation per three words transferred, whereas multiplying two N×N matrices performs on the order of N operations per word, making the latter a far better fit for the GPU.

Ideal GPGPU applications have large data sets, high parallelism, and minimal dependency between data elements.

GPU programming concepts

Computational resources

There are a variety of computational resources available on the GPU:
*Programmable processors – vertex, primitive, and fragment pipelines allow the programmer to run kernels on streams of data
*Rasterizer – creates fragments and interpolates per-vertex constants such as texture coordinates and color
*Texture unit – read-only memory interface
*Framebuffer – write-only memory interface

In fact, the programmer can substitute a write-only texture for the framebuffer as the output. This is accomplished either through Render-To-Texture (RTT), Render-To-Backbuffer-Copy-To-Texture (RTBCTT), or the more recent stream-out.

Textures as streams

The most common form for a stream to take in GPGPU is a 2D grid because this fits naturally with the rendering model built into GPUs. Many computations naturally map into grids: matrix algebra, image processing, physically based simulation, and so on.

Since textures are used as memory, texture lookups are then used as memory reads. Certain operations, such as interpolation between neighboring values, can be done automatically by the GPU because of this.
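
A sketch of this idea using CUDA's older texture-reference style (the names input_tex and sample_grid are illustrative, and the host-side code that binds the texture is omitted); the texture fetch plays the role of the memory read, and with linear filtering enabled the hardware can interpolate between neighboring texels automatically:

/* Sketch: the input stream lives in a 2D texture; tex2D is the memory read. */
texture<float, 2, cudaReadModeElementType> input_tex;   /* bound on the host side */

__global__ void sample_grid(float *out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        float value = tex2D(input_tex, x + 0.5f, y + 0.5f);   /* fetch the cell's texel */
        out[y * width + x] = 2.0f * value;                    /* placeholder per-cell computation */
    }
}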

Kernels

Kernels can be thought of as the body of loops. For example, if the programmer were operating on a grid on the CPU he might have code that looked like this:

/* Pseudocode */

width = 1e8
height = 1e8
make array of size width by height

for each x in 0 .. width - 1 {      // this loop runs 1e8 times
    for each y in 0 .. height - 1 { // this loop runs 1e8 times
        do_some_hard_work(x, y)     // executed 1e16 (10 000 000 000 000 000) times in total
    }
}

On the GPU, the programmer only specifies the body of the loop as the kernel and what data to loop over by invoking geometry processing.
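
Expressed in CUDA terms rather than through geometry processing, a rough analogue of the loop above might look like the following sketch, where only the loop body is written and the grid of GPU threads replaces the two loops (do_some_hard_work here is just a placeholder):

/* Sketch: the kernel is the loop body; the thread grid supplies x and y. */
__device__ float do_some_hard_work(int x, int y)
{
    return (float)x * 0.001f + (float)y;   /* placeholder per-cell computation */
}

__global__ void hard_work_kernel(float *out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;   /* column handled by this thread */
    int y = blockIdx.y * blockDim.y + threadIdx.y;   /* row handled by this thread */
    if (x < width && y < height)
        out[y * width + x] = do_some_hard_work(x, y);
}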

Flow control

In regular programs it is possible to control the flow of the program using if-then-else statements and various forms of loops. Such flow control structures were only recently added to GPUs. Conditional writes could be accomplished using a series of simpler arithmetic instructions (for example, by computing both possible results and then selecting between them), but looping and conditional branching were not possible.

Recent GPUs allow branching, but usually with a performance penalty. Branching should generally be avoided in inner loops, whether in CPU or GPU code, and various techniques, such as static branch resolution, pre-computation, and Z-cull [John D. Owens, David Luebke, Naga Govindaraju, Mark Harris, Jens Krüger, Aaron E. Lefohn, and Tim Purcell. "A Survey of General-Purpose Computation on Graphics Hardware." Computer Graphics Forum, volume 26, number 1, 2007, pp. 80–113. http://graphics.idav.ucdavis.edu/publications/print_pub?pub_id=907], can be used to achieve branching when hardware support does not exist.
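
As an illustrative sketch (not taken from any vendor guide) of removing a data-dependent branch, the same clamping operation can be written with and without a branch; the branchless form trades the conditional for arithmetic:

/* Sketch: a data-dependent branch that can cause divergence within a warp. */
__global__ void clamp_negative(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        if (data[i] < 0.0f)          /* threads taking different paths slow each other down */
            data[i] = 0.0f;
    }
}

/* The same effect written without the data-dependent branch. */
__global__ void clamp_negative_branchless(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = fmaxf(data[i], 0.0f);   /* arithmetic select instead of a branch */
}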

GPU techniques

Map

The map operation simply applies the given function (the kernel) to every element in the stream. A simple example is multiplying each value in the stream by a constant (increasing the brightness of an image). The map operation is simple to implement on the GPU. The programmer generates a fragment for each pixel on screen and applies a fragment program to each one. The result stream of the same size is stored in the output buffer.
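
A map of this kind might be sketched in CUDA as follows (kernel name and parameters are illustrative):

/* Sketch of a map: the same kernel is applied independently to every element,
   here scaling each value by a constant to brighten an image. */
__global__ void brighten(const float *in, float *out, float gain, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * gain;
}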

Reduce

Some computations require calculating a smaller stream (possibly a stream of only 1 element) from a larger stream. This is called a reduction of the stream. Generally a reduction can be accomplished in multiple steps. The results from the previous step are used as the input for the current step and the range over which the operation is applied is reduced until only one stream element remains.
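
A sketch of such a multi-step reduction in CUDA (illustrative names, and the element count is assumed to be a power of two): each pass adds pairs of elements, halving the stream, until one element remains.

/* Sketch: one pass of a pairwise sum reduction; in has 2 * n_out elements. */
__global__ void reduce_pass(const float *in, float *out, int n_out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_out)
        out[i] = in[2 * i] + in[2 * i + 1];
}

/* Host side (illustrative): while n > 1, launch reduce_pass with n / 2 threads,
   then swap the input and output buffers and set n = n / 2. */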

Stream filtering

Stream filtering is essentially a non-uniform reduction. Filtering involves removing items from the stream based on some criteria.

Scatter

The scatter operation is most naturally defined on the vertex processor. The vertex processor is able to adjust the position of the vertex, which allows the programmer to control where information is deposited on the grid. Other extensions are also possible, such as controlling how large an area the vertex affects.

The fragment processor cannot perform a direct scatter operation because the location of each fragment on the grid is fixed at the time of the fragment's creation and cannot be altered by the programmer. However, a logical scatter operation may sometimes be recast or implemented with an additional gather step. A scatter implementation would first emit both an output value and an output address. An immediately following gather operation uses address comparisons to see whether the output value maps to the current output slot.
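
On hardware and APIs that do permit arbitrary writes, such as CUDA-capable GPUs, a scatter is simply an indexed write; the following sketch (illustrative names, and it assumes the addresses are unique and in range) shows the operation that the gather-based reformulation above has to emulate:

/* Sketch of a scatter: each value is written to the slot its address names. */
__global__ void scatter(const float *val, const int *addr, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[addr[i]] = val[i];   /* assumes addr[i] values are unique and in range */
}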

Gather

The fragment processor is able to read textures in a random-access fashion, so it can gather information from any grid cell, or multiple grid cells, as desired.
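
A gather can be sketched in CUDA as an indirect read (illustrative names); each output element pulls its value from wherever its index points:

/* Sketch of a gather: the compute analogue of a dependent texture lookup. */
__global__ void gather(const float *in, const int *index, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[index[i]];   /* index[i] may point anywhere in the input */
}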

Sort

The sort operation transforms an unordered set of elements into an ordered set of elements. The most common implementations on GPUs use sorting networks.
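
As a sketch of the sorting-network approach, one compare-exchange stage of a bitonic sort can be written as a kernel, with the host launching it once per stage (names are illustrative, and the array length is assumed to be a power of two):

/* Sketch: one compare-exchange stage of a bitonic sorting network. */
__global__ void bitonic_step(float *data, int j, int k)
{
    unsigned int i   = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int ixj = i ^ j;                 /* partner element for this stage */
    if (ixj > i) {
        int ascending = ((i & k) == 0);       /* direction of this compare-exchange */
        if (( ascending && data[i] > data[ixj]) ||
            (!ascending && data[i] < data[ixj])) {
            float tmp = data[i];
            data[i]   = data[ixj];
            data[ixj] = tmp;
        }
    }
}

/* Host side (illustrative, n a power of two, one thread per element):
   for (k = 2; k <= n; k <<= 1)
       for (j = k >> 1; j > 0; j >>= 1)
           bitonic_step<<<n / 256, 256>>>(d_data, j, k);                      */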

Search

The search operation allows the programmer to find a particular element within the stream, or possibly find neighbors of a specified element. The GPU is not used to speed up the search for an individual element, but instead is used to run multiple searches in parallel.
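
A sketch of this pattern in CUDA (illustrative names): each thread runs an ordinary binary search over a sorted array for its own query, so many searches proceed in parallel.

/* Sketch: one binary search per thread over a shared sorted array. */
__global__ void batch_search(const float *sorted, int n,
                             const float *queries, int *result, int n_queries)
{
    int q = blockIdx.x * blockDim.x + threadIdx.x;
    if (q >= n_queries)
        return;
    int lo = 0, hi = n - 1, found = -1;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        if (sorted[mid] == queries[q]) { found = mid; break; }
        if (sorted[mid] < queries[q]) lo = mid + 1; else hi = mid - 1;
    }
    result[q] = found;   /* index of the query value, or -1 if absent */
}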

Data structures

A variety of data structures can be represented on the GPU:
*Dense arrays
*Sparse arrays – static or dynamic
*Adaptive structures

Applications

The following are some of the areas where GPUs have been used for general purpose computing:
*Computer clusters, or a variation of parallel computing (utilizing GPU cluster technology), for highly calculation-intensive tasks:
**High-performance clusters (HPC) (often referred to as supercomputers)
***including cluster technologies like Message Passing Interface, and single-system image (SSI), distributed computing, and Beowulf
**Grid computing (a form of distributed computing) (networking many heterogeneous computers to create a virtual computer architecture)
**Load-balancing clusters (sometimes referred to as a server farm)
*Physically based simulation and physics engines (usually based on Newtonian physics models)
**Conway's Game of Life, cloth simulation, incompressible fluid flow by solution of Navier-Stokes equations
*Lattice gauge theory
*Segmentation – 2D and 3D
*Level-set methods
*CT reconstruction
*Fast Fourier transform
*Tone mapping
*Audio signal processing
**Audio and sound effects processing, using a GPU for DSP (digital signal processing)
**Analog signal processing
**Speech processing
*Digital image processing
*Video Processing
**Hardware accelerated video decoding and post-processing
***Motion compensation (mo comp)
***Inverse discrete cosine transform (iDCT)
***Variable-length decoding (VLD)
***Inverse quantization (IQ)
***In-loop deblocking
***Bitstream processing (CAVLC/CABAC)
***Deinterlacing
****Spatial-temporal de-interlacing
***Noise reduction
***Edge enhancement
***Color correction
**Hardware accelerated video encoding and pre-processing
*Raytracing
*Global illumination – photon mapping, radiosity, subsurface scattering
*Geometric computing – constructive solid geometry, distance fields, collision detection, transparency computation, shadow generation
*Scientific computing
**Weather forecasting
**Climate research (including research into global warming)
**Molecular modeling
**Quantum mechanical physics
*Bioinformatics [Schatz, M.C., Trapnell, C., Delcher, A.L., Varshney, A. (2007). "High-throughput sequence alignment using Graphics Processing Units." BMC Bioinformatics 8:474. http://www.biomedcentral.com/1471-2105/8/474/] [Svetlin A. Manavski, Giorgio Valle (2008). "CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment." BMC Bioinformatics 9(Suppl 2):S10. http://www.biomedcentral.com/1471-2105/9/S2/S10]
*Computational finance
*Medical imaging
*Computer vision
*Digital signal processing / signal processing
*Control engineering
*Neural networks
*Database operations
*Lattice Boltzmann methods
*Cryptography and cryptanalysis

References

See also

*Graphics processing unit
** Comparison of ATI graphics processing units
** Comparison of Nvidia graphics processing units
** Graphics pipeline
** Graphics card
*Stream processing
*BrookGPU
*Physics engine, a computer program that simulates Newtonian physics (on CPU, GPU or PPU)
** Physics processing unit
** List of games using physics engines
*Havok Physics / Havok FX, commercial physics engine middleware SDK for computer and video games
*PhysX SDK, commercial realtime physics engine middleware SDK developed by AGEIA
** AGEIA also designed a dedicated physics processing unit expansion card designed to accelerate the PhysX SDK
* GPU programming libraries/layers:
**Close to Metal, AMD/ATI's competing GPGPU technology for ATI Radeon-based GPUs
**CUDA (Compute Unified Device Architecture), Nvidia's competing GPGPU technology for Nvidia GeForce-based GPUs
**Sh, a GPGPU library for C++
** OpenCL (Open Computing Language), a framework for GPU computing introduced by Apple in OS X Snow Leopard
*Audio processing unit (DSP can also be done on a GPU with GPGPU technology)
*Acceleware

External links

*http://www.gpgpu.org
* [http://www.gpgpu.org/w/index.php/Main_Page GPGPU Wiki]
* [http://www.gpgpu.org/s2005 SIGGRAPH 2005 GPGPU Course Notes]
* [http://www.gpgpu.org/vis2005 IEEE VIS 2005 GPGPU Course Notes]
*http://developer.nvidia.com
*http://www.atitech.com/developer
* [http://www.agilemolecule.com/Ascalaph/Ascalaph-Liquid.html Ascalaph Liquid GPU] molecular dynamics.
* [http://gneuron.freehostia.com C# Backpropagation library written for GPU]
* [http://graphics.stanford.edu/~mhouston/public_talks/R520-mhouston.pdf Slideshow for ATI GPGPU physics demonstration] by Stanford graduate student Mike Houston. See p. 13 for an overview of the mapping of conventional program tasks to GPU hardware.
* [http://techreport.com/onearticle.x/8887 Tech Report article: "ATI stakes claims on physics, GPGPU ground"] by Scott Wasson
*http://www.acceleware.com
*http://www.vision4ce.com/ ruggedized PCs with GPGPU-accelerated image and signal processing
* [http://www.gass-ltd.co.il/en GPGPU in Israel]
*http://www.gpu4vision.org GPGPU Publications, Videos and Software

