Code generation (compiler)

Code generation (compiler)

In computer science, code generation is the process by which a compiler's code generator converts some intermediate representation of source code into a form (e.g., machine code) that can be readily executed by a machine (often a computer).

Sophisticated compilers typically perform multiple passes over various intermediate forms. This multi-stage process is used because many algorithms for code optimization are easier to apply one at a time, or because the input to one optimization relies on the processing performed by another optimization. This organization also facilitates the creation of a single compiler that can target multiple architectures, as only the last of the code generation stages (the backend) needs to change from target to target. (For more information on compiler design, see Compiler.)

The input to the code generator typically consists of a parse tree or an abstract syntax tree. The tree is converted into a linear sequence of instructions, usually in an intermediate language such as three address code. Further stages of compilation may or may not be referred to as "code generation", depending on whether they involve a significant change in the representation of the program. (For example, a peephole optimization pass would not likely be called "code generation", although a code generator might incorporate a peephole optimization pass.)

Contents

Major tasks in code generation

In addition to the basic conversion from an intermediate representation into a linear sequence of machine instructions, a typical code generator tries to optimize the generated code in some way. The generator may try to use faster instructions, use fewer instructions, exploit available registers, and avoid redundant computations.

Tasks which are typically part of a sophisticated compiler's "code generation" phase include:

Instruction selection is typically carried out by doing a recursive postorder traversal on the abstract syntax tree, matching particular tree configurations against templates; for example, the tree W := ADD(X,MUL(Y,Z)) might be transformed into a linear sequence of instructions by recursively generating the sequences for t1 := X and t2 := MUL(Y,Z), and then emitting the instruction ADD W, t1, t2.

In a compiler that uses an intermediate language, there may be two instruction selection stages — one to convert the parse tree into intermediate code, and a second phase much later to convert the intermediate code into instructions from the instruction set of the target machine. This second phase does not require a tree traversal; it can be done linearly, and typically involves a simple replacement of intermediate-language operations with their corresponding opcodes. However, if the compiler is actually a language translator (for example, one that converts Eiffel to C), then the second code-generation phase may involve building a tree from the linear intermediate code.

Runtime code generation

When code generation occurs at runtime, as in just-in-time compilation (JIT), it is important that the entire process be efficient with respect to space and time. For example, when regular expressions are interpreted and used to generate code at runtime, a non-determistic finite state machine is often generated instead of a deterministic one, because usually the former can be created more quickly and occupies less memory space than the latter. Despite its generally generating less efficient code, JIT code generation can take advantage of profiling information that is available only at runtime.

Related concepts

The fundamental task of taking input in one language and producing output in a non-trivially different language can be understood in terms of the core transformational operations of formal language theory. Consequently, some techniques that were originally developed for use in compilers have come to be employed in other ways as well. For example, YACC (Yet Another Compiler Compiler) takes input in Backus-Naur form and converts it to a parser in C. Though it was originally created for automatic generation of a parser for a compiler, yacc is also often used to automate writing code that needs to be modified each time specifications are changed. (For example, see [1].)

Many integrated development environments (IDEs) support some form of automatic source code generation, often using algorithms in common with compiler code generators, although commonly less complicated. (See also: Program transformation, Data transformation.)

Reflection

In general, a syntax and semantic analyzer tries to retrieve the structure of the program from the source code, while a code generator uses this structural information (e.g., data types) to produce code. In other words, the former adds information while the latter loses some of the information. One consequence of this information loss is that reflection becomes difficult or even impossible. To counter this problem, code generators often embed syntactic and semantic information in addition to the code necessary for execution.

See also


Wikimedia Foundation. 2010.

Игры ⚽ Поможем решить контрольную работу

Look at other dictionaries:

  • Code generation — may refer to: Code generation (compiler), a mechanism to produce the executable form of computer programs, such as machine code, in some automatic manner Automatic programming (source code generation), the act of generating source code based on… …   Wikipedia

  • Comparison of code generation tools — This article compares variable metamodel code generation tools[clarification needed]. Fixed metamodel code generation tools, such as UML tools, are excluded (see List of UML tools). Name Creator OS First public release Latest stable version… …   Wikipedia

  • Compiler construction — is an area of computer science that deals with the theory and practice of developing programming languages and their associated compilers. The theoretical portion is primarily concerned with syntax, grammar and semantics of programming languages …   Wikipedia

  • Compiler — This article is about the computing term. For the anime, see Compiler (anime). A diagram of the operation of a typical multi language, multi target compiler A compiler is a computer program (or set of programs) that transforms source code written …   Wikipedia

  • Compiler-compiler — A compiler compiler or compiler generator is a tool that creates a parser, interpreter, or compiler from some form of formal description of a language and machine. The earliest and still most common form of compiler compiler is a parser generator …   Wikipedia

  • Generation language — Quelltext eines Programms in der objektorientierten Programmiersprache Ruby. Eine Programmiersprache ist eine Notation für Computerprogramme; sie dient sowohl dazu, diese während und nach ihrer Entwicklung (Programmierung) darzustellen als auch… …   Deutsch Wikipedia

  • GNU Compiler Collection — Cc1 redirects here. For other uses of CC1 or CC 1, see CC1 (disambiguation). GNU Compiler Collection Developer(s) GNU Project Initial release May 23, 1987 ( …   Wikipedia

  • Multi-pass compiler — A multi pass compiler is a type of compiler that processes the source code or abstract syntax tree of a program several times. This is in contrast to a one pass compiler, which traverses the program only once. Each pass takes the result of the… …   Wikipedia

  • Cross compiler — A cross compiler is a compiler capable of creating executable code for a platform other than the one on which the compiler is run. Cross compiler tools are used to generate executables for embedded system or multiple platforms. It is used to… …   Wikipedia

  • Tiny C Compiler — Infobox Software name = Tiny C Compiler logo = developer = Fabrice Bellard latest release version = 0.9.24 latest release date = release date|2008|04|01 programming language = C and Assembly operating system = Linux, Unix, Microsoft Windows genre …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”