 Definite assignment analysis

In computer science, definite assignment analysis is a dataflow analysis used by compilers to conservatively ensure that a variable or location is always assigned to before it is used.
Contents
Motivation
In C and C++ programs, a source of particularly difficulttodiagnose errors is the nondeterministic behavior that results from reading uninitialized variables; this behavior can vary between platforms, builds, and even from run to run.
There are two common ways to solve this problem. One is to ensure that all locations are written before they are read. Rice's theorem establishes that this problem cannot be solved in general for all programs; however, it is possible to create a conservative (imprecise) analysis that will accept only programs that satisfy this constraint, while rejecting some correct programs, and definite assignment analysis is such an analysis. The Java^{[1]} and C#^{[2]} programming language specifications require that the compiler report a compiletime error if the analysis fails. Both languages require a specific form of the analysis that is spelled out in meticulous detail. In Java, this analysis was formalized by Stärk et al.^{[3]}, and some correct programs are rejected and must be altered to introduce explicit unnecessary assignments. In C#, this analysis was formalized by Fruja, and is precise as well as sound, in the sense that all variables assigned along all control flow paths will be considered definitely assigned.^{[4]} The Cyclone language also requires programs to pass a definite assignment analysis, but only on variables with pointer types, to ease porting of C programs.^{[5]}
The second way to solve the problem is to automatically initialize all locations to some fixed, predictable value at the point at which they are defined, but this introduces new assignments that may impede performance. In this case, definite assignment analysis enables a compiler optimization where redundant assignments — assignments followed only by other assignments with no possible intervening reads — can be eliminated. In this case, no programs are rejected, but programs for which the analysis fails to recognize definite assignment may contain redundant initialization. The Common Language Infrastructure relies on this approach.^{[6]}
Terminology
A variable or location can be said to be in one of three states at any given point in the program:
 Definitely assigned: The variable is known with certainty to be assigned.
 Definitely unassigned: The variable is known with certainty to be unassigned.
 Unknown: The variable may be assigned or unassigned; the analysis is not precise enough to determine which.
The analysis
The following is based on Fruja's formalization of the C# intraprocedural (single method) definite assignment analysis, which is responsible for ensuring that all local variables are assigned before they are used.^{[4]} It simultaneously does definite assignment analysis and constant propagation of boolean values. We define five static functions:
Name Domain Description before All statements and expressions Variables definitely assigned before the evaluation of the given statement or expression. after All statements and expressions Variables definitely assigned after the evaluation of the given statement or expression, assuming it completes normally. vars All statements and expressions All variables available in the scope of the given statement or expression. true All boolean expressions Variables definitely assigned after the evaluation of the given expression, assuming the expression evaluates to true. false All boolean expressions Variables definitely assigned after the evaluation of the given expression, assuming the expression evaluates to false. We supply dataflow equations that define the values of these functions on various expressions and statements, in terms of the values of the functions on their syntactic subexpressions. Assume for the moment that there are no goto, break, continue, return, or exception handling statements. Following are a few examples of these equations:
 Any expression or statement e that does not affect the set of variables definitely assigned: after(e) = before(e)
 Let e be the assignment expression loc = v. Then before(v) = before(e), and after(e) = after(v) U {loc}.
 Let e be the expression true. Then true(e) = before(e) and false(e) = vars(e). In other words, if e evaluates to false, all variables are (vacuously) definitely assigned, because e does not evaluate to false.
 Since method arguments are evaluated left to right, before(arg_{i + 1}) = after(arg_{i}). After a method completes, out parameters are definitely assigned.
 Let s be the conditional statement if (e) s_{1} else s_{2}. Then before(e) = before(s), before(s_{1}) = true(e), before(s_{2}) = false(e), and after(s) = after(s_{1}) intersect after(s_{2}).
 Let s be the while loop statement while (e) s_{1}. Then before(e) = before(s), before(s_{1}) = true(e), and after(s) = false(e).
 And so on.
At the beginning of the method, no local variables are definitely assigned. The verifier repeatedly iterates over the abstract syntax tree and uses the dataflow equations to migrate information between the sets until a fixed point can be reached. Then, the verifier examines the before set of every expression that uses a local variable to ensure that it contains that variable.
The algorithm is complicated by the introduction of controlflow jumps like goto, break, continue, return, and exception handling. Any statement that can be the target of one of these jumps must intersect its before set with the set of definitely assigned variables at the jump source. When these are introduced, the resulting data flow may have multiple fixed points, as in this example:
 int i = 1;
 L:
 goto L;
Since the label L can be reached from two locations, the controlflow equation for goto dictates that before(2) = after(1) intersect before(3). But before(3) = before(2), so before(2) = after(1) intersect before(2). This has two fixedpoints for before(2), {i} and the empty set. However, it can be shown that because of the monotonic form of the dataflow equations, there is a unique maximal fixed point (fixed point of largest size) that provides the most possible information about the definitely assigned variables. Such a maximal (or maximum) fixed point may be computed by standard techniques; see dataflow analysis.
An additional issue is that a controlflow jump may render certain control flows infeasible; for example, in this code fragment the variable i is definitely assigned before it is used:
 int i;
 if (j < 0) return; else i = j;
 print(i);
The dataflow equation for if says that after(2) = after(return) intersect after(i = j). To make this work out correctly, we define after(e) = vars(e) for all controlflow jumps; this is vacuously valid in the same sense that the equation false(true) = vars(e) is valid, because it is not possible for control to reach a point immediately after a controlflow jump.
References
 ^ J. Gosling, B. Joy, G. Steele, G. Bracha. "The Java Language Specification, 3rd Edition". pp. Chapter 16 (pp.527–552). http://java.sun.com/docs/books/jls/third_edition/html/defAssign.html. Retrieved December 2, 2008.
 ^ "Standard ECMA334, C# Language Specification". ECMA International. pp. Section 12.3 (pp.122–133). http://www.ecmainternational.org/publications/standards/Ecma334.htm. Retrieved December 2, 2008.
 ^ Stärk, Robert F.; E. Borger, Joachim Schmid (2001). Java and the Java Virtual Machine: Definition, Verification, Validation. Secaucus, NJ, USA: SpringerVerlag New York, Inc.. pp. Section 8.3. ISBN 3540420886.
 ^ ^{a} ^{b} Fruja, Nicu G. (October 2004). "The Correctness of the Definite Assignment Analysis in C#". Journal of Object Technology 3 (9): 29–52. doi:10.5381/jot.2004.3.9.a2. http://www.jot.fm/issues/issue_2004_10/article2. Retrieved 20081202. "We actually prove more than correctness: we show that the solution of the analysis is a perfect solution (and not only a safe approximation)."
 ^ "Cyclone: Definite Assignment". Cyclone User's Manual. http://cyclone.thelanguage.org/wiki/Definite%20Assignment. Retrieved December 16, 2008.
 ^ "Standard ECMA335, Common Language Infrastructure (CLI)". ECMA International. pp. Section 1.8.1.1 (Partition III, pg. 19). http://www.ecmainternational.org/publications/standards/Ecma335.htm. Retrieved December 2, 2008.
Categories:
Wikimedia Foundation. 2010.
Look at other dictionaries:
Dataflow analysis — is a technique for gathering information about the possible set of values calculated at various points in a computer program. A program s control flow graph (CFG) is used to determine those parts of a program to which a particular value assigned… … Wikipedia
Principal component analysis — PCA of a multivariate Gaussian distribution centered at (1,3) with a standard deviation of 3 in roughly the (0.878, 0.478) direction and of 1 in the orthogonal direction. The vectors shown are the eigenvectors of the covariance matrix scaled by… … Wikipedia
Multivariate analysis of variance — (MANOVA) is a generalized form of univariate analysis of variance (ANOVA). It is used when there are two or more dependent variables. It helps to answer : 1. do changes in the independent variable(s) have significant effects on the dependent … Wikipedia
Cyclone (programming language) — Cyclone Appeared in 2006 (2006) Designed by AT T Labs Stable release 1.0 (May 8, 2006; 5 years ago (2006 05 08)) Influenced by … Wikipedia
Programming language — lists Alphabetical Categorical Chronological Generational A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that… … Wikipedia
Compiler — This article is about the computing term. For the anime, see Compiler (anime). A diagram of the operation of a typical multi language, multi target compiler A compiler is a computer program (or set of programs) that transforms source code written … Wikipedia
Isotopes of dubnium — Dubnium (Db) is an artificial element, and thus a standard atomic mass cannot be given. Like all artificial elements, it has no stable isotopes. The first isotope to be synthesized was 261Db in 1968. There are 12 known radioisotopes from 256Db to … Wikipedia
Dubnium — rutherfordium ← dubnium → seaborgium Ta ↑ Db ↓ (Upp) … Wikipedia
Algorithm — Flow chart of an algorithm (Euclid s algorithm) for calculating the greatest common divisor (g.c.d.) of two numbers a and b in locations named A and B. The algorithm proceeds by successive subtractions in two loops: IF the test B ≤ A yields yes… … Wikipedia
formal logic — the branch of logic concerned exclusively with the principles of deductive reasoning and with the form rather than the content of propositions. [1855 60] * * * Introduction the abstract study of propositions, statements, or assertively used … Universalium