Cyclone (programming language)

Cyclone
Appeared in 2006
Designed by AT&T Labs
Stable release 1.0 (May 8, 2006)
Influenced by C
Website cyclone.thelanguage.org

The Cyclone programming language is intended to be a safe dialect of the C language. Cyclone is designed to avoid buffer overflows and other vulnerabilities that are endemic in C programs, without losing the power and convenience of C as a tool for system programming.

Cyclone development was started as a joint project of AT&T Labs Research and Greg Morrisett's group at Cornell in 2001. Version 1.0 was released on May 8, 2006.

Language features

Cyclone attempts to avoid some of the common pitfalls of C, while still maintaining its look and performance. To this end, Cyclone places limits on programs, among them restrictions on pointer arithmetic and the rejection of code that could return a dangling pointer.

To maintain the tool set that C programmers are used to, Cyclone provides the following extensions:

  • Never-NULL pointers do not require NULL checks
  • "Fat" pointers support pointer arithmetic with run-time bounds checking
  • Growable regions support a form of safe manual memory management
  • Garbage collection for heap-allocated values
  • Tagged unions support type-varying arguments
  • Injections help automate the use of tagged unions for programmers
  • Polymorphism replaces some uses of void *
  • varargs are implemented as fat pointers
  • Exceptions replace some uses of setjmp and longjmp
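
To make the tagged-union item in the list above more concrete, the following is a sketch of the hand-written pattern that C programmers commonly use, in which a tag field records which union member is currently valid. It is plain C rather than Cyclone syntax, and the names value, value_kind, value_from_int, value_from_str and value_as_int are hypothetical, chosen only for illustration.

```c
#include <stdbool.h>

/* The tag says which member of the union currently holds a value. */
enum value_kind { VALUE_INT, VALUE_STR };

struct value {
    enum value_kind kind;
    union {
        int i;
        const char *s;
    } as;
};

/* Constructors set the tag and the matching member together. */
struct value value_from_int(int i)
{
    struct value v;
    v.kind = VALUE_INT;
    v.as.i = i;
    return v;
}

struct value value_from_str(const char *s)
{
    struct value v;
    v.kind = VALUE_STR;
    v.as.s = s;
    return v;
}

/* Checked accessor: refuses to read the int member unless the tag matches. */
bool value_as_int(struct value v, int *out)
{
    if (v.kind != VALUE_INT) return false;
    *out = v.as.i;
    return true;
}
```

In C nothing forces callers to go through the checked accessor, which is the kind of gap a language-level tagged union is meant to close.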

For a better high-level introduction to Cyclone, the reasoning behind Cyclone and the source of these lists, see this paper.

Cyclone looks, in general, much like C, but it should be viewed as a C-like language rather than as C itself.

Pointer/reference types

Cyclone implements three kinds of reference (following C terminology these are called pointers):

  • * (the normal type)
  • @ (the never-NULL pointer), and
  • ? (the only type with pointer arithmetic allowed, "fat" pointers).

The purpose of introducing these new pointer types is to avoid common problems when using pointers. Take, for instance, a function called foo that takes a pointer to an int:

 int foo(int *);

Although the person who wrote the function foo could have inserted NULL checks, let us assume that for performance reasons they did not. Calling foo(NULL); will result in undefined behavior (typically, although not necessarily, a SIGSEGV being sent to the application). To avoid such problems, Cyclone introduces the @ pointer type, which can never be NULL. Thus, the "safe" version of foo would be:

 int foo(int @);

This tells the Cyclone compiler that the argument to foo should never be NULL, avoiding the aforementioned undefined behavior. The simple change of * to @ saves the programmer from having to write NULL checks and the operating system from having to trap NULL pointer dereferences.

Neither * nor @ permits pointer arithmetic, however, and this limit can be a rather large stumbling block for most C programmers, who are used to manipulating their pointers directly with arithmetic. Although such arithmetic is convenient, it can lead to buffer overflows and other "off-by-one"-style mistakes. To allow it safely, the ? pointer type carries a known bound, the size of the array, which is checked at run time. Although this adds overhead due to the extra information stored about the pointer, it improves safety and security. Take, for instance, a simple (and naïve) strlen function, written in C:

 int strlen(const char *s)
 {
     int iter = 0;
     if (s == NULL) return 0;
     while (s[iter] != '\0') {
        iter++;
     }
     return iter;
 }

This function assumes that the string being passed in is terminated by NUL ('\0'). However, what would happen if char buf[] = {'h','e','l','l','o','!'}; were passed to this function? This is perfectly legal in C, yet it would cause strlen to iterate through memory not necessarily associated with the string s. There are functions, such as strnlen, that can be used to avoid such problems, but these functions are not standard with every implementation of ANSI C. The Cyclone version of strlen is not so different from the C version:

 int strlen(const char ? s)
 {
    int iter, n = s.size;
    if (s == NULL) return 0;
    for (iter = 0; iter < n; iter++, s++) {
       if (*s == '\0') return iter;
    }
    return n;
 }

Here, strlen bounds itself by the length of the array passed to it, and so never reads past the end of the array. Each of the kinds of pointer type can be safely cast to each of the others, and arrays and strings are automatically cast to ? by the compiler. (Casting from ? to * invokes a bounds check, and casting from ? to @ invokes both a NULL check and a bounds check. Casting from * to ? results in no checks whatsoever; the resulting ? pointer has a size of 1.)
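
The idea of a pointer that carries its own bound can be approximated in plain C. The sketch below is only a simplified model, not Cyclone's actual representation, and where exactly Cyclone performs its checks may differ. The names fat_ptr, fat_from_array, fat_read, fat_advance and fat_strlen are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a "fat" pointer: an address plus the number of
   elements that may still be read starting at that address. */
struct fat_ptr {
    const char *cur;
    size_t size;
};

/* Pair a pointer with its bound, as a compiler could do for an array. */
struct fat_ptr fat_from_array(const char *base, size_t count)
{
    struct fat_ptr p = { base, count };
    return p;
}

/* Checked dereference: fails instead of reading out of bounds. */
bool fat_read(struct fat_ptr p, char *out)
{
    if (p.cur == NULL || p.size == 0) return false;
    *out = *p.cur;
    return true;
}

/* Checked increment: the bound shrinks as the pointer moves forward. */
bool fat_advance(struct fat_ptr *p)
{
    if (p->cur == NULL || p->size == 0) return false;
    p->cur++;
    p->size--;
    return true;
}

/* Same loop shape as the Cyclone strlen, built on the checked operations. */
size_t fat_strlen(struct fat_ptr s)
{
    size_t n = s.size, iter;
    char c;
    for (iter = 0; iter < n; iter++) {
        if (!fat_read(s, &c) || c == '\0') return iter;
        fat_advance(&s);
    }
    return n;
}
```

Calling fat_strlen(fat_from_array(buf, sizeof buf)) on the unterminated six-character buf from the earlier example stops at the bound and yields 6. In this model, wrapping an ordinary pointer as fat_from_array(p, 1) corresponds to the size-1 result of a * to ? cast.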

Dangling pointers and region analysis

Consider the following code, in C:

 char *itoa(int i)
 {
    char buf[20];
    sprintf(buf,"%d",i);
    return buf;
 }

This returns a pointer to an object that is allocated on the stack of the function itoa, which is no longer available after the function returns. While gcc and other compilers will warn about such code, the following will typically compile without warnings:

 char *itoa(int i)
 {
    char buf[20], *z;
    sprintf(buf,"%d",i);
    z = buf;
    return z;
 }

Cyclone performs region analysis on each segment of code, preventing dangling pointers such as the one returned from this version of itoa. All of the local variables in a given scope are considered to be part of the same region, separate from the heap or any other local region. Thus, when analyzing itoa, the compiler would see that z is a pointer into the local stack, and would report an error.
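
For comparison, one conventional way to avoid the dangling pointer in plain C is to have the caller own the buffer, so that the returned pointer never refers to the callee's stack frame. This is a sketch of that C idiom, not of anything Cyclone generates, and the name itoa_into and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes the decimal text of i into the caller-owned buffer out, which
   holds cap bytes. Returns out on success, or NULL if the buffer is
   missing or too small for the digits plus the terminating NUL. */
char *itoa_into(int i, char *out, size_t cap)
{
    int n;
    if (out == NULL || cap == 0) return NULL;
    n = snprintf(out, cap, "%d", i);
    if (n < 0 || (size_t)n >= cap) return NULL;
    return out;
}
```

Because the storage belongs to the caller, the returned pointer stays valid for as long as the caller's buffer does.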

Manual memory management

Examples

The best example to start with is the classic Hello world program:

 #include <stdio.h>
 #include <core.h>
 using Core;
 int main(int argc, string_t ? args)
 {
    if (argc <= 1) {
       printf("Usage: hello-cyclone <name>\n");
       return 1;
    }
    printf("Hello from Cyclone, %s\n", args[1]);
    return 0;
 }


Wikimedia Foundation. 2010.
