chomski

chomski
pp, chomski virtual machine
Paradigm(s): scripting language
Appeared in: 2007
Designed by: mj bishop
Typing discipline: none; all data is treated as a string
Major implementations: [1]
Influenced by: Sed, Awk
OS: Cross-platform
Website: bumble.sourceforge.net/code/apps/pp/

The chomski virtual machine (named after the linguist Noam Chomsky) and pp (the pattern parser) refer to both a command-line language and the utility that interprets it; both are used to parse and transform text patterns. The utility reads its input sequentially, one character at a time, applies the operations specified on the command line or in a pp script, and writes the result to its output. Development began in 2006 as a Unix and Windows utility, and the tool is available today for Windows and Linux systems. pp borrows a number of ideas and syntax elements from sed, a command-line text stream editor.

Features

The chomski language uses many ideas taken from sed, the Unix stream editor. For example, sed has two data buffers, known as the "pattern space" and the "hold space", which together constitute an extremely simple virtual machine. In the chomski language this virtual machine is augmented with several new buffers or registers, along with a number of commands to manipulate them.

The chomski virtual machine includes a tape data structure and a stack, along with a "workspace" (the equivalent of the sed "pattern space") and a number of other buffers of lesser importance. This virtual machine is designed specifically for parsing formal languages. Parsing traditionally involves two phases: the lexical analysis phase and the formal grammar phase. During the lexical analysis phase a series of tokens is generated. These tokens are then used as the input for a set of formal grammar rules. The chomski virtual machine uses the stack to hold these tokens and uses the tape to hold the attributes of these parse tokens. In a pp script the two phases, lexing and parsing, are combined in one script file. A series of command words is used to manipulate the different data structures of the virtual machine.
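The relationship between the workspace, the stack and the tape can be sketched in Python. This is only an illustrative model, not the pp implementation: the class name, the method names, the `pointer` field and the token names are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """Hypothetical model of the buffers described above (all names assumed)."""
    workspace: str = ""                                # like sed's pattern space
    stack: list = field(default_factory=list)          # holds parse tokens
    tape: list = field(default_factory=lambda: [""])   # holds token attributes
    pointer: int = 0                                   # current tape cell

    def push(self, token: str, attribute: str) -> None:
        """Put a token on the stack and its attribute in the current tape cell."""
        self.stack.append(token)
        self.tape[self.pointer] = attribute
        self.pointer += 1
        if self.pointer == len(self.tape):
            self.tape.append("")                       # grow the tape on demand

    def pop(self) -> tuple:
        """Remove the top token and return it together with its attribute."""
        self.pointer -= 1
        return self.stack.pop(), self.tape[self.pointer]


m = Machine()
m.push("WORD", "hello")    # token names here are invented for the example
m.push("SPACE", " ")
print(m.stack)             # ['WORD', 'SPACE']
print(m.pop())             # ('SPACE', ' ')
```

Keeping tokens and attributes in parallel structures means grammar rules can be tested against the token names alone, while the matched text remains available for output.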

Purpose and Motivation

The purpose of the pp tool is to parse and transform text patterns. The text patterns conform to the rules of a formal language, and the languages that can be handled include many context-free languages. Whereas traditional Unix tools (such as awk, sed and grep) process text one line at a time and use regular expressions to search or transform it, the pp tool processes text one character at a time and can use context-free grammars to transform (or compile) the text. In keeping with the Unix philosophy, however, the pp tool works on plain text streams, encoded according to the locale of the local computer, and produces another plain text stream as output, so it can be used as part of a standard pipeline.
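The difference between line-oriented regular expressions and character-at-a-time processing with a stack can be illustrated with nested brackets, a classic context-free pattern. The following Python sketch is not pp code; the function name and the choice of bracket characters are assumptions made for the illustration.

```python
def is_balanced(text: str) -> bool:
    """Check bracket nesting by reading one character at a time."""
    closers = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)                 # remember the opening bracket
        elif ch in closers:
            # a closer must match the most recent unmatched opener
            if not stack or stack.pop() != closers[ch]:
                return False
    return not stack                         # nothing may be left open


print(is_balanced("f(a[1], {b})"))   # True
print(is_balanced("(]"))             # False
```

Because the depth of nesting is unbounded, the stack is what carries the state that a classical regular expression cannot.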

The motivation for the creation of the pp tool and the chomski virtual machine was to allow the writing of parsing scripts, rather than having to resort to traditional parsing tools such as Lex and Yacc.

Usage

The following example shows a typical use of chomski, where the -s option indicates that the chomski expression follows:

cat inputFileName | chomski -s  '/(/ { until ")"; print; } clear;' > outputFileName

In the above script, only text enclosed in parentheses is saved in the output file.
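The behaviour of this script can be approximated in Python as follows. The sketch assumes that the test fires when the workspace holds an opening parenthesis, that `until` keeps reading until the workspace ends with the closing parenthesis, and that the parentheses themselves are part of the output; none of these details is confirmed here, and the function name is invented.

```python
import io


def extract_parenthesised(stream) -> str:
    """Collect every '(' ... ')' span, reading one character at a time."""
    out = []
    while True:
        workspace = stream.read(1)           # read the next character
        if workspace == "":                  # end of input
            break
        if workspace == "(":
            # read on until the workspace ends with ')' or the input runs out
            while not workspace.endswith(")"):
                ch = stream.read(1)
                if ch == "":
                    break
                workspace += ch
            out.append(workspace)            # plays the role of 'print'
        # any other character is simply dropped, like 'clear'
    return "".join(out)


print(extract_parenthesised(io.StringIO("a (b c) d (e) f")))   # (b c)(e)
```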

Under Unix (and Windows), chomski can be used as a filter in a pipeline:

generate_data | chomski -s '/x/{clear;add "y";}print;clear;'

That is, generate the data, and then make the small change of replacing x with y.
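A rough Python equivalent of this filter, assuming the script is applied once to each input character, might look like this. The function name and the stream parameters are hypothetical; in a real pipeline the streams would be standard input and standard output.

```python
import io


def replace_x_with_y(source, sink) -> None:
    """Copy source to sink one character at a time, writing 'y' for each 'x'."""
    while True:
        workspace = source.read(1)
        if workspace == "":          # end of input
            break
        if workspace == "x":
            workspace = "y"          # stands in for: clear; add "y";
        sink.write(workspace)        # stands in for: print; clear;


demo = io.StringIO()
replace_x_with_y(io.StringIO("xylophone x-ray"), demo)
print(demo.getvalue())               # yylophone y-ray
```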

Several commands can be put together in a file called, for example, substitute.chom and then be applied using the -f option to read the commands from the file:

cat inputFileName | chomski -f substitute.chom > outputFileName

Besides substitution, other forms of simple processing are possible. For example, the following uses the plus and count commands to count the number of lines in a file:

cat inputFileName | chomski -s '[-n]{plus;} <>{count;print;}' 

This example used some of the following metacharacters and language features:

  • The square brackets ([]) indicate the matching of a character class.
  • The -n string matches a newline character.
  • The <> string matches the end of the input stream (text file).
  • The curly braces ({}) follow tests and group multiple statements.
  • The semi-colon (;) terminates all statements.
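The line-counting example can be modelled in Python along the following lines. The sketch assumes that `plus` increments a counter held by the machine and that the intent at the end of input is to output only that counter; the function name is invented for the illustration.

```python
import io


def count_lines(stream) -> str:
    """Count newline characters, reading one character at a time."""
    counter = 0
    while True:
        ch = stream.read(1)
        if ch == "":                 # end of input, like the <> test
            return str(counter)      # like: count; print;
        if ch == "\n":               # like the [-n] test
            counter += 1             # like: plus;


print(count_lines(io.StringIO("one\ntwo\nthree\n")))   # 3
```

The result is returned as a string because, as noted in the infobox, the language treats all data as strings.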

Complex chomski constructs are possible, allowing it to serve as a simple but highly specialised programming language. The language has only one flow-control statement (apart from the test structures <>, [], //, etc.): the check command, which jumps back to the @@ label. No other labels are permitted.
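A single backward jump is enough to apply grammar rules repeatedly to the token stack. The Python sketch below shows that idea only; the rule table, the token names and the assumption that the jump is taken after each successful reduction are all invented for the illustration and are not taken from the pp documentation.

```python
# Hypothetical grammar: two adjacent tokens reduce to one.
RULES = {
    ("word", "word"): "phrase",
    ("phrase", "word"): "phrase",
}


def reduce_stack(tokens: list) -> list:
    """Apply reductions to the top of the stack until none matches."""
    stack = list(tokens)
    while True:                          # plays the role of the @@ label
        top = tuple(stack[-2:])          # the two topmost tokens, if present
        if top not in RULES:
            return stack                 # no rule applies: fall through
        stack[-2:] = [RULES[top]]        # replace the pair with one token
        # reaching the end of the loop body acts like 'check': jump back


print(reduce_stack(["phrase", "word"]))   # ['phrase']
print(reduce_stack(["word"]))             # ['word']
```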

History

The idea for chomski arose from the limitations of regular expression engines that use a line-by-line paradigm, and from the difficulty of parsing nested text patterns with regular expressions. It evolved as a natural progression from the grep and sed commands. Development began in approximately 2006 and continued sporadically.[1]

Limitations

Chomski is not a general-purpose programming language. Like sed, it is designed for a limited type of usage. It does not currently support Unicode strings, since the current implementation uses standard C character arrays. It also does not currently have a debugger for debugging complex scripts.

References

  1. ^ Developer's (m.j.bishop) personal recollection


Wikimedia Foundation. 2010.