Self-modifying code

In computer science, self-modifying code is code that alters its own instructions, intentionally or otherwise, while it is executing.

Self-modifying code is quite straightforward to write in assembly language (taking the CPU cache into account). It is also supported by some high-level language interpreters, such as SNOBOL4, the Lisp programming language, and the ALTER verb in COBOL. It is more difficult to implement in compiled languages, but compilers such as Clipper and SPITBOL make a fair attempt at it, and COBOL almost encourages it. One batch programming technique is to use self-modifying code (see [http://www.csd.net/~cgadd/knowbase/DOS0019.HTM Self-modifying Batch File] by Lars Fosdal). Most scripting languages, such as Perl and Python, are interpreted, which means a program can generate new code and execute it at run time; usually the new code is built in a string variable, but a program can also write out a new file and run it in the scripting language's interpreter, as sketched below.
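
As a minimal sketch of the file-based variant, even a compiled C program can emit new source code at run time and hand it to an interpreter. This assumes a python3 interpreter is on the PATH; the file name generated.py is arbitrary:

 #include <stdio.h>
 #include <stdlib.h>
 
 int main(void)
 {
     FILE *f = fopen("generated.py", "w");
     if (f == NULL)
         return 1;
     /* The emitted source can depend on data known only at run time. */
     fprintf(f, "print(sum(range(%d)))\n", 10);
     fclose(f);
     /* Hand the newly written program to the interpreter. */
     return system("python3 generated.py");
 }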

Usage

Self-modifying code can be used for various purposes:

#Semi-automatic optimization of a state-dependent loop.
#Run-time code generation, or specialization of an algorithm at run time or load time (popular, for example, in real-time graphics), such as a general sort utility preparing code to perform the key comparison described in a specific invocation (see the sketch after this list).
#Altering the inlined state of an object, or simulating the high-level construct of closures.
#Patching of subroutine call addresses, as is usually done at load time of dynamic libraries, or patching, on each invocation, a subroutine's internal references to its parameters so as to use their actual addresses (whether this is regarded as 'self-modifying code' is a matter of terminology).
#Evolutionary computing systems such as genetic programming.
#Hiding of code to prevent reverse engineering, as through use of a disassembler or debugger.
#Hiding of code to evade detection by virus/spyware scanning software and the like.
#Filling 100% of memory (in some architectures) with a rolling pattern of repeating opcodes, to erase all programs and data, or for hardware burn-in.
#Compression of code to be decompressed and executed at runtime, e.g., when memory or disk space is limited.
#Some very limited instruction sets leave no option but to use self-modifying code to achieve certain functionality. For example, a "One Instruction Set Computer" machine that uses only the subtract-and-branch-if-negative "instruction" cannot do an indirect copy (something like the equivalent of "*a = **b" in the C programming language) without using self-modifying code.
#Altering instructions for fault tolerance.
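
As an illustration of the second usage, the following hedged sketch (x86-64, Linux, and assuming the operating system still permits a mapping that is both writable and executable) generates a specialized multiply-by-k function for a k known only at run time:

 #define _GNU_SOURCE
 #include <stdint.h>
 #include <stdio.h>
 #include <string.h>
 #include <sys/mman.h>
 
 typedef int (*fn_t)(int);
 
 /* Emit "mov eax, edi; imul eax, eax, k; ret" into buf and return it
    as a callable function: a multiply-by-k routine built at run time. */
 static fn_t make_multiplier(uint8_t *buf, int32_t k)
 {
     uint8_t code[] = {
         0x89, 0xF8,                  /* mov  eax, edi        */
         0x69, 0xC0, 0, 0, 0, 0,      /* imul eax, eax, imm32 */
         0xC3                         /* ret                  */
     };
     memcpy(&code[4], &k, sizeof k);  /* splice in the constant */
     memcpy(buf, code, sizeof code);
     return (fn_t)buf;
 }
 
 int main(void)
 {
     uint8_t *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
     if (buf == MAP_FAILED)
         return 1;
     fn_t times7 = make_multiplier(buf, 7);
     printf("%d\n", times7(6));       /* prints 42 */
     return 0;
 }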

The second and third types are probably the kinds most often used in high-level languages as well, such as Lisp.

Optimizing a state-dependent loop

Pseudocode example:

 repeat N times {
    if STATE is 1
       increase A by one
    else
       decrease A by one
    do something with A
 }

Self-modifying code in this case would simply be a matter of rewriting the loop like this:

 repeat N times {
    increase A by one
    do something with A
 }
 when STATE has to switch {
    replace the opcode "increase" above with the opcode to decrease
 }

Note that two-state replacement of the opcode can be written simply as 'xor the variable at the address with the value "opcodeOf(inc) xor opcodeOf(dec)"'.

Whether this solution pays off depends, of course, on the value of 'N' and on how frequently the state changes.
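
A hedged sketch of the technique for x86-64 Linux follows: 'inc eax' encodes as FF C0 and 'dec eax' as FF C8, so a single xor of the second byte toggles the routine between the two states. It assumes the OS grants a writable-and-executable mapping; on x86 no explicit cache flush is needed, as the hardware snoops writes to code.

 #define _GNU_SOURCE
 #include <stdint.h>
 #include <stdio.h>
 #include <string.h>
 #include <sys/mman.h>
 
 typedef int (*step_t)(int);
 
 /* "mov eax, edi; inc eax; ret" -- the loop body in its "increase" state. */
 static const uint8_t body[] = { 0x89, 0xF8, 0xFF, 0xC0, 0xC3 };
 
 int main(void)
 {
     uint8_t *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
     if (p == MAP_FAILED)
         return 1;
     memcpy(p, body, sizeof body);
     step_t step = (step_t)p;
     printf("%d\n", step(41));   /* 42: the "increase" state        */
     p[3] ^= 0xC0 ^ 0xC8;        /* opcodeOf(inc) xor opcodeOf(dec) */
     printf("%d\n", step(41));   /* 40: now the "decrease" state    */
     return 0;
 }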

Attitudes

Self-modifying code can either be seen as a feature like any other (or even as just "delayed code-editing"), or as a bad practice which makes code harder to read and maintain.

In the early days of computers, self-modifying code was often used to reduce memory usage, since memory was extremely limited, and this was not regarded as a problem. It was also used to implement subroutine calls and returns when the instruction set provided only simple branching or skipping instructions to vary the control flow. This application is still relevant in certain ultra-RISC architectures, at least theoretically; see for example one instruction set computer. Donald Knuth's MIX architecture also used self-modifying code to implement subroutine calls.

Even now, critical systems that are too complex for people to fully manage in real time, such as the Internet and electrical distribution networks, routinely rely upon self-modifying behaviors (though not necessarily self-modifying code) in order to function acceptably.

Use as camouflage

Self-modifying code was used to hide copy-protection instructions in 1980s disk-based programs for platforms such as the IBM PC and Apple II. For example, on an IBM PC (or compatible), the floppy disk drive access instruction 'int 0x13' would not appear in the executable program's image; instead, it would be written into the executable's memory image after the program started executing.

Self-modifying code is also sometimes used by programs that do not want to reveal their presence — such as computer viruses and some shellcodes. Viruses and shellcodes that use self-modifying code mostly do this in combination with polymorphic code. Polymorphic viruses are sometimes called primitive self-mutators. Modifying a piece of running code is also used in certain attacks, such as buffer overflows.

Self-referential machine learning systems

Traditional machine learning systems have a fixed, pre-programmed learning algorithm to adjust their parameters. Since the 1980s, however, Jürgen Schmidhuber has published several self-modifying systems with the ability to change their own learning algorithm. They avoid the danger of catastrophic self-rewrites by ensuring that self-modifications survive only if they are useful according to a user-given fitness, error, or reward function.

Operating systems

Because of the security implications of self-modifying code, all of the major operating systems are careful to remove such vulnerabilities as they become known. The concern is typically not that programs will intentionally modify themselves, but that they could be maliciously changed by an exploit.

As a consequence of the troubles that such exploits can cause, an OS feature called W^X (for "write xor execute") has been developed; it prohibits a program from making any page of memory both writable and executable. Some systems prevent a writable page from ever being made executable, even after write permission is removed. Other systems provide a back door of sorts, allowing multiple mappings of a page of memory to have different permissions. A relatively portable way to bypass W^X is to create a file with all permissions, then map the file into memory twice. On Linux, one may use an undocumented SysV shared memory flag to get executable shared memory without needing to create a file. On Windows XP and Windows Vista the W^X protection is named Data Execution Prevention and can be disabled via the Control Panel.
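
A hedged sketch of the double-mapping technique on x86-64 Linux, using memfd_create (available in glibc 2.27 and later) in place of creating an on-disk file: the same pages are mapped twice, writable at one address and executable at the other, so no single mapping is ever both W and X.

 #define _GNU_SOURCE
 #include <stdint.h>
 #include <stdio.h>
 #include <string.h>
 #include <unistd.h>
 #include <sys/mman.h>
 
 typedef int (*fn_t)(void);
 
 int main(void)
 {
     /* mov eax, 42 ; ret */
     static const uint8_t code[] = { 0xB8, 42, 0, 0, 0, 0xC3 };
 
     int fd = memfd_create("jit", 0);          /* anonymous in-memory file */
     if (fd < 0 || ftruncate(fd, 4096) != 0)
         return 1;
 
     /* Two views of the same pages: one writable, one executable. */
     uint8_t *w = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
     uint8_t *x = mmap(NULL, 4096, PROT_READ | PROT_EXEC,  MAP_SHARED, fd, 0);
     if (w == MAP_FAILED || x == MAP_FAILED)
         return 1;
 
     memcpy(w, code, sizeof code);             /* write through the W view   */
     printf("%d\n", ((fn_t)x)());              /* run through the X view: 42 */
     return 0;
 }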

Regardless, at a meta-level, programs can still modify their own behavior by changing data stored elsewhere (see Metaprogramming) or via use of polymorphism.

Just-in-time compilers

Just-in-time compilers for Java, .NET, ActionScript 3.0 and other programming languages compile blocks of byte-code or p-code into machine code suitable for the host processor and then immediately execute them. Fabrice Bellard's Tiny C Compiler can be, and has been, used as a C just-in-time compiler library, e.g. by TCCBOOT (a boot loader that can compile, load and run its operating system on the fly).
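
A sketch of using libtcc, the Tiny C Compiler's library interface, to compile and run C code in memory. Note that the exact signature of tcc_relocate varies between TCC releases (older ones take TCC_RELOCATE_AUTO as a second argument, newer ones take none), so this follows the older convention:

 #include <stdio.h>
 #include <libtcc.h>   /* link with -ltcc */
 
 int main(void)
 {
     TCCState *s = tcc_new();
     if (s == NULL)
         return 1;
     tcc_set_output_type(s, TCC_OUTPUT_MEMORY);   /* compile to memory */
     tcc_compile_string(s, "int add(int a, int b) { return a + b; }");
     tcc_relocate(s, TCC_RELOCATE_AUTO);          /* lay out the code  */
     int (*add)(int, int) =
         (int (*)(int, int))tcc_get_symbol(s, "add");
     if (add != NULL)
         printf("%d\n", add(40, 2));              /* prints 42 */
     tcc_delete(s);
     return 0;
 }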

Graphics drivers for modern GPUs perform JIT compilation of DirectX or OpenGL/GLSL geometry and fragment shaders, and can thus be seen as self-modifying code, sometimes distributed over multiple processors and DSPs (or even self-modifying hardware).

Some CPU architecture emulators use techniques similar to those of JIT compilers, treating the simulated instruction set as the "source language" that is compiled for the target processor.

Interaction of cache and self-modifying code

On architectures where the data and instruction caches are not coherent (some ARM and MIPS cores), cache synchronization must be explicitly performed by the modifying code: flush the data cache and invalidate the instruction cache for the modified memory area.
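
With GCC or Clang, this sequence can be sketched via a compiler builtin (the install_code wrapper below is hypothetical):

 #include <string.h>
 
 /* Copy new instructions into place, then synchronize the caches.
    __builtin___clear_cache is provided by GCC and Clang; on x86 it is
    typically a no-op, while on split-cache ARM and MIPS cores it
    expands to the required flush/invalidate sequence. */
 void install_code(char *dst, const char *src, size_t n)
 {
     memcpy(dst, src, n);                    /* modify the code      */
     __builtin___clear_cache(dst, dst + n);  /* sync D- and I-caches */
 }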

In some cases, short sections of self-modifying code execute more slowly on modern processors. This is because a modern processor usually tries to keep blocks of code in its cache memory. Each time the program rewrites a part of itself, the rewritten part must be loaded into the cache again, which results in a slight delay if the modified codelet shares a cache line with the modifying code, as is the case when the modified memory address is located within a few bytes of the modifying code's own address.

The cache invalidation issue on modern processors usually means that self-modifying code is still faster only when the modification occurs rarely, such as a state switch inside an inner loop.

Most modern processors load machine code before they execute it, which means that if an instruction too near the instruction pointer is modified, the processor may not notice and instead execute the code as it was before it was modified; see prefetch input queue (PIQ). PC processors have to handle self-modifying code correctly for backwards-compatibility reasons, but they are far from efficient at doing so.

Henry Massalin's Synthesis kernel

The Synthesis kernel, written by Dr. Henry Massalin as his Ph.D. thesis, is commonly viewed as the "mother of all self-modifying code". Massalin's tiny Unix kernel takes a structured, or even object-oriented, approach to self-modifying code, where code is created for individual quajects, like file handles; generating code for specific tasks allows the Synthesis kernel to (as a JIT compiler might) apply a number of optimizations such as constant folding or common subexpression elimination.

The Synthesis kernel was extremely fast, but was written entirely in assembly. The resulting lack of portability has prevented Massalin's optimization ideas from being adopted by any production kernel. However, the structure of the techniques suggests that they could be captured by a higher level language, albeit one more complex than existing mid-level languages. Such a language and compiler could allow development of extremely fast operating systems and applications.

Paul Haeberli and Bruce Karsh have objected to the "marginalization" of self-modifying code, and optimization in general, in favor of reduced development costs, drawing a parallel to the "heavy religious atmosphere" which the Italian Futurist movement rebelled against.

See also

*Reflection (computer science)
*Self-replication
*Quine (computing)

External links

* [http://asm.sourceforge.net/articles/smc.html Using self-modifying code under Linux]
* [http://public.carnet.hr/~jbrecak/sm.html Self-modifying C code]
* [http://www.cs.columbia.edu/~library/TR-repository/reports/reports-1992/cucs-039-92.ps.gz "Synthesis: An Efficient Implementation of Fundamental Operating System Services"] : Henry Massalin's Ph.D. thesis on the Synthesis kernel
* [http://www.graficaobscura.com/future/index.html Futurist Programming]
* [http://flint.cs.yale.edu/flint/publications/smc.html Certified Self-Modifying Code]
*Jürgen Schmidhuber's publications on [http://www.idsia.ch/~juergen/metalearner.html self-modifying code for self-referential machine learning systems]
* [http://www.pcosmos.ca/pcastl/ PCASTL: the Parent and Childset Accessible Syntax Tree Language]

