Ctype.h

The header ctype.h in the ANSI C Standard Library for the C programming language contains declarations for character classification functions.

History

Early toolsmiths writing in C under Unix began developing idioms at a rapid rate to classify characters into different types. For example, in the ASCII character set, the following test identifies a letter:

if ('A' <= c && c <= 'Z' || 'a' <= c && c <= 'z')

However, this idiom does not work for other character sets such as EBCDIC, in which the letters are not encoded contiguously.

Pretty soon, programs became thick with tests such as the one above, or worse, tests almost like the one above. A programmer can write the same idiom several different ways, which slows comprehension and increases the chance for errors.

Before long, the idioms were replaced by the functions in <ctype.h>.

Implementation

Unlike the above example, the character classification routines are not written as comparison tests. In most C libraries, they are implemented as lookups in a static table, usually wrapped in a macro or a function.

For example, an array of 256 eight-bit integers is created, each treated as a set of bit flags, where each bit corresponds to a particular property of the character, such as the one tested by isdigit or isalpha. If the lowest-order bit of each integer corresponds to the isdigit property, the code could be written as follows:

#define isdigit(x) (TABLE[(x)] & 1)
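The table-based approach can be sketched in a little more detail. The following is a minimal illustration, not the code of any particular C library: the names ctype_table, ctype_table_init, CT_DIGIT, CT_ALPHA, MY_ISDIGIT and MY_ISALPHA are hypothetical, the bit layout is an assumption, and, unlike a real implementation, the sketch makes no provision for EOF.

```c
#include <limits.h>

/* Hypothetical property bits; real libraries choose their own layout. */
enum { CT_DIGIT = 1 << 0, CT_ALPHA = 1 << 1 };

/* One entry per unsigned char value; static storage starts out zeroed. */
static unsigned char ctype_table[UCHAR_MAX + 1];

/* Fill the table once, before the macros below are used. */
static void ctype_table_init(void)
{
    const char *letters =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *p;
    int c;

    /* The C standard guarantees that '0'..'9' are contiguous. */
    for (c = '0'; c <= '9'; c++)
        ctype_table[c] |= CT_DIGIT;

    /* Letters need not be contiguous (e.g. EBCDIC), so list them. */
    for (p = letters; *p != '\0'; p++)
        ctype_table[(unsigned char)*p] |= CT_ALPHA;
}

/* Each macro mentions its argument exactly once. */
#define MY_ISDIGIT(c) (ctype_table[(unsigned char)(c)] & CT_DIGIT)
#define MY_ISALPHA(c) (ctype_table[(unsigned char)(c)] & CT_ALPHA)
```

After ctype_table_init() has run, MY_ISDIGIT('7') yields a nonzero value and MY_ISDIGIT('x') yields zero. Because the argument appears only once in each expansion, an expression such as MY_ISDIGIT(*p++) advances p a single time.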

Early versions of Linux used a potentially faulty method similar to the first code sample:

#define isdigit(x) ((x) >= '0' && (x) <= '9')

This can cause problems if x has a side effect, for instance if one calls isdigit(x++) or isdigit(run_some_program()). It would not be immediately evident that the argument to isdigit is being evaluated twice. For this reason, the table-based approach is generally used.
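The double evaluation can be made visible with a small experiment. In this sketch the macro is renamed RANGE_ISDIGIT so that it does not clash with the library's isdigit, and next_char, calls and count_evaluations are hypothetical names introduced only for the illustration.

```c
/* Comparison-style macro: the argument appears twice in the expansion. */
#define RANGE_ISDIGIT(x) ((x) >= '0' && (x) <= '9')

static int calls;   /* counts how often next_char() runs */

/* Hypothetical helper with a side effect: it bumps the counter. */
static int next_char(void)
{
    calls++;
    return '5';
}

/* Returns how many times next_char() ran for one use of the macro. */
static int count_evaluations(void)
{
    calls = 0;
    (void)RANGE_ISDIGIT(next_char());
    return calls;
}
```

Here count_evaluations() returns 2: the first comparison succeeds for '5', so the second operand of && is evaluated as well and next_char() is called again. If the first comparison failed, short-circuit evaluation would skip the second call, which makes the behavior depend on the data and harder still to spot.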

The difference between these two methods became a point of interest during the SCO v. IBM case.

The contents of <ctype.h>

The header <ctype.h> contains prototypes for a dozen character classification functions. All of these functions except isdigit and isxdigit are locale-specific; their behavior may change if the locale changes.
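As a brief illustration of how these functions are typically combined, the sketch below sorts a character into one of four categories. The enum char_kind and the function classify are hypothetical names, and the results described assume the default "C" locale.

```c
#include <ctype.h>

/* Hypothetical categories used only for this illustration. */
enum char_kind { KIND_DIGIT, KIND_LETTER, KIND_SPACE, KIND_OTHER };

/* c must be representable as an unsigned char or be equal to EOF. */
static enum char_kind classify(int c)
{
    if (isdigit(c))
        return KIND_DIGIT;
    if (isalpha(c))
        return KIND_LETTER;
    if (isspace(c))
        return KIND_SPACE;
    return KIND_OTHER;
}
```

In the "C" locale, classify('7') gives KIND_DIGIT, classify('q') gives KIND_LETTER, classify(' ') gives KIND_SPACE and classify('+') gives KIND_OTHER. Under another locale, the isalpha and isspace results may differ for characters outside the basic set.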

The Single Unix Specification Version 3 adds functions similar to the above.

Incorrect usage

The standard states (§7.4-1): "In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined."

Unfortunately, many programmers forget that the char type may be either signed or unsigned, depending on the implementation. If char is signed, the implicit conversion from char to int may produce negative values, resulting in undefined behavior. In practice, if the argument is used as an index into a lookup table, it will access memory outside the table and may even crash the program.

The correct way to use char arguments is to first cast them to unsigned char.
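A minimal sketch of the cast in practice follows. The function count_letters is a hypothetical name chosen for the example.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts the alphabetic characters in a NUL-terminated string. */
static size_t count_letters(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        /* Cast first: *s may be negative where plain char is signed. */
        if (isalpha((unsigned char)*s))
            n++;
    }
    return n;
}
```

Writing isalpha(*s) without the cast would pass a negative int for bytes such as 0xFF on implementations where char is signed, which is exactly the undefined case quoted above. With the cast, every argument lies in the range of unsigned char.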

The int-type values returned by getchar, getc, and fgetc are guaranteed to be in the range of unsigned char (or EOF), and thus no cast is needed in these cases.

External links

* C++ Reference: http://www.cppreference.com/


Wikimedia Foundation. 2010.
