Digraphs and trigraphs
In computer programming, digraphs and trigraphs are sequences of two and three characters respectively which are interpreted as one character by the programming language.
Various reasons exist for using digraphs and trigraphs: keyboards may not have keys to cover the entire character set of the language, input of special characters may be difficult, text editors may reserve some characters for special use and so on. Trigraphs might also be used for some EBCDIC code pages that lack characters such as { and }.
History
The basic character set of the C programming language is a subset of the ASCII character set that includes nine characters which lie outside the ISO 646 invariant character set. This can pose a problem for writing source code when the keyboard being used does not support any of these nine characters. The ANSI C committee invented trigraphs as a way of entering source code using keyboards that support any version of the ISO 646 character set.
Implementations
Trigraphs are not commonly encountered outside compiler test suites.[1] Some compilers support an option to turn recognition of trigraphs off, or disable trigraphs by default and require an option to turn them on. Some can issue warnings when they encounter trigraphs in source files. Borland supplied a separate program, the trigraph preprocessor, to be used only when trigraph processing is desired (the rationale was to maximise speed of compilation).
- C programming language supports digraphs in ISO C 94 mode of compiling.
- Pascal programming language supports digraphs (., .), (* and *) for [, ], { and } respectively.
- Vim text editor uses digraphs.
- GNU Screen has a digraph command, bound to ^A ^V by default.
- The J programming language uses dot and colon characters to extend the meaning of the basic characters available.
- Mobile phones have introduced digraph (and multi-graph) sequences into the English lexicon via Short message service messaging. Emoticons, such as :) appear in modern English language dictionaries.
Language support
Different systems have different sets of defined trigraphs:
C
The C preprocessor replaces all occurrences of the following nine trigraph sequences by their single-character equivalents before any other processing.
A programmer may want to place two question marks together yet not have the compiler treat them as introducing a trigraph. The C grammar does not permit two consecutive ? tokens, so the only places in a C file where two question marks in a row may be used are in multi-character constants, string literals, and comments. To safely place two consecutive question marks within a string literal, the programmer can use string concatenation "...?""?..." or an escape sequence "...?\?...".
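The two workarounds can be sketched as follows. This is a minimal illustration, not taken from any particular code base; the function names `by_concatenation` and `by_escape` are hypothetical. Both functions return the same nine characters whether or not the compiler has trigraph replacement enabled, whereas writing the two question marks side by side before the exclamation mark would yield `|` on a compiler that honours trigraphs.

```c
/* Two ways to spell a string whose text ends in question mark,
   question mark, exclamation mark, without writing a trigraph. */

/* Adjacent string literals are concatenated long after trigraph
   replacement has run, so the two question marks never sit side by
   side in the source text. */
const char *by_concatenation(void)
{
    return "really?" "?!";
}

/* The escape sequence backslash + question mark denotes a plain
   question mark and also breaks up the pair in the source text. */
const char *by_escape(void)
{
    return "really?\?!";
}
```

The concatenation form is often easier to spot in review, while the escape form keeps the literal in one piece; which one to prefer is a matter of taste.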
Trigraph | Equivalent |
---|---|
??= | # |
??/ | \ |
??' | ^ |
??( | [ |
??) | ] |
??! | \| |
??< | { |
??> | } |
??- | ~ |
??? is not itself a trigraph sequence.
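The replacement step can be approximated in a few lines of C. This is a sketch of the idea only, not how any particular compiler implements it; the names `trigraph_equivalent` and `replace_trigraphs` are hypothetical. It scans left to right, so three question marks followed by `=` become one question mark followed by `#`, consistent with the rule that three question marks are not a trigraph.

```c
#include <assert.h>
#include <stddef.h>

/* Map the third character of a candidate trigraph to its replacement,
   or return 0 when the three characters do not form a trigraph. */
static char trigraph_equivalent(char third)
{
    switch (third) {
    case '=':  return '#';
    case '/':  return '\\';
    case '\'': return '^';
    case '(':  return '[';
    case ')':  return ']';
    case '!':  return '|';
    case '<':  return '{';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Replace every trigraph in the NUL-terminated string s in place and
   return the new length. The result is never longer than the input. */
size_t replace_trigraphs(char *s)
{
    assert(s != NULL);
    size_t in = 0, out = 0;
    while (s[in] != '\0') {
        char repl = 0;
        /* s[in + 2] is only read when s[in + 1] is a question mark,
           so the scan never runs past the terminating NUL. */
        if (s[in] == '?' && s[in + 1] == '?')
            repl = trigraph_equivalent(s[in + 2]);
        if (repl != 0) {
            s[out++] = repl;   /* three input characters become one */
            in += 3;
        } else {
            s[out++] = s[in++];
        }
    }
    s[out] = '\0';
    return out;
}
```

When calling this from C source, the input literals themselves are best written with the `\?` escape so that they survive a compiler that has trigraphs enabled.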
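The ??/ trigraph can be used to introduce an escaped newline for line splicing; this must be taken into account for correct and efficient handling of trigraphs within the preprocessor. It can also cause surprises, particularly within comments. For example:

    // Will the next line be executed????????????????/
    a++;

Here the ??/ at the end of the first line becomes a backslash, which splices the second line onto the first. The result is a single logical comment line (the // comment form is used in C++ and C99), so a++; is never executed. Similarly,

    /??/
    * A comment *??/
    /

is a correctly formed block comment: after trigraph replacement and line splicing it reads /* A comment */.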
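The interaction between the backslash trigraph and line splicing can be simulated on a string. The sketch below is an illustration under simplifying assumptions: the function name `replace_and_splice` is hypothetical, it recognises only the backslash trigraph rather than all nine, and it folds the two steps (trigraph replacement, then removal of backslash-newline pairs) into a single pass, which gives the same result because no trigraph produces a newline.

```c
#include <assert.h>
#include <stddef.h>

/* Apply replacement of the backslash trigraph, followed by line
   splicing, to the NUL-terminated string s in place.
   Returns the new length. */
size_t replace_and_splice(char *s)
{
    assert(s != NULL);
    size_t in = 0, out = 0;
    while (s[in] != '\0') {
        char c = s[in];
        size_t used = 1;
        /* Only the backslash trigraph matters for splicing, so this
           sketch recognises that one and leaves the others alone. */
        if (c == '?' && s[in + 1] == '?' && s[in + 2] == '/') {
            c = '\\';
            used = 3;
        }
        in += used;
        if (c == '\\' && s[in] == '\n') {
            in++;              /* drop the backslash-newline pair */
            continue;
        }
        s[out++] = c;
    }
    s[out] = '\0';
    return out;
}
```

Feeding it the two comment examples shows that the newline after the first comment disappears, leaving the increment inside the comment, and that the three-line block comment collapses to an ordinary one.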
In 1994 a normative amendment to the C standard, included in C99, supplied digraphs as more readable alternatives to five of the trigraphs. They are:
Digraph | Equivalent |
---|---|
<: | [ |
:> | ] |
<% | { |
%> | } |
%: | # |
%:%: | ## |
Unlike trigraphs, digraphs are handled during tokenization, and a digraph must always represent a full token by itself. If a digraph sequence occurs inside another token, for example a quoted string or a character constant, it will not be replaced.
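A short example may make the token-level behaviour concrete. It assumes a compiler that accepts the 1994 amendment or later (C99 and newer do); the function names `digraph_sum` and `digraph_text` are hypothetical. The first function is written with digraphs in place of brackets and braces, while the second returns a string literal containing the characters of a digraph, which stay as they are because they sit inside another token.

```c
#include <stddef.h>

/* Sum three values. The brackets and braces are spelled with digraphs,
   which the compiler treats exactly like [ ] { }. */
int digraph_sum(void)
<%
    int values<:3:> = <% 1, 2, 3 %>;
    int total = 0;
    for (size_t i = 0; i < 3; i++) <%
        total += values<:i:>;
    %>
    return total;
%>

/* Inside a string literal the same two characters are ordinary text:
   the returned string is '<' followed by ':'. */
const char *digraph_text(void)
{
    return "<:";
}
```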
Notes
- ↑ "The New C Standard: An Economic and Cultural Commentary" by Derek M. Jones, sentence 117
References
- RFC 1345