C string
In computing, a C string is a character sequence stored as a one-dimensional character array and terminated with a null character ('\0', called NUL in ASCII). The name refers to the ubiquitous C programming language which uses this string representation. Alternative names are ASCIIZ and null-terminated string.
The length of a C string is found by searching for the first NUL byte. This takes O(n) (linear) time with respect to the string length, and it also means that a string cannot contain a NUL byte. At the time C (and the languages from which it was derived) was developed, memory was extremely limited, and keeping the overhead of marking a string's extent to a single byte was imperative. The only popular alternative (often called a "Pascal string") used a leading byte to store the length. This allowed the string to contain NUL and made finding the length take O(1) (constant) time, but it limited the length to 255 bytes. This length limitation proved to be far more restrictive than the limitations of C strings.
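The difference between the two length lookups can be sketched in a few lines. The function names `cstring_length` and `pascal_length` are illustrative only and are not part of any library; the first behaves like the standard `strlen`.

```c
#include <stddef.h>

/* Count the bytes before the first NUL. The loop visits every
   character once, so the cost grows linearly with the length. */
size_t cstring_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}

/* A "Pascal string" keeps its length in byte 0, so the lookup is a
   single read, but one byte can only express lengths up to 255. */
size_t pascal_length(const unsigned char *p)
{
    return p[0];
}
```

Note that `cstring_length("ab\0cd")` stops at the embedded NUL and reports 2, which is the "cannot contain NUL" restriction in practice.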
On modern systems memory usage is less of a concern, and a larger field can be used for the length (where there are vast numbers of short strings, a hash table can be used to save memory instead). Most replacements for C strings (such as the C++ std::string container and the Qt string) store the length in 32 bits or more. Thus NUL bytes can be placed in the string and finding the length is O(1).
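A string type that carries its own length might look roughly like the following. This is a minimal sketch, not the layout of std::string or of the Qt string; the type `counted_string` and both functions are hypothetical names chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical counted string: the length travels with the data. */
struct counted_string {
    size_t length;     /* number of bytes in data, NUL bytes included */
    const char *data;  /* need not be NUL-terminated */
};

/* Wrap n bytes starting at bytes; nothing is copied. */
struct counted_string counted_from_bytes(const char *bytes, size_t n)
{
    struct counted_string s = { n, bytes };
    return s;
}

/* O(1): the length is read from the field, not searched for. */
size_t counted_length(struct counted_string s)
{
    return s.length;
}
```

With this layout, `counted_from_bytes("ab\0cd", 5)` reports a length of 5, while `strlen` on the same bytes would stop at 2.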
Making a "copy" of a C string with any number of bytes removed from the start can be done by just moving the pointer, an O(1) (constant-time) operation that is far faster than in string representations that must copy the remaining bytes. Many pieces of software have taken advantage of this, making it difficult to change them to a new string style without serious speed impact.
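The pointer trick amounts to a single addition. In this sketch the helper name `skip_prefix` is made up for illustration; the result shares storage (and the terminating NUL) with the original string, so it is a view rather than an independent copy.

```c
#include <stddef.h>

/* Return the string that remains after skipping the first n bytes of s.
   No bytes are copied. The caller must ensure n does not exceed the
   length of s, otherwise the result points past the terminator. */
const char *skip_prefix(const char *s, size_t n)
{
    return s + n;
}
```

For example, `skip_prefix("/usr/bin", 5)` yields a pointer to `"bin"` inside the original array. Modifying or freeing the original also affects the "copy".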
The NUL termination has historically created security problems. A bug or malicious program can insert a NUL into the middle of a string, truncating it unexpectedly. A common bug was to not write the NUL at the end of a string (often not detected because there was a NUL already there), which could leak internal program data that happened to follow the string in memory. Due to the expense of finding the length, many programs did not bother to check it before copying the string to a fixed-size buffer, causing a buffer overflow.
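The overflow described above is avoided by measuring the source before copying. The following is one possible sketch; `checked_copy` is a hypothetical helper, not a standard function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst, whose capacity is dst_size bytes including room
   for the terminator. Returns false and leaves dst untouched when
   src does not fit. */
bool checked_copy(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);      /* the O(n) scan that was often skipped */
    if (len >= dst_size) {
        return false;              /* copying would overflow dst */
    }
    memcpy(dst, src, len + 1);     /* +1 also copies the terminating NUL */
    return true;
}
```

The truncation problem is just as easy to reproduce: `strlen("abc\0def")` is 3, so any code that relies on the terminator never sees `def`.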
Many attempts have been made to make C string handling less error prone. These range from adding safer and more useful functions such as strdup and strlcpy, to entire wrappers to treat the string as an opaque object, such as the MFC CString class which internally represents the string as a C string, but does not require the programmer to handle memory allocation issues.
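Because strlcpy is not part of the C standard and is missing from some C libraries, the idea behind it is shown here with a stand-in. The function `copy_truncating` is a hypothetical name; it is written to mirror the documented strlcpy behaviour (always terminate, return the source length), but it is a sketch rather than the reference implementation.

```c
#include <stddef.h>
#include <string.h>

/* Copy as much of src as fits into dst (capacity dst_size bytes) and
   always NUL-terminate when dst_size > 0. Returns strlen(src), so a
   return value >= dst_size tells the caller the copy was truncated. */
size_t copy_truncating(char *dst, size_t dst_size, const char *src)
{
    size_t src_len = strlen(src);
    if (dst_size > 0) {
        size_t n = src_len < dst_size - 1 ? src_len : dst_size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

Copying `"abcdef"` into a 4-byte buffer leaves `"abc"` in the buffer and returns 6, which signals the truncation.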
C String header
The C standard library header string.h (the <cstring> header in C++) declares the functions used to work with C strings. Confusion and programming errors arise when strings are treated as simple data types. Specific functions have to be employed for assignment and comparison, such as strcpy for assignment instead of the standard = and strcmp or strncmp instead of == for comparison.
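The reason the operators cannot be used is that, applied to C strings, they act on pointers rather than on characters. A small sketch, with the illustrative helper names `same_text` and `same_pointer`:

```c
#include <stdbool.h>
#include <string.h>

/* True when both strings contain the same characters. */
bool same_text(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* True only when both pointers refer to the same storage,
   which is all that == checks for C strings. */
bool same_pointer(const char *a, const char *b)
{
    return a == b;
}
```

Two separate arrays that both hold `"cat"` satisfy `same_text` but not `same_pointer`. Likewise, assigning one `char *` to another copies only the address; `strcpy` is needed to copy the characters into a second buffer.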
Operation | Function | Description |
---|---|---|
Copying | memcpy | Copy block of memory |
Copying | memmove | Move block of memory (regions may overlap) |
Copying | strcpy | Copy string |
Copying | strncpy | Copy up to n characters from string |
Concatenation | strcat | Concatenate strings |
Concatenation | strncat | Append up to n characters from string |
Comparison | memcmp | Compare two blocks of memory |
Comparison | strcmp | Compare two strings |
Comparison | strcoll | Compare two strings using locale |
Comparison | strncmp | Compare first n characters of two strings |
Comparison | strxfrm | Transform string using locale |
Searching | memchr | Locate character in block of memory |
Searching | strchr | Locate first occurrence of character in string |
Searching | strcspn | Get length of initial span containing no character from a set |
Searching | strpbrk | Locate first occurrence in string of any character from a set |
Searching | strrchr | Locate last occurrence of character in string |
Searching | strspn | Get length of initial span containing only characters from a set |
Searching | strstr | Locate substring |
Searching | strtok | Split string into tokens |
Other | memset | Fill block of memory |
Other | strerror | Get pointer to error message string |
Other | strlen | Get string length |
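As a brief illustration of the searching functions, the sketch below uses strchr and strtok. The helper names `value_part` and `count_tokens` are hypothetical and exist only for this example.

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer to the text after the first '=' in entry,
   or NULL when entry contains no '='. */
const char *value_part(const char *entry)
{
    const char *eq = strchr(entry, '=');
    return eq ? eq + 1 : NULL;
}

/* Count the tokens in text that are separated by any character in
   delims. strtok overwrites delimiters with NUL bytes, so text must
   be a writable array and is modified by the call. */
size_t count_tokens(char *text, const char *delims)
{
    size_t count = 0;
    for (char *tok = strtok(text, delims); tok != NULL;
         tok = strtok(NULL, delims)) {
        count++;
    }
    return count;
}
```

Because strtok keeps hidden state between calls and modifies its argument, it should not be used on string literals or from several threads at once.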
Trivia
C strings are equivalent to the strings created by the .ASCIZ directive of the PDP-11 and VAX macroassembly languages and the ASCIZ directive of the MACRO-10 macro assembly language for the PDP-10.
References
http://www.cplusplus.com/reference/clibrary/cstring/