Welcome to Software Development on Codidact!

Post History

81%
+7 −0
Q&A Common string handling pitfalls in C programming

The reader is assumed to understand how arrays and pointers work in C. You cannot understand pointers before you understand arrays, and you cannot understand strings before you understand pointer...

posted 3y ago by Lundin · edited 1mo ago by Karl Knechtel

Answer
#3: Post edited by Karl Knechtel · 2024-11-13T20:39:43Z (about 1 month ago)
Format with details blocks for an at-a-glance view of the combined "FAQ".
  • <section class="notice is-warning">
  • *The reader is assumed to understand how arrays and pointers work in C. You cannot understand pointers before you understand arrays, and you cannot understand strings before you understand pointers and arrays both. Decent C books therefore teach arrays, then pointers, then strings, in that order.*
  • Some of the below text was taken from an original article written by me [here](https://stackoverflow.com/a/58526132/584518) on Stack Overflow.
  • </section>
  • ---
  • > **Does C have a string class?**
  • <details><summary>No, and `char` is definitely not such a class.</summary>
  • Therefore the code shown will not compile. You cannot assign a string to a single character, because a single character is what it sounds like, a single letter.
  • In C you have to handle everything manually: allocation, assignment, copies, comparisons. The standard library header `<string.h>` does provide some helpful functions, though.
  • </details>
  • > **What exactly does a string consist of in C?**
  • <details><summary>A C string is a character array that ends with a null terminator.</summary>
  • All characters have a symbol table value. The null terminator is the symbol value `0` (zero). It is used to mark the end of a string. This is necessary since the size of the string isn't stored anywhere. Therefore, every time you allocate room for a string, you must include sufficient space for the null terminator character.
  • The example code does not do this: it only allocates room for the 5 characters of `"hello"`. Correct code should be:
  • char str[6] = "hello";
  • Or equivalently, you can write self-documenting code for 5 characters plus 1 null terminator:
  • char str[5+1] = "hello";
  • But you can also use this and let the compiler do the counting and pick the size:
  • char str[] = "hello"; // Will allocate 6 bytes automatically
  • If you don't append a null terminator at the end of a string, then library functions expecting a string won't work properly and you will get "undefined behavior" bugs such as garbage output or program crashes. That's what happens if you attempt to print the string in Bug 2).
  • The most common way to write a null terminator character in C is by using a so-called "octal escape sequence", looking like this: `'\0'`. This is 100% equivalent to writing `0`, but the `\` serves as self-documenting code to state that the zero is explicitly meant to be a null terminator. Code such as `if(str[i] == '\0')` will check if the specific character is the null terminator.
  • So you can even do the above examples explicitly, character by character:
  • char str[6] = {'h', 'e', 'l', 'l', 'o', '\0'};
  • Please note that the term null terminator has nothing to do with null pointers or the `NULL` macro! This can be confusing - very similar names but very different meanings. This is why the null terminator is sometimes referred to as `NUL` with one L, not to be confused with `NULL` or null pointers.
  • The `"hello"` part of the code is called a _string literal_. This is to be regarded as a read-only string. The `""` syntax means that the compiler will append a null terminator at the end of the string literal automatically. So if you print out `sizeof("hello")` you will get 6, not 5, because you get the size of the array including the null terminator.
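As a minimal sketch of the point above (the helper names `count_chars` and `hello_storage_size` are made up for illustration and are not part of any library), the following contrasts the storage size of a string literal with the number of characters that come before its null terminator:

```c
#include <stddef.h>

/* Hypothetical helper: counts the characters before the null terminator,
   which is what the standard strlen() function does as well. */
size_t count_chars(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') { /* stop at the null terminator */
        n++;
    }
    return n;
}

/* Hypothetical helper: storage size of the literal "hello",
   with the automatically appended null terminator included. */
size_t hello_storage_size(void)
{
    return sizeof("hello");
}
```

With these definitions, `count_chars("hello")` gives 5 while `hello_storage_size()` gives 6. The extra byte is the null terminator.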
  • </details>
  • > **Who is responsible for allocating memory for the string?**
  • <details><summary>You are - the C programmer.</summary>
  • That's why the code crashes: you cannot just store a string at whatever address an uninitialized pointer happens to hold. The pointer needs to point at valid, allocated memory.
  • So you need to allocate an array somewhere, sufficiently large to hold the string, including null termination. You could do this as a local character array as in the above examples, or you can allocate it dynamically, determining the size at run-time.
  • When allocating memory for a string dynamically at run-time, remember to also allocate room for the null terminator:
  • char input[n] = ... ;
  • ...
  • char* str = malloc(strlen(input) + 1);
  • Notably, this array also has to be read/write memory. If we do something like
  • `char* str = "hello"; str[0] = 'a';`
  • then it compiles just fine but typically crashes at run-time. This is because the string literal `"hello"` is a null-terminated character array that the compiler typically stores in read-only memory, and writing to a string literal is undefined behavior.
  • You can use string literals just like any other string, but you can never write to them. Therefore it is strongly recommended to only point at them with a pointer to read-only data:
  • `const char* str = "hello";`.
  • This pointer can however (unlike a pointer to dynamic memory, see Bug 4)) be safely set to point at a different string literal, so when dealing with a lot of string look-ups, an array of pointers to `const char` might be a sensible choice.
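The two ideas above could be sketched roughly as follows. The names `copy_to_heap`, `day_names` and `day_name` are hypothetical, chosen only for this example: the first function makes a writable heap copy with room for the null terminator, and the table shows an array of pointers to `const char` used for look-ups.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a writable heap copy of src,
   or NULL if the allocation fails. The caller must free() the result. */
char *copy_to_heap(const char *src)
{
    size_t size = strlen(src) + 1; /* +1 for the null terminator */
    char *copy = malloc(size);
    if (copy != NULL) {
        memcpy(copy, src, size);   /* copies the terminator too */
    }
    return copy;
}

/* Read-only look-up table: each element points at a string literal. */
static const char *const day_names[] = { "Mon", "Tue", "Wed" };

/* Hypothetical helper: returns the name at index, or "?" if out of range. */
const char *day_name(size_t index)
{
    size_t count = sizeof day_names / sizeof day_names[0];
    return index < count ? day_names[index] : "?";
}
```

The copy returned by `copy_to_heap` may be modified freely, for example `s[0] = 'j';`, whereas the strings reachable through `day_names` must only be read.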
  • </details>
  • > **How can a string get assigned a new value?**
  • <details><summary>Either during initialization or by modifying the pointed-at memory.</summary>
  • The above examples show various ways to create a string by allocating an array or by having a pointer point at a string allocated elsewhere. But if you need to change this string at run-time, you can't just write `str = "new value"`.
  • If `str` in that example is an array, it won't compile, because C does not allow assignment to arrays. If `str` is a pointer, it will work by making `str` point at a string literal, as previously explained.
  • But the pointer then forgets where it previously pointed. If it previously pointed at dynamically allocated memory, as in Bug 4), and no other pointer to that memory remains, we have a memory leak.
  • The normal way to assign a value to a string in run-time is to use the `strcpy` function (which is a perfectly safe function, see [Is strcpy dangerous and what should be used instead?](https://software.codidact.com/posts/281518) ). It works as `strcpy(destination, source)`, where destination must be a valid memory area holding a large-enough character array. For details see [man strcpy](https://man7.org/linux/man-pages/man3/strcpy.3.html).
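A minimal sketch of run-time assignment with `strcpy`, assuming the caller knows the size of the destination array. The wrapper `assign_string` is a hypothetical name, not a standard function; it only illustrates checking that the destination is large enough before copying.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dest only if it fits,
   null terminator included. Returns false and leaves dest
   untouched when the destination is too small. */
bool assign_string(char *dest, size_t dest_size, const char *src)
{
    if (strlen(src) + 1 > dest_size) {
        return false;
    }
    strcpy(dest, src); /* safe here: the size was checked above */
    return true;
}
```

For a local array declared as `char buf[6] = "hello";`, the call `assign_string(buf, sizeof buf, "world")` succeeds, while a longer source such as `"too long"` is rejected instead of overflowing the array.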
  • </details>
  • > **How do you properly compare strings?**
  • <details><summary>Character by character.</summary>
  • Code such as Bug 5) with the `==` equality operator won't work, because it doesn't compare the contents of the strings, just their addresses. So Bug 5) is just comparing the address of a local character array with the address of a string literal, which is nonsense.
  • Instead, the character arrays have to be compared character by character. Note that they can have different lengths too, so one needs to check for the null terminator of either character array while iterating through them.
  • The `strcmp()` function does all this in an efficient manner, so the easiest and most correct solution is just to call that one. It works as `strcmp(first_string, second_string)` and returns a value less than zero, greater than zero, or exactly zero if the first string is considered less than, greater than, or equal to the second string, respectively. `strcmp` compares the characters by their numeric values (as `unsigned char`), so "less than" roughly means alphabetical order, but without any special handling of lower/upper case, digits or punctuation. See [man strcmp](https://man7.org/linux/man-pages/man3/strcmp.3.html) for details.
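To make the comparison concrete, here is a small sketch. Both function names are hypothetical. `strings_equal` simply wraps `strcmp`, and `compare_chars` is a hand-written loop that illustrates the character-by-character idea; it is not how any particular library implements `strcmp`.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if both strings have identical contents. */
bool strings_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Hypothetical illustration of a character-by-character comparison.
   Walks both strings until they differ or the first one ends, then
   returns the difference of the two characters as unsigned char values. */
int compare_chars(const char *a, const char *b)
{
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    return (unsigned char)*a - (unsigned char)*b;
}
```

Given `char local[] = "hello";`, the call `strings_equal(local, "hello")` is true even though `local == "hello"` compares two different addresses. On an ASCII system `compare_chars("Zebra", "apple")` is negative, because upper-case letters have lower values than lower-case ones.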
  • </details>
#2: Post edited by Lundin · 2021-11-12T11:28:51Z (about 3 years ago)
  • _The reader is assumed to understand how arrays and pointers work in C. You cannot understand pointers before you understand arrays, and you cannot understand strings before you understand pointers and arrays both. Decent C books therefore teach arrays, then pointers, then strings, in that order._
  • Some of the below text was taken from an original article written by me [here](https://stackoverflow.com/a/58526132/584518) on Stack Overflow.
  • ---
  • > **Q: Does C have a string class?
  • > A: No it does not and that string class (which C doesn't have) is not `char`.**
  • Therefore bug 1) will not compile. You cannot assign a string to a single character, because a single character is what it sounds like, a single letter.
  • In C you have to handle everything manually: allocation, assignment, copies, comparisons. The standard library header `<string.h>` does provide some helpful functions, though.
  • ---
  • > **Q: What exactly does a string consist of in C?
  • > A: A C string is a character array that ends with a null terminator.**
  • All characters have a symbol table value. The null terminator is the symbol value `0` (zero). It is used to mark the end of a string. This is necessary since the size of the string isn't stored anywhere. Therefore, every time you allocate room for a string, you must include sufficient space for the null terminator character.
  • Bug 2) does not do this, it only allocates room for the 5 characters of `"hello"`. Correct code should be:
  • char str[6] = "hello";
  • Or equivalently, you can write self-documenting code for 5 characters plus 1 null terminator:
  • char str[5+1] = "hello";
  • But you can also use this and let the compiler do the counting and pick the size:
  • char str[] = "hello"; // Will allocate 6 bytes automatically
  • If you don't append a null terminator at the end of a string, then library functions expecting a string won't work properly and you will get "undefined behavior" bugs such as garbage output or program crashes. That's what happens if you attempt to print the string in Bug 2).
  • The most common way to write a null terminator character in C is by using a so-called "octal escape sequence", looking like this: `'\0'`. This is 100% equivalent to writing `0`, but the `\` serves as self-documenting code to state that the zero is explicitly meant to be a null terminator. Code such as `if(str[i] == '\0')` will check if the specific character is the null terminator.
  • So you can even do the above examples explicitly, character by character:
  • char str[6] = {'h', 'e', 'l', 'l', 'o', '\0'};
  • Please note that the term null terminator has nothing to do with null pointers or the `NULL` macro! This can be confusing - very similar names but very different meanings. This is why the null terminator is sometimes referred to as `NUL` with one L, not to be confused with `NULL` or null pointers.
  • The `"hello"` part in Bug 2) is called a _string literal_. This is to be regarded as a read-only string. The `""` syntax means that the compiler will append a null terminator at the end of the string literal automatically. So if you print out `sizeof("hello")` you will get 6, not 5, because you get the size of the array including the null terminator.
  • ---
  • > **Q: Who is responsible for allocating memory for the string?
  • > A: You are - the C programmer.**
  • That's why Bug 3) causes a crash, you cannot just store a string where an uninitialized pointer points at. It needs to point at valid, allocated memory.
  • So you need to allocate an array somewhere, sufficiently large to hold the string, including null termination. You could do this as a local character array as in the above examples, or you can do this by determining the size in run-time.
  • When allocating memory for a string dynamically in run-time, remember to also allocate room for the null terminator:
  • char input[n] = ... ;
  • ...
  • char* str = malloc(strlen(input) + 1);
  • Notably, this array also has to be read/write memory. If we do something like
  • `char* str = "hello"; str[0] = 'a';`
  • then it compiles just fine but typically crashes at run-time. This is because the string literal `"hello"` is a null-terminated character array that the compiler typically stores in read-only memory, and writing to a string literal is undefined behavior.
  • You can use string literals just as strings, but you can never write to them. Therefore it is strongly recommended to only point at them with a pointer to read-only data:
  • `const char* str = "hello";`.
  • This pointer can however (unlike a pointer to dynamic memory, see Bug 4)) be safely set to point at a different string literal, so when dealing with a lot of string look-ups, an array of pointers to `const char` might be a sensible choice.
  • ---
  • > **Q: How can a string get assigned a new value?
  • > A: Either during initialization or through `strcpy()`.**
  • The above examples show various different ways to create a string by allocating an array or by having a pointer point at a string allocated elsewhere. But if you need to change this string in run-time, you can't just write `str = "new value"`.
  • In case `str` in that example is an array, then it won't work because C simply wasn't designed to do assignment to arrays in run-time. In case `str` is a pointer, then it will work by having `str` point at a string literal as previously explained.
  • But it will forget all about where it previously pointed - if it for example previously pointed at dynamically allocated memory like in Bug 4), then we have a memory leak.
  • The normal way to assign a value to a string in run-time is to use the `strcpy` function (which is a perfectly safe function, see [Is strcpy dangerous and what should be used instead?](https://software.codidact.com/posts/281518) ). It works as `strcpy(destination, source)`, where destination must be a valid memory area holding a large-enough character array. For details see [man strcpy](https://man7.org/linux/man-pages/man3/strcpy.3.html).
  • ---
  • > **Q: How do you properly compare strings?
  • > A: By comparing them character by character, usually done with strcmp().**
  • Code such as Bug 5) with the `==` equality operator, won't work because it doesn't compare the contents of the strings, just their addresses. So Bug 5) is just comparing the address of a local character array with the address of a string literal, which is nonsense.
  • Instead, the character arrays have to be compared character by character. Note that they can have different lengths too, so one needs to check for the null terminator of either character array while iterating through them.
  • The `strcmp()` function does all this in an efficient manner, so the easiest and most correct solution is just to call that one. It works as `strcmp(first_string, second_string)` and returns a value less than 0, larger than zero or zero, if the first string is considered less than, more than or equal to the second string. The `strcmp` implementation will likely just compare symbol values of the characters, so "less than" might mean alphabetically, though without care taken of things like lower/upper case, digits or punctuation. See [man strcmp](https://man7.org/linux/man-pages/man3/strcmp.3.html) for details.
#1: Initial revision by Lundin · 2021-11-12T11:22:10Z (about 3 years ago)
_The reader is assumed to understand how arrays and pointers work in C. You cannot understand pointers before you understand arrays, and you cannot understand strings before you understand pointers and arrays both. Decent C books therefore teach arrays, then pointers, then strings, in that order._

Some of the below text was taken from an original article written by me [here](https://stackoverflow.com/a/58526132/584518) on Stack Overflow.

---

> **Q: Does C have a string class?  
> A: No it does not and that string class (which C doesn't have) is not `char`.**

Therefore bug 1) will not compile. You cannot assign a string to a single character, because a single character is what it sounds like, a single letter.

In C you have to handle everything manually: allocation, assignment, copies, comparisons. The standard library header `<string.h>` does provide some helpful functions, though.

---

> **Q: What exactly does a string consist of in C?  
> A: A C string is a character array that ends with a null terminator.**

All characters have a symbol table value. The null terminator is the symbol value `0` (zero). It is used to mark the end of a string. This is necessary since the size of the string isn't stored anywhere. Therefore, every time you allocate room for a string, you must include sufficient space for the null terminator character.

Bug 2) does not do this: it only allocates room for the 5 characters of `"hello"`. Correct code should be:

    char str[6] = "hello";

Or equivalently, you can write self-documenting code for 5 characters plus 1 null terminator: 

    char str[5+1] = "hello";

But you can also use this and let the compiler do the counting and pick the size:

    char str[] = "hello"; // Will allocate 6 bytes automatically

If you don't append a null terminator at the end of a string, then library functions expecting a string won't work properly and you will get "undefined behavior" bugs such as garbage output or program crashes. That's what happens if you attempt to print the string in Bug 2).

The most common way to write a null terminator character in C is by using a so-called "octal escape sequence", looking like this: `'\0'`. This is 100% equivalent to writing `0`, but the `\` serves as self-documenting code to state that the zero is explicitly meant to be a null terminator. Code such as `if(str[i] == '\0')` will check if the specific character is the null terminator.

So you can even do the above examples explicitly, character by character:

    char str[6] = {'h', 'e', 'l', 'l', 'o', '\0'};

Please note that the term null terminator has nothing to do with null pointers or the `NULL` macro! This can be confusing - very similar names but very different meanings. This is why the null terminator is sometimes referred to as `NUL` with one L, not to be confused with `NULL` or null pointers. 

The `"hello"` part in Bug 2) is called a _string literal_. This is to be regarded as a read-only string. The `""` syntax means that the compiler will append a null terminator at the end of the string literal automatically. So if you print out `sizeof("hello")` you will get 6, not 5, because you get the size of the array including the null terminator.
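The difference between the size of the array and the length of the string can be checked directly. The following is only a small sketch; the function names `literal_size` and `literal_length` are made up for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Size in bytes of the array behind the literal, null terminator included. */
size_t literal_size(void)
{
    return sizeof("hello");   /* 5 letters + 1 null terminator */
}

/* Length of the string as seen by library functions. */
size_t literal_length(void)
{
    return strlen("hello");   /* counts characters up to, but not including, '\0' */
}
```

Here `literal_size()` gives 6 while `literal_length()` gives 5, which is why allocation sizes are normally written as `strlen(...) + 1`.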

---

> **Q: Who is responsible for allocating memory for the string?  
> A: You are - the C programmer.**

That's why Bug 3) causes a crash: you cannot just store a string at the location an uninitialized pointer happens to point at. The pointer needs to point at valid, allocated memory.

So you need to allocate an array somewhere, sufficiently large to hold the string, including null termination. You can do this with a local character array as in the above examples, or by determining the size at run-time and allocating dynamically.

When allocating memory for a string dynamically at run-time, remember to also allocate room for the null terminator:

    char input[n] = ... ;
    ...
    char* str = malloc(strlen(input) + 1); // +1 for the null terminator
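As a more complete sketch of the same idea, a helper that makes a dynamically allocated copy of an existing string could look like the following. The name `copy_string` is hypothetical and the error handling is just one possible choice.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of src, or NULL if the allocation fails.
   The caller is responsible for calling free() on the result. */
char* copy_string(const char* src)
{
    size_t size = strlen(src) + 1;   /* +1 for the null terminator */
    char* copy = malloc(size);
    if (copy == NULL)
    {
        return NULL;                 /* out of memory, nothing was allocated */
    }
    strcpy(copy, src);               /* copies the characters and the terminator */
    return copy;
}
```

A call such as `char* str = copy_string(input);` should be followed by a check for `NULL` and, once the string is no longer needed, by `free(str);`.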

Notably, this array also has to be read/write memory. If we do something like  
`char* str = "hello"; str[0] = 'a';` 

then it compiles just fine but invokes undefined behavior, which typically shows up as a crash at run-time. This is because the string literal `"hello"` is a null-terminated character array that the compiler usually stores in read-only memory. 

You can use string literals just as strings, but you can never write to them. Therefore it is strongly recommended to only point at them with a pointer to read-only data:  
`const char* str = "hello";`.  
This pointer can however (unlike a pointer to dynamic memory, see Bug 4)) be safely set to point at a different string literal, so when dealing with a lot of string look-ups, an array of pointers to `const char` might be a sensible choice.
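A look-up table of that kind might be sketched as below. The table contents and the names `weekday_names` and `weekday_name` are invented for the example, not taken from any real code base.

```c
#include <stddef.h>

/* Array of pointers to read-only string literals. */
static const char* const weekday_names[] =
{
    "Monday",
    "Tuesday",
    "Wednesday",
};

/* Returns the name for the given index, or "unknown" if out of range. */
const char* weekday_name(size_t index)
{
    size_t count = sizeof(weekday_names) / sizeof(weekday_names[0]);
    const char* result = "unknown";
    if (index < count)
    {
        result = weekday_names[index];   /* re-pointing only, no data is written */
    }
    return result;
}
```

Because string literals exist for the whole lifetime of the program, returning a pointer to one from a function is fine. The `const` qualifier lets the compiler reject accidental writes such as `weekday_name(0)[0] = 'm';`.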

---

> **Q: How can a string get assigned a new value?  
> A: Either during initialization or through `strcpy()`.**

The above examples show various ways to create a string, either by allocating an array or by having a pointer point at a string allocated elsewhere. But if you need to change this string at run-time, you can't just write `str = "new value"`. 

If `str` in that example is an array, it won't work because C does not support assigning to arrays at run-time. If `str` is a pointer, it will work by making `str` point at a string literal as previously explained. 
But the pointer then forgets where it previously pointed: if it previously pointed at dynamically allocated memory like in Bug 4), we have a memory leak.

The normal way to assign a value to a string in run-time is to use the `strcpy` function (which is a perfectly safe function, see [Is strcpy dangerous and what should be used instead?](https://software.codidact.com/posts/281518) ). It works as `strcpy(destination, source)`, where destination must be a valid memory area holding a large-enough character array. For details see [man strcpy](https://man7.org/linux/man-pages/man3/strcpy.3.html).
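To make the "large-enough" requirement concrete, here is a minimal sketch of a wrapper that checks the destination size before calling `strcpy`. The name `assign_string` and its return convention are assumptions made for this example only.

```c
#include <string.h>

/* Copies src into dest if it fits, terminator included.
   Returns 1 on success, or 0 (leaving dest untouched) if dest is too small. */
int assign_string(char* dest, size_t dest_size, const char* src)
{
    if (strlen(src) + 1 > dest_size)
    {
        return 0;          /* not enough room for the characters plus '\0' */
    }
    strcpy(dest, src);     /* safe here: the size was checked above */
    return 1;
}
```

With `char str[6] = "hello";`, the call `assign_string(str, sizeof(str), "world")` succeeds, whereas passing `"too long"` is rejected instead of writing past the end of the array.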

---

> **Q: How do you properly compare strings?  
> A: By comparing them character by character, usually done with `strcmp()`.**

Code such as Bug 5), which uses the `==` equality operator, won't work because it doesn't compare the contents of the strings, just their addresses. So Bug 5) is just comparing the address of a local character array with the address of a string literal, which is nonsense.

Instead, the character arrays have to be compared character by character. Note that they can have different lengths too, so one needs to check for the null terminator of either character array while iterating through them. 
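Written out by hand, such a comparison could look roughly like this. It is only an illustrative sketch and `strings_equal` is a made-up name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if both null-terminated strings have identical contents. */
bool strings_equal(const char* a, const char* b)
{
    size_t i = 0;
    /* Advance while the characters match and the end of a is not reached.
       If b ends first, its '\0' differs from a[i] and the loop stops too. */
    while (a[i] != '\0' && a[i] == b[i])
    {
        i++;
    }
    /* Equal only if both strings ended at the same position. */
    return a[i] == b[i];
}
```

The loop never reads past either null terminator, so strings of different lengths are handled without any separate length check.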

The `strcmp()` function does all this in an efficient manner, so the easiest and most correct solution is just to call it. It works as `strcmp(first_string, second_string)` and returns a value less than zero, zero, or greater than zero if the first string is considered less than, equal to, or greater than the second string, respectively. The `strcmp` implementation will likely just compare the symbol values of the characters, so "less than" roughly means "alphabetically earlier", but with no special handling of lower/upper case, digits or punctuation. See [man strcmp](https://man7.org/linux/man-pages/man3/strcmp.3.html) for details.
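A short sketch of typical `strcmp` usage follows. The helper names `is_hello` and `compare_sign` are hypothetical. Note that only the sign of the return value is meaningful, so the code never tests for exactly `1` or `-1` coming from `strcmp` itself.

```c
#include <string.h>

/* Returns 1 if str has the same contents as "hello", otherwise 0. */
int is_hello(const char* str)
{
    return strcmp(str, "hello") == 0;   /* 0 from strcmp means "equal" */
}

/* Normalizes the strcmp result to -1, 0 or 1. */
int compare_sign(const char* first, const char* second)
{
    int result = strcmp(first, second);
    return (result > 0) - (result < 0);
}
```

For example, `compare_sign("apple", "banana")` gives -1. On a system using ASCII, `compare_sign("Zebra", "apple")` also gives -1, since upper-case letters have lower symbol values than lower-case ones.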