Is strcpy dangerous and what should be used instead?
I heard rumours that the strcpy function is dangerous and shouldn't be used. Supposedly it can be exploited to create buffer overflows somehow.
And indeed when I compile my C code in the admittedly non-conforming Visual Studio C compiler, it warns me about using strcpy among other functions, labelling it dangerous. Is it true?
If so, what should we use instead? Some rumours say that strncpy should be used instead, others like Microsoft tell me to use strcpy_s.
2 answers
Summary (TL;DR)
- Using strcpy directly on non-sanitized user input is bad, otherwise it's fine.
- strncpy is a dangerous function that should be avoided. Its presence in your source is a much greater danger than buffer overruns.
- If portability and backwards-compatibility are no concerns, then there's nothing wrong with using strcpy_s, given that the function is available.
What is a buffer overflow/overrun exploit?
A long time ago, Microsoft published a study (I can't find the link; it seems MS removed it from their site) in which they analysed hacks and exploits to see which functions were most often exploited by hackers. They looked at a broad range of functions, not just standard library ones but Microsoft-specific and POSIX ones too. They found that strcpy is often exploited when it is used directly on raw user input.
Old school "buffer exploits" use command-line input or command-line arguments to provide more data than the program's input buffer was designed for. In its simplest form, this could be abused to crash the program.
The more sinister hacks would instead disassemble the target executable to find out exactly where on the stack something like a return address is stored, then use the buffer exploit to overwrite that particular location. You could then sneak in the address of some location at the bottom of the executable, where you have injected your potentially malicious program.
So if the application programmer just merrily uses strcpy to copy some provided argv command-line argument into a 100-byte stack-allocated buffer, and there's a return address sitting on the stack 5 bytes further down, then the hacker would provide those extra bytes to overwrite that address.
Is strcpy dangerous?
Based on this, Microsoft naively drew the wrong conclusion that the strcpy function is dangerous, since it was a recurring function abused by a lot of such exploits. For example, if you don't provide a null terminated string, the function will just keep on going, copying beyond array bounds.
They came to the conclusion that this was the fault of strcpy since it doesn't check the number of characters to copy. After that they listed strcpy as deprecated and dangerous. They started to lobby for alternative non-standard functions invented by themselves, such as strcpy_s.
However, the actual problem isn't strcpy but programmers who don't sanitize their program input. This could be done with functions like fgets or memchr where you can set a fixed size, then only copy as much as the set limit allows. In the case of strings you can then parse the input to verify that it contains a null terminator, all before you label the user input as a valid C string. strcpy_s works in a similar manner, taking a size and stopping upon encountering a null terminator.
If you know that the C string is in fact null terminated and proper, then there is no harm in calling strcpy - it is perfectly safe and likely quite efficient. From an old answer of mine at another site:
There is nothing wrong with the strcpy() function, that's a myth. This function has existed for some 30-40 years and every little bit of it is properly documented. So what the function does and what it does not should not come as a surprise, even to beginner C programmers.
What strcpy does and does not:
- It copies a null-terminated string into another memory location.
- It does not take any responsibility for error handling.
- It does not fix bugs in the caller application.
- It does not take any responsibility for educating C programmers.
Because of the last remark above, you must know the following before calling strcpy:
- If you pass a string of unknown length to strcpy, without checking its length in advance, you have a bug in the caller application.
- If you pass some chunk of data which does not end with \0, you have a bug in the caller application.
- If you pass two pointers to strcpy() which point at memory locations that overlap, you invoke undefined behavior. Meaning you have a bug in the caller application.
Summary: using strcpy directly on non-sanitized user input is bad, otherwise it's fine.
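The sanitizing step described in this section can be sketched roughly as follows: bound the search for the terminator with memchr, check that the result fits, and only then call strcpy. The helper name checked_copy and its return convention are hypothetical, chosen purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if src holds a null
   terminator within its first src_max bytes and the resulting string
   (terminator included) fits into dst_size bytes.
   Returns true on success; on failure dst is left untouched.
   dst and src must not overlap. */
bool checked_copy(char *dst, size_t dst_size, const char *src, size_t src_max)
{
    assert(dst != NULL && src != NULL);

    /* Bounded search: never looks past the first src_max bytes of src. */
    const char *end = memchr(src, '\0', src_max);
    if (end == NULL)
        return false;            /* not a valid C string within the limit */

    size_t len = (size_t)(end - src);
    if (len >= dst_size)
        return false;            /* string plus terminator would not fit */

    strcpy(dst, src);            /* length and termination already verified */
    return true;
}
```

A caller would pass the size of the raw input buffer as src_max and reject the input when the function returns false, rather than continuing with whatever dst happened to contain.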
What about strncpy?
Around the time when Microsoft flagged strcpy as obsolete and dangerous, another misguided rumour started. This nasty rumour said that strncpy should be used as a safer version of strcpy, since it takes the size as a parameter and is already part of the C standard library, so it's portable. This seemed very convenient - spread the word, forget about non-standard strcpy_s, let's use strncpy! No, this is not a good idea...
Looking at the history of strncpy, it goes back to the very earliest days of Unix, where several string formats co-existed. Something called "fixed width strings" existed - they were not null terminated but came with a fixed size stored together with the string. One of the things Dennis Ritchie (the inventor of the C language) wished to avoid when creating C was storing the size together with arrays [The Development of the C Language, Dennis M. Ritchie]. Likely in the same spirit, the "fixed width strings" were phased out over time in favour of null terminated ones.
The function used to copy these old fixed width strings was named strncpy. This is the sole purpose that it was created for. It has no relation to strcpy. In particular it was never intended to be some more secure version - computer program security wasn't even invented when these functions were made.
Somehow strncpy still made it into the first C standard in 1989. A whole lot of highly questionable functions did - the reason was always backwards compatibility. We can also read the story about strncpy in the C99 rationale 7.21.2.4:
strncpy was initially introduced into the C library to deal with fixed-length name fields in structures such as directory entries. Such fields are not used in the same way as strings: the trailing null is unnecessary for a maximum-length field, and setting trailing bytes for shorter names to null assures efficient field-wise comparisons. strncpy is not by origin a “bounded strcpy,” and the Committee preferred to recognize existing practice rather than alter the function to better suit it to such use.
This is where it starts to smell fishy. "The trailing null is unnecessary"? Yet somewhere on the way to standardization, they made strncpy stop upon encountering null termination. But what if it doesn't? That's where the function becomes wildly dangerous. From the C standard (ISO 9899:2018) 7.24.2.4 we can read:
char *strncpy(char * restrict s1,
const char * restrict s2,
size_t n);
If the array pointed to by s2 is a string that is shorter than n characters, null characters are appended to the copy in the array pointed to by s1, until n characters in all have been written.
If it is shorter... uh-oh. Else go haywire and don't null terminate the string.
Now how do programmers usually and most naturally call this supposed safe function? Like most other functions - by passing along the buffer size. Like in this little program:
#include <string.h>
#include <stdio.h>
#define n 11
int main()
{
char str[n];
char src[] = "hello world eat deadbeef";
strncpy(str, src, n);
puts(str);
return 0;
}
This prints hello world when I try it on Windows (gcc/mingw x86_64). But there is undefined behavior... when I try it on gcc Linux x86_64, I get hello worldhello world eat deadbeef. This is simply because the strncpy() call doesn't store the null terminator, since there was no room - the source string is much longer than the destination. Passing n-1 won't solve it either. We have to step in and manually null terminate it. This is all very unintuitive and strncpy was never intended to be used in this manner in the first place.
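For code that cannot get rid of strncpy, the manual termination mentioned here might look like the sketch below. The wrapper name strncpy_terminated is hypothetical, and note that it truncates silently, which the caller may still need to detect by other means.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: strncpy followed by the manual null termination
   that strncpy itself omits when src does not fit.
   Silently truncates src to at most dst_size - 1 characters. */
char *strncpy_terminated(char *dst, const char *src, size_t dst_size)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    strncpy(dst, src, dst_size - 1); /* writes exactly dst_size - 1 bytes */
    dst[dst_size - 1] = '\0';        /* the last byte is always the terminator */
    return dst;
}
```

With the 11-byte buffer from the program above, this stores the first 10 characters of the source followed by a null terminator, so puts no longer reads past the end of the array.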
Summary: strncpy is a dangerous function that should be avoided. Its presence in your source is a much greater danger than buffer overruns.
What about strcpy_s?
Originally released as a non-standard function by Microsoft, it comes with a size parameter. strcpy_s returns an error code if it fails, rather than a pointer. You'll need to check this error code.
Using this function is however the wrong solution to the problem of no input sanitation, so it is dubious which problem this function was supposed to solve in the first place.
Later on somehow, all of these _s functions made it into an optional library of the C standard "C11", the so-called "Annex K bounds-checking interface". They were first introduced by a pre-study technical report known as TR 24731-1. But even to this day, this library is barely implemented by any C compiler - it is barely implemented in Microsoft Visual Studio even though they invented most of it. Annex K is not always compatible with the Microsoft functions using the same names.
Overall, the "bounds-checking interface" was a big fiasco. Experts from within the C standard committee itself filed some valid criticism against the library here. They address problems with strcpy_s specifically in the report. Most notably, switching out strcpy for strcpy_s in existing code comes with numerous pitfalls.
So while strcpy_s might be safer than strcpy in some special cases (and most certainly safer than strncpy) it suffers from portability and compatibility concerns. It should be regarded just like any system-specific API function and can't be assumed to be portable.
Summary: if portability and backwards-compatibility are no concerns, then there's nothing wrong with using strcpy_s, given that the function is available.
What about other similar functions: memcpy? strncpy_s? strlcpy?
memcpy is always preferred when you know the size in advance. It's always faster than strcpy. It is safe and portable.
There exist various other "safe" versions in the criticised "bounds-checking interface", including strncpy_s which fixes the null termination problem mentioned earlier.
The strlcpy etc functions originate from BSD/Unix and are basically the non-standard Unix equivalents to the non-standard Microsoft ones. And similarly, strlcpy etc are fine to use if portability is not a concern.
There are lots of subtle details and differences between all of these functions; I won't go into details here.
New & exciting as per the C23 standard: memccpy. It works just like memcpy but stops if it finds a certain given value. So it can actually be used as a 100% equivalent to strncpy, minus the dangerous str prefix. You have to take care of the null termination manually.
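A sketch of how memccpy might be used for a bounded string copy with manual termination. The helper name bounded_copy is hypothetical. memccpy has long been available through POSIX; the feature test macro on the first line is an assumption about a POSIX-like, pre-C23 toolchain and can be dropped where the compiler exposes the function by default.

```c
#define _XOPEN_SOURCE 700 /* assumption: exposes memccpy on pre-C23 POSIX systems */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, writing at most dst_size bytes.
   Returns true if the whole string (terminator included) was copied,
   false if it was truncated. dst is null terminated either way. */
bool bounded_copy(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    /* memccpy stops after copying the first '\0' and returns a pointer
       just past it, or NULL if no '\0' was among the first dst_size bytes. */
    if (memccpy(dst, src, '\0', dst_size) != NULL)
        return true;

    dst[dst_size - 1] = '\0';  /* truncated: terminate manually */
    return false;
}
```

Unlike strncpy, this does not pad the rest of the destination with null characters, and the return value tells the caller whether truncation happened.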
EDIT:
While I didn't find the original Microsoft article, I did find an old related one here: Security Development Lifecycle (SDL) Banned Function Calls. Notably, Microsoft also raises the same valid concerns against strncpy etc as I do above - Microsoft is likely innocent of the rumour that strncpy is a safe version of strcpy.
strcpy(3) can be safe. Some compilers, such as GCC and Clang, support a feature test macro, _FORTIFY_SOURCE (see feature_test_macros(7) https://man7.org/linux/man-pages/man7/feature_test_macros.7.html), which asks the compiler to add some checks to make sure that buffer overflow doesn't happen.
If the bug is detected at compile time, it will raise a warning. If the bug is detected at run time, it will abort(3) the program.
This is the simplest thing a programmer can do to prevent the problems that strcpy(3) can cause.
Truncating
Another way to avoid buffer overflow is truncating the string. This adds complexity to the code, which can itself cause more bugs. In general, if you can use _FORTIFY_SOURCE, it's preferred.
If you do this, you need to:
- Specify the limiting size.
- Check the return value of the functions, to detect truncation.
- Do something if you detect truncation!
The last step is usually neglected, causing second-order bugs. Continuing the program with a truncated string can be very dangerous too.
If you really need to do this, there's no standard function in ISO C. In POSIX, strlcpy(3) and strlcat(3) will be added soon (in Issue 8, POSIX.1-202x). If you need to chain several such calls, strlcat(3) is hard to use (you need to check the return value after every call); you may want to use stpecpy(3), which you'll need to write yourself (see man 3 stpecpy https://man.archlinux.org/man/stpecpy.3 for a simple implementation). In case your system doesn't provide strlcpy(3), you may also want to write your own stpecpy(3) implementation.
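The three steps listed earlier (limit the size, check the result, react to truncation) could look roughly like this. The helper copy_or_truncate and its return convention are hypothetical; it is neither strlcpy(3) nor stpecpy(3), only a sketch built from ISO C functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, truncating to fit dst_size bytes.
   Returns 0 if the whole string fit, -1 if it was truncated.
   src must be a valid null terminated string; dst is always terminated. */
int copy_or_truncate(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    size_t len = strlen(src);
    int truncated = (len >= dst_size);
    if (truncated)
        len = dst_size - 1;      /* step 1: enforce the limiting size */

    memcpy(dst, src, len);
    dst[len] = '\0';
    return truncated ? -1 : 0;   /* step 2: let the caller detect truncation */
}
```

A caller would then treat a return value of -1 as an error (step 3), for example by rejecting the input, rather than carrying on with the shortened string.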