Welcome to Software Development on Codidact!
Will you help us build our independent community of developers helping developers? We're small and trying to grow. We welcome questions about all aspects of software development, from design to code to QA and more. Got questions? Got answers? Got code you'd like someone to review? Please join us.
How to properly use malloc?
I have seen this construct quite a lot:
int *p = (int*) malloc(n * sizeof(int));
Is this how it should be? What is the proper way to use malloc
? I have also seen this:
int *p = malloc(n * sizeof *p);
Which one is preferable?
3 answers
Should we cast the result of malloc?
The cast to the intended type is not necessary in C, since during assignment, the void*
returned by malloc
can be implicitly converted to any other object pointer and vice versa (as per C17 6.5.16.1). So the cast does add clutter. However, there's nothing wrong with being explicit by types either, so it is mostly a matter of coding style.
There are some historical and compatibility reasons why you would/shouldn't cast:
- C++ doesn't allow implicit conversions with void pointers so the cast is necessary if C++ compatibility is desired. (However, using
malloc
in C++ is very questionable and dangerous, but that's another story.) - The old C90 standard supported dangerous implicit int default function declarations, so if you forgot to
#include <stdlib.h>
then added a cast to themalloc
result, you would hide the bug that the missing include introduced.
How should we write the size parameter to malloc?
There's several possible options here:
malloc( n * sizeof(int) )
malloc( sizeof(int[n]) )
malloc( n * sizeof *p )
calloc( n, sizeof(int) )
calloc( n, sizeof *p )
-
malloc( n * sizeof(int) )
has the advantage of being clear and explicit, this would be the most "old scool" way of usingmalloc
and probably the most common method taught in C books.The argument against this style is that in case we change the type of
p
, then themalloc
call turns incorrect. I always thought that this was a rather weak argument - because: yes if you only change part of a code without considering all of it, it will break. That goes for all C code. -
malloc( sizeof(int[n]) )
is very similar to the above one, except it is slightly more readable. It uses a VLA declaration though, so it's not 100% portable to old or exotic C implementations. -
malloc( n * sizeof *p )
uses a trick, namely that thesizeof
operand isn't evaluated for side-effects here, so we can actually de-reference a pointer to get the size of the pointed-at type, without the code actually executing that de-referencing (which would have been wildly undefined behavior).The argument in favour of this style is the opposite of the argument against the above ones - in case
p
changes type then themalloc
expression is still valid. That's nice. Arguments against this style will follow further below. -
The
calloc
versions are 100% equivalent to theirmalloc
counterparts, except it will also zero-out all memory. Meaningcalloc
is slower but it provides extra functionality.
Why sizeof *p
isn't some universally great style
The argument for using the *p
style doesn't scale well when we start dealing with more complex situations like pointer-to-pointers or 2D arrays. Consider this:
void failloc (int n, int p[5][5])
{
p = malloc(n * sizeof *p);
printf("Yay we allocated %zu bytes\n", n * sizeof *p);
}
When calling this code as failloc(1, arr)
then we expect 1*5*5*sizeof(int)
bytes = 100 bytes on 32 bit computer. However, it only allocates 1 * sizeof(int[5])
. That is, 20 bytes, which is of course wrong.
One may then argue and say hey, you picked the wrong type, it should have been int (*p)[5][5]
and the code will work. Yes indeed, if I change the type of p
it will start working. Now, what was the initial argument for using this style again?
In case an int**
was passed to a similar function, then again the style doesn't scale well, consider this (bad!) code:
int** failloc (int x, int y,)
{
int** p = malloc(x * sizeof *p);
for(int i=0; i<y; i++)
p[i] = malloc(y * sizeof **p);
return p;
}
We have to do **p
suddenly which is inconsistent. Now what if I change the type of p
to int*
? Hey, you can't go change the type of p
again, stop doing that!
Yet another scenario: flexible array members.
typedef struct
{
int foo;
int bar[];
} flexy;
flexy* p = malloc(sizeof *p + n * sizeof /* what? */ );
We can't write sizeof *bar
, there's no such identifier. We have to write sizeof *p->bar
. This simply isn't readable.
The only sensible way out is to combine the *p
style with one of the other two. Personally I'd probably have written that line as
malloc( sizeof *p + sizeof(int[n]) );
Or for those who insist on using the same style consistently:
malloc(sizeof(flexy) + n * sizeof(int) );
So overall I don't think the sizeof *p
trick is nearly as smart as people like to claim, it's nice in some situations, it doesn't work in other situations.
Overall the programmer must actually know what they are doing - no way around it. If you change the type of some variable, then you better review every single line in the code base using that variable - simple as that.
I don't think any of these styles come with any significant benefits or strong arguments for using one over the other - it is all very subjective. There is no obvious right or wrong style here.
TL;DR
You should use
int *p = malloc(n * sizeof *p);
for two reasons
- The cast
(int*)
is not necessary, which means it's clutter. - Using
sizeof *p
instead ofsizeof(int)
removes code duplication.
But remember to check if allocation succeeded before using the memory. It's done like this:
int *p = malloc(n * sizeof *p);
if(!p) {
// Handle error
}
Longer answer
1 - Casting
Some people argue that the cast makes it possible to compile the code with both a C compiler and a C++ compiler. While this is technically true and sometimes useful, it's not the typical use case. If you know that you want to be able to do this, then cast. A C++ compiler will throw a compiler error if you don't.
But in C, a void pointer (malloc returns a void pointer[1]) can safely be implicitly casted to any other pointer type and back. So it is completely safe to omit it. Note that this is true for void pointers in general. It's not special for malloc. It just happens to be the case that this discussion comes up a lot when talking about malloc.
2 - sizeof
Let's say you have this code:
int *p;
// Many lines of code
p = malloc(size1 * sizeof(int));
// More lines of code
p = malloc(size2 * sizeof(int));
Suddenly you realize that you have to change the type of p
to another pointer type. Will you remember to change everywhere? And are you sure you will not miss anything? Using sizeof *p
eliminates this problem. But do remember that sizeof p
is the size of the pointer, that is sizeof (int*)
. Mixing this up might give you annoying and hard traced bugs.
If you have a pointer to pointer to create a 2D structure, the pattern is like this:
int **p;
p = malloc(x * sizeof *p);
for(int i=0; i<x; i++)
p[i] = malloc(y * sizeof *p[i]);
If you're dealing with pointers to arrays, you might want to be a bit careful. Especially if they are arguments to functions. For instance, int a[5][5]
means different things if it is declared as an argument or in function body or global space. If declared in function body or global space, that will give you a two dimensional 5x5 array. But if declared in a function argument, it will be a pointer to one dimensional 5 array. The equivalent declaration is int (*a)[5]
.
There is also the case with flexible array members of a struct, but IMO it's pretty obvious that this method does not work flawlessly there. It also does not work for void pointers, because they cannot be dereferenced.
Read more about when this work and not here
More opinionated stuff
Other argue that "it's good habit" to add that extra check that the cast gives. It forces you to think one more time. I strongly disagree with this for several reasons.
Firstly, you very rarely do this for non-pointer types. This code looks completely ridiculous:
signed char x = (signed char)42;
long y = (long)x - (long)8;
Do note that these casts do have their uses. What I'm saying is that it is a bad thing to blindly throw them in everywhere without even knowing why or if it's needed.
Secondly, in C a cast typically means "I know what I'm doing". So if you're doing it wrong, you can actually HIDE a bug. The argument about forcing you to think again makes sense in C++, because it will not compile if you do it wrong.
-
In early C, before the 89 standard, there was no void pointer. Instead, a char pointer was used. That's over 30 years ago. ↩︎
3 comment threads
There are several things to consider when calling the malloc(3) family of functions:
-
nelem * sizeof(elem)
orsizeof(elem) * nelem
? - Use the type or the pointer name in
sizeof()
? - To cast or not to cast?
TL;DR:
Use a simple and readable interface, and let it handle the safety for you. Of course, you need to write such an interface, wrapping the standard C library functions.
foo_t *p;
p = MALLOC(nelem, foo_t);
Let's answer the questions one by one.
nelem * sizeof(elem)
or sizeof(elem) * nelem
?
None of the alternatives is good.
Use functions that take separate arguments:
-
calloc(3)
(ISO C) -
reallocarray(3)
(several Unix systems) -
recallocarray(3)
(several Unix systems)
I suggest writing your own mallocarray() and reallocarrayf() as wrappers around these. reallocarrayf() is more complex (read about reallocf(3)), but mallocarray() can be trivially implemented:
#define mallocarray(nmemb, size) reallocarray(NULL, nmemb, size)
As a rule, I always use array allocation functions, even if I only want 1 element:
foo_t *p;
// p = MALLOC(1, foo_t); // which expands to:
p = (foo_t *) mallocarray(1, sizeof(foo_t));
This has several benefits:
-
It clearly shows the number of elements I want. If it's 1, or if it's 42.
An exception is when allocating a structure with a flexible array member. In such case, you really need to call malloc(3) with the precise size you want:
struct foo {
int a;
bar_t b[];
};
struct foo *p;
p = (struct foo *) malloc(sizeof(struct foo) + sizeof(bar_t) * n);
-
The multiplication is performed inside the function, inside a check to prevent overflow.
However, as a general rule, when you multiply numbers, put first the value with the widest type, to avoid overflowing intermediate operations (read about precedence in operators).
int a, b;
size_t c, d;
...
d = a * b * c; // might overflow `a * b` easily
d = c * a * b; // less likely
Use the type or the pointer name in sizeof()?
None of the alternatives is good.
If you use the type, then if you change the type of the pointer might introduce silent bugs in malloc(3) calls, since there's no way that the compiler knows if your doing the right thing.
If you use the name, then you reduce ability to search in the code. Code is more often read (and searched) than modified. It's quite useful to be able to search through a code base for all allocations of a given type with something like the following.
grep -rni 'alloc.*\bfoo_t)' src
The solution comes thanks to the next question, which is about casting. As a starter, I'll say that in this case readability is preferable to safety; we'll try to make it safe somehow... continue reading.
To cast or not to cast? That is the question.
The unsurprising answer, again, is that none of the alternatives is good.
Casting, in general, can disable many warnings, and is a dreaded thing that should be used very carefully. However, that doesn't apply to malloc(3), since it (by returning void *
) implicitly converts to any pointer type, so we can't do more damage by forcing a conversion.
Casting also had issues regarding implicit functions, but those are gone for good (since C99), so this is a non-issue. If you lived in a cave for the last $year - 1999
years, please get out and see the light.
Casts still have issues, which is why they're not ideal:
- Clutters the code, considerably reducing readability.
- They aren't DRY.
So let's not write the cast (which is not the same as don't cast).
Casting the result of malloc(3) is good because:
- It couples the call with the type of the pointer. Since the cast converts the
void *
to a non-void
pointer, it no longer converts to anything implicitly. This means that if the type of the pointer ever changes, malloc(3) calls assigned to that pointer will no longer compile, and we will be able to fix any compilation errors.
So, the solution is to write the cast inside a macro, which uses the same information that we will need for sizeof(), thus avoiding repeating ourselves, and also avoiding readability issues.
#define MALLOC(n, type) ((type *) mallocarray(n, sizeof(type)))
which is to be used as this:
foo_t *p;
p = MALLOC(42, foo_t); // 42 elements of type foo_t.
It's readable, it's concise, it's easily searchable, and it's safe.
0 comment threads