1
0
mirror of https://git.tartarus.org/simon/putty.git synced 2025-01-09 17:38:00 +00:00
putty-source/utils/memory.c

148 lines
3.4 KiB
C
Raw Permalink Normal View History

/*
* PuTTY's memory allocation wrappers.
*/
Side-channel tester: align memory allocations. While trying to get an upcoming piece of code through testsc, I had trouble - _yet again_ - with the way that control flow diverges inside the glibc implementations of functions like memcpy and memset, depending on the alignment of the input blocks _above_ the alignment guaranteed by malloc, so that doing the same sequence of malloc + memset can lead to different control flow. (I believe this is done either for cache performance reasons or SIMD alignment requirements, or both: on x86, some SIMD instructions require memory alignment beyond what malloc guarantees, which is also awkward for our x86 hardware crypto implementations.) My previous effort to normalise this problem out of sclog's log files worked by wrapping memset and all its synonyms that I could find. But this weekend, that failed for me, and the reason appears to be ifuncs. I'm aware of the great irony of committing code to a security project with a log message saying something vague about ifuncs, on the same weekend that it came to light that commits matching that description were one of the methods used to smuggle a backdoor into the XZ Utils project (CVE-2024-3094). So I'll bend over backwards to explain both what I think is going on, and why this _isn't_ a weird ifunc-related backdooring attempt: When I say I 'wrap' memset, I mean I use DynamoRIO's 'drwrap' API to arrange that the side-channel test rig calls a function of mine before and after each call to memset. The way drwrap works is to look up the symbol address in either the main program or a shared library; in this case, it's a shared library, namely libc.so. Then it intercepts call instructions with exactly that address as the target. Unfortunately, what _actually_ happens when the main program calls memset is more complicated. First, control goes to the PLT entry for memset (still in the main program). In principle, that loads a GOT entry containing the address of memset (filled in by ld.so), and jumps to it. But in fact the GOT entry varies its value through the program; on the first call, it points to a resolver function, whose job is to _find out_ the address of memset. And in the version of libc.so I'm currently running, that resolver is an STT_GNU_IFUNC indirection function, which tests the host CPU's capabilities, and chooses an actual implementation of memset depending on what it finds. (In my case, it looks as if it's picking one that makes extensive use of x86 SIMD.) To avoid the overhead of doing this on every call, the returned function pointer is then written into the main program's GOT entry for memset, overwriting the address of the resolver function, so that the _next_ call the main program makes through the same PLT entry will go directly to the memset variant that was chosen. And the problem is that, after this has happened, none of the new control flow ever goes near the _official_ address of memset, as read out of libc.so's dynamic symbol table by DynamoRIO. The PLT entry isn't at that address, and neither is the particular SIMD variant that the resolver ended up choosing. So now my wrapper on memset is never being invoked, and memset cheerfully generates different control flow in runs of my crypto code that testsc expects to be doing exactly the same thing as each other, and all my tests fail spuriously. My solution, at least for the moment, is to completely abandon the strategy of wrapping memset. Instead, let's just make it behave the same way every time, by forcing all the affected memory allocations to have extra-strict alignment. I found that 64-byte alignment is not good enough to eliminate memset-related test failures, but 128-byte alignment is. This would be tricky in itself, if it weren't for the fact that PuTTY already has its own wrapper function on malloc (for various reasons), which everything in our code already uses. So I can divert to C11's aligned_alloc() there. That in turn is done by adding a new #ifdef to utils/memory.c, and compiling it with that #ifdef into a new object library that is included in testsc, superseding the standard memory.o that would otherwise be pulled in from our 'utils' static library. With the previous memset-compensator removed, this means testsc is now dependent on having aligned_alloc() available. So we test for it at cmake time, and don't build testsc at all if it can't be found. This shouldn't bother anyone very much; aligned_alloc() is available on _my_ testsc platform, and if anyone else is trying to run this test suite at all, I expect it will be on something at least as new as that. (One awkward thing here is that we can only replace _new_ allocations with calls to aligned_alloc(): C11 provides no aligned version of realloc. Happily, this doesn't currently introduce any new problems in testsc. If it does, I might have to do something even more painful in future.) So, why isn't this an ifunc-related backdoor attempt? Because (and you can check all of this from the patch): 1. The memset-wrapping code exists entirely within the DynamoRIO plugin module that lives in test/sclog. That is not used in production, only for running the 'testsc' side-channel tester. 2. The memset-wrapping code is _removed_ by this patch, not added. 3. None of this code is dealing directly with ifuncs - only working around the unwanted effects on my test suite from the fact that they exist somewhere else and introduce awkward behaviour.
2024-04-01 07:48:36 +00:00
#ifdef ALLOCATION_ALIGNMENT
/* Before we include standard headers, define _ISOC11_SOURCE so that
* we get the declaration of aligned_alloc(). */
#define _ISOC11_SOURCE
#endif
#include <assert.h>
#include <stdlib.h>
Move standalone parts of misc.c into utils.c. misc.c has always contained a combination of things that are tied tightly into the PuTTY code base (e.g. they use the conf system, or work with our sockets abstraction) and things that are pure standalone utility functions like nullstrcmp() which could quite happily be dropped into any C program without causing a link failure. Now the latter kind of standalone utility code lives in the new source file utils.c, whose only external dependency is on memory.c (for snew, sfree etc), which in turn requires the user to provide an out_of_memory() function. So it should now be much easier to link test programs that use PuTTY's low-level functions without also pulling in half its bulky infrastructure. In the process, I came across a memory allocation logging system enabled by -DMALLOC_LOG that looks long since bit-rotted; in any case we have much more advanced tools for that kind of thing these days, like valgrind and Leak Sanitiser, so I've just removed it rather than trying to transplant it somewhere sensible. (We can always pull it back out of the version control history if really necessary, but I haven't used it in at least a decade.) The other slightly silly thing I did was to give bufchain a function pointer field that points to queue_idempotent_callback(), and disallow direct setting of the 'ic' field in favour of calling bufchain_set_callback which will fill that pointer in too. That allows the bufchain system to live in utils.c rather than misc.c, so that programs can use it without also having to link in the callback system or provide an annoying stub of that function. In fact that's just allowed me to remove stubs of that kind from PuTTYgen and Pageant!
2019-01-03 08:44:11 +00:00
#include <limits.h>
Move standalone parts of misc.c into utils.c. misc.c has always contained a combination of things that are tied tightly into the PuTTY code base (e.g. they use the conf system, or work with our sockets abstraction) and things that are pure standalone utility functions like nullstrcmp() which could quite happily be dropped into any C program without causing a link failure. Now the latter kind of standalone utility code lives in the new source file utils.c, whose only external dependency is on memory.c (for snew, sfree etc), which in turn requires the user to provide an out_of_memory() function. So it should now be much easier to link test programs that use PuTTY's low-level functions without also pulling in half its bulky infrastructure. In the process, I came across a memory allocation logging system enabled by -DMALLOC_LOG that looks long since bit-rotted; in any case we have much more advanced tools for that kind of thing these days, like valgrind and Leak Sanitiser, so I've just removed it rather than trying to transplant it somewhere sensible. (We can always pull it back out of the version control history if really necessary, but I haven't used it in at least a decade.) The other slightly silly thing I did was to give bufchain a function pointer field that points to queue_idempotent_callback(), and disallow direct setting of the 'ic' field in favour of calling bufchain_set_callback which will fill that pointer in too. That allows the bufchain system to live in utils.c rather than misc.c, so that programs can use it without also having to link in the callback system or provide an annoying stub of that function. In fact that's just allowed me to remove stubs of that kind from PuTTYgen and Pageant!
2019-01-03 08:44:11 +00:00
#include "defs.h"
#include "puttymem.h"
#include "misc.h"
void *safemalloc(size_t factor1, size_t factor2, size_t addend)
{
if (factor1 > SIZE_MAX / factor2)
goto fail;
size_t product = factor1 * factor2;
if (addend > SIZE_MAX)
goto fail;
if (product > SIZE_MAX - addend)
goto fail;
size_t size = product + addend;
if (size == 0)
size = 1;
void *p;
#ifdef MINEFIELD
p = minefield_c_malloc(size);
Side-channel tester: align memory allocations. While trying to get an upcoming piece of code through testsc, I had trouble - _yet again_ - with the way that control flow diverges inside the glibc implementations of functions like memcpy and memset, depending on the alignment of the input blocks _above_ the alignment guaranteed by malloc, so that doing the same sequence of malloc + memset can lead to different control flow. (I believe this is done either for cache performance reasons or SIMD alignment requirements, or both: on x86, some SIMD instructions require memory alignment beyond what malloc guarantees, which is also awkward for our x86 hardware crypto implementations.) My previous effort to normalise this problem out of sclog's log files worked by wrapping memset and all its synonyms that I could find. But this weekend, that failed for me, and the reason appears to be ifuncs. I'm aware of the great irony of committing code to a security project with a log message saying something vague about ifuncs, on the same weekend that it came to light that commits matching that description were one of the methods used to smuggle a backdoor into the XZ Utils project (CVE-2024-3094). So I'll bend over backwards to explain both what I think is going on, and why this _isn't_ a weird ifunc-related backdooring attempt: When I say I 'wrap' memset, I mean I use DynamoRIO's 'drwrap' API to arrange that the side-channel test rig calls a function of mine before and after each call to memset. The way drwrap works is to look up the symbol address in either the main program or a shared library; in this case, it's a shared library, namely libc.so. Then it intercepts call instructions with exactly that address as the target. Unfortunately, what _actually_ happens when the main program calls memset is more complicated. First, control goes to the PLT entry for memset (still in the main program). In principle, that loads a GOT entry containing the address of memset (filled in by ld.so), and jumps to it. But in fact the GOT entry varies its value through the program; on the first call, it points to a resolver function, whose job is to _find out_ the address of memset. And in the version of libc.so I'm currently running, that resolver is an STT_GNU_IFUNC indirection function, which tests the host CPU's capabilities, and chooses an actual implementation of memset depending on what it finds. (In my case, it looks as if it's picking one that makes extensive use of x86 SIMD.) To avoid the overhead of doing this on every call, the returned function pointer is then written into the main program's GOT entry for memset, overwriting the address of the resolver function, so that the _next_ call the main program makes through the same PLT entry will go directly to the memset variant that was chosen. And the problem is that, after this has happened, none of the new control flow ever goes near the _official_ address of memset, as read out of libc.so's dynamic symbol table by DynamoRIO. The PLT entry isn't at that address, and neither is the particular SIMD variant that the resolver ended up choosing. So now my wrapper on memset is never being invoked, and memset cheerfully generates different control flow in runs of my crypto code that testsc expects to be doing exactly the same thing as each other, and all my tests fail spuriously. My solution, at least for the moment, is to completely abandon the strategy of wrapping memset. Instead, let's just make it behave the same way every time, by forcing all the affected memory allocations to have extra-strict alignment. I found that 64-byte alignment is not good enough to eliminate memset-related test failures, but 128-byte alignment is. This would be tricky in itself, if it weren't for the fact that PuTTY already has its own wrapper function on malloc (for various reasons), which everything in our code already uses. So I can divert to C11's aligned_alloc() there. That in turn is done by adding a new #ifdef to utils/memory.c, and compiling it with that #ifdef into a new object library that is included in testsc, superseding the standard memory.o that would otherwise be pulled in from our 'utils' static library. With the previous memset-compensator removed, this means testsc is now dependent on having aligned_alloc() available. So we test for it at cmake time, and don't build testsc at all if it can't be found. This shouldn't bother anyone very much; aligned_alloc() is available on _my_ testsc platform, and if anyone else is trying to run this test suite at all, I expect it will be on something at least as new as that. (One awkward thing here is that we can only replace _new_ allocations with calls to aligned_alloc(): C11 provides no aligned version of realloc. Happily, this doesn't currently introduce any new problems in testsc. If it does, I might have to do something even more painful in future.) So, why isn't this an ifunc-related backdoor attempt? Because (and you can check all of this from the patch): 1. The memset-wrapping code exists entirely within the DynamoRIO plugin module that lives in test/sclog. That is not used in production, only for running the 'testsc' side-channel tester. 2. The memset-wrapping code is _removed_ by this patch, not added. 3. None of this code is dealing directly with ifuncs - only working around the unwanted effects on my test suite from the fact that they exist somewhere else and introduce awkward behaviour.
2024-04-01 07:48:36 +00:00
#elif defined ALLOCATION_ALIGNMENT
/* aligned_alloc requires the allocation size to be rounded up */
p = aligned_alloc(
ALLOCATION_ALIGNMENT,
(size + ALLOCATION_ALIGNMENT - 1) & ~(ALLOCATION_ALIGNMENT-1));
#else
p = malloc(size);
#endif
Move standalone parts of misc.c into utils.c. misc.c has always contained a combination of things that are tied tightly into the PuTTY code base (e.g. they use the conf system, or work with our sockets abstraction) and things that are pure standalone utility functions like nullstrcmp() which could quite happily be dropped into any C program without causing a link failure. Now the latter kind of standalone utility code lives in the new source file utils.c, whose only external dependency is on memory.c (for snew, sfree etc), which in turn requires the user to provide an out_of_memory() function. So it should now be much easier to link test programs that use PuTTY's low-level functions without also pulling in half its bulky infrastructure. In the process, I came across a memory allocation logging system enabled by -DMALLOC_LOG that looks long since bit-rotted; in any case we have much more advanced tools for that kind of thing these days, like valgrind and Leak Sanitiser, so I've just removed it rather than trying to transplant it somewhere sensible. (We can always pull it back out of the version control history if really necessary, but I haven't used it in at least a decade.) The other slightly silly thing I did was to give bufchain a function pointer field that points to queue_idempotent_callback(), and disallow direct setting of the 'ic' field in favour of calling bufchain_set_callback which will fill that pointer in too. That allows the bufchain system to live in utils.c rather than misc.c, so that programs can use it without also having to link in the callback system or provide an annoying stub of that function. In fact that's just allowed me to remove stubs of that kind from PuTTYgen and Pageant!
2019-01-03 08:44:11 +00:00
if (!p)
goto fail;
Move standalone parts of misc.c into utils.c. misc.c has always contained a combination of things that are tied tightly into the PuTTY code base (e.g. they use the conf system, or work with our sockets abstraction) and things that are pure standalone utility functions like nullstrcmp() which could quite happily be dropped into any C program without causing a link failure. Now the latter kind of standalone utility code lives in the new source file utils.c, whose only external dependency is on memory.c (for snew, sfree etc), which in turn requires the user to provide an out_of_memory() function. So it should now be much easier to link test programs that use PuTTY's low-level functions without also pulling in half its bulky infrastructure. In the process, I came across a memory allocation logging system enabled by -DMALLOC_LOG that looks long since bit-rotted; in any case we have much more advanced tools for that kind of thing these days, like valgrind and Leak Sanitiser, so I've just removed it rather than trying to transplant it somewhere sensible. (We can always pull it back out of the version control history if really necessary, but I haven't used it in at least a decade.) The other slightly silly thing I did was to give bufchain a function pointer field that points to queue_idempotent_callback(), and disallow direct setting of the 'ic' field in favour of calling bufchain_set_callback which will fill that pointer in too. That allows the bufchain system to live in utils.c rather than misc.c, so that programs can use it without also having to link in the callback system or provide an annoying stub of that function. In fact that's just allowed me to remove stubs of that kind from PuTTYgen and Pageant!
2019-01-03 08:44:11 +00:00
return p;
fail:
out_of_memory();
}
void *saferealloc(void *ptr, size_t n, size_t size)
{
void *p;
if (n > INT_MAX / size) {
p = NULL;
} else {
size *= n;
if (!ptr) {
#ifdef MINEFIELD
p = minefield_c_malloc(size);
Side-channel tester: align memory allocations. While trying to get an upcoming piece of code through testsc, I had trouble - _yet again_ - with the way that control flow diverges inside the glibc implementations of functions like memcpy and memset, depending on the alignment of the input blocks _above_ the alignment guaranteed by malloc, so that doing the same sequence of malloc + memset can lead to different control flow. (I believe this is done either for cache performance reasons or SIMD alignment requirements, or both: on x86, some SIMD instructions require memory alignment beyond what malloc guarantees, which is also awkward for our x86 hardware crypto implementations.) My previous effort to normalise this problem out of sclog's log files worked by wrapping memset and all its synonyms that I could find. But this weekend, that failed for me, and the reason appears to be ifuncs. I'm aware of the great irony of committing code to a security project with a log message saying something vague about ifuncs, on the same weekend that it came to light that commits matching that description were one of the methods used to smuggle a backdoor into the XZ Utils project (CVE-2024-3094). So I'll bend over backwards to explain both what I think is going on, and why this _isn't_ a weird ifunc-related backdooring attempt: When I say I 'wrap' memset, I mean I use DynamoRIO's 'drwrap' API to arrange that the side-channel test rig calls a function of mine before and after each call to memset. The way drwrap works is to look up the symbol address in either the main program or a shared library; in this case, it's a shared library, namely libc.so. Then it intercepts call instructions with exactly that address as the target. Unfortunately, what _actually_ happens when the main program calls memset is more complicated. First, control goes to the PLT entry for memset (still in the main program). In principle, that loads a GOT entry containing the address of memset (filled in by ld.so), and jumps to it. But in fact the GOT entry varies its value through the program; on the first call, it points to a resolver function, whose job is to _find out_ the address of memset. And in the version of libc.so I'm currently running, that resolver is an STT_GNU_IFUNC indirection function, which tests the host CPU's capabilities, and chooses an actual implementation of memset depending on what it finds. (In my case, it looks as if it's picking one that makes extensive use of x86 SIMD.) To avoid the overhead of doing this on every call, the returned function pointer is then written into the main program's GOT entry for memset, overwriting the address of the resolver function, so that the _next_ call the main program makes through the same PLT entry will go directly to the memset variant that was chosen. And the problem is that, after this has happened, none of the new control flow ever goes near the _official_ address of memset, as read out of libc.so's dynamic symbol table by DynamoRIO. The PLT entry isn't at that address, and neither is the particular SIMD variant that the resolver ended up choosing. So now my wrapper on memset is never being invoked, and memset cheerfully generates different control flow in runs of my crypto code that testsc expects to be doing exactly the same thing as each other, and all my tests fail spuriously. My solution, at least for the moment, is to completely abandon the strategy of wrapping memset. Instead, let's just make it behave the same way every time, by forcing all the affected memory allocations to have extra-strict alignment. I found that 64-byte alignment is not good enough to eliminate memset-related test failures, but 128-byte alignment is. This would be tricky in itself, if it weren't for the fact that PuTTY already has its own wrapper function on malloc (for various reasons), which everything in our code already uses. So I can divert to C11's aligned_alloc() there. That in turn is done by adding a new #ifdef to utils/memory.c, and compiling it with that #ifdef into a new object library that is included in testsc, superseding the standard memory.o that would otherwise be pulled in from our 'utils' static library. With the previous memset-compensator removed, this means testsc is now dependent on having aligned_alloc() available. So we test for it at cmake time, and don't build testsc at all if it can't be found. This shouldn't bother anyone very much; aligned_alloc() is available on _my_ testsc platform, and if anyone else is trying to run this test suite at all, I expect it will be on something at least as new as that. (One awkward thing here is that we can only replace _new_ allocations with calls to aligned_alloc(): C11 provides no aligned version of realloc. Happily, this doesn't currently introduce any new problems in testsc. If it does, I might have to do something even more painful in future.) So, why isn't this an ifunc-related backdoor attempt? Because (and you can check all of this from the patch): 1. The memset-wrapping code exists entirely within the DynamoRIO plugin module that lives in test/sclog. That is not used in production, only for running the 'testsc' side-channel tester. 2. The memset-wrapping code is _removed_ by this patch, not added. 3. None of this code is dealing directly with ifuncs - only working around the unwanted effects on my test suite from the fact that they exist somewhere else and introduce awkward behaviour.
2024-04-01 07:48:36 +00:00
#elif defined ALLOCATION_ALIGNMENT
p = aligned_alloc(ALLOCATION_ALIGNMENT, size);
#else
p = malloc(size);
#endif
} else {
#ifdef MINEFIELD
p = minefield_c_realloc(ptr, size);
#else
p = realloc(ptr, size);
#endif
}
}
Move standalone parts of misc.c into utils.c. misc.c has always contained a combination of things that are tied tightly into the PuTTY code base (e.g. they use the conf system, or work with our sockets abstraction) and things that are pure standalone utility functions like nullstrcmp() which could quite happily be dropped into any C program without causing a link failure. Now the latter kind of standalone utility code lives in the new source file utils.c, whose only external dependency is on memory.c (for snew, sfree etc), which in turn requires the user to provide an out_of_memory() function. So it should now be much easier to link test programs that use PuTTY's low-level functions without also pulling in half its bulky infrastructure. In the process, I came across a memory allocation logging system enabled by -DMALLOC_LOG that looks long since bit-rotted; in any case we have much more advanced tools for that kind of thing these days, like valgrind and Leak Sanitiser, so I've just removed it rather than trying to transplant it somewhere sensible. (We can always pull it back out of the version control history if really necessary, but I haven't used it in at least a decade.) The other slightly silly thing I did was to give bufchain a function pointer field that points to queue_idempotent_callback(), and disallow direct setting of the 'ic' field in favour of calling bufchain_set_callback which will fill that pointer in too. That allows the bufchain system to live in utils.c rather than misc.c, so that programs can use it without also having to link in the callback system or provide an annoying stub of that function. In fact that's just allowed me to remove stubs of that kind from PuTTYgen and Pageant!
2019-01-03 08:44:11 +00:00
if (!p)
out_of_memory();
return p;
}
void safefree(void *ptr)
{
if (ptr) {
#ifdef MINEFIELD
minefield_c_free(ptr);
#else
free(ptr);
#endif
}
}
void *safegrowarray(void *ptr, size_t *allocated, size_t eltsize,
size_t oldlen, size_t extralen, bool secret)
{
/* The largest value we can safely multiply by eltsize */
assert(eltsize > 0);
size_t maxsize = (~(size_t)0) / eltsize;
size_t oldsize = *allocated;
/* Range-check the input values */
assert(oldsize <= maxsize);
assert(oldlen <= maxsize);
assert(extralen <= maxsize - oldlen);
/* If the size is already enough, don't bother doing anything! */
if (oldsize > oldlen + extralen)
return ptr;
/* Find out how much we need to grow the array by. */
size_t increment = (oldlen + extralen) - oldsize;
/* Invent a new size. We want to grow the array by at least
* 'increment' elements; by at least a fixed number of bytes (to
* get things started when sizes are small); and by some constant
* factor of its old size (to avoid repeated calls to this
* function taking quadratic time overall). */
if (increment < 256 / eltsize)
increment = 256 / eltsize;
if (increment < oldsize / 16)
increment = oldsize / 16;
/* But we also can't grow beyond maxsize. */
size_t maxincr = maxsize - oldsize;
if (increment > maxincr)
increment = maxincr;
size_t newsize = oldsize + increment;
void *toret;
if (secret) {
toret = safemalloc(newsize, eltsize, 0);
if (oldsize) {
memcpy(toret, ptr, oldsize * eltsize);
smemclr(ptr, oldsize * eltsize);
sfree(ptr);
}
} else {
toret = saferealloc(ptr, newsize, eltsize);
}
*allocated = newsize;
return toret;
}