Bug 497977 - valgrind thinks that size_t is signed
Summary: valgrind thinks that size_t is signed
Status: REPORTED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (show other bugs)
Version: 3.19.0
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-12-28 11:20 UTC by Vincent Lefèvre
Modified: 2024-12-30 15:29 UTC (History)
4 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Vincent Lefèvre 2024-12-28 11:20:31 UTC
SUMMARY
When outputting a message about a large size_t value, valgrind thinks that this type is signed, while it is always unsigned according to the C standard.

STEPS TO REPRODUCE
1. Consider the following C program:

#include <stdlib.h>
int main (void)
{
  void *p = malloc ((size_t) -1);
  return p != NULL;
}

2. Compile it.
3. Run the executable with valgrind.

OBSERVED RESULT
qaa% valgrind ./tst
[...]
==1153312== Argument 'size' of function malloc has a fishy (possibly negative) value: -1
==1153312==    at 0x48437B4: malloc (vg_replace_malloc.c:381)
==1153312==    by 0x401139: main (in /home/vinc17/tst)
[...]

Note the "possibly negative" and the output value "-1".

As a comparison, GCC is correct by regarding this value as positive when outputting its warning:

qaa% gcc-snapshot tst.c -o tst
tst.c: In function 'main':
tst.c:4:13: warning: argument 1 value '18446744073709551615' exceeds maximum object size 9223372036854775807 [-Walloc-size-larger-than=]
    4 |   void *p = malloc ((size_t) -1);
      |             ^~~~~~~~~~~~~~~~~~~~
In file included from tst.c:1:
/usr/include/stdlib.h:672:14: note: in a call to allocation function 'malloc' declared here
  672 | extern void *malloc (size_t __size) __THROW __attribute_malloc__
      |              ^~~~~~

EXPECTED RESULT
The message must not say that the value is possibly negative and must not show a negative value.

SOFTWARE/OS VERSIONS
valgrind-3.19.0 (valgrind 1:3.19.0-1 Debian package under Debian/unstable)

ADDITIONAL INFORMATION
I had reported the following Debian bug in 2020:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=962772
where this issue showed up with GNU MPFR's tabort_defalloc1 and tabort_defalloc2 tests from its testsuite (these tests are now disabled when they are run under valgrind, but the above simple testcase allows one to reproduce the wrong output). The goal of these MPFR tests is to test the allocation functions with a huge positive value, such as "(size_t) -1".
Comment 1 Philippe Waroquiers 2024-12-28 15:34:27 UTC
Here is the piece of code that leads to this message:
Bool MC_(record_fishy_value_error) ( ThreadId tid, const HChar *function_name,
                                     const HChar *argument_name, SizeT value)
{
   MC_Error extra;

   tl_assert(VG_INVALID_THREADID != tid);

   if ((SSizeT)value >= 0) return False;  // not a fishy value
    ....

As you can see, the (unsigned) SizeT value is explicitly casted to a signed type SSizeT and compared to 0.
While technically/formally, valgrind should not complain about the provided value and/or should not speak about negative values,
this is an explicit decision to have this behavior as such values will typically be a real user bug.
Comment 2 Vincent Lefèvre 2024-12-28 23:23:27 UTC
I can understand that valgrind outputs a message, but the cast to the signed type should just be internal to valgrind. The output message should show the actual value.
Comment 3 Mark Wielaard 2024-12-28 23:59:46 UTC
Maybe we should output both. Even if accurate, I find the gcc output not very helpful. It might technically be a large unsigned value, it most likely is a small negative value (cast to an unsigned type).
Comment 4 Vincent Lefèvre 2024-12-29 02:22:34 UTC
The GCC output is helpful in the sense that it shows that the value is huge; the exact value really does not matter: malloc will fail to allocate the memory for any size of this order of magnitude. The "it most likely is a small negative value (cast to an unsigned type)" may be true only if it is actually a small negative value (in magnitude). But with

#include <stdlib.h>
int main (void)
{
  void *p = malloc ((size_t) 0x8000000000000000);
  return p != NULL;
}

valgrind currently outputs

Argument 'size' of function malloc has a fishy (possibly negative) value: -9223372036854775808

while in this case, I do not see a cast of a negative value to an unsigned type as being the probable cause.

I would tend to say that something wrong occurring with negative values is likely only when the width of the value converted to signed is roughly at most the width of the size_t type divided by 2. For instance, on a machine with a 64-bit size_t, if an internal computation of the size is done with signed 32-bit arithmetic and overflows, one may get a 32-bit negative value, and after it is converted to the unsigned 64-bit type size_t, one gets a huge positive value instead of the expected 32-bit positive value.

So, perhaps output the corresponding negative value in addition to the positive value, but IMHO, only when the negative value is small enough in magnitude.
Comment 5 Paul Floyd 2024-12-29 13:15:59 UTC
You are getting it the wrong way round. This error message isn't about library internals. It's about user code. The kind of thing that this is protecting against is

https://godbolt.org/z/8e7vrd55j

Because of C's terrible implicit conversions, signed int will convert to size_t without a squeak.

I agree with Mark that GCC's error is a lot less useful than the Memcheck one and Philippe that this was done deliberately to detect user bugs. This is a runtime error, not compile time. We have no way of knowing whether the original input was signed or unsigned. It's just a bunch of bits.

Trying to allocate 0x8000000000000000 or more is always going to fail. Perhaps it might be possible with a huge system in maybe 10 or 20 years time.

This is slightly more of an issue on 32bit systems where using more than half of the VM space for an application is easy these days. Thankfully 32bit systems are on the way out.
Comment 6 Vincent Lefèvre 2024-12-29 15:40:04 UTC
The point is that the current message is misleading. It gives the impression that size_t is signed, while it is actually unsigned. At least it should not give incorrect information to a user who deliberately provides a huge value to malloc().
Comment 7 Paul Floyd 2024-12-29 17:12:04 UTC
I still strongly disagree. Unsigned long values that have the same bit pattern as negative signed long are still ridiculously high.

Even with a ludicrous amount of overcommit and swap we're still a long way from having systems with 8 exbibytes of physical memory.
Comment 8 Vincent Lefèvre 2024-12-29 21:15:13 UTC
You did not understand. Such huge values are precisely used to check the correct handling of allocation failures in a testsuite.

Note that they may also occur in production, e.g. in arbitrary-precision software, just because the user asks something that triggers the use of huge integers. However, valgrind is usually not involved in such cases.
Comment 9 Florian Krohm 2024-12-29 21:53:09 UTC
(In reply to Vincent Lefèvre from comment #0)

> [...]
> ==1153312== Argument 'size' of function malloc has a fishy (possibly
> negative) value: -1

I agree that this message could be improved. Clearly, the bit pattern being passed as 'size' argument is being
interpreted as an unsigned value. So it cannot possibly be negative.
Perhaps:
Argument 'size' of function malloc is excessively large: <value in hex>.
Perhaps a negative  value of <negative value here> was passed.

I guess most people will agree that ~(SizeT)0 >>1 as malloc size is also likely a symptom of a bug. But as it does not pass valgrind's fishiness definition there will be no complaint. Which makes me think that being able to provide a fishy value threshold on the command line might be worth considering.
Comment 10 Vincent Lefèvre 2024-12-30 00:12:13 UTC
(In reply to Florian Krohm from comment #9)
> I agree that this message could be improved. Clearly, the bit pattern being
> passed as 'size' argument is being
> interpreted as an unsigned value. So it cannot possibly be negative.
> Perhaps:
> Argument 'size' of function malloc is excessively large: <value in hex>.
> Perhaps a negative  value of <negative value here> was passed.

Yes, this would be much better.

> I guess most people will agree that ~(SizeT)0 >>1 as malloc size is also
> likely a symptom of a bug. But as it does not pass valgrind's fishiness
> definition there will be no complaint. Which makes me think that being able
> to provide a fishy value threshold on the command line might be worth
> considering.

IIRC, the only cases where I saw a huge allocation request were:
* A test with a huge value on purpose as in the GNU MPFR testsuite (not a bug).
* An attempt to allocate the space just because I was asking too much (not a bug), probably with GP/PARI or Maple.
* The following bug in GMP: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=863217 where a negative value was really be passed, due to a size computation with the 32-bit int type and an integer overflow: -2^31 (which is INT_MIN). I think that this bug was fixed by the following commit, which changed the type of some variables from int to size_t: https://gmplib.org/repo/gmp/rev/725745f3637f
Comment 11 Paul Floyd 2024-12-30 09:08:51 UTC
The smallest unsigned long that also looks like a signed long is 0x8000000000000000. That's 8 exbibytes.

How many 256Gbyte DDR modules is that? About 33.5 million of them (32 mebimodules if you like). Let's say 20W each. That means over 600MW just for the system RAM. That's about 5 large data centres. It's going to take a couple of decades of Moore's Law (assuming that it doesn't run out of steam or in this case atoms) before that becomes practical.

The two cases where I see this arising are
1. Bugs due to signed to unsigned conversion
2. Artificial testcases.

For 32bit there is a fairly strong case for allowing addresses in the top half of the address space. I see no justification at all for 64bit (at least probably not in my lifetime).

We could change the message to be something like

"Argument '%s' of function %s has a fishy (possibly negative) value: %ld (signed) or %lu (unsigned)\n",

That would give

==4026== Argument 'size' of function malloc has a fishy (possibly negative) value: -1 (signed) 18446744073709551615 (unsigned)
==4026==    at 0x484D2E4: malloc (vg_replace_malloc.c:450)
==4026==    by 0x2017E5: main (malloc3.c:15)

or on 32 bit

==4037== Argument 'size' of function malloc has a fishy (possibly negative) value: -1 (signed) 4294967295 (unsigned)
==4037==    at 0x74C1C42: malloc (vg_replace_malloc.c:450)
==4037==    by 0x401629: main (malloc3.c:15)

With a size of 0x8000000000000000 that gives

==4064== Argument 'size' of function malloc has a fishy (possibly negative) value: -9223372036854775808 (signed) 9223372036854775808 (unsigned)
==4064==    at 0x484D2E4: malloc (vg_replace_malloc.c:450)
==4064==    by 0x201846: main (malloc3.c:19)

In this case both signed and unsigned values are nonsense. I don't see that as much of an improvement.
Comment 12 Vincent Lefèvre 2024-12-30 10:33:26 UTC
(In reply to Paul Floyd from comment #11)
> In this case both signed and unsigned values are nonsense. I don't see that
> as much of an improvement.

The issue is not the value itself, but the fact that valgrind currently says that the value is negative. This gives confusing information for the user. When I first saw this error, I started to blame my compiler / C library for providing a signed size_t type.
Comment 13 Paul Floyd 2024-12-30 13:00:39 UTC
(In reply to Vincent Lefèvre from comment #12)

> The issue is not the value itself, but the fact that valgrind currently says
> that the value is negative. This gives confusing information for the user.
> When I first saw this error, I started to blame my compiler / C library for
> providing a signed size_t type.

It's saying "(possibly negative)" and giving what would be the negative value.
Comment 14 Vincent Lefèvre 2024-12-30 14:08:46 UTC
(In reply to Paul Floyd from comment #13)
> It's saying "(possibly negative)" and giving what would be the negative
> value.

This is still confusing, because with an unsigned size_t, a negative value is not possible.
Or you should say: "possibly a negative value converted to size_t".
Comment 15 Paul Floyd 2024-12-30 15:29:42 UTC
(In reply to Vincent Lefèvre from comment #14)

> This is still confusing, because with an unsigned size_t, a negative value
> is not possible.
> Or you should say: "possibly a negative value converted to size_t".

You're still thinking of this from the perspective of the malloc implementation. Users don't care about that. They just call malloc. 

Thanks to implicit conversion the user isn't even limited to just positive and negative values. malloc will happily take floating point and  complex and imaginary numbers as well. None of those are likely, the point I'm making is that just about anything will get implicitly converted. Above all ***we have no idea what the user wrote in their call to malloc***· All that we are saying is that it is fishy and possibly signed negative.

I just had a look on GitHub for issues related to this message.

One bug, the fix was to correctly initialize a flag and I can't follow the connection between that flag and the error
One testcase
Two look like a junk values
One using a char for size which probably wrapped then then was implicitly converted
One due to wrapping on a 32bit system
Another one due to wrapping, using an int instead of a long
A junk value due to reading from an outdated json format
Wrong value due to string not null teminated
Another problem with wrapping

The majority look to me like signed integer values with a problem related to wrapping (initial type too small).