Bug 490182

Summary: memcheck fails with message "Assertion `offsetB < 4096' failed" when using a large number of active registers and tracking origins on aarch64 machine
Product: [Developer tools] valgrind Reporter: cbeauchene93
Component: memcheck Assignee: Julian Seward <jseward>
Status: REPORTED ---    
Severity: crash CC: pjfloyd, wash
Priority: NOR    
Version First Reported In: 3.19.0   
Target Milestone: ---   
Platform: RedHat Enterprise Linux   
OS: Linux   
Latest Commit: Version Fixed/Implemented In:
Sentry Crash Report:
Attachments: Repro code

Description cbeauchene93 2024-07-12 15:33:35 UTC
Created attachment 171609
Repro code

SUMMARY
Valgrind fails with the messages "Assertion `offsetB < 4096' failed" and "valgrind: the 'impossible' happened: LibVEX called failure_exit().". This happens consistently on my aarch64 Amazon Linux 2 machine (m6g.4xlarge EC2 instance) when running memcheck on binaries that use a "high" number of active registers, but only when running with "--track-origins=yes".

STEPS TO REPRODUCE
1. Ensure you are running on an arm64 machine (I am using an m6g.4xlarge EC2 instance)
2. Download the attached .tgz file, untar, and cd into this dir
3. Download a C++ compiler (I am using g++ (GCC) 7.3.1 20180712 (Red Hat 7.3.1-17))
4. Compile the repro code with the "bad" function implementation (uses a high number of registers): g++ valgrind-repro.cpp valgrind-bug.S -o valgrind-bug
5. Run the binary under memcheck using track-origins: valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes ./valgrind-bug and you should see the failure.
6. Run without track-origins and you should see a success
7. Additionally, you can compile the repro code with the "ok" function implementation (uses fewer active registers): g++ valgrind-repro.cpp valgrind-ok.S -o valgrind-ok
8. Run this under memcheck with track origins and you should see a success as well.

OBSERVED RESULT
==20438== Memcheck, a memory error detector
==20438== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==20438== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==20438== Command: ./valgrind-bug
==20438==

vex: priv/host_arm64_defs.c:2829 (genSpill_ARM64): Assertion `offsetB < 4096' failed.
vex storage: T total 105861288 bytes allocated
vex storage: P total 0 bytes allocated

valgrind: the 'impossible' happened:
   LibVEX called failure_exit().

host stacktrace:
==20438==    at 0x58051B88: ??? (in /usr/libexec/valgrind/memcheck-arm64-linux)
==20438==    by 0x58051CC7: ??? (in /usr/libexec/valgrind/memcheck-arm64-linux)
==20438==    by 0x58051EF7: ??? (in /usr/libexec/valgrind/memcheck-arm64-linux)
==20438==    by 0x58051F17: ??? (in /usr/libexec/valgrind/memcheck-arm64-linux)
==20438==    by 0x5806B017: ??? (in /usr/libexec/valgrind/memcheck-arm64-linux)
==20438==    by 0x58156FE3: ??? (in /usr/libexec/valgrind/memcheck-arm64-linux)
==20438==    by 0x581BA3E7: ??? (in /usr/libexec/valgrind/memcheck-arm64-linux)
==20438==    by 0x58002CFF: ??? (in /usr/libexec/valgrind/memcheck-arm64-linux)
==20438==    by 0x581B2AF7: ??? (in /usr/libexec/valgrind/memcheck-arm64-linux)
==20438==    by 0x5815414F: ??? (in /usr/libexec/valgrind/memcheck-arm64-linux)
==20438==    by 0x5806DC1F: ??? (in /usr/libexec/valgrind/memcheck-arm64-linux)
==20438==    by 0x580B709F: ??? (in /usr/libexec/valgrind/memcheck-arm64-linux)
==20438==    by 0x58108FDF: ??? (in /usr/libexec/valgrind/memcheck-arm64-linux)
==20438==    by 0xFFFFFFFFFFFFFFFF: ???

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 20438)
==20438==    at 0x400704: main (in valgrind-bug)
client stack range: [0x1FFEFFE000 0x1FFF000FFF] client SP: 0x1FFF0003A0
valgrind stack range: [0x1008FB8000 0x10090B7FFF] top usage: 14624 of 1048576

EXPECTED RESULT
==26287== Memcheck, a memory error detector
==26287== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==26287== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==26287== Command: ./valgrind-ok
==26287==
==26287==
==26287== HEAP SUMMARY:
==26287==     in use at exit: 0 bytes in 0 blocks
==26287==   total heap usage: 1 allocs, 1 frees, 72,704 bytes allocated
==26287==
==26287== All heap blocks were freed -- no leaks are possible
==26287==
==26287== For lists of detected and suppressed errors, rerun with: -s
==26287== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Please let me know if there is any additional information that is needed. Appreciate the help!
Comment 1 cbeauchene93 2024-07-12 15:38:59 UTC
Note: The C++ compiler version should not be too important here as we are using assembly code to reproduce this issue.
Comment 2 Paul Floyd 2024-07-13 17:07:03 UTC
That's probably this bit

      case HRcVec128: {
         HReg x21  = hregARM64_X21();  // baseblock
         HReg x9   = hregARM64_X9();   // spill temporary
         vassert(0 == (offsetB & 15)); // check sane alignment
         vassert(offsetB < 4096);

It looks like

   vreg_state[v_idx].spill_offset
         = toShort(con->guest_sizeB * 3 + ss_no * 8);

guest_sizeB is 944, so multiplied by 3 that's 2832.

ss_no is less than this constant

#  define N_SPILL64S (LibVEX_N_SPILL_BYTES / 8)

and

#define LibVEX_N_SPILL_BYTES 4096

Lastly

            for (ss_no = 0; ss_no < N_SPILL64S; ss_no++) {
               if (ss_busy_until_before[ss_no] <= vreg_state[v_idx].live_after)
                  break;
            }
            if (ss_no == N_SPILL64S) {
               vpanic("N_SPILL64S is too low in VEX. Increase and recompile.");
            }

That looks inconsistent with the asserts. ss_no can go up to 511 in the above code (reaching 512 triggers the vpanic). But the calculation of spill_offset includes the guest state plus 2 shadows (2832 bytes), which leaves only 1264 bytes below the 4096 limit, or 158 64-bit spill slots.

Not sure how to fix this.