Bug 483711 - -m32 flag breaks Valgrind
Summary: -m32 flag breaks Valgrind
Status: REPORTED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (show other bugs)
Version: 3.16.1
Platform: unspecified Unspecified
Importance: NOR crash
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-03-15 23:33 UTC by railway_bylaws.0b
Modified: 2024-12-16 00:31 UTC (History)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Description railway_bylaws.0b 2024-03-15 23:33:30 UTC
SUMMARY

I wanted to debug 32-bit code in an x86-64 Docker container running on my ARM64 machine. After compiling with gcc and the appropriate flags (-m32, -g), I get the following error message from valgrind:

==961== Memcheck, a memory error detector
==961== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==961== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==961== Command: ./test
==961==

valgrind: m_scheduler/scheduler.c:1704 (vgPlain_scheduler): the 'impossible' happened.
valgrind: VG_(scheduler), phase 3: run_innerloop detected host state invariant failure

host stacktrace:
==961==    at 0x5803EF67: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/memcheck-x86-linux)
==961==    by 0x5803F090: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/memcheck-x86-linux)
==961==    by 0x5803F184: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/memcheck-x86-linux)
==961==    by 0x5809FEF0: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/memcheck-x86-linux)
==961==    by 0x580F5F22: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/memcheck-x86-linux)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 961)
==961==    at 0x40023FD: handle_preload_list (rtld.c:902)
==961==    by 0x4005541: dl_main (rtld.c:1735)
==961==    by 0x401A055: _dl_sysdep_start (dl-sysdep.c:252)
==961==    by 0x4001FEC: _dl_start_final (rtld.c:485)
==961==    by 0x4001FEC: _dl_start (rtld.c:575)
==961==    by 0x40010BA: ??? (in /lib/i386-linux-gnu/ld-2.31.so)
client stack range: [0x3F7FF000 0x3F800FFF] client SP: 0x3F7FFA20
valgrind stack range: [0x22FA6000 0x230A5FFF] top usage: 5676 of 1048576

This doesn't happen when I compile without the -m32 flag. I have reproduced the issue on both Ubuntu and Debian.
I have read the documentation looking for anything that points me in the right direction, but I haven't found anything helpful. I am afraid this is a bug.

STEPS TO REPRODUCE
(Given you are inside a Docker container)
1. Compile any .c file using gcc and the -m32 flag
2. Run valgrind

OBSERVED RESULT
the "impossible" happened

EXPECTED RESULT
Successful debugging session
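
For reference, the steps above can be sketched as a small script. This is only an illustration: the trivial `test.c`, the helper names `run` and `repro_m32`, and the `DRY_RUN` switch are assumptions and not part of the original report. It also assumes a gcc that accepts -m32 and has the 32-bit runtime installed.

```shell
# Sketch of the reproduction steps. Set DRY_RUN=1 to print the commands
# instead of executing them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

repro_m32() {
    # Hypothetical minimal program; the report says any .c file will do.
    workdir=$(mktemp -d) || return 1
    printf '%s\n' 'int main(void) { return 0; }' > "$workdir/test.c"

    # Step 1: compile as 32-bit code with debug info.
    run gcc -m32 -g -o "$workdir/test" "$workdir/test.c" || return 1

    # Step 2: run the result under valgrind.
    run valgrind "$workdir/test"
}
```

Calling `repro_m32` inside the container runs both steps; with `DRY_RUN=1` set it only prints the two command lines.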
Comment 1 Paul Floyd 2024-03-16 10:53:03 UTC
3.16.1 is fairly old, can you try something more recent?

Docker doesn't seem to be a good environment for running Valgrind. I've never used it but I keep hearing of problems. I don't think that Docker does a good enough job of virtualization. Valgrind works better either on a real OS or on full virtualization systems like VirtualBox and VMware.
Comment 2 railway_bylaws.0b 2024-03-16 15:53:17 UTC
Of course, I have just repeated the same steps with ver 3.22.0, but unfortunately the outcome remains the same, albeit a bit more detailed.


==10121== Command: ./test
==10121==

valgrind: m_scheduler/scheduler.c:1730 (vgPlain_scheduler): the 'impossible' happened.
valgrind: VG_(scheduler), phase 3: run_innerloop detected host state invariant failure

host stacktrace:
==10121==    at 0x5803B997: show_sched_status_wrk (m_libcassert.c:407)
==10121==    by 0x5803BAD0: report_and_quit (m_libcassert.c:478)
==10121==    by 0x5803BBC4: vgPlain_assert_fail (m_libcassert.c:544)
==10121==    by 0x580951C8: vgPlain_scheduler (scheduler.c:1757)
==10121==    by 0x580E615E: thread_wrapper (syswrap-linux.c:102)
==10121==    by 0x580E615E: run_a_thread_NORETURN (syswrap-linux.c:155)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 10121)
==10121==    at 0x40023FD: handle_preload_list (rtld.c:902)
==10121==    by 0x4005541: dl_main (rtld.c:1735)
==10121==    by 0x401A055: _dl_sysdep_start (dl-sysdep.c:252)
==10121==    by 0x4001FEC: _dl_start_final (rtld.c:485)
==10121==    by 0x4001FEC: _dl_start (rtld.c:575)
==10121==    by 0x40010BA: ??? (in /lib/i386-linux-gnu/ld-2.31.so)
client stack range: [0x3F7FF000 0x3F800FFF] client SP: 0x3F7FFA60
valgrind stack range: [0x2326F000 0x2336EFFF] top usage: 6792 of 1048576


I know containerization poses some challenges, but I am trying to work with as little disk space as possible.
Comment 3 railway_bylaws.0b 2024-03-16 15:55:38 UTC
(In reply to railway_bylaws.0b from comment #2)

I forgot to stress that the issue only presents itself when compiling with the -m32 flag; valgrind behaves as expected otherwise, even in a containerized environment.
Comment 4 Paul Floyd 2024-03-16 16:54:45 UTC
Can you provide an exe that reproduces the problem?

As far as I know none of the Valgrind developers use Docker, and x86 isn't actively supported, so there's not much chance of this being fixed.

It might help if you could build Valgrind from source and get a proper stacktrace from Valgrind. Many Linux packagers have decided that they know better than the Valgrind developers and they strip the Valgrind binaries.
Comment 5 Mark Wielaard 2024-03-16 17:16:05 UTC
Various x86_64 (amd64) builders on builder.sourceware.org are docker based. I think valgrind works fine under a linux container/docker setup. But none of them run x86 (32bit).

The comments around the assert read:

      case VG_TRC_INVARIANT_FAILED:
         /* This typically happens if, after running generated code,
            it is detected that host CPU settings (eg, FPU/Vector
            control words) are not as they should be.  Vex's code
            generation specifies the state such control words should
            be in on entry to Vex-generated code, and they should be
            unchanged on exit from it.  Failure of this assertion
            usually means a bug in Vex's code generation. */
         //{ UInt xx;
         //  __asm__ __volatile__ (
         //     "\t.word 0xEEF12A10\n"  // fmrx r2,fpscr
         //     "\tmov %0, r2" : "=r"(xx) : : "r2" );
         //  VG_(printf)("QQQQ new fpscr = %08x\n", xx);
         //}
         vg_assert2(0, "VG_(scheduler), phase 3: "
                       "run_innerloop detected host "
                       "state invariant failure", trc);
Comment 6 John Reiser 2024-12-15 23:02:33 UTC
TLDR: RFE: configure script should support 32-bit arm targets on 64-bit Linux for arm64 (aarch64)

Details:
The `configure` script for valgrind believes that specifying "gcc -m32" is all that is necessary to build a valgrind that analyzes 32-bit programs.  This may be true on amd64 (x86_64 hardware and OS running i686 targets) and perhaps other "big iron" machines such as SPARC, but is false for arm64 (aarch64 and OS) running arm ("arm32") targets.  In particular, "gcc -m32" fails on Debian trixie Linux (now version "testing", will become "stable" Debian-13 in summer 2025 [7 months from now]).  So this Comment is a RFE (Request For Enhancement) to handle "Secondary build architecture" for arm64/arm32.

Running 64-bit  Debian-testing Linux operating system (version "trixie") on  RaspberryPi model 3 B V1.2 hardware ("arm64" or Aarch64-A), the valgrind configure script ends with output:
=====
                    Version: 3.25.0.GIT
         Maximum build arch: arm64
         Primary build arch: arm64
       Secondary build arch: 
                   Build OS: linux
     Link Time Optimisation: no
       Primary build target: ARM64_LINUX
     Secondary build target: 
           Platform variant: vanilla
      Primary -DVGPV string: -DVGPV_arm64_linux_vanilla=1
         Default supp files: ./xfree-3.supp ./xfree-4.supp glibc-2.X-drd.supp glibc-2.X-helgrind.supp glibc-2.X.supp
=====
Note that the Secondary build arch is empty. The RFE: there should be a way for the configure script to decide that "arm32" is a legitimate target. Unfortunately it's complicated, because there are at least two 32-bit arm targets: 'arm' (base level "vanilla" hardware, v7 and above, sometimes called 'arm7hl' ) and 'armhf' (v7 plus hardware instructions for floating point arithmetic), and the procedure calling conventions are DIFFERENT for floating-point values.  [Things would be simpler if valgrind (or at least memcheck) would guarantee not to use any floating point values internally, so that the same compiled binary 32-bit code for memcheck would run identically on both 32-bit flavors.]

On my RPi with 64-bit Debian "trixie", I have the packages:
=====
||/ Name           Version      Architecture Description
+++-==============-============-============-=================================
ii  gcc-12         12.4.0-2     arm64        GNU C compiler
ii  gcc-13         13.3.0-8     arm64        GNU C compiler
ii  gcc-14         14.2.0-8     arm64        GNU C compiler
=====
and each of them fails to recognize "-m32":
=====
$ gcc-12 -m32 --version
gcc-12: error: unrecognized command-line option '-m32'
gcc-12 (Debian 12.4.0-2) 12.4.0
$ gcc-13 -m32 --version
gcc-13: error: unrecognized command-line option '-m32'
gcc-13 (Debian 13.3.0-8) 13.3.0
$ gcc -m32 --version
gcc: error: unrecognized command-line option '-m32'
gcc (Debian 14.2.0-8) 14.2.0
=====
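
A probe for whether a given compiler accepts a flag such as "-m32" could look roughly like the sketch below. The function name `accepts_flag` is hypothetical, and this is not how the valgrind configure script currently decides; it only illustrates the kind of test that distinguishes the gcc and clang behaviour shown here.

```shell
# Return success if compiler $1 accepts flag $2 when checking a trivial
# C program read from stdin. All compiler output is discarded.
accepts_flag() {
    printf '%s\n' 'int main(void) { return 0; }' |
        "$1" "$2" -x c -fsyntax-only - >/dev/null 2>&1
}

# Example use (compiler names taken from the listing above):
#   for cc in gcc-12 gcc-13 gcc-14 clang-16; do
#       if accepts_flag "$cc" -m32; then echo "$cc: accepts -m32"
#       else echo "$cc: rejects -m32"; fi
#   done
```

Because only the exit status is used, the same check works for any flag, not just "-m32".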

For compiling 32-bit arm software on arm64, I have the packages
   gcc-arm-linux-gnueabihf: /usr/bin/arm-linux-gnueabihf-gcc
   gcc-arm-linux-gnueabi: /usr/bin/arm-linux-gnueabi-gcc
which already generate 32-bit code without requiring "-m32" parameter, and also
=====
$ /usr/bin/clang-14 -m32 --version
Debian clang version 14.0.6
Target: arm-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

$ /usr/bin/clang-16 -m32 --version
Debian clang version 16.0.6 (20)
Target: arm-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
=====
So clang DOES recognize "-m32".  But clang  says that another parameter
should be specified, too:
=====
$ clang-16 -m32 hello.c
clang: warning: unknown platform, assuming -mfloat-abi=soft
clang: warning: unknown platform, assuming -mfloat-abi=soft
clang: warning: unknown platform, assuming -mfloat-abi=soft
clang: warning: unknown platform, assuming -mfloat-abi=soft
clang: warning: unknown platform, assuming -mfloat-abi=soft
=====

Workaround: before running the configure script, I will
   export CC=$HOME/bin/my_gcc_32_or_64
so that $(CC) designates a shell script
that figures out which compiler to run:
=====
   new_args=""
   mode="-m64"  # default on this 64-bit machine

   # Pick the last explicit -m32 or -m64 specifier (and delete it)
   for arg in "$@"; do case "$arg" in
    -m32) mode="-m32" ;;
    -m64) mode="-m64" ;;
    *)  new_args="$new_args $arg" ;;
  esac; done

  case "$mode" in
    -m32) exec arm-linux-gnueabi-gcc  $new_args ;;
    -m64) exec gcc-14  $new_args ;;
  esac
=====
[and figure out how to handle embedded whitespace within an arg.]
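
On the embedded-whitespace question: one possible approach is to keep the surviving arguments in the positional parameters instead of flattening them into a single string. The sketch below is an assumption about how that could be done, not the script actually used; `split_mode` is a hypothetical name, and it prints its result rather than exec'ing a compiler so that the effect is visible.

```shell
# Strip -m32/-m64 from the argument list (the last one seen wins) while
# keeping every other argument intact, including embedded whitespace.
split_mode() {
    mode="-m64"          # default on a 64-bit host
    count=$#             # number of original arguments

    # "$@" is expanded once, before the loop starts, so appending to the
    # positional parameters inside the loop does not disturb the iteration.
    for arg in "$@"; do
        case "$arg" in
            -m32|-m64) mode="$arg" ;;        # remember it, then drop it
            *) set -- "$@" "$arg" ;;         # append survivor after originals
        esac
    done
    shift "$count"       # discard the originals; only survivors remain

    printf '%s\n' "$mode"
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}
```

In a real wrapper the two printf lines would be replaced by something like `exec "$compiler" "$@"`, with `$compiler` chosen from `$mode`; the quoted "$@" is what keeps each argument as a single word.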
Comment 7 John Reiser 2024-12-16 00:31:58 UTC
Current state after fixing the shell script, re-configuring, and re-building:

export CC=$HOME/bin/my_gcc_32_or_64
export CXX=$HOME/bin/my_g++_32_or_64

./configure --prefix=$HOME/local

===== my_gcc_32_or_64
#! /bin/bash
# set -x

   new_args=""
   mode="-m64"  # default on this 64-bit machine

   # Pick the last explicit -m32 or -m64 specifier (and delete it)
   for arg in "$@"; do case "$arg" in
    -m32) mode="-m32" ;;
    -m64) mode="-m64" ;;
    *)  new_args="$new_args $arg" ;;
  esac; done

  case "$mode" in
    -m32) exec /usr/bin/clang-14 -m32 -march=armv7 -mfloat-abi=soft $new_args ;;
    -m64) exec /usr/bin/clang-14 $new_args ;;
  esac

# EOF

===== my_g++_32_or_64
#! /bin/bash
# set -x

   new_args=""
   mode="-m64"  # default on this 64-bit machine

   # Pick the last explicit -m32 or -m64 specifier (and delete it)
   for arg in "$@"; do case "$arg" in
    -m32) mode="-m32" ;;
    -m64) mode="-m64" ;;
    *)  new_args="$new_args $arg" ;;
  esac; done

  case "$mode" in
    -m32) exec /usr/bin/g++-14 -m32  -march=armv7 -mfloat-abi=soft $new_args ;;
    -m64) exec /usr/bin/g++-14  $new_args ;;
  esac

# EOF

=====

./configure --prefix=$HOME/local
make -j2   ## most that 64-bit RaspberryPi model 3 b v1.2 can support without paging
make install

## 64-bit valgrind (memcheck) works properly

## 32-bit valgrind (memcheck) fails.
## ./hello32 is hello-world.c compiled for 32-bit execution
$ file ./hello32
./hello32: ELF 32-bit LSB pie executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.3, BuildID[sha1]=4a6b63ea5e184b17ed93d0be701b7c0cde42411f, for GNU/Linux 3.2.0, with debug_info, not stripped
$ $HOME/local/bin/valgrind ./a.out
valgrind: m_ume.c: can't open interpreter

$ readelf --segments ./hello32
Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  ARM_EXIDX      0x0006dc 0x000006dc 0x000006dc 0x00008 0x00008 R   0x4
  PHDR           0x000034 0x00000034 0x00000034 0x00140 0x00140 R   0x4
  INTERP         0x000198 0x00000198 0x00000198 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.3]
  LOAD           0x000000 0x00000000 0x00000000 0x00780 0x00780 R E 0x1000
  LOAD           0x000f08 0x00001f08 0x00001f08 0x00134 0x00138 RW  0x1000
  DYNAMIC        0x000f10 0x00001f10 0x00001f10 0x000f0 0x000f0 RW  0x4
  NOTE           0x000174 0x00000174 0x00000174 0x00024 0x00024 R   0x4
  NOTE           0x0006e8 0x000006e8 0x000006e8 0x00098 0x00098 R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10
  GNU_RELRO      0x000f08 0x00001f08 0x00001f08 0x000f8 0x000f8 R   0x1

$ ls -l /lib/ld-linux.so.3
ls: cannot access '/lib/ld-linux.so.3': No such file or directory
$ dpkg --search /lib/ld-linux.so.3
dpkg-query: no path found matching pattern /lib/ld-linux.so.3
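
The manual check above (read the INTERP segment, then see whether that path exists) can be sketched as a pair of helpers. The function names are hypothetical, and the parsing assumes the "Requesting program interpreter" line format shown in the readelf output above.

```shell
# Read `readelf --segments` output on stdin and print the interpreter path.
interp_from_readelf() {
    sed -n 's/.*Requesting program interpreter: \(.*\)]$/\1/p'
}

# Report whether the given interpreter path exists on this system.
interp_status() {
    if [ -e "$1" ]; then
        printf 'found: %s\n' "$1"
    else
        printf 'missing: %s\n' "$1"
    fi
}

# Example use:
#   interp_status "$(readelf --segments ./hello32 | interp_from_readelf)"
```

For the hello32 binary above this would be expected to report the interpreter as missing, which is consistent with the "can't open interpreter" message from m_ume.c.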