Bug 466172

Summary: SIGTRAP crash whenever getaddrinfo call is issued by valgrind
Product: [Developer tools] valgrind    Reporter: Mike J <do.not.spam.me.kde.bugzilla>
Component: memcheck    Assignee: Julian Seward <jseward>
Status: RESOLVED NOT A BUG    
Severity: crash CC: b_betts, mark, pjfloyd, thomas.akin
Priority: NOR    
Version First Reported In: 3.20.0   
Target Milestone: ---   
Platform: RedHat Enterprise Linux   
OS: Linux   

Description Mike J 2023-02-21 01:09:14 UTC
SUMMARY
On my RHEL 7.9 development system at work, valgrind dumps core and exits after a SIGTRAP is raised during a call to the C library function getaddrinfo(). This happens when running valgrind on our internal applications that call getaddrinfo(), and also when running it on the 'hostname -d' command; in each case a core file is produced. The valgrind output, and gdb output from the core file, are given below. The problem occurs with both the memcheck and callgrind tools.
The details below are from running valgrind on 'hostname -d'.

No failure occurs when valgrind runs hostname with no arguments, because that does not make a getaddrinfo call.
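A minimal C sketch that isolates the failing call may help when trying to reproduce this outside hostname. It assumes the hints used by hostname.c (SOCK_DGRAM and AI_CANONNAME, as stepped through in comment 4) and assumes that 'hostname -d' prints the part of ai_canonname after the first dot. domain_part() and lookup_domain() are hypothetical names, not functions from hostname or glibc.

```c
/* Sketch of the getaddrinfo() call made by 'hostname -d'.
 * Assumptions: hints mirror hostname.c lines 334-336; the domain is
 * taken to be the text after the first '.' of ai_canonname. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Return the part of a canonical name after the first '.', or NULL. */
const char *domain_part(const char *canonname)
{
    const char *dot;

    if (canonname == NULL)
        return NULL;
    dot = strchr(canonname, '.');
    return dot ? dot + 1 : NULL;
}

/* Resolve 'host' with the same hints as hostname.c and copy the domain
 * into 'out' (truncated if it does not fit). Returns 0 on success and
 * nonzero on any failure (bad arguments, a getaddrinfo error code, or
 * no domain in the canonical name). */
int lookup_domain(const char *host, char *out, size_t outlen)
{
    struct addrinfo hints, *res = NULL;
    const char *dom;
    int ret;

    if (host == NULL || out == NULL || outlen == 0)
        return -1;

    memset(&hints, 0, sizeof(struct addrinfo));
    hints.ai_socktype = SOCK_DGRAM;
    hints.ai_flags = AI_CANONNAME;

    /* The call that crashes under valgrind in this report. */
    ret = getaddrinfo(host, NULL, &hints, &res);
    if (ret != 0)
        return ret;

    dom = domain_part(res->ai_canonname);
    if (dom == NULL) {
        freeaddrinfo(res);
        return -1;
    }
    snprintf(out, outlen, "%s", dom);
    freeaddrinfo(res);
    return 0;
}
```

For example, a small main() could pass the name returned by gethostname() to lookup_domain(); building that with -g and running it under valgrind --trace-signals=yes would show whether the SIGTRAP also appears without hostname involved.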

Similar results were seen with the RHEL 7.9 packaged valgrind 3.15 as with a locally compiled valgrind 3.20 release. The stack trace in the core file does not make much sense because the system doesn't currently have debuginfo packages installed. I'm asking a sysadmin to install them for the glibc and hostname packages, but they are not available at the moment.

On my home machine under VirtualBox, with RHEL 7.9, RHEL 8.7 and Fedora 35, running valgrind with signal tracing enabled on 'hostname -d' shows two SIGSEGV signals being raised; valgrind handles both and the command completes successfully. I can't say whether this should or shouldn't happen; I've only used it to establish an expected baseline. When hostname -d is run under strace instead of valgrind, no signals are raised. When strace is run on valgrind running hostname -d, both strace and valgrind report the two SIGSEGV signals.

STEPS TO REPRODUCE
1. valgrind --trace-signals=yes -v hostname -d

OBSERVED RESULT
Valgrind handles a single SIGSEGV and continues, then receives a SIGTRAP and dumps core. The valgrind output indicates that the crash happened during a getaddrinfo call.

EXPECTED RESULT
The program being run under valgrind should survive the call to getaddrinfo and continue running, so that memory leak checking can be carried out.

SOFTWARE/OS VERSIONS
Linux: RHEL 7.9 server without GUI installed

ADDITIONAL INFORMATION
Valgrind generates the following output:
==80114== Memcheck, a memory error detector
==80114== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==80114== Using Valgrind-3.20.0-5147d671e4-20221024 and LibVEX; rerun with -h for copyright info
==80114== Command: hostname -d
==80114== Parent PID: 20454
==80114== 
--80114-- 
--80114-- Valgrind options:
--80114--    --trace-signals=yes
--80114--    -v
--80114--    --log-file=valgrind.out
--80114-- Contents of /proc/version:
--80114--   Linux version 3.10.0-1160.81.1.el7.x86_64 (mockbuild@x86-vm-38.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Thu Nov 24 12:21:22 UTC 2022
--80114-- 
--80114-- Arch and hwcaps: AMD64, LittleEndian, amd64-cx16-lzcnt-rdtscp-sse3-ssse3-avx-avx2-bmi-f16c-rdrand
--80114-- Page sizes: currently 4096, max supported 4096
--80114-- Valgrind library directory: /home/auser/local/libexec/valgrind
--80114-- Reading syms from /usr/bin/hostname
--80114--    object doesn't have a symbol table
--80114-- Reading syms from /usr/lib64/ld-2.17.so
--80114-- Reading syms from /home/auser/local/libexec/valgrind/memcheck-amd64-linux
--80114--    object doesn't have a dynamic symbol table
--80114-- Scheduler: using generic scheduler lock implementation.
--80114-- Max kernel-supported signal is 64, VG_SIGVGKILL is 64
--80114-- Reading suppressions file: /home/auser/local/libexec/valgrind/default.supp
==80114== embedded gdbserver: reading from /tmp/vgdb-pipe-from-vgdb-to-80114-by-auser-on-hostname.localdomain
==80114== embedded gdbserver: writing to   /tmp/vgdb-pipe-to-vgdb-from-80114-by-auser-on-hostname.localdomain
==80114== embedded gdbserver: shared mem   /tmp/vgdb-pipe-shared-mem-vgdb-80114-by-auser-on-hostname.localdomain
==80114== 
==80114== TO CONTROL THIS PROCESS USING vgdb (which you probably
==80114== don't want to do, unless you know exactly what you're doing,
==80114== or are doing some strange experiment):
==80114==   /home/auser/local/libexec/valgrind/../../bin/vgdb --pid=80114 ...command...
==80114== 
==80114== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==80114==   /path/to/gdb hostname
==80114== and then give GDB the following command
==80114==   target remote | /home/auser/local/libexec/valgrind/../../bin/vgdb --pid=80114
==80114== --pid is optional if only one valgrind process is running
==80114== 
--80114-- REDIR: 0x4019e40 (ld-linux-x86-64.so.2:strlen) redirected to 0x580cc9e5 (vgPlain_amd64_linux_REDIR_FOR_strlen)
--80114-- REDIR: 0x4019c10 (ld-linux-x86-64.so.2:index) redirected to 0x580cc9ff (vgPlain_amd64_linux_REDIR_FOR_index)
--80114-- sync signal handler: signal=11, si_code=1, EIP=0x4001cf1, eip=0x1003803576, from kernel
--80114-- SIGSEGV: si_code=1 faultaddr=0x1ffeffd9b8 tid=1 ESP=0x1ffeffd9b8 seg=0x1ffe001000-0x1ffeffdfff
--80114--        -> extended stack base to 0x1ffeffd000
--80114-- Reading syms from /home/auser/local/libexec/valgrind/vgpreload_core-amd64-linux.so
--80114-- Reading syms from /home/auser/local/libexec/valgrind/vgpreload_memcheck-amd64-linux.so
==80114== WARNING: new redirection conflicts with existing -- ignoring it
--80114--     old: 0x04019e40 (strlen              ) R-> (0000.0) 0x580cc9e5 vgPlain_amd64_linux_REDIR_FOR_strlen
--80114--     new: 0x04019e40 (strlen              ) R-> (2007.0) 0x04c30b10 strlen
--80114-- REDIR: 0x4019dc0 (ld-linux-x86-64.so.2:strcmp) redirected to 0x4c31d00 (strcmp)
--80114-- REDIR: 0x4019fa0 (ld-linux-x86-64.so.2:strncmp) redirected to 0x4c31420 (strncmp)
--80114-- REDIR: 0x401aa80 (ld-linux-x86-64.so.2:mempcpy) redirected to 0x4c35d90 (mempcpy)
--80114-- REDIR: 0x401abd0 (ld-linux-x86-64.so.2:stpcpy) redirected to 0x4c349d0 (stpcpy)
--80114-- Reading syms from /usr/lib64/liboneagentproc.so
--80114--    object doesn't have a symbol table
--80114-- Reading syms from /usr/lib64/libnsl-2.17.so
--80114-- Reading syms from /usr/lib64/libc-2.17.so
==80114== WARNING: new redirection conflicts with existing -- ignoring it
--80114--     old: 0x052e8000 (memalign            ) R-> (1011.0) 0x04c2fde2 memalign
--80114--     new: 0x052e8000 (memalign            ) R-> (1017.0) 0x04c2fdb2 aligned_alloc
==80114== WARNING: new redirection conflicts with existing -- ignoring it
--80114--     old: 0x052e8000 (memalign            ) R-> (1011.0) 0x04c2fde2 memalign
--80114--     new: 0x052e8000 (memalign            ) R-> (1017.0) 0x04c2fd85 aligned_alloc
==80114== WARNING: new redirection conflicts with existing -- ignoring it
--80114--     old: 0x052e8000 (memalign            ) R-> (1011.0) 0x04c2fde2 memalign
--80114--     new: 0x052e8000 (memalign            ) R-> (1017.0) 0x04c2fdb2 aligned_alloc
==80114== WARNING: new redirection conflicts with existing -- ignoring it
--80114--     old: 0x052e8000 (memalign            ) R-> (1011.0) 0x04c2fde2 memalign
--80114--     new: 0x052e8000 (memalign            ) R-> (1017.0) 0x04c2fd85 aligned_alloc
--80114-- REDIR: 0x52f21d0 (libc.so.6:strcasecmp) redirected to 0x4a247af (_vgnU_ifunc_wrapper)
--80114-- REDIR: 0x52eef40 (libc.so.6:strnlen) redirected to 0x4a247af (_vgnU_ifunc_wrapper)
--80114-- REDIR: 0x52f44d0 (libc.so.6:strncasecmp) redirected to 0x4a247af (_vgnU_ifunc_wrapper)
--80114-- REDIR: 0x52f19a0 (libc.so.6:memset) redirected to 0x4a247af (_vgnU_ifunc_wrapper)
--80114-- REDIR: 0x52f1950 (libc.so.6:memcpy@GLIBC_2.2.5) redirected to 0x4a247af (_vgnU_ifunc_wrapper)
--80114-- REDIR: 0x52f08b0 (libc.so.6:strncpy) redirected to 0x4a247af (_vgnU_ifunc_wrapper)
--80114-- REDIR: 0x52ef020 (libc.so.6:strncmp) redirected to 0x4a247af (_vgnU_ifunc_wrapper)
--80114-- REDIR: 0x52ee850 (libc.so.6:strcpy) redirected to 0x4a247af (_vgnU_ifunc_wrapper)
--80114-- REDIR: 0x52f2020 (libc.so.6:stpcpy) redirected to 0x4a247af (_vgnU_ifunc_wrapper)
--80114-- REDIR: 0x52eee10 (libc.so.6:strlen) redirected to 0x4a247af (_vgnU_ifunc_wrapper)
--80114-- REDIR: 0x52ed300 (libc.so.6:index) redirected to 0x4a247af (_vgnU_ifunc_wrapper)
--80114-- REDIR: 0x52f1380 (libc.so.6:bcmp) redirected to 0x4a247af (_vgnU_ifunc_wrapper)
--80114-- REDIR: 0x52f8270 (libc.so.6:rawmemchr) redirected to 0x4a247af (_vgnU_ifunc_wrapper)
--80114-- REDIR: 0x52ed3c0 (libc.so.6:strcmp) redirected to 0x4a247af (_vgnU_ifunc_wrapper)
--80114-- REDIR: 0x52f6bc0 (libc.so.6:memcpy@@GLIBC_2.14) redirected to 0x4a247af (_vgnU_ifunc_wrapper)
--80114-- REDIR: 0x52f1b00 (libc.so.6:mempcpy) redirected to 0x4a247af (_vgnU_ifunc_wrapper)
--80114-- REDIR: 0x5307fd0 (libc.so.6:strstr) redirected to 0x4a247af (_vgnU_ifunc_wrapper)
--80114-- REDIR: 0x52f0930 (libc.so.6:__GI_strrchr) redirected to 0x4c304d0 (__GI_strrchr)
--80114-- REDIR: 0x52f6c30 (libc.so.6:__GI_memcpy) redirected to 0x4c329b0 (__GI_memcpy)
--80114-- REDIR: 0x52eee60 (libc.so.6:__GI_strlen) redirected to 0x4c30a70 (__GI_strlen)
--80114-- REDIR: 0x52f08f0 (libc.so.6:rindex) redirected to 0x4a247af (_vgnU_ifunc_wrapper)
--80114-- REDIR: 0x53a2d50 (libc.so.6:__strrchr_sse42) redirected to 0x4c30560 (__strrchr_sse42)
--80114-- REDIR: 0x53a0fc0 (libc.so.6:__strcmp_sse42) redirected to 0x4c31cb0 (__strcmp_sse42)
--80114-- REDIR: 0x52ed340 (libc.so.6:__GI_strchr) redirected to 0x4c30600 (__GI_strchr)
--80114-- REDIR: 0x52e7740 (libc.so.6:malloc) redirected to 0x4c2b100 (malloc)
--80114-- REDIR: 0x52f1030 (libc.so.6:memchr) redirected to 0x4c31da0 (memchr)
--80114-- delivering signal 5 (SIGTRAP):1 to thread 1
--80114-- delivering 5 (code 1) to default handler; action: terminate+core
==80114== 
==80114== Process terminating with default action of signal 5 (SIGTRAP): dumping core
--80114--        -> extended stack base to 0x1ffefff000
==80114==    at 0x53495E1: getaddrinfo (in /usr/lib64/libc-2.17.so)
==80114==    by 0x1FFEFFFA7F: ???
==80114==    by 0x529B225: getenv (in /usr/lib64/libc-2.17.so)
==80114== 
==80114== HEAP SUMMARY:
==80114==     in use at exit: 128 bytes in 1 blocks
==80114==   total heap usage: 1 allocs, 0 frees, 128 bytes allocated
==80114== 
==80114== Searching for pointers to 1 not-freed blocks
==80114== Checked 2,422,200 bytes
==80114== 
==80114== LEAK SUMMARY:
==80114==    definitely lost: 128 bytes in 1 blocks
==80114==    indirectly lost: 0 bytes in 0 blocks
==80114==      possibly lost: 0 bytes in 0 blocks
==80114==    still reachable: 0 bytes in 0 blocks
==80114==         suppressed: 0 bytes in 0 blocks
==80114== Rerun with --leak-check=full to see details of leaked memory
==80114== 
==80114== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Trace/breakpoint trap (core dumped)
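The numeric signal codes in this output can be read using the POSIX si_code names. The C sketch below maps the pairs seen here, using the Linux values: signal 11 with si_code=1 is SIGSEGV/SEGV_MAPERR (address not mapped), and signal 5 with code 1 is SIGTRAP/TRAP_BRKPT. sigcode_name() is a hypothetical helper; it only names the codes and does not by itself say what raised the trap.

```c
/* Expose the XSI si_code constants (e.g. TRAP_BRKPT) in glibc headers. */
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <signal.h>
#include <stddef.h>

/* Fallbacks with the Linux values, in case the headers hide them. */
#ifndef SEGV_MAPERR
#define SEGV_MAPERR 1
#endif
#ifndef SEGV_ACCERR
#define SEGV_ACCERR 2
#endif
#ifndef TRAP_BRKPT
#define TRAP_BRKPT 1
#endif
#ifndef TRAP_TRACE
#define TRAP_TRACE 2
#endif

/* Hypothetical helper: name a (signal, si_code) pair from the traces.
 * Only a few codes are handled; anything else returns NULL. */
const char *sigcode_name(int signo, int code)
{
    switch (signo) {
    case SIGSEGV:
        if (code == SEGV_MAPERR) return "SEGV_MAPERR"; /* address not mapped */
        if (code == SEGV_ACCERR) return "SEGV_ACCERR"; /* invalid permissions */
        break;
    case SIGTRAP:
        if (code == TRAP_BRKPT) return "TRAP_BRKPT";   /* process breakpoint */
        if (code == TRAP_TRACE) return "TRAP_TRACE";   /* process trace trap */
        break;
    }
    return NULL;
}
```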

@@@@@@@@@@@@@@@@@@@@@@
GDB output from a subsequent run:

gdb `which valgrind` vgcore.78953
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/auser/local/bin/valgrind...done.

warning: core file may not match specified executable file.
[New LWP 78953]
Core was generated by `'.
Program terminated with signal 5, Trace/breakpoint trap.
#0  0x00000000053495e1 in ?? ()
(gdb) where
#0  0x00000000053495e1 in ?? ()
#1  0x0000000000401b19 in myvprintf_str (send=0x1ffefff970, send_arg2=0x1, flags=<optimized out>, width=<optimized out>,
    str=<optimized out>, capitalise=1 '\001') at m_debuglog.c:786
#2  0x0000000000000000 in ?? ()
(gdb) 

The frame "#0  0x00000000053495e1 in ?? ()" in the gdb output matches this line of the valgrind output:
==80114==    at 0x53495E1: getaddrinfo (in /usr/lib64/libc-2.17.so)
Comment 1 Paul Floyd 2023-02-21 09:18:11 UTC
This might be fairly tricky to reproduce. I can't reproduce this on a RHEL 7.9 machine using vas4 ldap.

What network config are you using?

I do get

SYSCALL[21118,1](12) sys_brk ( 0x0 ) --> [pre-success] Success(0x4224000) 
--21118-- REDIR: 0x4019e40 (ld-linux-x86-64.so.2:strlen) redirected to 0x580c7ed5 (???)
--21118-- sync signal handler: signal=11, si_code=1, EIP=0x4001f49, eip=0x1002bb608b, from kernel
--21118-- SIGSEGV: si_code=1 faultaddr=0x1ffeffdf80 tid=1 ESP=0x1ffeffdf30 seg=0x1ffe801000-0x1ffeffdfff
--21118:1: signals extending a stack base 0x1ffeffe000 down by 4096 new base 0x1ffeffd000 to cover 0x1ffeffd000
--21118--        -> extended stack base to 0x1ffeffd000
SYSCALL[21118,1](63) sys_newuname ( 0x1ffeffdd6a )[sync] --> Success(0x0) 
--21118-- REDIR: 0x4019c10 (ld-linux-x86-64.so.2:index) redirected to 0x580c7eef (???)
SYSCALL[21118,1](9) sys_mmap ( 0x0, 4096, 3, 34, -1, 0 ) --> [pre-success] Success(0x4022000) 
--21118-- sync signal handler: signal=11, si_code=1, EIP=0x4001cf1, eip=0x1002bd28ae, from kernel
--21118-- SIGSEGV: si_code=1 faultaddr=0x1ffeffcef8 tid=1 ESP=0x1ffeffcef8 seg=0x1ffe801000-0x1ffeffcfff
--21118:1: signals extending a stack base 0x1ffeffd000 down by 4096 new base 0x1ffeffc000 to cover 0x1ffeffc000
--21118--        -> extended stack base to 0x1ffeffc000
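The SIGSEGV lines in these traces appear to be valgrind growing the client stack on demand: each "signals extending a stack base ... down by 4096" line shows the new base as the faulting address rounded down to a 4096-byte page. The C sketch below reproduces only that arithmetic; new_stack_base() and stack_growth() are hypothetical names and this is not valgrind's code.

```c
#include <stdint.h>

/* Hypothetical helper: page-aligned base needed to cover fault_addr,
 * i.e. the address rounded down. page_size must be a power of two. */
uint64_t new_stack_base(uint64_t fault_addr, uint64_t page_size)
{
    return fault_addr & ~(page_size - 1);
}

/* Bytes the stack grows by when moving down from old_base to cover
 * fault_addr (0 if the address is already covered). */
uint64_t stack_growth(uint64_t old_base, uint64_t fault_addr, uint64_t page_size)
{
    uint64_t nb = new_stack_base(fault_addr, page_size);

    return nb < old_base ? old_base - nb : 0;
}
```

With the values from the trace above, stack_growth(0x1ffeffe000, 0x1ffeffdf80, 4096) gives 4096 and a new base of 0x1ffeffd000, matching the first extension line.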
Comment 2 Mark Wielaard 2023-02-21 10:44:19 UTC
(In reply to Paul Floyd from comment #1)
> This might be fairly tricky to reproduce. I can't reproduce this on a RHEL
> 7.9 machine using vas4 ldap.

I am also unable to reproduce this. It might help to install the glibc debuginfo to get a better idea of where the issue comes from.
Comment 3 Mike J 2023-02-21 21:20:23 UTC
Thanks. The sysadmins installed the correct debuginfo for glibc and hostname today, but I won't have collated results until 23rd Feb. I'll provide an update then.
Comment 4 Mike J 2023-02-27 18:14:45 UTC
We noticed that Dynatrace OneAgent is installed, which preloads one of its libraries by adding it to /etc/ld.so.preload. We originally thought it might be involved in the problem, but today we concluded that it is not, after doing two valgrind runs of hostname -d with gdb attached, one with and one without the preloaded library.

Details of the two runs are shown below with gdb output, in the hope that somebody can spot something untoward that valgrind may be doing.
In the first run, /etc/ld.so.preload is set up to dynamically link a Dynatrace OneAgent library into every program started.
In the second run, /etc/ld.so.preload is renamed and ldconfig is run to rebuild the runtime shared library cache.
GDB is attached once valgrind has started.
Breakpoints are set on show_name and getaddrinfo, and execution is stepped from the show_name breakpoint so that the dynamic linker can also be watched resolving the getaddrinfo call.
The initial step from show_name() into the getaddrinfo() call shows the dynamic linker resolving the call from the glibc library.
On first entry into the getaddrinfo function, the call stack is OK.
On the next step, the call stack becomes corrupted.
Continuing from there leads to a SIGSEGV, rather than a SIGTRAP, which crashes the program.
Both runs have the same outcome.
If the debugger is not attached, a SIGTRAP is raised instead, which crashes the program.

Lines marked with @@@@@@@@@@@@@ below are eye-catchers for the relevant notes.

@@@@@@@@@@@@@
First run
@@@@@@@@@@@@@

[auser@hostname ~]$ cat /etc/ld.so.preload
/$LIB/liboneagentproc.so

[auser@hostname ~]$ ldd /usr/bin/hostname
        linux-vdso.so.1 =>  (0x00007ffd02d96000)
        /$LIB/liboneagentproc.so => /lib64/liboneagentproc.so (0x00002af8ce652000)
        libnsl.so.1 => /usr/lib64/libnsl.so.1 (0x00002af8ce860000)
        libc.so.6 => /usr/lib64/libc.so.6 (0x00002af8cea7a000)
        /lib64/ld-linux-x86-64.so.2 (0x00002af8ce42e000)
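The /$LIB/liboneagentproc.so entry in /etc/ld.so.preload relies on the dynamic linker's $LIB token, which the ldd output shows expanding to /lib64 on this system. A simplified C sketch of that substitution follows; expand_lib_token() is hypothetical, expands only the first $LIB or ${LIB}, and does none of the extra validation that ld.so performs.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: replace the first "${LIB}" or "$LIB" in path with
 * libdir (e.g. "lib64"). Returns 0 on success, -1 if the result does not
 * fit in out (out then holds a truncated string). */
int expand_lib_token(const char *path, const char *libdir, char *out, size_t outlen)
{
    const char *tok = strstr(path, "${LIB}");
    size_t toklen = 6;
    int n;

    if (tok == NULL) {
        tok = strstr(path, "$LIB");
        toklen = 4;
    }
    if (tok == NULL)
        n = snprintf(out, outlen, "%s", path);
    else
        n = snprintf(out, outlen, "%.*s%s%s",
                     (int)(tok - path), path, libdir, tok + toklen);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Called as expand_lib_token("/$LIB/liboneagentproc.so", "lib64", buf, sizeof buf), it yields the /lib64/liboneagentproc.so path that ldd reports above.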

Terminal 1
valgrind --trace-signals=yes -v --log-file=valgrind.out.2  --vgdb=full --vgdb-stop-at=startup hostname -d   
Terminal 2
 cat valgrind.out.2
==77535== Memcheck, a memory error detector
==77535== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==77535== Using Valgrind-3.20.0-5147d671e4-20221024 and LibVEX; rerun with -h for copyright info
==77535== Command: hostname -d
==77535== Parent PID: 111647
==77535==
--77535--
--77535-- Valgrind options:
--77535--    --trace-signals=yes
--77535--    -v
--77535--    --log-file=valgrind.out.2
--77535--    --vgdb=full
--77535--    --vgdb-stop-at=startup
--77535-- Contents of /proc/version:
--77535--   Linux version 3.10.0-1160.81.1.el7.x86_64 (mockbuild@x86-vm-38.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Thu Nov 24 12:21:22 UTC 2022
--77535--
--77535-- Arch and hwcaps: AMD64, LittleEndian, amd64-cx16-lzcnt-rdtscp-sse3-ssse3-avx-avx2-bmi-f16c-rdrand
--77535-- Page sizes: currently 4096, max supported 4096
--77535-- Valgrind library directory: /home/auser/local/libexec/valgrind
--77535-- Reading syms from /usr/bin/hostname
--77535--   Considering /usr/lib/debug/.build-id/93/633698bd11eeb4bee21a388c191a5656990d8e.debug ..
--77535--   .. build-id is valid
--77535-- Reading syms from /usr/lib64/ld-2.17.so
--77535--   Considering /usr/lib/debug/.build-id/62/c449974331341bb08dcce3859560a22af1e172.debug ..
--77535--   .. build-id is valid
--77535-- Reading syms from /home/auser/local/libexec/valgrind/memcheck-amd64-linux
--77535--    object doesn't have a dynamic symbol table
--77535-- Scheduler: using generic scheduler lock implementation.
--77535-- Max kernel-supported signal is 64, VG_SIGVGKILL is 64
--77535-- Reading suppressions file: /home/auser/local/libexec/valgrind/default.supp
==77535== (action at startup) vgdb me ...
==77535== embedded gdbserver: reading from /tmp/vgdb-pipe-from-vgdb-to-77535-by-auser-on-hostname.localdomain
==77535== embedded gdbserver: writing to   /tmp/vgdb-pipe-to-vgdb-from-77535-by-auser-on-hostname.localdomain
==77535== embedded gdbserver: shared mem   /tmp/vgdb-pipe-shared-mem-vgdb-77535-by-auser-on-hostname.localdomain
==77535==
==77535== TO CONTROL THIS PROCESS USING vgdb (which you probably
==77535== don't want to do, unless you know exactly what you're doing,
==77535== or are doing some strange experiment):
==77535==   /home/auser/local/libexec/valgrind/../../bin/vgdb --pid=77535 ...command...
==77535==
==77535== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==77535==   /path/to/gdb hostname
==77535== and then give GDB the following command
==77535==   target remote | /home/auser/local/libexec/valgrind/../../bin/vgdb --pid=77535
==77535== --pid is optional if only one valgrind process is running
==77535==
[auser@hostname ~]$ gdb /usr/bin/hostname
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/hostname...Reading symbols from /usr/lib/debug/usr/bin/hostname.debug...done.
done.
(gdb) target remote | /home/auser/local/libexec/valgrind/../../bin/vgdb --pid=77535
Remote debugging using | /home/auser/local/libexec/valgrind/../../bin/vgdb --pid=77535
relaying data between gdb and process 77535
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug/usr/lib64/ld-2.17.so.debug...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x0000000004001140 in _start () from /lib64/ld-linux-x86-64.so.2
(gdb) break show_name
Breakpoint 1 at 0x401930: file hostname.c, line 256.
(gdb) break getaddrinfo
Breakpoint 2 at 0x401170
(gdb) cont
Continuing.
Missing separate debuginfo for /lib64/liboneagentproc.so
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/48/81feaea9f4a359f31684c530503b5c629d53af.debug

Breakpoint 1, show_name (type=type@entry=DNS) at hostname.c:256
256     {
(gdb) n
264             switch(type)
(gdb) n
334                             memset(&hints, 0, sizeof(struct addrinfo));
(gdb) n
335                             hints.ai_socktype = SOCK_DGRAM;
(gdb) n
336                             hints.ai_flags = AI_CANONNAME;
(gdb) n
338                             p = localhost();
(gdb) n
339                             if ((ret = getaddrinfo(p, NULL, &hints, &res)) != 0)
(gdb) s
_vgr20160ZU_ldZhlinuxZhx86Zh64ZdsoZd2_strcmp (s1=0x40078f "libc.so.6", s2=0x401cf5a "") at ../shared/vg_replace_strmem.c:941
941      STRCMP(VG_Z_LD_LINUX_X86_64_SO_2, strcmp)
(gdb) finish
Run till exit from #0  _vgr20160ZU_ldZhlinuxZhx86Zh64ZdsoZd2_strcmp (s1=0x40078f "libc.so.6", s2=0x401cf5a "")
    at ../shared/vg_replace_strmem.c:941
0x0000000004010a15 in _dl_name_match_p (name=0x40078f "libc.so.6", map=0x4223150) at dl-misc.c:305
305       if (strcmp (name, map->l_name) == 0)
Value returned is $1 = 1
(gdb) finish
Run till exit from #0  0x0000000004010a15 in _dl_name_match_p (name=0x40078f "libc.so.6", map=0x4223150) at dl-misc.c:305
0x0000000004009826 in do_lookup_x (new_hash=new_hash@entry=2089078220, old_hash=old_hash@entry=0x1ffefff460,
    result=result@entry=0x1ffefff470, scope=<optimized out>, i=<optimized out>, i@entry=0, flags=flags@entry=1, skip=skip@entry=0x0,
    undef_map=undef_map@entry=0x4223150) at dl-lookup.c:463
463               && __builtin_expect (_dl_name_match_p (version->filename, map), 0))
Value returned is $2 = 0
(gdb) finish
Run till exit from #0  0x0000000004009826 in do_lookup_x (new_hash=new_hash@entry=2089078220, old_hash=old_hash@entry=0x1ffefff460,
    result=result@entry=0x1ffefff470, scope=<optimized out>, i=<optimized out>, i@entry=0, flags=flags@entry=1, skip=skip@entry=0x0,
    undef_map=undef_map@entry=0x4223150) at dl-lookup.c:463
_dl_lookup_symbol_x (undef_name=0x400820 "getaddrinfo", undef_map=0x4223150, ref=ref@entry=0x1ffefff528, symbol_scope=0x42234a8,
    version=0x4023030, type_class=type_class@entry=1, flags=1, skip_map=skip_map@entry=0x0) at dl-lookup.c:742
742           if (res > 0)
Value returned is $3 = 1
(gdb) where
#0  _dl_lookup_symbol_x (undef_name=0x400820 "getaddrinfo", undef_map=0x4223150, ref=ref@entry=0x1ffefff528, symbol_scope=0x42234a8,
    version=0x4023030, type_class=type_class@entry=1, flags=1, skip_map=skip_map@entry=0x0) at dl-lookup.c:742
#1  0x000000000400edee in _dl_fixup (l=<optimized out>, reloc_arg=<optimized out>) at ../elf/dl-runtime.c:111
#2  0x00000000040169ea in _dl_runtime_resolve_xsave () at ../sysdeps/x86_64/dl-trampoline.h:131
#3  0x0000000000401b19 in show_name (type=type@entry=DNS) at hostname.c:339
#4  0x00000000004013e4 in main (argc=2, argv=0x1ffefffb88) at hostname.c:550
(gdb) finish
Run till exit from #0  _dl_lookup_symbol_x (undef_name=0x400820 "getaddrinfo", undef_map=0x4223150, ref=ref@entry=0x1ffefff528,
    symbol_scope=0x42234a8, version=0x4023030, type_class=type_class@entry=1, flags=1, skip_map=skip_map@entry=0x0) at dl-lookup.c:742
0x000000000400edee in _dl_fixup (l=<optimized out>, reloc_arg=<optimized out>) at ../elf/dl-runtime.c:111
111           result = _dl_lookup_symbol_x (strtab + sym->st_name, l, &sym, l->l_scope,
Value returned is $4 = (struct link_map *) 0x40249a8
(gdb) finish
Run till exit from #0  0x000000000400edee in _dl_fixup (l=<optimized out>, reloc_arg=<optimized out>) at ../elf/dl-runtime.c:111
_dl_runtime_resolve_xsave () at ../sysdeps/x86_64/dl-trampoline.h:132
132             mov %RAX_LP, %R11_LP    # Save return value
Value returned is $5 = 87340512
(gdb) where
#0  _dl_runtime_resolve_xsave () at ../sysdeps/x86_64/dl-trampoline.h:132
#1  0x0000000000401b19 in show_name (type=type@entry=DNS) at hostname.c:339
#2  0x00000000004013e4 in main (argc=2, argv=0x1ffefffb88) at hostname.c:550
(gdb) finish
Run till exit from #0  _dl_runtime_resolve_xsave () at ../sysdeps/x86_64/dl-trampoline.h:132

Breakpoint 2, __GI_getaddrinfo (name=0x5632040 "hostname.localdomain", service=service@entry=0x0,
    hints=hints@entry=0x1ffefff920, pai=pai@entry=0x1ffefff918) at ../sysdeps/posix/getaddrinfo.c:2208
2208    {
(gdb) where
#0  __GI_getaddrinfo (name=0x5632040 "hostname.localdomain", service=service@entry=0x0, hints=hints@entry=0x1ffefff920,
    pai=pai@entry=0x1ffefff918) at ../sysdeps/posix/getaddrinfo.c:2208
#1  0x0000000000401b19 in show_name (type=type@entry=DNS) at hostname.c:339
#2  0x00000000004013e4 in main (argc=2, argv=0x1ffefffb88) at hostname.c:550

@@@@@@@@@@@@@
The call stack becomes scrambled on the next step. Only frame #0 is shown correctly by 'where'.
@@@@@@@@@@@@@
(gdb) s
__GI_getaddrinfo (name=0x5632040 "hostname.localdomain", service=0x0, hints=0x1ffefff920, pai=0x1ffefff918)
    at ../sysdeps/posix/getaddrinfo.c:2215
2215      if (name != NULL && name[0] == '*' && name[1] == 0)
(gdb) where
#0  __GI_getaddrinfo (name=0x5632040 "hostname.localdomain", service=0x0, hints=0x1ffefff920, pai=0x1ffefff918)
    at ../sysdeps/posix/getaddrinfo.c:2215
#1  0x0000001ffefffa30 in ?? ()
#2  0x000000000529d226 in __GI_getenv (name=0x1ffefff920 "\002") at getenv.c:35
#3  0x0000000000000000 in ?? ()
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x0000001ffefffa30 in ?? ()
(gdb) cont
Continuing.

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb) quit

Terminal 1
Segmentation fault (core dumped)

@@@@@@@@@@@@@
Second run
@@@@@@@@@@@@@
[auser@hostname ~]$ ls -l /etc/ld.so.preload
ls: cannot access /etc/ld.so.preload: No such file or directory

[auser@hostname ~]$ ldd /usr/bin/hostname
        linux-vdso.so.1 =>  (0x00007fffad9b2000)
        libnsl.so.1 => /usr/lib64/libnsl.so.1 (0x00002b3691222000)
        libc.so.6 => /usr/lib64/libc.so.6 (0x00002b369143c000)
        /lib64/ld-linux-x86-64.so.2 (0x00002b3690ffe000)

Terminal 1
[auser@hostname ~]$ valgrind --trace-signals=yes -v --log-file=valgrind.out.3  --vgdb=full --vgdb-stop-at=startup hostname -d

Terminal 2
[auser@hostname ~]$ cat valgrind.out.3
==92915== Memcheck, a memory error detector
==92915== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==92915== Using Valgrind-3.20.0-5147d671e4-20221024 and LibVEX; rerun with -h for copyright info
==92915== Command: hostname -d
==92915== Parent PID: 73200
==92915==
--92915--
--92915-- Valgrind options:
--92915--    --trace-signals=yes
--92915--    -v
--92915--    --log-file=valgrind.out.3
--92915--    --vgdb=full
--92915--    --vgdb-stop-at=startup
--92915-- Contents of /proc/version:
--92915--   Linux version 3.10.0-1160.81.1.el7.x86_64 (mockbuild@x86-vm-38.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Thu Nov 24 12:21:22 UTC 2022
--92915--
--92915-- Arch and hwcaps: AMD64, LittleEndian, amd64-cx16-lzcnt-rdtscp-sse3-ssse3-avx-avx2-bmi-f16c-rdrand
--92915-- Page sizes: currently 4096, max supported 4096
--92915-- Valgrind library directory: /home/auser/local/libexec/valgrind
--92915-- Reading syms from /usr/bin/hostname
--92915--   Considering /usr/lib/debug/.build-id/93/633698bd11eeb4bee21a388c191a5656990d8e.debug ..
--92915--   .. build-id is valid
--92915-- Reading syms from /usr/lib64/ld-2.17.so
--92915--   Considering /usr/lib/debug/.build-id/62/c449974331341bb08dcce3859560a22af1e172.debug ..
--92915--   .. build-id is valid
--92915-- Reading syms from /home/auser/local/libexec/valgrind/memcheck-amd64-linux
--92915--    object doesn't have a dynamic symbol table
--92915-- Scheduler: using generic scheduler lock implementation.
--92915-- Max kernel-supported signal is 64, VG_SIGVGKILL is 64
--92915-- Reading suppressions file: /home/auser/local/libexec/valgrind/default.supp
==92915== (action at startup) vgdb me ...
==92915== embedded gdbserver: reading from /tmp/vgdb-pipe-from-vgdb-to-92915-by-auser-on-hostname.localdomain
==92915== embedded gdbserver: writing to   /tmp/vgdb-pipe-to-vgdb-from-92915-by-auser-on-hostname.localdomain
==92915== embedded gdbserver: shared mem   /tmp/vgdb-pipe-shared-mem-vgdb-92915-by-auser-on-hostname.localdomain
==92915==
==92915== TO CONTROL THIS PROCESS USING vgdb (which you probably
==92915== don't want to do, unless you know exactly what you're doing,
==92915== or are doing some strange experiment):
==92915==   /home/auser/local/libexec/valgrind/../../bin/vgdb --pid=92915 ...command...
==92915==
==92915== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==92915==   /path/to/gdb hostname
==92915== and then give GDB the following command
==92915==   target remote | /home/auser/local/libexec/valgrind/../../bin/vgdb --pid=92915
==92915== --pid is optional if only one valgrind process is running
==92915==
[auser@hostname ~]$ gdb /usr/bin/hostname
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/hostname...Reading symbols from /usr/lib/debug/usr/bin/hostname.debug...done.
done.
(gdb)  target remote | /home/auser/local/libexec/valgrind/../../bin/vgdb --pid=92915
Remote debugging using | /home/auser/local/libexec/valgrind/../../bin/vgdb --pid=92915
relaying data between gdb and process 92915
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug/usr/lib64/ld-2.17.so.debug...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x0000000004001140 in _start () from /lib64/ld-linux-x86-64.so.2
(gdb) break show_name
Breakpoint 1 at 0x401930: file hostname.c, line 256.
(gdb) break getaddrinfo
Breakpoint 2 at 0x401170
(gdb) cont
Continuing.

Breakpoint 1, show_name (type=type@entry=DNS) at hostname.c:256
256     {
(gdb) n
264             switch(type)
(gdb) n
334                             memset(&hints, 0, sizeof(struct addrinfo));
(gdb) n
335                             hints.ai_socktype = SOCK_DGRAM;
(gdb) n
336                             hints.ai_flags = AI_CANONNAME;
(gdb) n
338                             p = localhost();
(gdb) n
339                             if ((ret = getaddrinfo(p, NULL, &hints, &res)) != 0)
(gdb) s
_vgr20160ZU_ldZhlinuxZhx86Zh64ZdsoZd2_strcmp (s1=0x40078f "libc.so.6", s2=0x401cf5a "") at ../shared/vg_replace_strmem.c:941
941      STRCMP(VG_Z_LD_LINUX_X86_64_SO_2, strcmp)
(gdb) where
#0  _vgr20160ZU_ldZhlinuxZhx86Zh64ZdsoZd2_strcmp (s1=0x40078f "libc.so.6", s2=0x401cf5a "") at ../shared/vg_replace_strmem.c:941
#1  0x0000000004010a15 in _dl_name_match_p (name=0x40078f "libc.so.6", map=0x4223150) at dl-misc.c:305
#2  0x0000000004009826 in do_lookup_x (new_hash=new_hash@entry=2089078220, old_hash=old_hash@entry=0x1ffefff4a0,
    result=result@entry=0x1ffefff4b0, scope=<optimized out>, i=<optimized out>, i@entry=0, flags=flags@entry=1, skip=skip@entry=0x0,
    undef_map=undef_map@entry=0x4223150) at dl-lookup.c:463
#3  0x000000000400a09f in _dl_lookup_symbol_x (undef_name=0x400820 "getaddrinfo", undef_map=0x4223150, ref=ref@entry=0x1ffefff568,
    symbol_scope=0x42234a8, version=0x4034a90, type_class=type_class@entry=1, flags=1, skip_map=skip_map@entry=0x0) at dl-lookup.c:739
#4  0x000000000400edee in _dl_fixup (l=<optimized out>, reloc_arg=<optimized out>) at ../elf/dl-runtime.c:111
#5  0x00000000040169ea in _dl_runtime_resolve_xsave () at ../sysdeps/x86_64/dl-trampoline.h:131
#6  0x0000000000401b19 in show_name (type=type@entry=DNS) at hostname.c:339
#7  0x00000000004013e4 in main (argc=2, argv=0x1ffefffbd8) at hostname.c:550
(gdb) finish
Run till exit from #0  _vgr20160ZU_ldZhlinuxZhx86Zh64ZdsoZd2_strcmp (s1=0x40078f "libc.so.6", s2=0x401cf5a "")
    at ../shared/vg_replace_strmem.c:941
0x0000000004010a15 in _dl_name_match_p (name=0x40078f "libc.so.6", map=0x4223150) at dl-misc.c:305
305       if (strcmp (name, map->l_name) == 0)
Value returned is $1 = 1
(gdb) finish
Run till exit from #0  0x0000000004010a15 in _dl_name_match_p (name=0x40078f "libc.so.6", map=0x4223150) at dl-misc.c:305
0x0000000004009826 in do_lookup_x (new_hash=new_hash@entry=2089078220, old_hash=old_hash@entry=0x1ffefff4a0,
    result=result@entry=0x1ffefff4b0, scope=<optimized out>, i=<optimized out>, i@entry=0, flags=flags@entry=1, skip=skip@entry=0x0,
    undef_map=undef_map@entry=0x4223150) at dl-lookup.c:463
463               && __builtin_expect (_dl_name_match_p (version->filename, map), 0))
Value returned is $2 = 0
(gdb) finish
Run till exit from #0  0x0000000004009826 in do_lookup_x (new_hash=new_hash@entry=2089078220, old_hash=old_hash@entry=0x1ffefff4a0,
    result=result@entry=0x1ffefff4b0, scope=<optimized out>, i=<optimized out>, i@entry=0, flags=flags@entry=1, skip=skip@entry=0x0,
    undef_map=undef_map@entry=0x4223150) at dl-lookup.c:463
_dl_lookup_symbol_x (undef_name=0x400820 "getaddrinfo", undef_map=0x4223150, ref=ref@entry=0x1ffefff568, symbol_scope=0x42234a8,
    version=0x4034a90, type_class=type_class@entry=1, flags=1, skip_map=skip_map@entry=0x0) at dl-lookup.c:742
742           if (res > 0)
Value returned is $3 = 1
(gdb) finish
Run till exit from #0  _dl_lookup_symbol_x (undef_name=0x400820 "getaddrinfo", undef_map=0x4223150, ref=ref@entry=0x1ffefff568,
    symbol_scope=0x42234a8, version=0x4034a90, type_class=type_class@entry=1, flags=1, skip_map=skip_map@entry=0x0) at dl-lookup.c:742
0x000000000400edee in _dl_fixup (l=<optimized out>, reloc_arg=<optimized out>) at ../elf/dl-runtime.c:111
111           result = _dl_lookup_symbol_x (strtab + sym->st_name, l, &sym, l->l_scope,
Value returned is $4 = (struct link_map *) 0x40344c8
(gdb) finish
Run till exit from #0  0x000000000400edee in _dl_fixup (l=<optimized out>, reloc_arg=<optimized out>) at ../elf/dl-runtime.c:111
_dl_runtime_resolve_xsave () at ../sysdeps/x86_64/dl-trampoline.h:132
132             mov %RAX_LP, %R11_LP    # Save return value
Value returned is $5 = 85186016
(gdb) finish
Run till exit from #0  _dl_runtime_resolve_xsave () at ../sysdeps/x86_64/dl-trampoline.h:132

Breakpoint 2, __GI_getaddrinfo (name=0x5424040 "hostname.localdomain", service=service@entry=0x0,
    hints=hints@entry=0x1ffefff970, pai=pai@entry=0x1ffefff968) at ../sysdeps/posix/getaddrinfo.c:2208
2208    {
(gdb) where
#0  __GI_getaddrinfo (name=0x5424040 "hostname.localdomain", service=service@entry=0x0, hints=hints@entry=0x1ffefff970,
    pai=pai@entry=0x1ffefff968) at ../sysdeps/posix/getaddrinfo.c:2208
#1  0x0000000000401b19 in show_name (type=type@entry=DNS) at hostname.c:339
#2  0x00000000004013e4 in main (argc=2, argv=0x1ffefffbd8) at hostname.c:550

@@@@@@@@@@@@@
Call stack becomes scrambled on the next step. Only frame #0 is shown correctly by 'where'.
@@@@@@@@@@@@@
(gdb) s
__GI_getaddrinfo (name=0x5424040 "hostname.localdomain", service=0x0, hints=0x1ffefff970, pai=0x1ffefff968)
    at ../sysdeps/posix/getaddrinfo.c:2215
2215      if (name != NULL && name[0] == '*' && name[1] == 0)
(gdb) where
#0  __GI_getaddrinfo (name=0x5424040 "hostname.localdomain", service=0x0, hints=0x1ffefff970, pai=0x1ffefff968)
    at ../sysdeps/posix/getaddrinfo.c:2215
#1  0x0000001ffefffa80 in ?? ()
#2  0x000000000508f226 in __GI_getenv (name=0x1ffefff970 "\002") at getenv.c:35
#3  0x0000000000000000 in ?? ()
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x0000001ffefffa80 in ?? ()
(gdb) cont
Continuing.

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb) quit
[auser@hostname ~]$

Terminal 1
Segmentation fault (core dumped)
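The getaddrinfo() call that the session steps into (hostname.c lines 334-339 above) can be reproduced without hostname in a few lines of C. This is a minimal sketch, not hostname's own code. It copies the hints shown above (SOCK_DGRAM plus AI_CANONNAME). The helper names `lookup_canonname` and `domain_part` are hypothetical, and `domain_part` only approximates what `hostname -d` prints. The comment on the call notes where the four arguments travel under the x86-64 System V calling convention.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve `name` with the same hints hostname.c uses
 * above and copy the canonical name (or "" if none) into buf.
 * Returns the getaddrinfo() status: 0 on success, an EAI_* code otherwise. */
int lookup_canonname(const char *name, char *buf, size_t len)
{
    struct addrinfo hints, *res = NULL;

    memset(&hints, 0, sizeof(struct addrinfo));
    hints.ai_socktype = SOCK_DGRAM;
    hints.ai_flags = AI_CANONNAME;

    /* Four integer/pointer arguments, so on x86-64 System V they are all
     * passed in registers: name -> %rdi, service -> %rsi,
     * &hints -> %rdx, &res -> %rcx. */
    int ret = getaddrinfo(name, NULL, &hints, &res);
    if (ret != 0)
        return ret;

    if (len > 0) {
        const char *canon = (res->ai_canonname != NULL) ? res->ai_canonname : "";
        snprintf(buf, len, "%s", canon);
    }
    freeaddrinfo(res);
    return 0;
}

/* Hypothetical helper: the part of a fully qualified name after the
 * first dot, or "" when there is no dot. */
const char *domain_part(const char *fqdn)
{
    const char *dot = strchr(fqdn, '.');
    return (dot != NULL) ? dot + 1 : "";
}
```

A small main() that calls, say, `lookup_canonname("hostname.localdomain", buf, sizeof buf)` and is then run under valgrind would show whether the crash depends on hostname at all; this has not been tried on the affected system.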
Comment 5 Paul Floyd 2023-02-27 21:07:18 UTC
Thanks for the detailed analysis.

You're stepping through the code in ld.so that resolves the PIC stuff (the PLT). It seems to resolve to a function address, but then the call to that function fails. Unless the function lookup is going wrong, I can't think what the problem could be.

"getaddrinfo" only takes 4 arguments so they will be passed in registers.

The asm for getaddrinfo starts like this:

00000000000efb00 <getaddrinfo@@GLIBC_2.2.5>:
   efb00:       55                      push   %rbp
   efb01:       48 89 e5                mov    %rsp,%rbp
   efb04:       41 57                   push   %r15
   efb06:       41 56                   push   %r14
   efb08:       41 55                   push   %r13
   efb0a:       41 54                   push   %r12

If you do a 'step in' with gdb, do you see the next instruction being this push %rbp?

(I usually do this in TUI mode: Ctrl-x a to enable it, then Ctrl-x 2 to split the screen, repeating until I see the source / asm / command panels.)
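The same question can also be answered without a debugger by reading the first byte of getaddrinfo from inside a process: 0x55 is push %rbp (as in the disassembly above) and 0xCC is int3. The sketch below is a hypothetical helper, not part of valgrind or glibc. It looks the symbol up with the POSIX dlopen(NULL, ...)/dlsym() pair, which may need -ldl when linking against older glibc such as RHEL 7's. Run it without a debugger attached, since a debugger's own software breakpoints are also int3 bytes.

```c
#include <dlfcn.h>
#include <stddef.h>

/* x86-64 opcode bytes of interest. */
enum { OPCODE_PUSH_RBP = 0x55, OPCODE_INT3 = 0xCC };

/* Hypothetical helper: return the first code byte of `symbol` as this
 * process currently sees it, or -1 if the symbol cannot be found. */
int first_code_byte(const char *symbol)
{
    /* dlopen(NULL) returns a handle for the global symbol scope:
     * the executable plus the libraries it loaded, such as libc. */
    void *self = dlopen(NULL, RTLD_LAZY);
    if (self == NULL)
        return -1;

    void *addr = dlsym(self, symbol);
    int byte = -1;
    if (addr != NULL) {
        /* Reading code through a data pointer is not portable ISO C,
         * but text pages are readable on Linux/x86-64. */
        byte = *(const unsigned char *)addr;
    }

    dlclose(self);
    return byte;
}

/* Nonzero if `byte` is the one-byte int3 breakpoint opcode. */
int looks_like_breakpoint(int byte)
{
    return byte == OPCODE_INT3;
}

/* Nonzero if `byte` is the push %rbp opcode seen in the disassembly. */
int looks_like_push_rbp(int byte)
{
    return byte == OPCODE_PUSH_RBP;
}
```

A value other than 0x55 is not alarming by itself (CET-enabled builds may start functions with endbr64, whose first byte is 0xF3); 0xCC is the case that would match the int3 reported in the TUI output.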
Comment 6 Mike J 2023-02-28 22:00:23 UTC
Thanks, Paul. I was unaware of TUI mode; it's really useful.

The following extract is from the TUI asm and command windows.
It shows an int3 rather than a push %rbp on the initial entry, where the call stack still looks normal.
After a stepi (rather than the step I tried previously), the call stack has become corrupted.

Is the int3 likely to be something that valgrind might introduce instead of the push %rbp ?

B+>0x534b5e0 <__GI_getaddrinfo>    int3
   0x534b5e1 <__GI_getaddrinfo+1>  mov    %rsp,%rbp
   0x534b5e4 <__GI_getaddrinfo+4>  push   %r15
   0x534b5e6 <__GI_getaddrinfo+6>  push   %r14
   0x534b5e8 <__GI_getaddrinfo+8>  mov    %rdi,%r14
   0x534b5eb <__GI_getaddrinfo+11> push   %r13
   0x534b5ed <__GI_getaddrinfo+13> mov    %rsi,%r13
   0x534b5f0 <__GI_getaddrinfo+16> push   %r12
   0x534b5f2 <__GI_getaddrinfo+18> mov    %rdx,%r12
   0x534b5f5 <__GI_getaddrinfo+21> push   %rbx
   0x534b5f6 <__GI_getaddrinfo+22> sub    $0x518,%rsp
   0x534b5fd <__GI_getaddrinfo+29> test   %rdi,%rdi
   0x534b600 <__GI_getaddrinfo+32> mov    %rcx,-0x530(%rbp)

(gdb) where
#0  __GI_getaddrinfo (name=0x5632040 "hostname.localdomain", service=service@entry=0x0, hints=hints@entry=0x1ffefff930,
    pai=pai@entry=0x1ffefff928) at ../sysdeps/posix/getaddrinfo.c:2208
#1  0x0000000000401b19 in show_name (type=type@entry=DNS) at hostname.c:339
#2  0x00000000004013e4 in main (argc=2, argv=0x1ffefffb98) at hostname.c:550

(gdb) stepi

  >0x534b5e1 <__GI_getaddrinfo+1>  mov    %rsp,%rbp
   0x534b5e4 <__GI_getaddrinfo+4>  push   %r15
   0x534b5e6 <__GI_getaddrinfo+6>  push   %r14
   0x534b5e8 <__GI_getaddrinfo+8>  mov    %rdi,%r14
   0x534b5eb <__GI_getaddrinfo+11> push   %r13
   0x534b5ed <__GI_getaddrinfo+13> mov    %rsi,%r13
   0x534b5f0 <__GI_getaddrinfo+16> push   %r12
   0x534b5f2 <__GI_getaddrinfo+18> mov    %rdx,%r12
   0x534b5f5 <__GI_getaddrinfo+21> push   %rbx
   0x534b5f6 <__GI_getaddrinfo+22> sub    $0x518,%rsp
   0x534b5fd <__GI_getaddrinfo+29> test   %rdi,%rdi
   0x534b600 <__GI_getaddrinfo+32> mov    %rcx,-0x530(%rbp)
   0x534b607 <__GI_getaddrinfo+39> movq   $0x0,-0x4c0(%rbp)

(gdb) where
#0  0x000000000534b5e1 in __GI_getaddrinfo (name=0x5632040 "hostname.localdomain", service=0x0, hints=0x1ffefff930, pai=0x1ffefff928)
    at ../sysdeps/posix/getaddrinfo.c:2208
#1  0x0000001ffefffa40 in ?? ()
#2  0x000000000529d226 in __GI_getenv (name=0x1ffefff930 "\002") at getenv.c:35
#3  0x0000000000000000 in ?? ()
Comment 7 Paul Floyd 2023-02-28 22:18:05 UTC
(In reply to Mike J from comment #6)
> Thanks, Paul. I was unaware of TUI mode; it's really useful.
> 
> The following extract is from the TUI asm and command windows.
> It shows an int3 rather than a push %rbp on the initial entry, where the call
> stack still shows as being normal.
> On the stepi (rather than a step tried previously), the call stack has then
> become corrupted.
> 
> Is the int3 likely to be something that valgrind might introduce instead of
> the push %rbp ?

int3 causes the application to stop if it is being debugged, or otherwise (as the title of this item says) to terminate with SIGTRAP. Valgrind doesn't use ptrace the way debuggers do, and it shouldn't be inserting an int3, since that would still cause a SIGTRAP.
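The int3 behaviour described above can be observed in isolation. The x86-only sketch below installs a SIGTRAP handler, executes a single int3, and counts how many times the handler ran; `run_int3_once` and `on_sigtrap` are hypothetical names. Without the handler, SIGTRAP's default action terminates the process with a core dump, consistent with what the reporter saw. It illustrates only what the CPU and kernel do with an int3 byte, not how Valgrind handles one.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t trap_count = 0;

/* SIGTRAP handler: just count the trap. */
static void on_sigtrap(int sig)
{
    (void)sig;
    trap_count++;
}

/* Hypothetical helper: execute one int3 with on_sigtrap installed and
 * return how many traps were counted (expected: 1), or -1 on error or
 * on non-x86 targets. The previous SIGTRAP disposition is restored. */
int run_int3_once(void)
{
#if defined(__x86_64__) || defined(__i386__)
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigtrap;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTRAP, &sa, &old) != 0)
        return -1;

    trap_count = 0;
    __asm__ volatile("int3");   /* the single 0xCC byte */

    sigaction(SIGTRAP, &old, NULL);
    return (int)trap_count;
#else
    return -1;
#endif
}
```

Because int3 is a trap rather than a fault, the saved instruction pointer already points past the one-byte 0xCC opcode, so execution continues normally once the handler returns.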

I don't yet have any ideas what could cause the memory to be corrupted.
Comment 8 Mike J 2023-03-29 19:49:03 UTC
Hi.

This bug can be closed. It is not caused by valgrind.

In case it is of use to anyone in future: further checks showed that TaniumClient version 7.4.9.1046 was running on the system and caused the problem. Valgrind worked on the system with an earlier TaniumClient release, but stopped working when the TaniumClient package was upgraded late last year.

As indirectly noted in the earlier comments, the address of the C library getaddrinfo() function is resolved dynamically (through the PLT) the first time an application calls it. I took valgrind out of the picture and ran "/usr/bin/hostname -d" under the gdb debugger.
- With TaniumClient running, on entry to getaddrinfo(), the int3 instruction is seen instead of the expected push %rbp instruction, corrupting the call stack and raising a SIGTRAP. If run with valgrind, valgrind catches the raised SIGTRAP signal in this case and exits with a core dump, showing a corrupt call stack.
- When TaniumClient is stopped, the expected push %rbp instruction is instead seen. If run with valgrind, it runs correctly and completes normally.

Thanks to Paul Floyd and Mark Wielaard for looking into the problem and for the debugging advice, which pointed me in the right direction.
Comment 9 Paul Floyd 2023-03-30 07:01:19 UTC
Thanks for letting us know.

I also find that slightly disconcerting: the big corporation I work for also uses Tanium, though presumably without the setting or option that caused this problem.
Comment 10 b_betts 2023-05-03 21:58:12 UTC
I found that osqueryd can also cause the same problem (running under Ubuntu 16.04).
Comment 11 Paul Floyd 2023-05-04 05:50:39 UTC
(In reply to b_betts from comment #10)
> I found that osqueryd can also cause the same problem (running under Ubuntu
> 16.04).

Interesting. I had a quick look and can't see much that would cause it. osquery uses a lot of 3rd party libs, so it is probably in one of those.
Comment 12 Paul Floyd 2023-05-26 12:53:45 UTC
I asked Tanium about this. This is their answer:

> XXXXXXXX  (Tanium UK)
>
> Hi YYYYYYYY (Customer),
>
> Unfortunately this is a recent issue we've discovered. It's related to Recorder (possibly recording DNS events). Our development team
>  are looking into this, but I can't give any timeframes for resolution.
>
> Some people have had success by upgrading their kernel. The only other workaround is to switch the THR profile to Tools only to disable the
> THR Recorder extension on affected endpoints.
>
> I would encourage you to log a case with our support centre, just to get your account formally registered as being affected by this issue.
Comment 13 Thomas Akin 2023-08-24 15:52:48 UTC
The issue isn't caused by Tanium; it manifests itself after the Tanium recorder is configured to use eBPF to capture DNS activity on certain kernel versions. However, the actual bug appears to be in eBPF, the kernel, or the debugger.

You can reproduce the issue without Tanium even being installed:

# dnf install bpftrace
# bpftrace -e 'uprobe:libc:getaddrinfo {}' &
# valgrind hostname -d

We updated our configuration options so that you can work around the issue by disabling DNS events on affected systems while still running the recorder, using eBPF, for all other events. As it's an underlying issue on the systems themselves, all we can do is let you avoid the configuration that triggers it.