| Summary: | SIGTRAP crash whenever getaddrinfo call is issued by valgrind | ||
|---|---|---|---|
| Product: | [Developer tools] valgrind | Reporter: | Mike J <do.not.spam.me.kde.bugzilla> |
| Component: | memcheck | Assignee: | Julian Seward <jseward> |
| Status: | RESOLVED NOT A BUG | ||
| Severity: | crash | CC: | b_betts, mark, pjfloyd, thomas.akin |
| Priority: | NOR | ||
| Version First Reported In: | 3.20.0 | ||
| Target Milestone: | --- | ||
| Platform: | RedHat Enterprise Linux | ||
| OS: | Linux | ||
| Latest Commit: | Version Fixed/Implemented In: | ||
| Sentry Crash Report: | |||
|
Description
Mike J
2023-02-21 01:09:14 UTC
This might be fairly tricky to reproduce. I can't reproduce this on a RHEL 7.9 machine using vas4 ldap. What network config are you using?
I do get
SYSCALL[21118,1](12) sys_brk ( 0x0 ) --> [pre-success] Success(0x4224000)
--21118-- REDIR: 0x4019e40 (ld-linux-x86-64.so.2:strlen) redirected to 0x580c7ed5 (???)
--21118-- sync signal handler: signal=11, si_code=1, EIP=0x4001f49, eip=0x1002bb608b, from kernel
--21118-- SIGSEGV: si_code=1 faultaddr=0x1ffeffdf80 tid=1 ESP=0x1ffeffdf30 seg=0x1ffe801000-0x1ffeffdfff
--21118:1: signals extending a stack base 0x1ffeffe000 down by 4096 new base 0x1ffeffd000 to cover 0x1ffeffd000
--21118-- -> extended stack base to 0x1ffeffd000
SYSCALL[21118,1](63) sys_newuname ( 0x1ffeffdd6a )[sync] --> Success(0x0)
--21118-- REDIR: 0x4019c10 (ld-linux-x86-64.so.2:index) redirected to 0x580c7eef (???)
SYSCALL[21118,1](9) sys_mmap ( 0x0, 4096, 3, 34, -1, 0 ) --> [pre-success] Success(0x4022000)
--21118-- sync signal handler: signal=11, si_code=1, EIP=0x4001cf1, eip=0x1002bd28ae, from kernel
--21118-- SIGSEGV: si_code=1 faultaddr=0x1ffeffcef8 tid=1 ESP=0x1ffeffcef8 seg=0x1ffe801000-0x1ffeffcfff
--21118:1: signals extending a stack base 0x1ffeffd000 down by 4096 new base 0x1ffeffc000 to cover 0x1ffeffc000
--21118-- -> extended stack base to 0x1ffeffc000
(In reply to Paul Floyd from comment #1)
> This might be fairly tricky to reproduce. I can't reproduce this on a RHEL
> 7.9 machine using vas4 ldap.
I also am unable to reproduce. It might be helpful to install the glibc debuginfo to get a better idea where the issue comes from.
Thanks. Although the sysadmins installed the correct debuginfo for glibc and hostname today, I won't have collatable results from this until 23rd Feb. I'll provide an update then.
It was noted that we have Dynatrace OneAgent installed, which preloads one of its libraries by adding it to /etc/ld.so.preload.
Although we originally thought it might be involved in the problem, we concluded today that it is not, by doing two valgrind runs of hostname -d with gdb attached, with and without the preloaded library.
The details of the two runs are shown below with gdb output, in the hope that somebody can spot something untoward that valgrind may be doing.
In the first run, /etc/ld.so.preload is set up to dynamically link in a Dynatrace OneAgent library for each program started.
In the second run, /etc/ld.so.preload is renamed and ldconfig is run to rebuild the runtime shared library cache.
GDB is attached once valgrind is started.
Breakpoints are set on show_name and getaddrinfo, but execution is stepped through from the show_name breakpoint to also watch the dynamic linker resolving the getaddrinfo call.
The initial step from show_name() into the getaddrinfo() call shows the dynamic linker resolving the call from the glibc library.
On first entry into the getaddrinfo function, the call stack is OK.
On the next step instruction, the call stack becomes corrupted.
Continuing leads to a SIGSEGV, rather than a SIGTRAP, which crashes the program.
Both runs are identical in outcome.
If the debugger is not attached, a SIGTRAP is raised instead, which crashes the program.
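For anyone scripting this check: whether a run died with SIGTRAP or SIGSEGV can be read from the exit status rather than from the shell message. The following is a minimal Python sketch, assuming a POSIX system. The helper names describe_exit and run_and_describe are made up for illustration and are not part of valgrind or gdb.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as return code -N.
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            # Signal number without a symbolic name on this platform.
            return "killed by signal %d" % -returncode
    return "exited with status %d" % returncode


def run_and_describe(argv):
    """Run argv (a list of strings), discard its output, report how it ended."""
    completed = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return describe_exit(completed.returncode)
```

For example, run_and_describe(["valgrind", "hostname", "-d"]) would be expected to report SIGTRAP on an affected machine, assuming valgrind passes the client's fatal signal on to its parent, as the "Segmentation fault (core dumped)" lines in the runs below suggest.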
Lines with @@@@@@@@@@@@@ below are eye catchers for relevant notes
@@@@@@@@@@@@@
First run
@@@@@@@@@@@@@
[auser@hostname ~]$ cat /etc/ld.so.preload
/$LIB/liboneagentproc.so
[auser@hostname ~]$ ldd /usr/bin/hostname
linux-vdso.so.1 => (0x00007ffd02d96000)
/$LIB/liboneagentproc.so => /lib64/liboneagentproc.so (0x00002af8ce652000)
libnsl.so.1 => /usr/lib64/libnsl.so.1 (0x00002af8ce860000)
libc.so.6 => /usr/lib64/libc.so.6 (0x00002af8cea7a000)
/lib64/ld-linux-x86-64.so.2 (0x00002af8ce42e000)
Terminal 1
valgrind --trace-signals=yes -v --log-file=valgrind.out.2 --vgdb=full --vgdb-stop-at=startup hostname -d
Terminal 2
cat valgrind.out.2
==77535== Memcheck, a memory error detector
==77535== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==77535== Using Valgrind-3.20.0-5147d671e4-20221024 and LibVEX; rerun with -h for copyright info
==77535== Command: hostname -d
==77535== Parent PID: 111647
==77535==
--77535--
--77535-- Valgrind options:
--77535-- --trace-signals=yes
--77535-- -v
--77535-- --log-file=valgrind.out.2
--77535-- --vgdb=full
--77535-- --vgdb-stop-at=startup
--77535-- Contents of /proc/version:
--77535-- Linux version 3.10.0-1160.81.1.el7.x86_64 (mockbuild@x86-vm-38.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Thu Nov 24 12:21:22 UTC 2022
--77535--
--77535-- Arch and hwcaps: AMD64, LittleEndian, amd64-cx16-lzcnt-rdtscp-sse3-ssse3-avx-avx2-bmi-f16c-rdrand
--77535-- Page sizes: currently 4096, max supported 4096
--77535-- Valgrind library directory: /home/auser/local/libexec/valgrind
--77535-- Reading syms from /usr/bin/hostname
--77535-- Considering /usr/lib/debug/.build-id/93/633698bd11eeb4bee21a388c191a5656990d8e.debug ..
--77535-- .. build-id is valid
--77535-- Reading syms from /usr/lib64/ld-2.17.so
--77535-- Considering /usr/lib/debug/.build-id/62/c449974331341bb08dcce3859560a22af1e172.debug ..
--77535-- .. build-id is valid
--77535-- Reading syms from /home/auser/local/libexec/valgrind/memcheck-amd64-linux
--77535-- object doesn't have a dynamic symbol table
--77535-- Scheduler: using generic scheduler lock implementation.
--77535-- Max kernel-supported signal is 64, VG_SIGVGKILL is 64
--77535-- Reading suppressions file: /home/auser/local/libexec/valgrind/default.supp
==77535== (action at startup) vgdb me ...
==77535== embedded gdbserver: reading from /tmp/vgdb-pipe-from-vgdb-to-77535-by-auser-on-hostname.localdomain
==77535== embedded gdbserver: writing to /tmp/vgdb-pipe-to-vgdb-from-77535-by-auser-on-hostname.localdomain
==77535== embedded gdbserver: shared mem /tmp/vgdb-pipe-shared-mem-vgdb-77535-by-auser-on-hostname.localdomain
==77535==
==77535== TO CONTROL THIS PROCESS USING vgdb (which you probably
==77535== don't want to do, unless you know exactly what you're doing,
==77535== or are doing some strange experiment):
==77535== /home/auser/local/libexec/valgrind/../../bin/vgdb --pid=77535 ...command...
==77535==
==77535== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==77535== /path/to/gdb hostname
==77535== and then give GDB the following command
==77535== target remote | /home/auser/local/libexec/valgrind/../../bin/vgdb --pid=77535
==77535== --pid is optional if only one valgrind process is running
==77535==
[auser@hostname ~]$ gdb /usr/bin/hostname
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/hostname...Reading symbols from /usr/lib/debug/usr/bin/hostname.debug...done.
done.
(gdb) target remote | /home/auser/local/libexec/valgrind/../../bin/vgdb --pid=77535
Remote debugging using | /home/auser/local/libexec/valgrind/../../bin/vgdb --pid=77535
relaying data between gdb and process 77535
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug/usr/lib64/ld-2.17.so.debug...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x0000000004001140 in _start () from /lib64/ld-linux-x86-64.so.2
(gdb) break show_name
Breakpoint 1 at 0x401930: file hostname.c, line 256.
(gdb) break getaddrinfo
Breakpoint 2 at 0x401170
(gdb) cont
Continuing.
Missing separate debuginfo for /lib64/liboneagentproc.so
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/48/81feaea9f4a359f31684c530503b5c629d53af.debug
Breakpoint 1, show_name (type=type@entry=DNS) at hostname.c:256
256 {
(gdb) n
264 switch(type)
(gdb) n
334 memset(&hints, 0, sizeof(struct addrinfo));
(gdb) n
335 hints.ai_socktype = SOCK_DGRAM;
(gdb) n
336 hints.ai_flags = AI_CANONNAME;
(gdb) n
338 p = localhost();
(gdb) n
339 if ((ret = getaddrinfo(p, NULL, &hints, &res)) != 0)
(gdb) s
_vgr20160ZU_ldZhlinuxZhx86Zh64ZdsoZd2_strcmp (s1=0x40078f "libc.so.6", s2=0x401cf5a "") at ../shared/vg_replace_strmem.c:941
941 STRCMP(VG_Z_LD_LINUX_X86_64_SO_2, strcmp)
(gdb) finish
Run till exit from #0 _vgr20160ZU_ldZhlinuxZhx86Zh64ZdsoZd2_strcmp (s1=0x40078f "libc.so.6", s2=0x401cf5a "")
at ../shared/vg_replace_strmem.c:941
0x0000000004010a15 in _dl_name_match_p (name=0x40078f "libc.so.6", map=0x4223150) at dl-misc.c:305
305 if (strcmp (name, map->l_name) == 0)
Value returned is $1 = 1
(gdb) finish
Run till exit from #0 0x0000000004010a15 in _dl_name_match_p (name=0x40078f "libc.so.6", map=0x4223150) at dl-misc.c:305
0x0000000004009826 in do_lookup_x (new_hash=new_hash@entry=2089078220, old_hash=old_hash@entry=0x1ffefff460,
result=result@entry=0x1ffefff470, scope=<optimized out>, i=<optimized out>, i@entry=0, flags=flags@entry=1, skip=skip@entry=0x0,
undef_map=undef_map@entry=0x4223150) at dl-lookup.c:463
463 && __builtin_expect (_dl_name_match_p (version->filename, map), 0))
Value returned is $2 = 0
(gdb) finish
Run till exit from #0 0x0000000004009826 in do_lookup_x (new_hash=new_hash@entry=2089078220, old_hash=old_hash@entry=0x1ffefff460,
result=result@entry=0x1ffefff470, scope=<optimized out>, i=<optimized out>, i@entry=0, flags=flags@entry=1, skip=skip@entry=0x0,
undef_map=undef_map@entry=0x4223150) at dl-lookup.c:463
_dl_lookup_symbol_x (undef_name=0x400820 "getaddrinfo", undef_map=0x4223150, ref=ref@entry=0x1ffefff528, symbol_scope=0x42234a8,
version=0x4023030, type_class=type_class@entry=1, flags=1, skip_map=skip_map@entry=0x0) at dl-lookup.c:742
742 if (res > 0)
Value returned is $3 = 1
(gdb) where
#0 _dl_lookup_symbol_x (undef_name=0x400820 "getaddrinfo", undef_map=0x4223150, ref=ref@entry=0x1ffefff528, symbol_scope=0x42234a8,
version=0x4023030, type_class=type_class@entry=1, flags=1, skip_map=skip_map@entry=0x0) at dl-lookup.c:742
#1 0x000000000400edee in _dl_fixup (l=<optimized out>, reloc_arg=<optimized out>) at ../elf/dl-runtime.c:111
#2 0x00000000040169ea in _dl_runtime_resolve_xsave () at ../sysdeps/x86_64/dl-trampoline.h:131
#3 0x0000000000401b19 in show_name (type=type@entry=DNS) at hostname.c:339
#4 0x00000000004013e4 in main (argc=2, argv=0x1ffefffb88) at hostname.c:550
(gdb) finish
Run till exit from #0 _dl_lookup_symbol_x (undef_name=0x400820 "getaddrinfo", undef_map=0x4223150, ref=ref@entry=0x1ffefff528,
symbol_scope=0x42234a8, version=0x4023030, type_class=type_class@entry=1, flags=1, skip_map=skip_map@entry=0x0) at dl-lookup.c:742
0x000000000400edee in _dl_fixup (l=<optimized out>, reloc_arg=<optimized out>) at ../elf/dl-runtime.c:111
111 result = _dl_lookup_symbol_x (strtab + sym->st_name, l, &sym, l->l_scope,
Value returned is $4 = (struct link_map *) 0x40249a8
(gdb) finish
Run till exit from #0 0x000000000400edee in _dl_fixup (l=<optimized out>, reloc_arg=<optimized out>) at ../elf/dl-runtime.c:111
_dl_runtime_resolve_xsave () at ../sysdeps/x86_64/dl-trampoline.h:132
132 mov %RAX_LP, %R11_LP # Save return value
Value returned is $5 = 87340512
(gdb) where
#0 _dl_runtime_resolve_xsave () at ../sysdeps/x86_64/dl-trampoline.h:132
#1 0x0000000000401b19 in show_name (type=type@entry=DNS) at hostname.c:339
#2 0x00000000004013e4 in main (argc=2, argv=0x1ffefffb88) at hostname.c:550
(gdb) finish
Run till exit from #0 _dl_runtime_resolve_xsave () at ../sysdeps/x86_64/dl-trampoline.h:132
Breakpoint 2, __GI_getaddrinfo (name=0x5632040 "hostname.localdomain", service=service@entry=0x0,
hints=hints@entry=0x1ffefff920, pai=pai@entry=0x1ffefff918) at ../sysdeps/posix/getaddrinfo.c:2208
2208 {
(gdb) where
#0 __GI_getaddrinfo (name=0x5632040 "hostname.localdomain", service=service@entry=0x0, hints=hints@entry=0x1ffefff920,
pai=pai@entry=0x1ffefff918) at ../sysdeps/posix/getaddrinfo.c:2208
#1 0x0000000000401b19 in show_name (type=type@entry=DNS) at hostname.c:339
#2 0x00000000004013e4 in main (argc=2, argv=0x1ffefffb88) at hostname.c:550
@@@@@@@@@@@@@
Call stack becomes scrambled on next step. Only #0 is shown correctly with where
@@@@@@@@@@@@@
(gdb) s
__GI_getaddrinfo (name=0x5632040 "hostname.localdomain", service=0x0, hints=0x1ffefff920, pai=0x1ffefff918)
at ../sysdeps/posix/getaddrinfo.c:2215
2215 if (name != NULL && name[0] == '*' && name[1] == 0)
(gdb) where
#0 __GI_getaddrinfo (name=0x5632040 "hostname.localdomain", service=0x0, hints=0x1ffefff920, pai=0x1ffefff918)
at ../sysdeps/posix/getaddrinfo.c:2215
#1 0x0000001ffefffa30 in ?? ()
#2 0x000000000529d226 in __GI_getenv (name=0x1ffefff920 "\002") at getenv.c:35
#3 0x0000000000000000 in ?? ()
(gdb) cont
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x0000001ffefffa30 in ?? ()
(gdb) cont
Continuing.
Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb) quit
Terminal 1
Segmentation fault (core dumped)
@@@@@@@@@@@@@
Second run
@@@@@@@@@@@@@
[auser@hostname ~]$ ls -l /etc/ld.so.preload
ls: cannot access /etc/ld.so.preload: No such file or directory
[auser@hostname ~]$ ldd /usr/bin/hostname
linux-vdso.so.1 => (0x00007fffad9b2000)
libnsl.so.1 => /usr/lib64/libnsl.so.1 (0x00002b3691222000)
libc.so.6 => /usr/lib64/libc.so.6 (0x00002b369143c000)
/lib64/ld-linux-x86-64.so.2 (0x00002b3690ffe000)
Terminal 1
[auser@hostname ~]$ valgrind --trace-signals=yes -v --log-file=valgrind.out.3 --vgdb=full --vgdb-stop-at=startup hostname -d
Terminal 2
[auser@hostname ~]$ cat valgrind.out.3
==92915== Memcheck, a memory error detector
==92915== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==92915== Using Valgrind-3.20.0-5147d671e4-20221024 and LibVEX; rerun with -h for copyright info
==92915== Command: hostname -d
==92915== Parent PID: 73200
==92915==
--92915--
--92915-- Valgrind options:
--92915-- --trace-signals=yes
--92915-- -v
--92915-- --log-file=valgrind.out.3
--92915-- --vgdb=full
--92915-- --vgdb-stop-at=startup
--92915-- Contents of /proc/version:
--92915-- Linux version 3.10.0-1160.81.1.el7.x86_64 (mockbuild@x86-vm-38.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Thu Nov 24 12:21:22 UTC 2022
--92915--
--92915-- Arch and hwcaps: AMD64, LittleEndian, amd64-cx16-lzcnt-rdtscp-sse3-ssse3-avx-avx2-bmi-f16c-rdrand
--92915-- Page sizes: currently 4096, max supported 4096
--92915-- Valgrind library directory: /home/auser/local/libexec/valgrind
--92915-- Reading syms from /usr/bin/hostname
--92915-- Considering /usr/lib/debug/.build-id/93/633698bd11eeb4bee21a388c191a5656990d8e.debug ..
--92915-- .. build-id is valid
--92915-- Reading syms from /usr/lib64/ld-2.17.so
--92915-- Considering /usr/lib/debug/.build-id/62/c449974331341bb08dcce3859560a22af1e172.debug ..
--92915-- .. build-id is valid
--92915-- Reading syms from /home/auser/local/libexec/valgrind/memcheck-amd64-linux
--92915-- object doesn't have a dynamic symbol table
--92915-- Scheduler: using generic scheduler lock implementation.
--92915-- Max kernel-supported signal is 64, VG_SIGVGKILL is 64
--92915-- Reading suppressions file: /home/auser/local/libexec/valgrind/default.supp
==92915== (action at startup) vgdb me ...
==92915== embedded gdbserver: reading from /tmp/vgdb-pipe-from-vgdb-to-92915-by-auser-on-hostname.localdomain
==92915== embedded gdbserver: writing to /tmp/vgdb-pipe-to-vgdb-from-92915-by-auser-on-hostname.localdomain
==92915== embedded gdbserver: shared mem /tmp/vgdb-pipe-shared-mem-vgdb-92915-by-auser-on-hostname.localdomain
==92915==
==92915== TO CONTROL THIS PROCESS USING vgdb (which you probably
==92915== don't want to do, unless you know exactly what you're doing,
==92915== or are doing some strange experiment):
==92915== /home/auser/local/libexec/valgrind/../../bin/vgdb --pid=92915 ...command...
==92915==
==92915== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==92915== /path/to/gdb hostname
==92915== and then give GDB the following command
==92915== target remote | /home/auser/local/libexec/valgrind/../../bin/vgdb --pid=92915
==92915== --pid is optional if only one valgrind process is running
==92915==
[auser@hostname ~]$ gdb /usr/bin/hostname
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/hostname...Reading symbols from /usr/lib/debug/usr/bin/hostname.debug...done.
done.
(gdb) target remote | /home/auser/local/libexec/valgrind/../../bin/vgdb --pid=92915
Remote debugging using | /home/auser/local/libexec/valgrind/../../bin/vgdb --pid=92915
relaying data between gdb and process 92915
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug/usr/lib64/ld-2.17.so.debug...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x0000000004001140 in _start () from /lib64/ld-linux-x86-64.so.2
(gdb) break show_name
Breakpoint 1 at 0x401930: file hostname.c, line 256.
(gdb) break getaddrinfo
Breakpoint 2 at 0x401170
(gdb) cont
Continuing.
Breakpoint 1, show_name (type=type@entry=DNS) at hostname.c:256
256 {
(gdb) n
264 switch(type)
(gdb) n
334 memset(&hints, 0, sizeof(struct addrinfo));
(gdb) n
335 hints.ai_socktype = SOCK_DGRAM;
(gdb) n
336 hints.ai_flags = AI_CANONNAME;
(gdb) n
338 p = localhost();
(gdb) n
339 if ((ret = getaddrinfo(p, NULL, &hints, &res)) != 0)
(gdb) s
_vgr20160ZU_ldZhlinuxZhx86Zh64ZdsoZd2_strcmp (s1=0x40078f "libc.so.6", s2=0x401cf5a "") at ../shared/vg_replace_strmem.c:941
941 STRCMP(VG_Z_LD_LINUX_X86_64_SO_2, strcmp)
(gdb) where
#0 _vgr20160ZU_ldZhlinuxZhx86Zh64ZdsoZd2_strcmp (s1=0x40078f "libc.so.6", s2=0x401cf5a "") at ../shared/vg_replace_strmem.c:941
#1 0x0000000004010a15 in _dl_name_match_p (name=0x40078f "libc.so.6", map=0x4223150) at dl-misc.c:305
#2 0x0000000004009826 in do_lookup_x (new_hash=new_hash@entry=2089078220, old_hash=old_hash@entry=0x1ffefff4a0,
result=result@entry=0x1ffefff4b0, scope=<optimized out>, i=<optimized out>, i@entry=0, flags=flags@entry=1, skip=skip@entry=0x0,
undef_map=undef_map@entry=0x4223150) at dl-lookup.c:463
#3 0x000000000400a09f in _dl_lookup_symbol_x (undef_name=0x400820 "getaddrinfo", undef_map=0x4223150, ref=ref@entry=0x1ffefff568,
symbol_scope=0x42234a8, version=0x4034a90, type_class=type_class@entry=1, flags=1, skip_map=skip_map@entry=0x0) at dl-lookup.c:739
#4 0x000000000400edee in _dl_fixup (l=<optimized out>, reloc_arg=<optimized out>) at ../elf/dl-runtime.c:111
#5 0x00000000040169ea in _dl_runtime_resolve_xsave () at ../sysdeps/x86_64/dl-trampoline.h:131
#6 0x0000000000401b19 in show_name (type=type@entry=DNS) at hostname.c:339
#7 0x00000000004013e4 in main (argc=2, argv=0x1ffefffbd8) at hostname.c:550
(gdb) finish
Run till exit from #0 _vgr20160ZU_ldZhlinuxZhx86Zh64ZdsoZd2_strcmp (s1=0x40078f "libc.so.6", s2=0x401cf5a "")
at ../shared/vg_replace_strmem.c:941
0x0000000004010a15 in _dl_name_match_p (name=0x40078f "libc.so.6", map=0x4223150) at dl-misc.c:305
305 if (strcmp (name, map->l_name) == 0)
Value returned is $1 = 1
(gdb) finish
Run till exit from #0 0x0000000004010a15 in _dl_name_match_p (name=0x40078f "libc.so.6", map=0x4223150) at dl-misc.c:305
0x0000000004009826 in do_lookup_x (new_hash=new_hash@entry=2089078220, old_hash=old_hash@entry=0x1ffefff4a0,
result=result@entry=0x1ffefff4b0, scope=<optimized out>, i=<optimized out>, i@entry=0, flags=flags@entry=1, skip=skip@entry=0x0,
undef_map=undef_map@entry=0x4223150) at dl-lookup.c:463
463 && __builtin_expect (_dl_name_match_p (version->filename, map), 0))
Value returned is $2 = 0
(gdb) finish
Run till exit from #0 0x0000000004009826 in do_lookup_x (new_hash=new_hash@entry=2089078220, old_hash=old_hash@entry=0x1ffefff4a0,
result=result@entry=0x1ffefff4b0, scope=<optimized out>, i=<optimized out>, i@entry=0, flags=flags@entry=1, skip=skip@entry=0x0,
undef_map=undef_map@entry=0x4223150) at dl-lookup.c:463
_dl_lookup_symbol_x (undef_name=0x400820 "getaddrinfo", undef_map=0x4223150, ref=ref@entry=0x1ffefff568, symbol_scope=0x42234a8,
version=0x4034a90, type_class=type_class@entry=1, flags=1, skip_map=skip_map@entry=0x0) at dl-lookup.c:742
742 if (res > 0)
Value returned is $3 = 1
(gdb) finish
Run till exit from #0 _dl_lookup_symbol_x (undef_name=0x400820 "getaddrinfo", undef_map=0x4223150, ref=ref@entry=0x1ffefff568,
symbol_scope=0x42234a8, version=0x4034a90, type_class=type_class@entry=1, flags=1, skip_map=skip_map@entry=0x0) at dl-lookup.c:742
0x000000000400edee in _dl_fixup (l=<optimized out>, reloc_arg=<optimized out>) at ../elf/dl-runtime.c:111
111 result = _dl_lookup_symbol_x (strtab + sym->st_name, l, &sym, l->l_scope,
Value returned is $4 = (struct link_map *) 0x40344c8
(gdb) finish
Run till exit from #0 0x000000000400edee in _dl_fixup (l=<optimized out>, reloc_arg=<optimized out>) at ../elf/dl-runtime.c:111
_dl_runtime_resolve_xsave () at ../sysdeps/x86_64/dl-trampoline.h:132
132 mov %RAX_LP, %R11_LP # Save return value
Value returned is $5 = 85186016
(gdb) finish
Run till exit from #0 _dl_runtime_resolve_xsave () at ../sysdeps/x86_64/dl-trampoline.h:132
Breakpoint 2, __GI_getaddrinfo (name=0x5424040 "hostname.localdomain", service=service@entry=0x0,
hints=hints@entry=0x1ffefff970, pai=pai@entry=0x1ffefff968) at ../sysdeps/posix/getaddrinfo.c:2208
2208 {
(gdb) where
#0 __GI_getaddrinfo (name=0x5424040 "hostname.localdomain", service=service@entry=0x0, hints=hints@entry=0x1ffefff970,
pai=pai@entry=0x1ffefff968) at ../sysdeps/posix/getaddrinfo.c:2208
#1 0x0000000000401b19 in show_name (type=type@entry=DNS) at hostname.c:339
#2 0x00000000004013e4 in main (argc=2, argv=0x1ffefffbd8) at hostname.c:550
@@@@@@@@@@@@@
Call stack becomes scrambled on next step. Only #0 is shown correctly with where
@@@@@@@@@@@@@
(gdb) s
__GI_getaddrinfo (name=0x5424040 "hostname.localdomain", service=0x0, hints=0x1ffefff970, pai=0x1ffefff968)
at ../sysdeps/posix/getaddrinfo.c:2215
2215 if (name != NULL && name[0] == '*' && name[1] == 0)
(gdb) where
#0 __GI_getaddrinfo (name=0x5424040 "hostname.localdomain", service=0x0, hints=0x1ffefff970, pai=0x1ffefff968)
at ../sysdeps/posix/getaddrinfo.c:2215
#1 0x0000001ffefffa80 in ?? ()
#2 0x000000000508f226 in __GI_getenv (name=0x1ffefff970 "\002") at getenv.c:35
#3 0x0000000000000000 in ?? ()
(gdb) cont
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x0000001ffefffa80 in ?? ()
(gdb) cont
Continuing.
Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb) quit
[auser@hostname ~]$
Terminal 1
Segmentation fault (core dumped)
Thanks for the detailed analysis. You're stepping through the code in ld.so that's resolving the PIC stuff (the PLT). It seems to be resolving to some function address, but then the function call is failing. Unless the function lookup is going wrong I can't think what could be the problem.
"getaddrinfo" only takes 4 arguments so they will be passed in registers. The asm for getaddrinfo is
00000000000efb00 <getaddrinfo@@GLIBC_2.2.5>:
efb00: 55 push %rbp
efb01: 48 89 e5 mov %rsp,%rbp
efb04: 41 57 push %r15
efb06: 41 56 push %r14
efb08: 41 55 push %r13
efb0a: 41 54 push %r12
If you do a 'step in' with gdb do you see the next instruction being this push %rbp?
(I usually do this in TUI mode: ctrl-x a, then split screen with ctrl-x 2 until I see source / asm / command panels)
Thanks Paul. I was unaware of TUI mode, it's really useful.
The following extract is from the TUI asm and command windows.
It shows an int3 rather than a push %rbp on the initial entry, where the call stack still shows as being normal.
After a stepi (rather than the step tried previously), the call stack has become corrupted.
Is the int3 likely to be something that valgrind might introduce instead of the push %rbp?
B+> 0x534b5e0 <__GI_getaddrinfo> int3
    0x534b5e1 <__GI_getaddrinfo+1> mov %rsp,%rbp
    0x534b5e4 <__GI_getaddrinfo+4> push %r15
    0x534b5e6 <__GI_getaddrinfo+6> push %r14
    0x534b5e8 <__GI_getaddrinfo+8> mov %rdi,%r14
    0x534b5eb <__GI_getaddrinfo+11> push %r13
    0x534b5ed <__GI_getaddrinfo+13> mov %rsi,%r13
    0x534b5f0 <__GI_getaddrinfo+16> push %r12
    0x534b5f2 <__GI_getaddrinfo+18> mov %rdx,%r12
    0x534b5f5 <__GI_getaddrinfo+21> push %rbx
    0x534b5f6 <__GI_getaddrinfo+22> sub $0x518,%rsp
    0x534b5fd <__GI_getaddrinfo+29> test %rdi,%rdi
    0x534b600 <__GI_getaddrinfo+32> mov %rcx,-0x530(%rbp)
(gdb) where
#0 __GI_getaddrinfo (name=0x5632040 "hostname.localdomain", service=service@entry=0x0, hints=hints@entry=0x1ffefff930,
pai=pai@entry=0x1ffefff928) at ../sysdeps/posix/getaddrinfo.c:2208
#1 0x0000000000401b19 in show_name (type=type@entry=DNS) at hostname.c:339
#2 0x00000000004013e4 in main (argc=2, argv=0x1ffefffb98) at hostname.c:550
(gdb) stepi
  > 0x534b5e1 <__GI_getaddrinfo+1> mov %rsp,%rbp
    0x534b5e4 <__GI_getaddrinfo+4> push %r15
    0x534b5e6 <__GI_getaddrinfo+6> push %r14
    0x534b5e8 <__GI_getaddrinfo+8> mov %rdi,%r14
    0x534b5eb <__GI_getaddrinfo+11> push %r13
    0x534b5ed <__GI_getaddrinfo+13> mov %rsi,%r13
    0x534b5f0 <__GI_getaddrinfo+16> push %r12
    0x534b5f2 <__GI_getaddrinfo+18> mov %rdx,%r12
    0x534b5f5 <__GI_getaddrinfo+21> push %rbx
    0x534b5f6 <__GI_getaddrinfo+22> sub $0x518,%rsp
    0x534b5fd <__GI_getaddrinfo+29> test %rdi,%rdi
    0x534b600 <__GI_getaddrinfo+32> mov %rcx,-0x530(%rbp)
    0x534b607 <__GI_getaddrinfo+39> movq $0x0,-0x4c0(%rbp)
(gdb) where
#0 0x000000000534b5e1 in __GI_getaddrinfo (name=0x5632040 "hostname.localdomain", service=0x0, hints=0x1ffefff930, pai=0x1ffefff928)
at ../sysdeps/posix/getaddrinfo.c:2208
#1 0x0000001ffefffa40 in ?? ()
#2 0x000000000529d226 in __GI_getenv (name=0x1ffefff930 "\002") at getenv.c:35
#3 0x0000000000000000 in ?? ()
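The same check can be made without a debugger. This is a rough Python sketch, assuming x86-64 Linux and assuming that whatever patches the int3 in does so in the memory of every process that maps libc (not verified here). first_bytes and classify are hypothetical helper names. It reads the first bytes at the address that getaddrinfo resolves to in the current process and compares the first one with 0xCC (int3) and 0x55 (push %rbp).

```python
import ctypes

INT3 = 0xCC      # x86 one-byte breakpoint instruction
PUSH_RBP = 0x55  # first opcode byte of an unpatched "push %rbp" prologue


def first_bytes(symbol="getaddrinfo", count=4):
    """Return the first `count` code bytes of `symbol` as mapped in this process."""
    # CDLL(None) gives access to symbols already loaded into the process.
    libc = ctypes.CDLL(None)
    func = getattr(libc, symbol)
    # Assumption: the function pointer is the code address (true on x86-64 Linux).
    address = ctypes.cast(func, ctypes.c_void_p).value
    return ctypes.string_at(address, count)


def classify(code):
    """Describe the first opcode byte of a function prologue."""
    if code[0] == INT3:
        return "int3 (breakpoint patched in)"
    if code[0] == PUSH_RBP:
        return "push %rbp"
    return "other opcode 0x%02x" % code[0]
```

Calling classify(first_bytes()) on a healthy system with the glibc build discussed here should report push %rbp. Other glibc builds may begin the function with a different instruction, which would show up as "other opcode".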
(In reply to Mike J from comment #6)
> Thanks Paul. I was unaware of TUI mode, its really useful.
>
> The following extract is from the TUI asm and command windows.
> It shows a int3 rather than a push %rbp on the initial entry, where the call
> stack still shows as being normal.
> On the stepi (rather than a step tried previously), the call stack has then
> become corrupted.
>
> Is the int3 likely to be something that valgrind might introduce instead of
> the push %rbp ?
int3 causes the application to stop if it is being debugged or (as the title of this item says) terminate with SIGTRAP. Valgrind doesn't use PTRACE like debuggers do, and it shouldn't be inserting an int3 since it will still cause a SIGTRAP. I don't yet have any ideas what could cause the memory to be corrupted.
Hi. This bug can be closed. It is not caused by valgrind.
In case it is of use in future to anyone, further checks have shown that TaniumClient version 7.4.9.1046 was running on the system and caused the problem. valgrind was working on the system with an earlier TaniumClient release, but stopped working when the TaniumClient package was upgraded late last year, affecting valgrind runs.
As indirectly noted in the earlier comments, the C library getaddrinfo() function is dynamically loaded when first called by an application.
I took valgrind out of the picture and ran "/usr/bin/hostname -d" under the control of the gdb debugger.
- With TaniumClient running, on entry to getaddrinfo(), the int3 instruction is seen instead of the expected push %rbp instruction, corrupting the call stack and raising a SIGTRAP. If run with valgrind, valgrind catches the raised SIGTRAP signal in this case and exits with a core dump, showing a corrupt call stack.
- When TaniumClient is stopped, the expected push %rbp instruction is instead seen. If run with valgrind, it runs correctly and completes normally.
Thanks to Paul Floyd and Mark Wielaard for checking the problem and for the debugging advice, which pointed me in the right direction for problem diagnosis.
Thanks for letting us know. I also find that slightly disconcerting - the big corporate I work for also uses Tanium, though presumably without the setting or option that caused this problem.
I found that osqueryd can also cause the same problem (running under Ubuntu 16.04).
(In reply to b_betts from comment #10)
> I found that osqueryd can also cause the same problem (running under Ubuntu
> 16.04).
Interesting. I had a quick look and can't see much that would cause it - osquery uses a lot of 3rd party libs so it is probably in one of those.
I asked Tanium about this. This is their answer:
> XXXXXXXX (Tanium UK)
>
> Hi YYYYYYYY (Customer) ,
>
> Unfortunately this is a recent issue we've discovered. It's related to Recorder (possibly recording DNS events). Our development team
> are looking into this, but I can't give any timeframes for resolution.
>
> Some people have had success by upgrading their kernel. The only other workaround is to switch the THR profile to Tools only to disable the
> THR Recorder extension on affected endpoints.
>
> I would encourage you to log a case with our support centre, just to get your account formally registered as being affected by this issue.
The issue isn't caused by Tanium - it presents itself after the Tanium recorder is configured to use eBPF to capture DNS activity on certain kernel versions. However, the actual bug appears to be in either eBPF, the kernel, or the debugger.
You can reproduce the issue without Tanium even being installed:
# dnf install bpftrace
# bpftrace -e 'uprobe:libc:getaddrinfo {}' &
# valgrind hostname -d
We updated our configuration options to allow you to work around the issue by disabling DNS events on systems with this issue so that you can still run the recorder for all other events using eBPF. As it's an underlying issue on the systems themselves, all we can do is allow you to avoid a configuration that will trigger the underlying problem.
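Any program that reaches glibc's getaddrinfo should exercise the same entry point as hostname -d. Here is a small Python sketch, assuming CPython's socket.getaddrinfo calls the C library function (it does on the usual Linux builds, as far as I know). The hints mirror the SOCK_DGRAM and AI_CANONNAME values seen at hostname.c lines 335-336 in the gdb sessions above, and canonical_name is a made-up helper name.

```python
import socket


def canonical_name(host):
    """Resolve `host` with the same hints hostname.c sets; return the canonical name or None."""
    try:
        results = socket.getaddrinfo(
            host,
            None,                       # no service, as in getaddrinfo(p, NULL, ...)
            type=socket.SOCK_DGRAM,     # hints.ai_socktype
            flags=socket.AI_CANONNAME,  # hints.ai_flags
        )
    except socket.gaierror:
        # Resolution failed (hostname.c treats a non-zero return as an error).
        return None
    # Each entry is (family, type, proto, canonname, sockaddr).
    return results[0][3] if results else None
```

Running canonical_name(socket.gethostname()) with and without an active uprobe on getaddrinfo, natively and under valgrind, would be one way to compare behaviour without involving the hostname binary.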
|