Valgrind fails with the error "m_scheduler/scheduler.c:1592 (vgPlain_scheduler): the 'impossible' happened." when running a "hello world" program compiled as 32-bit. Callgrind produces the same error.

Reproducible: Always

Steps to reproduce:
1. Compile hello world -> gcc -m32 hello.c
2. Run valgrind -> valgrind ./a.out

Output:
==18138== Memcheck, a memory error detector
==18138== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==18138== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==18138== Command: ./a.out
==18138==
--18138-- Valgrind options:
--18138-- -v
--18138-- Contents of /proc/version:
--18138-- Linux version 4.8.15 (root@gentoo64.transmode.se) (gcc version 4.9.4 (Gentoo 4.9.4 p1.0, pie-0.6.4) ) #2 SMP PREEMPT Sat Dec 17 10:14:28 CET 2016
--18138-- Arch and hwcaps: X86, LittleEndian, x86-mmxext-sse1-sse2-lzcnt
--18138-- Page sizes: currently 4096, max supported 4096
--18138-- Valgrind library directory: /usr/lib64/valgrind
--18138-- Reading syms from /lib32/ld-2.23.so
--18138-- Considering /usr/lib/debug/lib32/ld-2.23.so.debug ..
--18138-- .. CRC is valid
--18138-- Reading syms from /home/pnyberg/tmp/a.out
--18138-- Reading syms from /usr/lib64/valgrind/memcheck-x86-linux
--18138-- object doesn't have a symbol table
--18138-- object doesn't have a dynamic symbol table
--18138-- Scheduler: using generic scheduler lock implementation.
--18138-- Reading suppressions file: /usr/lib64/valgrind/default.supp
==18138== embedded gdbserver: reading from /tmp/vgdb-pipe-from-vgdb-to-18138-by-pnyberg-on-???
==18138== embedded gdbserver: writing to /tmp/vgdb-pipe-to-vgdb-from-18138-by-pnyberg-on-???
==18138== embedded gdbserver: shared mem /tmp/vgdb-pipe-shared-mem-vgdb-18138-by-pnyberg-on-???
==18138==
==18138== TO CONTROL THIS PROCESS USING vgdb (which you probably
==18138== don't want to do, unless you know exactly what you're doing,
==18138== or are doing some strange experiment):
==18138== /usr/lib64/valgrind/../../bin/vgdb --pid=18138 ...command...
==18138==
==18138== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==18138== /path/to/gdb ./a.out
==18138== and then give GDB the following command
==18138== target remote | /usr/lib64/valgrind/../../bin/vgdb --pid=18138
==18138== --pid is optional if only one valgrind process is running
==18138==
--18138-- REDIR: 0x4018610 (ld-linux.so.2:strlen) redirected to 0x38075552 (???)

valgrind: m_scheduler/scheduler.c:1592 (vgPlain_scheduler): the 'impossible' happened.
valgrind: VG_(scheduler), phase 3: run_innerloop detected host state invariant failure

host stacktrace:
==18138== at 0x3805A4A4: ??? (in /usr/lib64/valgrind/memcheck-x86-linux)
==18138== by 0x3805A5F6: ??? (in /usr/lib64/valgrind/memcheck-x86-linux)
==18138== by 0x3805A759: ??? (in /usr/lib64/valgrind/memcheck-x86-linux)
==18138== by 0x380B4BC3: ??? (in /usr/lib64/valgrind/memcheck-x86-linux)
==18138== by 0x380C6F47: ??? (in /usr/lib64/valgrind/memcheck-x86-linux)

sched status:
running_tid=1

Thread 1: status = VgTs_Runnable
==18138== at 0x4007DD8: _dl_map_object (dl-load.c:1941)
==18138== by 0x4000C64: map_doit (rtld.c:483)
==18138== by 0x400EE34: _dl_catch_error (dl-error.c:187)
==18138== by 0x4000870: do_preload (rtld.c:666)
==18138== by 0x4003700: dl_main (rtld.c:1499)
==18138== by 0x4015D31: _dl_sysdep_start (dl-sysdep.c:249)
==18138== by 0x40047F0: _dl_start_final (rtld.c:307)
==18138== by 0x40047F0: _dl_start (rtld.c:413)
==18138== by 0x4000A76: ??? (in /lib32/ld-2.23.so)

Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using. Thanks.

System:

uname -a
Linux hostname 4.8.15 #2 SMP PREEMPT Sat Dec 17 10:14:28 CET 2016 x86_64 Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz GenuineIntel GNU/Linux

gcc -v
Using built-in specs.
COLLECT_GCC=/usr/x86_64-pc-linux-gnu/gcc-bin/4.9.4/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/4.9.4/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-4.9.4/work/gcc-4.9.4/configure --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.9.4 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.4/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.9.4 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.9.4/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.9.4/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.4/include/g++-v4 --with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/4.9.4/python --enable-languages=c,c++,fortran --enable-obsolete --enable-secureplt --disable-werror --with-system-zlib --enable-nls --without-included-gettext --enable-checking=release --with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo 4.9.4 p1.0, pie-0.6.4' --enable-libstdcxx-time --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --enable-multilib --with-multilib-list=m32,m64 --disable-altivec --disable-fixed-point --enable-targets=all --disable-libgcj --enable-libgomp --disable-libmudflap --disable-libssp --disable-libcilkrts --enable-vtable-verify --enable-libvtv --enable-lto --without-cloog --enable-libsanitizer
Thread model: posix
gcc version 4.9.4 (Gentoo 4.9.4 p1.0, pie-0.6.4)
After downgrading the kernel to version 4.4.39-gentoo, the error no longer occurs.
(In reply to Patrik Nyberg from comment #1)
> Downgrading kernel to version 4.4.39-gentoo and the error is no longer
> present.

Yeah, I did wonder if this is kernel specific. The same failure happened some years ago and it turned out to be, if I remember correctly, a kernel bug, in which the kernel did not correctly maintain the FPU state across context switches. I wonder if 4.9.x has some related problem.
Okay, sounds like you might be on to something. I just upgraded the kernel to 4.9.4 and the error occurs again.
I don't have any such problems with the 4.9.3 kernel that comes with Fedora 25. Is it possible that this is a Gentoo-specific problem?
It might be, but it also seems to be specific to the hardware. Several of my colleagues have now tried, and everyone running the same CPU as me (i5-6200U) has the issue, while it works fine for others (running on an i7-4770, for example).
This patch will solve the problem (at least for the simple hello world case).

--- valgrind-3.12.0/coregrind/m_dispatch/dispatch-x86-linux.S.org 2017-01-17 13:52:58.290661172 +0100
+++ valgrind-3.12.0/coregrind/m_dispatch/dispatch-x86-linux.S 2017-01-17 13:53:09.399596888 +0100
@@ -126,6 +126,7 @@
    or %fpucw. We can't mess with %eax or %edx here as they holds the
    tentative return value, but any others are OK. */
 #if !defined(ENABLE_INNER)
+ jmp remove_frame
  /* This check fails for self-hosting, so skip in that case */
  pushl $0
  fstcw (%esp)
(In reply to Patrik Nyberg from comment #6)
> This patch will solve the problem (at least for the simple hello world case).

Sure. That just disables the assertion, though. It doesn't resolve the underlying issue.
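For context, the assertion being skipped compares the host's x87 control word and SSE MXCSR against fixed values when the dispatcher returns. The sketch below illustrates that kind of check in C. It is not Valgrind's code: the constants 0x027F and 0x1F80 and the 0xFFFFFFC0 mask are assumptions based on a reading of dispatch-x86-linux.S, and the helper names (read_fpucw, read_mxcsr, host_state_ok) are hypothetical.

```c
#include <stdint.h>

/* Assumed reference values; verify against the Valgrind sources. */
#define EXPECTED_FPUCW 0x027Fu      /* x87 control word */
#define EXPECTED_MXCSR 0x1F80u      /* SSE control/status register */
#define MXCSR_CTRL_MASK 0xFFFFFFC0u /* clears the six exception status flags */

#if defined(__i386__) || defined(__x86_64__)
/* Read the current x87 FPU control word (GCC/Clang inline asm, x86 only). */
static inline uint16_t read_fpucw(void)
{
    uint16_t cw;
    __asm__ __volatile__("fnstcw %0" : "=m"(cw));
    return cw;
}

/* Read the current MXCSR register (requires a CPU with SSE). */
static inline uint32_t read_mxcsr(void)
{
    uint32_t csr;
    __asm__ __volatile__("stmxcsr %0" : "=m"(csr));
    return csr;
}
#endif

/* Return 1 if both registers hold the expected values, else 0.
   The MXCSR status flags are ignored; only the control bits count. */
static inline int host_state_ok(uint16_t fpucw, uint32_t mxcsr)
{
    return fpucw == EXPECTED_FPUCW &&
           (mxcsr & MXCSR_CTRL_MASK) == EXPECTED_MXCSR;
}
```

On an x86 host, host_state_ok(read_fpucw(), read_mxcsr()) reports whether the calling thread currently holds those values. If the kernel restores either register incorrectly across a context switch, a check of this shape fails even though the program itself did nothing wrong, which is why jumping past it hides the symptom rather than fixing the cause.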
(In reply to Patrik Nyberg from comment #5)

Are you sure that the i5-6200U connection is the only thing in common? That processor is a mid-range Skylake, and I am sure we would have heard by now if there were problems with Valgrind on Skylake. That's why I ask.
We just found this on the kernel Bugzilla. It seems to be related: https://bugzilla.kernel.org/show_bug.cgi?id=190061
(In reply to Julian Seward from comment #8)
> (In reply to Patrik Nyberg from comment #5)
>
> Are you sure that the i5-6200U connection is the only thing in
> common? That processor is a mid-range Skylake, and I am sure we
> would have heard by now if there were problems with Valgrind on
> Skylake. That's why I ask.

Yes, that is the only thing in common we can find. I agree with you that it seems unlikely that this issue affects all Skylake processors.
*** Bug 374850 has been marked as a duplicate of this bug. ***
I am coming over from this closed bug: https://bugs.kde.org/show_bug.cgi?id=375171

In response to whether my processor is Skylake, I believe it is.

laptop: https://www.amazon.com/Dell-Inspiron-i7559-2512BLK-Generation-GeForce/dp/B015PYZ0J6
CPU: Intel Quad Core i7-6700HQ 2.6 GHz
https://ark.intel.com/products/88967/Intel-Core-i7-6700HQ-Processor-6M-Cache-up-to-3_50-GHz

The i7-6700HQ is listed in this Wikipedia list of Skylake processors:
https://en.wikipedia.org/wiki/Skylake_(microarchitecture)#List_of_Skylake_processors

The Intel page doesn't specify the microarchitecture, but Wikipedia says it is Skylake. So I presume it is, yes.
*** Bug 374482 has been marked as a duplicate of this bug. ***
Hi,

I stumbled upon this issue while preparing a new GDB release on a Fedora Rawhide VM. I talked to Mark, who pointed me to this bug, and we decided it would be a good idea to provide some information about how to reproduce it.

I found this bug while running the GDB testsuite. I had previously run the whole testsuite and did not notice any failures related to valgrind. Then I had to upgrade my VM to make sure it was running the latest software available on Rawhide. The upgrade installed the following packages:

https://people.redhat.com/sdurigan/valgrind-375171/dnf-upgrade

As you can see, the Linux kernel was upgraded (from kernel-5.1.0-1.fc31.x86_64 to kernel-5.2.0-0.rc1.git3.1.fc31.x86_64). After that, I ran the full testsuite again and noticed that two valgrind-related tests started failing when compiled with -m32:

gdb.base/valgrind-disp-step.exp
gdb.base/valgrind-infcall.exp

They fail on upstream GDB as well, by the way. If you would like to run the tests on your machine, you can:

1) Build GDB: https://sourceware.org/gdb/wiki/BuildingNatively

2) Run (from the build directory):

$ make check-gdb TESTS='gdb.base/valgrind-disp-step.exp gdb.base/valgrind-infcall.exp' RUNTESTFLAGS='--target_board unix/-m32'

The full log generated when I run this command on my VM is available here:

https://people.redhat.com/sdurigan/valgrind-375171/gdb.log

As I said, I'm using a VM running Fedora Rawhide. The list of packages I have installed can be found here:

https://people.redhat.com/sdurigan/valgrind-375171/packages

I am able to reproduce the problem every time I run the tests. I can provide more information if needed.

Thanks!