On mips64-octeon-linux-gnu, running any program under Valgrind ends with "signal 10 (SIGBUS): dumping core" followed by the assertion 'sizeof(*regs) == sizeof(prs->pr_reg)'; other programs show the same problem. In https://bugs.kde.org/show_bug.cgi?id=325538, the patch for version 3.9 provides a solution to this bug. However, that patch brings other inexplicable problems in version 3.10.0, such as an unrecognized instruction:

vex mips->IR: unhandled instruction bytes: 0xD8 0x5E 0xFE 0xF6

Reproducible: Always

Steps to Reproduce:
1. valgrind --tool=memcheck ls

Actual Results:
~ # valgrind --tool=memcheck ls
==11696== Memcheck, a memory error detector
==11696== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==11696== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
==11696== Command: ls
==11696==
==11696== Invalid write of size 8
==11696== at 0x4001C28: _dl_start_user (in /lib64/ld-2.9.so)
==11696== by 0x4001BB8: __start (in /lib64/ld-2.9.so)
==11696== Address 0xfff000868 is on thread 1's stack
==11696== 8 bytes below stack pointer
==11696==
==11696== Invalid read of size 8
==11696== at 0x41D3594: (below main) (libc-start.c:213)
==11696== Address 0xffffffffffff8a00 is not stack'd, malloc'd or (recently) free'd
==11696==
==11696==
==11696== Process terminating with default action of signal 10 (SIGBUS): dumping core
==11696== at 0x41D3594: (below main) (libc-start.c:213)
valgrind: m_coredump/coredump-elf.c:260 (fill_prstatus): Assertion 'sizeof(*regs) == sizeof(prs->pr_reg)' failed.

host stacktrace:
==11696== at 0x3804B860: show_sched_status_wrk (m_libcassert.c:319)
==11696== by 0x3804BBB8: report_and_quit (m_libcassert.c:390)
==11696== by 0x3804BE44: vgPlain_assert_fail (m_libcassert.c:455)
==11696== by 0x3807F878: fill_prstatus (coredump-elf.c:260)
==11696== by 0x3807F878: dump_one_thread (coredump-elf.c:567)
==11696== by 0x3807FBCC: make_elf_coredump (coredump-elf.c:670)
==11696== by 0x3807FBCC: vgPlain_make_coredump (coredump-elf.c:742)
==11696== by 0x38066AAC: default_action (m_signals.c:1770)
==11696== by 0x38066AAC: deliver_signal (m_signals.c:1829)
==11696== by 0x38068744: sync_signalhandler_from_kernel (m_signals.c:2487)
==11696== by 0x38068744: sync_signalhandler (m_signals.c:2575)
==11696== by 0xFFFFFFF00C: ???

sched status:
running_tid=1

Thread 1: status = VgTs_Runnable
==11696== at 0x41D3594: (below main) (libc-start.c:213)

Expected Results:
No core dump.

uname -a
Linux (none) 2.6.32.13-Cavium-Octeon #1 SMP Wed Sep 3 12:55:04 CST 2014 mips64 unknown
*** Bug 341038 has been marked as a duplicate of this bug. ***
I'm confused - you say that running 3.10 with the patch from the other bug causes other problems, but 3.10 includes that patch as far as I can see. So are you seeing this assertion with an unpatched 3.10 or are you seeing the other problem - ie the "unhandled instruction bytes" assertion?
1. An unpatched 3.10 always gives this assertion.
2. I know that 3.10 includes the patch, but it seems to have no effect.
So how were you getting the other assertion then? What patch did you apply to get that?
Created attachment 89624 [details] This patch is for 3.9

This patch is for 3.9, but I find that 3.10 differs from 3.9 in the code this patch touches. I tried to use it with 3.10, but then the "unhandled instruction bytes" assertion is displayed.
So you're saying that if you use 3.9 with the patch applied then you get the instruction bytes assertion, and if you use unpatched 3.10 you get the fill_prstatus assertion? In that case I suspect the instruction bytes assertion is not relevant (probably just an instruction that 3.10 now supports) and the real question is why the fix in #325538 is not working for you, but that's a question for the MIPS people.
Thanks. I have another question: does a suppression file only suppress log output, or does it do anything else?
As stated in various places, support for Cavium instruction extensions is still incomplete in Valgrind. Having said this, the situation is getting better with every release and I hope we can have a few more patches landing soon for Cavium. In this case, we need to come up with the smallest test example that fails for you. Have you tried to run the test suite with 'make regtest'?
Created attachment 89641 [details] make regtest

There seem to be some errors, but I don't know why.
(In reply to szspp99 from comment #9)
> Created attachment 89641 [details]
> make regtest
>
> there seems to be something error, but I don't know why.

Can you check out Valgrind from the trunk and do the same? This FPU test issue you are seeing has been fixed since 3.10.
Created attachment 89664 [details] Valgrind-3.11.0.SVN regtest

Some of the output is garbled.
(In reply to szspp99 from comment #11)
> Created attachment 89664 [details]
> Valgrind-3.11.0.SVN regtest
>
> some garbled code.

Are you seeing the Valgrind issue with any program you try? If so, can you make the simplest example possible that will trigger the issue (feel free to try "int main() { return 5; }"), compile it statically and attach it here?
Created attachment 89740 [details] mips64-octeon-linux-gnu-gcc test.c -o test -static
/tmp # valgrind --tool=memcheck ./test
==4526== Memcheck, a memory error detector
==4526== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==4526== Using Valgrind-3.11.0.SVN and LibVEX; rerun with -h for copyright info
==4526== Command: ./test
==4526==
==4526== Invalid write of size 8
==4526== at 0x120005BC0: ptmalloc_init (arena.c:486)
==4526== by 0x12000AEC0: malloc_hook_ini (hooks.c:37)
==4526== by 0x120038AF4: _dl_init_paths (dl-load.c:649)
==4526== by 0x12000EC80: _dl_non_dynamic_init (dl-support.c:246)
==4526== by 0x12000F7A0: __libc_init_first (init-first.c:82)
==4526== by 0x120003B44: (below main) (libc-start.c:159)
==4526== Address 0xffffffffffff9028 is not stack'd, malloc'd or (recently) free'd
==4526==
==4526==
==4526== Process terminating with default action of signal 10 (SIGBUS): dumping core
==4526== at 0x120005BC0: ptmalloc_init (arena.c:486)
==4526== by 0x12000AEC0: malloc_hook_ini (hooks.c:37)
==4526== by 0x120038AF4: _dl_init_paths (dl-load.c:649)
==4526== by 0x12000EC80: _dl_non_dynamic_init (dl-support.c:246)
==4526== by 0x12000F7A0: __libc_init_first (init-first.c:82)
==4526== by 0x120003B44: (below main) (libc-start.c:159)
valgrind: m_coredump/coredump-elf.c:262 (fill_prstatus): Assertion 'sizeof(*regs) == sizeof(prs->pr_reg)' failed.

host stacktrace:
==4526== at 0x3804B7A0: show_sched_status_wrk (m_libcassert.c:319)
==4526== by 0x3804BAF8: report_and_quit (m_libcassert.c:390)
==4526== by 0x3804BD5C: vgPlain_assert_fail (m_libcassert.c:456)
==4526== by 0x3807F940: fill_prstatus (coredump-elf.c:262)
==4526== by 0x3807F940: dump_one_thread (coredump-elf.c:571)
==4526== by 0x3807FC94: make_elf_coredump (coredump-elf.c:674)
==4526== by 0x3807FC94: vgPlain_make_coredump (coredump-elf.c:748)
==4526== by 0x38066CDC: default_action (m_signals.c:1777)
==4526== by 0x38066CDC: deliver_signal (m_signals.c:1836)
==4526== by 0x380689C4: sync_signalhandler_from_kernel (m_signals.c:2493)
==4526== by 0x380689C4: sync_signalhandler (m_signals.c:2581)
==4526== by 0xFFFFFFF00C: ???

sched status:
running_tid=1

Thread 1: status = VgTs_Runnable
==4526== at 0x120005BC0: ptmalloc_init (arena.c:486)
==4526== by 0x12000AEC0: malloc_hook_ini (hooks.c:37)
==4526== by 0x120038AF4: _dl_init_paths (dl-load.c:649)
==4526== by 0x12000EC80: _dl_non_dynamic_init (dl-support.c:246)
==4526== by 0x12000F7A0: __libc_init_first (init-first.c:82)
==4526== by 0x120003B44: (below main) (libc-start.c:159)

Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after identifying problems in your program, there's a good chance that fixing those problems will prevent Valgrind aborting or crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind version, and what OS and version you are using. Thanks.
It looks like the same problem as before.
OK, I see and know the problem now. The easiest way for you is to take the patch proposed in BZ #328670. Here is the link to the patch: https://bugsfiles.kde.org/attachment.cgi?id=84046 Let me know if this fixes the issue for you.
Yes, the problem has been fixed. Thank you very much.
Petar, can I close this as a dup of bug 328670 (per your comment 16), or is this something different?
(In reply to Petar Jovanovic from comment #16)
> OK, I see and know the problem now.
> The easiest way for you is to take a patch proposed at BZ #328670.

There's something very strange about this patch. It defines _MIPS_ARCH_OCTEON as something that can be evaluated at run time, but then uses it as if it was a preprocessor macro. That can't be right.
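The general concern in the comment above can be illustrated with a small C sketch. It is not taken from the patch: the variable `is_octeon_at_runtime` and the function `octeon_branch_compiled_in` are hypothetical names chosen for illustration only. The point is that in a `#if` expression any identifier that is not a macro is replaced by 0, so a value that only exists at run time can never enable the guarded code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a flag that is computed at run time.
   It is an ordinary variable, NOT a preprocessor macro. */
static bool is_octeon_at_runtime = true;

/* Returns 1 if the guarded branch was compiled in, 0 otherwise. */
int octeon_branch_compiled_in(void)
{
#if is_octeon_at_runtime
    /* Never compiled: the preprocessor does not know the variable
       and evaluates the unknown identifier as 0. */
    return 1;
#else
    (void)is_octeon_at_runtime; /* avoid an unused-variable warning */
    return 0;
#endif
}
```

Calling `octeon_branch_compiled_in()` takes the `#else` path even though the variable is set to true, which is why a run-time value has to be tested with an ordinary `if` rather than `#if`.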
I might be persuaded to land this if anybody offers to test it on both octeon and "normal" mips. But I don't want to land it without proper testing.
(In reply to Julian Seward from comment #18)
> Petar, can I close this as a dup of bug 328670 (per your comment 16), or is
> this something different?

I would close it.

(In reply to Julian Seward from comment #20)
> I might be persuaded to land this if anybody offers to test it on both
> octeon and "normal" mips. But I don't want to land it without proper
> testing.

Any reason we should be taking these changes, since the agreement on BZ #328670 was to close the issue as won't fix?
If we don't take this patch, will we cause inconvenience for many people? Are most people using "normal" toolchains on Octeon, and therefore don't need this patch? I don't have any understanding about the landscape of these apparently-mutually-incompatible MIPS variants, so I don't have much basis on which to usefully comment.
(In reply to Julian Seward from comment #22)
> If we don't take this patch, will we cause inconvenience for many people?
> Are most people using "normal" toolchains on Octeon, and therefore don't
> need this patch? I don't have any understanding about the landscape of
> these apparently-mutually-incompatible MIPS variants, so I don't have much
> basis on which to usefully comment.

As can be seen in bug 328670, I was not against applying the patch; I was only advocating that we should have a valid regression test and that the patch be applied for Cavium variants only. This change should be relevant only for programs built for older (pre-Cavium II) cores, if I am right. How many people are affected without it, I do not know. I would guess not many, and fewer and fewer as time passes. Still, it was Maran's (@Cavium) suggestion to ignore the patch, especially since the corresponding kernel changes have not been upstreamed, so there is no public reference for why we would be doing it.
It seems that if you compile with -march=octeon2 or newer then k0 won't be used. But the latest toolchain from Cavium still seems to generate code that uses k0 by default. What's worse is that the Cavium-supplied glibc is compiled to make use of $k0. Unless Valgrind supports this, any program will die before reaching main(). I run on octeon2 HW and compile with -march=octeon2 and still need this. In theory it might be possible to recompile glibc with different flags, but that's not fun. It's also worth noting that this hack shouldn't break anything else because $k0 is normally undefined for userspace.
(In reply to Crestez Dan Leonard from comment #24)
> Unless valgrind supports this then any program will die before reaching
> main(). I run on octeon2 HW and compile with -march=octeon2 and still need
> this. In theory it might be possible to recompile glibc with different flags
> but that's not fun.
>

It depends. It can be. Are the Cavium additions to glibc publicly available? Can you share the sources for the kernel that make this change? If you bring up a regular MIPS32/MIPS64 image, you will not have these issues. Support for Cavium-specific changes is incomplete in general, so this is not the only issue. Again, we can add the change; someone just needs to come up with a regression test. Also, we can reopen BZ #328670 and move the discussion to that issue.

> It's also worth noting that this hack shouldn't be break anything else
> because $k0 is normally undefined for userspace.

True, but false for pre-Cavium II.