Bug 341036 - dumping core and Assertion 'sizeof(*regs) == sizeof(prs->pr_reg)
Status: REPORTED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (other bugs)
Version First Reported In: 3.10.0
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL: https://bugs.kde.org/show_bug.cgi?id=...
Keywords:
Duplicates: 341038
Depends on:
Blocks:
 
Reported: 2014-11-17 03:50 UTC by szspp99
Modified: 2016-04-09 12:18 UTC (History)

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:


Attachments
This patch is for 3.9 (3.02 KB, patch)
2014-11-18 08:18 UTC, szspp99
make regtest (199.75 KB, text/plain)
2014-11-20 06:40 UTC, szspp99
Valgrind-3.11.0.SVN regtest (121.21 KB, text/plain)
2014-11-21 08:59 UTC, szspp99
mips64-octeon-linux-gnu-gcc test.c -o test -static (3.08 MB, text/plain)
2014-11-27 03:41 UTC, szspp99

Description szspp99 2014-11-17 03:50:40 UTC
On mips64-octeon-linux-gnu, running any program under Valgrind ends with signal 10 (SIGBUS) while dumping core, followed by the failed assertion 'sizeof(*regs) == sizeof(prs->pr_reg)'. Other programs show the same problem.
The patch for version 3.9 in https://bugs.kde.org/show_bug.cgi?id=325538 provides a solution to this bug. However, with version 3.10.0 that patch brings other inexplicable problems,
such as an unrecognized instruction: vex mips->IR: unhandled instruction bytes: 0xD8 0x5E 0xFE 0xF6


Reproducible: Always

Steps to Reproduce:
1. valgrind --tool=memcheck ls


Actual Results:  
~ # valgrind --tool=memcheck ls
==11696== Memcheck, a memory error detector
==11696== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==11696== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
==11696== Command: ls
==11696== 
==11696== Invalid write of size 8
==11696==    at 0x4001C28: _dl_start_user (in /lib64/ld-2.9.so)
==11696==    by 0x4001BB8: __start (in /lib64/ld-2.9.so)
==11696==  Address 0xfff000868 is on thread 1's stack
==11696==  8 bytes below stack pointer
==11696== 
==11696== Invalid read of size 8
==11696==    at 0x41D3594: (below main) (libc-start.c:213)
==11696==  Address 0xffffffffffff8a00 is not stack'd, malloc'd or (recently) free'd
==11696== 
==11696== 
==11696== Process terminating with default action of signal 10 (SIGBUS): dumping core
==11696==    at 0x41D3594: (below main) (libc-start.c:213)

valgrind: m_coredump/coredump-elf.c:260 (fill_prstatus): Assertion 'sizeof(*regs) == sizeof(prs->pr_reg)' failed.

host stacktrace:
==11696==    at 0x3804B860: show_sched_status_wrk (m_libcassert.c:319)
==11696==    by 0x3804BBB8: report_and_quit (m_libcassert.c:390)
==11696==    by 0x3804BE44: vgPlain_assert_fail (m_libcassert.c:455)
==11696==    by 0x3807F878: fill_prstatus (coredump-elf.c:260)
==11696==    by 0x3807F878: dump_one_thread (coredump-elf.c:567)
==11696==    by 0x3807FBCC: make_elf_coredump (coredump-elf.c:670)
==11696==    by 0x3807FBCC: vgPlain_make_coredump (coredump-elf.c:742)
==11696==    by 0x38066AAC: default_action (m_signals.c:1770)
==11696==    by 0x38066AAC: deliver_signal (m_signals.c:1829)
==11696==    by 0x38068744: sync_signalhandler_from_kernel (m_signals.c:2487)
==11696==    by 0x38068744: sync_signalhandler (m_signals.c:2575)
==11696==    by 0xFFFFFFF00C: ???

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable
==11696==    at 0x41D3594: (below main) (libc-start.c:213)


Expected Results:  
No core dump and no assertion failure.

uname -a 
Linux (none) 2.6.32.13-Cavium-Octeon #1 SMP Wed Sep 3 12:55:04 CST 2014 mips64 unknown
Comment 1 Tom Hughes 2014-11-17 07:15:14 UTC
*** Bug 341038 has been marked as a duplicate of this bug. ***
Comment 2 Tom Hughes 2014-11-17 07:22:29 UTC
I'm confused - you say that running 3.10 with the patch from the other bug causes other problems, but 3.10 includes that patch as far as I can see.

So are you seeing this assertion with an unpatched 3.10 or are you seeing the other problem - ie the "unhandled instruction bytes" assertion?
Comment 3 szspp99 2014-11-18 05:01:41 UTC
1. An unpatched 3.10 will always give this assertion.
2. I know that 3.10 includes the patch, but it seems to have no effect.
Comment 4 Tom Hughes 2014-11-18 07:07:38 UTC
So how were you getting the other assertion then? What patch did you apply to get that?
Comment 5 szspp99 2014-11-18 08:18:44 UTC
Created attachment 89624 [details]
This patch is for 3.9

This patch is for 3.9, but I find that 3.10 differs from 3.9 in the code this patch touches. I tried to use it with 3.10, but the "unhandled instruction bytes" assertion is displayed.
Comment 6 Tom Hughes 2014-11-18 08:43:07 UTC
So you're saying that if you use 3.9 with the patch applied then you get the instruction bytes assertion, and if you use unpatched 3.10 you get the fill_prstatus assertion?

In that case I suspect the instruction bytes assertion is not relevant (probably just an instruction that 3.10 now supports) and the real question is why the fix in #325538 is not working for you, but that's a question for the MIPS people.
Comment 7 szspp99 2014-11-19 02:09:47 UTC
Thanks. I have another question: does a suppression file only suppress log output, or does it do anything else?
Comment 8 Petar Jovanovic 2014-11-19 19:18:09 UTC
As stated in different places, support for Cavium instruction
extensions is still incomplete in Valgrind.
Having said this, the situation is getting better with every release
and I hope we can have a few more patches landing soon for Cavium.

In this case, we need to come up with the smallest test example that
fails for you.

Have you tried to run the test suite - with 'make regtest'?
Comment 9 szspp99 2014-11-20 06:40:30 UTC
Created attachment 89641 [details]
make regtest

There seem to be some errors, but I don't know why.
Comment 10 Petar Jovanovic 2014-11-20 11:32:59 UTC
(In reply to szspp99 from comment #9)
> Created attachment 89641 [details]
> make regtest
> 
> There seem to be some errors, but I don't know why.

Can you checkout Valgrind from the trunk and do the same?
This FPU test issue you are seeing has been fixed since 3.10.
Comment 11 szspp99 2014-11-21 08:59:55 UTC
Created attachment 89664 [details]
Valgrind-3.11.0.SVN regtest

Some of the output is garbled.
Comment 12 Petar Jovanovic 2014-11-27 00:17:24 UTC
(In reply to szspp99 from comment #11)
> Created attachment 89664 [details]
> Valgrind-3.11.0.SVN regtest
> 
> Some of the output is garbled.

Are you seeing the Valgrind issue with any program you try?
If so, can you make the simplest example possible that will trigger the issue (feel free to try "int main() { return 5; }"), compile it statically and attach it here?
Comment 13 szspp99 2014-11-27 03:41:53 UTC
Created attachment 89740 [details]
mips64-octeon-linux-gnu-gcc test.c -o test -static
Comment 14 szspp99 2014-11-27 03:43:26 UTC
/tmp # valgrind --tool=memcheck ./test
==4526== Memcheck, a memory error detector
==4526== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==4526== Using Valgrind-3.11.0.SVN and LibVEX; rerun with -h for copyright info
==4526== Command: ./test
==4526== 
==4526== Invalid write of size 8
==4526==    at 0x120005BC0: ptmalloc_init (arena.c:486)
==4526==    by 0x12000AEC0: malloc_hook_ini (hooks.c:37)
==4526==    by 0x120038AF4: _dl_init_paths (dl-load.c:649)
==4526==    by 0x12000EC80: _dl_non_dynamic_init (dl-support.c:246)
==4526==    by 0x12000F7A0: __libc_init_first (init-first.c:82)
==4526==    by 0x120003B44: (below main) (libc-start.c:159)
==4526==  Address 0xffffffffffff9028 is not stack'd, malloc'd or (recently) free'd
==4526== 
==4526== 
==4526== Process terminating with default action of signal 10 (SIGBUS): dumping core
==4526==    at 0x120005BC0: ptmalloc_init (arena.c:486)
==4526==    by 0x12000AEC0: malloc_hook_ini (hooks.c:37)
==4526==    by 0x120038AF4: _dl_init_paths (dl-load.c:649)
==4526==    by 0x12000EC80: _dl_non_dynamic_init (dl-support.c:246)
==4526==    by 0x12000F7A0: __libc_init_first (init-first.c:82)
==4526==    by 0x120003B44: (below main) (libc-start.c:159)

valgrind: m_coredump/coredump-elf.c:262 (fill_prstatus): Assertion 'sizeof(*regs) == sizeof(prs->pr_reg)' failed.

host stacktrace:
==4526==    at 0x3804B7A0: show_sched_status_wrk (m_libcassert.c:319)
==4526==    by 0x3804BAF8: report_and_quit (m_libcassert.c:390)
==4526==    by 0x3804BD5C: vgPlain_assert_fail (m_libcassert.c:456)
==4526==    by 0x3807F940: fill_prstatus (coredump-elf.c:262)
==4526==    by 0x3807F940: dump_one_thread (coredump-elf.c:571)
==4526==    by 0x3807FC94: make_elf_coredump (coredump-elf.c:674)
==4526==    by 0x3807FC94: vgPlain_make_coredump (coredump-elf.c:748)
==4526==    by 0x38066CDC: default_action (m_signals.c:1777)
==4526==    by 0x38066CDC: deliver_signal (m_signals.c:1836)
==4526==    by 0x380689C4: sync_signalhandler_from_kernel (m_signals.c:2493)
==4526==    by 0x380689C4: sync_signalhandler (m_signals.c:2581)
==4526==    by 0xFFFFFFF00C: ???

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable
==4526==    at 0x120005BC0: ptmalloc_init (arena.c:486)
==4526==    by 0x12000AEC0: malloc_hook_ini (hooks.c:37)
==4526==    by 0x120038AF4: _dl_init_paths (dl-load.c:649)
==4526==    by 0x12000EC80: _dl_non_dynamic_init (dl-support.c:246)
==4526==    by 0x12000F7A0: __libc_init_first (init-first.c:82)
==4526==    by 0x120003B44: (below main) (libc-start.c:159)


Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.
Comment 15 szspp99 2014-11-27 04:39:51 UTC
It looks like the same problem as before.
Comment 16 Petar Jovanovic 2014-11-27 15:46:05 UTC
OK, I see and know the problem now.
The easiest way for you is to take a patch proposed at BZ #328670.

Here is the link to the patch:
https://bugsfiles.kde.org/attachment.cgi?id=84046

Let me know if this fixes the issue for you.
Comment 17 szspp99 2014-11-28 03:32:39 UTC
Yes, the problem has been fixed. Thank you very much.
Comment 18 Julian Seward 2015-02-03 00:03:05 UTC
Petar, can I close this as a dup of bug 328670 (per your comment 16), or is
this something different?
Comment 19 Julian Seward 2015-03-27 17:02:42 UTC
(In reply to Petar Jovanovic from comment #16)
> OK, I see and know the problem now.
> The easiest way for you is to take a patch proposed at BZ #328670.

There's something very strange about this patch.  It defines _MIPS_ARCH_OCTEON 
as something that can be evaluated at run time, but then uses it as if it was a
preprocessor macro.  That can't be right.
Comment 20 Julian Seward 2015-03-27 17:04:49 UTC
I might be persuaded to land this if anybody offers to test it on both
octeon and "normal" mips.  But I don't want to land it without proper
testing.
Comment 21 Petar Jovanovic 2015-03-27 17:16:27 UTC
(In reply to Julian Seward from comment #18)
> Petar, can I close this as a dup of bug 328670 (per your comment 16), or is
> this something different?

I would close it.

(In reply to Julian Seward from comment #20)
> I might be persuaded to land this if anybody offers to test it on both
> octeon and "normal" mips.  But I don't want to land it without proper
> testing.

Any reason we should be taking these changes, since agreement on BZ# 328670 was to close the issue as won't fix?
Comment 22 Julian Seward 2015-03-27 17:28:46 UTC
If we don't take this patch, will we cause inconvenience for many people?
Are most people using "normal" toolchains on Octeon, and therefore don't
need this patch?  I don't have any understanding about the landscape of
these apparently-mutually-incompatible MIPS variants, so I don't have much
basis on which to usefully comment.
Comment 23 Petar Jovanovic 2015-03-27 18:10:08 UTC
(In reply to Julian Seward from comment #22)
> If we don't take this patch, will we cause inconvenience for many people?
> Are most people using "normal" toolchains on Octeon, and therefore don't
> need this patch?  I don't have any understanding about the landscape of
> these apparently-mutually-incompatible MIPS variants, so I don't have much
> basis on which to usefully comment.

As can be seen in bug 328670, I was not against applying the patch; I was only advocating that we should have a valid regression test and that the patch be applied for Cavium variants only. This change should be relevant only for programs built for older (pre-Cavium II) cores, if I am right.
I do not know how many people are affected without it. I would guess not many, and fewer as time passes. Still, the suggestion to ignore the patch came from Maran (@Cavium), especially since the corresponding kernel changes have not been upstreamed, so there is no public reference for why we would be doing it.
Comment 24 Crestez Dan Leonard 2015-03-31 18:05:10 UTC
It seems that if you compile with -march=octeon2 or newer then k0 won't be used. But the latest toolchain from Cavium still seems to generate code that uses k0 by default.

What's worse is that the Cavium-supplied glibc is compiled to make use of $k0. Unless Valgrind supports this, any program will die before reaching main(). I run on octeon2 HW and compile with -march=octeon2 and still need this. In theory it might be possible to recompile glibc with different flags but that's not fun.

It's also worth noting that this hack shouldn't break anything else because $k0 is normally undefined for userspace.
Comment 25 Petar Jovanovic 2015-04-02 01:29:56 UTC
(In reply to Crestez Dan Leonard from comment #24)
> Unless Valgrind supports this, any program will die before reaching
> main(). I run on octeon2 HW and compile with -march=octeon2 and still need
> this. In theory it might be possible to recompile glibc with different flags
> but that's not fun.
> 
It depends. It can be. Are the Cavium additions to glibc publicly available?
Can you share sources for the kernel that make this change?

If you bring up a regular MIPS32/MIPS64 image, you will not have these issues. Support for Cavium-specific changes is incomplete in general, so this is not the only issue.

Again, we can add the change; someone just needs to come up with a regression test. Also, we can reopen BZ #328670 and move the discussion to that issue.

> It's also worth noting that this hack shouldn't break anything else
> because $k0 is normally undefined for userspace.

True, but false for pre-Cavium II.