325538 – Cavium Octeon mips64: valgrind reported "dumping core" and "Assertion 'sizeof(*regs) == sizeof(prs->pr_reg)' failed."

Status:	RESOLVED FIXED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	memcheck (other bugs)
Version First Reported In:	3.9.0.SVN
Platform:	Ubuntu Linux

Importance:	NOR grave
Target Milestone:	---
Assignee:	Julian Seward

Reported:	2013-10-02 09:24 UTC by mengzr
Modified:	2014-09-03 06:42 UTC

Attachments
test.elf (11.55 KB, application/octet-stream) 2013-10-08 06:55 UTC, mengzr
statically compile (3.08 MB, application/octet-stream) 2013-10-08 10:13 UTC, mengzr
cavium patch (3.02 KB, patch) 2013-10-14 16:49 UTC, Dejan Jevtic

Description mengzr 2013-10-02 09:24:08 UTC

/tmp/valgrind/bin # ./valgrind --tool=memcheck ls -l
==1003== Memcheck, a memory error detector
==1003== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==1003== Using Valgrind-3.9.0.SVN and LibVEX; rerun with -h for copyright info
==1003== Command: ls -l
==1003== 
      1003:
      1003:
      1003:
==1003== Invalid read of size 4
==1003==    at 0x400C820: _dl_relocate_object (in /lib64/ld-2.9.so)
==1003==    by 0x4004C30: dl_main (in /lib64/ld-2.9.so)
==1003==  Address 0x50 is not stack'd, malloc'd or (recently) free'd
==1003== 
==1003== 
==1003== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==1003==  Access not within mapped region at address 0x50
==1003==    at 0x400C820: _dl_relocate_object (in /lib64/ld-2.9.so)
==1003==    by 0x4004C30: dl_main (in /lib64/ld-2.9.so)
==1003==  If you believe this happened as a result of a stack
==1003==  overflow in your program's main thread (unlikely but
==1003==  possible), you can try to increase the size of the
==1003==  main thread stack using the --main-stacksize= flag.
==1003==  The main thread stack size used in this run was 8388608.

valgrind: m_coredump/coredump-elf.c:259 (fill_prstatus): Assertion 'sizeof(*regs) == sizeof(prs->pr_reg)' failed.
==1003==    at 0x380479BC: report_and_quit (m_libcassert.c:260)
==1003==    by 0x38047C4C: vgPlain_assert_fail (m_libcassert.c:340)
==1003==    by 0x38078B9C: make_elf_coredump (coredump-elf.c:259)
==1003==    by 0x38060A0C: deliver_signal (m_signals.c:1732)
==1003==    by 0x380626A4: sync_signalhandler (m_signals.c:2449)
==1003==    by 0xFFFFFFF00C: ???

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable
==1003==    at 0x400C820: _dl_relocate_object (in /lib64/ld-2.9.so)
==1003==    by 0x4004C30: dl_main (in /lib64/ld-2.9.so)


Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.


Reproducible: Always

Steps to Reproduce:
1. On Ubuntu 12.04, configure and compile Valgrind:
    svn co svn://svn.valgrind.org/valgrind/trunk valgrind
    cd valgrind
    ./autogen.sh
    ./configure --prefix=/tmp/valgrind --host=mips64-octeon-linux-gnu
    make
    make install
2. Compress /tmp/valgrind into valgrind.zip.
3. Transfer valgrind.zip to the router with tftp.
    The router's CPU is a mips64 Octeon (Cavium).
    The router's OS is Linux 2.6.32.
4. Unzip valgrind.zip on the router.
5. cd /tmp/valgrind/bin
6. /tmp/valgrind/bin # ./valgrind --tool=memcheck ls -l
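
A note on the assertion in the log above: the backtrace shows it firing inside make_elf_coredump, called from deliver_signal, so it happens while Valgrind is trying to write a core file for the SIGSEGV and is a secondary failure rather than the original fault. The check itself compares the size of a register structure with the size of the pr_reg field. Below is a minimal sketch of that kind of size check using Python's ctypes. The type names and the 38 and 45 slot counts are made up for illustration; they are not Valgrind's or the kernel's real definitions.

```python
import ctypes

# Hypothetical layouts for illustration only. The real structures used by
# Valgrind's coredump-elf.c are not shown in this report.
class GuestRegs(ctypes.Structure):
    # Stand-in for the type that `regs` points to: 38 64-bit slots (assumed).
    _fields_ = [("slots", ctypes.c_uint64 * 38)]

# Stand-in for the type of `prs->pr_reg`: 45 64-bit slots (assumed).
PrRegArray = ctypes.c_uint64 * 45

def sizes_agree(regs_type, pr_reg_type) -> bool:
    """Mirror the shape of `sizeof(*regs) == sizeof(prs->pr_reg)`."""
    return ctypes.sizeof(regs_type) == ctypes.sizeof(pr_reg_type)

if __name__ == "__main__":
    print(ctypes.sizeof(GuestRegs), ctypes.sizeof(PrRegArray))
    print("assertion would hold:", sizes_agree(GuestRegs, PrRegArray))
```

With layouts that disagree like this, a check of that shape fails on every core dump, regardless of what caused the signal in the first place.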

Comment 1 Dejan Jevtic 2013-10-02 12:10:26 UTC

@mengzr
Can you tell me how to get that rootfs that is running on your router?

Comment 2 mengzr 2013-10-03 12:20:42 UTC

@Dejan Jevtic, sorry, the rootfs is not publicly available. If you want to run some tests, I can run them for you.

Comment 3 Dejan Jevtic 2013-10-03 12:44:27 UTC

@mengzr

Can you try to statically compile (-static) a small example that just returns 0:

int main() {return 0;}

Then you can try to run this example under Valgrind.
If this fails can you send me the binary that fails?

Comment 4 mengzr 2013-10-08 06:55:03 UTC

Created attachment 82715 [details]
test.elf

Comment 5 mengzr 2013-10-08 07:01:46 UTC

@Dejan Jevtic
Sorry for the late reply; I was on vacation. I tried your method and Valgrind reported the same problem. The binary (test.elf) is attached. Please help me analyze it again, thank you.

--------------------------------------------------------------------------------------------------------------------------

~ # /tmp/valgrind/bin/valgrind --tool=memcheck ./test.elf 
==1018== Memcheck, a memory error detector
==1018== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==1018== Using Valgrind-3.9.0.SVN and LibVEX; rerun with -h for copyright info
==1018== Command: ./test.elf
==1018== 
      1018:
      1018:
      1018:
      1018:
      1018:
      1018:
      1018:
      1018:
      1018:
      1018:
      1018:
      1018:
      1018:
==1018== Invalid read of size 4
==1018==    at 0x400CDD8: _dl_relocate_object (in /lib64/ld-2.9.so)
==1018==    by 0x4004C30: dl_main (in /lib64/ld-2.9.so)
==1018==  Address 0x128 is not stack'd, malloc'd or (recently) free'd
==1018== 
==1018== 
==1018== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==1018==  Access not within mapped region at address 0x128
==1018==    at 0x400CDD8: _dl_relocate_object (in /lib64/ld-2.9.so)
==1018==    by 0x4004C30: dl_main (in /lib64/ld-2.9.so)
==1018==  If you believe this happened as a result of a stack
==1018==  overflow in your program's main thread (unlikely but
==1018==  possible), you can try to increase the size of the
==1018==  main thread stack using the --main-stacksize= flag.
==1018==  The main thread stack size used in this run was 8388608.

valgrind: m_coredump/coredump-elf.c:259 (fill_prstatus): Assertion 'sizeof(*regs) == sizeof(prs->pr_reg)' failed.
==1018==    at 0x380479BC: report_and_quit (m_libcassert.c:260)
==1018==    by 0x38047C4C: vgPlain_assert_fail (m_libcassert.c:340)
==1018==    by 0x38078B9C: make_elf_coredump (coredump-elf.c:259)
==1018==    by 0x38060A0C: deliver_signal (m_signals.c:1732)
==1018==    by 0x380626A4: sync_signalhandler (m_signals.c:2449)
==1018==    by 0xFFFFFFF00C: ???

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable
==1018==    at 0x400CDD8: _dl_relocate_object (in /lib64/ld-2.9.so)
==1018==    by 0x4004C30: dl_main (in /lib64/ld-2.9.so)


Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.

Comment 6 Dejan Jevtic 2013-10-08 07:29:11 UTC

@mengzr
You didn't statically compile your program.
Can you try to compile it with -static and for little endian, e.g.
$ mips64-octeon-linux-gnu-gcc test.c -g3 -static -EL -o test.elf
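
The earlier backtraces pass through /lib64/ld-2.9.so, which is the sign that test.elf was dynamically linked. Another way to check a binary before running it is to look for a PT_INTERP program header, which names the dynamic loader and is normally absent from statically linked executables. A minimal sketch in Python, reading only the ELF header fields needed for that (the function name is hypothetical):

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader

def has_interpreter(elf: bytes) -> bool:
    """Return True if the ELF image contains a PT_INTERP program header."""
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = elf[4] == 2                    # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if elf[5] == 1 else ">"   # EI_DATA: 1 = little, 2 = big
    if is_64:
        (e_phoff,) = struct.unpack_from(endian + "Q", elf, 0x20)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", elf, 0x36)
    else:
        (e_phoff,) = struct.unpack_from(endian + "I", elf, 0x1C)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", elf, 0x2A)
    # p_type is the first 32-bit field of every program header entry.
    for i in range(e_phnum):
        (p_type,) = struct.unpack_from(endian + "I", elf, e_phoff + i * e_phentsize)
        if p_type == PT_INTERP:
            return True
    return False
```

Usage would be along the lines of has_interpreter(open("test.elf", "rb").read()); a True result suggests the -static flag did not take effect.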

Comment 7 mengzr 2013-10-08 10:13:19 UTC

Created attachment 82717 [details]
statically compile

Comment 8 mengzr 2013-10-08 10:21:22 UTC

@Dejan Jevtic 

Statically compiled:
./tools/bin/mips64-octeon-linux-gnu-gcc tmp/rootfs-build/ip/tunnel/tunnel_test/client.c -g3 -static -o test.elf

Valgrind reported this:
~ # /tmp/valgrind/bin/valgrind --tool=memcheck ./test.elf          
==1030== Memcheck, a memory error detector
==1030== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==1030== Using Valgrind-3.9.0.SVN and LibVEX; rerun with -h for copyright info
==1030== Command: ./test.elf
==1030== 
==1030== Invalid write of size 8
==1030==    at 0x120005BD0: ptmalloc_init (arena.c:486)
==1030==    by 0x12000AED0: malloc_hook_ini (hooks.c:37)
==1030==    by 0x120038B04: _dl_init_paths (dl-load.c:649)
==1030==    by 0x12000EC90: _dl_non_dynamic_init (dl-support.c:246)
==1030==    by 0x12000F7B0: __libc_init_first (init-first.c:82)
==1030==    by 0x120003B54: (below main) (libc-start.c:159)
==1030==  Address 0xffffffffffff9028 is not stack'd, malloc'd or (recently) free'd
==1030== 
==1030== 
==1030== Process terminating with default action of signal 10 (SIGBUS): dumping core
==1030==    at 0x120005BD0: ptmalloc_init (arena.c:486)
==1030==    by 0x12000AED0: malloc_hook_ini (hooks.c:37)
==1030==    by 0x120038B04: _dl_init_paths (dl-load.c:649)
==1030==    by 0x12000EC90: _dl_non_dynamic_init (dl-support.c:246)
==1030==    by 0x12000F7B0: __libc_init_first (init-first.c:82)
==1030==    by 0x120003B54: (below main) (libc-start.c:159)

valgrind: m_coredump/coredump-elf.c:259 (fill_prstatus): Assertion 'sizeof(*regs) == sizeof(prs->pr_reg)' failed.
==1030==    at 0x380479BC: report_and_quit (m_libcassert.c:260)
==1030==    by 0x38047C4C: vgPlain_assert_fail (m_libcassert.c:340)
==1030==    by 0x38078B9C: make_elf_coredump (coredump-elf.c:259)
==1030==    by 0x38060A0C: deliver_signal (m_signals.c:1732)
==1030==    by 0x380626A4: sync_signalhandler (m_signals.c:2449)
==1030==    by 0xFFFFFFF00C: ???

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable
==1030==    at 0x120005BD0: ptmalloc_init (arena.c:486)
==1030==    by 0x12000AED0: malloc_hook_ini (hooks.c:37)
==1030==    by 0x120038B04: _dl_init_paths (dl-load.c:649)
==1030==    by 0x12000EC90: _dl_non_dynamic_init (dl-support.c:246)
==1030==    by 0x12000F7B0: __libc_init_first (init-first.c:82)
==1030==    by 0x120003B54: (below main) (libc-start.c:159)


Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.

Comment 9 mengzr 2013-10-08 10:27:10 UTC

It looks like Octeon does not support little endian.
After running "mips64-octeon-linux-gnu-gcc test.c -g3 -static -EL -o test.elf", mips64-octeon-linux-gnu-gcc reported this:

/home/mengzr/projects/rgosm-build/.toolchain-octeon/tools-gcc-4.3/bin/../lib/gcc/mips64-octeon-linux-gnu/4.3.3/../../../../mips64-octeon-linux-gnu/bin/ld: /home/mengzr/projects/rgosm-build/.toolchain-octeon/tools-gcc-4.3/bin/../lib/gcc/mips64-octeon-linux-gnu/4.3.3/libgcc.a(_fpcmp_parts_tf.o): compiled for a big endian system and target is little endian
/home/mengzr/projects/rgosm-build/.toolchain-octeon/tools-gcc-4.3/bin/../lib/gcc/mips64-octeon-linux-gnu/4.3.3/../../../../mips64-octeon-linux-gnu/bin/ld: /home/mengzr/projects/rgosm-build/.toolchain-octeon/tools-gcc-4.3/bin/../lib/gcc/mips64-octeon-linux-gnu/4.3.3/libgcc.a(_fpcmp_parts_tf.o): endianness incompatible with that of the selected emulation
/home/mengzr/projects/rgosm-build/.toolchain-octeon/tools-gcc-4.3/bin/../lib/gcc/mips64-octeon-linux-gnu/4.3.3/../../../../mips64-octeon-linux-gnu/bin/ld: failed to merge target specific data of file /home/mengzr/projects/rgosm-build/.toolchain-octeon/tools-gcc-4.3/bin/../lib/gcc/mips64-octeon-linux-gnu/4.3.3/libgcc.a(_fpcmp_parts_tf.o)

......
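
The linker messages above say that the objects in libgcc.a were compiled for a big endian system, so -EL cannot be mixed with them. The byte order an ELF object or executable was built for is recorded in byte 5 (EI_DATA) of its header, and can be read directly. A minimal sketch in Python (the function name is hypothetical):

```python
def elf_byte_order(header: bytes) -> str:
    """Return 'little', 'big', or 'unknown' from an ELF header's EI_DATA byte."""
    if len(header) < 6 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # EI_DATA values from the ELF specification: 1 = LSB first, 2 = MSB first.
    return {1: "little", 2: "big"}.get(header[5], "unknown")

if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, elf_byte_order(f.read(16)))
```

Running this on test.elf from the earlier attachments would show which byte order the toolchain actually produced.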

Comment 10 mengzr 2013-10-12 22:58:37 UTC

Hi @Dejan Jevtic,
Have you found a solution to this problem?

Comment 11 Dejan Jevtic 2013-10-14 16:49:11 UTC

Created attachment 82851 [details]
cavium patch

@mengzr

Currently Valgrind fully supports only the vanilla mips64 and
mips64r2 instruction sets.
We have added some Cavium-specific instructions, but not all of them.
Your example contains bbit0 and bbit1 instructions, which Valgrind
does not currently support because they are not part of mips64 or
mips64r2. I have attached a small patch that adds support for these
instructions. The small test that you sent me should now run with
no errors.
Can you try applying this patch and running the tests?
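
For context, bbit0 and bbit1 are Cavium branch instructions that test a single bit of a register: as I understand them, bbit0 branches when the selected bit is clear and bbit1 branches when it is set. The sketch below models only that branch decision in Python. It is not taken from the attached patch, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1  # registers are 64 bits wide on mips64

def bbit_taken(rs: int, p: int, branch_if_set: bool) -> bool:
    """Model whether a bit-test branch on bit p of register value rs is taken."""
    if not 0 <= p < 64:
        raise ValueError("bit index must be in 0..63")
    bit = ((rs & MASK64) >> p) & 1
    return bit == (1 if branch_if_set else 0)

def bbit0(rs: int, p: int) -> bool:
    """Branch taken when bit p of rs is 0."""
    return bbit_taken(rs, p, branch_if_set=False)

def bbit1(rs: int, p: int) -> bool:
    """Branch taken when bit p of rs is 1."""
    return bbit_taken(rs, p, branch_if_set=True)
```

An emulator that does not recognise these opcodes cannot follow the branch correctly, which fits the patch fixing the failures seen in the earlier logs.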

Comment 12 mengzr 2013-10-15 10:59:57 UTC

@Dejan Jevtic

Could you tell me how to apply "patch_cavium.diff" to the Valgrind source code?

Comment 13 Dejan Jevtic 2013-10-15 11:13:40 UTC

@mengzr

$ svn co svn://svn.valgrind.org/valgrind/trunk valgrind
$ cd valgrind
$ patch -p0 < /path/to/your/patch
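
The -p0 option tells patch to use the file names in the diff exactly as written, which is why the command is run from the top of the checkout. In general, -pN strips the prefix containing N leading slashes from each file name. A small Python sketch of that rule, using a made-up path and ignoring corner cases such as repeated slashes:

```python
def strip_components(path: str, n: int) -> str:
    """Approximate how `patch -pN` shortens a file name from a diff header."""
    if n < 0:
        raise ValueError("strip count must be non-negative")
    parts = path.split("/")
    if n >= len(parts):
        raise ValueError("cannot strip %d components from %r" % (n, path))
    # Dropping n '/'-separated pieces removes the prefix with n slashes.
    return "/".join(parts[n:])

if __name__ == "__main__":
    name = "a/some/dir/file.c"  # hypothetical name as it might appear in a diff
    for n in range(3):
        print(n, strip_components(name, n))
```

If a diff had been generated with an extra leading directory, -p1 rather than -p0 would be the level to try.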

Comment 14 mengzr 2013-10-16 03:05:48 UTC

@Dejan Jevtic, thank you very much. Valgrind can now be used on Cavium Octeon mips64.

~ # /tmp/valgrind/bin/valgrind --tool=memcheck ./test.elf 
==998== Memcheck, a memory error detector
==998== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==998== Using Valgrind-3.9.0.SVN and LibVEX; rerun with -h for copyright info
==998== Command: ./test.elf
==998== 
==998== 
==998== HEAP SUMMARY:
==998==     in use at exit: 0 bytes in 0 blocks
==998==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==998== 
==998== All heap blocks were freed -- no leaks are possible
==998== 
==998== For counts of detected and suppressed errors, rerun with: -v
==998== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
~ # 
~ # 
~ # 
~ # 
~ # /tmp/valgrind/bin/valgrind --tool=memcheck ls -l      
==999== Memcheck, a memory error detector
==999== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==999== Using Valgrind-3.9.0.SVN and LibVEX; rerun with -h for copyright info
==999== Command: ls -l
==999== 
==999== Invalid write of size 8
==999==    at 0x4001C28: _dl_start_user (in /lib64/ld-2.9.so)
==999==    by 0x4001BB8: __start (in /lib64/ld-2.9.so)
==999==  Address 0xfff000ab8 is just below the stack ptr.  To suppress, use: --workaround-gcc296-bugs=yes
==999== 
==999== Warning: noted but unhandled ioctl 0x40087468 with no size/direction hints
==999==    This could cause spurious value errors to appear.
==999==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
*Oct 16 02:46:22: %RG_SYSMON-4-CPU_WATERMARK_HIGH: warning! cpu0 usage above high watermark(85%),current cpu usage 100%
drwxr-xr-x    2 1001     1001         4560 Aug 27 08:46 bin
drwxr-xr-x    2 1001     1001          432 Aug 27 09:08 boot
drwxr-xr-x    2 1001     1001          160 Aug 27 08:46 bootloader
drwxrwxrwx   10 root     root         1456 Aug 31 07:54 data
drwxr-xr-x    6 root     root        97600 Oct 16 02:30 dev
drwxr-xr-x   11 1001     1001         2120 Aug 27 11:13 etc
drwxr-xr-x    5 1001     1001          352 Aug 27 11:13 home
drwxr-xr-x    3 1001     1001          224 Aug 27 08:59 lib
drwxr-xr-x    4 1001     1001        14528 Aug 27 08:50 lib64
lrwxrwxrwx    1 1001     1001           10 Aug 27 08:46 linuxrc -> /sbin/init
drwxrwxrwx    6 1001     1001          416 Aug 27 11:13 mnt
dr-xr-xr-x  150 root     root            0 Jan  1  1970 proc
drwxr-xr-x    2 1001     1001          160 Aug 27 08:46 root
drwxr-xr-x    3 1001     1001          232 Aug 27 11:13 rootfs
drwxr-xr-x    2 1001     1001        12328 Aug 27 09:07 sbin
drwxr-xr-x   15 root     root            0 Oct 16 02:30 sys
-rwxrwxrwx    1 root     root      3232842 Oct  8 10:09 test.elf
drwxrwxrws    9 root     root          400 Oct 16 02:42 tmp
-rwxrwxrwx    1 root     root         3949 Aug 31 07:20 tunnel.sh
-rwxrwxrwx    1 root     root        11827 Oct  8 07:07 tunnel_client.elf
-rwxrwxrwx    1 root     root         4174 Aug 31 07:20 tunnel_kernel.sh
drwxr-xr-x    9 1001     1001          608 Aug 27 09:07 usr
drwxr-xr-x    4 1001     1001          416 Aug 27 11:13 var
-rw-------    1 root     root            0 Oct  8 06:44 vgcore.1018
-rw-------    1 root     root            0 Aug 27 12:34 vgcore.1023
-rw-------    1 root     root            0 Oct  8 10:09 vgcore.1030
-rw-------    1 root     root            0 Aug 27 12:44 vgcore.1031
-rw-------    1 root     root            0 Aug 27 13:25 vgcore.1034
-rw-------    1 root     root            0 Aug 27 13:25 vgcore.1035
-rw-------    1 root     root            0 Aug 27 13:26 vgcore.1036
-rw-------    1 root     root            0 Aug 27 13:26 vgcore.1040
drwxrwxrwx    3 root     root          224 Aug 27 11:13 zebos
==999== 
==999== HEAP SUMMARY:
==999==     in use at exit: 264 bytes in 3 blocks
==999==   total heap usage: 198 allocs, 195 frees, 67,998 bytes allocated
==999== 
==999== LEAK SUMMARY:
==999==    definitely lost: 16 bytes in 2 blocks
==999==    indirectly lost: 248 bytes in 1 blocks
==999==      possibly lost: 0 bytes in 0 blocks
==999==    still reachable: 0 bytes in 0 blocks
==999==         suppressed: 0 bytes in 0 blocks
==999== Rerun with --leak-check=full to see details of leaked memory
==999== 
==999== For counts of detected and suppressed errors, rerun with: -v
==999== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 21 from 5)
~ #

Comment 15 uk2806 2013-10-18 12:15:38 UTC

Hey,

I am tasked with something along similar lines. However, after applying the patch using the method Dejan gave, I am still not able to run Valgrind on mips64 code. Are there any dependencies that I am missing?

Utsav

Comment 16 Dejan Jevtic 2013-10-18 12:36:46 UTC

@uk2806

Can you send me the log that you are getting when you try to run Valgrind?

Comment 17 uk2806 2013-10-21 10:49:19 UTC

@Dejan Jevtic
ukumar@ubuntu:~/SampleCode$ valgrind --tool=memcheck ./test.elf
valgrind: ./test.elf: Permission denied

Comment 18 uk2806 2013-10-21 11:25:04 UTC

^ One thing I failed to mention is that this error only appears when I use a mips64 file. If I run Valgrind with a normal C file, it generally does not show any errors.
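
The thread does not establish the cause of this "Permission denied" message. Two facts that would be worth collecting, as an assumption about where to look rather than a diagnosis, are whether the file has execute permission and whether it was built for the same machine type as the host running Valgrind. A minimal Python sketch that reports both (function names are hypothetical, and the machine table is a small subset of the ELF e_machine values):

```python
import os
import platform
import struct

# Small subset of e_machine values from the ELF specification.
EM_NAMES = {3: "x86", 8: "MIPS", 40: "ARM", 62: "x86-64", 183: "AArch64"}

def elf_machine(header: bytes) -> str:
    """Return a readable name for the e_machine field of an ELF header."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    endian = "<" if header[5] == 1 else ">"   # EI_DATA: 1 = little, 2 = big
    (e_machine,) = struct.unpack_from(endian + "H", header, 18)
    return EM_NAMES.get(e_machine, "unknown (%d)" % e_machine)

def report(path: str) -> dict:
    """Collect the execute-permission flag and machine types for one file."""
    with open(path, "rb") as f:
        header = f.read(20)
    return {
        "execute_permission": os.access(path, os.X_OK),
        "binary_machine": elf_machine(header),
        "host_machine": platform.machine(),
    }
```

Calling report("./test.elf") on the Ubuntu host would show at a glance whether a MIPS binary is being run under a Valgrind built for a different machine.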

Comment 19 Julian Seward 2014-05-09 12:01:37 UTC

Dejan, what's the status on this?  Can it be closed now?

Comment 20 Dejan Jevtic 2014-05-09 13:33:44 UTC

@Julian

This bug is fixed and can be closed.

Comment 21 Julian Seward 2014-09-03 06:42:59 UTC

Closing.