176926 – memcheck floating point exception at valgrind startup with PPC 440EPX

Summary: memcheck floating point exception at valgrind startup with PPC 440EPX

Status:	RESOLVED FIXED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	memcheck
Version:	3.3.1
Platform:	Unlisted Binaries Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Julian Seward

URL:
Keywords:

Depends on:
Blocks:

Reported:	2008-12-04 22:38 UTC by Sebastien
Modified:	2009-01-13 08:51 UTC
CC List:	1 user

See Also:
Latest Commit:
Version Fixed In:

Attachments
12/4/2008 valgrind trunk memcheck core (11.95 KB, application/rar) 2008-12-04 22:40 UTC, Sebastien
valgrind 3.3.1 memcheck core (10.48 KB, application/rar) 2008-12-04 22:41 UTC, Sebastien
Fix ppc machine detection by catching SIGFPE too. (9.10 KB, patch) 2009-01-11 18:20 UTC, Bart Van Assche

Description Sebastien 2008-12-04 22:38:34 UTC

Component: memcheck

Summary: getting floating point exception at valgrind startup with PPC 440EPX AMCC

Environment:

https://www.amcc.com/MyAMCC/jsp/public/productDetail/product_detail.jsp?productID=PPC440EPx

uname -r
2.6.18_pro500-440epx_eval

I'm having the same problem with valgrind trunk (12/2/2008) and Valgrind 3.3.1 release.

I am cross-compiling with the MontaVista Linux 5.0 ADK (Virtual Target).

Steps to reproduce

Crosscompile for 440EPX environment:
svn co svn://svn.valgrind.org/valgrind/trunk valgrind
svn cat svn://svn.valgrind.org/valgrind/branches/CROSS_COMPILATION/vex-cross-compilation.patch | patch -p0

./configure --build=i686-linux --host=powerpc-montavista-linux-gnu --target=powerpc-montavista-linux-gnu CC=ppc_440ep-gcc CXX=ppc_440ep-g++ LD=ppc_440ep-ld LDD=ppc_440ep-ldd AR=ppc_440ep-ar AS=ppc_440ep-as NM=ppc_440ep-nm STRIP=ppc_440ep-strip RANLIB=ppc_440ep-ranlib OBJDUMP=ppc_440ep-objdump CPPFLAGS=-I/opt/montavista/pro/devkit/ppc/440ep/target/usr/include/ LDFLAGS=-L/opt/montavista/pro/devkit/ppc/440ep/target/usr/lib/ --prefix=/root/valgrind

The configure line above is for the MontaVista ADK; adjust the toolchain names and paths accordingly if you use another development kit.

make ; make install

on target:
/root/valgrind
floating point exception

I am attaching 2 cores:
memcheck_8.core (with trunk as of 12/2/2008)

memcheck_8.core.3.3.1 (with valgrind 3.3.1 release)

From the Valgrind users' list:

"A source code comment in m_machine.c says the following:
 
 /* ppc32 doesn't seem to have a sane way to find out what insn
    sets the CPU supports.  So we have to arse around with
    SIGILLs.  Yuck. */"

Online documentation for the 440, in case it helps:

http://tree.celinuxforum.org/CelfPubWiki/BookEandPpc440
https://www.amcc.com/MyAMCC/retrieveDocument/PowerPC/440EPx/PPC440EPx_PB2023.pdf

Also from the users' list: it seems that the 440GX does not have this problem.

Comment 1 Sebastien 2008-12-04 22:40:10 UTC

Created attachment 29047 [details]
12/4/2008 valgrind trunk memcheck core

Comment 2 Sebastien 2008-12-04 22:41:26 UTC

Created attachment 29048 [details]
valgrind 3.3.1 memcheck core

gdb stacktrace

> (gdb) bt
> #0  0x380206d0 in vgPlain_machine_get_hwcaps () at m_machine.c:404
> #1  0x38021b5c in valgrind_main (argc=2, argv=0x7fda8c94, envp=0x7fda8ca0) at m_main.c:1312
> #2  0x38024874 in _start_in_C_linux (pArgc=0x7fda8c90) at m_main.c:2327
> #3  0x38020de0 in _start ()

Comment 3 Sebastien 2008-12-04 22:42:47 UTC

Comment on attachment 29047 [details]
12/4/2008 valgrind trunk memcheck core

(gdb) info threads
* 1 process 8932  0x3802e1e0 in vgPlain_machine_get_hwcaps () at m_machine.c:454

(gdb) bt
#0  0x3802e1e0 in vgPlain_machine_get_hwcaps () at m_machine.c:454
#1  0x3802f974 in valgrind_main (argc=2, argv=0x7f930c94, envp=0x7f930ca0) at m_main.c:1390
#2  0x38032a28 in _start_in_C_linux (pArgc=0x7f930c90) at m_main.c:2492
#3  0x3802ea50 in _start ()

Comment 4 Julian Seward 2008-12-22 15:32:43 UTC

This looks to me like a problem with signal handling.  m_machine.c
sets up to catch a SIGILL, then tries the instruction.  If it gets a
signal then it knows the instruction is not supported.  However, it
looks instead like a SIGFPE arrives, not a SIGILL; it is not
caught and so kills the process.  It should be easy enough to extend
the logic to trap both.
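
As an illustration of the probe-and-catch technique described above, here is a minimal sketch written against plain POSIX calls. It is not the code in m_machine.c and not the patch attached to this bug; Valgrind's core has its own internal signal and setjmp wrappers. All names (`insn_is_supported`, `probe_handler`, the `fake_*` functions) are hypothetical, and `raise()` stands in for a real instruction that the CPU rejects.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

/* Jump target used to escape from the signal handler. */
static sigjmp_buf probe_env;

/* Handler shared by SIGILL and SIGFPE: abandon the probe. */
static void probe_handler(int signo)
{
    siglongjmp(probe_env, signo);
}

/* Run try_insn() with SIGILL and SIGFPE both trapped.
   Returns 1 if it completed, 0 if either signal arrived. */
static int insn_is_supported(void (*try_insn)(void))
{
    struct sigaction sa, old_ill, old_fpe;
    int supported;

    sa.sa_handler = probe_handler;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);

    /* Trapping only SIGILL is the situation reported here:
       a SIGFPE would go uncaught and kill the process. */
    sigaction(SIGILL, &sa, &old_ill);
    sigaction(SIGFPE, &sa, &old_fpe);

    /* savemask = 1 so the signal mask is restored after the jump. */
    if (sigsetjmp(probe_env, 1) == 0) {
        try_insn();
        supported = 1;   /* no signal: instruction executed */
    } else {
        supported = 0;   /* SIGILL or SIGFPE: treat as unsupported */
    }

    /* Put the previous handlers back. */
    sigaction(SIGILL, &old_ill, NULL);
    sigaction(SIGFPE, &old_fpe, NULL);
    return supported;
}

/* Stand-ins for real test instructions (assumptions for the demo). */
static void fake_ok_insn(void)     { }
static void fake_sigill_insn(void) { raise(SIGILL); }
static void fake_sigfpe_insn(void) { raise(SIGFPE); }
```

With only the SIGILL handler installed, calling `insn_is_supported(fake_sigfpe_insn)` would terminate the process, which matches the symptom in this report.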

Comment 5 Bart Van Assche 2009-01-11 18:20:40 UTC

Created attachment 30135 [details]
Fix ppc machine detection by catching SIGFPE too.

Can you please verify whether the attached patch fixes the floating point exception and lets Valgrind start up properly?

Comment 6 Sebastien 2009-01-12 16:39:04 UTC

Thanks, I'm about to test the patch. I'll keep you posted.

Comment 7 Sebastien 2009-01-12 18:10:16 UTC

Hi,

The original problem seems to be fixed, thanks! However, I am now seeing other problems further along; for instance, I am getting disInstr fatal errors (shown below).

Should I open a new bug for this or keep using this one? If I keep this one, are there specific options I should run valgrind with to get more information on the disInstr problem, so I can attach the log file?

==21074==
disInstr(ppc): unhandled instruction: 0x7D20009D
                 primary 31(0x1F), secondary 157(0x9D)
==21074== valgrind: Unrecognised instruction at address 0xfe9be20.
==21074== Your program just tried to execute an instruction that Valgrind
==21074== did not recognise.  There are two possible reasons for this.
==21074== 1. Your program has a bug and erroneously jumped to a non-code
==21074==    location.  If you are running Memcheck and you just saw a
==21074==    warning about a bad jump, it's probably your program's fault.
==21074== 2. The instruction is legitimate but Valgrind doesn't handle it,
==21074==    i.e. it's Valgrind's fault.  If you think this is the case or
==21074==    you are not sure, please let us know and we'll try to fix it.
==21074== Either way, Valgrind will now raise a SIGILL signal which will
==21074== probably kill your program.
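
For reference, the two numbers in the disInstr message can be reproduced from the instruction word itself. The sketch below is based only on arithmetic over the value shown in the log: the primary opcode is the top 6 bits, and the printed "secondary" value matches the low 11 bits. The helper names are hypothetical, and the further split into a 10-bit extended opcode plus a final bit assumes the word is an X-form instruction, which this report does not confirm.

```c
#include <stdint.h>

/* Top 6 bits of a 32-bit PowerPC instruction word. */
static unsigned ppc_primary_opcode(uint32_t insn)
{
    return insn >> 26;
}

/* Low 11 bits; this matches the "secondary" value in the log. */
static unsigned ppc_low11(uint32_t insn)
{
    return insn & 0x7FFu;
}

/* Assumption: X-form layout, extended opcode in bits 1..10. */
static unsigned ppc_xform_xo(uint32_t insn)
{
    return (insn >> 1) & 0x3FFu;
}

/* Assumption: X-form layout, lowest bit is a separate flag. */
static unsigned ppc_xform_lowbit(uint32_t insn)
{
    return insn & 1u;
}
```

For 0x7D20009D these give 31 and 157, the values printed above; under the X-form assumption the extended opcode is 78 with the low bit set.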

Comment 8 Bart Van Assche 2009-01-12 18:34:57 UTC

Please open a new bug for the unhandled instruction -- the unhandled instruction issue is not related to the floating point exception triggered by the machine detection code. Thanks for testing.

Comment 9 Sebastien 2009-01-13 00:44:43 UTC

OK with the patch at https://bugs.kde.org/attachment.cgi?id=30135 then. Thanks!

Comment 10 Bart Van Assche 2009-01-13 08:51:27 UTC

A fix has been committed on the trunk (r8945) and will be included in one of the next releases (either 3.4.1 or 3.5.0).