Component: memcheck
Summary: getting floating point exception at valgrind startup with PPC 440EPX AMCC

Environment:
https://www.amcc.com/MyAMCC/jsp/public/productDetail/product_detail.jsp?productID=PPC440EPx
uname -r
2.6.18_pro500-440epx_eval

I'm having the same problem with valgrind trunk (12/2/2008) and the Valgrind 3.3.1 release. I am cross-compiling with the MontaVista Linux 5.0 ADK (Virtual Target).

Steps to reproduce

Cross-compile for the 440EPX environment:

svn co svn://svn.valgrind.org/valgrind/trunk valgrind
svn cat svn://svn.valgrind.org/valgrind/branches/CROSS_COMPILATION/vex-cross-compilation.patch | patch -p0
./configure --build=i686-linux --host=powerpc-montavista-linux-gnu --target=powerpc-montavista-linux-gnu \
    CC=ppc_440ep-gcc CXX=ppc_440ep-g++ LD=ppc_440ep-ld LDD=ppc_440ep-ldd \
    AR=ppc_440ep-ar AS=ppc_440ep-as NM=ppc_440ep-nm STRIP=ppc_440ep-strip \
    RANLIB=ppc_440ep-ranlib OBJDUMP=ppc_440ep-objdump \
    CPPFLAGS=-I/opt/montavista/pro/devkit/ppc/440ep/target/usr/include/ \
    LDFLAGS=-L/opt/montavista/pro/devkit/ppc/440ep/target/usr/lib/ \
    --prefix=/root/valgrind

The configure line above is for the MontaVista ADK; it can be modified accordingly if you are using another development kit.

make ; make install

On target:

/root/valgrind
floating point exception

I am attaching 2 cores:
memcheck_8.core (with trunk as of 12/2/2008)
memcheck_8.core.3.3.1 (with valgrind 3.3.1 release)

From the valgrind users list:

"A source code comment in m_machine.c says the following:
/* ppc32 doesn't seem to have a sane way to find out what insn
   sets the CPU supports. So we have to arse around with
   SIGILLs. Yuck. */"

440 online documentation, if that can help:
http://tree.celinuxforum.org/CelfPubWiki/BookEandPpc440
https://www.amcc.com/MyAMCC/retrieveDocument/PowerPC/440EPx/PPC440EPx_PB2023.pdf

Also from the users list, it seems that the 440GX does not have this problem.
Created attachment 29047 [details] 12/4/2008 valgrind trunk memcheck core
Created attachment 29048 [details]
valgrind 3.3.1 memcheck core

gdb stacktrace

> (gdb) bt
> #0 0x380206d0 in vgPlain_machine_get_hwcaps () at m_machine.c:404
> #1 0x38021b5c in valgrind_main (argc=2, argv=0x7fda8c94, envp=0x7fda8ca0)
>    at m_main.c:1312
> #2 0x38024874 in _start_in_C_linux (pArgc=0x7fda8c90) at m_main.c:2327
> #3 0x38020de0 in _start ()
Comment on attachment 29047 [details]
12/4/2008 valgrind trunk memcheck core

(gdb) info threads
* 1 process 8932 0x3802e1e0 in vgPlain_machine_get_hwcaps () at m_machine.c:454
(gdb) bt
#0 0x3802e1e0 in vgPlain_machine_get_hwcaps () at m_machine.c:454
#1 0x3802f974 in valgrind_main (argc=2, argv=0x7f930c94, envp=0x7f930ca0)
   at m_main.c:1390
#2 0x38032a28 in _start_in_C_linux (pArgc=0x7f930c90) at m_main.c:2492
#3 0x3802ea50 in _start ()
This looks to me like a problem with signal handling. m_machine.c sets up to catch a SIGILL, then tries the instruction. If it gets a signal then it knows the instruction is not supported. However, it looks instead like a SIGFPE arrives, not a SIGILL; it is not caught and so kills the process. It should be easy enough to extend the logic to trap both.
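The probing approach described above can be sketched in plain POSIX C. This is not the code in m_machine.c; it is a minimal illustration that assumes standard `sigaction` and `sigsetjmp`/`siglongjmp`. The names `probe_insn`, `probe_handler`, `runs_clean`, `raises_fpe` and `raises_ill` are hypothetical, and `raise()` stands in for executing an instruction the CPU rejects, so the sketch runs on any architecture.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf probe_env;
static volatile sig_atomic_t probe_signal;

/* Record which signal arrived and jump back into probe_insn(). */
static void probe_handler(int signo)
{
    probe_signal = signo;
    siglongjmp(probe_env, 1);
}

/* Run fn() with SIGILL and SIGFPE both trapped.
   Returns 0 if fn() completed, otherwise the number of the signal
   that interrupted it. The previous handlers are restored on exit. */
int probe_insn(void (*fn)(void))
{
    struct sigaction sa, old_ill, old_fpe;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGILL, &sa, &old_ill);
    sigaction(SIGFPE, &sa, &old_fpe);

    probe_signal = 0;
    /* savemask = 1 so the signal mask is restored after siglongjmp. */
    if (sigsetjmp(probe_env, 1) == 0)
        fn();

    sigaction(SIGILL, &old_ill, NULL);
    sigaction(SIGFPE, &old_fpe, NULL);
    return probe_signal;
}

/* Hypothetical stand-ins for "try one instruction". */
void runs_clean(void) { }
void raises_fpe(void) { raise(SIGFPE); }
void raises_ill(void) { raise(SIGILL); }
```

With only SIGILL trapped, the `raises_fpe` case would terminate the process, which matches the startup crash reported here. With both trapped, `probe_insn(raises_fpe)` returns `SIGFPE` and the caller can treat the instruction as unsupported.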
Created attachment 30135 [details]
Fix ppc machine detection by catching SIGFPE too.

Can you please verify whether the attached patch solves the floating point exception and makes Valgrind work properly?
Thanks, I'm about to test the patch. I'll keep you posted.
Hi,

The original problem seems to be fixed, thanks!

However, I am seeing some other problems further on; for instance, I am getting disInstr fatal errors (shown below). Should I open a new bug or keep using this one? If this one, are there specific options I should run valgrind with to get more information on the disInstr problem, so I can attach the log file?

==21074== disInstr(ppc): unhandled instruction: 0x7D20009D
                 primary 31(0x1F), secondary 157(0x9D)
==21074== valgrind: Unrecognised instruction at address 0xfe9be20.
==21074== Your program just tried to execute an instruction that Valgrind
==21074== did not recognise. There are two possible reasons for this.
==21074== 1. Your program has a bug and erroneously jumped to a non-code
==21074== location. If you are running Memcheck and you just saw a
==21074== warning about a bad jump, it's probably your program's fault.
==21074== 2. The instruction is legitimate but Valgrind doesn't handle it,
==21074== i.e. it's Valgrind's fault. If you think this is the case or
==21074== you are not sure, please let us know and we'll try to fix it.
==21074== Either way, Valgrind will now raise a SIGILL signal which will
==21074== probably kill your program.
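For anyone filing the follow-up bug, the two numbers in the disInstr message can be recomputed from the instruction word. The sketch below is plain bit arithmetic, not Valgrind code, and the function names are hypothetical. It assumes the "secondary" value is the low 11 bits of the word, which is what reproduces 157 for 0x7D20009D. Splitting those bits into a 10-bit extended opcode and an Rc bit further assumes the instruction uses the X-form layout.

```c
#include <stdint.h>

/* Primary opcode: the top 6 bits of the 32-bit instruction word. */
unsigned ppc_primary_opcode(uint32_t insn)
{
    return insn >> 26;
}

/* Low 11 bits of the word. Assumption: this is the value the
   disInstr message prints as "secondary". */
unsigned ppc_low11(uint32_t insn)
{
    return insn & 0x7FFu;
}

/* Assuming an X-form instruction: the 10-bit extended opcode
   sits just above the least significant bit. */
unsigned ppc_xform_xo(uint32_t insn)
{
    return (insn >> 1) & 0x3FFu;
}

/* Assuming an X-form instruction: the record (Rc) bit is bit 0
   counting from the least significant end. */
unsigned ppc_rc_bit(uint32_t insn)
{
    return insn & 1u;
}
```

For 0x7D20009D this gives primary 31 and low-11-bit value 157, matching the log; under the X-form assumption the extended opcode is 78 with Rc set.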
Please open a new bug for the unhandled instruction -- the unhandled instruction issue is not related to the floating point exception triggered by the machine detection code. Thanks for testing.
OK with https://bugs.kde.org/attachment.cgi?id=30135 patch then. Thanks!
A fix has been committed on the trunk (r8945) and will be included in one of the next releases (either 3.4.1 or 3.5.0).