Bug 250631 - svn - armv7: segmentation fault
Status: REPORTED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck
Version: 3.6 SVN
Platform: Compiled Sources Linux
Importance: NOR major
Target Milestone: ---
Assignee: Julian Seward
Reported: 2010-09-09 09:46 UTC by Alexander Stohr
Modified: 2010-10-20 18:09 UTC

Attachments
console log from proposed test run (199.77 KB, text/plain)
2010-09-09 12:29 UTC, Alexander Stohr

Description Alexander Stohr 2010-09-09 09:46:02 UTC
i know it's still experimental, but it seemed the only option for me...
i gave valgrind's experimental ARM support a go.
i used svn sources from this afternoon.
i used a natively compiled version on this platform:

Processor       : ARMv7 Processor rev 3 (v7l)
BogoMIPS        : *****
Features        : swp half thumb fastmult vfp edsp
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x1
CPU part        : 0xc08
CPU revision    : 3
L1 I cache      :VIPT

it is supposed to be some TI OMAP3 chip, meaning the Cortex-A8 core design.

the compiler used reports this version information:

gcc (GCC) 4.3.1
Copyright (C) 2008 Free Software Foundation, Inc.

when running it the normal way it simply segfaults.
when running it in gdb it reports the lines appended below.
(for other reasons i am currently updating to the latest stable gdb.)

as i really want to use your tooling i would be keen to provide you with
more information. please tell me what i should do in this case
to make the test reports i can provide more useful to you.

regards, Alex.

PS: i have long-term programming experience - it's just that i am blocked
right now because the segfault is in the middle of nowhere.


user@machine# gdb valgrind
GNU gdb 6.8                                            
Copyright (C) 2008 Free Software Foundation, Inc.      
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.           
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"   
and "show warranty" for details.                                             
This GDB was configured as "arm-angstrom-linux-gnueabi"...                   
(gdb) run ls -l                                                              
Starting program: /usr/local/bin/valgrind ls -l                              
Executing new program: /usr/local/lib/valgrind/memcheck-arm-linux            
==6437== Memcheck, a memory error detector                                   
==6437== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.     
==6437== Using Valgrind-3.6.0.SVN and LibVEX; rerun with -h for copyright info
==6437== Command: ls -l                                                       
==6437==                                                                      

Program received signal SIGSEGV, Segmentation fault.
0x625ce9f4 in ?? ()                                 
(gdb) bt                                            
#0  0x625ce9f4 in ?? ()
Cannot access memory at address 0x0
(gdb)
Comment 1 Julian Seward 2010-09-09 09:51:43 UTC
What is the result of the command

/usr/local/bin/valgrind -d -d -v -v --trace-flags=10000000 ls -l

(there may be quite a lot of output.  Post or attach it all.)
Comment 2 Alexander Stohr 2010-09-09 12:29:56 UTC
Created attachment 51462
console log from proposed test run

see attachment.

the path to the executable has changed since it's now integrated in the cross-build environment. the SVN snapshot is still from the same date/time.

~> valgrind --version
valgrind-3.6.0.SVN
Comment 3 Julian Seward 2010-09-09 13:34:32 UTC
(In reply to comment #2)
> Created an attachment (id=51462)
> console log from proposed test run

Startup was almost completely successful, but the place where the TLS
pointer is obtained from is wrong, and this causes the program to
segfault when starting up libpthread.  ARM Linux has two different
ways to tell the current thread what its TLS pointer is, but Valgrind
only supports one of them, and your Linux setup uses the other.  (I
guess).  I will try to find more information.

==== SB 1158 [tid 1] (0xffff0fe0) SBs exec'd 36436 ====
==== SB 1159 [tid 1] __pthread_initialize_minimal+8(0x498be84) SBs exec'd 36437 ====
==== SB 1160 [tid 1] __pthread_initialize_minimal+40(0x498bea4) SBs exec'd 36438 ====
==11969== Invalid write of size 4
==11969==    at 0x498BEAC: __pthread_initialize_minimal (in /lib/libpthread-2.9.so)
==11969==  Address 0x4002204c is not stack'd, malloc'd or (recently) free'd

==11969== Process terminating with default action of signal 11 (SIGSEGV)
==11969==  Access not within mapped region at address 0x4002204C
==11969==    at 0x498BEAC: __pthread_initialize_minimal (in /lib/libpthread-2.9.so)
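
The trace quoted above shows a superblock at 0xffff0fe0 executing right before __pthread_initialize_minimal. On 32-bit ARM Linux that address lies in the kernel-provided user helper page, and the kernel documentation lists 0xffff0fe0 as the entry point of __kuser_get_tls, which fits the diagnosis that the TLS pointer lookup is where things go wrong. The following is only a rough sketch for annotating such "==== SB" lines; the function name describe_superblocks and the regular expression are illustrative assumptions, not part of Valgrind.

```python
import re

# Fixed entry points in the ARM Linux kernel user helper page, as listed in
# the kernel's kernel_user_helpers documentation. Assumption: a 32-bit ARM
# kernel that maps the helper page at the top of the address space.
KUSER_HELPERS = {
    0xFFFF0FE0: "__kuser_get_tls",
    0xFFFF0FC0: "__kuser_cmpxchg",
    0xFFFF0FA0: "__kuser_memory_barrier",
}

# Matches lines such as
#   ==== SB 1159 [tid 1] __pthread_initialize_minimal+8(0x498be84) SBs exec'd ...
# The symbol part is empty when Valgrind has no name for the address.
SB_LINE = re.compile(r"^==== SB (\d+) \[tid (\d+)\] (\S*?)\((0x[0-9a-fA-F]+)\)")


def describe_superblocks(trace):
    """Return (sb_number, address, label) for each '==== SB' line in trace."""
    result = []
    for line in trace.splitlines():
        match = SB_LINE.match(line.strip())
        if not match:
            continue
        address = int(match.group(4), 16)
        # Prefer a known kernel helper name, then Valgrind's symbol, then "?".
        label = KUSER_HELPERS.get(address) or match.group(3) or "?"
        result.append((int(match.group(1)), address, label))
    return result


trace = """\
==== SB 1158 [tid 1] (0xffff0fe0) SBs exec'd 36436 ====
==== SB 1159 [tid 1] __pthread_initialize_minimal+8(0x498be84) SBs exec'd 36437 ====
==== SB 1160 [tid 1] __pthread_initialize_minimal+40(0x498bea4) SBs exec'd 36438 ====
"""

for sb, address, label in describe_superblocks(trace):
    print(sb, hex(address), label)
```

Run against the three lines from the log, this labels SB 1158 as __kuser_get_tls, which makes it easier to see that the invalid write in __pthread_initialize_minimal follows directly after the TLS pointer was fetched.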
Comment 4 Alexander Stohr 2010-09-10 12:20:36 UTC
i've found this FAQ source (talking about Android, i'm on OE-OAOS-Angstroem-custom):
http://elinux.org/Android_on_OMAP#TLS_issue

the linked chapter and the two following ones might be of some help for understanding this case.

as far as i can tell from wikipedia (http://en.wikipedia.org/wiki/Armv7),
the term "armv7" refers to the ARM Cortex-* models - in my case it's supposed to be a Cortex-A8 in the form of the TI OMAP3.

The above linked FAQ lists "ARMv6K (MPCORE) and ARMv7 (Cortex). Regarding OMAP, this is OMAP3 (Cortex)." (my case) - it says these models have a "TLS issue", or rather they have a hardware add-on that provides TLS support. other, older models ("OMAP1 (ARM9) and OMAP2 (ARM11) don't have this issue. ") don't have this unit.

as i understand it, on old devices you will run the "old" method as you have no choice. on new devices you might still be able to run the old method if your platform code is consistent, but you would rather want to use the "new" method. (sorry if i can't give them a more valid name... i'm typing on the fly)

the android solution seems to be a trap operation that keeps all code the same but provides the intended behavior through a fault handler. - i have to dig through my packages and check the critical components (e.g. pthreads and mono) to make sure they are consistently configured.

as valgrind is definitely very system dependent i can see that this might be the root cause. depending on the TLS design the needed changes might range from
"none" to "runtime-detectable" to "compile time defined". anything that replaces that bare segfault with something more informative is indeed desirable.
(as said, this is written on the fly - i might be quite wrong about that.)

my main focus still simply is: valgrind should do its job on that system.
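
One possible reading of the "runtime-detectable" option is to look at the Features line of /proc/cpuinfo: ARM kernels that export the TLS hardware capability list a "tls" token there. The sketch below is a hedged illustration of that check only; the function names are made up for this example, and it is an assumption that a kernel too old to export the capability simply omits the token even when the CPU has the TLS register, so a missing token does not prove that the hardware method is unused.

```python
def cpuinfo_features(cpuinfo_text):
    """Return the set of tokens on the first 'Features' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            return set(value.split())
    return set()


def reports_hw_tls(cpuinfo_text):
    """True if the kernel advertises the 'tls' capability in cpuinfo text.

    Hypothetical helper: a False result only means the token is absent,
    not that the CPU lacks a hardware TLS register.
    """
    return "tls" in cpuinfo_features(cpuinfo_text)


# Excerpt of the cpuinfo posted in the description of this bug.
reported = """\
Processor       : ARMv7 Processor rev 3 (v7l)
Features        : swp half thumb fastmult vfp edsp
CPU architecture: 7
"""

print(sorted(cpuinfo_features(reported)))
print(reports_hw_tls(reported))
```

For the cpuinfo in this report the token is absent, which is why such a check could at best complement, not replace, looking at how libc and the kernel on the system are actually configured.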
Comment 5 Julian Seward 2010-09-10 12:38:28 UTC
Try applying the inverse of r10973 to your tree, with a command
like this (not sure if this is right)

svn merge -r10973:10972 svn://svn.valgrind.org/valgrind/branches/ARM .

Does that help?
Comment 6 Alexander Stohr 2010-09-13 18:22:16 UTC
found the change in question here:
  http://old.nabble.com/ARM-set_tls-syscall-handling-td27407796.html

applied it like this in the target machine's native build environment:
  # patch -R -p2 <../r10973.patch
  patching file coregrind/m_scheduler/scheduler.c
  Hunk #1 succeeded at 1070 (offset 10 lines).
  patching file coregrind/m_syswrap/syswrap-arm-linux.c
  Hunk #1 succeeded at 279 (offset 14 lines).
  patching file coregrind/pub_core_threadstate.h

compiling, installing, testing...

old: # /usr/______bin/valgrind -d -d -v -v --trace-flags=10000000 ls -l
new: # /usr/local/bin/valgrind -d -d -v -v --trace-flags=10000000 ls -l
[...]
==12865==
==12865== HEAP SUMMARY:
==12865==     in use at exit: 14,326 bytes in 38 blocks
==12865==   total heap usage: 106 allocs, 68 frees, 26,143 bytes allocated
==12865==
==12865== Searching for pointers to 38 not-freed blocks
--12865--   Scanning root segment: 0x28000..0x28fff (4096)
--12865--   Scanning root segment: 0x401d000..0x401dfff (4096)
--12865--   Scanning root segment: 0x4022000..0x4023fff (8192)
--12865--   Scanning root segment: 0x4025000..0x4025fff (4096)
--12865--   Scanning root segment: 0x4026000..0x4026fff (4096)
--12865--   Scanning root segment: 0x482e000..0x482efff (4096)
--12865--   Scanning root segment: 0x483e000..0x483efff (4096)
--12865--   Scanning root segment: 0x484d000..0x484dfff (4096)
--12865--   Scanning root segment: 0x485f000..0x485ffff (4096)
--12865--   Scanning root segment: 0x4983000..0x4983fff (4096)
--12865--   Scanning root segment: 0x4984000..0x4986fff (12288)
--12865--   Scanning root segment: 0x49a3000..0x49a3fff (4096)
--12865--   Scanning root segment: 0x49a4000..0x49a5fff (8192)
--12865--   Scanning root segment: 0xbd7fd000..0xbd800fff (16384)
==12865== Checked 72,056 bytes
==12865==
==12865== LEAK SUMMARY:
==12865==    definitely lost: 72 bytes in 2 blocks
==12865==    indirectly lost: 240 bytes in 20 blocks
==12865==      possibly lost: 0 bytes in 0 blocks
==12865==    still reachable: 14,014 bytes in 16 blocks
==12865==         suppressed: 0 bytes in 0 blocks
==12865== Rerun with --leak-check=full to see details of leaked memory
==12865==
==12865== Use --track-origins=yes to see where uninitialised values come from
==12865== ERROR SUMMARY: 2 errors from 1 contexts (suppressed: 25 from 5)
==12865==
==12865== 2 errors in context 1 of 1:
==12865== Conditional jump or move depends on uninitialised value(s)
==12865==    at 0x4016554: index (in /lib/ld-2.9.so)
==12865==
--12865--
--12865-- used_suppression:     25 U1004-ARM-_dl_relocate_object
==12865==
==12865== ERROR SUMMARY: 2 errors from 1 contexts (suppressed: 25 from 5)
--12865:1:core_os  VG_(terminate_NORETURN)(tid=1)

looks like it reached the end of the test run without hitting a segfault.
thank you for that hint. whatever this tells me about the platform used...
Comment 7 Peter Maydell 2010-10-20 18:09:05 UTC
There is also discussion of this issue in bug 254556.