This patch adds support for accelerated TLS. Valgrind currently lacks support for hardware-based thread context switching.

An optimization on CVM Octeon MIPS allows the thread local storage (TLS) pointer to be stored by the kernel in $k0. On MIPS/Linux, TLS is accessed via hardware register 29: GCC, glibc and CVM VG use the instruction rdhwr v1, $29 to obtain the current value of the thread pointer. When hardware register 29 does not exist on a MIPS processor, the kernel traps this instruction with a Reserved Instruction exception, and the overhead of that trap is very high: the instruction emulation takes many hundreds of cycles. On CVM Octeon, the use of k0 and CVMSEG lets Octeon Linux read the thread pointer with a single instruction, as a single-cycle local access. This improves the performance of TLS-intensive and thread-intensive applications.

This improved access is part of the Cavium-supplied toolchain, and programs running under Valgrind on the CVM SDK fail if this optimization is not supported.

Reproducible: Always

Steps to Reproduce:
Running valgrind on any of its regression tests on an Octeon II processor with Cavium SDK 3.0 produces this error.

Actual Results:
==17094== Memcheck, a memory error detector
==17094== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==17094== Using Valgrind-3.9.0.SVN and LibVEX; rerun with -h for copyright info
==17094== Command: memcheck/tests/malloc1
==17094== Process terminating with default action of signal 10 (SIGBUS)
==17094==    at 0x40190AC: __dl_runtime_resolve (dl-trampoline.c:159)
==17094==    by 0x4018D88: _dl_runtime_resolve (in /usr/local/Cavium_Networks/OCTEON-SDK/tools/lib64/ld-2.16.so)
==17094==
==17094== HEAP SUMMARY:
==17094==     in use at exit: 0 bytes in 0 blocks
==17094==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==17094==
==17094== All heap blocks were freed -- no leaks are possible
==17094==
==17094== For counts of detected and suppressed errors, rerun with: -v
==17094== ERROR SUMMARY: 9 errors from 8 contexts (suppressed: 0 from 0)
./vg-in-place: line 31: 17094 Bus error VALGRIND_LIB="$vgbasedir/.in_place" VALGRIND_LIB_INNER="$vgbasedir/.in_place" "$vgbasedir/coregrind/valgrind" "$@"

Expected Results:
==19650== Memcheck, a memory error detector
==19650== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==19650== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==19650== Command: memcheck/tests/inits
==19650==
==19650== Conditional jump or move depends on uninitialised value(s)
==19650==    at 0x120000B90: main (inits.c:17)
==19650==
==19650==
==19650== HEAP SUMMARY:
==19650==     in use at exit: 0 bytes in 0 blocks
==19650==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==19650==
==19650== All heap blocks were freed -- no leaks are possible
==19650==
==19650== For counts of detected and suppressed errors, rerun with: -v
==19650== Use --track-origins=yes to see where uninitialised values come from
==19650== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
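
For reference, here is a minimal sketch of the rdhwr-based thread-pointer read that the description refers to. It is not taken from the patch. The function name read_thread_pointer and the variable tls_anchor are made up for illustration. The MIPS branch follows the "rdhwr v1, $29" form quoted above and is an assumption about how a toolchain might emit it. On other architectures the sketch falls back to the address of a thread-local variable, which only stands in for the real thread pointer.

```c
#include <stdint.h>

/* Hypothetical thread-local variable, used only by the non-MIPS fallback. */
static __thread char tls_anchor;

/* Hypothetical helper: returns the thread pointer on MIPS, or a
 * per-thread address standing in for it on other architectures. */
static uintptr_t read_thread_pointer(void)
{
#if defined(__mips__)
    /* Pin the result to $3 (v1), matching "rdhwr v1, $29" in the report.
     * On cores without hardware register 29 this instruction traps and
     * the kernel emulates it. */
    register uintptr_t tp __asm__("$3");
    __asm__ volatile(".set push\n\t"
                     ".set mips32r2\n\t"
                     "rdhwr %0, $29\n\t"
                     ".set pop"
                     : "=r"(tp));
    return tp;
#else
    /* Not the thread pointer itself: the address of a TLS variable,
     * which the compiler computes relative to the thread pointer. */
    return (uintptr_t)&tls_anchor;
#endif
}
```

Calling read_thread_pointer() twice from the same thread should return the same value. Under the accelerated scheme described above, the Cavium toolchain replaces this read with a single-instruction access through k0 and CVMSEG, and that access is the one Valgrind does not handle.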
Created attachment 84046
Patch for TLS Issue

****************** PREREQUISITES *****************

----Testing Environment specification:-----
SDK 3.0 with new tool chain
Tool-chain: tool-121004
gcc version: 4.7
glibc version: 2.16
kernel version: 3.4.27-rt37-Cavium-Octeon
cpu info: Cavium Octeon II V0.1
MotherBoard: OCTEON_CN63XX
valgrind: 3.10.0 r13752
release: 13752
command to checkout: svn co -r 13752 svn://svn.valgrind.org/valgrind/trunk valgrind

****************** Applying patch *****************

Copy the patch file to the valgrind folder and run the following command to apply it:
patch -p2 -i valgrind_tls_10_dec_2013.patch

****************** Files patched *****************

patch file coregrind/m_syswrap/syswrap-mips64-linux.c
AFAIU, this is a Cavium-specific issue that is present on Cavium Octeon and Octeon Plus, but not on Cavium II, correct?

If so, we want to limit any changes to the early Cavium variants. So VG_(get_machine_model) should be extended to recognize the separate variants (similar to what mips_features.c in the tests folder already does).

Further, can we have a test case that fails, rather than relying on what the toolchain would generate?

Thanks.
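
To illustrate the kind of variant detection being asked for, here is a rough sketch that classifies a CPU model string such as the "Cavium Octeon II V0.1" reported in the test environment. It is not Valgrind code. The enum, the function name classify_octeon, and the exact strings for the older variants ("Cavium Octeon", "Cavium Octeon+") are assumptions made for illustration. A real change would live in VG_(get_machine_model) and follow its conventions.

```c
#include <string.h>

/* Hypothetical variant tags; not part of Valgrind. */
typedef enum {
    OCTEON_NONE,      /* not a Cavium Octeon model string */
    OCTEON_ORIGINAL,  /* assumed form: "Cavium Octeon V..." */
    OCTEON_PLUS,      /* assumed form: "Cavium Octeon+ V..." */
    OCTEON_II,        /* "Cavium Octeon II V..." as in the report */
    OCTEON_OTHER      /* any other Octeon suffix, e.g. a later generation */
} octeon_variant;

/* Classify a "cpu model" string as found in /proc/cpuinfo. */
static octeon_variant classify_octeon(const char *cpu_model)
{
    static const char prefix[] = "Cavium Octeon";
    const char *p;

    if (cpu_model == NULL)
        return OCTEON_NONE;
    p = strstr(cpu_model, prefix);
    if (p == NULL)
        return OCTEON_NONE;

    p += sizeof(prefix) - 1;          /* skip past "Cavium Octeon" */
    if (*p == '+')
        return OCTEON_PLUS;
    if (strncmp(p, " II", 3) == 0)    /* " II" followed by space or end */
        return (p[3] == ' ' || p[3] == '\0') ? OCTEON_II : OCTEON_OTHER;
    if (*p == ' ' || *p == '\0')
        return OCTEON_ORIGINAL;
    return OCTEON_OTHER;
}
```

With such a classifier, any TLS-related handling could be enabled only for the variants that need it. Which variants those are is the open question in this comment.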
Petar,
I just realized that the kernel patch to support thread local storage acceleration is not upstream, so this patch will not make sense beyond the Octeon toolchains. Please ignore this patch and close the bug.
(In reply to Maran Pakkirisamy from comment #3)
> Petar,
> I just realized that the kernel patch to support thread local storage
> acceleration is not upstream and hence this patch will not make sense beyond
> the octeon toolchains. Hence I request to ignore this patch and close the
> bug.

I have got no objections.
We can close it as won't fix.