Bug 328670 - Support for Accelerated Thread Local Storage (TLS) Access optimization on CVM MIPS
Status: RESOLVED INTENTIONAL
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (other bugs)
Version First Reported In: 3.9.0
Platform: Debian stable Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-12-11 15:09 UTC by Dr. Zahid Anwar, School of Electrical Engg. & Computer Science, National Univ. of Sciences & Technology, Islamabad, Pakistan
Modified: 2015-03-31 18:07 UTC
4 users

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:


Attachments
Patch for TLS Issue (1.57 KB, patch)
2013-12-11 15:13 UTC, Dr. Zahid Anwar, School of Electrical Engg. & Computer Science, National Univ. of Sciences & Technology, Islamabad, Pakistan

Description Dr. Zahid Anwar, School of Electrical Engg. & Computer Science, National Univ. of Sciences & Technology, Islamabad, Pakistan 2013-12-11 15:09:57 UTC
This patch adds support for accelerated TLS. Valgrind currently lacks support for hardware-based thread context switching. An optimization on Cavium (CVM) Octeon MIPS allows the kernel to store the thread-local storage (TLS) pointer in $k0.

On MIPS/Linux, thread-local storage (TLS) is accessed through hardware register 29 (UserLocal). GCC, glibc and CVM VG use the instruction rdhwr v1, $29 to obtain the current value of the thread pointer. On MIPS processors that do not implement hardware register 29, the kernel traps this instruction with a Reserved Instruction exception, which is very expensive. On CVM Octeon, using $k0 and CVMSEG gives fast access to the thread pointer, allowing Octeon Linux to read it with a single instruction. This improves the performance of TLS-intensive and thread-intensive applications.
Emulating the instruction takes many hundreds of cycles, whereas this optimization allows a single-cycle local access. This faster access is part of the Cavium-supplied toolchain, and programs running under Valgrind on the CVM SDK fail if the optimization is not supported.

Reproducible: Always

Steps to Reproduce:
Running any of Valgrind's regression tests on an Octeon II processor with Cavium SDK 3.0 produces this error.
Actual Results:  
==17094== Memcheck, a memory error detector
==17094== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==17094== Using Valgrind-3.9.0.SVN and LibVEX; rerun with -h for copyright info
==17094== Command: memcheck/tests/malloc1
==17094== Process terminating with default action of signal 10 (SIGBUS)
==17094==    at 0x40190AC: __dl_runtime_resolve (dl-trampoline.c:159)
==17094==    by 0x4018D88: _dl_runtime_resolve (in /usr/local/Cavium_Networks/OCTEON-SDK/tools/lib64/ld-2.16.so)
==17094== 
==17094== HEAP SUMMARY:
==17094==     in use at exit: 0 bytes in 0 blocks
==17094==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==17094== 
==17094== All heap blocks were freed -- no leaks are possible
==17094== 
==17094== For counts of detected and suppressed errors, rerun with: -v
==17094== ERROR SUMMARY: 9 errors from 8 contexts (suppressed: 0 from 0)
./vg-in-place: line 31: 17094 Bus error               VALGRIND_LIB="$vgbasedir/.in_place" VALGRIND_LIB_INNER="$vgbasedir/.in_place" "$vgbasedir/coregrind/valgrind" "$@"

Expected Results:  
==19650== Memcheck, a memory error detector
==19650== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==19650== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==19650== Command: memcheck/tests/inits
==19650== 
==19650== Conditional jump or move depends on uninitialised value(s)
==19650==    at 0x120000B90: main (inits.c:17)
==19650== 
==19650== 
==19650== HEAP SUMMARY:
==19650==     in use at exit: 0 bytes in 0 blocks
==19650==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==19650== 
==19650== All heap blocks were freed -- no leaks are possible
==19650== 
==19650== For counts of detected and suppressed errors, rerun with: -v
==19650== Use --track-origins=yes to see where uninitialised values come from
==19650== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
Comment 1 Dr. Zahid Anwar, School of Electrical Engg. & Computer Science, National Univ. of Sciences & Technology, Islamabad, Pakistan 2013-12-11 15:13:17 UTC
Created attachment 84046
Patch for TLS Issue

******************  PREREQUISITES   *****************

----Testing Environment specification:-----

SDK             3.0 with new tool chain
Tool-chain      tool-121004
gcc version     4.7
glibc version   2.16
kernel version  3.4.27-rt37-Cavium-Octeon
cpu info        Cavium Octeon II V0.1
MotherBoard     OCTEON_CN63XX

valgrind        3.10.0 (SVN trunk, revision r13752)
Command to check out: svn co -r 13752 svn://svn.valgrind.org/valgrind/trunk valgrind

******************  Applying patch   *****************

Copy the patch file into the valgrind directory and run the following command to apply it:
patch -p2 -i valgrind_tls_10_dec_2013.patch

******************  Files patched   *****************
Patched file: coregrind/m_syswrap/syswrap-mips64-linux.c
Comment 2 Petar Jovanovic 2014-11-19 18:44:39 UTC
AFAIU, this is a Cavium-specific issue that affects Cavium Octeon and Octeon Plus but not Octeon II, correct? If so, we want to limit any changes to the early Cavium variants, so VG_(get_machine_model) should be extended to recognize the individual variants (similar to what mips_features.c in the tests folder already does).

Further, can we have a failing test case rather than relying on what the toolchain would generate?

Thanks.
Comment 3 Maran Pakkirisamy 2014-11-20 06:37:59 UTC
Petar,
I just realized that the kernel patch supporting thread-local storage acceleration is not upstream, so this patch will not make sense beyond the Octeon toolchains. I therefore request that you ignore this patch and close the bug.
Comment 4 Petar Jovanovic 2014-11-20 12:54:39 UTC
(In reply to Maran Pakkirisamy from comment #3)
> Petar,
> I just realized that the kernel patch supporting thread-local storage
> acceleration is not upstream, so this patch will not make sense beyond
> the Octeon toolchains. I therefore request that you ignore this patch
> and close the bug.

I have no objections. We can close it as won't fix.