Summary: | Support for Cavium MIPS Octeon Atomic and Count Instructions | ||
---|---|---|---|
Product: | [Developer tools] valgrind | Reporter: | Dr. Zahid Anwar, School of Electrical Engg. & Computer Science, National Univ. of Sciences & Technology, Islamabad, Pakistan <zahid.anwar> |
Component: | vex | Assignee: | Julian Seward <jseward> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | dejanjevtic87, florian, maran.pakkirisamy, mips32r2 |
Priority: | NOR | ||
Version: | 3.9.0 | ||
Target Milestone: | --- | ||
Platform: | Debian stable | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Attachments: |
Add support cavium octeon Atomic and count instructions
Test Cases Output File Adds support for CVM MIPS atomic and count instructions Test Cases VGTEST file Cavium atomic instructions Cavium atomic tests SAA testcase fix cvm_atomic_thread |
Description
Dr. Zahid Anwar, School of Electrical Engg. & Computer Science, National Univ. of Sciences & Technology, Islamabad, Pakistan
2013-11-06 13:12:34 UTC
Comment 1
Dr. Zahid Anwar, School of Electrical Engg. & Computer Science, National Univ. of Sciences & Technology, Islamabad, Pakistan
2013-11-06 13:21:26 UTC
Created attachment 83370 [details] Add support cavium octeon Atomic and count instructions ****************** PRE-REQUISITES ***************** ----Testing Enviroment specification:----- SDK 3.0 with new tool chain Tool-chain tool-121004 gcc version 4.7 glibc version 2.11 kernel version 3.4.27-rt37-Cavium-Octeon cpu info Cavium Octeon II V0.1 MotherBoard OCTEON_CN63XX valgrind 3.9.0 r13667 release 13667 command to checkout: svn co -r 13667 svn://svn.valgrind.org/valgrind/trunk valgrind ****************** Applying patch ***************** Note: It is assumed that the previous Patch for Bug #:326444 (Link: https://bugs.kde.org/show_bug.cgi?id=326444) has been already applied. copy patch file to valgrind folder and following command should be run to apply patch. run patch -p1 -i valgrind_r13667_6_Nov_2013.patch ****************** Files patched ***************** patch file VEX/priv/guest_mips_toIR.c patch file VEX/priv/host_mips_defs.c patch file VEX/priv/host_mips_defs.h patch file VEX/priv/host_mips_isel.c patch file VEX/priv/ir_defs.c patch file VEX/priv/ir_opt.c patch file VEX/priv/main_main.c patch file VEX/pub/libvex_ir.h ******** Instructions added (Cavium specific) ****** - BADDU - POP - DPOP - SAA - SAAD - LAA - LAAD - LAW - LAWD - LAI - LAID - LAS - LASD - LAD - LADD - LAC - LACD
Comment 2
Dr. Zahid Anwar, School of Electrical Engg. & Computer Science, National Univ. of Sciences & Technology, Islamabad, Pakistan
2013-11-06 13:25:05 UTC
Created attachment 83373 [details]
Test Cases
Comment 3
Dr. Zahid Anwar, School of Electrical Engg. & Computer Science, National Univ. of Sciences & Technology, Islamabad, Pakistan
2013-11-06 13:26:11 UTC
Created attachment 83374 [details]
Output File
Can you please address all issues mentioned in #326444 that are applicable to this change? Further, can you add some comments to cvm_atomic.c - e.g. what does each macro do and such? vgtest file seems to be missing.
Comment 5
Dr. Zahid Anwar, School of Electrical Engg. & Computer Science, National Univ. of Sciences & Technology, Islamabad, Pakistan
2013-12-11 14:08:23 UTC
(In reply to comment #4) > Can you please address all issues mentioned in #326444 that are applicable > to this change? > Patch has been updated to address all applicable issues in #326444 which were 1) valgrind code style conformance and 2) rebase with ToT (revision 13752) > Further, can you add some comments to cvm_atomic.c - e.g. what does each > macro > do and such? > > vgtest file seems to be missing. Detailed comments and vg_test file have been added.
Comment 6
Dr. Zahid Anwar, School of Electrical Engg. & Computer Science, National Univ. of Sciences & Technology, Islamabad, Pakistan
2013-12-11 14:11:58 UTC
Created attachment 84043 [details] Adds support for CVM MIPS atomic and count instructions ****************** PRE-REQUISITES ***************** ----Testing Enviroment specification:----- SDK 3.0 with new tool chain Tool-chain tool-121004 gcc version 4.7 glibc version 2.16 kernel version 3.4.27-rt37-Cavium-Octeon cpu info Cavium Octeon II V0.1 MotherBoard OCTEON_CN63XX valgrind 3.10.0 r13752 release 13752 command to checkout: svn co -r 13752 svn://svn.valgrind.org/valgrind/trunk valgrind ****************** Applying patch ***************** Note: It is assumed that the previous Patch for Bug #:326444 (Link: https://bugs.kde.org/show_bug.cgi?id=326444) has been already applied. copy patch file to valgrind folder and following command should be run to apply patch. run patch -p2 -i valgrind_bug327223_10_Dec_2013.patch ****************** Files patched ***************** patched file memcheck/mc_translate.c patched file none/tests/mips64/Makefile.am patched file VEX/priv/guest_mips_toIR.c patched file VEX/priv/host_mips_defs.c patched file VEX/priv/host_mips_defs.h patched file VEX/priv/host_mips_isel.c patched file VEX/priv/ir_defs.c patched file VEX/priv/ir_opt.c patched file VEX/priv/main_main.c patched file VEX/pub/libvex_ir.h ******** Instructions added (CVM MIPS) ****** - BADDU - POP - DPOP - SAA - SAAD - LAA - LAAD - LAW - LAWD - LAI - LAID - LAS - LASD - LAD - LADD - LAC - LACD
Comment 7
Dr. Zahid Anwar, School of Electrical Engg. & Computer Science, National Univ. of Sciences & Technology, Islamabad, Pakistan
2013-12-11 14:15:00 UTC
Created attachment 84044 [details]
Test Cases
Comment 8
Dr. Zahid Anwar, School of Electrical Engg. & Computer Science, National Univ. of Sciences & Technology, Islamabad, Pakistan
2013-12-11 14:15:27 UTC
Created attachment 84045 [details]
VGTEST file
@Dr. Zahid Anwar Did you try to implement Cavium MIPS Octeon special atomic instructions by using IRStmt_LLSC ? Petar, Dejan, what's the status on this? (In reply to comment #10) > Petar, Dejan, what's the status on this? This patch seems fine and can be applied after some cleanup, but it would add new Iops that likely necessary: Iop_LoadAtomicAdd, Iop_LoadAtomicSwap, Iop_LoadAtomicInc, Iop_LoadAtomicDec, Iop_LoadAtomicSet, Iop_LoadAtomicClr, Are you fine with that? (In reply to comment #11) > Are you fine with that? No .. I would prefer to do this without changing the core IR if at all possible. What are the semantics of these instructions? Specifically, what are the atomicity properties? (In reply to comment #12) > (In reply to comment #11) > > Are you fine with that? > > No .. I would prefer to do this without changing the core IR if at all > possible. What are the semantics of these instructions? Specifically, what > are the atomicity properties? I presumed you would prefer that, this is why the issue is still open. Here is a description of some of the instructions: #1 LADD - to load and decrement a doubleword of memory atomically (the 64-bit aligned address specified by register base is decremented after being read, and the register rd gets the memory read result. LAAD rd, (base), rt GPR[rd] = Mem[GPR[base]] // atomic Mem[GPR[base]] = Mem[GPR[base]] + GPR[rt] // atomic #2 SAAD - to atomically add a doubleword to a memory location SAAD rt, (base) memory[base] = memory[base] + rt // atomic (In reply to comment #13) > (In reply to comment #12) > > (In reply to comment #11) > > > Are you fine with that? > > > > No .. I would prefer to do this without changing the core IR if at all > > possible. What are the semantics of these instructions? Specifically, what > > are the atomicity properties? > > I presumed you would prefer that, this is why the issue is still open. > > Here is a description of some of the instructions: > > #1 > LADD - to load and decrement a doubleword of memory atomically (the 64-bit > aligned address specified by register base is decremented after being read, > and the register rd gets the memory read result. > > LAAD rd, (base), rt > GPR[rd] = Mem[GPR[base]] // atomic > Mem[GPR[base]] = Mem[GPR[base]] + GPR[rt] // atomic > s390 has insns like that as well. For instance "load and add" which reads: The second operand [memory] is added to the third operand [gpr], and the sum is placed at the second-operand loca- tion [memory]. Subsequently, the original contents of the sec- ond operand [memory] (prior to the addition) are placed unchanged at the first-operand location [gpr]. ... The fetch of the second operand [memory] for purposes of loading and the store into the second-operand loca- tion [memory] appear to be a block-concurrent interlocked- update reference as observed by other CPUs. I've implemented this using mkIRCAS. Function s390_irgen_load_and_add64 in guest_s390_toIR.c does the work. Perhaps you can use it for inspiration .... and tell me whether I got it right :) I remember it was a bit of a pain to do.. (In reply to comment #14) > (In reply to comment #13) > > (In reply to comment #12) > > > (In reply to comment #11) > > > > Are you fine with that? > > > > > > No .. I would prefer to do this without changing the core IR if at all > > > possible. What are the semantics of these instructions? Specifically, what > > > are the atomicity properties? > > > > I presumed you would prefer that, this is why the issue is still open. > > > > Here is a description of some of the instructions: > > > > #1 > > LADD - to load and decrement a doubleword of memory atomically (the 64-bit > > aligned address specified by register base is decremented after being read, > > and the register rd gets the memory read result. > > > > LAAD rd, (base), rt > > GPR[rd] = Mem[GPR[base]] // atomic > > Mem[GPR[base]] = Mem[GPR[base]] + GPR[rt] // atomic > > > > s390 has insns like that as well. For instance "load and add" which reads: > > The second operand [memory] is added to the third operand [gpr], > and the sum is placed at the second-operand loca- > tion [memory]. Subsequently, the original contents of the sec- > ond operand [memory] (prior to the addition) are placed > unchanged at the first-operand location [gpr]. > ... > The fetch of the second operand [memory] for purposes of > loading and the store into the second-operand loca- > tion [memory] appear to be a block-concurrent interlocked- > update reference as observed by other CPUs. > > I've implemented this using mkIRCAS. > Function s390_irgen_load_and_add64 in guest_s390_toIR.c does the work. > Perhaps you can use it for inspiration .... and tell me whether I got it > right :) > I remember it was a bit of a pain to do.. Thanks Florian for bringing it up. Yes, that is a better approach than the one in the proposed patch. Created attachment 86813 [details]
Cavium atomic instructions
Hello all,
this patch is adding support for Cavium atomic instructions
based on function s390_irgen_load_and_add64 in guest_s390_toIR.c
like Florian Krohm suggested.
Created attachment 86814 [details]
Cavium atomic tests
This patch is adding additional tests for Cavium atomic instructions.
Patches are committed in revision r13994 VEX r2865. This bug can be closed. 1#include <stdio.h> 2 3void saa() { 4 unsigned int rt = 0x11111111; 5 unsigned int out = 0, out_pre = 0; 6 unsigned int offset = 4; 7 unsigned int base[] = {0x22222222, 0x33333333}; 8 9 __asm__ volatile ( 10 "move $t0, %2" "\n\t" /* mem */ 11 "move $t1, %3" "\n\t" /* offset */ 12 "daddu $t0, $t1, $t0""\n\t" /* mem + offset */ 13 "lw %1, 0($t0)" "\n\t" /* out_pre = *(mem + offset) */ 14 "move $t2, %4" "\n\t" /* t2 = rt */ 15 "saa $t2, ($t0)" "\n\t" 16 "lw %0, 0($t0)" "\n\t" /* out = *(mem + offset) */ 17 : "=&r" (out), "=&r" (out_pre) 18 : "r" (base), "r" (offset), "r" (rt) 19 : "$12", "$13", "$14", "cc", "memory" 20 ); 21 22 printf("saa rt: %x out: %x out_pre: %x\n", rt, out, out_pre); 23} 24 25int main() { 26 saa(); 27 28 return 0; 29} The above saa instruction(Cavium Octeon specific) testcase produces following valgrind report. valgrind --track-origins=yes --quiet ./saa ==2587== Conditional jump or move depends on uninitialised value(s) ==2587== at 0x10000B4C: saa (saa.c:9) ==2587== by 0x10000BD8: main (saa.c:26) ==2587== saa rt: 11111111 out: 44444444 out_pre: 33333333 The following patch helps to get rid of the above valgrind error. --- a/priv/guest_mips_toIR.c +++ b/priv/guest_mips_toIR.c @@ -2203,7 +2203,10 @@ static void mips_irgen_load_and_add32(IRTemp op1addr, IRTemp n /* If old_mem contains the expected value, then the CAS succeeded. Otherwise, it did not */ - jump_back(binop(Iop_CmpNE32, mkexpr(old_mem), mkexpr(expd))); + jump_back(binop(Iop_CmpNE64, + mkWidenFrom32(Ity_I64, mkexpr(old_mem), True), + mkWidenFrom32(Ity_I64, mkexpr(expd), True))); + if (putIntoRd) putIReg(rd, mkWidenFrom32(Ity_I64, mkexpr(old_mem), True)); } I am not sure if this is the right fix, but for your perusal. Created attachment 88882 [details]
SAA testcase fix
Excerpt from cvm_atomic.stdout.exp-LE
saa :: value: 0x42b0c0a28677b502, memPre: 0x82d2ab1300000000, mem: 0x82d2ab138677b502
saa :: value: 0x9e705cc51ad8dca0, memPre: 0x7e876382d2ab13, mem: 0x7e87639dab87b3
saa :: value: 0x47f505569a08a180, memPre: 0xc31510f3007e8763, mem: 0xc31510f39a8728e3
...
saad :: value: 0x9e705cc51ad8dca0, memPre: 0x7e876382d2ab13, mem: 0x9eeee4289dab87b3
saad :: value: 0x94ff52fc81afa797, memPre: 0x976d6e9ac31510f3, mem: 0x2c6cc19744c4b88a
saad :: value: 0x3c2cd9a9cda20766, memPre: 0xb7746d775ad6a5fb, mem: 0xf3a147212878ad61
...
It is expected (value + mempre) == mem in both the cases of SAA and SAAD.
It can be observed that SAAD entries hold the condition while SAA entries don't.
On analysing cvm_atomic.c, I found that 64 bit inputs are provided to both SAA and SAAD instructions (macro TEST3). SAA fails to execute as expected if its input are 64 bit.
I have attached a patch to fix this. The patch addresses only SAA instruction. If the approach of the fix is accepted, I will post a proper patch.
The other 32 bit atomic instructions also suffer from the same issue and they also seem to require similar fix.
laa
law
lai
lad
las
lac
Another question:
Why is expected output file is named as cvm_atomic.stdout.exp-LE?
I assume LE is little endian. AFAIK, Octeon processors run as big endian.
--Maran
If the claims sound valid, I request to reopen the issue. Created attachment 89007 [details] cvm_atomic_thread 32 bit instructions(saa, laa, law, lai, lad, lac, las) are passed with 64 bit parameters. As mentioned in comment #20, SAA does not execute as expected when 64 bit integer parametrs are passed. Though the result of other 32 bit version of atomic instructions are not affected even when 64 bit integers are passed, I though its better to fix them (to pass 32 bit parameters to 32 bit instructions) as well. Also a superfluous memset has been removed. The patch demands an update to stdout.exp file as well. I will post it, if the changes of this patch are consented. (In reply to Maran Pakkirisamy from comment #19) > 1#include <stdio.h> > 2 > 3void saa() { > 4 unsigned int rt = 0x11111111; > 5 unsigned int out = 0, out_pre = 0; > 6 unsigned int offset = 4; > 7 unsigned int base[] = {0x22222222, 0x33333333}; > 8 > 9 __asm__ volatile ( > 10 "move $t0, %2" "\n\t" /* mem */ > 11 "move $t1, %3" "\n\t" /* offset */ > 12 "daddu $t0, $t1, $t0""\n\t" /* mem + offset */ > 13 "lw %1, 0($t0)" "\n\t" /* out_pre = *(mem + > offset) */ > 14 "move $t2, %4" "\n\t" /* t2 = rt */ > 15 "saa $t2, ($t0)" "\n\t" > 16 "lw %0, 0($t0)" "\n\t" /* out = *(mem + offset) > */ > 17 : "=&r" (out), "=&r" (out_pre) > 18 : "r" (base), "r" (offset), "r" (rt) > 19 : "$12", "$13", "$14", "cc", "memory" > 20 ); > 21 > 22 printf("saa rt: %x out: %x out_pre: %x\n", rt, out, out_pre); > 23} > 24 > 25int main() { > 26 saa(); > 27 > 28 return 0; > 29} > > The above saa instruction(Cavium Octeon specific) testcase produces > following valgrind report. > > valgrind --track-origins=yes --quiet ./saa > ==2587== Conditional jump or move depends on uninitialised value(s) > ==2587== at 0x10000B4C: saa (saa.c:9) > ==2587== by 0x10000BD8: main (saa.c:26) > ==2587== > saa rt: 11111111 out: 44444444 out_pre: 33333333 > > The following patch helps to get rid of the above valgrind error. > > --- a/priv/guest_mips_toIR.c > +++ b/priv/guest_mips_toIR.c > @@ -2203,7 +2203,10 @@ static void mips_irgen_load_and_add32(IRTemp op1addr, > IRTemp n > > /* If old_mem contains the expected value, then the CAS succeeded. > Otherwise, it did not */ > - jump_back(binop(Iop_CmpNE32, mkexpr(old_mem), mkexpr(expd))); > + jump_back(binop(Iop_CmpNE64, > + mkWidenFrom32(Ity_I64, mkexpr(old_mem), True), > + mkWidenFrom32(Ity_I64, mkexpr(expd), True))); > + > if (putIntoRd) > putIReg(rd, mkWidenFrom32(Ity_I64, mkexpr(old_mem), True)); > } > > I am not sure if this is the right fix, but for your perusal. This issue seems to be valid, but I do not think it is necessarily related to how saa is modeled, but rather what code is later generated in this case. Obviously, what you are seeing is a false warning and it seems to be related what code gets generated for Iop_CmpNE32 on MIPS64 platform. You should open a new issue for discussing this. (In reply to Maran Pakkirisamy from comment #20) > Created attachment 88882 [details] > SAA testcase fix > > Excerpt from cvm_atomic.stdout.exp-LE > > saa :: value: 0x42b0c0a28677b502, memPre: 0x82d2ab1300000000, mem: > 0x82d2ab138677b502 > saa :: value: 0x9e705cc51ad8dca0, memPre: 0x7e876382d2ab13, mem: > 0x7e87639dab87b3 > saa :: value: 0x47f505569a08a180, memPre: 0xc31510f3007e8763, mem: > 0xc31510f39a8728e3 > ... > saad :: value: 0x9e705cc51ad8dca0, memPre: 0x7e876382d2ab13, mem: > 0x9eeee4289dab87b3 > saad :: value: 0x94ff52fc81afa797, memPre: 0x976d6e9ac31510f3, mem: > 0x2c6cc19744c4b88a > saad :: value: 0x3c2cd9a9cda20766, memPre: 0xb7746d775ad6a5fb, mem: > 0xf3a147212878ad61 > ... > It is expected (value + mempre) == mem in both the cases of SAA and SAAD. > It can be observed that SAAD entries hold the condition while SAA entries > don't. > > On analysing cvm_atomic.c, I found that 64 bit inputs are provided to both > SAA and SAAD instructions (macro TEST3). SAA fails to execute as expected if > its input are 64 bit. > Correct me if I am wrong, but description of SAA says that it takes the least-significant 32-bit word and it adds it to the memory. So, 64-bit values are not a problem. > I have attached a patch to fix this. The patch addresses only SAA > instruction. If the approach of the fix is accepted, I will post a proper > patch. > I do not see any problem, so first we need to define a problem before we discuss a patch. Can you create a test which will give different output when executed under Valgrind? Please open a new issue for that. > The other 32 bit atomic instructions also suffer from the same issue and > they also seem to require similar fix. > laa > law > lai > lad > las > lac > > Another question: > Why is expected output file is named as cvm_atomic.stdout.exp-LE? > I assume LE is little endian. AFAIK, Octeon processors run as big endian. > LE is little endian indeed. No, Octeon processors are not BE-only. > This issue seems to be valid, but I do not think it is necessarily > related to how saa is modeled, but rather what code is later generated > in this case. Obviously, what you are seeing is a false warning and it > seems to be related what code gets generated for Iop_CmpNE32 on MIPS64 > platform. > You should open a new issue for discussing this. Ok. Created #341481 https://bugs.kde.org/show_bug.cgi?id=341481 (In reply to Petar Jovanovic from comment #24) > (In reply to Maran Pakkirisamy from comment #20) > Correct me if I am wrong, but description of SAA says that it takes the > least-significant 32-bit word and it adds it to the memory. > So, 64-bit values are not a problem. > > > I have attached a patch to fix this. The patch addresses only SAA > > instruction. If the approach of the fix is accepted, I will post a proper > > patch. > > > I do not see any problem, so first we need to define a problem before we > discuss a patch. Can you create a test which will give different output > when executed under Valgrind? Please open a new issue for that. > Sorry for not making it clear. The problem I claim is not with the valgrind implementation of SAA (and other 32 bit atomic instructions), but with the testcase. Let us take an SAA entry in the output file cvm_atomic.stdout.exp-LE saa :: value: 0x94ff52fc81afa797, memPre: 0x976d6e9ac31510f3, mem: 0x976d6e9a44c4b88a As mentioned already, each entry is supposed satisfy the condition mem == value + memPre # SAA adds "value" to old content(memPre) at a mem loc. and overwrites the mem loc. by the result(mem). You can observe that the above entry does not hold the condition. But SAA takes the second half of memPre (0xc31510f3 in little endian case) and the second half of value (0x81afa797) and replaces second half of mem. So, SAA functions as expected (with and without valgrind) but only that the output printed by the testcase does not make complete sense. The reason being 64bit datatype parameters are handled by the macro TEST3 for a 32 bit instruction. However, the problem in the testcase does not impact its validity, as the same output is going to be produced with and without valgrind since the arrays reg_val_double, reg_val_double_copy are already initialized. In big endian case, SAA takes the first half of memPre (0x976d6e9a) and the second half of value (0x81afa797) and replaces first half of mem. So, with the testcase as such, we will have to create a separate stdout.exp for BE - cvm_atomic.stdout.exp-BE. However, if the 32 bit instructions are handled properly with 32 bit datatypes, I think single stdout.exp will be sufficient for both BE and LE. I observe another problem with the testcase of both 32 bit and 64 bit instructions (Again it does not impact the validity of the testcase). From cvm_atomic.c, case SAAD: { /* Atomic Add Double - saad rt, (base). */ copy_reg_val_double(); for(j = 8; j < N; j += 8) TEST3("saad", j, reg_val_double_copy, reg_val_double[j]); break; } reg_val_double_copy and reg_val_double are both arrays of unsigned long long entries. The above TEST3's macro expansion can be expressed as mem[reg_val_double_copy + j] = reg_val_double[j] + mem[reg_val_double_copy + j] Due to pointer arithmetic, reg_val_double[j] becomes mem[reg_val_double + 8*j] I think the author's intention was to use jth element of both the arrays reg_val_double and reg_val_double_copy. But its not the case. |