Bug 327223

Summary:	Support for Cavium MIPS Octeon Atomic and Count Instructions
Product:	[Developer tools] valgrind	Reporter:	Dr. Zahid Anwar, School of Electrical Engg. & Computer Science, National Univ. of Sciences & Technology, Islamabad, Pakistan <zahid.anwar>
Component:	vex	Assignee:	Julian Seward <jseward>
Status:	RESOLVED FIXED
Severity:	normal	CC:	dejanjevtic87, florian, maran.pakkirisamy, mips32r2
Priority:	NOR
Version:	3.9.0
Target Milestone:	---
Platform:	Debian stable
OS:	Linux
Latest Commit:		Version Fixed In:
Attachments:	Add support cavium octeon Atomic and count instructions Test Cases Output File Adds support for CVM MIPS atomic and count instructions Test Cases VGTEST file Cavium atomic instructions Cavium atomic tests SAA testcase fix cvm_atomic_thread

Description Dr. Zahid Anwar, School of Electrical Engg. & Computer Science, National Univ. of Sciences & Technology, Islamabad, Pakistan 2013-11-06 13:12:34 UTC

The MIPS port for valgrind does not have support for the following Cavium Octeon specific instructions: 

-Baddu
-pop
-dpop
-saa
-saad
-laa
-laad
-lai
-laid
-lad
-ladd
-law
-lawd
-las
-lasd
-lac
-lacd

Reproducible: Always

Steps to Reproduce:
1.Executing valgrind with any test case having these instructions on an Octeon Processor using Cavium SDK 3.0 will produce this error.
Actual Results:  
==10857== Memcheck, a memory error detector
==10857== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==10857== Using Valgrind-3.9.0.SVN and LibVEX; rerun with -h for copyright info
==10857== Command: ../atomic
==10857== 
vex mips->IR: unhandled instruction bytes: 0x71 0xC0 0x78 0x2C
==10857== valgrind: Unrecognised instruction at address 0x1200000e4.
==10857==    at 0x1200000E4: __start (ldx.s:14)
==10857==    by 0xFFF000A78: ???
==10857== Your program just tried to execute an instruction that Valgrind
==10857== did not recognise.  There are two possible reasons for this.
==10857== 1. Your program has a bug and erroneously jumped to a non-code
==10857==    location.  If you are running Memcheck and you just saw a
==10857==    warning about a bad jump, it's probably your program's fault.
==10857== 2. The instruction is legitimate but Valgrind doesn't handle it,
==10857==    i.e. it's Valgrind's fault.  If you think this is the case or
==10857==    you are not sure, please let us know and we'll try to fix it.
==10857== Either way, Valgrind will now raise a SIGILL signal which will
==10857== probably kill your program.
==10857== 
==10857== Process terminating with default action of signal 4 (SIGILL)
==10857==  Illegal opcode at address 0x1200000E4
==10857==    at 0x1200000E4: __start (ldx.s:14)
==10857==    by 0xFFF000A78: ???
==10857== 
==10857== HEAP SUMMARY:
==10857==     in use at exit: 0 bytes in 0 blocks
==10857==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==10857== 
==10857== All heap blocks were freed -- no leaks are possible
==10857== 
==10857== For counts of detected and suppressed errors, rerun with: -v
==10857== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
./vg-in-place: line 31: 10857 Illegal instruction     VALGRIND_LIB="$vgbasedir/.in_place" VALGRIND_LIB_INNER="$vgbasedir/.in_place" "$vgbasedir/coregrind/valgrind" "$@"


Expected Results:  
==12504== Memcheck, a memory error detector
==12504== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==12504== Using Valgrind-3.9.0.SVN and LibVEX; rerun with -h for copyright info
==12504== Command: ../atomic
==12504== 
==12504== 
==12504== HEAP SUMMARY:
==12504==     in use at exit: 0 bytes in 0 blocks
==12504==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==12504== 
==12504== All heap blocks were freed -- no leaks are possible
==12504== 
==12504== For counts of detected and suppressed errors, rerun with: -v
==12504== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Comment 1 Dr. Zahid Anwar, School of Electrical Engg. & Computer Science, National Univ. of Sciences & Technology, Islamabad, Pakistan 2013-11-06 13:21:26 UTC

Created attachment 83370 [details]
Add support cavium octeon Atomic and count instructions

******************  PRE-REQUISITES   *****************

----Testing Enviroment specification:-----
																			
SDK             3.0 with new tool chain																		
Tool-chain	tool-121004																		
gcc version	4.7																		
glibc version	2.11																		
kernel version	3.4.27-rt37-Cavium-Octeon																		
cpu info	Cavium Octeon II V0.1																		
MotherBoard	OCTEON_CN63XX																		

valgrind 3.9.0 	r13667 release 13667
command to checkout: svn co -r 13667 svn://svn.valgrind.org/valgrind/trunk valgrind

******************  Applying patch   *****************

Note: It is assumed that the previous Patch for Bug #:326444 (Link: https://bugs.kde.org/show_bug.cgi?id=326444) has been already applied.

copy patch file to valgrind folder and following command should be run to apply patch.
run patch -p1 -i valgrind_r13667_6_Nov_2013.patch 

******************  Files patched   *****************
patch file VEX/priv/guest_mips_toIR.c
patch file VEX/priv/host_mips_defs.c
patch file VEX/priv/host_mips_defs.h
patch file VEX/priv/host_mips_isel.c
patch file VEX/priv/ir_defs.c
patch file VEX/priv/ir_opt.c
patch file VEX/priv/main_main.c
patch file VEX/pub/libvex_ir.h

********  Instructions added (Cavium specific) ******
- BADDU
- POP
- DPOP
- SAA
- SAAD
- LAA
- LAAD
- LAW
- LAWD
- LAI
- LAID
- LAS
- LASD
- LAD
- LADD
- LAC
- LACD

Comment 2 Dr. Zahid Anwar, School of Electrical Engg. & Computer Science, National Univ. of Sciences & Technology, Islamabad, Pakistan 2013-11-06 13:25:05 UTC

Created attachment 83373 [details]
Test Cases

Comment 3 Dr. Zahid Anwar, School of Electrical Engg. & Computer Science, National Univ. of Sciences & Technology, Islamabad, Pakistan 2013-11-06 13:26:11 UTC

Created attachment 83374 [details]
Output File

Comment 4 Petar Jovanovic 2013-11-16 04:42:21 UTC

Can you please address all issues mentioned in #326444 that are applicable
to this change?

Further, can you add some comments to cvm_atomic.c - e.g. what does each macro
do and such?

vgtest file seems to be missing.

Comment 5 Dr. Zahid Anwar, School of Electrical Engg. & Computer Science, National Univ. of Sciences & Technology, Islamabad, Pakistan 2013-12-11 14:08:23 UTC

(In reply to comment #4)
> Can you please address all issues mentioned in #326444 that are applicable
> to this change?
> 
Patch has been updated to address all applicable issues in #326444 which were 1) valgrind code style conformance and 2) rebase with ToT (revision 13752)

> Further, can you add some comments to cvm_atomic.c - e.g. what does each
> macro
> do and such?
> 
> vgtest file seems to be missing.
Detailed comments and vg_test file have been added.

Comment 6 Dr. Zahid Anwar, School of Electrical Engg. & Computer Science, National Univ. of Sciences & Technology, Islamabad, Pakistan 2013-12-11 14:11:58 UTC

Created attachment 84043 [details]
Adds support for CVM MIPS atomic and count instructions

******************  PRE-REQUISITES   *****************

----Testing Enviroment specification:-----
																			
SDK             3.0 with new tool chain																		
Tool-chain	tool-121004																		
gcc version	4.7																		
glibc version	2.16																		
kernel version	3.4.27-rt37-Cavium-Octeon																		
cpu info	Cavium Octeon II V0.1																		
MotherBoard	OCTEON_CN63XX																		

valgrind 3.10.0 	r13752 release 13752
command to checkout: svn co -r 13752 svn://svn.valgrind.org/valgrind/trunk valgrind

******************  Applying patch   *****************
Note: It is assumed that the previous Patch for Bug #:326444 (Link: https://bugs.kde.org/show_bug.cgi?id=326444) has been already applied.

copy patch file to valgrind folder and following command should be run to apply patch.
run patch -p2 -i valgrind_bug327223_10_Dec_2013.patch 

******************  Files patched   *****************
patched file memcheck/mc_translate.c
patched file none/tests/mips64/Makefile.am
patched file VEX/priv/guest_mips_toIR.c
patched file VEX/priv/host_mips_defs.c
patched file VEX/priv/host_mips_defs.h
patched file VEX/priv/host_mips_isel.c
patched file VEX/priv/ir_defs.c
patched file VEX/priv/ir_opt.c
patched file VEX/priv/main_main.c
patched file VEX/pub/libvex_ir.h

********  Instructions added (CVM MIPS) ******
- BADDU
- POP
- DPOP
- SAA
- SAAD
- LAA
- LAAD
- LAW
- LAWD
- LAI
- LAID
- LAS
- LASD
- LAD
- LADD
- LAC
- LACD

Comment 7 Dr. Zahid Anwar, School of Electrical Engg. & Computer Science, National Univ. of Sciences & Technology, Islamabad, Pakistan 2013-12-11 14:15:00 UTC

Created attachment 84044 [details]
Test Cases

Comment 8 Dr. Zahid Anwar, School of Electrical Engg. & Computer Science, National Univ. of Sciences & Technology, Islamabad, Pakistan 2013-12-11 14:15:27 UTC

Created attachment 84045 [details]
VGTEST file

Comment 9 Dejan Jevtic 2014-02-05 16:03:41 UTC

@Dr. Zahid Anwar

Did you try to implement Cavium MIPS Octeon special atomic instructions by using IRStmt_LLSC ?

Comment 10 Julian Seward 2014-05-09 12:23:24 UTC

Petar, Dejan, what's the status on this?

Comment 11 Petar Jovanovic 2014-05-09 13:23:23 UTC

(In reply to comment #10)
> Petar, Dejan, what's the status on this?

This patch seems fine and can be applied after some cleanup, but it would
add new Iops that likely necessary:

       Iop_LoadAtomicAdd,
       Iop_LoadAtomicSwap,
       Iop_LoadAtomicInc,
       Iop_LoadAtomicDec,
       Iop_LoadAtomicSet,
       Iop_LoadAtomicClr,


Are you fine with that?

Comment 12 Julian Seward 2014-05-12 13:58:18 UTC

(In reply to comment #11)
> Are you fine with that?

No .. I would prefer to do this without changing the core IR if at all
possible.  What are the semantics of these instructions? Specifically, what
are the atomicity properties?

Comment 13 Petar Jovanovic 2014-05-12 17:52:02 UTC

(In reply to comment #12)
> (In reply to comment #11)
> > Are you fine with that?
> 
> No .. I would prefer to do this without changing the core IR if at all
> possible.  What are the semantics of these instructions? Specifically, what
> are the atomicity properties?

I presumed you would prefer that, this is why the issue is still open.

Here is a description of some of the instructions:

#1
LADD - to load and decrement a doubleword of memory atomically (the 64-bit aligned address specified by register base is decremented after being read, and the register rd gets the memory read result.

LAAD rd, (base), rt
GPR[rd] = Mem[GPR[base]]  // atomic
Mem[GPR[base]] = Mem[GPR[base]] + GPR[rt] // atomic

#2
SAAD - to atomically add a doubleword to a memory location
SAAD rt, (base)
memory[base] = memory[base] + rt // atomic

Comment 14 Florian Krohm 2014-05-12 18:54:42 UTC

(In reply to comment #13)
> (In reply to comment #12)
> > (In reply to comment #11)
> > > Are you fine with that?
> > 
> > No .. I would prefer to do this without changing the core IR if at all
> > possible.  What are the semantics of these instructions? Specifically, what
> > are the atomicity properties?
> 
> I presumed you would prefer that, this is why the issue is still open.
> 
> Here is a description of some of the instructions:
> 
> #1
> LADD - to load and decrement a doubleword of memory atomically (the 64-bit
> aligned address specified by register base is decremented after being read,
> and the register rd gets the memory read result.
> 
> LAAD rd, (base), rt
> GPR[rd] = Mem[GPR[base]]  // atomic
> Mem[GPR[base]] = Mem[GPR[base]] + GPR[rt] // atomic
> 

s390 has insns like that as well. For instance "load and add" which reads:

The second operand  [memory] is added to the third operand [gpr],
and the sum is placed at the second-operand loca-
tion [memory]. Subsequently, the original contents of the sec-
ond operand [memory] (prior to the addition) are placed
unchanged at the first-operand location [gpr].
...
The fetch of the second operand [memory] for purposes of
loading and the store into the second-operand loca-
tion [memory] appear to be a block-concurrent interlocked-
update reference as observed by other CPUs. 

I've implemented this using mkIRCAS.
Function s390_irgen_load_and_add64 in guest_s390_toIR.c does the work.
Perhaps you can use it for inspiration .... and tell me whether I got it right :)
I remember it was a bit of a pain to do..

Comment 15 Petar Jovanovic 2014-05-23 15:36:05 UTC

(In reply to comment #14)
> (In reply to comment #13)
> > (In reply to comment #12)
> > > (In reply to comment #11)
> > > > Are you fine with that?
> > > 
> > > No .. I would prefer to do this without changing the core IR if at all
> > > possible.  What are the semantics of these instructions? Specifically, what
> > > are the atomicity properties?
> > 
> > I presumed you would prefer that, this is why the issue is still open.
> > 
> > Here is a description of some of the instructions:
> > 
> > #1
> > LADD - to load and decrement a doubleword of memory atomically (the 64-bit
> > aligned address specified by register base is decremented after being read,
> > and the register rd gets the memory read result.
> > 
> > LAAD rd, (base), rt
> > GPR[rd] = Mem[GPR[base]]  // atomic
> > Mem[GPR[base]] = Mem[GPR[base]] + GPR[rt] // atomic
> > 
> 
> s390 has insns like that as well. For instance "load and add" which reads:
> 
> The second operand  [memory] is added to the third operand [gpr],
> and the sum is placed at the second-operand loca-
> tion [memory]. Subsequently, the original contents of the sec-
> ond operand [memory] (prior to the addition) are placed
> unchanged at the first-operand location [gpr].
> ...
> The fetch of the second operand [memory] for purposes of
> loading and the store into the second-operand loca-
> tion [memory] appear to be a block-concurrent interlocked-
> update reference as observed by other CPUs. 
> 
> I've implemented this using mkIRCAS.
> Function s390_irgen_load_and_add64 in guest_s390_toIR.c does the work.
> Perhaps you can use it for inspiration .... and tell me whether I got it
> right :)
> I remember it was a bit of a pain to do..

Thanks Florian for bringing it up.
Yes, that is a better approach than the one in the proposed patch.

Comment 16 Dejan Jevtic 2014-05-25 11:07:58 UTC

Created attachment 86813 [details]
Cavium atomic instructions

Hello all,
this patch is adding support for Cavium atomic instructions
based on function s390_irgen_load_and_add64 in guest_s390_toIR.c
like Florian Krohm suggested.

Comment 17 Dejan Jevtic 2014-05-25 11:09:02 UTC

Created attachment 86814 [details]
Cavium atomic tests

This patch is adding additional tests for Cavium atomic instructions.

Comment 18 Dejan Jevtic 2014-06-04 11:41:08 UTC

Patches are committed in revision r13994 VEX r2865.
This bug can be closed.

Comment 19 Maran Pakkirisamy 2014-09-29 09:36:36 UTC

 1#include <stdio.h>
 2
 3void saa() {
 4   unsigned int rt = 0x11111111;
 5   unsigned int out = 0, out_pre = 0;
 6   unsigned int offset = 4;
 7   unsigned int base[] = {0x22222222, 0x33333333};
 8
 9  __asm__ volatile (
10                    "move $t0, %2"       "\n\t"  /* mem */
11                    "move $t1, %3"       "\n\t"  /* offset */
12                    "daddu $t0, $t1, $t0""\n\t"  /* mem + offset */
13                    "lw %1, 0($t0)"      "\n\t"  /* out_pre = *(mem + offset) */
14                    "move $t2, %4"       "\n\t"  /* t2 = rt  */
15                    "saa $t2, ($t0)"     "\n\t"
16                    "lw %0, 0($t0)"      "\n\t"  /* out = *(mem + offset) */
17                    : "=&r" (out), "=&r" (out_pre)
18                    : "r" (base), "r" (offset), "r" (rt)
19                    : "$12", "$13", "$14", "cc", "memory"
20                    );
21
22  printf("saa rt: %x out: %x out_pre: %x\n", rt, out, out_pre);
23}
24
25int main() {
26  saa();
27
28  return 0;
29}

The above saa instruction(Cavium Octeon specific) testcase produces following valgrind report.

valgrind --track-origins=yes --quiet ./saa
==2587== Conditional jump or move depends on uninitialised value(s)
==2587==    at 0x10000B4C: saa (saa.c:9)
==2587==    by 0x10000BD8: main (saa.c:26)
==2587== 
saa rt: 11111111 out: 44444444 out_pre: 33333333

The following patch helps to get rid of the above valgrind error.

--- a/priv/guest_mips_toIR.c
+++ b/priv/guest_mips_toIR.c
@@ -2203,7 +2203,10 @@ static void mips_irgen_load_and_add32(IRTemp op1addr, IRTemp n
 
    /* If old_mem contains the expected value, then the CAS succeeded.
       Otherwise, it did not */
-   jump_back(binop(Iop_CmpNE32, mkexpr(old_mem), mkexpr(expd)));
+   jump_back(binop(Iop_CmpNE64,
+                   mkWidenFrom32(Ity_I64, mkexpr(old_mem), True),
+                   mkWidenFrom32(Ity_I64, mkexpr(expd), True)));
+                   
    if (putIntoRd)
       putIReg(rd, mkWidenFrom32(Ity_I64, mkexpr(old_mem), True));
 }

I am not sure if this is the right fix, but for your perusal.

Comment 20 Maran Pakkirisamy 2014-09-29 10:45:18 UTC

Created attachment 88882 [details]
SAA testcase fix

Excerpt from cvm_atomic.stdout.exp-LE

saa :: value: 0x42b0c0a28677b502, memPre: 0x82d2ab1300000000, mem: 0x82d2ab138677b502
saa :: value: 0x9e705cc51ad8dca0, memPre: 0x7e876382d2ab13, mem: 0x7e87639dab87b3
saa :: value: 0x47f505569a08a180, memPre: 0xc31510f3007e8763, mem: 0xc31510f39a8728e3
...
saad :: value: 0x9e705cc51ad8dca0, memPre: 0x7e876382d2ab13, mem: 0x9eeee4289dab87b3
saad :: value: 0x94ff52fc81afa797, memPre: 0x976d6e9ac31510f3, mem: 0x2c6cc19744c4b88a
saad :: value: 0x3c2cd9a9cda20766, memPre: 0xb7746d775ad6a5fb, mem: 0xf3a147212878ad61
...
It is expected (value + mempre) == mem in both the cases of SAA and SAAD.
It can be observed that SAAD entries hold the condition while SAA entries don't.

On analysing cvm_atomic.c, I found that 64 bit inputs are provided to both SAA and SAAD instructions (macro TEST3). SAA fails to execute as expected if its input are 64 bit.

I have attached a patch to fix this. The patch addresses only SAA instruction. If the approach of the fix is accepted, I will post a proper patch. 

The other 32 bit atomic instructions also suffer from the same issue and they also seem to require similar fix.
laa
law
lai
lad
las
lac

Another question:
Why is expected output file is named as cvm_atomic.stdout.exp-LE?
I assume LE is little endian. AFAIK, Octeon processors run as big endian.

--Maran

Comment 21 Maran Pakkirisamy 2014-09-29 10:47:00 UTC

If the claims sound valid, I request to reopen the issue.

Comment 22 Maran Pakkirisamy 2014-10-07 10:30:38 UTC

Created attachment 89007 [details]
cvm_atomic_thread

32 bit instructions(saa, laa, law, lai, lad, lac, las) are passed with 64 bit parameters. As mentioned in comment #20, SAA does not execute as expected when 64 bit integer parametrs are passed. Though the result of other 32 bit version of atomic instructions are not affected even when 64 bit integers are passed, I though its better to fix them (to pass 32 bit parameters to 32 bit instructions) as well.

Also a superfluous memset has been removed.

The patch demands an update to stdout.exp file as well. I will post it, if the changes of this patch are consented.

Comment 23 Petar Jovanovic 2014-11-26 16:50:59 UTC

(In reply to Maran Pakkirisamy from comment #19)
>  1#include <stdio.h>
>  2
>  3void saa() {
>  4   unsigned int rt = 0x11111111;
>  5   unsigned int out = 0, out_pre = 0;
>  6   unsigned int offset = 4;
>  7   unsigned int base[] = {0x22222222, 0x33333333};
>  8
>  9  __asm__ volatile (
> 10                    "move $t0, %2"       "\n\t"  /* mem */
> 11                    "move $t1, %3"       "\n\t"  /* offset */
> 12                    "daddu $t0, $t1, $t0""\n\t"  /* mem + offset */
> 13                    "lw %1, 0($t0)"      "\n\t"  /* out_pre = *(mem +
> offset) */
> 14                    "move $t2, %4"       "\n\t"  /* t2 = rt  */
> 15                    "saa $t2, ($t0)"     "\n\t"
> 16                    "lw %0, 0($t0)"      "\n\t"  /* out = *(mem + offset)
> */
> 17                    : "=&r" (out), "=&r" (out_pre)
> 18                    : "r" (base), "r" (offset), "r" (rt)
> 19                    : "$12", "$13", "$14", "cc", "memory"
> 20                    );
> 21
> 22  printf("saa rt: %x out: %x out_pre: %x\n", rt, out, out_pre);
> 23}
> 24
> 25int main() {
> 26  saa();
> 27
> 28  return 0;
> 29}
> 
> The above saa instruction(Cavium Octeon specific) testcase produces
> following valgrind report.
> 
> valgrind --track-origins=yes --quiet ./saa
> ==2587== Conditional jump or move depends on uninitialised value(s)
> ==2587==    at 0x10000B4C: saa (saa.c:9)
> ==2587==    by 0x10000BD8: main (saa.c:26)
> ==2587== 
> saa rt: 11111111 out: 44444444 out_pre: 33333333
> 
> The following patch helps to get rid of the above valgrind error.
> 
> --- a/priv/guest_mips_toIR.c
> +++ b/priv/guest_mips_toIR.c
> @@ -2203,7 +2203,10 @@ static void mips_irgen_load_and_add32(IRTemp op1addr,
> IRTemp n
>  
>     /* If old_mem contains the expected value, then the CAS succeeded.
>        Otherwise, it did not */
> -   jump_back(binop(Iop_CmpNE32, mkexpr(old_mem), mkexpr(expd)));
> +   jump_back(binop(Iop_CmpNE64,
> +                   mkWidenFrom32(Ity_I64, mkexpr(old_mem), True),
> +                   mkWidenFrom32(Ity_I64, mkexpr(expd), True)));
> +                   
>     if (putIntoRd)
>        putIReg(rd, mkWidenFrom32(Ity_I64, mkexpr(old_mem), True));
>  }
> 
> I am not sure if this is the right fix, but for your perusal.

This issue seems to be valid, but I do not think it is necessarily
related to how saa is modeled, but rather what code is later generated
in this case. Obviously, what you are seeing is a false warning and it
seems to be related what code gets generated for Iop_CmpNE32 on MIPS64
platform.
You should open a new issue for discussing this.

Comment 24 Petar Jovanovic 2014-11-26 18:27:03 UTC

(In reply to Maran Pakkirisamy from comment #20)
> Created attachment 88882 [details]
> SAA testcase fix
> 
> Excerpt from cvm_atomic.stdout.exp-LE
> 
> saa :: value: 0x42b0c0a28677b502, memPre: 0x82d2ab1300000000, mem:
> 0x82d2ab138677b502
> saa :: value: 0x9e705cc51ad8dca0, memPre: 0x7e876382d2ab13, mem:
> 0x7e87639dab87b3
> saa :: value: 0x47f505569a08a180, memPre: 0xc31510f3007e8763, mem:
> 0xc31510f39a8728e3
> ...
> saad :: value: 0x9e705cc51ad8dca0, memPre: 0x7e876382d2ab13, mem:
> 0x9eeee4289dab87b3
> saad :: value: 0x94ff52fc81afa797, memPre: 0x976d6e9ac31510f3, mem:
> 0x2c6cc19744c4b88a
> saad :: value: 0x3c2cd9a9cda20766, memPre: 0xb7746d775ad6a5fb, mem:
> 0xf3a147212878ad61
> ...
> It is expected (value + mempre) == mem in both the cases of SAA and SAAD.
> It can be observed that SAAD entries hold the condition while SAA entries
> don't.
> 
> On analysing cvm_atomic.c, I found that 64 bit inputs are provided to both
> SAA and SAAD instructions (macro TEST3). SAA fails to execute as expected if
> its input are 64 bit.
> 
Correct me if I am wrong, but description of SAA says that it takes the
least-significant 32-bit word and it adds it to the memory.
So, 64-bit values are not a problem.

> I have attached a patch to fix this. The patch addresses only SAA
> instruction. If the approach of the fix is accepted, I will post a proper
> patch. 
> 
I do not see any problem, so first we need to define a problem before we
discuss a patch. Can you create a test which will give different output
when executed under Valgrind? Please open a new issue for that.

> The other 32 bit atomic instructions also suffer from the same issue and
> they also seem to require similar fix.
> laa
> law
> lai
> lad
> las
> lac
> 
> Another question:
> Why is expected output file is named as cvm_atomic.stdout.exp-LE?
> I assume LE is little endian. AFAIK, Octeon processors run as big endian.
> 
LE is little endian indeed. No, Octeon processors are not BE-only.

Comment 25 Maran Pakkirisamy 2014-12-02 09:12:03 UTC

> This issue seems to be valid, but I do not think it is necessarily
> related to how saa is modeled, but rather what code is later generated
> in this case. Obviously, what you are seeing is a false warning and it
> seems to be related what code gets generated for Iop_CmpNE32 on MIPS64
> platform.
> You should open a new issue for discussing this.

Ok. Created #341481
https://bugs.kde.org/show_bug.cgi?id=341481

Comment 26 Maran Pakkirisamy 2014-12-03 13:20:09 UTC

(In reply to Petar Jovanovic from comment #24)
> (In reply to Maran Pakkirisamy from comment #20)
> Correct me if I am wrong, but description of SAA says that it takes the
> least-significant 32-bit word and it adds it to the memory.
> So, 64-bit values are not a problem.
> 
> > I have attached a patch to fix this. The patch addresses only SAA
> > instruction. If the approach of the fix is accepted, I will post a proper
> > patch. 
> > 
> I do not see any problem, so first we need to define a problem before we
> discuss a patch. Can you create a test which will give different output
> when executed under Valgrind? Please open a new issue for that.
> 

Sorry for not making it clear. The problem I claim is not with the valgrind implementation of SAA (and other 32 bit atomic instructions), but with the testcase.

Let us take an SAA entry in the output file cvm_atomic.stdout.exp-LE
saa :: value: 0x94ff52fc81afa797, memPre: 0x976d6e9ac31510f3, mem: 0x976d6e9a44c4b88a
As mentioned already, each entry is supposed satisfy the condition
mem == value + memPre # SAA adds "value" to old content(memPre) at a mem loc. and overwrites  the mem loc. by the result(mem).

You can observe that the above entry does not hold the condition.
But SAA takes the second half of memPre (0xc31510f3 in little endian case) and  the second half of value (0x81afa797) and replaces second half of mem.

So, SAA functions as expected (with and without valgrind) but only that the output printed by the testcase does not make complete sense. The reason being 64bit datatype parameters are handled by the macro TEST3 for a 32 bit instruction.

However, the problem in the testcase does not impact its validity, as the same output is going to be produced with and without valgrind since the arrays reg_val_double, reg_val_double_copy are already initialized.

In big endian case, SAA takes the first half of memPre (0x976d6e9a) and  the second half of value (0x81afa797) and replaces first half of mem.

So, with the testcase as such, we will have to create a separate stdout.exp for BE - cvm_atomic.stdout.exp-BE.
However, if the 32 bit instructions are handled properly with 32 bit datatypes, I think single stdout.exp will be sufficient for both BE and LE.

I observe another problem with the testcase of both 32 bit and 64 bit instructions (Again it does not impact the validity of the testcase).
From cvm_atomic.c,
         case SAAD: {  /* Atomic Add Double - saad rt, (base). */
            copy_reg_val_double();
            for(j = 8; j < N; j += 8)
               TEST3("saad", j, reg_val_double_copy, reg_val_double[j]);
            break;
         }

reg_val_double_copy and reg_val_double are both arrays of unsigned long long entries.
The above TEST3's macro expansion can be expressed as
mem[reg_val_double_copy + j] = reg_val_double[j] + mem[reg_val_double_copy + j] 

Due to pointer arithmetic, reg_val_double[j] becomes mem[reg_val_double + 8*j]
I think the author's intention was to use jth element of both the arrays reg_val_double and reg_val_double_copy. But its not the case.