Bug 402519 - POWER 3.0 addex instruction incorrectly implemented
Summary: POWER 3.0 addex instruction incorrectly implemented
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: vex (show other bugs)
Version: 3.15 SVN
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Mark Wielaard
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-12-24 09:41 UTC by Julian Seward
Modified: 2019-01-04 12:39 UTC (History)
2 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
Fix addex instruction implementation (11.80 KB, text/plain)
2018-12-31 21:45 UTC, Mark Wielaard
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Julian Seward 2018-12-24 09:41:49 UTC
SUMMARY

(from private email)

instruction addex [..] is fundamentally broken and does not generate and/or consume OV correctly.  Tested with 3.13 and 3.14.

STEPS TO REPRODUCE
1. 
2. 
3. 

OBSERVED RESULT


EXPECTED RESULT


SOFTWARE/OS VERSIONS
Windows: 
MacOS: 
Linux/KDE Plasma: 
(available in About System)
KDE Plasma Version: 
KDE Frameworks Version: 
Qt Version: 

ADDITIONAL INFORMATION
Comment 1 Mark Wielaard 2018-12-31 21:04:48 UTC
Some more details from private email(s).

This was originally reported as an issue with GMP on power9.

Here is some example code:

File t.c:

#include <stdio.h>
unsigned long f();
int
main ()
{
  unsigned long foo = f();
  for (int i = 56; i >= 0; i -= 8)
    printf ("%c", (char) (foo >> i));
  printf ("\n");
  return 0;
}

File f.s:

        .globl  f
        .type   f, @function
f:      li      5, 0x6E63
        addis   5, 5, 0x7269
        li      6, 0x6C67
        addis   6, 6, 0x5661
        insrdi  5, 6, 32, 0

        li      3, -1
        subfo   0, 0, 0         # OV <- 0
        addex   4, 3, 3, 0      # OV <- 1
        addex   3, 3, 3, 0      # r3 <- 0xff..ff + 0xff..ff + 1 = 0xff..fe = -2

        sub     3, 5, 3
        blr

$ gcc -std=c99 t.c f.s
$ valgrind ./a.out

Correct output is "Valgrind".
Output under Valgrind: "Valgrine".

There is actually a testcase for addex in valgrind
none/tests/ppc64/test_isa_3_0.c. But it is commented out.
And indeed, it fails when enabled.

The issue seems to be that the XER OV flag is set wrongly.
The code in calculate_XER_OV_64 () does:

       /* (argL^argR^-1) & (argL^res) & (1<<63)  ? 1:0 */
       // i.e. ((both_same_sign) & (sign_changed) & (sign_mask))

Which checks for two's complements overflow.

But addex is akin to adde except that addex inputs carry from OV and outputs
carry to OV, while adde of course uses CA.

The meaning of OV for other instructions is signed overflow.
Comment 2 Mark Wielaard 2018-12-31 21:45:43 UTC
Created attachment 117217 [details]
Fix addex instruction implementation

https://code.wildebeest.org/git/user/mjw/valgrind/commit/?h=power9-addex
Comment 3 Mark Wielaard 2019-01-04 12:39:18 UTC
Carl reviewed and double checked API test results.
Pushed to master.

commit 2c1f016e634bf79faf45e81c14c955c711bc202f
Author: Mark Wielaard <mark@klomp.org>
Date:   Mon Dec 31 22:26:31 2018 +0100

    Bug 402519 - POWER 3.0 addex instruction incorrectly implemented
    
    addex uses OV as carry in and carry out. For all other instructions
    OV is the signed overflow flag. And instructions like adde use CA
    as carry.
    
    Replace set_XER_OV_OV32 with set_XER_OV_OV32_ADDEX, which will
    call calculate_XER_CA_64 and calculate_XER_CA_32, but with OV
    as input, and sets OV and OV32.
    
    Enable test_addex in none/tests/ppc64/test_isa_3_0.c and update
    the expected output. test_addex would fail to match the expected
    output before this patch.