SUMMARY (from private email) instruction addex [..] is fundamentally broken and does not generate and/or consume OV correctly. Tested with 3.13 and 3.14. STEPS TO REPRODUCE 1. 2. 3. OBSERVED RESULT EXPECTED RESULT SOFTWARE/OS VERSIONS Windows: MacOS: Linux/KDE Plasma: (available in About System) KDE Plasma Version: KDE Frameworks Version: Qt Version: ADDITIONAL INFORMATION
Some more details from private email(s). This was originally reported as an issue with GMP on power9. Here is some example code: File t.c: #include <stdio.h> unsigned long f(); int main () { unsigned long foo = f(); for (int i = 56; i >= 0; i -= 8) printf ("%c", (char) (foo >> i)); printf ("\n"); return 0; } File f.s: .globl f .type f, @function f: li 5, 0x6E63 addis 5, 5, 0x7269 li 6, 0x6C67 addis 6, 6, 0x5661 insrdi 5, 6, 32, 0 li 3, -1 subfo 0, 0, 0 # OV <- 0 addex 4, 3, 3, 0 # OV <- 1 addex 3, 3, 3, 0 # r3 <- 0xff..ff + 0xff..ff + 1 = 0xff..fe = -2 sub 3, 5, 3 blr $ gcc -std=c99 t.c f.s $ valgrind ./a.out Correct output is "Valgrind". Output under Valgrind: "Valgrine". There is actually a testcase for addex in valgrind none/tests/ppc64/test_isa_3_0.c. But it is commented out. And indeed, it fails when enabled. The issue seems to be that the XER OV flag is set wrongly. The code in calculate_XER_OV_64 () does: /* (argL^argR^-1) & (argL^res) & (1<<63) ? 1:0 */ // i.e. ((both_same_sign) & (sign_changed) & (sign_mask)) Which checks for two's complements overflow. But addex is akin to adde except that addex inputs carry from OV and outputs carry to OV, while adde of course uses CA. The meaning of OV for other instructions is signed overflow.
Created attachment 117217 [details] Fix addex instruction implementation https://code.wildebeest.org/git/user/mjw/valgrind/commit/?h=power9-addex
Carl reviewed and double checked API test results. Pushed to master. commit 2c1f016e634bf79faf45e81c14c955c711bc202f Author: Mark Wielaard <mark@klomp.org> Date: Mon Dec 31 22:26:31 2018 +0100 Bug 402519 - POWER 3.0 addex instruction incorrectly implemented addex uses OV as carry in and carry out. For all other instructions OV is the signed overflow flag. And instructions like adde use CA as carry. Replace set_XER_OV_OV32 with set_XER_OV_OV32_ADDEX, which will call calculate_XER_CA_64 and calculate_XER_CA_32, but with OV as input, and sets OV and OV32. Enable test_addex in none/tests/ppc64/test_isa_3_0.c and update the expected output. test_addex would fail to match the expected output before this patch.