Created attachment 107934 [details] bug.c When I run the simple program obtained from the source file bug.c in attachment, I obtain different results depending on whether it is run under valgrind or not. When it is not run under valgrind, I obtain the expected values (lcm (2^i, 42)), when it is run under valgrind, the values obtained are false for large i. I suspect that for large value of i, 2^i exceed a threshold in the GMP library code and that some asm code is used. It may be that the asm code in not correctly emulate by valgrind. How to reproduce the code: GMP version: 6.1.2 (configured and compiled with gmp-mparam.h -> mpn/x86_64/skylake/gmp-mparam.h) $ gcc -o bug bug.c -lgmp -lm $ ./bug > normal $ valgrind ./bug > valgrind $ cmp normal valgrind normal valgrind differ: byte 339003, line 1469
I'm seeing a similar problem, was about to file a new bug, but instead commenting here: I'm trying to run gmp test under valgrind. Earlier versions complained that it didn't recognize all instructions, but after upgrading to the valgrind package in debian stable valgrind runs without complaints. My machine has a x86_64 broadwell cpu ("Core i3-5010U") with adx and bmi2 extensions. I'm using gmp master, with the mul_basecase code from https://gmplib.org/repo/gmp/file/tip/mpn/x86_64/coreibwl/mul_basecase.asm, using all of mulx, adcx and adox. I will also attach a disassembly of the corresponding object file. Consider the following test program: #include <stdio.h> #include <gmp.h> int main (int argc, char **argv) { mpz_t a, b, c; mpz_inits(a, b, c, NULL); mpz_set_str (a, "ffffffffffffffffffffff00000007ffffffffffffff00000000000000fffffffffffffffffffff", 16); mpz_set_str (b, "1fffffffffffffffffffffffffffffffffffffffe00007ffffffff", 16); mpz_mul(c, a, b); gmp_printf("%Zx\n", c); mpz_clears(a, b, c, NULL); return 0; } When run, it outputs the number 1fffffffffffffffffffffe0000000ffffffffffdfffe7ffffffff000020001ffff7ff000040fffdfff81ffff800000000e00007ffffffff00000001ffff800000001 However, when ran under valgrind, ==19918== Memcheck, a memory error detector ==19918== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==19918== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info ==19918== Command: ./a.out ==19918== 1ffffeffffffffffffffffe0000100fffff7ffffffffe00001000000001ffffefffffffffffffffe0000100000000000000010000000000000001001ffff800000001 Note the output is quite different, the most significant (1ffff) and least significant (001ffff800000001) words agree, but the all the rest differ. I suspect it's one of the new and somewhat obscure instructions adox, adcx or mulx that isn't handled correctly.
Created attachment 110579 [details] Disassembly of gmp's mul_basecase.o, for the broadwell cpu family.
In the mean time, a more self-contained example was posted by the GMP author to https://gmplib.org/list-archives/gmp-devel/2018-February/004728.html, I'm copying here for easy reference. To reproduce: $ gcc valgrind-nisse.c bwl_mul_basecase.s $ ./a.out expected: ffffffffffffefff ffe0000800100000 ffff800000000dff 007ffffffff00000 001ffff800000001 got: ffffffffffffefff ffe0000800100000 ffff800000000dff 007ffffffff00000 001ffff800000001 $ valgrind ./a.out ==27485== Memcheck, a memory error detector ==27485== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==27485== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info ==27485== Command: ./a.out ==27485== expected: ffffffffffffefff ffe0000800100000 ffff800000000dff 007ffffffff00000 001ffff800000001 got: ffffffffffffefff 0000000000100fff ffffffffffefffff 0000000000000001 001ffff800000001 ==27485== ==27485== HEAP SUMMARY: ==27485== in use at exit: 0 bytes in 0 blocks ==27485== total heap usage: 1 allocs, 1 frees, 1,024 bytes allocated ==27485== ==27485== All heap blocks were freed -- no leaks are possible ==27485== ==27485== For counts of detected and suppressed errors, rerun with: -v ==27485== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) I'll upload the two source files as attachments in a moment.
Created attachment 110628 [details] GMP assembly mul_basecase.asm after m4 preprocessing
Created attachment 110629 [details] Main program calling assembly mul_basecase.
Reproduced. Thank you very much to whoever reduced the testcase.
It seems likely to me, from looking at the implementation, that V does not correctly preserve the OSZAP flags after ADCX, nor the SZACP flags after ADOX. At least, that's my current theory. I'll try to hack up a fix.
The large errors are hard to explain just from incorrect carries applied by adcx and adox (unless internal representation of O and C flag can somehow get values far beyond {0,1}). One hypothesis might be that mulx incorrectly clobbers the Z flag. Then the initial logic for loop startup (the two je instructions) would branch incorrectly.
On further analysis and testing, all 3 insns -- adcx, adox and mulx -- appear to be correctly implemented. So now I'm even more mystified.
Argh. V doesn't actually write the computed result to the destination register in the case where one of the sources is a memory operand, for adox and adcx. Eg adoxq 48(%rdi),%r12 %r12 is never written :-)
Niels, the patch below makes your reduced testcase run correctly. Can you test it more widely on GMP and let me know if it is OK? Then I'll land it. $ git diff -U8 diff --git a/VEX/priv/guest_amd64_toIR.c b/VEX/priv/guest_amd64_toIR.c index d00f453..f462030 100644 --- a/VEX/priv/guest_amd64_toIR.c +++ b/VEX/priv/guest_amd64_toIR.c @@ -3075,22 +3075,22 @@ ULong dis_op2_E_G ( const VexAbiInfo* vbi, putIRegG(size, pfx, rm, mkexpr(dst1)); } else if (op8 == Iop_Sub8 && flag == WithFlagCarry) { helper_SBB( size, dst1, dst0, src, /*no store*/IRTemp_INVALID, IRTemp_INVALID, 0 ); putIRegG(size, pfx, rm, mkexpr(dst1)); } else if (op8 == Iop_Add8 && flag == WithFlagCarryX) { - /* normal store */ helper_ADCX_ADOX( True/*isADCX*/, size, dst1, dst0, src ); + putIRegG(size, pfx, rm, mkexpr(dst1)); } else if (op8 == Iop_Add8 && flag == WithFlagOverX) { - /* normal store */ helper_ADCX_ADOX( False/*!isADCX*/, size, dst1, dst0, src ); + putIRegG(size, pfx, rm, mkexpr(dst1)); } else { assign( dst1, binop(mkSizedOp(ty,op8), mkexpr(dst0), mkexpr(src)) ); if (isAddSub(op8)) setFlags_DEP1_DEP2(op8, dst0, src, ty); else setFlags_DEP1(op8, dst1, ty); if (keep) putIRegG(size, pfx, rm, mkexpr(dst1));
I've never built valgrind from source before, but I'll give it a try, maybe already this evening.
(In reply to Niels Möller from comment #12) git clone git://sourceware.org/git/valgrind.git trunk cd trunk ./autogen.sh && ./configure --prefix=`pwd`/Inst && make -j8 && make -j8 install ./Inst/bin/valgrind --version
I added mem,reg and reg,mem tests to test-amd64 and without the patch ad?x? mem,reg instructions fail as expected, and with the patch I only get failures for unrelated instructions that ignore their assembly operands, which sounds more like a tester bug than a VEX one.
Patch works fine for me. I haven't been able to run the complete gmp testsuite under valgrind (some years ago, automake supported that with "make check TESTS_ENVIRONMENT=valgrind", but now that seems to only run the shell under valgrind, not the programs under test. Any hints on how to do that appreciated). But I've run the mini-gmp tests (which compares results computed by gmp and mini-gmp) under valgrind, and gmp's t-mul test program. Both fail without the patch, and succeeed after the patch is applied.
(In reply to jacobly.alt from comment #14) > I added mem,reg and reg,mem tests to test-amd64 [..] Great! Can you pls make the testcase diffs available?
(In reply to Niels Möller from comment #15) > But I've run the mini-gmp tests [..] Good, thanks. That's good enough for me. I'll land the fix.
Created attachment 110697 [details] Test mem variants of instructions in test-amd64. I think this triggers a latent bug in the test code with string instructions (I'm guessing due to the fact that assemblers will accept operands for them, but ignore them since there's only one possible encoding to output, which just happens to match up with how parameters are passed to functions), but it certainly tests more arithmetic code paths in VEX.
Fix committed, 6ae2edea014669d8082747f0f268e9404e0fd296.
(In reply to jacobly.alt from comment #18) > Created attachment 110697 [details] > Test mem variants of instructions in test-amd64. Is this a patch relative to the git trunk, or a revised version of the original test patch? I am unclear.
Created attachment 110721 [details] Test mem variants of instructions in test-amd64. Sorry, I had not noticed the switch to git. Here's a git patch against git trunk. The patch also includes the test changes from the bug 360415 patch which added tests for ad?x.