Bug 384930 - Valgrind fails to compute correctly some code using the GMP library
Summary: Valgrind fails to compute correctly some code using the GMP library
Status: REPORTED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (show other bugs)
Version: 3.13.0
Platform: Debian testing Linux
: NOR major
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-09-21 13:50 UTC by mail
Modified: 2018-02-16 15:27 UTC (History)
2 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
bug.c (634 bytes, text/x-csrc)
2017-09-21 13:50 UTC, mail
Details
Disassembly of gmp's mul_basecase.o, for the broadwell cpu family. (10.21 KB, text/plain)
2018-02-12 20:52 UTC, Niels Möller
Details
GMP assembly mul_basecase.asm after m4 preprocessing (4.78 KB, text/plain)
2018-02-13 20:00 UTC, Niels Möller
Details
Main program calling assembly mul_basecase. (657 bytes, text/plain)
2018-02-13 20:01 UTC, Niels Möller
Details
Test mem variants of instructions in test-amd64. (9.33 KB, patch)
2018-02-16 02:04 UTC, jacobly.alt
Details
Test mem variants of instructions in test-amd64. (9.47 KB, patch)
2018-02-16 15:27 UTC, jacobly.alt
Details

Note You need to log in before you can comment on or make changes to this bug.
Description mail 2017-09-21 13:50:17 UTC
Created attachment 107934 [details]
bug.c

When I run the simple program obtained from the source file bug.c in attachment, I obtain different results depending on whether it is run under valgrind or not.

When it is not run under valgrind, I obtain the expected values (lcm (2^i, 42)), when it is run under valgrind, the values obtained are false for large i.

I suspect that for large value of i, 2^i exceed a threshold in the GMP library code and that some asm code is used. It may be that the asm code in not correctly emulate by valgrind.


How to reproduce the code:

GMP version: 6.1.2 (configured and compiled with gmp-mparam.h -> mpn/x86_64/skylake/gmp-mparam.h)

$ gcc -o bug bug.c -lgmp -lm
$ ./bug > normal
$ valgrind ./bug > valgrind 
$ cmp normal valgrind 
normal valgrind differ: byte 339003, line 1469
Comment 1 Niels Möller 2018-02-12 20:50:57 UTC
I'm seeing a similar problem, was about to file a new bug, but instead commenting here:

I'm trying to run gmp test under valgrind. Earlier versions complained that it didn't recognize all instructions, but after upgrading to the valgrind package in debian stable valgrind runs without complaints. My machine has a x86_64 broadwell cpu ("Core i3-5010U") with adx and bmi2 extensions. 

I'm using gmp master, with the mul_basecase code from https://gmplib.org/repo/gmp/file/tip/mpn/x86_64/coreibwl/mul_basecase.asm, using all of mulx, adcx and adox. I will also attach a disassembly of the corresponding object file. Consider the following test program:

#include <stdio.h>
#include <gmp.h>

int main (int argc, char **argv) 
{
  mpz_t a, b, c;
  mpz_inits(a, b, c, NULL);
  mpz_set_str (a, "ffffffffffffffffffffff00000007ffffffffffffff00000000000000fffffffffffffffffffff", 16);
  mpz_set_str (b, "1fffffffffffffffffffffffffffffffffffffffe00007ffffffff", 16);
  mpz_mul(c, a, b);
  gmp_printf("%Zx\n", c);
  mpz_clears(a, b, c, NULL);
  return 0;
}

When run, it outputs the number 

1fffffffffffffffffffffe0000000ffffffffffdfffe7ffffffff000020001ffff7ff000040fffdfff81ffff800000000e00007ffffffff00000001ffff800000001

However, when ran under valgrind, 

==19918== Memcheck, a memory error detector
==19918== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==19918== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==19918== Command: ./a.out
==19918== 
1ffffeffffffffffffffffe0000100fffff7ffffffffe00001000000001ffffefffffffffffffffe0000100000000000000010000000000000001001ffff800000001

Note the output is quite different, the most significant (1ffff) and least significant (001ffff800000001) words agree, but the all the rest differ.

I suspect it's one of the new and somewhat obscure instructions adox, adcx or mulx that isn't handled correctly.
Comment 2 Niels Möller 2018-02-12 20:52:00 UTC
Created attachment 110579 [details]
Disassembly of gmp's mul_basecase.o, for the broadwell cpu family.
Comment 3 Niels Möller 2018-02-13 19:58:56 UTC
In the mean time, a more self-contained example was posted by the GMP author to https://gmplib.org/list-archives/gmp-devel/2018-February/004728.html, I'm copying here for easy reference. To reproduce:

$ gcc valgrind-nisse.c bwl_mul_basecase.s 
$ ./a.out 
expected: ffffffffffffefff ffe0000800100000 ffff800000000dff 007ffffffff00000 001ffff800000001
     got: ffffffffffffefff ffe0000800100000 ffff800000000dff 007ffffffff00000 001ffff800000001
$ valgrind ./a.out 
==27485== Memcheck, a memory error detector
==27485== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==27485== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==27485== Command: ./a.out
==27485== 
expected: ffffffffffffefff ffe0000800100000 ffff800000000dff 007ffffffff00000 001ffff800000001
     got: ffffffffffffefff 0000000000100fff ffffffffffefffff 0000000000000001 001ffff800000001
==27485== 
==27485== HEAP SUMMARY:
==27485==     in use at exit: 0 bytes in 0 blocks
==27485==   total heap usage: 1 allocs, 1 frees, 1,024 bytes allocated
==27485== 
==27485== All heap blocks were freed -- no leaks are possible
==27485== 
==27485== For counts of detected and suppressed errors, rerun with: -v
==27485== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

I'll upload the two source files as attachments in a moment.
Comment 4 Niels Möller 2018-02-13 20:00:54 UTC
Created attachment 110628 [details]
GMP assembly mul_basecase.asm after m4 preprocessing
Comment 5 Niels Möller 2018-02-13 20:01:59 UTC
Created attachment 110629 [details]
Main program calling assembly mul_basecase.
Comment 6 Julian Seward 2018-02-14 16:43:45 UTC
Reproduced.  Thank you very much to whoever reduced the testcase.
Comment 7 Julian Seward 2018-02-14 18:00:29 UTC
It seems likely to me, from looking at the implementation, that
V does not correctly preserve the OSZAP flags after ADCX, nor the
SZACP flags after ADOX.  At least, that's my current theory.
I'll try to hack up a fix.
Comment 8 Niels Möller 2018-02-15 12:46:52 UTC
The large errors are hard to explain just from incorrect carries applied by adcx and adox (unless internal representation of O and C flag can somehow get values far beyond {0,1}).

One hypothesis might be that mulx incorrectly clobbers the Z flag. Then the initial logic for loop startup (the two je instructions) would branch incorrectly.
Comment 9 Julian Seward 2018-02-15 12:52:54 UTC
On further analysis and testing, all 3 insns -- adcx, adox and mulx --
appear to be correctly implemented.  So now I'm even more mystified.
Comment 10 Julian Seward 2018-02-15 13:42:35 UTC
Argh.  V doesn't actually write the computed result to the
destination register in the case where one of the sources is
a memory operand, for adox and adcx.  Eg

  adoxq 48(%rdi),%r12

%r12 is never written :-)
Comment 11 Julian Seward 2018-02-15 13:49:48 UTC
Niels, the patch below makes your reduced testcase run correctly.
Can you test it more widely on GMP and let me know if it is OK?
Then I'll land it.

$ git diff -U8
diff --git a/VEX/priv/guest_amd64_toIR.c b/VEX/priv/guest_amd64_toIR.c
index d00f453..f462030 100644
--- a/VEX/priv/guest_amd64_toIR.c
+++ b/VEX/priv/guest_amd64_toIR.c
@@ -3075,22 +3075,22 @@ ULong dis_op2_E_G ( const VexAbiInfo* vbi,
          putIRegG(size, pfx, rm, mkexpr(dst1));
       } else
       if (op8 == Iop_Sub8 && flag == WithFlagCarry) {
          helper_SBB( size, dst1, dst0, src,
                      /*no store*/IRTemp_INVALID, IRTemp_INVALID, 0 );
          putIRegG(size, pfx, rm, mkexpr(dst1));
       } else
       if (op8 == Iop_Add8 && flag == WithFlagCarryX) {
-         /* normal store */
          helper_ADCX_ADOX( True/*isADCX*/, size, dst1, dst0, src );
+         putIRegG(size, pfx, rm, mkexpr(dst1));
       } else
       if (op8 == Iop_Add8 && flag == WithFlagOverX) {
-         /* normal store */
          helper_ADCX_ADOX( False/*!isADCX*/, size, dst1, dst0, src );
+         putIRegG(size, pfx, rm, mkexpr(dst1));
       } else {
          assign( dst1, binop(mkSizedOp(ty,op8), mkexpr(dst0), mkexpr(src)) );
          if (isAddSub(op8))
             setFlags_DEP1_DEP2(op8, dst0, src, ty);
          else
             setFlags_DEP1(op8, dst1, ty);
          if (keep)
             putIRegG(size, pfx, rm, mkexpr(dst1));
Comment 12 Niels Möller 2018-02-15 14:55:39 UTC
I've never built valgrind from source before, but I'll give it a try, maybe already this evening.
Comment 13 Julian Seward 2018-02-15 15:31:28 UTC
(In reply to Niels Möller from comment #12)

git clone git://sourceware.org/git/valgrind.git trunk
cd trunk
./autogen.sh && ./configure --prefix=`pwd`/Inst && make -j8 && make -j8 install
./Inst/bin/valgrind --version
Comment 14 jacobly.alt 2018-02-15 19:24:20 UTC
I added mem,reg and reg,mem tests to test-amd64 and without the patch ad?x? mem,reg instructions fail as expected, and with the patch I only get failures for unrelated instructions that ignore their assembly operands, which sounds more like a tester bug than a VEX one.
Comment 15 Niels Möller 2018-02-15 19:35:28 UTC
Patch works fine for me. 

I haven't been able to run the complete gmp testsuite under valgrind (some years ago, automake supported that with "make check TESTS_ENVIRONMENT=valgrind", but now that seems to only run the shell under valgrind, not the programs under test. Any hints on how to do that appreciated).

But I've run the mini-gmp tests (which compares results computed by gmp and mini-gmp) under valgrind, and gmp's t-mul test program. Both fail without the patch, and succeeed after the patch is applied.
Comment 16 Julian Seward 2018-02-15 21:47:28 UTC
(In reply to jacobly.alt from comment #14)
> I added mem,reg and reg,mem tests to test-amd64 [..]

Great!  Can you pls make the testcase diffs available?
Comment 17 Julian Seward 2018-02-15 21:48:12 UTC
(In reply to Niels Möller from comment #15)
> But I've run the mini-gmp tests [..]

Good, thanks.  That's good enough for me.  I'll land the fix.
Comment 18 jacobly.alt 2018-02-16 02:04:13 UTC
Created attachment 110697 [details]
Test mem variants of instructions in test-amd64.

I think this triggers a latent bug in the test code with string instructions (I'm guessing due to the fact that assemblers will accept operands for them, but ignore them since there's only one possible encoding to output, which just happens to match up with how parameters are passed to functions), but it certainly tests more arithmetic code paths in VEX.
Comment 19 Julian Seward 2018-02-16 07:06:36 UTC
Fix committed, 6ae2edea014669d8082747f0f268e9404e0fd296.
Comment 20 Julian Seward 2018-02-16 13:33:39 UTC
(In reply to jacobly.alt from comment #18)
> Created attachment 110697 [details]
> Test mem variants of instructions in test-amd64.

Is this a patch relative to the git trunk, or a revised version of
the original test patch?  I am unclear.
Comment 21 jacobly.alt 2018-02-16 15:27:53 UTC
Created attachment 110721 [details]
Test mem variants of instructions in test-amd64.

Sorry, I had not noticed the switch to git.  Here's a git patch against git trunk.  The patch also includes the test changes from the bug 360415 patch which added tests for ad?x.