Bug 417993 - vbit-test fail on s390x with Iop_Add32: spurious dependency on uninitialised value
Summary: vbit-test fail on s390x with Iop_Add32: spurious dependency on uninitialised ...
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (show other bugs)
Version: unspecified
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-02-21 13:36 UTC by Andreas Arnez
Modified: 2023-10-30 22:15 UTC (History)
2 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
Disable And1/Or1 testing in vbit-test (2.51 KB, patch)
2020-02-21 14:39 UTC, Andreas Arnez
Details
This might fix the problem and it's the solution that I prefer. (840 bytes, patch)
2023-10-28 03:38 UTC, Eyal
Details
This might fix it, too, but I prefer the other one. (6.13 KB, patch)
2023-10-28 03:40 UTC, Eyal
Details
A test case to force this error even on amd64. (2.54 KB, patch)
2023-10-28 16:56 UTC, Eyal
Details
Clear just the vbits of the values of the operands after the test is done. (854 bytes, patch)
2023-10-30 16:50 UTC, Eyal
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Andreas Arnez 2020-02-21 13:36:19 UTC
This is seen on s390x with the test case in `memcheck/tests/vbit-test':
┌────
│ valgrind -q --expensive-definedness-checks=yes ./vbit-test
│ ==113353== Conditional jump or move depends on uninitialised value(s)
│ ==113353==    at 0x10012A8: check_result_for_binary (binary.c:372)
│ ==113353==    by 0x10025A7: test_binary_op (binary.c:683)
│ ==113353==    by 0x1000B35: main (main.c:192)
└────
The uninitialised value results from the calculation in
int_add_or_sub_vbits().  In the failing case the function receives two
32-bit unsigned integers as arguments, the first of which equals zero,
but has its least significant bit undefined, and the second value equals
0xffffffff.

One step of the calculation determines the value `a_min' as
┌────
│ a_min = aa & ~vaa;
└────
where `aa' has the lowest bit undefined and `vaa' has exactly the lowest
bit set.  However, the compiler transforms this to
┌────
│ a_min = (aa & vaa) ^ aa;
└────
After that, Valgrind's memcheck considers the lowest bit of `a_min'
undefined as well.

NOTE: In order to run the test case, I temporarily disabled Iop_Or1 and Iop_And1 for s390x in the `irops' array in irops.c.
Comment 1 Andreas Arnez 2020-02-21 14:39:38 UTC
Created attachment 126262 [details]
Disable And1/Or1 testing in vbit-test

Just for completeness, this is the patch used for disabling the And1/Or1 testing in vbit-test.  Without this patch, the test case still fails as described above, but then also runs into this:

vex: the `impossible' happened:
   sizeofIRType
Comment 2 Mark Wielaard 2020-02-21 14:59:18 UTC
Since s390x is also big endian maybe this is related to bug #417238 which fixes an issue on mips64 BE and ppc64[be]?
Comment 3 Andreas Arnez 2020-02-21 15:18:36 UTC
(In reply to Mark Wielaard from comment #2)
> Since s390x is also big endian maybe this is related to bug #417238 which
> fixes an issue on mips64 BE and ppc64[be]?
Thanks for pointing to this bug; I haven't seen that yet.
I've retested with the fix from that Bug applied, and the false positive still remains.  As far as I can tell, these two problems don't appear to be related.

(On the plus side, the patch *does* fix another failure I've seen with vbit-test that occurs when adding s390x support for And1/Or1 and enabling their testing in vbit-test.)
Comment 4 Eyal 2023-10-28 00:04:04 UTC
For each operand, we run multiple tests with various settings of value and vbits.  A test ought to look like this:

1. Create variables aa and bb and vaa and vbb.
2. Set the vbits of aa and bb to be vaa and vbb, respectively.
3. Run that operand on aa and bb.
4. Extract the vbits that valgrind computed on the result.

And for a check, we do this:
1. Run a helper function whose input is aa,bb,vaa, and vbb.
2. That helper function computes the vbits that we expect in the output of the operand.

Finally, we compare the expected vbits against what valgrind came up with.

In the function valgrind_execute_test, we set the vbits of aa and bb before we run the operand.  That part is fine.

But when we run the helper with the inputs aa,bb,vaa,vbb, there is no uncertainty there.  We have, in fact, extracted the uncertainty from aa and bb into vaa and vbb and now, aa and bb have no uncertainty.  They should have vbits 0, both of them.  It's now up to the helper function to use the definite values of aa,bb,vaa, and vbb to compute the definite expected vbits of the operand.

It would be easy to check.  Just add a loop over the operands into check_result_for_binary that does this:

   opnd->vbits.bits = 0;
   valgrind_set_vbits(opnd);

Maybe like 5 lines of code total?  I don't have the right platform to test this out, though...  If I make a patch, would someone like to test it?

Eyal
Comment 5 Eyal 2023-10-28 03:34:10 UTC
Like I said before: As part of the tests, some vbits get set.  However, we need to clear them again before we compute the expectation or else the expectation *itself* will have vbits set in it.  The expectation should have none.

I'm unable to reproduce this myself but maybe someone else can test it?  Thanks.
Comment 6 Eyal 2023-10-28 03:38:36 UTC
Created attachment 162607 [details]
This might fix the problem and it's the solution that I prefer.

This might fix the problem and it's the solution that I prefer.

After we set the vbits in order to run the test, reset them right after.
Comment 7 Eyal 2023-10-28 03:40:12 UTC
Created attachment 162608 [details]
This might fix it, too, but I prefer the other one.

We refactor the code so that the computation is done before the vbits are set.  And then we clear out all the vbits before we do the comparison.
Comment 8 Eyal 2023-10-28 16:54:52 UTC
Drawing up the truth table of those two versions of calculating a_min, I find that only the case when aa is unknown and vaa is 1 make any difference.  The output in the version aa & ~vaa return 0 is aa==x and vaa==1.  But for (aa&vaa)^aa, it gets unknown.  Note that, if vaa is 1, the expression simplifies to (aa&1)^aa, which is just aa^aa, which is always 0 but the way that valgrind works, it can't recognize that the x on the left and the x on the right are the same x so its failing.  We could not fix this without a major reworking of valgrind!

I tried to reproduce this optimization by modifying the calculation in the code but it didn't make a difference.  I suspect that the compiler is undoing the change as part of optimization.  So I forced my computer to do it by changing this:

value & ~vbits

to:

vbits_xor(vbits_and(value, vbits), value);

and putting those files into separate compilations units, in files by themselves.  This forces the code to run the instructions exactly as I specify.

I definitely wouldn't suggest doing this long term but, for testing purposes, you can try it out.  I have attached a patch that you can try.  I've added new files so don't forget to run autogen.sh and configure again.)  Now when I run this:

make -j 20 check && valgrind   -q --expensive-definedness-checks=yes memcheck/tests/vbit-test/vbit-test

I get this every time:

==586817== Conditional jump or move depends on uninitialised value(s)
==586817==    at 0x110D4E: check_result_for_binary (binary.c:372)
==586817==    by 0x1126B7: test_binary_op (binary.c:683)
==586817==    by 0x110374: main (main.c:192)
==586817== 

Either of the patches that I have attached solve the problem so I feel confident that this will fix Andreas' problem.  I much prefer the 3 line solution to the other one so I will deprecate that other solution.
Comment 9 Eyal 2023-10-28 16:56:40 UTC
Created attachment 162619 [details]
A test case to force this error even on amd64.

This is just for testing purposes.  Don't put this into the repo.
Comment 10 Eyal 2023-10-28 16:57:11 UTC
Comment on attachment 162608 [details]
This might fix it, too, but I prefer the other one.

This one is much more convoluted and unnecessary.  Just use the other one.
Comment 11 Eyal 2023-10-28 16:58:12 UTC
Comment on attachment 162608 [details]
This might fix it, too, but I prefer the other one.

This one is bigger and the other one works just as well so don't use this one.
Comment 12 Mark Wielaard 2023-10-30 15:01:22 UTC
I think your analysis is correct. We already seem to do this when printing out a complaint about a real mismatch.
See memcheck/tests/vbits-test/util.c (print_opnd): 

   /* The value itself might be partially or fully undefined, so take a         
      copy, paint the copy as defined, and print the copied value.  This is     
      so as to avoid error messages from Memcheck, which are correct, but       
      confusing. */                                                             

And testing your 3 line fix on s390x it does seem to work as expected:

$ perl tests/vg_regtest memcheck/tests/vbit-test/vbit-test.vgtest 
vbit-test:       valgrind   -q --expensive-definedness-checks=yes ./vbit-test 

== 1 test, 0 stderr failures, 0 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures ==

I think my only question is whether we should "clear" all of data or just the value fields of the result and opnds[] of the test_data_t?
I guess it doesn't matter whether the rest of the test_data_t struct values are cleared too, but it might "hide" other/real bugs?
Comment 13 Eyal 2023-10-30 16:49:47 UTC
> I think my only question is whether we should "clear" all of data or just
> the value fields of the result and opnds[] of the test_data_t?
> I guess it doesn't matter whether the rest of the test_data_t struct values
> are cleared too, but it might "hide" other/real bugs?

I agree.  Let's clear just the vbits in the values in a loop.  I didn't know about the macro VALGRIND_MAKE_MEM_DEFINED so I'll use that one.

I see that print_opnd is called sometimes before vbits get cleared by my new code so the clearing of vbits in print_opnd must remain in addition to this new one.

I'll attach the new patch.
Comment 14 Eyal 2023-10-30 16:50:49 UTC
Created attachment 162732 [details]
Clear just the vbits of the values of the operands after the test is done.

This fixes the problem as tested by the test case patch on amd64_x86
Comment 15 Mark Wielaard 2023-10-30 22:15:16 UTC
Very nice, this also fixes the s390x case. Pushed as:

commit 0625fee9e49ebfeba0c805b00f7428e0a40ec75a
Author: Eyal Soha <eyalsoha@gmail.com>
Date:   Mon Oct 30 10:46:38 2023 -0600

    Clear vbits after the test is done.
    
    https://bugs.kde.org/show_bug.cgi?id=417993