Bug 472329 - std::optional use: "Conditional jump or move depends on uninitialised value(s)" (g++)
Status: CONFIRMED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck
Version: 3.22.0
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Paul Floyd
 
Reported: 2023-07-17 15:06 UTC by m
Modified: 2024-04-16 14:34 UTC (History)
3 users



Description m 2023-07-17 15:06:41 UTC
Steps to reproduce with the cpp file:

```cpp
#include <cstdint>
#include <iostream>
#include <map>
#include <optional>
#include <string>
void foo(std::optional<int64_t> f) {
  std::cout << (f ? *f : 0) << std::endl;
  static std::map<bool, std::pair<int64_t, int64_t>> test{[] {
    std::map<bool, std::pair<int64_t, int64_t>> test;
    test.emplace(true, std::pair{0, 0});
    return test;
  }()};
  auto range{test.find(true)};
  while (range != test.end()) {
    auto [a, b]{range->second};
    if (f && *f != a) {
      range++;
      continue;
    }
    std::cout << (a ? std::string{} : std::string{""}) << std::endl;
    range++;
  }
}
```

Then compile with g++ (clang++ doesn't reproduce the issue):

```
g++ /tmp/a.cpp -O1 -g   -o /tmp/exe && valgrind --quiet /tmp/exe
```

(g++-12.3 needs -O2, and g++-13 needs -O1)

Output:

```
0
==2753084== Conditional jump or move depends on uninitialised value(s)
==2753084==    at 0x4014CB: foo(std::optional<long>) (a.cpp:16)
==2753084==    by 0x401542: main (a.cpp:24)
==2753084==
Comment 1 m 2023-07-17 15:10:05 UTC
Apologies, the cpp was missing the last line:

```
int main() { foo(std::nullopt); }
```
Comment 2 Paul Floyd 2023-07-22 17:50:02 UTC
Not easy.

Here is the assembler

# bug472329.cpp:15:     auto [a, b]{range->second};
	.loc 8 15 30 is_stmt 0 discriminator 1 view .LVU406
	movq	40(%rbx), %rax	# MEM <type> [(struct pair *)range$_M_node_264 + 40B], SR.252
.LVL136:
	.loc 8 16 5 is_stmt 1 view .LVU407
# bug472329.cpp:16:     if (f && *f != a) {
	.loc 8 16 11 is_stmt 0 discriminator 1 view .LVU408

Error on next line:

	cmpq	%r14, %rax	# f, SR.252
	je	.L47	#,
	.loc 8 16 11 discriminator 1 view .LVU409
	cmpb	$0, -97(%rbp)	#, %sfp
	jne	.L69	#,


The corresponding C++

test is a map of bool to pair of int64_t, initialized by a rather contorted lambda (looks like test code!).
The values are true : {0, 0}

  auto range{test.find(true)};
  while (range != test.end()) {
    auto [a, b]{range->second};
    if (f && *f != a) {

range is a std::map::iterator, initialized with the result of a find for true, which is in the map. So the iterator should refer to that one element.

Next the structured bindings a and b get declared and initialized from the value referred to by the iterator range. The value is {0, 0} so both a and b are initialized to zero.

Next there is a check that the optional contains something and if so the contents are compared to a. Phew!

Back to the assembler. It seems to me that this is performing the tests in the wrong order. That is

*f != a

then

f

llvm generates

	.loc	18 321 22                       # include/c++/v1/optional:321:22
	testb	%bl, %bl
.Ltmp50:
	.loc	17 16 11                        # bug472329.cpp:16:11
	je	.LBB0_19
.Ltmp51:
# %bb.12:                               # %while.body
                                        #   in Loop: Header=BB0_11 Depth=1
	#DEBUG_VALUE: foo:f <- [DW_OP_constu 72, DW_OP_minus, DW_OP_LLVM_fragment 0 64] [$rbp+0]
	#DEBUG_VALUE: foo:f <- [DW_OP_LLVM_fragment 64 8] $ebx
	#DEBUG_VALUE: foo:range <- $r14
	#DEBUG_VALUE:  <- [DW_OP_LLVM_fragment 0 64] $rax
	cmpq	-72(%rbp), %rax                 # 8-byte Folded Reload
	je	.LBB0_19

The test of %bl looks like it is checking f, and the cmpq at the end compares *f and a.

Back to GCC, this looks like the case described in these slides:

https://archive.fosdem.org/2020/schedule/event/debugging_memcheck_reloaded/attachments/slides/4174/export/events/attachments/debugging_memcheck_reloaded/slides/4174/memcheck_reloaded.pdf

Not sure what the problem is. Maybe it's the fact that the two comparisons don't use the same width, quadword vs byte.
Comment 3 m 2023-07-24 16:03:03 UTC
I can reduce the test-case further, replacing the map with a set, and dropping the lambda wrapper. However, further reductions easily cause the optimizer to eliminate all the code before it gets compiled into the binary. Currently I have:

```cpp
#include <cstdint>
#include <iostream>
#include <optional>
#include <set>
#include <string>
void foo(std::optional<int64_t> f) {
  std::cout << (f ? *f : 0) << std::endl;
  std::set<std::pair<int64_t, int64_t>> test;
  test.emplace(std::pair{0, 0});
  auto it{test.begin()};
  while (it != test.end()) {
    int64_t b{it->second};
    if (f && *f != b) {
      it++;
      break;
    }
    std::cout << "";
    it++;
  }
}
int main() { foo(std::nullopt); }
```
Comment 4 m 2023-07-24 16:06:37 UTC
Ok, the pair can be replaced by int64_t, but that is it so far:

```cpp
#include <cstdint>
#include <iostream>
#include <optional>
#include <set>
#include <string>
void foo(std::optional<int64_t> f) {
  std::cout << (f ? *f : 0) << std::endl;
  std::set<int64_t> test;
  test.emplace(0);
  auto it{test.begin()};
  while (it != test.end()) {
    int64_t b{*it};
    if (f && *f != b) {
      it++;
      break;
    }
    std::cout << "";
    it++;
  }
}
int main() { foo(std::nullopt); }
```
Comment 5 Paul Floyd 2023-09-17 13:04:38 UTC
For easy reference this is the commit that I need to look at. In particular the FIXMEs.

Author: Julian Seward <jseward@acm.org>  2019-10-21 11:19:59
Committer: Julian Seward <jseward@acm.org>  2020-01-02 06:42:21
Parent: 740381f8ac28e2878de7b068611385ed01985478 (Update following recent bug-fix commits.)
Child:  eaf8385b5ec1f6bf0f884b07d58ddb3f25ee5245 (Clean up machinery to do with conditionalising IRStmts:)
Branches: debuginfo_load_option, freebsd, master, remotes/github/freebsd, remotes/origin/VALGRIND_3_16_BRANCH, remotes/origin/master, remotes/origin/users/mark/try-out-of-memory-reason, remotes/origin/users/mark/try-realloc-again, remotes/origin/users/mark/try-vgdb-multi, remotes/origin/users/njn/try-cachegrind-cl-reqs, remotes/origin/users/njn/try-rm-I
Follows: VALGRIND_3_15_0
Precedes: VALGRIND_3_16_0, VALGRIND_3_17_0

    Initial implementation of C-source-level &&-idiom recovery
    
    This branch contains code which avoids Memcheck false positives resulting from
    gcc and clang creating branches on uninitialised data.  For example:
    
       bool isClosed;
       if (src.isRect(..., &isClosed, ...) && isClosed) {
    
    clang9 -O2 compiles this as:
    
       callq  7e7cdc0 <_ZNK6SkPath6isRectEP6SkRectPbPNS_9DirectionE>
    
       cmpb   $0x0,-0x60(%rbp)  // "if (isClosed) { .."
       je     7ed9e08           // "je after"
    
       test   %al,%al           // "if (return value of call is nonzero) { .."
       je     7ed9e08           // "je after"
    
       ..
       after:
    
    That is, the && has been evaluated right-to-left.  This is a correct
    transformation if the compiler can prove that the call to |isRect| returns
    |false| along any path on which it does not write its out-parameter
    |&isClosed|.
    
    In general, for the lazy-semantics (L->R) C-source-level && operator, we have
    |A && B| == |B && A| if you can prove that |B| is |false| whenever A is
    undefined.  I assume that clang has some kind of interprocedural analysis that
    tells it that.  The compiler is further obliged to show that |B| won't trap,
    since it is now being evaluated speculatively, but that's no big deal to
    prove.
    
    A similar result holds, per de Morgan, for transformations involving the C
    language ||.
    
    Memcheck correctly handles bitwise &&/|| in the presence of undefined inputs.
    It has done so since the beginning.  However, it assumes that every
    conditional branch in the program is important -- any branch on uninitialised
    data is an error.  However, this idiom demonstrates otherwise.  It defeats
    Memcheck's existing &&/|| handling because the &&/|| is spread across two
    basic blocks, rather than being bitwise.
    
    This initial commit contains a complete initial implementation to fix that.
    The basic idea is to detect the && condition spread across two blocks, and
    transform it into a single block using bitwise &&.  Then Memcheck's existing
    accurate instrumentation of bitwise && will correctly handle it.  The
    transformation is
    
       <contents of basic block A>
       C1 = ...
       if (!C1) goto after
       .. falls through to ..
    
       <contents of basic block B>
       C2 = ...
       if (!C2) goto after
       .. falls through to ..
    
       after:
    
     ===>
    
       <contents of basic block A>
       C1 = ...
       <contents of basic block B, conditional on C1>
       C2 = ...
       if (!C1 && !C2) goto after
       .. falls through to ..
    
       after:
    
    This assumes that <contents of basic block B> can be conditionalised, at the
    IR level, so that the guest state is not modified if C1 is |false|.  That's
    not possible for all IRStmt kinds, but it is possible for a large enough
    subset to make this transformation feasible.
    
    There is no corresponding transformation that recovers an || condition,
    because, per de Morgan, that merely corresponds to swapping the side exits vs
    fallthoughs, and inverting the sense of the tests, and the pattern-recogniser
    as implemented checks all possible combinations already.
    
    The analysis and block-building is performed on the IR returned by the
    architecture specific front ends.  So they are almost not modified at all: in
    fact they are simplified because all logic related to chasing through
    unconditional and conditional branches has been removed from them, redone at
    the IR level, and centralised.
    
    The only file with big changes is the IRSB constructor logic,
    guest_generic_bb_to_IR.c (a.k.a the "trace builder").  This is a complete
    rewrite.
    
    There is some additional work for the IR optimiser (ir_opt.c), since that
    needs to do a quick initial simplification pass of the basic blocks, in order
    to reduce the number of different IR variants that the trace-builder has to
    pattern match on.  An important followup task is to further reduce this cost.
    
    There are two new IROps to support this: And1 and Or1, which both operate on
    Ity_I1.  They are regarded as evaluating both arguments, consistent with AndXX
    and OrXX for all other sizes.  It is possible to synthesise at the IR level by
    widening the value to Ity_I8 or above, doing bitwise And/Or, and re-narrowing
    it, but this gives inefficient code, so I chose to represent them directly.
    
    The transformation appears to work for amd64-linux.  In principle -- because
    it operates entirely at the IR level -- it should work for all targets,
    providing the initial pre-simplification pass can normalise the block ends
    into the required form.  That will no doubt require some tuning.  And1 and Or1
    will have to be implemented in all instruction selectors, but that's easy
    enough.
    
    Remaining FIXMEs in the code:
    
    * Rename `expr_is_speculatable` et al to `expr_is_conditionalisable`.  These
      functions merely conditionalise code; the speculation has already been done
      by gcc/clang.
    
    * `expr_is_speculatable`: properly check that Iex_Unop/Binop don't contain
      operations that might trap (Div, Rem, etc).
    
    * `analyse_block_end`: recognise all block ends, and abort on ones that can't
      be recognised.  Needed to ensure we don't miss any cases.
    
    * maybe: guest_amd64_toIR.c: generate better code for And1/Or1
    
    * ir_opt.c, do_iropt_BB: remove the initial flattening pass since presimp
      will already have done it
    
    * ir_opt.c, do_minimal_initial_iropt_BB (a.k.a. presimp).  Make this as
      cheap as possible.  In particular, calling `cprop_BB_wrk` is total overkill
      since we only need copy propagation.
    
    * ir_opt.c: once the above is done, remove boolean parameter for `cprop_BB_wrk`.
    
    * ir_opt.c: concatenate_irsbs: maybe de-dup w.r.t. maybe_unroll_loop_BB.
    
    * remove option `guest_chase_cond` from VexControl (?).  It was never used.
    
    * convert option `guest_chase_thresh` from VexControl (?) into a Bool, since
    the revised code here only cares about the 0-vs-nonzero distinction now.
Comment 6 Paul Floyd 2023-09-17 18:22:34 UTC
Getting back to the problem

   if (f && *f != a) {

generates

	movq	40(%rbx), %rax	# MEM <type> [(struct pair *)range$_M_node_264 + 40B], SR.252
Error on next line:
	cmpq	%r14, %rax	# f, SR.252
	je	.L47	#,
	cmpb	$0, -97(%rbp)	#, %sfp
	jne	.L69	#,

g++ has reversed the order of the checks.

Valgrind's detection of this reversal first filters out statements that preclude such reordering.

One such statement is the load, which can potentially trap. In this case, if I understand correctly, Valgrind sees the read from 40(%rbx) as potentially trapping and thus declines to transform the expression back to a bit-and.

So two questions:
Why is GCC doing this transformation?
If the transformation is valid, how can 'safe' loads be added to the guarded-expression / bit-and transformation in VEX?
Comment 7 Paul Floyd 2023-09-18 14:57:51 UTC
After discussion on the gcc mailing list: because f is on the stack, the compiler can see that dereferencing the stack memory is safe, and so it allows the reordering.

Valgrind assumes that all memory dereferences are potentially trapping and so does not try to perform the idiom recovery.

So for the moment I'm stumped.
Comment 8 Mark Wielaard 2024-04-16 14:34:00 UTC
I can replicate with the code of comment #4 and g++ 13.2.1 or 14.0.1 (prerelease), but only with -O1.
-O0 and -O2 don't show the issue.