Bug 417187 - [MIPS] Conditional branch problem since 'grail' changes
Summary: [MIPS] Conditional branch problem since 'grail' changes
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (show other bugs)
Version: 3.15 SVN
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-02-05 11:15 UTC by Stefan Maksimovic
Modified: 2020-04-17 18:18 UTC (History)
2 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
enable_grail_mips.diff (2.80 KB, text/plain)
2020-02-05 11:15 UTC, Stefan Maksimovic
Details
test program log without grail changes (308.69 KB, text/plain)
2020-02-05 11:17 UTC, Stefan Maksimovic
Details
test program log with grail changes (239.70 KB, text/plain)
2020-02-05 11:18 UTC, Stefan Maksimovic
Details
test program log without grail changes (308.69 KB, application/octet-stream)
2020-02-05 11:20 UTC, Stefan Maksimovic
Details
test program log with grail changes (71.38 KB, text/plain)
2020-02-05 11:22 UTC, Stefan Maksimovic
Details
Newly proposed solution pt 1 (3.66 KB, text/plain)
2020-02-20 12:03 UTC, Stefan Maksimovic
Details
Newly proposed solution pt 2 (2.68 KB, text/plain)
2020-02-20 12:05 UTC, Stefan Maksimovic
Details
cdebug_zlib logs (482.69 KB, application/octet-stream)
2020-02-20 13:59 UTC, Stefan Maksimovic
Details
cdebug_zlib main fn logs, w/ and w/o frame ptr used (2.21 KB, application/octet-stream)
2020-02-26 16:08 UTC, Stefan Maksimovic
Details
branch special case patch (2.20 KB, text/plain)
2020-03-02 14:12 UTC, Stefan Maksimovic
Details
handle branches in delay slots (1.25 KB, text/plain)
2020-03-12 17:01 UTC, Stefan Maksimovic
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Stefan Maksimovic 2020-02-05 11:15:19 UTC
Created attachment 125689 [details]
enable_grail_mips.diff

The problem we're having is present since the recent changes from the 'grail' branch have been applied to master.
It can be reproduced on a simple program (consists just of the main function that has a return 42; statement).
More specifically, it happens even before main is encountered (in the _dl_aux_init function).

The attached debug output depicts inconsistent IR which causes the failure:

t5 = CmpEQ32(t4,0x0:I32)
...
PUT(136) = t5

The number 136 corresponds with the PC register.
Valgrind tries to write the result of the compare operation into the PC register.

The following is a different observation which may or may not have to do with the problem above:

It has to do with conditional branches. Namely, upon hitting such a branch three instructions are analysed for each of those.
The problem seems to be having a branch instruction as the last one of those three.
Since branch instructions on MIPS have delay slots, the delay slot instruction needs to be analysed before the jump kind can be set
to the branch instruction preceding it.
As it stands that doesn't happen and the jump instruction gets assigned a jump kind of Ijk_Boring instead of Ijk_Call.
See the fall through case after superblock 4 in the attached debug output.
Comment 1 Stefan Maksimovic 2020-02-05 11:17:18 UTC
Created attachment 125690 [details]
test program log without grail changes
Comment 2 Stefan Maksimovic 2020-02-05 11:18:05 UTC
Created attachment 125691 [details]
test program log with grail changes
Comment 3 Stefan Maksimovic 2020-02-05 11:20:37 UTC
Created attachment 125692 [details]
test program log without grail changes
Comment 4 Stefan Maksimovic 2020-02-05 11:22:30 UTC
Created attachment 125693 [details]
test program log with grail changes
Comment 5 Julian Seward 2020-02-06 07:57:34 UTC
(In reply to Stefan Maksimovic from comment #0)
> Created attachment 125689 [details]
> enable_grail_mips.diff

Just as a tiny comment on this patch, and not related to the main problem here:
the patch looks reasonable, except for this:

--- a/VEX/priv/ir_defs.c
+++ b/VEX/priv/ir_defs.c
@@ -5285,6 +5285,7 @@ Bool eqIRRegArray ( const IRRegArray* descr1, const IRRegArray* descr2 )
 Int sizeofIRType ( IRType ty )
 {
    switch (ty) {
+      case Ity_I1:
       case Ity_I8:   return 1;
       case Ity_I16:  return 2;
       case Ity_I32:  return 4;

That just can't be right.  To ask for the byte size for Ity_I1 is more or
less meaningless -- it is (conceptually) a single bit.   Whatever you
commit, please don't commit this change.  Why is this necessary?  Maybe
we can find a different solution to whatever problem this solves.
Comment 6 Julian Seward 2020-02-06 08:19:52 UTC
(In reply to Stefan Maksimovic from comment #4)
> Created attachment 125693 [details]
> test program log with grail changes

Thank you for the attachments.  Let me see if I understand what you are
saying. In this log file, for SB #4, the front end logic analysed three
sequences:

  0x4013BC: 0x8C43FFFC  lw r3, 65532(r2)
  0x4013C0: 0x1460FFFE  bne r3, r0, 65534
  0x4013C4: 0x24420004  addiu r2, r2, 4

This is OK.  This is what you could call the "initial" sequence for the SB.
As I see it, it correctly contains the terminating branch (the bne) and the
delay slot (addiu).  Yes?

Next we have

  0x4013BC: 0x8C43FFFC  lw r3, 65532(r2)
  0x4013C0: 0x1460FFFE  bne r3, r0, 65534
  0x4013C4: 0x24420004  addiu r2, r2, 4

That's pretty strange, because it's a repeat of the entry block, but maybe
it's because the insn at 0x4013C0 branches back to the start of the block
(0x4013BC) (so this is a word-search loop?)

In this case, this is a speculative ("SPEC") disassembly.  Both the branch
and delay slot are analysed, but that is just luck -- in that the delay slot
just happens to fall within the 3-instruction limit.

So anyway, ignoring that, we have, finally, a second speculative disassembly:

  0x4013C8: 0x2442FFFC  addiu r2, r2, 65532
  0x4013CC: 0x8F9981B0  lw r25, 33200(r28)
  0x4013D0: 0x0320F809  jalr r31 r25

This is obviously wrong, because the delay slot (which would be at 0x4013D4)
is missing.

Do I understand all that correctly?
Comment 7 Stefan Maksimovic 2020-02-06 15:20:47 UTC
Thank you for your reply, Julian.

We pretty much agree with your analysis of the initial SB sequence as well as
the first and second speculative disassembly.

A note for the first speculative disassembly: you guessed correctly,
the branch at 0x4013C0 does jump back to 0x4013BC.

To be a bit more clear, the 16 bit offset of the bne instruction (0xFFFE signed, in this case) is left shifted 2 bits and added to the address following the branch, forming the target address of the jump.
Comment 8 Stefan Maksimovic 2020-02-06 15:25:59 UTC
(In reply to Stefan Maksimovic from comment #7)
> Thank you for your reply, Julian.
> 
> We pretty much agree with your analysis of the initial SB sequence as well as
> the first and second speculative disassembly.
> 
> A note for the first speculative disassembly: you guessed correctly,
> the branch at 0x4013C0 does jump back to 0x4013BC.
> 
> To be a bit more clear, the 16 bit offset of the bne instruction (0xFFFE
> signed, in this case) is left shifted 2 bits and added to the address
> following the branch, forming the target address of the jump.

Quoting the instruction set reference, as I may have provided erroneous information about the bne instruction above:
 
"An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following
the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address."
Comment 9 Julian Seward 2020-02-06 17:11:33 UTC
(In reply to Stefan Maksimovic from comment #8)
> > We pretty much agree with your analysis [..]

Good!

So .. last Sunday, in discussion at Fosdem, one of the three of you
(not sure who) suggested that this might be fixed as follows
(at least, this is what I interpreted the idea to mean):

in disInstr_MIPS, if the insn to be disassembled is a branch,
then *also* disassemble the following instruction at the same time,
hence providing IR for both instructions together.  In effect this
would mean that disInstr_MIPS usually "moves forward" 4 bytes, but
sometimes 8 bytes.

Does that make any sense?  Would that help?  Right now I can't
think of any other fix.

What happens if the instruction in a delay slot is itself a branch,
though?  Is that allowed?  Is it a situation you need to handle?
Comment 10 Stefan Maksimovic 2020-02-20 12:03:49 UTC
Created attachment 126199 [details]
Newly proposed solution pt 1
Comment 11 Stefan Maksimovic 2020-02-20 12:05:11 UTC
Created attachment 126200 [details]
Newly proposed solution pt 2
Comment 12 Stefan Maksimovic 2020-02-20 13:59:44 UTC
Created attachment 126210 [details]
cdebug_zlib logs

An update regarding the proposed approach in the last comment:

We've modified the disInstr_MIPS_WRK function to recursively call itself
in case of a branch instruction.
This has proven itself to be effective, as valgrind now does speculate on
conditional branches which can be seen in the debug output (after the 
[proposed solution pt 2] has been applied).
The branches and their delay slots get nicely bundled in a single
8-byte instruction.
This comprises the changes in the [proposed solution pt 1].

A few problems exist though, as there are a couple of tests failing across our
test machines. They appear after the [proposed solution pt 2] patch
has been applied, which enables the speculative execution.

As an example of that, we can take a look at memcheck/tests/cdebug_zlib (its
source, cdebug.c):
int main() {
   int x;
   if (x) return 1;
   return 0;
}

In addition to the existing output in its stderr file, we now have an additional
message:
Syscall param exit_group(status) contains uninitialised byte(s)
   at 0x491A568: _Exit (_exit.c:32)
   by 0x48A0E58: __run_exit_handlers (exit.c:97)
   by 0x48A0ED8: exit (exit.c:104)
   by 0x4881800: (below main) (libc-start.c:321)

We've analysed this for some time, but haven't been able to get to the bottom of
where the new error originates from, nor do we observe any other architectures
experiencing this kind of failure.

The attached file includes the debug output of the cdebug_zlib test runs.

Any thoughts on this?
Comment 13 Stefan Maksimovic 2020-02-26 16:08:37 UTC
Created attachment 126427 [details]
cdebug_zlib main fn logs, w/ and w/o frame ptr used

Another observation:

Modifying the main function of memcheck/tests/cdebug.c so as to make the
SPEC side exit block differ from the original results in valgrind not doing the
&&-transform which in turn does not produce the extra error mentioned in the
previous comment.

That is because the new side exit block terminates with a
PUT($pc) which gets a non-constant value, is that correct?

There are two files in the attachment, the shorter one differs from the original
in being compiled with -fomit-frame-pointer.
I've only left the debug output of the main function, the rest should be the same with
only one difference which is the aforementioned error that appears right after the _Exit
function:

Syscall param exit_group(status) contains uninitialised byte(s)
   at 0x4919B38: _Exit (_exit.c:31)
   by 0x489AEA0: __run_exit_handlers (exit.c:98)
   by 0x489AF4C: exit (exit.c:105)
   by 0x48804C4: (below main) (libc-start.c:325)
 Uninitialised value was created by a stack allocation
   at 0x4005F0: main (cdebug.c:1)


Shouldn't the _Exit function receive either 0 or 1 (coming from the return statements)
despite the fact that the variable x is uninitialized?
Does the transformation somehow propagate x to the _Exit function or does something entirely
different happen instead?
Comment 14 Stefan Maksimovic 2020-03-02 14:12:33 UTC
Created attachment 126546 [details]
branch special case patch

Another update:

Analyzing the mips and the x86 debug output side by side we noticed the following:

The fallthrough blocks in the main function differ by x86 having an unconditional jump,
which when translated to IR results in a single PUT(PC) instruction.

Mips on the other hand uses a conditional branch instruction with parameters set so as to
always take one of the two paths, making the jump effectively unconditional.
By using a conditional branch a conditional PUT(PC) was generated in the IR which enabled
the &&-transform to take place, resulting in the outcome described in previous comments.

Attached patch modifies this special case of the branch instruction by not generating
conditional PUTs to PC. Having applied this change the &&-transform does not take place
in the main function, much like in the x86 case.

This fixes a number of failing tests on our side.
Comment 15 Stefan Maksimovic 2020-03-10 15:13:49 UTC
We tested these changes a couple of days ago on a number of mips32/mips64 development boards and found no regressions present as of now.
Comment 16 Julian Seward 2020-03-11 09:42:53 UTC
Hi Stefan,

Thank you for the analyses, patches and testing.  Overall, your
solutions look good to me.  I have just one question, regarding
this patch:

  Subject: [PATCH 1/2] mips: Treat delay slot as part of previous instruction

  Do so by recursively calling disInstr_MIPS_WRK() if the
  instruction currently being disassembled is a branch/jump,
  effectively combining them into one IR instruction.

Is there any danger of this being incorrect even in pathological
cases (eg, the branch delay slot itself holding a branch, whose
delay slot also holds a branch, etc? -- or something like that?)

I mean .. is there any way that this could possibly fail, given the
strangest, most artificial code sequence?  Or is it always safe?

Also I was a bit concerned about the recursive use of disInstr_MIPS_WRK.
Closely related to the previous question .. is there any way to cause
infinite, or at least arbitrarily deep, recursion?
Comment 17 Stefan Maksimovic 2020-03-12 17:01:50 UTC
Created attachment 126749 [details]
handle branches in delay slots

We've taken a second look at the possible scenarios you described and updated the current changes so as to cover the special cases which could have possibly caused disInstr_MIPS_WRK to go into an arbitrarily deep recursion.

The result is that valgrind now mimics the behaviour of a program by emitting a SIGILL in cases where the program ran by itself also does.
Tested it on a contrived example (by having consecutive branches in the delay slots of those above) and managed to get a matching behaviour between valgrind and a standalone run of the example.
Comment 18 Julian Seward 2020-03-13 07:04:36 UTC
(In reply to Stefan Maksimovic from comment #17)
> Created attachment 126749 [details]
> handle branches in delay slots
>
> The result is that valgrind now mimics the behaviour of a program by
> emitting a SIGILL in cases where the program ran by itself also does.

That's great!  So I think we're ready to land now.  Can you land it, or
should I?  If me, is it all of the 4 attached patches?  Or are some of
them obsolete?
Comment 19 Stefan Maksimovic 2020-03-13 16:29:39 UTC
We'll land the changes if you don't mind.

One question though, do you happen to know the exact date of the code freeze?
In order for us to know by which date we need to commit the patches.
Comment 20 Julian Seward 2020-03-18 16:16:36 UTC
(In reply to Stefan Maksimovic from comment #19)
> We'll land the changes if you don't mind.
> 
> One question though, do you happen to know the exact date of the code freeze?
> In order for us to know by which date we need to commit the patches.

I don't know the date now, no.  I would suggest you land them as soon
as you feel they are stable enough, so that others can test them.  Then,
if there are stability problems after landing, we can fix or in the 
worst case back out, as necessary.  Does that work for you?
Comment 21 Mark Wielaard 2020-04-16 09:47:01 UTC
Looks like the new code freeze depends on these patches landing first. Could you please commit them as soon as possible?
Comment 22 Petar Jovanovic 2020-04-17 18:18:06 UTC
The patches have been committed. Thanks everyone.