Created attachment 125689 [details] enable_grail_mips.diff The problem we're having is present since the recent changes from the 'grail' branch have been applied to master. It can be reproduced on a simple program (consists just of the main function that has a return 42; statement). More specifically, it happens even before main is encountered (in the _dl_aux_init function). The attached debug output depicts inconsistent IR which causes the failure: t5 = CmpEQ32(t4,0x0:I32) ... PUT(136) = t5 The number 136 corresponds with the PC register. Valgrind tries to write the result of the compare operation into the PC register. The following is a different observation which may or may not have to do with the problem above: It has to do with conditional branches. Namely, upon hitting such a branch three instructions are analysed for each of those. The problem seems to be having a branch instruction as the last one of those three. Since branch instructions on MIPS have delay slots, the delay slot instruction needs to be analysed before the jump kind can be set to the branch instruction preceding it. As it stands that doesn't happen and the jump instruction gets assigned a jump kind of Ijk_Boring instead of Ijk_Call. See the fall through case after superblock 4 in the attached debug output.
Created attachment 125690 [details] test program log without grail changes
Created attachment 125691 [details] test program log with grail changes
Created attachment 125692 [details] test program log without grail changes
Created attachment 125693 [details] test program log with grail changes
(In reply to Stefan Maksimovic from comment #0) > Created attachment 125689 [details] > enable_grail_mips.diff Just as a tiny comment on this patch, and not related to the main problem here: the patch looks reasonable, except for this: --- a/VEX/priv/ir_defs.c +++ b/VEX/priv/ir_defs.c @@ -5285,6 +5285,7 @@ Bool eqIRRegArray ( const IRRegArray* descr1, const IRRegArray* descr2 ) Int sizeofIRType ( IRType ty ) { switch (ty) { + case Ity_I1: case Ity_I8: return 1; case Ity_I16: return 2; case Ity_I32: return 4; That just can't be right. To ask for the byte size for Ity_I1 is more or less meaningless -- it is (conceptually) a single bit. Whatever you commit, please don't commit this change. Why is this necessary? Maybe we can find a different solution to whatever problem this solves.
(In reply to Stefan Maksimovic from comment #4) > Created attachment 125693 [details] > test program log with grail changes Thank you for the attachments. Let me see if I understand what you are saying. In this log file, for SB #4, the front end logic analysed three sequences: 0x4013BC: 0x8C43FFFC lw r3, 65532(r2) 0x4013C0: 0x1460FFFE bne r3, r0, 65534 0x4013C4: 0x24420004 addiu r2, r2, 4 This is OK. This is what you could call the "initial" sequence for the SB. As I see it, it correctly contains the terminating branch (the bne) and the delay slot (addiu). Yes? Next we have 0x4013BC: 0x8C43FFFC lw r3, 65532(r2) 0x4013C0: 0x1460FFFE bne r3, r0, 65534 0x4013C4: 0x24420004 addiu r2, r2, 4 That's pretty strange, because it's a repeat of the entry block, but maybe it's because the insn at 0x4013C0 branches back to the start of the block (0x4013BC) (so this is a word-search loop?) In this case, this is a speculative ("SPEC") disassembly. Both the branch and delay slot are analysed, but that is just luck -- in that the delay slot just happens to fall within the 3-instruction limit. So anyway, ignoring that, we have, finally, a second speculative disassembly: 0x4013C8: 0x2442FFFC addiu r2, r2, 65532 0x4013CC: 0x8F9981B0 lw r25, 33200(r28) 0x4013D0: 0x0320F809 jalr r31 r25 This is obviously wrong, because the delay slot (which would be at 0x4013D4) is missing. Do I understand all that correctly?
Thank you for your reply, Julian. We pretty much agree with your analysis of the initial SB sequence as well as the first and second speculative disassembly. A note for the first speculative disassembly: you guessed correctly, the branch at 0x4013C0 does jump back to 0x4013BC. To be a bit more clear, the 16 bit offset of the bne instruction (0xFFFE signed, in this case) is left shifted 2 bits and added to the address following the branch, forming the target address of the jump.
(In reply to Stefan Maksimovic from comment #7) > Thank you for your reply, Julian. > > We pretty much agree with your analysis of the initial SB sequence as well as > the first and second speculative disassembly. > > A note for the first speculative disassembly: you guessed correctly, > the branch at 0x4013C0 does jump back to 0x4013BC. > > To be a bit more clear, the 16 bit offset of the bne instruction (0xFFFE > signed, in this case) is left shifted 2 bits and added to the address > following the branch, forming the target address of the jump. Quoting the instruction set reference, as I may have provided erroneous information about the bne instruction above: "An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address."
(In reply to Stefan Maksimovic from comment #8) > > We pretty much agree with your analysis [..] Good! So .. last Sunday, in discussion at Fosdem, one of the three of you (not sure who) suggested that this might be fixed as follows (at least, this is what I interpreted the idea to mean): in disInstr_MIPS, if the insn to be disassembled is a branch, then *also* disassemble the following instruction at the same time, hence providing IR for both instructions together. In effect this would mean that disInstr_MIPS usually "moves forward" 4 bytes, but sometimes 8 bytes. Does that make any sense? Would that help? Right now I can't think of any other fix. What happens if the instruction in a delay slot is itself a branch, though? Is that allowed? Is it a situation you need to handle?
Created attachment 126199 [details] Newly proposed solution pt 1
Created attachment 126200 [details] Newly proposed solution pt 2
Created attachment 126210 [details] cdebug_zlib logs An update regarding the proposed approach in the last comment: We've modified the disInstr_MIPS_WRK function to recursively call itself in case of a branch instruction. This has proven itself to be effective, as valgrind now does speculate on conditional branches which can be seen in the debug output (after the [proposed solution pt 2] has been applied). The branches and their delay slots get nicely bundled in a single 8-byte instruction. This comprises the changes in the [proposed solution pt 1]. A few problems exist though, as there are a couple of tests failing across our test machines. They appear after the [proposed solution pt 2] patch has been applied, which enables the speculative execution. As an example of that, we can take a look at memcheck/tests/cdebug_zlib (its source, cdebug.c): int main() { int x; if (x) return 1; return 0; } In addition to the existing output in its stderr file, we now have an additional message: Syscall param exit_group(status) contains uninitialised byte(s) at 0x491A568: _Exit (_exit.c:32) by 0x48A0E58: __run_exit_handlers (exit.c:97) by 0x48A0ED8: exit (exit.c:104) by 0x4881800: (below main) (libc-start.c:321) We've analysed this for some time, but haven't been able to get to the bottom of where the new error originates from, nor do we observe any other architectures experiencing this kind of failure. The attached file includes the debug output of the cdebug_zlib test runs. Any thoughts on this?
Created attachment 126427 [details] cdebug_zlib main fn logs, w/ and w/o frame ptr used Another observation: Modifying the main function of memcheck/tests/cdebug.c so as to make the SPEC side exit block differ from the original results in valgrind not doing the &&-transform which in turn does not produce the extra error mentioned in the previous comment. That is because the new side exit block terminates with a PUT($pc) which gets a non-constant value, is that correct? There are two files in the attachment, the shorter one differs from the original in being compiled with -fomit-frame-pointer. I've only left the debug output of the main function, the rest should be the same with only one difference which is the aforementioned error that appears right after the _Exit function: Syscall param exit_group(status) contains uninitialised byte(s) at 0x4919B38: _Exit (_exit.c:31) by 0x489AEA0: __run_exit_handlers (exit.c:98) by 0x489AF4C: exit (exit.c:105) by 0x48804C4: (below main) (libc-start.c:325) Uninitialised value was created by a stack allocation at 0x4005F0: main (cdebug.c:1) Shouldn't the _Exit function receive either 0 or 1 (coming from the return statements) despite the fact that the variable x is uninitialized? Does the transformation somehow propagate x to the _Exit function or does something entirely different happen instead?
Created attachment 126546 [details] branch special case patch Another update: Analyzing the mips and the x86 debug output side by side we noticed the following: The fallthrough blocks in the main function differ by x86 having an unconditional jump, which when translated to IR results in a single PUT(PC) instruction. Mips on the other hand uses a conditional branch instruction with parameters set so as to always take one of the two paths, making the jump effectively unconditional. By using a conditional branch a conditional PUT(PC) was generated in the IR which enabled the &&-transform to take place, resulting in the outcome described in previous comments. Attached patch modifies this special case of the branch instruction by not generating conditional PUTs to PC. Having applied this change the &&-transform does not take place in the main function, much like in the x86 case. This fixes a number of failing tests on our side.
We tested these changes a couple of days ago on a number of mips32/mips64 development boards and found no regressions present as of now.
Hi Stefan, Thank you for the analyses, patches and testing. Overall, your solutions look good to me. I have just one question, regarding this patch: Subject: [PATCH 1/2] mips: Treat delay slot as part of previous instruction Do so by recursively calling disInstr_MIPS_WRK() if the instruction currently being disassembled is a branch/jump, effectively combining them into one IR instruction. Is there any danger of this being incorrect even in pathological cases (eg, the branch delay slot itself holding a branch, whose delay slot also holds a branch, etc? -- or something like that?) I mean .. is there any way that this could possibly fail, given the strangest, most artificial code sequence? Or is it always safe? Also I was a bit concerned about the recursive use of disInstr_MIPS_WRK. Closely related to the previous question .. is there any way to cause infinite, or at least arbitrarily deep, recursion?
Created attachment 126749 [details] handle branches in delay slots We've taken a second look at the possible scenarios you described and updated the current changes so as to cover the special cases which could have possibly caused disInstr_MIPS_WRK to go into an arbitrarily deep recursion. The result is that valgrind now mimics the behaviour of a program by emitting a SIGILL in cases where the program ran by itself also does. Tested it on a contrived example (by having consecutive branches in the delay slots of those above) and managed to get a matching behaviour between valgrind and a standalone run of the example.
(In reply to Stefan Maksimovic from comment #17) > Created attachment 126749 [details] > handle branches in delay slots > > The result is that valgrind now mimics the behaviour of a program by > emitting a SIGILL in cases where the program ran by itself also does. That's great! So I think we're ready to land now. Can you land it, or should I? If me, is it all of the 4 attached patches? Or are some of them obsolete?
We'll land the changes if you don't mind. One question though, do you happen to know the exact date of the code freeze? In order for us to know by which date we need to commit the patches.
(In reply to Stefan Maksimovic from comment #19) > We'll land the changes if you don't mind. > > One question though, do you happen to know the exact date of the code freeze? > In order for us to know by which date we need to commit the patches. I don't know the date now, no. I would suggest you land them as soon as you feel they are stable enough, so that others can test them. Then, if there are stability problems after landing, we can fix or in the worst case back out, as necessary. Does that work for you?
Looks like the new code freeze depends on these patches landing first. Could you please commit them as soon as possible?
The patches have been committed. Thanks everyone.